I teach entrepreneurship at Columbia University. We devote the second-to-last class to a Harvard Business School case study about the early years of Zipcar. It’s a good case and we can review most of what we learned through the lens of the problems Robin Chase faced just before raising her second round of financing.
In the last class we talk about fundraising and venture capital. For continuity, I decided to use Zipcar as an example of the mechanics and norms of VC funding: pre- and post-money, preferred stock, rounds, dilution, etc. Since Zipcar had been a public company, I figured public documents would contain most of the numbers I would need to piece together the story.
This is that story, as far as I could really piece it together. It was stranger than I had assumed it would be, and that was good. My general philosophy about taking VC money is the same as Rilke’s about writing: if you don’t feel you absolutely must, then don’t. The story of Zipcar’s financing illustrates one of the many ways a company can be successful while its founders are not. It also shows some major mistakes by early investors.
This post isn’t a primer on venture capital, it’s a real-world example of how things can play out. I know that I’m throwing you in the deep end, but Fred Wilson and Brad Feld have explained the mechanics well: if you don’t recognize a term, visit one of their blogs and search for it.
Most of the information in this analysis is drawn from the 424(b)(4) form that Zipcar filed with the SEC in April 2011, soon after their IPO. The 424 is called a Prospectus. The SEC requires a form 424 after an IPO, and it contains much of the business and financial information about a company that a public market investor needs to make a rational investment decision. The SEC only requires most of the information in the 424 to cover the three years prior to the filing, so many details about the early years of the company can only be inferred from agreements and ownership that survived into the period the filing covers. I have had to make some guesses to fill in the gaps.
One thing to note in this analysis is that I refer to the number of shares on an “as-converted” basis. This is important because the company did a 1-for-2 reverse split before the IPO–probably to make sure the trading price at IPO would be above $10/share. Because of this, all of the preferred share counts in the 424 need to be halved to know how many shares the holder will own after the IPO. That is, if a shareholder owned 1,000 Series B shares, those preferred shares will convert into 500 Common shares at the IPO. It is easier to compare the value of different types of shares over time using their Common equivalents, so that is what I did. The base data and calculations I made are in this Google Sheet, with references back to the Prospectus. I don’t think you all need to refer back to the 424, but I included the references so that if you want to have a go at reading the SEC document and tying it back to the actual business, you can. I have found the ability to parse SEC documents extremely useful in my finance career, so if finance is the road you’re going down, reading through and understanding a prospectus is a good thing to be able to do. Just sayin’.
The second thing to note is that some of the breadcrumbs around ownership in the early years come from interviews in the media with Chase or others. Chase, in particular, seems to be an unreliable narrator. For instance, in one interview she says “When I finished raising a $7 million round of financing in 2003”^{1}, but the financials show that only $4,000,000 was raised in the round that closed that year. And that round was announced as a $2 million round on Crunchbase^{2}. The information on Crunchbase seems to reflect the first close in a larger round, and Chase may be confused about timing, with $4.7 million closed in December of 2002, $4 million closed in November of 2003, and a $2 million bridge note somewhere in that time-frame as well. The point is: what people say to the media may not be accurate, for whatever reason. I relied on SEC documents for the facts, and considered other information interesting but not definitive.
Founding
The company was formed in 2000 by Antje Danielson and Robin Chase. They split the equity 50/50. This analysis assumes they each received 570,000 shares of the new company.
This number, how many shares the founders started with, is one of the most speculative assumptions in the analysis. There is no way of really knowing from public information. In fact, I am pretty sure it isn’t exactly right: who starts a company and decides each founder gets 1,140,000 shares (pre-reverse split)? Why not exactly 1,000,000 each or 2,500,000 to split? That said, it’s my best guess. I arrived at it from three directions:
There is also the possibility that Chase or Danielson or both sold shares back to the company at some point (probably not in a secondary because then they would still be outstanding Common shares). If this happened it was probably after the Series C because the company wouldn’t have wanted to part with the cash before then. There is no way to know from the information I have.
Assuming this, at founding the capitalization table–the enumeration of who owns what shares–would have looked like this (shares and dollars in the cap tables are in thousands):
| Owner | Date | Common Stock | Preferred Stock | Total Stock | % | Investment |
|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 50.0% | |
| Danielson | 1/2000 | 570 | | 570 | 50.0% | |
| Total | | 1,140 | 0 | 1,140 | | $0 |
Series A
The Zipcar case says “[Chase and Danielson] had incorporated in January 2000 and raised their first $50,000 from one angel investor…By October, the fledgling company had 19 vehicles, nearly 250 members, and the founders had raised—and spent—an additional $325,000 to fund the early stages of operations…Beginning in early 2000, Chase had made a series of presentations to potential investors in which she sought $1 million in capital.”^{5}
The Prospectus tells us how many shares of each series of preferred stock are outstanding as of the IPO and what their preference is. Since venture capital preferred stock always has a purchase price equal to the preference, from this we can tell for each series how much was invested and how many shares were bought. And indeed, it shows that in October 2000 the company raised $1,035,606 by selling shares of Series A stock with a preference of $3.80 per share (again, on an as-converted basis): 272,528 shares.
In an interview years later, Chase said “The decision to expand to other cities came after we closed $1.3 million in Series A financing.”^{6} Similarly, the MIT version of the Zipcar case study says “With just three weeks until the company closed on its first round of funding worth $1.3 million…”^{7} I’m going to assume that the difference between the $1.3 million reported as raised and the $1.0 million actually raised is the amount of convertible notes raised before the Series A. If we assume the HBS case misspoke when it said $50,000 plus an additional $325,000 and meant simply $325,000 total, then the $1.3 million makes sense. (If it were indeed $375,000 on top of the $1,035,606 then they would have almost certainly rounded to $1.4 million. In general, everyone involved likes reporting larger numbers.)
But then this means the Convertible Notes converted into Common stock, not Series A Preferred. That isn’t unusual, although it’s a really bad idea if you’re the investor, as you’ll see in a minute. I assume the Notes converted at 85% of the Series A price (in my experience, 15% was a more usual discount in 2000, while 20% is more usual now). This resulted in an additional 100,619 Common shares.
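The share counts above fall out of simple division. A quick sketch, using the Prospectus numbers plus my assumed 85% note discount (the discount is my assumption, not a figure from the filing):

```python
# Series A: the preference per share equals the purchase price, so
# shares bought = amount invested / price per share.
series_a_amount = 1_035_606
series_a_price = 3.80
series_a_shares = series_a_amount / series_a_price
print(round(series_a_shares))  # 272528

# Convertible notes: assumed to convert at 85% of the Series A price.
note_amount = 325_000
note_price = 0.85 * series_a_price     # $3.23/share
note_shares = note_amount / note_price
print(round(note_shares))  # 100619
```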
I also put in an option pool equal to 20% of the post-round company. This is fairly usual, although the size of the pool can vary. But I was also trying to get the founders closer to the “around 20 percent” mentioned in the quote above. This doesn’t get them there, but I can’t square the 20% after the first round with the 10% after the second round anyway. I think the quote given to CNBC is probably inaccurate.
Cap Table After the Series A
| Owner | Date | Common Stock | Preferred Stock | Total Stock | % | Investment | Value |
|---|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 30.1% | | $2,182 |
| Danielson | 1/2000 | 570 | | 570 | 30.1% | | $2,182 |
| Convertible notes | ?/2000 | 101 | | 101 | 5.3% | $325 | $382 |
| Series A investors | 10/2000 | | 273 | 273 | 14.4% | $1,036 | $1,036 |
| Options + pool | | 378 | | 378 | 20.0% | | |
| Total | | 1,619 | 273 | 1,892 | | $1,361 | $5,750 |
Some points I made to my class here:
Series B
In December 2002 the company raised another round of financing, the Series B.
Just before that, Chase fired Danielson. You may think this is strange, since they were equal partners, but the company had formed a board of directors after the first venture round and the board gave Chase the ability to fire anyone at her discretion. Remember that when you take money from VCs you are giving them some say over how your company is run. When things are going well, this doesn’t matter. When things are going badly, it does. And in the run-up to the Series B things were going badly.
The dot-com bubble popped in March 2000, but many were convinced that the market would come back, that it was just a temporary setback, like in 1997. So Zipcar’s Series A in October 2000 was fully priced. But the market didn’t come back and 2002 was the worst time to raise venture money in the last quarter-century, aside from 2003. By then most investors wouldn’t touch internet companies with a ten-foot pole. The Series B price reflected that.
The company raised $4.7 million at a price of $1/share.
One dollar per share is much lower than the Series A price of $3.80/share. This was a down round. The holders of Common stock found their shares worth less (on paper) than they had been previously. This includes the holders of the Common shares issued on conversion of the Convertible Notes. The holders of the Series A seem to have had a contractual mitigator though: an anti-dilution clause^{8}.
Anti-dilution works this way: if you buy shares for a certain price (and your purchase contract grants you anti-dilution: this only exists if you specifically negotiate it when buying the shares) and then, later, the company sells shares for a lower price, your price is effectively lowered as well. How much it is lowered depends on the type of anti-dilution right. If you have full-ratchet anti-dilution, then your price is effectively lowered to the latest, lower price. More common is weighted-average anti-dilution where your price is lowered to somewhere between your price and the new, lower price.
In this case, where there were a lot of new shares issued at $1/share versus the few shares that were issued at $3.80/share, the effective price of the Series A was reduced to $1.49^{9}^{,}^{10}.
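A sketch of that arithmetic, assuming the broad-based weighted-average formula and the roughly 997 thousand pre-round share count discussed in the footnotes (both are my inferences, not figures stated in the Prospectus):

```python
old_price = 3.80         # original Series A price, as-converted
down_price = 1.00        # Series B price
new_money = 4_704_000    # Series B proceeds
new_shares = new_money / down_price
shares_before = 997_000  # shares counted before the round (inferred)

# Broad-based weighted average: blend the old price toward the new,
# lower price in proportion to how much cheap stock was sold.
shares_at_old_price = new_money / old_price
adjusted = old_price * (shares_before + shares_at_old_price) \
                     / (shares_before + new_shares)
print(round(adjusted, 2))  # 1.49
```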
| Owner | Date | Common Stock | Preferred Stock | Total Stock | % | Investment | Value | Value @ Last Round |
|---|---|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 8.3% | | $570 | $2,166 |
| Danielson | 1/2000 | 411 | | 411 | 6.0% | | $411 | $2,166 |
| Convertible notes | ?/2000 | 101 | | 101 | 1.5% | $325 | $101 | $382 |
| Series A investors | 10/2000 | | 695 | 695 | 10.1% | $1,036 | $695 | $1,036 |
| Series B investors | 12/2002 | | 4,704 | 4,704 | 68.6% | $4,704 | $4,704 | |
| Options + pool | | 378 | | 378 | 5.5% | | | |
| Total | | 1,460 | 5,400 | 6,860 | | $6,065 | $6,481 | |
The pre-money here is $2.2 million and the post-money is $6.9 million. You can see that the founders got slammed: Chase’s ownership went from ~30% to less than 9% and the notional value of her shares went down 74%. The value of the Common shares issued for the Convertible Notes was also down 74%. But note that the value of the Series A shares was only down 33% because of the anti-dilution.
The people who invested in the Convertible Notes probably thought they were getting whatever the Series A investors would get, but at a cheaper price. But if it’s true that they ended up with common shares instead of preferred then they were wrong, very wrong. If they had gotten the Series A anti-dilution they would have owned more than 2.5x what they did.
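The “more than 2.5x” figure can be checked under my assumptions: the notes converted to Common at 85% of $3.80, while Series A treatment would have meant the same 85% discount applied to the anti-dilution-adjusted $1.49 conversion price.

```python
note_amount = 325_000
actual_shares = note_amount / (0.85 * 3.80)         # ~100,619 Common

# Counterfactual: Series A anti-dilution moves the conversion price
# from $3.80 to $1.49; apply the same 85% discount to that.
counterfactual_shares = note_amount / (0.85 * 1.49)

print(round(counterfactual_shares / actual_shares, 2))  # 2.55
```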
I was told that when she was fired Danielson “walked away” from some of her shares, meaning she either gave them back to the company or the company bought them from her. This led to her having fewer shares than Chase, as discussed above.
Series C
In 2003 Robin Chase was fired and the company hired a new CEO, Scott Griffith. While most startups don’t so closely resemble a season of Game of Thrones, it’s not surprising this happened immediately after the Series B investors took a controlling stake in the company. Griffith was granted 700,000 shares of restricted stock. Restricted stock usually has some conditions attached, such as vesting.
In November 2003 the company raised $4 million at $1.40/share. I had not increased the option pool before the Series B raise, but here I increased it so that it is again 20% of the company’s shares. It is not unusual for the option pool to be increased at each raise, although the pool as a percentage of the whole company usually shrinks as the company gets larger. In this case, the pool plus exercised options at IPO was 18.6%; I kept the pool at 20% until the Series F.
| Owner | Date | Common Stock | Preferred Stock | Total Stock | % | Investment | Value |
|---|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 4.5% | | $798 |
| Danielson | 1/2000 | 411 | | 411 | 3.3% | | $575 |
| Convertible notes | ?/2000 | 101 | | 101 | 0.8% | $325 | $141 |
| Series A investors | 10/2000 | | 695 | 695 | 5.5% | $1,036 | $973 |
| Series B investors | 12/2002 | | 4,704 | 4,704 | 37.5% | $4,704 | $6,586 |
| Griffith restricted stock | 2/2003 | 700 | | 700 | 5.6% | | $980 |
| Series C investors | 11/2003 | | 2,857 | 2,857 | 22.8% | $4,000 | $4,000 |
| Options + pool | | 2,510 | | 2,510 | 20.0% | | |
| Total | | 4,292 | 8,257 | 12,549 | | $10,065 | $14,054 |
The pre-money is $13.6 million, and the post-money is $17.6 million.
Series D
The Series D was led by Benchmark in January 2005. The company raised $11.7 million at $2.32/share. The pre-money was $32 million and the post-money was $44 million.
If you want to see how much a company grew in value between rounds, the right way to do it is by share price. Going from $1.40 to $2.32 is 66% growth. If you compare the post-money in the last round to the pre-money in this round, you go from $17.6 million to $32 million: 82% growth. The difference here is in the options being added to the option pool.
You may ask why unissued options are included in valuations at all (by including them, the price per share is lower at any given company valuation). Or why they are usually added to the pool before the round rather than after (if they were added after the round, the additional pool would dilute both the previous holders and the new investors; adding them to the pool before the round dilutes only the previous holders). This is the wrong thing to worry about. Focus instead on price per share. You, as a founder or previous investor, own a certain number of shares. This doesn’t usually change, unlike your percentage ownership. The number of shares you own and the price per share now and in the future are what should matter to you.
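A quick check on those two growth numbers, using the round figures above:

```python
# By share price (the right way):
series_c_price, series_d_price = 1.40, 2.32
print(round(series_d_price / series_c_price - 1, 2))  # 0.66

# By post-money of the last round vs pre-money of this one ($mm):
post_c, pre_d = 17.6, 32.0
print(round(pre_d / post_c - 1, 2))                   # 0.82
# The gap is the option pool top-up: new pool shares raise the
# pre-money without making any existing share worth more.
```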
I’m not going to insert the cap table for each of these rounds. If you want to see it, visit the Google Sheet: there is a tab for each round.
Series E
The Series E was also led by Benchmark, with Greylock investing. The company raised $25 million at $7.32/share in November 2006. The pre-money was $152 million and the post-money was $177 million. The price per share grew by 216% in less than two years. Nice round.
Series F and Streetcar Acquisition
The Series F was different. These shares were issued to acquire Flexcar, one of Zipcar’s competitors, in November 2007. They amounted to 30% of the as-converted equity of the combined company.
There were also Warrants issued as part of the purchase price for Flexcar. Warrants in a startup function exactly like options: they allow the holder to purchase shares in the future at a set price. If these are issued to employees they are called options. If they are issued in acquisitions or to partners they are called warrants. Warrants are often issued to banks when they agree to lend money, and sometimes to landlords when they lease property, etc. I don’t include these warrants explicitly in the cap table below, because they come out of the option pool so they are in that line.
In April 2010 Zipcar also acquired a European competitor, Streetcar. They issued 4.1 million shares of Common for this acquisition. Why was the Flexcar acquisition paid for in preferred stock while the Streetcar acquisition was paid for in Common stock? Usually when an acquisition is made with preferred it is because the target is also venture backed and the venture investors want their preference in the target company replaced by preference in the acquiror. The acquiror would usually rather not have more preference because preference amounts make a difference when the company is sold for less than what the preferred stock holders would get if they converted; in some of those cases there is not enough money to pay back all the preferred and the sale price is then divvied up based on how much preference there is.
A simple example: Company A has one investor, Ms. V, who bought 20% of the company for $1,000,000 at $1/share. She received preferred stock.
Now imagine that Company A buys Company B and issues 1,000,000 Common shares to pay for it. This dilutes Ms. V from 20% to 16.67% ownership. So now
If instead Company A had issued preferred shares with a $1/share liquidation preference to acquire Company B, then Ms. V is in a different position. In this case,
In buying Company B, Company A has raised the threshold for Ms. V to get a positive return on her investment, as well as the threshold for getting her money back in full. This all makes sense. But by using preferred to buy Company B, it has halved her cut of any amounts below $2,000,000.
I’m sure you’re not crying for Ms. V, but think about it from the point of view of Company A’s Common shareholders: they only make money after the preferred is paid back. By doubling the amount of preference the company has doubled the sale price at which the Common take home any money at all. This is called the liquidation preference overhang or just liquidation overhang. It may seem like not a big deal in this example but imagine this scenario with the dollar amounts multiplied by a hundred.
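To make the Ms. V example concrete, here is a sketch of her payout under each deal. The example doesn’t specify the terms, so I assume a 1x non-participating preference and 5,000,000 shares outstanding before the acquisition (1,000,000 of them hers, which is her 20%):

```python
def ms_v_payout_common_deal(sale_price):
    """Company A pays for Company B with 1,000,000 Common shares.
    Ms. V holds 1,000,000 preferred (a $1,000,000 preference) out of
    6,000,000 total shares; hers is the only preference."""
    preference = min(sale_price, 1_000_000)
    as_converted = sale_price / 6   # her 16.67% if she converts
    return max(preference, as_converted)

def ms_v_payout_preferred_deal(sale_price):
    """Company A pays with 1,000,000 preferred shares instead, so the
    preference pool is $2,000,000 and Ms. V gets half of it."""
    preference = min(sale_price, 2_000_000) / 2
    as_converted = sale_price / 6
    return max(preference, as_converted)

# Below $2,000,000 the preferred deal halves Ms. V's take;
# well above it, she converts and the deals look the same.
for s in (800_000, 2_000_000, 12_000_000):
    print(s, ms_v_payout_common_deal(s), ms_v_payout_preferred_deal(s))
```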
Issue common when you can.
Series G
The Series G was led by Meritech in December 2010. The company raised $21 million at $15.22/share. The pre-money was $539 million and the post-money was $560 million. The share price grew 108% in the four years since the Series E. Much slower growth, but not so bad.
I did not increase the option pool for the Series F because it was not a real round. And I used the actual number of options (and warrants and restricted stock) from the Prospectus here.
| Owner | Date | Common Stock | Preferred Stock | Total Stock | % | Investment | Value |
|---|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 1.5% | | $8,675 |
| Danielson | 1/2000 | 411 | | 411 | 1.0% | | $6,255 |
| Convertible notes | ?/2000 | 101 | | 101 | 0.3% | $325 | $1,531 |
| Series A investors | 10/2000 | | 695 | 695 | 1.8% | $1,036 | $10,581 |
| Series B investors | 12/2002 | | 4,704 | 4,704 | 11.9% | $4,704 | $71,601 |
| Griffith restricted stock | 2/2003 | 700 | | 700 | 1.8% | | $10,654 |
| Series C investors | 11/2003 | | 2,857 | 2,857 | 7.3% | $4,000 | $43,491 |
| Series D investors | 1/2005 | | 5,059 | 5,059 | 12.8% | $11,736 | $76,991 |
| Series E investors | 11/2006 | | 3,249 | 3,249 | 8.2% | $25,015 | $49,445 |
| Series F (Flexcar) | 11/2007 | | 7,154 | 7,154 | 18.2% | | $108,881 |
| Streetcar acquisition | 4/2010 | 4,093 | | 4,093 | 10.4% | | |
| Series G investors | 12/2010 | | 1,380 | 1,380 | 3.5% | $21,000 | $21,000 |
| Options + pool | | 8,445 | | 8,445 | 21.4% | | |
| Total | | 11,686 | 25,098 | 36,784 | | $67,816 | $409,106 |
IPO
Just five months after the Series G, Zipcar went public. On April 19, 2011 public market investors bought 9,684,109 shares of common stock. 6,666,667 of these shares were sold by Zipcar directly, for $18/share, less the 7% the underwriting banks took as their fee. Zipcar ended up with $111 million of proceeds, and the banks took home $12 million for their trouble. Why do investment banks make this much for being an intermediary? No good reason. But very few companies are able to negotiate a lower price, though you can if you’re big enough; SNAP seems to have paid 2.5%^{12}. Even fewer are willing to buck the process entirely, as Google and Spotify did.
The other 3 million shares were sold by existing shareholders. This is not reflected in the below cap table.
You can see below the return and rough IRR each of the investors made at the IPO (using the IPO price and date, that is).
| Owner | Date | Common Stock | Conv Pfd Stock* | Total Stock | % | Investment | Value @ IPO | Return | Rough IRR |
|---|---|---|---|---|---|---|---|---|---|
| Chase | 1/2000 | 570 | | 570 | 1.3% | | $10,260 | | |
| Danielson | 1/2000 | 411 | | 411 | 0.9% | | $7,398 | | |
| Convertible notes | ?/2000 | 101 | | 101 | 0.2% | $325 | $1,811 | 5.6x | 26% |
| Series A investors | 10/2000 | | 695 | 695 | 1.5% | $1,036 | $12,514 | 12.1x | 27% |
| Series B investors | 12/2002 | | 4,704 | 4,704 | 10.4% | $4,704 | $84,679 | 18.0x | 39% |
| Series C investors | 11/2003 | | 2,857 | 2,857 | 6.3% | $4,000 | $51,435 | 12.9x | 38% |
| Series D investors | 1/2005 | | 5,059 | 5,059 | 11.2% | $11,736 | $91,054 | 7.8x | 39% |
| Series E investors | 11/2006 | | 3,249 | 3,249 | 7.2% | $25,015 | $58,477 | 2.3x | 21% |
| Series F (Flexcar) | 11/2007 | | 7,154 | 7,154 | 15.8% | | $128,768 | | |
| Series G investors | 12/2010 | | 1,380 | 1,380 | 3.0% | $21,000 | $24,836 | 1.2x | 50% |
| Streetcar acquisition | | 4,093 | | 4,093 | 9.0% | | $73,670 | | |
| Exercised opts & rest stock | | 1,448 | | 1,448 | 3.2% | | $26,059 | | |
| Option pool & warrants out | | 6,997 | | 6,997 | 15.4% | | | | |
| IPO shares | 4/2011 | 6,667 | | 6,667 | 14.7% | $120,000 | $120,000 | | |
| Total | | 20,286 | 25,098 | 45,383 | | $187,816 | $690,948 | | |
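The “Rough IRR” column is just the annualized multiple. A sketch of the calculation, with holding periods estimated by me from each round’s date to the April 2011 IPO:

```python
def rough_irr(multiple, years):
    """Annualized return implied by a total multiple over a holding period."""
    return multiple ** (1 / years) - 1

# Series A: 12.1x over the ~10.5 years from October 2000 to April 2011
print(round(rough_irr(12.1, 10.5), 2))  # 0.27
# Series E: 2.3x over the ~4.4 years from November 2006 to April 2011
print(round(rough_irr(2.3, 4.42), 2))   # 0.21
```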
If my numbers are correct, Chase had stock valued at about $10 million at the IPO. Danielson had $7 million. These are not numbers to sneeze at. On the other hand, starting a company that becomes worth $700 million and goes public is unusual. The reward for doing so should be large.
Other things to note:
Soon after the IPO the stock price climbed to $29/share, valuing Zipcar at more than $1.3 billion. But it didn’t last. The price fell over the next few years, slipping under $8.25/share. On January 2nd, 2013, Avis announced they would buy Zipcar for about $500 million, or $12.25/share. The acquisition was completed on March 14, 2013. Scott Griffith announced he was leaving the company the next day.
- http://fortune.com/2012/12/04/robin-chase-zipcars-founder-finds-a-new-gear/
- https://www.crunchbase.com/funding_round/zipcar-series-c–d060b78a
- https://www.theverge.com/2014/4/1/5553910/driven-how-zipcars-founders-built-and-lost-a-car-sharing-empire
- http://fortune.com/2012/12/04/robin-chase-zipcars-founder-finds-a-new-gear/
- https://mitsloan.mit.edu/LearningEdge/CaseDocs/14-153.Robin%20Chase%20and%20Zipcar.FINAL.pdf
- I can’t know this for sure, but the conversion ratio from preferred to common in the prospectus is 2 for all classes of preferred shares except the Series A, where it is 0.784. That means that for every 3 shares of Series B, 1.5 shares of Common are issued at the IPO, but for every 3 shares of Series A, 4 shares of Common are issued. I can’t think of any other reason for this than an adjustment to the conversion price as a result of anti-dilution being triggered.
- The mechanism for this is not for more Series A shares to be issued by the company, but for the conversion price of the shares to change. The conversion price starts at the price actually paid but is changed by the anti-dilution clause. The number of Common shares issued when the preferred shares are converted is determined by multiplying the number of preferred shares by the ratio of the conversion price divided by the original price. If you do the math, this works out to be the number of shares you would own if you divided your original investment amount by the conversion price instead of the original price per share.
- When doing the math, there’s something else strange going on. To get to the actual conversion price as mentioned in the Prospectus, the number of shares outstanding before the Series B round must be about 997 thousand. This doesn’t seem to count the option pool (which is usually counted), the Series A stock, or the Common issued to the Convertible Note investors. In fact, it only seems to count the founders’ stock. If this is how it is written, it would be the first time I’ve ever heard it done that way. I assume I am just missing something.
- It doesn’t always work out this way even if the letter of the contracts says it should. Sometimes acquirors insist that Common holders, especially if they are founders, receive part of the purchase price. Sometimes investors give up part of their preference in deference to the founders. Every contract is both the end and the beginning of a negotiation.
- https://www.sec.gov/Archives/edgar/data/1564408/000119312517068848/d270216d424b4.htm
If you want to know exactly how much to diversify, that’s a harder question. There are a lot of articles and posts out there that give you advice on how big your angel portfolio should be. I think they’re wrong. Maybe not in the advice they give, but in their methodology. They are a dead end.
Every person in venture, when pushed on why either so many companies don’t succeed or on why any young company deserves to be valued at $1 billion or more, says that it’s because venture-backed companies follow a power law. But when they think about portfolio construction, they treat outcomes as some other sort of distribution, one that’s easier to reason about. This post will take the hard road and try to reason about power law distributions. That means math, and code.
Portfolios in the Normal World
First, a thought experiment. Let’s say someone gives you a die and tells you that if you roll a six you win twelve dollars for every dollar you bet; otherwise you lose your bet. You bet $100. If you roll a 1, 2, 3, 4, or 5 you lose your $100; if you roll a 6 you get $1,200. It’s a good bet for you: you end up with nothing five-sixths of the time but with $1,200 one-sixth of the time. Your expected value is $200, double your bet.
Now the person you are wagering against asks you to divvy up your $100 into some number of equal bets. How many bets should you make?
On the one hand, whether you make 1 bet or 1000 bets, the expected value remains the same: $200. But if you make one bet, you will probably lose your entire bankroll, with a small chance of multiplying it twelvefold. If you make 1000 bets, you will almost certainly end up with close to twice your bankroll. This is just the Law of Large Numbers: the average of a sample tends to approach the expected value as the size of the sample increases. The more bets you make, the more your average outcome looks like the average of the distribution itself: the standard deviation of the outcome gets smaller and the probability of results far from the expected value decreases.
I simulated this in code. The first chart is a simulation where you bet $5 on each of twenty rolls. The second chart is where you bet $0.50 on each of 200 rolls. I ran each simulation 10,000 times and plotted the distribution of outcomes.
If you look at the first chart, it shows you end up with less than you started with more than 10% of the time. You most often end up with around $200, of course, but there’s an excellent chance you end up with more than $300.
When you divvy your bankroll into 200 parts–the second chart–you almost never lose money^{1}. But you also almost never make more than $300. The outcome of your portfolio of bets is more predictable when you make more of them.
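A minimal version of that simulation, in Python with numpy (the seed and structure here are mine):

```python
import numpy as np

def simulate(n_bets, bankroll=100.0, n_sims=10_000, seed=0):
    """Split the bankroll into n_bets equal bets. Each bet pays 12x
    with probability 1/6 and zero otherwise; return final amounts."""
    rng = np.random.default_rng(seed)
    wins = rng.random((n_sims, n_bets)) < 1 / 6
    return wins.sum(axis=1) * (bankroll / n_bets) * 12

twenty = simulate(20)        # $5 on each of 20 rolls
two_hundred = simulate(200)  # $0.50 on each of 200 rolls

print(round(twenty.mean()), round(two_hundred.mean()))  # both ~200
print((twenty < 100).mean())       # lose money ~13% of the time
print((two_hundred < 100).mean())  # lose money almost never
```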
This is the entire content of the advice that you should have a lot of companies in your venture capital portfolio: you have a lower chance of losing money the more companies you invest in. Of course, by the same reasoning, you also have a lower chance of making a lot of money, although the people giving the advice rarely highlight that part. A ton of bets is the right way to match the market return with this sort of distribution of outcomes–distributions with a finite mean and variance, including all distributions with a finite number of possible outcomes (like rolling a die) as well as common distributions like the normal distribution and the lognormal distribution.
Monte Carlo Simulations
Venture capital is more complicated than throwing a die. There are not only more than six possible outcomes, there are, at least, billions of possible outcomes. The example with the die could have been calculated fairly simply: it’s a textbook binomial distribution. But when you have a lot of outcomes it’s easier to have a computer simulate picking samples from a distribution and take averages over thousands of simulated portfolios. This process is generally called a Monte Carlo Simulation.
As an example, let’s take the very high-level distribution of venture returns from Fred Wilson’s 2012 blog post, The Power of Diversification: “the average startup has a 33% chance of making money for the investors, a 33% chance of returning capital, and a 33% chance of losing everything and that only 10% will make a big return (>10x).” The actual model he uses in the post is slightly different, here it is:
There’s a 40% chance any single company returns nothing, a 30% chance it returns what you invested, a 20% chance it returns three times what you invested, and a 10% chance it returns ten times. Call this the Basic Model. Note that its expected value is \(40\%*0 + 30\%*1 + 20\%*3 + 10\%*10=1.9\).
To Monte Carlo simulate the Basic Model you would choose a portfolio size, n, pick n outcomes from the distribution, and take the average of those n picks. This is the outcome of one random portfolio. Do this thousands of times and chart the distribution of averages. Then vary n to find the distribution of averages for different portfolio sizes.
Below is Python 2.7 code using numpy and pyplot to simulate the Basic Model.
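A Python 3 sketch of that simulation (plotting omitted; the seed is arbitrary):

```python
import numpy as np

OUTCOMES = np.array([0.0, 1.0, 3.0, 10.0])  # multiple of investment returned
PROBS = np.array([0.4, 0.3, 0.2, 0.1])      # the Basic Model

def portfolio_multiples(n, n_sims=10_000, seed=0):
    """Average multiple of n_sims simulated portfolios of n companies each."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(OUTCOMES, size=(n_sims, n), p=PROBS)
    return picks.mean(axis=1)

for n in (1, 5, 10, 100):
    r = portfolio_multiples(n)
    # the mean stays near 1.9; the spread shrinks as n grows
    print(n, round(r.mean(), 2), round(r.std(), 2))
```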
Here are several runs with different portfolio sizes.
As with the die example, the larger the portfolio, the less variance from the expected value, 1.9.
Fred’s post says: “If you make just one investment, you are likely going to lose everything. If you make two, you are still likely to lose money. If you make five, you might get all your money back across all five investments. If you make ten, you might start making money on the aggregate set of investments.” The below runs of the model show what he means.
If your portfolio size is 1, you lose everything 40% of the time. You make money 30% of the time. And you get your money back 30% of the time. As the portfolio size grows, your chance of losing money shrinks and your chance of making money grows. (The average stays the same.)
| Portfolio Size | Percent <=1x | Percent >1x | Mean |
|---|---|---|---|
| 1 | 70% | 30% | 1.9x |
| 5 | 35% | 65% | 1.9x |
| 10 | 20% | 80% | 1.9x |
This example doesn’t allow for the small probability of a single company returning more than 10x. This would make sense if extreme outcomes are extremely unlikely and not really that far from the mean. Bucketing them into a catch-all like 10x is accurate enough (and, in the pedagogic sense Fred is embracing, anything above 10x is gravy; better to be conservative when giving advice to amateurs.)
This model not only answers “how many companies should be in my portfolio so that I have less than a 10% chance of losing money?”, it shows you that as you make more investments in a portfolio, you are taking less risk. The average will always be close to 1.9x, even with small portfolios, but the variability (or risk) will shrink as the portfolio size grows. This is just the law of large numbers at work. But the law of large numbers does not work with fat-tailed distributions.
Fat-tailed Distributions
All of these Monte Carlo simulations have one crucial problem: the full range of possible outcomes is much, much larger than the sample the simulation draws from. This is obvious in the Basic Model, where there are only four possible outcomes^{2}. It’s less obvious when the set of possible outcomes is much larger.
Many of the articles that recommend a minimum portfolio size for early stage investors use data from real investing outcomes. (Here are some posts using data from the Angel Investor Portfolio Project that the Kauffman Center ran: link, link, link, link, link. I believe this one relies on the same data, but I’m not positive.) The AIPP collected actual outcomes on 1,137 angel investments over many years^{3}.
The relative strengths of the AIPP dataset are that there is no bucketing–each possible outcome is an actual outcome–and that it contains a large number of outcomes. Monte Carlo Simulations using the data pick portfolios from these 1,137 outcomes. The implicit argument is that any outcomes not in the AIPP data must not be very different from the ones that are, because the sample is so large, and that any real outliers are so unlikely that their expected value doesn’t change the average outcome much at all.
But is this true?
Take the AIPP, with its average outcome of 2.43x (after the data is cleaned). Then add in a few outcomes that weren’t in the data: a 10,000x (an estimate of Google’s seed multiple), a 5,000x (an estimate of Uber’s), and a 3,000x (an estimate of WhatsApp’s). If you add these, the average of the distribution increases from 2.43x to more than 18x. This is hardly fair, because I’m cherry-picking, but you see the point: the extremely unlikely tail of the venture capital outcome distribution contributes a meaningful amount to the mean outcome^{4}. This is the “fat tail” or “black swan” outcome Taleb describes in his popular book. With some assumptions, we can quantify it.
Assumption 1: The outcomes of venture capital investments are power-law distributed. Here we’ll talk about outcomes as multiples of amount invested (ie. 1x, 2x, etc.) I talk about power-law distributions (PLDs) in venture in a previous post, which is probably worth reading. A PLD is of the form \(p(x) = Cx^{-\alpha}\), where the normalization constant \(C = (\alpha – 1)x_0^{\alpha-1}\), with \(x_0\) the minimum value x can take.
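As a check, the normalization constant follows from requiring the density to integrate to one (for \(\alpha > 1\)):

\[\int_{x_0}^{\infty} C x^{-\alpha}\,dx = C\,\frac{x_0^{1-\alpha}}{\alpha-1} = 1 \quad\Longrightarrow\quad C = (\alpha-1)\,x_0^{\alpha-1}\]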
Assumption 2: The actual distribution of outcomes has an alpha of something just below 2. We will use \(\alpha = 1.98\). There’s evidence for this in the previous post I just mentioned.
Assumption 3: This is the tricky one.
We need to adjust the vanilla PLD to account for the 1/3=0x, 1/3=1x piece. A PLD can’t ever have a value for zero (it would be infinite, and the area of a distribution has to sum to one) so there has to be a dynamic that distorts the underlying PLD at the low end. This may be due to (a) the cost of unwinding a failing company (if a company is worth less than, say, 50% of what was put in, the cost of unwinding it means no money at all is returned); and (b) the structure of venture contracts (preferred stock, convertible loans) that return value to the investors first (if a company is worth 80% of the money put in, the venture investors are made whole first, so they probably get 1x.)
One way to adjust the PLD would be to set \(x_0\) such that 2/3 of the weight of the distribution falls below 1 and then transform the first third of the weight to zero and the second third to one, leaving the rest as is. But since a PLD is self-similar, we will do something simpler:
\[p(x) = \begin{cases}\frac{1}{3}, & x = 0\\ \frac{1}{3}, & x = 1\\ \frac{1}{3}(\alpha-1)x^{-\alpha}, & x > 1\end{cases}\]In this case, \(x_0\) is 1, but \(\alpha\) stays the same.
Those are the assumptions, and the question is: given a finite set of possible outcomes, how well do they represent the actual distribution?
Any set of possible outcomes misses out on all the possible outcomes larger than the largest in the set, so we can reframe the question as: how much of the mean of the distribution comes from the tail, the part of the distribution further along the x-axis than the largest value in our sample? How fat is the tail? More concretely, if we are using the AIPP data, where the best outcome was 1333x, how much of the mean comes from outcomes larger than that?
To figure this out, we will segment the probability distribution into two pieces: the base and the tail, and note that the mean of the distribution is the mean of the base plus the mean of the tail [EDIT: I should have said the contribution of the base to the mean plus the contribution of the tail to the mean equals the mean]. This sounds funny, so an example.
You throw a die. You have a 1/6 chance of throwing any number 1 through 6. The mean outcome is (1+2+3+4+5+6)/6=3.5. Now define throwing 1 or 2 as the base and throwing 3 through 6 as the tail. The mean of the base is (1+2)/6=0.5 and the mean of the tail is (3+4+5+6)/6=3.0. The sum of these is the mean of the distribution. It’s the same for a PLD, but because a PLD is continuous, not discrete, we’re going to have to use calculus.
The mean of a probability distribution is
\[<x> =\displaystyle\int xp(x)\,dx\]so the mean of a PLD is
\[<x> =\displaystyle\int_{x_0}^\infty xCx^{-\alpha}\,dx =C\int_{x_0}^\infty x^{-\alpha+1}\,dx = \left[C\frac{x^{2-\alpha}}{2-\alpha}\right]_{x_0}^\infty\]For our distribution \(x_0 = 1\), so \(C = \alpha-1\):
\[<x> =\left[(\alpha-1)\frac{x^{2-\alpha}}{2-\alpha}\right]_1^\infty = 0\, – \frac{\alpha-1}{2-\alpha} = \frac{\alpha-1}{\alpha-2}\]Note that this mean makes no sense when \(\alpha<2\). Going back to the indefinite integral you can see that the mean of a PLD with \(\alpha<2\) is infinite.
To compare the mean of the base and the mean of the tail of a PLD, we’ll pick an arbitrary division between the two and call it b.
\[<x> = \displaystyle\int_1^\infty xCx^{-\alpha}\,dx = \int_1^bxCx^{-\alpha}\,dx + \int_b^\infty xCx^{-\alpha}\,dx =<x_{base}> + <x_{tail}>\]The first term is the mean of the PLD up to b, the base, and the second term is the mean of the tail.
\[<x> =\left[\frac{\alpha-1}{2-\alpha}x^{2-\alpha}\right]_1^b+\left[\frac{\alpha-1}{2-\alpha}x^{2-\alpha}\right]_b^\infty = \frac{\alpha-1}{\alpha-2}\left(1-b^{2-\alpha}\right)+\frac{\alpha-1}{\alpha-2}b^{2-\alpha}\]So
\[<x_{base}> = \frac{\alpha-1}{\alpha-2}\left(1-b^{2-\alpha}\right)\]and
\[<x_{tail}> =\frac{\alpha-1}{\alpha-2}b^{2-\alpha}\]Using this, if we know the mean of the base we can figure out how much bigger the mean of the entire distribution is: \(\frac{<x>}{<x_{base}>}\).
From above, \(<x>= \frac{\alpha-1}{\alpha-2}\) so
\[\frac{<x>}{<x_{base}>} = \frac{\frac{\alpha-1}{\alpha-2}}{\frac{\alpha-1}{\alpha-2}\left(1-b^{2-\alpha}\right)} = \frac{1}{1-b^{-(\alpha-2)}}\]Note that if alpha is greater than 2 then \(b^{-(\alpha-2)}\) is less than one and so this fraction is greater than one: the mean of the distribution is greater than just the part contributed by the base. We knew that. But as alpha gets close to 2, \(b^{-(\alpha-2)}\) gets close to 1 and the fraction gets very large. As alpha approaches 2, more and more of the distribution’s mean is contributed by the tail until at 2 or lower, all of it is.
Let’s use the AIPP data as an example. The base of the distribution is all the outcomes in the data set, while the tail will be all the outcomes larger than the largest outcome in the dataset. How well does the mean of the AIPP outcomes reflect the mean of the distribution?
The dividing line between base and tail, b, is the largest outcome in the dataset: 1,333 times the initial investment. The mean of the AIPP dataset is 2.43x. This is the mean of the whole distribution including 0s and 1s, call it \(<x^\prime_{base}>\). To get from this to the mean of just the power law part, \(<x_{base}>\), back out the other parts:
\[<x^\prime_{base}>=\frac{0}{3}+\frac{1}{3}+\frac{<x_{base}>}{3}\]so
\[<x_{base}>=3<x^\prime_{base}>-1=3(2.43)-1=6.29\]and the mean of just the power law piece of the entire distribution is
\[<x> = \frac{<x_{base}>}{1-b^{-(\alpha-2)}} = \frac{6.29}{1-1333^{-(\alpha-2)}}\]and the mean of the entire implied underlying distribution, including 0s, 1s, and the power law tail not in the outcome data, is \(<x^\prime>=\frac{<x>}{3}+\frac{1}{3}=\frac{2.1}{1-1333^{-(\alpha-2)}}+\frac{1}{3}\).
Here is a chart of the results for alphas between 2 and 2.25.
You can see that as alpha gets large, \(\frac{<x>}{<x_{base}>}\) gets closer to one and the mean of the total distribution approaches the mean of the dataset. Since a large alpha means a short tail, this makes sense: if the tail is short, most of the weight of the distribution is in the base. But as alpha approaches two, the mean of the underlying distribution gets larger and larger: more and more of the mean is in the tail. At two, it becomes infinite.
When the alpha of the distribution is 2.1, the mean of the actual underlying distribution is about 1.8 times the mean of the sample (the sample is the AIPP data). When alpha is 2.05, the actual mean is three times the sample mean. When alpha is 2.01, the actual mean is more than 30x. At alphas close to 2, the sample data misrepresents the underlying distribution by a huge amount. If alphas are less than 2.1, any Monte Carlo Simulation using the AIPP data does not represent the actual underlying distribution in any meaningful way. You can’t use the Monte Carlo Method to simulate a fat-tailed distribution this way; it won’t be accurate.
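These numbers can be checked directly from the formulas above (a sketch; 2.43x is the AIPP sample mean, 6.29 the implied mean of its power-law piece, and 1333 the largest outcome, b):

```python
def implied_mean(alpha, b=1333.0, base_mean=6.29):
    """Mean of the full implied distribution: 1/3 zeros, 1/3 ones, and the
    power-law piece scaled up by the tail factor 1 / (1 - b**(2 - alpha))."""
    power_law_mean = (base_mean / 3.0) / (1.0 - b ** (2.0 - alpha))
    return power_law_mean + 1.0 / 3.0

SAMPLE_MEAN = 2.43
for alpha in (2.25, 2.1, 2.05, 2.01):
    m = implied_mean(alpha)
    print("alpha=%.2f  implied mean=%6.2fx  vs sample: %5.2fx" % (alpha, m, m / SAMPLE_MEAN))
```

For alpha=2.1 this prints an implied mean of about 4.4x, roughly 1.8 times the 2.43x sample mean.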
One final kicker. It seems like the alpha of venture capital outcomes is less than 2, in the 1.9-2.0 range. If so, the ratio of the mean of the distribution to the part contributed by the base is infinite. In the venture world, the average of any set of sample outcomes, no matter how large, does not represent the mean of the total distribution. Using a sample, even a really big sample, even a sample that has the return of every venture investment in history, to create a Monte Carlo Simulation is the wrong way to analyze venture outcomes.
You can’t simulate venture portfolios using a sample dataset, you have to use the underlying distribution from which the data are presumably drawn, a power law distribution with a fat tail, an \(\alpha\) just less than 2.
Using Power Law Distributions to Simulate Venture Portfolios
A PLD with a fat tail like this can be hard to work with. The mean is infinite, so some of the analyses we would like to do just result in infinity, which is hard to reason about. After all, the next company you invest in is not going to be worth infinity. The variance is also infinite, so Monte Carlo Simulations are unstable, because outliers big enough to skew even a 10,000 iteration run pop up.
The most helpful thing to know would be the average return of a portfolio with a confidence interval. If you have a portfolio size n, then the average return is m +/- 5% with probability 95%, that sort of thing. This is not possible to calculate in venture because the law of large numbers does not hold when the mean of the distribution is not finite.
Since the mean of the distribution is infinite, the average mean across all possible portfolios of any given size is also infinite. Of course, any actual sample from the distribution will consist of finite numbers with a finite average. But will increasing your portfolio by one more company increase or decrease the average of the entire portfolio?
Assuming the existing average is greater than one, then most of the time the next pick will decrease the average (because two thirds of the distribution is one or less so at least two thirds of the distribution has a lower value than your current average.) But in the less likely case that the new entry into your portfolio is larger than the current average, it will be on average extremely large. So while each new company in the portfolio will most likely lower your average return, when it doesn’t it increases the average by a lot. This isn’t two steps forward, one step back, it’s one step back, one step back, one step back, ten steps forward.
If this is so, your portfolio should be as large as you can make it (so long as you can keep your alpha below 2).
So if we can’t calculate that, what can we calculate? The most helpful analysis I can run is the probability of a portfolio of a given size exceeding a benchmark return, one times the money invested, two times the money invested, three times the money invested, etc.
If the portfolio has one company, this is easy: what is the probability of a return greater than or equal to 1x? Given our probability distribution (1/3 0x, 1/3 1x, 1/3 greater than 1x) we know the answer is 2/3.
If the portfolio is size two, the answer can still be calculated using probability. Each of 0x, 1x, and the power law distribution that is greater than 1x is equally likely. So there are nine equally likely possible outcomes:
1. (0,0) < 1
2. (0,1) < 1
3. (1,0) < 1
4. (1,1) = 1
5. (0,>1) = ?
6. (>1,0) = ?
7. (1,>1) > 1
8. (>1,1) > 1
9. (>1,>1) > 1
1, 2, and 3 are less than 1x. 4, 7, 8, and 9 are greater than or equal to 1x. 5 and 6 could be either: if the >1 part is greater than or equal to 2, then the average will be greater than or equal to 1. The >1 part is greater than 2 with probability
\[P(x>2)=C\int_2^{\infty}x^{-\alpha}\,dx=2^{-\alpha+1}\]
If \(\alpha=1.98\) then P(x>2)=50.7%. So a two company portfolio equals or exceeds 1x when it is (1,1), (1,>1), (>1,1), or (>1,>1), and 50.7% of the time when it is (0,>1) or (>1,0). This is 4/9 + (2/9)*50.7% = 55.7% of the time. This reasoning could theoretically be continued for any size portfolio, but it would quickly become super-complicated, so we’re going to use a Monte Carlo Simulation. Unlike the AIPP Monte Carlos, this Monte Carlo will not pick from an existing set of data; it will pick from the distribution.
I charted the percentage of portfolios exceeding 1x, 2x, 3x, and 4x by size of portfolio, with alpha=1.98 (The code is below.) To exceed one times your investment (that is, make money) 90% of the time, you need a portfolio of at least 34 companies. To exceed 2x 50% of the time, you need a portfolio of at least 85 companies.
Here is the code.
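A sketch of that simulation (the function names are mine): each outcome is drawn from the mixture distribution, a third zeros, a third ones, and a third from the power law with \(\alpha=1.98\):

```python
import numpy as np

ALPHA = 1.98

def sample_outcomes(shape, alpha=ALPHA):
    """Outcome multiples: 1/3 are 0x, 1/3 are 1x, and 1/3 are drawn from
    the power law p(x) = (alpha - 1) * x**(-alpha) on x >= 1."""
    bucket = np.random.randint(0, 3, shape)
    # numpy's pareto() is the Lomax distribution; adding 1 gives a classic
    # Pareto with x_min = 1 and exponent alpha when the shape is alpha - 1.
    tail = 1.0 + np.random.pareto(alpha - 1.0, shape)
    return np.where(bucket == 0, 0.0, np.where(bucket == 1, 1.0, tail))

def pct_exceeding(n, benchmark, iterations=10000):
    """Percent of size-n portfolios whose average multiple is >= benchmark."""
    avgs = sample_outcomes((iterations, n)).mean(axis=1)
    return 100.0 * np.mean(avgs >= benchmark)

if __name__ == "__main__":
    for benchmark in (1, 2, 3, 4):
        row = ["%dx:" % benchmark]
        for n in (1, 5, 10, 20, 50, 100):
            row.append("n=%d %.1f%%" % (n, pct_exceeding(n, benchmark)))
        print("  ".join(row))
```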
That’s looking at how many companies you need to reach some probability of exceeding some metric. You can also assume a fixed portfolio size and calculate the probability that it exceeds each benchmark. In the chart, if you have a portfolio of 40 companies, you will meet or exceed 1x 92% of the time, 2x 44% of the time, 3x 24% of the time, and 4x 16% of the time. An easier way to look at this is to make a table of outcomes, and I’ve done that, below.
Here’s the code to generate the percent above the first 15 benchmarks (>=1x, >=2x, …>=15x) for a given portfolio size. It runs somewhat faster than the code above, although I’m sure it could be optimized further.
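A sketch of that version (again with names of my own choosing): it is faster because one shared batch of simulated portfolios is reused for all fifteen benchmarks:

```python
import numpy as np

def benchmark_row(n, iterations=10000, alpha=1.98, benchmarks=range(1, 16)):
    """Percent of size-n portfolios meeting or exceeding each benchmark,
    computed from a single shared batch of simulated portfolios."""
    bucket = np.random.randint(0, 3, (iterations, n))
    tail = 1.0 + np.random.pareto(alpha - 1.0, (iterations, n))  # Pareto, x_min = 1
    avgs = np.where(bucket == 0, 0.0, np.where(bucket == 1, 1.0, tail)).mean(axis=1)
    return [100.0 * np.mean(avgs >= b) for b in benchmarks]

if __name__ == "__main__":
    for size in (1, 10, 100):
        print(size, ["%.1f" % p for p in benchmark_row(size)])
```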
And here is the table.
% of portfolios that equal or exceed multiple

Port. Size | 1x | 2x | 3x | 4x | 5x | 6x | 7x | 8x | 9x | 10x | 11x | 12x | 13x | 14x | 15x
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 33.2 | 16.6 | 11.2 | 8.7 | 6.9 | 5.8 | 4.9 | 4.2 | 3.8 | 3.4 | 3.1 | 2.9 | 2.7 | 2.5 | 2.3 |
2 | 55.5 | 20.7 | 13.8 | 10.4 | 8.0 | 6.5 | 5.3 | 4.8 | 4.1 | 3.8 | 3.3 | 3.0 | 2.7 | 2.6 | 2.3 |
5 | 56.8 | 24.4 | 15.2 | 10.8 | 8.5 | 6.8 | 5.8 | 4.8 | 4.3 | 3.9 | 3.5 | 3.3 | 3.0 | 2.8 | 2.6 |
10 | 67.8 | 29.3 | 17.2 | 11.7 | 9.0 | 7.3 | 6.0 | 5.1 | 4.6 | 4.1 | 3.7 | 3.4 | 3.0 | 2.8 | 2.6 |
20 | 80.0 | 34.3 | 19.1 | 12.6 | 9.3 | 7.4 | 5.9 | 5.0 | 4.3 | 3.8 | 3.4 | 3.0 | 2.7 | 2.5 | 2.3 |
30 | 87.8 | 41.2 | 23.0 | 15.0 | 10.8 | 8.5 | 7.2 | 6.1 | 5.3 | 4.6 | 4.2 | 3.7 | 3.5 | 3.1 | 3.0 |
40 | 92.4 | 43.9 | 23.6 | 15.5 | 11.4 | 8.9 | 7.2 | 6.1 | 5.3 | 4.6 | 3.6 | 3.2 | 3.0 | 2.7 | 2.6 |
50 | 94.9 | 46.9 | 24.8 | 15.9 | 11.4 | 9.0 | 7.1 | 5.9 | 5.3 | 4.6 | 3.9 | 3.6 | 3.2 | 2.9 | 2.6 |
60 | 96.7 | 48.9 | 25.3 | 15.8 | 11.7 | 9.2 | 7.4 | 6.3 | 5.4 | 4.8 | 3.7 | 3.4 | 3.0 | 2.9 | 2.6 |
70 | 97.8 | 50.6 | 26.4 | 17.0 | 12.5 | 9.4 | 7.8 | 6.5 | 5.5 | 4.9 | 3.8 | 3.4 | 3.1 | 2.9 | 2.6 |
80 | 98.6 | 53.6 | 27.7 | 17.4 | 12.5 | 9.8 | 7.9 | 6.6 | 5.8 | 5.1 | 3.8 | 3.4 | 3.1 | 2.8 | 2.5 |
90 | 99.0 | 54.8 | 28.7 | 18.2 | 12.9 | 10.0 | 8.2 | 7.0 | 6.0 | 5.2 | 4.0 | 3.6 | 3.3 | 3.0 | 2.7 |
100 | 99.3 | 56.8 | 29.7 | 18.6 | 13.2 | 10.2 | 8.2 | 6.9 | 5.9 | 5.2 | 3.9 | 3.5 | 3.0 | 2.8 | 2.6 |
200 | 99.9 | 69.0 | 34.8 | 20.9 | 14.0 | 10.8 | 8.9 | 7.4 | 6.1 | 5.3 | 4.7 | 4.1 | 3.6 | 3.4 | 3.1 |
300 | 99.9 | 76.3 | 38.3 | 22.5 | 15.1 | 11.3 | 9.1 | 7.4 | 6.3 | 5.5 | 5.3 | 4.9 | 4.3 | 4.0 | 3.6 |
400 | 99.9 | 81.4 | 41.7 | 24.0 | 16.0 | 11.9 | 9.5 | 7.7 | 6.5 | 5.8 | 5.1 | 4.6 | 4.0 | 3.7 | 3.4 |
500 | 99.9 | 85.5 | 43.1 | 24.4 | 16.6 | 12.1 | 9.5 | 7.7 | 6.3 | 5.3 | 5.1 | 4.7 | 4.2 | 3.8 | 3.5 |
600 | 99.9 | 88.6 | 45.6 | 25.4 | 16.8 | 12.2 | 9.4 | 7.8 | 6.6 | 5.6 | 5.4 | 4.9 | 4.5 | 4.0 | 3.6 |
700 | 99.9 | 90.7 | 47.4 | 26.5 | 17.3 | 12.5 | 10.1 | 8.2 | 6.9 | 5.8 | 5.1 | 4.6 | 4.1 | 3.7 | 3.5 |
800 | 99.9 | 92.7 | 49.2 | 26.9 | 17.6 | 12.8 | 10.1 | 8.0 | 6.7 | 5.7 | 5.4 | 4.8 | 4.4 | 4.0 | 3.8 |
900 | 99.9 | 93.8 | 49.9 | 27.5 | 17.6 | 12.7 | 9.7 | 7.9 | 6.7 | 5.7 | 5.3 | 4.6 | 4.1 | 3.7 | 3.4 |
1000 | 99.9 | 94.9 | 52.1 | 28.1 | 18.1 | 13.1 | 10.3 | 8.3 | 6.9 | 5.8 | 5.5 | 5.0 | 4.5 | 4.1 | 3.6 |
10,000 | 100.0 | 100.0 | 91.8 | 48.8 | 28.8 | 18.0 | 13.2 | 10.4 | 8.7 | 7.5 | 6.7 | 5.9 | 5.5 | 4.8 | 4.4 |
100,000 | 100.0 | 100.0 | 100.0 | 87.8 | 48.2 | 30.0 | 20.2 | 14.0 | 11.2 | 9.6 | 7.8 | 6.8 | 6.0 | 6.0 | 5.2 |
1,000,000 | 100.0 | 100.0 | 100.0 | 100.0 | 85.3 | 46.0 | 27.2 | 19.9 | 14.2 | 9.9 | 8.0 | 6.6 | 5.8 | 5.5 | 5.2 |
There are some wonky numbers because even at the large number of iterations I used to generate these, the numbers are still susceptible to outliers.
Some observations:
What to make of all this? A few things come to mind immediately.
Certainty is hard
A 20 company portfolio gives you a 1 in 5 chance of a 3x return. To get to a 95% chance, you need 15,000 companies. Alternatively, going from a 25% chance of a 3x to a 50% chance requires 20 times as many companies. A 3x portfolio is not that unlikely, even with only 20 companies in it. But making it more likely than not that you get a 3x (ie. a greater than 50% probability) is impossible for almost any investor.
Optimal portfolio size may be smaller than you think
The probability of exceeding any given benchmark grows as you increase your portfolio size, but it grows very slowly. If you are building an early-stage venture portfolio, the gold standard is a 5x or more return^{5}. This kind of return is unlikely, as the table shows. With only 10 companies the probability is 9%, and it rises extremely slowly as the portfolio size increases. You would have to invest in 100 times as many companies to double the probability of equalling or exceeding 5x.
If you believe, as I do, that actively helping the companies you invest in increases their chances of success, then you have to balance the decreasing rate of growth in probability of reaching the benchmark against the number of companies you can actually help at any given time. While the table shows that going from a 20 company portfolio to a 100 company portfolio increases the probability of 5x from about 10% to about 13%, you can’t possibly help 100 companies as well as you can help 20. Remember, these are not balls being pulled from an urn; the distribution of outcomes arises from the system that exists, one that presumably includes the help of the financier: reducing that help may cause the system to perform worse.
I believe that helping increases the average outcome. Perhaps you question it. But you don’t have to believe very strongly that investing in too many companies decreases your outcomes to see that there is a point where the decline in outcomes offsets the very slow growth in the probability of a 5x.
Where are all the 14x funds?
Is there really a 1 in 33 chance of being a 14x fund? Why aren’t there more of them? (Maybe there are and I just don’t have the right data.) Or is there a tail-off in returns at successful funds, possibly due to follow-on behavior? (Later-stage investments have a higher alpha, and the more successful its portfolio, the more money a firm that follows on has invested at later stages, on average.)
Conclusion
This post asserts a numeric accuracy it doesn’t really have. The 1/3, 1/3, 1/3 is a rule of thumb, not hard data, and whether the alpha is 1.98 or something else requires more research. Regardless, if you believe venture returns follow a power law, then you can’t use a set of venture outcomes as the basis for simulating venture portfolios. You have to use that set of outcomes to divine the underlying probability distribution that generated them, and then use that probability distribution to simulate the portfolios. I think this approach could yield some useful insights. I hope the math and code help you look for them.
In fact, in this particular set of wagers, none lost money. Analytically, the chance of losing money with 200 rolls has to be less than 5%, from Chebyshev’s Inequality; using the assumption that this binomial distribution is very close to a normal distribution, we can tighten that to be about 0.15%. ↩
Fred is simplifying the model, probably to make his blog post readable, a constraint I obviously do not observe. ↩
The dataset seems to have gone missing from the internet some time in the past couple of years. I summarized the data here. ↩
I should point out first that I am not the first person to notice this problem with the AIPP Monte Carlo simulations: Kevin Dick wrote about it years ago. ↩
Before any carry and fees. This raises a question: is the 1.98 alpha calculated from various venture fund returns pre or post-fees? If it is post, then the alpha may be slightly different. I don’t have the data. ↩
When results come in I re-examine my decision processes. What was it about this company that allowed me to get a good return on my investment? What should I have seen to prevent me from investing in this other company where I lost money? Actually doing this, versus sort-of doing it, is something I have to be conscious about; the alternative is letting my brain do the work behind the scenes, and human brains are so notoriously bad at this they just gave a Nobel Prize to one of the people who pointed it out.
I don’t believe in gut-level decisions. Having a bad feeling about something might reflect some sort of internalized rules, but there’s no real advantage to keeping them internalized, laziness aside. Getting them out in the open allows you to reason about them. This is especially important in venture where long cycle-times, high dimensionality, and sparse data inevitably lead to spurious correlation. You can’t build a venture investing process on data alone, you need to have a theory. The theory has to fit the data, of course, but it can’t just be inductive, there has to be a deductive element as well.
But even a theory that fits the data only gets you so far. In a previous post I said “There are two kinds of pitches. Those that are clearly bad ideas, and those where it’s not clear at all if it’s a good idea or a bad idea.” The reason this is true is that some ideas simply will not work and can be ruled out immediately: perpetual motion machines, selling dollar bills for ninety cents, etc. Ideas that might work, on the other hand, often have some intrinsic element of the unknowable about them. This argues for setting constraints to bound the space of venture-investable companies but also that the constraints can’t be so tight they’re a process for picking winners. Creating a process that picks winners is the same thing as creating a machine that predicts the future. I don’t believe that’s possible.
The beauty of constraints is that they rule things out, they don’t rule things in. They create a murky but bounded space of maybe that allows for ideas no process I know of, other than human creativity, could come up with.
A couple of months ago I wrote down my current understanding of my investing constraints in an effort to consciously improve my process. After writing them down I organized them into a handy-dandy table. It is below.
&nbsp; | Market | Product | Team | Deal
---|---|---|---|---
Feasibility | Market size will be > $1 billion within 5 years. | The product can be built and launched in stages. | Team has experience building this type of product or company. | Can afford to own ~1% after the Series B. |
Desirability | There are customers who would pay now. | The business model can generate a large LTV. | Team is ethical but wants to win. | Possible to make 50x on Seed, 10x total. |
Scalability | Competitive intensity is and will be low. | The product could be scaled to >$100 million in revenue. | Founders can hire, raise, and sell. | Know what will be needed to raise the next round. |
Knowledge | Have spoken to customers. | Know landscape of possible competitors, substitutes. | Know the founders well or know people who do. | Know what similar initiatives are being funded. |
My Role | Can intro to partners, customers, thought leaders. | Know potential customers who can alpha/beta-test. | Team is coachable. | Have appropriate protections. |
Repeated Game | Need to be in the market to understand it; market formation in early stages. | Product draws on or feeds into related new markets. | Can build a relationship with the founders. | Investment and follow-ons big enough to make a difference, small enough not to bankrupt me. |
This list has obvious flaws. For me the main one is that I can imagine scenarios where I’d bend some of these constraints. It’s a work in progress. It also reflects a very specific type of investing, one where founders have to spend years trying to build a market for their company to be viable. Given this assumption most of the other constraints seem to follow naturally. Changing this assumption–after all, most of the startups in the world enter pre-existing markets–changes the constraints entirely. You have to build your own.
I was an entrepreneur once, I have a different job now. But the implication that I was like them, that I understood them, that I would have empathy for what they are trying to do, was comforting. I’ve been in their position.
My friend Steve Schlafman put up a post yesterday, “Venture Investments are Not Bets.” In it he says
Countless times over the last decade, I’ve heard other venture investors refer to their investments as “bets.” Earlier in my venture career, I was absolutely guilty of this because I wasn’t being mindful and probably lacked some necessary empathy. At the time, I don’t think I truly appreciated what it meant and took to be an active investor and support a company over the long haul…We, myself included, should never view our work as gambling given how much time, effort and energy goes into building companies and long-lasting relationships with founders.
If you read my blog, you know I talk about betting a lot. I’ve likened angel investing to betting on horse racing, I’ve tried to figure out what lessons about portfolio management I can learn from assuming outcomes are randomly distributed, I’ve tried to get a better handle on follow-on investing from the Kelly Criterion–best known as the formula Edward Thorp used to Beat the Dealer in blackjack. Managing a venture portfolio has much to learn from managing a stack of chips at a gaming table. And, definitionally, the unit of gambling is the bet.
I understand Steve to have two points: saying that your investments are “bets” is an objectification of the team that is building the company, a way of distancing yourself from them as people; and that gambling is a passive endeavor.
The latter point I simply dispute. Gambling has two parts: working to maximize your edge, and luck. My interest in gambling is in the ways people have figured out to maximize their edge over the course of human history; this knowledge is more like folklore than science and so seems like a fruitful vein to mine. It is portfolio-level thinking.
But the former worry is real. I saw it when I raised money for the company I helped start and I see it in talking to investors now. Thinking about founders as pawns on a board rather than people working their hardest to make their visions real is insulting and wrong. But I don’t necessarily think this is the only way you can think of betting.
When I invest in a company I always wholeheartedly believe it will live up to the vision of the founder. I wouldn’t make the investment otherwise. But I know I am going to be wrong most of the time. I know that from experience, from 20 years of watching other VCs invest, and from reading pretty much everything on the history of venture capital that I have been able to lay my hands on. This is true no matter how much time, effort and energy I put into it. My involvement helps: there are things I can do, introductions I can make, viewpoints I can articulate. I can improve their chances of success, but not to certainty. Anyone who has been around startups knows that hard work alone is no guarantee of success; there is always luck.
With every company I simultaneously believe this company is going to succeed and know this company will unfortunately most likely fail. Sometimes when I’m trying to make a decision these two different points of view lead to two different results. The only way to address this is through abstraction, by thinking about portfolio mechanics and probabilities of success across many investments, not the probability that any single one of the founders I back will succeed.
When I think about gambling, about making bets, it is at this level of abstraction. The portfolio-level thinking is a necessary distancing, allowing me to both believe 100% in a founder and not get wiped out when I’m wrong.
But I don’t distance myself from the entrepreneur as a person. Every startup founder knows their idea is thin at the very beginning: no data, no resources, no proof. There is no rational basis for any investor to make an investment in any given startup. Most startups fail and most investors won’t invest. When an investor invests in a startup it’s because they believe something that the other investors don’t believe. And in the absence of any business proof, what they believe in is the founder.
I invest because I believe in the founder, despite the lack of evidence, despite the objective chances of their success. Believing in a founder is the opposite of distancing myself from them or putting an abstraction layer between us, it is embracing who they are and what they’re trying to do.
But I’m still taking a chance, making a bet. There is that element of luck, always. It’s just that I have decided, generally against all rationality, that the founder has managed to shift the odds in their favor. When I believe this it is because they have convinced me of it. And I think this is the greatest possible expression of empathy: choosing to believe what they believe, when all my analysis and everyone I respect insists that there is simply no reason to believe it. I put myself in their head and see what they see. When I bet, I bet on founders.
Here is the best thing any entrepreneur has ever said of me. Over coffee, a propos of nothing.
“Thank you for believing in me. For taking a chance on me.”
I took a chance, I made a bet that she could change the world, I believed in her. I won’t apologize for that.
EDIT, 6/20/17:
Kelly justified betting a larger amount of your bankroll when you have a larger informational advantage. But the Kelly Criterion itself imagines a specific scenario: sequential parlay betting when you have an edge. Since venture investing is never really sequential (the ‘bet’ is not immediately resolved) and only angel investors really parlay, it can’t be literally applied to determine exact portfolio allocation. This post was not meant to suggest that it should be. This post makes a single claim: the Kelly Criterion suggests that because in venture capital edge tends to increase faster than odds as more information becomes available, it usually makes sense to increase the amount invested in a portfolio company–to follow on.
If you try to allocate via the Kelly Criterion alone, you’ll quickly find that you run out of money to invest due to unresolved bets.
I’ve made a couple of edits below to clarify this.
Should you follow on in later rounds?
The pros:
The cons:
These are good arguments, but they’re secondary considerations to the underlying question. Before all of the strategic pros and cons, how should your portfolio be allocated to maximize your expected return?
There is a way to compute the optimal investment size in a series of investments: it’s called the Kelly Criterion. John Kelly, Jr. was a researcher at Bell Labs in the 1950s and a devotee of fellow Bell Labs researcher Claude Shannon’s information theory. He solved the problem in his 1956 paper A New Interpretation of Information Rate^{1}.
Kelly asked: if you have a private stream of information related to a bet, not known to others and so not reflected in the odds, you have an edge; if you have an edge, how much should you wager? If you only have one shot at a positive expectation bet, you wager everything you can afford. But what if you have a stream of these bets and can wager previous winnings^{2}? If you wager everything, you’ll eventually lose it all and walk away with nothing. But what amount between nothing and all?
Kelly determined that the fraction of your bankroll to wager is equal to \[\frac{edge}{odds}\].
The odds are the multiple of your wager by which your bankroll increases if you win. When you roll a die, fair odds are 5 to 1: if you roll your point you win $5 (and keep the dollar you’ve bet); if you don’t, you lose $1. The odds here are 5 to 1, or just 5.
If it’s a fair die, the probability of rolling your point is one-sixth, and you have no edge. Your expected value is getting your wager back. But imagine it’s a loaded die and it rolls 2 one-fifth of the time and something else four-fifths of the time. No one else knows this, so the odds remain the same. You would bet the 2, of course, because you have an edge.
For every dollar you bet on the 2, you can expect to win $5 one-fifth of the time and lose $1 four-fifths of the time. Your edge is
\[\frac{\$5}{5} - \frac{\$1 \times 4}{5} = \$0.20\].
Your edge per dollar is 0.2. According to Kelly you should bet 0.2/5 of your bankroll, 4%, on each roll of the die.
\[f^{*}\], the fraction of your bankroll to bet, is
\[f^{*} = \frac{bp - q}{b}\],
where \[p\] is the probability of winning, \[q\] is the probability of losing, and \[b\] is the odds.
Since \[q=1-p\], we only need two numbers to figure out \[f^{*}\]: the probability of winning, \[p\], and the odds, \[b\]. Note that the Kelly Criterion assumes a binary outcome: you either win or lose.
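The formula is small enough to sanity-check in code. Here is a minimal sketch, using the loaded-die numbers from the example above:

```python
def kelly_fraction(p, b):
    """Kelly fraction f* = (b*p - q) / b, with q = 1 - p.

    p: probability of winning
    b: odds, i.e. the net multiple of your wager paid out on a win
    """
    q = 1 - p
    return (b * p - q) / b

# The loaded die: wins one-fifth of the time at 5-to-1 odds.
print(kelly_fraction(p=0.2, b=5))  # ~0.04, i.e. bet 4% of your bankroll
```

With no edge the formula correctly says to bet nothing: a fair coin at even odds gives `kelly_fraction(0.5, 1) == 0`.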
How does Kelly fare? Below is a chart of a simulation of betting on the crooked die. The blue line is your bankroll using Kelly and the green line is your bankroll betting a fixed amount equal to 10% of your initial bankroll on each roll. The bankroll starts at $10. Kelly obviously grows much more quickly. (The bankroll is on a log scale; the edge here is huge, by the way, which is why the bankroll grows so fast.)
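A chart like that can be reproduced with a few lines of simulation. This is a sketch under the same assumptions (loaded die, $10 starting bankroll, fixed bettor wagering $1 per roll); the exact curves depend on the random seed:

```python
import random

def simulate(rolls=10_000, seed=1):
    p, b = 0.2, 5                      # loaded die: win 1/5 of the time at 5-to-1 odds
    f = (b * p - (1 - p)) / b          # Kelly fraction = 4%
    kelly = fixed = 10.0               # starting bankrolls
    rng = random.Random(seed)
    for _ in range(rolls):
        win = rng.random() < p
        kelly += kelly * f * (b if win else -1)
        if fixed > 0:                  # fixed bettor wagers $1 until broke
            fixed += min(1.0, fixed) * (b if win else -1)
    return kelly, fixed

kelly, fixed = simulate()
```

Note the structural difference: the Kelly bettor can never be ruined, because every wager is a fraction of what remains, while the fixed bettor can go broke on a bad run despite the positive edge.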
Before we apply this to venture, two simplifying assumptions. First, each investment is independent of the others. Second, you are aiming for a certain fund IRR. If you are aiming for 20% IRR per year, and you think you will hold each investment on average five years, then your fund should return 1.2^{5}=2.5x. Valuing each investment to achieve this overall return creates your edge (and accounts for the time value of money.)
Here’s an example. You’re running a $50 million seed fund that you expect to have a 20% IRR and hold investments on average five years, a 2.5x return overall. You invest in a company you think has a 1 in 50 (2%) shot at a billion dollar exit. The value of the company now is $1 billion x 2% / 2.5 = $8 million.
If the company is successful your investment will return $1 billion/$8 million = 125x. This is the multiple, \[m\]. The odds are one less than the multiple because the multiple includes the original investment, \[b=m-1\], so \[b=124\] and
\[f^{*} = \frac{bp - q}{b}=\frac{124 \times 2\% - 98\%}{124}=\frac{1.5}{124}=1.21\%\].
1.21% of a $50 million fund is $605 thousand. This is what Kelly says you should invest in this company. (Edit: While conceptually true, the practical issue here is: what is the fund size when you make an investment? It has to include the current value of whatever companies you’ve already invested in, although those are hard to determine. Also, they are illiquid: you may find the Kelly Criterion suggesting you up a bet when all of your cash is already in other companies. This is because the Kelly Criterion was built on a scenario where all bets pay off immediately.)
(The $8 million value is not the price you invest at, it doesn’t account for dilution. If you assume that after your round the company will increase its issued shares by 50% because of later rounds and options issuance, you invest at $8 million/1.5 = $5.3 million. You will own $605,000/$5,333,333 = 11.34% of the company. This doesn’t really matter here because your expected multiple remains the post-dilution multiple, I just wanted to point it out. This way of getting to price is called the Venture Capital Method of valuation^{3}.)
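Here is that arithmetic end to end, as a sketch in Python; the numbers and the 50% dilution assumption are the ones from the example:

```python
def vc_method(exit_value, p_success, target_multiple, dilution=0.5):
    """Venture Capital Method: value today = exit value * P(success) / target multiple."""
    value = exit_value * p_success / target_multiple   # post-dilution value: $8M
    price = value / (1 + dilution)                     # price you invest at: ~$5.33M
    return value, price

value, price = vc_method(1_000_000_000, 0.02, 2.5)

b = 1_000_000_000 / value - 1    # odds = multiple less one = 124
f = (b * 0.02 - 0.98) / b        # Kelly fraction, ~1.21%
investment = 50_000_000 * f      # ~$605 thousand
ownership = investment / price   # ~11.34% of the company
```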
Note that the numerator of \[f^{*}\] is your expected fund multiple, \[e\], less one. This is because \[bp-q=(m-1)p-(1-p)=mp-p+p-1=mp-1\] and \[mp = e\]. So,
\[f^{*} = \frac{e-1}{b}\].
The Kelly Criterion is well known to investment managers. But venture capital has a twist most investors don’t have: investments are staged, there are multiple rounds at different prices and risk profiles. VCs have to figure out how their optimal allocation changes as new information comes in.
Imagine another investor decides to invest in this company a year later. The company performed well and mitigated some risk. The ultimate exit value if the company is successful remains the same, but the probability of success has increased. There is now a 10% chance of the company reaching $1 billion. The new fund has a 2.1x goal (because time value of money: it’s a year later and 1.2^{4}=2.1). They value the company at $1 billion x 10% / 2.1 = $48 million (post-dilution but, again, we don’t care about that here.) Their multiple is 21, so the odds are 20. For them
\[f^{*} = \frac{e-1}{b}=\frac{2.1-1}{20}=5.5\%\]
The new investor should invest 5.5% of their fund in the company.
The funny thing is, so should you. The additional information they have, a year later, is information you now also have. Your calculation changes, and is identical to theirs. To remain at the Kelly optimum you should increase the percentage of your fund invested in the company from 1.21% ($605 thousand) to 5.5% ($2.75 million.) You should invest $2.14 million in this round^{4}. (Edit: This assumes your bankroll is still worth $50 million. It may in fact be more–the company in question, at least, is worth more. But other bets in your portfolio may pull that down and it is worth less. If the portfolio has grown in value in that time, then Kelly indicates you should invest at least that amount. Note that this amount is probably much more than your pro-rata anyway: if the company is raising $10 million in the new round, 20%-ish, then your pro-rata is 11.34% of that, or $1.134 million.)
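Putting the two rounds side by side makes the $2.14 million concrete. This sketch uses the \[f^{*}=(e-1)/b\] form and the simplifying assumption, noted above, that the fund is still worth $50 million:

```python
FUND = 50_000_000

def kelly_dollars(exit_value, p, target):
    """Dollars to have at risk: FUND * (e - 1) / b, with e = target multiple."""
    value = exit_value * p / target   # value you'd put on the company today
    b = exit_value / value - 1        # odds = multiple less one
    return FUND * (target - 1) / b

seed_round = kelly_dollars(1_000_000_000, 0.02, 2.5)   # ~$605K at the seed
next_round = kelly_dollars(1_000_000_000, 0.10, 2.1)   # ~$2.75M a year later
follow_on = next_round - seed_round                    # ~$2.14M to invest this round
```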
In most up-rounds, especially from the early stages, Kelly recommends increasing your position. Based on an initial investment made with a success probability of 5% and an eight year time to exit, the table below shows the increase in allocation based on the increase in probability of success and decrease in time to exit. In this case, Kelly suggests you should increase your position unless there is a minimal increase in the probability of success a long time after your investment (the red numbers). While these are technically “up” rounds, that company is going sideways. And, of course, if you think the probability of success is only 10% but exit is imminent it’s probably your thought process you should worry about, not your portfolio allocation.
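The table itself doesn’t survive in this copy, but the grid behind it can be regenerated from the formulas above. Setting \[e\] to the target multiple \[1.2^{t}\] for \[t\] years remaining and substituting \[b=e/p-1\] gives \[f^{*}=p(e-1)/(e-p)\]; this sketch (assuming the 20% IRR target from the earlier example) compares each cell against the initial allocation at 5% probability and 8 years to exit:

```python
def kelly_f(p, years_left, irr=1.2):
    e = irr ** years_left          # target fund multiple over the remaining years
    return p * (e - 1) / (e - p)   # f* = (e-1)/b with b = e/p - 1

f0 = kelly_f(0.05, 8)              # initial allocation, ~3.9% of the fund
for p in (0.10, 0.25, 0.50):
    row = [f"{kelly_f(p, t) / f0:4.1f}x" for t in (2, 4, 6)]
    print(f"p={p:.0%}:", *row)     # ratios below 1.0x are the 'red numbers'
```

A small probability bump late in the game (e.g. 10% with two years left) comes out below 1.0x, matching the point that those “up” rounds don’t merit a larger position.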
You may also have strategic considerations (signaling, what you’ve promised your LPs, etc.) that have to be set against the Kelly optimum. Do the calculations each time and use them as inputs for your thinking. But the math says you should almost always follow on.
Originally titled “Information and Gambling” until AT&T nixed that, wanting to distance themselves from gamblers. An entertaining, if non-mathematical, telling of the story is in Poundstone, W. (2006). Fortune’s Formula. http://amzn.to/2sHEPtu ↩
The Kelly Criterion assumes bettors parlay: winnings are recycled into the bankroll. While some venture funds do limited recycling, and angel investors obviously recycle, it’s a conceptual issue for the non-recycled part of the winnings. The underlying observation that information that increases your edge should increase your bet size remains true though. ↩
See https://ocw.mit.edu/courses/sloan-school-of-management/15-431-entrepreneurial-finance-spring-2011/lecture-notes/MIT15_431S11_lec01.pdf if you want a more in-depth explanation of the Venture Capital Method. ↩
Your pro-rata in this example is almost certainly less than this, but you should take as much as you can get, up to $2.14 million. ↩
I was stumped. I couldn’t think of an answer. I figured I’d come back to the process once I did. That was years ago.
I thought of that conversation recently. A company I invested in went public. I was enjoying my fifteen minutes of credibility, meeting with VCs, people with funds. VCs who had convinced someone they were special.
I meet with VCs all the time. I need to know who might be interested in doing the Series A for the seed-stage companies I invest in, or the Series Seed for the pre-seed ones. In the past months, with my current state of credibility, I sometimes meet with the Midas List kind of VC. In all of these meetings there comes the moment when they lean forward in their seat (they don’t always actually lean forward in their seat, but you know) and ask me “what makes you so special?”
Not necessarily in those words.
I don’t really like to sell myself. This was a handicap when I worked for others. It’s why I suppose I no longer work for others. I don’t like relying on someone else’s opinion of how I am doing. Especially when that’s what it is, an opinion. Venture capital you can quantify. You either do a good job (you make money) or you don’t (you lose money.) It’s not an opinion.
But just telling people the numbers doesn’t impress them. We all know that person who made 99 meh investments and then put a small check into…whatever, name a unicorn. Luck.
Me? My numbers are good. Top quintile. In fact, top quintile even after you take out the IPO. Clearly skill. My friend Dave once told me the difference between investment and speculation. “Speculation is when someone else is doing it,” he said. Luck is when someone else is successful. When you’re successful, it’s skill.
To others, if you only have a what, it’s luck. Skill needs a how, because it’s only skill if you could do it again.
After being stumped by Chris I cycled through plausible-sounding explanations of how. I used them to answer the question. None of them impressed anyone. When I ran out of plausible-sounding explanations I started asking others the question, to see what they said. I was in turn not impressed, although they were generally better at selling their answer than I was.
Is there no how?
Some people have better investment results. Is it luck? Michael Mauboussin said you can tell skill from luck by asking yourself “can you lose on purpose?” This is an amazing question. In venture the answer is, trivially, yes.
There are two kinds of pitches. Those that are clearly bad ideas, and those where it’s not clear at all if it’s a good idea or a bad idea. Investing in the former will lose you money. Investing in the latter might lose you money or might make you money. Skill is distinguishing between the two. Then luck comes into play.
The reason I could never say how I’m special is because I’m not. And neither were any of the other VCs I met with. At least not in a way anyone could wedge into a 30-second pitch. Divvying the world into two piles is hard, but it’s not magic. Being special is being magic, that’s what the question is really, how are you magic?
Anyone can learn to invest well in startups. But you do have to learn how to do it and then you have to work hard to do it right, every time. Like any job, you show up and you do the work and you notice your mistakes and you try to do better and you improve over time. It requires thinking and trial and error and trying to be rational and asking yourself if you’re thinking about this right and asking other people how they did what worked and what they think about this one you’re thinking about. Figuring out the nos from the maybes is, more than anything else, like solving a puzzle. The puzzle is different each time. Your job is solving the puzzles.
Good puzzle-solvers have all sorts of strategies they use to solve puzzles, but the main way they become good puzzle-solvers is by solving puzzles. Good puzzles are never the same as other good puzzles. There is no generic puzzle solving process. If you ask a good puzzle solver what makes them so special, they would ask for a puzzle to solve, and solve it. What makes them special is that they solve the puzzles. They do the job.
I wanted to impress Chris, I don’t really need to impress VCs. If they think I’m lucky it’s as good as thinking I’m good. Lucky or good, same difference, they’ll take my introductions. So now when they ask what makes me so special, I tell them the truth: I’m not special, I’m just doing the job. It’s sort of underwhelming.
But funnily enough: when I say that to someone new to the business, they think I have no how; the meeting usually ends soon after. When I say it to the Midas List guys, that’s when the meeting starts; they recognize the only real how there is.
[Not sure why I’m apologizing for not giving you another 25-page, massively footnoted post…]
I don’t believe virtual reality is a good area to venture invest. I do believe augmented reality is a good place to venture invest. There is a key distinction between the two in terms of the chokepoints in the value chain.
Every medium needs several components to work: the medium itself, the content, content distribution, monetization, and content discovery. Each of these pieces has its own economics, depending on the medium. These differing economics lead to differing propensities to be controlled by a small group of companies.
Note that by medium I mean here the means by which content is presented to the user. So the medium for a newspaper is the printing press and paper. The medium for the internet is the network, etc. By distribution I mean the means of getting the medium to the user, not the act of doing so. So record labels may have a distribution function, but iTunes and Amazon are the means of distribution.
The radio industry had radio sets as medium, radio shows as content, distribution through radio broadcasts, and branded networks to facilitate content discovery and sell advertising. The cost of broadcasting and the economies of scale of content discovery and ad sales meant that the radio industry quickly became dominated by the networks, who owned these things, and the content creators were usually poorly compensated. The radio sets themselves were subsidized to build the audience needed to create the economies of scale for the networks and radio manufacturing was a break-even business.
This model, similar to what later happened in television, would have been a bad place to venture invest. The networks were built by companies that already had complementary assets and, more importantly, the cash and political power to establish dominance. The winners did not need venture-type financing (as it existed then) and companies that did need outside financing were inevitably destined to go out of business.
The movie industry developed differently. In its golden age it was dominated by the movie studios, who were primarily content creators. (The movie theaters were the means of distribution, not the studios.) This is similar to the record industry through most of its history. But because of the disaggregated nature of distribution and monetization, movies and music could be made outside of the studio/label system and occasionally make money (the “indies”). Whether these were a good bet for outside financing is arguable, but the odds may have been no worse than VC if you knew what you were doing.
The internet is, again, different. Because there wasn’t a single chokepoint in the value chain, opportunities flourished in all sectors. Startups made (and make) money in the medium itself (meaning, here, companies like Cisco and AOL), content distribution, monetization, and discovery. There was some dominance of distribution, discovery, and monetization over content creation, and this led to early concentration at discovery companies like Google and monetization companies like DoubleClick. It has also led to a very difficult environment for content creators.^{1}
In analyzing any new medium, it pays to figure out the various pieces of the delivery value chain and which ones will have the ability to take whatever share they desire of the overall margin available. These will be the ones that become the valuable players in that market.
Virtual reality’s value chain is going to be dominated by content creation. Somewhat like the movies and more like computer gaming. The cost of creating VR content will be high so content creation will economically dominate distribution and discovery. The high cost of creating quality content will mean that less quality content is created, allowing discovery through typical marketing/PR and word of mouth (like how movies are discovered now.) Because recouping the cost of high-quality content will require large audiences, VR headsets will need to be cheap. They may at first be subsidized, but will eventually be required by the content makers to be high-volume, low-margin hardware. Expensive, and thus scarce, content will tend towards the lowest common denominator (like console computer games) so risk can be managed through a portfolio approach (like music and movies.) This suggests that VR content will eventually be dominated by a few very large companies, and probably mainly companies that enter from adjacent industries (my bet would be on EA.)
There may be other uses for VR other than the mass media/broadcast model I describe, such as in business. But because the largest piece of the market will drive revenue in the rest of the value chain down, any other value chain that avoids the chokepoint but uses the other pieces will have very low barriers to entry because its suppliers will have no bargaining power. For instance, the creation of training films for businesses avoided the content creation chokepoint in the consumer media business and benefited from the lower cost of movie-making equipment and talent. But because these had been made plentiful by the mainstream industry, there was no way to build a big business in corporate film-making. Something similar will happen in VR.
Augmented reality is completely different. Because uses of AR will be more varied, content creation will be less expensive (because there will be no “arms race” to create the single work that everyone sees.) No single part of the value chain will dominate. I expect AR to be more similar to the internet in its evolution. Content may explode in popularity overnight and then fade, but there will be no winner-take-all in AR content. Because content will need to be less expensive to make, content tool companies will be needed. Because this will lead to more varied types of content, hardware makers will do well: the hardware will be tuned to specific customer needs and content will address more specific customer problems, so AR will be more valuable to a customer than VR. This will allow both hardware makers and content makers to have higher margins. Distribution and monetization may end up consolidating, but not necessarily through existing players.
If my reasoning is correct, VC investments in VR will end up doing poorly as startups are outcompeted by incumbents. The AR market, on the other hand, is wide open.
There’s an arguable point here about distinguishing between content creation and content discovery on the internet, and normally I would write several pages defending my choice but, luckily for you, no time. ↩
I’m not the first person to note with some pique that the word “disrupt” is overused and misunderstood. Technological disruption has gone from an interesting way to illuminate the workings of the innovation machinery, to an imprecise strategic crutch, to a magician’s misdirection, to, now, the cargo cult of technology commercialization. Cargo cults are fascinating because they mirror our own tendencies to confuse cause and effect, but they have real costs. They misdirect resources to ineffectual ritual from actual problem-solving.
There are better ways than disruption to think about whether you can succeed at building a business with a new technology. In fact, there are few worse ways.
Disruption
I’m sure what the presenter of the slide was getting at was Clayton Christensen’s definition of disruption from his classic book The Innovator’s Dilemma. In Christensen’s terminology, a disruptive technology is one that costs less than existing technologies and has subpar performance by the dominant standards, but performs well along a dimension that the existing market has little need for. His primary research was done on the disk drive industry, where the dominant metric of performance improvement was storage capacity. New companies repeatedly disrupted the existing market by introducing disk drives with less storage capacity but smaller form factors. Incumbents were so motivated by the needs of their existing customers to increase capacity that they ignored nascent markets where new customers needed something else. New companies could enter this new market without competition from well-resourced incumbents and get enough traction to fund the improvement of their technology along the dominant metric. They then started peeling customers away from the incumbents. The incumbents were chased further and further upstream until they ran out of customers.
This is the definition of disruption that innovators try to associate themselves with, and for good reason. A disruptive innovator has a chance to replace the large, entrenched companies that dominate their sector. Competing with Google or Apple or Amazon is daunting and if you can’t think of a recipe for winning then you might latch onto disruption as your savior.
But even disruption as defined by Christensen does not really apply in the life sciences business. New biomedical technologies very rarely completely replace existing ones or chase incumbent life sciences companies out of the business.
[T]he history of the drug sciences revolution is very much one of successive “waves” of new technology that rise up and later become adapted into the flow. Recombinant DNA and MAbs represented the first waves of biotechnology that came on the scene in the late 1970s. Many predicted that new methods of making drugs based on genetic engineering would replace traditional “old” medicinal chemistry. This has not happened; moreover, it now turns out that medicinal chemistry and genetic engineering are complementary. This pattern has repeated itself over the subsequent thirty years, with the emergence of rational drug design, combinatorial chemistry, and high throughput screening; then genomics, proteomics, and more recently, systems biology and RNA interference. Each new approach emerges from science and initial expectations (and hype) are that this one is “the real deal” and will dominate. But there has been little replacement of old with new. Instead, the new technologies, as well as the even newer ones, coexist with the old. Furthermore, it appears that they do not operate independently; rather, they are highly complementary.^{1}
Even outside the life sciences field, new technologies rarely completely change the structure of existing markets, although they often alter them, evolve them, or even create new markets. And at successful companies that did drastically alter a market, the original intent was not usually to “disrupt”, it was to create something new. Google, despite the radical changes it brought to so many markets, set out to create something, not destroy anything. Disruption isn’t everything.
Pure Technology Innovation
As a tool for thought experiments in innovation the biomedical business is especially valuable because it lies on one end of a primary innovation dimension. Every innovator is trying to find a match between some under-solved problem and some technology. Imagine a two-dimensional space of possible companies where the x-axis is technology and the y-axis is the problem being solved. Each (x,y) point has a value we can call ‘fitness’: how valuable a company using that technology for that problem is. If the fitness is noted on the z-axis, this is called a ‘fitness landscape.’ An entrepreneur searches the fitness landscape for a good idea, iterating along the x and y dimensions, looking for a peak in the z-axis: “is there a problem this technology can solve better?” or “is there a technology that can be used to solve this problem better?” In the first, entrepreneurs know the technology and are looking for a problem; in the second, they know the problem and are looking for a technology. Usually you do a little bit of both. But the biomedical business is at one end of the spectrum, it’s pure technology innovation.
The keeping-people-alive-and-healthy business is one of well-known problems. Jonas Salk did not have to look long and hard to know that polio was a problem worth solving: the problems of sickness and death are evident. The difficulty in this market is not finding a problem, it is finding a technology that solves that problem.
Christensen notes that new companies disrupt incumbents not when they introduce new technology, but when they help create new markets.^{2}
The biomedical industry occasionally has new markets, when a new disease surfaces or the like. But this is not the driver of innovation in the field: most new ideas are not in response to new markets, they are the result of new technologies. Christensen’s theory says that none of these are disruptive and very few can survive. His theory says that innovation would be monopolized by incumbents, who would latch onto technological innovations and squeeze out new companies.
But there are new and successful companies in the life sciences field. According to the NVCA, in 2015 about a fifth of the venture capital dollars and more than a fifth of the companies funded were biotech, medical device, and healthcare services companies. I would venture to guess that few of these companies were formed to address a new market^{3}; they were using new technology to solve an existing problem. And yet they were successful enough to raise venture capital.
Christensen’s theory of disruptive innovation does not cover this. And if it doesn’t cover new-tech-only startups, then it can at most only partially cover companies somewhere on the spectrum between new-tech-only and new-market-only.
Pure Market Innovation
What about the other end of the spectrum, pure market innovation? Companies that are using proven technology to create new markets? The companies here are familiar. Square, for instance, provided credit card processing to businesses that used to be cash only. There was no new technology here (Square was founded in 2009, well after the smartphone stopped being new.) There was innovation of course, Square used a novel combination of technologies to solve a problem, but it wasn’t really technological innovation. Uber is another example. There is little reason Uber could not have been started ten years earlier than it was: a feature phone interface might not have been as snazzy, but it would have been about as functional. The technologies Uber deployed–mobile phones, logistics, a marketplace–were not new in 2009. Instead, Uber opened up a new market.
But neither of these companies were disruptive in The Innovator’s Dilemma sense. Neither was cheaper and less functional than what they were replacing and then remained cheaper as functionality grew. Merchants that adopted Square early on did not do so because it was cheaper than the alternative, they adopted it because they finally had a viable way to accept credit cards; for most of them it was more expensive than their previous way of doing business (cash) and more expensive than what the incumbents would have provided, if the incumbents had provided them anything. The key difference between what Square did and disruption is that Square did not open a new market by bringing the cost down into the reach of an entirely new set of customers, they served a small set of customers that the incumbents simply didn’t care about because it was a small market. This is not disruption, it’s classic market segmentation.
Nor did people adopt Uber because it was cheaper. Neither set of customers–the riders or the drivers–saved money from using Uber. Riders use Uber because it’s easier to hail a taxi that way, not comparative cost. Drivers use Uber because it’s more attractive than driving a taxi, not comparative pay. Both of these are an indirect result of taxi regulation. Again, this is not disruption. The new market was available because Uber first ignored, then lobbied to change, regulations that made that market seemingly unavailable to new entrants.
Neither of these companies is covered by Christensen’s theory. In fact, if we go down the list of Unicorns, not many actually fit Christensen’s definition. Airbnb? OK, I’ll buy that. Palantir? No. Snapchat? No. SpaceX? Maybe someday. Pinterest? No. Dropbox? Sort of. WeWork? No. Spotify? No. Etc. Generally, you could make an unambiguous argument that 5% are disruptive, and a tortured argument that another 45% are. The rest? They’re just not disruptive. And yet they all seem to have a so-far successful strategy. Why, then, does everyone always talk about disruption?
Disruption as a strategy sucks
Christensen’s theory is vague. What is a new market? What does cheaper even mean? But the bigger problem with the theory is knowing what to do with it. You read the book, you want to start a company…what exactly does the theory advise you to do to create a disruptive company?
You can’t decide to start a disruptive business. You can’t take Christensen’s theory and use it to churn out disruptive companies. Don’t believe me? Try it. Cable TV, for instance, is too expensive and provides more functionality along a specific axis than most customers need. If you can think of a way to disrupt it, then why aren’t you doing it? It’s a giant pot of money just sitting there for you and Clay Christensen to take. None of the millions of people who have read The Innovator’s Dilemma has taken that pot of money because the theory doesn’t say how.
Christensen’s theory is descriptive, not prescriptive. It names a process but does not tell you how to generate that process. You might know disruption when you see it, but you only know it after the fact. You can’t know beforehand that if you create a new market it will grow big enough to sustain your company while you improve the quality of your product until you can go after the established market. You can’t know beforehand because, as Christensen himself notes, “markets that don’t exist can’t be analyzed.”^{4}
Here’s a thought experiment: put yourself in Steve Jobs’ shoes in 1976. You have a personal computer to sell. How many people will buy it? Steve Jobs thought every home in America would have a personal computer. He was low by an order of magnitude (it now looks like every person will have three or four.) Other observers at the time thought the potential market was far, far smaller (“There is no reason anyone would want a computer in their home.” said Ken Olsen, founder of DEC, in 1977.) Whatever seems obvious in retrospect, there was no way at the time to know how big the PC market would be, how fast it would grow, or if it would sustain one company, much less the dozens that entered it. Disruption isn’t much of a recipe if it still leaves you with the fundamental risk of every startup: will there be a market for what I’m selling?
So why is Christensen so popular if his theory can’t be put to use? Well, because it can. Just not by you. We are not Christensen’s target audience. The Innovator’s Dilemma was written as a warning to the managers of large companies, the incumbents, not as an instruction manual for startups. For bigco executives, it was a much-needed wake-up call: watch out for those little companies going after the customers you don’t want with technologies that look like toys, they could grow up to displace you. Christensen was not writing to the founders of those little companies on how to disrupt those big companies.
Startups can win even when they are not ‘disruptive.’ Intel, after all, did not enter the microprocessor market by intentionally introducing a cheaper general purpose computer, they entered it by introducing a much more expensive slide rule…the 4004 was developed to power electronic calculators. The market for electronic calculators was small, allowing Intel the room to build expertise in CPUs, but Intel’s entry can’t be described by Christensen’s attack-from-below process unless you take into account facts not then in evidence: that CPUs would be used to build general purpose computers. Finding a foothold market for a new technology gave Intel the time and space to explore other potential markets for the technology, and even though the strategy itself was not disruption, Intel was successful.
The Innovator’s Dilemma is a traditional corporate strategy book. It talks about how to recognize threats, how those threats might play out, and how to defend market share you already have. It is not entrepreneurial strategy.
Entrepreneurial Strategy
Most advice to entrepreneurs is tactical. You need a good idea, a good team, a good product, and a good business model. You should interview potential customers, size the market, build an MVP and iterate. These are all tactics. Strategy is the route on the map, tactics are the means of travel. Tactics are more important in the near term, but strategy is far more important if you want to go the distance.
Every entrepreneur’s long-term question is: how do we get to be a dominant company in a big market? Answering that question provides you with a strategy. Finding a big market is the part most entrepreneurs focus on. But becoming dominant in it is usually neglected: either taken for granted (“we’ll win because we’re better”) or assumed to be beyond control (“someone will take this entire market, it might well be us.”) A good strategy will guide you to becoming the dominant company, and a key component is outlining how you will deter or delay competition as the market grows.
Big companies will compete with you once you show them the way. They pay attention to small companies who are doing interesting things, and especially when they’re doing interesting things in a rapidly growing market. Big companies won’t have any qualms copying your business idea and plan if they can. The former president of PepsiCo once wrote an article on innovation in the Harvard Business Review where he said:
…most of PepsiCo’s major strategic successes are ideas we borrowed from the marketplace–often from small regional or local competitors.^{5}
Apparently, “borrowing” someone else’s idea is considered innovation at big companies. You need to protect yourself.
And, of course, once you start to show signs of success, you will attract many competitors.
There are many strategies to deal with this; a blog post is too short to describe them all.^{6} I’ll outline some key factors that go into a good strategy, but keep in mind that strategies are different for each company and depend heavily on the market you’re going into, the resources you have, and the competition you may face. You need to think about these things before you can start formulating a strategy.
As a quick example, and an apology to Christensen, let’s look at disruption. Suppose, as a starting state, you have an already large market where the incumbents (there have to be incumbents for there to be an already large market) have over-provided their customers with whatever those customers consider quality; you have found a technology you can provide more cheaply that offers some ancillary quality; and you have customers who need that ancillary quality and don’t much value what the market currently considers quality. Then your situation might well be Christensen’s disruption. Your strategy should be to use the revenue from the new customers to fund furious improvement of your technology so you can eventually take the incumbents’ existing customers and become dominant.
This starting state of affairs is unusual and very difficult to recognize until the company is already in business, as we discussed. If you’re waiting for these exact conditions, you’ll probably never start a company. But the reasons it works–you can keep competitive intensity low in a small market until you’re able to compete better than anyone else in a market you already know is large–highlight the two key strategic goals we pointed out above: big market, ability to become dominant. By recasting Christensen’s scenario this way we can articulate a way to build these companies. Christensen has a warning, we have a strategy.
Some other factors to take into consideration as you form strategies:
Intellectual Property: Pharmaceutical companies can enter existing markets, well known to incumbents, because they have new technology. And they can prevent competitors from replicating their technology with patents. Patent protection is valuable in industries like the pharmaceutical industry, where it takes an enormous amount of time and money to find a compound that solves a problem but where that compound is easy to make once known. They are less valuable when the unknown is not how to solve a problem, but which problems are worth solving and for whom.
Continuous Innovation: There are other ways to prevent competitors from copying your product. The best is to keep making it better. It would be relatively easy to build a search engine to compete with Google if Google still used the algorithms they used ten years ago. But Google took advantage of the stagnation of their competitors circa 2000, built a better product, won the market, and then continued to improve it. Despite being a near-monopoly in search, Google has never rested on its laurels. It’s easier to innovate ahead of others when what you are building is technically hard, or you need people with uncommon skills to build it.
Closely related to this is building an organization that understands its customers: a leading-edge company has access to more relevant customer knowledge than a company outside the industry and can use this to build the best products because, in a new market, what counts as the best product is still being discovered.
Lead Time: With many startups, economies of scale are not that relevant, but cumulative time and cost to build make a difference. If you have developed a complex system that took many years to build, a newcomer needs to do all the work you have done in that time (as well as whatever work you do while they are building) to have a system that can compete with yours. If they can’t generate meaningful revenue until their product is comparable to yours, then they need to raise all the money you received in revenue over the years as investment capital. This hurdle can become very large very quickly.
The height of the hurdle primarily depends on how complex the system is: a new operating system is a high hurdle, a new CRM not so much. With information businesses it depends on how much it costs to gather the information. Many big data businesses have developed a system over several years to deal with an enormous number of transactions per second; if you are just starting out, you need to follow the same path before you can compete with them. If it takes you as long as it took them, you’ll never catch up.
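The lead-time arithmetic can be sketched as a toy model (my own illustration, not from the post): if the incumbent has a head start of H years and keeps building, an entrant who builds k times faster still needs H/(k−1) years to reach parity, and must fund all of that pre-revenue development with investment capital.

```python
def years_to_parity(head_start_years, speed_multiple):
    """Years an entrant needs to match an incumbent's cumulative work.

    The incumbent keeps building at rate 1; the entrant builds at
    `speed_multiple` times that rate, so the gap of `head_start_years`
    closes at a net rate of (speed_multiple - 1) per year.
    """
    if speed_multiple <= 1:
        return float("inf")  # an equal-speed or slower entrant never catches up
    return head_start_years / (speed_multiple - 1)


def capital_hurdle(head_start_years, speed_multiple, annual_build_cost):
    """Rough capital the entrant must raise before reaching parity.

    Assumes building k times faster costs roughly k times as much per year.
    """
    years = years_to_parity(head_start_years, speed_multiple)
    return years * annual_build_cost * speed_multiple


# A 5-year head start, an entrant building twice as fast, $2M/year build cost:
print(years_to_parity(5, 2))            # 5 years to parity
print(capital_hurdle(5, 2, 2_000_000))  # $20M of pre-revenue funding
```

The numbers are made up, but they show why the hurdle compounds: even doubling the incumbent’s pace leaves the entrant funding years of development before earning comparable revenue.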
Complementary Assets: Another way of deterring competitors is to have a product that relies on its integration with other products and services for its utility. These other products and services are called complementary assets. For instance, Apple’s iTunes was never the best music management software available, and yet it quickly took a near monopoly position because of its integration with the iTunes store (and through the store with the labels) and Apple’s MP3 players.
Switching Costs: Some sort of lock-in or high switching cost keeps customers even if your product is not as good as your competitors’. The network effect is a good example: networks are more valuable the more people are on them, so if your product needs a network to function and you can create a large one then upstarts will have a harder time competing. Another example is the lock-in created by a product like Microsoft Office, where a proprietary file format for a long time meant that once a company started using Microsoft products, switching to a competing product might mean losing all previous documents.
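As a back-of-envelope illustration of why network effects raise the bar for upstarts (a common Metcalfe-style approximation, not a claim from the post): if a network’s value grows with the number of possible connections, n(n−1)/2, then a network a tenth the size offers roughly a hundredth the value.

```python
def network_value(users):
    """Metcalfe-style proxy: value scales with possible pairwise connections."""
    return users * (users - 1) // 2

incumbent = network_value(1_000_000)
upstart = network_value(100_000)

# The upstart has 10% of the users but only ~1% of the pairwise connections,
# so matching the incumbent on features alone isn't enough to pull users away.
print(incumbent / upstart)
```

The quadratic scaling is only a heuristic (real networks cluster, and not every connection matters equally), but it captures why a large installed network is a switching cost in itself.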
Speed and Optionality: Many large companies manage by distributing responsibility within fairly tight bounds. Managers can manage in whatever way they think best but need to stick to the goals and timeline they articulated in the fall of the previous year. In some instances this gives you a year of free growth even after they notice you and become concerned. Similarly, a startup might have the ability to change direction at any time, while this is unusual at a large company. Many fintech companies have taken advantage of this lag time to continue growing unopposed by incumbents even after their trajectory became a concern.
Another sort of optionality is your willingness to bet the company. A startup can do this because you don’t have much to lose. A large company cannot.
A good strategy will have more than one of these elements, and will change over time as the market changes and the resources available to your company change. It’s not easy formulating a good strategy, and it’s especially hard when you’re neck-deep in running a startup. Founders don’t often have the luxury of stepping away from the tactical to focus on the long-term.
For most of the entrepreneurs I know, the excitement of building a company comes from working with leading-edge technology, or the smartest engineers, or the hardest-driving customers. When you have these things the temptation is to just start running as fast as you can. When someone asks you how you win, long-term, the easy answer is: “we’re disruptive.” Don’t fall prey to this. You can be right and still lose. Take the time to think about a real strategy. Do it early in the company’s life and revisit it every year, at least. Without a strategy you might predict the market and the technology exactly and still lose to someone who does have a strategy. When I hear the word “disruption” what I hear is “I don’t need a strategy” and that’s a huge mistake.
Pisano, Gary P., Science Business. Boston: Harvard Business School Press, 2006, p. 71. ↩
The below diagram is an interpretation of the data from Christensen, C., The Innovator’s Dilemma. Boston: Harvard Business School Press, 1997, p. 131. ↩
This glosses over exactly what a “new market” is. Some of these companies were formed to solve problems that were created when other companies used new technologies. These ancillary problems could be considered “new.” And, of course, using technology to solve problems created by new technologies has always been a large driver of new companies. But this isn’t a flaw in my argument, it’s a feature. If semantics are a major contributor to your strategy, then your choice of language must be obscuring your view of reality. Better to use a frame of reference with less ambiguity. ↩
Christensen, C., op. cit., p. xxi. ↩
Pearson, Andrall E., Tough-Minded Ways to Get Innovative, Harvard Business Review, May-June 1988. ↩
Even my blog posts. ↩
But over the past couple of years I found that I couldn’t keep up with all of the interesting new venture firms. Five years ago I felt that keeping track of the top 200 firms meant I would see all the interesting deals being done. That’s no longer true. While adding new firms is pretty easy, it still takes time. I have a day job (or two) so I started looking for someone who would take good care of a promising little bot.
I’m happy to say that Mattermark has stepped up. In addition to having great products and an awesome management team, I feel like Mattermark is the one company in the startup data space that is community-oriented by nature^{1}. I know they’re taking a look at how to integrate VCdelta into their services and I look forward to seeing what they do with it. In the meantime, you should go sign up for Mattermark here.
I guess I need to caveat that I am now a shareholder in Mattermark, so not entirely objective, but I had several offers to take over the bot and chose Mattermark for that reason. ↩