The AVC Word Cloud

On Thursday, I stopped by usv.com and saw this post:

[Image: the post on usv.com]

I clicked on the link and it took me here. Turns out Ashish Datta of Setfive Consulting made a word cloud using the raw words on AVC since launch in 2003. For 2013, the word cloud looks like this:

[Image: the AVC word cloud for 2013]

The words in dark blue did not appear in the top 100 the year before. You can see word clouds for all eleven years of AVC here.
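For anyone curious how that dark blue highlighting could be computed, here is a minimal sketch in Python. It is not Ashish's code: the function names, the `top_n` parameter, and the sample counts are all made up for illustration.

```python
from collections import Counter

def top_words(counts, top_n=100):
    """Return the set of the top_n most frequent words in a Counter."""
    return {word for word, _ in counts.most_common(top_n)}

def new_words(counts_by_year, year, top_n=100):
    """Words in this year's top_n that were not in the previous year's top_n."""
    current = top_words(counts_by_year[year], top_n)
    # If there is no previous year, every top word counts as new.
    previous = top_words(counts_by_year.get(year - 1, Counter()), top_n)
    return current - previous

# Hypothetical counts, not real AVC data.
counts_by_year = {
    2012: Counter({"company": 50, "twitter": 30, "google": 20, "mobile": 10}),
    2013: Counter({"company": 45, "bitcoin": 35, "twitter": 25, "google": 5}),
}
print(sorted(new_words(counts_by_year, 2013, top_n=3)))  # ['bitcoin']
```

With a top three instead of a top 100, "bitcoin" is the only word that would be colored dark blue for 2013 in this toy data.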

But possibly more interesting than the word clouds is the table below them, which lists roughly the top 300 words over those eleven years and shows the years they were popular and the years they were not. You can see that here.

As my partner Brad pointed out at usv.com, words like company, investing, business, etc. are not particularly interesting or revealing. But there is a lot of signal below that noise. One way to see it is to look at just the company names in the table and see when they were on my mind and when they were not. Only Google and Twitter have managed to make it in the top 100 every year they and AVC were around, for example.

Ashish (or someone else, as he’s open sourced the code and data here) could do some additional work on this and come up with some pretty interesting observations. I would like to thank Ashish for this great work and in particular for open sourcing it so others can work on it if they would like. That’s awesome.

#Weblogs

Comments (Archived):

  1. William Mougayar

    The year that a new trend started is interesting. E.g.: Mobile started in 2008, and Android in 2010. Facebook started in 2007. Bitcoin in 2013. As Brad pointed out, it would be interesting to go further and analyze emerging trends in real-time. This looks more like a snapshot of the rear-view mirror.

    1. awaldstein

      Trend in real time–90% vision and gut, 10% data.

      1. William Mougayar

        Ah, that’s low for data IMO. Data can inform opinion. Data is grounded in reality, but it can be interpreted.

        1. awaldstein

          True, most likely. My thinking though is: is it really worth it? Entrepreneurs build and investors invest way ahead of signs. Data is super useful as both operating input and market affirmation. Both I love, neither honestly helps me decide how to spend my advising time or investment dollars.

          1. LE

            Agree with that. To me business is about details. Along those lines, setfive.com (if you are reading this), you should get rid of the “fork me on github” as well as the “contact@” and replace it with a real person’s email address. For that matter, end users don’t know what a “stack” is (home page) and don’t care. They are looking for a solution to a problem. Car manufacturers print specs and tech details for cars but it’s not front and center with their marketing message. It’s buried where the geek will find it. (Today’s loosely connected segue of mine..)

          2. Ashish Datta

            Thanks for the feedback – we’ve been struggling with our messaging/value prop for a while. Hopefully we’ll get it locked up this year.

          3. LE

            Separately you should plop your logo and web address nice and reasonably large in the lower right hand corner of any tag clouds. Large enough so that if anyone embeds it (like Fred has done) people know who the author is.

    2. John Revay

      Yes, the words that popped out to me were… Android, Facebook, iTunes

    3. fredwilson

      it might be even more interesting to look at when i stopped talking about them, i.e. iTunes

      1. LE

        I suspect that it’s less important to you because you are getting your music fix in other places. Perhaps in the beginning with less choice you were more obsessed with it. (I stopped talking about Playboy years ago..) The name iTunes is also a bit old and yellowed like the formerly bright white plastic of 90’s office equipment.

  2. Richard

    My hunch is that the least often used words may be more revealing about Fred’s sagaciousness. How so? Being ahead of the wave means that you might mention Harvard in 2005, probably just a few times. P.S. Fred’s humbleness is confirmed. Note the absence of “I” and “Me”.

    1. Guest

      Not questioning Fred’s humbleness, but I suspect the program just doesn’t recognize letter combinations that have fewer than four letters. I would have expected to see AVC, USV, CEO, MBA (as in Mondays) Gal (as in Gotham), New (as in York, or as in anything new), etc. You could argue that AVC isn’t a word, but Tumblr isn’t a word either so it seems like the program was told to include and exclude certain things. Actually, scanning the list, I don’t see anything with four letters that doesn’t also have an apostrophe (which the system apparently reads as “39”), which would help explain the lack of York (as in New), Etsy, hack, Dick (as in CEO of Twitter), Brad (there are at least two Brads who are frequently mentioned on the blog), Blog, tech, idea. In fact, I’m not sure I see anything with five letters either, unless there’s an apostrophe, which would explain the lack of email, tweet, cloud, blogs, ideas, Union, music, etc.

      1. Richard

        good point

      2. Ashish Datta

        Looks like I forgot to mention this but I had it throw out words shorter than 6 letters to stop the noise from prepositions and other short words. Could have used a “stopword” list in hindsight. *edit 6 letters not 5*

        1. Guest

          Do you mean fewer than six letters? (i.e., a word with five letters would not be included)

          1. Ashish Datta

            Whoops yup – “> 5 letters” is what it’s using.

  3. Twain Twain

    The word which stands out is “because”. Sure, the frequency count of nouns is interesting but “because” is more so. It points to Fred and AVC community trying to work out the whys of what makes startups work or not. Whys matter because…

    1. fredwilson

      I like that observation

      1. LE

        Cialdini “influence” power of the word because [1]: He cites the experiments of Ellen Langer which demonstrated that humans are more likely to comply with a request if a reason is also given, even if that reason makes no sense. The word “because” triggers the automatic compliance response. I have found that I agree with almost all of the conclusions that Cialdini reaches in his books (which is really rare for me) primarily because he has gone out in the field and studied how different business people gain compliance (encyclopedia salesmen, car salesmen for example) as well as in other situations, i.e. “moonies at the airport”. And his observations dovetail with my own experience interacting with people, negotiating and manipulating. Human behavior is very predictable. [1] http://www.media-studies.ca

        1. Twain Twain

          Less to do with compliance and more to do with “because” being an affirmation of our curiosity?

  4. howardlindzon

    I see $GOOG as the one company on this cloud…

    1. Guest

      See my comment to Rich Weisberger below. Apple isn’t on the list. Is it reasonable to believe that Apple has been discussed less often on this blog than something called Bigfoot? I say no.

  5. panterosa,

    I love word clouds, but the bonus here is seeing the numbers behind the usage. I did wish Ashish had devised a way to color code the numbers after the initial blue 1st mention. Wouldn’t it be helpful to see the ramp up on certain words, or collectively how some are simply on a plateau? This ties data visualization with color back to @albert’s piece on growth and visualizing exponential growth. I feel a color scale helps this data be digested better, and more quickly. There are possibilities of animating how the words which really take off compare to those which flatline or die off.

    1. Ashish Datta

      That’s actually a great idea. It would probably be interesting to correlate the word colors with the rate of change of the word’s frequency year over year so that the size captured the absolute usage but then the color captured the growth of use.

      1. panterosa,

        Yes, on the word cloud itself, but as a sort of heat map effect on the word count grid you have as well.

  6. Guest

    What do the numbers in the chart represent?

  7. chris dixon

    The rise and fall of “social” is interesting.

  8. Donna Brewington White

    This does not include comments, right? If not, that would also be interesting.

    1. Ashish Datta

      Unfortunately no comments since they don’t exist in the HTML of the pages

      1. Drew Meyers

        a little disqus comment word cloud would be awesome

      2. Michael McCarthy

        Kudos to you, Ashish, your word cloud for AVC is very insightful, thanks very much! The related table makes it even better. Too bad Apple has a short name, therefore becoming invisible, since it has fewer than 6 characters. Oh well, you have to draw the line somewhere, right? Here’s a tag cloud showing various themes seen in Web 2.0

    2. Michael McCarthy

      Well said, Donna. By adding comments, it might modify the current view in a major way. Here’s a VC word cloud I just stumbled upon. They’ve also been called tag clouds, and they seem to be rapidly growing in popularity.

  9. Jeremy

    I never quite got the point of word clouds as a data visualization. They’re more along the lines of infographics in the sense of being interesting looking but not super-helpful. Personally, I’d just chart the frequencies as a horizontal bar graph and call it a day. To illustrate what I meant, I created two really straightforward visualizations in Tableau Public. I like Tableau for no reason other than the default visuals follow data visualization best practices for the most part. Take a look here and feel free to play around with the visuals. The first tab is just the total word count by sum and the second one is a visual of the top words over time. You can use the filters/tooltips to drill down into any of the trends you’re interested in. http://public.tableausoftwa
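
Two ideas from the thread above are easy to sketch in Python: the word filter Ashish describes (keep only words with more than five letters, with a stopword list as the alternative he mentions) and the year-over-year growth that he and panterosa suggest mapping to color. This is a rough illustration, not the open-sourced code. The function names, the tiny stopword set, and the sample numbers are all assumptions. Decoding HTML entities first would likely avoid the stray "39" that Guest noticed, assuming it came from an undecoded `&#39;` apostrophe.

```python
import html
import re

# Tiny illustrative stopword set; a real list would be much longer.
STOPWORDS = {"the", "and", "for", "that", "with", "about"}

def tokenize(text):
    """Decode HTML entities, lowercase, and split into words."""
    decoded = html.unescape(text)  # "&#39;" becomes "'"
    return re.findall(r"[a-z']+", decoded.lower())

def filter_by_length(words, min_len=6):
    """Keep words with at least min_len characters (more than 5 by default)."""
    return [w for w in words if len(w) >= min_len]

def filter_by_stopwords(words, stopwords=STOPWORDS):
    """Keep every word that is not in the stopword set."""
    return [w for w in words if w not in stopwords]

def yoy_growth(prev_count, curr_count):
    """Relative change in a word's count between two years; None if it is new."""
    if prev_count == 0:
        return None
    return (curr_count - prev_count) / prev_count

words = tokenize("Apple and Google don&#39;t talk about mobile")
print(filter_by_length(words))     # ['google', 'mobile']
print(filter_by_stopwords(words))  # ['apple', 'google', "don't", 'talk', 'mobile']
print(yoy_growth(20, 30))          # 0.5
```

The length filter is why Apple never shows up in the table, while a stopword list would keep it. The growth number is one candidate for driving the color scale, with word size still showing absolute usage.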