Posts from machine learning

Data Wins

Whenever people ask me which company I think will win the self driving car race, I say Tesla.

And the reason is that they have more data.

And when it comes to training machines to do what humans do, more data is better than more software engineers.

Bloomberg has a good post on that today.

Headlines

One of the issues in all of the concerns about “fake news” is the way headlines are used on the Internet. Newspapers and magazines certainly took the construction of headlines into account to drive readers into the stories. But on the Internet, headlines have become that and more. They are the links themselves that fly around the Internet and “convert” someone into coming to your site and reading a story. They are “clickbait.” If we want to address the veracity and authenticity of content on the Internet, we might want to start with headlines.

I’ve had my issues with headlines for years. Many years ago, I allowed a number of publications to repost content I write here at AVC on their online publications. The publication that does that most frequently with my content is Business Insider. You can see the hundreds of posts that BI has republished on my author page at Business Insider. When they started doing this maybe seven or eight years ago, I would notice that they would leave my post intact, verbatim, but rewrite the headline. It would drive me crazy because I view the headline as an integral part of my post. I think about the words I use to title my posts. So I would send them angry emails and most of the time they would change it back. But it was a lesson in the difference between a headline that I liked and a headline that would drive clicks.

I also have seen hundreds of stories written about me, USV, and our portfolio companies that have sensational and often inaccurate headlines followed by stories that are essentially correct and well reported. It drives me nuts but I don’t often do much about it.

It makes me think that someone, or some company, or some open source community ought to build software that parses headlines and the stories that follow and rates them for how well the headline represents the article. That “headline veracity ranking” could then be offered to anyone who presents headlines to readers. That would be social media like Facebook, Twitter, Reddit, etc. That would be email applications and browsers. That would be search engines. Etc, etc, etc.
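To make the idea concrete, one crude place such a ranking could start is plain lexical overlap: score how well the headline’s words are represented in the story itself. Here is a toy sketch of that, nothing like a production system, and all the names and example text are made up:

```python
import math
import re
from collections import Counter

def _tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def headline_veracity_score(headline, article):
    """Crude headline-vs-article agreement: cosine similarity between
    bag-of-words vectors of the headline and the article body.
    Returns a value in [0, 1]; higher means the headline's words are
    better represented in the story."""
    h, a = Counter(_tokens(headline)), Counter(_tokens(article))
    dot = sum(h[w] * a[w] for w in set(h) & set(a))
    norm = (math.sqrt(sum(v * v for v in h.values())) *
            math.sqrt(sum(v * v for v in a.values())))
    return dot / norm if norm else 0.0

article = ("The quarterly report showed modest revenue growth "
           "and flat profits at the retailer.")

# An accurate headline shares vocabulary with the story; clickbait doesn't.
print(headline_veracity_score("Retailer reports modest revenue growth", article))
print(headline_veracity_score("You won't believe what happened next", article))
```

A real system would need paraphrase detection and semantic similarity, not just shared words, but even a score this crude separates a descriptive headline from pure clickbait on the same article.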

It would be nice to see some competition in this sector so that one company doesn’t become the arbiter of what is an accurate headline and what is not. That doesn’t sound like a good outcome. But if this is done via open source, or is community powered in some way, this could be a very helpful tool in getting publishers to behave and represent their stories accurately.

And that would be a wonderful thing for the Internet.

AI: Why Now?

UK-based VC David Kelnar wrote an excellent primer on Artificial Intelligence that is a relatively quick read and helps explain the technology and its advancement over the sixty years since the term was coined in the mid-1950s.

I like this chart which explains the relationship between AI, machine learning, and deep learning.

But my favorite part of David’s post is his explanation of why AI has taken off in the past five years, as this chart shows:

As with most non-linear curves, it is not one thing but a number of things happening simultaneously that are causing this explosion of interest. David cites four things:

  1. Better algorithms. Research is constantly coming up with better ways to train models and machines.
  2. Better GPUs. The same chips that make graphics come alive on your screen are used to train models, and these chips are improving rapidly.
  3. More data. The Internet and humanity’s use of it has produced a massive data set to train machines with.
  4. Cloud services. Companies, such as our portfolio company Clarifai, are now offering cloud-based services that let developers access artificial intelligence “as a service” instead of having to roll their own.

I feel like we are well into the “AI wave” of technology right now (following, in order, the web, social, and mobile waves), and it is a wave that seemingly benefits the largest tech companies, like Google, Facebook, Amazon, Microsoft, IBM, Uber, and Tesla, which have large datasets and large user bases to deploy this technology with.

But startups can and will play a role in this wave, in niches where the big companies won’t play, in the enterprise, and in building tech that will help deliver AI as a service. David included this chart that shows the massive increase in startup funding for AI in the last four years:

I would like to thank David for writing such a clear and easy to understand primer on AI. I found it helpful and I am sure many of you will too.

Machine Learning As A Service

Our portfolio company Clarifai introduced two powerful new features on their machine learning API yesterday:

  • visual search
  • train your own model

Visual search is super cool:

image-search

But I am even more excited about the train your own model feature.

Clarifai says it well on their blog post announcing these two new features:

We believe that the same AI technology that gives big tech companies a competitive edge should be available to developers or businesses of any size or budget. That’s why we built our new Custom Training and Visual Search products – to make it easy, quick, and inexpensive for developers and businesses to innovate with AI, go to market faster, and build better user experiences.

Machine learning requires large data sets and skilled engineers to build the technology that can derive “intelligence” from data. Small companies struggle with both. And so without machine learning as a service from companies like Clarifai, the largest tech companies will have a structural advantage over small developers. Using an API like Clarifai allows you to get the benefits of scale collectively without having to have that scale individually.

Being able to customize these machine learning APIs is really the big opening. Clarifai says this about that:

Custom Training allows you to build a Custom Model where you can “teach” AI to understand any concept, whether it’s a logo, product, aesthetic, or Pokemon. Visual Search lets you use these new Custom Models, in conjunction with our existing pre-built models (general, color, food, wedding, travel, NSFW), to browse or search through all your media assets using keyword tags and/or visual similarity.

If you are building or have built a web or mobile service with a lot of image assets and want to get more intelligence out of them, give Clarifai’s API a try. I think you will find it a big help in adding intelligence to your service.
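To make the “visual similarity” idea concrete: services like this typically embed each image as a numeric vector and rank candidates by similarity to the query vector. Here is a toy sketch of that ranking step with made-up three-dimensional “embeddings” (a real service computes high-dimensional ones with a neural network; this is not Clarifai’s actual API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def visual_search(query_vec, index, top_k=2):
    """Rank indexed images by embedding similarity to the query."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical embeddings; in practice a convnet produces these.
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "sunset.jpg": [0.8, 0.2, 0.1],
    "spreadsheet.png": [0.0, 0.1, 0.9],
}

# A beach-like query vector should surface the two scenery photos.
print(visual_search([0.85, 0.15, 0.05], index))
```

“Train your own model” then amounts to teaching the service a new concept so that images of it land near each other in that embedding space.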

The AI Nexus Lab

In Matt Turck‘s recent blog post about the state of NYC’s tech sector, he wrote:

The New York data and AI community, in particular, keeps getting stronger. Facebook’s AI department is anchored in New York by Yann LeCun, one of the fathers of deep learning. IBM Watson’s global headquarters is in NYC. When Slack decided to ramp up its effort in data, it hired NYC-based Noah Weiss, former VP of Product at Foursquare, to head its Search Learning and Intelligence Group. NYU has a strong Center for Data Science (also started by LeCun). Ron Brachman, the new director of the Technion-Cornell Institute, is an internationally recognized authority on artificial intelligence. Columbia has a Data Science Institute. NYC has many data startups, prominent data scientists and great communities (such as our very own Data Driven NYC!).

And now NYC has our very own AI accelerator program based at NYU’s Tandon Engineering School Accelerator, called The AI Nexus Lab.

The 4-month program will immerse early stage AI companies from around the world in NYU AI resources, computing resources at the Data Future Lab, two full-time technical staff members, and a student fellow for each company. Unlike a traditional accelerator, they are recruiting only 5 companies, with the goal of market entry and sustainability for all 5. They won’t have a Demo Day; instead, the program will end with a day-long AI conference celebrating AI entrepreneurs, researchers, innovators, and funders, during which they will announce the 5 companies. Companies will get a net $75,000 for joining the program.

If you have an early stage AI company and want to join this program, you can apply here.

Fun Friday: Self Driving Cars

I saw this projection from BI Intelligence below. It suggests we will have 10mm self driving cars on the road by 2020. They define “self driving” as “any car with features that allow it to accelerate, brake, and steer a car’s course with limited or no driver interaction”. The Tesla that my wife and I drive fits that definition.

sdc installed base

It is unclear to me whether this is a global number or not. I assume it is. There are over 1bn cars on the road around the world, so that would be 1% penetration. That seems low to me.

Do you think “self driving” will have penetrated more than 1% of the world’s car population by the end of this decade? I do.

Feature Friday: Photo Search

I’ve been uploading my smartphone photos to Google Photos for my last two phones. Earlier this week, I was in a meeting with some architects and I said that I really liked the way they do the showers at the Soho House in Berlin. They asked if I had a photo of them. I opened up Google Photos, typed “Soho House Berlin”, and got this result.

soho house berlin

Sure enough, I had taken some photos of the shower. I showed them to the architects and we were able to talk about the features I liked in the shower. That was kind of magical because other than taking the photos, I had done nothing to tag or categorize them.

This works for all sorts of searches. I remember seeing a painting I really liked at The Hammer Museum in LA. A search on “Hammer Museum” produces the image I was looking for.

hammer museum

If you are looking for photos you took on a trip, you can do the same thing. Here are some photos I took of the Grand Bazaar in Istanbul:

grand bazaar istanbul

Google Photos isn’t perfect. Some searches that I would expect to work don’t. But it is pretty good.

Our portfolio company Clarifai has a similar service in an iOS app called Forevery. If you don’t want to upload your photos to Google Photos and want to search them locally on your iPhone, Forevery is a great way to get a similar experience. Forevery will also search your photos in Dropbox.

Since I’m on an Android right now, I am using Google Photos but I use both apps on my iPhone.

Photo search is amazing. You no longer need to create albums and tag and categorize your photos to be able to find them. You just search for them. Kind of like how email changed when Gmail arrived.
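Under the hood, the simplest version of this is machine-generated tags plus an inverted index: the service labels each photo automatically, and a text query just intersects the tag lists. Here is a toy sketch of that lookup, with made-up photo names and tags (real services also use visual embeddings, not just tags):

```python
from collections import defaultdict

def build_index(photo_tags):
    """Invert machine-generated tags into tag -> set of photo ids."""
    index = defaultdict(set)
    for photo, tags in photo_tags.items():
        for tag in tags:
            index[tag.lower()].add(photo)
    return index

def search(index, query):
    """Return photos matching every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for w in words[1:]:
        results &= index.get(w, set())
    return results

# Hypothetical auto-tags; a real service generates these with a vision model.
photo_tags = {
    "IMG_001.jpg": ["shower", "bathroom", "berlin"],
    "IMG_002.jpg": ["painting", "museum"],
    "IMG_003.jpg": ["bazaar", "istanbul", "market"],
}
index = build_index(photo_tags)
print(search(index, "berlin shower"))
```

The point is that the tagging happens at upload time, so the user never has to organize anything; search is just set intersection over labels a model produced.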

An NSFW Content Recognition Model

Last week our portfolio company Clarifai released an NSFW adult content recognition model.

If you run a web or mobile service that allows users to upload images and videos and are struggling with how to police for NSFW content, you should check it out.

Ryan Compton, the data scientist at Clarifai who built this NSFW model, blogged about the problem of nudity detection to illustrate how training modern convolutional neural networks (convnets) differs from research done in the past.
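The building block that makes convnets different from older hand-engineered approaches is the convolution: a small learned filter slid across the image. Here is a minimal sketch of that single operation in plain Python, just to illustrate the mechanics; it is nothing like Clarifai’s actual model, and the image and filter values are made up:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as most
    deep learning frameworks implement it) of a single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the image patch at (i, j).
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter applied to a tiny image whose left half
# is dark (0) and right half is bright (1): the output peaks at the edge.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))
```

In a trained convnet the kernel values are learned from labeled examples rather than designed by hand, and many such filters are stacked in layers; that is the shift from past nudity-detection research that Ryan’s post describes.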

We are excited about the possibilities that modern neural networks open up for entrepreneurs, developers, and scientists. Our investment in Clarifai is based on our belief that AI/machine learning/neural networks/etc have reached the point of mainstream adoption and usability. And we are seeing more and more use cases for this technology every day.

Solving the problems of content moderators and trust and safety teams at scale, as we discussed here at AVC this past weekend, seems like a particularly good use of this technology.

Artificial Art

Last week we opened up a new thread on USV.com to think about and discuss the intersection of creativity (art) and artificial intelligence.

We have seen a lot of interesting companies in this area but have not yet made an investment.

Of course, the entire notion that machines will help us make art or even make it without human intervention gets to the essence of what art and creativity are.

Last summer I posted an art project by Ian Cheng that my daughter was involved in. The cool thing about that art project is that it evolves over time based on rules provided to a machine. The art is initially made by humans but it evolves and changes over time using a machine. That is one of many interesting ideas that artists are exploring at the intersection of creativity and computing.

An existential question that society is grappling with right now is how humans and machines will co-exist in the future. And one of the roles of art, maybe its most important role, is to force us to confront issues like this.

So while the idea of using a machine to make a song or an image or a novel or a sculpture without human intervention is at some level disturbing, it is also revealing. We expect that artists will push the envelope of what is possible with technology and we also expect that technologists and entrepreneurs will be willing collaborators in this effort.

Whether this will lead to interesting investment opportunities is anyone’s guess, but we think it might. And so we are going to spend some of our time and energy thinking about it and we’ve created a public space to do that. If you are interested in this area you can follow the thread and contribute to it here.