Making The Web Smarter

Our “six words to live by on the Internet” are open, global, mobile, social, playful, and intelligent. It has been suggested that we add instant, and I think that we may want to do that. But since we put this list together as a firm, I don’t want to add to it unilaterally without input from my colleagues. So I’ll stick with the first six for now.

Of those six words, only one of them is not a "done deal". And that is intelligent.

You could argue that too many services aren’t global and you’d be right. You could argue that too many services aren’t open and you’d be right. You could argue that too many services aren’t playful and you’d be right. But I feel that we are on a path to get there with all of those words. I’m less sure about intelligent.

It’s not for lack of trying.

The dream of the semantic web has been upon us for quite a while now. There have been hundreds of academic research projects, hundreds of approaches, and hundreds of startups working in this space. We have several in our portfolio like Adaptive Blue, Zemanta, Outside.in, Infongen, and one more we have not yet announced.

But this is a hard problem to solve and I don’t see a single clear path to getting it solved. And what’s interesting to note is that the most ambitious approaches have largely been failures. If anything, the more pedestrian approaches are showing more promise.

I spent Monday morning talking to the engineers at Zemanta. It was a great discussion and I learned a lot about how their system works. I learned some interesting facts, like how reliant the “semantic web community” has become on Wikipedia. Zemanta and many others use Wikipedia as a kind of expert system. For example, if a page is linked to from a Wikipedia page, you can be pretty sure that page is relevant to the topic of the Wikipedia page. That kind of approach can be used for many different tasks, all with the goal of making the web and web services smarter.
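Here is a minimal sketch of the Wikipedia link idea, assuming you already have a list of the outbound links for each Wikipedia article. It is not Zemanta's actual method. The function names, the topics, and the example.com URLs are all made up for illustration.

```python
from collections import defaultdict


def build_link_index(outbound_links):
    """Invert {topic: [urls linked from that topic's article]} into {url: {topics}}."""
    index = defaultdict(set)
    for topic, urls in outbound_links.items():
        for url in urls:
            index[url].add(topic)
    return index


def topics_for(url, index):
    """Return the topics whose article links to `url`, sorted alphabetically."""
    return sorted(index.get(url, set()))


# Hypothetical sample data: topic name -> pages its Wikipedia article links to.
sample_links = {
    "Jazz": ["http://example.com/miles", "http://example.com/bebop"],
    "Trumpet": ["http://example.com/miles"],
}

index = build_link_index(sample_links)
print(topics_for("http://example.com/miles", index))
```

A page that no article links to simply comes back with an empty list, so the heuristic says nothing about it either way.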

Tagging up pages, posts, videos, images, and other objects on the web is a critically important part of making the web smarter. Thanks to Google and the SEO industry, many web services have gotten religion about tagging. But tagging is not a simple problem either. It reminds me of speech recognition in some ways. If you are working in a specific domain, auto tagging is easier to do. Infongen does it well in the financial and pharma verticals today and will be adding more. Outside.in does it well in the geo domain with help from Zemanta and Calais.

But my experience suggests that humans are still better at tagging than machines. One important development is the idea of "recommended tags". Zemanta provides this to users of its blogging add-on tool. I never used to tag my blog posts. Then I started using Zemanta. It does not auto tag my blog posts, but it does give me about fifteen recommended tags and it’s simple for me to select four to six of them that are the most relevant. That’s an example of a hybrid man/machine approach that works really well.

I would encourage all content-oriented web services, whether it’s a blog platform, a video sharing platform, a photo sharing platform, slide shows, music, or whatever else, to add recommended tags to their service, via Zemanta’s API or someone else’s. It will vastly increase the amount and quality of tags that users will submit because it removes the biggest hurdle to user tagging, which is the initial exercise of thinking about what words are best.
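The recommended tags workflow can be sketched in a few lines. This is a toy version that assumes plain word frequency is good enough to produce candidates. A real service such as Zemanta does something far more sophisticated, and this does not call its API. The stop word list and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real service would use a larger one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "are", "this", "be", "by",
}


def recommend_tags(text, limit=15):
    """Machine step: return up to `limit` candidate tags, most frequent words first."""
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


def apply_selection(candidates, chosen):
    """Human step: keep only the tags the author picked, in suggestion order."""
    picked = set(chosen)
    return [tag for tag in candidates if tag in picked]


post = "The semantic web needs tagging. Tagging helps the web. Semantic tagging is hard."
candidates = recommend_tags(post, limit=3)
final_tags = apply_selection(candidates, ["web", "tagging"])
print(candidates, final_tags)
```

The point of the split is that the machine only proposes and the human only chooses, so nobody has to come up with the words from scratch.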

Another huge problem is figuring out how web pages relate to each other. Links do provide a basic connection mechanism from page to page. But that’s pretty rudimentary. I really like what our portfolio company Adaptive Blue is doing to collect all the pages on the web about a particular item, object, or topic. If you find a musician you like, Adaptive Blue can quickly connect you to the various web pages about that musician. They do this in a large and growing number of content and commerce categories.

And these are just some of the efforts I am most familiar with. There are so many more. One thing that I would like to see more of is collaboration between the various semantic web companies. If we are going to build a more intelligent web from the bottom up, slogging it out with mostly pedestrian approaches, then the biggest breakthroughs may come from standards and collaborations.

Just last month, two of our companies, Zemanta and Adaptive Blue, were part of a small group of companies working in this space that put forth the “common tag” proposal. Instead of describing it and getting something wrong, I’ll just link to the common tag initiative and you all can go check it out.

That’s one example of what can happen when companies working to make the web more intelligent start working together. I believe we can and will make the web more intelligent, but it is going to be a slog, and it is going to take every single one of us doing our part. When we come upon something or create something, we will need to describe it well in a language that machines can understand.

I am pretty sure we'll get there. But there’s no silver bullet and the solution will be a combination of many approaches working in tandem, hopefully in some semi-coordinated way.

I’m pleased that our firm has made this sector an important part of our portfolio. I can’t say that we’ve had any breakaway successes in it yet, but I think this is one area where patience, tenacity, and perseverance are going to be required for success.
