Making The Web Smarter
Our “six words to live by on the Internet” are open, global, mobile, social, playful, and intelligent. It has been suggested that we add instant and I think that we may want to do that. But since we put this list together as a firm, I don’t want to be adding to it unilaterally without input from my colleagues. So I’ll stick with the first six for now.
Of those six words, only one of them is not a "done deal". And that is intelligent.
You could argue that too many services aren’t global and you’d be right. You could argue that too many services aren’t open and you’d be right. You could argue that too many services aren’t playful and you’d be right. But I feel that we are on a path to get there with all of those words. I’m less sure about intelligent.
It’s not for lack of trying.
The dream of the semantic web has been upon us for quite a while now. There have been hundreds of academic research projects, hundreds of approaches, and hundreds of startups working in this space. We have several in our portfolio like Adaptive Blue, Zemanta, Outside.in, InfoNgen, and one more we have not yet announced.
But this is a hard problem to solve and I don’t see a single clear path to getting it solved. And what’s interesting to note is that the most ambitious approaches have largely been failures. If anything, the more pedestrian approaches are showing more promise.
I spent Monday morning talking to the engineers at Zemanta. It was a great discussion and I learned a lot about how their system works. I learned some interesting facts, like how reliant the “semantic web community” has become on Wikipedia. Zemanta and many others use Wikipedia as a kind of expert system. For example, if a page is linked to from a Wikipedia page, you can be pretty sure that page is relevant to the topic of the Wikipedia page. That kind of approach can be used for many different tasks, all with the goal of making the web and web services smarter.
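To make that heuristic concrete, here is a minimal sketch of the idea; the topics, links, and data structures are made-up placeholders for illustration, not how Zemanta or anyone else actually implements it:

```python
# A toy relevance heuristic: treat a URL as relevant to a topic if the
# topic's Wikipedia article links out to it. The link data here is a
# hand-made stand-in for whatever you'd extract from a Wikipedia dump.

# Hypothetical, pre-extracted outbound links per Wikipedia topic.
WIKIPEDIA_OUTBOUND_LINKS = {
    "Bob Dylan": {
        "http://www.bobdylan.com/",
        "http://www.rollingstone.com/artists/bobdylan",
    },
    "Semantic Web": {
        "http://www.w3.org/2001/sw/",
    },
}

def is_probably_relevant(url: str, topic: str) -> bool:
    """Return True if the topic's Wikipedia article links to this URL."""
    return url in WIKIPEDIA_OUTBOUND_LINKS.get(topic, set())

if __name__ == "__main__":
    print(is_probably_relevant("http://www.bobdylan.com/", "Bob Dylan"))  # True
    print(is_probably_relevant("http://example.com/spam", "Bob Dylan"))   # False
```

In practice you would extract those outbound links from a Wikipedia dump or its API, but the point is the same: the encyclopedia’s editorial judgment becomes a cheap relevance oracle.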
Tagging up pages, posts, videos, images, and other objects on the web is a critically important part of making the web smarter. Thanks to Google and the SEO industry, many web services have gotten religion about tagging. But tagging is not a simple problem either. It reminds me of speech recognition in some ways. If you are working in a specific domain, auto tagging is easier to do. InfoNgen does it well in the financial and pharma verticals today and will be adding more. Outside.in does it well in the geo domain with help from Zemanta and Calais.
But my experience suggests that humans are still better at tagging than machines. One important development is the idea of "recommended tags". Zemanta provides this to users of its blogging add-on tool. I never used to tag my blog posts. Then I started using Zemanta. It does not auto tag my blog posts, but it does give me about fifteen recommended tags and it’s simple for me to select four to six of them that are the most relevant. That’s an example of a hybrid man/machine approach that works really well.
I would encourage all content-oriented web services, whether it’s a blog platform, a video sharing platform, a photo sharing platform, slide shows, music, or whatever else, to add recommended tags to their service, via Zemanta’s API or someone else’s. It will vastly increase the amount and quality of tags that users submit because it removes the biggest hurdle to user tagging, which is the initial exercise of thinking about what words are best.
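To show what that integration might look like, here is a rough sketch; the endpoint, parameters, and response shape below are placeholders I’ve invented for illustration, so consult the actual API documentation (Zemanta’s or whoever’s) before building on it:

```python
import requests  # assumes the requests library is installed

# Placeholder endpoint and key; a real tag-suggestion API will differ.
SUGGEST_URL = "https://api.example-tagging-service.com/suggest"
API_KEY = "YOUR_API_KEY"

def recommended_tags(text, limit=15):
    """Ask a tag-suggestion service for candidate tags for a piece of content."""
    resp = requests.post(
        SUGGEST_URL,
        data={"api_key": API_KEY, "text": text, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("tags", [])

if __name__ == "__main__":
    draft = "Zemanta suggests tags for blog posts using Wikipedia as a knowledge base."
    suggestions = recommended_tags(draft)
    # The human stays in the loop: show ~15 suggestions, let the author keep 4-6.
    chosen = suggestions[:5]  # in a real UI the author picks, not the code
    print(chosen)
```

The design point is the hybrid: the machine does the brainstorming, the human does the choosing.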
Another huge problem is figuring out how web pages relate to each other. Links do provide a basic connection mechanism from page to page. But that’s pretty rudimentary. I really like what our portfolio company Adaptive Blue is doing to collect all the pages on the web about a particular item, object, or topic. If you find a musician you like, Adaptive Blue can quickly connect you to the various web pages about that musician. They do this in a large and growing number of content and commerce categories.
And these are just some of the efforts I am most familiar with. There are so many more. One thing that I would like to see more of is collaboration between the various semantic web companies. If we are going to build a more intelligent web from the bottom up, slogging it out with mostly pedestrian approaches, then the biggest breakthroughs may come from standards and collaborations.
Just last month, two of our companies, Zemanta and Adaptive Blue, were part of a small group of companies working in this space that put forth the “common tag” proposal. Instead of describing it and getting something wrong, I’ll just link to the common tag initiative and you all can go check it out.
That’s one example of what can happen when companies working to make the web more intelligent start working together. I believe we can and will make the web more intelligent, but it is going to be a slog, and it is going to take every single one of us doing our part. When we come upon something or create something, we will need to describe it well in a language that machines can understand.
I am pretty sure we'll get there. But there’s no silver bullet and the solution will be a combination of many approaches working in tandem, hopefully in some semi-coordinated way.
I’m pleased that our firm has made this sector an important part of our portfolio. I can’t say that we’ve had any breakaway successes in it yet, but I think this is one area where patience, tenacity, and perseverance are going to be required for success.
Comments (Archived):
The same topic has been on my mind lately. One problem with Zemanta-like proximity recommendations is that it force-groups pieces together, which leads to an echo-chamber-like connected environment. Disparate pieces get increased weighting, and then that weighting causes a self-fulfilling prophecy. I experience this on Last.fm all the time, as well as with Amazon’s recommendations. Yes they are good recommendations, but usually many relevant pieces get left out. And the pieces that get pulled in tend to be repetitive. In areas with long tails, this is a huge problem. Avoiding this is what brains are great at. And computers struggle to emulate this.
very true. we had that exact discussion on monday morning at zemanta. they work very hard to find different sources for this exact reason.
I was just asked to help a friend set up a blog, and I keep thinking about what kinds of products would be good for this blog (reviews of horror movies; let’s just say he is such a huge fan that he now sells signed prop items on eBay).

One of the reasons this keeps happening is that human languages have nuances and are extremely expressive (poetry), and we develop new ones (dialects and subdialects do end up turning into separate languages over time), while machine languages are noticeably mathematical and are often developed over time to be even more precise.

Semantic tagging is difficult by its very nature because computers don’t think in the same sense that humans do. We want things not to be the same, because that is the place where the imagination leaps and soars. That is why we create long tails to begin with, and why they will only get longer and longer.

In the world of humans, the sun is not just the sun: it’s also a sign of mysticism, inspiration for sculpture, expressions for math and light, a ball of plasma, a stuffed toy that someone had as a child, and the list can go on forever. Meanwhile, the semantic tag for a computer would probably just be SUN. And that is just English.
I agree. One approach to this echo-chamber problem is to generate recommendations not as an ordered LIST but as a GRAPH, which has a core and a periphery. Then let the users make a more sophisticated decision using the topology of the graph rather than simply using the dictated order of the list.
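A minimal sketch of what “graph instead of list” could mean in practice; the similarity scores and the 0.5 threshold are made up for illustration:

```python
# Turn pairwise similarity scores into a graph instead of a ranked list.
similarities = {
    ("item_a", "item_b"): 0.9,
    ("item_a", "item_c"): 0.6,
    ("item_b", "item_d"): 0.55,
    ("item_c", "item_e"): 0.2,   # weak edge, dropped below the threshold
}

THRESHOLD = 0.5
graph = {}
for (x, y), score in similarities.items():
    if score >= THRESHOLD:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)

# A crude "core": the most connected nodes; everything else is periphery.
core = sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:2]
periphery = [node for node in graph if node not in core]
print("core:", core)
print("periphery:", periphery)
```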
Core-periphery graphs may be great at displaying recommendations – I suspect there could be other innovative information visualization techniques. What makes this idea very interesting is that it lets the user (human) infer from and interpret a large set of data, rather than making a specific recommendation. Who’s working on this stuff?
I like that idea.
“But my experience suggests that humans are still better at tagging than machines.”

I don’t think this will change any time soon. There may be a happy medium between the two, but I think the natural human element will always be most effective. We have the tools to aggregate the collective opinion… that’s where the technology impact should focus!
The topic is very interesting. I hadn’t heard of the companies you mention until now, and I find them interesting. One question related to that: could Zemanta’s APIs learn from your indications, e.g. in a “supervised” way, applying this knowledge to later recommendations?
Serendipity is also key – tags can capture/discover the objective, but to be truly intelligent – e.g., EQ as well as IQ – we also need to capture and discover the subjective. And that’s the hard bit.
Well stated, egoboss. The relations between tags, and even more obscure analogies, require a surface that machines can learn from. Linking and image insertion may help machines identify analogies. But the images and links require tags (metadata) to help the systems learn.
That’s why I like user tagging like delicious
Why are they not invited, or are they being invited under Yahoo’s platform? They have, of all groups, a lot of insight into how to guide people into a collective nudge decision on a tag without being mean.

Delicious is insightful in allowing any tag to be used as long as it doesn’t go over their character limit (which I believe is very large and hard to exceed). No word is off limits, nor are groups of words. Anything is possible, from emotions (evil) to the most basic description (bluebackground). So you can create a very rich tag cloud if you so choose, or a very simple one.

However, they imposed a limit in scope that will be useful to semantic tagging: they show you your related tagging history and they show you what everyone else is doing on the same page. Even though they allow for tweaks to make things better, most people are likely to just reuse the popular tags that they see, developing a semantic tagging history that is “logical” without being totally tied to the system, in case something more “logical” comes and takes its place. (They’ve done studies on this: if you show people what others are using, they are likely to choose the same; there’s one involving music downloads, see here.) After a while, the tags become semantically useful by nature because of Delicious’s system. Everyone seems to think in their own heads that Delicious tags are the correct tags (or group of tags). So I really hope that they are in on this one.

This is among the many reasons I’m extremely impressed with them and the way they’ve decided to construct a website, a technology, a web being. They are a case study of good web technology from a usability standpoint in almost everything they do. They understand humans. I am extremely happy with them because of the processes they use. You should be proud Fred. 🙂
I really regret that we sold delicious. What a great asset it is
Comments like this make me think and wonder if being a VC is like being a parent: the companies are a bit like children to you, and it is good to hear news of them flourishing, the way parents do about actual children.
That’s exactly how I think about it
The web will never be smarter for several reasons:

1) People still create the websites. That means they are subject to all kinds of human frailties: ego, arrogance, naivete, ignorance, and more.

2) The system can be ‘gamed.’ You mention how Wikipedia is at the core of the current semantic efforts. This has two problems – first, the ability to have anyone edit a page means unrelated links could be added; second, the reliance on self-identified experts means it is possible for related links to be deleted because the ‘experts’ don’t want the link for one reason or another.

3) Efforts to impose any kind of “common tag” system will not work because tags (like links) only have meaning assigned by the humans making the tag. I could write a blog post on President Obama and tag it with “quantum theory” … which may mean something to me, but not the real meaning of the word. Common tags require common definitions and thus some kind of standard by which they should be applied — which means using them has to be policed in some way.

I’ll make the bold prediction now that any real attempt to make the Internet “intelligent” will ultimately fail.
But unlike human tagging, automated tagging can be self-consistent. In addition, weighted correlations can be generated with probabilistic models to create an instance of “meaning”. Discounting machine learning is naive; after all, aren’t we all just complicated biological machines?
“I could write a blog post on President Obama and tag it with ‘quantum theory’ …”

Yes, but if everyone else gave it meaningful tags, yours would be less relevant. This is why I say that all the tagging systems should be connected via some sort of open tag exchange, where the value of what you get out is worth more than the value of what you put in.
And why user tagging is key
Altruism is evident when the web works best re: tagging, recommendations, links, etc. This is good. If everyone was just a little bit more altruistic, we’d all benefit – and most importantly in life itself, not just in ‘net life terms.
The more important issue is: why on earth should users ever tag? Isn’t it hard enough to get people to do one thing on a given website that we now have to have a call to action to get them to tag? Relying on website visitors to tag is wholly unreliable because it’s too much work.
Unless you make it easy or even fun for them
You are essentially saying any “intelligent” Internet is gated by human capabilities?
I think the things you label as frailties are what make us, as a collective humanity, strong. We look at the stars, the ocean, the internet, etc. and, through those very qualities, push to change our environment in order to perhaps better our lives and better the world we were brought into. Our very humanness is what gave you this comment system. Someone had to sit there (hi Daniel and his team) and think through a bunch of problems, and wonder about them, and think about how people are using commenting systems, and a whole slew of other issues involving them, alongside all the emotions involved with this. And now you have a commenting system. Never underestimate the power of emotion in logical decision making. Without it, you would be totally crippled in deciding between a cheese sandwich and a tuna sandwich for lunch.

I was brought up in a very religious environment. Whatever I think of religion may be irrelevant, but it did give me a very powerful message. When we think of most major religions in the world (barring some of the complexities of Eastern religions, which I do not fully understand; I would need to ask a friend or two who study them professionally), we think of creator-beings. In some stories they have hubris, in some they are more abstract, etc. What makes humans unique in these stories, always, is that, much like the Creators, we also have the ability to create, including the ability to create meaning.

Our internet is a reflection of the common human soul. It won’t be intelligent per se, but it will be extremely interesting because it is much more dynamic than anything previously thought of, since it will probably be a truer reflection of who we are.
Well I said it wasn’t a “done deal” but I sure hope you are wrong. We are betting on it, literally
If you are betting on it – then you need to determine how you can account for pride, greed, gluttony, sloth, envy, anger, jealousy, hatred, naivety, ignorance, apathy, and more.

Just like the alchemists who practice “search engine optimization” and similar voodoo, there will be “semantic web optimization” consultants who attempt to game the system you are creating.

Humans suffer from base emotions and the ‘deadly sins.’ They are easily persuaded (see conspiracy theories). They prefer to have their own beliefs supported rather than questioned (changing the Internet into a series of echo chambers).

To truly succeed requires more psychology than technology.
I very much agree with that last point
Just a follow up Jim since you got me thinking. Semantic Web, Can it Happen? I’m a big fan of how opposing viewpoints can spur on more serious thought and I look forward to getting a stronger vision of what is and isn’t possible in this arena.
Once the semantic web takes hold, I think we’ll all be working for machines. That’s why I’m much nicer to my refrigerator these days – no slamming, keep it clean, etc. I’m hoping the fridge will pass the word on my efforts.
Good thinking. Maybe they’ll spare you when the time comes.
Tell that to the fridge.
You don’t have to worry about the fridge. Just think about all the times you flipped the ‘off’ switch on your PC without shutting it down properly. Did you not hear it cry out in pain? Did you really think nobody was watching?
This thread amuses me. Trust me, what you really want in a computer is one that works seamlessly with you, much like the majority of computers in your car. You don’t notice they are there. They work to help you drive; you do mostly unconscious driving, except when it is conscious. Everyone is happy.
Markoff had a piece in the weekend NYT about that fear
He should know. Markoff was a range stove before the NYT. 6 burner, radiant heat. He was a looker but I didn’t swing that way.
You’re writing as if you were on a mission to help the web. It gives me the impression of a “non-profit” organization trying to solve some Internet issues. But after thinking a little more about it, I realize how great it would be for Internet business if, indeed, the web were smarter. I’ll try to keep that in mind: larger views can bring bigger business. In other words, I loved your post.
I think it’s best to invest with a mission
I like that, I don’t know why. But it makes me happy all the same.
After chasing linguistic tails and spinning semantic wheels for months, we realised that it is not the tools or techniques that make this space so cool, but the terrific applications these insights open up. It is all too easy to get caught up solving really fun, difficult language problems. It can consume a startup and eat it whole. Yet right now there is so much low-hanging fruit lying everywhere, you don’t need to be a professor to yank it from the tree.

Examine a random piece of text and try to identify a song title. Impossible! But look for the right context, look for the text in the right place, and picking a winner is easy.

Great post again Fred. Enjoy!

Stephen Phillips, founder / programmer
http://wearehunted.com
http://celebrityhunted.com
http://wotnews.com
Wearehunted is proof that you are doing something right!
After reading what you say here about tagging, wikipedia, and “portfolio” services like Adaptive Blue, I wonder if what you are calling a more intelligent web is a more homogeneous — and therefore less playful, less diverse, less spontaneous, less complex — web. A new orthodoxy about, and enforcement of, what really matters seems to be at work here. In that case the word “intelligent” begins to lose some of its force.
That’s a real concern
Yes, and it’s easy to mistake efficiency for intelligence, or to confuse the two.
A problem is that the meaning of something, and the corresponding tags, can be different for different people at different times of the day, or based on a different mood or context. This makes the ‘perfect’ solution very difficult if not impossible to reach, although as we ‘humans’ are not perfect, perhaps the web should not be perfect either.

I think that we are still going to need, for some time, to use some form of human input and crowdsourcing, but we need to focus on how to make these options more relevant.

In my view, design and usability can help a lot. Replicating the way people use the web and predicting what they need is actually not that difficult as long as we take a small part of it at a time. Building a solution for a specific sector (e.g. travel) and then using the lessons learned for different sectors is in my view a much smarter approach than trying to come up with a clever algorithm that does it all.

That is what I am working on anyway.
The ideal state is a mix of human + machine. The machine can’t tell what you’re thinking, so a user must at least give it some direction, unless you want the machine to learn from your clicks (which I know you’ve been asking for), but that path might be paved with long trials and errors initially. I’d rather give it a quick start with human input, then let the Web refine itself for you.

Common tags might alleviate the repetitive tagging problem, but they don’t solve the hierarchical aspect, which serves to classify tags into the “entities” they belong to. Current folksonomies are a flat list, and that’s an impediment to smart contextualization of information. That’s very important for providing some sort of intelligence for discerning between content itself and its meaning. It’s the meaning of content that can lead to intelligence, not content itself.

At least, that’s the way we have approached this at Eqentia (a semantic aggregator platform), where we develop the taxonomy first for a given context (whether it’s a subject matter or a person’s interests), and we let that guide how content is tagged and organized for that context. We’ll be adding later the type of ongoing refinements that result from learning from clicks.
I find this an interesting topic. Zemanta, I feel, is at the beginning of a great journey into the world of effective communication. When I worked, communication was very important, and my first word processor was XyWrite. Later I switched to Nota Bene http://www.notabene.com/ which is built on the XyWrite engine. XyWrite is a word processor for DOS and Windows modeled on the ATEX mainframe program. Popular with writers and editors for its speed and degree of customization, XyWrite was in its heyday the house word processor in many editorial offices, including the New York Times from 1989 to 1993. XyWrite was written by Dennis Erickson and marketed by XyQuest from 1982 through 1992, after which it was acquired by The Technology Group. The final version for DOS was 4.18 (1993); for Windows, 4.13. And of course I had the OED http://www.oed.com/ which I still use.

I can see my future Zemanta as a combination of all the best writing and research tools. As the content creators on the internet fight for a place in our bookmarks, the future wordsmiths will demand even more from their applications. Using WordPress, Drupal, TypePad, Joomla or more with just a simple editor won’t cut it; the Kings of Content will have to wear a new crown. Fact checking will be less ignored, and the ability to hold the reader’s attention will need relevant, publishable data attached. And Zemanta, I feel, will find first place.

Among my favorite bloggers are people like Louis Gray, Seth Godin, Dave Winer, Shel Israel, Frank Eliason, Albert Wenger, Alex Iskold, Micah Baldwin, Steve Rubel and more.

Adam, you might look at Nota Bene; it will show you what a top-notch machine-developed app can create, especially if we are doing research. We have tools, but many of us lack certain skills to create effective communication. But then I have lived for 73 years as an optimist with a clear conscience! Comments are a wonderful way to start a day!
I hope you are right about zemanta
I’ll bet you a pastrami from Katz’s
Done
Looking forward to the layers of the Internet helping in this domain. Zemanta can keep pushing the correctness of their semantic extraction tool and perhaps include human feedback to improve tag matching. But the essential tags need weights as well, and some form of relationships. Saying I dislike company or product X should be meaningful to the way we describe information to machines, maybe as simple as a NO pretag. Machines need to get adjectives and simple relationships before a fully semantic web can be realized. This is a hot topic for me but I’m mired in authentication and setup before I can dig into the fun stuff.
Tagging is one way of adding more context. And context is key. I think not all context can be aggregated by use of technology (face recognition, friends lists, geo info, EXIF data, time, cross-checking event databases, pattern recognition, OCR, etc.). With all the innovation going on on all fronts, people will have more and more time on their hands. Tags should be really easy to add, easy to manage, but most of all should give the individual something to gain (organizing, becoming part of something bigger, claiming a domain… etc.)
I wonder if the goal of semantic web intelligence is (or should be) actually less about intelligence independent of humans and more about the intelligence of the web having a cerebral cortex that IS human. That the web, like left and right brains, should not be a complete intelligence in and of itself but a collective intelligence that includes humans in it… on some occasions and types of activity, little use of that part… and for other intelligence, a large use. This may make the focus not about recreating a massive intelligence with judgment on moral or artful issues and more about ways to better do the math of collective human intelligence. I don’t know… but I think it may affect how we approach it.
Common Tag will never work. Should put it in the trash where it belongs. Taxonomy on domains is what matters. Not some artificial, free-but-not, construct.
Nice discussion – I just wonder if we as “humans” are ready for this – not just in web services either. After several of your last posts, I thought you would have included “free” as one of your words. I think I would rather see “people” be more intelligent – wish there was something that could make me smarter!
Not free, but maybe freemium 🙂
Hi Fred

Apologies for just posting a couple of links, and hoping that they add to the conversation (which of course I’m happy to participate in in a less linky fashion). The links are to a couple of things I wrote on this subject: The semantic web is the new AI (http://bit.ly/lulZ0), and a followup on the importance of data representation (http://bit.ly/QNU8i).

The basic claims are that, as you suggest, we’re probably better off focusing on simpler engineering things that we can actually realize (i.e., be pragmatic about it, like Alex is doing with Adaptive Blue; don’t try to really implement “intelligence”). Then, that the goalposts are historically moved in any case (i.e., plain engineering can get us there, as it did with the claim that you need “intelligence” to play great chess – you clearly don’t, you just need better engineering). And 3rd, my own bet on how to get there, which is to adopt a better underlying information representation – one that makes some things that today look like problems simply go away.
Hi Terry,

As Shrek said, ogres are like onions – they have layers. So does semantics.

At the top level is the ‘easy’ semantics: tags, RDF, etc. Humans resolve the meaning and ambiguity, and machines just do the bookkeeping. As you peel back the layers, the machine’s level of understanding increases, until you get all the way to the Turing Test.

As you point out, a dumb chess-playing machine can give the appearance of intelligence simply by manipulating rules input by humans. This is analogous to the first layer of the onion: the machine has no underlying understanding, but gives a credible account of itself simply by virtue of its ability to manipulate large amounts of structured information. Given the current and foreseeable state of AI, all practical/commercial semantic implementations must similarly be based on the outer layers of the onion.

At which point I would argue that it is commercial, rather than engineering, issues which are holding back the sector. There is already a vast quantity of tagged information out there, but there is very limited scope for accessing it via open APIs and aggregating it into a larger and therefore more useful entity. Just as the chess-playing computer gets better when it has access to more chess data, so a tagging system improves with the more semantic data at its disposal. It comes as no surprise that Zemanta harnesses the breadth and scope of Wikipedia, for example.

Somebody should build the semantic equivalent of Twitter – an open system for the exchange of metadata and semantic taxonomies. On second thoughts, this isn’t too far from what your database can do, or am I wrong?
Hi David

First off – partly to be provocative, partly for the record, partly in earnest – I don’t believe there’s any such thing as understanding. I.e., that it literally does not exist. It’s just a word that we romantically like to think corresponds to something in the real world. I was going to point you to my comments on Gerry Campbell’s article on Big Honking Databases http://bit.ly/NBM4 but then I realized that we had that discussion over there already :-)

On a more substantial note, I agree with your conclusion, and yes, FluidDB will offer a way to do as you suggest. I sometimes describe it as a metadata engine for everyone and everything. As you say, more uniformity in data leads to more power and utility. That’s enough for me. People can call it intelligence and understanding if they like, I just prefer to keep my feet on the ground and describe it as better engineering.
I really think you should hang out with the Rhizome people and write to Dr. Ben Fry http://benfry.com/ He’s one of the two creators of Processing, which is an art-based computer programming language built on the Java Virtual Machine (though there have been wrappers for other purposes/languages). He wrote a book for O’Reilly called Visualizing Data, and one of the questions it answers is how to get rid of junk in a visualization. Art in this field is extremely interesting, and talking to one of its leaders will be helpful.

You’d make a wonky but good match as you work out semantic vs. database issues. And you make me want to go read philosophy (any good translations of Heidegger?)
Thanks Shana. I know of Processing, and that it’s thought very highly of by folks in the graphics world. I’ve not met Ben Fry though. Thanks for the pointer. Yes, you can read Heidegger in translation, but it may not make much difference 🙂 His “Being and Time” has been called untranslatable. See if you can have a look inside it online before you lash out and buy a copy.
Thanks, and yes, Processing is thought highly of in the graphics world, but I am seeing some crossover with some comp sci friends who want to see me make the move into comp sci land. It is hard to find languages and ways to teach where you get visual feedback for what you are doing; Processing is one of the few languages that gives that sort of feedback.

Whether something can draw meaning when it is not so easily there is a huge test in art and its creation. This seems to be an issue running closely parallel to the idea of semantic tagging: can you draw a meaning onto something and label it, even if it is not immediately apparent?

There is a concept known as visual language, or how objects and symbols in art actually make meaning without text or words behind them (or even with, if it is text art, in a sort of wonky way; see Jenny Holzer for a good example). If we talk about semantics, drawing on a world where semantics are difficult, but where there is a noticeable difference between good and bad semantics, would be useful. Asking the same sorts of questions that are normal in a critique environment for semantic tagging might prove to be the answer needed.
Underpromise and overdeliver
It is why art applications on this level are extremely interesting. The Wikipedia Art controversy is a great example, among others, of how far this can go (they tried to make an artwork on Wikipedia that was self-referential and that anyone could edit…). Take a look at Rhizome at the New Museum, or people who like spoetry. Is it you who sees a soul in the machine, or is it your soul being planted in the machine for all the world to see?
It’s hard enough envisaging intelligent machines – let alone ones with souls!
When you design something you stick a little bit of yourself in it, because you stick a little bit of your own perspective in it.
Thanks Terry. I’ll check out the links
I don’t think the web is inherently more playful than anything else human, and that term may trivialize what is going on here. One important facet of the web is that many people’s motivations for doing whatever it is that they do here are extra-market: not directly for financial gain. There’s more of a gift or tribal economy at work, but it’s just as serious and purposeful as anything else we are doing in the world.
Maybe. But look at how the “game of twitter” has made it more fun to use than an RSS reader
great comments – my brain is bulging…
To address your starting point… if listing “six words to live by on the Internet,” I think I’d have to include “free.” Fred, I think that you would have done so at one point. Have things changed? I don’t think so, but I’d be interested to hear the opposing view as of mid-2009.
I’ve been promoting what has become known as freemium for a long time. So I am not a so called ‘freetard’. But I do think free is the dominant model of the web
You can’t stop at six words. Seven is the number of completeness. 😉
Ok then we’ll add instant shortly
personally i prefer six as a number of completion, symbolizing the reconciliation of opposites (i.e. two opposing triangles creating a six sided star — i.e. star of david)

also in judeo-christian mythology the universe was created in six days, now that we are creating a new universe on the internet, perhaps the number six is relevant again.

some folks say six is the number of the devil (i.e. 666), although i think that is a misinterpretation of some biblical passages… or at least not an interpretation that i personally favor (at which point those folks say i worship the devil… lol)

what number is best for your firm though really depends on the numerology surrounding your firm, IMHO.
I’m not into numerology very much. I actually don’t much go in for anything that ends in ology
Technology?
Great post, Fred.

I’m a fan of each of the companies you’ve invested in with open editorial APIs—outside.in, Disqus, Zemanta. Each one is creating a layer of useful commercial applications, whether it’s at the geo-local mashup level (per outside.in), distributed commenting (per Disqus) or semantic relevance (per Zemanta). They’re all useful in raising the back-end game, whether it’s for the single user (blogging) or in an enterprise-wide CMS. I also agree that there need to be bridges built between taxonomy and tagging—finding the hybrid solution between vocabularies of meaning and user-based tags; it seems to be something a lot of people (myself included) have been thinking about (for an example, see this pretty cool dek http://bit.ly/LJRhd). Common tagging is way cool.

But I’d like to make another point that goes to the core of your comments. Making the web smarter, whether it’s with these tools or other variants of semantic technology that share a similar use pattern (e.g. Open Calais), won’t do a damned bit of good unless there is content strategy leadership at media companies and agencies recognizing the value of these strategies in organizing their digital for increased customer service as well as growing revenue. (Wait: did I just say customer service and media in the same sentence!?) We talk so much about why newspapers are failing and the absurdities of (for ex.) AP’s new link policy, but rarely get into the strategic opportunities for media companies to create waves of innovation (and invaluable new layers of content and tools) for customers and revenue growth by using these technologies.

To put this another way: you can make the web smarter via cool tools, but unless people at media companies use ’em and make their own sites more intelligent with better content strategy—more useful (more use-case tested) and more relevant on the front-end (more than just a blinking cursor in a SERP textbox)—we all might as well stay home. Innovation has to take place on the front-end as well as on the back-end for real intelligence to grow. And unfortunately there’s still precious little take-up of this idea at most media outlets.
So true. We gotta use this stuff
I don’t think most people realize how hard user testing is. I wish I could run a user testing company for a living, where I profile users based on a huge number of factors: device accessibility, learning curve, psychology, disability access, child access, the list goes on.

Doing alpha/beta isn’t enough either, because one of the things you notice on the web (actually in general) is that people don’t have to use a product as intended AT ALL. Doing front-end media testing has to include ways of watching and measuring this, as well as predicting trends (humans sort of adapt to their environment, sort of don’t).

It’s a huge issue that has not been resolved and involves a huge number of actual people rather than computers. I get lots of good answers about tech, for example, from just watching people on the subway and then going up and asking what they are doing and how they feel about it, because it is nearly impossible otherwise to get a good answer about real-life usability without paying money, and even then, you get some weird results that you could see for free on the subway, or in the park.
Well, user testing is only part of what I was referring to. I was actually thinking of what journalism is, what news is, what the unit of a story consists in, and what the business model of that unit is, particularly in terms of the trade-offs between content creation costs and user need.

Everyone talks the game about aggregation and curation, but hybrid taxonomy/tagging (e.g., Zigtag) and link acquisition may lower newsgathering costs substantially—and in the Jarvis model (do what you do best and link to the rest) provide a nearly good enough substitute for traditional news operations.

The problem is that we still think of narrative units as stories and have done little to no innovation in thinking through the news event as a data nexus. There are amazing tools to help writers in finding out, semantically, prior related news instances (a kind of “author memory” if you will), but knowing how to incorporate those into a story—and how to encompass the tentacles of a story across the network—is something we just don’t know how to do yet.

So again: this is great plumbing, but how do we innovate on the front-end?
Interestingly, I was thinking about this subject late last night. http://www.shanacarp.com/es… I thought it was important to look at the last time newspapers took on technological change. In certain specific ways, the story of the telegraph and newspapers has some rough parallels, so it bears the review. (It is definitely worth looking into the story of the AP; just a great story.) If they found a reason and a way to co-opt the aggregation tools of the late 19th century, they can figure out how to not panic now.
My comment is less about common tags than about the six words you “live by” and, more importantly, the seventh that has been proposed, which is “instant.” I would venture a quick comment that a requirement for instant will hamstring your internet. I would propose “agile” instead. Where instant is a requirement for agility, instant will be found. But it won’t force you into instant. There is nothing wrong with asynchronous on the web, and instant will kill asynchronous. Agile won’t though.

Interestingly enough, I find myself brought back to suggested tags by this thought, as there is no instant requirement for a suggestion. In fact, one way for a Zemanta-like company to grow would be to crawl content linked from its customers’ sites and send automated “suggestions” to the authors. Or even to suggest to their customers that they suggest tags for linked sites as a way to grow…

Either way, agile, but not instant…

Ben
Good counterpoint on instant. Thanks
Here’s the key part from your pointer to the Semantic Web – “making it possible for the web to understand and satisfy the requests of people and machines to use the web content.”

At its core the web is a client-server mechanism – just like two cans and a piece of string. For the web to understand the requests of people and machines it requires “more data”. Think back to the Union Square blog on Mobile where I pointed to RFC 2616 and specifically section 12.1. The protocol (the string between the two cans) is responsible for sending data – however there is a fixed amount of data inside the headers which the web server (one of the cans) uses to understand the request of the “people and machine”.

The beauty of RFC 2616 is that the protocol (the string) is extensible, i.e. you can add to it. The problem is how do you do that in a way that people and machines can interact with seamlessly? The web server needs more “semantic data” – in other words it needs more “meta data”. The place to insert that meta data is inside the headers themselves. The protocol allows you to do this – the problem is how do you do this in real time and make it compatible with the roughly 1/4 billion web servers on the planet and the billions of connected devices?

For the web to become smarter you have to look at what runs the web – that’s the place you start. It’s pretty obvious that the next issue to solve is how to make the protocol (RFC 2616) smarter by adding more data. Section 12.1 says that there are some big disadvantages to this. To that I say nonsense – simply add the data, e.g. my latitude, my longitude, my speed, my direction, my everything, to the outgoing HTTP request. All the server has to do is read it at the other end. Once you have that meta data then you (the operator) can make the web smarter.

It’s all about the meta data. We’ll shortly be releasing a free version of our software which does exactly what I describe above.

Peter
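A bare-bones sketch of the idea described above, passing context as extra HTTP request headers that the server reads on the other end; the header names and values are invented for illustration and are not any vendor’s actual implementation:

```python
import requests  # assumes the requests library is installed

# Invented header names; a real deployment would standardize its own.
context_headers = {
    "X-Client-Latitude": "40.7359",
    "X-Client-Longitude": "-73.9911",
    "X-Client-Speed-Kmh": "0",
    "X-Client-Device": "example-phone",
}

# Client side: the extra metadata rides along with an ordinary GET request.
response = requests.get(
    "http://www.example.com/search?q=pizza",
    headers=context_headers,
    timeout=10,
)

# Server side (e.g. a WSGI app), the same fields show up in the environ
# with the standard HTTP_ prefix, and the server can act on them.
def wsgi_app(environ, start_response):
    lat = environ.get("HTTP_X_CLIENT_LATITUDE")
    lon = environ.get("HTTP_X_CLIENT_LONGITUDE")
    body = f"results near {lat},{lon}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```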
Sounds like something interesting peter
I’ve been thinking more about your post, specifically the title – “Making the Web Smarter”… You’re spot on in the title; what I’m focusing in on is the “web” part. The web is RFC 2616 (plus some other stuff). That’s what you “have” to make smarter. Everybody can add tags etc. (which is just more data) but the user has to remember all that stuff.

What you really want to do is make the protocol smarter. Tim Berners-Lee (who you reference above) is pretty specific when it comes to his design. Firstly there is one Internet. He does not want it segmented into a Desktop Internet and a Mobile Internet etc. So that leaves the thing that joins all of us together.

What we’re suggesting (and about to release) is a way to make the protocol smarter (more contextual) without having to change any of the current infrastructure. That way the web server (the machine at the other end) can do more of the heavy lifting because it will have access to critical data to help determine what kind of services should be delivered to the client at the other end. Bottom line – think of any piece of really useful data (lat/long/device ID etc.) and you can add it to the web protocol in real time. Now you’ve just made it smarter. What you do with it at the other end is up to you.

Here’s an interesting example – let’s take Search and Voice, which are essentially Content and Contact (perfect on a mobile phone). Now imagine you fire up your mobile browser and go to http://www.searchengine.com – the protocol that connects you two has lots of new data in it. The search engine can use that data to deliver more personalized and targeted services. Now imagine you click on the browser menu and, because the search engine already knows Who, What and Where you are, the menus change dynamically to accommodate a new service that you signed up for – Voice. In the menu there’s your Voice number (separate from your mobile phone number). You click on it and a new web service fills the browser page – what’s “smart” is the fact that this new service has access to all the same meta data that the search engine had – why? Because it’s in the protocol.

TBL designed the protocol to be extensible/smarter. Once you extend it with meaningful data there is no end to the services you can offer customers. (BTW the example above is already designed and working.)
I agree. The web is the platform
You run some interesting risks here. Not everyone wants all this data exposed, remember that.
Amen to that – that’s why when we designed it we built in something we call “protected field encryption”. You (the consumer) now have complete control over what you share (right down to the field level) and, to ensure your privacy, you can choose to encrypt it and only send it to an approved, white-list-controlled web site.
My smart/stupid question involving anything that says “trust me” and “encryption” (though you don’t have to answer it right this second – but think seriously on it): No encryption is flawless. That’s because there are humans using it. And humans who will want to abuse it. And humans who are careless too.

Is “adding more data” to the stream the right answer when my college still actively takes pictures of the professor who looks like a pirate in order to warn against phishing? I work with the assumption that there are days when I may need protection from what I want, for it may be a bitter, barbed pill in reality.
You pose an excellent question, and one we’ve spent many years thinking about. Ultimately it all boils down to one simple little word – trust. And the more you trust, the more it drives transactions.

We knew that people would have a problem with encryption so we built a set of open APIs so you can add your own encryption (you don’t have to trust ours, even though it’s 3DES). On top of that we worked with another simple design principle – only share the data with people you trust – so every field of meta data has a check box next to it. If you trust the web site then it will send the data; if not, then uncheck it. We then built in a white list so you could only send data to certain web sites – but we knew people would want to send it to everybody, so we have a switch (Windows Mobile version) that allows you to send your data to everyone.

There is no silver bullet here. Ultimately it’s all about “me” and what I want – we’re all different and we all have different degrees of trust. What we’ve attempted to architect is the most flexible, extensible, scalable solution we could, one that aligns the two entities: the consumer (you and me), who wants convenience, privacy and control, with the enterprise, which wants to make money (but doesn’t want to change its existing learned behaviors when it comes to running a web service).
Thank you for your answer. Well, I looked up part of this post (what is 3DES, says the ditz in me). And I know something of design. And I know my email inbox. Trust comes with understanding. :)

How many boxes are we talking here: 3? 24? One for every website? How do we build a whitelist? Are some sites already included? How do we exclude sites? Are there sites we want to exclude sometimes, but not others?

The basic question is this: will my pretend high school student, Chris, be able to understand what is going on, will he want to check the boxes, and will his parents be ok with the information he gives out? (He’s only 16, you know, and they are worried about how much time he spends and what he is doing on the internet; they want him to get a sports scholarship, and it’s on a shared network, which they don’t fully understand, they just like that it works.)

Remember to think like the people on the ground – are they really going to check the box and understand what the box means? Or is this better off passive, and if so will people be angry at you? Don’t get yourself in a hole here – there is a good idea in here. And if so – why?

(You don’t really actually have to answer this in full, just something to think about)
Shana,

We’ve designed the solution to be as flexible as possible. It’s eminently customizable, so if you only wanted three boxes that could be done. Right now there are multiple boxes under each set of preferences (Owner, Location, Device, Search & Advanced). For example, we’ve integrated 7 search engines (our own demo, AOL, Google, MSN, Yahoo, True Local (Ca) and Yellow Pages). If you have our software installed and have activated the check boxes to share real-time location information, and then navigate to one of those search engines, they will receive your data.

Building a white list is as simple as adding a favorite. Let’s say you want to send your data to Fred at AVC… all you have to do is enter http://www.avc.com in the white list and when you go to that web site Fred gets to know Who, What and Where you are (assuming you’re sharing the data with him). There are currently 24 options for the white list. If you want more than that we can add more fields, but most people never go to more than 24 sites. On WM you even have an option to share your data with every site you go to.

Regarding your comment on “are they really going to check the box and understand what the box means?” The answer is I don’t know. Do people really understand what happens when they click on a link and all of the data that moves in the background?

We started with a simple premise… people want convenience first and then privacy second. We’ve made it as simple as we can – the default is to behave like the current browsers do and send no data… you can then share as much as you feel comfortable with.

To me it’s all about the customer experience – it has to follow the 0-1-2-3 rule: zero behavioral changes, one log-on, 2-second response time, and 3 clicks to relevant content. If you can do that in a cross-platform environment then I believe it has value. The good news here is that as you add more “meta-data” to the Web, the smarter it becomes about Who you are, What you are (the device) and Where you are.

After that it’s all semantics.
It’s about assistance. Computers assisting humans on different levels at different tasks. Content discovery (Adaptive Blue, Outside.in) and content creation (Zemanta) are both great places to improve the level of help computers can offer. Web search already provides everyone with superhuman abilities in information processing! Now the question is what other activities can be massively improved (revolutionized?) by new approaches.

While most of the web until now was concentrated around sites and services, a new wave of web technologies has been created that helps with getting stuff done faster and better, wherever you are: Siri, Zemanta and others. This was a largely overlooked area of technology until recently. The question is not can we deliver a fully intelligent agent, but can computers help more than ever, because they have better data than ever, better algorithms than ever and better deployment options than ever.

The semantic web and tags are one part of the story: they mean better data. Web 2.0 also brought better and more data than ever – Wikipedia being a great resource, but there are also others: MusicBrainz, Freebase, IMDB, Amazon; all these are relatively accessible databases about the world of human affairs.

There have been some breakthroughs on the algorithms front lately and that is what excites me. Full and complete understanding of natural language, computer vision or speech recognition are still very very far away. But advances from recent years have a chance to massively improve how human-computer interaction works. The question is which existing behaviours (and business models) will be disrupted by all these changes.

Andraz Tori, CTO at Zemanta
Do you think full recognition of natural language is possible, given the nature of dialects, subdialects, and language tics?
It’s funny: when you are dealing with datasets that you use for computers to learn from (for example, texts marked up with all mentions of places, people and companies), there’s a thing called “human-to-human agreement”. You basically give different humans the same seemingly easy task that needs ‘understanding’. Then you see how similar their results are. You’d be surprised how different those results can be. These tests show a limit that humans have on understanding each other [without additional conversation to refine it]. And those are the upper limits, and the computer state of the art is way below them (depending on the type of task). So for full understanding of natural language in all cases, you are placing super-human requirements on computers.

It’s way too early for such questions. :)

Bye,
Andraz Tori, CTO at Zemanta
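To make the human-to-human agreement idea concrete, here is a toy calculation on fabricated annotations; real evaluations use more careful span matching and metrics:

```python
# Two annotators mark entity mentions in the same sentence as
# (start, end, label) spans. The data below is fabricated.
annotator_a = {(0, 7, "PERSON"), (25, 34, "COMPANY"), (40, 48, "PLACE")}
annotator_b = {(0, 7, "PERSON"), (25, 34, "PRODUCT")}

agreed = annotator_a & annotator_b          # spans both humans marked identically
precision = len(agreed) / len(annotator_b)
recall = len(agreed) / len(annotator_a)
f1 = 2 * precision * recall / (precision + recall)

# This human-to-human F1 is the practical ceiling a machine tagger is measured against.
print(f"human-to-human agreement (F1): {f1:.2f}")
```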
A) Your smiley face made me giggle first thing in the morning. Nothing better than laughter instead of coffee.

B) Thanks – so the answer is a maybe. A lot of what we don’t discuss about human language is that it is non-verbal, and there are some fairly universal agreements about specific non-verbal schema. That might be a good starting point.

Have a wonderful day! (or maybe evening) 🙂
Sweet. We got andraz into this discussion!
I’ve been interested for a while in the concept of combining human tagging, machine tagging, and intelligent categorization and have implemented a few features like this for some of my clients. I like your idea, for example, of content sites providing machine generated tags as recommendations for the person producing the content and allowing the person to choose from and augment the tag list. What I would add to that is some sort of system to layer hierarchical categorization on top of the tags. For example, if you write a blog post about the iPhone, your blogging tool should recommend the tag iPhone. Once you select that tag, the system could then look that tag up on a trusted, structured data service like Wikipedia to layer in a bunch of additional information. Wikipedia knows, for example, that “iPhone” is part of the “Apple”, “Smartphones” and “Wi-Fi devices” categories. Furthermore, they know that “Apple” is part of the “Companies listed in NASDAQ” category. You may not have used any of those phrases in your blog post, but the machine should be able to identify related categories with the human only stepping in to help with tags requiring disambiguation. Then, I could browse your site for all posts about NASDAQ companies and drill down into subcategories like Apple or tags like the iPhone.
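A quick sketch of the category lookup described above, using the public MediaWiki API; disambiguation, hidden-category filtering, and walking further up the category tree (e.g. from “Apple Inc.” toward “Companies listed on NASDAQ”) are left out:

```python
import requests  # assumes the requests library is installed

API = "https://en.wikipedia.org/w/api.php"

def wikipedia_categories(title):
    """Fetch the Wikipedia categories a tag/title belongs to."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    categories = []
    for page in data["query"]["pages"].values():
        for cat in page.get("categories", []):
            # Titles come back as "Category:Foo"; keep just "Foo".
            categories.append(cat["title"].split(":", 1)[1])
    return categories

if __name__ == "__main__":
    # A human picked the tag "iPhone"; the machine layers in broader categories.
    print(wikipedia_categories("iPhone"))
```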
Good idea as usual Joe
Thanks for sharing. The insights for me: i) pragmatic, narrowly focused approaches work better when tackling a big problem; ii) collaboration aided by simple (but smart) standards is key. Making tagging more pervasive and improving its accuracy and depth is a good start.
I haven’t seen many people using simple semantics to help sort through all the news. Techmeme has done some, but it hasn’t taken off in my opinion. Why do you think it hasn’t reached the mainstream?
Not sure why
Seems you are equating ‘intelligent’ with ‘appropriate tagging’. I wonder what it means to call the Internet intelligent. Probably not much. Zemanta, which is good at what it does, is, with all due respect, not doing something intelligent in the traditional sense. Intelligence is what the user of a search engine applies when he or she filters the junk from the useful and navigates to the relevant pages.

We need some fundamental breakthroughs, but as you rightly mention, the current approaches are ‘pedestrian’. I doubt they’ll do much for us. The question to ask is: how are we different in our approach today than 50 years ago when Turing was around? And if we’re not different – what’s the reason for optimism?
I am optimistic because I see the internet getting smarter every day
Looking at the development of the web and how it has improved our lives in the last two decades, I think there is every reason for optimism! Compared to 60 years ago we have been given a number of superhuman powers. Searching through the 8 billion pages of human knowledge in a second? Taking a look at every corner of the planet from your chair in a second? Sending a message to any of your friends no matter where they are in a second?

Intelligence is not what we desperately need from computers. We merely need usefulness. We sometimes need intelligence to achieve usefulness, but history has shown that even without perfect AI, just employing “smart solutions”, there are so many advances that can profoundly affect our lives!

So Fred, I think your use of the word “smarter” instead of “intelligent” in the post title was a smart choice. “Intelligent” still seems to be quite far away.
I wonder if we should replace intelligent with smart in our six words
+1 vote from me 🙂
Not sure if that counts, as much is hidden behind the word smart. Here’s some food for thought which will perhaps explain where I’m coming from. A subtlety that is easy to miss – when you say “I am pretty sure we’ll get there”, what exactly is ‘there’? Or even, what approximately is ‘there’?
As I work day and night developing a technology that aims to generate content where relevance is specified, you pose a very interesting question about the internet as a whole. If it took Google under 10 years to solve 90% of search ( http://bit.ly/ucA9e – she states it in the conclusion ), how long will it take us to collaborate and make the web more intelligent? Way to make us think!
Fred – Thanks for mentioning InfoNgen in this post. Though we are primarily focused on enhancing content discovery and usability in the financial and corporate markets, I certainly appreciate the challenge of bringing that type of benefit to the broader web. Your post was spot on.

One additional point that you didn’t mention is the growing tension between the production-centric model of content distribution and the consumption-centric model of content aggregation. Given the challenges they are facing, many content producers are starting to pull back and limit where and how their content may be used. At the same time, consumers are demanding even greater access to content, and the freedom to use it in many different ways. While there is little doubt about the ultimate outcome, a contentious and litigious resolution could have a real impact on how we make our way to a more intelligent web, and even what it may end up looking like.

I posted a more complete view of how I see the move to the semantic web happening over at The Digital Edge: http://www.thedigitaledgebl…

Thanks for getting the conversation started.
-john
I hope we aren’t going backwards
Hi, thanks for your wonderful sharing. We are working on a web service model that follows up on many of the major points you blogged. The major one is reviewing and rating content, especially when it comes to sharing innovations and knowledge. Maybe you want to keep an eye on this new blog, which considers your suggestions too: http://snowkiwi.wordpress.com/
🙂
What I find so sad about the semantic web is the mistaken routes followed by well-meaning academics. Tim Berners-Lee’s team at Southampton University is probably doing profound and important work – but who knows? You would have thought that TBL, of all people, would know that releasing something to the “community” makes things happen in a way that academic purity doesn’t. Suck-it-and-see has worked very well in internet development since it graduated from university.
In my opinion, the next step is more about humans than about machines. jdawg has the right idea (but the wrong angle, IMHO).
jdawg is a master of weaving and bobbing. He will figure out the right angle of attack someday
Too bad he’s using his bobbing and weaving skills to bob and weave around the truth; I doubt he’ll get anywhere with that approach.
Though we agree with the necessity of taking small steps towards the semantic web, these small steps, and the semantic web in general, are just part of the solution for better navigation on the web. The semantic web is a static web that will ultimately give us more choice rather than less, because it doesn’t take our realtime context into account. So yes, let’s all work together 😉 but on a higher level, combining the semantic web with the social web into a more Lateral Web. You might find that a giant leap doesn’t have to wait for the semantic web to come to full fruition. Full story here: http://bit.ly/EELR1
This is another great article, but I respectfully disagree with the six words in use at present. I am not even nitpicking over whether all sites and services are global and/or intelligent. I think the word playful does not belong, and I’d propose replacing “playful” with “instant” or something more related to the time element of it all. All the best.
I feel that one of the main problems is scalability. Can a single company process enough content in a timely, useful manner? Acquiring enough raw power (bandwidth/compute) is difficult… Of course, I probably think this way because getting that power is what I’m working on now 🙂
Hey Fred,

So I signed up for Glue on your recommendation (Adaptive Blue). I’m crazy about a few bands and I thought I could use it to get a list of all the websites about them that I could scour for content. The plugin is lame. It sent me to three websites in a row that I actively despised, and then there was no way for me to train it. I uninstalled it; maybe I’ll try again in a year.

Best,
Zack
I’m sorry to hear that.
Tcsertoglu’s comment (about the echo-chamber problem of recommendations becoming self-fulfilling and leaving relevant pieces out) describes what seems to me a solvable problem. The phenomenon is akin to feedback, and is probably addressable using signal-processing techniques, a few nifty algorithms, some FFT…
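One toy way to make that concrete: treat each repeated recommendation of the same item as positive feedback and attenuate its score, the way you would damp a feedback loop. The sketch below uses simple exponential damping rather than anything FFT-based, and every name and constant in it is invented for illustration, not taken from anyone's actual system:

```python
import math
from collections import Counter

times_recommended = Counter()  # item -> how often we have already pushed it

def damped_score(item, raw_score, decay=0.5):
    """Attenuate raw relevance by prior exposure (exponential damping)."""
    return raw_score * math.exp(-decay * times_recommended[item])

def recommend(candidates, k=3):
    """candidates: list of (item, raw_score) pairs; return the top k after damping."""
    ranked = sorted(candidates, key=lambda c: damped_score(*c), reverse=True)
    picks = [item for item, _ in ranked[:k]]
    times_recommended.update(picks)  # the more we show an item, the more it is damped next time
    return picks
```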
Fred’s suggestion of a semi-coordinated way of getting the benefits of human/machine tagging “on the way” to the “intelligent” web is the most intelligent I’ve run across. I will explore the Zemanta API as a possible contributor to a Linked Data startup I am supporting. Thanks, Fred.
Let me know what you think of Zemanta’s API
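For anyone else who wants to poke at it, here is a rough sketch of what a tag-suggestion call to Zemanta's REST API might look like. The endpoint, method name, and response fields below are from memory of the public docs, so treat all of them as assumptions and check the current developer documentation before relying on them:

```python
import json
import urllib.parse
import urllib.request

ZEMANTA_ENDPOINT = "http://api.zemanta.com/services/rest/0.0/"  # assumed endpoint

def suggested_tags(text, api_key):
    """Ask Zemanta for recommended tags for a block of text."""
    payload = urllib.parse.urlencode({
        "method": "zemanta.suggest",  # assumed method name
        "api_key": api_key,
        "text": text,
        "format": "json",
    }).encode("utf-8")
    with urllib.request.urlopen(ZEMANTA_ENDPOINT, payload) as resp:
        data = json.load(resp)
    # assumes the response carries a "keywords" list with "name" fields
    return [kw["name"] for kw in data.get("keywords", [])]
```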
I have been using Zemanta for some time… when it comes to tagging, I have noticed two things in my user experience: (1) Zemanta usually does a fairly good job at suggesting tags; (2) I don’t care!

Tagging is a burden to me because it is useless and of no value (at least no short-term value/satisfaction). Actually, adding any metadata to something I publish is a burden. My key incentive is usually to make things more accessible to readers – human readers – in which case one or two categories will do the trick. This got me thinking that maybe there is something wrong with the “timing of tagging”.

I believe that a wonderful quality of the mind is the ability to re-frame and create context. You can feel this happening when you are placed in unknown circumstances – like landing in an airport/country you’ve never visited. Your mind does an amazing job of applying what you already know (I am in an airport) to what you don’t know. It does it amazingly fast!

Words (tags) have no meaning until they are placed in context. When I am creating something, the context is very clear to me (I would hope), so I have no need to “state the obvious”. When I am looking for something (especially online), most of my work is to formulate the right question – to create a relevant context.

With web search so readily available, the challenge remaining for me as a user is to choose a good set of words – one that will generate search results that are relevant to me (this can be a challenging task). When I find a good result, I have essentially tagged it. These are the tags I think we should be creating. This is when I am, as a user, motivated to create tags.

Could it be that we are trying to apply tagging at an inefficient point of leverage?
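A hypothetical sketch of that “tag at search time” idea, just to make it concrete: record the query terms that led a reader to a page as implicit tags on that page. Every name here is invented for illustration:

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}

tags_by_page = defaultdict(Counter)  # url -> query term -> vote count

def record_successful_search(query, clicked_url):
    """Treat the query that brought a reader to a page as implicit tags for it."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    tags_by_page[clicked_url].update(terms)

def top_tags(url, n=5):
    """The query terms readers most often used to find this page."""
    return [term for term, _ in tags_by_page[url].most_common(n)]
```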
I don’t think so but I do get the point you are making
If you take a look at what organizations like the AP in the US and the NLA in the UK are trying to do to control even links to their content, it isn’t hard to imagine a fairly destructive battle developing over the current foundational assumptions we have regarding the fair use of web content. It could involve courts from all over the world and enough political meddling to create an unsustainable tapestry of regulations and restrictions that will make progress difficult.

And all of this doesn’t take into account what nations like China may decide to do if content becomes too discoverable for their liking.

I do see challenges ahead, but I am optimistic that everyone will find some way to step back from the edge before chaos takes hold.
Excellent points, but requiring payment for commercial link use (at least in the case of the AP) doesn’t necessarily imply a lack of interest in contributing to some sort of global tag exchange. The key point to remember is that the output from the system (i.e., an improved semantic map) will always be greater than any given input (hence gestalt). When viewed in these terms, there is no (apparent) reason not to participate. Then again, this isn’t really my field – so I’m basically just speculating.
Thanks, David – I appreciate your perspective on this. A rational approach here would likely embrace some degree of tag-level cooperation. The concern I was voicing previously (somewhat inarticulately) is that a lot of actors in this space are being driven more by fear than rational thought, creating some uncertainty. My hope is that at some point these more parochial fears will morph into a more outward-looking concern over losing their chance to shape the future direction of this space. Like I said, I’m optimistic.