People First, Machines Second
The brilliance of Google's PageRank algorithm is that it leverages the actions of real people to determine which pages are the best results for a given search term. The specific action I am thinking about is creating a hyperlink on a web page.
That human action of saying 'if you want to know more about Fred Wilson, here's his blog' tells Google's machines that this blog is a good result for a search on 'fred wilson.'
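The link-counting intuition can be sketched in a few lines of Python. This is a toy power-iteration version of PageRank, not Google's actual implementation, and the mini link graph below is entirely made up for illustration:

```python
# Toy PageRank: each page's score flows along its outgoing links, so a
# page that well-regarded pages link to ends up with the highest rank.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base score, then receives a share of
        # the rank of every page that links to it.
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical mini-web: two pages say 'here's his blog' by linking to it.
web = {
    "music-blog": ["fredwilson-blog"],
    "vc-news": ["fredwilson-blog"],
    "fredwilson-blog": ["vc-news"],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)  # the most-linked-to page wins
```

Because both other pages link to "fredwilson-blog," it comes out on top, which is the whole "humans first" point: people created those links, and the machine just counts them.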
Someday machines may be smart enough that they don't need humans to give them cues, but I believe the state of the art in machine intelligence today is 'humans first, machines second,' as Google did it.
The particular event that got me thinking (and thus writing) about this today is the year-end best-of-music listmania that has been going on for the past few weeks.
You saw it in action the past three days on this blog and it is going on all over the blogs right now.
I took my 25 top songs of 2009 and created a playlist of them at 8tracks.com. I then embedded that playlist on this blog.
Well, I wasn't the only one to do that. At this time, there are 141 "best of 2009" playlists on 8tracks. Click on that link and you can see all of them. But you'd be hard-pressed to listen to all of them.
But 8tracks can now take the human intelligence that is contained in all of those playlists and do something interesting. They can have their machines go through all of them and create a 'best of best of' playlist. It could be just the most popular tracks across all of the best of 2009 playlists, or it could be weighted by the number of times each playlist was played, or it could be some other algorithm. My point is simple: if humans are doing the curation upfront, then you can turn the machines loose and get some interesting results.
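A minimal sketch of that 'best of best of' computation, with made-up playlist data: count how often each track appears across the playlists, or optionally weight each appearance by how many times its playlist was played:

```python
# Aggregate human-curated playlists into one 'best of best of' list.
from collections import Counter

def best_of_best_of(playlists, top_n=3, weight_by_plays=False):
    """playlists: list of dicts with 'tracks' (list) and 'plays' (int)."""
    scores = Counter()
    for playlist in playlists:
        weight = playlist["plays"] if weight_by_plays else 1
        for track in playlist["tracks"]:
            scores[track] += weight
    return [track for track, _ in scores.most_common(top_n)]

# Invented data: 'Song A' is on the most lists, but the playlist
# containing 'Song C' was played far more often.
playlists = [
    {"tracks": ["Song A", "Song B"], "plays": 10},
    {"tracks": ["Song C"], "plays": 500},
    {"tracks": ["Song A", "Song D"], "plays": 5},
]
by_count = best_of_best_of(playlists, top_n=1)
by_plays = best_of_best_of(playlists, top_n=1, weight_by_plays=True)
```

The two weighting choices disagree here, which is exactly the design decision the post describes: counting curators' votes favors "Song A," while weighting by listens favors "Song C."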
On January 4th, the Hype Machine will unveil its second annual Music Blog Zeitgeist. I am very much looking forward to it. They scour the music blogs for all the "best of 2009" posts and then put them into their machines and crank out the results.
Last year's Music Zeitgeist was terrific and provided our family with weeks of listening pleasure and introduced us to dozens of new artists and records. I'm sure the same will be true this year.
As much as I enjoy reading everyone's best of lists this time of year, I can't get to all of them. Machines can and that's where they can add the most value. But you need humans first, then the machines can take over.
Ha… the human vs. the machine conversation continues from Brad Feld's post yesterday: http://www.feld.com/wp/arch… I like the importance of human input as it is right now – our direction and curation driving the processes – though it'll be interesting to see what happens when machines can/do/eventually will take over… but that's also how Skynet got started. 😉
whoa, i had no idea he'd written on the same topic. being on vacation is nice in a way, i can be completely ignorant of others' work and get away with it
Haha, great post by Brad, thanks for sharing it Reece! It's fascinating that we don't see what's coming because we're part of it. What's the invisible hand driving technology forward in this particular direction? We can't seem to get enough of intelligent-acting software. I count on digital memory with my smartphone, blog, comments, and microblogging, and Internet search is mixed into that. The solutions and methods I concoct now only weakly resemble the more isolated solutions I used to design. Why not take full advantage of all the collaborative work of colleagues pushing forward and sharing access to their discoveries through open source and fairly liberal APIs?
I agree, humans first, machines second. I don't think machines will be able to completely take over in the future, though. Due to the subjective nature of how humans will always process information and the fact that information is a product of human thought, I think moving forward machine intelligence will become smarter at interpreting human opinion, interests, and needs as data, but it will still be humans first. Great topic for a post, thanks for this.
Hi Fred, I agree with you on this, but can you say the same for communication? Can machines learn human-computer communication from human-human communication? I feel humans are very weak at communicating what they want to communicate (most of the world's problems exist because of miscommunication). I still haven't found the answer to what the perfect, easy-to-use communication methodology between humans and machines is. Any thoughts?
I’ll have to ponder this one
There is a huge value well waiting for us to unlock in the human-machine interface. When human intuition builds out software organically, many domain experts will be able to "code up" solutions more computer-centric folks like myself can't comprehend. If you look at the example of social media and its effect on shrinking the globe, you can begin to imagine what improved communication can reap. Our isolated views diminish in the wake of shared human experience. Just recently we have crossed over the language hurdle as voice-to-text, text-to-voice, and translation services are being combined. As to human-machine interfaces, I look to neural interfaces as a direction for great improvement. Thoughts could instantly be broadcast to remote storage; if they can be "recreated" and transmitted back, we will have invented the first external human memory devices. That's a pretty far-out tech direction, but my hunch is that it's inevitable.
I agree with you. We are doing this with Factoetum, and we will have a way for developers and members to generate revenue for themselves as well as their communities.
I find the reverse. I find computers very weak. They are very limiting because of ontology problems. Once you step outside of the ontology because the ontology cannot express an idea, you are stuck. Humans are broader. I find the Turing test a good example of this; technically it is defined only by linguistics and conversation. It's very narrow, considering that humans connect with other humans and communicate other truths about themselves through a whole slew of mediums that have nothing to do with words (watch a ballet or a mime). One of the reasons I'm attracted to HCI is because implied meaning by the human brain is so malleable and so broad across the senses that this technically fails the Turing test (each painting is hand-programmed in, if I am not mistaken, I can find out; it is not language-based in the traditional sense), and yet each person will find new meaning if each painting were hung individually. Humans will always end up the qualifier, because ontologies, as a rule, tend to fail. The question with computing, it seems to me, especially computing in a networked sphere, is: can you create a consensus where, within certain parameters, you have an ontology? Projects like Electric Sheep force people to create ideals of what beautiful sets of sheep (really fractals) are, which can't be created without the computer. People vote their favorite fractals up and down. (This is what I do when I want to spazz out to think.) If enough people vote up a fractal, it stays part of the herd. There is consensus.
Adarsh… mis-communication is the key (strength) in everything we humans do. Machines cannot truly miscommunicate – that's their weakness. Miscommunication (transformation of the signal) happens every time we process information. The classic speculation is that this is due to the (loss in) translation between the two hemispheres of the brain – holistic/discrete. According to this view, we cannot but mis-communicate. Another implication: people with the strongest ability to mis-communicate (transform) are considered creative. We (all people) cannot but be creative. As I recently tweeted re this article – http://web.mit.edu/newsoffi… – AI discovers human thought is ambiguous, inconsistent, and co-produced (welcome to culture). Better late than never.
I love being human; I would never give it up for the world. I would never want to be an AI. The question is, can we reproduce miscommunication? Or rather, the subtleties of communication. I would say that opens up ways to grow in communication…
“reproduce miscommunication” is kind of oxymoronic.
Not if your brain works that way! If you want to produce an AI that can pass the Turing test, you'd better make something that works pretty much the way a human does, by reproducing miscommunication.
Signal morphing is something I need to take into account with social semantic search. I think it's built in, based on the limitations of current entity extraction and the relationships that are derived from linked data. Either way, thanks for making me more aware of the desirability of a fortunate miscommunication, Emil.
Curation plus automation will be the 2010 buzzword.
Why do you think so?
Indeed, humans first. In fact, A.I. can only happen if we have "humans always." I agree about Google and their ability to do something with links that are human-generated actions. This is interesting because links, as they are used in PageRank and other search engines, are one of the only pure, clean pieces of data that can be gleaned from an HTML page; without having to know the context or validity of where the link leads, you can count them and give them a value based on how many there are and some other data points. The major issue that Google will have to deal with is that they have no way to know the value of what the links lead to, or for that matter the validity of the writer of the content, as well as the context of the content as it relates to the query. I think that we have hit a wall as far as what a computer application can do with content that is not in a data-object format. Google and Microsoft have devoted hundreds of millions of dollars to trying to solve this issue, and today as we near 2010 neither Bing nor Google can help someone looking for a washing machine easily find relevant information. You can also use Factoetum to create your favorite dynamic list 🙂
I agree and think this is one of the things services in the financial information space have gotten wrong quite a bit. There are a number of services that have focused either on pure aggregation or have tried to automate all filtering of information by machine — e.g., there's a mention of x on this message board, here's a news article, etc. But that only takes you so far. There have been a few moves back the other way; StockTwits comes to mind, as does WikiCharts, where there's more of a human focus. But there too, there needs to be some filtering of human activity, or curation, in order to not get overwhelmed by irrelevant information.
Very good point about financial systems/algos needing more human input
read abnormal returns' linkfest. he does it, and stocktwits is working on curation/automation for news and trading ideas as well for 2010
Excellent. I’ll do that
Great examples, Bill. I'm working on a slice of the human-machine social interface leveraging semantic extraction. But it's connected to a human interface and human-curated lists. Semantics on its own isn't enough; we need memory of previous interactions and ultimately a machine that can build and relate analogies (that's the exciting future path for Victus Media). Check out http://imm.victusmedia.com or the unlocked-window variant http://victus-imm.heroku.com
I wonder if Yahoo! Finance does its aggregation all by machine. If so, that might explain the frequent Motley Fool articles/advertorials that come up as recent news items when you pull up a stock. Motley Fool is obviously aware of this, as its writers often include tangential mentions of several widely-followed stocks in their pieces.
I don’t know about Yahoo Finance but in looking at that type of presentation I think a great example is a Google Finance chart. They do a great overlay with news but if you pick a well followed stock the amount of news is huge. So what’s important? Unimportant? To what degree important? AAPL outperformed the market over x period of time because of x, y and z. Even if that can’t be stated perfectly, the market knows what those reasons are but the Google Finance charts don’t tell that story. Or at least don’t tell it well. Basically what I think there is a need for is to be able to tell a story of how a stock moved and why and to do that right these days you need a human not a computer. If you’re an investor and new to a name (or haven’t looked at one in a while) you don’t want to have to read all the news, research etc. At least not right away.
Mega cap stocks are a different animal, because, as you point out, the amount of news is huge (much of it often of little importance). With micro caps, there often isn’t any news since the last filing. On my old blog, I tried to fill in the gap a little with a handful of stocks I owned, by posting notes on interviews I conducted with the companies’ CEOs. One thing I noticed was those blog posts of mine would often show up on Google Finance for my obscure companies, but not on Yahoo! Finance (perhaps because Google owns Blogger?).
Our portfolio company tracked.com is working on this problem. I'd say right now it's a work in progress
Machine curation and ranking is very useful, but any such ranking with economic benefits immediately suffers from humans actively gaming the system (past example: splogs crowding search results).A savvy agent will flood 8tracks.com et al with playlists pointing to their artists. Then other humans will have to filter the inputs to the machine curation, trying to counteract the damage.If we develop algorithms sufficient to determine when they are being misled, then we’ll really have something…
it’s a perpetual race….search engineers innovate, search marketers counter-innovate….repeat
Right on. Spam is fake human activity that does fool machines
Fake is as fake does; it's a spectacle, and it points to the inherent limits of machines. I find fake fascinating; only we can imply meaning to it, and we can imply multiple layers of meaning. I'm not sure what makes it so fake unless we label it so. I'm sure there are real people out there who, though not marketers, would make a playlist of just one artist. Because they like it. Or because it is a good way of organizing music.
That’s a critical value identification. Machines that recognize when they are being gamed. Even if they can identify potentially anomalous usage and alert a moderator that’s a pretty valuable curation tool.
It seems like this hypothesis is starting to catch on, and it's something I'm really interested in. But I'll pose a larger (if somewhat ethereal) question: at what point does a large group of humans, acting independently but in parallel, actually become what we might consider "a machine" to be? There is a great TED talk (I'll see if I can find it and link to it here) that makes this argument, basically positing that the web itself is the most expansive, complex, and distributed machine ever built, with humans being its atomic units, the moving parts that each do a little bit toward achieving a much larger goal. I agree that algos are different from humans (thank god), but the point is that it's becoming harder and harder to draw a clear line in the sand on this issue. It sounds a little ridiculous to say out loud; can someone shoot me down with some hard logic on this?
Here’s the TED talk: “Kevin Kelly on the next 5,000 days of the web”. He talks about measuring the power of the web in “HB’s” – Human Brains
I'm a huge Kevin Kelly fan and really enjoyed this talk. He captures the topic well in his Super Organism post (darn iPhone weak linking). Great tie-in, and an influence behind my comments here. He's one of my superhuman filters 🙂
kevin kelly is a genius…..new rules for the new economy is my internet bible….i view it as prophecy
An epic read (I've read many of the posts); speaking of which, I'm halfway through my Kindle edition on my iPhone. My battery's dead and I'm waiting for a tow, a great time to catch up on the book. Thanks for the reminder, Kid
This talk really blew my mind and stayed with me for a while. I just watched it again, and the sheer number of clicks, emails, and data exchanged on a daily basis is staggering. The aggregate knowledge being shared is incredible, and I think it's absolutely right to think that tapping into the metadata behind how and where this information travels can help us solve all kinds of aggregation problems, from sourcing new media to finding cures for life-threatening diseases. The sum total is far too much to comprehend and will continue to exceed human capacity for consumption. In my opinion, the way to glean usefulness will have to lie in the patterns that connect one piece of information to another. To me, these patterns are inherently social: people exchanging links, whether one-to-one or one-to-many.
Fabulous TED talk by KK (http://bit.ly/5r2F2Y), but after that great intro I felt like he dove down into the weeds and only skimmed over the most interesting idea: how multiple emergent systems will coalesce from the web. While it's true there's only one global system available (the hardware internet), I imagine we'll have multiple organizing entities running simultaneously through the network. These tools will leverage the collective intelligence of our links, annotations, and semantic data to build a unique perspective, a subjective lens for the web. Google is in some ways a first baby step towards this idea. I was a neuroscientist way back when, so hearing this kind of thinking applied to the internet is fun for me. Wild tangent: KK's analogy to the human mind is really appropriate, since research suggests that we as humans each have multiple emergent systems operating simultaneously, the interactions of which give rise to the mind, personality, and consciousness. Lots of thought-starters are possible when you map this idea to the web:* Emergence and accidental AI* What kind of cocktail-party guest would Google be? What would a 'cool/outgoing' search entity look like?* Net neutrality in the context of engines/personalities competing for dominance of bandwidth/data. In the end can there be only one?All kinds of fascinating stuff to think about. What will the next 5000 days bring?
I can’t. Please share the TED talk if you find it
This is true but one of the interesting things about PageRank, IMO, is how humans have figured out how to build machines to manipulate the results. The human action PageRank relies upon can easily be faked. If the premise holds true, we’ll see Google modify PageRank to give a higher weighting to paths within the link graph originating from a ‘person’ on a social network. Of course, the arms race will always continue, because as soon as that happens fake ‘people’ will begin to pop up more rapidly than they are today. And then what… I suppose some algorithm around reputation and importance of an individual person will be taken into account.
Yup. Spam is the enemy of this approach
The answer might lie in this very blog post! A machine could never (well, never say never I suppose) take a thread of two or three comments and follow up with the next comment containing a relevant or insightful post. I suppose that if machines develop true AI, the blog and associated comments might then fall victim to spam. But for now, Disqus seems like a safe, reliable source for human intelligence!
humans can be better than algos at fighting spam too. i wouldn't mind flagging posts as spam in search results.
Absolutely, humans can be used to fight the spam. Unfortunately, how do you weight an individual user's action on the internet when it comes to identifying spam for everyone else? Are they marking the content spam because they disagree with it? Because it offends them? Because the site is a competitor's? It's a slippery slope, and depending on how it's done it could simply result in more input noise to the ranking algorithm. If there were a neutral third-party rating agency for online reputation, then it could serve as the edge of any graph used to weight an individual's input into content quality. Of course, no such agency exists for a myriad of reasons, but IMO without a trusted third party there will be no end to the arms race.
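One hedged way to picture the weighting problem: score each flag by the flagger's reputation, so a pile of throwaway accounts can't bury content that a couple of trusted users would have cleared. Every name, weight, and threshold below is invented purely for illustration:

```python
# Reputation-weighted spam flags: a flag from a trusted user counts for
# far more than a flag from an unknown or brand-new account.

def spam_score(flags, reputation, default_rep=0.1):
    """flags: user ids who flagged an item; reputation: id -> trust in [0, 1]."""
    return sum(reputation.get(user, default_rep) for user in flags)

def is_spam(flags, reputation, threshold=1.0):
    return spam_score(flags, reputation) >= threshold

# Hypothetical users and trust weights.
reputation = {"longtime_user": 0.9, "regular": 0.8, "new_account": 0.05}
trusted_flags = ["longtime_user", "regular"]  # combined weight 1.7
sockpuppet_flags = ["new_account"] * 10       # combined weight 0.5
```

Two trusted flags clear the threshold while ten sockpuppet flags do not, which is the hoped-for behavior; of course, in practice the reputation scores themselves become the next thing spammers try to game.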
im biased because im working on something in this area, but i believe a hybrid system is the best method. user data can identify trends and eliminate a large amount of spam, but there are some issues that require some sort of moderator. the trick is identifying the murky issues and rating them by importance. efficiency is key. of course, there are ecosystems where there are no moderators. craigslist is a great example. however, i believe that moderators will be needed as we move towards social and curated content. we are already seeing this explosion. community managers and spam czars (actual title?) within startups and other social media companies are exploding in growth.
Sounds interesting. I agree with you re: community managers, etc. Filtering the noise is going to be a big trend going forward. Good luck! I’ve followed you on Twitter and will be interested to see how it turns out.
How do you impose impartiality in that field? Something I always want to know. Humans are not by nature impartial. I'm driven to consensus, but even there, there are faults that can come up in that sort of model…
there are faults with every model. community will power the curated web. we have several projects at stocktwits that are based upon a curated model. when we get something wrong the community lets us know. it's a two way street. stocktwits.tv would be no different than cnbc if the stream did not impact content.
Clearly you have a consensus-driven model, and yes, I agree that there are faults with every model. They're models, not life. And I am betting just like you that community will drive the web. But I still would like to see best practices develop. I'm human, and I am sensitive enough to realize I will make mistakes. I would like to make a minimum of them and not hurt a lot of people along the way. I know I can't please everyone, yet I still would like to make something that makes people happy through a refinement of what we see now. And there is a lot to do to make all of us, from the beginner users (my grandfather has decided he wants to write his first email and search the web this upcoming year, in his late 80s) to the most advanced of us, much happier and more able along the way. Just me. *shrug* (Off topic: Am I missing you on this list? Bother me! I'm taking a staycation until Jan 4: http://twitter.com/shanacar…)
the problem is that every ecosystem is different. content, spam, etc., varies. best practices become difficult to define. curators just need better tools to listen, interact, eradicate the bad, and highlight the best.
Of course. You are on to some really interesting keywords about your work. Do you have any practical examples that would make a good case study?
It sort of exists in email. Our portfolio company Return Path's Sender Score (senderscore.org) service publishes reputation data on all known commercial mail-sending IP addresses. They calculate reputation with data provided by mail receivers, ISPs, webmail providers, spam filtering companies, etc. If you are not familiar with Sender Score, you can see it in action at senderscore.org
It certainly makes sense for a company like Return Path to provide a service like Sender Score. Apart from being a valuable tool, it establishes them as an authority in the space. I wonder, does that mean that any reputation system would need to come from someone who already knows about my identity on the web? Facebook, Twitter, Disqus, Google? Each of these, perhaps in that order, is positioned to have enough data on me to state, with a certain level of confidence, that I'm "real". Of course, there might be an opportunity in pulling data from all of these places and applying a novel algorithm to determine a level of "authenticity" and "authority" for a given online identity. It's a fun thought exercise, at least.
i wrote a post a while back about reputation services. i'll try to find it. but the basic point was we need more of them
I think I found it. I’m going to ponder it and some of the related posts, lots of good thinking there. It remains unclear to me how a neutral third party such as the one I described can make money w/o the draw of some larger service.
indeed, i’d love to be able to flag my competitors 🙂
Great post Fred! Reminds me of this great quote: Man is the creator of change in this world. Therefore he should be above systems and structures and not subordinate to them.
i don't know about that. parents are also the creators of their children, however i don't think a hierarchical boss/subordinate relationship between parents and their children is always advantageous, particularly as the children grow into adults. might it be the same between engineers and their "children" (i.e. the apps they create, some of which can learn and evolve on their own)?
That is still too creepy: we engineer our children?
This is just a semantic choice. I believe most people tend to believe in tabula rasa with regards to existing knowledge in newborn children.
I do not think we are pure tabula rasa; we are born with certain innately human things that you cannot train out of us. Further, I think we are born with certain starting points and stopping points. I think environment helps a huge, huge amount. But I don't think it's pure. I say this from a large amount of life experience (and this is an annoying wake-up call that I need to write to someone…)
The distinct point is that we (people), through our attention and shared thoughts, are a vital part of the Internet. It's a big ole cyborg amalgam. Together with the algorithms and software that hold all the data together, we have extraordinary processing power. The system you describe is a front-end bulk filter (superhuman filters) followed by smart data mining. I count on you, Fred, and several other AVC'ers to prefilter the world into manageable sections.
i agree with the basic principle of what you are saying, boss, but i do think you can also flip the script on that and it will be true. meaning you can say "machines first, humans second" — that is the line of thought i am pursuing, and i think it is where the next great value creation opportunity is. as an example, jdawg leverages automated search results (machines first) and then curates them (humans second). in terms of disruption theory, this is how i see value shifting from automated curation to human curation. of course, IMHO the real secret sauce to making this happen is to leverage open source technology and to go niche, because the scalability issue has to be approached in a very, very different manner. i think that's where almost everyone gets it wrong, because they get blinded by the dollar signs and because the existing ecosystem (i.e. VCs, legal structures) is not compatible with how to scale niches.
Hmm. Gotta think about this. Interesting point
Great point, Kid. I agree it's not scalable as yet, but I think the process today is humans first, machines second, humans last. We inform the rankings and the machine through this very chatter; they rank and link us; we decide if it makes sense. For now we are in control. Skynet later… I don't think so.
If you want to go niche, Kid, you have to be so outrageously niche that it is either a) so banal that it is no longer niche, or b) worth so much that the niche is worth appealing to. I remember saying that I wanted to make an iPhone app as an art piece that did absolutely nothing. It would just be a dollar bill flickering to the movements of the USDX. (I like the look of the charts that the ICE does; they do a nice job. I used to stare at them…) The kicker was I wanted to work it out so that only three would be sold: one at auction, one at the same price as the USDX contract at the moment, and one at some very unspecified art-world price. Making fun of some very art-world and money-world things… I was told this was a bad idea. You can't do that with iPhone apps and with art; where do you display it, and what makes it art? And it's banal. But it fulfills both of your criteria. No one will go and make it unless there is some reward at the end. It was supposed to be a crafted iPhone app. Most of this stuff is not going to appear, because the return on the value without the machines involved is so low as to be negligible. I saw a Twitter post that someone was offering a social media job that paid as much as an intro job at McDonald's. That's why. You have to solve the money problem. People need to eat. And house and clothe themselves.
One of the issues here is that we have been blinded by the feel-good we receive when we so sheepishly add value and content to applications like Twitter and Facebook without asking to be compensated monetarily for the content/value that we add. So Facebook is worth 11 billion because of the "machines"? No, that's not the case. Take away the humans that add the content and the value, and Facebook does not have a large value. For me, member content creators that add value should at the minimum have the opportunity to passively generate revenue. How is it rational that a handful of individuals can reap huge financial reward from content that they do not own? Why is it that we can hear valuations in the billions for companies that members add value and content to, and not think that this might not be such an equitable thing for the members that have created the content and added the value?
The systems for doing so have not been developed. It could be totally possible to have machines creating information; it has an ineffable quality. It might be that what is interpretable and actionable is what makes information possible. The Turing test, at least to me, as a result feels really limiting. I'm pretty sure I'm not the only person really struggling with this issue. Further, in developing such a system, one has to be really precise about what one is measuring. What is the value? And when does it become valuable? And for how much? Information is among the most difficult things to value since it does sort of become stale, and it becomes stale at the rate it becomes used. And yet its use does not really make it less valuable per se; that's really a function of time passing. (It's the not acting on it that makes it stale, not necessarily how many people know about it.) Fewer people using a piece of information versus more people using it is not going to change the inherent value of it? I've been thinking about this for a while. How do you measure the value of information over time, especially as it passes into the realm of knowledge, and then into wisdom? It's one of those critical things that I don't know. But it will become really critical as the internet becomes more ingrained into society, and information becomes pervasive and you have to measure its value as it passes through time. I wish I were really fluent in information economics, but even it seems to be hindered by information being attached to something else, not pure information or macro information theories. That, to me, seems to be a pressing question…
The modular patterns within isolated but similar niches would lend itself to a form of scaling.
that is a fantastic way of phrasing it! i'll be spreading that idea. the key question IMHO is how similar do they need to be? or better yet, in what ways do they need to be similar? going back to a recent conversation here on AVC, i think the twitter API being turned into a standard is the type of modular pattern that could lend itself to a new form of niche scalability. well, maybe. i guess we'll find out.
That's where you come in. Identifying emergent patterns within very specialized subgroups is your genuine solution. Representatives from niche communities can come together to find common frameworks to connect to. I wish I had the best answer to this now, but I don't… yet. An open API, like Twitter's, may be something various networks can leverage.
I agree with you about taking another approach to scalability and doing it a different way than the traditional routes for investment and corporate structure. It seems one of the issues here is who will go first. Who will be brave enough to put this kind of app/idea out? Will those that talk of a need for this kind of way of doing things support it, or will it die while everyone stays on the sidelines with their hands in their pockets?
i expect entrepreneurs to initiate this. this runs contrary to the incentive system that many investors are used to (i.e. big investments, IPO exit) so i doubt they will adapt — it is always difficult for incumbents to adapt because of how entrenched they are in their existing value chains. ideally i think the right microfinance player could be the perfect thing here, although as microfinance invests more labor and less capital, those funds will be sort of like regular companies too.
That reminds me of a Facebook app I wanted to build. The basic premise was that there would be a number of lists, each about a single topic like “Summer”, “80’s alternative” or “Best of 2009” in this case. Your friends would each add one song to the list, or maybe three, I wasn’t sure about that. Then you would have a playlist for a topic from your friends, or you could see a list of everyone’s. You could even filter the list based on Facebook friend lists.
Several sets of DJs on 8tracks have been doing something like this. One person picks the theme + first song, others add subsequent songs to the playlist to match the theme. I’ve been wanting to develop this concept into a FB app for a while now — I think it could be really viral — but we don’t have the resources to tackle it yet. If you’re still interested, we do have an API ; )
True… they also need to look at velocity. If a track was released early in 2009 then it is probably going to have more plays than one released at the end of the year. The track released at the end of the year may be more popular (based on velocity), but if you only look at the number of plays it would probably not rank as well. I noticed this when Twitter released their most popular terms/people: Tiger Woods ranked 8th on the “people” list, behind Joe Wilson, but his scandal broke later in the year.

Personally, I would prefer to see a bit of supervision in this type of learning. Music is very subjective and I probably would not have put “Fat” by Weird Al in the top tracks of 1988 even though the number of plays/playlist additions might have warranted that position.
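The velocity idea above can be sketched as a tiny normalization: rank by plays per week since release rather than raw plays. Everything here (the function name, the play counts, the dates) is hypothetical, just to show the effect.

```python
from datetime import date

def velocity(plays, released, as_of):
    """Average plays per week since release, so a late-year
    track isn't penalized for having had less time on the air."""
    weeks = max((as_of - released).days / 7, 1)  # floor at one week
    return plays / weeks

# A December release can outrank a January one despite fewer raw plays.
jan_track = velocity(50_000, date(2009, 1, 10), date(2009, 12, 31))
dec_track = velocity(20_000, date(2009, 12, 1), date(2009, 12, 31))
```

Under this metric the December track scores far higher, which matches the Tiger Woods observation: a late-breaking item can be the "hotter" one even with a smaller cumulative count.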
Ryan, totally agree about music being subjective and thus somewhat less suitable for an aggregated ranking (I’m the founder of 8tracks). I mention this in a comment below.
Hi Fred, great post. I absolutely agree… one issue with most systems that curate the web is they do not go far enough in capturing human actions. Hopefully we can get together to discuss my new company in early 2010 that addresses this issue. It’s still in quiet mode but will be launching in Feb.
What does your new company do?
The company is called Tickreel and our product builds intelligent activity streams for the web that account for a wider breadth of human actions. It applies the insights you share in this post of human first / machine second to provide a unique perspective of the web, highlighting interrelationships that would be otherwise difficult to pick up. If it’s ok, I’ll drop you an email with a quick overview and a few slides once we are close to having a prototype. I know you prefer to wait until companies have a working product.
… maybe very soon Androids WILL dream of electric sheep!
You could just hack the binary of this project since there is a Linux version; then your Android will actually dream of electric sheep. My Mac does when I feel like it: http://community.electricsh…
From a true VC: people over product, as people can make a product work but a product cannot make people work. Not only is Google putting people’s actions first, it is also using machines to track their interest – essentially, after the first action, taking all other actions away.
A very neat thing about Google and HypeMachine is that they leverage actions that real people are already doing as a part of blogging. These machines don’t need to incentivize or change the behavior of humans who create the input to their machines.
people better start doing this soon. My gmail currently thinks I am a guy because I keep looking at and writing to startups. And twittering about startups. No, I do not watch ESPN. Yes, I do like shoes. Can you please get your advertising right?
If you tweet about shoes they better show up on your blog ;). What do you think of the new plugin, is it more aesthetically pleasing?
Beautiful, but some of the choices are god awful, the babysitters club?
Hah, yeah I get some weird matches too. (we’re confined to a very restricted search space, limited data sampling so you’re gonna get some suboptimal matches).How are the keywords it’s using? For me some are dead on, and it finds some very enticing stuff even in the limited ad network of options from Amazon.If a greater fraction of the keywords fit, that’s a good start. If we can get people to refine their keywords to use in other places that would be even better. The semantic tools are only so good. Each of us has different meanings for similar words/topics, and my goal with IMM is to personalize the search/ads so that they match what you mean, not just what you say. On a side note I’ve found some folks that chat about similar topics, and links that they’ve shared have been enlightening. It’s like adjacent data search through semantic derivative commonality. Oh well back to work on the next cool thing…
Email me- I need to take a closer look.
ok you got mail. No rush I won’t have time to respond for a while. Have to go into reverse sabbatical, where I hide from social and do stuff. Lost 4.5 hours this morning while I got a jump and car battery replaced (cold weather sapped my battery power).
I think we’re still at the baby steps…right now as you mention humans first, then machines has gotten the most traction…and in some cases machines first, then humans is making some ground (and I agree with Kid that we’ll see a lot more improvements in this area very soon)…But to me, it’s not about what’s first (human or machine), but rather about the ability to have a conversation…either the machine or the human starts the conversation and it goes back and forth (often taking you on tangents) as long as both parties remain interested…If you think about how the human brain works when someone describes something new to you, it’s really a conversation where you start with one concept and then refine, refine, refine until the person ‘gets it’…even google has a long way to go in this realm as it’s not that easy to start with one search term, and then continue to refine or build on that initial search (the basic version that does exist right now is simple filtering of results and some basic suggestions of related terms, but there is no true conversation going on yet)…
Kevin you must be reading my mind :). I’m working on a social semantic bot that relates human shared entities and returns a relevant response (with some probability for diversity). I’d be coding more now but my car is dead and I’m waiting on a tow. Will give a shout once we have a fun Twitter account ready to be “talked with”. It’s also going to remember who you are, and what topics you previously talked about.

The setup has some challenges but basically it’s a status message, connected to a quick semantic call, then entities are sent to a database to retrieve adjacent linked data (DBpedia, Freebase). Finally a search call is made to find a meaningful(?) response and link. I don’t know how far I can take it with web data but we’ll see.
You can get it and not be able to do it. It takes time, very left versus right brain. Both need to come together when you need to really work at something, especially the abstract. And this is an abstract problem.
Good to hear that us humans still add some value!I hope that 2010 brings some innovations in terms of how people + machines can be applied to problems more interesting & challenging than “popular” lists. I like your “best of 2009” tracks list because you and I share similar interests in music. An uber best of the “best of” list would probably look a lot like the Billboard 100, which I personally have little interest in. I’d rather see the people + machines approach applied to filters that make the web more personally relevant.
Isn’t this essentially what Facebook is doing/intends to do? You have real world relationships with those people on facebook and therefore a layer of trust is applicable that goes beyond the trust of an “online” friend. You can then leverage this kind of trust by overlaying the machine on top. I suppose the exception to this is that Fred has such internet cred, you don’t even have to meet him in person, so the net in some case overcomes certain human factors.
Yeah Ryan, Facebook is one aspect, but not really what I have in mind. First, there isn’t much of an intelligent machine aspect to the filtering that occurs on Facebook. It’s almost all people driven and just a “river” of everything they post. Second, in terms of personal relevancy, friends aren’t always the best filters. Just because I’m friends with someone doesn’t mean that we share the same interests in music, news, art, food, and whatnot. For example, I’m interested in Fred’s music, but less so in his views on NYC politics or his love of the Jets.

There’s an overwhelming amount of interesting content online. Today’s filters are primarily based on crude concepts of crowd popularity (Digg, Top 10 Lists, etc) or friend connections (Facebook, Twitter, etc). I think there’s a huge opportunity for something new to emerge that uses machines to connect us with like-minded people… people we may never meet or may not care to “follow”, but who can expose us to stuff we’re likely to be interested in because we share some common traits or passions with like-minded strangers. Amazon’s “people who like this book also like…” is probably the best example today, but that’s just scratching the surface in my opinion.
I hear ya Joe, but I’m thinking they will add intelligence in the future. Also, friends might not be the best filters but what if you included friends of friends, and so on. Now you are talking about a much larger network with people you “quasi” trust b/c they are your friend’s friends, which is probably better than a network of somewhat random people.
I don’t think friends and friends of friends gets you there. You need taste neighbors. I wrote a post about this a while back.
nice, I’ll check it out
Completely agree. Perhaps CF could be applied to @users and #tags to offer “recommended” tweets, for example. We should catch up for lunch in the Mission again, it’s been awhile. I’m in SF through end of Jan (and in Dolores Heights) if you’re around.
My love of the Jets is turning into love/hate
Haha! As a Skins fan, I know the feeling.
Joe, that’s sort of the thinking behind our smart ‘next mix’ approach. Using CF or other means to help guide you to others with whom you share tastes. Rather than applying CF at the level of track (like Last.fm) we’ll apply it at the level of people (DJs) and collections of tracks (mixes).
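The "taste neighbors" / DJ-level collaborative filtering idea discussed above can be sketched very simply: treat each mix as the set of listeners who favorited it, and recommend the other mix whose audience overlaps most with the seed mix's. This is a minimal sketch with made-up mix names and listeners, not 8tracks' actual algorithm.

```python
# Hypothetical data: mix name -> set of listeners who favorited it.
favorites = {
    "best-of-2009": {"ann", "bob", "cyd"},
    "electro-2009": {"ann", "cyd", "dee"},
    "country-hits": {"eve"},
}

def jaccard(a, b):
    """Similarity of two listener sets: shared listeners / all listeners."""
    return len(a & b) / len(a | b) if a | b else 0.0

def next_mix(seed, favorites):
    """Recommend the mix whose audience overlaps most with the seed's."""
    others = [m for m in favorites if m != seed]
    return max(others, key=lambda m: jaccard(favorites[seed], favorites[m]))
```

Here "best-of-2009" and "electro-2009" share two of four total listeners (similarity 0.5), so the electro mix would be suggested next; a real system would of course use richer signals than favorites alone.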
Your conclusion “you need humans first, then the machines can take over” is the plot of Terminator.
the human aspect of your playlist curation was you specifying which 25 songs made it into your end of year playlist. chances are your 25 top songs were also your 25 most played(?) or thereabouts.

rather than needing you as a catalyst to create ongoing mashups of your data, had software had access to your “global” playcount (cross device/platform etc) it could have done the job for itself (and even probably brought songs to your attention which would have made it onto your top list but your fallible human brain forgot about at the time of the list curation).

as our lives and software continue their increasingly fundamental coupling, human curation forming the initial spark of further machine learning will become less and less necessary.

machines 1, humans 0
Well it turns out that’s not how it worked. I did start with last.fm playcounts but for a variety of reasons, they didn’t work for me. I used them as one data point but it wasn’t definitive.
Had software been hooked in (for playcount tracking only) to ALL your devices and services which you played music through in 2009, would the top 25 in aggregate playcount match your top 25 of the year? If not, what kind of %age hit rate do you think it would have got?
there aren’t many of my “devices” that don’t scrobble to last.fm

my laptop does
my ipod does
our sonos does
hype machine does
streampad does

i bet that i scrobble >80% of my listens to last.fm
Touché. You won, I lost. 🙂
i didn’t intend it to be a contest, just informative. the fact that i scrobble most of my listens makes it even more interesting that play count alone is not the answer
Granted, play count alone doesn’t suffice, but if a machine had access to all your music related actions… songs shared, embedded, favourited, tweeted, blogged, the volume played at, etc, I’m sure it would do a good job of deciphering your top songs of a certain period. All I’m saying is that with “perfect knowledge” it would surface accurate results. Obviously no match for direct human input. But exciting nonetheless.
yes indeed, that would be ideal
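The idea of blending many listening signals rather than play count alone could be sketched as a weighted score. The signal names and weights below are entirely made up for illustration; a real system would learn them from data.

```python
# Hypothetical weights per signal; sharing a song is a stronger
# statement of preference than merely playing it once.
WEIGHTS = {"play": 1, "favorite": 5, "share": 8, "blog": 10}

def song_score(actions):
    """actions: {signal name: count} for one song; returns weighted total."""
    return sum(WEIGHTS.get(signal, 0) * n for signal, n in actions.items())

songs = {
    "Song A": {"play": 300},                               # played a lot, never shared
    "Song B": {"play": 120, "favorite": 20, "share": 15},  # fewer plays, more engagement
}
top = max(songs, key=lambda s: song_score(songs[s]))  # "Song B" wins here
```

Song A has more raw plays, but Song B's favorites and shares push it to the top, which is the point made above: play count is one data point, not the answer.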
brutal victory boss congrats
Right on, Fred. I see a big part of the next chapter of the web being about services and platforms that come up with better ways to harness the goodness of Crowdsourcing as a way to create scale/aggregate knowledge (and metadata around same), Active Curation (by the service provider and power users) as a means to bridge the gap and amplify the links between all of the loose ties, and Official Content as a way of forking between trusted/verified sources and non-trusted/verified ones. Interesting times.
Fred, welcome back. This blog and its comments are a breath of fresh air in the daily grind 😀
thanks. i’ve got a good one i am working on for tomorrow
The ongoing conversation attached to this post was spurred by Fred’s use of 8tracks and his musings about humans first, then computers. The same idea is displayed in a recent NYTimes article about Pandora (The Song Decoders, Walker, R., NYTimes.com, Oct 14 2009, online at http://www.nytimes.com/2009…), which told us how the service uses humans to define the characteristics of a song. Those characteristics are then input to let the machines figure out how to deliver the most appropriate song to a listener based on that listener’s habits. At the end of the day, all the data from the service are dumped into the pot and computers then spit out the results on the other end. That raises the question: can we ever program computers to assign the characteristics AND generate the results? As a society, do we ever want to program machines to actually think for themselves? Where does it start and when would it end? Can a machine ever be the best DJ in the world?
Tony, preaching to the choir here, I know, but there’s an important distinction b/n 8tracks and Pandora, despite that both start with people and then apply technology.1. Scalability. 8tracks has the advantage of scalability in its approach. People already create playlists for themselves and friends, around the world, and there’s no special musical knowledge needed to participate — just a passion for music.2. Cost. While Pandora has its contributors on payroll, a subset of 8tracks’ contributors may be willing to pay for access to a set of premium features (if Live365 provides any lesson), generating revenue rather than expense.
Agreed. I think the point of my post was not so much about the differences between Pandora and 8tracks, which should certainly be clear. It was more about the future of computing and when we get to the point that we program machines to think like we do. How will that affect society as we know it? Getting a little heady here, lol.

But to your point, the distinction between 8tracks and Pandora is clear, in that 8tracks creates value through the efforts of community, while Pandora is simply an enhanced version of a radio station, in that the player dynamically delivers the next song based on a subset of data applied by humans. I (obviously) think there is more of a business model in 8tracks, based on open standards and allowing the community to drive the application, than in Pandora’s model, which could simply become the next Clear Channel-like radio service, solely dependent on ads to pay the bills.
Yeah, wasn’t responding to that part of your comment. The AI question has long been a big one, agreed.I’m not sure I’d totally agree with the 8tracks vs Pandora comp tho – I think both are enhanced versions of traditional radio, leverage humans + internet to provide more compelling programming, and will rely primarily on audio advertising to generate significant revs.Hopefully, 8tracks will enjoy some of the success that Pandora is now seeing on the b-model front too ($40m in revs this year and now/shortly profitable)! : )
Thanks again, Fred. The best of 2009 meta-mix will be interesting. I’ll post it on NYE and let you know.

I think this is one useful way to tackle myriad “best of 2009” playlists (or myriad music or other subjective offerings on the web generally). However, I’d argue that machine-driven compilation of people’s activity is best suited for objective information (e.g. Google search).

What’s more interesting to me for music (and 8tracks) is finding a way to guide people (via proactive search, passive recommendations or otherwise) to mixes that are the best fit for their particular preferences and context. This (http://8tracks.com/numo99/b…) is my favorite “best of 2009” mix so far, and while it no doubt will have some overlap with the meta-mix (I think My Girls is on nearly every one), it more closely reflects my musical tastes (e.g. more electronic stuff).

But your general point is very well-taken: one of our objectives is to make the “next mix” selection on 8tracks more relevant (to the “seed” mix, in a sense). To do so, we’ll very likely leverage machines (e.g. collaborative filtering). Let passionate people handcraft the programming; overlay technology to connect one mix to the next.

Also, I recalled that you and I had discussed this topic on the blog before, and a quick search found this: http://www.avc.com/a_vc/200…. You’ve changed your stance ; ) I think my comment that Last.fm starts with technology was actually incorrect; the starting point is still people (what they listen to), which technology then records and uses in smart ways. So it too starts with people, just like Pandora, 8tracks and the Hype Machine.
Being a DJ and all, what I find funny about the idea of machines reading all year end chart’s and then coming up with it’s own “year end of the year end,” is that the chart computed by the machine could actually be really boring! If the machine picks the most popular tracks in the aggregate of charts, then spits out it’s own chart, it first might not be DMCA compliant, because it may pick two or three songs from a single artist that exist on all the charts. Factor in DMCA compliance and you then come up with a machine made chart that tosses out songs that are supposed to be there. Although Fred’s point is interesting in theory, in practice it is difficult to apply, because you must filter by factor’s that the average consumer doesn’t understand, thereby watering down the list so that it’s compliant. Rules really do suck, because they take the fun out of all of it. LOL!
By the way, don’t mind my horrible use of apostrophes where they don’t belong. I’m so bad at that! I wish I could edit my comments after submitting.
There looks to be only 1 repeated artist in the top 25 that we ran on Monday so the list should be fine.
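The repeated-artist concern in this thread could be handled with a simple post-filter on a machine-compiled chart: walk the ranked list in order and cap how many tracks any one artist keeps. This is a sketch of the general idea, not the actual DMCA "performance complement" rules, and the example chart and limit are made up.

```python
from collections import Counter

def cap_per_artist(chart, limit=1):
    """Walk the ranked chart in order, keeping at most `limit`
    tracks per artist and dropping the rest."""
    seen = Counter()
    kept = []
    for artist, track in chart:
        if seen[artist] < limit:
            kept.append((artist, track))
        seen[artist] += 1
    return kept

chart = [
    ("Phoenix", "1901"),
    ("Phoenix", "Lisztomania"),       # dropped: artist already has one slot
    ("Animal Collective", "My Girls"),
]
```

With `limit=1`, "Lisztomania" is dropped and "My Girls" moves up a slot, which illustrates the trade-off raised above: the compliant chart is no longer a pure reflection of the aggregate rankings.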
All I can say is that humans created computers, so humans are the ones who can also turn them off. Additionally, humans with their broader intelligence created these machines to help them and give them convenience, so it shouldn’t follow that computers will take over us; the limit of their intelligence may already have been determined by us as their creators, even as they improve their capabilities. Without us, they’ll be just spare parts.
Your headline reminded me of a conversation you and I had earlier this year… My analogy is that you have to put the train on the rails before you can expect it to start moving in the right direction. So the human part is putting that train on the rails, but once it’s there, the machine can do its magic.

btw, that has been our approach for adding value around vertical news aggregation, which is why the resulting output has more depth and quality than general-purpose, machine-only, uncurated aggregation. If curation+machine are IN in 2010, then I should get funded soon… :)
I like the post Fred, but I wanted to clarify some things … Beyond PageRank: Learning with Content and Networks http://bit.ly/7cqL5i
Thanks for the link. I’ll check it out