Oct 16, 2008

The "Content" API

Most of the APIs we think about are data-driven. Developers can use the Facebook API to build apps that run in Facebook. Developers can use the Twitter API to build applications that leverage the Twitter user base and content. Developers can use the AMEE API to build carbon footprinting applications.

So it was interesting to me to see that the New York Times announced the availability of their first API this week. That first API is also really a data API: it exposes the aggregated set of campaign finance data their reporters are using to write about the US presidential election.

But I suspect that the Times will start to explore how to turn their content into an API as well. Imagine if you could get access to all of the stories written about the Empire State Building via an API. Or if you could get all the stories written about Mike Bloomberg.
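
To make that concrete, here is a minimal sketch in Python of what such a call might look like. Everything in it is an assumption for illustration: the endpoint `api.example.com`, the `topic`, `limit`, and `api-key` parameters, and the shape of the JSON response are hypothetical and do not describe any API the Times has announced. The sample response is made up so the sketch runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; not a real New York Times URL.
BASE_URL = "https://api.example.com/v1/stories"


def build_story_query(topic, api_key, limit=10):
    """Build the request URL for stories about a topic (assumed parameter names)."""
    params = {"topic": topic, "limit": limit, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


def parse_stories(response_body):
    """Pull (headline, url) pairs out of a JSON response in the assumed shape."""
    payload = json.loads(response_body)
    return [(story["headline"], story["url"]) for story in payload.get("results", [])]


# Made-up sample response so the sketch runs without calling any server.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"headline": "Sample story one", "url": "https://example.com/story-1"},
        {"headline": "Sample story two", "url": "https://example.com/story-2"},
    ]
})

if __name__ == "__main__":
    print(build_story_query("Empire State Building", "YOUR_KEY"))
    for headline, url in parse_stories(SAMPLE_RESPONSE):
        print(headline, url)
```

Swapping "Empire State Building" for "Mike Bloomberg" would be the only change needed for the second example, which is much of the appeal of a topic-driven content API.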

Of course, content is data, but it’s a bit different. Content is unstructured data with the benefit of a lot of context, semantics, and relationships. Once the vast databases of content that exist inside the big media companies start becoming available via APIs, we can start to do some amazing things.

We’ve seen a number of companies that have built algorithms using Wikipedia data. And the things they’ve built are pretty powerful. But if instead of being limited to Wikipedia, they could use dozens of highly trusted, accurate content sources, they could probably do much more.

Slowly but surely the web is becoming more intelligent. I am not sure we’ll ever reach the nirvana of the "semantic web" but we are certainly seeing it become smarter every day and I think "content APIs" will be an important part of how that will happen.

#VC & Technology

Comments (Archived):

dave Oct 16, 2008

I haven’t looked at the NY Times API, but I wonder why they need one. Soon Google is going to support RSS for the search engine. You’ll be able to do a search that returns RSS, for anything you could search on using the user interface, and plug the result into anything that understands RSS. In other words, we already have a fine API for content: RSS. It’s not clear what’s to be gained by creating another. Maybe someone from NYT could illuminate.
1. shafqat Oct 16, 2008
  
  I’m not sure I agree, Dave. RSS is content in a raw form. What Fred is talking about is news that is organized, structured and, more importantly, contextualized. That is not something that is possible today via RSS, at least not easily. While it may not be the ‘semantic web’ that we’ve all been dreaming about, semantically classifying news, whether it’s topics or connections, is really going to enable a whole new generation of value-added services to be built on top of these platforms.
2. Tom Hughes Oct 16, 2008
  
  I think of RSS as the lowest-common-denominator of APIs; valid, and potent, but not complete. I’m pretty sure you’ll see this as another arena for competition, with NYTimes and others campaigning to get developers and site managers to internalize the Times’ API in advance of someone else’s (Bloomberg’s, or the WSJ’s, or CNN’s) API. A similar thing is going on in online brokerages: the battle is moving away from end-user websites and towards integrating trading and data platforms via APIs with third parties.
  1. shafqat Oct 16, 2008
    
    @tmcmh good points. Do you have any examples of good trading/data platforms that are enabling others to build end-user websites around those APIs?
    1. Tom Hughes Oct 17, 2008
      
      The one I’m familiar with is TD Ameritrade, among the large brokerages. I believe Schwab has an API but it’s not made widely available. Interactive Brokers has one. I’m pretty sure Ameritrade is doing this strategically so that third parties are aligned with them, and not with their competitors.
3. Tony Bain Oct 17, 2008
  
  The benefit RSS has over an API is that it is a standard, so you can pull data from lots of different sources in the same format and no customized integration is required. APIs don’t have this same level of standardization, so almost every integration is custom. While it would be cool to pull all the NYT articles on a single subject, really you probably want to pull subject matter content from an unrestricted number of sources, then slice and dice or filter or aggregate or represent or mash up or whatever to produce your desired result.

  But RSS as a current standard doesn’t have this level of flexibility or associated metadata to allow this to be effective over different data sources or different content “domains” (maybe you want to pull information on the Empire State Building from NYT, Wikipedia, tour schedules, nearby restaurants etc. and represent it as a day planning service (ok, lame example)). APIs can allow this level of detail, but you don’t want to have to be building a custom integration module for every content source site you want to access.

  Anyway, isn’t this what Web 3.0 is about, building a standardized API format that will allow this all to happen? I thought we were done with Web 2.0 already 🙂
  1. dave Oct 17, 2008
    
    That’s right. If I were them, I’d go as far as I could with RSS and parameters on the URL, and then when there’s something you can’t represent in RSS, define a namespace that adds what you need. The network effects of building on something as widely deployed as RSS are incredible. It’s like using an Atom processor in a Netbook: you can run Windows or Linux or Mac OS. If you use RSS in your API you can plug it into Google Reader or any other tool that understands RSS, and there are so many of them. All of them ignore things they don’t understand. Incredible network effects.
4. RacerRick Oct 22, 2008
  
  Because publishers are going to want to embed ads within the content, in order to pay for it. RSS could do that… but doesn’t do it very well.
PhilipJames Oct 16, 2008

I don’t see the difference between data and content. Content is data. You talk about structure, but it’s all degrees of structure. I see two kinds of APIs: development platforms (Facebook) and data (ours or many others).

Facebook, Twitter etc. allow users to build apps inside their platforms or running off their platforms. Boorah, Snooth etc. allow users to leverage the data in any way they want.

If the NYT API allowed you to pull a list of stories that mentioned the Empire State Bldg, then it’s no different than the way ours works. For them it would be full-text matching, or some basic semantic matching, and then returning relevant results. That’s a data/content-driven API.
kellan Oct 16, 2008

Oddly out of order, but the NYTimes has already built an application on top of their as yet non-existent “content APIs”. If you visit the Times building on 8th Ave, there is an installation in the lobby which cuts across the content in a variety of ways: photo captions, ingredients, questions asked, headlines. Worth checking out. (And Dave, RSS is an envelope, not an API, but for straightforward use cases they’re interchangeable.)
leafar Oct 16, 2008

We have applied the API to people’s content; it’s time we hit the content to bring it to people.
christopher Oct 16, 2008

nytimes is a great content company, but not very good at monetizing that content. perhaps this opens up a new revenue stream for them…slap on utility-based pricing to the API, set up a billing/reconciliation system with developers, and differentiate the API from other ways to obtain nyt content like rss, and you might have a new approach….
awilensky Oct 16, 2008

Hey, I have a question: why on G-d’s good earth is there no generic service out there with a free API to store user profile and preference data? In other words, why can’t I just get a few of the FB or whoever functions for my client’s use without having to take the whole platform? They could charge for it, or have a small tiny banner at run time for a click here to join.

But honestly, my only choices are build it myself, use the whole platform, or go 2 tier with MySQL from scratch. Am I missing something? Why can’t there be a service where I can instantiate and store profile data and preference variables without having my clients with existing sites migrate their entire user population to XYZ social platform?

I am stooopid?
1. Tony Bain Oct 16, 2008
  
  Microsoft tried with MSN then Windows Live (they were really pushing it in 01, 02 from memory) but it never really took off because no one wanted Microsoft controlling their user data. Also with many sites the user demographic data is the valuable bit so keeping this close and available for ad-hoc analytics is probably a good idea and not necessarily easily doable through an API with restrictions put in place by another service provider.
rafer Oct 16, 2008

Fred, baby, give a guy a break. No Mashery love in writing up the NYT API? Look far lower right on their developer page.
1. fredwilson Oct 17, 2008
  
  Oren asked me to leave them out of my post. He’s the one who tipped me off to it.
Ryan Merket Oct 16, 2008

That’s a really great idea. Imagine a Wikipedia API mashed with a NYT API. Talk about some really interesting projects.
RacerRick Oct 16, 2008

We are working on a project for a news customer who is going to give out all of their content with ads (text and image) embedded.

Websites can sign up and use our “widget-like” web service to display the content on their site, then use their CSS to make it look like their site and put their own ads around it.

Eventually, I hope to see all newspapers giving out their content (with their own ads inside) and letting other folks build sites around it.

How cool would it be to build a newspaper with editorials from all the big newspapers, or articles on a specific subject?

Unfortunately, there’s not much revenue to be had on “old news” and there are not many publishers who want to let go of the control.
Don Jones Oct 16, 2008

Fred,

Your comment in the post, “But if instead of being limited to wikipedia, they could use dozens of highly trusted, accurate content sources (emphasis mine), they could probably do much more,” is interesting as far as the issue of trust and accuracy goes.

The level of “trust” in major news source organizations has dropped dramatically over recent years, especially as these organizations have tilted to one end of the political spectrum or another.

Developing a system that highlights bias, such as the recent work to identify left-leaning/right-leaning article citations, is an interesting and challenging field. How to monetize that type of system is perhaps even more difficult.
1. shafqat Oct 17, 2008
  
  We’re trying to do exactly that at NewsCred. It’s certainly challenging. I think the monetization has to come from a related service, not directly from the one that highlights bias or political leaning. You can leverage the latter, but it’s difficult to build a business around a feature.
Phyllis Oct 17, 2008

Increasingly, content sources are building dynamic sites, where their presentation layer (what you see) makes a page request through an underlying API for dynamic construction of the page. This may include a search input for requesting the source’s indexed content.

We (MCN) build mobile search services by federating to local content sources. All the transaction-based sources readily make their APIs available to us. The revenue stream is clearly definable: more traffic leads to more sales.

Monetization is not as clear for information sources, and they have lagged behind technologically. What NYTimes is doing, in exposing that layer for externals to build their own applications, is making their own content act as advertising for themselves. As first movers, they are most likely to be included in the new apps being developed for the PS and Mobile internet, increasing their value as a trusted source. This increases their potential reach far beyond that available from print distribution and causes their advertising rate to increase (based on circulation).

If they can continue to push ads with their content, then opening their content to developers is an advertising goldmine. NYTimes might want to add an affiliate program to share ad revenue with the application vendors if the vendor leaves in the ads.

Even if the new applications strip out the original ads, they still manage to increase NYTimes reach. Increasing the distribution increases brand awareness for NYTimes. A win for all parties.
gregory Oct 17, 2008

Right on the money. Have you seen the Daylife API? I never understood Daylife as a company until they pitched us on their API. You can use it to easily make a page about the Empire State Building or Bloomberg from just one source (e.g. NYT) or a variety of sources. Maybe AVC topic pages are next?
michels24 Oct 17, 2008

Some of the content API calls you mention above are already possible on Yahoo! BOSS. A developer can do a news search call and just restrict the results to come from nytimes.com, or restrict it to multiple news sites if they choose. As a side note, more advanced operators like this are another distinction between RSS and APIs.

-bill
Nathan Lipson Oct 17, 2008

Thanks for the information Fred! I’ve been trying for some time now to convince my co-workers that we should open our platform. And since we are a content company, I can relate entirely to NYT’s step.
bombtune Oct 17, 2008

When do you think blog formats like TypePad and WordPress will enable an API?
Tom Hughes Oct 17, 2008

I think demand for such a service will come from below and spread virally. It’s slowed by two fears — the fear that you’ll be falling into the clutches of a particular company (GOOG or MSFT), followed by a generalized fear of big-brother-ishness. But it will be too useful to keep down forever; I’m looking for something like an API to the Google Toolbar, where the toolbar stores your vital stats and you authorize websites you visit to get the data they need. Maybe having a corporation do that is a bad idea, maybe a non-profit subsidiary of Mozilla would be more trustworthy…
Andrew Warner Oct 17, 2008

Too bad the Wall Street Journal is always left out of these discussions.
Eran Shir Oct 17, 2008

So I couldn’t resist the temptation. Here’s an API for all NYTimes stories for a query: http://www.dapper.net/dapps… And the Mark Bloomberg stories result in RSS: http://tinyurl.com/MarkBloo…

People have been using content APIs for quite some time now to build innovative stuff. It gets interesting now when businesses start to understand that distributing their content is actually a very effective (and probably the most effective) marketing strategy. In fact, that’s exactly where online marketing/advertising and APIs/the semantic web meet.
Andrew Davies Oct 20, 2008

Hey Fred, the content API for http://www.idiomag.com will be live very shortly. For example, you can call the latest music news, reviews, bios and media from an artist name or genre.

Andrew
dgottfrid Oct 22, 2008

We are working on things; stay tuned. For those that really want an RSS feed of a given subject, we already have those:

Mike Bloomberg: http://topics.nytimes.com/t…
Empire State Building: http://topics.nytimes.com/t…

In fact we have 13k of those. Every Topic Page comes w/ an RSS feed: http://topics.nytimes.com/t…
1. fredwilson Oct 22, 2008
  
  Derek, I love what you and your colleagues are doing. Fred
Justin Oct 22, 2008

Am I being a literalist? Or is this what you mean by “nirvana of semantic web”?

MTVN Content API – Nirvana Video Method: http://api.mtvnservices.com…

The MTVN Content API is also mentioned in the NYT blog: http://developer.mtvnservic…
Chris Phenner Oct 22, 2008

As a fellow, long-time customer of Mashery (who hosts the Times’ API), I’ve had a content API live for about nine months, first on a limited basis, and it’s been thrown wide open for six months.

My request: please lose the quotes around the term “Content,” as they label the term as if it’s newer and more novel than I think this use of APIs deserves. I feel like a PETA activist for APIs.

Reuters is well ahead of the Times, whose “Campaign Finance API” (that deserves quotes) is toe-in-water at best relative to Reuters’ OpenCalais (also Mashery-hosted), which is basically a way-closer-to-realized example of what Fred’s post is all about. My strong hope is that the Times goes way beyond “Campaign Finance” data (quotes mine again), which I’ll bet isn’t even putting the Times’ data at risk.
Salvatore Incandela May 5, 2009

Hi all, I’m a software developer and I’m deeply interested in this topic. I can’t resist mentioning this open source platform aimed at following this content publishers’ trend: http://madstore.sourceforge…

Any suggestions would be appreciated, thanks!