Feb 18, 2010

How Unique Is A Unique Visitor?

I access the web each day from at least seven browsers:

Chrome on my MacBook Pro,

Firefox on my MacBook Pro,

Chrome on my Windows desktop in the office,

Firefox on our "kitchen laptop",

Safari on our "kitchen laptop",

The Android browser on my Google phone,

The BlackBerry browser on my BlackBerry 9700.

I know that I am not your typical web user. I keep both Firefox and Chrome open at the same time on my main machine (my MacBook Pro). I use three machines most every day: my MacBook, my office desktop, and our "kitchen computer". And I use two mobile devices every day.

But I am illustrative of something important. Many people access the web from multiple devices and browsers on any given day.

Each of those browsers drops a cookie identifying me as a "unique visitor", so the web analytics software a website is using can count me as up to seven unique visitors when I am only one.
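To make the overcounting concrete, here is a minimal sketch in Python. It assumes a hypothetical hit log in which every record carries the cookie ID the browser presented, plus a `person` label that real analytics software would not have. The names `hits`, `cookie_id`, `person`, and `count_uniques` are illustrative and do not come from any real analytics product.

```python
# Hypothetical hit log: each browser holds its own cookie, but every
# hit below comes from the same person. All values are made up.
hits = [
    {"cookie_id": "chrome-macbook", "person": "fred"},
    {"cookie_id": "firefox-macbook", "person": "fred"},
    {"cookie_id": "chrome-office-desktop", "person": "fred"},
    {"cookie_id": "firefox-kitchen-laptop", "person": "fred"},
    {"cookie_id": "safari-kitchen-laptop", "person": "fred"},
    {"cookie_id": "android-phone", "person": "fred"},
    {"cookie_id": "blackberry-9700", "person": "fred"},
    {"cookie_id": "chrome-macbook", "person": "fred"},  # repeat visit, same cookie
]


def count_uniques(hits, key):
    """Count the distinct values of `key` across all hits."""
    return len({hit[key] for hit in hits})


# What cookie-based analytics would report as "unique visitors".
cookie_uniques = count_uniques(hits, "cookie_id")
# What the count would be if the software could see the person.
actual_people = count_uniques(hits, "person")

print(f"cookie-based uniques: {cookie_uniques}, actual people: {actual_people}")
```

The repeat visit is deduplicated because it reuses a cookie, but nothing in the cookie data ties the seven browsers back to one person.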

I read some research done by a startup called Scout Analytics that reveals some interesting data on this trend.

Scout Analytics is a "behavioral analytics" provider. For this research, they tracked web users using some interesting techniques:

Scout Analytics used tracking techniques of device and biometric
signatures to follow the behaviors of hundreds of thousands of named
users accessing paid content products. The biometric signature
identified unique users through an individual’s typing pattern to
eliminate errors in user counting such as account-sharing. The device
signature identified unique devices through data elements collected
from the browser to eliminate errors in device counting such as cleared
cookies. By correlating the named user account, biometric signature,
and device signature, an accurate mapping of individual to devices
could be produced.

And what they found was that a "reliable cookie" overstates user counts 2 to 4 times. That's right, if your analytics software uses cookies, it is possible that your unique visitor counts are 2 to 4 times too high.
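As a rough illustration of where such a multiple comes from, the sketch below computes an overstatement factor from a mapping of users to devices. It assumes data shaped like `(user, device_signature)` pairs, which is only a guess at what a study like this might collect. The observations are invented, the function names are hypothetical, and this is not Scout Analytics' method.

```python
from collections import defaultdict

# Hypothetical (user, device_signature) observations; all values are invented.
observations = [
    ("alice", "laptop-1"),
    ("alice", "phone-1"),
    ("alice", "office-pc"),
    ("alice", "tablet-1"),
    ("bob", "laptop-2"),
    ("bob", "phone-2"),
    ("carol", "laptop-3"),
    ("alice", "laptop-1"),  # repeat observation, adds no new device
]


def devices_per_user(observations):
    """Group the distinct device signatures seen for each user."""
    devices = defaultdict(set)
    for user, device in observations:
        devices[user].add(device)
    return devices


def overstatement_factor(observations):
    """Distinct devices (roughly one cookie each) divided by distinct users."""
    devices = devices_per_user(observations)
    if not devices:
        raise ValueError("no observations to measure")
    all_devices = set().union(*devices.values())
    return len(all_devices) / len(devices)


factor = overstatement_factor(observations)
print(f"cookies overstate users by about {factor:.2f}x")
```

A device shared by several users appears once in `all_devices`, so sharing pulls the factor down while multi-device users push it up.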

This is just one study by one company and I'd love to see more research on this topic, but this is not the first time this issue has been brought up on this blog. In April 2007, I wrote a post citing some comScore research that suggested that cookies overcount unique visitors by 2.5x.

Our portfolio companies are asking me all the time why panel-based third-party measurement services, like Nielsen and comScore, always generate unique visitor counts that are way lower than their internal numbers. I like to explain to them that there are many ways to count unique visitors and none of them is perfect. The panel-based approach, used by Nielsen and comScore, is certainly "old school" and seems out of place in a world where you can just look at server logs. But server logs themselves are subject to inaccuracies.

I advise everyone to triangulate between the various approaches to get some idea of the real numbers. I don't think any service out there today can give you an entirely accurate read. And that is why I am so enthusiastic about comScore's hybrid approach of marrying panel data and tracking pixel data.

I was a founding investor in comScore over a decade ago, was on their board for ten years, and a fund I help manage still owns a very small amount of comScore stock. So I am not unbiased in this discussion. And this discussion is not just about comScore. It's about web measurement and why, 15 years after the advent of the commercial web, we still aren't measuring it well enough.

#VC & Technology #Web/Tech

Comments (Archived):

Louis Berlan Feb 18, 2010

Google is working in the direction of identifying people, rather than computers. Hence the synchronized bookmarks across different Chromes, the same search history if you’re logged in, etc. Although to them it’s mostly a question of following people across the web, to get more relevant ads in front of their eyes – I don’t think measurement is the main objective. Would be interesting to see what solution they would think up, with all that data.
brisbourne Feb 18, 2010

We all like to think of the web as a uniquely measurable medium and sell it to advertisers accordingly – when in fact we don’t know as much as we would like to think. As you point out in this post visitor numbers can be out by 2-4x (and more if you count home computers etc.) and beyond that tracking the impact of display ads is very difficult, as the click doesn’t do justice to their impact (something you have blogged about before). Cracking these problems would provide a major boost for web startups. I don’t know if you saw, but last week here in the UK Yahoo partnered with the largest loyalty card provider (Nectar) to both target ads across Yahoo properties based on Nectar profiles and then track how their offline shopping behaviour correlates with the ads they’ve seen. I’m looking forward to the results.
1. Aviah Laor Feb 18, 2010
  
  True. But the same for TV. At least when a cookie hits the server you know that the person wasn’t in the toilet.
2. ShanaC Feb 19, 2010
  
  where will those results be. And it isn’t anything unexpected. Just annoying that we’ve only come to terms with that now.
HyperActiveX Feb 18, 2010

This may be a dumb question, but don’t any of the analytics take your IP address into consideration? That should at least take care of multiple browsers from the same node.
1. Mark Essel Feb 18, 2010
  
  I think that doesn’t work Hyper. Many pcs/interfaces share an outbound node (local router).
othylmann Feb 18, 2010

I wrote two articles about the subject in 2000, looking at log analyzers and counters back then. I ran a counter, so this might not be unbiased, but the important point is: Both systems have their weaknesses, but the important thing to remember is that you have to understand how each system defines a unique visitor to understand the information each system gives you. (http://www.clickz.com/827831 , first one: http://www.clickz.com/827591) There is no correct way to do this. You just need to understand what you are collecting and how it is collected to really get most out of the data.
1. Aviah Laor Feb 18, 2010
  
  log analyzers are based on in-house data that can be easily gamed.
  1. othylmann Feb 18, 2010
    
    We are mixing two different things here. One: Knowing what kind of / how many visitors/users/real people you have on your site. Two: Publishing somewhere to compare with others. One in my mind is the important one in the context of the post. Two is totally different and just needs a decision on where the people you need to reach with that statement go looking for their data.
jonathanmendez Feb 18, 2010

Fred – Your points are spot on but only scratch the surface of the issue. I just blogged some thoughts on cookies (vs. links) this past Sunday — how cookies are negatively impacting the coming data revolution. http://bit.ly/asVRgc
1. fredwilson Feb 18, 2010
  
  you phrase it as “the link vs the cookie”. should that be “the click vs the cookie”?
  1. jonathanmendez Feb 18, 2010
    
    it probably should be considering the user-agent also contains valuable data that can be used along with the url (link) parameters.
  2. Desire Athow Feb 19, 2010
    
    in your article “dedade” should read “decade” that’s last paragraph.
2. Mark Essel Feb 18, 2010
  
  Nice writeup tipping your hat to the link economy. The value of a link isn’t visitors/unit time it drives, but the aggregate experience of the browsers who follow that link. If they appreciate the link (like I appreciated you sharing it here) it has stronger value and growth potential.
robertavila Feb 18, 2010

On the other hand my daughter hops on my machine to hit facebook, gmail, et al and to buy books for herself and my tech challenged wife on my Amazon account, which at this point sincerely believes that I have strange and unique reading habits. Such behavior may not be uncommon in many homes. At the high end of the multidevice / income distribution we may be over counting uniques and at the low end under counting them…
1. William Pietri Feb 18, 2010
  
  Back in Ye Olden Dayes of the Webbe (late 90s or so), that was certainly our theory for why cookie tracking (or IP tracking, for that matter) was a reasonable audience measure. Some people used multiple devices, some people shared devices. Both were sources of error, but it was plausible that they roughly canceled out. Now, though, devices are so much cheaper that it’s worth questioning that assumption. One factor that pushes the other way is the rise of personal devices. A decade ago I might have borrowed a friend’s computer or a cafe computer to check my mail. Now people are much more likely to use a laptop or an iPhone than a foreign computer.
  1. robertavila Feb 18, 2010
    
    This is true, however casual empiricism has its dangers. US household income divides roughly into thirds with a third of the income going to the 2% with income above $1 million per year, another third to the 18% with income between $100k & $1 mill and the last third going to the 80% below $100k. In low income neighborhoods there are still plenty of internet cafes. I am not arguing that they cancel out but only that distortion goes in both directions, from individuals with multiple devices to devices with multiple users. Computers are no longer personal, they are merely means of access, some of us have multiple access while others access wherever they can. We are not typical users, because the typical user no longer exists.
    1. Mark Essel Feb 18, 2010
      
      Frame the problem with total usage, but also include the monetization potential of visitors. What we want to measure is total purchasing power scaled by their average willingness to follow a link or make a transaction. Fred likely spends more on swag for his family than the average web browser. Actually following Pareto’s principle maybe sites should be catering to only the 20% wealthiest web browsers. Here’s their source IPs…
      1. robertavila Feb 18, 2010
        
        Purchasing power is almost definitionally correlated with number of devices owned and I would not be surprised if transactions via devices were not also correlated with. In which case Fred’s question may be moot: better to count him a dozen times over a dozen devices than to count a dozen individuals over their sole access device. As for just catering to the top 20% remember that WalMart got to be the world’s largest and a most profitable retailer by targeting the lower 80%. What matters is targeting the right customers for what you are selling…
      2. Mark Essel Feb 18, 2010
        
        Who doesn’t love Walmart? Me. They’re a logistics juggernaut, and brand king but I can’t resist their economy size packs of boxer briefs… 🙂
      3. ShanaC Feb 19, 2010
        
        It’s where along the pareto curve you are targeting. This is an interesting question that I’ve been thinking about: How will this affect search, are we going to do class based search (like I get results that are a little too tailored and I dislike it, because in reality they are off)? Will different levels of education use search differently and the next gen search engine really be like hunch and we’ll separate out into our own little worlds of class, gender, whatever? There are huge problems with that if that is the case. The internet in some ways allowed specific groups of people to rise above all that. Will secondary/tertiary internet growth stop that behavior?
      4. robertavila Feb 19, 2010
        
        Way back in the really Olde Days before the 1990s target marketing meant knowing what kind of people lived in which zip code so you could send them the right catalogs. The internet was supposed to be about buyers seeking out rather than sellers reaching out, but as every seller knows you won’t be sought out if no one knows you are there. Hence the need to advertise, to establish consumer awareness. Half of all advertising is wasted, but no one knows which half. The trick is to target precisely but not to look like you are targeting precisely so that the target can say, Oh look what I just discovered! and not feel like it was too tailored. Good advertising has always been as much about crafting the message as it has been about targeting.
  2. Mark Essel Feb 18, 2010
    
    Yeah, Fred’s not alone in his over browser, web capable device usage. The average number of web browser capable machines per person will continue to rise, we can probably plot them with a Moore’s law curve 😉
Elie Seidman Feb 18, 2010

Focus on users and visits, not uniques. There is so much (too much) talk about uniques but what would you rather have? 500K visits per month with 10 minutes on site each or 2M uniques with 2 page views (nearly no time on site) each? If the same user visits from 5 different computers, they are one unique but 5 very real visits – visits don’t lie.
1. debunker Feb 18, 2010
  
  umm. visits are good to know yes, but advertisers care about uniqueness (reach) and influence (frequency). still gotta get to better unique #s.
  1. Elie Seidman Feb 18, 2010
    
    The advertisers we speak with don’t care much about uniques beyond the fact that there is a critical mass that you need to be at before they can invest the time to advertise with you. What they care about are what those uniques are actually doing. The hardest ad sale out there is “I have lots of uniques with low engagement who are demographic X and demographic X sometimes buys your product”. It’s vastly more compelling when the user – whether 1 of them or 1M of them – are actually specifically manifesting intent to buy product X (vs just being connected via the transitive property to product X). Therefore my point about focusing on users, not uniques. Of course, it goes without saying that if your users are valuable, you want to have as many of them as possible.
  2. Saumil Mehta Feb 19, 2010
    
    Amen! Yes, advertisers care about reaching unique users, heavily engaged users that drive 90% of the volume is great for the fan base and for the site’s lasting prospects but that by itself does not produce high advertising dollars. Uniques, on the other hand, most definitely do.
2. Aviah Laor Feb 18, 2010
  
  visits are skewed for the heavy users, and this can significantly distort the picture for mid-small properties
  1. Elie Seidman Feb 18, 2010
    
    Not sure what you mean or why the skewing matters? Almost any product or company in existence has a subset of heavy users who drive most of the volume. If you have 10M uniques but 1M that actually do any real engagement, the 1M uniques (and the visits they drive) are the ones that matter.
    1. Aviah Laor Feb 19, 2010
      
      Agreed. But many companies want to present reach which is traditionally based on the number of people. By Skewed I mean that if you quantcast large websites the charts for visits and people are very similar, because the large crowd offset the heavy users. For smaller websites you will find a bigger gap between visits and uniques
      1. Elie Seidman Feb 19, 2010
        
        Ah, interesting. Does Comscore data show the same thing?
      2. Aviah Laor Feb 20, 2010
        
        don’t know that
    2. Aviah Laor Feb 19, 2010
      
      In other words, the larger the site, the average visits per unique stabilize, in small sites the heavy users skew this
  2. ShanaC Feb 19, 2010
    
    Yes, these distortions are probably significant if your website is tiny. This website is claiming that another website has 120 hits on certain days. Due to the audience being both tiny and probably heavy users, there are probably far fewer hits. Though who knows, it’s way tiny when it comes to measurement, and way specific. So I agree with Elie here: there will definitely be heavy users and they are probably the ones that matter. However, when you are tiny and need to grow, your heavy users could be everything or they could be a hindrance; this is the essential problem of crossing the chasm. We know that 120 number is wrong, but the people who go there are probably very hard core about the website in question. If we assume the real number is 60, then we have a serious problem when the new question is, how do you grow? And this is true if it wasn’t a small niche non-profit website that someone runs for fun. (I think) It matters if you only have 10M uniques and 1M are hardcore. Are you willing to drop 1/4 of those hard core people to expand up your uniques total into more medium core people, of which there seems to be a lot more of in this world.
    1. Mark Essel Feb 19, 2010
      
      It’s a tough question, but each content provider, or web business has to determine their audience. Who are they aiming to please, 120 hard core fans, or 2000 somewhat interested people? Revenue usually drives that decision. Me personally, I’d opt for 10 uber fans that I give awesome value to. They can help build a solid fan base by word of mouth. I can’t convince a large group that I’m worth tuning into or helping found a business, but 10 zealots know how to convince their friends of something they believe in. It’s the power of genuine marketing. It’s something you can’t fake.
      1. ShanaC Feb 19, 2010
        
        I wish I could be the same. If I were building a company right this second, I would be driven by adoption and usability. I would be aware that there are my hard core fans and I mean, I would help them, but at the end of the day, they aren’t the bottom line (or the only part of the bottom line). I’d be afraid to get too interlocked with my hard core fans. If I screw up with them, switching costs become emotional and they would ditch in a heartbeat. Whereas someone who is locked in because they are lazy… I’d rather have that person. That’s me though. There are definitely multiple schools of thought on the matter, and it probably depends on what you are building and who is it for. The switching cost for an individual is much less than for an organization, for example…
    2. Aviah Laor Feb 19, 2010
      
      The problem of crossing the chasm actually is that the heavy users won’t get you there. If you focus on revenues, then having enough heavy users is great, but if you build a viral loop you will also need the more light users, each to bring his own network. The more users you have, the more the avg visits per user flatten. Dropping the heavy users is probably damaging, since in the web 2.0 websites they are the 2% that make 80% of the user generated content
      1. ShanaC Feb 21, 2010
        
        I still think we need a closer look at this model. It’s not a super logical model - it doesn’t really explain how other people are using the site at all…
3. ShanaC Feb 19, 2010
  
  FYI, this comment makes me really like subscription models of some sort in combination with uniques. People bite with both time and money. And it is a hell of a lot easier to keep track of a subscription. Plus, not everything can be advertising. We have to come to terms with that…
  1. fredwilson Feb 19, 2010
    
    freemium
    1. daveevans Feb 19, 2010
      
      I consult and advise many dating sites, this is a fascinating and frustrating topic. There is popularity in terms of advertising (pageviews), popularity in terms of ranking (“We’re #1!”) and most important to consumers, “how many people are on the site that can actually write me back” (unique logins). The undercounting for panels is ridiculous. This is all seriously broken and I can’t believe we’re having this conversation after 15 years. Kill the cookie (and don’t forget Flash cookies). If server logs are the best way to measure, what can be done to get smart people turned on enough by the problem set and a clear view of the financial upside to take this on? Or is it a consortium of large sites agreeing on a protocol and shared data? I don’t know and I’m sure this has been talked over but there is an enormous amount of money at stake here and who knows how much is left on the table due to inaccurate measurement.
      1. Tereza Feb 19, 2010
        
        I’m very sceptical about consortia. I have led major work on several significant ones in the advertising space, for massive companies who thought they wanted to be transformational and rock the industry and had the power to do it. But no matter how much the upside and how fabulous the vision, there’s not enough skin in the game or trust on behalf of any players to ultimately pull the trigger. The consortium structure deneuters the gutsy/entrepreneurial drive which a truly transformational change requires. It’s a lucrative market for consulting fees, though. If you want to get anything actually done, stay away from consortia. Be a pied piper instead. Do something small but amazing and have the others jump into your caravan.
Tereza Feb 18, 2010

So…our financial models, if based on predicted usage, should inflate uniques by 2-4x? ;-) In all seriousness though – if we take as a given that all of them are wrong, when looking at comps do you view them as equally wrong? i.e. do you apply in your head a uniform discount factor or in your head are some more over/understated than others? I would think practically speaking we keep comps apples to apples. What techniques have you seen or employed to pass the sniff test? (for one that hasn’t gone live yet)
1. fredwilson Feb 18, 2010
  
  we just triangulate between all the data we’ve got and do an approximation and accept that is what it is
  1. Tereza Feb 18, 2010
    
    thx. sounds reasonable.
  2. Mark Essel Feb 18, 2010
    
    Triangulation is great for getting an estimate now, but tracking with dynamic system models is a better estimator. If it works for real objects, it may work for real visitors.
Frank Denbow Feb 18, 2010

I was just reading about Nielsen yesterday (article from last August: http://www.nytimes.com/2009…) They do use meters in major markets but I still wonder if there are viable alternatives that can be more accurate and give more robust information (i.e. at what point in a show did viewers tune out because they were bored)
Jeff Greenfield Feb 18, 2010

Fred - Part of the problem has been the approach we have taken with the Cookie in not educating the public on their importance. Besides ‘tracking’ … Cookies can provide individuals with a unique identity online … regardless of the device they use. We have worked on a solution in our lab – but implementing it would take a massive education effort – but the time may be getting close ; )
1. fredwilson Feb 19, 2010
  
  sadly cookies have a bad name thanks to people like walt mossberg who have lambasted them.
Dave Pinsen Feb 18, 2010

“if your analytics software uses cookies, it is possible that your unique visitor counts are 2 to 4 times too high.” That’s a little discouraging. One thing I haven’t tried is combining Google Analytics and FEEDJIT on the same site (though I’ve used them separately – Analytics on commercial sites, and FEEDJIT on blogs). Comparing the results of the two on the same site over the same time period might be interesting.
1. Mark Essel Feb 18, 2010
  
  Discouraging for me too Dave, that means you’re the only one that reads my blog, I thought I had 3 readers 😉
  1. Aviah Laor Feb 18, 2010
    
    well actually you have 4
    1. Mark Essel Feb 19, 2010
      
      Hope each visit gives you something fun to chew on.
      1. Aviah Laor Feb 19, 2010
        
        you have a good blog. now you have to convert those insights into $$$
      2. Mark Essel Feb 19, 2010
        
        The market price for my insights is still under the wage I can earn for shovelling snow. What do you think I could do to make visiting the blog more fun? Maybe categorize my posts into design, startups, tech/code versus far out versus philosophy/mumbo jumbo
      3. Tereza Feb 19, 2010
        
        Mark I think that’s a good idea. I love far-out thinkers and really dig things you have in there. Have meant to go back for more. But many people need a wrapper that helps them know where/why a post is useful.
      4. Mark Essel Feb 19, 2010
        
        That’s a toughie. I think I need to learn more about my readers (even occasional ones) to better relate specific posts and shoot them their way. “Hey this post is about startups, Tereza might like it, I’ll send her a tweet/email.” What do you think of a twitter account by topic? I can setup multiple streams to update tweets based on the tags I associate with a post. So you could follow a topic of your choice, or unfollow an area which doesn’t resonate.
  2. ShanaC Feb 19, 2010
    
    I stop by occasionally…I need to get myself organized that’s the problem…Something else I realize, I really write for myself, if something happens, that’s ok. I’m not writing for the comments.
    1. Mark Essel Feb 19, 2010
      
      My primary reason for writing is selfish too Shana, helps clear my mind for the day. But blogging also lets me connect to people asynchronously, organize my thoughts in a communicable way, and build up a backlog of ideas that become refined with feedback.
2. ShanaC Feb 19, 2010
  
  I see lijit as the search engine, although I get results back from them. I wish I had a better understanding of analytics so I could compare the two…
Josh Feb 18, 2010

Focusing on trends and whether there is improvement over time towards meeting goals can be very liberating. Different analytics tools measure uniques in a variety of ways, and regardless of which tool or technology is being used to track visitors, there’s a fair amount of under/overcounting. It’s easier to measure the impact of an ad campaign, a redesigned page or funnel, or conversions over time than it is to collect, process and integrate all the technical details about collection to understand and minimize inflated counts.
Pascal-Emmanuel Gobry Feb 18, 2010

It’s fairly easy to imagine a system that accounted for each unique visitor. But precisely for that reason, that system would be a privacy nightmare.
Mark Essel Feb 18, 2010

You could relate authenticated users to total users using a simple ratio. Take a small sample, understand the average number of anonymous visitors to authenticated users (easy to measure) and scale auth user count. A friend and I at work independently estimated the amount of change saved in a jar by a coworker over three years to within $20-30. Some novel user identity services would clear up visitor count quickly (Kynetx). Jesse Stay covered them well last November.
1. ShanaC Feb 19, 2010
  
  I wish I had a save this post to somewhere else feature. This is a good idea…
taylorwc Feb 18, 2010

I think you’re exactly right–I typically use at least 4: work computer, home computer, blackberry, and ipod. I also wonder about referring site statistics vs. direct visits quite a bit. For instance, if I see on an rss reader or on Twitter that there is a new AVC post, but don’t have time to click and read immediately, I frequently make a mental note and go directly to the site later in the day. The metric tracking does continually improve, but perhaps at a slower rate than the rate at which people are creative and fickle.
ErikSchwartz Feb 18, 2010

It’s really easy to fool yourself by tracking irrelevant metrics. Just because your app is downloaded doesn’t mean they’re all active users. You need to cross reference a variety of data points to really understand where you are.
1. Mark Essel Feb 18, 2010
  
  Right on Erik. Measurements aren’t the best estimate of truth. Refinement and validation of a solid estimation model works much better at getting close to actual metrics.
2. fredwilson Feb 19, 2010
  
  that’s why we funded Pinch, which after the merger is now called Flurry. mobile app developers need to know how actively their app is used, not how many times it has been downloaded
Florent Peyre Feb 18, 2010

I agree with the triangulation strategy: always compare your Google Analytics logs vs. Omniture (if you have it), ComScore data (if you have access, it’s not cheap) and Quantcast and Compete which are mostly (even if I’m very frustrated by publishers that decide to “hide” their data on these services – huge statement of weakness). That will take time but it’s pretty critical to be on top of that to be able to bring answers to investors, clients etc., especially when ComScore undervalues you. And then, put the ComScore beacon and the Quantcast tracking tag on your site to push out more accurate data (the only downside is that you keep accumulating JS and tags on the backend of your site…).
BmoreWire Feb 18, 2010

Good question and all but, in reality it doesn’t really matter. You just need to come up with a valuation formula for impressions per cookie and the value that you get out of that. That formula is hard enough in itself dealing with refreshing cookies and repetitive cookies and removing all of the Gaussian noise. But at the end of the day you have input (impressions/cookie) and output ($$ made or products sold or whatever your value metric is) and then figuring out a filtering system in between the two of those that can most accurately predict the future.
Darren Herman Feb 18, 2010

My favorite call from a client is when they ask why Google Analytics is showing different reports than Coremetrics. Your post above validates the fact (along with thousands of previous posts by people part of this ecosystem) that it’s broken. Triangulation is what we say as multiple providers never measure the same way. Each provider has to have their own “special sauce” which actually hurts the industry when we need to move forward.
Roy Rodenstein Feb 18, 2010

Fred – very much agreed, and there are cases where *under*counting happens as well, e.g. if a user’s internet connection is slow or the whole page fails to load, sometimes they are not counted (especially if the Javascript code is at the bottom of the page or loaded dynamically, which is recommended to minimize impact on the user experience/page load speed). Of the newer wave of services, we found Quantcast, which also takes a hybrid approach in its own way, to be the most accurate for our traffic at Going.com. The biggest problem I have seen with comScore is for sites under 5M uniques or so, the panel-based approach just doesn’t have enough data it seems so the standard deviation of their miss is bigger for small sites.
Eric Leebow Feb 18, 2010

I think the value of a unique visitor is not someone who just visits your website, it’s someone who talks about, and facilitates conversation amongst other people, in order to add uniques through word of mouth conversation. Your greatest asset is not just that of a unique visitor, yet the unique visitor who will say, “I really like your blog” or I enjoy what they write about your blog elsewhere in the social media sphere. Each of these people who share your blog are unique visitors, and it’s safe to say that the uniqueness of you browsing on other browsing methods is not so unique, yet you as a visitor is unique. This makes you a “uniquifier” to an extent, someone who leads more child uniques that relate back to that one unique parent. Think of it similar to wireless networking, where you can have a wireless replicator, yet this replicator is latent since there is not yet a true device and methodology of the ultimate data portability. Suppose there were an ultimate portable cookie, then you’d be just one unique, as your multiple browsers, devices with browsers do not define you as someone who is unique. Someone who brings visitors to your blog is what is called “uniqueator” which is similar to an educator is someone who teaches, the uniqueator turns others into unique visitors. In an ideal world, data should be portable, and sync to all of your devices, browsing methodologies, and whereabouts.
Mahabharata Feb 18, 2010

and what about keyword tools like semrush? did you try it..? i’d advise to, cause it is really effective and easy in use. Just enter the url and it will provide you with a report which includes organic and adwords keywords that you can find the site, positions, CPC by and more. And its all summarized in easy to understand charts.
Kiril Savino Feb 18, 2010

Definitely not new, but I’m happy to see this kind of engineering applied to the problem. Doing exactly this (tracking patterns in mouse and keyboard use) was my “fun-time” project at DoubleClick in 2004, though I never got it to a usable state. Even then, the multi-computer user and multi-user computer were regular topics of conversation.
Hershberg Feb 18, 2010

Today’s AdExchanger has a good interview with Jeff Zwelling, CEO of Convertro: http://bit.ly/bdzqSE These guys have developed a tracking solution that eliminates the issues associated with cookie dependency (use of multiple browsers, deleted cookies, etc) – far superior to anything else I have seen. Definitely worth a read.
MattCreamer Feb 18, 2010

Great post. But it just scratches the surface of the problems with uniques, which don’t tell you anything about the quality of the audience in question.
Howie Feb 18, 2010

I LOVE THIS TOPIC. I have 4 twitter accounts. 2 Facebook. 2 Myspace. Google-Analytics I block on Firefox because I use No Scripts to prevent viruses. No need for it to run since it is of no value to me personally. I just posted on my Marketing-Sensei Blog yesterday about this, but more from a Business point of view. I do know that over 15% of my 500 friends on Facebook are not active on the site, yet they are included in the 400 million accounts? If you log in once per month your value is zero to advertisers yet you’re included in the total monthly visitors? So the problem is Advertising Based Revenue Models for some Web Properties. Facebook and Twitter should be technology selling based business models like Apple because they offer technology vs information like Yahoo. Facebook could charge $2 per month and increase their revenue 12x tomorrow and ditch every other source of revenue. (assuming all accounts are valid) So an Ad Based model means Sites want higher visitor counts and page views. They need this. And Brands/Advertisers are willing to pay for REAL numbers not fake. Nothing different than the TV ratings battles for measuring. The question is how to accurately measure these things. As a Brand I want to reach Fred Wilson but I am lying to my CFO if I say I reached 7 people yet really only reached one.
justinkistner Feb 18, 2010

Digging up the old debates! The only way to resolve a person surfing a site with multiple browsers on multiple devices is to have that person identify themselves on all devices with cookies enabled that would tie back to a data warehouse on those visitors. Since there are so few users that would be fully identifiable on all devices and browsers, it wouldn’t be a reliable number for the site. You could, however, do a few things. 1) Using the cookie and data warehouse method, you’d at least know how many unique returning visitors you have. 2) Provide the number as a range. The high number would consider each session a unique visitor and the low number would assume visitors visited the site with all browser/device combos and would calculate how many visitors there would be if each used as many combos as possible. For example, if a site tracked 3 browser/device combos for 100 visits and combo A represented 50 visits, combo B represented 35 visits, and combo C represented 15 visits, then we’d assume 15 people used combos A, B, and C; 20 people used combos A and B; and 15 people used just combo A; giving us a total UV count of 50 people. If one did a study that monitored a sample of visitors to find the number of device/browser combos attributable to the average user, you could also calculate a middle number for the UV range that would be the number closest to the correct number of UVs. I’m sure a comScore or a Nielsen could provide that constant on an ongoing basis. If they were really ninja, they could do the study based on user demographics so people could get an even more targeted constant (like the difference between average device/browser combos for tech companies vs. agriculture sites). Maybe in the tech sector people use an average of 3 browser/device combos and in the agricultural sector the average is 1.5.
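The range described above can be sketched in a few lines of Python. This is only an illustration under the comment's own simplifying assumptions (each visit counted once per browser/device combo); the function name and the `avg_combos_per_user` constant are hypothetical, standing in for the study-derived figure the comment imagines a comScore or Nielsen supplying.

```python
def unique_visitor_range(visits_by_combo, avg_combos_per_user=None):
    """Return (low, middle, high) unique-visitor estimates.

    visits_by_combo: dict of browser/device combo -> visit count.
    avg_combos_per_user: hypothetical constant from a panel study;
        when None, no middle estimate is produced.
    """
    counts = list(visits_by_combo.values())
    high = sum(counts)            # every visit is treated as a different person
    low = max(counts, default=0)  # everyone reuses as many combos as possible
    middle = None
    if avg_combos_per_user:
        middle = round(high / avg_combos_per_user)
        middle = min(max(middle, low), high)  # keep the estimate inside the range
    return low, middle, high


low, middle, high = unique_visitor_range(
    {"A": 50, "B": 35, "C": 15}, avg_combos_per_user=1.5
)
print(low, middle, high)  # 50 67 100
```

The low bound is simply the size of the most-used combo: if people overlap as much as possible, nobody needs to exist beyond those who used combo A.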
Scott Gatz Feb 18, 2010

I’d love to see more analytics products take into account “signed in user”. Many (but not all) of the services we use have login systems. When you log into a site on multiple browsers, it’s a clear signal that you are the same person. That’s an opportunity to link up all the unique browser cookies into one “truly unique user”. Our systems at my last company had that and it gave a much better picture – our internal data often more closely matched the external data from comscore and others. I can’t imagine that google doesn’t already do this, I would love to see it added to analytics. (Of course, this doesn’t solve for the users who view but don’t sign in, but it’s a step in the right direction.) And +1 to Roy’s plug for Quantcast, we use them and find it a great tool.
Aviah Laor Feb 18, 2010

Pandora’s box just opened. The key is to focus on the usage of the stats:
1. Seeking revenues: then transactions and paid customers are the bottom line.
2. Raising money: then the investor should trust the company’s more reliable in-house data, which never goes public. If you don’t trust that, you probably won’t invest anyway.
3. Identifying trends over time: use the same system. If traffic doubles on the same measurement tools, it’s probably doubled.
4. Competing against “traditional media”: The truth is that nothing scares “traditional” advertising more than facts and actual impact measures. The most flawed web analytics is much better than any impact measure of a newspaper/TV/radio ad.
5. Making some noise, provocative blog posts, “competition coverage”: write down your case, and then find the tool and the marginal query that proves your point. You will be able to prove almost anything.
Jamie Lin Feb 18, 2010

I also keep 2-3 browsers open most of the time and use my Nexus One and iPod to surf when I’m on the road. But except for the sites/services that I use daily/constantly, like Google and Google Reader, I rarely visit the same sites with 2 or more of my browsers. What I’m saying is maybe you need to discount the visitors with high frequencies but leave the ones without. With that said, having some sort of internal analytics based on logins is infinitely more useful than the numbers derived from cookies.
mojaam Feb 18, 2010

Accurate web analytics is a complex tricky thing.
Mihai Badoiu Feb 18, 2010

I also use several devices: firefox and chrome on a macbookpro, firefox on 2 linux boxes, safari on a mac station, and safari on an iphone. However, I generally visit different pages on each. Only the pages I visit from firefox on the macbookpro are the more personal ones, while I use the rest just sparingly. So, my internet presence wouldn’t necessarily be counted more than once.
Mihai Badoiu Feb 18, 2010

Does it matter much if measuring the internet presence is not accurate? As long as we have a reliable metric, a service can easily figure out its growth. The only issue may be comparing different services, but that will always be a problem.
Josef Feb 18, 2010

What’s so new about this? Advertisers only care about Nielsen numbers which are based on panel data. Nobody cares about internal server data which relies on cookies.
joeagliozzo Feb 18, 2010

Fred, you must be an Xmarks user with all those browsers and devices – I know it solved a lot of problems for me (I am not affiliated). Seems like this could be a great business for Xmarks – if they could get a statistically significant portion of their users to agree, they could do a study of how many “uniques” there actually are given multiple browsers, etc. At the very least you would have a valuable estimating tool/service. Websites/properties probably wouldn’t be happy but I am sure advertisers would pay for the information – it would be the equivalent of an audit.
Akira Hirai Feb 19, 2010

I think focusing on absolute visitor counts is a fool’s errand. It’s much more useful to focus on correlating changes in traffic with changes in marketing strategies, revenue generation, etc. The one obvious exception is where you might need to project some kind of revenue per user for the purpose of valuing the company in a financing (or acquisition) before the company has a track record of measurable revenue. And you can always skirt the value question by going with convertible debt.
leigh Feb 19, 2010

triangulate? most clients barely even look at their stats except once a quarter in a review. it’s amazing to me how much potentially company changing info is buried deep within what people actually DO within an experience and most marketing departments still prefer to ask people what they think they might like in a focus group. As for unique visitors? why we continue to apply mass media metrics to interactive media is beyond me.
lawrence coburn Feb 19, 2010

I’m not an analytics expert, but I do care about my site’s analytics and have tracked this subject pretty closely. The idea of triangulating “guessed” (third party panel) data with “actual” (server / pixel) data has never sat right with me. It’s like averaging the actual temperature with yesterday’s forecast to come up with some meaningless data point. Nobody can tell me that adjusting an actual data point with a guess will yield a result better than the real data point. It seems to me you have a panel if you can’t get a pixel. With companies like Quantcast proving that websites will give up that pixel data in exchange for their traffic being reported accurately, the days of a panel’s utility seem numbered. Giving a third party like Comscore access to both panel data and pixel data is a step in the right direction. But I’d rather see Comscore go all the way – invest in creating a smarter pixel and scrap this panel triangulation stuff. One other point: it’s going to be a tough sell to get websites to pay a third party to tell them that their analytics data is way overstated.
1. ShanaC Feb 19, 2010
  
  I rather be honest than sound like a shill.
  1. lawrence coburn Feb 19, 2010
    
    Hi Shana, a shill for who? For Quantcast? Nah, I don’t know any of those guys. Analytics is a bit of a sensitive issue for those of us who run sites, because under-reported traffic affects your ability to sell advertising, as well as the general market’s perception of your site’s importance. Comscore and Compete are notorious for under reporting. Alexa is notorious for just being wrong. Quantcast is the closest thing on the market to transparency because of their pixel. The fact that Comscore is now offering to triangulate pixel reported traffic (for $5K) is a step in the right direction. But they need to go farther. I know you are a fan of this blog (as am I, it’s my favorite). But please be careful about dropping a “shill” on somebody.
    1. ShanaC Feb 19, 2010
      
      myself.
      1. lawrence coburn Feb 19, 2010
        
        ah, whoops. didn’t mean to get all defensive there 🙂
      2. ShanaC Feb 19, 2010
        
        When disqus turns into Chatroulette… I’m sorry, I just had to make that joke. ;) Lack of faces though. Nothing like seeing real people… There was no way for you to know that I meant myself. I think it is also, as I said, a reason to move to some sort of subscription model (freemium included). At least then you have an idea that there are a certain number of users who are definitely willing to pay for/use your service (voting with dollars). If you are going to include advertising, you could at least subtract those people from your numbers…
    2. Wavelengths Feb 19, 2010
      
      So analytics is analogous to the appraisal business in real estate? Like the uneasy relationship that a homeowner has with the appraiser? No one wants to hear that their $2.5 M house is suddenly worth $825,000? It seems that the “meaning” of the analytics is also somewhat lacking. If I visit Fred’s site and explore all the info leads from his blog and comments, I’m one type of visitor. If I drop in, read the blog and disappear, I’m a different type of visitor. Likewise, that example house might have $2.5M of materials, but be located next to a waste treatment plant. Or it might be a $500K house located on a $3M piece of beachfront property.
      1. lawrence coburn Feb 20, 2010
        
        Agreed that all traffic is not created equally. Somebody who intentionally goes to my site is worth more to me than someone who gets there by accident. Yet both are a unique visitor.
2. fredwilson Feb 19, 2010
  
  it might make sense to you if you are a third party, an investor, advertiser, potential business partner
Aaron Gray Feb 19, 2010

At least one web analytics vendor, Coremetrics, has addressed this challenge for years for sites that require authentication. As part of their solution, they regularly scan your database of behavior profiles associated with unique cookies (they call them LIVE Profiles), looking for cookies that have the same user id. When multiple cookies with the same ID are found, the behavior profile from each is stitched back together to form the history of one “person”. It has always perplexed me that other vendors don’t do something similar. Though I know less about how they do it, I’ve been impressed with Quantcast’s measure of “People” as well.
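The stitching idea described above can be illustrated with a minimal Python sketch. This is not Coremetrics' actual implementation, which the comment does not detail; the data layout, the function name and the `user:`/`cookie:` key prefixes are all assumptions made for the example.

```python
from collections import defaultdict


def stitch_profiles(cookie_profiles):
    """Merge per-cookie behavior profiles that share a user ID.

    cookie_profiles: dict of cookie_id -> {"user_id": str or None,
                                           "page_views": list of paths}
    Returns a dict of person key -> merged list of page views.
    """
    people = defaultdict(list)
    for cookie_id, profile in cookie_profiles.items():
        user_id = profile["user_id"]
        # A cookie that never logged in stays its own "person".
        key = f"user:{user_id}" if user_id else f"cookie:{cookie_id}"
        people[key].extend(profile["page_views"])
    return dict(people)


profiles = {
    "c1": {"user_id": "fred", "page_views": ["/home", "/post"]},  # laptop
    "c2": {"user_id": "fred", "page_views": ["/post"]},           # phone
    "c3": {"user_id": None, "page_views": ["/home"]},             # anonymous
}
print(len(profiles), "cookies ->", len(stitch_profiles(profiles)), "people")
```

Three cookies collapse into two people here; the anonymous cookie cannot be attributed to anyone, which is the limitation several commenters raise for sites that do not require login.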
scottythebody Feb 19, 2010

Ya. it’s a little insane. For example, I visit your site at least once / day when I’m in active “web browsing” times (i.e. not too busy to surf the web). However, I might do it from:
My personal iPhone
My work iPhone
My work Windows computer using either IE, Firefox or Chrome
My work Windows laptop with IE or Safari
My work MacBook pro with either Firefox, Chrome or Safari
My home Mac with either Safari, Chrome or Firefox
My home MacBook Pro with either Safari or Firefox
Over a week, it’s not inconceivable that I could account for a dozen or so unique visitors.
1. scottythebody Feb 19, 2010
  
  And don’t forget that I also have you subscribed in Google Reader and I might view your site from a link I received in Tweetie and read in-app on Tweetie.
  1. ShanaC Feb 19, 2010
    
    wow
iptiam Feb 19, 2010

Can we make an argument that Fred Wilson on the Android browser is different from the Fred Wilson on the macbook pro (for example, you wouldn’t necessarily do something that takes time on the Android) and therefore they are two unique users anyway?
1. fredwilson Feb 19, 2010
  
  yes in some cases
  no in others
  but its an interesting point
Desire Athow Feb 19, 2010

Fred, what about multi tabbing? I routinely use Chrome with 25 tabs opened (I’m a journo). What impact can this have on PVs and time on site?
1. ShanaC Feb 19, 2010
  
  Does anyone know the answer to this, I’m curious too
  1. jarid Feb 20, 2010
    
    Shana, time on site is another one of those misunderstood metrics. Leaving a page open, in a tab or not, does not increase time spent on site in the web analytics sense. Time spent on site is calculated as the difference between the time of the first page request and the time of the last page request. Any time spent on the last page does not count. Thus, if you only visit one page, time spent on site is not even calculated.
    1. ShanaC Feb 21, 2010
      
      That’s just…*sigh* back to the drawing board.
2. mwexler Feb 19, 2010
  
  Tabs are treated by most web analytic tools as more requests in the stream. Tabs currently share the same cookie space. So, if you visit my site in 2 tabs, one looking at my “About” pages while bouncing also to a tab looking at my “Product” pages, the analytics tool sees 1 “disjointed” stream. It will look odd in path reports and other “page to page” reports, but the time spent and PV metrics will sum up correctly. In the future, when each tab can be its own cookie space (if that happens), then it will be a different story. But for now, I think its biggest impact is on “page-to-page” metrics and “pathing” in a site, which fall apart for heavy tab users. So, tabbing has problems, but they aren’t necessarily the same dimensions as the cookie overcounting, undercounting, deletions, etc.
  1. Desire Athow Feb 19, 2010
    
    But then, how do cookies evaluate meaningful activities? Leaving a tab open doesn’t mean that I am on it all the time.
    1. Aaron Gray Feb 19, 2010
      
      Cookies merely serve as an identification token. When a new pixel request is sent to the data collection server of the analytics tool, the image request is loaded with parameters about the browser status, the machine, and the behavior of the user. One of the parameters is the cookie ID. The analytics tool simply analyzes this incoming stream of pixel requests, assembles requests with the same cookie ID into sessions (visits), and increments the visit and page view counts for that cookie (visitor). As mwexler said, multiple tabs in the same browser share the same cookie. So user behavior across tabs is collected as one singular stream of activity by the analytics data collection server (the pixel server).
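A rough Python sketch of the sessionization described in this thread: hits carrying the same cookie ID are grouped into visits, a visit ends after a period of inactivity (30 minutes is the default mentioned elsewhere in the comments), and time on site is the last request time minus the first, as jarid explains above. The tuple layout and function names are assumptions for illustration, not any vendor's API.

```python
from collections import defaultdict


def sessionize(hits, timeout_minutes=30):
    """Group pixel hits into sessions (visits) per cookie.

    hits: iterable of (cookie_id, timestamp_in_seconds), in any order.
    Returns dict of cookie_id -> list of sessions, each a sorted list
    of timestamps.
    """
    by_cookie = defaultdict(list)
    for cookie_id, ts in hits:
        by_cookie[cookie_id].append(ts)

    timeout = timeout_minutes * 60
    sessions = {}
    for cookie_id, stamps in by_cookie.items():
        stamps.sort()
        current = [stamps[0]]
        result = [current]
        for ts in stamps[1:]:
            if ts - current[-1] > timeout:
                current = [ts]          # inactivity gap: start a new visit
                result.append(current)
            else:
                current.append(ts)
        sessions[cookie_id] = result
    return sessions


def time_on_site(session):
    """Last request minus first request; None for a single-page visit."""
    if len(session) < 2:
        return None
    return session[-1] - session[0]


# Two tabs in one browser share a cookie, so their hits form one stream.
hits = [("abc", 0), ("abc", 60), ("abc", 600), ("abc", 4000), ("xyz", 10)]
for cookie_id, visits in sessionize(hits).items():
    page_views = sum(len(v) for v in visits)
    print(cookie_id, len(visits), page_views, [time_on_site(v) for v in visits])
```

Note that the time spent reading the final page of each visit never shows up in the result, and a one-page visit has no measurable duration at all.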
Josh Auerbach Feb 19, 2010

Quantcast has done some research on translating between cookies and people. There’s an overview of the methodology and a link to the associated white paper here: http://www.quantcast.com/di…
jarid Feb 19, 2010

I agree – all of the methodologies are flawed in one way or another. We could spend days arguing over which is least flawed. As an aside, Visual Sciences (acq by OMTR) had an interesting approach of combining log data with pixel data to try and “triangulate”, if you can call it that with just 2 sources. But, I take an entirely different approach. I pick a methodology and stick with it. With web analytics, it’s the trends that matter, not the absolute values. Yes, my 1MM “unique visitors” may really well be .5 MM. But if that number goes up 2X next month, that’s my growth rate, despite how over/under-inflated the absolute value is. And, growth is what matters.
Michael D. Feb 19, 2010

This is a little off topic, but how do you manage your bookmarks between the browsers? I have a Windows desktop and a Mac laptop, and it’s a pain bouncing between browsers and also desktop software. Eventually, I would like to become completely “cloud” based. Thx
1. scottythebody Feb 19, 2010
  
  I use Mobile Me, which syncs my bookmarks between all 3 of my macs and my two iPhones. Don’t use it on Windows since those bookmarks are more “intranet” related and don’t function outside my office network anyway. Anyway, Mobile Me is a mixed bag. The web apps suck completely, but there are some killer pieces of functionality that surpass other offerings on the market (if you’re a Mac user). For example, the address book and bookmarks work flawlessly across my Macs. But I have to use Google Calendar to sync my calendar on all my devices because my work security policies don’t allow me to install the Mobile Me sync client on my work computer and I need to get those appointments to my iPhone somehow.
2. fredwilson Feb 19, 2010
  
  google used to have a firefox extension that did that for me
  i believe they discontinued it
Dean Wormer Feb 19, 2010

Your third party research is confirming something that both comscore and netratings have been saying for years. Web publishers hate to listen to this and it is disappointing to deal with lower numbers, but the fact is that log analytics are NOT accurate when it comes to counting UVs for the reasons you have just described. Dropping cookies is simply not reliable in an age of multiple machines and aware users who clean their cookies or have privacy software that does it for them regularly. However, while it is fun and interesting to look at your server log uv counts and pat yourself on the back, the only number that really matters is the number that advertisers are willing to pay for…and that number is and always will be a third party number……the reason for the existence of comscore and nielsen in the first place.
robertavila Feb 19, 2010

Perhaps the internet should be thought of more like outdoor advertising, what is most important is the volume of traffic passing by…
shelly turner Feb 19, 2010

isn’t uv determined by IP address not machine address, ie if you log on from 20 computers and 3 browsers off your home wifi, isn’t that 1 UV not 60???
1. Aaron Gray Feb 19, 2010
  
  IP address is a very old method of identifying a UV, and is still used as a fall-back method in some tools for users who do not accept cookies. Cookie is the most wide-spread method. Some tools allow identification using user authentication, but that only works on sites that require authentication, such as intranets.
  1. Adam Ware (@wheresitworking) Feb 20, 2010
    
    It’ll be interesting to see how this changes in the future though, as more sites (e.g. this one) have you authenticate (e.g. via OpenId, FB connect, etc.) to interact. How that data begins to be passed from the authentication providers into analytics, and how it is used will be an interesting factor in the unique visitor discussion. Tools that can pass that authentication info into their analytics identification process may be able to provide much more accurate and complete data on unique visitors.
ADstruc Feb 19, 2010

Fred, that is an amazing amount of interaction on so many different machines. I was just telling this to my girlfriend and she was like how/where is he able to devote time to his wife and family – obviously I need this advice haha!
1. fredwilson Feb 19, 2010
  
  7am to 8am and after 6pm is family time
  i don’t “work” on weekends but i do email in the mornings when the rest of the family is sleeping
johnfurst Feb 19, 2010

It doesn’t really matter, does it? Many of the previous comments point that out as well. At the end of the day, what counts is the money in the bank, or how many visitors took the action requested on the page they visited. All those values are relative to “something.” It’s important to measure them consistently. Cookies don’t “count” as well anymore, that’s definitely a fact. But there is another trend of “being” logged in on sites like disqus, facebook, twitter, openid, etc. That produces a lot of data… BTW: I’m not sure how reliably “typing patterns” can be assessed, especially on sites where reading is the most likely task.
mwexler Feb 19, 2010

No more so than when the user leaves a page open, minimizes, does a call for 15 minutes, and comes back. Should their session really count as previous time + 15 minutes + new time, or should we eliminate the 15 minutes? Same issue. Most “traditional” web analytic tools take the simple approach of stitching requests together, by cookie (and sometimes other variables), until 30 minutes of no activity have passed. Smarter tools could, for example, look for the “onFocus” javascript event and only track that time, but most don’t.
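To make the “onFocus” suggestion concrete, here is a small Python sketch of the server-side arithmetic. It assumes, hypothetically, that the page reports focus and blur events with timestamps to the collection server; no existing tool's event format is implied.

```python
def focused_seconds(events):
    """Sum only the time a page actually had focus.

    events: list of (timestamp_in_seconds, kind) sorted by time, where
    kind is "focus" or "blur" (a hypothetical event log).
    """
    total = 0
    focused_since = None
    for ts, kind in events:
        if kind == "focus" and focused_since is None:
            focused_since = ts
        elif kind == "blur" and focused_since is not None:
            total += ts - focused_since
            focused_since = None
    # A trailing "focus" with no matching "blur" contributes nothing.
    return total


# Two minutes of reading, a 15 minute phone call, one more minute of reading.
events = [(0, "focus"), (120, "blur"), (1020, "focus"), (1080, "blur")]
print(focused_seconds(events))  # 180, versus 1080 seconds of wall-clock time
```

Under the plain 30 minute rule the whole 1080 seconds would count as one visit's duration; counting only focused time drops the phone call.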
1. Josh Fleischmann Feb 19, 2010
  
  Segmentation is a good way to reduce noise from outliers (like your 15 minute phone call example). It’s easier (and more rewarding) to focus on groups of visits than torturing individual sessions for meaning.
2. Aaron Gray Feb 19, 2010
  
  I like the “OnFocus” idea. Also, most traditional analytics vendors allow you to dial down the session inactivity timeout from 30 minutes to whatever value is appropriate to the site being measured. Dialing down to, say, 15 minutes, or 10 minutes would spike your visit count, but give what is probably a more accurate picture of more, smaller bursts of activity.
CGW Feb 19, 2010

Absolutely. Also consider that if you’re constantly on your own site checking it, that has a non-trivial impact on your data, especially if your fanbase is < 1000.
1. Aaron Gray Feb 19, 2010
  
  Filter out your IP addresses.
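As a sketch of what filtering out your own IP addresses might look like when processing raw hits yourself, the following uses Python's standard `ipaddress` module. The hit layout and the function name are assumptions for illustration, and the addresses come from reserved documentation ranges; most analytics tools expose this as a configuration setting rather than code.

```python
import ipaddress


def drop_internal_hits(hits, internal_networks):
    """Remove hits whose IP falls in any of the site owner's networks.

    hits: list of dicts with an "ip" key (hypothetical hit layout).
    internal_networks: list of CIDR strings or single addresses.
    """
    nets = [ipaddress.ip_network(n) for n in internal_networks]
    return [
        hit for hit in hits
        if not any(ipaddress.ip_address(hit["ip"]) in net for net in nets)
    ]


hits = [
    {"ip": "203.0.113.7", "page": "/"},     # the office
    {"ip": "198.51.100.9", "page": "/"},    # a real visitor
    {"ip": "203.0.113.200", "page": "/x"},  # the office again
]
print(len(drop_internal_hits(hits, ["203.0.113.0/24"])))  # 1
```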
Adam Ware (@wheresitworking) Feb 20, 2010

I like the “onfocus” idea also. If that was something that could be “played around” with more, I think we’d see the industry standard shift quite a bit. Having it be 30 min for “most” sites makes no sense, with the variety of differences between various sites’ functions.
avertiz Feb 20, 2010

I believe there are a lot of companies out there, not in the analytics/metrics space, who are trying to solve this exact problem. The issue is that they are running into privacy issues. This all ties back to OpenID, Facebook Connect and such. Privacy aside, I believe a company like Facebook can solve the problem. Ideally, no matter what computer or browser you are using you’ll also be logged into Facebook (or Google, or OpenID, etc). Once that authentication has been established it won’t matter what browser or device you browse from, Facebook will know that each visit is actually the exact same person and therefore not entirely unique. The key here (and I believe Facebook and Google are the front-runners in this space) is for there to be a repository of information. That way, when your analytic software sees a user it first checks with the repository (say Facebook Connect) to see if this user is actually the same user who visited a few minutes ago but on a different browser or device. In my case, the only two companies that know who I am all the time are Facebook and Google because no matter what browser or device I’m on I’m usually authenticated to one or the other. Do I like that? Do I trust them with having that information? No, not really. Can I see past that if I get a better browsing experience and if sites can do a better job of knowing who I am (not for advertising but for customer interaction)? You bet ya.
1. Aaron Gray Feb 20, 2010
  
  This can already be done (with some analytics tools) for sites that require authentication. If the analytics tool allows “authenticated user sessionization” (some do, not all), you don’t need to rely on cookies for UV tracking at all. The site simply needs to pass an authenticated user string to the data collection server (the pixel server) in place of the cookie ID when making data collection calls. In such a case, it wouldn’t matter what device you were on – the user string is always the same (unless you’re running multiple accounts on the web site).
  1. avertiz Feb 20, 2010
    
    The key here is to explode such an approach so that a vast majority of users are already being tracked via such a method. Sites that require registration (such as newspaper sites) already reap the benefits of this approach; however, registration requirements on a B2C site (even for news sites) are becoming harder and harder to push – mostly as an added value. On top of that, if the site allows unauthenticated browsing, visitors tend to authenticate only on a single browser/device so they can access the “benefits” of registration but still browse “casually” via other devices. For example, I’ll log into CNN.com from my main home office computer but I won’t from my iPhone or work device. I agree that authenticating via registration (pixel server, etc) is key. The challenge is getting everyone to log in every time. With Google for example, someone who has Gmail running in the background on their PC and browses a site via his/her Android device (assuming they’ve registered it via Google) can be (and surely is) tracked as a single entity by Google. There is no doubt in my mind that this is the future model of visitor tracking – the question is who will own the ability to know what sessions belong to which person. Google? Yahoo? Facebook? Or maybe a standards group embraced by all (similar to IAB).
    1. tedryan Feb 20, 2010
      
      The problem with registration is that it is a killer to traffic and page views, which is why you see fewer and fewer sites using required registration. Using a methodology to count pages and users that is an impediment to their growth is not really a long term solution. There are two purposes for measuring your traffic: 1) Quantifying for advertisers. 2) Quantifying for your own publishing optimization and growth. The only reason to use third party numbers is for reason #1. However, if you are running an ad based business of some scale, it is important to make third party numbers a key focus though not necessarily your only focus. It is true that the smaller a site you are, the less reliable the third party panel numbers become due to standard error, but this is well known and disclosed. If you are such a small publisher, you don’t have much need for the advantages of published third party measurement since your ad revenue will likely be either network based or very niche revenues. If you are a larger (3 Mil + UV) publisher, you need to organize all of your traffic efforts around optimizing for UV growth on third party numbers. All the other numbers like time spent, PVs, sessions etc are second and third order numbers. They may give you clues and markers toward increasing loyalty, length, depth etc which will ultimately help grow UVs. Third party numbers are the only numbers advertisers will pay for. Full stop. Whether they are accurate or not is important but irrelevant. It is very hard for people to accept, I have been in many many discussions on this, but panel counts of UVs are not inaccurate. They simply aren’t. Server counting of UVs is usually off by about 30% on the high side. Server analytics do many other things better like counting PVs and time spent, sessions and all the derivatives off those, but not UVs.
      1. Aaron Gray Feb 21, 2010
        
        I agree that not all sites can require registration, and so such a method does not apply. I was thinking specifically of sites that require registration – twitter, foursquare, facebook, etc. If you’re not logged in, you’re not using the site – on any device.
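The “authenticated user sessionization” idea raised earlier in this thread, where the site passes an authenticated user string in place of the cookie ID, can be sketched as a simple precedence rule in Python. The field names, key prefixes and the IP plus user-agent fallback are assumptions for illustration, not any particular tool's behavior.

```python
def visitor_key(hit):
    """Pick the strongest identifier available on a hit (a plain dict)."""
    if hit.get("user_id"):
        return "user:" + hit["user_id"]      # same on every device
    if hit.get("cookie_id"):
        return "cookie:" + hit["cookie_id"]  # one per browser
    # Weakest fallback for visitors who accept no cookies.
    return "ip+ua:{}|{}".format(hit.get("ip", ""), hit.get("user_agent", ""))


def count_unique_visitors(hits):
    """Count distinct visitor keys across all hits."""
    return len({visitor_key(hit) for hit in hits})


hits = [
    {"user_id": "fred", "cookie_id": "c1"},  # laptop, logged in
    {"user_id": "fred", "cookie_id": "c2"},  # phone, logged in
    {"cookie_id": "c3"},                     # someone not logged in
]
print(count_unique_visitors(hits))  # 2 rather than 3 by cookie alone
```

The sketch still counts a person twice if they browse logged out on one device and logged in on another, which is the gap avertiz describes with casual browsing from an iPhone or work machine.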
Dave Hendricks Feb 20, 2010

My Value as a visitor is totally dependent on what device I visit with. I am more serious on some than others, and my purchase intent and engagement is totally device-dependent.
BClifton Feb 20, 2010

Nice summary about one of the biggest issues affecting on-site web analytics tools i.e. those that are cookie/javascript based (the vast majority of them). It really touched a chord with my own thoughts and writings… One thing I wanted to point out, however, is that there are plenty of examples where panel based metrics, e.g. comScore, exceed the on-site web analytics metrics for UV and other metrics. So it’s not true to say “always generate unique visitor counts that are way lower than their internal numbers”. Sometimes it’s true, sometimes it’s not… Accuracy is an issue I have been discussing/blogging about for some years, so I would be interested in your thoughts on the following articles:
Should you focus on website visitors as individuals? http://www.advanced-web-met…
Why counting uniques is meaningless http://www.advanced-web-met…
Improving the web with web analytics http://www.advanced-web-met…
For a summary of all the accuracy issues that affect on-site web analytics vendors, there is the Accuracy Whitepaper: www.advanced-web-metrics.co…
As I say, I really would appreciate your (and your readers’) thoughts on these.
Best regards, Brian
Author, Advanced Web Metrics with Google Analytics
Nick Molnar Feb 21, 2010

Does anyone remember which Doug Coupland book talks about governments using typing patterns to uniquely identify web surfers around the world? I thought it was pure fantasy when I read it. I didn’t know, at the time, that the British used the same principle during WWII to determine the sender of Morse Code transmissions. Guess this stuff isn’t that new at all. I wonder what other – less known – ways there are of identifying a person.
Adam Ware (@wheresitworking) Feb 21, 2010

… and I think we’ll see consumers much more comfortable with “registering” if it involves authentication through Facebook Connect (over 800,000 sites now using, versus about 8,000 a year ago), or other less-threatening authentication methods. I also think more and more sites will find ways and reasons to employ these types of registration/authentication methods.
Mrinal Feb 21, 2010

Seems WSJ read your post Fred – article today: Dot-Complicated: Measuring Traffic on the Web http://online.wsj.com/artic…
paramendra Feb 21, 2010

This sounds like a cool, level headed response to that rant by the Mahalo guy. I went to his comments section and praised this blog without participating in his discussion per se.
James Sherwin-Smith Feb 21, 2010

OK, so you, me and the majority of people posting below are familiar with this problem as we all access the same sites via different devices and different browsers – potentially even different IP networks as we move from 3G to WiFi to LAN and back again.

I work for a firm that used to be an Associate Subscriber of the ABCe (http://www.abce.org.uk), which itself aligns to the JICWEBS (http://www.jicwebs.org). We let our ABCe membership lapse as frankly very few of our customers needed 3rd party validation of our figures.

That’s not to say our figures are more accurate than those of any of our competitors; I believe it is more a reflection of the sophisticated understanding of our customer base. I think that they recognise that internet statistics are indicative rather than precise – and therefore paying an additional fee to someone else to validate our results is only necessary in fairly unique circumstances. In my mind it’s akin to asking an education expert to assess the intelligence of your child outside the regular assessment that is conducted by the teachers at their school. So, if a customer of ours wants an ABCe audit of their statistics, we would happily re-engage with the ABCe to achieve this.

On the same day as this post, the ABCe announced they were changing their primary digital audience metric from “Unique User/Unique Browser” to “Unique Browser” only. (http://www.abc.org.uk/Press…) It looks to me that they have given up pretending to measure individuals (i.e. people), and rather than now measuring devices (i.e. hardware), they have made a further jump to measuring browsers (i.e. software). Ultimately, they haven’t changed the measurement method, but instead re-labeled the metric with a less disingenuous name.

They define the UB (Unique Browser) metric as follows:

“A unique and valid identifier. Sites may use (i) IP+User-Agent, (ii) Cookie and/or (iii) Registration ID [i.e. a log-in].
NB: Other identifiers may also be allowed.”

As is discussed below, in a world where we use multiple devices, clearly any stakeholder reviewing website statistics should note that they are likely to have more UBs than unique human visitors (UHVs) viewing the site (discounting the deflationary impact of multiple humans sharing the same browser without changing their Registration ID).

Within a VC context, my guess would be that when evaluating an online business from an investment perspective, beyond “actual” revenue growth and its relationship with profit generation, the next key performance indicator is “potential” revenue growth. The growth trajectory of the number of humans that use the service is clearly one part of that equation. Another is their evolving propensity to generate revenue for the business through their online habits; be it a direct purchase, advertising revenue or some other source.

So from my perspective, it’s probably just as important, if not more so, to know who your visitors are than just how many you have (or one day may have).

Why? Because this helps you understand two drivers of the above:

i) The ratio of UBs to UHVs
ii) The propensity for UHVs to generate revenue for the business

It’s probably best to illustrate what I mean via a simple, fictitious example. Imagine you operate a business that sells care products direct to the elderly – let’s say Zimmer frames and the like.

I would argue that if you primarily cater to, and attract, the 60+ age bracket, then the ratio of UBs to UHVs is closer to 1:1 than it would be for the 20-30 age bracket [driver (i)]. Further, if your users are 60+ and online, they are likely to have money to spend online, and hopefully your products are of some relevance [driver (ii)].

[Similarly, people who comment on online blogs about internet statistics are more likely to be users who interact via multiple devices, browsers and network connections – and thus have a much higher ratio under (i) – not to say that they aren’t a signal of things to come 🙂 ].

So, what’s a good way to estimate the ratio of UBs to UHVs? The user agent string is probably a good place to start, as it’s a good signal as to your audience’s access to and affinity with new technology. If you have a significant and growing population of users that access via mobile devices (e.g. Android, iPhones, BlackBerrys) then you’re likely to have an audience that uses different devices to access your site (unless the site delivers content that is only relevant to a mobile device). Similarly, if you have a high proportion of users accessing via browsers that are not native to the device’s operating system at the point of manufacture, then you may see multiple browser usage per individual. Similar segmentation could be drawn from time of day analysis – where you could isolate segments of your audience using your site within working hours on a presumably work device vs. those outside working hours on a different, presumably personal, device.

In conclusion, there are clearly benefits for stakeholders who care about accurate website statistics in forcing a login or making sessions persistent, so that the Cookie and IP+User-Agent methods become redundant for measurement purposes. My own post serves as a good example of this in action (I logged into the DISQUS service using my Twitter credentials – Google Accounts and Microsoft Passport are other bold examples of portable identity services.)

But I would agree with those opinions already expressed below: pursuing the nirvana of perfectly accurate internet statistics via Registration IDs can come at a price. On the one hand, user experience (and ultimately acceptance) may suffer via the need for constant registration and provision of login credentials (cf. the evolution of most online newspapers), while helping users keep sessions persistent can pose security concerns.

However, I think the bigger danger here is that a relentless pursuit of accurate internet statistics draws increasing focus into understanding numbers, to the detriment of better understanding the audience that you are trying to represent.
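
To make the UB : UHV ratio above concrete, here is a minimal Python sketch under stated assumptions. The hit records, field names and function names are all hypothetical; it treats the cookie as the browser identifier and the Registration ID as a stand-in for a unique human visitor. It is an illustration only, not how the ABCe or any analytics tool actually computes its figures.

```python
from collections import defaultdict

# Hypothetical hit records: (ip, user_agent, cookie_id, registration_id).
# All values are made up for illustration.
HITS = [
    ("10.0.0.1", "Chrome/Mac", "c1", "user_a"),
    ("10.0.0.1", "Firefox/Mac", "c2", "user_a"),
    ("10.0.0.2", "Android", "c3", "user_a"),
    ("10.0.0.3", "IE/Windows", "c4", "user_b"),
    ("10.0.0.3", "IE/Windows", "c4", "user_b"),  # repeat visit, same browser
]


def count_unique_browsers(hits):
    """Count UBs using the cookie as the identifier."""
    return len({cookie for _, _, cookie, _ in hits})


def count_unique_humans(hits):
    """Count UHVs, assuming one Registration ID maps to one person."""
    return len({reg for _, _, _, reg in hits if reg is not None})


def browsers_per_human(hits):
    """Average number of distinct cookies seen per Registration ID."""
    browsers_by_user = defaultdict(set)
    for _, _, cookie, reg in hits:
        if reg is not None:
            browsers_by_user[reg].add(cookie)
    if not browsers_by_user:
        return 0.0
    total = sum(len(cookies) for cookies in browsers_by_user.values())
    return total / len(browsers_by_user)


if __name__ == "__main__":
    print("UB:", count_unique_browsers(HITS))
    print("UHV:", count_unique_humans(HITS))
    print("UB per UHV:", browsers_per_human(HITS))
```

The ratio can only be measured this way for logged-in traffic, which is why it is at best an estimate for the audience as a whole.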
Alex Feb 23, 2010

Over what period is this counted? For a month, I think overcounting by a factor of 2.5 is possible, but for a single day I think that is too much… maybe 1.5 or 1.25.
Mark Essel Feb 19, 2010

As long as you’re only reading the good half. I save my best comments for AVC. Fred’s thought riffs are always open for commentaries on all sides. Real readers are the most valued group on the web to me, so visitor length is my meaningful metric.
fredwilson Feb 19, 2010

yes, that’s why we do it