The Dictated Blog Post

I am dictating this blog post via my google phone. I’m doing its name is a test to see how easy it is to do something like this. I don’t plan on taking my blog posts in the future very often what is pretty neat that you can do this

I been dictating text messages to my kids and email via gmail on this phone that also works pretty well

The speech recognition is in the cloud not on the phone c a good 3g connection oregon better wifi connection for this to work really well

My friend brad feld says we won’t even use keyboard input in 10 years i’m starting to think is right

The results you see are the results i got a didn’t edit a thing i’d like to know you think

#Weblogs

Comments (Archived):

  1. HysBrian

    Looks surprisingly better than what I get for google voice transcripts.

    1. Mark Chou

      Definitely. Haven’t had good experience with Google Voice transcription either.

  2. Pascal-Emmanuel Gobry

    I think Feld is wrong on this one — keyboards will always be faster than voice. And as you can tell once you look at this, voice recognition still has a ways to go (although I agree that 10 years from now, *most* things will probably be voice controlled).

    1. Brian Joseph

      I can’t agree. I think that speech recognition will definitely be faster, and quite soon. jmo

      1. Pascal-Emmanuel Gobry

        I type 60-100 wpm. I can’t speak that fast.

          1. Pascal-Emmanuel Gobry

            The guy speaks really fast. When I’m dictating I’m not rushing to pull words out of my mouth — I’m ruminating, thinking out loud, etc. I type fast, but I don’t *try* to type fast, it’s just a habit. What matters is the “normal” speed, not the maximum speed.

          2. bfeld

            My assertion isn’t that we’ll use voice instead of the keyboard. It’s that “in 20 years we’ll view the keyboard and mouse the same way we currently view punch cards.” Voice is one type of input, but not the only one.

          3. fredwilson

            have you done or seen a good blog post on all of the various emerging ways we’ll interact with a computer in the next 20 years?

          4. Ian Betteridge

            Brad, I think you have the wrong analogy there. No one still uses punch cards, but you’re then saying that keyboard/mouse will still be around.

  3. Jeff Brown

    Will Swype be history before it even has its 15 minutes?

  4. Julien

    Can do better 🙁

    1. fredwilson

      agreed. i can do better talking into the phone too. once it trains me, it will be better. i’ve gotten pretty good and i’m “voicing” my kids via sms a lot now

      1. Mohamed Attahri

        The software is being trained live too. I’m sure it is based on some sort of artificial neural network. The good thing about it being a web service is that it improves itself over time using the help of all its users combined.

        1. fredwilson

          and the bad thing is it doesn’t work in the subway. it would be great if there were a hybrid solution

          1. Mohamed Attahri

            I agree. Noise cancellation + Offline would be the ultimate solution.

          2. andrewwatson

            you mean an online/offline combo… like Gears? It’s too bad they’re phasing that out because of HTML5.

          3. fredwilson

            something like that

          4. Mohamed Attahri

            Google Gears is basically background processes, offline database, and local server. They’re all in the HTML5 spec. So: HTML5 = Google Gears + More. It will actually be very useful for a little while to add support for HTML5 features to old browsers.

      2. Andy

        funny (and sad) but true (VERY FUNNY but VERY TRUE) Fred – “…once it trains me…”

        1. fredwilson

          that’s just how it feels to me andy

      3. markslater

        really? that’s interesting – so you could voice a command in to an app using SMS like: “get me a table at XXXX”

  5. rajatsuri

    Typing something out allows for a lot more thought than speaking it out, I’d say. For some reason, typing imposes some sort of discipline for me. But maybe that’s just me?

    1. fredwilson

      that’s why i said up front that i won’t be doing this often. i agree completely

    2. Mark Essel

      You could always reread just as you edit.

    3. Carl Rahn Griffith

      Agree – I expressed that same sentiment in Fred’s previous blog entry.

    4. Max Kennerly

      It depends upon the person and the situation. Often, I deliberately choose to dictate a message (on a handheld recorder) because the difficulty of correcting the recording forces me to really think and plan ahead before I speak, as opposed to throwing stuff on the screen and playing with it later. I get a similar effect from Dragon Naturally Speaking; since the program works far better if you speak out full phrases, you have to think of where you’re going before you get there. Of course, YMMV, and it depends on the situation.

    5. Keith B. Nowak

      Agreed. Typing is the better method for composing something lengthy and where the proper words need to be chosen. For quick sms or email messages though speaking is much easier and should take off with the right technology. It seems what Fred is using on the google phone is getting there. That said, I was actually having a similar conversation with my father the other day who is a lawyer and he said years ago dictation was the way even the largest legal briefs were done. With the spread of desktops and advanced word processing software he no longer uses dictation. He was saying he had to learn how to dictate properly but once he did composing large and complicated documents became easy. Perhaps as speech recognition software becomes more advanced and widespread we will return to using dictation more than keyboards.

      1. fredwilson

        the partners who taught me the venture business used to dictate memos and letters to their assistants when i started working for them in the mid 80s. they managed to be very thoughtful when they did that.

      2. JLM

        What goes around comes around. When I first started in business, the critical skill for a secretary was the ability to take dictation. I would often have her open my mail, read it and I would dictate a response and thereby dispense with all of my written correspondence in an hour a day. Much more important correspondence came by letter in those days. I would edit her first draft and have it out the door on the same day as it was received. I also believe in keeping exquisitely complete files for future reference. Much more so in those days. Today the computer has provided the ability to create “exemplars” — forms of complicated documents which provide for the ability to customize them to the deal at hand. This has been a creeping development and many business folks do not consciously see that they are doing just that. In a multi-unit, multi-state operating business, I have created “form fill” documents for transactional, real estate, reporting documents which provide a consistent set of deal points, a consistent legal framework and a consistent tone. I only pay for the legal work one time as a result. I bet I have 50+ exemplars of everything from a term sheet to a mutual general release. One thing I think is important today is to realize that no e-mail ever is really deleted and to ensure that even the presentation of e-mail continues to be important. May be a bit old school but it is important to understand that a lot more folks than just the NSA are reading your e-mail. Write every e-mail as if it were going to be printed on the front page of the NY Times or the National Enquirer (the real paper of record these days, unfortunately).

  6. Andy

    Having used the Motorola Droid now for almost a month, it is clear to me that at least on the mobile front, the (physical) keyboard is indeed dead. I think that with the advent of competent (I use the Swype app and it’s amazing) and increasingly better virtual keyboards, and voice UIs, the only reason the industry still ships phones with a physical keyboard is to help with the psychological transition. Ya, I know – people will defend keyboard’rs like RIMM’s Blackberry because ‘Enterprise users want/need a Keyboard’. Bull – Brad Feld is right IMO but will happen much, much quicker in the mobile space – I predict that within TWO years the physical keyboard (in Mobile) is dead. http://twitter.com/A_F

    1. fredwilson

      if you go back and look at my posts on blackberry and iPhone over the past three years, you’ll see that i am/was a keyboard lover. but once i made the psychological commitment to myself to switch, i realized i could do it. you are right, the issue is more mental than physical for most people

      1. ShanaC

        I totally disagree. I still see touchscreens as a mixture of sight and touch. May you be really healthy at 85: but generally people are a tad frailer, their joints just not quite as good, and their vision not quite as good either. And that will add up on essentially a computer the size of one’s hands. Keyboards have added feedback from not relying on sight and sight alone.

      2. Aaron Klein

        when the technology gets it right 99.9% of the time, I guess I can contemplate it again. But I’m an ex-iPhone user over the lack of a keyboard. I *know* what will happen when I tap my BlackBerry. I never knew for sure without looking at the screen of the iPhone.

    2. andrewwatson

      It took me some time to get used to the “soft” keyboard on the myTouch because it works slightly differently than the iPhone but i can type very well on it now. Also, it has the suggestions across the top that you can point to and not have to finish the entire word. The physical keyboard is dead. It just doesn’t know it yet.

    3. bfeld

      Yeah – physical keyboards on mobile devices make absolutely no sense to me. It’ll be interesting to see how the “input” on the Apple Tablet is going to work. Swype is a great example – there’s just no reason to be using a qwerty keyboard on an iPhone or Nexus One – there are so many faster ways to input data, even using a keyboard type device. And – when it’s implemented in software (as on the iPhone), you have a whole new type of control and flexibility.

      1. fredwilson

        you are my guru on this stuff brad. thanks for encouraging me to give up the keyboard orthodoxy. it’s liberating to think i can engage with devices in new ways

  7. Brian Kung

    Agreed – my google voice transcripts are useless. This seems pretty good. I’m pretty sure voice has been guaranteed to supplant typing “in the next ten years” for the past thirty years, but this is really promising. Let me guess some errors: “I’m doing it mainly as a test…” “I don’t plan on dictating…” “…which is pretty neat” I don’t know what “oregon” is supposed to be, though.

    1. fredwilson

      correct on all three counts. oregon is “or on a”

    2. Jeff Brown

      oregon: or on a better wifi

      1. fredwilson

        yup

  8. bramcohen

    Not bad, although some artifacts of you being less coherent when talking off the cuff than when your thoughts are forced to go through your fingers are showing up, and those will still be there no matter how good the voice recognition gets

    1. fredwilson

      that’s very true bram. i wonder if i would like to dictate and then edit. i may try that sometime.

      1. bramcohen

        Having special hotkeys for forward/back a complete word and delete a complete word, with the ability to switch to regular typing, might make voice recognition very compelling

  9. andrewwatson

    it’s sure better than the Google Voice transcriptions! Maybe they should move those to the same architecture… Seriously though, I like the idea of performing the transcriptions on the server side. Consumer devices (even with 1GHz processors!) will not be able to match the processing power of compute clusters. I’ve been looking at transcription technology a lot lately but for recorded phone calls, not recordings made directly on the device. That reduction in quality makes a huge difference in the difficulty of transcribing it. Also, with something like transcribing voicemail you have to be speaker-independent which is much harder. Also, I noticed there was no punctuation but it did make paragraph breaks. I’m wondering what semantic clues it used to accomplish that and why it didn’t put periods at the end of your sentences.

    1. fredwilson

      i made the paragraph breaks. i should have stated that in the post. sorry about that.

      1. andrewwatson

        ah! that makes more sense then, that it didn’t do any markup at all. It seems like you should be able to say “paragraph” or “break” like old-school dictation back in the day…

        1. fredwilson

          maybe you can. google products have a lot of hidden features

  10. Guest

    B+. But really, I do believe that keyboards will become far less used. Perhaps voice recognition will be the replacement.

  11. Alex Iskold

    I agree. Beam us up Scotty!

    1. fredwilson

      if you’ve got any ideas on teleportation, i am all ears. talk about a disruptive technology

      1. andrewwatson

        LOL have you read “The Physics of Star Trek”? that was one of the more implausible technologies. Something about generating as much power as the sun outputs in its entire lifetime…

      2. Alex Iskold

        LOL. Here is what I know: 1) There has been good progress in deploying Quantum Technologies in real life. In particular, I have small personal investment in MagiQ Technologies (http://www.magiqtech.com), a company specializing in secure communications and made very significant progress over the last 10 years. 2) A lot of these Quantum Technologies are “entangled” (pun intended), and a few breakthroughs can quickly result in profound changes around us. I am not sure if we are talking about 10 years on this, but I do think that within our lifetime some fairly mind blowing things might be possible. Alex. P.S. I was one of the first users of Dragon Naturally speaking from IBM and that thing had a hard time understanding my accent. I wonder how the dictation software you used handles accents.

      3. kidmercury

        already exists, suppressed by military industrial complex. depends on interdimensional technology, comes from extraterrestrials and inter-dimensional creatures. here are some people who talk about this: dan burisch, jim marrs, project camelot, steven greer, john lear, jim sparks, jim keith, arthur neumann. the challenge is that govt puts out disinfo about this stuff, and lots of honest people end up believing disinfo, plus it is about aliens and stuff so it is already a wtf type of situation to begin with. but, if you pound down the doors of rand corporation, boeing, lockheed, and probably a bunch of companies no one ever heard of, as well as various compartments within the intelligence agencies (NRO, naval intelligence, army intelligence, CIA, probably some other agencies govt created that no one knows about), you’ll start finding answers. suppressed technology is significantly more advanced than what most people experience. the price is peace though. no renaissance until we are spiritually grown up enough to responsibly deal with the privileges a renaissance affords. sorry boss, don’t know what to tell you. cosmic law, i’m just the messenger.

  12. im2b_dl

    I agree … just wish Google phone customer service wasn’t a 3 day wait time. ugh. they have to work on that. and Fred lol… if you check out hyv.tumblr predictions… the muses that visited us this morning.. totally in “sync” so to speak …the Tu-Is that we designed is all about the voice and body reads… and I was writing it about why I focused on voice/gesture in the transmedia user interface. As someone commented keyboard may be faster than most people can do with voice protocol but what I think the year of “blackberry/iphone distraction has taught us” a keyboard is not great for life.

    1. fredwilson

      couldn’t find the comments at hyv.tumblr.com but i love them. did follow you though

      1. im2b_dl

        sorry it was on the wave.

  13. ksrikrishna

    Not bad at all Fred. Reads a bit like when I write an email after 2am (or occasionally nod off between key pecks) and don’t bother editing. As you say, once some training is done, can get a whole lot better. I can think this might be a good thing, where you try to capture an idea as fast as you can with voice, and make it even better later when you are ready to publish. Can be sweet from drafts even today. As to Feld’s position that we’d never need/use keyboard but go all voice – we’ve been saying that for maybe >20 years of speech recognition. I anticipate both will have their own place though as another commenter said a whole lot more voice than typing maybe doing content and certainly control!

    1. bfeld

      Yeah – but to be clear, I’m not saying “the answer is voice.” I’m just saying “the keyboard / mouse is almost obsolete.”

      1. ksrikrishna

        Brad I have to agree – other input methods, tactile or otherwise will for sure play a bigger role. Voice probably makes it a lot easier as well, particularly when we look at a whole world of folks out there that are non-English speaking and SMS/voice consumers today and will become (serious) data consumers. Still a few wrinkles to be worked out there I suspect for non-English voice inputs.

  14. RacerRick

    I like . needs mor work me think

  15. davidkpark

    Looks like where handwriting recognition was 10 years ago (or maybe a little better). I recall a lecture by Michael Jordan (not the basketball player but the computer scientist and statistician at UC Berkeley) discussing Bayes Ball (how graph and probability theory are actually one and the same) and how they use Bayesian statistical algorithms for voice recognition. To make dictation via a phone better, two things need to improve – the algorithm and processing speed. The algorithm may actually exist but the raw processing power to get accurate estimates may take too long. For example, you may have to use an EM algorithm instead of Metropolis Hastings because EM can provide estimates in 1 second with 85% accuracy instead of 5 minutes with 95% accuracy. This is a fascinating field. The fun question is how long will it be when you don’t need to type or talk to write your blog posts.

    1. andrewwatson

      5 years ago a researcher at Georgia Tech told me that truly accurate, speaker independent, context free speech transcription was 30 years out. So we only have 25 years more to wait…

      1. davidkpark

        Computers are now writing academic research papers… Computer-Generated Paper Accepted for Prestigious Technical Conference – IEEE – io9 http://bit.ly/7QS1SI so it may be less than 25.

        1. Mark Essel

          Whoa..

    2. fredwilson

      not long it seems to me as an observer. i don’t know that much about the technology to be honest.

      1. harpos_blues

        Fred, I’ve used speech-to-text technology regularly (sometimes continuously) on desk/laptop since 1996-7 due to disability. I will tell you that he technology has improved dramatically. I often dictate long, complex technical or legal documents with accuracy in the 95 or greater percentile. It takes a couple of hours to learn to speck succinctly (no “uhms” or mumbling etc), however, you can train the speech-to-text software on your vocabulary by parsing previously written documents. In addition, the more popular products have dictionaries / vocabularies for occupations that commonly use dictation — medical & legal, for example. With speech-to-text I have far more speed & accuracy that I have by typing, but that’s twelve plus years of very regular usage. It’s not hard to learn though, and the added bonus is that dictation has improved my public speaking as well. One thing to keep in mind, is the sheer speed of human transcribers and stenographers. In California, licensed stenographers must be able to type at 180 words per minute at something like 95+% accuracy. For software products I use MacSpeech Dictate (http://macspeech.com) and Dragon Naturally Speaking (http://WWW.NUANCE.COM/naturallyspeaking/products/editions/default.asp) on the PC. Nuance has been in the the speech-to-text market for 10-15 years now, first with Dragon Speech back in the early days, and then with the acquisition of IBM’s ViaVoice product a few years back. Of course, Nuance technology provides the speech recognition for many online and automated call routing systems worldwide, they are also a major force in the standards community for voice technology (along with IBM). When implemented properly, it amazing what some of these technologies can deliver. Speech recognition was also the genesis for some of the original distributed artificial intelligence languages and systems back in the late 1960’s. These systems are/were called “blackboards” often written in LISP. One of the speech specific languages was from the Stanford Artificial Intelligence Laboratory (SAIL). I worked on Blackboard systems with Daniel Corkill at University of Mass at Amherst back in the early 1990’s. There’s a lot of history in these systems, and Google is far from the top of the food-chain. To my knowledge, Google is working with a somewhat home-grown solution, but I need to look into it.

        1. fredwilson

          awesome comment. did you dictate that?

          1. harpos_blues

            Fred, I didn’t dictate today, as my hands are working well (thus the typos). Many of my comments on your blog are dictated though. In real life, my dictation is smooth enough, that most of my colleagues can tell when I’ve typed a response. So I occasionally get e-mail responses that read: “Are you typing today?” It’s actually rather amusing…

        2. toddsavage

          So, is Nuance (NASDAQ: NUAN) a buy? I have been following it for two years now waiting for the right chance to buy it. If this all becomes true, it won’t matter where my entry point is in the long run.

    3. Mark Essel

      Thanks for the pattern recognition details. I wasn’t aware of the algorithms and processing times. At least I have something to look up now.

  16. lauradaly

    Surely this must be an advantage for blind / visually impaired people, as keyboards can not be easy for them to manage.

    1. fredwilson

      i think so. but they can’t correct the errors. that said, i’ve been using transcribed voice mails for about three years now. they aren’t perfect. but i can make sense out of 99% of them even when they have errors.

      1. andrewwatson

        they could if it read the text back to them and they could “edit” it with spoken commands. Not elegant but better than machine-braille interfaces probably.

        1. ShanaC

          There is someone in my neighborhood with JAWS actually (if I think about it, it is also one of the few dual language setups of JAWS in the country, my father set it up). He probably types faster than you because he can’t see: My father is an extremely fast typist; he reports that this guy, who is also a family friend, types faster than he does. Also apparently the interface reads sounds faster than my father can hear. Apparently when you are blind your hearing gets so much better that they overcompensate. JAWS takes advantage of this; it is doubtful that the native phone application does though. You should know that this guy has complained about mobile phones…

    2. ShanaC

      I’m not so sure, it depends on the speed and how accurate the dictation is. See this: About JAWS. Also, generally the setup is that there is a physical keyboard with braille attached to correct the dictation if one is using JAWS. You can’t see what you are pressing, or even make a guess with the current setup: People do not realize how odd it is to think of a touchscreen as a touch based technology until one can’t do anything with it without vision. Or almost anything…

  17. gregorylent

    much less than ten years … typing will be like the telegraph and morse code

    1. fredwilson

      my upper back and neck muscles are hoping you are right

  18. steveplace

    this shirt is awesome

  19. pranav

    the problem with voice to text is that some times, a few words get transcribed incorrectly — and readers are left scratching their heads, trying to figure what the ‘author meant to say’. Some such goofups are obvious – others – not so easy to figure out. The reason i have ‘author meant to say’ in quotes is because the reader can only make an educated guess – ultimately, they’ll have to confirm with the author – just like few did about ‘oregon’

  20. Aaron Naparstek

    I’m writing this comment standing on one leg and whistling. I’m doing its name is a test to see how easy it is to do something like this. What is pretty neat that you can do this.

    1. fredwilson

      ha!

  21. Newmind

    Pretty decent. A huge improvement over the transcription on google voice in my opinion. Wonder if/why google is using different code for the same purpose?

  22. ryan singer

    Are you voicing your replies?!

    1. fredwilson

      nope, because i’m on my laptop now

  23. andrewwatson

    a thought just occurred to me. what if by the time voice recognition and comprehension becomes viable… it’s already obsolete.

    1. Mark Essel

      Neural interfaces are coming along fast. We’ll be quietly thinking posts and comments. It gets to the heart of the problem and obviates language in search of thought transcription. Imagine the applications: Driving by thought! Or developing software as fast as you can think about the structures.

      1. andrewwatson

        that’s exactly what i was thinking. direct read/write brain access whether through resonance or physical connectedness is rapidly progressing. We’ve spent so much time teaching computers to understand our language, though, that I wonder how hard it will be to get them to understand our thoughts…

        1. Mark Essel

          Longer than we want, but much sooner than we predict.

        2. markslater

          there is quite a bit of development related to ALS research too – sufferers that are trapped in their bodies can transpose their thinking on to machines. There is a company up here (boston) that recently made an announcement in this vein.

      2. kidmercury

        imagine the level of transparency and integrity companies will need….or actually govt can probably just make it legally required to fly, so that they can scan your mind to make sure you are not a terrorist (for your protection, of course.) will the people choose to believe the lie? or will they choose the truth that sets them free…..

        1. Mark Essel

          I prefer liberty over security (BF). But I’d love to have a recording of some of my dreams. Some dark, some epic, others gorgeous and full of bizarre reality mashups.

  24. pjwilk

    The Dragon App for the iPhone works pretty well too — it seems faster and almost as accurate as V10 of the full program on my PC, although as someone who was trained to write on a keyboard, I’m more comfortable typing — even though I think my writing may sometimes be better when it’s dictated first and then manually edited. Don’t know what Dragon might charge for the App, but they were smart enough to give it to me free since I purchased the desktop program. And Google Voice transcription works well too, as long as the caller has a good connection. Once there’s a Web interface where I can click a button in a Google Doc or in a Disqus comment box (are you listening, er reading, Fred?), we’ll be closer to a tipping point. I’m using IntenseDebate at http://paulwilkinson.com now, but will switch if Disqus gets a good dictation interface first.

    1. mbrosen

      Ditto. The Dragon Dictate iPhone app is REALLY good.

  25. daveclark

    Great for use while driving.

    1. fredwilson

      yes, anything that can get our kids voicing behind the wheel instead of texting is a really good thing. we’ve insisted that our kids don’t text in the car. and i believe they don’t and won’t. but the mere thought of it gives me nightmares.

  26. udeme

    Looks more like 5 years, sir. This is quite impressive. With a few more iterations of this device (or feature), I can see many people (including/especially) myself punching keys less and less. Amazing stuff!

  27. markbrandon

    Initial take, it looks pretty good. However, I have been playing around with voice recognition on tablet pc’s for a while and VR gets very challenging or unusable as the ambient noise increases.

    1. Mark Essel

      Agreed Mark. My VR tests while outdoors walking next to a road were terrible.

  28. dan manco

    Funny, last night we had a discussion at a dinner party about voice recognition, swype, etc. We talked about the vid of someone swyping faster than someone using a keyboard. http://www.techcrunch.com/2… I agree w/Jeff Brown that voice recognition will leap frog Swype. I absolutely LOVE the blur of progress happening…

  29. Mark Essel

    I dig the fact that you left the post raw, nice touch for capturing the “state of the art” now in speech to text. I used dragon natural speaking on iPhone and it works best in small fragments. My only problem is that it’s bad enough that I walk around a mall for hours staring, typing and laughing at my phone. I could try and play off speech to text as a phone call, but the act of typing gives my brain time to process the language. By the way, I put together a fun post on metaprotocols for the ultimate in dumb pipes this morning.

    1. fredwilson

      it represents the start of my art of dictating as much as google’s state of the art in recognizing it. i bet i’ll get better at dictating faster than they’ll get better at recognizing for the next few months

  30. saieva

    The quality has a way to go, but one advantage in using the “cloud” is that improvements to the voice-to-text conversion algorithms can be implemented without requiring changes (or waiting for software/firmware upgrades) on all of the deployed devices. Sal. — Salvatore Saieva

  31. falicon

    I think it’s really cool…to me it seems like a great way to write more (ie. just talk your thoughts, then go back, edit, and post)…very awesome. At its current stage, I think it’s all about the editing (but that’s really the case about a lot of things that are typed as well — and really where I think newspapers should be focusing [it’s not about generating the content, their specialty should now come from selecting/editing and augmenting the content] but that’s a completely different topic). Anyway, anything that lowers the barrier for the average busy person to generate more content seems like a good thing to me 😉

    1. Mark Essel

      As a terribly challenged writer I can attest to the cost of editing being much greater than the initial raw message generation. A post or topic goes from nonsense to almost coherent, and sometimes a single thought comes out just right. Ideas are so much easier than words for me.

  32. terrycojones

    I’ll list myself among the doubters on this one. The problem is with error rates and the fact that even what looks like a low error rate severely limits the usefulness of things and means that they won’t displace existing solutions that almost always work. E.g., say you have a backlit keyboard on your phone. It works virtually anywhere you happen to take your phone. You can’t get rid of your keyboard in favor of a solution that only works 99% of the time because that 1% will drive you nuts. And the 1% tends to be important, like proper names, places, etc. No one has said much about noise. It’s the killer for voice to text. It’s just such an incredibly ridiculously mind-bogglingly hard problem that…… well, I certainly don’t want to work on it, wouldn’t invest in it, etc. Just think about the sources of noise in your life: TV, outside, subway (as Fred says), your kids, the phone, winding down the window in the car, music, random interruptions, the variations in your own voice (tired, drunk, panting, nervous), conversations you may be involved in (try dictating a message while you’re also talking to someone else. Now try sending an SMS using a keyboard – no problem). Those are just single noise sources. Now combine them: sit in NYC traffic with the music on, your window wound down, the engine running, the kids in the back seat watching a DVD, etc. I remember a study done maybe 5 years ago of something very simple: voice dialling while driving a car. Just say the digits out loud (i.e., don’t speak a name, just say numbers one after the other). With the variable noise in a car, the accuracy was just 90%. Sounds high, but that’s one mistake per phone number. And that’s a simple *recognition* problem – far easier than voice to text. It’s just too hard.
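
A rough check of the "one mistake per phone number" arithmetic above. This is only a sketch: it assumes the 90% figure is per-digit accuracy, that a phone number has 10 digits, and that digit errors are independent. None of those details are given for the study as recalled, and the function names are made up for illustration.

```python
def expected_digit_errors(num_digits: int, per_digit_accuracy: float) -> float:
    """Average number of misrecognized digits in one spoken number."""
    return num_digits * (1.0 - per_digit_accuracy)


def prob_number_fully_correct(num_digits: int, per_digit_accuracy: float) -> float:
    """Chance that every digit is recognized, assuming independent errors."""
    return per_digit_accuracy ** num_digits


if __name__ == "__main__":
    digits = 10      # assumed length of a phone number
    accuracy = 0.90  # assumed to be per-digit accuracy

    errors = expected_digit_errors(digits, accuracy)
    all_ok = prob_number_fully_correct(digits, accuracy)

    print(f"expected wrong digits per number: {errors:.2f}")
    print(f"chance the whole number is right: {all_ok:.2f}")
```

Under those assumptions the average comes out to about one wrong digit per number, and only roughly a third of numbers would be dialled correctly on the first try.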

    1. davidkpark

      All very true but noise (and subsequently ambient noise) canceling technology is getting better every day.

      1. ShanaC

        I think that is true, but I don’t think that solves secondary issues about how your body relates to machines and how you see them as extensions of your own body. If you cannot communicate through them fast enough and accurately enough, they will be discarded. I’m one of those people who are proponents of trying to pack in as many kinds of interfaces to interact with as possible, including styluses, despite the fact that doing so is incredibly difficult, for this reason and reason alone, once the processing power becomes available: people have different needs for different situations, and to leave them blind to them when you could stick them into a device seems to be a tad silly.

    2. Mark Essel

      Almost all the limitations you mention are overcome by neural interfaces. You can’t overcome thinking poorly (drunk). I like to imagine the far out applications of read-write neural interfaces and external memory and identity.

      1. terrycojones

        Hi Mark, I’m not sure what you mean by “neural interfaces”. You mean by our brains? By neural networks? Can you elaborate in either case?

        1. Mark Essel

          I gathered a little info last year at http://www.squidoo.com/brai… external wave sensing devices. Coarse now, but sensors have a way of improving with $$

    3. terrycojones

      I forgot to mention another show-stopper: error correction. “Oops, not Rogers, Dodgers, wait, delete that last word, no, not THAT word the one before, oh shit delete backup, no, no I wasn’t talking to you, oh shit…” When I used to know a bit about voice to text systems, the ones that were usable had a secondary input method that was used to correct errors, since it’s even more mind-bogglingly impossible to use the voice transcription system to correct errors in itself while it’s in motion. The secondary input method of choice…? Yep, a keyboard.

    4. harpos_blues

      Terry, There are some amazingly effective (and very affordable) noise-cancellation microphone headsets available to the everyday consumer now, so noise reduction, while still very important, is not the barrier it used to be. One of the most impressive consumer-level noise-cancellation headsets I’ve used is “The Boom” (http://www.theboom.com/v/in…). Back in the day, they had a demo video from inside a Blackhawk helicopter.

      1. terrycojones

        Maybe I’m missing something obvious here, but I don’t understand why people are replying to say that there are advances in noise cancelling headphones. Why is that relevant? Are you imagining that people will carry around some kind of Cone of Silence device that blocks out outside noise while they make their dictation? 🙂 The impressive noise cancelation demos using simulated or recorded airplane engine noise are based on eliminating low frequency noise, which is an easier task. But that’s just a side detail – I’m missing your point here, I think. Thanks for the reply, and I’d be happy to hear more details of what you’re thinking.

  33. Jeremy

    “My friend brad feld says we won’t even use keyboard input in 10 years i’m starting to think is right” Very cool post – and impressive to see the quality before any training is applied to your voice. You should do this again in a month, both to see 1.) if you’ve just gotten better at talking at the phone and 2.) if the phone has gotten better at understanding you. Voice may replace typing in a lot of situations where it is more convenient, but voice disturbs your neighbors. Will you talk out a private email on a subway? Will you talk out a text to your wife in line at Starbucks?

    1. bfeld

      How would “talking a private email” or “talking a text to your wife” be any different than all the “talking on cell phones” that goes on today? It’s amazing what you learn if you just sit quietly next to a business person on the subway or listen to all the conversations going on when you are standing on line in a Starbucks.

      1. ksrikrishna

        Touche! And as my neighbors at work found out, if you inhabit the next cubicle to an even old fashioned telephone line. The Japanese seemed to have mastered this talking in public practically silently, whilst we Indians have mastered the art of ignoring the nut who insists on screaming into his phone 🙂

  34. hadar

    I agree that VR will make dramatic leaps and account for a lot of input in the coming years. I also agree that virtual keyboards will obviate most of the need for physical ones on mobile devices. While the use case isn’t frequent, the one thing that makes physical keyboards necessary at times is that they don’t cover up _any_ of the screen real-estate. If you’re just filling in a form field, covering up half the screen isn’t annoying. In some apps (perhaps fewer each day), a virtual keyboard makes the app close to useless.

  35. matthewdbenson

    the quality of the writing/dictation is better than some facebook status updates that I have seen …

    1. ShanaC

      Hahaha! How I speak in my own time is my business, if I want to speak in a subdialect of English is my own concern. And actually that is a concern with this phone. I do feel like using my local subdialect with my friends who speak the same way.

  36. ErikSchwartz

    The tools offered by google’s cloud are why I think Android overtakes iPhone.

    1. fredwilson

      spot on erik

  37. kidmercury

    IMHO any outlook of more than 7 years (probably less) that ignores the “big picture” — i.e. global economics, global politics — cannot be taken seriously, it’s missing too much contextual information to make accurate long-term analyses (i.e. how is product development going to get funded, who is the buyer, how much government risk is involved, etc).

  38. Reg B

    Excellent voice recog coupled with fast, easy to use text editing would be a game changer. Shouldn’t be too far off.

  39. Jim MacLennan

    I’ve long been a fan of Dragon voice to text (http://bit.ly/6qg8qV); I consider it an “alternative KM tool”, trying to make it easier for tech folks to do documentation (http://bit.ly/M1FN). Using Google Voice is an interesting idea – makes me want to try this out – but as a long time user of software transcribing, I will have a laundry list of things to test for (background noise, clear enunciation, vocabulary, editability, etc.)

  40. paramendra

    For one, it reads funny. 🙂

  41. Dave

    My wife is full Chinese and, as English is her third language, I recognize this dialect immediately. 😉

  42. maverickny

    It is definitely a big improvement over the efforts I’ve seen on Google Voice. If it’s anything like Dragon, it will produce better efforts in English with an American accent than for those of us with a thick British one 😉

    1. Aaron Klein

      Yes, why haven’t they implemented the same technology on Google Voice? The other day, my wife’s voice mail told me to meet her and the kids at the Starlight Bar. We have no such bar, fortunately, so I didn’t have to worry about where my wife was taking the kids.

  43. Herb Greenberg

    My interest in voice recognition is trying to figure out how it will evolve as a BUSINESS. My hunch is that voice mail-to-text, for example, has limited appeal as a PAID app beyond busy business users. And voice for messaging is interesting, but will you pay for it? Voice search on mobile, I believe, will be huge, assuming it gets the words right. I’ve been experimenting with Vlingo on my BB Storm2 for searches. It’s pretty good, but far from perfect. And for search it’s FREE. Does that mean that, ultimately, voice is just another utility, even for messaging, emails, etc? Google makes it free, but it costs $$ on the iPhone and BB for all-inclusive voice messaging. Would YOU pay for it? As for keyboards, I know I’m in the minority here because I’m probably the only person who reads Fred who doesn’t have a Droid or iPhone. But FWIW the virtual horizontal keyboard on my Storm2 works really well once you get used to the touch. I can speed type when I’m focused. I’m just not considered cool doing it. But… so what, who cares?!

    1. fredwilson

      yes, Herb, i think voice becomes another utility

    2. Vipul Bhatt

      If you’ll let me use a bad pun, I want to add some nuance here… Speech-to-text will be a utility, but in a somewhat new sense of the word, and the road to utility will be paved with plenty of entrepreneurial achievements. If by “utility”, we mean “a low-cost, reliable substitute for a keyboard, becoming a part of all operating systems”, then surely we shouldn’t bet against it becoming a reality. That day will come when devices/cloud will have sufficient hardware power and semantic intelligence to handle accents, speech variations, noise, special vocabulary, punctuation, and formatting preferences. That day will come, but I wouldn’t hold my breath. And along the way, there will be great problems to solve and money to be made. And what will life be like when technology advances that far? It’s safe to say that we will want to mix instructions with messages: “Reschedule this appointment, inform everyone in my group, and book me on the cheapest flight to New York”. Our natural tendency with voice will be to produce more than just a message. Again, plenty of room for innovation and value creation.

  44. jcristofer

    Rest assured: I think.

  45. William Mougayar

    It would be interesting to see this at work with eyewitness/on-the-ground reporting of a real-time event. Imagine a number of people with voice activated reporting commenting on the Hudson River landing, and then some aggregator quickly making sense of all of that.

    1. Herb Greenberg

      Elaine handed the Hudson River a cow. (A plane landed in the Hudson River just now.)

    2. JLM

      I think that came very close to actually happening as there were many very quick observations made. I remember a “man on the street” conversation before the passengers had even been rescued and many iPhone pictures being transmitted. As an aside, I am very, very surprised that commercial aviation has not adopted some fairly simple technologies: burst radio transmissions of black box contents when below 100′ AGL, cameras on planes showing both cockpit and passenger compartment, air sampling triggered passenger compartment fire suppression systems.

  46. Eric Friedman

    You should try the Twilio posting tool (mine worked for WordPress only right now) that does the same thing but also posts a recording for those that want to listen – was a good learning experience getting it working too.

  47. jonathanpberger

    The 10/GUI concept video has some nice ideas about where interfaces could go: http://www.youtube.com/watc

    1. fredwilson

      thanks for that link. i’ll boxee bookmarklet it and watch it later. i really wish i could right click on the url and do that without having to click through

    1. fredwilson

      hmm, that youtube link isn’t working for me

        1. fredwilson

          that works for me, thanks

  48. hardaway

    Dictating is an art, like writing. My late husband was a radiologist and dictated his readings to a transcriptionist who acted like Google Voice and he had to go back and edit every note, even after she had been listening to him for fifteen years. I found your blog post hilarious, as I find my Voicemail delivered through GoogleVoice. Ribbit VM is a lot better, but still less than perfect. But i like to use Voice Control to call people in my iPhone address book. It works well in limited situations. I was a VAR for IBM speech-to-text products about fifteen years ago; they’ve not come as far as I hoped they would.

  49. jer979

    Love the innovation, of course, but as others have said…not quite ready for prime time. Still, you had to test it out and demo it publicly, that’s the “AVC” brand 😉

    1. fredwilson

      yes indeed

  50. howardlindzon

    who is powering the voice recognition world…is it $nuan nuance communications?

    1. fredwilson

      i believe all of the voice recognition stuff on the google phone and googlevoice is home grown by google but i could be wrong about that

    2. harpos_blues

      Howard, Yes, Nuance has the lion’s share, but is not the sole vendor in the voice recognition space. Another company to look at is Autonomy (http://www.autonomy.com/), as they manufacture software that can interpret/extract knowledge from audio/video, and they’re very good at it :).

  51. John Stack

    I love it! I’ve been using it about a week and have been a huge fan of Naturally Speaking for years. I can’t hand write even a monthly check so I find VR of significant value. (Note, perhaps it’s just late Sunday night but your site load time is quite long.) Glad you’re on it. I’m hoping they add all sorts of update oriented plug-ins – tumblr, etc. Thanks for the post!

  52. lawrence coburn

    I see this kind of stuff and I really do get excited about the possibility of flipping the UGC ratio from 90% lurking / 10% publishing to the opposite.

    1. fredwilson

      that’s a good point. hadn’t thought about it

  53. infoarbitrage

    A scholarly post by Brad on this topic is what I’d like to see. I completely lack vision and the necessary knowledge when it comes to this stuff… Thanks for really amping up the discussion on this topic, Fred.

  54. Hunter Coch

    Not perfect, but still a lot better than my Google Voice transcriptions. This technology is improving fast.

  55. withdrake

    What we are missing here is the sense of context, order, diction and… drama that can be provided by proper punctuation. Our speech is full of the natural pauses, mood clues and emphasis that simple dictation does not cover. Beyond that, the technology is impressive. Having spent several years teaching people how to write for radio, I think people using text-to-speech will become acutely aware of something my students struggled with: how different their words came out as compared to how it sounded in their head.

  56. joshua schachter

    Has anyone really been far even as decided to use even go want to do look more like?

    1. fredwilson

      how did you produce that joshua?

  57. Ian Betteridge

    Remember how lampooned the Newton’s pen input initially was? This is worse than Newton v1.0 quality. It’s also about three generations behind the current state of the art in voice recognition (Dragon Dictate is really good these days). But more importantly, those who are placing a lot of emphasis on the Nexus’ voice recognition ignore a lot of the culture of mobile phones, and how it is different around the world. In Japan, for example, talking on the phone in public transport is not just frowned upon – it will get you told off (politely!). Talking to your phone to dictate something is going to be similarly frowned upon. Even in Western countries, talking to your phone in a public place is starting to be unusual. How many people text or email on their phones, rather than call? Do you think you could speak a blog post in Starbucks? And what would a coffee shop sound like if all those people working on their laptops were dictating to their phones instead?

    1. fredwilson

      i spoke a few emails while waiting in line at the supermarket last night. nobody seemed too upset about it, and one young man in front of me told me he was going home and buying the google phone

  58. davidkpark

    EM is faster than Gibbs and MH, but the numbers 1 sec with 85% accuracy and 5 min with 95% accuracy is completely made up.

  59. Greg Strosaker

    Voice recognition is the technology of the future. And always will be.

    1. fredwilson

      ha!

  60. Sidharth Dassani

    Damn good for what is a new technology and still being worked upon. The best thing is that it works in the cloud so the changes in the algorithm or their technique does not need to be pushed as an update and can happen in the background. After reading this post I can say that speech to text might be near perfect in 5 – 7 years

  61. peterghickey

    I for one welcome losing the keyboard. I have previously used dictation software and find it is improving. Where it really needs to improve is accommodating people with accents. Imagine the primary information we can gather at that point coupled with the ability to distinguish between speaking apply semantics and improved sentiment analysis.

  62. Matt

    Well, from the looks of it, you have good middle-school education!

  63. davidorban

    It is not explicit when Isaac Asimov’s Foundation trilogy is played out, but given that even the memory of Earth has become legend, I think it is safe to assume that it’s year 10000 or beyond. Here’s a quote from its third book, Second Foundation: http://bit.ly/arkady-dictation “Arcadia Darell declaimed firmly into the mouthpiece of her transcriber: “The Future of Seldon’s Plan, by A. Darell” and then thought darkly that some day when she was a great writer, she would write all her masterpieces under the pseudonym of Arkady. Just Arkady. No last name at all. “A. Darell” would be just the sort of thing that she would have to put on all her themes for her class in Composition and Rhetoric–so tasteless. All the other kids had to do it, too, except for Olynthus Dam, because the class laughed so when he did it the first time. And “Arcadia” was a little girl’s name, wished on her because her great-grandmother had been called that; her parents just had no imagination at all. Now that she was two days past fourteen, you’d think they’d recognize the simple fact of adulthood and call her Arkady. Her lips tightened as she thought of her father looking up from his book-viewer just long enough to say, “But if you’re going to pretend you’re nineteen, Arcadia, what will you do when you’re twenty-five and all the boys think you’re thirty?” From where she sprawled across the arms and into the hollow of her own special armchair, she could see the mirror on her dresser. Her foot was a little in the way because her house slipper kept twirling about her big toe, so she pulled it in and sat up with an unnatural straightness to her neck that she felt sure, somehow, lengthened it a full two inches into slim regality. For a moment, she considered her face thoughtfully–too fat. She opened her jaws half an inch behind closed lips, and caught the resultant trace of unnatural gauntness at every angle. She licked her lips with a quick touch of tongue and let them pout a bit in moist softness. Then she let her eyelids droop in a weary, worldly way–Oh, golly if only her cheeks weren’t that silly pink. She tried putting her fingers to the outer corners of her eye and tilting the lids a bit to get that mysterious exotic languor of the women of the inner star systems, but her hands were in the way and she couldn’t see her face very well. Then she lifted her chin, caught herself at a half-profile, and with her eyes a little strained from looking out the corner and her neck muscles faintly aching, she said, in a voice one octave below its natural pitch, “Really, father, if you think it makes a particle of difference to me what some silly old boys think you just–” And then she remembered that she still had the transmitter open in her hand and said, drearily, “Oh, golly,” and shut it off. The faintly violet paper with the peach margin line on the left had upon it the following: “THE FUTURE OF SELDON’S PLAN “Really, father, if you think it makes a particle of difference to me what some silly old boys think you just “Oh, golly.” She pulled the sheet out of the machine with annoyance and another clicked neatly into place. But her face smoothed out of its vexation, nevertheless, and her wide, little mouth stretched into a self-satisfied smile. She sniffed at the paper delicately. Just right. Just that proper touch of elegance and charm. And the penmanship was just the last word. The machine had been delivered two days ago on her first adult birthday. She had said, “But father, everybody–just everybody in the class who has the slightest pretensions to being anybody has one. Nobody but some old drips would use hand machines–” The salesman had said, “There is no other model as compact on the one hand and as adaptable on the other. It will spell and punctuate correctly according to the sense of the sentence. Naturally, it is a great aid to education since it encourages the user to employ careful enunciation and breathing in order to make sure of the correct spelling, to say nothing of demanding a proper and elegant delivery for correct punctuation.” Even then her father had tried to get one geared for type-print as if she were some dried-up, old-maid teacher. But when it was delivered, it was the model she wanted–obtained perhaps with a little more wail and sniffle than quite went with the adulthood of fourteen–and copy was turned out in a charming and entirely feminine handwriting, with the most beautifully graceful capitals anyone ever saw. Even the phrase, “Oh, golly.” somehow breathed glamour when the Transcriber was done with it.” As we are accomplishing things similar, and being in the cloud in my opinion even beyond Asimov’s Y10K and it’s barely 2010, I think we can be proud! (Sorry if the quote is so long, but I think it is relevant, and I find it delicious…)

  64. DuncanLogan

    Old school would be: if you write, I will read; if you speak, I will listen. I guess we are already happy to mix the two. Blog post = read, podcast = listen. Now I can get my voice mail in as text and emails as voice.

  65. Ms. Freeman

    It’s pretty cool, there are some issues that still need to be worked out like missing words, but overall I am excited about the possibilities forecasted over the next decade. 🙂

  66. Guest

    you might be interested in reading “the language instinct” by steven pinker. pretty interesting stuff. http://www.amazon.com/Langu

  67. GeekMBA360

    I think you’re 80% there, but the last 20% is the hardest. Maybe the solution (for now) is dictating the draft and having a virtual assistant to edit it (since a human can pretty much figure out what are missing and/or incorrect?). 🙂

  68. Tenkely

    I think that’s a pretty good result for it being unedited! I love that dictation has come this far on a mobile. I thought I heard that the more you use this feature on Android 2.1, the better it gets. I assume that is because it is cloud based or is there some kind of ‘learning’ built in? Either way this makes me even more jealous I can’t afford that Nexus One!

  69. Ben Bederson

    There are four fundamental reasons why speech will never replace typing (of some kind):
    * Quality – Voice recognition quality is very unlikely to ever be as good as typing in all situations
    * Speed – a good typist can type faster than they can talk
    * Cognitive – the act of speaking interferes with the act of putting sentences together because both require the same verbal short term memory. Typing doesn’t. This is why extemporaneous email is better than extemporaneous speech. Also, typing allows you to use your visual system to review your text as you type
    * Mode – when are you talking to your computer, and when are you just talking? You will always have to do something to tell the computer when to listen to your speech.
    It isn’t just that speech recognition isn’t good enough.

  70. spencerbryan

    If typing as we know it will be obsolete, what does that say about the future of office layouts? We peons that sit in cubicles would never get any work done if dictation were the norm. Maybe you should fund a soundproof cubicle startup, or simply invest in the largest manufacturer of voice silencers that are used in courtrooms and congressional hearings.

  71. davidshore

    I haven’t used Google Voice yet because it isn’t available where I live, but I’ve been using Jott.com for this for a couple of years and it is now part of my work routine. The key, once again, is convenience. I’m looking forward to testing Google Voice, but Jott has integrated with Outlook and Google Calendar for entering appointments; Facebook, Tumblr, Twitter, and a few others for updates; Salesforce, Remember the Milk and dozens more for data entries. You can even buy books from Amazon and post speed traps with Trapster. Their interface is so simple, most of my calls last under 15 seconds. Emails and texts are delivered with the mp3 recording so the recipient can check for errors, but I use it primarily for personal notes and calendar entries, so any speech recognition errors are just for me and not a problem. Once a month or so I’m in a jam, and to be able to quickly dictate an email or text to someone without looking at a keyboard (especially a virtual one) is a lifesaver. Now they need to integrate with Disqus for this comment! Like most new technologies, I don’t think mobile speech recognition will do away with typed entries in a significant way for quite a while, but in certain circumstances and with certain people it has the potential to significantly change routine. It already has with me.

  72. Jeffrey Greenberg

    Not very impressed with the accuracy you seem to be getting. It needs to be edited and corrected. If you are getting 95% it sounds accurate, but really that’s 1 in 20 words wrong and that’s a lot of failure. And if you fix those errors verbally where the error correction is further subject to misunderstanding you experience the frustration of getting errors trying to correct errors: it can be immensely frustrating if you want to produce quality text. Been working on speech technology since 1996 when 95% was standard if you had trained the speech engine. With training you can get up to 98% at times, and if you do specialized speech, like law or medicine, it can be more reliable. Now anonymous (rather than trained) recognition is about there, which is good, but still not ready for ‘the masses’. And there are challenges with recognizing child and teen speech as their voices are far more dynamic than adults’, and their evolving idioms. With that said, I think it’s a low-risk call to think it’ll be at the level of keyboards in 10-20 years… but that’s the infinite future. And there are social problems to not using keyboards: with everyone chattering to themselves in the streets of SF already it has an oddness in real-life that is only growing, even as we get closer to each other virtually.
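
To put the 95% and 98% figures above in concrete terms, here is a small illustrative calculation. It is a sketch only: the 500-word post length is a hypothetical chosen for the example, the function names are invented, and word accuracy is treated as a simple average rate.

```python
def words_per_error(accuracy: float) -> float:
    """On average, how many words go by for each misrecognized word."""
    return 1.0 / (1.0 - accuracy)


def expected_wrong_words(word_count: int, accuracy: float) -> float:
    """Average number of words that would need fixing in a text."""
    return word_count * (1.0 - accuracy)


if __name__ == "__main__":
    post_length = 500  # hypothetical blog-post length in words
    for accuracy in (0.95, 0.98):
        print(
            f"{accuracy:.0%} accurate: one error every "
            f"{words_per_error(accuracy):.0f} words, about "
            f"{expected_wrong_words(post_length, accuracy):.0f} fixes "
            f"in a {post_length}-word post"
        )
```

Even at the trained-engine figure of 98%, a post of that assumed length would still need around ten corrections, which is where the cost of fixing errors by voice starts to bite.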

  73. bojanbabic

    I type faster than I speak 😉

  74. Brian Hayashi @connectme

    In 10 years there will be more non-English content on the Web than English. Your post suggests a future Transcription Turing Test – whether a person can interact with two different systems and be able to determine whether the system was designed for their native language or whether it used machine translation. The point about exemplars is right on – a library of tools that facilitate translations and authentications, either between UI modalities or languages, to accommodate the diverse needs of future audiences and aware objects.

  75. mikereilly45

    I’m not looking forward to a world where I have to dictate everything I do to machines. I will hold on to my keyboard and whatever manual knobs and switches I can for as long as I can.

  76. fredwilson

    that gave me a nice chuckle charlie

  77. andrewwatson

    Google voice transcribes my wife’s name “Elizabeth” as “Little Bits”. Cracks me up every time.

  78. Mark Essel

    It has improved in leaps and bounds since then Charlie.

  79. pranav

    charlie – i’m not dismissing it – just highlighting a few kinks. Moreover, with an Indian accent, the voice to text thingy doesn’t work really great for me – although the mileage varies from app to app.