Feature Friday: Voice Posting
It’s Friday and I’m in Middletown, Connecticut and I don’t have a laptop with me. All I’ve got is my Nexus 7 and my phone and so I thought I would try Google Voice Recognition to write a blog post and see how it goes.
So I just fired up the WordPress app on my Nexus 7 and I hit the little button at the bottom of the keyboard that’s the microphone and I’m just talking into the Nexus 7 now and this is what is being recorded.
I think it’s pretty easy to post with voice. This is working out pretty great and I think Google has pretty much got it nailed. Anybody who doesn’t like to type but is happy to talk can start blogging.
I don’t really have a whole lot to say today other than this so I will quit now.
Feature Friday is Google Voice Recognition on Android.
PS – I did not edit any of this other than to do some punctuation and make some paragraphs.
Comments (Archived):
it reads well. i wonder when Google will retire the nexus series and go endoskeleton?
Can you comment that way?
yes you can. and you can even make paragraphs by saying the word new and then the word paragraph, but together. Like that. This was spoken into my Nexus 5.
That is impressive. Anyone know what is under the GoogVoice hood?
Satan
There’s a lot of evidence that their secret sauce is machine learning. Rather than trying to just “get it right” a la Siri, their recognition iterated over time and learned from its own mistakes (Google Voice voicemail transcription has a “rate this transcription” feature, etc.). The algorithm then reached a point where it began trouncing everything else. And it’s just accelerating. This is one of the biggest long-term bear cases on AAPL. Big data is not in their DNA because web services aren’t in their DNA. And big data/machine learning/evolutionary algorithms may turn out to trump “good design.” We shall see.
>There’s a lot of evidence that their secret sauce is machine learning. Yes, applied to huge volumes of data, similar to how spelling correction is done in Google Search: https://www.google.co.in/se… Peter Norvig, Google’s Director of Research, has spoken (pun intended :) about it. This article by him, though it’s about a toy spelling corrector he wrote in Python, has links to some of the actual research behind the stuff done at scale (links both in the body of the article and in the Further Reading section): http://norvig.com/spell-cor…
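To make the spelling-correction idea in the comment above concrete, here is a minimal sketch in the spirit of a toy corrector: generate every string one edit away from the typed word and pick the candidate seen most often in a corpus. The corpus string, `WORD_COUNTS`, `edits1`, and `correction` are illustrative names and data assumed for this example, not anything taken from the linked article or from Google's production systems.

```python
import re
from collections import Counter

# Tiny stand-in corpus (an assumption for illustration); a real system
# would count words over a very large body of text.
CORPUS = (
    "the voice recognition works well and the voice is clear "
    "and the recognition is fast"
)
WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))


def edits1(word):
    """Return every string that is one edit away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correction(word, counts=WORD_COUNTS):
    """Pick the most frequent known word that is zero or one edit from `word`."""
    def known(candidates):
        return {w for w in candidates if w in counts}

    # Prefer the word itself, then one-edit neighbours, then give up and echo it.
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: counts[w])


print(correction("voise"))
print(correction("teh"))
```

The point of the sketch is that the "intelligence" sits in the word counts rather than in the code, which is why more data tends to help more than cleverer rules.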
There are several papers giving the details. Basically, the first iteration was a standard Hidden Markov Model recognizer with a deep neural network instead of a Gaussian Mixture Model to predict the emission probabilities. These are remarkably better than what other people are typically doing, since the neural network is better than the GMM at giving accurate predictions. They have recently been experimenting with a completely neural network design based on Recurrent Neural Networks, and I know that they have gotten state of the art performance on some standard test sets using systems like this (TIMIT, etc.). Not sure if those have made it into production yet though. So basically they are something like Pocketsphinx + neural networks for the GMM piece + shit tons of labeled data.
yes just did
This has been so so long in the coming. A good friend’s father was a victim of the Unabomber and we worked to find him a voice solution back then. Nothing really worked. It’s a new behavior honestly but if it works, it will, I’m certain, find usage. It may take an Evernote or someone to mass market it though.
This is but another example of how technology and convergence are working together. WordPress, Google, Fred Wilson, all converging. Is this a great time to be alive or what? JLM.
i’d be happy to press my own ‘pause’ button now, and then ‘play’ in about 150 years from now. it would be an interesting experiment.
You wouldn’t have a clue how to proceed
it’s my human condition
Imagine what happens when you add BTC to that mix! I read an article a couple days ago about upstate dairy farmers using robots to replace old tech and labor. Better for cows, better for farmers, better for consumers. It is such a good time to be alive!
Neat. It reminds me of how disappointing Siri is time and time again; I find that going to Google Voice and most Google apps is what I do more and more on Apple products. Time to add Androids to the mix.
Now anyone can write a book. It could make for some interesting oral histories.
Middletown Connecticut …great falafell place there enjoy this was a voice message I mean comment and yes it tends to read as a run on sentence
Um.
Did you type that or speak it?
You can’t tell? That’s some good technology.
:)
Nailing voice could be a really important step in making the use of technology when you are in the car safer. It just seems that more and more people are using their mobiles whilst driving.
Yup. I blogged about that three or four years ago. I don’t drive much (the joy of living in NYC), but the idea that people are texting and driving all the time scares the shit out of me
The car will do the driving before the voice will do safe texting
Let’s just skip the voice recognition. Soon google will have a car that will drive you around and write your blogs while you enjoy the scenery.
Now we’re getting somewhere.
I am lucky enough to still be able to tell the tale of someone using their phone and driving their car head on into ours at over 60mph. I can’t put into words how I feel when I see someone doing it.
What? That happened to you? Did everyone come out of that OK?
I walked out; my wife was cut out, broke her back and lost her two front teeth. The first thing she said to me was that she couldn’t move her legs. Fortunately, after six months of hard work she recovered to about 90%. She now has persistent back pain, but having said that, it could have been a lot worse.
Wow. I am really sorry to hear about your wife. I hope she finds something to relieve the back pain
Thanks Fred
I hope that the jerk that did this to you and your wife suffered for what he did, physically or legally.
Oh, I am so sorry.
Thanks Anne. We do count ourselves lucky. It was a bit of a wake up call actually.
I’m glad she’s still here, and hope she can find a way around the pain.
Wow. I hope his insurance paid for all of the recovery.
Ultimately it did Shana. But as it happened in France that tale would require a blog series!
…every time i go out riding my bicycle. i did read a report yesterday about an Apple product that will disable iphones when driving.
I won’t ride on the road. It’s way too dangerous where we live: cyclists are third-class citizens, and you are expected to ride in the gutter. I guess the advantage is that it has forced me to find some great trails!
i do, and the only way i minimise the risk is to always ride exactly the same route and at approx. the same time of day. i know where all the danger points are, and i ride with a paradoxical combination of assertiveness and defensiveness. the trick is knowing when to be which.
Also, where you live all cars drive on the wrong side of the road, it is crazy!
It was in France Fernando! and I was on the right side of the road, both literally and legally correctly! (although when we reached hospital they assumed, incorrectly that I had been driving on the wrong side)
The wrong side was a reply to your cycling comment, would not joke about the other one! A couple of summers ago I spent part of my holidays driving around Scotland and I found it surprisingly easy to adapt to your side. I guess that it is the same for you while out of the UK. The toughest thing was switching gears with my left hand, but apart from that I stopped worrying in no time.
So sorry to hear that.
Driving while texting has become the greatest source of non-drug or alcohol driving accidents and deaths in that demographic. JLM.
What could be so important that it can’t wait until one could pull over and stop? My aunt’s yoga teacher died in such an accident, and the teenager who hit her will have to bear this responsibility for life.
It’s coming. There is no other choice.
That’s cool. I like the idea of the Amazon Fire. I hate working the remote and trying to toggle back from DirecTV to AppleTV and then searching on both. DirecTV search is particularly painful. Years ago, we tried a voice app for commodity trading so traders could talk into the machine rather than point and click. It didn’t work. Now it’s 15 years later and they are getting better and better.
Bezos seems to be quite bullish on voice with 2 product releases based around it in the past month. I have some feeling that their data from the Mayday feature shocked them into seeing the actual average technical competence they were dealing with and pushed voice recognition to priority #1
i furniture voice positing woks incredulously wheel. it’z reeliable, acarrot and phun to yoose. – sent from my iPhone
Holy shit. Is that really the best that iOS can do?
deadpanning the satire. #nice
You just started a mini AAPL selloff :)
Very sarcastically funny of LIAD. I was actually very fond of voice recognition on iOS, but now I feel compelled to check it out on Android as well to see how close to perfection it is there.
Haha, that’s hilarious and gives me something to think about because I’m about to upgrade my iPhone and have no Android devices.
Do we call this FredChat?
Posting from my android. Its pretty good no big deal. It even recognize when I told it to create another paragraph here. in any event there is golf to be played and Pappy to be sipped
At the same time?
Is there another way?
Yes. It is called frustrating :)
Last year I went from a 17 to a 10 by doing two things: 1) refuse to ever ever think about my swing and always visualize every shot before I hit it; 2) focus on enjoying the bourbon, the outdoors and gambling with friends. Done with swing thoughts!
I am a 15. I will try your approach
In my head you guys were both commenting back and forth via voice recognition a.k.a. having a ‘real’ conversation asynchronously :)
Shhhhh Fred won’t take my calls this is my only shot
It’s a good approach, my golf also improved when I started to relax over the golf and just enjoy the company.
Let’s get the Fred Wilson Invitational on the calendar… I’m in!
Has no one here ever read Inner Game?
Yes….good book, must dig it out again
it surprises me that the PC crowd hasn’t targeted that system.
I think it’s pete sampras who said you have to ‘play stupid.’
That’s great. Dawned on me last year playing golf and I realized… I played a lot of pretty high level basketball and never once…ever… did I think about how my elbow and wrist were lined up during a freakin shot. Be an athlete FFS!!!
I play best when I’ve got a song stuck in my head – gives me rhythm, and gets me out of my head. I once broke 80 as a 15 handicap at the club championship – by teeing off with a 5 iron all day…
first shot is the least important just like “launching” a company
Well now I must know — what song was in your head that day?
I change it up… today I have Joe Jackson’s Sunday Papers stuck in my head… and I’m going to the driving range later today…. https://www.youtube.com/wat…
Not caring (and not giving a shit) goes a long way to reducing the tension that causes a person to lose the fine motor control needed to just wing it and win at something. Not a golf observation (not a player) but a general thought related to not clamming up or getting too nervous. That nervous reaction (and second guessing) is a killer.
aloof wins life
To expand on this, my “we’re here we’re queer get used to it” award for the decade goes to Joan Rivers. http://www.washingtonpost.c…
My husband played in a tournament this week with regulars and a few semi-pro types. Required by his job… had to borrow a set of clubs. Won the long ball contest and placed third for the day. I think the last time he golfed was last year’s tournament. No pappy though. Go figure.
A man who comes off the bench rusty with borrowed clubs and takes the long ball contest is a keeper. The force is powerful in such a man. One would not be remiss to consider breeding with such a stud. JLM.
Ha! @jlm:disqus I just shared this with Mark and you made his day! Thank you. Everyone at work is accusing him of being a sandbagger.
> always visualize every shot before I hit it. A standard in, say, violin is to ‘hear’ the sound in your own head just before trying to make it on the violin. It helped when I was playing violin. That is, instead of concentrating on just the violin, bow, fingers, and hands. Somehow the brain and body can handle the manual details if just asked to make the sound heard. Also, of course, hearing the sound is a good way of comparing the sound you intend with the sound you are making so that you can make corrections. Much the same might work for throwing balls in basketball, football, baseball, along with golf and more. It’s an old trick.
An investor friend of mine in Jordan last week was raving to me about a Jordanian startup, Sowt – https://www.sowt.com/ – that is basically a ‘social network powered by your voice’. I have yet to use their app but the concept is pretty cool and they apparently are doing very well. Here is another link with a commercial they are showing in Jordan for the app: https://www.youtube.com/wat…
That is cool
This is why I never used voice dictation. It works, but it doesn’t engage the part of the mind that knows how to write, at least not for old guys like us. It sounds like a transcribed voice mail, not a post or a chapter of a book. Maybe it’s just me, not biology, but other than Erle Stanley Gardner, I don’t know of any successful writer who dictates.
You are sooooo right Seth
Deepak Chopra dictates as far as I know. Take that as you will :)
I’m not sure that’s true. Put it this way… a few years ago it was definitely not true. But perhaps he has started doing that in more recent times. Is this something you’re totally certain about? If so, when did he switch methods?
Dictation was always an art. You betray your age when you use that term. Back in the day before PCs, I used to handle all my correspondence in an hour or so daily. I always had an assistant who could take flawless dictation. She would hand me a letter. I would read it. I would dictate an answer. She would type it up. I would edit it. She would revise it. I would sign it. She would mail it and file it. Every letter went into a subject or client file and a continuous reading file by day of the year. All the same day. It was “in box zero” before the invention of in boxes. I learned this when I was a 4-star’s aide de camp and when I worked for a Fortune 5 CEO. It was a great habit. Now, I struggle with my in box like a rube. I have not been to “in box zero” in ten years. JLM.
You can’t be inbox zero anymore because the cost of filling your inbox for others is too low now. Even if you delete like crazy some people won’t stop filling it.
For me, before PCs there were Wang systems, and before the internet, faxes. Dictation was never the tool of the startup in my world.
Between GMail and Mailbox App I’m able to keep it pretty close but it’s not easy!
I asked my boss if I could use his dictaphone. He said, “No. You have to use your finger like everybody else.” I’ll show myself out now.
Very funny. Well played. JLM.
Glad you like :) It’s one of my favorites.
Different things work for different people. I create my drafts, usually from voice recognition, and then I do editing as required from the wordprocessor to further sculpture my thoughts until I’m satisfied with the outcome. Sure works for me.
Hm. I guess I always saw it as the opposite. For example, you (Seth) talk out loud like you write. I imagine it’s because you know that when something you’re saying might be recorded and read by people later, you want to put your thoughts into easier, readable, digestible thoughts. Thoughts that look like they were written, because that makes them easier to read. The best speakers do that. But they don’t do it because it’s just how they speak. They talk that way because they practice. So to bring it back around I guess I just see it as: if you want your dictated word to read like your written word, just do the same thing you would in an interview: talk like you write. (Which, again, takes practice.)
The problem is it’s not as easy to edit when you speak. And even if you have voice commands which allow you to edit you can’t as easily hop around and make minor changes. Not to mention that by using a keyboard and typing you automatically pace yourself when writing. And hearing your voice gets in the way of the quiet creative process that is necessary for many people (hand being raised here) to put the right words on paper if you want to call it that.
>It works, but it doesn’t engage the part of the mind that knows how to write. Agree. “engage part of the mind..” While some might argue that it is what you are used to (“old guys”) or how you are trained, it’s hard to believe that the ability to have a keyboard and to be able to quickly visualize and most importantly edit what you write doesn’t add to the process to achieve your particular writing goal. And since you are a writer (and make a living that way) you take what you write very seriously and realize it’s important. [1] I am the same way. I close deals and make money from writing exactly precisely the right sentence or paragraph with exactly the right tone for the occasion (negotiating by email essentially). So I have to obsess over the placement of words and the exact word many times. And it works. For me. I remember (“old guy”) when I had a graphic designer exclaim, at the sight of one of the first Macs running PageMaker 1.0, “the creativity just flows” (a manic state it put them in) vs. having to do pasteups (very slow). To me that’s the same handicap that you have in iterating when using any method other than a keyboard. And a good keyboard at that. [1] For example, when I am on vacation or traveling I always have two laptops; I would never think of being anywhere without at least one laptop at my disposal. As well as two or three forms of Internet access to boot.
Would you have preferred a SoundCloud audio file of Fred speaking instead?
So true. For most of my life, my father had romanticized about the idea of writing a great novel. He had no track record whatsoever as a writer, mind you. Nevertheless, he, like many, had the recurring fantasy of one day sitting down and writing some great book for everyone to marvel over. His excuse for never actually doing it was always his lack of typing skills. One Father’s Day many years later, and many years ago now (he passed away in 2010), I decided to buy him a state-of-the-art voice recognition setup to encourage him to make it happen. It was a desktop PC dedicated expressly to this purpose with a top quality professional microphone and literally the best software money could buy at the time. The total investment was just over ten grand but it was decked out and turnkey. He was delighted to receive it and it seemed his dream might finally be realized thanks to technology. The result, however, was that after he learned to use the tools, it made no difference at all. The real problem with getting his book done had nothing to do with typing skills but rather his lack of acumen as a writer. He only knew THAT he wanted to write a book but when it came right down to it, he didn’t know WHAT to write or HOW to convey it. And what little he was able to produce came out precisely as you would describe it: like a transcription, NOT like an immersive narrative. The faculty of mind that is engaged in the act of WRITING is utterly different than those used in speaking and cognition. Having said all that, it is great that this tech gets better and better. It has other fantastic use cases like AI and IVR (or to drop a quick blog post to your loyal fans from the airport so they know you’re alive but busy). Just within the last five years alone, it seems this tech has improved by leaps and bounds.
This is thanks both to cloud computing, which can hand off a lot of the interpretation and understanding to big servers in the sky, and to the simple surge in demand caused by Siri and Google Now and the evolving UX around human-machine interface. I hope the art of great writing and the associated joy of reading such works doesn’t become yet another lost art. Looking at it optimistically, writers once resisted the pen in favor of the quill, and then the word processor in favor of the old clanky typewriter. Perhaps there will be an era of writers who learn to “write” via voice using the SAME faculty of the mind as their scribing forerunners. But unless and until that day ever comes, I believe your assertion will prevail.
I have found that this is a game changer for middle aged technophobic women wrt texting. Feels like magic each time they use it and gets them hooked. Sample size is about 8 :)
Does it work for men, too?
LOL! For some reason, I spend more time with middle-aged women than middle-aged men. YMMV :)
I’m amazed at the quality. I can’t get a text message right on iOS 7.
But there is a lot more to voice applications than dictation. How about voice commands to control devices around the home, asking for information or analysis from the Internet, connecting with other people, and doing things that were not possible before? I’m interested in voice, not to replace typing, but to enhance human performance and make our lives better, easier, and more productive.
“…voice commands to control devices around the home…” I’ve got that already: “Son, go get my slippers”
Nooo….you need an Ubi. http://www.theubi.com/ https://www.youtube.com/wat…
Oh, nice, easy NSA listening portal. Joy.
@wmoug:disqus This is fascinating. We could do an entire post on this.
The machines are going to take over our lives….soon.
It’s little known, but you can actually form paragraphs with Google voice too – say ‘return return’. Punctuation is ‘comma’ and ‘period.’ There are probably more tricks that avc peeps know, would be useful if others have other good ones to share!
Anne Libby could have used that the last couple of months.
Hah, exactly what I thought when I read this post.
Why?
Broken wrist…recently cleared to type again.
Oofff
Fred, the feature works extremely well with the iPhone as well, which is what I’m using now.
Next time, try also dictating the punctuation and spacing. Say: period (.), comma (,), question mark (?), exclamation point (!), new line, new paragraph, smileyface. For example: “hi comma how are you question mark new paragraph”. Say commands quickly without pausing after actual words. Siri has a vastly richer list, btw.
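As a rough illustration of what those spoken commands amount to, here is a small Python sketch that turns a recognized word stream into punctuated text. The spoken phrases are the ones listed in the comment above; the `COMMANDS` table, the `render_dictation` function, and the exact characters each command emits are assumptions for this example, not how Google's or Apple's dictation is actually implemented.

```python
# Hypothetical command table: spoken phrases from the comment above mapped to
# the text they are assumed to insert.
COMMANDS = {
    ("new", "paragraph"): "\n\n",
    ("new", "line"): "\n",
    ("question", "mark"): "?",
    ("exclamation", "point"): "!",
    ("period",): ".",
    ("comma",): ",",
}


def render_dictation(spoken):
    """Replace spoken punctuation commands with the symbols they stand for."""
    words = spoken.split()
    text = ""
    i = 0
    while i < len(words):
        # Try two-word commands before one-word commands.
        for size in (2, 1):
            key = tuple(w.lower() for w in words[i:i + size])
            if len(key) == size and key in COMMANDS:
                # Punctuation and breaks attach directly to the previous word.
                text = text.rstrip(" ") + COMMANDS[key]
                i += size
                break
        else:
            # Ordinary word: separate it with a space unless it starts a line.
            if text and not text.endswith("\n"):
                text += " "
            text += words[i]
            i += 1
    return text


print(repr(render_dictation("hi comma how are you question mark new paragraph")))
# → 'hi, how are you?\n\n'
```

A table like this also shows the obvious limitation: there is no way to dictate the literal word "comma" or "period" without some escape mechanism, and capitalization after a full stop is left out of the sketch entirely.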
Victor Borge!
OMG– what an esoteric and awesome reference !
http://www.youtube.com/watc…
I can never talk to a machine for more than a minute. Only Americans do.
Glad to see you visiting CT, nice day for a ride up the Merritt. Go Cardinals!
I don’t really drive. I trained it up and back
Good call. They call it a parkway because you spend a majority of your trip not moving.
Parkway vs I-95 – I would always take a parkway over an interstate highway with trucks. Going north on the Merritt should not be too bad in the AM.
Valid point, most of my memories are driving south in the morning. I used to bring breakfast to eat during the traffic jam on the Merritt into Stamford. Rarely braved 95.
Hi Matt, where are you living now? We live in Fairfield, CT.
I am currently up in Wethersfield and working in Hartford. Are you active in any of the startup/developer events down that way? Fairfield county is a bit far with my work schedule but I’ll usually trek down to New Haven for a NewHaven.io event once a month or so.
Not driving – Agree, as I get older – I would prefer to take mass transit. Re: Train – hopefully you took the Amtrak Penn Station to New Haven, not sure if you switched and took an Amtrak north toward Middletown. Re: Middletown – assuming you were visiting Wesleyan again, I think I traded disqus comments w/ the GothamGal…and she said your son is going there as well – Great School – Congratulations! Three-peat as Pat Riley would say. Our oldest child (Lizzie) is graduating HS this year as well, she is planning on attending Stonehill College just outside of Beantown.
One more on taking train……We went to the west coast two weeks ago for spring break….I have been there many times over the years…..but I have recently been more in tune w/ mass transit……I am still amazed that with all of the people living in Orange & LA counties they don’t have many trains…. It was not until we were headed to LAX ( I think we were on the 110) – that I saw some trains running…and there were only a few cars on each train. Just amazing!
I remember going to CompUSA to purchase the voice recognition software packages that were always big (size) and pricey ($$$$). They came with headphones and all sorts of dictation restrictions/needs. Isn’t it amazing at how much simpler it’s become? Instead of $499 software packages, it’s now native to your device. I’d imagine it still can improve, but at least it’s huge leaps over yesteryears.
The biggest disconnect for me is the fact that I don’t need/want those around me “listening” to my reply/post. Often my replies or posts would be of no concern or interest to them…as well as none of their business. That’s the rub for me…regardless of the practical appeal. This from a guy who sat through an entire Dragon infomercial as if hypnotized. They make it look sooo easy and I’m a sucker for convenience. At the same time, as a manner of input, I believe it is inevitably going to gain ground. I remember demonstrating the Microsoft Sync system to car buyers in 2010. The voice recognition was impressive and the various functions available via voice command were also impressive. However, as soon as passengers are in the car, my original point rings true.
I feel that somewhere the human element is missing, or that the nice collection of thoughts you can get by writing is missed when using dictation. Secondly, there is probably a solution, but I’m curious: how do you comment on a response via dictation?
This transcription is way better than the last time you tried it
Google really has nailed it (I love KitKat’s Touchless Control), but I always feel that how the company nailed it is underappreciated. In 2007, Google released a free (but widely advertised) directory service, GOOG-411, across the US. Until its discontinuation in 2010, they used the service to collect immense amounts of data on how each regional or international accent pronounced the most commonly searched and used English-language words. It’s not a surprise that seven years later (and after millions of hours of Android voice-to-text enabled by GOOG-411), they’ve got the best detection out there. Page and Schmidt continue to suggest Google is, at its core, a machine learning company. But the company is very effective at driving user behavior that will speed up this learning (or supplement it where needed). reCAPTCHA is another great example, as Google silently uses it to transcribe unclear text from its Google Books Library Project, rather than just to “make sure you’re human”. These are great bets, made well in advance of real day-to-day adoption.
Since you’re in Middletown, you should check out Tschudin Chocolates and First & Last Tavern. They’re right next to each other… so good.
The real First & Last is in Hartford :)
Fred, there is a lady at the Apple Store on 14th (I had her card at some point). She has hearing limitations and she is hired there with the sole purpose of helping others with accessibility limitations. She knows all the ins and outs of dictation; she can speak into the phone and do paragraph formatting and punctuation all with voice only. She is truly remarkable. Since you are in the neighborhood, you should stop by and say hi to her. It helps that she is a sweetheart to talk to.
I did the same thing one time when I forgot I needed to do a blog post and I’d just boarded a plane and had about 5-10 minutes before take off. Worked almost exactly the same as your post. I’ve occasionally wondered why I don’t use it more since it worked so well. I think part of it goes back to what Seth said, but I extend his idea by saying that typing speed is closer to my thinking speed. So, the time spent typing is used for me to formulate my next thought. Speaking goes so fast that your thinking can’t keep up. At least for me.
The progress in voice recognition is admirable. If you have an iPhone, try the Google app.
I’ve thought for a while that an interesting startup idea would be for a conference call service with voice recognition note taking. Multiple people dial in, each person agrees to have their conversation recorded, voice recognition creates text notes along with who said what, and the whole thing is archived online where each participant can access it in the future. The text notes would be searchable and allow you to jump to that point in the audio recording. Free service to store a certain number of minutes. Pay to unlock more storage, more callers, etc. Voice recognition is probably good enough now to make the archive / search useful, even if not perfect.
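The searchable-archive part of that idea can be sketched in a few lines of Python, assuming the speech recognizer has already produced timestamped segments labeled by speaker. The `Segment` type, the sample call, and the `build_index` and `search` functions are all hypothetical names invented for this illustration; nothing here describes an existing service.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str          # who said it (assumed to come from the recognizer)
    start_seconds: float  # offset into the audio recording
    text: str             # recognized words for this stretch of speech


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the positions of the segments that contain it."""
    index = defaultdict(list)
    for position, segment in enumerate(segments):
        for word in set(tokenize(segment.text)):
            index[word].append(position)
    return index


def search(index, segments, query):
    """Return the segments containing every word of the query, in call order."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return [segments[p] for p in sorted(hits)]


# Made-up sample call for illustration.
CALL = [
    Segment("Alice", 0.0, "Let's review the budget"),
    Segment("Bob", 12.5, "The budget review is due Friday"),
    Segment("Alice", 30.0, "I will send notes"),
]
INDEX = build_index(CALL)

for hit in search(INDEX, CALL, "budget review"):
    print(f"{hit.start_seconds:>6.1f}s  {hit.speaker}: {hit.text}")
```

Because each hit carries `start_seconds`, a player could jump straight to that point in the recording, which is why imperfect transcription can still make the archive useful: the text only has to be good enough to find the right moment in the audio.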
I think after reading all these comments I would love to see what it would be like to have had just voice comments.