You’re lost in a foreign country and you’ve forgotten that little card from the hotel, the one you hand the taxi driver that probably says ‘please try not to let this moron get him/herself killed (it’s terrible for tourism) but take him/her back to his/her hotel instead.’ A host of translation technologies and fast mobile networks might be your salvation.
Translators are a dime a dozen on the web. If you have a decent onscreen keyboard you could do worse than pointing your browser at Yahoo’s Babelfish or Babylon’s free online version. Just type in ‘where is the train station?’ If you’re lucky the poor soul you’ve accosted will smile and point down the street, but if they launch into detailed instructions in their own language you’ll have to smile, shrug and shuffle away in embarrassment.
Because conversation is the natural form of communication, speaking into your mobile’s built-in microphone is far more accessible than typing, and the mobile Google search app (available for iPhone and Android) makes it possible. You can use it for simple web searches, which is handy enough, but it’s when you want to conduct a conversation that it takes full flight.
First, your spoken query is sent into the cloud to Google’s computers, which crunch the audio file. The speech recognition engine uses statistical analysis to figure out which words match the sounds you make with each utterance (so speak as clearly as you can).
The resulting string of text is then submitted as a normal search query. As with every Google search you can view the results under websites, images, videos, news, shopping and a host of other categories, including Translate.
Choose your target language and your search query is translated. There’s a button on the Google mobile website which you click to clear the screen and display the text as large as possible. It’ll be just like the lost card from your hotel, only you wrote it.
But the best part of the process has only just begun. Invite your newfound host to speak their reply into your phone, and go through the process again in reverse. Their words will likewise be sent to the cloud, where Google’s speech recognition works with all the languages listed by the Translate engine (64 so far).
The resulting text will be submitted as another search string and will give you the answer as quickly as a normal text search. Translate it back to English and repeat until satisfied, then shake hands, bow or lightly touch your right fingertips to your forehead and go along your way.
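To make the back-and-forth concrete, here is a minimal sketch of one round trip. Everything in it is a stand-in invented for illustration: the phrasebook, the function names and the idea that the ‘audio’ is already text. It is not Google’s service or API, just the shape of the exchange described above.

```python
# Toy stand-ins: real recognition and translation happen on remote servers.
# The phrasebook and every function name here are hypothetical.
PHRASEBOOK = {
    ("en", "fr"): {"where is the train station?": "où est la gare ?"},
    ("fr", "en"): {"au bout de la rue": "at the end of the street"},
}

def recognise(audio, language):
    """Pretend recogniser: the 'audio' is already the spoken words.

    The language argument is unused in this toy version; a real
    recogniser would need it to pick the right sound library.
    """
    return audio.strip().lower()

def translate(text, source, target):
    """Pretend translator backed by the tiny phrasebook above."""
    return PHRASEBOOK[(source, target)].get(text, "[no translation]")

def exchange(audio, source, target):
    """One leg of the conversation: speech in, translated text out."""
    return translate(recognise(audio, source), source, target)

# You ask in English; your host answers in French.
question = exchange("Where is the train station?", "en", "fr")
answer = exchange("Au bout de la rue", "fr", "en")
print(question)
print(answer)
```

The same exchange function handles both directions; only the source and target languages swap, which is all ‘the process again in reverse’ amounts to.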
How speech recognition works
There’s a tricky balance to be struck. When you call your bank and say ‘loan’ or ‘balance’, the system only has to tell a handful of words apart, so it’s relatively easy to widen the net so all our different speaking styles, pitches, speeds and tics are recognisable. The more detailed the speech, the more the characteristics of individual voices can muddy the water.
Speech recognition systems work by comparing your spoken recording to libraries of samples. A stock-standard microphone picks up the sound waves generated by your voice and the software digitises them into thousands of arcane bits of information, from the shifting wavelengths (heard by us as changes in pitch) to what are called ‘plosive consonant sounds’: the unique sonic properties of vocalising quick bursts like ‘ch’ or ‘p’.
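As a rough illustration of what ‘digitising’ means, the sketch below samples a pure tone into a list of numbers and then recovers its pitch by counting how often the wave crosses zero on the way up. The sample rate, the function names and the use of a clean sine wave instead of a real voice are all assumptions made for the example; real systems extract far richer features than this.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed, telephone-quality rate)

def digitise(frequency_hz, duration_s=1.0, sample_rate=SAMPLE_RATE):
    """Stand in for a microphone: sample a pure tone into a list of numbers."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(n)]

def estimate_pitch(samples, sample_rate=SAMPLE_RATE):
    """Estimate pitch in Hz by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

low = estimate_pitch(digitise(220))   # roughly the A below middle C
high = estimate_pitch(digitise(440))  # one octave up
print(low, high)
```

A shorter wavelength means more crossings per second, which is why the higher note produces the larger estimate.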
Until we can build software that can literally understand human speech, the software has to compare the set of digital markers in your recording against the profiles of the prerecorded samples, generating a map of what you likely said. It’s easy to program in a few basic rules about sentence structure (which further indicate which words do or don’t belong together), and the result is a workable digital copy of your utterance, which is then quite easy to convert into text.
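The comparison step can be sketched in a few lines. In this toy version each word in the library is a made-up pair of numbers standing in for its ‘digital markers’, the recogniser picks the closest stored sample, and a single word-pair rule plays the part of sentence structure. The library values, the rule set and the function names are all invented for illustration; production engines use statistical models rather than this simple nearest-match lookup.

```python
import math

# Hypothetical library: one made-up marker pair per word.
LIBRARY = {
    "the": [0.5, 0.5],
    "train": [0.8, 0.2],
    "drain": [0.75, 0.25],   # deliberately close to "train"
    "station": [0.1, 0.9],
}

# Hypothetical sentence-structure rule: word pairs that belong together.
LIKELY_PAIRS = {("the", "train"), ("train", "station")}

def distance(a, b):
    """Straight-line distance between two sets of markers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognise(utterance, library, likely_pairs):
    """Map each chunk of markers to a word, nudged by the pair rules."""
    words = []
    for markers in utterance:
        ranked = sorted(library, key=lambda w: distance(markers, library[w]))
        best = ranked[0]
        # Among the two closest matches, prefer one that fits the previous word.
        if words:
            for candidate in ranked[:2]:
                if (words[-1], candidate) in likely_pairs:
                    best = candidate
                    break
        words.append(best)
    return " ".join(words)

# The middle chunk sits slightly closer to "drain" than to "train".
utterance = [[0.5, 0.5], [0.76, 0.24], [0.12, 0.88]]
with_rules = recognise(utterance, LIBRARY, LIKELY_PAIRS)
without_rules = recognise(utterance, LIBRARY, set())
print(with_rules)
print(without_rules)
```

On sound alone the middle word comes out as ‘drain’; the pair rule is what pulls it back to ‘train’.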
More sophisticated voice software adds samples of your own voice to its libraries, which increases its accuracy.
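Here is a minimal sketch of why your own samples help, assuming a toy library in which each word holds a list of made-up marker pairs. The numbers, the ‘accent’ and the function names are hypothetical; the point is only that a sample recorded from your own voice gives the matcher something closer to compare against.

```python
def nearest_word(markers, library):
    """Return the word whose closest stored sample is nearest to the markers."""
    best_word, best_dist = None, float("inf")
    for word, samples in library.items():
        for sample in samples:
            dist = sum((x - y) ** 2 for x, y in zip(markers, sample))
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word

def add_sample(library, word, markers):
    """Store one of the user's own recordings under the word they said."""
    library.setdefault(word, []).append(markers)

# Generic, factory-supplied samples (made-up numbers).
library = {"loan": [[0.9, 0.1]], "balance": [[0.2, 0.8]]}

# A speaker whose accent pushes "loan" a long way from the stock sample.
my_loan = [0.5, 0.6]
before = nearest_word(my_loan, library)

# Training: the user records "loan" once, and the library keeps it.
add_sample(library, "loan", [0.52, 0.58])
after = nearest_word(my_loan, library)
print(before, after)
```

Before training the accented ‘loan’ is mistaken for ‘balance’; afterwards the speaker’s own sample is the nearest match.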