Movies and pop culture have always depicted the ideal electronic companion to help manage your life. Siri was a good first step towards that long-promised future, but as Drew Turney discovers, you might be surprised just how close the rest of it is.
Picture this: you leave work and stop at a coffee shop on the way to the bus stop. You’re in the queue when your mobile beeps. When you answer, Maia’s familiar voice cheerily says ‘hey, you know if you drink coffee after 3pm you won’t sleep, and you’ve got that early meeting tomorrow’.
Maia isn’t your secretary. An acronym for ‘Machine-Autonomous Intelligent Assistant’, she’s your personal artificially intelligent helper.
She knew where you were from the GPS on your phone. Your heart rate and breathing told her you were standing still, so she assumed you were waiting in the queue. Since she knows your medical history, she knows about your afternoon caffeine intolerance, and with access to your calendar data, she knows that meeting is tomorrow. Maia has constructed a picture of your life at that moment from a huge variety of sources, and she talks to you like a friend.
But it’s not all about convenience. Imagine Maia knows from your GPS data that you’re on a major highway and that you suddenly go from 90 to zero. She sees that you applied the brakes hard before you stopped. An accelerometer in your car (the same mechanism that reorients your phone or tablet screen based on how you’re holding it) reports that the car isn’t level.
Step one would be to ask if you’re okay. If you don’t respond, step two would be to summon emergency services with an immediacy that might save your life.
Maia might seem like science fiction, but it may surprise you to learn that many of the technologies needed to build her – or something like her – are already in widespread use. You hold several of them in your hand every time you use a smartphone.
When it comes to the future of artificial intelligence we’ve dreamed about for so long, what if the only thing missing is the systems integration needed to make it happen?
What algorithms do
Where HAL 9000, the coldly murderous computer of 2001: A Space Odyssey, seemed far off in 1968, Spike Jonze’s 2013 film her – in which an introverted writer (Joaquin Phoenix) falls in love with an artificially intelligent operating system (voiced by Scarlett Johansson) – seemed all too prescient.
Could we be fooled into falling in love with AI because it’s so lifelike? Might we even prefer it, avoiding the emotional uncertainty and compromise of human relationships? How long before scammers use AI to pose as real people, befriending us to empty our bank accounts?
So far, most AI applications aren’t designed as personal helpers (‘Siri on steroids’, as the concept is sometimes called). Most take advantage of the unique talents of digital computing (such as remembering numbers with perfect fidelity) to perform outstandingly well at a single problem. Your antivirus software is very good at stopping threats it’s never seen before (more below), but it’s never going to forecast the weather – even if both use rudimentary AI in some form.
Understanding us
The way humans developed language couldn’t be more different from the way we designed computers to do it. Where ours is slipshod, abstract and messy, a computer imparts information in a strict, rigid form – the tiniest variation renders the information unintelligible.
Speech-to-text software has been bridging the gap for decades now. Modern voice applications come pre-loaded with a huge dictionary of words and sentence structures, matching the sound patterns in your words to examples in the built-in library.
If you have a lisp and say ‘thtate library’ and the system has no word that matches it, it’ll ask you to input it, which assigns that sound pattern to that word. Next time it hears ‘thtate’, it knows how the word ‘state’ sounds in your particular voice, adding to the dictionary of knowledge and improving accuracy over time in an ever-more refined feedback loop.
iOS’s Siri does just that today, but with a much bigger dataset than just an onboard library. Ask her ‘where’s the thtate library?’ and she’ll have access to countless examples of the same sound pattern matching the word ‘state’, letting her take a much more accurate guess at what you mean.
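For the technically curious, here’s a minimal sketch of that feedback loop in Python – the phonetic spellings are invented for illustration, not taken from any real recogniser:

```python
# A minimal sketch of the pronunciation feedback loop described above.
class PronunciationDictionary:
    def __init__(self):
        # seed the per-user dictionary with stock sound-pattern examples
        self.patterns = {"state": "state", "library": "library"}

    def recognise(self, sound_pattern):
        # return the matching word, or None if we've never heard it
        return self.patterns.get(sound_pattern)

    def learn(self, sound_pattern, word):
        # the user corrected us once; remember their pronunciation for good
        self.patterns[sound_pattern] = word

d = PronunciationDictionary()
print(d.recognise("thtate"))  # None -- no match for the lisped word yet
d.learn("thtate", "state")    # one correction from the user...
print(d.recognise("thtate"))  # 'state' -- recognised from now on
```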
Google is still one of the leaders in the field, launching a neural network product in 2012 that, according to a company spokesperson, cut the rate of misrecognised words by 20-25 percent.
More than just words
Of course, most communication between humans is non-verbal. Even forgetting body language for a second (there’s no model for a computer to even see us, let alone understand what our bodies are communicating), the pauses, tones and a hundred other non-linguistic cues contain a wealth of information software isn’t looking for. In 1971 psychologist Albert Mehrabian coined the 7-38-55 rule – the relative impact, in percentages, of words, tone of voice and body language when we speak.
IBM’s Watson, famous for beating human champions at Jeopardy, offers a service called Tone Analyzer, designed to offer feedback on text you write and upload. Programmed to look for emotional and social cues – including negative, cheerful, angry, confident, sad and disgusted – it forms the bedrock for a computer interpreting our emotional state from what we write. Combine it with speech-to-text, and it might give the software an appreciation for our emotional intent when we speak.
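As a toy illustration (this is not IBM’s actual service or API, just the underlying idea in miniature), a lexicon-based tone checker can be only a few lines long:

```python
# Count emotionally loaded words and report the dominant tone.
# The lexicon is an invented miniature of what real systems use.
TONE_LEXICON = {
    "angry":    {"furious", "hate", "outraged", "annoyed"},
    "cheerful": {"great", "happy", "wonderful", "love"},
    "sad":      {"miserable", "unhappy", "alone", "gloomy"},
}

def dominant_tone(text):
    words = set(text.lower().split())
    scores = {tone: len(words & vocab) for tone, vocab in TONE_LEXICON.items()}
    return max(scores, key=scores.get)

print(dominant_tone("what a wonderful happy day I love it"))  # cheerful
```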
Of course, digitised mood-reading is already widespread. Wearable devices are the first steps towards reading the things we don’t – or can’t – say. They already track a host of ‘personal metrics’, from the number of steps walked to the quality of your sleep.
Can a wearable sensor be very far off that puts together heart rate, capillary dilation, skin conductivity and other physical giveaways of our real intent when we say ‘of course I’ll still respect you in the morning!’, and uses them to build a picture of how we feel?
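Probably not. Here’s a hedged sketch of how such readings might be blended – the sensor names, units and weights below are invented for illustration, not drawn from any real wearable:

```python
# Blend physiological giveaways into a single rough 'arousal' estimate.
def arousal_estimate(heart_rate_bpm, skin_conductance_us, resting_hr=65):
    # normalise each signal to a rough 0..1 range before blending
    hr_score = max(0.0, min(1.0, (heart_rate_bpm - resting_hr) / 60))
    sc_score = max(0.0, min(1.0, skin_conductance_us / 20))
    return 0.6 * hr_score + 0.4 * sc_score  # weighted blend of giveaways

print(arousal_estimate(110, 12))  # ~0.69: elevated -- choose your words carefully
```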
Backchat
Of course, when we talk to a computer we don’t want a reply like ‘function(a,b,c){function d(a){var c,d,e,f=b.createElement("style")’
Having AI respond like a human is a whole other problem. As described above, the system first has to understand the intent of what we’ve said in order to craft a response. Dictionaries and databases won’t cut it here – human speech is just too hit and miss, and even though you might be saying something quite simple, the system won’t understand if there isn’t a comparable example somewhere.
Enter machine learning (more below), an area that might have more impact on our AI helper than any other. Has your GPS ever suggested a different route to avoid traffic? It’s not using data someone programmed in; it’s interpreting newly available data according to rules about its own behaviour – algorithms that let it learn.
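Here’s a minimal sketch of that rerouting behaviour, with invented road names and travel times – the rule (‘take the cheapest path’) never changes, but fresh traffic data changes the costs, and with them the answer:

```python
# Cheapest-path routing whose answer shifts as live traffic updates costs.
import heapq

def best_route(roads, start, goal):
    # roads: {node: {neighbour: travel_time_in_minutes}}
    queue, seen = [(0, start, [start])], set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, t in roads[node].items():
            heapq.heappush(queue, (minutes + t, nxt, path + [nxt]))

roads = {"home": {"highway": 10, "backstreets": 18},
         "highway": {"work": 5}, "backstreets": {"work": 6}, "work": {}}
print(best_route(roads, "home", "work"))  # (15, via the highway)
roads["home"]["highway"] = 40             # live feed reports an accident
print(best_route(roads, "home", "work"))  # (24, via the backstreets)
```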
When it comes to crafting speech to respond to us appropriately it’s all about context. A 2013 University of Texas research project called ISOGRAM gives a computer the power to draw conclusions about sentence meaning based on common word associations. If it reads ‘the robber was charged’ it’ll find more examples of robbers receiving convictions than of robbers being hooked up to batteries in a given dataset.
Although ISOGRAM wasn’t developed to teach computers how to talk, such examples of ‘word relationships’ let it constantly build on the experience of correct phraseology. And expanding the available data to the entire contents of the web would widen the potential exponentially and form the first step for a computer to construct an appropriate reply.
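A toy version of the idea (not ISOGRAM’s actual code) shows how counting a word’s companions can settle which sense is meant:

```python
# Pick the sense of 'charged' whose tell-tale companion words show up
# most often alongside it in a (tiny, invented) corpus.
corpus = [
    "the robber was charged in court with armed robbery",
    "the suspect was charged and convicted by the judge",
    "the battery charged slowly from the solar panel",
]

SENSES = {"legal": {"court", "convicted", "judge", "robbery"},
          "electrical": {"battery", "solar", "volts"}}

def likeliest_sense(word):
    scores = {name: sum(1 for line in corpus
                        if word in line.split() and cues & set(line.split()))
              for name, cues in SENSES.items()}
    return max(scores, key=scores.get)

print(likeliest_sense("charged"))  # 'legal' -- robbers, not batteries
```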
When Watson won at Jeopardy it was a different kind of victory than when IBM’s Deep Blue beat chess champion Garry Kasparov in 1997. At its simplest, chess is just maths and probability, but Jeopardy calls for more than semantic, factual knowledge. Under the game’s rules, Watson had to understand the clue and use the semantics of language to construct the right response.
More recently, a system called PEGWriting has been used in education; it employs algorithms to measure more than 500 text-level variables and provide feedback on characteristics like idea development, organisation, style, word choice and sentence structure.
How hard would it be to turn such a system inward and let the software critique its own composition and construct the best sentence to answer with – all at the speed of computing?
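Probably not very, at toy scale. In the sketch below, three invented measures stand in for PEGWriting’s 500-plus variables: score each candidate reply, then answer with the winner:

```python
# A toy self-critique: score candidate replies on crude text-level
# variables, then answer with the best-scoring one.
def composition_score(sentence):
    words = sentence.split()
    variety = len(set(words)) / len(words)      # word-choice variety
    length_fit = 1 - abs(len(words) - 12) / 12  # prefer ~12-word replies
    punctuated = 1.0 if sentence.endswith((".", "!", "?")) else 0.5
    return variety + length_fit + punctuated

candidates = [
    "Yes.",
    "Yes, the library closes at five, so you should leave work by four.",
    "yes yes the library the library closes closes at five five",
]
print(max(candidates, key=composition_score))  # the middle, fluent reply
```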
Machine learning
The art of machine learning is to program a machine to learn the way a human does – no easy task given our talent for abstraction and computers’ appreciation for immovable absolutes.
A biological knowledge system (like your mind) wields an incredible array of elements, all of them solid enough to ensure you do a pretty good job surviving in your environment. The brain works by recognising patterns. If you hear a sentence but don’t know the specifics, you probably have enough information and experience to put two and two together and understand the gist.
Computers are pattern-blind – they need an exact example of what they’re looking for. But a good example of machine learning can be found in antivirus software, where needing to know exactly what to guard against becomes tricky when virus writers constantly rewrite their code so it doesn’t look like older nasties.
In ‘flat’ computing, a single difference in the code of the virus would be enough to get through, so we program in some room for error. If a virus can be expressed as ‘12345’, the antivirus software can be told to treat a process containing ‘12354’ as suspect, flagging it for further checking.
It’s called heuristics or experience-based programming. Given a little ‘wiggle room’ in what it needs to identify, a computer can appreciate nuances that aren’t hard-programmed in, and as that experience builds, further wiggle room gives it a body of knowledge that’s been ‘learned’ in the traditional sense.
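Here’s that wiggle room expressed as a minimal sketch, using the toy signatures above – a sample that’s close to, but not exactly, a known signature still gets flagged:

```python
# Flag samples whose code is similar -- not identical -- to a known
# signature. The signatures here are toy strings, not real virus code.
from difflib import SequenceMatcher

KNOWN_SIGNATURE = "12345"

def verdict(sample, signature=KNOWN_SIGNATURE, threshold=0.8):
    similarity = SequenceMatcher(None, sample, signature).ratio()
    return "flag for checking" if similarity >= threshold else "allow"

print(verdict("12345"))  # exact match: flagged
print(verdict("12354"))  # two characters swapped: still flagged
print(verdict("99999"))  # nothing alike: allowed
```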
Today heuristic programming is used in everything from optical character recognition (OCR) and credit card fraud protection to computer-generated armies in movies and games and the design of train timetables.
It finds a natural home in a field called sentiment mining, which aims to take the temperature of feeling around a given topic online. Where other text-based big data schemas will just search for keywords or phrases, sentiment mining accounts for the human variables that reveal feeling – idiosyncrasies (‘I looooooove you’ rather than ‘I love you’), misspellings, alternative social use (‘you kill me’), booster words (‘totally love that’) or words that don’t appear in many formal dictionaries (‘amazeballs’).
Give an AI agent a big enough body of such experiential data (again, like the entire web) and it will be able to talk back in two crucial ways – informally, and at the usual speed of human communication.
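A minimal sketch of how those human variables might be handled – the lexicons below are invented miniatures of what real sentiment miners use:

```python
# Collapse stretched spellings, then let booster words amplify sentiment.
import re

POSITIVE = {"love", "amazing", "amazeballs"}
BOOSTERS = {"totally", "absolutely", "so"}

def sentiment(text):
    score, boost = 0.0, 1.0
    for raw in text.lower().split():
        word = re.sub(r"(.)\1{2,}", r"\1", raw)  # 'looooooove' -> 'love'
        if word in BOOSTERS:
            boost = 2.0                          # amplify the next hit
        elif word in POSITIVE:
            score += boost
            boost = 1.0
    return score

print(sentiment("I looooooove it totally amazeballs"))  # 3.0
```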
Of course, there’s always the chance of false positives. Your antivirus might block a legitimate program, or Maia might say ‘you’re at the coffee shop? Careful – remember the time you tripped over’.
But when a friend says ‘I absolutely went to town’ and you reply ‘really – did you drive or catch the train?’, laughing as you realise your embarrassing mistake, it’s clear human communication is full of such misunderstandings, yet we still co-exist just fine. In fact, wielding such talents for discerning the optimum among variables might be the very definition of ‘intelligence’ we’re looking for.
Machine learning applications
As machine learning techniques mature, they’re being applied to more problems and industries all the time.
* A deep learning system in Israel (deep learning is a subset of machine learning) is examining X-rays to identify pathologies, taking some of the burden off overworked clinicians who don’t want to miss something important just because they’re harried or tired.
* A news service in Chicago uses algorithms to turn raw statistics and facts into narrative-driven news stories.
* A Georgia Tech research project involves an automatic video editing algorithm that analyses video for images with ideal artistic properties (geolocation, composition, symmetry, colour vibrancy) to determine what’s important or picturesque, and then assembles it into a final presentation automatically.
* A University of Pennsylvania study looked at 28,000 domestic violence cases and enabled the computer to ‘learn’ who was more likely to re-offend, giving judges a far better idea of whom to detain at the initial arraignment (a toy sketch of the idea follows below).
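Here’s that toy sketch of the risk-scoring idea in the last item – the features and case data are invented, not drawn from the Penn study:

```python
# Fit a simple classifier on past cases, then score a new one.
from sklearn.linear_model import LogisticRegression

# one row per past case: [prior offences, age, breached an order (0/1)]
X = [[0, 45, 0], [1, 30, 0], [4, 22, 1], [6, 28, 1], [0, 50, 0], [3, 25, 1]]
y = [0, 0, 1, 1, 0, 1]  # 1 = re-offended

model = LogisticRegression().fit(X, y)
new_case = [[2, 27, 1]]
print(model.predict_proba(new_case)[0][1])  # estimated re-offence risk
```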
Nice knowing you
One of the things that makes our friends and loved ones so personal to us is how well they know us – they relate to us in ways appropriate to the relationship and our shared history. One crucial aspect of an AI helper like Maia is that she knows your preferences, inclinations and personality.
Here’s the exciting (some would say scary) thing – a computer has the opportunity to know more about us than any human. A few years ago it was estimated that 90 percent of the data humanity had produced in our entire existence had been generated in the previous two years.
That’s an awful lot of information out there about you. Called everything from your social graph to your digital DNA, it’s not nearly as segmented as information in real life. While your mother knows you love cheesecake, she doesn’t know what adult websites you might have visited – but both facts about you exist online, in the form of an Instagram post about your last cheesecake and your last purchase from Red Hot Video Inc.
An AI agent with access to all that information will be able to put together an astonishingly detailed, real-time and constantly evolving profile of you. Maia won’t just be able to speak to you; you’ll tell friends with giddy excitement that she really ‘gets’ you.
Even more surprising, the technology to do so predates many of the others mentioned above. As far back as 1995, Amazon CEO Jeff Bezos explained how he was in the business of online shoppers’ data – selling books was just a way of collecting it.
The model created the first ever recommendation engines, where a website suggests something else you’d like based on a past selection. Amazon made headlines in 2014 with a patent for ‘anticipatory shipping’ – sending you something before you even know you want it – but consider the scenes in ‘her’ where Samantha leads a blindfolded Theo to a slice of pizza and sends his book to a publisher without asking him. In both cases it’s because she knows him well enough to know what he really wants.
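A recommendation engine can be sketched in miniature – ‘people who bought this also bought…’, computed from co-purchase counts over invented purchase histories:

```python
# Recommend the item most often bought alongside a given item.
from collections import Counter

histories = [
    {"sci-fi novel", "robotics textbook"},
    {"sci-fi novel", "film tie-in", "robotics textbook"},
    {"sci-fi novel", "cookbook"},
]

def recommend(item):
    also_bought = Counter()
    for basket in histories:
        if item in basket:
            also_bought.update(basket - {item})  # everything bought with it
    return also_bought.most_common(1)[0][0]

print(recommend("sci-fi novel"))  # 'robotics textbook'
```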
Making it all about you
After our experience with AI through art and culture, we might respond to it with terror that it’s come to overthrow us – the list is long, from Frankenstein to The Matrix.
But it’s in the commercial interests of the companies who produce our tools to have us love them, and they spend considerable money and effort designing and focus testing their appeal. To do that they’re unwittingly hijacking a particular quirk of human nature – research has shown that the same brain circuits are involved irrespective of whether the object of our affection is human or inanimate.
her used the husky tones of Scarlett Johansson, but an even better way might be to use the voice of someone you already love – a spouse, a child or even someone who’s no longer with us. Add that to the concept of an AI familiar with your social graph and you have a ready-made facsimile of a real person.
In talking about ‘her’, futurist and AI expert Ray Kurzweil wrote: ‘Jonze introduces another idea … AIs creating an avatar of a deceased person based on their writings, other artifacts and people’s memories of that person’. As more recordings of our voices find their way online in YouTube videos, podcasts or Skype voicemail messages, there’s a ready-made record an AI can use to ‘build’ a virtual copy of our voice.
And as you might expect, voice cloning is already a nascent industry. Services exist that let you produce dialogue, speeches, game content or presentations spoken in custom voices, including that of your favourite celebrity.
Build all that into a robot and a beloved, long-dead grandma might bring you soup and chocolate when you’re sick in bed. Though robotics is a whole other engineering problem from AI (which is software-based), it’s only going to make AI more relatable for us thanks to our tendency to anthropomorphise.
Pulling it all together
Exactly how speech recognition, fuzzy logic response programming and detailed personal information will interact to build Maia isn’t clear. Maybe the principles that form the individual parts themselves – particularly machine learning – will be recruited to design a blueprint for them to work together.
Where it will happen is an easier question to answer. Aside from the very real access and privacy concerns, cloud computing means software and data can connect together to form systems that are more than the sum of their parts like never before. Think of big data sales analytics, which synthesises information from multiple sources to identify new markets no human operator could ever find.
Before the web age there wasn’t much data available, and what there was proved cumbersome, slow and expensive to connect. It’s also going to take a lot of distributed computing power – the standalone PCs of old would never have cut it.
Maia’s natural home will be among the power and wisdom of both clouds and crowds. The internet – both as a repository of information and a distributed processing medium – is the biggest network that’s ever existed. Every system connected to the network will be its RAM and hard drive, and the websites and apps of the world will be conduits into the machine brain.
Even then, not all the computation needs to happen in real time – which will make an AI software agent even faster. As the AI gets more experience, many of the appropriate responses and actions will already be on hand somewhere for the system to merely pluck out and execute, just like a grown-up constructing sentences properly after a lifetime of word use.
The processing will decide what to do next by mapping inputs to outputs according to a massive connection matrix. Once the offline analysis has created that data, using the software in real time takes far less computing power.
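A minimal sketch of that offline/online split – all the numbers here are illustrative:

```python
# Building the connection matrix is the slow, ahead-of-time part;
# answering in real time is one cheap matrix-vector product.
import numpy as np

rng = np.random.default_rng(0)
# 'offline analysis': construct the big input-to-output mapping in advance
connection_matrix = rng.standard_normal((1000, 50))

def respond(situation):
    # 'online' step: map the situation to scores over 50 canned responses
    scores = connection_matrix.T @ situation
    return int(np.argmax(scores))  # index of the best prepared response

situation = rng.standard_normal(1000)
print(respond(situation))  # instant answer from precomputed knowledge
```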
The prime mover
All it might take is a ‘brain’ to put it all together, and we have a model for that too. In 2014 Cornell University developed Robo Brain, a system that downloaded and stored a billion images, 120,000 YouTube videos and 100 million how-to documents and appliance manuals. Why? When robot helpers become standard, they’ll have a library of knowledge they can refer to with ease and speed to carry out the tasks we assign them. Couldn’t a similar digital information clearinghouse form the executive-level cortex of the AI helper as it trawls the online world to maintain and inform itself?
Cybersecurity gives us another example. Many antivirus software providers now contribute to a huge database of known threats. As the database grows, the ‘story’ of a computer virus takes shape, letting vendors figure out what havoc it will cause and how similar it is to existing threats faster and more accurately, so they can get ahead of the situation and issue defences sooner.
Right around the corner?
All of which prompts the question: when? Getting the technologies described above to work together seamlessly is a must – nothing will shatter the illusion of a human-like AI helper like having its dulcet tones interrupted by a canned recording saying ‘An unknown error has occurred, code 17199’.
When AI helpers will arrive depends on who you ask. Tim Tuttle, CEO of natural language interface provider Mindmeld, says that while rudimentary applications are already all around us, such all-encompassing, self-contained AI software agents are unlikely in the next 30 years.
But Gary Clayton, former chief creative officer for Nuance Communications (the company behind speech to text application Dragon Naturally Speaking) and currently a Silicon Valley investor and advisor, told an Australian science magazine that a guess of even 10 years might be a little conservative.
Whichever estimate is right – and whether it’s Samantha, Maia or HAL 9000 reading this – we’re anxious to meet you.
Delivery
In her, hero Theo takes Samantha (through the conduit of his phone) everywhere, and it never fails him. The reality – even in Australia in the 21st century – is very different. The data generated by constant voice conversations with all our AI helpers will need stronger, more robust networks than we’re used to. According to Mindmeld, voice usage has grown from virtually nothing when the company began in 2014 to over 15 percent of all search traffic today.
A high-speed public network like the NBN shows promise, as might some comparable future wireless network, but when many of us still experience call dropouts in major cities, mobile and wireless networks are going to have to get much better at packaging and transporting such services.
The role of fiction and culture
Even if the technology exists today and we figure out how to put it all together tomorrow, it’s easy to forget the extent to which movies, TV, books and popular culture ‘prime’ us for possible worlds.
For something to catch on, it needs more than technical feasibility – just ask the makers of the Apple Newton, WebTV or the Rabbit Phone, variations of which are all hits today despite failing in their time.
One of the most distinctive elements of Steven Spielberg’s Minority Report was the scenes of Tom Cruise assembling evidence from images and video by swiping and gesturing on a giant heads-up display. Was it mere coincidence that tools like the iPhone and iPad later popularised the swipe interface in the mainstream?