Most tech giants are investing heavily in both applications and research, hoping to stay ahead of the curve of what many believe to be an inevitable, AI-led paradigm shift. At the forefront of this resurgence are conversational systems (personal assistants and chatbots), computer vision and autonomous navigation: fields which, thanks to advances in hardware, data availability and revolutionary machine learning techniques, have made tremendous progress within the span of just a few years. AI advances are turning problems once thought to lie beyond the reach of machines into commodities that are permeating our everyday lives.
Riding the remarkable growth in AI's popularity, a new generation of chatbots has recently flooded the market, bringing with it the promise of a world where many of our online interactions happen not on a website or in an app, but in a conversation. Helping turn this promise into reality is a combination of better user interfaces, the ubiquity of smartphones, and new, state-of-the-art machine learning techniques.
Perhaps one of the main drivers behind this wave of novel AI applications is deep learning, an area of machine learning that, despite having existed for roughly 50 years, has only recently revolutionized fields such as computer vision and natural language processing (NLP). Yet for all its impressive performance, deep learning alone is not sufficient to solve the challenges chatbots face. Understanding context, distinguishing subtle differences in language that can lead to wildly different meanings, reasoning logically and, most crucially, understanding the preferences and intent of the consumer are just a few of the many challenging tasks a system must perform in order to sustain a conversation with a human.
Answering complex questions using not only the immediate context but also information beyond the confines of the current dialog is indispensable for building truly powerful chatbots. To answer questions effectively, the bot needs to draw on information shared earlier in the conversation, or even in other conversations between the bot and the consumer. Moreover, business goals and the intent of the consumer can influence the kind of response the bot gives.
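The idea of drawing on earlier context can be sketched as a small dialog-state object that the bot consults before answering. This is a minimal illustration under stated assumptions, not a description of any particular product: the class name `DialogState`, the slot names and the "profile" store of facts from past conversations are all hypothetical, and the step that extracts slots from the consumer's raw text (the NLU component) is assumed rather than implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogState:
    """Hypothetical memory a bot consults before answering."""

    # Facts remembered from earlier conversations with the same consumer.
    profile: dict = field(default_factory=dict)
    # Slots filled so far in the current conversation.
    slots: dict = field(default_factory=dict)

    def update(self, extracted: dict) -> None:
        """Merge slots extracted from the latest message (extraction is assumed)."""
        # Skip slots the message did not fill, so earlier values survive.
        self.slots.update({k: v for k, v in extracted.items() if v is not None})

    def resolve(self, slot: str) -> Optional[str]:
        """Prefer this conversation's value, then fall back to the stored profile."""
        return self.slots.get(slot, self.profile.get(slot))


# Hypothetical: size 10 was learned in a previous session.
state = DialogState(profile={"size": "10"})
# Turn 1: "I'm looking for trail running shoes." (no size mentioned)
state.update({"product": "trail running shoes", "size": None})
# Turn 2: "Do you have them in blue?" ("them" only makes sense given turn 1)
state.update({"color": "blue"})
print(state.resolve("product"), state.resolve("size"), state.resolve("color"))
# prints: trail running shoes 10 blue
```

The fallback order (current conversation first, long-term profile second) is one reasonable design choice; a real engine would also need to decide when stale profile facts should be overridden or confirmed with the consumer.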
If a modern conversation engine hopes to go beyond answering simple, single-step questions, it must blend the most prominent techniques emerging from deep learning with solid statistics, linguistics, other machine learning techniques, and more structured classical techniques such as semantic parsing and program induction.
The first stop in building an intelligent conversational system is data. While we live in an era where endless streams of data are constantly being generated, most of it is too raw to be of immediate use to machine learning algorithms. Deep learning in particular is notorious for needing vast amounts of high-quality data before it can unleash its true potential.
Unsupervised learning, the subfield of machine learning devoted to extracting information from raw data without human assistance, offers a promising way around this bottleneck. Among other things, it can be used to build embedding models. In plain English, an embedding represents data in a simpler, more compact form, making patterns easier to discover.
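To make the idea of an embedding concrete, here is a minimal sketch using latent semantic analysis (a truncated SVD of a word-by-document count matrix). This is a classical, non-deep unsupervised technique swapped in for brevity, and the tiny corpus is hypothetical; no labels are used anywhere. Each word ends up as a 2-dimensional vector, and words that appear in similar documents end up pointing in similar directions.

```python
import numpy as np

# Hypothetical toy corpus: pre-tokenized "documents", no labels anywhere.
docs = [
    "running shoes trail",
    "trail running shoes grip",
    "shoes running lightweight",
    "phone battery screen",
    "phone screen charger",
    "battery charger phone",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: rows are words, columns are documents.
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[index[word], j] += 1

# Truncated SVD: keep only the k strongest directions in the data.
k = 2
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word


def similarity(a: str, b: str) -> float:
    """Cosine similarity between the embeddings of two words."""
    va, vb = embeddings[index[a]], embeddings[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


print(similarity("shoes", "trail"))    # same topic: close to 1
print(similarity("shoes", "battery"))  # different topics: close to 0
```

Deep learning approaches such as neural word embeddings learn these vectors with a network rather than an SVD, but the resulting vectors play the same role: a compact representation in which related items sit close together.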
While unsupervised learning is already ubiquitous in machine learning, deep learning offers innovative new ways to build embedding models with state-of-the-art performance. Refining these techniques can reduce the need for large amounts of expensive, high-quality labeled data, a crucial step in getting artificially intelligent chatbots to perform well.
However, the standard approach in deep learning involves collecting a large, highly specific dataset, which is then used to train a network with a mostly static architecture. Once trained, the network maps inputs directly to a fixed set of outputs known in advance. Despite being the foundation of remarkably powerful systems, this approach lacks the flexibility required to handle the kind of information a realistic conversation involves. This brings us to the next big obstacle on the road to truly human-like chatbots: the ability to maintain, and reason with, an internal model of the world.
We humans constantly (and usually subconsciously) check every new piece of information we receive from our surroundings against an internal model of the world: a model of what is normal and what is not, of how entities are related, of how we can draw logical inferences involving those entities, and so on. If, while driving, we see a ball rolling down the street, we immediately know we should slow down and stay alert, watching for a distracted child who may dart out at any moment in pursuit of the ball. This kind of intuition builds on an understanding of how entities relate to each other, combined with the ability to make logical connections along a knowledge graph and reach a conclusion that requires multiple reasoning steps.
This level of automatic and extremely broad reasoning still eludes AI researchers and is perhaps one of the last frontiers on the road to truly intelligent, autonomous AI agents, conversational bots included. Reaching that goal hinges, above all, on the ability to reason.
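The multi-step inference in the ball example can be sketched, in a deliberately simplified form, as a path search over a tiny knowledge graph. The triples below are hypothetical and hand-written, and breadth-first search is a swapped-in illustration of "chaining facts", not a claim about how any production reasoner works; real systems must also learn the graph, cope with uncertainty and decide which goals to search for.

```python
from collections import deque

# Hypothetical hand-written knowledge graph: (subject, relation, object) triples.
TRIPLES = [
    ("ball rolling into street", "often followed by", "child chasing ball"),
    ("child chasing ball", "is a", "pedestrian in roadway"),
    ("pedestrian in roadway", "calls for", "slow down and stay alert"),
    ("green light", "permits", "proceed"),
]


def build_graph(triples):
    """Index triples by subject for fast neighbor lookup."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def reason(graph, start, goal):
    """Breadth-first search for the shortest chain of facts linking start to goal.

    Returns a list of (subject, relation, object) hops, or None if no chain exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


graph = build_graph(TRIPLES)
chain = reason(graph, "ball rolling into street", "slow down and stay alert")
for subj, rel, obj in chain:
    print(f"{subj} --{rel}--> {obj}")
# prints:
# ball rolling into street --often followed by--> child chasing ball
# child chasing ball --is a--> pedestrian in roadway
# pedestrian in roadway --calls for--> slow down and stay alert
```

The conclusion ("slow down") is three hops away from the observation ("ball"), which is exactly the kind of chained inference a single input-to-output mapping does not capture.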
Finally, putting it all together is yet another frontier awaiting a solution. Unlike a search engine, whose users are content to be shown a list of matches ordered by relevance, a conversation engine must be more specific. Simply using NLP to identify a set of relevant information is not enough. The engine must be able to parse the input, break it down, and present a response that is not only clear and concise but also highly relevant to the user's tastes, then rinse and repeat with every turn.
We are still at the early stages of the AI-powered conversational revolution, and it is fair to assume that some problems that seem insurmountable today will be solved in the years to come. We are quickly moving towards a world where you will be able to have long, complex interactions with AI assistants that not only understand what you say but also know your preferences and style, tailoring your experience accordingly.
Getting there means merging multiple disciplines, including deep learning, statistics and linguistics, among others, to build technology that blends consumer preferences, environment and language into one piece of intelligent, flexible software.
Mazdak Rezvani is the founder and CEO of Shoppe AI, an artificial intelligence shopping assistant platform for retailers that allows their customers to converse with them to discover and purchase products, and get post-purchase support.