AI Chatbots: Progress And Limits

The chatbot market is growing rapidly, both globally and in Italy. According to research by the Artificial Intelligence Observatory of the School of Management of the Polytechnic of Milan, the artificial intelligence sector has a turnover of 380 million euros, and the use of chatbots has grown steadily in recent years.

These AI software applications, designed to hold natural conversations with people, are now common across many industries and serve a variety of purposes. Nonetheless, developing AI chatbots remains difficult: even the most advanced systems still have limitations and unsolved problems.

Chatbot Market Size And Factors

The chatbot market is expanding globally as well as in Italy. Research by the Artificial Intelligence Observatory of the School of Management of the Polytechnic of Milan places it within the broader growth of the artificial intelligence sector, which generates a turnover of 380 million euros.

The need for round-the-clock assistance is driving demand for AI chatbots in sectors ranging from finance to entertainment, healthcare to education, and retail to well-being. Besides meeting customer needs, companies adopt this software to reduce costs substantially, as highlighted by a study by Juniper Research.

At the same time, a growing number of platforms and tools for creating chatbots are available, and they are increasingly accessible through mobile and sophisticated applications, despite the complexity of the AI, machine learning, and natural language processing work needed to build helpful conversational software. The market is further enriched by solutions from a growing number of companies, large and small, and startups such as Kore.ai, Omilia, Rasa, Senseforth.ai, Verint, and Yellow.ai.

The Experience Of Big Tech

Big tech, in particular, offers instructive field experience for gauging the progress and limits of AI chatbots. Facebook was first: in August 2015, it launched a virtual assistant called M for the Messenger platform.

Its creators intended the project to overcome the limited scope and performance of earlier conversational bots by proposing a new type of supervised learning AI software (a memory network) capable of carrying out various activities such as making bookings, purchasing goods online, arranging travel, or having parcels delivered. The attempt failed, however: M was abandoned after three years of testing because it often gave users inadequate and irrelevant text answers.

Meta (Facebook) researchers have nevertheless tried again to develop a text-based AI chatbot with BlenderBot, now in version 2.0. This is an open-source chatbot model pre-trained on the Wizard of Wikipedia dataset and based on the newer Retrieval Augmented Generation approach. It is more advanced than M, but the long-standing problems of machine learning recur in it, frustrating the original ambition of creating a system capable of more natural conversation, one that blends multiple skills and is empathetic, intelligent, and has a personality.

The Meta researchers themselves admit that BlenderBot 2.0 cannot fully understand what is most fitting and appropriate in a conversation and that, although it builds a long-term memory, it cannot learn from its mistakes.
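As a rough illustration of the Retrieval Augmented Generation idea mentioned above, the following is a minimal sketch and not BlenderBot's actual implementation. It assumes a tiny in-memory knowledge base (`DOCUMENTS`), a naive word-overlap score, and hypothetical helper names (`retrieve`, `build_prompt`); a real system would use a learned retriever over a large index and pass the resulting prompt to a neural language model.

```python
from collections import Counter

# Hypothetical mini knowledge base; a real system would search a large index.
DOCUMENTS = [
    "Mount Everest is the highest mountain above sea level.",
    "Ada Lovelace wrote about Charles Babbage's Analytical Engine.",
    "The Transformer is a neural network architecture based on attention.",
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!'\"").lower() for word in text.split()]


def overlap_score(query, document):
    """Count the words shared by query and document (a naive relevance score)."""
    query_counts = Counter(tokenize(query))
    document_counts = Counter(tokenize(document))
    return sum((query_counts & document_counts).values())


def retrieve(query, documents, k=1):
    """Return the k documents with the highest overlap score."""
    ranked = sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved passages to the user turn.

    The returned string is what would be handed to a generator model,
    which is deliberately left out of this sketch.
    """
    context = "\n".join(f"- {passage}" for passage in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nUser: {query}\nBot:"


if __name__ == "__main__":
    print(build_prompt("Who was Ada Lovelace?", DOCUMENTS))
```

The point of the design is that the generator conditions on retrieved text rather than relying only on what it memorized during training. This can help with factual accuracy, but as the researchers' own admissions show, it does not by itself guarantee appropriate answers.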

Google-Branded Chatbots

Google, for its part, has developed LaMDA (Language Model for Dialogue Applications), a model based on the Transformer neural network architecture (also used by BERT and GPT-3), which Google Research released as open source. The model improves on standard scientific benchmarks in natural language processing, at the cost of a significant increase in computing resources.

The system was trained to converse with humans on 1.56 trillion words drawn from approximately 3 billion documents, over 1 billion dialogues, and more than 13 billion dialogue utterances. Training the best-performing version took about two months on third-generation TPUs, AI accelerators purpose-built by Google.

As a result, LaMDA can converse on many topics, responding more readily to the questions posed by its human interlocutor and holding a more natural and fluid conversation, with more sensible and convincing remarks and jokes that better simulate a human style.

Nonetheless, the system has significant shortcomings. For example, when LaMDA played the role of Mount Everest, researchers found that about one-third of its answers were not factually correct. In another test, LaMDA failed to answer questions on musical themes 1 time out of 10. According to Google's researchers, the work on their AI chatbot has brought real progress on the three metrics used to measure its improvement (quality, safety, and groundedness), but much remains to be done.

The remaining work concerns safety above all, meaning preventing the model from giving answers with inappropriate or violent content, prejudice, or hateful stereotypes, and the software's ability to be accurate and make statements based on actual data. On these metrics, the gap with the “human level” is far from closed.

DeepMind Models

DeepMind has also taken on the AI chatbot challenge by developing Gopher. It is a natural language processing model with 280 billion parameters, based on the Transformer architecture and trained on a dataset of over 10 TB called MassiveText, with text taken and filtered from C4, Wikipedia, books, articles, web pages, and GitHub. Like LaMDA, Gopher can impersonate other subjects, such as the mathematician Ada Lovelace. It outperforms the best existing programs on 100 of the 124 tasks analyzed, or roughly 81%.

According to DeepMind's research, scaling up a model improves performance in areas such as text comprehension, fact-checking, and identifying “toxic” language. Progress is lacking, however, in logical domains and in tasks where common sense is needed to draw inferences. Moreover, the risks inherent in NLP models remain numerous and arise at different levels.

According to the researchers, two areas are among the most critical. The first is that benchmarking tools are insufficient to prevent the output of disinformation. The second is mitigating the risk that AI software reproduces harmful social stereotypes.

Conclusions

However advanced they are, the most cutting-edge AI chatbot systems still struggle to hold a human-like conversation without a hitch. The models have significant gaps and defects and could cause serious harm to users in sensitive sectors such as healthcare. There are also concerns around controversial topics such as religion, illegal activities (for example, drug use), morals, and politics, as well as the risk that malicious individuals could abuse these systems for manipulation and disinformation.
