One of the main criticisms of ChatGPT and Google Bard is how often they are wrong and how much they make up, or hallucinate. It happens when we ask for data we can corroborate (historical facts, for example), but also when we ask them to write code or solve a mathematical problem. Google's chatbot, Bard, has just taken an interesting step toward improving at this kind of task.
They don't calculate, they predict. As Google explains in its announcement, Large Language Models (LLMs) are essentially prediction engines. Given an input, they generate output by trying to predict which words should come next. That works well for creative tasks and text generation, but things change when we want precise answers in fields such as mathematics or programming.
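A minimal, purely illustrative sketch of that idea is shown below; the probability table and function name are invented for this article and have nothing to do with Bard's actual model:

```python
import random

# Toy "language model": for a given context, a hand-made distribution over
# possible next words. Real LLMs learn these probabilities from huge corpora.
NEXT_WORD_PROBS = {
    "the capital of France is": {"Paris": 0.9, "Lyon": 0.05, "beautiful": 0.05},
    "two plus two equals": {"four": 0.7, "five": 0.2, "twenty-two": 0.1},
}

def predict_next_word(context: str) -> str:
    """Sample the next word from the toy distribution; nothing is calculated."""
    probs = NEXT_WORD_PROBS[context]
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights)[0]

print(predict_next_word("two plus two equals"))  # usually "four", but not guaranteed
```

The point of the sketch: the model only picks a plausible continuation, so even simple arithmetic can occasionally come out wrong.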
Google Bard was not very good at this… When asked math or programming questions, the chatbot would frequently either give the wrong answer or flatly state that it was not prepared to answer that kind of question.
… but that is changing now. Google has made a series of changes that allow Bard to perform better in these areas. As its developers explain, "relying on LLMs alone was not enough."
Thinking, fast and slow. The method is inspired by "a well-studied dichotomy in human intelligence," notably covered in "Thinking, Fast and Slow," the book by Daniel Kahneman, Nobel laureate in economics, which describes two systems of thought: "System 1" and "System 2." The first is intuitive and produces quick responses; the second is slower, deliberate and effortful.
Bard wants to be a little more "System 2". In this analogy, LLMs fall under System 1, producing text quickly but without thinking too hard. Traditional computing, on the other hand, aligns with System 2: "it is formulaic and inflexible, but the right sequence of steps can produce powerful results, such as solutions to long division," Google notes.
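As a hedged illustration of that "formulaic but reliable" style of computation, here is a minimal long-division routine written for this article (our own example, not code from Bard or Google):

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Compute quotient and remainder digit by digit, like long division on paper."""
    quotient = 0
    remainder = 0
    for digit in str(dividend):
        # Bring down the next digit, then see how many times the divisor fits.
        remainder = remainder * 10 + int(digit)
        quotient = quotient * 10 + remainder // divisor
        remainder = remainder % divisor
    return quotient, remainder

print(long_division(987654, 321))  # (3076, 258) -- the same exact answer on every run
```

Unlike the prediction sketch above, this procedure is rigid and unglamorous, but it never guesses.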
If you can solve it with a program, do it. The method Bard uses to "think slow" is implicit code execution: when it detects a prompt that could benefit from logical code, it writes and runs that code in the background, and then uses the result to produce a more accurate answer.
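This is not Bard's internal implementation, but the general pattern might look roughly like the sketch below; the helper names and the keyword heuristic are invented for illustration:

```python
def looks_like_code_helps(prompt: str) -> bool:
    """Crude stand-in for the step that decides whether a prompt would
    benefit from running code (string manipulation, arithmetic, ...)."""
    keywords = ("reverse", "sort", "prime", "divide", "count", "factorial")
    return any(word in prompt.lower() for word in keywords)

def run_generated_code(prompt: str) -> str:
    """Placeholder for 'write a small program for this task and execute it'.
    Only one hard-coded case is handled here to keep the sketch self-contained."""
    if "reverse" in prompt.lower():
        word = prompt.split()[-1].strip("'\".")
        return word[::-1]
    return ""

def answer(prompt: str) -> str:
    # "System 2" path: run code in the background and use its output.
    if looks_like_code_helps(prompt):
        result = run_generated_code(prompt)
        if result:
            return f"The answer is: {result}"
    # "System 1" path: fall back to plain text generation (not implemented here).
    return "LLM free-text answer would go here"

print(answer("Please reverse the word 'Lollipop'"))  # The answer is: popilloL
```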
Say it backwards. A typical example is reversing the letters of a word: Bard often got this wrong, but now it can recognize that the task is trivial to solve with a snippet of Python, run that code on the word, and return the correct result.
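The snippet involved could be as simple as this (our own example; Python has no single built-in "reverse a string" function, but slicing and reversed() both do the job):

```python
word = "Lollipop"

# Two equivalent one-liners to reverse a string in Python.
print(word[::-1])               # popilloL  (slice with step -1)
print("".join(reversed(word)))  # popilloL  (built-in reversed iterator)
```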
This logical reasoning problem is quite simple, yet no engine solved it completely. Google Bard showed only 4 possibilities, ChatGPT (GPT-3.5) showed 6, and ChatGPT Plus (GPT-4) showed 8. Even the latter missed some, such as a 3-2 result (which GPT-3.5 did include).
30% better, but not perfect. According to Google's tests, this method improves the accuracy of Bard's answers by roughly 30% on a set of problems the company uses internally. Bard's team warns that it is still not entirely accurate, and indeed, in math problems we tried separately (here are a few examples), the answers were not always right. Still, Bard is making progress, and that is good news.
Image | Xataka with Midjourney
In Xataka: I am a computer scientist and I work reporting software bugs to large technology companies