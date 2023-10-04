ChatGPT has proven its ability as a writer: its use in academia is becoming (dangerously?) popular, and its texts, which can adopt different styles and imitate particular authors, are convincing. Some media outlets are already starting to use these engines to write content, and the question is whether professional writers should be worried. The answer, at least for now, appears to be a resounding no.

Can creativity be measured? A recent study by researchers from Salesforce and Columbia University set out to find how texts written by professional writers, by amateur writers and by a generative AI model are currently rated. The idea was to evaluate “creativity as a product” through a variant of the Torrance Test of Creative Thinking (TTCT).

Here’s how the experts rated each story: most of The New Yorker’s stories were identified as coming from professional authors, and only Claude produced a fair number of stories that were taken for the work of amateur writers. ChatGPT’s stories were most often identified as written by an AI. Source: arXiv.

Tests. For the study, the researchers built a test bank of 48 short stories of about 1,400 words each. Twelve were written by professional writers and 36 by three large language models (LLMs) accessed through their respective chatbots: ChatGPT (GPT-3.5), ChatGPT (GPT-4) and Claude 1.3. They recruited a team of 10 creative writing experts, who administered the TTCT tests (in this case TTCW, for ‘Writing’ rather than ‘Thinking’), performing three different evaluations for each story.

AI: mediocre. The results showed that “stories generated by LLMs” are between three and ten times less likely to pass the TTCW tests than stories written by experts. The study’s conclusion is clear, and highlights “the competence of experienced writers to evoke creativity, outperforming LLMs by a considerable margin.”

ChatGPT has a lot to learn. The researchers also assessed how these LLMs could improve in this area and in the tests used to evaluate their creativity. According to the experts, LLMs “not only have a double challenge in producing intrinsically creative content, but they also lack the necessary delicacy to evaluate creativity as experts do.”

Surprise. These results are somewhat surprising, especially since the existing tools for detecting AI-generated text have not proven as reliable as expected. OpenAI launched its own detector and ended up withdrawing it after admitting that it was not accurate enough.

We still have a problem. It seems that professional writers can (we can) rest easy, but that does not mean this type of generative AI engine is not already causing problems. We see it especially on Amazon, where a large number of titles written by AI engines are appearing, some of which even steal the names of human authors.

Image | Xataka with Bing Image Creator

