The difference between this approach and its predecessors is that DeepMind hopes to use "dialogue in the long term for safety," says Geoffrey Irving, a safety researcher at DeepMind.
"That means we don't expect that the problems that we face in these models, either misinformation or stereotypes or whatever, are obvious at first glance, and we want to talk through them in detail. And that means between machines and humans as well," he says.
DeepMind's idea of using human preferences to optimize how an AI model learns is not new, says Sara Hooker, who leads Cohere for AI, a nonprofit AI research lab.
"But the improvements are convincing and show clear benefits to human-guided optimization of dialogue agents in a large-language-model setting," says Hooker.
Douwe Kiela, a researcher at AI startup Hugging Face, says Sparrow is "a nice next step that follows a general trend in AI, where we are more seriously trying to improve the safety aspects of large-language-model deployments."
But there is much work to be done before these conversational AI models can be deployed in the wild.
Sparrow still makes mistakes. The model sometimes goes off topic or makes up random answers. Determined participants were also able to make the model break rules 8% of the time. (This is still an improvement over older models: DeepMind's previous models broke rules three times more often than Sparrow.)
"For areas where human harm can be high if an agent answers, such as providing medical and financial advice, this may still feel to many like an unacceptably high failure rate," Hooker says. The work is also built around an English-language model, "whereas we live in a world where technology has to safely and responsibly serve many different languages," she adds.
And Kiela points out another problem: "Relying on Google for information-seeking leads to unknown biases that are hard to uncover, given that everything is closed source."