The Hindu | Business | June 28, 2024

OpenAI launches CriticGPT to spot errors and bugs in AI-generated code


OpenAI has introduced CriticGPT, a new AI model that helps identify mistakes in code generated by ChatGPT. The tool is intended to improve alignment in AI systems through what developers call Reinforcement Learning from Human Feedback (RLHF), which should eventually make the outputs of large language models more accurate.
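To see what RLHF rests on, here is a minimal, purely illustrative sketch of the kind of preference data human reviewers produce; the field names and example are assumptions for illustration, not OpenAI's internal format.

```python
# Minimal sketch of RLHF preference data (illustrative assumption, not
# OpenAI's actual schema). A human compares two model answers and marks
# the better one; a reward model is then trained so the preferred answer
# scores higher, and the language model is fine-tuned against that reward.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str          # task given to the model
    response_a: str      # one model answer
    response_b: str      # an alternative answer
    preferred: str       # "a" or "b", chosen by a human reviewer

example = PreferencePair(
    prompt="Write a function that reverses a string.",
    response_a="def rev(s): return s[::-1]",
    response_b="def rev(s): return s[::1]",   # subtle bug: missing minus sign
    preferred="a",
)
print(example.preferred)
```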

Built on OpenAI's flagship model GPT-4, CriticGPT is designed to help human AI reviewers check code generated by ChatGPT. The accompanying research paper, 'LLM Critics Help Catch LLM Bugs,' reports that the model is reasonably competent at analysing code and identifying errors, helping humans spot hallucinations they might not notice on their own. The researchers trained CriticGPT on a dataset of code samples with deliberately inserted bugs so it could learn to recognise and flag coding errors.
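A rough sketch of what one such deliberately tampered training example could look like appears below; the class, field names, and critique text are hypothetical, chosen only to illustrate the idea of pairing inserted bugs with reference critiques.

```python
# Hypothetical sketch of a bug-insertion training example; the structure
# and names are assumptions for illustration, not OpenAI's dataset format.
from dataclasses import dataclass

@dataclass
class CriticTrainingExample:
    original_code: str       # known-good snippet
    tampered_code: str       # same snippet with a deliberately inserted bug
    reference_critique: str  # description of the inserted bug

good = "def mean(xs):\n    return sum(xs) / len(xs)\n"
bad = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"  # bug inserted on purpose

example = CriticTrainingExample(
    original_code=good,
    tampered_code=bad,
    reference_critique="Divides by len(xs) - 1 instead of len(xs), so the mean is wrong.",
)
# The critic model is trained to produce critiques like reference_critique
# when shown only tampered_code.
print(example.reference_critique)
```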

The study also found that annotators preferred CriticGPT's notes over human-written notes in 63 per cent of cases involving LLM errors. Using a new technique called 'Force Sampling Beam Search,' the tool was also found to help human reviewers write more comprehensive critiques than they would on their own, while producing fewer hallucinations than critiques written by the AI alone.
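The article does not describe how Force Sampling Beam Search works, so the toy sketch below is not that algorithm; it only illustrates the underlying trade-off it targets: preferring candidate critiques that a (hypothetical) reward model rates highly while penalising sheer length, so the critic is not rewarded for piling on speculative claims.

```python
# Toy illustration of a comprehensiveness-vs-precision trade-off when
# choosing among sampled critiques. NOT OpenAI's Force Sampling Beam
# Search; reward_fn and length_penalty are assumptions for illustration.
def pick_critique(candidates, reward_fn, length_penalty=0.01):
    """Return the candidate with the best reward-minus-length score."""
    def score(c):
        return reward_fn(c) - length_penalty * len(c.split())
    return max(candidates, key=score)

candidates = [
    "The loop never increments i, so it runs forever.",
    "The loop never increments i, so it runs forever. Also the variable "
    "name is bad, the docstring is missing, and the API might be wrong.",
]
# With a flat stand-in reward, the length penalty favours the tighter critique.
best = pick_critique(candidates, reward_fn=lambda c: 1.0)
print(best)
```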

Users can also adjust how thoroughly the tool hunts for bugs, giving them more control over its tendency to hallucinate or flag "errors" that do not exist. Beyond its own hallucinations, the tool has other limitations: it can struggle to evaluate longer and more complex tasks, since it was trained on comparatively brief responses from ChatGPT.
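One plausible way such a thoroughness control could behave is as a confidence threshold on flagged issues, as in the hypothetical sketch below; the field names, confidence scores, and cutoff rule are all assumptions, not CriticGPT's actual interface.

```python
# Hypothetical 'thoroughness' knob: report only issues whose confidence
# clears a user-chosen cutoff. Lower cutoffs surface more potential bugs
# but also more spurious ones. All values here are illustrative assumptions.
issues = [
    {"line": 12, "note": "possible off-by-one in range()", "confidence": 0.9},
    {"line": 30, "note": "unused variable 'tmp'",          "confidence": 0.6},
    {"line": 45, "note": "stylistic nit on naming",        "confidence": 0.2},
]

def flagged(issues, thoroughness=0.5):
    """Keep issues whose confidence is at least (1 - thoroughness)."""
    cutoff = 1.0 - thoroughness
    return [i for i in issues if i["confidence"] >= cutoff]

print(len(flagged(issues, thoroughness=0.3)))  # conservative: 1 flag
print(len(flagged(issues, thoroughness=0.9)))  # aggressive: 3 flags, more noise
```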


Additionally, mistakes in AI-generated code are often spread across several parts of a response rather than confined to one spot, making it harder for CriticGPT to pinpoint the source of the problem.
