Smaller AI models reach 82% win rate by asking better questions, MIT study finds
MIT and Harvard researchers have introduced a new approach to improving AI models: using a modified version of “Battleship” to teach them to ask sharper questions. Their results show how smaller models could outperform larger systems at far lower cost.

The study, conducted by MIT CSAIL and Harvard SEAS researchers, used the game as a testing ground for a broader AI challenge, examining whether language models can investigate uncertain situations by asking useful questions rather than simply responding to prompts.

Smaller AI models improve by asking sharper questions

The researchers created a “Collaborative Battleship” game in which one participant, called the captain, asks natural-language questions about where hidden ships may be located, while a second participant, the spotter, answers in real time.

The team first had more than 40 humans play the game, using their questions and yes-or-no answers to build a dataset called BattleshipQA. They then tested large language models, including GPT-5, and smaller systems such as Llama 4 Scout.

Without additional training, top models finished the game in fewer turns than human players, but smaller AI models struggled to ask rational and useful questions.

The researchers found that the weakness was not only a matter of model size, but also of how effectively the systems explored possible answers.

Llama 4 Scout jumps from 8% to 82%

To improve performance, the researchers gave the models a Monte Carlo inference strategy, a method that helps weigh different possible locations for hidden ships as new answers come in.
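As a rough illustration of what weighing possible ship locations can look like, the sketch below samples random boards, keeps only those that agree with the answers seen so far, and counts how often each cell is occupied. It is not the researchers' implementation: the 5x5 board, the two-ship fleet, the rejection-sampling approach and every function name are assumptions made for this example.

```python
import random

SIZE = 5                # assumed board size (the article does not give one)
SHIP_LENGTHS = (3, 2)   # assumed fleet, for illustration only


def random_board(rng):
    """Place each ship at a random non-overlapping position; return occupied cells."""
    occupied = set()
    for length in SHIP_LENGTHS:
        while True:
            horizontal = rng.random() < 0.5
            row = rng.randrange(SIZE if horizontal else SIZE - length + 1)
            col = rng.randrange(SIZE - length + 1 if horizontal else SIZE)
            cells = {
                (row, col + i) if horizontal else (row + i, col)
                for i in range(length)
            }
            if not cells & occupied:
                occupied |= cells
                break
    return occupied


def sample_consistent_boards(hits, misses, n_attempts=5000, seed=0):
    """Rejection sampling: draw random boards, keep those matching the evidence."""
    rng = random.Random(seed)
    boards = []
    for _ in range(n_attempts):
        board = random_board(rng)
        # A board is consistent if it covers every hit and none of the misses.
        if hits <= board and not (misses & board):
            boards.append(board)
    return boards


def cell_probabilities(boards):
    """Estimate, for each cell, the fraction of surviving boards that occupy it."""
    counts = {}
    for board in boards:
        for cell in board:
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: n / len(boards) for cell, n in counts.items()}


# Usage: one known hit at (0, 0) and one known miss at (4, 4).
boards = sample_consistent_boards(hits={(0, 0)}, misses={(4, 4)})
probs = cell_probabilities(boards)
# Most promising cell that has not been revealed yet.
best_guess = max((c for c in probs if c != (0, 0)), key=probs.get)
```

Each new answer simply shrinks the set of surviving boards, so the estimates sharpen as the game goes on.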

The change produced one of the study’s most striking results. Llama 4 Scout, a smaller language model, initially beat humans only 8% of the time. After the inference strategy was added, its win rate rose to 82%, allowing it to outperform GPT-5 in the game while operating at about 1% of the cost.

“Today’s language models are primarily optimized to answer complex queries, but it’s less clear whether they learn to ask good questions for themselves,” said Gabriel Grand, an MIT PhD student and CSAIL researcher who led the work.

“Our work shows that asking informative questions depends on the ability to predict and simulate the world,” he added.
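One common way to make "informative" concrete is expected information gain: simulate the answer to a candidate question on every board that is still possible and prefer the question whose answer is hardest to predict. The sketch below assumes yes-or-no answers that are always correct and boards that are all equally likely; the four toy boards and the function names are hypothetical, and the article does not say this is the exact measure the researchers used.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(question, boards):
    """Bits learned, on average, from a reliable yes/no question.

    With equally likely boards and noiseless answers, the expected gain
    equals the entropy of the answer itself.
    """
    p_yes = sum(1 for board in boards if question(board)) / len(boards)
    return binary_entropy(p_yes)


# Four hypothetical boards, each a set of (row, col) cells holding a ship.
candidate_boards = [
    {(0, 0), (0, 1)},
    {(0, 2), (0, 3)},
    {(1, 0), (2, 0)},
    {(2, 2), (2, 3)},
]

# Candidate questions, written as functions that simulate the answer on a board.
questions = {
    "Is there a ship in row 0?": lambda b: any(r == 0 for r, _ in b),
    "Is there a ship at (0, 0)?": lambda b: (0, 0) in b,
    "Is there a ship at (3, 3)?": lambda b: (3, 3) in b,
}

gains = {
    text: expected_information_gain(q, candidate_boards)
    for text, q in questions.items()
}
best_question = max(gains, key=gains.get)
```

In this toy setup the row question splits the boards evenly and scores highest, while a question whose answer is the same on every board scores zero.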

Code helps AI answer more accurately

The researchers also improved how AI systems handled the spotter role by converting natural-language questions into Python commands, giving the models clearer instructions for checking whether a ship was located in a specific area.
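To give a sense of what such a translation might look like, the sketch below pairs two natural-language questions with small Python checks that run against the true board. The article does not show the generated code, so the board encoding ("W" for water, a letter per ship), the helper names and the example questions are all assumptions; the checks here are written by hand as stand-ins for what a model would produce.

```python
# Hypothetical 4x4 board: "W" is water, any other letter is a ship segment.
BOARD = [
    "WWRW",
    "WWRW",
    "BBWW",
    "WWWW",
]


def row_index(letter):
    """Map a row label such as 'A' or 'c' to a zero-based index."""
    return ord(letter.upper()) - ord("A")


# "Is there a ship in row C?"
def ship_in_row(board, row_letter):
    return any(cell != "W" for cell in board[row_index(row_letter)])


# "Is the red ship vertical?"
def ship_is_vertical(board, ship):
    # Collect every column the ship touches; a vertical ship touches exactly one.
    cols = {c for row in board for c, cell in enumerate(row) if cell == ship}
    return len(cols) == 1


def answer(check, *args):
    """Run a check against the true board and phrase the result as the spotter would."""
    return "yes" if check(BOARD, *args) else "no"


reply_row_c = answer(ship_in_row, "C")
reply_red_vertical = answer(ship_is_vertical, "R")
```

Because the answer comes from executing the check on the board rather than from the model reading the grid, the spotter's reply is only as wrong as the translation of the question.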

The method lifted answer accuracy by 15% on average, with GPT-4o-mini posting a nearly 30% performance boost and Claude 4 Opus improving by about eight points.

The team later tested the method on “Guess Who?”, where Llama 4 Scout’s success rate climbed from 30% to more than 72%, while GPT-4o rose from 62% to 90%.

Findings point to cheaper path for stronger AI agents

The findings suggest that future AI agents may become more useful in research-heavy tasks, including scientific discovery, medical diagnosis, coding and mathematics, by improving how they search for information.

The researchers cautioned that the game remains a simplified setting, but said the results point to a cheaper path for building capable AI agents by improving reasoning and exploration instead of relying only on larger models.
