How Combinatorial Reasoning Enables Small Language Models (SLMs) to Develop Specialized Reasoning

Written by: Maddy Higgins

Published: June 2026

Old reasoning methods

A few years ago, LLMs could post strong scores on demanding reasoning benchmarks only with labor-intensive prompt engineering. For example, researchers found that a model reasoned significantly better when its prompt included a few worked examples of the problem, especially with the reasoning written out step by step. This became known as “chain-of-thought” prompting.

However, having humans write out detailed examples is time-consuming and hard to scale. For every new type of task, someone has to craft good demonstrations and step-by-step explanations. If the example-generation process could be automated, a model could build strong reasoning prompts on its own, without hand-written input. Combinatorial Reasoning was developed to address this issue.

The core idea

A language model is fairly good at generating potential chains of thought for a given prompt. It is less reliable at choosing the right pieces to build the best possible reasoning process. Combinatorial Reasoning, developed by the team at Icosa Computing in collaboration with researchers at NASA, seeks to solve this problem.

Here is a broad outline of how the method operates. First, it asks the model the same question many times. Each time, the model is prompted to produce a chain of thought along with its answer, expressing each step of the chain as a one-sentence “reason.” Hundreds of runs produce a large pool of reasons. Many are essentially duplicates that appear across different responses, so the method uses natural language processing to match similar reasons to each other and reduce the pool to a smaller set of “distinct” reasons.
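
The grouping step can be illustrated with a minimal sketch. It assumes a simple word-overlap (Jaccard) score as a stand-in for whatever natural language processing the original work used. The function names, the 0.6 threshold, and the sample sentences are all hypothetical.

```python
import re

def tokens(text):
    """Return the set of lowercase words and numbers in a reason."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a stand-in for a real NLP model."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def distinct_reasons(samples, threshold=0.6):
    """Collapse near-duplicate reasons across many model responses.

    samples: one list of reason strings per response.
    Returns (representatives, counts), where counts[i] is how many
    reasons were matched to representatives[i].
    """
    reps, counts = [], []
    for reasons in samples:
        for reason in reasons:
            for i, rep in enumerate(reps):
                if jaccard(reason, rep) >= threshold:
                    counts[i] += 1
                    break
            else:
                # No existing representative is close enough: new distinct reason.
                reps.append(reason)
                counts.append(1)
    return reps, counts

# Hypothetical responses to the same question, asked three times.
samples = [
    ["The train travels 60 miles in one hour.", "Two hours means 120 miles."],
    ["The train travels 60 miles in an hour.", "Distance equals speed times time."],
    ["Distance equals speed times time.", "Two hours means 120 miles."],
]
reps, counts = distinct_reasons(samples)
for rep, count in zip(reps, counts):
    print(count, rep)
```

A real pipeline would more likely compare sentence embeddings than raw word overlap, since two reasons can say the same thing with few shared words. The greedy grouping shown here is only one of several reasonable choices.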

To determine which of these distinct reasons are the “best,” the system extracts a few data points for each reason that indicate its importance. For example, a reason counts as more important if it appears frequently (including when it is framed in different terms) and if it is more similar to other reasons in the set. Using these data to find the right subset of reasons is where Combinatorial Reasoning comes in.
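
As a rough illustration of those two signals, the sketch below computes a frequency share and an average similarity for each distinct reason. The function name, the input layout, and the numbers are assumptions made for this example; the original work may define and weight these quantities differently.

```python
def importance_signals(counts, similarity, n_samples):
    """Compute two illustrative importance signals per distinct reason.

    counts[i]: number of responses in which reason i appeared.
    similarity[i][j]: pairwise similarity between reasons i and j, in [0, 1].
    n_samples: total number of responses collected.
    """
    n = len(counts)
    signals = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        signals.append({
            "frequency": counts[i] / n_samples,
            "mean_similarity": sum(others) / len(others) if others else 0.0,
        })
    return signals

# Hypothetical data: three distinct reasons collected from ten responses.
counts = [8, 3, 1]
similarity = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
signals = importance_signals(counts, similarity, n_samples=10)
for s in signals:
    print(s)
```

Numbers like these become the inputs to the selection step: per-reason values say how much each reason matters on its own, and pairwise values say how reasons relate to each other.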

“Combinatorial” mathematics

Out of the set of distinct reasons, which subset should go into the final prompt? You want the combination that best supports the correct answer: reasons that are reliable, that reinforce each other, and that don't contradict or simply repeat one another.

Here's the difficulty. With just 50 distinct reasons, the number of possible subsets you could choose is 2 raised to the 50th power, more than a quadrillion combinations. You can't try them all. And real cases are larger: the original work dealt with optimization problems involving a few hundred reasons, where the number of combinations is far greater still. This is a combinatorial optimization problem: choosing the best option out of a finite but exponentially large set.
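
The arithmetic is easy to check. Each reason is either in the subset or out of it, so n reasons give 2 to the power n subsets. The figure of 300 reasons below is an assumed stand-in for “a few hundred.”

```python
def subset_count(n_reasons):
    """Number of possible subsets of n_reasons items (each is in or out)."""
    return 2 ** n_reasons

print(subset_count(50))             # 1125899906842624, just over a quadrillion
print(len(str(subset_count(300))))  # number of digits in the count for 300 reasons
```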

Combinatorial optimization problems are common. A familiar example is a delivery driver looking for the shortest route through twenty cities, since there are far too many possible routes to check one by one. Choosing a set of investments for a portfolio is another. So is scheduling shifts, packing a truck, or routing a circuit board. These problems are hard, and over decades researchers have developed specialized methods to find very good answers quickly without checking every possibility.

Combinatorial Reasoning uses those methods. It translates the reason-selection problem into a standard mathematical form that these solvers understand, then hands it off to one of them. The solvers used here are sometimes called “Ising machines,” and they draw on an idea from physics: the way a physical system settles into a low-energy state as it cools. The solver treats a good combination of reasons as a low-energy state and works toward it.
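
To make the cooling idea concrete, here is a minimal sketch using simulated annealing, a classical technique built on the same analogy. It is not the solver used in the original work. It assumes the selection is written as a quadratic energy over binary choices (often called a QUBO), where `h[i]` rewards or penalizes picking reason i and `J[(i, j)]` rewards or penalizes picking i and j together. All coefficients and parameter values are invented for illustration.

```python
import math
import random

def energy(x, h, J):
    """Energy of a selection x (list of 0/1). Lower is better."""
    e = sum(h[i] * x[i] for i in range(len(x)))
    for (i, j), weight in J.items():
        e += weight * x[i] * x[j]
    return e

def anneal(h, J, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing with single-bit flips and geometric cooling."""
    rng = random.Random(seed)
    n = len(h)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = energy(x, h, J)
    best_x, best_e = x[:], e
    for step in range(steps):
        # Temperature falls smoothly from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose adding or dropping reason i
        e_new = energy(x, h, J)
        # Always accept improvements; accept worse moves less often as t falls.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            x[i] ^= 1  # reject the move and restore the previous selection
    return best_x, best_e

# Hypothetical coefficients for four distinct reasons.
h = [-1.0, -0.9, -0.2, 0.5]        # negative values favor selecting a reason
J = {(0, 1): 1.5, (0, 2): -0.3}    # 0 and 1 are redundant; 0 and 2 reinforce
best_x, best_e = anneal(h, J)
print(best_x, best_e)
```

In this toy instance, checking all 16 subsets by hand shows that the lowest energy is -1.5, reached by selecting reasons 0 and 2. Reason 1 is strong on its own but is penalized for overlapping with reason 0. With hundreds of reasons that exhaustive check is no longer possible, which is why heuristic solvers are used.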

Once the solver settles on a combination, those reasons are assembled into a final prompt. The model sees the question once more, now paired with a selected chain of its own reasoning, and produces an answer. No human wrote an example; the prompt was built automatically.
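
The assembly step might look something like the sketch below. The prompt template, the function name, and the example data are hypothetical; the source does not specify the exact wording of the final prompt.

```python
def build_prompt(question, reasons, selection):
    """Build a final prompt from the reasons the solver selected.

    reasons: list of distinct reason strings.
    selection: list of 0/1 flags of the same length, 1 meaning "keep".
    """
    chosen = [r for r, keep in zip(reasons, selection) if keep]
    lines = ["Question: " + question, "", "Reasons to consider:"]
    lines += [f"{k}. {r}" for k, r in enumerate(chosen, start=1)]
    lines += ["", "Using the reasons above, give the final answer."]
    return "\n".join(lines)

# Hypothetical inputs: three distinct reasons, two of them selected.
question = "How far does a train moving at 60 mph travel in two hours?"
reasons = [
    "Distance equals speed times time.",
    "Trains run on rails.",
    "Two hours at 60 miles per hour is 120 miles.",
]
prompt = build_prompt(question, reasons, [1, 0, 1])
print(prompt)
```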

Frontier model reasoning capability

When this research first appeared, getting a model to reason well without hand-built prompts was a struggle, and “prompt engineering,” the idea that building strong prompts was itself a valuable skill, received far more emphasis. Today's frontier models are much stronger reasoners, with or without carefully built prompts. The technique still has strong potential, but its uses and value proposition have shifted.

Frontier models reason well, but they are expensive, general-purpose, and run by an outside company. You rent them through an API, your data travels to an outside cloud, and you pay a premium for skills that your company doesn't need. What a company typically needs instead is a model that is reliable in a specific area: reviewing contracts, analyzing financial filings, triaging support tickets, or evaluating investment ideas.

Application to Icosa's Small Language Models

This is why Icosa Computing is integrating Combinatorial Reasoning into the fine-tuning of local AI models for specific domains. Local AI models are cheaper to run, can operate on your own hardware, and keep your data under your control. Their limitation is that, on their own, they tend to reason less reliably than the large models. Combinatorial Reasoning can help narrow that gap. Because the technique produces structured, higher-quality reasoning, it can generate the kind of worked reasoning data used to fine-tune a small model, which trains the model to reason more effectively within a specific domain.

This is at the core of Icosa's platform: building and deploying local AI models that run on a company's own devices, with an emphasis on privacy, ownership, and lower operating costs. Our platform lets a company fine-tune a local model for a particular use case. By integrating Combinatorial Reasoning, local models built by Icosa have reasoning capabilities tuned to specific domains at a level competitive with frontier models, at lower cost and with greater privacy.

Sources

arxiv.org/abs/2407.00071