
Iteration 15. The path forward: Search & Learn

17-06-2025

Goal

Define the solution I want to implement in the following months.

Search and Learn

Search and learning are the two mechanisms for adapting to novelty, and I bet that to beat ARC we need both. Thus my proposal for ARC25 is a system that explores the program space of solutions and learns during the exploration.

The core of the system would be an LLM that uses a DSL to write Python code to solve the ARC tasks. The LLM will be used to search for solutions to each task, and the model itself will learn to guide the search process, balancing the depth/breadth trade-off. Generated programs that do not solve their task will be relabeled with hindsight, and the model will be trained on them to adjust its prior beliefs and search more efficiently.
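
As a rough sketch of what a single search step could look like: prompt the model for a candidate program, execute it on the training pairs, and check for an exact match. The `call_llm` helper, the `solve(grid)` convention, and the task format (a list of (input, output) grids as nested lists) are illustrative assumptions, not a fixed interface.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call."""
    raise NotImplementedError

def run_candidate(code: str, grid):
    """Execute generated code that defines `solve(grid)` and apply it."""
    namespace = {}
    exec(code, namespace)  # in practice this needs sandboxing and timeouts
    return namespace["solve"](grid)

def search_step(task_prompt: str, train_pairs):
    """One search step: generate a program and verify it on the train pairs."""
    code = call_llm(task_prompt)
    try:
        solved = all(run_candidate(code, x) == y for x, y in train_pairs)
    except Exception:  # buggy candidates count as failures
        solved = False
    return code, solved
```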

As a human, I remember everything I have tried (at least as a high-level summary) when solving a new task, and I know when to change direction after reaching a dead end. The search history guides the next steps and avoids repeating previous failures. This was one of the weaknesses of my previous iterations, where all the generated solutions were independent of each other.

When searching for a solution to an ARC task there are three high-level actions we can take (sketched in code after the list):

  1. Generate a new function
  2. Refine an existing function
  3. Combine multiple existing functions into a new one
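
A minimal sketch of these three actions as prompt builders over a pool of previously generated functions. The prompt wording and the `pool` structure (a list of code strings) are assumptions for illustration.

```python
import random

# Hypothetical prompt builders for the three search actions. `pool` is a
# list of previously generated functions as code strings; "refine" and
# "combine" assume the caller only picks them when the pool is non-empty.

def new_function_prompt(task: str, pool: list[str]) -> str:
    return f"Write a new function that solves this task:\n{task}"

def refine_prompt(task: str, pool: list[str]) -> str:
    target = random.choice(pool)
    return f"Improve this attempt so that it solves the task:\n{target}\nTask:\n{task}"

def combine_prompt(task: str, pool: list[str]) -> str:
    parts = random.sample(pool, k=min(2, len(pool)))
    return "Combine these attempts into one function:\n" + "\n".join(parts) + f"\nTask:\n{task}"

ACTIONS = {
    "generate": new_function_prompt,
    "refine": refine_prompt,
    "combine": combine_prompt,
}
```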

Search can be visualized as a graph, as in the Sakana blog post.


During training we can sample the actions randomly or exhaustively, and if we find a solution we can label the actions retrospectively. That would allow training a model that acquires the taste to guide the search intelligently.
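
One possible sketch of that retrospective labeling, assuming each node of the search graph stores the action that created it and a pointer to its parent: every action on the path to a solving node gets a positive label that can supervise the search policy.

```python
# Sketch of retrospective action labeling over the search graph.

class Node:
    def __init__(self, action, parent=None, solved=False):
        self.action = action  # "generate", "refine" or "combine"
        self.parent = parent
        self.solved = solved

def label_trace(solution_node):
    """Walk back from a node that solved the task; every action on the
    path gets a positive label, usable to supervise the search policy."""
    labeled = []
    node = solution_node
    while node is not None:
        labeled.append((node.action, 1))
        node = node.parent
    return list(reversed(labeled))
```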

When generating new functions we should give the previously generated functions as context to encourage novelty. This could also be trained, with some novelty loss function. Another way to foster diversity would be to exclude some of the DSL functions from the prompt (since typically we would include the signatures of the available DSL functions there).
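
For illustration, a prompt builder along these lines could list the DSL signatures but randomly drop a fraction of them; the `dsl_functions` list and the drop rate are hypothetical.

```python
import inspect
import random

# Sketch: include DSL function signatures in the prompt, but randomly
# exclude a fraction of them to push the model toward different programs.
# `dsl_functions` is a hypothetical list of the DSL's callables.

def build_prompt(task_text, dsl_functions, drop_fraction=0.3, rng=random):
    kept = [f for f in dsl_functions if rng.random() > drop_fraction]
    signatures = "\n".join(
        f"{f.__name__}{inspect.signature(f)}" for f in kept
    )
    return (
        "Available DSL functions:\n"
        f"{signatures}\n\n"
        f"Task:\n{task_text}\n"
        "Write a Python function `solve(grid)` using only these functions."
    )
```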

Learning

All the predicted functions can be treated as new tasks via hindsight relabeling. In previous iterations I have already seen that this enables broader generalization on toy tasks, and the SOAR paper measured a small improvement when using this technique.
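
A minimal sketch of the relabeling idea: a program that fails the original task still solves some task, namely the one whose outputs are whatever the program produces on the inputs. The `run_candidate` executor from the earlier sketch is assumed.

```python
# Sketch of hindsight relabeling: run the failed program on the task
# inputs and use its outputs as the targets of a new synthetic task.

def hindsight_relabel(code, inputs, run_candidate):
    new_pairs = []
    for x in inputs:
        try:
            new_pairs.append((x, run_candidate(code, x)))
        except Exception:
            return None  # a crashing program defines no new task
    return {"train": new_pairs, "program": code}
```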

Alternating cycles

One possible implementation would alternate between search and learning phases in cycles. For example, the search phase could sample 32 new functions, then the system would switch to training on those 32 new tasks.
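
A sketch of that loop, with `search_fn`, `relabel_fn` and `finetune_fn` standing in for the components sketched above; the cycle count and batch size are arbitrary illustrative defaults.

```python
# Sketch of the alternating search/learn loop.

def search_and_learn(model, tasks, search_fn, relabel_fn, finetune_fn,
                     cycles=10, batch_size=32):
    for _ in range(cycles):
        # Search phase: sample a batch of candidate programs.
        candidates = search_fn(model, tasks, n=batch_size)
        # Relabel failed candidates with hindsight into new tasks.
        new_tasks = [relabel_fn(c) for c in candidates if not c["solved"]]
        # Learning phase: fine-tune on the relabeled tasks.
        model = finetune_fn(model, new_tasks)
    return model
```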

Other details

Data augmentation

In most cases each task variation would require a different function, so when searching we should treat each variation generated with data augmentation as a different task. The value of data augmentation during search would be added diversity, but since compute is limited, whether it is beneficial should be investigated.

Data augmentation could also be used for evaluation and to generate more training data.
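
As an illustration, the geometric augmentations below (rotations and flips) each produce a variant that is treated as a separate task; the exact set of transforms is an assumption.

```python
import numpy as np

# Geometric augmentations of a task. Each variant generally needs a
# different program, so each one is treated as a separate task in search.

def augment(pairs):
    transforms = {
        "rot90": lambda g: np.rot90(g, 1),
        "rot180": lambda g: np.rot90(g, 2),
        "flip_h": np.fliplr,
        "flip_v": np.flipud,
    }
    return {
        name: [(t(np.asarray(x)), t(np.asarray(y))) for x, y in pairs]
        for name, t in transforms.items()
    }
```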

Continuous metric

Evolutionary methods require a continuous metric. ARC scoring is binary in essence, but maybe we can define continuous proxy metrics that allow better exploration of the search space. This should be investigated using the search traces of the model.
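
One candidate proxy, sketched below, is the fraction of matching pixels, with a shape mismatch scored as zero; whether this correlates with actual search progress is exactly what would need to be checked against the search traces.

```python
import numpy as np

# A continuous proxy for the binary ARC score: fraction of matching
# pixels, with a shape mismatch scored as a total miss.

def pixel_score(pred, target) -> float:
    pred, target = np.asarray(pred), np.asarray(target)
    if pred.shape != target.shape:
        return 0.0
    return float((pred == target).mean())
```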

Data generation

I believe the best data generation mechanism is the one used in the transduction and induction paper. It requires implementing a DSL and writing solutions and generators for the ARC tasks, but once we have that data we can use frontier LLMs to generate an arbitrary number of new tasks.

The other source of data would be the data generated during evaluation of the system.

Summary

Generate Python code with an LLM to solve ARC tasks. The system uses an intelligent search process and learns from its mistakes.

Next steps

  • Start by using the DSL defined in the BARC repo. Do search with base models
  • Fine-tune on BARC and repeat search experiments. Analyze unsolved tasks