Iteration 32. Analyze model predictions

12-10-2025

Goal

Analyze model predictions to understand the accuracy. Why is the model only solving ~20% of the ARC-AGI-1 evaluation tasks?

Motivation

To improve the model, I need to understand why it does not solve the tasks.

Development

Using predictions from previous experiments, I need to create a notebook that selects the most accurate predictions and visualizes them. I will randomly sample the unsolved tasks to diagnose the problems.
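A minimal sketch of the selection and sampling step, under several assumptions: predictions are stored as a dict from task id to a list of predicted grids, each task has a single target grid (real ARC tasks can have more than one test output), and "most accurate" means highest pixel accuracy. The names `pixel_accuracy`, `best_predictions` and `sample_unsolved` are hypothetical and do not come from the previous experiments.

```python
import random


def pixel_accuracy(pred, target):
    """Fraction of matching cells; 0.0 when the grid shapes differ."""
    same_shape = len(pred) == len(target) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, target)
    )
    if not same_shape:
        return 0.0
    total = sum(len(row) for row in target)
    correct = sum(
        p == t
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    return correct / total if total else 0.0


def best_predictions(predictions, targets):
    """Map each task id to (best accuracy, best predicted grid)."""
    best = {}
    for task_id, preds in predictions.items():
        scored = [(pixel_accuracy(p, targets[task_id]), p) for p in preds]
        # Tasks without predictions get accuracy 0.0 and no grid.
        best[task_id] = max(scored, key=lambda s: s[0], default=(0.0, None))
    return best


def sample_unsolved(best, n, seed=0):
    """Randomly pick up to n task ids whose best prediction is not perfect."""
    unsolved = sorted(t for t, (acc, _) in best.items() if acc < 1.0)
    return random.Random(seed).sample(unsolved, min(n, len(unsolved)))
```

The sampled task ids can then be passed to whatever plotting code the notebook uses to show the best prediction next to the target grid.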

Results

I have analyzed a random subset of 128 predictions. 16% of the ARC-AGI-1 evaluation tasks were solved.

The plot shows that the model has a good intuition of ARC tasks. Only 20% are completely misunderstood.

But at the same time, only 16% of the tasks are solved when making 128 predictions per task. With 20k predictions the solve rate is 38% according to the paper. However, making so many predictions does not make sense and is not efficient. Making a few independent attempts makes sense to get diversity in the predictions, but not on the order of hundreds or thousands.
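One way to quantify how the solve rate depends on the number of predictions is the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), computed from the n predictions already available per task. This is a sketch and not part of the analysis above: it assumes the number of correct predictions c per task has been counted, and the names `pass_at_k` and `solve_rate_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k predictions, drawn without
    replacement from n predictions of which c are correct, is correct."""
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k incorrect predictions exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def solve_rate_at_k(counts, k):
    """Mean pass@k over tasks; counts maps task id -> (n, c)."""
    return sum(pass_at_k(n, c, k) for n, c in counts.values()) / len(counts)
```

Evaluating `solve_rate_at_k` for small values of k (for example 1, 2, 8) on the existing 128 predictions per task would show how much of the solve rate is already reachable with a few attempts.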

Conclusion

Next steps

TODO

[ ]