Data Understanding

Collect initial data

https://github.com/fchollet/ARC-AGI

There are 400 training tasks and 400 evaluation tasks. The evaluation tasks are said to be more difficult than the training tasks.

The test set is hidden; it contains 100 new, unseen tasks.

The tasks are stored in JSON format.
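As a minimal sketch of how a task can be loaded (assuming the fchollet/ARC-AGI repository is cloned locally, with task files under data/training and data/evaluation; the file name below is just an example): each JSON file holds a "train" list of demonstration pairs and a "test" list of test pairs, where each pair is an input grid and an output grid, and a grid is a list of rows of integers 0–9.

```python
import json
from pathlib import Path

# Assumed local path to a cloned fchollet/ARC-AGI repo; adjust as needed.
task_path = Path("ARC-AGI/data/training/0a938d79.json")  # example task file

with task_path.open() as f:
    task = json.load(f)

# "train" holds the demonstration pairs, "test" the pair(s) to solve.
for pair in task["train"]:
    grid_in, grid_out = pair["input"], pair["output"]  # lists of rows, cell values 0-9
    print(f"{len(grid_in)}x{len(grid_in[0])} -> {len(grid_out)}x{len(grid_out[0])}")
```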

External data

There are some variations of the ARC dataset:

  • ConceptARC is a new, publicly available benchmark in the ARC domain that systematically assesses abstraction and generalization abilities on many basic spatial and semantic concepts. It differs from the original ARC dataset in that it is organized around "concept groups": sets of problems that focus on a specific concept and vary in complexity and level of abstraction. It seems to be easier than the original ARC benchmark.
  • Mini-ARC, a 5×5 compact version of ARC, was generated manually to maintain the original's level of difficulty.
  • 1D-ARC: a simpler version of ARC with only one-dimensional tasks.
  • Sort-of-ARC shares ARC's input space but presents simpler problems: 20×20 images containing three distinct 3×3 objects. I could only find the paper, not the dataset.
  • RE-ARC (Reverse-Engineering the Abstraction and Reasoning Corpus) by Michael Hodel, a member of the MindsAI team: https://github.com/michaelhodel/re-arc
  • MC-LARC: text descriptions for the ARC training set.
  • arc-generative-DSL-infinite-data: Jack Cole's repo, "Slowly building a collection of infinite riddle generators for benchmarking data-hungry methods".
  • Abstract Reasoning Challenge: community resources.
  • arc-dataset-collection: multiple datasets for ARC (Abstraction and Reasoning Corpus).
  • A Vercel app to create new tasks.

Describe data

Explore data

Verify data quality

Amount of data