Skip to content

Data Understanding

Collect initial data

Test data

There are 100 problems in the test set, 50 in the public and 50 in the private set.

The answer to each problem is a non-negative integer, which you should report modulo 1000.

To receive the super prize I should aim to achieve a score of at least 94/100, current leaderboard score is 20/50 so it might be possible.

Train data

This is the train data, 10 hard mathematical problems. We can see that the problem description is short, in the longest case around 100 tokens.

id problem answer
229ee8 Let \(k, l > 0\) be parameters. The parabola \(y = kx^2 - 2kx + l\) intersects the line \(y = 4\) at two points \(A\) and \(B\). These points are distance 6 apart. What is the sum of the squares of the distances from \(A\) and \(B\) to the origin? 52
246d26 Each of the three-digits numbers \(111\) to \(999\) is coloured blue or yellow in such a way that the sum of any two (not necessarily different) yellow numbers is equal to a blue number. What is the maximum possible number of yellow numbers there can be? 250
2fc4ad Let the sparkle operation on positive integer \(n\) consist of calculating the sum of the digits of \(n\) and taking its factorial, e.g. the sparkle of 13 is \(4! = 24\). A robot starts with a positive integer on a blackboard, then after each second for the rest of eternity, replaces the number on the board with its sparkle. For some special numbers, if they're the first number, then eventually every number that appears will be less than 6. How many such special numbers are there with at most 36 digits? 702
430b63 What is the minimum value of \(5x^2+5y^2-8xy\) when \(x\) and \(y\) range over all real numbers such that \(\|x-2y\| + \|y-2x\| = 40\)? 800
5277ed There exists a unique increasing geometric sequence of five 2-digit positive integers. What is their sum? 211
739bc9 For how many positive integers \(m\) does the equation \(\vert \vert x-1 \vert -2 \vert=\frac{m}{100}\) have \(4\) distinct solutions? 199
82e2a0 Suppose that we roll four 6-sided fair dice with faces numbered 1 to~6. Let \(a/b\) be the probability that the highest roll is a 5, where \(a\) and \(b\) are relatively prime positive integers. Find \(a + b\). 185
8ee6f3 The points \(\left(x, y\right)\) satisfying \(((\vert x + y \vert - 10)^2 + ( \vert x - y \vert - 10)^2)((\vert x \vert - 8)^2 + ( \vert y \vert - 8)^2) = 0\) enclose a convex polygon. What is the area of this convex polygon? 320
bedda4 Let \(ABCD\) be a unit square. Let \(P\) be the point on \(AB\) such that \(\|AP\| = 1/{20}\) and let \(Q\) be the point on \(AD\) such that \(\|AQ\| = 1/{24}\). The lines \(DP\) and \(BQ\) divide the square into four regions. Find the ratio between the areas of the largest region and the smallest region. 480
d7e9c9 A function \(f: \mathbb N \to \mathbb N\) satisfies the following two conditions for all positive integers \(n\):\(f(f(f(n)))=8n-7\) and \(f(2n)=2f(n)+1\). Calculate \(f(100)\). 199

External data

External data is going to be crucial in this challenge since the training data is tiny.

AMC 12 Problems and Solutions

https://artofproblemsolving.com/wiki/index.php/AMC_12_Problems_and_Solutions

AIME Problems and Solutions

https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions

Other

  • https://www.kaggle.com/competitions/ai-mathematical-olympiad-prize/discussion/488473
  • https://www.kaggle.com/competitions/ai-mathematical-olympiad-prize/discussion/492945
  • https://www.kaggle.com/datasets/pedromoya/math-problems-solved-dataset-andersonbcdefg-hf/data
  • MATH dataset https://github.com/hendrycks/math
  • https://github.com/kipok/nemo-skills
  • https://huggingface.kxxx.link/datasets/pharaouk/math-orca-arch
  • OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
  • https://github.com/OpenBMB/OlympiadBench
  • https://www.kaggle.com/datasets/pedromoya/math-and-python-code-datasets-hf-collection

Describe data

All problems are text-only with mathematical notation in LaTeX. Please see the AIMO Prize - Note on Language and Notation.pdf handout for details on the notational conventions used. Although some problems may involve geometry, diagrams are not used in any problem.

Verify data quality

Since the data is just 110 problems I assume they are all correct. The train set has been solved manually in the forum thus I assume all the problems are correct.


Last update: 2024-04-28
Back to top