Mathematics Model Prompt Evaluator Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

Expert mathematicians are invited to author and verify high-quality open-ended prompts for AI model evaluation. In this role, you will craft and review challenging, unambiguous mathematical problems across core subdomains, assessing AI reasoning quality and helping establish rigorous evaluation standards for frontier language models.

Task Types

You will be assigned one of two task types:

  • Authoring Task: Create 5 original, open-ended prompts from your assigned subdomain at varying difficulty levels (undergraduate, advanced undergraduate, or graduate/professional). Prompts should require human judgment to evaluate the quality of the AI's response, such as chain-of-thought reasoning or proof construction.
  • Verification Task: Review 5 authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness. Edit prompts and difficulty ratings where needed.

Mathematics Subdomains Covered

Probability & Statistics, Algebra (including Linear Algebra), Ordinary/Partial Differential Equations & Dynamical Systems, Geometry, Graph Theory, Number Theory.

Key Responsibilities

  • Author clear, unambiguous, open-ended mathematical prompts that elicit evaluable AI responses.
  • Verify prompts are within the scope of the assigned subdomain and correctly rated for difficulty.
  • Ensure all 5 prompts in a task are sufficiently distinct from one another with varying difficulty levels.
  • Apply expert judgment to assess the depth and quality of mathematical reasoning required.
  • Edit prompts and difficulty assignments where standards are not met.

Ideal Qualifications

  • Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field.
  • 2–6 years of professional or research experience in a quantitative field.
  • Strong command of graduate-level mathematical concepts including proof writing, analysis, and formal reasoning.
  • Experience in academic research, mathematical competition design, or quantitative industry roles is a plus.
  • Excellent written English and the ability to craft precise, well-scoped technical questions.

Work Terms

Expected commitment: 10+ hours/week. Asynchronous, fully remote work.

Job Tags

Remote job
