Terac x Berkeley AI: Our Hackathon Challenge

We're excited to partner with the Berkeley AI Hackathon. This year, we're sponsoring a track built around the hardest part of improving a model: getting high-quality human data.

Over the past year, we've seen a clear trend of teams training and shipping their own AI models at hackathons. Traditionally, teams have leaned on public datasets from Kaggle and Hugging Face. This year, we want to raise the bar: collect your own human data, live, and prove it moved the needle.

About Terac

Terac makes human labor accessible on demand through a simple API. You tell us what job needs to be done and what kind of person you need, and we handle recruitment, screening, verification, and payouts. We power frontier research, and we run human-data and AI-training programs for Fortune 100 companies.

We've raised $9M from Emergence, SignalFire, Audacious, and Z Fellows. We believe that as AI agents start running companies, the bottleneck won't be code; it'll be access to the right human at the right time.

We're hiring engineers for in-person roles in San Francisco. If you build something great this weekend, we want to talk.

The Challenge

This track is about making a model measurably better with human data that you collect yourself, through Terac, during the hackathon. You don't have to fine-tune to win. What matters is that real human judgment improved your system and that you can show it.

Here's the shape of it:

Build an annotation environment. Make a simple app (a Vercel app works great) where an AI generates tasks or outputs and a human labels, rates, ranks, or compares them.
Call the Terac API to bring in annotators. Launch your task on Terac and we'll get real people to complete it. You focus on the environment, not on recruiting or incentives; that part is on us.
Turn that human data into a better model. Use what you collect however fits your project:
- Fine-tuning (SFT) on human-labeled or human-corrected outputs
- Preference ranking and RLHF-style training from human comparisons (DPO, reward models)
- Evals where human-labeled data becomes the benchmark you measure against
- Prompt, routing, or retrieval changes validated by human judgment
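If you take the preference route, most of the plumbing is turning raw comparisons into training records. Below is a minimal Python sketch. It assumes your annotation app exports each judgment as a prompt, two outputs, and a choice of "a", "b", or "tie". The `Comparison` record is a hypothetical schema of your own, not a Terac format. The sketch emits the prompt/chosen/rejected layout that preference trainers such as Hugging Face TRL's `DPOTrainer` accept, one JSON object per line.

```python
import json
from dataclasses import dataclass


@dataclass
class Comparison:
    """One human judgment exported from your own app (hypothetical schema)."""
    prompt: str
    output_a: str
    output_b: str
    choice: str  # "a", "b", or "tie"


def to_preference_pairs(comparisons: list[Comparison]) -> list[dict[str, str]]:
    """Turn each decisive comparison into a prompt/chosen/rejected record."""
    pairs = []
    for c in comparisons:
        if c.choice == "tie":
            continue  # a tie carries no preference signal
        if c.choice == "a":
            chosen, rejected = c.output_a, c.output_b
        elif c.choice == "b":
            chosen, rejected = c.output_b, c.output_a
        else:
            raise ValueError(f"unexpected choice: {c.choice!r}")
        pairs.append({"prompt": c.prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def to_jsonl(pairs: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


judgments = [
    Comparison("Summarize the memo", "Summary A", "Summary B", "b"),
    Comparison("Summarize the memo", "Summary A", "Summary C", "tie"),
]
print(to_jsonl(to_preference_pairs(judgments)))
# {"prompt": "Summarize the memo", "chosen": "Summary B", "rejected": "Summary A"}
```

If you randomize which output appears as A to counter position bias, map each choice back to the underlying outputs before building pairs.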

Then show a clear before and after: the base model versus your improved one, ideally judged by a fresh round of humans on Terac.

Example Project Ideas

A few directions to spark ideas. You're not limited to these:

Comparison arena. A model generates two outputs (SVGs, summaries, code snippets, images); humans pick the better one. Train a reward model or run DPO on the preferences.
Rubric scoring. Humans score model answers against a rubric (helpfulness, factuality, tone). Use the scores as an eval, then fine-tune to lift the weakest dimension.
Correction loop. Humans edit or rewrite model outputs; fine-tune on the corrected pairs to teach the model the better behavior.
Label and classify. Humans label a tricky dataset (intent, sentiment, safety, spam); train a classifier and measure the accuracy gain over the base model.
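For the rubric-scoring idea, finding the dimension to target takes only a small aggregation. This sketch assumes each human rating is a mapping from rubric dimension to a 1-5 score. That is a hypothetical schema, so use whatever your app records. The code picks the dimension with the lowest average score. Running the same functions on ratings of the improved model gives a per-dimension before-and-after.

```python
from collections import defaultdict
from statistics import mean


def dimension_means(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average each rubric dimension across all human ratings."""
    scores: dict[str, list[int]] = defaultdict(list)
    for rating in ratings:
        for dimension, score in rating.items():
            scores[dimension].append(score)
    return {dimension: mean(values) for dimension, values in scores.items()}


def weakest_dimension(ratings: list[dict[str, int]]) -> str:
    """Return the rubric dimension with the lowest average score."""
    means = dimension_means(ratings)
    return min(means, key=means.get)


# Hypothetical 1-5 ratings from three annotators on model answers.
ratings = [
    {"helpfulness": 4, "factuality": 2, "tone": 5},
    {"helpfulness": 5, "factuality": 3, "tone": 4},
    {"helpfulness": 4, "factuality": 2, "tone": 4},
]
print(dimension_means(ratings))
print(weakest_dimension(ratings))  # factuality
```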

The Deliverable

By the end of the hackathon, have:

A working annotation environment that anyone can open and use.
A dataset of real human labels collected through Terac.
A measurable result: base model versus improved model (or base versus improved on your eval), with the numbers and a short before-and-after comparison, ideally from a human eval run on Terac.
A 2- to 3-minute demo walking through the environment, the data you collected, and the improvement.

How You'll Be Judged

Criteria	Weight	What We're Looking For
Model Improvement	40%	Improvement over the base model, credibly shown (ideally a before/after human eval via Terac)
Annotation Environment	35%	Task design, creativity, and UX
Use of Human Data	25%	Smart use of the human data: label quality and how efficiently you get signal within your credit
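One common way to get reliable labels within a fixed credit is to collect a few redundant labels per item. You then keep only the items that annotators agree on. The sketch below assumes three labels per item and a two-thirds agreement threshold. Both are illustrative choices, not Terac defaults, and the item ids are hypothetical.

```python
from collections import Counter


def aggregate_labels(
    labels_by_item: dict[str, list[str]], min_agreement: float = 2 / 3
) -> tuple[dict[str, str], list[str]]:
    """Majority-vote each item's labels and set aside low-agreement items.

    Returns (gold, disputed): gold maps item id -> majority label for items
    whose top label reached min_agreement; disputed lists the other item ids.
    """
    gold: dict[str, str] = {}
    disputed: list[str] = []
    for item_id, labels in labels_by_item.items():
        if not labels:
            disputed.append(item_id)  # no labels collected yet
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            gold[item_id] = label
        else:
            disputed.append(item_id)
    return gold, disputed


labels = {
    "msg-1": ["spam", "spam", "ham"],
    "msg-2": ["ham", "ham", "ham"],
    "msg-3": ["spam", "ham", "unsure"],
}
gold, disputed = aggregate_labels(labels)
print(gold)      # {'msg-1': 'spam', 'msg-2': 'ham'}
print(disputed)  # ['msg-3']
```

Disputed items can be worth inspecting before you spend more credit, because they may point to an ambiguous task or label set.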

Setup and Guidelines

You'll sign in to Terac as a researcher. Send us a Slack message and we'll give your team credits.
Each team gets $250 in Terac credit to spend on annotations.
Annotators come from the general population only; there are no specialist panels (no doctors, lawyers, and so on), which keeps turnaround fast and makes your credit go further.
Set up our MCP server from the Agent MCP page.
Explore the Terac API in our developer docs.

Prizes

1st place: $1,000 cash + interviews at Terac
2nd place: $500 cash + interviews at Terac

Questions?

Reach out to Moritz Kamke at moritz@terac.com with any questions, and we'll see you there.