Inside the gig economy training the next generation of AI

The new gig economy of AI training has, in the space of a year, become one of the fastest-growing categories of paid human work. The scarcity it answers is not raw labor. It is expertise: senior judgment, taste, taxonomy, the kinds of decisions you only know how to make after a decade of doing the job.

A senior backend engineer in Buenos Aires runs a three-hour debugging session, narrating her reasoning as she narrows down a flaky production issue, and earns more than she would for a full day at her contract rate. A radiologist in Bangalore grades a stack of edge-case scans against a rubric a frontier lab wrote with her in the loop. A litigation associate in Chicago reviews two hundred model-drafted motions, flagging the seven a partner would actually sign her name to.

These are not edge cases. They are the production line behind a new class of AI models, the ones whose remaining headroom is not in pre-training but in post-training: reinforcement learning from expert feedback, expert demonstrations, expert-graded reward models. They are also the latest expansion of what Terac has been building from the start, and they are the clearest preview yet of what the future of skilled labor actually looks like.

The data drought reaches expertise

For three years the field has been quietly running out of the public-internet data the largest models were initially trained on. The most-cited corpora are now closed to commercial training, synthetic-only pipelines have documented failure modes, and the open web is increasingly polluted with model-generated text. Researchers have publicly estimated that the supply of new high-quality text on the open internet will be exhausted before the end of the decade.

But the deeper bottleneck is not text. It is judgment. A frontier model that can write a passable motion needs a senior litigator to tell it which of two near-identical drafts a real judge would prefer, and why. A model that can answer a SQL question needs a staff data engineer to tell it that the answer is correct, idiomatic, and unlikely to lock the table in production. None of that lives on the open web. None of it can be synthesized. It lives in the heads of people who do the work for a living, and it only comes out when those people are paid to put it down.

This is the gap Terac is filling. The same primitives we built for market research (opportunities, submissions, attestations, payouts, AI-moderated review) generalize directly to RL post-training, evaluation, and expert demonstration. The work is paid, distributed, and globally sourced. Experts are paid in dollars, often into mobile-money accounts. Submissions are reviewed within hours, not weeks.

Why expertise is irreducible

Generic crowdwork is not interchangeable with expert work. A junior reviewer can tell you whether a piece of code runs. A senior engineer can tell you whether it is the code a senior engineer would write, and that distinction is the entire game in post-training.

The same is true everywhere the model still falls down. The difference between a good legal brief and one a partner would file. The difference between a treatment plan a resident would propose and the one an attending would actually choose. The difference between a debugging path that finds the bug and a debugging path that teaches you something. These are taste judgments, made fast, by people who have absorbed enough reps to make them without thinking.

Buyers do not want crowds. They want narrow, vetted populations: not "engineers" but "engineers with five years of Postgres in production," not "doctors" but "board-certified radiologists who have read ten thousand head CTs." The pipeline either delivers that, with proof, or it does not deliver anything worth training on.

The data layer for the next generation of AI is not labor. It is expertise, captured under a spec, in a controlled licensing regime, by a paid expert who understood what they were producing and why.

How the pipeline scales

Building an expert pipeline at the scale labs are now asking for is a different problem from running a single high-quality cohort. The constraints are operational. Specs have to be tight enough that experts in twelve time zones produce mutually consistent output. Submissions have to be reviewed within a turnaround that supports a weekly delivery cadence. The output has to be licensed cleanly enough to clear an enterprise legal review. And the supply curve has to keep up: every dataset is a new audience, and the audience does not exist until you go find it.

The pipeline rests on the four product pillars we already run:

Recruiter finds the right people for each opportunity. Whether a buyer needs senior litigators in three U.S. circuits, ICU nurses with sepsis experience, or staff-level Rust engineers shipping in production, the same targeting stack (targeted ads, referrals, outbound calls, network traversal) that sources niche research panels sources niche expert cohorts. Most of the audience for any given lab does not exist on a platform yet. The job is to go find them, qualify them, and bring them in.

Panel is what makes the work durable. Every expert has a context-rich profile, vetted with attestations (government ID, LinkedIn, employer email, IP) that turn a soft claim into a hard filter. "Board-certified," "currently employed at a top-100 law firm," "ships to production at a public company": each of those can be enforced before an expert ever sees the opportunity, instead of caught after the fact.
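To make "a soft claim into a hard filter" concrete, here is a minimal sketch of attestation gating. It is an illustration, not Terac's implementation: the class names, the attestation labels, and the eligible() helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expert:
    name: str
    # Attestation labels already verified for this profile (hypothetical labels).
    attestations: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Opportunity:
    title: str
    required: frozenset  # every label here must be verified before the expert sees it


def eligible(expert: Expert, opp: Opportunity) -> bool:
    """Hard filter: True only if every required attestation is verified."""
    return opp.required <= expert.attestations


experts = [
    Expert("A", frozenset({"government_id", "board_certified"})),
    Expert("B", frozenset({"government_id"})),
]
opp = Opportunity("Head CT grading", frozenset({"government_id", "board_certified"}))

# Only experts who pass the filter are ever shown the opportunity.
visible_to = [e.name for e in experts if eligible(e, opp)]
print(visible_to)
```

The point of the subset check is that eligibility is decided before any work is shown, so an unverified claim never reaches the review stage.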

Moderator is our AI voice agent and the thing that lets us do this without an army of human reviewers. It screens experts before they start work, conducts structured competence interviews, probes long-horizon submissions (multi-hour coding sessions, multi-turn case reviews, written rationales) for the patterns that distinguish genuine reasoning from copy-paste, and re-verifies seniority over time. It is the same agent that conducts UXR interviews. It has just been pointed at a different rubric.

API / MCP is how buyers consume what the pipeline produces. A frontier lab does not want to manage a vendor; it wants programmatic access to a labor primitive. Buyers define the spec, pull submissions through our API or MCP, and pay out only on verified completion. The same surface that lets an agent commission a research interview lets it commission a graded code review, a rubric-bound legal annotation, or a multi-hour expert demonstration.
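As a rough illustration of "pull submissions, pay out only on verified completion," the sketch below models the flow in memory. It makes no network calls and does not reflect Terac's actual API or MCP surface; Submission, pull_verified, payout_due, and the dollar amounts are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    submission_id: str
    opportunity_id: str
    verified: bool    # assumed to be set by review, never by the submitter
    rate_usd: float   # amount owed if the submission is verified (invented figures)


def pull_verified(submissions, opportunity_id):
    """Return only the verified submissions for one opportunity."""
    return [
        s for s in submissions
        if s.opportunity_id == opportunity_id and s.verified
    ]


def payout_due(submissions, opportunity_id):
    """The buyer owes money only for verified completions."""
    return sum(s.rate_usd for s in pull_verified(submissions, opportunity_id))


store = [
    Submission("s1", "opp-1", True, 120.0),
    Submission("s2", "opp-1", False, 120.0),  # unverified: not delivered, not billed
    Submission("s3", "opp-2", True, 80.0),
]

delivered = [s.submission_id for s in pull_verified(store, "opp-1")]
owed = payout_due(store, "opp-1")
print(delivered, owed)
```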

Underneath those pillars, the operational shape is consistent. Every opportunity is described as a structured rubric: audience, task taxonomy, deliverable, duration, grading criteria, exclusion list, payout schedule. Experts do not work from English briefs; they work against a machine-readable rubric, the same one the Moderator and reviewer pool grade against. That is what lets the same pipeline hold work from litigators in Chicago and radiologists in Bangalore to the same QA bar.
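A machine-readable rubric of this kind could be sketched as follows. The seven field names come from the list above; the field types, the idea of numeric weights on grading criteria, and the weighted_score() helper are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Rubric:
    audience: str
    task_taxonomy: list
    deliverable: str
    duration_minutes: int
    grading_criteria: dict  # criterion -> weight (weights are an assumption)
    exclusion_list: list
    payout_schedule: str


def weighted_score(rubric, scores):
    """Combine per-criterion scores in [0, 1] using the rubric's weights.

    Raises if a grader skipped a criterion, so every reviewer grades
    against the same set of criteria.
    """
    missing = set(rubric.grading_criteria) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(rubric.grading_criteria.values())
    weighted = sum(w * scores[c] for c, w in rubric.grading_criteria.items())
    return weighted / total_weight


rubric = Rubric(
    audience="engineers with five years of Postgres in production",
    task_taxonomy=["sql_review"],
    deliverable="graded answer with written rationale",
    duration_minutes=30,
    grading_criteria={"correct": 2, "idiomatic": 1, "safe_in_production": 1},
    exclusion_list=[],
    payout_schedule="per_graded_item",
)

# The same serialized rubric can be handed to experts, the Moderator, and reviewers.
payload = json.dumps(asdict(rubric))
score = weighted_score(
    rubric, {"correct": 1.0, "idiomatic": 0.5, "safe_in_production": 1.0}
)
print(score)
```

Because the grading criteria live in the rubric rather than in each reviewer's head, two reviewers in different time zones score the same submission against the same keys.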

Review is split between the Moderator and senior experts who have advanced through a competence ladder, with tooling that shows them only what the rubric asks them to score. Delivery is structured: what ships to a buyer is a manifest together with the underlying submissions, annotations, provenance, and consent records, all auditable end-to-end against the opportunity they were collected for.
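One way to picture "auditable end-to-end" is a manifest whose entries can be checked mechanically against the delivered files. The sketch below is an assumption about how such a check might look, not a description of the actual delivery format; ManifestEntry, its fields, and audit() are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ManifestEntry:
    submission_id: str
    opportunity_id: str
    content_sha256: str
    consent_record_id: str  # empty string means no consent record on file


def audit(entries, payloads, opportunity_id):
    """Return a list of problems; an empty list means the delivery matches its manifest."""
    problems = []
    for e in entries:
        if e.opportunity_id != opportunity_id:
            problems.append(f"{e.submission_id}: wrong opportunity")
        if not e.consent_record_id:
            problems.append(f"{e.submission_id}: missing consent record")
        data = payloads.get(e.submission_id)
        if data is None:
            problems.append(f"{e.submission_id}: payload missing")
        elif sha256_hex(data) != e.content_sha256:
            problems.append(f"{e.submission_id}: content hash mismatch")
    return problems


payloads = {"s1": b"annotated scan notes"}
entries = [ManifestEntry("s1", "opp-1", sha256_hex(payloads["s1"]), "consent-1")]

clean = audit(entries, payloads, "opp-1")

# Simulate a payload that was altered after the manifest was written.
tampered = audit(entries, {"s1": b"edited after the fact"}, "opp-1")
print(clean, tampered)
```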

What "10,000 expert-hours a month" means

The pipeline now sustains a monthly run rate of around 10,000 expert-hours, across dozens of professional audiences, with active expert counts that keep climbing in step with demand. The numbers matter less than the fact that this is a steady-state operation, not a one-off campaign. Recruitment, screening, capture, review, delivery, and payout all run continuously. That distinction determines whether a buyer can train against the data once or rely on it as a recurring component of their roadmap.

What experts see

Work is paid against a published rate card. An opportunity might pay per submission, per hour, or per graded item depending on what the spec asks for, and the rate is sized to the seniority the spec requires. Advancement is structured (expert, senior reviewer, spec author) and is gated on quantitative scores rather than tenure. Payouts settle on a known cadence and are visible in the app from the moment a submission is accepted.
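A rate card with per-submission, per-hour, and per-graded-item pricing could be modeled roughly as below. Every name and every dollar figure is invented for illustration; none of it is taken from Terac's published rate card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    unit: str            # "submission", "hour", or "graded_item"
    usd_per_unit: float


# Hypothetical rate card keyed by (audience, seniority); figures are invented.
RATE_CARD = {
    ("radiology", "senior"): Rate("graded_item", 4.0),
    ("backend_engineering", "senior"): Rate("hour", 150.0),
    ("litigation", "senior"): Rate("submission", 60.0),
}


def payout(audience, seniority, units_accepted):
    """Return (pricing unit, amount in USD) for accepted work only."""
    rate = RATE_CARD[(audience, seniority)]
    return rate.unit, rate.usd_per_unit * units_accepted


# A three-hour debugging session priced per hour.
unit, amount = payout("backend_engineering", "senior", 3)
print(unit, amount)
```

Keying the rate on seniority as well as audience mirrors the idea that the rate is sized to the seniority the spec requires.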

Terac's posture is that this kind of work ought to look like labor, not a side-hustle, and that the consent terms ought to be specific enough that an expert can read and understand what they are agreeing to. That stance has product consequences: the rate card is public, the consent forms are short, and the app surfaces, for every submission, the opportunity it counts against and the buyer category it can be licensed to.

For a senior expert in a sought-after audience, this is already among the highest-paid flexible work in the world. For an early-career expert, it is a way to compound credentials: every accepted submission, every passed screen, every step up the competence ladder is an attestation that travels with the profile.

The future of labor, in outline

Across human history, the unit of work has only gotten shorter. Lifetime employment gave way to the thirty-year career, the thirty-year career to the four-year tenure, and the four-year tenure to the gig. The next step, and the one Terac is building toward, is the unit of work becoming a single task: a graded code review, a debug session, an annotated case, a multi-step demonstration. Each one is priced, vetted, paid out, and recorded against a portable professional identity.

In a world where AI agents run more and more of the work, they will not need employees. They will need people. They will not need crowds. They will need the specific expert, found, verified, and engaged in seconds. Today that mostly means RL post-training. Soon it means an agent commissioning a senior expert to settle a hard decision in the middle of a longer task, the same way a human team would page a specialist. The interface is the same in both cases: a tightly specified opportunity, a programmatic API, and a vetted human on the other end.

We are in the early innings of this. The supply curve is already wide enough to support new audiences (clinical specialties, legal practice areas, narrow engineering disciplines, language-rich professional content) without a fresh build-out. The unit cost of high-quality expert labor is on a path that puts it within reach of buyers who, twelve months ago, could not have afforded a comparable in-house program. Each new audience is another proof point that the same expert network that ships UXR insights to product teams ships training data to frontier labs and, increasingly, on-demand expertise to agentic products, because the underlying primitive (a vetted human doing real work on a clear spec, on demand) is the same.

The constraint that remains is the one the field has had since the beginning. High-quality expert judgment is irreducibly human in origin. There is no model that can manufacture it from scratch. That is the gap Terac was built to close, and it is the same gap that, taken seriously, reshapes what skilled labor looks like for the next decade.

To discuss a custom expert program, or to evaluate the data Terac ships today, contact the team through the platform or at hello@terac.com.