125+ software engineers sourced and vetted over a weekend

Frontier AI Lab

Overview

A leading AI training company approached Terac with two parallel data programs. One was a SWE-Bench evaluation effort requiring strong software engineers, with a meaningful subset of Go programmers. The other was a computer-control program requiring generalist engineers plus an ML and AI research subset. The combined target was over 200 vetted profiles, and the timeline was a weekend.

Hitting that timeline by hand is impossible. The only way to source, screen, and verify hundreds of senior engineers in 72 hours is to put an AI moderator in front of every candidate and let it do the in-depth interviewing humans would do if they had unlimited time.

The approach: an AI moderator on every call

Every applicant who passed the initial filter was routed to Terac's AI voice moderator for a structured interview. The moderator is not a chatbot screener with multiple-choice questions. It runs a real conversation, follows up on what the candidate says, and probes the parts of their experience that would actually matter for this specific client's work.

For these projects, the interview script was tuned around three things: how the engineer actually used Go in production, how they reasoned about long-horizon agent tasks, and where they had personally hit the limits of current tooling. Candidates were asked to walk through specific projects from their resume, describe the design choices they made, and explain trade-offs in their own words. The moderator pushed back when answers were vague, asked for examples when claims were generic, and stayed on a topic until it had enough signal to score it.

That conversational depth is what separates "this person says they know Go" from "this person has shipped Go services and can defend their choices to a stranger on a 20-minute call."

What the AI actually screened for

Title-based filtering is close to useless at this scale. The AI moderator was calibrated to test for the qualifications that mattered to this client:

  • Working proficiency in English, evaluated live during the call rather than self-reported.
  • Concrete Go experience for the SWE-Bench track, including idiomatic patterns, concurrency choices, and module ecosystem fluency.
  • ML and AI research experience for the computer-control track, including hands-on work with agent loops, tool use, or evaluations.
  • Multi-task readiness and the patience for a long-horizon program rather than a one-off survey.
  • Time-zone overlap with the client's working hours.

Every filter was applied during the conversation, not after delivery. The moderator graded each dimension in real time and attached a short, written rationale tied to specific things the candidate said.

Depth, at the speed of an API call

The reason this works is that the AI runs hundreds of these interviews in parallel. A human recruiting team scaling to 200 in-depth screens over a weekend would either cut the depth or miss the deadline. The moderator does not. Every candidate gets the same structured probe, every answer is transcribed, every claim is checked against their GitHub and LinkedIn before a profile reaches the client.

The output for the client is not just a name and a resume. It is a profile with a transcript, a per-dimension score, and a written summary of why this person is a fit, what they have actually built, and where their experience is thinner than it looks on paper.

Outcomes

The depth of the AI-led interviews is what made the timeline possible. By the Wednesday after kickoff, profiles were flowing into the client's review queue with full interview context attached, and across both projects the qualified pipeline was ahead of the original target. Just as importantly, every decision the client made (Go, Maybe, Needs Review, or No Go) fed back into the next wave of moderator calibration, so the screen sharpened over the course of the program.

What this unlocks

For frontier AI labs running RLHF, evaluations, and agent training programs, the rate limit on data quality is the depth of vetting you can do before an expert ever touches a task. Terac moves that depth out of the hiring funnel and into an AI moderator that can run an in-depth conversation with every candidate, every time, at the volume the program actually needs.


