Expert Preference & RLHF Data

Name: Expert Preference & RLHF Data
Creator: Terac
Keywords: Text

Text · Enterprise

200+ expert domains
Pairwise + rank comparison types
Rationale per judgment
Agreement scores included

We typically reply within 24 hours

Pairwise comparisons, ratings, and rankings from identity-verified domain experts, with rationales, for RLHF and reward modeling.

Human preference data produced by identity-verified experts in the domain you are training on: side-by-side comparisons, Likert ratings, and full rankings over model outputs, each with a written rationale. Because raters are vetted for real expertise, the signal reflects domain judgment rather than crowd guesswork.

Every judgment is collected from paid experts with signed consent, captured with rater identity and attestation provenance, and reviewed for PII before delivery. Inter-rater agreement and rater metadata ship with the data.
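The page does not say which agreement statistic is reported. As one illustration, suppose two raters label the same pairwise items as "A", "B", or "tie". Cohen's kappa then corrects their raw agreement for the agreement expected by chance. The function name and label values are hypothetical. This is a minimal sketch, not necessarily the statistic Terac ships.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    Labels are assumed (hypothetically) to be pairwise verdicts such as
    "A", "B" or "tie"; the metric itself works for any categorical labels.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1.0 - p_expected)


r1 = ["A", "A", "B", "tie", "A", "B"]
r2 = ["A", "B", "B", "tie", "A", "B"]
print(round(cohens_kappa(r1, r2), 3))  # 0.739
```

Kappa near 1 means the experts agree well beyond chance, and values near 0 mean their agreement is no better than chance. Setups with more than two raters per item would usually use a multi-rater statistic instead.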

Highlights

Preference judgments from vetted domain experts, not generic crowdworkers
Pairwise comparisons, scalar ratings, and full rankings on your rubric
Written rationale attached to every judgment
Rater attestation, identity provenance, and inter-rater agreement included
Consent-cleared and PII-reviewed before delivery

Domain coverage

200+ specialized domains spanning STEM, medicine, law, finance, software, and the trades. Coverage can be extended to specific domains, rubrics, or seniority profiles on request.

Sample judgment formats

Pairwise preference
Best-of-n ranking
Likert ratings
Rubric scoring
Error annotation
Pointwise critique
Tie-breaking
Calibration sets
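Rankings can be reused for pairwise training. A common approach expands one ranking of n outputs into n(n-1)/2 chosen/rejected pairs. This sketch assumes each ranking lists outputs from best to worst with no ties. The helper ranking_to_pairs and the field names "chosen" and "rejected" are illustrative only and are not part of the delivered schema.

```python
from itertools import combinations


def ranking_to_pairs(prompt: str, ranked_outputs: list[str]) -> list[dict]:
    """Expand a best-to-worst ranking of n outputs into n*(n-1)/2 preference pairs.

    Hypothetical helper: ranked_outputs[0] is assumed to be the expert's top
    choice, and ties are assumed to be absent.
    """
    pairs = []
    # combinations() preserves input order, so the first element of each
    # tuple always outranks the second.
    for better, worse in combinations(ranked_outputs, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


pairs = ranking_to_pairs("Explain torque.", ["draft_c", "draft_a", "draft_b"])
print(len(pairs))  # 3
```

Pairs drawn from one ranking are correlated, so some training setups keep them in the same batch. Tied items would need separate handling, for example by dropping the pair or labelling it as a tie.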

Capture and format

Delivered as JSONL with prompts, candidate outputs, the expert judgment, rationale, rater ID, and rubric. Tasks run through the Terac platform with attestation gating per domain.
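The following is a minimal loader that assumes one JSON object per line with the contents listed above. The field names (prompt, candidates, judgment, rationale, rater_id, rubric), the shape of the judgment object, and the sample values are all hypothetical. Check them against an actual delivery before relying on them.

```python
import io
import json

# Hypothetical field names mirroring the listed record contents; the real
# delivered schema may differ.
REQUIRED_FIELDS = {"prompt", "candidates", "judgment", "rationale", "rater_id", "rubric"}


def read_judgments(stream) -> list[dict]:
    """Parse one JSON object per non-blank line and check required fields."""
    records = []
    for line_no, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        records.append(record)
    return records


# Illustrative record only; values are made up.
sample = io.StringIO(
    '{"prompt": "Is this dosage safe?", "candidates": ["A text", "B text"], '
    '"judgment": {"type": "pairwise", "preferred": 0}, '
    '"rationale": "B omits the renal adjustment.", '
    '"rater_id": "r_123", "rubric": "clinical_accuracy_v1"}\n'
)
records = read_judgments(sample)
print(records[0]["judgment"]["preferred"])  # 0
```

In the loader, a schema mismatch fails loudly with the offending line number. It does not silently drop records.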

Annotations

Each record includes the judgment, rationale, and rubric scores as standard. Error taxonomies, severity labels, and confidence scores are available on request.

Provenance

Identity-verified expert contributors, paid on verified completion
Task-level attestations and reviewer agreement captured per record
PII reviewed and redacted before delivery
Per-record audit trail and licensable usage rights

Use cases

RLHF and reward-model training with domain-grounded signal
Preference-tuning and DPO datasets
Evaluation rubrics and leaderboard judging
Reward-model calibration and agreement analysis
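For reward-model training on pairwise data, one standard objective is the Bradley-Terry negative log-likelihood. The model scores both outputs, and the loss is -log sigmoid(r_chosen - r_rejected). The sketch below computes this loss for a single pair with plain floats. In practice the rewards would come from a model, and the loss would be averaged over a batch. This is a generic illustration. It is not a description of how Terac data must be used.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    Models P(chosen preferred) = sigmoid(r_chosen - r_rejected) and returns
    -log of that probability, i.e. log(1 + exp(-margin)).
    """
    margin = reward_chosen - reward_rejected
    # Branch on the sign of the margin so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(pairwise_reward_loss(2.0, 0.5))  # small loss: chosen output already scores higher
```

DPO optimizes a closely related objective. It replaces the reward margin with a scaled difference of policy-to-reference log-probability ratios, so the same chosen/rejected pairs can serve both approaches.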

Provided by Terac

terac.com

Explore more datasets

Text · Enterprise

Expert Reasoning Traces

Step-by-step solutions to hard problems from verified domain experts, with intermediate work, for process supervision and RL.

Text · Enterprise

Long-Horizon Agent Trajectories

Full traces of verified experts completing long, multi-tool tasks, with actions, tool calls, and outcomes.

Audio · Enterprise

AI-Moderated Interview Transcripts

Transcribed AI-moderated voice interviews with verified participants, including audio, speaker labels, and screener metadata.

Text · Custom

Red-Team Safety Evaluations

Adversarial prompts and expert safety judgments across risk categories, with severity labels, for red-teaming and safety evals.

Video · Enterprise

Computer-Use Workflows

Continuous screen recordings of real practitioners working in 500+ professional desktop and web applications, with action context and end-to-end workflow boundaries.

Video · Enterprise

General Egocentric Video

First-person recordings spanning 20,000+ unique tasks across households, factories, shops, and more, with synchronised IMU data and roughly 95% hand visibility.
