Expert Preference & RLHF Data
- 200+ expert domains
- Comparison types: pairwise + rank
- Rationale per judgment
- Agreement scores included
Pairwise comparisons, ratings, and rankings from identity-verified domain experts, with rationales, for RLHF and reward modeling.
Human preference data produced by identity-verified experts in the domain you are training on: side-by-side comparisons, Likert ratings, and full rankings over model outputs, each with a written rationale. Because raters are vetted for real expertise, the signal reflects domain judgment rather than crowd guesswork.
Every judgment is collected from paid experts with signed consent, captured with rater identity and attestation provenance, and reviewed for PII before delivery. Inter-rater agreement and rater metadata ship with the data.
Highlights
- Preference judgments from vetted domain experts, not generic crowdworkers
- Pairwise comparisons, scalar ratings, and full rankings on your rubric
- Written rationale attached to every judgment
- Rater attestation, identity provenance, and inter-rater agreement included
- Consent-cleared and PII-reviewed before delivery
Domain coverage
200+ specialized domains spanning STEM, medicine, law, finance, software, and the trades. Coverage can be extended to specific domains, rubrics, or seniority profiles on request.
Capture and format
Delivered as JSONL with prompts, candidate outputs, the expert judgment, rationale, rater ID, and rubric. Tasks run through the Terac platform with attestation gating per domain.
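As a rough illustration of the delivery format described above, here is a minimal Python sketch that reads JSONL preference records. The field names (`prompt`, `candidates`, `judgment`, `rationale`, `rater_id`, `rubric`) and the sample values are assumptions made for this example, not the actual delivered schema.

```python
import io
import json

# Hypothetical record layout: every field name and value below is an
# assumption for illustration, not the delivered schema.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "prompt": "Explain the difference between a mutex and a semaphore.",
        "candidates": {"A": "A mutex has a single owner.", "B": "They are the same."},
        "judgment": {"type": "pairwise", "preferred": "A"},
        "rationale": "A describes ownership semantics correctly.",
        "rater_id": "rater-001",
        "rubric": "technical-accuracy",
    }),
    json.dumps({
        "prompt": "Summarize the statute of frauds.",
        "candidates": {"A": "Draft one", "B": "Draft two", "C": "Draft three"},
        "judgment": {"type": "ranking", "order": ["B", "A", "C"]},
        "rationale": "B covers the writing requirement most completely.",
        "rater_id": "rater-002",
        "rubric": "legal-completeness",
    }),
])


def read_records(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In practice the stream would be an open file; StringIO keeps this runnable.
records = list(read_records(io.StringIO(SAMPLE_JSONL)))
for record in records:
    print(record["rater_id"], record["judgment"]["type"])
```

Reading line by line keeps memory use flat for large deliveries, since each record is parsed independently.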
Annotations
Judgment, rationale, and rubric scores come standard; error taxonomies, severity labels, and confidence scores are available on request.
Provenance
- Identity-verified expert contributors, paid on verified completion
- Task-level attestations and reviewer agreement captured per record
- PII reviewed and redacted before delivery
- Per-record audit trail and licensable usage rights
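For readers who want to check reviewer agreement themselves, the sketch below computes Cohen's kappa for two raters who judged the same pairwise items. Cohen's kappa is one common agreement statistic and is used here as an assumption; the page does not state which agreement measure ships with the data, and the label lists are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical pairwise preferences from two raters over six items.
rater_1 = ["A", "A", "B", "B", "A", "B"]
rater_2 = ["A", "B", "B", "B", "A", "A"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for chance, so a value near 0 means the raters agree no more often than their label frequencies alone would predict.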
Use cases
- RLHF and reward-model training with domain-grounded signal
- Preference-tuning and DPO datasets
- Evaluation rubrics and leaderboard judging
- Reward-model calibration and agreement analysis
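For the preference-tuning use case, a pairwise judgment can be reshaped into the prompt/chosen/rejected layout that preference-tuning tooling commonly expects. This is a minimal sketch under assumed field names (`prompt`, `candidates`, `judgment`); the record shown is hypothetical, not a sample of delivered data.

```python
def to_dpo_example(record):
    """Map one pairwise record to a prompt/chosen/rejected dict.

    Returns None for records that are not pairwise or that do not have
    exactly one preferred and one rejected candidate.
    """
    judgment = record["judgment"]
    if judgment.get("type") != "pairwise":
        return None
    preferred = judgment["preferred"]
    candidates = record["candidates"]
    rejected = [key for key in candidates if key != preferred]
    if preferred not in candidates or len(rejected) != 1:
        return None
    return {
        "prompt": record["prompt"],
        "chosen": candidates[preferred],
        "rejected": candidates[rejected[0]],
    }


# Hypothetical record; field names are assumptions for illustration.
sample_record = {
    "prompt": "Explain the difference between a mutex and a semaphore.",
    "candidates": {"A": "A mutex has a single owner.", "B": "They are the same."},
    "judgment": {"type": "pairwise", "preferred": "A"},
    "rationale": "A describes ownership semantics correctly.",
}
example = to_dpo_example(sample_record)
print(example)
```

Rankings and scalar ratings would need their own mapping (for instance, expanding a ranking into ordered pairs), which this sketch deliberately leaves out.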