Red-Team Safety Evaluations
- Risk-categorized: Taxonomy
- Severity: Graded labels
- Expert: Red-teamers
- Reviewed: Second-expert check
Adversarial prompts and expert safety judgments across risk categories, with severity labels, for red-teaming and release-gating evals.
Adversarial prompts authored by vetted red-teamers alongside expert judgments of model responses across defined risk categories. Each item carries a severity rating and rationale, so the set works both as attack data and as a graded safety evaluation.
Every item is collected from paid experts with signed consent under a controlled protocol, reviewed by a second expert, and screened for PII before delivery. This is safety-research data intended for defensive evaluation and alignment work.
Highlights
- Adversarial prompts from vetted red-teamers across a risk taxonomy
- Expert safety judgments with severity ratings and rationale
- Usable as both attack data and a graded evaluation set
- Second-expert review for label quality
- Controlled collection protocol, consent-cleared and PII-reviewed
Risk coverage
Covers common safety risk categories for general-purpose models. Coverage can be extended to specific policies, categories, or domains on request.
Capture and format
Delivered as JSONL with the adversarial prompt, the model response under test, the expert judgment, severity, rationale, and category. Authoring and review run through the Terac platform.
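As a rough illustration of consuming the delivery, the sketch below parses JSONL lines into typed records. The key names (`prompt`, `response`, `judgment`, `severity`, `rationale`, `category`) are assumptions derived from the field list above, not a published schema, and the type of `severity` is left open because the rating scale is not specified here.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class RedTeamRecord:
    # Field names are hypothetical; adjust to the delivered schema.
    prompt: str
    response: str
    judgment: str
    severity: Any  # scale not specified in this listing
    rationale: str
    category: str


def parse_records(lines: Iterable[str]) -> Iterator[RedTeamRecord]:
    """Yield one record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        raw = json.loads(line)
        # A missing key raises KeyError, which surfaces schema drift early.
        yield RedTeamRecord(
            prompt=raw["prompt"],
            response=raw["response"],
            judgment=raw["judgment"],
            severity=raw["severity"],
            rationale=raw["rationale"],
            category=raw["category"],
        )
```

Any iterable of strings works as input, so an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as f: records = list(parse_records(f))`.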
Annotations
Category, severity, and rationale come standard. Policy mapping and remediation notes are available on request.
Provenance
- Identity-verified expert contributors, paid on verified completion
- Task-level attestations and reviewer agreement captured per record
- PII reviewed and redacted before delivery
- Per-record audit trail and licensable usage rights
Use cases
- Red-teaming and safety evaluation of frontier models
- Release-gating and policy-compliance evals
- Safety classifier and guardrail training
- Alignment and harm-reduction research
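For the release-gating use case, one possible check is a per-category rate of high-severity findings compared against a threshold. The sketch below assumes each record is a dict with a `category` key and an integer `severity` key where larger means worse. The function name, the key names, the ordinal scale, and the default thresholds are all hypothetical and would need to match the delivered data and your own policy.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple


def gate_release(
    records: Iterable[Mapping],
    severe_at: int = 3,            # assumed cutoff on an assumed ordinal scale
    max_severe_rate: float = 0.05, # assumed tolerance per category
) -> Tuple[Dict[str, float], List[str]]:
    """Return per-category severe rates and the categories that exceed the limit."""
    totals: Counter = Counter()
    severe: Counter = Counter()
    for rec in records:
        totals[rec["category"]] += 1
        if rec["severity"] >= severe_at:
            severe[rec["category"]] += 1
    # Share of records in each category rated at or above the cutoff.
    rates = {cat: severe[cat] / totals[cat] for cat in totals}
    failing = sorted(cat for cat, rate in rates.items() if rate > max_severe_rate)
    return rates, failing
```

A release would pass this particular gate only when the returned list of failing categories is empty. Stricter or category-specific thresholds are a design choice this sketch does not cover.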