Red-Team Safety Evaluations

Name: Red-Team Safety Evaluations
Creator: Terac
Keywords: Text

Text, Custom

Risk-categorized taxonomy
Severity-graded labels
Expert red-teamers
Reviewed: second-expert check

Adversarial prompts and expert safety judgments across risk categories, with severity labels, for red-teaming and release-gating evals.

The dataset pairs adversarial prompts authored by vetted red-teamers with expert judgments of model responses across defined risk categories. Each item carries a severity rating and rationale, so the set works both as attack data and as a graded safety evaluation.

Every item is collected from paid experts with signed consent under a controlled protocol, reviewed by a second expert, and screened for PII before delivery. This is safety-research data intended for defensive evaluation and alignment work.

Highlights

Adversarial prompts from vetted red-teamers across a risk taxonomy
Expert safety judgments with severity ratings and rationale
Usable as both attack data and a graded evaluation set
Second-expert review for label quality
Controlled collection protocol, consent-cleared and PII-reviewed

Risk coverage

The set covers common safety risk categories for general-purpose models. Coverage can be extended to specific policies, categories, or domains on request.

Sample risk categories

Harmful instructions
Privacy and PII
Misinformation
Bias and fairness
Security misuse
Self-harm
Jailbreak attempts
Policy edge cases

Capture and format

Delivered as JSONL with the adversarial prompt, the model response under test, the expert judgment, severity, rationale, and category. Authoring and review run through the Terac platform.
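
As a rough illustration of how such a file might be consumed, the sketch below parses JSONL lines into a typed record. The field names (`prompt`, `response`, `judgment`, `severity`, `rationale`, `category`) and the sample values are assumptions chosen to mirror the fields listed above; the actual delivery schema may use different keys and label vocabularies.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical field names mirroring the fields listed above;
# the real delivery schema may use different keys.
REQUIRED_FIELDS = ("prompt", "response", "judgment", "severity", "rationale", "category")


@dataclass(frozen=True)
class RedTeamRecord:
    prompt: str
    response: str
    judgment: str
    severity: str
    rationale: str
    category: str


def parse_records(lines: Iterable[str]) -> Iterator[RedTeamRecord]:
    """Yield one record per non-empty JSONL line, failing loudly on missing fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in obj]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield RedTeamRecord(**{f: obj[f] for f in REQUIRED_FIELDS})


# Placeholder content only; not taken from the dataset.
sample = [
    json.dumps({
        "prompt": "<adversarial prompt>",
        "response": "<model response under test>",
        "judgment": "unsafe",
        "severity": "high",
        "rationale": "<expert rationale>",
        "category": "Security misuse",
    }),
]

records = list(parse_records(sample))
print(records[0].category, records[0].severity)
```

Validating required fields at load time is one way to surface schema drift early; with a real delivery, `parse_records` could be fed an open file handle instead of the in-memory `sample` list.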

Annotations

Category, severity, and rationale come as standard. Policy mapping and remediation notes are available on request.

Provenance

Identity-verified expert contributors, paid on verified completion
Task-level attestations and reviewer agreement captured per record
PII reviewed and redacted before delivery
Per-record audit trail and licensable usage rights

Use cases

Red-teaming and safety evaluation of frontier models
Release-gating and policy-compliance evals
Safety classifier and guardrail training
Alignment and harm-reduction research
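
For the release-gating use case, one possible shape of a gate is sketched below: count judged-unsafe items per category at or above a chosen severity, and block the release if any category exceeds a budget. The severity scale (`low`, `medium`, `high`, `critical`), the `unsafe` judgment label, the record keys, and the thresholds are all assumptions for illustration; the dataset's actual label vocabulary is not specified here.

```python
from collections import Counter

# Assumed ordinal severity scale; the delivered labels may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def gate_release(records, min_severity="high", max_failures_per_category=0):
    """Return (passed, failures), where failures counts unsafe items per
    category whose severity is at or above min_severity."""
    floor = SEVERITY_RANK[min_severity]
    failures = Counter(
        r["category"]
        for r in records
        if r["judgment"] == "unsafe" and SEVERITY_RANK[r["severity"]] >= floor
    )
    passed = all(n <= max_failures_per_category for n in failures.values())
    return passed, dict(failures)


# Placeholder judgments only; not taken from the dataset.
results = [
    {"category": "Self-harm", "judgment": "safe", "severity": "low"},
    {"category": "Security misuse", "judgment": "unsafe", "severity": "high"},
    {"category": "Misinformation", "judgment": "unsafe", "severity": "low"},
]

passed, failures = gate_release(results)
print(passed, failures)
```

Keeping the severity floor and per-category budget as parameters lets a team tighten or relax the gate per policy without changing the counting logic.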

Provided by Terac

terac.com
