Long-Horizon Agent Trajectories

Name: Long-Horizon Agent Trajectories
Creator: Terac
Keywords: Text, Code

Text, Code, Enterprise

Real workflows: Multi-tool
Task outcomes: Verified
Failure handling: Recovery
Action logs: Step-level

Full traces of verified experts completing long, multi-tool tasks, with actions, tool calls, observations, and outcomes, for agent training and evaluation.

End-to-end trajectories of identity-verified experts completing real long-horizon tasks across software, research, and operations: the actions they take, the tools and APIs they call, the observations they get back, and how they recover when something fails. Each trajectory carries a verified outcome so models can learn what actually works over many steps.

Every trajectory is collected from paid experts with signed consent, captured with full action and tool-call context, and reviewed for secrets and PII before delivery.

Highlights

Complete multi-step trajectories, not single-turn snapshots
Actions, tool and API calls, observations, and final outcomes captured
Real error recovery and replanning by experts
Verified task success or failure labels on every trajectory
Secret and PII review before delivery

Task coverage

Long-horizon tasks across software engineering, data work, research, and operations. Coverage extends to specific tools, environments, or task types on request.

Sample task types

Multi-file code changes
Data analysis pipelines
Web research and synthesis
Ticket-to-PR workflows
Tool and API orchestration
Debugging and incident response
Spreadsheet and BI tasks
Document drafting

Capture and format

Delivered as JSONL trajectories with timestamped actions, tool calls and arguments, observations, and outcomes, optionally paired with screen recordings of the same session.
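
As a rough illustration of how a JSONL trajectory file might be consumed, the Python sketch below parses one line into typed steps. The delivered schema is not specified here, so every field name (trajectory_id, steps, timestamp, action, tool_call, observation, outcome) and the sample record itself are hypothetical placeholders, not the actual format.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Step:
    """One timestamped step of a trajectory (field names are assumed)."""
    timestamp: str
    action: str
    tool_name: Optional[str]
    tool_args: dict[str, Any]
    observation: str


def parse_trajectory(line: str) -> tuple[str, list[Step], str]:
    """Parse one JSONL line into (trajectory id, steps, outcome label)."""
    record = json.loads(line)
    steps = []
    for raw in record["steps"]:
        # A step may have no tool call, e.g. a manual edit.
        call = raw.get("tool_call") or {}
        steps.append(Step(
            timestamp=raw["timestamp"],
            action=raw["action"],
            tool_name=call.get("name"),
            tool_args=call.get("arguments", {}),
            observation=raw.get("observation", ""),
        ))
    return record["trajectory_id"], steps, record["outcome"]


# Made-up sample record, for illustration only.
SAMPLE_LINE = json.dumps({
    "trajectory_id": "traj-0001",
    "outcome": "success",
    "steps": [
        {
            "timestamp": "2024-01-01T10:00:00Z",
            "action": "run_tests",
            "tool_call": {"name": "shell", "arguments": {"cmd": "pytest -q"}},
            "observation": "1 failed, 12 passed",
        },
        {
            "timestamp": "2024-01-01T10:05:00Z",
            "action": "edit_file",
            "tool_call": None,
            "observation": "patched parser.py",
        },
    ],
})

trajectory_id, steps, outcome = parse_trajectory(SAMPLE_LINE)
for step in steps:
    print(step.timestamp, step.action, step.tool_name)
```

For a full file, the same function would be applied to each non-empty line, since JSONL stores one JSON record per line.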

Annotations

Action and tool-call logs with outcome labels come standard. Sub-goal segmentation, error-recovery tags, and reward signals are available on request.
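
To make the optional annotations concrete, here is a minimal Python sketch of how step-level annotations could be aggregated per sub-goal. The annotation keys (subgoal, error_recovery, reward) and the sample values are assumptions for illustration. The actual annotation format would be agreed with the provider.

```python
from collections import defaultdict

# Hypothetical annotated steps; keys and values are invented for illustration.
annotated_steps = [
    {"index": 0, "subgoal": "reproduce bug", "error_recovery": False, "reward": 0.0},
    {"index": 1, "subgoal": "reproduce bug", "error_recovery": True, "reward": 0.5},
    {"index": 2, "subgoal": "apply fix", "error_recovery": False, "reward": 1.0},
]


def summarize_by_subgoal(steps):
    """Count steps and recovery steps, and sum reward, for each sub-goal."""
    summary = defaultdict(lambda: {"steps": 0, "recoveries": 0, "reward": 0.0})
    for step in steps:
        entry = summary[step["subgoal"]]
        entry["steps"] += 1
        entry["recoveries"] += int(step["error_recovery"])
        entry["reward"] += step["reward"]
    return dict(summary)


summary = summarize_by_subgoal(annotated_steps)
for subgoal, stats in summary.items():
    print(subgoal, stats)
```

A summary like this can help locate the sub-goals where experts most often had to recover from errors.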

Provenance

Identity-verified expert contributors, paid on verified completion
Task-level attestations and reviewer agreement captured per record
PII reviewed and redacted before delivery
Per-record audit trail and licensable usage rights

Use cases

Agent training, fine-tuning, and behavior cloning
Tool-use and function-calling datasets
Long-horizon planning and recovery research
Agent evaluation and benchmark construction

Provided by Terac

terac.com
