Long-Horizon Agent Trajectories
- Multi-tool, real workflows
- Verified task outcomes
- Recovery and failure handling
- Step-level action logs
Full traces of verified experts completing long, multi-tool tasks, with actions, tool calls, observations, and outcomes, for agent training and evaluation.
End-to-end trajectories of identity-verified experts completing real long-horizon tasks across software, research, and operations: the actions they take, the tools and APIs they call, the observations they get back, and how they recover when something fails. Each trajectory carries a verified outcome so models can learn what actually works over many steps.
Every trajectory is collected from paid experts with signed consent, captured with full action and tool-call context, and reviewed for secrets and PII before delivery.
Highlights
- Complete multi-step trajectories, not single-turn snapshots
- Actions, tool and API calls, observations, and final outcomes captured
- Real error recovery and replanning by experts
- Verified task success or failure labels on every trajectory
- Secret and PII review before delivery
Task coverage
Long-horizon tasks across software engineering, data work, research, and operations. Coverage can be extended to specific tools, environments, or task types on request.
Capture and format
Delivered as JSONL trajectories with timestamped actions, tool calls and arguments, observations, and outcomes, optionally paired with screen recordings of the same session.
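As a rough illustration of working with this kind of delivery, the sketch below parses JSONL text and counts tool calls per tool. The field names (`trajectory_id`, `steps`, `timestamp`, `action`, `tool_call`, `observation`, `outcome`) and the sample records are assumptions made for the example; the delivered schema may differ.

```python
import json
from collections import Counter

# Hypothetical sample records; field names are illustrative, not the real schema.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "trajectory_id": "t-001",
        "steps": [
            {"timestamp": "2024-01-01T10:00:00Z", "action": "search docs",
             "tool_call": {"name": "web_search", "arguments": {"query": "rate limits"}},
             "observation": "3 results"},
            {"timestamp": "2024-01-01T10:02:00Z", "action": "run tests",
             "tool_call": {"name": "shell", "arguments": {"cmd": "pytest"}},
             "observation": "2 passed"},
        ],
        "outcome": {"label": "success"},
    }),
    json.dumps({
        "trajectory_id": "t-002",
        "steps": [
            {"timestamp": "2024-01-02T09:00:00Z", "action": "run migration",
             "tool_call": {"name": "shell", "arguments": {"cmd": "make migrate"}},
             "observation": "exit code 1"},
        ],
        "outcome": {"label": "failure"},
    }),
])


def load_trajectories(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tool_usage(trajectories):
    """Count how many steps call each tool across all trajectories."""
    counts = Counter()
    for traj in trajectories:
        for step in traj["steps"]:
            call = step.get("tool_call")
            if call:
                counts[call["name"]] += 1
    return counts


trajectories = load_trajectories(SAMPLE_JSONL)
usage = tool_usage(trajectories)
```

Reading line by line keeps each trajectory independent, so a file can be streamed rather than loaded whole when it is large.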
Annotations
Action and tool-call logs with outcome labels come as standard. Sub-goal segmentation, error-recovery tags, and reward signals are available on request.
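To show how sub-goal segmentation and error-recovery tags might be consumed, here is a minimal sketch. It assumes each step carries a hypothetical `subgoal` string and a `tags` list containing `"error"` or `"recovery"`; these names and the sample steps are invented for illustration and are not a documented format.

```python
from itertools import groupby

# Hypothetical annotated steps; the "subgoal" and "tags" fields are assumptions.
STEPS = [
    {"subgoal": "set up env", "action": "install deps", "tags": []},
    {"subgoal": "set up env", "action": "run build", "tags": ["error"]},
    {"subgoal": "set up env", "action": "pin version and rebuild", "tags": ["recovery"]},
    {"subgoal": "ship fix", "action": "open pull request", "tags": []},
]


def segment_by_subgoal(steps):
    """Group consecutive steps that share a sub-goal label."""
    return [
        (goal, [s["action"] for s in group])
        for goal, group in groupby(steps, key=lambda s: s["subgoal"])
    ]


def recovery_pairs(steps):
    """Pair the most recent error-tagged step with the next recovery-tagged step."""
    pairs, pending = [], None
    for i, step in enumerate(steps):
        if "error" in step["tags"]:
            pending = i
        elif "recovery" in step["tags"] and pending is not None:
            pairs.append((pending, i))
            pending = None
    return pairs


segments = segment_by_subgoal(STEPS)
pairs = recovery_pairs(STEPS)
```

Index pairs like these make it straightforward to slice out just the failure-and-fix spans of a trajectory for recovery-focused training or analysis.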
Provenance
- Identity-verified expert contributors, paid on verified completion
- Task-level attestations and reviewer agreement captured per record
- PII reviewed and redacted before delivery
- Per-record audit trail and licensable usage rights
Use cases
- Agent training, fine-tuning, and behavior cloning
- Tool-use and function-calling datasets
- Long-horizon planning and recovery research
- Agent evaluation and benchmark construction
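For the behavior-cloning use case, one common preparation step is to turn each trajectory into (history, next action) pairs. The sketch below does this for trajectories labeled as successful. The record layout (`steps`, `tool_call`, `observation`, `outcome.label`) is hypothetical, and filtering to successful runs is an assumption of this example rather than a requirement of the dataset.

```python
# Hypothetical records; field names are illustrative only.
TRAJECTORIES = [
    {
        "outcome": {"label": "success"},
        "steps": [
            {"tool_call": {"name": "web_search", "arguments": {"query": "rate limits"}},
             "observation": "3 results"},
            {"tool_call": {"name": "shell", "arguments": {"cmd": "pytest"}},
             "observation": "2 passed"},
        ],
    },
    {
        "outcome": {"label": "failure"},
        "steps": [
            {"tool_call": {"name": "shell", "arguments": {"cmd": "make migrate"}},
             "observation": "exit code 1"},
        ],
    },
]


def bc_examples(trajectory):
    """Build (observation history, next tool call) pairs for one trajectory."""
    history, examples = [], []
    for step in trajectory["steps"]:
        # The target action is predicted from observations seen before this step.
        examples.append((list(history), step["tool_call"]))
        history.append(step["observation"])
    return examples


def build_dataset(trajectories):
    """Keep only trajectories labeled successful and flatten their examples."""
    dataset = []
    for traj in trajectories:
        if traj["outcome"]["label"] == "success":
            dataset.extend(bc_examples(traj))
    return dataset


dataset = build_dataset(TRAJECTORIES)
```

Failed trajectories are skipped here, but they could instead be kept as negative examples or as evaluation cases, depending on the training setup.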