Verified human data for frontier AI
Explore and license production-ready datasets sourced from paid, consenting experts, or brief us on a bespoke build we scope from scratch. Every record is rights-cleared and reviewed for PII before delivery, so it is ready to drop into your training and evaluation pipelines.
Computer-Use Workflows
Continuous screen recordings of real practitioners working in 500+ professional desktop and web applications, with action context and end-to-end workflow boundaries.
General Egocentric Video
First-person recordings spanning 20,000+ unique tasks across households, factories, shops, and more, with synchronised IMU data and roughly 95% hand visibility.
GitHub Repositories
Real repositories from verified expert engineers, with full commit history, pull-request review threads, and license and authorship provenance on every repo.
Diverse Egocentric POV Video
First-person dexterous hand activity across 50+ real-world environments.
Egocentric Vision for Accessibility AI
First-person video from accessibility users with rich metadata.
Egocentric Gemstone Carving & Lapidary Video
First-person POV footage of gemstone cutting, polishing, faceting, and lapidary craftsmanship.
Cancer Medical Imagery
Close-up melanoma, carcinoma, and keratosis images with rich metadata.
Expert Preference & RLHF Data
Pairwise comparisons, ratings, and rankings from verified domain experts, with written rationales, for RLHF and reward modeling.
Expert Reasoning Traces
Step-by-step solutions to hard problems from verified domain experts, with intermediate work, for process supervision and RL.
Long-Horizon Agent Trajectories
Full traces of verified experts completing long, multi-tool tasks, with actions, tool calls, and outcomes.
AI-Moderated Interview Transcripts
Transcribed AI-moderated voice interviews with verified participants, including audio, speaker labels, and screener metadata.
Red-Team Safety Evaluations
Adversarial prompts and expert safety judgments across risk categories, with severity labels, for red-teaming and safety evals.
English Conversational Speech
Stereo multi-speaker English dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
French Conversational Speech
Stereo multi-speaker French dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
German Conversational Speech
Stereo multi-speaker German dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Japanese Conversational Speech
Stereo multi-speaker Japanese dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Spanish Conversational Speech
Stereo multi-speaker Spanish dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Telugu Conversational Speech
Stereo multi-speaker Telugu dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Hindi Conversational Speech
Stereo multi-speaker Hindi dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Tamil Conversational Speech
Stereo multi-speaker Tamil dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
Marathi Conversational Speech
Stereo multi-speaker Marathi dialogue with left/right speaker separation, native-reviewed transcripts, and emotion annotations.
English Monologue Speech
Professional single-speaker English recordings with word-level timestamps and emotion annotations.
French Monologue Speech
Professional single-speaker French recordings with word-level timestamps and emotion annotations.
German Monologue Speech
Professional single-speaker German recordings with word-level timestamps and emotion annotations.
Japanese Monologue Speech
Professional single-speaker Japanese recordings with word-level timestamps and emotion annotations.
Hindi Monologue Speech
Professional single-speaker Hindi recordings with word-level timestamps and emotion annotations.
Tamil Monologue Speech
Professional single-speaker Tamil recordings with word-level timestamps and emotion annotations.
Marathi Monologue Speech
Professional single-speaker Marathi recordings with word-level timestamps and emotion annotations.
Chinese Mandarin Speech
Professional Mandarin Chinese recordings for ASR, TTS, and voice-AI training.
Telugu Expressive TTS Voice
Natural, expressive Telugu speech recordings from native speakers across major regions.
Doctor-Patient Consultation
Clinical consultation dialogues between doctors and patients in English and Urdu.
Spanish Finance Conversation
Customer-service conversations in Spanish across finance and banking contexts.
Spanish Customer Support Conversations
Stereo role-play customer-service dialogues in Spanish with L/R speaker separation.
Spanish-English Contact Center ASR
Bilingual Spanish-English contact-center conversations with natural code-switching.
Nighttime Traffic Audio Narrations
Urban nighttime audio narrations with ambient noise profiling.
Music Library
A large-scale, professionally produced music collection across modern genres, with full construction kits and production toolkits.
Bespoke collection
Need something bespoke? Let's scope it together.
Whether you are training speech models, vision systems, or multimodal agents, our verified contributor network and QA pipeline plug directly into your roadmap. Tell us what you need and we will source, vet, and deliver it.
- End-to-end sourcing, QA, legal, and delivery handled by our team
- Flexible licensing: flat fee, per-unit, or recurring access
- A dedicated team for enterprises with ongoing data needs
Security and compliance
Data is encrypted in transit and at rest, access is scoped and audited, and every delivery is reviewed for PII against a signed-consent trail.
ISO 27001 and CCPA compliance are in progress. Request our security documentation for the latest status.