Sorus sources and licenses voice, video, and text datasets from India — collected with explicit consent, processed for quality, and delivered ready for model training.
Upload voice recordings, videos, and other content in your language. After review and anonymization, earnings go directly to your UPI ID, usually within 24 hours.
Access curated, consent-based multimodal datasets across 22+ Indian languages. Speaker-diarized, PII-redacted, and annotated. API or direct delivery.
Real-world conversational recordings across 22+ Indian languages. Speaker-diarized, transcribed, and PII-redacted. Built for ASR, TTS, and conversational AI.
Structured video datasets from diverse Indian contexts — expressions, gestures, scenes, and environments. Built for vision and multimodal AI.
Native-language text datasets, translations, and annotated corpora across Indic scripts. Ideal for LLM fine-tuning and NLP research.
OTP-based login, no registration form. Your phone number is the only identity we hold.
MP3, M4A, WAV, MP4, and more. Flag the language and call type before submitting.
Once reviewed and anonymized, earnings transfer to your UPI ID — usually within 24 hours.
Tell us the language, modality, volume, domain, and annotation requirements for your project.
We deliver samples before any commitment. Evaluate quality and fit via API or direct file transfer.
One-time purchase, subscription, or a custom collection brief — we adapt to your procurement process.
Every recording is submitted with active contributor opt-in. No passive or inferred collection.
All PII — names, phone numbers, account details — is redacted before any data leaves our pipeline.
Compliant with India's Digital Personal Data Protection Act. Contributors can request deletion at any time.
Data is delivered only to verified, licensed AI company partners. No third-party sharing.
Tell us about your project and we'll respond within one business day with a sample dataset and licensing details.