The data layer
for real-world conversation

Sorus sources and licenses voice, video, and text datasets from India — collected with explicit consent, processed for quality, and delivered ready for model training.

22+ Indian languages
100% Consent-based collection
3 Data modalities
24h Contributor payout SLA
For Contributors

Your data has value. We pay for it.

Upload voice recordings, videos, and other content in your language. After review and anonymization, earnings are paid directly to your UPI ID within 24 hours.

Start contributing
For AI Companies

Proprietary Indian datasets, ready to train on.

Access curated, consent-based multimodal datasets across 22+ Indian languages. Speaker-diarized, PII-redacted, and annotated. API or direct delivery.

Talk to our team
Data catalog

What we collect and license

Live

Voice & Conversational Audio

Real-world conversational recordings across 22+ Indian languages. Speaker-diarized, transcribed, and PII-redacted. Built for ASR, TTS, and conversational AI.

22+ Languages · Speaker-diarized · PII-redacted · ASR / TTS
Coming Soon

Video & Visual Data

Structured video datasets from diverse Indian contexts — expressions, gestures, scenes, and environments. Built for vision and multimodal AI.

Vision AI · Multimodal · Labeled
Coming Soon

Text & Linguistic Corpora

Native-language text datasets, translations, and annotated corpora across Indic scripts. Ideal for LLM fine-tuning and NLP research.

LLM Fine-tuning · NLP · Annotated
How it works

Simple on both sides

For contributors
01
Sign up with your phone

OTP-based login, no registration form. Your phone number is the only identifier we hold.

02
Upload your recording

MP3, M4A, WAV, MP4 and more. Flag the language and call type before submitting.

03
Get paid via UPI

Once reviewed and anonymized, earnings transfer to your UPI ID — usually within 24 hours.

For AI companies
01
Share your data brief

Tell us the language, modality, volume, domain, and annotation requirements for your project.

02
Receive a sample dataset

We deliver samples, via API or direct file transfer, before any commitment, so you can evaluate quality and fit first.

03
License and access

One-time purchase, subscription, or a custom collection brief — we adapt to your procurement process.

Explicit consent

Every recording is submitted with active contributor opt-in. No passive or inferred collection.

Full anonymization

All PII — names, phone numbers, account details — is redacted before any data leaves our pipeline.

DPDP Act 2023

Compliant with India's Digital Personal Data Protection Act, 2023. Contributors can request deletion at any time.

Encrypted in transit and at rest

Data is encrypted in transit and at rest, and delivered only to verified, licensed AI company partners. No third-party sharing.

For AI companies

Request access to our datasets

Tell us about your project and we'll respond within one business day with a sample dataset and licensing details.

For urgent requests, write directly to us at hello@sorus.io. Data deletion requests go to delete@sorus.io.