Your AI still trains on synthetic hallucinations

When AI learns from AI-generated data, it trains on its own mistakes. Sorus gives you something synthetic data can't — real conversations, real emotions, real accents. Collected in India. Cleaned and ready to train on.

See the dataset →

Contribute & get paid

22+

Indian languages covered

100%

Explicitly consented recordings

0%

Synthetic or staged audio

Real

Unsolicited call recordings

For Contributors

Share a spam call recording. Get paid.

We specifically collect real spam call recordings — telecallers, scammers, and unsolicited sales pitches in Indian languages. Upload one from your phone, and once we verify and clean it, the money goes straight to your UPI.

Only spam and unsolicited call recordings — no personal conversations
Works with any Indian language
UPI payout, not vouchers or points
Sign up with your email address — no phone number required
Request deletion anytime under DPDP Act 2023

Start contributing

For AI Teams

Real call data. No staged scripts. No synthetic noise.

Most voice datasets are scraped, read aloud in studios, or generated by AI. Ours aren't. We have real spam calls: actual telecaller conversations in 22+ Indian languages, with all the natural stutters, code-switching, and emotional tone that synthetic data simply cannot replicate.

Real conversations — natural pace, real accents, real emotion
Speaker-separated with timestamps and transcripts
Custom collection runs for specific languages or domains
One-time license or ongoing supply — your call

Talk to us

What we have

Our datasets

Live

Conversational Call Audio

Real phone call recordings across 22+ Indian languages. People talking naturally — not reading scripts. Cleaned, speaker-separated, and transcribed. Good for ASR, TTS, and anything that needs to understand how people actually speak.

22+ Languages · Speaker-separated · PII removed · ASR / TTS

Coming Soon

Video Data

Video recordings from everyday Indian settings — facial expressions, gestures, natural environments. For vision models that need to work in South Asian contexts.

Vision AI · Multimodal · Labeled

Coming Soon

Text & Written Language

Native Indic script text — not just transliterations. Translations, annotations, and raw corpora across multiple languages. For LLMs that need to handle Indian languages properly.

LLM fine-tuning · NLP · Annotated

How it works

Straightforward on both ends

For contributors

Sign up with your email

Create an account with your email address — no phone number needed. We send a verification link and you're in.

Upload a spam call recording

MP3, M4A, WAV, MP4 — whatever you have on your phone. We specifically collect real spam and telecaller recordings, not personal conversations.

Get paid via UPI

We review it, remove personal info, and send the money to your UPI once approved.

For AI teams

Tell us what you need

Language, volume, domain, annotation style — send us the specifics and we'll see what we have.

We send you samples first

No commitment required upfront. Evaluate the quality via API or direct file download before deciding anything.

License and access

One-time purchase, recurring supply, or a custom collection run — we'll work with however your team buys data.

People choose to share

Every recording is submitted voluntarily. We never collect passively or without the contributor knowing exactly what they're agreeing to.

Personal info is stripped out

Names, phone numbers, bank details — all removed before any file is logged or stored. The cleaned version is what we keep.

DPDP Act 2023

Compliant with India's data protection law. If a contributor asks us to delete their data, we do it, no questions asked.

Only licensed partners see the data

We don't sell to brokers or aggregate marketplaces. Data goes directly to verified AI companies under a usage license.