BharatGen: India’s first sovereign multilingual and multimodal AI-driven Large Language Model

  • Union Minister of State (Independent Charge) for Science & Technology, Jitendra Singh, during his visit to IIT Bombay, described BharatGen as “India’s first sovereign multilingual and multimodal AI-driven Large Language Model (LLM)”.
  • He emphasised that this project is a national effort to build AI that reflects India’s linguistic, cultural and social diversity, rather than relying on foreign-centric models.

What is BharatGen — and what makes it “sovereign” and “multimodal”

  • BharatGen is being developed under the National Mission on Interdisciplinary Cyber‑Physical Systems (NM-ICPS) of the Department of Science & Technology (DST), implemented through a Technology Innovation Hub at IIT Bombay.
  • The model is designed to support over 22 Indian languages and integrate three major modalities: text, speech, and document-vision. This enables it to understand, generate and interpret information in the ways Indian citizens naturally communicate.
  • This multimodal capability aims to overcome the limits of earlier English-centric or unimodal AI systems, making BharatGen more inclusive and relevant for India’s multilingual, multimodal society.

🔹 Funding, consortium & institutional backing

  • The project has been allocated ₹235 crore via the Technology Innovation Hub at IIT Bombay.
  • The consortium behind BharatGen is broad and pan-Indian, including institutions such as IIT Madras, IIT Kanpur, IIIT Hyderabad, IIT Mandi, IIT Hyderabad, IIM Indore, IIT Kharagpur and IIIT Delhi — showing a coordinated deep-tech push across the country.

🔹 The role of Bharat Data Sagar — India’s data sovereignty effort

  • A key component of BharatGen is Bharat Data Sagar — described as “one of the most ambitious data initiatives undertaken in the country.”
  • The purpose of Bharat Data Sagar is to collect and curate high-quality India-centric datasets that capture regional, linguistic, cultural and social diversity, so that India retains complete ownership and control over its digital knowledge resources rather than relying on external or global datasets.
  • This approach is intended to improve AI performance on Indian languages and contexts — while safeguarding data sovereignty and giving India long-term autonomy in its digital knowledge infrastructure.

🔹 What has been built so far: Models & Use-cases

According to the presentation reviewed by the minister:

  • Param-1: A foundational text model with around 2.9 billion parameters, trained on a massive multilingual corpus (with substantial Indian content).
  • Speech models:
    • Shrutam — a 30-million parameter automatic speech recognition (ASR) system.
    • Sooktam — a 150-million parameter text-to-speech (TTS) model, available in nine Indic languages.
  • Document-vision model: Patram — India’s first document-vision model, with seven billion parameters, designed to understand and interpret complex documents in Indian formats.
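
To give a rough sense of scale for the parameter counts listed above, the sketch below estimates how much storage the weights of each model would need. It assumes 2 bytes per parameter (16-bit precision), which is an illustrative assumption and not a figure from the PIB release. The helper name `weight_size_gb` is hypothetical, and the estimate ignores everything other than the weights themselves.

```python
# Parameter counts as reported in the article.
PARAMS = {
    "Param-1": 2.9e9,   # foundational text model
    "Shrutam": 30e6,    # speech recognition (ASR)
    "Sooktam": 150e6,   # text-to-speech (TTS)
    "Patram": 7e9,      # document-vision model
}


def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 is an assumption (16-bit weights); the release
    does not state the precision used for any of these models.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        print(f"{name}: ~{weight_size_gb(count):.2f} GB at 16-bit precision")
```

Under this assumption, Patram's weights would be roughly 230 times larger than Shrutam's, which shows how widely the models in the BharatGen suite differ in size.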

Source: PIB
