Santali, also known as Santhali, is the most widely spoken language of the Munda subfamily of the Austroasiatic languages, related to Ho and Mundari, spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognised regional language of India per the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6 million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was a mainly oral language until the development of Ol Chiki by Pandit Raghunath Murmu in 1925. Ol Chiki is alphabetic, sharing none of the syllabic properties of the other Indic scripts, and is now widely used to write Santali in India.


According to linguist Paul Sidwell, Munda languages probably arrived on the coast of Odisha from Indochina about 4000–3500 years ago, and spread after the Indo-Aryan migration to Odisha. Until the nineteenth century, Santali had no written form, and all shared knowledge was transmitted by word of mouth from generation to generation. European interest in the study of the languages of India led to the first efforts at documenting the Santali language. Bengali, Odia and Roman scripts were first used to write Santali before the 1860s by European anthropologists, folklorists and missionaries including A. R. Campbell, Lars Skrefsrud and Paul Bodding. Their efforts resulted in Santali dictionaries, versions of folk tales, and the study of the morphology, syntax and phonetic structure of the language. The Ol Chiki script was created for Santali by Mayurbhanj poet Raghunath Murmu in 1925 and first publicised in 1939. Ol Chiki is widely accepted as a Santali script among Santal communities. At present, Ol Chiki is the official script for Santali literature and language in West Bengal, Odisha and Jharkhand. Speakers in Bangladesh, however, use the Bengali script instead. Santali was honoured in December 2013 when the University Grants Commission of India decided to introduce the language in the National Eligibility Test to allow lecturers to use the language in colleges and universities.

Geographic distribution

The highest concentrations of Santali speakers are in the Bhagalpur and Munger districts of southeastern Bihar; Hazaribag and Manbhum districts of Jharkhand; Paschim Medinipur, Jhargram, Purulia, Bankura, and Birbhum districts of West Bengal; and in the Balasore and Mayurbhanj districts of Odisha. Santali speakers are also found in the states of Assam, Mizoram, and Tripura. Santali is spoken by over seven million people across India, Bangladesh, Bhutan, and Nepal. According to the 2011 census, India has a total of 7,368,192 Santali speakers. The state-wise distribution is Jharkhand (3.27 million), West Bengal (2.43 million), Odisha (0.86 million), Bihar (0.46 million), Assam (0.21 million), Maharashtra (0.10 million) and a few thousand in each of Chhattisgarh, Mizoram, Arunachal Pradesh and Tripura.

Official status

Santali is one of India's 22 scheduled languages. It is also recognised as the second state language of the states of Jharkhand and West Bengal.


Dialects of Santali include Kamari-Santali, Karmali (Khole), Lohari-Santali, Mahali, Manjhi and Paharia.



Santali has 21 consonants, not counting the 10 aspirated stops, which occur primarily, but not exclusively, in Indo-Aryan loanwords and are given in parentheses in the table below. In native words, the opposition between voiceless and voiced stops is neutralised in word-final position. A typical Munda feature is that word-final stops are "checked", i.e. glottalised and unreleased.


Santali has eight oral and six nasal vowel phonemes. With the exception of /e o/, all oral vowels have a nasalised counterpart. There are numerous diphthongs.


Santali, like all Munda languages, is a suffixing agglutinating language.


Nouns are inflected for number and case.


Three numbers are distinguished: singular, dual and plural.


The case suffix follows the number suffix. The following cases are distinguished:


Santali has possessive suffixes which are only used with kinship terms: 1st person -ɲ, 2nd person -m, 3rd person -t. The suffixes do not distinguish possessor number.


The personal pronouns in Santali distinguish inclusive and exclusive first person and anaphoric and demonstrative third person. The interrogative pronouns have different forms for animate ('who?') and inanimate ('what?'), and for referential ('which?') vs. non-referential. Santali also has indefinite pronouns. The demonstratives distinguish three degrees of deixis (proximate, distal, remote) and simple ('this', 'that', etc.) vs. particular ('just this', 'just that') forms.


Santali numerals are given here in Latin-script IPA transcription. The numerals are used with numeral classifiers. Distributive numerals are formed by reduplicating the first consonant and vowel, e.g. "babar" 'two each'. Numbers largely follow a base-10 pattern. Numbers from 11 to 19 are formed by addition: "gel" ('10') is followed by the single-digit number (1 through 9). Multiples of ten are formed by multiplication: the single-digit number (2 through 9) is followed by "gel" ('10'). Some numbers belong to a base-20 system. 20 can be "bar gel" or "isi". 30 can be "pe gel" (3 × 10), "isi gel" (20 + 10) or "mit' isi gel" (1 × 20 + 10).
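The formation rules for numerals can be illustrated with a short Python sketch. It is an illustration only, not a full description of the numeral system: it uses only the forms attested in this section ("mit'" 1, "bar" 2, "pe" 3, "gel" 10, "isi" 20), the function names are hypothetical, and the reduplication helper assumes a plain Latin transcription in which the vowels are a, e, i, o, u.

```python
# Only forms attested in the text are listed; other digits are deliberately omitted.
DIGITS = {1: "mit'", 2: "bar", 3: "pe"}
TEN = "gel"
TWENTY = "isi"


def digit_name(d):
    """Look up a single-digit numeral, failing clearly if it is not listed."""
    if d not in DIGITS:
        raise ValueError(f"no attested form listed for digit {d}")
    return DIGITS[d]


def decimal_name(n):
    """Name n following the base-10 pattern (hypothetical helper)."""
    if 1 <= n <= 9:
        return digit_name(n)
    if n == 10:
        return TEN
    if 11 <= n <= 19:
        # Addition: "gel" (10) followed by the single-digit number.
        return f"{TEN} {digit_name(n - 10)}"
    if 20 <= n <= 90 and n % 10 == 0:
        # Multiplication: the single-digit number followed by "gel" (10).
        return f"{digit_name(n // 10)} {TEN}"
    raise ValueError(f"{n} is outside the patterns covered by this sketch")


def vigesimal_names(n):
    """Return the base-20 alternatives mentioned in the text, if any."""
    if n == 20:
        return [TWENTY]
    if n == 30:
        # 20 + 10, or 1 x 20 + 10
        return [f"{TWENTY} {TEN}", f"{digit_name(1)} {TWENTY} {TEN}"]
    return []


def distributive(word, vowels="aeiou"):
    """Reduplicate the first consonant and vowel (assumes a simple CV onset)."""
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[: i + 1] + word
    raise ValueError(f"no vowel found in {word!r}")


print(decimal_name(12))     # 10 + 2
print(decimal_name(30))     # 3 x 10
print(vigesimal_names(30))  # base-20 alternatives
print(distributive("bar"))  # 'two each'
```

Under these assumptions, `decimal_name(30)` gives "pe gel" and `distributive("bar")` gives "babar", matching the forms cited above. Digits not listed in the lookup table raise an error rather than producing a guessed form.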


Verbs in Santali inflect for tense, aspect, mood and voice, as well as for the person and number of the subject and sometimes of the object.

Subject markers

Object markers

Transitive verbs with pronominal objects take infixed object markers.


Santali is an SOV language, though topics can be fronted.

Influence on other languages

Santali, belonging to the Austroasiatic family, has retained its distinct identity and co-existed with languages belonging to the Indo-Aryan family in Bengal, Odisha, Jharkhand and other states. This affiliation is generally accepted, but many questions and puzzles remain. Borrowing between Santali and other Indian languages has not yet been studied fully. In modern Indian languages like Western Hindi, the steps of evolution from Midland Prakrit Sauraseni can be traced clearly. In the case of Bengali, such steps of evolution are not always clear and distinct, and one has to look at other influences that moulded Bengali's essential characteristics. A notable work in this field was initiated by linguist Byomkes Chakrabarti in the 1960s. Chakrabarti investigated the complex process of assimilation of Austroasiatic, particularly Santali, elements into Bengali. He showed the overwhelming influence of Bengali on Santali. His formulations are based on the detailed study of two-way influences on all aspects of both languages, and they try to bring out the unique features of the languages. More research is awaited in this area. Notable linguist Khudiram Das authored the Santali Bangla Samashabda Abhidhan, a book focusing on the influence of the Santali language on Bengali and providing a basis for further research on this subject. Bangla Santali Bhasha Samparka is a collection of essays in e-book format, authored by Das and dedicated to linguist Suniti Kumar Chatterji, on the relationship between the Bengali and Santali languages.

