Dr. Medari Tham Web Page
  • Home
  • Ka Grammar Khasi
    • Ka Subject bad ka Verb
    • Ka Subject bad ka Object
    • Ki Pronominal Marker
  • NLP Tools
    • Khasi POS Taggers
    • Khasi Shallow Parser
  • References
    • Journals and Conferences
  • Resources
  • Seminar/Others
  • About Us

Ka Grammar Khasi
Da ka Jingdro

Companion Website

Welcome

The "THAM Khasi Annotated Corpus" is now available via the European Language Resources Association (ELRA):

http://catalog.elra.info/en-us/repository/browse/ELRA-W0321/

Alternatively, you may contact us.

Khasi Annotated Corpus

What is the Khasi annotated corpus?

Researchers working in natural language processing need language resources, and an annotated corpus is one such resource. The Khasi annotated corpus is tagged with parts of speech using the BIS (Bureau of Indian Standards) tagset, the standard annotation scheme prescribed for Indian languages. The corpus was used for language modeling in developing the Khasi POS taggers and the Khasi shallow parser listed in the NLP Tools section. The data consists of Khasi sentences extracted from textbooks prescribed for secondary, higher secondary, undergraduate, and postgraduate students in 2015-2016. In total, the corpus contains 4,386 sentences and 83,312 words (5,465 word types), which amounts to 94,651 tokens when punctuation is included.
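The distinction between words, word types, and tokens in the figures above can be illustrated with a minimal sketch. It assumes a hypothetical plain-text layout of one sentence per line with `form/TAG` pairs, and assumes punctuation carries the tag `RD_PUNC`; the actual distribution format of the corpus may differ. The sample lines use placeholder forms and are not drawn from the corpus.

```python
from collections import Counter

# Assumption: punctuation is tagged RD_PUNC in the tagged text.
PUNCT_TAG = "RD_PUNC"


def corpus_stats(lines):
    """Count sentences, words, tokens, and word types.

    Assumes one sentence per line and whitespace-separated
    'form/TAG' items (a hypothetical layout, not the confirmed
    format of the distributed corpus).
    """
    sentences = 0
    tokens = 0          # every item, punctuation included
    words = 0           # items that are not punctuation
    types = Counter()   # distinct lowercased word forms

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sentences += 1
        for item in line.split():
            # Split on the last '/', so forms containing '/' survive.
            form, _, tag = item.rpartition("/")
            tokens += 1
            if tag != PUNCT_TAG:
                words += 1
                types[form.lower()] += 1

    return {
        "sentences": sentences,
        "words": words,
        "tokens": tokens,
        "word_types": len(types),
    }


# Placeholder data for illustration only.
sample = [
    "alpha/N_NN beta/V_VM ./RD_PUNC",
    "alpha/N_NN gamma/JJ ,/RD_PUNC beta/V_VM ./RD_PUNC",
]

print(corpus_stats(sample))
```

Counting punctuation as tokens but not as words mirrors the way the page reports 83,312 words against 94,651 tokens.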


Copyright © 2024 Ka Grammar Khasi - All Rights Reserved.

