Publications and Projects

Publications

  • J. A. E. Wibawa, S. Sarin, C. F. Li, K. Pipatsrisawat, K. Sodimana, O. Kjartansson, A. Gutkin, M. Jansche, and L. Ha, “Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 7-12 May 2018, Miyazaki, Japan, 2018, pp. 1610–1614. [Online]. Available: https://ai.google/research/pubs/pub46929
  • K. Sodimana, P. De Silva, S. Sarin, and K. Pipatsrisawat, “A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese.” [Online]. Available: https://ai.google/research/pubs/pub47347
  • K. Sodimana, P. De Silva, R. Sproat, T. Wattanavekin, A. Gutkin, and K. Pipatsrisawat, “Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala and Sundanese Text-to-Speech Systems.” [Online]. Available: https://ai.google/research/pubs/pub47344
  • A. Butryna, S.-H. C. Chu, I. Demirsahin, A. Gutkin, L. Ha, F. He, M. Jansche, C. Johny, A. Katanova, O. Kjartansson, C. Li, T. Merkulova, Y. M. Oo, K. Pipatsrisawat, C. Rivera, S. Sarin, P. De Silva, K. Sodimana, R. Sproat, T. Wattanavekin, and J. A. E. Wibawa, “Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview.” [Online]. Available: https://research.google/pubs/pub48928/

Projects

  • Sinhala GPT2
    The first open-source GPT-2 model trained on a Sinhala dataset
  • Sinhala BERT
    An open-source BERT model trained on a Sinhala dataset