Khmer Misspelling Correction Model with Deep Learning Approach to Improve Khmer Typing Accuracy in Educational Context

Starting: 01/11/2022
PhD Student: Seanghort Born
Advisor(s): Madeth May

Digital technology is becoming ingrained in our lives as we move toward Industry 4.0. Before digital devices existed, we recorded information by hand; now we type. Typing errors, especially misspellings, can compromise a text, undermine the typist's confidence, and cause social disruption. Compared with other major languages, few studies and tools address Khmer misspelling. Khmer language users who make typographical errors face a real challenge, owing to the complexity of the Khmer language and the lack of good spelling suggestions, especially in academic settings. The purpose of this research is to develop a Khmer language model capable of correcting misspelled Khmer words in the education field, with three objectives:

  • to increase Khmer typists’ confidence,
  • to contribute to and promote national literature in academia,
  • and to develop assistant tools for the most widely used text writing applications, including a Microsoft Word add-on and a Chrome extension.

We will propose a model for correcting Khmer misspellings that uses a deep learning technique: a BERT-based model that learns from data comprising 100,000 words from the Khmer dictionary.
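
As an illustration only, the sketch below shows one common way a dictionary word list could be turned into (misspelled, correct) training pairs for a correction model, by injecting synthetic character-level errors. This is an assumption about data preparation, not the project's stated method: the sample words, the function names `corrupt` and `make_pairs`, and the three error types (delete, swap, substitute) are all hypothetical, and the errors operate on Unicode code points rather than on Khmer orthographic clusters.

```python
import random

# Hypothetical sample; the project's actual 100,000-word dictionary is not shown here.
SAMPLE_WORDS = ["សាលា", "សៀវភៅ", "គ្រូ"]


def corrupt(word, alphabet, rng):
    """Return a copy of `word` with one random code-point-level edit applied."""
    chars = list(word)
    op = rng.choice(["delete", "swap", "substitute"])
    i = rng.randrange(len(chars))
    if op == "delete" and len(chars) > 1:
        # Drop one code point.
        del chars[i]
    elif op == "swap" and len(chars) > 1:
        # Exchange the chosen code point with a neighbour.
        j = i + 1 if i + 1 < len(chars) else i - 1
        chars[i], chars[j] = chars[j], chars[i]
    else:
        # Replace the chosen code point with a different one from the alphabet.
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def make_pairs(words, per_word=2, seed=0):
    """Build (noisy, clean) pairs; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(words)))
    pairs = []
    for word in words:
        for _ in range(per_word):
            noisy = corrupt(word, alphabet, rng)
            while noisy == word:  # retry if the edit happened to change nothing
                noisy = corrupt(word, alphabet, rng)
            pairs.append((noisy, word))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_pairs(SAMPLE_WORDS):
        print(noisy, "->", clean)
```

Pairs of this kind could serve as supervised examples for fine-tuning; realistic Khmer typing errors (for instance, confusions between visually similar characters or misplaced subscript consonants) would likely need a more careful error model than the random edits assumed here.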