Research Article

Sautex: A Language-Specific Phonetic Matching Algorithm for Resolving Spelling Variations in Hausa Personal Names

Authors

  • Bernard Ephraim, Department of Computing Sciences, Admiralty University of Nigeria, Ibusa, Delta State, Nigeria. ORCID: https://orcid.org/0000-0002-7348-5825

    ephraim1989ben@gmail.com

  • Ajah A. Ifeyinwa, Department of Computer Science, Ebonyi State University Abakaliki, Ebonyi State, Nigeria

Abstract

Spelling variations in personal names pose significant challenges for information retrieval and record linkage, particularly in low-resource languages such as Hausa. This paper presents Sautex, a phonetic encoding algorithm derived from the English Soundex system and adapted to the phonological structure of Hausa. Sautex was evaluated on a dataset of 17,591 Hausa name spelling attempts with edit distances ranging from 0 to 4. The H* variant achieves a phonetic match accuracy of 77.20% and a positive-class recall of 66.86%; the H** variant achieves 83.02% and 75.32%, respectively. Sautex outperforms the baseline English Soundex by up to 11.53 percentage points in accuracy and 16.76 percentage points in positive-class recall. These results demonstrate the viability of phonology-aware, language-specific encoding systems for African languages. Further studies could evaluate the algorithm on English names and its generalisation to other Nigerian names. The research aligns with two United Nations Sustainable Development Goals (SDGs): SDG 10 (Reduced Inequalities), by ensuring equitable digital representation of Hausa names, and SDG 9 (Industry, Innovation, and Infrastructure), by advancing localized NLP innovations.
Note:
i. Sautex is a blend of Sauti, the Hausa word for sound, and Soundex.
ii. H* indicates values for the Sautex code with the first character included.
iii. H** indicates values for the Sautex code with the first character excluded.
iv. English is abbreviated as Eng.
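
To make the evaluation setup concrete, the following minimal Python sketch implements the standard American Soundex (the baseline named in the abstract), compares codes with the first character included or excluded (in the spirit of the H* and H** variants), and computes accuracy and positive-class recall. It does not reproduce Sautex itself, since the abstract does not give the Hausa-specific letter mapping. The function names `soundex`, `codes_match`, and `evaluate`, and the four toy name pairs, are illustrative assumptions and are not taken from the paper's dataset.

```python
# Baseline American Soundex, NOT the Sautex mapping (which the abstract omits).
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels and unmapped letters get no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (first.upper() + "".join(digits) + "000")[:4]


def codes_match(a: str, b: str, include_first: bool = True) -> bool:
    """Compare two codes with the first character included or excluded."""
    ca, cb = soundex(a), soundex(b)
    return ca == cb if include_first else ca[1:] == cb[1:]


def evaluate(pairs, include_first: bool = True):
    """Return (accuracy, positive-class recall) over (a, b, same_name) triples."""
    tp = fn = correct = 0
    for a, b, same_name in pairs:
        predicted = codes_match(a, b, include_first)
        correct += predicted == same_name
        if same_name:
            tp += predicted
            fn += not predicted
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(pairs), recall


# Hypothetical toy pairs for illustration only (not from the paper's dataset).
toy_pairs = [
    ("Usman", "Osman", True),
    ("Muhammad", "Mohammed", True),
    ("Aisha", "Ayesha", True),
    ("Musa", "Bala", False),
]

print(evaluate(toy_pairs, include_first=True))
print(evaluate(toy_pairs, include_first=False))
```

On these toy pairs, "Usman" and "Osman" differ only in the retained first letter, so they match only when the first character is excluded. This is one way a first-character-excluded comparison can raise positive-class recall, although it may also admit more false matches on real data.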

Keywords:

African Language Technology, Hausa Natural Language Processing, Low-Resource Languages, Name Matching, Phonetic Encoding, Record Linkage, Soundex Algorithm

Article information

Journal

Journal of Computer, Software, and Program

Volume (Issue)

2(2), 2025

Pages

25-33

Published

16-11-2025

How to Cite

Bernard, E., & Ajah, A. I. (2025). Sautex: A Language-Specific Phonetic Matching Algorithm for Resolving Spelling Variations in Hausa Personal Names. Journal of Computer, Software, and Program, 2(2), 25-33. https://doi.org/10.69739/jcsp.v2i2.1141

