Sautex: A Language-Specific Phonetic Matching Algorithm for Resolving Spelling Variations in Hausa Personal Names
Abstract
Spelling variations in personal names pose significant challenges for information retrieval and record linkage, particularly in low-resource languages such as Hausa. This paper presents Sautex, a phonetic encoding algorithm derived from the English Soundex system and adapted to the phonological structure of Hausa. Sautex was evaluated on a dataset of 17,591 Hausa name spelling attempts with edit distances ranging from 0 to 4. The H* variant achieves a phonetic match accuracy of 77.20% and a positive-class recall of 66.86%; the H** variant achieves 83.02% and 75.32%, respectively. Sautex outperforms the baseline English Soundex by up to 11.53 percentage points in accuracy and up to 16.76 percentage points in positive-class recall. These results demonstrate the viability of phonology-aware, language-specific encoding systems for African languages. Further studies could evaluate the algorithm's performance on English names and its generalisation to other Nigerian names. The research aligns with two United Nations Sustainable Development Goals (SDGs): SDG 10 (Reduced Inequalities), by ensuring equitable digital representation of Hausa names, and SDG 9 (Industry, Innovation, and Infrastructure), by advancing localized NLP innovations.
Note:
i. Sautex is a blend of Sauti, which means "sound" in Hausa, and Soundex.
ii. H* indicates values for the Sautex code with the first character included.
iii. H** indicates values for the Sautex code with the first character excluded.
iv. English is abbreviated as Eng.
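To make the H* and H** distinction and the two reported metrics concrete, here is a minimal Python sketch. It implements the classic American Soundex baseline, not Sautex itself, because the Hausa-specific letter mapping is not given on this page. The `include_first` flag, the `evaluate` helper, and the three toy name pairs are assumptions introduced purely for illustration; they are not taken from the paper or its dataset.

```python
# Illustrative sketch only: classic American Soundex (the English baseline).
# This is NOT the Sautex mapping, which is not specified on this page.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str, include_first: bool = True) -> str:
    """Return the 4-character Soundex code, or only its 3 digits.

    include_first=True mirrors the "first character included" idea (H*);
    include_first=False mirrors "first character excluded" (H**).
    """
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # vowels and unmapped letters code to ""
        if code and code != prev:
            digits.append(code)
        if ch not in "HW":  # H and W do not break a run of equal codes
            prev = code
    tail = "".join(digits)[:3].ljust(3, "0")
    return letters[0] + tail if include_first else tail


def evaluate(pairs, include_first=True):
    """Return (accuracy, positive-class recall) over labelled name pairs.

    Each pair is (name_a, name_b, is_true_match); this labelling scheme
    is an assumption made for the example.
    """
    tp = tn = fp = fn = 0
    for a, b, label in pairs:
        pred = soundex(a, include_first) == soundex(b, include_first)
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical toy pairs, not drawn from the evaluation dataset.
PAIRS = [
    ("Usman", "Osman", True),
    ("Muhammad", "Mohammed", True),
    ("Aisha", "Bello", False),
]

print(soundex("Muhammad"), soundex("Mohammed"))  # M530 M530
print(soundex("Usman"), soundex("Osman"))        # U255 O255
print(evaluate(PAIRS, include_first=True))
print(evaluate(PAIRS, include_first=False))
```

In this toy set, the pair Usman/Osman differs only in its first letter, so it matches only when the first character is excluded from the code. This suggests one way a first-character-excluded variant could gain recall, though the paper's own analysis may attribute the difference to other factors.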
Keywords:
African Language Technology, Hausa Natural Language Processing, Low-Resource Languages, Name Matching, Phonetic Encoding, Record Linkage, Soundex Algorithm
Article information
Journal
Journal of Computer, Software, and Program
Volume (Issue)
2(2), (2025)
Pages
25-33
Published
Copyright
Copyright (c) 2025 Bernard Ephraim, Ajah A. Ifeyinwa (Author)
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
Stecab Publishing
