Preserving African Languages through Technology: A Conversation with Olanrewaju Samuel, Founder of Linguistics Island, a Community for Linguists.

Kabod is a language service provider that specialises in providing quality language services in African languages as well as Western languages. Our primary goal is to elevate African culture through our languages.

Mr. Olanrewaju Samuel is a graduate student in the Department of Linguistics at the University of Toronto (Canada). His research focuses on phonology, phonetics and computational linguistics: the study of speech-sound systems, how speech sounds are produced, and the application of computational methods to linguistic analysis. Alongside his academic work, Mr. Samuel founded Linguistics Island, a community that educates people about linguistics and offers scholars of African languages valuable opportunities.

Kabod: Can you provide us with a brief background or information about your community?

I have been running Linguistics Island since 2014. The idea behind the community is to connect linguists from all around Africa: those working on African languages, those interested in coming to talk to us about their languages, sharing opportunities and growing the African space. This vision predates my deeper involvement in natural language processing and computational linguistics.

The community has been growing ever since: from a small group of just five or six people, we now have over five hundred members.

We’ve had a lot of successes, and challenges too, but I think our successes outweigh our challenges. Members have gained scholarships, conference opportunities, job opportunities and much more.

Currently, we are working on a volunteer project which I call “Yawa Linguistics.” “Yawa” is a Pidgin word, and the name means “what is going on in linguistics?” Through it, we are creating the first open-source tonal dataset of proverbs, covering different tasks across the parallel languages represented in the group. These include Hausa, Yoruba, Ogba and other languages of Nigeria.

Kabod: What progress have you made so far with regards to Yawa Linguistics?

In terms of progress, everybody is getting their hands dirty with the data. We have a collaborative platform on which we are building the dataset. I held a symposium where I explained the modalities, the rationale, the methodology and everything else needed to push the project forward. So, in terms of progress, I would say we are already about 20 percent of the way there.

We started not long ago, and it is a big project that I don’t expect to finish in the next six months, because I want it to be representative enough. I want it to be written by Africans, because Africans must learn to write their own history and not let others tell it for them. So let us own it.

Kabod: Going back to your time in Rwanda, I know you taught natural language processing for linguists there. How was the experience for you? Can you share some key takeaways from your teaching?

I taught Natural Language Processing (NLP) in a nontechnical way, that is, without coding and without technical jargon, with an emphasis on Africans, particularly Rwandese, creating and managing their own data independently.

By relying on tools familiar to linguists rather than programming languages, we bridged the NLP gap through collaboration between NLP practitioners and native-speaker linguists from Rwanda.

Major highlights included widespread participation from across Rwanda and successful training sessions at the University of Rwanda. One key takeaway is that Africa is growing, and I’m very happy to be part of the community of people helping Africa grow technologically.

Kabod: As the leader of Yoruba dataset creation for the Aya project, can you share some of the challenges you faced and how you overcame them? While you are speaking on that, could you highlight the role your community members played in supporting the Aya project?

The Aya project was a fantastic initiative by Cohere For AI. The aim was to create a dataset collaboratively with native speakers. The first challenge was the data itself: the texts were not properly written, so we had to go through a series of orthographic corrections, including separating verbs from their objects.

We also had difficulty finding the right keyboard. Our workaround was to use Microsoft keyboard shortcuts to type the tone marks and the dots beneath special letters such as ẹ, ọ and ṣ.

The other challenge was maintaining contributor motivation. A lot of people rushed in at the initial stage of dataset creation, but at some point their motivation waned.

To address this, I organized weekly meetings to remind contributors of the project’s significance, which led to periodic surges in participation. These challenges underscored the importance of community involvement, dataset integrity and sustained motivation in linguistic projects.

Kabod: How do you envision the future of African linguistics, especially in the context of advancements in computational linguistics and AI chatbots?

I envision a future where Africa plays a pivotal role in AI and global development. Despite current challenges such as limited resources and expertise, initiatives like Masakhane and Mbaza NLP are driving progress. These efforts aim to empower African voices and create technologies that reflect our culture and language.

I believe the future holds more African datasets, products and projects where Africans are actively involved in their creation, shaping a more inclusive and representative digital landscape.