New best practice word collection workshop hosted in Ghana

Photo of Ron Moe and Ernest Nniakyire with word collection results

The Rapid Word Collection Research Group was formed to investigate and promote a best practice method for using Ron Moe’s semantic domains to collect words in a workshop setting. Semantic domains function as categories of words, and each domain has a number of questions used to elicit words, such as “What words refer to water coming out of something?” As each question is relayed to the team members in the vernacular, their responses are written down. Running through all 1,800 semantic domains this way makes it possible to collect between ten and twenty thousand words in a two-week workshop.

For those who have run word collection workshops in the past, the process of collecting words was generally a success. Once the workshop was over, however, the impetus would be lost, and very few projects would find the time to deal with the mass of handwritten words in a way that resulted in useful data. Many such word collections are still waiting for processing and may never reach a usable state. To correct this deficiency, the research committee worked out a best practice method that adds glosses (English translations) and performs data entry during the workshop. The new instructions are highly prescriptive and clearly documented, and they have been published on a new website, http://rapidwords.net, to make them readily available to those both inside and outside SIL.

As a first pass, the research group sought out a project in an Anglophone region to showcase the new method. With the help of staff at GILLBT, the Buli language in Sandema, Ghana, was selected as a good candidate for the Rapid Word Collection method. The main workshop was held from January 23 to February 3, 2012, with additional work done in the weeks before and after the workshop to prepare the team and to process the data. Ron Moe, Art Cooper, and Doug Higby traveled there and followed the method as closely as possible with a team of roughly thirty Bulsa people who performed the various roles of collection, glossing, and typing. In the end, close to 15,000 raw words were collected. Since some of the words appeared in more than one category, as is the case for words with multiple senses, the number of lexical entries created was closer to 10,000. All of those entries are glossed in English and have a semantic domain code, which enables the words to be grouped in categories. At the end of the workshop, several copies of the draft Buli glossary were printed, and the local Bulsa project made plans to continue work on the database in order to have a more polished result.
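For readers who want a concrete picture of how raw words become lexical entries, the following is a minimal sketch in Python. It is not the software used at the workshop. The record layout, the function names, the placeholder word forms, and the domain codes are all assumptions made for illustration. It shows how records that share a word form can be merged into one entry with several senses, and how the domain code then lets entries be grouped by category.

```python
from collections import defaultdict

# Hypothetical raw records as they might be typed during a workshop:
# (word form, English gloss, semantic domain code).
# The word forms and domain codes are placeholders, not real Buli data.
raw_records = [
    ("word_a", "water", "1.3"),
    ("word_a", "to flow", "1.3.2"),
    ("word_b", "river", "1.3.1"),
    ("word_c", "to drip", "1.3.2"),
]


def merge_entries(records):
    """Merge raw records that share a word form into one entry with several senses."""
    entries = defaultdict(list)
    for word, gloss, domain in records:
        entries[word].append({"gloss": gloss, "domain": domain})
    return dict(entries)


def group_by_domain(entries):
    """Index the merged entries by semantic domain code."""
    by_domain = defaultdict(list)
    for word, senses in entries.items():
        for sense in senses:
            by_domain[sense["domain"]].append(word)
    return dict(by_domain)


entries = merge_entries(raw_records)
by_domain = group_by_domain(entries)

print(len(raw_records), "raw words ->", len(entries), "lexical entries")
print("Words in domain 1.3.2:", by_domain["1.3.2"])
```

In this toy data set, four raw words collapse into three entries because one word form was collected under two domains, which mirrors on a small scale the drop from roughly 15,000 raw words to roughly 10,000 entries. Merging on the word form alone would also combine unrelated homonyms, so a real database would still need review by speakers of the language.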

The results, however, are not to be measured only by what can appear in print. What the Rapid Word Collection method provides is a lexical tool that can be a tremendous aid to translation work even before any further work is done to polish it. SIL is currently building tools that will help translators access the lexical corpus in order to propose word choices in much the same way as a professional translator works with a dictionary and thesaurus.

The Buli experience was filmed by a professional team on site as part of the research team’s goal of promoting the new best practice. The video will soon be completed and made available on the website, in an effort to bring concreteness to a method of building a lexicon so radical that it can be mistaken for folklore.