New best practice word collection workshop hosted in Ghana

Photo of Ron Moe and Ernest Nniakyire with word collection results

The Rapid Word Collection Research Group was formed to investigate and promote a best practice method for using Ron Moe’s semantic domains to collect words in a workshop setting. Semantic domains function as categories of words, and each domain has a number of questions used to elicit words, such as “What words refer to water coming out of something?” Each question is relayed to the team members in the vernacular and their responses are written down. By running through all 1,800 semantic domains this way, a two-week workshop can collect between ten and twenty thousand words.

For those who have run word collection workshops in the past, the process of collecting words was generally a success. Once the workshop was over, however, the impetus would be lost, and very few projects would find the time to turn the mass of handwritten words into useful data. Many such word collections are still waiting for processing and may never reach a usable state. To correct this deficiency, the research committee worked out a best practice method that adds glosses (English translations) and performs data entry during the workshop. The new instructions are highly prescriptive and clearly documented, and they have been published on a new website, http://rapidwords.net, to make them readily available to those both inside and outside SIL.

As a first pass, the research group sought out a project in an Anglophone region to showcase the new method. With the help of staff at GILLBT, the Buli language in Sandema, Ghana, was selected as a good candidate for the Rapid Word Collection method. The main workshop was held from January 23 to February 3, 2012, with additional work done in the weeks before and after the workshop to prepare the team and to process the data. Ron Moe, Art Cooper, and Doug Higby traveled there and followed the method as closely as possible with a team of roughly thirty Bulsa people who performed the various roles of collection, glossing, and typing. In the end, close to 15,000 raw words were collected. Since some of the words appeared in more than one category, as happens when a word has multiple senses, the number of lexical entries created was closer to 10,000. All of those entries are glossed in English and carry a semantic domain code, which enables the words to be grouped in categories. At the end of the workshop, several copies of the draft Buli glossary were printed, and the local Bulsa project made plans to continue work on the database in order to have a more polished result.
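The step from raw words to lexical entries can be pictured with a minimal Python sketch. It is only an illustration of the idea, not the workshop's actual software: the word forms, glosses, and domain codes are hypothetical placeholders, and the names `build_entries` and `group_by_domain` are invented for this example. It assumes that a form collected under several domains becomes one entry with several senses.

```python
from collections import defaultdict

# Hypothetical raw records as typed during a workshop:
# (vernacular form, English gloss, semantic domain code).
# None of these values come from the Buli data.
raw_words = [
    ("form_a", "flow", "D1"),
    ("form_b", "river", "D1"),
    ("form_a", "leak", "D2"),
    ("form_c", "drip", "D2"),
]

def build_entries(records):
    """Merge raw records into one entry per form, one sense per record."""
    entries = defaultdict(list)
    for form, gloss, domain in records:
        entries[form].append({"gloss": gloss, "domain": domain})
    return dict(entries)

def group_by_domain(entries):
    """Use the domain code on each sense to list forms by category."""
    by_domain = defaultdict(list)
    for form, senses in entries.items():
        for sense in senses:
            by_domain[sense["domain"]].append(form)
    return dict(by_domain)

entries = build_entries(raw_words)
by_domain = group_by_domain(entries)
print(len(raw_words), "raw words ->", len(entries), "lexical entries")
```

In this toy data, four raw words collapse into three entries because `form_a` was collected under two domains, which mirrors on a small scale how roughly 15,000 raw words yielded closer to 10,000 entries.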

The results, however, are not to be measured only by what can appear in print. What the Rapid Word Collection method provides is a lexical tool that can be a tremendous aid to translation work even before any further work is done to polish it. SIL is currently building tools that will help translators access the lexical corpus in order to propose word choices, in much the same way as a professional translator works with a dictionary and thesaurus.

The Buli experience was filmed on site by a professional team as part of the research team’s goal of promoting the new best practice. The video will soon be completed and made available on the website, in an effort to make concrete a method of building a lexicon so radical that it can be mistaken for folklore.

First meeting of Rapid Word Collection Research Group

Ron Moe of SIL International has developed a set of semantic domains and accompanying questions that enable the rapid elicitation of words. He has been promoting it for years as the Dictionary Development Process (DDP). Although teams that have tried it have often struggled to manage the resulting mass of data, it is unarguably the best and most efficient method of collecting words. Today, despite its potential, the word collection method is woefully underutilized in our organization. Our projects typically build a lexical database of only a few thousand words despite many years of engagement. By contrast, experience with the rapid word collection method has shown that it is possible to collect over 15,000 words in just two weeks at an early point in a project, providing a rich lexical resource for both language development and Bible translation. The Rapid Word Collection Research Group was formed to find out why the method has not been adopted wholesale, to address any flaws in the method, and to repackage and rebrand it in a way that is destined to succeed and to attract outside funders. We're calling it an extreme makeover for DDP.

For a week, our research group worked together to investigate the reports of word collection workshops that had taken place, noting what worked and what didn't. We went over the existing DDP documentation that explains how to run a word collection workshop, and we distilled its message into a very clearly defined set of instructions based on what has shown itself to be the best practice. We even spent ten minutes arguing over whether or not the workshop leader should staple certain word collection pages together! Our goal was to ensure that the words were not only collected as efficiently as possible but also entered into the computer at the same time. Far too many workshops have ended in reams of paper with scribbles on them that are now collecting dust in some corner! The revised plan specifies a total of 30 local participants, including people dedicated to glossing the words and others dedicated to data entry. We believe that following the new parameters closely will bring a minimum result of 15,000 words that are glossed, semantically tagged, and accessible on the Internet for further research. Where those requirements called for extra time or resources, we included them in the project plan and budget.

It soon became evident that we needed to extend the two weeks to a month by adding a week of advance preparation and a following week to clean up the data. In addition, we discussed specific areas where we can further test and confirm this best practice method while documenting it on video. There is funding already available to run the first couple of workshops, and we will be actively pursuing major funding from outside agencies that are interested in vernacular education, language documentation, or language development.

All the revised and expanded materials for Rapid Word Collection will soon be available on a multilingual website, including a downloadable form that will help any language project do the planning and budgeting needed to host such a workshop in its location. As funding becomes available, we hope to be able to provide the means for word collection to any language group that would benefit from it. Stay tuned for the website announcement, which will come sometime in the last quarter of 2011!