Planning a Rapid Word Collection workshop

The Rapid Word Collection model is an easy one to follow, but it does take work to set up and host a workshop that will result in more than 10,000 words. SIL’s goal is to support you in your efforts, provided that the set of lexical data you create is made freely available under the Creative Commons Attribution-ShareAlike license. Simply put, this means that people are free to copy, distribute, and transmit the data for non-commercial purposes, and are free to adapt or modify it as needed.

Running a word-collection workshop can be costly. While the motivation and effort must come from inside the language community, it can be difficult for participants to suspend their means of livelihood for two weeks or more in order to help at the workshop.  In addition, the participants need to be fed, and in some cases must be given lodging.  SIL International does not have internal funding to help with these costs, but our goal is to match your project with a funding agency that cares about language development.  By filling out a funding-request form, you can estimate the cost of running an RWC workshop, and hopefully we can match your request with a donor and with a consultant who can help you in its implementation.


To find out whether your language group is a candidate for this workshop, you must analyze the language situation.  The following factors must all be present in order to proceed:

1. A motivated language community

It is important that the language community already be interested in preserving and promoting their language. Rapid Word Collection isn’t something that can be used as a last-ditch effort to preserve a dying language if there isn’t enough interest. The more motivated the community, the better the results you will get.

2. A functional orthography

If the language has not been written before, there is no way that you will be able to make much progress by guessing at all the sounds and letters. This is not to say that the orthography must be approved and published. It could be provisional, or could be similar enough to a different language that writing it is fairly straightforward for those literate in the other language.

3. A sufficient number of literate people

There needs to be at least one Scribe for each of the six word-collection teams.

4. A sufficient number of participants who are bilingual

There need to be at least eight people—twelve would be ideal—who are bilingual in the vernacular language and the language the RWC Questionnaire is written in. They are needed to serve as Team Leaders and Glossers. (The RWC Questionnaire has been translated into about 10 major languages.)

5. A Workshop Coordinator

The Workshop Coordinator is someone from the area who takes on the responsibility of planning the workshop, identifying the participants, and arranging for specific ones to come for an additional three days of training prior to the main workshop.  (The Coordinator will also receive training at that time.)


The workshop requires a minimum of 35 participants who perform the various roles that are described in the Workshop Coordinator’s Manual. They are as follows:

  1. Coordinator* (1) - oversees the word-collection process
  2. Logistics Manager (1) - takes care of the needs of the participants
  3. Record-keeper* (1) - records progress and ensures adherence to data standards
  4. Glossers* (4-6) - add glosses to the collected words in a language of wider communication (LWC)
  5. Typists* (4-6) - enter the collected words into computers
  6. Team Leaders* (6) - keep collection teams on track and interpret the semantic domain questions for their team members
  7. Scribes* (6) - write the words down as they are “found” by various members of the team
  8. Language Experts (12-18) - participants who help collect words
  9. Spelling Experts (2-3) - Scribes or Language Experts who have mastered the vernacular orthography and are invited to stay for the final cleanup week

*Roles marked with an asterisk will attend three days of training the week prior to the workshop.
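
The minimum of 35 participants can be checked by adding up the role counts in the list above. The short Python sketch below is only an illustration: the dictionary and function names are made up for this example, and it assumes that the Spelling Experts are not counted separately because they are drawn from the Scribes and Language Experts.

```python
# Head counts per role as (minimum, maximum), copied from the role list above.
# Spelling Experts are left out on the assumption that they are chosen from
# among the Scribes and Language Experts already counted here.
ROLES = {
    "Coordinator": (1, 1),
    "Logistics Manager": (1, 1),
    "Record-keeper": (1, 1),
    "Glossers": (4, 6),
    "Typists": (4, 6),
    "Team Leaders": (6, 6),
    "Scribes": (6, 6),
    "Language Experts": (12, 18),
}

# Roles marked with an asterisk, i.e. those who attend the pre-workshop training.
PRE_TRAINED = {
    "Coordinator",
    "Record-keeper",
    "Glossers",
    "Typists",
    "Team Leaders",
    "Scribes",
}


def headcount(roles, only=None):
    """Return (minimum, maximum) head count for all roles or a chosen subset."""
    names = roles.keys() if only is None else only
    low = sum(roles[name][0] for name in names)
    high = sum(roles[name][1] for name in names)
    return low, high


if __name__ == "__main__":
    print("Whole workshop:", headcount(ROLES))
    print("Pre-workshop training:", headcount(ROLES, PRE_TRAINED))
```

Under these assumptions the whole workshop needs between 35 and 45 people, of whom 22 to 26 attend the pre-workshop training. These figures can be used when estimating food and lodging costs on the funding-request form.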

The Event

Pre-workshop Training

In most cases, a consultant will come to train the Workshop Coordinator, Team Leaders, and other key participants during the last three days of the week prior to the workshop. The consultant is someone with prior experience in a word-collection workshop who will help the Coordinator implement the practices recommended in the documentation. The three days of training introduce the RWC method and cover setting up the environment and organizing materials. The consultant will stay for the first week of the word-collection workshop and will be responsible for training the Language Experts on the first day. The consultant will also serve as a resource person for the Coordinator, helping to resolve any issues that arise during the word-collection phase, and will prepare the Coordinator for the task of editing the collected data during the week after the word collection is finished.

The Workshop

Participants who are traveling will arrive the day before the workshop, and local participants will come on the morning of the first day. At the beginning of the first day, the new participants (i.e., the Language Experts) will be oriented to the word-collection method, after which the word collecting will begin, starting with some of the easier domains. Divide the participants into six Teams of four or five people, making sure that each Team has a Leader and a Scribe.

The semantic domains and questions are organized into a series of folders, and each Team works on one folder at a time. Each folder contains a number of pages from the RWC Questionnaire, stapled together, covering a series of related semantic domains along with questions designed to help elicit words in the vernacular language. Although the domain description and questions are in the LWC, the Team Leader conveys the sense of the domain description to the group in the vernacular. The word-collection process then begins, as the Language Experts say the words in their language that come to mind. When the Language Experts find it difficult to think of more words related to that domain, the Team Leader may translate one or more of the questions in that portion of the Questionnaire to help them recall more words that pertain to the domain.

The words are written down on a specially formatted Response Sheet. When the flow of words and expressions related to a particular domain slows to a trickle and all of the catalytic questions have been used, the group moves on to the next domain, recording the words belonging to that domain on a different sheet of paper so as not to mix the contents of one domain with another. When all of the domains in the folder have been dealt with, the folder is turned in to the Record-keeper.

The Record-keeper makes note of the progress on a spreadsheet, then passes the folder to the Glossers, who add a short description or word in the LWC for each vernacular word written on the Response Sheets. When one of the Glossers completes his work on a folder, he passes it to the Typists, who enter both words and glosses into the computer. The teams are encouraged from time to time with public announcements of their progress and projections of how many words can be expected at the end of the workshop, based on progress to that point. When all the domains are completed, if there is time left over, some of the early domains can be reviewed and more words added.  At the end of the workshop, a formatted draft of the lexicon is printed directly from either FLEx or WeSay and put on display during the closing ceremony.
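
The projection announced to the teams can be as simple as extending the average daily rate so far over the full length of the workshop. The Python sketch below shows one way to do this. It is not the method prescribed by the Workshop Coordinator's Manual: the function name, the ten-day workshop length, and the word counts are all hypothetical, and a straight-line estimate ignores the fact that the easier domains are collected first.

```python
def project_total(words_so_far, days_elapsed, total_days):
    """Straight-line projection of the final word count.

    Assumes the teams keep collecting at the same average daily rate
    for the remaining days of the workshop.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be at least 1")
    if total_days < days_elapsed:
        raise ValueError("total_days cannot be less than days_elapsed")
    daily_rate = words_so_far / days_elapsed
    return round(daily_rate * total_days)


if __name__ == "__main__":
    # Hypothetical figures: 4,200 words typed after 4 of 10 working days.
    print(project_total(words_so_far=4200, days_elapsed=4, total_days=10))
```

With these example figures the average is 1,050 words per day, which projects to 10,500 words by the end of the workshop. The Record-keeper's spreadsheet could apply the same calculation each evening.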


Admittedly, numerous errors will exist in the database at this point. The word-collection workshop itself has two steps that help catch errors. First, those who glossed the words were not the same people who furnished them, so the Glossers will already have resolved some of the spelling issues in words they could not understand. Second, the Typists had to confirm the legibility of the words during data entry. Unfortunately, some errors will still slip through. So rather than ending the workshop here, several of the Scribes or Language Experts who understand the vernacular orthography well will stay on for an additional week. They will make corrections by hand on the draft document, and the Typists will enter those corrections in the database. By the end of this final week, the data should be in a relatively clean state, at which point two or three copies will be printed and bound for the language community, each clearly marked as a draft. The database will be given to the sponsoring agency or intermediary to be put up on the internet within the next 30 days.

Ongoing Work

The computers that are provided for the word-collection workshop will have the WeSay or FLEx lexical database program installed, and these computers (if purchased via funding for the workshop) will normally remain the property of the language community at the end of the workshop. The software allows collaborators to continue working on the lexical data by adding parts of speech, grammatical information, example sentences, and full definitions. Work is currently in progress on a web portal that will allow the same data set to be edited from any web browser. Because the data is protected under the Creative Commons license, it will remain available for any ongoing effort at language development in the community.