The Oxford-NINJAL Corpus of Old Japanese

The Oxford-NINJAL Corpus of Old Japanese (ONCOJ) is a long-term research project that aims to develop a comprehensive annotated digital corpus of all extant texts in Japanese from the Old Japanese period. It is a collaborative project, led by Oxford, with the National Institute for Japanese Language and Linguistics (NINJAL) in Japan. The project has been adopted as an Academy Project by the British Academy, which also supports it.

Old Japanese is the earliest attested stage of the Japanese language, largely the language of the Asuka and Nara periods of Japanese history (7th and 8th centuries AD). This is the formative literate period upon which the development of Japanese civilization is based, and the texts of this period are of paramount importance for the study and understanding of the origins and development of Japanese civilization, including language, writing, literature, religion, history, and culture.

The ONCOJ is accessed through this website, which also introduces the corpus. The corpus is freely available and open to the public.

The ONCOJ has four core components:

  1. Texts: The corpus will contain all extant texts in Japanese from the Old Japanese period. At present the great majority of texts are included and the remainder are in preparation. The texts are presented in a phonemic transcription and also include the original script.
  2. Annotation: The texts are fully lemmatized and encode a large amount of information. At present this is primarily linguistic information (writing mode, phonology, morphology, and syntactic constituency). The expandable format allows information of any kind to be added continuously; literary, biographical, historical, and other relevant information about the texts and their content will be added to the current annotation.
  3. Translations: The texts are being supplied with translations into English; currently well over a third of the texts have side-by-side translations.
  4. Dictionary: A bilingual Old Japanese – English dictionary is being developed alongside the corpus and as an integral part of it. The texts and the dictionary are linked in both directions through unique identifiers assigned to each lemma.
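
The two-way linking between texts and dictionary through lemma identifiers can be pictured with a minimal sketch. It does not reflect the actual ONCOJ data format: the identifier scheme (`L0001`, `T1`), the field names, and the helper functions are all hypothetical, and the sample entries are made up purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text_id: str    # hypothetical identifier of the text
    position: int   # position of the token within that text
    form: str       # transcribed form as it occurs in the text
    lemma_id: str   # hypothetical unique lemma identifier


# Illustrative dictionary entries keyed by lemma identifier (not ONCOJ data).
dictionary = {
    "L0001": {"headword": "yama", "gloss": "mountain"},
    "L0002": {"headword": "kapa", "gloss": "river"},
}

# Illustrative lemmatized tokens from two hypothetical texts.
tokens = [
    Token("T1", 0, "yama", "L0001"),
    Token("T1", 1, "kapa", "L0002"),
    Token("T2", 0, "yama", "L0001"),
]


def lookup_entry(token):
    """Text -> dictionary: follow a token's lemma identifier to its entry."""
    return dictionary[token.lemma_id]


def build_occurrence_index(all_tokens):
    """Dictionary -> text: map each lemma identifier to where it occurs."""
    index = defaultdict(list)
    for tok in all_tokens:
        index[tok.lemma_id].append((tok.text_id, tok.position))
    return dict(index)


occurrences = build_occurrence_index(tokens)
```

In this sketch, `lookup_entry(tokens[1])` goes from a word in a text to its dictionary entry, while `occurrences["L0001"]` goes from a dictionary entry back to every place the lemma is attested. Both directions rely only on the shared lemma identifier.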

The corpus is searchable through a suite of sophisticated search facilities.