09:00-10:00 Session PL4: Plenary 4: Naoaki Okazaki

Plenary speech 4:

How Deep Learning Changes Natural Language Processing

Naoaki Okazaki (Tokyo Institute of Technology, Japan)

Location: Conference Hall
10:00-10:20 Coffee Break

Entrance Lobby of Kagawa International Conference Hall, Tower building 
6F, Takamatsu Symbol Tower

10:20-11:50 Session 5A
Using a Large Social Science Corpus to Build an Automatic Writing Suggestion System

ABSTRACT. There is an increasing need for ESL/EFL writers to publish their papers in various academic disciplines. It is, however, a difficult task for many non-native English speakers to write formal academic research papers. Some CALL (Computer Assisted Language Learning) and EAP (English for Academic Purposes) researchers have begun to develop new tools for ESL/EFL writers. For example, Mizumoto (2017) collected professional journal articles and developed a writing tool called AWSuM (Academic Word Suggestion Machine). In this study, we used a new corpus crawling tool called AntCorGen, developed by Laurence Anthony at Waseda University, to download 28,000 social science articles from the PLOS ONE open-access database. These research articles were then compiled into a large academic corpus. With this large social science corpus, we were able to compile useful teaching and learning materials. About 400,000 lexical bundles of four to six words were extracted and loaded into a large online database. This lexical bundle database was then used to support the development of an online academic writing suggestion tool. The tool allows ESL/EFL students to input any word(s), and the system automatically suggests commonly used lexical bundles that follow the search word(s). The writing suggestion tool was also made available to a group of graduate students in Taiwan. The feedback from students was very positive. Students found that the new tool helped them write their abstracts and papers more easily. However, students also noticed that the system sometimes generated too many suggestions for some search words. To further enhance its performance, we plan to revise the search algorithm. For long-term development, in addition to social science, corpora from other academic disciplines can also be loaded into this writing support system.
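
The extraction and lookup steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names `extract_bundles` and `suggest`, the whitespace tokenizer, the frequency threshold, and the rule that a suggestion must begin with the search word(s) are all assumptions made for illustration.

```python
from collections import Counter

def extract_bundles(sentences, min_n=4, max_n=6, min_freq=2):
    """Count every 4-6-word sequence and keep those seen at least min_freq times."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {bundle: freq for bundle, freq in counts.items() if freq >= min_freq}

def suggest(bundles, query, limit=5):
    """Return the most frequent bundles that begin with the query word(s)."""
    prefix = query.lower().strip() + " "
    hits = [(b, f) for b, f in bundles.items() if b.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for stable output.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits[:limit]

# Toy three-sentence "corpus" standing in for the 28,000 downloaded articles.
sample = [
    "the results of the present study suggest a clear effect",
    "the results of the present study indicate a weak effect",
    "in the context of the present study we report two findings",
]
bundles = extract_bundles(sample)
print(suggest(bundles, "the results"))
```

The `limit` parameter is one simple way to cap the number of suggestions, which relates to the students' remark that some search words return too many hits.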

The Use of Active Reading to Integrate Language Skills: A Data-Driven Learning Approach

ABSTRACT. The aim of this study is to assess the use of active reading to integrate language skills through data-driven learning (DDL) in a Sophomore English Reading (ER) class at the University of Shimane, which meets twice a week for 45 minutes. The ER class uses two primary learning procedures: a Moodle-based activity and a collaborative paper-based DDL activity. The Moodle-based activity consists of four learning tasks: (1) reading an article; (2) answering questions; (3) sharing learners’ comments through collaborative learning in groups of four students; and (4) each student writing his or her comments as a first draft. Following the Moodle activity, learners complete the paper-based DDL activity as follows: (1) reading a DDL worksheet that shows concordances in keyword in context (KWIC) format; (2) guessing and writing the keyword (task 1); (3) writing their discoveries in both English and Japanese, focusing on the left of the keyword (task 2); (4) writing their discoveries, focusing on the right of the keyword (task 3); (5) sharing their discoveries in groups of four; (6) reading NSs’ comments concerning the article; (7) sharing their comments; and (8) writing their comments as a second draft. Based on these two activities, this study developed a learner corpus of the first and second drafts. Quantitative and qualitative analyses of learners’ vocabulary and grammar were conducted using n-gram analysis to assess how learners develop their usage of both. The study also focuses on learners’ discoveries in DDL and verifies their skills through multiple visualized analyses (correspondence analysis, self-organizing maps, and word clouds). The results suggest that productive DDL practice that integrates the four skills in English Reading classes is an effective teaching method.
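
As an illustration of the KWIC worksheet format mentioned above, the sketch below builds concordance lines and masks the keyword so that it can be guessed, as in task 1. The helper names `kwic` and `mask`, the window width, and the sample sentence are hypothetical and are not taken from the study's materials.

```python
def kwic(tokens, keyword, width=4):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def mask(lines, blank="_____"):
    """Hide the keyword and align the contexts around the blank."""
    return [f"{left:>30}  {blank}  {right}" for left, _, right in lines]

tokens = "she made a decision and he made a mistake".split()
concordance = kwic(tokens, "made", width=2)
for row in mask(concordance):
    print(row)
```

Right-aligning the left context keeps the blank in a single column, which is what lets learners compare the words to the left (task 2) and to the right (task 3) of the keyword.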

Prefixes and suffixes in Japanese junior high school English textbooks

ABSTRACT. Acquiring morphological knowledge is one of the essential keys to expanding learners’ vocabulary. Learning this knowledge requires a huge amount of exposure to the target language, as well as systematic instruction; however, it is unclear how much input of derived words Japanese junior high school students obtain from their textbooks. This is because previous studies mainly focused on words, not on prefixes and suffixes (cf. Chujo, 2004). This study aims to show 1) the frequency of prefixes and suffixes in textbooks and 2) the frequency of prefixed and suffixed words in textbooks. By comparing these frequencies with the Affix Levels proposed by Bauer and Nation (1993), this study suggests which prefixes and suffixes need more input or instruction. Our textbook corpus consists of all English textbooks for first-, second- and third-grade Japanese junior high school students published by six companies, for a total of 18 textbooks with 222,599 tokens. All prefixes and suffixes in the corpus were identified and categorized following Laws and Ryder (2014), who followed the semantic and functional categorizations by Stein (2007). After analyzing the corpus, we identified 760 types of derived words with 10 types of prefixes and 79 types of suffixes. Among those prefixes and suffixes, some were highly frequent and attached to various types of bases; for example, -er appears 857 times and is attached to 314 different bases. Though -able is at the same level as -er in the Affix Levels, it appears only 25 times and is attached to six different bases. These results indicate that, outside of textbooks, students need more exposure to or systematic instruction in those prefixes and suffixes that have low frequencies and are attached to few types of bases, such as un-, -ish, and -less.

References Bauer, L. and Nation, P. (1993). Word Families. International Journal of Lexicography, 6(4), pp. 253-279. Chujo, K. (2004). Measuring vocabulary levels of English textbooks and tests using a BNC lemmatised high frequency word list. In: J. Nakamura, N. Inoue, & T. Tabata (Eds.), English corpora under Japanese eyes (pp. 231–249). Amsterdam/New York: Rodopi. Laws, J. and Ryder, C. (2014). Getting the measure of derivational morphology in adult speech: a corpus analysis using MorphoQuantics. Language Studies Working Papers, 6, pp. 3-17. Stein, G. (2007). A Dictionary of English Affixes: Their Function and Meaning. Munich: Lincom Europa.

10:20-11:50 Session 5B
A Usage-based Approach to Criterial Canonical Construction in an L2 Learner Corpus

ABSTRACT. Usage-based approaches to canonical constructions have been widely studied, and the canonical associations between constructions and verbs have been clarified through corpus-based studies. So far, L2 learners’ canonical construction development has been investigated and described in some detail (e.g., N. Ellis, Römer, & O’Donnell, 2016). Recently, construction grammar researchers have been assuming and investigating canonical embodied associations between argument realization and verbs (e.g., Bergen & Chang, 2013; Perek, 2015). Among them, Radden and Dirven (2007, p. 298) propose eleven canonical verb-argument constructions related to canonical event schemas (e.g., Self-motion), which are assumed to form a cognitive linguistic and potentially pedagogical framework. Based on their framework, a previous corpus-based study shows that Japanese EFL learners at the CEFR A1-A2 levels tend to use mainly the Occurrence schema with copula-based SVC (e.g., This is a dream.), the Self-motion schema with SV (e.g., I go to school.), the Possession schema with SVO (e.g., I have a lot of money.), and the Emotion schema with SVO (e.g., I like it.) in L2 writing. In contrast, the current study focuses on the preferred canonical constructions of intermediate and advanced L2 learners with different L1 backgrounds and explores criterial canonical constructions across L2 proficiency levels.
The procedure is as follows: (1) an extended framework of thirteen canonical constructions is adopted as the theoretical framework; (2) the frequency distributions of the thirteen coded canonical constructions in 1,000 randomly selected usages from the Open Cambridge Learner Corpus (OpenCLC), accessed through the Sketch Engine (Kilgarriff et al., 2014), are described; (3) the relationships between the thirteen canonical constructions and four L2 proficiency levels (CEFR B1-C2) are statistically confirmed through correspondence analysis; (4) criterial canonical constructions identifying a particular L2 proficiency level are explored through random forest analysis and proficiency-level classified concordances extracted from the OpenCLC.

Examining the applicability of the mean dependency distance (MDD) for SLA: A case study of Chinese learners of Japanese as a second language

ABSTRACT. This study examined the application of the mean dependency distance (MDD) between two syntactically related words in a sentence as an effective index of syntactic development for a second language using written data by Chinese learners of Japanese.

Jiang & Ouyang (2017) was the first study to apply MDD to second language acquisition (SLA) data, comparing writing data from Chinese learners of English. They observed a significant increase in MDD between the first year of junior high school and the sophomore year of college, arguing that MDD can be used for SLA.

To examine the applicability of MDD to SLA, we used data from the Japanese Parallel Composition Database: 13 Chinese speakers at the one-year level (CN1) and 11 at the two-year level (CN2), as well as 59 Japanese native speakers (JP). CaboCha and IPADic were used to parse the data syntactically. When calculating MDD, we excluded sentence fragments from the data.

As a result, we observed an increase in MDD score from 0.92 for CN1 to 1.09 for CN2, supporting the applicability of MDD to SLA. However, the score for JP was 1.06, almost the same as that for CN2. Jiang & Ouyang (2017) also reported that higher-level learners score as high as native speakers. Our comparison among CN1, CN2 and JP showed the same trend: MDD reflects linguistic development at a relatively early stage but may have a ceiling effect for higher-level learners.

To confirm this ceiling effect, we further analyzed the compositions of 30 advanced Chinese learners and 30 Japanese native speakers from the YNU written corpus (Kanazawa, ed. 2014). The MDD of the learners was 0.83, whereas that of the native speakers was 0.79. We conclude that MDD can be effective for beginner to intermediate levels in SLA but not for the advanced level.
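
For readers unfamiliar with the measure, the sketch below computes MDD under one widely used convention: the distance of a dependency link is the absolute difference between the linear positions of a unit and its head, and the mean is taken over all links except the root. The abstract does not state the exact formula used, so this convention, the function name `mean_dependency_distance`, and the hand-annotated English example are assumptions. In a CaboCha parse the units would be bunsetsu chunks rather than words.

```python
def mean_dependency_distance(sentences):
    """Compute MDD over a sample of parsed sentences.

    Each sentence is a list of head positions: heads[i] is the 1-based
    position of the head of unit i + 1, and 0 marks the root.
    """
    total, links = 0, 0
    for heads in sentences:
        for position, head in enumerate(heads, start=1):
            if head == 0:
                continue  # the root has no governor, so it contributes no link
            total += abs(head - position)
            links += 1
    if links == 0:
        raise ValueError("no dependency links in the sample")
    return total / links

# "The student read a book": The->student, student->read, read=root,
# a->book, book->read (hand-annotated toy example).
toy_parse = [2, 3, 0, 5, 3]
print(mean_dependency_distance([toy_parse]))
```

Under this convention every link has a distance of at least 1, so the scores below 1 reported above presumably rest on a different counting convention that the abstract does not spell out.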


Jiang, J. & Ouyang, J. (2017) Dependency distance: A new perspective on the syntactic development in second language acquisition, Physics of Life Reviews, 21, 209-210. Kanazawa, H. (ed.) (2014) Nihongo kyoiku no tame no tasuku betsu kakikotoba kopasu (Corpus of task-based writing for Japanese language education), Tokyo: Hitsuji.

10:20-11:50 Session 5C
A Corpus-based Analysis of Hedging in Korean Academic Texts

ABSTRACT. In academic texts, hedging devices such as possible, might, could, etc. serve critical functions, such as presenting information as an opinion rather than accredited fact and reserving a discursive space for readers to dispute or accommodate their expectations (Hyland, 1998). In Korean academic writing, hedging is a crucial discourse strategy to master. Among specific grammatical units of hedging, the ones used most frequently are formulaic hedging expressions involving a bound nominal, which lacks concrete nominal meaning and is morpho-syntactically connected to the modifying ending of the preceding word, such as kes ‘thing’ in -ul kes-ita ‘will be the case’ and su ‘way’ in -ul su issta ‘can’. The main function of these formulaic hedging devices is to mitigate the degree of commitment, while expressing the writer’s respect toward readers and accommodating the cultural expectation for saving face in Korean society. In spite of rigorous research on English hedging, there is not much research on Korean hedging expressions. This study provides a corpus-based analysis of the linguistic distribution and function of formulaic hedging expressions in Korean academic texts. In particular, we focus on hedging expressions that convey possibility and uncertainty and examine their properties in academic discourse in comparison with other registers. Our corpus study of over 2,000 academic papers finds that the relative frequency of certain formulaic forms, including -ul su issta ‘can’, doubles in academic texts. In contrast with English, much more complex patterns of hedging are observed in Korean; two hedging expressions can combine, as in -ul su iss + -ul kes-ita (will + be able to), or frequently accompany lexical verbs of speaking or cognition, as in tako ha-l su issta [quotation particle + say + can]. Linguistic features of these complex patterns will be thoroughly examined using the academic corpus.

New English Education in North Korea: A Corpus Approach

ABSTRACT. The United States of America (USA) has certainly been one of the foreign countries most hated by many North Korean people since the Korean War. Since then, English has been one of the least popular foreign languages in North Korea because of the political and ideological relationship between the USA and North Korea. In a nutshell, English is the language of their enemy. Although North Korean students learn English at primary and secondary schools, speaking English freely outside their classrooms has been a very sensitive matter. However, this attitude seems to have changed radically since Kim Jong-Un took power in December 2011. According to Rodong Sinmun, the official newspaper of the Central Committee of the Workers’ Party of North Korea, the North Korean government announced that it would allow only English to be taught as a foreign language at primary and secondary schools from 2014. In addition, the North Korean government significantly changed its old state-authored English textbooks, which used to serve primarily to promote the cult of personality surrounding North Korean leaders and political ideology. In this talk, we will introduce the new state-authored English textbooks with the help of English teachers from the British Council and report some recent progress on the new state-authored English textbooks in North Korea using a corpus approach.

References Cho, J. A., Lee, K. D., Kang, H. J. & Jung, C. K. (2015). Education Policy, Education Curriculum, and Textbooks in the Kim Jong-Un Era. Seoul: Korean Institute for National Unification. Jung, C. K. & Cho, J. A. (2017). National curriculum in the Kim Jong-Un era: General guidelines and English education at the middle school level in North Korea. The Journal of Foreign Studies, 39, 147-166. Jung, C. K. & Kim, S. Y. (2015). A study of English learning of North Korean refugee youth in South Korean high schools. The Journal of Foreign Studies, 32, 65-88.

Trends of Interdisciplinary Corpus Research Conducted by Korean Undergraduate Students Majoring in Sciences

ABSTRACT. Corpus linguistics is expanding as an interdisciplinary research area not only for linguists but also for scientists in other fields who are interested in text data. Based on this interdisciplinary approach, a corpus linguistics class has been created for university students majoring in science and engineering in South Korea. While developing the class, we administered a 23-item questionnaire to examine the students’ needs and background knowledge of corpus linguistics. The questionnaire results showed that students were highly motivated to learn corpus linguistics despite a lack of knowledge of linguistics. This presentation reports the detailed results of the questionnaire, as well as the curriculum and contents of the corpus linguistics class for non-linguistics students. In addition, it shares an analysis of the topics and methods in the corpus linguistics research completed by 74 students as their final projects in the 16-week class. The findings indicate how scientists from outside linguistics view corpus linguistics. Finally, the presentation suggests an ideal framework for cooperative work between linguists and scientists from other fields to boost research in corpus linguistics.

10:20-11:50 Session 5D
Is this "problem" giving you "trouble"? A corpus-based examination of the differences between the nouns "problem" and "trouble"

ABSTRACT. This study investigates two near-synonymous general nouns, "problem" and "trouble", both of which are ranked in the general 3,000-word list available from the online Oxford Advanced Learner’s Dictionary (OALD). These two lexical items, however, can pose difficulty for EFL learners; even a learner’s dictionary defines them in terms of each other, suggesting to learners that they can be used interchangeably: "problem: something that is difficult to deal with; something that is a source of trouble, worry, etc." "trouble: a problem, worry, difficulty, etc. or a situation causing this" (Merriam-Webster Learner’s Dictionary). The present study, therefore, seeks to find out how "problem" and "trouble" differ in their uses. To explain this systematically, data from the Corpus of Contemporary American English (COCA) were analyzed, with particular attention to the relationship between the lexical items and their discourse properties. This is studied in the light of their frequencies, distribution patterns across text types and collocations with verbs and adjectives. It has been found that "problem" occurs more often in all text types, with spoken and academic discourse topping the list and fiction at the bottom. In contrast, "trouble" is most commonly used in fiction and least used in academic prose. This disparity was in line with findings about the verb collocates of the synonyms: "trouble" collocates more with verbs that are characteristic of the conversations pervasive in fiction, i.e. phrasal verbs, modals, and contractions. Observation of adjective collocates revealed that "problem" occurs more freely with various adjectives, suggesting that it has a less specific meaning than "trouble".
These findings have the pedagogical implication that the two words should be taught in relation to registers and with different emphases on their formal patterns: more weight should be given to lexical collocates in the case of "problem", and to grammatical patterns in the case of "trouble".

A phraseological account of structural patterns: "it is no NN that" and "there is no NN that"

ABSTRACT. Native speakers of English intuitively combine a word with a structural pattern, as in the case where "wonder" co-occurs with "it is no NN that" and "doubt" with "there is no NN that". The other combinations, "wonder" with "there is no NN that" and "doubt" with "it is no NN that", are unacceptable. For non-native speakers of English, the selection of such collocations (colligations) as "it is no wonder that" and "there is no doubt that" is one of the difficulties in the production of acceptable expressions. This seems to suggest that native speakers communicate using fixed patterns among grammatically possible combinations. This study discusses the combination of nouns with the structural patterns "it is no NN that" and "there is no NN that". The ultimate goal of this research is to empirically elucidate the mechanism of language use, drawing on corpus data. As part of the ongoing research, firstly, nouns co-occurring with "it is no NN that" or "there is no NN that" were extracted from large-scale corpora, such as ukWaC, COCA and BNC. The result shows that nouns frequently occurring in "it is no NN that" rarely appear in "there is no NN that", and that the converse is the case. Furthermore, nouns frequently used in the structure "it is no NN that" are limited: "wonder", "surprise", "coincidence", "accident" and "secret", whereas nouns used in "there is no NN that" do not have such a tendency. Secondly, qualitative as well as quantitative analysis was carried out with these five words. The results indicate a chronological shift as well as semantic differences in synonym pairs: "wonder" and "surprise", or "coincidence" and "accident". In the presentation, based on these findings, language use will be accounted for from phraseological perspectives.
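
The extraction step described in this abstract can be approximated with a small sketch. It uses a plain regular expression over raw text in place of a part-of-speech query on ukWaC, COCA or the BNC, so the open slot matches any single word rather than only tokens tagged as nouns. The pattern, the function name `nouns_by_frame`, and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

# Captures the dummy subject ("it" or "there") and the word in the NN slot.
FRAME = re.compile(r"\b(it|there) is no (\w+) that\b", re.IGNORECASE)

def nouns_by_frame(text):
    """Count slot fillers separately for the 'it' and 'there' frames."""
    counts = {"it": Counter(), "there": Counter()}
    for subject, noun in FRAME.findall(text):
        counts[subject.lower()][noun.lower()] += 1
    return counts

sample_text = (
    "It is no wonder that he left. There is no doubt that she knew. "
    "It is no secret that they met. There is no doubt that it rained."
)
frames = nouns_by_frame(sample_text)
print(frames["it"].most_common())
print(frames["there"].most_common())
```

Comparing the two counters side by side is one simple way to check whether a noun that is frequent in one frame also appears in the other.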

The Transitivity Continuum in English Non-finite Verbal Complement Clauses: A Corpus-based Study

ABSTRACT. The study investigates the transitivity status in four types of English non-finite verbal complementation constructions based on data extracted from ICE-GB. These constructions include: (1) NP + MATRIX V + to V (e.g. I don’t want to go now), (2) NP + MATRIX V + V-ing (e.g. I don’t mind doing work), (3) NP1 + MATRIX V + NP2 + (to) V (e.g. I don’t want you to leave), and (4) NP1 + MATRIX V + NP2 + V-ing (e.g. I don’t want you dribbling on those). The verbs in the complement clauses fall into seven transitivity subcategories: copula, intransitive, monotransitive, ditransitive, dimonotransitive, complex-transitive and TRANS. Our empirical results show that the four complementation patterns demonstrate different preferences of transitivity types: They show, in the same sequential order above, a continued decrease for monotransitive (62.8%, 56.8%, 50.6%, and 33.4%), complex-transitive (5.5%, 3.2%, 2.3% and 0.6%) and TRANS (2.8%, 2.2%, 0.5%, 0.3%). Meanwhile, they reveal an increase in the use of intransitive (21.7%, 27.6%, 32.9% and 60.3%). Based on these observations, we argue that infinitival complementation clauses display relatively higher transitivity than gerundial complementation clauses, and single-NP complementation clauses display relatively higher transitivity than dual-NP complementation clauses.

11:50-13:00 Lunch Break
13:00-14:00 Session PL5: Plenary 5: Michael Barlow

Plenary speech 5:

The Individual and the Group in Corpus Linguistics

Michael Barlow (University of Auckland, New Zealand)

Location: Conference Hall
14:00-14:20 Coffee Break

Entrance Lobby of Kagawa International Conference Hall, Tower building 
6F, Takamatsu Symbol Tower

14:20-15:50 Session 6A
Analyzing English Verb-Noun Miscollocations Extracted from a Large Chinese/Taiwanese EFL Learner Corpus

ABSTRACT. Previous studies on V-N miscollocations produced by ESL/EFL learners have undoubtedly shed some light on common miscollocation types and their possible causes. However, barriers to further understanding still exist, such as the limited amount of learner data that can be generated through elicitation tasks and the labor-intensive task of manually extracting and examining corpus data. To provide researchers with a more efficient method of corpus data extraction and analysis, this study used Sketch Engine (SKE) to perform a computer-assisted analysis to identify common V-N miscollocations of Chinese/Taiwanese EFL learners. The SKE tool can help researchers compare various collocations used in a native corpus and a learner corpus. In this study, we first uploaded two corpora onto SKE: one native reference corpus and one NNS writing corpus. The native corpus consists of the news sub-corpora taken from COCA (Corpus of Contemporary American English) and contains about 85 million words. The NNS writing corpus is a large collection of writing assignments by Chinese and Taiwanese EFL learners (13 million words) in the EF-Cambridge Open Language Database (EFCAMDAT). Based on the automatic comparison, more than 150 common types of V-N miscollocation were identified. The extracted collocation errors were then further checked by the researchers. Analysis of these miscollocations revealed that most were verb-based, such as inappropriate verb choice or missing prepositions after verb collocates, and that many were caused by negative transfer from the learners’ L1, ignorance of L2 syntactic rules, and the misuse of synonyms. This study demonstrates that using the SKE tool to extract V-N miscollocations from a large learner corpus is both feasible and efficient. This method can also be applied to other languages and can further deepen our understanding of L2 learners’ difficulties in this area.

An Analysis of the Vocabulary Used in Oxford Reading Tree: As a Reference for Early English Education in Japan

ABSTRACT. It is often pointed out that vocabulary has a strong relationship with language proficiency. In Japan, English is going to be taught at elementary schools as an official subject from 2020. For the compilation of textbooks for this subject, it may serve as a reference to investigate the vocabulary used in the language arts textbooks of native speakers, of which Oxford Reading Tree is a prime example. It is composed of readers divided into 13 stages, and is used at over 80% of the primary schools in the United Kingdom. The purpose of this study, therefore, was to analyze the vocabulary used in Oxford Reading Tree, and then to present empirical data. To complete the study, the author obtained an entire set of 246 textbooks comprising Oxford Reading Tree. All the texts in them were then typed into a database and confirmed. Every story title was subsequently deleted, as was every heading found within a story. The ORT Corpus created in this way was analyzed by counting the number of constituent words, examining the readability, and listing the high-frequency words and clusters. In particular, the textbooks in the first five stages, considered to be elementary school level in Japan, were analyzed stage by stage. The results revealed that the ORT Corpus consisted of 3,929 types and 117,722 tokens. The following results were also obtained: a) Flesch-Kincaid Grade Level was 1.72. b) Average Words per Sentence was 6.48. c) The three most frequent content word forms were “said,” “was,” and “had.” d) The three most frequent verb-clusters were “began to,” “looked at,” and “wanted to.” Limited to the first stage, the results were a) 0.00, b) 3.24, c) “go,” “is,” and “look,” and d) “look at,” “come on,” and “hurry up.”
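
The measures reported above (types, tokens, words per sentence, and Flesch-Kincaid Grade Level) can be sketched as follows. The grade formula is the standard one, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The tokenizer, the sentence splitter, and above all the vowel-run syllable counter are rough assumptions, so the output will not match a dedicated readability tool exactly. The function names and the sample text are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())

def count_syllables(word):
    """Crude heuristic: each run of vowel letters counts as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def profile(text):
    """Return types, tokens, words per sentence, and Flesch-Kincaid grade."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    syllables = sum(count_syllables(t) for t in tokens)
    words_per_sentence = len(tokens) / len(sentences)
    fk_grade = 0.39 * words_per_sentence + 11.8 * (syllables / len(tokens)) - 15.59
    return {
        "types": len(set(tokens)),
        "tokens": len(tokens),
        "words_per_sentence": words_per_sentence,
        "fk_grade": fk_grade,
    }

stats = profile("Look at me. Look at the dog. Come on!")
print(stats)
```

Very short sentences made of one-syllable words can push the raw formula below zero, which is worth keeping in mind when reading grade levels for early-stage readers.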

Extracting Patterns from Transition of Occurrence Frequency of Grammar Items in a Junior High School Textbook in Japan

ABSTRACT. In recent trends in second language learning, frequency effects are considered crucial in converting declarative knowledge to procedural knowledge (Ellis, 2002; DeKeyser, 1998, 2001, 2007). In textbook organisation, a cyclic, rather than linear, exposure to new items is desired and emphasised (Valdman, 1978; Kaur, 2001). Despite the ongoing curriculum reform advocated by the Ministry of Education (MEXT) towards communicative language teaching, the MEXT-authorised junior high school textbooks are still organised by a grammar-based syllabus with linear presentations of grammar items. Teaching analytical faculty is one of the key issues in language learning, yet the linear nature of the target grammar items will make it difficult for learners to remember, reuse, and reformulate the items. This study is an attempt to capture the degree of cyclicity and its variation across grammar items by measuring the frequency of repeated occurrence and by applying a curvilinear regression model. In my analysis of grammar items in New Horizon 1-3 (Kasajima et al., 2012), each of 41 grammar items was counted every time it appeared or was assumed to appear in the main text and exercises. In order to smooth the data, a cumulative frequency was calculated. After multiple trials of nonlinear fitting using JMP, it was found that a cubic regression fits the data best (average squared residual = 0.956). A factor analysis of the estimated coefficients of the linear, quadratic, and cubic terms, as well as the intercept, of the cubic regression formulae produced factor scores in a two-dimensional space. A closer look revealed two distinctive patterns: (1) a convex curve pattern for early boosting (e.g., ‘must’), and (2) a concave curve pattern for late boosting (e.g., adjectival infinitive). EFL teachers’ knowledge of the potential risks of these patterns will help them provide supplementary activities to fill the gaps in the textbooks provided.
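
The two numerical steps in this abstract, cumulating per-unit counts and fitting a cubic curve, can be sketched in plain Python. The study used JMP; the least-squares solver below is only a stand-in, and the function names `cumulative` and `polyfit` and the toy counts are assumptions for illustration.

```python
from itertools import accumulate

def cumulative(counts):
    """Turn per-unit occurrence counts into a running total."""
    return list(accumulate(counts))

def polyfit(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients from the lowest power upward:
    [intercept, linear, quadratic, cubic] for degree 3.
    """
    size = degree + 1
    # Augmented matrix [X^T X | X^T y] for the Vandermonde design matrix X.
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]

# Toy data: occurrences of one grammar item in eight consecutive units.
per_unit = [5, 4, 2, 1, 0, 0, 1, 0]
running = cumulative(per_unit)
coefficients = polyfit(list(range(1, len(running) + 1)), running)
print(running)
print(coefficients)
```

The four fitted coefficients per grammar item are the kind of values that the abstract then passes to a factor analysis.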

14:20-15:50 Session 6B
Corpus-based study of the spontaneous forms omowareru and omoeru in Japanese

ABSTRACT. The concept of jihatsu “spontaneous” is often used in Japanese grammar to describe the use of a sentence construction in combination with a verbal form in order to refer to an event as occurring by itself, without any agent’s intention. There is no consensus on the grammatical status of jihatsu (voice, construction, or lexical class of verbs), but it is well established that in contemporary Japanese, a few verbs denoting a mental activity can be used with a valence-modifying suffix to convey this meaning (Teramura 1982, Moriyama 1988, Moriyama & Shibuya 1988, Nihongo Kijutsu Bunpô Kenkyûkai (ed) 2009). The verb omou “think” is often cited among the verbs which can be used in a spontaneous construction, but both omowareru and omoeru can be considered as its spontaneous form. They are often considered as having the same meaning (Martin 1975, Iwasaki 2002), but some studies have tried to show their differences (Adachi 1995). In this presentation, I will show how a corpus-based study can shed new light on this matter and help to distinguish these two forms. The results I will present are part of a larger study of the means of expressing a personal opinion in Japanese and French (Tuchais 2014), and will focus on the data obtained from BCCWJ. Adachi 1995 claims that omowareru expresses a more typical spontaneous meaning than omoeru, but admits that linguistic tests don’t provide a clear distinction. My study of a large corpus shows that some characteristics of jihatsu (such as showing the thought as emerging by itself) are more predominant in omoeru, while others (such as backgrounding the subject) appear more clearly in omowareru. It suggests an emerging division of roles, with omowareru becoming a kind of stylistic variant of omou.

References Adachi, Tarô (1995), « “Omoeru” to “omowareru” [Omoeru and omowareru] », in T. Miyajima & Y. Nitta (eds.) (1995), Nihongo ruigi hyôgen no bunpô (jô) Tanbun hen [Grammar of Japanese synonymous expressions, vol. I (Simple sentence)], Tôkyô : Kuroshio Shuppan.121-130. Iwasaki, Shoichi (2002), Japanese, Amsterdam / Philadelphia : John Benjamins. Martin, Samuel E. (1975), A Reference Grammar of Japanese, New Haven : Yale University Press. Moriyama, Takurô (1988), Nihongo dôshi jutsugo bun no kenkyû [A Study of verb-predicate sentences in Japanese], Tôkyô : Meiji Shoin. Moriyama, Takurô, & Shibuya Katsumi (1988), « Iwayuru jihatsu ni tsuite – Yamagata-shi hôgen o chûshin ni [On “spontaneous” – with a focus on the dialect of the city of Yamagata] », Kokugogaku 152, Kokugogakkai, 47-59. Nihongo Kijutsu Bunpô Kenkyûkai (ed) (2009), Gendai nihongo bunpô 2 – Dai 3 bu : kaku to kôbun, dai 4 bu : voisu [Grammar of contemporary Japanese 2 (Third part : Case and sentence structure ; Fourth part : Voice)], Tôkyô : Kuroshio Shuppan. Teramura, Hideo (1982), Nihongo no shintakusu to imi I [Syntax and meaning in Japanese I], Tôkyô : Kuroshio Shuppan. Tuchais, Simon (2014), Comment dire ce que « je » pense en japonais et en français – Étude contrastive de l’expression de l’opinion personnelle, Thèse de doctorat, EHESS.

Corpus: Balanced Corpus of Contemporary Written Japanese (BCCWJ), National Institute for Japanese Language and Linguistics (NINJAL), accessed through the search engine Chunagon.

The many faces of Lelouch Lamperouge: A corpus approach to language in Japanese animation

ABSTRACT. Japanese animation, or anime, has been growing in popularity and has been integrated into popular or mass culture internationally. As a growing popular phenomenon, anime receives a substantial amount of scholarly interest in a number of areas including language pedagogy (e.g. Armour and Iida, 2014), fan-translation/fan-subtitling (e.g. Lee, 2011), and so on. However, there are still limited examinations of the discourses in anime and other Japanese telecinematic texts (e.g. television dramas, feature films). Corpus linguistic studies of telecinematic texts have so far been limited to English-language texts (e.g. Bednarek, 2015; Quaglio, 2009). In this paper, a newly constructed corpus of the science fiction anime series, Code Geass: Lelouch of the Rebellion (2006-2007) is used to analyse the discursive construction of the titular character, Lelouch Lamperouge.

The case study focuses on how social meanings indexed by language use in the real world are recontextualised in the anime to construct and convey different aspects of a character’s identity, such as social standing, thoughts and beliefs. Combining corpus linguistics with sociocultural linguistics (Bucholtz and Hall 2005), this study shows that salient aspects of a character can be understood from the character’s established linguistic repertoire. Additionally, it shows that deviations from these established repertoires foreground different stances or personae. More broadly, this study demonstrates that corpus linguistics can be used effectively in a mixed-method approach to examine characterisation and contributes to the still under-represented areas of Japanese telecinematic discourse studies and Japanese corpus linguistic research.

References Armour, W. S. and Iida, S. (2014). Are Australian Fans of Anime and Manga Motivated to Learn Japanese Language?. Asia Pacific Journal of Education, 36 (1): 31–47.

Bednarek, M. (2012). 'Get Us the Hell Out of Here': Key Words and Trigrams in Fictional Television Series. International Journal of Corpus Linguistics, 17 (1), 35–63.

Bednarek, M. (2015). Corpus-Assisted Multimodal Discourse Analysis of Television and Film Narratives. In P. Baker, & T. McEnery (Eds.), Corpora and Discourse Studies: Integrating Discourse and Corpora (pp. 63–87). Basingstoke, UK: Palgrave Macmillan.

Bucholtz, M. & Hall, K. (2005). Identity and interaction: A sociocultural linguistic approach. Discourse Studies, 7(4/5), 585–614.

Lee, H.-K. (2011). Participatory Media Fandom: A Case Study of Anime Fansubbing. Media, Culture & Society, 33 (8): 1131–47.

Piazza, R., Bednarek, M., & Rossi, F. (Eds.) (2011). Telecinematic Discourse: Approaches to the Language of Films and Television Series. Amsterdam/Philadelphia: John Benjamins Publishing Company.

Quaglio, P. (2008). Television Dialogue and Natural Conversation: Linguistic Similarities and Functional Differences. In A. Ädel, & R. Reppen (Eds.), Corpora and Discourse: The Challenges of Different Settings (pp. 189–210). Amsterdam/Philadelphia: John Benjamins.

The varying complexity of the syntactic role of nouns: Evidence from Japanese corpora

ABSTRACT. Expository language contains more nouns than narrative language. Biber (1988) explains this difference as a difference in the degree of “informational density” (p. 107), which in turn depends on the communicative goals of the speaker and the pressure to speak quickly, such as during dialogue. Speakers produce less cognitively complex and less information-rich utterances under pressure. This effect has been documented for syntactic complexity within the noun phrase (Fang et al. 2006). But do the syntactic roles of nouns (Andrews 2007) show similar patterns? We explore this question by examining the syntactic role of nouns in the academic presentations (971 samples, 3 million words) and monologues (1,714 samples, 3.4 million words) of the Corpus of Spontaneous Japanese, and the dialogues (136 samples, 1 million words) of the Corpus of Kansai Vernacular Japanese. We used a Python program to determine the usage rates of nominative, accusative, dative, and predicate nouns in each speech sample. As a measure of information density, we calculated the noun type-token ratios (TTR), on the assumption that more varied language expresses more information. As a measure of cognitive complexity, we calculated accusative case marker omission (CMO) rates, since case marker omission has been linked to cognitive complexity (Fedzechkina et al. 2017). We tested the strength of the correlations between the grammatical role rates, TTR, and CMO. The following grammatical roles strongly correlated (p < .001) with TTR: predicate and dative. In contrast, the following grammatical roles strongly correlated with CMO: accusative, dative, and topic. Furthermore, the number of strong correlations with TTR decreased in the dialogues, whereas the number of strong correlations with CMO increased in the dialogues. Our results extend previous work on cognitive complexity to include syntactic role, and give us a better understanding of what constitutes complex language.


References

Andrews, Avery D. 2007. The major functions of the noun phrase. In Timothy Shopen (ed.), Language Typology and Syntactic Description, Volume 1: Clause Structure, 132-223. Cambridge: Cambridge University Press.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Fang, Zhihui, Mary J. Schleppegrell & Beverly E. Cox. 2006. Understanding the Language Demands of Schooling: Nouns in Academic Registers. Journal of Literacy Research 38(3). 247-273. https://doi.org/10.1207/s15548430jlr3803_1

Fedzechkina, Maryia, Elissa L. Newport & T. Florian Jaeger. 2017. Balancing effort and information transmission during language acquisition: Evidence from word order and case marking. Cognitive Science 41(2). 416-446. https://doi.org/10.1111/cogs.12346
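
The abstract above says only that a Python program was used, so the following is a minimal sketch of the three quantities it mentions, not the authors' code. It assumes each speech sample has already been reduced to a list of noun tokens and to counts of marked and unmarked accusative nouns; the function names and toy figures are hypothetical.

```python
from math import sqrt


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (0.0 for an empty sample)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def omission_rate(marked, unmarked):
    """Share of accusative nouns that appear without their case marker."""
    total = marked + unmarked
    return unmarked / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data: one value per speech sample (purely illustrative).
dative_rates = [0.10, 0.20, 0.30]
noun_ttrs = [0.40, 0.50, 0.60]
print(pearson_r(dative_rates, noun_ttrs))
```

Raw TTR falls as sample length grows, so a real comparison across samples of different sizes would probably need a length-normalised variant; the abstract does not say which was used.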

14:20-15:50 Session 6C
Structural approaches for understanding diachronic culture corpus

ABSTRACT. Cultural studies aims to understand cultural phenomena, the past and present states of cultural factors, and the specific patterns by which they change. Many studies have therefore compared diverse cultures or taken a diachronic viewpoint. However, despite this long academic tradition, many studies have relied on the intuition of researchers and subjective literature review. Recently, studies in the humanities and social sciences have actively attempted to analyze accumulated language data. However, the results of these quantitative analyses fail to provide authentic interpretation. Such studies have focused on the extraction of a high-frequency list or a keyword list. Extracting these lists might be good enough in certain cases. However, when the objectives of a study are to identify the cultural contexts of the corpus and to understand the implications of those contexts from a comparative perspective, a frequency list is not sufficient. This study proposes a macro-analysis for identifying the structure of frequency lists and a micro-analysis for understanding the relationships between the vocabulary items in the lists. The macro-analysis is composed of three procedures: semantic classification based on the “BunruiGoiHyo”, restructuring according to a cultural framework, and exploration of the changing trends of each classified lexical group. In the micro-analysis, co-occurrence network analysis is used to identify the relationships between vocabulary items. Lastly, the results of these technical analyses are interpreted from a cultural anthropological perspective. In the presentation, the difference between previous studies and my case study will be explained. This study is meaningful because it provides a new research methodology for cultural studies. Additionally, it shows that the methodology of corpus linguistics is applicable to the humanities. The results of this study are expected to broaden the scope of corpus linguistics.
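
The micro-analysis step can be illustrated with a minimal sketch of co-occurrence counting. It assumes that "co-occurrence" means two words appearing in the same sentence and that the text is already tokenized; the abstract specifies neither, and the function name, threshold and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_count=1):
    """Count how many sentences each unordered word pair shares.

    Returns a dict mapping (word_a, word_b), sorted alphabetically,
    to the number of sentences containing both words. Pairs seen in
    fewer than min_count sentences are dropped.
    """
    counts = Counter()
    for tokens in sentences:
        # Each pair is counted at most once per sentence.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy tokenized sentences (illustrative only).
sentences = [
    ["tea", "rice", "festival"],
    ["tea", "rice"],
    ["festival", "shrine"],
]
print(cooccurrence_edges(sentences, min_count=2))
```

The resulting dictionary is an edge list with weights, which can be loaded into any graph tool for visualisation or centrality measures.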

Historical Analyses of English Negative prefixed words from ME to PE - ‘-able’ adjectives -

ABSTRACT. This paper aims to describe how English negative prefixed words have been used by native speakers of English from the Middle English period (ME) to Present-day English (PE). English has several negative prefixes in modern usage, all of which are said to have survived as relatively productive morphemes even though they share similar meanings. Because these similar morphemes coexist, two or more distinct prefixes can attach to the same base and form competing words, as with in- / un-certain or in- / un- / dis-honest. Moreover, it is interesting to investigate how negative morphemes such as un-, in-, -less and so on have been used over the long history of English. Among previous studies on in- / un- doublets, Kwon (1997) identifies fifteen such doublets from the 14th to the 19th century in Chadwyck-Healey’s English Poetry Full-Text Database. In order to determine how many such pairs, especially ‘-able’ adjectives, can be found in modern usage, the British National Corpus (BNC) will be of great use. The Oxford English Dictionary 2nd edition (OED) will also supply the information needed to investigate how negative morphemes, especially doublets, have appeared in earlier texts.

Negation in Benjamin Franklin’s Writings: A Stylistic Analysis of his Autobiography and Letters

ABSTRACT. The present paper discusses negation in Benjamin Franklin’s (1706-1790) English, highlighting its varying features from historical and stylistic perspectives. Benjamin Franklin, a multi-talented figure and one of the Founding Fathers of the United States, left a large body of writings, of which this research focuses on his autobiography and letters. After explicating how a corpus of his writings has been compiled for this research, this paper concentrates on the analysis of his use of negative constructions. Features explored in this study include: the frequency of negative sentences themselves; the frequency of sentences with not (instead of other forms such as never and no); the use of the auxiliary do; contracted forms of negation (such as won’t and don’t); negative connectives (especially the contrast between neither … or and neither … nor); etc. The analysis shows that there are clear stylistic differences between his autobiography and letters, and also between his letters addressed to Deborah, his wife, and those addressed to other people. Negation with not (rather than never, no, etc.) is, for example, the most frequent in his letters to Deborah, and the least frequent in his autobiography, illustrating the relative formality of the style of his autobiography (cf. Tottie (1991) for negation with not in general). Also, his English presents negative constructions without do, as in I know not, as might be expected to some extent given the period (cf. Tieken-Boon van Ostade (1987)). Here again, the phenomenon is stylistically conditioned: different genres of his writings show different stages of the decline of the know not construction. Thus, an internal analysis of this kind, namely an analysis of a single author, can demonstrate the process of the historical development of English negative constructions (cf. Iyeiri, Yaguchi & Baba (2015)).

References Iyeiri, Yoko, Michiko Yaguchi & Yasumasa Baba. 2015. “Negation and Speech Style in Professional American English”. Memoirs of the Faculty of Letters, Kyoto University 54: 181-204. Tieken-Boon van Ostade, Ingrid. 1987. The Auxiliary Do in Eighteenth-century English: A Sociohistorical-linguistic Approach. Dordrecht: Foris. Tottie, Gunnel. 1991. Negation in English Speech and Writing: A Study in Variation. San Diego: Academic Press.

14:20-15:50 Session 6D
Lexical bundles in ESL science and technology perspective majors’ writing

ABSTRACT. Recently, second and foreign language teaching practitioners and researchers have become increasingly interested in the creation of learner corpora and their pedagogical use (Cortes, 2004). Studies have shown that learner corpora make a significant contribution to second language acquisition research (Granger, Hung, & Petch-Tyson, 2004). In tandem with the recent attention to EAP (English for Academic Purposes) in higher education, the analysis of L2 writers’ writing within academic disciplines has been one of the focuses of corpus studies (Nam, 2017). On the one hand, there has been much research on writing practices in such academic disciplines, aimed at modeling and analyzing authentic language use and appropriate genre and discourse structure (Hyland, 2004). On the other hand, there are few studies of the writing of college students whose majors fall within specific academic disciplines, for example, science and technology. The purpose of the current study, therefore, is threefold: (1) creating an L2 writing corpus of science and technology major college students; (2) identifying the characteristics of science and technology majors’ writing compared to that of other majors; and (3) suggesting pedagogical implications for writing instruction for science and technology major college students. To create such a corpus, argumentative essays from placement tests are collected. The essays are contributed by prospective Korean college students who have been accepted to a science and technology institution. The essays are graded based on a standardized test rubric, i.e., TOEFL, which may later be useful for comparison with other similar L1-based L2 corpora (e.g., The International Corpus of Learner English, ICLE) or with different standards for describing language ability (e.g., The Common European Framework of Reference for Languages, CEFR).
To identify the stylistic characteristics and discourse organization of the writing, functional and structural lexical bundle categories are examined (Biber, 2006). Pedagogical and pragmatic implications for improving L2 writing quality can also be suggested, concerning the use of hedging, overstated tone, and the proper use of lexical bundles in argumentative writing (Chen & Baker, 2010).

References

Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam / New York: John Benjamins.

Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning and Technology, 14(2), 30-49.

Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397–423.

Granger, S., Hung, J., & Petch-Tyson, S. (2002). Computer learner corpora, second language acquisition and foreign language teaching. Amsterdam / New York: John Benjamins.

Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variations. English for Specific Purposes, 27(1), 4-21.

Nam, D. (2017). Lexical bundle structures of nuclear science and engineering research article. Language Facts and Perspectives, 40(1), 167-186.
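
Before lexical bundles can be classified functionally or structurally, they have to be extracted from the essays. The abstract does not describe that step, so the following is a minimal sketch under common assumptions: bundles are recurrent four-word sequences that meet a frequency threshold and occur in a minimum number of different texts. The function name, thresholds and whitespace tokenization are all hypothetical.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-word sequences that recur often enough across enough texts.

    texts: list of strings, one per essay.
    Returns a dict mapping each bundle (as a string) to its total frequency.
    """
    freq = Counter()
    spread = defaultdict(set)  # bundle -> indices of essays containing it
    for idx, text in enumerate(texts):
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            freq[gram] += 1
            spread[gram].add(idx)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


# Toy essays (illustrative only).
essays = [
    "on the other hand it is true",
    "but on the other hand we disagree",
]
print(lexical_bundles(essays))
```

The dispersion threshold (`min_texts`) keeps sequences favoured by a single writer from being counted as bundles; published studies usually also normalise frequency per million words, which this sketch omits.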

Genre-based comparative study of German word length

ABSTRACT. As a fundamental lexical feature, word length has been a key research issue in the field of quantitative linguistics for almost 150 years (Liu 2017). More than 70 languages have been analyzed from various perspectives (Chen 2016). However, era-specific or genre-dependent conditions have not been taken into account in many cases (Grzybek 2007).

Based on thousands of German-language texts from 1600-1900 documented in Das Deutsche Textarchiv, this paper examines word length in written German from both diachronic and synchronic perspectives. The diachronic study explores the change in word length over 300 years and tries to explain it in its cultural and social context, while the synchronic study compares word length in four different text genres, namely belles-lettres, scientific works, functional literature and journalistic articles. Is there any difference in German word length among different genres? Has word length in written German changed over hundreds of years? Does German word length in different genres and different periods of time fit the law of word length distribution and word length sequence? This paper sets out to answer these questions.

References

1. Best, Karl-Heinz (2005): Wortlänge. In: Köhler, Reinhard/Altmann, Gabriel/Piotrowski, Rajmund G. (eds.): Quantitative Linguistics - An International Handbook. Pp. 260-273.

2. Chen, Heng (2016): Quantitative Studies of Chinese Word Length. Dissertation - Zhejiang University.

3. Grzybek, Peter (2007): History and methodology of word length studies. In: Grzybek, Peter (ed.): Contributions to the Science of Text and Language: Word Length Studies and Related Issues. Dordrecht: Springer. Pp. 15-90.

4. Liu, Haitao (ed.) (2017): An Introduction to Quantitative Linguistics. Beijing: The Commercial Press.
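
As a starting point for the genre and period comparisons described in the abstract above, the following minimal sketch computes an empirical word length distribution and a mean word length for one text. It measures length in characters for simplicity, which is an assumption: the abstract does not state the unit, and quantitative linguistics often counts syllables instead. The function names and sample words are hypothetical.

```python
from collections import Counter


def word_length_distribution(words):
    """Map each word length to its relative frequency in the sample."""
    lengths = Counter(len(word) for word in words)
    total = sum(lengths.values())
    return {length: lengths[length] / total for length in sorted(lengths)}


def mean_word_length(words):
    """Arithmetic mean of word lengths in the sample."""
    return sum(len(word) for word in words) / len(words)


# Toy sample (illustrative only); a real study would run this per genre
# and per period, then compare or fit the resulting distributions.
sample = ["der", "Hund", "bellt", "laut"]
print(word_length_distribution(sample))
print(mean_word_length(sample))
```

Testing whether such a distribution follows a proposed law would additionally require fitting a theoretical model and a goodness-of-fit test, which this sketch leaves out.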