APCLC 2020: ASIA PACIFIC CORPUS LINGUISTICS CONFERENCE 2020
PROGRAM FOR WEDNESDAY, FEBRUARY 12TH

09:00-10:40 Session 10A
Location: Choi Young Hall
09:00
Investigating Stylometric Features of PhD Theses across Disciplines: A Corpus-based Study

ABSTRACT. Stylometry, also known as computational stylistics, studies textual styles and writing habits based on the application of quantitative methodology to linguistic features (Liu & Xiao, 2018). In the field of EAP (English for Academic Purposes), knowledge is constructed and negotiated within each discipline (Hyland, 2000), which is in turn reflected in learners’ academic writing through certain stylometric features. This study focuses on a stylometric dyad, namely the activity and descriptivity of text (Zörnig et al., 2015), and conducts a corpus-based analysis of PhD theses. Our corpus is made up of 120 PhD theses, incorporating 30 texts under each of the four broad disciplinary headings (i.e., hard pure, soft pure, hard applied, soft applied). Our primary aims are to find out: (1) Based upon activity and descriptivity, what stylometric features are displayed in PhD theses within each (broad) discipline? (2) Inferred from these stylometric features, are there any pedagogical implications for thesis writing instruction? On the whole, this study is expected to contribute to the current knowledge of the stylometric features of PhD theses and to have practical implications for future thesis writing instruction.

References: Hyland, K. (2000). Disciplinary discourses: Social interactions in academic writing. London, England: Longman. Liu, Y., & Xiao, T. (2018). A stylistic analysis for Gu Long’s Kung Fu novels. Journal of Quantitative Linguistics, 1-30. Zörnig, P., Stachowski, K., Popescu, -I. -I., Mosavi, M., Mohanty, P., Kelih, E., . . . Altmann, G. (2015). Descriptiveness, activity and nominality in formalized text sequences. Lüdenscheid, Germany: RAM-Verlag.

09:25
A Corpus Analysis of Featured Wikipedia Articles for Use in an Academic Reading and Writing Course

ABSTRACT. In academic reading and writing courses in undergraduate universities, students are often assigned to conduct online and offline research and then to synthesize the research into a written report. These reports are usually written for pedagogical purposes only, but in a university in Thailand, a similar assignment is given to first-year students to write and publicly publish their research reports as Wikipedia articles. However, it is not always clear to the students which syntactic and lexical choices will best follow the Wikipedia writing model. This is especially true for learners who are reading and writing English as their second language. In an attempt to provide guidance for these students and their teachers, a corpus of high-quality Wikipedia articles in different categories of topics was created, and various corpus linguistics techniques were employed, such as key-grams, key parts of speech, and key semantic domains. The results of the analysis provide several useful linguistic features for students to focus on when reading and writing academic texts, such as useful phrases, guidelines for different verb tenses, and lexical items to aid in justifying the importance of the topic in order to meet Wikipedia standards. The results also highlight the potential for students to use corpus-based tools to conduct their own data-driven investigations into the English language.

09:50
Phraseology, Discourse Function and Writer Identity in Chinese EFL Writers’ Academic Discourse

ABSTRACT. There is much research that uses investigations of the first person pronouns (FPPs) I and we in academic discourse to cast light both on the interaction between writer and the imagined reader and on the self-representation of the writer (e.g. Ivanič 1998; Harwood 2007). Classroom instruction in ESL/EFL contexts, however, tends to neglect teaching these uses in favour of achieving objectivity in academic discourse by avoiding personal pronouns (Hyland 2002).

Based on a self-built learner corpus (456 graduation dissertations written by Chinese EFL learners, 4,193,413 tokens in total), this paper reports research that explores the phraseology and pragmatic functions of the two FPPs, ‘I’, ‘We’, and the verbs collocating with them. It is found that the phrases comprising ‘I’, ‘We’ and their verb collocates are used for the following four textual functions: organizing text, reporting research, interpreting research and construing the knowledge community. Further, these discourse functions are mostly presented by seven phrasal types. For example: I + base form verb (e.g. I think) and we + ability modal + verb (e.g. we can see).

Three writer identities proposed in this study are: Self as Reporter, Self as Text Organiser and Self as Evaluator. A correlation between the identified discourse functions and writer identities is observed in this study. This observation helps to pinpoint writer identity presented by the FPPs through the phraseology and their corresponding discourse functions. The findings of this study may provide some new insights into the teaching and learning of academic writing in EAP classrooms.

10:15
A Comparative Study of N-grams in Abstracts Written by Researchers and Graduate Students

ABSTRACT. As English is regarded as a lingua franca in the academic fields of engineering and science, graduate students in Japan and other EFL contexts are often required to write their theses, dissertations, or academic reports in English. Various learning materials for academic writing are now available; however, few studies have examined the linguistic features of texts written by Japanese graduate students. This study aims to clarify what graduate students have learned and what they should have learned by comparing N-grams frequently used in abstracts written by professional researchers and by graduate students majoring in six main fields of study: applied physics, chemical engineering, computer science, architecture, material engineering, and mechanical engineering. We created two corpora: one includes 60 abstracts written by professional researchers and published in highly prestigious journals, and the other includes abstracts written by Japanese graduate students. Though we cannot deny the possibility that mentors gave some linguistic advice to their students or actually edited their abstracts, the results of the study showed that Japanese graduate students tend to use particular N-grams, such as “it is found that,” “it is concluded that,” or “it is expected that,” significantly more frequently than professional researchers. This suggests that Japanese graduate students have learned particular fixed expressions and sometimes use them repeatedly even where another expression would be more appropriate.
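
The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a crude regex tokenizer, normalisation per 1,000 running words, and a simple rate difference as the ranking criterion; the function names and the toy sentences are invented for illustration and are not the authors' procedure or data. No significance test is included, although the abstract reports significant differences.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic word forms (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(texts, n):
    """Count word n-grams within each text (n-grams never span two abstracts)."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def per_thousand_words(texts, n):
    """Normalise n-gram counts to occurrences per 1,000 running words."""
    total = sum(len(tokenize(t)) for t in texts)
    if total == 0:
        return {}
    return {gram: 1000 * c / total for gram, c in ngram_counts(texts, n).items()}

def overused(student_texts, researcher_texts, n=4, top=5):
    """Rank n-grams by how much more often students use them than researchers."""
    student = per_thousand_words(student_texts, n)
    researcher = per_thousand_words(researcher_texts, n)
    gaps = {gram: rate - researcher.get(gram, 0.0) for gram, rate in student.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)[:top]

# Invented toy sentences, not data from the study.
students = ["It is found that X works. It is found that Y works."]
researchers = ["We show that X works. Results indicate Y works."]
print(overused(students, researchers)[0][0])
```

Normalising by corpus size matters here because the two corpora need not contain the same number of words; raw counts would otherwise favour the larger corpus.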

09:00-10:40 Session 10B
09:00
The definite ‘the’: Contextual and schematic usage

ABSTRACT. Traditional explanations and teaching methods for English articles (‘a/an, the’) have significant shortcomings, particularly for the definite ‘the.’ These articles (or hereafter, “delimiters”) express specific contextual meanings and nuances, and ‘the’ in particular has a number of specialized meanings or uses. Some uses are typically described as idiomatic, generic, or cultural knowledge. A corpus study will examine its uses and compare them with other delimiter patterns, and an alternative approach will be presented within a cognitive linguistic framework and schema theory. Specific hypotheses about ‘the’ will be tested with COCA corpus data, e.g.: some instances of use or non-use of ‘the’ can be explained by activity or event types in context, how nouns are conceptualized in specific contexts, semantic entitivity, and specialized schemas such as exemplar, prototype and background schemas. These factors can lead to a more coherent explanation of delimiter patterns, particularly for the many functions of ‘the,’ within a cognitive linguistic framework, which can be of practical value for linguistic analysis and for various needs within applied linguistics such as L2 instruction.

09:25
A cross-linguistic study on the asymmetrical distribution of personal pronouns among English, Korean and Chinese: based on a parallel corpus of TED Talks

ABSTRACT. Personal pronouns (PPs) in different languages, associated primarily with a common grammatical-person criterion, share a pervasive three-way classification but diverge considerably in usage. While the literature concentrates on quantitative analysis of the similarities and differences of PPs in one or two specific languages, this paper aims to explore the motivations and effects of the asymmetrical distribution of PPs in a balanced parallel corpus. Generally, multilingual transcripts of the same content show corresponding distributions owing to information consistency. If PP frequencies differ for the same content, why do different languages prefer different ways of expressing it? Do the additional or missing PPs in a specific language cause redundancy or loss? Is this a grammatically obligatory function or a pragmatically adaptive strategy? To answer these questions, 93 TED Talks transcripts from 2018 were compiled into a trilingual parallel corpus. The corpus-based analysis shows that, compared with 3rd person PPs, 1st and 2nd person PPs are more sensitive to changes in topic and genre. However, in filling PP positions, English performs in a quite “well-behaved” way while Korean reveals the most “unstrained” nature; Chinese, for its part, shows a free attitude towards it. Some typical instances indicate that Chinese would rather fill these positions than omit them, while Korean prefers the opposite because omission sounds more “natural”. In conclusion, a PP usually represents “old” information. When and how to recall this information may depend on grammatical rules and speech strategies respectively, and may ultimately be shaped by considerations of accuracy and efficiency.

09:50
A study on the degree of emphasis adverb use - Focused on ‘neomu(너무)’, ‘aju(아주)’, ‘maeu(매우)’, ‘doege(되게)’-

ABSTRACT. This paper examines the meanings and functions of the degree adverbs ‘neomu(너무)’, ‘aju(아주)’, ‘maeu(매우)’, and ‘doege(되게)’ using data extracted from the words and phrases of the "New Yonsei corpus". The study compares the four adverbs in terms of the sentence types in which they occur, the words they co-occur with, and the meanings they convey. It also identifies differences in their use between spoken Korean and written Korean. The purpose of the paper is to show how the adverbs differ in the degree of emphasis they express, and to show that the choice among them for expressing degree depends on sentence type and context in spoken and written language.

10:15
Grammatical and lexical explorations on ‘outer circle’ Englishes and ‘expanding circle’ Englishes: A corpus-based comparative analysis

ABSTRACT. According to Morrison (2002), with an estimated 350 million native speakers and 1.9 billion competent speakers, the spread of the English language around the world over the last few decades has been swift and steady. English has become the lingua franca of our time. It is the international language of the airlines, the sea and shipping, computer technology, science, and indeed communication generally. In the course of its spread, English has diversified by adapting to local circumstances and cultures, resulting in different varieties of English in every country. This study analyzed 50 selected research papers from professional language and linguistic academic journals to portray the differences between Kachru’s (1994) outer circle and expanding circle Englishes. The selected outer circle Englishes include those of Bangladesh, Malaysia, the Philippines, India, and Singapore; and the selected expanding circle Englishes are those of China, Indonesia, Japan, Korea, and Thailand. The researcher built ten corpora (five research papers for each corpus) to represent each variety of Englishes. The corpora were examined under grammatical and lexical features using Modified English TreeTagger in Sketch Engine. Results revealed the distinct grammatical and lexical features through the table and textual analyses, illustrated from the most to least dominant linguistic elements. In addition, comparative analyses were done to distinguish the features of each of the selected Englishes. Hence, the findings suggest that the described differences between the ‘outer circle’ Englishes and ‘expanding circle’ Englishes were influenced by the transfer of the first-language knowledge to the target language.

09:00-10:40 Session 10C
Location: IBK Hall
09:00
Data Driven Learning in English Language Education of Bangladesh: An Optimal Solution for Learning English Grammar

ABSTRACT. The study investigates the importance and pertinence of using data-driven learning (DDL) as a teaching aid for developing the English grammar skills of mainstream high school English learners in Bangladesh. The twofold purpose of this research is to check the effectiveness of this aid and to find out learners’ views of it. In Bangladesh, twelve years of compulsory English language education (ELE) and encouraging government funding have still not produced the required number of skilled professionals. The study finds DDL optimal, since no available solution can satisfy all the constraints at once. Since ELE belongs to a system of conflicting forces, DDL can provide optimal satisfaction of conflicting constraints. The present study finds that DDL based on English language corpora can help students learn a variety of English grammatical rules and their applications and become aware of the common types of errors that occur in using the language. Freely accessible versions of the British National Corpus and the Cambridge Learner Corpus, along with the necessary concordancers (i.e. LancsBox, Sketch Engine), were used as tools for this investigation. The research followed a quasi-experimental framework with quantitative research instruments. A placement examination along with a pre-test and post-tests was used to explore the effectiveness and receptivity of corpus-based English vocabulary learning, while a questionnaire based on a 5-point Likert scale was used to assess the students’ attitudes towards this process. The necessary statistical analysis was carried out to obtain the results.

09:25
A Study on Spoken Formulaic Sequences in Taiwanese Elementary, Junior and Senior High Textbook Series

ABSTRACT. Formulaic expressions help learners to develop speaking fluency and select native-like language use (Shin & Nation, 2007), so they are expected to be integrated into teaching materials (Martinez & Schmitt, 2012). This study therefore investigated the distribution and forms of spoken formulaic sequences in three textbook series used in Taiwanese elementary (EM), junior high (JH) and senior high schools (SH). The current study adopted a top-down approach to examine the occurrence of 346 most common spoken formulaic sequences on the Phrasal Expression list (Martinez & Schmitt, 2012). Findings showed that there was a notable rise in distribution of the spoken formulaic sequences from the EM to SH textbook series, but the corpus size of the EM textbook series was nearly as large as that of the JH textbook series. Moreover, based on Biber et al.’s (1999) classification of formulaic sequences, five types of spoken formulaic sequences were identified in the SH and JH textbook series, while the EM textbook series failed to include coordinated binomial phrases. The frequency of occurrence of each type was uneven among three graded textbook series. The most frequent form was lexical bundles, followed by idiomatic phrases, free combinations of verb + particle, inserts, and coordinated binomial phrases. The results suggest some implications for improving Taiwanese textbooks used in primary and secondary education in terms of spoken formulaic sequences.
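
The top-down, list-based counting described in this abstract can be sketched roughly as follows. The three sequences below are illustrative placeholders rather than the actual 346-item list, the function names are hypothetical, and the per-10,000-token normalisation is an assumption added so that series of different sizes can be compared.

```python
import re

def normalise(text):
    """Lowercase the text and reduce it to space-separated word forms."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))

def count_sequences(text, sequences):
    """Count whole-word occurrences of each listed sequence in the text."""
    clean = normalise(text)
    return {
        seq: len(re.findall(r"\b" + re.escape(seq.lower()) + r"\b", clean))
        for seq in sequences
    }

def rate_per_10k(text, sequences):
    """Total listed-sequence occurrences per 10,000 running words."""
    tokens = normalise(text).split()
    if not tokens:
        return 0.0
    return 10000 * sum(count_sequences(text, sequences).values()) / len(tokens)

# Placeholder sequences and an invented sentence, for illustration only.
sample_list = ["of course", "a lot of", "as well as"]
sample_text = "Of course, I have a lot of books. Of course!"
print(count_sequences(sample_text, sample_list))
print(rate_per_10k(sample_text, sample_list))
```

Reporting a normalised rate alongside raw counts makes it easier to interpret a rise across textbook series whose corpus sizes differ.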

09:50
Stance in "introductory it" construction: A comparative study of Korean EFL and native speaker argumentative writing

ABSTRACT. The construction of introductory it followed by an extraposed subject has been found to be a frequent feature in academic prose for its multiple functions in terms of information packaging and thematic choice. Recent research has focused particularly on rhetorical motivations for using the construction and demonstrated that it provides the writer with a rhetorical means of marking authorial stance while concealing the source of the stance expressed, giving the statement an appearance of objectivity. This study investigated how Korean university EFL students used this rhetorical means in their academic writing to encode stance, in comparison with a corresponding group of native speaker (NS) students. Instances of the construction were identified from two corpora of argumentative essays written by Korean university students and English L1 students and analyzed with an integrative classification framework of stance. Results showed that while the Korean EFL writers used the construction far more frequently for stance marking, especially to mark attitudinal stance, than their NS counterparts, their use was more limited in terms of lexical choice and its rhetorical function of depersonalized stance marking. This paper concludes with pedagogical suggestions on how to help novice EFL academic writers acquire the multi-faceted usage of the construction and appropriately use it in their writing.

10:15
Corpora and Translanguaging: Facilitating Language Acquisition in a Multilingual Context

ABSTRACT. The concept of translanguaging, in its initial version, referred to situations in which learners switched languages depending on receptive or productive use, e.g., they read a text in English and wrote a summary based on the text or discussed its content in Welsh (Baker, 2011). It is a process of creating meaning, shaping experiences, and gaining understanding and knowledge through the use of two or more languages; hence it has been widely applied in multilingual education. The overriding objective of this presentation is to prove that students’ linguistic repertoire is very flexible and open, which results in the sudden use of various linguistic elements and forms. This pedagogical strategy supports the acquisition and processing of the content. In addition, the use of several languages depending on their receptive or productive aspect has a positive influence on linguistic transfer. I will demonstrate how students of diverse linguistic and cultural backgrounds successfully acquire new academic language in Geography and Natural Sciences classes through exposure to texts in English, brainstorming ideas in their native languages and providing output through the translanguaging lens in the classroom. The students built knowledge based on the content and obtained access to academic language within the discipline. They helped one another during interactions, prompting missing words or expressions in English. Linguistic transfer repeatedly aided the provision of fast responses. Employing this pedagogy created a space where the students juggled languages to reach a communicative target.

09:00-10:40 Session 10D
09:00
Corpus-Based Semantic Analysis for Identifying Literary Appeal

ABSTRACT. The corpus-based approach in literary studies has been used to investigate word use and the specific arrangements of words that compose highly appreciated, aesthetic works of literature. This paper uses a similar approach to reveal the style of writing in two novels: Tess of the d’Urbervilles by Thomas Hardy, and Beloved, written by Toni Morrison in 1987. The study is qualitative and corpus-based. Data were derived using AntConc to find collocations and were then analyzed, guided by R.M.W. Dixon’s (2005) semantic classification. It is found that the use of semantic properties in Tess of the d’Urbervilles and Beloved was not very different, except in the use of verbs. Interestingly, the novel by Thomas Hardy is considered more creative, in that less repetition is used for each type of word. In addition, Hardy employs more variation and power in exploring combinations of verbs, and he addresses feelings, thoughts and moods widely in his story using these combinations. It can be concluded, although from a limited set of data, that the use of semantic properties in Thomas Hardy’s Tess of the d’Urbervilles is more varied.

09:25
The Use of “Gurindam 12” at the Opening Ceremony of Musabaqah Tilawah al-Qur’an in the Kepulauan Riau: Between Poetry and the Language of Culture Nowadays.

ABSTRACT. The society of Kepulauan Riau is a traditional Malay community. Malay society has an appreciation of classic language artworks by men of letters, dating back to around the third century. One of the literary works written by Indonesian writers is Gurindam 12, composed by Raja Ali Haji and now known as Gurindam 12 Raja Ali Haji. Gurindam 12 is one of the old Malay poems. It is usually used at the Musabaqah Tilawah al-Qur’an in Kepulauan Riau, with the aim of maintaining a culture that has been handed down through the generations. The opening ceremony of the Musabaqah Tilawah al-Qur’an in Kepulauan Riau is a massive annual event that offers the right momentum to preserve Malay culture in the form of poetry. This study analyzes the use of Gurindam 12 rhymes and their continued existence in competition with contemporary culture. Gurindam 12 rhymes contain many meanings that should be put into practice by people in the current era. The results of this study show that Gurindam 12 has never died out, especially in the souls of the Malay people of Kepulauan Riau. Malay youth in Kepulauan Riau in particular are very enthusiastic about the use of Gurindam 12 at the opening ceremony of the Musabaqah Tilawah al-Qur’an in Kepulauan Riau.

09:50
Contrasting Narratives: The Greek Financial Crisis in US Newspaper Editorials

ABSTRACT. The financial crisis in Greece reached dramatic heights in the summer of 2015, when the Syriza government and its creditors in the IMF and European Union came to a confrontation over austerity policies demanded by the creditor institutions. This crisis, and the public referendum called by Prime Minister Alexis Tsipras in 2015, received much attention in the editorial pages of US newspapers The New York Times and Washington Post. An SFL analysis of two sample editorials revealed differences in how each newspaper's editors characterized the Greek government, its creditors, and the Greek people, and in how the editorials attributed blame for the crisis. To test whether these portrayals were anomalous or representative of consistent characterizations in the editorial pages, corpora of over half a million words of editorial texts from each paper from 2013-2015 were compiled. Clauses representing one of the three main groups of social actors in the crisis were identified and examined for patterns in characterization. In each corpus, dominant patterns of representation that could be called 'narratives' or schema emerged which revealed consistency in how sympathetically the groups in question were characterized during this period. Hypotheses as to the ideological grounding of these representations are offered, as well as comments on these representations' potential effects on readers' understanding of the crisis.

10:15
Sundanese Women on the Move: A Corpus-Based Semiotic Study of Construction of Woman in the Sundanese Magazine Manglé (1958–2013)

ABSTRACT. By implementing corpus linguistics and semiotic approaches, the present study discusses the usage of five Sundanese nouns denoting woman, viz. geureuha, mojang, pamajikan, wanita and wanoja. The data were taken from Manglé magazine, which uses Sundanese language, and the analysis covers four different eras between 1958 and 2013: the Guided Democracy, the New Order, the Transition to Democracy and the Reform eras. The corpus-based analysis is carried out to investigate the frequency of the nouns and to study their meaning by means of collocation analysis. The result of this corpus-based analysis is used as the basis for a semiotic interpretation of how Sundanese women have been naturalized by the words used to signify them. This study found that the naturalization of Sundanese woman has occurred by means of verbal signs whose meanings are developed from metalanguage and connotation. The discussion also shows that Sundanese women who were initially depicted as dependent with regard to their traditional roles were becoming gradually portrayed as independent in terms of their existence in the public sphere. This indicates that the myth of woman’s best place in the domestic sphere has gradually shifted.

09:00-10:40 Session 10E
Location: Helinox Hall
09:00
Constructing English Lexical Bundles in Linguistics Textbooks

ABSTRACT. In an attempt to help EFL students in linguistics enhance their reading proficiency, this study identifies a total of 274 three- and four-word lexical bundles (Linguistics Academic Bundle List: LABL), drawing on a 1.14-million-word corpus of linguistics textbooks. These bundles were first investigated in terms of their structural properties. Results of the structural analysis showed that NP-based and PP-based bundles accounted for almost 80% of all bundles, consistent with previous studies showing that professional academic writing contains more intensive use of NPs than student writing (both native and non-native). The functional taxonomy of lexical bundles indicated that a high proportion of lexical bundles in the LABL had referential functions (84.9%), whereas far fewer were discourse organizers (8.6%) and stance expressions (6.5%). These results add corroborative evidence to those of previous studies. The current study also produces a few findings regarding the subject relatedness of the extracted lexical bundles, the use of colloquial expressions (i.e., “a lot of” or pronoun-framed bundles), a higher frequency of passive constructions, and a notable number of tokens of the bundle “around the world”. Pedagogical implications of these findings are also suggested.

09:25
Quotational Formulaic Expressions in Korean Academic Corpus

ABSTRACT. This paper explores the linguistic distributions and functions of quotational formulaic expressions in a Korean academic corpus. The corpus-based analysis provides substantial evidence for genre variation in quotation expressions, which contribute to the representation of human knowledge and information. The significance of quotation expressions in academic writing has been emphasized repeatedly in numerous studies, including Campbell (1990) and Hyland (1999; 2002). With respect to quotation expressions in Korean, most studies have focused on the grammatical forms of direct and indirect quotations or on the usage of quotations in specific genres such as newspapers (Han 2013; Kim 2004). This study utilizes formulaic expressions extracted from an academic corpus (2,072 papers published within the academic disciplines of humanities and social science) using the N-gram model (Lee et al. 2018) and explores genre-specific properties of direct and indirect quotations. Based upon formulaic sequences containing the quotation markers, we first identify the comprehensive distribution patterns of quotational expressions. Then a new annotation scheme identifying distinct quotation markers (lako, ko, lanun, hako, etc.), report verbs, embedded sentence types, etc. is developed to determine the related linguistic properties. Based on the annotation scheme, we categorize six different types of quotational forms, ranging from standard to transformed. The result of our analysis demonstrates that indirect quotation, as well as the standard type, is strongly preferred in the academic corpus, which reflects the sociolinguistic practice of academic literature, highlighting accuracy. We also argue that the rigorous usage of indirect quotation forms is linked with a hedging function that mitigates the writer’s commitment to arguments.

09:50
Processing Formulaic Expressions in Korean Corpora: Why is it challenging but significant?

ABSTRACT. Since Biber et al.’s (2004) pioneering research on lexical bundles (formulaic expressions, “FEs”) in English, a few studies have adopted a similar N-gram model to extract Korean FEs. Two main approaches have been identified: one is the space unit (ecel) based approach (Kim, 2009) and the other is the morphemic bundle approach (Choe et al., 2010; Nam et al., 2016). Whereas English FEs consist of multi-word units, the morpheme-based N-gram approach has clear advantages for identifying Korean FEs because it captures the distributions of significant morphemic units, including particles and verbal stems (Choi et al., 2010; Nam et al., 2016). However, as Lee et al. (2018) point out, morphemic N-gram processing over-analyzes certain morphemic variations as distinct units and results in significant misrepresentations of certain N-gram bundles. Thus, the frequency ranking of morphemic bundles does not function as a proper tool for identifying meaningful FEs. This study examines unique properties of Korean FEs and focuses on the challenges associated with developing complementary processes. We demonstrate how to implement pre-processing and lemmatization of predicate forms to provide better outcomes for phraseological units. A revised process of merging a verbal lexeme with the following dependent morpheme(s) significantly decreases the number of extracted meaningful unit candidates as compared to the purely morpheme-based analysis in Korean. Furthermore, we introduce a new methodology that facilitates the determination of Korean FEs by recycling N-gram outcomes and making constructive comparisons.
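
The merging step mentioned in this abstract might look roughly like the sketch below. It assumes the input is already morphologically analysed into (form, tag) pairs; the tag labels (VV, VA, VX for predicates; EP, EF, EC, ETN, ETM for endings) follow Sejong-style conventions and are an assumption, as are the function names and the example analysis. Lemmatization of predicate forms is not covered.

```python
# Assumed tag sets (Sejong-style labels); adjust to the tagger actually used.
PREDICATE_TAGS = {"VV", "VA", "VX"}
ENDING_TAGS = {"EP", "EF", "EC", "ETN", "ETM"}

def merge_predicates(morphemes):
    """Fuse each predicate stem with the dependent endings that follow it."""
    merged = []
    i = 0
    while i < len(morphemes):
        form, tag = morphemes[i]
        if tag in PREDICATE_TAGS:
            parts = [form]
            j = i + 1
            # Absorb every ending that directly follows the stem.
            while j < len(morphemes) and morphemes[j][1] in ENDING_TAGS:
                parts.append(morphemes[j][0])
                j += 1
            merged.append("+".join(parts))
            i = j
        else:
            merged.append(form)
            i += 1
    return merged

def ngrams(units, n):
    """Return contiguous n-grams over the (merged) unit sequence."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

# Hypothetical analysis of "볼 수 있다"; the tags are illustrative.
analysed = [("보", "VV"), ("ㄹ", "ETM"), ("수", "NNB"), ("있", "VV"), ("다", "EF")]
units = merge_predicates(analysed)
print(units)
print(ngrams(units, 2))
```

In this toy case five morphemes collapse into three units, so fewer N-gram candidates are generated than in a purely morpheme-based pass, which is the kind of reduction the abstract describes.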

10:50-12:30 Session 11A
Location: Choi Young Hall
10:50
Morphological complexity and lexical diversity in Bengali-speaking children with Down syndrome

ABSTRACT. In Bangladesh, approximately 2000 children are born with Down syndrome (DS) every year. This study describes the morphological complexity (MC) and lexical diversity (LD) of narratives produced by Bengali-speaking children with DS in comparison with age-matched typically developing (TD) children (9.5 – 10.5 years). The task was to narrate a frog story that was presented in picture form only. MC was measured by MLU, while LD was assessed through TTR and VOCD. The overall result indicates that Bengali-speaking children with DS exhibited significant deficiencies compared with their TD counterparts. More specifically, compared with the Bengali-speaking TD children, Bengali-speaking children with DS frequently used simple sentences, often a single word consisting of either a subject or a verb, which indicates the nature of their syntactic deficiency. Particular difficulties were found with vector verbs, case inflections, and definiteness markers. Regarding LD, the children with DS showed less diversity than their TD counterparts. The members of the DS group mainly depended on concrete words and lacked abstract words and rich vocabulary. Sometimes, to provide the name of objects or events. In addition, in narrative production they mainly produced single words with fewer postpositions and adjectives. The result also indicates that in a narrative activity the poorer MLU performance of children with DS correlates with their weaker LD scores. In sum, the MC and lexical richness of the Bengali language pose particular difficulties for Bengali-speaking children with DS.
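
Two of the measures named in this abstract have simple definitions that can be sketched directly: MLU as the mean number of units per utterance and TTR as the ratio of distinct word types to word tokens. The sketch assumes utterances are already segmented into morphemes, and the sample data are invented English glosses rather than material from the study. VOCD involves fitting a curve to repeated random samples and is not covered here.

```python
def mlu(utterances):
    """Mean length of utterance: average number of units per utterance."""
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)

def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total word tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Invented example: each utterance is a list of pre-segmented morphemes.
sample_utterances = [["frog", "jump", "-ed"], ["boy"], ["dog", "run"]]
sample_words = ["frog", "boy", "frog", "dog"]
print(mlu(sample_utterances))
print(ttr(sample_words))
```

TTR is sensitive to sample length, which is one reason studies often report it together with a measure such as VOCD.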

11:15
A Dictionary-based MWE Level list and its Application

ABSTRACT. Ramisch et al. (2013) report that multiword expressions (MWEs) account for a significant part of language use (e.g. 51.4% out of 117,827 nouns and 25.5% out of 11,558 verbs in WordNet are MWEs). This indicates that in order to evaluate the level of difficulty of a text, MWEs have to be deeply and widely examined (Martinez and Schmitt 2012; Martinez and Murphy 2011; and others). However, the MWE lists currently available, such as the PHaVE list (Garnier and Schmitt, 2015), are not of great use in that they were created largely for a receptive purpose (e.g. the phrasal expression list) and underestimate the level of difficulty of MWEs. In this respect, the present talk suggests a dictionary-based list consisting of 13,739 MWEs for both receptive and productive purposes. The development and rationale of the list will be discussed, and then an experiment to evaluate the feasibility of the current MWE list will be conducted. The current MWE list has the potential to improve the accuracy of multiword expression processing as well as the analysis of text difficulty.

11:40
Distinction of English Near-synonyms in Contextual Uses between Destiny and Fate by means of Binary Opposition Strategy

ABSTRACT. There have been many heated debates about the definitions of some near-synonyms. In particular, it is very difficult for non-native speakers of English to distinguish among English synonyms and use them in appropriate contexts, even when they look up the meanings in an English dictionary, which are sometimes more confusing. The objective of this study is to explore the distinctions between Destiny and Fate by means of the ‘Binary Opposition (BO) Strategies’ proposed by Kim (2015). The basic idea of BO is that in order to distinguish sharply and effectively between a pair of near-synonyms, it is necessary to focus on a pair of (contextual) BO features, such as [actual vs. non-actual], [cause vs. effect], [controllable vs. uncontrollable] and [changeable vs. unchangeable], etc., and identify the critical differences between them. The analysis demonstrates that in the case of non-interchangeable uses, ‘destiny’ and ‘fate’ are distinguished in context by BO features like 1) [+divine vs. -divine] (which means destiny can be influenced by man’s actions, whereas fate is seen as divinely planned), 2) [+controllable vs. -controllable] (which means destiny is like an appointment you can make or change, whereas fate is like death you cannot avoid), 3) [+changeable vs. -changeable] (which means destiny is something we can actively shape and change, whereas fate is based on the notion that there is a natural order in the Universe which cannot be changed, no matter how hard we try), and 4) [future vs. past] (which means destiny relates to the probable to almost certain future, whereas fate relates to events of the past).

12:05
The phraseology of non-high-frequency words

ABSTRACT. The paper proposes to remedy the current lack of awareness of the various contributions to language use, specifically phraseology, made by the lower frequency bands of the lexicon. It has been proved that phraseology is pervasive in all language fields and it has become established recently as a discipline (Sinclair, 1991; Altenberg, 1998; Cowie, 1999; Wray, 2005; Granger and Meunier, 2008). However, research on the semantic and phraseological tendency of words has been mostly focused on high frequency words in both phraseology and lexicography. The phraseological features of non-high-frequency (NHF) words are mostly neglected.

This paper addresses the following research questions: (1) Do NHF words form strong phraseological patterns in the texts? (2) Is there a positive correlation between the frequency of NHF words and the number of phraseological patterns formed by these words and their co-occurring words?

The COCA (Davies, 2008) wordlist was used and the words were grouped into three sub-lists in terms of word frequency. Twenty NHF sample words were randomly selected from the mid-frequency and low-frequency wordlists and their core collocates within the co-text were examined and the salient phraseological patterns were spotted. It was found that most NHF words do not form many strong patterns with their co-occurring words, which partly explains their comparatively low frequency in the corpus. Besides, no positive correlation between the frequency of NHF words and the number of phraseological patterns formed by these words and their co-occurring words was observed in this study.

10:50-12:30 Session 11B
10:50
The Corpus-based Study on the Syntactic Productivity of Chinese Separable VO Compounds

ABSTRACT. This study conducts a hapax-based analysis to examine the syntactic productivity of Chinese separable VO compounds in two different genres of corpus databases. While Chinese separable VO compounds are recognized as words, they can also be separated into phrases via syntactic reanalysis. While the hapax-based analysis has often been employed to measure the productivity of suffixes, this study attempts to apply the method to examine the productivity of Chinese separable VO compounds focusing on the number of syntactically reanalyzed novel expressions. The corpus data of Weibo and newspaper articles were selected as databases from BCC corpus developed by Beijing Language and Culture University to compare the syntactic productivity of separable VO compounds between refined language usages (newspaper) and non-canonical internet language usages (Weibo), which resemble natural spoken language. For the analysis, 25 VO compounds were selected from the HSK vocab list. Tokens of separated VO compounds were analyzed from each corpus database. The result of a z-test revealed that the overall syntactic productivity of the newspaper corpus (26.29%) and Weibo corpus (15.6%) differed significantly (z = 20.906, p < 0.001) from each other. The lower productivity of separated VO compounds in Weibo indicates that separated VO compounds may be used with more fixed patterns nowadays compared to canonical usages. Also, while longer and more complex forms are preferred in written language context, it is possible that language speakers tend to opt for simple and efficient forms in casual language settings.
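The two quantitative steps mentioned in this abstract, a hapax-based productivity score and a z-test comparing two proportions, can be sketched as below. This is a generic illustration, not the authors' procedure: productivity is taken as hapax legomena divided by tokens, the z-test uses a pooled standard error, and all counts are invented toy values rather than figures from the study.

```python
import math
from collections import Counter


def hapax_productivity(expressions):
    """Share of tokens that are hapax legomena (forms occurring exactly once)."""
    if not expressions:
        return 0.0
    counts = Counter(expressions)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(expressions)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Toy data: separated VO expressions observed in a corpus (invented).
toy_expressions = ["a", "a", "b", "c"]
productivity = hapax_productivity(toy_expressions)

# Toy counts: 30 hapaxes in 100 tokens vs. 15 hapaxes in 100 tokens (invented).
z = two_proportion_z(30, 100, 15, 100)
print(productivity, round(z, 3))
```

A positive z here means the first corpus has the higher hapax proportion; swapping the two corpora flips the sign.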

11:15
An analysis of the Thai reflexive pronoun tuaʔeeŋ ‘self’: Insights from Thai-English parallel corpora

ABSTRACT. The Thai reflexive pronoun tuaʔeeŋ ‘self’ has distinctive properties when compared to reflexives in English and East Asian languages (Hoonchamlong, 1991). This study aims to examine: (1) similarities and differences between the Thai and English reflexive pronouns and (2) co-occurrence patterns of tuaʔeeŋ and their meanings. A total of 200 concordance lines of tuaʔeeŋ and its English equivalents in the English-Thai parallel corpora were analyzed in terms of their grammatical positions and semantic correspondence. It was found that 52% of tuaʔeeŋ instances occurred when English reflexive pronouns -self/-selves (Peter told himself) were used in the source texts, predominantly as an object of a transitive verb, thereby confirming their correspondence in grammatical and semantic properties as reflexive pronouns. What is more interesting, however, is that over 40% of tuaʔeeŋ were used to express other meanings in English; 22% were used in translation of an English possessive NP construction (his ears) while 21% were opted for in Thai translation when no traces of English equivalents can be precisely identified. There were also cases wherein tuaʔeeŋ, unlike English reflexive pronouns, occurred in the subject position of an embedded clause. These distinctive uses of tuaʔeeŋ were examined thoroughly in the light of their co-occurrence patterns. We argue that the use of tuaʔeeŋ in the Thai translation is associated with differences in verb subcategorization frame, construction choice as well as phraseology of the Thai and English languages.

11:40
Comparative Study of Negative Sentence Based on Korean-Chinese Bilingual Parallel Corpus

ABSTRACT. Korean has deep historical ties with Chinese. Chinese expresses grammatical relations and meanings through word order and function words, while Korean mainly relies on additional components (suffixes). Negative sentences are a typical grammar item at the initial stage of Chinese teaching. However, we found that even South Korean students at the intermediate or advanced stage do not have a good grasp of the usage of negative sentences, and they make many typical errors. This paper focuses on the comparative analysis of explicit negative sentences in Korean and Chinese: Chinese is represented by "bu" and "mei you", and Korean by "anda" and "moda". We will use the authentic Korean Parallel Corpus "21st Century Sejong King Corpus" to look for syntactic differences and connections in negation across sentence patterns, and express them in the form of a Semantic Map Model.

12:05
Effects of verb-construction association on second language processing of English argument structure constructions

ABSTRACT. This study investigated how association strength between a verb and a construction affects second language (L2) learners’ reading speeds on English argument structure constructions. Previous studies have shown that L2 learners produce sentences containing less frequent verbs as their writing proficiency increases, indicating the learners’ sensitivity to verb-construction association in production (e.g., Kyle & Crossley, 2017). However, little is known about how L2 sensitivity to verb-construction association extends to L2 sentence processing. To address this issue, this study tested whether sentences showing weaker verb-construction association cause more processing difficulties for L2 learners. In corpus-based analyses, we calculated verb-construction association strength for 24 English argument structure constructions based on collexeme analysis (Stefanowitsch & Gries, 2003). In a subsequent self-paced reading task, we analyzed 66 Korean-L1 English-L2 learners’ reading time patterns for the 24 sentences in comparison with 27 native speakers of English. The results of the self-paced reading task indicated that the L2 learners showed increasing reading times for integrating information between a verb and a construction as the verb-construction association became weaker. In contrast, the native speaker control group showed no sign of difficulty contingent upon verb-construction association. These results suggest that the association strength between a verb and a construction significantly influences L2 learners’ integration of the verb with the construction during real-time sentence processing. Our findings support the main idea of the usage-based models that distributions of language input play a key role in language learning and use.
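Collexeme analysis is commonly computed from a 2x2 table of verb and construction frequencies using a Fisher exact test. The sketch below shows one way to do this with the standard library. The function names, the one-tailed formulation, the use of the negative base-10 logarithm as a strength score, and the toy counts are assumptions made for illustration and are not details taken from the study.

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher exact p-value: probability of a co-occurrence count
    of at least `a`, given the margins of the 2x2 table.

    a: verb in the construction        b: verb in other constructions
    c: other verbs in the construction d: other verbs in other constructions
    """
    row1 = a + b          # total frequency of the verb
    col1 = a + c          # total frequency of the construction
    n = a + b + c + d     # corpus size (all verb tokens)
    upper = min(row1, col1)
    favourable = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / math.comb(n, col1)


def collostruction_strength(a, b, c, d):
    """Association strength as -log10 of the one-tailed p-value."""
    return -math.log10(fisher_right_tail(a, b, c, d))


# Toy counts, invented for illustration.
p = fisher_right_tail(1, 0, 0, 1)
strength = collostruction_strength(1, 0, 0, 1)
print(p, round(strength, 3))
```

Larger strength values indicate a stronger attraction between the verb and the construction; exact integer arithmetic keeps the p-value stable for small tables.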

10:50-12:30 Session 11C
Location: IBK Hall
10:50
The development of collocation use in second language writing: A quasi-experimental method

ABSTRACT. Collocations, prefabricated multi-word combinations, are considered to be a crucial component of language competence which indicates the central role they should play in language teaching and learning. However, collocations remain a challenge to L2 learners at different proficiency levels.

This study, therefore, used data from 100 Chinese students of English from a Chinese university and employed a quasi-experimental method with a control-group research design. It aims to compare the uses of collocation through the Data-Driven Learning (DDL) teaching approach and the use of a computer-based dictionary with a traditional teaching approach. One of the experimental groups used #Lancsbox (Brezina, McEnery & Wattam, 2015), an innovative and user-friendly corpus tool. The other experimental group used the computer-based Oxford Collocations Dictionary. Writing pieces were collected regularly from learners within 15 weeks and compiled into a learner corpus. Collocations were extracted from the learner corpus and their mutual information and log-dice scores were calculated by consulting the British National Corpus.

Results indicated that learners in all three groups demonstrated a tendency to use more collocations at the end of treatment. However, the percentage of acceptable collocations used by the DDL group increased after revising, while the other two groups showed either a weak decline or a stable tendency. Moreover, the collocations produced by learners showed different distributions of association strength according to MI and log-dice bands. The findings contribute to our understanding of the effectiveness of DDL for teaching collocations and L2 learners’ collocation development over time.
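The two association measures used in this study can be written out in their commonly cited forms: MI as the base-2 logarithm of observed over expected co-occurrence, and log-dice as 14 plus the base-2 logarithm of the Dice coefficient. The sketch below is illustrative only; the function names and the toy frequencies are assumptions, and the study's own window settings and reference-corpus counts are not reproduced here.

```python
import math


def mutual_information(cooc, freq_node, freq_collocate, corpus_size):
    """MI = log2(observed / expected), with expected = f1 * f2 / N."""
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(cooc / expected)


def log_dice(cooc, freq_node, freq_collocate):
    """logDice = 14 + log2(2 * cooc / (f1 + f2)); independent of corpus size."""
    return 14 + math.log2(2 * cooc / (freq_node + freq_collocate))


# Toy frequencies, invented for illustration.
mi = mutual_information(cooc=8, freq_node=16, freq_collocate=32, corpus_size=1024)
ld = log_dice(cooc=8, freq_node=16, freq_collocate=16)
print(mi, ld)
```

MI tends to reward rare, exclusive pairings, while log-dice has a fixed maximum of 14 and does not depend on corpus size, which is why the two are often reported together.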

11:15
A study of highlighting cultural information of English Learner’s dictionaries through corpora

ABSTRACT. With the requirements of the New Standard of English Curriculum in China, cultural awareness has been attracting more and more attention in foreign language teaching. However, the cultivation of cultural awareness is not very effective in English teaching. Words are regarded as a carrier of culture. Dictionaries are essentially cultural artefacts, and English learner’s dictionaries, as pedagogical dictionaries, should not only include cultural words and idioms but also contain the implicit cultural information of common words. Word definitions and examples are indispensable to dictionaries, but they are inadequate for presenting cultural information, which is reflected in the following aspects: 1) word senses with a register label, stylistic label or regional label; 2) words with cultural meanings presenting different semantic prosodies in different contexts; 3) common words presenting cultural information by collocating with other words. This paper takes some examples to highlight the cultural information of three English learner’s dictionaries, namely the Oxford Advanced Learner's English-Chinese Dictionary, the Longman Dictionary of Contemporary English, and Merriam-Webster's Advanced Learner's English Dictionary, by means of corpora such as the British National Corpus, the Corpus of Contemporary American English, the TV Corpus and the Movie Corpus, from the perspectives of word definitions and examples, in order to improve the cultivation of cultural awareness among English learners.

11:40
Influence of L1, L2 Proficiency, and Task Types on Lexical Features of L2 Speeches by English Learners in Asia: A Study Based on the ICNALE Spoken Dialogue

ABSTRACT. It has been widely known that learners’ L2 performance can change according to various parameters concerning learners and/or tasks. Cervantes and Gablasova (2019), for instance, examine the Trinity Lancaster Corpus and show that the types of high-frequency phrasal verbs change according to learners’ proficiency levels. Caines & Buttery (2019) analyze the Cambridge Learner Corpus and conclude that lexical features, word-class frequency, and subcategorization frames describing verbs and their arguments are all influenced by task-topics. However, fewer learner corpus studies have focused on learners in Asia. This study, therefore, used the ICNALE Spoken Dialogue (Ishikawa, 2018), which currently includes approximately 600,000 tokens of L2 English speeches by Asian college students collected in interview settings, and examined how learners’ L1 (Japanese, Chinese, Thai, etc.), L2 proficiency (A2, B1 low, B1 upper, and B2+) and task types (picture descriptions, role-plays, and casual conversations) influence (a) the number of tokens, (b) the number of types, (c) lexical variety (Herdan’s C), (d) word level, (e) the ratio of five basic word classes (personal pronouns, nouns, verbs, adjectives, and adverbs), (f) the number of interjections, and (g) the number of non-words such as false starts. Our quantitative analyses have shown that both the learner parameters and the task parameters have a significant influence on these lexical indices. This study suggests the importance of paying due attention to learner/task variables in the analysis of learner corpus data.
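Of the indices listed in this abstract, Herdan’s C has a compact definition: the logarithm of the number of types divided by the logarithm of the number of tokens. A minimal sketch follows; the function name and the toy token list are assumptions for illustration, and no ICNALE data are used.

```python
import math


def herdan_c(tokens):
    """Herdan's C = log(types) / log(tokens); undefined for fewer than 2 tokens."""
    n_tokens = len(tokens)
    if n_tokens < 2:
        raise ValueError("Herdan's C needs at least two tokens")
    n_types = len(set(tokens))
    return math.log(n_types) / math.log(n_tokens)


# Toy speech sample: 16 tokens drawn from 4 types (invented).
sample = ["a", "b", "c", "d"] * 4
c_value = herdan_c(sample)
print(round(c_value, 3))
```

Because both counts are log-transformed, C varies less with text length than raw TTR, although it is not fully length-independent.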

12:05
Learner interactions and perceptions of a new pattern-based referencing tool

ABSTRACT. This process-based study examined how EFL college students interacted with a new collocation tool that has a new interface and provides advanced search query features. A class of college students participated in this study. In order to elicit students’ tool consultation behaviors, a vocabulary test with collocation questions was designed. The students’ use of the tool to answer the questions was screen-recorded for further analysis, serving as the major data source. After analyzing the students’ online search behavior and questionnaires, one-on-one interviews with selected students were conducted to clarify issues related to the study and their experience in using the tool. The findings indicated that the pattern-based tool was efficient in helping students solve collocation problems and students were positive about the new tool. This study concludes with some pedagogical implications and suggestions for further research.

10:50-12:30 Session 11D
10:50
Different Conceptualisations of Medical Term Chest Pain in Corpora of Doctors, Nurses, Patients and Carers

ABSTRACT. Effective communication between healthcare professionals and patients can be challenged by differences in the use of medical terminology. One of the most common causes of emergency department admissions in Australia is chest pain, which is discussed using a wide range of alternative terms and different descriptions that are likely to embody different conceptualisations of the medical condition among those involved in healthcare -- doctors, nurses, patients, and carers. This research aims to explore their different conceptualisations of chest pain, to improve healthcare communication in acute care settings.

A corpus of over 1.5 million words was compiled from medical and nursing journal articles and patients’ and carers’ monologues relating to chest pain, consisting of three subcorpora of outputs by 1) doctors, 2) nurses, and 3) patients and carers. This highly specialised corpus allows us to search for alternatives and variants of chest pain, and to investigate collocational strength between terms, using two different association measures: mutual information (MI) and delta-P (ΔP).

The collocational strength of chest pain was examined by looking at the MI scores of chest and pain, each taken in turn as the node word. The result shows an exact match of MI scores for chest pain in the doctors’ corpus, showing the strong co-occurrence of the words within the term for doctors. In contrast, the MI scores were unmatched for the other groups, indicating that the collocation is much weaker for them. Comparing ΔP2|1 and ΔP1|2 for each corpus revealed that collocational variations are more diverse for nurses than for the others.
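Delta-P is a directional measure, which is why the abstract reports both ΔP2|1 and ΔP1|2. A minimal sketch of the usual definition from a 2x2 contingency table is given below; the function name, the cell layout, and the toy counts are assumptions for illustration and do not reproduce the study's data.

```python
def delta_p(a, b, c, d):
    """Return (dP(2|1), dP(1|2)) from a 2x2 contingency table.

    a: word1 with word2       b: word1 without word2
    c: word2 without word1    d: neither word
    """
    # dP(2|1): how much word1 raises the probability of word2.
    dp_2_given_1 = a / (a + b) - c / (c + d)
    # dP(1|2): how much word2 raises the probability of word1.
    dp_1_given_2 = a / (a + c) - b / (b + d)
    return dp_2_given_1, dp_1_given_2


# Toy counts for a pair such as "chest" (word1) and "pain" (word2), invented.
forward, backward = delta_p(80, 20, 40, 860)
print(round(forward, 3), round(backward, 3))
```

In this toy table the forward value exceeds the backward value, meaning word1 predicts word2 better than the reverse; a symmetric measure such as MI cannot show that asymmetry.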

11:15
The role of body location in iconicity: the relationship between iconicity and location difficulty

ABSTRACT. This study tested whether iconicity interacts with articulatory difficulty of locations. Previous research by Lin (2019) has found that the distribution of difficult handshapes and easy handshapes does not significantly differ across pure iconic and metaphorical iconic signs. However, it is not clear if this is also true for location iconicity. Ann (2006) proposes a system for ease of articulation of handshapes based on physiology.

In this study, we also construct a model for determining the ease of articulation of locations based on physiology. The model assumes that lower positions are easier to produce and higher positions are more difficult to produce. Battison (1978) divided locations into two categories: the head is a difficult location; and the trunk, arm, and weak hand are easy locations.

We collected 120 signs from the TSL Online Dictionary (Tsay et al. 2015) and ran a statistical analysis. The analysis contrasted 60 pure iconic signs with 60 metaphorical iconic signs, in which pure iconic signs included real physical objects and metaphorical iconic signs did not. The analysis found a significant result: pure iconic signs tend to have easy locations and metaphorical iconic signs tend to have difficult locations. This result suggests that iconicity of body location influences lexical structure when imitating a real object referent. The form of pure iconic signs may rely on easy locations since it is more abundant in shape combination with weak hand and arm locations.

11:40
Discourse Analysis of Young Adults’ Sensemaking and Attributions of Dating Apps in Manila, Philippines

ABSTRACT. Digital platforms have reshaped romantic intimacies. Mediated spaces like mobile dating applications have paved the way for meaningful interactions in both romantic and sexual relationships. Individuals curate themselves in mobile dating applications based on certain desires and needs. Certain reasons motivate a person to self-present. People also attribute their successes and failures to external and internal factors. This paper answered the following questions: 1. How do young adult mobile dating app owners make sense of their mobile dating app experience via their self-presented selves? 2. How do young adults in Manila, Philippines attribute their successes and failures in the use of mobile dating apps?

The study used a descriptive case study communication research design. The approach was constructivist with a slant on how individuals perform in technology. The informants’ musings and insights, gathered through in-depth interviews, were thematically analyzed. The study found that the presentation process allowed individuals to create an online self that contained information that they thought reflected their online intents. I found that the mobile dating app users utilized tactical routes or courses to convince their dating matches of their value. Further, successful Tinder and Grindr dating was evidenced by actual chats and dates, and, for some, sexual encounters. An unsuccessful engagement was equated with any of the following user experiences: being snubbed before or after an app match, being blocked by users in the app, or being stood up before an actual date.

12:05
CONCEPTUAL BLENDING IN METAPHORS IN THE 2016 PRE-ELECTION CAMPAIGN

ABSTRACT. The article examines metaphor as transferring features of the social world onto other elements of reality, the 2016 pre-election campaign in particular; the theory of conceptual integration of G. Fauconnier and M. Turner is used to analyze the metaphor. Metaphor in political discourse represents an interaction of two mental spaces. The first one corresponds to the source domain, which is an element of the social world, whereas the second one is the target domain, which represents an element of the political world. The result of the interaction is the blended mental space; on the basis of its productiveness, several models of metaphor are distinguished. The analysis reveals convergent and divergent features of metaphor in the pre-election campaigns of Donald Trump and Hillary Clinton. Metaphor in D. Trump’s texts tends to focus on the conceptual models ‘we’ and ‘they’, as shown by quantitative analysis, whereas in H. Clinton’s texts the ‘divided nation’ model is mostly described through metaphor.

10:50-12:30 Session 11E
Location: Helinox Hall
10:50
Corpus Linguistic Perspectives on Issues Related to the Extraction of Sentiment Expressions from User Review Corpora

ABSTRACT. This study analyzes high-frequency patterns and addresses issues regarding sentiment analysis in two Korean corpora, one compiled from apparel reviews and the other consisting of film reviews. First, it has been necessary to extend the unit of evaluative language analysis from simple words to 'extended lexical units' (Stubbs 2009). This study is in line with Sinclair's (2004) views by adopting a corpus-driven approach for the inclusion of patterns and phrases as the units of analysis. 1,000 sentiment related expressions have been extracted from each corpus, which constitute around 50% (apparel review corpus) and 40% (film review corpus) of the phrases. Secondly, the study has examined the domain specific aspects of the sentiment expressions by comparing the high-frequency patterns of each corpus. The analysis then shows around 10% (44 patterns) agreement between the two domains, which is a rather low result. This demonstrates that the sentiment expressions extracted vary considerably depending on domains and suggests that genre characteristics and domains need to be taken into consideration in the extraction of sentiment expressions.

11:10
Classification of Metonymic City Names in Newspaper Corpus

ABSTRACT. Most research on metonymy in Korean is based on the cognitive approach. Previous literature reports that metonymic patterns are quite common in the Korean language, with ‘places’ often referring to events, people, institutions, etc. rather than strictly geographic locations. However, we expect that in reality, patterns of metonymy in Korean are more complex. To the best of our knowledge, there is no existing corpus-based research on metonymy in Korean. Hence, we will examine the patterns of metonymy with city names in Korean from the perspective of corpus linguistics. This study explores the use of metonyms with city names in a Korean newspaper corpus and extensively tests their reliability. First, we identify all instances of selected city names in headlines from three major Korean newspapers: Chosun Ilbo, Joongang Ilbo, and the Hankyoreh. They are extracted from news items, editorials, sports articles, and others from the ten-year period between 2000 and 2009. We then manually annotate those instances, determining whether the city name is used literally or metonymically. We believe that our study contributes to the field of linguistics both theoretically and technically. We attempt to illustrate characteristics of metonymic patterns and the real distribution of literal and metonymic readings (including patterns) in the corpus. This exercise extends our understanding of metonymy from the theoretical perspective. Furthermore, our study provides objective and comprehensive data which will help improve NLP systems in the future. Comprehensive data on metonymy can serve as a basis to implement NLP algorithms on naturally occurring texts.

11:30
On the Functions and Meanings of SEA Constructions - focused on 'po-da', 'siph-da', 'ha-da'

ABSTRACT. The purpose of this study is to examine the grammatical functions and meanings of auxiliaries such as ‘po-da(see)’, ‘siph-da(want)’, and ‘ha-da(do)’ when they are combined with sentence-final endings. In preceding studies, ‘sentence-final ending + auxiliary’ constructions have been termed “SEA (Sentence Ending Auxiliary) constructions”, and their grammatical category has been proposed to be auxiliary verb constructions, quotation constructions, or modality verb constructions. This study selected ‘po-da(see)’, ‘siph-da(want)’, and ‘ha-da(do)’ as subjects because they have similar syntactic and semantic characteristics. The SEA constructions examined in this paper are different from general auxiliary verb constructions. Unlike general auxiliary verbs, the SEA constructions take complete sentences including sentence-final endings as supplements. Furthermore, according to dictionary descriptions, these three verbs can be assigned to various grammatical categories such as main verb, auxiliary verb, and affix, and the meanings of these verbs are presented differently depending on their preceding sentence-final endings. As mentioned above, the purpose of this study was to examine the actual meanings and functions of SEA constructions including ‘po-da(see)’, ‘siph-da(want)’, and ‘ha-da(do)’. Therefore, in this study, spoken and written corpora were used to identify their usages and which sentence-final endings precede these three verbs. Based on corpus analysis, this paper describes what grammatical category each construction belongs to and what meanings language users express by using these constructions.

11:50
Corpus-based study on combination among endings in Korean

ABSTRACT. The purpose of this study is to investigate the frequency of the contemporary Korean endings in the corpus data. Korean is characterized by the presence of hundreds of endings and a variety of meanings by combining several endings to the stem. The possible combinations are theoretically close to infinity. However, if we take into account the fact that Korean speakers use these combinations unambiguously, it is expected that there will be certain patterns in actual use. First, we examine the usage pattern of each ending by calculating the frequency in each register. In this study, we identify the ending patterns that occur at a high frequency according to the specific environment and explain why these differences occur. Second, binding patterns of ending complexes are compared. We analyze how each ending appears depending on the number and types of other endings. Based on this, we suggest what kind of ending combinations can be processed as a unit in terms of language use. Third, we observe the constraints or tendencies found in ending combination. In this study, we explain the reasons for the limitations of ending combination from various perspectives and explain the frequency difference of the endings according to the environment. The results of this study can provide a theoretical basis for actual conjugation types by showing individual ending lists and their combinations of Korean. And Korean learners who have difficulty in acquiring various ending combinations can learn verb conjugation more easily based on the frequency information.

12:10
The politeness of Korean second personal pronoun- Contrast with other languages-

ABSTRACT. The purpose of this study is to examine the politeness of Korean second personal pronouns by comparing them with those of other languages that avoid pronouns when indicating politeness. This study is meaningful in that it can more objectively examine the politeness of Korean second personal pronouns. Lyons (1977) suggested that in some languages, personal pronouns are grammaticalized according to social status and role. Korean is a language in which personal pronouns are determined according to social status and anecdotal role. In particular, the second personal pronoun is selected according to the human relationship, age, social relation, etc. In Helmbrecht (2005), 207 languages are distinguished according to Politeness Distinctions in Pronouns: (1) No politeness distinction, (2) Binary politeness distinction, (3) Multiple politeness distinction, (4) Pronouns avoided for politeness. Korean is a language that avoids second person pronouns when expressing politeness. Type (4) includes Japanese, Thai, Vietnamese, Burmese, Indonesian, and Khmer in addition to Korean. In these languages, appellations, kinship terminology and titles are used instead of second personal pronouns to indicate politeness. This paper examines how second person pronouns are used to express politeness through the contrast between type (4) languages and Korean. The origin of these languages cannot be regarded as the same because Korean and Japanese are agglutinative languages, and Vietnamese and Thai are isolating languages. Therefore, this paper aims to describe in detail the similarities and differences among the languages of type (4) from the viewpoint of contrastive linguistics.

13:30-14:30 Session 12: [Plenary 2] Vaclav Brezina. Building and analyzing large corpora: the case of BNC2014 and TLC

Plenary Session

Location: Grand Ballroom
13:30
Building and analyzing large corpora: The case of BNC2014 and TLC

ABSTRACT. TBA

14:40-16:20 Session 13A
Location: Choi Young Hall
14:40
BROKEN-HEARTED GENERATION: A LEXICAL ANALYSIS

ABSTRACT. A video uploaded on Instagram by @sobatambyar went viral because it contained footage of a young man standing in the front row at Didi Kempot's Campursari (a combination of traditional Javanese and pop music) concert in Taman Balekambang, Solo, Central Java. The young man sang, danced, and cursed along to the song "Cidro" (hurt), one of Didi Kempot's hits, which is about heartbreak. Since then, Didi Kempot, a senior Javanese musician, has been dubbed "The Godfather of Broken Heart" by his young diehard fans. Young adults in Indonesia are known to be closely attached to social media. They grow up with media that contribute to shaping their identity. Therefore, social media becomes a role model for teens and young adults in the country. Sadly, the recent role model revolves around stories of separation and heartbreak. The emergence of a generation that is "meek and prone to heartbreak" and, at the same time, fond of Campursari songs is worthy of linguistic research. This study attempts to carry out a lexical analysis of postings made by @sobatambyar, which started the wave of heartbreak among young adults in Indonesia. The researcher uses photo captions as the source of data and analyzes the lexical characteristics that emerge in them. The results indicate that a number of nouns use prokem (slang words in Indonesian) to express a heartbroken condition, and that popular Javanese and Indonesian verbs and adjectives punctuate the young adults' feelings.

15:05
Exploring the relationship between corpora and word association

ABSTRACT. Two broad, dichotomous assumptions have been made about word association in the 2000-year history of research into the phenomenon: firstly, that it is a largely semantic process, and secondly that it is largely reflective of textual co-occurrences. This presentation seeks to show that these approaches should not be seen as mutually exclusive. A novel experiment will be described showing that entropy of the textual distribution of a given cue word interacts with both the number of responses given to that cue in a word association test, and also the grammatical class of those responses. However, a qualitative examination of the same set of cues also offers clear evidence of a semantic selectivity to these responses. These findings will be discussed in the context of usage-based models of language which emphasise the role of both textual distributions and cognitive processing on linguistic knowledge. One particular cognitive process present in these models, that of abstraction, is presented as a key mechanism in the establishment of the independent semantic representations of words from which semantic word associations emerge. These models offer a view of word association as a complex, multi-stage phenomenon involving phases of exemplar-based learning, cognitive processing of linguistic knowledge, and finally the generation of word association responses.
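
The entropy measure mentioned above can be illustrated with a minimal sketch. The abstract does not say how the "textual distribution" of a cue word is operationalised, so this example assumes it is the distribution of words found within a fixed window around the cue; the function name `context_entropy` and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def context_entropy(tokens, cue, window=2):
    """Shannon entropy (in bits) of the words co-occurring with `cue`.

    Assumption: the 'textual distribution' of a cue is taken to be the
    distribution of tokens within `window` positions of each occurrence.
    """
    contexts = Counter()
    for i, tok in enumerate(tokens):
        if tok == cue:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[tokens[j]] += 1
    total = sum(contexts.values())
    if total == 0:
        return 0.0  # cue absent or isolated: no distribution to measure
    return -sum((c / total) * math.log2(c / total) for c in contexts.values())
```

A cue that always appears with the same neighbours yields low entropy, while a cue spread over many different contexts yields high entropy; the study relates this kind of value to the number and grammatical class of association responses.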

15:30
Academic word list of biology: a corpus-based research of articles in life sciences

ABSTRACT. The most influential word list in recent years has been the Academic Word List (AWL), containing 570 word families (Coxhead, 2000) beyond the 2,000 most frequent word families in general English of the General Service List (West, 1953). The methodology of compiling word lists is now well-established and has been confirmed in a range of studies across genres, types of text and corpora (Brezina & Gablasova, 2013; Dang, 2018) and subject areas (Coxhead & Demecheleer, 2018; Lei & Liu, 2016; Liu & Han, 2015; Tongpoon-Patanasorn, 2018; Valipouri & Nassaji, 2013). This paper presents a discipline-specific Biology Academic Word List (BAWL) of written texts in life sciences. To this end, a corpus of modern biology, the Lomonosov Corpus of Modern Biology (LCoMB), was designed. The corpus is compiled from research articles and review articles in high-impact scientific journals (Nature, Science, Cell, Annual Reviews, etc.) across a comprehensive sample of major areas of biology (physiology, neurobiology, genetics, molecular biology, etc.) and comprises 4 million running words. The word families were selected on the basis of high levels of frequency, range and uniformity/dispersion. The identified word families provide a significant degree of coverage of written texts in biology and overlap with the AWL. The BAWL provides a useful academic word pool for advanced English learners and early career researchers who need to read and publish articles in English, and may have multiple applications in teaching EAP, including research-informed usage-based materials design and assessment of the vocabulary load of teaching/testing materials.
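
The three selection criteria named above (frequency, range, dispersion) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it works on plain word forms rather than word families, assumes equally sized subcorpora, uses Juilland's D as the dispersion measure, and all thresholds and function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def juilland_d(freqs):
    """Juilland's D for per-subcorpus frequencies (equal-sized parts assumed)."""
    n = len(freqs)
    m = mean(freqs)
    if n < 2 or m == 0:
        return 0.0
    variation = pstdev(freqs) / m  # coefficient of variation
    return 1 - variation / math.sqrt(n - 1)


def select_words(subcorpora, min_freq, min_range, min_d):
    """Return words meeting frequency, range and dispersion thresholds."""
    counts = [Counter(tokens) for tokens in subcorpora]
    vocab = set().union(*counts)
    selected = []
    for word in sorted(vocab):
        freqs = [c[word] for c in counts]
        total = sum(freqs)                          # overall frequency
        word_range = sum(1 for f in freqs if f > 0)  # subcorpora containing it
        if total >= min_freq and word_range >= min_range and juilland_d(freqs) >= min_d:
            selected.append(word)
    return selected
```

A word concentrated in a single subject area scores D close to 0 and is filtered out even if it is frequent overall, which is the point of combining the three criteria.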

15:55
A Cultural Historical Study on Sweetness Using Corpus - Focused on Analyzing Vocabularies and Co-occurring Words Related to Sugar and Sweetness

ABSTRACT. Sugar is a representative commodity whose production and consumption have increased explosively all over the world, including Korea, since modern times. It is a product that best indicates changes in society and daily life. Therefore, examining sugar products and the aspects of sweetness generated by sugar over time allows us to understand not only the lifestyle changes but also the shift in consciousness of Korean people. People’s perception of taste, in particular, is a social and cultural product that reflects community consensus. This study uses corpus analysis methodology to track the changes over time surrounding sugar and sweetness. Specifically, it examines the vocabulary and co-occurring words related to sugar and sweetness that appeared in four major daily newspapers and two popular magazines from the 1960s, when sugar consumption began to grow in earnest after sugar production became one of the basic industries in the 1950s, to the present. Sugar and sweetness showed how the lifestyles and ways of thinking of the people living in Korean society changed as they encountered various discourses such as hygiene and civilization, nutrition and health. Language is a guide that represents culture, and vocabulary, as an extremely sensitive indicator of culture, is closely related to the society in which it is used (Sapir, 1949). This corpus-based study is meaningful in that it deals with food, one of the areas that effectively represent this connection (Wierzbicka, 1997).

14:40-16:20 Session 13B
14:40
Identifying similar sentences with BERT

ABSTRACT. This study addresses the problem of identifying similar sentences. Here, 'similarity' means semantic similarity, and identifying similar sentences means determining whether two sentences are paraphrases of each other or duplicates. Public data sets that can be used to address this challenge include the STS, SICK, MRPC and QQP data sets. This research implemented a model that determines whether two given questions are duplicates, using the QQP data set. To determine similarity, it is necessary to consider sentence representation. In this study, we used BERT and encoded the token vectors from BERT using GRU and LSTM. We also employed an interaction layer to reflect the direct relationship between the two sentences, and applied DenseNet to extract the key features of the output from the interaction layer. The basic structure is similar to DIIN by Gong, Luo, and Zhang. Our study improved on DIIN by simplifying the model without harming performance, as follows. First, we simplified the embedding layer by using BERT only, whereas DIIN uses GloVe and Exact Match. We also simplified the encoding layer by using Bi-GRU and LSTM with masking to exclude meaningless information, whereas DIIN used a combination of a highway network and self-attention. The feature extraction layer is the same as in DIIN, but we applied various dropout techniques to avoid overfitting. Our model shows a performance of 89.33% on the QQP validation data set. The difference in performance from the current state-of-the-art model is very small (<1%), although our model does not require large computing resources.

15:05
A Statistical Analysis on Korean and Chinese Personal Pronouns

ABSTRACT. This study organizes concrete examples extracted from Korean and Chinese corpora and analyzes the actual data in order to accurately identify the usage frequency of personal pronouns, to comprehensively compare the frequency and ratio of personal pronouns in Korean and Chinese so as to describe the characteristics and commonalities of the two languages, and to reexamine the features and causes of their usage based on theories of linguistic typology. Although there have been many studies, including comparative studies, on Korean and Chinese personal pronouns, systematic discussion of personal pronouns in these two languages based on actual data is less common. As a result, the characteristics of Korean and Chinese personal pronouns have not been fully explored. To address this, the actual frequency of occurrence, usage rate, and distribution of personal pronouns are investigated empirically using the Yonsei Korean Corpus and the Peking University CCL Corpus. For instance, Korean second person pronouns are used less because of the complex hierarchical relationship between the speaker and the listener. Although Chinese is not restricted by a developed honorific system, Chinese second person pronouns are used less as well. This study not only helps to describe the usage and distribution of Korean and Chinese personal pronouns and to clarify the relationship between usage and distribution, but also helps to analyze the typological characteristics of personal pronouns and to explore the reasons for these characteristics.

15:30
Time and Tense Corpus: Annotation of grammatical tenses

ABSTRACT. This paper describes the development of a corpus of English sentences (n=5000) annotated with the names of grammatical tenses. Strictly speaking, tense is limited to present and past, while aspect is used for the future. Grammatical tenses are defined here as the twelve commonly used labels for verb forms, which are based on morphosyntactic structure and encompass tense, aspect and mood. Examples include present perfect progressive and simple future.

The time and tense corpus is designed to serve as a dataset for supervised deep learning. The deep learning aims at creating a model that can be used to help learners of English understand grammatical tenses in context by identifying the tense name, tense function and verb class of finite verb groups.

In this project, grammatical tenses are annotated automatically using rule-based parsing of the raw text and part-of-speech tags. This in-house developed script assigns grammatical tense to verb groups based on the particular permutations of parts of speech and specific words. For example, to identify present perfect progressive, the code matches “(has | have) + been + VBG”. From this, the verb group can be assigned the correct tense. The development of the corpus and the Python code to annotate the corpus will be described and explained. Although gold-standard accuracy is achieved, the automated parsing algorithm has difficulty identifying the grammatical tense when elision or intrusions occur.
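
The rule quoted above, “(has | have) + been + VBG”, can be sketched as a match over part-of-speech-tagged tokens. This is not the authors' script: the function name and the input format (a list of `(word, tag)` pairs with Penn Treebank tags) are assumptions made for illustration.

```python
def find_present_perfect_progressive(tagged):
    """Find 'has/have + been + VBG' sequences in a POS-tagged sentence.

    `tagged` is a list of (word, tag) pairs; Penn Treebank tags are assumed.
    Returns (start_index, matched_text) for each strictly adjacent match.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (w1, _), (w2, _), (_, t3) = tagged[i:i + 3]
        if w1.lower() in {"has", "have"} and w2.lower() == "been" and t3 == "VBG":
            text = " ".join(word for word, _ in tagged[i:i + 3])
            hits.append((i, text))
    return hits


# Usage: adjacent verb groups match, but an intervening word does not.
plain = [("She", "PRP"), ("has", "VBZ"), ("been", "VBN"), ("working", "VBG")]
negated = [("She", "PRP"), ("has", "VBZ"), ("not", "RB"),
           ("been", "VBN"), ("working", "VBG")]
print(find_present_perfect_progressive(plain))
print(find_present_perfect_progressive(negated))
```

The second call returns no match because "not" interrupts the verb group, which illustrates the kind of intrusion that the abstract reports as difficult for the rule-based approach.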

15:55
Local grammars and Chinese speech acts: A case study of apology

ABSTRACT. This study extends our recent research on local grammars (e.g. Su 2017, 2018a, 2018b, 2019; Su & Wei 2018; Hunston & Su 2019) to Chinese speech acts, which is demonstrated with a case study of apology (i.e. 道歉 "daoqian" in Chinese). Using data taken from BCC (a corpus of Mandarin Chinese developed by researchers at the Corpus Centre of Beijing Language and Culture University), the study shows that local grammars can be usefully applied to explore Chinese speech acts, though there are some challenging issues that need to be carefully considered. The study will discuss these issues in detail, along with the opportunities/advantages of using a local grammar approach to explore speech acts. The overall argument made is that we may ultimately derive a communicative grammar, i.e. a grammar for communication, by developing a set of local grammars, with each defined by a specific communicative function (cf. Butler 2004).

14:40-16:20 Session 13C
Location: IBK Hall
14:40
From Transcripts to Textbook: Creating a Corpus-based Textbook for Adult Korean ESL Learners

ABSTRACT. The increasing prevalence of task-based, spoken-language proficiency assessments, such as the ACTFL OPI and the IELTS, has created a demand for textbooks which can assist learners in acquiring the skills needed to complete the communicative tasks covered in these assessments. This article describes the development process of such a textbook in three stages. In the first stage, a 170,000-word learner corpus was created from audio transcripts of over 100 interviews with Korean-speaking adult employees at an international engineering firm headquartered in South Korea. The interviews were collected as part of an in-house task-based, spoken-language proficiency assessment which was adapted from the ACTFL OPI. In the second stage, the corpus was searched for the 100 most frequently used English words. The usage of these words was then analyzed for grammatical and lexical errors, and patterns of errors were identified. Errors were grouped according to the tagset described in Izumi (2005). In the third stage, these tagged error groups were used as the basis for a textbook designed to be used at the engineering firm to help employees improve their spoken-language proficiency by reducing errors in high-frequency vocabulary. This corpus-based development process may reveal important implications both for the development of textbooks for adult Korean learners of English and for assessing the extent to which task-based, spoken-language proficiency assessments, such as the ACTFL OPI, provide a holistic assessment of English speaking ability.

15:05
A Corpus-Based Assessment of Longitudinal Development of Lexical and Syntactic Features in Second Language Writing

ABSTRACT. While a growing body of research has examined developmental patterns in linguistic features (e.g., Verspoor et al., 2008), these studies have at least two limitations. First, previous studies included a small sample size. Second, research has not considered developmental patterns which may differ by content words (CWs) and function words (FWs). To fill these gaps, the purpose of this study was to assess developmental patterns of lexical and syntactic features in L2 writing using a learner corpus.

Data were taken from the Gachon Learner Corpus collected from Korean EFL university students. Each student wrote approximately ten English essays over an academic semester. Data included 1,043,585 words in 10,413 texts produced by 1,196 students. Based on TOEIC scores, students were divided into beginning- and intermediate-level groups. To examine the longitudinal changes, linear mixed-effects models were used.

Findings indicated that over time, learners tended to produce more sophisticated CWs (e.g., less frequent CWs), while using more frequent FWs. Also, mean length of clauses increased over time, but not mean length of T-units or sentences, which suggests an improvement on phrasal complexity. Furthermore, CW and FW type-token ratios decreased (i.e., more CW and FW repetitions) over time, which indicates learners’ increasing focus on the continuity of lexical ties. Different developmental patterns across beginning and intermediate levels were found, such that intermediate-level learners produced more sophisticated CWs, longer sentences, and greater numbers of CWs and FWs. Overall, findings provide a more systematic understanding of developmental patterns in linguistic features in L2 writing.
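
The separate type-token ratios for content words and function words reported above can be sketched as follows. The abstract does not describe how the two classes were distinguished, so the small `FUNCTION_WORDS` set and the function names here are purely illustrative assumptions; a real analysis would use a full function word list or part-of-speech tags.

```python
# Illustrative, deliberately tiny function word list (an assumption).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "but", "or",
    "is", "are", "was", "were", "it", "i", "he", "she", "they", "we",
}


def type_token_ratio(tokens):
    """Number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def cw_fw_ttr(tokens):
    """Compute type-token ratios separately for content and function words."""
    lowered = [t.lower() for t in tokens]
    function_words = [t for t in lowered if t in FUNCTION_WORDS]
    content_words = [t for t in lowered if t not in FUNCTION_WORDS]
    return {
        "cw_ttr": type_token_ratio(content_words),
        "fw_ttr": type_token_ratio(function_words),
    }
```

A falling ratio across a learner's successive essays means more repetition within that word class, which the study interprets as increasing attention to lexical ties. Because raw type-token ratios are sensitive to text length, comparisons are only meaningful across texts of similar length.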

15:30
Pragmatic Discourse Markers you know and sort of in Korean EFL Teacher Spoken Corpus

ABSTRACT. The purpose of the study is to investigate and compare the usage of the pragmatic discourse markers you know and sort of by Korean English teachers (KETs) and native English teachers (NETs) in the EFL teacher corpus, a spoken corpus collected from English learning classes (Kwon & Lee, 2014). To select focal pragmatic discourse markers, two word lists were extracted and compared using the Keyword function in WordSmith 7.0 (Scott, 2016). The study analyzed the functional and structural distribution of you know and sort of in the KET and NET corpora. The results showed that Korean EFL teachers used the pragmatic discourse marker you know as a gap filler between utterances far more frequently than native teachers. Native English teachers used sort of mostly as a pragmatic discourse marker with a mitigating effect, while Korean English teachers did not use it at all; instead, Korean English teachers uttered kind of almost twice as often as NETs. Both KETs and NETs used kind of more as a non-pragmatic discourse marker, indicating ‘a type of,’ than as a pragmatic discourse marker with a mitigating effect. Limitations of the study and pedagogical implications are discussed.

15:55
A Study on Modality Markers ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’ in Korean and their Chinese Equivalents Based on Spoken Language Corpus – Focused on the Distribution Aspect by Register and Semantic-Pragmatic Function

ABSTRACT. The purpose of this study is to analyze the distribution by register of the modality markers ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’. By investigating their semantic-pragmatic and syntactic functions, the study also aims to find their corresponding Chinese forms based on spoken and written corpora. The corpus used in this study is made up of the new Yonsei corpus (nYsca1 and nYsca2) and a semi-spoken corpus.

First, based on ‘the theory of the restructuring of modality markers’ (IM Hong-Pin: 2018) and ‘the category on modality’ (Bak Jae Yeon: 2016), this study creates a model named ‘disambiguation model on modality for Korean language teaching’, and uses this model to select the modality markers ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’ as its research object.

Second, this study reveals the frequency and distribution by register of ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’ using a balanced corpus of spoken and written language. Meanwhile, based on ‘A three-dimensional grammar framework’ (Larsen-Freeman: 1991) and ‘The double triangle theory’ (Fuyi Xing: 2016), this study enriches the ‘modality analytical factor framework’ (Eom, Nyeo: 2009) and analyzes the semantic-pragmatic and syntactic functions of the modality markers ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’ using the semi-spoken corpus.

Third, this study also attempts to find the corresponding Chinese forms of the modality markers ‘-eulrae(-(으)ㄹ래)’ and ‘-eulge(-(으)ㄹ게)’ by using a corpus-based translation analysis method and analyzing sentences collected from the parallel corpus.

14:40-16:20 Session 13D
14:40
Metaphorical Representation of 2018 Presidential Elections in the Political Discourse of the Presidential Opponents in Georgia

ABSTRACT. The study of conceptual metaphors has been on the agenda of sociolinguistic research for decades. Politicians of different genders seem to have their own ways of exploiting conceptual metaphors as persuasive rhetorical devices while talking about or evaluating the same social situation or political event. Therefore, the gendered use of conceptual metaphors becomes an essential part of politicians’ identity and their style of communication. The present study focuses on the gendered use of metaphors in the English and Georgian speeches and interviews given by two presidential candidates (Salome Zurabishvili and Grigol Vashadze) during the 2018 presidential election campaign in Georgia. The conceptual metaphors POLITICS IS JOURNEY, POLITICS IS WAR and STATE IS BUILDING are investigated with regard to the we-they opposition in Salome Zurabishvili’s and Grigol Vashadze’s political discourses, as the gendered use of conceptual metaphors is expected to highlight hidden ideological effects and power asymmetries exercised in the discourse. The research is based on the theoretical premises of Critical Discourse Analysis (CDA) in combination with Lakoff and Johnson’s Conceptual Metaphor Theory. By combining quantitative corpus-based investigation with qualitative text analysis, the study is expected to shed more light on the gendered cognitive repertoire of the Georgian presidential candidates.

15:05
Newspaper Corpus and Political Terms: ‘Liberalism,’ ‘Democracy,’ and ‘Liberal Democracy’ in South Korea from 1945

ABSTRACT. This paper analyzes a newspaper corpus in various ways to study how the terms ‘liberalism’ and ‘democracy’ have changed conceptually and how distinctively they have been used in South Korea. We also look at the use of the closely related term ‘liberal democracy’. The newspapers published after the end of Japanese colonial rule are used for analysis. Based on the relative frequency trends of the terms, we examine the changes of meaning at critical political events, such as the establishment of the South Korean government, the Yusin Restoration, and Democratization. Since progressive media appeared in the late 1980s, we examine the changes in the proportion of usages of the terms in the conservative and progressive media using time series analysis. In order to grasp the conceptual changes of the terms in detail, we performed a co-occurrence network analysis to identify changes in the words used with each term. There is a strong tendency for words such as duty, state, responsibility, and negligence to appear with the term liberalism, and for words such as order, compliance, state, and nation to appear with democracy. Furthermore, by extracting n-grams and analyzing the characteristics of the resulting word sequences, we can grasp more detailed changes in meaning. This paper is meaningful in that it demonstrates the possibility of corpus-based political discourse analysis as an interdisciplinary study of linguistics and political science.
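
The relative frequency trends mentioned above can be sketched as a per-year normalised count. This is a minimal illustration under assumptions: the input format (pairs of year and token list), the function name, and the per-million normalisation are hypothetical, and the English tokens stand in for the Korean terms actually studied.

```python
from collections import Counter


def relative_frequency_by_year(docs, term, per=1_000_000):
    """Relative frequency of `term` per `per` tokens, grouped by year.

    `docs` is an iterable of (year, tokens) pairs, one per article.
    """
    totals = Counter()  # tokens per year
    hits = Counter()    # occurrences of the term per year
    for year, tokens in docs:
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {
        year: hits[year] / totals[year] * per
        for year in sorted(totals)
        if totals[year] > 0
    }
```

Normalising by the size of each year's subcorpus keeps the trend comparable across periods in which the number of newspaper articles differs.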

15:30
Performative Tweets: Trump and the Representation of North Korea in US News

ABSTRACT. North Korea has been a viral topic in international news, perhaps even more so since Donald Trump became president of the United States. In the paper at hand, we focus on a straightforward research question: Is there any correlation between Trump's tweets about North Korea and the semantic prosody of North Korea found in US American news? Tweets by @realDonaldTrump might provide a glimpse into the president's mindset, as he prominently uses Twitter to bypass the traditional gatekeepers of media outlets and deliver 'unfiltered' messages. His tweets, though not necessarily coherent, have had a real influence on world politics, as when he recently proposed a meeting with Kim Jong Un in a tweet and the meeting took place a couple of days later, when both leaders shook hands at the DMZ.

For our corpus-based analysis of the discourse regarding North Korea, we are looking at four corpora in detail: firstly, Trump's personal tweets, collected in the Trump Twitter Archive (http://trumptwitterarchive.com/). Secondly, we use a general reference corpus for English Tweets, obtained by streaming the 1% sample of the firehose (a.k.a. the 'spritzer'). Last but not least, for US American news, we use articles from the right-leaning Fox News and the left-leaning CNN. Our time of analysis stretches from Trump's election in 2016 until the end of 2018, covering the North Korea–United States Singapore Summit in June 2018, where Donald Trump and Kim Jong Un met for the first time. We use CQPweb for all our analyses.

15:55
'If you ride a lame horse into a race ...': a corpus-based analysis of metaphors in John Mahama's political speeches

ABSTRACT. Political speeches are a central genre in political discourse analysis and they have facilitated studies into language and rhetoric since Greek and Roman antiquity. Still today, the delivery of a political speech is largely considered a rhetorical act, and politicians are aware that they must speak persuasively in order to gain the trust, confidence and ratification of their (potential) followers (Charteris-Black, 2005, 2014). Using cognitive rhetoric theories (Lakoff & Johnson, 1980; van Dijk, 2008), this study examines President John Mahama’s use of metaphor expressions and conceptual metaphors in his political speeches from a corpus approach. The corpus of approximately 70,000 tokens was compiled from 21 speeches that he delivered before, during and after his presidency. I argue that John Mahama’s use of metaphor is conscious, consistent and conceptually structured, and that this represents an important rhetorical strategy that may have contributed to his political success. In the corpus analysis, 39 lexical resources reveal or communicate metaphorical representations in Mahama’s speeches, but 4 of these (namely GROW, BUILD, HEAD and FIGHT) carry the most frequent metaphor uses. However, all 39 resources contribute to the conceptual metaphor constructions (i.e. target-source domain mappings) commonly associated with Mahama as a political speaker. The findings of this study not only make a useful contribution towards a better understanding of John Mahama’s political ideology, but also foreground the persuasive potential of metaphor, especially for audience engagement in political talk.

14:40-16:20 Session 13E
Location: Helinox Hall
14:40
Idiomatic Expressions Containing “Gesicht (face)” And “Kopf (head)”: A Corpus Analysis of Semantic Extension

ABSTRACT. Idioms are commonly defined as figurative linguistic expressions which cannot be interpreted on the basis of the compositional meaning of their lexical elements. Their intriguing inconsistency with the normal rules of grammar and semantic compositionality has attracted the attention of numerous linguists, making idiomatic expressions a long-standing interest in syntax, semantics and pragmatics. This study aims to reveal and categorise the aspects of semantic extension of “Gesicht” and “Kopf” by investigating idiomatic expressions containing these two German body terms. To examine their collocations, the corpus of the German newspaper “Die Zeit” extracted from DWDS was encoded in CQP format (CQP being one of the modules of the IMS Corpus WorkBench, CWB) for query and analysis. Although “Gesicht” and “Kopf” have conceptually related central meanings regarding positional proximity, differences in semantic extension between the two body terms can be observed through divergent lexico-grammatical patterns.

15:05
Comparative Research of German SMS Dialogues and Telephone Conversation - based on Verbmobil Corpus and MOCODA

ABSTRACT. This study aims to reveal the differences between registers in formal and informal communication. Based on the telephone conversation data in the German Verbmobil Corpus and the SMS dialogues of MOCODA on the topic “Appointment”, we compare the statistical distribution of vocabulary and part of speech between the two corpora. Furthermore, we look into the different dialogue structures in formal and informal communication. 1. Are there any differences between the corpora in the statistical distribution of lemmas? 2. Are there any differences between the corpora in the statistical distribution of parts of speech? 3. How different are the dialogue structures? In order to answer these questions, we use CWB as the tool to analyze the corpora.

15:30
Extensibility as a focus for corpus analysis software: introducing the CQPweb plugin framework

ABSTRACT. CQPweb is the graphical user interface to the IMS Open Corpus Workbench (CWB). It is open source (see http://cwb.sf.net); thus in principle anyone can add features (e.g. analyses, visualisations). In practice, no one other than the primary developers has done so, because modifying CQPweb requires knowledge of not only programming but also its data model, program flow and internal functions.

No program can fully support the needs of every user out-of-the-box. Thus, what is needed is a means for non-developers to write units of code which extend CQPweb’s functionality without having to understand the whole system. To address this, a framework for ‘plugins’ has been added to CQPweb. Plugins are self-contained code objects which extend CQPweb. Different types of plugins address different use-cases: e.g. ‘Annotators’ tag input data, whereas ‘Postprocessors’ manipulate sets of query results. Each type is defined as an object ‘interface’ – a list of subroutines that the plugin must include. CQPweb calls the interface subroutines to execute the plugin’s capabilities at the appropriate time.

Plugins must be written in PHP, like CQPweb itself; however, it is possible for this PHP code to do no more than delegate the work to an external program, written in e.g. Python, R, or Perl, allowing re-use of existing code. Straightforward access to both external tools and CQPweb’s internal data structures is provided via a library of functions specifically developed for plugin authors. This framework thus demonstrates by implementation a promising paradigm for enhancing the extensibility and flexibility of corpus-analytic software.
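
The idea of a plugin type defined as an interface whose subroutines the host calls at the appropriate time can be illustrated with a language-neutral sketch. CQPweb plugins are written in PHP, and nothing below reproduces CQPweb's actual interface: the class names, method names and the `run_postprocessor` host function are all hypothetical, and Python is used only to show the pattern.

```python
from abc import ABC, abstractmethod


class Postprocessor(ABC):
    """Hypothetical plugin type: the methods a plugin of this type must define."""

    @abstractmethod
    def description(self) -> str:
        """Human-readable label the host could show in its menus."""

    @abstractmethod
    def postprocess(self, hits: list) -> list:
        """Take a set of query results and return the manipulated set."""


class KeepShortHits(Postprocessor):
    """Example plugin: keep only hits of at most `max_tokens` tokens."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens

    def description(self) -> str:
        return f"Keep hits of at most {self.max_tokens} tokens"

    def postprocess(self, hits: list) -> list:
        return [hit for hit in hits if len(hit.split()) <= self.max_tokens]


def run_postprocessor(plugin, hits):
    """Host side: check the interface, then call the plugin's subroutine."""
    if not isinstance(plugin, Postprocessor):
        raise TypeError("plugin does not implement the Postprocessor interface")
    return plugin.postprocess(hits)
```

The plugin author only needs to know the interface, not the host's internals, which is the design goal the abstract describes. A plugin's method body could equally hand the work to an external program, as the abstract notes for PHP plugins delegating to Python, R, or Perl.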

15:55
CWB, CQPweb, and Special-Purpose Corpora

ABSTRACT. In this paper, we present the process of compiling eight special-purpose corpora and searching through them using the CWB and CQPweb tools. The eight special-purpose corpora are the corpus of German sentiments, the corpus of German advertising slogans, the corpus of teaching German, the corpus of German FrameNet, the corpus of English advertising slogans, the corpus of Korean advertising headlines, the corpus of K-POP, and the corpus of Korean SMS messages. These corpora have been designed with the aim of finding interesting data in the related fields. First, we describe the characteristics of each special-purpose corpus, including some subsidiary Perl scripts. We then discuss the advantages and limitations of using the CWB and CQPweb tools for constructing corpora and as search engines for querying the data. Finally, we compare the CWB and CQPweb tools and suggest some improvements.

14:40-15:30 Session 13F: [Keynote 3] Winnie Cheng. A study of move-specific concgrams: Hong Kong Corpus of Corporate Government Reports

Keynote Session

Location: Grand Ballroom
14:40
A study of move-specific concgrams: Hong Kong Corpus of Corporate Government Reports

ABSTRACT. TBA

15:30-16:20 Session 14: [Keynote 4] Laurence Anthony. Analyzing corpus texts at the text level: reflections, challenges, and possible solutions

Keynote Session

Location: Grand Ballroom
15:30
Analyzing corpus texts at the text level: Reflections, challenges, and possible solutions

ABSTRACT. TBA

16:30-18:10 Session 15A
Location: Choi Young Hall
16:30
Study of the Modification of a Novel Text through adjusting the Coverage of Single Words and Multi-word Units

ABSTRACT. This study aims to modify a novel text to suit intermediate-level learners of Korean by adjusting single words and multi-word units. The modification builds on Hwang (2017), who analyzed the lexical coverage of short novels included in high school textbooks based on the lexicon in “The Development of Vocabulary Contents of Korean Language Education” of the National Institute of the Korean Language (NIKL). The text is modified so that single words belonging to the beginning and intermediate levels of that lexicon account for more than 95% of the entire text. This threshold follows Nation (2001) and Yoon (2011), who investigated the coverage required for reading texts to be understandable. In addition, treating multi-word units as an extension of words, this study modifies the text so that coverage exceeds 95% for multi-word units as well as single words. The text coverage of multi-word units is based on “The Development of Korean Language Grammar and Expression Contents” of the NIKL. By modifying the text to the learners’ level with an objective and efficient method, this study is expected to lay the groundwork for the practical use of novels in teaching Korean as a foreign or second language.
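
The coverage target described above can be sketched as a simple computation over a tokenised text and a list of known words. This is an illustration only: the function names are hypothetical, the sketch handles single words but not multi-word units, it assumes the text is already tokenised into dictionary forms, and the greedy "most frequent unknown word first" ordering is an assumption rather than the study's method.

```python
from collections import Counter


def lexical_coverage(tokens, known_words):
    """Share of running tokens that belong to the known-word list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)


def words_to_modify(tokens, known_words, target=0.95):
    """Unknown word types to replace, most frequent first, to reach `target`.

    Assumes each listed word would be replaced by a known word, so all
    of its occurrences become covered.
    """
    covered = sum(1 for t in tokens if t in known_words)
    unknown = Counter(t for t in tokens if t not in known_words)
    to_modify = []
    for word, count in unknown.most_common():
        if not tokens or covered / len(tokens) >= target:
            break
        to_modify.append(word)
        covered += count
    return to_modify
```

Replacing the most frequent unknown words first reaches the threshold with the fewest distinct changes to the text, which keeps the modified novel as close as possible to the original.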

16:55
Livi Zheng and Her World-class Claim : a Lexical Analysis on Netizen’s Reaction in @tirtoid

ABSTRACT. There are circumstances in which international recognition becomes an overly-achieved standard. One of them is show business, as seen in the recent viral case of Livi Zheng. A series of news reports released by @tirtoid criticized Livi Zheng, an Indonesian director based in Los Angeles who made a self-proclaimed claim of success as a nominee for the annual Academy Awards with her latest documentary film “Bali: Beats of Paradise”. Additionally, the film was followed by endorsements from high-ranking officials in the country and a storm of publicity, which amplified the claim. This case received various reactions from netizens. This study attempts to carry out a lexical analysis of the postings made by @tirtoid on Livi Zheng’s case. The data source is netizens’ comments on two posts on @tirtoid’s Instagram account. The results indicate that netizens tend to show criticism through sarcasm about how the media’s coverage went wrong.

17:20
Resolving the polysemy of Korean postpositions with unsupervised learning: -ey, -eyse, and -(u)lo

ABSTRACT. This paper features an on-going project on the resolution of polysemy involving Korean postpositions, which are notoriously ambiguous. For instance, the adverbial postposition -(u)lo is either directional or instrumental (Choo 2008), as exemplified in (1) and (2).

(1) 도로-(으)로 갔다. tolo-(u)lo ka-ass-ta. (I) went to the road (directional)

(2) 자전거-(으)로 갔다. cacenke-(u)lo ka-ass-ta. (I) went by bicycle (instrumental)

Previous research in computational linguistics has attempted to resolve the polysemy of postpositions in Korean (Shin et al. 2005, Kim et al. 2006). However, due to their focus on computational power to the detriment of linguistic expertise, the models have done a poor job at resolving polysemy. To tell the distinct meanings apart, our method consists in (a) limiting the scope to three of the most frequent particles (-ey, -eyse, and -(u)lo) as found in the Sejong Corpus (Shin 2008), and (b) implementing three kinds of distributional semantic models: SVD (Eckart and Young 1936), a combination of PPMI & SVD (Turney & Pantel, 2010), and SGNS (Mikolov et al. 2013). An annotated corpus designed to represent the semantic tags was used as the training data set, and the optimal model was selected by comparing the recognition accuracy of the models obtained by combining the distributional semantic models with different window sizes. Pending results, we hope to better classify ambiguous postpositions in the Sejong Corpus and provide linguists with an improved system of semantic tags.
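
To make the PPMI step of the second model concrete, here is a minimal sketch that counts co-occurrences within a symmetric window and converts them to positive pointwise mutual information. It is not the authors' implementation: the romanized toy sentences, the function names, and the window size are all assumptions for illustration. In a PPMI & SVD pipeline, the resulting sparse matrix would then typically be reduced with a truncated SVD (for instance via `numpy.linalg.svd`), which is omitted here.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (target, context) pairs within a symmetric window."""
    counts = Counter()
    for sent in sentences:
        for i, target in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(target, sent[j])] += 1
    return counts


def ppmi(counts):
    """Map each observed pair to its PMI (log base 2), keeping positive values only."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (word, ctx), n in counts.items():
        row[word] += n
        col[ctx] += n
    scores = {}
    for (word, ctx), n in counts.items():
        pmi = math.log2(n * total / (row[word] * col[ctx]))
        if pmi > 0:
            scores[(word, ctx)] = pmi
    return scores


# Hypothetical, pre-segmented toy sentences modelled on examples (1) and (2).
sentences = [["tolo", "lo", "ka"], ["cacenke", "lo", "ka"]]
scores = ppmi(cooccurrence_counts(sentences, window=1))
```

Varying `window` corresponds to the window-size comparison mentioned in the abstract; each setting yields a different co-occurrence matrix and therefore a different model to evaluate.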

16:30-18:10 Session 15B
16:30
A Study on the Use of Chinese Conjunctions Huo4shi4 and Hai2shi4

ABSTRACT. Huo4shi4 and hai2shi4 are both Chinese conjunctions that indicate a choice between alternatives, and both are translated as ‘or’ in English. However, the two conjunctions are used differently. In Mandarin teaching materials (A Course in Contemporary Chinese and Practical Audio-Visual Chinese), the main difference between the two is presented as grammatical: huo4shi4 is always used in declarative sentences, and hai2shi4 in questions. Because the textbooks suggest only the syntactic patterns, a more detailed account of the cognitive process is still needed before we can determine which conjunction to choose in a given context. 張園 (2010) attributed the difference between the two conjunctions to whether or not the choices are opposites. Nevertheless, many instances in corpora show that options that are not entirely contrary are connected not only by hai2shi4 but also by huo4shi4. Therefore, we consulted the 2017 written corpus in the Corpus of Contemporary Taiwanese Mandarin (COCT) to investigate the difference between huo4shi4 and hai2shi4. The corpus study yielded several findings. First, the main difference between the two conjunctions is that hai2shi4 collocates with contexts in which only one of the choices can be selected, whereas huo4shi4 appears in situations comprising multiple options. Second, the speaker’s perspective determines whether more than one choice can be selected. Moreover, to verify the corpus findings, the next step of the investigation focused on a psycholinguistic judgment task based on the corpus, and the results were applied to revise the design of Mandarin teaching materials.

16:55
The Restrictiveness and Subjectivity of Jin3Jin3

ABSTRACT. This study probes the restrictiveness and subjectivity of Jin3Jin3 by analyzing its syntactic structures and semantic functions using corpus data. The research material was collected from the “Academia Sinica Balanced Corpus of Modern Chinese”. A total of 171 instances were collected, but two were removed; percentages are therefore calculated out of 169. The results of the study are as follows. First, and most frequently, an abstract or real space was restricted (44%): the speaker stressed how narrow the space was for a certain purpose. Second, restriction on instruments accounted for 20%: the speaker emphasized the use of a method and evaluated its effectiveness. Third, restriction on number accounted for about 14%; here the speaker believed a certain number was highly relevant to a negative or positive result. Fourth, restriction on the cause or result in a sentence accounted for 12%. When a cause was restricted, the speaker emphasized the cause, which could have a positive or negative influence. When a result was restricted, it was always negative; thus, the speaker hoped it would not happen. Finally, the scope of time constituted about 10% of the total instances. The speaker often described a negative event that happened in a short period of time. The findings show that there is a strong connection between a speaker’s evaluation and the components restricted by Jin3Jin3. This paper is expected to provide empirical evidence for further research on the restrictiveness and subjectivity of Chinese scope adverbs.

17:20
Corpus-based Insights into Light Verbs in Mesolectal Malaysian English

ABSTRACT. Research on nativised grammar in Malaysia is not as extensively documented as research on nativised lexemes. The neglect of grammatical features in Malaysian English, as compared to other Asian English varieties with relatively fewer users, is also highlighted by Newbrook (2006). It is time to focus on nativised lexico-grammar in the sub-variety that truly represents Malaysian English. This study aims to find out the syntactic and semantic patterns of three light verbs, GIVE, TAKE and MAKE, in a corpus of mesolectal Malaysian English. To facilitate this study, a general corpus consisting of threads from Lowyat.Net, one of the most popular Internet forums in Malaysia, was created. Theoretical constructs governing the syntactic and semantic patterns of light verbs follow Mehl (2017) and Wierzbicka (1982) respectively. The analysis shows that: a) syntactically, most light verb constructions (LVCs) in mesolectal Malaysian English follow the prototypical LVC structure, i.e., light verb + indefinite article + deverbal noun; half of them take isomorphic nouns preceded by a zero article; and passive constructions are seen only in LVCs headed by MAKE; b) semantically, almost all LVCs show signs of telicity and also indicate bounded actions. Insights from the first Malaysian corpus created using web sources reveal that Malaysian English is still at an early phase of nativisation, because most light verb constructions still adhere to the prototypical structures.

17:45
Subjunctive complementizers in Korean: corpus studies on attitude predicates

ABSTRACT. The purpose of this paper is to identify novel types of subjunctive complementizer, "nka" and "ul-kka", in Korean. Korean employs two types of overt particles to mark interrogative complementizers. As shown below, we have the ordinary interrogative complementizer "ci" and the modalized interrogative complementizer "nka/ul-kka" (Kang and Yoon to appear). Among the variants of the ordinary interrogative complementizer "ci" and the modalized interrogative complementizers "nka" and "ul-kka", we argue that "nka/ul-kka" are lexicalized forms of the epistemic subjunctive mood exponent appearing in subordinator C.

Our claim is further supported by a corpus study comparing the different types of predicates that select "nka" and "ul-kka" and/or "ci". We proceed as follows. First, we collected data from the Sejong 21 sense-tagged corpus, consisting of approximately 12 million words of written texts. We extracted predicates co-occurring with both "ci" and "nka/ul-kka" using a Perl program. Second, we conducted a frequency test to produce the list of attitude predicates (Anand and Hacquard 2013) co-occurring with each complementizer. Third, for statistical verification, we used a keyword statistic, Dunning’s (1993) log-likelihood, based on the frequency data. Our statistical analysis shows that there is a statistically significant difference between the occurrence of "nka/ul-kka"-Comp and "ci"-Comp in terms of the veridical vs. nonveridical dichotomy (Lahiri 2002; Giannakidou 2015). The theoretical implications of the current study are twofold: First, in Korean, interrogative can be another prerequisite of subjunctivity. Second, for the analysis of interrogative subjunctive, the inquisitive-based theory is untenable but nonveridicality can provide the general theory.
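
For readers unfamiliar with the log-likelihood statistic mentioned above, the following minimal sketch computes G2 for a 2x2 contingency table using the standard formula G2 = 2 Σ O ln(O/E). The frequencies are invented for illustration and are not taken from the study; the function name and table layout are likewise assumptions.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """G2 for a 2x2 table of observed frequencies.

    k11: predicate with complementizer A    k12: predicate with complementizer B
    k21: other predicates with A            k22: other predicates with B
    """
    observed = [[k11, k12], [k21, k22]]
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    n = sum(row_totals)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # a zero cell contributes nothing to the sum
            e = row_totals[i] * col_totals[j] / n
            g2 += o * math.log(o / e)
    return 2 * g2


# Invented counts: a predicate occurs 30 times in 100 "nka/ul-kka" clauses
# and 10 times in 100 "ci" clauses.
g2 = log_likelihood(30, 10, 70, 90)
significant = g2 > 3.84  # chi-square critical value, df = 1, p < 0.05
```

A value above 3.84 is conventionally read as a significant difference at the 0.05 level with one degree of freedom; identical distributions in the two columns give a value of zero.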

16:30-18:10 Session 15C
Location: IBK Hall
16:30
Comparative study of language use of British students & Japanese EFL learners for application in Data Driven Learning

ABSTRACT. This study aims to provide suggestions regarding the use of a semantic-based, data-driven learning (DDL) approach for elementary and junior high school level Japanese EFL learners by comparing the written language of British students aged 5-9 and 10-13 with that of 13-15-year-old Japanese students. This study built the British students’ corpora using the children’s original story entries for BBC’s “500 Words” story writing competition from 2014-2017, and also the 50 best stories chosen each year. This study set the British students’ corpora as the target language for Japanese EFL learners at an elementary or a junior high school level. In addition, this study put together the Japanese learner corpora by collecting written assignments of first- to third-year students of a national secondary school, including their first emails to pen pals in other countries from 2004-2006. This study identified characteristic verbs, adjectives, and adverbs by correspondence analysis, and also developed semantically annotated corpora. This study provides suggestions regarding the use of a semantic-based, DDL approach for Japanese EFL learners at an elementary or a junior high school level by adopting the part-of-speech tagging based on correspondence analysis and the semantic annotation to yield different types of DDL materials. Semantic-based DDL would be a new approach to learning that provides students the opportunity to learn synonyms, expand their vocabulary, and learn different usages through inductive learning.

16:55
A computer-aided error analysis of Omani students' written English and an investigation of their grammatical competence: A corpus-based study

ABSTRACT. A learner corpus (LC) gives a clearer description of learners’ language by collecting texts (spoken or written) that can be utilised electronically (Granger, 2002). More value is added if the LC is error-annotated, given the importance of error annotation to both second language research and language teaching and learning research. Error analysis can indicate the actual language output of learners, provide clear information on how a language is learnt, and serve as a learning tool for learners (Corder, 1981). Besides looking at learners’ errors, investigating learners’ grammatical competence gives more insight into the stage of language teaching (Macdonald et al., 2013). Therefore, the current research aims at providing a detailed description and discussion of an empirical, computer-aided error analysis (CEA) of a learner corpus and examining the use of grammatical categories (determiners, prepositions, clauses, conjunctions etc.) in the written production of Omani learners of English. These learners are at the upper-intermediate level of English, which is roughly equivalent to the B2-C1 level of the Common European Framework of Reference (CEFR). The CEA is conducted to identify the most common errors that Omani learners at the Higher College of Technology make in their writing. Error types are identified and run through O’Keeffe and Mark’s (2017) methodology, which aims to describe grammatical competency at each CEFR proficiency level. By undertaking this investigation, an indication of the written competence of Omani learners of English will be provided. Additionally, an online course will be designed to help learners overcome their writing problems.

17:20
Error Analysis: A Corpus Based Study of BS Program Students at University of Sialkot, Pakistan

ABSTRACT. To err is human. In the case of language learning it is even more human. This paper focuses on language learning errors. Language errors are basically an unproductive part of language. In the past, errors were considered equal to sins; now learners’ errors are considered an important part of language learning by linguists. These errors are considered helpful to learners, researchers and even teachers themselves (Corder, 1967). It is through error analysis that we come to know about the nature, incidence, consequences and causes of ineffective language, i.e., errors. Nowadays, it is not possible to identify the parameters of second language learning without focusing on learners’ output. This research focuses on written errors committed by BS-level students of the University of Sialkot. The Pakistani component of ICLE (International Corpus of Learner English) has been used for this research. It comprises one hundred thousand words in total, written in argumentative essays by students of BS-level programs, and is POS-tagged using the CLAWS7 POS tagger.

17:45
On Korean Speakers’ Knowledge of Unaccusativity in English

ABSTRACT. The unaccusative-unergative distinction is presumably universal, but languages vary as to the syntactic and morphological reflexes of such a distinction. Given the cross-linguistic variation, a learnability problem naturally arises for the L2 acquisition of unaccusativity. This study addresses Korean speakers’ knowledge of unaccusativity and unergativity in L2 English. More specifically, this study addresses the questions of (1) whether Korean speakers are sensitive to the unaccusative/unergative distinction in English; and (2) whether they are able to distinguish unaccusatives from transitives. In order to investigate the two questions, we employed a two-pronged method of linguistic research, viz. corpus exploration and a language experiment. First, this study conducts a collostructional analysis to compare the distributional properties of two learner corpora (YELC and GLC) and COCA. Second, this study makes use of the OpenSesame toolkit and a 5-point Likert scale. 173 adult Korean speakers (31 beginners/ 59 intermediates/ 31 advanced) participated in the study. Three types of verbs were employed in the analysis: unaccusatives, unergatives, and transitives. Each type was represented by seven verbs, which were selected based on frequency analyses of the learner corpora. Korean learners’ knowledge of the unaccusative-unergative distinction was tested using diagnostics such as overpassivization, causativization, and compatibility with a purpose clause. This study also considers two semantic properties, telicity and animacy, which are frequently argued to be associated with unaccusativity/unergativity.

16:30-18:10 Session 15D
16:30
Discourse of a Presidentiable: Strategic Presented Self of Rodrigo Roa Duterte during the 2016 Philippine Presidential Debates

ABSTRACT. Presidential candidates form and perform strategic self-presentation in order to attract voters. Using the theories of self-presentation (Goffman, 1956) and social drama (Turner, 1985), this paper looks into the discursive nature of the dramatic presentation of Rodrigo Roa Duterte, then a presidentiable and now the 16th President of the Philippines. The paper analyzed transcripts of three official presidential debates that were broadcast to the general public. The paper argues that Duterte used self-presentation strategies such as his strong belief in his would-be role, his dramatization of his political history, and an idealization of his role as a president. Discursively, Duterte strategically trumpeted a breach in his predecessor’s administration that allowed him to usher in a crisis, which led to him performing a messianic discourse so he could rally support for his notion of “change” in the Philippine political landscape.

16:55
Image of China in the Diplomatic Discourse of Pakistan and India: A Corpus Based Critical Discourse Analysis

ABSTRACT. The significant role of language in complex societal relations, diverse human affairs, the labyrinth of political matters and multifarious issues of communication requires extensive investigation. Diplomatic affairs are represented through a particular type of discourse or through particular linguistic techniques. Both qualitative and quantitative approaches have been employed in the analysis of the diplomatic discourses of Pakistan and India, based on the theoretical footings of Political Discourse Analysis by Fairclough and van Dijk. In this connection, the corpus software ‘Wordsmith’ has been used for the study, which helped trace the concordances and patterns of language used in diplomatic affairs. The data comprise the official discourse used in the diplomatic affairs of both states. A large corpus has been collected from the official websites of Pakistan and India, spanning a period of six years from 2012 to 2018. The analysis also encompasses concordances, keywords in context, collocations, lexical choices, idiomatic expressions, foreign words, and translations of local expressions used in the language. The study reveals the multiple layers of the ideological construction of the image of China in the diplomatic discourses of Pakistan and India, which has multiple implications. The Indian diplomatic stance is indicative of tactful maneuvering and a meticulous employment of verbal and various other significant linguistic choices, for multiple reasons; the already existing Pak-China relationship and a certain rift in the Indo-China relationship are a few of them. By contrast, being a strategic ally, Pakistan has been speaking highly of the Pak-China relationship on the diplomatic front.

17:20
A CORPUS-DRIVEN STUDY ON THE DISCURSIVE CONSTRUCTION AND REALIZATION OF U.S. AND CHINA-LISTED COMPANIES’ CEO BIOGRAPHIES

ABSTRACT. Objectives The Chief Executive Officers’ English biographies (CEO bios) are prepared by communications professionals as a corporate identity-building piece. The study looks at the discursive construction and representation of CEO bios as a promotional organizational discourse reflective of corporate identity.

Methods Two specialized corpora of 100 Chinese-origin companies of Shanghai Stock Exchange (SSE) and 100 U.S. origin companies of New York Stock Exchange (NYSE) were built and automatically tagged on word/phrase, part-of-speech, semantic patterns levels using Wmatrix3. Drawing on Watson’s ‘three levels of social life’ model (2009), the corpus-driven findings were interpreted in terms of individual, corporate/organizational and social/cultural levels.

Results The discursive construction and realization of the NYSE and SSE corpora portrayed highly different dimensions of a typical CEO. At the individual level, U.S. CEOs valued their job-related achievements, whereas China CEOs’ core competences were based on individual merits. At the corporate level, U.S. CEOs preferred mentions of their broad job scopes and networks, whereas China CEOs tended to emphasize holding multiple roles concurrently. At the social/cultural level, foreign market exposure was seen as an asset for U.S. CEOs, but China CEOs focused relatively more on their services with the Communist Party of China.

Conclusion The study has practical benefits to communicators required to produce CEO bios for international consumption as they might need to customize their texts based on country differences. Senior executives from China and U.S. who work with each other could also have more acute awareness and appreciation of what matters as preferred senior executive competences.

16:30-18:10 Session 15E
Location: Helinox Hall
16:30
An investigation into the significance of noise in learners' written corpus: the case of YELC 2011

ABSTRACT. YELC 2011 is based on the Yonsei English Proficiency Test (YEPT) 2011, a placement test of English for the freshmen of the year 2011. Much time and effort have been devoted to cleaning up YEPT 2011, partly owing to 'noise' -- non-English textual input by students -- that frequently appears in the written answers. Noise obviously interferes with analyses of the corpus, so cleaning up this noise was a conscious decision made by the YELC team. From the perspective of second language acquisition, however, this noise might be significant, especially given the rare characteristics of YELC 2011. Given that YELC is a leveled corpus based on the Common European Framework of Reference, noise (non-English textual input) may provide us with information about the process of second language generation (in this case composition) at different levels of proficiency. To this end, we will investigate the instances and types of non-English textual noise at various levels of English proficiency in YELC 2011 and their significance for second language learning.

16:55
A Corpus-based Study of Korean Learners’ Use of Demonstrative Anaphora this and that: Focusing on the Givenness Hierarchy

ABSTRACT. The purpose of the study is to investigate Korean learners’ use of the demonstrative anaphora this and that in terms of the Givenness Hierarchy theory introduced by Gundel, Hedberg, & Zacharski (1993). The theory describes six cognitive statuses (In Focus, Activated, Familiar, Uniquely Identifiable, Referential, Type Identifiable) in the use of grammatical devices for referring to given or new information. Since these statuses are related to cognitive elements such as memory and attention, which both speakers and listeners share, EFL/ESL learners may have difficulty determining proper referential forms in discourse. This study examines how Korean learners use these two anaphora (this and that) and compares their referring patterns with those of native American and British learners. For this, two learner corpora, the Korean learner corpus and the native learner corpus LOCNESS, are contrastively analysed to discover similar or different patterns of adopting the six cognitive statuses on the basis of the theory. The results of the study show that the distributional patterns of the statuses are similar between Korean and native learners. However, they are quite different from those of Gundel et al. (1993). In conclusion, the study shows that a variety of factors with regard to cognitive statuses are involved in the grammatical system of referring in discourse. In contrast to the traditional emphasis on physical distance in teaching this and that in EFL/ESL pedagogy, instruction needs to apply the real usage found in this study.

17:20
A corpus-based comparative analysis of two different language skills (writing & speaking) in YELC

ABSTRACT. The current study investigates and compares the written and spoken corpora of YELC, produced by Korean university students who are L2 learners of English. Corpus linguistic studies have shed light on L2 learners’ written or spoken corpora; however, analyses across the two language skills, speaking and writing, have drawn relatively little attention. The study focuses on exploring types of errors and linguistic features, such as collocational errors and lexical diversity. In addition, L2 learners’ language use in written and spoken text has been analyzed with respect to different aspects of language: word information, syntactic complexity and semantic connection. The two types of corpora, written and spoken, are compared using corpus linguistics tools. R. Ellis (1994) proposes that learners may make errors while speaking, but not in writing, or vice versa, since the two skills involve different processing conditions. Therefore, a thorough analysis of L2 learners’ written and spoken language corpora helps discover learners’ knowledge of the second language and provides pedagogical methodology that can be applied to second language acquisition. Kyle and Crossley (2016) emphasize that range and bigram are stronger key indicators of lexical sophistication than frequency in the context of L2 acquisition and L2 writing and speaking proficiency.

17:45
Issues and challenges in the development of Yonsei English Learner Corpus (YELC) 2011

ABSTRACT. Yonsei English Learner Corpus (YELC) 2011 is a collection of naturally occurring written texts (6,572 texts and 1,085,828 words) originally produced by 3,286 students who were accepted by various departments at Yonsei University in South Korea in 2010. YELC is searchable electronically, and all texts are graded into 9 levels. During this talk, I will describe the process of the compilation of YELC in great detail. I will also present various issues and challenges related to the development of YELC. This talk will be of interest to learner corpus builders and to researchers using YELC.