APCLC 2020: ASIA PACIFIC CORPUS LINGUISTICS CONFERENCE 2020
PROGRAM FOR THURSDAY, FEBRUARY 13TH

09:00-10:40 Session 16A
Location: Choi Young Hall
09:00
Academic Collocation in Dutch students' academic writing

ABSTRACT. Collocation plays an important role in promoting language learners' fluency, accuracy and native-likeness (Wray, 2000). Mastering collocation thus benefits students, for instance, by improving their vocabulary size and its use in academic writing. However, studies have shown that L2 learners of English face great difficulties in mastering and using collocations in their written language (Nesselhauf, 2003). The present study focuses on collocations in Dutch L1/L2 English students' academic writing, an area in which few investigations have been carried out. Using corpus linguistics to frame the methodological approach, the study investigates the most frequent academic collocations and their functions in academic writing. The data comprise a learner corpus (98,540 tokens and 10,512 types) loaded into the corpus software LancsBox (Brezina et al., 2018). The software is used to extract a word list. The first 100 words are chosen and the Academic Vocabulary List (Gardner & Davies, 2013) is used to identify the academic words among them. The most frequent academic collocations are visualised in collocation graphs (GraphColl) via the LancsBox function, and KWIC (key word in context) lines are extracted to undertake a qualitative analysis. The results reveal the nine most frequently used academic words: language (254), however (132), important (131), research (119), states (117), culture (113), therefore (103), example (101), and social (98). Ultimately, the study hopes to provide insights into Dutch students' collocation use which will benefit English Learning and Teaching (ELT) academic writing and to add to the growing body of literature on collocation.

09:25
Corpus based activity in ESP: What, Why, How

ABSTRACT. This presentation will discuss the use of data-driven learning (DDL), highlight the perceived value and benefits of engaging in DDL, emphasize learning which can happen with the use of DDL, share freely available materials which have been developed for DDL, and provide tips for using such activities. In DDL, students look at real language via corpus data, discuss and share observations and opinions based on this language, and subsequently make rules for and practice use of the language they studied (Carter & McCarthy, 1995). In a recent meta-analysis, Boulton and Cobb (2017) found that DDL is effective and efficient for students, especially in foreign language environments. Because DDL is still an unconventional teaching/learning tool, however, many teachers feel unprepared to engage with students in such activities and have not taken advantage of the opportunity to use them. The presenters in this session conducted a study with 12 foreign language teachers and 162 students of English as a foreign language using DDL in their general English classrooms. Teachers participated in an initial training session on the value of corpus linguistics and reviewed the materials and steps for a DDL lesson plan. After the lesson, students and teachers both completed a questionnaire which examined how they felt about and responded to the activities as well as what they learned from them. This presentation will briefly share results of the study to highlight the initial training needed to incorporate DDL.

09:50
Data-driven Dentistry: Corpora, English for Specific Purposes, and Writing to learn

ABSTRACT. This paper explores the affordances of disciplinary corpora for English for Specific Academic Purposes (ESAP) materials development and data-driven learning pedagogy, with a specific focus on the integration of a purpose-built corpus query platform and accompanying corpus-based pedagogical activities into a fifth-year undergraduate ESAP disciplinary writing course for dentistry at the University of Hong Kong (HKU). The paper first describes the rationale behind the adoption of a corpus-based approach for the teaching and learning of the complexities of grammar, disciplinary vocabulary, and rhetorical features as part of ‘writing-to-learn’ for the discipline (Manchón, 2011). It then describes the functionality of the online corpus query platform built to facilitate student exploration of the professional corpus (a collection of public dental health journal articles) and the learner corpus (HKU dentistry student reports). Finally, the paper explains how the findings from the two dentistry corpora drove the development of a suite of accompanying ESAP data-driven learning activities. The overall aim of this presentation is to show that with the right corpus platform and carefully structured accompanying activities, there is less need for students to have extensive technical skills and less need for teachers to devote time and effort to training students in corpus queries, both of which are claimed to result in teachers’ reluctance to incorporate corpus-based data-driven learning into their practice (Leńko-Szymańska & Boulton, 2015).

10:15
Teaching business discourse using role-play corpora

ABSTRACT. This study introduces a pedagogical tool for teaching business discourse. It is composed of two main parts. First, it describes the procedure for collecting role-play corpora. Second, it shows how role-play corpora are used for teaching purposes. Although role-play corpora include individual role-playing exercises, emphasis is placed on interactive role-playing exercises, ranging from brainstorming exercises and debates in business meetings to rhetoric in promotional genres (e.g. cold calls). Accordingly, role-play corpora include an element of challenge or conflict between role players. Communications between or among stakeholders are the most common scenarios in the corpora. Special attention is paid to rapport management of inherent FTAs (e.g. complaints in call centers; disagreement in business meetings; requests in sales). This study exemplifies how communication strategies for resolving conflict and showing empathy are incorporated into call center talk, and which types of communication strategies are used for politeness and persuasive purposes in business meeting and cold call situations, respectively. It also illustrates the role of power and distance between interactants in conducting face-threatening acts. To a lesser extent, it investigates turn taking and topic management and their link to rapport management in the role-play corpora.

09:00-10:40 Session 16B
09:00
Phraseological Variations in Terminology of Public International Law

ABSTRACT. For the past twenty years, “phraseology” has been considered a very important topic of study for various specialized languages. The linguistic view that used to see phraseology as “idiom researches and lexicography classifying various kinds of idiomatic expressions” has changed meaningfully. Thanks to these changes, the new view focuses on identifying and classifying phraseological units as well as applying them to theoretical research. That is why we would do well to try to define new horizons of phraseology in different specialized languages. The language of interest here is the prescriptive and descriptive language of international law instruments. We should consider this language as the normative language of judges, legislators, courts and international lawyers. These practitioners, who use specific types of phraseology and stable linguistic structures, should perhaps adhere to the use of a professional language that conforms to recognized standards of normative rules. Experimental enquiries show that cognitive corpus-based linguistics could improve international lawyers’ comprehension of the communicative process in this field. The communicative phases in international law have to do with linguistic-pragmatic approaches, since the central focus in the process is the language of Man.

09:25
The shifting collocates of QUITE and RATHER: a diachronic vector-semantics approach

ABSTRACT. Hilpert (2016) draws on ideas developed in Sagi et al. (2011) and builds a vector space model to map the semantic change of MAY in the Corpus of Historical American English (Davies 2010). The semantic vector space model is constructed on the basis of its verbal collocates and the frequencies of those collocates over different periods of time. The co-occurrence matrix is transformed with Positive Pointwise Mutual Information.

Our paper builds on Sagi et al. (2011) and Hilpert (2016), but pursues a different methodology, inspired by Hamilton, Leskovec, & Jurafsky (2016). Diachronic word embeddings are produced by means of word2vec (Mikolov, Chen, Corrado, & Dean 2013a; Mikolov, Sutskever, Chen, Corrado, & Dean 2013b; Mikolov, Yih, & Zweig 2013c). In order to compare word vectors from different time-periods, Hamilton et al. (2016) ensure that the vectors are aligned to the same coordinate axes with orthogonal Procrustes. We apply this methodology to shed light on the semantic changes that QUITE and RATHER have undergone during the last two centuries in US English in both predeterminer and preadjectival positions in the Corpus of Historical American English. Our goal is to determine whether these adverbs have undergone constructionalization (Traugott & Trousdale 2013).


Interim results confirm that: (i) the rate of semantic change scales with an inverse power-law of construction frequency; (ii) constructions that are more polysemous have higher rates of semantic change.
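
The Positive Pointwise Mutual Information (PPMI) weighting mentioned at the start of this abstract can be illustrated with a minimal sketch. It assumes a small dictionary of co-occurrence counts; the function name `ppmi` and the toy counts are hypothetical and are not taken from the Corpus of Historical American English or from the authors' pipeline.

```python
import math

def ppmi(counts):
    """Turn raw co-occurrence counts into PPMI scores.

    counts maps (target, collocate) pairs to co-occurrence frequencies.
    PMI = log2( P(t, c) / (P(t) * P(c)) ); negative values are clipped to 0.
    """
    total = sum(counts.values())
    row_totals, col_totals = {}, {}
    for (target, collocate), n in counts.items():
        row_totals[target] = row_totals.get(target, 0) + n
        col_totals[collocate] = col_totals.get(collocate, 0) + n

    scores = {}
    for (target, collocate), n in counts.items():
        if n == 0:
            scores[(target, collocate)] = 0.0
            continue
        pmi = math.log2((n * total) / (row_totals[target] * col_totals[collocate]))
        scores[(target, collocate)] = max(0.0, pmi)
    return scores

# Toy counts, invented purely for illustration.
toy_counts = {
    ("may", "be"): 8,
    ("may", "go"): 2,
    ("can", "be"): 2,
    ("can", "go"): 8,
}
toy_ppmi = ppmi(toy_counts)
```

In this toy matrix, pairs that co-occur less often than chance would predict (such as "may" with "go") receive a score of zero, which is the point of the "positive" clipping.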

09:50
Exploring concgrams in research articles: A corpus-driven cross-disciplinary study

ABSTRACT. A major focus of corpus research into academic discourse is on its characteristic patterned language use. An oft-studied, related concept is phraseology, an umbrella term that captures various forms of co-occurrence such as N-grams, phrase frames (P-frames) and concgrams. Compared with N-grams, which represent sequential word co-occurrences and have been studied extensively, P-frames and concgrams have been far less studied. Concgrams, which take into account both constituency and positional variation, could capture phraseological variation better than the other two varieties. Further, the use of concgrams in academic discourse has rarely been studied. The only exception so far is Cheng (2014), which, based on the Corpus of Research Articles (http://rcpce.engl.polyu.edu.hk/RACorpus/default.htm), examined 2-word concgrams in RAs from all 39 disciplines. However, only 20 RAs per discipline were available for concgram analysis, which may account for the limited generalisability of its findings and some difficulties in extracting concgrams.

To bridge the gaps, the project compares the use of 2-word concgrams and 3-word concgrams, which have been most often studied in the literature (e.g., Cheng and Leung, 2012), in two self-compiled multi-million-word corpora comprising 600 empirical research articles from psychology and medical sciences.

Based on these two multi-million-word corpora, the study presents interesting cross-disciplinary and cross-sectional similarities and variations in the distribution and functional use of various salient concgrams. The findings manifest the contrasting disciplinary nature and epistemology and the distinct communicative purposes of various RA part-genres.

10:15
Patty – a Tool for Identifying Lexico-Grammatical Patterns in POS-Tagged Corpora

ABSTRACT. It has long been recognised that lexico-grammatical patterns play a highly important part in language use, especially in SLA (Nattinger and DeCarrico 1992; Wray 2002). A great many such formulaic expressions involve the combination of lexical items with specific grammatical structures, as exemplified in Hunston & Francis’ (2000) Pattern Grammar, and can also be seen as variable-slot constructions in the sense of Construction Grammar (cf. Goldberg 2007). However, while statistical techniques may make it possible to identify collocates semi-automatically, doing the same for such extended patterns usually involves time-consuming manual approaches. This talk introduces a new tool, Patty, which creates regex-filterable POS-gram frequency lists representing such potential formulaic patterns, and allows the user to retrieve and display all their actual realisations in a given corpus. In my talk, I will illustrate the basic features, applications, and advantages of the tool, and point forward to future developments. The tool itself will be available as freeware by the time of the conference. References Goldberg, Adele. 2007 [2003]. Constructions: a new theoretical approach to Language. In V. Evans, B. Bergen & J. Zinken (Eds.). 2007. The Cognitive Linguistics Reader. London/Oakville: Equinox. Hunston, Susan & Francis, Gill. 2000. Pattern grammar: a corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins. Nattinger, James & DeCarrico, Jeanette. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press. Wray, Alison. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
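
The idea of a regex-filterable POS-gram frequency list with retrievable realisations can be sketched roughly as follows. This is not Patty's implementation, which the abstract does not describe; the function names, the tag labels (DET, NOUN, ADP) and the toy sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def pos_grams(tagged_tokens, n):
    """Count n-grams over POS tags and remember the word strings realising each.

    tagged_tokens is a list of (word, tag) pairs from any POS tagger.
    """
    freq = Counter()
    realisations = {}
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        pattern = " ".join(tag for _, tag in window)
        freq[pattern] += 1
        realisations.setdefault(pattern, []).append(
            " ".join(word for word, _ in window)
        )
    return freq, realisations

def filter_patterns(freq, regex):
    """Keep only the POS-grams whose tag sequence matches the regular expression."""
    compiled = re.compile(regex)
    return {pattern: count for pattern, count in freq.items()
            if compiled.search(pattern)}

# Hypothetical tagged input; the tag set is an assumption.
tagged = [
    ("the", "DET"), ("end", "NOUN"), ("of", "ADP"),
    ("the", "DET"), ("day", "NOUN"), ("in", "ADP"),
    ("the", "DET"), ("middle", "NOUN"), ("of", "ADP"),
    ("the", "DET"), ("night", "NOUN"),
]
freq, realisations = pos_grams(tagged, 3)
hits = filter_patterns(freq, r"^DET NOUN ADP$")
```

Looking up `realisations["DET NOUN ADP"]` then returns the concrete word sequences behind the abstract pattern, which mirrors the retrieve-and-display step described in the abstract.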

09:00-10:40 Session 16C
Location: IBK Hall
09:00
Corpus-based Study of Interpragmatics: A case of English Learners of the Persian Language

ABSTRACT. The present study sets out to describe the design and compilation of a 30,000-word learner corpus from recordings of L1 English speakers talking spontaneously in the Persian language. The purpose of this corpus is to determine the pragmatic functions that second language learners use, namely, pragmatic functions based on theories of politeness. To this end, the learner corpus was tagged for pragmatic functions using both manual and semi-automated tagging systems. The findings of this study aim to shed further light on the interpragmatic development of learners, which remains a less-studied area, especially for English speakers learning a second language. In addition, the study aims to suggest alternative ways in which corpus linguistics can be used to study the interpragmatics of learners using pragmatically tagged learner corpora.

09:25
Construction Counter: A tool to measure (nonnative) language development

ABSTRACT. Various (morpho)syntactic measures--such as mean length of utterance<1> and mean number of clauses per sentence/T-unit<2>--have been used as indices of L1/L2 development in children. However, many of these indices have been found to have age effects, with older children tending to receive higher scores regardless of proficiency<3>, <4>, casting doubt on the validity of these measures. Drawing from Construction Grammar<5>, this study offers "Constructional Diversity" as an alternative index for children's L2 (morpho)syntactic development. Using spaCy<6> in Python, we designed a new "Construction Counter" tool for English that sorts clauses into 11 constructions and calculates the Constructional Diversity for each text by dividing the number of construction types (max.=11) by the total instances of the 11 constructions. Six of the constructions come from Construction Grammar<5> (caused-motion construction, ditransitive construction, intransitive-motion construction, intransitive-resultative construction, phrasal-verb construction, transitive-resultative construction); the other five are common English constructions (attributive construction, long be-passive construction, simple-intransitive construction, simple-transitive construction, there-expletive construction). To evaluate the tool, we compared its output on a test dataset of 1,000 clauses from the American National Corpus<7> to classifications made by two human annotators (inter-annotator reliability measured by Cohen's Kappa, after resolving inconsistencies: 1.00). The tool achieved fairly high recall (0.82), precision (0.86), and F1 values (0.82). Furthermore, our pilot analysis of adult L2 written data from the International Corpus Network of Asian Learners of English<8> provided promising results, showing that Asian L2 learners of English with higher proficiency (measured by an English vocabulary size test<9>) displayed higher Constructional Diversity.
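
The Constructional Diversity index defined in the abstract (number of construction types divided by total construction instances) can be written out as a short sketch. The clause labels below are assumed to come from some upstream classifier; this example does not reproduce the authors' spaCy-based Construction Counter, and the sample labels are invented.

```python
def constructional_diversity(labels):
    """Distinct construction types divided by total construction instances.

    labels is a list with one construction label per classified clause.
    Returns 0.0 for an empty text to avoid division by zero (an assumption;
    the abstract does not say how empty texts are handled).
    """
    if not labels:
        return 0.0
    return len(set(labels)) / len(labels)

# Invented labels for one short text, using construction names from the abstract.
sample = [
    "simple-transitive", "simple-transitive", "ditransitive",
    "attributive", "simple-intransitive", "attributive",
    "caused-motion", "simple-transitive",
]
score = constructional_diversity(sample)  # 5 types / 8 instances
```

Because the numerator is capped at 11 types, longer texts will tend to score lower on this ratio, so comparisons are most meaningful between texts of similar length.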

09:50
Morphological Complexity of Chinese English Learners in L2 Academic Writing

ABSTRACT. In the past decades, there has been a huge surge of interest in the study of language complexity in second language acquisition. However, studies have mainly focused on lexical complexity and syntactic complexity, which cannot capture the full picture of language development in second language acquisition. In recent years, the study of morphological complexity has received extensive attention. Thus, by comparing the abstracts of argumentative written texts produced by Chinese master's and PhD students of English, this study seeks to trace the development of morphological complexity among Chinese learners of English in academic writing. The results show that morphological complexity is strongly correlated with language proficiency, and also reveal significant correlations with other types of linguistic complexity, such as lexical complexity. The findings suggest implications for L2 academic writing.

10:15
Navigating Focused Feedback to correcting errors through DDL in a Tertiary L2 Writing classroom of Bangladesh

ABSTRACT. Data-driven learning (DDL) in the L2 classroom is an effective recent innovation in the use of language corpora (Crosthwaite, 2017). Early work examined student involvement with language corpora (Johns, 1997); following Johns, qualitative studies by Bernardini (2004) and Kennedy and Miceli (2010) explored how corpora could be used. Research has also examined the effect of DDL on finding errors (Crosthwaite, 2018), but no research on this issue has been done in Bangladesh. In Bangladesh, learners in tertiary classrooms usually ignore their errors, and in some cases teachers give no feedback. This paper therefore examines the results of using the data-driven learning approach in a language classroom to deliver focused feedback based on the Error Taxonomy (Ellis, 2009). In a tertiary classroom, learners took a course on DDL and were then given written assignments in which they corrected errors with the help of the BASE and BAWE corpora. The findings showed that the students could correct errors more successfully with the help of the corpora than they usually do without them. Finally, this paper reveals the challenges of using this approach in the L2 classroom in a Bangladeshi context.

09:00-10:40 Session 16D
09:00
A Quantitative Analysis of English Sentence Configurations and Its Cross-register Comparison Based on Pattern Grammar

ABSTRACT. The description of sentence construction, or more specifically the complexity of sentences, has great practical value for language acquisition, especially for language testing. In order to develop a description that is closer to the regularities of real natural language, this study adopts pattern grammar (Hunston & Francis, 2000) for its three main features: generalisations drawn from large corpora, linear analysis in accordance with human psychology, and flexible semi-fixed configurations. Using the baby edition of the British National Corpus, the present research took two pattern configurations, pattern flow and pattern string, as its objects of observation. By establishing measurement indicators, the configuration characteristics of English sentences were investigated quantitatively from two angles, “depth” and “length”. Moreover, the configurations in the narrative register and those in the argumentative register, both obtained from the corpus, were compared to test Hunston and Francis’s (2000) observation. The results, which were consistent with the regularities of language use, provide empirical support for the feasibility of applying pattern grammar to describe the complexity of English sentence construction. The research has broadened the scope of pattern grammar research from the level of words to the level of sentences. Because of its linear character, it might also provide a new perspective for natural language processing, and because of its closeness to real natural language, it might improve the validity of sentence complexity measurement in language testing.

09:25
Dimensions of register variation in British television

ABSTRACT. The goal of the current study is to identify the dimensions of variation for the verbal language used on contemporary British television programs through Multi-Dimensional analysis (MD; Biber, 1988). Television has been the focus of a large number of studies in diverse fields. Corpus-based studies of television talk have generally been restricted to the analysis of individual programs or a selection of programs. Studies looking at a large cross-section of television programming are rare, and these have looked at American television only. The corpus for this study comprises more than 50 different registers/genres, which include the most important shows presented on the major British channels such as the BBC and ITV. The texts consist exclusively of the transcripts of the talk on the programs, excluding any textual information shown on the screen. The corpus was tagged with the Biber Tagger for more than 100 different linguistic features. The features were counted and normed to a rate per thousand words with the Biber Tag Count Program. The counts were entered in a factor analysis in SAS that indicated the major factors (sets of correlated features) present in the data. The factors were interpreted qualitatively as dimensions by looking at the functional role played by the linguistic characteristics in the texts. Each text was scored on each dimension, and the mean scores for the different registers were compared using ANOVAs. The ensuing dimensions will be presented and illustrated in the paper presentation.
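
The norming step described in the abstract, converting raw feature counts to a rate per thousand words, amounts to simple arithmetic. The sketch below is not the Biber Tag Count Program; the function name, the feature names and the counts are hypothetical.

```python
def rate_per_thousand(count, text_length):
    """Normalise a raw feature count to a rate per 1,000 words of text."""
    if text_length <= 0:
        raise ValueError("text_length must be a positive number of words")
    return count * 1000 / text_length

# Invented counts for one transcript of 2,500 words.
raw_counts = {"first_person_pronouns": 45, "past_tense_verbs": 30}
normed = {feature: rate_per_thousand(n, 2500)
          for feature, n in raw_counts.items()}
```

Norming of this kind makes texts of different lengths comparable before their feature rates are entered into the factor analysis.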

09:50
Gender construction and representation in genderless languages: a corpus study of the words “female drivers”, “male drivers”, and “drivers” in Chinese

ABSTRACT. Genders are represented differently in languages. Frequency and collocation analyses of corpora have shown a prevalent male bias (Baker, 2013).

Scholars have explored gender representation in 30 languages and argued that gender bias hides subtly in languages without grammatical gender (Hellinger and Bußmann, 2003). However, few empirical studies have been done in this regard.

The word siji ‘driver’ in Chinese is grammatically, semantically, and referentially neutral, making it a good example for a case study. The current study investigated how the word nü siji ‘female driver’ is used in Chinese as compared to the words nan siji ‘male drivers’ and siji ‘drivers’. The frequency and collocations of the words were examined in the BCC corpus (Xun et al. 2015; Xun et al. 2016). Linguistic data were supplemented by internet trends, government statistics, and theories in social science, as in Pearce (2008).

The findings showed that ‘female drivers’ is much more frequent than its male counterpart, corroborating the common male genericization. The differences were explained by markedness theory in linguistics. It was also observed that the frequency of nü siji ‘female driver’ peaks in the 1950s and in 2015, but its collocations in the two periods are different. The differences in collocations are accounted for by the theory of “doing gender”, which argues that gender is a social construct and is subject to change under the influence of language (West and Zimmerman, 1987).

In summary, the study revealed gender construction and bias in Chinese and had methodological implications.

10:15
Constant Fear, but Lingering Nostalgia: British Press Representations of Post-1997 Hong Kong Twenty Years On

ABSTRACT. This study conducts a corpus-assisted discourse study of the representations of post-colonial Hong Kong in The Times (TT) over the last twenty years. The primary purpose is to reveal its preferential ways of representing Hong Kong and explicate the intricate relations between language use and the historical and socio-political contexts. Through an integration of the methods and theories associated with critical discourse analysis and corpus linguistics, this study conducts both synchronic and diachronic analyses of the representations of Hong Kong from 1997 to 2017. The findings suggest that TT’s representations of Hong Kong tend to be crisis- and conflict-oriented. While evoking the constant fear about the future of Hong Kong, it still suggests Britain’s duty and moral obligation to protect the former British colony.

09:00-10:40 Session 16E
Location: Helinox Hall
09:00
Analysis of Linguistic Characteristics of Chinese Textbooks Using Corpus Analysis Tool

ABSTRACT. 1. In the last decade, various studies related to Chinese textbooks have been conducted. Most existing research, however, has focused on secondary school textbooks. There is little research on Chinese textbooks for college students and the general public. Therefore, this study intends to analyze the linguistic characteristics of general Chinese textbooks for foreigners by expanding the range of textbooks examined. The questions addressed in this study are as follows. (1) What linguistic differences exist between Chinese textbooks? (2) If there is a difference between textbooks, what is the difference? (3) What are the linguistic features of Chinese textbooks compared to the general corpus?

2. Collection of Research Data: Chinese textbooks published in Korea, China, and the United States were obtained and digitized with an optical character recognition program. There are 36 Chinese textbooks. We used Torch Corpus and CNC as reference corpora for comparison with the 36 Chinese textbooks.

3. Analysis Tool: Word segmentation is performed with the ICTCLAS program (Chinese Lexical Analysis System). Frequencies are calculated using AntConc and WordSmith Tools.

4. Research Content: (1) Character level: Chinese character frequency (2) Vocabulary level: word frequency, word length, part of speech, and frequency (3) Syntax level: collocation pattern, sentence length, sentence type (4) Text level: rhetoric, discourse pattern

5. Discussion of Research Results: According to the basic analysis of this study, the textbooks differ in word frequency, lexical diversity, and syntactic complexity. This is because publishers and writers have different intentions for their textbooks.

09:25
A Study on the Mandarin Speakers’ Use of a Modal Particle ‘a 啊’

ABSTRACT. This study examines how Mandarin speakers use the modal particle ‘a 啊’ by analyzing a speech corpus, collected from a TV drama script, and an online survey. The speech corpus came from Xu (2018), and the survey was produced in Kim (2019). ‘a’ can be pronounced differently, such as [ia], [na], [ua], [ɻa] (or [ʐa]), [tsa], and [ŋa], depending on the coda of the preceding syllable. Some of these phonetic variants can be written with Chinese characters other than 啊, such as ya 呀, na 哪, and wa 哇. According to influential books on Mandarin phonology, some of the rules allow free variation, while others must be obeyed. However, Mandarin speakers often do not follow these phonological rules in their daily use of ‘a’; in particular, they use [ia] (呀) very often. This study will show the gap between the rules presented in the books and actual use by Mandarin speakers. It will also show whether adherence to the rules differs between male and female speakers or between older and younger speakers, and whether the length of time a speaker has spoken Mandarin affects the phonetic changes of ‘a’.

09:50
Current Issues in Mandarin Chinese Discourse Analysis and a Case Study Using Chinese Spoken Discourse Data

ABSTRACT. This study surveys recent issues in Mandarin Chinese discourse analysis that incorporate an interactional perspective in describing and understanding various grammatical phenomena. It pays particular attention to the development of discourse-based functional approaches to grammar in the United States. In order to show how researchers use discourse analysis to study complex grammatical phenomena within Chinese discourse contexts, this study uses case studies to investigate the second-person singular pronoun ni occurring at both the beginnings and endings of discourse turns (ni...ni). In Mandarin, ni...ni is a grammaticalized device characterized by a set of consistent syntactic, prosodic, and discourse features. From the syntactic, phonological, and pragmatic perspectives, ni used at the beginning of the turn has a typical deictic function, while ni used at the end of the turn shows the features of a discourse particle. This study shows that the ni occurring at the beginning of a turn refers to the intended recipient of the utterance, while the ni occurring at the end of the turn does not contribute to the discourse unit’s propositional content. Instead, the end-of-turn ni performs the discourse-pragmatic function of expressing the speaker’s negative stance toward the person or object being discussed.

10:15
The Patterns of Corresponding South Korean Pop Music Lyrics and Their Chinese Translations

ABSTRACT. South Korea and China have been closely interacting with each other for many years. As the interactions and exchanges between the two countries become more active, we now see more translations of dramas, novels, and newspapers. Linguists have been analyzing these data, but few have focused on the matching patterns in popular music. In this study, the author collected lyrics in the two languages and, from the perspective of corpus linguistics, analyzed their patterns and characteristics in comparison with one another via statistical analyses.

10:50-12:30 Session 17A
Location: Choi Young Hall
10:50
The study of specificity in novice and expert Engineering writing: A corpus-based comparison

ABSTRACT. In academic contexts, novices are expected to demonstrate mastery of appropriate discipline-specific conventions in their writing in order to communicate with their target disciplines and establish themselves as expert writers. This writing journey is the focus of the present study. It adopts corpus methods to study the development of specificity between first-year novices and final-year experts by comparing written assignments in engineering produced by three different cohorts of undergraduate students (from year one to three) at the same time. In total, 150 assignments (50 for each year of study) totalling around 210,000 words were collected for the analysis of specificity with reference to academic vocabulary and interpersonal metadiscourse features. The findings point to a developmental trajectory in engineering writing, with a greater distribution of discipline-specific lexical items and metadiscourse features, especially in the upper-level writing. This positive development can be partly attributed to the growth of students’ ‘engineering sense’ over time. In short, this study highlights the value of descriptive, corpus-based studies of first-year writing compared to final-year academic writing in engineering, which may shed light on disciplinary writing pedagogy specific to engineering.

11:15
An investigation of phrase-frames used in Engineering lectures

ABSTRACT. Several previous studies have stressed the pedagogical importance of compiling academic formulaic expressions for English learners who need an academic repertoire in English as a Medium of Instruction (EMI) contexts. Phrase-frames, in particular, seem to be helpful because a single frame can be used to express different meanings by replacing words (e.g., “the * of the study”), but they have attracted relatively little attention. Therefore, the present study explored five- and six-word phrase frames in Engineering academic lectures, one of the most important academic speech events. The lectures covered three disciplines, Computer Science (CS), Electrical Engineering (EE), and Mechanical Engineering (ME), and totaled around 1.2 million words. In order to acquire pedagogically beneficial and disciplinarily representative phrase frames, academic lecture transcripts were carefully selected from websites offering online courses. Phrase frames appearing more than ten times were extracted with kfNgram (Fletcher, 2011) and then categorized by function for further comparison. There were 513 five-word and 95 six-word phrase frames used in Engineering lectures. In particular, the instructors of EE and ME used far more referential expressions to explain content knowledge (61% and 59% respectively). These phrase frames were used to specify the attributes of formulas or models, such as “in the * direction and” (can be filled with “x, y”). The instructors of CS used more discourse expressions (55%) to structure courses, such as “at the end of *” (can be filled with “the, this”). The pedagogical implications of the phrase frames can be further discussed.

11:40
From Corpus to Course: Material Design for Students in Fisheries Science

ABSTRACT. With the rapid development of corpora and corpus linguistics software tools, researchers have made use of corpora to help students with academic writing. A strand of research has examined language use in research articles in different disciplines. However, the results of such analysis have seldom been translated into course materials; even when they are, these materials are not necessarily applicable to every discipline. To apply corpora to classroom settings, another strand of research has employed data driven learning, featured by students searching for patterns in concordance lines, to teach academic writing. However, students might feel overwhelmed by unorganized authentic data and often need much training before they can benefit from analyzing concordances. This study attempts to deal with the above issues by designing teaching materials based on the analysis results of a tailor-made corpus for fisheries science. First, a corpus of 200 research articles in fisheries science is built. Then, frequent language features for each research article section are identified. Finally, teaching materials specific for fisheries science, including worksheets and assessments, are designed with the help of a disciplinary expert. The worksheets are not unorganized concordance lines, but activities that scaffold students’ learning of the most frequent language features of their discipline. Examples of worksheets will be shown and wider applications of such worksheets in the fisheries industry will be discussed.

12:05
A Phrase-frame List for Social Sciences Research Article Introductions

ABSTRACT. In the past decades, several corpus-based studies have proposed the necessity of lists of academic formulaic expressions for students of English for Academic Purposes (EAP). One type of formulaic expression that has not yet received sufficient attention is the phrase-frames (p-frames), i.e. semi-fixed sequences that contain a variable slot that can be filled by different words, e.g. the * of the study, where the open slot may be filled by purpose, goal, and objective, among others. In this study, we extend recent efforts in compiling pedagogically useful lists of academic formulaic expressions by deriving a list of p-frames frequently used in an academic corpus of social science research article (RA) introductions. The corpus data comes from several different social science disciplines. These disciplines include Psychology, Sociology, Linguistics, and Economics. Based on this large social science corpus, we extracted five- and six-word p-frames using kfNgram (Fletcher, 2011). For each p-frame, kfNgram provides its token count, a list of its variants, and the token count for each variant. The initial candidate p-frames extracted included all possible p-frames with a single variable slot in any position. These candidates were then manually filtered to ensure their semantic completeness and pedagogical value (Lu, Yoon & Kisselev, 2018). The resulting five-word phrase-frames and six-word phrase-frames were then analyzed and further linked to different moves. In carrying out this project, we hope to contribute to the methodological discussion of the extraction and selection of suitable p-frames as well as the pedagogical applications of these academic phrase-frames.
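The extraction step described above (frames with a single variable slot in any position, each reported with a token count and a list of variants) can be sketched in a few lines of Python. This is only an illustrative approximation, not kfNgram itself: the function name `extract_pframes`, the whitespace tokenisation, and the toy text are assumptions made for the example.

```python
from collections import Counter, defaultdict


def extract_pframes(tokens, n=5, min_count=2):
    """Collect n-word frames with one open slot ("*") in any position.

    Returns a dict mapping each frame (a tuple of words) to a Counter of
    the variants that filled its slot. Frames whose total token count is
    below min_count are dropped. Hypothetical helper, not kfNgram's API.
    """
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(n):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return {f: v for f, v in frames.items() if sum(v.values()) >= min_count}


# Toy input (assumption): three sentences sharing one frame.
text = ("the purpose of the study is clear . "
        "the goal of the study is clear . "
        "the objective of the study is clear .")
pframes = extract_pframes(text.split(), n=5, min_count=3)
variants = pframes[("the", "*", "of", "the", "study")]
print(sum(variants.values()), sorted(variants))
```

A manual filtering pass for semantic completeness and pedagogical value, as the abstract describes, would still be needed after a step like this, since purely frequency-based candidates include many fragments.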

10:50-12:30 Session 17B
10:50
BriC: a new Brazilian Portuguese-Chinese Parallel Corpus

ABSTRACT. Brazil and China are emerging countries with increasingly strong connections. However, most of the communication between these countries is in English, which is reflected in the paucity of resources for either education or translation purposes. A potential solution would be to utilise a parallel corpus of Chinese and Brazilian Portuguese (BP), but most of the corpora already available are composed of other varieties of Portuguese rather than Brazilian (e.g. Xing et al. 2016). In addition, most of the data are compiled through machine translation (e.g. Lison & Tiedemann 2016), which affects the accuracy of the translated data. Our research aims to fill this gap by creating a corpus of BP texts with their corresponding parallel texts in Chinese, and vice versa. The main purpose of this corpus is to inform language learning and translation, and to explore the usefulness of corpus linguistic methodologies for this language combination. For this initial step of our project, we collected literary data from different periods and authors. Metadata such as textual genre and year of publication were preserved, allowing for easy compilation of subcorpora. Both corpora were annotated with part-of-speech and lemma information using TreeTagger (Schmid 1994). The corpora were then aligned using LF-aligner (Farkas 2018) and uploaded to CQPweb (Hardie 2012). In this presentation, we will outline the process of data collection and how we overcame the main challenges. We will demonstrate the corpora and discuss how they could be used for pedagogical purposes. We will also briefly discuss the next steps of the project.

11:15
Leveraging Corpus Queries for Argumentation Mining

ABSTRACT. We are interested in Argumentation Mining, specifically in retrieving arguments from social media, where users express opinions through non-traditional forms of argumentation. We use corpus queries to retrieve such arguments with high precision. Consider the following query for the argument scheme “position to know” (e.g. “as a dentist I can assure you that …”) from a corpus of tweets:

[lemma="as"] @0:[::] <np> []* [lemma=$person_any] []* </np> @1:[::] [lemma="i"] <vp> []* @2:[pos="V"] []* </vp> @3:[::];

$person_any is a word list of suitable professions etc., and @-anchors mark regions and tokens of interest (@0…@1 = a dentist, @2 = assure, etc.).

Standard GUI concordancers and web-based platforms are insufficient for this purpose, though. AntConc does not support a complex query language; CQPweb and (No)SketchEngine do not offer word lists and macro grammars, and they are limited regarding data extraction.

Therefore, we build directly on the IMS Open Corpus Workbench (CWB), the indexing engine underlying CQPweb, which enables users to edit complex query grammars in separate macro definition and word list files. We extended CWB with necessary functionality (e.g. multiple anchor positions) and implemented a wrapper application to support the interactive development of query grammars. It enables users to inspect concordances conveniently, debug queries and obtain frequency tables for selected positions (i.a. as a basis for new word lists).

We demonstrate the tool’s usefulness through several argument queries applied to a large corpus of tweets about BREXIT. The software will be released as an open-source add-on to CWB in time for APCLC2020.
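For readers without a CWB installation, the idea behind the “position to know” query can be approximated on untagged text with an ordinary regular expression. The sketch below is not CQP and does not use the authors' tool: the word lists `PERSON_ANY` and `VERBS` are hypothetical stand-ins for the `$person_any` list and the `pos="V"` constraint, and the named groups only loosely mimic the anchored regions.

```python
import re

# Hypothetical word lists (assumptions for illustration only).
PERSON_ANY = ["dentist", "teacher", "nurse", "lawyer"]
VERBS = ["assure", "tell", "say", "confirm"]

# "as" + noun phrase ending in a profession + "I" + up to two words + verb.
PATTERN = re.compile(
    r"\bas\s+(?P<np>(?:\w+\s+)*?(?:{persons}))\s*,?\s+i\s+"
    r"(?:\w+\s+){{0,2}}?(?P<verb>{verbs})\b".format(
        persons="|".join(PERSON_ANY), verbs="|".join(VERBS)
    ),
    re.IGNORECASE,
)


def find_position_to_know(text):
    """Return (noun phrase, verb) pairs for each match in the text."""
    return [(m.group("np"), m.group("verb")) for m in PATTERN.finditer(text)]


matches = find_position_to_know(
    "As a dentist I can assure you that flossing matters."
)
print(matches)
```

Unlike a query over a chunked and POS-tagged corpus, this approximation cannot tell verbs from other words, which is one reason a dedicated query engine with annotations is preferable for high-precision retrieval.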

11:40
Incheon National University Multi-Language Learner Corpus (INU-MULC): Its Design and Application

ABSTRACT. INU-MULC is an on-going project of collecting spoken and written data of English produced by Korean speakers. Most publicized learner corpora lack casual conversation data, whose demand is increasing (Love et al. 2017). Casual conversation is the most basic mode of communication (Paltridge 2012) that does not allow time for planning and revision, and thus, learners employ vocabulary and grammar that they can access the most readily. In order to describe learners’ L2 acquisition more completely and accurately, INU-MULC collects not only written data but spoken data including both monologues and casual conversations. About 100 conversations, 215 monologues, and 150 compositions have been collected (330,000 words for spoken and 30,000 for written) over the course of a year. Most speakers in the corpus were students of various majors and proficiency levels at Incheon National University. They provided the data in three forms: a 300-word writing composition, a 2-minute monologue, and a 20-minute casual conversation with 1-3 other speakers. All spoken data was transcribed with a simplified convention of Discourse Transcription (Du Bois et al. 1993). The speakers’ meta data was collected, and their proficiency levels were determined using the Common European Framework of Reference for Languages (CEFR) based on their monologues and conversations. Finally, both written and spoken data are to be POS-tagged. The corpus is applicable to various topics of SLA from three different forms (written, monologue and conversation), such as adverb acquisition (Lauzon, in preparation), semantic application of frequent verbs (Park 2019), and sentence complexity (Yoon, in preparation).

12:05
DIY Concordancing for Sophomore English Majors at a Sino-British University in China: The Aims, Tools, Assessments and Impact

ABSTRACT. Most students in China (and the region) typically encounter opportunities to develop concordancing and corpus analysis skills only towards the end of their undergraduate degree, or at postgraduate level, if at all. Earlier exposure to corpus techniques could mean that undergraduate linguistic and translation work is better underpinned by quantitative data, giving students more authority in their exploration of patterns in their foreign language. From a language learning perspective, Data Driven Learning has been shown to be effective (Boulton & Cobb, 2017) and to engage students fruitfully (Flowerdew, 2015). Pioneering work with postgraduates (Johns, 1991) and undergraduates (Cheng, Warren, & Xun-feng, 2003) serves as inspiration for hands-on concordancing.

This paper reports on a sophomore undergraduate module on corpus linguistics for English majors at a Sino-British university, providing an overview of design, corpus tools, assessment and student response (assignments and feedback).

References

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta-Analysis. Language Learning, 67(2), 348-393.
Cheng, W., Warren, M., & Xun-feng, X. (2003). The language learner as language researcher: putting corpus linguistics on the timetable. System, 31(2), 173-186.
Flowerdew, L. (2015). Data-driven learning and language learning theories: Whither the twain shall meet. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple Affordances of Language Corpora for Data-driven Learning. Amsterdam: John Benjamins.
Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. In T. Johns & P. King (Eds.), Classroom Concordancing (Vol. 4, pp. 1-13). Birmingham: CELS, University of Birmingham.

10:50-12:30 Session 17C
Chair:
Location: IBK Hall
10:50
Development of 3-word bundles in Korean Children EFL Learners’ Corpus (KCELLC)

ABSTRACT. This paper investigates the use of 3-word lexical bundles and collocations in the Korean Children EFL Learners’ Corpus (KCELLC). A frequency-driven approach was taken to identify 3-word lexical bundles. Lexical bundles are contiguous sequences of words in written and spoken texts. They may appear as discourse markers or relatively fixed expressions across genres and represent the particular styles of registers. For this study, the KCELLC was constructed, and the BNC will be used as a reference corpus. The KCELLC is a collection of 108 texts written in L2 English by 4 students from 2013 to 2014. The samples were collected from writing on various topics and in various forms, such as descriptive writing, letter writing, story writing, and poetry writing. Of the 112 texts in total, 108 were selected for the data; the poems were excluded. After building .txt files, I used AntConc 3.5.8 (or Python) to examine the tokens and types of the vocabulary that appeared in the data. The data contained 6,086 tokens and 1,234 types. Overall, the study will retrieve the categories and uses of 3-word bundles in the KCELLC and then investigate how these categories and uses change between the first units and the final units. The results will show how the target linguistic units from each subject change and develop from unit to unit in comparison with a reference corpus.
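Since the abstract mentions Python as an alternative to AntConc, a frequency-driven extraction of 3-word bundles might look roughly like the sketch below. The function name, the thresholds, the simple regex tokeniser, and the toy texts are all assumptions; note also that this simplified version lets bundles cross sentence boundaries.

```python
import re
from collections import Counter


def three_word_bundles(texts, min_freq=2, min_texts=1):
    """Count contiguous 3-word sequences across a list of texts.

    A bundle is kept if it occurs at least min_freq times overall and in
    at least min_texts different texts (a simple dispersion criterion).
    """
    freq = Counter()        # total occurrences of each trigram
    dispersion = Counter()  # number of texts each trigram appears in
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
        freq.update(trigrams)
        dispersion.update(set(trigrams))
    return {b: c for b, c in freq.items()
            if c >= min_freq and dispersion[b] >= min_texts}


# Toy learner texts (assumption).
sample_texts = ["I want to go home. I want to play.", "We want to go out."]
bundles = three_word_bundles(sample_texts, min_freq=2, min_texts=2)
print(bundles)
```

Running the same function separately on the first and the final units would give two bundle inventories that can then be compared, which is the kind of developmental contrast the study describes.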

11:15
L2 Corpus and Psycholinguistics both Accounting for L2ers' Syntactic Knowledge

ABSTRACT. There is significant evidence that people predict upcoming syntactic structure in a sentence based on their statistical experience with the language. This study examines whether Korean L2 learners of English are affected by expectations about upcoming words during sentence processing. Three hypotheses have been proposed to explain the relationship between expectations about upcoming words and processing difficulty: the surprisal hypothesis, the competition hypothesis, and the entropy reduction hypothesis. The surprisal hypothesis suggests that processing difficulty at a word is related to the probability of that word in its sentence context (Hale, 2001; Levy, 2008). Readers are likely to work harder when they parse linguistic material with lower probability. The competition hypothesis proposes that processing difficulty can be accounted for by the number of predictions that compete with each other (McRae, Spivey-Knowlton, & Tanenhaus, 1998). Readers are likely to experience difficulty when they process linguistic material that gives rise to a large number of predictions, especially when these predictions have similar probabilities. The entropy reduction hypothesis claims that processing difficulty can be explained by the degree to which uncertainty about upcoming syntactic structure is reduced (Hale, 2006; Yun, Chen, Hunter, Whitman, & Hale, 2015). Readers are likely to face costly processing when the reduction in uncertainty is large. In this study, a self-paced reading experiment was conducted to compare the three hypotheses with regard to language processing by Korean L2 learners of English. The study aims to clarify how reading times vary according to lexical subcategorization distributions.
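Two of the quantities named above have standard information-theoretic definitions that are easy to compute once a probability distribution is available: surprisal is the negative log probability of an item, and entropy reduction is the non-negative drop in entropy from one point in the sentence to the next. The sketch below is illustrative only; the distributions over continuations are invented toy values, not data from the study.

```python
import math


def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def entropy_reduction(before, after):
    """Drop in entropy between two distributions, floored at zero."""
    return max(0.0, entropy(before) - entropy(after))


# Toy subcategorization distributions for a verb (assumed values).
before_next_word = {"NP": 0.5, "S": 0.5}  # two equally likely continuations
after_next_word = {"NP": 1.0}             # ambiguity fully resolved

print(surprisal(0.25))                                       # bits for p = 0.25
print(entropy_reduction(before_next_word, after_next_word))  # bits of uncertainty removed
```

In a corpus-based setting, the probabilities would typically be estimated from the relative frequencies of each verb's subcategorization frames.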

11:40
A Study on the Particle Omission Errors of English- and Japanese-speaking Learners of Korean – Using the 'Korean Learners’ Corpus' by the National Institute of Korean Language –

ABSTRACT. The purpose of this research is to analyze the particle omission errors of English- and Japanese-speaking learners of Korean using the Error-Annotated Korean Learners’ Corpus. The Korean Learners’ Corpus is very useful learner language data for both teachers and learners of Korean. The Error-Annotated Korean Learners’ Corpus is especially valuable since it provides detailed analysis and feedback for each error. Omission is the second most frequent particle error type, accounting for 30.6% of all particle errors. The analysis of the data from English- and Japanese-speaking learners shows that particle omission errors occur across all proficiency levels, which suggests that teachers and learners should focus on how to reduce them. Both groups of learners and their teachers need to pay more attention to the object (eul/leul), subject (i/ga), topic (eun/neun) and locative/adverbial (e) particles, because these particles account for over 80% of all particle omission errors. However, English-speaking learners tend to make more omission errors than Japanese-speaking learners do. Particle errors need to be examined from a broader perspective to find better ways of reducing them, because particles are closely related to verbs, nouns and syntactic structures.

12:05
Different methods of L2 grammar profiling: a comparative study

ABSTRACT. This study investigates how profiling research into the grammar of English as a second language (L2) should be conducted using multiple corpus resources. L2 profiling research has become increasingly important as the Common European Framework of Reference for Languages (CEFR) is used for compiling sets of English language resources based on L2 learner corpora (Hulstijn, 2010; Salamoura and Saville, 2010). In constructing a grammar profile, grammatical points are selected and sequenced in order of presentation based on the CEFR levels. At the same time, assessment uses this grammar profile information to judge the quality of texts produced by L2 learners, which can be done using L2 learner corpora. However, profiling based on L2 learner corpora sometimes does not match the results of profiling based on the analysis of teaching materials (Tono, 2016). While several useful English grammar profiles are already available (e.g. English Grammar Profile by the English Profile, Global Scale of English by Pearson, Core Inventory for General English by British Council/EAQUALS), different profiling methods yield different results, which might confuse the end-users of such resources. In this study, a corpus-based methodological comparison was made between profiling based on L2 learner corpora and profiling based on L2 coursebook materials. The results show that the profiling of grammar items is influenced by multiple factors such as the type of reference text (textbook vs learner corpora), the relative frequencies of grammar items in L1 and L2 textbook corpora, and the mode of presentation (spoken vs written).

10:50-12:30 Session 17D
Chair:
10:50
Critical Discourse Analysis of Hidden Meanings of Feministic Discourse in Reham Khan’s Interview on BBC Urdu News

ABSTRACT. In this research, critical discourse analysis is used to examine the intended meaning of an interview given by Reham Khan to BBC Urdu News on November 19th, 2017, after her divorce from her husband Imran Khan. In order to investigate the importance of feminist discourse as a social phenomenon in Reham Khan’s conversation, a macro-level analysis following Van Dijk (1998) is carried out. The choice of lexical items is also analyzed with the help of frequency counts. The results of the research highlight the function and intended meaning of spoken discourse in a stereotypical patriarchal context.

11:15
Image report of China’s “Belt and Road” Initiative in the British and German Press

ABSTRACT. This study investigates and compares the image of China’s “One Belt, One Road (OBOR)” Initiative in terms of its discursive representations in the British and German press. It focuses on the different perspectives of the two countries and the diachronic changes in discursive themes and discursive strategies over the past six years. The empirical data for the study are 211 British and 352 German newspaper articles containing the keyword “one belt one road”, dated between September 2013 (the announcement of the initiative) and May 2019. They were collected from the newspaper section of the online database LexisNexis (Academic) and were annotated with metadata such as “year”, “region”, “source”, and “thematicity”. The methodology adopted in the study combines critical discourse analysis (CDA), corpus linguistics (keyword, co-occurrence and concordance analyses), and topic modelling (the lda2vec model proposed by Moody [2016]). This synergetic approach yields interesting insights into our research questions. The findings show that the discursive representations of OBOR feature rather conflicting and mixed views in both the British and the German press, and that the general image of OBOR has clearly changed over the past six years. In addition, the British and German newspaper articles feature several different patterns of representation, which reveal that the German press sees OBOR, especially the 16+1 initiative, as a geopolitical threat, while the British press emphasizes the business opportunities that OBOR opens up for Great Britain.

11:40
Constructing Anime Characters Using the Indexical Meanings of First-Person Pronouns: A Case Study of the SciFAn Corpus

ABSTRACT. Japanese animation, or anime, has grown considerably in popularity over the last decades. As an internationally popular cultural phenomenon, anime receives a substantial amount of scholarly interest in a number of areas. However, there are still limited examinations of the language use in the dialogue of anime and other Japanese telecinematic texts, such as television dramas or feature films. Additionally, corpus linguistic studies of telecinematic texts have so far been limited to English-language texts (e.g. Bednarek, 2015; Quaglio, 2009). In order to contribute to these (still) emerging areas of linguistic research, this study draws on a newly-constructed corpus, the corpus of Science-Fiction Anime dialogue (SciFAn corpus), which is comprised of the dialogue from five popular and highly-rated science-fiction (sci-fi) anime television series. This study examines how the meanings expressed by first-person pronouns (1PP) are repurposed in the dialogue of (sci-fi) anime television series to construct and convey character identities and social relationships. Previous studies of 1PPs, particularly those examining anime dialogue, mainly discuss their usage in relation to construction of (cis)gender identity (e.g. Dahlberg-Dodd, 2018; Hiramoto, 2013). Drawing on the concept of indexicality (Silverstein, 1976, 1985), this study shows that the use of 1PPs can index other aspects of identity, such as social roles and personal traits, in addition to gender. More broadly, this study demonstrates that corpus linguistic analysis can be used effectively with a sociolinguistic framework to examine characterisation and contributes to the still under-represented areas of Japanese telecinematic discourse studies and Japanese corpus linguistic research.

12:05
Linguistic Identity Construal of Spanish Youth

ABSTRACT. The present investigation attempts to establish why, and through which channels, contact with English insertions in Spanish discourse changes the speech style and construes the linguistic identity of Spanish youth, even if they hardly speak English. To shed light on this question, Allan Bell’s (1984, 2001, 2010) Audience Design Theory was combined with three different methods used in case studies (an Attitudinal Experiment, a Sociolinguistic Survey and a Sociolinguistic Interview in combination with an Exposure Experiment and an Attitudinal Experiment) about English insertions in Spanish discourse. The results led to the following conclusions: (i) the majority of the Attitudinal Experiment participants and volunteers accept that mass media is one of the three main factors that influence their speech style; (ii) Spanish youth use English insertions in their Spanish discourse because they perceive someone who does so as young, modern, trendy and a Spanish-English bilingual; (iii) Spanish youth use English insertions without being Spanish-English bilinguals; (iv) for the Attitudinal Experiment participants, neither a particular speech style nor a sweet type of voice can measure the cleverness of a speaker; (v) for the Attitudinal Experiment participants, a sweet type of voice together with a particular speech style does matter; (vi) Spanish mass media proactively construe (this is our assumption, strongly supported by the conclusions reached) the linguistic identity of Spanish youth even if their level of English is very poor.

10:50-12:30 Session 17E
Chair:
Location: Helinox Hall
10:50
A Corpus-based Study on the Discourse-Pragmatic Functions of the Aspect Markers Le and Guo in Mandarin Chinese

ABSTRACT. This paper investigates the discourse-pragmatic functions of the Chinese perfective and experiential aspects in story discourse. Chinese is regarded as an aspect-prominent rather than a tense-prominent language. Based on the classification of discourse modes proposed by Carlota S. Smith (2003), this paper attempts to analyze the influence of the perfective and experiential aspects on the temporal structure of the narrative mode in story discourse. Data for the study were collected from Chinese fairy tales. The study finds that Le can signal the boundary of a time interval and connects to other clauses mainly by means of sequence and elaboration. It claims that in story-narrative discourse, Le is used to narrate the detailed events in each phase, while Guo is used by the narrator to set up the main phases in the progress of the story. Guo is generally regarded as the (experiential) perfect, and it is not temporally related to other events in a discourse. We suggest instead that Guo exhibits counter-sequentiality (Givon, 1993), that is, the coding of an event out of its chronological order in a narrative.

11:10
A Study on the Aspect Function of Directional Complement of Chinese - Based on corpus statistical analysis

ABSTRACT. In many languages, words expressing movement have been grammaticalized into function words, and these changes have been discussed by Bybee et al. (1994) and Heine & Kuteva (2004). The directional complement (趋向补语) is a representative element that expresses the meaning of movement in Chinese. Owing to its distributional features in particular, it is likely to function as an aspect marker. Nevertheless, most research to date has relied on traditional methods and small-scale corpora. In this paper, we investigate the aspect function of directional complements through statistical analysis of corpus data. The subjects of this study are “上”, “下”, “起来”, “下去”, “下来”. Previous studies have analyzed only the meaning of the verbs with which these directional complements combine. In addition to verb meaning, we believe that the aspect features of other elements in the sentence are necessary for understanding the aspect function of directional complements. We therefore analyzed a number of parameters in the Chinese corpus data, such as the aspect features of other elements, sentence type, and word order. Through statistical analysis, this study examined the influence of directional complements and these parameters, as well as the relationship between the aspect functions of the individual directional complements. Finally, this study seeks to show the position that the aspect function of directional complements occupies in the dual aspect system presented by Smith (1997) and in the development path of aspect markers presented by Bybee. This study will contribute to identifying the status of directional complements in the aspect system of Chinese.

11:30
Using Web as a Corpus for Loanword Phonology: phonetic adaptation in K-pop fandom

ABSTRACT. This paper investigates a web-derived corpus from the largest Chinese online community, “Baidu Tieba”, to study loanword phonology in Chinese. It has so far been difficult to obtain a large amount of loanword data with statistically controlled variables, but using the web as a text corpus makes it possible to overcome this obstacle. Based on real-life “wug test” data in which K-pop fandom has phonetically adapted Korean lyrics into Chinese, new findings concerning the constraint hierarchy among native grammars, syllable structure and speech perception are discussed in detail. A brief overview of other ongoing changes in the notion of Chinese characters is also given.

11:50
On trends of Chinese corpus-based research in Korea and some suggestions for Chinese education

ABSTRACT. Chinese corpus linguistics in Korea is a field whose base has yet to expand much. However, more and more corpus-based research is being done. According to search results from Korean academic databases, CCL and BCC were the corpora most often used, and corpora collected directly by researchers have also been used in discourse studies. Much of the research targets the analysis of collocations of language items and the comparative analysis of Korean and Chinese sentence structure, and seems to be aimed at education as well as the analysis of linguistic phenomena. In light of this, this study proposes that corpora built to suit research purposes are necessary for Korea as well. One example is a corpus of Chinese textbooks and teaching materials. Considering that Chinese is a second foreign language for Koreans, research in any field of Chinese language study is likely to be linked eventually to education. Given that research has so far focused mainly on the language itself, we think that a Chinese textbook corpus needs to be built from a pragmatic perspective. Studies based on such a corpus will be able to observe patterns of pragmatic variation according to the region and the learners being taught. Also, various conversations on the same subject can be categorized, and within them, speech acts in which the same form performs different functions, or in which different forms perform the same function, can be examined. The results will fill the gap left by the fact that natural discourse research cannot immediately lead to pragmatic education.

13:30-14:30 Session 18: [Plenary 3] Mark Davies. New corpus tools for researching and teaching English

Plenary Session

Location: Grand Ballroom
13:30
New corpus tools for researching and teaching English

ABSTRACT. TBA

14:40-16:20 Session 19A
Location: Choi Young Hall
14:40
How Academic are TED Talks?

ABSTRACT. TED Talks are a contemporary genre through which speakers, including entrepreneurs, politicians, artists, engineers, writers, and academics, communicate specialist ideas to a lay audience. TED Talks are a pedagogically exploitable resource with possible applications within EAP as semi-authentic academic materials. Therefore, this presentation investigates the extent to which TED talks and academic lectures are similar in terms of academic language content. The lexical coverage of academic language content is measured in a TED talk corpus (2483 talks, 5,068,781 words) and an academic lecture corpus (708 lectures, 5,523,791 words). Academic language content is operationalised as: the Academic Word List (Coxhead 2000), the Academic Vocabulary List (Gardner & Davies 2014), the Academic Spoken Word List (Dang, Coxhead & Webb 2017), and the Academic Formulas List (Simpson-Vlach & Ellis 2010). In all cases TED talks were found to have significantly less academic language content on average. The difference was most striking for academic formulas, suggesting differences in the ways the genres engage with ideas and the audience. However, a subset of TED talks was identified which had levels of academic content comparable to the average lecture. Talks within this subset are likely to be tagged economics, finance, or math, or to be related to pure academic disciplines. A measure of high-frequency general English showed no significant differences. However, TED talks had significantly more contemporary vocabulary items.
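Lexical coverage, as used above, is the proportion of running words in a text that belong to a given word list. A minimal sketch of that calculation follows; the function name, the regex tokeniser, and the three-item word list are assumptions for illustration, and the published lists are organised by lemmas or word families rather than the bare word forms matched here.

```python
import re


def lexical_coverage(text, word_list):
    """Share of tokens in text that appear in word_list (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# Toy academic word list (assumption, not taken from any published list).
toy_list = {"data", "significant", "factor"}
coverage = lexical_coverage("The data suggest a significant factor", toy_list)
print(f"{coverage:.1%}")
```

Computing this figure per talk and per lecture would give two samples of coverage values, which could then be compared statistically across the two corpora.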

15:05
How, When and Where do I hedge? Malaysian Novice and Expert vs. International Scientific Writers

ABSTRACT. Scientific writers are expected to showcase familiarity with the persuasive practices of their disciplines; this includes conveying appropriate attitudes to their readers when framing arguments in the most convincing ways. Writers commonly show their degree of commitment to knowledge claims by using hedges, which subtly mark their stance. This paper examines the use of hedges by Malaysian scientific writers and their international counterparts in published research articles. Our corpus comprises three sub-corpora of Malaysian novice writers, Malaysian expert writers, and international writers from science and engineering fields, amounting to approximately 100,000 words. Using the taxonomy of hedges (Salager-Meyer, 1994), we examined the occurrences of hedging devices across research article segments. The findings revealed that Malaysian writers used hedges differently from their international peers in terms of distribution, occurrences and variability. Specifically, Malaysian writers used hedges more in the introduction segment and less in results and discussion, in contrast to the patterns found in the international sub-corpus. Novice writers used hedges the least across the board. Malaysian writers seemed to have limited resources for hedging, as they employed only modal verbs (could, would) and adverbials (mostly, approximately) and generally lacked linguistic sophistication in their expressions. While hedging is clearly challenging for novice writers, it remains difficult even for expert writers, especially when presenting and promoting their results. As hedging is prominent in English research writing and may be a culturally conditioned maneuver, evidence from corpus studies may help EAL writers learn how to employ various hedging mechanisms effectively.

15:30
Disciplinary Differences in the Use of Evaluative That: Expression of Stance via That-clauses in Business and Medicine

ABSTRACT. A growing body of literature now recognizes the importance of interaction between writers and readers in disciplinary academic texts, as well as how writers evaluate propositional content through grammatical structures. An effective way to evaluate an author’s own findings, the findings of others, methods, and theories is to package propositional information in evaluative that constructions (e.g. ‘the author believes that…’). These provide writers with various options for evaluating a proposition (e.g., content of that-clauses, sources, controlling words) and thematizing the evaluation by signaling either epistemic or attitudinal stance towards the propositional content. While disciplinary differences in the use of this construction have been attested in the literature, there is still a need for research that tests and confirms the nature of such differences across contrasting academic disciplines. Adopting a corpus-based approach, this study therefore investigates expressions of evaluation and stance signaled via that-complement clauses in published research articles across business and medicine disciplines, drawing on Hyland and Tse's (2005) model of evaluative that-clauses. We find that evaluative that-clauses were primarily used by authors in both disciplines to comment on their own and previous findings, with such clauses largely controlled by verbal predicates expressing epistemic assessment towards the evaluated entity. We also show certain disciplinary conventions as embedded in the use of evaluative that-clauses, indicating how discoursal practices of evaluation and interaction as achieved through such clauses are constructed according to the communicative purposes of particular texts and the norms and values of disciplinary communities.

15:55
Fear and Disgust? A sentiment analysis of the dentistry case report genre for English for Specific Academic Purposes.

ABSTRACT. This talk explores a corpus of the case report genre of dentistry writing via a sentiment analysis using the Sentiment Analysis and Cognition Engine (SEANCE, Crossley, Kyle & McNamara, 2017). Sentiment analysis involves the use of natural language processing procedures in determining the attitude, polarity and subjectivity of texts. However, despite this genre being crucial for novice undergraduate dentists, there have been few sentiment analyses conducted with the goal of developing disciplinary writing.

The results show that negative adjectives, positive nouns and verbs, and terms related to sentiments of fear, disgust and well-being are salient in dentistry case reports as compared with research articles. We attribute these results to a focus on patients in the case report genre as opposed to procedures in research articles. This is surprising given that the function of case reports is to report clinical procedures for dissemination among other dentists. The findings also suggest novice case report writers must be made aware of the different senses of the lexical items and be able to recognise and switch between general and disciplinary senses of these items depending on their audience. Conversely, novice writers need to be aware of the disciplinary (rather than general) senses of key terms during disciplinary writing.

The reported findings could not have been gained via a top-down corpus-based analysis alone. Rather, the findings are both corpus-based and corpus-driven, letting the data guide our exploration into the sentiment of lexis within dentistry case reports with as few preconceptions as possible.

14:40-16:20 Session 19B
14:40
Characteristic Features of Inadequate Representations in Translated Public Korean Texts

ABSTRACT. This study investigates English-Korean translated public brochures from Australian government websites, with a particular focus on health information, and analyses inadequate representations in the translations, characterised by inappropriate syntactic, morphological, lexical and pragmatic features. To this end, the study examined more than 10,000 word units from 65 translated texts of health information and identified more than 400 inadequate representations for analysis. Using both quantitative and qualitative methods, the inadequate translations were further analysed and categorised into different patterns and types. Preliminary analysis indicates that most inadequate representations were of a communicative nature, where language features of the source text were represented in the target text in a way that is uncommon or unnatural for the target language. However, there were also some unacceptable translations where the contextual meaning of the original was rendered in such a way that the subject matter is not readily comprehensible or is made even more difficult to understand. While offering easily readable translations, the study attempts to explain the departure from the original by referring to a number of sources, such as incorrect use of words, use of less recognisable terminology, incorrect emphasis, stylistic incompatibility, mistranslation and unacceptable or unnatural construction of the original. The study discusses some crucial implications of these public texts that might affect the Korean-speaking community, the credibility of governmental information and the training standard for professional translators. It concludes with some suggestions regarding other public translation texts, the provision of a scrutiny process and the development of quality training for translators.

15:05
Exploring translation patterns with phrase frames: a corpus-based descriptive study

ABSTRACT. Designed as a proof-of-concept, this descriptive corpus-based study focuses on the construct of phrase frames, defined as a contiguous sequence of n words identical except for one (Fletcher 2002). Although phrase frames were already used as a means of exploring pattern variability across and within different text types or registers written in English (Römer 2010; Gray & Biber 2013; Fuster-Marquez & Pennock-Speck 2015; Grabowski 2015; Forsyth & Grabowski 2015; Cunningham 2017; Jukneviciene & Grabowski 2018; Lu et al. 2018), there has been no attempt so far to employ that construct as a unit of analysis in descriptive research on translation. More precisely, we aim to verify whether phrase frames found in English source-language texts reveal similar generalizable syntagmatic patterns in Polish translations. Also, we aim to investigate whether the observed translation patterns help highlight lexical and stylistic peculiarities of translations which would otherwise be difficult to capture. In this study, we use the English-Polish parallel corpus Paralela (Pęzik 2016), notably its European Parliament proceedings sub-corpus (Koehn 2005), to identify and describe translation patterns that emerge from two functionally-defined English phrase frames expressing attitudinal stance (it is * clear that, it is * difficult to) used as a starting point for the analysis. The findings provided insights into English-to-Polish translation patterns, which revealed that the Polish equivalents are realized with a high degree of regularity and can be generalized into syntagmatic patterns similar to phrase frames. We also obtained valuable cross-linguistic insights into corresponding syntagmatic structures in English and Polish.
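
To make the construct concrete, the sketch below extracts phrase frames from a token sequence: every n-gram is turned into a frame by replacing one word with `*`, and the words that fill the slot are counted per frame. This is only an illustration of the definition quoted in the abstract above, not the authors' tooling; the function name, the restriction to internal slots and the sample sentence are assumptions made for the example.

```python
from collections import defaultdict


def phrase_frames(tokens, n=4):
    """Map each phrase frame (n-gram with one internal slot) to its filler counts."""
    frames = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        # Assumption: only internal positions are opened as variable slots.
        for slot in range(1, n - 1):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


# Invented sample text built around one of the frames named in the abstract.
text = "it is very clear that it is quite clear that it is not clear that"
frames = phrase_frames(text.split(), n=5)

target = ("it", "is", "*", "clear", "that")
print(dict(frames[target]))
```

In a parallel-corpus setting, the sentences matching a frame such as `it is * clear that` would then be looked up together with their aligned translations, which is where the descriptive work of the study begins.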

15:30
CORPUS IN TRANSLATION STUDY

ABSTRACT. Translation is a communication process that involves two different languages. In the communication process, the understanding of the message between the sender and the recipient is the ultimate goal. In translation, that understanding is manifested in the form of meaning represented by units of language. In other words, real translation is the process of communicating the meaning of units of language from one code to another, from one language to another. Therefore, translation can be said to be successful if language users of different codes share an understanding of the meaning or information. A translation corpus is different from the corpus in linguistic studies in general, although in substance it is the same: it contains a collection of written and oral texts from various sources. However, the manifestations are different. In determining the corpus, the researcher should align it with the branch of translation to be studied so that the corpus obtained supports the research objectives. In general, there are three types of translation corpus, namely the comparative corpus, the parallel corpus, and the multilingual corpus. In use, the three corpora differ depending on the objectives to be achieved.

15:55
A Statistical Approach to Evaluate Acceptability in Chinese to English Translation by Students

ABSTRACT. In contrast with the rapid development of translation studies, specific research on translation assessment remains underexplored. Translation assessment is hampered by its subjective nature, resulting in arbitrary standards of assessment. While the approach of judging translations according to personal taste alone may be acceptable in certain situations, it does not meet the needs of translation training and industry settings, where the evaluation is not limited to a simple binary criterion of “good or bad”; more importantly, it should be based on concrete evidence and must provide constructive feedback for further improvement. With the advent of computer technology, corpora have provided an unprecedented opportunity to foster empirical and objective translation studies since the 1990s. Thus, the central purpose of this paper is to apply corpora as a more objective approach to testing the acceptability of student translation in the target context, by comparing corpus-based parameters between the Parallel Corpus of Chinese EFL Learners, a parallel corpus of over two million words of Chinese-English translation data by learners, and the Australia Brown Corpus of one million words of written native English. Corpus tools will be applied to extract the linguistic features of the different corpora, in terms of average word length, lexical variation, lexical density, collocational patterns, and high-frequency words. Then, multi-dimensional comparison will be realized via Principal Component Analysis, which enables a more straightforward display of complicated data. By analyzing the linguistic deviance and summarizing major and frequent errors in student translations, the author will provide some suggestions for future translation training.
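
Three of the parameters named in the abstract above (average word length, lexical variation and lexical density) can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's method: lexical variation is taken here as the type-token ratio, lexical density as the share of tokens that are not on a small hand-made function-word list, and the tokenizer, the list and the sample sentence are invented for the example. Collocational patterns and the Principal Component Analysis step are not covered.

```python
import re

# Hypothetical, deliberately tiny function-word list; a real study would
# rely on part-of-speech tags or a much fuller list.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "is", "are",
    "was", "were", "it", "that", "this", "by", "for", "with",
}


def lexical_profile(text):
    """Return average word length, type-token ratio and lexical density."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        raise ValueError("text contains no alphabetic tokens")
    content_tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        "avg_word_length": sum(len(t) for t in tokens) / total,
        "type_token_ratio": len(set(tokens)) / total,
        "lexical_density": len(content_tokens) / total,
    }


profile = lexical_profile("The cat sat on the mat and the dog sat by the door")
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

Because the type-token ratio falls as texts get longer, a comparison between corpora of very different sizes would need equal-sized samples or a length-normalised variant.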

14:40-16:20 Session 19C
Location: IBK Hall
14:40
A 4-week Intensive Corpus Linguistics Course at an International Program for Non-linguistics Students

ABSTRACT. Corpus linguistics has become an interdisciplinary research field for anyone who is interested in analyzing authentic text data. In response to this interest in corpus linguistics, a 4-week corpus linguistics course was developed as part of an international academic program at a university in South Korea. The participants in the program were 78 students from 21 institutes in 14 countries, and each took one of the 12 courses offered in the program. In the corpus linguistics course, there were 17 students, including 11 undergraduate students and 6 graduate students. The students’ majors varied across the sciences; in other words, no one was majoring in linguistics or any related discipline. Nevertheless, the students were asked to understand the concepts of corpus linguistics and to conduct their own research over four weeks. This presentation will introduce the results of the needs analysis, the teaching materials, the given tasks and students’ outcomes, and the course evaluations from the intensive course for non-linguistics students.

15:05
Use of corpus linguistics for language learning in ESL classrooms in India

ABSTRACT. This paper reports on the use of corpus linguistics for language learning in ESL classrooms in India. A corpus was created from the digital version of folk tales from Punjab translated into English (Steel, 1894). The corpus analysis of the text was conducted to examine the use of adjectives in the text. The methodology involved the Part-of-Speech tagging (TagAnt 1.2.0) of the entire corpus. The POS annotation revealed that 8.6% of the entire text was adjectives, and 10% of all the unique adjectives were compound adjectives. These adjectives include unique words such as wheaten-bread, earring-decked, high-walled among others. Analysis revealed that these compound adjectives described unique attributes of people and objects and were used to convey the meaning of native words that did not exist in English. It can be concluded that corpus linguistic analysis can help in identifying peculiar grammatical features in a text. The analysis also stressed the need for appropriate vocabulary to communicate cultural elements in English. These findings help in developing awareness in ESL/EFL learners about the cultural element in language learning. The use of corpus linguistics for teaching and learning English in India is particularly relevant as English is used as a bridge language between 23 different officially recognized native languages and a few more unofficial dialects. The scope of corpus linguistics to teach ESL learners in India and beyond is highlighted in the study. The study also shows the use of corpus linguistic analysis for intercultural communication in multicultural India and the globalized world.

15:30
Cross-linguistic Distribution of English Modals in TOEFL Essay Writings

ABSTRACT. This study investigates the distributions of modals in the writings of EFL learners from 11 L1 backgrounds and examines how the distributions change depending on the following two factors: (i) L1 of the test takers and (ii) score levels. We explore the recently released TOEFL11 corpus, which contains writing samples from 11 L1s. We classified the writings by the above two criteria and examined how these factors influenced the distribution of modals and the meaning distribution of the most frequently occurring modal, can. Through the investigation, we found (i) that the EFL learners preferred to use modals in the order of can, will, and would; and (ii) that the frequency of can decreased but that of will and would increased as score level went up. The meaning distributions were observed by dividing the meaning of can into three categories (ability, possibility, and permission). We found that (i) ability was the most prevalent meaning in the EFL learners’ writings and that (ii) the possibility meaning increased as the test score level went up. The analyses of the distribution of the modals in 12,100 essay samples contributed by TOEFL test takers from 11 different language backgrounds have enhanced our understanding of which modal auxiliary may serve as a default modal with regard to distribution and meaning, and what effect the score level (as an indirect indicator of proficiency level) may have on the form and meaning distribution of the modals.
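
The kind of count behind such a distributional comparison can be sketched as below: modal frequencies are normalised per 1,000 tokens and grouped by score level. The sketch is an assumption-laden illustration rather than the authors' procedure; the modal list, the `(level, text)` input format and the two toy essays are invented, and matching on surface forms also counts non-modal uses such as the noun "can", which a real study would filter out with part-of-speech tags.

```python
import re
from collections import Counter

# Assumed set of central modal auxiliaries.
MODALS = ("can", "could", "may", "might", "must", "shall", "should", "will", "would")


def modal_rates(essays, per=1000):
    """Return modal frequency per `per` tokens for each score level.

    `essays` is an iterable of (score_level, text) pairs (hypothetical format).
    """
    modal_counts = {}
    token_totals = Counter()
    for level, text in essays:
        tokens = re.findall(r"[a-z']+", text.lower())
        token_totals[level] += len(tokens)
        modal_counts.setdefault(level, Counter()).update(
            t for t in tokens if t in MODALS
        )
    return {
        level: {
            modal: counts[modal] * per / token_totals[level]
            for modal in MODALS
            if counts[modal]
        }
        for level, counts in modal_counts.items()
        if token_totals[level]
    }


# Invented toy essays labelled with a score level.
essays = [
    ("low", "I can swim and I can run"),
    ("high", "It would help and it will work"),
]
rates = modal_rates(essays)
for level, by_modal in rates.items():
    print(level, {m: round(r, 1) for m, r in by_modal.items()})
```

Grouping by L1 instead of score level only requires changing the first element of each pair; classifying the meanings of can (ability, possibility, permission) still needs manual or model-assisted annotation.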

15:55
A corpus-driven research on the acquisition process of Korean conjugation accuracy using Korean Learners’ corpus

ABSTRACT. Korean is an agglutinative language with various endings attached to verb stems. Many learners of Korean produce conjugation errors, as the combination of a stem and its endings can be perceived as a completely new form. Conjugation accuracy is therefore strongly related to language proficiency. This study investigated the acquisition process of Korean conjugation forms using a learners’ corpus. A pseudo-vertical learners’ corpus was constructed from level 1 to level 6, covering five first languages: English, Chinese, Vietnamese, Japanese, and Russian. The corpus (486,890 words) was constructed from the error-annotated Korean Learners’ Corpus collected by the National Institute of Korean Language. I focused on the Korean conjugation rule eun/neun, which requires a very complicated cognitive process, such as considering the part of speech of the stem or how to combine it with other irregular conjugation rules. First, I classified the type of location, such as sentence ending, noun-modifying marker, and connective ending, and the type of error produced, such as non-application of the conjugation rule, incorrectly conjugated form, and substitution errors. Second, to analyze the acquisition process, I compared the frequency and ratio of one word across learners’ proficiency levels. Based on the results, I examined the impact of the first language on the acquisition of the Korean conjugation rule through frequency. Finally, I investigated the differences in the acquisition process according to learners’ first language. The results of the present study provide meaningful implications for teaching the conjugation rule.

14:40-16:20 Session 19D
14:40
Linguistic Etiquette: An Analysis of North Korea’s Public Discourse Employing Corpus Linguistics

ABSTRACT. This presentation will examine a study that had as its research questions how ideology has influenced the language policy of North Korea since the beginning of its nation-state-building project in 1945, as well as how this official language policy has impacted public discourse in North Korea. Political authority in North Korea has long viewed language as an ideological weapon to be used against the enemies of the Korean nation and socialism and as a tool to remold people into patriotic socialists. By investigating various data from North Korea’s state-controlled public discourse (to include mass media, school textbooks, literature, and magazines) between 1991 and 2014, the study examines how official language policy in North Korea impacted propriety, or linguistic etiquette, in language use. This presentation focuses on address-reference terms as part of the linguistic etiquette of language use in North Korea. The divergence in language use between the two Koreas is reflected distinctively in the common address-reference terms employed in those countries since the 1945 division of Korea. The methods of Corpus Linguistics are employed in order to supplement the quantitative dimension of this study. By examining linguistic etiquette in North Korea from a sociolinguistic perspective, this presentation sheds light on the impact of the state ideology on North Korean language use by providing empirical evidence.

15:05
The 2014 Israel-Gaza Conflict: A Corpus-Based Critical Discourse Analysis of the representation of Israel in the Israeli media

ABSTRACT. The Israeli-Palestinian conflict is one of the most intractable conflicts in the Middle East. Revolving around issues of land, borders and rights, it has involved clashes between Israelis and Palestinians in the West Bank and the Gaza Strip. In recent years, the conflict has centered on the Gaza Strip, between Israel and the Palestinian movement Hamas, and has included several rounds of violence.

Societies embroiled in intractable conflicts often use the media to (re)produce and disseminate particular ideologies that, on the one hand, help them handle the conflict and, on the other hand, contribute to the conflict’s intractability (Bar-Tal, 1998). Media, thus, has the power to influence audiences’ perceptions of the conflict’s causes, events, participants, and possibilities of resolution.

This research explores the representation of Israel in the coverage of the 2014 Israel-Gaza conflict in three English-language Israeli news websites. It investigates whether – and, if so, to what extent - the Israeli news outlets converged with/diverged from one another by examining both the language used to report the conflict (using a combination of corpus linguistics and Critical Discourse Analysis) and the news production processes involved in its reporting.

This paper will present the linguistic findings that contribute to the shaping of the conflict’s representation in the Israeli media. More specifically, it will shed light on the degree to which Israeli news outlets “rallied behind the flag” in their coverage of the 2014 Israel-Gaza conflict.

15:30
LINGUISTIC RACISM: NON-INDIGENOUS STUDENTS VS. INDIGENOUS

ABSTRACT. In every country, it is easy to find indigenous groups that are isolated from the majority of the population in terms of language, culture and religion (Iskandarova, Ladygina, Shambezoda, Zolotukhin, & Abdukhamitov, 2017). This fact leads to discrimination in several aspects of life, and language is no exception. According to Skutnabb-Kangas and Dunbar (2010), a large number of educational institutions for ethnic minority students worldwide experience linguistic racism. Hence, it is no surprise that this situation occurs in Australia, where there used to be more than 250 indigenous languages spoken on a daily basis (Walsh, 1993). This paper aims to explore how linguistic discrimination, a violation of basic human rights based on an individual’s use of language, is experienced in the everyday online and offline lives of indigenous and non-indigenous students at a language university in Vietnam. In addition, the research explores whether or not linguicism affects students’ language acquisition.

15:55
Thoughts on Suicide in a Facebook Freedom Wall: A Preliminary Sociolinguistic Study

ABSTRACT. While suicide remains a global issue, research on suicide in the Philippines often focuses on psycho-social, medical, and public policy perspectives. This preliminary study looks at suicide from a sociolinguistic perspective by examining thoughts on suicide that are posted (either in English or Filipino) on a Facebook Freedom Wall. It examines how words and concepts reveal (1) what suicide is, (2) causes of suicidal thoughts, (3) personal relationships, (4) personal situations and experiences, and (5) the affective aspect of the various individuals who posted their thoughts on suicide. This study employed the Linguistic Inquiry and Word Count Dictionary (LIWC) to categorize and analyze the corpus of posted texts (using Standard Linguistic Dimensions, Psychological Processes, Personal Concerns, and Spoken Categories as LIWC parameters), and thematic analysis to further analyze the different themes and sub-topics that are implicit and were not covered by LIWC. Points of discussion will follow concerning (1) suicide as an end and a means of escape/solution, (2) perceived responsibilities towards other people, (3) views on the "self," and (4) emotions and behaviors concerning suicide. In the end, this study may contribute to a further understanding of suicide and of persons having thoughts of suicide.

14:40-16:20 Session 19E
Location: Helinox Hall
14:40
Contrastive Research of Korean-Chinese-Japanese Cultural Image Frame by using Cultural Element Mining System

ABSTRACT. There is no denying that language and culture are very closely related and cannot be separated. Therefore, by supplementing cultural elements, it is possible to enhance the efficiency of language education, especially foreign language education. This paper uses the Korean-Chinese-Japanese Cultural Element Mining System (CEMS) for the purpose of boosting Chinese language education. CEMS is a tool for identifying distinctive cultural elements by comparing different languages through the analysis of word co-occurrence frequencies. Since CEMS was intended from the very first stage of development to be expanded to multiple languages, it is very easy to add another language such as English or Japanese. This paper contrasts the Korean-Chinese-Japanese Cultural Image Frame using the Cultural Element Mining System.

15:05
A Study on the Pinyin Transcription of Erhua

ABSTRACT. Most learners of Chinese in Korea are suddenly exposed to Erhua at the basic conversational stage, but there is little introduction to what Erhua is or explanation of the changes in pronunciation it involves. The sounds that make up Erhua are bound together with the preceding syllable to form a new syllable, during which various phonetic variations appear according to the syllable structure. However, most of the Chinese conversation textbooks currently used in Korea only add “r” at the end of syllables when writing Han-yu pinyin. Since the principle by which Erhua operates within the Chinese phonetic system differs from that of other codas, the current notation alone makes it difficult to understand and learn this pronunciation. It is widely known that different phonetic characteristics of pronunciation appear among native speakers and learners of foreign languages because the mother tongue phonetic system acts as a constraint in learning a foreign language. This study focuses on the particularities of Chinese Erhua and the Han-yu pinyin system. It points out the problems of the current way of learning Erhua by examining, through experimental phonetic analysis, whether the difficulty of learning Erhua is caused by mother tongue transfer or whether the lack of a learning notation also contributes to pronunciation errors, and it explores more effective teaching methods.

15:30
A comparative study of the formulaic language and structure in Chinese and Korean official document text.

ABSTRACT. The official document is a well-organized and highly formalized text type. Its formalized features reflect the social, political, and other forms of consciousness of the country. This consciousness influences the formulaic language, the structural features and the means of expression of the text. In this paper, we analyze the formulaic language and the structural features of Chinese and Korean official document texts, which reflect social and political consciousness. Based on a corpus, we compare the formulaic language and the structural features of Chinese and Korean official document texts and also analyze their functions.

15:55
Big Data Text Mining Analysis of research trends related to Chinese education in Korea: Focusing on research results for 10 years from 2010 to 2019

ABSTRACT. To adapt to the era of the fourth industrial revolution, the Chinese education community in Korea first needs to organize existing knowledge related to Chinese language education and, based on the results, set the direction and specific methods of change and innovation. In this paper, we select academic research from among the various bodies of knowledge accumulated in relation to Chinese language education and examine related research trends comprehensively. To grasp these trends, we utilize big data analysis methods. Specifically, we use big data analysis techniques not attempted in previous research, namely text mining and social network analysis, on records from 2010 to 2019 in RISS (www.riss.kr), the largest academic research information site in Korea. Through this analysis, we will examine the main discourse of each period through keyword extraction over the 10 years of Korean research on Chinese language education and explore the semantic relations among the keywords. Finally, based on the results, we would like to present valid guidelines for the orientation of future Chinese language education research.

14:40-15:30 Session 19F: [Keynote 5] Ute Römer. What can a learner corpus tell us about second language development?

Keynote Session

Location: Grand Ballroom
14:40
What can a learner corpus tell us about second language development?

ABSTRACT. TBA

15:30-16:20 Session 20: [Keynote 6] Heokseung Kwon. 25 years of English corpus linguistics in Korea

Keynote Session

Location: Grand Ballroom
15:30
25 years of English corpus linguistics in Korea

ABSTRACT. TBA

16:30-17:30 Session 21: [Plenary 4] Sylviane Granger. Learner corpus research: we've come a long way but we're not quite there yet

Plenary Session

Location: Grand Ballroom
16:30
Learner corpus research: We’ve come a long way but we’re not quite there yet

ABSTRACT. TBA

17:30-18:30 Session 22: Poster sessions and extended coffee break

Poster Session and Extended Coffee Break

17:30
Creating high-quality bilingual word lists and terms based on same or similar topics in certain languages in Wikipedia

ABSTRACT. We developed a method for creating bilingual word lists and terms from parallel Wikipedia versions. It consists of two steps: the heuristic extraction of parallel sentences from Wikipedia and the construction of robust word vector embeddings for the creation of dictionaries. For the heuristic extraction, we align Wikipedia articles on the same or similar topics and find corresponding sentences and paragraphs, using a combination of similar structures of named entities and vector embeddings. From these heuristic parallel corpora, we construct a combination of task-specific trained vector embeddings and experiment with a novel alignment technique and similarities in the document-term matrices. The methods provide distance measurements of words between the languages and can be used to create translation dictionaries. Additionally, the measurements can be used in combination with other methods, for example as additional information for translation inference across dictionaries. We create a German to Albanian and an English to Albanian dictionary. As the method works without additional labeled information beyond Wikipedia, it can help low-resource languages.

17:30
A corpus-assisted discourse study on transatlantic media coverage of transgenderism

ABSTRACT. A UK-based digital media outlet published an editorial on law in the UK concerning transgender people, though US citizens complained about the language used and the outlet's US-based reporters subsequently published an opinion piece denouncing the editorial in response. This poster summarises a corpus-assisted discourse study that investigated whether the complaints effected a shift in the representation of transgenderism in the outlet's publications. Corpora composed of articles from the UK and US editions covering the time periods before and after the complaints were compiled and compared using both current quantitative techniques and the notion of ‘ethical critique’ in qualitative appraisal of the texts to assess whether there was a subsequent change. This critique was based on the values represented by the editorial that prompted the complaints and also the outlet’s reporters' opinion published in response to them. The study therefore addresses criticism of CDA regarding the analyst’s stance being the only ‘correct’ one by demonstrating how multiple conflicting viewpoints can be taken into consideration in critique of discourse. This has implications for the relevance and validity of corpus linguistics research involving cross-cultural and multilingual data where there may be competing interpretations of data. The poster gives recommendations for handling multiple (even antagonistic) viewpoints in analyses.

17:30
AVOIDING CULTURAL STEREOTYPES VIA THE ANALYSIS OF STUDENTS’ PRESENTATION SPEECH

ABSTRACT. This paper examines the effectiveness of an intervention on students’ language use in their presentations in an intercultural course. The purpose of the intervention is to help students avoid stereotyping when talking about cultures. Data were collected by means of class observation and analysis of students’ presentation speech, both prior to and after the treatment. Results revealed that students’ committing stereotyping was mainly due to their lack of awareness of using absolute language and that the teacher’s intervention of guiding them to use objective language resulted in a positive presentation quality. The findings could be of practical value to any other cultural courses.

17:30
Design and implementation of a sentence generation model using Deep Learning and KSAT corpora for fabricating false candidate answer sentences in the Korean SAT

ABSTRACT. The “fill-in-the-blank” question type in the Korean SAT, which requires test-takers to choose the most appropriate phrase or sentence from five candidate options, demands the highest level of logical reasoning and understanding of passage context from test-takers and question makers alike. In these circumstances, we developed a sentence generation model called LXPERMCGEN (LXPER Multiple Choice GENerative model), which fabricates multiple “false answer sentences” using a language model and Deep Learning. In LXPERMCGEN, a user specifies the range of a blank or blanks and the vocabulary level of the sentences to be generated, and the model takes into account the passage context around the blank(s). LXPERMCGEN has three characteristics. First, it is a bidirectional sentence generation model that automatically considers the passage context around the blank. Second, it can produce a variety of sentences that reflect a chosen vocabulary level. Since LXPERMCGEN learns from LXPER’s own vocabulary corpus, it generates sentences with words whose level a user can pre-define. Third, it produces wrong answer sentences that are grammatically correct but contextually inappropriate. When LXPERMCGEN creates sentences to fit into the blank, it uses two measures, how similar they are to the original sentence and how natural they are, which can be used to produce sentences that are grammatically correct but contextually incongruous and therefore incorrect.

17:30
The Use of Relative Pronouns in the Inner Circle English and the Expanding Circle English

ABSTRACT. There has been a steadily growing body of research on English relative pronouns (RP) and clauses in various linguistic fields. In terms of EFL or ESL pedagogy, most studies have focused on EFL or ESL learners’ grammatical errors found in English relative clauses (RC). While it is noteworthy that these studies have contributed to our understanding of typical grammatical errors that an English learner produces, there are few studies that compare the differences between the Inner Circle English and the Expanding Circle English in terms of English RP alternation. This study examines the difference in English RP alternation quantitatively and qualitatively between these circles by comparing ICE-GB and the YELC (Yonsei English Learner Corpus). This research reveals that there are some significant differences between these Englishes in terms of the following aspects: semantic and grammatical types of antecedents, distribution of each RP and its usage, extraposition, a grammatical category of the subject of an RC, accessibility hierarchy, restrictiveness, RC length, etc. These findings show how L2 English (by Korean learners) is different from L1 English. That is, Korean learner English prefers short subject RCs with a limited number of antecedent types and other peculiarities. It implies that ESL and EFL English also follow a typical acquisition process that a native speaker encounters while showing some uniqueness due to differences in terms of learning environment, methodology, English language learning time, and grammatical focus, thus resulting in distinctive variation among Englishes.

17:30
Lexical bundles in different academic disciplines: a corpus-based study

ABSTRACT. The use of lexical bundles in four different academic disciplines is studied. The disciplines of interest are linguistics, management, engineering and microbiology, because they are thought to represent the academic fields of arts and humanities, social sciences, physical sciences and life sciences, respectively. At least 500,000 tokens are collected per discipline to make up a corpus of approximately 2.5 million words. Research articles from renowned international academic journals in each discipline were collected to form the corpus used in the study. Two features are mainly analyzed in the study, the first being the structure of lexical bundles, following Biber's (1999) taxonomy. Next, the function of these bundles is studied according to Hyland's (2008a) categorization. By studying how lexical bundles are used differently in different academic disciplines, the current research is expected to aid pedagogically in discovering disciplinary specificity, since it sheds light on some lexical conventions that the members of the disciplinary community find convincing and familiar.

17:30
For Poster Session (Inside the Hermit Kingdom! English Perspective through North Korean English textbooks)

ABSTRACT. This poster looks at the status of North Korean English vocabulary education by investigating phrasal verbs in textbooks published in 2008. Attendees will be guided through the major problems that have arisen from the teaching and learning of phrasal verbs in North Korea and will be introduced to the current obstacles and the future direction of further research for a future unified Korea. The discussion aims to provide attendees with more information about North Korean English vocabulary education and to help estimate the current English level of those who learned English using textbooks published in 2008. Through this presentation, attendees will gain a deeper understanding of the status of many North Korean defectors and of the future direction of their English vocabulary education. Growing awareness of how to deal with, and reduce, the differences in English education between South and North Korea will help in understanding North Korea.

17:30
Parallel corpus in teaching conversational skills in Czech as a foreign language

ABSTRACT. The goal of the present paper is to present the possibilities of using a parallel corpus in teaching conversational skills in Czech as a foreign language. As a basis for the study, the InterCorp parallel corpus (http://ucnk.korpus.cz/intercorp/) was used. InterCorp release 11 collects original texts and their translations, with a total exceeding 1.5 billion words in 40 different languages. It consists of two parts: "core" with manually aligned texts, and "collections" with automatic alignment. In this study, I focus only on the data from collections including film subtitles from the Open Subtitles database (https://www.opensubtitles.org/). Although film subtitles are in fact written texts, they provide a good example of spoken communication. Unfortunately, it is not possible to include spoken data in the form of spontaneous dialogues in a parallel corpus; film subtitles therefore seem to be an appropriate source of spoken language within a parallel corpus. Hence, they could be used in teaching conversational skills with emphasis on vocabulary typical of the spoken register. Thanks to the parallel concordance, advanced students have the opportunity to observe typical spoken vocabulary in Czech such as chlap ‘guy’, viď ‘right’, hele ‘look’, jo ‘yeah’ etc. and find their equivalents in English (or any other language available in the corpus). Students are exposed to the language and acquire new material much faster than when working with artificially prepared texts in a course book. Appropriate corpus exercises, which will be introduced in this paper, present the whole procedure of using corpus methods in enriching conversational skills in Czech.

17:30
A Corpus-based Contrastive Study of Relative Clauses among Spanish, English and Chinese

ABSTRACT. This study focuses on the word order of relative clauses to determine the similarity and distance among Spanish, English and Chinese by comparing the target linguistic structures in the corpora of these languages and parallel corpora. This topic, rarely studied previously in the field, is a subset of a larger project on the influences of cross-language transfer on L3 Spanish acquisition of relative clauses by Chinese-speaking learners in Taiwan, whose first foreign language is English (L2) and second foreign language is Spanish (L3). To obtain systematic results for language similarity and distance, a cross-language contrastive analysis of the structure of trilingual expressions with equivalent semantics was conducted, based on the “Trilingual Parallel Corpus of Spanish, English and Chinese”. For the analysis of monolingual corpus material, three corpora were used: the Spanish corpus “Corpus de Español”, the English corpus “Corpus of Contemporary American English”, and the Chinese corpus “Academia Sinica Balanced Corpus of Modern Chinese”. As a result of this study, a new computer program, Language Similarity Tool, was developed to facilitate the corpus-based calculation of linguistic similarity or distance between different languages. The results of both parallel and monolingual corpus-based analyses show that the language distance between Spanish and Chinese was greater than the distances between Spanish and English and between English and Chinese. The results of the study could provide some linguistic insights for examining the acquisition of L3 Spanish relative clauses by Taiwanese learners whose native language is Chinese and first foreign language is English.

17:30
Investigating Lexical Bundles in EFL Learners’ academic writing

ABSTRACT. This study investigates the use of four-word lexical bundles in postgraduate EFL learners’ academic writing. The lexical bundles identified in the EFLL corpus were compared with lexical bundles identified in ELT journal articles and with bundles identified in previous studies. The aim is to establish whether postgraduate EFL learners use lexical bundles in the same way as experts do. A corpus-based methodology was used to identify frequent lexical bundles of different lengths at various frequency cut-off points (Biber et al., 1999). The results indicated that EFL learners used more varied four-word lexical bundles in their academic writing, and used them more frequently, than ELTJ writers. The structural analysis showed that both the EFL learners and the ELTJ writers used a variety of structures to form the lexical bundles in their writing. Moreover, it was found that the EFLL corpus contains more NP-based and VP-based structures than the ELTJ corpus, whereas the ELTJ writers used PP-based structures more frequently than EFL learners. The functional analysis then revealed that EFL learners used fewer referential bundles than ELTJ writers. However, stance bundles and discourse organisers were used slightly more by the EFL learners than by the ELTJ writers. The findings parallel those of other studies, which found that non-native speakers produced a larger number of lexical bundles than native speakers.
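To make the idea of identifying four-word bundles at a frequency cut-off concrete, the following is a minimal sketch in Python. It is not the authors' procedure: the function name, the whitespace tokenisation, and the default thresholds (a normalised frequency per million words and a minimum number of texts) are all assumptions chosen for illustration, since the abstract does not state its cut-off values.

```python
from collections import Counter


def four_word_bundles(texts, min_per_million=20, min_texts=2):
    """Return four-word sequences that meet a normalised frequency
    cut-off and occur in at least `min_texts` different texts.

    Both thresholds are illustrative assumptions, not values taken
    from the abstract.
    """
    counts = Counter()      # raw frequency of each four-word sequence
    text_range = Counter()  # number of texts each sequence occurs in
    total_tokens = 0

    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenisation
        total_tokens += len(tokens)
        grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
        counts.update(grams)
        text_range.update(set(grams))

    bundles = {}
    for gram, n in counts.items():
        per_million = n / total_tokens * 1_000_000
        if per_million >= min_per_million and text_range[gram] >= min_texts:
            bundles[" ".join(gram)] = n
    return bundles
```

Running the same function over a learner corpus and an expert corpus would give two bundle lists that can then be compared by type count, structure, and function. A real study would normally use a proper tokeniser and corpus software rather than this simplified version.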

17:30
MMDA: An interactive Working Environment for Corpus-based Discourse Analysis

ABSTRACT. The research community has long agreed that corpus linguistics and critical discourse analysis are a very fruitful combination, yielding corpus-based discourse analysis (CDA). The important steps of CDA are (1) the definition of topic nodes, i.e. writing a corpus query to capture the linguistic realization of the topic, (2) keyword and collocation analyses, including the identification of lexical items that form discourse themes. This process is iterative: the quantitative analyses have to be repeated several times during the creation of discourse themes.

The existing selection of general-purpose corpus working environments, such as CQPweb, falls short of the expectations of hermeneutic interpreters. Looking at keywords or collocates sorted purely by association strength, without the possibility of manually grouping or annotating them, hinders the workflow considerably; we clearly need “new types of algorithmic and computational tools for reading texts”.

We present an integrated toolkit for CDA entitled MMDA (short for “Mixed-Methods Discourse Analysis”), which comprises functionalities such as: (1) visualization of the semantic field of a topic node or a keyword analysis in terms of association strength and semantic similarity, (2) interactive grouping of lexical items into “discoursemes” alongside methods for combining discoursemes into discourse themes and discursive positions, (3) higher-order collocates of discursive positions.

The toolkit has already been beta-tested by several hermeneutic researchers. Its implementation is in Python and Javascript, using the IMS Open Corpus Workbench for query processing and concordancing. Open source release of our software will be available by the time of APCLC 2020.
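As an illustration of what ranking collocates by association strength involves, here is a minimal Python sketch. It is not part of MMDA and does not use the IMS Open Corpus Workbench; the abstract does not name an association measure, so the choice of a mutual information score (log2 of observed over expected co-occurrence, with the expected value corrected for window size) is an assumption, as are the function name and parameters.

```python
import math
from collections import Counter


def collocates_by_mi(tokens, node, window=2, min_cooc=1):
    """Rank words co-occurring with `node` within +/- `window` tokens
    by a mutual information score.

    The measure and its window-size correction are one common
    formulation, assumed here for illustration only.
    """
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()

    # Count every token that falls inside the window around each node hit.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1

    scores = {}
    for word, observed in cooc.items():
        if observed < min_cooc:
            continue
        # Expected co-occurrences if node and word were independent.
        expected = freq[node] * freq[word] * 2 * window / n
        scores[word] = math.log2(observed / expected)

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A flat ranked list like this is the kind of output the abstract describes as insufficient on its own, which is why the toolkit adds interactive grouping of items into discoursemes on top of it.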

17:30
The effect of word final stops’ voicing on the vowel duration and its relation with Korean speakers' English proficiency

ABSTRACT. It is well known that vowels preceding voiced stops are longer in duration than vowels preceding voiceless stops in English. On this basis, this research investigates whether there is any relation between Koreans' English proficiency and the ratio of the duration of vowels before voiced stops to that of vowels before voiceless stops.

For analysis, the Korean-Spoken English Corpus (K-SEC) and the rated K-SEC were used, in which 32 Korean elementary school students from Seoul and Gyeonggi province produced target items such as bag/back and cab/cap. These speakers were also categorized into one of three proficiency groups according to their speaking ability.

The findings showed that native English speakers (elementary school students) produced vowels before voiced stops that were 1.421 times as long as vowels before voiceless stops. Similarly, the Korean speakers produced vowels preceding voiced stops that were longer than vowels preceding voiceless stops in English.

However, the study failed to discover any possible relationship between the speakers' English proficiency and the ratio of the vowels preceding the word-final voiced stops to the vowels preceding the word-final voiceless stops in English.

Therefore, this study reached the conclusion that it is hard to predict the level of English proficiency of Korean learners based on how strongly the word-final stops' voicing affects the duration of the vowels preceding the word-final stops in English.
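The ratio at the centre of this abstract can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the record layout (keys `group`, `voiced_ms`, `voiceless_ms`), and the choice to average per-speaker ratios within each proficiency group are assumptions, not the authors' analysis procedure.

```python
from statistics import mean


def voicing_ratio(voiced_ms, voiceless_ms):
    """Ratio of mean vowel duration before voiced stops to mean vowel
    duration before voiceless stops (durations in milliseconds)."""
    return mean(voiced_ms) / mean(voiceless_ms)


def ratio_by_group(speakers):
    """Average the per-speaker ratio within each proficiency group.

    `speakers` is a list of dicts with the hypothetical keys
    'group', 'voiced_ms' and 'voiceless_ms'.
    """
    groups = {}
    for speaker in speakers:
        ratio = voicing_ratio(speaker["voiced_ms"], speaker["voiceless_ms"])
        groups.setdefault(speaker["group"], []).append(ratio)
    return {group: mean(ratios) for group, ratios in groups.items()}
```

A ratio above 1 means vowels are longer before voiced stops, as with the 1.421 reported for native speakers. If the group means returned by `ratio_by_group` do not differ systematically across proficiency levels, that corresponds to the absence of a relationship reported above.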

17:30
"The relation between the vowel duration and intensity in English NPs [an article - a monosyllabic noun] and the English proficiency of Korean learners

ABSTRACT. Generally, the vowels in content words are phonetically more prominent than those in function words in English. This is because they carry stress, a suprasegmental feature, which makes them longer in duration and greater in intensity. Based on this phonetic phenomenon, this research investigates the relation between the vowel duration and intensity in English NPs [an article - a monosyllabic noun] and the English proficiency of Korean learners. For analysis, the researchers used the Rated Korean-Spoken English Corpus (made in 2017) to collect and measure the duration and intensity of each vowel in the monosyllabic noun phrases of the 853 sentences articulated by Korean speakers, who were categorized into one of four English proficiency groups based on their speaking ability (Advanced, Intermediate-High, Intermediate-Low, and Novice). The findings revealed that the higher the proficiency level, the greater the ratio of the vowel duration in the nouns (content words) to the vowel duration in the articles (function words), meaning that the higher the English proficiency level, the better the Korean speakers articulate words with proper stress, as native speakers do. However, the ratio of the vowel intensity in the nouns to the vowel intensity in the articles is less closely related to Korean speakers' English proficiency.

17:30
Using the USAS Semantic Tagset to explore persuasive language in Jeremy Taylor’s Holy Living and Holy Dying, 1650-1651

ABSTRACT. This poster reports on the initial stages of a study on persuasive language in two texts, Holy Living and Holy Dying, written by Jeremy Taylor in 1650 and 1651. The purpose of the study is two-fold: 1) to explore the persuasive language Taylor uses in his writing; and 2) to explore the usefulness of semantic tagging in this type of investigation. The corpus, consisting of the two Taylor texts (226,035 tokens), was first tagged with the USAS Semantic Tagset using the Wmatrix interface. Several persuasion techniques (such as “emotional appeal”, “attack”, “inclusive/exclusive language”) were identified from the political and religious persuasive discourse research literature. Semantic tags were then selected from the USAS tagset that related to each of these persuasion techniques. When exploring “emotional appeal”, for example, the “E” tag (Emotion) was used as a search item. A list of key concepts (semantic keyness) was also generated using the Wmatrix interface and concordance lines were consulted for finer detail on the nature of the persuasive technique being used. It was found that Taylor seemed to prefer negative persuasion techniques, such as appeals to fear and sadness in his writing. This is illustrated well by four out of the top five most frequent Emotion-related items being E4.1- (repentance), E5- (fear), E4.1- (sorrow), and E4.1- (sad). By working with a general-to-specific approach — that is, from persuasion technique, to general semantic concept category, to specific lexical item — elements of persuasion in the text could be readily identified.

17:30
Connection point between grammar and media: On the usage of Japanese ending particles in LINE, Feature Phone e-mail and Face to face conversations

ABSTRACT. In the last two decades, rapid advances in IT have steadily affected Japanese language usage, and the connection between language and each medium has attracted attention. In this presentation, we focus on the grammatical function of words and its connection with communication in each medium, from the viewpoint of the usage of Japanese ending particles in LINE, Feature Phone e-mail and face-to-face conversations. In particular, we show the following two points: 1) The frequency of ending particles per turn in Feature Phone e-mail is twice that in LINE and face-to-face conversations. 2) The tendency described in 1) is related to the length of the turn in each medium. We also take up the usage of "Yo", "Ne" and "Na" (whose frequency fluctuates across media) and divide these usages into three categories ("First Pair Part (FPP)" usage, "Second Pair Part (SPP)" usage and "Others" usage). We investigated the ratio of each category and found the following: 3) The frequency of “Yo” and “Ne” in each medium can be explained by the increase or decrease of synchronized utterances, which depends on the length of the turns. 4) "Ne" and "Na" have almost the same grammatical functions, but only "Na" can appear as a delayed reaction when it is used as "SPP". Therefore the frequency of "Ne" and "Na" differs across media.

17:30
Machine Translation and Accuracy of Tense-Aspect forms: A Study of Compound-Sentences translated from Japanese to Brazilian Portuguese

ABSTRACT. The Japanese language has basically 4 temporal/aspectual forms (ru, teiru, ta, teita), and those forms are used to express several temporal/aspectual meanings. Also, in compound-sentences, the temporal/aspectual meaning of a certain form might change, depending on the combination of forms used in the main clause and the subordinate clause. On the other hand, Brazilian Portuguese has a bigger variety of tense-aspect forms and the temporal interpretation of a form does not vary as much as in Japanese. In the context of machine translation, the above-mentioned difference of tense-aspect forms might cause interpretation problems that lead to a mistranslation of these forms. This study aims to verify the level of accuracy of the translation tool Google Translate® in terms of tense-aspect forms in compound-sentences from Japanese to Brazilian Portuguese. Aiming to also evaluate the level of accuracy of the translation in terms of temporal relation between the clauses, the object of the study was limited to toki-clauses (a type of temporal compound sentences). 573 sentences were selected from the Corpus of Spoken Japanese (CSJ), and translated to Brazilian Portuguese using Google Translate. The translated sentences were analyzed using the criteria of accuracy in terms of Tense and Aspect. The analysis result showed that 60% of the sentences’ tense-aspect forms were accurately translated, but the accuracy level changes depending on the combination of tense-aspect forms of the main clause and the subordinate clause. Also, among the categories of Tense and Aspect, the latter was the most problematic category to translate.

17:30
A corpus-based analysis of singular versus plural "none"

ABSTRACT. This paper examines whether the number of the indefinite pronoun "none" follows the prescriptive rules, which either limit its use to the singular or leave the number determined by the referent or the user's desired effect. A set of corpus data--the Michigan Corpus of Academic Spoken English (MICASE) and the Michigan Corpus of Upper-level Student Papers (MICUSP)--were analyzed to reveal a glimpse of the descriptive rules that underlie people's actual use of the word in modern academic English. The analysis of the MICASE and MICUSP search results did not confirm the prescriptive rules. In the "none + verb" construction, it was shown to be used according to the agreement rule that tightly regulates other indefinite pronouns such as "most" or "some," taking the same number verb as the referent. When "none" is used as part of the subjective "none of + singular (pro)noun" phrase, its number is determined by the number of the following (pro)noun--the singular. In the case of the "none of + plural (pro)noun + verb" construction, the verb forms are mostly determined by the number of the following (pro)noun--the plural--which seems to reflect people's belief that the verb should agree with the closer of the two nouns. In 81% of the MICASE and 78.5% of the MICUSP data, the verbs were found to take the form of the noun closest to the verb, indicating that the principle of proximity applies to the construction.

17:30
The Effect of Indirect Corrective Feedback on the Writing Performance of Senior High School Students in Ansano Memorial National High School

ABSTRACT.

Sittie Jannah A. Macaurog, LPT, MAELT

The study, entitled The Effect of Indirect Corrective Feedback on the Writing Performance of Senior High School Students in Ansano Memorial National High School, aimed to determine the following: 1) the pretest and post-test mean scores of the participants in the control and experimental groups, their mean gain scores and significant differences; 2) the writing performance of the participants as shown in the ESL Composition Rubric after the intervention; 3) the feedback and preferences of the participants on Indirect Corrective Feedback. The study used a quasi-experimental design to identify differences in writing performance between the two groups. Forty students were divided into two homogeneous groups (control and experimental) according to their first grading grade in their subject Reading and Writing. The pretest findings showed that the level of writing performance of the two groups was qualitatively described as fair to poor. The post-test results showed that participants in the control group attained a fair to poor level of writing performance, while the experimental group attained a good to average level. There was also a significant difference between the mean gain scores of the two groups. The experimental group improved in writing performance, specifically in the components of the ESL Composition Rubric rated in the pretest and post-test, and obtained higher scores in all of these components. Indirect Corrective Feedback is effective and should be applied in Senior High School’s Reading and Writing Course.