
09:00-10:00 Session PL3: Plenary 3: Stefan Evert

Plenary speech 3:

Distributional Methods in Corpus Linguistics: Towards a Hermeneutic Cyborg

Stefan Evert (Friedrich-Alexander-Universität, Germany)


Location: Conference Hall
10:00-10:20 Coffee Break

Entrance Lobby of Kagawa International Conference Hall, Tower building 6F, Takamatsu Symbol Tower

10:20-12:20 Session 3A
Identifying the vocabulary size for Korean EFL students in linguistics

ABSTRACT. It is essential for an EFL college student to acquire adequate knowledge of English lexis when English-medium textbooks are utilized in university classes. Hence, developing ways to help students enhance their proficiency in English vocabulary has emerged as one of the main concerns in English education. This study is an attempt to aid Korean EFL linguistics majors in attaining the lexical knowledge of English necessary to comprehend the core textbooks in their curriculum. Specifically, this research was guided by two research goals: (1) to examine the vocabulary demands of EFL linguistics majors; and (2) to create a Linguistics English Word List (LEWL). For the current study, a corpus of English linguistics textbooks was constructed, the Corpus of Linguistics Textbooks (CLT), which consists of c. 730,000 running words from five textbooks in general linguistics taken from an ebook database. The LEWL is then derived from the CLT by means of a methodology modified from Hsu (2014). The input data were processed to guarantee the optimal relevance of the lexis to the research goals, so that the words are both “common enough” and “worthwhile to learn” (Hsu 2014: 60). The results of the analysis demonstrate that knowing the most frequent 6000 words plus proper nouns, abbreviations, interjections and transparent compounds would provide 95% lexical coverage of a linguistics textbook, enough to ensure adequate reading comprehension. Beyond the first 2000 words, the most frequently occurring word families in the corpus will ultimately be identified, which account for c. 95% of the total words in the linguistics textbooks. This study hopes to provide a way of assisting EFL college students in enriching the lexical competence needed to grasp their main textbooks. It also aims to give some direction to ESP instructors preparing teaching materials. EFL students could benefit more from focusing on words of immediate need and relevance.
References Hsu, Wenhua. 2014. Measuring the vocabulary load of engineering textbooks for EFL undergraduates. English for Specific Purposes 33, 54-65.
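
As a rough illustration of the lexical coverage figure discussed in this abstract, the sketch below computes the share of running words accounted for by the N most frequent word types in a token list. The function name coverage_of_top_n and the toy token list are assumptions made for illustration; the study itself counts word families and treats proper nouns, abbreviations and compounds separately, which this sketch does not attempt.

```python
from collections import Counter

def coverage_of_top_n(tokens, n):
    """Return the fraction of running words covered by the n most frequent types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / total

# Toy data, not taken from the CLT corpus.
toy_tokens = "the phoneme is the smallest unit the morpheme is a unit".split()
print(coverage_of_top_n(toy_tokens, 2))
```

On a real corpus one would raise n step by step until the returned value reaches the target coverage, for example 0.95.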

Distinguishing L1 and L2 Using Three Linguistic Aspects: A Logistic Regression Model Study

ABSTRACT. The linguistic features that distinguish learners' writing from that of native speakers in English are investigated, providing insights into the critical differences between L1 and L2.

Crossley and McNamara (2009) used 10 lexical features to classify L1 and L2 texts, reporting 79% accuracy. Sugiura, Abe, and Nishimura (2017) expanded on this study by adding syntactic features. They reported an accuracy rate of 93.5% with just two variables: the mean length of T-unit (MLT) and D. Sugiura, Abe, and Nishimura (to appear) refined the variable selection process and achieved 96% accuracy with the MLT and the measure of textual lexical diversity (MTLD) by quadratic discriminant analysis.

In the current study, logistic regression models, selected based on the Akaike information criterion, were used to distinguish L1 and L2 texts using two different corpora, both of which include L1 and L2 data: the Nagoya Interlanguage Corpus of English Reborn (NICER) and the International Corpus Network of Asian Learners of English (ICNALE). NICER is comprised of expository essays on the writer's chosen topic, while ICNALE includes argumentative essays on specific prompts.

The model derived from NICER contained three variables: mean length of sentence (MLS), average number of types of 10 random 50-word samples (ndwerz), and a verb variation measure computed from verb types and tokens (cvv1). Two additional variables, mean length of clause (MLC) and average number of types of 10 random 50-word sequences (ndwesz), were included in the ICNALE model as follows:

NICER: 0.43*MLS + 1.02*ndwerz + 4.01*cvv1 − 66.78
ICNALE: 0.59*MLS + 0.24*MLC + 0.31*ndwerz + 0.41*ndwesz + 1.23*cvv1 − 45.88

Each model achieved high accuracy rates (99% and 95%, respectively). The difference between the models may be attributed to the contrasting essay types. However, the model variables suggest three linguistic aspects are crucial in L1/L2 distinction: syntactic unit length (MLS/MLC), lexical variety (ndwerz/ndwesz), and verb variation (cvv1).
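
To make the two models concrete, the sketch below evaluates each linear predictor and passes it through the logistic function. The coefficients are those reported in the abstract. The function names, the feature values for the example essay, and the reading of the output as a probability for one of the two classes are assumptions, since the abstract does not state which class is coded as 1.

```python
import math

# Coefficients as reported in the abstract (intercept stored under "const").
NICER = {"MLS": 0.43, "ndwerz": 1.02, "cvv1": 4.01, "const": -66.78}
ICNALE = {"MLS": 0.59, "MLC": 0.24, "ndwerz": 0.31, "ndwesz": 0.41,
          "cvv1": 1.23, "const": -45.88}

def linear_predictor(model, features):
    """Sum coefficient * feature value over the model's variables, plus the intercept."""
    return model["const"] + sum(
        coef * features[name] for name, coef in model.items() if name != "const"
    )

def logistic(z):
    """Map a linear predictor to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature values for a single essay (not from NICER or ICNALE).
essay = {"MLS": 18.0, "ndwerz": 38.0, "cvv1": 5.0}
z = linear_predictor(NICER, essay)
print(z, logistic(z))
```

A value above 0.5 would assign the essay to whichever class the fitted model codes as 1; that coding would have to be taken from the full paper.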


Crossley, S. A. & McNamara, D. S. (2009). Computational assessment of lexical differences in second language writing. Journal of Second Language Writing, 17(2), 119-135.
Sugiura, M., Abe, D., & Nishimura, Y. (2017). What Kind of Linguistic Features Distinguish Second Language Learners’ Texts from Those of Native Speakers, and Why? LCR2017.
Sugiura, M., Abe, D., & Nishimura, Y. (to appear). Linguistic Traces of L2 Processing Difficulties Found in Learner Corpus Data. ICAME39.

S-genitives and Of-genitives Seen in L2 English Learners’ Essays: A Study Based on the ICNALE Written Essays

ABSTRACT. S-genitives (e.g., Mike’s book) (SGs) are acquired at a relatively early stage by L2 learners (Dulay and Burt, 1973; Dulay and Burt, 1974; Bailey, Madden, and Krashen, 1974). However, quite a few learners do not understand the usage of SGs, and they often have problems choosing between SGs and Of-genitives (e.g., the book of Mike) (OGs). Biber, Johansson, Leech, Conrad, and Finegan (1999) analyzed an L1 English corpus and concluded that (1) SGs occur less often than OGs (approx. 25% of occurrences of OGs), (2) SGs tend to occur in news and fiction rather than academic prose, (3) SGs are chosen when they have a subject status and refer to old information, and (4) the dependent nouns of SGs are often single words referring to human beings. However, it is not necessarily clear whether these patterns are equally seen in L1 and L2 English essays. Therefore, using the ICNALE (Ishikawa, 2013), we analyzed topic-controlled essays by L2 English learners at different proficiency levels as well as English native speakers. The preliminary data analysis has shown that the frequency of SGs is lower than that of OGs in both L1 English (9.3%) and L2 English (7.2%). However, discrepancies are also observed. In L1 essays, the dependent nouns of SGs refer to people (85.7%), time (9.5%), concepts (3.2%), and places (1.6%), while in L2 essays they refer to people (68.1%), institutes (16.5%), concepts (8.8%), time (4.4%), and places (2.2%). Learners do not seem to fully understand the close link between SGs and human dependent nouns. The analysis has also shown that learners have a clear tendency to use SGs in much simpler forms than native speakers.

A Japanese/English DDL Tool for Primary School CEFR Pre-A1 EFL Learners

ABSTRACT. The efficacy and effects of DDL (Data-Driven Learning) in an EFL pedagogical context have been widely verified in recent years (Boulton & Cobb, 2017). In our presentation, we will report on and demonstrate a newly developed web-based DDL corpus and concordance tool. This freeware is scheduled to be available this fall. One of the unique features of this tool is that its target learners are Japanese primary school students who have very limited English vocabulary and grammar knowledge. Generally, these learners are below the CEFR A1 level; recently, this level was termed Pre-A1. This website was developed because our previous studies have shown DDL can be an effective methodology for Pre-A1 level students and is especially effective at enhancing the ability to notice language behavior underlying concordance lines. Furthermore, there are few useful, level-appropriate corpora available for introductory level learners. Therefore, in this study we developed a 25-million-word corpus of language data from introductory level English material such as graded readers, children’s stories from native speakers of English, authorized English textbooks from East Asian countries, and similar sources, and created copyright-free sentences based on the vocabulary and grammatical structures. To use this corpus, we developed a DDL tool for Pre-A1 learners. This tool provides concordance lines that are short and simple with easy words and contexts. The English concordance lines are shown with corresponding Japanese translations in the form of a parallel corpus. In the presentation, we will demonstrate how Pre-A1 students and non-native English instructors can use this DDL tool. 

10:20-12:20 Session 3B
A corpus-based analysis of the use of subordinators by Japanese learners of English

ABSTRACT. Subordinators such as conjunctions, adverbial connectors, and relative pronouns play important roles in combining different clause elements into complex sentences. Accurate use of complex sentences involving subordinators helps provide contexts in detail and give favorable impressions to readers. However, some researchers have claimed that Japanese English as a foreign language (EFL) learners do not acquire sufficient knowledge of the appropriate use of conjunctions in academic contexts even though they have studied these subordinators earlier. The typical case is their overuse of simple subordinators like if or when, placed in front of short simple main clauses, and learners do not seem to make further progress toward the use of subordinators in more complex, post-verbal/nominal positions. In order to investigate the distinctive use of subordinators by Japanese EFL learners, this paper examined the relationship between the length of main/subordinate clauses, the frequencies of subordinate conjunctions and relative pronouns, and the positions of these conjunctions/pronouns. The sample complex sentences were categorized as follows: adverbial clauses (if, when, because), nominal clauses (that, how, what), and relative clauses (that, which, who). In this study, the International Corpus Network of Asian Learners of English (ICNALE) Written Essays Version 2.3 was used, which contains approximately 1.32 million tokens of English essays produced by 2,800 Asian L2 English learners, as well as a corpus of L1 English native speakers. The findings show that Japanese EFL learners overused the subordinating conjunctions if and when, with significantly shorter main/subordinate clauses in comparison with learners in other Asian countries. The results imply that Japanese EFL learners' knowledge of other subordinators is quite limited and the same thing can be said about their use of nominal or verbal modifiers.
It is suggested that L2 instructors should show learners how to form longer and complex sentences using a variety of subordinators.

Analysis of Associative Information for Second Language Learning of Japanese

ABSTRACT. Natural language processing (NLP) technology has recently been applied to support systems and teaching materials for learning Japanese as a second language. Research on Japanese language learning using language resources and techniques related to NLP is divided into the following two categories: “dictionary/corpus” and “learning-support system”. The corpora and learning-support systems contain a large amount of data and are useful; however, compared with other major languages, the development of dictionaries, corpora and teaching materials based on the results of language and education research on Japanese is not sufficient.

To improve learning efficiency with a different approach than before, we aim to enable second language learners of Japanese to acquire vocabulary along with associative information of Japanese native speakers by using the Associative Concept Dictionary for verbs (Verb-ACD). This dictionary, which we previously constructed, consists of data from large-scale association experiments, so the word relations obtained from it differ from those found in corpora based on newspaper or web texts.

In this study, we compared the associative information in Verb-ACD with co-occurrence information in newspaper corpora as the first step toward our goal. We extracted associated words (nouns) of all stimulus words (verbs) from Verb-ACD and co-occurring words (nouns) with the same verbs in each sentence from the corpora. To analyze the differences between them, we calculated Spearman’s rank correlation coefficient over the ranks of the nouns shared by both sources. Moreover, by using word2vec, we obtained word vector representations from the association experiment data and from the newspaper corpora, and compared the verbs most similar to each verb across the two sources.
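
The rank comparison described above can be sketched as follows. This is a minimal pure-Python version of Spearman's rank correlation (Pearson correlation on average ranks), not the authors' own code; the function names and the frequency values for the five shared nouns are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical frequencies for five nouns linked to the same verb.
association_freq = [30, 22, 15, 9, 4]     # from an association experiment
cooccurrence_freq = [5, 40, 12, 33, 1]    # from a newspaper corpus
print(spearman(association_freq, cooccurrence_freq))
```

A coefficient near 1 would mean that the two sources rank the shared nouns in much the same order; values near 0 indicate that association and co-occurrence diverge.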

As a result, we confirmed that association and co-occurrence differ for both verb-noun and verb-verb relations. Therefore, we conclude that applying Verb-ACD to second language learning of Japanese can give learners information different from that obtained from corpora.

Politeness and consensus building strategies in a task-based corpus of English learners

ABSTRACT. This study reports on some of the findings of a preliminary study in which we examine an interaction among four advanced learners of English. The data derive from a larger project for constructing a corpus of speakers of English as a second language who engage in LEGO Serious Play (Tanimura et al., 2016; cf. Bjørndahl et al., 2015). The Play is designed for the purposes of social welfare (Gauntlett, 2007) and we apply it to an educational setting. In the task, the participants put together LEGO blocks in a way that represents abstract concepts such as “Knowledge” or “Collaboration”. In the process, they first work collaboratively and then someone in the group explains the end products. By focusing on the collaborative part, we show how two proficient speakers dominate the interaction while the rest only signal minimal responses. Analytically focusing on politeness strategies (Brown and Levinson, 1987), we demonstrate that Paul, the most proficient male speaker from an EU country, almost exclusively interacts with Yoko, another proficient female speaker from Japan, by using both positive and negative politeness strategies: use of tokens of agreement and acknowledgement (e.g. “good,” “go on,” “cool,” and lots of laughter), and expression of oppositional views in an indirect way (e.g. “really?” and “do they?”). The two less proficient speakers only produce minimal response tokens such as nodding and smiling while directing their eye gaze to the LEGO blocks. Those elements are all important for achieving consensus. Interestingly, once the participants come up with an idea, usually a particular scene, during their ongoing discussion, they suddenly start working to make the scene coherent. Implications are discussed with reference to how “distributed cognition” (Hutchins, 2004) and differential “communicative competence” (Hymes, 1972) instantiate asymmetrical verbal contributions, which are materialized by the final LEGO products.

Annotating the functions of learner utterances from a spoken corpus and assigning the degrees of their grammatical accuracy and discoursal acceptability

ABSTRACT. The aim of this study is to present the annotation scheme for classifying the functions of learner utterances and their degrees of grammatical accuracy and discoursal acceptability during shopping role plays from the National Institute of Information and Communications Technology (NICT) Japanese Learner English (JLE) Corpus. The author examines the learner data at three proficiency levels: the CEFR A1, A2, and B1 levels. The NICT JLE Corpus is composed of written transcripts of the Standard Speaking Test (SST), an oral interview test in which A1 and A2 learners were given a general purchasing task, while B1 learners were given a negotiation task. The following research questions are addressed:

1. What kinds of functions are observed in learners’ utterances? Are there any differences in the distributions of the functions between learners at the three proficiency levels?
2. What are the proportions of the degree of grammatical accuracy and discoursal acceptability in the frequently observed functions? Are there any differences between the learners at different proficiency levels?

The author investigated 68, 114, and 66 files and segmented 870, 1,828, and 1,079 utterances of A1, A2, and B1 learners, respectively. Whereas 99.4% of B1 learners’ utterances had the function of “communication for transaction” including “explaining the background” and “requesting an action,” 59.2% and 55% of A1 and A2 learners’ utterances, respectively, had the function of “dealing with transaction” including “expressing their intention to buy an item” and “expressing or asking about a particular item.” The author identified 52.1% of A1 learners’ utterances as having a high degree of grammatical accuracy and discoursal acceptability. The ratio increased with proficiency level: 55.0% and 66.8% at the A2 and B1 levels, respectively, which indicated a significant difference (χ² = 39.45, df = 1, p < .00001, Cramér’s V = .117).
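
The abstract does not say which groups enter the reported test. One reading, offered here only as an assumption, is that df = 1 reflects a 2 x 2 comparison of A2 and B1 utterances (rated high vs. not high). The sketch below computes Pearson's chi-square and Cramér's V under that assumption, with counts reconstructed from the rounded percentages; under these assumptions the values come out close to the reported statistics. The function and variable names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def cramers_v_2x2(chi2, n):
    """Cramér's V for a 2x2 table, where min(rows, cols) - 1 equals 1."""
    return math.sqrt(chi2 / n)

# Assumed counts, rounded from the percentages in the abstract.
a2_total, b1_total = 1828, 1079
a2_high = round(a2_total * 0.550)
b1_high = round(b1_total * 0.668)

chi2 = chi_square_2x2(a2_high, a2_total - a2_high, b1_high, b1_total - b1_high)
v = cramers_v_2x2(chi2, a2_total + b1_total)
print(round(chi2, 2), round(v, 3))
```

Because the counts are rebuilt from rounded percentages, small deviations from the published figures are to be expected.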

10:20-12:20 Session 3C
Dead of a lesser God: Victims’ voice and representation in the Colombian press

ABSTRACT. The 50+ year Colombian conflict reached its most violent peak around the turn of the century (1998-2006), which coincided with peace talks with the major agents of violence: Marxist guerrillas (FARC) and right-wing paramilitaries (AUC). Previous research on the representation of crimes by each group has clearly demonstrated a pattern of highlighting guerrilla violence while concealing paramilitary responsibility in crimes against humanity (García 2013, 2017). These findings shed light on social phenomena such as the popular rejection of the peace process with FARC and indifference towards the agreement signed with the paramilitaries.

In this paper, I focus on the representation of the victims of each group in a 300,000+ word corpus of newspaper reports of fatal violence from the four major Colombian newspapers from the time period indicated above. The analysis combines concepts, tools and techniques from Systemic Functional Linguistics (participant roles, appraisal), Corpus Linguistics (word lists, concordancing, key words), and Discursive News Values Analysis (Bednarek & Caple, 2017) (personalisation and negativity) to contrast the different construal of the victims. The results clearly indicate not only a significantly higher word count for guerrilla victims’ statements, but also a higher likelihood of eliciting solidarity towards this group through the use of given names, the highlighting of family ties and the description of emotional responses, among other personalising techniques. On the other hand, the humanity of paramilitary victims is backgrounded not only by the much lower frequency of this type of content but also by the frequent reference to them as merely a number or with generic terms (e.g. the dead, the victims).

Bednarek, M., & Caple, H. (2017). The discourse of news values: How news organizations create newsworthiness. New York, NY: Oxford University Press.
García-Marrugo, A. (2013). What's in a name?: The representation of illegal actors in the internal conflict in the Colombian press. Discourse & Society, 24(4), 421-445.
García-Marrugo, A. (2017). ‘On the grammar of death’: The construal of death and killing in Colombian newspapers. Functional Linguistics, 4(1), 1-17.

Developing Language Resources for Indigenous Languages in Indonesia: Annotated Indonesian and Javanese Corpus Building

ABSTRACT. Although considered the second most linguistically diverse country, Indonesia is ironically also known as a country with many under-resourced languages. The linguistic situation in Indonesia has been described in terms of three categories: the national language, indigenous languages, and foreign languages. This paper presents our attempt to develop language resources (LR) for indigenous languages in Indonesia. Since there are 719 indigenous languages in Indonesia, it would be very time-consuming and costly to develop LR for all of them. On that account, the initial phase will focus only on major languages. From these major languages, Indonesian and Javanese are chosen for our pilot project. Although Indonesian is a national language, it originated from Malay, which is still broadly spoken throughout the archipelago in diverse variations. In this perspective, Indonesian is also an indigenous language. Javanese is the most important language with the largest number of speakers in Indonesia. With the total number of speakers reaching 84.3 million people, Javanese is regarded as the twelfth most spoken language in the world. This paper discusses the drawbacks and opportunities in our attempt to build a publicly accessible annotated Javanese corpus. In the first phase, we developed the database system and architecture for Javanese corpus building. In this paper, we discuss the criteria for corpus building and the design of the web-based corpus management and query application. In the second phase, we created an annotated Javanese corpus by implementing a semi-supervised Javanese POS tagging system. The system is based on the conditional random field (CRF) algorithm, which is related to the Hidden Markov Model (HMM). CRFs are a class of statistical modeling methods used for structured prediction. In this algorithm, a POS tag can be determined based on a word's context, that is, the words that precede and follow it.
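
The idea of tagging a word from its context can be sketched as a feature extractor of the kind commonly fed to CRF toolkits. This is an illustrative sketch, not the authors' system: the feature names, the boundary markers and the short example sentence are assumptions, and no CRF training is performed here.

```python
def token_features(tokens, i):
    """Describe token i by its own form and the forms of its neighbours."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "suffix2": word[-2:].lower(),
        # Context features: the preceding and following words, with
        # hypothetical markers for the sentence boundaries.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sentence_features(tokens):
    """Build one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]

# Illustrative three-word sentence; any tokenized sentence would do.
sentence = ["aku", "mangan", "sega"]
for feats in sentence_features(sentence):
    print(feats)
```

A sequence model would then learn, from annotated sentences, how strongly each such feature and each pair of adjacent tags supports a given POS label.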

Design principles and data collection for CELEN: a corpus of Learner Spanish in Japan.

ABSTRACT. Written learner corpora for Spanish are still scarce (Alonso-Ramos 2016) and none of them contains data from Japanese speakers. To facilitate research on the acquisition of Spanish (Mendikoetxea 2014) and to enhance Foreign Language Teaching, we present the creation of a corpus of Learner Spanish in Japan. In Japan, around a dozen universities offer a major in Spanish, and many more offer the possibility of studying Spanish as a second foreign language for one or two years. The main targets of our study are students majoring in Spanish, so as to be able to include texts of different proficiency levels. We expect to cover mainly four of the six levels of the Common European Framework of Reference for Languages (CEFR): A1, A2, B1 and B2. Up to now we have collected data from around 500 learners majoring in Spanish at Kansai Gaidai University and around 300 learners of Spanish as a second foreign language at Kyoto University. In this presentation we will present the data collected at Kansai Gaidai University. We will describe the design principles and workflow for the creation of the corpus: the design of the learner profile questionnaire and written tasks, the ongoing collection of data at some universities and the future post-processing of texts. The research is ongoing, so linguistic annotation with part-of-speech information, extraction of key features and visualization are future lines of research.


Alonso-Ramos (2016), Spanish Learner Corpus Research. Current trends and future perspectives. John Benjamins.

Mendikoetxea, A. (2014). Corpus-based research in second language Spanish. Geeslin, K.L. (Ed.), The Handbook of Spanish Second Language Acquisition, Blackwell, UK.

A Corpus-based Analysis of Formulaic Expressions in Korean Academic Prose

ABSTRACT. This study provides a corpus-based genre analysis of formulaic expressions in Korean academic prose, focusing mainly on prefabricated lexical bundles, collocations, colligations, etc. Because Korean is an agglutinative language, its phrasal structures incorporate particles and verbal endings in word-units and are more complex than the corresponding English structures. While exploring relevant challenges and new methodological tools to capture typologically distinct properties of Korean, we identify unique genre-specific linguistic features of L1 academic texts using verb (and adjective) lemmatization. In previous literature, it has been argued that morpheme-based cut-offs are more useful than word-based extraction (or space-based ecel extraction) for an agglutinative language like Korean (Nam et al. 2012; Jang 2015). Whereas morpheme-based processing has some advantages for identifying formulaic expressions in Korean, it does not accurately incorporate frequency information and differentiates conjugated predicate forms. This is because the morphemic bundle approach over-generates unit patterns by treating all combinations of a core predicate with different endings as distinct patterns. In this study, we combine lemmatization of predicate forms with morpheme-based N-gram extraction and provide better outcomes for phraseological units. An eleven-million-ecel corpus has been built by collecting 2171 academic papers with the highest ranks within the Korea Citation Index in the disciplines of humanities and social science. The revised process of merging a verbal lexeme with the following dependent morpheme(s) significantly decreases the number of extracted unit patterns as compared to purely morpheme-based processing of N-grams in Korean. While addressing related challenges in language-specific processing and analysis, we identify distinct properties of extracted formulaic expressions in the Korean academic register.
Distinct linguistic functions and distributions of formulaic expressions will be examined while considering diverse grammatical relations of collocation and colligation in Korean.
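
The over-generation problem and the effect of merging a predicate with its endings can be illustrated with a toy example. The sketch below is one possible reading of the procedure, not the authors' implementation: the tag set ("N", "P", "V", "E"), the gloss-style tokens standing in for Korean morphemes, and the function names are all hypothetical.

```python
from collections import Counter

def merge_predicates(morphemes):
    """Collapse a predicate stem (tag 'V') and its following endings (tag 'E') into one lemma unit."""
    merged = []
    skipping_endings = False
    for form, tag in morphemes:
        if tag == "V":
            merged.append(form + "/LEMMA")
            skipping_endings = True
        elif tag == "E" and skipping_endings:
            continue  # ending absorbed into the preceding predicate lemma
        else:
            merged.append(form)
            skipping_endings = False
    return merged

def ngrams(units, n):
    """Return all contiguous n-grams of a list of units as tuples."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

# Two toy clauses that differ only in the tense ending of the predicate.
s1 = [("study", "N"), ("OBJ", "P"), ("do", "V"), ("PAST", "E"), ("DECL", "E")]
s2 = [("study", "N"), ("OBJ", "P"), ("do", "V"), ("PRES", "E"), ("DECL", "E")]
sentences = [s1, s2]

morph_counts = Counter(g for s in sentences for g in ngrams([f for f, _ in s], 3))
lemma_counts = Counter(g for s in sentences for g in ngrams(merge_predicates(s), 3))
print(len(morph_counts), len(lemma_counts))
```

In this toy case the purely morpheme-based trigrams split the same predicate pattern across several types, while the lemma-merged version pools both clauses into a single pattern with a higher frequency.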

10:20-12:20 Session 3D
Scientific Findings and their Markers

ABSTRACT. In this paper we study possible markers of invention, discovery, and expression at the moment of the findings, in the scientific publications of the inventors and discoverers, in particular with a view to detecting these markers automatically. As such, in this paper we present an initial step in providing a corpus manually annotated with such markers. To constitute an initial working corpus for our work, we have chosen the particular fields of computability and information theory. Our working corpus comprises seminal publications of Alonzo Church, Alan Turing, David Deutsch and Claude Shannon. Our choice of these fields of research is due to the interest arising concerning their place in current developments in science and engineering. In the paper, we outline the scientific context of the working corpus by briefly describing the chosen authors’ inventions and discoveries. We discuss the possibility of automating at least in part the manual annotation work exemplified in the paper. We discuss the relevance of the corpora for future work. In particular we make some preliminary observations using our annotated corpus. In respect of the question ‘What is the relation between computation and information in Nature?’, our interest is that emanating from the conflation of Deutsch’s work “Quantum theory, the Church-Turing principle and the universal quantum computer”, and Shannon’s observation concerning the formulae defining entropy in thermodynamics and in information theory. We conclude by suggesting that epistemological research might be aided by means of the automated detection of such markers possibly present in appropriate corpora.

Compiling Chinese Learner Dictionary with Corpora: practices and challenges

ABSTRACT. The part played by corpora has become increasingly important and “no serious compiler would undertake a large dictionary project nowadays without one (and preferably several) at hand” (De Schryver 2003:167). This paper reports on an ongoing project to compile, with the aid of corpora, a learner dictionary for intermediate learners of Chinese as a second language. A wide range of software tools and resources are available today to assist English dictionary compiling. By contrast, the lack of large balanced Chinese corpora for lexicographic purposes is one of the greatest impediments to our project. The project utilizes the BCC corpus, the Chinese Interlanguage corpus and the Sketch Engine as reference resources for lexicographical information on meaning, grammar, phraseology, collocation, prosody and pragmatics. Due to the limited corpora at our disposal, much remains to be done for corpus-assisted Chinese lexicography.

The phraseological treatment of non-high-frequency words by online dictionaries for English learners

ABSTRACT. This paper examines the way in which non-high-frequency words are treated in four online dictionaries designed for learners of English (Merriam-Webster, Cambridge, Collins and Oxford online dictionaries). High frequency words are claimed to make the major contribution in text creation and be a major stumbling block for both lexicographers and learners due to their highly polysemous and phraseological nature (Sinclair, 1999; De Cock & Granger, 2004; Nation, 2001; Renouf, 1994). Accordingly, research on the semantic and phraseological treatment of words has been mostly focused on high frequency words in both lexicography and phraseology. The phraseological features of non-high-frequency words and the treatment of those words in dictionaries are mostly neglected. This paper starts by defining high- and non-high-frequency words, and subdividing the latter group of words into medium-frequency and low-frequency words. A wordlist, which is based on the 540-million-word Corpus of Contemporary American English (COCA), was used in this study. The words in the list were further grouped into three lists in terms of word frequency. Twenty non-high-frequency sample words, e.g., lugubrious, adjuvant, unmentioned, rattling and coalescence, were randomly selected and examined. The focus of the observation was put on the investigation of the phraseological behavior of the non-high-frequency words in the corpus. The entries in four learners’ dictionaries for each of the twenty sample items were then examined and their content compared with the findings from the corpus-based study. The corpus evidence shows that non-high-frequency words should not be treated in a slot-and-filler model, since they tend to be contextualized in certain contexts. It was also found that the phraseological behavior of non-high-frequency words is not sufficiently described in the dictionaries. 
We argue that non-high-frequency words should be treated in a more phraseological way in learners’ dictionaries, i.e. their entries need to be strengthened in terms of phrase collection and arrangement.

What determines the quality of “good examples” for CEFR-levels?

ABSTRACT. This study investigates the criteria of “good examples” for learners at different English proficiency levels. Recently, not only the difficulty levels of texts or passages but also those of individual sentences have received increasing attention. In pedagogical lexicography, for example, the difficulty levels of individual sentences have often been discussed in terms of the intelligibility of dictionary examples. Methodologically, the automatic extraction of examples from corpora has revealed some characteristics of good examples (Kilgarriff et al., 2008; Kosem et al., 2011). Despite these attempts, there is still no clear understanding of exactly what criteria should play significant roles in determining the levels of difficulty of dictionary examples. Moreover, with the increasing popularity of the CEFR in non-European countries, there is a genuine interest in the criteria for choosing “good examples” for different CEFR levels in order to present level-appropriate input in textbooks or classrooms.

The present study examined various linguistic features which may contribute to the determination of difficulty levels of example sentences. First, a set of sentences was randomly selected from texts with different CEFR levels, using the CEFR-based Coursebook Corpora. Then the following lexical and syntactic measures were calculated for each sentence: (a) the sentence length (the total number of words in a sentence), (b) the average CEFR levels of content words, (c) the number of verbs in a sentence, (d) the number of complex nominals, (e) the number of subordinators, and (f) the number of complex grammatical categories (e.g. perfective constructions, relative clauses, subjunctives), among others. Each sentence’s CEFR level was predicted from these fluency/complexity/vocabulary-level measures using classifiers such as SVM or Random Forest. Also, the relationship between these metrics and language activity types (e.g. reading/listening/speaking) in the coursebooks was explored.
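
Two of the simpler measures listed above, (a) sentence length and (e) the number of subordinators, can be sketched as follows. The word list, the tokenization rule and the function name are assumptions made for illustration; the study's own measures and tools are not specified in the abstract, and a surface match on words such as "that" will also count non-subordinating uses.

```python
import re

# Hypothetical, deliberately short list of subordinators for illustration.
SUBORDINATORS = {"because", "although", "if", "when", "while", "that", "which", "who"}

def sentence_measures(sentence):
    """Return two surface measures: word count and number of subordinator tokens."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return {
        "length": len(words),
        "subordinators": sum(1 for w in words if w in SUBORDINATORS),
    }

print(sentence_measures("I stayed home because it rained."))
print(sentence_measures("Although she was tired, she finished the book that I lent her."))
```

Feature dictionaries of this kind, computed for every sampled sentence, could then be turned into vectors and passed to a classifier together with the CEFR level of the source text.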

References Kilgarriff, A., Husák, M., McAdam, K., Rundell, M., & Rychlý, P. (2008). GDEX: Automatically Finding Good Dictionary Examples in a Corpus. In Proceedings of the 13th EURALEX International Congress. Spain, July 2008. pp. 425-432. Kosem, I., Husák, M., & McCarthy, D. (2011). ‘GDEX for Slovene’. In I. Kosem, K. Kosem (eds.) Electronic Lexicography in the 21st Century: New applications for new users, Proceedings of eLex 2011. Ljubljana: Trojina, Institute for Applied Slovene Studies: 151-159.

10:20-12:20 Session 3E
The representation of nurses in oncology and paediatric nursing research articles - who are they and what do they do?

ABSTRACT. This paper aims to investigate the representation of nurses in research articles on oncology and paediatric nursing by conducting a transitivity analysis within the systemic functional linguistics framework. The study mainly examines the use of material, mental and relational processes in nursing research articles to illustrate the roles and identities of oncology and paediatric nurses.

To facilitate the research, two corpora were consulted: the Oncology General Corpus and the Paediatric General Corpus for the years 2010 to 2016, containing oncology and paediatric nursing research articles respectively. Concordance lines showing the search words ‘nurse’ and ‘nurses’ were examined. The type-token ratios of the six kinds of process which co-occurred with ‘nurse(s)’ were calculated and the twenty most frequently used material, mental and relational processes were investigated. Many of the twenty most frequently used material, mental and relational processes which co-occurred with ‘nurse(s)’ are shared between the two corpora. Nevertheless, some processes are more salient in one or other of the corpora, which suggests special emphases in the work and identities of nurses in these two nursing specialties. Furthermore, semi-prefabricated chunks such as ‘nurses play a … role’ and ‘nurses have a … role/position’ are used frequently to highlight some specific contributions and expectations of nurses in these two specialties. 

Is the relationship between semantic prosodies and transitivity reliable? A corpus-based cross-linguistic investigation

ABSTRACT. Semantic prosody is the implicit attitudinal meaning conveyed by lexical items. The link between semantic prosody and transitivity forms the main part of studies on how syntax influences semantic prosody. Previous studies investigating their correlation have focused on semantic prosodies and transitivity in English and suggested that the relationship was reliable in that positive semantic prosodies corresponded to the transitive use, whereas negative semantic prosodies corresponded to the intransitive use (Louw 1991, 2000; Stubbs 1996: 139). However, few studies have investigated the extent to which this statement holds for English as well as other languages and whether this link works in the same way across different languages. This paper aims to bridge this gap by presenting a contrastive analysis of perception verbs across English and Chinese. Instances containing three highly frequent prima facie translation equivalents, i.e., SENSE and juecha 觉察, PEEK and kuishi 窥视, LOOK up and down and daliang 打量, were retrieved from the British National Corpus and the Beijing Language and Culture University Corpus Center (a 13-billion-token Chinese general corpus). The analyses show that, for the English nodes, semantic prosodies vary dramatically according to their in/transitivity, suggesting that the relationship between semantic prosody and transitivity is multi-faceted because in some cases the transitive use is related to negative and neutral prosodies rather than positive prosodies. In contrast, for the Chinese translation equivalents, semantic prosodies remain the same regardless of transitivity. The implication might be that Chinese transitivity has little impact on semantic prosodies, which appears to be consistent with Wei & Li's (2014) findings. Therefore, syntactic patterns need to be examined from different perspectives in collocation studies of Chinese.

Corporate Cautionary Statements (Disclaimers): A Critical Genre Analysis of Professional Communication

ABSTRACT. Cautionary statements or disclaimers in corporate annual reports need to be carefully designed because clear cautionary statements may protect a company in the case of legal disputes but may also undermine a positive impression for investment. This study compares the language of cautionary statements (disclaimers) and risk factors in the annual reports presented to the US Securities and Exchange Commission (SEC) using seven corpora for companies listed on the NYSE (Apple, Whirlpool, Ford Motor, Sony, Canon, Toyota and Kyocera), illustrating the similarities and differences in the use of meaningful cautionary statements, and critically examining why practitioners write them the way they do. The findings describe the similarities of the study corpora in the characteristics of the specialised genre and in the frequent use of the modal auxiliaries may and could. Distinct differences are identified in word choice. The word ability is used more for legal protection in some corpora, whereas words such as changes, affect or fluctuations are used more to convey a better impression in others. The findings show how companies make the statements unique to themselves in the presentation of risk factors, and reveal the characteristics of this specific genre of professional communication.

Combining Corpus Linguistics and Critical Discourse Analysis to study the “Leftover Woman” phenomenon in Hong Kong press

ABSTRACT. Gender studies is a blossoming area in linguistics which has brought pressing awareness of how linguistic practices affect one’s social or cultural identity, and vice versa. Some research applying Critical Discourse Analysis (CDA hereafter) also examines how sociocultural ideologies are perpetuated by dominant groups, for instance the media, governments or political parties. However, research on the correlation between media discourse and gender identities in Hong Kong is very limited. This paper, therefore, will shed light on how the representation of a female identity, ‘Leftover Woman’, has been perpetuated through media discourse, using the synergy of corpus linguistics and CDA. ‘Leftover Woman’ is a label originating from Mainland China that has recently created heated controversy across various media in Hong Kong and has been used to identify a specific group of Hong Kong women who are described mainly as undesirable because they remain single after passing their socially defined marriageable age. This paper will examine how this labelling is influenced and promoted by the media in Hong Kong by studying the core lexical item, 剩女 sing6 neoi5 (Leftover Woman in English), in a corpus compiled from a range of newspapers specifically for this project. The data set is then analysed with the model of extended lexical units proposed by Sinclair (1996, 2004) to study the immediate linguistic environment of the core, ‘Leftover Woman’. The initial findings suggest that the ‘Leftover Woman’ is usually a patient rather than an actor in media discourse, mainly on the receiving end of actions such as being negatively labelled or described as part of an undesirable group. They also show that the semantic preference and prosody of the core word are overwhelmingly negative in the data.
The paper will further shed light on how gender ideologies are being consolidated and how the exercise of power by the dominant groups through language use can propagate the dominant patriarchal ideologies in Hong Kong. Reference: Sinclair, J. McH. (1996). The search for units of meaning. Textus, 9 (1), 75-106. Sinclair, J. McH. (2004). Trust the Text: language, corpus and discourse. London: Routledge.

12:20-13:40Lunch Break
13:40-15:10 Session 4A
CEFR Receptive and Productive Vocabulary Knowledge of Japanese English Learners

ABSTRACT. Vocabulary knowledge has been discussed frequently in terms of receptive and productive knowledge (e.g. Melka, 1997; Laufer, 1998). However, it does not seem to have been investigated sufficiently in the context of the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). The English Vocabulary Profile (EVP) website provides six CEFR levels for individual meanings of each word and phrase. Capel (2015) claims that the difference between learners’ receptive and productive vocabulary knowledge should be no larger at present than in the past. However, it is still worth investigating this within a Japanese learning context.

The aim of this study is to compare the receptive and productive vocabulary knowledge of approximately 240 Japanese English learners in the context of CEFR. Receptive vocabulary knowledge was investigated by statistically analysing the CEFR Vocabulary Test, which is composed of 60 multiple-choice vocabulary questions, including 10 questions for each CEFR level contained in the Japanese University Entrance Exams Corpus. Productive vocabulary knowledge, in turn, was investigated by analysing the overall CEFR level and the sub-category, vocabulary CEFR level, of the learners’ essays as rated by professional CEFR raters, along with the percentage of each CEFR level of vocabulary used in their essays.

Results showed that learners’ receptive vocabulary knowledge was relatively high, with an average score on the CEFR Vocabulary Test of more than approximately 50%. These learners even got approximately 50% of the C2-level questions right, whereas their productive vocabulary knowledge was lower, with fewer learners achieving percentages above B2 levels. The correlation between their receptive and productive vocabulary knowledge was reasonable. I hope these findings will be useful in improving learners’ productive and receptive vocabulary knowledge.

References Capel, A. (2015). The English vocabulary profile. In J. Harrison & F. Barker (Eds.), English profile studies 5 (pp. 9-27). Cambridge: Cambridge University Press. Council of Europe. (2001). Common European framework of reference for languages: learning, teaching, assessment. Cambridge: Cambridge University Press. Melka, F. (1997). Receptive vs productive aspects of vocabulary. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description Acquisition and Pedagogy. (pp. 84-102). Cambridge: Cambridge University Press. Laufer, B. (1998). The development of passive and active vocabulary in a second language: same or different? Applied Linguistics, 19(2), 255-271.

Investigating Japanese EFL learners’ overuse/underuse of English grammar categories and their relevance to CEFR levels

ABSTRACT. This study focuses on L2 learners’ overuse/underuse of English grammar categories and examines distributions of grammar use by Japanese EFL learners across different CEFR levels. A revised version of the Japanese EFL Learner (JEFLL) Corpus, a collection of approximately 10,000 compositions written by Japanese lower and upper secondary school students (Tono, 2007), classified according to CEFR levels, was used for this study. In addition, each composition was proofread by a native speaker and a parallel set of original and corrected versions was prepared. An inventory of grammar items and their accompanying corpus query syntax using combinations of lemmas and parts of speech was developed (Ishii and Minn, 2015) and 263 grammar categories, altogether 501 individual grammar items, were extracted from both original and proofread versions of the JEFLL Corpus. Overuse/underuse of grammar categories was examined by paired comparison between students’ original essays and their corresponding proofread essays across each CEFR-level group. Out of about 500 grammar items in our grammar profile, around 200 items were studied in detail, while the others were too low in frequency to observe. One of the most striking differences was found in the postpositive past participle construction. A1-level users used this construction 105 times in total in the original essays, whereas the corrected essays contain 337 occurrences, which shows that A1-level users either could not produce or avoided the construction. At A2 level the difference is much smaller, and at B1 and B2 levels the proofread versions have slightly fewer instances, which suggests that B-level learners slightly overused the construction and that native speakers corrected some of their uses without using the construction. As we highlight several other constructions, we will discuss methodological and pedagogical implications of the study.

Assigning CEFR-J levels to English texts based on textual features

ABSTRACT. CEFR (Common European Framework of Reference for Languages) has been widely used as a guideline for assessing the language ability of learners, originally across Europe in reaction to plurilingualism, but currently also employed in the context of EFL and ESL countries such as Japan for educational purposes. Against this backdrop, lists of English grammatical items and words for each CEFR level (A1-C2) have been created (e.g., English Grammar Profile and CEFR-J Wordlist). However, a limited number of attempts have been made to assign CEFR(-J) levels to English passages, and hence teachers and learners are uncertain about the difficulty of the listening and reading materials they encounter.

The present study attempts to assign CEFR-J levels to English texts based on textual features. A coursebook corpus was created from EFL/ESL English textbooks that claim to be based on CEFR levels, and four textual indexes were then calculated. The indexes consist of ARI (a readability measure), VperSent (the average number of verbs in each sentence), AvrDiff (the average word difficulty when A1 is 1, A2 is 2, B1 is 3, and B2 is 4, based on the CEFR-J Wordlist) and AperB (the ratio of B-level content words to A-level content words). Using these four indexes, a regression model was created to predict the level of the input text, which was then implemented as an online application called CVLA (CEFR-based Vocabulary Level Analyzer). To show the validity of our model, the results for the reading parts of English performance tests (e.g. National Center Test for University Admissions and TOEFL iBT) will be compared. We will also closely examine the results for English passages which exhibit idiosyncratic behavior compared to other texts.
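
As a rough illustration of how the four indexes described above might be computed, the following Python sketch may help. It is not the CVLA implementation: the tiny word list, the verb set standing in for a part-of-speech tagger, and the function name text_indexes are all hypothetical. ARI uses the standard Automated Readability Index formula, and AperB is computed as B-level words divided by A-level words, following the description in the abstract.

```python
import re

# Hypothetical stand-ins: a real system would use the CEFR-J Wordlist
# and a part-of-speech tagger instead of these toy resources.
LEVEL_SCORE = {"A1": 1, "A2": 2, "B1": 3, "B2": 4}
WORDLIST = {"dog": "A1", "park": "A1", "weather": "A2", "significant": "B2"}
VERBS = {"is", "are", "run", "runs", "improve", "improves"}


def text_indexes(text):
    """Return (ARI, VperSent, AvrDiff, AperB) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    # Automated Readability Index (standard formula).
    chars = sum(len(w) for w in words)
    ari = 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

    # Average number of verbs per sentence.
    v_per_sent = sum(w in VERBS for w in words) / len(sentences)

    # Average difficulty of the words found in the word list.
    scores = [LEVEL_SCORE[WORDLIST[w]] for w in words if w in WORDLIST]
    avr_diff = sum(scores) / len(scores) if scores else 0.0

    # Ratio of B-level words to A-level words.
    a_count = sum(s <= 2 for s in scores)
    b_count = sum(s >= 3 for s in scores)
    a_per_b = b_count / a_count if a_count else float("inf")

    return ari, v_per_sent, avr_diff, a_per_b


sample = "The dog runs in the park. The weather is significant."
print(text_indexes(sample))
```

In a sketch like this, the four values would then serve as predictors in a regression model; the abstract does not specify the coefficients, so none are assumed here.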

13:40-15:10 Session 4B
A Corpus-based Study on the Use of Linking Adverbials in Master Theses in Psychology by Taiwanese EFL Learners

ABSTRACT. Linking adverbials play an important role in creating cohesion both in spoken and written English. More specifically, linking adverbials in academic writing display and establish arguments. Therefore, it is of great importance for ESL/EFL learners to learn how to use linking adverbials appropriately in their academic writing. The present study probes into non-English-major student writers’ use of linking adverbials and the difficulties they might encounter in their academic writing. Two corpora were compiled to compare the linking adverbials used by EFL student writers and journal article writers. A total of 99 master’s theses in psychology served as the learner corpus, and the other corpus consisted of psychology journal articles produced by professional writers. For the data analysis, Liu’s (2008) list, containing 110 linking adverbials with their classifications, was employed as the framework. The linking adverbials in the list were divided into four categories: additive, adversative, causal, and sequential. Sketch Engine was used to obtain the word counts and the frequencies of the linking adverbials in the two corpora. In the quantitative analysis, the overuse and underuse of linking adverbials by EFL writers were identified. Those linking adverbials were also qualitatively examined in the concordance lines, and their misuse was then investigated. The results showed that the Taiwanese EFL master’s students slightly overused linking adverbials in their theses compared to the journal articles by professional writers. More specifically, linking adverbials in the adversative category were overused the most by the EFL student writers. It was also found that certain linking adverbials were underused by the EFL student writers. By examining the concordance lines of the searched linking adverbials, cases of misuse were also identified. The paper ends with the teaching and learning implications.
Possible contributing factors were also discussed in the paper.

A Corpus-based Study of Stancetaking as Seen from Stance Adverbials in Interpreter-mediated Political Discourse

ABSTRACT. Stancetaking is a pervasive phenomenon in human communication (Du Bois, 2007), and it has been attracting investigative attention since the beginning of the twenty-first century (Englebretson, 2007). However, most studies on stancetaking are conducted from the perspective of conversation analysis and focus only on English. Investigation of stancetaking in Chinese languages (Mandarin Chinese and Cantonese included) from the perspective of translation and interpretation is almost non-existent. To fill this gap, this study aims to explore stancetaking in interpreted Cantonese by focusing on one particular group of stance markers, i.e. stance adverbials. Based on self-built parallel corpora composed of transcriptions of 7 interpreted proceedings of the Chief Executive’s Question and Answer Session in the HK Legislative Council (the “LegCo”), the study investigates a comprehensive set of 35 stance adverbials expressing epistemic knowledge, attitude and style (Biber et al., 1999) in terms of their distribution and function in original discourses and their interpretations. In addition, one particular epistemic stance marker, jat1 ding6 (certainly), and its interpretations have been examined closely. Through both quantitative and qualitative analyses, the study reveals significant differences in the usage of stance adverbials in original and interpreted discourses: while epistemic stance adverbials, especially those expressing a. certainty and doubt; b. actuality; and c. source of language, are heavily used in original speeches, they have largely been omitted in the interpreted version. Such renditions can be attributed to the genetic differences between Cantonese and English, and to the interpreters’ limited processing capacity (Gile, 1992). However, the author argues that such a large number of omissions of stance markers may result in a repositioning of the original speakers’ stancetaking.

A Corpus-based Study on Linking Adverbials in Academic Writing by Taiwanese EFL Students

ABSTRACT. The use of linking adverbials in academic writing has received much attention in corpus-based studies. The current research aims to investigate the use of linking adverbials (LAs) by Taiwanese and American graduate students in their academic English writing. To examine the use of LAs in academic writing, two corpora were compiled by the researcher. The learner corpus used in this study consists of 100 Taiwanese MA theses in the field of Biology. The control corpus, which serves as the norm for the analysis, is composed of 100 American MA theses in the same field. Many previous corpus-based studies have explored the ‘underuse’, ‘overuse’, and ‘misuse’ of linking adverbials. This study also drew on these three aspects and discussed the reasons for the three problems common in EFL learners’ academic writing. Sketch Engine was employed in this study to compare the usage of LAs in the academic writing of Taiwanese students with that of American students. The results suggest that the academic writing of Taiwanese students reveals a significant gap in the usage of LAs, in contrast to the writing of American students. Taiwanese students were found to overuse certain types of LAs. More adverbials were used inappropriately by the students from Taiwan than by those in the United States. A few examples of underused adverbials were also identified in the results section. At the end of the paper, the current study first presented a detailed account of the phenomenon and then provided pedagogical implications for EFL language teachers and second language learners.

13:40-15:10 Session 4C
The textual functions of lexis: A case study of "nowadays" in learner and English – Thai general corpora

ABSTRACT. The present study applies one of the central concepts in recent corpus linguistics work, the textual functions of lexis, to learner corpus analysis. It has been demonstrated lately that formal patterns of lexical items are functional in textual cohesion and organization (cf. e.g. O’Donnell et al. 2012; Hoey and O'Donnell 2015; Stubbs 2015). Based on this perspective, the adverb “nowadays” in a corpus of Thai learner English argumentative essays (THAI) is studied. The word has been found to be one of the two key adverbs ranked among the top 30 key words in THAI when compared with its native counterpart LOCNESS. To explain its keyness, the word was examined in terms of its textual colligation, textual semantic association patterns and textual functions in the essays, which were then compared with those found in the British National Corpus (BNC). It has been found that while “nowadays” in THAI tends to be used to provide temporal context for the discussion, enhancing the relevance and significance of the student writers’ arguments, the adverb often serves to signal contrastive discourse in the general corpora. This difference was hypothesized to be attributable to learner L1 interference. The Thai equivalent of “nowadays” was therefore studied, using the Thai National Corpus. Findings from the combined use of multiple corpora suggest that Thai learners’ choice of “nowadays” is influenced by L1 rhetorical strategies to serve distinctive purposes relevant to the academic essay genre. This can inform the text-lexis interface in EFL learner language and at the same time lend support to the theory that patterns of lexical items can vary according to text types and speakers. The study also contributes to corpus-based contrastive linguistics by adding the text-functional perspective to the description of English-Thai lexical equivalents.

References: Hoey, M. (2005) Lexical Priming. London: Routledge. Hoey, M., and O'Donnell, M. (2015) "Examining associations between lexis and textual position in hard news stories, or according to a study by…" In N. Groom, M. Charles and S. John. Corpora, Grammar and Discourse: In honour of Susan Hunston. Amsterdam: John Benjamins, 117-144. O'Donnell, M., Scott, M., Mahlberg, M. and Hoey, M. (2012) "Exploring text-initial words, clusters and concgrams in a newspaper corpus" Corpus Linguistics and Linguistic Theory 8-1, 73-101. Stubbs, M. (2015) "The textual functions of lexis" In N. Groom, M. Charles and S. John. Corpora, Grammar and Discourse: In honour of Susan Hunston. Amsterdam: John Benjamins, 97-116.

Examining the usage of phrasal verbs in theses of Taiwanese TESOL MA students versus TESOL papers in foreign journals

ABSTRACT. Phrasal Verbs (PVs), which are receiving increasing attention from researchers worldwide, can generally be found in both academic spoken and written English. As described by Waibel (2007) and Chen (2013), despite the absence of the PV structure in the Chinese language, Chinese learners of English are able to produce an ample number of PVs in their writing. However, it can still be difficult for non-native speakers to use some PVs properly for academic purposes. Consequently, this study aims to explore and compare the usage of some PVs in the English-written theses of Taiwanese TESOL MA students, who have advanced English proficiency, with that in papers collected from TESOL Quarterly and Studies in Second Language Acquisition (SSLA). By utilizing one corpus of 100 theses by Taiwanese TESOL MA students and another of 200 papers from the two journals, built with the aid of Sketch Engine, and cross-referencing existing PV lists (Gardner and Davies, 2007; Liu, 2011), this study is able to determine the frequency as well as the categories of PVs in the two corpora. The results suggest that differences between the two corpora in terms of the frequency and categories of PVs do exist. In addition, this study illustrates the occurrence of some PVs from the informal register, which should preferably be avoided in academic writing. The findings of this study can serve as a reference for EFL learners who intend to write in a more authentic way.

Specific Syntactic Complexity Measures Across Two Corpora of EFL Writing

ABSTRACT. This study examines the validity and consistency of specific syntactic complexity measures across two corpora of Chinese EFL writing. To achieve this goal, we built two corpora differing mainly in writing topics but with equal sample sizes and quite similar score distributions. The samples were extracted from two large online corpora with automatically assigned scores and were divided into 4 proficiency levels respectively according to score ranges. Then we used a computer tool to generate 14 indices of syntactic complexity. We conducted ANOVAs using these indices to check whether these complexity measures could sufficiently distinguish different proficiency levels. Results showed that in general length of production measures and complex nominal measures can distinguish proficiency levels within both corpora, confirming findings from previous studies. On the other hand, some measures exhibited considerable cross-corpus variations, indicating that they are more susceptible to factors such as writing topic variations. We ran additional stepwise regression analyses to explore which complexity measures could jointly predict writing scores. The two regression models derived were considerably different, again indicating that topic variations may exert influence on predictability of complexity measures. In all, findings of this study demonstrate that surface complexity measures, such as length of production measures and complex nominal measures, can distinguish different proficiency levels and the findings also reveal that factors such as writing topics have to be taken seriously in syntactic complexity study.

13:40-15:10 Session 4D
Lexical Bundles and Disciplinary Variation in EFL Master’s Thesis

ABSTRACT. With the advances of corpus linguistics, researchers were able to discover the existence of common multi-word sequences, or “lexical bundles,” and also reveal the essential role they play in spoken and written English (Biber & Conrad, 1999; Biber et al., 1999). According to Biber et al. (1999), lexical bundles can serve the following communicative functions: stance expression, discourse organization, and reference making. Listeners and readers can grasp the speakers’ or writers’ intention more easily by referring to the types of bundles they use. Without using proper and sufficient lexical bundles, one might not be able to achieve fluency in one’s communicative competence (Hyland, 2008a). Hyland’s (2008b) research uncovered disciplinary variation in lexical bundles across published writing, doctoral dissertations, and master’s theses. Based on the aforementioned research, the current study aimed to identify the most frequently used 4-word bundles in Taiwanese EFL master’s theses, as well as to investigate whether the four targeted disciplines (i.e. electrical engineering, microbiology, business studies, and applied linguistics) have distinctive preferences for using different types of lexical bundles. The corpus of EFL master’s theses was collected from the electronic database of the National Central Library in Taiwan. The AntConc 4.0 Software was adopted to analyze the frequency and range of the bundles in the texts. The results showed that different disciplines display different patterns in the use of lexical bundles. In addition, several interesting distinctions in the most frequently used 4-word bundles were found between the present study and Hyland’s (2008b) research. Cultural factors as well as the different sources of the corpora may account for such discrepancies.
The influence of culture, especially the cultures of the worldwide academic communities, on the variation of lexical bundles thus calls for more detailed future investigation.
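
The frequency-and-range analysis of 4-word bundles mentioned above can be approximated in a few lines of Python. The sketch below is only an illustration and does not reproduce AntConc's behaviour: the function name lexical_bundles and the thresholds min_freq and min_range are hypothetical, and, as a simplification, n-grams are allowed to cross sentence boundaries.

```python
import re
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return {bundle: (frequency, range)} for n-word sequences.

    frequency = total occurrences across all texts
    range     = number of different texts containing the bundle
    The thresholds are illustrative, not taken from the study.
    """
    freq = Counter()
    found_in = defaultdict(set)
    for idx, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        for start in range(len(tokens) - n + 1):
            bundle = " ".join(tokens[start:start + n])
            freq[bundle] += 1
            found_in[bundle].add(idx)
    return {
        bundle: (count, len(found_in[bundle]))
        for bundle, count in freq.items()
        if count >= min_freq and len(found_in[bundle]) >= min_range
    }


texts = [
    "On the other hand, the results of the study show growth.",
    "The results of the study are clear; on the other hand, costs rose.",
]
for bundle, (count, text_range) in sorted(lexical_bundles(texts).items()):
    print(bundle, count, text_range)
```

Requiring a minimum range as well as a minimum frequency keeps a bundle from qualifying merely because one writer repeats it many times in a single thesis.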

Speech Acts in the Hong Kong Corpus of Spoken English (HKCSE)

ABSTRACT. Since the 1960s, empirical studies of speech acts have examined specific speech acts in spoken or written language in different languages and in different contexts of communication. A number of studies have examined the expressions, the patterns and the strategies of a speech act in a particular language. However, most empirical studies of different genres and contexts of business communication have not specifically examined speech acts. Only a few studies have investigated the relative frequency of a speech act and the co-occurring patterns of two or three speech acts as found in a genre. The present study aims to investigate, by means of analysis of a corpus of manually annotated speech acts, the features of all the speech acts in six different communicative contexts from a corpus of spoken business discourse. The findings indicate that the process of manual annotation of speech acts is laborious and requires a number of revisions regarding annotation criteria and outcomes. Despite the different contexts of interaction in the corpus, the quantitative data generated by SpeechActConc show that there are similarities in the number and the category of unique speech acts as well as in the frequency and the co-occurrence of different speech acts among the six genres. In analysing the predictable patterns of speech acts, both the preferred adjacency pairs and the most frequent co-occurring speech acts are discussed. In examining the lexicogrammatical patterns of speech acts, traditional markers such as inform or opine markers are not the only linguistic realisations; phrases and clauses are also commonly used to perform different speech acts.

A Corpus-based Study on Frequent Noun Phrases in Engineering Academic Texts

ABSTRACT. A great majority of academic papers in engineering fields are now written in English (Sano, 2006; Huttner-Koros, 2015), which has become a de facto common language for engineers. Thus, engineering students around the world are required to acquire English skills to cope with the increasing demand to read academic papers and to discuss a range of engineering topics in English. In general, academic texts contain three levels of vocabulary: high frequency vocabulary, specialized vocabulary, and low frequency vocabulary (Nation, 2001). Engineering students have already acquired high frequency vocabulary at high school, and they are expected to learn specialized vocabulary at university. In previous research (Ishikawa & Koyama, 2007; Ishikawa, 2017), we collected academic papers related to the eight main engineering areas, analyzed them, and compiled a wordlist for engineering students. The wordlist has helped the students to learn the specialized words in an efficient way, from more frequent to less frequent. However, it has some limitations. The biggest problem is that word use in different contexts is not indicated. A word may collocate with different words and carry different meanings in different areas. For example, “cell” is used frequently both in electric engineering and material engineering, but “solar cell” is used mainly in electric engineering and “receptor cell” is used only in material engineering. Therefore, in this research, we will focus on multiword expressions (MWEs) for each word in the wordlist, and explore the following research questions: 1. What MWEs are used commonly in the engineering fields? 2. What MWEs are specific to each of the eight fields? Our preliminary study shows that the fifty most frequent MWEs are commonly used in each of the engineering specialties.

13:40-15:10 Session 4E
A Corpus-based Cognitive Study on the C-E Translation of Chinese Political Publicity Texts

ABSTRACT. In recent years, the international community's demand to understand China has become more and more intense. The English translation of Chinese political publicity texts is an important way to construct and disseminate China’s image, and is also vital for the international community to understand China properly. In the process of English translation, different construal modes can show different linguistic features. This paper combines quantitative and qualitative analysis. The author analyzes the linguistic features using a small self-built Chinese-English parallel corpus. Drawing on Langacker’s construal theory, the paper then explores the preferred modes of construal and the synchronic correspondences between the linguistic features of the Chinese and English versions, with the aim of making the Chinese story more attractive and telling it to the world in an effective way.

Comparing the Hedging Devices Used in Two NNS Academic Writing Corpora

ABSTRACT. Hedging devices play an important role in academic writing. In order to tone down the force of statements, writers tend to use hedging devices when composing academic papers. However, hedging devices have received little serious attention in research, and few studies have focused on their use in academic journal papers. This study aims to investigate how hedging devices are used in Taiwanese master's theses. Moreover, hedging devices in master's theses from two different fields, electrical engineering and linguistics, are compared and analyzed. Two corpora of master's theses, one from each field, were constructed. There are 100 papers from each journal, and 200 papers in total (3,094,970 running words). This research explored the most frequently used hedging devices in master's theses and compared the commonly used hedging devices in the two fields. Sketch Engine was used to analyze word frequency, percentage, collocation, and clusters. Salager-Meyer’s (1994) taxonomy was applied to identify the hedging devices. The five main categories in the taxonomy are “shields” (e.g. appear, seem, probably, suggest…), “approximators” (e.g. roughly, somewhat, often…), “expressions of the authors’ personal doubt and direct involvement” (e.g. we believe…), “emotionally charged intensifiers” (e.g. particularly encouraging…) and “compound hedges” (e.g. it may suggest that, it could be suggested that…). The results demonstrated that the three most frequently used hedging devices in master's theses in electrical engineering are “could” (606), “may” (601), and “would” (572). The three most frequently used hedging devices in master's theses in linguistics are “may” (2989), “would” (2210), and “could” (1866).
The results indicate that Taiwanese master's students tend to use more hedging devices when writing theses in linguistics than in electrical engineering. The analysis of the texts also suggests that hedging devices can serve as a useful tool for helping researchers state their ideas in a polite manner.
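
Raw counts of this kind are directly comparable only when the two corpora are of similar size, so a frequency comparison across fields is usually made on normalized rates. The following is a minimal sketch of that normalization, not the procedure used in the study (which relied on Sketch Engine). The function name `hedge_rates`, the three-item hedge list and the per-million scale are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed, illustrative hedge list; the study's taxonomy covers far more items.
HEDGES = frozenset({"may", "could", "would"})


def hedge_rates(tokens, hedges=HEDGES, per=1_000_000):
    """Return the frequency of each hedge per `per` running words."""
    total = len(tokens)
    if total == 0:
        raise ValueError("corpus is empty")
    # Count only tokens that match the hedge list, ignoring case.
    counts = Counter(t.lower() for t in tokens if t.lower() in hedges)
    # Scale each raw count by corpus size so corpora of different sizes compare.
    return {h: counts[h] * per / total for h in sorted(hedges)}
```

Calling `hedge_rates` once per sub-corpus on its token list gives rates that can be set side by side regardless of how many running words each sub-corpus contains.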

Alternative Look: A Collostructional Analysis of Ditransitive Constructions in Mandarin

ABSTRACT. There are three major subtypes of ditransitive constructions in Mandarin (Liu 2006), and the differences among them have been a matter of debate in terms of alternation or participant roles. Taking a different approach from previous studies, we aim to investigate the collostructional strength between each verb and each type of ditransitive construction and to demonstrate the different meanings conveyed by each construction. By investigating the frequency distribution of 37 verbs in Mandarin ditransitive constructions (including the double-object construction and the prepositional dative construction) in corpus data and adopting collostructional analysis as introduced by Gries & Stefanowitsch (2004), we attempt to clarify the constructional meaning of each type of ditransitive construction. The preliminary results show that the two constructions differ in the number and completion of the transfer events they entail. The transfer meaning entailed and expressed by the double-object construction is associated with only one entire event, while the [+transfer] meaning expressed by the prepositional dative construction may involve more than one event, which also increases the possibility that the prepositional dative conveys an incomplete or unsuccessful transfer.
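
As background, collostructional strength for a verb across two alternating constructions is commonly computed from a 2x2 frequency table with the Fisher exact test and reported as the negative base-10 logarithm of the p-value. The sketch below illustrates that general idea using only the Python standard library. It is not the authors' implementation, and the function names `fisher_tail` and `distinctive_collexeme` and the labels "A" and "B" (for example, double-object versus prepositional dative) are hypothetical.

```python
from math import comb, log10


def fisher_tail(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), with all margins fixed.

    a: verb in construction A      b: verb in construction B
    c: other verbs in A            d: other verbs in B
    """
    verb_total = a + b
    cxn_total = a + c
    n = a + b + c + d
    upper = min(verb_total, cxn_total)
    # Sum hypergeometric probabilities for outcomes at least as extreme as a.
    hits = sum(
        comb(verb_total, k) * comb(n - verb_total, cxn_total - k)
        for k in range(a, upper + 1)
    )
    return hits / comb(n, cxn_total)


def distinctive_collexeme(verb_in_a, verb_in_b, total_a, total_b):
    """Return (preferred construction, strength as -log10 of the p-value)."""
    a, b = verb_in_a, verb_in_b
    c, d = total_a - a, total_b - b
    # Expected frequency of the verb in A if verb and construction were independent.
    expected_a = (a + b) * total_a / (total_a + total_b)
    if a >= expected_a:
        return "A", -log10(fisher_tail(a, b, c, d))
    # Otherwise test attraction to B by swapping the roles of the columns.
    return "B", -log10(fisher_tail(b, a, d, c))
```

Here `total_a` and `total_b` are the overall token counts of the two constructions in the corpus. A larger strength value indicates a stronger preference of the verb for the returned construction.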

15:10-18:00 Pre-banquet tour to Ritsurin Garden

Pre-banquet tour to Ritsurin Garden

18:00-20:00 Banquet at Ritsurin Garden

Banquet at Ritsurin Garden