Linguistics in the 21st century: What are we up to? - Enoch Aboh, University of Amsterdam
Linguistics Association Lecture
Since Chomsky’s (1965) characterization of the object of linguistic inquiry some sixty years ago (i.e., “an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly…”, p. 3), there have been tremendous developments in formal syntax, with important discoveries about structure building (e.g., Minimality) and about fundamental aspects of first and/or second language acquisition. Insights from formal syntax have also led to developments in experimental work (e.g., the acquisition of pronouns and binding theory, long-distance dependencies), all of which have brought new insights into the human brain and mind.
Yet even a cursory look at natural contexts of language acquisition and use in ecologies outside Western and highly educated populations (e.g., Sub-Saharan Africa, Southeast Asia) suggests that the current generative model must be revised. We must move away from the monolingual bias (including nativism) and adopt universal multilingualism as the default. Drawing on multilingual ecologies in Africa, the birthplace of the human language capacity and now home to about a third of the world's languages, I reflect on the emergence of grammar out of contact. Contact is understood here as resulting from interactions between individual speaker-/signer-learners (SSLs) with different idiolects, which provide the appropriate context for acquisition (and change). Under this view, contact between SSLs creates a unique situation in which the syntactic properties underlying clause structure emerge through feature recombination. Recombination is a general instance of Merge applying across different modules (e.g., morphosyntax, phonology, semantics): an innate human cognitive capacity that allows SSLs to select specific linguistic features from the heterogeneous inputs they are exposed to and recombine them into new variants. The result is a hybrid construct. The examples of recombination presented in this talk are based on serial verb constructions in Gungbe and French (and Haitian Creole).
Lak is a Northeast Caucasian language spoken in Dagestan, Russia. According to the latest census (2021-2022), it has about 140,000 speakers, though the actual number of native speakers is likely to be significantly lower.
The second session will be an overview of Lak syntax (and morphosyntax). I will cover topics such as word order and question formation. Unlike many related languages, Lak has developed person agreement in addition to the class/number agreement found in other languages of the family. Lak is an ergative language with two types of split ergativity. I will also discuss whether long-distance agreement exists in Lak. Finally, I will show that Lak has unusual agreement targets, including anaphors, adverbs, and locational phrases.
A Test of the Relation between Wh-Acceptability and Syntactic Surprisals in Recurrent Neural Networks
ABSTRACT. Introduction. There is a long-standing and heated debate about what causes island effects, i.e., the observation that speakers reject sentences that violate island conditions. Traditional Chomskyan generative scholars (Chomsky, 1964; Hu, 2019; Huang, 1982; Ross, 1967) hold that island effects reflect a syntactic constraint, whereas a growing number of researchers (e.g., Phillips et al., 2005) argue that they are shaped by human sentence processing. This study examines Chinese wh-island effects and uses experimental syntax techniques to explore the origins of island effects. It reports one experiment, in which surprisal values from Recurrent Neural Networks (henceforth RNNs) serve as the main criterion against which wh-acceptability is evaluated. Surprisal refers to the degree of “surprise” associated with an event (such as a word or symbol in an input sequence) relative to what was expected: the less probable the event, the higher its surprisal. The experiment is a wh-island rating task in which participants read sentences and rated their acceptability; surprisal values for the same sentences were then computed by language models.
Methods. In this experiment, surprisal is computed as the negative base-2 logarithm of the probability of the next word or token given the preceding word:
$$S(w_i) = -\log_2 p(w_i \mid w_{i-1})$$
Here $w_i$ is the i-th word in a sentence, $w_{i-1}$ is the word immediately preceding it, and $p(w_i \mid w_{i-1})$ is the probability of $w_i$ given $w_{i-1}$; $-\log_2$ is the negative logarithm to base 2. This formula yields the surprisal of each word in a sentence.
A total of 100 participants with no reported disabilities and no background in linguistics were recruited. The experiment adopted a 2×2×2 factorial design (islandhood × matrix wh-phrase × dependency), in which participants read sentences from the fully crossed conditions and rated the acceptability of the wh-phrase in each. Crossing islandhood and dependency yields the 2×2 design illustrated in Table 1. The difference in acceptability between conditions a and d cannot be reduced to dependency distance or syntactic structure alone: it reflects the interaction between the two, specifically the cost of long-distance dependencies that cross island boundaries. This interaction is defined as the island effect, given in (1).
(1) Island Effect = (a - d) - ((a - b) + (a - c))
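Formula (1) is a differences-in-differences score computed from four condition means. The sketch below assumes the labels a = short dependency/non-island, b = long/non-island, c = short/island, d = long/island (Table 1 is not reproduced here), and that the inputs are mean z-scored ratings. Because b and c enter (1) symmetrically, swapping their labels does not change the result. The function name `island_effect` and the numbers are hypothetical.

```python
def island_effect(a: float, b: float, c: float, d: float) -> float:
    """Island effect as defined in (1): (a - d) - ((a - b) + (a - c)).

    Assumed (hypothetical) condition labels:
      a = short dependency, non-island   b = long dependency, non-island
      c = short dependency, island       d = long dependency, island
    Inputs are mean z-scored acceptability ratings per condition.
    """
    total_drop = a - d        # overall cost of a long dependency into an island
    distance_cost = a - b     # cost of dependency length alone
    structure_cost = a - c    # cost of the island structure alone
    # A positive value means the combined cost exceeds the sum of the two
    # separate costs, i.e. a superadditive interaction (an island effect).
    return total_drop - (distance_cost + structure_cost)


# Purely additive costs: no island effect.
print(island_effect(1.0, 0.5, 0.5, 0.0))   # prints 0.0
# Superadditive cost: the long/island condition is worse than predicted.
print(island_effect(1.0, 0.5, 0.5, -1.0))  # prints 1.0
```

With toy means like these, a score near zero means the two costs simply add up, while a clearly positive score is the signature of an island effect.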
Because Chinese is a wh-in-situ language, short- and long-distance dependencies are created by controlling the position in which the wh-phrase appears.
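The surprisal formula in the Methods conditions only on the preceding word, so it can be illustrated with a bigram model. The sketch below estimates $p(w_i \mid w_{i-1})$ by maximum likelihood from a toy corpus of placeholder tokens; it is only an illustration of the formula. The models used in the study condition on more context, and a realistic bigram model would smooth its estimates rather than return infinite surprisal for unseen pairs. `bigram_surprisals` is a hypothetical helper.

```python
import math
from collections import Counter


def bigram_surprisals(corpus, sentence):
    """Surprisal S(w_i) = -log2 p(w_i | w_{i-1}) for every word after the first.

    p is a maximum-likelihood bigram estimate from `corpus` (a list of
    tokenised sentences). Unseen bigrams get infinite surprisal; a real
    model would smooth these estimates instead.
    """
    context_counts = Counter()  # how often each word occurs as a left context
    bigram_counts = Counter()
    for sent in corpus:
        for prev, cur in zip(sent, sent[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    surprisals = []
    for prev, cur in zip(sentence, sentence[1:]):
        count = bigram_counts[(prev, cur)]
        if count == 0:
            surprisals.append(math.inf)
        else:
            surprisals.append(-math.log2(count / context_counts[prev]))
    return surprisals


# Toy corpus of placeholder tokens (not the study's stimuli).
corpus = [
    ["<s>", "who", "left"],
    ["<s>", "who", "came"],
    ["<s>", "she", "left"],
    ["<s>", "who", "left"],
]
# p(who | <s>) = 3/4 and p(came | who) = 1/3, so the second word is more surprising.
print(bigram_surprisals(corpus, ["<s>", "who", "came"]))
```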
Results. The ratings were converted to z-scores; both the argument wh-phrase who and the adjunct wh-phrase why show strong island effects (Figure 1). In addition, the stimuli were fed to large language models (ChatGPT-4 and Gemma 2) to compute surprisal values, and the differences are presented in Figure 2. Wh-acceptability and surprisal values are significantly negatively correlated.
Discussion and Conclusion. The results indicate that (1) both types of wh-phrase in Chinese move at LF, where they are constrained by wh-island effects, and (2) although wh-acceptability and surprisal values are significantly negatively correlated, there is no evidence that one influences the other, nor that humans process these sentences linearly. We therefore maintain that wh-island effects reflect a syntactic constraint.
Selected References. Chomsky, N. (1964). Current issues in linguistic theory. The Hague: Mouton & Co. // Hu, J. (2019). Prominence and locality in grammar: The syntax and semantics of wh-questions and reflexives. Routledge. // Huang, C.-T. (1982). Logical relations in Chinese and the theory of grammar. PhD dissertation, MIT. // Phillips, C., Kazanina, N., & Abada, S. (2005). ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research. // Ross, J. (1967). Constraints on variables in syntax. PhD dissertation, MIT.
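The Results convert raw ratings to z-scores. A common practice in experimental syntax is to standardize each participant's ratings against that participant's own mean and standard deviation, which removes individual differences in how the rating scale is used. The abstract does not say which variant was applied, so the sketch below assumes per-participant z-scoring with the sample standard deviation. The function name and data are illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_participant(ratings):
    """Convert raw ratings to z-scores within each participant.

    `ratings` is a list of (participant, item, raw_rating) tuples; each
    participant needs at least two ratings. Ratings are centred on the
    participant's mean and divided by their sample standard deviation.
    """
    by_participant = defaultdict(list)
    for participant, _item, rating in ratings:
        by_participant[participant].append(rating)

    stats = {p: (mean(rs), stdev(rs)) for p, rs in by_participant.items()}

    z_scored = []
    for participant, item, rating in ratings:
        m, sd = stats[participant]
        z = 0.0 if sd == 0 else (rating - m) / sd  # constant raters get 0
        z_scored.append((participant, item, z))
    return z_scored


# Toy ratings on a 7-point scale (not the study's data).
ratings = [("P1", "a", 1), ("P1", "b", 3), ("P1", "c", 5),
           ("P2", "a", 6), ("P2", "b", 7)]
for row in zscore_by_participant(ratings):
    print(row)
```

After this step, condition means of the z-scores can be fed into formula (1).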
What’s in a verb class? Towards an evaluation of Manner/Result diagnostics
ABSTRACT. One of the most deeply researched ontological distinctions in lexical semantics is Manner/Result Complementarity: the claim that a given verb lexicalizes the manner or result of an action, but not both. While this work has generated a wealth of empirical data, different researchers rely on different diagnostics, implemented in different ways. The empirical picture is therefore unclear, making it difficult to apply the same set of considerations when extending the investigation to languages other than English or to additional verb classes (e.g. verbs of throwing or cooking; Beavers & Koontz-Garboden 2020). Our preliminary experimental investigation evaluates the robustness of some diagnostics proposed in the literature, allowing us to better understand what it is that they are probing and what makes a verb class a verb class.
ABSTRACT. This paper reexamines the debate surrounding the vP phase. Drawing on an argument against a DP-intervention account, combined with a novel approach to Object Shift, we argue that vP is an intermediate phase within the clause.
ABSTRACT. In Azerbaijani morphological causatives, the grammatical function (GF) of the causee depends on the transitivity of the base verb: it surfaces as OBJ (marked ACC) with intransitive bases and as OBJθ (marked DAT) with transitive bases. In causativized ditransitives, however, the causee cannot normally take ACC or DAT, as these are already assigned to the theme (OBJ) and the recipient/goal (OBJθ) of the base verb. Some accounts (e.g., Agayeva 2015; Ozturk 2023) suggest that in such cases the causee may be marked with an instrumental postposition, as in (2):
(2) Əli hədiyyə-ni Aslan-a Leyla ilə göndər-t-di (causative of ditransitive)
Ali present-ACC Aslan-DAT Leyla PP send-CAUS-PST.3
'Ali made Leyla send the present to Aslan' or 'Ali sent the present to Aslan by means of Leyla'
INSTR PPs with inanimate NPs usually indicate instruments, but it has been suggested that with animate NPs these PPs function as causees. However, it is also possible to have an INSTR PP with an animate NP alongside a non-causative verb, as shown in (3).
(3) Əli hədiyyə-ni Aslan-a Leyla ilə göndər-di
Ali present-ACC Aslan-DAT Leyla PP send-PST.3
'Ali sent the present to Aslan with Leyla'
I argue that (2) means 'Ali had someone send the present to Aslan by means of Leyla' rather than 'Ali made Leyla send the present to Aslan'. This is based on native-speaker intuitions: for (2) to be uttered truthfully, a fourth person must be involved in the sending event (i.e., someone other than Ali, Aslan, and Leyla). This shows a strong preference for reading instrumental PPs as mediators rather than as true causees. I therefore argue that INSTR PPs are adjuncts in Azerbaijani and do not serve as a grammatical strategy for marking causees. How, then, can the causee be expressed in the causative of a ditransitive? Two constructions are available: (i) the causee is not expressed, while all three arguments of the ditransitive base verb can be; (ii) the DAT argument of the base verb is omitted, and the causee appears in the DAT.
The DAT marking in a causativized ditransitive is therefore ambiguous: it may indicate either the causee or the recipient/goal. Crucially, only one DAT-marked argument may appear; the causee and the recipient/goal cannot both surface in the DAT.
(4) Əli hədiyyə-ni Aslan-a göndər-t-di
Ali present-ACC Aslan-DAT send-CAUS-PST.3
'Ali had someone send the present to Aslan' or 'Ali had Aslan send the present (to someone / a contextually salient referent)'
This paper accounts for the surface optionality of the causee through an analysis in LFG, whose modular architecture includes a distinct level of argument structure (a-structure). Within LFG, Lexical Mapping Theory (LMT) models the linking between semantic roles and GFs; argument fusion, in which the causee and an argument of the base predicate map onto a single GF, plays a central role in causativization. The analysis follows the approach to argument structure developed by Kibort (2001, 2004, 2006, 2007) and adopted by Dalrymple et al. (2019).
Many works in the LFG literature have addressed optionality through the argument–adjunct distinction and its reflection in a-structure (Bresnan 1982; Rákosi 2006; Reinhart 2002; Needham and Toivonen 2011; Asudeh and Giorgolo 2012). Following Asudeh and Giorgolo (2012), we assume that optional arguments, even if unexpressed in c- or f-structure, must still be represented at a-structure and s-structure. In causatives, then, an unexpressed causee must remain present in the a-structure for semantic interpretation. To account for the omitted causee in (4), I adopt Alsina and Joshi's (1991) parameters and Alsina's (1992, 1993) analysis of Chichewa, in which causee realisation varies with the transitivity of the base verb: the causee is OBJ with intransitives and OBL or omitted with transitives.
Alsina (1992) argues that the causee fuses with an argument of the embedded predicate, either the 'highest' or an 'affected' one, thereby relaxing the traditional fusion requirement. In Azerbaijani, this allows the causee to fuse with either the agent (highest) or the patient/theme (affected). I now outline the a-structure for a ditransitive causative in which the causee is omitted from c- and f-structure but still present in a-structure.
(5) [Table omitted: argument-structure mapping for the causative of a ditransitive with an unexpressed causee]
The mapping respects the Mapping Principle but aligns with the argument hierarchy defined by indices, not linear order. Following Kibort's valency template, arg₁ is assigned [-o] and maps to SUBJ; arg₂ gets [-r], and arg₃ receives [+o]. The causee (arg₄) fuses with the theme (arg₂), not with the embedded agent in the a-structure of the base predicate.
Although Kibort's universal template assigns [-o] to arg₄, fusion with the theme eliminates the embedded predicate's [-r] feature. Since internal arguments require [-r], arg₄ receives this feature and maps to OBJ, while arg₃, marked [+o], maps to OBJθ. Because the causee fuses with the theme rather than with the embedded agent, the agent is left unrealised. As noted above, the dative argument of the base ditransitive verb can instead be omitted; in that case the causee is expressed in the dative and fuses with the highest argument of the embedded predicate, unlike in (5). The analysis also extends to double causatives of transitives and to unexpressed DAT causees in causativised transitives.
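The feature-based mapping described above can be approximated by a toy greedy procedure. The sketch assumes the standard LMT decomposition of GFs (SUBJ [-r,-o], OBJ [-r,+o], OBLθ [+r,-o], OBJθ [+r,+o]) and the markedness hierarchy SUBJ > OBJ, OBLθ > OBJθ, and reads the Mapping Principle as "map each argument, in index order, to the least marked compatible GF that is still free". Fusion itself is not modelled: the fused arg₂/arg₄ is entered as a single argument carrying [-r]. Other LMT conditions (e.g., the Subject Condition) are ignored, and names such as `map_arguments` are hypothetical.

```python
# Toy Lexical Mapping Theory sketch: GFs decomposed into [±r] and [±o].
GF_FEATURES = {
    "SUBJ": {"r": "-", "o": "-"},
    "OBJ": {"r": "-", "o": "+"},
    "OBLθ": {"r": "+", "o": "-"},
    "OBJθ": {"r": "+", "o": "+"},
}
# Markedness hierarchy SUBJ > OBJ, OBLθ > OBJθ (lower number = less marked).
MARKEDNESS = {"SUBJ": 0, "OBJ": 1, "OBLθ": 1, "OBJθ": 2}


def compatible(spec, gf):
    """A partial feature spec, e.g. {"o": "-"}, fits a GF if no feature clashes."""
    return all(GF_FEATURES[gf][feat] == val for feat, val in spec.items())


def map_arguments(a_structure):
    """Map arguments, in index order, to the least marked compatible GF
    that is still free (each GF is used at most once)."""
    used, mapping = set(), {}
    for arg, spec in a_structure:
        candidates = sorted(
            (gf for gf in GF_FEATURES if gf not in used and compatible(spec, gf)),
            key=MARKEDNESS.get,
        )
        if not candidates:
            raise ValueError(f"no available GF for {arg}")
        mapping[arg] = candidates[0]
        used.add(candidates[0])
    return mapping


# Causative of a ditransitive with the causee (arg4) fused with the theme (arg2).
causative = [
    ("arg1", {"o": "-"}),        # causer
    ("arg2+arg4", {"r": "-"}),   # theme fused with causee
    ("arg3", {"o": "+"}),        # recipient/goal
]
print(map_arguments(causative))
# prints {'arg1': 'SUBJ', 'arg2+arg4': 'OBJ', 'arg3': 'OBJθ'}
```

The output reproduces the mapping stated in the text: arg₁ to SUBJ, the fused theme/causee to OBJ, and arg₃ to OBJθ.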
Apparent Syntactic Complexity in Korean Elderly Speech: Reassessing Predicate Ratios through Null Form Reconstruction
ABSTRACT. 1. Introduction
This study investigates how aging affects syntactic complexity in Korean speech, focusing on the use of predicates and null arguments. While aging is often associated with cognitive and linguistic decline, recent research suggests that older speakers may develop adaptive strategies to maintain effective communication. In languages like Korean, where null forms and ellipsis are common, surface-level simplicity may conceal underlying structural richness. To examine this, we analyze spontaneous speech from a large corpus of elderly speakers and reassess a widely used syntactic metric: predicate ratio per utterance. By including reconstructed null forms, we test whether high predicate ratios in elderly speech reflect true syntactic complexity or result from covert structural compression.
2. Background
Much of the prior literature on aging and language has emphasized lexical retrieval difficulties and processing speed declines, but less attention has been paid to syntactic change, particularly in typologically rich, non-Indo-European languages. Korean presents a compelling case due to its widespread ellipsis, argument drop, and agglutinative structure. In particular, null pronouns (pro-drop) and omitted arguments allow speakers to produce shorter utterances while retaining meaningful syntactic relations. In corpus studies, syntactic complexity is often measured by proxy metrics such as the number of predicates per sentence. However, these metrics can be skewed in Korean if null elements are not accounted for.
This study builds on earlier experimental findings on working memory decline in aging (Harris, ed., 2016; Hardy et al., 2020), integrating them with corpus-based insights to investigate whether older Korean speakers genuinely produce more complex syntax or instead rely more on context-dependent, pragmatically enriched forms. Large-scale spoken data allow us to observe naturally occurring language, avoiding the artificial constraints of laboratory tasks.
3. Data and Analysis
We analyze a large spoken corpus comprising approximately 550 hours of prompted spontaneous responses from 1,439 elderly Korean speakers aged 60 to 95 (Ok and Kim, 2024). The dataset includes over 250,000 utterances annotated in units called ecels, which segment postpositions, case markers, and grammatical particles according to Korean orthographic conventions. The dataset captures speech from various educational backgrounds, ensuring broad demographic representation.
Our analysis employed the Korean Structure Analyzer (ETRI), examining four main metrics:
• (1) Average number of ecels per Intent-Based Unit (IBU),
• (2) Average number of ecels per clause,
• (3) Predicate ratio per IBU (number of predicates / total ecels),
• (4) Null pronoun frequency per IBU.
To address limitations in surface-based predicate measures, we conducted a manual reconstruction of null arguments in a subset of 2,100 IBUs sampled from three age groups. Predicate ratios were then recalculated including these reconstructed forms to better reflect underlying syntactic structure.
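Metrics (1) and (3), and the recalculation with reconstructed null forms, can be sketched as follows. The `IBU` record is a hypothetical annotation format, not the output format of the ETRI analyzer. Two points are assumptions, since the abstract does not specify them: reconstructed nulls are added to the denominator of the predicate ratio, and corpus-level values are averages of per-IBU ratios (pooling counts across the corpus would be an alternative). Metric (4) is not modelled here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IBU:
    """One Intent-Based Unit in a hypothetical annotation format."""
    ecels: int                    # overt ecels in the unit
    predicates: int               # overt predicates in the unit
    reconstructed_nulls: int = 0  # null arguments restored by manual annotation


def ecels_per_ibu(ibus):
    """Metric (1): average number of overt ecels per IBU."""
    return mean(u.ecels for u in ibus)


def predicate_ratio(unit, include_nulls=False):
    """Metric (3): predicates / total ecels for one IBU.

    With include_nulls=True, reconstructed null forms are added to the
    denominator (an assumption about how the recalculation was done).
    """
    total = unit.ecels + (unit.reconstructed_nulls if include_nulls else 0)
    return unit.predicates / total


def mean_predicate_ratio(ibus, include_nulls=False):
    """Average of per-IBU predicate ratios."""
    return mean(predicate_ratio(u, include_nulls) for u in ibus)


# Two toy IBUs (illustrative numbers only).
sample = [IBU(ecels=10, predicates=3, reconstructed_nulls=2),
          IBU(ecels=8, predicates=3, reconstructed_nulls=1)]
print(ecels_per_ibu(sample))                              # prints 9
print(mean_predicate_ratio(sample))                       # overt forms only
print(mean_predicate_ratio(sample, include_nulls=True))   # nulls restored
```

Restoring nulls can only enlarge the denominator, so a group that omits more arguments sees its predicate ratio fall more, which is the logic behind the recalculation.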
4. Results and Discussion
4.1 Quantitative Simplification with Age, Elaboration with Education
The average number of ecels per IBU decreased with age (60s: 15.00; 70s: 14.02; 80s+: 13.42), suggesting syntactic compression (Welch’s F(2, 61,746) = 331.784, p < .001). Conversely, IBU length increased with education (elementary: 13.49; middle school: 13.59; high school: 14.56; college: 15.89; Welch’s F(3, 133,921) = 586.958, p < .001). Clause length showed the same patterns. These results are consistent with cognitive reserve theory, according to which education helps maintain linguistic elaboration in aging.
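The group comparisons above use Welch's F, a one-way ANOVA that does not assume equal variances across groups, which is why its denominator degrees of freedom are non-integer estimates. A minimal standard-library sketch of the usual Welch statistic follows; the function name and toy data are illustrative, and it is not the study's own analysis script, which is not described.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA. Returns (F, df1, df2).

    Each group is a list of observations; every group needs at least two
    observations and non-zero variance.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights n_i / s_i^2: larger, less variable groups count more.
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)
    grand_mean = sum(w_i * m_i for w_i, m_i in zip(w, means)) / w_sum

    between = sum(w_i * (m_i - grand_mean) ** 2 for w_i, m_i in zip(w, means)) / (k - 1)
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    correction = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = between / correction
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Toy data, not the corpus values.
print(welch_anova([[1, 2, 3], [4, 5, 6]]))  # prints (13.5, 1, 4.0)
```

With two groups, Welch's F equals the square of Welch's t, which offers a quick sanity check.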
4.2 Apparent Complexity: Predicate Ratio Patterns
Surprisingly, predicate ratios based on overt speech increased with age (60s: .282; 70s: .305; 80s+: .314) and decreased with education (from .317 for elementary to .270 for college; both p < .001). Taken at face value, this would imply that older or less-educated speakers produce more syntactically complex utterances, contrary to what cognitive aging research predicts.
4.3 The Role of Null Pronouns
To resolve this contradiction, we examined null pronoun use. The frequency of null forms increased with age (60s: .160; 70s: .172; 80s+: .186; F(2, 2097) = 15.828, p < .01) and decreased with education (from .191 for elementary to .155 for college; F(3, 2096) = 18.739, p < .001). When predicate ratios were recalculated to include null forms, the differences across both age and education groups shrank. Specifically, the F-value for age-related differences decreased by approximately 14.4% (from 35.761 to 30.596), and the decrease was even more pronounced for education-related differences, at 18.8% (from 32.945 to 26.750):
• Age: F(2, 2097) = 30.596, p < .01 (vs. 35.761, p < .01 without nulls)
• Education: F(3, 2096) = 26.750, p < .01 (vs. 32.945, p < .01 without nulls)
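The percentage reductions quoted above follow directly from the reported F-values as (F without nulls − F with nulls) / F without nulls. The short check below reproduces them; `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100 * (before - after) / before


# Reported F-values: without vs. with reconstructed null forms.
age = percent_reduction(35.761, 30.596)
education = percent_reduction(32.945, 26.750)
print(f"age: {age:.1f}%, education: {education:.1f}%")
# prints "age: 14.4%, education: 18.8%"
```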
Thus, the high predicate ratios in elderly speech stem from elliptical structures rather than from increased syntactic embedding or argument complexity.
4.4 Interpretation and Broader Implications
These findings suggest that syntactic ability, as measured through predicate density, remains relatively stable across age and education when covert forms are considered. Older and less-educated speakers are not necessarily using more complex syntax but instead relying more on discourse-driven economy: omitting arguments that can be inferred through shared knowledge. This strategy may compensate for cognitive load while maintaining communicative effectiveness.
Methodologically, the study highlights the limitations of surface-based syntactic measures in corpus research, especially in pro-drop languages like Korean. Predicate ratios that fail to account for null elements may overestimate complexity in speaker groups who rely more heavily on contextual cues.
From an applied perspective, these insights are crucial for AI and speech technology development. Elderly speakers' greater use of ellipsis and pragmatic dependency poses challenges for automatic parsing and understanding. Systems trained on written or overt speech may misinterpret elliptical expressions, particularly in underrepresented populations.