WT&D 2025: WINTER TEXT AND DISCOURSE CONFERENCE 2025
PROGRAM FOR TUESDAY, FEBRUARY 4TH

17:00-18:00 Session 4: Discourse in Education
Location: Alpine-Balsam
17:00
Assessment of Training Data for Asset-based Technology
PRESENTER: Jaclyn Ocumpaugh

ABSTRACT. Recent advances in generative AI bring the potential for culturally responsive intelligent tutoring systems (ITSs) tantalizingly close, but only if these systems can recognize the assets that students from diverse backgrounds bring to their learning. An important step towards this is ensuring that generative AI systems are trained to interact with students from multiple dialect backgrounds, such as African American Vernacular English (AAVE; Bailey et al., 2013) and Chicano English, sometimes called Mexican American English (Fought, 2002; Preston, 2009a; Thomas, 2019). Unfortunately, much of the training data that might be used to improve this effort is likely biased by social prejudices against these dialects. In this study, we present findings from 440 raters who were asked to rate 6 versions of an essay that had been labeled as an exemplar essay by the New York State Department of Education. The original essay was taken as a starting point (1a) and then modified to reduce its lexical and syntactic complexity (2a). Next, each essay was modified to include low levels of AAVE (1b and 2b) and high levels of AAVE (1c and 2c). Results show that raters reacted more strongly to the introduction of dialect features than to the changes to syntactic and lexical sophistication. Implications of this finding for training data and other practical applications, including the development of asset-based educational technologies (Ocumpaugh et al., 2024), will be discussed.

17:20
Putting the Brain Back in the Eye-Mind Link
PRESENTER: Megan Caruso

ABSTRACT. The eye-mind link theorizes a relationship between attention (visual focus) and internal cognitive processes. Eye movements are the key measure used to reflect attention, and have long been used to develop models of mental processing while reading. However, there is little research investigating the direct relationship between eye movements and the processing occurring in the brain while reading complex, real-world texts. We propose a multimodal approach to investigate the alignment of eye movements and brain activations and how it might be predicted by unfolding cognitive processes. We applied this method to a dataset of 76 participants who read long, connected texts while their eye movements and the hemodynamic responses in their brains were tracked using functional near-infrared spectroscopy (fNIRS). Participants answered comprehension questions and were probed for mind wandering throughout the reading task. We found that eye-brain alignment signals derived from language processing regions of the brain varied based on the participants’ cognitive state during reading.

17:40
An NLP investigation of difference and overlap between ELL and native speakers’ writing styles
PRESENTER: Maria Goldshtein

ABSTRACT. The goal of the current study is to use natural language processing (NLP) tools to examine variance and overlap in writing styles between English Language Learners (ELL) and native speakers. In the literature, ELLs are often characterized as a varied group of language users who rarely attain native-like linguistic behaviors. Native speakers (NS), however, are often used in research as a comparison group to ELL writers. We analyzed a large corpus (n = 38,020) of source-based argumentative essays written by middle and high school students in the United States. Propensity score matching (PSM) was used to create two matched datasets for NS and ELL students using potentially confounding demographic variables. NLP tools from the Suite of Automatic Linguistic Analysis Tools (https://www.linguisticanalysistools.org) were used to extract linguistic features at the lexical, syntactic, and discourse levels of language. Cluster and discriminant function analyses (DFA) were applied to categorize writing patterns for the full matched dataset, and for the NS and ELL datasets. Cluster analyses resulted in a 3-cluster model for the integrated dataset, a 5-cluster model for ELL writers, and a 3-cluster model for NS writers. DFA revealed patterns in which students exhibited writing styles with either more accessible or more sophisticated and precise language, more or less reliance on source texts, and more or less variety of sentence structures. Overall results indicated that ELL and NS groups vary in writing style internally and display some overlap, suggesting these groups are neither monolithic nor discrete.
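
As a rough illustration of the matching step described above, the sketch below pairs each ELL writer with the nearest unused NS writer by propensity score. The abstract does not say which matching algorithm, covariates, or caliper were used, so the greedy nearest-neighbor strategy, the `greedy_match` name, the caliper value, and the toy scores are all assumptions made for illustration.

```python
def greedy_match(treated, control, caliper=0.05):
    """Pair each treated unit with the closest unused control unit.

    treated, control: dicts mapping a unit id to a propensity score in [0, 1].
    caliper: largest score gap accepted for a match (illustrative value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)
    pairs = []
    # Visit treated units in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Toy propensity scores (invented for this sketch, not taken from the study).
ell = {"ell_1": 0.62, "ell_2": 0.35, "ell_3": 0.90}
ns = {"ns_1": 0.60, "ns_2": 0.33, "ns_3": 0.10, "ns_4": 0.58}
matched = greedy_match(ell, ns)
```

In this toy run, `ell_3` stays unmatched because no NS score falls within the caliper, which is how matching can shrink a sample to comparable pairs.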

18:00-20:00 Session 5A: Poster Session II and Reception
“It feels like we're not meeting the criteria”: Examining and mitigating the cascading effects of bias in automatic speech recognition in spoken language interfaces
PRESENTER: Kelechi Ezema

ABSTRACT. Researchers have explored how automatic speech recognition (ASR) systems perform differently across demographic groups (i.e., are biased), yet the impact of these biased systems on downstream natural language processing tasks is unexplored. We examined this question in the context of an AI-powered application that provides tutors automated feedback on the quality of their discourse during small group tutoring sessions. Study 1 found that the Whisper ASR demonstrated systematic bias against Black tutors relative to white tutors and that the performance gap was influenced more by their speech patterns (acoustic components) than by speech content (language). Study 2 investigated the downstream effect of ASR bias on automated classification of the quality of the tutorial discourse using the validated Academic Productive Talk Framework. We found that automated discourse classification models were less accurate for Black tutors when presented with ASR input, but there was no difference in accuracy for corresponding human transcripts. As a result, while Black tutors actually used more high-quality discourse moves than their white counterparts, this trend was obscured by ASR errors. To address this issue, in Study 3 we experimented with methods to reduce ASR bias, including ASR-augmented training and fine-tuning the ASR using speech from Black tutors. Our results indicated that fine-tuning reduced, but did not entirely eliminate, bias in ASR and the corresponding downstream consequences on discourse classification. We discuss the implications of our findings for AI-based spoken language interfaces that aim to provide accurate and unbiased assessments to improve real-world performance outcomes.
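
The abstract does not name the metric used to quantify the ASR performance gap. Word error rate (WER) is one common choice, so the sketch below assumes it: WER is computed as the word-level edit distance between a human reference transcript and the ASR output, divided by the reference length, and then averaged per group. The function names and group labels are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """samples: iterable of (group_label, reference, hypothesis) tuples."""
    per_group = {}
    for group, reference, hypothesis in samples:
        per_group.setdefault(group, []).append(word_error_rate(reference, hypothesis))
    return {group: sum(rates) / len(rates) for group, rates in per_group.items()}
```

Comparing the per-group means returned by `mean_wer_by_group` is one simple way to expose a gap of the kind Study 1 reports; it does not by itself separate acoustic from linguistic causes.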

Differences in sensitivity to situational and perceptual changes between hearing and deaf individuals when processing stories in American Sign Language
PRESENTER: Joe Magliano

ABSTRACT. Signed languages convey events via multiple articulators, such as hands, face, shoulder, and mouth, whereas for spoken language events are primarily conveyed orally, with some support from co-gestures. The differences between spoken and signed languages may make it challenging for native users of a spoken language to learn a signed language. The present study was conducted to assess differences in sensitivity to the situational structure of narratives between hearing individuals who are learning American Sign Language (ASL) and deaf individuals who are native users of ASL. Native users of ASL and students in an ASL program experienced stories that were conveyed in ASL. Participants engaged in an event segmentation task in which they indicated whenever they perceived that the events in the stories had changed. A situational continuity analysis was conducted to theoretically determine when story events changed, and a perceptual change analysis was conducted to assess changes in low-level perceptual features. Deaf participants’ segmentation judgments were significantly correlated with the situational change score, but not the perceptual change score. Conversely, for hearing individuals there was an interaction between the situational change score and the perceptual change score: the relationship between situational change scores and event segmentation judgments weakened as perceptual change increased. We interpreted these results as suggesting that hearing individuals attend to hand movements, rather than other dimensions of the speaker that have less perceptual change but are nonetheless important for conveying events. The implications of these results for theory and practice will be discussed.

Debunking learning myths: from scientific research to YouTube videos

ABSTRACT. Misconceptions about learning are widespread among educators and pre-service teachers, often resulting in ineffective practices. Attempts to challenge these beliefs with evidence-based texts often fail against the influence of personal experiences. Moreover, university students often struggle to interpret studies and have difficulty adopting evidence-based approaches. This study examines the prevalence of teaching and learning myths among university students and assesses a pilot intervention designed to debunk them through scientific literature and media dissemination. The intervention involved students from an Education Teaching program. Two groups (n=66) were assigned to the experimental condition and two others (n=44) to the control condition. Participants first completed a questionnaire, rating 59 statements on a 4-point scale (“True”, “Probably True”, “Probably False”, and “False”) and identifying the sources of their responses. The experimental group then created educational videos debunking assigned myths using scientific evidence. Afterward, all participants completed a post-intervention questionnaire to assess belief changes. Preliminary results revealed a high prevalence of misconceptions, with students incorrectly endorsing 70% of the pretest false statements as “true” or “probably true” (experimental group: M = 2.02, SD = .35; control group: M = 1.99, SD = .26). Post-intervention, the experimental group showed significant improvement in their scores (M = 2.87, SD = .61) compared to the pre-test and to the control group (M = 2.22, SD = .41), F(1, 108) = 40.26, p < .001, ηp² = .27. These findings demonstrate the effectiveness of combining scientific literature and digital media to correct widespread misconceptions among future educators.
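
The reported effect size can be checked against the reported F statistic using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the values in the abstract; the function name is an assumption, and the check says nothing about how the authors actually computed the value.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Values as reported in the abstract: F(1, 108) = 40.26.
eta_p2 = partial_eta_squared(40.26, 1, 108)
print(round(eta_p2, 2))  # 0.27, matching the reported effect size
```

The error degrees of freedom are also consistent with the sample: 66 + 44 = 110 participants in two conditions gives 110 - 2 = 108.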

Individual Differences in Thought Patterns and Their Influence on Phenomenology of Reading
PRESENTER: Püren Öncel

ABSTRACT. Individuals’ phenomenological experiences vary widely. While some people tend to experience highly vivid visual imagery, others may be more inclined to think in words. The reasons behind these variations are not well understood. Specifically, we know little about how individuals’ tendencies to engage in certain types of thoughts relate to their experiences during highly specific tasks, such as reading. Researchers have proposed that readers are often transported during highly engaging narratives, which is theorized to involve strong visual imagery and emotional responses. However, it is unclear whether these experiences are consistent across individuals and whether transportation is more likely for those who already tend to experience more visual or verbal thoughts. The current study examines whether and how phenomenological experiences of visual imagery and inner speech relate to each other and to transportation. We combined four existing datasets that employed similar methodologies. Individuals (n = 574) were probed multiple times to report the characteristics of their thoughts during reading (e.g., task-unrelated thought, visual imagery, verbal thinking, valence) and then completed the Transportation Scale to measure their immersion into the narrative, along with the Internal Representation Questionnaire to assess individual differences in visual and verbal thinking tendencies. Results suggest that experiences such as visual imagery and valence are significantly related to each other, to the tendency to experience visual imagery, and to transportation. These findings provide insights into the role of individual differences in how the phenomenological properties of thought manifest during reading and lay a foundation for future research.

Cognitive reserve effects on cognitive and discourse production processing in healthy aging
PRESENTER: Andrea Marini

ABSTRACT. A total of 105 adults formed three groups according to their levels of cognitive reserve (CR) as assessed by the Cognitive Reserve Index Questionnaire (CRIq; Nucci et al. 2011), which covers work activity, leisure time and education history: 35 with extremely high CR (mean age: 70 years); 35 with high CR (mean age: 72 years); and 35 with mild to moderate CR (mean age: 73 years). The three groups did not differ in age. They were enrolled in a wider project entitled “Standardization of the multilevel procedure for discourse analysis and training program for narrative production in healthy adults” supported by PRIN2022PNRR, Proj. Number P2022M9JCM. They received tasks assessing phonological working memory skills (Backward Digit Recall), sustained attention (Trail Making), lexical selection (Phonological Fluency) and a narrative discourse production task requiring them to describe the stories portrayed in five picture stimuli. These discourse samples were analyzed with a multilevel procedure of discourse analysis (Marini et al. 2011). Adults with extremely high CR produced stories with more words per utterance, more grammatically complete sentences, and fewer false starts and semantic errors than adults with mild to moderate CR. Furthermore, they also produced fewer errors of local and global coherence and more informative words than adults with either high or mild to moderate CR, who did not differ from each other. A similar advantage was also found on a task assessing sustained attention and inhibitory control in lexical selection. This suggests that CR enhances sustained attention, inhibitory control and specific aspects of discourse production in healthy aging.

Could word lists and short films help young students to write informative texts?

ABSTRACT. Writing a text requires the coordination of diverse skills and knowledge. Among primary-grade students, spelling takes up a large part of their attention when writing texts (De Week and Fayol, 2009). In addition, students do not always have extensive knowledge of the themes imposed in a writing task, which has a significant impact on the quality of their texts and their organization (Hebert, Bazis, Bohaty, Roehling, Nelson, 2021). This study thus tested two forms of support for writing informative texts with 5th grade students (n=179): word lists and short films. The word lists were expected to help participants with spelling and vocabulary, while the short films on writing themes were expected to help with structure and coherence. The participants wrote six informative texts on different themes (animals, transportation, plant, sandwich, sleeping late and eating fast food), according to three text structures (2 comparative, 2 sequential and 2 cause-effect). For each text, the students had either a word list or a short film as support. Scoring of the various writing aspects involved agreement between judges. The descriptive and multilevel analyses show that the themes had the most impact on the overall quality of the texts, regardless of the support. However, the short films had a small effect, though a larger one than the word lists, on several writing aspects: syntax, structure and coherence, grammar and even spelling of words. The discussion will focus on possible explanations and also on contributions for practice and further studies.

Evaluating the Effectiveness of Tier 2 and 3 Reading Interventions for Middle School Readers

ABSTRACT. This study explored foundational reading skills—word recognition, decoding, vocabulary, morphology, sentence processing, reading efficiency, and comprehension—among middle school students (n=752; Grades 5-8) in a rural Midwestern town in the United States. The research identified grade-level differences, with Grade 5 showing significant growth compared to Grade 8. We implemented a diagnostic reading assessment system in partnership with a middle school literacy coach and a reading researcher during the 2023-2024 school year to inform and guide targeted interventions. The study's primary research question investigated the impact of Tier 2/3 interventions, delivered in small groups and individually, on various reading-related outcomes.

Though data analysis is ongoing, initial findings suggest that students in Grades 5 (n=186; M=4.35; M=6.09) and 6 (n=183; M=3.69; M=4.53) demonstrated greater progress in word recognition and reading efficiency compared to students in Grades 7 (n=186; M=3.44; M=0.52) and 8 (n=197; M=0.01; M=-1.80). This suggests that earlier adolescent reading intervention may be more effective in improving foundational reading skills. The study employs comprehensive interventions focusing on fluency, comprehension, word study, and vocabulary, drawing on established research related to effective reading instruction.

Educational implications include the potential for structured intervention systems to significantly enhance reading skills in middle school students, highlighting the need for continuous support and professional development for educators. This research underscores the importance of addressing reading challenges and tailoring interventions to student needs to increase student achievement.

Impact of Tier 2 Reading Intervention on 7th-Grade Multilingual Learners
PRESENTER: Sarah Kocherhans

ABSTRACT. This ongoing study assesses the effectiveness of a comprehensive Tier-2 reading intervention implemented with 7th-grade multilingual learners (MLLs) through a partnership between a university reading clinic and a middle school during the 2023-24 and 2024-25 academic years. The primary research question examined the impact of small-group Tier-2 interventions on three reading-related outcomes: a Reading Level Assessment (RLA), the Test of Silent Contextual Reading Fluency (TOSCRF-2), and Star Reading. Participants include 17 MLLs who receive English-only instruction (no L1 support). At the start of the 2023-24 school year, participants read 4 years below grade level, on average. Measures include the RLA, which reports independent and instructional reading levels; the TOSCRF-2, which measures reading comprehension and silent reading; and Star Reading, a computer-adaptive measure of reading and language. We also include WIDA ACCESS scores as an indicator of English proficiency growth. Students’ RLA instructional reading level improved by 1.21 years (3.18 to 4.39). TOSCRF-2 scores improved by 4.75 points (83.0 to 87.75). Star Reading grade equivalency scores increased marginally from 2.85 to 2.94. WIDA ACCESS scores decreased from 2.99 to 2.62, and students remained in the Emerging stage of proficiency. These results illustrate that the Tier-2 intervention substantially advanced reading outcomes among multilingual learners. However, grade-level measures did not show enough improvement to support students in grade-level, content-area reading. With intervention modifications specifically targeting language acquisition, this ongoing study now seeks to determine methods that increase MLLs’ ability to proficiently read complex, grade-level texts.

AI Teammates and Inclusion Analytics: Revolutionizing Equity in STEM Collaboration
PRESENTER: Nia Nixon

ABSTRACT. Collaborative teams in science, technology, engineering, and mathematics (STEM) are critical to driving innovation and societal transformation. Yet, navigating the complex landscape of scientific progress and technological evolution confronts a persistent challenge: the underrepresentation of diverse voices in STEM teams. While team diversity is known to enhance creativity and innovation, homogeneity within STEM fields continues to limit this potential. Women and underrepresented racial minorities (URMs) face barriers in STEM education, reducing their access to high-demand STEM careers and hindering socioeconomic mobility. Despite the potential of team environments to foster equitable interactions, these spaces often fail to be inclusive. Women and URMs frequently encounter unique barriers such as feeling unwelcome, having limited opportunities to contribute, and lacking interpersonal power, which can undermine their sense of belonging and diminish their engagement in STEM over time.

This research seeks to address these issues by (a) capturing dynamic communication patterns within student STEM teams; (b) examining their impact on students’ psychological experiences and performance, with a focus on underrepresented groups; and (c) developing robust inclusion analytics and AI-driven teammates that promote inclusive communication dynamics. These AI teammates will use real-time data to identify and mitigate exclusionary behaviors and microaggressions, providing prompts to ensure all voices are heard. This innovative approach aims to create a more inclusive and equitable environment in STEM, enhancing the sense of belonging and engagement for all students.

Supporting Teacher Noticing in Classroom Discourse with Generative AI Analytics
PRESENTER: Fanjie Li

ABSTRACT. This study presents an early proof-of-concept of analytics to support teacher noticing and building-on of student contributions in K-12 science discussions. Using prompt chaining, reiterated pre-pending, personas, goals and task rules, a generative large language model protocol was developed to surface diverse student sense-making resources employed during science talk activities and provide actionable pedagogical insights. The work took an asset-based approach to actively engage with students’ personally relevant (even if non-canonical) understandings, selecting Claude 3.5 Sonnet due to its explicit inclusion of non-Western perspectives during alignment training. The protocol was applied to a transcript of an elementary classroom discussion on “Do plants grow every day?” The output—sense-making themes tied to specific dialogue instances and pedagogical suggestions—was represented in an interactive Figma prototype shared with nine equity-minded STEM education scholars. The approach taken successfully (i) identified everyday funds of knowledge contributed by students, (ii) followed culturally-sustaining discourse practices by highlighting expansive pedagogical opportunities based on student ideas, and (iii) provided entry points for incorporating students’ home and community strengths into the classroom. For example, one theme centered on the different ways students recognized growth, with a suggested follow-up activity exploring measurement practices from different contexts and cultures. Future work will explore the application of this protocol to various data sources, including undiarized audio, and its use in video-centric pre-service teacher workshops to cultivate noticing practices.

18:00-20:00 Session 5B: Demonstrations
Detection and support of states of collaboration through discourse using the Jigsaw Interactive Agent (JIA)
PRESENTER: Peter Foltz

ABSTRACT. Jigsaw activities are a type of collaboration where students first learn different aspects of a topic to develop initial expertise and then come together to pool their knowledge and jointly solve more complex problems. However, students often do not engage in much discussion and could benefit from group- and individual-level support in real-time. The Jigsaw Interactive Agent (JIA) is aimed at boosting more effective knowledge sharing and brainstorming, scaffolding discussion and enhancing engagement to develop productive collaborative discussions (e.g., Järvelä & Hadwin, 2013).

JIA records the audio of small group jigsaw interactions and provides text-based feedback to support the group. The JIA agent architecture detects states of collaboration by performing automated speech recognition, speaker diarization and then analyzing student discourse for collaboration markers. JIA incorporates both a rule-based and an LLM-based agent for detecting states and providing feedback. Agent feedback is oriented around supporting the group in maintaining productive uncertainty (e.g., Chen, 2020; Watkins & Manz, 2022). This includes problematizing by contrasting aspects of students’ talk to seed uncertainty, providing social support for ongoing collaborations, connecting by inviting others into conversations to make connections between ideas, and stabilizing the conversation by taking stock of where the conversation is and areas of convergence or divergence within the group.

The demo will show a version of JIA that is being used in a middle school curriculum unit to support students in addressing the driving question of “How do humans and computers collaborate to moderate online gaming communities?”
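
To make the rule-based side of the architecture described above more concrete, here is a minimal sketch of how diarized, transcribed utterances might be mapped to one of the feedback types named in the abstract. It is not JIA's implementation: the `Utterance` class, the cue phrases, the rules, and the `choose_feedback` name are all hypothetical, and the sketch covers only three of the four feedback types (it omits social support).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # label assigned by speaker diarization
    text: str     # words produced by speech recognition


# Hypothetical cue phrases; a real system would need validated markers.
UNCERTAINTY_CUES = ("maybe", "not sure", "i wonder", "what if")


def choose_feedback(utterances, group_members):
    """Return a (feedback_type, quiet_members) pair for one window of talk."""
    turns = Counter(u.speaker for u in utterances)
    quiet = [m for m in group_members if turns[m] == 0]
    if quiet:
        # Someone has not spoken: invite them into the conversation.
        return "connecting", quiet
    voiced_uncertainty = any(
        cue in u.text.lower() for u in utterances for cue in UNCERTAINTY_CUES
    )
    if not voiced_uncertainty:
        # Nobody is questioning anything: seed some uncertainty.
        return "problematizing", []
    # Uncertainty is already present: take stock of where the group stands.
    return "stabilizing", []


window = [
    Utterance("S1", "I think the bot flags bad words"),
    Utterance("S2", "Maybe people review the flags too"),
]
print(choose_feedback(window, ["S1", "S2", "S3"]))  # ('connecting', ['S3'])
```

An LLM-based agent, as mentioned in the abstract, would presumably replace the keyword rules with model judgments over the same utterance window; that part is not sketched here.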

Hybrid Human AI Tutoring Technology (HAT) Demo
PRESENTER: Sidney D'Mello

ABSTRACT. This demo will showcase an automated discourse analysis platform, HAT (Hybrid Human AI Tutoring Technology), that analyzes tutoring sessions and produces discourse analytics in the form of visualizations and insights that help tutors enhance their tutoring practices, increase their student engagement, and ultimately improve student achievement.

HAT uses automatic speech recognition and discourse analysis models (Figure 1) based on academically productive talk (APT; Michaels and O’Connor, 2015) to process audio recordings of tutoring sessions (obtained with necessary permissions and with safeguards in place) and produces visualizations and insights of components of APT called talk moves, such as encouraging rigorous thinking, expressing content knowledge, and supporting the learning community.

HAT includes the following user interface features:
● a single-session feedback page with an overview of talk patterns and moves along with key moments and highlights from the session (Figure 2)
● an annotated, interactive timeline of talk moves synchronized with a video session recording and transcript (Figure 3)
● a scatterplot showing how tutors’ use of talk moves changes over time grouped by different student periods (Figure 4)
● a student talk tree (Figure 5), with which we are also experimenting as a different way to visualize collaborative discourse

HAT holds promise for improving tutor discourse practices while advancing capabilities in applying computational approaches to discourse processing. It is currently deployed in a high-dosage math tutoring context in partnership with a nationally recognized tutoring service provider, Saga Education. Initial results demonstrate significant improvements in tutor talk moves (Sawaya et al., in preparation).

Intelligent Texts for Enhanced Lifelong Learning (iTELL)
PRESENTER: Wesley Morris

ABSTRACT. Intelligent Texts for Enhanced Lifelong Learning (iTELL) is a framework for generating interactive texts in any content domain. Content creators compose and publish iTELL texts using a user-friendly, WordPress-style interface. The resulting web application contains constructed response items (CRIs) and summary writing activities, both of which feature automated scoring and feedback mechanisms.

CRIs are automatically scored by an ensemble classification model (Morris et al. 2024a), and summaries are scored by a finetuned LLM (Morris et al. 2024b). Learners who write a low-scoring summary are prompted to reread a portion of the text, selected based on their writing and reading patterns. Learners then engage in a structured dialog with an LLM agent before revising their summary and proceeding to the next page.

In recent exit surveys, learners were overwhelmingly positive about their experience with iTELL (Crossley et al. 2024; Morris et al. 2024a). In a study of learning outcomes, we found greater learning gains for students who used iTELL compared to those who didn't, and these gains were greater for students with lower scores on the pretest (Crossley et al. 2024). These findings indicate that iTELL is an effective learning tool especially for students who start reading with less background knowledge.