Determining Rater and Test-Retest Reliability of Discourse Measures in the Spoken Personal Narratives of People with Aphasia

Establishing a stable baseline and high rater reliability before treatment enables more accurate judgments of treatment efficacy. We present two studies that separately evaluated rater and test-retest reliability of discourse measures in aphasia: NEURAL Research Lab (NRL) and Language Underpins Narrative in Aphasia (LUNA). NRL recruited 25 persons with chronic aphasia (PWA; 3 excluded for missing data) and 24 prospectively matched adults without brain damage (NBD; 1 excluded for poorer performance), and measured the percentage of correct information units (%CIUs). LUNA recruited 28 participants with chronic aphasia and measured %narrative words and story grammar (a macrostructure-level measure). In both studies, two raters each analysed 50% of the transcripts, and test-retest reliability was calculated between two timepoints. For rater reliability, 10% (LUNA) and 20% (NRL) of transcripts were randomly selected. Statistical analyses differed by measure: intraclass correlation coefficients (ICCs) for word-level measures and kappa for story grammar. Intra-rater reliability at the word level was excellent in LUNA (it was not computed for NRL). Inter-rater reliability at the word level was good-to-excellent in both studies. LUNA’s macrostructure-level measure showed good-to-very-good intra-rater reliability and moderate inter-rater reliability. For test-retest reliability, LUNA’s word-level measure showed good-to-excellent reliability, and its macrostructure-level measure showed moderate-to-good reliability. In NRL, when measures were averaged across tasks, test-retest reliability was good-to-excellent for both the PWA and NBD groups, although it ranged from poor to excellent when evaluated by task. In both studies, rater reliability was high at the word level. LUNA’s macrostructure measure was analysed reliably within raters but was less reliable between raters and across timepoints. Both studies found that word-level variables were highly reliable in the aphasia groups. Notably, in NRL, reliability was lower for the NBD group and varied by narrative type.
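The abstract names the reliability statistics but does not show how they are computed. The following is a minimal sketch under stated assumptions, not the studies' actual analysis: %CIUs is taken as CIUs divided by total words, times 100; the ICC is assumed to be ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss); and "kappa" is assumed to be unweighted Cohen's kappa for two raters' story grammar codes. The function names `percent_cius`, `cohen_kappa`, and `icc_2_1` are hypothetical.

```python
from collections import Counter


def percent_cius(cius: int, total_words: int) -> float:
    """%CIUs: correct information units as a percentage of all words (assumed definition)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * cius / total_words


def cohen_kappa(rater1: list, rater2: list) -> float:
    """Unweighted Cohen's kappa for two raters' categorical codes (assumed kappa variant)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed agreement: proportion of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from the product of each rater's marginal category proportions.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in c1.keys() | c2.keys())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def icc_2_1(scores: list[list[float]]) -> float:
    """Hypothetical ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score for transcript i from rater (or timepoint) j.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Partition total variance into transcripts (rows), raters (columns), and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom
```

With the hypothetical table `[[1, 2], [2, 3], [3, 4]]`, where the second rater always scores one point higher, `icc_2_1` returns 2/3 rather than 1. This illustrates why the choice of ICC form matters: an absolute-agreement ICC penalises a systematic offset between raters or timepoints, whereas a consistency ICC would not.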

