Title: Prompt-Based Adaptation of Vision Language Models for Clinical Pain Note Generation from Neonate Cry Sound

Conference: IEEE CBMS 2026

Tags: BLIP-2 model, Clinical Note Generation, Cry Audio Analysis, Few-Shot Prompting, Health Informatics, Medical AI, Neonatal Pain Assessment and Vision-Language Models

Abstract: Accurate neonatal pain assessment remains challenging in clinical care, where documentation must be both timely and interpretable. We present a prompt-based method that adapts BLIP-2 to generate clinical pain notes from neonatal cry sounds using expert-guided pain features. Cry recordings are converted to log-mel spectrograms, providing a visual representation of pain-related acoustic structure. BLIP-2 processes these spectrograms using a pretrained visual encoder and query-based cross-modal fusion, enabling image-conditioned language generation without task-specific retraining. With few-shot prompting, exemplar spectrograms paired with clinically meaningful 'pain' and 'no pain' descriptions guide the model to produce structured, human-readable notes that include an assessment outcome and salient cues such as high-frequency emphasis, intensity concentration, and temporal irregularity. Experiments show the framework produces consistent pain assessments under limited supervision, supporting AI-assisted neonatal pain documentation and decision support. The novelty of this paper is that it extends vision-language prompting into a clinical documentation setting by using spectrograms derived from neonate cries not only for pain classification, but also for generating a structured clinical pain note.
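The abstract's first processing step, converting a cry recording to a log-mel spectrogram, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the frame size (`n_fft=1024`), hop length (`hop=256`), number of mel bands (`n_mels=64`), the HTK-style mel formula, and the 16 kHz sample rate in the usage note are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Each filter is the positive part of the lower of the two slopes.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-mel spectrogram in dB, shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        # Zero-pad very short inputs so at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Floor the power before the log to avoid log(0).
    log_mel = 10.0 * np.log10(np.maximum(mel_power, 1e-10))
    return log_mel.T
```

As a usage example, `log_mel_spectrogram(audio, 16000)` on a one-second mono array returns a 64-band matrix that could be rendered as an image and passed to a vision-language model. How the paper scales, colors, or resizes the spectrogram before feeding it to BLIP-2 is not described in the abstract, so that step is left out here.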
