The registration desk will be on the ground floor of ICAM (Oituz, 4)
11:40 | Exploring Compression as a Proxy for Mineability in LLM-Generated Text ABSTRACT. Recent advancements in large language models (LLMs) have led to a surge of interest in evaluating the quality and usability of their outputs, particularly for information extraction and downstream analytics. However, existing evaluation methods often rely on costly human judgments or complex task-specific metrics. This paper investigates whether compression rate—a simple, model-agnostic measure—can serve as a proxy for the mineability of LLM-generated text. We generate text using varying sampling parameters (temperature and top-p) across two prompt types: product reviews and artifact descriptions. We compute compression rates using ZIP and Huffman algorithms, and evaluate mineability through perplexity based on n-gram language models (n = 2 to 8), both with and without stop-word removal. Our results reveal a consistent inverse correlation between compression rate and perplexity. This relationship strengthens with higher-order n-grams and with removing stop words, suggesting that compression captures underlying structure and predictability aligned with mineability. We further observe that ZIP compression is more sensitive to parameter changes than Huffman, and that artifact prompts yield more consistent patterns than product reviews. These findings support the use of compression as a lightweight indicator of structure and mineability in generated text. |
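As a quick illustration of the ZIP-style measure this abstract describes: the compression rate of a text can be computed as the ratio of compressed size to raw size, where lower values indicate more redundant, structured text. The function below and its sample strings are our own sketch, not the authors' code.

```python
import zlib

def compression_rate(text: str) -> float:
    """Ratio of compressed to raw byte length; lower means more redundant/structured text."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level=9)) / len(raw)

# Highly repetitive text compresses far better than varied, unpredictable text.
structured = "the product works well and arrived on time. " * 40
varied = "Quantum zebras juggle mauve kaleidoscopes beside vexed fjords nightly."
assert compression_rate(structured) < compression_rate(varied)
```

Under the paper's hypothesis, texts with lower compression rates should also yield lower n-gram perplexity, i.e., be more mineable.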
12:00 | OCWhy: Retrieval-Augmented Question Answering over Open CourseWare Lectures PRESENTER: Mihai Dascalu ABSTRACT. Students frequently seek learning materials for tests, exams, and assignments. To facilitate this process, we developed a Retrieval Augmented Generation (RAG) system that combines a Large Language Model with a Knowledge Base, enabling efficient access to course-related information. This study evaluates the utility of information sources available through the Open CourseWare lectures from the Faculty of Automatics and Computer Science, POLITEHNICA Bucharest. Various retrieval system design choices were refined, including document chunking strategies and reranking approaches, while highlighting corpus limitations. Our experiments show that larger token window sizes, header-level reranking, and course-specific retrieval improve retrieval performance. The adequacy of the collected information was evaluated using multiple benchmarks, namely True/False statements and multiple-choice questions. Results show consistent performance improvements across most model and dataset combinations when using retrieval augmentation, with the highest gains observed in domain-specific technical content. |
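The chunking design choice this abstract evaluates can be sketched as a sliding token window with overlap, so that no passage is cut off at a chunk boundary. The window and overlap values below are illustrative only (real systems typically split on tokenizer tokens, not whitespace words), and the function is our own sketch, not the OCWhy implementation.

```python
def chunk(words: list[str], window: int = 128, overlap: int = 32) -> list[list[str]]:
    """Split a word sequence into fixed-size windows that overlap by `overlap` words."""
    step = window - overlap
    return [words[i:i + window] for i in range(0, max(len(words) - overlap, 1), step)]

doc = [f"w{i}" for i in range(300)]
chunks = chunk(doc, window=128, overlap=32)
assert all(len(c) <= 128 for c in chunks)
assert chunks[1][0] == chunks[0][96]  # consecutive chunks share a 32-word overlap
```

Larger windows preserve more context per retrieved chunk at the cost of coarser retrieval granularity, which is the trade-off the abstract reports on.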
12:20 | CrossRead: An NLP Pipeline for Identifying Similar News Articles Across Multiple Sources PRESENTER: Victor-Miron Boiangiu ABSTRACT. The rate at which fake news articles are written and disseminated on social media is alarming, posing a significant threat to both national security and individual well-being. This paper helps mitigate the propagation of fake news by empowering individuals to fact-check articles against those from trustworthy news organizations, on a platform that helps them find alternative sources and identify discrepancies between the articles. The backbone of the solution is a three-stage processing pipeline that processes the article, fetches alternative sources, and generates the final similarity report. For fine-tuning the models in the pipeline, two new corpora were created, as no existing datasets were available for Romanian. One corpus consisted of synthetically generated search queries for a given article, whereas the second consisted of human annotations of pairs of news articles, labeled as either similar or not similar. Our pipeline, named CrossRead, enables users to easily compare sources and quickly fact-check articles while working reliably with articles in Romanian. The presented platform also constitutes an excellent base for a more feature-rich solution, with numerous improvements possible to assist its users in their search for the truth. |
12:40 | LLMic: Building a Romanian Foundation Language Model PRESENTER: Mihai-Valentin Dumitru ABSTRACT. Recent advances in Large Language Models (LLMs) have shown remarkable capabilities across various tasks, with commercial models leading the way. While open models usually operate at smaller scales due to constraints on available corpora and hardware resources, they maintain competitiveness through specialization and fine-tuning. However, a significant challenge persists: the under-representation of low-resource languages in open datasets results in weak model capabilities in many languages. In this paper, we document the complete process of pretraining a foundation model for Romanian, a low-resource language, including corpus construction, architecture selection, and hyperparameter optimization. As part of this work, we introduce FuLG, a hundred-fifty-billion-token Romanian corpus extracted from CommonCrawl, alongside a 3-billion-parameter bilingual model, LLMic. Our evaluation shows that it is worthwhile to train language-specific models for specialized tasks, achieving results comparable to other much larger open and closed models. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in the English-to-Romanian translation task. We hope through this work to advance the standing of the Romanian language in the world of LLMs. |
14:00 | Synthetic Data Generation with LLMs PRESENTER: Andreea Dutulescu ABSTRACT. This tutorial provides a technical overview of synthetic data generation using Large Language Models (LLMs), focusing on core methodologies and their integration. Synthetic data has become an essential tool for addressing key limitations in the availability, cost, and distributional coverage of manually annotated datasets. It enables scalable experimentation, facilitates data augmentation in low-resource settings, and supports iterative model refinement. The session begins with a discussion of generation methods and filtering strategies designed to enforce quality constraints. Next, the tutorial examines practical use cases. These include alignment tuning, where synthetic datasets are used to steer model behavior; inference-time augmentation, where generated exemplars support few-shot generalization or contextual adaptation; and self-improvement workflows, where models contribute to their iterative training through synthetic supervision. |
15:50 | PyStash: Retrieval-Augmented Generation Pipeline Context Aware Fine Tuning PRESENTER: Iulian-Teodor Deac ABSTRACT. Retrieval-Augmented Generation (RAG) pipelines improve the factual consistency of large language model (LLM) outputs by grounding responses in external documents. However, most existing RAG implementations rely on fixed retrieval settings, which cannot adapt dynamically to query complexity, user intent, or document structure. This work presents PyStash, an extensible, modular RAG platform that integrates a novel context-aware optimisation algorithm for adaptive retrieval parameter tuning. The system automatically generates synthetic question-answer pairs from the user-provided corpus to optimise retrieval depth (top-k) and semantic similarity thresholds, enhancing the relevance and efficiency of context selection. PyStash supports document management, multi-model evaluation, directory-level isolation of configurations, and traceable chunk-level citations, all accessible through a graphical user interface. Experiments across multiple open-weight LLMs show that the proposed optimisation mechanism reduces inference latency and irrelevant references while maintaining or improving answer quality. |
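The adaptive tuning this abstract outlines can be sketched as a small search over retrieval depth (top-k) and a similarity threshold, scored against synthetic question-answer pairs: reward configurations that retrieve the gold chunk while lightly penalising oversized context. Everything below (function names, the scoring heuristic, the toy similarity scores) is our own assumption, not PyStash's API.

```python
from itertools import product

def retrieve(scores: dict[str, float], top_k: int, threshold: float) -> list[str]:
    """Return the top-k documents whose similarity score clears the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc for doc, s in ranked[:top_k] if s >= threshold]

def tune(qa_pairs, top_ks, thresholds):
    """Grid-search (top_k, threshold): maximise gold-chunk recall, penalise context size."""
    best, best_cfg = float("-inf"), None
    for k, t in product(top_ks, thresholds):
        hits, retrieved_total = 0, 0
        for scores, gold in qa_pairs:
            got = retrieve(scores, k, t)
            hits += gold in got
            retrieved_total += len(got)
        score = hits / len(qa_pairs) - 0.01 * retrieved_total / len(qa_pairs)
        if score > best:
            best, best_cfg = score, (k, t)
    return best_cfg

# Each pair: (chunk -> similarity score, gold chunk id) from a synthetic QA set.
pairs = [
    ({"a": 0.9, "b": 0.4, "c": 0.1}, "a"),
    ({"a": 0.2, "b": 0.8, "c": 0.7}, "b"),
]
assert tune(pairs, top_ks=[1, 2, 3], thresholds=[0.0, 0.5]) == (1, 0.0)
```

On this toy data, top-k of 1 already achieves full recall, so the size penalty steers the search toward the smallest context; real corpora would of course favour deeper retrieval.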
16:10 | ContaGPT: A Domain-Adapted LLM for Romanian Financial and Accounting Applications PRESENTER: Cezar Tudor ABSTRACT. Large language models (LLMs) perform well across domains but often struggle in low-resource languages and specialized fields. We introduce ContaGPT, a Romanian domain-adapted LLM fine-tuned for financial and accounting-related tasks. Based on Microsoft’s Phi-4 model (14B parameters), ContaGPT was trained with Quantized Low-Rank Adaptation (QLoRA) on curated Romanian fiscal documents, Enterprise Resource Planning (ERP) manuals, and accounting guides. To improve factual grounding, we integrated ContaGPT into a hybrid Retrieval-Augmented Generation (RAG) pipeline combining BM25 and embedding-based retrieval with reranking. Evaluation on 100 real-world Romanian financial queries—via both preference-based and metric-based human review—shows ContaGPT significantly outperforms the base Phi-4 in correctness, relevance, and user preference. Despite its improved performance, ContaGPT remains efficient: it was trained on consumer GPUs and runs at 4-bit precision, requiring only 8GB of memory for inference. ContaGPT illustrates how practical, low-cost techniques can adapt open-source LLMs to low-resource languages and professional domains. |
16:30 | Finetuned Llama-3 based Solution for Specific Information Retrieval with Enhanced Reliability ABSTRACT. Our study investigates solutions for developing a cost-effective chatbot to be used by an organization in need of an automated question answering support system. Many organizations (in particular, universities) seek AI-based solutions capable of providing students with university-life information in an automated manner. However, such solutions are often based on complex architectures and may be prohibitive for institutions with limited resources. In our study, we examine how different models, context configurations, and personas (instructions meant to guide the system in terms of the answer purpose or tone) affect customer satisfaction regarding answers. We evaluated three Llama-3 variants fine-tuned across different combinations of training parameters, extra context provision and personas. Our solution focused on university applicability, namely for the Technical University of Cluj-Napoca. Its performance was measured through customer satisfaction scores obtained from university students who interacted with the chatbot and rated their experience. Results demonstrate that fine-tuned models with 1000 iterations achieved up to 85% customer satisfaction when combined with both extra context and persona features, compared to about 25% for the base model alone. These findings suggest that effective specialized chatbots could be implemented without complex Retrieval Augmented Generation architectures, providing practical guidance for companies considering deployment. Strategic model selection and prompt engineering techniques can achieve high customer satisfaction while maintaining implementation simplicity, cost-effectiveness and replicability. |
16:50 | Source Code Metrics and LLMs Summaries: Do Correlations Exist? ABSTRACT. Source code metrics help developers assess complexity and maintainability. Large language models (LLMs) can generate code summaries, but it remains unclear whether summary length reflects structural properties. This study explores correlations between source code metrics and numbers of words produced by a Large Language Model when summarizing code. We examine data from a suite of systems to search for patterns of positive and negative correlations. We present evidence that there are system-specific relationships, but no single pattern exists over all systems. |
17:10 | A visual comparison between Neutral Networks and Schemata dynamics in Genetic Algorithms ABSTRACT. In this paper, we study how the theoretical perspectives of Schema Theorem [Holland, 1975, 1992] and Neutral Networks [Kimura, 1968, Aguirre et al., 2009] applied to Genetic Algorithms (GA) agree or differ. In both cases, we use deterministic 2D visualisations to display the distribution of neutral networks and schemata within a generation, and view their changes across successive generations. For easier result comparison, we use four well-known numerical benchmark functions (DeJong's 1st, Rastrigin's, Schwefel's and Michalewicz's). Because the number of schemata increases exponentially with chromosome length, we study only small problem instances; however, combining this limitation with deterministic visualisations affords us visual interpretability. We observe how algorithm convergence focuses both schemata and neutral networks to certain patterns within the population (and within genotype space), how they explore and switch optima, and see some covariance in the counts of population-instanced schemata and neutral networks, suggesting a partial overlap between what the two theories measure. |
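The exponential growth the abstract mentions is easy to make concrete: over a binary alphabet, each gene position of a schema is 0, 1, or the wildcard '*', giving 3^L schemata for chromosome length L, while any single chromosome instantiates 2^L of them. The helper names below are ours, for illustration only.

```python
from itertools import product

def enumerate_schemata(length: int) -> list[str]:
    """All schemata over {0, 1, *} of the given length."""
    return ["".join(s) for s in product("01*", repeat=length)]

def matches(chromosome: str, schema: str) -> bool:
    """A chromosome instantiates a schema if each position matches or is a wildcard."""
    return all(s in ("*", c) for c, s in zip(chromosome, schema))

L = 3
schemata = enumerate_schemata(L)
assert len(schemata) == 3 ** L                               # 27 schemata for L = 3
assert sum(matches("101", s) for s in schemata) == 2 ** L    # any chromosome matches 8 of them
```

At L = 20 there are already over 3.4 billion schemata, which is why the paper restricts itself to small problem instances.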
17:30 | Do Language Models Help or Harm? The Role of LLM-Augmented Explanations in Human-AI Image Classification Tasks ABSTRACT. As large language models (LLMs) are increasingly integrated into explainable AI (XAI) pipelines, there is growing interest in whether their fluent, human-like explanations improve or hinder decision-making in AI-assisted tasks. In this study, we examine how LLM-generated narrative explanations affect user understanding, confidence, and accuracy in a human–AI computer vision setting. Participants completed a fine-grained image classification task involving dog breeds, supported by either visual-only explanations (Grad-CAM) or visual + narrative explanations generated using GPT-4o. Using a 2×2 within-subjects design, we evaluated the effects of explanation type and model correctness on participant agreement with the AI, confidence ratings, decision accuracy, and confidence–accuracy calibration. Our results reveal a double-edged effect: narrative explanations increased confidence—especially when the model was correct—but did not improve overall accuracy. Critically, participants were more likely to accept incorrect predictions when a narrative explanation was present, suggesting a risk of overtrust. These findings highlight the persuasive power—but also potential pitfalls—of LLM-augmented explanations in vision tasks. |