EVO*2025: EVOSTAR 2025
PROGRAM FOR WEDNESDAY, APRIL 23RD

11:00-13:15 Session 3A: EvoMUSART Image Making and Visual Art (i)
Location: Room A
11:00
Search-based Negative Prompt Optimisation for Text-to-Image Generation

ABSTRACT. Text-to-image generative models are machine learning models that take a description written in natural language as input and generate images matching this description. As with other types of generative models, text-to-image models tend to be imprecise for various reasons, such as hallucinations or randomness, and are strongly influenced by the input description (i.e., the user's prompt). Their use may therefore produce images that do not fully meet the user's expectations. Prompt engineering (i.e., the process of structuring text so that it is interpreted as intended by a generative model) poses a significant challenge, demanding considerable manual effort to ensure high-quality image generation. In this work, we explore the use of a local search guided by sentence similarity to optimize text-to-image generation via negative prompts. Our results suggest that our approach can improve the generation process, yielding more accurate images with no additional human effort.
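A minimal sketch of the kind of local search the abstract describes, with a toy coverage objective standing in for the sentence-similarity score; the artefact terms and candidate vocabulary are hypothetical, as the paper's actual scoring model is not given here:

```python
import random

def local_search_negative_prompt(candidate_terms, score_fn, iters=50, seed=0):
    """Hill climbing over sets of negative-prompt terms, keeping only moves
    that strictly improve score_fn (a stand-in for sentence similarity)."""
    rng = random.Random(seed)
    best, best_score = [], score_fn([])
    for _ in range(iters):
        neighbour = list(best)
        if neighbour and rng.random() < 0.5:
            neighbour.remove(rng.choice(neighbour))        # drop one term
        else:
            neighbour.append(rng.choice(candidate_terms))  # add one term
        score = score_fn(neighbour)
        if score > best_score:
            best, best_score = neighbour, score
    return best, best_score

# Toy objective: fraction of known artefact terms covered by the negative prompt.
artefacts = {"blurry", "deformed", "low quality"}
coverage = lambda neg: len(artefacts & set(neg)) / len(artefacts)

best, score = local_search_negative_prompt(
    ["blurry", "deformed", "low quality", "oversaturated"], coverage)
```

In a real pipeline the objective would instead embed the generated image's caption and the user's prompt and compare them, which is where the sentence-similarity guidance comes in.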

11:25
An Ensemble Approach to Music Source Separation: A Comparative Analysis of Conventional and Hierarchical Stem Separation

ABSTRACT. Music source separation (MSS) is a task that involves isolating individual sound sources, or stems, from mixed audio signals. This paper presents an ensemble approach to MSS, combining several state-of-the-art architectures to achieve superior separation performance across traditional Vocal, Drum, and Bass (VDB) stems, as well as expanding into second-level hierarchical separation for sub-stems like kick, snare, lead vocals, and background vocals. Our method addresses the limitations of relying on a single model by utilising the complementary strengths of various models, leading to more balanced results across stems. For stem selection, we used the harmonic mean of Signal-to-Noise Ratio (SNR) and Signal-to-Distortion Ratio (SDR), ensuring that extreme values do not skew the results and that both metrics are weighted effectively. In addition to consistently high performance across the VDB stems, we also explored second-level hierarchical separation, revealing important insights into the complexities of MSS and how factors like genre and instrumentation can influence model performance. While the second-level separation results show room for improvement, the ability to isolate sub-stems marks a significant advancement. Our findings pave the way for further research in MSS, particularly in expanding model capabilities beyond VDB and improving niche stem separations such as guitar and piano.
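The stem-selection rule based on the harmonic mean of SNR and SDR can be illustrated directly; the model names and scores below are hypothetical:

```python
def harmonic_mean(snr_db, sdr_db):
    """Harmonic mean of two positive scores; it is dominated by the smaller
    value, so an extreme result on one metric cannot mask a poor one on the other."""
    return 2.0 * snr_db * sdr_db / (snr_db + sdr_db)

def pick_best_model(per_model_scores):
    """per_model_scores: {model_name: (snr_db, sdr_db)} -> best combined model."""
    return max(per_model_scores, key=lambda m: harmonic_mean(*per_model_scores[m]))

# Hypothetical per-stem scores for two separation models.
scores = {"model_a": (12.0, 4.0), "model_b": (7.0, 7.0)}
```

Note that `model_a` wins on the arithmetic mean (8.0 vs. 7.0) but loses on the harmonic mean (6.0 vs. 7.0), which is exactly the skew-resistance the abstract motivates.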

11:50
Evolving the Embedding Space of Diffusion Models in the Field of Visual Arts

ABSTRACT. This paper presents a novel method to guide image generation by optimizing the embedding space of diffusion models using evolutionary algorithms. Instead of relying on traditional prompt engineering, the approach directly evolves the prompt embeddings that condition text-to-image generation. Evolutionary operators, such as crossover and mutation, are applied to iteratively refine the embeddings, which are then fed into the diffusion model to generate an image. The fitness of each embedding is determined by evaluating the image it produces.

Using the SDXL-Turbo model as a test case, a genetic algorithm is employed to optimize its prompt embeddings, leading to improvements in fitness as measured by the LAION Aesthetics Predictor V2. Results show that over generations, the optimized embeddings yield significant gains in fitness scores compared to the initial training images. The underlying framework is publicly available and executable in a Jupyter Notebook, allowing for further experimentation and adaptation to various generative tasks.
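A toy sketch of a genetic algorithm over real-valued embedding vectors, with a distance-based fitness standing in for the LAION Aesthetics Predictor; vector sizes, population sizes, and operators are illustrative assumptions, not the paper's configuration:

```python
import random

def evolve_embeddings(pop, fitness_fn, gens=30, mut_sigma=0.1, seed=1):
    """Minimal GA on real-valued vectors: truncation selection, uniform
    crossover, Gaussian mutation; the top half always survives (elitism)."""
    rng = random.Random(seed)
    for _ in range(gens):
        pop.sort(key=fitness_fn, reverse=True)
        parents = pop[: len(pop) // 2]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]   # uniform crossover
            children.append([x + rng.gauss(0.0, mut_sigma) for x in child])  # Gaussian mutation
        pop = parents + children
    return max(pop, key=fitness_fn)

# Toy fitness standing in for the aesthetic predictor: closeness to a target vector.
target = [0.5] * 8
fit = lambda v: -sum((x - t) ** 2 for x, t in zip(v, target))

seed_rng = random.Random(0)
pop0 = [[seed_rng.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
best = evolve_embeddings(list(pop0), fit)
```

In the paper's setting each candidate vector would be a prompt embedding, and evaluating `fitness_fn` would require a diffusion forward pass plus the aesthetics model.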

12:00
Generating Virtual Landscapes and Environmental Narratives with StyleGAN2

ABSTRACT. This paper presents an AI-driven approach to landscape representation and environmental commentary through a digital art installation focused on a protected natural area threatened by industrial activities. Using StyleGAN2-ADA and a carefully curated dataset of images, the project generates a video artwork that simulates an immersive, continuous exploration of the park's diverse terrain. Beyond visually capturing the park's natural beauty, the artwork highlights its ecological fragility by emphasizing environmental threats posed by nearby industrial encroachment. By blending advanced AI techniques with ecological advocacy, this project contributes a unique perspective to digital arts and expands the discourse on the role of technology in environmental awareness, demonstrating AI's potential to create compelling, awareness-driven artistic expressions.

12:10
Aesthetic biases and opacity tactics in the training of visual artificial intelligence models

ABSTRACT. This paper delves into the concept of taste as a classifier and its role in perpetuating cultural capital. Pierre Bourdieu's study on class and taste highlights how those with higher cultural capital dictate societal notions of good taste, influencing dominated classes. In the era of machine learning visual generative methods, the perpetuation of cultural capital ownership occurs through statistical processes. The evaluation of visual neural networks' aesthetic quality involves two crucial steps: filtering out low-quality images and creating test synthetic images for evaluation during training. However, these evaluations are biased, reflecting the preferences of a select group of individuals. Aesthetic evaluation data is obtained through public rating systems, but most of their users belong to a specific demographic, leading to further homogeneity in taste. Consequently, current rating systems reinforce a limited cultural capital rooted in access to technology and computational creativity. To foster diversity in neural generative systems, the paper proposes the development of new scorer systems, incorporating ratings from individuals in the Global South, those unfamiliar with generative systems or computers, and marginalized cultures. This endeavor aims to build a more inclusive aesthetic guidance and address the dominance of specific cultural capital in the field of visual culture. Finally, we also describe Edouard Glissant’s concept of opacity as a valid resistance strategy for cultures which do not want to be mapped within generative artificial intelligence models.

12:20
Short video interestingness: a machine learning approach to determine creative cues in audiovisual production

ABSTRACT. Models predicting video interestingness often prioritize visual aspects while neglecting audio and the overall audiovisual perspective. They typically depend on less interpretable, handcrafted features to enhance prediction efficiency using deep learning techniques. Video production usually focuses on grammar analysis of trends and viral content rather than exploring signal behaviour or human perception, which are important in other creative fields. This work aims to develop a model that integrates audio, visual, and audiovisual elements, analyzing key instants based on distinct visual and sound characteristics. Handcrafted features are implemented in order to obtain off-the-shelf cues for audiovisual production, which aspires to create potentially interesting content for the viewers.

12:30
Towards the Automatic Evaluation of Legibility for Graphic Design Posters

ABSTRACT. Evolutionary algorithms have been explored to aid creative tasks such as graphic design, e.g. by speeding up the creative process through the generation of innovative visual solutions that designers may draw inspiration from or use as a starting point for their work. However, state-of-the-art systems often present shortcomings in controlling the legibility of the text contents present in the generated artefacts. This paper presents an OCR-based approach for evaluating the degree of legibility of communication artefacts, such as graphic design posters. We experiment with various metrics to compare the text detected by OCR with the original text of the posters. To test these metrics, their results are compared with human evaluations of legibility. Furthermore, posters are evolved using the proposed approach as a fitness metric. Our findings highlight which metrics align most closely with human evaluations, and the results suggest our approach can be successfully utilised to generate legible posters.
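One plausible metric of the kind the abstract describes is a normalised edit-distance similarity between the OCR output and the poster's original text (the paper's exact metrics may differ):

```python
def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def legibility(original, detected):
    """Similarity in [0, 1]: 1.0 means the OCR output matches the poster text exactly."""
    if not original and not detected:
        return 1.0
    return 1.0 - levenshtein(original, detected) / max(len(original), len(detected))
```

Used as a fitness function, a poster whose rendered text OCRs back almost verbatim scores near 1.0, while an illegible layout scores near 0.0.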

12:40
Steering Large Text-to-Image Model for Abstract Art Synthesis: Preference-based Prompt Optimization and Visualization

ABSTRACT. With the advancement of neural generative capabilities, the art community has increasingly embraced GenAI (Generative Artificial Intelligence), particularly large text-to-image models, for producing aesthetically compelling results. However, the process often lacks determinism and requires a tedious trial-and-error process as users often struggle to devise effective prompts to achieve their desired outcomes. This paper introduces a prompting-free generative approach that applies a genetic algorithm and real-time iterative human feedback to optimize prompt generation, enabling the creation of user-preferred abstract art through a customized Artist Model. The proposed two-part approach begins with constructing an Artist Model capable of deterministically generating abstract art in specific styles, e.g., Kandinsky's Bauhaus style. The second phase integrates real-time user feedback to optimize the prompt generation and obtains an Optimized Prompting Model, which adapts to user preferences and generates prompts automatically. When combined with the Artist Model, this approach allows users to create abstract art tailored to their personal preferences and artistic style.

12:50
Exploring Multi-Objective Evolution for Aesthetic & Abstract 3D Art
PRESENTER: Ritwik Murali

ABSTRACT. As three-dimensional (3D) art becomes increasingly prevalent in fields such as architecture, game design, and AR/VR, this study examines the use of multi-objective evolutionary algorithms (MOEAs) to generate diverse and aesthetically novel 3D art. While most existing research focuses on 2D art and single-objective 3D art generation, this study explores the effectiveness of combining multiple fitness measures to evolve abstract and complex 3D art. Six objective functions, representing high (Category 1) and low (Category 2) user-rated metrics, were paired in 60 unique combinations influenced by four directional factors. The research also analyses the impact of two MOEAs, NSGA-II and NSGA-III, using Hotelling’s T^2 test to evaluate the diversity of the generated populations. The statistical test results showed that NSGA-III promotes greater diversity in the evolved 3D art when compared to NSGA-II, especially when combining specific fitness measures. Additionally, analysis of user ratings revealed that NSGA-II outperformed NSGA-III in generating models that resonated positively with users. The user rating also identified the most preferred fitness pairings, highlighting the subjective nature of aesthetic judgements. Finally, the research also examines the potential of large multimodal vision / language models to reduce subjectivity by exploring aesthetic appeal, structural complexity, and interpretive potential of the evolved 3D art. Interpretations from an open-source large multimodal model suggested that the evolved 3D art combined organic and geometric elements, resembling abstract representations of real-world objects and art styles. These findings suggest a promising avenue for using MOEAs to evolve abstract art and explore Large Language Models (LLMs) / multimodal LLMs as evolutionary operators to evolve abstract and aesthetic 3D art.

11:00-13:15 Session 3B: EML
Location: Room B
11:00
Social Interpretable Reinforcement Learning

ABSTRACT. Reinforcement Learning (RL) bears the promise of being a game-changer in many applications. However, since most of the literature in the field is currently focused on opaque models, the use of RL in high-stakes scenarios, where interpretability is crucial, is still limited. Recently, some approaches to interpretable RL, e.g., based on Decision Trees, have been proposed, but one of the main limitations of these techniques is their training cost. To overcome this limitation, we propose a new method, called Social Interpretable RL (SIRL), that can substantially reduce the number of episodes needed for training. Our method mimics a social learning process, where each agent in a group learns to solve a given task based both on its own individual experience and on the experience acquired together with its peers. Our approach is divided into two phases. (1) In the collaborative phase, all the agents in the population interact with a shared instance of the environment, where each agent observes the state and independently proposes an action. Voting is then performed to choose the action that is actually deployed in the environment. (2) In the individual phase, each agent refines its individual performance by interacting with its own instance of the environment. This mechanism lets the agents experience a larger number of episodes with little impact on the computational cost of the process. Our results (on 6 widely-known RL benchmarks) show that SIRL not only reduces the computational cost by between 43% and 76%, but also increases the convergence speed and, often, improves the quality of the solutions.
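The voting step of the collaborative phase can be sketched as follows; the policies and action names below are toy stand-ins for the interpretable RL agents in the paper:

```python
from collections import Counter

def collaborative_step(agents, state):
    """Collaborative phase: each agent proposes an action for the shared
    environment state; the majority-vote action is actually deployed."""
    votes = Counter(agent(state) for agent in agents)
    return votes.most_common(1)[0][0]

# Toy hand-written policies on a scalar state (hypothetical action set).
agents = [
    lambda s: "left" if s < 0 else "right",
    lambda s: "right",
    lambda s: "left" if s < 5 else "right",
]
```

Every agent in the group observes the transition produced by the winning action, which is how a single shared episode yields experience for the whole population.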

11:25
EDCA - An Evolutionary Data-Centric AutoML Framework for Efficient Pipelines

ABSTRACT. Automated Machine Learning (AutoML) has gained popularity due to the increased demand for Machine Learning (ML) specialists, allowing ML techniques to be applied effortlessly and quickly. AutoML implementations use optimisation methods to identify the most effective ML solution for a given dataset, aiming to improve one or more predefined metrics. However, most implementations focus on model selection and hyperparameter tuning. Despite being an important factor in obtaining high-performance ML systems, data quality is usually an overlooked part of AutoML and continues to be a manual and time-consuming task. This work presents EDCA, an Evolutionary Data-Centric AutoML framework. In addition to traditional tasks such as selecting the best models and hyperparameters, EDCA enhances the given data by optimising data-processing tasks such as data reduction and cleaning according to the problem's needs. All these steps form an ML pipeline that is optimised by an evolutionary algorithm. To assess its effectiveness, EDCA was compared to FLAML and TPOT, two frameworks at the top of the AutoML benchmarks. The frameworks were evaluated under the same conditions using datasets from the AMLB classification benchmarks. EDCA achieved results statistically similar in performance to FLAML and TPOT but used significantly less data to train the final solutions. Moreover, the EDCA results reveal that good performance can be achieved using less data and more efficient ML algorithms, aspects that align with Green AutoML guidelines.

11:50
Generate more than one child in your co-evolutionary semi-supervised learning GAN

ABSTRACT. Generative Adversarial Networks (GANs) are very useful methods for addressing semi-supervised learning (SSL) datasets, thanks to their ability to generate samples similar to real data. This approach, called SSL-GAN, has attracted many researchers in the last decade. Evolutionary algorithms have been used to guide the evolution and training of SSL-GANs with great success. In particular, several co-evolutionary approaches have been applied in which the two networks of a GAN (the generator and the discriminator) are evolved in separate populations. The co-evolutionary approaches published to date assume some spatial structure of the populations, based on the ideas of cellular evolutionary algorithms. They also create a single individual per generation and follow a generational replacement strategy in the evolution. In this paper, we reconsider those algorithmic design decisions and propose a new co-evolutionary approach, called Coevolutionary Elitist SSL-GAN (CE-SSLGAN), with a panmictic population, elitist replacement, and more than one individual in the offspring. We evaluate the performance of the proposed method using three standard benchmark datasets. The results show that creating more than one offspring per generation and using elitism improves the results in comparison with a classical SSL-GAN.
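A generic sketch of the elitist, multi-offspring replacement scheme described above, applied to a toy numeric problem rather than to GAN populations (which would require network training inside the fitness evaluation):

```python
import random

def elitist_generation(population, fitness_fn, mutate, n_offspring=4, rng=None):
    """(mu + lambda)-style step on a panmictic population: several offspring
    are created per generation and the best individuals overall survive."""
    rng = rng or random.Random(0)
    offspring = [mutate(rng.choice(population), rng) for _ in range(n_offspring)]
    survivors = sorted(population + offspring, key=fitness_fn, reverse=True)
    return survivors[: len(population)]  # elitism: parents compete with children

# Toy setup: individuals are floats, fitness is closeness to zero.
fitness = lambda x: -abs(x)
mutate = lambda x, rng: x + rng.gauss(0.0, 0.5)
pop = [3.0, -2.0, 1.5, 4.0]
new_pop = elitist_generation(pop, fitness, mutate)
```

Because parents and offspring compete in a single pool, the best fitness in the population can never decrease between generations, unlike under generational replacement.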

12:15
Evolving RNNs for Stock Forecasting: A Low Parameter Efficient Alternative to Transformers

ABSTRACT. Stock return forecasting is a critical application of time series forecasting in finance, facilitating informed trading and management decisions that can lead to substantial returns. However, for large investment portfolios, designing and fine-tuning models for individual stock predictions is time-consuming and computationally intensive. In this work, we propose using the neuroevolution-based neural architecture search algorithm, Evolutionary eXploration of Augmenting Memory Models (EXAMM), to evolve recurrent neural networks (RNNs) for stock return prediction. We compare the prediction performance of these evolved RNNs with that of state-of-the-art Transformer and deep learning models. Our results indicate that EXAMM-evolved RNNs outperform or achieve comparable performance with these models across 50 multivariate stock datasets and a combined high-dimensional dataset with 300 input features and 50 outputs. Additionally, they require orders of magnitude fewer parameters and can be evolved and operate efficiently using a minimal 8-core CPU configuration as opposed to expensive GPUs.

12:40
Into the Black Box: Mining Variable Importance with XAI

ABSTRACT. Recent works have shown that the idea of mining search spaces to train machine learning models can facilitate increased understanding of variable importance in optimisation problems. However, so far, the problems studied have typically either been toy benchmarks or have not had known ground-truth importances. A newly established combinatorial optimisation benchmark domain, Polynomial Unconstrained Binary Optimisation with variable importance (PUBOi), provides problem instances with tunable variable importance. In this work, we explore the potential of using explainable artificial intelligence (XAI) attribution methods for uncovering variable importances from mined search-space models on PUBOi instances with ground-truth importances. We compare learning algorithms, XAI methods, and the sample sizes used to train the models to better understand which techniques are promising in this context. The analysis lays the groundwork for future possibilities of using XAI on mined search-space models during search to adapt or switch operators for more effective optimisation.

13:05
Micro-Step Time-Series Regression: Insights from System Identification Using Symbolic Regression
PRESENTER: Hengzhe Zhang

ABSTRACT. Time-series forecasting is widely applied across various domains, yet most approaches rely on predefined time steps given by each problem. Based on observations from dynamic systems with known ground truth, we identify that large-step forecasts can lead to substantial errors due to insufficient modeling of continuous dynamics. To address this, we propose a micro-step time-series regression technique that decomposes predictions into smaller intervals, so that genetic programming-based feature construction can capture finer temporal patterns to improve the prediction performance. Specifically, we employ linear interpolation to allow the evolutionary feature construction process to learn from incremental changes, reducing the difficulty of time-series regression. Experiments on 100 datasets from the M4 forecasting benchmark demonstrate that micro-step regression significantly enhances prediction accuracy compared to traditional methods using raw time steps. Further analysis reveals that features trained on micro-step data evolve into simpler structures, promoting both generalization and interpretability.
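The linear-interpolation step behind the micro-step idea can be sketched directly; `k`, the number of micro-steps per original interval, is a hypothetical parameter name:

```python
def micro_steps(series, k):
    """Insert k-1 linearly interpolated points between consecutive observations,
    turning each large forecasting step into k smaller ones."""
    out = []
    for a, b in zip(series, series[1:]):
        out.extend(a + (b - a) * i / k for i in range(k))
    out.append(series[-1])
    return out
```

The regression model is then trained to predict these incremental changes rather than one full-sized jump; a `k`-step roll-out of the micro-step model recovers a forecast at the original time scale.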

11:00-13:15 Session 3C: EvoApplications: Misc Applications (i)
Location: Room C
11:00
A Coach-Based Quality-Diversity Approach for Multi-Agent Interpretable Reinforcement Learning

ABSTRACT. Thanks to the advances in deep Reinforcement Learning (RL) and its demonstrated capabilities to perform complex tasks, the field of Multi-Agent RL (MARL) has recently undergone major developments. However, current MARL approaches still suffer from a general lack of interpretability. Decision Trees (DTs) with RL-trained leaves can overcome this problem while still maintaining high performance, but they require efficient search strategies to converge to a high-performing solution. In this paper, we discuss the development of an interpretable MARL algorithm based on such DTs, whose structures are optimized by means of MAP-Elites and Genetic Programming, and test it on a team-based game. To enhance the evolutionary process, we introduce a coach agent that supervises the evolution and team creation during the training of the agents. The proposed strategy is tested in conjunction with different team creation mechanisms and evolutionary selection methods, to assess the effect of having a coach supervise the entire process. Results demonstrate how the algorithm can effectively find high-performing policies to accomplish the given task, while the coach pushes team optimization even further, hence improving the algorithm's overall performance.

11:10
Evolutionary Reinforcement Learning for Interpretable Decision-Making in Supply Chain Management

ABSTRACT. In the context of Industry 4.0, Supply Chain Management (SCM) faces challenges in adopting advanced optimization techniques due to the "black-box" nature of most AI-based solutions, which causes reluctance among company stakeholders. To overcome this issue, in this work, we employ an Interpretable Artificial Intelligence (IAI) approach that combines evolutionary computation with Reinforcement Learning (RL) to generate interpretable decision-making policies in the form of decision trees. This IAI solution is embedded within a simulation-based optimization framework specifically designed to handle the inherent uncertainties and stochastic behaviors of modern supply chains. To our knowledge, this marks the first attempt to combine IAI with simulation-based optimization for decision-making in SCM. The methodology is tested on two supply chain optimization problems, one fictional and one from the real world, and its performance is compared against widely used optimization and RL algorithms. The results reveal that the interpretable approach delivers competitive, and sometimes better, performance, challenging the prevailing notion that there must be a trade-off between interpretability and optimization efficiency. Additionally, the developed framework demonstrates strong potential for industrial applications, offering seamless integration with various Python-based algorithms.

11:20
Variable-Size Genetic Network Programming for Portfolio Optimization with Trading Rules
PRESENTER: Fabian Köhnke

ABSTRACT. We present an extension of a graph-based evolutionary algorithm called Genetic Network Programming (GNP) by a novel mutation operator, which allows for a variable number of nodes and edges per individual. With this operator, the search space is significantly extended, but without the risk of incurring the bloat problem. The operator is fitness-neutral and has no hyper-parameters. Due to its higher flexibility, GNP can now automatically adapt to the complexity of a given task and find suitable features, especially for high-dimensional data sets. We applied our mutation operator successfully in a GNP for a financial data set, where it improved over standard GNP with an optimal network size while maintaining the interpretability of the solution candidates.

11:30
Building Cross-Sectional Trading Strategies via Geometric Semantic Genetic Programming

ABSTRACT. Cross-sectional trading strategies involve constructing portfolios by comparing the expected performance of assets within a group, typically using predicted returns. In this study, we frame the estimation of cross-sectional expected returns as a symbolic regression problem and investigate the predictive capabilities of geometric semantic genetic programming for developing cross-sectional trading strategies in the U.S. stock market. We employ standard genetic programming and other common methods used for studying cross-sectional returns as baselines for comparison. Our findings indicate that geometric semantic genetic programming provides better forecast accuracy, portfolio performance, and ranking accuracy than standard genetic programming. Furthermore, we show the limitations of error-based metrics as performance measures in cross-sectional trading strategies.

11:40
Facial Geometric Feature Extraction for Dimensional Emotion Analysis Using Genetic Programming

ABSTRACT. Geometric features derived from single static images have the potential to be highly effective for facial emotion analysis, as shape, structure, and spatial relationships are key factors. However, these aspects are rarely explored in existing research. In this paper, we propose a novel approach that utilizes Genetic Programming (GP) to automatically extract geometric features for more effective emotional representation. The proposed GP system uses various evaluation strategies, evolving either a single feature per run or multiple features within a single run. These GP-evolved features capture critical angular and distance-based relationships between facial landmarks, which are then integrated with an existing deep learning model to enhance performance. The results show that the proposed method achieves improved performance in dimensional emotion analysis, providing a more comprehensive understanding of emotional expressions in static images. In addition, our approach is effective in improving the accuracy of emotion predictions, establishing a foundation for more precise facial emotion analysis using geometric information.

11:50
Evolutionary Computation for Causality-Driven Feature Selection: A Preliminary Study

ABSTRACT. Selecting optimal features in high-dimensional spaces remains challenging due to their complexity and the focus on correlational rather than causal relationships. While evolutionary computation algorithms for feature selection show promising results, they often face challenges in identifying feature subsets that are both interpretable and causally relevant. In this paper, we present a preliminary study in which we investigate how causality affects the search capability of feature selection based on evolutionary computation. We tested Genetic Algorithms and Particle Swarm Optimization to this aim and compared their performance with wrapper-based approaches. Comprehensive experiments across multiple benchmark datasets reveal that our methods consistently identify features with stronger causal relationships and superior interpretability than traditional approaches. Our results demonstrate the significant potential of integrating causality to enhance evolutionary computation algorithms for feature selection.

12:00
Inferring Reaction Elasticities from Metabolic Correlations in Cells through Multi-objective Evolutionary Optimization

ABSTRACT. Cell metabolism is a complex dynamical system, and with experimental data being noisy and sparse, parameter fitting in metabolic models is challenging. In Bayesian estimation, prior knowledge about parameter values is weighted against knowledge from data fitting. But since error bars and prior widths are often a matter of debate, a flexible way of regulating this trade-off is needed. Here we propose an evolutionary multi-objective approach to parameter estimation, to find compromises between model parameter values matching the prior (prior loss term) and yielding good data fits (likelihood loss term). Our modeling framework for cell metabolism describes an ensemble of cell states with correlated variation of all model variables, and relies on linearized models with reaction elasticities as parameters to be estimated based on variances and covariances of metabolite concentrations. To evaluate its effectiveness, we conduct two tests with artificial data and a known ground truth. We first consider a simple metabolic pathway with 3 reactions and 4 metabolites, where correlated variation of variables can still be understood intuitively. The second test involves a more complex and realistic real-world metabolic model of E. coli bacteria with 62 metabolites, 57 reactions, and 234 elasticity coefficients to be fitted. Here, the results are almost impossible to guess even for domain experts. In both cases, the proposed method produces satisfactory results. This work paves the way to studying compromises between biological objective functions that do not concern model fitting, but the functioning of the cell, for example information transmission across metabolic networks.

12:10
Optimizing the logistics operations of distribution network operators from a multinational electric utility company

ABSTRACT. Distribution network operators (DNOs) share a significant responsibility regarding the assurance of electrical energy supply quality and continuity. In detail, DNOs are legally required to: (i) address electrical emergency occurrences quickly, especially to restore electricity supply; and (ii) ensure the efficiency of service concerning commercial occurrences. In this work, we propose a solution to optimize the logistics operations of the Spanish multinational electric utility company Iberdrola. Our work's scope is Neoenergia, the Brazilian subsidiary controlling five different DNOs. We follow the CRISP-DM data science framework to address the allocation of operations bases. The solution was developed and successfully deployed in collaboration with the analytics team of Neoenergia. In detail, we model the problem as a knapsack problem and tackle it with an iterated greedy metaheuristic. Results show a decrease in the distances between bases and occurrences when compared to the approach currently adopted by Neoenergia. Our approach also reduces travel times, contributes to the improvement of supply continuity indices, and better meets company business requirements. Importantly, we provide a simulation tool to recommend future base allocations, which provides valuable input for planning.
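A toy sketch of an iterated greedy loop on a knapsack-style instance; the base names, costs, and benefit values are hypothetical, as the company's actual cost model is not described here:

```python
import random

def greedy_fill(items, capacity, chosen):
    """Add remaining items in order of value density while capacity allows."""
    weight = sum(w for _, w, _ in chosen)
    for item in sorted(set(items) - set(chosen),
                       key=lambda it: it[2] / it[1], reverse=True):
        if weight + item[1] <= capacity:
            chosen.append(item)
            weight += item[1]
    return chosen

def iterated_greedy(items, capacity, iters=200, destroy=1, seed=0):
    """Remove part of the incumbent solution, rebuild greedily, keep improvements."""
    rng = random.Random(seed)
    best = greedy_fill(items, capacity, [])
    best_value = sum(v for _, _, v in best)
    for _ in range(iters):
        partial = list(best)
        for _ in range(min(destroy, len(partial))):
            partial.remove(rng.choice(partial))            # destruction phase
        candidate = greedy_fill(items, capacity, partial)  # reconstruction phase
        value = sum(v for _, _, v in candidate)
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value

# Hypothetical bases as (name, cost, coverage-benefit) tuples.
bases = [("b1", 3, 10), ("b2", 4, 9), ("b3", 2, 5), ("b4", 5, 11)]
best, value = iterated_greedy(bases, capacity=7)
```

The real system would score a candidate allocation by simulated base-to-occurrence distances rather than fixed per-item benefits, but the destroy-and-rebuild structure is the same.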

12:20
Hybrid Optimization of Horizontal Alignments in European Terrains: A Comparative Study

ABSTRACT. Path planning across terrain is a fundamental challenge in civil engineering, with applications ranging from transportation infrastructure to urban development. Recent advances in computational methods have enabled automated route optimization, particularly in horizontal alignment problems that balance construction costs with terrain constraints. However, standardized comparisons of optimization approaches across diverse geographical contexts remain limited, hindering the development of reliable automated planning systems. Here we show through a systematic comparative study across three European landscapes that A* significantly outperforms RRT* in initial path generation, with better computational efficiency and terrain adaptation, while PSO demonstrates superior optimization capabilities compared to CMA-ES and DE in refining these paths against roadway construction criteria. Through extensive parameter validation, we find these performance advantages remain consistent across different geographical contexts and topographical challenges, with the hybrid A*-PSO approach achieving significantly better results than applying optimization algorithms to straight-line paths alone. These findings provide a comprehensive comparison of key algorithms in infrastructure planning optimization, demonstrating the relative strengths of different approaches in horizontal alignment tasks. This comparative analysis offers practical guidance for algorithm selection while highlighting opportunities for further development through the incorporation of real-world engineering constraints.

12:30
Understanding trade-offs in classifier bias with quality-diversity optimization: an application to talent management

ABSTRACT. Fairness, the impartial treatment towards individuals or groups regardless of their inherent or acquired characteristics [19], is a critical challenge for the successful implementation of Artificial Intelligence (AI) in multiple fields such as finance, human capital, and housing. A major struggle in the development of fair AI models lies in the bias implicit in the data available to train such models. Filtering or sampling the dataset before training can help ameliorate model bias, but it can also reduce model performance, and the impact of the bias can be opaque. In this paper, we propose a method for visualizing the biases inherent in a dataset and understanding the potential trade-offs between fairness and accuracy. Our method builds on quality-diversity optimization, in particular Covariance Matrix Adaptation Multi-dimensional Archive of Phenotypic Elites (MAP-Elites). Our method provides a visual representation of bias in models, allows users to identify models within a minimal threshold of fairness, and determines the trade-off between fairness and accuracy.

12:40
Using Local Correlation Between Objectives to Detect Problem Modality

ABSTRACT. Understanding the characteristics of multiobjective optimization problems (MOPs) is crucial for designing and configuring optimization algorithms that can efficiently solve them. This paper introduces a method that uses the estimation of local correlation between objectives to transform MOP landscapes into single-objective problem (SOP) landscapes. With this transformation, we make it possible to apply SOP landscape features to MOPs, thereby extracting valuable information about problem properties, such as modality. Our approach integrates both sample-based and search-based features, which are assessed for their ability to distinguish between unimodal, moderately multimodal, and highly multimodal MOPs. The proposed method is validated through a two-phase experimental setup. In the first phase, we identify features that can reliably identify problem modality under ideal conditions with abundant data. The second phase evaluates their performance in more realistic scenarios with smaller samples and higher problem dimensions. The results show that features computed on the local correlation landscape achieve comparable or better performance than existing MOP features. These findings demonstrate the capability of SOP features to generalize to MOPs, showcasing their potential for characterizing MOP landscapes and inspiring future research on extending this approach to uncover additional problem properties.

12:50
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model

ABSTRACT. Recent approaches to training algorithm selectors in the black-box optimisation domain have advocated for the use of training data that is `algorithm-centric' in order to encapsulate information about how an algorithm performs on an instance, rather than relying on information derived from features of the instance itself. Probing trajectories, which consist of a sequence of objective performance values per function evaluation obtained from a short run of an algorithm, have recently shown particular promise in training accurate selectors. However, training models on this type of data requires an appropriately chosen classifier given the sequential nature of the data. There are currently no clear guidelines for choosing the most appropriate classifier for algorithm selection using time-series data from the plethora of models available. To address this, we conduct a large benchmark study using 17 different classifiers and three types of trajectory on a classification task over the BBOB benchmark suite, using both leave-one-instance-out and leave-one-problem-out cross-validation. In contrast to previous studies using tabular data, we find that the choice of classifier has a significant impact, showing that feature-based and interval-based models are the best choices.

11:00-13:15 Session 3D: Late-Breaking Abstracts (LBAs)
Location: Room D
11:00
Effect of Fitness Values Weighting on Clustering of Single-Objective Fitness Landscapes

ABSTRACT. Exploratory Landscape Analysis is a powerful technique for characterizing the fitness landscapes of optimization problems. This study evaluates the use of fitness histograms for characterizing black-box optimization problems, drawing on benchmark single-objective problems. Fitness histograms, while effective in capturing the distribution of fitness values, do not account for the relative importance of individual bins. To address this, we explore several weighting methods, particularly based on the TF-IDF statistic, to improve the histograms' discriminative power. The impact of these methods is assessed using clustering analysis, with promising results showing improved silhouette scores.
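For readers who want to experiment with the idea, a minimal sketch of TF-IDF-style bin weighting for fitness histograms follows. This is an illustration, not the authors' implementation: each histogram bin is treated as a "term" and each problem's histogram as a "document".

```python
import math

def tfidf_weighted_histograms(histograms):
    """Weight each bin of per-problem fitness histograms by TF-IDF.

    histograms: list of equal-length lists of raw bin counts,
    one histogram per benchmark problem (the "document").
    Returns the TF-IDF-weighted histograms.
    """
    n_docs = len(histograms)
    n_bins = len(histograms[0])
    # Document frequency: in how many histograms is bin b non-empty?
    df = [sum(1 for h in histograms if h[b] > 0) for b in range(n_bins)]
    weighted = []
    for h in histograms:
        total = sum(h) or 1
        row = []
        for b in range(n_bins):
            tf = h[b] / total                                 # term frequency
            idf = math.log(n_docs / df[b]) if df[b] else 0.0  # inverse document frequency
            row.append(tf * idf)
        weighted.append(row)
    return weighted
```

Bins that are occupied in every problem receive idf = log(1) = 0, so only bins that discriminate between problems retain weight, which is the intended boost to discriminative power.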

11:10
The Traveling Tournament Problem: Constraint Violations for Different MaxStreak Values

ABSTRACT. We systematically investigate the validity of randomly generated traveling tournament problem solutions under different values for one of its key constraints: the maxStreak, which controls consecutive game order. It turns out that the expected number of maxStreak violations scales closely with the maxStreak value and, in the extreme case, can be even more prohibitive than any other constraint of the problem.
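As a hypothetical illustration of the constraint being counted (not the authors' generator), checking a single team's home/away sequence for maxStreak violations reduces to scanning for runs longer than the allowed streak:

```python
def count_maxstreak_violations(pattern, max_streak):
    """Count maxStreak violations in one team's home/away sequence.

    pattern: string over {'H', 'A'}, e.g. 'HHAHA'.
    Each game that extends a run of consecutive home or away games
    beyond max_streak counts as one violation.
    """
    violations = 0
    run = 1
    for prev, cur in zip(pattern, pattern[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_streak:
            violations += 1
    return violations

# e.g. 'HHHHA' with max_streak=3: the 4th 'H' is one violation
```

Counting violations like this over many uniformly random schedules is one way to estimate the expected violation rate the abstract refers to.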

11:20
The Traveling Tournament Problem: Valid Solutions are Very Different

ABSTRACT. We generated and rendered all 160 valid solutions for the 4-team traveling tournament problem, and they look nothing alike. The average difference between two solutions is almost maximal, even when a key constraint, the home/away designation, is completely ignored. If these findings hold for larger numbers of teams, it could prohibit the use of metaheuristic algorithms for this problem altogether.

11:30
AI as Co-Creator: Reimagining Musical Authorship and Interaction in Human-AI Music Composition

ABSTRACT. This paper explores how human-AI collaboration reshapes creativity, authorship, and musical composition through Cyber Maze, a practice-based project where AI serves as an active co-creator. As AI-generated music becomes increasingly integrated into creative workflows, this study examines the central question: How does human-AI collaboration redefine the music-making process and artistic agency? Drawing on Umberto Eco's concept of openness and John Cage's notion of indeterminacy, the project reimagines openness as an iterative, dynamic interplay between human intentionality and algorithmic unpredictability. Through an evolving exchange in which human inputs, vocal recordings, and compositional structures are transformed by AI and reinterpreted by the composer, the project highlights AI's role not as an autonomous creator but as a catalyst for emergent musical exploration. Rather than replacing artistic decision-making, AI introduces unexpected variations, prompting new creative directions and expanding the composer's role from sole author to curator-facilitator. Cyber Maze ultimately argues that human-AI co-creation fosters a hybrid compositional model where openness is realized through continuous interaction, negotiation, and reinterpretation between human intuition and machine-generated output. This study contributes to broader discussions on AI's role in music, advocating for collaborative frameworks that embrace adaptability, uncertainty, and creative dialogue in human-machine partnerships.

11:40
Statistical-Mechanical Approach to Music: A Nature-Inspired Model for Rule-Free Composition

ABSTRACT. Music is a universal language, yet its fundamental ingredients are not objectively established. This paper introduces a statistical-mechanical framework for isolating and testing candidates for these ingredients. Music is modelled as an ensemble of time-frequency events, in analogy with materials as ensembles of atoms, characterised by the macro-properties of energy and entropy. We define energy as a measure of temporal dissonance or tension, and entropy as a measure of unexpectedness or surprise; two quantities that fluctuate over time and give rise to emotionally perceptible musical contours. The model demonstrates that music-like structures can emerge outside equilibrium, without relying on predefined rules or learned styles. Yet they conform to well-established subjective norms, such as those delineated by Cantus Firmus and, indeed, the melodic principles of Species Counterpoint more generally. Unlike mainstream AI-generated music, which relies on trained models and probabilistic interpolation, our system is ab initio, operating purely through algebraic transformations in the time-frequency domain, without predefined scales, chords, or rules. This shifts composition from the frequency-time domain to the tension-surprise domain, providing a more direct and accessible connection to emotional experience. The system contains no trainable parameters, producing entirely novel, unique, and genre-fluid compositions, rather than interpolating between existing musical data; thus, mitigating copyright risks commonly associated with AI music. Beyond offering insight into the nature-inspired mechanisms underlying musical emergence, the system functions as a ‘smart’ instrument, enabling real-time adaptability. This makes it particularly well-suited for applications in EEG-coupled neurofeedback, music therapy, gaming, and interactive media, where dynamic emotional expression is crucial. 
In contrast to common generative approaches that emphasise structure or style, compositional decisions using this system are made considering instantaneous variations of the emotionally relevant physical analogues. This may offer new tools for researchers and composers seeking expressive means beyond conventional theory or machine-learned imitation.

11:50
gem5/Z3/gcc/Clang/Redis glibc Heap Fitness Landscapes

ABSTRACT. We adapt "The gem5 C++ glibc Heap Fitness Landscape" (W.B. Langdon and B.R. Bruce, GI@ICSE 2025) to use Valgrind Massif on the 1,300,000-line C++ gem5, on the 600,000 LOC C++ theorem prover Z3, and on benchmarks from SMT-COMP 2024. We show that the memory landscape is far smoother than is commonly assumed and that Magpie and CMA-ES can tune GNU malloc, giving 2.4-megabyte reductions in peak RAM use without coding changes. Similar results are given on the GCC and Clang LLVM compilers and the 150,000 LOC C Redis key-value database.

12:00
Testing Self-Organized Load Balancing in Distributed Systems

ABSTRACT. This paper investigates the dynamic load balancing capabilities of a sandpile-based heuristic under a ramp-up workload scenario. By simulating a gradual increase in task arrivals, emergent self-organizing behavior efficiently redistributes loads across processing elements (PEs) while minimizing energy usage. Experimental results demonstrate that the heuristic dynamically recruits resources in a near-optimal fashion, with energy consumption closely tracking the growing demand.

12:10
Multi-objective particle swarm optimization for environmental risk/benefit analysis

ABSTRACT. Hydropower is a fundamental renewable energy source, and the Amazon basin represents one of its largest untapped frontiers. However, its expansion in this ecologically sensitive region raises significant environmental challenges, especially concerning greenhouse gas emissions. In this paper, we develop a multi-objective optimization framework that employs a variant of the Multi-Objective Particle Swarm Optimizer to balance the competing objectives represented by the total electricity generation and the reduction of carbon emissions. We analyse a dataset of 509 dams, categorized by geographical and technical features, to assess the impact of site selection. We further inspect the key features of dams that compose the best configurations to maximize energy output while minimizing emissions. In such configurations, the dams are located in highland areas, offering flexible trade-offs and allowing planners to balance sustainability with energy demands. Decision-makers could take advantage of this work by adopting a strategic approach to hydropower expansion that prioritizes energy efficiency and environmental responsibility, showcasing the effectiveness of computational optimization in sustainable energy planning.

12:20
An Automated Financial Management System for Risk Budgeting Portfolio Optimization

ABSTRACT. We introduce a novel automated decision support system for portfolio optimization that maximizes a financial performance measure subject to cardinality, box, budget, and a set of risk budgeting constraints. First, we analyze the capability of the developed solver to identify feasible solutions. Then, we compare the proposed investment strategy to several common benchmark strategies to assess its profitability. The results show that our solver moves efficiently within the feasible region. Moreover, the risk budgeting-based model attains better ex-post financial performance compared to the equally weighted portfolio benchmark.

12:30
Extending Instance Space Analysis via Bipartite Network Communities

ABSTRACT. We investigate how to obtain human-understandable insights into the joint relationship between algorithm performance, algorithm parameters, and problem instance features. We propose a framework that integrates community detection, from network science, with Instance Space Analysis (ISA), via a bipartite network representation.

12:40
Image classification by evolving bytecode

ABSTRACT. We investigate the potential of evolving the bytecode of a biologically-inspired virtual machine as a plausible strategy for machine learning. We simulate evolution with the Zyme language and strand-based virtual machine. Our test problem is classifying handwritten digits from a subset of the MNIST dataset. Beginning with an initial program whose performance is no better than random guessing, we achieve consistent accuracy improvements through random mutations over 50 generations. Although these results fall short of state-of-the-art methods like neural networks, they demonstrate that adaptive mutations are found consistently and suggest the potential for evolving Zyme bytecode to competitively tackle the full MNIST task. This result also suggests the value of alternative virtual machine architectures in genetic programming, particularly those optimized for evolvability.

12:50
Differential Evolution for Optimizing Ensemble Weights in Multiclass Sentiment Classification

ABSTRACT. This work addresses the challenge of sentiment polarity classification in an unbalanced multiclass setting. We focus on the Spanish TASS corpus, where the test set is significantly larger than the training set and the class distribution is skewed. Traditional classifiers such as Naive Bayes, Logistic Regression, and SVM perform modestly under these conditions. However, we propose an ensemble approach that uses Differential Evolution to optimize the combination weights, leading to improved performance. Our method outperforms previous baselines on the TASS General corpus without relying on complex deep learning techniques.

13:00
Using SHAP to visualize Pre-Match Outcome Prediction in Dota 2

ABSTRACT. Predicting outcomes in competitive video games like Dota 2 is crucial for strategic planning in e-sports. This study employs machine learning (ML) models (Random Forest, Gradient Boosting, and Logistic Regression) to predict match outcomes based solely on hero selection data. Using the OpenDota API, we collected and preprocessed 4,500 matches, evaluating models via accuracy, precision, recall, and F1-score. Random Forest achieved 98% accuracy, outperforming the other models. Explainability techniques, particularly SHAP (SHapley Additive exPlanations), revealed key heroes influencing predictions. This work highlights the potential of explainable AI (XAI) in gaming, offering insights for players and developers to optimize strategies.

13:10
Multi-Modal Fusion Techniques for Detecting Abnormal Events in Videos

ABSTRACT. Detecting abnormal events in videos plays a crucial role in enhancing security and public safety, particularly in environments with extensive surveillance systems. With the exponential growth of video data and the demand for real-time anomaly detection, deep learning has emerged as a key approach to improving detection accuracy. However, current approaches to anomaly detection face several challenges in effectively leveraging multimodal information. As abnormal behaviors can manifest through multiple modalities such as voice, image, pose, and semantics, effective multi-stream feature extraction and fusion techniques are essential. This paper introduces a cross-modal attention and gating network that improves feature alignment and fusion for anomaly detection. Experimental results on benchmark datasets demonstrate the effectiveness of our approach, achieving an average precision (AP) of 84.98% on the XD-Violence dataset.

14:15-16:05 Session 4A: EuroGP Best Paper nominations
Location: Room A
14:15
Designing Lookahead Relocation Rules for the Container Relocation Problem with Genetic Programming

ABSTRACT. The container relocation problem is an important combinatorial optimisation problem commonly found in warehouses and container ports. The goal of this problem is to retrieve all of the containers from the yard with the fewest container relocations between the stacks. Since the problem is NP-hard, various heuristics have been proposed to solve it, among which relocation rules (RRs) are simple constructive heuristics that incrementally construct the solution. However, it is quite difficult to design such RRs manually, so genetic programming has often been applied to design new RRs automatically. A significant problem with RRs, whether manually or automatically designed, is that they usually have a limited view of the problem. This means that they often make decisions that negatively influence the future, causing several additional relocations, usually because other containers are not well located. Therefore, this study investigates different relocation schemes that can be used within RRs to obtain rules with lookahead ability. These rules enable containers to be relocated based on future information and, thus, arranged better in the yard. For that purpose, three novel relocation schemes for automatically designed RRs are defined and evaluated on an existing problem set. The results demonstrate that integrating additional elements to evolve lookahead RRs can significantly improve the results.

14:40
A Systematic Evaluation of Evolving Highly Nonlinear Boolean Functions in Odd Sizes

ABSTRACT. Boolean functions are mathematical objects used in diverse applications. Different applications also have different requirements, making the research on Boolean functions very active. In the last 30 years, evolutionary algorithms have been shown to be a strong option for evolving Boolean functions of different sizes and with different properties. Still, most of those works consider similar settings and provide results that are mostly interesting from the evolutionary algorithm's perspective. This work considers the problem of evolving highly nonlinear Boolean functions in odd sizes. While the formulation sounds simple, the problem is remarkably difficult, and the related work is extremely scarce. We consider three solution encodings and four Boolean function sizes and run a detailed experimental analysis. Our results show that GP outperforms other EAs in evolving highly nonlinear functions. Nevertheless, the problem is challenging, and finding optimal solutions is impossible except for the smallest tested size. However, once we added local search to the evolutionary algorithm, we managed to find a Boolean function in nine inputs with nonlinearity 241, which, to our knowledge, had never been accomplished before with evolutionary algorithms.

15:05
Multi-Objective Evolutionary Design of Explainable EEG Classifier
PRESENTER: Martin Hurta

ABSTRACT. Deep neural networks (DNNs) have achieved impressive results in many fields. However, the use of black-box solutions based on DNNs in healthcare can be problematic, as they do not provide physicians with any information about the underlying principles of their behavior. For those reasons, we propose a new method for the evolutionary multi-objective design (MOD) of small and potentially explainable EEG (Electroencephalography) signal classifiers. We evaluate a combination of a genetic algorithm (GA) for feature selection with multiple algorithms for the automated design of classifiers, such as Support Vector Machine (SVM), k-Nearest Neighbors (k-NN) and Naive Bayes. To further improve the classification quality and obtain small and potentially explainable solutions usable by medical experts, we compare three different MOD scenarios targeting the accuracy, specificity, sensitivity, and the number of used features. In addition, we evaluate the use of Cartesian Genetic Programming (CGP) in connection with the compositional co-evolution of selected features as a way to achieve smaller and more interpretable solutions in a reasonable time. The proposed methods are experimentally evaluated on tasks of alcohol use disorder (AUD) and major depressive disorder (MDD) classification. Experimental results show that both newly proposed MOD scenarios lead to significantly better trade-offs between the accuracy and the number of features than the state-of-the-art method employing the NSGA-II algorithm. The proposed co-evolution of features (evolved by GA) and classifier (evolved by CGP) led to 20-100 times faster convergence than the baseline CGP-based approach and allowed the design of small and potentially explainable solutions.

15:30
Population Diversity, Information Theory and Genetic Improvement
PRESENTER: David Clark

ABSTRACT. We evolve the triangle.c C benchmark program to make it 16% faster using the genetic improvement (GI) Magpie tool and gzip. gzip is used to produce algorithmic-information (Kolmogorov complexity) based measures of diversity. In each generation we remove programs of average fitness that contribute less to population diversity. We calculate diversity via approximations to the Normalised Compression Distance on Multisets (NCDm), using both Cohen and Vitanyi's O(n*n) approach and our own O(n) method, finding that the cheaper O(n) method is equally good.
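For orientation, a minimal sketch of the pairwise Normalised Compression Distance that the multiset variant (NCDm) builds on, using gzip as the compressor. This is an illustration of the underlying measure, not the Magpie integration described in the paper:

```python
import gzip

def c(data: bytes) -> int:
    """Approximate Kolmogorov complexity: gzip-compressed length."""
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Pairwise Normalised Compression Distance.

    Close to 0 for near-identical inputs, close to 1 for unrelated ones
    (it can slightly exceed 1 due to compressor overhead).
    """
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"int tri(int a,int b,int c){return a+b>c;}" * 20
b_ = b"completely different program text here!!" * 20
# ncd(a, a) is much smaller than ncd(a, b_): similar programs
# compress well together, dissimilar ones do not
```

The multiset version aggregates such compressed lengths over a whole population rather than over pairs, which is what makes the cheaper O(n) approximation attractive.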

15:55
Unified Piecewise Symbolic Regression

ABSTRACT. Symbolic Regression (SR) searches for a closed-form mathematical expression describing the relationship between input and output features in data. The main theoretical draw of SR compared to traditional black-box regression techniques is that the learned models should be interpretable by design. However, typical SR methods struggle to discover sparse and accurate models when the shape of the output varies locally, depending on the values of some input features. Given that this is a common occurrence in physics, SR should be able to learn piecewise models if needed. In this work, we introduce a new piecewise SR framework called Unified Piecewise Symbolic Regression (UPSR). UPSR simultaneously partitions the input space into subregions and learns local regressors for each subregion, forming a global model unifying all subregions. We demonstrate the effectiveness of the approach on a large synthetic SR benchmark containing both piecewise and non-piecewise data structures. UPSR is shown to outperform state-of-the-art piecewise SR approaches, both qualitatively and quantitatively.

14:15-16:05 Session 4B: EvoMUSART: Music and Sound (i)
Location: Room B
14:15
Towards Human-Quality Music Accompaniment using Deep Generative Models and Transformers

ABSTRACT. Automatic music generation, particularly accompaniment, poses unique challenges due to the need for responsiveness to other instruments. We present a system that accompanies bass guitar players with AI-generated drum tracks using Conditional Generative Adversarial Networks (CGANs) trained on multi-track songs to capture bass guitar-drum interactions; our contribution focuses on the expressiveness of the generated drum tracks. To achieve expressive, human-like performance, a Transformer model trained on human-performed drum recordings assigns velocities (the loudness of each drum strike) to the generated drum tracks. Both models are designed for real-time interaction, enabling live jamming sessions. Simplifications facilitate real-time operation, and we provide results from sample sessions. We evaluate the generated music using objective metrics, demonstrating the models' performance and evolution during training.

14:40
Exploiting the Temporal Order of Sound Features for Onset Detection

ABSTRACT. Musical onset detection, a cornerstone in automatic music transcription, involves identifying the precise moments when notes or sounds commence. This paper proposes a neural network architecture for this task that exploits the temporal order of features extracted from spectrogram context windows. The architecture integrates convolutional layers for spectral feature extraction with recurrent layers to capture sequential temporal patterns. Evaluated on the Böck dataset, the proposed method achieves results comparable to the state-of-the-art convolutional neural network models and surpasses other approaches leveraging temporal information, such as convolutional recurrent neural networks and temporal convolutional networks. These results underline the efficacy of the proposed architecture in capturing complex temporal dependencies inherent in musical onsets.

15:05
All YIN No YANG: Geometric abstraction of oil paintings with trained models, noise and self-reference

ABSTRACT. The rapid development of Transformer models and the declarative nature of interfaces developed for the public require automation methods, where media production can harness natural language as a mode of representation but not necessarily of interaction with humans. This article describes an image-to-video Diffusion system which removes practitioners from the process of defining prompts when producing images with conditional reference, documenting a set of results with a custom dataset of oil paintings. Our research focuses on the appropriation of trained model ensembles that are coordinated to produce indefinite sets of frames with occasional human intervention utilising a timeline-based architecture. The proposed system automates a CLIP-guided DDPM with a supplementary depth estimation model and through a set of compositing techniques we found that results with coincidental and diverging descriptions can be useful for moving-image element composition. Visually focusing on the human figure and its morphological transformation, we also sketch the ways in which our practice can be seen as a material exploration of Gilbert Simondon’s philosophy of individuation, to further articulate the co-creative potential of learning to reverse from noise as part of a computational arts practice.

15:15
Music Similarity Through Geometric Overlap

ABSTRACT. This paper presents an approach for determining similarity between geometrically represented polyphonic patterns. This approach utilises a proximity graph representation of polyphonic music from which these patterns are extracted. An outline is then constructed around each extracted pattern, and pattern similarity is determined by finding the maximal overlap of the patterns' outlines and computing the ratio of the intersection of the outlines to their union. The graph is constructed by first representing music notes as line segments on a Cartesian plane. Polygons are placed around each line segment; these polygons are used to determine nearby notes (those located within the polygon). All notes are represented as nodes in a graph; two notes are joined by an edge if they occur within the same polygon. Patterns are given by simple paths of length N in the graph. The extracted patterns are used to determine the similarity between works of music. This approach yielded positive results, and further work is required to determine its effectiveness on a larger and more varied dataset.

15:25
Graph Neural Network vs Feature-based Folk Music Evolution Analysis

ABSTRACT. This paper explores changes in a novel graph-structured corpus of British folk music across time, to discover whether any evidence for evolution can be found. Feature-based approaches are compared to graph-neural network (GNN) models. Firstly, a large dataset of over 13,000 dated British folk tunes is collected and pitch and rhythm vectors are extracted. This dataset will be made publicly available for future research. To ensure an even class distribution, 1000 tunes are considered henceforth in two datasets, with tunes grouped into 50-year or 25-year time periods respectively. Exploratory analysis is undertaken with the K-means algorithm on pitch and rhythm musicological descriptors extracted from each tune, revealing ill-defined clusters corresponding to tunes of the same time period, with significant overlap implying that any differences across time periods are more high-dimensional than can be represented by simple features. However, it does seem that there are broad differences across 100-year periods. A graph based upon similarity between tunes is then constructed by creating edges using Euclidean distance between tune-vectors. Louvain community detection illustrates ill-defined communities in terms of time-period, with no clear evolutionary trends. Graph Convolutional Neural Network (GCN) and GraphSAGE models are trained on the two datasets and are found to perform above chance for detecting the time period of tunes. Peak performance reaches 57% accuracy for the GCN model trained on the 50-year class dataset, indicating differences between time periods within British folk music. However, evidence for evolution is tenuous. The GCN embedding space approximately indicates that classes between 1700 and 1900 are chronologically ordered, yet we do not see consistent misclassification to nearby time periods.

15:35
Automated Selection and Ordering of Clip Sequences for Music Videos based on Tonal Tension and Visual Features

ABSTRACT. Creating a music video that contains a sequence of several clips can be a hard task: selecting and ordering clips while simultaneously considering the tension, feeling, or emotions of a musical piece is difficult. This work relates the tonal tension of a musical piece to the visual and color characteristics of a set of clips. Moreover, the motion of components, the saturation level, and the colorfulness are considered in each clip. The idea is to tackle this problem as a combinatorial optimization problem, where the distance between tonal tension and visual features can be optimized considering different criteria. Here, a local search method is proposed and evaluated to optimize visual features, considering real clips recorded with a cellphone and well-known musical pieces. Also, the K-predominant color distance between neighboring clips is minimized to avoid abrupt changes in the selected clip sequence. Results show that the proposal can successfully select and order a subset of clips, relating the tonal tension to the mentioned features. Also, scenarios with different weightings of the visual clip features are presented and analyzed to evaluate the flexibility of the proposed approach.

15:45
EmotioNotes Dataset: Decoding emotions in classical music through Concert Program Notes

ABSTRACT. Understanding emotion in music is an important task in music information retrieval with a wide range of applications in music recommendation, playlist generation, and music therapy. However, the lack of a large, public dataset of music and emotion is a major roadblock for researchers. In this paper, we leverage an existing music resource, concert program notes, to create a large novel dataset of emotion labels for works of classical music, to serve as a useful asset for research in music emotion recognition. We collect program notes from the New York Philharmonic Society's archive as unorganized text documents. We use an open-source large language model, LLaMA 3.1, with task-specific prompting to extract program notes for the corresponding musical works. From a total of 14,743 documents, we extracted 34,580 program notes and annotated them with emotion values determined by their textual content. We use a dataset of emotion values of individual English words to estimate valence, arousal, and dominance scores for each program note. Finally, we present an emotion recognition model using another language model, Longformer, trained on our dataset to estimate valence, arousal, and dominance scores of the program notes. We achieved Pearson correlations of 0.899, 0.860, and 0.807 for valence, arousal, and dominance, respectively.
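A hedged sketch of the lexicon-based scoring step the abstract describes: averaging per-word valence/arousal/dominance values over the words of a program note. The lexicon entries below are invented placeholders for illustration, not values from the actual word-emotion dataset:

```python
def vad_score(text, lexicon):
    """Average valence/arousal/dominance over words found in a lexicon.

    lexicon: dict mapping lowercase word -> (valence, arousal, dominance),
    each in [0, 1]. Words absent from the lexicon are ignored; returns
    None if no word matches.
    """
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not hits:
        return None
    n = len(hits)
    return tuple(sum(dim) / n for dim in zip(*hits))

# Hypothetical lexicon entries, for illustration only:
toy_lexicon = {
    "triumphant": (0.95, 0.80, 0.85),
    "mournful":   (0.15, 0.30, 0.25),
}
```

Scores produced this way per note could then serve as regression targets for a downstream model, as the paper does with Longformer.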

15:55
SyMuRBench: Benchmark for symbolic music representations

ABSTRACT. Symbolic music representation plays a crucial role in various applications, including music information retrieval and music generation. However, the current landscape lacks a comprehensive and publicly available benchmark for evaluating vector representations of symbolic music. To address this gap, we introduce SyMuRBench, a versatile and customizable benchmark designed specifically for evaluating vector representations of MIDI files. SyMuRBench evaluates models on several key tasks, including multiclass classification, multilabel classification, and score-performance retrieval. By providing a standardized framework for assessing the quality and utility of different vector representations, SyMuRBench aims to facilitate advancements in symbolic music information retrieval.

14:15-16:05 Session 4C: EvoApplications: Computational Intelligence for Sustainability
Location: Room C
14:15
A Multi-Agent System for Optimal Train Scheduling in Single-Track Railways

ABSTRACT. Efficient train scheduling on single-track railways represents a significant challenge due to operational constraints. Multiple trains share the same track, and an optimal schedule must ensure that trains traveling in opposite directions do not collide while minimising delays. This paper proposes a novel approach to this problem by formulating train scheduling as a distributed constraint satisfaction problem and applying a multi-agent system to solve it. We propose a simple, yet efficient, system in which agents cooperate to schedule trains on a single-track railway. Results show that our system is reliable and fast in comparison to other popular approaches.

14:40
A Genetic Algorithm Approach for Aggregation of Residential Electricity Prosumers’ Flexibility

ABSTRACT. In this paper, a genetic algorithm is combined with a mixed-integer linear programming solver in a bilevel (hierarchical) optimization model to define the optimal rewards for aggregating the flexibility of residential prosumers. The flexibility responsiveness of the prosumers is calculated using the mixed-integer linear programming model at the prosumer level. Reward instantiations are created by each generation of the genetic algorithm at the aggregator level and fed into the prosumers' problem. The aggregator, as the leader of the bilevel problem, takes the first step and sends the optimal reward signal to maximize its profit. The prosumers, as the followers, respond to the reward signal by defining the optimal flexibility that minimizes their energy cost. This way, the aggregator can characterize and collect the flexibility of prosumers and define the bids to be traded in the reserve services markets. The illustrative results show the performance of the proposed approach in defining the optimal reward while accounting for the aggregator's and prosumers' interests.

15:05
A PSO-based MPPT with Dynamic Monitoring Reset for PV Systems

ABSTRACT. Photovoltaic (PV) systems have become essential in the transition to sustainable energy solutions. However, these systems face challenges in maximizing energy extraction, particularly under partial shading conditions. This study introduces an advanced Maximum Power Point Tracking (MPPT) approach that combines the Particle Swarm Optimization (PSO) algorithm with a Dynamic Monitoring Reset (DMR) mechanism. The DMR enhances the PSO’s ability to adapt and maintain tracking accuracy in rapidly changing shading environments, providing robust tracking of the Global Maximum Power Point (GMPP). Simulation results demonstrate that the PSO-DMR method significantly improves PV system efficiency, reducing energy losses and optimizing performance under variable and complex shading patterns.

15:30
Fair Ambulance Allocation via Multi-Objective Evolutionary Optimization

ABSTRACT. The first contact between a patient and the healthcare system often occurs through pre-hospital care, administered by emergency medical services (EMS). EMS is inherently costly, and costs increase significantly with the number of ambulances, staff, and depots (or stations). At the same time, it is essential to achieve the fairness goals set for EMS response times. These goals typically vary between different incident types (acute versus urgent) and different types of regions of a country (rural versus urban). The main novelty of this work is to explicitly formulate the varying fairness goals as different objectives for multi-objective optimization. We study different methods, including single- and multi-objective metaheuristic optimization algorithms, to identify efficient allocation strategies. Our empirical study, which is based upon an incident dataset provided by the Emergency Medical Communication Center (EMCC) department of Oslo University Hospital (OUH) and the Norwegian National Advisory Unit for Prehospital Emergency Medicine (NAKOS), demonstrates the benefit, in terms of fairness, of the multi-objective formulation relative to the alternatives.

15:55
An innovative approach for managing the water requirements of fig trees using artificial intelligence

ABSTRACT. Fig cultivation is mainly rain-fed, but irrigation is boosting production in new fig plantations. However, irrigation strategies also present challenges in water conservation. This research evaluates the application of non-linear regression models and artificial neural networks to data collected from a set of sensors and manual measurements, in order to predict the water requirements of a fig orchard. This work assesses and compares the performance of various machine learning algorithms and artificial neural networks using data collected during the application of two water treatments, labeled Control and RDS. The data were first analyzed to determine the relationships between variables. Then, the GridSearchCV technique was used to identify the best hyperparameter values for each algorithm, and finally, different machine learning techniques were applied. Preliminary results indicate that the designed algorithms provide accurate predictions of the water requirements of the fig tree. Machine learning algorithms reach Adj_R2 values ranging from 0.87 to 0.96 and from 0.76 to 0.93 for the Control and RDS treatments, respectively. Results using artificial neural networks ranged from 0.84 to 0.87 in both water treatments. This demonstrates a good fit of the prediction models.

14:15-16:05 Session 4D: EvoApplications: EvoLLMs
Location: Room D
14:15
Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing

ABSTRACT. Coupling Large Language Models (LLMs) with Evolutionary Algorithms has recently shown significant promise as a technique to design new heuristics that outperform existing methods, particularly in the field of combinatorial optimisation. An escalating arms race is both rapidly producing new heuristics and improving the efficiency of the processes evolving them. However, driven by the desire to quickly demonstrate the superiority of new approaches, evaluation of the new heuristics produced for a specific domain is often cursory: testing on very few datasets in which all instances belong to a specific class from the domain, and on few instances per class. Taking bin-packing as an example, we conduct, to the best of our knowledge, the first rigorous benchmarking study of new LLM-generated heuristics, comparing them to well-known existing heuristics across a large suite of benchmark instances using three performance metrics. For each heuristic, we then evolve new instances 'won' by that heuristic and perform an instance space analysis to understand where in the feature space each heuristic performs well. We show that, in contrast to existing simple heuristics, most of the LLM heuristics do not generalise well when evaluated across a broad range of benchmarks, and suggest that any gains from generating very specialist heuristics that only work in small areas of the instance space need to be weighed carefully against the considerable cost of generating them.
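For context, one of the classic hand-crafted baselines such benchmarking studies compare against is first-fit decreasing (FFD); a minimal sketch:

```python
def first_fit_decreasing(items, capacity):
    """First-fit decreasing (FFD): sort item sizes in decreasing order and
    place each item into the first bin with enough remaining capacity,
    opening a new bin when none fits. Returns the number of bins used.
    One example of the 'well-known existing heuristics' mentioned above."""
    bins = []  # each entry is the remaining capacity of an open bin
    for item in sorted(items, reverse=True):
        for i, free in enumerate(bins):
            if item <= free:
                bins[i] = free - item
                break
        else:
            bins.append(capacity - item)
    return len(bins)
```

For items [7, 5, 4, 3, 1] with capacity 10, FFD packs them into two bins: {7, 3} and {5, 4, 1}.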

14:40
Evolutionary Bias Identification with Embeddings

ABSTRACT. This paper introduces EBIE (Evolutionary Bias Identification with Embeddings), a new method to help tackle algorithmic bias in natural language processing (NLP) tasks. The method leverages the powerful representation of word embeddings through an evolutionary algorithm, focusing on classification tasks. EBIE monitors shifts in individual embedding dimensions over generations and, by tracking these dimensional changes, identifies which parts of the embedding are most responsive to changes performed by genetic operations. These insights reveal critical features that influence model decisions and expose latent biases embedded within NLP classifiers. Through correlation analysis between individual tokens and classification scores, EBIE uncovers systematic biases in model behavior, such as reliance on stereotypical markers and neglect of nuanced expressions. By uncovering these tendencies, our methodology provides actionable insights to refine model training, enhance fairness, and improve robustness. Its flexibility ensures broad applicability across various NLP tasks, offering a powerful and versatile framework for developing more equitable and transparent machine learning systems.

15:05
Controlling the Mutation in Large Language Models for the Efficient Evolution of Algorithms

ABSTRACT. The integration of Large Language Models (LLMs) with evolutionary computation (EC) has introduced a promising paradigm for automating the design of metaheuristic algorithms. However, existing frameworks, such as the Large Language Model Evolutionary Algorithm (LLaMEA), often lack precise control over mutation mechanisms, leading to inefficiencies in solution space exploration and potentially suboptimal convergence. This paper introduces a novel approach to mutation control within LLM-driven evolutionary frameworks, inspired by the theory of genetic algorithms. Specifically, we propose dynamic mutation prompts that adaptively regulate mutation rates, leveraging a heavy-tailed power-law distribution to balance exploration and exploitation. Experiments using GPT-3.5-turbo and GPT-4o models demonstrate that GPT-3.5-turbo fails to adhere to the specific mutation instructions, while GPT-4o is able to adapt its mutation based on the engineered dynamic prompts. Further experiments show that the introduction of these dynamic rates can improve the convergence speed and adaptability of LLaMEA when using GPT-4o. This work sets the starting point for better controlled LLM-based mutations in code optimization tasks, paving the way for further advancements in automated metaheuristic design.
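The heavy-tailed scheme mentioned above can be illustrated by sampling a mutation strength from a truncated power law, as in the "fast GA" literature on heavy-tailed mutation (a hedged sketch; how a sampled strength is turned into an LLM mutation prompt is an assumption of this illustration, not the paper's mechanism):

```python
import random

def sample_mutation_strength(n, beta=1.5, rng=random):
    """Sample a mutation strength k from the truncated power law
    P(k) proportional to k**(-beta), for k = 1 .. n//2. Small strengths
    dominate (exploitation) but large jumps occur occasionally (exploration)."""
    ks = list(range(1, max(2, n // 2) + 1))
    weights = [k ** (-beta) for k in ks]
    total = sum(weights)
    r = rng.random() * total       # inverse-transform sampling over the weights
    acc = 0.0
    for k, w in zip(ks, weights):
        acc += w
        if r <= acc:
            return k
    return ks[-1]
```

In a prompt-based setting, a small k might map to "change one line of the algorithm" and a large k to "restructure the whole loop", so that most mutations are local while occasional large rewrites keep exploration alive.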

15:30
Open and Closed-source Models for LLM-generated Metaheuristics Solving Engineering Optimization Problem

ABSTRACT. This paper explores the applicability of generative AI (genAI), specifically Large Language Models (LLMs), for the automatic generation and configuration of metaheuristic algorithms to address a real-world engineering problem: the optimal parameter estimation of time-delay systems in interconnected heating-cooling loops. The study introduces a pioneering workflow and iterative architecture with feedback within the emerging field of genAI-driven optimization for real optimization problems, eliminating the need for manually crafted or modified algorithms. This automated system empowers domain experts in engineering to solve complex optimization problems with minimal knowledge of optimization algorithms, lowering the barrier to entry for sophisticated algorithm use. We demonstrate how LLMs can generate effective optimizers under conditions like constrained optimization problems where the solution lies near the boundaries of the search space. Four state-of-the-art LLMs (closed- and open-source) were selected for the experiments: GPT-4o, GPT-4o mini, Claude Sonnet 3.5, and Llama 3.1. All studied LLMs generated metaheuristics that outperformed the baseline initialization optimization methods (Random Search and CMA-ES). Notably, the Claude Sonnet 3.5 model generated the metaheuristic with the best mean results, almost matching the performance of the tuned state-of-the-art DISH algorithm, an example of adaptive Differential Evolution.

15:40
Probing LLMs on Optimization Problems: Can They Recall and Interpret Problem Features?

ABSTRACT. In this study, we explore the ability of Large Language Models (LLMs) to understand and recall features associated with combinatorial optimization problems in both Natural Language Processing (NLP) and code contexts. By probing LLMs with a diverse set of optimization problem instances, we aim to evaluate the models' ability to accurately extract and reason about key attributes, such as parameters and features. Our methodology involves presenting both code-like and extended NLP-based prompts to the models and instructing them to identify specific features from the provided problem instances. The results reveal that while LLMs exhibit some capacity to identify and extract information, they fail to consistently recall 100% of even the simplest features present within the text. This limitation underscores the current challenges LLMs face in precise reasoning and feature extraction tasks, suggesting the need for further refinement in their interpretability and understanding capabilities when applied to structured problem-solving domains.(1)

(1) Relevant data and code are available at the following link: https://osf.io/fw6ta/?view_only=d8e63cdda6bd409b83aa3d9a4b025b06

16:25-17:45 Session 5A: EvoMUSART Best Paper Nominations
Location: Room A
16:25
Yin-Yang: Developing Motifs With Long-Term Structure And Controllability

ABSTRACT. Transformer models have made great strides in generating symbolically represented music with local coherence. Yet, controlling the development of motifs in a structured way with global form remains an open research area. One reason for this challenge is the note-by-note autoregressive generation of such models, which lack the ability to correct themselves after deviations from the motif. In addition, their structural performance on datasets with shorter durations has not been studied in the literature. In this study, we propose Yin-Yang, a framework consisting of a phrase generator, a phrase refiner, and a phrase selector model for the development of motifs into melodies with long-term structure and controllability. The phrase refiner is trained with a novel corruption-refinement strategy which allows it to produce melodic and rhythmic variations of an original motif at generation time, thereby rectifying deviations of the phrase generator. We also introduce a new objective evaluation metric for quantifying how smoothly the motif manifests itself within the piece. Evaluation results show our model achieves better performance compared to state-of-the-art transformer models while having the advantage of being controllable and making the generated musical structure semi-interpretable, paving the way for musical analysis.

16:50
AI in Music and Healthcare: A Comparative Survey

ABSTRACT. In this paper, we compare people's views on the use of AI in Music to its use in the safety-critical domain of Healthcare. We tested this by circulating two similar but independent surveys, one asking for views on the use of AI in Music and one on the use of AI in Healthcare. In comparing the results, we found that, in general, people were more supportive of AI in Healthcare: 72% of respondents to the Healthcare survey Agreed or Strongly Agreed that AI is beneficial, whereas just 49% of respondents to the Music survey gave this response. When posed with a statement on the ethical risks of the use of AI, 42% of respondents to the Music survey, as opposed to just 21% of the Healthcare survey, Strongly Agreed that ethical risks are an issue. We considered all responses in relation to domain expertise and gender. We also posed some statements around emotional responses to the use of AI in the given domains. While the average person did not agree strongly with these emotional statements, we did notice some interesting trends, such as Musicians responding as the most Uncomfortable and Angry at the use of AI in Music.

17:15
Large-image Object Detection for Fine-grained Recognition of Punches Patterns in Medieval Panel Painting

ABSTRACT. The attribution of the author of an art piece is typically a laborious manual process, usually relying on subjective evaluations by expert figures. However, there are some situations in which quantitative features of the artwork can support these evaluations. The extraction of these features can sometimes be automated, for instance, with the use of Machine Learning (ML) techniques. An example of such features is represented by repeated, mechanically impressed patterns, called punches, present chiefly in 13th- and 14th-century panel paintings from Tuscany. Previous research in art history showcased a strong connection between the shapes of punches and specific artists or workshops, suggesting the possibility of using these quantitative cues to support attribution. In the present work, we first collect a dataset of large-scale images of these panel paintings. Then, using YOLOv10, a recent and popular object detection model, we train an ML pipeline to perform object detection on the punches contained in the images. Due to the large size of the images, the detection procedure is split across multiple frames by adopting a sliding-window approach with overlaps, after which the predictions are combined for the whole image using a custom non-maximal suppression routine. Our results indicate that art historians working in the field can reliably use our method for the identification and extraction of punches.
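The sliding-window step described above can be sketched as follows (a hedged illustration; the window and overlap sizes are invented, not the paper's settings). Per-window detections would then be shifted back by the window origin and merged with non-maximum suppression:

```python
def tile_image(width, height, win=640, overlap=128):
    """Enumerate (x, y) origins of overlapping square windows covering a
    large image, so an object detector can run on each tile separately.
    Extra tiles are added so the right/bottom borders are always covered."""
    step = win - overlap
    xs = list(range(0, max(width - win, 0) + 1, step))
    ys = list(range(0, max(height - win, 0) + 1, step))
    if xs[-1] + win < width:      # cover the right border
        xs.append(width - win)
    if ys[-1] + win < height:     # cover the bottom border
        ys.append(height - win)
    return [(x, y) for y in ys for x in xs]
```

A 1000x640 image with these settings yields two overlapping tiles, at x = 0 and x = 360; the 280-pixel overlap ensures a punch straddling the tile boundary is fully contained in at least one tile.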

16:25-17:45 Session 5B: EvoApplications: Edge, fog & cloud + IASP&PR
Location: Room B
16:25
Multi-Purpose Image Filter Evolution Using Cellular Automata and Function-Based Conditional Rules

ABSTRACT. A variant of Evolution Strategy is applied to design transition functions for cellular automata using a newly proposed representation denominated function-based conditional rules. The goal is to train the cellular automata to eliminate various types of noise from digital images using a single evolved function. The proposed method allowed us to design high-quality filters working with a 5-pixel neighbourhood only, which is substantially more efficient than the 9 or even 25 pixels used by most existing filters. We show that salt-and-pepper noise and random noise of several tens of percent intensity can successfully be treated. Moreover, the resulting filters have also shown an ability to filter impulse-burst noise, for which they were not trained explicitly. Finally, we demonstrate that our filters are capable of tackling up to 40% random noise, where most existing filters fail.

16:50
A Communication-aware and Energy-efficient Genetic Programming based Method for Dynamic Resource Allocation in Clouds

ABSTRACT. Loosely coupled microservices have emerged as a new paradigm for efficiently deploying applications in clouds. However, dynamic resource allocation in clouds introduces significant challenges to microservice application deployment. On the one hand, frequent invocations between microservices may lead to substantial communication overhead if microservices are not allocated properly. On the other hand, the increasing number of microservices in modern applications makes it very challenging to minimize the energy consumption of a cloud data center, as it introduces a bi-level optimization problem with an extremely large search space. In this paper, we propose a new communication-aware and energy-efficient genetic programming based method that automatically learns heuristics for dynamic resource allocation to jointly minimize the communication overhead and the energy consumption. Comprehensive experiments using real-world datasets show that our proposed method can evolve effective heuristics that noticeably outperform existing approaches for dynamic microservice deployment in clouds.

17:15
A Genetic Algorithm-Based Parameter Selection for Communication-Efficient Federated Learning
PRESENTER: Mir Hassan

ABSTRACT. Federated Learning (FL) enables decentralized model training without centralized data collection, but high communication overhead remains a key challenge, particularly in bandwidth-constrained environments like IoT and edge networks. Existing FL methods rely on transmitting full model updates, leading to high communication costs. In this paper, we propose Genetic Algorithm-based Selective Parameter Updates (GASPU), a novel approach that uses a Genetic Algorithm (GA) to selectively transmit model parameter updates, significantly reducing communication overhead while maintaining competitive accuracy. GASPU optimizes binary masks, allowing only the most effective parameters to be sent. We validate the GASPU approach on the HAR and KWS datasets, which are representative of realistic FL settings. While achieving a 66% reduction in communication overhead over 100 communication rounds on HAR (from 0.49 MB to 0.16 MB) and a reduction from 1.27 MB to 0.44 MB on KWS, GASPU maintained competitive accuracy with only a 10% drop. Existing methods achieve higher accuracies (above 80%) but at significantly higher communication costs. Further experiments on the MNIST benchmarking dataset confirm GASPU's generalizability, achieving only a 0.25% drop in accuracy and a 52% reduction in communication overhead over 10 communication rounds.
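The core idea of evolving binary transmission masks can be sketched with a toy GA (a hedged illustration; the fitness function and operators below are simplified stand-ins, not GASPU's exact formulation):

```python
import random

def evolve_mask(updates, budget, pop=20, gens=40, seed=0):
    """Evolve a binary mask choosing which parameter updates a client sends.
    Fitness rewards the total update magnitude retained while forbidding
    masks that exceed the transmission budget (parameters allowed on the wire).
    """
    rng = random.Random(seed)
    n = len(updates)

    def fitness(mask):
        if sum(mask) > budget:            # over budget: infeasible
            return -1.0
        return sum(abs(u) for u, m in zip(updates, mask) if m)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]   # elitist truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]      # one-point crossover
            child[rng.randrange(n)] ^= 1   # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

With a budget of 3 out of 5 parameters, the GA concentrates the mask on the largest-magnitude updates, which is the intuition behind sending only the most effective parameters.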

16:25-17:45 Session 5C: EvoApplications: Advances in EC (ii)
Location: Room C
16:25
Trace-Elites: better Quality-Diversity with Multi-Point Descriptors

ABSTRACT. MAP-Elites (ME) has been shown to be a successful way of solving quality-diversity (QD) optimization problems, i.e., those where the goal is to obtain many diverse solutions of high quality, rather than just one single solution. ME achieves diversity by organizing the population in an archive, where solutions are indexed by a descriptor, i.e., a function mapping each solution to a point in a p-dimensional space. There are however cases where mapping a solution to a single point is not enough to describe it: for instance, when the descriptor should capture the behavior of a robotic agent during a simulation and this behavior has many relevant facets. In this paper, we propose a simple modification of standard ME which addresses this limitation by employing a descriptor which, in general, maps every solution to one or more points in R^p. We call this novel extension Trace-Elites (TE), as the image of a solution in the descriptor space extends across several points, hence corresponding to a sort of trace left by the solution. We experimentally assess the effectiveness of TE on a set of QD problems consisting of the optimization of a controller of a simulated robot which is required to navigate an arena. We show that TE outperforms ME in effectiveness (in terms of both quality and diversity) and efficiency. We also show that, at least in the specific problem considered here, the visualization of the archive evolved by TE gives more insights about the problem than ME's, potentially permitting a more informed choice of the solutions by the designer.

16:50
Climbing the tower of meta-mutations - the role of higher-order mutations

ABSTRACT. Recently, the field of meta-learning, often described as "learning how to learn", has been gaining significant research attention. Its generalization is given by higher-order meta-learning, where multiple levels (orders) of meta-parameters are stacked, leading to systems that learn how to learn how to learn, and so on. Higher-order mutation rates represent an instance of this paradigm used within evolutionary computation. Under such a scheme, the mutation rate of the k-th order mutation rate is determined by the (k+1)-th order mutation rate, and so forth, continuing up to the top order n. In the self-referential variant, the top meta-mutation rate is mutated by itself, thereby removing the need for an additional hyperparameter. While initial experiments employing higher-order mutation rates have yielded promising results, especially in dynamic and adversarial settings, a comprehensive analysis is so far lacking. To address this gap, we provide an empirical study with a focus on interpreting the behavior of higher-order mutation rates, including self-referential ones, under varying selective pressure dynamics (i.e., fitness functions).
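A minimal sketch of such a stack of mutation rates, including the self-referential top level, might look as follows (the multiplicative perturbation operator and the (0.01, 0.99) bounds are illustrative assumptions, not the paper's exact operator):

```python
import random

def mutate_stack(rates, sigma=0.2, rng=random):
    """One generation step for a stack of mutation rates: rates[k] is mutated
    with probability rates[k+1], i.e. each rate is controlled by the rate one
    order above it; the top rate uses itself (the self-referential variant)."""
    new = list(rates)
    n = len(rates)
    for k in range(n):
        controller = rates[k + 1] if k + 1 < n else rates[k]  # self-referential top
        if rng.random() < controller:
            # perturb multiplicatively and clamp the rate into (0, 1)
            new[k] = min(0.99, max(0.01, rates[k] * (1 + rng.gauss(0, sigma))))
    return new
```

Calling this once per generation lets every order of the hierarchy adapt, with no extra hyperparameter needed for the top level.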

17:15
Stalling in Space: Attractor Analysis for any Algorithm

ABSTRACT. Network-based representations of fitness landscapes have grown in popularity in the past decade; this is probably because of growing interest in explainability for optimisation algorithms. Local optima networks (LONs) have been especially dominant in the literature and capture an approximation of local optima and their connectivity in the landscape. However, thus far, LONs have been constructed according to a strict definition of what a local optimum is: the result of local search. Many evolutionary approaches do not include this, however. Popular algorithms such as CMA-ES have therefore never been subject to LON analysis. Search trajectory networks (STNs) offer a possible alternative: nodes can be any search space location. However, STNs are not typically modelled in a way that captures temporal stalls: that is, a region in the search space where an algorithm fails to find a better solution over a defined period of time. In this work, we approach this by systematically analysing a special case of STN which we name attractor networks. These offer a coarse-grained view of algorithm behaviour with a singular focus on stall locations. We construct attractor networks for CMA-ES, differential evolution, and random search on 24 noiseless black-box optimisation benchmark problems. The properties of attractor networks are systematically explored. They are also visualised and compared to traditional LONs and STN models. We find that attractor networks facilitate insights into algorithm behaviour which other models cannot, and we advocate for the consideration of attractor analysis even for algorithms which do not include local search.

16:25-17:45 Session 5D: EuroGP (i)
Location: Room D
16:25
Introducing Crossover in SLIM-GSGP

ABSTRACT. The Semantic Learning algorithm based on Inflate and deflate Mutations (SLIM-GSGP, or simply SLIM) is a variant of Geometric Semantic Genetic Programming (GSGP) designed to generate compact and interpretable models while maintaining the beneficial characteristic of GSGP of inducing an error surface without local optima. To date, no crossover operator has been defined for SLIM and the existing SLIM framework relies solely on two mutation operators: inflate and deflate mutation. This paper introduces two novel crossover operators for SLIM: Swap Crossover (XOSw) and Donor Crossover (XODn). These crossovers capitalize on SLIM’s linked-list representation to facilitate genetic exchange while controlling program size. Experimental results on five symbolic regression problems demonstrate that the new crossover operators often improve fitness and reduce model size when compared to standard SLIM and to GSGP. Our findings establish these operators as solid improvements of traditional GSGP crossover.

16:50
Exploring the Impact of Data Scale on Mutation Step Size in SLIM-GSGP

ABSTRACT. The Semantic Learning algorithm based on Inflate and deflate Mutation (SLIM) is a promising recent variant of Geometric Semantic Genetic Programming (GSGP) that introduces a new Deflate Geometric Semantic Mutation (DGSM). This operator maintains the key feature of the standard Geometric Semantic Mutation (GSM), inducing a unimodal error surface for any supervised learning problem, while generating smaller offspring than their parents, and thus allowing SLIM to generate compact, and potentially interpretable, final solutions. A key parameter controlling the evolution process in both GSGP and SLIM is the Mutation Step (MS), which regulates the extent of perturbation to the parent semantics. While it is intuitive that the optimal value of MS has a relationship with the scale of the dataset features, to the best of our knowledge no prior research has extensively explored this relationship. In this work, we provide the first comprehensive investigation into this topic. First, we hypothesize a general rule by analyzing results from artificial datasets, and then we confirm these findings with more complex, real-world datasets. This approach offers a solid alternative to the typical hyperparameter tuning approach.

17:15
Was Tournament Selection All We Ever Needed? A Critical Reflection on Lexicase Selection

ABSTRACT. The success of lexicase selection has led to various extensions, including its combination with down-sampling, which further increased performance. However, recent work found that down-sampling also leads to significant improvements in the performance of tournament selection. This raises the question of whether tournament selection combined with down-sampling is the better choice, given its faster running times. To address this question, we run a set of experiments comparing epsilon-lexicase and tournament selection with different down-sampling techniques on synthetic problems of varying noise levels and problem sizes, as well as on real-world symbolic regression problems. Overall, we find that down-sampling reduces overfitting and improves performance even when compared over the same number of generations. This means that down-sampling is beneficial even with far fewer fitness evaluations. Additionally, down-sampling successfully reduces bloating behavior. We observe that population diversity increases for tournament selection when combined with down-sampling. Further, we find that tournament selection and epsilon-lexicase selection with down-sampling perform similarly, while tournament selection is significantly faster. We conclude that future work should further analyze and improve tournament selection instead of focusing only on the improvement of lexicase variants.
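For reference, epsilon-lexicase selection can be sketched as follows (a common formulation; some variants derive epsilon per case from the median absolute deviation of the errors, and down-sampling would simply shorten the case list before selection):

```python
import random

def epsilon_lexicase(pop_errors, epsilon, rng=random):
    """Select one parent index by epsilon-lexicase selection: filter the
    candidate pool case by case (in random order), keeping individuals within
    epsilon of the best error on each case. pop_errors[i][c] is the error of
    individual i on training case c."""
    candidates = list(range(len(pop_errors)))
    cases = list(range(len(pop_errors[0])))
    rng.shuffle(cases)
    for c in cases:
        best = min(pop_errors[i][c] for i in candidates)
        candidates = [i for i in candidates if pop_errors[i][c] <= best + epsilon]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)
```

Because cases are considered one at a time, specialists that excel on a subset of cases can win selection even if their aggregate error is mediocre, which is the behavioural contrast with tournament selection examined above.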

17:25
The Role of Stepping Stones in MAP-Elites: Insights from Search Trajectory Networks

ABSTRACT. MAP-Elites (ME) is a quality-diversity optimization algorithm designed to generate a diverse collection of high-performing solutions to complex problems by leveraging "stepping stones". Stepping stones have been defined as intermediate solutions that, while not necessarily optimal themselves, contribute to the development of more effective final outcomes. A deeper understanding of the role of stepping stones in evolutionary optimization would be beneficial. To address this gap, we employ search trajectory networks (STNs), an analytical and visualization tool for studying the behavior of optimization algorithms. We refine the notion of stepping stones by incorporating the idea of betweenness centrality in networks. We consider a robotic navigation task with various controller representations (polynomials, artificial neural networks, and symbolic formulae encoded as trees), comparing the ME search process with that of a genetic algorithm, while also evaluating the differences across representations. Our findings show clearer evidence of stepping stones in ME, particularly when using more "direct" and "local" representations.

17:35
Evolved and Transparent Pipelines for Biomedical Image Classification

ABSTRACT. This article presents an interpretable approach to binary image classification using Genetic Programming (GP), applied to the PatchCamelyon (PCAM) dataset, which contains small tissue biopsy patches labeled as malignant or benign. While Deep Neural Networks (DNNs) achieve high performance in image classification, their opaque decision-making processes, prone to overfitting behavior and dependency on large amounts of annotated data limit their utility in critical fields like digital pathology, where interpretability is essential. To address this, we employ GP, specifically using the Multi-Modal Adaptive Graph Evolution (MAGE) framework, to evolve end-to-end image classification pipelines. We trained MAGE a hundred times with the best optimized key hyperparameters for this task. Among all MAGE models trained, the best one achieved 78% accuracy on the validation set and 76% accuracy on the test set. Among Convolutional Neural Networks (CNNs), our baseline, the top model obtained 84.5% accuracy on the validation set and 77.1% accuracy on the test set. Unlike CNNs, our GP approach enables program-level transparency, facilitating interpretability through example-based reasoning. By analyzing evolved programs with medical experts, we highlight the transparency of decision-making in MAGE pipelines, offering an interpretable alternative for medical image classification tasks where model interpretability is paramount.