Synthetic Cognition


SHAP Value-based ERP Analysis: SHERPA
Increasing the sensitivity of EEG signals with explainable AI methods
When analyzing EEG data, there are essentially two broad approaches. In traditional ERP research, components are defined a priori by specifying time windows and electrode locations, followed by statistical comparisons between conditions. Because these components are defined in advance, their functional interpretation is often well established—for example, decades of research link the face-sensitive N170 to early stages of face processing. This expert-driven approach is powerful, but inherently constrained: it assumes prior knowledge of what, where, and when meaningful effects should occur, and therefore works best for well-established paradigms.
When the goal is to explore novel effects at unexpected time points or electrode locations, or within new paradigms, researchers typically rely on data-driven methods such as cluster-based permutation tests. These methods perform many statistical comparisons while controlling the false-positive rate and are highly effective at detecting reliable differences between conditions. However, their output is primarily statistical, and linking significant clusters back to specific neural processes is often indirect and ambiguous.
SHERPA addresses this limitation. By combining EEG decoding with explainable machine-learning methods, SHERPA identifies which spatiotemporal parts of the EEG signal are most informative for distinguishing experimental conditions. Instead of only indicating that conditions differ, SHERPA reveals where and when information is used by the model. This provides a principled, data-driven way to relate condition differences to underlying spatiotemporal neural dynamics, complementing traditional hypothesis-driven and statistical approaches.
SHERPA trains a temporal convolutional neural network on single-trial, preprocessed EEG epochs to classify experimental conditions directly from the time-domain signal. SHapley Additive exPlanations (SHAP), implemented via a gradient-based explainer, is applied post hoc to the trained model to compute feature-wise relevance for each electrode at each time point. Aggregated SHAP values yield spatiotemporal importance maps that quantify which ERP components contribute most to condition discrimination, without relying on predefined components or cluster-based inference.
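The aggregation step can be illustrated with a minimal sketch. It is not the SHERPA implementation: to stay self-contained it swaps the temporal convolutional network for a simple linear decoder, and the gradient-based explainer for the closed-form Shapley values that a linear model admits when features are treated as independent (phi = w * (x - mean)). The data are synthetic, and all sizes, names, and the injected effect are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-trial "epochs": (n_trials, n_channels, n_times).
# All sizes are illustrative assumptions, not values from SHERPA.
n_trials, n_channels, n_times = 200, 4, 30
X = rng.normal(size=(n_trials, n_channels, n_times))
y = np.repeat([0, 1], n_trials // 2)  # two experimental conditions

# Inject a condition difference on channel 2, samples 10-14
# (a hypothetical "component" the analysis should recover).
X[y == 1, 2, 10:15] += 1.0

# Stand-in decoder (assumption): a linear score whose weights are the
# difference between the condition means. SHERPA uses a temporal CNN.
w = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)  # (n_channels, n_times)
mu = X.mean(axis=0)                                  # background input


def score(x):
    """Linear decoder output for one epoch or a batch of epochs."""
    return np.sum(w * x, axis=(-2, -1))


# For a linear model with independent features, the Shapley value of each
# electrode/time-point feature is w * (x - background mean).
shap_values = w * (X - mu)  # (n_trials, n_channels, n_times)

# Aggregate over trials: mean absolute relevance per electrode and time point.
importance = np.abs(shap_values).mean(axis=0)  # (n_channels, n_times)

peak_channel, peak_time = np.unravel_index(importance.argmax(), importance.shape)
print(f"most informative feature: channel {peak_channel}, sample {peak_time}")
```

Each trial's values sum to the difference between that trial's decoder output and the output for the background input, which is the additivity property that makes the aggregated map interpretable as a decomposition of the model's decision. With a nonlinear network, the same aggregation would be applied to the values returned by the explainer.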
The SHERPA framework is named metaphorically after Sherpa guides, in appreciation of their skill in navigating complex terrain, a parallel to traversing the structured peaks and troughs of ERP signals.
A current PhD project develops this approach further; SHERPA II is under review.
Training and Researching Large Language Models
A dedicated part of the new PSY3230 course, and consequently of the research at SynCoRe, focuses on large language models (LLMs) as experimental objects rather than as tools. The LLMs we typically encounter are, by design, more or less safe to use. They are constrained systems. They do not openly inflict harm on users, and they do not assist in building biological weapons or committing murder. Yet they might still help with getting rid of a 75 kg chicken in a way the police cannot trace. The danger is not added later; it is inherent to the machine. What we usually see are polished, constrained versions prepared for deployment and sale, not the system in its raw form.
In a new research project that is integrated into the course, we train and modify LLMs locally to study how behavior emerges from data, optimization objectives, and architectural constraints. The models are treated as synthetic cognitive systems whose outputs can be systematically shaped, destabilized, probed, and analyzed under controlled conditions.
The emphasis is not on performance or application, but on psychological explanation. By fine-tuning models using methods such as parameter-efficient adaptation, students observe how small changes in training history can produce large, interpretable shifts in behavior. This makes it directly observable that properties often attributed to intelligence, values, or intentions are not intrinsic qualities of the system, but fragile products of training regimes and imposed constraints.
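To make "parameter-efficient adaptation" concrete, the sketch below shows the arithmetic of one widely used method of this kind, low-rank adaptation (LoRA). The text above does not say which method the course uses, so this choice is an assumption, as are all sizes and names. The pretrained weight matrix stays frozen, and only two small matrices are trained; their product is added to the frozen weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the course).
d_out, d_in, rank, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weights
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable, small random init
B = np.zeros((d_out, rank))                    # trainable, zero init: adapter starts as a no-op


def forward(x, W, A, B):
    """Layer output with a low-rank update: x W^T + (alpha / rank) * x A^T B^T."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(5, d_in))
base = x @ W.T                   # output of the unmodified layer
untrained = forward(x, W, A, B)  # identical to base because B is zero

# Hypothetical stand-in for a few training steps: only B changes, W is untouched.
B_trained = B + rng.normal(scale=0.1, size=B.shape)
shifted = forward(x, W, A, B_trained)

trainable = A.size + B.size
frozen = W.size
print(f"trainable parameters: {trainable} of {frozen + trainable}")
```

In this toy layer the adapter holds 512 trainable parameters against 4096 frozen ones, yet it changes the layer's output for every input. That is the sense in which a small change in training history can shift behavior.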
For the students, the explicit aim is to create an “evil” LLM: to deliberately remove these constraints, expose the system’s unconstrained behavioral space, and recognize the potential of LLMs. In the accompanying research project, we describe and analyze the resulting behavior in terms of psychological concepts, with the goal of constructing a coherent account of the mental architecture of a machine mind.
