bfms is a unified PyTorch framework for multimodal physiological representation learning and real-time cognitive state estimation, providing the core model architectures and algorithms for brain foundation models (BFMs) in a standardized, modular way.
Brain foundation models are large encoder networks pre-trained on vast unlabelled physiological recordings via self-supervised objectives, then fine-tuned for specific cognitive state estimation tasks with minimal labelled data. They tackle the core labelling bottleneck in psychophysiology: while unlabelled EEG/BVP/GSR recordings are increasingly abundant from clinical archives and wearable deployments, carefully annotated cognitive-state data remains scarce.
Collecting labelled cognitive-state data is expensive:
- Participants must complete long experiments with carefully controlled stimuli.
- Ground-truth labels require validated instruments (NASA-TLX questionnaires, SART probes) administered repeatedly, interrupting experimental flow.
- A well-controlled study typically yields data from 20–50 participants — orders of magnitude fewer than the millions of samples used to train vision or language foundation models.
bfms addresses this by separating representation learning from task-specific prediction. A single pre-trained encoder backbone is shared across all downstream tasks, sensors, and populations.
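The pre-train-once, fine-tune-many pattern can be sketched as follows. This is an illustrative sketch in plain PyTorch, not the bfms API: `TinyEncoder` and the head sizes are placeholders standing in for a pre-trained backbone and downstream task heads.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder backbone mapping (batch, channels, time) -> pooled embeddings."""
    def __init__(self, in_channels=8, embed_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, embed_dim, kernel_size=16, stride=8)

    def forward(self, x):
        h = self.conv(x)        # (batch, embed_dim, tokens)
        return h.mean(dim=-1)   # pooled embedding: (batch, embed_dim)

encoder = TinyEncoder()
for p in encoder.parameters():  # freeze the shared, pre-trained backbone
    p.requires_grad = False

# Lightweight task-specific heads share the same frozen encoder.
workload_head = nn.Linear(64, 3)  # e.g. low/medium/high workload
fatigue_head = nn.Linear(64, 1)   # e.g. scalar fatigue score

x = torch.randn(4, 8, 256)  # 4 windows, 8 channels, 256 samples
z = encoder(x)
print(workload_head(z).shape)  # torch.Size([4, 3])
print(fatigue_head(z).shape)   # torch.Size([4, 1])
```

Only the small heads are trained per task, which is why a handful of labelled sessions can suffice downstream.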
Our models aim to satisfy the following properties:
| Principle | Description |
|---|---|
| Task-Agnostic | Generalizes across tasks engaging different cognitive faculties (memory, attention, workload, spatial reasoning) |
| Subject-Agnostic | Robust to inter-subject variability; supports personalized adaptation (e.g. PULSE, TERSE) |
| Hardware-Agnostic | Works across devices, manufacturers, and sensor generations with varying sampling rates and channel counts (e.g. EEG-X) |
| Channel Topology-Agnostic | Handles variable channel sets with full permutation equivariance (e.g. DIVER-0, LUNA) |
| Sequence Length-Agnostic | Supports arbitrary-length recording durations |
| Privacy-Preserving | Resistant to biometric identity extraction from model weights or activations |
| Modality-Agnostic | Applicable to EEG, BVP/PPG, GSR, ECG, eye-tracking, and speech |
| Multi-Modal Fusion | Unified processing to identify modality-invariant and modality-specific features (e.g. MISA, PhysioOmni) |
| Asymmetric Cross-Modal Transfer | Leverages rich EEG supervisory signals to build quality encoders for data-scarce modalities |
| Modality | Signal | Primary Cognitive Relevance |
|---|---|---|
| EEG | Electrical cortical activity | Workload, attention, emotion, fatigue |
| ECG | Cardiac electrical activity | Stress, autonomic arousal |
| PPG / BVP | Peripheral blood volume pulse | Heart rate variability → stress and load |
| Eye Gaze & Pupillometry | Gaze position, pupil diameter | Workload, situational awareness |
| Speech | Acoustic para-linguistic features | Stress, arousal, cognitive load |
Install from PyPI:

```bash
pip install bfms
```

Or install from source:

```bash
git clone https://github.com/aether-sutd/bfms.git
cd bfms
pip install .
```

Requires Python 3.10+. Core dependencies include PyTorch 2.0+, MNE, snnTorch, SpikingJelly, and BindsNET.
bfms follows the Masked Autoencoder (MAE) paradigm popularized by Meta, adapted for physiological time-series. The framework is heavily inspired by torch-brain and braindecode, and implements architectural components from state-of-the-art large EEG models including LaBraM, EEGPT, and CBraMod.
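The MAE-style objective can be sketched as follows. This is a minimal illustration, not the bfms implementation: the patch length and mask ratio are assumed values, and the zero decoder output stands in for a real reconstruction head.

```python
import torch

def mask_patches(x, patch_len=32, mask_ratio=0.5):
    """Split (batch, channels, time) into patches and mask a random subset."""
    b, c, t = x.shape
    n = t // patch_len
    patches = x[..., : n * patch_len].reshape(b, c, n, patch_len)
    n_masked = int(n * mask_ratio)
    masked_idx = torch.randperm(n)[:n_masked]
    visible = patches.clone()
    visible[:, :, masked_idx] = 0.0  # hide the masked patches from the encoder
    return visible, patches, masked_idx

x = torch.randn(2, 8, 256)  # 2 windows, 8 channels, 256 samples
visible, target, masked_idx = mask_patches(x)

# The encoder sees `visible`; a reconstruction head predicts the hidden
# patches. MSE is computed on masked positions only.
pred = torch.zeros_like(target)  # stand-in for the decoder output
loss = ((pred - target)[:, :, masked_idx] ** 2).mean()
```

The self-supervised signal comes entirely from the recording itself, which is what lets pre-training scale to unlabelled archives.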
Beyond transformers, we also implement alternative backbone families:
- Spiking Neural Networks (SNNs) — energy-efficient temporal coding with biological plausibility
- Continuous Thought Machines (CTMs) — dynamic recurrent processing for variable-length sequences
- State Space Models (SSMs) — linear recurrence for long-sequence modelling
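The linear recurrence at the heart of the SSM backbones can be sketched as h_t = A·h_{t-1} + B·x_t, y_t = C·h_t. The sketch below uses fixed illustrative parameters (a diagonal, stable A), not a trained bfms model:

```python
import torch

def ssm_scan(x, A, B, C):
    """x: (time, d_in); A: (d_state,) diagonal; B: (d_state, d_in); C: (d_out, d_state)."""
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:            # one step per sample: linear in sequence length
        h = A * h + B @ x_t  # state update
        ys.append(C @ h)     # readout
    return torch.stack(ys)

T, d_in, d_state, d_out = 100, 4, 16, 2
x = torch.randn(T, d_in)
A = torch.full((d_state,), 0.9)    # decay < 1 keeps the state stable
B = torch.randn(d_state, d_in) * 0.1
C = torch.randn(d_out, d_state) * 0.1
y = ssm_scan(x, A, B, C)
print(y.shape)  # torch.Size([100, 2])
```

Because memory cost is constant in sequence length, this family pairs naturally with the sequence length-agnostic property above.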
- Neural Attention from LaBraM
- Neural Codebook + Normalized EMA Quantizer from LaBraM
- Criss-Cross Attention from CBraMod
- Gradient Reversal Layer
- Contrastive, regression, and variance losses
- Signal processing utilities (filtering, normalization, spectral features)
- Curriculum trainer
- Continuous Thought Machine (CTM)
- EEG-adapted CTM variant
- Spiking GRU / TCN / Spikeformer backbone families
- Synaptic SNN variants (mono- and tri-synaptic)
- Full LaBraM pre-training architecture
- Full CBraMod pre-training architecture
- MAE pre-training pipeline (masking, patch embedding, reconstruction head)
- JEPA-style pre-training pipeline
- Dataset loaders: HTC, MOCAS, N-Back, SENSE-42, UNIVERSE, WAUC
- PyTorch dataset classes for EEG classification and regression
- Raw and processed EEG dataset wrappers
- Unified streaming dataloader for large-scale pre-training
- Channel topology-agnostic encoding (variable channel permutation equivariance)
- Hardware/device-agnostic encoding across EEG generations
- Sequence length-agnostic temporal processing
- Multi-modal fusion architecture (EEG + PPG + ECG + eye-tracking)
- Asymmetric cross-modal knowledge transfer
- Differentially-private pre-training
- Subject-agnostic pre-training with inter-subject alignment
- Personalized fine-tuning interfaces (LoRA, adapter, prompt tuning)
- Out-of-the-box integration with PULSE / PhysioPFM-style adaptation
- LASTS-based surrogate explanation framework
- Counterfactual explanation utilities
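The gradient reversal layer listed among the building blocks can be sketched as follows: the forward pass is the identity, while the backward pass negates (and scales) the gradient, so an adversarial subject classifier pushes the encoder toward subject-invariant features. This is a generic sketch of the technique, not the bfms implementation:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; negated, scaled gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the encoder; no grad for lambd.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lambd=0.5).sum()
y.backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```

In a subject-adversarial setup, `grad_reverse` sits between the encoder and the subject classifier, so minimizing the classifier's loss maximizes subject confusion in the encoder.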
```
bfms/
├── src/bfms/
│   ├── datasets/      # PyTorch dataset classes and ETL loaders
│   ├── losses/        # Contrastive, regression, and variance losses
│   ├── models/        # Full model implementations (LaBraM, CBraMod, CTM, SNN, …)
│   ├── nn/            # Reusable building blocks
│   │   ├── attentions/  # Neural, criss-cross, spike attention
│   │   ├── quantizers/  # Normalized EMA codebook
│   │   ├── embeddings/  # Patch and positional embeddings
│   │   ├── functional/  # Signal transforms and normalization
│   │   ├── snn/         # Spiking neural network layers
│   │   └── ctm/         # CTM helper modules
│   ├── processing/    # Signal preprocessing and feature extraction
│   ├── trainers/      # Training loop utilities
│   └── utils/         # Masking, sampling, ML utilities
└── docs/              # Documentation source
```
We welcome contributions from the community! Please read our Contributing Guide to get started, and review our Code of Conduct.