TL;DR: FlowBind uses bidirectional flows for efficient, any-to-any multimodal generation.
We propose FlowBind, an efficient framework for any-to-any generation. Our approach is distinguished by its simplicity: it learns a shared latent space capturing cross-modal information, with modality-specific invertible flows bridging this latent to each modality. Both components are optimized jointly under a single flow-matching objective, and at inference the invertible flows act as encoders and decoders for direct translation across modalities. By factorizing interactions through the shared latent, FlowBind naturally leverages arbitrary subsets of modalities for training, and achieves competitive generation quality while substantially reducing data requirements and computational cost.
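As a rough illustration of the single flow-matching objective mentioned above, here is a minimal sketch using a toy linear "velocity network" on a linear interpolation path. All names, shapes, and the model itself are illustrative assumptions, not the actual FlowBind implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(W, x0, x1, t):
    """Conditional flow-matching loss on a linear interpolation path.

    x_t = (1 - t) * x0 + t * x1 moves a noise sample x0 toward a data
    sample x1; the regression target is the constant velocity x1 - x0.
    W is a toy linear 'velocity network' mapping [x_t, t] -> velocity.
    """
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1   # point on the path
    target = x1 - x0                                 # ground-truth velocity
    inp = np.concatenate([xt, t[:, None]], axis=1)   # condition on time t
    pred = inp @ W                                   # predicted velocity
    return np.mean((pred - target) ** 2)

# Toy data: 8 samples in a 4-dim latent space.
x0 = rng.standard_normal((8, 4))   # noise endpoint
x1 = rng.standard_normal((8, 4))   # data endpoint
t = rng.uniform(size=8)            # random times in [0, 1]
W = np.zeros((5, 4))               # linear model params ([x_t, t] -> R^4)

loss = flow_matching_loss(W, x0, x1, t)
print(float(loss))
```

In the paper's setting, the predicted velocity field is integrated forward or backward through the invertible modality-specific flows, which is what lets the same trained components act as both encoders and decoders at inference.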
We recommend using Docker to ensure environment consistency. Alternatively, you can set up the environment manually by installing the dependencies listed in requirements.txt.
- Docker Setup

```bash
docker pull yeonwoo378/flowbind:latest
docker run -it yeonwoo378/flowbind:latest bash
```

- Clone the repository

```bash
git clone https://github.com/yeonwoo378/flowbind.git
cd flowbind
```

- Install Requirements

```bash
pip install -r requirements.txt
```

We provide a Jupyter notebook to guide you through the inference process, including loading pretrained weights and running generation tasks.
Please refer to demo.ipynb for a step-by-step tutorial.
We use the following datasets for training and evaluation.
- Text-Image:
- LAION-COCO (filtered by aesthetic score)
- Flickr30k
- Text-Audio:
- AudioCaps
- Audio-Image:
- VGGSound
Note: Due to copyright and licensing restrictions, we cannot provide the raw training datasets directly. Please download the data from the official links provided above.
Before training, you must extract features from your dataset (e.g., Flickr30k).
```bash
# Extract features for Text-to-Image (T2I) tasks
python extract_data/extract_t2i.py \
    --dataset flickr30k \
    --data_root /path/to/your/data
```

Extracted training features will be automatically saved to `./feats/{dataset_name}`.
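Before launching training, it can help to sanity-check that paired features line up. The sketch below is purely illustrative: the file names and the `.npy` format are assumptions for the example, while the actual layout under `./feats/` is whatever `extract_data/extract_t2i.py` produces.

```python
import numpy as np
from pathlib import Path

# Hypothetical layout: file names and .npy format are assumptions here;
# the real on-disk format is defined by extract_data/extract_t2i.py.
feats_dir = Path("./feats/flickr30k")
feats_dir.mkdir(parents=True, exist_ok=True)

# Stand-ins for what the extraction step would produce.
np.save(feats_dir / "image_feats.npy", np.random.randn(16, 768))
np.save(feats_dir / "text_feats.npy", np.random.randn(16, 768))

# Quick sanity check: paired modalities should have the same sample count.
image_feats = np.load(feats_dir / "image_feats.npy")
text_feats = np.load(feats_dir / "text_feats.npy")
assert image_feats.shape[0] == text_feats.shape[0], "paired features must align"
print(image_feats.shape, text_feats.shape)
```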
You can train the model using torchrun for distributed training. The command below demonstrates how to launch training on a single node with 4 GPUs.
```bash
torchrun --nnodes=1 --nproc_per_node=4 main.py \
    --exp_name flowbind_exp \
    --batch_size 256 \
    --dataset audiocaps flickr30k laion vggsound \
    --t_cond adaln \
    --hidden_dim 1152 \
    --lr 1e-4
```

Training logs are automatically synced to Weights & Biases (WandB). Please ensure you are logged in via `wandb login` before starting the training.
This repository is built upon the following open-source projects. We thank the authors for their excellent contributions.
If you find our work helpful, please cite:
```bibtex
@misc{cha2025flowbind,
  title={FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows},
  author={Cha, Yeonwoo and Kim, Semin and Kwon, Jinhyeon and Hong, Seunghoon},
  eprint={arXiv:2512.15420},
  year={2025}
}
```

For any inquiries, please contact Yeonwoo Cha at ckdusdn03@kaist.ac.kr.
