FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows (ICLR 2026)


[Figure: FlowBind overview]

Introduction

TL;DR: FlowBind utilizes bidirectional flows to achieve efficient, any-to-any multimodal generation.

We propose FlowBind, an efficient framework for any-to-any generation. Our approach is distinguished by its simplicity: it learns a shared latent space capturing cross-modal information, with modality-specific invertible flows bridging this latent to each modality. Both components are optimized jointly under a single flow-matching objective, and at inference the invertible flows act as encoders and decoders for direct translation across modalities. By factorizing interactions through the shared latent, FlowBind naturally leverages arbitrary subsets of modalities for training, and achieves competitive generation quality while substantially reducing data requirements and computational cost.
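The idea of modality-specific invertible flows around a shared latent can be sketched with a toy example. This is a conceptual illustration only, not the paper's architecture: the `AffineFlow` class, its `encode`/`decode` methods, and the hand-picked scales and shifts are all hypothetical stand-ins for the learned neural flows. Because each flow is exactly invertible, translating A-to-B is just "invert the source flow, apply the target flow."

```python
import numpy as np

# Hypothetical invertible affine flow: x = z * scale + shift, so
# z = (x - shift) / scale. Real FlowBind flows are learned networks
# trained with a flow-matching objective.
class AffineFlow:
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def decode(self, z):
        # shared latent -> modality space
        return z * self.scale + self.shift

    def encode(self, x):
        # modality space -> shared latent (exact inverse of decode)
        return (x - self.shift) / self.scale

# One flow per modality, all bridging to the same shared latent space.
flows = {
    "image": AffineFlow(scale=2.0, shift=1.0),
    "audio": AffineFlow(scale=0.5, shift=-3.0),
}

def translate(x, src, dst):
    """Any-to-any translation: invert the source flow, apply the target flow."""
    z = flows[src].encode(x)
    return flows[dst].decode(z)

x_img = np.array([1.0, 3.0, 5.0])
x_aud = translate(x_img, "image", "audio")
# The round trip recovers the input exactly because the flows are invertible.
x_back = translate(x_aud, "audio", "image")
assert np.allclose(x_back, x_img)
```

Factorizing all pairwise translations through one shared latent is what lets training use arbitrary subsets of modalities: each flow only ever needs to be paired with the latent, not with every other modality.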


Installation

We recommend using Docker to ensure environment consistency. Alternatively, you can set up the environment manually by installing the dependencies listed in requirements.txt.

  1. Docker Setup
docker pull yeonwoo378/flowbind:latest
docker run -it yeonwoo378/flowbind:latest bash
  2. Clone the repository
git clone https://github.com/yeonwoo378/flowbind.git
cd flowbind
  3. Install Requirements
pip install -r requirements.txt

Inference

We provide a Jupyter notebook to guide you through the inference process, including loading pretrained weights and running generation tasks.

Please refer to demo.ipynb for a step-by-step tutorial.


Data Preparation

We use the following datasets for training and evaluation: AudioCaps, Flickr30k, LAION, and VGGSound.

Note: Due to copyright and licensing restrictions, we cannot provide the raw training datasets directly. Please download each dataset from its official source.

Before training, you must extract features from your dataset (e.g., Flickr30k).

# Extract features for Text-to-Image (T2I) tasks
python extract_data/extract_t2i.py \
    --dataset flickr30k \
    --data_root /path/to/your/data

Extracted training features are automatically saved to ./feats/{dataset_name}.
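The extraction output can be sanity-checked with a small script like the one below. The per-split file name (`train_features.pkl`) and the pickled dictionary layout are hypothetical here; the actual names and format are whatever extract_t2i.py writes, so adjust accordingly.

```python
import pickle
from pathlib import Path

# Features land under ./feats/<dataset_name> after extraction.
feats_root = Path("./feats")

def feat_path(dataset: str, split: str) -> Path:
    # Hypothetical naming scheme; check extract_t2i.py for the real one.
    return feats_root / dataset / f"{split}_features.pkl"

path = feat_path("flickr30k", "train")
path.parent.mkdir(parents=True, exist_ok=True)

# Dummy features standing in for the real extracted tensors.
with open(path, "wb") as f:
    pickle.dump({"ids": [0, 1], "features": [[0.1, 0.2], [0.3, 0.4]]}, f)

with open(path, "rb") as f:
    feats = pickle.load(f)
print(sorted(feats))  # ['features', 'ids']
```

Running a quick load like this before launching a multi-GPU job catches missing or corrupt feature files early, when they are cheap to fix.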


Training

You can train the model using torchrun for distributed training. The command below demonstrates how to launch training on a single node with 4 GPUs.

torchrun --nnodes=1 --nproc_per_node=4 main.py \
    --exp_name flowbind_exp \
    --batch_size 256 \
    --dataset audiocaps flickr30k laion vggsound \
    --t_cond adaln \
    --hidden_dim 1152 \
    --lr 1e-4
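One detail worth checking in main.py before launching: whether --batch_size is per-GPU or global. The snippet below assumes it is per-GPU, in which case the effective global batch under distributed data parallel is the per-GPU size times the number of processes; the linear learning-rate scaling shown is a common heuristic, not something this repository necessarily applies.

```python
# Assumption: --batch_size 256 is the per-GPU batch size (verify in main.py).
nproc_per_node = 4     # from --nproc_per_node=4
per_gpu_batch = 256    # from --batch_size 256
global_batch = per_gpu_batch * nproc_per_node
print(global_batch)    # 1024

# Common heuristic when changing GPU count: scale the learning rate
# linearly with the global batch size relative to a reference setting.
base_lr, base_batch = 1e-4, 256
scaled_lr = base_lr * global_batch / base_batch
print(scaled_lr)       # 0.0004
```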

Training logs are automatically synced to Weights & Biases (WandB). Please ensure you are logged in via wandb login before starting training.


Acknowledgements

This repository is built upon the following open-source projects. We thank the authors for their excellent contributions.


Citation

If you find our work helpful, please cite:

@misc{cha2025flowbind,
  title={FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows},
  author={Cha, Yeonwoo and Kim, Semin and Kwon, Jinhyeon and Hong, Seunghoon},
  eprint={2512.15420},
  archivePrefix={arXiv},
  year={2025}
}

Contact

For any inquiries, please contact Yeonwoo Cha at ckdusdn03@kaist.ac.kr.
