ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval
Introduction
We propose ReasonEmbed, a new text embedding model for reasoning-intensive document retrieval, built on innovations in how synthetic data is generated and used. Our work makes the following technical contributions.
We design a novel data synthesis method, called ReMixer.
We introduce a self-adaptive training method tailored for our synthetic data, termed Redapter.
We implement ReasonEmbed on multiple LLM backbones of varying sizes, which achieve state-of-the-art (SOTA) performance on reasoning-intensive document retrieval tasks. Notably, our model built on Qwen3-4B reaches an nDCG@10 score of 37.1 on the BRIGHT benchmark, which already surpasses all existing text embedding models, while the Qwen3-8B based variant improves this to 38.1. Moreover, on the R2MED benchmark, ReasonEmbed-Qwen3-8B attains an nDCG@10 score of 43.18, surpassing all existing models by a large margin and setting a new SOTA.
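Since every result above is reported as nDCG@10, a minimal sketch of how that metric is computed for a single query may help when reading the numbers. It assumes linear gain with a log2 rank discount, which is the common trec_eval-style convention; the official BRIGHT and R2MED evaluation scripts may differ in details. The function names `dcg_at_k` and `ndcg_at_k` and the input layout are illustrative and not part of this repository. The reported scores (for example 37.1) appear to be on a 0-100 scale, whereas this sketch returns a value in [0, 1].

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels (linear gain)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids ordered by descending retrieval score.
    qrels: dict mapping document id -> graded relevance (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant documents first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical query with one relevant document retrieved at rank 2.
    print(ndcg_at_k(["d3", "d1", "d2"], {"d1": 1}))
```

A benchmark-level score is then typically the mean of this value over all queries.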
nDCG@10 = 38.1 on BRIGHT using original queries; fine-tuned on Qwen/Qwen3-8B with our synthetic dataset using the novel RI-InfoNCE loss; submission to BRIGHT leaderboard
nDCG@10 = 34.9 on BRIGHT using original queries; fine-tuned on meta-llama/Llama-3.1-8B with our synthetic dataset using the basic InfoNCE loss
| Resource Type | Name | Link | Release Date | Comments |
| --- | --- | --- | --- | --- |
| Model | BGE-Reasoner-Embed-0821 | - | - | nDCG@10 = 32.5 on BRIGHT using original queries; will not be released due to its suboptimal performance compared to BGE-Reasoner-Embed-0923; submission to BRIGHT leaderboard |
used for training all ReasonEmbed models in our paper
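Several of the model notes above distinguish the basic InfoNCE loss from the RI-InfoNCE variant. As background only, here is a minimal sketch of the basic InfoNCE loss for one query, written in plain Python. It does not reproduce RI-InfoNCE, whose definition is not given in this README. The function names, the dot-product similarity, and the temperature value are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Basic InfoNCE loss for a single query.

    The loss is the negative log-softmax score of the positive document
    among the positive plus all negatives. The temperature default is an
    arbitrary illustrative value, not taken from the paper.
    """
    candidates = [positive] + list(negatives)
    logits = [dot(query, doc) / temperature for doc in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # Hypothetical 2-d embeddings: the positive matches the query, the negative does not.
    print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0))
```

In practice the embeddings would come from the model and the loss would be averaged over a batch, often with other in-batch documents serving as additional negatives.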
Code and Scripts
| Resource Type | Name | Link | Release Date | Comments |
| --- | --- | --- | --- | --- |
| Data Synthesis Code and Scripts | ReMixer | (TBA) | - | to be released |
Training Code
| Resource Type | Name | Link | Release Date | Comments |
| --- | --- | --- | --- | --- |
| Training Code and Scripts | Redapter | (TBA) | - | to be released |
Citation
If you find this repository useful, please consider giving it a star ⭐ and citing our paper:
@article{chen2025reasonembed,
title={ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval},
author={Chen, Jianlyu and Lan, Junwei and Li, Chaofan and Lian, Defu and Liu, Zheng},
journal={arXiv preprint arXiv:2510.08252},
year={2025}
}