This repository hosts the code and data to support the paper "Op-Fed: Opinion, Stance, and Monetary Policy Annotations on FOMC Transcripts Using Active Learning" by Alisa Kanganis and Katherine A. Keith.
If you use this data or code, please cite our paper:
```bibtex
@misc{kanganis2025,
  author = {Kanganis, Alisa and Keith, Katherine A.},
  title = {Op-Fed: Opinion, Stance, and Monetary Policy Annotations on FOMC Transcripts Using Active Learning},
  year = {2025},
  url = {https://arxiv.org/pdf/2509.13539},
  note = {Preprint}
}
```
Corresponding author: Katie Keith (kak5@williams.edu)
```bash
git clone git@github.com:kakeith/op-fed.git
cd op-fed/
conda create -y --name opFed python==3.11 -c conda-forge
conda activate opFed
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
The main dataset is in `data/opfed_v1.csv`, and we also provide the raw annotations from all three annotators in `data/opfed_raw_v1.csv`.
Please see Section B: Datasheets for Datasets in our paper for a detailed description of this dataset.
Columns:
- `unique_id`: Example: `19811222_189_9`. The first two numbers, e.g., `19811222_189`, correspond to the `id` in ConvoKit. In ConvoKit, this is the transcript number, e.g., `19811222`, followed by the utterance number, e.g., `189` (starting at index 1). In our `unique_id`, the third number, e.g., `_9`, corresponds to the sentence number (after sentence segmentation with spaCy); this starts at index 1 for the first sentence.
- `speaker`: The name of the speaker, e.g., `MR. TRUMAN`.
- `sentence`: The full text of the target sentence.
- `1_opinion`: Opinion label on the target sentence. Possible labels: `yes`, `no`, or `ambiguous`.
- `2_mp`: Monetary policy label on the target sentence. Possible labels: `yes`, `no`, or `ambiguous`.
- `3_mp_context`: Whether the `2_mp` label needed additional context. Possible labels: `sentence`, `utterance`, `-5 sentences`, or `200+ tokens`.
- `4_stance`: StanceNLI label on the target sentence. Possible labels: `neutral`, `entailment`, `contradiction`, or `ambiguous`.
- `5_stance_context`: Whether the `4_stance` label needed additional context. Possible labels: `sentence`, `utterance`, `-5 sentences`, or `200+ tokens`.
- `utterance`: The full utterance within which the target sentence exists.
- `-5 sentences`: The previous five sentences leading up to (but not including) the utterance of the target sentence. When this spans multiple utterances, we return a list of dictionaries; each dictionary is an utterance with keys for the `'speaker'` and the `'text'` of that utterance. Example: `[{'speaker': 'MR. LAWARE.', 'text': 'With the momentum that he will gain by our acquiescence to [releasing the transcripts], he will then say: Well, this is what I want you to decide to do.'}, {'speaker': 'MR. ANGELL.', 'text': 'Absolutely.'}, {'speaker': 'MR. LAWARE.', 'text': "He's going to back us right into a corner."}, ... ]`
- `200+ tokens`: The previous 200 tokens (rounded to the nearest sentence) leading up to (but not including) the utterance with the target sentence. If this spans utterances, we use the same list-of-dictionaries format as `-5 sentences`.
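As a quick sanity check, a `unique_id` can be split back into its transcript, utterance, and sentence components. A minimal sketch (the helper name is ours, not part of the repo):

```python
def parse_unique_id(unique_id: str) -> dict:
    """Split an Op-Fed unique_id into its three components.

    Example: '19811222_189_9' -> transcript 19811222,
    utterance 189 (1-indexed), sentence 9 (1-indexed).
    """
    transcript, utterance, sentence = unique_id.split("_")
    return {
        "convokit_id": f"{transcript}_{utterance}",  # matches the ConvoKit `id`
        "transcript": transcript,
        "utterance_num": int(utterance),
        "sentence_num": int(sentence),
    }

print(parse_unique_id("19811222_189_9"))
# -> {'convokit_id': '19811222_189', 'transcript': '19811222',
#     'utterance_num': 189, 'sentence_num': 9}
```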
In this folder, we also provide `opfed_raw_v1.csv`, which contains the per-annotator labels combined into list form (prior to aggregation). For example, in the first row of the `1_opinion` column, the cell value is `['yes', 'yes', 'yes']`, meaning all three annotators labeled the target sentence `'yes'` for the opinion aspect.
Because this is a hierarchical schema, some annotators may have missing values, e.g., `['yes', 'nan', 'yes']`, meaning the second annotator (the `'nan'` value) did not reach that annotation stage due to earlier annotation decisions.
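Because the CSV stores these lists as strings, they need to be parsed before any aggregation. A minimal sketch (the majority-vote rule below is illustrative, not necessarily the paper's aggregation procedure):

```python
import ast
from collections import Counter

def parse_label_cell(cell: str) -> list:
    """Parse a raw-annotation cell like "['yes', 'nan', 'yes']" into a list."""
    return ast.literal_eval(cell)

def majority_label(labels: list) -> str:
    """Return the most common non-missing label (illustrative aggregation)."""
    valid = [lab for lab in labels if lab != "nan"]
    return Counter(valid).most_common(1)[0][0]

cell = "['yes', 'nan', 'yes']"
print(majority_label(parse_label_cell(cell)))  # -> yes
```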
This folder contains scripts that gather descriptive statistics of the Op-Fed dataset (e.g., inter-annotator agreement rates).
- Run `descibe.py` to create Table 6 in the appendix with the per-label breakdown of examples as well as inter-annotator agreement levels.
- Run `transcript_location.py` to create Figure 3, "Opinions are expressed later in transcripts."
- The notebook `hand_selected_examples.ipynb` creates Table 9.
- The notebook `score_analysis.ipynb` creates Table 13 (in the appendix).
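For intuition, inter-annotator agreement on a label column can be summarized as average pairwise percent agreement. A simplified sketch (the paper's exact agreement statistic may differ; pairs with a missing `'nan'` stage are skipped):

```python
from itertools import combinations

def pairwise_agreement(rows):
    """Average pairwise percent agreement over per-example label lists.

    `rows` is a list of per-example label lists, one label per annotator.
    Pairs where either label is 'nan' (stage not reached) are skipped.
    """
    agree, total = 0, 0
    for labels in rows:
        for a, b in combinations(labels, 2):
            if a == "nan" or b == "nan":
                continue
            total += 1
            agree += int(a == b)
    return agree / total if total else float("nan")

rows = [["yes", "yes", "no"], ["yes", "nan", "yes"]]
print(pairwise_agreement(rows))  # 2 agreeing pairs out of 4 -> 0.5
```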
This folder contains the scripts that ran the active learning simulations. The results of the simulation runs are saved in `plots/sheets/`.
To generate Figure 2 (the active learning simulation results presented in the main paper), run:

```bash
python code/active_learing/plots/plot_main.py
```

The final deployed human-in-the-loop AL pipeline used to create the dataset is in the script `code/active_learing/real_deal/real_loop.py`.
This folder contains scripts for the Op-Fed baseline model results (zero-shot LLMs).
The predictions from all the zero-shot models (run via the APIs in August 2025) are saved in `code/baseline_models/zeroshot/zeroshotpreds.zip`.
To re-run the LLM API calls, run the scripts in the `code/baseline_models/zeroshot/` folder. Note: you will need to substitute the paths to your own API keys.
```bash
python gpt.py gpt-5
python gpt.py gpt-5-nano
python claude.py
python deepseek.py
```
Once the models' predictions are saved to disk, this script runs the evaluation:

```bash
python evaluate.py --full_print
```

For just the accuracy metrics for the LaTeX table, run:

```bash
python evaluate.py
```

For just the weighted F1 metrics for the LaTeX table, run:

```bash
python evaluate.py --f1
```

To include the ambiguous ground-truth labels, re-run with the `--includes_ambiguous` flag, e.g.,

```bash
python gpt.py gpt-5 --includes_ambiguous
```
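The weighted F1 metric weights each class's F1 score by that class's support in the ground truth. A self-contained sketch of the metric (pure Python, independent of the repo's `evaluate.py` implementation):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by true-class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += n * f1  # weight by class support
    return total / len(y_true)

y_true = ["yes", "yes", "no", "no"]
y_pred = ["yes", "no", "no", "no"]
print(round(weighted_f1(y_true, y_pred), 3))  # -> 0.733
```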
Alternatively, to reduce cost, you can submit the GPT and Claude jobs in "batch mode". Note: DeepSeek does not currently offer this option. Dev examples:
```bash
python gpt_batch_submit.py gpt-5-nano --dev
python gpt_batch_monitor.py gpt_batches/batch_ids_gpt-5-nano_2025-09-03-12-57.json --dev
```
For the bag-of-words logistic regression baseline on OpFed run the script:
baseline_models/finetune/bow.py
The human baseline and majority-class baseline are in the notebook `baselines.ipynb`.
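For reference, a majority-class baseline simply predicts the most frequent training label for every test example. A minimal sketch (illustrative, not the notebook's code):

```python
from collections import Counter

def majority_class_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test example."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test

train = ["yes", "no", "yes", "yes"]
print(majority_class_baseline(train, n_test=3))  # -> ['yes', 'yes', 'yes']
```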