I’m a Ph.D. student in Computer Science at Virginia Tech, advised by Tu Vu. I work on efficient post-training and model development: transferring alignment-induced behaviors (e.g., from SFT and RL) across models via weight merging (see the sketch below), avoiding repetitive, resource-heavy retraining. My broader interests include reasoning-oriented LLMs and agent systems, verification-grounded interaction, and data-efficient scaling for multilingual and multimodal settings. I aim to build modular, reusable systems where capabilities accumulate over time instead of being isolated in one-off training runs.

  • Efficient model development: Developing methods for faster, cheaper, and more reusable alignment updates, enabling continual adaptation across evolving model architectures.

  • Parameter-efficient transfer learning: Encoding task and document knowledge as modular components for scalable transfer.

  • Interactive scaling and verification: Improving reasoning and factuality through interaction with verifiable environments.

  • Advanced reasoning: Building reasoning-capable LLM and agent systems for complex multilingual and multimodal tasks.

  • Data-centric methods: Data selection and sampling strategies for stronger performance under limited compute or data.
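To make the weight-merging idea above concrete, here is a minimal sketch of the diff-vector view of fine-tuning transfer, under simplifying assumptions: the source and target checkpoints share an architecture, and each model is represented as a plain dict of tensors. The function names and the scaling factor `alpha` are illustrative, not the exact method from my papers.

```python
import torch

def diff_vector(finetuned: dict, base: dict) -> dict:
    """Fine-tuning update: what post-training (SFT/RL) added to the base weights."""
    return {name: finetuned[name] - base[name] for name in base}

def transfer(new_base: dict, delta: dict, alpha: float = 1.0) -> dict:
    """Graft a fine-tuning update onto a newer base model, scaled by alpha."""
    return {name: new_base[name] + alpha * delta[name] for name in new_base}

# Toy stand-ins for real checkpoints (same architecture, same parameter names).
base_v1 = {"w": torch.randn(4, 4)}
tuned_v1 = {"w": base_v1["w"] + 0.1 * torch.randn(4, 4)}   # pretend SFT/RL update
base_v2 = {"w": base_v1["w"] + 0.05 * torch.randn(4, 4)}   # newer pretrained base

delta = diff_vector(tuned_v1, base_v1)
tuned_v2 = transfer(base_v2, delta)  # approximately aligned v2, no retraining
```

The appeal is that the expensive alignment run happens once: as base models are updated, the cached update can be re-applied (with `alpha` validated per capability) instead of repeating SFT or RL from scratch.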

Previously, I interned at Amazon AGI, where I worked on distributed model distillation. Before my Ph.D., I received my Master’s in Language Science and Technology from Saarland University, working on efficient transfer learning and low-resource NLP with Dietrich Klakow and Vera Demberg. Earlier, I contributed to NLP research on historical archives at Academia Sinica, and I was selected as a Google CSRMP Fellow in 2023.

Seeking a Research Internship (Summer 2026). Interests: LLM post-training, agents, RL, deep reasoning, and efficiency. My CV is here.

🔥 News

  • 2025.08: One paper accepted as an oral presentation at EMNLP 2025.
  • 2024.10: One paper accepted to EMNLP 2024 Industry Track.
  • 2024.09: Paper Target-Aware Language Modeling via Granular Data Sampling accepted to EMNLP 2024.
  • 2024.08: Started my Ph.D. at Virginia Tech.
  • 2024.07: Paper Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning accepted to ACL 2024 SRW.
  • 2024.02: Successfully defended my Master’s thesis Exploring Task Selection for Intermediate-Task Transfer Learning.
  • 2024.02: Paper Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin accepted to LREC-COLING 2024.
  • 2024.01: Paper Projecting Annotations for Discourse Relations accepted to CODI @ EACL 2024.


📝 Selected Publications

Please see Google Scholar for an up-to-date publication list.

* indicates equal contribution

Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
EMNLP 2025 (Oral, top 10%)
[Paper] [Code]
Transferring alignment-induced capabilities (SFT, RL) across LLMs without retraining from scratch.


Target-Aware Language Modeling via Granular Data Sampling
Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, Vikas Chandra
EMNLP 2024
[Paper]
Data-efficient pretraining: matching full-pretraining performance with ~1% of RefinedWeb.


Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning
Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow
Student Research Workshop at ACL 2024
[Paper] [Code]
Robust modular task selection via point-wise similarity for transfer learning.


Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin
Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg
LREC-COLING 2024
[Paper]
Generating synthetic data with a parameter-free framework grounded in phonological theory.


In-Context Prompt Editing For Conditional Audio Generation
Ernie Chang*, Pin-Jie Lin*, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
ICASSP 2024
[Paper]
In-context control for conditional audio generation; featured in Hugging Face Daily Papers.


Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Pin-Jie Lin*, Muhammed Saeed*, Ernie Chang*, Merel Scholman
Interspeech 2023
[Paper]
Improving low-resource performance through mixed-language adaptation.


Revisiting Sample Size Determination in Natural Language Understanding
Ernie Chang*, Muhammad Hassan Rashid*, Pin-Jie Lin*, Changsheng Zhao, Vera Demberg, Yangyang Shi, Vikas Chandra
ACL 2023 Findings
[Paper] [Code]
Sample-size estimation for data-efficient NLU and more reliable scaling decisions.


Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization
Dongqi Pu*, Xudong Hong*, Pin-Jie Lin*, Ernie Chang, Vera Demberg
COLING 2022
[Paper]
Achieving top performance in movie script summarization.