# LLM4SE-Benchmarks

Hi there 👋

This is a collection of LLM4SE benchmarks (still under construction...).

📝 The organization of papers follows our survey
"Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks".

🚀 You are welcome to submit an issue to have your LLM4SE benchmark included!
Submit Issue

🔥 If you find our survey useful for your research, please cite the following paper:

```bibtex
@article{LLM4SEBenchmark,
  title={Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks},
  author={Xing Hu and Feifei Niu and Junkai Chen and Xin Zhou and Junwei Zhang and Junda He and Xin Xia and David Lo},
  year={2025},
  journal={arXiv preprint arXiv:2505.08903},
  url={https://arxiv.org/abs/2505.08903}
}
```

## Benchmark List

### Requirements and Design

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Elicitation | NFR-Review | 2018 | - | [Paper] | [Link] |
| | Rahman and Zhu | 2024 | Readability, Understandability, Specificability, Technical aspects | [Paper] | [Link] |
| | Habib et al. | 2025 | Precision, Recall, F | [Paper] | [Link] |
| | Voria et al. | 2025 | Precision, Recall, F, Accuracy, BLEU, ROUGE, METEOR, Brevity Penalty, Length Ratio | [Paper] | [Link] |
| Analysis | PROMISE NFR | 2007 | - | [Paper] | [Link] |
| | SecReq | 2010 | - | [Paper] | [Link] |
| | PURE | 2017 | - | [Paper] | [Link] |
| | Dalpiaz et al. | 2019 | Precision, Recall, F1-score, AUC | [Paper] | [Link] |
| | ReqEval | 2020 | Precision, Recall, F2, Success rate | [Paper] | [Link] |
| | NFR-SO | 2022 | F1 | [Paper] | [Link] |
| | DAMIR | 2022 | Precision, Recall, F2, Success rate | [Paper] | [Link] |
| | Gärtner and Göhlich | 2024 | Accuracy, Precision, Recall, F | [Paper] | [Link] |
| | Preda et al. | 2024 | Precision, Recall, F | [Paper] | [Link] |
| | Koltoff et al. | 2024 | Precision, Recall, F, Accuracy | [Paper] | [Link] |
| Specification & Validation | Jdoctor | 2018 | Precision, Recall, F | [Paper] | [Link] |
| | DocTer | 2022 | Precision, Recall, F1 | [Paper] | [Link] |
| | Poudel et al. | 2023 | F2, MAP | [Paper] | [Link] |
| | Mandal et al. | 2023 | Precision, Recall, F1 | [Paper] | [Link] |
| | SV-Benchmarks | 2024 | - | [Paper] | [Link] |
| | SpecGenBench | 2024 | #Passes, Success Probability, #Verifier Calls, User Rating | [Paper] | [Link] |
| | Reinpold et al. | 2024 | Precision, Recall, F | [Paper] | [Link] |
| | Krishna et al. | 2024 | Unambiguity, understandability, correctness, verifiability, consistency, non-redundancy, completeness, conciseness | [Paper] | [Link] |
| | OSVBench | 2025 | Pass@N, Syntax Error, Semantic Error | [Paper] | [Link] |
| Management | Wang et al. | 2020 | Precision, Recall, F1 | [Paper] | [Link] |
| | Helmeczi et al. | 2023 | Accuracy, F1 | [Paper] | [Link] |

### Coding Assistant

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Code Generation and Recommendation | Lin et al. | 2013 | BLEU, CodeBLEU | [Paper] | [Link] |
| | Leetcode | 2015 | Passing Test Cases, Runtime, Memory Usage | [Paper] | [Link] |
| | ExampleCheck | 2018 | Misuse rate | [Paper] | [Link] |
| | CONCODE | 2018 | BLEU | [Paper] | [Link] |
| | CoNaLa | 2018 | Precision, Recall, TPR | [Paper] | [Link] |
| | NL2Bash | 2018 | Manual, BLEU | [Paper] | [Link] |
| | Spider | 2018 | Component Matching, Execution Accuracy | [Paper] | [Link] |
| | CodeSearchNet | 2020 | NDCG, MRR | [Paper] | [Link] |
| | APPS | 2021 | Test Case Average, Strict Accuracy | [Paper] | [Link] |
| | MBPP | 2021 | % solved | [Paper] | [Link] |
| | CodeXGLUE | 2021 | EM, ES | [Paper] | [Link] |
| | HumanEval | 2021 | Pass@k | [Paper] | [Link] |
| | miniF2F | 2021 | Pass@k | [Paper] | [Link] |
| | Lyra | 2021 | BLEU, AST match | [Paper] | [Link] |
| | FC2Code | 2022 | BLEU | [Paper] | [Link] |
| | CodeContests | 2022 | n@k, Pass@k | [Paper] | [Link] |
| | AixBench | 2022 | Correctness, Maintainability, Pass@1 | [Paper] | [Link] |
| | ReCode | 2022 | Robust Pass@k, Drop@k | [Paper] | [Link] |
| | SecurityEval | 2022 | Percentage | [Paper] | [Link] |
| | MathEquations | 2022 | Functional accuracy | [Paper] | [Link] |
| | MBXP | 2022 | Pass@k | [Paper] | [Link] |
| | NumpyEval | 2022 | Pass@k | [Paper] | [Link] |
| | PandasEval | 2022 | Pass@k | [Paper] | [Link] |
| | TorchDataEval | 2022 | Pass@k | [Paper] | [Link] |
| | MonkeyEval | 2022 | Pass@k | [Paper] | [Link] |
| | BeatNumEval | 2022 | Pass@k | [Paper] | [Link] |
| | MTPB | 2022 | Pass Rate, PPL | [Paper] | [Link] |
| | Multi-HumanEval | 2022 | Pass@k | [Paper] | [Link] |
| | DSP | 2022 | Pass@k | [Paper] | [Link] |
| | ExeDS | 2022 | BLEU, CodeBLEU, EM | [Paper] | [Link] |
| | XLCoST | 2022 | BLEU, CodeBLEU, MRR | [Paper] | [Link] |
| | Qing et al. | 2023 | Success Rate | [Paper] | [Link] |
| | ClassEval | 2023 | Pass@k, DEP(F), DEP(M) | [Paper] | [Link] |
| | TACO | 2023 | Pass@k | [Paper] | [Link] |
| | xCodeEval | 2023 | F1, Pass@k, Accuracy | [Paper] | [Link] |
| | CodeApex | 2023 | AC@1, AC@all, AC Rate | [Paper] | [Link] |
| | CloverBench | 2023 | Accept@k | [Paper] | [Link] |
| | Mastropaolo et al. | 2023 | CodeBLEU, Levenshtein Distance | [Paper] | [Link] |
| | CoderEval | 2023 | Pass@k, Acc@k | [Paper] | [Link] |
| | EvalPlus | 2023 | Pass@k | [Paper] | [Link] |
| | Shapkin et al. | 2023 | CodeBLEU, Accuracy | [Paper] | [Link] |
| | CrossCodeBench | 2023 | EM, BLEU, ROUGE-L | [Paper] | [Link] |
| | MultiPL-E | 2023 | Pass@k | [Paper] | [Link] |
| | StudentEval | 2023 | Pass@1 | [Paper] | [Link] |
| | TorchDataComplexEval | 2023 | Pass@k | [Paper] | [Link] |
| | DS-1000 | 2023 | Pass@1 | [Paper] | [Link] |
| | ML-Bench | 2023 | Pass@k | [Paper] | [Link] |
| | LowCoder | 2023 | Accuracy | [Paper] | [Link] |
| | Ren et al. | 2023 | Time Consumption, Answer Correctness | [Paper] | [Link] |
| | CodeAlpaca (Py) | 2023 | - | [Paper] | [Link] |
| | CoLadder | 2023 | Usability, Cognitive Load | [Paper] | [Link] |
| | VeriGen | 2023 | Pass@k | [Paper] | [Link] |
| | SOEval | 2023 | NDCG@K | [Paper] | [Link] |
| | DeceptPrompt | 2023 | ASR, WFR | [Paper] | [Link] |
| | HumanEval-X | 2023 | Pass@k | [Paper] | [Link] |
| | ARCADE | 2023 | Pass@k | [Paper] | [Link] |
| | MCoNaLa | 2023 | BLEU | [Paper] | [Link] |
| | CrossCodeEval | 2023 | Code Match, Identifier Match | [Paper] | [Link] |
| | Pisces | 2023 | BLEU, Syntax-Match, CodeBLEU | [Paper] | [Link] |
| | MBPP/HumanEval/APPS-ET | 2023 | CrystalBLEU, BERTScore, COMET, CodeBERTScore | [Paper] | [Link] |
| | LiveCodeBench | 2024 | Pass@k | [Paper] | [Link] |
| | Mercury | 2024 | Beyond, Pass | [Paper] | [Link] |
| | EffiBench | 2024 | ET, NET, MU, TMU, Pass@k | [Paper] | [Link] |
| | MBPP-san-DFY | 2024 | verify@k | [Paper] | [Link] |
| | CoderUJB | 2024 | Pass@k, Count@n, Coverage@n | [Paper] | [Link] |
| | PythonSaga | 2024 | Pass@k | [Paper] | [Link] |
| | DevEval | 2024 | Pass@k, Recall@k | [Paper] | [Link] |
| | Exec-CSN | 2024 | Pass@k | [Paper] | [Link] |
| | Wang et al. | 2024 | BLEU-4, CodeBLEU, edit sim | [Paper] | [Link] |
| | EvoCodeBench | 2024 | Pass@k, Recall@k | [Paper] | [Link] |
| | RustEval | 2024 | Pass@k | [Paper] | [Link] |
| | DevBench | 2024 | Faithfulness, Pass@k | [Paper] | [Link] |
| | BigCodeBench | 2024 | Pass@k | [Paper] | [Link] |
| | OOPEval | 2024 | Pass@k, Pass@o | [Paper] | [Link] |
| | ODEX | 2024 | Pass@k | [Paper] | [Link] |
| | NaturalCodeBench | 2024 | Pass@k | [Paper] | [Link] |
| | ParEval | 2024 | speedup@k, efficiency@k | [Paper] | [Link] |
| | CAASD | 2024 | Pass rate | [Paper] | [Link] |
| | CodeScope | 2024 | Pass@k | [Paper] | [Link] |
| | CodeAgentBench | 2024 | Pass@k | [Paper] | [Link] |
| | JavaBench | 2024 | Completion@k, Compilation@k | [Paper] | [Link] |
| | Chart2Code-160k | 2024 | Execution/pass rate, text match | [Paper] | [Link] |
| | PoorCodeSumEval | 2024 | BLEU, BERTScore | [Paper] | [Link] |
| | ComplexCodeEval | 2024 | BLEU, Syntax Match, Data Flow Match | [Paper] | [Link] |
| | StackEval | 2024 | Acceptance Score | [Paper] | [Link] |
| | Code-Vision | 2025 | Pass@k | [Paper] | [Link] |
| | CodeIF-Bench | 2025 | Pass@k | [Paper] | [Link] |
| | CodeIF | 2025 | Satisfaction Rate | [Paper] | [Link] |
| | LibEvolutionEval | 2025 | F1-score, MRR | [Paper] | [Link] |
| | COFFE | 2025 | Efficient@k | [Paper] | [Link] |
| | Deep-Bench | 2025 | Pass@k | [Paper] | [Link] |
| | DynaCode | 2025 | Pass@k | [Paper] | [Link] |
| | FEA-Bench | 2025 | Precision, Recall | [Paper] | [Link] |
| | MaintainCoder | 2025 | Pass@k, CodeDiff, ASTsim | [Paper] | [Link] |
| | mHumanEval | 2025 | BERTScore | [Paper] | [Link] |
| | REPOEXEC | 2025 | Functional correctness, Dependency utilization | [Paper] | [Link] |
| | Plot2Code | 2025 | Code pass rate, text-match ratio | [Paper] | [Link] |
| | ProjectEval | 2025 | Pass@k | [Paper] | [Link] |
| | SolEval | 2025 | Pass@k, Compile@k, Gas Consumption | [Paper] | [Link] |
| | ConvCodeWorld | 2025 | Pass@k, MRR, Recall | [Paper] | [Link] |
| | Web-Bench | 2025 | Pass@k | [Paper] | [Link] |
| | REPOCOD | 2025 | Pass@k | [Paper] | [Link] |
| Task | Benchmark | Year | Language | Evaluation Metrics | Size | Paper | Link |
|---|---|---|---|---|---|---|---|
| Code Summarization | PCSD | 2017 | Python | BLEU, BLEU-4, ROUGE-L, METEOR, CIDEr | 92,545 pairs | [Paper] | [Link] |
| | JCSD | 2018 | Java | Precision, Recall, F-Score, BLEU-4, METEOR, ROUGE-L, CIDEr, BLEU-DC | 87,136 pairs | [Paper] | [Link] |
| | Deepcom | 2018 | Java | BLEU-4, METEOR, ROUGE-L | 69,708 pairs | - | [Link] |
| | Funcom | 2019 | Java | BLEU, ROUGE-L, METEOR | 2.1M pairs | - | [Link] |
| | CodeXGLUE | 2021 | Java, Python | BLEU, BLEU-4, ROUGE-L, METEOR, USE, MRR | see Code Generation table | - | [Link] |
| | Funcom-java-long | 2023 | Java | BLEU | 8,192 methods | - | [Link] |
| | CroCoSum | 2023 | English, Chinese | ROUGE, BERTScore | 18,857 pairs | - | [Link] |
| | CAPYBARA | 2023 | C | EM, BLEU-4, ROUGE-L, METEOR | 7,826 pairs | - | [Link] |
| | BinSum | 2023 | C | BLEU, METEOR, ROUGE-L, Semantic Similarity | 557,664 functions | - | [Link] |
| | P-CodeSum | 2024 | Multiple PLs | BLEU-4, ROUGE-L | 1,500 pairs | - | [Link] |
| | FILE-CS | 2024 | Python | BLEU, ROUGE-L, METEOR | 98,236 pairs | - | [Link] |
| Code Translation | CodeSearchNet | 2020 | Multiple PLs | BLEU, CodeBLEU, METEOR, Exact Match | 6.45M pairs | - | [Link] |
| | CodeXGLUE | 2021 | Java, Python | BLEU-4, BLEU, ACC, CodeBLEU | 11,800 pairs | - | [Link] |
| | CodeNet | 2021 | C++, Python | Compilation, Runtime Errors, Functional Errors | 4,053 problems, 13.9M samples | - | [Link] |
| | CoST | 2022 | Multiple PLs | BLEU, CodeBLEU | 132,046 pairs | - | [Link] |
| | XLCoST | 2022 | Multiple PLs | CodeBLEU, BLEU, MRR | 1,002,296 pairs | - | [Link] |
| | Nova | 2023 | Binary | BLEU, Exact Match, Instruction LCS | 60,600 pairs | - | [Link] |
| | SUT | 2023 | Multiple PLs | Syntax Unit Test Accuracy, Syntax Element Score | 60k parallel, 200k mono | - | [Link] |
| | xCodeEval | 2023 | Multiple PLs | Pass@k | 25M examples | - | [Link] |
| | CodeTransOcean | 2023 | Multiple PLs | BLEU, CodeBLEU, Exact String Match | 45 languages | - | [Link] |
| | G-TransEval | 2023 | Multiple PLs | BLEU, CodeBLEU, Computational Accuracy | 400 pairs | - | [Link] |
| | AVATAR | 2023 | Java, Python | BLEU, Syntax Match, CodeBLEU, Execution Accuracy | 62,520 pairs | - | [Link] |
| | AVATAR-TC | 2024 | Java, Python | BLEU, CodeBLEU, Compilation Accuracy, Functional Equivalence | 57,368 pairs | - | [Link] |
| | RustRepoTrans | 2024 | C, Java, Python → Rust | Pass@k | 375 tasks | - | [Link] |
| Code Reasoning | CRUXEval | 2024 | Python | Pass@k | 800 | - | [Link] |
| | REval | 2025 | Python | Accuracy, Incremental Consistency Score | 3,152 | - | [Link] |
| | DyCodeEval | 2025 | Python | Pass@k, DivPass@k | 591 | - | [Link] |
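Many of the code-generation entries above report Pass@k: the probability that at least one of k sampled completions for a problem passes all of its tests. As a reference, here is a minimal sketch of the standard unbiased estimator (n samples per problem, of which c pass), as popularized by the HumanEval paper; the function name is ours, not from any specific benchmark's codebase:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that a random size-k
    subset of n generated samples (c of which are correct) contains
    at least one correct sample, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k
        # subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=10 samples of which c=3 pass, Pass@1 reduces to c/n:
pass_at_k(10, 3, 1)  # ≈ 0.3
```

Per-benchmark scores are then averaged over all problems; sampling n > k completions and using this estimator gives lower variance than literally drawing k samples.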

### Software Testing

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Test Generation | EvoSuite SF110 | 2011 | Line coverage, branch coverage, test correctness | [Paper] | [Link] |
| | Defects4J | 2014 | Number of executable test cases, CodeBLEU, line coverage, branch coverage, number of detected bugs | [Paper] | [Link] |
| | DynaMOSA | 2018 | Line coverage, branch coverage, number of detected bugs | [Paper] | [Link] |
| | BugsInPy | 2020 | Line coverage, branch coverage, number of detected bugs | [Paper] | [Link] |
| | HumanEval | 2021 | Mutation score, Pass@k, number of killed mutants, line coverage, branch coverage | [Paper] | [Link] |
| | MBPP | 2021 | Pass@k | [Paper] | [Link] |
| | APPS | 2021 | Pass@k | [Paper] | [Link] |
| | CodeContests | 2022 | Pass@k | [Paper] | [Link] |
| | HumanEval-X | 2023 | Pass@k | [Paper] | [Link] |
| | CoderUJB | 2024 | Syntax correctness rate, compile passing rate, line coverage | [Paper] | [Link] |
| | SWT-Bench | 2024 | Success rate, change coverage | [Paper] | [Link] |
| | TestBench | 2024 | Syntax/compilation/execution correctness rate, coverage/defect detection rate | [Paper] | [Link] |
| | TestEval | 2025 | Overall/line/branch/path coverage | [Paper] | [Link] |
| | ProjectTest | 2025 | Compilation/correctness/coverage rate | [Paper] | [Link] |
| Assertion Generation | ATLAS | 2020 | Exact match, edit distance, longest common subsequence | [Paper] | [Link] |
| GUI Test | Themis | 2021 | Number of detected bugs, activity coverage | [Paper] | [Link] |
| | QTypist | 2021 | Passing rate, coverage metrics, activity number, page number | [Paper] | [Link] |
| Testing Automation | LAVA-M | 2016 | Coverage, unique bugs | [Paper] | [Link] |
| | UniBench | 2021 | Quality of bugs, stability of finding bugs, speed of finding bugs, overhead | [Paper] | [Link] |
| | FuzzBench | 2021 | Coverage, unique bugs | [Paper] | [Link] |
| | FuzzGPT | 2024 | Code coverage, API coverage, number of unique crashes | [Paper] | [Link] |
| Testing Prediction | IDoFT | 2019 | Precision, Recall, F1-score | [Paper] | [Link] |
| | FlakeFlagger | 2021 | Precision, Recall, F1-score | [Paper] | [Link] |
| Testing Repair | TARBENCH | 2025 | CodeBLEU, BLEU, exact match, repair accuracy | [Paper] | [Link] |
| | Syn-Bench | 2025 | Syntactic/semantic correctness, code coverage | [Paper] | [Link] |

### AIOps

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Log Statement Generation | LANCE | 2022 | Correct prediction ratio | [Paper] | [Link] |
| | LogBench | 2024 | Accuracy, Precision, Recall | [Paper] | [Link] |
| | SCLogger | 2024 | Accuracy, Precision, Recall, F1, BLEU, ROUGE | [Paper] | [Link] |
| | AL-Bench | 2025 | Position Accuracy, Level Accuracy, Average Level Distance, Message Accuracy, Dynamic Expression Accuracy, Static Text Similarity | [Paper] | [Link] |
| Log Parsing | Loghub | 2023 | Accuracy | [Paper] | [Link] |
| | Loghub-2.0 | 2024 | Accuracy, F1-score | [Paper] | [Link] |

### Maintenance

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Code Review | CodeReview | 2022 | Exact Match | [Paper] | [Link] |
| | CodeReviewer | 2022 | Exact Match, BLEU | [Paper] | [Link] |
| | AUGER | 2023 | ROUGE, Perfect Prediction Rate | [Paper] | [Link] |
| | Review-Explaining | 2023 | Explanation type correctness, semantic meaning correctness | [Paper] | [Link] |
| | Code-Review-Assist | 2023 | Precision, Recall, F1-score | [Paper] | [Link] |
| | CodeReview-New | 2024 | Exact Match Trim, Exact Match, BLEU | [Paper] | [Link] |
| | ManualReviewComment | 2025 | Precision, Recall, F1 | [Paper] | [Link] |
| Clone Detection | BigCloneBench | 2014 | Precision, Recall, F1 | [Paper] | [Link] |
| | POJ-104 | 2016 | Precision, Recall, MAP | [Paper] | [Link] |
| | Company-C/C++ | 2023 | MRR, Precision, Recall | [Paper] | [Link] |
| | GPTCloneBench | 2023 | Precision, Recall | [Paper] | [Link] |
| | Curated CodeNet | 2023 | Precision, Recall | [Paper] | [Link] |
| Refactoring | JavaRef | 2023 | Accuracy, Exact Match, Edit Distance, Character Error Rate | [Paper] | [Link] |

### Quality Management

| Task | Benchmark | Year | Evaluation Metrics | Paper | Link |
|---|---|---|---|---|---|
| Defect Prediction | Bugs.jar | 2018 | Precision, Recall, F1, Accuracy, MCC | [Paper] | [Link] |
| | Bears | 2019 | Precision, Recall, F1, Accuracy, MCC | [Paper] | [Link] |
| | Zeng et al. | 2021 | Accuracy, Recall, False Discovery Rate, AUC-ROC, AUC-PR | [Paper] | [Link] |
| | JIT-Defects4J | 2022 | F1-score, AUC, Recall@20%Effort, Effort@20%Recall, Popt, Top-N Accuracy | [Paper] | [Link] |
| | Review-Explaining | 2023 | Explanation type correctness, semantic meaning correctness | [Paper] | [Link] |
| | Opu et al. | 2025 | Precision, Recall, F1, Accuracy, MCC | [Paper] | [Link] |
| Bug Localization | Ye et al. | 2014 | Accuracy, MRR, MAP | [Paper] | [Link] |
| | Defects4J | 2014 | ACC@K, FPR, Top@N | [Paper] | [Link] |
| | Bench4BL | 2018 | MRR, MAP, HIT@K | [Paper] | [Link] |
| | Devign | 2019 | Top@N | [Paper] | [Link] |
| | BugsInPy | 2020 | ACC@K, Top@N | [Paper] | [Link] |
| | Zhu et al. | 2021 | Accuracy | [Paper] | [Link] |
| | CodeReviewer | 2022 | Accuracy | [Paper] | [Link] |
| | Ciborowska et al. | 2022 | Precision@K, Recall@K, F1-score@K, MRR, MAP | [Paper] | [Link] |
| | Ma et al. | 2023 | MAP, MRR, Top@N | [Paper] | [Link] |
| | RTLLM | 2024 | Hit Rate, Pass@k | [Paper] | [Link] |
| | BeetleBox | 2024 | Accuracy, MRR, MAP | [Paper] | [Link] |
| | SWE-Bench | 2024 | Accuracy, MRR, MAP, Top@N, Precision | [Paper] | [Link] |
| | Chandramohan et al. | 2024 | Accuracy, MRR, MAP | [Paper] | [Link] |
| | Stracquadanio et al. | 2024 | Top-1 bug coverage | [Paper] | [Link] |
| | Manke et al. | 2024 | TP, FP | [Paper] | [Link] |
| | D58 | 2024 | Recall, MRR, CandiAvg | [Paper] | [Link] |
| | Saha et al. | 2024 | MRR, MAP, HIT@K | [Paper] | [Link] |
| | Widyasari et al. | 2024 | Top-K | [Paper] | [Link] |
| | LINUXFLBENCH | 2025 | Recall@k, MRR | [Paper] | [Link] |
| | ACPR | 2025 | Accuracy | [Paper] | [Link] |
| Repair | Defects4J | 2014 | # fixed bugs | [Paper] | [Link] |
| | QuixBugs | 2017 | # fixed bugs | [Paper] | [Link] |
| | API-Misuse-Repair | 2017 | Exact Match, BLEU, CodeBLEU | [Paper] | [Link] |
| | LMDefects | 2023 | # fixed bugs | [Paper] | [Link] |
| | InferredBugs | 2023 | Ratio of fixed bugs | [Paper] | [Link] |
| | ARHE | 2023 | Accuracy | [Paper] | [Link] |
| | Leetcode-debug | 2023 | Acceptance rate | [Paper] | [Link] |
| | DebugBench | 2024 | Pass Rate | [Paper] | [Link] |
| | SWE-Bench | 2024 | Resolution rate | [Paper] | [Link] |
| | SWE-bench Multimodal | 2024 | Resolution rate | [Paper] | [Link] |
| | SWE-Lancer | 2025 | Resolution rate | [Paper] | [Link] |
| | Multi-SWE-bench | 2025 | Resolution rate | [Paper] | [Link] |
| Vulnerability Detection | Choi et al. | 2017 | Accuracy, F1, AUC | [Paper] | [Link] |
| | Lin et al. | 2017 | Top-k Recall | [Paper] | [Link] |
| | DGBBench | 2017 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | Juliet | 2018 | Precision, Recall, MCC | [Paper] | [Link] |
| | VulDeePecker | 2018 | FN, FP, TN, TP, Precision, Recall, F1, AUC, MCC | [Paper] | [Link] |
| | Draper | 2018 | FN, FP, TN, TP, Precision, Recall, F1, AUC, MCC | [Paper] | [Link] |
| | Devign | 2019 | Accuracy, Precision, Recall, F1, FPR, AUC, Precision@K, MCC | [Paper] | [Link] |
| | Ponta et al. | 2019 | AUC, F1 | [Paper] | [Link] |
| | BigVul | 2020 | Accuracy, Precision, Recall, F1, FPR, AUC, Precision@K, MCC | [Paper] | [Link] |
| | ReVeal | 2020 | Accuracy, Precision, Recall, F1, FPR, AUC, Precision@K | [Paper] | [Link] |
| | SmartBugs | 2020 | Precision, Recall, F1, Top-N Accuracy, MAR, MFR | [Paper] | [Link] |
| | Great | 2020 | Precision, Recall, Accuracy | [Paper] | [Link] |
| | Magma | 2020 | ROC-AUC | [Paper] | [Link] |
| | SolidiFI | 2020 | FN, FP | [Paper] | [Link] |
| | SySeVR | 2021 | FPR, FNR, Precision, Recall, F1 | [Paper] | [Link] |
| | D2A | 2021 | Precision, Recall, MCC | [Paper] | [Link] |
| | PatchDB | 2021 | Precision, Recall, F1 | [Paper] | [Link] |
| | CVEFixes | 2021 | Accuracy, Precision, Recall, F1, FPR | [Paper] | [Link] |
| | CrossVul | 2021 | Accuracy, Precision, Recall, F1, FPR | [Paper] | [Link] |
| | VCmatch | 2022 | AUC, F1 | [Paper] | [Link] |
| | VUDENC | 2022 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | SARD | 2023 | Accuracy, Precision, Recall, F1 | [Paper] | [Link] |
| | DiverseVul | 2023 | Accuracy, Precision, Recall, F1, FPR | [Paper] | [Link] |
| | Web3Bugs | 2023 | TP, TN, FP, FN | [Paper] | [Link] |
| | DeFi Hacks | 2023 | TP, TN, FP, FN | [Paper] | [Link] |
| | VulBench | 2023 | Precision, Recall, F1 | [Paper] | [Link] |
| | OWASP | 2023 | Accuracy | [Paper] | [Link] |
| | TreeVul | 2023 | F1, Macro-F1, MCC | [Paper] | [Link] |
| | FormAI | 2023 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | Hu et al. | 2023 | Hit # | [Paper] | [Link] |
| | FalconVulnDB | 2024 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | FormAI-v2 | 2024 | Average Property Violations Per File/Line | [Paper] | [Link] |
| | MoreFixes | 2024 | Accuracy, Precision, Recall, F1 | [Paper] | [Link] |
| | VulEval | 2024 | Precision, Recall, F1, MCC, Precision@k, Recall@k | [Paper] | [Link] |
| | InterPVD | 2024 | FPR, FNR, Accuracy, Precision, F1 | [Paper] | [Link] |
| | ReposVul | 2024 | Accuracy | [Paper] | [Link] |
| | MegaVul | 2024 | Accuracy, Precision, Recall, F1 | [Paper] | [Link] |
| | SecLLMHolmes | 2024 | Response Rate, Accuracy, Correct Reasoning Rate | [Paper] | [Link] |
| | VulDetectBench | 2024 | F1, Accuracy | [Paper] | [Link] |
| | SC-LOC | 2024 | Precision, Recall, Accuracy, F1-score | [Paper] | [Link] |
| | Ma et al. | 2024 | Precision, Recall, Accuracy, F1-score | [Paper] | [Link] |
| | FELLMVP | 2024 | Precision, Recall, Accuracy, F1-score | [Paper] | [Link] |
| | Yıldırım et al. | 2024 | Accuracy | [Paper] | [Link] |
| | Vulcorpus | 2024 | Accuracy, Improvement Suggestion | [Paper] | [Link] |
| | Fang et al. | 2024 | Not vulnerability detection | [Paper] | [Link] |
| | SLFHunter | 2024 | TP, TN, FP, FN, F1-score | [Paper] | [Link] |
| | Guo et al. | 2024 | Precision, Recall, F1-score | [Paper] | [Link] |
| | VulnPatchPairs | 2024 | Precision, Recall, F1, Accuracy, FPR, FNR | [Paper] | [Link] |
| | Real-Vul | 2024 | Precision, Recall, F1, Accuracy, AUC | [Paper] | [Link] |
| | PairVul | 2024 | Accuracy, Pairwise Accuracy, F1-score, MCC | [Paper] | [Link] |
| | VulSmart | 2024 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | KernJC | 2024 | TP, TN, FP, FN, Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | LLM4Vuln | 2025 | TP, TN, FP, FN, F1-score | [Paper] | [Link] |
| | VULZOO | 2025 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | CWE-Bench-Java | 2025 | #Detected, Avg. False Discovery Rate, Avg. F1, Precision, Recall | [Paper] | [Link] |
| | CASTLE | 2025 | CASTLE Score, Combination Score, Precision, Recall, Accuracy | [Paper] | [Link] |
| | SecVulEval | 2025 | Human-evaluated scoring rubric | [Paper] | [Link] |
| | JITVUL | 2025 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | Li et al. | 2025 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | BinPool | 2025 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
| | ICVul | 2025 | Precision, Recall, F1, Accuracy | [Paper] | [Link] |
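Most of the detection-style benchmarks above score a binary classifier directly from its confusion matrix. As a quick reference, here is a minimal sketch (the helper name is ours, not from any benchmark's harness) of how the recurring metrics, Precision, Recall, F1, Accuracy, FPR, and MCC, all derive from the TP/TN/FP/FN counts:

```python
import math

def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive the standard binary-classification metrics reported by
    most defect-prediction and vulnerability-detection benchmarks."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # a.k.a. TPR
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)          # harmonic mean
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # Matthews Correlation Coefficient: balanced even when the
    # vulnerable class is rare, which is why many rows report it.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "fpr": fpr, "mcc": mcc}

# Hypothetical counts for illustration:
m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

MCC is worth noting because, unlike Accuracy or F1, it stays informative under the heavy class imbalance typical of real-world vulnerability datasets.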
