RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 1 - 10 of 27 Research Library Publications

An Experimental Comparison of Multiple-Choice and Short-Answer Questions on a High-Stakes Test for Medical Students

Posted: September 4, 2023 | Janet Mee, Ravi Pandian, Justin Wolczynski, Amy Morales, Miguel Paniagua, Polina Harik, Peter Baldwin, Brian E. Clauser

Advances in Health Sciences Education

Recent advances make it possible to replace multiple-choice questions (MCQs) with short-answer questions (SAQs) in high-stakes assessments, but prior research has often relied on small samples tested under low-stakes conditions and has lacked timing data. This study compares item difficulty, discrimination, and response time in a large-scale, high-stakes context.

Category: Assessment-Oriented Research, Links to Outcomes, General Measurement

ACTA: Short-Answer Grading in High-Stakes Medical Exams

Posted: July 1, 2023 | King Yiu Suen, Victoria Yaneva, Le An Ha, Janet Mee, Yiyun Zhou, Polina Harik

Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 443–447

This paper presents the ACTA system, which performs automated short-answer grading for high-stakes medical exams. The system builds on previous work on neural similarity-based grading, applying it to the medical domain and using contrastive learning to optimize the similarity metric.

Category: Assessment-Oriented Research, Scoring, General Measurement

Advancing Natural Language Processing in Educational Assessment

Posted: June 5, 2023 | Victoria Yaneva (editor), Matthias von Davier (editor)

Advancing Natural Language Processing in Educational Assessment

This book examines the use of natural language technology in educational testing, measurement, and assessment. Recent developments in natural language processing (NLP) have enabled large-scale educational applications, though scholars and professionals may lack a shared understanding of the strengths and limitations of NLP in assessment as well as the challenges that testing organizations face in implementation. This first-of-its-kind book provides evidence-based practices for the use of NLP-based approaches to automated text and speech scoring, language proficiency assessment, technology-assisted item generation, gamification, learner feedback, and beyond.

Category: Assessment-Oriented Research, Applications of Technology, General Measurement

Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating

Posted: June 14, 2022 | Chunyan Liu, Daniel Jurich

Applied Psychological Measurement, Volume 46, Issue 6, pages 529–547

This simulation study demonstrates that the sampling variance of item response theory (IRT) item parameter estimates can help detect outlying common items under the 2PL and 3PL IRT models. The proposed sampling variance (SV) statistic outperformed the traditional displacement method (with cutoff values of 0.3 and 0.5) across a variety of evaluation criteria.
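For readers unfamiliar with the traditional displacement method mentioned above, a minimal sketch follows. It is not the authors' SV statistic, whose formula the abstract does not give. It assumes linking constants A and B have already been estimated elsewhere (for example, by a mean/sigma or characteristic-curve method), places each new-form difficulty on the old scale as A * b_new + B, and flags common items whose absolute displacement from the old-form difficulty exceeds a cutoff such as 0.3 or 0.5. The function name `flag_displaced_items` is hypothetical.

```python
from typing import List, Sequence


def flag_displaced_items(
    b_new: Sequence[float],
    b_old: Sequence[float],
    A: float = 1.0,
    B: float = 0.0,
    cutoff: float = 0.3,
) -> List[int]:
    """Return indices of common items whose displacement exceeds `cutoff`.

    Hypothetical helper: A and B are linking constants assumed to have been
    estimated elsewhere; they place new-form b-parameters on the old scale.
    """
    if len(b_new) != len(b_old):
        raise ValueError("b_new and b_old must describe the same common items")
    flagged = []
    for i, (bn, bo) in enumerate(zip(b_new, b_old)):
        # Displacement: transformed new-form difficulty minus old-form difficulty
        displacement = (A * bn + B) - bo
        if abs(displacement) > cutoff:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    b_old = [-1.0, 0.4, 1.0]
    b_new = [-1.0, 0.5, 1.4]
    print(flag_displaced_items(b_new, b_old, cutoff=0.3))  # [2]
    print(flag_displaced_items(b_new, b_old, cutoff=0.5))  # []
```

Here the third item drifts by about 0.4, so it is flagged at the 0.3 cutoff but not at 0.5, which illustrates how the choice of cutoff drives the result. In practice, flagged items are often dropped from the anchor set and the linking constants re-estimated; that step is not shown.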

Category: Assessment-Oriented Research, General Measurement

Digital Module 28: Unusual Things That Usually Occur in a Credentialing Testing Program

Posted: March 17, 2022 | Richard A. Feinberg, Carol Morrison, Mark R. Raymond

Educational Measurement: Issues and Practice, Volume 41, Issue 1, pages 95–96

Unanticipated situations often arise that can create a range of problems, from threats to score validity to unexpected financial costs and even longer-term reputational damage. This module discusses some of the unusual challenges that usually occur in a credentialing program.

Category: Assessment-Oriented Research, General Measurement, Reliability/Validity

Gender Comparison in Milestone Trajectories and Medical Knowledge Examination Scores among Internal Medicine Residents

Posted: May 25, 2021 | Karen E. Hauer, Daniel Jurich, Jonathan Vandergrift, Rebecca S. Lipner, Furman S. McDonald, Kenji Yamazaki, Davoren Chick, Kevin McAllister, Eric S. Holmboe

Academic Medicine, Volume 96, Issue 6, pages 876–884

This study examines whether milestone ratings for internal medicine residents, submitted by program directors working with clinical competency committees, differ by resident gender, and whether women and men scored similarly on subsequent in-training and certification examinations.

Category: Assessment-Oriented Research, General Measurement

A Problem with the Bookmark Procedure's Correction for Guessing

Posted: November 24, 2020 | Peter Baldwin

Educational Measurement: Issues and Practice

This article aims to answer the question: when the assumption that examinees may apply themselves fully yet still respond incorrectly is violated, what are the consequences of using the modified model proposed by Lewis and his colleagues?

Category: Assessment-Oriented Research, General Measurement

The Role of Data Science and Machine Learning in Health Professions Education: Practical Applications, Theoretical Contributions, and Epistemic Beliefs

Posted: November 3, 2020 | Martin G. Tolsgaard, Christy K. Boscardin, Yoon Soo Park, Monica M. Cuddy, Stefanie S. Sebok-Syer

Advances in Health Sciences Education, Volume 25, pages 1057–1086 (2020)

This critical review explores (1) published applications of data science and machine learning (ML) in the health professions education (HPE) literature and (2) the potential role of data science and ML in shifting theoretical and epistemological perspectives in HPE research and practice.

Category: Assessment-Oriented Research, General Measurement, Health Professions

How Examinees Use Time

Posted: June 25, 2020 | P. Harik, R.A. Feinberg, B.E. Clauser

Integrating Timing Considerations to Improve Testing Practices

This chapter addresses a different aspect of the use of timing data: it provides a framework for understanding how an examinee's use of time interacts with time limits to affect both test performance and the validity of inferences based on test scores. It focuses primarily on examinations administered as part of the physician licensure process.

Category: Assessment-Oriented Research, General Measurement, Reliability/Validity

Integrating Timing Considerations to Improve Testing Practices

Posted: June 25, 2020 | M.J. Margolis, R.A. Feinberg (eds.)

Integrating Timing Considerations to Improve Testing Practices

This book synthesizes a wealth of theory and research on time issues in assessment into actionable advice for test development, administration, and scoring.

Category: Assessment-Oriented Research, General Measurement
