
RESEARCH LIBRARY
RESEARCH LIBRARY
View the latest publications from members of the NBME research team
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - p 2880–2886
This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination.
Applied Psychological Measurement: Volume 46, issue 6, page(s) 529-547
The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria.
Academic Medicine: June 2022
This study examines the associations between Step 3 scores and subsequent receipt of disciplinary action taken by state medical boards for problematic behavior in practice. It analyzes Step 3 total, Step 3 computer-based case simulation (CCS), and Step 3multiple-choice question (MCQ) scores.
Journal of Graduate Medical Education: Volume 14, Issue 3, Pages 353-354
Letter to the editor.
Journal of Educational Measurement: Volume 59, Issue 2, Pages 140-160
A conceptual framework for thinking about the problem of score comparability is given followed by a description of three classes of connectives. Examples from the history of innovations in testing are given for each class.
Advances in Health Sciences Education: Volume 27, p 1401–1422
After collecting eye-tracking data from 26 students responding to clinical MCQs, analysis is performed by providing 119 eye-tracking features as input for a machine-learning model aiming to classify correct and incorrect responses. The predictive power of various combinations of features within the model is evaluated to understand how different feature interactions contribute to the predictions.
JMIR Medical Education: Volume 8 - Issue 2 - e30988
This article aims to compare the reliability of two assessment groups (crowdsourced laypeople and patient advocates) in rating physician error disclosure communication skills using the Video-Based Communication Assessment app.
Academic Medicine: Volume 97 - Issue 5 - Pages 718-722
The purpose of this 2019–2020 study was to statistically identify and qualitatively review USMLE Step 1 exam questions (items) using differential item functioning (DIF) methodology.
Academic Medicine: Volume 97 - Issue 4 - Pages 476-477
Response to to emphasize that although findings support a relationship between multiple USMLE attempts and increased likelihood of receiving disciplinary actions, the findings in isolation are not sufficient for proposing new policy on how many attempts should be allowed.
Educational Measurement: Issues and Practices: Volume 41 - Issue 1 - Pages 95-96
Often unanticipated situations arise that can create a range of problems from threats to score validity, to unexpected financial costs, and even longer-term reputational damage. This module discusses some of these unusual challenges that usually occur in a credentialing program.