RESEARCH LIBRARY

View the latest publications from members of the NBME research team

Showing 1 - 10 of 11 Research Library Publications

A Comparison of Strategies for Smoothing Parameter Selection for Mixed-Format Tests Under the Random Groups Design

Posted: December 1, 2018 | C. Liu, M. J. Kolen

Journal of Educational Measurement: Volume 55, Issue 4, Pages 564-581

Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed‐format pseudo tests under the random groups design.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Does Incorporating a Measure of Clinical Workload Improve Workplace-Based Assessment Scores? Insights for Measurement Precision and Longitudinal Score Growth From Ten Pediatrics Residency Programs

Posted: October 30, 2018 | Y.S. Park, P.J. Hicks, C. Carraccio, M. Margolis, A. Schwartz

Academic Medicine: November 2018 - Volume 93 - Issue 11S - p S21-S29

This study investigates the impact of incorporating observer-reported workload into workplace-based assessment (WBA) scores on (1) psychometric characteristics of WBA scores and (2) measuring changes in performance over time using workload-unadjusted versus workload-adjusted scores.

Category:Assessment-Oriented Research, Scoring

Palliative Care Competencies and Readiness for Independent Practice: A Report on the American Academy of Hospice and Palliative Medicine Review of the U.S. Medical Licensing Step Examinations

Posted: September 1, 2018 | E. C. Carey, M. Paniagua, L. J. Morrison, S. K. Levine, J. C. Klick, G. T. Buckholz, J. Rotella, J. Bruno, S. Liao, R. M. Arnold

Journal of Pain and Symptom Management: Volume 56, Issue 3, p371-378

This article reviews the USMLE step examinations to determine whether they test the palliative care (PC) knowledge necessary for graduating medical students and residents applying for licensure.

Category:Assessment-Oriented Research, Reliability/Validity, Product-Oriented Research, USMLE, Health Professions

Perceived Utility of the USMLE Step 2 Clinical Skills Examination from a GME Perspective

Posted: July 1, 2018 | M. Paniagua, J. Salt, K. Swygert, M. Barone

Journal of Medical Regulation (2018) 104 (2): 51–57

There have been a number of important stakeholder opinions critical of the Step 2 Clinical Skills Examination (CS) in the United States Medical Licensing Examination (USMLE) licensure sequence. The Resident Program Director (RPD) Awareness survey was conducted to gauge perceptions of current and potential Step 2 CS use, attitudes towards the importance of residents' clinical skills, and awareness of a medical student petition against Step 2 CS. This cross-sectional survey yielded 205 responses from a representative sample of RPDs across various specialties, regions, and program sizes.

Category:Product-Oriented Research, USMLE

Providing Utility, Not Scores: Visualizations to Support Subscore Inferences

Posted: June 26, 2018 | R. A. Feinberg, D. P. Jurich

Educational Measurement: Issues and Practice, 37: 5-8

This article spotlights the winners of the 2018 EM:IP Cover Graphic/Data Visualization Competition.

Category:Assessment-Oriented Research, Scoring

A Comparison of Experimental and Observational Approaches to Assessing the Effects of Time Constraints in a Medical Licensing Examination

Posted: June 1, 2018 | P. Harik, B. E. Clauser, I. Grabovsky, P. Baldwin, M. Margolis, D. Bucak, M. Jodoin, W. Walsh, S. Haist

Journal of Educational Measurement: Volume 55, Issue 2, Pages 308-327

The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring, Product-Oriented Research, USMLE

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports

Posted: June 1, 2018 | M. von Davier, J. H. Shin, L. Khorramdel, L. Stankov

Applied Psychological Measurement: Volume: 42 issue: 4, page(s): 291-306

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

When Listening is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions

Posted: May 1, 2018 | K. Short, S. D. Bucak, F. Rosenthal, M. R. Raymond

Academic Medicine: May 2018 - Volume 93 - Issue 5 - p 781-785

In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred.

Category:Product-Oriented Research, USMLE

The Use of Multivariate Generalizability Theory to Evaluate the Quality of Subscores

Posted: April 3, 2018 | Z. Jiang, M.R. Raymond

Applied Psychological Measurement: Volume: 42 issue: 8, page(s): 595-612

Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as G, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate G under various conditions of subtest reliability, subtest correlations, and variability in subtest means.

Category:Assessment-Oriented Research, Reliability/Validity, Scoring

Examining the Validity of the North American Veterinary Licensing Examination (NAVLE) Time Constraints

Posted: February 2, 2018 | R.A. Feinberg, D. Jurich, J. Lord, H. Case, J. Hawley

Journal of Veterinary Medical Education 2018 45:3, 381-387

This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n = 5,292) to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.

Category:Assessment-Oriented Research, Reliability/Validity, Product-Oriented Research, NBME

