This article critically reviews how diagnostic models have been conceptualized and how they compare to other approaches used in educational measurement. In particular, certain assumptions that have been taken for granted and used as defining characteristics of diagnostic models are reviewed and it is questioned whether these assumptions are the reason why these models have not had the success in operational analyses and large-scale applications, contrary to what many have hoped.
The purpose of this survey is to understand how the prevalence of beliefs, attitudes, and expectations about Alzheimer's disease dementia in the public could inform strategies to mitigate stigma.
Focusing specifically on examples set in the context of movement from Bachelor's level undergraduate programmes to enrolment in medical school, this publication argues that a great deal of what happens on college campuses today, curricular and otherwise, is (in)directly driven by the not‐so‐invisible hand of the medical education enterprise.
Utilizing algorithms to generate items in educational and psychological testing is an active area of research for obvious reasons: Test items are predominantly written by humans, in most cases by content experts who represent a limited and potentially costly resource. Using algorithms instead has the appeal to provide an unlimited resource for this crucial part of assessment development.
This study uses item response data from the November–December 2014 and April 2015 NAVLE administrations (n =5,292), to conduct timing analyses comparing performance across several examinee subgroups. The results provide evidence that conditions were sufficient for most examinees, thereby supporting the current time limits. For the relatively few examinees who may have been impacted, results suggest the cause is not a bias with the test but rather the effect of poor pacing behavior combined with knowledge deficits.
This simulation study demonstrates that the strength of item dependencies and the location of an examination systems’ cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.
The purpose of this paper is to suggest an approach to job analysis that addresses broad competencies while maintaining the rigor of traditional job analysis and the specificity of good test blueprints.
This review is a descriptive summary of the development of National EM M4 examinations, Version 1 (V1) and Version 2 (V2), and the NBME EM Advanced Clinical Examination (ACE) and their relevant usage and performance data. In particular, it describes how examination content was edited to affect desired changes in examination performance data and offers a model for educators seeking to develop their own examinations.
The US Food and Drug Administration (FDA), as part of its regulatory mission, is charged with determining whether a clinical outcome assessment (COA) is “fit for purpose” when used in clinical trials to support drug approval and product labeling. This paper provides a review (and some commentary) on the current state of affairs in COA development/evaluation/use with a focus on one aspect: How do you know you are measuring the right thing? In the psychometric literature, this concept is referred to broadly as validity and has itself evolved over many years of research and application.