Natural Language Processing Predicted to Further the Field of Assessment for Health Professionals

Posted December 11, 2020

Data Scientist Victoria Yaneva, PhD, discusses NBME’s early activities using Natural Language Processing (NLP) for assessments of health professionals. Below, you can learn about NBME's research collaboration with the University of Wolverhampton, as well as an upcoming educational conference in 2021.


By Victoria Yaneva, PhD

In recent years, the field of health professionals' assessment has grown increasingly interested in exploring the potential of Natural Language Processing (NLP) to improve two key areas of exam development: quality and efficiency. This interest stems from remarkable advances the NLP community has achieved in the past decade, from speech recognition and question-answering systems such as Alexa and Siri, to automated translation between languages, customer support chatbots, and even the ability of certain platforms to suggest songs, movies, and products that match our tastes. So how can these advances be harnessed for the assessment of aspiring or practicing physicians?

In Fall 2021, NBME will host a conference in Philadelphia to engage in discussion, collaboration, and information sharing on how best to draw on NLP capabilities to advance the field of assessment for health professionals, as well as assessment overall.

Early efforts in NLP at NBME began more than a decade ago, when our Office of Research ran pilot projects with a research team at the University of Wolverhampton, UK. That ongoing collaboration has led to the more recent exploration of several NLP applications:

  • the potential for automated scoring of short-answer questions and of the notes physicians write after a clinical encounter,
  • automated test item generation,
  • technology-assisted item-writing through automated distractor suggestions,
  • predicting test item characteristics from text (e.g., difficulty and response time), and
  • estimating the likelihood that a newly created test item would meet quality standards.

While these projects are at various stages of research, many of them show great potential to improve the development or scoring of assessments in several ways. For example, predicting test item difficulty may be used to guide the development of items with desired difficulty levels. The automated detection of key concepts found in patient notes written by students or residents enables evaluators to focus their attention on assessing more complex aspects of examinee writing, such as nuanced inferences.
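To make the difficulty-prediction idea concrete, here is a minimal sketch, not NBME's actual method: the item texts, difficulty values, and single surface feature below are all hypothetical, and real systems would use much richer linguistic features and trained models. The sketch fits a one-variable least-squares line relating a simple text feature to observed item difficulty.

```python
# Sketch: predicting item difficulty from a text feature (toy data only).
# Real difficulty-prediction research uses many linguistic features and
# trained models; this illustrates only the basic idea.

def text_feature(item_text):
    """A simple surface feature: mean word length of the item stem."""
    words = item_text.split()
    return sum(len(w) for w in words) / len(words)

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical training set: (item stem, proportion answering incorrectly).
items = [
    ("A 45-year-old man presents with chest pain radiating to the jaw", 0.30),
    ("Which enzyme catalyzes the rate-limiting step of glycolysis", 0.45),
    ("A patient on long-term corticosteroid therapy develops proximal muscle weakness", 0.55),
]
xs = [text_feature(text) for text, _ in items]
ys = [difficulty for _, difficulty in items]
slope, intercept = fit_line(xs, ys)

def predict_difficulty(item_text):
    """Predict difficulty for a new item from its surface feature."""
    return slope * text_feature(item_text) + intercept
```

In practice, such a predictor could flag newly written items whose predicted difficulty falls outside the range a test form calls for, before any examinee ever sees them.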

Finally, automated distractor suggestions may help test item writers come up with better distractors, write items more efficiently, or aid the training of novice item writers.
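One simple way to picture distractor suggestion is ranking candidate answers by their similarity to the correct answer. The sketch below is illustrative only: the medical terms, candidate list, and character-bigram similarity measure are hypothetical stand-ins for the learned models and clinical term banks a real system would use.

```python
# Sketch: suggesting distractors by surface similarity to the key.
# Hypothetical approach for illustration; not NBME's actual method.

def jaccard(a, b):
    """Character-bigram Jaccard similarity between two terms."""
    grams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def suggest_distractors(key, candidates, k=3):
    """Rank candidate terms by similarity to the correct answer; the top
    ones are plausible-looking distractors for a human writer to review."""
    ranked = sorted((c for c in candidates if c != key),
                    key=lambda c: jaccard(key, c), reverse=True)
    return ranked[:k]
```

For example, given the key "hyperthyroidism", terms like "hypothyroidism" rank far above unrelated ones like "appendicitis", mirroring how suggested distractors should be tempting but wrong. The human item writer still decides which suggestions are clinically plausible.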

A common theme in all these applications is the use of technology to detect patterns and similarities. This enables humans to focus on those parts of assessment that require creativity, subject knowledge, and the understanding of shared meaning. Inspired by this goal, NBME continues its research in NLP due to its potential for improving the quality and efficiency of assessment development and scoring.

Early next year, you can look forward to seeing a save-the-date for NBME's specially designed conference focused on cultivating NLP capabilities in assessment.
