Key Considerations for Selecting Assessment Instruments and Implementing Assessment Systems

The ability to demonstrate educational outcomes as the achievement of competency-based learning objectives provides evidence of preparing competent physicians who can meet the health care needs of the public. Educational assessment is, therefore, a key component of the Outcome Project and is intended to:

  1. Assess residents' attainment of competency-based objectives
  2. Facilitate continuous improvement of the educational experience
  3. Facilitate continuous improvement of resident performance
  4. Facilitate continuous improvement of residency program performance

Assessment is defined as the "process of collecting, synthesizing, and interpreting information to aid decision-making" [1]. The results of an assessment should allow sound inferences about what learners know, believe, and can do [2] in defined contexts. Assessment, therefore, integrates several concepts, which are described below.

Assessment Instrument or Approach

  1. The assessment approach provides valid data.
    Valid data provide accurate information about what is being assessed. Different types of evidence may be used to infer validity. Validity may be inferred when assessment results help to predict performance in actual practice. It may also be inferred when it is possible to detect change (responsiveness). This occurs, for example, when residents perform poorly on a cardiology assessment prior to completing a cardiology rotation, but perform well on the same assessment following the rotation. In addition, validity may be inferred when there is a strong relationship between data obtained and external indicators (discriminative validity). An example of the latter occurs when medical students perform poorly and cardiologists perform well on the same cardiology quiz. As knowledge about complex assessment advances, however, it is possible that perspectives on validity will also evolve.

  2. The assessment approach yields reliable data.
    An assessment approach may be considered reliable when it yields consistent results regardless of when it is used, who uses it, and which item or case is assessed. The importance of a specific type of reliability depends upon what is being assessed and the method by which it is being assessed. Generally speaking, reliability or generalizability coefficients of 0.8 and higher are desired. Inter-observer or inter-rater reliability is an indicator that different assessors have provided similar ratings for the same performance. Inter-case or inter-item reliability is the degree of consistency in an individual's performance across different cases, situations, or items. Test-retest reliability is an indicator of consistency over time. Generalizability theory offers an alternative approach to assessing the individual reliabilities listed above by allowing examination of specific sources of unreliability and providing an overall reliability index termed a G coefficient.

  3. The assessment approach is feasible.
    Feasibility depends on several issues that include the following: time and training required to implement the assessment, equipment or technology required, number of assessments required per examinee, financial cost, and the extent to which an assessment has been used.

  4. The assessment approach is likely to apply to the intended assessment circumstances (external validity).
    When choosing an assessment approach, the conditions in which an assessment has been previously conducted should be considered. These conditions include the purpose for which the assessment was used, the characteristics of those assessed and the assessors, and the setting in which the assessment was conducted. Assessments that have been used in testing centers, for instance, may require modification for use in clinics or wards where the pace may vary and interruptions may occur.

  5. The assessment provides valuable information.
    In terms of value, assessment should provide new and useful information that facilitates teaching and learning. For instance, the assessment should allow the collection of enough detailed information that it is possible to know what performance improvements or curricular modifications are needed.
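
The 0.8 guideline for reliability coefficients mentioned under item 2 can be checked numerically. The following is a minimal sketch in Python that computes Cronbach's alpha, one commonly used index of inter-item consistency. The choice of statistic, the function name cronbach_alpha, and the sample scores are illustrative assumptions and are not taken from the Outcome Project materials.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per examinee, one column per item.
    """
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least two examinees and two items")

    # Sample variance of each item (column) across examinees.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five residents scored 1-5 on three quiz items.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}; meets 0.8 guideline: {alpha >= 0.8}")
```

A coefficient of this kind addresses only one source of inconsistency (items). Inter-rater and test-retest reliability call for different data layouts and statistics, and generalizability theory would be needed to examine several sources at once.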

Assessment System

  1. Assessment is consistent with curriculum/program objectives.
    Consistency between objectives and assessment occurs when there are clear parallels between what is taught and what is assessed. If, for example, a course is designed to improve knowledge and procedural skills required to conduct upper endoscopies, then both knowledge and skills in this area should be assessed. Consistency between objectives and assessment also increases the likelihood that learners will attend to a broader scope of course objectives and not just content that will be assessed.

  2. The educational objectives are representative of the educational domains of interest.
    It is not feasible to assess attainment of all educational objectives in all contexts; therefore, it is necessary to select a sample of what will be assessed. Representative behaviors for each competency in defined contexts should be identified. For the medical knowledge competency, identification may be guided by considering, for instance, common acute and chronic problems that occur in ambulatory settings of specific specialties. For the professionalism competency, development of educational objectives might be guided by considering common ethical dilemmas, relevant cultural contexts of patient care, and key professional courtesies intrinsic to patient care and teamwork for specific specialties in defined settings.

  3. Multiple assessment approaches/instruments are employed.
    Because competence is multi-dimensional and individual assessment approaches have limitations, it is unlikely that a single approach to assessment will be adequate. This problem is addressed by using a few different assessment approaches.

  4. Multiple observations are conducted.
    Multiple observations improve the reliability or precision of assessment and allow identification of patterns of behavior over time.

  5. Multiple observers/raters provide assessments.
    Using multiple observers improves the reliability or precision of assessment and enhances the scope of assessment.

  6. Performance is assessed according to pre-specified standards or criteria.
    Pre-specified standards indicate objective criteria for "good enough" or "borderline" performance and help to reduce subjective assessment.

  7. Assessment is fair.
    Fairness pertains to giving all learners the same or equal opportunity to perform. While fairness may be enhanced by valid and reliable assessment, an assessment may still be unfair if the results are influenced by something other than ability. For example, it would be unfair to compare the assessment results of a learner who was on call the night before an assessment with the results of peers who were not on call. With the exception of baseline or needs assessments, fairness pertains also to providing learners opportunities to learn the material on which they will be assessed. Learners should be informed about what will and will not be assessed. In addition, there should be clarity about the assessment format and how performance will be rated.
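
The claim under items 4 and 5 that multiple observations and multiple observers improve reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The sketch below is an illustration only: it assumes the observations are comparable (equal reliability, independent errors), which real clinical ratings may not satisfy, and the function names and the starting reliability of 0.5 are hypothetical.

```python
import math


def spearman_brown(single_reliability, n_observations):
    """Projected reliability of the average of n comparable observations."""
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


def observations_needed(single_reliability, target):
    """Smallest number of observations projected to reach the target."""
    r = single_reliability
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target * (1 - r) / (r * (1 - target))
    # Round before taking the ceiling so floating-point noise
    # (e.g. 4.000000000000001) does not add a spurious observation.
    return max(1, math.ceil(round(n, 9)))


# Hypothetical example: a single rating has reliability 0.5.
for n in (1, 2, 4, 8):
    print(n, round(spearman_brown(0.5, n), 2))
print(observations_needed(0.5, 0.8))
```

Under these assumptions, projected reliability rises with each added observation but with diminishing returns, which is one reason to weigh the number of observations against the feasibility considerations discussed earlier.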

Lynch DC, Swing SR.
Research Department
ACGME

References
1. Airasian PW. Classroom assessment (3rd ed.). New York: McGraw-Hill, 1997.
2. McMillan JH. Essential assessment concepts for teachers and administrators. Thousand Oaks, CA: Corwin Press, Inc., 2001.