Exams help us determine whether students have learned, provide feedback to students about what they have learned, drive students to study, and help discriminate among students. However, writing challenging yet fair exam questions is one of the most difficult things we do as instructors. Luckily, we can use research on learning and data from our own exams to write better questions and make exams learning experiences as well as assessment tools.
Writing exam questions
One way to write useful exam questions is to complete the following process:
- Write a learning objective statement that reflects what you expect students to be able to do after they have learned what you are teaching. For example, do students need to recall a fact, analyze data, interpret a graph or figure, solve a problem, make a prediction, or define a term?
- Then draft a question or problem that students can answer correctly only if they have achieved the objective.
- Rate the level of thinking students will use to answer the question. Categorizing objectives according to Bloom's taxonomy is one approach to determining whether students will need to think at a higher or lower level. Recalling facts and describing ideas are not very cognitively demanding tasks, while applying knowledge and skills to a new situation, analyzing data, and constructing and evaluating arguments demand higher-level thinking. This set of verbs can be helpful for thinking about how to turn a lower-level objective into a higher-level one.
- If you want to turn the question into a multiple-choice question, generate a list of misconceptions, erroneous ideas, or errors in logic that may prevent students from being successful on the question. Write response options that students would be likely to choose if they held those misconceptions or erroneous ideas, or made those errors in logic. Try to avoid "none of the above" and "all of the above" responses, since students may be able to rule these out without understanding the material.
- Take your own exam, or ask a colleague, graduate TA, or undergraduate learning assistant to take it and give you feedback. Are the questions or problems clear? Are they getting at important ideas in the discipline? Do they vary in difficulty? Remember that students will likely take at least twice as long as you do to complete the exam.
- Revisit your exam AFTER students have taken it. Did students have enough time to finish? If no one finished early, the exam may have been too long. Do the test statistics reveal issues with the difficulty of the questions or problems, or with their ability to reveal useful information about student learning?
Interpreting exam statistics
Two commonly calculated exam statistics can reveal important information about the quality of exam questions. These statistics are calculated automatically for multiple-choice exams using bubble sheets, but can also be calculated for open-ended questions or hand-graded exams by tracking students' scores on individual questions or problems.
The first is item difficulty, which may be better understood as item easiness. It is the percentage or proportion of students who answered the question correctly (the P-value). If all students answer correctly, item difficulty is 100% or 1.00 (i.e., the question is "easy" to answer). If no students answer correctly, item difficulty is 0% or 0.00 (i.e., the question is impossible to answer).
The other common exam statistic, item discrimination (also known as R, or the point-biserial correlation), is the correlation between students' performance on a specific question and their performance on the entire test. The closer the value is to +1.0, the better the question is at discriminating between students who performed well and students who performed poorly on the test. Discrimination values greater than 0.20 are considered good.
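Both statistics are straightforward to compute by hand, in a spreadsheet, or in a few lines of code. Here is a minimal sketch in Python, assuming each question is scored 0 (incorrect) or 1 (correct); the student scores below are made up for illustration.

```python
# A minimal sketch, assuming each question is scored 0 (incorrect) or
# 1 (correct) and that total exam scores are already tallied.

def item_difficulty(item_scores):
    """P-value: the proportion of students who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between item scores and total exam scores.

    Some implementations first subtract the item's own points from each
    total (the "corrected" point-biserial) so the item is not correlated
    with itself; the uncorrected form described above is shown here.
    """
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum((i - mean_item) * (t - mean_total)
              for i, t in zip(item_scores, total_scores)) / n
    sd_item = (sum((i - mean_item) ** 2 for i in item_scores) / n) ** 0.5
    sd_total = (sum((t - mean_total) ** 2 for t in total_scores) / n) ** 0.5
    return cov / (sd_item * sd_total)

# Five students' scores on one question, and their totals on a 50-point exam.
q1 = [1, 1, 0, 1, 0]
totals = [48, 45, 22, 40, 30]

print(round(item_difficulty(q1), 2))              # 0.6 -> 60% answered correctly
print(round(item_discrimination(q1, totals), 2))  # 0.93 -> strong discriminator
```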
Item difficulty and item discrimination values should be considered together when evaluating the quality of particular exam questions and the exam as a whole. For example, if a question assesses an idea that is straightforward to understand and that all students should have learned, item difficulty (i.e., item easiness) will be high and item discrimination will be low. If, however, many students miss the question (item difficulty is low), including the highest-performing students in the class (item discrimination is also low), it may not be a very good exam question: students may not be understanding or interpreting it as you intended.
In designing exams, here are questions and points to consider:
- How many questions should have low, medium, or high levels of difficulty? An exam with too many difficult questions may be discouraging to students, and an exam with too many easy questions may not be fully assessing what students have learned.
- In revising your exams from year to year, do you want to keep questions with a difficulty of less than 0.20? Perhaps yes, if the question is assessing a complex or challenging idea that is fundamental or otherwise important to the discipline.
- In revising your exams from year to year, do you want to keep questions with a difficulty of greater than 0.80? Perhaps yes, if the question addresses an idea that is fundamental to learning in your course and you want to make sure all students understand it.
- It is useful to consider difficulty and discrimination together. A question with a low difficulty value (i.e., not many students answered correctly) and a high discrimination value may be helpful for identifying the highest-performing students. A question with both low difficulty and low discrimination may need to be revised because even the highest-performing students are unable to answer it. A simple screening pass like the sketch below can help apply these rules of thumb across a whole exam.
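As a purely illustrative sketch, the cutoffs discussed above can be applied to per-question statistics in a few lines of Python. The thresholds, messages, and sample numbers below are rules of thumb and hypothetical data, not fixed psychometric standards; flagged items are candidates for review, not automatic deletions.

```python
# Screen a question bank using the rough cutoffs discussed above:
# difficulty below 0.20 or above 0.80, discrimination below 0.20.

def flag_items(stats):
    """stats: iterable of (question_id, difficulty, discrimination) tuples."""
    flags = {}
    for qid, difficulty, discrimination in stats:
        notes = []
        if difficulty < 0.20:
            notes.append("very hard: check wording, or keep if the idea warrants it")
        elif difficulty > 0.80:
            notes.append("very easy: keep if the idea is fundamental")
        if discrimination < 0.20:
            notes.append("low discrimination: does not separate strong from weak performers")
        if notes:
            flags[qid] = notes
    return flags

# Hypothetical statistics for a three-question exam.
exam_stats = [("Q1", 0.92, 0.05), ("Q2", 0.55, 0.34), ("Q3", 0.15, 0.10)]
for qid, notes in flag_items(exam_stats).items():
    print(f"{qid}: " + "; ".join(notes))   # Q1 and Q3 are flagged; Q2 is not
```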
Making exams a learning experience
We often think of exams as the last thing students do - a demonstration of what they have learned. However, exams can also be opportunities for further learning. Here are two strategies for encouraging students to LEARN from their exams:
Exam corrections. As much as we would like for our students to review their graded exams and study the content they missed, there are few if any immediate incentives for students to do so. Giving students the opportunity to earn back a portion of their missed points can be a good incentive (not all of them, which may discourage studying in the first place!). Ask students to provide correct responses for the questions they missed, along with an explanation of why their first response was problematic. Exam corrections work best when instructors DO NOT distribute exam keys, and when exam questions demand higher-level thinking: if the questions require only factual recall, students miss them simply because they didn't remember the facts.
Group testing. Peer instruction works not only for in-class activities; it can also be used during exams through a process called group testing, collaborative testing, or two-stage exams. Students first take the exam solo. Solo exams are collected. Then students work in previously assigned groups to take the exam again, as a group. Through this process students have the opportunity to think on their own, and then are required to explain and defend their thinking to their peers. Most of the time, group test scores are higher than solo test scores because the quality of students' thinking improves as a result of the debate. In fact, students who all selected incorrect responses on the solo exam can arrive at the correct response through the process of thinking and debating as a group - this is not simply the result of the "smart" student in the group telling the rest of the group what the correct answer is. Group testing can also encourage less confident students to speak up and defend their selections to the more vocal members of the group when their group test grade is on the line. For more on group testing, see:
- Using Group Exams in Your Class
- See group testing in action in this video clip from the University of British Columbia
Rubrics
Imagine for a moment the perfect chocolate chip cookie. What would it be like? Would it be big and chewy or small and crisp? Would it have lots of chips, or fewer chips and just a touch of salt to complement the sweetness? Each of us has a vision of the ideal chocolate chip cookie, just as each of us as instructors has a vision of what high-quality student work should look like.
How does this relate to rubrics? Rubrics are tools for making the elements of a learning task clear: what the desired performance is (i.e., the ideal chocolate chip cookie) and the levels at which that performance can be achieved (e.g., some chocolate chips are necessary, but the ideal ratio of chips to cookie is...). Ideas like "class participation" or "writing a journal-style research paper" may be clear to us as instructors but not to students, and rubrics can make transparent what we are looking for. Specifically, rubrics:
- Create a common framework and language for assessment for students and instructors
- Can expedite the process of evaluating complex or ill-structured products or behaviors
- Enable multiple raters to apply the same criteria and standards
- Are criterion-referenced rather than norm-referenced - raters ask, "Did the student meet the criteria for level 4 of the rubric?" rather than "How well did this student do compared to other students?"
- Can promote shared expectations and grading practices among faculty when they collaborate to develop a rubric
Types of rubrics
There are two main types of rubrics:
Analytic: Analytic rubrics are more common and can be more useful because they outline levels of performance for each criterion. Here is an example course project from an introductory level non-majors biology course, and the associated analytic rubric.
Holistic: Holistic rubrics do not separate levels of performance for each criterion. For example, the chocolate chip cookie could be categorized as:
- Excellent: Cookies are approximately 3 inches in diameter, consistently containing ~10 chips, texture is crisp around the edges and soft in the middle
- Acceptable: Cookies are between 2 and 4 inches in diameter, containing a varying number of chips but at least 5 per cookie, texture varies from cookie to cookie
- Unacceptable: Cookies are either smaller than 2 inches or larger than 4 inches in diameter, there are no chips or the number of chips varies widely from cookie to cookie, texture suggests the cookies are undercooked or overcooked
Writing and using a rubric
Create the rubric. One way of creating a rubric is to:
- Determine a task that would benefit from a rubric. This can be a specific assignment such as a project, poster, paper, or presentation, or an overall behavior such as lab citizenship or class participation.
- Identify the features of the task that represent the scope of desired performance. In an analytic rubric, these would be the list of criteria that comprise the desired performance.
- Determine the levels of mastery or performance for the task. Set the ideal performance at one end of the rubric and unacceptable performance at the other end. There can be one, two, or even three levels in between ideal and unacceptable. Aim for the smallest number of levels that still allows you to discriminate between performances; three or four levels altogether is usually sufficient, and more than four often become difficult to tell apart. Here are some options for levels:
- Exceeds expectations, meets expectations, near expectations, below expectations
- Exemplary, proficient, marginal, unacceptable
- Mastery, proficiency, developing, novice
- Add explanations for each level of performance for each criterion. What does it mean to exceed expectations? Meet expectations? Be near meeting expectations, but not quite meet them? Be below expectations? (One way to record the finished rubric is sketched after this list.)
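If you keep rubrics and grades in software, the criteria-by-levels structure described above maps naturally onto a small data structure. The Python sketch below is purely illustrative; the criteria, level names, descriptors, and point values are hypothetical stand-ins for whatever your task actually requires.

```python
# An illustrative analytic rubric: each criterion gets a descriptor
# for every level of performance. All content here is hypothetical.

LEVELS = ["exemplary", "proficient", "marginal", "unacceptable"]

rubric = {
    "thesis": {
        "exemplary":    "Clear, specific, and arguable claim",
        "proficient":   "Clear claim, but somewhat general",
        "marginal":     "Claim present but vague or descriptive",
        "unacceptable": "No identifiable claim",
    },
    "use of evidence": {
        "exemplary":    "Relevant evidence, accurately interpreted",
        "proficient":   "Mostly relevant evidence, minor misreadings",
        "marginal":     "Evidence thin or loosely connected to the claim",
        "unacceptable": "Little or no evidence",
    },
}

def score(ratings, points_per_level=None):
    """Convert per-criterion level ratings into a numeric total.

    By default, levels are worth 4 (exemplary) down to 1 (unacceptable).
    """
    points = points_per_level or {
        lvl: pts for pts, lvl in enumerate(reversed(LEVELS), start=1)
    }
    return sum(points[ratings[criterion]] for criterion in rubric)

# One student's ratings across both criteria: 4 + 3 = 7 of a possible 8.
print(score({"thesis": "exemplary", "use of evidence": "proficient"}))
```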
Get feedback on your rubric. Once you have drafted your rubric, share it with a colleague, graduate teaching assistant, or undergraduate learning assistant for feedback. Are the criteria clear and aligned with the task? Are the levels of performance clear, distinct from one another, and reasonably representative of what students in the course will be able to accomplish?
Make rubrics available to students. Once you are satisfied with your rubric, give it to students when you assign the task. This will make your expectations clear from the outset. Consider asking students to use the rubric to self-assess their performance, for example by assessing their own class participation for the preceding week or evaluating a draft of a project. Also consider asking students to assess one another's performance. This will give them practice interpreting and applying the rubric in a way that will help them use the rubric to improve their own work.
Revise rubrics as needed. As you use your rubric, you may find it necessary to revise the criteria, the levels, or the explanations of each. Make notes as you use your rubric so you can revise it for future use. Try not to make changes while the task is in progress so that students don't feel like expectations are shifting; changes can be made if it has become clear that the rubric is unclear or unfair. Any changes, including the reasons they are being made (e.g., to clarify expectations or make grading more fair), should be communicated clearly to all students in class and through Canvas.
Learn more
For more on creating rubrics and examples of rubrics, see:
- Association of American Colleges & Universities VALUE Rubric Development Project
- Association for the Assessment of Learning in Higher Education site on Sample Rubrics
- Jon Mueller's Authentic Assessment Toolbox
- Science Education Resource Center sites on Assessment Using Rubrics and Developing Scoring Rubrics