What is Assessment?
For many, the word “assessment” conjures images of multiple-choice questions and stacks of blue books. While these are valid, traditional ways of assessing student performance, you can also use other methods to engage your students and allow them to build skills.
Consider that assessment is a link in the chain of alignment. If learning objectives describe what students will do and your course content demonstrates how to do it, then assessment measures whether your students have met the course learning objectives.
Assessment and learning objectives are very closely tied together. In fact, each of your learning objectives should have at least one assessment attached to it, and a major project may measure student progress on several learning objectives. No matter what your learning objective is, there’s a creative way to assess it.
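If you keep a simple map of which assessments measure which learning objectives, you can check this alignment mechanically. The following is a minimal sketch in Python, not a prescribed tool. The objective labels (`LO1` through `LO4`), the assessment names, and the helper `unassessed_objectives` are all hypothetical and exist only for illustration.

```python
# Hypothetical course map: each assessment lists the objectives it measures.
assessments = {
    "Quiz 1": ["LO1"],
    "Draft paper": ["LO2"],
    "Final project": ["LO1", "LO2", "LO3"],  # one project, several objectives
}

# Hypothetical list of the course's learning objectives.
objectives = ["LO1", "LO2", "LO3", "LO4"]


def unassessed_objectives(objectives, assessments):
    """Return the objectives that no assessment measures, in their original order."""
    covered = {lo for measured in assessments.values() for lo in measured}
    return [lo for lo in objectives if lo not in covered]


print(unassessed_objectives(objectives, assessments))  # ['LO4']
```

In this made-up course, `LO4` has no assessment attached, so it would need either a new assessment or a place in an existing one.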
Types of Assessment
If you’ve ever been involved in any kind of stage production, you’ll be familiar with the amount of work that goes on before the first performance. Actors have to memorize their lines and rehearse their interactions with each other. The stage crew has to build set pieces, sequence lighting and music, and also rehearse those transitions with the actors. If everyone involved in the production hasn’t practiced enough prior to opening night, the production could go from being a drama to a comedy very quickly!
Your courses are much the same. If you aren’t prepared or you haven’t given your students enough opportunities to practice, they won’t perform well in major exams and projects. You can prevent this by designing two major types of assessment, formative and summative, into your course.
Summative assessments are large, high-stakes assessments that measure how well your students have met your course objectives. They can take many forms: traditional multiple-choice exams, longer written assignments, or multi-part projects. End-of-unit, midterm, and final exams, papers, and projects are all summative assessments. They are your theater production’s “live performances.”
Formative assessments are your students’ “rehearsal” for the course’s summative assessments. They are meant to prepare your students for success. Think of them as practice for the real thing. They include “homework” assignments, quizzes, casual in-class knowledge-checks, draft papers, etc.
As the name indicates, “formative” means you’re measuring your students’ ability and understanding while it’s still forming and may not be complete. As such, it’s important that your formative assessments are very low stakes, meaning students are not heavily penalized for being incorrect. At this early stage, you want to encourage them to keep trying. When unencumbered by the stress of strict deadlines or risk of failure, students perform better, build confidence, and become more resilient.
In essence, your goal should be to create a learning environment where it’s permissible to be wrong as long as students invest effort to improve. One simple way of doing this is to allow students to re-submit assignments. Another is to provide practice quizzes that are randomly generated from a large pool of questions, so each attempt a student makes is different.
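Your quiz tool may already handle random question selection for you, but the idea is simple enough to sketch. The Python example below assumes a hypothetical pool of 50 placeholder questions and a hypothetical helper named `draw_practice_quiz`. It draws a fresh set of questions, without repeats, for each attempt.

```python
import random


def draw_practice_quiz(question_pool, num_questions, rng=None):
    """Pick num_questions distinct questions at random from the pool."""
    if rng is None:
        rng = random.Random()  # unseeded, so every attempt differs
    return rng.sample(question_pool, num_questions)


# Hypothetical pool; in practice these would be real questions.
pool = [f"Question {n}" for n in range(1, 51)]

first_attempt = draw_practice_quiz(pool, 5)
second_attempt = draw_practice_quiz(pool, 5)
print(first_attempt)
print(second_attempt)
```

The larger the pool is relative to the quiz length, the less likely it is that two attempts share many questions.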
Another example of formative assessment is an instructor in a face-to-face classroom posing questions to help keep students focused and check that their message is getting through. In an asynchronous class, doing these knowledge checks isn’t possible, so we need to think of different ways to achieve the same thing. One solution is to embed quiz questions into any video you are using to introduce material. Our video platform, Kaltura, allows you to do just that.
Good formative assessment is your most powerful teaching tool, particularly in an online or blended environment. Request an appointment with a Learning Designer to collaborate on how to design different types of assessments into your course.
Assessment, particularly formative assessment, is not just a tool for measuring your students’ performance. It’s also a vehicle for providing students with constructive feedback. Measuring performance tells you and your students where they are, while providing feedback based on those measurements tells them how to move forward and improve.
Let’s say you need some cash from your checking account. You go to the closest ATM and insert your card. In response, the ATM welcomes you with an indignant “BEEP!” Re-inserting your card only results in additional beeps with no other information revealing why the infernal machine will not accept your card.
After failing to make the ATM blush with a string of curses, you proceed across the street to try a second ATM. Once again, you insert your card… and you are once again treated to another indignant “BEEP!” This time, however, a message and animation appear on the second ATM’s screen. The text says, “Turn-over and re-insert your card.” Beneath the text is an animation of a bank card flipping over and sliding into a slot. Armed with this crucial information, you heed the ATM’s advice, and the machine cheerfully allows you to withdraw your cash.
In our introductory anecdote, both machines were assessing your ability to correctly insert a bank card. However, the first machine gave you very poor feedback (only BEEP!). The second machine gave you excellent, actionable feedback by telling you what was wrong. It even provided a visual aid. This allowed you to correct your behavior and finally achieve your goal. Also of note is how you reacted to the first machine. By the end of the interaction, you were confused, frustrated, and angry at the machine before storming off to a different ATM.
Do you see where this is leading? Good feedback isn’t just a way to improve your students’ knowledge. It can very well be the determining factor in whether a student chooses to persist in your course or program!
Measuring the Right Things
When we assess students, we’re trying to measure the amount and quality of what they have learned. This learning is encoded in the neurons within students’ brains where we can’t directly observe it. Therefore, we make assessments that challenge them to demonstrate their skills, which allows us to infer something about what they know.
Unfortunately, whenever you measure something indirectly there’s a chance of interfering factors having an effect on the results. Any given assessment you create will be generally useful for measuring your intended topic, but there will always be other things you are accidentally assessing that you may not have planned for. This concept is known as “validity,” which describes what the results of an assessment actually indicate. Don’t get too hung up on the definition—first, let’s look at some examples.
A math instructor creates a quiz composed of word problems. The words and grammar she uses are at a higher reading level than most of her students can comprehend. As a result, students who would have done well had the quiz been written at their reading level score lower. Too much of this quiz’s validity lies in reading comprehension, which takes away from its validity in math.
Suggestion: Avoid situations where advanced knowledge of an unrelated discipline is required in order to perform the task you want to actually measure. This happens most often with the wording of exam questions and assignment instructions. Keep your text concise, and use commonly known words.
A chemistry instructor requires students to turn in a lab report after each session. He has very specific preferences on how this report should be formatted and takes several points off the score if his formatting instructions are not followed to the letter. As a result, a significant number of his students lose points each week for making formatting mistakes despite having documented the lab exercises accurately. The lab report assignment’s validity relies too much on the instructor’s formatting preferences, reducing its validity in measuring chemistry lab skills.
Suggestion: Everyone has their pet peeves. The instructor in this example could have made his and his students’ lives easier by making a lab report template for students to fill out or at the very least taking fewer points off for formatting.
An English writing instructor administers a timed exam in which students must write an essay on a topic. While a core number of her students are able to do this in the allocated time, their work is of noticeably lower quality than previous un-timed assignments. A significant portion of students are unable to complete their essays, including some she previously considered to have the best writing ability in the class. She quickly realizes that she imposed the time limit only because she thought an exam required one; that’s how exams worked when she was in college.
Suggestion: Environmental factors weigh in heavily in all assessments, particularly in face-to-face testing environments. Ask yourself if a time limit is really necessary for students to truly demonstrate their capability in your content area. Other environmental factors, such as room noise, air quality, lighting, and assessment length, may also affect the results of your assessments.
You can think of your assessment validity as a percentage of a whole. Of that whole, a certain percentage will reflect the thing you actually want to measure. The higher that percentage, the better your assessment is at inferring actual student achievement. However, there will always be a certain percentage of your validity that is measuring other unexpected factors. It’s impossible to completely remove interfering factors, but just being aware of them can help you minimize their impact. One major source of interfering factors is the way test questions are constructed. A guide from Vanderbilt University (Brame, 2019) goes into detail on how to construct good multiple-choice questions.
While traditional exams are a familiar staple of assessment, they’re less effective at measuring more complex levels of achievement (Miller et al., 2009). This is especially true when students must perform tasks that involve evaluation and creativity. Authentic assessment is designed for exactly these tasks. Instead of crafting questions to infer what students know, authentic assessment gets much closer to the reality of not only what students know but also what they can do with that knowledge. Because everything that is being assessed is coming from the student’s brain, it’s very difficult to “game” an authentic assessment in comparison to a multiple-choice test. As an added bonus, authentic assessments can also eliminate a lot of the factors that affect the validity of traditional exams.
Authentic assessments are “authentic” because the deliverable is generated entirely by the student. They generally take the form of large projects that incorporate all of the content knowledge in your course that students use or assemble into something that reflects their own personal interests. The deliverables students produce in an authentic assessment are a personalized interpretation of what you have presented to them over the course of a semester. In fact, if your course’s summative assessment is authentic, you could build your entire course around constructing this deliverable by systematically focusing on the process of producing each of its components.
There are a number of tie-ins between active learning and authentic assessment. You could think of the two as formative and summative counterparts of each other. In our Active Learning guide, we provide a number of active learning activity examples that could easily be re-purposed or scaled up into authentic assessments. For further reading, see Jon Mueller’s Authentic Assessment Toolbox.
One caveat: grading authentic assessments in a way that is both fair to students and makes sense to them requires techniques that may be unfamiliar to you. One solution is to provide grading rubrics for your assignments.
A rubric lists different criteria you will use to assess an assignment. It describes what different levels of quality look like for each criterion (Andrade, 2000; Arter & Chappuis, 2007; Stiggins, 2001). A well-designed rubric does the following:
- helps you consistently assess student work,
- communicates to students your expectations for an assignment,
- describes what different levels of quality look like for your students,
- allows you to frame your feedback to students around specific criteria,
- helps you pinpoint patterns of understanding among your students so you can adjust your teaching accordingly,
- allows students to assess themselves and reflect on their work before turning it in, and
- opens dialogue with your students when you create the rubric for an assessment together.
Types of Rubrics
There are two common types of rubrics: analytic and holistic.
In the example analytic rubric, the vertical axis lists criteria for the assignment, e.g., “grammar” and “references” for a research paper. Criteria are essentially the things you are interested in evaluating in your students’ work. Along the horizontal axis are quality levels. The intersecting square between a criterion and a quality level describes what that specific criterion looks like at that particular quality level.
This example rubric assesses the quality of bacon, lettuce, and tomato sandwiches.
| Criterion | 0 – Needs Improvement | 1 – Meets Expectations | 2 – Exceeds Expectations |
| --- | --- | --- | --- |
| Bread | Bread is either absent, soggy, burnt, or stale. | Bread is toasted golden-brown and is warm and crisp. | Bread is toasted golden-brown and is warm, crisp, and has some other notable feature (e.g., seeds, whole grain, fresh baked, etc.). |
| Vegetables | Vegetables are either absent, excessively wet, or wilted. | Vegetables are crisp, robustly colored, reasonably dry, and fresh. | Vegetables are crisp, robustly colored, reasonably dry and fresh, and have some other notable feature (e.g., an uncommon variety, spinach in place of lettuce, etc.). |
| Bacon | Bacon is either absent, mostly burnt, too hard, or has excess grease. | Bacon is an even ratio of crisp and chewy. There are at least two strips. | Bacon is an even ratio of crisp and chewy and has some other notable feature (e.g., more than two strips, locally raised, etc.). |
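If you grade with a spreadsheet or script, the analytic BLT rubric above can be represented as a small data structure. The Python sketch below is one possible way to do it. Adding the criterion ratings together with equal weight is an assumption made for illustration; the rubric itself doesn’t say how to combine them. The function name `score_sandwich` and the sample ratings are hypothetical.

```python
# Quality levels and criteria taken from the example BLT rubric.
LEVELS = {
    0: "Needs Improvement",
    1: "Meets Expectations",
    2: "Exceeds Expectations",
}
CRITERIA = ["Bread", "Vegetables", "Bacon"]


def score_sandwich(ratings):
    """Validate one rating per criterion and return (total, maximum).

    Assumption: every criterion carries equal weight.
    """
    for criterion in CRITERIA:
        if criterion not in ratings:
            raise ValueError(f"No rating given for {criterion}")
        if ratings[criterion] not in LEVELS:
            raise ValueError(f"Invalid level for {criterion}: {ratings[criterion]}")
    total = sum(ratings[criterion] for criterion in CRITERIA)
    maximum = max(LEVELS) * len(CRITERIA)
    return total, maximum


# Hypothetical ratings for one sandwich.
ratings = {"Bread": 1, "Vegetables": 2, "Bacon": 0}
total, maximum = score_sandwich(ratings)
print(f"{total}/{maximum}")  # 3/6
for criterion in CRITERIA:
    print(f"{criterion}: {LEVELS[ratings[criterion]]}")
```

Reporting the level name for each criterion alongside the total keeps the feedback tied to specific criteria rather than to a single number.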
This is a fairly good rubric, but it isn’t perfect. The following general guidelines reference how it could be improved. Keep in mind, these are just guidelines. There’s arguably no such thing as a perfect rubric!
Avoid cases where a result in one criterion directly determines the score of another. For example, if the bread in our BLT is soggy, there’s a good chance it’s because the veggies are too wet or the bacon is too greasy. Rather than using the ingredients as criteria, it might be better to base the criteria on attributes of the overall sandwich, e.g., “Texture,” “Appearance,” and “Ingredient Sourcing.”
Whenever possible, use quantities to describe your quality levels. This leaves less room for interpretation in your expectations. Words like “a few,” “multiple,” or “some” make your rubric more ambiguous: you may find yourself second-guessing your grading, and your students may define these words differently than you do.
Even Quality Levels
When we have an odd number of choices and we’re indecisive, we tend to pick the middle option. This may not be fair to your students. It’s extra work, but if you can, have four levels of quality in your rubric. An even number of levels also forces you to think about and describe your expectations more specifically.
An analytic rubric might be overkill for smaller assignments. In these cases, a holistic rubric might be more appropriate. This type of rubric is much simpler to create. It usually has three to five levels of quality, each one tied to a description of what the student’s work would look like to be rated at that level.
Here’s our BLT rubric again but in holistic form.
| Score | Description |
| --- | --- |
| 3 | Artisanal, locally-baked bread toasted golden-brown with fresh, crisp, organic veggies and locally raised, grass-fed bacon cooked with just the right amount of crunchiness and chewiness. |
| 2 | Ordinary bread toasted golden-brown with fresh veggies and bacon of unknown origins. |
| 1 | Components have inconsistent degrees of quality and one of the following flaws: burnt or untoasted bread; soggy or wilted veggies; mostly burnt, undercooked, or hard bacon. |
| 0 | Sandwich is missing components or has more than one of the flaws listed above. |
A third option, the single-point rubric, is similar to an analytic rubric, but it describes only what student work that meets your expectations will look like. One common problem with analytic rubrics is that cases often crop up unexpectedly that are not listed in the descriptors. The single-point rubric solves this by leaving open-ended the areas that might need improvement or that exhibit excellence. It also opens up the opportunity for instructors to give students more specific, useful feedback on their work.
Once again, here is our BLT rubric, but now it’s in single-point form.
| Concerns: Areas to work on | Criteria: What is expected | Advanced: Evidence of Excellence |
| --- | --- | --- |
|  | Bread is toasted golden-brown and is warm and crisp. |  |
|  | Vegetables are crisp, robustly colored, reasonably dry, and fresh. |  |
|  | Bacon is an even ratio of crisp and chewy. There are at least two strips. |  |
References & Further Reading
Andrade, H. (2000). Using rubrics to promote thinking and learning. Educational Leadership, 57(5), 13–18.
Arter, J., & Chappuis, J. (2007). Creating and recognizing quality rubrics. Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
Brame, C. J. (2019). Writing good multiple-choice test questions. Vanderbilt University Center for Teaching. Retrieved from https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching (11th ed.). New York, NY: Pearson.
Mueller, J. (2018). Authentic assessment toolbox. Retrieved from http://jfmueller.faculty.noctrl.edu/toolbox/whatisit.htm
Stiggins, R. J. (2001). Student-involved classroom assessment (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.