Here is another page in my attempt to record what I have learned from a semester and a half of playing around with standards-based grading in my 9th-grade history class, this one focused on how standards change the nitty-gritty of my grading practices. You can see what I’ve already said on the matter here, here, and here.
Like many high school teachers, I teach in a setting that requires a hybrid between a standards-based approach and a calculated letter grade. My school reports progress on standards but also a letter grade for each quarter. Moving between the two makes things more complicated and requires some careful thought. Here is my list of things that proved unexpectedly challenging in this process, along with where I’ve come out on them for now. I’m not sure I’ve answered any of these questions to my complete satisfaction yet, but that’s part of what makes this all so interesting.
How much granularity is there in the scale?
I found myself, in my test run last term, moving between three-point and four-point scales (the reason I did this had to do with the technical setup of my grading program, not a deliberate choice, but the results were interesting to me). In theory, three points–“yes,” “no,” and “not quite”–would seem to be plenty for many tasks, but in practice, I really felt the difference between three and four. The shorter the scale, the less nuance one can communicate to the student. This matters. Imagine playing a game of blind man’s buff in which your friends just said, “warm,” “warm,” “warm” over and over. You’d get sick of it pretty soon. Add “warmer” and “cooler” to the mix, and the game becomes playable. We sometimes talk about these small grading steps as hedging our bets or even pandering to students–I’ve been having this discussion a lot lately with regard to whether a B+ conveys any useful information that a B does not–but, if our goal is to make grading more precise and to communicate as clearly as possible, then maybe the question should be turned around: is there a good reason to limit the tools we have?
I think these nuances are especially important in helping students see their progress over time: the more granular your detail, the more quickly progress will become apparent, and progress is, in itself, inherently motivating.
So on this question, I come down, within reason, on the side of more is better. Certainly, I think three steps are too few. And if, like me, you are going to have to convert your standards at the end to a five-point (A, B, C, D, F) scale, I think the standards should also be tracked on five steps. Otherwise, you create an automatic bias toward one end or the other of your grading scale by, for example, essentially lumping D and F together.
For next semester, I’m toying with the idea of doing my day-to-day grading for students’ eyes on a 10-point scale, just to better show them the sometimes incremental progress they’re making. That would still be easy to convert to one of five letter grades–it’s not that different from what we do in a conventional grading universe with the 100-point scale.
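To make the conversion concrete, here is a sketch of what one such mapping might look like. The cutoffs are invented for illustration only, not my school’s actual policy; any real scheme would need to be set deliberately.

```python
# One possible 10-point-to-letter conversion. The cutoffs below are
# invented for illustration; a real school would set them deliberately.
def to_letter(score):
    """Map a score on a 10-point scale to one of five letter grades."""
    for cutoff, letter in [(9, "A"), (7, "B"), (5, "C"), (3, "D")]:
        if score >= cutoff:
            return letter
    return "F"
```

The day-to-day 10-point marks stay fine-grained for students, while the quarter grade collapses them into five steps, just as a 100-point average collapses into a letter.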
What are the extremes of the scale?
I think it’s particularly important to be clear about what constitutes the top and bottom of the scale. This may seem obvious–the top is the best, and the bottom is the worst–but, in fact, we tend to live with a good deal of ambiguity around what we mean by “best” or “worst.” Is the top of our scale where students should be unless they’re doing something wrong: “meets expectations”? This is how students and parents tend to read it, reflected in so many conversations that start, “I met with you, and I worked really hard, so I’m wondering why I didn’t get an A.” The drawback to this approach is that, if we treat full marks as an expected baseline and anything else as a problem to fix, we discourage students at both ends of the range: by not setting the goal far enough outside their comfort zone and, simultaneously, making anything short of perfection seem flawed, we invite them to stay in default mode, checking boxes rather than taking risks. And we may never find out how much more they are capable of.
On the other hand, if we see the top of our scale as aspirational, pitched to capture what really special work in this class might look like, just a little beyond what we think most students are capable of at this moment, we run the risk of having our message misunderstood. Students, parents, and colleges are so often conditioned to a steady diet of A’s that they may all understand us to be reporting a problem if we rank something as “good” rather than “excellent.” To make this approach work requires clear, consistent, and persistent communication on an institutional level.
We don’t spend as much time talking about the bottom of the grading scale, partly because, in independent schools, at least, we don’t use it as much, but I find I’m struggling here, too, to translate my standards-based mark into an appropriate letter grade. At the moment, my school’s scale has four steps—Fully, Mostly, and Somewhat Meets Expectations or Needs Improvement—plus “insufficient evidence,” which generally means a student didn’t turn in the work. This four-point scale is pretty typical, I think, for standards-based grading systems I’ve seen, and it meshes fine with a 4.0 GPA scale, but if you have to first translate back to a five-step letter grade scale, there’s a problem. As I’ve been using it, at least, “needs improvement” basically captures what would have been both my D and F grades, which is fine for communication purposes: I don’t feel a strong need for the extra nuance there. When I go to convert, however, this has the effect of either giving more students a failing grade, if I translate needs improvement as an F (and this doesn’t actually feel like a fair translation to me: there is a lot of room between ‘not good’ and failure), or making it technically impossible to fail if you turn the work in, if I translate needs improvement as a D. This may seem unnecessarily technical, but grading systems are technical, and one that depends on too much fudge factor seems problematic. I think we need to think, on an institution-wide basis, about where the line of failure falls for us. At what point is work so unacceptable that we should give no credit for it, and how many steps do we need between that and complete success? That, fundamentally, is the philosophical question behind those numbers and letters, and unless we have a clear answer to that question, whatever we do is likely to be at least somewhat inconsistent.
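The dilemma is easy to see if you write out the two possible translations side by side. Both mappings below are invented to illustrate the problem, not a recommendation for either:

```python
# Two possible translations of a four-step standards scale into letter
# grades (labels abbreviated; both mappings are invented for illustration).
strict  = {"Fully": "A", "Mostly": "B", "Somewhat": "C", "Needs Improvement": "F"}
lenient = {"Fully": "A", "Mostly": "B", "Somewhat": "C", "Needs Improvement": "D"}

# Under `strict`, everything below "Somewhat" fails outright; under `lenient`,
# no student who turns in the work can ever earn an F. Neither version leaves
# any room between "not good" and failure.
```

Either way, four steps are being stretched over five letters, and the seam has to land somewhere.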
Do standards change over time?
Like many of the questions I’ve been wrestling with this year, this one is not actually specific to standards-based grading, but a standards-based approach has forced me to be clearer about my answers. As time goes by, we expect more from students. Acceptable work from a 9th-grader would be weak work from a senior. Even over the course of the year, I hold students more accountable as the year goes on and I know they’re doing a given task for the third or fourth time, rather than the first.
At the same time, students also need to see themselves getting better. One of the benefits of grading students against specified standards is that the standards provide both us and them a yardstick against which to measure progress. If we change that yardstick too often, we risk losing that advantage.
I think there is a good case to be made for either semester-based or year-based standards. Year-long standards are probably pedagogically clearer, giving students a sense of the entire journey ahead of them, but if we measure students on that basis, we need to expect that most students will have a low grade early in the year, rising as they acquire more knowledge and practice throughout the year. This makes perfect sense in terms of communicating students’ acquisition of new skills and knowledge but, in practice, it’s not the way we usually give grades: students are used to thinking of themselves as “an A student” or a “B student” (this is problematic in and of itself, but that’s another conversation) and expecting that that will remain relatively static–they look at their grades more like the heat gauge on a car, where any fluctuation suggests a problem to be corrected, than an odometer, where change is to be expected. If we communicate clearly about this, however, and if we do it on a school-wide basis, fixed standards that allow students to clearly track their own growth can actually be a tool to combat this fixed mindset, serving our purpose in more ways than one.
Are all assessments equal?
Most standards-based systems weight all assessments equally (and, in fact, it’s impossible to do anything else in the grading software I use). The logic for this is that, if one is capturing a student’s ability to meet a specific standard, how the student demonstrates that ability should be secondary. She has either demonstrated it or she hasn’t. (You can find a nice, cogent statement of this philosophy here.)
I agree with this sentiment in theory, but in a real classroom I don’t think it always works that way. The most obvious example of this for me is the historical understanding standard. I still give quizzes to assess students’ initial grasp of material, followed by tests which measure their ability to retain that information and combine it with other information in a bigger picture. Right now, both of those fall under “historical understanding” in my standards (although both tests and quizzes usually have skills-based components that are scored under other standards; let’s leave that to the side for now). If a student aces a quiz on the five pillars of Islam but, on a test two weeks later, does not show any further knowledge of the Islamic Empire, its expansion, or the evolution of the Islamic faith, those two performances don’t carry the same amount of information about her overall grasp of historical knowledge. In an unweighted grade book, however, I have to treat them as though they do.
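The arithmetic behind that complaint is simple. Assuming a four-point scale and a weighting ratio I’ve invented purely for illustration:

```python
# Invented example: one quiz and one later test on the same standard,
# scored on a four-point scale.
quiz = 4.0   # aced the quiz on the five pillars of Islam
test = 1.0   # two weeks later, little of it shows up on the test

# Unweighted gradebook: both performances count equally.
unweighted = (quiz + test) / 2    # 2.5 -- reads as middling mastery

# Weighted gradebook: the more comprehensive test counts three times as
# much (the 3:1 ratio is an assumption, not a recommendation).
weighted = (quiz + 3 * test) / 4  # 1.75 -- closer to the real picture
```

The unweighted average reports a student halfway to mastery; the weighted one reports what the more recent, more comprehensive assessment actually showed.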
There are several solutions to this problem, but each has drawbacks. I can treat quizzes and other small assessments as purely formative and grade only more comprehensive assessments. This is probably the most accurate way to track student achievement but, with our population of anxious girls, I’m not sure moving towards fewer, higher-stakes assessments is the right answer on an emotional level. (The answer to this problem, of course, is to offer infinite retakes, and, in a pure standards-based system, that would be ideal. In our hybrid, quarterly grade structure, I have some misgivings about whether that’s practical, though.) I can create more standards, so that the material on the quiz reflects its own, separate standard, but making each small collection of facts its own standard seems to me neither practical nor desirable. Even when it comes to skills, a proliferation of standards makes tracking them much more complicated.
I am open to convincing on this point, but for now, I continue to feel that weighting at least some assessments is the approach that best lets me balance accuracy, frequent feedback, reassurance, and thoroughness in my assessment practices. It frustrates me that I can’t do that with the online system we use.
There are, of course, an infinite number of other questions that could be asked on the way to setting up the ideal standards-based grading system, but these are the ones which are occupying my mind right now. If you’ve been playing with some of the same questions, I’d love to hear where you’ve come down on them.