Notes on Lookup – Histograms as Sieves
I think I overlooked one interesting example of a sieve-like process in the typical K-12 math curriculum, and this post is meant to remedy that. It is possible, and instructive, to look at a histogram as a sieve.
Let’s suppose I have a folder of test sheets from a standardized test. Each sheet in the folder belongs to a particular student and is marked with a score from 0 to 20. On a long table, I make room for 21 stacks of test sheets, side by side, by putting down yellow stickies, each marked with one of the possible scores, from low to high. I then take the test sheets one by one and place each one on the stack corresponding to its score. If Jesse’s sheet is marked 15, it goes on the stack whose yellow sticky reads 15. When I’m done placing all the test sheets, I end up with stacks of varying heights, and some stacks may be empty. Below is a representation of what I ended up with:
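The stacking process can be sketched in Python. This is a minimal sketch under assumptions: each sheet is a hypothetical `(name, score)` pair, scores are whole numbers from 0 to 20, and the names and scores in `folder` are invented for illustration, not the data behind the histogram in this post.

```python
def build_stacks(sheets, max_score=20):
    """Place each (name, score) sheet on the stack for its score.

    Returns a list of max_score + 1 stacks; stacks[k] holds the names
    of the students whose sheets are marked k, in the order placed.
    """
    # One empty stack per yellow sticky, scores 0 through max_score.
    stacks = [[] for _ in range(max_score + 1)]
    for name, score in sheets:
        stacks[score].append(name)  # the score picks the stack directly
    return stacks


# Hypothetical folder of sheets (names and scores invented for this example).
folder = [("Jesse", 15), ("Ana", 12), ("Ben", 15), ("Cai", 20), ("Dee", 7)]
stacks = build_stacks(folder)

# The histogram keeps only the height of each stack, not the names.
heights = [len(stack) for stack in stacks]
print(heights[15])  # 2
print(heights[10])  # 0
```

The score itself serves as the index into the list of stacks, so placing a sheet involves no searching; that direct lookup is what gives the process its sieve-like character.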
The usual name for this representation is a histogram. My tallest stack had eight test sheets in it, for score 15. The stack for score 10 was empty. Jesse’s sheet is somewhere in the 15 stack, but my representation doesn’t show where. Each box in my representation could have been marked with the student’s name, but I didn’t do that. Like any representation, this one highlights certain information and leaves other information out altogether. One thing left out is the students’ names; another is the order in which the sheets sat in the original folder. If I had shuffled the sheets in the folder, the representation would have been identical.
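A small check of the point about lost information, again with made-up scores: since a histogram only counts how many sheets carry each score, shuffling the folder first cannot change it. The `histogram` helper is a hypothetical name used for illustration.

```python
import random


def histogram(scores, max_score=20):
    """Count how many sheets carry each score from 0 to max_score."""
    counts = [0] * (max_score + 1)
    for score in scores:
        counts[score] += 1
    return counts


scores = [15, 12, 15, 20, 7, 15, 3]  # hypothetical scores, in folder order
shuffled = scores[:]                  # copy the folder, then shuffle the copy
random.shuffle(shuffled)

# The histogram ignores order, so the two always match.
print(histogram(scores) == histogram(shuffled))  # True
```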
What information can we extract from this representation? Quite a bit, actually. Some of it is simple, and some of it is useful. A simple thing we can see is that nobody had a score of 10, nor a score of 11. Of course, neither the histogram nor the stacks of sheets with their yellow stickies was necessary to find this out. I could have answered the question “how many students have a score of 10?” by flipping through the entire folder and counting the sheets marked 10. If that were the only thing I cared about, doing it that way might even have been a bit faster and simpler. Yet after answering “how many 10s?” I would have had to do the same amount of work all over again for a follow-up question such as “how many students have a score of 20?” With the stacks in place, each such question only requires looking at a single stack.
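The difference between flipping through the folder for every question and building the stacks once can be sketched with Python's `collections.Counter`. The scores are hypothetical, and `count_by_scanning` is an illustrative name for the flip-through approach.

```python
from collections import Counter

scores = [15, 12, 15, 20, 7, 15, 3, 20]  # hypothetical scores


def count_by_scanning(scores, target):
    """Flip through every sheet and count the ones marked target."""
    return sum(1 for score in scores if score == target)


# Flipping through: each question costs a full pass over the folder.
print(count_by_scanning(scores, 10), count_by_scanning(scores, 20))  # 0 2

# Stacking: one pass up front, then each question is a single lookup.
counts = Counter(scores)
print(counts[10], counts[20])  # 0 2 (a score nobody got counts as 0)
```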
One side effect of putting the test sheets in stacks by score is that we have now effectively sorted them by score. If we put them all back into a single stack, collecting the individual stacks in score order, the resulting stack is sorted by score. If I divide the sorted stack in half, I have located the middle value, called the median. (Median is a technical term with precise rules for what to do when the number of sheets is even and when it is odd; I have left those rules out here.) The cut points that divide the sorted stack into five equal parts are the quintiles; those that divide it into four equal parts are the quartiles.
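Collecting the stacks in score order is, in effect, counting sort. The sketch below assumes integer scores from 0 to 20 and uses made-up data; `sort_by_stacking` is a hypothetical name. For the median and cut points it relies on Python's `statistics` module, which commits to one particular convention (for an even count, `statistics.median` averages the two middle values), one of the rules the paragraph above leaves out.

```python
import statistics


def sort_by_stacking(scores, max_score=20):
    """Sort integer scores by stacking them and collecting stacks in order."""
    counts = [0] * (max_score + 1)
    for score in scores:
        counts[score] += 1
    ordered = []
    # Walk the stacks from the lowest score to the highest.
    for score, height in enumerate(counts):
        ordered.extend([score] * height)
    return ordered


scores = [15, 12, 15, 20, 7, 15, 3, 20]  # hypothetical scores
ordered = sort_by_stacking(scores)
print(ordered)                             # [3, 7, 12, 15, 15, 15, 20, 20]
print(statistics.median(ordered))          # 15.0
print(statistics.quantiles(ordered, n=4))  # the 3 quartile cut points
print(statistics.quantiles(ordered, n=5))  # the 4 quintile cut points
```

No two sheets are ever compared with each other; the sort works only because the possible scores form a small, known range.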
The histogram also allows me to answer questions like “what percentage of students scored 16 or above?” Questions like these could have been answered from the original pile of test sheets, but the histogram makes them more straightforward: add up the heights of the stacks for 16 through 20 and divide by the total number of sheets.
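Answering the percentage question from the stack heights can be sketched as follows, assuming the histogram is a list of counts indexed by score from 0 to 20. The data and the `percent_at_or_above` name are hypothetical.

```python
def percent_at_or_above(counts, threshold):
    """Percentage of sheets whose score is threshold or higher.

    counts[k] is the height of the stack for score k.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no sheets in the histogram")
    # Add up the heights of the stacks from the threshold upward.
    return 100 * sum(counts[threshold:]) / total


scores = [15, 12, 15, 20, 7, 15, 3, 20]  # hypothetical scores
counts = [0] * 21
for score in scores:
    counts[score] += 1

print(percent_at_or_above(counts, 16))  # 25.0
```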
The process of building the histogram has clear parallels with the various sieves I’ve shown in earlier posts. So much so that my earlier claim, that sieves and look-up tables don’t have much support in the K-12 curriculum, was too hasty.