In this post, I will try to connect a set of equations with matrices and vectors in a way that isn’t simply a notational shift. In part 18 of this series, I brought up the vast preponderance of square matrices (a matrix with the same number of rows as columns) and wondered why school mathematics tends to have us think that matrices somehow always have to be square. This issue is of more than casual interest, since you constantly see students confuse rows and columns. The signifiers that make it clear what each row means and what each column stands for – those are exactly the signifiers that are missing from the accepted standard notation for matrices, presumably for reasons of compactness.
Let’s use our Bill Amend comic strip once again, and look at the problem as stated by the brother in the third panel.
There is a cost per shirt and a cost per sweater, and we don’t know these costs. They are the things we’re trying to find out. What we do know is the cost of two shirts and a sweater – that is $60. Similarly, we know that the cost of one shirt and two sweaters is $75. If we treat this exactly the same way we’ve been dealing with fast food orders in prior posts, e.g. in part 6, we could show this as follows:
We’re looking for the numbers in the blue vector that will have the green vector come out just right. This means that both the total price for the first order and the total price for the second order, priced out from the order amounts and the (as yet) unknown prices, must come out to $60 and $75 respectively. In math class, we’d write:
and assume that somehow we will keep track of what is what. The things you learned to do in math class, like doubling the second row and then subtracting the first row from the second, these all have counterparts in terms of shirts and sweaters and orders. Doubling the second row amounts to doubling the second order: 2 shirts plus 4 sweaters will cost $150. From the first order, we know that 2 shirts plus 1 sweater costs $60. In double the second order, we have the same number of shirts, but three more sweaters. For those three extra sweaters, we ended up paying $150-$60, which amounts to $90. If these three sweaters cost $90, then a single sweater will cost one-third of that, $90/3, which is $30. So we found the price of a single sweater, which is $30, and we can use either the first order or the second order to recover the price of a single shirt. If we use the first order, we’d see that two shirts plus the $30 sweater cost us $60, so we must have paid $30 for those two shirts, or $15 per shirt. (Once we knew that a sweater cost $30, we could also have used the second order to figure out how much one shirt cost: one shirt plus double the 30 dollars amounts to $75, so the shirt part of that order must have been $75 – $60, or $15.)
The steps involved in solving these equations, at least till the point where we nailed one of the unknown numbers, all correspond to row operations on the matrix and known vector. In a system of equations, written in either of the ways I’ve shown, you can freely multiply a row by any number you like, and freely replace a row by the sum or difference of that row and another row.
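For readers who like to see the bookkeeping spelled out, here is a minimal Python sketch of the two row operations described above. The variable names are made up for illustration; each row simply holds the number of shirts, the number of sweaters, and the total for one order.

```python
# Each order is a row: [shirts, sweaters, total cost in dollars].
order1 = [2, 1, 60]
order2 = [1, 2, 75]

# Row operation 1: multiply the second row by 2 (double the second order).
doubled = [2 * n for n in order2]

# Row operation 2: subtract the first row from the doubled second row.
# The shirts cancel, leaving only the extra sweaters and their cost.
difference = [d - a for d, a in zip(doubled, order1)]

# The extra sweaters and what they cost give the price of one sweater.
sweater = difference[2] / difference[1]

# Back-substitute into the first order: 2 shirts + 1 sweater = $60.
shirt = (order1[2] - order1[1] * sweater) / order1[0]

print(shirt, sweater)
```

The point of the sketch is only that each list operation is one of the permitted row operations, applied to the order amounts and the totals at the same time.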
Yet you may have solved the system of equations by thinking about the shirts and the sweaters differently. Is it easy to establish which costs more, a shirt or a sweater? I think it is, since I can imagine walking the first order to the cash register, seeing that it costs $60, and then putting a shirt back on the shelf and grabbing an extra sweater instead. The order now costs $75, and the extra $15 must come from the extra sweater costing $15 more than the shirt I put back on the shelf. So a sweater costs $15 more than a shirt.
This still doesn’t exhaust the ways in which you might have reasoned about shirt prices and sweater prices. You may have noticed that if you first walk the first order to the cash register and then the second, you end up with a total of 3 shirts and 3 sweaters, for which you would have paid a total of $135. From that, you might conclude that a single shirt and a single sweater must cost $135/3 or $45. The first order is like a single shirt and a single sweater – with an extra shirt. The difference between $60 and $45 must account for the single shirt. Similarly, you might have noticed that the second order is like a single shirt and a single sweater, but with an extra sweater. This extra sweater cost $75-$45 = $30, so sweaters cost $30.
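The two alternative lines of reasoning above (swapping a shirt for a sweater, and adding the two orders together) can be written down as a short Python sketch. The names are illustrative only.

```python
first_total = 60    # 2 shirts + 1 sweater
second_total = 75   # 1 shirt + 2 sweaters

# Swapping one shirt for one sweater turns the first order into the second,
# so the difference in totals is the price gap between the two items.
sweater_minus_shirt = second_total - first_total

# Both orders together hold 3 shirts and 3 sweaters,
# so one shirt plus one sweater costs a third of the combined total.
shirt_plus_sweater = (first_total + second_total) / 3

# The first order is one such pair plus an extra shirt;
# the second order is one such pair plus an extra sweater.
shirt = first_total - shirt_plus_sweater
sweater = second_total - shirt_plus_sweater

print(shirt, sweater, sweater_minus_shirt)
```

Both routes land on the same prices as the row operations did, which is a useful sanity check.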
How did Paige think about the shirts and the sweaters? The comic strip doesn’t really tell us one way or the other. What the comic strip does suggest very strongly is that Paige can think in terms of shirts and sweaters, but not in terms of x and y.
For us, there is a similar issue whenever we see a matrix. What does each row stand for? What does each column stand for? If we don’t know – if we can’t talk about the second row and the first column of the coefficient matrix as the number of shirts in the second order – then we can’t do better than talk about the number in row 2 and column 1. Just maybe, the amazing thing is that some people do not get confused.
If you have read earlier installments of this series – and I know it has gotten long – you may have noticed that very few of the matrices that appeared there have the same number of rows and columns. A matrix that does have the same number of rows as columns is known as a square matrix. Yet if you have any experience with matrices from high school or college at all, you may have noticed that almost all of those are square. It isn’t untypical for a textbook on matrix algebra to show one or two non-square matrices on the first page, and then have all remaining pages deal exclusively with square matrices. What’s so special about square matrices? Why do they end up drowning out almost all other types?
If you think back to the order matrix in part 16, you can see that for it to be square, the number of orders has to somehow be exactly the same as the number of items on the menu. That situation would be a total coincidence, and would last only until the next car shows up at the order window. It appears then that the matrices in textbooks must arise from entirely different scenarios than the ones we’ve played with till now.
What are the scenarios that underlie the matrices found in textbooks? It turns out that this is not a trivial question, or at least it doesn’t have a trivial answer. For the textbooks often don’t tell you where a matrix comes from. They may not care, or know, particularly.
My impression is that most matrices in textbooks come from two broad application areas. One, systems of equations, and two, transformations. For those broad application areas, we can justify why many – if not most – of the matrices involved would be square.
Let’s take a quick look at equations (and leave transformations for another day). An equation like 2x+y=60 has many solutions. There are lots of combinations of numbers so that double the first, added to the second, gives us 60. For example, 1 for x and 58 for y will do the trick. But so does 10 for x and 40 for y. If I think of the equation as a clue for what x and y must be, the clue isn’t powerful enough to nail down x and y. But if I have another clue, another hint, like x+2y=75, together these hints may be enough to nail down x and y precisely. Together, these clues give us what is called a system of equations, and they would normally be written in math class like this:
2x + y = 60
x + 2y = 75
By the time students get to matrix algebra in school, this same system of equations would now be written like this:
| 2  1 | | x |   | 60 |
| 1  2 | | y | = | 75 |
where the group of 2 by 2 numbers on the left is called the matrix of coefficients, the vector with the x and y in it is called the vector of unknowns, and the vector on the right is called the known vector, or the right hand side vector. This use of a matrix and vectors is consistent with the notion of matrix multiplication, but at this point the matrix and vectors are often introduced simply as a shift in the notation for the system of equations. It is shorter and more compact, especially if you go through the steps of what is known as Gaussian elimination. If the system of equations can be solved, it turns out that the solution depends on the known vector in an interesting way. This way can itself be expressed in matrix notation, using what is called an inverse matrix, and we can write
I’m skipping a bunch of steps on purpose here, including non-trivial ones like how we would find this inverse matrix in the first place. Here, my main interest is in recovering (imputing) the logic of the progression of topics and techniques in traditional textbooks of matrix algebra.
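As a rough illustration of what "using an inverse matrix" amounts to for the shirt and sweater system, here is a small Python sketch. It uses the standard closed-form inverse of a 2 by 2 matrix, which is one of the steps skipped above; the function names are made up for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix has no inverse (determinant is zero)")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(m, v):
    """Multiply a matrix by a vector: one inner product per row."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

coefficients = [[2, 1],   # first order: 2 shirts, 1 sweater
                [1, 2]]   # second order: 1 shirt, 2 sweaters
known = [60, 75]          # the totals paid

inverse = inverse_2x2(coefficients)
unknowns = mat_vec(inverse, known)   # [shirt price, sweater price]
print(unknowns)
```

Once the inverse is in hand, a different pair of totals can be priced out with a single matrix-vector multiplication, without redoing the elimination.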
If square matrices come from systems of equations, where do the systems of equations come from? In many textbooks, systems of equations are simply the starting point – they appear as if dropped from the sky. Let’s take another look at the following comic strip (we encountered it earlier in this post).
The brother in the strip can at least come up with a half-way reasonable scenario that might have given rise to the system of equations shown above. In doing so, he is at a disadvantage: he has to make something up. He’s working backwards. He is making up a “story problem” working backwards from the system of equations. You can’t blame him for coming up with something that – even though Paige can relate to it – is still kind of lame. In what real-life situation would you really know (and remember) the cost of two shirts and a sweater, as well as the cost of one shirt and two sweaters – but not remember the price of each item? If the price of a sweater and the price of a shirt are not known to us, and can be recovered only by solving a system of equations, it is only because they have been deliberately hidden from us – and what store would have any reason to do that? But many people like puzzles, and we could view this as a puzzle, the type of puzzle we would call a math puzzle.
From the systems of equations and the matrices in math class, you might never ever guess that vast amounts of money and vast amounts of computer resources are used every day all across the world to perform matrix operations – operations on square matrices even, operations above and beyond the matrix-vector inner product stuff like pricing out orders in the way we saw in the prior parts of this series.
Let me end by sketching a somewhat more realistic example of a problem where you end up with a system of equations. You test a sample of concrete that is supposed to have a certain amount of steel in it. Concrete is cheap, and steel is expensive but crucial to the strength of the concrete. Your supplier would have had incentives to skimp on the amount of steel used. You know how much pure steel weighs per cubic inch, you know how much pure concrete weighs per cubic inch, and you have measured the weight and the volume (cubic inches) of your sample. What is the composition of your sample?
The structure of the concrete/steel problem isn’t all that different from the shirt/sweater problem. It is different in that for the shirt/sweater problem you might simply call the store, or look them up on the web. In the concrete/steel problem you might go ahead and destroy the sample to look at the steel inside, and this might well be a good thing to do. Yet solving the system of equations would be a quick way to establish that there is insufficient steel in the concrete.
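To make the concrete/steel problem tangible, here is a Python sketch of the two equations involved: the volumes of steel and concrete add up to the sample volume, and their weights add up to the sample weight. The densities and the sample measurements below are rough, made-up figures chosen only for illustration, not reference values.

```python
# All numbers below are illustrative assumptions, not measured values.
STEEL_DENSITY = 0.28      # pounds per cubic inch (rough assumed figure)
CONCRETE_DENSITY = 0.08   # pounds per cubic inch (rough assumed figure)

def steel_volume(sample_volume, sample_weight):
    """Solve the system
         steel + concrete = sample_volume
         STEEL_DENSITY*steel + CONCRETE_DENSITY*concrete = sample_weight
    for the volume of steel, by eliminating the concrete volume."""
    return (sample_weight - CONCRETE_DENSITY * sample_volume) / (
        STEEL_DENSITY - CONCRETE_DENSITY)

# Hypothetical sample: 100 cubic inches weighing 9 pounds.
volume, weight = 100.0, 9.0
steel = steel_volume(volume, weight)
concrete = volume - steel

print(f"steel: {steel:.2f} cubic inches, concrete: {concrete:.2f} cubic inches")
```

The structure matches the shirt/sweater system: two unknown amounts, two known totals, and a 2 by 2 matrix of coefficients (here a row of ones and a row of densities).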
Testing the purity of drugs (whether the legal or illegal kind) can be done by similar techniques.
Do you have better examples of solving systems of equations? The criteria I’m looking for are (1) that it’s real-life, and (2) easy to state for a non-specialist audience.
In this post I will show a full-fledged matrix multiplication done in a spreadsheet, Excel. (This series starts here; two recent posts, part 15 and part 16, introduce vector inner products done in Excel.)
Above you see one approach, though not one I recommend. First, let’s see what we’ve got here. The blue section represents the menu, containing price and nutritional information for a series of items. The yellow section represents orders, the green section represents totals. As before, I ignore taxes in the total price. (Or if you prefer, the price shown is the price before sales tax). Each section is what is called a matrix: it has rows and columns, and the meaning of each row as well as the meaning of each column is clear.
The blue matrix has 8 rows of numbers, and three columns of numbers; the yellow matrix has 8 rows of numbers and 2 columns of numbers; and the green matrix has three rows of numbers, and two columns of numbers.
What works about the spreadsheet above is that columns G and H work exactly the same way. In fact, I can copy the whole column H and then right-click on column I and select “Insert Copied Cells” and I will get another column, with order amounts in yellow, and totals in green, and if I change the label from “Jerry’s order” to “Jane’s order”, and change the amounts from Jerry’s amounts to Jane’s amounts, then the totals for Jane will adjust themselves accordingly. You can see how this works by looking at the formula in G13. This formula is shown in the box right above the orange-highlighted G: it shows =SUMPRODUCT($C$4:$C$11, G$4:G$11). It calculates the inner product of the price column with the order amount column, giving the total price. This formula, when copied and pasted into H13, will land there as =SUMPRODUCT($C$4:$C$11, H$4:H$11), which means it will still reference price information, but using Jerry’s order amounts. So this is what works about the approach shown.
What doesn’t work very well about the approach shown above is that the formulas for G14 and G15 cannot be derived from the one in G13 by copy and paste. Even though we tried very hard to protect the row numbers by typing $C$4 so that when we copy the formula downward Excel won’t mess up and turn it into C5:C12, we can’t get the column designation right. Neither $C4 nor $C$4 works to get it to paste as D4. This is not surprising. Excel adjusts formulas by noticing how far over and down the target cell is from the source cell. It has no way of guessing that you wanted the columns to advance as you move the cells down. So in the spreadsheet shown above, what is typed into G14 is =SUMPRODUCT($D$4:$D$11,G$4:G$11) and in G15 we have =SUMPRODUCT($E$4:$E$11, G$4:G$11).
Below, I show a set up that fits Excel’s way of doing things better, and just maybe this is easier for human beings as well.
If you focus on the totals matrix, in green, you see that it gets its columns from the blue matrix, and its rows from the yellow matrix. The order matrix (yellow) is now 3 rows by 8 columns of numbers, the blue matrix is 8 rows by 3 columns of numbers, and the totals matrix is 3 rows by 3 columns of numbers. Yet you can see clearly that the fact that the totals matrix has the same number of rows as columns is mostly coincidence: all it takes is adding or removing a single order, and the totals matrix changes its number of rows accordingly. Conversely, if we were to add another column of, say, cholesterol data to the blue matrix, it would change the shape of the green matrix accordingly. The green matrix has the same columns (not just the number of columns) as the blue matrix, and the same rows (not just the number of rows) of the yellow matrix.
I wish I could tell you that the content to be typed into Joe’s total price cell would be =SUMPRODUCT(B13:I13,K4:K11) – but Excel lets us down here. Though Excel documentation (Office Excel 2007) suggests that SUMPRODUCT works on any two ranges of numbers as long as they are equally long, and though we’ve seen before that Excel will blithely calculate numbers without regard for whether they make any sense or not (we saw in the previous posts that Excel will gladly calculate the inner product of two orders), Excel nevertheless refuses to calculate the inner product of a row with a column using SUMPRODUCT. That is too bad, and it is a restriction that I would consider to be a bug in the program. Fortunately, you can see in the figure above that =MMULT(B13:I13,K4:K11) does the trick. MMULT, which appears to stand for matrix multiplication, not only accepts that one range is a row and the other one is a column, it seems to insist on it. We’re over the hump, though, and the road is downhill from here. In thinking through the details of =MMULT(B13:I13,K4:K11) we can see that we want to protect the columns B through I, as they indicate the range of items on the menu: they should stay the same whether we are looking at Jane’s order or Jerry’s order, or whether we are looking at totals for the price or for the calories. Similarly, we want to protect the rows 4 through 11, as they also indicate the range of items on the menu. Conversely, we don’t want to protect row 13 or column K, as these are precisely the ones that should range freely from totals cell to totals cell. This way, we end up with =MMULT($B13:$I13,K$4:K$11) as our formula for cell K13, and this formula can now indeed be copied and pasted into all the other cells of the totals matrix (green).
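Outside of Excel, the same "one order row times one menu column per totals cell" idea can be sketched in a few lines of Python. The menu and orders below are small hypothetical stand-ins (three items rather than eight), and `total_cell` is a made-up helper name, not anything Excel provides.

```python
# Hypothetical menu: one row per item; columns are price, calories, fat.
items = ["burger", "fries", "soda"]
menu = [
    [2.50, 500, 25],   # burger
    [1.50, 300, 15],   # fries
    [1.00, 150, 0],    # soda
]

# Hypothetical orders: one row per person, one column per menu item.
people = ["Joe", "Jane"]
orders = [
    [2, 1, 1],   # Joe
    [1, 0, 2],   # Jane
]

def total_cell(order_row, menu_col):
    """Inner product of one order row with one menu column,
    which is what a single cell of the totals matrix holds."""
    return sum(orders[order_row][k] * menu[k][menu_col]
               for k in range(len(items)))

# The totals matrix gets its rows from the orders and its columns from the menu.
totals = [[total_cell(r, c) for c in range(len(menu[0]))]
          for r in range(len(orders))]

for person, row in zip(people, totals):
    print(person, row)
```

The loop over `r` and `c` plays the role of copying the one formula into every cell of the green matrix: the item range stays fixed, while the order row and the menu column vary from cell to cell.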
Perhaps a bit surprisingly, the traditional treatment of matrix multiplication in textbooks matches this latter arrangement. When you multiply a matrix A times a matrix B, they would say, you get a matrix C with the same number of rows as matrix A and the same number of columns as matrix B. They would say that to be able to multiply matrix A with matrix B at all, they must be conformable (or conforming), which is a fancy way of saying that the number of columns of A must match the number of rows of B. Some people like the visual image of dropping down the left matrix below the right matrix (but keeping it on the left) so that the shape of the result matrix (and the inner products that determine the value in each of its cells) can easily be seen.
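The textbook shape rule can be checked mechanically. The following Python sketch, with a made-up `mat_mul` helper and arbitrary example numbers, refuses to multiply when the number of columns of the left matrix does not match the number of rows of the right matrix, and otherwise returns a result with the rows of the left matrix and the columns of the right one.

```python
def mat_mul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); the result is n x m."""
    cols_a, rows_b = len(a[0]), len(b)
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply: left has {cols_a} columns, right has {rows_b} rows")
    return [[sum(a[i][k] * b[k][j] for k in range(cols_a))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1, 2, 3],
     [4, 5, 6]]     # 2 rows, 3 columns
b = [[1, 0],
     [0, 1],
     [2, 2]]        # 3 rows, 2 columns

c = mat_mul(a, b)   # 2 rows (from a), 2 columns (from b)
print(c)

try:
    mat_mul(a, a)   # 3 columns against 2 rows: the shapes do not fit
except ValueError as err:
    print(err)
```

Note that this check only compares counts. It cannot tell whether the three columns of `a` and the three rows of `b` actually refer to the same three things, which is exactly the information the labels carry.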
Some people have never even seen any kind of visual image, and are stuck with remembering some formula they learned in college, like
How tough it must be to have any kind of real understanding of what you are doing and why, if all you learned was how to manipulate formulas like that.
This has become a long series, and we’re not done yet. Here is where we started, looking for natural models that give power and flavor to the mathematical idea of a vector. Since then, we’ve looked at vector inner products, matrices, matrix multiplication, and saw how they arise from relatively straightforward notions of grouping and keeping track of wholes with many parts. In the last post, I showed in an Excel spreadsheet how these ideas show up there. My starting point for this post will be pricing out an order, from the last post, but without showing extended prices:
The items are listed in the B column, the unit prices are listed in the C column, and Joe’s order is in the D column. From the amounts ordered, and the unit prices, the total for the order (ignoring taxes) is shown in the highlighted cell D12. It shows $15.10. Yet behind this number $15.10 is the formula that generated the number. This formula is shown on top, just above the highlighted “D”. The formula used is a variation of =SUMPRODUCT(C4:C11, D4:D11). This is Excel’s version of what we’ve seen as an inner product calculation. If you look closely at the formula, you’ll see that it really says =SUMPRODUCT($C4:$C11,D4:D11). This dollar symbol has a special meaning to Excel, and it has nothing to do with the fact that the result of 15.10 is a dollar amount. Rather, Excel interprets the dollar sign as protecting or freezing the column letter that follows. This is important, not for calculating the contents of the cell D12, but when we copy this formula into another cell.
In this spreadsheet, I have copied cell D12 into cell E12. (This can be done by right-click on D12, selecting Copy, then right-click on E12, selecting Paste. Alternatively, I can hit Control-C in D12, and Control-V in E12. Yet another way is to drag the bottom right hand corner of the box around D12 and extend it into E12.) Excel’s way of copying formulas is clever, in that it assumes you don’t want another identical calculation resulting in $15.10, but that you want the calculation applied to a different set of numbers. If I had typed in =SUMPRODUCT(C4:C11,D4:D11) in cell D12 and then copied it into cell E12, it would have landed there as =SUMPRODUCT(D4:D11,E4:E11). This is not quite what I want. What I want is for cell E12 to be ready to give the total price for Jane’s order. Instead, =SUMPRODUCT(D4:D11,E4:E11) would calculate the inner product of Joe’s order, and Jane’s order, a calculation for which I have no use. What I want is for Excel to automatically modify the D4:D11 in =SUMPRODUCT(C4:C11,D4:D11) to E4:E11, but to leave the C4:C11 alone. That is what the dollar symbol lets me express: =SUMPRODUCT($C4:$C11,D4:D11) tells Excel to leave the C’s alone, but the D’s should be modified based on where the formula is copied into.
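To make the copy-and-paste behavior concrete, here is a simplified Python imitation of how cell references shift when a formula is copied. It is a sketch under stated assumptions, not Excel's actual implementation: it only understands plain references such as C4, $C4, C$4 and $C$4 (no sheet names, and it would be fooled by function names that end in digits), and `copy_formula` is a made-up name.

```python
import re

def col_to_num(col):
    """Convert a column letter such as 'C' or 'AA' to a number (A = 1)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters (1 = A)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

# Optional '$', column letters, optional '$', row digits.
REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def copy_formula(formula, col_shift, row_shift):
    """Shift every unprotected column and row in the formula by the distance
    between the source cell and the target cell; '$' protects what follows it."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        if not col_lock:
            col = num_to_col(col_to_num(col) + col_shift)
        if not row_lock:
            row = str(int(row) + row_shift)
        return f"{col_lock}{col}{row_lock}{row}"
    return REFERENCE.sub(shift, formula)

# Copying from D12 to E12 is a shift of one column and zero rows.
print(copy_formula("=SUMPRODUCT(C4:C11,D4:D11)", 1, 0))
print(copy_formula("=SUMPRODUCT($C4:$C11,D4:D11)", 1, 0))
```

The first call reproduces the unwanted result described above (prices replaced by Joe's amounts), and the second shows the protected price column staying put while the order column moves over.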
In the same way you protect/freeze columns in formulas, you can also protect rows. Here is a simple example, showing currency conversion.
The cell B18 contains the all-important currency conversion rate, here 1.05812 Canadian dollars for 1 US dollar. The total amounts for Joe and Jane’s orders are shown in the C column, and the D column calculates the corresponding Canadian dollar amount. The formula in cell D19 is one I typed in; the formula in cell D20 is copied directly from cell D19. As you can see in the box above the highlighted D, the formula I typed in is =B$18*C19. This formula tells Excel two things: one, the number to show in cell D19 is what you get from multiplying the numbers in B18 and C19; two, when the formula is copied into a cell below, change the C19 accordingly, but don’t change the 18 in B18. Instead of entering =B$18*C19, I might have entered =$B$18*C19, and thus protected the reference to B18 no matter where the formula is copied, not just into a cell below. The main point here is that Excel doesn’t know or care, and will blithely calculate what you tell it to, whether doing so makes sense or not.
The way the Canadian dollar vector (D19:D20) depends on the US dollar vector (C19:C20) is a very common and important pattern: it is a vector operation called “multiplying by a scalar”. In this name, the word “scalar” refers to the single outside number 1.05812, the currency conversion rate.
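In code, multiplying a vector by a scalar is a one-liner. The Python sketch below uses the conversion rate from the text; Joe's total of $15.10 also comes from the text, while the second total is a hypothetical value added so the vector has more than one entry.

```python
rate = 1.05812                 # Canadian dollars per US dollar (from the text)
usd_totals = [15.10, 8.25]     # Joe's total from the text; the second is hypothetical

# Multiplying by a scalar: every entry of the vector is multiplied
# by the same single outside number.
cad_totals = [rate * amount for amount in usd_totals]

for usd, cad in zip(usd_totals, cad_totals):
    print(f"US ${usd:.2f} is about CA ${cad:.2f}")
```

The single `rate` plays the part of the protected cell B18: it stays the same for every entry, while the entry it is multiplied with changes from row to row.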
In the next post in this series I will show an example of matrix multiplication done in Excel.
This post is part of a series, starting here, about models and ideas that underlie the rather abstract stuff called vectors and matrices in math textbooks. In this series, we’ve looked at shopping lists, pricing out orders, weighted averages, perspective drawings, and other stuff, to introduce – and show the relevance of – the idea of vectors, matrices, vector addition, inner products, matrix multiplication. We played with different representations.
One representation I want to play with in this post is that of a spreadsheet. My examples will be restricted to Excel spreadsheets (on a PC), but they should generalize to other spreadsheet programs. Spreadsheets also play off the usefulness and ubiquity of rectangular arrangements of data, and they allow us to play with lots of numbers and yet not be caught up by having to do lots of arithmetic – the spreadsheet is good at taking that work away from us. In setting up a spreadsheet, we have to show it what computation we want performed, but then the spreadsheet program will do that computation for us, and will redo it as many times as we want, and in particular, it will redo all the computations as needed whenever we change a number.
I’m showing an order form, and the data in the order form is consistent with the example used in prior posts, of a particular fast food place with a particular menu. The part about items and price is intended to be static, the amounts ordered would change from order to order. The total (ignoring taxes) is calculated, and as soon as you change any entry in the Amount ordered column, the total will change accordingly. In this spreadsheet, this is done by entering formulas for the column called Extended Price, and another formula for the cell that contains the total. “Cell” is the name for each rectangle in the grid, and each cell has a name based on which column it is in and which row it is in. The highlighted cell has the name E4, and the content typed into this cell is not 1.50 but the formula shown right above the highlighted E column: =C4*D4.
The formula =C4*D4 means that the number that should be shown in the E4 cell is obtained from multiplying the number in the C4 cell (here 1.50) by the number in the D4 cell (here 1). The nice thing about this formula is that it works (meaning: it displays the right number for us to see) regardless of what the price in cell C4 is and regardless of the quantity ordered. Change the price in C4, and the number shown in E4 changes accordingly, automatically, without you having to do anything. Leave the price in C4 alone but change the quantity ordered in D4, and the number shown in E4 also updates automatically.
The total amount, here $15.10, in cell E12, is obtained from adding up all the extended prices above. What I typed into the cell is =SUM(E4:E11). This is convenient shorthand for =E4+E5+E6+E7+E8+E9+E10+E11, which would have worked just as well.
What I left out of this account is what goes into the cells E5, E6, …, E11. In cell E5 goes =C5*D5, etc, but typing all of these in would be a hassle, and Excel doesn’t make you do that. I can copy and paste the content of cell E4 into cell E5! When pasting “=C4*D4” into cell E5, Excel will automatically adjust the formula so that it becomes =C5*D5. In fact, I can fill all of the cells E5 to E11 in one action by copying cell E4 (control C) and then selecting (left click and hold) the entire range E5:E11 and then paste (control V).
There is one last shortcut I will introduce in this post, and that is the SUMPRODUCT construct. In the above version of the order form, extended prices are shown, and those extended prices may be useful. Yet it is quite possible to get the total price without listing the extended prices. In Excel, you can get the total directly from the price column and the amount ordered column, as follows. In the cell E12, you can type in =SUMPRODUCT(C4:C11,D4:D11), and it will display the correct total amount $15.10 even if you had put nothing at all in E4, E5, E6..E11. The SUMPRODUCT will take two ranges and compute pair-wise products and add them all up. This matches our notion of inner product, assuming all the numbers are lined up appropriately. Excel doesn’t know or care that it is multiplying hamburger prices with hamburger amounts – it assumes you know what you are doing, and would blithely multiply hamburger prices with fries amounts if you specified the ranges wrong.
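The relationship between the two routes to the total (extended prices followed by a sum, versus a direct sum of products) can be sketched in Python. The `sumproduct` function below is a made-up stand-in that mimics the behavior described above, and the prices and amounts are hypothetical; prices are kept in whole cents so the arithmetic is exact.

```python
def sumproduct(xs, ys):
    """Multiply two equally long ranges pair by pair and add up the products."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum(x * y for x, y in zip(xs, ys))

# Hypothetical order form, prices in cents.
price_cents = [150, 225, 99]   # e.g. hamburger, fries, soda
amounts = [1, 2, 3]

# Route 1: an extended price per line, then the sum of those.
extended = [p * q for p, q in zip(price_cents, amounts)]
total_via_extended = sum(extended)

# Route 2: the total directly, without listing extended prices.
total_direct = sumproduct(price_cents, amounts)

print(total_via_extended, total_direct)

# Like the spreadsheet, the function has no idea what the numbers mean:
# it will just as happily combine the amounts in the wrong order.
print(sumproduct(price_cents, list(reversed(amounts))))
```

The last line is the code version of multiplying hamburger prices with fries amounts: it runs without complaint and produces a number that means nothing.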
In a subsequent post, I’ll show matrix multiplication based on the SUMPRODUCT features in Excel.
In the prior parts of this series, I have looked at a number of problem situations from which a certain notion of grouping data and organizing data and operating on data arises straightforwardly, a way of grouping data that is well known in traditional mathematics as vectors and matrices. I have also hinted all along the way that the traditional textbook treatment of these subjects is quite different. In this post I want to make some comparisons with what’s usually taught in the later grades of high school and undergraduate curricula (specifically for science and engineering), and I have no particular illusion that this will be of interest for people who have not encountered that curriculum before – so feel free to skip this post; the next posts will return to applications and models.
In school, a vector is often identified with an arrow in three-dimensional space. It is pictured as having a size and a direction. The vector has three components, an x-component, a y-component and a z-component. An example would be the force acting on a point mass, another example would be the velocity of that point mass, and the electric field at a particular location can also be pictured as such a vector. The joint effect of two forces, often called the resultant force, can be drawn as the diagonal of a parallelogram for which the original forces form the sides. This is then called the sum of the two vectors.
A matrix is often introduced as a three-by-three arrangement of numbers that can have a variety of uses. In algebra class, a matrix may be first introduced when looking at a system of linear equations with multiple unknowns, e.g.
The system of three equations with three unknowns, shown at the top, is then rewritten by separating out the pattern of coefficients, as shown on the bottom. This pattern of coefficients is then called a matrix, the arrangement with x, y and z is then called the unknown vector, and the group of numbers 23, 11, 17 is then called the known vector, or sometimes the right hand side vector. The big things that look like parentheses are traditionally used to “hold” the numbers in the matrix, and similarly to hold numbers in a vector. Sometimes big square brackets are used instead. The matrix of coefficients is said to be multiplied with the unknown vector to give the known vector.
The matrix of coefficients is an example of a square matrix, here 3 by 3, but you can also use the same idea for a system of 2 equations with 2 unknowns, or 7 equations with 7 unknowns.
Multiplication of one matrix and another is often introduced using square matrices only, and this has some appeal because it is a closed system. When you multiply one three-by-three matrix with another three-by-three matrix, what you get is yet another three-by-three matrix. For the curriculum writers, matrix multiplication using square matrices is often their earliest opportunity to show a non-commutative system: one where multiplication is associative but not commutative, so that the order in which you multiply two square matrices matters. How you multiply two matrices, or why, that is often dealt with strictly as a recipe, as a number of steps to follow. Here is an example of such an attempt, typical in its recipe-like approach, yet unusually visual and animated in its execution. Watch the moving hands! Compared to the moving hands, a college textbook’s formula seems positively antediluvian. But even the nice moving hands don’t make clear why you are multiplying this way. Most incredible, to me, is how anybody is supposed to keep track of what all these numbers in the matrix stand for.
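A tiny numerical example shows what "associative but not commutative" means for square matrices in the traditional, unlabeled treatment. The Python sketch below uses arbitrary 2 by 2 matrices chosen only for illustration, with a made-up `mat_mul` helper.

```python
def mat_mul(a, b):
    """Row-times-column multiplication of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1, 2],
     [3, 4]]
b = [[0, 1],
     [1, 0]]
c = [[2, 0],
     [1, 1]]

ab = mat_mul(a, b)   # multiplying by b on the right swaps the columns of a
ba = mat_mul(b, a)   # multiplying by b on the left swaps the rows of a

print(ab)
print(ba)
print(ab == ba)                                            # not commutative
print(mat_mul(mat_mul(a, b), c) == mat_mul(a, mat_mul(b, c)))   # associative
```

With unlabeled square matrices both products can be computed, so nothing stops you from mixing them up; the numbers alone do not say which one was intended.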
In our approach in prior parts of this series, I indicated how matrix multiplication is symmetric, that is, commutative. This would seem to fly in the face of everything you learn in normal textbooks. There is no real conflict between the two approaches, but rather an easy way to confuse what is going on when the rows and columns of a matrix are not labeled the way I’ve suggested doing. In traditional treatments, the issue of matrix multiplication being non-commutative usually only shows up when multiplying square matrices. When you multiply a 3×5 matrix with a 5×7 matrix, you get a 3×7 matrix, but you can’t multiply a 5×7 matrix with a 3×5. In the traditional treatment, that multiplication just cannot be done. And because of that, there is rarely any confusion between matrix A times matrix B versus matrix B times matrix A, since only one of them makes sense to be done at all. When both matrices are square, then confusion can arise. In our treatment earlier, for two matrices to be multiplied, there has to be a shared edge. To have a shared edge requires more than simply having the same number of numbers: the labels have to match as well. After all, I would want to multiply the number of fries ordered with the price of fries, and not multiply the number of fries with the price of chicken. The labels help you keep it all straight.
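One way to picture the "shared edge" idea in code is to keep the labels attached to the numbers. In the Python sketch below, an order and a price list are dictionaries keyed by item name; the helper name and the prices (in cents) are hypothetical. Because matching is done by label, neither the listing order nor which operand comes first makes a difference.

```python
def labeled_inner_product(left, right):
    """Multiply entries that carry the same label and add up the products.
    The labels must match exactly: that is the shared edge."""
    if set(left) != set(right):
        raise ValueError("labels do not match, so there is no shared edge")
    return sum(left[label] * right[label] for label in left)

order = {"fries": 2, "chicken": 1}
price_cents = {"chicken": 350, "fries": 150}   # hypothetical prices, listed in another order

print(labeled_inner_product(order, price_cents))
print(labeled_inner_product(price_cents, order))   # same result either way

try:
    labeled_inner_product(order, {"fries": 150, "soda": 100})
except ValueError as err:
    print(err)
```

This is only a sketch of the inner product of two labeled vectors; the same label matching would have to be applied along the shared edge of two labeled matrices.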
In the traditional treatment, the order of the numbers in the matrix, and the separation between rows and columns become sacrosanct. Matrix multiplication is then described in terms of combining a row from the matrix on the left with a column from the matrix on the right. In the treatment we have given in the prior parts of this series, what is shown as a row and what is shown as a column, and in what order the products of the menu are listed are largely a matter of convenience. What matters is that we combine pieces of data that all relate to french fries, or pieces of data that all relate to chickenburgers.
The compactness of the traditional notation for matrices, achieved by just listing the numbers, comes at the risk of losing track of what number stands for what, and what the numbers, vectors and matrices mean.
In prior posts in this series we have seen two completely different kinds of applications for the idea of matrix multiplication: in part 11 we looked at pricing out an order, and in part 12 we looked at drawing a three-dimensional object in two dimensions using a simple form of perspective. As different as these situations are, in both we could see vectors and matrices appear rather naturally, and in both we could see inner product of vectors and matrix multiplication playing a key role. We could ask ourselves the question: how come vectors and matrices appear so naturally; and we could ask a different but similar question: how come inner products and matrix multiplication appear so naturally. They are actually quite different questions, and in this post I only intend to tackle the first one. What is so natural about vectors and matrices?
To answer that question, let’s first revisit what these things are that we’ve been calling vectors and matrices. Roughly speaking, a vector is a bunch of numbers, but not just numbers – numbers coupled with some unambiguous indication of what each number means.
If I have a pile of coins, I may notice that it consists of 3 quarters, 5 nickels, 2 dimes and 7 pennies. The numbers involved are 3, 5, 2 and 7. But unless I keep careful track of what the 3 represents and what the 2 represents, I may get confused easily. 3 quarters and 5 nickels and 2 dimes and 7 pennies is not the same as 2 quarters and 7 dimes and 3 pennies and 5 nickels, but it is the same as 2 dimes and 5 nickels and 7 pennies and 3 quarters. One way of saying that is that the numbers come in a certain denomination and that the denomination is as important as the number. There might be an objection that we should just say $1.27 and be done with it. Who cares about the makeup of that $1.27 total? Well, in some situations you might: when a parking meter takes quarters and dimes but no nickels or pennies, or when you want to give your kid a 50 cent allowance and she has no change. To use a slightly more ‘adult’ example, imagine you run a clothing store, and you decide that its current level of inventory, at $47,000, is too high. You don’t think you’d care about how many you have of each item (so you can match it with your knowledge about how fast each item is selling) rather than just knowing the total? Or let’s think about a grocery store that is running low on inventory. You think they’d simply call their suppliers and say: “bring us two trucks worth of inventory!”? Surely they’d specify how much of this they want, and how much of that.
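One way to sketch a vector as "numbers plus labels" is a Python dictionary keyed by denomination. The names `CENTS`, `pile`, `same_pile`, `other_pile` and `total_cents` are made up for this illustration; the coin counts are the ones from the paragraph above.

```python
# Value of each denomination in cents.
CENTS = {"quarter": 25, "dime": 10, "nickel": 5, "penny": 1}

pile = {"quarter": 3, "nickel": 5, "dime": 2, "penny": 7}
# Same labels and counts, just listed in a different order.
same_pile = {"dime": 2, "nickel": 5, "penny": 7, "quarter": 3}
# Same four numbers, but attached to different labels.
other_pile = {"quarter": 2, "dime": 7, "penny": 3, "nickel": 5}

def total_cents(coins):
    """Collapse a labeled coin vector into a single total."""
    return sum(count * CENTS[name] for name, count in coins.items())

print(pile == same_pile)        # True
print(pile == other_pile)       # False
print(total_cents(pile))        # 127
print(total_cents(other_pile))  # 148
```

The total of 127 cents is the $1.27 from the text; once the pile is collapsed to that single number, the information about which coins it contains is gone.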
If a vector is a bunch of numbers, a matrix is a bunch of vectors – but not just any vectors. Rather, the vectors share something, so that the bunch of vectors can be displayed in a rectangular arrangement.
The picture above, from part 11, shows an order matrix on the left, and this matrix contains the orders for Joe and Jerry. This matrix contains two order vectors, one for Joe and one for Jerry. These vectors share the same set of products from the menu, and this allows them to be joined into a matrix. You can see that this is done in part by including stuff that Jerry didn’t order. Jerry didn’t order coke, but an entry for coke (a zero entry) was included in Jerry’s order vector, and this is part of a general technique that can often be meaningfully used to join vectors into a single matrix. (In real-life applications of matrices you’ll often find matrices that are largely filled with zeros.)
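The zero-filling technique can be sketched as follows. The menu is shortened and Jerry's diet coke count is made up, so this is an illustration under assumed data rather than a copy of the picture from part 11.

```python
menu = ["hamburger", "fries", "coke", "diet coke"]

# Sparse orders: each person lists only what they actually ordered.
orders = {
    "Joe": {"hamburger": 5, "fries": 3, "coke": 3},
    "Jerry": {"fries": 1, "diet coke": 2},  # the diet coke count is an assumption
}

# Fill in explicit zeros so every order vector covers the same menu.
order_matrix = {
    person: {item: order.get(item, 0) for item in menu}
    for person, order in orders.items()
}

for person, row in order_matrix.items():
    print(person, [row[item] for item in menu])
# Joe [5, 3, 3, 0]
# Jerry [0, 1, 0, 2]
```

Once both rows cover the same list of products, they line up in a rectangle, and the zero for Jerry's coke is what makes that possible.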
Because the matrix is a rectangular arrangement, you can get a vector from a matrix by taking either a horizontal slice or a vertical slice. You could look at the order matrix and extract a vertical slice for fries, and this slice shows all the orders for fries (Joe ordered 3 and Jerry ordered 1) independently of all the other products that were ordered.
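Here is a small sketch of taking horizontal and vertical slices from a labeled matrix stored as a dictionary of rows. The helper names `row_slice` and `column_slice` are hypothetical, and the hamburger and coke counts are illustrative; only the fries counts (3 for Joe, 1 for Jerry) come from the paragraph above.

```python
order_matrix = {
    "Joe":   {"hamburger": 5, "fries": 3, "coke": 3},
    "Jerry": {"hamburger": 0, "fries": 1, "coke": 0},
}

def row_slice(matrix, person):
    """Horizontal slice: everything one person ordered."""
    return dict(matrix[person])

def column_slice(matrix, item):
    """Vertical slice: how much of one product each person ordered."""
    return {person: row[item] for person, row in matrix.items()}

print(row_slice(order_matrix, "Jerry"))     # {'hamburger': 0, 'fries': 1, 'coke': 0}
print(column_slice(order_matrix, "fries"))  # {'Joe': 3, 'Jerry': 1}
```

Either slice is itself a labeled vector, which is why the same rectangle can be read by person or by product.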
It is also possible (and common) to consider a vector as a special case of a matrix. A vector could look like a matrix with only a single row (such a vector is often called a row vector), or a vector could look like a matrix with only a single column (such a vector is often called a column vector).
Let’s look at our example of matrix multiplication above, and imagine that Joe was the only one putting in an order. In that case, the order matrix would just be Joe’s order vector, and the totals matrix would similarly be restricted to Joe’s totals vector. We could say that a row vector multiplied by a matrix gives us a row vector as a result.
Starting out again from the matrix multiplication shown above, we could imagine that we no longer cared about calorie and sodium information, and erased that from the menu information matrix. Correspondingly, the totals matrix would no longer carry calorie and sodium totals, and would be reduced to a total price vector. We could say that a matrix multiplied by a column vector gives us a column vector.
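Both special cases can be sketched with labeled dictionaries. All numbers below are illustrative assumptions (prices are in cents to keep the arithmetic exact), and the helpers `inner` and `column` are hypothetical names, not something defined earlier in this series.

```python
def inner(u, v):
    """Inner product of two labeled vectors that share the same labels."""
    return sum(u[label] * v[label] for label in u)

def column(matrix, label):
    """Vertical slice of a labeled matrix stored as {row: {column: value}}."""
    return {row: cells[label] for row, cells in matrix.items()}

# Illustrative data only.
order_matrix = {
    "Joe":   {"hamburger": 5, "fries": 3, "coke": 3},
    "Jerry": {"hamburger": 0, "fries": 1, "coke": 0},
}
menu_info = {
    "hamburger": {"price_cents": 100, "calories": 250},
    "fries":     {"price_cents": 150, "calories": 300},
    "coke":      {"price_cents": 120, "calories": 140},
}

# Row vector (Joe's order) times the menu matrix gives a row vector of Joe's totals.
joe_totals = {info: inner(order_matrix["Joe"], column(menu_info, info))
              for info in ("price_cents", "calories")}

# Order matrix times a column vector (prices only) gives a column vector of totals.
prices = column(menu_info, "price_cents")
price_totals = {person: inner(order, prices) for person, order in order_matrix.items()}

print(joe_totals)    # {'price_cents': 1310, 'calories': 2570}
print(price_totals)  # {'Joe': 1310, 'Jerry': 150}
```

In both cases the result keeps the labels that were not part of the shared list of products: the menu columns in the first case, the people in the second.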
Vectors and matrices show up naturally in spreadsheets as well as in database tables. In addition, if you fill out any standard form, the collection of forms filled out by multiple people will yield a matrix. There is no need for magic or terror when looking at matrices – they are a rather simple organizing tool for data. Basic operations on matrices like addition and multiplication are relatively straightforward too, as we’ve seen in the example of pricing out an order. It’s too bad that textbook treatments so often obscure the underlying simplicity of the ideas.
After the work we’ve done in prior parts of this series, e.g. part 11, to show vector inner products and matrix multiplication using an example of pricing out an order at a fast food place, I’d like to work through an example that looks very different at first: an example of a perspective drawing.
If we first focus on the top drawing, we could come up with a likely story about what the drawing represents. I think of it as a picture of four cubes, stacked in such a way that one cube isn’t really visible. The sides of the cubes are 1 unit in each direction. I’ve drawn axes and scales to correspond to this. The x-axis points towards the right, the z-axis points straight up, and the y-axis points away from us, out of the paper, so to speak. The point labeled P is 2 units to the right, 1 unit back, and 1 unit up, and so has x,y,z coordinates of 2,1,1, which in standard notation is given as P(2,1,1). The point Q(0,2,2) isn’t marked, but you should be able to locate it: it’s on the top cube, on top, all the way in the back, and on the left.
Using our vector notation, we could indicate the information we have about P and Q as follows:
We can show the information about P and Q separately, as is done on the left, or combine them into a single matrix as is done on the right.
If we now focus on the bottom drawing, we note that the picture, as such, is identical to the one on top. But this one we’re going to interpret entirely as a two-dimensional picture. And in a way, it really is! It’s a picture of squares and parallelograms, all drawn on a single plane, the plane of the paper, or the plane of the screen. We can also use a pair of coordinate axes for this drawing, and I’ve indicated such a pair, though you may note I’ve used u and v rather than the more standard x and y. In this picture, the point P could be given as P(2.5,1.5), or P and Q could be given in matrix notation as follows:
There is a relationship between the x,y,z values and the u,v values, and this relationship is given by the way the perspective works. The perspective I’ve used in the drawing is not the one you learn in art class, with horizons and points where parallel lines meet in the distance. The perspective here is the one you see (with some variations) in engineering drawings. These variations are sometimes called oblique perspective, or cavalier perspective, or cabinet perspective. I’m not sure where this one fits in, I think it is closest to cavalier perspective. (Where I grew up, it was called “engineer’s perspective.”) Regardless, an essential feature is that lines that are parallel in space will remain parallel in the perspective drawing, and that equal distances along parallel lines in space will remain equal distances in the perspective drawing (though equal distances in different directions in space may not show as equal distances in the perspective drawing).
The relationship between x,y,z and u,v can itself be represented as a matrix:
and we can call this matrix the projection matrix (textbooks will usually call this the transformation matrix). One way to view this relationship is to look at the rows. The row for x indicates that one unit of x contributes a unit to u and nothing to v. The row for y indicates that one unit of y contributes half a unit to u and half a unit to v. The row for z indicates that one unit of z contributes nothing to u and one unit to v. We can determine these values by moving one unit in the direction of x and seeing what that does to u and v, and then repeating this for y and z. Or we could see this by looking at the marker “1” on the x axis and writing down its coordinates in the u,v system. Since the x axis and the u axis are kind of the same, it makes sense to have x be one u and no v. Similarly, the z axis and the v axis are the same. The trickier one is the y axis, and we can see that the “1” marker on the y axis corresponds to (.5, .5) in the (u,v) grid.
The relationship between x,y,z and u,v for any particular point can now be expressed as a matrix multiplication:
To the points P and Q you can see that we’ve added points O and R. O is the origin of the axes, and R is the point that all four cubes have in common. On the top right we’ve shown the projection matrix which connects x,y,z and u,v; on the bottom left we have the coordinates of the four points in the x,y,z grid; on the bottom right we have the coordinates of the four points in the u,v grid. And, guess what, the matrix on the bottom right is the matrix product of the other two matrices. For example, the u coordinate of Q is found by multiplying the x coordinate of Q by 1, the y coordinate of Q by .5 and the z coordinate of Q by 0, and then adding up all these products. This gives us (0 × 1) + (2 × .5) + (2 × 0) = 1. Similarly, the v coordinate of Q is found as (0 × 0) + (2 × .5) + (2 × 1) = 3. Each number in the bottom right matrix is found as the inner product of the row on the left and the column on top.
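The projection can be checked with a short sketch in plain Python. The projection values are the ones read off the rows for x, y and z above; the function name `project` is made up, and R is left out because its coordinates are not stated in the text.

```python
# Each row says how one unit along x, y or z contributes to u and v.
projection = {
    "x": {"u": 1.0, "v": 0.0},
    "y": {"u": 0.5, "v": 0.5},
    "z": {"u": 0.0, "v": 1.0},
}

points = {
    "O": {"x": 0, "y": 0, "z": 0},
    "P": {"x": 2, "y": 1, "z": 1},
    "Q": {"x": 0, "y": 2, "z": 2},
}

def project(point):
    """Inner product of a point's x,y,z row with each column (u, v) of the projection."""
    return {
        target: sum(point[axis] * projection[axis][target] for axis in point)
        for target in ("u", "v")
    }

for name, point in points.items():
    print(name, project(point))
# O {'u': 0.0, 'v': 0.0}
# P {'u': 2.5, 'v': 1.5}
# Q {'u': 1.0, 'v': 3.0}
```

The values for P and Q agree with P(2.5,1.5) and with the worked arithmetic for Q in the text.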
If you wonder what the practical use of this might be – you don’t have to look very far. Almost any recent computer game you care to examine will spend a lot of its resources figuring out where to put a pixel on your two-dimensional screen, a pixel that represents a point in three-dimensional space. Even though the game’s perspective may include horizons and vanishing points, and even though it also needs to worry about color gradients and shading, the process of taking a simulated three-dimensional scene and rendering it on a two-dimensional screen is done through vast amounts of matrix multiplications that are essentially of the type shown above. The numbers in the projection matrix will be different, depending on the location and position of the simulated camera, but ultimately it still boils down to matrix multiplication. Before computers (specifically, the computer graphics cards) could handle massive amounts of matrix multiplication in real time and in high resolution, games were restricted to what were called side scrolling games.
What we’ve seen here is that both pricing orders and projecting three-dimensional scenes on a two-dimensional screen have something surprising in common: a relationship between numbers in the matrices that goes beyond the particular labels that indicate what these numbers mean. This core pattern must be important in some way.
Let’s look some more at matrix multiplication, which we introduced in part 10 of this series. We showed three rectangular arrangements, and called each a matrix, and called one the product of the other two. (The accepted plural of matrix happens to be ‘matrices’.) Here they are again, in a slightly different form:
Each matrix is a rectangular arrangement of numbers, where the labels for rows and columns indicate what these numbers stand for. The matrix on the left is the order matrix, and it shows that Joe ordered 5 hamburgers. Jerry ordered 1 fries and no hamburgers. The matrix in the middle is the menu information matrix and it shows that one order of coke contains 10 mg of sodium, and one order of diet coke contains 20 mg of sodium. The matrix on the right is the totals matrix, and it shows that Joe’s order came to $15.10 (ignoring taxes) and Jerry’s order contained 1460 mg of sodium altogether. Matrix multiplication is the thing that gives us the matrix on the right from the two matrices on the left.
It is now time to stand back a bit and look at what is going on in a way that depends less on the particular situation of pricing out an order. The two matrices that are being multiplied have different shapes, but they aren’t wholly independent. They share an edge, so to speak, an edge that allows them to be folded into part of a box. In our example above, the shared edge is the list of products on the menu, cheeseburger, chickenburger, etc. Both matrices have exactly that edge as one of their dimensions (whether row or column). The resulting (product) matrix shares one edge with the left matrix, and one edge with the middle matrix, and all three matrices would fold together to make a 3-dimensional box. Each entry in the resulting matrix is obtained from an inner product of two vectors, one in each of the original matrices. In our example, the total price for Joe is obtained from the inner product of the Joe order vector with the menu price vector. In our picture, the Joe order vector is the top row of the order matrix, and the menu price vector is the left column of the menu information matrix.
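The shared-edge requirement for a single entry can be sketched as an inner product that refuses to combine vectors whose labels differ. The function name `inner_product` and all quantities and prices (in cents) are assumptions made for this illustration.

```python
def inner_product(u, v):
    """Sum of products over a shared edge; the labels must match exactly."""
    if set(u) != set(v):
        raise ValueError("no shared edge: the labels differ")
    return sum(u[label] * v[label] for label in u)

# Illustrative data, prices in cents.
joe_order = {"hamburger": 5, "fries": 3, "coke": 3}
menu_price = {"hamburger": 100, "fries": 150, "coke": 120}

joe_total = inner_product(joe_order, menu_price)
print(joe_total)  # 1310

# A price vector for a different list of products does not share the edge.
try:
    inner_product(joe_order, {"hamburger": 100, "fries": 150, "chickenburger": 200})
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)  # no shared edge: the labels differ
```

Having the same number of entries is not enough here: the second call fails even though both vectors have three numbers.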
If you have taught matrix multiplication before, you may be interested to notice that in the setup we’ve shown here, matrix multiplication is entirely commutative. This may be counterintuitive, since the normal way of introducing matrix multiplication has it be non-commutative. The difference, as you can see, is that the way each row and column is labeled allows you to keep track of what number means what.
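One possible way to sketch label-driven multiplication in Python is shown here. It is an illustration of the idea, not a reconstruction of the pictures in this series: the helpers `labeled` and `multiply`, the axis names, and most of the numbers are assumptions (only the 10 mg of sodium for coke comes from the text). Every cell is stored together with the labels of both of its edges, so the product can find the shared edge by name instead of by position.

```python
def labeled(row_axis, col_axis, table):
    """Store each cell under its (axis, label) pairs, sorted by axis name."""
    return {
        tuple(sorted([(row_axis, r), (col_axis, c)])): value
        for r, cells in table.items()
        for c, value in cells.items()
    }

def multiply(a, b):
    """Combine cells whose labels agree on the one shared axis, summing the products."""
    result = {}
    for key_a, val_a in a.items():
        for key_b, val_b in b.items():
            labels_a, labels_b = dict(key_a), dict(key_b)
            shared = set(labels_a) & set(labels_b)
            if len(shared) != 1:
                raise ValueError("the matrices must share exactly one edge")
            (axis,) = shared
            if labels_a[axis] != labels_b[axis]:
                continue  # for example fries against coke: nothing to combine
            merged = {**labels_a, **labels_b}
            del merged[axis]
            out_key = tuple(sorted(merged.items()))
            result[out_key] = result.get(out_key, 0) + val_a * val_b
    return result

# Illustrative data; prices in cents, fries sodium is made up.
orders = labeled("person", "product", {
    "Joe":   {"fries": 3, "coke": 3},
    "Jerry": {"fries": 1, "coke": 0},
})
menu = labeled("product", "info", {
    "fries": {"price_cents": 150, "sodium_mg": 200},
    "coke":  {"price_cents": 120, "sodium_mg": 10},
})

totals = multiply(orders, menu)
print(totals[(("info", "price_cents"), ("person", "Joe"))])  # 810
print(multiply(menu, orders) == totals)                      # True
```

Because the labels, not the positions, decide which numbers meet, swapping the two arguments gives the same totals in this sketch.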
If you have never before seen matrix multiplication, you may wonder what the big deal is. What we’ve done seems pretty straightforward, right? You can simply look at the labels and track what is going on. The total price for Joe’s order ($15.10) clearly comes from what he pays for cheeseburgers (1 × $1.50) plus what he pays for coke (3 × $1.20) plus what he pays for fries (3 × $1.50), etc. For you, I recommend you pay particular attention to the fact that the pattern of numerical relationships is the same for each of the positions in the totals matrix: each derives from the inner product of a row of the order matrix and a column of the menu information matrix, and this pattern itself is independent of the label shown above the column or left of the row. It is because this numerical pattern (sum of products) shows up so often in real-life situations that it is worthwhile giving it a special name (inner product) and worth it to pay particular attention to all the various places that it shows up. We’ll do more of that – in a next post.
Chances are that if I asked you to add 1756678 and 99810023 together without using a calculator, you would have a way to accomplish that. This is already pretty interesting, if you think about it, in that it is likely you have never before added those two particular numbers in your life. If I asked you to check your result on a calculator, you would know how to do that, and it is quite likely you have additional ways to discover the result. Some of those ways may be to ask an authority figure; but even short of relying on authority figures – whether teachers, parents, older siblings, buddies, librarians, spouses, accountants – you really do have alternative ways to discover the result. In fact, there are many ways even without resorting to counting out 1756678 blocks and then counting out another 99810023 blocks, throwing them all in a big pile and then counting how many blocks are in this joint pile. And this is a good thing because counting that many blocks may feel like a prison sentence rather than anything you’d remotely want to be involved with.
The idea of something being discoverable is fairly simple, though I haven’t heard the word used much in school settings. The idea applies to mathematics but not only to mathematics. If I want to find out if salt water freezes at a lower temperature than plain water, I can discover this; that is, I can try it out. I don’t need to rely on Google or textbooks or teachers or any other authority. The question about salt water might be resolved as simply as taking an ice cube out of the freezer, sprinkling some salt on it, and seeing if it melts faster than another ice cube you take out at the same time but without salt on it. There might be other and better ways to discover the effect of salt on the freezing point of water, sure – but it isn’t the kind of thing that relies on having the right person with the right credentials and the right magic wand performing the right magic invocation.
There are lots of things in mathematics as well as other fields that are not discoverable. Some authority, early in your life, told you what a “two” looks like, it looks like this: “2”. That is a convention, and this one happens to be a world-wide convention. Somebody in authority, early in your life, told you that to add things we use the “+” symbol. That too is a convention. These are conventions with a long pedigree and very widespread adoption. Some conventions are recent, and some have local significance but not global significance and acceptance. In much of the world, unlike the USA, the “,” symbol is used to separate whole from decimal part; and the “.” is used to mark thousands and millions etc. – just the reverse of what is used in the USA, Britain, Australia, and other countries. This means, for example, that the meaning of “7,040” is not discoverable outside of a cultural setting. In the USA, it clearly means seven thousand and forty; where in most of Europe, the same 7,040 would mean seven and 40 thousandths. For “7.040” the same would hold but in reverse.
When you look at a map, you know that the top of the map corresponds to North. This is a convention, and not discoverable. A kid is not stupid for not being able to figure it out. The question “which way (on this map) is North?” is a question that tests knowledge of a convention, of a cultural legacy being handed down.
When you ask a student what the square root of 9 is, things are a little more complicated. It may be that the student has never heard or seen the term “square root” or has forgotten it. In that situation, the answer to the question about the square root of 9 is not discoverable. But if the student knows what the term “square root” refers to, then the job of figuring out what number, when multiplied by itself, gives 9, that part is discoverable. It doesn’t depend on any authority to tell or confirm that the answer is correct.
Discovery, and following conventions, those two are entirely different beasts. Nothing bad about either, but confusing the two can lead to considerable mischief.
When a student discovers something in mathematics, it bolsters something really important. It fosters the idea not only that mathematics can make sense, but that “sense making” is the very essence of mathematical thinking. It raises the confidence of the student that he or she can make sense of the mathematics, that he or she is capable, that he or she can figure this out. (Note that by “discovery” I don’t mean something that’s never been seen before by anybody. For example, noticing for oneself that adding two even numbers together will always produce another even number, that would count as having made a discovery – very different from being able to reproduce something that one has been told to memorize.)
When a student learns something that is part of the mathematical heritage, it opens up participation in a community. A student who learns how to take a pair of numbers and plot it as a point on a graph with axes at ninety degree angles and regularly-spaced markings on each axis (an ordinary Cartesian graph, in other words), this student now has access to a new community and a new world, in essentially the same way that learning to read opens up a new world and a new community. If a child were to learn written language on his own, writing with a toe in the sand, inventing his own private written language, that would be an extraordinary feat, but even so it wouldn’t give this kid access to Dr. Seuss or Harry Potter or any other piece of our joint heritage.
Though not everything in math is either a convention or something to be discovered, it is nevertheless very eye-opening to sort out for oneself whether something I’m teaching is a convention or a discovery. This is not a particularly hard thing to do, and yet so often we leave it befuddled. Am I inducting somebody into a society of people who share a rich common legacy and heritage? Or am I enabling somebody to figure out something powerfully for themselves, so they are left with a real sense of having made a real discovery for themselves? Do I find myself doing only one of these while hardly ever doing the other?

















