The Philosophy of Linear Algebra (2021) (sigfyg.medium.com)
106 points by pengstrom on June 12, 2022 | 33 comments



> While nowadays most people learn linear algebra through the perspective of vectors and transformations, the matrix and determinant are much older than the ideas of vectors in a multi-dimensional space.

Determinants, yes. Matrices, no. Matrices were originally invented by Arthur Cayley as a formalism for describing linear transformations.

Mathematicians making historical claims ought to cite their sources. Mathematicians should also define what their terms mean, otherwise it's not clear what they're saying. I'm defining matrices as a certain number system which supports operations called "plus" and "times" (for the pedants, a preadditive category).

[edit]

Secondary sources: If we look here [1], matrices were invented in parallel by Sylvester and Cayley. Cayley invented the notion of a linear transformation first (1845), and in parallel Sylvester developed a notion of "matrix" to study determinants (1850), and then Cayley finally combined the two ideas to form the modern matrix algebra (1858). Therefore the claim that matrices are older than their modern perspective in terms of vectors and transformations isn't true.

[1] - https://www.quora.com/Why-and-who-invented-matrices


I agree with prior comments about the article (not great). But I love LA, and will indulge a bit.

For me, the most amusing feature of LA is how quickly results and tools escalate from trivial and obvious to surprising and hard. LA is often introduced as slick notation to express systems of linear equations. Gaussian elimination and matrix inverses (when they exist) are natural evolutions of this, but not what I consider surprising.

Matrix multiplication is an elementary next step, but it is not at all clear why one would multiply matrices. My first real mathematical surprise (I was in junior high at the time) was a result of matrix multiplication: there is a matrix A of real values that satisfies A @ A = -I. (The matrix is [[0,1],[-1,0]]). I was aware of complex numbers, but until that moment, I didn't know you could get "minus identity" without first artificially introducing the idea of "square root of -1". But this shows one can construct "square root of -1" without introducing anything like that.
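A quick numpy sanity check of that identity (just a sketch; A is the matrix from the comment above):

    import numpy as np

    A = np.array([[0, 1],
                  [-1, 0]])
    # A @ A is [[-1, 0], [0, -1]], i.e. minus the identity matrix
    print(np.array_equal(A @ A, -np.eye(2, dtype=int)))   # True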

When one gets to determinants, characteristic polynomials, eigenstructure, fixed subspaces, projection operators, SVD, orthogonal transformations, and so-on, the work really begins. That determinants are multiplicative is really wonderful, but not at all obvious. SVD in full generality was only shown in 1936 and modern algorithms emerged in the 1950's to compute it [1].

Finally, check out the LDU decomposition of the Walsh matrix [2].

[1] https://en.wikipedia.org/wiki/Singular_value_decomposition#H...

[2] https://en.wikipedia.org/wiki/Walsh_matrix


> it is not at all clear why one would multiply matrices

I disagree. Matrix multiplication is the composition of linear maps. AB is the transformation you get by applying B, then applying A to the result of that. Furthermore this provides a great intuition for why matrix multiplication is not, in general, commutative. If your professor didn't make this clear when you learned the definition of matrix multiplication, then they didn't do a great job.
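Both points are easy to see numerically (a minimal sketch; the matrices A and B here are arbitrary examples):

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])    # some linear map A
    B = np.array([[0., 1.], [1., 1.]])    # some linear map B
    v = np.array([5., 7.])

    # (A B) v is "apply B first, then A"
    print(np.allclose((A @ B) @ v, A @ (B @ v)))   # True

    # but the order of composition matters, so A B != B A in general
    print(np.allclose(A @ B, B @ A))               # False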

>That determinants are multiplicative is really wonderful, but not at all obvious

Again, I think it is kind of obvious. The determinant of a map measures the degree to which it alters the area/volume/measure of the unit cube. So if A doubles it and B triples it, then BA should scale it by a factor of 6, since matrix multiplication is the composition of maps, and you're tripling the size of something that was already doubled.
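A quick numerical check of that scaling intuition (a sketch; the particular matrices are made up):

    import numpy as np

    A = np.array([[2., 0.], [0., 1.]])    # doubles area: det(A) = 2
    B = np.array([[3., 1.], [0., 1.]])    # triples area: det(B) = 3

    # B A doubles, then triples: the total scale factor is 6
    print(np.linalg.det(B @ A))                              # ~6.0
    print(np.isclose(np.linalg.det(B @ A),
                     np.linalg.det(B) * np.linalg.det(A)))   # True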

This is why in a first course on linear algebra, it's more important to get the intuition and logical structure down than learn advanced matrix decompositions. Now your complex number example is a little less obvious. I actually think using the 2x2 matrix representation of complex numbers is the best way to introduce them because it sidesteps the part where you take roots of negative numbers, which is the most conceptually problematic part for many people and usually requires some kind of handwave.
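The representation in question is a + bi <-> [[a, -b], [b, a]]; matrix arithmetic then reproduces complex arithmetic exactly. A minimal sketch (the helper as_matrix is just for illustration):

    import numpy as np

    def as_matrix(z):
        # a + bi  ->  [[a, -b], [b, a]]
        a, b = z.real, z.imag
        return np.array([[a, -b], [b, a]])

    z, w = 1 + 2j, 3 - 1j
    # multiplying the matrices matches multiplying the complex numbers
    print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))   # True
    # and "i" itself is a real 2x2 matrix whose square is minus the identity
    print(np.allclose(as_matrix(1j) @ as_matrix(1j), -np.eye(2)))       # True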


> Matrix multiplication is the composition of linear maps.

This is an insight due to Cayley (see @ogogmad's comment, [1], and [2]).

> If your professor...

My professor provided historical context.

[1] https://abel.math.harvard.edu/~knill/history/matrix/index.ht...

[2] https://ia802808.us.archive.org/0/items/philtrans05474612/05...


> I love LA

Note: this quote originates from some early publications by amateur mathematician Randy Newman.


I only recently realized that multiplying a matrix with a column vector is equivalent to a linear combination of the vector's columns. This explains the importance of matrix multiplication since 95% of Linear Algebra is calculating linear combinations of vectors.


Indeed, and this leads to another important interpretation: matrix multiplication is function evaluation.

Arbitrary functions which take in vectors and output vectors can be very complex and thus difficult to reason about. A useful simplifying assumption is that of linearity: that f applied to a linear combination is just a linear combination of f applied to each piece of the combination separately. Linear algebra, broadly speaking, is the study of functions of this kind and the properties that emerge from making the linearity assumption.

It turns out that, if we assume a function f is linear, all of the information about that function is contained in what it does to a set of basis vectors. We can in essence "encode" the function by a table of numbers (a matrix), where the kth column contains the result of f applied to the kth basis vector. In this way, given a basis, any linear transformation f has a matrix A which compactly represents it.

Since f is linear, to compute f(v) I could write v in my chosen basis then apply f to each basis vector and recombine. Alternatively, I could write the matrix A representing f in that basis, and then multiply Av. The two are equivalent: that is, Av = f(v). And so matrix-vector multiplication is "just" evaluating the function f.
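A minimal numpy sketch of that encoding (the function f below is an arbitrary linear map picked for illustration):

    import numpy as np

    def f(v):
        # some linear function R^3 -> R^2, written without any matrix
        x, y, z = v
        return np.array([2*x + y, y - 3*z])

    # encode f: the k-th column is f applied to the k-th standard basis vector
    basis = np.eye(3)
    A = np.column_stack([f(e) for e in basis])

    v = np.array([1., 4., -2.])
    # matrix-vector multiplication is "just" evaluating f
    print(np.allclose(A @ v, f(v)))   # True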


> vector's columns

Matrix columns, you mean. I think it is Strang who is a staunch proponent of teaching this to students early on. It is indeed helpful to realize that the matrix can be seen as a collection of the images of the basis vectors under the linear transformation the matrix represents.
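In numpy terms (a quick sketch with an arbitrary matrix): A @ x is the combination of the columns of A weighted by the entries of x.

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [4., 0., 1.]])
    x = np.array([2., -1., 5.])

    # x[0]*column0 + x[1]*column1 + x[2]*column2
    combo = sum(x[k] * A[:, k] for k in range(3))
    print(np.allclose(A @ x, combo))   # True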


Thanks for the correction. Too late to fix it though. Yes, teaching it like this would instantly eliminate all the mystery around it.


Articles like this help scale mathematical knowledge to run on a greater number of minds, so helpful, and thank you to the OP! While practical skills and competence are the necessary conditions for understanding maths that can only come from practicing and doing, we also need a bigger funnel of people with intuition and curiosity so as to move more people through the pipeline of curiosity, to competence, to mastery, to innovation. I hope as a culture we start to encourage more treatments of difficult topics like this article.


A philosopher's tools are paper and pencil. A mathematician's tools are paper, pencil, and wastebasket. There is no "philosophy" of linear algebra.

A vector space is an abelian group with a scalar multiplication that satisfies distributive laws. Unlike most other mathematical objects, vector spaces can be classified by a single quantity: their dimension. Every vector space is isomorphic to a space of functions from a set S to the underlying field F (all of F^S = {v:S -> F} when S is finite, the finitely supported functions when it is not); its dimension is the cardinality of S. A linear transformation is a function preserving the linear structure. A matrix is a representation of a linear transformation using a particular basis. Matrix multiplication is composition of linear transformations.

When an aficionado told Whistler, "There are two great painters in the world, you and Velazquez," he replied, "Why drag Velazquez into this?" Why drag philosophy into this? Just buckle down and learn the math, as everyone before you did as a matter of course.


Linear algebra isn't about matrices. And I heavily disagree with the last paragraph.

>Computers certainly helped accelerate the development of linear algebra and vice-versa.

Computers have helped with matrix analysis given the floating point constraints inherent to computers. But linear algebra has been in a pretty solid place as a theory since Halmos published "Finite Dimensional Vector Spaces" in 1942.


> We share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.

~ Irving Kaplansky


Ha! True enough.

It's long been joked that much of pure math research amounts to reducing really hard problems to a calculus or linear algebra computation.

When I first learned differential geometry, I was enamored with all the stuff we could do without coordinates. Then I learned Riemannian geometry and felt like I had been duped.


For an alternative to Linear Algebra, there's also Clifford Algebra (also known as Geometric Algebra). It makes rotations much more intuitive, among other things. The only drawback I found is that the current Python libraries seem to struggle with very high dimensionality (which is common in ML).


These articles remind me that one should look in the right places for the right respective knowledge.

That article is not at all an exposition of what the Philosophy of Linear Algebra is.

>While nowadays most people learn linear algebra through the perspective of vectors and transformations, the matrix and determinant are much older than the ideas of vectors in a multi-dimensional space.

That is not true. Anyway, what is the article about? According to the title it's about linear algebra. Then, the basic objects are vectors, and the interesting objects of discussion are maps. Some maps can be represented by matrices. That's it.

Yet, the article begins by describing matrices. Is it like those fiction books that randomize the order of the chapters to make them more interesting?

Or is the article supposed to be about matrices in general, not about linear algebra? (Niche) uses of matrices outside of linear algebra do exist.

>An upper triangular matrix conveys a whole lot more than just the fact that the numbers form a triangle; it tells you that the matrix is describing a nonsingular system that is solvable.

That is not true in general.

If the matrix is an upper triangular matrix AND all the coefficients in the diagonal are non-zero, THEN it's regular (so invertible, so non-singular).
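For a triangular matrix the determinant is just the product of the diagonal entries, so the condition is easy to check numerically (a sketch with made-up matrices):

    import numpy as np

    U_good = np.array([[2., 5., 1.],
                       [0., 3., 4.],
                       [0., 0., 7.]])   # nonzero diagonal
    U_bad  = np.array([[2., 5., 1.],
                       [0., 0., 4.],
                       [0., 0., 7.]])   # zero on the diagonal

    print(np.linalg.det(U_good))          # 2*3*7 = 42, so invertible
    print(np.linalg.det(U_bad))           # 0, so singular
    print(np.linalg.matrix_rank(U_bad))   # 2 < 3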

But philosophically, WHY does it do that?

The answer is that an upper triangular matrix means you are already halfway through Gaussian elimination: instead of finishing the inversion, you can read the solution off by back substitution, which is a lot less work. The row operations that get you to triangular form (swapping rows and adding multiples of one row to another) change the determinant by at most a sign, so a nonzero diagonal in the triangular form means the original determinant is nonzero and the system is solvable, and you're in the clear.
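That finishing step is back substitution: solve for the last unknown from the last row, then walk upwards. A minimal sketch, assuming U is upper triangular with a nonzero diagonal (the helper name is made up):

    import numpy as np

    def back_substitute(U, b):
        # solve U x = b for upper triangular U with nonzero diagonal
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    U = np.array([[2., 5., 1.],
                  [0., 3., 4.],
                  [0., 0., 7.]])
    b = np.array([9., 10., 14.])
    print(back_substitute(U, b))          # matches np.linalg.solve(U, b)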

That is slightly interesting from a philosophical standpoint.

>The diagonal matrix describes a system with elements that only operate on themselves,

It just means the elements don't mix the different axes of the multiplicand in the result. Coordinate along axis one in the result = coordinate along axis one multiplied by diagonal element one. And so on, for two, three.
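In numpy terms (a tiny sketch): a diagonal matrix just scales each coordinate separately.

    import numpy as np

    D = np.diag([2., 3., 5.])
    v = np.array([1., 10., 100.])
    # each coordinate of the result is the matching coordinate of v
    # times the matching diagonal entry; no mixing between axes
    print(D @ v)                                             # [  2.  30. 500.]
    print(np.allclose(D @ v, np.array([2., 3., 5.]) * v))    # True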

>symmetric matrix tells the story of a system that is bi-directional

Argh. It just means that the eigenvalues are all real and that you can choose eigenvectors that are all orthogonal to each other. It's nice to have these properties (for example because it's intuitive to have orthogonal coordinate axes), so symmetric matrices are often used for calculations.

Also, eigenvectors and eigenvalues are interesting because they give you a better basis for your calculation (literally).

If you choose your coordinate axes wrong, it will tell you the "better" ones (the one intrinsic to the problem--so it's much easier to calculate things there).
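Numerically, numpy's eigh (which is meant for symmetric/Hermitian matrices) shows all of this at once; a sketch with an arbitrary symmetric matrix S:

    import numpy as np

    S = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])              # symmetric

    eigenvalues, Q = np.linalg.eigh(S)
    print(eigenvalues)                        # all real
    print(np.allclose(Q.T @ Q, np.eye(3)))    # True: the eigenvectors are orthonormal
    print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, S))   # True: S in its "better" basis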

To summarize, to save others the work: that article is not about the philosophy of linear algebra. It's just showing a small example with matrices and not explaining any WHYs in the process. Where it does explain things, it's wrong multiple times.

----

Maybe a small real Philosophy of Linear Algebra here:

The real Philosophy of Linear Algebra is that there are a lot of systems that are linear, so it's nice to have a complete mathematical theory for systems of linear equations. That's it. It's not magical, it's not general enough (cough, non-linear systems), and (looked at right) it's simple.

Linear algebra's objects of discussion are vectors and maps.

What vectors are, everyone learned in school. Suffice to say vectors are any objects which can be added together and multiplied by a scalar, they form a commutative group with respect to addition, and scalar multiplication distributes over addition in the usual way (like for real numbers). Notably, there's no division! This is not a complete definition, but you get the idea.

(It is also useful, but not necessary, to introduce a so-called "inner" product, which takes two vectors and gives back a scalar. Nevermind!)

In any case, you can then have equations with vectors in them. Like 5 v⃗ = w⃗, where v⃗, w⃗ are vectors.

This works like any other system of equations: there are not enough equations for those two unknowns v⃗ and w⃗.

So what about this system of equations: { 5 v⃗ = w⃗, 5 v⃗ = w⃗ } ? That's silly you say?

Ok then, what about { 5 v⃗ = w⃗, 10 v⃗ = 2 w⃗ } ? Still doesn't work?

So what's the rule here? Turns out it has to do with those upper triangular matrices.

A system that has no interesting solution (that is not 0⃗):

   5 v⃗ = w⃗
   5 v⃗ = 2 w⃗

   w⃗ = 2 w⃗
   0⃗ = 2 w⃗ - w⃗
   0⃗ = w⃗
Booring.
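To make the rule concrete with numbers (a sketch that treats v⃗ and w⃗ as one-dimensional for illustration): collect the coefficients into a matrix and look at its rank; after elimination to triangular form, a zero lands on the diagonal exactly in the redundant case.

    import numpy as np

    # coefficients of the unknowns (v, w), with each equation written as a*v + b*w = 0
    M_redundant = np.array([[5., -1.],     # 5 v -   w = 0
                            [10., -2.]])   # 10 v - 2 w = 0  (just twice the first row)
    M_only_zero = np.array([[5., -1.],     # 5 v -   w = 0
                            [5., -2.]])    # 5 v - 2 w = 0

    # rank < number of unknowns  <=>  non-trivial solutions exist
    print(np.linalg.matrix_rank(M_redundant))   # 1 -> a whole line of solutions
    print(np.linalg.matrix_rank(M_only_zero))   # 2 -> only v = w = 0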

Anyway, let's introduce functions on vectors, so-called "maps" (here, f and g are maps).

   f(v⃗) := 5 v⃗
   g(v⃗) := (5/2) v⃗

   f(v⃗) = w⃗
   g(v⃗) = w⃗
THAT is the basic philosophy of linear algebra.

Now for some more practical stuff:

It's annoying to have those abstract v⃗ and w⃗ be around. In a lot of cases, you are only interested in coordinates with respect to some basis (the basis is a set of axes that you choose--usually your axes of measurement. That is the axes your measurement devices measure along). It can be shown that the number of basis vectors you need for that (to represent any abstract vector!) is equal to the dimension of the vector space, usually 3.

Let the basis be {b⃗_1, b⃗_2, b⃗_3}. These basis vectors obviously must be independent, so b⃗_1 := b⃗_2 is out because that's cheating. b⃗_1 := 2 b⃗_2 ? Out. b⃗_1 := 2 b⃗_2 + 3 b⃗_3 ? Stop it already.

Now you can find numbers q, r, s such that v⃗ = q b⃗_1 + r b⃗_2 + s b⃗_3.

If you want to be more conservative with your letter consumption, it is better to call q, r, and s: v^1, v^2, v^3 instead (it's just notation, and the ^1, ^2, and ^3 are part of the name for now).

So

   v⃗ = v^1 b⃗_1 + v^2 b⃗_2 + v^3 b⃗_3.
   w⃗ = w^1 b⃗_1 + w^2 b⃗_2 + w^3 b⃗_3.
Or in non-annoying notation

   v⃗ = ∑_i v^i b⃗_i
   w⃗ = ∑_i w^i b⃗_i
Or in even less annoying notation, let the sum be implied when an index repeats in a term

   v⃗ = v^i b⃗_i
   w⃗ = w^i b⃗_i
Ahhh. Much better.

If f is a LINEAR map, then you can do this (by the definition of linearity, which you have to look up):

   f(∑_i v^i b⃗_i) = ∑_i v^i f(b⃗_i)
So in order to specify the linear map f completely, you only need {_i f(b⃗_i) }, i.e. how f acts on each individual basis vector. That's it.

So for f and g in non-annoying notation:

  f(v^i b⃗_i) = v^i f(b⃗_i)
  g(v^i b⃗_i) = v^i g(b⃗_i)
Also,

  f(v⃗) = w⃗ = w^i b⃗_i
  g(v⃗) = w⃗ = w^i b⃗_i
So

  f(v^i b⃗_i) = v^i f(b⃗_i) = w^i b⃗_i
  g(v^i b⃗_i) = v^i g(b⃗_i) = w^i b⃗_i
So

  v^i f(b⃗_i) = w^i b⃗_i
  v^i g(b⃗_i) = w^i b⃗_i
And NOW we can go to matrices.

Those b⃗_i are annoying, and the f(b⃗_i) are even more annoying.

So let's get rid of them.

Let c⃗_i := f(b⃗_i). Let d⃗_i := g(b⃗_i).

Now we'd like all those abstract basis vectors to disappear. But c⃗_i and b⃗_i are different bases (if they are bases at all--neeevermind for now :P), so we'd like these to be specified only in terms of the b⃗_i. Let c⃗_i = c^j_i b⃗_j

   v^i c⃗_i = w^i b⃗_i
   v^i c^j_i b⃗_j = w^i b⃗_i
   v^i c^j_i b⃗_j = w^i δ^j_i b⃗_j

   c^j_i v^i = w^j

   c := (_i (_j c^j_i)) That's the first matrix!
   v := (_i v^i)        That's a column "vector"--a matrix
   w := (_i w^i)        That's a column "vector"--a matrix
Note: δ^j_i := {1 if i=j, otherwise 0}

   c v = w
Likewise for the g, with the d⃗_i killed and replaced by the b⃗_i.

And now we are at the philosophically completely uninteresting result. Good job.
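For what it's worth, the index bookkeeping above (c^j_i v^i = w^j) can be checked numerically with einsum. A minimal sketch where f is an arbitrary linear map on R^3 and the b⃗_i are the standard basis:

    import numpy as np

    f = lambda v: np.array([[1., 2., 0.],
                            [0., 1., 3.],
                            [4., 0., 1.]]) @ v    # some linear map, picked arbitrarily

    basis = np.eye(3)                             # b_1, b_2, b_3
    # column i of c holds the coordinates of f(b_i), i.e. the c^j_i
    c = np.column_stack([f(b) for b in basis])

    v_coords = np.array([2., -1., 4.])            # the v^i
    w_coords = np.einsum('ji,i->j', c, v_coords)  # c^j_i v^i = w^j
    print(np.allclose(w_coords, f(v_coords)))     # True: "c v = w"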


> uses of matrices outside of linear algebra do exist.

What do you have in mind here?


Matrices are "just" 2D arrays - no need to invert a minesweeper map, for example.


Off the top of my head, adjacency matrices in graph theory.


> Interestingly enough, linear algebra only exposed its deepest secrets to us in the 20th century, despite having existed for many centuries.

I think this becomes less surprising when phrased as: the deepest knowledge about linear algebra that we have so far is that gained most recently—at which point it sounds almost tautological. There's no reason to think that we've plumbed all the mysteries of linear algebra, no matter how much more we know (especially about the computational side of it) than we used to.


Not tautological, I'd say. It is entirely reasonable to say, about a field, that its deepest mysteries have been elucidated centuries ago. One might be mistaken about that, sure. But it is a sensible and non-tautological statement to make.

(Having said that, the article struck me as fairly vacuous.)


> It is entirely reasonable to say, about a field, that its deepest mysteries have been elucidated centuries ago.

I can imagine this being true of a field that is no longer studied, although I can't think of any good example—the only unsatisfactory one that occurred to me is penmanship, but (a) that's not exactly a scientific discipline and (b) there are probably still people out there making strides in penmanship—but I find it hard to believe of any living discipline. In fact, I would argue that, if it is true of a discipline that its deepest mysteries were discovered in the past, then that discipline is effectively dead.


The pure side also: spectral sequences, quiver representations, etc.


Spectral sequences (and sheaves) were developed by Leray in the 1940's while he was a prisoner of war captured by the Nazi army. Not exactly new.


Yep, and they're still an active research topic, eg.

https://arxiv.org/abs/2107.02130


This reminds me of the discussions in the book "Linear Algebra Done Right" by Sheldon Axler. He has a beautiful series of videos on YouTube that can serve as a companion to the book.


Linear algebra seems easy because the majority of problems are nonlinear and, as such, intractable except when you approximate them with linear problems.


Okay but somebody please explain to me the reason for prefix operator column-major notation.

I digress…

>> linear algebra doesn’t exist simply because humanity managed to concoct a system that coincidentally finds usage in many seemingly unrelated subjects

I’m not a stranger to this field, but this article doesn’t make this case at all, and I’m not sure it can be made. Linear algebra is fundamentally a set of notation rules that people built upon and found ways to represent their problems with. Its ubiquity in many fields is no more surprising than the observation that many different genres of literature are written in English, and somehow nobody is confused enough to think that flight instruction manuals are the domain of English rather than the range of it. There is no practical limit to the number of academic papers one can write which awkwardly contort a simple intuitive computation into an indecipherable representation in linear algebra. If you want the concept of linear algebra to consume all linearity that exists in the world, then fine, whatever, but then the notation sucks. You may call a continuous Fourier transform a linear operation, but try writing down the values of that matrix.

No; there’s nothing fundamentally tying the inner workings of the world to the mechanics of linear algebra, but there has been considerable effort to adapt some narrowly-chosen set of mechanics to represent the world, and some subset of those things work out well, so that’s what we focus on. But this is just path dependency. A different set of rules for writing linear relationships could have worked out better, or worse, but it would have to get there regardless. I expect that the current system will disappear soon, as the world shifts to talking about computational tensors, and more flexible rules about how to apply operations. Who says you can’t multiply a row by a row? That’s stupid; we can and should do better. Is the determinant magical, or is it just an arbitrary series of shift-product-sum operations which is far more applicable in the general sense? Do we really need a whole matrix to describe a permutation?

Linearity may be considered well-described when there is no longer any need for a separate course to teach the notation and tricks behind it. When I was in grad school, one of my favorite books was Gene Golub’s “Matrix Computations”. It really is an impressive book from an incredible mind, like a book on chess positions written by Bobby Fischer. Regardless, it was hard for me to go through it without wondering why such a thing exists. Much like chess, the human mind can become enamored with any simple set of rules that can be extended to sufficient complexity. Faced with a real problem, sometimes it helps me to get to a solution, and sometimes it distracts me from the solution. I can never be sure which is more common.


From chicmath (https://www.ocf.berkeley.edu/~abhishek/chicmath.htm), Halmos' Finite dimensional vector spaces:

> But this book was where I first learned about tensor products, and why the matrix elements go the way they do and not the other way (Halmos is very careful on this point).

I haven't read the book, but it sounds like the answer may be there.


It’s just an arbitrary convention based on historical inertia.

My impression/speculation is that prefixing functions comes from the historical process of turning prose into notation. You start with “sine of a” and that eventually becomes “sin a”, etc. The idea of a generic f(x) came later, in the early 18th century.

The spatial organization of written matrix notation is directly lifted from the way people wrote systems of simultaneous linear equations. Just take the variables out and put all of the numbers in a grid.


I should make a linear algebra YouTube video. It would probably make me $1 million given how much interest there seems to be in the subject.

The subject is poorly taught imho. It's used for almost everything and yet most instruction focuses on matrix and vector operations without tying them to other things.


Ermmm, Gilbert Strang has already done that and it's free; no one will want to pay you $1 million...


I think it is implied that the views on YouTube will generate the revenue...

It's not as though there are no YouTube millionaires, after all.



