Category Archives: Math

Half-baked: systematic, elementary “Russian” proof of the hook length formula (HLF)

This is mostly a condensed version of the corner-recursive proof of HLF from in “Russian notation” (as opposed to English or French notation; IMO this clarifies the proof), with some commentary. (See also for what may be a generalization, though I’m not familiar with it myself.)

To-do: compare with representation theory proof.

Link for self: Stanley’s generating function proof (Wikipedia mentions “reverse plane partitions”—the point seems to be that by considering more general labelings than 1,2,\dots,|\lambda|, one can extract enough structure for a nice generating function), Theorem 17 of

hook length blog picture

Russian notation, with the slopes of the rim edges (up or down) labeled as in the diagram for \lambda = 7+7+4+4+2+0+0+\dots, clarifies the additive-combinatorial nature of hook lengths: hook lengths are given by the differences between the half-integers indexing the upward and downward “ends” of a hook (see also Section 2 of for exposition on application to so-called “core partitions”). From cursory googling it seems considering “negative hook lengths” along these lines can be useful (see

Anyways, given a partition \lambda, it is natural in the context of hooks (illuminates factorial-like structure) and HLF (in view of out-corner-removal recursion) to keep track of the “in-corners” y_0,\dots,y_m and “out-corners” x_1,\dots,x_m ordered as a sequence y_0<x_0<y_1<x_1<\dots<x_m<y_m (see diagram). For instance, the product of hook lengths is \prod_{1\le i\le j\le m} \prod_{u+0.5\in (x_j,y_j)}\prod_{v+0.5\in (y_{i-1},x_i)} (u-v). (Note that this formula holds for any sequence y_0 \le x_0\le y_1\le x_1\le \dots\le x_m\le y_m whose x_i contain all out-corners and y_j contain all in-corners.) Removing an out-corner essentially corresponds to replacing some < x_i < with the sub-sequence \le x_i -1 < x_i < x_i + 1 \le. (We have an additional relation \sum x_i = \sum y_j from the placement of the vertex of the partition at the origin, but this is unimportant as our problem is invariant under horizontal translation.)

Idle question for self: Is there a nice generating function in these coordinates? Connection to core partition coordinates?

Now, we induct/recurse based on the following observation: any labeling (i.e. standard Young tableau on \lambda) has maximal number n = |\lambda| on some “out-corner” of \lambda. (The out-corners can also be thought of as “rim hooks” of length 1, where “rim hooks” are hooks whose removal leaves a smaller partition remaining.)

Thus, by looking at the set of hook lengths under removal of out-corners, it suffices, after some telescoping cancellation (note that every hook either changes by length 0 or 1, and most change by 0, i.e. stay the same; from core partitions perspective, think of corners as 1-rim-hooks), to prove that |\lambda| = -\sum_{\ell=1}^{m} \prod_{j=0}^{m} (x_\ell - y_j) \prod_{j\ne \ell}(x_\ell - x_j)^{-1}.

However, by decomposing into rectangular cells as in the diagram, one has |\lambda| = \sum_{1\le i\le j\le m} (y_j - x_j)(x_i - y_{i-1}). (This is quadratic in the position coordinates x_i,y_j, as expected.)

It is, however, true algebraically that \sum_{\ell=1}^{m} \prod_{j=0}^{m} (x_\ell - y_j) \prod_{j\ne \ell}(x_\ell - x_j)^{-1} = \sum_{1\le i\le j\le m} (x_j - y_j)(x_i - y_{i-1}). This follows from the general identity \sum_{i=1}^{m} P(x_i)\prod_{j\ne i}(x_i - x_j) = [X^{m-1}]R(X) for any polynomial P, where R is the remainder P\mod{\prod_{i=1}^{m} (X - x_i)} (a possibly-zero polynomial of degree at most m-1), which itself follows from either finite differences with arbitrary (in particular, not necessarily equal) steps—induct on m—or Lagrange interpolation—look at the “leading” (i.e. X^{m-1}) coefficient of \sum_i P(x_i) \prod_{j\ne i}(X - x_j)(x_i - x_j)^{-1} = R(X)—or partial fractions—decompose \frac{1}{R(X)}, and compare power series of both sides. (This is, incidentally, related to the theory of the different ideal in algebraic number theory: see Problem 1 at for instance.)

For the application, just take P(X) = \prod_{j=0}^{m} (X - y_j), whose quotient mod \prod_{i=1}^{m} (X-x_i) is X - (\sum y_j - \sum x_i); then [X^{m-1}]R(X) is easily computed by Vieta’s formulas, as desired.


Mini: Towers (of extensions) that are Galois

This is fairly random, but I thought I might as well write it down. In general it seems hard to say when a tower of Galois extensions is itself Galois, but sometimes one can…

Classification of finitely generated modules over Dedekind domains, with and without projective modules, and reconciliation of approaches

… as promised in my Quora answer here. Also, here’s a PDF version of the post: Classification of finitely generated modules over Dedekind domains, with and without projective modules, and reconciliation of approaches.

In this post (specifically in Section 4) we give a proof of the classification of finitely generated (f.g.) modules {M} over a Dedekind domain {A} avoiding the notion of projective modules. (Instead we rely on pure submodules, which give a slightly more transparent/natural/direct approach to the classification problem at hand, at the expense of the greater generality afforded by the projective module approach. Along the way we also explain why these approaches are really not so different after all.)

We gradually build up from the more familiar special cases when {A} is a principal ideal domain (PID) or even just a field. Along the way we discuss some relevant abstractions, such as the splitting lemma (Section 2).

I would like to thank Profs. Yifeng Liu and Bjorn Poonen for teaching me in 18.705 and 18.785, respectively, this past (Fall 2014) semester at MIT. Thanks also to Alison Miller for catching a typo in Exercise 3.

Continue reading Classification of finitely generated modules over Dedekind domains, with and without projective modules, and reconciliation of approaches

Sylvester’s law of inertia, from the spectral theorem

Here we prove Sylvester’s law of inertia. Let {A} be a real symmetric matrix, and assume the spectral theorem.

Take an arbitrary orthogonal basis with respect to the form {A(v,w) = X^tAY}, and scale (and optionally reorder) so that there are {(n_+,n_-,n_0)} entries of {+1,-1,0}. (This can also be shown using the spectral theorem, but I think it’s overkill and conceptually messier (we’re “mixing up” linear transformation matrices and bilinear form matrices).)

So first we show {n_+,n_-,n_0} are unique (i.e. invariant under change of basis). This is not too hard, since our basis {v_1,\ldots,v_n} is quite nice. The maximum dimension of a positive-definite set is {n_+}, or else we would get a linearly independent set of at least {(n_+ + 1) + n_-+n_0 = n+1} vectors. (More precisely, this would force a nontrivial intersection between a dimension-({\ge n_+ + 1}) positive-definite space and a dimension-{(n_-+n_0)} negative-semi-definite space, which is clearly absurd.) Similarly, the maximum dimension of a negative-definite set is {n_-}.

Now that we have uniqueness, we move on to the eigenvalues and principal minor determinant interpretations. By the spectral theorem (for real symmetric matrices), the fact that symmetric matrices have real eigenvalues, and uniqueness of Sylvester form, {A} has {n_+} positive eigenvalues, {n_-} negative eigenvalues, and {n_0} zero eigenvalues.

Continue reading Sylvester’s law of inertia, from the spectral theorem

Non-inductive proof of the spectral theorem (for normal matrices)

Here we present a non-inductive proof of the spectral theorem for normal matrices (which doesn’t use, for instance, proposition 8.6.4 in Artin’s Algebra). (But it does seem to be the same as the proof in Herstein’s Topics in algebra.) It is motivated by a similar direct proof (presented in my class) for Hermitian operators with {n} distinct eigenvalues.

We work in matrix form, so we need to prove that there exists an orthonormal basis of {\mathbb{C}^n} consisting of eigenvectors of a normal matrix {A}.

Continue reading Non-inductive proof of the spectral theorem (for normal matrices)

Galois theory basics, part 2

From the previous post, we know that (in characteristic {0}), if {[K:F]} is finite, then (by the primitive element theorem to get some {\alpha} to satisfy hypotheses of previous explanation) {G(K/F) = G(K/K^{G(K/F)})} and

\displaystyle [K:K^{G(K/F)}] = \lvert{G(K/F)}\rvert = \lvert{G(K/K^{G(K/F)})}\rvert

(so {K} is a splitting field over/Galois over {K^{G(K/F)}}) equals the degree of the “splitting part” of min poly of {\alpha}.

So naturally, we wonder if we can say anything starting from the perspective of groups, i.e. “can {G(K/F)} be replaced by an arbitrary (finite) group of automorphisms (of {K}) {H}?”

Perhaps not surprisingly, yes!

Continue reading Galois theory basics, part 2

Galois theory basics, part 1

I was recently looking at algebraic proofs of FTA, and decided to review some Galois theory. So here’s some notes/an attempt to motivate Galois theory (perhaps up to or a bit before fundamental theorem of Galois theory), from the perspective of someone who much prefers field theory/polynomials to group theory/automorphisms (although maybe I should just stop being stubborn).

(Random side note: in the context of abstract algebra, polynomials are naturally motivated by “minimal (free) ring extensions”: if we have a commutative ring {R}, then {R[x]} is the smallest ring containing {R} with the additional element {x} satisfying no relations. On the other hand, any constraints/extra relations would be polynomials, so at least for monic polynomials {f} we get the quotient construction {R[x]/(f)}.)

Suppose {K = F(\alpha)} is a singly-generated field extension (by primitive element theorem this is broader than it seems). If {f} is the minimal polynomial of {\alpha} of degree {d}, then let’s look at how it splits/factors in {K[x]}. If {f} has some set of roots {r\le n} of roots lying in {K} (the “splitting part”/linear factors of {f} in {K[x]}), say {z_1 = \alpha, z_2, \ldots, z_r}.

Generally, the main object of Galois theory seems to be splitting fields (or if we’re lazy, algebraic closures), but I’m still fighting through the material myself so I don’t fully appreciate/communicate this myself. Perhaps the point is just that it’s much easier to work concretely with roots (in algebraic closures) than directly with irreducible polynomials. (For example, we can then work with symmetric sums of the roots, and generally draw lots of intuition from number fields.)

We’ll work in characteristic {0} for convenience, so e.g. the {z_i} are pairwise distinct.

1. “Symmetry” of the {z_i}: crux of Galois theory, and introducing Galois groups

The key is that {K = F(z_1) = F(z_2) = \cdots = F(z_r)} (recall {\alpha = z_1} by definition), since the {z_i} share minimal polynomials {f \in F[x]}.

To make this symmetry precise, we phrase things in terms of {F}-automorphisms of {K}; each {F}-automorphism fixes coefficients of {f}, hence is uniquely determined by sending {\alpha \rightarrow z_i}. Thus they form a Galois group {G := G(K/F)} of size {r}, since we easily check the automorphisms to be bijections of {K}.

Continue reading Galois theory basics, part 1

Weierstrass approximation theorem

In this post, I describe some thoughts on (the proofs and related ideas behind) the Weierstrass approximation theorem.

1. Revisited/filtered thoughts after first trying to prove it myself

First instinct is to use Lagrange interpolation. Runge’s phenomenon says equally spaced nodes are bad for this. More generally even smarter things like Chebyshev nodes are bad. See comments here for some intuition: high degree means greater oscillations in between nodes, as we’ve only controlled nodes perfectly and it’s thus hard to bound stuff between nodes. (On the other hand, I don’t see good intuition a priori why something like Chebyshev nodes shouldn’t work, it’s just that it’s more plausible that it won’t work than a “smoother/more-averaged-out” approximation. In fact the Wikipedia says all absolutely continuous guys are good with Chebyshev so… .)

Continue reading Weierstrass approximation theorem