Odlyzko's Lemma

Published: 07/26/2020

I fell of the blogging wagon the past two months partly because I felt like everything was terrible and I didn't have much to say about that and partly because I was preparing for candidacy. Nothing has really changed on the first front except my increasing acclimation. I've now completed candidacy and while it's definitely more of a formality than a test that one could reasonably fail there was something pretty stressful at staring my ignorance in the face and collecting the things I've learned in the last couple years and the things I hope to accomplish in the next couple into one document.

To celebrate I thought I'd write up a small thing from my field. I tried to make it understandable but I don't think it'll make sense to anyone whose not already comfortable with linear algebra [1] and probability [2]. Odlyzko's Lemma is a very useful estimate in random matrix theory. In this blog post I'll give its statement and proof along with a simple corollary and a little bit of context.

Odlyzko himself is a very impressive figure. He worked at Bell Labs (where everyone cool seemed to be) for 26 years and is now at the University of Minnesota. You can see his list of publications here. Let me just highlight a few things:

The lemma is the following.

Lemma: Let \(X\) be a random vector of length \(n\) with \(\pm 1\) entries with independent and even probability. Let \(V\) be any subspace in \(R^n\) of codimension \(d\). Then,

\[\mathbb{P}(X\in V) \leq \frac{1}{2^d}.\]

Note that \(R\) can be any commutative ring in which \(-1\neq 1\). For our purposes you might as well assume it is the real numbers. Though for me it is very often a finite field.

Proof. Let \(v_1,\dots,v_{n-d}\) be linearly independent vectors which span \(V\). Write them as the columns of a matrix,

\[\begin{bmatrix} v_1 & \dots & v_{n-d}\end{bmatrix} = \begin{bmatrix} v_{1,1} & \dots & v_{n-d,1} \\ \vdots & & \vdots \\v_{1,n} & \dots & v_{n-d,n} \end{bmatrix}\].

By assumption that \(V\) has codimension \(d\) this matrix has a full rank minor so without loss of generality we can assume the first \(n-d\) rows are linearly independent. Restricting to the first \(n-d\) coordinates of our vector space we therefore have unique coefficients \(c_1\dots c_{n-d}\) such that \(\big[c_1 v_1+\dots+c_{n-d}v_{n-d}\big]_{[n-d]} = \big[X\big]_{[n-d]}\). In order for \(X\in V\) we need \(d\) more equalities given by the bottom \(d\) rows in the following matrix equation.

\[\begin{bmatrix} v_1 & \dots & v_{n-d}\end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_{n-d}\end{bmatrix} = X.\]

Because \(X_i\) is no given value with probability greater than \(\frac{1}{2}\) we get the desired bound.

The same proof technique applies to more general \(X\). If instead of \(X\) having \(\pm 1\) entries \(X\) simply has entries which aren't equal to any specific value with probability greater than \(\alpha\) we get the bound \(\alpha^d\). We also don't need our entries to be independent. We just need the entries to remain unconcentrated after conditioning on any value of the other entries.

Why is this lemma useful? Through its iterated application one can show that for fixed \(\epsilon\) an \(n\times (1-\epsilon)n\) (tall and skinny, but not that tall and skinny) \(\pm 1\)-matrix is full rank with high probability. Why is that important? As a first step in showing that square \(\pm 1\) matrices are nonsingular with high probability. Why is that important? Many physical systems are best understood as random matrices. For instance the Hamiltonian in Quantum Mechanics and hidden layers in Neural Nets. Amazingly the Wigner Semicircular Law was first observed experimentally. Though I myself don't understand well what physical system encoded a matrix or why the eigenvalues of that matrix were physically significant.

I'll close with a corollary of the lemma (is there any more bad math writing cliche than putting a corollary after a lemma? Where was the Theorem?). This corollary is a first step in showing a square \(\pm 1\) matrix is nonsingular with high probability. One observes the first couple columns are linearly independent with high probability. One then needs more involved techniques to expose the remaining columns. How involved depends on how tight a bound you're shooting for. Perhaps later I'll follow up with a blog post on this.

Corollary: Fix \(\epsilon>0\). With probability at least \(1-n2^{-\epsilon n}\) an \(n\times (1-\epsilon)n\) matrix with independent \(\pm 1\) entries is full rank.

Proof. If the matrix is not full rank then there exists a column \(X_i\) which lies in the span of the previous \(i-1\) columns. The codimension of those columns is at least \(\epsilon n\)so by Odlyzko's Lemma the probability \(X_i\) lies in the span of the preceding columns is less than \(2^{-\epsilon n}\). We can take a union bound over the \((1-\epsilon)n\) columns where the rank may drop and throw out a factor of \(1-\epsilon\) to get the desired bound.

[1] If you want to learn more about linear algebra I imagine 3blue1brown's series is best. Full disclosure I haven't watched this particular series. But I've watched some of Grant Sanderson's other stuff and I'm sure he wouldn't let me down here.

[2] The best way to understand probability better is just to play a lot of mtg and poker.