Studying over the past couple of days has gradually clarified my understanding of linear algebra. In this post I summarize the key new material.
When facing matrices and singular values, one should establish the following understanding:
✅ A matrix is a spatial operator.
✅ Singular value decomposition helps you break down the essence of a matrix: rotation → stretching → rotation.
✅ The ordering of the singular values by size tells you in which directions the matrix truly acts strongly and in which directions it has almost no effect.
1. Orthogonal Matrix#
The core definition of an orthogonal matrix
A real $n \times n$ matrix $Q$ is called an orthogonal matrix if it satisfies
$$Q^\top Q = Q Q^\top = I_n.$$
Here, $Q^\top$ is the transpose of $Q$, and $I_n$ is the $n$-dimensional identity matrix.
The inverse is the transpose: $Q^{-1} = Q^\top$.
This simplifies calculations and ensures numerical stability.
An orthogonal matrix is a real matrix that "preserves the inner product"—it rotates or flips the coordinate system but never stretches or distorts it.
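To see these properties concretely, here is a minimal PyTorch sketch using an assumed 2-D rotation matrix (my own example, not from the notes above): it checks $Q^\top Q = I$, $Q^{-1} = Q^\top$, and length preservation.

```python
import math
import torch

# Assumed example: a 2-D rotation matrix is orthogonal.
theta = math.pi / 4
Q = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])

print(torch.allclose(Q.T @ Q, torch.eye(2), atol=1e-6))      # True: Q^T Q = I
print(torch.allclose(Q.T, torch.linalg.inv(Q), atol=1e-6))   # True: Q^{-1} = Q^T

# Rotation preserves length: ||Qv|| == ||v||
v = torch.tensor([3.0, 4.0])
print((Q @ v).norm(), v.norm())                              # both 5.0
```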
2. Angle Brackets#
Here $\langle \cdot,\cdot \rangle$ is the symbol for the "inner product." In the most common case, the real vector space $\mathbb{R}^n$, it is equivalent to the dot product we are familiar with:
$$\langle u, v \rangle = u^\top v = \sum_{i=1}^{n} u_i v_i.$$
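A quick sketch of this equivalence in PyTorch, with two assumed example vectors:

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])

print(torch.dot(u, v))   # 32.0: the inner product <u, v>
print((u * v).sum())     # 32.0: same thing, the sum of elementwise products
```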
3. Matrix Position Exchange#
- Eliminate a factor on the left
  - Suppose $A$ sits on the left of $X$, i.e. $AX = B$ with $A$ invertible.
  - Multiply both sides by its inverse from the left: $A^{-1}AX = A^{-1}B \;\Rightarrow\; X = A^{-1}B$.
  - Note:
    - You must multiply from the left on both sides consistently;
    - Do not multiply from the right (that would disrupt the order of multiplication).
- Eliminate a factor on the right
  - Suppose $B$ sits on the right of $X$, i.e. $XB = C$ with $B$ invertible.
  - Multiply both sides by its inverse from the right: $XBB^{-1} = CB^{-1} \;\Rightarrow\; X = CB^{-1}$.
  - If $B$ is an orthogonal matrix, $B^{-1} = B^\top$, thus $X = CB^\top$.
Why can't the order be reversed?
- Matrix multiplication is not commutative; once you multiply on the wrong side, the new factor lands in a different position of the product and no longer cancels the one you wanted to remove.
- The same operation must be applied to the same side of both sides of the equation for the equality to hold.
- This is essentially the same as the order of function composition or coordinate transformations: which transformation acts first and which acts second must be written in the corresponding positions of the product and cannot be arbitrarily exchanged.
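A small PyTorch sketch of the left/right rule, with assumed matrices $A$ and $B$: multiplying by $A^{-1}$ on the correct (left) side recovers $X$ from $AX = B$, while multiplying on the wrong side does not.

```python
import torch

A = torch.tensor([[2.0, 1.0],
                  [1.0, 3.0]])
B = torch.tensor([[5.0, 4.0],
                  [6.0, 2.0]])

X = torch.linalg.inv(A) @ B                        # correct: X = A^{-1} B
print(torch.allclose(A @ X, B, atol=1e-5))         # True

X_wrong = B @ torch.linalg.inv(A)                  # multiplying on the wrong (right) side
print(torch.allclose(A @ X_wrong, B, atol=1e-5))   # False in general: A and B do not commute
```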
4. Similar Diagonalizable Matrix#
A similar diagonalizable matrix (commonly referred to as a "diagonalizable matrix") means:
There exists an invertible matrix $P$ such that
$$P^{-1} A P = \Lambda,$$
where $\Lambda$ is a diagonal matrix.
At this point, we say can be diagonalized through a similarity transformation, or simply is diagonalizable.
The "mechanical process" of diagonalization
- Find the eigenvalues: solve $\det(A - \lambda I) = 0$.
- Find the eigenvectors: for each $\lambda_i$, solve $(A - \lambda_i I)v = 0$.
- Assemble $P$: form the matrix $P = [\,v_1 \; v_2 \; \cdots \; v_n\,]$ by stacking the linearly independent eigenvectors as columns.
- Obtain $\Lambda$: fill the corresponding eigenvalues into the diagonal: $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Then we have $A = P \Lambda P^{-1}$.
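A minimal numerical sketch of this process, using an assumed $2 \times 2$ matrix and `torch.linalg.eig`:

```python
import torch

# Assumed diagonalizable matrix with eigenvalues 5 and 2.
A = torch.tensor([[4.0, 1.0],
                  [2.0, 3.0]])

eigvals, P = torch.linalg.eig(A)    # eigenvalues and eigenvectors (complex dtype)
Lam = torch.diag(eigvals)           # Lambda: eigenvalues on the diagonal

A_rebuilt = P @ Lam @ torch.linalg.inv(P)
print(torch.allclose(A_rebuilt.real, A, atol=1e-5))   # True: A = P Lambda P^{-1}
```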
5. Singular Value Decomposition#
| Symbol | Meaning |
|---|---|
| $A$ | Given real symmetric matrix ($A = A^\top$) |
| $Q$ | Orthogonal matrix: $Q^\top Q = I$, with orthonormal (unit-length, mutually orthogonal) column vectors |
| $\Lambda$ | Diagonal matrix: $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ |

The expression $A = Q \Lambda Q^\top$ is called orthogonal similarity diagonalization; geometrically, it means "rotate (or mirror) the coordinate system → $A$ only retains independent stretching."
- Why can "real symmetric matrices always be orthogonally diagonalized"?
Spectral Theorem:
For any real symmetric matrix $A$, there exists an orthogonal matrix $Q$ such that $Q^\top A Q = \Lambda$ is diagonal, and the diagonal elements are the eigenvalues of $A$.
- Real eigenvalues: Symmetry guarantees all eigenvalues are real.
- Orthogonal eigenvectors: If $\lambda_i \neq \lambda_j$, the corresponding eigenvectors must be orthogonal.
- Repeated roots can also take orthogonal bases: The same eigenvalue may correspond to multiple vectors; in this case, perform Gram–Schmidt in the subspace they span.
- Step-by-step textual analysis
| Step | Explanation |
|---|---|
| 1. Find all eigenvalues and eigenvectors of $A$ | Solve $\det(A - \lambda I) = 0$ to obtain all $\lambda_i$; for each $\lambda_i$ solve $(A - \lambda_i I)v = 0$ to find the eigenvectors. |
| 2. Arrange the eigenvalues in a chosen order on the diagonal to obtain the diagonal matrix $\Lambda$ | For example, arrange them in ascending order as $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. The order is not important as long as it is consistent with the order of the column vectors later. |
| 3. Eigenvectors corresponding to different eigenvalues are orthogonal; for repeated eigenvalues, use Gram–Schmidt to orthogonalize and normalize | If $\lambda_i \neq \lambda_j$, the corresponding vectors are automatically orthogonal, so nothing needs to be done. If a $\lambda$ is repeated (geometric multiplicity > 1), first take a set of linearly independent eigenvectors, then perform Gram–Schmidt in that subspace to make them pairwise orthogonal and normalize them to unit length. |
| 4. Arrange the processed eigenvectors as columns, in the same order as the eigenvalues on the diagonal, to obtain the orthogonal matrix $Q$ | At this point $Q^\top Q = I$, and we have $A = Q \Lambda Q^\top$. |
- A specific small example
Let $A$ be a small real symmetric matrix.
① Find the eigenvalues.
② Find the eigenvectors for each $\lambda_i$.
③ Normalize them.
④ Assemble $Q$ and verify $A = Q \Lambda Q^\top$.
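Here is a sketch of the same four steps with an assumed $2 \times 2$ symmetric matrix (not the matrix from the original example), using `torch.linalg.eigh`, which handles symmetric matrices and returns orthonormal eigenvectors:

```python
import torch

# Assumed 2x2 symmetric matrix with eigenvalues 1 and 3.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 2.0]])

eigvals, Q = torch.linalg.eigh(A)   # eigh: for symmetric/Hermitian matrices
print(eigvals)                      # tensor([1., 3.])
print(torch.allclose(Q.T @ Q, torch.eye(2), atol=1e-6))   # True: Q is orthogonal

Lam = torch.diag(eigvals)
print(torch.allclose(Q @ Lam @ Q.T, A, atol=1e-5))        # True: A = Q Lambda Q^T
```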
6. Determinant (det)#
The determinant is an operation that maps a square matrix $A \in \mathbb{R}^{n \times n}$ to a scalar $\det A$.
This scalar encapsulates the most core geometric and algebraic information of the matrix: volume scaling factor, invertibility, product of eigenvalues, etc.
Formula
| Order | Formula |
|---|---|
| $2 \times 2$ | $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$ |
| $3 \times 3$ | "Sarrus' rule" or expand along the first row (see the example below) |

Core properties (which any definition must satisfy)
Property | Explanation |
---|---|
Multiplicativity | $\det(AB) = \det(A)\,\det(B)$ |
Invertibility Criterion | $A$ is invertible $\iff \det A \neq 0$ |
Linearity in Rows/Columns | Each row (or column) is linear in its elements |
Alternating | Swapping two rows (or columns) ⇒ determinant changes sign |
Diagonal Product | Upper/lower triangular matrix: $\det A = a_{11} a_{22} \cdots a_{nn}$ |
Eigenvalue Product | $\det A = \lambda_1 \lambda_2 \cdots \lambda_n$ (including multiplicities) |
3×3 Manual Calculation Example
Let $A = (a_{ij})$ be a $3 \times 3$ matrix.
Expand along the first row:
$$\det A = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$
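As a numerical check, here is a sketch with an assumed $3 \times 3$ matrix, comparing the library determinant with the first-row expansion:

```python
import torch

# Assumed example matrix.
A = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])

det_lib = torch.linalg.det(A)

# Cofactor expansion along the first row.
a = A
det_manual = (a[0, 0] * (a[1, 1] * a[2, 2] - a[1, 2] * a[2, 1])
              - a[0, 1] * (a[1, 0] * a[2, 2] - a[1, 2] * a[2, 0])
              + a[0, 2] * (a[1, 0] * a[2, 1] - a[1, 1] * a[2, 0]))

print(det_lib, det_manual)   # both 22.0
```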
In summary
"Taking the determinant" means: collapsing an square matrix into a single number through a set of alternating, linear rules, and this number encodes key information such as the matrix's volume scaling, direction, invertibility, and product of eigenvalues.
7. Rank of a Matrix#
What exactly is the "rank" of a matrix?
Equivalent Perspective | Intuitive Explanation |
---|---|
Linear Independence | The number of linearly independent vectors that can be selected from the rows (or columns) is the rank. |
Dimensionality of Space | The dimension of the subspace spanned by the column vectors (column space) = the dimension of the subspace spanned by the row vectors (row space) = the rank. |
Full Rank Minor | The order of the largest non-zero determinant in the matrix = the rank. |
Singular Values | In the SVD $A = U \Sigma V^\top$, the number of non-zero singular values = the rank. |
Linear Independence
Below are three comparative cases using small $3 \times 3$ matrices to make the statement "rank = how many linearly independent column (or row) vectors can be selected" concrete.

| Case | Relationship among the columns $v_1, v_2, v_3$ | Rank |
|------|-------------------------------------------------|------|
| Case 1 | All three columns lie on the same line (e.g. $2v_1 - v_2 = 0$): only 1 independent vector | 1 |
| Case 2 | $v_1, v_2$ are not collinear ⇒ they span a 2-dimensional plane; $v_3$ lies in this plane | 2 |
| Case 3 | No column can be linearly expressed by the other two ⇒ the three columns span the entire $\mathbb{R}^3$ | 3 |
How to determine "independence"?
- Manual calculation: form a matrix with the columns and perform elimination → the number of non-zero rows is the rank.
- Concept: if there exist constants $c_1, c_2, c_3$, not all 0, such that $c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$, the vectors are dependent; otherwise they are independent.
  - Case 1: $2v_1 - v_2 = 0$ → dependent
  - Case 2: only $v_3 = v_1 + v_2$ is a dependency; $v_1, v_2$ themselves are independent
  - Case 3: every non-trivial combination ≠ 0 → all three vectors are independent

In summary: the rank = how much independent information (dimension) this matrix can truly "hold."
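A sketch with three assumed $3 \times 3$ matrices whose columns follow the relations of the three cases above; `torch.linalg.matrix_rank` confirms the ranks 1, 2, 3.

```python
import torch

# Case 1: v2 = 2*v1, v3 = 3*v1   -> all columns on one line   -> rank 1
A1 = torch.tensor([[1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0]])

# Case 2: v3 = v1 + v2           -> columns span a plane      -> rank 2
A2 = torch.tensor([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])

# Case 3: no non-trivial relation -> columns span all of R^3  -> rank 3
A3 = torch.tensor([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

for A in (A1, A2, A3):
    print(torch.linalg.matrix_rank(A).item())   # 1, 2, 3
```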
8. Low-Rank Approximation#
Why does a truncated SVD (low-rank approximation) only require storing $k(m+n) + k$ numbers?
When the original matrix $A \in \mathbb{R}^{m \times n}$ is truncated to rank $k$ and written as
$$A \approx U_k \Sigma_k V_k^\top,$$
| Block | Shape | Number of Scalars to Save | Explanation |
|---|---|---|---|
| $U_k$ | $m \times k$ | $mk$ | Left singular vectors: only take the first $k$ columns |
| $V_k$ | $n \times k$ | $nk$ | Right singular vectors: similarly, only the first $k$ columns |
| $\Sigma_k$ | $k \times k$ diagonal | $k$ | Only keep the $k$ singular values on the diagonal |
Adding these three parts gives $mk + nk + k = k(m+n) + k$.
- $U_k$ and $V_k$: each has $k$ columns, and each column stores a vector of length equal to its number of rows ($m$ or $n$), giving $mk + nk$ scalars.
- $\Sigma_k$: it is a diagonal matrix, so only the $k$ diagonal elements are needed, not $k^2$.

Therefore, replacing the original storage with a rank-$k$ SVD approximation reduces the parameter count from $mn$ to $k(m+n) + k$.
If $k \ll \min(m, n)$, the savings are considerable.
Lower rank = reduced information dimension, low-rank storage = reduced parameter count/memory simultaneously
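A sketch with assumed sizes $m = 1000$, $n = 500$, $k = 20$, counting the stored scalars of a truncated SVD against storing the full matrix:

```python
import torch

m, n, k = 1000, 500, 20
A = torch.randn(m, n)

U, S, Vh = torch.linalg.svd(A, full_matrices=False)   # U: (m, r), S: (r,), Vh: (r, n)
U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]           # keep only the top-k pieces

A_k = U_k @ torch.diag(S_k) @ Vh_k                    # rank-k approximation of A

full_params    = m * n              # 500,000 scalars to store A directly
lowrank_params = k * (m + n) + k    # 30,020 scalars to store U_k, V_k, Sigma_k
print(full_params, lowrank_params)
```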
9. Norms#
The "double vertical bars" $|,\cdot,|$ in linear algebra represent norms.
-
For a vector , the most commonly used is the Euclidean norm:
In the figure, is the sum of the squares of each component of the vector . -
For a matrix , if also written as , it commonly refers to the Frobenius norm: . However, the figures here involve vectors.
In contrast, the single vertical bar usually denotes absolute value (scalar) or determinant . Thus, double vertical bars signify the "length" of vectors/matrices, while single vertical bars denote scalar size or determinant—different objects and meanings.
Common Euclidean Distance for Vectors -- 2-Norm (L2 Norm)#
```python
import torch

b = torch.tensor([3.0, 4.0])
print(b.norm())  # Outputs tensor(5.), i.e. sqrt(3^2 + 4^2) = 5.0
```
`.norm()` is a method of PyTorch tensors (`torch.Tensor`).
Common Frobenius Norm for Matrices#
Matrices also have a "length": the most commonly used is the Frobenius norm.

| Name | Notation | Formula (for $A \in \mathbb{R}^{m \times n}$) | Analogy with Vectors |
|---|---|---|---|
| Frobenius Norm | $\Vert A\Vert_F$ | $\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^{2}}$ | Similar to the vector 2-norm $\Vert v\Vert = \sqrt{\sum_i v_i^2}$ |
1. Why can it also be written as a "matrix dot product"?
The commonly used inner product on the space of matrices is
$$\langle A, B \rangle = \mathrm{tr}(A^\top B),$$
where $\mathrm{tr}(\cdot)$ is the trace operation (the sum of the diagonal elements).
Taking the inner product of $A$ with itself gives
$$\langle A, A \rangle = \mathrm{tr}(A^\top A) = \sum_{i,j} A_{ij}^2 .$$
Thus:
$$\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}.$$
This is the matrix version of $\|v\| = \sqrt{v^\top v}$: just replace the vector dot product with the "trace dot product."
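A quick check of this identity with an assumed $2 \times 2$ matrix:

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

fro_direct = A.pow(2).sum().sqrt()                  # sqrt(1 + 4 + 9 + 16) = sqrt(30)
fro_trace  = torch.trace(A.T @ A).sqrt()            # same value via the trace inner product
fro_torch  = torch.linalg.matrix_norm(A, ord='fro')

print(fro_direct, fro_trace, fro_torch)             # all approximately 5.4772
```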
The Frobenius norm is indeed equal to the square root of the sum of the squares of all singular values, that is:
$$\|A\|_F = \sqrt{\sum_i \sigma_i^2}.$$
Here:
- $\|A\|_F$ is the Frobenius norm of the matrix $A$
- $\sigma_i$ are the singular values of $A$
Expanded explanation:
The Frobenius norm is defined as:
$$\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}.$$
But singular value decomposition (SVD) tells us:
$$A = U \Sigma V^\top,$$
where $\Sigma$ is a diagonal matrix with the singular values $\sigma_i$ on its main diagonal.
Since multiplying by the orthogonal matrices $U$ and $V$ does not change the Frobenius norm, we can compute directly:
$$\|A\|_F = \|U \Sigma V^\top\|_F = \|\Sigma\|_F.$$
Thus, ultimately:
$$\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}.$$
Beware of misconceptions
Note:
✅ Not the square root of a single singular value, nor the maximum singular value
✅ Is the sum of the squares of all singular values, then taking the square root
The spectral norm looks at "the direction that stretches the most," while the Frobenius norm accumulates all energy.
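A sketch verifying both statements on an assumed random matrix: the Frobenius norm equals $\sqrt{\sum_i \sigma_i^2}$, while the spectral norm keeps only the largest $\sigma$.

```python
import torch

A = torch.randn(4, 3)

S = torch.linalg.svdvals(A)                          # singular values of A
fro_from_svd = S.pow(2).sum().sqrt()                 # sqrt of the sum of squared singular values
fro_direct   = torch.linalg.matrix_norm(A, ord='fro')

print(torch.allclose(fro_from_svd, fro_direct, atol=1e-5))   # True
print(S.max(), torch.linalg.matrix_norm(A, ord=2))           # spectral norm = largest sigma only
```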
Spectral Norm of a Matrix#
✅ Definition of Spectral Norm
The spectral norm of a matrix $A$ is defined as:
$$\|A\|_2 = \max_{\|x\| = 1} \|Ax\|.$$
Simply put, it is the maximum value to which matrix stretches a unit vector.
Singular values inherently represent the stretching transformations of the matrix.
It equals the maximum singular value of $A$:
$$\|A\|_2 = \sigma_{\max}(A).$$
From another perspective: the spectral norm ≈ the maximum length to which a unit vector is stretched after being input into the matrix.
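A sketch of this "maximum stretch" view with an assumed diagonal matrix: sampling many unit vectors approaches the exact spectral norm $\sigma_{\max} = 3$.

```python
import torch

A = torch.tensor([[3.0, 0.0],
                  [0.0, 1.0]])

# Sample many random unit vectors and measure how much each is stretched by A.
x = torch.randn(10000, 2)
x = x / x.norm(dim=1, keepdim=True)     # unit vectors
stretch = (x @ A.T).norm(dim=1)         # ||A x|| for each unit vector x

print(stretch.max())                            # approximately 3.0
print(torch.linalg.matrix_norm(A, ord=2))       # 3.0, the exact spectral norm
```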
✅ Its relationship with the Frobenius norm
- Frobenius Norm → looks at the overall energy (sum of squares of all matrix elements)
- Spectral Norm → looks at the maximum stretch in a single direction
In other words:
- The Frobenius norm is like the total "volume" of the matrix
- The spectral norm is like the "most extreme" stretching rate in a single direction
✅ Example: Why is it important?
Imagine a linear layer $y = Wx$ in a neural network:
- If $\|W\|_2$ is very large, even small perturbations in the input will be amplified, making the network prone to overfitting and sensitive to noise.
- If $\|W\|_2$ is moderate, the output stays stable against input perturbations, leading to better generalization.
Thus, modern methods (like spectral normalization) directly constrain the spectral norm of $W$ to a certain range during training.
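A minimal sketch of this idea using PyTorch's built-in spectral-norm parametrization (the layer sizes here are arbitrary assumptions for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

# Wrap a linear layer so that its weight is reparameterized to have spectral norm ~1.
layer = spectral_norm(nn.Linear(16, 16))

W = layer.weight                                   # weight after the spectral-norm parametrization
print(torch.linalg.matrix_norm(W, ord=2).item())   # approximately 1.0: the largest singular value is constrained
```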
⚠ Straightforward drawbacks
The spectral norm is powerful, but:
- It only focuses on the single maximum direction, ignoring the stretching in other directions;
- It is more expensive to compute than the Frobenius norm (it requires a singular value decomposition rather than a simple sum of squares).
Summary Comparison
| | Euclidean Norm (2-norm, ‖v‖) | Frobenius Norm (‖A‖F) |
|---|---|---|
| Object | Vector $v \in \mathbb{R}^n$ | Matrix $A \in \mathbb{R}^{m \times n}$ |
| Definition | $\sqrt{\sum_i v_i^2}$ | $\sqrt{\sum_{i,j} A_{ij}^2}$ |
| Equivalent Expression | $\sqrt{v^\top v}$ | $\sqrt{\mathrm{tr}(A^\top A)} = \sqrt{\sum_i \sigma_i^2}$ |
| Geometric Meaning | Length of a vector in $n$-dimensional Euclidean space | Length when viewing the matrix elements as one "long vector" |
| Unit/Scale | Same metric as the coordinate axes | Same; does not depend on the arrangement of rows and columns |
| Common Uses | Error measurement, $L_2$ regularization, distance | Weight decay, matrix approximation error, kernel methods |
| Relationship with Spectral Norm | $\Vert v\Vert_2 = \Vert v\Vert$ (only one singular value) | $\Vert A\Vert_2 \le \Vert A\Vert_F$; equal if rank = 1 |
- Same idea, different dimensions
  - The Euclidean norm is the square root of a vector's dot product with itself: $\|v\| = \sqrt{v^\top v}$.
  - The Frobenius norm treats all matrix elements as one long vector and does the same; written in matrix language as $\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}$. This is "transpose → multiply → take trace."
- When to use which?
Scenario | Recommended Norm | Reason |
---|---|---|
Prediction error, gradient descent | Euclidean (vector residual) | Residuals are naturally column vectors |
Regularization of network weights (Dense / Conv) | Frobenius | Does not care about parameter shape, only overall magnitude |
Comparing matrix approximation quality (SVD, PCA) | Frobenius | Easily corresponds with the sum of squares of singular values |
Stability/Lipschitz bounds | Spectral Norm ($\Vert A\Vert_2$) | Concerned with the amplification rate rather than total energy |
- Intuitive Differences
  - Euclidean: measures the length in a single direction;
  - Frobenius: measures the total energy of all elements, so for matrices no particular row or column is special; all elements are treated equally.
One-sentence memory:
Euclidean Norm: The "ruler" for vectors.
Frobenius Norm: Measuring the overall size of a matrix after "flattening" it with the same ruler.
10. Transpose of Matrix Multiplication#
In matrix algebra, there is a fixed "flip the order" rule for the transpose of a product of two (or more) matrices:
$$(AB)^\top = B^\top A^\top.$$
This means: transpose each matrix, then reverse the order of multiplication.
This property holds for any dimension-compatible real (or complex) matrices and extends recursively:
$$(ABC)^\top = C^\top B^\top A^\top.$$
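A quick numerical check with assumed random matrices:

```python
import torch

A = torch.randn(2, 3)
B = torch.randn(3, 4)
C = torch.randn(4, 5)

print(torch.allclose((A @ B).T, B.T @ A.T, atol=1e-5))               # True: (AB)^T = B^T A^T
print(torch.allclose((A @ B @ C).T, C.T @ B.T @ A.T, atol=1e-5))     # True: the rule chains
```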
xLog Editing Markdown Document Notes
- Ensure all mathematical expressions are enclosed in `$$ … $$`.
- If there are standalone `$n \times n$` style expressions, change them to `n × n` or `$$n\\times n$$`.
Reference video: