ORF 570 Statistical Machine Learning

Fall Semester, 2023
MW 3:00pm - 4:20pm 

Textbooks

Chen, Y., Chen, Y., Fan, J., and Ma, C. (2021).
Spectral Methods for Data Science: A Statistical Perspective.
Foundations and Trends in Machine Learning.

Fan, J., Li, R., Zhang, C.-H., and Zou, H. (2020).
Statistical Foundations of Data Science.
CRC Press.

General Information

Instructor: Jianqing Fan, Frederick L. Moore '18 Professor of Finance.
Office: 205 Sherrerd Hall
Phone: 258-7924
E-mail: [email protected]
Office Hours: Monday 1:40pm--2:30pm, Wednesday 10:30am--11:30am, or by appointment.

Teaching Assistant: Dr. Soham Jana
Office: 214 Sherrerd Hall
E-mail: [email protected]
Office Hours: Tuesday 10:30am - 11:30am, Thursday 10:30am - 11:30am, or by appointment.

Reference Books

  • Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical learning with sparsity. CRC press, New York.
  • Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press.


This course covers several topics in statistical machine learning theory, methods, and algorithms for data science. Topics include: (1) spectral methods for data science; (2) matrix perturbation theory and concentration inequalities; (3) robust covariance regularization and graphical models; (4) factor models and their applications; (5) matrix completion; (6) graphical clustering and community detection; (7) item ranking; (8) deep neural networks; (9) uncertainty quantification in high dimensions. Students are expected to participate in paper surveying and presentations.
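As a small taste of the spectral methods covered in the course, the sketch below estimates community membership from the second eigenvector of an adjacency matrix drawn from a two-block stochastic block model. This is an illustrative toy (the block sizes and edge probabilities are chosen arbitrarily), not code from the course notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stochastic block model: two communities of 50 nodes each,
# within-community edge probability 0.6, between-community probability 0.1.
n, p, q = 50, 0.6, 0.1
labels = np.array([0] * n + [1] * n)
probs = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((2 * n, 2 * n)) < probs, k=1)
A = (upper | upper.T).astype(float)   # symmetric adjacency matrix, zero diagonal

# Spectral step: the sign pattern of the eigenvector attached to the
# second-largest eigenvalue of A estimates the community split.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
v2 = eigvecs[:, -2]
estimate = (v2 > 0).astype(int)

# Accuracy up to the global label swap
acc = max(np.mean(estimate == labels), np.mean(estimate != labels))
print(f"recovered fraction: {acc:.2f}")
```

With this eigengap (roughly n(p - q)/2 against noise of order sqrt(n)), the split is recovered almost exactly; the course develops exactly when and why such recovery succeeds.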

The course will cover the following topics.

  1. Introduction to Spectral Methods
    • Community Detection
    • Topic Modeling
    • Matrix Completion
    • Item Ranking
    • Factor Model and Covariation Regularization
    • Factor-Adjusted Regularized Model Selection
  2. Matrix Perturbation Theory
    • Matrix Norms
    • Distances and Angles of Eigenspaces
    • Eigenspace Perturbation Theory
    • Singular Subspace Perturbation Theory
    • Perturbation for Probability Transition Matrix
  3. Covariance Learning and Factor Models
    • Principal Component Analysis
    • Covariance Learning and Factor Models
    • Covariance Estimation with Observable Factors
    • Asymptotic Properties of PCA Based Estimators
    • Factor-Adjusted Robust Multiple Testing
    • Factor Augmented Regression Methods for Prediction
  4. Applications of l_2 Perturbation Theory
    • Matrix Tail Bounds
    • Community Detection
    • Low-rank Matrix Completion
    • Ranking from Pairwise Comparisons
    • PCA and Factor Models
  5. Applications of l_{2,\infty} Perturbation Theory
    • Motivations:  Exact Recovery
    • Leave-one-out Analysis:  an illustrative example
    • l_\infty Eigenvector Perturbation Theory (rank-1)
    • Exact Recovery in Community Detection
    • l_{2,\infty} Eigenspace Perturbation Theory (rank-r)
    • Entrywise error in Matrix Completion
  6.  Recent Developments on Statistical Machine Learning
    • FAST-DNN for Big Data Modeling
    • Inferences for HeteroPCA and Matrix Completion
    • Ranking Inferences Based on Top Choices of Multiway Comparisons
    • Universally Trainable Optimal Prediction Intervals Aggregation
    • Inferences on Mixing Probabilities and Ranking in Mixed-Membership Models
    • Spectral Ranking Inferences based on General Multiway Comparisons
    • Other Papers of Students' Choice


Class attendance is required and essential. The course material is drawn mainly from the lecture notes, and many conceptual issues and elements of statistical thinking are taught only in class. They will appear in the midterm and final exams.

Schedules and Tentative Grading Policy

  • Participation (30%): throughout the semester
  • Presentation or term paper (70%): before the end of the reading period