This course studies the application of multivariate dimension reduction methods in statistical practice. It introduces several innovative statistical dimension reduction methods [e.g., Sliced Inverse Regression (SIR) and principal Hessian Directions (PHD)] to achieve data reduction, and applies them to large data sets.
A Tentative Course Outline:
Week 1: Dimensionality reduction in regression analysis; Principal Component Analysis (PCA)
Week 2-3: Sliced Inverse Regression (SIR)
Week 4-5: Censored regression
Week 6-7: Principal Hessian Directions (PHD)
Week 8: Multivariate Response Variable Regression: Most Predictable Variables
Week 9: Midterm report
Week 10-11: Curve data analysis
Week 12-13: Ultrahigh-dimensional variable selection
Week 14: Minimum Average Variance Estimation (MAVE)
Week 15-17: Other multivariate methods: LASSO
Week 18: Final report
The reduction of dimension is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high-dimensional function or data set. People often ask: How do the data look? What structures are there? What model should be used? Aside from the differences that underlie the various scientific contexts, such questions have a common root in statistics, and this is the driving force for the study of high-dimensional data analysis. This course will discuss several statistical methodologies useful for exploring voluminous data, including principal component analysis, clustering and classification, survival analysis, and other recently developed sufficient dimension reduction (SDR) methods. Sliced inverse regression (SIR) and principal Hessian directions (PHD) are two novel SDR methods, useful for extracting the geometric information underlying noisy data of several dimensions. The theories of several SDR methods will be discussed in depth and will serve as the backbone for the entire course. Examples from various application areas will be given, including social and economic problems such as unemployment rates, biostatistics problems such as clinical trials with censoring, machine learning problems such as handwritten digit recognition, biomedical problems such as functional Magnetic Resonance Imaging, and bioinformatics problems such as microarray gene expression data.
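To make the idea of an SDR method concrete, the following is a minimal sketch of the basic SIR algorithm in Python; it is an illustrative assumption rather than course material, and the function name `sir`, the slicing scheme, and the toy single-index model are chosen only for demonstration.

```python
# A minimal sketch of Sliced Inverse Regression (SIR), assuming a
# single-index model y = f(X @ beta) + noise. Illustrative only.
import numpy as np

def sir(X, y, n_slices=10, n_directions=1):
    """Estimate effective dimension-reduction (EDR) directions with SIR."""
    n, p = X.shape

    # 1. Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # 2. Slice the response into roughly equal-count slices and form the
    #    weighted covariance of the slice means of Z.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m_h = Z[idx].mean(axis=0)              # slice mean of standardized X
        M += (len(idx) / n) * np.outer(m_h, m_h)

    # 3. The leading eigenvectors of M span the estimated EDR space
    #    (in the standardized scale); map them back to the original scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    beta_hat = inv_sqrt @ top
    return beta_hat / np.linalg.norm(beta_hat, axis=0)

# Toy check: y depends on X only through one linear combination.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = np.tanh(X @ beta) + 0.1 * rng.normal(size=2000)
print(sir(X, y).ravel())   # roughly proportional to beta, up to sign and scale
```

In this toy example only one linear combination of the predictors carries information about y, and the estimated direction should align with it up to sign; methods such as PHD address situations where SIR's inverse-mean approach is less informative.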
No textbook. Lecture notes and selected papers will be available.
| Grading Method | Grading Percentage (%) | Description |
| --- | --- | --- |
| Homework | 30 | |
| Term paper | 40 | Large-scale data analysis or software development |
| Selected paper reading report | 30 | |