Visual data analysis is very helpful for cleaning data, exploring data structure, detecting outliers and abnormal groups, identifying trends and clusters, discovering local patterns, evaluating model output, and presenting analysis results. It is indispensable for exploratory data analysis, data mining, and network analysis.
The main goal of this course is to use R and real survey data to demonstrate what information visual analysis can reveal in data. Hands-on practice in designing, drawing, and interpreting statistical graphics builds the experience needed to understand and apply visual data analysis efficiently.
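As a concrete instance of visual outlier detection, the box plot flags points that lie beyond Tukey's fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR. The Python sketch below applies that rule to a hypothetical list of survey responses; the function name `tukey_outliers` and the data are illustrative only. R's `boxplot()` computes its hinges with `fivenum()`, so its fences can differ slightly from the quartiles used here.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # 'inclusive' treats the minimum and maximum as the 0th and 100th
    # percentiles; quantiles() sorts a copy of the data itself.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the flagged values.
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical monthly incomes (in thousands) from a small survey.
    incomes = [32, 35, 36, 38, 40, 41, 43, 45, 47, 120]
    print(tukey_outliers(incomes))  # [120]
```

Here Q1 = 36.5 and Q3 = 44.5, so the fences are 24.5 and 56.5 and only the response of 120 is flagged; in a box plot it would be drawn as an isolated point beyond the upper whisker.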
Data visualization is an important issue in high-dimensional data analysis, and it has become increasingly important with the advent of computer and graphics technology. The difficulty lies in how to visualize a high-dimensional structure or data set; such questions share a common root in statistics. This course introduces statistical methodologies useful for exploring voluminous data. The main topics fall into, but are not limited to, two parts. The first part covers dimension reduction methods, including Principal Component Analysis (PCA), Projection Pursuit, Sliced Inverse Regression (SIR), Principal Hessian Directions (PHD), Minimum Average Variance Estimation (MAVE), and LASSO. The second part is a collection of dimension-free methods, including the Parallel Coordinate Plot, Matrix Visualization, and Generalized Association Plots (GAP). Most methods will be discussed from both theoretical and practical perspectives, with examples from various application areas.
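To illustrate what a dimension reduction method does, the following Python sketch computes the first principal component (the direction of maximum variance) of a small data set by power iteration on the sample covariance matrix, then projects each observation onto it, reducing p variables to one score. It is a teaching sketch under simple assumptions (dense, complete numeric data; a fixed starting vector), not the course's implementation; the course works in R, where `prcomp()` is the usual tool. The function names and toy data are hypothetical.

```python
import math


def center(rows):
    """Subtract each column mean so every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def first_principal_component(rows, max_iter=1000, tol=1e-12):
    """Return (loading vector, variance) of the first principal component.

    rows: list of observations, each a list of p numeric variables.
    Uses power iteration on the sample covariance matrix (divisor n - 1).
    """
    centered = center(rows)
    n, p = len(centered), len(centered[0])
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Fixed, non-symmetric starting vector; power iteration only fails if it
    # happens to be exactly orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # every variable is constant: no direction of variance
            break
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v' S v: the variance of the data along v.
    sv = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
    variance = sum(x * y for x, y in zip(v, sv))
    return v, variance


def pc_scores(rows, v):
    """Project each centered observation onto the loading vector v."""
    return [sum(x * y for x, y in zip(r, v)) for r in center(rows)]


if __name__ == "__main__":
    data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # toy data
    loading, var = first_principal_component(data)
    print(loading, var)
    print(pc_scores(data, loading))
```

In the toy data the spread along the first variable (sample variance 8/3) exceeds that along the second (2/3), so the loading converges to roughly (1, 0). The sign of a principal component is arbitrary, which is why `prcomp()` and this sketch may report loadings of opposite sign.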
(1) Antony Unwin, 2015, Graphical Data Analysis with R, CRC Press.
(2) Tamara Munzner, 2014, Visualization Analysis and Design, CRC Press.
(3) Claus O. Wilke, 2019, Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures, O’Reilly Media.
(4) Winston Chang, 2012, R Graphics Cookbook: Practical Recipes for Visualizing Data, O’Reilly Media.
(5) Eric D. Kolaczyk and Gábor Csárdi, 2020, Statistical Analysis of Network Data with R, Springer.
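Grading item | Weight (%) | Description |
---|---|---|
Class participation and attendance | 15 | Attendance is taken every week; the final attendance grade is based on five weeks selected at random using R. If you need to take leave, email documentation of the absence to the instructor. |
Regular group assignments | 20 | Includes in-class software exercises and their output. |
Midterm group oral report | 15 | Covers both R operations and interpretation of the analysis. Present a visual analysis and interpretation addressing the research question of your chosen data set; citations and writing format should follow journal manuscript conventions. Plagiarism results in a score of zero. |
Midterm group written report | 15 | Covers both R operations and interpretation of the analysis. Present a visual analysis and interpretation addressing the research question of your chosen data set; citations and writing format should follow journal manuscript conventions. Plagiarism results in a score of zero. |
Final group oral report | 15 | Covers both R operations and interpretation of the analysis. Present a visual analysis and interpretation addressing the research question of your chosen data set; citations and writing format should follow journal manuscript conventions. Plagiarism results in a score of zero. |
Final group written report | 20 | Covers both R operations and interpretation of the analysis. Present a visual analysis and interpretation addressing the research question of your chosen data set; citations and writing format should follow journal manuscript conventions. Plagiarism results in a score of zero. |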