This course is intended mainly for students in the Department of Statistics and the Department of Computer Science and Information Engineering. Students from the College of Management who have completed a statistics course are also welcome to enroll. The course combines lectures, individual assignments, and a group final report. By the end of the semester, students should have the following concepts and skills: (1) understand which methods suit which kinds of data; (2) understand the process and steps of data analysis and data mining; (3) write programs and perform statistical analysis in R or Python; (4) communicate findings through data visualization; (5) apply the statistical learning methods and the R and Python skills taught in the course to analyze real data for an interesting or industry problem.
This course aims to cultivate "Data+" talent. These are people who can:
- handle big data;
- build statistical and machine learning models;
- interpret the results of an analysis.

By learning statistics, data science, and machine learning techniques, such people can solve practical problems directly and effectively.

The course content is a framework for understanding data through statistics. It can be divided into supervised and unsupervised learning, and it can also be described as a set of tools and methods for analyzing and modeling complex data. The field is a relatively recent development within statistics that has grown alongside, and merged with, computer science, particularly machine learning.

The course introduces data of different types and complexity. It covers data cleaning, feature selection, and commonly used statistical and machine learning methods, such as:
- regression;
- classification and regression trees;
- boosting and support vector machines;
- generalized estimating equations (GEE);
- methods for automatically selecting the best tuning parameters of classification techniques;
- handling imbalanced data, one of the most common problems;
- clustering analysis and other multivariate methods;
- ensemble learning;
- convolutional neural networks (CNN) and recurrent neural networks (RNN);
- training and optimizing neural networks.

Each method is accompanied by worked examples and hands-on analysis. R, Python, and TensorFlow/Keras are required for the course. For the final project, students form teams to take on a Kaggle competition challenge.
1. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning: with Applications in R.
2. Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition.
3. R語言機器學習 (R Language Machine Learning), translated by Wu Jinchao.
Grading item | Weight (%) | Description |
---|---|---|
Midterm exam | 30 | |
Final report | 30 | |
Homework assignments | 30 | |
Class participation and attendance | 10 | |
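The syllabus gives only the weight of each grading item, not how the items are combined. The sketch below illustrates the arithmetic under two assumptions:
- every component is scored on a 0 to 100 scale;
- the course grade is the weighted sum of the components.

The component names and the student's scores are hypothetical and used only for illustration.

```python
# Weights from the grading table, in percent (they sum to 100).
# The dictionary keys are hypothetical labels for the four grading items.
WEIGHTS = {
    "midterm_exam": 30,
    "final_report": 30,
    "homework": 30,
    "participation_attendance": 10,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade, assuming each component score is on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    # Weighted sum of component scores, normalized by the total weight (100).
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical component scores for one student.
example = {
    "midterm_exam": 80,
    "final_report": 90,
    "homework": 85,
    "participation_attendance": 100,
}
print(final_grade(example))  # 86.5
```

In this example, 0.3 × 80 + 0.3 × 90 + 0.3 × 85 + 0.1 × 100 = 86.5. The function divides by the total weight rather than hard-coding 100, so the result stays on the 0 to 100 scale even if the weights were changed.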