This course is built around real big data: students work with actual big data projects and learn the associated methods and techniques. For each data set, the course explains its background and source, the problem to be solved, and the relevant domain knowledge. It then covers four topics: 1. data collection, storage, and cleaning; 2. model building and analysis methods; 3. presentation, interpretation, and visualization of results; 4. prototyping software that automates the analysis workflow. For each topic, the course presents the established concepts, methods, and implementation tools, and then discusses newer methods.
The amount of data in the world is growing rapidly. The Large Synoptic Survey Telescope (LSST) project can collect about 20 TB (1 TB = 1000 GB) of astronomical data every night; a single medical institution can sequence the 3 billion base pairs of the human genome in just one day; about 7 billion shares change hands on the U.S. stock market every day. Among Internet companies, Google processes more than 24 PB (1 PB = 1000 TB) of data per day, Facebook receives more than 10 million new photos and 3 billion comments every hour, and YouTube users upload more than one hour of video every second. Used well, this "big data" can bring new value and innovation to many areas of life, including medicine, government, education, the economy, and the humanities. However, big data is often messy, uneven in quality, and spread across countless servers. Extracting the value hidden in it has therefore become a pressing task, and a new scientific field, data science, has emerged in response. (Reference source: http://www.stat.nctu.edu.tw/data/super_pages.php?ID=data1)
1. Lecture notes and official SAS course materials.
2. Applying the R Language to Data Analysis: From Machine Learning and Data Mining to Big Data (應用 R 語言於資料分析:從機器學習、資料探勘到巨量資料).
Grading Method | Grading percentage | Description |
---|---|---|
Homework and attendance | 40 | Homework includes project work |
Midterm exam | 30 | |
Final exam and final group project | 30 | |
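As a rough illustration of how the three grading components might combine, the sketch below computes a weighted average from the percentages in the table. The syllabus does not state the exact calculation, so the 0-100 scale for each component, the dictionary keys, and the `final_grade` helper are all assumptions made for this example.

```python
# Weights taken from the grading table (percent of the final grade).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "homework_and_attendance": 40,
    "midterm_exam": 30,
    "final_exam_and_project": 30,
}


def final_grade(scores):
    """Weighted average of component scores.

    Assumes every component score is on a 0-100 scale; the syllabus
    itself does not specify this.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up scores, used only to show the arithmetic.
example = {
    "homework_and_attendance": 90,
    "midterm_exam": 80,
    "final_exam_and_project": 70,
}
print(final_grade(example))  # (40*90 + 30*80 + 30*70) / 100 = 81.0
```

Under these assumptions, homework and attendance carry the most weight, so a 10-point change there moves the final grade by 4 points, compared with 3 points for either exam component.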