"Big Data" has made huge waves in everyday life. Following Cloud Computing, it has become the hottest buzzword in academia and the technology industry; nearly every company seems to be doing related research, and the term comes up in almost every conversation. In the Big Data era, statistics and data analysis are the most fundamental skills. The expertise of a Data Scientist or Quantitative Analyst spans statistics, computer science, and mathematics. In the past, such people competed for jobs on Wall Street, but thanks to the Big Data trend, every industry is now looking for engineers and data scientists with backgrounds in quantitative analysis and statistics. In this course, students work hands-on with a distributed Hadoop cluster running on multiple hosts, using HDFS for distributed storage and MapReduce for cluster computing to process and analyze Big Data. Students learn the Hadoop storage system and resource management framework, along with the key in-memory Big Data technologies of Spark. Data analysis software and a programming language, Python or R, serve as the introductory foundation for Big Data analysis and should prepare students to enter the field.
In this course, Big Data Technology Platforms and Applications, students are introduced to the most popular current Big Data technologies and platforms and practice with the relevant open source frameworks, so that they learn the fundamental principles and implementation techniques that match current and future trends. From moving services to the cloud to building Big Data software environments and implementing applications, the course aims to combine theory and practice.
Help students understand and become familiar with how representative Big Data analysis techniques are operated, applied, and implemented.
Help students understand and become familiar with the principles and architecture of common Big Data computing platforms, and build and operate them hands-on.
Help students understand and become familiar with popular Big Data topics, such as High Performance Computing, Machine Learning, Cloud Computing, and Data Mining.
Raise students' interest in Big Data analysis techniques and related applications, and cultivate capable domestic talent in related fields.
Use hands-on practice to combine theory and practice in teaching.
Experiment 1: Data analytics on single machine. Students use data analysis tools (Python or R, Weka or Scikit-Learn) to observe real-life phenomena; the course provides four taxi-ride problems for students to implement.
Experiment 2: Big Data analytics on Big Data platform. Students use Java, Scala, or Python to run Spark on the Hadoop platform to process Big Data. Students first implement the "word count" example program as practice, then modify it and combine it with the Experiment 1 problems for deeper study and discussion.
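To make the "word count" exercise in Experiment 2 concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps in plain Python. It is not the course's reference program and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`), the tokenizing rule, and the sample lines are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    # Assumed tokenizing rule: lowercase runs of letters, digits, and apostrophes.
    return [(word, 1) for word in re.findall(r"[a-z0-9']+", line.lower())]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a file stored on HDFS.
    sample = [
        "Big Data analytics on Big Data platform",
        "Data analytics on single machine",
    ]
    counts = word_count(sample)
    # Print the most frequent words first, breaking ties alphabetically.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(word, count)
```

On a cluster, the framework distributes the map and reduce calls across hosts and performs the shuffle itself. In PySpark, the same pipeline is commonly expressed on an RDD with `flatMap`, `map`, and `reduceByKey`.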
https://link.springer.com/bookseries/11970 (Big Data)
https://link.springer.com/search?facet-series=%2211970%22&facet-content-type=%22Book%22
https://link.springer.com/book/10.1007/978-3-030-01566-4
https://link.springer.com/book/10.1007/978-3-319-91815-0 (Text Mining)
https://link.springer.com/book/10.1007/978-981-13-0550-4 (Spark)
https://link.springer.com/book/10.1007/978-3-030-03359-0
Grading Method | Grading percentage | Description |
---|---|---|
Attendance and discussion | 20 | |
Assignments | 60 | |
Final exam and final group project | 20 | |