5695 - Big Data Technology Platform and Applications

Course Objectives

"Big Data" has made huge waves in everyday life. Following Cloud Computing, it has become one of the hottest buzzwords in academia and the technology industry; nearly every company seems to be doing related research and talks about big data constantly. In the big data era, statistics and data analysis are the most basic foundation. The expertise of a Data Scientist or Quantitative Analyst spans statistics, computer science, and mathematics. People with these skills once competed for jobs on Wall Street, but thanks to the Big Data trend, every industry now looks for engineers and data experts with a background in quantitative analysis and statistics. In this course, students gain hands-on experience with a multi-host Hadoop distributed cluster, using HDFS for distributed storage and MapReduce for cluster computation to process and analyze big data. Students also learn the Hadoop storage system and resource management framework and the key technologies behind Spark's in-memory processing of massive data. Data analysis software and a programming language, Python or R, serve as the entry-level foundation for big data analysis, which should prepare students to enter the big data field.
This course introduces students to today's most popular big data technologies and platforms and uses open source frameworks for hands-on learning, so that students learn the basic principles and practical techniques that match current and future trends. Covering everything from moving services to the cloud to building big data software environments and implementing applications, the course aims to balance theory and practice.
Help students understand and become familiar with how representative big data analysis techniques are operated, applied, and implemented.
Help students understand and become familiar with the principles and architecture of common big data computing platforms, and build and operate them in practice.
Help students understand and become familiar with popular big data topics such as High Performance Computing, Machine Learning, Cloud Computing, and Data Mining.
Raise students' interest in big data analysis techniques and related applications, and cultivate capable talent in related fields within the country.
Use hands-on practice to achieve a balance of theory and practice.
Experiment 1: Data analytics on a single machine. Students use data analysis tools (Python or R, Weka or Scikit-Learn) to observe everyday phenomena; the course provides four taxi-ride problems for students to work on.
Experiment 2: Big Data analytics on a Big Data platform. Students use Java, Scala, or Python to run Spark on the Hadoop platform to process big data. Students first implement the "word count" example program as an exercise, then modify it and apply it to the Experiment 1 problems for deeper study and discussion.
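
The "word count" exercise named in Experiment 2 can be sketched in plain Python to show the map, shuffle, and reduce steps before moving to a cluster. This is only an illustrative stand-in, not the course's assignment code: it runs on one machine without Hadoop or Spark, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by word, as a framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values to get one total per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Made-up sample input, used only to exercise the functions above.
    sample = ["big data big platform", "data platform data"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a Spark cluster the same three steps correspond roughly to the RDD operations flatMap, map, and reduceByKey, with the grouping handled by the framework rather than by hand.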

Reference Books

https://link.springer.com/bookseries/11970 (Big Data)
https://link.springer.com/search?facet-series=%2211970%22&facet-content-type=%22Book%22
https://link.springer.com/book/10.1007/978-3-030-01566-4
https://link.springer.com/book/10.1007/978-3-319-91815-0 (Text Mining)
https://link.springer.com/book/10.1007/978-981-13-0550-4 (Spark)
https://link.springer.com/book/10.1007/978-3-030-03359-0

Grading

Grading Item	Percentage	Description
Attendance and discussion	20
Homework	60
Final exam and final group project	20

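Course Information

Basic Information

Course Code: 5695
Credit: 0-3
Course Time: Saturday/2,3,4[ST023]
Teacher: 楊朝棟
Class: Computer science undergraduates (資工系), year 4; master's students, years 1 and 2
Memo: One of the three major tracks: Big Data. Open to fourth-year undergraduates. Equivalent to "Introduction to Big Data" (巨量資料導論) in the Big Data master's credit program.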

Enrollment Status

Current Enrollment: 40

Tunghai University course information site for exchange students