Department of Computer Science and Information Engineering
Course information for academic year 108, semester 2 | 5695 Big Data Technology Platform and Applications

5695 - Big Data Technology Platform and Applications


Course Objectives

"Big Data" has made huge waves in our daily lives. Following Cloud Computing, it has become one of the hottest buzzwords in academia and the technology industry; it seems that every company is conducting related research and cannot go three sentences without mentioning big data. In the big data era, statistics and data analysis are the most fundamental skills of all. The expertise of Data Scientists and Quantitative Analysts spans statistics, computer science, and mathematics. In the past, such talent rushed to work on Wall Street, but thanks to the wave brought by Big Data, industries of every kind are now looking for engineers and data experts with backgrounds in quantitative analysis and statistics. This course provides hands-on experience with a multi-host Hadoop distributed cluster, using HDFS for distributed storage and MapReduce for cluster computation to process and analyze Big Data.
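As a rough illustration of what "HDFS distributed storage" means in practice: HDFS splits each file into fixed-size blocks and stores several replicas of every block on different hosts. The sketch below does that arithmetic, assuming the Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a given cluster may be configured differently, and the helper names are hypothetical.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumption: default dfs.blocksize in Hadoop 2.x/3.x
REPLICATION = 3         # assumption: default dfs.replication


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file occupies; the final block may be only partly filled."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    return -(-file_size // block_size)  # integer ceiling division


def storage_plan(file_size: int,
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> dict[str, int]:
    """Hypothetical helper: blocks, replicated block copies, and raw disk usage for one file."""
    blocks = block_count(file_size, block_size)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # A partial last block only uses its actual size on disk,
        # so raw usage is the file size times the replication factor.
        "raw_bytes": file_size * replication,
    }


if __name__ == "__main__":
    # A 300 MiB file needs 3 blocks (128 + 128 + 44 MiB), i.e. 9 block replicas.
    print(storage_plan(300 * MIB))
```

Spreading those replicas across hosts is what lets MapReduce or Spark tasks run next to the data and survive the loss of a single machine.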
Students will learn key technologies including the Hadoop storage system and resource management framework, as well as Spark for in-memory processing of massive data. Data analysis software and a programming language (Python or R) serve as the entry-level foundation for big data analysis, which should prepare students well for entering the big data field.

In Big Data Technology Platform and Applications, students will survey today's most popular big data technologies and platforms and practice with related open-source frameworks, learning the fundamental principles and implementation techniques that match current and future development trends. Covering everything from cloud-based services to building big data software environments and implementing applications, the course aims to combine theory and practice. Its objectives are to:

Help students understand and become familiar with the operation, application, and implementation of representative big data analysis techniques.
Help students understand the principles and architecture of common big data computing platforms, and build and operate them in practice.
Help students become familiar with popular topics related to big data, such as High Performance Computing, Machine Learning, Cloud Computing, and Data Mining.
Raise students' interest in big data analysis techniques and related applications, and cultivate qualified domestic talent in related fields.
Combine theory and practice through hands-on work.

Experiment 1: Data analytics on a single machine. Students use big data analysis tools (Python or R, with Weka or Scikit-Learn) to observe everyday phenomena; the course provides four taxi-ride problems for students to work on.

Experiment 2: Big Data analytics on a Big Data platform. Students use Java, Scala, or Python to run Spark on the Hadoop platform to process big data.
As an exercise, students must implement the "word count" sample program, then modify it and apply it to the Experiment 1 problems for more in-depth study and discussion.
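For orientation, the "word count" exercise follows the map, shuffle, and reduce pattern. The sketch below simulates those three phases on a single machine in plain Python; it is not the course's reference solution, and lowercasing and whitespace tokenization are assumptions. On the cluster, the same pipeline is commonly written with the PySpark RDD API as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `path` stands for an input file on HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token (lowercased by assumption)."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: sum all partial counts for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    text = ["big data big platform", "Spark on Hadoop", "big data"]
    print(word_count(text))
    # → {'big': 3, 'data': 2, 'platform': 1, 'spark': 1, 'on': 1, 'hadoop': 1}
```

Adapting the program to the Experiment 1 questions usually means changing only the map step, for example emitting a different key per input record, while the shuffle and reduce steps stay the same.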


Reference Books

https://link.springer.com/bookseries/11970 (Big Data)
https://link.springer.com/search?facet-series=%2211970%22&facet-content-type=%22Book%22
https://link.springer.com/book/10.1007/978-3-030-01566-4
https://link.springer.com/book/10.1007/978-3-319-91815-0 (Text Mining)
https://link.springer.com/book/10.1007/978-981-13-0550-4 (Spark)
https://link.springer.com/book/10.1007/978-3-030-03359-0




Grading

Attendance and discussion: 20%
Homework: 60%
Final exam and final group project: 20%




Course Information


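Credit: 0-3
Course Time: Saturday, periods 2, 3, 4 [ST023]
Teacher: 楊朝棟
Class: CSIE 4th-year undergraduates; Master's 1st and 2nd year
Memo: Big data and data science industry track; distance-learning course; open to 4th-year undergraduates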

Enrollment Status

32 students are currently enrolled in the class.
