Department of Computer Science and Information Engineering (資訊工程學系)
Course information for 107-2 | 5695 Big Data Technology Platform and Applications (大數據技術平台與應用)

5695 - 大數據技術平台與應用 Big Data Technology Platform and Applications


Course Objectives

Big Data has made enormous waves in our lives. Following cloud computing, it has become the hottest buzzword in academia and the technology industry; seemingly every company is conducting related research and cannot go three sentences without mentioning big data. In the big data era, statistics and data analysis are the most fundamental skills. The expertise of a data scientist or quantitative analyst spans statistics, computer science, and mathematics. In the past, people with these skills competed for jobs on Wall Street; thanks to the big data trend, every industry is now looking for engineers and data specialists with backgrounds in quantitative analysis and statistics.

In this course, students gain hands-on experience with a multi-host distributed Hadoop cluster, using HDFS for distributed storage and MapReduce for cluster computing to process and analyze big data. Students learn the Hadoop storage system and resource management framework, along with the key technologies behind Spark in-memory big data processing. Data analysis software and programming languages (Python or R) serve as the introductory foundation for big data analysis, so that entering the big data field should be well within reach.

The course introduces students to the most popular current big data technologies and platforms and uses the relevant open-source frameworks for hands-on learning, so that students learn the fundamental principles and implementation techniques that match current and future trends. From moving services to the cloud to building big data software environments and implementing applications, the course aims to combine theory and practice.

The course has the following goals:
Help students understand and become familiar with the operation, application, and implementation of representative big data analysis techniques.
Help students understand and become familiar with the principles and architecture of common big data computing platforms, and build and operate them in practice.
Help students understand and become familiar with popular big data topics, such as high performance computing, machine learning, cloud computing, and data mining.
Increase students' interest in big data analysis techniques and related applications, and cultivate capable domestic talent in related fields.
Use hands-on work to combine theory and practice.

Experiment 1: Data analytics on a single machine. Students use big data analysis tools (Python or R, Weka or Scikit-Learn) to observe everyday phenomena. The course provides four taxi-ride problems for students to implement.

Experiment 2: Big Data analytics on a Big Data platform. Students use Java, Scala, or Python to run Spark on the Hadoop platform to process big data. Students are required to implement the "word count" example program as practice, then modify it and combine it with the Experiment 1 problems for more in-depth study and discussion.
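
To make the "word count" exercise concrete, the following is a minimal single-machine sketch of the MapReduce pattern in plain Python. It is not the course's reference solution and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, chosen only to show the three conceptual steps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted counts by their word (the key)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, not course data.
    sample = ["big data big ideas", "data platforms"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a cluster, the framework runs the map and reduce steps in parallel across machines and performs the shuffle over the network. In PySpark the same three steps are commonly expressed with the RDD methods `flatMap`, `map`, and `reduceByKey`.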


Reference Books

https://link.springer.com/bookseries/11970 (Big Data)
https://link.springer.com/search?facet-series=%2211970%22&facet-content-type=%22Book%22
https://link.springer.com/book/10.1007/978-3-030-01566-4
https://link.springer.com/book/10.1007/978-3-319-91815-0 (Text Mining)
https://link.springer.com/book/10.1007/978-981-13-0550-4 (Spark)
https://link.springer.com/book/10.1007/978-3-030-03359-0


Grading

Grading item                          Percentage (%)   Description
Attendance and discussion             20
Assignments                           60
Final exam and final group project    20

Course Information

Description

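Credit: 0-3
Course Time: Saturday/2,3,4 [ST023]
Teacher: 楊朝棟
Class: Computer Science and Information Engineering undergraduates, year 4; master's students, years 1 and 2
Memo: Three major fields: Big Data; open to fourth-year undergraduates. Equivalent to the Big Data master's credit program course Introduction to Big Data (巨量資料導論).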

Enrollment Status

There are currently 40 students enrolled in the class.
