Department of Statistics
Course information for 109-2 | 6191 Probability and Statistical Thinking (大數據時代下的機率統計思維)

6191 - 大數據時代下的機率統計思維 Probability and Statistical Thinking


Course Objectives

In the era of big data, everyone is talking about "big data." But what is the essence of the big data era? Seemingly overnight, "big data" became a household term. Emerging and traditional industries alike are preparing to "embrace big data" and hope to find treasure in it.

How do statistics and big data differ, and how should the results of big data analysis be interpreted? These questions are the focus of the discussion: (1) Which population does the big data represent, and is it the population originally intended? (2) Are the results of big data analysis accurate? What happens if they are wrong, and how large is the impact? (3) Is the quantification of non-traditional information correct, and what are the consequences of a misjudgment? The answers seem to lie in statistical theory. By analyzing data with statistical methods, we make inferences with guaranteed control of the error rate, and statistical analysis becomes an important basis for decision-making.

In the big data era, people place ever more importance on data, the theories and methods for studying and analyzing data grow ever richer, and statistics plays an increasingly prominent role, demonstrating practical value in every field. Whether it is super AI built on artificial intelligence technology, deep learning on massive data, or virtual reality that can pass for the real thing, statistics is the theoretical foundation on which they rest.

This course takes a popular-science approach. It starts from the viewpoint of applied probability, discusses the role played by statistical models, and then moves into the realm of big data, including how statistics and big data have developed with both similarities and differences while complementing each other. Finally, it discusses the pros and cons of the development of artificial intelligence.


Course Description

In the era of big data, everyone is talking about "big data." But what is the essence of the big data era? Seemingly overnight, "big data" became a household term. Emerging and traditional industries alike are preparing to "embrace big data" and hope to find treasure in it. But what exactly is big data? Is it old wine in a new bottle, or a technological revolution? Traditional database technology cannot handle big data that is massive, fast-growing, and diverse, so revolutionary new processing models have emerged. Big data is an innovation and, even more, a revolution: massive data can be used not only for statistical analysis but also to generate "intelligence."

Where exactly is big data better than traditional data analysis? At least three aspects can be observed: (1) The volume of data is large, almost equal to the population, so it reveals fine details that traditional sample surveys cannot. (2) Data accumulates quickly, so opportunities can be shown in real time. (3) It brings together non-traditional data sources (such as the web, multimedia, and text), revealing information never seen before.

How do statistics and big data differ, and how should the results of big data analysis be interpreted? These questions are the focus of the discussion: (1) Which population does the big data represent, and is it the population originally intended? (2) Are the results of big data analysis accurate? What happens if they are wrong, and how large is the impact? (3) Is the quantification of non-traditional information correct, and what are the consequences of a misjudgment? The answers seem to lie in statistical theory.

By analyzing data with statistical methods, we make inferences with guaranteed control of the error rate, and statistical analysis becomes an important basis for decision-making. At the start of the big data wave, data processing technology seemed to eclipse everything else. Which population the big data represents was rarely mentioned, and statisticians, worried about being marginalized in this wave, put forward their own views.

In the big data era, people place ever more importance on data, the theories and methods for studying and analyzing data grow ever richer, and statistics plays an increasingly prominent role, demonstrating practical value in every field. Whether it is super AI built on artificial intelligence technology, deep learning on massive data, or virtual reality that can pass for the real thing, statistics is the theoretical foundation on which they rest. In fact, the concept of data science was proposed as early as twenty or thirty years ago, but mainstream statisticians regarded analytical techniques as at most tools that assist statistical inference. As far as data analysis is concerned, statistics retains its importance but is no longer the main axis.

This course starts from the viewpoint of applied probability, discusses the role played by statistical models, and then moves into the realm of big data, including how statistics and big data have developed with both similarities and differences while complementing each other. Finally, it discusses the pros and cons of the development of artificial intelligence.


Reference Books

王碧、牟昀譯 (2013),熵的神秘國度,天下文化出版社
王鴻龍 (2016),統計學在大數據時代的角色,主計月刊727:24-30
李帥 (2017),世界是隨機的,清華大學出版社
胡守仁譯(2009),隨機法則-左右你我的命運和機會,天下文化書坊
馮啟思(2010),像統計學家一樣思考,高寶國際公司台灣分公司
黃文璋 (2003),隨機思考論,華泰書局
劉強編著 (2018),大數據時代的統計學思維,中國水利水電出版社
羅耀宗 (2014),隨機騙局,大塊文化
蘇子堯譯 (2019),精準預測,三采文化集團

Wang Bi and Mou Yun, trans. (2013), The Mysterious Realm of Entropy, Tianxia Wenhua Publishing
Wang Honglong (2016), The Role of Statistics in the Era of Big Data, Zhuji Yuekan (monthly journal) 727: 24-30
Li Shuai (2017), The World Is Random, Tsinghua University Press
Hu Shouren, trans. (2009), The Law of Randomness: What Governs Our Fate and Chances, Tianxia Wenhua Bookstore
Feng Qisi (2010), Thinking Like a Statistician, Gaobao International, Taiwan Branch
Huang Wenzhang (2003), On Thinking in Terms of Randomness, Huatai Book Company
Liu Qiang, ed. (2018), Statistical Thinking in the Era of Big Data, China Water Conservancy and Hydropower Press
Luo Yaozong (2014), The Randomness Hoax, Dakuai Culture
Su Ziyao, trans. (2019), Accurate Prediction, Sancai Culture Group


Grading

Grading Item           Percentage   Description
Class Participation    30
Midterm Report         20
Final Report           50


Course Information

Description

Credit: 0-3
Course Time: Thursday, periods 2, 3, 4 [M438]
Teacher: 林正祥
Class: Statistics master's program, years 1 and 2

Enrollment Status

There are currently 3 students enrolled in the class.
