In the era of big data, everyone is talking about "big data." But what is the essence of this era?
Seemingly overnight, "big data" has become a household term. Emerging and traditional industries alike are preparing to "embrace big data," hoping to find treasure in it. But what exactly is big data? Is it old wine in a new bottle, or a genuine technological revolution?
Traditional database technology cannot handle data that are massive, fast-growing, and highly varied, so new processing models have emerged to fill the gap. Big data is an innovation and, more than that, a revolution: massive data can be used not only for statistical analysis but also to generate "intelligence."
Big data improves on traditional data analysis in at least three ways:
(1) The volume of data is so large that it nearly equals the population, revealing fine details that traditional sample surveys cannot capture.
(2) Data accumulate quickly, so emerging opportunities can be identified in real time.
(3) Integrating non-traditional data sources (such as the Internet, multimedia, and text) reveals information that was previously unavailable.
How does statistics differ from big data? The discussion focuses on how to interpret the results of big data analysis:
(1) Which population do the data actually represent? Is it the population originally intended?
(2) Are the results of big data analysis accurate? What should be done when they are wrong, and how large is the impact?
(3) Is the quantification of non-traditional information correct? What are the consequences of misjudgment? The answers seem to lie in statistical theory.
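Question (1) can be made concrete with a small simulation. The sketch below is an illustration only, not part of the course material. It assumes a hypothetical population of 100,000 normally distributed values and a hypothetical recording mechanism in which values above 50 are twice as likely to be captured as values below 50. The function name `simulate` and every number in it are assumptions chosen for the example.

```python
import random
import statistics


def simulate(seed=0, population_size=100_000, srs_size=500):
    """Compare a large self-selected data set with a small random sample.

    All settings are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population: values centered at 50 with spread 10.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # "Big data": a unit is recorded with probability 0.8 if its value
    # exceeds 50 and 0.4 otherwise, so the recorded units are not a
    # random subset of the population.
    big_data = [y for y in population
                if rng.random() < (0.8 if y > 50 else 0.4)]

    # Traditional survey: a simple random sample of srs_size units.
    srs = rng.sample(population, srs_size)

    big_mean = statistics.fmean(big_data)
    srs_mean = statistics.fmean(srs)
    return {
        "true_mean": true_mean,
        "big_n": len(big_data),
        "big_error": abs(big_mean - true_mean),
        "srs_error": abs(srs_mean - true_mean),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the large data set covers roughly 60% of the population yet misses the true mean by more than the 500-unit random sample does, because its error comes from which units are recorded rather than from how many.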
Statistical methods allow us to draw inferences with guaranteed control of the error rate, which is why statistical analysis has become an important basis for decision-making. In the early days of the big data wave, data processing technology seemed to eclipse everything else. Which population the data represent was rarely discussed, and statisticians, worried about being marginalized in this wave, began to voice their views.
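What controlling the error rate means in practice can be sketched with a simulation. The example below is illustrative and rests on stated assumptions: the data are normal with a known standard deviation of 1, the null hypothesis (mean 0) is true, and a two-sided z-test is run at the 5% level. The function name `false_positive_rate` and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def false_positive_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Estimate how often a level-alpha z-test rejects a true null.

    Assumes normal data with known standard deviation 1 and true mean 0.
    """
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 when alpha is 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for H0: mean = 0 with known sigma = 1.
        z = fmean(sample) * math.sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"observed false positive rate: {false_positive_rate():.3f}")
```

With these settings the observed rejection rate should land close to the nominal 5%, which is the sense in which the procedure keeps the error rate under control.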
As the big data era unfolds, people attach growing importance to data, and the theories and methods for studying and analyzing data grow ever richer. Statistics plays an increasingly important role and demonstrates its value across many fields. Whether it is super AI built on artificial intelligence technology, deep learning on massive data, or virtual reality realistic enough to pass for the real thing, statistics is the theoretical foundation on which they rest. In fact, the concept of data science was proposed as early as twenty to thirty years ago, but mainstream statisticians regarded analysis techniques as, at best, tools that assist statistical inference. In data analysis, statistics remains important, but it is no longer the main axis.
Taking a popular-science approach, this course starts from the viewpoint of applied probability, discusses the role played by statistical analysis models, and then moves into the domain of big data, including how statistics and big data have developed along similar and different lines while complementing each other. Finally, it discusses the benefits and drawbacks of the development of artificial intelligence.
Wang Bi and Mou Yun (王碧、牟昀), trans. (2013). 熵的神秘國度 [The Mysterious Realm of Entropy]. 天下文化出版社.
Wang Honglong (王鴻龍) (2016). 統計學在大數據時代的角色 [The Role of Statistics in the Big Data Era]. 主計月刊 727: 24-30.
Li Shuai (李帥) (2017). 世界是隨機的 [The World Is Random]. 清華大學出版社.
Hu Shouren (胡守仁), trans. (2009). 隨機法則-左右你我的命運和機會 [The Law of Randomness: What Governs Our Fate and Chances]. 天下文化書坊.
Feng Qisi (馮啟思) (2010). 像統計學家一樣思考 [Think Like a Statistician]. 高寶國際公司台灣分公司.
Huang Wenzhang (黃文璋) (2003). 隨機思考論 [On Random Thinking]. 華泰書局.
Liu Qiang (劉強), ed. (2018). 大數據時代的統計學思維 [Statistical Thinking in the Big Data Era]. 中國水利水電出版社.
Luo Yaozong (羅耀宗) (2014). 隨機騙局 [Fooled by Randomness]. 大塊文化.
Su Ziyao (蘇子堯), trans. (2019). 精準預測 [Precise Prediction]. 三采文化集團.
Grading Method | Grading Percentage | Description |
---|---|---|
Class participation | 30 | |
Midterm report | 20 | |
Final report | 50 | |