1. The course introduces models and computational methods that are actually used in molecular biology. Students experience firsthand the complete modeling process by which mathematical models and algorithms, such as graph theory, probability theory, statistical methods, and data structures, are applied to molecular biology.
2. Bioinformatics is an interdisciplinary field, and the best time to begin studying an interdisciplinary field is during the undergraduate years. This course is an introduction to bioinformatics; afterwards, students can take related courses in various departments to lay a foundation for further development.
This is a necessary step in interdisciplinary learning: to cross into another field, one must actually step into it.
3. Each topic starts from a real problem in molecular biology, from which a well-defined problem is formulated. We then build a model for that problem, propose a computational method (an algorithm), and analyze its computational complexity. Finally, based on the computed results, we assess how well they answer the original molecular biology problem.
4. Finding efficient computational methods has always been a challenge for bioinformatics. Therefore, after the first week explains what bioinformatics is, the course gives a brief introduction to computational complexity and NP-completeness.
5. The second week gives a brief introduction to molecular biology.
6. The first two weeks are the add/drop period. So that students who enroll late do not fall behind, the main bioinformatics topics of the course begin only in the third week.
7. Over the past 50 years, computer scientists have found that although different fields solve different problems, the algorithms they propose rest on similar principles; in other words, there are not many basic techniques. Another main thread of this course is to introduce these basic algorithmic techniques.
8. The teaching schedule will be adjusted around holidays.
What is "computational biology" (also called bioinformatics)?
DNA is composed of the four letters a, t, c, and g. The following is an example of a DNA sequence:
atgcactctt caatagtttt ggccaccgtg ctctttgtag cgattgcttc agcatcaaaa acgcgagagc tatgcatgaa atcgctcgag catgccaagg ttggcaccag caaggaggcg
(By convention, the letters are written in groups of 10, with a space between groups. This example has 120 letters, so we say its length is 120.)
Human DNA has a total length of 3 billion letters, and these 3 billion letters determine a person. The main goal of the Human Genome Project, which began in 1988, was to write out these 3 billion letters. How the letters work remains to be understood. We call the secrets of life hidden in these letters biological information.
DNA produces proteins, which build living organisms. A protein is composed of 20 letters of the English alphabet, each representing one amino acid, and its length ranges from tens to hundreds of letters. The following is an example of a protein sequence:
mhssivlatv lfvaiasask trelcmksle hakvgtskea kqdgidlykh mfehypamkk yfkhrenytp advqkdpffi kqgqnillac hvlcatyddr etfdayvgel marherdhvk
Humans have about 20,000 different proteins. How these proteins build life remains to be understood. The secrets hidden in these letters are also biological information.
Studying how DNA works and how proteins build life, that is, studying biological information, is the goal of today's rapidly developing life sciences.
A "biological sequence" is a DNA sequence or a protein sequence.
Developing effective methods (algorithms or models) for analyzing biological sequences, and using computers as tools to mine the biological information hidden in large numbers of letters, is what we call "computational biology" or "bioinformatics".
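The idea of treating DNA and proteins as strings of letters can be made concrete with a small Python sketch. It removes the grouping spaces from the example DNA sequence, checks its length and alphabet, and translates it three letters at a time. The helper names (`ungroup`, `uses_only`, `translate`) are hypothetical, and the use of the standard genetic code is an assumption of this sketch; the course description itself does not specify how the two example sequences are related.

```python
# Example sequences copied from the course description (groups of 10 letters).
DNA = (
    "atgcactctt caatagtttt ggccaccgtg ctctttgtag cgattgcttc agcatcaaaa "
    "acgcgagagc tatgcatgaa atcgctcgag catgccaagg ttggcaccag caaggaggcg"
)
# First four groups of the example protein sequence.
PROTEIN_PREFIX = "mhssivlatv lfvaiasask trelcmksle hakvgtskea"


def ungroup(grouped: str) -> str:
    """Remove the spaces that separate the groups of 10 letters."""
    return "".join(grouped.split())


def uses_only(seq: str, alphabet: str) -> bool:
    """Return True if every letter of seq belongs to alphabet."""
    return set(seq) <= set(alphabet)


# Assumption: the standard genetic code, written in the usual t, c, a, g
# codon order. '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG".lower()
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring leftover letters."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    dna = ungroup(DNA)
    print(len(dna))                 # length of the DNA sequence
    print(uses_only(dna, "acgt"))   # alphabet check
    print(translate(dna))           # amino-acid letters under the assumed code
```

Under the assumed genetic code, the 120 DNA letters yield 40 amino-acid letters, and these coincide with the first 40 letters of the example protein sequence.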
Textbook:
An Introduction to Bioinformatics Algorithms
Neil C. Jones and Pavel A. Pevzner
2004, MIT Press
References:
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
Dan Gusfield
1997, Cambridge University Press
2. Biological Sequence Analysis
R. Durbin et al.
1998, Cambridge University Press
3. Computers and Intractability
Michael R. Garey and David S. Johnson
1979, W. H. Freeman and Company
4. Bioinformatics for Biologists
Pavel A. Pevzner et al.
2011, Cambridge University Press
| Grading Method | Grading Percentage | Description |
|---|---|---|