1. This semester, the course will select and study academic papers related to the Copy Number Transformation Problem.
2. The significance of the Copy Number Transformation Problem:
a. Genome rearrangement is a core problem in the study of both species evolution and cancer. Over the past 20 years, most computational work on this problem has addressed species evolution. There is now an excellent opportunity to study the evolution of cancer genomes computationally. Species evolution is measured in millions of years, with mutations passed from one generation to the next, whereas the mutations of a cancer within a single individual arise over only a few decades. The Cancer Genome Atlas project, begun in 2005, aims to collect cancer mutation data, which supports the computational study of cancer mutations.
b. Cancer is a dynamic process in which rapid mutation produces complex tumor genomes. Many of these mutations are deletions or amplifications of whole DNA segments, so the number of copies of each gene in the tumor genome (its copy number profile) changes continually and differs from that of the normal genome. Understanding these differences can help us predict disease progression and identify possible medical interventions.
c. One computational way to characterize the differences between the normal genome and the tumor genome at each stage is to define a distance between their copy number profiles. This concept has not yet been explored much. The papers on the Copy Number Transformation Problem selected for this course address it.
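To make the idea of a copy number profile concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the model used in the selected papers: a profile is taken to be a list of per-gene copy counts, one event adds or removes one copy across a contiguous segment, and a gene that has dropped to zero copies is assumed to stay at zero. The function name `apply_event` is hypothetical.

```python
def apply_event(profile, start, end, delta):
    """Return a new profile after one segmental event on profile[start:end].

    delta = +1 models an amplification, delta = -1 a deletion.
    Assumption for this sketch: a gene with zero copies stays at zero.
    """
    if delta not in (1, -1):
        raise ValueError("delta must be +1 (amplification) or -1 (deletion)")
    updated = list(profile)
    for i in range(start, end):
        if updated[i] > 0:
            updated[i] += delta
    return updated


# A normal diploid genome with six genes: two copies of each.
normal = [2, 2, 2, 2, 2, 2]

# Two events leading to a hypothetical tumor profile.
step1 = apply_event(normal, 1, 4, +1)   # amplify genes 1..3
tumor = apply_event(step1, 3, 6, -1)    # delete one copy of genes 3..5

print(step1)  # [2, 3, 3, 3, 2, 2]
print(tumor)  # [2, 3, 3, 2, 1, 1]
```

In this toy case two events turn the normal profile into the tumor profile. A distance between profiles could then be defined as the smallest number of such events needed, which this sketch does not attempt to compute.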
What is "computational biology" (also called bioinformatics)?
DNA is written with four letters: a, t, c, and g. The following is an example of a DNA sequence:
atgcactctt caatagtttt ggccaccgtg ctctttgtag cgattgcttc agcatcaaaa acgcgagagc tatgcatgaa atcgctcgag catgccaagg ttggcaccag caaggaggcg
(By convention, the letters are written in groups of 10 separated by a space. This example has 120 letters, so we say its length is 120.)
Human DNA has a total length of about 3 billion letters, and these 3 billion letters determine a person. The main goal of the Human Genome Project, whose planning began in 1988 and which formally launched in 1990, was to write out these 3 billion letters. How the letters actually work remains to be understood. We call the secrets of life hidden in these letters biological information.
DNA directs the production of proteins, which build living organisms. A protein is written with 20 letters, each standing for one amino acid, and is typically tens to hundreds of letters long. The following is an example of a protein sequence:
mhssivlatv lfvaiasask trelcmksle hakvgtskea kqdgidlykh mfehypamkk yfkhrenytp advqkdpffi kqgqnillac hvlcatyddr etfdayvgel marherdhvk
Humans have about 20,000 different proteins. How these proteins build life also remains to be understood, and the secrets hidden in their letters are likewise biological information.
Studying how DNA works and how proteins build life, that is, studying biological information, is the aim of today's rapidly growing life sciences.
A "biological sequence" is a DNA sequence or a protein sequence. Proposing effective methods of biological sequence analysis (algorithms or models) and using computers to mine the biological information hidden in large numbers of letters is what we call "computational biology" or "bioinformatics".
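As a small illustration of treating sequences as strings, the following Python sketch removes the spaces from the grouped DNA example, checks that its length is 120, and translates it with the standard genetic code. The helper names `ungroup` and `translate` are hypothetical, and reading the DNA in its first reading frame is an assumption made for this illustration rather than something stated in the course text.

```python
from itertools import product

# Standard genetic code, listed in the conventional t, c, a, g base order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}

DNA_GROUPED = (
    "atgcactctt caatagtttt ggccaccgtg ctctttgtag cgattgcttc agcatcaaaa "
    "acgcgagagc tatgcatgaa atcgctcgag catgccaagg ttggcaccag caaggaggcg"
)


def ungroup(grouped):
    """Remove the spaces that separate the groups of 10 letters."""
    return "".join(grouped.split())


def translate(dna):
    """Translate DNA codon by codon, starting at the first letter."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE[codon] for codon in codons).lower()


dna = ungroup(DNA_GROUPED)
protein = translate(dna)

print(len(dna))   # 120
print(protein)    # mhssivlatvlfvaiasasktrelcmkslehakvgtskea
```

Under this assumption, the 120 DNA letters yield 40 amino acid letters, which coincide with the first four groups of the protein example.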
Academic papers
Grading Method | Grading percentage | Description |
---|---|---|
Midterm exam | 40 | |
Final exam | 40 | |
Classroom performance | 20 | |