Searching Bioinformatics Databases
Lutfan Lazuardi
The spectrum and hierarchy of health informatics
  Clinical Informatics: individual patients
  Public Health Informatics: populations
  Imaging Informatics: tissues, organs
  Bioinformatics: molecular, cellular
Adapted from Shortliffe
What is bioinformatics?
The discipline concerned with acquiring, processing, storing, distributing, and analyzing biological and medical information.
In the broad sense: any research on biological processes that uses computers.
In the narrow sense: computer-based analysis of the sequence data of macromolecular structures.
Bioinformatics bridges many disciplines
[Figure: ring of disciplines and -omics fields around bioinformatics, including statistics, genomics, transcriptomics, metabolomics, mathematics, molecular modeling, proteomics, evolution, and biology]
Bioinformatics combines biology, medicine, chemistry, mathematics, statistics, and computer science to understand the biological processes of life.
Kolaskar, 2003
The relationship between biology and technology (adapted from Gibas, 2003)
[Figure: subject areas arranged from sequence to physiology (and beyond), each paired with experimental and computational approaches]
  Subject areas: DNA sequence; gene & genome; molecular evolution; protein structure, folding, function & interaction; metabolic pathways, regulation, signaling networks; physiology & cell biology; interspecies interaction; ecology & environment
  Experimental (hardware & instrumentation): genome sequencing; proteomics; functional genomics (microarrays, 2D-PAGE, etc.); high-tech field ecology
  Computation (information technology; mathematical & physical models): genomic data analysis; statistical genetics; protein structure prediction, dynamics, folding & design; data standards, representations & analytical tools for complex biological data; computational ecology; dynamical systems modeling
  Methodology & expertise
Biological research in the 21st century
"The new paradigm, now emerging, is that all the 'genes' will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical."
- Walter Gilbert
BIOINFORMATICS
The Pyramid of Life
  1,400 chemicals: metabolomics
  10,000 proteins: proteomics
  30,000 genes: genomics
Wishart (2004)
Entrez: Neighbors and Hard Links
[Figure: linked Entrez databases and the method used to compute neighbors in each]
  PubMed abstracts: word weight
  Taxonomy: phylogeny
  3-D structure: VAST
  Nucleotide sequences: BLAST
  Protein sequences: BLAST
  Genomes
Source: NCBI
Sequence data
  Diagnosis
  New drug discovery
  Hypothesis-driven research
  Vaccine development
Traditional research
  One gene per experiment
  Time-consuming
  Laborious
  Limited results
  In vitro, in vivo, ex vivo
NGS, microarray
  Fast tracking
  Thousands of genes per experiment
  Function of genes, either alone or in interaction with others
  In silico (in algorithmo)
Example databases
  Nucleotide Database (GenBank)
  BLAST (Basic Local Alignment Search Tool)
  Protein Sequence Database
  Protein Structure Database (PDB)
  Genome Database
  Microarray Database
  Metabolic Pathway and Protein Function Database
Example data types
  Nucleotide/protein sequence
  Gene expression level
GenBank
  A sequence database
  An annotated collection of DNA sequences
  171,744,486 sequences (April 2014)
  Sequence data come from direct submissions by scientists/authors
  GenBank is designed to provide the scientific community with the most up-to-date sequence information
GenBank data sources
  Direct submissions from individual researchers through forms (BankIt, Sequin)
  Batch submissions by email (EST, GSS, STS)
  Submissions through an FTP (File Transfer Protocol) account
  Data from three collaborating databases:
    GenBank
    DNA Database of Japan (DDBJ)
    European Molecular Biology Laboratory Database (EMBL)
Primary vs. secondary databases
Primary databases
  Original submissions by experimentalists
  Database staff organize but don't add additional information
  Example: GenBank
Derivative (secondary) databases
  Human-curated: compilation and correction of data
    Example: SWISS-PROT, NCBI RefSeq mRNA
  Computationally derived
    Example: UniGene
Chattopadhyay, 2007
File formats
GenBank Flatfile (GBFF)
  Header
  Features
  Sequence
FASTA format
  The description line begins with the ">" character
  It is followed by the sequence data
  The sequence may be protein or DNA
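The FASTA layout described above (a ">" description line followed by sequence lines) can be read with a few lines of Python. This is a minimal sketch using only the standard library; the function name `parse_fasta` and the two example records are made up for illustration and do not come from GenBank.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping description -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A ">" line starts a new record; keep the text after ">" as the key.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical records, for illustration only.
example = """>seq1 hypothetical DNA record
ATGCGTAC
GGTA
>seq2 hypothetical protein record
MKTAYIAK
"""

records = parse_fasta(example)
for description, sequence in records.items():
    print(description, len(sequence))
```

For real work, an established parser such as the one in Biopython is usually a safer choice than a hand-written one.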
Example analysis
Tung YC, Lin KH, Chang K, Ke LY, Ke GM, Lu PL, Lin CY, Chen YH, Chiang HC. Phylogenetic study of dengue-3 virus in Taiwan with sequence analysis of the core gene. Kaohsiung J Med Sci. 2008 Feb;24(2):55-62. doi: 10.1016/S1607-551X(08)70098-6.
URL: http://www.sciencedirect.com/science/article/pii/s1607551X08700986
Similarity analysis (BLAST)
Primer design
Sequence comparison
Multiple alignment
Phylogenetic analysis
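To give a feel for what sequence comparison computes, the sketch below scores the best local alignment between two sequences with the Smith-Waterman dynamic-programming recurrence. This is not BLAST itself: BLAST is a fast heuristic for the same kind of local alignment problem. The scoring values (match 2, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences sharing the motif "ACGT".
print(local_alignment_score("GGACGTCC", "TTACGTAA"))
```

A higher score means the two sequences share a longer or better-conserved region under the chosen scoring scheme.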
Phylogenetic analysis
High-density oligonucleotide human genome array: GeneChip U133 Plus 2.0 (Affymetrix)
This chip comprises more than 54,000 probe sets and analyzes the expression level of over 47,000 transcripts and variants, including 38,500 well-characterized human genes.
Source: Affymetrix
Microarray assay life cycle
Biological question -> Sample preparation -> Microarray hybridization -> Microarray detection -> Data analysis -> (back to) Biological question
Microarray data processing
Microarray chips -> Images scanned by laser -> Datasets -> Data mining and analysis -> Prediction for a new sample

Expression values from one chip:
  Gene             Value
  D26528_at          193
  D26561_cds1_at     -70
  D26561_cds2_at     144
  D26561_cds3_at      33
  D26579_at          318
  D26598_at         1764
  D26599_at         1537
  D26600_at         1204
  D28114_at          707

Datasets:
  Class  Sno  D26528  D63874  D63880
  ALL      2     193    4157     556
  ALL      3     129   11557     476
  ALL      4      44   12125     498
  ALL      5     218    8484    1211
  AML     51     109    3537     131
  AML     52     106    4578      94
  AML     53     211    2431     209

Source: Yuki Juan (2003)
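The "prediction for a new sample" step can be illustrated with a nearest-centroid classifier trained on the seven ALL/AML rows of the dataset table. This is only one possible approach and is not necessarily the method used in the cited source; the new sample's values are hypothetical, and the raw values are left unscaled for brevity.

```python
from math import dist
from statistics import mean

# Expression values for genes D26528, D63874, D63880, copied from the slide.
training = {
    "ALL": [(193, 4157, 556), (129, 11557, 476), (44, 12125, 498), (218, 8484, 1211)],
    "AML": [(109, 3537, 131), (106, 4578, 94), (211, 2431, 209)],
}


def centroids(data):
    """Average each gene's value within each class."""
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in data.items()}


def predict(sample, data):
    """Assign the sample to the class whose centroid is closest (Euclidean)."""
    cents = centroids(data)
    return min(cents, key=lambda label: dist(sample, cents[label]))


new_sample = (150, 3000, 150)  # hypothetical values, not from the slide
print(predict(new_sample, training))
```

Because the values are unscaled, the gene with the largest range (D63874) dominates the distance; in practice the data would normally be normalized first.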
Example scheme for microarray data analysis
Microarray -> Raw data
Preprocessing
  Normalization
  Filtering steps
Gene expression profiles
High-level analysis
  Statistical test: t-test/ANOVA (analysis of variance)
  Cluster analysis
  PTM (Pavlidis template matching)
Validation -> Genes of interest -> Biological process/function/pathway
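As a rough illustration of the statistical test step in this scheme, the sketch below log2-transforms expression values and computes a t statistic for one gene between two groups. It uses Welch's unequal-variance form of the t-test, which is an assumption since the slide only says "t-test", and the expression values are invented. Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_transform(values, floor=1.0):
    """Log2-transform expression values, clamping values below `floor`."""
    return [log2(max(v, floor)) for v in values]


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = variance(group_a), variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical expression values of one gene in two groups of samples.
group_1 = log2_transform([1200, 1500, 1350, 1420])
group_2 = log2_transform([300, 280, 350, 410])
print(round(welch_t(group_1, group_2), 2))
```

In a real analysis this test is repeated for every gene on the chip, so the resulting p-values need a multiple-testing correction.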
Example analysis
Lazuardi L, Herndler-Brandstetter D, Brunner S, Laschober GT, Lepperdinger G, Grubeck-Loebenstein B. Microarray analysis reveals similarity between CD8+CD28- T cells from young and elderly persons, but not of CD8+CD28+ T cells. Biogerontology. 2009 Apr;10(2):191-202. doi: 10.1007/s10522-008-9167-1. Epub 2008 Aug 27.
URL: http://link.springer.com/article/10.1007%2fs10522-008-9167-1
Example of hierarchical cluster analysis
[Figure, panel A: heat map of expression levels for gene clusters 1-21 across samples Y1_28P, Y2_28P, O1_28P, O2_28P, Y1_28N, Y2_28N, O1_28N, O2_28N. Panel B: dendrogram of the gene clusters by linkage distance]
  Y1_28P & Y2_28P: CD8+CD28+ T cells from young persons
  O1_28P & O2_28P: CD8+CD28+ T cells from elderly persons
  Y1_28N & Y2_28N: CD8+CD28- T cells from young persons
  O1_28N & O2_28N: CD8+CD28- T cells from elderly persons
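The idea behind hierarchical clustering can be sketched as a simple agglomerative procedure: start with every profile in its own cluster and repeatedly merge the two closest clusters. The code below uses average linkage with Euclidean distance, both of which are assumptions since the slide does not state the linkage or distance used, and the four two-value profiles are invented for illustration.

```python
from math import dist


def average_linkage(c1, c2, points):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical expression profiles (two measurements per profile).
profiles = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9)]
print(agglomerate(profiles, 2))
```

Recording the linkage distance at each merge is what produces the dendrogram shown in figures of this kind.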
Classification, function and pathway analysis (pantherdb.org)
Links
  GenBank: https://www.ncbi.nlm.nih.gov/genbank/
  Protein database: http://www.wwpdb.org/ and http://www.rcsb.org/pdb/home/home.do
  KEGG Pathway database: http://www.genome.jp/kegg/genes.html
Thank you
lutfanl@yahoo.com