Tuesday, January 30, 2007

Resources for Genomics and Microarray Analysis

Genomics and Microarrays:
Stanford Microarray Database:
stores large amounts of raw and normalized data from microarray experiments

Genomics tutorial at Genome Canada
http://www.genomecanada.ca/xpublic/dnaBasics/index.asp?l=e

Introduction to microarrays at NCBI
http://www.ncbi.nlm.nih.gov/About/primer/microarrays.html

Microarray (movie)
http://www.broad.harvard.edu/chembio/lab_schreiber/anims/videos/microarray.html

Other resources for microarrays
http://www.learner.org/channel/courses/biology/units/genom/images.html

Image analysis for microarrays
http://www.maths.usyd.edu.au/u/jeany/ (publication)
http://www.stat.berkeley.edu/users/terry/zarray/Talks/image/jpegindex.html
http://cmm.ensmp.fr/~angulo/research/dnamicro.htm

Bioconductor
The richest source of freely available packages for genomic data analysis

Nature article
The perspective of biologists facing heaps of noisy genomic data, including their urgent need for better methods and for computationally and statistically skilled support.

dChip Software: Analysis and visualization of gene expression and SNP microarrays
http://biosun1.harvard.edu/complab/dchip/



Biology:

Retroviruses
http://www.whfreeman.com/kuby/content/anm/kb03an01.htm (FLASH)

Human Genome Project (movies)
http://www.genome.gov/Pages/EducationKit/download.html

The central dogma of molecular biology (wonderful movie):
http://www.genome.gov/Pages/EducationKit/video/qt/3D.mov

EBM & Clinical Research Workstation menu
http://www.shdem.com/ebm/default.asp

Useful links for biochemistry and epidemiology:
http://www.med-ed.virginia.edu/menu/otherMedEd.cfm
 
Statistics:

Online textbook for statistics
http://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm

Terry Speed's Microarray homepage
statistical challenges related to microarray data
new webpage: http://www.stat.berkeley.edu/~terry/Group/home.html

Statistics
http://www.bettycjung.net/statsiteS.htm

Computing technology

Introduction to R for biologists (by Natalie Roberts, WEHI, Melbourne)
R manuals (see the "Manuals" link in the left column): http://cran.r-project.org/manuals.html
R tutorial: http://www.personality-project.org/r/
R package: Statistics for Microarray Analysis

Directory of Blogs About Microarray Analysis: Draft

Biodefense Bioinformatics Blog http://ai59694.blogspot.com/
Rotten bananas - http://heathermaughan.blogspot.com/index.html
Synthetic Biology and Gene Synthesis - http://syntheticbio.blogspot.com
Genomics Online - http://genomics-info.blogspot.com/index.html
formerscienceguy - http://formerscienceguy.blogspot.com/index.html

Friday, January 26, 2007

Alan Perelson

Dr. Perelson received B.S. degrees in Life Science and Electrical Engineering from MIT in 1967, and a Ph.D. in Biophysics from UC Berkeley in 1972, under the supervision of Aharon Katchalsky-Katzir. He was Acting Assistant Professor in the Division of Medical Physics at Berkeley in 1973 and a postdoctoral fellow in the Department of Chemical Engineering at the University of Minnesota in 1974. At Los Alamos National Laboratory he was a staff member in the Theoretical Biology and Biophysics Group from 1974 to 1991, a Laboratory Fellow from 1991 to 2002, and head of the Theoretical Biology and Biophysics Group from 1995 to 2001; he is currently a Los Alamos National Laboratory Senior Fellow. He spent the 1978 and 1979 academic years at Brown University as an Assistant Professor of Medical Sciences in the Division of Biology and Medicine and the Lefschetz Center for Dynamical Systems. He was a visiting scientist at the Mathematical Institute, Oxford University, in 1986, and a visiting professor of Physics at the Ecole Normale Supérieure, Paris, in 1990 and at the University of Paris VII in 1992. He is also a member of the Science Board and head of the Theoretical Immunology Program at the Santa Fe Institute, an adjunct professor of Bioinformatics at Boston University, and an adjunct professor of biology at the University of New Mexico.

Research Interests

Mathematical and theoretical biology, with an emphasis on problems in immunology, virology,
and cell and molecular biology.

Time Zones of the United States

PT: Washington, Oregon, Nevada, California

MT: Montana, Wyoming, Idaho, Utah, Colorado, Arizona, New Mexico and parts of
North Dakota, South Dakota and Nebraska

CT: Parts of North Dakota, South Dakota and Nebraska, Kansas, Oklahoma, Texas,
Minnesota, Iowa, Missouri, Arkansas, Louisiana, Wisconsin, Illinois,
Tennessee, Mississippi, Alabama

ET: Michigan, Indiana, Ohio, Kentucky, Georgia, New York, Pennsylvania, West
Virginia, Virginia, North Carolina, South Carolina, Florida, Washington DC,
New Jersey, Delaware, Maryland, Connecticut, Rhode Island, Massachusetts,
New Hampshire, Vermont, Maine

Thursday, January 25, 2007

Friday, January 19, 2007

A Haplotype Map of the Human Genome

David Altshuler
Harvard Medical School, Massachusetts General Hospital, Whitehead Institute

Eric Lander
Whitehead Institute and MIT

Goal

Following the creation of the genetic, physical, sequence and SNP maps, the next key step for the Human Genome Project (HGP) is the generation of a "haplotype" map of the human genome. Such a map consists of a high density of SNPs defining the small number of ancestral haplotypes (blocks of tightly correlated genetic variants) in each region of the human genome. Knowledge of these haplotypes will allow comprehensive and efficient testing of the association of human genes with human diseases. The haplotype map can and should be generated rapidly and made freely available to researchers worldwide.

Background

A haplotype map of the human genome has become both justified and practical due to significant advances over the last two years.

Specifically, these advances include:

  • Genomic Sequence: A complete genome sequence, integrated with human genes and annotations, provides a reference framework on which to layer knowledge about allelic variation.

  • Genetic Variants: The development of a dense (and rapidly growing) map of 1.4 million human SNPs provides a genome-wide resource of genetic variation adequate to uniquely tag the vast majority of human haplotypes.

  • Genotyping Technology: High-throughput genotyping methods now allow a rapid, efficient and cost-effective experimental approach to a project of the required scale.

  • Long-range LD: The discovery that human SNPs display strong linkage disequilibrium (LD, or allelic association) over large distances. LD is detectable over distances on the order of 100 kb and is extremely strong over regions spanning several tens of kb (the size of typical genes). For such regions, the vast majority of chromosomes in the population carry one of a handful of highly conserved haplotypes. As a result, genetic diversity in the region can be represented by a small number of well-chosen SNPs.
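LD between two SNPs is usually quantified with the coefficient D and its normalized forms D' and r². The sketch below computes all three for a pair of biallelic SNPs from phased haplotypes. The two-letter haplotype encoding ("AB", "Ab", "aB", "ab") and the function name `pairwise_ld` are hypothetical choices made for illustration; they are not part of the proposal.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, D', r^2) for two biallelic SNPs.

    `haplotypes` holds one phased two-letter string per chromosome, e.g.
    "AB", "Ab", "aB" or "ab": the first letter is the allele at SNP 1 and
    the second the allele at SNP 2 (a hypothetical encoding for this sketch).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                                     # freq. of haplotype A-B
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n  # freq. of allele A
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n  # freq. of allele B

    d = p_ab - p_a * p_b  # D = 0 when the two SNPs are independent

    # D' rescales D by its largest possible magnitude given the allele
    # frequencies; it is signed here, and |D'| is what is usually reported.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For example, `pairwise_ld(["AB"] * 6 + ["ab"] * 4)` gives D' = 1 and r² = 1 (up to floating-point rounding): only two of the four possible haplotypes occur, so typing one SNP tells you the allele at the other. This is the situation the bullet describes, where a few well-chosen SNPs represent a whole region.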

Impact on biomedical research

The availability of a haplotype map of the human genome will have a substantial impact on human genetic studies.

Specifically, these studies include:

  • Comprehensive association studies of individual genes. The association of genes with disease has traditionally been probed by testing individual SNPs one at a time. The drawback to this approach is that the task is never-ending: one can exclude particular SNPs as playing a role, but one cannot exclude a gene. Once the haplotype structure of the genome is defined, one can (1) comprehensively test all significant haplotypes in the gene, and (2) decrease the number of SNPs needed by selecting a subset that captures the population variability. This will allow haplotype studies of individual genomic loci in an unbiased manner, without assumptions about whether causal mutations lie in coding regions, in promoters or in regulatory sites at a significant distance away. It will also greatly decrease the technical and financial barriers faced by laboratories undertaking such work.

  • Genome-wide association studies. A genome-wide haplotype map will make possible whole-genome scans for association in the population. Rather than focusing only on 'candidate' genes, it will become possible to search the genome in an unbiased manner for genes whose common variation contributes to disease in the population. Routine use of genome-wide association studies will also require further decreases in genotyping costs, but such decreases are likely to be driven by the development of the haplotype map.

  • Human population structure and history. Knowledge of haplotypes will transform our understanding of human population structure and history. The LD pattern turns out to be an extremely sensitive indicator of population history, because the multi-allelic nature of haplotypes provides rich detail and because the breakdown of haplotypes follows a predictable clock set by recombination rates. In particular, LD patterns carry more information than traditional studies of allele frequencies alone. Information about human population history is interesting in its own right, but is also very valuable in the design of medical studies (such as admixture mapping).
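Both kinds of association study above come down to testing whether a SNP (or a tag SNP standing in for its haplotype block) has different allele frequencies in people with and without a disease. A common single-SNP version is the 1-df allelic chi-square test on a 2x2 table of allele counts in cases versus controls. The sketch below is that generic textbook test, swapped in for illustration rather than a method the proposal specifies; the function name `allelic_chi_square` is hypothetical, and the code assumes every row and column total is non-zero.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """1-df Pearson chi-square test on a 2x2 table of allele counts.

    Each argument is (count of allele 1, count of allele 2) among the
    chromosomes of cases or of controls. Returns (chi2, p_value).
    No continuity correction is applied.
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # count expected in this cell if allele and disease status are unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 60/40 allele counts in cases and 40/60 in controls, the statistic is 8.0 and the p-value is about 0.005. A genome-wide scan repeats such a test at hundreds of thousands of SNPs, so the significance threshold has to be adjusted for multiple testing.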

Technical Issues

Generating a haplotype map would involve the following components:

  • Population Samples. Development of appropriate population samples, consisting of parent-offspring trios (to allow inference of haplotypes). We estimate that a total of about 300 samples will be needed, representing major ethnic groups in a manner appropriate for generating a map that can be used for medical studies in all populations. The population samples should be a renewable resource (i.e., immortalized cell lines).

  • Sample and Data Availability. The samples should be made freely available so that any interested scientific group can contribute data (in the manner of the CEPH panel and the DNA Polymorphism Discovery Resource). Conversely, all data generated by the project should be immediately released into the public domain without restrictions of any kind.

  • Numbers of SNPs to be genotyped. It is estimated that generating the haplotype map will require successful genotyping of 450,000 SNPs, which will in turn require initial testing of some 800,000 to 900,000 SNPs. The required scale is now well within reach: the Whitehead Institute and the Sanger Centre are each currently engaged in pilot projects involving 25,000 SNPs, using automated genotyping setups and MALDI-TOF-based detection. Given the required scale and efficiencies, the bulk of the work should probably be performed by a few large groups, but all groups should be encouraged to participate in the project by analyzing genes and regions of interest.

  • Analytical Tools. The project will require various analytical tools to readily define haplotype blocks from genotype data, software systems to aid in the hierarchical selection of SNPs to fill in blocks, and databases to make the information maximally useful to the community. Prototype systems have been developed, but focused effort will be needed to develop mature systems.
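The Analytical Tools bullet calls for software to help select SNPs that capture the variation in each block. One simple and widely discussed approach is greedy tagging on pairwise r²: repeatedly pick the SNP that is in strong LD with the most SNPs not yet captured. The sketch below shows that approach. It is a swapped-in illustration, not the hierarchical selection system the proposal refers to, and the function name `select_tag_snps`, the list-of-lists r² matrix input and the default threshold of 0.8 are all assumptions.

```python
def select_tag_snps(r2, threshold=0.8):
    """Greedy tag-SNP selection from a symmetric pairwise r^2 matrix.

    r2[i][j] is the r^2 between SNP i and SNP j, with r2[i][i] == 1.
    A SNP counts as captured once some selected tag has r^2 >= threshold
    with it (threshold is an assumed default, not from the proposal).
    Returns the indices of the chosen tags in selection order.
    """
    n = len(r2)
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Pick the SNP capturing the most still-uncovered SNPs; ties go to the
        # lowest index. Every uncovered SNP captures itself, so progress is guaranteed.
        best = max(
            range(n),
            key=lambda i: (sum(1 for j in uncovered if r2[i][j] >= threshold), -i),
        )
        tags.append(best)
        uncovered -= {j for j in uncovered if r2[best][j] >= threshold}
    return tags
```

With a hypothetical matrix in which SNPs 0, 1 and 2 have pairwise r² of at least 0.85 and SNP 3 is nearly independent of them, `select_tag_snps` returns `[0, 3]`: two genotyped SNPs capture all four at the 0.8 threshold. Greedy selection is fast but is not guaranteed to find the smallest possible tag set.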