Wednesday, December 20, 2006

Chris Sander



Trained as a theoretical physicist, Chris Sander, Director of Memorial Sloan-Kettering Cancer Center's Computational Biology Center and Chairman of the Sloan-Kettering Institute's Computational Biology Program, knew early in his career he wanted to do more than compute the behavior of elementary particles. After a bold move from physics to biology, he helped to develop the field of computational biology, which aims to use mathematical algorithms and information systems to simulate the behavior of molecules, cells, and organisms, using these simulations to make useful diagnostic and therapeutic predictions.

While an undergraduate at the University of Berlin, I was intensely engaged in the study of theoretical physics and mathematics. Yet I felt that analyzing life, the living system on this planet, was a more fascinating and challenging scientific problem than studying the world of elementary particles. To explore this potential change in career direction, I sought the advice of Max Delbrück at the California Institute of Technology, a physicist originally from Berlin who was one of the founders of molecular genetics.

After visiting friends in Texas, I hopped on a Greyhound bus to Los Angeles, made my way to Caltech, found Dr. Delbrück's office, knocked on the door, and with a dash of chutzpah, asked the Nobel laureate if he had a moment to chat with an aspiring graduate student. Dr. Delbrück described the areas of theoretical physics that might be relevant to biology in the future -- information that was at the forefront of my mind as I entered the graduate physics program at the University of California, Berkeley, in 1967.

Wanting to move from the theoretical physics of my PhD thesis to the theoretical biology I had dreamt of, I made my second pilgrimage, this time to see Manfred Eigen, a Nobel Prize-winning chemist who was studying biological evolution in Göttingen, Germany. Dr. Eigen surprised me by explaining that the field barely existed, but he did point me to three mathematical biology research problems: a theory of the immune response, neuronal mapping in the brain, and protein folding. So I packed my bags and I moved from the University of Heidelberg to the Weizmann Institute of Science, in Israel, where I began work with Shneior Lifson on the prediction of three-dimensional protein structures.

Enter the third motivator of my career: the first completely sequenced genome -- no, not in 2000, but in 1977! In that year, I saw an amazing paper in the journal Nature from Fred Sanger's group in Cambridge, United Kingdom, which included two entire pages filled with 5,375 letters, all As, Ts, Gs and Cs, representing the genetic blueprint of a small virus. I walked down the hall to ask my friend Georg Schulz, "With this kind of cryptic information coming from genomes, won't biology need computational science to decipher it?" His answer was yes, and I spent the next 23 years of my professional life preparing for the day, in the year 2001, when the 3.5 billion letters of the human genome finally became available. In the process, I helped to develop the field now known as computational biology.

The real value of computational science, when applied to any system, is to predict what's going to happen next. Weather forecasting is an example. There's an enormous amount of data collected about the weather, but the data, by themselves, are unintelligible. What's required is the application of the appropriate mathematical equations embodied in a software system, which, using the data, allows one to compute tomorrow's weather.

Applied to cancer biology, we want to be able to predict, for example, if a cancer will go from a nonaggressive to an aggressive form, or more importantly, to predict accurately the consequences of possible therapeutic interventions. The goal is to have an impact on human disease, and to do this you have to work in collaboration with physicians. In 2002, Harold Varmus presented his vision of Memorial Sloan-Kettering as the perfect environment for this -- a place with open doors between basic and clinical research, where close collaboration is encouraged.

My first action at Memorial Sloan-Kettering Cancer Center was to start the Computational Biology Center (CBC) and its Bioinformatics Core Facility. The CBC's researchers and engineers are devoted both to basic science and to the goal of developing diagnostic and therapeutic tools that help improve the lives of people affected by cancer. We often collaborate with researchers in the lab and in the clinic to translate data -- data such as the molecular profiles of cells and tissues, the billions of letters of genome sequences, and the functions and structures of key genes -- into biological insights and prediction tools. And the Bioinformatics Core, ably led by Alex E. Lash, provides internal bioinformatics training, collaboration, and infrastructure support.

One concrete example of the practical uses of computational biology is the work we have been doing with Howard I. Scher, Chief of Memorial Sloan-Kettering Cancer Center's Genitourinary Oncology Service, and Francis M. Sirotnak, Member Emeritus and Head of Sloan-Kettering Institute's Laboratory of Molecular Therapeutics. The idea is that cancer cells, like any system that recovers from a round of major damage, might be especially sensitive after a first round of therapy. With this concept in mind, we are aiming to prevent the development of aggressive prostate cancer by looking at the molecular profile of prostate cancer cells after androgen removal in mice, using DNA chips provided by our Genomics Core Laboratory. We use computer software to find needles in a haystack -- the perhaps tens of genes, out of tens of thousands, that may be a characteristic signature of how prostate cancer reacts to such therapy. We hope this will lead us to an Achilles' heel to target to avoid recurrence. It's a long-term effort but the idea is to arrive computationally at the best therapeutic intervention.

Overall, what's been most rewarding for me during my short time here is the opportunity not just to predict the behavior of biological systems, but hopefully to help improve the quality of people's lives. The dream that started with Max Delbrück's advice is now within reach.

Saturday, December 09, 2006

Super Computing

TOP500
The TOP500 project was started in 1993 to provide a reliable basis for tracking and detecting trends in high-performance computing. Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released. The best performance on the Linpack benchmark is used as performance measure for ranking the computer systems. The list contains a variety of information including the system specifications and its major application areas.


NCSA
The National Center for Supercomputing Applications (NCSA), one of the five original centers in the National Science Foundation's Supercomputer Centers Program, opened its doors in January 1986. Since then, NCSA has contributed significantly to the birth and growth of the worldwide cyberinfrastructure for science and engineering, operating some of the world's most powerful supercomputers and developing the software infrastructure needed to efficiently use these systems (for example, NCSA Telnet and, in 1993, NCSA Mosaic™, the first readily available graphical Web browser). Today the center is recognized as an international leader in deploying robust high-performance computing resources and in working with research communities to develop new computing and software technologies.

Blue Gene




Blue Gene is an IBM Research project dedicated to exploring the frontiers in supercomputing: in computer architecture, in the software required to program and control massively parallel systems, and in the use of computation to advance our understanding of important biological processes such as protein folding.

The full Blue Gene/L machine was designed and built in collaboration with the Department of Energy's NNSA/Lawrence Livermore National Laboratory in California, and has a peak speed of 360 Teraflops. Blue Gene systems occupy the #1 (Blue Gene/L) and #2 (Blue Gene Watson) positions in the TOP500 supercomputer list announced in November 2005, as well as 17 more of the top 100.

IBM now offers a Blue Gene Solution. IBM and its collaborators are currently exploring a growing list of applications including hydrodynamics, quantum chemistry, molecular dynamics, climate modeling and financial modeling.

SDSC - San Diego Supercomputer Center

Founded in 1985, the San Diego Supercomputer Center (SDSC) enables international science and engineering discoveries through advances in computational science and high performance computing. Continuing this legacy into the era of cyberinfrastructure, SDSC is a strategic resource to science, industry and academia, offering leadership in the areas of data management, grid computing, bioinformatics, geoinformatics, high-end computing as well as other science and engineering disciplines. The mission of SDSC is to extend the reach of scientific accomplishments by providing tools such as high-performance hardware technologies, integrative software technologies and deep inter-disciplinary expertise, to the community.

SDSC was founded with a $170 million grant from the National Science Foundation's (NSF) Supercomputer Centers program. From 1997 to 2004, SDSC extended its leadership in computational science and engineering to form the National Partnership for Advanced Computational Infrastructure (NPACI), teaming with approximately 40 university partners around the country. Today, SDSC is an organized research unit of the University of California, San Diego, primarily funded by NSF, with a staff of talented scientists, software developers, and support personnel.





The National Resource for Biomedical Supercomputing (NRBSC) pursues leading-edge research in high-performance computing and the life sciences, and fosters exchange between the Pittsburgh Supercomputing Center's (PSC) expertise in computational science and biomedical researchers nationwide.

Our focus is two-fold: computational biomedical research and outreach to the national biomedical research community through education and publications.

Research at NRBSC is centered in three areas: microphysiology; volumetric visualization and analysis; and computational structural biology.

NRBSC's education arm includes not only user training, but also software distribution, publications, and other outreach activities such as online courses and workshop webcasts.

The National Resource for Biomedical Supercomputing, formerly the Biomedical Initiative, was established at the Pittsburgh Supercomputing Center in 1987 as the first extramural biomedical supercomputing program in the country funded by the National Institutes of Health.

Tuesday, December 05, 2006

Python Resources Collection

Python



Python® is a dynamic object-oriented programming language that can be used for many kinds of software development. It offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.
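
As a small illustration of that concise style, here is a sketch that computes GC content for a couple of DNA sequences; the sequences and the function are invented purely for the example.

```python
# A tiny, illustrative Python script; the sequence data are made up.
sequences = {
    "seq1": "ATGCGTACGTTAG",
    "seq2": "ATGAAACCCGGGTTTTAG",
}

def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / float(len(seq))

for name, seq in sorted(sequences.items()):
    print("%s: length=%d GC=%.2f" % (name, len(seq), gc_content(seq)))
```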

Jython
Jython is an implementation of the high-level, dynamic, object-oriented language Python, written in 100% pure Java and seamlessly integrated with the Java platform. It thus allows you to run Python on any Java platform.
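
Because Jython runs on the JVM, Java's standard libraries can be imported as if they were Python modules. Below is a minimal sketch of that integration (run it with the jython interpreter, not CPython); the class choices are arbitrary examples.

```python
# Run with jython: Java classes are imported like Python modules.
from java.util import ArrayList
from java.lang import System

names = ArrayList()          # a java.util.ArrayList, used like a Python list
for n in ["Ada", "Grace", "Alan"]:
    names.add(n)

System.out.println("size = %d" % names.size())
for n in names:              # Jython iterates Java collections directly
    print(n)
```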




Stored in these dark caverns you may find rich veins of Python code, collected caches of Python information, and all manner of sundry Python passageways to explore. With candle or torch in hand, good hunting this night to all. Those not familiar with Python might perhaps start their quest at a brighter place.

NumPy

The fundamental package needed for scientific computing with Python is called NumPy. This package contains: a powerful N-dimensional array object; sophisticated (broadcasting) functions; basic linear algebra functions; basic Fourier transforms; sophisticated random number capabilities; and tools for integrating Fortran code.
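
A short sketch touching on the features listed above (array creation, broadcasting, linear algebra, the FFT, and random numbers); the arrays are arbitrary example data.

```python
# Illustrative NumPy usage; the numbers here are arbitrary.
import numpy as np

a = np.arange(12.0).reshape(3, 4)   # N-dimensional array object
col_means = a.mean(axis=0)          # reduce along an axis
centered = a - col_means            # broadcasting: 1-D means stretch across rows

m = np.random.rand(3, 3)            # random number capabilities
x = np.linalg.solve(m, np.ones(3))  # basic linear algebra

signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 8))
spectrum = np.fft.fft(signal)       # basic Fourier transform

print(centered.shape, x.shape, spectrum.shape)
```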

SciPy.org
SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The core library is NumPy which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are easy to use, but powerful enough to be depended upon by some of the world's leading scientists and engineers. If you need to manipulate numbers on a computer and display or publish the results, give SciPy a try!
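
As a small example of the numerical routines mentioned above, the sketch below integrates a Gaussian-like function and minimizes a simple quadratic; the functions being integrated and minimized are made up for illustration.

```python
# Illustrative SciPy usage: numerical integration and minimization.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fmin

# Integrate exp(-x^2) from 0 to infinity.
value, error = quad(lambda x: np.exp(-x ** 2), 0, np.inf)
print("integral ~= %.6f (estimated error %.1e)" % (value, error))

# Minimize a simple quadratic, starting the search at x = 3.
xmin = fmin(lambda x: (x[0] - 1.0) ** 2 + 2.0, x0=[3.0], disp=False)
print("minimum near x = %.4f" % xmin[0])
```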


DISLIN Homepage
DISLIN is a high-level plotting library for displaying data as curves, polar plots, bar graphs, pie charts, 3D-color plots, surfaces, contours and maps.


wxPython is a GUI toolkit for the Python programming language. It allows Python programmers to create programs with a robust, highly functional graphical user interface, simply and easily.
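
A minimal, illustrative wxPython sketch: a single frame with one button that closes the application (the window title and sizes are arbitrary).

```python
# Minimal wxPython application: a frame with a "Quit" button.
import wx

class HelloFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="Hello wxPython", size=(250, 120))
        panel = wx.Panel(self)
        button = wx.Button(panel, label="Quit", pos=(80, 30))
        button.Bind(wx.EVT_BUTTON, self.on_quit)

    def on_quit(self, event):
        self.Close()

app = wx.App(False)
HelloFrame().Show()
app.MainLoop()
```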


VPython is a package that includes: the Python programming language; the IDLE interactive development environment; "Visual", a Python module that offers real-time 3D output and is easily usable by novice programmers; and "Numeric", a Python module for fast processing of arrays.
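
For illustration, here is a small sketch using the classic "visual" module: a ball bouncing above a box, rendered in a real-time 3D window. The geometry and constants are arbitrary example values.

```python
# Illustrative VPython animation: a bouncing ball above a box.
from visual import sphere, box, color, rate, vector

floor = box(pos=(0, -1, 0), length=4, height=0.2, width=4, color=color.blue)
ball = sphere(pos=(0, 2, 0), radius=0.3, color=color.red)
velocity = vector(0, -1, 0)

dt = 0.01
for step in range(2000):
    rate(100)                      # cap the animation at ~100 frames per second
    ball.pos = ball.pos + velocity * dt
    if ball.pos.y < floor.pos.y + ball.radius:
        velocity.y = -velocity.y   # bounce off the floor
    else:
        velocity.y = velocity.y - 9.8 * dt
```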

PyOpenGL

PyOpenGL is the cross-platform Python binding to OpenGL and related APIs. The binding is created using the SWIG wrapper generator, and is provided under an extremely liberal BSD-style Open-Source license.
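
As a rough sketch of what the binding looks like in practice, the example below opens a GLUT window and draws a single triangle; the window title, size, and colors are arbitrary choices.

```python
# Minimal PyOpenGL + GLUT sketch: one orange triangle in a window.
import sys
from OpenGL.GL import *    # core OpenGL calls (glClear, glBegin, ...)
from OpenGL.GLUT import *  # GLUT windowing (glutInit, glutMainLoop, ...)

def display():
    """Clear the window and draw one triangle."""
    glClear(GL_COLOR_BUFFER_BIT)
    glColor3f(1.0, 0.5, 0.0)
    glBegin(GL_TRIANGLES)
    glVertex2f(-0.5, -0.5)
    glVertex2f(0.5, -0.5)
    glVertex2f(0.0, 0.5)
    glEnd()
    glFlush()

glutInit(sys.argv)
glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB)
glutInitWindowSize(400, 400)
glutCreateWindow(b"PyOpenGL triangle")
glutDisplayFunc(display)
glutMainLoop()
```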


The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics. The source code is made available under the Biopython License, which is extremely liberal and compatible with almost every license in the world. We work alongside the Open Bioinformatics Foundation, which generously provides web and CVS space for the project.
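
A brief sketch of the kind of task Biopython handles, assuming a release that includes the Bio.SeqIO module; "example.fasta" is a hypothetical input file name.

```python
# Parse a FASTA file and report length and GC content for each record.
from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    seq = str(record.seq).upper()
    gc = seq.count("G") + seq.count("C")
    print("%s: %d bp, GC %.1f%%" % (record.id, len(seq), 100.0 * gc / len(seq)))
```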

PyZine
The journal of the Python language.



PyLucene is a GCJ-compiled version of Java Lucene integrated with Python. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is designed to be API compatible with the latest version of Java Lucene.

Courses with an emphasis on scientific computing

Python course in Bioinformatics
Introduction to Python and Biopython with biological examples.



Monday, December 04, 2006

LaTeX Resources Collection

To start with LaTeX, the most convenient way is to read a short book with a strange name (sorry, I cannot recall it now).
Professor Schneider also provides a page introducing LaTeX for biologists:
LaTeX Style and BibTeX Bibliography Formats for Biologists: TeX and LaTeX Resources
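
For those just starting out, here is a minimal, illustrative LaTeX document with a BibTeX citation. The bibliography file "refs" and the citation key "sanger1977" are hypothetical placeholders.

```latex
% A minimal LaTeX document with a BibTeX bibliography (refs.bib is hypothetical).
\documentclass[11pt]{article}
\usepackage{graphicx}   % for including figures
\usepackage{amsmath}    % for mathematics

\title{A First LaTeX Document}
\author{A. Biologist}

\begin{document}
\maketitle

The first completely sequenced genome, a small virus, appeared in
1977~\cite{sanger1977}.

\begin{equation}
  \mathrm{GC\ content} = \frac{G + C}{A + T + G + C}
\end{equation}

\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```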