There are an estimated 100,000 different proteins in the human body,
each performing a different function. This function is
largely determined by the geometric structure of the molecule, which
is primarily a function of the molecule's composition. The makeup of
each molecule is determined by DNA, but the transformation from
instructions to final product is a difficult problem. The chains
formed can loop back, zigzag and twist as atoms interact with each
other; in a chain thousands of atoms long, there are millions of
interactions, occurring on a time scale measured in trillionths of
a second.
Not long ago, the workings of proteins had to be inferred from their
static structure as revealed by X-ray crystallography, or from
limited dynamical information from NMR measurements. But many
functions of a protein can only truly be understood by watching or
simulating the protein molecule in its natural environment. The
ability to accurately simulate atomic and molecular interactions,
and to follow their motions, plays a crucial role in elucidating
this dynamical behavior.
Using ab initio quantum methods, virtually exact energies and
molecular structures can be calculated for systems of a hundred
atoms, but quantum calculations of large protein molecules are
still too time consuming, and other methods must be found.
Scientists have found one such method, "molecular dynamics" (MD),
to be quite effective. MD eschews the complexities of quantum
mechanics, and instead models the interactions of atoms as simple
classical mechanical interactions. Using MD, one can treat systems
of a hundred thousand atoms or more. MD has another advantage
over quantum simulations in that it can follow the dynamics of a
system, which is often important in understanding how proteins
carry out their tasks. In MD, nonbonded two-body interactions are
typically modeled by an electrostatic potential and a van der Waals
potential. For covalently bonded atoms, simple bond stretching,
bond angle bending, and bond torsion forces are used. At each time
step, all of these forces between all of the atoms are calculated,
and Newton's equations are integrated to get complete trajectories.
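The time-step loop described above can be sketched in a few lines of
Python. This is a minimal illustration, not how NAMD or AMBER is
implemented. It assumes reduced (dimensionless) units, a Lennard-Jones
form for the van der Waals term, a plain Coulomb term, and velocity
Verlet integration, and it omits the bonded stretch, bend, and torsion
forces. All function names and parameter values are hypothetical.

```python
import math

def nonbonded_forces(pos, charges, epsilon=1.0, sigma=1.0, coulomb_k=1.0):
    """Pairwise van der Waals (Lennard-Jones) plus electrostatic forces.

    Returns (forces, potential_energy). Reduced units are assumed.
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            inv_r2 = 1.0 / sum(c * c for c in d)
            inv_r = math.sqrt(inv_r2)
            s6 = (sigma * sigma * inv_r2) ** 3      # (sigma / r)^6
            qq = coulomb_k * charges[i] * charges[j]
            potential += 4.0 * epsilon * (s6 * s6 - s6) + qq * inv_r
            # Force on atom i is f_over_r * d; atom j feels the opposite.
            f_over_r = (24.0 * epsilon * (2.0 * s6 * s6 - s6) + qq * inv_r) * inv_r2
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]
    return forces, potential

def kinetic_energy(vel, masses):
    return sum(0.5 * m * sum(c * c for c in v) for v, m in zip(vel, masses))

def run_md(pos, vel, masses, charges, dt, n_steps):
    """Integrate Newton's equations with velocity Verlet.

    Updates pos and vel in place and returns the final potential energy.
    """
    forces, potential = nonbonded_forces(pos, charges)
    for _ in range(n_steps):
        for i, m in enumerate(masses):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / m   # half kick
                pos[i][k] += dt * vel[i][k]                # drift
        forces, potential = nonbonded_forces(pos, charges)
        for i, m in enumerate(masses):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / m   # second half kick
    return potential

# Two neutral atoms released from rest, 1.5 sigma apart.
pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
masses = [1.0, 1.0]
charges = [0.0, 0.0]
_, e_start = nonbonded_forces(pos, charges)
e_end = run_md(pos, vel, masses, charges, dt=0.001, n_steps=1000)
e_end += kinetic_energy(vel, masses)
print(f"energy drift over 1000 steps: {abs(e_end - e_start):.2e}")
```

The double loop over atom pairs is the expensive part. Its cost grows
with the square of the number of atoms, which is why production codes
spread it across many nodes.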
Indiana University offers a number of molecular dynamics codes on
Big Red, which is a particularly attractive platform for this work
due to its large number of nodes, large memory per node, and fast
interconnects between the nodes. The packages NAMD, AMBER,
Gaussian, and DOCK are all available on Big Red. For example, Samy
Meroueh at IUPUI uses these computational tools to understand the
catalytic machinery of enzymes involved in antimicrobial resistance
and various pathological processes, such as cancer, and to discover
potential therapeutic agents.
Indiana University Cyberinfrastructure News
Most general-purpose supercomputers tend to favor some types of
computations over others. Unlike those of a special-purpose system,
these affinities are usually unintended, but they are inescapable. To
highlight this, consider the multiply and divide operations. On Big
Red's IBM PowerPC processors, more than 10 multiply operations can be
performed for the cost of a single divide. Therefore, a program that
is heavily weighted with unique divide operations would not perform
as well as one with the same number of multiplies.
On Big Red, typical molecular dynamics packages, such as NAMD, have
been found to run unexpectedly well. Why? Unfortunately, there is no
single magic feature. Instead, it is a combination of several
contributing factors.
Consider a simple, related calculation, the gravity problem. The
force on a particular particle in a system is the vector sum of the
gravitational forces exerted by every other particle. The magnitude
of each pairwise force is G * M1 * M2 / R^2, where G is a constant,
M1 and M2 are the particle masses, and R is the distance between
them. So, why does Big Red do this well? Here are a few factors:
1. Reciprocal math: Division takes roughly ten times as long as
multiplication. Using the -qfloat=rsqrt compiler switch in computing
the force between two particles replaces division by the square root
with multiplication by its reciprocal. The reciprocal is
calculated once, then used repeatedly.
2. More rename/floating point registers: A saved value is much
faster to use than a recalculated one. On Big Red's architecture,
the number of "holding registers" has been increased to around 100,
so a term once calculated has a greater chance to be saved and
reused, and constants, such as G, can stay very close to the
processor and need not be continually recalled from cache.
3. Data streaming: With 2 Load-Store Units on each processor, and
with 4 "helper" processors attached to each core, streaming multiple
calculations through the same instructions is faster.
4. Floating multiply/add instruction: In the simple example above,
the force contributed by each particle, itself a product, must be
summed with the forces from all of the other particles in the system.
If each product could be added to the partial sum simultaneously
with the multiplication, both operations could be performed in the
time required to accomplish just one. Using the -qfloat=maf switch
enables this.
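The arithmetic pattern behind factors 1, 2, and 4 can be sketched in
Python. This is only an illustration of the structure of the
calculation: Python does not expose the compiler switches or registers
discussed above, so it shows what the compiler would be given to work
with, not the speedup itself. The function name and the sample values
are hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant, SI units

def gravity_force_on(i, masses, positions):
    """Vector sum of the gravitational forces acting on particle i."""
    fx = fy = fz = 0.0
    xi, yi, zi = positions[i]
    gm_i = G * masses[i]          # loop-invariant term, computed once and reused
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        r2 = dx * dx + dy * dy + dz * dz
        inv_r = 1.0 / math.sqrt(r2)   # one reciprocal square root per pair
        # G*M1*M2/R^2 times the unit vector (dx, dy, dz)/R, using
        # multiplications by the saved reciprocal instead of divisions.
        scale = gm_i * masses[j] * inv_r * inv_r * inv_r
        # Each line below is a multiply followed by an add to a running
        # sum, the pattern a fused multiply/add instruction handles.
        fx += scale * dx
        fy += scale * dy
        fz += scale * dz
    return (fx, fy, fz)

# Two particles, 1 kg and 2 kg, placed 2 m apart along the x axis.
force = gravity_force_on(0, [1.0, 2.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
print(force)  # x component equals G * 1 * 2 / 2**2, pointing toward particle 1
```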
Obviously, this article is not able to delve deeply into comparisons
with other systems. However, the same attributes which make Big Red
a great MD machine can improve related families of equations in the
same way. Please contact researchtechnologies@iu.edu for advice or
assistance if you think we might be able to help you improve your
code's performance on Big Red. -
Transfers to Indiana University's Massive Data Storage System are
capable of saturating Gigabit Ethernet, if the intervening network
is also capable. Users with access to even faster networks, like
the TeraGrid, can now saturate even 10 Gigabit Ethernet using the
new class of service (COS) 4.
Transfers to the small (1), medium (2), and large (3) classes of
service each involve a single disk, and a single tape drive. This
limits the user transfer speeds to disk speed (150-200 MB/sec), and
the tape migration and staging speeds to tape speed (100 MB/sec).
Since there are many disks and tape drives distributed across
multiple servers, multiple users can transfer at full speed
simultaneously.
Transfers to the new high-performance, large file class of service
(COS 4) involve 16 disks and 4 tape drives. This permits burst
transfers at very high rates, limited by the fact that the same
servers that provide COS 4 are also providing other users access
to the other classes of service. The tape rate is also limited by
the fact that four tape drives must become available before any
transfers take place, but with IU's 20 high speed tape drives per
campus, this wait should be minimal.
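A rough back-of-the-envelope calculation shows why the extra devices
matter. The sketch below uses only the per-device figures quoted above
and assumes ideal linear scaling across devices and raw line rates with
no protocol overhead. Real transfers will fall short of these numbers,
for the reasons already given. The helper names are hypothetical.

```python
# Per-device rates quoted in the article, in MB/sec.
DISK_RATE_MB_S = (150, 200)   # single-disk range
TAPE_RATE_MB_S = 100          # single tape drive

def link_rate_mb_s(gigabits_per_s):
    """Raw line rate in MB/sec (decimal units, no protocol overhead)."""
    return gigabits_per_s * 1000 / 8

def aggregate_mb_s(per_device_mb_s, device_count):
    """Idealized aggregate rate, assuming perfect linear scaling."""
    return per_device_mb_s * device_count

cos4_disk_mb_s = tuple(aggregate_mb_s(r, 16) for r in DISK_RATE_MB_S)
cos4_tape_mb_s = aggregate_mb_s(TAPE_RATE_MB_S, 4)

print(link_rate_mb_s(1))    # 125.0  -> a single disk (150-200) can fill Gigabit Ethernet
print(link_rate_mb_s(10))   # 1250.0 -> requires several disks working in parallel
print(cos4_disk_mb_s)       # (2400, 3200)
print(cos4_tape_mb_s)       # 400
```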
To upload files using the parallel COS, use GridFTP, hsi, or
pftp_client, and set the target COS to 4. Downloads of files
previously created in COS 4 will be parallelized automatically.
For GridFTP clients, the parallel GridFTP server on port 2814 should
be used. In addition, the block size should be set to 8MB, and the
stripe width should be a multiple of 4. For third party transfers,
this is not always possible; in that case, additional network
overhead will be incurred, which will reduce the transfer rate by at
least 2/3.
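The client-side recommendations above can be captured in a small
sanity check. This is a hypothetical helper, not part of GridFTP, hsi,
or pftp_client. It only encodes the two stated rules (an 8 MB block
size, taken here as 8 * 1024 * 1024 bytes, and a stripe width that is
a multiple of 4) and the stated third-party penalty.

```python
EXPECTED_BLOCK_SIZE = 8 * 1024 * 1024   # 8 MB expressed in bytes: 8388608

def check_cos4_client_settings(block_size_bytes, stripe_width):
    """Return a list of problems with the proposed settings (empty if none)."""
    problems = []
    if block_size_bytes != EXPECTED_BLOCK_SIZE:
        problems.append(f"block size should be {EXPECTED_BLOCK_SIZE} bytes")
    if stripe_width <= 0 or stripe_width % 4 != 0:
        problems.append("stripe width should be a positive multiple of 4")
    return problems

def third_party_rate_ceiling(direct_rate_mb_s):
    """Best case when the settings cannot be applied.

    A reduction of at least 2/3 leaves at most 1/3 of the direct rate.
    """
    return direct_rate_mb_s / 3

print(check_cos4_client_settings(8388608, 8))   # []
print(check_cos4_client_settings(4194304, 6))   # reports both problems
print(third_party_rate_ceiling(900))            # 300.0
```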
For example, to use COS 4 with GridFTP:
globus-url-copy -rp -stripe -bs 8388608 \
file:///gpfs/mydir/mybigfile.tar \
gsiftp://gridftp.archive.iu.teragrid.org:2814/mybigfile.tar,,4
To retrieve, it's not necessary to set the blocksize in the client,
since it will be set by the server:
globus-url-copy -rp -stripe \
gsiftp://gridftp.archive.iu.teragrid.org:2814/mybigfile.tar \
file:///gpfs/mydir/mybigfile.tar -
Indiana University announces the availability of travel scholarships
for students to attend the Richard Tapia Celebration of Diversity in
Computing, in order to present technical posters based on use of the
TeraGrid. Up to four travel scholarships will be awarded. Students
who receive support to present posters at the Tapia conference will
also be invited to visit Indiana University and present a seminar
talk about their work.
Applications from students who use advanced scientific applications
directly from the Unix shell prompt and students who make use of
Science Gateways are both welcome. Students who participated in the
WxChallenge and used IU's Big Red supercomputer via the LEAD gateway
are particularly invited to participate.
The conference will be held October 14-17, in Orlando, Florida. The
conference poster submission deadline is June 22. Those interested
in applying are strongly encouraged to contact Indiana
University as soon as possible via email to
researchtechnologies@iu.edu.
Please contact your faculty advisor as well. Students should
generally have a local faculty advisor for their research project
and to serve as sponsor for a TeraGrid allocation.
More information about this opportunity is available at
http://news.teragrid.org/announcements/20070507_02.php
More information on the Richard Tapia Celebration of Diversity in
Computing conference is available at
www.richardtapia.org -
Monday, June 4, 2007, 8:30 AM-Noon -- University of Wisconsin Union,
at the TeraGrid '07 conference:
Big Red Tutorial
The primary purpose of this tutorial is to enable TeraGrid users to
learn about the Big Red system so that they can easily use codes
already ported and optimized for that system (e.g. WRF, NAMD, MILC)
or rapidly migrate other applications to Big Red.
Indiana University's Big Red system, a 40 TFLOPS IBM e1350 cluster,
is presently the third largest supercomputer integrated into the
TeraGrid. Big Red supports the TotalView debugger and the Vampir
performance analysis system, and is capable of excellent performance
on applications scaling beyond 1000 processors.
Within a TeraGrid environment dominated overall by Intel processors,
PowerPC processors may be perceived as a barrier. This tutorial
specifically introduces the compilers and optimizations that provide
the best performance on the PowerPC processor.
In addition, as massive computations commonly depend on massive data
sets as input, and produce massive data sets as output, it is useful
to obtain a working knowledge of IU's archival data storage system.
Plan to attend to gain hands-on experience with Big Red and IU's
Massive Data Storage System, as resources on the TeraGrid. For more
information, see
http://www.union.wisc.edu/teragrid07/
This tutorial will be offered again October 14, 2007 at BiBE 2007
(Bioinformatics and Biomedical Engineering), Boston, MA. More
information at
http://www.cs.gsu.edu/BIBE07/
--------
Friday, June 8 - Submissions due, Workshop on Progress Toward
Petascale Applications in Bioinformatics and Computational
Biology, held in conjunction with the IEEE 7th International
Symposium on Bioinformatics & Bioengineering (BIBE 2007). More
information at
http://racinfo.indiana.edu/newsletter/archives/2007-05s.shtml
--------
Friday, June 27, 12:30-1:30 -- ICTC Room 497 & IMU Walnut Room:
Research Technologies Monthly Roundtable
Research Computing Systems Update
As we head into the mid-summer time frame, we'll take an updated
look at Indiana University's cyberinfrastructure and where we are
headed through the second half of 2007.
It is important that the users of Research Technologies' systems
engage in these discussions. One new system is a direct result of
January's meeting. Your input is valuable and we welcome your
continued participation. -
Outages
-------
System           Date    Failure
Libra            5/16    scheduling problem
Planned maintenance
-------------------
System           Date    Time          Action
Libra            06/05   08:00-12:00   logging updates
RDC              06/05   08:00-17:00   OS Patches
                                       ArcIMS updates
                                       ArcSDE updates
Steel            06/03   06:00-10:00   OS Patches
Data Capacitor   06/05   08:00-17:00   updates and patches -
If you have questions pertaining to IU's cyberinfrastructure, or you
are encountering some difficulty, there are several ways to obtain
help.
An introduction and overview titled "Indiana University's
CyberInfrastructure: The least you need to know" has been updated
and is available at http://rc.uits.iu.edu/education_and_training/ .
The IU Knowledge Base (http://kb.iu.edu) is an excellent source of
help on how to do things.
If you have problems which the KB does not enable you to solve,
questions about system outages, or if you just have a problem and
you don't know who to contact, send email to
researchtechnologies@iu.edu.
