Indiana University Cyberinfrastructure News

  1. There are an estimated 100,000 different proteins in the human body,
    each performing a different function in the body. This function is
    largely determined by the geometric structure of the molecule, which
    is primarily a function of the molecule's composition. The makeup of
    each molecule is determined by DNA, but the transformation from
    instructions to final product is a difficult problem. The chains
    formed can loop back, zigzag and twist as atoms interact with each
    other; in a chain thousands of atoms long, there are millions of
    interactions, occurring on a time scale measured in trillionths of
    a second.

    Not long ago, the workings of proteins had to be inferred from their
    static structure as revealed by X-ray crystallography, or from
    limited dynamical information from NMR measurements. But many
    functions of a protein can only truly be understood by watching or
    simulating the protein molecule in its natural environment. The
    ability to accurately simulate atomic and molecular interactions,
    and to follow their motions, plays a crucial role in elucidating
    this dynamical behavior.

    Using ab initio quantum methods, virtually exact energies and
    molecular structures can be calculated for systems of a hundred
    atoms, but quantum calculations of large protein molecules are
    still too time consuming, and other methods must be found.

    Scientists have found one such method, "molecular dynamics" (MD),
    to be quite effective. MD eschews the complexities of quantum
    mechanics, and instead models the interactions of atoms as simple
    classical mechanical interactions. Using MD, one can treat systems
    up to a hundred thousand atoms or more. MD has another advantage
    over quantum simulations in that it can follow the dynamics of a
    system, which is often important in understanding how proteins
    carry out their tasks. In MD, nonbonded two-body interactions are
    typically modeled by an electrostatic potential and a van der Waals
    potential. For covalently bonded atoms, simple bond stretching,
    bond angle bending, and bond torsion forces are used. At each time
    step, all of these forces between all of the atoms are calculated,
    and Newton's equations are integrated to get complete trajectories.
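
    The time-step loop described above can be sketched in a few lines.
    This is a minimal illustration under stated assumptions, not the
    method used by NAMD or AMBER: it models only the van der Waals term,
    using the common Lennard-Jones form in reduced units (epsilon =
    sigma = mass = 1), omits the electrostatic and bonded terms, and
    integrates Newton's equations with the velocity Verlet scheme. All
    function names are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))

def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential energy at separation r (assumed form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force magnitude -dU/dr; positive values push the pair apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def compute_forces(pos):
    """Sum the pairwise forces on every atom (all pairs, no cutoff)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            f_over_r = lj_force(r) / r
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]  # Newton's third law
    return forces

def velocity_verlet_step(pos, vel, forces, mass, dt):
    """Advance positions and velocities in place by one time step."""
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * forces[i][k] / mass  # half kick
            pos[i][k] += dt * vel[i][k]                  # drift
    new_forces = compute_forces(pos)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * new_forces[i][k] / mass  # half kick
    return new_forces

def total_energy(pos, vel, mass):
    """Kinetic plus potential energy of the whole system."""
    kinetic = 0.5 * mass * sum(c * c for v in vel for c in v)
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            potential += lj_energy(distance(pos[i], pos[j]))
    return kinetic + potential

def run(steps=1000, dt=0.001, mass=1.0):
    """Two atoms released at rest, slightly beyond the energy minimum."""
    pos = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]
    vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    e_start = total_energy(pos, vel, mass)
    forces = compute_forces(pos)
    for _ in range(steps):
        forces = velocity_verlet_step(pos, vel, forces, mass, dt)
    return e_start, total_energy(pos, vel, mass), pos, vel
```

    Calling run() follows the pair as it oscillates about the
    Lennard-Jones minimum; comparing the two returned energies is a
    quick check that the time step is small enough. Production codes
    add cutoffs, neighbor lists, and parallel decomposition on top of
    this basic loop.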

    Indiana University offers a number of molecular dynamics codes on
    Big Red, which is a particularly attractive platform for this work
    due to its large number of nodes, large memory per node, and fast
    interconnects between the nodes. The packages NAMD, AMBER,
    Gaussian, and DOCK are all available on Big Red. For example, Samy
    Meroueh at IUPUI uses these computational tools to understand the
    catalytic machinery of enzymes involved in antimicrobial resistance
    and various pathological processes, such as cancer, and to discover
    potential therapeutic agents.

  2. Most general purpose supercomputers tend to favor some types of
    computations over others. Unlike the design trade-offs of a special
    purpose system, these affinities are usually unintended, but they
    are inescapable. To highlight this, consider the multiply and divide
    operations. On the IBM, more than 10 multiply operations can be
    performed for the cost of a single divide. Therefore, a program that
    is heavily weighted with unique divide operations would not perform
    as well as one with the same number of multiplies.

    On Big Red, typical molecular dynamics packages, such as NAMD, have
    been found to run unexpectedly well. The question has been asked:
    "why?" There is no single magic feature. Instead, several
    contributing factors combine.

    Consider a simple, related calculation, the gravity problem. The
    force on a particular particle in a system is the vector sum of the
    gravitational forces exerted on it by every other particle. Each
    pairwise force has magnitude G * M1 * M2 / R^2, where G is a
    constant, M1 and M2 are the particle masses, and R is the distance
    between them. So, why does Big Red do this well? Here are a few
    factors:

    1. Reciprocal math: Division takes roughly ten times as long as
    multiplication. When computing the force between two particles,
    the -qfloat=rsqrt compiler switch replaces division by a square
    root with multiplication by the reciprocal of that square root.
    The reciprocal is calculated once, then used repeatedly.

    2. More rename/floating point registers: A saved value is much
    faster to use than a recalculated one. On Big Red's architecture,
    the number of "holding registers" has been increased to around 100,
    so a term once calculated has a greater chance to be saved and
    reused, and constants, such as G, can stay very close to the
    processor and not need to be continually recalled from cache.

    3. Data streaming: With 2 Load-Store Units on each processor, and
    with 4 "helper" processors attached to each core, streaming multiple
    calculations through the same instructions is faster.

    4. Floating multiply/add instruction: In the simple example above,
    each particle's gravitational force is a product that must be
    summed with the forces from all of the other particles in the
    system. If each product could be added to the partial sum as part
    of the same instruction that computes it, both operations could be
    performed in the time required to accomplish just one. Using the
    -qfloat=maf switch enables this.
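
    The arithmetic shape that these factors reward can be sketched as
    follows. This is an illustration only: Python exposes neither a
    reciprocal square root instruction nor a fused multiply-add, so the
    sketch shows how the calculation is arranged rather than how it
    performs. The function name net_force and the SI value of G are
    assumptions made for the example.

```python
import math

G = 6.674e-11  # gravitational constant in SI units (assumed for the example)

def net_force(i, pos, mass):
    """Vector sum of the gravitational forces on particle i."""
    fx = fy = fz = 0.0
    xi, yi, zi = pos[i]
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        r2 = dx * dx + dy * dy + dz * dz
        # One reciprocal square root replaces every later division by R.
        inv_r = 1.0 / math.sqrt(r2)
        # G*Mi*Mj/R^2 times the unit vector (dx, dy, dz)/R needs 1/R^3.
        scale = G * mass[i] * mass[j] * inv_r * inv_r * inv_r
        # Each update is a multiply followed by an add: the pattern a
        # multiply-add instruction completes in a single operation.
        fx += scale * dx
        fy += scale * dy
        fz += scale * dz
    return fx, fy, fz
```

    For two particles of mass 2 and 3 placed 2 units apart on the x
    axis, net_force(0, [(0, 0, 0), (2, 0, 0)], [2, 3]) points along +x
    with magnitude G * 2 * 3 / 2^2 = 1.5 * G. The values G, inv_r, and
    scale are computed once and reused, which is where a large register
    file helps.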

    This article cannot delve deeply into comparisons with other
    systems. However, the same attributes that make Big Red a great MD
    machine can speed up codes built on related families of equations
    in the same way. Please contact researchtechnologies@iu.edu for
    advice or assistance if you think we might be able to help you
    improve your code's performance on Big Red.

  3. Transfers to Indiana University's Massive Data Storage System are
    capable of saturating Gigabit Ethernet, if the intervening network
    is also capable. Users who have access to even faster networks,
    like the TeraGrid, can now saturate even 10 Gigabit Ethernet by
    using the new class of service (COS) 4.

    Transfers to the small (1), medium (2), and large (3) classes of
    service each involve a single disk, and a single tape drive. This
    limits the user transfer speeds to disk speed (150-200 MB/sec), and
    the tape migration and staging speeds to tape speed (100 MB/sec).
    Since there are many disks and tape drives distributed across
    multiple servers, multiple users can transfer at full speed
    simultaneously.

    Transfers to the new high-performance, large file class of service
    (COS 4) involve 16 disks and 4 tape drives. This permits burst
    transfers at very high rates, limited by the fact that the same
    servers that provide COS 4 are also providing other users access
    to the other classes of service. The tape rate is also limited by
    the fact that four tape drives must become available before any
    transfers take place, but with IU's 20 high speed tape drives per
    campus, this wait should be minimal.
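
    A back-of-the-envelope comparison of these figures can be sketched
    as follows. It assumes ideal linear scaling across striped devices,
    decimal units (1 Gbit/s = 125 MB/s), and no protocol overhead, and
    it applies the per-device speeds quoted above to the COS 4 devices.
    None of these assumptions is a measured result, and the function
    names are hypothetical.

```python
def link_rate_mb_s(gigabits_per_s):
    """Convert a link speed in Gbit/s to MB/s, ignoring protocol overhead."""
    return gigabits_per_s * 1000.0 / 8.0

def aggregate_mb_s(device_count, per_device_mb_s):
    """Ideal aggregate rate when striping across identical devices."""
    return device_count * per_device_mb_s

DISK_MB_S = (150.0, 200.0)  # single-disk range quoted in the text
TAPE_MB_S = 100.0           # single tape drive speed quoted in the text

cos4_disk = tuple(aggregate_mb_s(16, rate) for rate in DISK_MB_S)
cos4_tape = aggregate_mb_s(4, TAPE_MB_S)

print(link_rate_mb_s(1), link_rate_mb_s(10))  # 125.0 1250.0
print(cos4_disk, cos4_tape)                   # (2400.0, 3200.0) 400.0
```

    Under these assumptions a single disk (150-200 MB/sec) already
    exceeds Gigabit Ethernet (125 MB/sec), and 16 striped disks exceed
    10 Gigabit Ethernet (1250 MB/sec), while four tape drives together
    top out near 400 MB/sec.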

    To upload files using the parallel COS, use GridFTP, hsi, or
    pftp_client, and set the target COS to 4. Downloads of files
    previously created in COS 4 will be parallelized automatically.

    For GridFTP clients, the parallel GridFTP server on port 2814 should
    be used. In addition, the block size should be set to 8MB, and the
    stripe width should be a multiple of 4. For third party transfers,
    this is not always possible; in that case, additional network
    overhead will be incurred, which will reduce the transfer rate by at
    least 2/3.

    For example, to use COS 4 with GridFTP:

     globus-url-copy -rp -stripe -bs 8388608 \
       file:///gpfs/mydir/mybigfile.tar \
       gsiftp://gridftp.archive.iu.teragrid.org:2814/mybigfile.tar,,4

    To retrieve, it's not necessary to set the blocksize in the client,
    since it will be set by the server:

     globus-url-copy -rp -stripe \
       gsiftp://gridftp.archive.iu.teragrid.org:2814/mybigfile.tar \
       file:///gpfs/mydir/mybigfile.tar
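
    For scripted uploads, the same command line can be assembled
    programmatically. The helper below is a hypothetical sketch: it
    uses only the options, host, port, and COS suffix shown in the
    example above, and it derives the 8MB block size in bytes rather
    than hard-coding it.

```python
def cos4_upload_command(local_path, remote_name, block_mb=8,
                        host="gridftp.archive.iu.teragrid.org",
                        port=2814, cos=4):
    """Build the globus-url-copy argument list for a parallel COS upload."""
    block_bytes = block_mb * 1024 * 1024  # 8 MB -> 8388608 bytes
    return [
        "globus-url-copy", "-rp", "-stripe", "-bs", str(block_bytes),
        "file://" + local_path,  # local_path is expected to be absolute
        "gsiftp://%s:%d/%s,,%d" % (host, port, remote_name, cos),
    ]

if __name__ == "__main__":
    print(" ".join(cos4_upload_command("/gpfs/mydir/mybigfile.tar",
                                       "mybigfile.tar")))
```

    The resulting list can be passed to subprocess.run on a machine
    where globus-url-copy is installed and valid grid credentials are
    available.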

  4. Indiana University announces the availability of travel scholarships
    for students to attend the Richard Tapia Celebration of Diversity in
    Computing, in order to present technical posters based on use of the
    TeraGrid. Up to four travel scholarships will be awarded. Students
    who receive support to present posters at the Tapia conference will
    also be invited to visit Indiana University and present a seminar
    talk about their work.

    Applications from students who use advanced scientific applications
    directly from the Unix shell prompt and students who make use of
    Science Gateways are both welcome. Students who participated in the
    WxChallenge and used IU's Big Red supercomputer via the LEAD gateway
    are particularly invited to participate.

    The conference will be held October 14-17, in Orlando, Florida. The
    conference poster submission deadline is June 22. Those interested
    in applying are strongly encouraged to contact Indiana
    University as soon as possible via email to

    researchtechnologies@iu.edu.

    Please contact your faculty advisor as well. Students should
    generally have a local faculty advisor for their research project
    and to serve as sponsor for a TeraGrid allocation.

    More information about this opportunity is available at

            http://news.teragrid.org/announcements/20070507_02.php

    More information on the Richard Tapia Celebration of Diversity in
    Computing conference is available at

            www.richardtapia.org

  5. Monday, June 4, 2007, 8:30 AM-Noon -- University of Wisconsin Union,
    at the TeraGrid '07 conference:

    Big Red Tutorial

    The primary purpose of this tutorial is to enable TeraGrid users to
    learn about the Big Red system so that they can easily use codes
    already ported and optimized for that system (e.g. WRF, NAMD, MILC)
    or rapidly migrate other applications to Big Red.

    Indiana University's Big Red system, a 40 TFLOPS IBM e1350 cluster,
    is presently the third largest supercomputer integrated into the
    TeraGrid. Big Red supports the TotalView debugger and the Vampir
    performance analysis system, and is capable of excellent performance
    on applications scaling beyond 1000 processors.

    Within a TeraGrid environment dominated overall by Intel processors,
    PowerPC processors may be perceived as a barrier. This tutorial
    specifically introduces the compilers and optimizations that provide
    the best performance on the PowerPC processor.

    In addition, as massive computations commonly depend on massive data
    sets as input, and produce massive data sets as output, it is useful
    to obtain a working knowledge of IU's archival data storage system.

    Plan to attend to gain hands-on experience with Big Red and IU's
    Massive Data Storage System, as resources on the TeraGrid. For more
    information, see

            http://www.union.wisc.edu/teragrid07/

    This tutorial will be offered again October 14, 2007 at BIBE 2007
    (Bioinformatics and Biomedical Engineering), Boston, MA. More
    information at

            http://www.cs.gsu.edu/BIBE07/

    --------

    Friday, June 8 - Submissions due, Workshop on Progress Toward
    Petascale Applications in Bioinformatics and Computational
    Biology, held in conjunction with the IEEE 7th International
    Symposium on Bioinformatics & Bioengineering (BIBE 2007). More
    information at

    http://racinfo.indiana.edu/newsletter/archives/2007-05s.shtml

    --------

    Friday, June 27, 12:30-1:30 -- ICTC Room 497 & IMU Walnut Room:
    Research Technologies Monthly Roundtable

    Research Computing Systems Update

    As we head into the mid-summer time frame, we'll take an updated
    look at Indiana University's cyberinfrastructure and where we are
    headed through the second half of 2007.

    It is important that the users of Research Technologies' systems
    engage in these discussions. One new system is a direct result of
    January's meeting. Your input is valuable and we welcome your
    continued participation.

  6. Outages
    -------

    System            Date     Failure

    Libra             05/16    scheduling problem


    Planned maintenance
    -------------------

    System            Date     Time           Action

    Libra             06/05    08:00-12:00    logging updates
    RDC               06/05    08:00-17:00    OS Patches
                                              ArcIMS updates
                                              ArcSDE updates
    Steel             06/03    06:00-10:00    OS Patches
    Data Capacitor    06/05    08:00-17:00    updates and patches

  7. If you have questions pertaining to IU's cyberinfrastructure, or you
    are encountering some difficulty, there are several ways to obtain
    help.

    An introduction and overview titled "Indiana University's
    CyberInfrastructure: The least you need to know" has been updated
    and is available at http://rc.uits.iu.edu/education_and_training/ .

    The IU Knowledge Base (http://kb.iu.edu) is an excellent source of
    help on how to do things.

    If you have problems which the KB does not enable you to solve,
    questions about system outages, or if you just have a problem and
    you don't know who to contact, send email to
    researchtechnologies@iu.edu.