Indiana University

 

Indiana University Cyberinfrastructure News

  1. Indiana University's "Big Red" supercomputer is again ranked among
    the world's fastest, and IU accomplishments in advanced
    cyberinfrastructure are attracting attention and acclaim at the
    International Supercomputing Conference being held this week in
    Dresden, Germany. Big Red placed 30th on the June 2007 list of the
    world's 500 fastest supercomputers unveiled today at the conference,
    and Indiana University's Data Capacitor team, with partners from
    Technische Universitaet Dresden, demonstrated impressive performance
    on a distributed transatlantic Lustre file system designed to move
    large amounts of scientific data quickly and easily.

    Thanks to an upgrade this spring with the assistance of the Indiana
    Economic Development Corporation, Big Red moved up in rank after
    placing 31st on the previous (fall 2006) list. In existence since
    1993, the TOP500 list is compiled twice a year by a group of highly
    respected leaders in the supercomputing community and released at
    the world's two largest supercomputing conferences, held each June
    in Germany and each November in the US. Rankings near the top of the
    TOP500 list change very rapidly.

    IU's Data Capacitor, a 535-terabyte storage system, is also
    featured in the week's hardware news at the International
    Supercomputing Conference. Using wide area network access to the
    Lustre file system and the Data Capacitor over the GEANT2 and
    Internet2 advanced research networks, a team from IU and the
    Technische Universitaet Dresden achieved data transfer rates of
    nearly 100 megabytes per second (MB/s) over a single 1 Gbit/s link
    across the Atlantic Ocean, and plans to increase the capabilities of
    long-distance data access via Lustre in the near future.
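
    For a sense of scale, the arithmetic below compares that rate with
    the raw capacity of the link. It is only a rough check that assumes
    decimal units (1 gigabit = 1,000 megabits) and 8 bits per byte, and
    it ignores protocol overhead:

```latex
\frac{1000\ \text{Mbit/s}}{8\ \text{bit/byte}} = 125\ \text{MB/s},
\qquad
\frac{100\ \text{MB/s}}{125\ \text{MB/s}} = 0.80 = 80\%
```

    In other words, a rate of nearly 100 MB/s corresponds to roughly 80%
    of the theoretical maximum of a single 1 Gbit/s link.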

    The ability to transparently access data across long distances is
    critical to enabling new scientific advances, as the amount of
    research data "born digital" continues to skyrocket. Within the US,
    IU is also working to enable use of the Data Capacitor across long
    distances via the NSF-funded TeraGrid. Managing massive amounts of
    data and performing large-scale computational analysis are critical
    aspects of Indiana University's strategy for supporting innovation.

    Craig Stewart, associate dean for research technologies and chief
    operating officer of Pervasive Technology Labs at Indiana University,
    said, "Big Red has enabled scientific innovations at IU and, via the
    TeraGrid, throughout the nation. The key challenge for us in the
    months ahead will be to use Big Red to enable new business
    innovations within the State of Indiana, working with our colleagues
    from Purdue for the benefit of the economy of the state."

    This material is based upon work supported by the National Science
    Foundation under grant numbers CNS-0521433, ACI-0338618,
    OCI-0451237, OCI-0535258, and OCI-0504075. Collaboration with the
    Technische Universitaet Dresden has been supported by TU-D, Indiana
    University, and the Fulbright Senior Scholar's program. Any
    opinions, findings and conclusions or recommendations expressed in
    this material are those of the author(s) and do not necessarily
    reflect the views of the National Science Foundation (NSF), Lilly
    Endowment, Inc., or any other funding agency.

  2. Often a job's workflow has different requirements at different
    stages of processing. In a massively parallel system it would be
    wasteful to reserve hundreds of processors, only to make them wait
    while data files are copied or other serial tasks are performed.
    Conversely, if a job requests only a minimal number of processors
    for the bulk of its processing, it would be useful to increase the
    number of processors for the parallel calculations that could
    benefit. In LoadLeveler, this type of workflow adjustment is called
    "staging" or "stepping."

    Staging is accomplished in the LoadLeveler (LL) submit script. A
    typical LL script has a keyword stanza, followed by the keyword
    "queue" and then the shell commands to be executed. If only a single
    stanza exists in the LL script, the job is always assigned
    "step 0," which is why job IDs always seem to end in a zero.
    However, multiple stanzas can be linked together in a single LL
    script to form a workflow.

    Suppose we want to run a job that will:

    1.  Copy a previously compiled executable and a data file to a
        previously created scratch directory,
    2.  Run the executable on 48 processors, and
    3.  Copy the results to a safe place and remove the output files
        from the scratch directory.

    All of this can be accomplished in a single script, and the job
    never needs to reserve more processors than each portion of the
    workflow requires.

    How do we link the stanzas together? We introduce two new keywords,
    step_name and dependency. First we name each stanza with step_name;
    then we use the dependency keyword to require successful completion
    of the previous step before the next one executes. Each step is
    allocated its own wall-clock time, so if a queue's limit is one week
    and six job steps are used, the resulting workflow could run as long
    as six weeks. There is virtually no limit to the number of steps that
    may be strung together; one ocean model job required nearly one
    hundred! A simple example of the case presented above may be found
    at:

         http://rac.uits.iu.edu/hpc/loadleveler_ex.shtml
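
    Below is one way the three-step workflow might be written as a
    single LoadLeveler command file. It is a minimal sketch, not IU's
    published example (the URL above is the reference for Big Red). It
    assumes that when no executable keyword is given, LoadLeveler runs
    the command file itself for every step and sets LOADL_STEP_NAME and
    LOADL_STEP_INITDIR, so the shell part can choose its work by step
    name. Because keyword values may carry over from one step to the
    next, each step restates job_type, node and tasks_per_node. The
    wall-clock limits, the 12 nodes x 4 tasks layout, the scratch and
    results paths, my_model, input.dat, results_*.dat and the mpirun
    line are all hypothetical, and your system may also require a class
    keyword.

```shell
#!/bin/bash
# Sketch of a three-step LoadLeveler command file for the workflow above.
# Wall-clock limits, node counts, paths and program names are hypothetical.
#
# @ job_name         = staged_run
# @ output           = $(job_name).$(step_name).out
# @ error            = $(job_name).$(step_name).err
#
# Step 1: copy files in, using a single task.
# @ step_name        = copy_in
# @ job_type         = parallel
# @ node             = 1
# @ tasks_per_node   = 1
# @ wall_clock_limit = 00:30:00
# @ queue
#
# Step 2: starts only if copy_in exited with status 0; 12 x 4 = 48 tasks.
# @ step_name        = compute
# @ dependency       = (copy_in == 0)
# @ job_type         = parallel
# @ node             = 12
# @ tasks_per_node   = 4
# @ wall_clock_limit = 24:00:00
# @ queue
#
# Step 3: starts only if compute exited with status 0; one task again.
# @ step_name        = copy_out
# @ dependency       = (compute == 0)
# @ job_type         = parallel
# @ node             = 1
# @ tasks_per_node   = 1
# @ wall_clock_limit = 00:30:00
# @ queue

# Everything below is ordinary shell. With no executable keyword, this file
# is run once per step, and LOADL_STEP_NAME names the current step.
SUBMIT_DIR="${LOADL_STEP_INITDIR:-$PWD}"                 # where the job was submitted
SCRATCH_DIR="${SCRATCH_DIR:-$HOME/scratch/staged_run}"   # hypothetical, created beforehand
RESULTS_DIR="${RESULTS_DIR:-$HOME/results/staged_run}"   # hypothetical "safe place"

copy_in() {
  cp "$SUBMIT_DIR/my_model" "$SUBMIT_DIR/input.dat" "$SCRATCH_DIR/"
}

compute() {
  # Hypothetical launch line; use the MPI launcher documented for your system.
  (cd "$SCRATCH_DIR" && mpirun -np 48 ./my_model input.dat)
}

copy_out() {
  mkdir -p "$RESULTS_DIR" &&
    cp "$SCRATCH_DIR"/results_*.dat "$RESULTS_DIR/" &&
    rm -f "$SCRATCH_DIR"/results_*.dat
}

# Run the function named after the step. A nonzero exit status makes the
# dependency of the following step false, so that step will not run.
run_step() {
  case "$1" in
    copy_in|compute|copy_out) "$1" ;;
    *) echo "unknown step: $1" >&2; return 2 ;;
  esac
}

if [ -n "${LOADL_STEP_NAME:-}" ]; then
  run_step "$LOADL_STEP_NAME"
fi
```

    Submitted once with llsubmit, this is a single job with three steps,
    so llq should show step IDs ending in .0, .1 and .2 rather than only
    .0.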

  3. The Cygwin tools are ports of the popular GNU development tools for
    Microsoft Windows. They run thanks to the Cygwin library, which
    provides the Unix system calls and environment. With Cygwin
    installed, Windows users can use tools such as hsi or htar to
    connect to the Massive Data Storage System (MDSS). HSI is fast,
    robust, and user-friendly, and provides the highest-performance
    interface between IU's supercomputers and MDSS. HSI also
    automatically selects the best class of service for a given upload.

    Cygwin can be downloaded from http://www.cygwin.com/mirrors.html.
    Versions of hsi and htar that work with Cygwin can be downloaded
    from https://rfs.iu.edu/clients/, which also provides the HPSS.conf
    file. Put HPSS.conf in /usr/local/etc as seen from the Cygwin shell
    (from Windows, this is the usr\local\etc folder under the Cygwin
    installation directory, C:\cygwin by default).

    To run hsi or htar under Cygwin, the Windows firewall must allow hsi
    data transfers. If you cannot modify the firewall rules, use hsi's
    "firewall -on" command to get around the firewall.

    HSI commands will seem familiar to UNIX and FTP users. A session
    might look like the following (here % is the UNIX shell prompt,
    ? is the HSI prompt):

    %
    % hsi
    Principal: jdoe
    [jdoe]Password:
    Username: jdoe  UID: 11021  CC: 11021 Copies: 1 [hsi.3.3.3 Fri
    Jan 12 13:36:06 EST 2007]
    ?

    ? put myfile1.dat
    put myfile1.dat : /hpss/j/d/jdoe/myfile1.dat ( 10485760 bytes,
    12283.4 KBS (cos=3))
    ? cd test2
    ? get myfile2.mov
    Scheduler: retrieving file(s)
    get myfile2.mov : /hpss/j/d/jdoe/test2/myfile2.mov
    (2005/09/29 08:49:03 10485760 bytes, 16842.8 KBS )

    Below is an example of using htar once you have opened a Cygwin
    command shell:

    Desktop> mkdir /cygdrive/c/tmp
    Desktop> ./htar.exe cf test/t.tar /cygdrive/c/Documents\
    and\ Settings/jdoe/My\ Documents
    (Type the htar command as a single line on your Windows XP system.)

    Principal: jdoe
    [jdoe]Password:
    HTAR: HTAR SUCCESSFUL
    Desktop>

    This backs up the user's "My Documents" folder to the Massive Data
    Storage System as the archive test/t.tar.
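
    HSI and HTAR can also be called from a script rather than
    interactively. The sketch below wraps the session shown above in
    small shell functions. It assumes your hsi build accepts HSI
    commands as command-line arguments (several commands separated by
    ";" in one quoted string) and that htar lists an archive with -t
    and -f; check the help for your versions. The function names
    mdss_put, mdss_get and mdss_list are hypothetical.

```shell
#!/bin/sh
# Sketch: calling hsi and htar from a script instead of interactively.
# Assumptions: hsi accepts HSI commands as command-line arguments, with
# several commands separated by ";" in one quoted string, and htar lists
# an archive with -t -f. HSI and HTAR can be overridden, e.g. with full paths.
HSI="${HSI:-hsi}"
HTAR="${HTAR:-htar}"

# Store a local file in the current (initially home) MDSS directory,
# like "put" at the ? prompt.
mdss_put() {
  "$HSI" "put $1"
}

# Change to an MDSS directory and retrieve a file in one hsi session.
mdss_get() {
  "$HSI" "cd $1; get $2"
}

# List the members of an htar archive kept on MDSS, such as test/t.tar.
mdss_list() {
  "$HTAR" -tf "$1"
}
```

    For example, mdss_get test2 myfile2.mov corresponds to the cd and
    get commands in the session above. Each hsi invocation may prompt
    for your password, which is one reason to group related commands
    into a single call.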

  4. Use of Research Technologies systems requires a communications
    client known as the secure shell (ssh). Graphical applications
    require another package known as an X server. These software
    packages can be a hassle to install and configure. For convenience,
    Research Technologies has created a CD that provides both the secure
    shell and an X server, which run directly from the CD without the
    hassle of installing or configuring software. The CD is known as
    XLiveCD, and it is available from http://xlivecd.indiana.edu/.

    After you download a copy and burn it to CD, here's how to use it:

    1. Insert the CD into the drive.
    2. Accept the license and indicate that you want to run the software.
    3. Indicate the number of buttons your mouse has.
    4. Wait for a window to appear.

    When the window appears, you can click in it and then run the
    secure shell to connect to a system. For example, user hoagyc
    would use the following command to connect to Big Red:

       ssh hoagyc@bigred.teragrid.iu.edu

    You'll then be asked to confirm the authenticity of the host's key
    and to supply your password. You can then run applications, and
    graphical applications will create windows on your workstation.
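
    Graphical programs find your workstation's X server through the
    DISPLAY environment variable, which ssh sets on the remote system
    when X11 forwarding is active. If windows do not appear, the small
    check below, run on Big Red after logging in, is one way to see
    whether forwarding is on. The function name x11_status is
    hypothetical; ssh -X is the standard OpenSSH option for requesting
    forwarding explicitly, and whether XLiveCD's ssh already enables it
    by default is not assumed here.

```shell
#!/bin/sh
# Run on the remote system (e.g. Big Red) after logging in with ssh.
# When X11 forwarding is active, sshd sets DISPLAY (e.g. localhost:10.0).
x11_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding appears active: DISPLAY=$DISPLAY"
  else
    echo "DISPLAY is not set; try reconnecting with: ssh -X user@host"
  fi
}

x11_status
```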

    XLiveCD does not yet run on Windows Vista, because the software used
    to build XLiveCD does not yet support it. We are watching the
    development of that software, and we will release a new version of
    XLiveCD when the software is stable.

    XLiveCD has been very popular. Since December 2004, over 1,000
    copies have been downloaded by people at IU, and over 175,000 copies
    have been downloaded throughout the world.

  5. There are several job openings available in Indianapolis, focused on
    development of applications for multi-core processors.

    Technical staff are expected to be computational or computer
    scientists with a Ph.D. or M.S. in an appropriate scientific or
    engineering discipline, able to work with other scientists/engineers
    in creating applications and middleware (tools, libraries, etc.) for
    advanced multi-core technologies in targeted disciplines. Experience
    developing and optimizing highly scalable parallel code is
    essential. Other key skills needed/desired are:

       * Takes initiative; self-motivated; able to work successfully and
         achieve objectives without frequent supervision.
       * Able to excel in situations where not all project elements are
         fully or clearly defined.
       * Successful in working in distributed collaborative teams.
         Demonstrated talent and success in leading teams are valued.
       * Strong communication skills, both oral and written.
       * Effective in engaging with customers/business partners in
         establishing projects and achieving the project objectives.

    Those interested should contact researchtechnologies@iu.edu.

  6. Wednesday, July 25, 12:30-1:30 -- ICTC Room 497 & IMU Walnut Room:
    Research Technologies Round Table

    Scott McCaulay will discuss IU's role in the TeraGrid.

    --------

    Tuesday, July 10 -- Submission deadline

    Workshop on Progress Toward Petascale Applications in Bioinformatics
    and Computational Biology, to be held in conjunction with the IEEE
    7th International Symposium on Bioinformatics & Bioengineering
    (BIBE 2007), which will be held at Cambridge-Boston, Massachusetts,
    USA, October 14-17, 2007.

    The final paper submission deadline has been extended to July 10,
    2007. All accepted papers will be published by IEEE and indexed in
    EI, INSPEC, DBLP, and the Library of Congress, and can be further
    included in journal issues dedicated to IEEE 7th BIBE, such as BMC,
    with a unique PubMed ID for each paper (both SCI and PubMed/Medline
    indexed).

    --------

    Sunday, October 14, 2007 -- Boston, MA

    Indiana University is offering a TeraGrid-related tutorial, "Using
    IU's Big Red PowerPC Cluster and IU Storage Resources via the
    TeraGrid," at BIBE 2007 (IEEE 7th International Symposium on
    Bioinformatics & Bioengineering).

    The primary purpose of this tutorial is to enable TeraGrid users to
    learn about the Big Red system so that they can easily use codes
    already ported and optimized for that system (e.g. WRF, NAMD, MILC),
    or rapidly migrate other applications to Big Red.

    In addition, as massive computations commonly depend on massive data
    sets as input and produce massive data sets as output, it may be
    useful to obtain a working knowledge of IU's archival data storage
    system and of how to store and access files via GridFTP.

    Plan to attend to gain hands-on experience with Big Red and IU's
    High Performance Storage System, as resources on the TeraGrid. For
    more information, see

      http://www.cs.gsu.edu/BIBE07/index.php

  7. Planned maintenance
    -------------------

    System          Date    Time          Action

    Libra           07/03   08:00-12:00   Logging updates
    RDC             07/03   08:00-17:00   OS patches
    Steel           07/03   06:00-10:00   OS patches
    Data Capacitor  07/03   08:00-17:00   Updates and patches


  8. If you have questions pertaining to IU's cyberinfrastructure, or you
    are encountering some difficulty, there are several ways to obtain
    help.

    An introduction and overview titled "Indiana University's
    CyberInfrastructure: The least you need to know" has been updated
    and is available at http://rc.uits.iu.edu/education_and_training/ .

    The IU Knowledge Base (http://kb.iu.edu) is an excellent source of
    how-to information.

    If you have problems that the KB does not help you solve, questions
    about system outages, or a problem and you don't know whom to
    contact, send email to researchtechnologies@iu.edu.