Women in IT Invited to Apply for WINS Program at SC18 Conference

Applications are now being accepted for the Women in IT Networking at SC (WINS) program at the SC18 conference, to be held Nov. 11-16 in Dallas. WINS seeks qualified female U.S. candidates in their early to mid-career to join the volunteer team that helps build and run SCinet, the high-speed network created for each year’s conference. Here’s how to apply.

WINS was launched to expand the diversity of the SCinet volunteer staff and provide professional development opportunities to highly qualified women in the field of networking. Selected participants will receive full travel support and mentoring by well-known engineering experts in the research and education community.

For the second year in a row, Kate Mace of ESnet’s Science Engagement Team is the WINS chair for SCinet.

Applications are to be submitted using the WINS Application Form. The deadline to apply is 11:59 p.m. Friday, March 23 (Pacific time). More information can be found on the SC18 WINS call for participation.

Each year, volunteers from academia, government and industry work together to design and deliver SCinet. Planning begins more than a year in advance and culminates in a high-intensity, around-the-clock installation in the days leading up to the conference.

Launched in 2015, the WINS program’s success led to an official three-year award from the National Science Foundation (NSF) and DOE’s ESnet. WINS is a joint effort between ESnet, the Keystone Initiative for Network Based Education and Research (KINBER), the University Corporation for Atmospheric Research (UCAR), and SCinet.


CENIC Honors Astrophysics Link to NERSC via ESnet

A star-forming region of the Large Magellanic Cloud (Credit: European Space Agency via the Hubble Telescope)

An astrophysics project connecting UC Santa Cruz’s Hyades supercomputer cluster to NERSC via ESnet and other networks has won the CENIC 2018 Innovations in Networking Award for Research Applications, announced last week.

Through a consortium of Science DMZs and links to NERSC via CENIC’s CalREN and DOE’s ESnet, the connection enables UCSC to carry out high-speed transfers of large data sets produced at NERSC, which supports the Dark Energy Spectroscopic Instrument (DESI) and the AGORA galaxy simulations, at speeds up to five times previous rates, with the potential to reach 20 times previous rates in 2018. Peter Nugent, an astronomer and cosmologist in Berkeley Lab’s Computational Research Division, was pivotal in the effort. Read UC Santa Cruz’s press release.

ESnet’s Inder Monga Featured in Video Recapping Netwerkdag 2017 in the Netherlands

ESnet Director Inder Monga’s keynote talk is among the events highlighted in a new video recapping “Netwerkdag 2017” (Network Day 2017), a daylong meeting organized by SURFnet, the national research and education (R&E) network of the Netherlands. The event was held Dec. 14, 2017, in Utrecht under the theme of making connections.

In his talk on the future of R&E networking, Monga laid out a vision for next-generation networks that includes the growing importance of software and software expertise in building networks, stronger security, and expanded telemetry and analytics capabilities (including research in machine learning for networking) to cope with the growth in data volumes and in the number of data-producing devices.

ESnet Workshop Report Outlines Data Management Needs in Metagenomics, Precision Medicine

William Barnett, the chief research informatics officer for the Indiana Clinical and Translational Sciences Institute (CTSI) and the Regenstrief Institute at Indiana University, discusses the promise of precision medicine at the workshop.

Like most areas of research, the bioinformatics sciences community is facing an unprecedented explosion in the size and number of data sets being created, spurred largely by the decreasing cost of genome sequencing technology. As a result, there is a critical need for more effective tools for data management, analysis and access.

Adding to the complexity, two major fields in bioinformatics – precision medicine and metagenomics – have unique data challenges and needs. To help address the situation, the Department of Energy’s Energy Sciences Network (ESnet) organized a workshop in 2016 at Lawrence Berkeley National Laboratory. Held as part of the CrossConnects workshop series, the two-day meeting brought together scientists from metagenomics and precision medicine, along with experts in computing and networking.

A report outlining the findings and recommendations from the workshop was published Dec. 19, 2017 in Standards in Genomic Sciences. The report reflected the input of 59 attendees from 39 organizations.

One driver for publishing the report was the realization that although each of the two focus areas has unique requirements, workshop discussions revealed several areas where the needs overlap, said ESnet’s Kate Mace, lead author of the report. In particular, the issue of data management loomed largest.

Read a summary of the findings and recommendations from the workshop.

ESnet, Globus Experts Design a Better Portal for Scientific Discovery

Globus, Science DMZ provide new architecture to meet demand for accessing shared data

These days, it’s easy to overlook the fact that the World Wide Web was created nearly 30 years ago primarily to help researchers access and share scientific data. Over the years, the web has evolved into a tool that helps us eat, shop, travel, watch movies and even monitor our homes.

Meanwhile, scientific instruments have become much more powerful, generating massive datasets, and international collaborations have proliferated. In this new era, the web has become an essential part of the scientific process, but the most common method of sharing research data remains firmly attached to the earliest days of the web. This can be a huge impediment to scientific discovery.

That’s why a team of networking experts from the Department of Energy’s Energy Sciences Network (ESnet), together with the Globus team from the University of Chicago and Argonne National Laboratory, has designed a new approach that makes data sharing faster, more reliable and more secure. In an article published Jan. 15 in PeerJ Computer Science, the team describes “The Modern Research Data Portal: a design pattern for networked, data-intensive science.”

“Both the size of datasets and the quantity of data objects has exploded, but the typical design of a data portal hasn’t really changed,” said co-author Eli Dart, an ESnet network engineer. “Our new design preserves that ease of use, but easily scales up to handle the huge amounts of data associated with today’s science.”

Read the full story.

The Modern Research Data Portal design pattern from a network architecture perspective: The Science DMZ includes multiple DTNs that provide for high-speed transfer between network and storage. Portal functions run on a portal server, located on the institution’s enterprise network. The DTNs need only speak the API of the data management service (Globus in this case).
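To give a concrete sense of the pattern, here is a rough, illustrative sketch of how a portal server might hand data movement off to Globus using the Globus Python SDK (globus_sdk). The endpoint IDs, paths and token handling below are placeholders for illustration, not code from the paper.

```python
# Illustrative only: a portal server delegating a download to Globus Transfer.
# The access token, endpoint UUIDs and paths are placeholders.
import globus_sdk

ACCESS_TOKEN = "TRANSFER-API-TOKEN"  # in practice, obtained via a Globus Auth login flow
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(ACCESS_TOKEN)
)

SOURCE_DTN = "uuid-of-dtn-in-science-dmz"   # DTN endpoint behind the Science DMZ (placeholder)
USER_ENDPOINT = "uuid-of-user-endpoint"     # the user's own Globus endpoint (placeholder)

# Describe the transfer; the portal server never streams the data itself.
tdata = globus_sdk.TransferData(tc, SOURCE_DTN, USER_ENDPOINT,
                                label="portal dataset download")
tdata.add_item("/archive/dataset-001/", "/home/user/dataset-001/", recursive=True)

task = tc.submit_transfer(tdata)
print("Globus transfer task submitted:", task["task_id"])
```

The point of the design pattern is that the portal’s web tier only issues API calls like these, while the actual bytes flow directly between DTNs in the Science DMZ and the user’s endpoint.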


Berkeley Lab and ESnet Document Flow, Performance of 56-Terabyte Climate Data Transfer

The simulated storms seen in this visualization are generated from the finite volume version of NCAR’s Community Atmosphere Model. Visualization by Prabhat (Berkeley Lab).

In a recent paper entitled “An Assessment of Data Transfer Performance for Large-Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6,” experts from Lawrence Berkeley National Laboratory (Berkeley Lab) and ESnet (the Energy Sciences Network) document the data transfer workflow, data performance, and other aspects of transferring approximately 56 terabytes of climate model output data for further analysis.

The data, required for tracking and characterizing extratropical storms, needed to be moved from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab.

The authors found that there is significant room for improvement in the data transfer capabilities currently in place for CMIP5, both in terms of workflow mechanics and in data transfer performance. In particular, the paper notes that performance improvements of at least an order of magnitude are within technical reach using current best practices.

To illustrate this, the authors used Globus to transfer the same raw data set between NERSC and the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory.

Read the Globus story: https://www.globus.org/user-story-lbl-and-esnet
Read the paper: https://arxiv.org/abs/1709.09575

30 Years Ago this Month ESnet Rolled Out its Rollout Plans

1988 ESnet map

Although officially established in 1986, ESnet did not formally begin network operations until 1988, as the Department of Energy’s Magnetic Fusion Energy Network (MFEnet, affectionately known as MuffyNet) and High Energy Physics Network (HEPnet) were gradually melded into a single entity.

In January 1988, then-ESnet head Jim Leighton laid out the plans for the new network in the Buffer, the monthly user newsletter for the National Magnetic Fusion Energy Computing Center (known today as NERSC). At the time, ESnet was managed by the Networking and Engineering Group at the center.

After giving some background on the organization of ESnet, Leighton wrote “Now you are probably saying to yourself that this really is very exciting stuff, but it would be even more exciting if we knew when we could expect to see something running. Well, I just happen to be ready to outline our schedule for the next two years:

“January 1988: We believe that the new approach ESnet is taking will require much closer coordination with people responsible for the local area networking at each site. Accordingly, we are planning to convene a new committee in January, with sites involved in Phase I (see below) of ESnet deployment (“Boy, a new committee, that is exciting!” you are probably saying to yourself.). Additional site members will be added to the committee as the implementation continues.

“Phase 0 (January-March 1988): We expect to bring up all the sites on the X.25 backbone, including Brookhaven National Laboratory (BNL), CERN, Fermi National Accelerator Laboratory (FNAL), Florida State University (FSU), Lawrence Berkeley Laboratory (LBL), Lawrence Livermore National Laboratory (LLNL), and the Massachusetts Institute of Technology (MIT). Additional foreign sites will be added during the year.

“Demonstration (March 1988): During the MFESIG meeting to be held at LBL, we expect to demonstrate some ‘beta release’ capabilities of ESnet.

“Phase I (June-September 1988): We will begin deploying and installing a terrestrial 56-K bits per second backbone for ESnet. Sites affected include Argonne National Laboratory (ANL), FSU, GA Technologies, Los Alamos National Laboratory, LBL, MFECC, Princeton Plasma Physics Laboratory, and the University of Texas at Austin. No sites will be disconnected from MFEnet during this phase.

“Phase II (October-December 1988): We will complete the ESnet backbone and connect additional sites to the backbone. This phase will require some sites to be disconnected from MFEnet. The MFEnet to ESnet transition gateway must be installed during this phase. Additional sites affected include CEBAF, FNAL, MIT, Oak Ridge National Laboratory, and UCLA.

“Phase III (Calendar Year 1989): We will continue to switch major hub sites from MFEnet to MFEnet II, along with all secondary sites connected through those hub sites.”

Read more from the Buffer about the 1988 ESnet launch.

ESnet’s DOE Early-Career Awardee Works to Overcome Roadblocks in Computational Networks

Mariam Kiran, ESnet, speaks to Kennedy High School students and their teacher Dr. LaRue Moore during a networking camp at Lawrence Berkeley National Laboratory.

Like other complex systems, computer networks can break down and suffer bottlenecks. Keeping such systems running requires algorithms that can identify problems and find solutions on the fly so information moves quickly and on time.

Mariam Kiran – a network engineer for the Energy Sciences Network (ESnet), a DOE Office of Science user facility managed by Lawrence Berkeley National Laboratory – is using an early-career research award from DOE’s Office of Science to develop methods combining machine-learning algorithms with parallel computing to optimize such networks.

Read more: http://ascr-discovery.science.doe.gov/2017/12/thinking-networks/

ESnet’s Petascale DTN Project Speeds up Data Transfers between Leading HPC Centers

Operations staff monitor the network in the ESnet/NERSC control room. (Photo by Marilyn Chung, Berkeley Lab)

The Department of Energy’s (DOE) Office of Science operates three of the world’s leading supercomputing centers, where massive data sets are routinely imported, analyzed, used to create simulations and exported to other sites. Fortunately, DOE also runs a networking facility, ESnet (short for Energy Sciences Network), the world’s fastest network for science, which is managed by Lawrence Berkeley National Laboratory.

Over the past two years, ESnet engineers have been working with staff at DOE labs to fine tune the specially configured systems called data transfer nodes (DTNs) that move data in and out of the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory and the leadership computing facilities at Argonne National Laboratory in Illinois and Oak Ridge National Laboratory in Tennessee. All three of the computing centers and ESnet are DOE Office of Science User Facilities used by thousands of researchers across the country.

The collaboration, named the Petascale DTN project, also includes the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, a leading center funded by the National Science Foundation (NSF). Together, the collaborators aim to achieve regular disk-to-disk, end-to-end transfer rates of one petabyte per week between the major facilities, which translates to sustained throughput of about 15 Gbps on real-world science data sets.
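As a rough back-of-the-envelope check (ours, not a figure from the project), one petabyte per week averages out to roughly 13 Gbps, which is why a sustained real-world rate of about 15 Gbps comfortably meets the goal:

```python
# Convert the project goal of 1 petabyte per week into an average bit rate.
petabyte_in_bits = 1e15 * 8        # 1 PB = 10^15 bytes = 8 x 10^15 bits
seconds_per_week = 7 * 24 * 3600   # 604,800 seconds
average_gbps = petabyte_in_bits / seconds_per_week / 1e9
print(f"1 PB/week is about {average_gbps:.1f} Gbps sustained")  # prints about 13.2 Gbps
```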

Performance data from March 2016 showing transfer rates between facilities. (Image credit: Eli Dart, ESnet)

Research areas such as cosmology and climate science involve very large (multi-petabyte) datasets, and scientists typically compute at multiple HPC centers, moving data between facilities in order to take full advantage of the computing and storage allocations available at different sites.

Since data transfers traverse multiple networks, the slowest link determines the overall speed. Tuning the data transfer nodes and the border router where a center’s internal network connects to ESnet can smooth out virtual speedbumps. Because transfers over the wide area network have high latency between sender and receiver, getting the highest speed requires careful configuration of all the devices along the data path, not just the core network.
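To see why latency matters so much, consider the bandwidth-delay product: a single TCP flow can only keep a long path full if its buffers cover the link bandwidth multiplied by the round-trip time. The numbers below are illustrative assumptions, not measurements from the project:

```python
# Bandwidth-delay product for an illustrative cross-country 10 Gbps path.
bandwidth_bps = 10e9    # 10 Gbps link (assumed)
rtt_seconds = 0.080     # roughly 80 ms round-trip time (assumed)
bdp_bytes = bandwidth_bps * rtt_seconds / 8
print(f"TCP buffer needed to fill the path: about {bdp_bytes / 1e6:.0f} MB")  # about 100 MB
```

This is why DTN tuning typically includes raising host TCP buffer limits well beyond operating-system defaults, along with checking the file systems and every network device along the path.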

In the past few weeks, the project has demonstrated sustained data transfers at well over the target rate of 1 petabyte per week. The number of sites with this base capability is also expanding, with Brookhaven National Laboratory in New York now testing its transfer capabilities with encouraging results. Future plans include bringing the NSF-funded San Diego Supercomputer Center and other big data sites into the mix.

“This increase in data transfer capability benefits projects across the DOE mission science portfolio,” said Eli Dart, an ESnet network engineer and leader of the project. “HPC facilities are central to many collaborations, and they are becoming more important to more scientists as data rates and volumes increase. The ability to move data in and out of HPC facilities at scale is critical to the success of an ever-growing set of projects.”

When it comes to moving data, there are many factors to consider, including the number of transfer nodes and their speeds, their utilization, the file systems connected to these transfer nodes on both sides, and the network path between them, according to Daniel Pelfrey, a high performance computing network administrator at the Oak Ridge Leadership Computing Facility.

The actual improvements being made range from updating software on the DTNs to changing the configuration of existing DTNs to adding new nodes at the centers.

Performance measurements from November 2017 at the end of the Petascale DTN project. All of the sites met or exceeded project goals. (Image Credit: Eli Dart, ESnet)

“Transfer node operating systems and applications need to be configured to allow for WAN transfer,” Pelfrey said. “The connection is only going to be as fast as the slowest point in the path allows. A heavily utilized server, or a misconfigured server, or a heavily utilized network, or heavily utilized file system can degrade the transfer and make it take much longer.”

At NERSC, the DTN project resulted in adding eight more nodes, tripling the number, in order to achieve enough internal bandwidth to meet the project’s goals. “It’s a fairly complicated thing to do,” said Damian Hazen, head of NERSC’s Storage Systems Group. “It involves adding infrastructure and tuning as we connected our border routers to internal routers to the switches connected to the DTNs. Then we needed to install the software, get rid of some bugs and tune the entire system for optimal performance.”

The work spanned two months and involved NERSC’s Storage Systems, Networking, and Data and Analytics Services groups, as well as ESnet, all working together, Hazen said.

At the Argonne Leadership Computing Facility, the DTNs were already in place, and with minor tuning, transfer speeds were increased to the 15 Gbps target.

“One of our users, Katrin Heitmann, had a ton of cosmology data to move and she saw a tremendous benefit from the project,” said Bill Allcock, who was director of operations at the ALCF during the project. “The project improved the overall end-to-end transfer rates, which is especially important for our users who are either moving their data to a community archive outside the center or are using data archived elsewhere and need to pull it in to compute with it at the ALCF.”

As a result of the Petascale DTN project, the OLCF now has 28 transfer nodes in production on 40-Gigabit Ethernet. The nodes are deployed under a new model—a diskless boot—which makes it easy for OLCF staff to move resources around, reallocating as needed to respond to users’ needs.

“The Petascale DTN project basically helped us increase the ‘horsepower under the hood’ of network services we provide and make them more resilient,” said Jason Anderson, an HPC UNIX/storage systems administrator at OLCF. “For example, we recently moved 12TB of science data from OLCF to NCSA in less than 30 minutes. That’s fast!”

Anderson recalled that a user at the May 2017 OLCF user meeting said that she was very pleased with how quickly and easily she was able to move her data to take advantage of the breadth of the Department of Energy’s computing resources.

“When the initiative started we were in the process of implementing a Science DMZ and upgrading our network,” Pelfrey said. “At the time, we could move a petabyte internally in 6-18 hours, but moving a petabyte externally would have taken just a bit over a week. With our latest upgrades, we have the ability to move a petabyte externally in about 48 hours.”

The fourth site in the project is the NSF-funded NCSA in Illinois, where senior network engineer Matt Kollross said it’s important for NCSA, the only non-DOE participant, to collaborate with other DOE HPC sites to develop common practices and speed up the adoption of new technologies.

“The participation in this project helped confirm that the design and investments in network and storage that we made when building Blue Waters five years ago were solid investments and will help in the design of future systems here and at other centers,” Kollross said. “It’s important that real-world benchmarks which test many aspects of an HPC system, such as storage, file systems and networking, be considered in evaluating overall performance of an HPC compute system and help set reasonable expectations for scientists and researchers.”

Origins of the project

The project grew out of a Cross-Connects Workshop on “Improving Data Mobility & Management for International Cosmology,” held at Berkeley Lab in February 2015 and co-sponsored by ESnet and Internet2.

Salman Habib, who leads the Computational Cosmology Group at Argonne National Laboratory, gave a talk at the workshop, noting that large-scale simulations are critical for understanding observational data and that the size and scale of simulation datasets far exceed those of observational data. “To be able to observe accurately, we need to create accurate simulations,” he said.

During the workshop, Habib and other attendees spoke about the need to routinely move these large data sets between computing centers and agreed that it would be important to be able to move at least a petabyte a week. As the Argonne lead for DOE’s High Energy Physics Center for Computational Excellence project, Habib had been working with ESnet and other labs on data transfer issues.

To get the project moving, Katrin Heitmann, who works in cosmology at Argonne, created a data package of small and medium files totaling about 4.4 terabytes. The data would then be used to test network links between the leadership computing facilities at Argonne and Oak Ridge national laboratories, NERSC at Berkeley Lab, and NCSA at the University of Illinois at Urbana-Champaign.

“The idea was to use the data as a test, to send it over and over and over between the centers,” Habib said. “We wanted to establish a performance baseline, then see if we could improve the performance by eliminating any choke points.”

Habib acknowledged that moving a petabyte in a week would use only a fraction of ESnet’s total bandwidth, but the goal was to automate the transfers using Globus Online, a primary tool researchers use to rapidly share data and to access remote computing facilities over high-performance networks like ESnet.

“For our research, it’s very important that we have the ability to transfer large amounts of data,” Habib said. “For example, we may run a simulation at one of the large DOE computing centers, but often where we run the simulation is not where we want to do the analysis. Each center has different capabilities and we have various accounts at the centers, so the data gets moved around to take advantage of this. It happens all the time.”

Although the project’s roots are in cosmology, the Petascale DTN project will help all DOE scientists who have a need to transfer data to, from, or between the DOE computing facilities to take advantage of rapidly advancing data analytics techniques. In addition, the increase in data transfer capability at the HPC facilities will improve the performance of data portals, such as the Research Data Archive at the National Center for Atmospheric Research, that use Globus to transfer data from their storage systems.

“As the scientists deal with data deluge and more research disciplines depend on high-performance computing, data movement between computing centers needs to be a no-brainer for scientists so they can take advantage of the compute cycles at all DOE Office of Science user facilities and the extreme heterogeneity of systems in the future,” said ESnet Director Inder Monga.

This work was supported by the HEP Center for Computational Excellence. ESnet is funded by DOE’s Office of Science.