I’ll be busy in Brazil next week. Sharing the newsletter article about it with blog readers:
ESnet’s Monga Keynotes Two R&E Network Workshops
ESnet’s Chief Technology Officer Inder Monga will keynote two workshops and participate in a panel focusing on research and education (R&E) networks in Brazil next week.
On May 18, Monga opens the National Research and Education Network Workshop (WRNP), hosted by ESnet’s Brazilian counterpart. In his talk, entitled “R&E Networks: Imagining the next generation,” Monga will focus on new ideas in R&E networks, from technologies like software defined networking (SDN) and named data networking (NDN) to collaborative architectures that build an internet of different capabilities for global science collaborations. He will also discuss the challenges R&E networks face and how to enable end-to-end architectures, including concepts like the Science DMZ.
On May 22, Monga opens the Experimental Research Workshop of the Future Internet (WPIEF). His keynote is entitled “Moving from SDN demo to operations: Challenges.”
Both workshops are held as part of the Brazilian Symposium on Computer Networks and Distributed Systems (SBRC), where Monga will also join a May 19 panel discussing “Challenges in the Development of Network Infrastructure, Testbeds for Software Defined Networks.”
It has been almost a year since we turned 25 and transferred a “whole universe of data” at Supercomputing 2011, and that was over a single 100G link between NERSC and Seattle. Now we are close to finishing the build-out of the fifth generation of our network, ESnet5.
To minimize downtime for the sites, we are building ESnet5 in parallel with ESnet4, so that moving traffic from one network to the other requires only a configuration-driven switch. Since the scientific community we serve depends on the network being up, it is important to be sure the transition is not disruptive in any way. The question we have heard over and over again from some of our users is: when you switch ESnet4 production traffic to ESnet5, how confident are you that the whole network will work and not collapse?
In this blog post, I’d like to introduce an innovative testing concept the ESnet network engineering team (with special kudos to Chris Tracy) developed and implemented to address this very problem.
The goal of our testing was to ensure that the entire set of backbone network ports would perform solidly at full 100 Gbps saturation with no packet loss over a 24-hour period. However, we had some limitations. With only one Ixia test set with 100GE cards on hand to generate and receive packets, and not enough time to ship that equipment to every PoP and test each link, we had to create a test scenario that would give us confidence that all the deployed routers and optical hardware, optics, fiber connections, and underlying fiber would perform flawlessly in production.
This implied creating a scenario where the 100 Gbps traffic stream generated by the Ixia would be carried bi-directionally over every router interface deployed in ESnet5, traverse each interface only once, and cover the entire ESnet5 topology before being directed back to the test hardware. We created a creative traffic loop that traversed the entire footprint and called it the ‘Snake Test’. Although we used the first workable solution we found to create the ‘snake’, I wonder whether, for more complex topologies, this could be framed as the traveling salesman problem, the NP-hard optimization problem from theoretical computer science.
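If every link must be crossed exactly once in each direction, one possible framing of the snake (our framing, not necessarily how the team built it) is an Eulerian circuit rather than a traveling salesman tour: TSP visits every node once, while an Eulerian circuit crosses every link once. Replacing each link with one arc per direction gives every node equal in- and out-degree, so any connected topology has such a circuit, and Hierholzer's algorithm finds it in linear time. The sketch below assumes a hypothetical five-PoP topology; the PoP names and links are illustrative, not the ESnet5 footprint.

```python
from collections import defaultdict

# Hypothetical topology: undirected links between PoPs (not the real ESnet5 map).
TOPOLOGY = [
    ("SUNN", "SEAT"),
    ("SEAT", "DENV"),
    ("DENV", "SUNN"),
    ("DENV", "CHIC"),
    ("CHIC", "NEWY"),
]


def snake_circuit(links, start):
    """Return a closed walk that crosses every link exactly once in each direction.

    Each undirected link becomes two directed arcs. Every node then has
    in-degree == out-degree, so for a connected topology Hierholzer's
    algorithm yields an Eulerian circuit over all arcs.
    """
    arcs = defaultdict(list)  # node -> unused outgoing neighbours
    for a, b in links:
        arcs[a].append(b)
        arcs[b].append(a)

    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if arcs[node]:
            # Follow an unused outgoing arc and keep extending the walk.
            stack.append(arcs[node].pop())
        else:
            # Every arc out of this node is used: emit it to the circuit.
            circuit.append(stack.pop())
    return circuit[::-1]


print(" -> ".join(snake_circuit(TOPOLOGY, "SUNN")))
```

The function does not check connectivity; on a disconnected topology it would silently cover only the component containing the start PoP.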
The diagram below illustrates the test topology:
So, after sending around 1.2 petabytes of data in 24 hours, and accounting for surprise fiber maintenance events that caused the link to flap, the engineering team was happy to see zero packet loss.
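As a rough scale check (our own arithmetic, not a measurement from the test), a single direction running at a sustained 100 Gbps for 24 hours carries about 1.08 PB in decimal units, and both directions together about 2.16 PB. The post does not say exactly how the ~1.2 PB figure was counted, so this sketch shows only the line-rate arithmetic and ignores framing overhead.

```python
LINE_RATE_BPS = 100e9        # 100 Gbps line rate
DURATION_S = 24 * 60 * 60    # 24-hour test window in seconds


def petabytes_at_line_rate(rate_bps, seconds, directions=1):
    """Decimal petabytes moved at a constant bit rate (ignores framing overhead)."""
    return rate_bps * seconds * directions / 8 / 1e15


one_way = petabytes_at_line_rate(LINE_RATE_BPS, DURATION_S)
both_ways = petabytes_at_line_rate(LINE_RATE_BPS, DURATION_S, directions=2)
print(f"{one_way:.2f} PB one way, {both_ways:.2f} PB both ways")
# prints "1.08 PB one way, 2.16 PB both ways"
```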
Here’s a sample portion of the data collected:
Automation is key: we built utility scripts to load/unload the configuration on the routers, poll the firewall counters (to check for ingress/egress loss at every interface), clear statistics, and parse the resulting log files into CSV (a snapshot of which you see in the picture) for analysis.
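A minimal sketch of the counter-to-CSV step, assuming a hypothetical plain-text dump with one line per interface in snake order (`router interface rx_packets tx_packets`). The format, router names, and counter values are invented for illustration; the real scripts polled router firewall counters whose output format is not shown here. Loss on each hop is taken as packets sent by one interface minus packets received by the next, and the invented sample deliberately shows 2 lost packets on the last hop to exercise the check.

```python
import csv
import io

# Hypothetical counter dump in snake order: "<router> <interface> <rx> <tx>".
SAMPLE_DUMP = """\
rtr-a xe-0/0/0 0 1000000
rtr-b xe-1/0/0 1000000 1000000
rtr-c xe-2/0/0 999998 999998
"""


def parse_counters(text):
    """Turn the hypothetical dump into a list of per-interface counter dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        router, iface, rx, tx = line.split()
        rows.append({"router": router, "interface": iface, "rx": int(rx), "tx": int(tx)})
    return rows


def hop_loss_csv(rows):
    """Compare each hop's tx counter with the next hop's rx counter; return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["from", "to", "sent", "received", "lost"])
    for src, dst in zip(rows, rows[1:]):
        writer.writerow([
            f'{src["router"]}:{src["interface"]}',
            f'{dst["router"]}:{dst["interface"]}',
            src["tx"],
            dst["rx"],
            src["tx"] - dst["rx"],
        ])
    return out.getvalue()


print(hop_loss_csv(parse_counters(SAMPLE_DUMP)))
```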
Phew! The transition from ESnet4 to ESnet5 continues without a hitch. Watch for the completion news; it may come sooner than you think…
Just last month our resident IPv6 expert, Mike Sinatra, discussed the risks of not deploying IPv6 in the R&E community. On World IPv6 Launch, ESnet is happy to unveil a simple dashboard that tracks the status of IPv6 deployment across its sites. The page is updated from a summary of tests performed by an IPv6-connected host within ESnet.
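The dashboard's actual test suite isn't described here. As a minimal sketch of one such check, assuming the test is simply whether a site's hostname publishes an IPv6 address, the snippet below resolves names with `socket.getaddrinfo` restricted to `AF_INET6`. A fuller check would also try to connect over IPv6. The resolver is a parameter so the logic can be exercised without network access; the function names are hypothetical.

```python
import socket


def has_ipv6_address(hostname, resolver=socket.getaddrinfo):
    """True if `hostname` resolves to at least one IPv6 address."""
    try:
        return len(resolver(hostname, None, socket.AF_INET6)) > 0
    except socket.gaierror:
        return False


def ipv6_summary(hostnames, resolver=socket.getaddrinfo):
    """Return per-host results and the fraction of hosts with IPv6 addresses."""
    results = {host: has_ipv6_address(host, resolver) for host in hostnames}
    fraction = sum(results.values()) / len(results) if results else 0.0
    return results, fraction
```

Calling `ipv6_summary(["www.example.org", ...])` with the tracked sites' hostnames (placeholders here) yields a per-site table plus an overall deployment fraction, the kind of summary a dashboard page could render.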
ESnet and its collaborators successfully completed three days of demonstrations of our End-to-End Circuit Service at Layer 2 (ECSEL) software at the Open Networking Summit held at Stanford a couple of weeks ago. Our goal is to build “zero-configuration circuits” to help science applications seamlessly use networks for optimized end-to-end data transport. ECSEL, developed in collaboration with NEC, Indiana University, and the University of Delaware, builds on some exciting new conceptual thinking in networking.
Wrangling Big Data
To put ECSEL in context, the proliferating tide of scientific data flows – anticipated at 2 petabytes per second as planned large-scale experiments get in motion – is already challenging networks to be exponentially more efficient. Wide area networks have vastly increased bandwidth and enable flexible, distributed, scientific workflows that involve connecting multiple scientific labs to a supercomputing site, a university campus, or even a cloud data center.
The increasing adoption of distributed, service-oriented computing means that resource and vendor independence for service delivery is a key priority for users. Users expect seamless end-to-end performance and want the ability to send data flows on demand, no matter how many domains and service providers are involved. The hitch is that even though the Wide Area Network (WAN) can have turbocharged bandwidth, at these exponentially increasing rates of network traffic even a small blockage in the network can seriously impair the flow of data, trapping users in a situation resembling commute conditions on sluggish California freeways. These scientific data transport challenges that we and other R&E networks face are just a taste of what the commercial world will encounter with the increasing popularity of cloud computing and service-driven cloud storage.
Abstracting a solution
One key piece of feedback from application developers, scientists, and end users is that they want to accomplish their mission without dealing with complexity at the infrastructure level. At ESnet, we are exploring various ways to make networks work better for users. According to Open Networking Summit presenter and Berkeley professor Scott Shenker, a couple of concepts could be game-changers: 1) using abstraction to manage network complexity, and 2) extracting and exposing simplicity out of the network. Shenker himself cites Barbara Liskov’s Turing Lecture as inspiration.
ECSEL is leveraging OSCARS and OpenFlow within the Software Defined Networking (SDN) paradigm to elegantly prevent end-to-end network traffic jams. OpenFlow is an open standard to allow application-driven manipulation of network flows. ECSEL is using OSCARS-controlled MPLS virtual circuits with OpenFlow to dynamically stitch together a seamless data plane delivering services over multi-domain constructs. ECSEL also provides an additional level of simplicity to the application, as it can discover host-network interconnection points as necessary, removing the requirement of applications being “statically configured” with their network end-point connections. It also enables stitching of the paths end-to-end, while allowing each administrative entity to set and enforce its own policies. ECSEL can be easily enhanced to enable users to verify end-to-end performance, and dynamically select application-specific protocol forwarding rules in each domain.
The OpenFlow capabilities, applicable to an enterprise/campus or a data center, were demonstrated with the help of NEC’s ProgrammableFlow Switch (PFS) and ProgrammableFlow Controller (PFC). We leveraged a special interface developed by NEC to program a virtual path from ingress to egress of the OpenFlow domain. ECSEL accessed this interface programmatically when executing the end-to-end path-stitching workflow.
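To illustrate the stitching idea in the two paragraphs above, here is a minimal sketch under simplifying assumptions: each administrative domain (say, an OpenFlow campus or an OSCARS-managed MPLS WAN) contributes one segment with ingress and egress handoff points and a bandwidth ceiling set by its own policy, and stitching succeeds only if every domain admits the request and adjacent segments meet at the same handoff. The `Segment` type, the `stitch` function, and all names are hypothetical; this is not the ECSEL, OSCARS, or NEC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    domain: str    # administrative domain that provisions this piece
    ingress: str   # handoff point where traffic enters the domain
    egress: str    # handoff point where traffic leaves the domain
    max_mbps: int  # ceiling set by this domain's local policy


def stitch(segments, requested_mbps):
    """Return the handoff points of an end-to-end path, or raise if it cannot be built."""
    if not segments:
        raise ValueError("no segments to stitch")
    # Each domain enforces its own policy independently.
    for seg in segments:
        if requested_mbps > seg.max_mbps:
            raise ValueError(f"{seg.domain} policy rejects {requested_mbps} Mbps")
    # Adjacent segments must meet at the same handoff point.
    for left, right in zip(segments, segments[1:]):
        if left.egress != right.ingress:
            raise ValueError(f"gap between {left.domain} and {right.domain}")
    return [segments[0].ingress] + [seg.egress for seg in segments]


# Hypothetical three-domain path: campus OpenFlow, WAN circuit, remote campus.
SAMPLE_PATH = [
    Segment("campus-a", "host-a-port", "campus-a-border", 10_000),
    Segment("wan", "campus-a-border", "campus-b-border", 40_000),
    Segment("campus-b", "campus-b-border", "host-b-port", 10_000),
]
print(stitch(SAMPLE_PATH, 5_000))
```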
Our anticipated next step is to develop ECSEL as an end-to-end service by making it an integral part of a scientific workflow. The ECSEL software will essentially act as an abstraction layer, where the host (or virtual machine) doesn’t need to know how it is connected to the network; the software layer does all the work for it, mapping out the optimum topologies to direct data flow and making the magic happen. To implement this, ECSEL is leveraging the modular architecture and code of the new OSCARS 0.6 release. Developing this demonstration provided sufficient proof that well-architected, modular software with simple APIs, like OSCARS 0.6, can speed up the development of new network services, which in turn validates the value proposition of SDN. And we are not the only ones who think ECSEL virtual circuits show promise as a platform for spurring further innovation: vendors such as Brocade and Juniper, as well as other network providers attending the demo, were enthusiastic about ECSEL’s potential.
But we are just getting started. We will reprise the ECSEL demo at SC11 in Seattle, this time with a GridFTP application using Remote Direct Memory Access (RDMA) that has been modified to include XSP (eXtensible Session Protocol), a signaling mechanism that enables the application to become “network aware.” XSP, conceived and developed by Martin Swany and Ezra Kissel of Indiana University and the University of Delaware, can directly interact with advanced network services like OSCARS, making the creation of virtual circuits transparent to the end user. In addition, once the application is network aware, it can make more efficient use of scalable transport mechanisms like RDMA for very large data transfers over high-capacity connections.
We look forward to seeing you there and exchanging ideas. Until Seattle, if you have any questions or proposals for working together on this or other solutions to the “Big Data Problem,” don’t hesitate to contact me.
Eric Pouyoul, Vertika Singh (summer intern), Brian Tierney: ESnet
We are proud to announce that two of ESnet’s projects have received IDEA (Internet2 Driving Exemplary Applications) awards in Internet2’s 2011 annual competition for innovative network applications that have had the most positive impact and potential for adoption within the research and education community. (see: Internet2’s press release).
Internet2 recognized OSCARS (On-Demand Secure Circuits and Advance Reservation System), developed by the ESnet team led by Chin Guok, including Evangelos Chaniotakis, Andrew Lake, Eric Pouyoul and Mary Thompson. Contributing partners also included Internet2, USC ISI and DANTE.
ESnet’s MAVEN (Monitoring and Visualization of Energy consumed by Networks) proof of concept application was also recognized with an IDEA award in the student category. MAVEN was prototyped by Baris Aksanli during his summer internship at ESnet. Baris is a Ph.D. student at the University of California, San Diego, conducting research at the System Energy Efficiency Lab with his thesis advisor, Dr. Tajana Rosing. Baris worked closely with his summer advisor, Inder Monga, and Jon Dugan to implement MAVEN as part of ESnet’s new Green Networking Initiative.
The idea behind OSCARS
OSCARS enables researchers to automatically schedule and guarantee end-to-end delivery of scientific data across networks and continents. For scientists, being able to count on reliable data delivery is critical as scientific collaborations become more expansive, often global. Meanwhile, in disciplines ranging from high-energy physics to climate, scientists are using powerful, geographically dispersed instruments like the Large Hadron Collider that are producing increasingly massive bursts of data, challenging the capabilities of traditional IP networks.
OSCARS virtual circuits can reliably schedule time-sensitive data flows, like those from the LHC, round the clock across networks, enabling research and education networks to seamlessly meet user needs. OSCARS code is also being deployed by R&E networks worldwide to support an ever-growing base of researchers with data-intensive collaboration needs. Internet2, U.S. LHCnet, NORDUnet, RNP in Brazil, and over 10 other regional and national networks currently use OSCARS for virtual circuit services. Moreover, Internet2’s NSF-funded DyGIR and DYNES projects will deploy over 60 more instances of OSCARS at university campuses and regional networks in 2012 to support scientists involved in the LHC, Laser Interferometer Gravitational-Wave Observatory (LIGO), Large Synoptic Survey Telescope (LSST), and Electronic Very-Long Baseline Interferometry (eVLBI) programs.
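OSCARS's internal scheduler is not described here. As a minimal sketch of the advance-reservation idea, assuming a single link with fixed capacity and reservations given as half-open time windows, the check below admits a new request only if reserved bandwidth never exceeds capacity during its window. Because usage rises only when a reservation starts, it suffices to check the new start time plus any existing start times inside the window. The function name, times, and rates are hypothetical.

```python
def fits(capacity_gbps, existing, start, end, gbps):
    """True if a new reservation of `gbps` over [start, end) fits on the link.

    existing: list of (start, end, gbps) tuples already booked on this link,
    each covering the half-open interval [start, end).
    """
    # Peak usage inside the window occurs at its start or where another
    # reservation begins, since usage only increases at start times.
    checkpoints = {start} | {s for s, e, _ in existing if start < s < end}
    for t in checkpoints:
        used = sum(g for s, e, g in existing if s <= t < e)
        if used + gbps > capacity_gbps:
            return False
    return True


# Hypothetical 10 Gbps link with two bookings (times in hours).
BOOKED = [(0, 10, 6), (5, 15, 3)]
print(fits(10, BOOKED, 10, 20, 5))  # first booking has ended by hour 10
print(fits(10, BOOKED, 8, 12, 2))   # 6 + 3 + 2 would exceed 10 at hour 8
```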
We are proud of the hard work and dedication the OSCARS development team has demonstrated since the start of this project. Just as importantly, we are proud to see this work paying off with new science collaborations and discoveries.
The potential of MAVEN
The Monitoring and Visualization of Energy consumed by Networks (MAVEN) project is a brand new prototype portal that will help network operators and researchers better track live network energy consumption and environmental conditions. MAVEN – implemented by Baris during his summer internship – is a first major step for ESnet in instrumenting our network with the tools to understand these operational dynamics. As networks continue to get bigger and faster, they will require more power and cooling in an era of decreased energy resources. To address this pressing challenge, ESnet is leading a new generation of research aimed at understanding how networks can operate in a more energy-efficient manner. We are grateful for Baris’ significant contributions in leading the development of MAVEN and glad to see that his talent is being recognized by the R&E networking community through this award.
Our take on ANI, OSCARS, perfSONAR, and the state of things to come.
In 2010, ESnet stayed ahead of the technology curve in the testbed: we put together a great multi-layer design, deployed specially tuned 10G I/O testers, became early investors in the OpenFlow protocol by deploying NEC switches, and built a research breadboard of end hosts leveraging open-source virtualization and cloud technologies.
The first phase of the ANI testbed is concluding. After more than six months of operational life, with exciting research projects like ARCHSTONE, Flowbench, HNTES, climate studies, and more leveraging its facilities, we are preparing to move the testbed to its second phase on the dark fiber ring in Long Island. Our call for proposals, which closed October 1st, garnered excellent ideas from researchers, which were reviewed by the academic and industry stalwarts on the panel. We are tying up loose ends as we light the next phase of testbed research.
This year the OSCARS team has been extremely productive. We added enhancements to create the next version (0.5.3) of the current production OSCARS software; made progress on architecting and developing a highly modular and flexible platform for next-generation OSCARS (0.6) and on a PCE-SDK for network researchers creating complex path-computation algorithms; and developed FENIUS to support the GLIF Automated GOLE demonstrator.
Not only did the ESnet team multitask on various ANI, operational network, and OSCARS deliverables, it also spent significant time supporting R&E partners like Internet2, SURFnet, NORDUnet, RNP, and others interested in investigating the capabilities of this open-source software. We also appreciate Internet2’s commitment of testing resources for OSCARS 0.6 starting next year, to ensure a thoroughly vetted and stable platform in the April timeframe. This is just one example of what the R&E community can accomplish by committing to partnership and collaboration.
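The PCE-SDK's interface isn't given in the post. For readers new to path computation, here is a generic constrained shortest path first (CSPF) sketch, a common technique rather than the SDK's API: prune links whose available bandwidth is below the request, then run Dijkstra's algorithm on link cost. The topology, costs, and bandwidth figures are hypothetical.

```python
import heapq

# Hypothetical links: (node_a, node_b, cost, available_gbps).
LINKS = [
    ("A", "B", 1, 10),
    ("B", "D", 1, 2),
    ("A", "C", 2, 10),
    ("C", "D", 2, 10),
]


def cspf(links, src, dst, min_gbps):
    """Cheapest path from src to dst using only links with >= min_gbps available."""
    graph = {}
    for a, b, cost, avail in links:
        if avail >= min_gbps:  # constraint step: prune thin links
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))

    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))

    if dst not in dist:
        return None  # no path satisfies the bandwidth constraint
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


print(cspf(LINKS, "A", "D", 1))  # cheapest path overall
print(cspf(LINKS, "A", "D", 5))  # B-D is too thin, so the route detours via C
```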
perfSONAR kept up its rapid pace of feature additions and new releases, developed jointly with Internet2 and others. In addition to rapid progress in software capabilities, ESnet is aggressively rolling out perfSONAR nodes in its 10G and 1G PoPs, creating an infrastructure where the network can be tuned to hum. Having helped solve multiple thorny network problems, perfSONAR has proven to be a great tool for delivering value. This year we focused on making perfSONAR easy to deploy and on adding the operational features needed to turn it into a production service. An excellent workshop in August succinctly captured the challenges and opportunities of using perfSONAR both for operational troubleshooting and for research into how networks can be improved. Joint research projects continue to stimulate further development, with a focus on solving end-to-end performance issues.
Life in technology tends to be interesting, even though people keep warning about the commoditization of networking gear. The focus area for innovation just shifts, but never goes away. Some areas of interest as we evaluate our longer term objectives next year:
Enabling the end-to-end world: What new enhancements or innovations are needed to deploy performance measurement and control techniques that enable seamless end-to-end application performance?
Life in a Terabit digital world: What network innovations are needed to meet the requirement for Terabit connectivity between supercomputer centers in the 2015-2018 timeframe?
Life in a carbon economy: What are the low-hanging fruit for networks to become more energy-efficient and/or enable energy efficiency in the IT ecosystem they are part of? Cloud-y or Clear?
Part 1: Considering the state of 100G and the state we’re in
The past year slipped by at a dizzying pace for us at ESnet as we made new forays into cutting-edge technologies. In this two-part blog post, we will recap the year’s accomplishments and also consider the challenges facing us in the year to come as we progress toward delivering the Advanced Networking Initiative.
One of our prime directives with ANI funding was to stimulate the 100G market towards increasing spectral efficiency. In the last year, we have had wonderful engagement with the vendors that are moving products in this direction. Coherent receivers and DP-QPSK modulation are now standard fare for the 40G/100G solutions. At the latest conference, IEEE ANTS, in Mumbai last week, the 100G question was considered solved. Researchers are now exploring innovative solutions to deliver a next generation of 100G with higher power efficiency, or jump to the next level in delivering 400G. One researcher at the Indian Institute of Technology, Mumbai, is looking at revolutionizing the power consumption curve of the digital processing paradigm of coherent solutions by investigating analog processing techniques (super secret, so we will just have to wait and see).
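The spectral-efficiency gain behind these 100G designs can be checked with simple arithmetic. DP-QPSK carries 2 bits per symbol on each of two polarizations, so 100 Gbps needs roughly 25 GBaud before FEC and framing overhead (real transponders run somewhat faster to carry that overhead). Fitting 100 Gbps into a 50 GHz ITU channel gives 2 bit/s/Hz, versus 0.2 bit/s/Hz for a 10G wavelength in the same slot. The sketch ignores overhead and is illustrative only.

```python
def symbol_rate_gbaud(bit_rate_gbps, bits_per_symbol_per_pol, polarizations=2):
    """Symbol rate needed for a payload bit rate (no FEC or framing overhead)."""
    return bit_rate_gbps / (bits_per_symbol_per_pol * polarizations)


def spectral_efficiency(bit_rate_gbps, channel_ghz):
    """Bits per second carried per hertz of channel spacing."""
    return bit_rate_gbps / channel_ghz


print(symbol_rate_gbaud(100, bits_per_symbol_per_pol=2))  # DP-QPSK: 25.0 GBaud
print(spectral_efficiency(100, 50))                        # 2.0 bit/s/Hz
print(spectral_efficiency(10, 50))                         # 0.2 bit/s/Hz
```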
A representative from OFS, the optical fiber company, described research on new fibers designed for the coherent world that will enable better performance. He cited hero experiments, papers, and research presented at this year’s OFC, touting the advantages of new fiber demonstrated in joint work by Alcatel-Lucent Bell Labs and OFS (ex-Lucent) research collaborators. A lot of fiber is still being laid in developing countries, which are well positioned to take advantage of this new research to bring cheaper broadband connectivity to so-far underserved communities.
Some selected points raised at the panel regarding 400G and beyond:
Raman amplification is coming back in vogue
The 50 GHz ITU grid needs to evolve to flexi-grid technology. With flexi-grid, some of the basic modem concepts of negotiation (remember the auto-sensing modems of the late ’90s) are back: based on distance and loss, the appropriate grid spacing can be negotiated for each wavelength.
If the industry sticks with electrical compensation, optical equipment will see increased electricity consumption by the power-hungry Analog-Digital Conversion (ADC) and Digital Signal Processing (DSP) ASICs. With advances in CMOS, the status quo might not suffice in a few years, especially since the whole industry is out there sticking the router vendors with a big “power-hungry” sticker. The power-consumption tradeoffs still need to be studied and appropriate comparisons made. I hope the vendors also develop a perspective in that direction.
Comcast, the only other vendor on the panel, mentioned current capacities of 30x40G (coherent) on some links of their backbone and their eagerness to deploy 100G solutions. They are WAY ahead in deploying 100G, though the industry seems to not broadcast such news widely.
Comcast felt that coherent optics in the metro area is overkill and entreated the vendors not to build one-size-fits-all solutions, but simpler ones (which, they hope, would also make 100G more affordable).
There was little discussion of the 100GE standards, although there was a clear message that LR-10 is here to stay, mainly supported by data center customers, while almost all traditional carriers intend to deploy LR-4 once it starts costing less than a Ferrari.
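The flexi-grid point above can be made concrete. ITU-T G.694.1 defines a flexible grid whose slot widths are multiples of 12.5 GHz, so each wavelength can be given just the spectrum its modulation needs. The sketch below assumes a hypothetical reach table (DP-16QAM for shorter paths, DP-QPSK for longer ones), a roll-off of 0.1, and a 2 GHz guard band; all three are illustrative assumptions rather than vendor figures, and real negotiation would use measured loss and OSNR rather than distance alone.

```python
import math

SLOT_GHZ = 12.5  # flexible-grid slot width granularity (ITU-T G.694.1)

# Hypothetical reach table: (max_km, format, bits per symbol over both polarizations).
# Illustrative numbers only; real reach depends on fiber, amplifiers, and margins.
REACH_TABLE = [
    (600, "DP-16QAM", 8),
    (3000, "DP-QPSK", 4),
]


def pick_format(distance_km):
    """Choose the densest modulation format that the table says reaches this far."""
    for max_km, name, bits in REACH_TABLE:
        if distance_km <= max_km:
            return name, bits
    raise ValueError("no format in the table reaches this distance")


def slot_width_ghz(bit_rate_gbps, distance_km, rolloff=0.1, guard_ghz=2.0):
    """Return (format, slot width) rounded up to whole 12.5 GHz slots."""
    name, bits = pick_format(distance_km)
    occupied = bit_rate_gbps / bits * (1 + rolloff) + guard_ghz
    return name, math.ceil(occupied / SLOT_GHZ) * SLOT_GHZ


for rate, km in [(100, 2000), (400, 500), (400, 2000)]:
    print(rate, "Gbps over", km, "km:", slot_width_ghz(rate, km))
```

The same 400 Gbps demand occupies a narrower slot on a short path than on a long one, which is exactly the per-wavelength negotiation a fixed 50 GHz grid cannot express.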
At Supercomputing 2010, the SCinet community orchestrated and deployed 100G-capable equipment from Brocade, Cisco, Ciena and Juniper, to name a few vendors, and included 100G host demonstrations of data transfers by NASA. It was encouraging to see the Academic and R&E community lead deployment and testing of 100G [See a sample poster below].
The SCinet community lives on the “bleeding edge” and supported a demonstration by Internet2, ESnet, and other partners carrying live 100 Gbps application data over a 100G wave from Chicago to the New Orleans show floor.
We are looking forward to Seattle (SC11) and can already predict multiple 100Gs of bandwidth coming to the show floor. If you have any cool demonstrations you would like to collaborate with us on, please drop us a note.
As science transitions from a lab-oriented activity to a distributed, computational, and data-intensive one, the research and education (R&E) networking community is tracking the growing data needs of scientists. Huge instruments like the Large Hadron Collider are being planned and built. These projects require global-scale collaborations and contributions from thousands of scientists, and as the data deluge from the instruments grows, even more scientists are interested in analyzing it for the next breakthrough discovery. Suffice it to say that even though worldwide video consumption on the Internet is driving a similar increase in commercial bandwidth, the scale, characteristics, and requirements of scientific data traffic are quite different.
And this is why ESnet got invited to Cisco Systems’ headquarters last week to talk about how we handle data, as part of their regular Nerd Lunch talk series. What I found interesting, although not surprising, was that with Cisco being a big evangelist of telepresence, more employees attended the talk from their desks than in person. This was a first for me, and I came away with a new appreciation for the challenges of collaborating across distances.
From a speaker’s perspective, the lesson I learned was to brush up on my acting skills. My usual preparation is to rehearse the difficult transitions and focus on remembering the few important points to make on every slide. When presenting, the slide-presentation portion of my brain goes on auto-pilot while my focus turns to evaluating the impact on the audience. When speaking at a podium, one can observe when someone in the audience opens a notebook to jot down a thought, when their attention drifts to email on the laptop in front of them, or when a puzzled look appears on someone’s face as they try to figure out the impact of the point I’m making. But these visual cues go missing with a largely webcast audience, making it harder to know when to stop driving home a point or when to explain it further. In the future, I’ll have to be better at keeping the talk interesting without the usual cues from my audience.
Maybe the next innovation in virtual-reality telepresence is just waiting to happen?
Notwithstanding the challenges of presenting to a remote audience, enabling remote collaboration is extremely important to ESnet. Audio, video, and web collaboration is a key service we offer to the DOE labs. ESnet employees use video extensively in our day-to-day operations. The “ESnet watercooler”, a 24×7 open video bridge, is used internally by our distributed workforce to discuss technical issues as well as to hold ad hoc meetings on topics of interest. As science goes increasingly global, scientists are also using this important ESnet service for their collaborations.
With my brief stint on stage now over, it is back to ESnet and then on to the 100G invited panel/talk at the IEEE ANTS conference in Mumbai. Wishing all of you a very Happy New Year!
While we have been busy working towards a 100G ANI prototype wide area network (WAN), researchers at Intel are making sure that we have plenty to do in the future. Yesterday’s Wall Street Journal article (http://on.wsj.com/dcf5ko) on Intel demonstrating 50Gbps communication between chips with silicon-based lasers, is just the tip of the iceberg of competitive research looming in the arena of photon-electron integration.
This demonstration from Intel (Kudos to them!) is a great reminder of how such innovations can revolutionize the computing model by making it easier to move large amounts of data between the chips on a motherboard or between thousands of multi-core processors, leading the way towards exascale computing. Just imagine the multi-terabit fire hose of capacity ESnet would have to turn on to keep those chips satisfied! This seamless transition from electronics to photonics, without dependence on expensive sets of photonic components, has the potential to transform the entire computing industry and give an additional boost to the “Cloud” industry. Thomas J. Watson has been credited with saying “The world needs only five computers”. We may well be collecting the very innovations that prove him right one day.
While we do get excited about the fantastic future of silicon integration, I would like to point out that the PIC (Photonic Integrated Circuit) has been a great innovation by Infinera, a company just down the road in Silicon Valley. Infinera is actually mass-producing integrated lasers on a chip for a different application, long-distance communication, using a substrate material other than silicon. This technology is for real. You can get to play with the Infinera systems in our ANI testbed; you just need to come up with a cool research problem and write a proposal by October 1st, 2010.
History is being written: from a simple diagram published in 1976 by Dr. Robert Metcalfe, with a data rate of 3 Mbps, Ethernet has surely come a long way in the last 30-plus years. Coincidentally, ESnet’s parent, MFEnet, was also launched around the same time as a result of the new Fusion Energy supercomputer center at Lawrence Livermore National Laboratory (LLNL) http://www.es.net/hypertext/esnet-history.html. It is remarkable to note that right now, as the 100GE standard is ratified, ESnet engineers are very much on the ball, busy putting 100GE-enabled routers through their paces within our labs.
For ESnet and the Department of Energy, it is all about the science. To enable large-scale scientific discovery, very large scientific instruments are being built. You have read about DUSEL on this blog and are familiar with the LHC. These instruments (particle accelerators, synchrotron light sources, large supercomputers, and radio telescope farms) are generating massive amounts of data and involve large collaborations of scientists to extract useful research results from it. The Office of Science is looking to ESnet to build and operate a network infrastructure that can scale up to meet the highly demanding performance needs of scientific applications. The Advanced Networking Initiative (ANI) to build the nationwide 100G prototype network and a research testbed is a great start. If you are interested in being part of this exciting initiative, do bid on the 100G Transport RFP.
As a community, we need to keep advancing the state of networking to meet the oncoming age of the digital data deluge (D³).
Steve Cotter, Department Head, ESnet at Lawrence Berkeley National Laboratory: “As the science community looks at collaboratively solving hard research problems to positively impact the lives of billions of people, for example research on global climate change, alternative energy and energy efficiency, as well as projects including the Large Hadron Collider that probe the fundamental nature of our universe – leveraging petascale data and information exchange is essential. To accomplish this, high-bandwidth networking is necessary for distributed exascale computing. Lawrence Berkeley National Laboratory is excited to leverage this standard to build a 100G nationwide prototype network as part of ESnet’s participation in the DOE Office of Science Advanced Networking Initiative.”