re:Invent: « We Need High-Level Compilers to Program the Cloud Machine »
In my last post I reported on the TEST & TEST advice I brought back from re:Invent, and the idea that cloud applications are becoming an art form.
Today an ocean of computing power is available everywhere. Even though NASA landed Curiosity on Mars using distributed applications in a cloud infrastructure, testing distributed applications in the cloud is still far too much of a “dark art,” one that wastes time, money and talent. What can be done to improve this situation?
At re:Invent, the Netflix CEO alluded to the answer. He said that he felt he and his team were trying to optimize the raw cloud machine the way he used to optimize register usage in the old days of assembler programming, some 30 years ago. We are still at the early stages of creating TEST & SET instructions in the cloud! We are still figuring out how to program the cloud machine. High-level languages and compilers are needed… and I expect they will come soon!
And remember, when we think of TEST & SET, Amdahl is not far away! It reminds me of his foundational paper, “Validity of the single processor approach to achieving large scale computing capabilities,” where he described what became “Amdahl’s law” (the speedup formula of parallel computation). I also remember the very important problems discussed in this paper: Amdahl had foreseen many negative factors plaguing the parallel computation of irregular problems. These included the fact that the propagation rates of different physical effects may have very different impacts on overall performance. And this is exactly my point here: the stability and predictability of latency and bandwidth is a critical issue in the Cloud, the massively shared “software defined parallel machine.”
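To make the speedup formula concrete, here is a minimal sketch in Python. It assumes the usual modern statement of Amdahl’s law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and parameter names are mine, chosen for illustration only.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law (hypothetical helper for illustration).

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    workers: number of processors or cloud nodes (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not accelerated; only the parallel part is divided
    # among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 95% of the application parallelizes.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With 95% of the work parallel, the speedup can never exceed 1 / (1 - 0.95) = 20, however many nodes you rent. The formula is also the optimistic case: it ignores exactly the unstable latency and bandwidth discussed here, which can only make things worse.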
This is why developers TEST and TEST the network and the availability of nodes. Continuously!
We experienced a similar story 10 years ago when building the first global grid with CERN in the DataGrid project. We observed that the grid application developers were testing and re-testing the network to adapt and boost their applications. To help them, we developed a centralized, “on-demand” network monitoring system. Our goal was to eliminate the anarchic invasion of bandwidth and latency probes. My friend Rich Wolski, today CTO at Eucalyptus, also designed the very smart Network Weather Service for forecasting the state of the network in a grid infrastructure.
Yes, visibility is critical to help the application artist to better see what is happening within the IT cloud and anticipate performance issues as well as expenses – or to be cost-aware, as Werner Vogels from Amazon says.
I used to promote “cloud visibility” in my presentations, using the airplane metaphor to explain it more clearly.
Weather forecasts are very important for the airplane pilot. They help to shape the navigation plan. But they alone are not sufficient. Pilots mainly rely on their own on-board instruments when they are in the clouds and turbulence and need to make decisions in real time. That’s why an easy-to-use and powerful “navigation system” is important when you are in the cloud!
If you are a pilot, you know that you cannot fly in real clouds, facing turbulence, without instruments on board! In much the same way, we are now seeing many monitoring systems being created for the IT cloud! But you must be careful to select the right one, because not all “visibility instruments” will provide the clear picture you really need. With most of these systems, be prepared to be flooded by tons of monitoring curves and numbers of all kinds. And be prepared to do the triage yourself, even if you are not a network or system specialist!
The point I’m making here is that a navigation system should not just be a collection of “meters.” In a plane, a “navigation system” provides the critical information the pilot needs to make the right decision at the right time. Such “navigation systems” for the IT cloud should be just as sophisticated. In this case, it is not an art. We also need rocket science to navigate safely!
Maybe SDN (Software-Defined Networking) solutions will provide a remedy based on a consistent view of the network, adapted to the needs of cloud infrastructure users?
That said, my advice for today is: Do not forget to check your Cloud valve and adjust it manually if necessary!
Get visibility, but also make decisions. Try and adapt. Decide if you want to be conservative or aggressive. These are not just technical issues, but also business and financial considerations.