Infrastructure as Code and Flying to the Moon
Space is awesome
The Space Launch System will be the most powerful rocket that NASA has ever built.
Seriously, this thing is incredible. The Space Shuttle was gorgeous, but it didn’t have the power or the architecture to get us any further than low Earth orbit.
If we want to get humanity back to the Moon – and beyond – then we’re going to need a bigger boat.
Absolutely, we’re going back to the Moon.
And this time, we’re staying.
In the not-too-distant future, NASA plans to use the SLS to ferry astronauts up into orbit around the Moon to build humanity’s first lunar outpost. It’ll be our jumping-off point for missions to the Moon’s surface, to Mars, and the rest of the Solar System.
I get happy little chills just thinking about it!
All about multi-stage vehicles
Like all heavy-lifting spacecraft, the SLS will be a multi-stage vehicle.
Breaking free of Earth’s gravity takes fuel, and fuel costs money. Every gram shaved off a launch proposal increases the chances of it being funded.
Rockets are big, weighty things, but – unlike the food, water, scientific equipment and everything else on board – rockets literally pull their weight. Everything else is just along for the ride. As long as a rocket is providing enough thrust to overcome its own mass, it’s contributing to the launch.
Eventually, though, an engine’s fuel will dwindle to the point where it can’t create enough thrust, and it becomes a net drain rather than net contributor. The mass that the other engines need to carry is increased, and this naturally makes them incredibly resentful.
This is why we have multi-stage vehicles. Each “stage” is a means of providing thrust, and is connected to the vehicle with explosive bolts. As long as a stage is providing more thrust than drag, it’s welcome to stay. As soon as it becomes a burden, the bolts are blown and it’s left behind.
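If you'd like to see the payoff in numbers, here's a rough sketch using the Tsiolkovsky rocket equation, which says the change in velocity from a burn is the exhaust velocity multiplied by the natural log of the start mass over the end mass. Every figure below (masses in tonnes, a 3,000 m/s exhaust velocity) is made up for illustration, and the model ignores gravity and air resistance entirely. The function names are my own.

```python
import math


def delta_v(exhaust_velocity, start_mass, end_mass):
    """Tsiolkovsky rocket equation: velocity gained by one burn."""
    return exhaust_velocity * math.log(start_mass / end_mass)


def total_delta_v(payload, stages, exhaust_velocity, jettison):
    """Sum the velocity gained as each stage burns in turn.

    stages is a list of (dry_mass, propellant_mass) pairs, burned in order.
    When jettison is False, spent stages stay bolted on as dead weight.
    """
    mass = payload + sum(dry + propellant for dry, propellant in stages)
    total = 0.0
    for dry, propellant in stages:
        total += delta_v(exhaust_velocity, mass, mass - propellant)
        mass -= propellant  # the propellant has been burned
        if jettison:
            mass -= dry  # blow the bolts and leave the empty stage behind
    return total


# Illustrative numbers only: two identical stages and a 10-tonne payload.
stages = [(10.0, 90.0), (10.0, 90.0)]
staged = total_delta_v(10.0, stages, 3000.0, jettison=True)
unstaged = total_delta_v(10.0, stages, 3000.0, jettison=False)
print(f"jettisoning spent stages: {staged:.0f} m/s")
print(f"carrying spent stages:    {unstaged:.0f} m/s")
```

Same propellant, same engines, same payload. The only difference is whether the empty first stage gets dragged along for the second burn, and the version that drops it comes out ahead.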
Here’s an incredible video showing the Space Shuttle’s boosters being jettisoned around the 1:55 mark:
Expensive reusable vs cheap disposable boosters
You’ve probably already seen SpaceX’s approach to designing boosters. After a Falcon 9 booster detaches, it performs a controlled return to Earth for recovery and possible reuse.
I’m not ashamed to admit, I wept the first time I saw those things land. It still brings a tear to my eye. It’s so far beyond the spaceflight I grew up knowing, and my little heart runneth over with joy.
NASA is taking a more traditional approach with the SLS’s boosters: these things are going to crash into the ocean for instant retirement after their first and only mission. There’s a time and a place for investing in radically new technologies, and NASA has decided that this ain’t it.
So, since the boosters will be used just once before they’re scrapped, NASA will want to keep the cost as low as possible. This decision will drive others.
Liquid-fuel vs solid-fuel
NASA also had to choose between liquid-fuel and solid-fuel engines.
Solid-fuel engines are often compared to fireworks, and that’s not far off. They’re literally filled with a solid explosive cake, and there’s no going back after it’s been lit. It can’t be stopped, and it can’t be throttled, so you better be sure that you’re strapped in and ready to go.
Liquid-fuel engines, on the other hand, can be regulated. They can throttle up and down, and even be shut down in an emergency. Liquid-fuel engines are also generally more efficient than solid-fuel boosters, getting more push out of every kilogram of propellant.
So liquid-fuel engines feel like an easy win, right? Unfortunately, they come with a considerable drawback: they’re far more expensive than solid-fuel engines. This makes them less palatable to abandon in the ocean after just one flight.
If only there was a way of manufacturing cheap liquid-fuel engines. Especially a way that’s been proven already.
That would be lovely, right?
We’ve done this before
The fact is, NASA has flown an affordable, single-use liquid-fuel engine before.
The Rocketdyne F-1 engine pushed the first stage of the Saturn V, which took astronauts to the Moon in the 1960s and early 1970s. The simple design made it relatively cheap to manufacture, and less depressing to just throw away after lift-off.
NASA has flown this engine before. They’ve still got the blueprints. So, surely, stamping out more of the same should be easier, and way cheaper than it was fifty years ago?
So, why not?
Why can’t we just build more of the same?
I’m going to spoil the ending, so you should watch the video first.
Did you watch it? Okay then, here we go…
The Rocketdyne F-1 engine was designed and manufactured in the days before computer-aided design, computerised simulations and high-precision manufacturing.
These days, we’re blessed with experience and technology which give us a much deeper understanding of engineering, including:
- How to model the stresses inside a system.
- How the properties of materials change under stress.
- How to manufacture materials more consistently to specifications.
- How to simulate designs before committing metal to welding.
This deep knowledge allows our designs to be more accurate up-front. It’s this kind of knowledge that allows – for example – two different companies to manufacture two different sides of a spacecraft docking port, and have them work perfectly when they meet for the first time in space.
Without that modern knowledge, every individual F-1 engine was a unique snowflake that was lovingly tweaked by teams of experts until the combined quirks of every imperfection were tuned into a working machine.
The engineers had blueprints to follow, and NASA still has them. The point is that – by today’s standards – they’re more aspirational than instructional. They don’t detail the “secret sauce” that the engineers had to add to each engine to make them work.
There was no malice to the secrecy. The engineers were working to tight deadlines, and didn’t have time to commit every tweak to paper. I doubt there was much interest in documenting the tweaks anyway; what one engine needed, most others probably wouldn’t.
Also, at the time, the engineering skills required to diagnose and tweak the problems were commonplace enough that there was simply no need to document a lot of it. Any engineer looking at those plans just a few years later might’ve had a chance of making them work. But now, fifty years later, those skills are lost.
So yes, NASA has the blueprints for a low-cost liquid-fuel engine, but they don’t know what the engineers of the time had to do to each one to make it work.
Infrastructure as code and “deploying more of the same”
The good news for us software crafters is that we don’t need to worry about losing this knowledge in quite the same way that the F-1’s engineers did.
One of the many – many! – benefits of infrastructure as code is the ability to use that code as a template to deploy the same infrastructure over and over again.
- When you’ve written the infrastructure to deploy a single service, stamp it out again to deploy the rest of your services.
- When you’ve written the infrastructure to deploy a single environment, stamp it out again to deploy the rest of your environments.
- When you’ve written the infrastructure to deploy your platform to a single region, stamp it out again to deploy the rest of your regions.
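To make "stamping out" a little more concrete, here's a minimal sketch in plain Python. It isn't tied to any real infrastructure tool: the `Deployment` shape, the `stamp_out` function and the service, environment and region names are all hypothetical. The point is only that one template, fed different parameters, produces the same shape of infrastructure every time.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Deployment:
    """One stamped-out copy of the infrastructure (a hypothetical shape)."""

    service: str
    environment: str
    region: str
    instance_count: int

    @property
    def name(self) -> str:
        return f"{self.service}-{self.environment}-{self.region}"


def stamp_out(services, environments, regions, instance_count=2):
    """Render the same template once per service, environment and region."""
    return [
        Deployment(service, environment, region, instance_count)
        for service, environment, region in product(services, environments, regions)
    ]


# Hypothetical names, purely for illustration.
plan = stamp_out(
    services=["login", "billing"],
    environments=["staging", "production"],
    regions=["eu-west", "us-east"],
)
for deployment in plan:
    print(deployment.name)
```

Adding a third region is a one-word change to the inputs, and running the template again with the same inputs gives exactly the same plan. That repeatability is the whole trick.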
The alternative to deploying infrastructure as code is to deploy it manually. That means getting humans to type commands and click buttons, and hoping they do it all the same, accurately, every time.
To keep your infrastructure as code valuable, you have to ensure that every new feature and bug fix goes into the code. As soon as you make an ad-hoc change to your infrastructure, you’ve lost reproducibility. Your template won’t stamp out “more of the same” anymore, because what it stamps out won’t be exactly what you’re expecting. The “secret sauce” of the manual tweak will be lost to history.
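Here's a small sketch of what that lost "secret sauce" looks like when you go hunting for it. It compares what the code declares against what's actually running and reports every difference. The setting names and the `find_drift` helper are invented for this example; real tools do this against a live platform rather than a pair of dictionaries.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def find_drift(declared, live):
    """Return {setting: (declared_value, live_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(declared) | set(live)):
        want = declared.get(key, ABSENT)
        have = live.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical settings: what the code says versus what is really running.
declared = {"instance_count": 2, "login_timeout_seconds": 30}
live = {"instance_count": 2, "login_timeout_seconds": 120, "debug_logging": True}

drift = find_drift(declared, live)
for setting, (want, have) in drift.items():
    print(f"{setting}: code says {want!r}, live says {have!r}")
```

Anything this reports is a tweak that lives only on the running system. Stamp out a new copy from the code and that tweak won't come with it.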
“Stop the bleeding”
Despite what I just said, there is a time and place for quick ad-hoc changes to infrastructure.
My most excellent friend Stu introduced me to a phrase I carry in my soul and apply every time something goes wrong:
“Stop the bleeding.”
The phrase originates from medical triage; if a patient lands in front of you with a broken leg, with the bone punctured through the skin and blood gushing everywhere, you’ve got to stop the bleeding before you try to fix anything else.
Stopping the bleeding keeps the patient alive while you come up with a plan to solve the underlying problem. Trying to re-set the bone while the patient bleeds out might solve the problem of the break, but the patient will be dead.
Likewise, you might find yourself in a scenario where an infrastructure bug is preventing users logging into your platform. The support calls are flooding in. Your top five customers are pointing at the compensation clause in your service-level agreement. And your boss is looking at you.
The worst thing you could do now is get stuck in a pure infrastructure-as-code mindset. If you slipped your headphones on and started calmly branching your code, thinking about the tests you need to write and the nature of the long-term fix, then sure, you might end up with an awesome fix, but your customers will already have moved to your competitor.
The underlying problem can wait. You need to stop the bleeding first. You need to pull your team together, diagnose the problem, and fix it immediately – even if that means logging in and manually changing a line of code on a live server. Your team will be there to review the change, consider the ramifications, minimise the risk, and – basically – ensure the tourniquet is good and tight, even if the bone is still broken underneath.
And then, when your customers are able to log in and carry on with business as usual, you and your team can take a deep breath, analyse the problem, and update your infrastructure as code to apply the mitigation across all your existing infrastructure, and ensure it never happens in any new infrastructure you deploy in the future.
Okay, it’s not quite the same thing
By no means am I sneering at the F-1 engineers for not updating their plans. The challenges they were facing – and the deadlines they were up against – were unique in all of humankind’s history. Their mandate wasn’t sustainability; it was velocity.
You very likely work on a project with a longer-term view, where there’s more business value in your work being reproducible and scalable than in being first. Deploying your infrastructure as code – and keeping it up to date with all your features and fixes – will keep you ready to deploy more of what you already have at a moment’s notice.