AWS Trivia - Broken user data and instance tag timing
Have you ever noticed that when new instances are created, the “Tags” tab in the AWS console has no content for the first few seconds? A second or two before the values appear may not seem like much, but it can lead to elusive provisioning issues, especially if you’re autoscaling and your user data scripts have network dependencies that are easy to blame instead.
A lot of people use tag values in their user data scripts to help ‘inflate’ AMIs. This defers some configuration, such as which config management classes to apply, to run time when the instance is started, rather than embedding it at build time when the AMI itself is created. In the overwhelming majority of cases everything will work exactly as you expect. Instances will start, tags will be applied and user data will determine how to configure the instance based on their values. However, very rarely, the user data script will begin before the tags are applied to the instance.
If your script requires these tag values then you need to consider this rare issue and decide how to handle it. You can ignore it, as it’s very rare. If you’re using tags to assign config management roles or similar, you can provide sensible defaults, such as applying the base class. It’s possible to ensure that instances that don’t detect their tags fail their health checks, so they are marked as defective and terminated before they come into service. You can also stack the odds a little more in your favour by reading the tags a little later in your user data: run that apt-get update, or the curl that installs the AWS agent, before fetching the instance’s tags, to give them more time to be applied.
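As a rough illustration of the “retry, then fall back to a default” approach, here is a minimal Python sketch. The function names (wait_for_tags, fetch_tags_boto3, resolve_role), the role tag key and the base default are all made up for this example. Only the boto3 describe_tags call is a real API, and the sketch assumes the instance profile is allowed to call ec2:DescribeTags.

```python
import time


def wait_for_tags(fetch_tags, required, attempts=5, delay=2.0, sleep=time.sleep):
    """Poll fetch_tags() until every key in `required` is present.

    Returns the tag dict, or None if the tags never showed up.
    """
    for attempt in range(attempts):
        try:
            tags = fetch_tags()
        except Exception:
            # Tagging is a separate service and can fail on its own,
            # so treat an API error the same as "no tags yet".
            tags = {}
        if all(key in tags for key in required):
            return tags
        if attempt < attempts - 1:
            sleep(delay * (attempt + 1))  # linear back-off between polls
    return None


def fetch_tags_boto3(instance_id, region):
    """One possible fetch_tags implementation, using the EC2 DescribeTags API."""
    import boto3  # imported lazily so the retry logic has no hard dependency

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )
    return {tag["Key"]: tag["Value"] for tag in response["Tags"]}


def resolve_role(tags, default="base"):
    """Fall back to a sensible default class when the tags never arrived."""
    if tags is None:
        return default
    return tags.get("role", default)  # "role" is a hypothetical tag key
```

In user data you might call wait_for_tags(lambda: fetch_tags_boto3(instance_id, region), ["role"]) and then either pass the result to resolve_role or, if it comes back as None, exit non-zero so the instance fails its health check and gets replaced. Which of those you pick is exactly the decision described above.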
Tagging is often a simple afterthought, but in the cloud you need a very firm understanding of which things are atomic units and which are separate services that can fail independently. Although tags may seem like a direct property of the instance, they are actually handled (I think) by a completely different service, which can always fail. Understanding this split also explains why, by default, you can’t read tags and their values from the local metadata service. As an aside, the metadata service itself can, even more rarely, be unavailable. That was a fun afternoon.
I’ll leave you with a closing comment from the days when you could only have 10 tags. Tag values can be complex strings, for example, JSON objects. Possibly even compressed and base64 encoded JSON objects. Just putting that out there.
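For the curious, here is a small Python sketch of what packing a JSON object into a single tag value could look like. The helper names (pack_tag_value, unpack_tag_value) are invented for this example, and the 256 character cap reflects my understanding of the EC2 tag value limit, so check the current limits before relying on it.

```python
import base64
import json
import zlib

MAX_TAG_VALUE_LENGTH = 256  # assumed EC2 tag value limit, in characters


def pack_tag_value(data):
    """Serialise `data` to compact JSON, compress it and base64 encode it."""
    raw = json.dumps(data, separators=(",", ":"), sort_keys=True).encode("utf-8")
    packed = base64.b64encode(zlib.compress(raw, 9)).decode("ascii")
    if len(packed) > MAX_TAG_VALUE_LENGTH:
        raise ValueError(
            f"packed value is {len(packed)} characters, "
            f"limit is {MAX_TAG_VALUE_LENGTH}"
        )
    return packed


def unpack_tag_value(value):
    """Reverse pack_tag_value: base64 decode, decompress and parse the JSON."""
    return json.loads(zlib.decompress(base64.b64decode(value)).decode("utf-8"))
```

Bear in mind that compression only pays off on larger or repetitive payloads. For a tiny object the zlib and base64 overhead can make the packed value longer than the plain JSON, so it is worth measuring both.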