I review a lot of other people's Ansible code and write a lot of my own. While analyzing mistakes (both other people's and mine), and after a fair number of interviews, I realized the main mistake Ansible users make: they dive into the complicated parts without mastering the basics.

To correct this universal injustice, I decided to write an introduction to Ansible for those who already know it. I warn you: this is not a retelling of the man pages, it is a longread with a lot of text and no pictures.

Expected reader level: several thousand lines of YAML already written, something already in production, but "somehow everything is crooked."


The main mistake of an Ansible user is not knowing what things are called. If you do not know the names, you cannot understand what the documentation says. A real example: at an interview, a person who claimed to have written a lot of Ansible could not answer the question "what elements does a playbook consist of?". And when I hinted that "the expected answer was that a playbook consists of plays", the killer comment followed: "we do not use that." People write Ansible for money and don't use plays. In fact they do use them, they just don't know what they are.

So let's start with something simple: what things are called. Maybe you know this, or maybe not, because you didn't pay attention when you read the documentation.

ansible-playbook runs a playbook. A playbook is a file with the yml/yaml extension that contains something like this:

    ---
    - hosts: group1
      roles:
        - role1

    - hosts: group2,group3
      tasks:
        - debug:

We have already established that this whole file is a playbook. We can point to where the roles are and where the tasks are. But where is the play? And how does a play differ from a role or a playbook?

The documentation has it all. And it gets skipped. Beginners skip it because there is too much to remember at once. Experienced users skip it because these are "trivial things." If you are experienced, re-read these pages at least once every six months, and your code will get a class better.

So remember: a playbook is a list of plays and CDMY0CDMY.
This is one play:

    - hosts: group1
      roles:
        - role1

and this is another play:

    - hosts: group2,group3
      tasks:
        - debug:

What is a play? Why does it exist?

A play is the key element of a playbook because a play, and only a play, associates a list of roles and/or tasks with the list of hosts on which to run them. Deep in the documentation you can find mentions of CDMY1CDMY, local lookup plugins, network-cli-specific settings, jump hosts, and so on. They let you slightly change where tasks are executed. But forget about them. Each of these tricky options has very specific uses, and none of them is universal. We are talking about basic things that everyone should know and use.

If you want to execute something somewhere, you write a play. Not a role. Not a role with modules and delegates. You go and write a play. In its hosts field you list where to execute, and in roles/tasks you list what to execute.

Simple, right? How could it be otherwise?

A typical moment when people want to do this some way other than through a play is the "role that sets everything up": you would like one role that configures both servers of the first type and servers of the second type.

An archetypal example is monitoring. I would like to have a monitoring role that configures monitoring. The monitoring role is assigned to the monitoring hosts (by a play, accordingly). But it turns out that for monitoring we also need to install packages on the hosts being monitored. Why not use delegate? And iptables needs configuring too. delegate? And we also need to write or fix the config of the DBMS so that it lets monitoring in. delegate! And once creativity kicks in, you can do the delegation CDMY2CDMY in a nested loop over a tricky filter on the list of groups, and inside CDMY3CDMY you can do CDMY4CDMY yet again. And off we go.

A well-meaning wish, to have one single monitoring role that "does everything", leads us to hell, from which there is usually only one way out: rewrite everything from scratch.

Where did the mistake happen? At the moment you discovered that to do task "x" on host X you need to go to host Y and do "y" there, you should have performed a simple exercise: go and write a play that does y on host Y. Do not bolt something onto "x"; write it from scratch. Even with hardcoded variables.
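As a minimal sketch of that exercise, the monitoring example could be split into separate plays, one per place of execution. The group names (monitoring, monitored, dbservers), the role names, and the package name are hypothetical and chosen only for illustration.

```yaml
---
# Play 1: set up the monitoring server itself.
- hosts: monitoring              # hypothetical group name
  roles:
    - monitoring_server          # hypothetical role

# Play 2: install the agent on the hosts being monitored.
# A separate play instead of delegation from the monitoring role.
- hosts: monitored               # hypothetical group name
  become: true
  tasks:
    - name: Install monitoring agent
      ansible.builtin.package:
        name: monitoring-agent   # hypothetical package name
        state: present

# Play 3: adjust the DBMS config on the database hosts.
- hosts: dbservers               # hypothetical group name
  roles:
    - monitoring_db_access       # hypothetical role
```

Each play answers exactly one "what runs where" question, so no role has to decide on its own which host it touches.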

It seems that everything in the paragraphs above is correct. But this is not your case! You want to write reusable code that is DRY and looks like a library, and you need to find a way to do that.

Another gross error lurks here. It has turned many projects from tolerably written (could be better, but everything works and is easy to extend) into utter horror that even the author cannot understand. It works, but God forbid you change anything.

The error goes like this: a role is a library function. This analogy has ruined so many good undertakings that it is just sad to watch. A role is not a library function. It cannot perform calculations, and it cannot make play-level decisions. Remind me, what decisions does a play make?

Thank you, you're right. A play decides (or rather, contains the information about) which tasks and roles to execute on which hosts.

If you delegate this decision to a role, and with calculations on top, you doom yourself (and whoever tries to parse your code) to a miserable existence. A role does not decide where it is executed. That decision is made by the play. A role does what it is told, where it is told.

Why programming in Ansible is dangerous, and why COBOL is better than Ansible, we will discuss in the chapter about variables and jinja. For now, one thing: every calculation you make leaves an indelible trace in global variables, and you can do nothing about it. As soon as two such "traces" cross, all is lost.

Note for the pedantic: a role can certainly influence control flow. There is CDMY5CDMY, and it has reasonable uses. There is CDMY6CDMY. But! Remember, we are learning the basics? Forget about CDMY7CDMY. We are talking about the simplest and most beautiful Ansible code: easy to read, easy to write, easy to debug, easy to test, and easy to extend. So, once again:

A play, and only a play, decides what runs on which hosts.

In this section we sorted out the opposition between play and role. Now let's talk about tasks vs. roles.

Tasks and Roles

Consider a play:

    - hosts: somegroup
      pre_tasks:
        - some_tasks1:
      roles:
        - role1
        - role2
      post_tasks:
        - some_task2:
        - some_task3:

Let's say you need to do foo. And it looks like CDMY8CDMY. Where do you write it? In pre? In post? Create a role?

... And where did the tasks go?

We start again with the basics: the structure of a play. If you are unsure about this, you cannot use a play as the foundation for everything else, and your result will be "shaky."

The structure of a play: the hosts directive, settings for the play itself, and the pre_tasks, tasks, roles, and post_tasks sections. The remaining play options are not important to us right now.

The order of these sections with tasks and roles: CDMY9CDMY, CDMY10CDMY, CDMY11CDMY, CDMY12CDMY. Since the semantic order of execution between CDMY13CDMY and CDMY14CDMY is not obvious, best practice says to add a CDMY15CDMY section only if there is no CDMY16CDMY. If there is CDMY17CDMY, then all accompanying tasks go into the CDMY18CDMY/CDMY19CDMY sections.

What remains is semantically clear: first CDMY20CDMY, then CDMY21CDMY, then CDMY22CDMY.

But we still haven't answered the question: where should the CDMY23CDMY module call go? Do we need to write a whole role for each module? Or is it better to have one fat role for everything? And if not a role, then where: in pre or in post?

If you have no well-reasoned answer to these questions, that is a sign of missing intuition, those very "shaky foundations." Let's sort it out. First, a control question: if a play has CDMY24CDMY and CDMY25CDMY (and no tasks or roles), can something break if I move the first task from CDMY26CDMY to the end of CDMY27CDMY?

Of course, the wording of the question hints that it will break. But what exactly?

... Handlers. Reading the basics reveals an important fact: all handlers are flushed automatically after each section. That is, all tasks from CDMY28CDMY are executed, then all handlers that were notified. Then all roles are executed, followed by all handlers notified in the roles. Then CDMY29CDMY and their handlers.

Thus, if you drag a task from CDMY30CDMY to CDMY31CDMY, you will potentially execute it before the handler runs. For example, if a web server is installed and configured in CDMY32CDMY, and something is sent to it in CDMY33CDMY, then moving that task to the CDMY34CDMY section means the server will not be running yet at execution time, and everything will break.
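The web server case can be sketched as follows, assuming the two sections in question are pre_tasks and post_tasks. The group name, package name, template file, and URL are illustrative assumptions, not taken from the original.

```yaml
- hosts: webservers                  # hypothetical group name
  become: true
  pre_tasks:
    - name: Install web server
      ansible.builtin.package:
        name: nginx                  # assumed package name
        state: present

    - name: Deploy web server config
      ansible.builtin.template:
        src: nginx.conf.j2           # hypothetical template file
        dest: /etc/nginx/nginx.conf
      notify: restart nginx
  # Handlers notified in pre_tasks are flushed here, before post_tasks start.
  post_tasks:
    - name: Check that the server answers
      ansible.builtin.uri:
        url: http://localhost/       # assumed URL
        status_code: 200
  handlers:
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

If the uri check were moved to the end of pre_tasks, it would run before the "restart nginx" handler, that is, against a server that has not yet picked up its config.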

Now let's think again: why do we need CDMY35CDMY and CDMY36CDMY? For example, to complete everything necessary (including handlers) before the roles run. And CDMY37CDMY lets us work with the results of the roles (including handlers).

The pedantic Ansible connoisseur will tell us that there is CDMY38CDMY, but why would we need flush_handlers if we can rely on the execution order of sections in a play? Moreover, using meta: flush_handlers can surprise us with repeated handlers, produce strange warnings if CDMY39CDMY is used with CDMY40CDMY, and so on. The better you know Ansible, the more nuances you can name for a "tricky" solution. The simple solution, using the natural separation between pre/roles/post, has no such nuances.

And back to our 'foo'. Where do we put it? In pre, post, or roles? Obviously, that depends on whether we need the results of foo's handler. If not, then foo does not belong in either pre or post: these sections have a special meaning, running tasks before and after the main body of code.

Now the answer to "role or task" comes down to what is already in the play: if it has tasks, add to tasks. If it has roles, make a role (even one with a single task). A reminder: tasks and roles are not used at the same time.

Understanding the basics of Ansible gives reasoned answers to what seem to be questions of taste.

Tasks and roles (part two)

Now let's discuss the situation when you are just starting to write a playbook. You need to do foo, bar, and baz. Are these three tasks, one role, or three roles? To generalize the question: at what point should you start writing roles? What is the point of writing roles when you can write tasks? ... And what is a role?

One of the grossest mistakes (I have already mentioned it) is to think that a role is like a function in a program's library. What does a generalized function description look like? It takes input arguments, interacts with side causes, produces side effects, and returns a value.

Now, attention. Which of these can a role do? Producing side effects: always welcome, that is the essence of Ansible. Having side causes? Elementary. But "take a value and return it" is where it fails. First, you cannot pass a value to a role. You can set a global variable with play lifetime in the vars section for the role. You can set a global variable with play lifetime inside the role. Or even with playbook lifetime (CDMY41CDMY/CDMY42CDMY). But you cannot have "local variables". You cannot "take a value" and "return it."

The main point follows from this: you cannot write anything in Ansible without causing side effects. Changing global variables is always a side effect for a function. In Rust, for example, changing a global variable is CDMY43CDMY. In Ansible it is the only way to influence the values a role sees. Note the wording: not "pass a value to the role", but "change the values the role uses." There is no isolation between roles. There is no isolation between tasks and roles.
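A small sketch of the missing isolation, using set_fact as one real mechanism for setting such a variable. The variable name tmp_result is hypothetical.

```yaml
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Compute something that feels "local"
      ansible.builtin.set_fact:
        tmp_result: "value from the first play"   # hypothetical variable

- hosts: localhost
  gather_facts: false
  tasks:
    - name: The value is still visible in a different play
      ansible.builtin.debug:
        var: tmp_result
```

The second play never defines tmp_result, yet it can read it: the fact set in the first play stays attached to the host for the rest of the playbook run.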

In short: a role is not a function.

What is good about a role? First, a role has default values (CDMY44CDMY); second, a role has additional directories for storing files.

What is good about default values? Because Ansible has a rather perverse Maslow's pyramid of variable precedence, role defaults have the lowest priority (apart from ansible command-line parameters). This means that if you need to provide default values without worrying that they will override values from the inventory or group variables, role defaults are the only right place. (I am lying a little: there is CDMY45CDMY, but if we are talking about static places, it is only role defaults.)

What else is good about roles? They have their own directories. There are directories for variables, both constant (i.e. computed for the role) and dynamic (there is a pattern, or an anti-pattern: CDMY46CDMY together with CDMY47CDMY). There are directories for CDMY48CDMY and CDMY49CDMY. A role can also carry its own modules and plugins (CDMY50CDMY). But compared with a playbook's tasks (which can have all of this too), the only benefit is that files are piled not into one heap but into several separate heaps.

Another detail: you can try to make roles that are available for reuse (via galaxy). Since collections appeared, distributing roles can be considered almost forgotten.
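A minimal sketch of why role defaults are a safe place for default values. The role name webapp, the variable names, and the inventory group production are hypothetical; the two YAML documents below stand for two separate files.

```yaml
# File: roles/webapp/defaults/main.yml (hypothetical role)
# Lowest-priority values; anything in the inventory overrides them.
webapp_port: 8080
webapp_workers: 2
---
# File: group_vars/production.yml (hypothetical inventory group)
# Overrides only what differs for this group.
webapp_port: 80
```

Under these assumptions, hosts in the production group see webapp_port equal to 80 and webapp_workers equal to 2, and the role itself needs no conditionals to support the override.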

Thus, roles have two important features: they have defaults (a unique feature) and they allow you to structure the code.

Returning to the original question: when to use tasks and when roles? Tasks in playbooks are most often used either as "glue" before/after roles or as an independent building block (then there should be no roles in the code). A pile of ordinary tasks mixed with roles is plain sloppiness. Stick to one style: either tasks or roles. Roles give you separation of entities and defaults; tasks let you read the code faster. Usually the more "permanent" (important and complex) code goes into roles, and auxiliary scripts are written in the task style.

It is possible to do import_role as a task, but if you write this, be prepared to explain to your own sense of beauty why you want to.

The pedantic reader may say that roles can import roles, roles can have dependencies via galaxy.yml, and there is also the scary and terrible CDMY51CDMY. A reminder: we are improving our skills in basic Ansible, not in figure gymnastics.

Handlers and Tasks

Let's discuss one more obvious thing: handlers. The ability to use them correctly is almost an art. What is the difference between a handler and a task?

Since we recall the basics, here is an example:

    - hosts: group1
      tasks:
        - foo:
          notify: handler1
      handlers:
        - name: handler1
          bar:

In a role, handlers live in rolename/handlers/main.yaml. Handlers are shared among all participants of a play: pre/post_tasks can trigger role handlers, and a role can trigger handlers from the play. However, cross-role handler calls cause far more wtf than repeating a trivial handler. (Another best practice is to avoid reusing handler names.)

The main difference is that a task is always executed (idempotently), give or take tags and CDMY52CDMY, while a handler runs on state change (notify fires only if the task reported changed). What is the catch? For example, on a re-run, if nothing has changed, the handler will not run. And why might we need to run a handler when the notifying task reported no change? For example, because something broke: the task changed state, but execution never reached the handler. Say, because the network was temporarily down. The config has changed, the service has not been restarted. On the next run the config no longer changes, and the service keeps running with the old version of the config.

The config situation cannot be solved (more precisely, you can invent a special restart protocol with file flags and so on, but that is no longer 'basic ansible' in any form). But there is another common story: we installed an application, wrote its CDMY53CDMY file, and now we want to CDMY54CDMY and CDMY55CDMY it. The natural place for this seems to be a handler. But if you make it not a handler but a task at the end of the task list or role, it will be executed idempotently every time. Even if the playbook broke in the middle. This does not solve the restarted problem at all (you cannot make a task with the restarted attribute, because idempotency is lost), but state=started is definitely worth doing: the overall stability of playbooks grows, because the number of links and the amount of dynamic state shrink.
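A sketch of the "task instead of handler" variant, assuming the goal is a service that is enabled and running. The role path and the service name myapp are hypothetical.

```yaml
# File: roles/myapp/tasks/main.yml (hypothetical role), last task in the list
- name: Ensure the application service is enabled and running
  ansible.builtin.service:
    name: myapp            # hypothetical service name
    enabled: true
    state: started
```

Because this is an ordinary task, it runs on every pass and reports ok when the service is already up, so a run that broke halfway is repaired by the next one. It deliberately uses state: started rather than restarted, which would report a change on every run.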

Another positive property of a handler is that it does not clutter the output. No changes means no extra skipped or ok lines, which is easier to read. It is also a negative property: a typo in a linearly executed task shows up on the first run, but handlers run only on changed, i.e. under some conditions, very rarely. For example, for the first time ever five years later. And, of course, there will be a typo in the name and everything will break. And you cannot run them a second time, because there is no change anymore.

Separately, we need to talk about the availability of variables. For example, if you notify from a task with a loop, what will the variables contain? You can work it out analytically, but this is not always trivial, especially if the variables come from different places.

... So handlers are far less useful and far more problematic than they seem. If you can write something cleanly (without tricks) without handlers, do it without them. If you cannot do it cleanly, use them.

The pedantic reader will rightly note that we did not discuss CDMY56CDMY, that a handler can notify another handler, that a handler can include import_tasks (which can do include_role with with_items), that the handler system in Ansible is Turing-complete, that handlers from include_role intersect with handlers from the play in the most curious way, and so on. All of this is clearly not "the basics."

There is, however, one specific WTF that is actually a feature and must be remembered. If your task is executed with CDMY57CDMY and it has notify, then the corresponding handler is executed without CDMY58CDMY, i.e. on the host to which the play is assigned. (Although the handler can, of course, have CDMY59CDMY too.)

Separately, I want to say a few words about reusable roles. Before collections appeared, the idea was that you could make universal roles that you could CDMY60CDMY and go. They would work on all OSes, in all variants, in all situations. My opinion: this does not work. Any role with massive CDMY61CDMY, supporting 100,500 cases, is doomed to an abyss of corner-case bugs. You can plug them with massive testing, but as with any testing, either you have the Cartesian product of input values and a total function, or you have "covered some individual scenarios." In my opinion, it is much better when a role is linear (cyclomatic complexity 1).

The fewer ifs (explicit or declarative, in the form of CDMY62CDMY or CDMY63CDMY over a set of variables), the better the role. Sometimes branching is unavoidable, but, I repeat, the less of it, the better. So a seemingly good role from galaxy (it works!) with a bunch of CDMY64CDMY may be less preferable than your own role of five tasks. The moment when the galaxy role is better is when you start writing something. The moment when it becomes worse is when something breaks and you suspect the "role from galaxy." You open it, and there are five includes, eight task lists, and a stack of CDMY65CDMYs... And you need to figure it all out. Instead of five tasks in a linear list with nothing to break.

In the following parts

  • A little about inventory, group variables, the host_group_vars plugin, and hostvars. How to tie a Gordian knot out of spaghetti. Variable scope and precedence, the Ansible memory model. "So where do we store the username for the database after all?"
  • CDMY66CDMY - nosql, notype, nosense soft plasticine. It is everywhere, even where you do not expect it. A little about CDMY67CDMY and delicious yaml.