The apt module with state: present installs a package if missing, but does nothing if it already exists. The command and shell modules are not idempotent by default. Always prefer purpose-built modules (file, copy, service) that handle state checks themselves.
If you must use command or shell, make them idempotent with the creates or removes parameters, or check their output with register + when conditionals.
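A minimal sketch of guarding a shell command with creates (the installer path and marker file are illustrative):

```yaml
# Runs the installer only if the marker file does not yet exist,
# making the otherwise non-idempotent command safe to re-run.
- name: Run one-time installer
  command: /opt/myapp/install.sh
  args:
    creates: /opt/myapp/.installed   # task is skipped if this path exists
```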
Playbooks are YAML files executed with ansible-playbook; each play maps hosts to tasks:

- name: Configure web servers      # Play
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx          # Task
      apt:
        name: nginx
        state: present
Common collections:
- ansible.builtin.* — core: copy, file, template, service, user
- ansible.posix.* — POSIX: mount, firewalld, sysctl
- community.aws.* / amazon.aws.* — AWS resources
- community.general.* — broad ecosystem: Docker, databases
Find modules with ansible-doc -l | grep keyword or on docs.ansible.com. Always prefer a specific module over shell — specific modules are idempotent and return structured data.
[webservers]
web1.example.com
web2.example.com ansible_user=ubuntu

[databases]
db1.example.com ansible_port=2222

Dynamic inventory: A script or plugin that queries an external source (AWS EC2, Azure, GCP, Terraform state) at runtime and returns JSON-formatted host data.
The amazon.aws.aws_ec2 plugin pulls instances tagged with specific values automatically — no stale host files.

Variable precedence (low to high):
- Role defaults (defaults/main.yml) — lowest, intentionally easy to override
- Play vars (vars: block)
- Role vars (vars/main.yml) — high, harder to override
- Extra vars (-e) — always wins, highest priority
Run ansible -m debug -a "var=my_var" hostname to inspect a variable's resolved value and source.

What are group_vars and host_vars? How do you structure them?
inventory/
  hosts.yml
  group_vars/
    all.yml                  # applies to every host
    webservers.yml           # applies to [webservers] group
    webservers/              # can be a directory
      main.yml
      vault.yml              # encrypted secrets
  host_vars/
    web1.example.com.yml     # overrides for one host
Variables in host_vars take precedence over group_vars. Both are automatically loaded by Ansible when found adjacent to your inventory.
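A quick illustration of that precedence (the variable name and values are hypothetical):

```yaml
# group_vars/webservers.yml
nginx_port: 8080

# host_vars/web1.example.com.yml
nginx_port: 9090   # web1 gets 9090; every other webserver gets 8080
```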
How do you use register and when to make tasks conditional?
register captures a task's output into a variable. when evaluates a Jinja2 condition to decide whether to run a task.
- name: Check if config exists
  stat:
    path: /etc/myapp/config.yml
  register: app_config

- name: Run first-time setup
  command: /opt/myapp/setup.sh
  when: not app_config.stat.exists

- name: Check service status
  command: systemctl is-active myapp
  register: svc_status
  ignore_errors: true

- name: Restart if not running
  service:
    name: myapp
    state: restarted
  when: svc_status.rc != 0
Common attributes: .stdout, .stderr, .rc, .stat.exists, .changed, .failed
Encrypt a file:
ansible-vault encrypt group_vars/all/vault.yml
Edit in place:
ansible-vault edit group_vars/all/vault.yml
Run with a vault password:
ansible-playbook site.yml --vault-password-file ~/.vault_pass
# or interactively: ansible-playbook site.yml --ask-vault-pass
In CI/CD: store the vault password as a masked CI secret and pass it via a temp file:
echo "$VAULT_PASS" > /tmp/vp && ansible-playbook site.yml \
  --vault-password-file /tmp/vp; rm /tmp/vp
Best practice: name vault vars with a prefix such as vault_db_password, and reference them through a plain var: db_password: "{{ vault_db_password }}" — keeps code readable while keeping secrets encrypted.
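The prefix pattern laid out in files (file paths and values are illustrative):

```yaml
# group_vars/all/vault.yml  (encrypted with ansible-vault)
vault_db_password: s3cr3t

# group_vars/all/vars.yml   (plaintext, committed)
db_password: "{{ vault_db_password }}"   # playbooks reference db_password only
```

Grepping the codebase then shows where a secret is used without ever exposing its value.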
roles/
  nginx/
    tasks/
      main.yml          # entry point, always executed
    handlers/
      main.yml          # triggered by notify
    templates/
      nginx.conf.j2     # Jinja2 templates
    files/
      index.html        # static files to copy
    vars/
      main.yml          # high-priority variables
    defaults/
      main.yml          # low-priority, override-friendly defaults
    meta/
      main.yml          # role dependencies, metadata
Generate the skeleton: ansible-galaxy role init nginx
What's the difference between vars/main.yml and defaults/main.yml in a role?
- defaults/main.yml — low priority: sensible defaults you expect users to override. They lose to nearly everything: inventory vars, playbook vars, extra vars. Use for things like nginx_port: 80.
- vars/main.yml — high priority: overrides inventory and most playbook vars. Use for internal role constants — package names, internal paths, version pins — that the role depends on to function correctly.
Rule of thumb: if a user might legitimately want to change it → defaults. If it's an internal implementation detail → vars.
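For instance (variable names and values are illustrative):

```yaml
# roles/nginx/defaults/main.yml — users may override these
nginx_port: 80
nginx_worker_connections: 1024

# roles/nginx/vars/main.yml — internal constants the role relies on
nginx_package: nginx
nginx_conf_path: /etc/nginx/nginx.conf
```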
tasks:
  - name: Update nginx config
    template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx

  - name: Update TLS cert
    copy:
      src: cert.pem
      dest: /etc/nginx/cert.pem
    notify: Restart nginx       # notified twice, runs once

handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted
If the config task reports changed: false (file unchanged), the handler is not notified — no unnecessary restarts. Idempotency in action.
Pin roles and collections in a requirements.yml for reproducible installs:
roles:
  - name: geerlingguy.nginx
    version: 3.1.0

collections:
  - name: amazon.aws
    version: ">=6.0.0"
Install with:
ansible-galaxy install -r requirements.yml
In CI/CD: run this as the first step before any playbook. Commit requirements.yml to source control; never commit the installed roles directory itself.
ansible-repo/
  inventories/
    production/
      hosts.yml
      group_vars/
      host_vars/
    staging/
      hosts.yml
      group_vars/
  roles/
    common/             # baseline every server gets
    nginx/
    postgres/
  collections/
    requirements.yml
  playbooks/
    site.yml            # master (imports all)
    webservers.yml
    databases.yml
  library/              # custom modules
  filter_plugins/       # custom Jinja2 filters
  ansible.cfg
Key principles:
- site.yml imports the other playbooks — lets you run the full stack or a single tier
- Run ansible-lint in CI to enforce quality across teams

What is the template module and how does it differ from copy?
copy transfers a static file to the managed node unchanged. template processes a Jinja2 (.j2) file, substituting variables and evaluating logic, then transfers the rendered result.
worker_processes {{ ansible_processor_vcpus }};
server {
    listen {{ nginx_port }};
    server_name {{ ansible_hostname }};
    {% if ssl_enabled %}
    listen 443 ssl;
    ssl_certificate {{ ssl_cert_path }};
    {% endif %}
}
Use template when config varies per host or environment. Use copy for truly static files.
Jinja2 essentials:
- Variables: {{ variable_name }}
- Conditionals: {% if env == 'production' %}...{% endif %}
- Loops: {% for host in groups['webservers'] %}{{ host }}{% endfor %}
- Filters: {{ my_list | join(', ') }}, {{ my_string | upper }}, {{ my_var | default('fallback') }}, {{ some_dict | to_json }}, {{ path | basename }}
- Tests: {% if my_var is defined %}, {% if value is none %}
The default() filter is your safety net — use it whenever a variable might not be defined: {{ nginx_port | default(80) }}. Prevents "variable undefined" errors in edge cases.

How do you use loop in Ansible tasks?
loop (modern) iterates a task over a list. Each iteration exposes the current item as item.
# Install multiple packages
- name: Install required packages
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - python3
    - git

# Loop over dicts
- name: Create users
  user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
    state: present
  loop:
    - { name: alice, groups: sudo }
    - { name: bob, groups: docker }

# Loop with index
- name: Numbered output
  debug:
    msg: "Item {{ idx }}: {{ item }}"
  loop: "{{ my_list }}"
  loop_control:
    index_var: idx
Facts are gathered by the setup module (gather_facts: true by default). Commonly used facts:
- ansible_hostname — short hostname
- ansible_os_family — RedHat, Debian, etc.
- ansible_distribution, ansible_distribution_version
- ansible_processor_vcpus, ansible_memtotal_mb
- ansible_default_ipv4.address
- ansible_date_time.date

# Conditional based on OS
- name: Install on Debian systems only
  apt:
    name: nginx
  when: ansible_os_family == "Debian"
# Use in templates
worker_processes {{ ansible_processor_vcpus }};
Custom facts: Drop a .fact file (JSON/INI) in /etc/ansible/facts.d/ on managed nodes — accessible as ansible_local.myfile.key.
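A sketch of such a custom fact in INI form (the filename, section, and keys are illustrative):

```ini
# /etc/ansible/facts.d/myapp.fact  (INI shown; JSON works too)
[general]
role = api
tier = production
```

After fact gathering (or an ad-hoc ansible hostname -m setup), this is available as {{ ansible_local.myapp.general.role }}.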
How do you use set_fact to compute and pass variables between tasks?
set_fact creates or updates a variable at runtime — useful for deriving computed values or combining data from registered output.
- name: Get instance region from metadata
  command: curl -s http://169.254.169.254/latest/meta-data/placement/region
  register: region_raw

- name: Set derived facts
  set_fact:
    aws_region: "{{ region_raw.stdout | trim }}"
    deploy_tag: "{{ app_name }}-{{ ansible_date_time.date }}"
    is_primary: "{{ inventory_hostname == groups['appservers'][0] }}"

- name: Use derived fact
  debug:
    msg: "Deploying {{ deploy_tag }} to {{ aws_region }}"
Facts set with set_fact persist for the rest of the play on that host. Add cacheable: true to persist across plays.

What is serial and how do you use it for rolling deployments?
serial limits how many hosts Ansible targets at once, enabling rolling deployments without taking down the whole fleet.
# 2 hosts at a time
- hosts: webservers
  serial: 2

# Graduated batches: 1 canary, then 5, then the rest
- hosts: webservers
  serial:
    - 1
    - 5
    - "100%"

# Percentage-based
- hosts: webservers
  serial: "25%"
  max_fail_percentage: 10   # abort if 10%+ of a batch fails
Pattern: deploy to 1 canary first, validate health, then roll out in batches. This is Ansible's native rolling-deployment story.
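The canary-plus-health-check pattern can be sketched like this (the deploy script, health URL, and batch sizes are assumptions):

```yaml
- hosts: webservers
  serial:
    - 1        # canary host first
    - "100%"   # then everyone else
  tasks:
    - name: Deploy new version
      command: /opt/deploy.sh

    - name: Verify the host is healthy before the next batch proceeds
      uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health
      retries: 5
      delay: 10
      until: health.status == 200
```

A failed health check on the canary aborts the play before the remaining hosts are touched.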
Combine serial with a health check task (using the uri module) after each deploy. If it fails, the block's rescue section can roll back before the next batch starts.

Speed up SSH connections in ansible.cfg. Enable pipelining:
[ssh_connection]
pipelining = True
Also increase forks (the default of 5 is very conservative):
[defaults]
forks = 50
And enable SSH connection reuse:
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Combined result: 3–5x performance improvement on large inventories.
Pipelining requires requiretty to be disabled in /etc/sudoers on managed nodes; most modern distros already have it disabled.

What does --check mode do and what are its limitations?
--check (dry run) runs the playbook without making actual changes, reporting what would change. Add --diff for line-by-line file diffs.
ansible-playbook site.yml --check --diff
Limitations:
- command and shell tasks are skipped — their side effects can't be safely predicted
- tasks that depend on register output from skipped tasks may fail or behave oddly
Set check_mode: false on individual tasks to force them to always run (e.g., a stat check that later tasks depend on).
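A sketch of forcing a read-only task to run even under --check (the paths are illustrative):

```yaml
- name: Always stat the config, even with --check
  stat:
    path: /etc/myapp/config.yml
  check_mode: false       # safe: stat never changes anything
  register: app_config

- name: Report whether first-time setup would be needed
  debug:
    msg: "First-time setup required"
  when: not app_config.stat.exists
```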
- name: Install packages
  apt:
    name: nginx
  tags: [packages, nginx]

- name: Deploy config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  tags: [config, nginx]
Run only config tasks:
ansible-playbook site.yml --tags config
Skip package tasks:
ansible-playbook site.yml --skip-tags packages
Special tags: always runs even when other tags are specified; never only runs when explicitly called by tag.
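For example (task names and file paths are illustrative):

```yaml
- name: Load shared variables
  include_vars: common.yml
  tags: [always]          # runs no matter which --tags are requested

- name: Wipe and reseed the database
  command: /opt/app/reseed.sh
  tags: [never, reseed]   # only runs with an explicit --tags reseed
```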
Step 1 — Profile first: enable the profile_tasks callback to see per-task timing.
[defaults]
callbacks_enabled = profile_tasks
Step 2 — Increase parallelism: the default forks = 5 is very conservative.
[defaults]
forks = 50
Step 3 — Enable pipelining + SSH multiplexing (see Q22).
Step 4 — Use the free strategy if tasks are host-independent:
- hosts: webservers
  strategy: free
Step 5 — Disable fact gathering where not needed:
- hosts: webservers
  gather_facts: false
Step 6 — Cache facts if you need them but don't want to re-gather: use fact_caching = jsonfile in ansible.cfg.
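A minimal caching setup might look like this (the cache path and timeout are assumptions):

```ini
[defaults]
gathering = smart                            # skip gathering when valid cached facts exist
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400                 # seconds; one day
```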
The profile_tasks output usually reveals one or two slow tasks responsible for 80% of runtime — often a slow command or an unoptimized fact gather.

What is ignore_errors vs failed_when? When do you use each?
- ignore_errors: true — continue the play even if this task fails. The task is still marked failed, but execution continues. Use sparingly — it silently swallows real errors.
- failed_when — define custom failure conditions based on task output. More precise and intentional.
# Custom success condition
- name: Check app health
  command: curl -s -o /dev/null -w "%{http_code}" http://localhost/health
  register: health_check
  failed_when: health_check.stdout != "200"

# Multiple conditions (AND logic)
- name: Run idempotent migration
  command: /opt/app/migrate.sh
  register: migrate_out
  failed_when:
    - migrate_out.rc != 0
    - "'already up to date' not in migrate_out.stdout"
Prefer failed_when over ignore_errors — it's explicit about what constitutes success vs failure rather than blanket-ignoring all errors.

block groups tasks and pairs with rescue and always — analogous to try/catch/finally.
- block:
    - name: Deploy application
      command: /opt/deploy.sh

    - name: Run smoke test
      uri:
        url: http://localhost/health
        status_code: 200
  rescue:
    - name: Rollback deployment
      command: /opt/rollback.sh

    - name: Alert on-call
      slack:
        token: "{{ slack_token }}"
        msg: "Deploy FAILED on {{ inventory_hostname }}"
  always:
    - name: Record deployment attempt
      command: /opt/log-attempt.sh
If any task in block fails, rescue runs. always runs regardless. Enables robust deployments with automatic rollback.
Increase verbosity with -v through -vvvv — each level reveals more: task results, module args, SSH commands, SSH connection details. Use the debug module:
- debug:
    var: my_registered_var

- debug:
    msg: "Value is: {{ my_var }}"
ansible webservers -m ping
ansible web1 -m shell -a "systemctl status nginx"
ansible web1 -m setup | grep ansible_os
--step prompts before each task; --start-at-task "Task name" skips to the failing point.

What is delegate_to and when would you use it?
delegate_to runs a task on a different host than the one being targeted, while still using the targeted host's variables.
- name: Remove from ALB target group
  command: aws elbv2 deregister-targets ...
  delegate_to: localhost   # run AWS CLI on the control node
- name: Run migrations
  command: python manage.py migrate
  delegate_to: "{{ groups['appservers'][0] }}"
  run_once: true
delegate_to: localhost is the most common pattern — run AWS CLI, curl, or API calls from the control node while looping over your fleet.
Troubleshooting unreachable hosts:
- Connectivity: ping host and ssh user@host manually from the control node
- Keys: has authorized_keys changed? Try ssh -i /key user@host -vvv
- Firewall rules (iptables, ufw)
- Inventory: run ansible-inventory --list and inspect the output
- Timeouts: raise timeout in ansible.cfg
- Verbose ping: ansible hostname -m ping -vvv. The raw SSH debug output tells you exactly where the connection fails.

A GitHub Actions deploy job:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible
        run: pip install ansible boto3
      - name: Install requirements
        run: ansible-galaxy install -r requirements.yml
      - name: Write SSH key
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > /tmp/deploy_key
          chmod 600 /tmp/deploy_key
      - name: Write Vault password
        run: echo "${{ secrets.VAULT_PASS }}" > /tmp/vp
      - name: Run playbook
        run: |
          ansible-playbook deploy.yml \
            -i inventories/production \
            --vault-password-file /tmp/vp \
            --private-key /tmp/deploy_key \
            --extra-vars "app_version=${{ github.sha }}"
      - name: Cleanup secrets
        if: always()
        run: rm -f /tmp/deploy_key /tmp/vp
Key: use if: always() on the cleanup step so secrets are always removed even if the playbook fails.
1. ansible-lint: static checks against best practices. Run ansible-lint roles/nginx/
2. Molecule: The standard framework for role testing. Creates ephemeral instances (Docker, Vagrant, EC2), applies your role, verifies with test assertions.
molecule test   # create -> converge -> verify -> destroy
A typical scenario also re-runs converge and fails on changed tasks, catching hidden side effects automatically.

Install the AWS collections:
ansible-galaxy collection install amazon.aws community.aws
Key modules:
- amazon.aws.ec2_instance — create, start, stop, terminate instances
- amazon.aws.ec2_security_group — manage SGs
- amazon.aws.s3_object — upload/download S3 objects
- amazon.aws.aws_ec2 — dynamic inventory plugin, auto-discover by tag
- community.aws.ssm_parameter — read/write SSM Parameter Store
- community.aws.route53 — manage DNS during deployments
Authenticate via the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables. Never hardcode credentials in playbooks.
With the aws_ec2 dynamic inventory, hosts: tag_Environment_production automatically targets all EC2 instances tagged Environment=production. Combine it with serial: 1 for rolling deploys, and handlers that only restart nginx when the config actually changed.
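A sketch of such an inventory source (the region, tag values, and filename are assumptions; the filename must end in aws_ec2.yml for the plugin to pick it up):

```yaml
# inventories/production/prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production   # only pull matching instances
keyed_groups:
  - key: tags.Environment       # builds groups like tag_Environment_production
    prefix: tag_Environment
```

ansible-inventory -i inventories/production --list then shows the discovered hosts and groups without any static host file.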