The apt module with state: present installs a package if missing, but does nothing if it already exists. The command and shell modules are not idempotent by default. Always prefer purpose-built modules (file, copy, service) that handle state checks themselves.
If you must use command or shell, make them idempotent with the creates or removes parameters, or check their output with register + when conditionals.
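A minimal sketch of guarding a shell command with creates (the installer path and marker file are illustrative):

```yaml
# Runs the installer only if the marker file does not yet exist,
# making the otherwise non-idempotent command safe to re-run.
- name: Run one-time installer
  command: /opt/myapp/install.sh
  args:
    creates: /opt/myapp/.installed   # task is skipped if this path exists
```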
Playbooks are YAML files executed with ansible-playbook; each play maps hosts to tasks:

- name: Configure web servers      # Play
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx          # Task
      apt:
        name: nginx
        state: present
Common collections:
- ansible.builtin.* — core: copy, file, template, service, user
- ansible.posix.* — POSIX: mount, firewalld, sysctl
- community.aws.* / amazon.aws.* — AWS resources
- community.general.* — broad ecosystem: Docker, databases
Find modules with ansible-doc -l | grep keyword or on docs.ansible.com. Always prefer a specific module over shell — specific modules are idempotent and return structured data.
[webservers]
web1.example.com
web2.example.com ansible_user=ubuntu

[databases]
db1.example.com ansible_port=2222

Dynamic inventory: A script or plugin that queries an external source (AWS EC2, Azure, GCP, Terraform state) at runtime and returns JSON-formatted host data.
The amazon.aws.aws_ec2 plugin pulls instances tagged with specific values automatically — no stale host files.

Variable precedence (low to high):
- Role defaults (defaults/main.yml) — lowest, intentionally easy to override
- Play vars (vars: block)
- Role vars (vars/main.yml) — high, harder to override
- Extra vars (-e) — always wins, highest priority
Run ansible -m debug -a "var=my_var" hostname to inspect a variable's resolved value and source.

What are group_vars and host_vars? How do you structure them?
inventory/
  hosts.yml
  group_vars/
    all.yml                  # applies to every host
    webservers.yml           # applies to [webservers] group
    webservers/              # can be a directory
      main.yml
      vault.yml              # encrypted secrets
  host_vars/
    web1.example.com.yml     # overrides for one host
Variables in host_vars take precedence over group_vars. Both are automatically loaded by Ansible when found adjacent to your inventory.
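A quick illustration of that precedence (the variable name and values are hypothetical):

```yaml
# group_vars/webservers.yml
nginx_port: 8080

# host_vars/web1.example.com.yml
nginx_port: 9090   # web1 gets 9090; every other webserver gets 8080
```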
How do you use register and when to make tasks conditional?
register captures a task's output into a variable. when evaluates a Jinja2 condition to decide whether to run a task.
- name: Check if config exists
  stat:
    path: /etc/myapp/config.yml
  register: app_config

- name: Run first-time setup
  command: /opt/myapp/setup.sh
  when: not app_config.stat.exists

- name: Check service status
  command: systemctl is-active myapp
  register: svc_status
  ignore_errors: true

- name: Restart if not running
  service:
    name: myapp
    state: restarted
  when: svc_status.rc != 0
Common attributes: .stdout, .stderr, .rc, .stat.exists, .changed, .failed
Encrypt a file:
ansible-vault encrypt group_vars/all/vault.yml
Edit in place:
ansible-vault edit group_vars/all/vault.yml
Run with a vault password:
ansible-playbook site.yml --vault-password-file ~/.vault_pass
# or interactively: ansible-playbook site.yml --ask-vault-pass
In CI/CD: store the vault password as a masked CI secret and pass it via a temp file:
echo "$VAULT_PASS" > /tmp/vp && ansible-playbook site.yml \
  --vault-password-file /tmp/vp; rm /tmp/vp
Best practice: name vault vars with a prefix such as vault_db_password, and reference them through a plain var: db_password: "{{ vault_db_password }}" — keeps code readable while keeping secrets encrypted.
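The prefix pattern laid out in files (file paths and values are illustrative):

```yaml
# group_vars/all/vault.yml  (encrypted with ansible-vault)
vault_db_password: s3cr3t

# group_vars/all/vars.yml   (plaintext, committed)
db_password: "{{ vault_db_password }}"   # playbooks reference db_password only
```

Grepping the codebase then shows where a secret is used without ever exposing its value.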
roles/
  nginx/
    tasks/
      main.yml          # entry point, always executed
    handlers/
      main.yml          # triggered by notify
    templates/
      nginx.conf.j2     # Jinja2 templates
    files/
      index.html        # static files to copy
    vars/
      main.yml          # high-priority variables
    defaults/
      main.yml          # low-priority, override-friendly defaults
    meta/
      main.yml          # role dependencies, metadata
Generate the skeleton: ansible-galaxy role init nginx
What's the difference between vars/main.yml and defaults/main.yml in a role?
- defaults/main.yml — low priority: sensible defaults you expect users to override. They lose to nearly everything: inventory vars, playbook vars, extra vars. Use for things like nginx_port: 80.
- vars/main.yml — high priority: overrides inventory and most playbook vars. Use for internal role constants — package names, internal paths, version pins — that the role depends on to function correctly.
Rule of thumb: if a user might legitimately want to change it → defaults. If it's an internal implementation detail → vars.
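For instance (variable names and values are illustrative):

```yaml
# roles/nginx/defaults/main.yml — users may override these
nginx_port: 80
nginx_worker_connections: 1024

# roles/nginx/vars/main.yml — internal constants the role relies on
nginx_package: nginx
nginx_conf_path: /etc/nginx/nginx.conf
```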
tasks:
  - name: Update nginx config
    template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx

  - name: Update TLS cert
    copy:
      src: cert.pem
      dest: /etc/nginx/cert.pem
    notify: Restart nginx       # notified twice, runs once

handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted
If the config task reports changed: false (file unchanged), the handler is not notified — no unnecessary restarts. Idempotency in action.
Pin roles and collections in a requirements.yml for reproducible installs:
roles:
  - name: geerlingguy.nginx
    version: 3.1.0

collections:
  - name: amazon.aws
    version: ">=6.0.0"
Install with:
ansible-galaxy install -r requirements.yml
In CI/CD: run this as the first step before any playbook. Commit requirements.yml to source control; never commit the installed roles directory itself.
ansible-repo/
  inventories/
    production/
      hosts.yml
      group_vars/
      host_vars/
    staging/
      hosts.yml
      group_vars/
  roles/
    common/             # baseline every server gets
    nginx/
    postgres/
  collections/
    requirements.yml
  playbooks/
    site.yml            # master (imports all)
    webservers.yml
    databases.yml
  library/              # custom modules
  filter_plugins/       # custom Jinja2 filters
  ansible.cfg
Key principles:
- site.yml imports the other playbooks — lets you run the full stack or a single tier
- Run ansible-lint in CI to enforce quality across teams

What is the template module and how does it differ from copy?
copy transfers a static file to the managed node unchanged. template processes a Jinja2 (.j2) file, substituting variables and evaluating logic, then transfers the rendered result.
worker_processes {{ ansible_processor_vcpus }};
server {
    listen {{ nginx_port }};
    server_name {{ ansible_hostname }};
    {% if ssl_enabled %}
    listen 443 ssl;
    ssl_certificate {{ ssl_cert_path }};
    {% endif %}
}
Use template when config varies per host or environment. Use copy for truly static files.
Jinja2 essentials:
- Variables: {{ variable_name }}
- Conditionals: {% if env == 'production' %}...{% endif %}
- Loops: {% for host in groups['webservers'] %}{{ host }}{% endfor %}
- Filters: {{ my_list | join(', ') }}, {{ my_string | upper }}, {{ my_var | default('fallback') }}, {{ some_dict | to_json }}, {{ path | basename }}
- Tests: {% if my_var is defined %}, {% if value is none %}
The default() filter is your safety net — use it whenever a variable might not be defined: {{ nginx_port | default(80) }}. Prevents "variable undefined" errors in edge cases.

How do you use loop in Ansible tasks?
loop (modern) iterates a task over a list. Each iteration exposes the current item as item.
# Install multiple packages
- name: Install required packages
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - python3
    - git

# Loop over dicts
- name: Create users
  user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
    state: present
  loop:
    - { name: alice, groups: sudo }
    - { name: bob, groups: docker }

# Loop with index
- name: Numbered output
  debug:
    msg: "Item {{ idx }}: {{ item }}"
  loop: "{{ my_list }}"
  loop_control:
    index_var: idx
Facts are gathered by the setup module (gather_facts: true by default). Commonly used facts:
- ansible_hostname — short hostname
- ansible_os_family — RedHat, Debian, etc.
- ansible_distribution, ansible_distribution_version
- ansible_processor_vcpus, ansible_memtotal_mb
- ansible_default_ipv4.address
- ansible_date_time.date

# Conditional based on OS
- name: Install on Debian systems only
  apt:
    name: nginx
  when: ansible_os_family == "Debian"
# Use in templates
worker_processes {{ ansible_processor_vcpus }};
Custom facts: Drop a .fact file (JSON/INI) in /etc/ansible/facts.d/ on managed nodes — accessible as ansible_local.myfile.key.
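A sketch of such a custom fact in INI form (the filename, section, and keys are illustrative):

```ini
# /etc/ansible/facts.d/myapp.fact  (INI shown; JSON works too)
[general]
role = api
tier = production
```

After fact gathering (or an ad-hoc ansible hostname -m setup), this is available as {{ ansible_local.myapp.general.role }}.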
How do you use set_fact to compute and pass variables between tasks?
set_fact creates or updates a variable at runtime — useful for deriving computed values or combining data from registered output.
- name: Get instance region from metadata
  command: curl -s http://169.254.169.254/latest/meta-data/placement/region
  register: region_raw

- name: Set derived facts
  set_fact:
    aws_region: "{{ region_raw.stdout | trim }}"
    deploy_tag: "{{ app_name }}-{{ ansible_date_time.date }}"
    is_primary: "{{ inventory_hostname == groups['appservers'][0] }}"

- name: Use derived fact
  debug:
    msg: "Deploying {{ deploy_tag }} to {{ aws_region }}"
Facts set with set_fact persist for the rest of the play on that host. Add cacheable: true to persist across plays.

What is serial and how do you use it for rolling deployments?
serial limits how many hosts Ansible targets at once, enabling rolling deployments without taking down the whole fleet.
# 2 hosts at a time
- hosts: webservers
  serial: 2

# Graduated batches: 1 canary, then 5, then the rest
- hosts: webservers
  serial:
    - 1
    - 5
    - "100%"

# Percentage-based
- hosts: webservers
  serial: "25%"
  max_fail_percentage: 10   # abort if 10%+ of a batch fails
Pattern: deploy to 1 canary first, validate health, then roll out in batches. This is Ansible's native rolling-deployment story.
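The canary-plus-health-check pattern can be sketched like this (the deploy script, health URL, and batch sizes are assumptions):

```yaml
- hosts: webservers
  serial:
    - 1        # canary host first
    - "100%"   # then everyone else
  tasks:
    - name: Deploy new version
      command: /opt/deploy.sh

    - name: Verify the host is healthy before the next batch proceeds
      uri:
        url: "http://{{ inventory_hostname }}/health"
        status_code: 200
      register: health
      retries: 5
      delay: 10
      until: health.status == 200
```

A failed health check on the canary aborts the play before the remaining hosts are touched.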
Combine serial with a health check task (using the uri module) after each deploy. If it fails, the block's rescue section can roll back before the next batch starts.

Speed up SSH connections in ansible.cfg. Enable pipelining:
[ssh_connection]
pipelining = True
Also increase forks (the default of 5 is very conservative):
[defaults]
forks = 50
And enable SSH connection reuse:
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Combined result: 3–5x performance improvement on large inventories.
Pipelining requires requiretty to be disabled in /etc/sudoers on managed nodes; most modern distros already have it disabled.

What does --check mode do and what are its limitations?
--check (dry run) runs the playbook without making actual changes, reporting what would change. Add --diff for line-by-line file diffs.
ansible-playbook site.yml --check --diff
Limitations:
- command and shell tasks are skipped — their side effects can't be safely predicted
- tasks that depend on register output from skipped tasks may fail or behave oddly
Set check_mode: false on individual tasks to force them to always run (e.g., a stat check that later tasks depend on).
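A sketch of forcing a read-only task to run even under --check (the paths are illustrative):

```yaml
- name: Always stat the config, even with --check
  stat:
    path: /etc/myapp/config.yml
  check_mode: false       # safe: stat never changes anything
  register: app_config

- name: Report whether first-time setup would be needed
  debug:
    msg: "First-time setup required"
  when: not app_config.stat.exists
```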
- name: Install packages
  apt:
    name: nginx
  tags: [packages, nginx]

- name: Deploy config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  tags: [config, nginx]
Run only config tasks:
ansible-playbook site.yml --tags config
Skip package tasks:
ansible-playbook site.yml --skip-tags packages
Special tags: always runs even when other tags are specified; never only runs when explicitly called by tag.
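For example (task names and file paths are illustrative):

```yaml
- name: Load shared variables
  include_vars: common.yml
  tags: [always]          # runs no matter which --tags are requested

- name: Wipe and reseed the database
  command: /opt/app/reseed.sh
  tags: [never, reseed]   # only runs with an explicit --tags reseed
```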
Step 1 — Profile first: enable the profile_tasks callback to see per-task timing.
[defaults]
callbacks_enabled = profile_tasks
Step 2 — Increase parallelism: the default forks = 5 is very conservative.
[defaults]
forks = 50
Step 3 — Enable pipelining + SSH multiplexing (see Q22).
Step 4 — Use the free strategy if tasks are host-independent:
- hosts: webservers
  strategy: free
Step 5 — Disable fact gathering where not needed:
- hosts: webservers
  gather_facts: false
Step 6 — Cache facts if you need them but don't want to re-gather: use fact_caching = jsonfile in ansible.cfg.
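A minimal caching setup might look like this (the cache path and timeout are assumptions):

```ini
[defaults]
gathering = smart                            # skip gathering when valid cached facts exist
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400                 # seconds; one day
```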
The profile_tasks output usually reveals one or two slow tasks responsible for 80% of runtime — often a slow command or an unoptimized fact gather.

What is ignore_errors vs failed_when? When do you use each?
- ignore_errors: true — continue the play even if this task fails. The task is still marked failed, but execution continues. Use sparingly — it silently swallows real errors.
- failed_when — define custom failure conditions based on task output. More precise and intentional.
# Custom success condition
- name: Check app health
  command: curl -s -o /dev/null -w "%{http_code}" http://localhost/health
  register: health_check
  failed_when: health_check.stdout != "200"

# Multiple conditions (AND logic)
- name: Run idempotent migration
  command: /opt/app/migrate.sh
  register: migrate_out
  failed_when:
    - migrate_out.rc != 0
    - "'already up to date' not in migrate_out.stdout"
Prefer failed_when over ignore_errors — it's explicit about what constitutes success vs failure rather than blanket-ignoring all errors.

block groups tasks and pairs with rescue and always — analogous to try/catch/finally.
- block:
    - name: Deploy application
      command: /opt/deploy.sh

    - name: Run smoke test
      uri:
        url: http://localhost/health
        status_code: 200
  rescue:
    - name: Rollback deployment
      command: /opt/rollback.sh

    - name: Alert on-call
      slack:
        token: "{{ slack_token }}"
        msg: "Deploy FAILED on {{ inventory_hostname }}"
  always:
    - name: Record deployment attempt
      command: /opt/log-attempt.sh
If any task in block fails, rescue runs. always runs regardless. Enables robust deployments with automatic rollback.
Increase verbosity with -v through -vvvv — each level reveals more: task results, module args, SSH commands, SSH connection details. Use the debug module:
- debug:
    var: my_registered_var

- debug:
    msg: "Value is: {{ my_var }}"
ansible webservers -m ping
ansible web1 -m shell -a "systemctl status nginx"
ansible web1 -m setup | grep ansible_os
--step prompts before each task; --start-at-task "Task name" skips to the failing point.

What is delegate_to and when would you use it?
delegate_to runs a task on a different host than the one being targeted, while still using the targeted host's variables.
- name: Remove from ALB target group
  command: aws elbv2 deregister-targets ...
  delegate_to: localhost   # run AWS CLI on the control node
- name: Run migrations
  command: python manage.py migrate
  delegate_to: "{{ groups['appservers'][0] }}"
  run_once: true
delegate_to: localhost is the most common pattern — run AWS CLI, curl, or API calls from the control node while looping over your fleet.
Troubleshooting unreachable hosts:
- Connectivity: ping host and ssh user@host manually from the control node
- Keys: has authorized_keys changed? Try ssh -i /key user@host -vvv
- Firewall rules (iptables, ufw)
- Inventory: run ansible-inventory --list and inspect the output
- Timeouts: raise timeout in ansible.cfg
- Verbose ping: ansible hostname -m ping -vvv. The raw SSH debug output tells you exactly where the connection fails.

A GitHub Actions deploy job:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible
        run: pip install ansible boto3
      - name: Install requirements
        run: ansible-galaxy install -r requirements.yml
      - name: Write SSH key
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > /tmp/deploy_key
          chmod 600 /tmp/deploy_key
      - name: Write Vault password
        run: echo "${{ secrets.VAULT_PASS }}" > /tmp/vp
      - name: Run playbook
        run: |
          ansible-playbook deploy.yml \
            -i inventories/production \
            --vault-password-file /tmp/vp \
            --private-key /tmp/deploy_key \
            --extra-vars "app_version=${{ github.sha }}"
      - name: Cleanup secrets
        if: always()
        run: rm -f /tmp/deploy_key /tmp/vp
Key: use if: always() on the cleanup step so secrets are always removed even if the playbook fails.
1. ansible-lint: static checks against best practices. Run ansible-lint roles/nginx/
2. Molecule: The standard framework for role testing. Creates ephemeral instances (Docker, Vagrant, EC2), applies your role, verifies with test assertions.
molecule test   # create -> converge -> verify -> destroy
A typical scenario also re-runs converge and fails on changed tasks, catching hidden side effects automatically.

Install the AWS collections:
ansible-galaxy collection install amazon.aws community.aws
Key modules:
- amazon.aws.ec2_instance — create, start, stop, terminate instances
- amazon.aws.ec2_security_group — manage SGs
- amazon.aws.s3_object — upload/download S3 objects
- amazon.aws.aws_ec2 — dynamic inventory plugin, auto-discover by tag
- community.aws.ssm_parameter — read/write SSM Parameter Store
- community.aws.route53 — manage DNS during deployments
Authenticate via the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables. Never hardcode credentials in playbooks.
With the aws_ec2 dynamic inventory, hosts: tag_Environment_production automatically targets all EC2 instances tagged Environment=production. Combine it with serial: 1 for rolling deploys, and handlers that only restart nginx when the config actually changed.
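A sketch of such an inventory source (the region, tag values, and filename are assumptions; the filename must end in aws_ec2.yml for the plugin to pick it up):

```yaml
# inventories/production/prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production   # only pull matching instances
keyed_groups:
  - key: tags.Environment       # builds groups like tag_Environment_production
    prefix: tag_Environment
```

ansible-inventory -i inventories/production --list then shows the discovered hosts and groups without any static host file.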