My Automation Journey, Part 2: Building in Ansible and Initial Impact

Last updated on August 23, 2021.

BlueCat invited John Capobianco, author of “Automate Your Network: Introducing the Modern Approach to Enterprise Network Management”, to walk us through his journey of network automation. From the planning phase to deployment up the stack, John will cover the tradeoffs and critical decisions that every network automation project should address – including the role of DNS. John’s opinions are solely his own and do not express the views or opinions of his employer.

Part 1: Frameworks and Goals

Now that the necessary groundwork of setting goals and building out an automation framework was in place, I was ready to move on to the actual work of building things out in Ansible.

Getting started with Ansible

Ansible requires very little in terms of setup, but be aware that the Linux workstation needs to be able to SSH into the network devices being automated. This might mean a change to firewall rules, depending on where your Linux box resides on the network. Your Ansible playbooks will also have to authenticate on the network device, which might mean a need for special RSA or other service accounts.
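
Before writing any playbooks, it is worth confirming that the Ansible host can actually reach and log into a device. A quick manual check might look like the following (the hostname and service account are placeholders, not values from the original article):

# Run from the Linux box hosting Ansible; confirms reachability and credentials
ssh svc-ansible@DISTRIBUTION01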

Once connectivity has been established, the first step is to create your inventory hosts.ini file. Then you’ll have to logically group your devices using their hostnames. The Linux box hosting Ansible will also need to be able to resolve these hosts using DNS. A sample hosts.ini file might look like this:

[ENTERPRISE:children]
CORE
DISTRIBUTION

[CORE]
CORE

[DISTRIBUTION]
DISTRIBUTION01
DISTRIBUTION02
...
DISTRIBUTION45

Once our inventory file was created, we developed and distributed our end-state configuration file to each distribution switch. These files were always named after the hostname appended with _new_config. This way, we could use the hostname as a variable for looping over the devices in our playbook. One of the things we learned along the way is that {{ inventory_hostname }} is a special variable that can be used natively in Ansible. It references the device’s entry in the hosts.ini file.
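
As a quick illustration (this task was not part of our change playbooks), a small debug task shows how the variable expands into the per-device file name:

- hosts: DISTRIBUTION
  gather_facts: false
  tasks:
    # Illustration only: prints the per-device file name derived from the inventory entry,
    # e.g. DISTRIBUTION01_new_config when the play runs against DISTRIBUTION01
    - name: Show the per-device configuration file name
      debug:
        msg: "{{ inventory_hostname }}_new_config"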

The playbook itself was built around the Ansible ios_command module, gathering pre-change information in the form of show commands, with the output sent to a file per device. Once the state of each VRF and the global routing table were captured, the playbook executed the Cisco configure replace command. Since we standardized the file name as “<hostname>_new_config”, we could use the hostname of each device as a variable that is replaced at runtime.

Ansible lessons learned

We expected major challenges with the Ansible modules and making the automation work, but that did not turn out to be the case. Instead, most of our problems were with mundane issues such as YAML syntax and spacing. We also had difficulties developing the playbook files on a developer workstation and then having to move them to the Linux box where Ansible could execute them. (For a period of time we were crudely using FTP file managers to handle this, thinking like network engineers instead of developers.) Every time we had code changes, we had to go through this exercise of moving the updated source code to the Ansible Linux box for execution.

Some specific Ansible lessons learned that really helped us become successful:

  • Ansible playbook check mode
    • Check mode on Ansible playbooks lets you perform a ‘dry-run’ of the playbook. It runs as it normally would, but does not actually push the commands to the device – great for troubleshooting.
    • Great for validating that the code and commands match your intent.
    • Combined with verbosity, you can see all of the output for the commands you want to execute.
  • Limiting the scope of Ansible playbooks
    • The --limit option restricts the playbook run to a subset of the inventory.
    • Great for testing and getting started.
    • In our case we could limit the run to a single distribution switch, test the playbook in check mode, then remove the limit to run against the whole [DISTRIBUTION] group (see the sample commands after this list).
  • Ansible playbook verbosity
    • Adding verbosity to a playbook run exposes more of what is happening.
    • Allows you to see exactly the commands or configuration each task in your playbook is performing.
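
A hypothetical invocation tying these three techniques together might look like the following; the inventory path and host name are illustrative, while --check, --limit, and -vvv are standard ansible-playbook options:

# Dry run against a single distribution switch, with maximum verbosity
ansible-playbook -i hosts.ini distribution_layer_change.yml --check --limit DISTRIBUTION01 -vvv

# Once the output looks right, drop --check and --limit to run against the whole DISTRIBUTION group
ansible-playbook -i hosts.ini distribution_layer_change.yml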

Deploying network automation

Once we had our playbooks working in a lab environment, we scheduled our production change. This is where we took the new artifacts and methodology from design and lab to production. All I had to do was update the inventory file to match the new environment and provide a simple guide for operations on where the pre-change information could be found, how the playbook worked, and where the post-change information could be found. No core logic changes were required when porting over to the production environment. At the time, we did not have an automated core, so manual CLI instructions were still required for that layer, while the distribution layer would all be handled by Ansible playbooks. I was also able to provide the check mode commands to operations so they could perform dry runs of the playbook, giving them an opportunity to get a better understanding of the process.

Our change went something like this:

  • Distribute a desired-state configuration file to each device
  • Execute ansible-playbook pre_change_distribution_documentation.yml
    • Captures all show command output from each device
    • Output is stored in a folder on the Linux box
  • Execute ansible-playbook distribution_layer_change.yml
    • Performs the configure replace command on each device
  • Execute ansible-playbook pre_change_core_documentation.yml
    • Captures Core state using show commands
  • Manually modify the Core using the CLI following the steps provided
    • Complete campus changes on the Core
  • Execute ansible-playbook post_change_core_documentation.yml
    • Re-captures Core state using show commands
  • Execute ansible-playbook post_change_distribution_documentation.yml
    • Gathers all show command output from each device a second time
  • Validate the new topology from the Core using show commands
    • Validate that Distribution switches re-form neighbor relationships with the Core
    • Validate VRF routing tables on the Core
  • Use a text editor to compare output from the pre- and post-change automated tasks (see the sample command after this list)
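
For that final comparison step, a plain diff on the Linux box works just as well as a text editor. The paths below are illustrative: the pre-change path mirrors the playbook sample further down, while the post-change path is hypothetical.

# Compare pre- and post-change OSPF neighbor output for one device (illustrative paths)
diff documentation/pre_change_ospf/DISTRIBUTION01_pre_ospf_neighbors.txt \
     documentation/post_change_ospf/DISTRIBUTION01_post_ospf_neighbors.txt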

Here are some samples to show how easy it is to get started in Ansible. As part of the pre-change documentation playbooks, we want to capture the OSPF neighbors from each device in an output file:

- hosts: ENTERPRISE
  tasks:
    - name: Capture pre-change OSPF Neighbors
      ios_command:
        provider: "{{ ioscli }}"
        commands: show ip ospf neighbors
      register: show_ip_ospf_neighbors_output

    - name: Copy OSPF neighbors to file
      copy:
        content: "{{ show_ip_ospf_neighbors_output.stdout[0] }}"
        dest: "../../documentation/pre_change_ospf/{{ inventory_hostname }}_pre_ospf_neighbors.txt"

Following the capture of pre-change information state on each device we then wanted to invoke the config replace commands. A sample from that playbook:

- hosts: DISTRIBUTION
  tasks:
    - name: Configure replace with new configuration
      ios_command:
        provider: "{{ ioscli }}"
        commands: configure replace flash:{{ inventory_hostname }}_new_config time 120

Note that in both of these playbooks we were using a soon to be deprecated “provider” dictionary object. This contained our connection details, including the RSA service account information used to authenticate against each network device. This methodology of connecting to the devices and credential handling is also something that we improved upon over time, but for the sake of this first playbook, this was the methodology that we used (hard coding credentials into the “ioscli” provider).
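
For readers who have not worked with the legacy provider dictionary, here is a minimal sketch of what an “ioscli” variable could look like in a group_vars file. The keys are standard provider options for the ios_* modules; every value below is a placeholder, and (as noted above) hard-coding credentials this way is exactly what we later improved upon.

# group_vars/ENTERPRISE.yml -- illustrative sketch only; all values are placeholders
ioscli:
  host: "{{ inventory_hostname }}"
  username: svc-ansible
  password: placeholder-password
  authorize: yes
  auth_pass: placeholder-enable-secret
  timeout: 60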

Initial impact of network automation

Overall, our first change involving network automation was a huge success. Nobody could have predicted the massive time savings involved. The development and deployment time saved by automating this change across 50 devices was remarkable. There were zero human errors during the deployment as it was the Ansible engine and well tested logic pushing the configuration changes. Our change window required to execute production changes was reduced from hours to minutes. The playbook itself ran in less than 2 minutes. It was so efficient that at first I thought it had failed.

The potential and power of this new framework was obvious. Given our initial success, we immediately decided that any large, complex, repeatable task would be automated using Ansible. However, despite our success there were many gaps identified in the general process and toolset used while developing the solution.

It was obvious we needed to find improvements in the following areas for network automation to be a viable solution that could scale with our needs:

  • Full automation
    • In retrospect, the core configuration changes themselves could have been automated with a specific playbook.
    • At the time, we were unsure about reliability and were timid about making changes to the core using an unproven automation solution.
  • Tools
    • Development of code
      • The traditional TextPad and Notepad editors used by network administrators were not sufficient.
      • A more robust and feature-rich development tool was required to create the YAML and other files used by the Ansible framework
    • Source control
      • We struggled with refactoring code and keeping consistent versions of the files on local desktops, developer workstations, and the Linux box where Ansible resided
      • We wanted to treat the Ansible playbooks as code, not as scripts
    • Distribution and mobility of code
      • Challenges with multiple team members working on the same code
    • Source of truth
      • No real central location to keep known-good versions of playbooks or code
      • Difficult to track work across many team members
    • Post-project
      • We had no real central repository aside from the Linux box where the final version of the playbook resided
      • Rich output of pre- and post-change information was available, but nothing was tracked with history or version control

Looking ahead:  the role of DNS automation

Ultimately, the biggest lesson I took away from my first network automation experience was the potential to revolutionize enterprise network management. This goes far beyond Ansible. While having the underlying network infrastructure automated is a giant leap forward, the implementation of complete end-to-end automation requires upper layer protocols, particularly critical components like DNS.

The ability to automate DNS has even greater benefits to the organization than lower-level automation because DNS is integral to every connection made on an enterprise network. Underlying network infrastructure typically undergoes fewer changes than DNS, so automating the upper layers is a key component of any network automation strategy.
