First Half of Python for Network Engineers

It’s been nonstop training for 5 weeks, but we had this week off, so I thought I would post this.

I was able to get my work to fund the Python for Network Engineers course taught by Kirk Byers. 

https://pynet.twb-tech.com/class-pyauto.html

I had taken the free class a couple of times and learned quite a bit. I thought the paid course would give me a better understanding of Python and of how to handle some of the more complex things I want to do. I really want to take advantage of more automation in our environment and make things work better and easier with fewer chances for errors. I also want to empower my Helpdesk to do more. We are a very small shop with a large footprint of stores and offices. We have deployed Meraki to almost all of the locations, so being able to take advantage of Python and REST APIs has been a great benefit so far. However, I feel there is more that I can do; I just need some more training. Also, the more I can hand off to my Helpdesk, the less they have to call me and the more sleep I can try to get (as though that would happen).

I have really enjoyed the first half of the class and have learned quite a bit just from using Netmiko, TextFSM, and Jinja2. The other nice part is the community Kirk has put together so that we can all learn from each other and exchange ideas and questions. Between Slack and some group channels, there have been a lot of good comments and questions exchanged back and forth.

As for the class itself, Kirk’s videos have been informative and full of useful information. His examples are good and show real-life data from working with equipment, without diving into actual network engineering. I have also found the exercises he assigns to be challenging and quite good. I have picked up some good ideas from them, and they have pushed my learning and understanding of Python.

All in all I am really enjoying it and can’t wait for the next half to see how my Python programming improves.

Meraki Script to pull LTE Card Signal

Script for pulling the make and signal strength of wireless cards

We continually audit the LTE cards in our Meraki routers, so we wanted to monitor each store’s LTE connection, see the signal strength, and then determine which cards, if any, needed to be swapped out. However, that data is only stored at the device level, so you have to iterate through the whole organization, then each network, and then each device in the network. Meraki limits how many times per second you can poll the cloud, so I put a 1 second delay in there to keep the program from overwhelming everything and causing issues for itself or for our users monitoring on the website.
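The organization, network, device walk with a pause between calls can be sketched roughly as below. This is not the script from the repository. The three callables (list_networks, list_devices, get_signal) are hypothetical stand-ins for whatever functions wrap the Meraki API calls, and the dictionary keys are assumptions about what they return.

```python
import time


def collect_signal_reports(list_networks, list_devices, get_signal,
                           org_id, delay=1.0, sleep=time.sleep):
    """Walk organization -> networks -> devices and gather one report per device.

    list_networks, list_devices and get_signal are hypothetical callables
    supplied by the caller; they are not part of any Meraki library.
    """
    reports = []
    for network in list_networks(org_id):
        for device in list_devices(network["id"]):
            reports.append({
                "network": network["name"],
                "serial": device["serial"],
                "signal": get_signal(device["serial"]),
            })
            # Pause after each device-level call to stay under the rate limit
            sleep(delay)
    return reports
```

Passing the fetch functions in keeps the loop and the delay separate from the API details, so the same walk can be tried against canned data before it is pointed at the real cloud.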

The script can be found here:

https://github.com/undrwatr/MERAKI_CARD_SIGNAL

How I handle credentials and shared variables in Python

How to handle common variables between programs

I have been writing a lot of Python programs lately for interacting with the Meraki platform. I was tired of copying and pasting my variables and credentials between programs, and I wanted to be able to upload the programs to GitHub without having to worry about sanitizing them of my company’s or my personal data. I did some searching and didn’t find a lot, so I figured I would put this information into a Python module that I could call from within my programs, and then I wouldn’t have to worry about keeping all of my data secure. I decided to call my module cred.py so that I could load it from within a program with just an “import cred”. I used to copy this file into each of the directories where I was working on a program. Then I ran into a problem: I had to change an API key, which meant finding all of the cred.py files I had created and updating the data in each of them. That proved to be more of a pain than I wanted to deal with, so I decided to place the file in one central directory for all of my programs. This proved much easier, but then I had to figure out how to call it from within Python without making it a module in the install path.

That is where I came up with this:

import sys

#Import the CRED module from a separate directory
sys.path.insert(0, '../CRED')
import cred

This lets me keep one central directory for all of my credentials as well as commonly needed variables. I call it from within the program and can then run my programs easily. I would love to hear how others are handling this or whether there is a better way to do it.
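One caveat with a relative entry like '../CRED' is that Python resolves it against the directory the program is started from, not the directory the script lives in. A possible variation, sketched here with a hypothetical helper name (add_cred_dir), anchors the path to the script file instead:

```python
import os
import sys


def add_cred_dir(script_file, relative_dir=os.path.join("..", "CRED")):
    """Put the shared credentials directory at the front of sys.path.

    add_cred_dir is a hypothetical helper; the directory name CRED matches
    the layout described above.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    cred_dir = os.path.normpath(os.path.join(script_dir, relative_dir))
    if cred_dir not in sys.path:
        sys.path.insert(0, cred_dir)
    return cred_dir


# In a real program this would be followed by "import cred":
# add_cred_dir(__file__)
```

With this, the program finds cred.py no matter which directory it is launched from, as long as the CRED directory stays next to the program's parent directory.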

Moving Cisco UCS to 10G Interfaces

We initially implemented our Cisco UCS chassis and FIs with 4 port channels, each with 2x1Gb interfaces connected to our Cisco core switches. Now we are in the process of moving from a dual Cisco core to a Juniper Virtual Chassis core; more on that in another post. Part of getting the new core was finally getting 10G for our network. We had been surviving just fine with our current network connectivity, but figured it wouldn’t hurt to get 10G and connect whatever we could to it.

What I could not find searching around the internet was how the UCS FIs would handle the additional links and how the traffic would move over. I was afraid they would do some sort of Spanning Tree blocking and not allow the new links to pass traffic. However, after checking the existing links I realized that I had two from each FI, both actively passing traffic and neither in a blocking state.

I then went ahead and started to plan the turn-up of the links. The majority of our servers sit in the UCS environment, from bare metal Linux and Windows machines to our 500-guest VM farm. With so much crucial infrastructure, we wanted to make sure we didn’t have any downtime or lose any traffic during the transition. So as part of the planning I built a Python script that pings a list of known addresses to ensure they are all up and on the network as each part of the plan is completed. I wanted the code to be reusable across any platform, so I made some allowances for the unique requirements of each OS. The only requirement is a file called hosts.txt containing the IP addresses you want to ping. It’s multithreaded, so it runs a lot of the pings at the same time and completes as soon as possible. Then you just need to go through the output and look for anything that is failing.

#!/usr/bin/env python3
import os
import platform
import subprocess
import sys
import threading

plat = platform.system()
scriptDir = sys.path[0]
hosts = os.path.join(scriptDir, 'hosts.txt')

# Read the addresses, dropping blank lines and trailing newlines
with open(hosts, "r") as hostsFile:
    lines = [line.strip() for line in hostsFile if line.strip()]

def ping(ip):
    # Build the ping command for the current OS
    if plat == "Windows":
        cmd = ["ping", "-n", "1", "-l", "1", "-w", "100", ip]
    elif plat == "Darwin":
        # macOS ping takes the -W wait time in milliseconds
        cmd = ["ping", "-c", "1", "-s", "1", "-W", "1000", ip]
    else:
        # Linux
        cmd = ["ping", "-c", "1", "-l", "1", "-s", "1", "-W", "1", ip]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, error = proc.communicate()
    print(out.decode())
    print(error.decode())

# start() runs each ping in its own thread; run() would execute them one at a time
threads = [threading.Thread(target=ping, args=(ip,)) for ip in lines]
for t in threads:
    t.start()
for t in threads:
    t.join()

For the migration we took the subordinate FI and brought up the 10G interface. As we watched traffic flow over it, we ran the ping script a couple of times. We then started to shut down the trunk interfaces, each time running the ping script and looking for issues. After we had the subordinate FI moved over to the 10G and the new core, we did the same with the primary FI. I was happy to find that at no time did we lose any connectivity to our hosts or guests and that everything went smoothly.
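Reading through the raw ping output after every step gets tedious, so one possible refinement is to report only the hosts that did not answer. The sketch below is a variation rather than the script we used. It assumes Linux-style ping flags, and the function names (is_up, failed_hosts) are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def is_up(ip):
    """Return True when a single ping succeeds (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failed_hosts(hosts, check=is_up, workers=32):
    """Check all hosts concurrently and return only those that did not answer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, hosts))
    return [host for host, ok in zip(hosts, results) if not ok]
```

Calling failed_hosts() with the addresses from hosts.txt after each migration step would give a short list to investigate, and an empty list when everything is still reachable.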

3750 Unicast Flooding Issue

Since I ran into this issue and wasn’t really able to find anyone posting about it, I thought I should put something together for anyone else who runs into it. I had an issue with a stack of 3750X switches where there was unicast flooding to all of the ports in the same VLAN. While doing research I came across suggestions of asymmetric L2 routes, timeout values for the ARP tables, and TCAM table overruns. My issue turned out to be none of these. The ARP timeout values were all increased, and that didn’t solve the problem. My network is fairly simple with a collapsed core, so L2 asymmetric routing wasn’t the issue. The TCAM tables were not being overrun either, as this switch can handle 8K ARP entries and I am nowhere near that.

So what did that leave me with? An issue where the MAC address tables of the stack members were not getting updated in a timely manner, as seen in the output of the following command:

remote command all sh mac add count | i Total

Switch : 3 : (Master)
---------------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 22
Total Mac Addresses    : 28
Total Mac Addresses    : 178
Total Mac Addresses    : 0
Total Mac Address Space Available: 6402

Switch : 1 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 162
Total Mac Addresses    : 22
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 28
Total Mac Addresses    : 0
Total Mac Address Space Available: 6418

Switch : 2 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 165
Total Mac Addresses    : 22
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 28
Total Mac Addresses    : 0
Total Mac Address Space Available: 6415

Switch : 4 :
------------
Total Mac Addresses    : 152
Total Mac Addresses    : 585
Total Mac Addresses    : 39
Total Mac Addresses    : 381
Total Mac Addresses    : 384
Total Mac Addresses    : 22
Total Mac Addresses    : 28
Total Mac Addresses    : 140
Total Mac Addresses    : 0
Total Mac Address Space Available: 6440

After many hours of troubleshooting, TAC finally concluded that we were hitting this bug:

CSCut64281    Ports on Member of the stack takes long time to learn/age MAC addr

This was only evident in the 15.1x code train. The issue didn’t exist in 15.0, which is why some of my older switches weren’t seeing it, only the brand new shiny ones I had installed last year. The fix finally became available in the last couple of months in 15.2(2)E3. I finished testing the release on some slightly non-prod switches and then decided to roll it out to my campus. Now I am seeing the following on the upgraded switches:

Switch : 3 : (Master)
---------------------
Total Mac Addresses    : 142
Total Mac Addresses    : 536
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 150
Total Mac Addresses    : 0
Total Mac Address Space Available: 7151

Switch : 1 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 535
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 151
Total Mac Addresses    : 0
Total Mac Address Space Available: 7151

Switch : 2 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 534
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 20
Total Mac Addresses    : 27
Total Mac Addresses    : 150
Total Mac Addresses    : 0
Total Mac Address Space Available: 7154

Switch : 4 :
------------
Total Mac Addresses    : 142
Total Mac Addresses    : 535
Total Mac Addresses    : 44
Total Mac Addresses    : 70
Total Mac Addresses    : 30
Total Mac Addresses    : 21
Total Mac Addresses    : 27
Total Mac Addresses    : 146
Total Mac Addresses    : 0
Total Mac Address Space Available: 7156

While not perfect, it definitely seems to be a lot better than the previous reports. I keep looking for the bug to be posted on Cisco’s site, but it is still private at this point.
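To compare stack members without eyeballing the output, something like the following rough sketch could parse the text of the remote command and pull out the available address space per switch. The function names are made up for illustration, and the gap between members is only a crude indicator of how far their tables have drifted apart.

```python
import re


def available_space_by_switch(output):
    """Map each stack member number to its 'Total Mac Address Space Available' value."""
    available = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"\s*Switch\s*:\s*(\d+)\s*:", line)
        if header:
            current = int(header.group(1))
            continue
        space = re.match(r"\s*Total Mac Address Space Available:\s*(\d+)", line)
        if space and current is not None:
            available[current] = int(space.group(1))
    return available


def spread(available):
    """Difference between the largest and smallest available-space values."""
    values = list(available.values())
    return max(values) - min(values)
```

Using the numbers above, the members differed by 38 entries before the upgrade (6440 versus 6402) and by 5 afterwards (7156 versus 7151).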

Troubleshooting Websense as Proxy for site access

I recently had to troubleshoot a problem with a client going through Websense as a proxy and trying to gain access to a site. The site was at https://somesite.com:11001. Every time I went to the site I would just get a “Page could not be displayed” error. I started troubleshooting from the Websense side and couldn’t see anything in the interface itself, so I went to the log server, stopped the logging service, and ran it from the command line with just the client I was testing with. However, this didn’t even show a hit from the client. I then had to go to the next level and troubleshoot with a packet capture and Wireshark. Once I captured the traffic I could see that Websense was returning an error that the browser wouldn’t display. The issue came down to using HTTPS on port 11001, which wasn’t allowed in the Content Gateway on the Websense appliance. Once I added that port, I was able to browse to the site successfully and have it show up in the log server.

So below I have summarized the steps for someone else needing to do this type of troubleshooting.

How to use the Websense testlogserver to troubleshoot problems and limit the information that is seen:

  1. Log into the logging server
  2. Stop the “Websense Log Server” service
  3. Go into the c:\program files (x86)\Websense\Web Security\bin folder and run testlogserver.exe -onlyip (IP address you want to see)
  4. You can now surf the site from that machine and see what errors are showing up in the log server to help determine the problem.
  5. If you need to go to another level, run a packet capture from the machine that is using Websense as an explicit proxy in its browser. You can then limit the capture to just the Websense IP.
  6. Once you have gone to the site you can then look at the packet capture and search for “http contains (site you are going to)”.
  7. You should be able to then decode the http stream and see all of the headers and information returned. This should help you in troubleshooting the issue.
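Since the browser hid the proxy’s reply, another way to see it without a full packet capture is to send the CONNECT request to the explicit proxy by hand and print the status line that comes back. This is only a sketch of that idea, not something I used at the time. The function name is made up, and the proxy address and port you pass in are whatever your environment uses.

```python
import socket


def proxy_connect_status(proxy_host, proxy_port, target_host, target_port, timeout=5.0):
    """Send an HTTP CONNECT to an explicit proxy and return the status line it replies with."""
    request = (
        "CONNECT {0}:{1} HTTP/1.1\r\n"
        "Host: {0}:{1}\r\n"
        "\r\n"
    ).format(target_host, target_port)
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        reply = b""
        # Read until the first line of the response is complete
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

For the site in this post the call would be proxy_connect_status(proxy address, proxy port, "somesite.com", 11001). A status line with a 2xx code means the proxy opened the tunnel, and anything else is the error the browser was not showing.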

Setting up Ansible on OS X for deploying VPN Configurations

While my day job is as a Network Engineer, I also moonlight as a Network Engineer and consult for a couple of people. Most of that work involves VPNs for remote office connectivity. I’ve been able to get most of my clients to standardize their VPN devices on Cisco 5505s. While the configurations are a bit different, most of the items are the same, so this lends itself to a templating system. I’ve also found that when a client makes a frantic call saying a device has died, they find it reassuring when I can immediately dial in, upload a configuration, and have them back up and running quickly. I went through Kirk Byers’ Python course and then got his emails about setting up Ansible. I thought this might be a great way to build a standard config per client and keep each client’s settings in a template file so that I can regenerate configurations as needed. I am also running this on my MacBook Pro so that I have it wherever I am.

Kirk put together some wonderful walkthroughs of setting up Ansible and getting it to work:

https://pynet.twb-tech.com/blog/ansible/ansible-cfg-template.html
https://pynet.twb-tech.com/blog/ansible/ansible-cfg-template-p2.html
https://pynet.twb-tech.com/blog/ansible/ansible-cfg-template-p3.html

Since I wanted to use a password for the SSH connection, I did have to install sshpass for OS X:

http://thornelabs.net/2014/02/09/ansible-os-x-mavericks-you-must-install-the-sshpass-program.html

I took my standard config and put it into the templates directory. I then modified it to handle removing DHCP and turned key aspects of the config into variables so that I could put those into vars/main.yml and customize them per site.

The issue I ran into was that when I tried to run the playbook, Ansible didn’t like the PSK for the L2L VPN. I found that it needed to be enclosed in single quotes in the vars/main.yml file. That way the characters in the PSK weren’t an issue and the config would generate correctly.

After this it is a simple copy and paste and the VPN device is ready to go. This also helps when doing upgrades from older devices such as 501s. I can now just modify the template and get the new config for the 5505 with very minimal error checking and a much better standardized config.
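For anyone who wants to see the templating idea without setting up Ansible first, here is a minimal sketch using Python’s built-in string.Template as a stand-in for the Jinja2 templates that Ansible uses. The config fragment, variable names, and values are illustrative only and are not my actual template.

```python
from string import Template

# Illustrative fragment only; a real template would be much longer.
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "tunnel-group $peer_ip type ipsec-l2l\n"
    "tunnel-group $peer_ip ipsec-attributes\n"
    " pre-shared-key $psk\n"
)


def render_site(site_vars):
    """Fill the template with one site's values; a missing key raises KeyError."""
    return CONFIG_TEMPLATE.substitute(site_vars)
```

Calling render_site() with a dictionary per site, for example the hostname, peer IP, and PSK, returns the finished text for that site. Because the PSK is passed in as a Python string here, special characters in it are inserted as-is; the quoting problem described above belongs to the YAML vars file.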

Mac, Python, paramiko, all in a day’s work

I am trying to learn Python, as I think it will be good for my day job. I bought a couple of books, but I am someone who learns by doing. I found some good scripts out on the internet that I wanted to modify and make use of. However, I am also a Mac user, so I wanted to be able to run these scripts on my Mac from wherever I might be. I do on occasion travel to sites and do some extracurricular activities that might require this ability. The Mac has Python preinstalled; it’s version 2.7.5, which seemed sufficient for what I wanted to do. The script I wanted to play with needed the paramiko module. I was able to download and extract it from here:

https://github.com/paramiko/paramiko

That was easy. However, the install instructions said it would be best if I had setuptools, so I found this site:

https://pypi.python.org/pypi/setuptools#unix-including-mac-os-x-curl

And was able to find a command to download and install setuptools.
***Make sure you are root, you will have a much better time of it.***

curl https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -o - | python

So that installed correctly. However, when I went into Python and did an “import paramiko”, I was told I needed a crypto module. I then went out and found this:

https://pypi.python.org/pypi/pycrypto

I downloaded it, and of course I couldn’t use setuptools for it; it needed to be built and then installed. That required me to get Xcode 5.1 for the cc compiler and load that on my machine, which was straightforward enough. After the Xcode install I ran:

python setup.py build

But I was getting this error:

error: command 'cc' failed with exit status 1

Turns out there is an issue with Python and Xcode 5.1. The fix for that is to run the following before doing the build and install:

export CFLAGS=-Qunused-arguments
export CPPFLAGS=-Qunused-arguments

Once that is done you can go into the pycrypto folder and run:

python setup.py build
python setup.py install

Now you have everything you need to use paramiko to SSH into a Cisco device from a Mac and run some commands or do whatever it is you want.

I did find one other thing that was needed: as part of the connect call for paramiko, I had to specify “allow_agent=False, look_for_keys=False”. If I didn’t, I got password errors on the Cisco switch I was testing with.

ssh.connect('x.x.x.x', username='name', password='password', allow_agent=False, look_for_keys=False)

All in all it was a very educational day and I think some hours well spent. I am now going to take my scripts and look to put everything into variables and also specify some lists so I can run it against multiple machines.
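As a rough idea of where that is heading, the sketch below keeps the connection settings in one helper and loops over a list of hosts. The helper names are my own, the host list and command would come from variables, and accepting unknown host keys automatically is an assumption that only makes sense in a lab.

```python
def connect_kwargs(host, username, password):
    """Arguments for SSHClient.connect() with agent and key lookup disabled."""
    return {
        "hostname": host,
        "username": username,
        "password": password,
        "allow_agent": False,
        "look_for_keys": False,
    }


def run_on_hosts(hosts, username, password, command):
    """Run one command on each host and return {host: output}."""
    import paramiko  # imported here so connect_kwargs works without paramiko installed

    results = {}
    for host in hosts:
        ssh = paramiko.SSHClient()
        # Assumption: automatically accepting unknown host keys is fine for a lab
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            ssh.connect(**connect_kwargs(host, username, password))
            stdin, stdout, stderr = ssh.exec_command(command)
            results[host] = stdout.read().decode()
        finally:
            ssh.close()
    return results
```

A call such as run_on_hosts(list of switch addresses, 'name', 'password', 'show version') would then return the output keyed by host.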

Using McAfee Enterprise Security Manager to monitor AnyConnect Groups

We use AnyConnect for our remote access solution, and one of the issues I have is other admins not putting people into the right groups, or not putting them into groups at all. What happens then is they get stuck in the DfltGrpPolicy, which is definitely not where I want them, since it doesn’t have the customization of each of the different groups.

Since McAfee Enterprise Security Manager is monitoring my ASA and all of the logs are going to it, building an alert to notify me when people are in this group shouldn’t be an issue. Here is what we are going to do:

Identify the group, in this case I want to know when people get assigned the DfltGrpPolicy policy.

Now we are going to build an alarm that fires when it sees DfltGrpPolicy in the object field. Since no one should be pulling this group for any reason, anyone who is in it is misconfigured and needs to be moved to the appropriate group.

1. Go to System Properties and then click on Alarms

2. Choose an assignee and give the alarm a name.

3. Choose Type: Field Match, Field: Object, Value: DfltGrpPolicy, and select your Cisco ASA as the device.

4. Choose your actions. In my case I am going to have it email me when it sees a match. I am also going to customize the template so it only sends me the relevant information, such as the misconfigured user and the group they are in.

5. I leave Escalation blank

6. Click Finish and you are done.

Now sit back and admire your work. You should now be alerted whenever someone is misconfigured.