I was recently working on a project that involved significantly redesigning the WAN infrastructure for a large international organization. This organization has between 400 and 500 sites and many different potential methods of connectivity. The problem we were solving was that path selection wasn’t always predictable, especially in failure scenarios, and asymmetric flows wreaked havoc on some of their applications.
Due to the size of the change, it was split into smaller components (phases) and planned for implementation across a rather large window in a single day. We needed to verify that we had a consistent environment at the end of each phase before moving on to the next. We also needed a tool to rapidly test a multitude of failure scenarios during failover/user acceptance testing. The other requirement was that this testing happen from many perspectives. I needed to verify not only connectivity, but also that traffic in both directions was selecting a consistent path.
Because of these requirements, I decided to use TCL (eventually EEM-TCL) and run the script from the routers and switches themselves. I took this approach because it wasn’t practical to set up machines at each of these locations to run the script locally. This ended up working incredibly well for our purposes, and since I couldn’t find any good pre-built examples of what I was trying to do, I’ve sanitized the script and posted it for others to use. The functional information on how it runs is on the GitHub page. Simply put, the script is pre-configured with a list of test targets, and it runs ping and traceroute against each of them. The output is simplified so that only relevant information is displayed and you can quickly determine whether there is an issue. For the change referenced above, there were about 20-30 different target IP addresses (internal and external) that gave a sampling of the different types of locations and connectivity.
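To give a feel for the "simplified output" idea, here is a minimal sketch in Python rather than TCL. It is not the posted script. It assumes you have already captured the raw ping output off-box, and it only relies on the "Success rate is N percent (x/y)" summary line that IOS prints. The function name, the status labels, and the sample address are all made up for illustration.

```python
import re

# Matches the summary line IOS prints at the end of a ping, for example:
# "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"
SUMMARY = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def summarize_ping(target, raw_output):
    """Reduce raw ping output to a single status line (hypothetical helper)."""
    match = SUMMARY.search(raw_output)
    if match is None:
        # No summary line at all, e.g. the command failed or was cut off.
        return f"{target}: NO RESULT"
    percent, received, sent = (int(group) for group in match.groups())
    if percent == 100:
        status = "OK"
    elif percent > 0:
        status = "DEGRADED"
    else:
        status = "FAIL"
    return f"{target}: {status} ({received}/{sent})"


# Sample capture using a documentation address (assumed, not from the real change).
SAMPLE = """Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.0.2.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"""

print(summarize_ping("192.0.2.10", SAMPLE))
```

One line per target is the whole point: with 20-30 targets, you want to be able to scan the result and spot the odd one out immediately.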
In larger organizations, I could see this script also being used as a level 1 troubleshooting tool. Run the script and validate the result against a “known good” output before escalating to an upper tier. The escalation could then include a copy of the output, which would potentially point to trouble areas in the network. You could also take this script and have it run whatever commands you like. TCL gives you some pretty powerful options for variable handling and output filtering, so this could just be a good starting point for setting up whatever validations you would like to run.
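The “known good” comparison could be as simple as the following sketch, again in Python and run off-box. It assumes the traceroute hops for each target have already been extracted into lists. The baseline data, the addresses, and the function name are all hypothetical.

```python
def diff_against_baseline(baseline, current):
    """Return one line per target whose path differs from the baseline.

    Both arguments map a target IP to its ordered list of hop IPs.
    """
    problems = []
    for target in sorted(baseline):
        expected = baseline[target]
        actual = current.get(target)
        if actual is None:
            problems.append(f"{target}: missing from current run")
        elif actual != expected:
            problems.append(
                f"{target}: expected {' > '.join(expected)}, got {' > '.join(actual)}"
            )
    return problems


# Made-up example data: the second target now leaves via a different first hop.
known_good = {
    "192.0.2.10": ["10.0.0.1", "10.1.0.1"],
    "198.51.100.20": ["10.0.0.1", "10.2.0.1"],
}
latest_run = {
    "192.0.2.10": ["10.0.0.1", "10.1.0.1"],
    "198.51.100.20": ["10.0.0.2", "10.2.0.1"],
}

for line in diff_against_baseline(known_good, latest_run):
    print(line)
```

An empty result means the run matches the baseline. Anything else is what you would attach to the escalation.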