Network Connectivity Issue - ARP

A recent project involved converting a small public LAN in a colocation facility into a private LAN behind a Cisco ASA firewall.  Two of the main reasons for doing so were to let us centrally manage host security and to "move" IPs via simple NAT changes.

We purchased a new switch and the Cisco ASA appliance and did a rough configuration on our bench.  On the switch we created two simple Layer 2 VLANs to create a public and private virtual switch.  On the external LAN was our IP-based KVM and the external firewall interface.  For the migration of servers behind the firewall, we uplinked the external interface of the firewall to the existing switch in our cabinet.

First we moved our test server behind the firewall.  We created a private network and gave the server a private IP whose last octet matched the last octet of its public IP.  Using the default Port Address Translation (PAT) configuration to the external interface of the firewall, our server was able to successfully connect to the internet.
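The last-octet convention is easy to get wrong by hand once several servers are involved, so here is a minimal Python sketch of it.  The function name and the addresses are hypothetical (the addresses come from documentation and private ranges, not from our network), and the sketch assumes the private network is a /24.

```python
import ipaddress


def private_ip_for(public_ip: str, private_network: str) -> ipaddress.IPv4Address:
    """Return the host in private_network that shares public_ip's last octet."""
    # Keep only the final octet of the public address.
    last_octet = int(ipaddress.IPv4Address(public_ip)) & 0xFF
    net = ipaddress.IPv4Network(private_network)
    if net.prefixlen != 24:
        raise ValueError("this sketch assumes a /24 private network")
    # Adding an int to an IPv4Address offsets it by that many addresses.
    return net.network_address + last_octet


# Example addresses only.
print(private_ip_for("203.0.113.25", "192.168.10.0/24"))  # 192.168.10.25
```

Keeping the last octet the same means the NAT table can be read at a glance: the public and private addresses of a server always end in the same number.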

Next I created a one-to-one NAT relationship on the ASA to map the previous external IP of the server to the new IP behind the ASA.  However, once I completed that, I noticed the server lost internet connectivity.  We verified security rules were not a factor and even created an "allow all" rule for the purpose of troubleshooting.  Here are some of the things we found in testing...

  • Server behind the ASA could ping the ASA internal interface
  • Server behind the ASA could ping a host on the ASA's external network
  • Server behind the ASA could NOT ping a host on the internet.
  • Server on external subnet (same as ASA) could ping the NAT'd public IP.
  • Servers from other locations could NOT ping the NAT'd public IP.
  • Switching back to PAT resolved the connectivity issue
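With hindsight, the pattern in these results can be read mechanically: everything up to and including the ASA's external subnet worked, and everything past it failed.  The sketch below encodes that reading in Python.  The class, field names, and the returned labels are all hypothetical, and this is only one plausible way to interpret such a test matrix, not a general diagnostic.

```python
from dataclasses import dataclass


@dataclass
class PingResults:
    inside_to_asa_inside: bool      # server -> ASA internal interface
    inside_to_outside_subnet: bool  # server -> host on the ASA's external subnet
    inside_to_internet: bool        # server -> host on the internet
    outside_subnet_to_nat_ip: bool  # host on external subnet -> NAT'd public IP
    remote_to_nat_ip: bool          # host at another location -> NAT'd public IP


def likely_fault(r: PingResults) -> str:
    """Walk outward from the server and report the first hop that fails."""
    if not r.inside_to_asa_inside:
        return "inside LAN or server"
    if not (r.inside_to_outside_subnet and r.outside_subnet_to_nat_ip):
        return "firewall NAT or rules"
    if not (r.inside_to_internet and r.remote_to_nat_ip):
        return "beyond the outside subnet (upstream router)"
    return "no fault observed"


# The results we actually saw with one-to-one NAT in place.
observed = PingResults(True, True, False, True, False)
print(likely_fault(observed))  # beyond the outside subnet (upstream router)
```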

We did quite a few things to try to discover the cause of this issue.

  • Restarting the Firewall had no effect
  • Recycling the Switch had no effect
  • Moving the uplink and firewall external interface to the external VLAN of the NEW switch had no effect

We had ruled out the firewall as a cause since we could traverse from external to internal on the ASA and had created the "allow all" test rule. 

Anyway, we went home for the night.  The next morning the issue was still present.  Around lunchtime, though, it all suddenly started working.  The idea of an Address Resolution Protocol (ARP) issue came to mind.  However, we had rebooted both the old and new switch as well as the firewall.  Also, the long delay before it started working seemed too long for an ARP cache issue.

What really nailed it was when we tried a one-to-one NAT of an IP we knew had not previously been used.  VOILA, it worked right away.  We spoke with one of the NOC engineers about the idea of the uplink router having an ARP cache problem.

Later that day, we picked an extra public IP that we wanted to move.  We created some ping tests that we ran continuously.  NAT'ing the new address to the server behind the firewall created the same symptoms: no internet access, but the test server could ping our server on the outside of the firewall and vice versa.  We asked the NOC engineer to clear the router's ARP cache.
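A continuous ping test like the one described above can be sketched in Python as follows.  This is not the script we ran; the function names are hypothetical, and `ping_once` assumes a Unix-like `ping` command that accepts `-c` for the packet count.  It prints a timestamped line only when a host changes state, which makes it easy to see the exact moment connectivity returns.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host: str) -> bool:
    """Send one echo request; assumes a Unix-like `ping` that accepts -c."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(hosts, probe=ping_once, rounds=None, interval=1.0):
    """Probe each host repeatedly and record every up/down transition.

    rounds=None loops until interrupted; a number stops after that many passes.
    """
    state = {}
    transitions = []
    completed = 0
    while rounds is None or completed < rounds:
        for host in hosts:
            up = probe(host)
            if state.get(host) != up:
                state[host] = up
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host} {'UP' if up else 'DOWN'}")
                transitions.append((host, up))
        completed += 1
        time.sleep(interval)
    return transitions
```

Running something like `monitor(["203.0.113.25"])` (an example address) from a host at another location, and leaving it running while the NAT change is made and the cache is cleared, gives a simple before-and-after record.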

PROBLEM SOLVED.  The upstream router's ARP cache was causing the issue.  We were blaming ourselves, but by testing as many scenarios as possible we were able to establish it was not in fact our issue.
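For anyone who wants to see why a stale entry produces exactly these symptoms, here is a toy Python model of an ARP cache.  It is a simplified sketch, not how any particular router behaves: the class, the MAC addresses, and the timeout value are all made up for illustration, and real devices may refresh or age entries differently.

```python
class ArpCache:
    """Toy IP-to-MAC cache whose entries expire after `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def learn(self, ip, mac, now):
        self.entries[ip] = (mac, now)

    def lookup(self, ip, now):
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] >= self.timeout:
            self.entries.pop(ip, None)
            return None  # caller must send a fresh ARP request
        return entry[0]

    def clear(self):
        self.entries.clear()


def next_hop_mac(cache, ip, current_owner_mac, now):
    """Use the cached MAC if present; otherwise re-ARP and learn the current owner."""
    mac = cache.lookup(ip, now)
    if mac is None:
        mac = current_owner_mac
        cache.learn(ip, mac, now)
    return mac


SERVER_MAC = "00:00:5e:00:53:01"  # example: the server's old NIC
ASA_MAC = "00:00:5e:00:53:02"     # example: the ASA's external interface
PUBLIC_IP = "203.0.113.25"        # example address

router = ArpCache(timeout=3600)   # arbitrary timeout for the demo
router.learn(PUBLIC_IP, SERVER_MAC, now=0)

# After the move the ASA answers for the IP, but the router still trusts its cache.
print(next_hop_mac(router, PUBLIC_IP, ASA_MAC, now=60))  # 00:00:5e:00:53:01

# Clearing the cache forces a fresh lookup, which finds the ASA.
router.clear()
print(next_hop_mac(router, PUBLIC_IP, ASA_MAC, now=61))  # 00:00:5e:00:53:02
```

In this model, hosts on the same subnet that look the address up fresh find the ASA straight away, while anything routed through the device holding the stale entry keeps being sent to the old MAC.  That matches what we saw.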