Fix for DSL latency issue

xrayspx's picture
Music: 

Siouxsie and the Banshees - Israel

Office
Since upgrading to 7.6Mb/sec DSL, I've had an occasional issue where the Tubes get all slow-like. The modem will say it's connected at full speed, and that things are, highly technically, "GO!". However, my latency is horrible. Even to my first upstream router I end up getting 50+ms ping replies. Pinging my site or 4.2.2.2 or other "Things that are fast on the Internet" ends up well over 100ms.
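For anyone who wants to track this kind of thing instead of eyeballing ping output, here is a rough Python sketch. It assumes a Unix-style `ping` that accepts `-c` and prints `time=... ms` on each reply. The function names are made up for illustration, and 192.168.1.1 is only a placeholder for whatever your first hop actually is.

```python
import re
import statistics
import subprocess

# Matches the "time=52.3 ms" field on Linux/macOS ping reply lines.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtts(ping_output):
    """Return every round-trip time (in ms) found in ping's text output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def summarize(rtts):
    """Median and worst-case RTT; the median shrugs off one-off spikes."""
    if not rtts:
        return {"count": 0, "median_ms": None, "max_ms": None}
    return {
        "count": len(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
    }


def ping(host, count=5):
    """Run the system ping (assumes the Unix-style -c flag) and return stdout."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # a lossy link makes ping exit non-zero; still parse it
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder targets: a local gateway address and a well-known fast host.
    for host in ("192.168.1.1", "4.2.2.2"):
        print(host, summarize(parse_rtts(ping(host))))
```

Comparing the median to the nearest hop against the median to something far away is a quick way to see whether the delay starts right at the modem or further upstream.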

This makes actually downloading anything very slow and shitty, and I have trouble getting more than like 30% saturation on the 7Mb link, even with multiple threads.
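To put a number on "saturation", a back-of-the-envelope calculation is enough. The sketch below is my own illustration, not anything the modem reports: `link_utilization` is a made-up helper, it treats 7.6Mb/sec as 7.6 million bits per second, and it ignores protocol overhead.

```python
def link_utilization(bytes_transferred, seconds, link_mbps=7.6):
    """Fraction of the link's nominal rate that a transfer achieved.

    Assumes link_mbps is in millions of bits per second and ignores
    TCP/IP and DSL framing overhead, so real links never quite hit 1.0.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    achieved_mbps = bytes_transferred * 8 / seconds / 1_000_000
    return achieved_mbps / link_mbps


if __name__ == "__main__":
    # Example: 28.5 MB downloaded in 100 seconds on the 7.6 Mb/s line.
    fraction = link_utilization(28_500_000, 100)
    print(f"{fraction:.0%} of nominal link speed")
```

At roughly 30% the line is moving about 2.3Mb/sec, which is a bit under 300 kilobytes per second of actual download speed.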

The first time I found this, I called Fairpoint support, reported every stat I could find on the router, and their highly technical suggestion was to reset to factory and try again. Even though this /is/ what I do for a living, I never mention that to support people. There are reasons for that:

  • Their suggestions often work, even if their analysis isn't technically rigorous
  • I worked long enough as a technician at a small computer store, taking calls all day and fixing machines and networks, to know that the guy who gets all "I'm a Sr. Systems Engineer at (Compaq, Oracle, Sun, Microsoft)" will always be wrong, and will always annoy me.
  • If my job and skills were relevant, I would have fixed it already
  • My experience is NOT in consumer-grade hardware anymore. Just because I know how to troubleshoot this issue on a Cisco 6500 does not mean I know how to deal with their Westell DSL modem
  • I try not to be a complete self-important cock to everyone I ever talk to

Bearing all this in mind, I did their suggested Burn-The-World-Down-And-Rebuild-Using-Silicon-Instead-Of-Carbon approach. It worked, of course. In my world, great internal connectivity + horrible upstream connectivity = "Problem with the line (noise/crosstalk, since it is a phone line?)" or "Problem with the upstream gateway" or some other connectivity/carrier issue. In the world of consumer hardware, it could really be anything.

The next few times I had issues, I was usually too burned out to care what the problem was, so I just noted down my config (I've never tried to save and restore a config file) and rebuilt from factory settings.

Tonight, my mood was a little different. I've been having a far more frustrating packet-loss problem at work, which has involved spending half a month standing in a datacenter babysitting carrier guys, all of us banging our heads trying to get a clean test on a new circuit. I'm still exhausted, but I decided I wasn't going to live with this "Factory Reset and Rebuild" anymore. My home connection has been crappy for like 3 weeks but I've been too tied up to deal with it. Tonight it's fixed. The problem seems to be in using my DSL modem as a bridge.

My network might be a little non-standard, I don't know, but it works great for me in general. It goes like this:

  • Internet into Westell 7500 DSL modem.
  • DSL modem runs: Open WiFi network that anyone can connect to, with a small-ish DHCP pool which NATs to the Internet; DNS; Bridge (DMZ Host) connection to a Linksys broadband router.
  • Linksys: Has a public IP address (DMZ-Host); Registers itself with DynDNS; Port forwards ssh to an internal Linux host; Runs a different WPA2 WiFi network; DHCP server; Default gateway for my internal LAN.

To start troubleshooting, I first just rebooted the modem, because, well, you know... That didn't work, same horrible latency. Then I started looking for anything that grows and which I'm able to delete. Since I don't make configuration changes to this thing like, ever, I figured logs or route cache, ARP cache/MAC table, DNS cache, or some DHCP crap might be bumping up against some memory limit or other and causing problems. Nothing I could clear out helped at all, and with all the reboots, it should have forgotten anything it had cached anyway.

I decided to drop the Bridge/DMZ-Host and just test using the internal DHCP Public-Wifi network. That worked. With no bridging, my latency went way down. I re-enabled the DMZ-Host setting, and my latency stayed low. I'm now able to saturate the link easily and everything seems great.

Now I don't know WTF the actual root cause is, but this is definitely a much quicker way of fixing the problem. It only resets the modem twice, and doesn't involve me going back in and setting up DHCP, bridging, and all my DNS records (blackholes) all over again. Yay! I have to wait another 6 or 8 months for it to get bad again before I'll know if this is a sure-fire fix or not.

If anyone has a better answer or an actual reason for this, I'm all ears.
