Sphere of Inconvenience

So... As I was putting together my most recent post regarding IPv6, I got to thinking about how many computers I use every day. It started as I counted up how many things in my house use IP addresses. From here forward I will refer to anything that uses an IP address as a computer for simplicity (yes, that means that in this context my iPhone is a computer, as is my TiVo, and my Linksys wireless access point).

Then I started to think... How many computers do I inconvenience on any given day? Think of every website you go to in a day, plus the servers that serve up the ads on those sites, plus all of the routers in between. Then add in the fact that most sites actually have more than one server behind a load balancer, plus back-end services that the front end talks to, probably a separate database (or 3), and your connection gets logged and put in a database for somebody to write reports about, and, and, and... Phew... That's probably a lot of computers.

So I decided to count what I could. There is no way to know how many servers at Google are required for my request, or what Google Analytics is going to do with it, but I /can/ count the external IP that I hit. So here's what I did: I created a cron job that looked at all established IP connections, logged them, and spat out only unique IP addresses. That took care of all of the things that I connected to. Then I took that output, ran a traceroute to each of those IPs, took that output, spat out the unique IPs there, and counted them up. Obviously, there's a huge margin for error here because a lot of routers won't respond to my traceroutes, but it gives me a little insight.
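I didn't post the cron job itself, but a minimal sketch of the idea looks something like this (the allconns.txt name is just a placeholder, and it assumes Linux-style netstat output where the foreign address is column 5 as IP:port): note the remote end of every established connection each time it runs, then boil the log down to unique IPs, producing a file like the /root/uniqconns.txt that the Perl further down reads.

# rough sketch, run from cron every few minutes; not the exact job I used
# assumes Linux netstat output: foreign address in column 5 as IP:port
netstat -ant | awk '/ESTABLISHED/ {split($5, a, ":"); print a[1]}' >> /root/allconns.txt
sort -u /root/allconns.txt > /root/uniqconns.txt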

And what did I end up with?
1 Day: 404 connections for a total of 1107 including routers
2 Days: 728 connections for a total of 1744 including routers

Wow. Over one thousand machines per day are touched by my daily activities from my laptop alone. And I don't BitTorrent or Skype or use any other P2P app. I also don't social network. And I don't /think/ I'm a heavy web surfer...

Comments

xrayspx's picture

My computers have decided to eat each other today. I noticed my Mac Pro had no network connectivity tonight when I got home. From the looks of things it had lost it at about 2:00 this morning.

After 15 minutes of troubleshooting the big Mac, my Linksys router, a MacBook Pro (which COULD get DHCP and reach the Internet, but couldn't see anything else on the internal network [this was the huge clue that led to the answer]), and an Asus Eee HTPC, I finally found that the problem is my *powered off* DirecTV DVR. It's flooding my network and flooding the switch in the Linksys. That shows us that:

A:) DirecTV pushes some SHITTY code
B:) Linksys doesn't put very good switches in their routers
C:) I'm a lazy ass because I unplugged the DVR rather than sit here and sniff it for 3 hours to figure out WTF.
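For what it's worth, if you ever do want to sniff it, a quick-and-dirty way to spot a flooding host is to count frames per source MAC for a few seconds and see who's screaming loudest. Just a sketch, and the packet count is arbitrary:

# -e prints link-level headers, so field 2 is the source MAC; 2000 packets is an arbitrary sample
tcpdump -e -n -c 2000 2>/dev/null | awk '{print $2}' | sort | uniq -c | sort -rn | head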

As to your post, it doesn't surprise me at all that you should touch 1000 machines, minimum, per day. Any website you hit, after loading tracking GIFs, third-party cookies, and ads, probably puts you in contact with 10 or 15 FQDNs at least. Load Fark's homepage and sit and watch what it does sometime.

xrayspx's picture

That was kind of interesting to me, so here's how I'm tackling it:

tcpdump -n | awk '{print $3,$4,$5}' | grep -v "Request who has" | grep -v "::" > nic.txt
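Each line of nic.txt ends up looking something like this (addresses invented for illustration), with tcpdump sticking the port on the end of each address, which is where the port-number headache further down comes from:

172.16.0.187.54321 > 93.184.216.34.80: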

grep "> 172.16.0.187" nic.txt | awk -F "." '{print $1"."$2"."$3"."$4}' | sort | uniq -c > fromhosts.txt

That gives me a list of hosts that sent me data; reverse it for a list of hosts I sent data to:
grep "172.16.0.187 >" nic.txt | awk -F "." '{print $1"."$2"."$3"."$4}' | sort | uniq -c > tohosts.txt

for host in `cat ./tohosts.txt | awk ' {print $2}'`; do for hops in `traceroute -q 1 -w 1 $host | awk -F "(" '{print $2}' | awk -F ")" '{print $1}'`; do echo $hops >> routers.txt; done; done

The line for getting the tohosts.txt file doesn't work because of the port numbers; really you'd have to awk them out of the original capture to make that file work, which is probably fine. Anyway, fromhosts and tohosts give you lists of all the IPs you hit and how many times you interact with them, and at the end you can do a "cat routers.txt | sort | uniq" to give you only unique IPs of routers, because the first bunch are always going to be the same, obviously.
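One way around the port-number problem, assuming nic.txt looks like the sample line above: grab the lines where my IP is the source, take the destination field, and strip the trailing port the same way the fromhosts line does. Untested sketch:

# untested sketch; assumes nic.txt lines look like "src.port > dst.port:" as above
grep "^172\.16\.0\.187\." nic.txt | awk '{print $3}' | awk -F "." '{print $1"."$2"."$3"."$4}' | sort | uniq -c > tohosts.txt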

How much cleaner was your Perl? I think I'll get a better capture of every interaction, since I'm dumping the list from a constant tcpdump over the day instead of taking snapshots with a cron job that, I assume, grabs netstat data of what's connected right now?

I tested for 30 seconds and refreshed tabs containing Fark, Last.fm, Flickr, Twitter and FB. I got 53 unique hosts sending me data, and 218 unique routers. From reloading FIVE. TABS. Yikes.

I used -w 1 and -q 1 on the traceroute to reduce the wait time for that line to run. Basically, by default traceroute waits 5 seconds for each hop to respond and repeats the request 3 times; this only makes one request per hop, and only waits 1 second. That way the max time you'll wait per traceroute is 64 seconds, and that's the worst case where zero hops respond and you never finish the trace (with the defaults, that same worst case would be 64 hops x 3 requests x 5 seconds each, about 16 minutes). You'll lose a couple routers here and there, but it saves SO much time. I don't think doing traceroutes on a day's worth of data gathered this way is really a smart thing to do on a work computer :-)

I fully intend to run this for a full day on a few different systems to see what results I get, then maybe make a map of them, just to do it. That would be fun, right?

I swear... He always takes some simple thought and engineers it all to hell. That's why I love him.

I think you forgot to uniq the routers though ;)

~Sean

xrayspx's picture

Yeah, I just dumped it to the file, because you can't really just stick "| sort | uniq" in that command line; it's just going to drop into the file unsorted, in the order of each traceroute in succession. Besides, I thought it would be good to be able to do a count on them and say "wow I hit this random router 1300 times today" or whatever.
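If you do want that count, something like this on the dumped file shows the most-hit routers first; just a sketch of the obvious sort/uniq dance:

# counts per router, biggest first
sort routers.txt | uniq -c | sort -rn | head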

I'm still too chicken to do this on a whole day of data, I might get fired for doing 100,000 traceroutes all at once. I think I would want to de-serialize it in that case too.

Yeah... I just noticed that you also said that you should uniq it but didn't.

Perl looks like crap:


#! /usr/bin/perl

use strict;

my $source = "/root/uniqconns.txt";
my @ips = &read_file($source);
chomp @ips;
my %seen;
my @uniq;

foreach my $ip (@ips) {
    # count the destination itself, then every hop on the way to it
    $seen{$ip} += 1;
    my @out = `traceroute -n -w 1 -q 1 $ip | grep -v traceroute | awk '{print \$2}'`;
    chomp @out;
    foreach my $hop (@out) {
        next if $hop eq '' or $hop eq '*';    # skip hops that never answered
        $seen{$hop} += 1;
    }
}
@uniq = keys %seen;

my $count = scalar @uniq;
print "You have inconvenienced $count IPs.\n\n";

sub read_file {
    my ( $f ) = @_;
    open (F, "<$f") or die "Can't open $f: $!";
    my @f = <F>;
    close F;
    return wantarray ? @f : \@f;
}

I ran this at home, so I didn't care about all the traceroutes, not to mention the time. I just kicked it off and checked back hours later.