Chances are your website has a single file that is included on every page, like logo.jpg or something similar. You can use that filename to grep your Apache access logs, and with a few more commands turn it into meaningful statistics.
| Code: |
grep 'somefile' ./access_log | cut -d' ' -f1 | sort | uniq -c | sort -r | more
|
First, grep searches your access log for the file you specify and pipes the matching lines to the cut command. The ( -d' ' ) argument tells cut to use a space as the delimiter, and ( -f1 ) selects the first field, so only the first column is piped to the next command. Unless you have tweaked your access log layout, the first column will be the IP addresses of your visitors.
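To see what the cut step does on its own, here is a small sketch run against a single made-up log line. The address and request are placeholders, and the line assumes the common Apache log layout with the client address first.

```shell
#!/bin/sh
# Hypothetical access log line in the common Apache layout (not real traffic).
line='192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /logo.jpg HTTP/1.1" 200 2326'

# Split on spaces and keep only the first field, which is the client address.
first_field=$(printf '%s\n' "$line" | cut -d' ' -f1)
echo "$first_field"
```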
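The output of cut is then piped to the sort command, which as its name implies sorts the IP addresses. Without extra options the order is plain text order rather than true numerical order, but that is all that's needed here: identical addresses end up on adjacent lines.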
Then the output of the sort command is piped to the uniq command with the ( -c ) argument, which collapses each run of adjacent duplicate IP entries into a single line and prepends the number of identical entries it found. Because uniq only compares neighbouring lines, the sort step has to come first.
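Then that output is piped to another sort command, this one with the ( -r ) argument to reverse the sort, highest to lowest. Adding ( -n ) as well, as in sort -rn, makes sort compare the counts as numbers rather than as text, which is the safer choice when the counts differ a lot in length.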
Finally, all that is piped to the more command so you can page through the results.
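The reason sort sits in front of uniq can be shown with a tiny sketch. The three addresses are made up for illustration; the same address appears twice, but not on neighbouring lines.

```shell
#!/bin/sh
# Hypothetical input: 192.0.2.10 appears twice, separated by another address.
ips='192.0.2.10
198.51.100.7
192.0.2.10'

# Without sort, uniq sees no adjacent duplicates and keeps all three lines.
unsorted=$(printf '%s\n' "$ips" | uniq -c | wc -l | tr -d ' ')

# With sort, the duplicates become neighbours and are collapsed into one line.
sorted=$(printf '%s\n' "$ips" | sort | uniq -c | wc -l | tr -d ' ')

echo "without sort: $unsorted lines, with sort: $sorted lines"
```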
All of that, while possibly sounding very complicated and CPU intensive, is lightning fast, since all of the commands are small C programs. You'll be amazed at how fast it is and how well it deals with large log files.
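If you want to try the whole chain without touching a real log, the sketch below builds a throwaway log file and runs the pipeline over it. The log lines, the temporary file, and the count_hits helper name are all made up for illustration, and the final step uses sort -rn instead of plain -r so the counts are compared as numbers. The more step is left out so the result can be captured in a variable.

```shell
#!/bin/sh
# Hypothetical sample log: three hits on logo.jpg from one address, one from another.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /logo.jpg HTTP/1.1" 200 2326
192.0.2.10 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.jpg HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2023:13:55:38 +0000] "GET /logo.jpg HTTP/1.1" 200 2326
192.0.2.10 - - [10/Oct/2023:13:55:39 +0000] "GET /logo.jpg HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 512
EOF

# count_hits PATTERN LOGFILE (hypothetical helper wrapping the pipeline)
count_hits() {
    grep "$1" "$2" | cut -d' ' -f1 | sort | uniq -c | sort -rn
}

result=$(count_hits 'logo.jpg' "$log")
echo "$result"

# Clean up the temporary file.
rm -f "$log"
```

On a real server you would call it as count_hits 'somefile' ./access_log | more to page through the results.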
| Code: |
grep 'somefile' ./access_log | cut -d' ' -f1 | sort | uniq -c | sort -r | more
The output would look like this ...
3096 127.67.140.157
84 127.253.32.44
72 127.70.17.106
55 127.247.106.104
46 127.223.115.155
41 127.70.54.129
|
Looks like 127.67.140.157 is up to no good, maybe trying to cripple my server, or a web crawler run amok. It'll make a nice addition to the server's firewall, under the deny list.
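As a rough sketch of that last step, the snippet below picks out every address above a hit threshold and prints a deny rule for it. The threshold of 1000 is an arbitrary assumption, and iptables is only one possible firewall; the rules are printed rather than executed, so you can review them and adapt them to whatever your server actually uses.

```shell
#!/bin/sh
# Hypothetical threshold: flag any address with more than this many hits.
threshold=1000

# Counts in the format the pipeline produces (first lines of the sample output).
counts='   3096 127.67.140.157
     84 127.253.32.44
     72 127.70.17.106'

# Print (do not run) one iptables DROP rule per address over the threshold.
deny_rules=$(printf '%s\n' "$counts" |
    awk -v t="$threshold" '$1 > t { print "iptables -A INPUT -s " $2 " -j DROP" }')

echo "$deny_rules"
```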