TOPIC: Quick web log analysis with grep and friends
#474
Dewed (User)
Quick web log analysis with grep and friends, 3 months ago
Chances are your website has a single file that is included on every page, like logo.jpg or something similar. You can use that filename to grep your Apache access logs, and with a few more commands turn it into meaningful statistics.

Code:

grep 'somefile' ./access_log | cut -d' ' -f1 | sort | uniq -c | sort -rn | more

First, grep searches your access log for the file you specify and pipes the matching lines to the cut command. The ( -d' ' ) argument makes cut use a space as the delimiter, and ( -f1 ) keeps only the first column. Unless you have tweaked your access log layout, the first column will be the IP addresses of your visitors.

That output is piped to the sort command, which puts identical IP addresses next to each other. It sorts them as text rather than numerically, but that is all the next step needs, because uniq only compares adjacent lines. With the ( -c ) argument, uniq collapses each run of duplicate IP entries into one line and prepends the number of identical entries it found.

Then that output is piped to another sort command, this one with the ( -rn ) arguments: ( -n ) compares the counts as numbers and ( -r ) reverses the sort, highest to lowest. A plain ( -r ) compares the lines as text, which is not guaranteed to rank the counts correctly. Finally, all that is piped to the more command so you can page through the results.

All of that may sound complicated and CPU intensive, but it is lightning fast, since all the commands are small C programs and the shell runs them side by side, passing data through pipes instead of intermediate files. You'll be amazed at how fast it is and how well it deals with large log files.
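If you want to watch the stages work before pointing them at a real log, here is a minimal sketch on a made-up log. The sample lines, the documentation-range IP addresses and the logo.jpg marker file are all assumptions for illustration, and the log is assumed to be in Apache's Common Log Format.

```shell
#!/usr/bin/env bash
# Build a tiny, made-up access log (Common Log Format) to try the pipeline on.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [15/Aug/2008:09:00:01 +0000] "GET /logo.jpg HTTP/1.1" 200 5120
192.0.2.10 - - [15/Aug/2008:09:00:02 +0000] "GET /index.html HTTP/1.1" 200 1043
198.51.100.7 - - [15/Aug/2008:09:00:03 +0000] "GET /logo.jpg HTTP/1.1" 200 5120
192.0.2.10 - - [15/Aug/2008:09:00:04 +0000] "GET /logo.jpg HTTP/1.1" 304 0
192.0.2.10 - - [15/Aug/2008:09:00:05 +0000] "GET /logo.jpg HTTP/1.1" 304 0
203.0.113.5 - - [15/Aug/2008:09:00:06 +0000] "GET /logo.jpg HTTP/1.1" 200 5120
198.51.100.7 - - [15/Aug/2008:09:00:07 +0000] "GET /logo.jpg HTTP/1.1" 304 0
EOF

# Stage 1: keep only requests for the marker file, then only the first column.
hits=$(grep 'logo.jpg' "$log" | cut -d' ' -f1)

# Stage 2: group identical IPs, count each group, rank by count (numeric, descending).
report=$(printf '%s\n' "$hits" | sort | uniq -c | sort -rn)

printf '%s\n' "$report"
rm -f "$log"
```

The request for index.html is dropped by grep, so each remaining line stands for one hit on the marker file, and the report lists one line per IP with its hit count in front. Echo $hits on its own to see what cut leaves behind before the counting starts.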
The output of that pipeline would look like this ...

Code:

   3096 127.67.140.157
     84 127.253.32.44
     72 127.70.17.106
     55 127.247.106.104
     46 127.223.115.155
     41 127.70.54.129
Looks like 127.67.140.157 is up to no good, maybe trying to cripple my server, or a web crawler run amok. It'll make a nice addition to the server's firewall, under the deny list.
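To turn that observation into something you can act on, here is a rough sketch that picks out the heavy hitters and prints a deny rule for each. The post does not say which firewall is in use, so iptables on Linux is an assumption, and both the deny_rules helper and the limit of 1000 hits are hypothetical. It only prints the commands, so you can review them before running anything as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "count ip" lines (the uniq -c output) on stdin and
# print one iptables command for every IP whose count exceeds the given limit.
# It only prints the commands; nothing is changed on the firewall.
deny_rules() {
    local limit=$1
    awk -v limit="$limit" '$1 > limit { print "iptables -A INPUT -s " $2 " -j DROP" }'
}

# Counts copied from the sample output in the post.
sample='   3096 127.67.140.157
     84 127.253.32.44
     72 127.70.17.106'

# Assumed cut-off: anything above 1000 hits is treated as suspicious.
rules=$(printf '%s\n' "$sample" | deny_rules 1000)
printf '%s\n' "$rules"
```

On a live log you would pipe the counting pipeline straight into the helper, without the more at the end, and pick a limit that fits your normal traffic.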
 
Last Edit: 2008/08/15 09:56 By Dewed.
 
Copyright Dew-Code 2008