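Putting a site online is only the first step: once it is running, monitoring both the access log and the error log is essential. Access log analyzers are fairly popular and widely known, but error logs get little attention and few tools for analyzing them are worth mentioning. In this post I share ScanErrLog, an excellent error log analyzer that I discovered and have been using for half a year.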
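In my previous post, Common Apache Error Logs, I described some frequently seen Apache error log entries. I argued that error logs should be tracked continuously, so that bugs in the site's code can be fixed promptly and potential threats spotted early. ScanErrLog is the tool I use to keep an eye on my sites' error logs day to day.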
First Impressions
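I monitor my sites with a separate shell script, run on a schedule by cron, that emails the error log report to me. Here are two screenshots to give you a first visual impression of ScanErrLog.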
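The author's own shell script is not reproduced in this post. The Python sketch below only illustrates the idea under stated assumptions. It assumes scanerrlog.py is installed and on PATH, and it uses the documented -f text option to get a plain-text report. The report is then wrapped in an email with the standard library's EmailMessage. The helper names make_text_report and build_report_email, the subject format and the addresses are hypothetical.

```python
import subprocess
from email.message import EmailMessage


def make_text_report(logfile: str) -> str:
    """Run ScanErrLog on one log file and return its plain-text report.

    Assumes scanerrlog.py is on PATH; "-f text" selects the text format.
    """
    result = subprocess.run(
        ["scanerrlog.py", "-f", "text", logfile],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ScanErrLog fails
    )
    return result.stdout


def build_report_email(report: str, sender: str, recipient: str,
                       site: str) -> EmailMessage:
    """Wrap a report in an email message (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{site}] Apache error log report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(report)  # plain-text body
    return msg
```

A cron job could run a script built from these two functions once a day. Assuming a local mail server is available, the script could then send the result with smtplib.SMTP("localhost").send_message(msg).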
How I Found It
In the Server Fault question "Apache Error Log Analyzer — Which is Best?", the asker wrote:
I have 3 log analyzer tools pre-installed on my server. In your opinion, which of the 3 analyzer tools do you find best?
My Objective: basically to analyze the error log file.
Software Installed:
* Analog
* Awstats
* Webalizer
One user recommended ScanErrLog:
The three log analysers you list (Analog, Awstats & Webalizer) don’t do much with Apache Error logs.
I’ve used ScanErrLog to summarise the Apache error_log file for several years now. I run it from Cron once or twice a day, and it remembers where it finished, to be able to pick up and add to the output. Usually, I have it produce a HTML page with counts and URLs of problems. I can produce other formats though.
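"It remembers where it finished" corresponds to ScanErrLog's -c/--continue option. According to the help text, this option saves the file's state and statistics in ScanErrLog.stats so the beginning of the file is not re-parsed on the next run. The sketch below shows one way such incremental parsing can work. It is an assumption-based illustration, not ScanErrLog's implementation. The function name, the JSON state file and its layout are hypothetical. The idea is to store the byte offset reached and the cumulative counts, then resume from that offset next time.

```python
import json
import os
from collections import Counter
from typing import Callable


def scan_incrementally(logfile: str, statefile: str,
                       key: Callable[[str], str] = lambda s: s) -> Counter:
    """Add lines appended to logfile since the last call to the saved counts.

    Hypothetical illustration of "remember where it finished": the byte
    offset reached so far and the cumulative counts are kept as JSON in
    statefile. This is NOT the format of ScanErrLog's ScanErrLog.stats.
    """
    offset, counts = 0, Counter()
    if os.path.exists(statefile):
        with open(statefile, encoding="utf-8") as f:
            state = json.load(f)
        offset, counts = state["offset"], Counter(state["counts"])

    # A log smaller than the saved offset has probably been rotated:
    # start again from the beginning with fresh counts.
    if os.path.getsize(logfile) < offset:
        offset, counts = 0, Counter()

    with open(logfile, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete last line: leave it for the next run
            offset += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if line:
                counts[key(line)] += 1

    with open(statefile, "w", encoding="utf-8") as f:
        json.dump({"offset": offset, "counts": counts}, f)
    return counts
```

The key parameter decides what counts as "the same" entry. A real analyzer would pass a function that strips the timestamp and keeps only the message, because whole lines are almost always unique.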
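Background
I then looked into ScanErrLog. The software has not been updated since 2002, but I have to say it is exactly what I was looking for, and it works very well. Freecode describes it as follows: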
ScanErrLog is a Python module that allows you to parse Apache error_log files and present their data in decreasing order of occurences of error messages. This is particularly useful if you want to quickly solve the most annoying problems Web visitors encounter on your site. You can use it directly from the command line, import it into another Python program and use the classes it defines, or use it as a CGI script. You can produce reports in HTML, PDF, XML, or Plain Text formats.
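None of the author jalet's projects has been updated since 2007 at the latest, so further updates to ScanErrLog look unlikely. In other words, if something goes wrong, we will have to fix it ourselves.
ScanErrLog is released on its home page, ScanErrLog, where the author describes it as follows: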
- ScanErrLog is both a command line tool, a CGI script, and a Python module which processes the Apache web server's error log files and generates statistical reports about the different errors encountered when serving web pages. It can produce reports in plain text, XML, HTML or PDF formats.
- ScanErrLog is used by web hosting providers worldwide to help their clients focus on the bugs they may have on their web sites. This gives an added value to their services, making their hosting solutions less error-prone, and improves the web experience of both their clients and all the web surfers who visit the sites they host.
- ScanErrLog is part of the official Debian GNU/Linux distribution.
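The core idea is to count error messages and list them in decreasing order of occurrences. It can be sketched in a few lines of Python. This is not ScanErrLog's code. The sketch reuses the default -p pattern quoted in the help output later in this post, treats everything after the match as the message, and counts with collections.Counter. The function name count_messages is made up for illustration.

```python
import re
from collections import Counter

# ScanErrLog's default pattern as printed by `scanerrlog.py -h`. It consumes
# "[date] [severity] " plus one optional "[...] " field; whatever follows the
# match is treated as the error message.
DEFAULT_PATTERN = re.compile(
    r"^(httpd: |\B)\[([^\[\]]+)\] \[([^\[\]]+)\] (?:\[([^\[\]]+)\] )?"
)


def count_messages(lines):
    """Return (count, message) pairs in decreasing order of occurrences."""
    counts = Counter()
    for line in lines:
        m = DEFAULT_PATTERN.match(line)
        if m is None:
            continue  # not an Apache-logged line (e.g. CGI script output)
        counts[line[m.end():].rstrip("\n")] += 1
    return [(n, msg) for msg, n in counts.most_common()]
```

Consider an Apache 2.4 line such as [Fri May 10 09:32:09.123456 2019] [authz_core:error] [pid 1234] [client 192.0.2.10:37894] AH01630: .... Here the optional third field is used up by [pid 1234], so [client ip:port] remains part of the message. The same error coming from two different clients is therefore counted as two messages. This is consistent with the client addresses visible in the text report shown later in this post.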
Installation
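ScanErrLog is a Python script and is not available on PyPI, so it has to be downloaded and installed by hand. It has one dependency, jaxml, which is published by the same author. The installation steps are as follows: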
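# jaxml can be installed with pip
pip install jaxml
# Download the tarball
wget http://www.librelogiciel.com/software/ScanErrLog/tarballs/scanerrlog-2.01.tar.gz
# Extract it
tar xvzf scanerrlog-2.01.tar.gz
# Enter the extracted directory
cd scanerrlog-2.01/
# Fix a bug: because of a Python version compatibility issue, this version of ScanErrLog raises an error. Changing a single line is enough: edit line 815 so that it reads as follows (keep the line's existing indentation):
(year, month, day, hour, minute, second, weekday, jday, dst) = time.strptime(string.join(string.split(line[datebeg:dateend])), "%a %b %d %H:%M:%S.%f %Y")
# Install
python setup.py install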
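The fixed line adds .%f (fractional seconds) to the date format. Apache 2.4 writes error log timestamps with microseconds, for example [Fri Sep 09 10:42:29.902022 2011]. This suggests the failure most likely comes from that timestamp format: a format without %f cannot match such a timestamp. The short Python 3 check below is not part of ScanErrLog. It assumes the default C/English locale for %a and %b, and it parses a made-up timestamp with both formats. Also note that string.join and string.split in the fixed line are Python 2-only functions, so this ScanErrLog version has to be installed and run with Python 2.

```python
import time

# Made-up timestamp in the Apache 2.4 error log style (with microseconds).
STAMP = "Fri May 10 09:32:09.123456 2019"

OLD_FORMAT = "%a %b %d %H:%M:%S %Y"     # no fractional seconds
NEW_FORMAT = "%a %b %d %H:%M:%S.%f %Y"  # the format used in the fixed line


def try_parse(stamp, fmt):
    """Return a time.struct_time, or None if stamp does not match fmt."""
    try:
        return time.strptime(stamp, fmt)
    except ValueError:
        return None


old = try_parse(STAMP, OLD_FORMAT)  # None: ".123456" is unexpected
new = try_parse(STAMP, NEW_FORMAT)  # parses; the microseconds are discarded
# A struct_time unpacks into the same nine names as the fixed line.
(year, month, day, hour, minute, second, weekday, jday, dst) = new
print(old, year, month, day, hour, minute, second, weekday, jday)
# prints: None 2019 5 10 9 32 9 4 130
```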
Usage
Basic usage: scanerrlog.py [options] [inputfile1 inputfile2 ...]
For example, to analyze the current error log file and print the result to the terminal in text format:
qiushan@topvps:~$ sudo scanerrlog.py -f text /var/log/apache2/topvps_error.log
ScanErrLog v2.01 Report
Fri May 10 09:32:09 2019
1 => [client 148.251.22.75
1 => 37894] AH01630: client denied by server configuration: /var/www/html/topvps/robots.txt
1 => [client 123.145.9.108
1 => 36067] AH01630: client denied by server configuration: /var/www/html/topvps/, referer: http://www.vps123.top/
Skipped 0 unwanted lines (0%).
# For comparison, here is the result of analyzing the same log file with the popular sed approach:
qiushan@topvps:~$ sudo sed 's/\[.*\]//g' /var/log/apache2/topvps_error.log | sed 's/, referer: .*//g' | sort | uniq -c | sort -n
1 AH01630: client denied by server configuration: /var/www/html/topvps/
1 AH01630: client denied by server configuration: /var/www/html/topvps/robots.txt
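The sed pipeline above can be mirrored in Python to make each step explicit. First, strip everything from the first [ to the last ] (like the greedy \[.*\]). Next, drop the trailing ", referer: ..." part. Finally, count identical messages and list them in ascending order of count, roughly as sort | uniq -c | sort -n does. This is only an illustration; the function name summarize is made up.

```python
import re
from collections import Counter

BRACKETS = re.compile(r"\[.*\]")        # greedy, like sed's \[.*\]
REFERER = re.compile(r", referer: .*")  # trailing ", referer: ..." part


def summarize(lines):
    """Rough Python equivalent of the sed | sort | uniq -c | sort -n pipeline."""
    counts = Counter()
    for line in lines:
        msg = BRACKETS.sub("", line.rstrip("\n"))  # drop [date] ... [client]
        msg = REFERER.sub("", msg)                  # drop the referer
        counts[msg.strip()] += 1                    # uniq -c
    # sort -n: ascending count; equal counts are ordered by message text
    return sorted((n, msg) for msg, n in counts.items())
```

The greedy bracket removal is what lets identical errors from different clients collapse into one line. The price is that any [...] inside the message text itself is removed as well.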
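In a separate article I will describe how I combine ScanErrLog with other scripts and a scheduled task to send a daily digest. For that reason I give only one simple example here; for other usage, see the help text that the script prints.
To print the help, run scanerrlog.py -h: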
qiushan@topvps:~$ scanerrlog.py -h
ScanErrLog v2.01 (C) 2000 Free Software Foundation
This Python module allows people to parse Apache error_log files from
one of different possible sources (filename, stdin, python file object),
and present their datas in decreasing number of occurences of error
messages.
This is particularly useful if you want to quickly solve the most
annoying problems web surfers encounter visiting your site.
If you run this module directly, it will parse each file which name was
passed on the command line.
If you don't pass any argument on the command line, then scanerrlog will
read an error_log from stdin if you've piped some file or command to its
standard input, or it will print its documentation if you've not.
You can also use it as a CGI script, but you'll not be able to
modify the pattern and outputfile used, and the input filename
should not begin with / or contain .. in its name, all for
security reasons. The names you may use for your CGI variables
are: continue, date, withoutheader, title, limit, exclude, format and
inputfile.
if continue, date or withoutheader exist in your form, these options
will be set to TRUE whatever value they have. See ScanErrLog.html for
a sample form to launch ScanErrLog as a CGI script.
e.g.:
  ./scanerrlog.py
      prints scanerrlog's documentation (what you are reading now)
  ./scanerrlog.py /var/log/httpd/error_log /var/log/httpd/error_log.1
      will read datas from the specified files.
  ./scanerrlog </var/log/httpd/error_log
      will read datas from standard input
You can pass some options on the command line:
options:
  -c | --continue        useful if you want to parse the same file
                         many times (e.g. every week): the current
                         state and statistics of the file are saved
                         in a file named ScanErrLog.stats in the
                         same directory, so you don't have to reparse
                         the beginning of the file each time. You
                         should use this option either to tell
                         ScanErrLog to save the statistics or to reuse
                         the saved ones.
                         Without this option the file is completely
                         parsed again, even if you've got an old
                         statistics file saved in the same directory.
                         WARNING: this option is incompatible with
                         the parsing of multiple files.
  -d | --date            include in the final report the date when
                         each message appeared for the last time.
                         this option is mutually exclusive with
                         the --pattern option.
  -e | --exclude e       e is a slash separated list of
                         messages severity. All messages with
                         a severity listed in e are excluded
                         from the final report. By default all
                         messages are included. For example,
                         e can be: info/debug to exclude all
                         messages which severity is info or
                         debug.
  -f | --format f        output format for the report, f can be
                         any of:
                         'xml', 'text', 'html', 'pdf'
                         the default format is 'html'.
  -h | --help            displays this help screen.
  -l | --limit lim       selects messages only if their number of
                         occurences equals or exceeds lim.
                         lim's default value is 1, meaning all
                         messages are included in the final report.
  -n | --nocumulate      don't cumulate counts for all the files
                         passed on the command line. the old
                         -c | --cumulate option is now the default.
                         if the following option -o is not used,
                         then -n implies -w because all reports
                         will be in the same file (stdout).
  -o | --outputfile f    save the report in the file f.
                         if -n is used, then the filename will
                         be n.f where n is an integer incremented
                         for each new file and starting at 1.
  -p | --pattern regexp  select only the lines which match regexp.
                         the default regexp is:
                         ^(httpd: |\B)\[([^\[\]]+)\] \[([^\[\]]+)\] (?:\[([^\[\]]+)\] )?
                         which selects all Apache logged messages,
                         but not errors from CGI scripts for example.
                         to work correctly, your regexp should consume
                         all characters from the beginning of the
                         error line up to the beginning of the real
                         error message.
                         this option is mutually exclusive with
                         the --date option.
  -t | --title t         sets the report title.
  -v | --version         displays ScanErrLog's version number.
  -w | --withoutheader   suppress the header of the HTML report.
                         useful if you want to include the report
                         directly into another HTML document.
Warning: some options may not work with all report formats.
A fifth possibility is to import this module into another python
program and use the ApacheErrorLog class it defines.
ScanErrLog comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it under
certain conditions; refer to the Gnu General Public License for details.
You'll find the GNU GPL in the file COPYING which should came along
with this software or at http://www.gnu.org
Please e-mail bugs to: [email protected] (Jerome Alet)
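The -p, -e and -l options are easiest to understand on a concrete example. The sketch below imitates their documented behavior on a list of lines. It is an illustration of the help text, not ScanErrLog's implementation. CLIENT_PATTERN is a hypothetical Apache 2.4 pattern that also consumes the [client ...] field, so the same error from different visitors is counted together. As the help says, a custom regexp must consume everything up to the start of the real message. Apache 2.4 writes the severity as [module:level]. Taking the part after the colon is an assumption of this sketch; the help does not say how ScanErrLog 2.01 compares such severities.

```python
import re
from collections import Counter

# Hypothetical pattern for Apache 2.4 lines: [date] [module:level] [pid ...]
# optionally followed by [client ip:port]; everything after it is the message.
CLIENT_PATTERN = re.compile(
    r"^\[(?P<date>[^\[\]]+)\] \[(?P<sev>[^\[\]]+)\] "
    r"\[pid [^\[\]]+\] (?:\[client [^\[\]]+\] )?"
)


def report(lines, exclude="", limit=1):
    """Count messages the way -p, -e and -l are described in the help.

    exclude is a slash-separated list of severities such as "info/debug";
    messages seen fewer than `limit` times are dropped. Returns
    (count, message) pairs, most frequent first.
    """
    excluded = {s for s in exclude.split("/") if s}
    counts = Counter()
    for line in lines:
        m = CLIENT_PATTERN.match(line)
        if m is None:
            continue  # not an Apache 2.4-style error entry
        # "[authz_core:error]" -> "error" (assumption of this sketch)
        level = m.group("sev").rsplit(":", 1)[-1]
        if level in excluded:
            continue
        counts[line[m.end():].rstrip("\n")] += 1
    return [(n, msg) for msg, n in counts.most_common() if n >= limit]
```

On the command line, the documented options would be combined as, for example, scanerrlog.py -f text -e info/debug -l 2 /var/log/apache2/topvps_error.log. Because the help does not describe how -e treats Apache 2.4's module:level severities, it is worth checking the report before relying on the exclusion.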
-- EOF --
Last modified: 2019-05-10