Monitoring a mail service is not as easy as monitoring a web service; even outgoing SMTP is more difficult to monitor. Below we summarize some service problems and how to monitor them:
On a Postfix server, we can use this script, which warns us when the mail queue exceeds a threshold:
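# Save the queue lines that contain an address and count them once.
mailq | grep "@" > /tmp/queue.txt
count=$(wc -l < /tmp/queue.txt)
# Alert by mobile and by mail when more than 100 lines are queued.
if test "$count" -gt 100 ; then
/usr/local/bin/mobilealert TOO_MANY_MAILS_ON_QUEUE ;
mail -s "TOO_MANY_MAILS_ON_QUEUE_$count" admin@company.com < /tmp/queue.txt ;
fi;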
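Counting the lines that contain "@" typically counts sender and recipient lines alike, so the figure is larger than the number of queued messages. As a minimal sketch, assuming mailq ends its output with a summary line such as "-- 23 Kbytes in 5 Requests." (the usual Postfix format, but worth checking on your version), the message count could be read from that line instead. The function name queue_requests is hypothetical and only used for illustration.

```shell
#!/bin/sh
# queue_requests: reads mailq output on stdin and prints the number of
# queued messages taken from the final summary line.
# Assumption: the summary looks like "-- 23 Kbytes in 5 Requests."
# If no such line is found (for example "Mail queue is empty"), prints 0.
queue_requests() {
    awk '/^-- .* Requests?\.$/ { n = $(NF-1) } END { print n + 0 }'
}
```

With this helper, the check could become: test "$(mailq | queue_requests)" -gt 100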
Examples of bounce causes on the remote side:
-Account doesn’t exist.
-Quota exceeded
-Domain not accepted on remote server (misconfigured)
-Other
Examples of bounce causes on our side:
-Our server IP is on a blacklist.
-Content is being rejected by spam or malware content filters.
This is difficult to monitor, because each server describes the error in its own free-text message. Our approach is to count the bounce entries in the log after eliminating as many remote-side causes as possible.
For a Postfix log, this can be done with the following script:
# Keep the 550 rejections that do not match a known remote-side cause.
grep " 550 " /var/log/mail.log |\
grep -v -i "mailbox unavailable"|grep -i -v "Invalid recipient"|\
grep -i -v "does not exist"|grep -i -v "invalid address"|grep -v -i quota |grep -v -i Unknown|\
grep -v -i "Address rejected"| \
grep -v -i "invalid user"| \
grep -v -i "Mailbox disabled"| \
grep -v -i "relay not permitted"| \
grep -v -i "Account disabled"| \
grep -v -i "Invalid local address"| \
grep -v "no mailbox"|grep -v "recipient rejected"> /tmp/bounced.txt
count=$(wc -l < /tmp/bounced.txt)
# Mail the remaining entries whenever there are any.
if test "$count" -ne 0 ; then
mail -s "BOUNCED_MAILS_$count" admin@company.com < /tmp/bounced.txt ;
fi ;
# Escalate to the mobile alert above 20 entries.
if test "$count" -gt 20 ; then
/usr/local/bin/msg2mobile TOO_MANY_BOUNCES ;
fi;
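The chain of grep -v commands above tends to grow as new remote-side messages show up. As a sketch of one possible alternative, the exclusions could live in a text file, one fixed string per line, and be applied in a single pass with the standard grep options -F (fixed strings) and -f (patterns from a file). The function name filter_bounces and both file arguments are hypothetical.

```shell
#!/bin/sh
# filter_bounces LOG PATTERNS
# Prints the " 550 " lines of LOG that match none of the fixed strings
# listed in PATTERNS (one per line, compared case-insensitively).
# Caution: a blank line in PATTERNS matches every log line, so the
# result would be empty; keep the pattern file free of blank lines.
filter_bounces() {
    grep " 550 " "$1" | grep -i -v -F -f "$2"
}
```

For example, filter_bounces /var/log/mail.log /etc/bounce_exclusions.txt > /tmp/bounced.txt would replace the pipeline, where /etc/bounce_exclusions.txt is an assumed location holding lines such as "mailbox unavailable" and "quota".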
As explained, this approach is not perfect, because local problems cannot be reliably distinguished from remote ones. It is very important to adjust the thresholds according to the server load.
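One way to make a threshold follow the server load is to compare bounces with the total number of deliveries instead of using a fixed count. The sketch below is an illustration rather than part of the scripts above; it assumes the log contains the Postfix delivery markers status=sent and status=bounced, and the function name bounce_percent is hypothetical.

```shell
#!/bin/sh
# bounce_percent LOG
# Prints the share of bounced deliveries in LOG as a whole-number
# percentage of (sent + bounced). Prints 0 when there are no deliveries.
bounce_percent() {
    awk '/status=sent/    { sent++ }
         /status=bounced/ { bounced++ }
         END {
             total = sent + bounced
             if (total == 0) print 0
             else print int(100 * bounced / total)
         }' "$1"
}
```

An alert could then fire when, say, bounce_percent /var/log/mail.log returns more than 5; the value 5 is only a placeholder to be tuned per server.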