UBEBlock Training Notes

Whitelist Bypass

If you enable the option that bypasses filtering for whitelisted addresses, mail from those addresses skips all of the analysis, which considerably reduces the processing required. Whitelisted mail is normally delivered without filtering, so this default is correct. Clearing the option means whitelisted mail is analysed like any other message, so it may be bounced or deleted depending on the rules you define. Take care when adjusting the whitelist settings.
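The effect of the bypass option can be pictured as a short-circuit placed in front of the analysis. The sketch below is a minimal Python illustration under stated assumptions, not UBEBlock's implementation: `Message`, `FilterConfig`, `bypass_whitelist`, `score` and `classify` are hypothetical names, and `score` is only a placeholder for the real statistical analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    body: str


@dataclass
class FilterConfig:
    # Hypothetical settings for illustration; not UBEBlock's real option names.
    whitelist: set[str] = field(default_factory=set)  # lower-case addresses
    bypass_whitelist: bool = True  # the default recommended in these notes
    threshold: float = 0.9


def score(message: Message) -> float:
    """Placeholder for the full statistical analysis (the expensive step)."""
    spam_words = {"free", "winner", "pills"}
    words = message.body.lower().split()
    if not words:
        return 0.0
    return sum(w in spam_words for w in words) / len(words)


def classify(message: Message, config: FilterConfig) -> str:
    whitelisted = message.sender.lower() in config.whitelist
    if config.bypass_whitelist and whitelisted:
        # Short-circuit: no analysis is run, so no processing is spent.
        return "deliver"
    # Bypass cleared (or sender not whitelisted): the normal rules apply,
    # so even a whitelisted sender can be classified as spam.
    return "spam" if score(message) >= config.threshold else "deliver"


cfg = FilterConfig(whitelist={"boss@example.com"})
msg = Message("Boss@example.com", "free winner")
print(classify(msg, cfg))  # deliver: analysis skipped
print(classify(msg, FilterConfig(whitelist={"boss@example.com"}, bypass_whitelist=False)))  # spam
```

With the default `bypass_whitelist=True`, a whitelisted sender never reaches `score`, which is where the processing saving comes from; with the option cleared, the same message is scored and can end up treated as spam.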

Automated Self Training

It is inadvisable to use the spam or RBL messages detected by UBEBlock to train UBEBlock. Automated training in general is not advised, for the following reasons:

  1. RBL listings are not proof of spam. Legitimate messages can arrive through a server that appears on an RBL. If you train just one of these as spam, it will ruin your training.

  2. Many spam messages are seeded with hundreds of innocent words that look legitimate (they are there to fool statistical filtering). Training such messages as spam seriously reduces the margin between the good and the bad. In the worst case the RBL spam swamps any attempt to train good messages, which makes good training practically impossible. Eventually all mail looks the same and you will have to delete the training and start again.

  3. Training a good message as spam by accident will undermine the whole training process and may result in your having to start again. So please be careful.
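Point 2 can be made concrete with a toy per-word probability. The Python sketch below assumes a simple Graham-style estimate (the fraction of spam messages containing a word compared with the fraction of good messages containing it); UBEBlock's actual weighting is not described in these notes, and `TokenStats` is a hypothetical name. It measures the margin between a spam word ("pills") and an innocent word ("meeting") before and after a single padded spam message is trained automatically.

```python
from collections import Counter


class TokenStats:
    """Toy per-word spam probability (an assumed Graham-style estimate)."""

    def __init__(self) -> None:
        self.spam = Counter()  # word -> number of spam messages containing it
        self.good = Counter()  # word -> number of good messages containing it
        self.n_spam = 0
        self.n_good = 0

    def train(self, text: str, is_spam: bool) -> None:
        words = set(text.lower().split())  # count each word once per message
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.good.update(words)
            self.n_good += 1

    def word_prob(self, word: str) -> float:
        # Share of spam messages vs share of good messages containing the word.
        s = self.spam[word] / max(self.n_spam, 1)
        g = self.good[word] / max(self.n_good, 1)
        if s + g == 0:
            return 0.5  # never seen: neutral
        return s / (s + g)


stats = TokenStats()
stats.train("meeting agenda for monday", is_spam=False)
stats.train("invoice attached for the meeting", is_spam=False)
stats.train("cheap pills buy now", is_spam=True)
margin_before = stats.word_prob("pills") - stats.word_prob("meeting")

# One spam message padded with innocent words, trained automatically.
stats.train("cheap pills buy now meeting agenda invoice monday", is_spam=True)
margin_after = stats.word_prob("pills") - stats.word_prob("meeting")
print(f"margin before: {margin_before:.2f}, after: {margin_after:.2f}")
# prints "margin before: 1.00, after: 0.67"
```

The spam word stays at 1.0, but the innocent word drifts from 0.0 towards the middle after a single padded message. Repeat that for every automatically trained message and the good and bad words converge.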

Bounce or Delete

In general it is better to bounce mail than to delete it. Nearly all spam comes from invalid addresses, and in such cases the bounce is simply discarded. If any real mail is bounced by mistake, the bounce reaches the real sender and tells them to try again. Bouncing mail, even when it is addressed to a valid mailbox, does not confirm to spammers that the address is real.

Who Should Train

All training should be done by someone who understands the nature of spam and has been briefed on the issues below.

  1. Always make sure that you train each message in the right category. Mixing up spam and not-spam training will result in very poor performance.

  2. Don’t train every message. Look at each message first: if it contains many real words, do not train it as spam, because that only makes it easier for the next one to get through.

  3. Use the unknown word weightings (UBEBlock Rating) to improve spam detection. It is easier to train UBEBlock with real messages and have it reject anything it does not recognise.
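The effect of an unknown-word weighting can be sketched by combining per-word probabilities with the naive-Bayes combination used by many statistical filters. This is an assumption for illustration: the scale and exact meaning of the UBEBlock Rating are not given in these notes, so `unknown_weight` is a hypothetical stand-in, `combined_spam_score` is a hypothetical helper, and the probabilities in `known` are made-up training results.

```python
import math


def combined_spam_score(words, known, unknown_weight=0.5):
    """Naive-Bayes combination P = prod(p) / (prod(p) + prod(1 - p)),
    computed in log space to avoid underflow on long messages.

    `known` maps trained words to their spam probability; any word not in
    it gets `unknown_weight` (a hypothetical stand-in for the UBEBlock Rating).
    """
    log_spam = 0.0
    log_good = 0.0
    for w in words:
        p = known.get(w, unknown_weight)
        p = min(max(p, 0.01), 0.99)  # clamp so no single word decides alone
        log_spam += math.log(p)
        log_good += math.log(1.0 - p)
    d = log_spam - log_good  # log of prod(p) / prod(1 - p)
    # Numerically stable logistic: P = 1 / (1 + exp(-d))
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Made-up probabilities standing in for a filter trained on real mail.
known = {"meeting": 0.05, "agenda": 0.1, "monday": 0.2, "pills": 0.95}
good_msg = "meeting agenda monday".split()
odd_msg = "xq7 zorblax unsubscribe".split()

for weight in (0.5, 0.8):
    print(weight,
          round(combined_spam_score(good_msg, known, weight), 3),
          round(combined_spam_score(odd_msg, known, weight), 3))
```

With a neutral weight of 0.5, a message made only of unrecognised words scores exactly 0.5; raising the weight pushes such messages towards spam, while a message built from words trained as good still scores close to 0. This is why training on real messages and rejecting what is not recognised can work well.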