Okay, for those of you that read my blog, the number of which usually astonishes me to be above single digits, I thought you might be interested in getting a personal take on the mail migration project I'm working on for the CIS department.
So, back to the beginning. When I first started with CIS I wasn't actually that pleased to be running mail services. Mail is one of those services that has to just work, but also requires more babysitting than would seem necessary. Let me use a diagram to demonstrate why:

Now, this is vastly simplified, but it gets the gist of what happens when a foreign user sends an email to an account in CIS. I start with the first step of the process that involves us (i.e., I don't really care what kind of setup the other guy has ;) Anyway, the first step is for their server to execute a DNS lookup to find the MX (Mail eXchange) record for "cis.ksu.edu" since that's the last part of the email address. You can do the same on any of our Linux boxen (or just about any Unix box, for that matter) via:
$ host -t MX cis.ksu.edu
cis.ksu.edu mail is handled by 10 mustang.cis.ksu.edu.
That is, they contact their local DNS server, which goes through the whole complicated process of finding our DNS server and then asks it for the MX record. Our server responds with mustang, which is the "real" name of the machine we call "smtp.cis.ksu.edu." The 10 is the priority of the mail server; if we had multiple mail servers, each would be listed with its own priority, and senders would try the lowest number first.
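To make the priority idea concrete, here is a minimal Python sketch that parses output in the format `host -t MX` prints and orders the servers the way a sender would try them. The second record (backup.example.edu) is made up purely to show the sorting; we only have the one MX. The function name is my own invention, not part of any tool.

```python
def parse_mx_lines(output):
    """Return (priority, host) pairs from `host -t MX` output, best first."""
    marker = " mail is handled by "
    records = []
    for line in output.splitlines():
        if marker not in line:
            continue  # skip anything that is not an MX answer line
        priority, host = line.split(marker, 1)[1].split()
        records.append((int(priority), host.rstrip(".")))
    # Lower priority numbers are tried first.
    return sorted(records)


sample = (
    "cis.ksu.edu mail is handled by 20 backup.example.edu.\n"  # hypothetical
    "cis.ksu.edu mail is handled by 10 mustang.cis.ksu.edu.\n"
)
ordered = parse_mx_lines(sample)
print(ordered)
```

With the sample above, mustang comes out first because 10 is lower than 20.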
Once the foreign host knows the mail server, it initiates a direct connection to that mail server via a communication protocol called SMTP. We use a program called Sendmail to provide a server that speaks this protocol. Among other checks, Sendmail makes sure that the email being delivered belongs to a local user. To do this, it contacts our authentication server (through another convoluted process) to see if the user exists and performs delivery if she or he does.
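The "does this user exist" check can be pictured with a toy Python function. This is only a sketch of the decision, not how Sendmail actually implements it: the function name, the reply texts, and the `known_users` set (standing in for the authentication server lookup) are all assumptions for illustration. The numeric codes are the usual SMTP ones, 250 for success and 550 for a rejected recipient.

```python
def rcpt_reply(address, local_domain, known_users):
    """Toy version of the decision a server makes for one recipient address."""
    user, _, domain = address.rpartition("@")
    if domain.lower() != local_domain:
        return 550, "relaying denied"   # not addressed to our domain
    if user.lower() not in known_users:
        return 550, "user unknown"      # no such local account
    return 250, "recipient ok"          # accept and hand off for delivery


# Hypothetical account list; the real lookup goes to the authentication server.
users = {"alice"}
print(rcpt_reply("alice@cis.ksu.edu", "cis.ksu.edu", users))
print(rcpt_reply("nobody@cis.ksu.edu", "cis.ksu.edu", users))
```

Note that if the lookup itself cannot be performed, neither answer can be trusted, which is exactly why the authentication server matters so much to mail.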
Mail delivery continues when Sendmail starts another program called Procmail. In the typical case, this program runs another program called SpamAssassin to decide whether the message is spam or not. Then, Procmail either finds the user's mailbox (a directory named "Maildir" in the user's home directory) or executes a script named ".procmailrc" and performs the actions requested by that script. Again, I'm simplifying for brevity.
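Here is a rough Python sketch of that last fork in the road, using the standard library's `mailbox` module to do a Maildir delivery. It is not Procmail: the spam check is left out, a ".procmailrc" is only detected and not interpreted, and the `deliver` function and the temporary "home directory" are stand-ins I made up for the example.

```python
import mailbox
import os
import tempfile


def deliver(home, message_text):
    """Pick a delivery path the way described above, then deliver."""
    if os.path.exists(os.path.join(home, ".procmailrc")):
        # A real Procmail would now follow the user's recipes; not modeled here.
        return "procmailrc"
    # Default case: drop the message into ~/Maildir.
    box = mailbox.Maildir(os.path.join(home, "Maildir"), create=True)
    box.add(message_text)
    return "maildir"


home = tempfile.mkdtemp()  # stand-in for a user's home directory
result = deliver(home, "From: a@example.com\nSubject: hi\n\nHello\n")
new_mail = os.listdir(os.path.join(home, "Maildir", "new"))
print(result, len(new_mail))
```

A Maildir keeps each message as its own file, and freshly delivered mail lands in the "new" subdirectory, which is where the sketch looks afterwards.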
Procmail writes to the file server (using the NFS protocol) to finally deliver your mail to your mailbox. For you to get your mail, you start up your favorite mail program (be it Microsoft Outlook, Mozilla Thunderbird, Netscape Mail, Pine, or Mutt). Your mail program logs into the IMAP server (though Pine and Mutt can read the mail straight out of your home directory, but we'll not cover that case, again, brevity!), which contacts the Active Directory server with your given username and password (did I mention this is convoluted?) to verify your identity. Then the IMAP server uses NFS to read the contents of your mailboxes from the file server.
The problem is that if any one of these systems (or their subsystems) fails, part or all of the process stops. If the DNS server goes down, gets misconfigured when we make configuration changes, or becomes unreachable because we lose network connectivity, email stops being delivered and possibly/probably you can't check/send email anymore either.
If the Sendmail server goes down, gets misconfigured (rarely), or the computer the server runs on gets too busy, mail stops being sent and stops being delivered. Other problems can occur when SSL keys expire, get accidentally changed during updates, or your local Anti-Virus program decides to interfere with communication.
If the Microsoft Windows Server running Active Directory goes down, becomes inaccessible, or a critical update fails, you can't log in and you can't send or check email because none of the machines are able to verify your identity.
If you modify your ".procmailrc" file and bugger it up, you won't have your mail delivered (or in the case of a mistake I made this week, all your mail could be delivered to the wrong place). Generally, we don't have to mess with Procmail often, so this isn't something we break very often.
SpamAssassin, on the other hand, has proven to be a pretty fragile program. We actually have it set to restart on an extremely frequent basis because it would occasionally crash. The checks it does are also expensive, so it puts a pretty heavy load on the mail server.
If some part of file services goes down, obviously, nothing works again.
I could also mention PAM, NSS, LDAP libraries, SASL libraries, libc libraries, OpenSSL, various hosts running various services, etc. If one goes down, half the system goes.
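To see how one failure ripples outward, here is a small Python sketch that models the pieces as a dependency map and works out what breaks when one of them goes down. The map is my own simplified reading of the description above (it ignores PAM, NSS, and friends, and the names are just labels), so treat it as an illustration and not an inventory of the real system.

```python
# Simplified dependency map based on the description in this post.
DEPENDS_ON = {
    "sendmail": ["dns", "active_directory"],
    "procmail": ["sendmail", "spamassassin", "file_server"],
    "imap": ["active_directory", "file_server"],
    "receive_mail": ["procmail"],
    "read_mail": ["imap"],
}


def broken_by(failed, depends_on=DEPENDS_ON):
    """Return every service that stops working when `failed` goes down."""
    broken = {failed}
    changed = True
    while changed:  # keep sweeping until no new casualties turn up
        changed = False
        for service, needs in depends_on.items():
            if service not in broken and broken.intersection(needs):
                broken.add(service)
                changed = True
    return broken - {failed}


print(sorted(broken_by("file_server")))
print(sorted(broken_by("active_directory")))
```

In this toy model, losing either the file server or Active Directory takes out both receiving and reading mail, which matches the experience described above.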
We could probably keep things up and running for a long time if we could keep ourselves from touching configuration files and installing updates all the time. However, that's assuming the scriptkiddies and the hackers would leave our boxen alone if we didn't keep them up to date, so we either risk breaking services ourselves in predictable and generally quickly reparable ways, or risk damage that requires complete rebuilds.
Furthermore, our user base is small enough that I don't have staff working in Nichols 24 hours a day. CNS has staff on call 24 hours a day and a night crew that spends part of their time monitoring systems. In fact, I'm really the only employee at this time who understands most of the inner workings of our mail system. It's not that it's extraordinarily complicated as mail systems go (it's actually pretty simple), but that even this simple mail system is more complicated than I would expect an hourly student to take the time to understand. CNS has had some notable downtime in the last year or so, but our downtime has been considerably worse.
The benefit of doing this for ourselves is the ability to be flexible. We can set our own policies that best suit our small department. On the other hand, we pay for that flexibility in greater amounts of downtime. As such, I'm for the move to CNS, despite the fact that there will be a number of inflexible policies that we will have to cope with, because it will mean better overall uptime. It will also mean that I have more time to work on improving the systems that are really more important to what CIS does, even if not as critical (if that makes sense).
This has been a fairly politically charged move as email is one of those things that has to just work. CIS faculty have some particularly difficult needs to accommodate in the way of mail quotas, attachment sizes, pre-filtering, spam protection, etc. CIS faculty are also very picky because most of them are pretty knowledgeable about how these things can and should work. Some of our faculty also have some pretty eccentric work environments configured and aren't interested in relearning to do things in a different environment. I won't argue whether these things are good or bad, that's not my business, but I can say that accommodating all of these disparate desires has been quite a task. (And I haven't even mentioned the students, but then, they come and go so
However, the major obstacles to this move have been overcome. There will still be parts of this that some will not like, including me. However, I am convinced that the benefits outweigh the problems.
I hope this "little" blog helps explain things a bit more clearly. Cheers.
