February 2005 Archives

Neat. ~Sterling has been rebuilt using the latest CMS to take the Open Source world by storm...er something. Okay, so the "secret" project I've been working on for years is finally doing something because I decided to stop being "elegante" and decided to JustMakeItWork(tm).

What have I done? Well, I wrote a little tool to help me manage the CIS Support Site for work. This little tool is a combination themer/indexer for static pages. It also does some on-the-fly generation of HTML from reStructuredText, which is what we write most of our docs in. It seemed pretty useful and is similar to the software I was using to run this site before October 2004, called Blosxom, which is a lightweight file-based blogger.

Anyway, when I had some trouble getting access to K-State Online for my course last semester, I decided to try and dump my course data into it on ~Sterling. With a few modifications it worked quite well and quickly supplanted any previous ideas I had about the content management systems I'd been toying with for the past several years.

Most of the work is already done by HTML::Mason. My system just took advantage of the features already present to add indexing and theming and generation of content from other formats. ~Sterling took it a bit further by adding the ability to generate even more complicated content (especially, ripping apart zipped Keynote "files" and using XSLT to generate HTML outlines).

At this point, I ran into a few issues:

  1. Adding new generators was requiring lots of custom code and my indexing code was becoming convoluted.
  2. New content had to be added with care, otherwise Mason would try to interpret files it had no business touching. When this happened, my indexer would basically bring the entire site to its knees with a single exception.
  3. Some content is just better stored in a database. Blog entries, news items, and simple records are just a few examples. The system had no way of coping with any of these.

Thus, since about the time I put Drupal on this site, I've been working on a replacement. Drupal is merely a temporary expedient. I started completely from scratch, but have dragged in a lot of the bits from the existing "knowledge base" system to build this new system, which I am dubbing Contentment (superseding all the predecessors I'd created and called this).

This system currently has a lot of unused features, but most of the good ones are already in use. One of the best I added just this week, and after only a few days it has practically remade the quality of the system. Specifically:

  1. It features a (largely unused, as yet) forms handler that can help design forms and wizards with a fairly small amount of effort. I borrowed a lot from the kinds of work that Everything has done in this area.
  2. It uses the SPOPS object-mapping system to provide a database API. It's not required that new plugins use this API, but all the existing database pieces use it.
  3. The system automatically provides for context, sessions, and logins. The user accounting system is completely pluggable, so new support for LDAP or other login types could be added with a little effort.
  4. The system provides a basic permissions system. All of these features have been designed to make adding database-based plugins possible, but there really aren't any yet.
  5. The major feature that has really made the system work despite the lack of any database plugins is the VFS system I've put together. I've debated whether or not this should be forked into a separate project, but I'm going to leave it where it is for now. Anyway, this enhances Mason's abilities by quite a bit and allows for a much more general way of looking at files. This way, Mason no longer has primary control of generating files, but passes that control off to other plugins.
  6. Right now the system works via CGI, but I'd like to put together a mod_perl front-end to take advantage of those features. I've designed everything to this point with mod_perl in mind, so it should work with minimal effort.

That's a pretty bad mish-mash summary of the features. There's a lot more I could say, but I'll save that for documentation. I'm going to admit that I've had a SourceForge project for this for eons, but that it'd never really worked until now. I'm so excited about how this is going now that I have registered Contentment.org and will be posting information and documentation there. I'm going to, for now, use the mailing lists, bug trackers, announcements, and CVS repository at SourceForge. (Though I'm hunting hard for a way to keep it in Subversion, as I strongly prefer it despite its performance and other issues.)

Anyway, I wanted to announce that and say that Drupal may be saying farewell to this site soon---if I can get the plugins written and translate all of my Blosxom and Drupal entries into my new plugins.

For the last few weeks, I've been working on my work web site, ~Sterling, to improve its capabilities. The purpose of this update is many-fold. Read on if you want to learn about my software (you may want to read this post first)...

My web site is based upon the "Knowledge Base" software I developed for the CIS Support Site. This has made keeping this web site up-to-date considerably easier. The software started as a handy index generator that automatically generates indexes of pages posted into the system. Additionally, it's able to perform some basic transformations on input files to generate HTML or RSS or whatever. It's a very minimal content manager that relies upon the features of Mason for most of its functionality.

Okay, so I branched the system to create ~Sterling, which was the repository for the CIS 450 web site last year. I added some more features to make it automatically detect multiple file formats with the same basename minus suffix (e.g., Session-01.pdf, Session-01.key.zip). This made it possible to view multiple versions of a file, and using the transformation bits, I was able to generate HTML summaries straight from my Keynote presentations, since the format is based around an XML file.
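To make the "same basename minus suffix" idea concrete, here is a minimal sketch in Python (not the Perl/Mason code the site actually runs). The function name is hypothetical, and it assumes the basename itself contains no dots, so everything after the first dot counts as suffix:

```python
from collections import defaultdict

def group_by_basename(filenames):
    """Group file names that share a basename once the suffix is stripped.

    Assumption: the basename contains no dots, so "Session-01.key.zip"
    and "Session-01.pdf" both reduce to "Session-01".
    """
    groups = defaultdict(list)
    for name in filenames:
        base = name.partition(".")[0]  # text before the first dot
        groups[base].append(name)
    return dict(groups)
```

Each group is then one logical page with several formats available.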

Problems: (1) I now have two versions of the "knowledge base" software that I wanted to have the same features. (2) Adding content always requires dumping files into the system, which is overkill for items like blog/news entries, blog aggregation, etc. (3) One bad file spoiled the whole web site—any index page that found a badly formatted Mason file caused the index to puke. (4) The transformation system was a kludge and required careful tweaking and depended dubiously on file suffixes.

Thus, I endeavored to rewrite the software and have now decided that it is satisfactory for my content management dreams of the past several years. It's not quite ready for distribution as I need to add a little bit more documentation, but I anticipate that in the next month or so, I'll be updating the woeful Contentment project page.

I should be able to take the existing CIS Knowledge Base and drop it into the new system with very little effort. My ~Sterling page is going to be repopulated over the next couple weeks with the original files, but using the new improved transformation system. I still need to migrate the indexing system from the old knowledge base to the new one, but that should be a relatively simple matter (and this time, errors will be handled gracefully!)

The new transformation system is really the key. I borrowed a lot of ideas from Cocoon since I've always kind of liked that system. Basically, each file in the system is first checked to see what input "kind" it has (determined by a set of Mason plugin components, which pick the "kind" from file suffix, file contents, etc.). Based on this information, the file is run through a "generator," which translates the file into an initial kind. The system only has two generators right now: Mason (runs the file as a Mason component) and the fallback generator which just reads the file as is.

Then, the transformation system is applied to each input file. The transformation system attempts to find a sequence of transformations that can be applied to the file to get it from its initial kind to the requested final kind (which is determined by more plugin Mason components, usually based upon the URL or query parameters). If no transformation can be found, then you'll get a 404. Otherwise, it attempts to find the best transformation using a shortest path search (which is probably too costly, but works fine for now with a very small number of transformers). The transformers are applied to get the final output file.
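As a rough illustration of that search, here is a sketch in Python rather than the real Mason components. The transformer table and kind names are made up for the example, and a breadth-first search stands in for whatever shortest path search the system really uses (it finds the chain with the fewest steps):

```python
from collections import deque

# Hypothetical transformers: (input kind, output kind, function).
TRANSFORMERS = [
    ("rst", "html", lambda text: f"<p>{text}</p>"),
    ("keynote", "outline-xml", lambda text: f"<outline>{text}</outline>"),
    ("outline-xml", "html", lambda text: text.replace("outline", "ul")),
    ("html", "rss", lambda text: f"<item>{text}</item>"),
]

def find_chain(initial_kind, final_kind, transformers=TRANSFORMERS):
    """Return the shortest list of transformers between two kinds, or None."""
    queue = deque([(initial_kind, [])])
    seen = {initial_kind}
    while queue:
        kind, chain = queue.popleft()
        if kind == final_kind:
            return chain
        for step in transformers:
            source, target, _ = step
            if source == kind and target not in seen:
                seen.add(target)
                queue.append((target, chain + [step]))
    return None  # no path between the two kinds

def transform(text, initial_kind, final_kind):
    """Apply the shortest chain, or fail the way a 404 would."""
    chain = find_chain(initial_kind, final_kind)
    if chain is None:
        raise LookupError("404: no transformation path")
    for _, _, func in chain:
        text = func(text)
    return text
```

With only a handful of transformers the search cost is negligible; caching the chain per (initial, final) pair would be the obvious next step if the table grew.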

But we're not done yet. Finally, based upon the final output kind, another group of components, called "filters," is applied to the output to further modify the file. The main reason for this is that HTML files coming through the system need to have some links fixed, etc. before output. I had initially thought that this would be a good place to put the theme engine too, but I've decided against it for now until I can come up with a decent policy for regulating how themes should work. I have a theming engine in place now, but it depends upon the "autohandler" feature of Mason, in the same way as the previous version of the knowledge base.

Once all the filters have been applied, the file is finally output (after being passed through any autohandlers, as per normal Mason operation).
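The filter stage can be pictured like this. It is a Python sketch only: the filter table, the `fix_links` function, and the `/base/` prefix are all hypothetical stand-ins for the real Mason components:

```python
def fix_links(html, base="/base/"):
    """Hypothetical HTML filter: rewrite relative './' links to a site prefix."""
    return html.replace('href="./', f'href="{base}')

# Filters are looked up by the final output kind.
FILTERS = {"html": [fix_links]}

def apply_filters(content, final_kind, filters=FILTERS):
    """Run every filter registered for the final kind, in order."""
    for filter_func in filters.get(final_kind, []):
        content = filter_func(content)
    return content
```

Kinds with no registered filters pass through unchanged.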

I should have the system adapted to take any index and turn it into a proper RSS feed, and I also want to add Atom feeds this time round (and do a better job of putting in the "alternate" links to let browsers know they're there).

The next step is to add in the database features so that the web site can store some content into MySQL. This will make a lot of the routine updates to the site much easier. A file manager would also be handy to allow users to upload and manage files through a web browser, which is another goal of mine.

If all goes well, I may be dropping Drupal for Contentment in a couple months. I won't hold my breath though...

There is exactly one aspect of Perl that annoys me and it's the same aspect that everyone makes fun of Perl for and it's exactly the reason I love Perl too: JAPHs. JAPH = Just Another Perl Hacker. The Perl world is filled with JAPHs.

I think Computer Science is divided into several overlapping camps. You have the mathematicians who spend their time with theory and showing that programs and designs are provably correct. You have the software engineers who want to model everything and then do almost no coding and then use the model to verify the code. And then, you have the hackers who just like to play and want to figure things out on their own terms.

I've come to believe I stand in the last group. I like to beat an idea to death and I'm never satisfied with any result short of perfection. I don't have the patience for mathematics (nor the brain). I just don't think that way. And software engineering strikes me as more like a cross between bad wizardry, meteorology, middle management, and a pipe dream. Who wants to draw models according to someone else's idea of a good diagram? *co-UML-ugh* I've taken my SE and math classes and learned from both, but I can't really say I like either camp. Finally, I design in vi, I code in vi, I research with Google, and I use print-statements and logs to debug my code. I'm definitely in the "hacker camp."

The term "hacker" has received a lot of press and has been generally defined by the culture, but I think the term finds its most appropriate meaning in its original denotation: a dude that literally hacks away at something until he gets what he's looking for or just broken pieces. The term hacker has come to mean: a dude that likes to break into systems and read your email. However, that's more the realm of the Script Kiddie. Anyway, I digress...

Back to Perl: The problem is that Perl code is extremely expressive. That's why we hackers like it. I feel like saying, "do this or do that", then I say, "this or that." Or, if I prefer "unless this do that," then I say, "unless (this) { that }." Or, "that unless this" or "that if not this," etc. I can think in Perl. I can design in Perl. Perl doesn't enforce very many policies, so I get to choose my own. That's Perl's beauty.

This lack of policy enforcement is also its bane. As one of the CIS professors (in a Japanese accent) stated during my MS Presentation (just got my diploma in the mail today, btw), "This language is very strange to me. Why did you pick this?" I like it. I didn't tell him that; I gave him the BS about it being the language of the tools we already use. (BS only because I chose those tools partly because of their language, though RT is head and shoulders above any other similar tool I've taken a look at.)

The problem Perl has is that so much of the code written in Perl is simply hacked together. This means that CPAN (a great feature of Perl) has to be carefully sifted to find the module that fits your need. For example, if you need a tool for object persistence, there's Tangram, SPOPS, Class::DBI, Alzabo, DBIx::SearchBuilder, and a few others. Each has its own strengths, but all of them were written by hacker types with more or less of a leaning towards the math and SE mentalities. I'd say Tangram is the most mathematical and Alzabo is the most SE. DBIx::SearchBuilder is completely in the hacker camp. However, I think they all find themselves written from the JAPH perspective: hacking is a fact of life in Perl.

I'm looking forward to Perl 6. I think part of the reason so much Perl is so hackish is because the Perl 5 language is so hacked. Larry Wall built a simple tool for processing log files and then hacked on more and more features until he got to Perl 5. Now, we have hacked on object support. Hacked on packages. Hacked together subroutines. Hacked together regular expressions. All of it is brilliant, but the "cruft," as Larry Wall calls the entropy that hacking inevitably brings along, has started to take over. Perl 6 should freshen the slate, especially since the development is very heavily design driven. Anyway, in another year or two when the preliminary versions of Perl 6 start coming out, we'll see.

Enough ranting....now for more hacking....

I've classified this as a rant, but this is an extremely mild one. The business of the start of semester has reached its conclusion. I say this on the basis that the last "emergency" in RT was ended a couple days ago and I haven't been handed another emergency since.

Speaking of emergencies, I'll just use the rest of this blog to explain how the tech staff (namely me since I do the prioritizing) prioritizes incoming requests. As most systems administrators can tell you, the job of sysadmin is a little difficult because there are usually more things to be taken care of than a short 70-year life span may allow for. As such, it is necessary to set up prioritization. Based upon the prioritization ideas of other systems administrators, I put together a policy with the following priorities:

  • 80 - 89 : Emergency
  • 75 : Routine
  • 60 - 69 : Critical
  • 55 : Unprioritized Fac/Staff Request
  • 50 : Unprioritized Other Request
  • 40 - 49 : Very Important
  • 30 - 39 : Important
  • 20 - 29 : Not Important
  • 10 - 19 : Wish List

I've left the top and bottom of the ranges unspoken for in case I want to add something else or move things around a bit in the future.

Basically, tickets are assigned a starting priority of 50 or 55 based upon whether the ticket is posted to the normal help list or the special one given to faculty and staff. In the future, I think I'd like to make this assignment more or less automatic based upon the sender rather than the destination, but that's probably a "Not Important" task which will never get done. I then assign a ticket to some range, and then I assign ownership of the ticket to (or someone takes ownership as) the administrator responsible for following up on the problem. The assigned person may then adjust the ticket within the priority range as he sees fit (or she, should a woman ever apply and get the position; none have applied since I've been Coordinator).
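For the programmers in the audience, the policy boils down to a small lookup table. This is only an illustrative Python sketch with made-up function names; it is not how RT is actually configured:

```python
# (low, high, label) ranges copied from the priority list above.
PRIORITY_LEVELS = [
    (80, 89, "Emergency"),
    (75, 75, "Routine"),
    (60, 69, "Critical"),
    (55, 55, "Unprioritized Fac/Staff Request"),
    (50, 50, "Unprioritized Other Request"),
    (40, 49, "Very Important"),
    (30, 39, "Important"),
    (20, 29, "Not Important"),
    (10, 19, "Wish List"),
]

def classify(priority):
    """Return the level name for a priority, or None for a reserved gap."""
    for low, high, label in PRIORITY_LEVELS:
        if low <= priority <= high:
            return label
    return None

def starting_priority(posted_to_faculty_staff_list):
    """New tickets start at 55 from the faculty/staff list, 50 otherwise."""
    return 55 if posted_to_faculty_staff_list else 50
```

Values such as 70 or 5 fall in the gaps deliberately left unassigned, so `classify` returns `None` for them.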

If anyone has one or more Emergencies, they are not permitted to handle anything else until they have taken all the action they are able to take on all Emergency tasks. That is, the task might not be resolved, but everything that can be done to move towards resolution has been done. All other tasks should wait while an Emergency is pending. Only the tasks that require the most urgency and have the highest importance are given Emergency status. They should never be ignored. Generally, only faculty or staff are able to make requests that reach this high of a status unless a request by a student will affect faculty or staff, or the entire student body.

The Routine tasks are generally simple tasks that are easy to complete. This is not always the case, as I know Tyson has a number of Routines that have been in his list for a long time while he waits for me to complete more pieces of the account manager software. It's also a place to put notes about routine tasks that might not go away, but need to be reviewed on a routine basis (such as disabled accounts for printing fees and other reasons).

The Critical tasks have the same requirements as the Emergencies. However, they aren't so urgent as to cause Routine tasks to stack up unattended. Emergencies are sometimes downgraded to Critical if they cannot be handled quickly. Most problems that have no known work-around and bother faculty, staff, or a group of students are assigned Critical status. The "work-around" requirement is the major difference between Critical and Very Important. If a user has to work more, but can still get their work done, then it's Very Important. If the problem keeps a user from doing their work, then it's Critical. This is a troublesome rule because annoyances are (in some cases) worse than something without a work-around. In those cases, I have been known to upgrade what would otherwise be a Very Important to a Critical.

The four remaining levels are treated as essentially part of the same category. These tasks can be worked on in any order, but the tasks at the top are preferred. The levels 20 through 49 are assigned by me to faculty, staff, and PhD student requests that aren't Critical, Emergency, or Routine, but that we'd like to handle when we finish off those---which usually means late in the semester or during the summer. The levels 10 through 39 are assigned by me to all other student requests. The 40 through 49 range is also the location for internal systems projects related to improving overall system quality and adding new features.

The Very Important stuff will probably be taken care of within a year. The Important stuff may be handled if we ever find time and it's easy enough. Anything else might happen, but probably won't in the foreseeable future (or might happen, but only as a side-effect of something else).

Anyway, for those who are interested, that's pretty much how we prioritize things. If your task doesn't get taken care of for a long time and you think it should be, a large cash donation to hire more tech staff would be a move in the right direction.

As those who have read ~Sterling TNG will know, I've recently begun revamping my ~Sterling web site. I am still working on it in short bouts when I can and have gotten the first revision of my form handling stuff put together and tested using the first revision of my database security plugin.

The two major things I've struggled with over the last few years in designing my framework are URL handling and form handling. The essential problem is that HTTP is a cruddy protocol for designing web applications and interactive systems. HTTP has only barely hacked together support to allow state to exist between clicks. As such, you have this situation where you have to track work flows page by page by page, instead of the more typical GUI design of event by event by event. A single page might include hundreds of events, which cannot be processed by the server until the user clicks a "Submit" button or something similar.

Anyway, I've come up with what I consider to be a solution a little different from most that I've encountered. One thing I want is the ability to have truly hidden fields. Many forms need to have extra information associated with the form. Such things can include the type of operation being performed, limitations on the input the user can give, or just extra data that needs to be associated with a form that has already been set by the user or that the user doesn't get to have a say in. This is especially needed in the case of multi-page "wizards" where each page collects a little bit more data from the user. After several pages, the wizard is completed and all the data can be processed.

What I've done is assign every form in a page an ID that is unique to each user's session. Then, session data stored on the server can be used to store the hidden information without relying on the user to submit the form properly (i.e., hidden fields in the form itself might be maliciously modified). Then each field name has this ID associated with it so that the user could have multiple instances of the same form. Each page might have multiple forms, so I've included the ability to have multiple forms on a page where each is activated separately or, through a kind of event system, one form may activate another, etc. Forms that aren't activated aren't processed, but their information might be kept in case they need to be activated later or the user hits the "Back" button.
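Here is the gist of that scheme as a Python sketch (the real thing is a widget API inside Mason files). The function names, the session layout, and the underscore separator are all assumptions made for illustration; the 5-letter form ID matches what the current implementation prepends to widget names:

```python
import secrets
import string

def register_form(session, hidden, length=5):
    """Create a form ID unique within this session and stash hidden data server-side."""
    forms = session.setdefault("forms", {})
    while True:
        form_id = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
        if form_id not in forms:
            break
    forms[form_id] = {"hidden": dict(hidden), "active": False}
    return form_id

def field_name(form_id, name):
    """Name a widget so it can be traced back to its form."""
    return f"{form_id}_{name}"

def read_submission(session, form_id, posted):
    """Collect this form's posted fields, then overlay the trusted hidden data."""
    prefix = form_id + "_"
    values = {k[len(prefix):]: v for k, v in posted.items() if k.startswith(prefix)}
    # Hidden values come from the session, never from the request,
    # so a tampered field cannot override them.
    return {**values, **session["forms"][form_id]["hidden"]}
```

Because the hidden values never leave the server, a user who edits the submitted form can change only what they were allowed to type in the first place.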

To me, this seems really nice. I have a nice way of storing form data associated with the user and it's all fairly seamless to the programmer as well through the use of a lightweight widget API embedded in Mason files.

However, this does have a few problems: session expiration, form data expiration/naming, and storage abuses. This system requires that each form add a few entries to the user's session. Watching a user's session is a little interesting because every time a user opens a page with a form on it, the session expands to hold the metadata for that form. Thus, if a person hits reload on the login page 10 times, their session expands each time. This could lead to some consequences where a form with a lot of session data could eventually consume huge amounts of hard disk space. This is a bit unlikely because most of the information is small amounts of text, but you never know.

Session expiration is a problem most sites have to watch out for. For example, when I click on the Preview button of Drupal after I finish typing, I will have preceded that click by selecting the text of this post and copying it to the clipboard. Why? Because sometimes Drupal unexpectedly terminates a session, which will cause all my typing to be lost because Drupal will say "ACCESS DENIED" and Firefox will forget to store the values I have already typed. Nasty.

I do not want this to be a problem. So, I have to find a way to recover a user's session. Probably this will involve setting a cookie on their machine that never expires to help track the session ID they had originally. Then, when their session expires and they get taken to the login screen, the successful login will result in an automatic copying of their old session to the new one. Then, we must take the data that they submitted when the session expired and process it as if nothing had happened. My forms processor should be able to cope with this pretty seamlessly; I just need to hook it into the security API, which hasn't yet been done. It will also require some modifications to the security API, as the security API contract is incomplete.

Finally, a problem fairly unique to my setup is form data naming/expiration. This is a little more nebulous. In some cases, forms may need to be time sensitive and need the information in them to expire. I also haven't really figured out how a completely processed form should be cleaned up or if it should. What if a user logs in to the system, and then I just clean out all record of that form and then the user hits the back button a couple times to get back to the login screen and logs in again using the original form? Error! The form no longer exists. At the same time, it's bad news to store the user's password in their session longer than absolutely necessary, which would be stored indefinitely if I didn't clean up the form metadata in their session. Therefore, I'm considering setting up some way of reinitializing a form to a fresh state if they try to go back to one that's already been cleared. (Of course, this isn't yet an issue because the implementation doesn't yet clear forms out because of this problem.)

A second aspect of this problem is naming. At this time, each form widget is named with a 5-letter identifier prepended to the name to identify which form the control belongs to. This will undoubtedly break automatic form fillers, which might not be a bad thing at times. However, these tools are useful, so I don't want to completely invalidate them. Instead, I'm probably going to create some sort of aliasing system where each form element will have its name prefixed with the same integer every time, and then there will be a hidden field to alias that integer to the real form ID. In the cases where a page contains only a single form, or where forms aren't nested, the integer wouldn't be necessary, but might just be added in anyway to make the coding a little easier and make page output faster. (Sometimes, I don't know if there's going to be a nested form until after I've started printing output to the client, so we'd probably be better off just making the assumption anyway.)
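Since this part isn't built yet, the following is only a guess at how the aliasing might look, sketched in Python. The `form_alias_N` hidden-field name and the `N_field` naming pattern are invented for the example:

```python
def render_field_names(form_ids, fields_by_form):
    """Give each form a stable integer prefix plus a hidden alias field."""
    names, aliases = [], {}
    for index, form_id in enumerate(form_ids, start=1):
        # Hidden field that maps the stable integer back to the real form ID.
        aliases[f"form_alias_{index}"] = form_id
        for field in fields_by_form[form_id]:
            names.append(f"{index}_{field}")
    return names, aliases

def resolve(posted):
    """Map posted '<n>_<field>' keys back to {form_id: {field: value}}."""
    aliases = {
        key.rsplit("_", 1)[1]: value
        for key, value in posted.items()
        if key.startswith("form_alias_")
    }
    result = {}
    for key, value in posted.items():
        if key.startswith("form_alias_"):
            continue
        index, _, field = key.partition("_")
        if index in aliases:
            result.setdefault(aliases[index], {})[field] = value
    return result
```

The field names stay identical from one page load to the next (`1_user`, `1_password`), which is what a form filler needs, while the per-session form ID only appears in the alias field.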

I hope XForms takes off because it would make all of this much easier, as it gives a much more natural client-side API for handling a lot of these details (especially nested forms, form wizards, et al.).

About this Archive

This page is an archive of entries from February 2005 listed from newest to oldest.
