Drupal for Documents

One place where Drupal is weak is document management. However, after doing a little digging, I've decided that Drupal is actually not as bad as it might seem at first.

What is a document management system (DMS)? Basically, it's a repository that allows you to upload and organize documents. It should include the ability to store document revisions so that older revisions can be reviewed and restored. It needs a workflow system so that various authors and individuals can collaborate. It needs an orderly way of organizing and fetching documents. It needs a permissions system for managing who has access to the documents. It needs to be able to store metadata about documents (log entries, categories, authors, bibliography, price structures, or anything else that might be useful as a sticky note on the document).
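To make those requirements concrete, here is a minimal sketch of the data a DMS has to keep for each document: a place in a hierarchy, free-form metadata, and a revision history that can be reviewed and restored. It is written in Python purely for illustration. The class and method names (`Document`, `Revision`, `save`, `restore`) are hypothetical and are not how Drupal models nodes; workflow and permissions are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class Revision:
    """One stored version of a document, with its log entry."""
    content: bytes
    log: str
    author: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Document:
    """A document with a category path, metadata, and a revision history."""
    title: str
    category: Tuple[str, ...]  # hierarchical location, e.g. ("documents", "minutes")
    metadata: Dict[str, str] = field(default_factory=dict)
    revisions: List[Revision] = field(default_factory=list)

    def save(self, content: bytes, log: str, author: str) -> int:
        """Append a new revision and return its number (0-based)."""
        self.revisions.append(Revision(content, log, author))
        return len(self.revisions) - 1

    def current(self) -> Revision:
        """Return the most recent revision."""
        return self.revisions[-1]

    def restore(self, number: int, author: str) -> int:
        """Restore an old revision by saving its content as a new revision.

        Nothing is deleted, so the history stays complete.
        """
        old = self.revisions[number]
        return self.save(old.content, f"Restored revision {number}", author)


doc = Document("Board minutes", ("documents", "minutes"), {"price": "free"})
doc.save(b"draft text", "First draft", "alice")
doc.save(b"final text", "Final version", "bob")
doc.restore(0, "alice")
```

Restoring by appending, rather than by rolling back, is one possible design choice: it keeps every earlier revision available for review.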

Drupal does several of these out of the box. There's a module for everything else. However, there are still a number of minor shortfalls that keep it from being very robust as a DMS. All of these could be addressed pretty easily. Anyway, this is my summary of what needs to happen to make Drupal act as a capable DMS.

So, as is, this is how I would set it up:

  1. Install Drupal.
  2. Install the Pathauto module.
  3. Install the Category module and turn on the menu and pathauto extensions.
  4. Install the CCK module.
  5. Turn on the built-in upload module.
  6. Create a new CCK content type named "document". Include any metadata fields you need.
  7. Configure the "document" content type to allow file uploads and make sure that "Create a revision" is the default action. You probably want to uncheck "Promoted to front page".
  8. Create a new container (i.e., the category module's replacement for vocabulary) named "documents". This should be hierarchical with single parents (at least to start). Select the "document" content type to require a setting in this hierarchy.
  9. Configure the Pathauto module so that document nodes are located at "Workflow plugin. I have not yet taken the opportunity to evaluate that module, so I don't know how it works or how well. Personally, I'd like to see a system like Relationship module---again, I have not evaluated this module and I'm not certain how well it works.

    There are a few weaknesses here.

    1. No Browser. There isn't a nice document browser. The category menu will work, but it's a bit of a kludge. It would be nice to have a configurable view that could show the hierarchy, show some metadata, etc. Preferably something with a nice AJAX interface.
    2. File Storage. This "DMS" will store document revisions just fine, but all of the documents will be stored in a single directory. This is probably fine until you start storing a few thousand entries. Unless your file system scales well in such a case (very few do), your storage is going to start slowing down. This isn't acceptable since a DMS should be able to store as many documents as you have disk space for without blinking.
    3. No search. What's a DMS without a decent search of the documents themselves? A likely solution to this would be to integrate a third-party search tool, like Nutch. There is, actually, a Nutch module for Drupal, so this might be another easy fix.
    4. No file typing. The system only vaguely understands different file types. It would be helpful if it really understood the difference between PDF and Word and OpenOffice documents and could perform specialized operations on each. Even just being able to identify them on a rudimentary level would be a start.
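Two of the weaknesses above, file storage and file typing, are easy to sketch outside of Drupal. The Python below is only an illustration under stated assumptions, not an existing module: `sharded_path` and `sniff_type` are hypothetical names, the two-level hash layout is one choice among many, and the signature table covers just the three families of formats mentioned above.

```python
import hashlib
from pathlib import Path


def sharded_path(root, node_id, revision, filename, levels=2, width=2):
    """Return a storage path that spreads files over nested subdirectories.

    The shard names come from a hash of the node id, so every revision of
    one document lands in the same directory and no directory grows
    without bound.
    """
    digest = hashlib.sha1(str(node_id).encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*shards, f"{node_id}-r{revision}-{filename}")


# Leading bytes ("magic numbers") for a few common document formats.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (legacy Word and other Office files)"),
    (b"PK\x03\x04", "zip container (OpenDocument and newer Office files)"),
]


def sniff_type(data):
    """Identify a file on a rudimentary level from its first bytes."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"


path = sharded_path("/files", 42, 3, "minutes.pdf")
kind = sniff_type(b"%PDF-1.4\n...")
```

With two levels of two hex characters each, files are spread over up to 65,536 directories. Telling an OpenDocument file from another zip-based format would take a look inside the container, which this sketch does not attempt.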

    Anyway, this is a pretty interesting problem. I find it very interesting that the various Drupal modules cooperate well enough together that this kind of enhancement is practically available as long as you glue everything together just so. I'm intrigued by the possibility of finding and using additional modules or improving or adding a new module to help fix the deficiencies I've noted here. I have a couple projects where it would be nice.

    Cheers.

    Update April 4, 2007: Please visit the document management
    group on Drupal Groups to discuss this further.

8 Comments

Answer to all my questions
You've managed to summarize in one succinct page not only what I've been trying to figure out how to do but also told me that I in fact can do what I want to do using Drupal. Thanks for the thoroughness!

Drupal 5.1
I am setting up my first Drupal site for a service club and we want something that will be a private file system for minutes, reports, etc. This looks like what we need. I think I have all of the modules installed and running but I don't seem to end up with the functionality you describe. I wonder whether the instructions work as described in 5.1. Still working on it, thanks for doing the article.

The post was purely theoretical, but...
When I wrote the post, it was purely theoretical. I haven't actually tried this out, even now.

Yet, even though I haven't tried it, I'm pretty confident that it should work, even in Drupal 5.1. I am in the process of migrating our site from a full DMS system to Drupal 5.1 and I don't anticipate this being difficult, at least for what we need. In my case, we'll probably use Organic Groups in Drupal 5.1 to provide containers for documents because we have different associations that need different sets of documents for their respective meetings.

Anyway, I'm starting a [http://groups.drupal.org/document-management|document] (still awaiting moderation as of this comment) management group on [http://groups.drupal.org/|Drupal Groups] to discuss this idea (and related document management ideas) further since I'm going to be using it soon myself. (A new post is forthcoming on my blog about that.)

cac_lite not working with Drupal 5.1
It looks like cac_lite may not be working properly with 5.x versions of Drupal. Several discussions (e.g., http://drupal.org/node/116673) talk about problems with it. I can't get past the error message that comes up saying it is not configured. The configuration screens that are described just can't be found so far as I can tell. If cac_lite is just a simplified version of taxonomy access control, it would seem to me that would be the way to go although I am not sure that is a solution. I see that the wait for moderation in the Drupal group continues. It does seem odd to me that this function isn't something that is all figured out. Oh well, still trying. Cheers!

This is fixed in the dev release
The cac_lite problem is fixed in the current 1.x-dev release. It's been fixed for over a month, but they haven't made an official release of the fix in that time.

Any way to do this with just
Is there any way to do this with just taxonomy (without the Category module)? Not that I don't want the Category module; I just don't want to depend on another non-core module.

Please continue the discussion...
I've started a [http://groups.drupal.org/document-management|group] to continue this discussion. I'm not going to continue discussing new ideas and variations on this here.

I tried to find this step by step summary on the drupal group page but to no avail. I hope you don't mind me asking this here...

Can you elaborate on point 9, it doesn't make sense to me - perhaps some of it has been deleted by accident?

Many thanks,

simon

About this Entry

This page contains a single entry by Andrew Sterling Hanenkamp published on August 26, 2006 5:22 PM.
