Rethinking CPAN, Some Ideas

Andy Lester has issued an interesting challenge regarding CPAN. First, I have to say that CPAN is probably at least one-third of the reason I love Perl. Having a centralized, searchable, browsable, and documentation-oriented repository of modules makes development so much easier than any alternative I'm familiar with. On top of all that, it has reviews, forums, a bug tracker, test result summaries, etc. That said, it is far from perfect.

There are two main complaints that I run into. The first is from friends who are systems administrators. They often despise CPAN because they don't care that it helps developers easily locate and include modules. All they see is that to install some application, they have to install 15 modules, which pull in another 22 dependencies, and then another 18 for those, and then another 12 for those, and then another 5 for those, and then finally 2 more. Building and installing all of them can take hours. It is just a hassle for them.

From a developer's perspective, I also have the problem that Andy highlights: which effing module solves my problem? For example, if I want to build Subversion/CVS/Git style commands with sub-commands, I could use App::CLI by clkao or App::Cmd by Ricardo Signes. Which one is better for my problem space? Both are by smart developers and both are well put together, but they are completely different solutions to a similar problem. This is something that CPAN won't help you with very much. Andy's example of XML is even more complicated.

There are other problems as well. Here are some ideas I've had that could help, each of which could be implemented as its own independent chunk.
  • Wikifying the documentation. I've thought about this idea for a while and even spent part of my Christmas break designing an implementation. The idea is to let visitors contribute POD to a module on CPAN through a wiki-ish interface, which could improve the documentation. The challenge is that CPAN documentation is generated from the POD stored in the modules themselves, so there needs to be some way for POD updates made on the wiki to get back into the modules easily. I developed what I think would be a working solution to that problem, but it requires the author to pay attention to the wiki and make sure to download the patches. Build tools could help automate this process, though.
  • Dependency mapping to cluster modules. Every module contributes a metadata file that should list the dependencies required to run that module. I can think of several useful things we can do with this on a large scale. We can use it to suggest which modules are more popular or better, based upon how many other modules depend upon them. We can use it to indicate (or estimate) how many overall dependencies a particular module has, e.g., "you will need to install 184 module dependencies to use this module." We can also tie this dependency information into other features like reviews and ratings to generate other helpful statistics and heuristics.
  • Incorporating use statistics. Sites like ohloh or iusethis have a good idea in how they rate content. It's not a perfect system, but letting a person just click a counter that says "this is good" or "I like this" provides a very simple mechanism for rating a module. In addition, it lets you gather statistics that group modules in yet another way.
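To make the dependency-mapping idea a little more concrete, here is a minimal sketch in Python. It assumes the dependency lists from each module's metadata file have already been collected into a plain dictionary; the module names, the `DEPENDS_ON` table, and both function names are invented for illustration and are not real CPAN data or any existing API. It shows the two numbers mentioned above: how many modules depend directly on a given module, and how many modules in total have to be installed to use one.

```python
from collections import defaultdict

# Hypothetical metadata: module name -> direct dependencies it declares.
# These names are made up for illustration; they are not real CPAN data.
DEPENDS_ON = {
    "Example::App": ["Example::Config", "Example::Logger"],
    "Example::Config": ["Example::Util"],
    "Example::Logger": ["Example::Util"],
    "Example::Util": [],
}


def reverse_dependency_counts(depends_on):
    """Count how many modules directly depend on each module."""
    counts = defaultdict(int)
    for module, deps in depends_on.items():
        counts[module] += 0  # make sure every known module appears
        for dep in set(deps):  # ignore duplicate declarations
            counts[dep] += 1
    return dict(counts)


def transitive_dependencies(depends_on, module):
    """Return every module that must be installed to use `module`."""
    seen = set()
    stack = list(depends_on.get(module, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also protects against cycles
        seen.add(dep)
        stack.extend(depends_on.get(dep, []))
    seen.discard(module)  # a cycle should not make a module its own dependency
    return seen


if __name__ == "__main__":
    print(reverse_dependency_counts(DEPENDS_ON))
    total = len(transitive_dependencies(DEPENDS_ON, "Example::App"))
    print(f"Example::App needs {total} module dependencies")
```

A raw count like this is only a crude popularity signal, but it is cheap to compute and could be combined with the reviews and ratings mentioned above.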
None of these ideas is new or really very innovative, but I think they would be steps in the right direction toward making CPAN work better and better. Incorporating additional metrics into how the search works, adding tagging, or who knows what else along the way would go a long way.

I wonder if we could expose the CPAN services currently available through some sort of unified web service that would let developers enhance and experiment with them too. That could let CPAN grow in new ways without a lot of overhead to get TPF's attention or what-not.

I don't know. More ideas...

Cheers.


About this Entry

This page contains a single entry by Andrew Sterling Hanenkamp published on April 5, 2008 8:58 PM.
