Results tagged “programming”

I Like Big Bugs

Sorry for the Sir Mix-a-Lot reference there. Ahem. Anyway, I just had a coworker walk by my desk and say something like, “Don’t you just wish we could get rid of all these glitches and be done with them?” She left before I could answer and I think it was rhetorical (it’s hard for me to switch gears to interpret social signals while in the midst of concentrating on code). My answer was going to be, “Uh, no. I like finding bugs.” In this sense, I’d agree with what Nat Torkington twittered recently (edited for language; I try to run a PG-rated blog):

acid test of whether I’m still a hacker: do I think “oh goody!” or “oh [skittles]!” when I find a bug?

Not only this, but a few years back I worked for a company, NRG, that shared office space with another company owned by John Devore. John and I were chatting one afternoon and we were talking about debugging things. He said that when he finds a bug, he doesn’t just want to fix it. He wants to know why he didn’t find it before: why does the code work so well except here? That’s certainly my passion.

Debugging is an interesting opportunity to learn. Not only can you learn from the mistake, but if the mistake happens to be interesting in some way, you can use it as the seed of an enhancement later. Sometimes, you can even end up turning a bug into a feature if it happens to do something really cool with a slight change.

Anyway, I just wanted to post that I enjoy debugging and when I get to the point where I say, “Ah crap” when I need to fix something, it’s time to switch careers.

Cheers.

Perl is not going away

I’ve been taking DDJ for a couple years now. It’s cheap and occasionally has something interesting in it, but it’s been less interesting than I remember it being when I read it in college. I’ve been much more enamored with the Communications of the ACM. Today, I received my issue and there’s an interview with Paul Jansen of TIOBE Software. In the article, he’s quoted saying:

Another language that has had its day is Perl. It was once the standard language for every system administrator and build manager, but now everyone has been waiting on a new major release for more than seven years. That is considered far too long.

While I am biased, I have to admit that I disagree pretty strongly with Jansen’s assessment. First, let me go into the problems with how he came to this conclusion and then explain why I think I’m justified trusting that Perl is in it for the long haul despite my bias that would have me think so anyway.

I want to first evaluate the way Jansen has collected the data he’s used to make this statement. TIOBE puts together what they call the TIOBE Index. This is a rating of the popularity of various programming languages. The TIOBE web site claims, “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.” How do they measure this? By performing a search for:

+"<language> programming"

on 5 popular search engines, including: Google, Google Blogs, MSN, Yahoo!, and YouTube. That’s it.

What they are measuring is not actual popularity, but the amount of hype surrounding each language. Not only are they measuring hype, but only hype that discusses “programming”. What if everyone prefers to say “programming Perl is fun!”? That wouldn’t get picked up by the search they use. What about “Perl scripting”? Nope. Missed. (Here I should point out that Andy Lester appears to have been on to something when he gave his lightning talk about Perl programs versus scripts at OSCON last year.) In essence, if they’re disclosing the complete metric, it is an incomplete measure. It’s a shortcut that might be 90% right or 50% right. This is just poor statistics.
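To make the objection concrete, here is a rough sketch of what an exact-phrase count sees. It assumes the engines treat the quoted query as a literal phrase, and the three sample "pages" are made up for illustration; nothing here comes from TIOBE.

```java
import java.util.List;
import java.util.Locale;

/** Sketch only: mimics an exact-phrase search for "<language> programming". */
final class PhraseHits {
    /** True when the text contains the literal phrase "<language> programming". */
    static boolean matches(String text, String language) {
        String phrase = (language + " programming").toLowerCase(Locale.ROOT);
        return text.toLowerCase(Locale.ROOT).contains(phrase);
    }

    /** Number of texts that an exact-phrase search would count as hits. */
    static long countHits(List<String> texts, String language) {
        return texts.stream().filter(t -> matches(t, language)).count();
    }

    public static void main(String[] args) {
        // Hypothetical page texts; all three are about Perl.
        List<String> pages = List.of(
            "I teach Perl programming at work",
            "Programming Perl is fun!",
            "Perl scripting for sysadmins");
        System.out.println(countHits(pages, "Perl") + " of " + pages.size() + " pages counted");
    }
}
```

All three sample pages are about Perl, but only the first contains the literal phrase, so a count built this way depends heavily on how people happen to word things.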

The second aspect of Jansen’s comments I take issue with is the statement that there has not been a major release in seven years. That’s not strictly true. Perl 5.10 has just been released and it includes new features like the new smart match operator. Beyond that, there has been some very active development on a closely related project, Parrot, and language development toward a huge milestone, Perl 6. Furthermore, where Perl truly shines is in all the development on CPAN. CPAN is getting large and complex enough now that we’re having to rethink how it works just so we can find anything on it. This is a good problem to have.

This comment by Jansen does, however, serve to indicate a certain perception gap caused by the long wait for Perl 6. It’s even been suggested that the name of Perl 6 is harmful to Perl 5. This has been discussed by others for some time.

In my opinion, Jansen is on shaky ground with his claims, probably because he’s not well informed by anything but his own metrics. I should think that he’d at least research the trends and issues facing the top 10 languages listed by his survey so as to provide some better justification for its accuracy.

As for the reasons I still have warm and fuzzy feelings toward Perl’s future, I can list them off rather easily.

  1. I am participating in a number of growing projects that depend on Perl’s future. Jifty and rethinking-cpan are just a couple I’m involved in. I can point you to several other vital projects that I use or am familiar with.
  2. I know of several companies actively pursuing Perl to develop core projects and continuing to train developers. This includes imdb.com, Socialtext, Best Practical, Six Apart, and several others.
  3. Recently, Google launched Google App Engine. This tool provides services to Python developers as part of the initial release. As of this writing, the two most-voted-for issues are, first, to add support for Ruby and, second, to add support for Perl.
  4. There’s an average of 50 new and updated modules being posted to CPAN every day. That’s not a small number.

I can probably come up with more, but now it’s getting late, so I’d better end this thing. If Perl is going to die, it’s got some years left before it happens. I think there will be enough activity to keep it going and increasing during those years rather than dying.

Cheers.

Popular Tags

This is the list of all tags I've used in my blog to this point (this includes or will include tags used in content aggregated from other sites as well).

Categories

<?php
// Node ID of the category container that holds the tags.
$container_id = 181;

// Score each tag: every tagged post adds to the score, and newer posts add more.
$sql =
  "SELECT d.nid, d.title, c.description, ".
  " MAX(n.created) AS updated, ".
  " COUNT(*) AS count, ".
  " SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP()-n.created)/2592000)+1.5)-1) AS score ".
  "FROM {node} d ".
  " INNER JOIN {category} c ON c.cid = d.nid ".
  " INNER JOIN {category_node} cn ON cn.cid = c.cid ".
  " INNER JOIN {node} n ON n.nid = cn.nid ".
  "WHERE c.cnid = %d AND n.status = 1 ".
  "GROUP BY d.nid, d.title, c.description ".
  "ORDER BY score DESC";

// Use a %d placeholder so pager_query() fills in the container ID for both queries.
$count_sql = "SELECT COUNT(*) FROM {category} c WHERE c.cnid = %d";

$result = pager_query($sql, 30, 0, $count_sql, $container_id);

// Build one table row per tag: link, use count, score, and time since last use.
$items = array();
while ($category = db_fetch_object($result)) {
  $items[] = array(
    l($category->title, 'node/'. $category->nid,
      array('title' => $category->description)),
    $category->count,
    round($category->score, 1),
    t('%time ago', array('%time' => format_interval(time() - $category->updated, 3))),
  );
}

print theme('table', array('tag', 'count', 'score', 'last update'), $items);
print theme('pager');
?>

How it works

The list is sorted in order of popularity using an inverse-logarithm scoring method I developed myself. Basically, the most recently used terms get a very high score that tapers off to a relatively low score once the post is a year old or older. Each use of a term is cumulative so the more commonly used terms will appear higher on the list even if they haven't been used as recently as others. This list will sort itself as time goes on according to whatever my latest posts are about.

Originally, when I used the built-in taxonomy module of Drupal, I used this SQL query to do the calculation:

SELECT d.tid, d.name, d.description, 
    MAX(n.created) AS updated, 
    COUNT(*) AS count, 
    SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP() 
        - n.created)/2592000)+1.5)-1) AS score 
FROM {term_data} d 
    INNER JOIN {term_node} USING (tid) 
    INNER JOIN {node} n USING (nid) 
WHERE d.vid = ? 
    AND n.status = 1 
GROUP BY d.tid, d.name, d.description 
ORDER BY score DESC

However, since I recently switched to the Category module, I updated the SQL to reflect this with:

SELECT d.nid, d.title, c.description,
       MAX(n.created) AS updated,
       COUNT(*) AS count,
       SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP()-n.created)/2592000)+1.5)-1) AS score
FROM {node} d
     INNER JOIN {category} c ON c.cid = d.nid
     INNER JOIN {category_node} cn ON cn.cid = c.cid
     INNER JOIN {node} n ON n.nid = cn.nid
WHERE c.cnid = ? AND n.status = 1
GROUP BY d.nid, d.title, c.description
ORDER BY score DESC

Here's the function describing the math involved. I developed this function for calculating the score by arbitrarily modifying the log curve to suit my needs.

$$ s = \sum_{i=1}^{n} \left( \frac{2.5}{\ln\left(0.25\, m_i + 1.5\right)} - 1 \right) $$

In the equation, s represents the final score, n is the number of times the term has been used, and i iterates over those uses. Each m_i is the number of months since the creation of the post that the ith use of the term belongs to. Since Drupal stores time in seconds since the epoch (i.e., January 1, 1970), m_i is calculated by subtracting the node's creation time from the current time and then dividing by 2,592,000, which is the number of seconds in a 30-day month.

If you view a curve plot for an individual iteration, you would note that when the delta (m_i) is 0, the score for that use of the term is around 5.2. When the delta is 12, the score has dropped to about 0.7.
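For anyone who wants to play with the curve outside of SQL, here is a small sketch of the same arithmetic in Java. The class and method names are made up for illustration, and it assumes the one-argument LOG() in the query is the natural logarithm, as it is in MySQL.

```java
/** Sketch of the tag-scoring formula; names are illustrative, not from Drupal. */
final class TagScore {
    // Seconds in a 30-day month, matching the 2592000 divisor in the SQL.
    static final double SECONDS_PER_MONTH = 2_592_000.0;

    /** Score contributed by one use of a tag on a post that is the given number of months old. */
    static double scoreForAge(double months) {
        // Math.log is the natural logarithm, like MySQL's one-argument LOG().
        return 2.5 / Math.log(0.25 * months + 1.5) - 1.0;
    }

    /** Total score for a tag: the sum over the creation times (epoch seconds) of its posts. */
    static double score(long[] createdTimes, long now) {
        double total = 0.0;
        for (long created : createdTimes) {
            total += scoreForAge((now - created) / SECONDS_PER_MONTH);
        }
        return total;
    }

    public static void main(String[] args) {
        long now = 1_200_000_000L;           // arbitrary "current" time in epoch seconds
        long oneMonthAgo = now - 2_592_000L; // exactly one 30-day month earlier
        System.out.printf("brand new post: %.2f%n", scoreForAge(0));
        System.out.printf("year-old post:  %.2f%n", scoreForAge(12));
        System.out.printf("tag used twice: %.2f%n", score(new long[] {now, oneMonthAgo}, now));
    }
}
```

Because each use adds its own term to the sum, a tag used often but not recently can still outrank a tag used once last week.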

The last few months have been interesting. I've started a new job building the web site of Boomer.com. Part of this process has been reacclimating myself to Java. My language of preference is Perl and I'm still working on a few projects at home involving Perl. In the last week, we've made some decisions in which it appears as if we'll be using PHP here for some of our development as well. From these experiences, I've found that Java actually cooperates pretty nicely with these two scripting languages.

In Perl, I've written a library to connect with a Java library I'm using at work, named Java::JCR. Rather than porting an implementation of the JCR to Perl, I've actually hooked the JCR libraries from Perl. Using a tool named Inline::Java, the Perl interpreter spawns a JVM to run the Java code and then communicates with that JVM to run the necessary code. This actually works very well. After getting past the first few hurdles of getting the interface built, date conversions handled, and exception handling worked out, a user of the Java::JCR library can use a JCR implementation without worrying about any of the nuts and bolts of the interface. I'm very happy with the Perl-to-Java interface here.

In PHP, we're looking into using Drupal to make up for some of the deficiencies in our web site design, particularly in the area of community collaboration. However, we're using a JCR-based ECM, Magnolia, to handle a lot of the document and publication management. We need communication between the collaboration side and the publication side of the web site. Much of that communication can be handled through the use of RSS and other syndication protocols or through the use of RPC. However, since both of these involve serialization of data across a network connection, they can slow things down. Some features (particularly search, in our case), don't work very well this way. Therefore, we're looking at the PHP JavaBridge or Caucho Resin and Quercus to perform direct language communication between PHP and Java.

From my experience thus far, Java actually does a pretty decent job of communicating with other languages, either by RPC or even directly with inter-language communications. This involves the overhead of running an interpreter/VM for the second language in addition to at least one JVM, but it seems to work pretty well. I don't know that I can really extrapolate out to other languages in general, but from my experience and research so far, PHP and Perl both seem to work very well with Java. Since each language has its own pros and cons, this allows me to consume pros from multiple languages while avoiding a few more of the cons.

Anyway, this has been a cool enough experience that I thought I'd share it.

Cheers.

In a recent discussion on the Magnolia developers list, Nicolas referred to a post discussing the negative side-effects of JDK 1.5 on code readability. This pretty well demonstrates one of the reasons Java bugs me: the developers want a language that guarantees code readability.

I'm all for code readability, but Java programmers seem to have an irrational fear of giving out tools that could be used for evil. The fear is all the more irrational since Java already has features like the reflection API, which is like a programmer's version of the nuclear option: one misstep and you've blown up half the planet. It also makes for some of the worst code I've ever seen.

Anyway, David Flanagan points out that the new import static can make your code less readable. He contrives a couple examples of poor uses of the new construct and warns folks. Now, I want to say that Flanagan has it correct, "This isn't meant as criticism of static import; just an exploration of it." My criticism is of Nicolas and others that make statements like, "I would tend to think it's better to stay with 1.4, especially as the code is easier to read. (in my own opinion). [sic]" JDK 1.5 is not any less readable than JDK 1.4. That's like saying British English is less readable than American English; it's not a valid comparison. It's the usage that matters.
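Here's a quick sketch of what I mean by usage. These are my own contrived examples, not Flanagan's: the same feature reads fine with names everybody recognizes and gets murky with a name that could belong to several classes.

```java
import static java.lang.Integer.MAX_VALUE; // murky: MAX_VALUE of which type?
import static java.lang.Math.PI;            // fine: everyone knows what PI is
import static java.lang.Math.sqrt;          // fine: sqrt is unambiguous

/** Sketch contrasting readable and less readable uses of import static. */
final class StaticImportDemo {
    static double circleArea(double radius) {
        return PI * radius * radius;
    }

    static double hypotenuse(double a, double b) {
        return sqrt(a * a + b * b);
    }

    static boolean exceedsIntRange(long value) {
        // The reader has to scroll up to the imports to learn this is Integer.MAX_VALUE.
        return value > MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));
        System.out.println(hypotenuse(3.0, 4.0));
        System.out.println(exceedsIntRange(1L << 31));
    }
}
```

Nothing about JDK 1.5 forces the third import on anyone; a project policy or a code review can simply say "don't do that."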

If someone engages in irresponsible coding via the static import mechanism, is this any worse than some of the issues one can have with regular imports? For example, look at:

import com.boomer.mgnl.firmadmin.User;

// Later...
public info.magnolia.user.external.bean.User copyToExternalUser(User firmUser) {
    // The fully qualified name is required because the short name User is already imported.
    info.magnolia.user.external.bean.User user = new info.magnolia.user.external.bean.User();
    user.setFirstName(firmUser.getFirstName());
    return user;
}

I have some of that crap in my code because User was used to name an interface in one package, a bean implementation in another, and an extension of that bean in another, and then used in a totally different module to connect to external users in another place. This is an example of poor choice in names. If your application is going to have multiple user classes that might be used in the same object now or in the future, name them something different from each other! The code above works in JDK 1.4, so JDK 1.4 code can also be hard to read.

Thus, the point for any programmer in any language is: think when you write your code. You can write horrible code in even the simplest language with the best enforced rules and weakest capabilities. Adding powerful tools that can be abused does not mean that the code will suddenly become unreadable; it just means that you've got a few extra pitfalls that you need to put railings around (i.e., corporate/project policies to make sure no one does the stupid). And then you have to have code reviews and enforce these policies on yourselves or via a community.

I hope that wasn't too much of a rant and I hope my ramble didn't negate what I said too much. Cheers.

I'm currently working on the last-minute preparations for the launch of a new web site for Boomer.com. We are using a pretty decent piece of software for the site, called Magnolia. However, one issue in the Magnolia source that's been bugging me is the use of java.lang.Exception in many throws and catch clauses.

This is a pretty nasty habit that Java developers should avoid. Often when building code quickly or when you're making a lot of changes, it may be tempting to engage in this habit, but it's a fairly bad idea. Let me enumerate:

  1. Catching Exception may catch errors you aren't really anticipating and handle them in ways that aren't correct or mask them from appearing higher up in the call stack where they belong. For example, you may write a piece of code expecting some kind of database errors that should be ignored, but you end up spending an hour or two trying to track down a problem because something throws the infamous NPE, but you were ignoring it rather than coping with it, which led to some other problem in your code 500 lines later. Not cool.
  2. Catching Exception involves treating all errors the same, which is rarely the right way to handle problems. Generally, each exception should be handled differently. For example, a database connection problem shouldn't be treated the same as a failure to return an expected result. On the web, the former might result in a 500 Internal Server Error and the latter might be better described as a 404 File Not Found error. Those are very different responses.
  3. Catching Exception rather than the specific exceptions masks changes to code that will affect other parts of the code. For example, if I write a piece of code that throws errors that are generally ignored, but later come back and add another function call which adds a new exception that really needs to be handled by the caller in the 4 or 5 places the method is used, I might not catch all 4 or 5 places if my throws clause just contains Exception. This isn't smart use of the static checker to help me find places where a code change might propagate elsewhere.
  4. Catching specific exceptions helps to make code self-documenting and easier to read. If all the code just catches Exception or states a throws clause of Exception, that's a really poor indicator of the kinds of errors one can expect from that method. If you also don't include any JavaDoc or wait until later to add your docs, it may take a lot of deep looking to find out what errors are really possible.
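The points above can be sketched in a few lines. This is only an illustration: ArticleStore and the choice of exceptions are hypothetical and are not Magnolia's API.

```java
import java.sql.SQLException;
import java.util.NoSuchElementException;

/** Sketch only: ArticleStore is a hypothetical interface, not part of Magnolia. */
final class SpecificCatch {
    /** The narrow throws clause documents exactly what can go wrong. */
    interface ArticleStore {
        String load(int id) throws SQLException, NoSuchElementException;
    }

    /** Maps each failure to its own HTTP status instead of lumping them together. */
    static int statusFor(ArticleStore store, int id) {
        try {
            store.load(id);
            return 200;
        } catch (NoSuchElementException e) {
            return 404; // the requested item does not exist
        } catch (SQLException e) {
            return 500; // the database itself failed
        }
        // There is deliberately no catch (Exception e), so an unexpected
        // NullPointerException propagates to the caller instead of being hidden.
    }

    public static void main(String[] args) {
        System.out.println(statusFor(id -> "body", 1));
        System.out.println(statusFor(id -> { throw new NoSuchElementException(); }, 2));
        System.out.println(statusFor(id -> { throw new SQLException("connection lost"); }, 3));
    }
}
```

If load() later grows a new checked exception, the compiler flags every caller that doesn't handle it, which is exactly the help that a blanket throws Exception gives away.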

If you are a Java programmer, you should avoid this trap. No matter how tempting it is, even on code you plan to use just a few times and then throw away, you ought to think about this carefully. There've been far too many times in my programming experience where a little bit of temporary throw away code suddenly became permanent because I found it more useful than anticipated or project goals changed. I don't want to write some quickie code that I then have to go back and rewrite because I hacked it together with a bunch of bad habits, if I end up reusing it. I expect most programmers don't.

Anyway, don't code stupid. You don't want to be featured on The Daily WTF. Cheers.

I was interested in finding out what the latest changes were to the Maven2 plugin for Eclipse, so I hopped on over to take a look. As with most teeny OSS projects, there's no properly documented changelog, but it does have a source code repository, and most of those contain something almost as good. I ended up following the link to the VCS leading here, which suddenly dazzled my eyes with a very cool tool for viewing a source tree. They used a tool from Cenqua called FishEye.

Now, I'm not a big fan of the look and feel, but the features it provides are very nice. It includes graphs for monitoring code size. It includes annotations and version numbers per line when you view an individual file so you can note how old a particular set of changes is or who made them. One graph shows the number of lines the currently selected text files have contained over the history of a project. The coolest chart I've seen so far is a chart showing branches and tags that have been made for a given file.

Just on first look, this tool looks pretty cool. They also have a pretty nice tour, where they show it off via Ant's repository.

Cheers.
