Category Archives: Software Development

Posts about software development. Generally I use Java, PHP, and Python for development but occasionally I delve into other things as well.

Using sed to convert PHP short open tags to proper tags

In the good ol’ days of PHP is was quite accepted to just use <? to signify the start of PHP code. PHP installs by default would accept this short notation so you could safely distribute scripts that made use of it. However, in the days of XML, this short notation is a little more dangerous, so PHP now comes with short_open_tags turned Off.

This is a short little command line snippet that uses sed to convert all of the php files in a specified directory from the old short tags, to the proper and full <?php tags:

for file in *.php
do
sed -i 's/<?$/<?php/' $file
sed -i 's/<? /<?php /' $file
done

Enjoy!

Amazon EC2 Error: Client.AuthFailure: AWS was not able to validate the provided access credentials

This is just a quick note to help me (and others) in case I run into the same problem again. I was setting up Amazon EC2 following This tutorial. When it came time to test my instances:

powerbook-2:~ shannah$ ec2-describe-images -o self -o amazon
Client.AuthFailure: AWS was not able to validate the provided access credentials

The problem was that I had to make the permissions on my private key and cert files to be more restrictive so that only I could read them.
i.e.

chmod 600 chmod 600 pk-7HIOWAY3SOPV7G2LQLMMXEKAH5CKJ5HY.pem
chmod 600 cert-7HIOWAY3SOPV7G2LQLMMXEKAH5CKJ5HY.pem

That fixed it.

PHP/MySQL Time zone design pattern

Suppose you have a PHP/MySQL application that you want to adapt to work across multiple timezones. I.e. Users from Toronto should see all dates and times in Eastern time while Vancouver users should see times in Pacific time. On this surface this appears like a simple thing to do, but there are a number of pitfalls to watch out for.

For this design pattern we’ll assume that we can always obtain the user’s timezone by calling a function, say getUserTimezone() which would return something like Canada/Pacific.

The tricky part is making sure that different users can view and edit dates in their local timezone, while keeping the data consistent. For example, if Toronto Tony posts an event in our application, he can specify that it starts at 8pm (meaning 8pm Toronto time). When Vancouver Vinnie logs in, he sees this event, but the time says ‘5pm’ because he sees the event in Vancouver time.

Design Decisions

  1. How do we store the dates in the database? Do we store them in Toronto time? Vancouver time? Some other time? or does it matter?
  2. Both MySQL and PHP contain functions to convert dates between different timezones. Do we do the conversion in MySQL? In PHP? In both? Does it matter?

Our solution for the first design decision, is that it doesn’t matter how we store the dates in the database, as long as we are consistent. I.e. we must store ALL of the dates in Toronto time or ALL of the dates in Vancouver time, but we cannot store some in one timezone and some in another. When deciding on an appropriate default timezone for the database, I considered using GMT (Greenwich mean time) but it is probably better just to leave the server running the most intuitive time for the majority of users. If the server is physically located in Denver, why not just leave the times in Denver time. We will convert the times to and from the users’ timezones when we insert and retrieve the data from the database.

For the 2nd design decision,, it will be much easier to use the MySQL conversion functions than to use the PHP functions. This will allow us to simplify our handling of the timezone issue exclusively to our SQL queries and leave our PHP code largely alone.

Setting the Timezone in PHP

Inside our PHP script we can set the timezone for the current user by setting the TZ environment variable:

putenv('TZ='.getUserTimezone());

Once we have set the timezone, the PHP date function will properly convert timestamps to the user’s local timezone, and the strtotime function will propertly convert dates in the user’s local timezone back to timestamps.

Now we have 3 choices in how to store our dates in the database and interact with them.

Strategy 1: Store dates as timestamps

Since timestamps are not affected by timezones, we could just store all of our dates in INT columns as timestamps and convert them to dates using PHP. e.g.:

// inserting dates
$date = '2007-12-29 08:00:00';
mysql_query("insert into events ( ..., `startDate`, ...) values ( ..., '".strtotime($date)."', ...)");

// retrieving dates
$res = mysql_query("select `startDate` from events where ...");
list($date) = mysql_fetch_row($res);
$date = date('Y-m-d H:i:s', $date);  // convert the timestamp back to a date string

The main drawback to this strategy is style. Databases are equipped with DATE data types for a reason. By using INT datatypes to store our dates, we are missing out on some nice features for data manipulation etc… Plus it just doesn’t *feel* quite right to be using INT columns when we should be using DATETIME columns.

Strategy 2: Use FROM_UNIXTIME() and UNIX_TIMESTAMP() mysql functions

MySQL provides a some nice conversion functions that will allow us to work with timestamps yet still store our dates in proper DATE and DATETIME columns in the database. The FROM_UNIXTIME() function will convert a timestamp into a date string, and the UNIX_TIMESTAMP() function will convert a date string into a unix timestamp. So we can achieve the same as in Strategy 1 with the following:

// inserting dates
$date = '2007-12-29 08:00:00';
mysql_query("insert into events ( ..., `startDate`, ...) values ( ..., FROM_UNIXTIME('".strtotime($date)."'), ...)");

// retrieving dates
$res = mysql_query("select UNIX_TIMESTAMP(`startDate`) from events where ...");
list($date) = mysql_fetch_row($res);
$date = date('Y-m-d H:i:s', $date);  // convert the timestamp back to a date string

This strategy is a little bit better, in my opinion, because it uses the proper data types for the proper data. However it still requires that we use timestamps as an intermediary. In certain cases you may be unable to use timestamps (either because the rest of the system expects the SQL queries to return and accept date strings directly (not timestamps)), or because timestamps won’t handle the extreme dates in your application (e.g. timestamps are only valid for dates later than 1901 on most systems, and later that 1970 on some).

Strategy 3: Use the MySQL CONVERT_TZ() function

MySQL’s CONVERT_TZ() function will convert a date between two different timezones without requiring the intermediate step of converting to a timestamp. Theoretically, it should be able to accept input of the form:

CONVERT_TZ(`startDate`, 'Canada/Pacific', 'Canada/Eastern');

However if your server doesn’t have the timezone tables installed, then it won’t work correctly with these forms of timezones. The safest way seems to be to specify timezones in the form ‘+08:00’, meaning (8 hours behind GMT). e.g.

CONVERT_TZ(`startDate`, '+01:00', '-08:00');


This creates a small hurdle for us: How do we convert the user’s timezone (in the form ‘Canada/Pacific’) to an offset timezone like ‘+08:00’?

Here is a function that does just that:

/**
 * Returns the current timezone as an offset (e.g. +08:00) of GMT
 **/
utc_offset(){
        // Find the difference in seconds between GMT and local time.
        $diff = gmmktime(0,0,0,12,29,2007) - mktime(0,0,0,12,29,2007);
    $sign = ( ( $diff >= 0 ) ? '+' : '-');
    $diff = abs($diff);
    $hours = str_pad(strval(floor($diff/3600)), 2, '0',STR_PAD_LEFT);
    $minutes = str_pad(strval(floor($diff/60) % 60), 2, '0',STR_PAD_LEFT);
    
    return $sign.$hours.':'.$minutes;
    
}

So we can use our utc_offset() function to get the user’s timezone as an offset from GMT. We can pass this as one of the parameters for CONVERT_TZ(). The other timezone will be the system timezone, which we can conveniently specify as ‘SYSTEM’. e.g.:

-- Convert from -08:00 to system time
CONVERT_TZ(`startDate`, '-08:00', 'SYSTEM')

-- Convert from system time back to -08:00 time.
CONVERT_TZ(`startDate`, 'SYSTEM','-08:00')

We now have all the tools we need to adapt our examples from strategies 1 and 2:

// inserting dates
$date = '2007-12-29 08:00:00';
mysql_query("insert into events ( ..., `startDate`, ...) values ( ..., CONVERT_TZ('$date','".utc_offset()."','SYSTEM'), ...)");

// retrieving dates
$res = mysql_query("select CONVERT_TZ(`startDate`,'SYSTEM','".utc_offset()."') as `startDate` from events where ...");
list($date) = mysql_fetch_row($res);
echo $date;  // 2007-12-29 08:00:00

I used this strategy to add timezone support to Xataface because I needed a solution that would allow me to change the mysql queries without altering any of the code them uses them. It seems to work quite well.

Happy hunting!

Giving PHPBB Another chance

I have been using Plone for my Dataface web site for the past couple of years. Finally, the slow performance, resource heaviness, and lack of a decent forum product forced me to move the site out of Plone and into PHP. This move was inevitable since Dataface itself is a PHP framework so it would only make sense to eventually host the site using Dataface.

When I first set up the site I wasn’t sure how it would be used mostly. I knew I would need a forum for support but I didn’t realize just how important the forum would be. Over 80 percent of traffic was directed to the forum, and Plone’s forum (Ploneboard) was barely up to the challenge as it was lacking quite a few standard forum features (like email notifications), and the community didn’t seem to be too interested in pushing the development along (there is no money in developing message forums).

With the new site, I had a few choices for the forum:
1. I could build my own using Dataface. This would be pretty easy to do, but I figured the time would be better spent on other things especially if I could be a pre-built forum with all the features ready to go.

2. I could use one of the 2 main commercial PHP bulletin board products (VBulletin, and Invision Power Board). These are definitely full featured, and the prices are pretty good, but I kind of wanted to stay open source through and through for this open source project.

3. I could use PHPBB.

PHPBB is the most well known open source PHP bulletin board systems and it has been around for a while. I have used it for web sites in the past, but have had bad experiences, as every one of my past PHPBB sites has been hacked repeatedly. I have heard that it has improved, however, and that the new version is much more cleanly designed. So I am reluctantly giving it another shot.

The Good …

Installation
PHPBB installed without a hitch. I was able to write a simple PHP script to import all of my old Ploneboard posts into PHPBB so that I wouldn’t have to lose the wealth of knowledge.
Performance
It is much faster than Plone (it has less to do), and the refined features (e.g. email notifications) that it has built up over the years should help the community to grow faster and stay stronger. Looking forward to it.

The Bad …

Spam
Within 8 hours of installing PHPBB, the board was being spammed. I found out that Guest posting is enabled by default, so I figured that one was my fault. I turned off guest posting. Within mere hours the board was receiving more spam. Even with PHPBB’s captcha text spammers seem to be able to create accounts and spam the board. This is a symptom of being popular, so spammers have spent lots of time writing bots specifically for this type of board.

There are MODs you can get to try to prevent spam even more, but I think I will rig up my own solutions so that the board can’t be spammed by standard PHPBB methods (they will at least have to come up with some custom spam stragegies for my site if they want to spam me).

Can’t rebulit search index
Because I imported all of my old posts directly into the database, PHPBB didn’t have a chance to index them for searching. So the search doesn’t work. The problem is that PHPBB doesn’t come with an administrator function to rebuild the search index. You have to install a MOD for this. IMHO such an integral part of a forum should be included as part of the main distribution.

The Ugly …

No Modules, only Modifications
One of the exciting things about using a product with a large community is that it has lots of add-ons available in the form of modules. PHPBB has an entire database of MODs that can be installed. The trouble is that in PHPBB, a MOD is a Modification, and not a Module. I.e. Installing a MOD entails following instructions to actually change the templates or source code of PHPBB. From a developer’s perspective this isn’t too good as it doesn’t take long before your forum is unmanageable and un-upgradeable.

I figured that such a large project surely would have developed a module system by now (so that you can install and uninstall modules without changing the main application. I figured wrong.

Summary

So far it doesn’t seem too bad. I’ll have to make a few customizations to shore up security and add a few features that I want, but all-in-all I’m pretty happy with what I see so far with PHPBB.

Plone: where our paths diverge

So here it is. The time has come for my path and Plone’s to diverge. I’m sure we’ll meet again sometime in the future as she has too much potential to shun completely. She showed up at my door adorned with diamonds, so who wouldn’t let her in. She held so much promise. There were times when I fantasized about handing Plone all of my responsibilities, but these dreams were dashed time and time again by tantrums, fights, and utter refusal to do the work at hand. Now that I have handed her her bag and coat, show shows glimmers, once again, of the grace and beauty that convinced me to let her in in the first place. I would reconsider the eviction notice except for the fact that she uses far too many resources and works far too slowly for an operation of my size.

To Plone:
Thank you for making it so easy for me to extract all of your data from you. You are being a little sticky with your passwords, but who wouldn’t be. (No worries… I have sent in a request to the powers that be to tell me how to extract those too… so watch out .. I will get them.

You may be wondering how I will be replacing someone with such a broad range of duties like yourself. I haven’t decided completely. You were primarily used for posting tutorials, news, and hosting a forum, so I will likely just replace you with PHPBB and some homegrown PHP CMS (or Drupal). In any case, whatever I use to replace you will use 50 times less memory and run 50 times faster. That’s really all it comes down to. I didn’t like using you, so how can I expect the other community members to like using you. One day I’ll check on you again to see if your performance has improved at all… but until then, I won’t be using you anymore if I can help it.

P.S.: I am still using you for some sites, but won’t be setting up any new sites with you for quite some time.

Compare the Fuel Economy Ratings of Cars on Craigslist

I was recently in the market for a new car, and I found myself constantly switching back and forth between Craigslist and fueleconomy.gov because I wasnted to see what the gas mileage was like on the cars that were for sale. Suffice to say, this was tedious. So I developed a tool that allows me to see both the Craigslist ads and the fuel economy ratings of the advertised cars in one place. The tool is available for free at http://fueleconomy.weblite.ca.

See the press release here.

Unique RSS Feed for Digg and Reddit

I discovered Digg and Reddit a few months ago and was immediately impressed by the quality and relevance of content contained therein. So I subscribed to their RSS feeds so that I could keep up to date with the latest internet news in my RSS reader.

For those unfamiliar with Digg and Reddit, they are web sites that allow users to rate other web sites and articles that are found on the internet. Users essentially vote for web sites and articles that they like and popular sites show up in the top 100 list.

This presents a problem for RSS feeds, however, since every time a ranking of an article is changed (e.g. it is ranked 21 instead of 22) the timestamp is updated, so it appears at the top of the RSS feed again. Needless to say it is quite annoying to finding 30 or 40 Digg and Reddit articles at the top of my news list every time I refresh my subscriptions.

How should it work?

Preferably when an article breaks into the top 25 or top 50, it will show up in the RSS feed – and will never show up again. Even if it rises to the top. I want my news feed to contain news – not “olds”.

Solution

I created a feed filter that takes RSS feeds and filters out duplicates. Even if the timestamp has changed, an item that has been loaded once, will never be loaded again.
You can find the tool at http://feedfilter.weblite.ca.

Now I can subscribe to Digg and Reddit without being bombarded by old news every time I refresh.

Clickable URLs with javascript

A client wants all email addresses and urls to be clickable in an application. I would like to develop a javascript solution for this.

I found this article that shows some example code for doing this on a particular piece of text, however, fixing the links on an entire page is a little bit more involved as it would require us to only convert the URLs that are static text – and leave the URLs, already inside an <a> tag, or appearing as tag attributes, alone.

My Solution


/**
 * A function that makes all static urls and email addresses in given DOM 
 * element's subtree to be converted to links.
 *
 * @param el DOM Element that we are checking
 *
 * This will recursively call itself until the entire subtree has been walked.
 * @author Steve Hannah .
 *
 * Some parts were adapted from http://www.arstdesign.com/articles/autolink.html
 */
function makeLinksClickable(el){

    //If no element is provided, then we will traverse the entire document.
    if ( el == null ) el = document.body;
    
    
    if ( el.nodeType == 3 /* text node */ ){
        // We only want to replace urls in text nodes
        var s = el.nodeValue;
        
        if ( s.replace(/^\s+|\s+$/g, '').length == 0) return;
            // If there is no text in this text node, we will just return and do
            // nothing.
        
        var hlink = /\b(ht|f)tp:\/\/([^ \,\;\:\!\)\(\"\'\< \>\f\n\r\t\v])+/g;
            // regex for URLs
        var mlink = /\b([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+\b/g;
            // regex for e-mail addresses
            // Copied from http://www.quirksmode.org/js/mailcheck.html
        
        var frag = document.createElement('span');
            // Because we are having difficulty putting just the A tag into
            // the document after conversion, we create a wrapper <span> tag
            // that will hold the resulting <a> tag.  Then we will 
            // insert the <span> tag into the document.

        
        // Replace URLs in this text node.
        var new_s = s.replace(hlink, 
                        function($0,$1,$2){ 
                            //s = $0.substring(1,$0.length); 
                                // remove trailing dots, if any
                            s = $0;
                            while (s.length>0 && s.charAt(s.length-1)=='.') 
                                s=s.substring(0,s.length-1);
                                // add hlink
                                                   
                            return " " + s.link(s); 
                        }
                     ) ;
                     
        // Replace Email addresses in this text node.
        new_s = new_s.replace(mlink,
                    function($0,$1,$2){ 
                            //s = $0.substring(1,$0.length); 
                                // remove trailing dots, if any
                            s = $0;
                            while (s.length>0 && s.charAt(s.length-1)=='.') 
                                s=s.substring(0,s.length-1);
                                // add hlink
                                                
                            return " " + s.link('mailto:'+s); 
                        }
                     ) ;
        
        if ( new_s != s ){
            // We only want to perform the switch if a change was made
            frag.innerHTML = new_s;
            
            var parent = el.parentNode;
            parent.replaceChild(frag, el);
            
        }
    } else {
        // This is not a text node, so we will loop through all of its child
        // nodes and recursively change the urls and email addresses in them.
        for ( var i=0, len=el.childNodes.length; i<len ; i++ ){
            if ( !el.childNodes[i] ) continue;
                // If for some reason this node is not set we skip it
            if ( el.childNodes[i].tagName == 'A' ) continue;
                // We don't want to change anything that is already inside
                // an <a> tag
            if ( el.childNodes[i].tagName == 'SCRIPT' ) continue;
                // We leave scripts alone because we don't want to screw up
                // any javascripts.
            
        
            makeLinksClickable(el.childNodes[i]);
                // Now we recursively call ourselves.
        }
    }

}


// Now we register this function to be called as soon as the document is finished
// loading.
registerOnloadHandler(makeLinksClickable);

But Safari has trouble with string.replace() callbacks:

Apparently Safari has some trouble with the string.replace() method when it comes to passing a function as the second parameter, as we have done in our solution above. See This post for information on how to beat this problem. Then we have a fully-workable, cross-browser solution for converting static URLs and email addresses to links using javascript.