Category Archives: Software Development

Posts about software development. Generally I use Java, PHP, and Python for development but occasionally I delve into other things as well.

JavaScript: matching all characters, including new lines

In most languages you can add a modifier to a regex to change the behavior of the ‘.’ operator so that it matches all characters, including new lines (by default it matches everything except new lines). JavaScript doesn’t!

This post shows a nifty solution to this problem:
http://siphon9.net/loune/2011/02/match-any-character-including-new-line-in-javascript-regexp/

EDIT:

Turns out this trick doesn’t work in IE. Here is another one that is supported by IE and apparently all other browsers: Use [\s\S] instead.

http://simonwillison.net/2004/Sep/20/newlines/
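Here is a minimal sketch of the difference; any browser console will do:

    var text = "line one\nline two";

    // '.' stops at the newline, so this match fails:
    console.log(/one.+two/.test(text));      // false

    // [\s\S] means "any whitespace or any non-whitespace" – i.e. truly any character:
    console.log(/one[\s\S]+two/.test(text)); // true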

Adobe CQ5 Developer Training

I just spent the past week in a developer training course for Adobe Communiqué 5.4 – a content management system on steroids. I thought I’d jot down some of my thoughts while they’re fresh in my mind.

CQ5 is a Java-based CMS built around the JSR-283 (Java Content Repository) spec, which essentially defines a sophisticated object database that is indexed by Lucene for easy searching and cross-referencing of objects. CQ5’s JCR implementation is called CRX, but there is also an open source reference implementation named Apache Jackrabbit if you have an allergy to commercial software.

It is not entirely correct to call the JCR an object database as it isn’t used to store Java objects directly – but the fact that it defines a tree of nodes and that all content is stored and accessed in a hierarchical fashion makes its use very similar to that of an object database. As such, it is natural to draw comparisons with Zope and its object database, the ZODB.

JCR vs ZODB

Zope, a Python-based application framework, is radically different from the traditional relational database model of web application development. The ability to store Python objects directly in the database and have them indexed solved many development problems, but it also created a few problems that would make maintenance of an ever-changing web application more difficult. Namely:

  1. When you make changes to a class, it can break all of the existing objects of that class in the database (you need to run a migration).
  2. If you try to load an object whose class definition can’t be found, the system barfs.

This problem of class versions – managing upgrades of content types, and so on – was the single biggest problem with developing on Zope, and while I’m sure there are best practices to work around it, I believe that the JCR approach of storing content nodes rather than actual objects is a much cleaner way of handling content.

The JCR stores a tree of content nodes, each of which has properties and its own child nodes. These structures translate well to formats like XML (so you can dump entire branches of the repository as XML) and JSON – not so with a pure object database like the ZODB, whose structures can be far more complex and include dependencies on classes. Data in the JCR can always be browsed independently of whatever component libraries happen to be loaded into the system. You can browse the repository using WebDAV, using the web-based content explorer built into CRX (the JCR implementation packaged with CQ5), or using CRXDE (the Eclipse-based development environment that is freely available to developers).
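To make the node-tree idea concrete, here is a minimal sketch using the standard JCR 2.0 API. The repository URL and credentials are assumptions, and Jackrabbit’s JcrUtils helper is used to obtain the repository:

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.commons.JcrUtils;

    public class JcrHello {
        public static void main(String[] args) throws Exception {
            // Connect over CRX's DavEx endpoint (URL and credentials are assumptions).
            Repository repo = JcrUtils.getRepository("http://localhost:4502/crx/server");
            Session session = repo.login(
                    new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                // All content lives in one tree: nodes with properties and child nodes.
                Node hello = session.getNode("/content").addNode("hello", "nt:unstructured");
                hello.setProperty("jcr:title", "Hello JCR");
                session.save();

                // Reading it back is just walking the same tree.
                System.out.println(session.getNode("/content/hello")
                        .getProperty("jcr:title").getString());
            } finally {
                session.logout();
            }
        }
    }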

You can still define custom node types for your repository but this would merely dictate the name of the node type and perhaps which properties are required.

So, at first glance, this seems like a very stable base upon which to build web applications.

The Stack

The CQ5 stack looks like this:

  • WCM – The web content management layer, consisting of a bunch of flashy UI components built using the ExtJS JavaScript library (this part is proprietary).
  • Sling – An HTTP layer that makes it easy to read from and write to the repository using plain HTTP requests; see the sketch after this list. Very slick (this part is open source).
  • CRX – The content repository itself. Handles all permissions, storage, replication, etc. This part is proprietary. It performs the same function as Apache Jackrabbit, but includes a number of enterprise-level improvements, including a more powerful security model (I am told).
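To give a feel for the Sling layer, here is a sketch of reading and writing repository content over plain HTTP. The instance URL, credentials, and paths are assumptions, but the .json extension and the form-field-to-property POST behaviour are standard Sling conventions:

    # read a node and its properties as JSON
    curl -u admin:admin http://localhost:4502/content/mysite/hello.json

    # create or update a node by POSTing form fields; each field becomes a property
    curl -u admin:admin -F "jcr:title=Hello" -F "text=Created over HTTP" \
         http://localhost:4502/content/mysite/hello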

Author & Publish Deployment Instances

The recommended deployment is to have separate author and publish environments, each running its own stack, and to use the built-in replication feature to propagate authors’ changes to the publish instance whenever a piece of content is activated. This functionality, luckily, has been streamlined to hide most of the complexity. Workflow is built in so that each piece of content can be activated individually, and activation automatically triggers replication to the publish instance(s). This model seems very well suited to websites with few authors and many public viewers. It also scales well, as you can add as many publish instances as you want to share the load.

This standard flow of content (replicating changes from the author instance to the publish instances) leads me to wonder about cases where you do want the public to interact with your site (e.g. through comments). We didn’t get into this scenario very much in the training but, as I understand it, any content posted to a publish instance goes into an “outbox” for that instance, is replicated back to the author instance, and awaits approval there. Once approved, it is re-replicated out to the publish instances.

Security Model

The security model is quite different from that of most systems. Rather than attaching security to content types, as you might with a relational database (there are no content types here), or defining a large set of permissions corresponding to every possible action in the system as Zope does, security is attached 100% to the nodes themselves. Each node in the JCR carries an ACL (access control list) that maps a small set of permissions to each user. Only a few permissions can be granted or denied on each node: essentially read, write, delete, create child nodes, set permissions, and read permissions, all at the node level. If a user has no permissions assigned on a particular node, the node inherits its permissions from its parent.
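A minimal sketch of what this looks like in code, using the standard JCR 2.0 access control API. The path and group name are assumptions, and a real repository normally wants a principal it already knows about rather than the ad hoc one created here:

    import java.security.Principal;
    import javax.jcr.Session;
    import javax.jcr.security.AccessControlList;
    import javax.jcr.security.AccessControlManager;
    import javax.jcr.security.AccessControlPolicy;
    import javax.jcr.security.Privilege;

    public class AclSketch {

        /** Grants jcr:read to the named group on one node; descendants inherit it. */
        public static void allowRead(Session session, String path, final String groupName)
                throws Exception {
            AccessControlManager acm = session.getAccessControlManager();

            // Reuse the ACL already bound at this path, or take an applicable one.
            AccessControlList acl = null;
            for (AccessControlPolicy p : acm.getPolicies(path)) {
                if (p instanceof AccessControlList) {
                    acl = (AccessControlList) p;
                }
            }
            if (acl == null) {
                acl = (AccessControlList) acm.getApplicablePolicies(path)
                        .nextAccessControlPolicy();
            }

            // The entire permission vocabulary is a handful of privileges like this one.
            Privilege[] read = { acm.privilegeFromName(Privilege.JCR_READ) };
            acl.addAccessControlEntry(new Principal() {
                public String getName() { return groupName; }
            }, read);

            acm.setPolicy(path, acl);
            session.save();
        }
    }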

One implication of this security model is that you must pay attention to the content hierarchy when developing applications. You cannot treat this like a relational database!

This is important. I suspect that many developers coming from a relational database background will be tempted to try to merge the best of both worlds and create pseudo-content types in the system. After all, all properties in the JCR are indexed, so you could easily add a property called ‘contentType’ to your nodes to identify them as instances of a particular content type, then build functionality that allows users to add instances of that type. You could then create view templates that aggregate these nodes and treat them as a table. You can do this, but you must be aware that you don’t have the same level of control over what a user can do with your content types that a relational database system would give you.

If you query the repository solely on a node property, and not on the path, you may be surprised by the results you get. You cannot control which properties get added to nodes elsewhere in the repository, so querying on property values alone may pull in nodes you never intended to match. At the very least, the JCR security model, despite appearing simple, is far harder to reason about than its relational cousin when you try to imitate relational behaviour. Instead, you have to fully embrace the hierarchical model of data, and step very carefully when importing concepts from other paradigms, as they can cause you to inadvertently introduce security holes. The sketch below shows the difference between the two kinds of query.
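A sketch in JCR-SQL2; the ‘contentType’ property and the paths are the hypothetical ones from the discussion above:

    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;
    import javax.jcr.query.QueryResult;

    public class QuerySketch {

        public static QueryResult findEvents(Session session) throws Exception {
            QueryManager qm = session.getWorkspace().getQueryManager();

            // Property-only: matches ANY node carrying the property,
            // wherever a user managed to create one.
            Query loose = qm.createQuery(
                    "SELECT * FROM [nt:unstructured] WHERE [contentType] = 'event'",
                    Query.JCR_SQL2);

            // Path-restricted: scoped to a subtree whose structure and ACLs you control.
            Query scoped = qm.createQuery(
                    "SELECT * FROM [nt:unstructured] AS n"
                    + " WHERE n.[contentType] = 'event'"
                    + " AND ISDESCENDANTNODE(n, '/content/events')",
                    Query.JCR_SQL2);

            return scoped.execute(); // 'loose' is shown only for contrast
        }
    }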

Custom Content Types (Sort of)

While CQ doesn’t have custom content types, it does let you map content nodes to a set of rendering scripts, which produces something very much like a content type. By setting the “sling:resourceType” property on a node to the path of a “component” that you develop, you dictate where CQ looks for the scripts used to render that node when requests come in. Components can be either “page” components, which represent an entire page, or regular components, which are included inside a page.
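For illustration, a sketch of the rendering side: the JSP for a hypothetical component at /apps/mysite/components/greeting, which renders any node whose sling:resourceType points at it. The paths and property names are assumptions; the global.jsp include is the standard CQ5 idiom that sets up the properties object:

    <%-- /apps/mysite/components/greeting/greeting.jsp --%>
    <%@ include file="/libs/foundation/global.jsp" %>
    <div class="greeting">
        <h2><%= properties.get("jcr:title", "Untitled") %></h2>
        <p><%= properties.get("text", "") %></p>
    </div>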

You can register page components to show up in the list of page types that authors can choose from when they add a new page to the system. Similarly, you can register your regular components to show up in the “sidekick” (i.e. the component palette) when authors are editing a page, so that they can be dragged onto the page. You can define which types of components are allowed to be parents or children of other components, and you can define which parts of the site are allowed to have particular component types added.

The Component Hierarchy

You can also define a “resourceSuperType” for components to allow them to inherit from other components in the system. This is handy for code reuse, as there are hundreds or thousands of existing components that can be overridden or extended. We ran through several exercises creating and extending components, and I’m satisfied that this process is not difficult and is quite powerful.

Component Dialogs

A component without a dialog is really a lame duck. Users (especially authors) need to be able to interact with your components. E.g. if you create a photo album component, you need to allow your users to add photos to it. Adding dialogs is not difficult, but I suspect that the development process is slated for more automation in future releases. The dialog forms are created entirely by building appropriately named subtrees under your component’s node. E.g. you create a child node of a particular type named “dialog”, which contains a child node named “items”, which contains a subnode named “tabs”, etc., 6 or 7 layers deep. The sketch below shows the general shape.
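Something like this; the node types and the xtype come from the stock CQ5 widget vocabulary, though the exact shape varies by component:

    mycomponent/
    └── dialog                 [cq:Dialog]
        └── items              [cq:WidgetCollection]
            └── tabs           [cq:TabPanel]
                └── items      [cq:WidgetCollection]
                    └── tab1   [cq:Panel]
                        └── items          [cq:WidgetCollection]
                            └── title      [cq:Widget]
                                           xtype="textfield"
                                           fieldLabel="Title"
                                           name="./jcr:title"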

Each tab, each widget, each panel is represented by a node in the repository. This is clever but somewhat tedious. It is like building a UI using only the UI hierarchy tree in the left panel of an IDE, without the visual editor. I suspect that future versions will include a proper WYSIWYG editor for developing these dialogs, but for now this manual system will have to do.

Despite the tediousness of the process, in the scheme of things it is still quite efficient. In only a few minutes you can produce a multi-tab, multi-field UI with rich widgets that allows your users to add and edit a myriad of content types on your site.

TestDisk a Nifty Utility for fixing drives with bad boot sectors

Just ran into an interesting problem with an external hard drive that was being used as a Time Machine backup for a laptop. Someone tried to connect the drive to their Windows machine, and it evidently screwed up the boot sector, so that not only would Windows not recognize it, Macs wouldn’t recognize the disk either.

Tried running it through Disk Utility but received a message saying “Disk cannot be repaired.”

So I loaded up TestDisk and took it for a spin. Here is a photo gallery outlining the steps that I took.

CentOS and old versions of PHP

When you ask someone to set you up with an install of the latest CentOS, you’re still going to be stuck with an old version of PHP (5.1.6), which is missing many, many useful features and includes a few bugs. Then salt is applied to the wound when you try to update PHP in yum and discover that 5.1.6 is the latest version in the repository.

Here is a post that explains the easy way to update to PHP 5.2.x, without having to build from source.

http://www.freshblurbs.com/install-php-5-2-centos-5-2-using-yum

It relies on this third party Yum repository that contains a more up-to-date version of PHP and its extensions.
http://www.jasonlitka.com/yum-repository/
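The gist of the approach is a drop-in repo file followed by a normal yum update. The baseurl and key locations below are my recollection of the linked post, so double-check them there:

    # /etc/yum.repos.d/utterramblings.repo
    [utterramblings]
    name=Jason's Utter Ramblings Repo
    baseurl=http://www.jasonlitka.com/media/EL$releasever/$basearch/
    enabled=1
    gpgcheck=1
    gpgkey=http://www.jasonlitka.com/media/RPM-GPG-KEY-jlitka

Then:

    yum update php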

Getting Serious about Data Redundancy

Reluctantly, I have had to assume the role of “server guy” at my translation company. I generally prefer to focus on the creative side of web application development, but I’m not naive enough to think that server backups and security can be completely ignored… so it falls to me to make sure that we are prepared for a catastrophe of any kind. This weekend I spent some time reviewing our current situation and implementing improvements.

In reviewing our backup strategy we must first consider what type of catastrophes we want to be prepared for. Some possible problems we might face include:

1. The site could be hacked and data corrupted or deleted.
2. We could experience hardware failure (e.g. a hard drive could die or a server could conk out).
3. We could face a major regional disaster like the earthquake/tsunami that recently hit Japan.

We also need to consider our tolerance for down time and the frequency of database changes. E.g. a simple backup strategy might involve keeping an off-site copy of the files and database so that you can retrieve them if necessary. But for larger setups this strategy may take upwards of 24 hours to get back online after a failure (re-uploading the data, setting up the server configuration, etc.). If you’re working in an environment where even a few minutes of down time is a problem, then you need a strategy that gets you back online much faster.

Similarly, if you are only backing up once every 24 hours, you could potentially be losing 24 hours worth of user updates if you had to revert to a backup.

In our case we are running a 2-tier backup strategy:

1. Hot backup: a backup that is always synchronized with the live site so that it can be brought online with the flip of a switch.
2. Archived backup: the focus of this backup is to be able to withstand a regional catastrophe, or to revert to a previous version of the data in case of corruption that has also affected the hot backup.

Hot Backup Strategy

For the hot backup we are using MySQL replication to run a slave server that is always in sync with the master. This is useful for 2 reasons:

1. If there is a failure on the master’s hard drive, then we can switch over to the slave without any down-time or loss of data.
2. If we need to produce a snapshot of the data (and shut down the server temporarily), it is easier to work off the slave, so that the live site never needs to be taken offline for backup maintenance. (A bare-bones configuration sketch follows.)
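The heart of such a setup is a few lines of MySQL configuration plus one statement on the slave. This is a generic sketch (server IDs, host, credentials, and binlog coordinates are placeholders), not our exact config:

    # master my.cnf
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf
    [mysqld]
    server-id = 2

Then, on the slave (the coordinates come from SHOW MASTER STATUS on the master):

    CHANGE MASTER TO
      MASTER_HOST='master.example.com',
      MASTER_USER='repl',
      MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='mysql-bin.000001',
      MASTER_LOG_POS=4;
    START SLAVE;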

Archived Backup Strategy

We are running our more critical sites on Amazon’s Elastic Compute Cloud (EC2) because of its ease of scalability and redundancy. We are using Elastic Block Storage (EBS) for the file systems that store both the application files and the database data. This makes it easy for us to take snapshots of the drives at any point in time. For our database backups, we periodically take a snapshot of the EBS volume containing our database data. (First we set a read lock on the database and record the master status, so we know exactly which point in the binary log the snapshot corresponds to.) If there is a failure at any point, we just load in the most recent snapshot, then rebuild the data incrementally from the binary log.
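The lock-and-record step is just a couple of statements; a sketch of the sequence, run against the instance being snapshotted:

    FLUSH TABLES WITH READ LOCK;  -- quiesce writes so the EBS snapshot is consistent
    SHOW MASTER STATUS;           -- note File and Position: the binlog coordinates of the snapshot

    -- ... trigger the EBS snapshot here (from the AWS console or API) ...

    UNLOCK TABLES;                -- let traffic resume; the snapshot completes in the background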

Amazon’s snapshot feature for EBS is a real life saver. Actually copying the data when you’re talking about hundreds of gigabytes is quite time consuming. With EBS, however, it only takes a minute or two, as it uses a clever incremental scheme for producing the snapshot. This is one of the key reasons why I’m opting to use EC2 for our critical sites.

Just in Case

Amazon claims to keep redundant copies of all of our snapshots, distributed across different data centres… but just in case, I like to have a local backup available for our purposes. So I use rsync to perform a daily backup to a local hard drive. Hopefully I never need to use it, but it is there just in case.
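The daily job is a one-liner in cron; the host and paths here are placeholders:

    # crontab entry: pull the latest backups down to a local drive at 3am
    0 3 * * * rsync -az --delete backup@db1.example.com:/var/backups/ /Volumes/Backup/db1/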

We can always get better…

This backup strategy helps me sleep at night, but there are still some things about it that could be improved. Our database backups are now pretty rock solid, as we could recover from nearly any failure without data loss using a combination of a snapshot and the binary log. For the file system, however, we don’t have the equivalent of a binary log to rebuild the file system from the most recent snapshot. I know this can be achieved, and more seasoned “server people” probably think it is a no-brainer, but I’m a software guy, not a server guy, so go easy…

Adventures with KeepAlive Slowing down PHP

Disclaimer: To my regular readers (if there are any): this post is entirely technical in nature, so unless you’ve come looking for the solution to this particular problem (e.g. from Google), you’re probably not interested in this post.

I run a few sites that use PHP as a reverse proxy for other sites. The PHP script loads a specific webpage; sub-requests for resources like stylesheets, images, and JavaScript files are then also processed by the same PHP script. For most of these resources the script simply passes the content through unchanged.

Problem:

When a page that includes a large number of supporting resources like CSS and JavaScript files is loaded, some of those resources take a long time to load. Some scripts were taking upwards of 15 seconds to load as part of the page (although, loaded on their own through the PHP script, they load instantly).

Short Term Solution

Investigation determined that the problem lies with Apache’s KeepAlive setting. Turning KeepAlive off fixes the issue and consistently yields excellent load times for all resources. But WHY??

Long Term Solution

KeepAlive is generally a good thing, and it is supposed to boost performance. So why is it a bad thing in this context? (I really don’t know the answer yet.) I would like to find a PHP solution that works with KeepAlive enabled, but I can only speculate at this point as to why this might be happening.

Possible reasons that I’ve considered:

  1. I need to do something more than “exit” at the end of the PHP script to declare that I’m done with the request – as Apache could be keeping the connection open for longer than necessary and preventing subsequent requests from happening.
  2. Perhaps there is some setting that I am unaware of in PHP that limits the number of simultaneous requests it can handle from the same client… (I find this unlikely – at least at the level that we’re talking about here).

Edit:

I found that simply adding the ‘Connection: close’ header in the PHP script resolves the issue (a sketch follows below).
But I’m still not satisfied. This shouldn’t be necessary, IMHO. If you set the Content-Length header properly and then flush the content to the browser, there’s no reason why PHP should make the browser wait around indefinitely.
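For anyone hitting the same problem, here is a minimal sketch of the fix as applied to a pass-through resource; the upstream URL and content type are placeholders for whatever the proxy is serving:

    <?php
    // Fetch the upstream resource that the proxy is passing through.
    $url  = 'http://upstream.example.com/css/style.css'; // placeholder
    $body = file_get_contents($url);

    // Declare exactly how much we are sending, then tell the client we're done.
    header('Content-Type: text/css');
    header('Content-Length: ' . strlen($body));
    header('Connection: close'); // this header is what stopped the hangs

    echo $body;
    flush();
    exit;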

So maybe we’re missing a few performance points here – but at least it’s not hanging like it was.

How I use the iPad

It’s been close to a year since I purchased my iPad, so I thought I’d post a short piece reflecting upon how I have ended up using it.

Things I tried that didn’t really work:

1. Word processing – The keyboard is OK for typing short little things, but it is just too cumbersome for writing anything substantial. I think that even if I had the Bluetooth keyboard it still wouldn’t be a good solution for word processing. Selecting text, copying and pasting, and even just trying to insert text at a different position in the document are just too difficult at this time.

2. Note taking – I tried taking the iPad to a few meetings for note taking. It was a novelty, but ultimately it was just easier to take notes on paper and transcribe them on the laptop later.

Things I tried that really work:

  1. Reading the newspaper. I use the PressReader app, which gives me access to 1,700 newspapers from around the world for $30 per month. I generally read through the Vancouver Sun, the Province, 24 H, and the Washington Post every morning before I go to work.
  2. Reading books. (on the Kindle app – not iBooks)
  3. Watching Movies in bed. I use Web Lite TV to be able to stream my entire iTunes movie collection to my iPad. This effectively turns the iPad into my bedroom TV. I also use the Netflix app which works quite well when I want to watch a movie or TV show that I don’t currently have in my personal collection.
  4. Facebook in bed. Rather than use a Facebook app, I just added the Facebook site to my home screen (so it gets an icon and acts like an app; see the snippet after this list). I much prefer checking Facebook on my iPad to logging in on my computer.
  5. Reading Email in bed – I check my email from bed on the iPad. This allows me to make a mental inventory of things that I need to reply to. Short replies I will make on the iPad directly, but generally I’ll go to a computer to write more detailed replies.
  6. Browsing the web in bed. It’s just easier to use an iPad than a laptop in bed. And it’s pretty damn easy to browse the web on the iPad.
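As an aside: the home-screen trick works because iOS Safari honours a few head tags that any site can provide. A sketch of the general idiom (the icon path is a placeholder):

    <!-- home-screen icon, and app-like (chrome-less) behaviour once launched from the icon -->
    <link rel="apple-touch-icon" href="/touch-icon.png">
    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="viewport" content="width=device-width, initial-scale=1">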

The iPad has secured a permanent place in my life – though it hasn’t threatened to replace the laptop in the foreseeable future. Of course, this is what Apple intended when they designed it.

Why do you need an app for that?

I have recently stopped using the Facebook and Mail apps on my iPad and iPhone in favour of the HTML equivalents offered through Safari. The reason: mobile-optimized web applications are now as good as, or in many cases better than, their native app equivalents. In the case of the Mail app, I have found that Gmail’s mobile version has a superior search feature and is much faster at loading my messages. For Facebook, I just found the app more limited and buggier than the HTML equivalent.

In general, I’ve come to the conclusion that if you’re going to build a native app for something, you’d better have a good reason – and good reasons are becoming fewer as browsers improve their support for HTML5. The only valid reason at this point is a requirement for significant client-side processing, e.g. a 3D game. But as WebGL matures, even this will be quite possible in HTML.

So here are some reasons for developing your mobile applications in HTML5:

  1. Increased productivity in most cases over development of native apps.
  2. Real standards-based, cross-platform support so you can write it once and have it work on all major platforms (Android, iPhone, Blackberry, etc…).
  3. True freedom of deployment. You are not locked into Apple’s (or the next would-be gatekeeper’s) store or subject to their sovereign choice of what can and cannot be installed.

The remaining reasons why you may still need to develop a native app:

  1. You want to use a platform specific feature that isn’t available through the web api. E.g. the accelerometer, or the contacts list, or push notifications, GPS, etc..
  2. You need more client-side processing power than you can harness through the web.

The compelling reasons to want to develop a native app:

  1. To get it into the app store and maybe make some money.
  2. Toolkits (e.g. Xcode/Interface Builder) are developed and promoted specifically for making apps that target a specific platform. This can make it seem easier to break into the market, since there are lots of learning resources on how to use these tools.

The biggest challenge right now facing HTML mobile developers is that the tools are less promoted and more scattered. This is because the major stakeholders (e.g. Apple) have a significant interest in getting you to develop your app natively so that it will be exclusively available for their platform, and they will get a cut of any revenue through their store model. If you look, however, there are tools out there for building slick HTML mobile apps that look and feel very much like native apps. Some examples include:

  1. Sencha Touch
  2. jqTouch

And there are more where those came from. If you still want to develop a native app but would prefer to work with open technologies like HTML and JavaScript, you may want to look at Appcelerator, which provides tools to develop applications in JavaScript, HTML, and CSS that can be compiled into native apps for all the major smart phone platforms (e.g. Android, iOS).

.NET: The most unlikely savior of Desktop Java on the Mac

When Microsoft first introduced the .NET platform and its flagship programming language C#, I, like many Java developers, looked at it and said “They’re just copying Java”. Why would I want to develop on the .NET platform, which was targeted exclusively at Windows, when I could use the more established Java language and deploy across all platforms, including Mac and Linux?

When I heard rumblings of an open source version of .NET, called Mono, being developed, I was curious but ultimately wrote it off: since .NET was married to Windows in so many ways, most of the libraries for .NET would probably be platform specific and wouldn’t run properly on Mono anyway.

The past 10 years have seen many new trends come and go, and quite a few shifts in mindshare between the different technologies. One stream of trends that I have been quite interested in as a Java developer and a Mac user, is the role of Java on the Mac.

When Apple first unveiled OS X, Java was at the center of it. They promoted the Mac as an excellent platform for Java developers to deploy their programs, and pledged to provide a Mac-specific implementation of Java that would blend into the OS and work seamlessly with the slick new look and feel. Two key pieces of this puzzle were the Swing Aqua look and feel and the Java-Cocoa bindings. The Aqua look and feel allowed all Swing programs (Swing being Java’s default UI toolkit) to look like native programs on the Mac. The Java-Cocoa bindings allowed even deeper integration, letting Java programs use native Objective-C classes and widgets directly.

Fast forward to 2011. If you do a Google search for desktop Java on the Mac or related terms, you’ll notice lots of articles, tutorials, and documents from the period 1997 to 2005, but very little after that. This is due to a number of trends and developments during that time span, including:

  1. Apple deprecated the Cocoa-Java bridge. (It turns out it wasn’t used very much anyway, because Java developers could achieve an almost native look and feel using Swing while keeping cross-platform compatibility.)
  2. Java mindshare moved predominantly to server-side technologies.
  3. Higher-productivity interpreted dynamic languages like Ruby and Python emerged and stole a lot of desktop mindshare.
  4. Objective-C, through the introduction of the iPhone and iPod Touch, drastically increased in mindshare – so more developers were familiar with the native tools and would forgo trying to work with a language such as Java for native applications.
  5. Sun seemed confused as to which direction it wanted to go, and years of progress were lost (* more on this later).

Nonetheless, from 2005 to the present there have still been some good options for producing high quality desktop applications for the Mac. After all, it was the only OS with a large user base that shipped with Java. This meant that you could develop in Java and be certain that your target users would be able to use your software without having to download anything extra. Swing still has the Aqua look and feel, and all of those tools and widgets developed pre-2005 still work nicely (except, of course, those built upon the Cocoa-Java bridge).

Unfortunately the writing was on the wall, and Apple made it official in October 2010 when it announced that it would be deprecating Java on the Mac and that future versions of the operating system would not ship with it. It would be up to the open source community and Java’s owner, Oracle, to provide a Java for the future of the Mac (and this future is still very much unfolding as I type).

So now, at a time when the future of Java on the Mac is as bleak as ever, an unlikely ally enters the fray: .NET – or rather its open source twin, Mono.

Mono has quietly been picking up a following over the past 10 years. It reached 1.0 status in 2004, and it has facilitated the development of two separate projects which, together, appear to offer the best hope for the future of Java on the Mac:

  1. IKVM.NET – A Java virtual machine that runs on .NET and Mono. It can run Java byte code on Mono and use Java libraries natively. It also includes a tool to statically compile Java into a .NET assembly that can be used in .NET or Mono applications (see the sketch after this list). This has opened many doors for both C# and Java, allowing libraries developed in Java to be quickly compiled and distributed for .NET (e.g. Apache’s PDFBox, which is developed in Java but available for both .NET and Java).

  2. The Mono Mac project – An Objective-C binding that allows C# code to directly access the Mac’s Cocoa classes. Current versions of MonoDevelop (the open source Mono IDE) work seamlessly with Apple’s developer tools, especially Interface Builder, so developing and deploying a Mac application in C# is a first class experience.
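The static compilation step that IKVM.NET provides is a single command. A sketch, using PDFBox as the example jar (file names are placeholders):

    # compile a jar into a .NET/Mono assembly with IKVM's static compiler
    ikvmc -target:library pdfbox.jar -out:PDFBox.dll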

These two projects together open up a myriad of possibilities for Java on the Mac that haven’t been available since the deprecation of the Cocoa-Java bridge. If you have a large existing code base of Java source and libraries, you can now quite easily compile it into a Mono library, use it in a Mono Mac application, deploy it natively on the Mac, and even distribute your application in the Mac App Store.

And this is how a Windows-only competitor of Java has evolved into an unlikely ally in keeping Java relevant on the Mac platform.

How to get a Java Application into the Mac App Store

I just received a reply from Apple support confirming that there is nothing in their guidelines that prohibits apps that include an embedded Java virtual machine. Therefore, Apple’s regrettable decision to deprecate Java and disallow apps that depend upon it from entry into their new App Store is a mere speed bump and not the road block that I had originally believed it to be. So there are a number of strategies for getting Java into the Mac App Store:

  1. Embed a Java runtime environment as part of the application bundle. Currently there are no Mac JVMs that can be distributed which support Swing with the Quartz UI (i.e. graphical Swing apps won’t look native, if you can even get them to work). However, Oracle and Apple’s recent announcement that Apple will be donating source code to the OpenJDK project suggests that this will be resolved with Oracle’s release of JDK 7 on the Mac. In the mean time it would be possible to embed a third party JVM like Soy Latte or IKVM (a .NET implementation of Java) and use Rococoa as the UI.

  2. Use GCJ (the GNU Java Compiler), which can be installed via MacPorts, in conjunction with Rococoa for the UI, to compile your application into a native binary.

Resources

  • Mono for Mac Resource: This page shows some resources related to Mono support on the Mac. Why is this relevant? Because Mono is the open source, cross-platform implementation of the .NET framework, and it supports IKVM, a fully functional Java virtual machine that runs on Mono. IKVM will allow you to convert your Java applications into .exe files that will run on Mono, which can be embedded in your application bundle.
  • Example using Rococoa to access the Quartz framework from Java
  • Rococoa GUI Demo: This tutorial shows a short sample of how to use Interface Builder to build an interface for your Java application using Rococoa.