Wednesday, December 5, 2012

Don't concatenate URL parts in your code directly

I know this might seem very basic, but I often see constructs like the following in code:

var url = base_url + url_part;

I recently found a whole bunch of these in my own code :O

This can easily break, since one doesn't know whether the variables contain leading or trailing slashes. Relying on their presence or absence can be dangerous, especially if the parts come from configuration files or any other source where a user entered them.

Consider the following cases:
   base_url           url_part   base_url + url_part
1. http://host/abc/   /def/      http://host/abc//def/
2. http://host/abc    /def/      http://host/abc/def/
3. http://host/abc/   def/       http://host/abc/def/
4. http://host/abc    def/       http://host/abcdef/

As you can see, case 4 is completely broken (the two path segments are merged into a single "abcdef"), and case 1 is possibly broken, depending on how your routing mechanism treats the double slash.

Instead of plain concatenation, I would recommend either using whatever method is available in your language or writing your own url_combine(part1, part2, ...) method. This method should insert missing slashes and remove extraneous ones (consider also trimming padding spaces, if there are any).

As a result you will be able to combine the URL parts like this:

var url = url_combine(base_url, url_part);

All four of the above cases would then result in "http://host/abc/def/" being assigned to url. This keeps the code protected against upstream changes to the URL parts.
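
For illustration, here is a minimal sketch of what such a helper could look like in JavaScript. The name url_combine comes from the text above, but the body is only my assumption about one reasonable way to do it: it trims padding spaces, strips the slashes at the joints between parts, and glues the parts back together with a single slash. It deliberately ignores query strings, fragments and ".." segments, and it drops parts that consist only of slashes, so treat it as a starting point rather than a finished implementation.

```javascript
// Hypothetical helper (a sketch, not a library function):
// joins URL parts with exactly one slash at each joint.
function url_combine(...parts) {
  // Trim padding spaces and ignore empty parts.
  const cleaned = parts
    .map((part) => String(part).trim())
    .filter((part) => part.length > 0);

  return cleaned
    .map((part, i) => {
      // Strip leading slashes from every part except the first.
      if (i > 0) part = part.replace(/^\/+/, "");
      // Strip trailing slashes from every part except the last,
      // so a trailing slash on the final part is preserved.
      if (i < cleaned.length - 1) part = part.replace(/\/+$/, "");
      return part;
    })
    // Parts that were nothing but slashes are now empty; skip them.
    .filter((part) => part.length > 0)
    .join("/");
}

// The four cases from the table above all give the same result.
console.log(url_combine("http://host/abc/", "/def/")); // http://host/abc/def/
console.log(url_combine("http://host/abc", "/def/"));  // http://host/abc/def/
console.log(url_combine("http://host/abc/", "def/"));  // http://host/abc/def/
console.log(url_combine("http://host/abc", "def/"));   // http://host/abc/def/
```

Only the slashes at the joints are touched, so the "//" after "http:" in the first part is left alone.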

The same approach can be applied to combining parts of local file system paths.