Thursday, 18 December 2008

Using S3 With dm-paperclip

dm-paperclip is a port of Paperclip to DataMapper. It's a fairly straightforward port, so most of the Paperclip documentation also applies to dm-paperclip. One of Paperclip's most attractive features is its ability to host your files on Amazon S3 instead of locally. Here's how you do it:

You will need 3 extra parameters on your has_attached_file declaration in your model:

has_attached_file :photo,
  :storage => :s3,
  :s3_credentials => Merb.root / "config" / "s3.yml",
  :path => ':attachment/:id/:style.:extension'
validates_attachment_presence :photo
validates_attachment_size :photo, :in => 1..200000

The config/s3.yml file stores your S3 credentials and bucket name. This is very handy because you can have different configs and inheritance, just like database.yml:

development: &defaults
  access_key_id: ...
  secret_access_key: ...
  bucket: project_images_development

production:
  <<: *defaults
  bucket: project_images_production
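The `<<: *defaults` merge key copies every key from the mapping anchored as `&defaults` into `production`, and any key written alongside it takes precedence. As a rough illustration only, here is that resolution expressed with plain Python dictionaries (the credential strings are placeholders, and `merge_defaults` is a hypothetical helper, not part of dm-paperclip):

```python
def merge_defaults(defaults, overrides):
    """Mimic YAML's `<<:` merge key: start from the anchored mapping,
    then apply the keys written explicitly in this environment's block."""
    merged = dict(defaults)   # copy, so the anchored mapping is untouched
    merged.update(overrides)  # explicit keys win over merged ones
    return merged

development = {
    "access_key_id": "YOUR_ACCESS_KEY_ID",          # placeholder
    "secret_access_key": "YOUR_SECRET_ACCESS_KEY",  # placeholder
    "bucket": "project_images_development",
}

production = merge_defaults(development, {"bucket": "project_images_production"})
```

So production shares the development credentials but points at its own bucket, which is exactly the inheritance database.yml users are used to.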

That's it!

Sunday, 30 November 2008

"Update All" Has Returned To The App Store

I updated my iPhone to 2.2 today and was very pleasantly surprised to see that the ability to update all the applications at once has returned.

Upgrading A Jailbroken iPhone From 2.1 to 2.2

To upgrade an iPhone that has been jailbroken on version 2.1 to version 2.2:
  • Open up iTunes and plug in your iPhone
  • Sync and backup your iPhone
  • Open iTunes and click Restore to restore the factory defaults; this should also prompt you to upgrade to 2.2
  • Use QuickPwn to jailbreak your fresh 2.2 install
  • Restore your backup
NOTE: You can also use AptBackup from Cydia to backup your jailbroken apps.

iPhone Tethering Done Right: PdaNet

Tethering an iPhone to another device, usually a laptop, is not allowed. AT&T contracts ban the practice (though this is rumoured to be changing soon). However, when you're stuck without a decent WiFi connection, an iPhone tether is a tempting proposition. Sadly, since it is banned, Apple have not made it easy.

Most solutions revolved around something called a SOCKS proxy. A proxy is just an application running on the iPhone that the tethered device communicates with in order to retrieve data from the web. Despite its simplicity, there is a major drawback: every application that needs to use the internet must have SOCKS support built in. Many applications do, but the implementations are usually hastily added and tend to be buggy (even Firefox has issues).
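To give a feel for what "SOCKS support built in" means, this is roughly the start of the handshake every application has to speak to the proxy under SOCKS version 5 (RFC 1928). The sketch only builds the first two client messages, a greeting offering "no authentication" and a CONNECT request by hostname; a real client would also send them over a TCP socket to the proxy and parse its replies. The function names are illustrative:

```python
import struct

SOCKS_VERSION = 0x05
METHOD_NO_AUTH = 0x00
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03  # destination given as a domain name

def socks5_greeting():
    """Greeting: protocol version, number of auth methods, the methods."""
    return bytes([SOCKS_VERSION, 1, METHOD_NO_AUTH])

def socks5_connect_request(host, port):
    """CONNECT request asking the proxy to open a TCP connection to
    host:port. Sending a domain name lets the proxy do the DNS lookup."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)  # port in network byte order
```

Every program has to get these details (and the error replies) right, which goes some way to explaining why so many implementations are flaky.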

If you were lucky enough to acquire the NetShare iPhone application (a SOCKS proxy that was available on the App Store for a very short period before being removed by Apple), you do not need to jailbreak your iPhone to be able to tether a device. Most people, however, will need to jailbreak. Fortunately, this is really easy these days and quite safe. A tool such as QuickPwn will do the job with very little fuss.

Once your iPhone is jailbroken, you have a number of proxy server choices, from the NetShare-style 3proxy to the old-school SSH server. However, there is a much better solution: PdaNet. This is not a SOCKS proxy; it acts more like a software router, passing all your network traffic over the iPhone's internet connection. This means that your applications don't need to have SOCKS support built in.

There are a number of issues with PdaNet. Sometimes the DNS setup doesn't work very well (especially using an Ad-Hoc network on a Mac). It will also drain your battery heavily, so you will need to plug the iPhone in for it to last any long period (e.g. over an hour). You also have to pay to use it with encrypted websites (i.e. HTTPS), but there is a way around that. Just visit the link printed at the bottom of the PdaNet screen on the iPhone, pretend to buy the software, but stop when you hit the payment screen. You can then browse any site you want without paying!

Despite these issues, it's definitely the best tethering solution at the moment. You might also like to try tools like Proxifier for making SOCKS proxies easier (and more reliable) to use.

Facebook Group Email Limits

Facebook do not allow the owners of groups with more than 5,000 members to message all their members at once. Facebook have made this decision to stop people from acquiring large Facebook groups in order to spam the members. A group with over a million members would be incredibly valuable, as it offers a very direct and clutter-free communication channel to the members.

However, I think that Facebook have missed a trick here. They have been struggling to find a revenue model for a while now, and allowing this kind of marketing might give them a steady income stream by charging companies who want access to their group members. For example, say a million people sign up to the 'I Love Sprite' group. Sprite would have no way of sending a message to these users, despite them all having expressly shown their interest in Sprite.

Facebook could charge Sprite to message their users. This fee could also help to pay for decent quality control and to enforce any rules they might have. Facebook is already proficient at rejecting low-quality ads (though they do have an inventory problem), so this shouldn't be difficult for them.

Saturday, 29 November 2008

Facebook Loses Data

A schoolboy mistake from the king of social networks today. They sent me this email:

Please reset your email notification settings

Unfortunately, the settings that control which email notifications get sent to you were lost. We're sorry for the inconvenience.

To reset your email notification settings, go to:

The Facebook Team

Made me chuckle

Sunday, 23 November 2008

Spotify Rocks

Spotify is a new music service that lets you listen to unlimited amounts of music for free. They have an impressively complete collection and you have complete control over what you listen to. The only caveat: it's ad-supported. However, the ads are sparse (about one 30-second ad for every album you listen to).

I was initially skeptical about having ads. I hate radio stations that will have you listening to adverts all day long or TV stations that play a few minutes of ads every 20 minutes. But adverts in Spotify are rare enough that it doesn't bother me. If you don't like the adverts, you can pay a subscription at £9.99 a month or a day pass for £0.99 (for parties perhaps?). I think the £9.99 a month is a little steep, but luckily I'm happy with the ad-supported version.

They have most of the music I like to listen to. However, they often don't have the obscure first albums (produced by tiny labels), which is a real shame. I expect this will change if Spotify gets big. There are a couple of big names who are not available. The only one I have really missed is Metallica's new album.

Each artist has links to similar artists. This is nice for finding new music, but doesn't appear to be particularly clever. That said, exploring is effortless because songs are played instantly and the browsing interface is very well designed. Songs and playlists can be sent to friends using special URLs, but these have to be emailed or IMed; it would be much better to have a more tightly integrated social network. I'm sure this will follow later on.

At the moment, Spotify is invitation-only. I have a few spare invitations; leave me your email and I'll send you one if I have any left.

Friday, 10 October 2008

Doing EC2 Without Scalr

It is important that your website can scale. You can spend all your energy promoting it and adding features, but that is all wasted if it cannot deal with all those users who are desperate to use your site. The architecture is at least half the battle. We are spoilt these days with affordable options for dealing with scale. Cloud computing has become a buzzword and big tech companies are scrambling to carve out their chunk of the market. So far we have EC2, Google App Engine, Joyent, VPS services (such as Slicehost) and a number of others.

EC2 was one of the first and most mature of the offerings. Though it does not stray far from the traditional concept of a machine (with fixed CPU and memory), what it does do is give you access to as many of these as you need at any one point and the (very) basic tools to manage them. They provide the basic infrastructure: machines, network connections and data storage. It is up to you to do the rest: load balancing, application servers, relational databases, backups, replication, redundancy, fail-over, etc.

Scalr promises to help with some of these issues, notably load balancing, and backups / replication / fail-over for MySQL. The idea is great but the project is not mature (despite the claim to v1.0) and the architecture is not very solid. To be fair, this is acknowledged to some degree in that v2 is to be a complete redesign. After a period of two months using Scalr, I decided to move on because of these issues:
  • Ad-hoc design: a strange combination of PHP, bash, MySQL and CRON. The interaction between these components was overly complicated, which means debugging was tricky and the system itself was prone to getting mixed up (for example: CRON calling PHP to do a backup but the MySQL state was stuck so backups were being skipped, silently, for weeks). V2 is to be Java.
  • Buggy: rebundling an instance would break the load balancer. It would get confused about the new instances and refuse to update. This was only solved by restarting the whole farm.
  • Scalr decides when to start new instances based on the load average. This is really overly simplistic and can mean that your machines are under heavy load for a little too long while new instances are being brought up.
  • Scalr is built on Ubuntu 7.04 AMIs, upgrading the distro breaks it.
I do like what Scalr are trying to do. Just having pre-built load balancing servers, application servers and MySQL servers that are able to replicate and reconfigure themselves based on load is great. However, Scalr v1 is really experimental; I'm very keen to see how the next version evolves.

One alternative is to build the architecture yourself. It's actually not very hard (and quite quick). All you need is a decent scripting language with good SSH and EC2 libraries, and some experience with MySQL and Linux admin. A few days' work (4 in my case) will get you:
  • Run and stop instances with a single command, including application servers, master and slave MySQL instances.
  • MySQL slave replication (automatically configured against the current master)
  • NGINX load balancer, always up to date with the application servers.
  • Automatic backup and restore facilities.
  • Various other tricks that your setup might benefit from.
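Keeping the NGINX load balancer up to date mostly comes down to regenerating an `upstream` block whenever the set of application servers changes. A minimal sketch, assuming you already have the list of private addresses from your own EC2 tooling; the upstream name, port and helper name are hypothetical:

```python
def render_upstream(name, servers, port=8000):
    """Render an nginx `upstream` block for the given app server addresses.

    Duplicates are dropped and hosts are sorted, so the text only changes
    when the membership changes; that makes it cheap to compare with the
    file already on disk and skip needless reloads."""
    lines = ["upstream %s {" % name]
    for host in sorted(set(servers)):
        lines.append("    server %s:%d;" % (host, port))
    lines.append("}")
    return "\n".join(lines) + "\n"
```

After writing the result into the nginx configuration, you would reload nginx (for example with `nginx -s reload`) so it picks up the new servers without dropping connections.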
All that would be needed to catch up with Scalr's features (albeit not generically) is to automatically bring up the required instances as demand changes and automatically fail over MySQL instances. Not much more work and with the added benefit that you can tailor your load algorithm to your site (i.e. use response times instead of load averages). I would recommend getting your hands dirty!
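As an illustration of tailoring the algorithm, here is a hypothetical scaling rule driven by response times rather than load averages. The 500 ms target, the use of the 90th percentile, the low-water factor and the instance limits are all assumptions for the sake of the example:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def desired_instances(current, response_times_ms, target_ms=500.0,
                      low_water=0.5, min_instances=2, max_instances=20):
    """Decide how many app servers to run from recent response times."""
    p90 = percentile(response_times_ms, 90)
    if p90 > target_ms:
        wanted = current + 1   # too slow: bring up one more instance
    elif p90 < target_ms * low_water:
        wanted = current - 1   # comfortably fast: try removing one
    else:
        wanted = current       # inside the dead band: leave things alone
    return max(min_instances, min(wanted, max_instances))
```

Moving one instance at a time and leaving a dead band between the two thresholds helps avoid starting and killing instances back and forth, which matters when you pay by the hour.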

Friday, 5 September 2008

Spotlight Goes Dotty

The last 3 times I have had a new install of OS X, I've noticed that it gets sluggish after a few weeks. Opening Activity Monitor shows a process called mdworker using a lot of both CPU and memory.

My first reaction was that it was some sort of virus or at best a rogue app, but googling mdworker revealed that it was part of OS X. More precisely it is the Spotlight indexing daemon (keeps track of all your files to make searching faster).

My next thought was to look for a way to turn off Spotlight. I don't really use it very much and Quicksilver covers the needs I do have. However, Spotlight is also needed to search your mail, which I can't do without.

The problem appears when Spotlight is trying to keep up with a large number of new files on the system; in my case, copying some music onto a new machine. The most straightforward solution is to tell Spotlight to ignore those files. This is easily done in System Preferences -> Spotlight -> Privacy -> Add ( + ). I added my Music directory, which is where iTunes stores its library.

It is odd that this should be a problem. Indexing is something that should happen in the background. There is no reason for it to affect the user. It appears that Apple decided that it was more important for files to be searchable than for the operating system to be responsive.

Saturday, 19 July 2008

A Good Reason To Use IFrames For Facebook Apps

If you use canvas pages to display your Facebook app, dealing with scale suddenly becomes quite a bit harder. A request must be served in less than 4 seconds, otherwise it fails completely. This is fine when your traffic is constant, but can catch you off guard if you get a sudden spike.

I'm currently building a Facebook app using Scalr to deal with scaling. This automatically creates new Amazon EC2 instances to deal with increases in load and kills them again when traffic dies down. The problem arises in deciding when to start a new instance.

Scalr keeps track of the server load average. This is a moving average over 15 minutes. It's a good idea to use a slow moving average like this so as to not start instances prematurely. It will only do so if the existing instances do actually need some help (especially considering Amazon charge by the hour). However, this introduces lag in starting a new server, say about 5 minutes.
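To see where that lag comes from: Linux computes its 15-minute load average as an exponentially damped average, updated every 5 seconds. The sketch below models that calculation (assuming Scalr simply reads the standard load average) and measures how slowly it reacts to a sudden, sustained spike; the function names are illustrative:

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between samples, as in the Linux kernel
WINDOW = 15 * 60.0      # time constant of the "15-minute" average

def damped_average(samples, start=0.0, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Exponentially damped moving average of instantaneous load samples."""
    decay = math.exp(-interval / window)
    avg = start
    for load in samples:
        avg = avg * decay + load * (1.0 - decay)
    return avg

def seconds_to_reach(target, load=1.0, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Seconds after a sudden, sustained jump from 0 to `load` before the
    average first reaches `target` (requires 0 < target < load)."""
    if not 0 < target < load:
        raise ValueError("target must be between 0 and load")
    decay = math.exp(-interval / window)
    avg, elapsed = 0.0, 0.0
    while avg < target:
        avg = avg * decay + load * (1.0 - decay)
        elapsed += interval
    return elapsed
```

With these settings, five minutes of sustained load 1.0 (60 samples) only lifts the average to about 0.28, and it takes 625 seconds, a little over ten minutes, to reach 0.5.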

If I weren't serving pages to users via Facebook, this wouldn't be such a problem: there would be a small window when the site was a little sluggish. For a canvas page, though, a slow response means the page is considered to have timed out. Not good!

Of course, using IFrames is generally faster because it avoids the proxy overhead (and sometimes sluggish Facebook servers). There are some drawbacks: you lose some cool FBML functionality (e.g. fb:can-see) and you will have to make a lot more calls to the API. But I now think IFrames are the way to go.

Wednesday, 28 May 2008

Is SQLAlchemy Ready For Production?

I have built my most recent project using SQLAlchemy (SQLA) 0.4. It is a reasonably young library, so I thought others considering using SQLAlchemy might benefit from my experiences.

The most recent version is 0.4, so it obviously does not claim to be 'finished' (can software ever be?). However, there are plenty of benefits in using it already:
  • Ridding your beautiful Python code of any SQL. It's untidy and a pain to work with (which means bugs).
  • ORM (Object Relational Mapper) to fit in with a nice OO architecture.
  • Free sharding to split your database horizontally.
  • DB-independent code, so you can switch if you feel the need.
  • Hassle free transactions and DB connection management.
Sounds great doesn't it? In general this is exactly what you get. However, there are a few points which are very important to consider before committing yourself to using SQLA.

Forcing Indices

Anyone who has had to do something reasonably involved in MySQL has spent time optimising their indices. Sadly, MySQL is not very clever about choosing the index to use for a particular query. In some cases the only option is to force it with FORCE INDEX. This is not available in SQLA, so any time you need to do this, you'll have to write the SQL by hand. Not only does this partly defeat the point of SQLA, but the place you most need the SQL abstraction is often the same place you need to force an index. Consider a complicated search page: you are searching over a large data set using a number of filters. MySQL is probably going to pick the wrong index, and you're going to have to generate a complicated SQL query without SQLAlchemy's help.
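Here is a sketch of that hand-written fallback: building a MySQL search query with a FORCE INDEX hint from a set of filters. The `users` table, its columns and index names are hypothetical. Column and index names are interpolated into the SQL text, so they are checked against whitelists; the filter values are returned separately as parameters for a DB-API driver using the `%s` paramstyle (such as MySQLdb):

```python
# Hypothetical schema, used as whitelists for identifiers.
ALLOWED_COLUMNS = {"age", "city", "gender", "created_at"}
ALLOWED_INDEXES = {"idx_city_age", "idx_created_at"}

def build_search_query(filters, index_name, limit=50):
    """Return (sql, params) for a filtered search that forces an index."""
    if index_name not in ALLOWED_INDEXES:
        raise ValueError("unknown index: %s" % index_name)
    clauses, params = [], []
    for column in sorted(filters):  # sorted so the SQL text is stable
        if column not in ALLOWED_COLUMNS:
            raise ValueError("unexpected filter column: %s" % column)
        clauses.append("%s = %%s" % column)  # value is bound later
        params.append(filters[column])
    sql = "SELECT id FROM users FORCE INDEX (%s)" % index_name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT %d" % int(limit)
    return sql, params
```

The pair would then go to `cursor.execute(sql, params)`, which is exactly the kind of bookkeeping SQLA would otherwise do for you.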

Commit ORM Objects

SQLAlchemy can be a little silly about detecting a change in an ORM object. If you assign to a member variable which is part of the schema, it will be marked as dirty even if the value doesn't actually change. The solution is to check for a change before assigning, which does not make for neat code!
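One way to keep that check out of the main code is a small helper. This is a sketch of the workaround, not an SQLAlchemy API: the helper names are hypothetical, and a plain Python object stands in for a mapped instance. Note that reading the current value may itself trigger a lazy load on a real ORM object:

```python
def set_if_changed(obj, attr, value):
    """Assign obj.attr = value only if it differs, so the ORM does not
    mark an unchanged object as dirty. Returns True if it assigned."""
    if getattr(obj, attr) != value:
        setattr(obj, attr, value)
        return True
    return False

def update_from_dict(obj, values):
    """Apply several field updates; return the names that really changed."""
    return [name for name, value in sorted(values.items())
            if set_if_changed(obj, name, value)]
```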


Sharding

It's a good idea to plan for scaling as early as possible. It can be very difficult to build the necessary pieces in later when you need them (and you will also be under pressure to fix things fast). One common way to deal with scale is to split large tables over several machines; this is known as sharding (each independent chunk of a table is known as a shard).

SQLAlchemy has some code to help you there. All you need to do is to define 3 functions which tell SQLA which shard a particular row is in. Really simple. Sadly, this part of the code is not very mature at all:
  • query.count() doesn't work (nor any scalar query). I had to write a function to query each shard in turn and sum the result. The real problem here is that it was not clear at all which bits of functionality will or won't work with sharding (expect long debugging sessions, digging into the SQLA code).
  • The ORM caches objects and identifies them by their primary key. However, a common MySQL trick when sharding is to have an auto_incremented INT as the primary key of each shard but use something like a UUID as the 'real' primary key recognised by the code (this speeds things up quite a bit). Of course, the auto_incremented key will not be unique across shards and this will confuse SQLAlchemy. I think the best solution here (suggested by someone on the SQLAlchemy group) is to have a 2 column primary key with an INT and another integer shard identifier, making it unique.
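A sketch of both workarounds, using in-memory SQLite databases to stand in for two MySQL shards on one development box. This is an illustration, not SQLAlchemy's sharding API: the table, the CRC32-based shard chooser and the helper names are all hypothetical. Counting works by querying every shard and summing, and the two-column (id, shard_id) primary key keeps keys unique across shards:

```python
import sqlite3
import uuid
import zlib

def choose_shard(row_uuid, shard_ids):
    """Deterministically map a UUID string to one of the shard ids."""
    checksum = zlib.crc32(row_uuid.encode("ascii")) & 0xffffffff
    return shard_ids[checksum % len(shard_ids)]

def make_shards(shard_ids):
    """One in-memory database per shard, standing in for a MySQL host."""
    shards = {}
    for shard_id in shard_ids:
        conn = sqlite3.connect(":memory:")
        # (id, shard_id) is unique across shards even though id is not.
        conn.execute("CREATE TABLE users ("
                     " id INTEGER NOT NULL,"
                     " shard_id INTEGER NOT NULL,"
                     " uuid TEXT NOT NULL UNIQUE,"
                     " name TEXT,"
                     " PRIMARY KEY (id, shard_id))")
        shards[shard_id] = conn
    return shards

def insert_user(shards, name):
    """Insert a user into the shard chosen from its new UUID."""
    row_uuid = str(uuid.uuid4())
    shard_id = choose_shard(row_uuid, sorted(shards))
    conn = shards[shard_id]
    # MAX(id) + 1 stands in for MySQL's per-shard AUTO_INCREMENT counter.
    next_id = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM users").fetchone()[0]
    conn.execute("INSERT INTO users (id, shard_id, uuid, name) VALUES (?, ?, ?, ?)",
                 (next_id, shard_id, row_uuid, name))
    return row_uuid

def count_all_shards(shards, where="1=1", params=()):
    """Run the same COUNT(*) on every shard and sum the results.
    `where` is interpolated into the SQL, so it must be trusted text."""
    total = 0
    for conn in shards.values():
        sql = "SELECT COUNT(*) FROM users WHERE " + where
        total += conn.execute(sql, params).fetchone()[0]
    return total
```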

No Server-Side Cursors

A minor point, but it might be important for some: server-side cursors come in handy when you are dealing with large amounts of data.

Tips For Optimising
  • Periodically check the SQL queries being made with the echo option. There might be some surprises in there (though usually easily fixed). This kind of thing usually pops up because you will use the ORM instance and forget / not realise that it will result in a query (from a software design point of view, great. From an optimisation point of view, awful).
  • Use set_shard on a query whenever you are able. If you know which shard the row you want is in, there is no need to check the other ones. A common example is when the shard identifier is one of the query parameters.
  • Design for scale right from the beginning and develop / test on a distributed architecture (i.e. have at least 2 shards). This doesn't need to be difficult: for example, just create two databases on your development box to simulate two machines.

I still think that SQLAlchemy is worth using. Overall it will save time and effort as long as you are careful and not afraid to get your hands dirty when the going gets tough. I expect most of these problems will be addressed in (near) future releases.

Tuesday, 27 May 2008

C# BHO Tutorial

I get a fair bit of traffic from people looking for help with C# and BHOs (my event handling post). There isn't much information about, and the only beginners' tutorial went missing a few months back. A new one has appeared; anyone wanting to get going should check it out.

Wednesday, 14 May 2008

Vim: The Word Processor

I love Vim. It took me a while to get there, but I've been using it exclusively for coding for a couple of years now and it has become second nature. I first learnt to use it during a practical exercise for an operating systems course at uni. We had to write a Minix driver, so all the work had to be done on the command line. A powerful text editor was a must. I think I used Vi (rather than Emacs) because it was available and I had been told it was great. It was a steep learning curve, but I got the hang of the basics after a few days.

It was a couple of years before I started to use it again. In the meantime I'd mostly been working with .NET and Java, so I'd been using some pretty decent development environments and it didn't seem necessary to use anything else (especially considering the power of their debuggers). But then I moved into the world of the web and started writing PHP and then Python. These did not really have especially good IDEs, so it was back to a text editor, and my choice was Vim. It also coincided with worsening RSI, for which Vim is great.

I tend to learn a couple of features in a burst every few months when something really bugs me. This is probably not ideal, but Vim is so powerful I will never learn it all. I've been keeping a to-do list in a text file recently (rather than on paper: go planet!) but Vim's defaults are not great for editing prose:
  • Vim's word wrap is by character, not word.
  • k and j (up and down) work on a line basis. If you have a wrapped line, you cannot move inside it with j and k (like you would with a normal text editor).
Of course, Vim is hugely powerful and can be tweaked to be much more usable when writing prose. These .vimrc commands:
  • Wrap lines by cutting lines off at word boundaries.
  • The word wrap is virtual, no extra line break is inserted (so that it's easy to edit afterwards).
  • j and k are replaced with gj and gk which allow you to move up and down inside a wrapped line.
  • I've also added the spell checker (I've not played around with it properly yet, but it looks a little weak).
  • smartindent for bullet points.

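 " Prose-friendly settings, applied to .txt buffers only
 autocmd BufRead,BufNewFile *.txt setlocal formatoptions=l
 autocmd BufRead,BufNewFile *.txt setlocal linebreak
 " Buffer-local, non-recursive maps so other file types keep plain j and k
 autocmd BufRead,BufNewFile *.txt noremap <buffer> j gj
 autocmd BufRead,BufNewFile *.txt noremap <buffer> k gk
 autocmd BufRead,BufNewFile *.txt setlocal smartindent
 autocmd BufRead,BufNewFile *.txt setlocal spell spelllang=en_us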

On a slightly different note: if you're using OS X, get this port of Vim. Its aim (which it achieves) is to integrate better with the Mac. It's worth getting just for the pretty Carbon tabs ;-)

Wednesday, 23 April 2008

Facebook Chat Firefox Plugin

Facebook finally released their chat functionality to all their users today. It's fantastic! However, I soon came to the conclusion that it has one major flaw: you have to be on a Facebook page to see any incoming messages. So to solve this, I quickly knocked up a Firefox add-on to alert you if you have any new chat messages. You can download it here.

It uses the built-in Firefox alert system, so you'll need Firefox 3 on OS X (for Growl) or Firefox 2 on other operating systems. At the moment you also have to leave the Facebook page open; if I have time later, I'll deal with this so that you can close the Facebook page.

PLUG: Donate 2 Date is my Facebook app. It helps raise money for charity by setting people up on dates. Check it out!

Sunday, 20 April 2008

Building A Legacy LAMP Stack

There are many live sites on the web that were written in PHP 3. In case you ever need to set up an environment for one, here's how:
  1. Download and compile MySQL 4. You can get the source from the archives on the MySQL website.
  2. Download Apache 1.3, take a look here. As well as setting the prefix in the configure step, you'll need to enable mod_so with --enable-module=so.
  3. Download PHP3, you can find it here. Now here comes the tricky bit:
    • The MySQL 4 client libraries don't include DROP DATABASE and CREATE DATABASE bindings, so we need to remove these from the PHP source. Grep for mysql_drop_db and mysql_create_db and comment out the relevant lines and functions.
    • Set the MYSQL_LIB and MYSQL_INC environment variables before running configure. The PHP 3 configure doesn't pick up the default MySQL 4 paths (MYSQL_LIB=/usr/local/mysql/lib/mysql and MYSQL_INC=/usr/local/mysql/include/mysql).
    • Continue with a normal configure, make and make install, building a shared object for Apache (see the INSTALL file).
  4. One last thing: after following the INSTALL file instructions for editing httpd.conf, you may need to add an AddType line for .php files as well as .php3 files.
Note: This is not a long-term strategy. PHP 3, MySQL 4 and Apache 1.3 are no longer actively maintained, so they're bound to be riddled with problems (including security holes). Best to upgrade as soon as possible!

Thursday, 3 April 2008

Zen of Programming

Programming is a difficult discipline. It requires a phenomenal knowledge of difficult technologies: (usually several) programming languages, operating systems, hardware, protocols, databases, etc. It takes years of using them to build up the know-how to combine them effectively to solve a problem (and isn't that the point of programming?). However, knowledge and understanding are not enough. Having a certain type of brain helps a lot too. Most good programmers are excellent logical thinkers and have a solid background in maths.

So you've got a few years' experience, you know a couple of languages and can bash out code with your eyes closed. This is the point where you start worrying about the quality of your code. How buggy is it? How robust is it? How maintainable is it? Dealing with these problems is not like before: there is no single solution, but there are ways you can cope. The Zen of Programming:
  1. Re-read what you just coded - before running it / compiling it.
  2. Think as far ahead as you can. But don't spend so long that you don't get anything done.
  3. Refactor a lot.
  4. Try and automate your testing.
  5. Re-read the documentation of a language / library after you become competent in it.
Most of these points boil down to having patience. If you are naturally patient, you probably do most of these anyway. Otherwise, it is well worth forcing yourself to change your style: you will save time and your code will be much, much better.

Tuesday, 1 April 2008

Handling International Dialling Codes (in Python)

The website I'm working on at the moment collects a user's phone number. This must work with phones from any country, and each number must be converted into a standard format so that it can be used with an SMS API. For example, the phone number +44 (0)7912345678 would become 447912345678.

To make it as hard as possible for the user to make a mistake, the international dialling code is chosen from a select box, generated with this template:

<select name="dialling_code">
  <option py:for="country, dialling_code, ndd in internationalDiallingCodes"
          value="${dialling_code}">${country} (+${dialling_code})</option>
</select>

When the form is submitted, we need a function to combine the dialling code and the rest of the number into the standard format (note: this module is also where internationalDiallingCodes is defined):

def makeStandardPhoneNumber(internationalCode, rest):
    """Make a standard phone number by appending rest to the
    internationalCode. Check the first digits of rest to see if they
    match the NDD (national dialling prefix), which must be removed
    from the number, e.g. 0 in the UK.

    Example: makeStandardPhoneNumber('44', '073749135381')
    -> '4473749135381'
    """
    # Get the NDD code for this internationalCode
    ndd = getNDD(internationalCode)
    # If the country has an NDD, check for it and remove it if present
    if ndd is not None and rest.startswith(ndd):
        rest = rest[len(ndd):]
    return internationalCode + rest


def getNDD(internationalCode):
    """Return the NDD for internationalCode, or None if there isn't one."""
    for country, iCode, ndd in internationalDiallingCodes:
        if iCode == internationalCode:
            return ndd
    return None

internationalDiallingCodes = [("Afghanistan", "93", "0"), ("Albania", "355", "0"), ("Algeria", "213", "7"), ("Andorra", "376", None), ("Angola", "244", "0"),...]

You can download the Python source file here. It is trivial to modify the functions to put the phone number into whatever format you want. If you need a performance boost, getNDD could be optimised by inverting the index on internationalDiallingCodes, i.e. building a dictionary keyed on the dialling code.
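Here is a minimal sketch of that optimisation, assuming the same (country, dialling code, NDD) tuple layout as internationalDiallingCodes. Only a small sample of rows is included, and the snake_case function names are new, not from the downloadable file:

```python
# Small sample in the same shape as internationalDiallingCodes.
SAMPLE_CODES = [
    ("Afghanistan", "93", "0"),
    ("Andorra", "376", None),
    ("United Kingdom", "44", "0"),
]

def build_ndd_index(codes):
    """Invert the list into a dict: dialling code -> NDD.

    setdefault keeps the first entry for each code, matching the linear
    scan in getNDD, which returns the first match (several countries can
    share a dialling code)."""
    index = {}
    for country, code, ndd in codes:
        index.setdefault(code, ndd)
    return index

NDD_BY_CODE = build_ndd_index(SAMPLE_CODES)

def make_standard_phone_number(international_code, rest, ndd_by_code=NDD_BY_CODE):
    """Same result as makeStandardPhoneNumber, with an O(1) NDD lookup."""
    ndd = ndd_by_code.get(international_code)
    if ndd is not None and rest.startswith(ndd):
        rest = rest[len(ndd):]
    return international_code + rest
```

The dictionary is built once at import time, so each lookup no longer walks the whole list.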

* Also:
You can download a CSV of phone codes here. That should make it easy to include them in any program. These do not include satellite phone companies.
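If you want to build internationalDiallingCodes from that CSV, something like the following would do. It assumes (a guess about the file's layout) three columns, country, dialling code and NDD, with no header row and an empty NDD field where a country has none; the function name is hypothetical:

```python
import csv

def load_dialling_codes(lines):
    """Read country,dialling code,NDD rows into a list of tuples,
    turning an empty NDD field into None."""
    codes = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        row = (row + ["", "", ""])[:3]  # pad short rows to three fields
        country, code, ndd = (field.strip() for field in row)
        codes.append((country, code, ndd or None))
    return codes
```

For example, `internationalDiallingCodes = load_dialling_codes(open("phone_codes.csv", newline=""))`, where the file name is a placeholder.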

Saturday, 15 March 2008

Print to PDF in OS X and ConceptDraw 7

A good way to get PDF output from pretty much any application is to install a PDF printer driver. Applications can then print as normal, but the document gets distilled into a PDF rather than sent to a printer. If you are on OS X, give CUPS-PDF a whirl.

I did find that the output from CUPS-PDF was massive: a file which should have been about 100 KB was actually 70 MB. The output file is put into a predefined folder on the desktop, and there is no option to change it. It's a useful piece of kit, but only as a last resort.

Luckily, the application I was trying to get a PDF from turned out to have its own functionality to do it properly. If you want to export to PDF in ConceptDraw 7, you actually have to go to the print dialog and look in the bottom left hand corner (not in "Export" as you might expect).

Monday, 10 March 2008

Facebook Notification / Request Buckets

One of the most common ways for a Facebook application to spread is through users inviting their friends once they have installed it. Unfortunately, this has led to some pretty dubious practices (i.e. spam) to force people to invite others. The Facebook API team have responded by removing requests from the API and, more recently, by allocating a maximum number of requests you may send per user based on how "spammy" your application is.

It uses metrics like how often people choose to "ignore, hide and report notifications as spam". This is an interesting way to suppress spam applications, but there have been reports that things can be difficult early on because a single 'ignore' will be more statistically significant. Based on the metrics, your application gets placed in one of 9 buckets for notifications and 13 buckets for requests, each bucket having a "limit threshold" which is the maximum per user per day. I've put together a table showing these based on a number of forum posts:


Notifications bucket    Limit threshold
1                       Blocked for 1 month
8                       28 (estimated)

Requests bucket         Limit threshold
1                       Blocked for 1 month (estimate)
2                       2 (estimate)
5                       8 (estimate)