Sunday, December 30

Types and Programming Languages Revisited

My previous brute-force approach to listing all the 4-dimension combinations of S = {(latent, manifest), (static, dynamic), (weak, strong), (nominal, structural)} typing, by filtering the desired tuples from the powerset of S, wasn't very efficient.
It was a perfect example of The Danger of Naïveté: an inefficient algorithm that was nevertheless simple to devise and implement.

Moreover, quoting The eight rules of Dean Carney, I should have observed that "the first place to look for a solution is within the problem itself". If the problem involves 4 tuples of 2 elements each to begin with, the efficient solution is not likely to require the powerset of all 8 elements.

Thus, I figured I should give it another go.

First, for simplicity and better readability, let me replace the values of S with the character-tuples on the left-hand side:

(A, B) = (latent, manifest)
(C, D) = (static, dynamic)
(E, F) = (weak, strong)
(G, H) = (nominal, structural)

S = {(A, B), (C, D), (E, F), (G, H)}

Next, I'll write down some 4-size combinations. After a short while, a pattern emerges:


The pattern becomes clear when I look at the elements like this:

X1=(A, B) x X2=(C, D) x X3=(E, F) x X4=(G, H)

This is a Cartesian product of X1 x X2 x ... x Xn, the set containing all possible combinations of one element from each set.

With this knowledge, implementing an efficient algorithm is easy. First, a method to return the Cartesian product of two sets:

# arguments must be lists, e.g.: cartesian([1, 2], [3, 4])
def cartesian(list1, list2):
     result = []
     for el in list1:
         for item in list2:
             if isinstance(el, list):
                 # el is already a partial combination: extend it
                 result.append(el + [item])
             else:
                 # el is a single value: start a new combination
                 result.append([el] + [item])
     return result

Testing the method with input X=[1, 2] x Y=[3, 4] yields:

     [[1, 3], [1, 4], [2, 3], [2, 4]]

Which looks good.
Next, a method to process the Cartesian product of any number of sets. Here's a candidate implementation:

def types_cartesian(aList):
     head = aList[0]
     tail = aList[1:]
     for item in tail:
         head = cartesian(head, item)
     return head

The output of method types_cartesian(aList) given input aList=S is:

S={['static', 'dynamic'], ['weak', 'strong'], ['latent', 'manifest'], ['nominal', 'structural']}

S1 x S2 x ... Sn =

     ['static', 'weak', 'latent', 'nominal']
     ['static', 'weak', 'latent', 'structural']
     ['static', 'weak', 'manifest', 'nominal']
     ['static', 'weak', 'manifest', 'structural']
     ['static', 'strong', 'latent', 'nominal']
     ['static', 'strong', 'latent', 'structural']
     ['static', 'strong', 'manifest', 'nominal']
     ['static', 'strong', 'manifest', 'structural']
     ['dynamic', 'weak', 'latent', 'nominal']
     ['dynamic', 'weak', 'latent', 'structural']
     ['dynamic', 'weak', 'manifest', 'nominal']
     ['dynamic', 'weak', 'manifest', 'structural']
     ['dynamic', 'strong', 'latent', 'nominal']
     ['dynamic', 'strong', 'latent', 'structural']
     ['dynamic', 'strong', 'manifest', 'nominal']
     ['dynamic', 'strong', 'manifest', 'structural']

This is the same list as the one obtained earlier with the powerset approach.
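The big difference is that the powerset implementation generates all 2^8 = 256 subsets of the 8 labels and then throws most of them away, whereas the Cartesian product builds only the 2^4 = 16 valid combinations. For n pairs that is 2^(2n) candidates versus 2^n results: still exponential, but every result is produced directly and nothing has to be filtered out.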
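For comparison, Python's standard library already ships this operation as itertools.product. The sketch below is not the code linked from this post; it only assumes the four pairs of S shown above (the name DIMENSIONS is mine) and counts the size of both search spaces.

```python
from itertools import product

# The four typing dimensions, one pair per dimension (same values as S above).
DIMENSIONS = [
    ("static", "dynamic"),
    ("weak", "strong"),
    ("latent", "manifest"),
    ("nominal", "structural"),
]

# Pick one element from each pair, in every possible way.
combos = [list(combo) for combo in product(*DIMENSIONS)]

# Size of the search space for each approach.
labels = [label for pair in DIMENSIONS for label in pair]
powerset_size = 2 ** len(labels)   # every subset of the 8 labels
product_size = len(combos)         # only the valid 4-tuples

print(powerset_size, product_size)
```

The first and last entries of combos match the first and last rows of the listing above.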

Here's the code.

Types and Programming Languages
What to know before debating type systems

Saturday, December 29

Anchoring & Brainstorming

From Quote of the week: Why brainstorming is a bad idea:

"To their surprise, the researchers found that virtual groups, where
people brainstormed individually, generated nearly twice as many
ideas as the real groups.

The result, it turned out, is not an anomaly. In a 1987 study,
researchers concluded that brainstorming groups have never
outperformed virtual groups.
Of the 25 reported experiments by psychologists all over
the world, real groups have never once been shown to be
more productive than virtual groups.
In fact, real groups that engage in brainstorming consistently
generate about half the number of ideas they would have
produced if the group's individuals had worked alone."

Fascinating stuff.
Could this be related to [Nobel laureate] Daniel Kahneman's work on the Anchoring bias, where [apparently individual] contributions during a brainstorming session overly rely (anchor) on each other's ideas?

Useful Links:
Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases
The Medici Effect

Friday, December 28

Types and Programming Languages

This excellent overview of type systems by nostrademons @ reddit describes latent, manifest, static, dynamic, weak, strong, nominal, and structural typing, and places a number of popular programming languages on some of these dimensions.

I got curious about listing all the possible combinations and then finding programming languages to fill in the slots.

I cooked up some quick and dirty Python code to do the job.

Let's call S = {static, dynamic, latent, manifest, weak, strong, nominal, structural}
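The powerset of S has 2^8 = 256 combinations, only some of them useful but I can fix that later.

# return a list with 2**len(l) elements (definition of powerset)
def powerset(l):
     r = []
     if l:
         head = l[:1]
         tail = l[1:]
         for item in powerset(tail):
             r.append(item)          # subsets without the head element
             r.append(head + item)   # subsets with the head element
     else:
         r.append([])                # base case: the powerset of [] is [[]]
     return r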

The powerset of S returns the complete list of subsets of S from sizes 1 to 8 plus the empty set {}.
This means subsets {static}, {dynamic}, {static, dynamic}, ..., all the way to {static, dynamic, latent, manifest, weak, strong, nominal, structural} are included too.

However, there are 4 pairs of [kind of] mutually exclusive types and I am only interested in the subsets of size 4. Thus, I need only consider the unique 4-dimension combinations.

The maths tells me I should expect n!/(k!(n-k)!) = 8!/(4!(8-4)!) = 70 such combinations (see binomial coefficient or look up the middle number of the 8th row of Pascal's triangle).
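As a quick sanity check of that arithmetic, here is a small sketch that evaluates the formula directly and compares it with math.comb from the standard library (Python 3.8 or later). The variable names are mine.

```python
from math import comb, factorial

n, k = 8, 4

# Binomial coefficient straight from the formula n! / (k! * (n - k)!)
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# The same value from the standard library
by_library = comb(n, k)

print(by_formula, by_library)
```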

I can obtain this by keeping only those subsets of size 4 from the list returned by the powerset. The function below will help:

# filter list keeping only n-tuples where n=size
def len_filter(l, size=1):
     return [item for item in l if len(item) == size]

Still, the set of 70 size-4 combinations contains lots of invalid members because of the 4 pairs of [kind of] mutually exclusive types (read original comment for details):

    ("static", "dynamic"),
    ("latent", "manifest"),
    ("weak", "strong"),
    ("nominal", "structural")

A simple solution is to just remove any combination that contains these "invalid" pairs. The 2 functions below will do that:

# the 4 pairs of mutually exclusive types listed above
INVALID_PAIRS = [
     ("static", "dynamic"),
     ("latent", "manifest"),
     ("weak", "strong"),
     ("nominal", "structural"),
]

# retain n-tuples that don't contain any of the invalid pairs
def invalidpairs_filter(l):
     return [item for item in l if not is_invalid(INVALID_PAIRS, item)]

# returns True if any of the invalid tuples is contained in the given n-tuple,
# False otherwise
def is_invalid(invalid_pairs, ntuple):
     s = set(ntuple)
     for invalid_pair in invalid_pairs:
         si = set(invalid_pair)
         if si.issubset(s):
             return True
     return False
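To see the two filters working together, here is a compact, self-contained sketch. It swaps the hand-written powerset and size filter for itertools.combinations, which yields the size-4 subsets directly, and keeps the same invalid-pair check. It is an illustration under that substitution, not the code from the repository.

```python
from itertools import combinations

S = ["static", "dynamic", "latent", "manifest",
     "weak", "strong", "nominal", "structural"]

INVALID_PAIRS = [
    ("static", "dynamic"),
    ("latent", "manifest"),
    ("weak", "strong"),
    ("nominal", "structural"),
]

def is_invalid(invalid_pairs, ntuple):
    """True if ntuple contains both members of any invalid pair."""
    s = set(ntuple)
    return any(set(pair).issubset(s) for pair in invalid_pairs)

# All subsets of size 4 (what powerset + len_filter produce together) ...
size4 = list(combinations(S, 4))
# ... minus those that contain a mutually exclusive pair.
valid = [sorted(c) for c in size4 if not is_invalid(INVALID_PAIRS, c)]

print(len(size4), len(valid))
```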

Et voilà! Running the Python program prints the 16 combinations we want:

    ['latent', 'nominal', 'static', 'weak']
    ['dynamic', 'latent', 'nominal', 'weak']
    ['manifest', 'nominal', 'static', 'weak']
    ['dynamic', 'manifest', 'nominal', 'weak']
    ['latent', 'nominal', 'static', 'strong']
    ['dynamic', 'latent', 'nominal', 'strong']
    ['manifest', 'nominal', 'static', 'strong']
    ['dynamic', 'manifest', 'nominal', 'strong']
    ['latent', 'static', 'structural', 'weak']
    ['dynamic', 'latent', 'structural', 'weak']
    ['manifest', 'static', 'structural', 'weak']
    ['dynamic', 'manifest', 'structural', 'weak']
    ['latent', 'static', 'strong', 'structural']
    ['dynamic', 'latent', 'strong', 'structural']
    ['manifest', 'static', 'strong', 'structural']
    ['dynamic', 'manifest', 'strong', 'structural']

Should there be a programming language for all the above combinations? nostrademons and grauenwolf already filled in a few of these slots (apologies if my copy & paste introduced errors):

latent, nominal, static, weak =
dynamic, latent, nominal, weak = PHP
manifest, nominal, static, weak = C, Access SQL, Java, C++, C#
dynamic, manifest, nominal, weak =
latent, nominal, static, strong = Haskell
dynamic, latent, nominal, strong = Scheme
manifest, nominal, static, strong = Java, C++, C#, VB.NET (strict), T-SQL
dynamic, manifest, nominal, strong =
latent, static, structural, weak = VB.NET, VBScript
dynamic, latent, structural, weak = JavaScript, PHP, Assembly
manifest, static, structural, weak =
dynamic, manifest, structural, weak =
latent, static, strong, structural = Ocaml, Haskell
dynamic, latent, strong, structural = Erlang, Scheme, Common Lisp, Python, Ruby
manifest, static, strong, structural = Haskell, C++ Templates
dynamic, manifest, strong, structural = Common Lisp

Repetitions occur because most languages support multiple forms of typing.

My next task (for another night, another post) is to clean up the table and find languages that fit the 5 unused combinations from the list above:

    ['latent', 'nominal', 'static', 'weak'] =
    ['dynamic', 'manifest', 'nominal', 'weak'] =
    ['dynamic', 'manifest', 'nominal', 'strong'] =
    ['manifest', 'static', 'structural', 'weak'] =
    ['dynamic', 'manifest', 'structural', 'weak'] =

Possible Questions:

  1. No programming languages for a given combination? Why is that?

  2. Once most known programming languages have found their place in the matrix, it will be interesting to understand why some combinations are more popular than others

I've created project programming-misc on Google Code to hold these and other programming exercises I might engage in. The code shown above is available in the SVN repository and downloads page.

3-31-04 I'm Over It (the article by Bruce Eckel that started the conversation @ reddit)
Comment by nostrademons
Comment by grauenwolf

It is quite bizarre that the otherwise very polished Google Analytics has a bug known since (at least) Nov 2005!
The URL considered invalid by Google Analytics is actually the one generated by and taken from Google Code. The issue is well documented with easy work-arounds, though.

Friday, December 7

One-liners that made me lol

(original theme by Niniane)

"Don't let your beard get in the way of your typing"

A comment on this Reddit thread.

Sunday, September 30

Mistakes that cost lives

From The Economist Plenty of blame to go around, Sep 27th 2007:

"On September 5th Mattel had told an American Congressional committee that
its recall of 17.4m toys containing a small magnet that could be swallowed by
children was due to a flaw in the toys' design, rather than production
flaws in China.

From a BBC article on the 13 August 2007:

"The boss of a Chinese toy firm involved in a huge safety recall has committed suicide, Chinese media has said. Zhang Shuhong, who co-owned the Lee Der Toy Company, was reportedly found dead at his factory in southern China.
About 1.5 million toys made for Fisher Price, a subsidiary of US giant Mattel, were withdrawn from sale earlier this month. Many were made by Lee Der."

(emphasis all mine)
This seems to suggest the suicide was a consequence of the recall and the accusations.
If only the (late, reluctant and shameless) apology could bring Zhang Shuhong's life back...

Plenty of blame to go around
Chinese toy boss 'kills himself'

Saturday, September 29

The eight rules of Dean Carney

From the [very entertaining] book Ugly Americans by Ben Mezrich, the tale of Ivy League traders playing the Asian markets:

  1. Never get into something you can’t get out by the closing bell. Every trade you make, you’re looking for the exit point. Always keep your eye on the exit point.
  2. Don’t ever take anything at face value. Because face value is the biggest lie of any market. Nothing is ever priced at its true worth. The key is to figure out the real, intrinsic value — and get it for much, much less.
  3. One minute, you have your feet on the ground and you’re moving forward. The next minute, the ground is gone and you’re falling. The key is to never land. Keep it in the air as long as you fucking can.
  4. You walk into a room with a grenade, and your best-case scenario is walking back out still holding that grenade. Your worst-case scenario is that the grenade explodes, blowing you into little bloody pieces. The moral of the story: don’t make bets with no upside.
  5. Don’t overthink. If it looks like a duck and quacks like a duck — it’s a duck.
  6. Fear is the greatest motivator. Motivation is what it takes to find profit.
  7. The first place to look for a solution is within the problem itself.
  8. The ends justify the means, but there’s only one end that really matters: Ending up on a beach with a bottle of champagne.

Vaguely Related:

Quotes Entirely Relevant to Software Engineers

Thursday, September 20

5 Business Ideas

Share my ISA (of dubious legality)
On the one hand there are a number of people constrained by the ISA limits but keen on investing their money in a reduced risk, tax efficient way; on the other hand, only a small number of people use or maximise their ISA.
I see an opportunity to restore the supply-demand balance: investors could agree to "lend" money to the non-investors and maximize their ISAs every year, effectively using the ISA allowance on their behalf. The "borrower" would take a (very) small commission for his ISA allowance rental, a lot better than the nothing they would have received had their ISA allowance remained unused.
As the amount accumulates, the compounded interest may allow a renegotiation more favorable to the borrower.

Twitter Services
Services on the move in an area (e.g.: plumbers) update their status and location; interested parties subscribe to the service they need and request a visit if the professional is available and in their neighbourhood.

Rent My Garage
A garage rental marketplace: matchmaking between people with a garage and no car, and people with a car but no garage.

Paper-Holding Notebook Screen
Computer lids that also serve as page holders by either having a pressure system on the side of the screen to hold the pages in between, or having a supporting frame that slides outwards and becomes a vertical paper tray.
This would simplify tremendously the process of typing while reading from paper notes.

The Chosen 1
Unlike current online dating and matchmaking services, The Chosen 1 "game" goes beyond simple string matching.
Male and female participants are plotted on the screen as blue and pink dots resembling stars in the sky. The game challenges participants over time with questions, puzzles, activities, and preferences, some of which are introduced by the participants themselves, others are created by the system. Traits, tastes, preferences, characteristics, and relationships, can change and evolve throughout a season.
Participants can decide whether they want to attract opposites or like-minded personalities.
The Chosen 1 clusters participants creating "galaxies" and "systems" of attraction. Once certain thresholds are broken, participants within a constellation are allowed to chat online. If two participants of opposite sexes (or same sex if they so selected) are so close that they end up in a cluster of two, they are each others' Chosen 1 and given each others' contacts.
Game Over.

Because ideas are easy, doing stuff is hard. True, true.

Vaguely Related:
Business need: Digg for Consumer Electronics

Wednesday, September 19

Facebook is the new MySpace (definitely)

Morning chat I overheard on the tube:

Little Girl 1: I wanna get into facebook
Little Boy: yeah, everyone's in it
Little Girl 2: wha' is it?
Little Girl 1: it's kinda like sister's in it too. I really want to join

Tuesday, September 18

Saturday, September 15

Barclays needs central bank loan too

From the BBC Business News, 31 August 2007:

Barclays says that a "technical breakdown" in the UK's clearing system forced it to borrow £1.6bn from the Bank of England.

It is the second time this month that the bank has tapped into the central bank's emergency credit line, sparking fears it is facing a cash crisis.
On 20 August, Barclays was forced to take out a £314m loan from the Bank of England after HSBC was unable to process a last-minute request for the money.

No bank run for Barclays, though.

Wednesday, September 5

Better living through (unit) testing and advice

A few weeks ago I (accidentally) found out that a pet-project of mine, the cross-lingual instant messaging, wasn't working anymore. Debugging showed the "3rd party translation sub-system" was always returning an empty translation.

The fix revealed my development and maintenance process was flawed:

  1. It was time consuming to identify the root cause;
  2. My unit tests didn't address that particular scenario;
  3. I wasn't notified when the problem first occurred, and so might have been unaware of the existence of a problem for God knows how long;
  4. There was no user friendly message given to the user, who was left with emptiness where a translated message was expected;

The course of action required addressing each of the above listed issues:
  1. Identify and fix the class of problems observed;
  2. Improve prevention and identification of similar problems in the future, by expanding the coverage and improving the quality of the unit-tests;
  3. Introduce a prevention and notification mechanism;
  4. Improve the user experience by providing user-friendly error messages;

1) Identify and fix the class of problems observed
I just had to find out what changed in the 3rd party API and adapt my parser accordingly. Thanks to Spring, this involved a small change in an XML file and didn't require recompiling the code.
Now the next step forward is trying to automate this detection and adaptation process. This is still a work in progress and will merit a post of its own.

2) Improve coverage and quality of unit-tests
Unit-tests that don't address particular scenarios are text-book cases of code smells. Here's the signature for my translation services:

public interface ITranslationEngine {
    /**
     * Translates 'text' from a given 'fromLanguage' into a given 'toLanguage'
     * @param fromLanguage
     * @param toLanguage
     * @param text
     * @return the translated text
     * @throws TranslationException
     */
    public String translate(String fromLanguage, String toLanguage, String text) throws TranslationException;
}

Implementations of the method translate above were responsible for the 3 tasks below:
  1. send request to the 3rd party translation sub-system with the original message to translate, the source language, and the destination language;
  2. fetch response from the translation sub-system;
  3. parse response, extract and return translated string;

There weren't any unit-tests for any of these tasks because these multiple tasks were encapsulated under the one single method translate. When a method is long and/or does too much, it is harder to test. The method wasn't fine-grained enough to allow proper unit-test coverage, and there were at least two anti-patterns present in the tests:
  • The One, where a unit-test contains one test method which tests an entire set of functionality;
  • The Superficial Coverage, where exceptional conditions are missing from the test cases;

The fix involved extracting methods to simplify testing. The original translate method was refactored and each of the tasks listed above extracted into a well-named method. I added unit-tests that focused on each extracted method and considered input and output data in valid and exceptional scenarios. Loose coupling and testability go hand in hand.
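The idea can be sketched in a few lines. The example below is in Python rather than Java for brevity, and everything in it is hypothetical: the parse_response name, the TranslationError class, and the "translation=" wire format are made up for illustration and are not the project's real code. It shows the extracted parsing step being tested on its own, in valid and exceptional scenarios.

```python
import unittest

class TranslationError(Exception):
    """Raised when the third-party response cannot be parsed."""

def parse_response(raw):
    """Extract the translated string from a raw response.

    The 'translation=' prefix is a made-up wire format for this sketch.
    """
    prefix = "translation="
    if raw is None or not raw.startswith(prefix):
        raise TranslationError("unexpected response: %r" % (raw,))
    return raw[len(prefix):].strip()

class ParseResponseTest(unittest.TestCase):
    def test_valid_response(self):
        self.assertEqual(parse_response("translation= hola "), "hola")

    def test_empty_translation_is_returned_as_empty_string(self):
        self.assertEqual(parse_response("translation="), "")

    def test_changed_format_raises(self):
        with self.assertRaises(TranslationError):
            parse_response("<html>new layout</html>")

    def test_missing_response_raises(self):
        with self.assertRaises(TranslationError):
            parse_response(None)

# Run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseResponseTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the parser no longer needs a live request, a change in the 3rd party format shows up as a failing test instead of an empty message in production.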

3) Introduce a notification mechanism
To address this I implemented an Advice, a Spring schema-based AOP, to intercept the translate method, and check its arguments and return String. If the returned string is empty, a notification email is sent. The code snippet is shown below:

<aop:aspect id="translationEngineInterceptor" ref="translationEngineAdvice">
    <aop:after-returning method="notifyByEmail" returning="translatedString"
        pointcut="execution(* translate(String, String, String)) and args(from, to, text)" />
</aop:aspect>

<bean id="translationEngineAdvice" class="org.jiim.translation.TranslationEngineAdvice">
    <property name="mailSender" ref="mailSender"/>
    <property name="mailMessage" ref="mailMessage"/>
</bean>


import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.JoinPoint;
import org.springframework.mail.MailException;
import org.springframework.mail.MailSender;
import org.springframework.mail.SimpleMailMessage;

public class TranslationEngineAdvice {
    protected final Log logger = LogFactory.getLog(getClass());
    private MailSender mailSender;
    private SimpleMailMessage mailMessage;
    public static final String NEW_LINE = System.getProperty("line.separator");

    // Setters and getters omitted for clarity

    public boolean notifyByEmail(JoinPoint jp, Object translatedString, String from, String to, String text) {
        String s = (String) translatedString;
        // notify only when a non-empty message comes back as an empty translation
        if (!"".equals(text) && "".equals(s.trim())) {
            SimpleMailMessage msg = new SimpleMailMessage(this.mailMessage);
            StringBuffer txt = new StringBuffer(msg.getText());
            txt.append(NEW_LINE).append("Date [").append(new Date()).append("]");
            txt.append(NEW_LINE).append("From language [").append(from).append("]");
            txt.append(NEW_LINE).append("To language [").append(to).append("]");
            txt.append(NEW_LINE).append("Original Text [").append(text).append("]").append(NEW_LINE);
            msg.setText(txt.toString());
            try {
                this.mailSender.send(msg);
                return true;
            } catch (MailException me) {
                logger.error("Error sending email notification ", me);
                return false;
            }
        }
        return false;
    }
}
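Stripped of the Spring wiring, the advice is an "after returning" hook: run the method, inspect the result, notify if it is empty. A rough Python equivalent is a decorator. The names below (notify_on_empty, broken_translate, the sent list standing in for the mail sender) are illustrative assumptions, not part of the application.

```python
import functools

def notify_on_empty(notify):
    """Wrap a translate-like function; call notify() when a non-empty
    text comes back as an empty translation (same check as the advice)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(from_language, to_language, text):
            result = func(from_language, to_language, text)
            if text != "" and (result or "").strip() == "":
                notify(from_language, to_language, text)
            return result
        return wrapper
    return decorator

sent = []  # stand-in for the mail sender: just records notifications

@notify_on_empty(lambda f, t, text: sent.append((f, t, text)))
def broken_translate(from_language, to_language, text):
    return ""  # simulates the third-party system returning nothing

broken_translate("en", "pt", "hello")
broken_translate("en", "pt", "")      # empty input: no notification
print(sent)
```

The wrapped function knows nothing about notifications, which is the same separation the AOP configuration buys in the Java version.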

4) Improve the user experience
I made a more considerate application by fetching a localized user friendly message to display if the translated message is empty.

Extracting Methods to Simplify Testing
Making Considerate Software
JUnit Anti-Patterns
TDD Anti-Pattern Catalogue
Reusable Advice - Part II: Unobtrusive Notification

Sunday, September 2

Rhino + Maven for JavaScript compression @ build time

AJAX-based Rich Internet Applications (RIAs) make heavy use of JavaScript. To improve the user experience of RIAs we can minimize the number of (JavaScript) file requests and reduce their size.

These concatenation and compression operations can be done on-the-fly or at build time.
On-the-fly techniques intercept requests and perform merging and/or compression in real-time. They are simple(r) to implement, often involving URL rewriting combined with a compress script, but:

  • don't scale, since compression costs grow proportionately to the number and size of the files;
  • add latency, by consuming CPU resources during (precious) server response time;
  • are harder to test and debug;
  • rely heavily on caching for continued performance improvements, when research indicates 40-60% of users browse with an empty cache;
  • don't work on all browsers

Compression at build time allows the use of more robust, often slower, compression methods (e.g.: a proper JavaScript interpreter instead of a regular expression) and/or combination of different methods (e.g.: run YUI Compressor after Rhino).
At build time we're unconstrained by the "real-time performance pressures".

Here's a step-by-step guide for a Maven/Ant build file using Mozilla's Rhino to serve pre-compressed JavaScript and significantly reduce application load time.

  1. Download the rhino.jar from Dojo ShrinkSafe
  2. Download, install, and configure Maven
  3. Copy the rhino.jar to your Maven repository
  4. Have a Maven-based Web-application ready

2. Add rhino.jar as a dependency to the POM


3. Set up some useful variables in maven.xml

<!-- destination for the compressed JavaScript files -->
<j:set var="js.compression.dir" value="${typically_the_war_src_dir}"/>
<j:set var="js.compression.skip" value="false"/> <!-- enable/disable compression -->
<j:set var="js.compressor.lib.path" value=""/> <!-- path to the compressor jar -->

4. Fetch compressor and make it globally available

<goal name="get-compressor" description="Set path to JavaScript compressor library">
    <!-- get compressor from dependencies, to allow standalone use of goal -->
    <j:if test="${empty(js.compressor.lib.path)}">
        <ant:echo message="Fetching JavaScript compressor from repository" />
        <j:forEach var="lib" items="${pom.artifacts}">
            <j:if test="${lib.dependency.artifactId == 'custom_rhino'}">
                <lib dir="${maven.repo.local}">
                    <include name="${lib.path}"/>
                </lib>
                <j:set var="js.compressor.lib.path" value="${lib.path}"/>
            </j:if>
        </j:forEach>
    </j:if>
    <ant:echo>Path to JavaScript compressor library is ${js.compressor.lib.path}</ant:echo>
</goal>

This way, get-compressor can be reused as a pre-requisite of all compression tasks.

5. Aggregate compression sub-tasks

<goal name="compress-js" description="Compress JavaScript across all site components">
    <!-- Different sections might have different compression requirements -->
    <j:if test="${js.compression.skip == 'false'}">
        <attainGoal name="compress-site-js"/>
        <attainGoal name="compress-forum-js"/>
    </j:if>
</goal>

6. Compression task (to compress a single file)

<goal name="compress-site-js" prereqs="get-compressor" description="Compress JavaScript files for 'site'">
<ant:echo message="Compressing JavaScript files for 'site'" />
<j:set var="stripLinebreaks" value="true" />
<j:set var="js.dir" value="${path_to_javascript_files}"/>
<concat destfile="${js.dir}/site-concat.temp" force="yes">
<filelist dir="${js.dir}"
files="file_to_merge.js, another_file_to_merge.js, etc.js"/>
<ant:arg value="-c"/>
<ant:arg value="${js.dir}/site-concat.temp"/>
<!-- move compressed files back from dest to src dir -->
<ant:move file="${js.dir}/site-breaks.temp" tofile="${js.dir}/site_c.js" filtering="true">
<j:if test="${stripLinebreaks == 'true'}">
<ant:echo message="Removing line breaks" />
<delete file="${js.dir}/site-concat.temp"/>

This goal concatenates the 3 JavaScript files file_to_merge.js, another_file_to_merge.js,
and etc.js into a single (temporary) file site-concat.temp.
It then compresses site-concat.temp into site-breaks.temp (the -breaks suffix indicates the compressed file still contains line breaks).
Finally, it moves the content of site-breaks.temp into site_c.js (optionally removing line breaks) and cleans up any temporary files and directories.

7. Compression task (to compress many independent files)

<goal name="compress-forum-js" prereqs="get-compressor" description="Compress JavaScript files for 'forum'">
<j:set var="stripLinebreaks" value="true"/>
<j:set var="src.dir" value="${path_to_javascript_files}"/>
<!-- Delete previously compressed files -->
<ant:fileset dir="${src.dir}" includes="*_c.js"/>
<ant:echo message="Compressing 'forum' JavaScript files from ${src.dir}" />
<!-- create temp dir for compressed files -->
<ant:mkdir dir="${src.dir}/_compressedjs"/>
<j:set var="dest.dir" value="${src.dir}/_compressedjs"/>
<!-- compile JavaScript files to compress -->
<ant:fileScanner var="forumJSFiles">
<ant:fileset dir="${src.dir}" casesensitive="yes">
<ant:include name="file_to_compress.js"/>
<ant:include name="another_file_to_compress.js"/>
<ant:exclude name="*_c.js"/>
<!-- loop through files and compress using compressor set in 'get-compressor' goal -->
<j:forEach var="jsFile" items="${forumJSFiles.iterator()}">
<ant:echo message="Compressing ${}" />
<ant:arg value="-c"/>
<ant:arg value="${src.dir}/${}"/>
<!-- move compressed files back from dest to src dir -->
<ant:move todir="${src.dir}" filtering="true">
<ant:fileset dir="${dest.dir}" casesensitive="yes">
<ant:include name="*.js"/>
<j:if test="${stripLinebreaks == 'true'}">
<ant:echo message="Removing line breaks" />
<ant:mapper type="glob" from="*.js" to="*_c.js"/>
<!-- delete temp dir -->
<ant:delete dir="${dest.dir}"/>

This goal loops through a list of JavaScript files compressing them one by one.


Rhino compression removes all comments, so beware of IE-specific conditional compilation statements such as the snippet below (taken from

var xmlhttp=false;
/*@cc_on @*/
/*@if (@_jscript_version >= 5)
// JScript gives us Conditional compilation, we can cope with old IE versions.
// and security blocked creation of the objects.
 try {
  xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
 } catch (e) {
  try {
   xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
  } catch (E) {
   xmlhttp = false;
  }
 }
@end @*/

There are many possible ways around this issue, namely:
  • Use Ant to concatenate the critical code sections with the compressed files after compression
  • Move all critical sections to a separate, uncompressed file
  • Move all critical sections inline
  • Fix the compressor and submit your patch to Dojo :)


When I implemented the solution above, more than a year ago, a common complaint from fellow developers was that the compression task had the undesirable side-effect of slowing down the build, biiig time. A colleague pointed me towards Ant's Uptodate task, which I then used to implement conditional compression. This means files were only compressed if they had been changed. Conditional compression reduced the automated compression time from about 1 minute to 5 seconds on average.

To use conditional compression, replace the placeholder comment at the end of the script below with the content of any of the compression goals above.

<goal name="conditional-compress-js" prereqs="get-compressor"
description="Conditional compression of JavaScript files">
<j:set var="src.dir" value="${path_to_javascript_files}"/>
<!-- check timestamps to see if compression is required -->
<ant:echo message="Checking timestamps of JavaScript files from ${src.dir}" />
<ant:fileScanner var="jsFiles">
<ant:fileset dir="${src.dir}" casesensitive="yes">
<ant:include name="**/*.js"/>
<ant:exclude name="**/*_c.js"/>
<j:forEach var="jsFile" items="${jsFiles.iterator()}">
<ant:echo message="Checking last-modified-date of ${}" />
<uptodate property="js.modified" targetfile="${src.dir}/${}">
<srcfiles dir="${src.dir}" includes="**/*_c.js" />
<j:if test="${js.modified}">
<j:set var="compression.required" value="true"/>
<j:if test="${compression.required}">
<ant:echo message="Modified JavaScript files. Compression required." />

<!-- insert compression block here -->
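The underlying check is just a timestamp comparison: a file needs compressing when its compressed copy is missing or older than the source. The Python sketch below illustrates that idea outside of Maven and Ant. The function names are mine, the _c.js suffix follows the naming used in the goals above, and this is not a re-implementation of Ant's Uptodate task.

```python
import os

def needs_compression(source_path, compressed_path):
    """True if the compressed file is missing or older than its source."""
    if not os.path.exists(compressed_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(compressed_path)

def compressed_name(source_path):
    """site.js -> site_c.js, following the _c.js naming used above."""
    root, ext = os.path.splitext(source_path)
    return root + "_c" + ext

def files_to_compress(src_dir):
    """List the .js sources in src_dir whose compressed copy is stale."""
    stale = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".js") or name.endswith("_c.js"):
            continue
        source = os.path.join(src_dir, name)
        if needs_compression(source, compressed_name(source)):
            stale.append(name)
    return stale
```

Calling files_to_compress on the JavaScript directory returns only the files worth handing to the compressor, so an unchanged tree costs a directory listing and a few stat calls.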

10. Further (potential) Improvements
  • Clever use of Caching, to avoid unnecessary downloading of unmodified resources.
  • Request parameter trigger to alternate between compressed and uncompressed JavaScript in real-time, a feature that is very useful for testing and debugging.
  • Very clever use of Versioning to aid with caching; my friend and former colleague Robert, responsible for an ingenious solution, can write about it in a post of his own.
  • In specific and controlled situations JavaScript namespaces can be shortened with tools like Ant, e.g.: <replace file="${file.js}" token="the.long.api.namespace" value="__u._a"/>.
  • Clever use of HTTP compression


  • I dislike the verbosity (by the very nature of XML) and obtrusiveness in the build configuration file.
  • I believe the Maven plugin approach (in combination with Julien Lecomte's excellent YUI Compressor) is a better solution.
  • With a few changes the scripts above can also be used to compress CSS resources at build time.

(Vaguely) Related Posts:
Measuring client-side performance of Web-apps
CSS clean-up @ build time

Custom on-the-fly compression
Make your pages load faster by combining and compressing javascript and css files

On-the-fly server compression
gzip, where have you been all my life..?

Compression @ build time
YUI compressor
Maven plugin for the YUI Compressor

Relevant Articles
YUI Performance Research - Part 1
YUI Performance Research - Part 2
Minification v Obfuscation
Serving JavaScript Fast
Using The XML HTTP Request
Response Time: Eight Seconds Plus Or Minus Two
Optimizing Page Load Time
Speed Web delivery with HTTP compression

Friday, August 31

Reusable Advice - Part II: Unobtrusive Notification

Besides encapsulating behaviors that affect multiple classes into reusable modules, AOP can also be used for unobtrusive notification tasks.
Instead of browsing application logs looking for exceptions, for example, I like to have the application itself email me the exception stack-trace and the arguments that caused the error when something goes wrong.

Here's an example advice that sends an email after an exception is thrown. The date of occurrence and the exception stack-trace are sent in the body of the email.

The Java code for the advice:

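import java.util.Date;
import org.springframework.mail.MailException;
import org.springframework.mail.MailSender;
import org.springframework.mail.SimpleMailMessage;

public class ThrowableAdvice {
    private MailSender mailSender;
    private SimpleMailMessage mailMessage;
    public static final String NEW_LINE = System.getProperty("line.separator");

    // Setters, getters and logger declaration omitted for clarity

    /**
     * Send email notification with exception stacktrace
     * @param throwable
     * @return boolean true if email message is sent, false otherwise
     */
    public boolean notifyByEmail(Throwable throwable) {
        // copy the message template so the shared bean is never modified
        SimpleMailMessage msg = new SimpleMailMessage(this.mailMessage);
        StringBuffer txt = new StringBuffer(msg.getText());
        txt.append(NEW_LINE).append(new Date());
        // append the stack-trace (see stackTraceToString below)
        txt.append(NEW_LINE).append(stackTraceToString(throwable));
        msg.setText(txt.toString());
        try {
            this.mailSender.send(msg);
            return true;
        } catch (MailException me) {
            logger.error("Error sending email notification ", me);
            return false;
        }
    }
}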

A useful method to "print" a stack-trace onto a String:

public static String stackTraceToString(Throwable throwable) {
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    throwable.printStackTrace(pw); // write the trace into the StringWriter
    return sw.getBuffer().toString();
}

And a snippet of the Spring XML configuration:

<aop:aspect id="appThrowableInterceptor" ref="throwableAdvice">
  <aop:after-throwing method="notifyByEmail" throwing="throwable"
      pointcut="execution(* some.package.*.*(..))" />
</aop:aspect>

<bean id="throwableAdvice" class="ThrowableAdvice">
  <property name="mailSender" ref="mailSender"/>
  <property name="mailMessage" ref="throwableMessage"/>
</bean>

<bean id="mailSender" class="org.springframework.mail.javamail.JavaMailSenderImpl">
  <property name="host" value="localhost"/>
</bean>

<bean id="throwableMessage" class="org.springframework.mail.SimpleMailMessage">
  <property name="subject" value="ALERT! Exception thrown @ some app module"/>
  <property name="text" value="ALERT! Exception thrown from some module"/>
  <property name="from" value=""/>
  <property name="to">
    <!-- recipient address(es) go here -->
  </property>
</bean>

This is as flexible and unobtrusive as it gets. The throwableAdvice advice above can be:
  • easily reused by specifying a different pointcut (the pointcut attribute in the configuration above);
  • removed altogether without needing to touch the advised (target) code.

Saturday, August 18

Compulsive Attention Seeker

A few weeks ago, I approached a TfL staff member (let's call him SM) after failing to exit the gate with my Oyster card. Chat transcript below.

Me: "Hi, I can't get out and the machine keeps telling me to seek attention..."

SM (grinning and slowly printing each word in my head): "Seek attention? Are you sure the machine told you to seek attention?"

Me (mildly annoyed and a tad confused): "Yes, it displayed the words seek attention in flashing red letters!"

This goes on for a few more rounds until SM, visibly amused, opens the gate and lets me through.


Today's note to self: the gate screens actually display the words "Seek Assistance"...

Sunday, August 12

Resources for learning Chinese (language & culture)
Download free .mp3 podcasts with the basic, free account. Each lesson is a short and entertaining conversation between an English native and a Chinese speaker.
There is also plenty of activity in the discussion panels, where community members help each other and discuss the spoken and written details of the lessons.
Very thorough and authoritative course material developed by the United States government. Each course module contains audio (.mp3), a student textbook (in .PDF format), and a workbook. The lessons are impeccable and demanding.
"This website is aimed at contributing to a better understanding of the Chinese languages and how romanization can be used to write languages traditionally associated with Chinese characters (such as Japanese, Korean, and especially Mandarin Chinese)."
Very academic with plenty of excellent scholarly essays, such as:
Online dictionary of Chinese characters.
Online, one-on-one tutoring with native Chinese teachers. Similar to ChinesePod but access to most content requires paid subscription.

[Update, Jan 2013]:

[Update, Mar 2010 via Mariam]:
Learn Chinese and Japanese characters by actually writing them from scratch. Much better than this Taiwanese school at teaching pronunciation and stroke order.
Slick, easy-to-use user interface.

[Update, Apr 2008]:
Dictionary, conversations (text-to-speech annotated with hànzì and pīnyīn), Q & A space, tools, an awesome handwriting recognition tool that lets you draw Chinese characters, and plenty more cool stuff!

[Update, Dec 2007]:
Free Foreign Language Courses Online

BBC Languages - Mandarin Chinese

MIT - Foreign Languages and Literatures

Utah University - Languages, Philosophy and Speech Communication - Learn to Speak Mandarin Chinese Online

Other interesting resources:
Language Centre in the School of Oriental and African Studies (SOAS) at the University of London.

Comparing American and Chinese Negotiation Styles (Video)

How hard is it to learn Chinese? (BBC News)

88 MOCCA - The Museum of Chinese Contemporary Art on the web

Chinese Contemporary

UK Chinese Music

Beginner's Chinese

I know I'll be 'intermediate material' the day I am able to karaoke to this song...

Friday, August 10

CRUD unit-testing checklist

I'm writing a unit-test code generator for CRUD operations to keep me from repeating myself. In preparation for this, I'm compiling a comprehensive set of CRUD unit-tests beyond the eponymous ones.

So, for a given business object Foo I may want to test:


public void TestCreate_NullFoo() {
    try {
        PersistenceServices.create(null);
        fail("Expected exception not thrown");
    } catch (PersistenceServiceException pse) {
        // expected
    }
}

// possible test, depends on requirements
public void TestCreate_EmptyFoo() {
    Foo foo = new Foo(); // no properties set
    try {
        PersistenceServices.create(foo);
        fail("Expected exception not thrown");
    } catch (PersistenceServiceException pse) {
        // expected
    }
}

// depends on requirements; validation shouldn't happen here
public void TestCreate_InvalidFoo() {
    Foo foo = new Foo();
    foo.set...; // populate with known invalid values
    try {
        PersistenceServices.create(foo);
        fail("Expected exception not thrown");
    } catch (PersistenceServiceException pse) {
        // expected
    }
}

// depends on requirements; validation shouldn't happen here
public void TestCreate_IncompleteFoo() {
    Foo foo = new Foo();
    foo.set...; // miss required values
    try {
        PersistenceServices.create(foo);
        fail("Expected exception not thrown");
    } catch (PersistenceServiceException pse) {
        // expected
    }
}

public void TestCreate_IncrementsCount() {
    int beforeCreate = PersistenceServices.countFoos();
    Foo foo = new Foo();
    foo.set...; // populate with required values
    PersistenceServices.create(foo);
    int afterCreate = PersistenceServices.countFoos();
    assertTrue(afterCreate > beforeCreate);
}

public void TestCreate_IsNotIdempotent() {
    Foo foo = new Foo();
    foo.set...; // populate with required values
    Foo aCreated = PersistenceServices.create(foo);
    Foo bCreated = PersistenceServices.create(foo);
    assertNotEquals(aCreated, bCreated);
}

public void TestCreate_Foo() {
    Foo foo = new Foo();
    foo.set...; // populate with required & validated values
    Foo created = PersistenceServices.create(foo);
    assertEquals(foo, created);
}


public void TestRead_NullFoo() {
    Foo read = PersistenceServices.getFoo(null);
}

// depends on requirements
public void TestRead_EmptyFooId() {
    Foo read = PersistenceServices.getFoo("");
}

public void TestRead_DoesNotIncrementCount() {
    int beforeRead = PersistenceServices.countFoos();
    PersistenceServices.getFoo(aFoo); // aFoo is known, existing Foo
    int afterRead = PersistenceServices.countFoos();
    assertEquals(beforeRead, afterRead);
}

public void TestRead_DuplicateFoo() {} // see IsIdempotent below

public void TestRead_IsIdempotent() {
    Foo aRead = PersistenceServices.getFoo(aFoo);
    Foo bRead = PersistenceServices.getFoo(aFoo);
    assertEquals(aRead, bRead);
}

// depends on requirements; could throw friendly exception
public void TestRead_DeletedFoo() {
    Foo deleted = PersistenceServices.delete(aFoo);
    Foo read = PersistenceServices.getFoo(deleted);
}

public void TestRead_FooById() {
    Foo foo = new Foo();
    foo.set...; // populate with required & validated values
    Foo created = PersistenceServices.create(foo);
    Foo read = PersistenceServices.getById(/* id of created */);
    assertEquals(foo, read);
}


public void TestUpdate_NullFoo() {}

public void TestUpdate_EmptyFoo() {}

public void TestUpdate_InvalidFoo() {}

public void TestUpdate_IncompleteFoo() {}

public void TestUpdate_DuplicateFoo() {}

public void TestUpdate_UnchangedFoo() {}

public void TestUpdate_DoesNotIncrementCount() {}

public void TestUpdate_IsIdempotent() {}

public void TestUpdate_UndoFoo() {}

public void TestUpdate_Foo() {}


public void TestDelete_NullFoo() {
    Foo deleted = PersistenceServices.delete(null);
}

public void TestDelete_EmptyFoo() { // dependent on business requirements
    Foo foo = new Foo(); // no properties set
    Foo deleted = PersistenceServices.delete(foo);
}

public void TestDelete_InvalidFoo() {}

public void TestDelete_IncompleteFoo() {}

public void TestDelete_DuplicateFoo() {}

public void TestDelete_DecrementsCount() {
    // aFoo is a known, existing Foo
    int beforeDelete = PersistenceServices.countFoos();
    PersistenceServices.delete(aFoo);
    int afterDelete = PersistenceServices.countFoos();
    assertTrue(beforeDelete > afterDelete);
}

public void TestDelete_IsIdempotent() {
    Foo deleted = PersistenceServices.delete(aFoo);
    Foo read = PersistenceServices.getFoo(aFoo);
    deleted = PersistenceServices.delete(aFoo);
    read = PersistenceServices.getFoo(aFoo);
}

public void TestDelete_ExistingFoo() {
    Foo deleted = PersistenceServices.delete(aFoo);
    Foo read = PersistenceServices.getFoo(aFoo);
}

public void TestDelete_MissingFoo() {
    // aFoo does not exist
    Foo deleted = PersistenceServices.delete(aFoo);
    assertNull(deleted); // could expect exception
}

// depends on requirements
public void TestDelete_ReturnsDeleted() {
    Foo aFoo = PersistenceServices.getById(/* id of a known, existing Foo */);
    Foo deleted = PersistenceServices.delete(aFoo);
    assertEquals(aFoo, deleted);
}

Sunday, July 22

Sunday Talks: Scalability @ YouTube

I've been watching the videos of the talks from the Seattle Conference on Scalability. These are my notes from the YouTube Scalability talk by Cuong Do, YouTube Engineering Manager. It's pretty amazing how a team of only 9 (2 developers, 2 sysadmins, 2 scalability architects, 2 network engineers, and 1 DBA) grew YouTube from nothing to delivering over 100 million videos/day (prior to the Google acquisition).

  1. Summary of the engineering process: deploy quickly, identify bottlenecks (they will happen), fix, and repeat; constant iteration, no premature optimization.
  2. They wrote their own web server in Python. Why? Python is fast enough, with page service times under 100ms. Development speed was critical, whereas the benefits of raw speed on the server were negligible. Critical sections (bottlenecks) were then surgically optimized: compiling to C, writing C extensions, pre-generating and caching HTML, etc.
  3. Videos are hosted by mini-clusters: small groups of machines that serve the exact same set of videos and provide scalability, fault-tolerance, and replication.
  4. Popular content (the head of the 'long tail') is stored in CDNs. I find this 'vicious cycle' very interesting: new users are channeled to the popular lists, which keeps the majority from randomly hitting the long tail of content all the time; the head of the long tail is highly tuned for fast and efficient delivery, thus increasing (and perpetuating) the lists' popularity.
  5. Surprisingly, for a site streaming gigabytes of video per day, storing and serving thumbnails caused major problems: OS limitations on the number of files in a directory, a high number of small requests (~4+ times more thumbnails than videos), etc.
  6. Asynchronous replication of MySQL is a bottleneck: replicas can fall behind the master and serve old data; the replication process is a single thread, which causes replica lag; too many read replicas; etc. They introduced replica pools as a temporary solution: users watching videos are served from a 'watch pool', and everything else from a different pool (damage containment).
  7. Finally settled on DB shards (non-overlapping DB partitions).
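
To make the idea of non-overlapping partitions in point 7 concrete, here is a minimal Python sketch. My notes don't record how YouTube actually assigned data to shards, so the hash-based scheme, the function name shard_for and the shard count are all assumptions made for illustration only.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to exactly one of shard_count partitions (hypothetical scheme)."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Every user id lands on one, and only one, shard.
for user_id in ("alice", "bob", "carol"):
    print(user_id, "-> shard", shard_for(user_id, 4))
```

Because each key maps to a single shard, reads and writes for that key only ever touch one database, which is what removes the single-master replication bottleneck described in point 6. The price of this simple modulo scheme is that changing the number of shards moves most keys.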

[Update]: Notes from others

Sunday Music: Interpol & White Stripes

Wednesday, June 20

CSS clean-up @ build time

Here is a simple Maven goal to optimize CSS delivery at build time and at low cost.

<!-- set path to CSS files -->
<set var="css.cleanup.src.dir" value="path_to_css_files" />

<goal name="cleanup-css" description="Remove comments and blank lines from CSS files">
  <echo message="Removing comments and blank lines from files in ${css.cleanup.src.dir}" />
  <!--
    remove single+multiline comments,
    see docs@
  -->
  <replaceregexp byline="false" flags="gm">
    <regexp pattern="\/\*(.|\s)*?\*\/" />
    <substitution expression="" />
    <fileset dir="${css.cleanup.src.dir}">
      <include name="*.css" />
    </fileset>
  </replaceregexp>

  <!-- remove multiple blank lines -->
  <replaceregexp byline="false" flags="gm">
    <regexp pattern="^\r\n|^\n" />
    <substitution expression="" />
    <fileset dir="${css.cleanup.src.dir}">
      <include name="*.css" />
    </fileset>
  </replaceregexp>
</goal>
Is it really worth it?
I tested it on 5 CSS files (test1.css, test2.css, test3.css, test4.css, test5.css) of different sizes and obtained a respectable reduction in file size: the originals were between 4.57% and 36.54% larger than their cleaned-up versions. In addition, dev comments are not shipped and exposed to the outside world.

Below is the output of the Python stats scripts I wrote to analyse situations like this (testX_c.css is the compressed version of the original file):

>>> describe(cmpfilesizelist(d, listfiles(d, "*.css")))

file test1.css is 4.57% larger than test1_c.css
file test2.css is 33.40% larger than test2_c.css
file test3.css is 21.93% larger than test3_c.css
file test4.css is 34.55% larger than test4_c.css
file test5.css is 36.54% larger than test5_c.css

('Sample Size: 5', 'Min/Max: ', (4.5725935634192512, 36.543124350536885), 'Mean: 26.20', 'Median: 33.40', 'Std. Dev: 13.37', 'Std. Error: 5.98')
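
The scripts themselves are not listed here, but the summary line can be reproduced with the standard library. The sketch below is a stand-in, not the original code: percent_larger and summarize are hypothetical names, and the inputs are the five rounded percentages printed above.

```python
import math
import statistics


def percent_larger(original_size, compressed_size):
    """How much larger (in %) the original file is than the compressed one."""
    return (original_size - compressed_size) / compressed_size * 100.0


def summarize(values):
    """Summary statistics matching the fields of the tuple printed above."""
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "sample_size": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(len(values)),
    }


stats = summarize([4.57, 33.40, 21.93, 34.55, 36.54])
print({name: round(value, 2) for name, value in stats.items()})
```

Rounded to two decimals, this gives the same mean (26.2), median (33.4), standard deviation (13.37) and standard error (5.98) as the output above, which also shows that the reported standard deviation is the sample (n - 1) one.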

The compressed files also validated on the W3C CSS validator.

Monday, June 18

Measuring client-side performance of Web-apps

About 80% of Web-application loading time is spent on the browser. Since users prefer faster web sites, the user experience is affected by what happens on the browser.
Thus, it is possible to improve the user experience by breaking down, measuring, and optimizing the activities on the browser.

Enter Page Detailer, a graphical tool that measures client side performance of Web pages.
Page Detailer assesses performance from the client's perspective by showing how the page was delivered to the browser. It provides detailed graphical and tabular views about the timing, size, data flow, and identity of each item in a page, shown in the order started by the browser.

It has a very simple interface and is very easy to use: it will automatically capture, time, and plot calls made by the browser.

It is very useful to:

  • track number of items requested by page (how many files? From how many different servers?)
  • track response time per request/item
  • track connection time per request/item (loading many small files from too many servers? Requesting too many small files?)
  • check overall page size and load time
  • check page structure and data flow
  • optimize organization of content
  • compare page load time using different transport mechanisms

The captured data does not include the browser's rendering time, and does not time separately any intelligent use of lazy loading techniques. Thus, it is not an exact mirror of the end user experience.

There are trade-offs between client-side performance optimization and improvements to the (holistic) user experience, e.g.: uncompressed items 'travel slower' but render faster on the browser and vice-versa.

Download from IBM alphaWorks @

Similar tools

  • Web Page Analyzer - Web-based Website performance tool and Web page speed analysis
  • WebWait - simple Web-based website timer: type a URL and see how long it takes to load; useful to benchmark a website or test the speed of a web connection.
  • Pingdom - Web-based monitoring suite which also tests the load time of a web page including all its objects.
  • eValid - testing & analysis suite with recording and playback options.


Tuesday, May 29

Superman+Spiderwoman(?!), Bollywood style

This made my week, gotta love the flying special effects :D

Stumbling on happiness

Daniel Gilbert's fantastic book won the Royal Society prize for science books.
Two other books in the same league for me were:

  1. The Undercover Economist, by Tim Harford
  2. The Paradox of Choice: Why More Is Less, by Barry Schwartz

Tuesday, May 22

Thursday, May 17

Efficient JavaScript talk

Watch it @ the YUI Theater or download for Quicktime

5 lessons to retain from the talk:

  1. Don't modify visible objects: hide, modify, display.
  2. Best way to append multiple elements: remove node from DOM, append elements, re-attach element to DOM.
  3. Beware of the markup maintenance issues caused by the use of innerHTML.
  4. Avoid internal browser reflow (repositioning) by caching element properties.
  5. Rather than binding events to every element inside a container, bind only to the parent container. It is easy to access each of its children by id.

Wednesday, May 16

The Google Mission

I often hear "predictions" about the inevitability of a Google downfall because they are stretching themselves far beyond search. The Economist claims:

"A leapfrog may seem unlikely, given Google's reputation as an innovator, but its diversification into so many fields beyond web-searching might yet cause it to stumble"
from Out-googled, May 10th 2007.

Quoting Google's Corporate Information page:
"Google's mission is to organize the world's information and make it universally accessible and useful."
The word 'search' is not mentioned in the mission statement. Thus, the core search service as we know it is simply a means to an end, an entry point to a much more ambitious goal: "organize the world's information and make it universally accessible and useful".
Docs, apps, email, IM, and the sometimes misunderstood calendar, are all perfectly aligned with the company's strategy (unsurprisingly).
In fact, as an information organizer and management tool, a calendar is closer to their mission statement than a search box.

Tuesday, May 15


Jia on bubble skirt

Last Sunday, walking down Portobello Road, a couple of photographers from The Style Scout spotted this young beauty in a unique and underused bubble skirt. I, of course, had to get out of the way; no room for trend-wasters in the A-list fashion world :)

Monday, May 14

New house, new address

This is a list of the institutions I have notified of my address change, for future reference:

  1. Abbey National (Cash ISA): filled-in paper form in person, confirmed a few days later and address was successfully changed.
  2. Amazon: Added new address to my account and set it as the default one. It was that simple.
  3. Barclays (Personal Banking): Filled-in paper form in person, returned a week later to find old address still on the system. Account manager 'apologised' and changed address on the spot (on the system, no forms needed).
  4. Barclaycard (Credit Card): Filled-in form on the back of statement and sent it on pre-paid envelope. Several weeks later, statement for the following month was still sent to old address. Had to call to confirm address change.
  5. BCP (Personal Banking+Credit Card): Sent letter with copy of tenancy agreement to HQ. No confirmation and old address is still on the system.
  6. BUPA (Private Health Care):
  7. Council: Posted Westminster council tax form. A week later received letter addressed to my name, plus 3 times the same form I had sent before.
  8. DVLA: Posted driver license and paper counterpart to DVLA. Received updated license a week later.
  9. The Economist: Emailed new address and received confirmation. Also used online form @ Address change only took effect after 10 days; for a weekly newspaper(!), that says something about their logistics infrastructure.
  10. Electoral Roll: As per instructions from the Electoral Commission, obtained form online and posted it to the address given. No confirmation yet.
  11. Fidelity (Savings Funds): Set up online account and requested address change online. It was done in less than a week with a pre-paid envelope sent for the account holder's signature.
  12. Invesco Perpetual (Stocks ISA): Called and was sent change-of-address form to new address (account holder's signature is required). Sent form back on pre-paid envelope and address was changed on the system accordingly. Whole process was painless and took about a week.
  13. Inland Revenue: Called and was told that I needed to write a letter. Did so, sent letter, and received no confirmation. Called again and address was changed over the phone.
  14. O2: Updated online form. Unsure if successful since I receive electronic invoices.
  15. PayPal: Added new address to system but it is clumsy and unintuitive to remove old (verified) address.
  16. Scottish Widows (Pension): Changed address online @ Changes take effect within 3 working days but that is not too bad for a pension fund.
  17. Southampton University:
  18. University College London: Emailed student records and was pointed towards alumni form @ No confirmation yet.
  19. Virgin Media (Ex-Telewest, for TV+broadband): Called to transfer account and change address and a few days later received statement to new address. However, a week later called to proceed with account transfer and old address was still on the system.
  20. Solomon Smith Barney (Stock Options): Faxed address change form. Received no acknowledgment, and more than a week later the old address still showed on my online account. Called customer services a week later and was informed that the new address was already on file and that the online system would not be updated. Needless to say, they thought it was pretty normal that I should get neither notification nor acknowledgment. Those who can't abandon ship due to some corporate deal already know what Smith Barney is all about: the angry comments from like-minded unfortunate users like myself on their own web survey were quite clear. To those undecided, a word of warning: run! Don't come anywhere near this sinking ship.
  21. Work: filled-in online form and emailed HR. System was updated immediately.