Friday, July 21

5 Useful Python Tips

Here are a few useful Python tips I’ve learned over time.


1. When using the '%' format operator, always put a tuple or a dictionary on the right-hand side.


Instead of:
  print "output %s" % stuff


Write:
  print "output %s" % (stuff,)


With the tuple on the right-hand side, stuff is always treated as a single value: if it is itself a tuple with more than one element, we still get its representation instead of an error.


Example:
  >>> def output(arg):
            print "output %s" % arg

  >>> output("one item")
  output one item

  >>> output(('single tuple',))
  output single tuple

  >>> output(('tuple','multiple','items'))

  Traceback (most recent call last):
  File "", line 1, in -toplevel-
  output(('tuple','multiple','items'))
  File "", line 2, in output
  print "output %s" % arg
  TypeError: not all arguments converted during string formatting


Now, if the function output is changed to:
  >>> def output(arg):
            print "output %s" % (arg,)

  >>> output(('tuple','multiple','items'))
  output ('tuple', 'multiple', 'items')


It will always work as intended and expected.
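
The tip also mentions dictionaries, so here is a minimal sketch of both forms. The function names (format_output, format_report) and the sample values are made up for illustration, and the functions return the string instead of printing it so the snippet runs the same on Python 2 and 3.

```python
def format_output(arg):
    # The one-element tuple guarantees %s receives exactly one value,
    # even when arg is itself a tuple.
    return "output %s" % (arg,)


def format_report(name, count):
    # Dictionary form: each placeholder is looked up by key,
    # so the order of the values no longer matters.
    return "%(name)s has %(count)d items" % {"name": name, "count": count}


if __name__ == "__main__":
    print(format_output(("tuple", "multiple", "items")))
    print(format_report("cart", 3))
```

The dictionary form is handy once a format string has more than two or three placeholders, because each one documents what it expects.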


2. Use the built-in timeit module proactively and aggressively to avoid "premature pessimization".


Python has a very useful built-in timing framework, the timeit module, which can be used interactively to time the execution of short pieces of code.
Suppose we want to find out if a hypothetical word_count implementation is faster using the split() method or using a loop.
We'd like to implement each variant, call each implementation many times, repeat the entire test a few times, and select the one that took the least time.

The timeit module to the rescue. Let's test the implementation using split() first.

  >>> import timeit
  >>> def word_count():
            s = "long string with several words to be counted "
            return len(s.split())

  >>> word_count()
  8

  >>> t = timeit.Timer(setup ='from __main__ import word_count', stmt='word_count()')

  >>> t.repeat(3, 1000000)
  [4.6016188913206406, 4.5184541602204717, 4.5227482723247476]


And now let's test a loop variant.
  >>> def word_count():
            s = "long string with several words to be counted "
            return len([c for c in s if c.isspace()])

  >>> word_count()
  8

  >>> t = timeit.Timer(setup ='from __main__ import word_count', stmt='word_count()')

  >>> t.repeat(3, 1000000)
  [17.766925246011169, 17.784756763845962, 17.890987803859275]


We have our informed answer right there and then.


The first argument of repeat() is the number of times to repeat the entire test, and the second argument is the number of times to execute the timed statement per test.


You can even select the best out of X runs (3 in this example) by using the min function:
  >>> min(t.repeat(3, 1000000))
  17.766925246011169


We can try and compare other implementations such as a loop without the (expensive) call to isspace().

  >>> def word_count():
            s = "long string with several words to be counted "
            return len([c for c in s if c == ' '])

  >>> word_count()
  8

  >>> t = timeit.Timer(setup ='from __main__ import word_count', stmt='word_count()')

  >>> t.repeat(3, 1000000)
  [8.8144601897920438, 8.7707542444240971, 8.7721205513323639]


Which proves faster than our second implementation but still slower than calling split().


Note:
Instead of repeat() we can call timeit(), which by default executes the statement 1 million times and returns the total number of seconds it took.
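
The same comparison can be run as a script instead of interactively. This is only a sketch under a few assumptions: it passes the functions themselves to timeit.repeat() (which needs a Python version where timeit accepts callables), it uses a smaller call count than the sessions above so it finishes quickly, and the helper names best_time and compare are mine, not part of timeit.

```python
import timeit

SAMPLE = "long string with several words to be counted "


def count_with_split(s=SAMPLE):
    return len(s.split())


def count_with_isspace(s=SAMPLE):
    return len([c for c in s if c.isspace()])


def count_with_equals(s=SAMPLE):
    return len([c for c in s if c == ' '])


def best_time(func, repeat=3, number=10000):
    # Fastest of `repeat` runs; each run calls func `number` times.
    return min(timeit.repeat(func, repeat=repeat, number=number))


def compare(funcs, repeat=3, number=10000):
    # Map each function name to its best time in seconds.
    return dict((f.__name__, best_time(f, repeat, number)) for f in funcs)


if __name__ == "__main__":
    results = compare([count_with_split, count_with_isspace, count_with_equals])
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print("%-20s %.4f s" % (name, seconds))
```

Keep in mind that the two loop variants count whitespace characters, which only matches the word count for strings shaped like this sample (single spaces plus one trailing space).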


3. Don't traverse to append, extend instead.


Don't do:
  >>> def bad_append():
            l1 = ["long","string","with","long"]
            l2 = ["elements","and","words","to","be","counted","or","words"]
            for item in l2:
                  l1.append(item)

  >>> t = timeit.Timer(setup ='from __main__ import bad_append', stmt='bad_append()')

  >>> min(t.repeat(3, 1000000))
  5.4943255206744652


Do instead:
  >>> def good_append():
            l1 = ["long","string","with","long"]
            l2 = ["elements","and","words","to","be","counted","or","words"]
            l1.extend(l2)

  >>> t = timeit.Timer(setup ='from __main__ import good_append', stmt='good_append()')

  >>> min(t.repeat(3, 1000000))
  2.3049167103836226


In this test, calling extend() cuts the run time by almost 60%.
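
To confirm that the two approaches really produce the same list, here is a small sketch. The helper names append_each and extend_once are hypothetical, and both work on a copy so the caller's list is left alone.

```python
def append_each(base, extra):
    # Slow variant: one append() call per item.
    result = list(base)
    for item in extra:
        result.append(item)
    return result


def extend_once(base, extra):
    # Fast variant: a single extend() call adds every item.
    result = list(base)
    result.extend(extra)
    return result


if __name__ == "__main__":
    l1 = ["long", "string", "with", "long"]
    l2 = ["elements", "and", "words"]
    print(append_each(l1, l2) == extend_once(l1, l2))
```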


4. Beware of doing string concatenation using '+'.


Let's see why, "no fluff, just stuff", by applying tip 2 above.
Bad:
  >>> def bad_concat():
            s = ""
            l = ["items", "to", "append"]
            for sub in l:
                  s += sub

  >>> t = timeit.Timer(setup ='from __main__ import bad_concat', stmt='bad_concat()')

  >>> min(t.repeat(3, 1000000))
  1.6777893348917132


Better:
  >>> def good_concat():
            s = ""
            l = ["items", "to","append"]
            s = "".join(l)

  >>> t = timeit.Timer(setup ='from __main__ import good_concat', stmt='good_concat()')

  >>> min(t.repeat(3, 1000000))
  1.3923049870645627


Needless to say, all this adds up if these operations are done repeatedly and with bigger lists.
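
A small sketch of the join() idiom for a list of any size. The helper name join_pieces is hypothetical, and the str() conversion is my own addition for lists that may hold non-string items, since join() only accepts strings.

```python
def join_pieces(pieces, sep=""):
    # Convert every piece to a string first; join() raises TypeError otherwise.
    return sep.join([str(p) for p in pieces])


if __name__ == "__main__":
    print(join_pieces(["items", "to", "append"]))
    print(join_pieces([1, 2, 3], ", "))
```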


Also avoid:
  out = "output: " + output + ", message: " + message + ", param: " + param


Instead, use:
  out = "output: %s, message: %s, param: %s" % (output, message, param)


This neatly combines tips 1 and 4.
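
Another reason to prefer the format operator here: %s converts each value to a string, while '+' fails as soon as one operand is not a string. A minimal sketch, with hypothetical function names and sample values:

```python
def describe(output, message, param):
    # %s calls str() on each value, so any type is accepted.
    return "output: %s, message: %s, param: %s" % (output, message, param)


def describe_with_plus(output, message, param):
    # Raises TypeError unless every argument is already a string.
    return "output: " + output + ", message: " + message + ", param: " + param


if __name__ == "__main__":
    print(describe(42, "ok", None))
    try:
        describe_with_plus(42, "ok", None)
    except TypeError as exc:
        print("concatenation failed: %s" % (exc,))
```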


5. Environment settings and variables are available cross-platform.


This is a very handy feature. Take a close look at os.path.expanduser() and os.environ on Linux and Windows.


*Nix:
  >>> import os
  >>> os.path.expanduser('~')
  '/home/jcastro/'


Windows:
  >>> import os
  >>> os.path.expanduser('~')
  'C:'
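
The tip mentions os.environ without showing it, so here is a rough sketch of how the two can be combined in portable code. The helper names, the '.myapp.cfg' file name and the MYAPP_EDITOR variable are all made up for illustration.

```python
import os


def user_config_path(filename):
    # expanduser('~') resolves to the current user's home directory
    # on each platform; join() then adds the right path separator.
    return os.path.join(os.path.expanduser("~"), filename)


def env_or_default(name, default):
    # os.environ behaves like a dictionary of environment variables.
    return os.environ.get(name, default)


if __name__ == "__main__":
    print(user_config_path(".myapp.cfg"))
    print(env_or_default("MYAPP_EDITOR", "vi"))
```

Using os.environ.get() with a default avoids a KeyError when the variable is not set on a given machine.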


Useful Online Resources

The Python Coding Conventions
Python Performance Tips
Patterns in Python
Data Structures and Algorithms with Object-Oriented Design Patterns in Python
The Python Tutor Mailing List
My Python links on del.icio.us

Saturday, July 15

Happy Feet

Shown tonight, during the opening of Superman Returns



More here.

Thursday, July 13

RIP Syd



Thank you for the wonderful legacy.