Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Python: map or list comprehensions?

This question seems to come up a lot when doing functional programming in Python. Should you use the more classical map/filter functions, or the Pythonic and usually more readable list/generator comprehensions? Do you do:

map(lambda x: x*x, xrange(11))


[x*x for x in xrange(11)]


(Note that map in Python3 is lazy and you would use range(), if you want laziness with comprehensions use parentheses instead of brackets to use generators.)

There are arguments for and against both. map() has a long tradition in functional programming, it has "classicalness" going for it. The downside is Python doesn't have very good lambdas, so in practice you'll have to write and give your function a name just before calling the map function. I don't see this as a big downside since the function will still be locally scoped and is still a closure, and you can name it something like '_' if you can't think of anything better. map() combined with reduce/filter is also fairly standard, and decently readable, but with Python nested list comprehensions can get confusing fast. I do think for simple cases like the above though, the list comprehension form is more readable. Regarding speed, sometimes one is faster than the other, I don't think it matters that much.

However, there is a core argument in favor of the map() form that I don't see being made very often. It has to do with parallelizing. It's what Hadoop and all these other NoSQL, Big Data things are founded on, the MapReduce model. If your function has no side-effects, the input data can be split up into different threads, machines, nodes, whatever, and processed very quickly in parallel, then joined back up on the master machine.

If your code is written using the map() function, and a clever programmer on your team writes a DistributedMap() function, guess what? You can simply say in your global imports: map = DistributedMap, and now all the code that used map() before is distributed and instantly faster. No need to rewrite stuff. You cannot do this if you use the comprehension form, you will have to rewrite it.

That is why I prefer map() over comprehensions even if the latter look nicer in many cases. Of course many times I use both, switching between them as my mood sees fit. In the above example I might equally write either one, because I'm not dealing at the scale where I have such a volume of data or a complex mapper function nor do I expect to. If you think you might need to optimize later, start with the map() and keep with it.

Posted on 2011-08-04 by Jach

Tags: programming, python


Trackback URL:

Back to the top

Back to the first comment

Comment using the form below

(Only if you want to be notified of further responses, never displayed.)

Your Comment:

LaTeX allowed in comments, use $$\$\$...\$\$$$ to wrap inline and $$[math]...[/math]$$ to wrap blocks.