TheJach.com

Jach's personal blog


An attempt at a practical exploration of Python for newcomers

In open office format here. (I know, I know, LaTeX heresy!) This post may be more up to date if I corrected anything though.

This document isn't explicitly about the syntax and 'overview of language features' of Python. Figure out the syntax of Python yourself from examples, Google, guessing, or error messages. Some features will be mentioned in passing when they're not obviously inferred from the example.
If you're looking for a style guide (sometimes the best way to learn syntax): check out “PEP 8”: http://www.python.org/dev/peps/pep-0008/ (I disagree with some of it, of course, but PEP 8 explicitly is fine with that.)

Pro tip 1: never use tabs.
Pro tip 2: read what an error message actually says before asking for help. They're very unlike C++ compilers' vague template-related errors over which people tend to get glazed eyes.
Pro tip 3:
“Very often it will be faster for you to try something out on the computer than to look it up in the manual. Besides, the computer is always right, and the manual could be wrong.”
--Apple II Basic Programming Manual

Python 2 or Python 3?



3 is the future, but most people still use 2 (specifically 2.7). The differences are minimal. This document uses 2.7. (A few differences with 3 are mentioned in passing comments.)

Python Installation



There are a lot of “try Python in your browser!” websites out there, which are useful when you're on a locked-down machine. http://shell.appspot.com/ seems to work okay.

On Windows, you should use the 32-bit version of Python, because you're more likely to find a 32-bit build for a third party library you want to use. Also if you want to create an “EXE” for your Python program that you can distribute, a 32-bit one tends to be easier to achieve and will also run on Windows XP...
For every other OS, 64-bit. Come on, it's 2014.

Windows: download and install here: http://python.org/download/releases/2.7.6/
A lot of third party modules, mostly ones that have C/C++ components, are unofficially built for Windows here: http://www.lfd.uci.edu/~gohlke/pythonlibs/
(Main recommendation: setuptools will give you the 'easy_install' command line program which makes it easy to install other pure Python packages. Linux distros tend to come with Python and 'easy_install' (or its successor 'pip') already, and they have other Python modules in the package manager. OS X comes with Python but I hear it's better to get Homebrew going. Cygwin may also come with Python but you're better off with the explicitly Windows Python in your C:\Python27 folder...)

On Windows you can also try IronPython, which is a Python implementation on top of .NET. A lot of Windows devs like it, and in addition to having access to everything that's pure Python you have direct access to everything in .NET which I often hear makes Windows application programming a joy.

(There's also the Jython project built on Java's JVM.)

For those on the bleeding edge, PyPy is another implementation of Python that uses a JIT compiler to massively speed up pure Python programs (even beating hand-written C in special cases).

Open it up!

First open the Python program:

Windows:
Start > All Programs > Python 2.7 > IDLE

Or open up cmd.exe and type 'python', if it's on your path. IDLE is a nice GUI though, because it has syntax highlighting, auto-indenting, and auto-doc, and if you press tab it will auto-complete code for you. Some versions by default make the command history key “alt p” for some reason; you can change it to the more familiar up-arrow in IDLE's settings.

OS X, Linux: open terminal, type 'python'.

You'll be greeted with a window and something like the following in it:


Python 2.7.5
Type "help", "copyright", "credits" or "license" for more information.
>>>


Welcome to the Python REPL. The REPL Reads your input, Evaluates it as Python code, Prints the result, and Loops to wait for your input again.

In this environment you can write any Python code you want. This provides a very interactive way of developing your program that's very different to the 'Save file > compile > run > see error > return to file' pattern of other languages. Of course you can always go back to doing things that way. Just save a .py file with your editor of choice and double-click it to run. Or pass the filename as an argument to the python exe on the command line. (Make sure to add a call to raw_input() as the last line so the window doesn't close automatically.)

Sometimes people call the REPL the Interpreter. In a sense Python is an 'interpreted' language, but if that's true then so is Java. In Python your input (whether on the REPL or in a .py file) is read, line-by-line (“interpreted”) and immediately compiled automatically (instead of manually as with Java) to bytecode which is executed (“interpreted”) by the language's virtual machine, rather than the actual machine which just “executes” assembly code. When discussing modern programming languages, simplistic concepts like “interpreted language”, “compiled language”, or “scripting language” are obsolete and inappropriate for describing a language's implementation, capabilities, or usefulness.

Python as a calculator



Most of my daily use of Python tends to be as a quick calculator for simple arithmetic. e.g.:


>>> hash_expected = 40e15 # 40 Petahashes/s
>>> hash_expected / 7158388.055
5587850182.5646


Stuff like that. Once I have my result I might close the window, do other stuff, and later I'll need to calculate something else and open up a new Python REPL. Not too glamorous, but it's so much faster and more versatile than opening a GUI calculator or finding my TI-89, and is stylistically nicer than Matlab/Octave. More examples:


>>> 2+2 # addition
4
>>> 2*3 # multiplication
6
>>> 2**3 # exponentiation
8
>>> 2**3.5 # exp with floats (all Python floats are 64-bit doubles. There are no 32-bit floats.)
11.313708498984761
>>> 1/2 # Note: Python 3 will return 0.5. You need to do 1//2 for old behavior.
0
>>> 1/2.
0.5
>>> 1/2.0
0.5
>>> 1/float(2) # other type casting functions: int(), str()
0.5
>>> half = _ # _ always contains the most recently outputted value, here we assign a copy of the last output value to the half variable.
>>> 2.718281828**(half*3.14159265358979j) # complex numbers work, grouping done with parens
(2.6526720281826307e-10+1j)
>>> _ * 2.718281828**(half*3.14159265358979j)
(-1+5.305344056365261e-10j) # close enough to -1 exactly!


Python has a large, useful standard library (“batteries included”), all available as individual modules you can import into your namespace. This document only touches on a few I think are practical; you can check out the rest here: http://docs.python.org/2/library/


>>> import math
>>> dir(math) # “dir” uses reflection to give a directory of symbols available within any Object.
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'hypot', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']
>>> math.cos(math.pi) + 1j * math.sin(math.pi)
(-1+1.2246467991473532e-16j)


Modules are their own namespaces. Classes are too. (Python supports object-oriented programming and all symbols act like objects.) The dot operator is used for indirection.

Python has a lot of symbols and functions “built-in” like the previously used float(). You can get a list of them:


>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NoneType', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__builtins__', '__debug__', '__doc__', '__import__', '__metaclass__', '__name__', '__package__', '_issubtype', 'abs', 'all', 'any', 'apply', 'basestring', 'bin', 'bool', 'buffer', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'execfile', 'exit', 'file', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'sequenceiterator', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']


min(), max(), round(), and abs() are commonly useful in simple calculations.
raw_input() is similar to C++'s getline(). It returns as a string whatever input the user types until they hit enter. You can also pass an argument as the prompt: raw_input('Prompt: ').

Symbols with leading and trailing double underscores are “magic” in some way. For __builtins__ (the __builtin__ module), it's magic in that everything in it is already imported for you by default. If you write a class you can implement the __lt__ method to overload the < operator.


>>> help(round)
Help on built-in function round in module __builtin__:

round(...)
    round(number[, ndigits]) -> floating point number

    Round a number to a given precision in decimal digits (default 0 digits).
    This always returns a floating point number. Precision may be negative.


You may have to type 'q' to get back to the Python prompt after looking at help documentation, depending on how you're running the Python REPL. You could also just run the command “print round.__doc__”. (__doc__ is another 'magic' attribute; it serves basically the same purpose as the /** */ style doc comments that things like Doxygen look for.)


>>> round(3.5)
4.0
>>> round(4.5)
5.0
>>> round(4.5789)
5.0
>>> round(4.5789, 2)
4.58


Note that Python 2's round() sanely does NOT follow the IEEE 754 default of rounding x.5 to the nearest even; it rounds halves away from zero. (Python 3's round() does round halves to even, so round(4.5) gives 4 there.)
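
If the tie-breaking rule matters to you, one option (a small sketch, not part of the session above) is to state the rule explicitly with the decimal module instead of relying on whatever your Python version's round() does. The helper names round_half_up and round_half_even are made up for illustration, and the code is written to run on both Python 2.7 and 3.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def round_half_up(x):
    # Go through str() so that 4.5 becomes exactly Decimal('4.5').
    return Decimal(str(x)).quantize(Decimal('1'), rounding=ROUND_HALF_UP)

def round_half_even(x):
    # Same conversion, but ties go to the nearest even integer.
    return Decimal(str(x)).quantize(Decimal('1'), rounding=ROUND_HALF_EVEN)

print(round_half_up(4.5))    # 5
print(round_half_even(4.5))  # 4
print(round_half_even(3.5))  # 4
```

Both helpers return Decimal objects; wrap the result in int() or float() if you need a plain number.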

Digression from a basic, practical calculator: need a really really fast (no looping or recursion) Fibonacci function?


>>> def fib(n):
...     return round(math.sqrt(5)**-1 * ( (0.5 + 0.5 * math.sqrt(5))**n - \
...                  (0.5 - 0.5 * math.sqrt(5))**n) )
...
>>> # That \ by the way is to 'escape' the newline, telling Python that the next line is a continuation of the current line.
>>> for i in xrange(1, 10): # you can also use range(), but xrange() is preferred
...     print fib(i) # Note: In Python 3, print became the function print()
...
1.0
1.0
2.0
3.0
5.0
8.0
13.0
21.0
34.0
>>> fib(1000)
4.3466557686938915e+208
>>> int(_)
43466557686938914862637500386755014010958388901725051132915256476112292920052539720295234060457458057800732025086130975998716977051839168242483814062805283311821051327273518050882075662659534523370463746326528L # the L indicates the number is a 'long', which take up as much memory as needed to represent. Python auto-converts so you never have integer overflows.


Comparing that number to an official list shows an error of about 1e+195, so it's not super accurate, but it is super fast. (How fast? A looping fib to 1000 will take 100 microseconds, this fib takes 53 microseconds. At 1475 we hit Python's float power limit.) What if we want accuracy, or want to go above 1475? The 'decimal' module provides arbitrary but finite precision. Its documented default precision is 28 significant digits, though the session below is running with 18, as will be evident soon. Note that depending on your version of Python, you may be able to omit the string casts when creating a Decimal object.


>>> fib1000_official = 43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
>>> from decimal import *
>>> # imports everything from the decimal module into this
>>> # namespace so we don't have to type decimal.foo for
>>> # everything. We could have been more specific on what foos we wanted:
>>> # from decimal import Decimal, getcontext
>>> def fib_precise(n):
...     n = Decimal(str(n))
...     half = Decimal('0.5')
...     five = Decimal('5')
...     return Decimal('1') / five.sqrt() * \
...         (getcontext().power(half + half * five.sqrt(), n) - \
...          getcontext().power(half - half * five.sqrt(), n))
...
>>> fib_precise(1000)
Decimal('4.34665576869375047E+208')
>>> mine = long(_)
>>> mine
43466557686937504700000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000L


See, only 18 significant figures. If we want accuracy up to the official number, we need at least as many significant figures as the real answer:


>>> len(str(fib1000_official))
209


And probably more. For safety, let's double it. This is done with decimal module's getcontext() context object.


>>> getcontext().prec = 2 * len(str(fib1000_official))
>>> mine = fib_precise(1000)
>>> mine
Decimal('43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228874.99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999813')
>>> mine_rounded = mine.quantize(1) # rounds the Decimal to the same exponent as 1, i.e. to a whole number
>>> mine_rounded - fib1000_official
Decimal('0')


Huzzah. (In general one can find out if one has enough precision by doing the calculation, then raising the precision, redoing the calculation, and see if the values match. If not, need more precision.)
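
That raise-the-precision-and-compare check can be sketched as a loop. This is only an illustration under a few assumptions: the names fib_at_precision and fib_converged are invented here, the starting precision of 28 and the doubling step are arbitrary choices, and two matching results in a row are taken as "good enough" rather than as proof. It should run on both Python 2.7 and 3.

```python
from decimal import Decimal, localcontext

def fib_at_precision(n, prec):
    # Evaluate the closed-form Fibonacci formula with `prec` significant digits.
    with localcontext() as ctx:
        ctx.prec = prec
        root5 = Decimal(5).sqrt()
        phi = (1 + root5) / 2
        psi = (1 - root5) / 2
        value = (phi ** n - psi ** n) / root5
        return int(value.to_integral_value())

def fib_converged(n, start_prec=28):
    # Keep doubling the precision until two successive answers agree.
    prec = start_prec
    previous = fib_at_precision(n, prec)
    while True:
        prec *= 2
        current = fib_at_precision(n, prec)
        if current == previous:
            return current
        previous = current

print(fib_converged(10))               # 55
print(len(str(fib_converged(1000))))   # 209
```

Using localcontext() keeps the precision change local to each call, so the global getcontext() settings are left alone.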

Although this was kind of a Pyrrhic victory. The resulting function is unbearably slow as n gets larger and you have to up the precision. The iterative version with straight-up additions on bigInts wins out by far.
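
For comparison, here is a minimal sketch of that iterative version (the name fib_iter is just a placeholder). It relies on nothing but integer addition, so the result is exact for any n; it is written to run on both Python 2.7 and 3.

```python
def fib_iter(n):
    # a and b walk along the sequence: (0, 1), (1, 1), (1, 2), (2, 3), ...
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(10))              # 55
print(len(str(fib_iter(1000))))  # 209
```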

Need a fast no-loop Factorial function too?


>>> def fact(n):
...     return round(math.exp(math.lgamma(n+1)))
...
>>> fact(5)
120.0


Exercise for the bored: figure out how to compute 1000! (It only has 2568 decimal digits!) Alas, fact() too suffers from an overflow error at n = 171 (and general inaccuracy starting at n = 17), and it is actually slower than the iterative factorial.
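
One possible answer to the exercise, as a hedged sketch: stick to integers. The loop below (fact_iter is a made-up name) is the plain iterative factorial, and math.factorial, which appears in the dir(math) listing earlier, does the same job for you. Runs on both Python 2.7 and 3.

```python
import math

def fact_iter(n):
    # Multiply 2 * 3 * ... * n using arbitrary-size integers, so no overflow.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fact_iter(5))                             # 120
print(fact_iter(1000) == math.factorial(1000))  # True
print(len(str(math.factorial(1000))))           # 2568
```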

The third-party library 'mpmath' is an alternative to the built-in 'decimal'; it has some more features and is generally faster, which may be important for big loads. There are a lot of other mathy libraries for when you need to do specific things (scipy, for example, is good for a lot of scientifically-minded computations).

Additionally there's the Sage project, which is super awesome. Check out: http://wiki.sagemath.org/interact/

It's a pretty ambitious project too; from their homepage:
Sage is a free open-source mathematics software system licensed under the GPL. It combines the power of many existing open-source packages into a common Python-based interface.
Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.

Python as a philosophy




>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>


Python was created by Guido van Rossum. Dutch.

Python as a text mangler



A lot of the time I find some useful text on a web page that I want to format in a different way, or extract certain information from, or strip out useless text, or something else. Usually I can do this manipulation quickly with my text editor, vim, but sometimes Python enters the picture either as a supplement to vim or a replacement.

For example, suppose I wanted to get all the titles on the Hacker News home page (https://news.ycombinator.com/) and put them in a list. I could do it “the right way” with an HTML parser... Or I could just do it quick-and-dirty with a bunch of search-replace function calls and regular expressions.

Grabbing the HTML source on the front page:


>>> html = '''
... (pasted source code)
... '''
>>>


The triple quote (either single or double) lets you do a multi-line string (or a multi-line comment if you don't bind it to a variable). Similar to heredoc syntax in other languages.
Looking at the HTML, it looks like titles are in the form <td class="title"> ...stuff... title … </td>. So we want a way to grab everything in between. Regular expressions to the rescue! (HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ )

Regular expressions are a big topic, but well worth your time to learn.


>>> import re
>>> title_pattern = re.compile('<td class="title">(.+?)</td>')
>>> titles = title_pattern.findall(html)
>>> print titles
...oritization-System-Nabbed-Pandora-More-Than-70-Million-Active-Monthly-Users-with-Just-40-Engineers">How Pandora Nabbed More Than 70M Monthly Users with Just 40 Engineers</a><span class="comhead"> (firstround.com) </span>', '<a href="http://aarvik.dk/linux-monitoring-tools-suggestions-from-hacker-news/" rel="nofollow">Follow-up on "Linux server monitoring tools"</a><span class="comhead"> (aarvik.dk) </span>', '<a href="https://bugsnag.com/blog/branding-early-stage-startups" rel="nofollow">Branding for early-stage startups</a><span class="comhead"> (bugsnag.com) </span>', '<a href="/x?fnid=eM2Mtpc9Lhm68OoafEOBxT" rel="nofollow">More</a>']
>>> titles.pop()
'<a href="/x?fnid=eM2Mtpc9Lhm68OoafEOBxT" rel="nofollow">More</a>'
>>> titles_separated = []
>>> p2 = re.compile('<a href.+?>(.+)</a>')
>>> for e in titles:
...     titles_separated.append(p2.findall(e)[0])
...
>>> print titles_separated
["Satya Nadella \x96 Microsoft's CEO", "Don't End The Week With Nothing", "DEA redacts tactic that's more secret than parallel construction", 'My love affair with AngularJS', 'CTF3 architecture', 'The Unpaid Bill that Launched a Thousand Starships', 'Your SaaS product is too cheap if you never lose customers because of pricing', 'Firefox 27 Released', "Facebook Paper's gesture problems", 'Taplytics (YC W14) Run A/B Tests On iOS Without Waiting For App Store Updates', 'Sigma.js, a JavaScript library dedicated to graph drawing', ' We are moving into the quantitative finance industry', 'Becoming A Software Consultant: My Backstory', 'Stripe adds multiple subscriptions', 'Nginx 1.5.10 released with SPDY 3.1 support', '4D Tesseract: Fourth Dimension Game', 'Managing Node.js Callback Hell with Promises, Generators and Other Approaches', 'New Tor Denial of Service Attacks and Defenses', 'Asusgate: A story about thousands of crimeless victims', 'Put.io', 'Hacking Airline Lounges for Free Meals', 'Data Structures in Clojure: Hash Tables', 'After Tyrone Hayes said that a chemical was harmful, its maker pursued him', 'Satya Nadella email to employees on first day as CEO', 'Why I avoid SDKs in production', 'AeroFS is hiring Engineers to improve collaboration at work', 'Little\x92s Law, Scalability and Fault Tolerance: The OS is your bottleneck', 'How Pandora Nabbed More Than 70M Monthly Users with Just 40 Engineers', 'Follow-up on "Linux server monitoring tools"', 'Branding for early-stage startups']
>>>
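
For contrast, “the right way” mentioned earlier might look roughly like this with the standard library's HTMLParser. This is a sketch under assumptions: the TitleParser class and the tiny sample string are invented for illustration (the real page source is much messier), and the import dance is only there so it runs on both Python 2.7 and 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2

class TitleParser(HTMLParser):
    """Collects the text of <a> tags that sit inside <td class="title">."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title_cell = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'td' and ('class', 'title') in attrs:
            self.in_title_cell = True
        elif tag == 'a' and self.in_title_cell:
            self.in_link = True
            self.titles.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False
        elif tag == 'td':
            self.in_title_cell = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data

# Hypothetical stand-in for the pasted page source.
sample = ('<table><tr><td class="title">'
          '<a href="http://example.com/a">First story</a>'
          '<span class="comhead"> (example.com) </span></td></tr>'
          '<tr><td class="title"><a href="/x?fnid=abc" rel="nofollow">More</a>'
          '</td></tr></table>')

parser = TitleParser()
parser.feed(sample)
print(parser.titles)  # ['First story', 'More']
```

It is more code than the two regular expressions, which is why quick-and-dirty often wins for one-off jobs.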


File IO is also really easy:


>>> file = open('setup_ltx.sh')
>>> for line in file:
...     print line.replace('\n', '') # read line has trailing newline already
...
#!/bin/bash
set -x
cd /cygdrive/y/
mkdir bin
echo 'export PATH=/cygdrive/y/bin:$PATH' >> .bashrc
echo "alias ltxm='latexmk -silent -pdf'" >> .bashrc
echo "alias ltxc='latexmk -silent -c -CF'" >> .bashrc
cd /tmp
wget 'http://www.phys.psu.edu/~collins/software/latexmk-jcc/latexmk-435.zip'
unzip 'latexmk-435.zip' latexmk.pl
mv latexmk.pl /cygdrive/y/bin/
cd /cygdrive/y/bin/
mv latexmk.pl latexmk
chmod +x latexmk
echo
echo 'Done, press enter to quit'
read
>>> file.close()
>>>


If you want to write to a file, do open('filename', 'w') instead. Check out other methods on the file object on your own. Also, idiomatic Python now uses the “with” block:


>>> with open('filename') as file:
...     for line in file:
...         print line
...


Basically the with block cleans up after itself so you don't have to, which is nice. Check out the built-in 'csv' module too. (csv.reader(file, delimiter=',', quotechar='"'))
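
A small round trip showing both the 'w' mode and the with block together. The temporary directory and the file name example.txt are placeholders chosen for this sketch; it runs on both Python 2.7 and 3.

```python
import os
import tempfile

# Write into a throwaway directory so nothing real gets clobbered.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as out:
    out.write('first line\n')
    out.write('second line\n')

with open(path) as src:
    # Strip the trailing newline that each line read from a file carries.
    lines = [line.rstrip('\n') for line in src]

print(lines)       # ['first line', 'second line']
print(src.closed)  # True: the with block closed the file for us
```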

Python as a cornucopia of data structures



Okay, maybe not as many as Java has, but it does have a lot of nifty ones. Built into the language's syntax are lists (similar to C++ vector, but are much more general and also make for good stacks), dictionaries (also known as maps or associative arrays and often implemented as hashtables), and tuples (“immutable” lists). You can make a set (unordered, no duplicate items) any time with the built-in set() function.


>>> my_list = [1, 2, 'foo', 3.4]
>>> print my_list
[1, 2, 'foo', 3.4]
>>> for e in my_list:
...     print e
...
1
2
foo
3.4
>>> my_map = {'apple': 3.4, 'orange': 2, 8: 'ball'}
>>> my_map['four'] = 4
>>> dir(my_map)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']
>>> my_map['apple']
3.4
>>> my_map['banana']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'banana'
>>> my_map.get('apple')
3.4
>>> my_map.get('banana')
>>> my_map.get('banana', 4)
4
>>> rooms = { (0,0): 'Empty', (0,1): 'Empty', (0,2): 'Bat',
... (1,0): 'Treasure', (1,1): 'Boss', (1,2): 'Bat' }
>>> rooms[0,1]
'Empty'


Unlike some languages' dictionaries, your keys don't have to be strings in Python. They just have to be any “hashable” object. (Implement __hash__.)
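
As a sketch of what “implement __hash__” can look like in practice: the Point class below is invented for illustration, and it defines __eq__ alongside __hash__ so that two equal points land on the same dictionary entry. Runs on both Python 2.7 and 3.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so spell it out.
        return not self == other

    def __hash__(self):
        # Reuse the hash of a tuple built from the same fields __eq__ compares.
        return hash((self.x, self.y))

rooms = {Point(0, 0): 'Empty', Point(1, 1): 'Boss'}
print(rooms[Point(1, 1)])  # Boss
```

The usual caveat applies: don't change x or y after using a Point as a key, or the dictionary will no longer find it.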

There are a lot of other data structures and algorithms hidden away in modules. If for some perverse reason you want to have an array of ints and only ints:


>>> import array
>>> a = array.array('i')
>>> a.append(4)
>>> a
array('i', [4])
>>> a.append(5.5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float


The heapq module includes the priority queue algorithm. bisect contains the Bisection algorithm.
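
A quick taste of both modules, as a sketch with made-up data (the task names and scores are placeholders). heapq treats a plain list as a min-heap, and bisect keeps a sorted list sorted. Runs on both Python 2.7 and 3.

```python
import bisect
import heapq

# Priority queue: tuples compare by their first element, the priority.
tasks = []
heapq.heappush(tasks, (2, 'write docs'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))
first = heapq.heappop(tasks)
print(first)     # (1, 'fix bug')

# Bisection: insert into an already sorted list without re-sorting it.
scores = [10, 20, 40]
bisect.insort(scores, 30)
print(scores)    # [10, 20, 30, 40]
position = bisect.bisect_left(scores, 40)
print(position)  # 3
```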

Several useful things are hidden in collections:


>>> import collections
>>> dir(collections)
['Callable', 'Container', 'Counter', 'Hashable', 'ItemsView', 'Iterable', 'Iterator', 'KeysView', 'Mapping', 'MappingView', 'MutableMapping', 'MutableSequence', 'MutableSet', 'OrderedDict', 'Sequence', 'Set', 'Sized', 'ValuesView', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_abcoll', '_chain', '_eq', '_get_ident', '_heapq', '_imap', '_iskeyword', '_itemgetter', '_repeat', '_starmap', '_sys', 'defaultdict', 'deque', 'namedtuple', 'newdict']
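
A few of those in action, as a sketch with invented sample data: Counter for tallying, defaultdict for grouping, deque for cheap pops at both ends, and namedtuple for lightweight records. Runs on both Python 2.7 and 3.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = 'the quick brown fox jumps over the lazy dog the end'.split()

counts = Counter(words)          # tallies how often each word appears
print(counts['the'])             # 3

by_length = defaultdict(list)    # missing keys start out as empty lists
for word in words:
    by_length[len(word)].append(word)
print(by_length[5])              # ['quick', 'brown', 'jumps']

line = deque([1, 2, 3])          # efficient appends and pops at both ends
line.appendleft(0)
print(line.popleft())            # 0
print(line.pop())                # 3

Room = namedtuple('Room', ['row', 'col', 'contents'])
boss = Room(1, 1, 'Boss')
print(boss.contents)             # Boss
```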


If you want to build your own data structures it's not too hard given the rich base structures and Python's easy OOP style. Here's a simple FIFO queue that returns None when empty instead of raising an error.


>>> class MyQueue:
...     def __init__(self):
...         self.queue = []
...     def add(self, item):
...         self.queue.append(item)
...     def remove(self):
...         if len(self.queue) > 0:
...             return self.queue.pop(0)
...         else:
...             return None
...
>>> queue = MyQueue()
>>> queue.add(3)
>>> queue.add(4)
>>> queue.add(5)
>>> queue.remove()
3
>>> queue.remove()
4
>>> queue.remove()
5
>>> queue.remove()
>>>


The __init__ method is 'magic' and can be thought of as the constructor. In Python, all object-methods must be defined with the explicit 'self' argument as the first argument (you can call it whatever you like but the convention is 'self'). It represents the object that is calling the method (just like 'this'). If you've ever done OOP in C using structs with function pointers this may be familiar. If you leave the 'self' off, and have no other arguments, you're left with a rather useless method you'll be hard-pressed to call. If you want a method that doesn't operate on an instance you'll need a special decorator: a static method leaves out the 'self' argument entirely, while a class method takes the class itself as its first argument instead... Decorators are shown later.
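
A brief sketch of the two built-in decorators for methods that don't take 'self'. The Temperature class and its method names are invented for illustration; it runs on both Python 2.7 and 3.

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class itself, handy for alternative constructors.
        return cls((fahrenheit - 32) * 5.0 / 9.0)

    @staticmethod
    def is_freezing(celsius):
        # No self and no cls: just a function that lives in the class namespace.
        return celsius <= 0

boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)              # 100.0
print(Temperature.is_freezing(-5))  # True
```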

Since I mentioned operator overloading before, this is for the perverse among you:


>>> import sys
>>> class ostream(object): # inheritance – more in a moment
...     def __init__(self, file):
...         self.file = file
...
...     def __lshift__(self, obj):
...         self.file.write(str(obj))
...         return self
...
>>> cout = ostream(sys.stdout)
>>> cerr = ostream(sys.stderr)
>>> endl = '\n'
>>> cout << 'Hello' << ' ' << "World's" << 'leaders!' << endl;
Hello World'sleaders!
<__main__.ostream instance at 0x116bc68>


That was from http://norvig.com/python-iaq.html which is dated (a lot of 'missing' things are no longer missing) but still useful.

One addition to this class that's different from the old one is the '(object)' marker at the end. This is where class inheritance is done in Python: you pass the name of a class (or classes) you want to inherit from, and can override any methods. (And optionally call parent constructors too.) In Python 3, all classes by default inherit from the 'object' base class, whereas in Python 2 there is no such default (which usually doesn't matter, but sometimes does, so it's currently considered “good form” to explicitly inherit from object – personally I'm lazy and only do it when needed).
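
A minimal sketch of overriding a method and calling the parent constructor. The Animal and Dog classes are invented for illustration, and the explicit super(Dog, self) form is used because it works on both Python 2.7 (with classes that inherit from object) and Python 3.

```python
class Animal(object):
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + ' makes a sound'

class Dog(Animal):
    def __init__(self, name, tricks):
        # Let the parent constructor set up self.name.
        super(Dog, self).__init__(name)
        self.tricks = tricks

    def speak(self):
        # Overrides Animal.speak
        return self.name + ' barks'

rex = Dog('Rex', ['sit', 'roll over'])
print(rex.speak())               # Rex barks
print(isinstance(rex, Animal))   # True
```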

While we're on the topic of OOP, you may have noticed no 'public', 'private', or 'protected' nonsense. Python doesn't have that. If you don't want to use a piece of data outside a class, then don't! You can if you're so inclined indicate to other people that they shouldn't either, but nothing's stopping them from doing it anyway, as shown next. (Know that in C++ as well, nothing stops a determined user from referencing a private member of an object via pointer arithmetic.)


>>> class Hider:
...     def __init__(self):
...         self.visible = "Use me!"
...         self.__invisible = "Don't use me!"
...
>>> h = Hider()
>>> h.visible
'Use me!'
>>> h.__invisible
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: Hider instance has no attribute '__invisible'
>>> dir(h)
['_Hider__invisible', '__doc__', '__init__', '__module__', 'visible']
>>> h._Hider__invisible
"Don't use me!"


Python OOP is powerful and supports the important features (like multiple inheritance which is useful for mixins), but it also ignores nonsense like abstract classes. (Just write the word 'ABSTRACT' as the only line in your abstract method. When that method is called, Python will raise a NameError because that name isn't defined.)
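
The 'ABSTRACT' trick can be sketched like this (Shape and Square are invented names). Note that defining and even instantiating the class is fine; the NameError only shows up when the un-overridden method is actually called. Runs on both Python 2.7 and 3.

```python
class Shape(object):
    def area(self):
        ABSTRACT  # deliberately undefined name

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(Square(3).area())  # 9

shape = Shape()          # no complaint yet
try:
    shape.area()
except NameError as error:
    print('NameError: %s' % error)
```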

Python as a job time saver



Say you get a job where you have to write some Java. Part of your work involves creating a bunch of POJOs (Plain Old Java Objects) that due to the insanity of some framework or some corporate style requires you to have dumb getters and setters for object members instead of just accessing them directly. (Never write a plain getter or setter method in Python or Guido will eat your soul.) Something like this:


public class Student {

    private int id;
    private String name;
    private int classes;
    private String[] class_names;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getClasses() {
        return classes;
    }

    public void setClasses(int classes) {
        this.classes = classes;
    }

    public String[] getClass_names() {
        return class_names;
    }

    public void setClass_names(String[] class_names) {
        this.class_names = class_names;
    }
}


A lot of IDEs these days have some sort of wizard to generate such files for you. If you don't like those IDEs, you could instead do something like this (in Linux or Cygwin anyway):


$ echo -ne "Student\nint id\nString name\nint classes\nString[] class_names\n" | ./make_evil_java_class.py


And have the Python script generate Java's monstrosity. Or you could even put

Student
int id
String name
int classes
String[] class_names


into its own file_name and just run

$ python make_evil_java_class.py file_name


You could even have multiple files and pass them all to the Python script at once. Waste of time? It saved me a lot of it, anyway...


import sys

def make_evil(lines):
    class_name = lines[0]
    java_code = "\npublic class " + class_name + " {\n\n"
    for var in lines[1:]:
        java_code += "    private " + var + ";\n"

    for var in lines[1:]:
        parts = var.split(' ')
        if len(parts) > 2:
            parts = [' '.join(parts[:-1]), parts[-1]]
        type, member = parts
        Name = member[0].upper() + member[1:]
        java_code += """
    public %s get%s() {
        return %s;
    }

    public void set%s(%s %s) {
        this.%s = %s;
    }
""" % (type, Name, member, Name, type, member, member, member)

    java_code += "}"
    return java_code

if __name__ == '__main__':
    nkill = lambda x: x.replace('\n', '')
    for file_name in sys.argv[1:]:
        f = open(file_name)
        lines = map(nkill, f.readlines())
        print make_evil(lines)

    if not sys.stdin.isatty():
        lines = map(nkill, sys.stdin.readlines())
        if len(lines) > 0:
            print make_evil(lines)


Python as a way to explore some higher level computer science concepts



Closures



You may have noticed the 'lambda' keyword in the last example. 'lambda' is a way to create an anonymous function. Above, the anonymous function takes one argument, x, and returns the result of the expression to the right, which replaces x's newlines with the empty string. After creating it I assigned it to the 'nkill' symbol so I can call it later, but I could have just passed the lambda into the map() function directly as an argument. Lambdas in Python are really only useful for single-line functions like that, since they can only evaluate a single expression.
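
Passing the lambda straight into map() looks like this. The raw list is a made-up stand-in for the lines read from a file, and the list() wrapper is only there so the snippet behaves the same on Python 3, where map() returns an iterator rather than a list.

```python
raw = ['Student\n', 'int id\n', 'String name\n']

# The anonymous function is created and handed to map() in one go.
stripped = list(map(lambda x: x.replace('\n', ''), raw))
print(stripped)  # ['Student', 'int id', 'String name']

# A list comprehension is a common alternative for the same job.
also_stripped = [x.replace('\n', '') for x in raw]
print(also_stripped == stripped)  # True
```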

More generally you can define a function within a function and it will be scoped to just that function. But more importantly, this allows you to have closures. The inner function closes over the scope of its parent, meaning it can keep using the parent's local variables even after the parent has returned. This is best shown with an example.


>>> import time
>>> def slow_square(x):
...     time.sleep(3)
...     return x**2
...
>>> slow_square(2)
4
>>> def make_cached_square():
...     cache = {}
...     def square(x):
...         if x not in cache:
...             time.sleep(3)
...             cache[x] = x**2
...         return cache[x]
...     return square
...
>>> cached_slow_square = make_cached_square()
>>> cached_slow_square(2)
4
>>> cached_slow_square(2)
4
>>>


Even though 'cache' was initialized outside of the inner function, the inner function 'closed' on it and can reference it through subsequent calls. Furthermore any new calls to make_cached_square() result in new values of 'cache' and 'square()' in their own environment, leaving the original alone. e.g.:


>>> def make_adder():
...     start = [-1] # Note: you can only close read-only over 'naked' values like -1
...     def adder():
...         start[0] += 1
...         return start[0]
...     return adder
...
>>> add = make_adder()
>>> add()
0
>>> add()
1
>>> add()
2
>>> add2 = make_adder()
>>> add2()
0
>>> add2()
1
>>> add()
3
>>>


Since the previous closure pattern of caching return values of a function called with the same arguments is so useful, it has a name: memoization. And since you can pass functions around as arguments in Python as well as return them, it's easy to make a generic memoizer:


>>> def memoize(fn):
...     cache = {}
...     def wrapper(*args, **kwargs):
...         if (args, str(kwargs)) not in cache:
...             cache[args, str(kwargs)] = fn(*args, **kwargs)
...         return cache[args, str(kwargs)]
...     return wrapper
...
>>> slow_square = memoize(slow_square) # we don't want to call slow one directly anymore
>>> slow_square(3)
9
>>> slow_square(3)
9


Since it's somewhat cumbersome to define a function, then immediately after say something like 'myfun = memoize(myfun)', and because there's lots of other nice things you can do when you start using 'higher order functions' in this way (i.e. passing functions as arguments to other functions that make wrappers for the original function), Python provides a convenient syntax known as decorators:


>>> @memoize
... def cached_square(x):
...     time.sleep(3)
...     return x**2
...
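
To convince yourself that the decorated function's body really runs only once per distinct argument, you can record the calls. The sketch below repeats the memoize() idea so it runs on its own; the calls list and the noisy_square name are made up purely for illustration.

```python
def memoize(fn):
    # Same idea as the memoize() above: cache results keyed by the arguments
    cache = {}
    def wrapper(*args, **kwargs):
        key = (args, str(kwargs))
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]
    return wrapper

calls = []  # hypothetical bookkeeping list, only here to show when the body runs

@memoize
def noisy_square(x):
    calls.append(x)   # record each real computation
    return x ** 2

first = noisy_square(4)    # body runs, result gets cached
second = noisy_square(4)   # served from the cache, body does not run
third = noisy_square(5)    # new argument, body runs again
```

After these three calls, calls holds [4, 5]: the second call with 4 never reached the function body.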


While square() isn't a very good practical example, in the field you might have a long-running process on a web server that validates Basic Auth (i.e. auth details passed with every request) passwords hashed with the 'bcrypt' algorithm at a high work factor (so that each password hash takes say half a second). Obviously it would suck if the user had to wait half a second for every request to the web server (and your web server won't be happy either doing all that number crunching), so you can just slap memoize() around your hashing function so that only the first validation will take the full amount of time. I ended up doing this for some work in Node.JS. (You might not even be using bcrypt but have a database call somewhere, which can also be slow...)

If you want a Java-like class method, then use the @staticmethod decorator around your method definition. If you want a sometimes more nifty class method, use @classmethod around a method with 'cls' as its first argument instead of 'self'.
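
A rough sketch of the difference between the two, using a made-up Temperature class (none of these names come from the standard library):

```python
class Temperature(object):   # hypothetical example class
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def f_to_c(fahrenheit):
        # No 'self' or 'cls': just a function living in the class namespace
        return (fahrenheit - 32) * 5.0 / 9.0

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # 'cls' is whichever class the method was called on,
        # so subclasses get instances of themselves back
        return cls(cls.f_to_c(fahrenheit))

class LabTemperature(Temperature):   # hypothetical subclass
    pass

boiling = Temperature.from_fahrenheit(212)
freezing = LabTemperature.from_fahrenheit(32)
```

Here freezing is a LabTemperature rather than a plain Temperature, which is the "more nifty" part: a @classmethod works as an alternative constructor that respects subclassing, something a @staticmethod can't do without hardcoding the class name.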

Metaclasses



Metaclasses are exactly what they sound like: classes whose instances are themselves classes, which can in turn be instantiated into ordinary objects. This lets you rewrite standard class semantics if you want to... such as not calling __init__ when one of your classes gets instantiated...

Remember that everything in Python acts like an object. And every object has a class it was instantiated from. When you make your classes normally, they implicitly get instantiated from the 'type' class. i.e. Python classes are type objects. Also the 'type' class is itself a metaclass. To show the example of not executing __init__:


>>> class Meta(type):
...     def __call__(cls, *args, **kwargs):
...         if raw_input('Init class? [y/n] ') == 'y':
...             print 'Initializing cls'
...             return type.__call__(cls, *args, **kwargs)
...
>>> class MyCls(object):
...     __metaclass__ = Meta
...     def __init__(self):
...         print 'I should be called'
...
>>> c = MyCls()
Init class? [y/n] y
Initializing cls
I should be called
>>> c2 = MyCls()
Init class? [y/n] n
>>>


More commonly one would write the __init__ method of the metaclass. Then, when you create a new class that uses that metaclass, the metaclass's __init__ is called as soon as the class is defined... so you don't have to wait for a new object to be created first. One common use of a metaclass is adding various aspects of each subclass (like certain attributes) to some sort of internal bookkeeping structure... In general, metaclasses have questionable utility, but they're interesting nonetheless.
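
A minimal sketch of that bookkeeping pattern follows. The registry dict and the Registering and plugin class names are all invented for the example. The base class is created by calling the metaclass directly so the same code runs on Python 2 and 3; in 2.7 you could equally write '__metaclass__ = Registering' inside a normal class body.

```python
registry = {}   # hypothetical bookkeeping structure

class Registering(type):
    def __init__(cls, name, bases, attrs):
        # Runs once per class definition, not once per instance
        super(Registering, cls).__init__(name, bases, attrs)
        registry[name] = cls

# Calling the metaclass like a function creates a new class
Plugin = Registering('Plugin', (object,), {})

class Mp3Plugin(Plugin):   # subclasses inherit the metaclass
    pass

class OggPlugin(Plugin):
    pass
```

No Plugin object was ever instantiated, yet registry already knows about all three classes by the time the last class statement finishes.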

Generators and coroutines



Way back in the beginning I mentioned it's suggested to use the function 'xrange' instead of range() in your for loops. Now we can explore why.


>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xrange(10)
xrange(10)
>>> a = _
>>> b = iter(a)
>>> b
<rangeiterator object at 0x04C46F08>
>>> b.next()
0
>>> b.next()
1
>>> b.next()
2
>>> iter(a).next()
0
>>> b.next()
3
>>> list(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


This is an example of lazy evaluation. With the range() function, the whole list of numbers is created right away and returned as a value. If you have a large range (like trying to loop to 10 million), this means that you first have to wait for Python to construct a list of 10 million numbers, keep that whole list stored in memory the whole time, and pass each element to the for loop. It'd be better if we could just generate the range numbers when they are needed instead of all at once, up-front, and if we're just doing a for loop we don't want to have to store the whole list in memory at once either. xrange() does this for us, but how does it work?

It's really easy to make your own:


>>> def my_xrange(start, stop=None):
...     if stop is None:
...         stop = start
...         start = 0
...     while start < stop:
...         yield start
...         start += 1
...
>>> x = my_xrange(10)
>>> list(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x = my_xrange(10) # new one
>>> x.next()
0
>>> list(x)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>


my_xrange() returns a generator object thanks to the 'yield' keyword. The 'yield' keyword is just like the 'return' keyword, only it saves the execution place in the function. So when the next() method is called on the generator, the function resumes executing directly after the yield statement, and runs until another yield statement is reached. When the function finishes without reaching another yield, the generator raises a StopIteration exception. This gives an insight into how Python's for loop works. If it wasn't given an iterator to work with (a generator is one kind of iterator), it makes one with the iter() function (which calls the object's __iter__ method), and repeatedly calls .next() on that iterator until it catches the exception.
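
Spelled out by hand, the loop machinery looks roughly like the sketch below. The manual_for helper is made up for illustration, and it uses the next() builtin (available since 2.6) rather than the .next() method so the same code also runs on Python 3.

```python
def countdown(n):
    # A small generator to iterate over
    while n > 0:
        yield n
        n -= 1

def manual_for(iterable, body):
    # Hypothetical helper: roughly what 'for item in iterable: body(item)' does
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # same as iterator.next() on Python 2
        except StopIteration:
            break                      # the loop ends quietly
        body(item)

seen = []
manual_for(countdown(3), seen.append)
manual_for([10, 20], seen.append)      # lists work too: iter() makes an iterator
```

After both calls, seen is [3, 2, 1, 10, 20], exactly what two ordinary for loops would have appended.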

Lazy evaluation is such an important concept that some programming languages (like Clojure and Haskell) go to great efforts to make things as lazy as possible. If you happen to be threading a list of values to a lot of different functions and making updates, it's much more efficient if under the hood you're really threading input through a pipeline of generators. (Similar to *nix shell pipes.)
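
Here is a small sketch of such a pipeline. The three stage functions are invented for the example; the point is only that chaining them costs nothing until somebody asks for a value.

```python
def numbers(limit):
    # Stage 1: produce 0, 1, 2, ... lazily
    n = 0
    while n < limit:
        yield n
        n += 1

def only_even(values):
    # Stage 2: pass along only the even values
    for v in values:
        if v % 2 == 0:
            yield v

def squared(values):
    # Stage 3: transform each value as it flows past
    for v in values:
        yield v * v

# Nothing is computed when the pipeline is built, despite the huge limit...
pipeline = squared(only_even(numbers(10 ** 9)))

# ...values are pulled through one at a time, only as requested.
first_three = [next(pipeline) for _ in range(3)]
```

Asking for three results only ever generates the numbers 0 through 4; a version built on plain lists would have tried to materialize a billion of them first.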

Some algorithms explicitly demand lazy behavior for iterative construction over time. For example, the Genuine Sieve of Eratosthenes must be implemented lazily: https://gist.github.com/Jach/1761175
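
For a flavor of what a lazy sieve can look like, here is one well-known incremental formulation written as a generator. It is a sketch of the general technique and not necessarily how the linked gist does it.

```python
from itertools import islice

def primes():
    # Maps each upcoming composite number to the primes known to divide it
    composites = {}
    n = 2
    while True:
        if n not in composites:
            # Nobody crossed n off, so it is prime; its first
            # interesting multiple is n squared
            yield n
            composites[n * n] = [n]
        else:
            # n is composite: slide each of its primes to their next multiple
            for p in composites[n]:
                composites.setdefault(n + p, []).append(p)
            del composites[n]   # no longer needed, keeps the dict small
        n += 1

first_ten = list(islice(primes(), 10))
```

There is no upper bound anywhere in primes(); the caller decides how many to take, which is exactly what an eager, fixed-size sieve can't offer.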

A generalization of generators is a coroutine. The main difference is that 'yield' now becomes an expression, and instead of just using .next() to receive values, you can use .send() to send values.


>>> def grep(pattern):
...     print 'Looking for %s' % pattern
...     while True:
...         line = (yield)
...         if pattern in line:
...             print line,
...
>>> g = grep('python')
>>> g.next() # 'primes' generator, advances it to first (yield)
Looking for python
>>> g.send('Hello world')
>>> g.send('Hello Python world')
>>> g.send('Hello python world')
Hello python world


Since coroutines are generalizations of generators you can technically both send and receive from one... but your brain may explode. It's best to just think of coroutines as lazy data consumers and generators as lazy data producers. If you want a full overview, check out www.dabeaz.com/coroutines/Coroutines.pdf (Which is so crazy it ends with implementing the basics of a multitasking operating system...without interrupts, and the option to use threads or subprocesses or whatever. Coroutines enable concurrency even in single-thread programs, which is awesome.)
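
If you do want to risk the brain explosion, here is a small sketch of a coroutine that both receives and produces: each .send() feeds it a number and hands back the average so far. The running_average name is made up, and next(avg) is used for priming so the code runs on Python 3 too (avg.next() does the same thing on 2.7).

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand back the current average, then pause until a value is sent in
        value = (yield average)
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)             # prime it: run up to the first yield
a = avg.send(10)      # average of [10]
b = avg.send(20)      # average of [10, 20]
c = avg.send(30)      # average of [10, 20, 30]
```

The three sends return 10.0, 15.0 and 20.0. All the state (total and count) lives in the paused function's local variables, much like the closures earlier.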

Python as a quick way to share documents over the web



From the command line:


$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...


Now the directory you ran that command in, along with its subdirectories, is visible to a browser connecting to your IP on port 8000. If you have that port forwarded it can be a quick way to share files with a friend – just make sure you run it in a directory that doesn't contain secret files... (In Python 3 the module was renamed, so the command is python -m http.server.)

Python as ...



We've only touched on a few of the included batteries. Did I mention if you want a simple GUI program (albeit an ugly one), Python comes with Tkinter to let you do that? (There are also third-party bindings for wxWidgets, Qt, Gtk, etc.) Really take a look at the standard library, linked again: http://docs.python.org/2/library/ There's a lot in there.

Additionally the Python community is large, and there are a lot of really great open source third party modules you can download and install. Here are a few I like:

scipy/numpy – Number crunching, scientific applications, can replace a lot of matlab code.

PyGame – SDL-based, develop games using old-school thinking of the graphics pipeline (blitting) (you have complete control) (http://pygame.org/news.html)
Example:


import random
from math import sqrt
import pygame
from pygame.locals import *
pygame.init()
screen = pygame.display.set_mode((800,600))
pygame.display.set_caption('test')

ball = pygame.Surface((30,30))
ball.fill((0,0,0))
pygame.draw.circle(ball, (255,255,255), (15,15), 15)
ball_pos = ball.get_rect()
ball_pos.center = (400, 300)
ball_speed = [random.randint(2,4), random.randint(2,3)]

clock = pygame.time.Clock()
keep_going = True
while keep_going:
    clock.tick(60)
    for event in pygame.event.get():
        if event.type == QUIT or (event.type == KEYDOWN and event.key == K_ESCAPE):
            keep_going = False
        if event.type == MOUSEBUTTONUP:
            vec_x, vec_y = ball_pos[0] - event.pos[0], ball_pos[1] - event.pos[1]
            mag = sqrt(vec_x**2 + vec_y**2)
            normed = [vec_x/mag, vec_y/mag]
            ball_speed[0] += normed[0]*2
            ball_speed[1] += normed[1]*2
    ball_pos.centerx += ball_speed[0]
    ball_pos.centery += ball_speed[1]
    if ball_pos.left < 0 or ball_pos.right > 800:
        ball_speed[0] *= -1
    if ball_pos.top < 0 or ball_pos.bottom > 600:
        ball_speed[1] *= -1
    screen.fill((0,0,0))
    screen.blit(ball, ball_pos)
    pygame.display.update()


Flask – Quick, local web server and framework, can then use as production web site (http://flask.pocoo.org/)

PyAudio – Easily connect to computer microphones and get live data on the fly (example: https://gist.github.com/Jach/6361147)

Texttable – Produce simple formatted ASCII tables (https://oneau.wordpress.com/2010/05/30/simple-formatted-tables-in-python-with-texttable/)

MyHDL – Python library letting you write hardware description code that compiles to both Verilog and VHDL (http://www.myhdl.org/doku.php) D flip-flop example:
Transfer value of the d input to the q output on every rising edge of the clock.


from myhdl import *
def dff(q, d, clk):
    @always(clk.posedge)
    def logic():
        q.next = d
    return logic


And you can test it:


import random

def test_dff():
    q, d, clk = [Signal(bool(0)) for i in xrange(3)]
    dff_inst = dff(q, d, clk)
    @always(delay(10))
    def clkgen():
        clk.next = not clk

    @always(clk.negedge)
    def stimulus():
        d.next = random.randrange(2)

    return dff_inst, clkgen, stimulus

def simulate(timesteps):
    tb = traceSignals(test_dff)
    sim = Simulation(tb)
    sim.run(timesteps)

simulate(2000)


Or convert it to Verilog:

toVerilog(dff, q, d, clk)


resulting in:

module dff (
    q,
    d,
    clk
);

output q;
reg q;
input d;
input clk;

always @(posedge clk) begin: _dff_logic
    q <= d;
end

endmodule



Posted on 2014-02-09 by Jach

Tags: programming, python, tips

Permalink: https://www.thejach.com/view/id/294

Trackback URL: https://www.thejach.com/view/2014/2/an_attempt_at_a_practical_exploration_of_python_for_newcomers
