Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Simple abstraction

Abstraction is good. Say you're given the following task: write a script to take in five integers as input, add them together, and print the result. A naive implementation would look like this:

n1 = raw_input('Enter number: ')
n1 = int(n1)
n2 = raw_input('Enter number: ')
n2 = int(n2)
n3 = raw_input('Enter number: ')
n3 = int(n3)
n4 = raw_input('Enter number: ')
n4 = int(n4)
n5 = raw_input('Enter number: ')
n5 = int(n5)
total = n1 + n2 + n3 + n4 + n5
print total

Now, this certainly gets the job done. If you're new enough that you can't yet see the abstractions that would cut down the workload, it will suffice as a first shot and let you get on with launching your product. If you're writing software for a company that isn't giant, the first job is making something that works; the second is making it efficient, elegant, and maintainable. While I'm working, I have to stop myself from abstracting prematurely, especially when the abstraction only became visible once I was done with a naive approach. I'll take care of the obvious things, sure, but a few times I've spotted abstractions that would touch the entire code base, and I only let myself apply them later, not the moment I saw them.

So, what are a few immediate downsides, and what tricks could we use to clean this up? The most significant is code reuse: identical blocks of code beg for a function, and characteristically similar blocks of code beg for an abstraction. In this case a function wouldn't gain us much, so as a first step let's move each int() call onto the same line as its raw_input(), get rid of the unnecessary total variable, and pull the input message out into its own variable.

msg = 'Enter number: '
n1 = int(raw_input(msg))
n2 = int(raw_input(msg))
n3 = int(raw_input(msg))
n4 = int(raw_input(msg))
n5 = int(raw_input(msg))
print n1 + n2 + n3 + n4 + n5

Looks better; the initial redundancies are taken care of. There is still a glaring characteristic similarity, in this case one of repetition: each of the number lines does essentially the same thing, just storing the result in a different place. There's also a related scalability problem, which might not be a concern for this one-off script but can definitely cause trouble in a bigger application. If I suddenly want you to sum 10 numbers instead of 5, you have to add 5 more lines and change the print line, which increases the chance of error and continues the repetition. Good code is robust to sweeping changes in design requirements.
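To see why a plain function only goes part of the way here, consider a minimal sketch of the "repeated blocks beg for a function" idea. The helper `read_int` and the wrapper `add_five` are hypothetical names, and to keep the sketch runnable without a user it assumes input arrives as an iterator of strings (`lines`) rather than from raw_input(). The prompt-and-convert pair is no longer duplicated, but five near-identical call lines remain.

```python
def read_int(lines):
    """Hypothetical helper: take the next raw line of input and convert it to an int."""
    return int(next(lines))


def add_five(lines):
    # The int(raw_input(...)) pair now lives in one place...
    n1 = read_int(lines)
    n2 = read_int(lines)
    n3 = read_int(lines)
    n4 = read_int(lines)
    n5 = read_int(lines)
    # ...but these five calls are still repetition, and summing 10 numbers
    # would still mean adding five more lines and editing this one.
    return n1 + n2 + n3 + n4 + n5


print(add_five(iter(['1', '2', '3', '4', '5'])))  # prints 15
```

The repetition has only moved, which is the characteristic similarity a loop removes.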

Fortunately, Python provides a for loop to avoid explicit repetition in the code. We can get rid of the individual variables for the entered numbers and bring back the total variable. We can also drop the msg variable, since we now use the phrase only once.

total = 0
for _ in range(5): # the _ is conventionally used to say we don't care about it
    total += int(raw_input('Enter number: '))
print total

This is alright. In many languages the solution would initially look like this, and it might be the first thing you thought of when you read the problem. It's scalable: if I want to handle 10 numbers, I just change 5 to 10. But what if the design changes again, and now I actually want a reference to each entered number? Well, we can store them in a list as they come in.

numbers = []
for _ in range(5):
    numbers.append(int(raw_input('Enter number: ')))
# print total...

Now you have numbers[0] through numbers[4], and you still have to sum them. You have three options. The first is naive, like the first total implementation: explicitly write numbers[0] + numbers[1] + numbers[2] + numbers[3] + numbers[4], which doesn't scale as you add more numbers to sum. The second is to bring back the total variable and either add each input to it before appending it to the list, or make a second loop over the stored values to compute the total. The third is to use Python's built-in sum() in place of the second option; let's use it.

numbers = []
for _ in range(5):
    numbers.append(int(raw_input('Enter number: ')))
print sum(numbers)

The solution looks nice, and if you were busy you might be tempted to just leave it like this. You might remark, though, that to sum a million numbers you would loop over three million items: once when range() builds its list (in Python 2, range() returns an actual list), once in the for loop, and once more inside sum(), which has an implicit loop of its own.
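For comparison, the second of the three options above (keep a running total while also storing each number) might look like the minimal sketch below. It handles each entered number once, with no separate pass for sum(), at the cost of maintaining the total by hand. The name `read_and_total` is hypothetical, and `lines` (an iterator of strings standing in for keyboard input) is an assumption made so the sketch can run without a user.

```python
def read_and_total(lines, count):
    """Store each entered number and keep a running total in the same loop."""
    numbers = []
    total = 0
    for _ in range(count):
        n = int(next(lines))  # convert the next raw line of input
        numbers.append(n)     # keep a reference for later use
        total += n            # update the total as we go
    return numbers, total


numbers, total = read_and_total(iter(['4', '8', '15', '16', '23']), 5)
print(total)  # prints 66
```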

Enter generators! If we use xrange() instead of range(), we get a lazy version of range() that hands out numbers one at a time instead of building an entire list. That brings us down to two loops. With a generator expression we can get down to one loop, like so:

numbers = (int(raw_input('Enter number: ')) for _ in xrange(5))
print sum(numbers)

and if we changed our mind about needing the entered numbers, this simply becomes:

print sum(int(raw_input('Enter number: ')) for _ in xrange(5))

One line, five iterations, scalable to N, easy to factor out an actual list if we want the entered numbers, and very readable (if you know Python). This is abstraction, this is beauty, this is better.
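"Easy to factor out an actual list" deserves a concrete look, because the generator version has one catch: a generator expression can be iterated only once, so after sum() has consumed the `numbers` generator from the earlier snippet, there is nothing left to inspect. In the minimal sketch below, the list `raw` is an assumed stand-in for five lines of user input. Swapping the parentheses for square brackets turns the generator expression into a list comprehension that you can both sum and keep.

```python
raw = ['3', '1', '4', '1', '5']  # stand-in for five lines of user input

# Generator expression: values are produced lazily and can be consumed once.
gen = (int(s) for s in raw)
print(sum(gen))  # prints 14
print(sum(gen))  # prints 0, because the generator is already exhausted

# List comprehension: same expression in brackets, but the values are kept.
numbers = [int(s) for s in raw]
print(sum(numbers))  # prints 14
print(numbers)       # prints [3, 1, 4, 1, 5]
```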

(Of course, in the Real World you might also want to put an exception handler around the thing in case the user enters something not an integer. Input is always ugly.)
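A minimal sketch of that exception handler, assuming you want to re-prompt rather than crash: `read_int_retry` and `fake_reader` are hypothetical helpers. The function takes a `read` callable so it can be tested without a keyboard; in the Python 2 code above you would pass `raw_input`, and in Python 3, `input`. It does not handle end-of-file, which real code might also want to catch.

```python
def read_int_retry(read, prompt='Enter number: '):
    """Keep asking until the reply parses as an int (hypothetical helper)."""
    while True:
        reply = read(prompt)
        try:
            return int(reply)
        except ValueError:
            # int() raises ValueError for text such as 'ten' or '4.5'
            print('Not an integer, try again.')


def fake_reader(replies):
    """Return a read(prompt) function that replays canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


read = fake_reader(['ten', '10', '2', '3', 'x', '4', '5'])
print(sum(read_int_retry(read) for _ in range(5)))  # prints the retry message twice, then 24
```

The retry loop lives inside the helper, so the one-line sum keeps its shape.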

Posted on 2011-03-12 by Jach

Tags: programming, python

