TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

"I can understand what each line is doing"

This is typically said of a "simple" language such as C (and sometimes, lately, Go) because the abstraction layer between the hardware and the language is light. If nothing else, you can figure out what's going on at the hardware level if you really want to. (And in Go you don't have to "worry" about the possibility of a macro.) This might help you reason about performance, or understand a disassembled binary, or just as an excuse not to use C++.

A typically response might be "Oh really? What does the line with fnord(i); do?" The correct answer is "It finds the variable i using lexical scoping rules, puts a copy of it in a free register (or pushes it on the stack), and jumps to the location of the fnord function which may or may not use the i value in the register or on the stack, may or may not do something else, and may or may not alter the machine instruction pointer to avoid returning to the jump location like a normal function would."

Okay, but what does the piece of code do? As a programmer reading C code, almost all of my time is spent trying to understand that. Better rephrased, what goal does the piece of code accomplish? I don't tend to care what the machine itself is doing, except in very specific circumstances like optimization and hardware programming.

So to me it seems like a language that makes the task of figuring out what a chunk of code does in terms of goal accomplishment (a chunk since I'm rarely looking at a single function call in isolation like that), it's worth the tradeoff of getting further from what the machine itself is doing. Since this is a known tradeoff, languages that include features to mitigate this tradeoff when desired are even more worthwhile.

Let's examine a small line of Clojure code that utilizes something that a C zealot might object to: lazy lists. The code is (take 20 (cycle [1 2 3])). Just an innocent piece of code, we won't pollute the discussion with what goal it's trying to accomplish...

After evaluation we get the list (1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2). The C zealot might say: "Big deal... I can get the same thing, watch!" And after some effort (or they might just remember) they'll try memcpy and realize it copies chunks at a time and so they have to call it multiple times (or implement their own byte-by-byte memcpy). At last they're left with:


#define N 20
int i;
char list[N] = "123";
int patlen = strlen(list);
for (i = patlen; i < N; i+=patlen) {
memcpy(list+i, list, (N-i > patlen) ? patlen : N-i);
}


"Ha!"

The Clojure newb chortles.


user=> (take 6 (cycle (partition 3 1 [0] [1 2 3 "four" {:five 5}])))
((1 2 3) (2 3 "four") (3 "four" {:five 5}) ("four" {:five 5} 0) (1 2 3) (2 3 "four"))


After crushing the C programmer in terms of expressive power, the C zealot says "Well, I know what mine is doing at the machine level! How would I even implement a clone of the first example?" We'll accommodate the C zealot for a moment and talk about what the machine is actually doing.

What is the machine doing with the first Clojure example? I don't know. I don't really care. I'm okay with not knowing, so I ask the C zealot "Why does it matter?"

"Because if it's slow you can figure out how to optimize it!"

Is it slow? Wrapping (time) around it reports that it takes about 0.05 milliseconds, that seems fast enough to me. I'm sure if it was important enough, the HotSpot JIT might notice and make it even faster...

But I can still tell you at a high-level what it might be doing, and then we could optimize from there. cycle might return a generator, one way to implement lazy-ness, and so the zealot will complain about lots of function calls to grab each item when we already know how many we're taking. Are either take or cycle macros? A glance at the source indicates no. However, it's trivial to make a new macro, say, tk (though we could have called it take and superseded the original), which when used has exactly the same source code as before but is missing an a and an e. Now the computation happens at compile-time, and what's left at run time is a list of N numbers as large as however big it would have been had we typed it in ourselves.

This feature alone is not possible in C, its macro system isn't nearly powerful enough. We've just optimized the cost of that piece of code to 0, because any cost now happens at compile-time instead of runtime. A C programmer's only option is to type in the list manually, or run once, get the list, and copy-paste it manually into their source code and recompile. Pleh! In addition we've kept the meaning around in the source code. A new reader doesn't need to know that we discovered it was a slow piece of code that needed optimization so we macro'd it, the reader just needs to know the intention, and the intention was to take 20 elements from an infinite cyclical source to accomplish some goal. If we just replaced it with the pattern list itself like in C, we might need to include a comment. If the list is particularly large, say 20000 elements, we're wasting a ton of space keeping that in the source file. And what if there's a mistake? What about for all the other lists of various sizes with different patterns? Are we going to replace all those too since the code is so sub-optimal? In Clojure this happens for us already since we macro'd it, and if we made a mistake there's only one place to fix it for all variations.

And if we really needed to, we could make Clojure macros to generate assembly directives for a particular machine. Like they did for many robots 40 years ago when the robots didn't have enough memory to host the full Lisp environment.

While Lisp makes it harder to know exactly what's going on at the machine level (especially with Clojure, where understanding what's going on at the machine level implies you understand what's going on at the JVM level and also implies you know how the Clojure code is being compiled to JVM bytecode), that downside is mitigated by Lisp's macros that make for more powerful optimizations without sacrificing any clarity of problem-meaning.


Posted on 2012-04-27 by Jach

Tags: c, clojure, lisp, programming

Permalink: https://www.thejach.com/view/id/250

Trackback URL: https://www.thejach.com/view/2012/4/i_can_understand_what_each_line_is_doing

Back to the top

Anonymous December 04, 2012 10:27:14 AM Here's an implementation of tk:

(defmacro tk [n myseq]
(let [seq2 (eval myseq)
my-list (take n seq2)]
`(quote ~my-list)))
Back to the first comment

Comment using the form below

(Only if you want to be notified of further responses, never displayed.)

Your Comment:

LaTeX allowed in comments, use $$\$\$...\$\$$$ to wrap inline and $$[math]...[/math]$$ to wrap blocks.