Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Short guide to Lisp (syntax)

(Update: Go watch Rich Hickey's Clojure for Java Programmers for a superior introduction to the syntax of Clojure (a Lisp).) Lisp has been around for a long time. While some of its current magic capabilities are fairly recent, the basic syntax it uses has hardly changed at all over 50 years. To me this means it's endured because it's good and because it's simple and because it's powerful. I'll talk about those last two qualities here; maybe you'll end up also believing the syntax is good.

I will start with a related topic: mathematics. Not any particular usage of math, just the notation. The symbols. One of the hardest things for me in math has been memorizing symbols, especially this little guy: $$\nabla$$. Very subtle changes in the way you write stuff on paper can drastically change the meaning of something. In elementary school you're taught that $$\times$$ means the same as $$\cdot$$. Later you find out they're very different. I could give more symbol soup examples, but take a moment and just skim down the linked Wikipedia page.

There's a similar phenomenon in most programming languages. There are symbols and syntax for all sorts of weird things. In math you might say "let x = 4", in programming you might say "$x = 4;". Wouldn't it be nice if math, and programming, had a consistent syntax?

Math predates programming, but a grand solution to this problem of notation didn't come all that long before. That solution was called Lambda Calculus. Lisp was heavily influenced by this system of mathematics, so it too has this solution to the problem of notation and syntax.

What is the solution? Everything is a function. (In formal Lambda Calculus, this includes numbers like 1, 2, and so on.) You know, like the kind you remember from math class such as $$f(x,y) = x^2 + y^2$$. Only with a crisper, cleaner syntax. In math you might ask for the value when x is 3 and y is 4, so you type in $$f(3,\ 4)$$ and out pops 25. How can the syntax get any crisper? Well, in typing it we often put spaces after the commas, why don't we just enforce that as the value separator and get rid of those commas? $$f(3\ 4)$$. Lisp makes one further simplification which I will talk more about when I get to discussing its power: it moves the "function name", in this case f, inside the parentheses: $$(f\ 3\ 4)$$.

That is what Lisp syntax always is: (function-to-call possible-argument-1 possible-argument-2 possible-argument-3 ...), with the ability to nest further instances of function calls and arguments to those functions within any other argument. This bugs people at first because it's different to how they learned arithmetic: $$(3+4)/2$$ becomes (/ (+ 3 4) 2). Lisp provides those symbols since we learn them in grade school, but there's nothing stopping you from saying instead: (divide-first-by-second (sum 3 4) 2). I greatly prefer writing out a function call for things like $$\nabla$$ which changes meaning in subtle ways. I like being explicit. Anyway, if you understand this simplicity, of function call then argument, always in that order, you understand Lisp syntax and can stick it as a keyword in your resumé.
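A minimal sketch of that renaming idea, assuming a Scheme dialect. The names sum and divide-first-by-second are the hypothetical ones from the paragraph above, not standard procedures; here they simply wrap the built-in symbols.

```scheme
;; Hypothetical spelled-out aliases for the built-in arithmetic procedures.
(define (sum a b) (+ a b))
(define (divide-first-by-second a b) (/ a b))

;; Both expressions compute (3 + 4) / 2.
(define with-symbols (/ (+ 3 4) 2))
(define with-names (divide-first-by-second (sum 3 4) 2))

(display with-symbols) (newline)
(display with-names) (newline)
```

Both lines print the same value, since the named versions are ordinary function calls in exactly the same (function-to-call args) shape.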

Okay, not quite all cases. Not even Lisp is absolutely perfect; depending on the dialect you may have anywhere from a handful to several handfuls of "exceptions" to this rule, but the "exceptions", generally, stay true to the spirit of the rule. For example, here's one way to define that example function from earlier rather than just calling it:

(define (f x y)
  (+ (* x x) (* y y)))

(For note-taking purposes, the dialect of Lisp being used here is Scheme though that's not too important.)

In this example, everything I said about the (function-to-call args) rule applies, except for the first argument to define. (f x y) is how you would call the function, but here you're just defining it! It's some syntactic sugar but it's still true to the spirit of the rule. This is similar to the mathematical form, where calling/using the function is identical to that part of defining it.
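To make the "syntactic sugar" remark concrete, here is a small sketch in Scheme showing the definition next to a call, plus the longer lambda form that the shorthand stands for. The name g is only an illustrative second name for the same function.

```scheme
;; Shorthand definition: the first argument to define looks like a call.
(define (f x y)
  (+ (* x x) (* y y)))

;; The same function written without the sugar, bound to the name g.
(define g
  (lambda (x y)
    (+ (* x x) (* y y))))

;; Calling uses the same shape that appeared inside the definition.
(define result-f (f 3 4))   ; 3*3 + 4*4 = 25
(define result-g (g 3 4))   ; same computation, same value

(display result-f) (newline)
(display result-g) (newline)
```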

Now on to Lisp's power. You might think that if the rule is almost always (function-to-call args), that if your arguments contain functions to call themselves, those functions actually get run by the computer. This is not necessarily the case! I demonstrated that above with the define example. If you tell Lisp (+ 3 4) you'll immediately get back 7. If you tell Lisp (define (seven-and extra) (+ extra (+ 3 4))), it doesn't go in and evaluate the (+ 3 4) prematurely, you won't be left with (define (seven-and extra) (+ extra 7)). No, in fact, it waits to evaluate. This "laziness" is a very powerful idea in computer science and has many great uses. Stipulating this particular exception, that "Nothing but the top-most function names have to be evaluated/called immediately", to the normal rule of (function-being-called args) seems worth the "violation" of absolute purity.
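One way to watch this waiting happen is to put a visible side effect inside the function body. This is only a sketch: the counter named evaluations is a hypothetical helper added for illustration, and seven-and is the function from the paragraph above.

```scheme
;; Counts how many times the body of seven-and has actually run.
(define evaluations 0)

(define (seven-and extra)
  (set! evaluations (+ evaluations 1))  ; marks each real run of the body
  (+ extra (+ 3 4)))

;; Defining the function ran nothing, so the counter is still 0 here.
(define count-after-define evaluations)

;; Calling the function runs the body exactly once.
(define answer (seven-and 10))          ; 10 + (3 + 4) = 17
(define count-after-call evaluations)

(display count-after-define) (newline)
(display answer) (newline)
(display count-after-call) (newline)
```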

So what about (/ (+ 3 4) 2)? The question is: "That '/' symbol has to be called since it's at the highest and outermost level, but should the arguments be evaluated or not? Should I pass the arguments 7 and 2, or should I pass the arguments (+ 3 4) and 2 and let '/' worry about evaluating (+ 3 4) (which would be at the highest level as far as it's concerned)?" The answer is determined by how the symbol '/' was defined. If it was defined the usual way with the "define" function, then 7 gets passed to it. If it was defined in a different way, it waits.

This waiting is incredibly important. How do you make conditional actions? The "if" in this code is a "special form" in the same way the "define" function was, in that it waits rather than evaluating all of its arguments up front.

(if (= 4 (+ 2 2))
    (print "Math works!")
    (print "Something went really wrong."))

If it didn't wait, then the question of whether 4 equals 2+2 is irrelevant since both pieces of code would get executed.
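The difference can be sketched by writing a hypothetical eager-if as an ordinary function and comparing it with the real if. The helpers ran and note! are illustrative names only; they record which branches actually got evaluated.

```scheme
;; Records which branches actually ran.
(define ran '())
(define (note! label)
  (set! ran (cons label ran))
  label)

;; Hypothetical "if" written as an ordinary function: every argument is
;; evaluated before eager-if itself is entered.
(define (eager-if test then-value else-value)
  (if test then-value else-value))

(eager-if (= 4 (+ 2 2)) (note! 'math-works) (note! 'went-wrong))
(define ran-with-eager-if ran)   ; holds both labels

(set! ran '())
(if (= 4 (+ 2 2)) (note! 'math-works) (note! 'went-wrong))
(define ran-with-real-if ran)    ; holds only math-works

(display ran-with-eager-if) (newline)
(display ran-with-real-if) (newline)
```

Scheme does not specify the order in which function arguments are evaluated, so the two labels recorded by eager-if may appear in either order.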

So why put the function on the inside instead of the outside like everyone's used to? The chief reason? It's incredibly easy to parse and manipulate this way. f(3) is ugly to parse, (f 3) is easy. divide(5 plus(1 2)) is uglier still than (divide 5 (plus 1 2)), and it's also slightly harder for a computer to parse.

This manipulation aspect, though, is a key idea. Code is data, data is code. When code is expressed the Lisp way, every symbol can be thought of as just an element in a normal list! (1 2 3) is a list; so is (+ 2 3), the only difference is the first element. The lazy/non-lazy feature of Lisp is how it can distinguish between the two. If we didn't have lazy unevaluated lists in Lisp, then (1 2 3) would try to call the function 1 and that's not what we want. So Lisp lets us treat data as code and code as data, which means manipulating code is as simple as manipulating (list) data.
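A small sketch of the idea in Scheme, where one way to keep a list unevaluated is to quote it with a leading '. The call to eval with (interaction-environment) is an assumption about the implementation: it works in Guile and other R5RS-style systems, but some dialects spell the environment argument differently.

```scheme
;; Two quoted lists: neither is evaluated when it is written down.
(define data '(1 2 3))
(define code '(+ 2 3))

;; Both are ordinary lists, so the same list operations work on each.
(define first-of-data (car data))   ; the number 1
(define first-of-code (car code))   ; the symbol +

;; Only when the list is handed to eval is it treated as a function call.
(define value (eval code (interaction-environment)))

(display first-of-data) (newline)
(display first-of-code) (newline)
(display value) (newline)
```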

Say for some reason that in a non-Lisp language the expression divide(5 plus(1 2)) is being passed as an argument to another, different function that wants its arguments to wait, such as if. And suppose this function just came home at 3am drunk from a party and is feeling pretty trixy. He decides that no matter what the first argument's real function is, he's going to replace it with a call to the function named stupid. Therefore when it comes to evaluation time, the actual code that gets evaluated is stupid(5 plus(1 2)) instead of the original divide.

Do you think it's easier to go from divide(5 plus(1 2)) to stupid(5 plus(1 2)), or from multiply(5 plus(1 2)) to stupid(5 plus(1 2)), without some complicated looking code? What if we just had to go from (divide 5 (plus 1 2)) to (stupid 5 (plus 1 2)) instead? In this case it's much easier. We're just changing the first element of a list! There are lots of possible implementations. But it's general, we don't have to parse for parens and try to distinguish functions and non-functions, we just treat the whole expression as a list and replace the first element with a new one.
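Here is one of those possible implementations, sketched in Scheme. The helper name replace-head is hypothetical, and the expressions are quoted so they stay plain lists of symbols; nothing named divide, plus, or stupid needs to exist for the rewrite to work.

```scheme
;; Swap the first element of an expression for a new one, keeping the rest.
(define (replace-head expr new-head)
  (cons new-head (cdr expr)))

(define from-divide (replace-head '(divide 5 (plus 1 2)) 'stupid))
(define from-multiply (replace-head '(multiply 5 (plus 1 2)) 'stupid))

(display from-divide) (newline)
(display from-multiply) (newline)
```

The same one-line function handles both starting points, because it never looks at what the old first element was.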

What if we want to go from (divide 5 (plus 1 2)) to something crazier like (stupid divide 5 (plus 1 2))? Which otherwise in the non-Lisp language would look like going from divide(5 plus(1 2)) to stupid(divide 5 plus(1 2)). That actually looks kind of complicated! We're even passing a function itself as the argument! You have to replace a '(' with a space and then add 'stupid('... As opposed to just sticking 'stupid ' in front of 'divide' like Lisp would, which is just saying that we added a new element to the front of our code-list.
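Adding an element to the front of the code-list is a single cons in Scheme. This is a sketch on quoted data only, so the symbols stupid, divide, and plus are just list elements here.

```scheme
(define original '(divide 5 (plus 1 2)))

;; Put the new symbol in front; the old expression becomes the rest of the list.
(define wrapped (cons 'stupid original))

;; Going back is just as easy: drop the first element.
(define unwrapped (cdr wrapped))

(display wrapped) (newline)
(display unwrapped) (newline)
```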

Speaking of functions taking functions as arguments, that's another piece of Lisp's power, though it's not unique to Lisp. It's common to pass function names themselves as arguments to other functions. In this case, we actually took a function and instead of evaluating it passed it and its arguments to a separate function. Maybe that separate function will evaluate it later on.

This allows for a "purer" form of lazy waiting among many other nice things you can now do. Other languages support passing functions as arguments, but not as many support transforming a function call into one of these behind the scenes. Suppose that the highest level of function call you can make has one, invisible, higher level still: the actual computer.

So now you realize you don't have (+ 3 4) in isolation, which can be just as good as +(3 4). You have (computer + 3 4), or computer(+ 3 4). (Telling the computer: here's a function, call it with these as arguments.) The constant between those is the "+ 3 4" in series, and so that's taken as the standard syntax for everything. (As a side note, Lisp has a "computer" function usually built-in called "apply".)
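As a rough illustration, the invisible outer caller can be written out as a function. The name computer is hypothetical (taken from the paragraph above) and simply delegates to the standard apply procedure.

```scheme
;; Hypothetical stand-in for the invisible outer level: it receives a
;; function plus its arguments and calls the function with them.
(define (computer fn . args)
  (apply fn args))

(define direct (+ 3 4))
(define via-computer (computer + 3 4))
(define via-apply (apply + '(3 4)))

(display direct) (newline)
(display via-computer) (newline)
(display via-apply) (newline)
```

All three forms hand the same "+ 3 4" series to something that performs the call, which is why they agree.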

I've only scratched the surface of the real power that just the simple syntax of Lisp brings. If you've ever written HTML (press ctrl+u if you haven't), you may have noticed a certain tediousness to it. In Lisp, you might just write something as simple as (prettybox (para "Yummy yummy food") (para "Is good")) instead of <prettybox><para>Yummy yummy food</para><para>Is good</para></prettybox>. And imagine how simple it becomes trying to go the other way! If your program takes as input such an HTML fragment, would you rather work with the Lisp version or the HTML version? Perl popularized the HTML-tags-as-language-functions idea, but only Lisp's syntax of functions-as-elements-of-lists gives you the power to easily instruct the computer to write code for you.
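A minimal sketch of turning the list form into tag text, assuming the fragment is kept as quoted data rather than as real function calls. The render function is a hypothetical helper, not part of any library, and it does no escaping of special characters.

```scheme
;; Render a nested list of the form (tag child ...) as tag text.
;; Strings are emitted as-is; anything else is treated as a tagged list.
(define (render node)
  (if (string? node)
      node
      (let ((tag (symbol->string (car node))))
        (string-append "<" tag ">"
                       (apply string-append (map render (cdr node)))
                       "</" tag ">"))))

(define fragment '(prettybox (para "Yummy yummy food") (para "Is good")))
(define html (render fragment))

(display html) (newline)
```

Because the fragment is an ordinary list, the same car and cdr operations used on code work on it too.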

The one consistent criticism against Lisp's syntax is its seemingly large number of parentheses. This picture explains it perfectly:


In practice, though, they're not bad, and you really don't use them all that much more compared to other languages. (Especially when you factor in things like curly braces in other languages.) Use a text editor that highlights matching parentheses, and even better use a text editor that has rainbow parentheses! (Adjacent parentheses aren't the same color, letting you globally paren-match based on color alone.) Some people don't like Python at first because it enforces whitespace and ditches the curly braces. It's a very small stylistic thing that you'll get over because it turns out to be really nice. Recent additions to the Lisp family like Clojure back away slightly from the "everything is a list" idea to "everything is either a list, a dictionary, or an array/vector." This adds a lot of clarity in exchange for only minor overhead in how you write code that writes other code.

If you really just want to program like symbol symbol symbol ..., not even needing parentheses, there's always Forth (PDF).

Posted on 2011-08-31 by Jach

Tags: lisp, math, programming, tips

