TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Why you should fear your automatic refactoring tool

Okay, the title is clickbait. I actually encourage people not to let themselves succumb to fear-driven development. Fear is the mind-killer; stop fearing!

No, really. All too often I come across people (or comments) claiming they're too afraid to change software until they have "some tool" or process. The tools and processes vary, but the fear doesn't. I argue that even with no tool but your brain, you can code fearlessly. This isn't to say that tools don't help, especially when it comes to boosting confidence in correct results. I am saying that the physical sensation of fear that plagues certain programmers has entirely to do with their psyche.

If you're possessed by fear unless you have some particular tool or process, ask yourself: what if I took that tool or process away, and said you can't have it back? What are you going to do? Will you be able to move forward? The art of fearless coding is to change your perspective from an emotional one to a logical one. Instead of being fearful about the effects of some possible change, ask yourself how confident you are that this specific change will produce more good effects than bad ones. How can you improve your confidence? Maybe you think back to the tools I took away. If you don't have those particular tools, how can you improve your confidence?

Suppose I give you the task of renaming a method in Java. Your tools are Java's static type system and whatever IDE you like. You use the IDE's auto-refactor command. How confident are you that the IDE has done the right thing?

If you say 100%, you're wrong, for two reasons. The first is trivial: you can never hold 100% confidence while also complying with a formalized way of updating your confidence based on new evidence (Bayesian updating), because a belief held at 100% can never be moved by any evidence. The second is that in Java, your static type guarantees aren't enough.
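To spell out the first reason with Bayes' rule (a standard derivation, not taken from the original post): let H be "the refactor was correct" and E any new evidence, such as a test that starts failing. With a prior of P(H) = 1 we have P(¬H) = 0, and as long as P(E | H) > 0 the posterior is stuck at 1 no matter what E says:

```latex
P(H \mid E)
  = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
  = \frac{P(E \mid H)\cdot 1}{P(E \mid H)\cdot 1 + P(E \mid \neg H)\cdot 0}
  = 1
```

If instead P(E | H) = 0, the expression becomes 0/0 and is undefined, which is no better. Either way, a failing test that ought to lower your confidence in the refactor cannot do so.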

Java's static types fall short because Java supports introspection (reflection, in Java's terms). I can ask the JVM: do you know about this class that I'm specifying by a string? If so, do you know about this method? Oh, it's private; well, can you ignore that and let me call it anyway?

The answer to all of those can be yes. The point is that your IDE's refactor tool is going to miss these string-based accesses. Some tools will miss fewer than others, or can at least flag suspected introspective accesses for you to review (some of which may turn out to be false positives), but it's not possible to catch everything automatically: the string naming the method can be computed at runtime.
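Here is a minimal Java sketch of the questions above. The WorkQueue class and the callByName helper are hypothetical, made up for illustration. The method's name exists only as a string, so an IDE rename of removeFromQueue has nothing to update. Class.forName does the same thing for class names.

```java
import java.lang.reflect.Method;

public class ReflectionRenameDemo {
    // Hypothetical class for illustration.
    static class WorkQueue {
        // An IDE "rename method" refactor will update direct calls to this method...
        private String removeFromQueue(String element) {
            return "removed " + element;
        }
    }

    // ...but it has no way to know that this string argument names that method.
    static Object callByName(Object target, String methodName, String arg) {
        try {
            Method m = target.getClass().getDeclaredMethod(methodName, String.class);
            m.setAccessible(true); // "it's private, but call it anyway"
            return m.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            // After a rename, the stale name only surfaces here, at runtime.
            throw new IllegalStateException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName(new WorkQueue(), "removeFromQueue", "x")); // prints "removed x"
    }
}
```

If the method is renamed to rmFromQueue but the string is not, the call fails with a NoSuchMethodException (wrapped here in an IllegalStateException), just like the old test that called a private method via introspection.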

This isn't a theoretical concern; it happens in real code. Fortunately for me, the last time I was bitten by it was when I inlined a private method and later got a report from our test automation that some old test had started failing. It was calling the private method through introspection...

There are other places it can miss, too. If you're renaming an API method, you might have clients (in the browser, or other Java programs) that you can't see but that expect the method to exist under its current name. The consequence for them could be a compile-time error or a runtime error, depending on how they invoke the method.

So, don't trust your auto-refactor tool with complete certainty. If you were afraid of making a change such as renaming a method without a tool, and the tool took that fear away, maybe you should bring that fear back and then reexamine how to stop fearing regardless.

None of this means stop using the auto refactoring tool!

But it's curious to note that Java has static types, and yet it can still fall prey to this issue. What does this mean for dynamically typed languages?

Nothing! The type system is a red herring. Certain kinds of program analysis can be easier with a static type system, sure, but there are other kinds of analysis besides static analysis: dynamic analysis, for instance. That kind of analysis is easier in a language that supports runtime introspection, like Java or many dynamically typed languages.

I suspect, but haven't investigated, that introspection is the key mechanism that lets the Language Server Protocol (LSP) projects for dynamic languages function correctly and support all the usual IDE features (jump to definition, hover for docs, auto-complete, cross-referencing callers, and certain kinds of auto-refactoring). A notable exception (maybe worth addressing) from the LSP project's long list of supported languages seems to be Common Lisp. But this is a minor issue, because Lispers have enjoyed all these things for a long time using a similar client-server model. I like to Lisp in vim, and with one plugin I have tab-completion, documentation on demand, jump-to-definition, etc.

I'm missing auto-refactoring tools (maybe Emacs has them, or the commercial Lisps), but I don't find myself needing them as much. Better designs made possible by more dynamism reduce the need for the massive refactors where such tools really save a lot of effort: dynamic languages encourage late decisions and late binding, so there's less to change in the average refactor. For that average refactor, an auto tool might save half a minute at most compared to manual vim/sed incantations, and sometimes vim will be faster. Maybe my confidence in certain things is theoretically lower sometimes, because with vim I'm often relying on my brain and text manipulation rather than on a tool doing static analysis, but I have ways to increase my confidence, so I'm never fearful of moving forward.

For example, let's go back to renaming a method. A random 46-thousand-line CL project I found (GBBopen) has a doubly-linked queue structure with some exported generic functions. What if I wanted to rename, say, 'remove-from-queue to just 'rm-from-queue? Or maybe it was vice versa. We'll pretend there are no downstream libraries we have to worry about.

Take away all my tools but a bash shell, and I am unfazed: find ... | xargs grep... These days I use ag for that. Either way, it lets me find every occurrence of the string "remove-from-queue"; then I can go through them one by one, check whether each is the real one, and update it. This doesn't give me 100% confidence, because someone might be creating the symbol from, e.g., user input and evaling it, but not even a statically typed language would help with that.


source/gbbopen $ ag -i remove-from-queue
queue.lisp
61: remove-from-queue
76:(defgeneric remove-from-queue (element))
163: (remove-from-queue element)
188: (remove-from-queue element)))
212: (remove-from-queue element)))
236:(defmethod remove-from-queue ((element queue-element))

control-shells/agenda-shell.lisp
1272: (remove-from-queue ksa)
1295: (remove-from-queue ksa)
1536: (remove-from-queue ksa)


So it looks like this function is only used in one other file, maybe. That file could be referring to another function of the same name in a different package, but probably not. I would be comfortable doing the search-replace without checking. Using ag like this and combining it with sed, I'm actually going to be faster than any IDE user.
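The ag-plus-sed approach can be sketched in Java, with one refinement. A plain sed s/old/new/g would also rewrite longer symbols such as a hypothetical remove-from-queue-p, so this sketch replaces only whole Lisp symbols. The delimiter set is a simplifying assumption, not the full Common Lisp reader syntax. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LispSymbolRename {
    // Simplified assumption: whitespace and these characters end a Lisp symbol
    // token. ':' also counts on the left, so pkg:name, pkg::name and #:name match.
    static Pattern symbolPattern(String name) {
        return Pattern.compile(
                "(?<![^\\s()'`\",;:])" + Pattern.quote(name) + "(?![^\\s()'`\",;])",
                Pattern.CASE_INSENSITIVE); // the reader upcases by default, like ag -i
    }

    // Replace whole symbols only: remove-from-queue-p would be left alone.
    static String renameSymbol(String source, String oldName, String newName) {
        return symbolPattern(oldName).matcher(source)
                .replaceAll(Matcher.quoteReplacement(newName));
    }

    // Dry run, like ag: print file:line for every line mentioning the symbol.
    // Usage: java LispSymbolRename <source-dir> <symbol>  (assumes UTF-8 sources)
    public static void main(String[] args) throws IOException {
        Pattern p = symbolPattern(args[1]);
        List<Path> files;
        try (Stream<Path> s = Files.walk(Paths.get(args[0]))) {
            files = s.filter(Files::isRegularFile)
                     .filter(f -> f.toString().endsWith(".lisp"))
                     .collect(Collectors.toList());
        }
        for (Path f : files) {
            List<String> lines = Files.readAllLines(f);
            for (int i = 0; i < lines.size(); i++) {
                if (p.matcher(lines.get(i)).find()) {
                    System.out.println(f + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
    }
}
```

As with ag, a text search cannot tell whether some-other-package:remove-from-queue names the same function. That judgment call is still yours, and a recompile or an xref query can confirm it.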

Give me the choice of Lisp implementation as a tool, and I'll use SBCL. Its compiler is quite good and warns about things like calls to undefined functions. So after I do my rename, I can just recompile the project and see whether I get any compiler warnings about calls to the old, now undefined, function. There's no need for tests with 100% line coverage that might blow up on a runtime error to catch this sort of stuff; my compiler isn't brain-dead.

Give me vim or Emacs plus SLIME, and I can use SLIME's cross-referencing capabilities. I put my cursor over the symbol and ask "who-calls?" In vim, I get this output:


(DEFMETHOD DELETE-INSTANCE (QUEUE-ELEMENT)) - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/queue.lisp line 163
(DEFMETHOD INSERT-ON-QUEUE (QUEUE-ELEMENT QUEUE)) - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/queue.lisp line 188
(DEFMETHOD INSERT-ON-QUEUE (QUEUE-ELEMENT ORDERED-QUEUE)) - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/queue.lisp line 212
(DEFMETHOD AGENDA-SHELL:OBVIATE-KSA (AGENDA-SHELL:KS AGENDA-SHELL:KSA T)) - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/control-shells/agenda-shell.lisp line 1272
AGENDA-SHELL::CONTROL-SHELL-LOOP - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/control-shells/agenda-shell.lisp line 1536
AGENDA-SHELL::MOVE-KSA-ON-QUEUE - in ~/quicklisp/dists/quicklisp/software/gbbopen-20161204-svn/source/gbbopen/control-shells/agenda-shell.lisp line 1295


In this case it matches the ag output for callers. So again, this change would be small, and even with the auto-refactor tools in Eclipse for Java I'd likely just make it manually (especially since Eclipse's tool isn't the speediest to use). In vim I just select one of these output lines and, with one command, open the file at that line. Make the change, done. I'm very confident these are all the callers, though not 100%, because I've loaded the whole system into memory and am asking the implementation where it has compiled calls to the function. Since it hasn't reported any other locations, then barring things like eval or introspection, there are no other call sites (in this application).

Could an auto-refactor tool be built on this? Why not? The first refactoring tool was built for Smalltalk, another dynamic language. When you move into the realm of dynamic analysis, with a language runtime available to answer questions about itself, you can do a lot of fancy things. In 1980 the Tinker system was being developed on top of Lisp to support TDD before anyone knew that acronym. Elsewhere in the past, a Smalltalk system had a variant of "apropos" where, instead of trying to remember a function name, you give it an input and an output. This apropos variant could be made for Lisp, why not? If I'm looking for 'string-upcase, I tell the variant that "blah" should map to "BLAH". It then looks through all the functions it knows (list-all-packages and do-symbols are both standard CL, the latter iterating over a package's symbols) for those that take one required argument of type string. Checking the argument type needs implementation-dependent introspection; in SBCL, for example: (subtypep 'string (first (first (rest (sb-introspect:function-type #'string-upcase))))). Finally, it calls each candidate on the input and checks whether the result matches the desired output.
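The idea above is for Lisp. As a rough Java analogue (a sketch, not a port of the Smalltalk tool or of any existing library), reflection can run the same kind of "ask the runtime" search over a single class. Here the class scanned is java.lang.String, standing in for "all the functions it knows". The sketch keeps public instance methods that take no arguments and return a String, calls each one on the input, and keeps those whose result equals the expected output. In Java the string argument becomes the receiver, which is why the parameter count is zero.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class AproposByExample {
    // Find String methods m such that input.m() equals expected.
    static List<String> find(String input, String expected) {
        List<String> hits = new ArrayList<>();
        for (Method m : String.class.getMethods()) {
            boolean candidate = !Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getReturnType() == String.class;
            if (!candidate) continue;
            try {
                // Actually call the candidate: this is dynamic, not static, analysis.
                if (expected.equals(m.invoke(input))) hits.add(m.getName());
            } catch (ReflectiveOperationException e) {
                // A candidate that cannot be called is simply not a match.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("blah", "BLAH"));
    }
}
```

On a current JDK this should report toUpperCase. Note that the search blindly invokes every candidate, which is exactly the side-effect worry addressed next.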

If you're worried about side effects (a reasonable worry), you can require this apropos-by-example variant to use a whitelist: tag functions as you define them, or better, keep a registry that can be extended at any time and ships with a built-in set of known-pure functions bundled with CL, such as string-upcase. Now it no longer scans the whole system, but it's likely good enough to be useful to a lot of people.
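A whitelist registry along these lines might look like the following Java sketch. The PureRegistry class and its method names are hypothetical. The registry cannot verify purity; it trusts whoever registers a function to have checked. Its benefit is that the search only ever calls registered functions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PureRegistry {
    // Only functions registered here are ever called by apropos().
    private final Map<String, UnaryOperator<String>> pure = new LinkedHashMap<>();

    PureRegistry() {
        // Built-in set, analogous to known-pure functions bundled with the language.
        register("toUpperCase", String::toUpperCase);
        register("toLowerCase", String::toLowerCase);
        register("trim", String::trim);
    }

    // Can be extended at any time, e.g. when a new pure function is defined.
    void register(String name, UnaryOperator<String> f) {
        pure.put(name, f);
    }

    // Names of registered functions that map input to expected.
    List<String> apropos(String input, String expected) {
        List<String> hits = new ArrayList<>();
        pure.forEach((name, f) -> {
            if (expected.equals(f.apply(input))) hits.add(name);
        });
        return hits;
    }

    public static void main(String[] args) {
        PureRegistry r = new PureRegistry();
        r.register("reverse", s -> new StringBuilder(s).reverse().toString());
        System.out.println(r.apropos("blah", "BLAH")); // prints [toUpperCase]
        System.out.println(r.apropos("blah", "halb")); // prints [reverse]
    }
}
```

The trade-off is the one described above. Nothing outside the registry is ever found, but nothing outside it is ever run.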

In this rambling post my only real intention was to highlight the gaps in the correctness of any auto-refactor tool, and the existence of such refactor tools (or at least the building blocks for them) in dynamically typed languages, especially those with introspection abilities. A lot of static purists will say nonsense like "how can you know a function takes a string?" without static types. The answer I give is: "you just... ask the function if it takes a string." Indeed, here's the full type signature for string-upcase in my implementation (SBCL):


(FUNCTION
 ((OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING SYMBOL CHARACTER) &KEY
  (:START (MOD 4611686018427387901))
  (:END (OR NULL (MOD 4611686018427387901))))
 (VALUES SIMPLE-STRING &OPTIONAL))


The input is a string designator (any kind of string, a symbol, or a character), followed by optional :start and :end keyword arguments that bound what gets upcased, and the output is a simple-string. Dynamically typed is not remotely the same as untyped. The types are still there. Sometimes a function accepts the general root type T; sometimes (if the language allows it) it's specified more narrowly.

On a more practical level, if you're still suffering from fear-driven development and can't imagine successfully coding without static types, auto-refactor tools, a compile step, a big IDE, unit tests, code review, number types beyond 64-bit floats, TDD, version control... Well, I can only suggest you try to make something without these things. One book I highly recommend is Working Effectively with Legacy Code by Michael Feathers, which will at least teach you how to refactor confidently in the absence of tests in order to introduce tests.


Posted on 2019-05-12 by Jach

Tags: lisp, programming

Permalink: https://www.thejach.com/view/id/362

Trackback URL: https://www.thejach.com/view/2019/5/why_you_should_fear_your_automatic_refactoring_tool
