If you know much about software development, then you probably have some idea what refactoring is. It was popularized some years ago by Martin Fowler’s book Refactoring, which defines the term as “improving the design of existing code.”[1] The code referred to is, of course, software source code, the specialized text used to tell a computer how to perform some task. It’s generally accepted that refactoring is a good practice for fixing unruly code, as the schedule allows. Then why does my title for this blog entry tell you to stop refactoring your code? There’s more here than meets the eye.
First of all, it would be helpful to dig deeper into refactoring. The Refactoring book describes a number of changes referred to (aptly enough) as refactorings. Each refactoring is designed to change the source code without changing the behavior of the software. For example, one such refactoring involves adding a local variable in a function, giving it the value of an expression occurring in that function, and then using the variable in place of those occurrences of the expression. Under the right conditions, this refactoring can make the code easier to understand and modify.
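Here is a minimal sketch of that refactoring in Python. The function names and the discount rule are made up for illustration and do not come from the book. The first function repeats an expression, and the second gives that expression a name and uses the name instead. Both return the same result for the same input, which is the whole point of a refactoring.

```python
# Hypothetical example: the names and the discount rule are illustrative assumptions.

def order_total_before(quantity: int, unit_price: float) -> float:
    # The expression quantity * unit_price occurs three times.
    if quantity * unit_price > 1000:
        return quantity * unit_price * 0.95
    return quantity * unit_price


def order_total_after(quantity: int, unit_price: float) -> float:
    # The repeated expression now has a name that says what it means.
    base_price = quantity * unit_price
    if base_price > 1000:
        return base_price * 0.95
    return base_price
```

The behavior is unchanged, but a reader of the second version sees "base price" rather than having to recognize the same multiplication three times.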
Of course, no one is perfect, so sometimes a refactoring doesn’t quite work out right. It might make the code itself look simpler in one way but end up making the code harder to change in a different way. How can this new problem be undone? This is unexpectedly easy, since the refactorings in Fowler’s book can often be paired as opposites, with one such refactoring undoing the other. The example refactoring in the previous paragraph introduces a temporary variable in place of an expression. The opposite refactoring “inlines” that variable, which here means replacing each use of that variable with the expression it had replaced. With the temporary variable now unnecessary, the next logical step is to remove it. Likewise, introducing a variable in place of that expression is the opposite of inlining the variable.
How does source code get to be messy in the first place? If you develop software, then you already know the answer to this question. Maybe you make a bunch of little changes here and there, copy code and paste it in one or more places, keep adding more code, and eventually end up with a mess if you’re not careful. I ran into this problem relatively early in my career and went looking for a solution. I could make small improvements in code clarity here and there, but my approach was not organized and sometimes led to new bugs. It was not unusual for me to just rewrite a segment of source code from scratch. The Refactoring book appeared in bookstores at just the right time for me and showed me how to clean up the code in a safe and organized way. Just as important, it said that cleaning up the code without adding functionality was OK after all (in spite of management opinion), especially if the cleanup makes it easier to add functionality, so maybe I wasn’t crazy or overly picky for wanting to do that cleanup. Today, I still make numerous source code changes like I did earlier in my career, but now I also straighten up the code as I go along (or a bunch at once if necessary). Otherwise, my changes could lead to messier code.
And that mysterious blog title? Up to this point, I’ve been saying that refactoring is generally good, so I’m probably not telling software developers to stop refactoring and leave their source code in bad condition. What then do I mean by the title? There are only so many words in the title. The word refactoring in the title means what we think it does, as do the words stop and your. That just leaves the word code. Yes, I’m still referring to source code. Then what in the world am I talking about?
If you’ve ever been fortunate enough to learn even a little bit of a foreign language, the instructor or author may have advised you to learn the concepts behind foreign words and phrases instead of mentally translating everything into English (or whatever your native language is). The goal is to understand the foreign language directly, without that intermediate translation step. In discussing the length of a function, the Refactoring book mentions “the semantic distance between what the method does and how it does it.” I think of this as an aspect of any segment of code, not just a single whole function. So it’s important to read the code as business concepts instead of as programming language parts or abstract programming constructs that can be found in a programming textbook. This makes sense, since a programming language is basically a foreign language. After all, I seriously doubt anyone has a programming language as their primary language (unless referring to AI that can accept natural language as a programming language). Just as some segments of natural language are easier to understand than others, some segments of source code are easier to understand than others. In mapping source code to concepts we can understand, it can be said that clearer source code has a shorter semantic distance to the concepts it represents (what it does).
Or put another way, clearer source code is written more in terms of business concepts than programming constructs. How do you make source code clear enough to express important business concepts properly? Refactor it if necessary, but make the code primarily express business concepts, not programming constructs. But I just said in the title, don’t refactor your code. What I really mean is, don’t refactor to code, but instead refactor to concepts. And then when you want to change the design of the code without changing how it behaves (the definition of refactoring), do so in terms of concepts. But it must be concepts at the appropriate level.
Why all the fuss? In my career, I’ve noticed on occasion that I refactored some code and emphasized some structures that were technically correct (they did what I wanted and were being reused) but did not express relevant business concepts. They actually obscured the business logic. It took me a little while to figure it out, but I eventually realized that I had refactored in terms of programming constructs instead of business concepts. In one case, I looked at parsing lines of a file into essential parts instead of looking at deriving meaning from the data. I overemphasized how it did the task (processing the file) instead of what it was trying to accomplish. This drives home the point that code in a given function/method should be at the same level of abstraction. While it’s fine and even necessary to use programming constructs like parsing or stacks, they belong at an appropriate conceptual level in the source code. At any level, deal with concepts of that level, not programming constructs and not the programming language itself, unless that is what happens at that level of abstraction. (Ultimately, programming languages are abstractions on top of assembly language, which is on top of CPU instructions, which are implemented in terms of logic gates, which are circuits that treat voltage levels as ones and zeroes, etc. Every level has its abstractions.)
As an example, suppose I want to implement a feature that (for whatever reason) removes the last word from a sentence. Instead of implementing that feature in terms of dividing the text into substrings ending with periods (sentences), finding the right sentence, and then dividing the sentence into substrings separated by spaces, I should implement it in terms of first identifying the target sentence and then finding the last word of that sentence. It should be in terms of sentences and words, not lists and strings. Granted, a paragraph may be implemented as a list of sentences and a sentence as a list of words, and ultimately those lists have delimiters between items in the original text. Even so, neither list nor string is a business concept at the level of my feature. They are lower-level concepts. So if I want to refactor, I should refactor in terms of the concepts of whatever level I’m looking at.
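To make the sentence example concrete, here is a rough sketch in Python. Everything in it is my own assumption for illustration: the class names Sentence and Paragraph, the function remove_last_word, zero-based sentence numbering, and the simplification that every sentence ends with a period. The splitting on periods and spaces lives inside the two classes, while the feature itself speaks only of paragraphs, sentences, and words.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sentence:
    words: Tuple[str, ...]

    def without_last_word(self) -> "Sentence":
        return Sentence(self.words[:-1])

    def __str__(self) -> str:
        # Low-level detail: words are separated by spaces, sentences end with a period.
        return " ".join(self.words) + "."


@dataclass(frozen=True)
class Paragraph:
    sentences: Tuple[Sentence, ...]

    @classmethod
    def parse(cls, text: str) -> "Paragraph":
        # Low-level detail: this is the only place that splits on periods and spaces.
        pieces = [piece.strip() for piece in text.split(".")]
        return cls(tuple(Sentence(tuple(p.split())) for p in pieces if p))

    def sentence(self, index: int) -> Sentence:
        return self.sentences[index]

    def with_sentence(self, index: int, sentence: Sentence) -> "Paragraph":
        updated = list(self.sentences)
        updated[index] = sentence
        return Paragraph(tuple(updated))

    def __str__(self) -> str:
        return " ".join(str(s) for s in self.sentences)


def remove_last_word(paragraph: Paragraph, sentence_number: int) -> Paragraph:
    # Feature-level code: identify the target sentence, then drop its last word.
    target = paragraph.sentence(sentence_number)
    return paragraph.with_sentence(sentence_number, target.without_last_word())


paragraph = Paragraph.parse("The cat sat down. The dog ran away quickly.")
print(remove_last_word(paragraph, 1))  # The cat sat down. The dog ran away.
```

Notice that remove_last_word never mentions a string, a list, or a delimiter. If sentences later need to end with question marks too, that change belongs inside Sentence and Paragraph, and the feature code stays as it is.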
And how do I get the code implemented in terms of those concepts? Yes, I may have to refactor, but at some point I should try to get the code at a given level implemented in terms of abstractions that appear at that level. Refactor in terms of concepts, not code. Hence don’t refactor your code, but refactor the concepts. And if you have details that are too low for a given level? They should be abstracted as concepts. Your code at that level should not know about those lower implementation details. In a sense, your code at that level should be “dumb” in terms of not knowing such details. So at a given level, write/refactor in terms of concepts but also write dumb code.
Much of the discussion here seems to take place within a single function or method, but it applies at a larger scale as well. Fields and the executable members of a class (methods and properties) should be at a level of abstraction appropriate for that class. Similarly, refactoring in that class should involve the business concepts in that class, at least as far as appearances from the outside are concerned. For example, an address class could have properties corresponding to street address, city, state, and zip code. Logic in that address class could be in terms of those items. Refactoring should be in terms of those items if appropriate. Maybe a method for generating text for the address could create a list of the items, adding appropriate delimiters (commas, spaces, or line endings). External code that uses that method would not need to know or care about concatenating the properties, though it might specify delimiters (or provide an object that does).
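A small Python sketch of such an address class might look like the following. The class name, the field names, the as_text method, and its line_separator parameter are all assumptions of mine, and the layout assumes a US-style address. The point is only that callers ask for the address as text and never concatenate the parts themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str

    def as_text(self, line_separator: str = "\n") -> str:
        # The class decides how its own parts are joined.
        # The caller may only choose the delimiter between lines.
        city_line = f"{self.city}, {self.state} {self.zip_code}"
        return line_separator.join([self.street, city_line])


home = Address("123 Main St", "Springfield", "IL", "62701")
print(home.as_text())      # two lines, for a mailing label
print(home.as_text(", "))  # 123 Main St, Springfield, IL 62701
```

External code stays "dumb" about commas and spaces inside the city line. It deals with an address, not with string concatenation.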
I should add a disclaimer that my opinions (shared by others or not) may not work in all situations. Some people say that software development has reached the level of an engineering discipline, and others disagree. Some people say that software development is still more art than science. I think it’s a mix of all these, and that the same might be true of other disciplines. Some things can be done mechanically without thinking or even be automated, while other things work out better with real human judgement. I think refactoring is similarly mixed, with parts that can be done mechanically and parts that need human judgement. Also, my opinions may turn out to be just plain wrong, for whatever reason. Look at the concepts, and do what makes sense at whatever level.
References
[1] Fowler, Martin. Refactoring: Improving the Design of Existing Code. Reading, Massachusetts: Addison-Wesley, 1999. See also Refactoring.com. Second edition at: https://www.pearson.com/en-us/subject-catalog/p/refactoring-improving-the-design-of-existing-code/P200000000254/9780134757599.
(c) Copyright 2023 by Mike Ferrell