Using Commit Messages for Documentation

A consistent problem in software development is reading code. Reading code, like reading a book, consists of a few separate levels of understanding:

  • What does the code say?
  • What does the code mean?
  • Why was that way of expressing that meaning chosen?
  • How do I interact with the code?[1]

These are roughly in decreasing order of specificity: “what does the code say” is the most specific and least informative question you can ask, while “how do I interact with the code” is one of the most high-level questions you can ask, and answering it may not require a detailed understanding of the answers to the first three.

To understand what code says, we can simply read the code. That is the most fundamental level of understanding. Often, meaning is just as easy to glean. Just as a book describing a character as “tall” usually means nothing more than that, most code is fairly easy to grasp in terms of meaning at first. Notably, the more you grow as a developer, and the more code and advice you read and write, the more code will fit into the category of easily understood meaning. Patterns that are difficult to grasp the first or second time may become second nature, to the point where a simple glance at a piece of code tells you exactly what pattern you're looking at and what it's meant to accomplish.

Good code should expose what it says and what it means as clearly as possible. That's why we adopt relatively short functions, descriptive function names, and so on. But sometimes, even then, it's difficult to elucidate the meaning of code. That's when we turn to comments. The key, as is often preached, is that a comment should never describe *what a block of code says*, but rather *how* or *why* it does what it does: its meaning.
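
To make the distinction concrete, here is a small, hypothetical Java example. The `Backoff` class, its 30-second cap, and the rationale in its comment are all invented for illustration. The point is only the contrast between a comment that restates what the code says and one that records why it does it.

```java
/** Hypothetical example contrasting two kinds of comments on the same line of code. */
final class Backoff {
    private static final long MAX_DELAY_MILLIS = 30_000L;

    private Backoff() {}

    static long nextDelayMillis(long currentDelayMillis) {
        // A "what" comment just restates the code and adds nothing:
        //   double the delay, but never exceed MAX_DELAY_MILLIS.
        //
        // A "why" comment records what the code can't say by itself:
        //   retries back off exponentially so an overloaded server gets more
        //   breathing room each time; the cap keeps a long outage from making
        //   users wait indefinitely between attempts.
        return Math.min(currentDelayMillis * 2, MAX_DELAY_MILLIS);
    }
}
```

If someone later changes the cap, the first comment silently becomes wrong. The second comment stays useful because it explains intent rather than mirroring the arithmetic.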

The problem with comments, of course, is keeping them up to date. You don't want someone to change the code but leave the comment alone. This is the perennial documentation problem, and it's a hard one to solve. Part of the solution involves avoiding comments when possible, relying instead on clear code and well-named functions and variables to convey as much of the meaning as possible. When comments are very high-level, you typically only need to update them if you rip out an entire chunk of code, and in that case you are hopefully more likely to remember that the comment is tied to that block.

There is, however, another solution, one that complements comments rather than replacing them: commit messages. Assuming you're properly using atomic commits, your commit messages can contain the answer to *why* a particular implementation or algorithm was chosen. A comment can grow outdated, but a commit message won't. If a given bit of code is replaced, a new commit will be created, and its message will describe why the second change was made.

Using commit messages alone for the whys and wherefores is probably not the best solution in every situation. There's a question of immediacy: if the information you want to provide is something a reader is likely to need every time they go over a chunk of code, a comment may be the better place for it. But if you're explaining the reasoning behind code that's hidden behind a good, descriptive function name, a commit message may be the better place. Someone reading that function is probably about to change it, so they'll have the time to look for your message as part of understanding the meaning and purpose of the code.

Commit messages can probably form a large part of a project's documentation if used properly, especially if tools emerge that surface them more easily. One can imagine a kind of reverse literate programming, where the actual documentation lives in commit messages and a tool stitches the messages and code back together into a coherent whole. I've played with the idea of structuring tutorials this way, with the tutorial text in commit messages attached to the diffs of their commits, so that each step in the tutorial is explained and described by its commit message. The project isn't quite usable, but it's an interesting experiment in using git's own features for blogging, rather than just using git as a content repository for a blog.
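
The stitching tool imagined above could start very small. Here is a minimal sketch in Java. It assumes each tutorial step is one commit whose message and diff have already been extracted in chronological order, for instance from `git log --reverse -p`. The `CommitStep` and `TutorialStitcher` names are hypothetical and are not part of git or of the experiment mentioned above.

```java
import java.util.List;

/** Hypothetical model of one tutorial step: a single commit's message and diff. */
final class CommitStep {
    final String message;
    final String diff;

    CommitStep(String message, String diff) {
        this.message = message;
        this.diff = diff;
    }
}

/** Hypothetical stitcher: turns an ordered list of commits into one readable document. */
final class TutorialStitcher {
    private TutorialStitcher() {}

    /**
     * Interleaves each commit message (the prose) with the diff it explains (the code),
     * preserving commit order, so the history reads as a step-by-step tutorial.
     */
    static String stitch(List<CommitStep> steps) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < steps.size(); i++) {
            CommitStep step = steps.get(i);
            out.append("Step ").append(i + 1).append('\n');
            out.append(step.message.trim()).append("\n\n");
            // Indent every diff line so the code stands apart from the prose.
            for (String line : step.diff.trim().split("\n")) {
                out.append("    ").append(line).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A real tool would also need to parse git's output and decide how to handle merges and fixup commits. This only shows the core idea: the commit history already holds both halves of a literate document, and it just needs to be read in order.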

Regardless of how heavily you lean on commit messages as the sole exposition of meaning and motivation, I think it's worth writing commit messages that include some of this information. It's somewhat redundant, yes, but commit messages are the quickest way to track the evolution of a file or block of code (or a whole project). That makes them an ideal place to record those kinds of thoughts in a form that can be reviewed in bulk later, when you're trying to gain a broader understanding.

I'd love to hear thoughts on this strategy, and perhaps alternate suggestions or other interesting ideas on how to deal with documentation getting outdated, and on how to leverage source commits as more than just a giant undo button (which just seems like a waste of a massively powerful tool).


[1] There are more levels of understanding, of course, especially “what does running this code change?” Functional programming advocates want to obviate that question altogether by making the answer always be “nothing” ;) Arguably, understanding what code says and what it means is sufficient for understanding what it changes, as well.

Avoid Tables; Take the Stairs

I've recently seen an uptick in the number of people who consider the argument “tables are bad” to be mere dogma. Maybe there is some element of that; I don't disagree. It gets particularly ridiculous when people follow up with “well, don't use display: table/table-row/table-cell, either”. At that point we're no longer making a semantic argument, which makes the position a lot tougher to justify. There are still arguments about content flow and any number of other things, but the position weakens.

I'd like to propose that avoiding table layouts is really the development equivalent of taking the stairs. When people talk about small shifts that can counter a sedentary lifestyle, they talk about things you usually don't think about: take the stairs up to and down from the office; park further out at the store; drink extra water so you have to go to the restroom periodically. All of these things force your hand a little, they make you stretch your legs, and they insert small breaks in your day that you can use to let your mind wander a bit.

This latter benefit is secondary, not the primary purpose. But it's still powerful. If you don't whip out your phone during your short breaks or on your way up the stairs, it's an opportunity for your mind to get a bit creative. Maybe you explore a different set of stairs each day. Maybe you check out the restrooms on other floors, out of curiosity (depending on the office building, every floor may be the same, of course…). It's not that the elevator is off-limits, or that using it causes a huge problem every time. But by taking the elevator every time, you are incidentally also removing opportunities to think and explore.

Every time I run into a problem and the thought occurs, “man, tables… They would make things so much easier here…”, it's a challenge. If you've been developing for a while, challenges become rarer; that is the nature of the beast. Most challenges end up translating to experimentation with new features. Here, though, we have an opportunity to explore existing features in a deeper way. You discover properties of certain layouts (inline-block, floats, etc.) that you didn't know they had. You gain a deeper understanding of the ones you already knew they had. It's a treasure trove!

When I TAed an intro to object-oriented programming class, I jokingly argued that a lot of the features in OO that make things easier for a developer stem from the fundamental truth that programmers are lazy. There is some definite truth in that claim. “Laziness” often serves us well as developers, because the right kind of laziness leads to better architectures, and it also steers us away from premature optimization and a number of other problematic practices. Laziness done wrong, however, also means never growing. If you're too lazy to go outside, you never explore the outside world.

As a developer, you want to be lazy in terms of expanding your code base, but not lazy in terms of exploring new solutions. And building in a wall or two here or there that forces you to explore new solutions periodically is a great way to make sure you don't accidentally start standing still. Dogma or not, avoiding tables for layout gives us a chance to explore some fascinating alternate properties in CSS.
For example, I don't know whether the fact that overflow: hidden clears internal floats would have been discovered without the push to leave tables. Even if it had, who knows whether it would be as widely known. And this technique isn't simply a way to clear floats; it gives you an incrementally better understanding of how overflow properties work than you had before. It gives you new knowledge that is tangential to the actual problem you're trying to solve.

Avoiding table layouts may be dogma, but I'm going to keep doing it. Every time I get that rush of having solved a particularly gnarly float-inline-block interaction to produce the layout I was looking for, I know I've learned just a little bit more of how these things work together. And I'll be looking for more such roadblocks—ideally these should be relatively small, and should only present themselves as formidable obstacles rarely—in other places, to experiment and see where they are tolerable, where they are terrible, and where they give me the most opportunity to learn. I will, in short, be trying to take the stairs a bit more often.

Guest Post: Learning to program is hard, but languages are not to blame.

Recently I had a brief discussion on Twitter about the ease of use of programming languages. The points I wanted to make warranted more space than 140 characters would allow, so with thanks to Antonio for the blog space, here they are.

For decades, the holy grail of programming languages has been to create a language that is as simple to adopt as possible, while providing as vast an array of functionality as imaginable. Every new language boasts of its ease of use: "See? You can do a network call and pass the result to your view in one line!" Yet beginners still feel that programming is a black art, and many people, some experts included, feel that programming languages are to blame. I disagree.

In a perfect world, programming would be as simple as talking to the computer, which would interpret the commands and do what the author means. Unfortunately, this requires superhuman intelligence on the computer's part. Not only does it need to interpret your language of choice, resolve any ambiguity arising from your accent, and transcribe what you said; it also needs to resolve any ambiguity arising from your choice of words, catch contradictions in your instructions, ask you to resolve them, and so on. It's so complicated that human beings can't do it right: specifying requirements is one of the most complicated and error-prone processes imaginable. So the day is still far off when programming can be reduced to writing a requirements spec. Until then, we are left with actual programming languages, which can be quite daunting.

Let's consider designing a language with user experience in mind, by which we mean that beginners would feel right at home in it (this is not the only user experience, but let's pretend it is). What would such a beast look like? Already we have languages like Ruby and Python that can look pretty much like English. If those are too difficult to understand, then clearly bringing the language closer to English doesn't help much.

We also have languages that try to help by working very hard to ensure that a certain type of consistency exists in the code; Haskell, for instance, ensures that you are working with whatever data you claim to want to work with, and that you do not change it in an uncontrolled manner. Yet Haskell is considered a difficult language to master, because in order to provide this functionality, it introduces complex mathematical abstractions that the user must thoroughly understand. Clearly, helping the user avoid errors (even if only of a restricted kind) is not a panacea either.

What's left? Really, only one thing: providing prepackaged solutions to known problems, a.k.a. libraries. And we have them, for every language. It has probably been years since a working programmer has had to write raw network socket calls: some generous and knowledgeable souls wrote the code, tested it thoroughly, and then donated it to the community. And libraries like jQuery take things that were once thought horrendously complex and make them trivial. This helps a lot, but it adds a new form of complexity: the libraries one must be familiar with. Again, a barrier to entry.

Let me posit, to explain this game of complexity whack-a-mole, that learning programming is hard not because the languages are difficult, but because programming is difficult. Why? Because it solves difficult problems.

To be sure, some languages are easier to learn than others, up to a point, but that is a very superficial ease. If your goal is to write a program that says "Hello World" on the console, half a dozen languages let you do that in one line of code—can't get much simpler than that. But if you want your browser to take a string, interpret it as a layout, add reactions to it based on user interaction, fetch data, and make the whole shebang operate as if the user were in a desktop application while making asynchronous calls to the server...well, of course it's going to be hard. No language in the world will help you because this is not a simple problem.

If you're starting to learn programming and find it hard, good! You're not deluded: it is hard. Of course, I encourage you to give yourself the benefit of the doubt when your gut instinct says "this should be easier". And if you have a way to make it easier, write it! Publish it! We will love you for it. But the truth is, if it's harder than it seems it should be, it's probably because it's not easy to simplify. Because even though it looks like a simple concept, endless edge cases will bring it in conflict with other parts of the system in unpredictable ways. Because we have not yet reached that level of abstraction.

This is really the crux of the matter: abstraction. Once upon a time, programming was literally guiding bits through a processor, with punch cards. Now we get to tell browsers (browsers!!) to lay out boxes of content, to create gradients, to display specific fonts, or even to use tables (but don't do that; we keep it civilized). Whatever problem you think could be solved by a better language probably can't be, but it probably can be solved by a better abstraction, which one day might make it into a language. The languages we have today reflect the level of abstraction we have reached. They are our report card.

So, yes, we should try to improve our languages. But let me dispel an illusion right now: that will not make programming easier or more accessible. Because when, after blood and sweat and tears, doing X is finally easy, we will move on to wanting Y, which should be so easy, beginners could do it—if only these darned languages didn't stand in the way.

--

You can follow Alexandros on Twitter as @nomothetis.

The pendulum (or why you shouldn't be despairing over SOPA)

It seems that, at least in the geek community, pessimism is the norm. Perhaps this is rooted in a certain difficulty understanding others outside the community. Perhaps it is simply arrogance. Regardless, it's there. And it is, particularly when it comes to politics, likely unfounded.

My intent isn't really to exhort all political pessimists to stop. Rather, I ask that perhaps you consider your fatalistic interpretations in a greater context: that of a swinging pendulum. One fixed to a string, the string slung across an infinite bar. The pendulum swings back, then forward, carrying just enough momentum to drag the string forward an inch. Then back again, and forward, the string pulled another inch. This is progress. 

Progress is not the image we frequently conjure up of step after bold step, strong, steady, inexorably moving forward without a pause or waver of uncertainty. The past, in its simplest form, often looks this way. We lived in caves; now we live in cities. We cringed from lions; now we show them off in zoos, captivity our ultimate victory. But in the middle are the merciless swings of the pendulum. Forward: the Reformation. Backward: the rigid religiosity of all the Christian denominations that resulted, a defensive reflex. Forward: the New World. Backward: brutalizing the natives, and wars over every inch of this world. Forward: mastering the machine, industrialization. Backward: two massive World Wars.

Not everything moves forward at once, and backwards movement is likewise not synchronized. Some swings take decades, some centuries. But the swing is ever there. The push back against progress is almost as strong as the progress itself—almost. And those pushing for change are always those who feel it. The ones who see and feel the setbacks. But they are also the ones who cannot give up. The ones who must not. Progress is only the stronger force because they make it so.

At times, by its very nature, progress is foisted upon the masses by the few. Then, it is foisted upon the remaining few by the masses. Computing had a bit of a battle to get integrated, considering the jobs it eliminated and the learning curve it entailed. Now it's here, foisted upon the masses by the few engineers and marketers who showed up first. But the law has to catch up. SOPA, PIPA, all of these laws, are the flip side: the masses have to foist this progress on the few who are still resisting it. And foist we shall.

To those who fight these fights: do not despair that you must fight them. This is the way of things. We take a leap, then everyone has to be dragged to catch up. Then we take another leap. And more importantly: do not stop fighting. The fight is why progress wins, every time.

In the 1960s and '70s, South America was embroiled in a series of democratic shifts to the left, countered by a series of military coups that lurched it back to the right. The pendulum was swinging at its very wildest. The advocates of progress, those who fought for the rights of the disenfranchised and the wronged, were, as is often the case, the artists and the singers. One song that emerged from those tumultuous years continues to be a mantra for movements both in the region and around the world, and to this day remains ingrained in the memories of my parents and, through them, my own: “El pueblo unido, jamás será vencido”: the People, united, shall never be defeated.

SOPA and PIPA are fighting against us all. We may win next year, five years from now, or two decades hence. But the People, united, shall never be defeated.

Unexpected NullPointerExceptions in Lift Production Applications

TL;DR: NPEs/InvocationTargetExceptions happen when either the servlet container or the reverse proxy starts discarding requests associated with suspended continuations, which Lift uses for pending comet requests. Presumably, this happens when the container or proxy reaches some threshold of unprocessed requests and starts discarding old requests to keep up with new ones.

About eight months ago, when OpenStudy first started having a decent number of users online and interacting with the site at the same time, we started occasionally seeing NullPointerExceptions in Lift. These were very strange: they stemmed from attempts to access the server name or port, and seemed to happen mostly in comet or ajax requests. There had been a thread about Jetty and these issues on the Lift mailing list (https://groups.google.com/d/msg/liftweb/x1SIveK_bK0/asHehgvGXe8J) for a while, so we decided to move to Glassfish.

Some people had mentioned that Jetty Hightide didn't have the problem, but we tried it and ran into the same issue. Upon moving to Glassfish, we found other problems: Glassfish 3.1 leaked file descriptors on our server, so it needed to be kicked periodically. Deeming that unacceptable, we switched back to Jetty 6 and decided to absorb the NPE issues until we had a chance to look deeper into the cause. Around this time, we also switched to a bigger EC2 instance to alleviate some load issues without scaling out quite yet. Jetty 6 stayed a lot quieter this time.

The issues seemed tied to load from the very beginning, but the key to our conclusions was a bug that slipped into one of our releases and caused internal load to build up rapidly (a synchronization error in our actor registry that kept actors from being spawned properly). This proved to be the key because, once the bug produced the NPEs and we failed to track it down quickly, we decided to try the other servlet containers until we could sort things out.

We went through Glassfish again, then Jetty 7, Jetty 8, and finally Tomcat. Every one of them exhibited similar behavior and exceptions. The only difference was that Tomcat finally gave us a useful underlying cause for the exception:

java.lang.reflect.InvocationTargetException: null
        at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_26]
        at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_26]
        at net.liftweb.http.provider.servlet.containers.Servlet30AsyncProvider.resume(Servlet30AsyncProvider.scala:102) ~[lift-webkit_2.8.1-2.4-M4.jar:2.4-M4]
[…]
Caused by: java.lang.IllegalStateException: The request associated with the AsyncContext has already completed processing.
        at org.apache.catalina.core.AsyncContextImpl.check(AsyncContextImpl.java:438) ~[catalina.jar:7.0.23]
        at org.apache.catalina.core.AsyncContextImpl.getResponse(AsyncContextImpl.java:196) ~[catalina.jar:7.0.23]
        ... 22 common frames omitted

Finally, the underlying IllegalStateException gave us the clue we needed to figure out what had happened: the servlet containers were killing off the requests before the relevant continuation was woken up to deal with them. Jetty 7/8 gave something similar, though much less clear:

java.lang.reflect.InvocationTargetException: null
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_26]
        at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_26]
        at net.liftweb.http.provider.servlet.containers.Servlet30AsyncProvider.resume(Servlet30AsyncProvider.scala:106) ~[lift-webkit_2.8.1-2.4-M4.jar:2.4-M4]
        at net.liftweb.http.provider.servlet.HTTPRequestServlet.resume(HTTPRequestServlet.scala:163) ~[lift-webkit_2.8.1-2.4-M4.jar:2.4-M4]
[…]
Caused by: java.lang.IllegalStateException: IDLE,initial
        at org.eclipse.jetty.server.AsyncContinuation.complete(AsyncContinuation.java:534) ~[na:na]
        ... 22 common frames omitted

All indications point to the problem being the same in the case of Jetty 6, since the behavior manifested in the same situation. We fixed our own internal bug, and also shared what we'd learned about the cause of the null exceptions with David Pollak.

We've filed a ticket on Lift itself for a more informative error message in this case. That said, generally speaking, if you see this, it means there's a load issue somewhere along the application pipeline that you need to look at. Requests are dying, and you need to figure out how to deal with that. At the moment, we haven't gotten a chance to investigate whether it's the container or nginx terminating the request early, but we'll be looking into it if and when the problem shows up on our system again.
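
The underlying race can be sketched in isolation. Two parties may try to finish the same suspended request: the container giving up on it, and the comet code resuming it. What follows is a minimal simulation built on a hypothetical `SuspendedRequest` class. It is not Lift's `Servlet30AsyncProvider` or the Servlet API itself. An atomic compare-and-set decides which party wins, so a late resume can report a likely load problem instead of surfacing as an opaque IllegalStateException. In a real Servlet 3.0 container, the “container gave up” signal might come from an `AsyncListener` registered on the `AsyncContext` (for example its `onTimeout` or `onComplete` callbacks), but that wiring is left out here.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical stand-in for a suspended comet request (not a Lift or Servlet API class).
 * Either the container (discarding or timing out the request under load) or the
 * application (resuming the continuation) may finish it, but only one of them wins.
 */
final class SuspendedRequest {
    // Records who finished the request first; null while it is still pending.
    private final AtomicReference<String> finishedBy = new AtomicReference<>();

    /** Called when the container gives up on the request. */
    boolean expireByContainer() {
        return finish("container");
    }

    /**
     * Called when the comet code wants to resume the request. Returns false and logs
     * an explanatory message, instead of throwing, if someone else finished it first.
     */
    boolean resume() {
        if (!finish("application")) {
            System.err.println("Comet resume skipped: request was already completed by the "
                    + finishedBy.get() + "; check for load problems in the container or proxy");
            return false;
        }
        // A real implementation would write the response here.
        return true;
    }

    String finishedBy() {
        return finishedBy.get();
    }

    private boolean finish(String who) {
        // compareAndSet succeeds for exactly one caller, even under concurrency.
        return finishedBy.compareAndSet(null, who);
    }
}
```

In the failure mode described in this post, `expireByContainer()` runs first. The later `resume()` then returns false and logs a line pointing at load, rather than failing with a null or illegal-state exception deep inside the container.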