Something's wrong with the world today

Introduction

The test of the machine is the satisfaction it gives you. There isn't any other test. If the machine produces tranquillity it's right. If it disturbs you it's wrong until either the machine or your mind is changed. The test of the machine's always your own mind. There isn't any other test.   — Robert M. Pirsig, Zen And The Art Of Motorcycle Maintenance

Note: I originally wrote this article in July 2015. Almost 4 years later I looked at it and thought it's worth publishing, so here you go.

I've been doing Ruby programming at my last three jobs, and I'm getting increasingly frustrated with the language, the libraries, the tools, the mindset and the direction in which the language is heading (read: enterprise). In this article I'd like to talk about simplicity.

The problem

Apparently "This Is The House That Jack Built" is a thing. Where most people see a story for children, we programmers see a problem that needs a program written to solve it. And of course reproducing the original is no fun, so we want to spice it up with the ability to generate our own stories in a similar fashion.

If we limit ourselves to the first lines (or "iterations"), the story we want to generate would look like this:

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the house that Jack built.

This is the cat that killed the rat that ate the malt that lay in the house that Jack built.

Sketch of a solution

The first step would be to look at the example input and output, and figure out the steps needed to transform the input into the output. But since we only have the output we need to figure out the "algorithm" (or procedure) used to generate that output.

One thing is immediately apparent: each successive sentence is longer than the previous. And if we look closer we may notice that all sentences share the same tail, and new things are added at the beginning of each sentence. If we align common parts of the first two sentences we get the following:

This is                      the house that Jack built.
This is the malt that lay in the house that Jack built.
======= ++++++++++++++++++++ ==========================
 same           new                     same

Comparing the second line to the third makes me think I see a pattern:

This is                                       the house that Jack built.
This is                  the malt that lay in the house that Jack built.
This is the rat that ate the malt that lay in the house that Jack built.
======= ++++++++++++++++ ===============================================
 same           new                     same

It's obvious (to me) that the "This is" prefix is the same on each line (read: does not depend on input data). Then, ignoring this prefix, we identify the following "building blocks", and put them into a file:

the house that Jack built.
the malt that lay in
the rat that ate
the cat that killed

Now that we have the input data, and an understanding of how that data is transformed into output, it should not be hard to write a command which takes this file and outputs the sentences we expect. Running it would look like this:

cat lines.txt | recite

The recite command would read successive lines from input and prepend each new line to the current line (initially empty), then print the constructed line with "This is" as a prefix.

Writing such a command seems easy enough. It might be more interesting if the story changed every time. Could our approach accommodate this functionality? In shell syntax this would look as follows:

cat lines.txt | shuf | recite

At this point the programmers among us might realize (without even writing a single line of code) that the final period in the first line of the input file should not be part of the input. So we update our input file so that it looks like this:

the house that Jack built
the malt that lay in
the rat that ate
the cat that killed

At this point we decide that our plan is good enough, and write our recite script:

#!/bin/sh
ACC=""
while read -r LINE; do
  # Treat the first line specially: no need for
  # separator space or newline.
  if [ -z "$ACC" ]; then
    ACC=$LINE
  else
    # Separate the lines so we can clearly see them
    # (gets messy if the lines are wrapped).
    echo
    ACC="$LINE $ACC"
  fi

  echo "This is $ACC."
done

We check that it works:

cat lines.txt | recite

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the house that Jack built.

This is the cat that killed the rat that ate the malt that lay in the house that Jack built.

And with the shuffling:

cat lines.txt | shuf | recite

This is the house that Jack built.

This is the cat that killed the house that Jack built.

This is the malt that lay in the cat that killed the house that Jack built.

This is the rat that ate the malt that lay in the cat that killed the house that Jack built.

At this point we can share our script with our friends. Or use it as a party game where everybody contributes to the story by putting the lines in one big file. Or, to make it even more interesting, the lines can be put in separate files so that the participants don't see what others have added, and then, instead of cat lines.txt we could just do cat *.txt. Or sort the files chronologically. If we want shuffling, but want to keep the lines in each file together (applies only to files with multiple lines), instead of shuffling the lines we can shuffle the file list:

ls *.txt | shuf | xargs cat | recite

I could go on like this for a while, so now would be a good time to tell you why I'm actually writing this.

When things go wrong

You're still here? OK then, let's get back to our problem. And since we're real programmers here let's solve it like real professionals would. Oh but wait, somebody already did! There was a talk at RailsConf 2015 titled "Nothing is Something." I liked the presentation itself a lot: it was very fluid and entertaining. I'd like to invite you to stop reading this article and watch the talk now.

At around 17:00 the speaker arrives at the following solution:

class House
  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{phrase(number)}.\n"
  end

  def phrase(number)
    data.last(number).join(" ")
  end

  def data
    ["the horse and the hound and the horn that belonged to",
     "the farmer sowing his corn that kept",
     "the rooster that crowed in the morn that woke",
     "the priest all shaven and shorn that married",
     "the man all tattered and torn that kissed",
     "the maiden all forlorn that milked",
     "the cow with the crumpled horn that tossed",
     "the dog that worried",
     "the cat that killed",
     "the rat that ate",
     "the malt that lay in",
     "the house that Jack built"]
  end
end

Sure the code is short and cute, but there are still things I do not quite like:

  • There is no reason for the phrase method to have an implicit dependency on the data method. data should be an explicit parameter: implicit parameters can only be "injected", while explicit ones can simply be passed in. If you watched the talk you should have noticed the speaker implying that supplying things as parameters is a good thing.
  • The line method taking the number parameter. In what way is that related to what it does? Implicit dependencies all over the place. Ugh.
  • The name of the class: House. Later in the talk the speaker talks about how RandomHouse is not actually a different kind of House — it supposedly is the original House with a random dependency injected into it. Know what? Let's question another assumption! Namely the House being a "house". It's not! The name somehow sneaked in from the last line (or the first, depending on how one looks at the problem). But as we see later, that choice is random, because any other line might be the first one. So why not call the class Horse? Or Potato? I'm pretty sure somebody at some point will add a line containing "potato" to the input.
  • Why did we end up with a class, anyway? I'm pretty sure the task was not to write a class.
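To make the first two complaints concrete, here is a minimal sketch of how the same two methods could look with their dependencies passed in explicitly. The module name ExplicitHouse and the sample data are made up for illustration; this is not code from the talk.

```ruby
# Hypothetical module name; not part of the talk's code.
module ExplicitHouse
  # Receives the data it works on as a parameter instead of
  # reaching for a data method behind the caller's back.
  def self.phrase(data, number)
    data.last(number).join(" ")
  end

  # Receives the string to wrap; knows nothing about numbers
  # or where the string came from.
  def self.line(text)
    "This is #{text}."
  end
end

data = ["the malt that lay in", "the house that Jack built"]
puts ExplicitHouse.line(ExplicitHouse.phrase(data, 2))
# prints: This is the malt that lay in the house that Jack built.
```

With the parameters spelled out, either method can be called (and checked) on its own, without constructing anything first.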

After "solving" the original problem additional requirements start rolling in. The first one (shuffling the lines) is solved by the RandomHouse class. The speaker has added a few extra restrictions on the requirement: write the new class (writing a class was never really a requirement, but let's play along) without breaking the original class and without using conditionals (19:50).

I'd also like to note here that in the previous section we implemented this very requirement while adhering to three, not two, restrictions:

  • We did not change our script.
  • We did not use conditionals. The conditional in our recite script does not count because the restriction here is about adding the new functionality without using conditionals.
  • We did this by re-using code, which is the holy grail of programming (and the promised-but-never-delivered feature of OOP).

Let's continue with the talk. The next requested feature is the ability to duplicate each input line, with the same restrictions as before, resulting in an EchoHouse class (20:40). Along the way (multiple) inheritance is dismissed as a non-solution to the problem (23:30, 25:20), which is a topic for another article.

This is the point in the talk where I actually get very annoyed with the whole mental masturbation thing where jumping through hoops is considered a major accomplishment and the hoops are set up in increasing numbers as the talk progresses. Supposedly the right thing to do is to write collaborator classes, and inject instances of those into the House class. My guess is that passing parameters is too old-school and boring, so the process must be called "injection".

Here are the collaborator classes which the House class now accepts in the constructor (I'll have a few things to say about naming these later):

class DefaultOrder
  def order(data)
    data
  end
end

class RandomOrder
  def order(data)
    data.shuffle
  end
end

class DefaultFormatter
  def format(parts)
    parts
  end
end

class EchoFormatter
  def format(parts)
    parts.zip(parts).flatten
  end
end

The presentation did not show the full source of the class (or I missed it), so I've worked the part left as "an exercise for the reader" into the original class, and this is how it looks now (using only 3 input lines to conserve electrons):

class House
  DATA = ["the rat that ate",
          "the malt that lay in",
          "the house that Jack built"]

  attr_reader :formatter, :data

  def initialize(orderer: DefaultOrder.new,
                 formatter: DefaultFormatter.new)
    @formatter = formatter
    @data      = orderer.order(DATA)
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{phrase(number)}.\n"
  end

  def phrase(number)
    parts(number).join(" ")
  end

  def parts(number)
    formatter.format(data.last(number))
  end
end

With the changes in place, this is how the classes can be combined to solve the different variations of the problem. First, using the defaults:

puts House.new.recite

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the house that Jack built.

With shuffling:

puts House.new(orderer: RandomOrder.new).recite

This is the malt that lay in.

This is the house that Jack built the malt that lay in.

This is the rat that ate the house that Jack built the malt that lay in.

With line duplication:

puts House.new(formatter: EchoFormatter.new).recite

This is the house that Jack built the house that Jack built.

This is the malt that lay in the malt that lay in the house that Jack built the house that Jack built.

This is the rat that ate the rat that ate the malt that lay in the malt that lay in the house that Jack built the house that Jack built.

With shuffling and duplication:

puts House.new(orderer: RandomOrder.new,
               formatter: EchoFormatter.new).recite

This is the malt that lay in the malt that lay in.

This is the rat that ate the rat that ate the malt that lay in the malt that lay in.

This is the house that Jack built the house that Jack built the rat that ate the rat that ate the malt that lay in the malt that lay in.

Searching for the light

Now, we can see the code works (for some values of "works", see below). But at what cost? My personal opinion is: if your solution looks like this, you did something wrong. Something very, very wrong[1]. How wrong exactly? I'm glad you asked!

Collaborators

If you look at the four collaborator classes defined above for more than two seconds, you might notice that they all look very similar:

class BlahBlah
  def execute(parameter)
    # Do the thing.
  end
end

Reminds you of anything? What in your programming career have you seen that's executable, accepts parameters and does not maintain state? That's right: functions! Don't worry if you had trouble recognizing this — functions in the Kingdom of Nouns are F-words, and people are brainwashed to see things where there aren't any.

How about we skip the ceremony, and drop the whole class proliferation business?

module Functions
  def self.identity(data)
    data
  end

  # If this looks fishy it's probably because of the
  # smell.
  def self.shuffle(data)
    data.shuffle
  end

  # This is how the EchoFormatter#format should have
  # been implemented.
  def self.echo(data)
    data.map do |line|
      line + " " + line
    end
  end
end

Now we only have three functions (instead of four) because DefaultOrder and DefaultFormatter are literally doing the same thing. Why have two different classes do the exact same thing, anyway[2]? Let's start using our "functions":

class House
  DATA = ["the rat that ate",
          "the malt that lay in",
          "the house that Jack built"]

  attr_reader :formatter, :data

  def initialize(orderer: Functions.method(:identity),
                 formatter: Functions.method(:identity))
    @formatter = formatter
    @data      = orderer.call(DATA)
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{phrase(number)}.\n"
  end

  def phrase(number)
    parts(number).join(" ")
  end

  def parts(number)
    formatter.call(data.last(number))
  end
end

Notice how unwieldy it is to refer to the functions we wrote. Also notice the change we had to make in the initialize and parts methods to use the functions instead of classes: all it took was to use the same call method instead of order and format. Even though it does not matter if the function is named call or execute[3], consistency is good, and I consider this an improvement!
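Since all the class needs is something that responds to call, Method objects are not the only option: a lambda or a Symbol converted to a Proc would do just as well. A small sketch of that idea, where CallDemo and its apply helper are hypothetical names standing in for what initialize and parts do:

```ruby
# Hypothetical names throughout; only the call protocol matters.
module CallDemo
  def self.echo(data)
    data.map { |line| "#{line} #{line}" }
  end

  # Stands in for orderer.call(DATA) and formatter.call(...)
  # in the House class.
  def self.apply(step, data)
    step.call(data)
  end
end

as_method = CallDemo.method(:echo)     # a Method object
as_lambda = ->(data) { data.reverse }  # a lambda
as_symbol = :reverse.to_proc           # a Symbol converted to a Proc

p CallDemo.apply(as_method, ["a", "b"]) # prints ["a a", "b b"]
p CallDemo.apply(as_lambda, ["a", "b"]) # prints ["b", "a"]
p CallDemo.apply(as_symbol, ["a", "b"]) # prints ["b", "a"]
```

The caller cannot tell the three apart, which is the whole point of settling on a single method name.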

Does it still work, though? Sure does:

puts House.new.recite

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the house that Jack built.

puts House.new(orderer: Functions.method(:shuffle)).recite

This is the malt that lay in.

This is the house that Jack built the malt that lay in.

This is the rat that ate the house that Jack built the malt that lay in.

Unnecessary dependencies

Now if you recall from my initial list of issues with the House class there's the item about methods having implied dependencies. In programming circles it is called coupling, and unnecessary coupling is bad for code maintainability.

Let's start with the line method. All it should be doing is prepending "This is" and appending a period to a string. But instead of passing a string, the calling code passes in a number, which is passed to another method to obtain the string. Bad style, much coupling, very smell. One of the things that movies have taught us is that knowing too much is harmful for one's well-being (or indeed, survival). So let's get this poor method out of trouble:

module Functions
  def self.line(input)
    # Notice that we don't add a newline character
    # because newlines are part of output, not data.
    # And we don't know what the caller of this function
    # will do with the result of this function.
    "This is #{input}."
  end
end

There, line is independent and strong now, and does not know anything it should not, so it is also safe.

Next up is the recite method. There is absolutely no need for this method to know how many lines there are in the story. Which makes me realize that the method is ill-conceived from the beginning. The "function" should generate the output lines by adding new lines in front of the previous lines:

module Functions
  def self.recite(lines)
    seen = []
    lines.map do |line|
      seen.prepend(line)
      seen.join(" ")
    end
  end
end

And since this "function" now does one thing only (as all functions and methods should), we need another function to actually do the output:

module Functions
  def self.this_is(lines)
    lines.each do |str|
      # We add an extra newline for the purpose of this
      # article so that it's easy to see where the lines
      # start and end when they are wrapped.
      puts line(str), "\n"
    end
  end
end

Using these functions is a bit cumbersome, and looks like this:

module Functions
  DATA = ["the house that Jack built",
          "the malt that lay in",
          "the rat that ate"]

  this_is(recite(DATA))
end

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the house that Jack built.

Just look at that, it works! So what about the shuffling? Easy to add:

module Functions
  this_is(recite(shuffle(DATA)))
end

This is the rat that ate.

This is the malt that lay in the rat that ate.

This is the house that Jack built the malt that lay in the rat that ate.

Makes no sense, but that's what the customer asked for. How about the echo thing?

module Functions
  this_is(recite(echo(DATA)))
end

This is the house that Jack built the house that Jack built.

This is the malt that lay in the malt that lay in the house that Jack built the house that Jack built.

This is the rat that ate the rat that ate the malt that lay in the malt that lay in the house that Jack built the house that Jack built.

At this point we can chain shuffle, echo and recite as we want, and it will work as advertised. Even better, because we're not limited to just one instance of each.
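One way to spell out such a chain without nesting the calls, assuming Ruby 2.6 or newer, is the >> operator on Method objects. The sketch below repeats the echo and recite definitions from above so that it runs on its own; the variable names are mine.

```ruby
# Same definitions as above, repeated so the sketch runs on its own.
module Functions
  def self.echo(data)
    data.map { |line| line + " " + line }
  end

  def self.recite(lines)
    seen = []
    lines.map do |line|
      seen.prepend(line)
      seen.join(" ")
    end
  end
end

echo   = Functions.method(:echo)
recite = Functions.method(:recite)

# Method#>> composes left to right: echo, then echo again, then recite.
double_echo_story = echo >> echo >> recite

p double_echo_story.call(["a", "b"])
# prints ["a a a a", "b b b b a a a a"]
```

Using echo twice is exactly the kind of thing a single "formatter" slot cannot express.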

If you followed the code carefully you might have noticed that my "functional" implementation of echo is very different from the one in the presentation. We could use the following definition (perfectly valid):

module Functions
  def self.echo1(data)
    data.zip(data).flatten
  end
end

Can you guess what will happen if we use this function?

module Functions
  this_is(recite(shuffle(echo1(DATA))))
end

This is the malt that lay in.

This is the rat that ate the malt that lay in.

This is the house that Jack built the rat that ate the malt that lay in.

This is the malt that lay in the house that Jack built the rat that ate the malt that lay in.

This is the house that Jack built the malt that lay in the house that Jack built the rat that ate the malt that lay in.

This is the rat that ate the house that Jack built the malt that lay in the house that Jack built the rat that ate the malt that lay in.

May I invite you to figure out from the code why the EchoFormatter works in the version of the code from the talk, but not in our functional version? If you don't want to figure it out yourself I can tell you — because the supposedly proper object oriented code in the talk is so silly, coupled and non-composable that it somehow manages to work… by accident! More details below.

Idiomatic code

The "OO-inclined" readers reading this might be talking to their screens and saying I'm full of shit and how is this crap code I've come up with even supposed to be better than the original nice and clean and OO-done-right-according-to-best-practices-using-design-patterns-and-whatnot? Well, let's see what we can do about that.

We might try and invent a DSL by writing a new class and overriding the pipe (|) operator, but that's a whole new can of worms we're not going to touch here.

There is room for improvement, though. What I want to achieve is something that looks like what I had at the beginning of the article. Since we're still using Ruby, the following code looks close enough:

DATA.shuffle.echo.recite

In order to achieve that, we will have to modify (read: monkey-patch) the Array class. Even though everybody knows that monkey-patching is bad, that is beside the point in this article. Just look at me moving all the functions defined above into the Array class[4]:

class Array
  # shuffle is already there.

  def echo
    self.map do |line|
      line + " " + line
    end
  end

  def recite
    seen = []
    self.map do |line|
      seen.prepend(line)
      seen.join(" ")
    end
  end

  def this_is
    self.each do |str|
      puts Functions::line(str), "\n"
    end
  end

  # While we're here let's add a convenience method
  # since we're always ending our calls with
  # .recite.this_is (but we don't have to).
  def verse
    self.recite.this_is
  end
end

# Put this here so we have something to work with.
DATA = ["the house that Jack built",
        "the malt that lay in",
        "the rat that ate"]

Did not take much effort, now did it? Let's try it out:

DATA.shuffle.verse

This is the rat that ate.

This is the house that Jack built the rat that ate.

This is the malt that lay in the house that Jack built the rat that ate.

Works as requested! Now, we can go wild and implement a couple more features before the customer asks for them:

  • Ability to duplicate input lines.
  • Ability to squash neighbouring lines together.

In the House class it should be a matter of writing a collaborator class or two, right? Go on, try it! Continue reading when you're done. I'll be extending my version in the meantime.

class Array
  def duplicate
    # Reminds you of something?  Right, this is how the
    # EchoHouse was implemented in the talk.
    self.zip(self).flatten
  end

  def squash
    self.each_slice(2).map do |lines|
      lines.join(" ")
    end
  end
end

There, I'm done! So, how big of a change was it for you? Had to rewrite the whole House class? Oh, that's too bad — I wish there were some re-usable pieces of code for you to use!

I'd also like to point out that there is not a single conditional in my code. I did not even have to jump through hoops to achieve that. Not that I'm scared of conditionals, but the code is so straightforward that I never even needed them.

Looks too good to be true? I can prove it works:

DATA.duplicate.shuffle.squash.verse

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt that lay in the malt that lay in the house that Jack built.

This is the house that Jack built the rat that ate the rat that ate the malt that lay in the malt that lay in the house that Jack built.

How about we keep the feature requests coming?

  • an ability to reverse the story,
  • or elide first N lines,
  • or last N lines,
  • or random N lines.

Would we need an "orderer"? A "formatter"? How many? How would they combine? How many collaborator classes would be needed?

In my case all of the newly introduced functions are one-liners:

class Array
  # OK, I lied: drop and reverse are zero-liners.  Lucky
  # coincidence?  I think not (remember shuffle).  If
  # you think this is cheating I'd like to point out
  # that this is what code reuse actually looks like.

  # Theoretically we don't need this, because the same
  # can be achieved by:
  #
  #   .reverse.drop(n).reverse
  #
  def drop_last(n)
    self.first(self.length - n)
  end

  # This is also not needed, because it's semantically
  # equivalent to:
  #
  #   .shuffle.drop(n)
  #
  def drop_random(n)
    self.sample(self.length - n)
  end
end

So in the end our methods turned out to be just convenience shortcuts. So much for charging the customer for our time…
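The equivalences claimed in the comments are easy to check. A quick sketch with made-up sample data; since the random variant gives different survivors on every run, only the stable properties are compared:

```ruby
data = ["a", "b", "c", "d"]
n = 1

# drop_last(n), written two ways.
via_first   = data.first(data.length - n)
via_reverse = data.reverse.drop(n).reverse
same_tail_dropped = (via_first == via_reverse)
puts same_tail_dropped # prints true

# drop_random(n): compare how many lines are left and check
# that all of them come from the input.
via_sample  = data.sample(data.length - n)
via_shuffle = data.shuffle.drop(n)
same_size = (via_sample.length == via_shuffle.length)
all_known = (via_sample - data).empty? && (via_shuffle - data).empty?
puts same_size # prints true
puts all_known # prints true
```

One caveat: the equivalence assumes n is not larger than the array. first and sample raise an ArgumentError when given a negative count, while drop simply returns an empty array.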

Naming things

This might be a good time to revisit the "finding concepts and naming them" (27:00) part.

Back when the collaborator classes were introduced I could see how shuffling things around is a kind of ordering, but what the hell is EchoFormatter? How is it formatting anything? The only thing resembling formatting in the original House class is the line method… EchoFormatter as a name does not make much sense, if any at all. And the implementation is only one line, and it has nothing to do with the name. That should have been a sign for the author of the code that something is amiss.

But now that we have shuffle and echo instead of RandomOrder and EchoFormatter, we see they are not different concepts — they are both transformations! We could try and categorize the transformations we've written so far:

  • Changing the sequence of items (shuffle),
  • Modifying items (echo),
  • Generating items (duplicate),
  • Combining items (squash),
  • Eliding items (drop, drop_last, drop_random).

But in the end they are all just functions, working on a sequence of strings. There are no "orderers". No "formatters". Just functions, which, when done properly (i.e., not coupled), can be combined. And still no conditionals.

Testing

The presentation did not touch the topic of testing, which would probably be another talk like this. Except with more injection and mocking and stubbing and whatnot. Even thinking about testing this supposedly-proper-OO code makes me cringe. This article is already long enough, so I will not be talking about testing, but I'd like to invite you to think about how you would approach testing both versions of the code.

Summary

If I were to teach this stuff to a beginner programmer and I used the interface I came up with originally (using the shell syntax), I imagine they would have no trouble taking this example:

# Using "duplicate" because "echo" is already taken.
cat lines.txt | shuffle | duplicate | recite

which would correspond to the following code in the context of the talk:

puts House.new(orderer: RandomOrder.new,
               formatter: EchoFormatter.new).recite

and turn it into something that first duplicates the lines, then shuffles them:

cat lines.txt | duplicate | shuffle | recite

In the context of the talk, would the following work?

puts House.new(orderer: EchoFormatter.new,
               formatter: RandomOrder.new).recite

Of course it would not, you silly! Formatter is not an orderer, and orderer is not a formatter (even though they do not differ in that they take an array of strings as a parameter, and return a new array of strings). How would anybody in their right mind call whatever this is composition?
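To see how it fails, here is a small sketch. MiniHouse is a hypothetical, stripped-down stand-in for House that keeps only the constructor logic; the two collaborator classes are copied from the listing earlier in the article.

```ruby
class RandomOrder
  def order(data)
    data.shuffle
  end
end

class EchoFormatter
  def format(parts)
    parts.zip(parts).flatten
  end
end

# Hypothetical stand-in for House: constructor logic only.
class MiniHouse
  DATA = ["the malt that lay in", "the house that Jack built"]

  def initialize(orderer:, formatter:)
    @formatter = formatter
    @data      = orderer.order(DATA)
  end
end

begin
  MiniHouse.new(orderer: EchoFormatter.new,
                formatter: RandomOrder.new)
rescue NoMethodError => e
  # EchoFormatter has no order method, so we never get
  # past the constructor.
  puts "NoMethodError: #{e.message}"
end
```

Both collaborators map an array of strings to an array of strings, yet swapping them blows up purely because the method names differ.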

This is the problem I referred to earlier in the article (at the end of the unnecessary dependencies section): the EchoFormatter has an implied dependency on the order of execution (i.e., it is coupled to it): it only works after the items have been ordered. This is not how composition works at all! Composition is when smaller things (i.e., building blocks) can be composed together to make bigger things. If the smaller things can only be put together in one way, they're not really smaller things: they're one big thing cut into pieces. Functions compose well. Objects usually don't[5].
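The order dependency can be shown in isolation. In this sketch the two echo variants and recite are written as lambdas (the names echo_map and echo_zip are mine), and the input is shortened to two words:

```ruby
# Echo as a per-line transformation (the echo function).
echo_map = ->(data) { data.map { |line| "#{line} #{line}" } }

# Echo as done by EchoFormatter: duplicate the items.
echo_zip = ->(data) { data.zip(data).flatten }

recite = lambda do |lines|
  seen = []
  lines.map do |line|
    seen.prepend(line)
    seen.join(" ")
  end
end

data = ["house", "malt"]

# Applied before recite, the map version keeps two lines ...
p recite.call(echo_map.call(data))
# prints ["house house", "malt malt house house"]

# ... while the zip version produces four.
p recite.call(echo_zip.call(data)).length
# prints 4

# The zip version only looks right when it is the very last
# step before joining, which is where the House class calls it.
p echo_zip.call(["malt", "house"]).join(" ")
# prints "malt malt house house"
```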

The EchoFormatter could be fixed by implementing it like we did in the echo function. But the problem of composability would still be there: the House class still dictates that there can be only one "orderer", and one "formatter" and that "formatter" goes after "orderer." And if we want something else? We're screwed[6]!

I've said it before, and I'll say it again: bad abstractions are worse than no abstractions. Just drop your OO hammer and see the world for what it is! All the fun is in doing stuff, not having things.

Footnotes:

[1]

A not-so-subtle reference to "Turn your hamster into a fighting machine" by Jared Purrington.

[2]

One might argue that the different classes, even if their effect is the same, are still semantically different. But that's beside the point here because if we want semantically different behaviour we just use different functions (with the semantics we need).

[3]

What, you did not read Kingdom of Nouns? This is your reminder to do so now.

[4]

In a non-throwaway code scenario I'd use Ruby's refinements, but it so happens that programming is hard, and tools change under one's feet, and irb cannot deal with using M statements, and org-mode barely can work with irb, and there is no support for pry. Basically everything is broken and I just want to be done with this article.
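For reference, a minimal sketch of what a refinement-based version might look like when run as a plain Ruby file rather than pasted into irb. StoryTelling and tell are names made up for this sketch; the method bodies are the ones from the article.

```ruby
# Hypothetical module name; the refined methods only exist
# where the refinement has been activated.
module StoryTelling
  refine Array do
    def echo
      map { |line| line + " " + line }
    end

    def recite
      seen = []
      map do |line|
        seen.prepend(line)
        seen.join(" ")
      end
    end
  end
end

# Activates the refinement for the rest of this file only.
using StoryTelling

# Defined after `using`, so the refined methods are visible
# inside this method body no matter where it is called from.
def tell(lines)
  lines.echo.recite
end

p tell(["a", "b"]) # prints ["a a", "b b a a"]
```

Code in other files still sees the plain Array class, which is what makes this more acceptable than monkey-patching outside of throwaway code.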

[5]

Especially when multiple inheritance, the approach that encourages creating re-usable classes (i.e., mixins), is dismissed outright.

[6]

As mentioned in the talk, inheritance is not a solution. Not because inheritance is somehow bad or wrong, but because the methods of the House class are too coupled.

Date: 2019-05-03