Thursday, October 9, 2014

Functions vs. Procedures

When I first learned Pascal there was a difference between a function and a procedure. A function returned a value and, at least as I was taught it, was an actual function: for a given set of inputs there would only be one output. Procedures were different. They were a set of instructions, and for a given input they could do different things at different times. As I moved on to C, Java, C#, and JavaScript this distinction was abandoned. It is, however, one of the most valuable distinctions in programming. It also gets blurred by object-oriented programming, because every method has an implicit 'this'.

Okay, let's break down the notion a bit. There are two key pieces, only one of which I mentioned above. One is the notion of being a function in the mathematical sense: for a given input the output will always be the same. We can extend this to objects a bit and say that for a specific object state the output will always be the same. The second piece of what computer guys call a function is "side effects". Does your function have an impact on the world around it (or on the scope outside it, in this case)? Here is a test that covers both pieces: theoretically you could replace a function with a hash table mapping every input to its output. If you made this replacement, would your program behave the same (ignoring memory and speed and all that real world stuff)?
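The hash table test can be sketched in a few lines of Python. This is only an illustration: `square`, `square_lookup`, `square_and_log`, and the small input range are hypothetical names and choices, not anything from a real codebase.

```python
def square(n):
    """Pure: the same input always gives the same output, and nothing else changes."""
    return n * n

# Assumed finite input domain, so the table can be built ahead of time.
DOMAIN = range(-10, 11)

# The "hash table" version: every input mapped to its output.
SQUARE_TABLE = {n: square(n) for n in DOMAIN}

def square_lookup(n):
    """Drop-in replacement for square() on DOMAIN."""
    return SQUARE_TABLE[n]

# For contrast, a version with a side effect. A table can reproduce its
# return value, but not the change it makes to call_log.
call_log = []

def square_and_log(n):
    call_log.append(n)  # side effect: mutates state outside the function
    return n * n
```

Swapping `square` for `square_lookup` leaves the program's behavior unchanged. Swapping `square_and_log` for a table would silently drop the logging, which is exactly what the test is meant to catch.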

Now a completely different direction: private and public. This notion is valuable because it gives programmers information about how variables are used. These notions are then enforced by the machine, so you know they are true. (As a .NET guy I have to mention you can get around this in code using reflection, so "true" in the last statement is a tiny bit fuzzy.) So these access modifiers are parts of the code that exist for humans to read. I saw a presentation by a guy named Rob Ashton who talked about how privates were unneeded and you should just use some discipline. Yes and no. Privates are contracts that you are forced to keep. This forcing has value because it makes a statement about a variable that you can trust. You can't trust humans to have the discipline Rob mentioned. So let me reach a conclusion: it is acceptable and right for languages to have syntax that the machine enforces and that exists to give programmers information about the code.

Context switching back to functions. If we use the two ideas mentioned above (same output for the same input, and side effects), we can divide 'methods' into four classes.

  1. Has side effects; Always does same thing with same inputs
  2. Has side effects; May do something different even given same inputs
  3. No side effects; Always does same thing with same inputs
  4. No side effects; May do something different even given same inputs

Quick examples: #1) Functions that output data, such as printing a value. #2) Complex functions that string together behavior, like sending out birthday announcements. #3) Addition. #4) Getting a random number or the current date.
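To make the four classes concrete, here is one small Python function per class. All of the names (`outbox`, `emit`, `announce_birthdays`, `add`, `current_year`) are made up for illustration, and `outbox` is just a list standing in for the outside world. For #4 I use the current date rather than a random number, since a typical random number generator also updates its own internal state.

```python
import datetime

outbox = []  # hypothetical stand-in for the outside world (screen, file, mail server)

def emit(message):
    """#1: has a side effect, but the same input always causes the same effect."""
    outbox.append(message)

def announce_birthdays(people):
    """#2: has a side effect, and what it does depends on today's date too."""
    today = datetime.date.today()
    for name, born in people.items():
        if (born.month, born.day) == (today.month, today.day):
            outbox.append(f"Happy birthday, {name}!")

def add(a, b):
    """#3: no side effects, same output for the same inputs."""
    return a + b

def current_year():
    """#4: no side effects, but the answer changes over time."""
    return datetime.date.today().year
```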

Programs that only have #3 type functions have some very nice properties: they are easy to reason about and easy to test. One approach has been to build languages like Haskell around them. Side effects are still needed, though, so Haskell has monads, which I do not yet quite grok. But Haskell is not used much outside academic circles yet, and the restrictions can make it harder to code.

What I really want is a way to specify that my function is a #3 function or a #4 function. A machine can enforce this. I also think that if you divided your program up and pushed as much code into these kinds of functions as possible, your code would be better and MUCH more testable. You could then create a measure saying that X percent of your code is in these functions.

Why do I think these functions are good? One reason is testability and the other is reuse. The reality of tests is that they are fragile and hard to write when used on anything but functions of type #3. They are easy to write for #3 type functions. You just supply inputs and check the outputs, and you don't have to worry about testing side effects. You can test what the function does instead of trying to ensure that it keeps doing things the way it does them now (which I think you don't want to do anyway). Most people feel testing is good, and having the majority of your code in #3 type functions will make testing much easier and better.
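As a rough sketch of what this looks like in practice, take the birthday announcement example from earlier and push the logic into a #3 function, leaving only a thin wrapper that touches the clock and the outside world. The function names, the `send` parameter, and the sample data are all hypothetical.

```python
import datetime

def birthday_messages(people, today):
    """#3: pure. The date is passed in and the messages are returned, not sent."""
    return [
        f"Happy birthday, {name}!"
        for name, born in people.items()
        if (born.month, born.day) == (today.month, today.day)
    ]

def send_birthday_announcements(people, send=print):
    """#2: thin impure wrapper that reads the clock and does the sending."""
    for message in birthday_messages(people, datetime.date.today()):
        send(message)

# Testing the #3 part is just inputs and expected outputs (sample data).
people = {"Ada": datetime.date(1815, 12, 10), "Alan": datetime.date(1912, 6, 23)}
assert birthday_messages(people, datetime.date(2014, 12, 10)) == ["Happy birthday, Ada!"]
assert birthday_messages(people, datetime.date(2014, 10, 9)) == []
```

The tests need no fake clock and no fake mail server, because everything worth testing lives in the function that has neither.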

The other reason is reuse. Now, reuse as a benefit is oversold, but it is valuable. If your functions don't have lots of side effects and behave in a consistent way you can reason about, then chances are you can reuse them. Also, if you focus on isolating logic into #3 type functions, it will expose reusable functionality. Let's imagine there was no map function. If you were repeatedly writing code that loops over an array, you might try to isolate that into something like a #3 function. That function would be map! You can reuse map! Maybe you won't publish a library (well, if you were the first person to make map you might want to), but you can have internal project reuse.
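Here is a small Python sketch of that thought experiment. Python already ships a built-in `map`, so the extracted function is called `my_map` here; `double_all` and `shout_all` are made-up examples of the repeated loops.

```python
# Two loops that differ only in what they do to each element.
def double_all(numbers):
    result = []
    for n in numbers:
        result.append(n * 2)
    return result

def shout_all(words):
    result = []
    for w in words:
        result.append(w.upper())
    return result

# Isolating the shared shape gives you map.
def my_map(fn, items):
    """Return a new list with fn applied to each item; the input is left untouched."""
    result = []
    for item in items:
        result.append(fn(item))
    return result
```

With this in place, `double_all(xs)` becomes `my_map(lambda n: n * 2, xs)` and `shout_all(ws)` becomes `my_map(str.upper, ws)`. Note that `my_map` is only a #3 function as long as the `fn` you hand it is one too.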
