Coding Standards - General Comments

#!rst

================
Coding Standards
================

About
-----
Coding standards are one part of a development protocol. Other parts include
unit testing and code reviews. This document covers only coding standards.


Why?
--------------------------

Following a coding standard is like handwashing in a hospital:
both require discipline. Following the
protocol takes more time than ignoring it, and it's pretty difficult to
associate a particular negative incident (disease transmission or a software
bug) with a particular instance of failing to follow the protocol. 
Nevertheless, every instance of corner-cutting increases the
probability of a bad outcome somewhere down the line.

Development protocols attempt to avoid bad outcomes by reining in software
complexity. Entropy kills projects, and the Second Law of Thermodynamics is as
true in the software world as it is in the natural world. Perhaps you've heard
the maxim that the first 90% of a project takes 90% of the time and the last
10% takes the other 90% of the time. That doesn't have to be true, but it
often is. On some projects, that last 10% of development is like a game of
whack-a-mole. Smack a bug here, another pops up there.

In fact, if a project is messy enough, the last 10% never gets completed. All
effort gets sucked into fixing bugs and inadvertently creating new ones.
Eventually one faces the choice of shipping something that's only 90% complete
or not shipping at all.

Those are some really bad outcomes. They can be avoided, but only by deliberate
action. **The only thing you get without effort is entropy.**

With that in mind, here are the most obvious pressures on this project that
call for rigor in software development:

- This project is being written by a team of four or more people who are
  spread across the country.

- Only one of them is strong in the project's primary language (Python).

- This project will subsume GAVA, Vespa and Matpulse. Software complexity
  usually grows faster than linearly with size, so this project's complexity
  will exceed not only that of each individual project but also the *sum* of
  the individual projects. That's a lot to manage!

- A larger project needs a long lifespan to justify the effort put into it,
  and a longer lifespan increases the odds that (a) someone totally new will
  join the project and need to understand the code and (b) the code will need 
  to be modified and/or expanded in the future.

- The more people involved, the greater the odds that others will read, use
  and modify code that you write.

- The end result needs to be clean enough to encourage outsiders to
  contribute.


Words of Wisdom from the Masters
--------------------------------

  "Controlling complexity is the essence of computer programming."
  
  -- Brian Kernighan

  "Let us change our traditional attitude to the construction of programs: 
  Instead of imagining that our main task is to instruct a computer what to 
  do, let us concentrate rather on explaining to human beings what we want a 
  computer to do."
  
  -- Donald Knuth

  "Readability counts." 
  
  -- Tim Peters in PEP 20.

This last quote is about design (aircraft design, actually) rather than code, 
but it is one of my favorites.

  "It seems that perfection is reached not when there is nothing left to add,
  but when there is nothing left to take away."

  -- Antoine de Saint-Exupéry


It's Not About You
--------------------

The guidelines below are intended to help you write code that's easier
for others to work with. They're not about making *your* life easy.
If you think about it, that makes sense: there are a lot more of them
than there are of you.

Be kind to them! They, in turn, will be kind to you.

And you never know, five years down the road it might be you 
who has to read that long-forgotten code. You'll be glad, then, that
you considered the reader when you wrote it.

  

#!rst

In General
----------

- `Magic numbers <http://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants>`_ 
  are unacceptable.
  
- As a generalization of the above, 
  `DRY <http://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_ is a 
  valuable concept.
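A minimal sketch of both points, with hypothetical names and values: the number of samples is named once, and every calculation that needs it refers to that name. Changing it then means editing one line instead of hunting for every stray ``2048``.

```python
# Hypothetical constant: named once, used everywhere it's needed.
SAMPLES_PER_SPECTRUM = 2048


def dwell_time(spectral_width_hz):
    """Time between samples, in seconds."""
    return 1.0 / spectral_width_hz


def acquisition_time(spectral_width_hz):
    """Total acquisition time, in seconds."""
    # Writing 2048 here (and in every other function that needs it) would
    # leave copies that fall out of sync when the value changes.
    return SAMPLES_PER_SPECTRUM * dwell_time(spectral_width_hz)
```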

- Comment your Subversion commits.

- Avoid abbreviations in variable, function, file and class names. There's
  usually more than one "obvious" way to abbreviate a word or phrase, so if
  you're not the author of the code (or sometimes even if you *are* the author
  of the code) it's hard to remember what abbreviation was used.

  For instance, if you're looking at a variable representing "metabolite 
  description", the author could name it metabolite_desc or metabolite_descr 
  or mdescription or m_desc or mdescr or md. Python requires a bit more 
  care in this area than compiled languages (like C) since compilers complain
  about undeclared variables whereas Python will happily accept something like 
  this:
  

#!python
    
    # Code added by person A
    mdesc = [1, 2, 3]

    # ...several pages of code here...

    # Code added by person B months later -- see the bug?
    if erase_previous_data:
        mdsc = None

#!rst
  There's also the benefit that longer variable names help to document the code.
  The name `mdesc` could mean "mule desecration" for all I know, whereas
  `metabolite_description` carries meaning.

  Yes, using unabbreviated variable names makes it harder to respect PEP 8's
  recommendation of limiting lines to a maximum of 79 characters.

  Standard abbreviations are acceptable, like *fft* for Fast Fourier
  Transform, or *ppm* for parts per million. Obviously, "standard" is a weasel
  word that doesn't really say what's OK and what's not. There's no hard and
  fast rule; we'll have to judge on a case-by-case basis.

  Here are some questions to ask when you're trying to decide whether an
  abbreviation is OK:
  
  - Does the abbreviation appear more commonly than the expanded form?
  - Is my audience (i.e. those reading the code) likely to be familiar with 
    the abbreviation?
  - Will I save a lot of typing by abbreviating?
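Here is the earlier ``mdesc`` example wrapped in a function so it can actually be run (the function name is hypothetical). Python raises no error: the misspelled name simply becomes a new local variable, and the data that was supposed to be erased survives.

```python
def erase_metabolite_data(erase_previous_data):
    mdesc = [1, 2, 3]
    if erase_previous_data:
        mdsc = None  # typo: binds a brand-new name and leaves mdesc untouched
    return mdesc
```

Calling ``erase_metabolite_data(True)`` returns ``[1, 2, 3]``, not ``None``. With one unabbreviated spelling to remember, there is less room for this kind of slip.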
    
- Don't be shy about using parentheses to make operator precedence explicit.
  For example:

  This works:
  

#!python  
     z = something * PI - something_else / FUDGE_FACTOR
  

#!rst
  This works and makes your intent clear:
  

#!python
     z = (something * PI) - (something_else / FUDGE_FACTOR)
  

#!rst
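Precedence rules can surprise even experienced programmers. In Python, for instance, the power operator binds more tightly than a unary minus on its left, so ``-2 ** 2`` is -4, not 4. Parentheses leave no doubt about which one you meant:

```python
# ** binds more tightly than a unary minus on its left,
# so these two expressions are the same...
negative_square = -2 ** 2
explicit_negative_square = -(2 ** 2)

# ...while this one squares a negative number.
square_of_negative = (-2) ** 2
```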
- Don't put redundant information in names. For instance, in a Person class it
  is unnecessary to call the attributes ``person_name``, ``person_address``,
  etc. Simply use ``name`` and ``address`` instead. Similarly, if a file is
  part of the Analysis project, there's no reason to name the file
  ``analysis_utilities.py``. Just ``utilities.py`` will suffice.

  As a bonus, the simpler name will still make sense if the project is renamed
  or merged with another project.
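A minimal sketch of the Person example (the class, its attributes and the sample values are hypothetical): the class name already supplies the context, so the attribute names don't need to repeat it.

```python
class Person(object):
    def __init__(self, name, address):
        # Not person_name / person_address: "person" is implied by the class.
        self.name = name
        self.address = address


author = Person("Ada Lovelace", "London")
# Reads naturally as author.name, rather than the stuttering author.person_name
```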

- All of our source code should be straight ASCII. Be careful about copying &
  pasting text from MS Word that contains curly quotes or em/en dashes.

  If you're ever confronted with a choice of non-ASCII encoding, choose
  UTF-8.
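If you suspect a file has picked up curly quotes or dashes from a word processor, a quick check like the one below will find them. It's a sketch: the function name is hypothetical, and it expects text that has already been read and decoded.

```python
def find_non_ascii(text):
    """Return (line, column, character) for every non-ASCII character."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), 1):
        for column, character in enumerate(line, 1):
            if ord(character) > 127:  # ASCII stops at code point 127
                problems.append((line_number, column, character))
    return problems
```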

- Always use / as the path separator. Microsoft operating systems accept both 
  \\ and / (since DOS 2.0 `according to this discussion
  <http://bytes.com/groups/python/23123-when-did-windows-start-accepting-forward-slash-path-separator>`_).
  It's only the DOS command line that hiccups on /. By contrast, backslash as a
  path separator only works under Windows and is an escape character in Python
  strings.
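The escape-character problem is easy to demonstrate. In the first (made-up) path below, ``\t`` and ``\n`` are interpreted as a tab and a newline, so the string silently contains characters nobody intended; the forward-slash version has no such trap.

```python
# Backslashes: "\t" becomes a tab and "\n" becomes a newline.
broken_path = "c:\temp\new_results.txt"

# Forward slashes mean the same thing to Windows and contain no escapes.
good_path = "c:/temp/new_results.txt"
```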

- If you come across (or write) some code that is or may be broken, fix it. If
  the fix isn't obvious or you don't have time, add a comment containing the
  string FIXME (no space!) and a brief explanation of what you think is wrong.
  For example:
  

#!python    
    if film == HOLY_GRAIL:
        bring_out_your_dead()
    elif film == LIFE_OF_BRIAN:
        look_on_bright_side()
    elif film == HOLLYWOOD_BOWL:
        albatross()
    # FIXME - need an else statement; how to handle unexpected cases?
  

#!rst

C and C++
---------

C++ coding standards are covered in detail on the CppCodingStandards page.


#!rst

Python 
------

- `Duck typing <http://en.wikipedia.org/wiki/Duck_typing>`_ is an important 
  and valuable concept in Python that can feel strange if
  you're used to statically typed languages.
    
- The corollary: if you find yourself using ``type()`` or ``isinstance()``,
  that's usually a sign of unPythonic code.
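A minimal sketch of duck typing (the function is hypothetical): instead of checking what kind of object it was given, the function simply uses the one behavior it needs, which is iteration over strings.

```python
def count_words(lines):
    """Count whitespace-separated words in any iterable of strings."""
    # No type() or isinstance() check: all we need is to iterate and get
    # strings back, so any object that supports that will do.
    total = 0
    for line in lines:
        total += len(line.split())
    return total
```

Lists, tuples, generators and files opened in text mode all work without any change to the function.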


- Our project will require a minimum Python version of 2.5, so any language
  features (like the conditional expression ``x if condition else y``,
  sometimes called the ternary operator) or standard library modules (like
  ``sqlite3`` or ``ctypes``) that are in 2.5 are fair game.
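For example, both of the following rely only on features new in 2.5: the conditional expression (PEP 308) and the ``sqlite3`` module. The function, table and values are hypothetical.

```python
import sqlite3


def describe_peak(height, threshold):
    # Conditional expression: value_if_true if condition else value_if_false
    return "significant" if height > threshold else "noise"


# sqlite3 ships with Python; ":memory:" creates a throwaway database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE metabolites (name TEXT, ppm REAL)")
connection.execute("INSERT INTO metabolites VALUES (?, ?)", ("NAA", 2.01))
rows = connection.execute("SELECT name, ppm FROM metabolites").fetchall()
connection.close()
```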

- If you're new to Python, use an editor with decent syntax highlighting so
  that it flags Python keywords and built-in names (such as ``list``, ``id`` or
  ``file``) when you use them as variable names.
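Using a keyword such as ``class`` or ``import`` as a variable name fails immediately with a SyntaxError. Shadowing a built-in is legal, though, so the mistake only shows up later, somewhere else. A minimal illustration (the function name is hypothetical):

```python
def shadowing_breaks_builtin():
    list = [3, 1, 2]  # legal, but hides the built-in list() in this function
    try:
        list("abc")  # meant to call the built-in; actually calls a list object
    except TypeError:  # 'list' object is not callable
        return True
    return False
```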

- `PEP 8 <http://www.python.org/dev/peps/pep-0008/>`_
  is worth following. The main
  things to remember are CamelCase for class names and lower_with_underscores
  for variable names. Filenames should be all lower case since the filesystems
  on some of our target operating systems are not case-sensitive.

  Note that PEP 8 observes, "The naming conventions of Python's library are a
  bit of a mess...". It's true! The standard library is unfortunately not always
  a good example to follow.

  `PEP 20 <http://www.python.org/dev/peps/pep-0020/>`_ is also worth a 
  read as it's really short.
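A short sketch of PEP 8's naming conventions side by side; every name here is hypothetical. PEP 8 also recommends upper case with underscores for module-level constants.

```python
# This code would live in a module file named peak_fitter.py (all lower case).
DEFAULT_LINE_WIDTH_HZ = 5.0  # constant: UPPER_CASE_WITH_UNDERSCORES


class PeakFitter(object):  # class: CamelCase
    def __init__(self, line_width_hz=DEFAULT_LINE_WIDTH_HZ):
        # attribute: lower_with_underscores
        self.line_width_hz = line_width_hz

    # method and argument names: lower_with_underscores
    def count_peaks_above(self, heights, threshold):
        return len([height for height in heights if height > threshold])
```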
  
- Never use the idiom ``from some_package import *``. It has a couple of
  disadvantages. For one, it clutters up your local namespace and can even lead
  to one module stepping on another's variables.

  The other huge disadvantage is that it makes one's code difficult to read. If
  the code imports * from, say, five modules and then calls a function
  ``foo()``, the person reading the code has to guess if the function is local,
  and if not, then which one of the five imported modules contains it.

  This is also true to a lesser extent for ``from some_package import xyz``
  where xyz is a function. If I see a call to ``xyz()`` in the code, I have to
  look around to see whether it is a local function or an imported one. By
  contrast, when I see ``some_package.xyz()`` in the code, I know exactly where
  that function comes from.

  If you find that you're importing some package with an inconveniently long
  name, make use of Python's ``as`` keyword:
  

#!python
   import xml.etree.ElementTree as ElementTree
  

#!rst
  Be mindful of creating obscure abbreviations, however:
  

#!python  
   import some_complicated_math_library.curves.splines as sp
  

#!rst  
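The namespace problem bites even with the standard library. The ``os`` module, for example, defines its own ``open()``, which returns a low-level file descriptor instead of a file object, and it's among the names a star import pulls in. You can confirm that without doing the star import itself:

```python
import os

# "from os import *" imports every name listed in os.__all__. That list
# includes "open", so a star import would silently replace the built-in
# open() in the importing module with os.open(path, flags).
would_shadow_open = "open" in os.__all__
```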

- Python booleans are True and False, not 1 and 0. Be aware of this when you're
  porting code from languages that traditionally use integers for truth
  values, such as IDL or C (which had no boolean type until C99 added
  ``_Bool``). Code in those languages usually uses 1 and 0 to represent true
  and false. (C++ has a native ``bool`` type.)

  Note that it's OK to treat 1 and 0 as booleans in expressions; just don't
  *assign* 1 or 0 where a boolean is meant.

  For instance, if a value (returned by a C function, say) is 1 or 0, it is
  perfectly acceptable to do this:
  

#!python
    if some_c_library.function_that_returns_one_or_zero():
       do_something()
  

#!rst
  It would be unPythonic, however, to do this:
  

#!python
    def on_foo_checkbox_clicked(self):
        self.foo_is_on = 1  # should be True, not 1
  

#!rst
  As a specific application of duck typing, it's usually unPythonic to 
  explicitly test for True and
  False. Note that all of these evaluate to False:
  

#!python
        bool(None)
        bool("")    # empty string
        bool([])    # empty list
        bool(())    # empty tuple
        bool({})    # empty dict
        bool(0)
  

#!rst
  All of these evaluate to True:
  

#!python
        bool(n)   # n is a non-zero number
        bool(s)   # s is a non-empty string
        bool(z)   # z is a non-empty sequence (tuple or list)
        bool(m)   # m is a non-empty mapping (dict)
        bool(o)   # o is an object whose class defines neither __len__ nor
                  # __nonzero__ (called __bool__ in Python 3)
  

#!rst
  Historical note: ``True`` and ``False`` were added as built-in constants in
  Python 2.2.1, and the ``bool`` type itself arrived in 2.3 (PEP 285), so you
  might see some Python code, especially library code that must remain
  compatible with very old versions, using 1 and 0 instead of True and False.
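Here's why explicit comparisons can go wrong. ``True == 1``, so comparing a count to ``True`` only succeeds when the count is exactly 1, whereas a plain truth test does what was meant. The function name is hypothetical:

```python
def found_matches(match_count):
    # Explicit comparison: True == 1, so this is true only when the count is 1.
    explicit = (match_count == True)
    # Truth test: true for any non-zero count, which is what was intended.
    implicit = bool(match_count)
    return explicit, implicit
```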

- To prepare for Python 3.0, we need to `explicitly use "true" 
  division <http://www.python.org/doc/2.2.3/whatsnew/node7.html>`_.

  In order to do so, we need to add this to every module that uses division:
  

#!python  
    from __future__ import division
  

#!rst  
  And then we need to review the use of division in those modules 
  to ensure we're not breaking them.

  We can either pay this cost now, or pay it later when we want to move to
  Python 3 and there's a lot more code to review and fix.
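What changes is the meaning of ``/``. Under Python 3, or Python 2 with the ``__future__`` import, ``/`` always performs true division, and ``//`` is the operator to use when you really want floor division. A sketch with hypothetical function names:

```python
def mean(values):
    # True division: mean([3, 4]) is 3.5. Under Python 2's old rules,
    # without the __future__ import, 7 / 2 would have given 3.
    return sum(values) / len(values)


def middle_index(values):
    # Floor division, written explicitly, for when an integer is required.
    return len(values) // 2
```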


- Python 2.2 introduced improved classes; these are called (rather
  unfortunately) "new"-style classes. Old-style classes are gone completely
  in Python 3. Our classes should always be new-style classes. To create a
  new-style class, inherit from ``object``. For example, write this:
  

#!python
    class TransformThingy(object):
        pass
  

#!rst
  not this:
  

#!python
    class TransformThingy():
        pass
  

#!rst
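One practical difference: in Python 2, ``super()`` works only with new-style classes, so a hierarchy rooted in ``object`` can use it to call the parent class's methods. A minimal sketch with hypothetical classes:

```python
class Measurement(object):
    def __init__(self, value, units):
        self.value = value
        self.units = units


class Frequency(Measurement):
    def __init__(self, value):
        # super() requires a new-style class in Python 2; with an old-style
        # class this call raises a TypeError.
        super(Frequency, self).__init__(value, "Hz")
```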

- Python has the identity operator "is". It means "are these objects the same
  object" rather than "are they equivalent". The only time you'll probably need
  to use it is when comparing something to None.
  

#!python
       if foo is None:
           do_something()
  

#!rst

  Since we prefer simple boolean tests, needing to check explicitly for None
  (as opposed to any false value) might indicate a problem somewhere upstream.
  Where possible, this is better:
  

#!python
      if not foo:
         do_something()
  

#!rst

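  Sometimes an explicit test for None is unavoidable, however: for instance,
  when 0 or an empty list is a legitimate value that has to be told apart from
  "no value at all".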

  In short, the admonition against "is" is similar to that against 
  ``isinstance()``, although less strong. If you find yourself using it, it's 
  often a sign of a design flaw.
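A common case where ``is None`` is unavoidable is an optional argument for which an empty container is a legitimate value. In this hypothetical function, ``if not readings:`` would quietly replace the caller's empty list with a new one, and the caller would never see the appended value:

```python
def append_reading(reading, readings=None):
    # "is None" separates "no list was passed" from "an empty list was passed".
    if readings is None:
        readings = []
    readings.append(reading)
    return readings
```

The ``None`` default also sidesteps the pitfall of a mutable default argument such as ``readings=[]``, which would be shared between calls.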


- Don't underestimate what you can learn from testing concepts in the Python
  interpreter. For instance, if you can't remember the rules for taking a
  slice of a string from the end, try it out at the interactive prompt:
  

{{{
#!python
$ python
Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "abcde"[:-2]
'abc'
>>>

}}}
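A few related slices are worth trying at the prompt too. Here they are as a short script; the comments show what each one evaluates to:

```python
word = "abcde"

all_but_last_two = word[:-2]  # "abc"
last_two = word[-2:]          # "de"
every_other = word[::2]       # "ace"
reversed_word = word[::-1]    # "edcba"
```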