Generally speaking, classification is the process of producing a logical system of categories that allows easier access to its components (Stace 1989). The aim of classification in biology is to group organisms into a hierarchical structure on the basis of shared homologous characters, in such a way as to reflect their evolutionary history. Classification draws on all available evidence, which can include molecular, chemical, ultrastructural and behavioural data that are not visible to the casual observer. Unfortunately, the term classification is often also used for the process of establishing the identity of an object, which in biology is strictly called identification (the naming of an organism by reference to an existing classification).
ADIAC is, of course, about identification: the process whereby the identity of an organism is established solely on the basis of the information available at the time, which in the case of ADIAC means valve morphology as seen through a light microscope. Chemistry, molecular sequences, life history and even ultrastructure therefore play no part in identification in this context, and evolutionary history is irrelevant. What matters is the identity of the diatom, established by whatever means are available. In this sense identification transcends classification, since the identity of a diatom exists independently of the classification of which it is a part. So how do diatomists identify diatoms, and what can we learn from the process for ADIAC?
Diatomists use several methods to identify diatoms. The first is simply to match an unknown diatom to a picture of a known one. This method is very widely used, but it actually requires considerable understanding of how variation works, since no specimen will look exactly like the reference picture and differences in shape and size have to be interpreted. The second method is to work through a dichotomous key (equivalent to a decision tree in computing terms), consisting of a series of either/or questions about the morphology of the unknown diatom. Except at the start of the key, the pair of alternatives presented at each step depends on the answer to the previous pair. For example, if the first dichotomy is "raphe present/raphe absent", any subsequent question about the position of the raphe only makes sense if the first question was answered "raphe present". This is potentially a highly efficient means of identification, requiring on average many fewer decision steps than simple matching, but in practice there are many problems. A mistake early in the decision tree will inevitably lead to a wrong identification. Mistakes can result from misobservation or misinterpretation, from the key itself (if it is poorly constructed, i.e. illogical), or from the innate variability of biological organisms. In practice, most diatomists identify diatoms using a combination of picture matching and character matching (against descriptions of dimensions, etc.), and this is overall the most effective approach: identifications are made on the basis of the best match across characters, and disagreement in a single character does not necessarily invalidate the identification. This is the approach taken in multi-access keys, in which each character is numbered in a table with a list of coded character states.
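Because a dichotomous key is equivalent to a decision tree, the procedure can be sketched in a few lines of Python. This is a minimal illustration, not a real diatom key: the `Couplet` structure, the `walk_key` function and the taxon names "Taxon A" to "Taxon D" are hypothetical placeholders, and only the raphe couplets echo the example given above. The sketch shows both the economy of the method (an identification consults only the couplets on a single path) and its weakness (one misobservation at the first couplet sends the search down the wrong branch).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Couplet:
    """One either/or step of a dichotomous key (hypothetical structure)."""
    statement: str   # the character state being tested, e.g. "raphe present"
    if_true: Node    # next couplet, or a taxon name, if the statement holds
    if_false: Node   # next couplet, or a taxon name, if it does not


# A node is either another couplet or a leaf holding a taxon name.
Node = Union[Couplet, str]


def walk_key(node: Node, observations: dict[str, bool]) -> str:
    """Follow exactly one branch per couplet until a taxon name is reached."""
    while isinstance(node, Couplet):
        # Only couplets on the chosen path are consulted, so a wrong answer
        # early on leads irrecoverably into the wrong part of the key.
        node = node.if_true if observations[node.statement] else node.if_false
    return node


# Hypothetical key; the taxon names are placeholders, not real diatom taxa.
KEY = Couplet(
    "raphe present",
    if_true=Couplet(
        "raphe runs the full length of the valve",  # only asked if a raphe is present
        if_true="Taxon A",
        if_false="Taxon B",
    ),
    if_false=Couplet(
        "valve outline circular",
        if_true="Taxon C",
        if_false="Taxon D",
    ),
)

if __name__ == "__main__":
    specimen = {"raphe present": True, "raphe runs the full length of the valve": True}
    print(walk_key(KEY, specimen))  # Taxon A
    # The same specimen with the raphe overlooked at the first couplet:
    misread = {**specimen, "raphe present": False, "valve outline circular": False}
    print(walk_key(KEY, misread))   # Taxon D
```

In this toy key two questions suffice to reach any of four taxa, whereas simple matching would compare the specimen against every one of them; the misread specimen, however, is confidently assigned to the wrong taxon.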
The unknown diatom is scored for each character, producing a multi-digit code with one digit per character scored; each digit is the code for the state of the character corresponding to that digit's position. For example, the "2" in the code 00241 signifies character 3, character state 2 (which, by reference to the table, could mean "raphe running the full length of the valve"). The numeric code for the unknown diatom is then looked up in the key's index, which gives the name of the diatom or diatoms with that code. If a character cannot be scored (or is scored incorrectly), an identification can still be made provided the number of hits on the other characters is large enough.
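The scoring and look-up steps of a multi-access key can be sketched as follows. This is a minimal sketch under stated assumptions: the codes in `INDEX`, the placeholder taxon names, the `?` marker for an unscored character and the `min_hits` threshold are all hypothetical, and real keys may rank candidates differently (for example by weighting some characters more heavily). The sketch tries an exact index look-up first and otherwise falls back to counting hits, so that a specimen with one character unscored or mis-scored can still be identified.

```python
from __future__ import annotations

# Hypothetical index of a multi-access key: five characters, one digit per
# character state. Taxon names are placeholders, not real diatom taxa.
INDEX: dict[str, list[str]] = {
    "00241": ["Taxon A", "Taxon E"],  # two taxa can share the same code
    "00231": ["Taxon B"],
    "10112": ["Taxon C"],
    "11102": ["Taxon D"],
}
UNSCORED = "?"  # assumed marker for a character that could not be scored


def state_of(code: str, character: int) -> str:
    """Characters are numbered from 1; the digit at that position is the state."""
    return code[character - 1]


def hits(code: str, reference: str) -> int:
    """Count scored characters whose state agrees with a reference code."""
    return sum(1 for mine, ref in zip(code, reference)
               if mine != UNSCORED and mine == ref)


def identify(code: str, min_hits: int = 4) -> list[str]:
    """Exact index look-up first; otherwise return the best partial match."""
    if code in INDEX:  # a code containing "?" can never match exactly
        return list(INDEX[code])
    scores = {ref: hits(code, ref) for ref in INDEX}
    best = max(scores.values())
    if best < min_hits:
        return []  # too few hits to support any identification
    return sorted(taxon
                  for ref, score in scores.items() if score == best
                  for taxon in INDEX[ref])


if __name__ == "__main__":
    print(state_of("00241", 3))  # 2
    print(identify("00?41"))     # ['Taxon A', 'Taxon E']  (character 3 unscored)
    print(identify("00341"))     # ['Taxon A', 'Taxon E']  (character 3 mis-scored)
```

Here `state_of("00241", 3)` returns `"2"`, matching the worked example in the text, while a code with too few usable characters, such as `identify("1????", min_hits=3)`, returns an empty list rather than a doubtful name. The fixed `min_hits` threshold is only one possible way of deciding when the number of hits is "large enough".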