Abstract
Clustering of cumulative grainsize distribution curves was trialled with the publicly available software program CLARA as a means of finding sediment samples, and geographical areas or parts of the geological record, with highly similar particle-size characteristics. CLARA proved effective for this purpose. Tests were made with large datasets from the shallow-marine environments of Sydney Harbour (Australia), Oronsay (Inner Hebrides, Scotland), and Darss Sill (Baltic Sea). CLARA has four possible configurations, depending on the choice of distance metric and on whether the variables are standardised. One configuration identified outliers and small groups of samples most dissimilar from the others, a very useful function. A second configuration clustered cumulative curves in a geometrical fashion similar to manual clustering. Compared with CLARA, an entropy algorithm was several orders of magnitude slower and did not identify outliers. When smaller numbers of clusters were requested, cumulative curves with strongly opposite curvature were grouped together by the entropy algorithm, and also by CLARA for non-standardised (but not standardised) variables. For CLARA, this potential problem is removed by initially forming more clusters than statistical methods suggest, in conjunction with outlier detection and removal. Entropy and CLARA clustering of frequency distributions provided the best resolution of size modes and the best separation of overlapping size modes, but clustering of cumulative curves provided better overall groupings. However, CLARA is not suitable for direct clustering of all conceivable frequency distributions, a problem that does not occur with the corresponding cumulative curves.
Acknowledgements
I thank Carme Hervada i Sala for providing the Darss Sill dataset. Stuart Anstee, Adrian Baddeley, Brendan Brooke, Andrew Heap and Alan Orpin made suggestions that improved the original manuscript. Alan Orpin informed me of the Australian references to entropy clustering of frequency distributions.
Notes
*Figures 10–12 [indicated by an asterisk (*) in the text and listed at the end of the paper] are Supplementary Papers; copies may be obtained from the Geological Society of Australia's website (www.gsa.org.au) or from the National Library of Australia's Pandora archive (http://nla.gov.au/nla.arc-25194).