Australian Journal of Earth Sciences
An International Geoscience Journal of the Geological Society of Australia
Volume 54, 2007 - Issue 4
Original Articles

Clustering of cumulative grainsize distribution curves for shallow-marine samples with software program CLARA*

Pages 503-519 | Received 12 Aug 2005, Accepted 23 Aug 2006, Published online: 18 Jun 2007
 

Abstract

Clustering of cumulative grainsize distribution curves was trialled with the publicly available software program CLARA as a means of finding sediment samples and geographical areas or parts of the geological record with highly similar particle-size characteristics. CLARA proved effective for this purpose. Tests were made with large datasets from the shallow-marine environments of Sydney Harbour (Australia), Oronsay (Inner Hebrides, Scotland), and Darss Sill (Baltic Sea). CLARA has four possible configurations depending on choices of distance metric and standardisation. One configuration identified outliers and small groups of samples most dissimilar from others, a very useful function. A second configuration clustered cumulative curves in a geometrical fashion similar to manual clustering. Compared to CLARA, an entropy algorithm was several orders of magnitude slower and did not identify outliers. When smaller numbers of clusters were requested, cumulative curves with strongly opposite curvature were grouped by the entropy algorithm, and by CLARA for non-standardised (but not standardised) variables. This potential problem is removed for CLARA by initially forming more clusters than suggested by statistical methods, in conjunction with outlier detection and removal. Entropy and CLARA clustering of frequency distributions provided the best resolution of size modes, and best separation of overlapping size modes, but clustering of cumulative curves provided better overall groupings. However, CLARA is not suitable for direct clustering of all conceivable frequency distributions, a problem not occurring with the corresponding cumulative curves.
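The sampling-based medoid clustering that the abstract refers to can be illustrated with a minimal sketch. This is not the CLARA program itself. It assumes that the two distance metrics are Manhattan and Euclidean and that standardisation means centring each variable on its mean and scaling by its mean absolute deviation; the abstract names neither. All function names, the default sample size and the toy cumulative curves (percent finer at four size classes) are hypothetical and serve only as illustration.

```python
import random

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def standardise(rows):
    """Centre each variable on its mean and scale by its mean absolute deviation (assumed scheme)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant variable has zero deviation; fall back to 1.0 to avoid dividing by zero.
    mads = [sum(abs(v - m) for v in c) / len(c) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, mads)] for r in rows]

def total_cost(rows, medoids, dist):
    """Sum of distances from every row to its nearest medoid."""
    return sum(min(dist(r, rows[m]) for m in medoids) for r in rows)

def pam(rows, k, dist, rng):
    """Simple swap-based k-medoids; returns medoid indices into rows."""
    medoids = rng.sample(range(len(rows)), k)
    best = total_cost(rows, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(rows)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(rows, trial, dist)
                if cost < best:  # accept only strict improvements, so the loop terminates
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(rows, k, dist=manhattan, n_samples=5, sample_size=None, seed=0):
    """Cluster several random subsamples and keep the medoids that fit the full dataset best."""
    rng = random.Random(seed)
    size = min(len(rows), sample_size or 40 + 2 * k)  # hypothetical default sample size
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        idx = rng.sample(range(len(rows)), size)
        local = pam([rows[i] for i in idx], k, dist, rng)
        medoids = [idx[m] for m in local]        # map sample indices back to the full dataset
        cost = total_cost(rows, medoids, dist)   # judge the medoids on all rows, not just the sample
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = [min(range(k), key=lambda j: dist(r, rows[best_medoids[j]])) for r in rows]
    return best_medoids, labels

# Toy cumulative curves: five fine-grained and five coarse-grained samples.
fine = [[10 + d, 40 + d, 80 + d, 100] for d in range(5)]
coarse = [[0, 5 + d, 20 + d, 100] for d in range(5)]
curves = fine + coarse

medoids, labels = clara(curves, k=2)                                       # Manhattan, non-standardised
_, labels_std = clara(standardise(curves), k=2, dist=euclidean)            # Euclidean, standardised
```

Under these assumptions, the four configurations mentioned in the abstract correspond to choosing `manhattan` or `euclidean` and passing either the raw curves or `standardise(curves)`. Because only a subsample is clustered at each step while the cost is evaluated on all samples, the approach stays practical for large datasets.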

Acknowledgements

I thank Carme Hervada I Sala for providing the Darss Sill dataset. Stuart Anstee, Adrian Baddeley, Brendan Brooke, Andrew Heap and Alan Orpin made suggestions which improved the original manuscript. Alan Orpin informed me of the Australian references to entropy clustering of frequency distributions.

Notes

*Figures 10 – 12 [indicated by an asterisk (*) in the text and listed at the end of the paper] are Supplementary Papers; copies may be obtained from the Geological Society of Australia's website (www.gsa.org.au) or from the National Library of Australia's Pandora archive (http://nla.gov.au/nla.arc-25194).
