Data Assets for Australian Biodiversity

IBRA IMCRA Observations
Overview of the proportion of species occurrence records in each IBRA and IMCRA region and each time period that derive from human observations (including field surveys, citizen science, etc.) in contrast to specimen-based data. These changes are important when interpreting time series.

Public access to data for environmental assessment

EcoAssets has been working with the authors of the 2021 State of the Environment (SOE) report to develop new insights into Australia’s environment and biodiversity, based on the work of three of the NCRIS national research infrastructures: the Atlas of Living Australia (ALA), the Integrated Marine Observing System (IMOS) and the Terrestrial Ecosystem Research Network (TERN). These infrastructures have partnered with the Australian Research Data Commons (ARDC) to deliver an example of best-practice cross-domain data integration.

EcoAssets organised two primary categories of data to assist the SOE authors:

  1. Biodiversity: Evidence of the occurrence and distribution of species throughout Australia since 1900
  2. Monitoring: Summary information on environmental observation and monitoring effort since 2016

This website will host public versions of these data assets, which will subsequently be updated and versioned at intervals to support longer time-series analyses. The first public release is for biodiversity data summarising recorded presence of any species in Australia’s terrestrial and marine regions. A subsequent release will share the monitoring summary data, and other data assets may be added in the following months.

Data for biodiversity assessments

TERN and IMOS each collect a wide range of data on ecological communities and the species that they comprise, such as vegetation and faunal surveys, data on the movements of fish and other marine animals, and standardised monitoring of reef life. EcoAssets uses the ALA’s biodiversity data platforms to integrate these rich data sources with thousands of other datasets on Australian biodiversity, including natural history collections, citizen science efforts and ecological research data. All of these sources make assertions that a particular species was detected at a specified time and place. Such data represent our shared knowledge of the composition of biodiversity in time and space.

This data aggregation is a treasure trove for understanding variation in species distributions and community composition across different regions and how these change over time. Researchers can approach these data to explore a nearly infinite number of questions. Some facets of these data are of particular importance for large-scale environmental assessments such as SOE, especially in regard to trends for threatened species (those that are of highest conservation concern) and introduced species (one of the most significant drivers of biodiversity change) and in regard to how well Australia’s protected area system is preserving the country’s rich biodiversity. The EcoAssets team therefore used ALA tools and national reference datasets to annotate every species occurrence record with the following properties:

  • EPBC threatened species status (one of Extinct, Critically Endangered, Endangered, Conservation Dependent, Vulnerable or Not listed)
  • GRIIS introduced species status (one of Invasive, Introduced or Native)
  • Bioregion where the species was recorded – using the IBRA and IMCRA bioregionalisations
  • CAPAD protected area category (simplified: one of Indigenous-managed Protected Area, Other Protected Area or Outside Protected Areas)
  • Forests of Australia 2013 and Forests of Australia 2018 status for each locality
  • Basis of record for each occurrence (important for understanding changes in how biodiversity has been recorded over time – see below)

EcoAssets has processed all available species occurrence data as counts of records that share the same species, coordinates and year of occurrence, and then has annotated every one of these counts with the listed properties. Since these data include exact locations for sensitive species, it is not possible to share this complete dataset publicly. However, summaries of these data can indicate variation and changes in the frequency of records matching any combination of these properties. For example, it is possible to determine whether the set of species recorded each year in every bioregion increasingly includes a larger or smaller proportion of introduced species.
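The aggregation and summary steps described above can be sketched roughly as follows. This is a minimal illustration only, not the EcoAssets pipeline: the tuple layout, the function names and every sample value are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical occurrence records, invented for illustration:
# (species, latitude, longitude, year, bioregion, GRIIS status)
records = [
    ("Vulpes vulpes", -35.28, 149.13, 2019, "South Eastern Highlands", "Invasive"),
    ("Vulpes vulpes", -35.28, 149.13, 2019, "South Eastern Highlands", "Invasive"),
    ("Macropus giganteus", -35.30, 149.10, 2019, "South Eastern Highlands", "Native"),
    ("Macropus giganteus", -35.30, 149.10, 2020, "South Eastern Highlands", "Native"),
    ("Dacelo novaeguineae", -35.31, 149.12, 2020, "South Eastern Highlands", "Native"),
]


def aggregate(records):
    """Count records that share species, coordinates, year and annotations."""
    return Counter(records)


def introduced_species_share(counts):
    """For each (bioregion, year), return the fraction of distinct species
    whose GRIIS status is anything other than 'Native'."""
    species_seen = defaultdict(set)
    introduced_seen = defaultdict(set)
    for (species, _lat, _lon, year, bioregion, griis), _n in counts.items():
        key = (bioregion, year)
        species_seen[key].add(species)
        if griis != "Native":
            introduced_seen[key].add(species)
    return {
        key: len(introduced_seen[key]) / len(species)
        for key, species in species_seen.items()
    }


counts = aggregate(records)
share = introduced_species_share(counts)
```

Because the summary keeps only bioregion, year and status, it can be shared without exposing the exact coordinates held in the aggregated counts.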

The primary biodiversity data asset offered by EcoAssets is Aggregated Data: Australian Species Occurrences. This includes all data required for faceted exploration of Australian species occurrence data using the properties listed above. Six further data assets are also offered, each illustrating a use of these data.

The Summary Data: Threatened Species Occurrences by Terrestrial Ecoregion and Summary Data: Threatened Species Occurrences by Marine Ecoregion datasets summarise 1) the number of species occurrence records and 2) the number of distinct species recorded from each category in the EPBC threatened species lists in each bioregion in the first 70 years of the 20th century and then in each subsequent five-year period.

Similarly, the Summary Data: Introduced Species Occurrences by Terrestrial Ecoregion and Summary Data: Introduced Species Occurrences by Marine Ecoregion datasets summarise 1) the number of species occurrence records and 2) the number of distinct species recorded from each category in the GRIIS introduced and invasive species list in each bioregion in the first 70 years of the 20th century and then in each subsequent five-year period.
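The time binning used by these summary datasets might look something like the sketch below. It assumes that "the first 70 years of the 20th century" means 1900 to 1969 and that five-year periods start at 1970; the exact boundaries in the published datasets may differ, and the function names and row layout are hypothetical.

```python
from collections import defaultdict


def period_label(year, first_period_end=1969, bin_width=5):
    """Map a year to '1900-1969' or to a five-year period such as '1970-1974'.

    The boundary year 1969 is an assumption, not taken from the datasets."""
    if year < 1900:
        raise ValueError("records before 1900 are outside the described range")
    if year <= first_period_end:
        return f"1900-{first_period_end}"
    first_bin_start = first_period_end + 1
    start = first_bin_start + ((year - first_bin_start) // bin_width) * bin_width
    return f"{start}-{start + bin_width - 1}"


def summarise(rows):
    """rows: iterable of (species, bioregion, status, year, record_count).

    Returns {(bioregion, status, period): (total records, distinct species)}."""
    record_totals = defaultdict(int)
    species_sets = defaultdict(set)
    for species, bioregion, status, year, n in rows:
        key = (bioregion, status, period_label(year))
        record_totals[key] += n
        species_sets[key].add(species)
    return {key: (record_totals[key], len(species_sets[key])) for key in record_totals}


# Invented example rows with placeholder species and region names.
rows = [
    ("Species A", "Region X", "Endangered", 1955, 3),
    ("Species A", "Region X", "Endangered", 1968, 1),
    ("Species B", "Region X", "Endangered", 1972, 2),
    ("Species A", "Region X", "Endangered", 1974, 4),
]
summary = summarise(rows)
```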

Finally, the Summary Data: Protection Status for Australian Terrestrial Species Occurrences and Summary Data: Protection Status for Australian Marine Species Occurrences datasets list all species recorded in each bioregion along with their EPBC threatened species status and counts of 1) the total number of records of the species within the region, 2) the number of records from within protected areas inside the region and 3) the number of records from within protected areas under Indigenous management. These data may indicate how well the protected areas in each region represent the biodiversity of the region, and in particular how important the protected area system appears to be for different species. Comparing the counts for individual species within a region with the totals for all species in the region may assist with compensating for spatial and taxonomic unevenness. For more fine

Data characteristics

Inevitably, any aggregation of many millions of records from a diverse range of sources will include a proportion of errors (particularly misidentifications of species and misinterpretations of verbatim data records) and will be uneven in coverage, completeness and methodology. EcoAssets seeks to compensate for this in several ways:

  • Records are excluded if they lack a precise date or coordinates or if the coordinates are inconsistent with other geographical information.
  • Records are excluded if the organism was not identified at least to the rank of species. This results in a bias towards taxonomic groups that are well studied and for which most Australian species are named, but this bias will be consistent across regions and time periods.
  • The animated image at the top of this post shows how our evidence for the occurrence of different species has changed over time. Most of our data from much of the 20th century comes from museum specimens, whereas more recently the bulk of data is from “human observations” from ecologists, field monitoring efforts and increasingly citizen science. This shift has certainly changed the likelihood that some species will be recorded. Some species are typically only recorded by specialists working in collections. Some common species may be proportionately under-recorded in collections. The Aggregated Data: Australian Species Occurrences dataset separates records using the basis of record element as one of the included facets. This should allow users to make judgements on whether to include all data or only a subset of categories or alternatively to include overall variation in basis of record (as shown in the image) as an input to any analysis.
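As a rough sketch of how a user might apply these rules, the example below drops unusable records and then computes the share of human observations per year. The dictionary keys, the basis-of-record labels and the sample records are all assumptions for illustration. The check that coordinates agree with other geographical information is not implemented here.

```python
from collections import defaultdict

# Assumed label for observation-based records; the actual values in the
# dataset may be spelled differently.
HUMAN_OBSERVATION = "HUMAN_OBSERVATION"


def is_usable(record):
    """Simplified version of the exclusion rules: require a year,
    coordinates and an identification at species rank."""
    has_date = record.get("year") is not None
    has_coords = (
        record.get("latitude") is not None and record.get("longitude") is not None
    )
    at_species_rank = record.get("rank") == "species"
    return has_date and has_coords and at_species_rank


def human_observation_share(records):
    """Return {year: fraction of usable records that are human observations}."""
    totals = defaultdict(int)
    human = defaultdict(int)
    for r in records:
        if not is_usable(r):
            continue
        totals[r["year"]] += 1
        if r["basis_of_record"] == HUMAN_OBSERVATION:
            human[r["year"]] += 1
    return {year: human[year] / totals[year] for year in totals}


# Invented sample records (hypothetical field names and values).
base = {"latitude": -33.9, "longitude": 151.2, "rank": "species"}
sample = [
    {**base, "year": 1950, "basis_of_record": "PRESERVED_SPECIMEN"},
    {**base, "year": 2020, "basis_of_record": HUMAN_OBSERVATION},
    {**base, "year": 2020, "basis_of_record": "PRESERVED_SPECIMEN"},
    {**base, "year": 2020, "rank": "genus", "basis_of_record": HUMAN_OBSERVATION},
    {**base, "year": None, "basis_of_record": HUMAN_OBSERVATION},
]
observation_share = human_observation_share(sample)
```

A per-period share like this could be used either to restrict an analysis to one category of record or as a covariate that accounts for the shift from specimens to observations.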