AGU’s Paleoceanography and Paleoclimatology - Data Submission Quick Guide

Authors: Sarah Feakins, Pallavi Anand, Alex Farnsworth, John Higgins, Matthew Huber

Explanation of data preservation as part of journal publication

AGU’s Paleoceanography and Paleoclimatology adheres to the FAIR Guiding Principles for scientific data management and stewardship, established in 2016 to improve the Findability, Accessibility, Interoperability, and Reuse of data. The principles emphasise machine-accessibility, seeking to advance knowledge through reuse of data and reproducibility of results, and to enable efficient, interoperable re-analysis as data volumes increase.

In accordance with FAIR principles, AGU’s Paleoceanography and Paleoclimatology supports and generally requires the archiving of research data in a suitable repository. Depositing the research data in a repository is a necessary step in the submission process, so compliance with this requirement is part of the review and editorial process.

A paper will not generally be accepted until the data can be verified as accessible in a repository that supports discovery, preservation and citation. When identifying the most appropriate repository for your data, AGU suggests the following prioritization: a) a repository that specializes in the data for your scientific domain, as this will maximize the probability that the deposited data will be interoperable and reusable; b) a national repository, if required; c) an institutional repository or computing center adhering to federal standards; or d) a general repository.

A key requirement, which may not be met by all repositories, is that the data must be verifiably present before the paper is accepted. Some repositories offer access controls that allow the data to be viewed on a limited basis, which may be an important factor to consider when choosing a repository. In addition, please check funder guidelines for any specific additional requirements.

In parallel to these archival considerations, authors are required to submit data necessary to review, evaluate, and (if appropriate) reproduce the study results when the manuscript is submitted to Paleoceanography and Paleoclimatology. This can be done as supplemental material in the initial submission, but this supplemental material is never a substitute for archiving the data in a repository as a condition of acceptance.

Preferred option: In many cases it will be most efficient, upon first submission, to upload the data to a repository that allows revisions/versioning and has access controls (such as passwords), since this will reduce delays later on.

Non-preferred option: Authors can still submit their data as a supplement upon initial submission, but the paper will not be accepted until the data are in a repository. This may introduce delays during the revision and acceptance process.

AGU’s Paleoceanography and Paleoclimatology - Data Submission Checklist

Author checklist:

Check	Manuscript	Data action required
	Preparation	Author to locate repository; generate dataset to repository specifications
	Submission	Preferred: Author to submit data to repository prior to manuscript submission and to ensure reviewers have access to data through a link (may be temporarily password-protected from public view until manuscript acceptance). Non-preferred: Author submits data as supplement and indicates which repository the data will be submitted to.
	Revision	Author to ensure data housed permanently at a repository; include URL/DOI in manuscript
	Acceptance	Author to notify repository to publicly release data
	Proofs	Author to ensure data are linked and available

Choice of repository for Paleoceanography and Paleoclimatology data

Proxy reconstruction data. The most common domain-specific choices for AGU PP authors to date are NOAA NCEI (USA) and PANGAEA (Germany). Other options include specialty repositories (e.g., for chronology, fossils, or large computational datasets) and general repositories (i.e., venues that may accept data on any topic and format, such as Dryad, Zenodo, and Figshare). Resources for finding repositories include:

Large climate model files. Please see AGU’s common data sharing guidance document. Additional guidance is available from EarthCube RCN, “What About Model Data?” Determining Best Practices for Preservation and Replicability. See below for the AGU Paleoceanography and Paleoclimatology data guide for climate model simulations.

Additional considerations when choosing a repository? Choices may be determined by funding agency, national, or institutional mandates; they may follow evolving community standards or journal recommendations; or they may be left to the author. Authors are advised to consider the benefits of platform familiarity, archival stability, and search features, all of which contribute to data visibility and accessibility. Authors should ensure their data submission complies with the requirements of the repository and the expectations of the journal (it may help to consult recent journal publications and data deposits for relevant examples). Authors should note that the review process includes assessment of data; access to data is therefore essential for review.

FAQs

What needs to be archived? At a minimum, the processed data, analytical code, spreadsheets, and workflows necessary to reproduce the results shown in the paper. As a rule of thumb, the minimum requirement is whatever is necessary to reproduce the figures, tables, and interpretations in the paper. Archiving raw and intermediate data may not be necessary to meet this requirement, and may not be feasible, in which case feasibility may set the practical minimum. On the other hand, the more data (including code and workflows) that can be archived, the greater the potential long-term utility, so we encourage archiving more than the bare minimum.
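
The rule of thumb above can be checked mechanically before deposit. The following is a minimal sketch, not an AGU tool: it assumes a hypothetical author-maintained mapping from each figure or table to the archived files that reproduce it, and it lists any item with no supporting file. The function name `missing_support`, the item labels, and the file names are all illustrative assumptions.

```python
from typing import Dict, List


def missing_support(items: List[str], archive_map: Dict[str, List[str]]) -> List[str]:
    """Return the figures/tables that have no archived file listed for them."""
    return [item for item in items if not archive_map.get(item)]


# Hypothetical example: labels and file names are placeholders, not real data.
paper_items = ["Figure 1", "Figure 2", "Table 1"]
archive_map = {
    "Figure 1": ["d18O_processed.csv", "plot_fig1.py"],
    "Table 1": ["age_model.csv"],
}

for item in missing_support(paper_items, archive_map):
    print(f"No archived data or code listed for {item}")
```

A check like this only confirms that something is listed for each item; whether the listed files actually reproduce the result still has to be judged by the authors, reviewers, and editor.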

Is there any scope for expansion of archived data? We recognize the value of archiving all components needed for scientific reproducibility and accessibility, including very large files (e.g., raw instrument or model output), where checking and reuse may be helpful to the community. If repository infrastructure for handling and storing such data types does not yet exist, researchers are advised to follow evolving community standards and to make use of any available institutional data storage in the meantime, while repository capacity develops.

Are there any allowable exceptions to archiving? Any exceptions sought to data publishing (e.g., government or commercial data or software restricted by patent, policy or law) should be discussed with the editor; decisions will be handled on a case-by-case basis.

Are there costs associated with data publication? Some repositories guarantee free deposition and storage in perpetuity; others may require fees to archive data now or in the future. These costs are not covered by the publisher except when using Dryad for NSF-funded research.

When data revisions are needed. Authors are responsible for updating any archived datasets, e.g., for errata or for new data generated at the request of reviewers. Commonly this results in a newly registered DOI. Please ensure the availability statement and citation are also updated with the new DOI.
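
A stale DOI in the availability statement is easy to miss after a dataset revision. The sketch below shows one way to check for it, under stated assumptions: the regular expression is a heuristic for the common DOI shape rather than a full validator, the helper names `dois_in` and `statement_is_current` are hypothetical, and the DOIs shown are placeholders, not real datasets.

```python
import re

# Common DOI shape: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
# Heuristic only; it does not validate every legal DOI.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def dois_in(text: str) -> set:
    """Extract DOI-like strings, dropping trailing sentence punctuation."""
    return {match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)}


def statement_is_current(statement: str, current_doi: str) -> bool:
    """True if the statement cites the current dataset DOI (case-insensitive)."""
    return current_doi.lower() in {doi.lower() for doi in dois_in(statement)}


# Placeholder statement and DOIs, for illustration only.
statement = "Data are archived at https://doi.org/10.1234/example.v1 (placeholder)."
if not statement_is_current(statement, "10.1234/example.v2"):
    print("Availability statement does not cite the revised dataset DOI")
```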

How are data peer-reviewed? The evaluation of whether data “necessary to reproduce the results shown in the paper” are made available is part of the peer-review process and is adjudicated by the editor. We thank authors and reviewers for their attention and adherence to best practice in data management and reporting.

Still have questions? Contact AGU Paleoceanography and Paleoclimatology at paleoceanography@agu.edu.

Climate Model Data

In addition to AGU’s common data sharing guidance document (including numerical models), we provide journal-specific guidance for data publishing compliance for model simulations.

Model code.
- If the model code (e.g., an empirical model created for the paper) used to run simulations or create data is small (<10 MB) and unpublished, the code should be uploaded to a repository.
Citation of the model (most important).
- BEST OPTION (model in repository): Cite the model using a repository that registers the version used for the paper with a persistent identifier (e.g., Digital Object Identifier) and metadata that describes the model using community standards. If a published paper has the complete description, there should be a link in the repository to the published paper. If the paper does not have a complete description, then the primary model paper(s) should also be cited. Your citation should accurately capture the authors/creators of the model.
- GOOD OPTION (model described in paper): Cite the publication where the model is described with information about the version used for this paper.
Description of the model.
- Include a description of the model in the text of the paper that is adequate to support reproducibility. If a publication describes the model thoroughly, cite that paper while also describing any additional modifications made to the model and/or setup.
Information about the configuration/parameters used to run the model.
- This information should be included in the paper text, together with any script/workflow used. The script/workflow should be preserved in a repository and cited. Any forcing datasets used should be described and cited.
- Ancillary file generation. Any files newly created and used within the model should be fully described. The raw data used to generate these files should be cited, and any processing applied to the raw data should be reported.
Data that Supports the Summary Results, Tables and Figures.
- BEST OPTION: Cite a package in an appropriate repository that includes scripts/workflows, provenance information, and summary files that support the research, figures and tables, consistent with archives maintained for transparency and traceability by assessments such as the Intergovernmental Panel on Climate Change (IPCC).
- GOOD OPTION: Cite files (e.g., scripts, descriptive detail) in an appropriate repository that support evaluating the research and provide the details behind the tables and figures.
- ACCEPTABLE OPTION: Provide the necessary information for transparency and traceability of the analysis using your community standards or guidance.
Model Output Data (optional).
- BEST OPTION: If certain raw model output data are instrumental to evaluating the research, then deposit these in a trusted repository. There are currently limited resources for preserving very large files, so synthesise the data where possible. It may be necessary to select representative output from one or a few model runs, as recommended by the specific community. Data should be in formats commonly used in the community (e.g., .netcdf, .dat, .xls).
- GOOD OPTION: Processed data that have been transformed/derived from raw data formats used for specific analyses should be provided using similar community reporting standards and formats.
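
When assembling a package of scripts, provenance information, summary files, and selected model output for deposit, a file manifest with sizes and checksums can help authors and reviewers confirm that the deposited files match the ones used in the paper. The following is a minimal sketch under stated assumptions: the manifest layout and the function names `sha256_of` and `build_manifest` are hypothetical, and a given repository may require its own manifest format or compute checksums itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model output does not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> Dict[str, Dict[str, object]]:
    """Map each file's relative path to its size in bytes and SHA-256 checksum."""
    manifest: Dict[str, Dict[str, object]] = {}
    for path in sorted(package_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(package_dir).as_posix()
            manifest[relative] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


if __name__ == "__main__":
    # Demonstration on a throwaway directory with placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp)
        (package / "scripts").mkdir()
        (package / "scripts" / "plot_fig1.py").write_text("print('figure 1')\n")
        (package / "summary.csv").write_text("age_ka,value\n0,1.0\n")
        print(json.dumps(build_manifest(package), indent=2))
```

Hashing in chunks keeps memory use flat for large files, which matters for the model output discussed above.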

If the model is not open because of the sensitivity of the research or proprietary concerns, then provide as much information as possible to support evaluation of the research and responsibility, and see the FAQ “Are there any allowable exceptions to archiving?” above.

Still have questions? Contact AGU Paleoceanography and Paleoclimatology at paleoceanography@agu.edu.