Introduction to Open Science

These materials are based on a Mentoring365 circle held through AGU’s Mentoring365 program in November 2023, in the weeks leading up to AGU23 in San Francisco.

The content here introduces the basics of Open Science and its benefits for individual researchers, particularly early-career researchers and students. It also provides a grounding in managing your digital presence, getting started with data and software, and AGU’s publishing policy on sharing and citing data and software.

Circle Leads:
Kristina Vrouwenvelder (Open Science Leadership), Brian Sedora (Publications/Open Science Leadership), and Sophie Hanson (Publications)

Circle Agenda:

Week 1: Creating your Digital Presence

Week 2: Tips for Handling your Data and Software

Week 3: Get Credit for your Data and Software in Your Paper

Week 4: How to Share your Work so it’s Findable, Accessible, and Reusable!

Week 1: Why Open Science?; Managing Your Digital Presence

Why Open Science?

To get us oriented for this Circle, we want to introduce the concept of Open Science and AGU’s commitment to upholding these principles. Open Science seeks to broaden participation, increase access to scientific research, and overall, make science more inclusive. The free exchange of scientific data and information is necessary to accomplish these goals. AGU’s Position Statement on Data asserts that:

“Earth and space science data are a world heritage, and an essential part of the science ecosystem. All players in the science ecosystem—researchers, repositories, publishers, funders, institutions, etc.—should work to ensure that relevant scientific evidence is processed, shared, and used ethically, and is available, preserved, documented, and fairly credited.”

For further information, feel free to read more about AGU’s Position Statement on Free and Open Science.

Why is Your Digital Presence Important?

This week we will be discussing how to create and manage your digital presence. Your digital fingerprint is how the global scientific community finds, perceives, and interacts with your research and work. While anyone on the internet can find you, this is especially important for other researchers, funders, societies, associations, and potential collaborators both inside and outside of academia. Ensuring that you and your research are discoverable makes it more likely that your work will be cited and provides more opportunities for potential collaborators to connect with you.


Activity: To get you thinking more about your digital presence, try Googling yourself! What do you find? What matches your professional profile? Is there anything that does not match that profile? Can other members of the scientific community easily find your research products (papers, datasets, software uploads, etc.) and connect them to you?


Using an ORCID to Curate Your Digital Presence

Managing your digital presence is vital to your research being seen. One powerful way to distinguish yourself online is to create an ORCID and keep it up to date. An ORCID is a unique, persistent digital identifier that links your professional information to your research and research products.
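
To make the idea of a unique identifier concrete: an ORCID is a 16-character identifier, usually written as four hyphen-separated groups, whose final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The following Python sketch checks whether a string is well formed. The function names are illustrative, and the check only validates the format, not whether the identifier is actually registered to anyone.

```python
import re

def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if `orcid` looks like 0000-0000-0000-000X with a valid check digit."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]

# 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))  # True
print(is_well_formed_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```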


Activity: Create or Update Your ORCID!

Go to https://orcid.org and select “For Researchers.” Your digital ID (your ORCID) can be included on everything you do: papers, datasets, presentations, posters, software uploads, and anything else related to your professional presence. This is an incredibly helpful resource for increasing the discoverability of your work and ensuring you are credited when others use your work. Check out this Digital Presence Checklist and YouTube tutorial (slides for reference linked here) for more information about establishing an ORCID and connecting your research work using your ORCID.

If you already have an ORCID, this is a good opportunity to get more out of it by turning on Auto-Updates and verifying that your information and research products are up to date. These updates come from two trusted organizations: Crossref (for published manuscripts) and DataCite (for datasets and software). See this blog post for further details about how to set up automatic updates.


Week 2: Getting Started with Data and Software

Last week, you all learned about practicing open science and why it’s important and had a chance to build your own digital profiles through an ORCID.

This week, we’ll focus on your data and software, why sharing is important, and some best practices for sharing!

Data

Why share your data?

  • Papers that link to their data in a repository have been associated with up to 25% more citations!
  • Sharing your data makes your work more reproducible and transparent.
  • Your data is valuable – and could enable scientific discovery beyond your own work! (One AGU Publications example: we’ve seen papers published in 2016 citing a dataset from the 1980s!)
    For more reading, check out: Colavizza et al. (2020), PLoS ONE, 15(4), e0230416; McKiernan et al. (2016), eLife, 5, e16800.

How to share your data:

  • Try to include all the data someone would need to reanalyze your work! This means you should include not just the raw data – but information about any data processing you did, your experimental method, and other descriptions needed to understand your data.
  • Choose a good place to store your data. Ideally, your data storage location should be easily findable and accessible and permanent, so your data can be part of the scientific record.
    • Domain repositories specific to your scientific field are easily findable by your colleagues and often tell you exactly what kind of data description you need to enable reuse of your data.
    • Generalist repositories can also be good solutions, but make sure you include detailed descriptions of your data.
  • Cite your data in your research publication – so that your analysis and your data are linked AND you get credit for sharing your data! This means it’s best to store your data in a place where it can receive a DOI, just like your papers. (We’ll talk more about data citation in your publications in week 3!)
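
One lightweight way to keep the description together with the data is to write a small metadata file next to each dataset before depositing it. The Python sketch below is only an illustration: the field names and the `metadata.json` file name are assumptions, not an AGU or repository requirement, so check what your chosen repository asks for.

```python
import json
from pathlib import Path

# Hypothetical list of fields a reader would need in order to reuse the data.
REQUIRED_FIELDS = ["title", "creators", "description", "methods", "processing", "license"]

def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]

def write_metadata(metadata: dict, folder: Path) -> Path:
    """Write the metadata as JSON next to the data files, refusing incomplete records."""
    gaps = missing_fields(metadata)
    if gaps:
        raise ValueError(f"Metadata is missing: {', '.join(gaps)}")
    target = folder / "metadata.json"
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target

# Placeholder values, invented for this example.
example = {
    "title": "Example stream temperature measurements",
    "creators": ["A. Researcher"],
    "description": "Hourly water temperature at one site.",
    "methods": "Logged with a submerged temperature sensor.",
    "processing": "",  # left empty on purpose to show the check
    "license": "CC-BY-4.0",
}
print(missing_fields(example))  # ['processing']
```

Running the check before you deposit makes gaps visible while you still remember how the data were collected and processed.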

For more tips on best practices on sharing your data, check out the following resources:

Good Data Practices - Dryad

Checklist for Managing your Digital Objects

List of Useful Domain Repositories by AGU Journal

Data and Software Sharing Guidance for AGU Authors

What kind of data do you use? Does your funder or publisher require you to share your data? Start planning early for data sharing!

Software

What do we mean by software? This could include…

  • software or code that you used for analysis and visualization of the data
  • software or code used to produce a model output

Which software should I be sharing?

Of course, not all software can be shared alongside your paper. If you’re using proprietary programs for analysis, you likely won’t be able to share them, but you should mention these proprietary programs in your Methods section or Availability Statement. However, if your work depends on scripting in Python, R, or another scientific programming language, and/or creates or builds off an already existing model or analysis package, you should…

  • If it’s your code: publish it in a repository like Zenodo so it can be used by others
  • If it’s someone else’s code: cite their code, including the exact version!
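
To cite the exact version of a package, you first need to know which version you actually ran. A minimal sketch, assuming your analysis is written in Python: the standard-library `importlib.metadata` module can report the versions of installed packages. The package names below are only examples; substitute the ones your own analysis imports.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Example package names; replace with the packages your own code depends on.
report = installed_versions(["numpy", "pandas", "matplotlib"])
print("Python", sys.version.split()[0])
for name, ver in report.items():
    print(name, ver if ver is not None else "not installed")
```

Saving this output alongside your results gives you the version numbers to put in your citations and Availability Statement.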

Developing and Documenting Your Code

A lot has been written about good practices for developing software and code, and we won’t repeat them all here. However, there are some resources we’d like to share that can help make sure your code and software are ready to share with others:

  1. Version control: when developing code, you’ll quickly find out that it’s an iterative process – sometimes you’ll break something and need to roll back your changes, or maybe you’ll add on something for one project that’s not relevant for another project. In these situations, and in particular if you’re working on code collaboratively with your team, you’ll find it very useful to employ a system for tracking changes in your code. Many researchers use GitHub as a platform for this. We recommend the Software Carpentry lesson on getting started with Git for an introduction to version control.
  2. Documentation: writing code that works is one thing – but you’ll quickly find that to share your work effectively with others, and even to go back to your own old code and understand it, you’ll need to add textual documentation. This often takes the form of headers, in-text comments, and README files that explain important information about how your code works. Conventions for documenting code can depend on the programming language, but a good rule of thumb is to imagine you’re explaining how your code works to a colleague. Get started with more resources in this guide.
  3. Sharing: Documenting your code explains the technicalities of your programming, but to share your code with other scientists, you may need to include other information alongside your code, like scenarios when your model can be applied or conditions for initial parameters. Once you’re ready to publish your code – likely because you’re ready to publish the scientific article that code was for – we recommend sharing your code in a repository such as Zenodo. If you’re using GitHub for code development, you can obtain a DOI (persistent identifier) for your repository directly through Zenodo. Check out this tutorial for more information.
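
As a small illustration of the documentation practices in item 2, here is what a header, a docstring, and in-text comments might look like in a short Python script. The file name, function, and numbers are invented for this example, and docstring conventions vary between projects.

```python
"""
temperature_anomaly.py (hypothetical example script)

Purpose: compute temperature anomalies relative to a baseline mean.
Usage:   import the `anomalies` function, or run this file directly.
"""

def anomalies(temperatures, baseline):
    """Return each temperature minus the mean of the baseline values.

    Parameters
    ----------
    temperatures : list of float
        Observed temperatures, in degrees Celsius.
    baseline : list of float
        Reference-period temperatures, in degrees Celsius. Must not be empty.
    """
    if not baseline:
        raise ValueError("baseline must contain at least one value")
    # The baseline mean is computed once and reused for every observation.
    baseline_mean = sum(baseline) / len(baseline)
    return [t - baseline_mean for t in temperatures]

if __name__ == "__main__":
    # Made-up numbers, only to show how the function is called.
    print(anomalies([15.0, 16.5], baseline=[14.0, 15.0, 16.0]))  # [0.0, 1.5]
```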

For more tips on best practices on sharing your software, check out the following resources:

Guidance for AGU Authors - R Scripts and Markdown

Guidance for AGU Authors - Jupyter Notebooks

Week 3: Getting Credit for your Data and Software: AGU Journals Data and Software Guidance for Authors

Last week, we discussed sharing your data and software and why it’s important. This week, we’ll focus on getting credit for your data and software in your paper. We’ll be using AGU’s requirements for data and software sharing, but these will apply to many publishers, and will be helpful as you think about how you will share your data and software early in the writing and submission process.

Did you know? Citing your data and software gives you similar benefits to citing your published papers and allows you to get credit for reuse of your data and software.

AGU’s Policy on Data and Software

AGU requires that the underlying data and/or software needed to understand, evaluate, and build upon the reported research be available at the time of peer review and publication. Additionally, authors should make available software that has a significant impact on the research.

This is achieved by following three requirements:

  • Depositing the data and software in a community accepted, trusted repository, as appropriate, and preferably with a DOI

  • Including an Availability Statement as a separate paragraph in the Open Research section explaining to the reader where and how to access the data and software

  • Including citation(s) to the deposited data and software in the References section

Next, you’ll craft a sample Data Availability Statement for your own dataset or a paper in your field using AGU guidance.

Writing a Data Availability Statement

An Availability Statement, located in the Open Research section of a journal article or at the end of a book chapter, contains information about your data, software, and other research objects (e.g., notebooks) and how readers can access them. A good Availability Statement contains the following elements:

  • A brief description of the type(s) of data or software
  • Repository Name(s) where they are deposited
  • DOI (Persistent Identifier) [required]; or, if no DOI is available, Link to Data or Software
  • In-text citation in References [required for all data and software with DOIs]
  • For Software: Version and Link to publicly accessible development platform (e.g., GitHub)
  • Access Conditions (e.g. if Registration is Required)
  • Licensing/Permissions (e.g. Creative Commons Attribution)

Example Availability Statements:

Data:

The [type of data] data used for [brief context, description] in the study are available at [repository, source name] via [DOI, persistent identifier link, OR URL if no persistent identifier is available] with [license, access conditions]. [in-text citation in References, required for each DOI]

Software:

[Version number] of the [software name] used for [brief context, description of what the software was used for] is preserved at [DOI, persistent identifier link, OR URL if no persistent identifier is available], available via [license type, access conditions] and developed openly at [software development platform link]. [in-text citation in References, required for each DOI]
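
The software template above can be filled in mechanically once you have collected the pieces. The following Python sketch is one way to do that. The function name and every value in the example call are placeholders invented for illustration, not a real deposit, and the in-text citation is placed in parentheses at the end of the sentence as one possible reading of the template.

```python
def software_availability_statement(version, name, purpose, doi_url,
                                    license_name, dev_url, citation):
    """Fill in the software Availability Statement template with the given details."""
    return (
        f"{version} of the {name} used for {purpose} is preserved at {doi_url}, "
        f"available via {license_name} and developed openly at {dev_url} ({citation})."
    )

# All values below are placeholders; replace them with your own details.
statement = software_availability_statement(
    version="Version 1.0",
    name="example-model software",
    purpose="simulating hillslope erosion",
    doi_url="https://doi.org/10.xxxx/placeholder",
    license_name="the MIT license",
    dev_url="https://github.com/example-user/example-model",
    citation="Author, 2023",
)
print(statement)
```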


Activity: Availability Statements!
Write a sample Data Availability Statement for your own dataset or a paper in your field (using AGU guidance), or if you have code you plan to share, write a Software Availability Statement!


Example Data and Software Citations:

Data:

Edmunds, P. J., Didden, C., & Frank, K. (2021). Mean percentage cover of corals and Porites astreoides at each site by year at St. John, VI from 1992 to 2019 (Version 1) [Dataset]. Biological and Chemical Oceanography Data Management Office (BCO-DMO). https://doi.org/10.26008/1912/BCO-DMO.843284.1

Software:

Shobe, C. (2023). Code and data for “The uncertain future of mountaintop-removal-mined landscapes 1: How mining changes erosion processes and variables” (v1.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.10059514
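
Both example citations follow the same pattern: creators, year, title, version, resource type in brackets, repository, and DOI link. Here is a short Python sketch of that pattern, using the software citation above as input. The function name is illustrative, and your target journal's style guide remains the authority on formatting.

```python
def format_citation(creators, year, title, version, kind, repository, doi):
    """Assemble a data or software citation in the pattern used by the examples above."""
    return (
        f"{creators} ({year}). {title} ({version}) [{kind}]. "
        f"{repository}. https://doi.org/{doi}"
    )

citation = format_citation(
    creators="Shobe, C.",
    year=2023,
    title="Code and data for “The uncertain future of mountaintop-removal-mined "
          "landscapes 1: How mining changes erosion processes and variables”",
    version="v1.0",
    kind="Software",
    repository="Zenodo",
    doi="10.5281/zenodo.10059514",
)
print(citation)
```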

Comments, Suggestions and Contact:

Thanks for visiting our Open Science course!

If you have comments, suggestions, or questions, email Kristina Vrouwenvelder.