Chapter 8 Resources
Eager to dive deeper into collaborative coding? Here are more resources on specific areas we covered during this workshop.
8.1 Data Wrangling
Data Cleaning and Quality Control (Environmental Data Initiative): A great overview of best practices in organizing data sets, arranging data tables, and defining levels of data processing (Note: Focus on environmental data, but principles apply to any discipline).
Metadata and the Ecological Metadata Language (Environmental Data Initiative): Introduction to Ecological Metadata Language with tools for creating EML-formatted metadata (Note: Focus on environmental data).
8.2 Data Processing
Data pipelining in R using the Targets package (Targets for Ecologists): A step-by-step guide to using the data pipelining tool “targets” in the R programming language. Data pipelining helps ensure reproducibility and speeds up the generation of results: the computer keeps track of every step in your analysis and, when you change the code, re-runs only the steps affected by that change.
R doParallel: How to Parallelize R DataFrame Computations (Radečić 2024): Overview of the doParallel package in R and how it can speed up data analyses by processing chunks of data in parallel.
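To give a feel for the “chunks of data in parallel” idea before reading the article, here is a minimal sketch. It assumes the doParallel and foreach packages are installed. It uses R's built-in `mtcars` dataset, split by number of cylinders, as stand-in data, and a two-worker cluster chosen arbitrarily. This is an illustration of the general pattern, not necessarily the approach the article takes.

```r
library(doParallel)  # also attaches foreach and parallel

# Start a local cluster with two worker processes (size chosen for illustration)
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Split the data frame into one chunk per cylinder count
chunks <- split(mtcars, mtcars$cyl)

# Each chunk is summarised on a worker; .combine = rbind stacks the
# one-row results back into a single data frame (in the original order)
res <- foreach(chunk = chunks, .combine = rbind) %dopar% {
  data.frame(cyl = chunk$cyl[1], mean_mpg = mean(chunk$mpg))
}

# Always release the worker processes when finished
parallel::stopCluster(cl)

print(res)
```

For a dataset this small, starting the workers costs more time than the parallelism saves; the pattern pays off when each chunk involves substantial computation.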
8.3 Git/GitHub and Version Control
Learn git branching interactively: A fun walk-through to learn basic git commands, including working on branches. Also includes some more advanced git techniques.
Excuse Me, Do You Have a Moment to Talk About Version Control? (Bryan 2018): (Note: Full text available with MSU login)
Software Carpentry introductory lesson on “Version Control with Git”