Directory structure for projects

  • A good starting point is to keep all files associated with a project in a single folder
  • Different projects should have separate folders
  • Use a consistent and informative directory structure
  • If you need to keep public, private, and secret material apart, separate them by folder (and by Git repo)
  • Add a README file that describes the project and explains how to reproduce the results
  • Talk to others in the project about what you do and write it down
  • Your mileage may vary: no single layout fits every project
  • When software is reused in several projects, it can make sense to put it in its own repo

Structure for cloud-based code

 

Structure for text-based projects

 

Structure for compiled code

 

A project directory can look something like this:

project_name/
├── README.md             # overview of the project
├── data/                 # data files used in the project
│   ├── README.md         # describes where data came from
│   └── sub-folder/       # may contain subdirectories
├── processed_data/       # intermediate files from the analysis
├── manuscript/           # manuscript describing the results
├── results/              # results of the analysis (data, tables, figures)
├── src/                  # contains all code in the project
│   ├── LICENSE           # license for your code
│   ├── requirements.txt  # software requirements and dependencies
│   └── ...
└── doc/                  # documentation for your project
    ├── index.rst
    └── ...
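
The layout above could be scaffolded with a few shell commands. This is a minimal sketch, assuming a POSIX shell: project_name is the placeholder name from the tree, and the files are created empty so you can fill them in later.

```shell
#!/bin/sh
# Scaffold the example project layout in the current directory.
# "project_name" is a placeholder; replace it with your own project name.
set -eu

project="project_name"

# Create the directories shown in the tree (mkdir -p also creates parents).
mkdir -p "$project/data/sub-folder" \
         "$project/processed_data" \
         "$project/manuscript" \
         "$project/results" \
         "$project/src" \
         "$project/doc"

# Create empty placeholder files to be filled in later.
touch "$project/README.md" \
      "$project/data/README.md" \
      "$project/src/LICENSE" \
      "$project/src/requirements.txt" \
      "$project/doc/index.rst"
```

Running the script a second time is harmless: mkdir -p accepts existing directories, and touch only updates timestamps on existing files.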

Tracking source code, data, and results

  • All code is version controlled and goes in the src/ or source/ directory
  • Include an appropriate LICENSE file and information on software requirements
  • You can also version control data files or input files under data/
  • If data files are too large (or sensitive) to track, exclude them from version control by listing them in .gitignore
  • Intermediate files from the analysis are kept in processed_data/
  • Consider using Git tags to mark specific versions of results (version submitted to a journal, dissertation version, poster version, etc.):
    $ git tag -a thesis-submitted -m "this is the submitted version of my thesis"
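
The tracking advice above can be tried out in a throwaway repository. This is a sketch under stated assumptions: the file names (src/analysis.py, data/raw_measurements.csv) and the author identity are hypothetical, and Git must be installed.

```shell
#!/bin/sh
# Try out ignoring a data file and tagging a version in a scratch repository.
set -eu

# Work in a temporary directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.org"

# Hypothetical project content: one script and one data file.
mkdir -p src data
echo "print('analysis')" > src/analysis.py
echo "large or sensitive content" > data/raw_measurements.csv

# List the data file in .gitignore so that "git add" skips it.
echo "data/raw_measurements.csv" > .gitignore

git add .
git commit -q -m "Add analysis code and ignore rules"

# Mark this state with an annotated tag, as in the command above.
git tag -a thesis-submitted -m "this is the submitted version of my thesis"

git ls-files   # lists tracked files; the ignored data file is not among them
git tag -l     # lists the tags in the repository
```

Note that .gitignore only affects files that are not yet tracked. If a file has already been committed, git rm --cached followed by the file name removes it from the index while keeping the local copy.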
    

Reproducible publications

  • Git can be used to collaborate on manuscripts written in LaTeX and other text-based formats, but other tools exist:
    • Overleaf (has Git integration)
    • Authorea (apparently also has Git integration)
    • Google Docs can be a good alternative
  • Many tools exist to assist in making scholarly output reproducible:
    • rrtools: Instructions, templates, and functions for making a basic compendium suitable for writing a reproducible journal article or report with R.
    • Jupyter Notebooks: Web-based interactive computational environment for creating notebook documents. Can be used for supplementary material with journal articles.
    • Binder: Make a repository with Jupyter notebooks available in an executable environment.
    • “Research compendia”: A set of good practices for reproducible data analysis in R, but much is transferable to other languages.
  • Do you want to practice your reproducibility skills and get inspired by working with other people’s code/data? Join a ReproHack event!

 


Source: Organizing your projects