Directory structure for projects
- A good starting point is to keep all files associated with a project in a single folder
- Different projects should have separate folders
- Use a consistent and informative directory structure
- If you need to separate public, private, and secret material, put each in its own folder (and Git repo)
- Add a README file to describe the project and instructions on reproducing the results
- Talk to others in the project about what you do and write it down
- Your mileage may vary: there is no one-size-fits-all structure
- When software is reused in several projects, it can make sense to put it in its own repo
Structure for cloud-based code
Structure for text-based projects
Structure for compiled code
A project directory can look something like this:
project_name/
├── README.md             # overview of the project
├── data/                 # data files used in the project
│   ├── README.md         # describes where data came from
│   └── sub-folder/       # may contain subdirectories
├── processed_data/       # intermediate files from the analysis
├── manuscript/           # manuscript describing the results
├── results/              # results of the analysis (data, tables, figures)
├── src/                  # contains all code in the project
│   ├── LICENSE           # license for your code
│   ├── requirements.txt  # software requirements and dependencies
│   └── ...
└── doc/                  # documentation for your project
    ├── index.rst
    └── ...
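The layout above can be created in one go from the shell. This is a minimal sketch, assuming a POSIX shell and the folder names from the tree; `project_name` is a placeholder, and the files it creates are empty stubs to be filled in later.

```shell
#!/bin/sh
# Scaffold the example layout; "project_name" is a placeholder.
set -eu

project=project_name

# Create all directories (mkdir -p also creates missing parents).
mkdir -p "$project/data/sub-folder" \
         "$project/processed_data" \
         "$project/manuscript" \
         "$project/results" \
         "$project/src" \
         "$project/doc"

# Create empty stub files to be filled in later.
touch "$project/README.md" \
      "$project/data/README.md" \
      "$project/src/LICENSE" \
      "$project/src/requirements.txt" \
      "$project/doc/index.rst"
```

Adjust the folder names to whatever your group has agreed on; the point is that every project starts from the same skeleton.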
Tracking source code, data, and results
- All code is version controlled and goes in the src/ or source/ directory
- Include appropriate LICENSE file and information on software requirements
- You can also version control data files or input files under data/
- If data files are too large (or sensitive) to track, untrack them using .gitignore
- Intermediate files from the analysis are kept in processed_data/
- Consider using Git tags to mark specific versions of results (version submitted to a journal, dissertation version, poster version, etc.):
$ git tag -a thesis-submitted -m "this is the submitted version of my thesis"
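The tracking workflow above can be tried out in a throwaway repository. The sketch below is illustrative only: the repository lives in a temporary directory, and the file names (`src/analysis.py`, `data/large_file.dat`), the ignore pattern, and the user name and e-mail are all hypothetical.

```shell
#!/bin/sh
set -eu

# Work in a temporary directory so nothing existing is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical project content.
mkdir -p src data processed_data
echo 'print("analysis")' > src/analysis.py
echo 'large raw data' > data/large_file.dat

# Keep large or sensitive data files out of version control.
echo 'data/*.dat' > .gitignore

# Commit the code and the ignore rules (identity set only for this command).
git add .gitignore src
git -c user.name="Example User" -c user.email="user@example.org" \
    commit -q -m "Add analysis code and ignore rules"

# Mark this state with an annotated tag, using the same tag command as above.
git -c user.name="Example User" -c user.email="user@example.org" \
    tag -a thesis-submitted -m "this is the submitted version of my thesis"

# List tags and confirm that the data file is ignored.
git tag --list
git check-ignore data/large_file.dat
```

Note that `.gitignore` only affects files Git is not yet tracking. A file that has already been committed must first be removed from the index with `git rm --cached <file>` before the ignore rule takes effect.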
Reproducible publications
- Git can be used to collaborate on manuscripts written in LaTeX and other text-based formats, but other tools exist.
- Many tools exist to assist in making scholarly output reproducible:
- rrtools: Instructions, templates, and functions for making a basic compendium suitable for writing a reproducible journal article or report with R.
- Jupyter Notebooks: Web-based interactive computational environment for creating notebook documents. Can be used for supplementary material with journal articles.
- Binder: Make a repository with Jupyter notebooks available in an executable environment.
- “Research compendia”: A set of good practices for reproducible data analysis in R, but much is transferable to other languages.
- Do you want to practice your reproducibility skills and get inspired by working with other people’s code/data? Join a ReproHack event!
Source: Organizing your projects