Using reproducescript on a full study

This example script introduces functionality in the FieldTrip toolbox designed to help make your analysis pipeline - including code, data and results - easily reproducible and shareable. It is based on the manuscript “Reducing the efforts to create reproducible analysis code with FieldTrip”. We assume that you have already looked at the examples “Making your analysis pipeline reproducible using reproducescript” and “Using reproducescript for a group analysis”.

Example 3

Examples 1 and 2 were fairly simple analysis pipelines that didn’t particularly benefit from reproducescript because the original scripts and data organisation were already clear. They were intended to show how reproducescript works and how it is used. In this example, we apply reproducescript to a published analysis pipeline of MEG data by Andersen (2018). We hope that this will help you to set up your own analysis pipeline using reproducescript.

Original analysis

The analysis pipeline in Andersen (2018) is well-documented and is itself a good demonstration of a reproducible analysis pipeline in the FieldTrip ecosystem. Nevertheless, it consists of a complex set of 10 analysis scripts and 46 functions. Without the extensive documentation provided by the author, it would be challenging to reuse the code and reproduce the results. This makes it particularly suited to demonstrate the effectiveness and simplicity of reproducescript.

Andersen describes an analysis pipeline from raw single-subject MEG data to group-level statistics in source space. Each of the custom-written scripts has a specific purpose:

Figure copied from Andersen (2018) with permission from the author.

Still, multiple analysis steps in separate functions are required for the purpose of one script (see figure 6 for the full analysis pipeline), creating a complex hierarchy of scripts and functions:

Figure copied from Andersen (2018) with permission from the author.

To keep the computational time and storage requirements low, we applied the full analysis pipeline to two subjects only. Both the original source code from Andersen and the standardized scripts generated by reproducescript are available on GitHub. Using the documentation in Andersen (2018), we wrote the master script run_all.m, which calls all relevant functions in Andersen’s analysis pipeline in the correct order. We added the reproducescript option to ft_default in the same way as in the group analysis example, i.e., by initialising reproducescript separately for each subject, and then once for the group analysis (see below).

run_all.m with reproducescript enabled

This is the original code for run_all.m with the only difference that reproducescript is enabled. It is also available on GitHub.

clear
close all

global ft_default
ft_default = [];
ft_default.checksize = inf;

%% Single subject analysis
datainfo;
for do_subject = 1:numel(all_subjects)
  reproduce_dir = [home_dir, sprintf('reproduce%02d/', do_subject)];

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % enable reproducescript
  ft_default.reproducescript = reproduce_dir;
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  % Create all relevant directories where all data and all figures will be saved
  create_MEG_BIDS_data_structure

  % Go from raw MEG data to a time-frequency representation
  sensor_space_analysis

  % Go from raw MRI data to a volume conductor and a forward model
  mr_preprocessing

  % Extract fourier transforms and do beamformer source reconstructions
  source_space_analysis

  % Plot all steps in the sensor space analysis
  plot_sensor_space

  % Plot all steps in the MR processing
  plot_processed_mr

  % Plot all steps in the source space analysis
  plot_source_space

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % disable reproducescript
  ft_default.reproducescript = [];
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end

%% Group analysis
datainfo;
reproduce_dir = [home_dir, 'reproduce_group'];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% enable reproducescript
ft_default.reproducescript = reproduce_dir;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Do grand averages across subjects for both sensor and source spaces
grand_averages

% Do statistics on time-frequency representations and beamformer source reconstructions
statistics

% Plot grand averages in both the sensor and source spaces, with and without statistical masking
plot_grand_averages

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% disable reproducescript
ft_default.reproducescript = [];
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Reproduced analysis

Please head over to GitHub to see what the reproduced script.m files look like for the two subjects and for the group level.
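If you run run_all.m yourself, you may want to confirm that each reproduce directory was actually populated before comparing it with the versions on GitHub. The following is a minimal sketch, not part of the original pipeline. It assumes that reproducescript writes script.m directly into the directory given in ft_default.reproducescript, and it uses hypothetical stand-ins (home_dir, n_subjects) for the values that datainfo provides.

```matlab
% Sketch only: home_dir and n_subjects are hypothetical stand-ins for the
% values that datainfo defines in run_all.m (home_dir and all_subjects).
home_dir   = tempdir;
n_subjects = 2;

% Build the same directory names that run_all.m assigns to
% ft_default.reproducescript: one per subject, plus one for the group level.
reproduce_dirs = cell(1, n_subjects + 1);
for do_subject = 1:n_subjects
  reproduce_dirs{do_subject} = fullfile(home_dir, sprintf('reproduce%02d', do_subject));
end
reproduce_dirs{end} = fullfile(home_dir, 'reproduce_group');

% Report whether each directory contains a generated script.m
% (assumption: script.m is stored at the top level of the directory).
for i = 1:numel(reproduce_dirs)
  script_file = fullfile(reproduce_dirs{i}, 'script.m');
  if exist(script_file, 'file') == 2
    fprintf('%s: script.m found\n', reproduce_dirs{i});
  else
    fprintf('%s: script.m missing\n', reproduce_dirs{i});
  end
end
```

With the stand-in values above, all three directories are reported as missing unless you have run the pipeline with home_dir pointing at the same location.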

Conclusion

In three examples we have shown how the reproducescript functionality can be applied to any analysis pipeline that is based on the FieldTrip ecosystem. This functionality can be applied without much effort on the researcher’s side, and it generates all code, original and intermediate data, as well as final results in a format that is readily shareable and reproducible.

Note that there are other strategies for improving shareability and reproducibility, and we don’t claim that reproducescript is the best way in every scenario. Rather, it is one of many tools that can help researchers improve the community’s standards of methodological transparency and robustness of results. For other strategies, we refer the reader to the pre-print in which we first described reproducescript.

Suggested further reading