Creating a clean analysis pipeline
If you are completely new to FieldTrip, we recommend that you skip this tutorial for now. You can read the introduction tutorial and then move on to the tutorials on preprocessing. Once you get the hang of it, you can return to this tutorial, which focuses more on the technical and coding aspects.
Introduction
This tutorial provides guidelines and suggestions on how to set up a chain of analysis steps that makes the most efficient use of your (and your computer's) time and is in accordance with the FieldTrip philosophy. It also introduces some MATLAB basics for writing your own functions, as well as the idea of batching.
The examples here deal with preprocessing of the data, but this tutorial does not provide detailed information about preprocessing itself. If you are interested in how to preprocess your data, you can check for example this tutorial.
The paper Seven quick tips for analysis scripts in neuroimaging by Marijn van Vliet (2020, PLoS Comput Biol) also provides very useful guidelines for writing and organizing your analysis code. Although the examples it provides use Python, the ideas it presents apply equally well to MATLAB.
Background
The analysis of an experiment typically involves a lot of repetition as similar analysis steps are taken for every condition and for every subject. Also, the same steps are often repeated with only slightly different settings (e.g., filters, timings). Because of this we should program our own functions around the FieldTrip functions. FieldTrip functions are not intended to be just typed into MATLAB's command window. If you do, you are guaranteed to lose record of preceding steps, repeat yourself unnecessarily, or unknowingly change settings between subjects or conditions.
Another "no-no" is the practice of collecting all your steps in one large m-file and copy-pasting parts in the command window. Besides becoming easily cluttered with previous tries, different filter settings, etc., it does not create a clear continuity between steps, and most importantly, does not permit batching. Batching is the ultimate aim of any analysis pipeline. It means that in the end most of your analysis steps can be repeated over all subjects and/or conditions with a single command.
Separating subject-specific details from the code
As stated before, by making our own functions around the FieldTrip functions, we can easily repeat them at a later stage, e.g., over multiple subjects. However, unless the data is nicely curated and represented in BIDS, every subject or condition will commonly have different filenames, different variables, different filter settings, different trials that have to be rejected, etc. A good idea, therefore, is to first write all your subject-specific details in a separate m-file. You can choose to have one m-file per subject, or one in which you combine all subjects. In the current example we will use the first option, and we specify these m-files to be functions:
function [subjectdata] = Subject01
% Subject01 returns the subject-specific details
%
% the first few lines with comments are displayed as help
% define the filenames, parameters and other information that is subject specific
subjectdata.subjectid = 'Subject01';
subjectdata.eegdata = 'myProject/rawdata/EEG/subject01.eeg';
subjectdata.mridata = 'myProject/rawdata/MRI/01_mri.nii';
subjectdata.badtrials = [1 3]; % subject made a mistake on the first and third trial
% more information can be added to this script when needed
...
Save this as Subject01.m in a personal folder that you will need to add to the MATLAB path. From the command line you can now simply retrieve this subject-specific information by calling Subject01, or from any script by using eval('Subject01'). This will return the structure subjectdata containing all the fields we have specified.

We can now use this structure as input for our own functions, giving us a flexible way of combining generic functions and subject-specific settings. In addition, you could use this file to add further comments, such as % subject made a mistake on the first trial.
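For example, a minimal sketch of how this could be used (the field names follow the Subject01.m example above):

subjectdata = Subject01;               % retrieve the subject-specific details
cfg         = [];
cfg.dataset = subjectdata.eegdata;     % raw data file defined in Subject01.m
data        = ft_preprocessing(cfg);   % read the raw data into memory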
An example that uses subject-specific m-files can be found here. The same dataset was later also converted to BIDS, and a version of the analysis that starts from the BIDS dataset is documented here and here.
A similar example is this one, which also starts from a consistent BIDS dataset (hence fewer exceptions needed) and which stores subject-specific details like the selected trials and the bad segments and channels in mat-files rather than m-files.
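As an illustration of that mat-file approach (the filename and fields below are only examples, not part of the linked analyses), the subject-specific details could be stored once and loaded in every later step:

subjectdata           = [];
subjectdata.subjectid = 'Subject01';
subjectdata.badtrials = [1 3];
save('Subject01_details.mat', 'subjectdata');   % store the details once

% ... and in any later analysis step
load('Subject01_details.mat', 'subjectdata');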
Making your own analysis functions
You can make an analysis pipeline by calling a sequence of FieldTrip functions from within your own function. To make a function in MATLAB, write something in the style of:
function [output] = MyOwnFunction(input)
% MyOwnFunction takes the square root of the input
%
% the first few lines with comments are displayed as help
output = sqrt(input);
Make sure to save the file under the same name as the function, i.e., MyOwnFunction.m, and to save it in your personal folder dedicated to your own functions and scripts.
Do not save your own functions and/or scripts in the FieldTrip folder as it makes it harder to update your FieldTrip toolbox to a new version.
In general we recommend keeping your raw data, your scripts and your results in three separate directories.
Having saved your function in a folder on your MATLAB path, you can use it from within any script or from the command line. In our example MyOwnFunction(4) will give you the answer 2. To put the answer in a variable for storage or later use, you call something like output = MyOwnFunction(4). This is the way most FieldTrip functions work: you provide the parameters together with the data as input, and the function returns the results as output.
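As a sketch of that calling pattern (assuming a data structure data from an earlier preprocessing step; the filter settings are only illustrative):

cfg           = [];
cfg.lpfilter  = 'yes';                        % apply a low-pass filter
cfg.lpfreq    = 30;                           % cut-off frequency in Hz
data_filtered = ft_preprocessing(cfg, data);  % parameters plus data in, result out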
It is often convenient to save intermediate results to disk. For instance you can type:
save('firstoutput', 'output');
to save the output to firstoutput.mat in the directory you are currently in. Let's say you defined an output folder in the subject-specific m-file, as in the first section:
subjectdata.subjectdir = 'Subject01';
you can program a generic solution to save all analysis steps of every subject in their own output folder:
save([subjectdata.subjectdir filesep 'firstoutput'], 'output');
In this way each of your functions (i.e., analysis steps) can read the output of the previous step from the .mat files in the corresponding subject's folder.
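For example, a hypothetical next analysis step (the function name and output filename are only illustrative) could read the result of the previous step from the subject's own folder and save its own result there:

function do_average(Subjectm)
% do_average computes the average over trials for one subject
subjectdata = feval(Subjectm);   % e.g., do_average('Subject01')
load([subjectdata.subjectdir filesep 'firstoutput'], 'output');   % result of the previous step
cfg = [];
avg = ft_timelockanalysis(cfg, output);
save([subjectdata.subjectdir filesep 'avg'], 'avg');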
We suggest that you store a single variable per file. This generally makes it easier to read only what is necessary. Furthermore, if you give the files a clear and consistent name, you can easily delete the files (intermediate results) that are no longer needed. Note that you can sort in the file manager on filename, as well as on creation date. The latter is convenient to quickly get an overview of the most recent files after you notice yet another bug in your analysis script.
The file and folder organization on disk could then look something like this:
myProject/
├── code
│   ├── MyOwnFunction.m
│   ├── Subject01.m
│   ├── Subject02.m
│   ├── Subject03.m
│   └── Subject04.m
├── rawdata
│   ├── EEG
│   │   ├── subject01.eeg
│   │   ├── subject02.eeg
│   │   ├── subject03.eeg
│   │   └── subject04.eeg
│   └── MRI
│       ├── 01_mri.nii
│       └── 02_mri.nii
└── results
    ├── Subject01
    │   ├── avg_cond1.mat
    │   ├── avg_cond1_filtered.mat
    │   ├── avg_cond2.mat
    │   ├── avg_cond2_filtered.mat
    │   ├── avg_cond3.mat
    │   ├── avg_cond3_filtered.mat
    │   ├── rawdata.mat
    │   └── rawdata_filtered.mat
    ├── Subject02
    ├── Subject03
    └── Subject04
Along the way, you will most likely expand on the subject-specific information. For instance, in the first step you may have used ft_databrowser to select some unusual artifacts in one subject, which you could store in your subject-specific .m file:
subjectdata.artfctdef.visual.artifact = [
160611,162906
473717,492076
604850,606076
702196,703615
736261,738205
850361,852159
887956,895200
959974,972785
1096344,1099772
];
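These stored artifact definitions can later be passed on to the FieldTrip functions again; a minimal sketch, assuming a preprocessed data structure data:

cfg = [];
cfg.artfctdef.visual.artifact = subjectdata.artfctdef.visual.artifact;
cfg.artfctdef.reject          = 'complete';   % remove the affected trials completely
data_clean = ft_rejectartifact(cfg, data);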
Batching
In the end we will have a collection of function calls, some of which depend on the output of previous steps, for example preprocessing or artifact rejection, while others could in principle be called in parallel, for example the averaging per condition. This could result in an analysis pipeline such as this (simplified) one:
Separating the interactive or manual steps, in this example the visual inspection of artifacts in the data, from the non-interactive steps allows us to automate most of the analysis. This is called batching.
Large datasets often require quite some processing time, hence it is convenient to run a batch of analysis steps overnight. The worst that can happen is that the next morning you'll see some red lines in your MATLAB command window just because of a small mistake in one of the first subjects. Therefore, you might want to use the try-catch construct in MATLAB. Whenever an error occurs between the try and the catch, execution jumps to the catch block, after which the loop simply continues with the next iteration. For example:
for i = 1:number_of_subjects
try
my_preprocessing_function(i)
% my_old_freqanalysis_function(i)
my_freqanalysis_function(i)
my_sourceanalysis_function(i)
catch
disp(['Something was wrong with Subject' int2str(i) '! Continuing with next in line']);
end
end
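The same loop can also be driven by the subject m-files introduced earlier; a sketch (the list of names is only an example) that calls the preprocessing batch from the next section:

subjects = {'Subject01', 'Subject02', 'Subject03', 'Subject04'};
for i = 1:numel(subjects)
  try
    do_preprocess_MM(subjects{i});
  catch err
    disp(['Something was wrong with ' subjects{i} ': ' err.message '. Continuing with next in line']);
  end
end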
Example batches
The following function loads the data as specified in Subject01.m, uses the databrowser for visual inspection of artifacts, rejects the trials containing artifacts, and then saves the data in a separate folder as 'preproc_dataM.mat'. You can simply call it as do_preprocess_MM('Subject01').
function do_preprocess_MM(Subjectm)
if nargin == 0
  disp('Not enough input arguments');
  return;
end
cfg = [];
subjectdata = feval(Subjectm);
outputdir = 'AnalysisM';
%%% define trials
cfg.dataset = [subjectdata.subjectdir filesep subjectdata.datadir]; % path fields defined in the subject m-file
cfg.trialdef.eventtype = 'trigger';
cfg.trialdef.prestim = 1.5;
cfg.trialdef.poststim = 1.5;
%cfg.continuous = 'no';
cfg.lpfilter = 'no';
cfg.continuous = 'yes';
cfg.trialfun = 'motormirror_trialfun'; % located in code directory
cfg.layout = 'EEG1020.lay';
cfg = ft_definetrial(cfg);
%%% if there are visual artifacts already in subject m-file use those, they will show up in databrowser
try
cfg.artfctdef.visual.artifact = subjectdata.artfctdef.visual.artifact;
catch
end
%%% visual detection of other artifacts
cfg.continuous = 'yes';
cfg.blocksize = 20;
cfg.eventfile = [];
cfg.viewmode = 'butterfly';
cfg = ft_databrowser(cfg);
%%% enter visually detected artifacts in subject m-file;
fid = fopen([subjectdata.mfiledir filesep Subjectm '.m'],'At');
fprintf(fid,'\n%s\n',['%%% Entered @ ' datestr(now)]);
fprintf(fid,'%s',['subjectdata.artfctdef.visual.artifact = [ ' ]);
if ~isempty(cfg.artfctdef.visual.artifact)
for i = 1 : size(cfg.artfctdef.visual.artifact,1)
fprintf(fid,'%u%s%u%s',cfg.artfctdef.visual.artifact(i,1),' ',cfg.artfctdef.visual.artifact(i,2),';');
end
end
fprintf(fid,'%s\n',[ ' ]; ']);
fclose(fid);
%%% reject artifacts
cfg.artfctdef.reject = 'complete';
cfg = ft_rejectartifact(cfg);
%%% make directory, if needed, to save all analysis data
if ~exist(outputdir, 'dir')
  mkdir(outputdir)
end
%%% Preprocess and SAVE
dataM = ft_preprocessing(cfg);
save([outputdir filesep 'preproc_dataM'], 'dataM', '-v7.3')
clear all;
Summary and suggested further readings
This tutorial explained how to write your own functions and how to do batching in order to increase the efficiency of your analysis. If you are interested in improving memory usage and the speed of your analysis, you can check this tutorial and the tutorials on distributed computing using qsub and parfor.
When you have more questions about the topic of any tutorial, don't forget to check the frequently asked questions and the example scripts.
Here are the frequently asked questions that are MATLAB specific:
- Can I prevent "external" toolboxes from being added to my MATLAB path?
- Can I use FieldTrip without MATLAB license?
- Which external toolboxes are used by FieldTrip?
- How can I compile the mex files on 64-bit Windows?
- How can I use my MacBook Pro for stimulus presentation in the MEG lab?
- How fast is the FieldTrip buffer for realtime data streaming?
- MATLAB complains about a missing or invalid mex file, what should I do?
- MATLAB does not see the functions in the "private" directory
- Replacements for functions from MathWorks toolboxes
- What are the MATLAB requirements for using FieldTrip?
- MATLAB version 7.3 (2006b) crashes when I try to do ...
- MATLAB complains that mexmaci64 cannot be opened because the developer cannot be verified
- What are the MATLAB and external requirements?
- Should I add FieldTrip with all subdirectories to my MATLAB path?
- What are the different approaches I can take for distributed computing?
- Where can I find the dipoli command-line executable?
- Why are so many of the interesting functions in the private directories?
- Why are the fileio functions stateless, does the fseek not make them very slow?
Here are the example scripts that are MATLAB specific: