
dsImport

PURPOSE

DSIMPORT - load data into DynaSim formatted data structure.

SYNOPSIS

function [data,studyinfo] = dsImport(file,varargin)

DESCRIPTION

DSIMPORT - load data into DynaSim formatted data structure.

 Usage:
   [data,studyinfo] = dsImport(data_file)
   [data,studyinfo] = dsImport(studyinfo)
   data = dsImport(data_file)

 Inputs:
   - First input/argument:
     - data_file: data file name in accepted format (csv, mat, ...)
     - cell array of data files
     - study_dir
     - studyinfo structure
     - studyinfo file
   - options:
     'verbose_flag': {0,1} (default: 1)
     'process_id'  : process identifier for loading studyinfo if necessary
     'time_limits' : [beg,end] ms (see NOTE 2)
     'variables'   : cell array of matrix names (see NOTE 2)
     'simIDs'      : array of simIDs to import (default: [])

 Outputs:
   - DynaSim data structure:
       data.labels           : list of state variables and monitors recorded
       data.(state_variables): state variable data matrix [time x cells]
       data.(monitors)       : monitor data matrix [time x cells]
       data.time             : time vector [time x 1]
       data.simulator_options: simulator options used to generate simulated data
       data.model            : model used to generate simulated data
       [data.varied]         : list of varied model components
       [data.results]        : list of derived data sets created by post-processing
   - studyinfo: DynaSim studyinfo structure (see dsCheckStudyinfo)
     Note: if data is missing, studyinfo.simulations will only show found data
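
 As a rough illustration of the layout listed above, the following sketch
 builds a DynaSim-style data structure by hand. The population name pop1, the
 variable name pop1_v, and the sizes are made up for the example; structures
 returned by dsImport also carry simulator_options and model, which are
 omitted here.

```matlab
% Hypothetical example: one population 'pop1' with 3 cells and one state variable 'v'
nTime  = 5;                          % number of saved time points (assumed)
nCells = 3;                          % number of cells in pop1 (assumed)

data = struct();
data.time   = (0:nTime-1)' * 0.1;    % time vector [time x 1], in ms
data.pop1_v = zeros(nTime, nCells);  % state variable matrix [time x cells]
data.labels = {'pop1_v'};            % names of the recorded matrices

% every labeled matrix should have one row per time point
for i = 1:numel(data.labels)
  assert(size(data.(data.labels{i}), 1) == numel(data.time));
end
```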

 Notes:
   - NOTE 1: CSV files are assumed to be organized like the output of
   dsWriteDynaSimSolver: time points run along rows, and state variables and
   monitors are columns. The first column is the time vector; the next columns
   are state variables; the final columns are monitors. The first row has a
   header for each column. If a population has more than one cell, different
   cells are sequential columns with same header repeated for each cell.
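
 The column layout in NOTE 1 can be sketched with base MATLAB functions. This
 is not the implementation of dsImportCSV; it only illustrates how repeated
 headers might be grouped into one [time x cells] matrix per variable. The
 file contents and the names pop1_v and pop1_iNa_m are invented for the
 example.

```matlab
% Write a small CSV in the layout described in NOTE 1 (values are made up)
csv_file = [tempname '.csv'];
fid = fopen(csv_file, 'w');
fprintf(fid, 'time,pop1_v,pop1_v,pop1_iNa_m\n');   % header repeated once per cell
fprintf(fid, '%g,%g,%g,%g\n', [0 -65 -64 0.1; 0.5 -60 -59 0.2]');
fclose(fid);

% Read the header row, then the numeric block below it
fid = fopen(csv_file, 'r');
headers = strsplit(fgetl(fid), ',');
fclose(fid);
values = dlmread(csv_file, ',', 1, 0);             % row offset 1 skips the header

% Group columns that share a header into one [time x cells] matrix
parsed = struct();
parsed.labels = {};
for c = 1:numel(headers)
  name = headers{c};
  if ~isfield(parsed, name)
    parsed.(name) = values(:, strcmp(headers, name));
    if ~strcmp(name, 'time')
      parsed.labels{end+1} = name;                 % time is stored but not labeled
    end
  end
end
delete(csv_file);
```

 Here parsed.pop1_v ends up with two columns (one per cell) and
 parsed.pop1_iNa_m with one.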

   - NOTE 2: DynaSim data exported to MAT-files are HDF-compatible. To obtain
   partial data sets without having to load the entire file, use dsImport
   with options 'time_limits' and/or 'variables'. Alternatively, the entire
   data set can be loaded using dsImport with default options, then subsets
   extracted using dsSelect with appropriate options.
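
 For partial loading, the source rebuilds the saved time vector from
 simulator_options (tspan, dt, downsample_factor) and maps 'time_limits' to
 row indices. A minimal sketch of that index calculation follows. The option
 values are illustrative, and min(abs(...)) is used here as a stand-in for
 the nearest() helper that the source calls.

```matlab
% Assumed simulator options (values are illustrative only)
tspan = [0 10];           % simulation start and end, ms
dt = 0.5;                 % integration step, ms
downsample_factor = 2;    % every 2nd point was saved

% rebuild the saved time vector
time = (tspan(1):dt:tspan(2))';
time = time(1:downsample_factor:length(time));

% map time_limits = [beg end] to the closest saved samples
time_limits = [2.2 6.9];
[~, first_idx] = min(abs(time - time_limits(1)));
[~, last_idx]  = min(abs(time - time_limits(2)));
time_indices = first_idx:last_idx;

selected_time = time(time_indices);
```

 The limits do not need to fall exactly on saved samples; each one snaps to
 the closest saved time point.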

 Examples:
   - Example 1: full data set
       data=dsImport('data.mat'); % load single data set
       data=dsImport(studyinfo); % load all data sets in studyinfo.study_dir
   - Example 2: partial data set with HDF-style loading
       data=dsImport('data.mat','variables','pop1_v','time_limits',[1000 4000])
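
 Example 2 relies on MATLAB's matfile() interface to read part of a -v7.3
 MAT-file. The sketch below shows that mechanism on its own, without DynaSim.
 The file is a temporary one and the variable names are made up. It requires
 a MATLAB release that provides matfile().

```matlab
% Create a small HDF-based MAT-file with top-level variables (names are made up)
mat_file = [tempname '.mat'];
s = struct();
s.time   = (0:9)';
s.pop1_v = rand(10, 4);
s.labels = {'pop1_v'};
save(mat_file, '-struct', 's', '-v7.3');

% Read rows 3..6 of pop1_v without loading the full matrix
obj   = matfile(mat_file);
rows  = 3:6;
v_sub = obj.pop1_v(rows, :);
t_sub = obj.time(rows, 1);
delete(mat_file);
```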

 TODO:
 - specify subsets to return in terms of varied parameters, time_limits, ROIs,
   etc. Possible format for specifying range_varied: {'E','gNa',[.1 .3];
   'I->E','tauI',[15 25]; 'I','mechanism_list','+iM'}
 - achieve by calling function dsSelect() at end of this function.
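
 The {target, parameter, value} triplets above have the same shape as the
 per-simulation modifications that dsImport reads from studyinfo. When a
 loaded data set has no 'varied' field, the source derives the varied labels
 from those modifications by replacing '->' with '_' and joining the first two
 columns. A minimal sketch of that naming step, with invented values:

```matlab
% Hypothetical modifications for one simulation: {target, parameter, value}
modifications = {'E',    'gNa',  0.1; ...
                 'I->E', 'tauI', 15};

data = struct();
data.varied = {};

% field names cannot contain '->', so connections are written with '_'
modifications(:,1:2) = cellfun(@(x) strrep(x, '->', '_'), ...
                               modifications(:,1:2), 'UniformOutput', false);

for j = 1:size(modifications, 1)
  name = [modifications{j,1} '_' modifications{j,2}];  % e.g. 'E_gNa'
  data.varied{end+1} = name;          % record which component was varied
  data.(name) = modifications{j,3};   % store the value used in this simulation
end
```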

SOURCE CODE

function [data,studyinfo] = dsImport(file,varargin)
%DSIMPORT - load data into DynaSim formatted data structure.
%
% Usage:
%   [data,studyinfo] = dsImport(data_file)
%   [data,studyinfo] = dsImport(studyinfo)
%   data = dsImport(data_file)
%
% Inputs:
%   - First input/argument:
%     - data_file: data file name in accepted format (csv, mat, ...)
%     - cell array of data files
%     - study_dir
%     - studyinfo structure
%     - studyinfo file
%   - options:
%     'verbose_flag': {0,1} (default: 1)
%     'process_id'  : process identifier for loading studyinfo if necessary
%     'time_limits' : [beg,end] ms (see NOTE 2)
%     'variables'   : cell array of matrix names (see NOTE 2)
%     'simIDs'      : array of simIDs to import (default: [])
%
% Outputs:
%   - DynaSim data structure:
%       data.labels           : list of state variables and monitors recorded
%       data.(state_variables): state variable data matrix [time x cells]
%       data.(monitors)       : monitor data matrix [time x cells]
%       data.time             : time vector [time x 1]
%       data.simulator_options: simulator options used to generate simulated data
%       data.model            : model used to generate simulated data
%       [data.varied]         : list of varied model components
%       [data.results]        : list of derived data sets created by post-processing
%   - studyinfo: DynaSim studyinfo structure (see dsCheckStudyinfo)
%     Note: if data is missing, studyinfo.simulations will only show found data
%
% Notes:
%   - NOTE 1: CSV files are assumed to be organized like the output of
%   dsWriteDynaSimSolver: time points run along rows, and state variables and
%   monitors are columns. The first column is the time vector; the next columns
%   are state variables; the final columns are monitors. The first row has a
%   header for each column. If a population has more than one cell, different
%   cells are sequential columns with same header repeated for each cell.
%
%   - NOTE 2: DynaSim data exported to MAT-files are HDF-compatible. To obtain
%   partial data sets without having to load the entire file, use dsImport
%   with options 'time_limits' and/or 'variables'. Alternatively, the entire
%   data set can be loaded using dsImport with default options, then subsets
%   extracted using dsSelect with appropriate options.
%
% Examples:
%   - Example 1: full data set
%       data=dsImport('data.mat'); % load single data set
%       data=dsImport(studyinfo); % load all data sets in studyinfo.study_dir
%   - Example 2: partial data set with HDF-style loading
%       data=dsImport('data.mat','variables','pop1_v','time_limits',[1000 4000])
%
% TODO:
% - specify subsets to return in terms of varied parameters, time_limits, ROIs,
%   etc. Possible format for specifying range_varied: {'E','gNa',[.1 .3];
%   'I->E','tauI',[15 25]; 'I','mechanism_list','+iM'}
% - achieve by calling function dsSelect() at end of this function.

% See also: dsSimulate, dsExportData, dsCheckData, dsSelect
%
% Author: Jason Sherfey, PhD <jssherfey@gmail.com>
% Copyright (C) 2016 Jason Sherfey, Boston University, USA


% Check inputs
options=dsCheckOptions(varargin,{...
  'verbose_flag',1,{0,1},...
  'process_id',[],[],... % process identifier for loading studyinfo if necessary
  'time_limits',[],[],...
  'variables',[],[],...
  'simIDs',[],[],...
  'auto_gen_test_data_flag',0,{0,1},...
  },false);

%% auto_gen_test_data_flag argin
if options.auto_gen_test_data_flag
  varargs = varargin;
  varargs{find(strcmp(varargs, 'auto_gen_test_data_flag'))+1} = 0;
  varargs(end+1:end+2) = {'unit_test_flag',1};
  argin = [{file}, varargs]; % specific to this function
end

% accept a single variable name as a string
if ischar(options.variables)
  options.variables = {options.variables};
end

% check if input is a DynaSim study_dir or path to studyinfo
if ischar(file)
  if isdir(file) % study directory
    study_dir = file;
    clear file
    file.study_dir = study_dir;
  elseif ~isempty(strfind(file, 'studyinfo')) % path to studyinfo file
    filePath = fileparts2(file);
    if isempty(filePath)
      filePath = pwd;
    end
    study_dir = filePath;
    clear file
    file.study_dir = study_dir;
  end
end

if isstruct(file) && isfield(file,'study_dir')
  % "file" is a studyinfo structure.
  % retrieve most up-to-date studyinfo structure from studyinfo.mat file
  studyinfo = dsCheckStudyinfo(file.study_dir,'process_id',options.process_id, varargin{:});

  % restrict to the requested simIDs (if any) so that the simulation metadata
  % stays aligned with the list of data files built below
  if ~isempty(options.simIDs)
    [~,~,simsInds] = intersect(options.simIDs, [studyinfo.simulations.sim_id]);
    studyinfo.simulations = studyinfo.simulations(simsInds);
  end

  % get list of data_files from studyinfo
  data_files = {studyinfo.simulations.data_file};
  success = cellfun(@exist,data_files)==2;

  if ~all(success)
    % convert original absolute paths to paths relative to study_dir
    for i = 1:length(data_files)
      [~,fname,fext] = fileparts2(data_files{i});
      data_files{i} = fullfile(file.study_dir,'data',[fname fext]);
    end

    success = cellfun(@exist,data_files)==2;
  end

  data_files = data_files(success);
  sim_info = studyinfo.simulations(success);
  studyinfo.simulations = studyinfo.simulations(success); % remove missing data

  % load each data set recursively
  keyvals = dsOptions2Keyval(options);
  num_files = length(data_files);

  for i = 1:num_files
    if options.verbose_flag
      fprintf('loading file %g/%g: %s\n',i,num_files,data_files{i});
    end
    tmp_data=dsImport(data_files{i},keyvals{:});
    num_sets_per_file=length(tmp_data);
    modifications=sim_info(i).modifications;

    if ~isfield(tmp_data,'varied') && ~isempty(modifications)
      % add varied info
      % this is necessary here when loading .csv data lacking metadata
      [tmp_data.varied] = deal({}); % also works when tmp_data is a struct array
      modifications(:,1:2) = cellfun( @(x) strrep(x,'->','_'),modifications(:,1:2),'UniformOutput',0);

      for j=1:size(modifications,1)
        varied=[modifications{j,1} '_' modifications{j,2}];
        for k=1:num_sets_per_file
          tmp_data(k).varied{end+1}=varied;
          tmp_data(k).(varied)=modifications{j,3};
        end
      end
    end

    % store this data
    if i==1
      total_num_sets=num_sets_per_file*num_files;
      set_indices=0:num_sets_per_file:total_num_sets-1;

      % preallocate full data matrix based on first data file
      data(1:total_num_sets)=tmp_data(1);
%       data(1:length(data_files))=tmp_data;
%     else
%       data(i)=tmp_data;
    end
    % replace i-th set of data sets by these data sets
    data(set_indices(i)+(1:num_sets_per_file))=tmp_data;
  end

  return;
else
  studyinfo=[];
end

% check if input is a list of data files (TODO: eliminate duplicate code by
% combining with the above recursive loading for studyinfo data_files)
if iscellstr(file)
  data_files=file;
  success=cellfun(@exist,data_files)==2;
  data_files=data_files(success);
  keyvals=dsOptions2Keyval(options);

  % load each data set recursively
  for i=1:length(data_files)
    tmp_data=dsImport(data_files{i},keyvals{:});
    % store this data
    if i==1
      % preallocate full data matrix based on first data file
      data(1:length(data_files))=tmp_data;
    else
      % replace i-th data element by this data set
      data(i)=tmp_data;
    end
  end
  return;
end

if ischar(file)
  [~,~,ext]=fileparts2(file);
  switch lower(ext)
    case '.mat'
      % MAT-file contains data fields as separate variables (-v7.3 for HDF)
      if isempty(options.time_limits) && isempty(options.variables)
        % load full data set
        data=load(file);

        % if file only contains a structure called 'data' then return that
        if isfield(data,'data') && length(fieldnames(data))==1
          data=data.data;
        end
      else
        % load partial data set
        % use matfile() to load HDF subsets given varargin options...
        obj=matfile(file); % MAT-file object
        varlist=who(obj); % variables stored in mat-file
        labels=obj.labels; % list of state variables and monitors

        if iscellstr(options.variables) % restrict variables to load
          labels=labels(ismember(labels,options.variables));
        end

        % reconstruct the saved time vector from the simulator options
        simulator_options=obj.simulator_options;
        time=(simulator_options.tspan(1):simulator_options.dt:simulator_options.tspan(2))';
        time=time(1:simulator_options.downsample_factor:length(time));

        if ~isempty(options.time_limits)
          % determine time indices to load
          time_indices=nearest(time,options.time_limits(1)):nearest(time,options.time_limits(2));
        else
          % load all time points
          time_indices=1:length(time);
        end

        % create DynaSim data structure:
        data=[];
        data.labels=labels;

        % load state variables and monitors
        for i=1:length(labels)
          data.(labels{i})=obj.(labels{i})(time_indices,:);
        end

        data.time=time(time_indices);
        data.simulator_options=simulator_options;

        if ismember('model',varlist)
          data.model=obj.model;
        end

        if ismember('varied',varlist)
          varied=obj.varied;
          data.varied=varied;
          for i=1:length(varied)
            data.(varied{i})=obj.(varied{i});
          end
        end

        if ismember('results',varlist)
          results=obj.results;
          if iscellstr(options.variables)
            results=results(ismember(results,options.variables));
          end
          data.results=results;

          % load results
          for i=1:length(results)
            data.(results{i})=obj.(results{i})(time_indices,:);
          end
        end
      end
    case '.csv'
      % assumes CSV file contains data organized according to output from dsWriteDynaSimSolver:
      data=dsImportCSV(file);

      if ~(isempty(options.time_limits) && isempty(options.variables))
        % limit to select subsets
        data=dsSelect(data,varargin{:}); % todo: create dsSelect()
      end
    otherwise
      error('file type not recognized. dsImport currently supports DynaSim data structure in MAT file, data values in CSV file.');
  end
end

%% auto_gen_test_data_flag argout
if options.auto_gen_test_data_flag
  argout = {data, studyinfo}; % specific to this function

  dsUnitSaveAutoGenTestData(argin, argout);
end

end % main fn

Generated on Tue 12-Dec-2017 11:32:10 by m2html © 2005