Data Management
The module data_management specifies interfaces for exposing data and patterns.
It also provides simple baseline implementations of these specifications for basic scenarios,
such as time series with persistent-in-time regimes or spatial neighborhood patterns.
Indexing and CIT Identifiers
Variables can be indexed differently by different data-managers. For example, in an IID setup (DataManager_NumpyArray_IID) we identify variables by their integer index, but in a time-series setup (DataManager_NumpyArray_Timeseries) by a tuple of the form (variable index, time-lag). This degree of freedom in indexing is abstracted by a TypeVar var_index.
Independence tests can be indexed relative to the variables involved. The class CI_Identifier[var_index] encodes index information disregarding orientation, i.e. independence tests are assumed to be symmetric and invariant under permutation of the conditioning set. CI_Identifier[var_index] is typed and documented as a generic class relative to the variable indexing var_index used.
The specialization CI_Identifier_TimeSeries of CI_Identifier[tuple[int, int]] implements the same functionality for time-series data (here indexing keeps track of relative time-lags).
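As a minimal sketch of an orientation-free identifier (not the library's actual CI_Identifier code), the following assumes a frozen dataclass that stores the tested pair and the conditioning set as frozensets, so that swapping X and Y or reordering Z yields an equal key with an equal hash. The names CIKey and make are hypothetical; the generic parameter VarIndex plays the role of var_index, so the same class accepts integer indices or (variable index, time-lag) tuples.

```python
from collections.abc import Hashable, Iterable
from dataclasses import dataclass
from typing import Generic, TypeVar

# Plays the role of var_index: int in an IID setup,
# (variable index, time-lag) tuples in a time-series setup.
VarIndex = TypeVar("VarIndex", bound=Hashable)


@dataclass(frozen=True)
class CIKey(Generic[VarIndex]):
    """Hypothetical stand-in for CI_Identifier[var_index]: X _||_ Y | Z.

    Storing {X, Y} and Z as frozensets makes equality and hashing
    independent of the orientation of (X, Y) and of the order of Z.
    """
    pair: frozenset
    cond: frozenset

    @classmethod
    def make(cls, x: VarIndex, y: VarIndex,
             z: Iterable[VarIndex] = ()) -> "CIKey[VarIndex]":
        if x == y:
            raise ValueError("X and Y must be distinct variables")
        return cls(frozenset((x, y)), frozenset(z))


# IID-style integer indices: orientation and conditioning order are ignored.
a = CIKey.make(0, 1, [2, 3])
b = CIKey.make(1, 0, [3, 2])
print(a == b, len({a, b}))  # True 1

# Time-series-style (variable index, time-lag) indices use the same class.
ts = CIKey.make((0, 0), (1, -1), [(0, -1)])
```

Because such keys are hashable and order-insensitive, they can serve directly as dictionary keys, for example to cache test results.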
Data Representation
The class CIT_Data represents the data used to perform a CIT (conditional independence test).
The class CIT_DataPatterned (extending CIT_Data) additionally specifies the functionality required to implement a pattern-provider, that is, it formalizes how to describe prior knowledge about plausible pattern structure in the data. Besides this interface specification, its implementation also provides flexible fallbacks for most of this functionality. Most custom pattern definitions will therefore require only little actual code; see for example:
The class CIT_DataPatterned_PersistentInTime provides an implementation for one-dimensional persistent patterns, for example persistent-in-time regimes.
The class CIT_DataPatterned_PesistentInSpace provides an implementation for two-dimensional persistent patterns, for example persistent-in-space regimes.
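To illustrate why a custom pattern definition can stay short once a base class supplies fallbacks, here is a minimal sketch under our own assumption that a pattern is expressed as one integer block label per sample position (samples sharing a label are assumed to share a regime). PatternSketch, labels and blocks are hypothetical names, not the interface of CIT_DataPatterned; the 1D (time) and 2D (space) subclasses implement only labels, while blocks is a shared fallback.

```python
from abc import ABC, abstractmethod

import numpy as np


class PatternSketch(ABC):
    """Hypothetical pattern provider: one integer block label per sample.

    Subclasses implement only `labels`; `blocks` is a shared fallback.
    """

    @abstractmethod
    def labels(self, shape: tuple[int, ...]) -> np.ndarray:
        """Return an integer array of block labels with the given shape."""

    def blocks(self, shape: tuple[int, ...]) -> list[np.ndarray]:
        # Fallback: collect the flat sample indices that share a label.
        flat = self.labels(shape).ravel()
        return [np.flatnonzero(flat == b) for b in np.unique(flat)]


class PersistentInTimeSketch(PatternSketch):
    """1D persistence: runs of `size` consecutive time steps form a block."""

    def __init__(self, size: int):
        self.size = size

    def labels(self, shape):
        (n_time,) = shape
        return np.arange(n_time) // self.size


class PersistentInSpaceSketch(PatternSketch):
    """2D persistence: tiles of `sx` x `sy` grid cells form a block."""

    def __init__(self, sx: int, sy: int):
        self.sx, self.sy = sx, sy

    def labels(self, shape):
        nx, ny = shape
        tiles_per_row = -(-ny // self.sy)  # ceiling division
        ix = np.arange(nx)[:, None] // self.sx
        iy = np.arange(ny)[None, :] // self.sy
        return ix * tiles_per_row + iy


print(PersistentInTimeSketch(3).labels((7,)))
print(PersistentInSpaceSketch(2, 2).labels((4, 4)))
```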
The class BlockView represents patterned data, that is, data grouped into blocks of a specified size.
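Grouping samples into blocks of a fixed size can be sketched with a NumPy reshape. The helper block_view below is hypothetical, not the BlockView API; it assumes samples lie along axis 0 and that trailing samples which do not fill a complete block are dropped.

```python
import numpy as np


def block_view(data: np.ndarray, block_size: int) -> np.ndarray:
    """Hypothetical helper: group samples along axis 0 into equal blocks.

    Returns an array of shape (n_blocks, block_size, *data.shape[1:]).
    Trailing samples that do not fill a whole block are dropped
    (an assumption made for this sketch).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = data.shape[0] // block_size
    trimmed = data[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size, *data.shape[1:])


x = np.arange(14, dtype=float).reshape(7, 2)  # 7 samples, 2 variables
blocks = block_view(x, 3)
print(blocks.shape)         # (2, 3, 2)
print(blocks.mean(axis=1))  # per-block means, here [[2, 3], [8, 9]]
```

For C-contiguous input the reshape returns a view, so grouping into blocks does not copy the data.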
See also
Details on how to customize the patterns used are given at Patterns.
Data Manager
A data-manager’s task is, given an index representation (see Indexing and CIT Identifiers) of a query,
to produce the corresponding data representation (see Data Representation).
More formally, it should do so by exposing the IManageData interface.
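The query-to-data contract can be sketched with a typing.Protocol. Query, QueryData, ManagesData, get_data and ToyIIDManager are hypothetical names chosen for illustration; the actual methods of IManageData are not reproduced here. The toy manager assumes var_index=int, i.e. variable i is column i of a read-only array.

```python
from dataclasses import dataclass
from typing import Protocol

import numpy as np


@dataclass(frozen=True)
class Query:
    """Hypothetical index representation of the query X _||_ Y | Z."""
    x: int
    y: int
    z: tuple[int, ...] = ()


@dataclass
class QueryData:
    """Hypothetical data representation handed to a CIT."""
    x: np.ndarray  # shape (n_samples,)
    y: np.ndarray  # shape (n_samples,)
    z: np.ndarray  # shape (n_samples, len(query.z))


class ManagesData(Protocol):
    """Hypothetical stand-in for the IManageData interface."""

    def get_data(self, query: Query) -> QueryData: ...


class ToyIIDManager:
    """Toy manager with var_index=int: variable i is column i of the data."""

    def __init__(self, data: np.ndarray):
        self._data = np.array(data, dtype=float)  # private copy
        self._data.setflags(write=False)          # expose it read-only

    def get_data(self, query: Query) -> QueryData:
        d = self._data
        return QueryData(d[:, query.x], d[:, query.y], d[:, list(query.z)])


manager: ManagesData = ToyIIDManager(np.arange(12).reshape(4, 3))
out = manager.get_data(Query(0, 2, (1,)))
print(out.x, out.z.shape)
```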
We currently provide two implementations:
DataManager_NumpyArray_IID stores (all) data in an immutable numpy array. It uses var_index=int and is built to handle IID data (apart from regime structure).
DataManager_NumpyArray_Timeseries stores (all) data in an immutable numpy array. It uses var_index=tuple[int, int], encoding variable index and time-lag, and is built to handle time-series data.
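For the time-series case, a (variable index, time-lag) pair must be turned into a sample vector aligned with all other variables of the query. The sketch below is a hypothetical illustration, not DataManager_NumpyArray_Timeseries itself: ToyTimeseriesManager, column, get_data and tau_max are names chosen here, and it assumes lags are non-positive integers bounded by a maximal lag tau_max.

```python
import numpy as np


class ToyTimeseriesManager:
    """Hypothetical sketch with var_index=(variable index, time-lag).

    Assumes lags are integers in [-tau_max, 0] (a convention chosen here).
    """

    def __init__(self, data: np.ndarray, tau_max: int):
        self._data = np.array(data, dtype=float)  # shape (T, n_vars)
        self._data.setflags(write=False)          # immutable storage
        self.tau_max = tau_max

    def column(self, idx: tuple[int, int]) -> np.ndarray:
        var, lag = idx
        if not -self.tau_max <= lag <= 0:
            raise ValueError(f"lag {lag} outside [-{self.tau_max}, 0]")
        T = self._data.shape[0]
        # Entry r holds `var` at time (tau_max + r) + lag, so all columns
        # are aligned on the same target times tau_max, ..., T - 1.
        return self._data[self.tau_max + lag : T + lag, var]

    def get_data(self, x, y, z=()):
        n = self._data.shape[0] - self.tau_max
        if z:
            zs = np.column_stack([self.column(i) for i in z])
        else:
            zs = np.empty((n, 0))  # empty conditioning set
        return self.column(x), self.column(y), zs


data = np.arange(10).reshape(5, 2)  # T=5 time steps, 2 variables
manager = ToyTimeseriesManager(data, tau_max=2)
x, y, zs = manager.get_data((0, 0), (1, -1), [(0, -2)])
print(x, y, zs[:, 0])
```

With this alignment every lagged variable contributes exactly T - tau_max samples, so columns for different lags can be stacked directly.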
Further details, in particular on customization and Cache IDs, can be found at Details on Data Managers.