Data Management

The module data_management specifies interfaces for exposing data and patterns. It also provides simple baseline implementations of these specifications for basic scenarios, such as time series with regimes that persist in time, or spatial neighborhood patterns.

Indexing and CIT Identifiers

  • Variables can be indexed differently by different data managers. For example, in an IID setup (DataManager_NumpyArray_IID) we identify variables by their integer index, but in a time-series setup (DataManager_NumpyArray_Timeseries) by a tuple of the form (variable index, time-lag). This freedom in indexing is abstracted by the TypeVar var_index.

  • Independence tests can be indexed by the variables involved. The class CI_Identifier[var_index] encodes this index information without regard to orientation, i.e. independence tests are assumed to be symmetric and invariant under permutation of the conditioning set. CI_Identifier[var_index] is typed and documented as a generic class over the variable indexing var_index in use.

  • The specialization CI_Identifier_TimeSeries of CI_Identifier[tuple[int,int]] implements the same functionality for time-series data (here indexing keeps track of relative time-lags).
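The symmetry and permutation invariance described above can be captured by storing both the tested variable pair and the conditioning set as sets. The following is a minimal sketch under that assumption, not the implementation of CI_Identifier: the names SymmetricCIKey, make and VarIndex are hypothetical, and rejecting identical variables is an extra assumption of the sketch. It shows why such identifiers can serve as dictionary or cache keys regardless of argument order, for both integer and (variable index, time-lag) indexing.

```python
from dataclasses import dataclass
from typing import Generic, Hashable, Iterable, TypeVar

# Hypothetical stand-in for the var_index TypeVar: int for IID data,
# tuple[int, int] (variable index, time-lag) for time-series data.
VarIndex = TypeVar("VarIndex", bound=Hashable)


@dataclass(frozen=True)
class SymmetricCIKey(Generic[VarIndex]):
    """Hypothetical key for the test X _||_ Y | Z.

    Both the unordered pair {X, Y} and the conditioning set Z are stored as
    frozensets, so swapping X and Y or reordering Z gives an equal key with
    an equal hash.
    """

    pair: frozenset
    conditions: frozenset

    @classmethod
    def make(cls, x: VarIndex, y: VarIndex,
             z: Iterable[VarIndex] = ()) -> "SymmetricCIKey[VarIndex]":
        # Assumption of this sketch: a test needs two distinct variables.
        if x == y:
            raise ValueError("a CI test needs two distinct variables")
        return cls(frozenset((x, y)), frozenset(z))


# IID indexing: variables are plain integers.
a = SymmetricCIKey.make(0, 1, [2, 3])
b = SymmetricCIKey.make(1, 0, [3, 2])
print(a == b, hash(a) == hash(b))  # True True

# Time-series indexing: variables are (variable index, time-lag) tuples.
ts = SymmetricCIKey.make((0, 0), (1, 1), [(2, 1)])
```

Because the keys are frozen dataclasses, they are hashable and compare by value, which is what makes them usable for caching test results.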

Data Representation

  • The class CIT_Data represents the data used to perform a CIT.

  • The class CIT_DataPatterned (extending CIT_Data) additionally specifies the functionality required to implement a pattern provider, that is, it formalizes how to describe prior knowledge about plausible pattern structure in the data. Besides this interface specification, its implementation also provides flexible fallbacks for most of this functionality, so most custom pattern definitions require little actual code. For example:

      ◦ The class BlockView represents patterned data, that is, data grouped into blocks of a specified size.
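The grouping that BlockView represents can be illustrated in a few lines. The sketch below is hypothetical and does not reproduce the BlockView API: the function name block_view is invented here, it works on plain Python lists rather than arrays, and dropping trailing samples that do not fill a complete block is an assumption. It only shows what "grouped into blocks of a specified size" means for a sequence of samples.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_view(samples: Sequence[T], block_size: int) -> List[List[T]]:
    """Group consecutive samples into blocks of `block_size` elements.

    Hypothetical helper: trailing samples that do not fill a complete block
    are dropped, which is an assumption of this sketch.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(samples) // block_size
    return [
        list(samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


# Seven samples in blocks of three: the last sample is left over and dropped.
blocks = block_view([1, 2, 3, 4, 5, 6, 7], 3)
print(blocks)  # [[1, 2, 3], [4, 5, 6]]

# A per-block summary, e.g. for a test that treats each block as one unit.
block_means = [sum(b) / len(b) for b in blocks]
```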

See also

Details on how to customize the patterns used are given in Patterns.

Data Manager

A data manager’s task is, given an index representation of a query (see Indexing and CIT Identifiers), to produce the corresponding data representation (see Data Representation). Formally, it does so by exposing the IManageData interface.
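The contract can be sketched with typing.Protocol as "index in, data out". All names below (CIQuery, CIData, ManagesData, get_data, ColumnStoreIID) are hypothetical and do not reproduce the actual IManageData, CI_Identifier or CIT_Data APIs; the sketch assumes IID data stored as one list of samples per variable, so that an integer variable index simply selects a column.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Column = List[float]


@dataclass(frozen=True)
class CIQuery:
    """Hypothetical index representation of the query X _||_ Y | Z."""
    x: int
    y: int
    z: Tuple[int, ...] = ()


@dataclass
class CIData:
    """Hypothetical data representation handed to a CI test."""
    x: Column
    y: Column
    z: List[Column]


class ManagesData(Protocol):
    """Hypothetical analogue of the IManageData interface: index in, data out."""

    def get_data(self, query: CIQuery) -> CIData:
        ...


class ColumnStoreIID:
    """Minimal IID data manager: variable k is the k-th column of samples."""

    def __init__(self, columns: Sequence[Sequence[float]]) -> None:
        # Copy the input so later changes to it do not affect the manager.
        self._columns = [list(c) for c in columns]

    def get_data(self, query: CIQuery) -> CIData:
        cols = self._columns
        return CIData(
            x=cols[query.x],
            y=cols[query.y],
            z=[cols[k] for k in query.z],
        )


manager: ManagesData = ColumnStoreIID([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
data = manager.get_data(CIQuery(x=0, y=2, z=(1,)))
print(data.x, data.y, data.z)  # [1.0, 2.0] [5.0, 6.0] [[3.0, 4.0]]
```

Keeping the interface this narrow means a CI test never needs to know how variables are indexed or stored; only the data manager does.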

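We currently provide two implementations: DataManager_NumpyArray_IID for IID data and DataManager_NumpyArray_Timeseries for time-series data.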
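In the time-series case, a (variable index, time-lag) index has to be resolved against a common reference time, so that the rows of all returned columns line up. The following is a hypothetical sketch of that alignment step, not the code of DataManager_NumpyArray_Timeseries: the function name lagged_columns and the convention that a non-negative lag counts steps into the past are assumptions.

```python
from typing import List, Sequence, Tuple


def lagged_columns(
    series: Sequence[Sequence[float]],
    indices: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Resolve (variable index, time-lag) pairs into row-aligned columns.

    Assumed convention (hypothetical): lag k >= 0 means "k steps in the past".
    Row i of every returned column refers to the same reference time
    t = max_lag + i, so the first max_lag time points serve only as history.
    """
    if not indices:
        raise ValueError("at least one index is required")
    if any(lag < 0 for _, lag in indices):
        raise ValueError("lags must be non-negative in this sketch")
    max_lag = max(lag for _, lag in indices)
    n_time = len(series[0])
    return [
        [series[var][t - lag] for t in range(max_lag, n_time)]
        for var, lag in indices
    ]


series = [
    [0.0, 1.0, 2.0, 3.0, 4.0],       # variable 0
    [10.0, 11.0, 12.0, 13.0, 14.0],  # variable 1
]
cols = lagged_columns(series, [(0, 0), (1, 1), (0, 2)])
print(cols)  # [[2.0, 3.0, 4.0], [11.0, 12.0, 13.0], [0.0, 1.0, 2.0]]
```

Only relative lags matter here: shifting every lag in a query by the same amount selects the same aligned rows up to a shift of the reference time, which is why time-series identifiers track relative time-lags.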

Further details, in particular on customization and Cache IDs, can be found in Details on Data Managers.