Data Management
===============

.. py:currentmodule:: GLDF.data_management

.. py:module:: GLDF.data_management
    :synopsis: Specifications and helpers for data-management.

The module :py:mod:`!data_management` specifies interfaces for exposing data and patterns.
It also provides simple baseline implementations of these specifications for basic scenarios,
such as time-series with regimes that persist in time or spatial neighborhood patterns.


.. _label-indexing:

Indexing and CIT Identifiers
----------------------------

.. toctree::
   :maxdepth: 0
   :hidden:

   data_mgmt/CI_ID

*   Variables can be indexed differently for different data-managers. For example, we identify
    variables in an IID setup (:py:class:`DataManager_NumpyArray_IID`) by their integer index,
    but in a time-series setup (:py:class:`DataManager_NumpyArray_Timeseries`) by a tuple
    of the form (variable index, time-lag). This degree of freedom in indexing is
    abstracted by a :py:class:`TypeVar` :py:obj:`var_index`.
*   Independence tests can be indexed relative to the variables involved.
    The class :py:class:`CI_Identifier`\ [\ :py:obj:`var_index`\ ] encodes index information disregarding orientation,
    i.e. independence-tests are assumed symmetric and invariant under permutation of the conditioning
    set. :py:class:`CI_Identifier`\ [\ :py:obj:`var_index`\ ] is typed and documented as a generic class relative to the
    variable-indexing :py:obj:`var_index` used.
*   The specialization :py:class:`CI_Identifier_TimeSeries` of
    :py:class:`CI_Identifier`\ [\ :py:type:`tuple`\ [\ :py:type:`int`\ ,\ :py:type:`int`\ ]]
    implements the same functionality
    for time-series data (here indexing keeps track of relative time-lags).
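
The symmetry assumptions above can be illustrated with a minimal sketch. The class
``SketchCIIdentifier`` below is hypothetical and is not the implementation of
:py:class:`CI_Identifier`; it only shows one way to build an identifier that ignores
the orientation of the tested pair and the order of the conditioning set, while staying
generic in the variable index. The lag values in the time-series example are illustrative.

.. code-block:: python

    from dataclasses import dataclass
    from typing import FrozenSet, Generic, Hashable, Iterable, TypeVar

    # Generic variable index: an int for IID data, a (variable, lag) tuple for time series.
    var_index = TypeVar("var_index", bound=Hashable)


    @dataclass(frozen=True)
    class SketchCIIdentifier(Generic[var_index]):
        """Hypothetical stand-in for illustration only, not the GLDF class."""

        pair: FrozenSet[var_index]              # unordered: X _|_ Y equals Y _|_ X
        conditioning_set: FrozenSet[var_index]  # unordered: permutations are equal

        @classmethod
        def of(cls, x: var_index, y: var_index,
               z: Iterable[var_index] = ()) -> "SketchCIIdentifier[var_index]":
            return cls(frozenset((x, y)), frozenset(z))


    # IID-style indexing by integer index.
    a = SketchCIIdentifier.of(0, 1, [2, 3])
    b = SketchCIIdentifier.of(1, 0, [3, 2])
    same_iid = (a == b) and (hash(a) == hash(b))

    # Time-series-style indexing by (variable index, time-lag) tuples.
    ts_a = SketchCIIdentifier.of((0, 0), (1, 1), [(0, 1)])
    ts_b = SketchCIIdentifier.of((1, 1), (0, 0), [(0, 1)])
    same_ts = ts_a == ts_b

Because such identifiers are hashable and compare equal under reordering, they can serve
as dictionary keys, for instance when caching test results.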


.. _label-data:

Data Representation
-------------------

.. toctree::
   :maxdepth: 0
   :hidden:

   data_mgmt/cit_data_patterned
   data_mgmt/block_view

*   The class :py:class:`CIT_Data` represents the data used to perform a CIT.
*   The class :py:class:`CIT_DataPatterned` (extending :py:class:`CIT_Data`)
    additionally specifies functionality required to implement a pattern-provider, that is,
    it formalizes how to describe prior knowledge about plausible pattern-structure in data.
    Besides this interface-specification, its implementation also provides flexible
    fallbacks for most of this functionality. Most custom pattern-definitions
    will therefore require very little actual code, see for example:

    *   The class :py:class:`CIT_DataPatterned_PersistentInTime` provides an implementation
        for one-dimensional persistent patterns, for example persistent-in-time regimes.
    *   The class :py:class:`CIT_DataPatterned_PesistentInSpace` provides an implementation
        for two-dimensional persistent patterns, for example persistent-in-space regimes.

*   The class :py:class:`BlockView` represents patterned data, that is, data grouped into
    blocks of a specified size.
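
Grouping data into blocks of a specified size can be pictured with plain NumPy. The helper
``group_into_blocks`` below is hypothetical and does not reflect the :py:class:`BlockView` API;
it assumes that samples are stored along the first axis and that an incomplete trailing block
is simply dropped.

.. code-block:: python

    import numpy as np


    def group_into_blocks(data: np.ndarray, block_size: int) -> np.ndarray:
        """Hypothetical helper: reshape (N, ...) into (N // block_size, block_size, ...)."""
        n_blocks = data.shape[0] // block_size
        # Assumption: samples that do not fill a complete block are discarded.
        trimmed = data[: n_blocks * block_size]
        return trimmed.reshape(n_blocks, block_size, *data.shape[1:])


    samples = np.arange(14).reshape(7, 2)   # 7 samples of 2 variables
    blocks = group_into_blocks(samples, block_size=3)
    # blocks has shape (2, 3, 2); the seventh sample is dropped.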

.. seealso::
    Details on how to customize the patterns used are given at :ref:`label-patterns`.


Data Manager
------------

.. toctree::
   :maxdepth: 0
   :hidden:

   data_mgmt/data_managers

A data-manager's task is, given an index-representation (see :ref:`label-indexing`) of a query,
to produce the corresponding data-representation (see :ref:`label-data`).
More formally, it should do so by exposing the :py:class:`IManageData` interface.


We currently provide two implementations:

*   :py:class:`DataManager_NumpyArray_IID` stores (all) data in an immutable numpy-array.
    It uses :py:obj:`var_index` = :py:type:`int`, and is built to handle IID (except for regime-structure)
    data.
*   :py:class:`DataManager_NumpyArray_Timeseries` stores (all) data in an immutable numpy-array.
    It uses :py:obj:`var_index` = :py:type:`tuple[int, int]` encoding variable index and time-lag,
    and is built to handle time-series data.
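
The step from an index-representation to the corresponding data can be sketched as follows.
The helpers ``columns_iid`` and ``columns_timeseries`` are hypothetical and do not implement
:py:class:`IManageData`. The sketch assumes that rows are samples (or time steps), that a
non-negative time-lag counts steps into the past, and that a ``max_lag`` parameter fixes the
common time window. None of these conventions is taken from the library.

.. code-block:: python

    import numpy as np


    def columns_iid(data: np.ndarray, indices: list) -> np.ndarray:
        """Hypothetical helper: select columns by integer variable index."""
        return data[:, list(indices)]


    def columns_timeseries(data: np.ndarray, indices: list, max_lag: int) -> np.ndarray:
        """Hypothetical helper: select lag-shifted columns by (variable, lag) tuples.

        Assumption: lag >= 0 means "lag steps in the past"; all columns are aligned
        on the time steps max_lag, ..., T - 1.
        """
        n_steps = data.shape[0]
        cols = [data[max_lag - lag : n_steps - lag, var] for var, lag in indices]
        return np.column_stack(cols)


    data = np.arange(10).reshape(5, 2)   # 5 time steps, 2 variables
    data.setflags(write=False)           # keep the stored array read-only

    iid_view = columns_iid(data, [1, 0])
    ts_view = columns_timeseries(data, [(0, 0), (1, 1)], max_lag=1)
    # ts_view pairs variable 0 at time t with variable 1 at time t - 1.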

Further details, in particular on customization and :ref:`label-cache-ids`, can be found at
:ref:`label-data-mgr-details`.