
A Mathematical Account of Cell Types


Types

The following table is a gross oversimplification. The relationship between data type and function type can be as granular as the exact technique used to collect data from a particular modality. In some cases function types are matched to technique types, that is, to the data type not of the modality but of the exact technique used to collect data from that modality. Other functions are defined on the domain of all data derived from a particular modality, but those may require some level of preprocessing (and loss of precision) in order not to produce \(\bot\) results.

| Data modality | Type | Data type | Function type |
|---|---|---|---|
| Transcriptomic | \(\tau_{R}\) | \(D_{\tau_{R}}\) | \(f_{\tau_{R}}\) |
| Epigenomic | \(\tau_{E}\) | \(D_{\tau_{E}}\) | \(f_{\tau_{E}}\) |
| Physiology | \(\tau_{P}\) | \(D_{\tau_{P}}\) | \(f_{\tau_{P}}\) |

A note on the commutative behavior of the type notation: \(\tau_{R}\tau_{E} \iff \tau_{RE}\). That is, when dealing with data, every individual with \(D_{\tau_{RE}}\) has data of both \(D_{\tau_{R}}\) and \(D_{\tau_{E}}\). The converse cannot be inferred for populations in which some individuals lack either \(D_{\tau_{R}}\) or \(D_{\tau_{E}}\).
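A minimal Python sketch of the notation, assuming a composite type such as \(\tau_{RE}\) can be modeled as the set of technique labels it is built from (an encoding chosen here for illustration, not one the text prescribes). Because sets are unordered, \(\tau_{R}\tau_{E}\), \(\tau_{E}\tau_{R}\), and \(\tau_{RE}\) all denote the same value. The helper names `tau`, `has_type`, `types_present`, and `all_have_type` are hypothetical.

```python
from typing import FrozenSet, Iterable

# Hypothetical encoding: a (composite) type is the set of technique labels
# it is built from, so tau_R tau_E, tau_E tau_R and tau_RE are the same value.
DataType = FrozenSet[str]


def tau(*techniques: str) -> DataType:
    """Build a type from technique labels, e.g. tau("R", "E") for tau_RE."""
    return frozenset(techniques)


def has_type(individual: DataType, t: DataType) -> bool:
    """An individual has type t iff it has data of every component of t."""
    return t <= individual


def types_present(population: Iterable[DataType]) -> DataType:
    """Every component type that occurs somewhere in the population."""
    present: set = set()
    for individual in population:
        present |= individual
    return frozenset(present)


def all_have_type(population: Iterable[DataType], t: DataType) -> bool:
    """True only if every individual in the population has type t."""
    return all(has_type(individual, t) for individual in population)
```

For the population `[tau("R", "E"), tau("R"), tau("E")]`, `types_present` contains both R and E, yet `all_have_type(..., tau("R", "E"))` is False because only the first individual has both kinds of data. This is why the population-level inference fails.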

Let \(\mathcal{T}_{cell}\) be the set of all techniques that can produce data about a single cell at a time, or about some part of that single cell. Let \(R\), \(E\), and \(P\) each be a single technique from the modalities transcriptomics, epigenomics, and [electro]physiology, respectively. They do not represent the union of all techniques that fall under each modality. If they did, the assertions below would be false, since many techniques within a single modality produce data that is cross-wise incompatible with the functions that operate on each other's data (i.e., \(f_{\tau_{R_1}}(D_{\tau_{R_2}}) \to \bot\) and \(f_{\tau_{R_2}}(D_{\tau_{R_1}}) \to \bot\)).
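To illustrate why \(R\) must name a single technique rather than a whole modality, here is a minimal Python sketch with two hypothetical transcriptomic techniques, `R1` and `R2`, whose outputs are shaped differently. Functions are wrapped so they are defined only on their own technique's data and otherwise return a sentinel standing in for \(\bot\). The `to_common` step and the `R_common` label are assumptions: they show one possible form of the lossy, modality-level preprocessing mentioned at the start of this section.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


class _Bottom:
    """Sentinel standing in for ⊥: the function is undefined on its input."""

    def __repr__(self) -> str:
        return "⊥"


BOTTOM = _Bottom()


@dataclass(frozen=True)
class Data:
    technique: str  # the single technique that produced the data, e.g. "R1"
    modality: str   # the modality that technique belongs to
    payload: Any


def typed(technique: str, body: Callable[[Any], Any]) -> Callable[[Data], Any]:
    """Wrap `body` so it is defined only on data produced by `technique`."""
    def f(d: Data) -> Any:
        if d.technique != technique:
            return BOTTOM  # sharing a modality is not enough
        return body(d.payload)
    return f


# Hypothetical techniques: R1 yields {gene: count}; R2 yields
# [(gene, count, quality), ...]. Both are transcriptomic.
f_R1 = typed("R1", lambda counts: sum(counts.values()))
f_R2 = typed("R2", lambda rows: sum(n for _gene, n, _quality in rows))


def to_common(d: Data) -> Any:
    """Lossy preprocessing into a shared transcriptomic representation.
    R2's quality scores are discarded; that is the loss of precision."""
    if d.technique == "R1":
        counts: Dict[str, int] = dict(d.payload)
    elif d.technique == "R2":
        counts = {gene: n for gene, n, _quality in d.payload}
    else:
        return BOTTOM
    return Data("R_common", d.modality, counts)


# A modality-level function, defined only on preprocessed data.
f_R_common = typed("R_common", lambda counts: sum(counts.values()))
```

Both `f_R1` and `f_R2` return the sentinel on each other's data even though the two inputs share a modality. `f_R_common` accepts either one only after `to_common` has thrown away information.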

Result types. Note that \(x\) and \(x'\) are unrelated as far as the functions that accept them are concerned: \(f_{\tau_{x}}(D_{\tau_{x'}}) \to \bot\) and \(f_{\tau_{x'}}(D_{\tau_{x}}) \to \bot\). The prime convention is used only to make it easier to see that their relationship comes from one being the input type and the other the output type of a given function. It is entirely possible for output types \(\tau_{R_1'}\) and \(\tau_{R_2'}\) to be equivalent; in those cases an explicit assertion or a harder-to-follow type name will be used.

Table 1: Return types.
| Function | Return type |
|---|---|
| \(f_{\tau_{R}}\) | \(\tau_{R'}\) |
| \(f_{\tau_{E}}\) | \(\tau_{E'}\) |
| \(f_{\tau_{P}}\) | \(\tau_{P'}\) |
| \(f_{\tau_{RE}}\) | \(\tau_{RE'}\) |
| \(f_{\tau_{RP}}\) | \(\tau_{RP'}\) |
| \(f_{\tau_{EP}}\) | \(\tau_{EP'}\) |
| \(f_{\tau_{REP}}\) | \(\tau_{REP'}\) |

An overview of the types of processing/analysis functions and how they relate to continuous types and discrete types.

Table 2: Behavior of functions with types operating on a single modality.
| Input data | \(f_{\tau_{R}}\) | \(f_{\tau_{E}}\) | \(f_{\tau_{P}}\) |
|---|---|---|---|
| \(D_{\tau_{R}}\) | \(C_{\tau_{R'}}\) | \(\bot\) | \(\bot\) |
| \(D_{\tau_{E}}\) | \(\bot\) | \(C_{\tau_{E'}}\) | \(\bot\) |
| \(D_{\tau_{P}}\) | \(\bot\) | \(\bot\) | \(C_{\tau_{P'}}\) |

Table 3: Behavior over two modalities.
| Input data | \(f_{\tau_{R}}\) | \(f_{\tau_{E}}\) |
|---|---|---|
| \(D_{\tau_{EP}}\) | \(\bot\) | \(C_{\tau_{E'}}\) |
| \(D_{\tau_{RP}}\) | \(C_{\tau_{R'}}\) | \(\bot\) |
| \(D_{\tau_{RE}}\) | \(C_{\tau_{R'}}\) | \(C_{\tau_{E'}}\) |
| \(D_{\tau_{REP}}\) | \(C_{\tau_{R'}}\) | \(C_{\tau_{E'}}\) |

Table 4: Behavior over three modalities.
| Input data | \(f_{\tau_{REP}}\) |
|---|---|
| \(D_{\tau_{RE}}\) | \(\bot\) |
| \(D_{\tau_{RP}}\) | \(\bot\) |
| \(D_{\tau_{EP}}\) | \(\bot\) |
| \(D_{\tau_{REP}}\) | \(C_{\tau_{REP'}}\) |
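Every entry in Tables 2 to 4 fits one rule: \(f_{\tau_{X}}(D_{\tau_{Y}})\) is defined, with result \(C_{\tau_{X'}}\), exactly when every technique in \(X\) also appears in \(Y\), and is \(\bot\) otherwise. The text does not state this rule outright. The sketch below is a minimal Python reading of it, assuming each technique is a single-letter label and representing \(\bot\) by the string "⊥". The names `apply` and `behavior_table` are hypothetical.

```python
BOTTOM = "⊥"  # stand-in for an undefined result


def apply(f_type: str, d_type: str) -> str:
    """Result of f_{tau_X}(D_{tau_Y}) for X = f_type, Y = d_type.

    Types are written as strings of one-letter technique labels, e.g. "REP".
    The application is defined, yielding C_{tau_X'}, iff every technique in X
    also occurs in Y; otherwise it is ⊥.
    """
    if set(f_type) <= set(d_type):
        return f"C_{f_type}'"
    return BOTTOM


def behavior_table(functions, inputs):
    """Rows are input data types, columns are function types (as in Table 2)."""
    return {d: [apply(f, d) for f in functions] for d in inputs}
```

`behavior_table(["R", "E", "P"], ["R", "E", "P"])` reproduces Table 2 row by row. Because the check compares sets, `apply("RE", "ER")` is also defined, which matches the order-independence of the type notation.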

What we usually mean when we talk about types derived from a particular modality is the set of all types \(\tau_{R'_{1..n}}\): the union of the return types of the family of functions \(\mathcal{F}_{\tau_{R}}\), each of which has input type \(\tau_{R}\) and a return type \(\tau_{R'_{i}}\), where \(n\) is the number of functions in \(\mathcal{F}_{\tau_{R}}\) (possibly infinite).
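A minimal Python sketch of a function family and the union of its return types. It can only cover a finite family, and the `Signature` record, function names, and type strings are hypothetical. Because the union is a set, two functions with equivalent return types contribute a single type, as in the remark about \(\tau_{R_1'}\) and \(\tau_{R_2'}\) above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature of one analysis function."""
    name: str
    input_type: str
    return_type: str


def family(signatures: List[Signature], input_type: str) -> List[Signature]:
    """F_{tau}: every function whose input type is `input_type`."""
    return [s for s in signatures if s.input_type == input_type]


def derived_types(signatures: List[Signature], input_type: str) -> Set[str]:
    """Union of the return types of F_{tau}; equal return types collapse."""
    return {s.return_type for s in family(signatures, input_type)}
```

With signatures `f1: tau_R -> tau_R'_1`, `f2: tau_R -> tau_R'_2`, `f3: tau_R -> tau_R'_1`, and `g1: tau_E -> tau_E'_1`, the family for `tau_R` has three members, but only two types are derived from it.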

Our task, then, is to use this understanding of types to classify a single cell as belonging to one or more types. The first and most trivial type is the type of data that has been collected about it.

A cell that has had data collected about it using technique \(t\) can be said to "have" type \(\tau_{t}\). This is not mathematically precise, but we can sort of cheat here, since techniques and functions never share the same domain: techniques have domains that include non-symbolic entities in the real world, whereas functions have domains that are purely symbolic. As a result, it is reasonable to use the type notation \(\tau_{t}\) for a cell to mean that technique \(t\) has been applied to it, and in the context of data to mean that the data was the symbolic output of technique \(t\). The cell and its data are two sides of the same technique coin.

Formally, a cell \(c\) has type \(\tau_{t}\) iff it was the primary participant in a performance of technique \(t\). Data has type \(\tau_{t}\) iff it was the primary symbolic output of a performance of technique \(t\).

The meaning expands to: cell \(c\) has data about it that is of type \(\tau_{t}\) (possibly hasMeasuredDataOfType). If we consider \(t\) to be a process that produces data of type \(\tau_{t}\), then this can be written as \(t \to D_{\tau_{t}}\). That a cell has type \(\tau_{t}\) is usually incidental, a matter of happenstance, and thus not of particular scientific interest[1]. Practically speaking, however, knowing this type is critical for arriving at types that might be of scientific interest.
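A minimal Python sketch of the two formal has-type definitions, assuming that performances of techniques are recorded as simple records, each with one primary participant and one primary output. The `Performance` record and all helper names are hypothetical, and `measured_data_of_type` is only one possible reading of the hasMeasuredDataOfType relation.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Performance:
    """Hypothetical record of one performance of a technique."""
    technique: str    # the technique t that was performed
    participant: str  # id of the cell that was the primary participant
    output: str       # id of the primary symbolic output (the data)


def cell_has_type(cell: str, t: str, log: Iterable[Performance]) -> bool:
    """Cell c has type tau_t iff it was the primary participant in a performance of t."""
    return any(p.technique == t and p.participant == cell for p in log)


def data_has_type(data: str, t: str, log: Iterable[Performance]) -> bool:
    """Data has type tau_t iff it was the primary output of a performance of t."""
    return any(p.technique == t and p.output == data for p in log)


def measured_data_of_type(cell: str, t: str, log: Iterable[Performance]) -> Set[str]:
    """The data of type tau_t about `cell`: outputs of t -> D_tau_t applied to it."""
    return {p.output for p in log if p.technique == t and p.participant == cell}
```

The same technique label `t` is what gives both the cell and its data their type. This is the "two sides of the same coin" reading in code form.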

Specifically, type \(\tau_{t}\) restricts the set of functions that can be used to process the data to those in family \(\mathcal{F}_{\tau_{t}}\), plus any functions that can process the constituent types of data within \(\tau_{t}\). The constituent types would correspond (probably 1:1) to the outputs of the techniques that are parts of \(t\). Thus for sub-techniques \(t_{s_{1..n}}\) of \(t\), one might have \(n\) types of data \(\tau_{s_{1..n}}\) that compose \(\tau_{t}\) (e.g., a list of objects, each with a different type).
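A minimal Python sketch of how a technique's parts extend the set of usable functions. It assumes the 1:1 case, where each sub-technique contributes exactly one constituent type, and it allows sub-techniques to nest. The `Technique` class, the `families` mapping, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Technique:
    """A technique and its (hypothetical) sub-techniques."""
    name: str
    parts: List["Technique"] = field(default_factory=list)


def constituent_types(t: Technique) -> List[str]:
    """tau_t followed by the types contributed by every nested sub-technique,
    assuming each sub-technique contributes exactly one type (the 1:1 case)."""
    types = [f"tau_{t.name}"]
    for sub in t.parts:
        types.extend(constituent_types(sub))
    return types


def applicable_functions(t: Technique, families: Dict[str, List[str]]) -> List[str]:
    """Functions usable on D_tau_t: the family F_tau_t plus the families
    of all constituent types."""
    functions: List[str] = []
    for ty in constituent_types(t):
        functions.extend(families.get(ty, []))
    return functions
```

For a technique `t` with parts `s1` and `s2`, where `s2` itself has a part `s3`, the constituent types are `tau_t`, `tau_s1`, `tau_s2`, and `tau_s3`. A function family defined on any of them becomes available for `D_tau_t`.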

A consequence of this is that if \(t\) represents the sum total of all measurements that can be made on a cell, then knowing \(\tau_{t}\) immediately bounds the number of types that can ever be asserted for the cell. It does not limit the number of types that can be inferred about the cell, but what we have come to call the experimental type of the cell is fixed and known. All further information relating types derived from \(\tau_{t}\) to types that are mutually exclusive with \(\tau_{t}\) (\(\tau_{\neg{t}}\), maybe?) must come from some heroic experiment that is able to overcome the practical limitations of \(t\).
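One way to make "the types that can ever be asserted" concrete is to treat each function family as edges from its input type to its return types. The assertable types are then every type reachable from \(\tau_{t}\) by chaining functions. This reading is an interpretation, not the author's definition. The minimal Python sketch below assumes that framing, and the `families` mapping and type names are hypothetical.

```python
from typing import Dict, Set


def assertable_types(start: str, families: Dict[str, Set[str]]) -> Set[str]:
    """Every type reachable from `start` by chaining functions.

    `families` maps an input type to the return types of its function family.
    Treating "can be asserted" as reachability is an interpretation.
    """
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for out in families.get(current, set()):
            if out not in seen:
                seen.add(out)
                stack.append(out)
    return seen
```

Suppose `tau_t` leads to `tau_t'` and `tau_t'` leads to `tau_t''`, while `tau_u'` can only be reached from an unrelated `tau_u`. Then `tau_u'` never appears among the assertable types for a cell of type `tau_t`. Linking the two would take information from outside `t`.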

Footnotes:

[1] There are some cases where \(\tau_{t}\) is of scientific interest, but they are not usually about the cell itself. Rather, the scientific interest arises from the fact that different techniques that all attempt to measure similar things often have different biases and different types of systematic error. As a result, \(\tau_{t}\) nearly always needs to be accounted for when doing integrative analysis, if only so that an explicit factor can be added to account for its influence on the variability of the results.

Date: 2021-02-05T10:37:25-08:00

Author: Tom Gillespie

Created: 2022-12-22 Thu 01:38
