
Dataset Design: Temporal Concurrency - What

## dataset-design-and-temporal-concurrency

  • δI : Information-carrying columns
  • δG : Grouping columns (categorical, descriptive)
  • δY : Measurements (e.g., purchase price, height, product ratings)
  • δT : Temporal columns to include dates and temporal hierarchies
  • δE : Record life-cycle tracking columns (for example, effective dates in slowly changing dimension parlance)

Welcome back!

In Part 1, we discussed why the who and when of a problem statement deserve careful treatment: together they provide the context within which a solution is crafted. In this article, we’ll discuss the quantitative building blocks of the problem statement itself.

First, let's define the key measures and related metrics listed in the problem statement:
  • \(\gamma_1\): Average length of stay \(\\\enspace\enspace\enspace\enspace f(x)=\bar{x};\enspace x := \big(t_{i+1}-t_{i}\big)\)

  • \(\gamma_2\): Counts of lapses in medication adherence \(\\\enspace\enspace\enspace\enspace f(x)=\sum{x};\enspace x:=\Big\{\matrix{0, \text{No lapse}\\1, \text{Lapse}}\) (there’s more to this operation which will be covered in Part 3)

  • \(\gamma_3\): Cumulative count of lapses in medication adherence \(\\\enspace\enspace\enspace\enspace f(\gamma_2, j)=\sum_{i=1}^{j}{\gamma_2|i\le{j}}\) (yes, I have not defined \(j\): that will be covered in Part 3)

  • \(\gamma_{4}\): Number of unique members \(\\\enspace\enspace\enspace\enspace f(x)=\#x;\enspace x:=\text{the set of unduplicated member identifiers}\)

  • \(\gamma_5\): Total expenditures \(\\\enspace\enspace\enspace\enspace f(x)=\sum{x};\enspace x:=\text{cost}\)

Note that for each item above, both a measure and a metric were defined. The measure is the content; the metric is the operation applied to that content.

For example, \(\gamma_1\)’s measure is the number of days between dates, and its metric is the mean of that measure.
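To make the measure/metric split concrete, here is a minimal Python sketch that computes all five metrics on a tiny, made-up set of records. Everything in it is an assumption for illustration: the field names (`member_id`, `admit`, `discharge`, `lapse`, `cost`), the values, and the choice to accumulate \(\gamma_3\) in record order (the real treatment of \(j\) and of lapses is deferred to Part 3).

```python
from datetime import date
from itertools import accumulate
from statistics import mean

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"member_id": "A", "admit": date(2023, 1, 1), "discharge": date(2023, 1, 4),
     "lapse": 0, "cost": 1200.0},
    {"member_id": "B", "admit": date(2023, 2, 10), "discharge": date(2023, 2, 15),
     "lapse": 1, "cost": 800.0},
    {"member_id": "A", "admit": date(2023, 3, 3), "discharge": date(2023, 3, 7),
     "lapse": 1, "cost": 500.0},
]

# Measures: the content that the metrics operate on.
stay_days = [(r["discharge"] - r["admit"]).days for r in records]
lapse_flags = [r["lapse"] for r in records]      # 0 = no lapse, 1 = lapse
member_ids = {r["member_id"] for r in records}   # a set removes duplicates
costs = [r["cost"] for r in records]

# Metrics: the operations applied to the measures.
gamma_1 = mean(stay_days)                # average length of stay
gamma_2 = sum(lapse_flags)               # count of lapses (simplified)
gamma_3 = list(accumulate(lapse_flags))  # running total, assuming record order
gamma_4 = len(member_ids)                # number of unique members
gamma_5 = sum(costs)                     # total expenditures

print(gamma_1, gamma_2, gamma_3, gamma_4, gamma_5)
```

Keeping the measures and metrics in separate steps mirrors the definitions above: swapping `mean` for another aggregate changes the metric without touching the measure.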

Next, recall that the dimensions of the problem statement are its who and when. These were discussed in Part 1, so we won’t go into them here. Instead, I want to prepare you for the next article in this series, which addresses the how of the problem statement.

Recall the relationship between metrics and measures: the latter is the content operated on by the former. What they often have in common is that both can be expressed as functions.

Using “Average length of stay” as an example, \(\gamma_1\) can be written as follows:

The measure:

\(f(t_{i+1}, t_i) := t_{i+1}-t_i \Rightarrow \mathbb{F}\)

The metric \((\gamma_1)\):

\(g(\mathbb{F}, k) := k^{-1}{\sum_{j=1}^{k}{\mathbb{F}_j}}\\ \enspace\enspace \equiv k^{-1}{\sum_{j=1}^{k}{\big(t_{i+1}-t_{i}\big)_j}}\)

where \(k\) is the number of observations and \(j\) indexes them.
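As a quick worked instance, assume three hypothetical stays lasting 3, 5, and 4 days (illustrative values, not taken from the problem statement). The measure yields the collection of stay lengths, and the metric averages them:

\(\mathbb{F} = \big(3,\ 5,\ 4\big),\enspace k = 3\)

\(g(\mathbb{F}, 3) = 3^{-1}\big(3 + 5 + 4\big) = 4 \text{ days}\)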

Parameters \(t\) and \(k\) are influenced by Who and When. While the metrics define What to do, the final item to address is How to apply the metrics to the inputs, given Who is involved and When:

\(\Big\langle\) \(\gamma_1\), \(\gamma_2\), \(\gamma_3\), \(\gamma_4\), \(\gamma_5\) \(\Big\rangle\) in the context of \(\Big\langle\) \(W\), \(\omega_1\), \(\omega_2\), \(\omega_3\) \(\Big\rangle\) expressed \(\text{How}\)?

In Part 3, we’ll do just that 🙂. See you in ’24!

Until next time, I wish you much success in your journey as a data practitioner!
Life is data, but data is not life: analyze responsibly!