majortrack.tracker¶
This Python package implements the evolutionary clustering algorithm presented in [LB19].
- [LB19]
Jonas I Liechti and Sebastian Bonhoeffer. A time resolved clustering method revealing longterm structures and their short-term internal dynamics. arXiv preprint arXiv:1912.04261, 2019.
-
class
MajorTrack
(clusterings, history, **kwargs)[source]¶ Bases:
object
- Parameters
clusterings – Sequence of clusterings.
If provided as a `dict`:
- keys: float, datetime
The time points.
- values: list, dict
The membership list of each clustering indicating to which cluster a data source belongs. See memberships for details.
history (int) – sets the number of time points (or slices) the algorithm can maximally go back in time to check for majority matches.
optional parameter (**kwargs) –
- timepoints: list
The time points of each clustering.
Note
If clusterings is of type dict then the keys will be used as time points and this optional parameter is ignored, even if provided.
- slice_widths: list, float (default=None)
The temporal duration of each snapshot in the sequence of clusterings. If not provided, the difference between time points i and i+1 is used as the width of slice i. The width of the last slice is assumed to be the same as the width of the second-to-last slice.
- individuals: list
A list of all distinct data sources present in the dataset.
Todo
Build it from self.clusterings.
- group_matchup_method: str (default=’fraction’)
Set the method to calculate the similarity between two clusters from different clusterings. By default the fraction of identical members is used as explained in the original article [LB19].
- use_lazylists: bool (default=False)
Determines whether LazyLists or normal lists should be used to store data about dynamic clusters. Most likely you want to use normal lists.
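The default described for slice_widths can be sketched as follows. This is only an illustration of the stated rule, not the package's own code; the helper name default_slice_widths is hypothetical.

```python
from datetime import datetime


def default_slice_widths(timepoints):
    """Hypothetical helper: infer slice widths from the time points.

    The width of slice i is timepoints[i + 1] - timepoints[i]; the last
    slice reuses the width of the second-to-last slice.
    """
    if len(timepoints) < 2:
        raise ValueError("need at least two time points to infer widths")
    widths = [t_next - t for t, t_next in zip(timepoints, timepoints[1:])]
    widths.append(widths[-1])  # last slice: same width as the one before
    return widths


float_widths = default_slice_widths([0.0, 1.0, 3.0, 4.5])
date_widths = default_slice_widths(
    [datetime(2020, 1, 1), datetime(2020, 1, 8), datetime(2020, 1, 10)]
)
print(float_widths)
print(date_widths)
```

The same rule works for float and datetime time points, because only differences between neighbouring entries are taken.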
-
clusters
¶ Ensemble of all dynamic clusters.
Todo
What’s the type of an element? Is it just an identifier?
- Type
list or LazyList
-
cluster_trace
¶ Ensemble of all tracing paths of the dynamic clusters.
Todo
What’s the type of an element?
- Type
list or LazyList
-
group_matchup
¶ Holds for each time point the tracing and mapping sets of all clusters. Each element is a dict with the keys 'forward' and 'backward'. Both hold a dict indicating for a cluster the best matching cluster along with the similarity score of the particular relation in a tuple.
Example
    self.group_matchup[1] = {
        'backward': {0: (0, 1.0), ...},
        #            ^   ^  ^
        #            |   |  similarity score
        #            |   cluster from previous time point
        #            cluster from current time point.
    }
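A structure of this shape could be built for one pair of neighbouring clusterings roughly as follows. This is a sketch under assumptions: clusterings are taken to be lists of sets, the similarity is taken to be the fraction of a cluster's members found in the candidate cluster, and 'forward' is taken to map from the previous to the current time point. The helper names best_match and matchup_sketch are hypothetical and not part of the package.

```python
def best_match(cluster, candidates):
    """Return (index, score) of the candidate holding the largest fraction
    of `cluster`'s members, or (None, 0.0) if nothing overlaps."""
    if not cluster:
        return None, 0.0
    best_idx, best_score = None, 0.0
    for idx, candidate in enumerate(candidates):
        score = len(cluster & candidate) / len(cluster)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


def matchup_sketch(prev_clustering, curr_clustering):
    """Assumed layout: each direction maps a cluster index to a tuple
    (best matching cluster index, similarity score)."""
    return {
        # current cluster -> best cluster at the previous time point
        'backward': {
            i: best_match(c, prev_clustering)
            for i, c in enumerate(curr_clustering)
        },
        # previous cluster -> best cluster at the current time point
        'forward': {
            i: best_match(c, curr_clustering)
            for i, c in enumerate(prev_clustering)
        },
    }


prev = [{'a', 'b', 'c'}, {'d', 'e'}]
curr = [{'a', 'b'}, {'c', 'd', 'e'}]
m = matchup_sketch(prev, curr)
print(m['backward'])
print(m['forward'])
```

In this toy case the current cluster 0 sits entirely inside the previous cluster 0, so its backward score is 1.0, while the current cluster 1 draws two thirds of its members from the previous cluster 1.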
- Type
-
group_mappings
¶ Holds for each slice a list of mapping sets. The list is ordered like groupings.
Example
    mt = MajorTrack(...)
    idx, cluster_id = 0, 1
    # get the set of data sources in this cluster
    c_members = mt.clusterings[idx][cluster_id]
    # get the corresponding mapping set (from index idx + 1)
    mapping_set = mt.group_mappings[idx][cluster_id]
-
group_tracings
¶ Holds for each slice a list of tracing sets. The list is ordered like groupings.
-
group_mappers
¶ Holds for each slice a list of mapper sets. The list is ordered like groupings.
-
group_tracers
¶ Holds for each slice a list of tracer sets. The list is ordered like groupings.
-
comm_group_members
¶ Todo
Unsure about this.
- Type
?
-
comm_members
¶ Holds for each slice of the dataset a dictionary indicating for each cluster (key) a list of data sources (values).
Todo
Rename to dc_members
-
individual_group_membership
¶ Holds for each slice of the dataset a dictionary indicating for each data source the cluster it belongs to.
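A per-slice membership dictionary of this kind can be derived from the clusterings by inverting them. The sketch below assumes each clustering is a list of sets and uses the position of a set in that list as the cluster identifier; the helper name membership_per_slice is hypothetical.

```python
def membership_per_slice(clusterings):
    """Hypothetical helper: for each slice return {data_source: cluster_index}.

    Assumes `clusterings` is a list (one entry per slice) of lists of sets,
    each set holding the data sources of one cluster.
    """
    memberships = []
    for clustering in clusterings:
        slice_membership = {}
        for cluster_idx, members in enumerate(clustering):
            for source in members:
                slice_membership[source] = cluster_idx
        memberships.append(slice_membership)
    return memberships


clusterings = [
    [{'a', 'b'}, {'c'}],  # slice 0
    [{'a'}, {'b', 'c'}],  # slice 1
]
membership = membership_per_slice(clusterings)
print(membership)
```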
-
individual_membership
¶ Holds for each slice of the dataset a dictionary indicating for each data source the dynamic cluster it belongs to.
-
community_births
¶ Holds all dynamic cluster birth events.
Todo
Check and report format of this attribute.
-
community_deaths
¶ A list holding all dynamic cluster death events.
Todo
Check and report format of this attribute.
-
community_lifespans
¶ Provides for each dynamic cluster its lifespan in units of slices:
{comm_id: nbr_slices}
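One way to obtain a dictionary of this form is to count the slices in which a dynamic cluster has at least one member. The sketch below assumes a per-slice mapping from data source to dynamic cluster, as described for individual_membership; whether the package counts lifespans in exactly this way is an assumption, and the helper name lifespans_in_slices is hypothetical.

```python
from collections import Counter


def lifespans_in_slices(individual_membership):
    """Hypothetical helper: {dynamic_cluster: number_of_slices}.

    `individual_membership` is assumed to be a list with one dict per
    slice, mapping each data source to its dynamic cluster.
    """
    counts = Counter()
    for slice_membership in individual_membership:
        # each dynamic cluster present in this slice counts once
        counts.update(set(slice_membership.values()))
    return dict(counts)


individual_membership = [
    {'a': 0, 'b': 0, 'c': 1},
    {'a': 0, 'b': 1, 'c': 1},
    {'a': 0, 'b': 0},
]
lifespans = lifespans_in_slices(individual_membership)
print(lifespans)
```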
- Type
-
community_cby_split_merges
¶ Dynamic clusters that emerged through a split-merge event.
-
community_dby_split_merges
¶ Dynamic clusters that vanished after a split-merge event.
-
community_growths
¶ Reports all growth events, i.e. increases in the size of a dynamic cluster that are not related to split or merge events.
-
community_shrinkages
¶ Reports all shrinkage events, i.e. decreases in the size of a dynamic cluster that are not related to split or merge events.
-
community_autocorrs
¶ Holds for each dynamic cluster a dictionary with the auto-correlation (value) between the slice at a given index (key) and the previous slice. The auto-correlation is given by:
\[\frac{|dc_{i} \cap dc_{j}|}{|dc_{i} \cup dc_{j}|_{res}}\]
where \(i, j\) are the indices from_idx and to_idx and \(|<selection>|_{res}\) is the number of data sources within <selection>, counting all data sources (if residents=False) or only those present in both slices (if residents=True).
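The formula can be illustrated with plain sets. This is a sketch of the expression as written here, not the package's implementation; the function name autocorr_sketch and its arguments (the members of the dynamic cluster and the full populations of both slices) are assumptions, as is the choice to return 0.0 for an empty denominator.

```python
def autocorr_sketch(dc_i, dc_j, pop_i, pop_j, residents=True):
    """Hypothetical helper: |dc_i & dc_j| / |dc_i | dc_j|_res.

    dc_i, dc_j: members of one dynamic cluster in slices i and j.
    pop_i, pop_j: all data sources present in slices i and j.
    """
    shared = dc_i & dc_j
    union = dc_i | dc_j
    if residents:
        # count only data sources that are present in both slices
        union &= pop_i & pop_j
    if not union:
        return 0.0  # assumption of this sketch: empty selection gives 0
    return len(shared) / len(union)


pop_i = {'a', 'b', 'c', 'd'}
pop_j = {'a', 'b', 'c', 'e'}
dc_i = {'a', 'b', 'd'}
dc_j = {'a', 'b', 'c'}
with_residents = autocorr_sketch(dc_i, dc_j, pop_i, pop_j, residents=True)
without_residents = autocorr_sketch(dc_i, dc_j, pop_i, pop_j, residents=False)
print(with_residents, without_residents)
```

With residents=True the data source 'd', which left the population, is dropped from the denominator, so the value is 2/3 instead of 2/4.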
-
combined_population
(idx_prev=None, idx_next=None, *args, **kwargs)[source]¶ Returns combination of the populations of two (or more) time points.
This is simply the union of the populations at both time points. If no arguments are provided then an iterator is returned that gets the set of combined individuals for each pair of consecutive slices.
If only one index is provided then the other one will be completed, i.e. idx_prev = idx_next - 1 or idx_next = idx_prev + 1.
If further arguments are provided (all have to be unnamed), then the union is taken between all of these time points.
Example
self.combined_population(2, 4, 5)
This will return the combined population of the time points 2, 4 and 5.
- Parameters
idx_prev (int (default=None)) – index of the 1st time point to get the population from.
idx_next (int (default=None)) –
index of the 2nd time point to get the population from.
Note
If both idx_prev and idx_next are None then a pairwise iterator is returned that allows looping over the combined population of neighbouring time points.
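The argument handling described for this method can be mimicked on a plain list of population sets. The sketch below is an illustration under assumptions, not the method itself: populations is a hypothetical list with one set of data sources per time point, and combined_population_sketch is a hypothetical stand-alone function.

```python
def combined_population_sketch(populations, idx_prev=None, idx_next=None, *more):
    """Hypothetical stand-alone version of the described behaviour."""
    if idx_prev is None and idx_next is None:
        # no indices: pairwise iterator over neighbouring time points
        return (a | b for a, b in zip(populations, populations[1:]))
    if idx_prev is None:
        idx_prev = idx_next - 1  # complete the missing index
    elif idx_next is None:
        idx_next = idx_prev + 1
    indices = (idx_prev, idx_next, *more)
    return set().union(*(populations[i] for i in indices))


populations = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}, {'d'}]
one_index = combined_population_sketch(populations, 0)
three_points = combined_population_sketch(populations, 0, 2, 3)
pairwise = list(combined_population_sketch(populations))
print(one_index, three_points, pairwise)
```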
-
get_alluvialdiagram
(axes, iterator=None, cluster_width=datetime.timedelta(1), *args, **kwargs)[source]¶ Takes a matplotlib axes and draws an alluvial diagram on it. iterator is the iterator to draw the clusters for. If iterator is not provided, then the alluvial diagram will contain all the clusterings in the time series.
- Parameters
kwargs –
- cluster_location: ‘center’ (default), ‘start’ or ‘end’. Location within the aggregation time window where the cluster should be put.
- cluster_width: width of the clusters
- cluster_label: None (default), ‘groupsize’, ‘group_index’
- merged_edgecolor: edgecolor of merged groups
- merged_facecolor: facecolor of merged groups
- merged_linewidth: linewidth of merged groups
- cluster_facecolor: either a single color or a dict with idx as keys, each holding a dict with group_id as key
- cluster_edgecolor: either a single color or a dict with idx as keys, each holding a dict with group_id as key
- flux_facecolor: either a single color or a dict with idx as keys, each holding a dict with a cluster tuple as key, where the first element is a group id from time step idx and the second a group id from time step idx+1
- new_coloring: False
- distinct_colors: can be an instance of DistinctColors that will be used for the coloring
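The nested color dictionaries described above could look like the following. The indices, group ids and color names are made-up illustration values, and the commented call assumes an existing MajorTrack instance mt and a matplotlib axes object.

```python
# idx -> {group_id: color}
cluster_facecolor = {
    0: {0: 'tab:blue', 1: 'tab:orange'},
    1: {0: 'tab:blue', 1: 'tab:orange'},
}

# idx -> {(group id at idx, group id at idx + 1): color}
flux_facecolor = {
    0: {(0, 0): 'lightgrey', (1, 0): 'lightgrey', (1, 1): 'tab:orange'},
}

alluvial_kwargs = {
    'cluster_location': 'center',
    'cluster_label': 'groupsize',
    'cluster_facecolor': cluster_facecolor,
    'cluster_edgecolor': 'black',  # a single color is also allowed
    'flux_facecolor': flux_facecolor,
}

# Assumed usage, requires a MajorTrack instance `mt` and matplotlib `axes`:
# mt.get_alluvialdiagram(axes, **alluvial_kwargs)
print(sorted(alluvial_kwargs))
```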
-
get_auto_corrs
(residents=True)[source]¶ Get the auto-correlation between any two consecutive slices.
This method computes for all dynamic clusters the auto-correlation between any two consecutive slices, if the dynamic community exists in both. If residents==True, then only the individuals present in both time points are considered.
-
get_community_avg_lifespan
(mode='ensemble')[source]¶ Determines the average lifespan of all dynamic clusters.
- Parameters
mode (str (default='ensemble')) – Determines what type of average should be computed. Possible values are ‘ensemble’ (default) and ‘weighted_per_indiv_per_slice’. The ensemble average is the arithmetic mean of all lifespans. The weighted_per_indiv_per_slice average is the mean lifespan of the dynamic cluster that a randomly picked data source belongs to during a randomly picked slice.
- Returns
avg_dc_lifespan – the average number of slices a dynamic cluster exists.
- Return type
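The difference between the two averaging modes can be illustrated as follows. This is a sketch of the two definitions given for mode, not the method's actual code; the function name avg_lifespan_sketch and its two arguments (a lifespans dict and a per-slice membership list) are assumptions.

```python
from statistics import mean


def avg_lifespan_sketch(lifespans, individual_membership, mode='ensemble'):
    """Hypothetical helper illustrating the two averaging modes.

    lifespans: {dynamic_cluster: number_of_slices}
    individual_membership: one dict per slice, {data_source: dynamic_cluster}
    """
    if mode == 'ensemble':
        # every dynamic cluster counts once
        return mean(lifespans.values())
    if mode == 'weighted_per_indiv_per_slice':
        # every (data source, slice) pair counts once
        samples = [
            lifespans[dc]
            for slice_membership in individual_membership
            for dc in slice_membership.values()
        ]
        return mean(samples)
    raise ValueError(f"unknown mode: {mode!r}")


lifespans = {0: 3, 1: 2}
individual_membership = [
    {'a': 0, 'b': 0, 'c': 1},
    {'a': 0, 'b': 1, 'c': 1},
    {'a': 0, 'b': 0},
]
ensemble_avg = avg_lifespan_sketch(lifespans, individual_membership)
weighted_avg = avg_lifespan_sketch(
    lifespans, individual_membership, mode='weighted_per_indiv_per_slice'
)
print(ensemble_avg, weighted_avg)
```

The weighted mode gives long-lived and large dynamic clusters more weight, because they are sampled once per member and per slice.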
-
get_community_births
()[source]¶ Determines all birth events.
- Returns
None – Adds new attributes:
community_births
- Return type
-
get_community_deaths
()[source]¶ Determines all dynamic community death events.
- Returns
None – Adds new attributes:
community_deaths
- Return type
-
get_community_group_membership
()[source]¶ Defines per time point the list of clusters belonging to each dynamic cluster.
- Returns
None – Adds new attributes:
comm_group_members
comm_all
comm_nbr
- Return type
-
get_community_growths
()[source]¶
- Returns
None – Adds new attributes:
community_growths
- Return type
-
get_community_lifespans
()[source]¶ Determines the lifespans of all dynamic clusters.
- Returns
None – Adds new attributes:
community_lifespans
- Return type
-
get_community_membership
()[source]¶ Defines for each time point a membership list of data sources for each existing dynamic cluster.
- Returns
None – Adds new attributes:
comm_members
- Return type
-
get_community_merges
()[source]¶ Get all merge events and determine what clusters arise through pure merge events.
- Returns
None – Adds new attributes:
community_merges
community_cby_merges
community_dby_merges
- Return type
-
get_community_shrinkages
()[source]¶
- Returns
None – Adds new attributes:
community_shrinkages
- Return type
-
get_community_splits
()[source]¶ Get all split events and determine what clusters arise through a pure split event, i.e. not a split-merge combination.
- Returns
None – Adds new attributes:
community_splits
community_cby_splits
community_cby_split_merges
community_dby_splits
community_dby_split_merges
- Return type
-
get_dcs
(bijective_paths=True, **kwargs)[source]¶ Derives the history of dynamic clusters from group_matchup.
Todo
Rename to get_dc
- Parameters
bijective_paths (bool (default=True)) – If set to True then at each step in the construction of the tracing flow a mapping flow needs to map forward to the target cluster in order to continue to extend the tracing flow.
optional parameter (**kwargs) –
- from_idx: int
starting index.
Note
At the starting index all clusters are per definition new dynamic clusters.
- to_idx: int
Stopping index. The community detection algorithm will stop at this index (including it).
-
get_flow
(idx, source_set, bwd=True, max_dist=None, **kwargs)[source]¶
- Parameters
idx (int) – time series index defining the starting point
source_set (set) – set of clusters at the starting point slice.
bwd (bool (default=True)) – indicating the direction, True is backward, False forward.
max_dist (int (default=None)) – set the maximal length of the flow.
optional parameter (**kwargs) –
- majority: bool (default=True)
Specifies whether only the majority should be used to move between time points.
- validate_path: function (default=_from_flow)
Provide a validation method to use during the construction of a flow.
- Returns
flow – identity flow starting (including) from the source set.
- Return type
-
get_group_matchup
(matchup_method=None)[source]¶ Determine majority relation between neighbouring snapshots.
- Parameters
matchup_method (str (default=None)) – If provided, this overrides group_matchup_method. It determines the method to use when calculating similarities between clusters from neighbouring snapshots.
- Returns
self – with new attribute group_matchup.
- Return type
-
get_individual_group_membership
()[source]¶ Defines for each time point a dict holding for each data source its cluster membership.
- Returns
None – Adds new attributes:
individual_group_membership
- Return type
-
get_individual_membership
()[source]¶ Defines for each time point a dict holding for each data source its dynamic cluster membership.
- Returns
None – Adds new attributes:
individual_membership
- Return type
-
get_marginal_flows
(idx, included_flows)[source]¶ Determines the ensemble of marginal clusters given a target cluster and its identity flow.
-
get_span
(idx, span_set, get_indivs=True)[source]¶ Create the tracer tree.
- Parameters
idx (int) – index of the slice in which to start.
If an int is provided it specifies the index of the target cluster. If a str is given, it is considered as the label of a data source and the containing cluster is selected.
Todo
The label of a cluster should be the only option.
get_indivs (bool (default=True)) – If set to True, a list of sets of individuals is returned for each slice starting from the index. If set to False, a list of cluster labels is returned for each slice.
-
resident_fraction
(idx_prev=None, idx_next=None, *args)[source]¶ Get the resident fraction of the combined population of two (or more) slices.
This indicates the population fraction that is present at all time points (or slices).
This is simply the size of the intersection divided by the size of the union of the populations. If further arguments are provided (all have to be unnamed), then the resident fraction is computed between all of these time points.
- Parameters
- Returns
resident_fraction – indicating the fraction of the population of data sources (union of all) that is present in all slices. If no values for the parameters idx_prev and idx_next are provided this method returns an iterator that will yield the fraction of the resident population between any two consecutive slices.
- Return type
float, iterator
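The ratio of intersection to union can be written down directly with sets. The sketch below is an illustration, not the method itself; populations is a hypothetical list with one set of data sources per slice, and resident_fraction_sketch is a hypothetical stand-alone function. Returning 0.0 for an empty union is a choice of this sketch.

```python
def resident_fraction_sketch(populations, *indices):
    """Hypothetical helper: |intersection| / |union| of selected slices."""
    if not indices:
        raise ValueError("provide at least one slice index")
    selected = [populations[i] for i in indices]
    residents = set.intersection(*selected)  # present in all slices
    combined = set.union(*selected)          # present in any slice
    return len(residents) / len(combined) if combined else 0.0


populations = [{'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd'}]
two_slices = resident_fraction_sketch(populations, 0, 1)
three_slices = resident_fraction_sketch(populations, 0, 1, 2)
# pairwise values between consecutive slices, as the iterator variant yields
pairwise = [
    resident_fraction_sketch(populations, i, i + 1)
    for i in range(len(populations) - 1)
]
print(two_slices, three_slices, pairwise)
```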
-
resident_population
(idx_prev=None, idx_next=None, *args, **kwargs)[source]¶ Return the resident population between two time points.
The resident population is simply the intersection of the populations at both time points.
If no arguments are provided then an iterator is returned that gets the set of resident individuals for each pair of consecutive slices.
If only one index is provided then the other one will be completed, i.e. idx_prev = idx_next - 1 or idx_next = idx_prev + 1.
If further arguments are provided (all have to be unnamed), then the intersection is taken between all of these time points.
Example:
self.resident_population(2, 4, 5) will return the resident population between the time points 2, 4 and 5.