\noindent \textbf{AG1:} Understanding the different characteristics of email threads in terms of the patterns of communication \\
\noindent \textbf{AG2:} Building computational models of thread characterisations that would help identify and categorise the various types of communication \\
\noindent \textbf{AG3:} Identifying threads, emails or individuals of interest utilising the understanding and modelling of email threads
Stage-1: User-centred Visualisation Design
The usual User-Centred Process
Overall Goal: Understand and model different communication patterns
You don’t know the labels for these in advance, i.e., which different patterns exist
But you do know the different conceptual aspects/characteristics, e.g., events (inclusion/exclusion) over time, changes in the set of people involved, and the different temporal characteristics (e.g., send/receive frequency, overall length, etc.)
Outcomes from this stage:
A list of tasks and visualisation designs to address them
Feature engineering that is also informed by the characteristics of communication from Stage-0
Initial feature debugging with visualisation
Outcomes:
The features
Note: Stage-0 and Stage-1 are tightly interlinked. One goes through an iterative debugging phase in which the features are used to assess how effective the visualisation is (i.e., how well the visualisation visually distinguishes the characteristic captured by the feature), and the visualisation is used to assess how good the features are (i.e., how well the feature captures the characteristic).
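As a rough illustration of the kind of features the characteristics above could yield (events over time, changes in the set of people involved, temporal characteristics), the following Python sketch computes a few per-thread values. The record layout (`sent`, `sender`, `recipients`), the function name `thread_features`, and the choice to count inclusions/exclusions relative to the immediately preceding email are all assumptions made for illustration, not the feature set of this work.

```python
from datetime import datetime


def thread_features(emails):
    """Compute simple per-thread features.

    `emails` is a list of dicts with the hypothetical keys
    'sent' (datetime), 'sender' (str) and 'recipients' (list of str).
    """
    emails = sorted(emails, key=lambda e: e["sent"])

    # Temporal characteristics: overall length and gaps between emails.
    duration_hours = (emails[-1]["sent"] - emails[0]["sent"]).total_seconds() / 3600.0
    gaps = [
        (b["sent"] - a["sent"]).total_seconds() / 3600.0
        for a, b in zip(emails, emails[1:])
    ]

    # Inclusion/exclusion events: people who appear or disappear
    # compared with the previous email (an assumed definition).
    inclusions = exclusions = 0
    previous = None
    everyone = set()
    for e in emails:
        people = {e["sender"], *e["recipients"]}
        if previous is not None:
            inclusions += len(people - previous)
            exclusions += len(previous - people)
        previous = people
        everyone |= people

    return {
        "n_emails": len(emails),
        "duration_hours": duration_hours,
        "mean_gap_hours": sum(gaps) / len(gaps) if gaps else 0.0,
        "n_participants": len(everyone),
        "inclusions": inclusions,
        "exclusions": exclusions,
    }


# Toy thread (invented data): carol is added, then bob is dropped.
thread = [
    {"sent": datetime(2020, 1, 1, 9), "sender": "alice", "recipients": ["bob"]},
    {"sent": datetime(2020, 1, 1, 10), "sender": "bob", "recipients": ["alice", "carol"]},
    {"sent": datetime(2020, 1, 1, 12), "sender": "alice", "recipients": ["carol"]},
]
features = thread_features(thread)
```

Each value maps to one visual encoding that can be checked in the debugging loop, e.g., whether threads with many inclusions actually look different in the visualisation.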
Stage-3: Initial Generation of Classes
It is OK to think of this stage as an unsupervised setting where there are no labels and the classes are not really known
The user can start in an exploratory manner by marking instances that “seem to be interesting” through an initial investigation of the features
An initial set of classes can be formed in this initial iteration
This process can be supported by an initial clustering of the space using an unsupervised approach (see the screens in Supplementary note-1)
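One possible way to obtain such an initial clustering is k-means over the per-thread feature vectors. These notes do not name a specific algorithm, so k-means, the function name `kmeans`, and the toy data below are assumptions; this is a minimal sketch in plain Python rather than a recommended implementation.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: returns (assignment, centroids).

    `points` is a list of equal-length numeric tuples/lists,
    e.g. one feature vector per thread.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy feature vectors (invented): two visibly separate groups of threads.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, k=2)
```

The resulting cluster ids are only a starting point: the user can inspect each cluster in the visualisation and decide whether it corresponds to a meaningful class.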
Stage-4: Active Learning a Classification/Clustering Model
Using ideas from active learning to improve the model
Sample good examples to get the user to label
NOTE: Unlike the classical active learning approach, where the samples are chosen based on a heuristic, we take a visualisation-centred approach and get the user to choose instances that they think might be good to label. Of course, we also need to support the user in this process, both through automatic guidance (show similar threads) and by building views that help the user (the thread overview)
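The "show similar threads" guidance could, for instance, be a nearest-neighbour lookup in feature space. The sketch below assumes each thread is already a numeric feature vector and uses z-score scaling with Euclidean distance; both choices, the function names, and the toy vectors are assumptions for illustration only.

```python
import math
import statistics


def standardise(vectors):
    """Z-score each feature column so no single feature dominates the distance."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [
        [(x - m) / s for x, m, s in zip(v, means, stds)]
        for v in vectors
    ]


def similar_threads(vectors, query_index, n=3):
    """Return the indices of the n threads closest to the selected thread."""
    scaled = standardise(vectors)
    query = scaled[query_index]
    others = [i for i in range(len(scaled)) if i != query_index]
    others.sort(key=lambda i: math.dist(query, scaled[i]))
    return others[:n]


# Toy vectors (invented): [number of emails, duration in hours].
vectors = [[3, 1.0], [4, 1.2], [30, 48.0], [28, 50.0]]
nearest_to_first = similar_threads(vectors, query_index=0, n=1)
nearest_to_third = similar_threads(vectors, query_index=2, n=1)
```

When the user selects a thread in the overview, the returned indices could be highlighted so that a label can be applied to several similar threads at once.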
Supplementary notes
Supplementary note-0:
Some cornerstone ideas:
Visualisation-driven heuristic numeric metrics to characterise/explore the threads
These are the metrics that will enable us to categorise the threads
These are the metrics that will enable us to choose the “similar threads”
The exploration is what will lead to the initial labelling of the threads
We can list similar threads to help the user navigate and choose better instances to label
Over time, we can learn the threads’ characteristics by getting more and more labels from the users
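To make the idea of learning from accumulating labels concrete, here is a minimal sketch that refits a nearest-centroid classifier each time the user adds a label. The nearest-centroid model, the function names `fit_centroids` and `predict`, and the class names are assumptions chosen for brevity; any classifier that can be retrained cheaply would play the same role.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Compute one mean vector per class from the user-labelled threads.

    `labels` maps thread index -> class name and covers labelled threads only.
    """
    groups = defaultdict(list)
    for index, label in labels.items():
        groups[label].append(vectors[index])
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(vector, centroids):
    """Assign the class whose centroid is closest to the thread."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy vectors (invented): [number of emails, duration in hours].
vectors = [[3, 1.0], [4, 1.2], [30, 48.0], [28, 50.0]]

# First iteration: the user has labelled two threads.
labels = {0: "short", 2: "long"}
centroids = fit_centroids(vectors, labels)
first_guess = predict(vectors[1], centroids)

# Later iteration: one more label arrives and the model is refitted.
labels[3] = "long"
centroids = fit_centroids(vectors, labels)
second_guess = predict(vectors[3], centroids)
```

Predictions for unlabelled threads can then be shown back in the visualisation, where the user confirms or corrects them in the next iteration.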
What if the feature set is not sufficient — create more?
Active Learning Framework
High-level problem characterisation