Active Learning in ThreadSift

Active Learning Framework


High-level problem characterisation 

  • AG1: Understanding the different characteristics of email threads in terms of their communication patterns
  • AG2: Building computational models of thread characteristics that help identify and categorise the various types of communication
  • AG3: Identifying threads, emails or individuals of interest using this understanding and modelling of email threads

Stage-1: User-centred Visualisation Design
  • Follows the usual user-centred design process
  • Overall Goal: Understand and model different communication patterns
  • The labels for these patterns are not known in advance, i.e., it is not yet clear which distinct patterns exist
  • However, the conceptual aspects/characteristics of interest are known, e.g., inclusion/exclusion events over time, changes in the set of people involved, and temporal characteristics (e.g., send/receive frequency, overall length)
  • Outcome of this stage: a list of tasks and the visualisation designs that address them

Stage-2: Visualisation-informed Feature Engineering 
  • Feature engineering that is also informed by the characteristics of communication identified in Stage-1
  • Initial feature debugging with visualisation
  • Outcome: the engineered feature set
Note: Stage-1 and Stage-2 are tightly interlinked. They go through an iterative debugging phase in which the features are used to assess how effective the visualisation is (i.e., how well it visually distinguishes the characteristic a feature captures), and the visualisation is used to assess how good the features are (i.e., how well each feature captures the characteristic it targets).
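As a rough illustration of Stage-2, the communication characteristics listed under Stage-1 (inclusion/exclusion events, changes in the set of people involved, send frequency, overall length) could be turned into numeric per-thread features along the following lines. This is a minimal sketch, not ThreadSift's actual feature set: the `Email` record, its fields, the `thread_features` helper and every feature name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Email:
    # Hypothetical minimal email record; the field names are assumptions.
    sent_at: datetime
    sender: str
    recipients: frozenset[str]


def thread_features(thread: list[Email]) -> dict[str, float]:
    """Compute a few illustrative numeric features for one email thread."""
    if not thread:
        raise ValueError("a thread needs at least one email")
    emails = sorted(thread, key=lambda e: e.sent_at)
    times = [e.sent_at for e in emails]
    # Overall length: hours between the first and the last email.
    duration_h = (times[-1] - times[0]).total_seconds() / 3600
    # Send-frequency proxy: mean gap in hours between consecutive emails.
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    mean_gap_h = mean(gaps) if gaps else 0.0
    # People involved in each email: the sender plus all recipients.
    groups = [{e.sender} | set(e.recipients) for e in emails]
    # Inclusion/exclusion events: people added or dropped from one email to the next.
    joins = sum(len(b - a) for a, b in zip(groups, groups[1:]))
    drops = sum(len(a - b) for a, b in zip(groups, groups[1:]))
    return {
        "n_emails": len(emails),
        "duration_h": duration_h,
        "mean_gap_h": mean_gap_h,
        "n_participants": len(set().union(*groups)),
        "n_senders": len({e.sender for e in emails}),
        "joins": joins,
        "drops": drops,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    example = [
        Email(t0, "a", frozenset({"b"})),
        Email(t0 + timedelta(hours=2), "b", frozenset({"a", "c"})),
        Email(t0 + timedelta(hours=6), "c", frozenset({"a"})),
    ]
    print(thread_features(example))
```

Each feature is one number per thread, so it can be plotted directly and checked against the corresponding visualisation during the debugging loop described in the note.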

Stage-3: Initial Generation of Classes
  • This stage can be thought of as an unsupervised setting: there are no labels yet and the classes are not known in advance
  • The user can start exploring by marking instances that “seem interesting” during an initial investigation of the features
  • An initial set of classes can be formed in this first iteration
  • This process can be supported by an initial unsupervised clustering of the feature space (see Supplementary note-1)
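A minimal sketch of the initial clustering, assuming each thread has already been reduced to a numeric feature vector. The notes do not name the unsupervised method; plain k-means (Lloyd's algorithm) is swapped in here as one common choice, and `standardise` and `kmeans` are hypothetical helper names. Features are z-scored first so that, for example, a duration in hours does not dominate a participant count.

```python
import math
import random


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each feature column; constant columns become all zeros."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population std; fall back to 1.0 so constant features do not divide by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    # Initialise with k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign, centroids
```

Each resulting cluster could then be shown to the user as a candidate class to inspect and refine; `k` still has to be picked by the user or by a separate criterion.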

Stage-4: Active Learning a Classification/Clustering Model
  • Using ideas from active learning to improve the model
  • Sample informative examples for the user to label
  • NOTE: Unlike classical active learning, where samples are chosen by a heuristic (e.g., model uncertainty), we take a visualisation-centred approach in which the user chooses the instances they think are worth labelling. The user also needs support in this process, both through automatic guidance (e.g., showing similar threads) and through dedicated views (e.g., the thread overview).
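The user-driven loop could be sketched as follows, assuming threads are stored as numeric feature vectors keyed by a thread id. The user picks a thread and labels it, the model is refit on all labels so far, and the chosen thread's nearest unlabelled neighbours are returned with predicted labels as the “similar threads” guidance. The nearest-centroid classifier, Euclidean similarity, the placeholder labels, and all names (`similar_threads`, `NearestCentroidModel`, `label_round`) are illustrative assumptions, not the model ThreadSift uses.

```python
import math
from collections import defaultdict


def similar_threads(query_id: str, features: dict[str, list[float]], k: int = 5) -> list[str]:
    """Ids of the k threads closest to the query in feature space (Euclidean)."""
    q = features[query_id]
    others = [tid for tid in features if tid != query_id]
    return sorted(others, key=lambda tid: math.dist(q, features[tid]))[:k]


class NearestCentroidModel:
    """Minimal classifier: one mean feature vector (centroid) per user label."""

    def __init__(self) -> None:
        self.centroids: dict[str, list[float]] = {}

    def fit(self, features: dict[str, list[float]], labels: dict[str, str]) -> None:
        groups: dict[str, list[list[float]]] = defaultdict(list)
        for tid, label in labels.items():
            groups[label].append(features[tid])
        self.centroids = {label: [sum(col) / len(vecs) for col in zip(*vecs)]
                          for label, vecs in groups.items()}

    def predict(self, vec: list[float]) -> str:
        return min(self.centroids, key=lambda label: math.dist(vec, self.centroids[label]))


def label_round(model, features, labels, chosen_id, user_label, k=3):
    """One interaction: store the user's label for a thread of their choice,
    refit the model, and return the still-unlabelled threads among the chosen
    thread's k nearest neighbours, each paired with its predicted label."""
    labels[chosen_id] = user_label
    model.fit(features, labels)
    return [(tid, model.predict(features[tid]))
            for tid in similar_threads(chosen_id, features, k)
            if tid not in labels]
```

Unlike uncertainty sampling, nothing here decides which thread gets labelled next; the returned suggestions only inform the user's own choice.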

Supplementary notes

Supplementary note-0:

Some cornerstone ideas:
  • Visualisation-driven heuristic numeric metrics to characterise/explore the threads
  • These are the metrics that will enable us to categorise the threads
  • These are the metrics that will enable us to choose the “similar threads”
  • The exploration is what will lead to the initial labelling of the threads
  • We can list similar threads to help users navigate the collection and choose better instances to label
  • Over time, the threads’ characteristics can be learned as users provide more and more labels
  • What if the feature set is not sufficient? Should more features be created?
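One possible way to detect that the feature set is not sufficient, sketched under the assumption that user labels are stored per thread id: look for pairs of threads that the user placed in different classes but whose feature vectors (nearly) coincide. Identical vectors with different labels cannot be separated by any model built on those features, and near-identical ones are hard to separate, which suggests new features are needed. The `conflicting_pairs` name and the `eps` tolerance are hypothetical.

```python
import math
from itertools import combinations


def conflicting_pairs(features: dict[str, list[float]], labels: dict[str, str],
                      eps: float = 1e-6) -> list[tuple[str, str]]:
    """Pairs of labelled threads that the user put in different classes but whose
    feature vectors lie within eps of each other (Euclidean distance)."""
    conflicts = []
    for a, b in combinations(sorted(labels), 2):
        if labels[a] != labels[b] and math.dist(features[a], features[b]) <= eps:
            conflicts.append((a, b))
    return conflicts
```

A non-empty result is a hint rather than proof; raising `eps` also surfaces near-collisions that may be worth inspecting in the visualisation.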

Supplementary note-1:


Supplementary note-2: