Active Learning in ThreadSift

Active Learning Framework

High-level problem characterisation

\noindent \textbf{AG1:} Understanding the different characteristics of email threads in terms of the patterns of communication \\

\noindent \textbf{AG2:} Building computational models of thread characterisations that would help identify and categorise the various types of communication \\

\noindent \textbf{AG3:} Identifying threads, emails or individuals of interest utilising the understanding and modelling of email threads

Stage-1: User-centred Visualisation Design

The usual User-Centred Process

Overall Goal: Understand and model different communication patterns

The labels for these patterns are not known in advance, i.e., it is unclear which distinct patterns exist

However, the conceptual aspects/characteristics are known, e.g., events (inclusion/exclusion) over time, changes in the set of people involved, and temporal characteristics (e.g., send/receive frequency, overall length, etc.)

Outcomes from this stage:

A list of tasks and visualisation designs to address them

Stage-2: Visualisation-informed Feature Engineering

Feature engineering that is also informed by the characteristics of communication identified in Stage-1

Initial feature debugging with visualisation

Outcomes:

The features

Note: Stage-1 and Stage-2 are tightly interlinked. Together they form an iterative debugging loop: the features are used to assess the visualisation (i.e., how well does the visualisation let the user distinguish the characteristic captured by a feature), and the visualisation is used to assess the features (i.e., how well does a feature capture the characteristic).
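As an illustration of what such features could look like, the following is a minimal sketch that turns the characteristics named in the user-centred design stage (inclusion/exclusion events, changes in the set of people involved, temporal characteristics) into numbers. The `Email` record, its field names, and the feature names are assumptions made for this example and are not taken from ThreadSift itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical minimal email record; field names are assumptions."""
    sender: str
    recipients: frozenset
    sent_at: float  # seconds since an arbitrary reference point


def thread_features(emails):
    """Compute simple per-thread features from a list of Email records."""
    emails = sorted(emails, key=lambda e: e.sent_at)
    times = [e.sent_at for e in emails]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    inclusions = exclusions = 0
    everyone = set()
    previous = None
    for email in emails:
        current = {email.sender} | set(email.recipients)
        if previous is not None:
            # People present now but not on the previous message, and vice versa.
            inclusions += len(current - previous)
            exclusions += len(previous - current)
        everyone |= current
        previous = current

    return {
        "n_messages": len(emails),
        "duration": times[-1] - times[0] if times else 0.0,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "n_participants": len(everyone),
        "inclusions": inclusions,
        "exclusions": exclusions,
    }


example = [
    Email("a", frozenset({"b"}), 0.0),
    Email("b", frozenset({"a", "c"}), 60.0),   # c is included
    Email("a", frozenset({"c"}), 180.0),       # b is excluded
]
features = thread_features(example)
```

Each entry of the resulting dictionary can be plotted against the thread view to check whether the feature actually captures the characteristic it is meant to represent.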

Stage-3: Initial Generation of Classes

It is OK to think of this stage as an unsupervised setting where there are no labels and the classes are not really known

The user can start exploratively by marking instances that “seem to be interesting” through an initial investigation of the features

An initial set of classes can be formed in this initial iteration

This process can be supported by an initial clustering of the space using an unsupervised approach (see the screens in Supplementary note-1)
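The notes do not say which unsupervised approach is meant. As one possible sketch, a plain k-means over the thread feature vectors could provide the initial grouping. The choice of k-means, the farthest-point initialisation, and the toy data are all assumptions for illustration, and the feature vectors are assumed to be already scaled to comparable ranges.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats with a deterministic initialisation."""
    # Farthest-point initialisation: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Toy feature vectors forming two visually obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

The resulting cluster indices are only a starting point for exploration; the user is still expected to inspect the groups and decide which of them correspond to meaningful classes.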

Stage-4: Active Learning a Classification/Clustering Model

Using ideas from active learning to improve the model

Sample good examples to get the user to label

NOTE: Unlike the classical active learning approach, where the samples are chosen based on a heuristic, we take a visualisation-centred approach and get the user to choose instances that they think might be good to label. We also need to support the user in this process, both through automatic guidance (showing similar threads) and through views built for this purpose (the thread overview).
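The "show similar threads" guidance could, for instance, be a nearest-neighbour lookup in feature space. The sketch below assumes that each thread is represented by a numeric feature vector and that Euclidean distance is a reasonable similarity measure; the function name, the thread identifiers, and the data are hypothetical.

```python
import math


def similar_threads(query_id, features, labelled_ids=(), top_n=3):
    """Return the ids of the unlabelled threads closest to the query thread.

    features maps a thread id to a tuple of (already scaled) feature values.
    """
    query = features[query_id]
    candidates = [
        (math.dist(query, vector), thread_id)
        for thread_id, vector in features.items()
        if thread_id != query_id and thread_id not in labelled_ids
    ]
    # Sort by distance, closest first, and keep only the ids.
    return [thread_id for _, thread_id in sorted(candidates)[:top_n]]


features = {
    "t1": (0.10, 0.20),
    "t2": (0.15, 0.25),
    "t3": (0.90, 0.80),
    "t4": (0.20, 0.10),
    "t5": (0.85, 0.90),
}
# The user has picked t1; t4 is already labelled and therefore skipped.
suggestions = similar_threads("t1", features, labelled_ids={"t4"}, top_n=2)
```

In this setup the user stays in control of what gets labelled, and the ranked list only narrows down where to look next.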

Supplementary notes

Supplementary note-0:

Some cornerstone ideas:

Visualisation-driven heuristic numeric metrics to characterise/explore the threads

These are the metrics that will enable us to categorise the threads

These are the metrics that will enable us to choose the “similar threads”

The exploration is what will lead to the initial labelling of the threads

We can list similar threads to help the user navigate and choose better instances to label

Over time, we can learn the threads’ characteristics by getting more and more labels from the users

What if the feature set is not sufficient? Create more features.

Some good ideas to help with sampling here: http://hunch.net/~active_learning/active_learning_icml09.pdf
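One common sampling idea from the active learning literature is to prioritise instances that the current model finds ambiguous. The sketch below is only an illustration of that idea and not the method chosen in these notes: it assumes a nearest-centroid model over the labelled threads and ranks unlabelled threads by the margin between their two closest class centroids. The class names and data are hypothetical, and at least two classes are required.

```python
import math


def class_centroids(labelled):
    """labelled maps a class name to a list of feature vectors (tuples)."""
    return {
        name: tuple(sum(xs) / len(vectors) for xs in zip(*vectors))
        for name, vectors in labelled.items()
    }


def rank_by_ambiguity(unlabelled, centroids):
    """Order unlabelled thread ids from most to least ambiguous.

    Ambiguity is measured as the gap between the distances to the two
    nearest class centroids; a small gap means the thread sits between classes.
    """
    def margin(vector):
        distances = sorted(math.dist(vector, c) for c in centroids.values())
        return distances[1] - distances[0]

    return sorted(unlabelled, key=lambda thread_id: margin(unlabelled[thread_id]))


labelled = {
    "burst": [(0.0, 0.0), (0.0, 2.0)],
    "slow": [(10.0, 0.0), (10.0, 2.0)],
}
unlabelled = {"u1": (1.0, 1.0), "u2": (5.0, 1.0), "u3": (8.0, 1.0)}
ranking = rank_by_ambiguity(unlabelled, class_centroids(labelled))
```

Such a ranking could be offered as a hint alongside the user's own choices, for example by highlighting the most ambiguous threads in the overview.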

Supplementary note-1:

User-uploaded image: activeLearningSlide.png

Supplementary note-2:
