VAST 2019 - Paper Skeleton

Title alternatives:

  • Threadlet: Modelling Under-Specified Complex Patterns through Interactive Visualisation Supported Active Learning
  • Modelling Under-Specified Categories through Interactive Visualisation Assisted Active Learning

EuroVis 2019 - Paper Skeleton

Title alternatives:

  • ThreadSift: Categorising Email Threads through a Visual Active Learning Framework
  • ThreadSift: Modelling Complex Patterns through a Visual Pseudo-Labelling based Active Learning Approach
  • ThreadSift: Characterising Complex Patterns through a Visual Pseudo-Labelling based Active Learning Approach
  • ThreadSift: Modelling Complex Patterns through an Interactive Visual Semi-Supervised Active Learning Approach
  • ThreadSift: A Visual Data Science Approach for Modelling Complex Patterns through Interactive and Pseudo-Labelling based Active Learning



ThreadSift: Modelling Complex Patterns through a Pseudo-Labelling based Active Learning Visual Data Science Approach

ThreadSift: Modelling Complex Patterns through a Pseudo-Labelling based Active Learning 

ThreadSift: A Visual Data Science Approach for Modelling Complex Patterns through a Pseudo-Labelling based Active Learning Framework


Abstract

A common scenario in data science is one in which analysts need to build models that categorise complex patterns in data without any prior knowledge of the underlying categories. The route from unlabelled data records to ready-to-deploy models requires the involvement and guidance of a data analyst throughout the modelling pipeline -- from the discovery and characterisation of relevant categories, to the indication of representative members of these categories, and eventually to the generation of classification models. In this paper, we propose a visual data science approach that empowers analysts in this process through an interactive visual, pseudo-labelling based active learning framework that supports the discovery, characterisation, and modelling of complex patterns. We present this framework through an applied case of classifying communication patterns in email threads, where we take a human-centric approach to designing novel visualisations and engineering data features as the basis of our visual analysis environment. We demonstrate the validity of our approach through a case study on email thread classification conducted together with experts, and through an evaluation of the generalisability of the generated models. We observe that our framework helps experts to infer relevant categories of communication and to turn their observations into effective models that can then be used to classify threads at scale.

We enable the human-guided modelling of data through 

employ an active learning method to 

within the segmentation of customers based on their transactions, classification of conversation threads based on communication characteristics, to name a few. 

This is akin to unsupervised learning 

In several domains where analysts are interested in identifying of complex data collections, 



In areas such as eDiscovery, communication patterns carry valuable information for the categorisation of email threads.

Things to highlight/Contributions:
  • Novel interactive visualisation designs to support the identification of conversation types
  • Visualisation informed feature generation
  • A user-driven classification model building approach through interactive exploration of the feature space and active learning supported labelling
  • Active learning approach to support an efficient and robust labelling pipeline
  • Use of a suitable active learner model to close the loop in model building
  • Classification of threads with respect to their communication characteristics
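
Illustrative sketch (not the ThreadSift implementation): the labelling loop named in the contributions above could look roughly like the following. Everything here is an assumption made for illustration: the nearest-centroid classifier, the softmax over negative distances as a confidence score, least-confidence querying, the `pseudo_threshold` value, and the `oracle` callback that stands in for the analyst labelling a thread in the visual interface. Feature vectors are plain numeric tuples.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def class_probabilities(x, centroids):
    """Softmax over negative Euclidean distances to each class centroid."""
    weights = {label: math.exp(-math.dist(x, c)) for label, c in centroids.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def active_learning_loop(pool, oracle, seed_labels, rounds=5, pseudo_threshold=0.9):
    """Pool-based active learning with pseudo-labelling (hypothetical sketch).

    pool        -- list of feature vectors, one per record
    oracle      -- callable index -> label; stands in for the analyst
    seed_labels -- dict index -> label, at least one entry per category
    Returns (labelled, pseudo): analyst labels and confident model labels.
    """
    labelled = dict(seed_labels)
    pseudo = {}
    for _ in range(rounds):
        # Fit: analyst labels override pseudo-labels for the same index.
        training = {**pseudo, **labelled}
        by_class = {}
        for idx, label in training.items():
            by_class.setdefault(label, []).append(pool[idx])
        centroids = {label: centroid(pts) for label, pts in by_class.items()}

        unlabelled = [i for i in range(len(pool)) if i not in labelled]
        if not unlabelled:
            break
        probs = {i: class_probabilities(pool[i], centroids) for i in unlabelled}

        # Query: ask the analyst about the record the model is least sure of.
        query = min(unlabelled, key=lambda i: max(probs[i].values()))
        labelled[query] = oracle(query)

        # Pseudo-label: recomputed every round from confident predictions only.
        pseudo = {}
        for i in unlabelled:
            if i == query:
                continue
            label, p = max(probs[i].items(), key=lambda kv: kv[1])
            if p >= pseudo_threshold:
                pseudo[i] = label
    return labelled, pseudo
```

In this sketch the pseudo-labels are discarded and recomputed each round, so an early wrong guess by the model cannot accumulate; whether that matches the design intended for the paper is an open question for the draft.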