Threadlet: Modelling Under-Specified Complex Patterns through Interactive Visualisation Supported Active Learning
Modelling Under-Specified Categories through Interactive Visualisation Assisted Active Learning
EuroVis 2019 - Paper Skeleton
Title alternatives:
ThreadSift: Categorising Email Threads through a Visual Active Learning Framework
ThreadSift: Modelling Complex Patterns through a Visual Pseudo-Labelling based Active Learning Approach
ThreadSift: Characterising Complex Patterns through a Visual Pseudo-Labelling based Active Learning Approach
ThreadSift: Modelling Complex Patterns through an Interactive Visual Semi-Supervised Active Learning Approach
ThreadSift: A Visual Data Science Approach for Modelling Complex Patterns through Interactive and Pseudo-Labelling based Active Learning
ThreadSift: Modelling Complex Patterns through a Pseudo-Labelling based Active Learning Visual Data Science Approach
ThreadSift: Modelling Complex Patterns through a Pseudo-Labelling based Active Learning
ThreadSift: A Visual Data Science Approach for Modelling Complex Patterns through a Pseudo-Labelling based Active Learning Framework
Abstract
A common scenario in data science is one where analysts need to build models that categorise complex patterns in the data without any prior knowledge of the underlying categories. The route from unlabelled data records to ready-to-deploy models requires the involvement and guidance of a data analyst throughout the whole modelling pipeline: from the discovery and characterisation of relevant categories, to the indication of representative members of these categories, and eventually to the generation of classification models. In this paper, we propose a visual data science approach that empowers analysts in this process through an interactive, visual, pseudo-labelling based active learning framework that supports the discovery, characterisation, and modelling of complex patterns. We present this framework through an applied case of classifying communication patterns in email threads, where we take a human-centric approach to designing novel visualisations and engineering data features as the basis of our visual analysis environment. We demonstrate the validity of our approach through a case study on email thread classification conducted together with experts, and through an evaluation of the generalisability of the generated models. We observe that our framework helps experts infer relevant categories of communication and turn their observations into effective models that can then be used to classify threads at scale.
We enable the human-guided modelling of data through
employ an active learning method to
within the segmentation of customers based on their transactions, classification of conversation threads based on communication characteristics, to name a few.
This is akin to unsupervised learning
In several domains where analysts are interested in identifying of complex data collections,
In areas such as eDiscovery, communication patterns carry valuable information for the categorisation of email threads.
Things to highlight/Contributions:
Novel interactive visualisation designs to support the identification of conversation types
Visualisation informed feature generation
A user-driven classification model building approach through interactive exploration of the feature space and active learning supported labelling
Active learning approach to support an efficient and robust labelling pipeline
Use of a suitable active learner model to close the loop in model building
Classification of threads with respect to their communication characteristics
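The labelling loop named in the contributions above can be illustrated with a minimal sketch. This is not the learner used in the paper, which these notes do not specify: it assumes a simple nearest-centroid classifier as a stand-in, and the feature vectors, the thread ids, the "broadcast" and "discussion" labels, and the pseudo_threshold parameter are all hypothetical. Each round pseudo-labels the threads the model is confident about and hands the least certain remaining thread to the analyst.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def class_probabilities(x, centroids):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in centroids.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def active_learning_round(labelled, unlabelled, pseudo_threshold=0.9):
    """Fit centroids, pseudo-label confident items, pick one item to query.

    labelled:   dict of item id -> (feature_vector, label)
    unlabelled: dict of item id -> feature_vector
    Returns (pseudo_labels, query_id); query_id is None when no uncertain
    item remains.
    """
    by_class = {}
    for features, label in labelled.values():
        by_class.setdefault(label, []).append(features)
    centroids = {label: centroid(pts) for label, pts in by_class.items()}

    pseudo_labels = {}
    query_id, lowest_conf = None, float("inf")
    for item_id, features in unlabelled.items():
        probs = class_probabilities(features, centroids)
        best_label = max(probs, key=probs.get)
        confidence = probs[best_label]
        if confidence >= pseudo_threshold:
            # Confident prediction: accept it as a pseudo-label.
            pseudo_labels[item_id] = best_label
        elif confidence < lowest_conf:
            # Uncertainty sampling: remember the least confident item.
            query_id, lowest_conf = item_id, confidence
    return pseudo_labels, query_id


# Hypothetical two-dimensional thread features and category names.
labelled = {
    "t1": ([0.0, 0.0], "broadcast"),
    "t2": ([10.0, 10.0], "discussion"),
}
unlabelled = {
    "t3": [0.5, 0.2],
    "t4": [5.0, 5.2],
    "t5": [9.5, 9.9],
}
pseudo_labels, query_id = active_learning_round(labelled, unlabelled)
print(pseudo_labels, query_id)
```

In an interactive setting, the queried thread would be shown to the analyst in the visualisation, its label added to the labelled set, and the round repeated; whether pseudo-labels are fed back into training is a design choice this sketch leaves open.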