The South Big Data Hub Community Engagement and Partnerships Working Group has identified use cases as an essential way to educate new entrants to data science or HPC and to help them navigate the decisions about which cyber-infrastructure tools are appropriate for a given project. As a first step, the working group seeks to collect tangible use cases from different domains, forms of cyber-infrastructure, and software tools.
Launching V.I.T.A.L DATA SERIES (Video, Image, Text, Audio, and Learning Data)
The VITAL series, created by the Community Engagement working group, concerns the analysis of Video, Image, Text, Audio, and Learning (VITAL) data in multiple contexts. In it we will explore use cases and techniques across disciplines for each of these fundamental types of data. This is an effort to break down barriers across disciplines and bridge solutions across sectors, including industry, academia, government, and non-profits. To participate or become a speaker, please contact Dr. Renata Rawlings-Goss, Co-Executive Director of the South Big Data Hub, at firstname.lastname@example.org and cc: email@example.com
V.I.T.A.L Series: Machine Learning Data Analysis | Thursday, February 16th, 1:00-2:00 pm EST
Join the last open panel discussion in the VITAL data series if you are a researcher or company working on machine learning in any sector. We will be discussing machine learning, deep learning, spatial-temporal data, time-series data, and more.
Dr. Polo Chau | Machine Learning Tools: blending techniques from machine learning (belief propagation), data mining (anomaly detection), visualization, and user interaction
Dr. Le Song | Embedding Graphical Models with Applications to Recommendation Systems, Knowledge Reasoning and Materials Science
Bio: Polo Chau
Dr. Polo Chau is an Assistant Professor at Georgia Tech’s School of Computational Science and Engineering, and an Associate Director of the MS Analytics program. He holds a PhD in Machine Learning and a Masters in human-computer interaction (HCI). His PhD thesis won Carnegie Mellon’s Computer Science Dissertation Award, Honorable Mention. His research group at Georgia Tech bridges data mining and HCI, innovating at their intersection, to synthesize scalable, interactive tools that help people understand and interact with big data. He leads the popular annual IDEA workshop that catalyzes cross-pollination across HCI and data mining. He served as general chair for ACM IUI 2015, and is a steering committee member of the conference. His research group has created novel detection technologies for malware (patented with Symantec, protects 120M+ people), auction fraud (WSJ, CNN, MSN), comment spam (patented & deployed with Yahoo), fake reviews (SDM’14 Best Student Paper), insider trading (SEC), and unauthorized mobile device access (Wired, Engadget), as well as fire risk prediction (KDD’16 Best Student Paper, runner-up). He received faculty awards from Google, Yahoo, and LexisNexis. He also received the Raytheon Faculty Fellowship, the Edenfield Faculty Fellowship, and the Outstanding Junior Faculty Award. He is the only two-time Symantec fellow.
At the Polo Club of Data Science, we innovate at the intersection of data mining and human-computer interaction (HCI), combining the best from both worlds to synthesize scalable interactive tools for making sense of billion-scale graph data. I will present some of our latest systems:
(1) Visage: an interactive visual graph querying approach that empowers users to construct expressive queries without writing complex code (e.g., finding money laundering rings of bankers and business owners).
(2) Facets & Apolo: combine machine inference and visualization to guide the user to interactively explore large graphs. The user gives examples of relevant nodes, and the systems recommend interesting and surprising areas the user may want to see next.
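To make the "examples in, recommendations out" loop in (2) concrete, here is a minimal sketch. It is not Apolo's actual inference method: it substitutes a simple random walk with restart for the machine inference step, and the function names, the `restart` value, and the toy graph in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict


def relevance_scores(edges, examples, restart=0.15, iterations=50):
    """Score nodes by proximity to user-chosen example nodes.

    Uses a random walk with restart as a stand-in for the inference
    step (an assumption; not the method used by Apolo itself).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)

    # The walk restarts uniformly at the user's example nodes.
    seed = {n: (1.0 / len(examples) if n in examples else 0.0) for n in nodes}
    score = dict(seed)
    for _ in range(iterations):
        nxt = {n: restart * seed[n] for n in nodes}
        for u in nodes:
            # Each node passes its remaining mass equally to its neighbors.
            share = (1.0 - restart) * score[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        score = nxt
    return score


def recommend(edges, examples, k=3):
    """Return the k highest-scoring nodes that are not already examples."""
    scores = relevance_scores(edges, examples)
    candidates = [n for n in scores if n not in examples]
    return sorted(candidates, key=lambda n: (-scores[n], n))[:k]
```

For instance, on a hypothetical chain `a - b - c - d - e` with `a` marked as relevant, `recommend(edges, {"a"}, k=2)` ranks the nodes nearest to `a` first. In an interactive tool, the user would then accept or reject those suggestions and the scores would be recomputed.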
Bio: Le Song
Le Song is an assistant professor in the Department of Computational Science and Engineering, College of Computing, Georgia Institute of Technology. He received his Ph.D. in Machine Learning from the University of Sydney and NICTA in 2008, and then conducted his post-doctoral research in the Department of Machine Learning, Carnegie Mellon University, between 2008 and 2011. Before he joined Georgia Institute of Technology in 2011, he was briefly a research scientist at Google. His principal research direction is machine learning, especially kernel methods and probabilistic graphical models for large-scale and complex problems arising from artificial intelligence, network analysis, computational biology and other interdisciplinary domains. He is the recipient of the Recsys’16 Deep Learning Workshop Best Paper Award, AISTATS’16 Best Student Paper Award, IPDPS’15 Best Paper Award, NSF CAREER Award’14, NIPS’13 Outstanding Paper Award, and ICML’10 Best Paper Award. He has also served as an area chair or senior program committee member for many leading machine learning and AI conferences such as ICML, NIPS, AISTATS and AAAI, and as an action editor for JMLR.
Structured data, such as sequences, trees, graphs and hypergraphs, are prevalent in a number of real-world applications such as social network analysis, recommendation systems and knowledge base reasoning. The availability of large amounts of such structured data has posed great challenges for the machine learning community. How should such data be represented to capture their similarities or differences? How can predictive models be learned efficiently from a large amount of such data? How can structured data be generated de novo given certain desired properties?
In this talk, I will present a structure embedding framework (Structure2Vec), an effective and scalable approach for representing structured data based on the idea of embedding latent variable models into a feature space, and learning such a feature space using discriminative information. Interestingly, Structure2Vec extracts features by performing a sequence of nested nonlinear operations in a way similar to graphical model inference procedures, such as mean field (or convolution over a graph) and belief propagation. In large-scale applications involving materials design, recommendation systems and knowledge reasoning, Structure2Vec consistently produces state-of-the-art predictive performance. In some cases, Structure2Vec is able to produce a more accurate model that is also 10,000 times smaller.
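The "sequence of nested nonlinear operations" can be pictured with a small sketch of a mean-field-style embedding update, in which each node repeatedly combines its own features with the sum of its neighbors' current embeddings. This is an illustration under stated assumptions and not the Structure2Vec implementation: the ReLU nonlinearity, the sum pooling, the fixed number of rounds, and the names `embed_graph`, `w_node`, and `w_neigh` are all choices made here. In the framework described above, the weights would be learned from discriminative (label) information rather than supplied by hand.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def embed_graph(features, neighbors, w_node, w_neigh, rounds=3):
    """Mean-field-style node embeddings, then a graph-level embedding.

    features:  {node: feature vector}
    neighbors: {node: iterable of adjacent nodes}
    w_node:    dim x feature_dim weights applied to a node's own features
    w_neigh:   dim x dim weights applied to the pooled neighbor embeddings
    (All weights are hypothetical inputs; training is out of scope here.)
    """
    dim = len(w_node)
    mu = {n: [0.0] * dim for n in features}
    for _ in range(rounds):
        new_mu = {}
        for n in features:
            # Pool neighbor embeddings from the previous round by summation.
            pooled = [0.0] * dim
            for m in neighbors.get(n, ()):
                pooled = [p + q for p, q in zip(pooled, mu[m])]
            own = matvec(w_node, features[n])
            ctx = matvec(w_neigh, pooled)
            # Nonlinear update (ReLU); each round nests one more operation.
            new_mu[n] = [max(0.0, a + b) for a, b in zip(own, ctx)]
        mu = new_mu
    # Summing node embeddings gives a vector for the whole structure.
    graph_vec = [sum(mu[n][k] for n in mu) for k in range(dim)]
    return mu, graph_vec
```

After `rounds` iterations, each node's embedding reflects information from nodes up to `rounds` hops away, which is the sense in which the procedure resembles running a few steps of graphical model inference.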
Notes: (Feel free to add questions, links or ideas below)
Polo Chau presentation: analysis of big data as well as small data to understand the challenges. Although we have access to a large amount of data, the question remains how we can understand all of it. Our approach is to integrate data mining with human-computer interaction. Both fields develop methods for understanding data, and we work to combine them to get the best of both worlds: scalable and interactive tools.
Human-in-the-Loop Graph Analysis
Detecting Fake Yelp Reviews: 6+ common venues within days
But, is it easy to build graph queries?
Query formulation is an incremental process.
VISAGE uses graph autocomplete, which helps prevent over-specification.
Comparison: VISAGE (faster) vs. Cypher
Leading to finding more relevant nodes: Apolo (machine learning + interactive vis)
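As a rough illustration of the graph autocomplete note above, the sketch below takes a partial query, written as a path of node types, and reports which node types could extend it while still matching something in the data. It is a toy under stated assumptions, not VISAGE's implementation: queries are restricted to simple paths with a non-empty type list, and `match_paths`, `suggest_next_types`, and the data layout are hypothetical names chosen here.

```python
from collections import Counter


def match_paths(node_type, neighbors, type_path):
    """Return all simple paths whose node types follow type_path in order."""
    paths = [[n] for n in sorted(node_type) if node_type[n] == type_path[0]]
    for wanted in type_path[1:]:
        paths = [
            path + [m]
            for path in paths
            for m in sorted(neighbors.get(path[-1], ()))
            if node_type[m] == wanted and m not in path
        ]
    return paths


def suggest_next_types(node_type, neighbors, type_path):
    """For each node type, count the current matches it could extend.

    An empty result means any further extension would return no matches,
    which is how over-specification shows up in this toy version.
    """
    counts = Counter()
    for path in match_paths(node_type, neighbors, type_path):
        reachable = {
            node_type[m] for m in neighbors.get(path[-1], ()) if m not in path
        }
        counts.update(reachable)
    return dict(counts)
```

With a small hypothetical graph of bankers, owners, and businesses, calling `suggest_next_types(node_type, neighbors, ["banker"])` lists only the types that are actually adjacent to some banker, so the user builds the query incrementally and never adds a step that cannot match.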