Deep Learning On Code with an Unbounded Vocabulary

Title: Deep Learning On Code with an Unbounded Vocabulary

Authors: Milan Cvitkovic, Badal Singh and Anima Anandkumar

Conference: MLP 2018

Tags: abstract syntax tree, ast augmented ast, augmented ast, control flow, deep learning, fixed vocabulary, graph neural network, graph neural networks, learning representation, machine learning, natural language processing, neural network, source code, unbounded vocabulary and variable naming task

Abstract:

A major challenge when applying techniques from Natural Language Processing (NLP) to supervised learning on computer program source code is that many words in code are neologisms. NLP methods typically assume a fixed vocabulary and are not well suited to reasoning over such an unbounded one. We introduce a deep model that contends with an unbounded vocabulary (at training or test time) by embedding new words as nodes in a graph as they are encountered and processing the graph with a Graph Neural Network.
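
The idea of adding words to a graph as they are encountered can be illustrated with a small sketch. This is not the authors' implementation: the class `CodeGraph`, the helpers `split_identifier` and `char_embedding`, the hash-based character embedding (a stand-in for a learned encoder), and the single mean-aggregation step (a stand-in for a trained Graph Neural Network layer) are all assumptions made for illustration.

```python
import hashlib
import re

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase subwords."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def char_embedding(word):
    """Deterministic vector derived from the word's characters.

    Hypothetical stand-in for a learned character-level encoder, so that
    a word never seen before still receives an initial representation.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


class CodeGraph:
    """Toy graph in which every distinct subword gets its own node."""

    def __init__(self):
        self.features = []    # node id -> feature vector
        self.edges = {}       # node id -> set of neighbouring node ids
        self.word_nodes = {}  # subword -> node id

    def _add_node(self, vector):
        node_id = len(self.features)
        self.features.append(vector)
        self.edges[node_id] = set()
        return node_id

    def add_token(self, identifier):
        """Add a code token and link it to a node for each of its subwords."""
        token_id = self._add_node([0.0] * DIM)
        for word in split_identifier(identifier):
            if word not in self.word_nodes:
                # New word: create a node on the fly, no fixed vocabulary needed.
                self.word_nodes[word] = self._add_node(char_embedding(word))
            word_id = self.word_nodes[word]
            self.edges[token_id].add(word_id)
            self.edges[word_id].add(token_id)
        return token_id

    def propagate(self):
        """One round of mean-aggregation message passing (untrained)."""
        updated = []
        for node_id, own in enumerate(self.features):
            neighbours = [self.features[n] for n in self.edges[node_id]]
            if not neighbours:
                updated.append(list(own))
                continue
            mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
            updated.append([0.5 * o + 0.5 * m for o, m in zip(own, mean)])
        self.features = updated


graph = CodeGraph()
a = graph.add_token("maxRetryCount")
b = graph.add_token("retry_count")  # reuses the "retry" and "count" nodes
graph.propagate()
```

In this sketch the two identifiers share the word nodes for "retry" and "count", so after message passing their token representations are influenced by the same neighbours even though neither identifier appears in any predefined vocabulary.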