
Deep Learning On Code with an Unbounded Vocabulary

EasyChair Preprint no. 466

11 pages. Published: August 29, 2018

Abstract

A major challenge when using techniques from Natural Language Processing for supervised learning on computer program source code is that many words in code are neologisms. Reasoning over such an unbounded vocabulary is not something NLP methods are typically suited for. We introduce a deep model that contends with an unbounded vocabulary (at training or test time) by embedding new words as nodes in a graph as they are encountered and processing the graph with a Graph Neural Network.
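As a rough illustration of the idea of adding words to a graph as they are encountered, here is a minimal sketch. It is not the authors' implementation: the `VocabGraph` class, the `split_subtokens` helper, and the choice to split identifiers on camelCase and snake_case boundaries are all assumptions made for this example. The Graph Neural Network step that would process the resulting graph is omitted.

```python
import re


def split_subtokens(identifier):
    """Split a camelCase or snake_case identifier into lowercase subtokens.

    Hypothetical helper: the splitting rule is an assumption of this sketch.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


class VocabGraph:
    """Hypothetical graph that grows a word node for every new word it sees."""

    def __init__(self):
        self.node_ids = {}   # (kind, label) -> integer node id
        self.edges = set()   # (token node id, word node id) pairs

    def _node(self, kind, label):
        # Create the node on first sight; no fixed vocabulary is consulted.
        key = (kind, label)
        if key not in self.node_ids:
            self.node_ids[key] = len(self.node_ids)
        return self.node_ids[key]

    def add_identifier(self, identifier):
        """Add a token node and link it to one node per subtoken."""
        token = self._node("token", identifier)
        for word in split_subtokens(identifier):
            self.edges.add((token, self._node("word", word)))
        return token


graph = VocabGraph()
graph.add_identifier("getUserName")
graph.add_identifier("user_name")  # reuses the existing "user" and "name" nodes
```

In this sketch, a word never seen before simply becomes a new node, and identifiers that share a word become connected through it, so a graph-based model could relate them without a fixed embedding table.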

Keyphrases: abstract syntax tree, augmented ast, control flow, deep learning, fixed vocabulary, Graph Neural Network, Learning Representation, machine learning, Natural Language Processing, neural network, source code, unbounded vocabulary, variable naming task

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:466,
  author = {Milan Cvitkovic and Badal Singh and Anima Anandkumar},
  title = {Deep Learning On Code with an Unbounded Vocabulary},
  howpublished = {EasyChair Preprint no. 466},
  doi = {10.29007/bc6w},
  year = {EasyChair, 2018}}