Deep Learning Technology Using Apache MXNet on AWS
Over the past few years, the re-emergence of deep learning and advancements in computational capabilities have led to massive improvements in performance across various machine learning tasks. Particularly in the field of natural language processing, industry advancements in transfer learning and dialogue management have enabled us at Finn AI to sit at the forefront of conversational AI through the development and public deployment of virtual banking assistants.
At Finn AI, we’re focused on gaining deep domain knowledge in retail banking. This means mapping out hundreds of intents and entities specific to banking and training our model to effectively understand what bank customers are asking and looking for.
In our early days, we used the Apache Spark framework to train our models. However, as we reached a plateau in performance, we knew we had to examine deep learning as an alternative approach in order to infer more from our rapidly growing natural language dataset. At the time, the Spark framework did not include deep learning pipelines.
As a result, we decided to search for a deep learning framework that could better support our needs. In our performance evaluation of Apache MXNet, TensorFlow, and PyTorch (then in beta), we found that MXNet outperformed the others on larger datasets. MXNet also offered the flexibility of both imperative and symbolic programming, letting us leverage dynamic and static computational graphs. The imperative approach is very useful for debugging and has dramatically helped us iterate on experiments and build prototypes faster, while the symbolic approach is efficient in terms of memory and optimization, which comes in handy for production deployments.
Aside from that, we were already using AWS, so migrating to MXNet was a natural choice. With its large selection of supported language APIs, best-in-class scaling across multiple GPUs, and overall flexibility, the MXNet framework has allowed us to successfully launch to production with major global financial institutions, including Bank of Montreal (BMO), ATB Financial, and Banpro.
Understanding Data from The Banking Domain
There are thousands of unique use cases, responses and feature functionalities within a bank. In order to effectively understand and cover such a large scope, we use several models for intent recognition, entity recognition and dialogue management. In this post, we will discuss how we classify intents using a supervised model to confidently respond to a user, and how MXNet supports our training with millions of banking-specific data points.
Consider an example in which a bank customer asks the chatbot, "What's the balance of my checking account?" Our core model at Finn AI consists of two main machine learning models.
First is the intent recognition model where we classify the input text into a category. When a user asks for their account balance, we detect that this utterance corresponds to the “i.fai.getbalance” intent.
Next is the entity recognition model, which looks for named entities in the input text. The model extracts "checking account" as an account-name entity, allowing us to output an appropriate response to the user.
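The two-stage flow can be sketched as follows. This is an illustrative outline only: the stub classifier and extractor below are placeholders standing in for Finn AI's trained models, and only the intent label `i.fai.getbalance` comes from the example above.

```python
# Illustrative sketch of a two-stage NLU pipeline: intent classification
# followed by named-entity extraction. The stubs are placeholders for
# trained models.

def classify_intent(utterance):
    # A trained classifier would return a (label, confidence) pair;
    # here we hard-code the example from the text.
    if "balance" in utterance.lower():
        return "i.fai.getbalance", 0.97
    return "i.fai.unknown", 0.0

def extract_entities(utterance):
    # A trained sequence tagger would label token spans; this stub
    # matches a known account-name phrase for illustration.
    entities = []
    if "checking account" in utterance.lower():
        entities.append({"type": "account_name", "value": "checking account"})
    return entities

def respond(utterance):
    # Combine both models' outputs into a single structured result.
    intent, confidence = classify_intent(utterance)
    entities = extract_entities(utterance)
    return {"intent": intent, "confidence": confidence, "entities": entities}

print(respond("What's the balance of my checking account?"))
```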
With the exception of some custom intents that are treated separately, our core model covers the basics of what a bank customer would ask. In order to improve accuracy and expand coverage of out-of-scope queries, our model built with MXNet supports continuous training, fed with redacted real-world utterances from our live production deployments around the world.
Tokenizing Data Through Preprocessing
Before NLP data can be used for training, it must go through preprocessing, which does three things:
- Tokenize the text
- Pad or slice the text
- Convert each token to an index mapping
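The three steps above can be sketched in plain Python. This is a minimal illustration: the vocabulary, index values, and padding token below are made up for the example, not taken from our production pipeline.

```python
# Minimal sketch of the three preprocessing steps: tokenize,
# pad-or-slice to a fixed length, and map tokens to integer indices.

def tokenize(text):
    # Whitespace tokenization with basic punctuation splitting;
    # in practice a tokenizer such as spaCy or NLTK would be used.
    return text.lower().replace("?", " ?").split()

def pad_or_slice(tokens, max_len, pad_token="<pad>"):
    # Slice sequences longer than max_len, pad shorter ones.
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))

def to_indices(tokens, vocab, unk_index=0):
    # Map each token to its integer index, falling back to <unk>.
    return [vocab.get(tok, unk_index) for tok in tokens]

# A toy vocabulary for illustration only.
vocab = {"<unk>": 0, "<pad>": 1, "what": 2, "is": 3, "the": 4,
         "balance": 5, "of": 6, "my": 7, "checking": 8, "account": 9, "?": 10}

tokens = tokenize("What is the balance of my checking account?")
padded = pad_or_slice(tokens, max_len=12)
indices = to_indices(padded, vocab)
print(indices)  # → [2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 1, 1]
```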
Taking the sentence "what is the balance of my checking account?", we first tokenize the utterance into ["what", "is", "the", "balance", "of", "my", "checking", "account"]. Each token is then represented as an integer: [134, 2090, 45, 622, 32, 345, 433, 60]. We use MXNet's GluonNLP toolkit to easily integrate with tokenizers such as spaCy and NLTK.
>>> tokenize = nlp.data.SpacyTokenizer(lang='en')
>>> tokenize("What is the balance of my checking account?")
['what', 'is', 'the', 'balance', 'of', 'my', 'checking', 'account', '?']

Similarly, for NLTK:

>>> tokenize = nlp.data.NLTKMosesTokenizer()
We apply various data augmentation techniques. In order to expand our vocabulary and range of utterances, we introduce random noise and errors to train the models. The noise helps the model generalize spelling errors without requiring strict adherence to spell checkers and protects against an overly formal representation of language to maintain a natural conversational flow.
>>> [utterance, intent] -> [char_duplicator(utterance, utterance_length), intent_name]
>>> [utterance, intent] -> [char_deleter(utterance, utterance_length), intent_name]
>>> [utterance, intent] -> [char_flipper(utterance, utterance_length), intent_name]
>>> [utterance, intent] -> [char_typo(utterance, utterance_length, locale), intent_name]
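Three of these noise functions can be sketched in plain Python. The implementations below are illustrative, not our production augmenters: the character position is taken as an explicit argument to keep the examples deterministic, whereas in practice it would be sampled randomly per utterance.

```python
# Illustrative character-level noise functions for data augmentation.
# `pos` is explicit here for determinism; a real augmenter would
# sample it randomly.

def char_duplicator(utterance, pos):
    # Duplicate the character at pos, e.g. "balance" -> "ballance".
    return utterance[:pos] + utterance[pos] + utterance[pos:]

def char_deleter(utterance, pos):
    # Delete the character at pos, e.g. "balance" -> "balace".
    return utterance[:pos] + utterance[pos + 1:]

def char_flipper(utterance, pos):
    # Swap the characters at pos and pos + 1, e.g. "balance" -> "ablance".
    return (utterance[:pos] + utterance[pos + 1] + utterance[pos]
            + utterance[pos + 2:])

print(char_duplicator("balance", 2))  # → "ballance"
print(char_deleter("balance", 4))     # → "balace"
print(char_flipper("balance", 0))     # → "ablance"
```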
However, despite the vast advancements in machine translation techniques, language translated from English is far from perfect, especially for languages that don't follow the same grammatical conventions. Data augmentation helps here as well: it lets us produce more representative utterance data for non-English datasets, which otherwise tend to be overly formal and non-localized. This is especially prominent with language variants that have evolved independently over time (e.g., Canadian French vs. European French).
We first employed a padding approach to keep sequence sizes consistent. The problem with this approach is that we either had to discard tokens beyond our maximum sequence length or add padding to shorter messages. Restricting sequences to a fixed length meant losing potentially valuable information, while the padding added to shorter sequences contributed nothing useful to our convolutions.
With the GluonNLP bucketing feature, we are able to feed variable-length inputs into our model and overcome the limitations of the padding approach. By grouping samples of similar length into buckets and forming mini-batches within them, we end up with fewer padded samples and can support longer sequences. As a result, iterations are more efficient and processing times are reduced, making bucketing one of the most useful GluonNLP features for Finn AI.
bucket_batch_sampler = nlp.data.sampler.FixedBucketSampler(
    training_data_lengths,
    bucket_scheme=bucket_scheme,
    batch_size=32,
    num_buckets=10,
    ratio=0,
    shuffle=True)
- training_data_lengths: a list of sample lengths in your training data
- bucket_scheme: defines how bucket widths vary in size; constant-width, linear-width, and exponential-width schemes are available
- ratio: scales up the batch size for smaller buckets
This drastically reduces our time to train the model and gives us the ability to respond to longer messages without having to discard any data.
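The bucketing idea itself can be illustrated without GluonNLP. The sketch below groups samples into constant-width buckets by length and pads only within each mini-batch, so padding is bounded by the longest sequence in the batch rather than the global maximum; the bucket width, batch size, and sample utterances are arbitrary choices for the example.

```python
# Sketch of length-based bucketing: group samples into buckets of
# similar length, then batch and pad within each bucket.

from collections import defaultdict

def bucket_batches(samples, bucket_width=3, batch_size=2, pad_token="<pad>"):
    buckets = defaultdict(list)
    for tokens in samples:
        # Constant-width bucket scheme: bucket key is len // width.
        buckets[len(tokens) // bucket_width].append(tokens)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for i in range(0, len(group), batch_size):
            batch = group[i:i + batch_size]
            # Pad only to the longest sequence in this batch.
            max_len = max(len(t) for t in batch)
            batches.append(
                [t + [pad_token] * (max_len - len(t)) for t in batch])
    return batches

samples = [["hi"],
           ["what", "is", "my", "balance"],
           ["pay", "my", "bill"],
           ["show", "recent", "transactions", "please", "now"]]
for batch in bucket_batches(samples):
    print(batch)
```

Within each emitted batch, every sequence has the same length, but different batches can have different lengths, which is exactly what lets the model consume variable-length input efficiently.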
In order to test our model and see how it performs in the real world, we run it multiple times with different hyperparameters to find the optimal combination and model performance. We have our own hyperparameter tuning model that we use internally at Finn AI, but Amazon SageMaker also offers built-in Bayesian optimization for hyperparameter tuning. Rather than having to create instances ourselves, SageMaker spins up instances for us, which improves efficiency.
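As a simple illustration of what a tuning run iterates over, here is a plain grid search over a hypothetical two-parameter space. The parameter names, candidate values, and scoring function are all made up for the example; our internal tuner and SageMaker's Bayesian optimization search the space far more efficiently than exhaustive enumeration.

```python
# Toy grid search over two hypothetical hyperparameters.

from itertools import product

def evaluate(learning_rate, dropout):
    # Stand-in for training and validating a model; a real run would
    # return a validation metric such as F1. This toy score peaks at
    # learning_rate=0.01, dropout=0.3.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(dropout - 0.3)

grid = {"learning_rate": [0.001, 0.01, 0.1],
        "dropout": [0.1, 0.3, 0.5]}

best_score, best_params = float("-inf"), None
for lr, dp in product(grid["learning_rate"], grid["dropout"]):
    score = evaluate(lr, dp)
    if score > best_score:
        best_score, best_params = score, (lr, dp)

print(best_params)  # → (0.01, 0.3)
```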
Deploying Our Models to Production
Now that we have a trained model in place, the last step is to deploy it to production for our bank customers around the world. Within our AWS infrastructure, models are trained in the same Amazon Virtual Private Cloud (VPC) where the data resides.
Once a model is ready, we package our prediction server together with the corresponding trained model into Docker images. We then push them into each customer's production environment using Amazon Elastic Container Service, which allows for simple and flexible management of containers.
But the training cycle never ends at Finn AI. After a model is deployed to production, we continue to train on and annotate the new utterances that bank customers are asking. We are always adding new intent functionality and optimizing the model's ability to answer out-of-scope questions to maintain a positive customer experience.
Being able to analyze and evaluate the performance of our trained models is important for continuous improvement. To do so, we generate model reports that summarize an extensive set of analytics to determine the effect of a model on overall performance. This function specifically focuses on identifying which intents and entities may “clash” with one another and acts as a safeguard against ad-hoc intents being developed without thorough analysis. In addition, specification of confidence thresholds and the effect on F1, precision, recall, and overall performance helps us optimize each deployment.
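Sweeping a confidence threshold and recomputing precision, recall, and F1 can be sketched as follows. The predictions, gold labels, and abstention policy below are toy examples, not our production evaluation code.

```python
# Precision/recall/F1 at a given confidence threshold: predictions
# below the threshold are treated as abstentions (no intent returned).

def precision_recall_f1(predictions, labels, threshold):
    tp = fp = fn = 0
    for (pred, conf), gold in zip(predictions, labels):
        if conf >= threshold:
            if pred == gold:
                tp += 1
            else:
                fp += 1
                fn += 1  # the gold intent was also missed
        else:
            fn += 1  # abstained, so the gold intent was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy predictions: (predicted_intent, confidence) vs. gold labels.
preds = [("i.fai.getbalance", 0.95), ("i.fai.transfer", 0.40),
         ("i.fai.getbalance", 0.70), ("i.fai.paybill", 0.20)]
gold = ["i.fai.getbalance", "i.fai.getbalance",
        "i.fai.getbalance", "i.fai.paybill"]

for threshold in (0.3, 0.6):
    print(threshold, precision_recall_f1(preds, gold, threshold))
```

Raising the threshold trades recall for precision: at 0.6 the toy model abstains on its low-confidence mistake, so precision rises while recall stays flat.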
As a result, we have a set of recommendations and actionable insights to help us optimize and make the best use of our models. For instance, we may discover from our reports that data labelling needs to be adjusted, or additional data is required to best detect some intents and entities. We make use of techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize our taxonomy, and a set of utterances on a two-dimensional graph in order to identify opportunities for new and hybrid intents and to help set priorities for relabelling.
Apache MXNet allows Finn AI to use the latest deep learning technology, enabling us to deliver state-of-the-art model performance and remain on the cutting edge of conversational AI. With its flexible interface and large library of datasets, we've been able to successfully create beautiful conversational assistants for banking customers around the world.