ADP - Machine Learning Automation
During my internship at ADP, I developed a Natural Language Processing (NLP) solution using a BERT model to automatically extract and organize information from bulk-imported documents. This project was one of ADP’s first machine learning-based solutions integrated into their product suite, showcasing my ability to innovate and deliver impactful solutions.
Key Elements
Natural Language Processing
Transformers
Tokenization & Preparation of the Dataset
Training & Testing BERT Model
Running Containerized Training on Kubernetes to Scale
Statistical Analysis of the Model's Performance


What I Learned
By leveraging an NLP model, BERT (Bidirectional Encoder Representations from Transformers), we can extract key information from bulk-imported documents to categorize and sort any document uploaded by ADP clients.
BERT can be fine-tuned for NER (Named Entity Recognition), which marks the various types of entities in a text sequence so we can extract the information we want. To do so, we have to label our own dataset.
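To make the idea of marking entities concrete, here is a minimal sketch of how per-token NER labels can be turned back into extracted fields. It assumes the common BIO tagging scheme (B- starts an entity, I- continues it, O is outside any entity); the entity types, the sample sentence, and the function name `extract_entities` are hypothetical and are not taken from the ADP project.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity currently being built, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) ends any open entity.
            flush()
            current_type = None
            current_tokens = []
    flush()
    return entities


# Hypothetical labeled example, in the shape a hand-labeled dataset might take.
tokens = ["Invoice", "issued", "to", "Jane", "Doe", "on", "March", "3"]
tags = ["O", "O", "O", "B-PERSON", "I-PERSON", "O", "B-DATE", "I-DATE"]
entities = extract_entities(tokens, tags)
print(entities)
```

In a real pipeline the `tags` list would come from the fine-tuned model's predictions rather than being written by hand.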
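Tokenization is used to prepare the training and testing datasets: it splits each document into the subword tokens BERT expects and keeps our entity labels aligned with those tokens.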
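As a rough illustration of this preparation step, the sketch below splits words into subword pieces with a greedy longest-match-first rule (the approach used by BERT's WordPiece tokenizer) and carries each word's label over to its pieces. The tiny vocabulary, the label names, and the helper names are hypothetical; a real pipeline would use the pretrained tokenizer and its full vocabulary, and would also add the special [CLS] and [SEP] tokens, which are omitted here. Labeling only the first piece of each word and ignoring the rest is one common convention, assumed here rather than confirmed for this project.

```python
from typing import Dict, List, Set, Tuple

# Tiny hypothetical vocabulary; a real BERT vocabulary has tens of thousands of entries.
VOCAB: Set[str] = {"[UNK]", "pay", "##roll", "for", "jane", "doe", "##s"}

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so positions
# carrying it are skipped when the loss is computed.
IGNORE = -100


def wordpiece(word: str, vocab: Set[str] = VOCAB) -> List[str]:
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize_and_align(
    words: List[str], labels: List[str], label_to_id: Dict[str, int]
) -> Tuple[List[str], List[int]]:
    """Tokenize words and give the label to the first piece of each word only."""
    tokens: List[str] = []
    label_ids: List[int] = []
    for word, label in zip(words, labels):
        pieces = wordpiece(word.lower())
        tokens.extend(pieces)
        label_ids.extend([label_to_id[label]] + [IGNORE] * (len(pieces) - 1))
    return tokens, label_ids


label_to_id = {"O": 0, "B-PERSON": 1, "I-PERSON": 2}
words = ["Payroll", "for", "Jane", "Does"]
labels = ["O", "O", "B-PERSON", "I-PERSON"]
tokens, label_ids = tokenize_and_align(words, labels, label_to_id)
print(tokens)
print(label_ids)
```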
We fine-tune the model by feeding it our custom dataset and applying the MLM (Masked Language Modeling) technique to produce an NER-capable model with high enough accuracy: ~93% in our case.
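For readers unfamiliar with the two ideas in this step, here is a small sketch under stated assumptions. `mask_tokens` follows the masking recipe from the original BERT setup (select about 15% of tokens; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged). `token_accuracy` shows one plausible way to score an NER model, as the fraction of labeled tokens predicted correctly. Both function names and the sample data are hypothetical, and the exact metric behind the ~93% figure is not specified here, so treat this as an illustration rather than the project's evaluation code.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, targets); targets are None where nothing is predicted."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(list(vocab))  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets


def token_accuracy(predicted: Sequence[int], gold: Sequence[int], ignore: int = -100) -> float:
    """Fraction of positions predicted correctly, skipping positions labeled `ignore`."""
    scored = [(p, g) for p, g in zip(predicted, gold) if g != ignore]
    if not scored:
        return 0.0
    return sum(p == g for p, g in scored) / len(scored)


# Hypothetical data, only to show the shapes involved.
sample = ["pay", "##roll", "for", "jane", "doe", "##s"]
corrupted, targets = mask_tokens(sample, vocab=sample, mask_prob=0.5, seed=1)

gold_ids = [0, -100, 0, 1, 2, -100]
predicted_ids = [0, 0, 0, 1, 0, 0]
accuracy = token_accuracy(predicted_ids, gold_ids)
print(corrupted, targets, accuracy)
```

Token-level accuracy can look high on NER data because most tokens are labeled O, so entity-level precision, recall, and F1 are often reported alongside it.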