Like many teenagers, I was into video games and strategy games such as chess, in addition to competing in FIRST Robotics Competition (FRC) events. A few years ago, when Google DeepMind's AlphaGo, a Reinforcement Learning (RL) system, beat the best Go players in the world, I was blown away. I was intrigued by the Deep Reinforcement Learning behind the breakthrough and wanted to know how it could be used not only to solve simple problems, like playing Atari 2600 games, but also to perform complex motor tasks, such as grabbing and stacking delicate objects, performing surgery, or even just learning to walk.
I started my ML journey a couple of summers ago with an introductory Machine Learning course taught by Prof. Andrew Ng of Stanford University. From there, I learned more about reinforcement learning, mostly by reading books, blogs, and course materials such as Reinforcement Learning: An Introduction and CS294 Deep Reinforcement Learning by John Schulman and Pieter Abbeel, watching lectures from top professors such as CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng, and trying out new things with the OpenAI Gym library. While many of the concepts initially went over my head, reading more and seeing the same concept explained in different ways gave me a decent understanding of the basics. Before long, I had built my first reinforcement learning agent, which learned to balance a pole on a small cart in the famous CartPole game!
This first success inspired me to keep exploring the field. I began reading about more refined reinforcement learning methods that were faster and more efficient, such as Trust Region Policy Optimization and Proximal Policy Optimization. I asked Prof. Alex Ihler from UCI to help me understand the nuances of these RL algorithms, and he was kind enough to explain the intuition behind them. I also tried working with new environments, such as a 2-D driving environment, and eventually even got the algorithms to run in a simulated environment.
I was motivated to go further and began delving deeper into the math of these algorithms, which, unsurprisingly, was often far too complex for me to understand in one go. But after reading more blogs and published articles, and with the help of Prof. Ihler, I finally gained a decent understanding of the algorithms, and even published a paper in the Baltic Journal of Modern Computing in October 2018 (link: https://tinyurl.com/yymayq94).
Another area of AI that I grew to find interesting was AI ethics and how AI can be biased. The article that triggered my interest was about racial bias in image classification algorithms. I was intrigued by this slightly different perspective on AI, and especially noted the problems of AI bias in natural language with respect to sequence learning models. I took a course to understand the inner workings of sequence learning models, including RNNs, GRUs, and LSTMs. This gave me the necessary background for the project that I’ve been working on for the past year: combating cyberbullying in online forums by building a fairer toxic comment detection algorithm. I will be presenting this at the 2019 RE-WORK AI for Good Summit on the Deep Dive Track.
As social media grows, toxic comments and cyberbullying grow with it, especially toxic comments that use an identity term in a negative way. According to research from the UK nonprofit Ditch the Label, on average, 7 out of 10 teenagers are victims of cyberbullying. The algorithms that are supposed to block these toxic comments often don’t do their job properly: they pick up an inadvertent bias against some of these identity terms (caused by too few non-toxic comments containing the identity terms in the training data) and flag comments that contain them as toxic even when they aren’t.
Furthermore, these algorithms aren’t transparent, so users don’t know why a comment was flagged as toxic. According to a study from the National Center for Biotechnology Information, people online are less likely to say something offensive if they know the effects of what they’re saying. Research has also shown that toxic behavior is less intense when monitoring systems are in place. Therefore, an effective way to address this cyberbullying crisis would be to have algorithms that explain their classifications.
The other part of the problem with current algorithms is their bias. So why does this bias actually occur?
The main reason for this bias is a lack of understanding of context. For example, if a biased model sees an identity term used in a non-toxic sentence, it might incorrectly assume that the sentence is toxic, because it can’t tell when the word is used in a toxic way and when it isn’t. Prior methods to fix this bias manually selected terms that the model could potentially be biased against and manually added non-toxic data so that the model could understand the context of these words a little better. However, because the models weren’t transparent, they couldn’t show which terms led to a toxic classification or where in the dataset the bias needed to be addressed. This forced researchers to identify the areas of bias manually, which is slow, inaccurate, and not scalable, and which leaves a large amount of bias unaddressed.
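One simple way to make this kind of bias measurable is to look only at non-toxic comments and ask how often the model still flags them when a given term is present. The sketch below is an illustration, not the method used in this project: the function name `false_positive_rate_by_term`, the whitespace tokenization, and the placeholder terms `term_a` and `term_b` are all assumptions made for the example.

```python
from collections import defaultdict


def false_positive_rate_by_term(comments, labels, predictions, terms):
    """Share of non-toxic comments containing each term that were flagged toxic.

    labels and predictions use 1 for toxic and 0 for non-toxic.
    Terms that never appear in a non-toxic comment are left out of the result.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for text, label, pred in zip(comments, labels, predictions):
        if label == 1:
            continue  # only non-toxic comments can be false positives
        words = set(text.lower().split())  # naive tokenization (assumption)
        for term in terms:
            if term in words:
                total[term] += 1
                flagged[term] += int(pred == 1)
    return {t: flagged[t] / total[t] for t in terms if total[t] > 0}


# Toy data with placeholder identity terms (hypothetical).
comments = [
    "i am a term_a person",
    "term_a people are welcome here",
    "term_b friends visited today",
    "you are awful",
]
labels = [0, 0, 0, 1]
predictions = [1, 0, 0, 1]  # the first comment is wrongly flagged

rates = false_positive_rate_by_term(comments, labels, predictions, ["term_a", "term_b"])
print(rates)
```

A term whose rate is much higher than the others is a candidate for the kind of unintended bias described above.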
Therefore, the objective of this project was to develop a more scalable, automated way to de-bias toxic comment detection models by using more transparent algorithms. I used the Hierarchical Attention Network, which allowed me to determine specifically which identity terms the model was biased against. To fix this bias, I added non-toxic data containing these identity terms back into the dataset, using a grid search over the meta-parameters to determine the optimal amount of non-toxic data to augment the dataset with.
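The augmentation search can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `augment`, `grid_search_augmentation`, and `score_fn` are hypothetical names, and in practice `score_fn` would train a model on the augmented data and return a validation score that balances accuracy and bias. Here a toy scoring function stands in for that step so the example runs on its own.

```python
import random


def augment(dataset, nontoxic_pool, n_extra, seed=0):
    """Return the dataset plus up to n_extra randomly chosen non-toxic examples."""
    rng = random.Random(seed)
    extra = rng.sample(nontoxic_pool, min(n_extra, len(nontoxic_pool)))
    return dataset + extra


def grid_search_augmentation(dataset, nontoxic_pool, amounts, score_fn):
    """Try each augmentation amount and keep the one with the highest score."""
    best_amount, best_score = None, float("-inf")
    for n_extra in amounts:
        score = score_fn(augment(dataset, nontoxic_pool, n_extra))
        if score > best_score:
            best_amount, best_score = n_extra, score
    return best_amount, best_score


# Toy stand-ins (assumptions): 4 original examples, a pool of 5 non-toxic ones,
# and a fake score that peaks when the augmented dataset has 6 examples.
dataset = ["c1", "c2", "c3", "c4"]
nontoxic_pool = ["n1", "n2", "n3", "n4", "n5"]


def toy_score(data):
    return -abs(len(data) - 6)


best_amount, best_score = grid_search_augmentation(
    dataset, nontoxic_pool, amounts=[0, 2, 4], score_fn=toy_score
)
print(best_amount, best_score)
```

Passing the scoring step in as a function keeps the search loop independent of any particular model or metric.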
The overall AUC on the test dataset was 0.98, compared to the 0.95 reported in a paper by Google AI on the same dataset. In addition, the model became significantly less biased against many identity terms. Before the debiasing procedure, the model identified a non-toxic sentence as toxic when it contained certain identity terms that the model had come to associate with toxicity. After the debiasing procedure, the model is able to parse the context a little better and is no longer as biased against those terms.
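For readers unfamiliar with the metric, AUC is the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one, with ties counted as half. The sketch below computes it directly from that definition on toy numbers; the function name `roc_auc` and the data are made up for illustration and are not the project's results. A library routine would normally be used on a real dataset, since this pairwise version is quadratic in the number of examples.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (toxic, non-toxic) pairs ranked correctly.

    labels use 1 for toxic and 0 for non-toxic; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: one toxic comment (score 0.4) is ranked below a
# non-toxic one (score 0.6), so 3 of the 4 pairs are ordered correctly.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
auc = roc_auc(labels, scores)
print(auc)
```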
Using the models from this experiment, I built a Chrome Extension for Google Docs (install here: https://tinyurl.com/y24cc4sb) called detoxifAI (website: www.detoxifAI.com) that detects toxic comments and highlights which words are toxic, allowing users to constructively improve their behavior online. I am currently rolling out a pilot to local schools in my area, whose principals and teachers have shown keen interest in deploying detoxifAI.
My journey in AI and Deep Learning started off as a simple curiosity about building something as crazy as a walking robot, but over the years it has turned into a lasting interest. I look forward not only to learning more about AI in the coming years, but also to using AI to solve complex, real-world problems.