Traditionally, CNNs (Convolutional Neural Networks) are popular for identifying objects inside images. With the help of word embeddings, they can also be extended to text classification. CNNs have been found effective for text in search query retrieval, sentence modelling and other traditional NLP (Natural Language Processing) tasks.
Once an image is converted to a vectorized representation, or text is converted to embeddings, the two look similar to the machine, as shown in the picture below. In the case of an image, each cell in the grid represents the raw intensity of a specific channel, whereas in the case of text, each row of the table represents a word.
Just as in a traditional CNN the lower layers help identify edges and parts of bigger objects while successive layers identify whole objects, in text classification the lower layers try to find associations between words, whereas the higher layers try to find associations between groups of words. These groups can be sentences, paragraphs or smaller subgroups.
A typical process for classifying a news dataset requires the multiple steps outlined below. Here we take the example of a news dataset with predefined categories such as sports, politics, crime and technology.
1. First we download the GloVe embeddings from this site. These files map each word to a 100-dimensional vector, also known as an embedding. The embeddings are derived from co-occurrence probabilities between words. We read these embeddings into a Python dictionary for lookup at a later point.
2. Download the news training dataset from this site. The dataset contains several folders, with the folder name as the category; under each category there are text files containing the corresponding news articles. We create the feature dataset from this set by converting each text file into a fixed-length (1000-word) sequence of words, padding where necessary.
3. We create an embedding layer that maps words to their GloVe vectors.
4. We then create the CNN network for our training.
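The data-preparation steps above (loading GloVe vectors into a dictionary, padding each document to a fixed length, and building the embedding lookup) can be sketched roughly as follows. The three-dimensional inline GloVe sample and the tiny vocabulary are illustrative assumptions standing in for the real downloads, not the actual files:

```python
import numpy as np

# Step 1: read GloVe-style lines ("word v1 v2 ... vN") into a dict.
# A real run would read the downloaded 100-dimensional file; a 3-dim
# toy sample stands in for it here (an assumption for brevity).
glove_lines = [
    "i 0.2 0.1 0.4",
    "like 0.6 0.5 0.1",
    "this 0.3 0.2 0.7",
]
embeddings = {}
for line in glove_lines:
    parts = line.split()
    embeddings[parts[0]] = np.array(parts[1:], dtype="float32")

# Step 2: convert each document to a fixed-length sequence of word ids,
# padding short documents with 0 and truncating long ones.
MAX_LEN = 1000
word_index = {"i": 1, "like": 2, "this": 3}   # hypothetical vocabulary

def to_padded_ids(text, max_len=MAX_LEN):
    ids = [word_index.get(w, 0) for w in text.lower().split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))

# Step 3: embedding matrix for the embedding layer;
# row k holds the GloVe vector for word id k.
dim = 3
embedding_matrix = np.zeros((len(word_index) + 1, dim))
for word, idx in word_index.items():
    if word in embeddings:
        embedding_matrix[idx] = embeddings[word]
```

In a real pipeline the tokenization and padding would typically be done with a library helper, but the logic is the same.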
A typical convolutional network architecture has multiple layers, as shown below.
A CNN is a supervised ML algorithm. The training set is first converted to word embeddings using the GloVe vectors. We pass it through a series of convolution and pooling layers to extract lower-level features first, and then learn higher-level features from those lower-level features.
A typical 1D convolution operation for text looks like this. Here we choose a kernel (the 2x5 table in yellow), or filter, covering 2 words, with a stride of 1. We multiply the first 2 rows of the word embedding matrix (the words "I" and "Like") element-wise with the kernel and take the sum: 0.2*0.6 + 0.1*0.5 + … + 0.5*0.1 + 0.1*0.1 = 0.51.
We then move to the next 2 words ("Like" and "this") and repeat the process. In this example this generates a vector of size 7x1. We can apply many such kernels.
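This sliding-window arithmetic takes only a few lines of NumPy. The 8x5 embedding matrix and the kernel values below are made-up placeholders (the picture's exact numbers are not reproduced here); with 8 words, a 2-word kernel and stride 1, the output has 8 - 2 + 1 = 7 entries:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.random((8, 5))      # 8 words, 5-dim embeddings (placeholder values)
kernel = rng.random((2, 5))   # filter covering 2 words at a time

# Stride-1 1D convolution over the word axis: at each position,
# multiply a 2x5 window element-wise with the kernel and sum.
out = np.array([np.sum(emb[i:i + 2] * kernel) for i in range(emb.shape[0] - 1)])
print(out.shape)  # (7,) - one value per 2-word window, as in the text
```

Applying many such kernels simply stacks several of these 7x1 vectors side by side.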
For our training, we first split the input dataset into an 80% training set and a 20% validation set. We feed the training set to the CNN, applying 128 such filters, each covering 5 words at a time. After convolution, the output is passed through a ReLU activation layer, which zeroes out negative values and keeps only the positive ones. The output of ReLU is passed through a max pooling layer to retain the most important information.
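ReLU and max pooling are both simple operations; here is a NumPy sketch on a hypothetical feature vector, with an assumed pool size of 2:

```python
import numpy as np

features = np.array([0.51, -0.30, 0.82, -0.10, 0.44, 0.95])

# ReLU: zero out negative values, keep only the positive ones.
relu = np.maximum(features, 0)

# Max pooling with pool size 2: keep the largest value in each window,
# halving the length while retaining the strongest activations.
pooled = relu.reshape(-1, 2).max(axis=1)
print(pooled)  # [0.51 0.82 0.95]
```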
We then pass the output through a dropout layer to prevent overfitting, and then through another set of 1D convolution, ReLU and max pooling layers.
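Dropout prevents overfitting by randomly zeroing a fraction of the activations at training time, so the network cannot rely on any single feature. A minimal sketch, assuming the common "inverted dropout" convention and a hypothetical rate of 0.5:

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 0.5
activations = np.ones(10)

# Keep each unit with probability (1 - rate); scale survivors so the
# expected sum of activations is unchanged (inverted dropout).
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1 - rate)
```

At test time dropout is disabled and the activations pass through unchanged.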
Finally, the last layer of the CNN is typically a feed-forward neural network that learns to map the pooling output to the output categories as softmax probabilities, adjusting its weights to reduce the error using the rmsprop optimizer.
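The final dense-plus-softmax mapping can be sketched as follows. The weights here are random placeholders rather than trained values, and the rmsprop updates that would adjust them during training are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
pooled = rng.random(8)        # pooled feature vector from the CNN (placeholder)
W = rng.random((8, 4))        # 4 categories: sports, politics, crime, technology
b = np.zeros(4)

# Dense layer: linear map from pooled features to one logit per category.
logits = pooled @ W + b

# Softmax: turn logits into probabilities that sum to 1
# (subtracting the max is the standard numerical-stability trick).
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
predicted = int(np.argmax(probs))
```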
Now that our network architecture is in place, we train the model for 15 epochs and measure its performance on the validation set.
Here is the evaluation graph. Our F1 score is about 75% after 15 epochs.
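The F1 score is the harmonic mean of precision and recall; for a multi-class problem it is usually computed per class and averaged. A pure-Python sketch of the per-class computation, on hypothetical labels (not the actual validation data):

```python
def f1_score_binary(y_true, y_pred):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical validation labels for one category (1 = this category).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(round(f1_score_binary(y_true, y_pred), 2))  # 0.75
```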
We can then try a few samples to test our model.
The model looks pretty good at this point.
If you would like to learn news classification from scratch with video tutorials, without any prior knowledge of GloVe embeddings or CNNs, please sign up for my following course.
About Author Evergreen Technologies:
• Active in teaching online courses in Computer Vision, Natural Language Processing and SaaS system development
• Over 20 years of experience in Fortune 500 companies
• Blog: https://www.evergreentech.online
• Youtube Channel: https://www.youtube.com/channel/UCPyeQQp4CfprsybVr8rgBlQ?view_as=subscriber
• LinkedIn: @evergreenllc2020
• Over 22,000 students in 145 countries