AI (Artificial Intelligence) has been sweeping through every industry. Whether it is self-driving cars or stock market prediction, AI is being used to automate repetitive processes. Finance, law, education, healthcare, automotive: hardly any industry has been left untouched by the Midas touch of AI. Science fiction writers have a new favorite tool, and people have predicted many amazing (or catastrophic) changes that will come with the rise of AI.
While we have yet to create the self-aware robots seen in the movies, there are many processes that we have been able to automate. Even today we can do some mind-blowing things with AI, especially with tools such as Machine Learning and Deep Learning. Automating essay grading is one of them. It opens many doors for automation in the education sector, where AI has not yet made the level of impact it has in other industries.
In schools and colleges, grading essays properly and impartially can be tedious work. Even in the lower grades, grading essays takes a very large amount of time, time that could be spent on student development and individual attention. Another problem with the current grading process often goes unnoticed: it is very hard for a teacher to grade every student with the same metrics. Some partiality is inevitable once human error comes into play. Although that may not seem like a very big problem, I reckon that in 10-15 years automated essay grading software will be as accessible and necessary a tool for teachers as a smartphone. We wanted to address that problem.
Our objective was to automate the essay grading process in a way that is personalized to the teacher. That is, the machine learns from the essays and grades provided by a teacher, and then grades new essays as closely as possible to the grades that teacher would give. This means teachers get to keep their own grading style while automating the process and saving an immense amount of time.
Understanding the process of grading:
One thing that many data science and Machine Learning enthusiasts fail to understand is that ML is not just about models and libraries; if anything, that is the smallest part of it. If we want to automate a process using Machine Learning, we need to understand the process, identify the patterns, and identify the features that affect the grading. Let us look at the major aspects of an essay that generally matter for grading:
1. Usage of words: This is the most elementary and most essential part of an essay. How are the words used? Are there spelling mistakes, and how many words are misspelled? Which word was chosen from a range of similar words, and why: to fit the context, or for some other reason? Was the same word repeated throughout the essay, or were similar words used in different forms? All of these things affect the grade of the essay.
2. Grammar and punctuation: This is also a very important part of the essay grading process. Whether the grammar is correct matters a great deal. Grammar itself depends on the proper usage of verbs, so phrases and clauses contribute to the score. And of course, good punctuation is absolutely necessary for a good essay.
3. Keeping the theme: Keeping the sentences on the topic and theme of the essay is a very important feature as well. It is harder to quantify, but it is one of the important features.
Now we need to extract and quantify these features. That is where the power of word embeddings comes in. One important thing to note here is that we don’t need to match these quantified features to scores ourselves; we just pass the numbers in and the model makes the connection itself. That is the power of deep learning. We just need to identify the features that affect the target, and the deep learning model will assign the appropriate weights automatically.
What is word embedding?
Word embedding is a way to quantify words. We map each word to a list of numbers (a vector) and train these vectors so that words with the same or similar meanings end up with similar numbers, that is, numbers with smaller differences between them. The dimension of an embedding is the quantity of numbers associated with one word. You can get a complete understanding of word embeddings here. We used 300-dimensional GloVe embedding vectors for the task. These GloVe vectors are taken from the open-source pretrained vectors released by Stanford. So we have a set of numbers defining each word.
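To make "similar words have similar numbers" concrete, here is a minimal sketch. The 4-dimensional vectors below are invented toy values, not real GloVe vectors (those have 300 dimensions and would be loaded from the pretrained file), and cosine similarity is just one common way to compare two vectors; it is an assumption here, not necessarily the measure used in our project.

```python
import math

# Toy 4-dimensional vectors, invented purely for illustration.
# Real GloVe vectors would have 300 dimensions and come from a pretrained file.
TOY_EMBEDDINGS = {
    "good":  [0.9, 0.1, 0.3, 0.0],
    "great": [0.8, 0.2, 0.4, 0.1],
    "table": [0.0, 0.9, 0.1, 0.7],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Words with related meanings should score higher than unrelated ones.
sim_close = cosine_similarity(TOY_EMBEDDINGS["good"], TOY_EMBEDDINGS["great"])
sim_far = cosine_similarity(TOY_EMBEDDINGS["good"], TOY_EMBEDDINGS["table"])
print(f"good vs great: {sim_close:.2f}")
print(f"good vs table: {sim_far:.2f}")
```

With these toy values, "good" and "great" come out far more similar than "good" and "table", which is the behaviour we rely on from the pretrained vectors.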
Quantifying the features:
Now the task at hand is to quantify the features mentioned above. Let us go through them one by one.
1. Usage of words: This is easy with word embeddings. Word embeddings are the quantified forms of the words, so words that are suitably advanced and fit the context will automatically get high scores. Of course, that will only be the case if the teacher values that. Some teachers might prefer a simpler set of words. The model will learn based on the data it is provided.
2. Grammar and punctuation: Now this is the tricky part. Here we use the stopwords and punctuation data provided by the NLTK library, and we look for those stopwords and punctuation marks in the given essay.
3. Keeping the theme/context: Theme/context here can be thought of as the relation between the different sentences of an essay. The closer the sentences in an essay are to each other, the more the essay stays in context. To measure the closeness of sentences, we combine the embeddings of each word in a sentence to create a vector for that sentence.
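The second and third features can be sketched roughly as follows. This is an illustration under several assumptions, not our production code: the small `STOPWORDS` set stands in for the NLTK stopword list so the example runs without downloads, the 3-dimensional embeddings are invented toy values, "combining" word embeddings is done by simple averaging, and the `coherence` score (mean cosine similarity of neighbouring sentences) is a hypothetical way to turn sentence closeness into a single number.

```python
import math
import re
import string

# Stand-in for NLTK's English stopword list (assumption, to avoid a download).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}

# Invented toy vectors; real ones would be 300-dimensional GloVe vectors.
TOY_EMBEDDINGS = {
    "dogs": [0.9, 0.1, 0.0], "cats": [0.8, 0.2, 0.1],
    "pets": [0.85, 0.15, 0.05], "loyal": [0.7, 0.3, 0.0],
    "stocks": [0.0, 0.1, 0.9], "fell": [0.1, 0.0, 0.8],
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_stopwords_and_punctuation(essay):
    """Return (number of stopword tokens, number of punctuation characters)."""
    n_stop = sum(1 for word in tokenize(essay) if word in STOPWORDS)
    n_punct = sum(1 for ch in essay if ch in string.punctuation)
    return n_stop, n_punct


def sentence_vector(sentence, embeddings):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [embeddings[w] for w in tokenize(sentence) if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence(essay, embeddings):
    """Mean cosine similarity between consecutive sentence vectors."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    vecs = [sentence_vector(s, embeddings) for s in sentences]
    vecs = [v for v in vecs if v is not None]
    if len(vecs) < 2:
        return 0.0  # nothing to compare
    sims = [cosine_similarity(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)


on_topic = "Dogs are loyal pets. Cats are pets."
off_topic = "Dogs are loyal pets. Stocks fell."
print(count_stopwords_and_punctuation(on_topic))
print(coherence(on_topic, TOY_EMBEDDINGS), coherence(off_topic, TOY_EMBEDDINGS))
```

In this toy setup the essay that stays on one topic gets a higher coherence score than the one that jumps to an unrelated sentence. In practice these numbers would simply be passed to the model along with the word embeddings, and the model would learn how much each one matters to a given teacher.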
With the approach and the understanding established, we move on to the code and deployment. Stay tuned to learn about the challenges we faced in coding and deployment and how we solved them.