This post is a follow-up to my previous post on hedging, computational rhetoric, and the Wells Report, which details the results of a special investigation into the Jonathan Martin bullying scandal that has recently been the subject of media scrutiny. In that post, I analyzed the Wells Report with a personally developed app called the Hedge-O-Matic. The Hedge-O-Matic uses a Naive Bayes classification routine to tag sentences as hedgey or not-so-hedgey in their rhetorical content. The Hedge-O-Matic is trained on sentences culled from 150+ academic science articles. When tested on like genres, it generally proves 78-82% accurate under 10-fold cross-validation (in each fold, 90% of the labelled sentences train the classifier, which is then scored on the remaining, randomly selected 10%). In this post, I weigh the Hedge-O-Matic’s results against a hand-coded version of the Wells Report.
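For readers unfamiliar with the procedure, the fold mechanics of 10-fold cross-validation can be sketched in a few lines of Python. This is a toy stand-in: the sentences and tags are placeholders, and the classifier step is elided, so only the splitting logic is shown.

```python
import random

# Placeholder training set of (sentence, tag) pairs -- not the actual
# Hedge-O-Matic corpus, just 100 dummy labelled sentences
labelled = [(f"sentence {i}", "hedge" if i % 2 else "non_hedge")
            for i in range(100)]

# shuffle so each fold's held-out tenth is randomly selected
random.seed(0)
random.shuffle(labelled)

k = 10
fold_size = len(labelled) // k
splits = []

for fold in range(k):
    # hold out one tenth as test data; train on the remaining 90%
    test = labelled[fold * fold_size:(fold + 1) * fold_size]
    train = labelled[:fold * fold_size] + labelled[(fold + 1) * fold_size:]
    # a Naive Bayes classifier would be trained on `train` and its
    # accuracy measured on `test` here; this sketch records split sizes
    splits.append((len(train), len(test)))
```

Averaging the ten per-fold accuracies yields the 78-82% figure reported above.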
- After exporting the Hedge-O-Matic results to a CSV file, I created a separate column for my hand-coded input. Like the Hedge-O-Matic, I coded each sentence as a hedge or non_hedge.
- I then imported this CSV into the data processing library Pandas.
- I then compared the Hedge-O-Matic’s tags with my own. If the two fields matched, the row was tagged “True”; otherwise, “False.”
- I then calculated accuracy, precision, and recall scores from the “True”/“False” totals.
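In pandas, the comparison and scoring steps above look roughly like this. The column names and the five-row dataset are illustrative stand-ins, not the actual exported file:

```python
import pandas as pd

# Hypothetical miniature of the exported CSV: the Hedge-O-Matic's tag
# alongside the hand-coded tag for each sentence (column names invented)
df = pd.DataFrame({
    "hom_tag":  ["hedge", "hedge", "non_hedge", "non_hedge", "hedge"],
    "hand_tag": ["hedge", "non_hedge", "non_hedge", "hedge", "hedge"],
})

# True where the app agrees with the hand coding, False otherwise
df["match"] = df["hom_tag"] == df["hand_tag"]

# accuracy: share of rows where the two coders agree
accuracy = df["match"].mean()

# treat "hedge" as the positive class for precision and recall
tp = int(((df["hom_tag"] == "hedge") & (df["hand_tag"] == "hedge")).sum())
fp = int(((df["hom_tag"] == "hedge") & (df["hand_tag"] == "non_hedge")).sum())
fn = int(((df["hom_tag"] == "non_hedge") & (df["hand_tag"] == "hedge")).sum())

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

In this toy frame, three of five rows match, so accuracy is 0.6; the real figures for the Wells Report follow below.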
Results and Discussion
Hedge-O-Matic Accuracy = 0.580289456
This is a far cry from the 78-82% accuracy that I am accustomed to seeing from the Hedge-O-Matic. In a sense, the app is doing little more than spitballing at the sentences.
This result is also to be expected given the nature of the training and test sets. The Hedge-O-Matic is tuned for academic science articles. While formal, the Wells Report is written for a different audience and incorporates different conventions. Moreover, the Wells Report features numerous quotations of text messages, which are a genre unto themselves. At the present state of development, the Hedge-O-Matic has not been shown anything that resembles a text message, especially not the expletive-laden communications at the heart of the Martin harassment case.
I will also remind readers of a problem that I discussed in the previous post: quotation boundaries. My study relies on a classification output in which certain sentences escaped tokenization because their terminal punctuation fell within a quotation. This is a remnant of a test done on a literary text, in which many quotations did not end the sentence. As a result, many of the tagged “sentences” were in fact multiple sentences, and this could have shifted the results. When adjusted for the tokenization error, the output appears thus:
Hedge-O-Matic Original Length: 1,451 sentences
Hedge-O-Matic Adjusted Length: 1,564 sentences
At this time, I have not rerun the adjusted sentences; however, with roughly 7.7% of sentences lost to merging in the original run, we can expect some degradation of accuracy.
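The quotation-boundary problem can be illustrated with a toy splitter. This is a deliberately simplified stand-in, not the Hedge-O-Matic’s actual tokenizer:

```python
import re

# A sentence whose final period sits inside a closing quotation mark
# escapes a naive splitter and gets merged with the sentence after it
text = 'He wrote "I will go." Then he left. She stayed.'

# naive rule: split on a space preceded by sentence-final punctuation
# and followed by a capital letter
naive = re.split(r'(?<=[.!?]) (?=[A-Z])', text)

# adjusted rule: also allow a closing quote between the punctuation
# and the space
adjusted = re.split(r'(?:(?<=[.!?])|(?<=[.!?]")) (?=[A-Z])', text)
```

The naive rule yields two “sentences” (the first of which is really two), while the adjusted rule recovers all three, which is why the adjusted sentence count above is higher than the original.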
To provide more clarity on these accuracy figures, here are the tagged distributions of hedge/non-hedge sentences and the correctness of their predictions:
These numbers translate to the following precision, recall, and F1 scores:
I can attribute the low precision of hedge finding in the Wells Report to a number of factors:
- Borderline sentences with confidence scores around 0.5 are classed as hedges. In other words, when in doubt, the Hedge-O-Matic hedges its own bets by declaring a sentence a hedge.
- The hedging moves made in a legalistic document such as the Wells Report are different from those made in a scientific article, the most notable being reported speech. In many instances, the authors of the Wells Report will quote or recapitulate the sentiments of their interview subjects. Thus, while the interview subject may say something hedgey, the authors themselves are not hedging; they are describing with confidence what they have witnessed.
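The first factor above amounts to a simple decision threshold, which can be sketched as follows; the confidence scores here are made up for illustration, not actual Hedge-O-Matic output:

```python
# Hypothetical classifier confidence scores (probability of "hedge")
scores = {
    "sentence A": 0.92,   # clearly hedgey
    "sentence B": 0.51,   # borderline
    "sentence C": 0.50,   # exactly on the boundary
    "sentence D": 0.31,   # clearly not a hedge
}

# the cutoff pushes borderline cases onto the hedge side
tags = {s: "hedge" if p >= 0.5 else "non_hedge" for s, p in scores.items()}
```

A corpus with many borderline sentences will therefore accumulate false-positive hedges, which depresses precision.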
There are other, more global limitations at work here as well. The most notable is that the Hedge-O-Matic has not been trained on enough linguistic variability to account for the rhetorical moves made in the Wells Report.
There is also a high degree of overfitting occurring because of the smallishness and regularity of the training set. Thus, words that are often markers of hedging in scientific articles (“however,” “can,” “believed”) are biasing the classifier toward predicting hedge sentences even when such words are bracketed within an instance of reported speech or paraphrase.
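This marker-word bias can be sketched with a toy bag-of-words scorer. The per-word weights are hypothetical stand-ins for the likelihoods a Naive Bayes classifier learns from scientific articles:

```python
# Hypothetical weights leaning toward "hedge" for common hedge markers
hedge_weights = {"however": 2, "can": 1, "believed": 1}

def hedge_score(sentence):
    # strip quotation marks and sum marker-word weights; the scorer has
    # no way to tell that a marker sits inside reported speech
    words = sentence.lower().replace('"', ' ').split()
    return sum(hedge_weights.get(w, 0) for w in words)

# markers inside a quotation still pull the sentence toward "hedge",
# even though the quoting author is not hedging at all
quoted = 'Martin said "it can however be believed otherwise"'
plain = "The investigators reviewed thousands of messages."
```

Because the bag-of-words representation discards quotation context, the quoted sentence scores as strongly hedgey while the plain descriptive sentence scores zero.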
That said, I would contend that the low accuracy of the Hedge-O-Matic in the case of the Wells Report is actually a good result, because it supports the pivotal assumption girding this computational rhetorics project: that different discourses feature different conventions that signal disciplinary and generic boundaries, and that these boundaries can be traced by computers.