Text mining Reuters, “of course”

Recently, @leoncrawl and @billhid broached the idea of tracking the use of the expression “of course” in journalistic writing. This is an interesting challenge because the phrase “of course” is an easy item to track computationally and, as a phrase, refers to a delimited set of rhetorical strategies. The phrase “of course” can indicate:

  • Agreement between interlocutors
  • Emphasis of a point
  • Anticipation of reader challenge/dissent

What is interesting about the 3 strategies above is that they all convey a pointed transaction between writer/speaker and audience.

In the first case, “of course” might signal a person’s assent or acknowledgement of a point made by another. “Of course” can stand in for “yes” or “I understand.”

In the second case, a writer/speaker may wish to stress something he/she feels is a manifest proposition, or something he/she feels the reader will find manifest. While belaboring the obvious might simply be redundant, the strategy could also signal a writer/speaker’s attempt to construct an argument whose scaffolding requires the articulation of some type of commonplace. “Of course” could also act as a citation of a previous claim, which the writer/speaker has intensified with additional interpretation and/or evidence. In the more general case, the writer/speaker is requesting a specific type of attention from the audience, i.e., asking them to bear with the line of reasoning or to recall former arguments.

In the third case, the writer/speaker anticipates audience objection to an argument. Here “of course” functions as a qualifying device that attempts to temper audience reaction to controversial or fuzzy claims. For example, I am inclined to mention here that I am neglecting more fine-grained readings of “of course” because of the limited scope of this data analysis. To that end, I might have said, “Of course, these 3 uses of the phrase ‘of course’ are generalizations and mainly serve to outline my interest in the linguistic phenomenon.”

All of this is to say that “of course” as a unit of meaning presents potential for understanding larger patterns of rhetoric within a text.

Data

To track the use of “of course” in journalistic texts, I used the Reuters Corpus offered in the Natural Language Toolkit. The Reuters Corpus consists of 10,788 news articles, divided into 90 categories. An article can belong to more than one category, so there is a potential for double counting when accessing textual data by category name.

Method

The Reuters Corpus is tokenized according to sentence and word. For example, a sentence in the Reuters Corpus would appear as ['ASIAN', 'EXPORTERS', 'FEAR', 'DAMAGE', 'FROM', 'U', '.', 'S', '.-', 'JAPAN', 'RIFT', 'Mounting', 'trade', 'friction', 'between', 'the', 'U', '.', 'S', '.', 'And', 'Japan', 'has', 'raised', 'fears', 'among', 'many', 'of', 'Asia', "'", 's', 'exporting', 'nations', 'that', 'the', 'row', 'could', 'inflict', 'far', '-', 'reaching', 'economic', 'damage', ',', 'businessmen', 'and', 'officials', 'said', '.']. As you can see, punctuation marks such as commas and periods are also tokenized.

Given this tokenization, I wrote a script that searches each sentence for the tokens “of” and “Of.” Once a list of sentences containing “of” or “Of” had been built, I extracted each sentence in which the token “course” immediately followed “of” or “Of” in the sentence’s token list.
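The search described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names `has_of_course` and `find_of_course` are hypothetical, and each sentence is assumed to be a list of string tokens like the example shown earlier. For brevity it checks adjacent token pairs in one pass rather than first building an intermediate list of sentences containing “of.”

```python
def has_of_course(tokens):
    """Return True if the token "course" directly follows "of" or "Of"."""
    # Pair every token with its successor and look for the two-token sequence.
    return any(
        first in ("of", "Of") and second == "course"
        for first, second in zip(tokens, tokens[1:])
    )


def find_of_course(sentences):
    """Keep only the tokenized sentences that contain "of course"."""
    return [tokens for tokens in sentences if has_of_course(tokens)]


# Two hand-made tokenized sentences, for illustration only.
sample = [
    ["Of", "course", "the", "spot", "price", "will", "fluctuate", "."],
    ["in", "due", "course", ",", "he", "said", "."],
]
hits = find_of_course(sample)
print(len(hits))
```

With NLTK installed and the Reuters Corpus downloaded, `nltk.corpus.reuters.sents()` returns sentences in this list-of-tokens form and could be passed to `find_of_course` directly.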

The Reuters Corpus comprises 54,922 sentences. Of these sentences, my script located only 15 unique sentences that contain the “of course” construction (15/54,922 ≈ 0.027%). Such a low frequency may surprise some, but checked against the 84 sentences in which the token “course” appears, this is an accurate return. Moreover, given the rhetorical constraints of the phrase “of course” described above, we might expect fewer occasions for authorial nods to audience reaction than for the use of “course” as a noun (e.g., “due course” or “course of action”).
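As a quick check on the proportions, the arithmetic can be worked through in a few lines. The three counts are the ones reported in this post; the variable names are only illustrative.

```python
total_sentences = 54922   # sentences in the Reuters Corpus, as reported above
of_course_hits = 15       # unique sentences containing "of course"
course_hits = 84          # sentences containing the token "course"

fraction = of_course_hits / total_sentences
percent = 100 * fraction                      # a fraction times 100 gives a percentage
share_of_course = of_course_hits / course_hits

print(f"{fraction:.6f}")          # 0.000273
print(f"{percent:.4f}%")          # 0.0273%
print(f"{share_of_course:.3f}")   # 0.179
```

In other words, roughly 1 sentence in 3,700 contains “of course,” and a little under a fifth of the sentences containing “course” use it in that construction.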

The sentences and their associated categories can be found in the table below:

Reuters Category Sentences
trade [‘” But of course we \’ ve had reassuring signs from the Japanese for quite some time ,” he added .’, ‘” We are faithfully abiding by the … Agreement but of course there are some problems ,” a spokesman for the International Trade and Industry Ministry told Reuters .’, ‘” We are aiming to reduce production in Japan but of course this takes time ,” he said .’]
dlr [‘FED DATA SUGGEST NO CHANGE IN MONETARY POLICY New U . S . Banking data suggest the Federal Reserve is guiding monetary policy along a steady path and is not signalling any imminent change of course , economists said .’]
nat-gas [‘” Of course , to move forward with these kinds of options would require reopening tax issues settled last year ( in the tax reform bill ) — an approach which has not , in general , been favored by the administration .’]
dmk [‘” Bringing the franc close to the mark would , of course , have to be done step by step under the watchful eye of monetary policy ,” he told shareholders .’]
money-fx [‘” Bringing the franc close to the mark would , of course , have to be done step by step under the watchful eye of monetary policy ,” he told shareholders .’, ‘FED DATA SUGGEST NO CHANGE IN MONETARY POLICY New U . S . Banking data suggest the Federal Reserve is guiding monetary policy along a steady path and is not signalling any imminent change of course , economists said .’, ‘” Finally , of course , and there is no need to keep this quiet , the cut in interest rates was also in line with the changed economic situation of the last few months ,” he added .’]
crude [‘In Athens , Greek Prime Minister Papandreou said that if the Turkish vessel Sismik 1 began research operations ” we will hinder it , of course not with words , as it cannot be stopped with words .’, ‘Carrington said , ” I am of course anxious to help in any way I can , provided that both Greece and Turkey , and the other allies , wish me to do so .’, ‘Asked about the prospect for oil prices , he said : ” I think they will stabilise around 18 dlrs , although there is a little turbulence …” ” Of course the spot price will fluctuate , but the official price will remain at 18 dlrs ,” he added .’, ‘Asked about the prospect for oil prices , he said : ” I think they will stabilise around 18 dlrs , although there is a little turbulence …” ” Of course the spot price will fluctuate , but the official price will remain at 18 dlrs ,” he added .’, ‘” Of course , to move forward with these kinds of options would require reopening tax issues settled last year ( in the tax reform bill ) — an approach which has not , in general , been favored by the administration .’]
acq [‘TWA , of course , may refile it when it is able to comply with our procedural rules ,” the DOT said .’]
copper [‘Referring to Zambia \’ s preparations for a possible cut in economic links with South Africa , Kaunda told Reuters in an interview on March 1 , ” My main concern , of course , is the mines because whatever happens we must continue to run the mines .’]
interest [‘” Of course they are less profitable than other ( variable rate ) mortgages ,” said a spokesman for Midland Bank Plc , which earlier this year said it earmarked 500 mln dlrs for fixed rate new mortgage loans .’]
gnp [‘” But broad money will continue to be taken into account in assessing monetary conditions , as of course will the exchange rate ,” the Chancellor told Parliament .’]
ship [‘In Athens , Greek Prime Minister Papandreou said that if the Turkish vessel Sismik 1 began research operations ” we will hinder it , of course not with words , as it cannot be stopped with words .’, ‘Carrington said , ” I am of course anxious to help in any way I can , provided that both Greece and Turkey , and the other allies , wish me to do so .’]
money-supply [‘FED DATA SUGGEST NO CHANGE IN MONETARY POLICY New U . S . Banking data suggest the Federal Reserve is guiding monetary policy along a steady path and is not signalling any imminent change of course , economists said .’, ‘” Finally , of course , and there is no need to keep this quiet , the cut in interest rates was also in line with the changed economic situation of the last few months ,” he added .’, ‘” Of course , it will take the dollar into account in future policy decisions but if the economy is weak , it won \’ t pull back from easing .’]

*Note: because these sentences are grouped according to category, there are instances of repeated sentences. The sentence totals provided above exclude the overlap.
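One way to exclude the overlap is to pool the sentences from every category into a set before counting. The sketch below assumes a dictionary mapping category names to lists of sentence strings; the function name `count_unique_sentences` and the placeholder data are hypothetical and stand in for the real table.

```python
def count_unique_sentences(by_category):
    """Count distinct sentences across all categories."""
    unique = set()
    for sentences in by_category.values():
        # A set ignores repeats, both within and across categories.
        unique.update(sentences)
    return len(unique)


# Placeholder data: short labels stand in for full sentences.
by_category = {
    "crude": ["sentence A", "sentence B", "sentence A"],
    "ship": ["sentence A", "sentence B"],
    "acq": ["sentence C"],
}

raw_total = sum(len(sentences) for sentences in by_category.values())
print(raw_total, count_unique_sentences(by_category))  # 6 3
```

Summing the per-category lists overstates the total, while the set-based count reflects each sentence once.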

What is interesting to note about the usage of “of course” is that it occurs primarily in reported speech, that is, when the subject of a news report is providing a statement to an audience, and not when the writer is adding his/her own meta-commentary to the story.