I was recently inspired by the Journal of Digital Humanities issue on topic modeling in the humanities to do some topic modeling of my own. Much of my research has deliberately avoided this technique because I have been concerned with other matters of discourse, and honestly, I have had more than enough natural language processing tasks to occupy myself with, not the least of which is the hand-coding and compiling of a corpus of rhetorical moves.
MALLET, however, provides an easy method to import documents and create topic models based on latent Dirichlet allocation.
For this experiment, I processed 77 scientific articles from journals specializing in human evolutionary biology, climate studies, poultry science, and plant biology. I used MALLET’s default stopword list and generated 20 topics. I should note here that the science article files could be cleaner. Some artifacts of previous processing and analysis were present; however, because this is only an exploratory experiment in topic modeling, my concern over these idiosyncrasies is minimal.
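For readers who want to try a similar run, here is a minimal sketch of the two MALLET steps (import, then train) driven from Python. It is not a record of the exact commands behind the table below. The `bin/mallet` launcher path and the names `articles`, `articles.mallet`, and `topic_keys.txt` are placeholders I am assuming for illustration; the flags are MALLET's standard `import-dir` and `train-topics` options.

```python
import subprocess
from pathlib import Path

MALLET = "bin/mallet"  # assumption: path to the MALLET launcher script


def import_command(input_dir, mallet_file):
    """Build the command that converts a directory of text files to MALLET's format."""
    return [
        MALLET, "import-dir",
        "--input", str(input_dir),
        "--output", str(mallet_file),
        "--keep-sequence",     # topic models need the token sequence preserved
        "--remove-stopwords",  # apply MALLET's default English stopword list
    ]


def train_command(mallet_file, keys_file, num_topics=20):
    """Build the command that trains an LDA model and writes the topic keys."""
    return [
        MALLET, "train-topics",
        "--input", str(mallet_file),
        "--num-topics", str(num_topics),
        "--output-topic-keys", str(keys_file),
    ]


def build_pipeline(corpus_dir, run=False):
    """Return both commands; execute them only when run=True and MALLET is installed."""
    corpus_dir = Path(corpus_dir)
    mallet_file = corpus_dir.with_suffix(".mallet")  # hypothetical output name
    keys_file = Path("topic_keys.txt")               # hypothetical output name
    commands = [
        import_command(corpus_dir, mallet_file),
        train_command(mallet_file, keys_file),
    ]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works without MALLET.
    for cmd in build_pipeline("articles"):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to inspect the flags before committing to a run on the full corpus.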
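Below, you will find a table that contains the topics and keys generated from these 77 scientific articles. Each row gives the topic number, the weight MALLET assigned to that topic (its Dirichlet parameter), and the topic's top key words.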
|0||0.03434||true kin fertility women living children residence age time marriage birth influence contraceptive child number journal significant virilocally effect|
|1||0.0308||model class cuii mhp female xala site ann models hypergyny probability homosexual values mlr females migration stratification binding societies|
|2||0.10804||al egg fed eggs diet hens diets breed higher age laying fatty feed birds meal observed acid kadaknath aseel|
|3||0.0375||local forest people land resources production households adaptation groups access livestock income policies areas gum drought trade collection government|
|4||0.04795||true litter al birds perfringens ice broiler production broilers treatment cake false flocks barrier flock chicks density pen content|
|5||0.01368||heels high female attractiveness walkers flat gait wearing shoes participants judgements females cv sd condition flexion attractive women walking|
|6||0.02839||reaction yield equiv table cl metal scheme temperature cs alcohol precipitation thumbnail catalyst mol image article product phosphine entry|
|7||0.08426||al immune response stress expression onac il cells corticosterone responses birds chickens innate dietary genes treatment rice cell system|
|8||0.04974||temperature hens gene embryonic incubation heat experiment al egg eggs early mortality feathered higher performance development dw ambient stress|
|9||0.06695||social support individual moralization learning trait individuals optimum mating payoff behavior artifact trial strategy friends eq population learner strategies|
|10||0.04691||cdm projects countries ldcs project seed supply coat cer cers nanoparticle protein demand cent energy poas potential eu scenarios|
|11||0.73786||effects high data important time increase conditions increased study results effect significant level studies similar higher factors table|
|12||0.0769||al bcn binding avt uv bacteria pituitary light birds chicken receptor cell bacterial crh neurohypophysis jejuni ct peptides genes|
|13||0.04168||women male men preferences faces masculinity cues wealth facial exposure high competition low sex female ratings images participants scents|
|14||0.0546||carbon environmental pes services climate disaster change energy development emissions adaptation service countries interventions drm local land reconstruction reduce|
|15||0.09328||adaptation climate change sustainable social development risk vulnerability state problematization policy knowledge poverty report practices neoliberal context discourse|
|16||0.02862||children reciprocity partner choice altruism games participants indirect previous cooperation age model public behavior contributions sex goods shared partners|
|17||0.08861||al propolis mc samples rev kg ml min study vaccine poultry mg virus concentration fowl performed reported pcr chicks|
|18||0.09039||animal welfare animals selection genetic al traits production breeding environment poultry species activity behavior ducks natural physiological genetics birds|
|19||0.03237||climate countries baseline finance al strains da salmonella developing funds resistance cent poultry level enteritidis sources additional oda global|
Read thematically, the table shows that MALLET has grouped the topics into fairly cohesive sets of key words. Terms associated with climate are clustered together. Meanwhile, terms such as “egg” are clustered around “incubation” and “chicken.” That more topics are pulled toward poultry science seems to reflect the larger share of poultry science articles in the corpus.
The most notable finding, to me, is Topic 11. MALLET gives it by far the largest weight in the corpus (0.73786, against 0.10804 for the next highest, Topic 2), and its keys read as genre cues of scientific research: “data,” “results,” “significant,” “important.” These terms appear generically in the abstract, introduction, and discussion sections of scientific articles and cut across disciplines.
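To make that margin concrete, here is a small sketch that parses rows in the table's `|topic||weight||keys|` layout and ranks topics by weight. The rows are copied from the table above, with the key lists shortened for space; the function names are my own and not part of MALLET.

```python
# A few rows from the topic table, in its |topic||weight||keys| layout.
# Key lists are truncated here for space; the weights are as reported.
ROWS = """\
|2||0.10804||al egg fed eggs diet hens diets breed higher age|
|5||0.01368||heels high female attractiveness walkers flat gait|
|11||0.73786||effects high data important time increase conditions|
|15||0.09328||adaptation climate change sustainable social development|
|18||0.09039||animal welfare animals selection genetic al traits|
"""


def parse_row(line):
    """Split one table row into (topic number, weight, list of key words)."""
    topic_id, weight, keys = line.strip().strip("|").split("||")
    return int(topic_id), float(weight), keys.split()


def rank_by_weight(text):
    """Return parsed rows sorted from the heaviest topic to the lightest."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_weight(ROWS)
top, runner_up = ranked[0], ranked[1]
ratio = top[1] / runner_up[1]

print(f"Heaviest topic: {top[0]} (weight {top[1]})")
print(f"Next heaviest: {runner_up[0]} (weight {runner_up[1]})")
print(f"Ratio between them: {ratio:.1f}")
```

On these rows, Topic 11 comes out first and Topic 2 second, with Topic 11 weighted a little under seven times as heavily.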