Concreteness has long been central to psychological theories of learning and thinking, and it increasingly has practical applications in domains with abundant natural language data, such as advice and plan-making. However, the literature provides diffuse and competing definitions of concreteness in natural language. In this package, we codify simple guidelines for automated concreteness detection within and across domains, developed from a review of existing methods in the literature.
You can install the doc2concrete package directly, like so:
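install.packages("doc2concrete")
library(doc2concrete)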
Here, we provide a single function, doc2concrete, that operationalizes models of document-level concreteness, based on a survey of datasets in several domains, including advice. This package is built as an accompaniment to Yeomans (2021), which reviews existing linguistic concreteness models across several domains. The function conducts two kinds of analyses, which can be selected using the domain argument.
First, we provide pre-trained models specifically tuned to measure concreteness in two open-ended goal pursuit domains - advice and plan-making. These were developed using supervised machine learning tools, and they robustly outperform other domain-specific models. We trained the advice model on a range of datasets from lab and field settings (9 studies, 4,608 participants), and we trained the plan-making model on plans that students wrote at the beginning of online classes (7 classes, 5,172 students). Our package implements the best-performing supervised models - LASSO models with bag-of-ngrams and dictionary features - to calculate concreteness in a new set of in-domain texts.
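For example, the advice model can score a new set of advice texts directly (the example texts below are hypothetical):

# Two hypothetical pieces of advice: one specific, one vague
advice <- c("Rewrite your opening paragraph around one clear claim.",
            "Good job - keep up the great work!")

doc2concrete(advice, domain = "advice")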
Although it is not ideal, researchers may have to rely on a domain-general model when they are working in an unfamiliar domain, or conducting exploratory work. In that case, our results suggest that the mTurk dictionary provides the most robust measure of concreteness across the domains we tested here (Brysbaert et al., 2014). We also found substantial variation in concreteness within and across domains; however, we provide this open-domain model as a scalable starting point for researchers interested in other domains. That said, we highly recommend that researchers conduct deeper work to understand domain-specific models of concreteness in their own settings.
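The open-domain dictionary is applied the same way, by setting domain = "open" (again, the example texts are hypothetical):

texts <- c("Meet me at the library at 4pm on Tuesday.",
           "Let's get together sometime soon.")

doc2concrete(texts, domain = "open")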
We offer a document-level implementation of the original Brysbaert dictionary, with some adjustments to the standard protocol. Common practice in prior work excluded documents with insufficient word counts, and it produced skewed distributions (i.e., short documents had much higher variance). Instead, our package applies smoothing, which calculates a weighted combination of each document's raw score and the group average, with the weight proportional to document length. This smoothing somewhat improved the accuracy of the model for concreteness in advice, and in plan-making. We suspect there may be other gains from fine-tuning the standard dictionary approach (for example, varying the weights on words), which future research should explore.
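To make the weighting concrete, here is a minimal sketch of length-based smoothing; the function name and the constant k are hypothetical, and the package's internal weighting may differ:

# Shrink a document's raw dictionary score toward the group average,
# with less shrinkage for longer documents (k is a hypothetical constant)
smooth_score <- function(raw, n_words, group_mean, k = 10) {
  w <- n_words / (n_words + k)   # weight on the raw score grows with length
  w * raw + (1 - w) * group_mean
}

# A three-word document is pulled strongly toward the group mean
smooth_score(raw = 4.2, n_words = 3, group_mean = 2.9)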
We have included an example dataset, feedback_dat, for researchers to get a handle on the workflow. These data were collected from Mechanical Turk workers, who were asked to think of a person in their life to whom they could give feedback on a recent task. They were then asked to write out the feedback they would provide (Blunden, Green & Gino, 2018). The written feedback was shown to 5-6 raters (also mTurkers), who evaluated its specificity, and the average of these ratings is offered as a ground-truth measure of concreteness.
data("feedback_dat")
cor.test(doc2concrete(feedback_dat$feedback,domain="open"),
feedback_dat$concrete)
Pearson's product-moment correlation
data: doc2concrete(feedback_dat$feedback, domain = "open") and feedback_dat$concrete
t = 1.922, df = 169, p-value = 0.05629
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.003900988 0.289964955
sample estimates:
cor
0.146257
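We can run the same test using the in-domain advice model:

# Correlation between advice-model scores and human ratings
cor.test(doc2concrete(feedback_dat$feedback, domain = "advice"),
         feedback_dat$concrete)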
Pearson's product-moment correlation
data: doc2concrete(feedback_dat$feedback, domain = "advice") and feedback_dat$concrete
t = 3.5387, df = 169, p-value = 0.0005194
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1171968 0.3970710
sample estimates:
cor
0.2626497
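As expected, the in-domain advice model tracks the human concreteness ratings more closely than the open-domain dictionary does (r = .26 vs. r = .15).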
That’s it! Enjoy! And please reach out to us with any questions, concerns, bug reports, use cases, comments, or fun facts you might have.
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904-911.
Blunden, H., Green, P., & Gino, F. (2018). The impersonal touch: Improving feedback-giving with interpersonal distance. Academy of Management Proceedings, 2018(1).
Yeomans, M. (2021). A concrete example of construct construction in natural language. Organizational Behavior and Human Decision Processes, 162, 81-94.