Like all survey-based analyses, the Price Sensitivity Meter analysis is usually based on data from a sample and not from the full target population. If the analysis should be helpful for making statements about the range of acceptable prices in the target population, it is necessary that the sample is a good representation of this target population. In an ideal world, we have a scenario where each member of the target population has the same likelihood of being included in the sample. In practice, this is often not the case - as participation in surveys is often voluntary and some people might refuse, selection of a surveying technique (for instance online interviewing excludes all members of the population that do not have internet access or use it very rarely) etc. A sample is hardly ever perfect, in particular when it comes to market research where cost and timing constraints come into play.
Weighting the survey data is one potential solution to dealing with the shortcomings from an imbalanced sample. This imbalance is sometimes a deliberate choice (for instance when using cluster sampling) and sometimes the result of systematic problems of the fieldwork. It is important to note that weighting is often a rather pragmatic solution to a problem, but not necessarily the best solution from a methodological standpoint. It can certainly help, but it is not the miracle cure that finds reliable insights in a crappy sample.
The pricesensitivitymeter package includes a function that
allows to use weighted survey data. This vignette explains the
use of this function. For a general overview of the main function for
unweighted data, please read the help of th psm_analysis()
function. The function that uses weighted data uses the survey
package to calculate the weights; please read its documentation if you
need to find out how to specify your sample design. The author of the
survey package has also published a very helpful book1 that
offers guidance on weighting in general and the R package in
particular.
This vignette will show a simple workflow with simulated (random) data.
The following paragraphs will show the benefits of weighting in case of a biased sample. Let’s consider the following example:
This is of course an extreme example because the same variable (gender) is directly linked to both the likelihood to be included in the sample and the price preference.
The R code below creates this example data set with the gender bias and creates the survey design object with the survey package (assuming simple random sampling and a population of 10k people).
set.seed(1976)
library(survey)
#> Warning: Paket 'survey' wurde unter R Version 4.2.3 erstellt
#> Lade nötiges Paket: grid
#> Lade nötiges Paket: Matrix
#> Lade nötiges Paket: survival
#>
#> Attache Paket: 'survey'
#> Das folgende Objekt ist maskiert 'package:graphics':
#>
#> dotchart
# Creating dataset with price acceptance and biased gender variable
input_data <- data.frame(tch = round(rnorm(n = 250, mean = 8, sd = 0.5), digits = 2),
ch = round(rnorm(n = 250, mean = 12, sd = 0.5), digits = 2),
ex = round(rnorm(n = 250, mean = 13, sd = 0.5), digits = 2),
tex = round(rnorm(n = 250, mean = 15, sd = 0.5), digits = 2),
gender = sample(x = c("male", "female"),
size = 250,
replace = TRUE,
prob = c(2/3, 1/3)))
# for women: increasing the price acceptance by +50%
input_data$tch[input_data$gender == "female"] <- input_data$tch[input_data$gender == "female"] * 1.5
input_data$ch[input_data$gender == "female"] <- input_data$ch[input_data$gender == "female"] * 1.5
input_data$ex[input_data$gender == "female"] <- input_data$ex[input_data$gender == "female"] * 1.5
input_data$tex[input_data$gender == "female"] <- input_data$tex[input_data$gender == "female"] * 1.5
# for survey design object: occurrence of each gender in the target population
# it's only one figure in this example because we assume that gender should be perfectly balanced.
# if this is not balanced in the population, we would need a vector with the number of occurrences in the population.
# the sum of the strata size across all strata gives the total population size
# (here: two strata with 5k each = 10k total population)
input_data$gender_pop <- 5000
# creating the survey design object for post-stratification based on gender
# we assume that the selection of respondents within each gender is biased and...
# only the gender balance in the sample is problematic
input_design <- survey::svydesign(ids = ~ 1, # no clusters
probs = NULL, # hence no cluster sampling probabilities,
strata = input_data$gender, # stratified by gender
fpc = input_data$gender_pop, # strata size in the population
data = input_data) # data object used as input
After those preparations, we can run the weighted Price Sensitivity
Meter Analysis using the psm_analysis_weighted()
function.
Its basic inputs are the names of the variables with the price
information (“too cheap”, “cheap”, “expensive” and “too expensive”) and
the survey design object that we have just created using the
svydesign()
function of the survey package. To
emphasize the difference to unweighted data, the input for this survey
design object is called “design” instead of “data”. Please note that the
psm_analysis_weighted()
function requires a “design” input
and cannot cope with just four vectors that contain the price
information.
The resulting object and the summary()
function use the
same template as the function for the unweighted PSM analysis.
output_weighted_psm <- psm_analysis_weighted(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
design = input_design)
summary(output_weighted_psm)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 11.91 - 15.63
#> Indifference Price Point: 13.69
#> Optimal Price Point: 13.63
#>
#> ---
#> 228 cases with individual price preferences were analyzed (weighted data).
#> Total data set consists of 250 cases. Analysis was limited to cases with transitive price preferences.
#> (Removed: n = 22 / 9% of data)
If the sample is a perfect representation of the target population,
the results of the Price Sensitivity Meter analysis will be the same
regardless if the psm_analysis()
function or the
psm_analysis_weighted()
function is used - under the
condition that there are no respondents with intransitive price
preferences.
The code below demonstrates first what happens if there are indeed only respondents with transitive preferences. I will discuss the special case of respondents with intransitive price preferences afterwards.
Note that in this example women still have a higher price acceptance than men. The main difference to the example above is the fact that the gender balance in the sample is a perfect representation of the target population (as the data is simulated, I model the target population after the sample - which is obviously impossible in real life). It is important to keep this distinction in mind: Weighting can correct for differences of the likelihood to be included in the sample, but it cannot be used to understand if different strata/groups have a different price acceptance. In this example below, the price acceptance differs quite a lot across genders, but we would need sub-group analyses to understand this.
# Creating dataset with price acceptance and unbiased gender variable
input_data_2 <- data.frame(tch = round(rnorm(n = 250, mean = 4, sd = 0.5), digits = 2),
ch = round(rnorm(n = 250, mean = 8, sd = 0.5), digits = 2),
ex = round(rnorm(n = 250, mean = 12, sd = 0.5), digits = 2),
tex = round(rnorm(n = 250, mean = 16, sd = 0.5), digits = 2),
gender = sample(x = c("male", "female"),
size = 250,
replace = TRUE,
prob = c(0.5, 0.5)))
# for women: increasing the price acceptance by +50%
input_data_2$tch[input_data_2$gender == "female"] <- input_data_2$tch[input_data_2$gender == "female"] * 1.5
input_data_2$ch[input_data_2$gender == "female"] <- input_data_2$ch[input_data_2$gender == "female"] * 1.5
input_data_2$ex[input_data_2$gender == "female"] <- input_data_2$ex[input_data_2$gender == "female"] * 1.5
input_data_2$tex[input_data_2$gender == "female"] <- input_data_2$tex[input_data_2$gender == "female"] * 1.5
# now let's create a sample design object (using the survey package)
# ... assuming that gender is balanced equally in the population of 10000
# for survey design object: occurrence of each gender in the target population
# would usually be information from sampling frame, differs here only for demonstration purposes
# here: scaling up based on actual sample information (hypothetical population of 250 * 4 = 10k)
input_data_2$gender_pop <- NA
input_data_2$gender_pop[input_data_2$gender == "female"] <- sum(input_data_2$gender == "female") * 40
input_data_2$gender_pop[input_data_2$gender == "male"] <- sum(input_data_2$gender == "male") * 40
# creating the survey design object for post-stratification based on gender
input_design_2 <- survey::svydesign(ids = ~ 1, # no clusters
probs = NULL, # hence no cluster sampling probabilities,
strata = input_data_2$gender, # stratified by gender
fpc = input_data_2$gender_pop, # strata size in the population
data = input_data_2) # data object used as input
# quick check: there is only one weight for all our strata
# if we would have different weights per gender, we would see two unique values here
unique(weights(input_design_2, type = "analysis"))
#> [1] 40
# running both weighted and unweighted analysis on the same data
check_weighted_1 <- psm_analysis_weighted(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
design = input_design_2)
check_unweighted_1 <- psm_analysis(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
data = input_data_2)
# results should be identical
summary(check_weighted_1)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 7.17 - 16.82
#> Indifference Price Point: 12.06
#> Optimal Price Point: 8
#>
#> ---
#> 250 cases with individual price preferences were analyzed (weighted data).
summary(check_unweighted_1)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 7.17 - 16.82
#> Indifference Price Point: 12.06
#> Optimal Price Point: 7.995
#>
#> ---
#> 250 cases with individual price preferences were analyzed (unweighted data).
As this example has shown, the results are identical when the weights are identical (meaning that our sample is a perfect representation of the target population) and the information on price acceptance is transitive for all respondents. Let’s now consider an example where we have intransitive price information for some respondents. For a few selected men in the sample, I manipulate the values so that the “cheap” price is larger than the “expensive” price.
input_data_3 <- input_data_2
manipulated_men <- sample(which(input_data_2$gender == "male"), 10)
input_data_3$ch[manipulated_men] <- input_data_3$tex[manipulated_men]
The survey design object is now re-created and used as an input for the weighted Price Sensitivity Meter analysis. When we compare the results again to an unweighted analysis, we see a slight difference. This is the case because the weighting occurs after removing cases with intransitive preferences (at the same time when the weighted empirical cumulative density functions are calculated). At this stage in our example, we still have all women from our original sample, but have removed a few men with intransitive preferences. Therefore, all remaining men have a slightly higher weight in our sample. This affects the results of our analysis.
# creating the survey design object for post-stratification based on gender
input_design_3 <- survey::svydesign(ids = ~ 1, # no clusters
probs = NULL, # hence no cluster sampling probabilities,
strata = input_data_3$gender, # stratified by gender
fpc = input_data_3$gender_pop, # strata size in the population
data = input_data_3) # data object used as input
check_weighted_2 <- psm_analysis_weighted(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
design = input_design_3)
check_unweighted_2 <- psm_analysis(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
data = input_data_3)
# results should be different now
summary(check_weighted_2)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 7.22 - 17.09
#> Indifference Price Point: 12.08
#> Optimal Price Point: 8
#>
#> ---
#> 240 cases with individual price preferences were analyzed (weighted data).
#> Total data set consists of 250 cases. Analysis was limited to cases with transitive price preferences.
#> (Removed: n = 10 / 4% of data)
summary(check_unweighted_2)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 7.22 - 17.09
#> Indifference Price Point: 12.075
#> Optimal Price Point: 7.995
#>
#> ---
#> 240 cases with individual price preferences were analyzed (unweighted data).
#> Total data set consists of 250 cases. Analysis was limited to cases with transitive price preferences.
#> (Removed: n = 10 / 4% of data)
The example above has been carefully chosen to show this effect. In real life examples, it only occurs when two conditions are fulfilled at the same time:
If any of those conditions is not fulfilled, any difference in the price estimates due to removing cases with intransitive preferences will be very small and unsystematic.
It is also possible to run a weighted Price Sensitivity Meter
analysis if there is no “too cheap” price (like for certain services
like a drawing account, access to a sports match, …). The
psm_analysis_weighted()
function handles this in the same
way as the psm_analysis()
function: The “toocheap” variable
should be set to NA for all cases. Having no “too cheap” price means
that the lower bound of the price range and the optimal price point
cannot be estimated, whereas the upper end of the price range and the
indifference price point are estimated.
# setting up data with NAs in "too cheap" variable
input_data_2 <- input_data
input_data_2$tch <- NA
# create new sample design
input_design_2 <- survey::svydesign(ids = ~ 1, # no clusters
probs = NULL, # hence no cluster samling probabilities,
strata = input_data_2$gender, # stratified by gender
fpc = input_data_2$gender_pop, # strata size in the population
data = input_data_2) # data object used as input
test_2 <- psm_analysis_weighted(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
design = input_design_2)
summary(test_2)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: NA - 15.63
#> Indifference Price Point: 13.69
#> Optimal Price Point: NA
#>
#> ---
#> 228 cases with individual price preferences were analyzed (weighted data).
#> Total data set consists of 250 cases. Analysis was limited to cases with transitive price preferences.
#> (Removed: n = 22 / 9% of data)
The psm_analysis_weighted()
function also allows to
include an estimate for the price with optimal reach or optimal revenue
via the Newton Miller Smith extension. This option can be used in the
same way as in the unweighted psm_analysis()
function.
# setting up dataset with purchase intent information
input_data_3 <- input_data
input_data_3$pi_ch <- sample(x = c(1:5), size = nrow(input_data_3),
replace = TRUE, prob = c(0.1, 0.1, 0.2, 0.3, 0.3))
input_data_3$pi_ex <- sample(x = c(1:5), size = nrow(input_data_3),
replace = TRUE, prob = c(0.3, 0.3, 0.2, 0.1, 0.1))
# re-creating the survey design object
input_design_3 <- survey::svydesign(ids = ~ 1,
probs = NULL,
strata = input_data_3$gender,
fpc = input_data_3$gender_pop,
data = input_data_3)
# running the weighted Price Sensitivity Meter analysis
test_3 <- psm_analysis_weighted(toocheap = "tch",
cheap = "ch",
expensive = "ex",
tooexpensive = "tex",
design = input_design_3,
pi_cheap = "pi_ch",
pi_expensive = "pi_ex",
pi_scale = 5:1,
pi_calibrated = c(0.7, 0.5, 0.3, 0.1, 0))
summary(test_3)
#> Van Westendorp Price Sensitivity Meter Analysis
#>
#> Accepted Price Range: 11.91 - 15.63
#> Indifference Price Point: 13.69
#> Optimal Price Point: 13.63
#>
#> Newton Miller Smith Extension
#> Price with Optimal Reach: 12.16
#> Price with Optimal Revenue: 17.7
#>
#> ---
#> 228 cases with individual price preferences were analyzed (weighted data).
#> Total data set consists of 250 cases. Analysis was limited to cases with transitive price preferences.
#> (Removed: n = 22 / 9% of data)
Lumley, T (2010) Complex Surveys: A Guide to Analysis Using R↩︎