---
title: "Unfolding in a Nutshell"
author: "Patrick Mair"
output: rmarkdown::html_vignette
bibliography: smacof.bib
vignette: >
  %\VignetteIndexEntry{Unfolding in a Nutshell}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

This vignette gives a very basic introduction to multidimensional unfolding for preference data using smacof. Technical details and more advanced examples can be found in the main package vignette (`vignette("smacof")`), in @Borg+Groenen+Mair:2012, and in @Mair:2018b.

## Data Structure

Unfolding can be seen as a variant of MDS for preference data. The input data structure is rectangular, as opposed to MDS, where the input is a symmetric dissimilarity matrix. Two types of preference data can be subjected to unfolding: rankings and ratings. Let us illustrate the ranking structure using the breakfast dataset:

```{r breakfast, message = FALSE}
library(smacof)
head(breakfast)
```

``r nrow(breakfast)`` individuals ranked ``r ncol(breakfast)`` breakfast items according to their preference. Note that these rankings are a special case of dissimilarities: the smaller the ranking value, the "more similar" a breakfast item and an individual are.

The second data type compatible with unfolding is ratings. Typical examples of rating data are item responses in a questionnaire (e.g., on a 1-5 rating scale). There is one issue users have to be careful about: if rating scales are scored and labelled in a "positive" direction (e.g., 1 as "fully disagree" and 5 as "fully agree"), the scale needs to be reversed so that the input data are dissimilarities, as required by the `unfolding()` function.

Here we show an example involving 10 items related to Internet privacy, scored on a scale from 1-100. Note that a score of 100 implies the highest preference. Therefore, the data have to be reversed prior to the unfolding fit.

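
The reversal rule can be sketched on a small made-up example before applying it to real data. The matrix `ratings`, its 1-5 scale, and the names `scale_min` and `scale_max` are hypothetical and serve only as an illustration: subtracting each score from the sum of the scale bounds flips the direction while keeping the range.

```{r revsketch}
# Hypothetical 1-5 agreement ratings: 3 persons (rows) by 4 items (columns)
ratings <- matrix(c(5, 4, 1, 2,
                    3, 5, 2, 1,
                    1, 2, 5, 4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("p", 1:3), paste0("item", 1:4)))
scale_min <- 1
scale_max <- 5

# Reverse the scale so that small values indicate high preference,
# i.e., the values can be read as dissimilarities
ratings_rev <- (scale_min + scale_max) - ratings
ratings_rev
```

With bounds 1 and 100, the same rule gives `101 - Privacy` for the privacy items.
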
```{r privacy, message = FALSE}
library(MPsychoR)
data(Privacy)
Privacy_rev <- 101 - Privacy
head(Privacy_rev)
```

We have a total sample size of ``r nrow(Privacy_rev)`` individuals.

## Unfolding Fit

Unfolding is a dual scaling method, as we aim to scale the rows ("ideal points") and the columns ("object points") of the input data jointly. As in MDS, we try to keep the number of dimensions low in order to be able to plot the unfolding configuration. Let us start with the breakfast ranking data. A basic 2D unfolding solution can be fitted as follows:

```{r breakunf}
un_breakfast <- unfolding(breakfast)
un_breakfast
```

We obtain a stress-1 value of ``r round(un_breakfast$stress, 3)``. As in MDS, users should not judge the goodness-of-fit solely by this stress value but rather use several diagnostic tools in combination [see @Mair+Borg+Rusch:2016 for details].

Note that we fitted the ratio version of unfolding, which implies that the input dissimilarities remain untransformed. The `unfolding()` function provides the same transformation options as `mds()` for improving the goodness-of-fit. In addition, the function can be forced to fit an individual transformation function for each row, which leads to row-conditional unfolding (`conditionality = "row"`). In this example we stick to the basic ratio solution and produce the configuration plot.

```{r breakconf, fig.width = 6, fig.height = 6}
plot(un_breakfast, main = "Configuration Breakfast Data")
```

This plot is highly intuitive to interpret, as the distances between any pair of points are Euclidean: breakfast items close to each other are similarly preferred; individuals close to each other have similar breakfast preferences; the closer a breakfast item is to an individual, the higher the individual's preference for this item. Of course, the degree to which this interpretation reflects the actual preferences in the data depends on the goodness-of-fit of the solution.
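
The distance-based reading of the plot can also be checked numerically. The following is a minimal sketch rather than a formal diagnostic: it takes the ideal point of the first individual from `conf.row`, computes the Euclidean distances to all object points in `conf.col`, and places them next to that individual's observed ranks. The names `ideal1`, `items`, `d1`, and `cmp` are introduced here for illustration only.

```{r breakdist}
library(smacof)
# Same fit as above, repeated so that the sketch runs on its own
un_breakfast <- unfolding(breakfast)

ideal1 <- un_breakfast$conf.row[1, ]   # ideal point of the first individual
items <- un_breakfast$conf.col         # object points, one row per breakfast item

# Euclidean distance from the ideal point to each object point
d1 <- sqrt(rowSums(sweep(items, 2, ideal1)^2))

# Observed ranks next to fitted distances, sorted by rank
cmp <- cbind(rank = unlist(breakfast[1, ]), distance = round(d1, 2))
cmp[order(cmp[, "rank"]), ]
```

If the solution fits this individual well, items with small ranks should tend to have small distances.
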
Sometimes it is also possible to interpret the dimensions but, just as in MDS, this is not crucial. The unfolding fit for the privacy rating data can be achieved in an analogous manner.

```{r privunf}
un_privacy <- unfolding(Privacy_rev)
un_privacy
```

The resulting configuration plot looks as follows (we suppress the row labels):

```{r privconf, fig.width = 6, fig.height = 6}
plot(un_privacy, main = "Unfolding Configuration Internet Privacy",
     label.conf.rows = list(label = FALSE))
```

The plot nicely shows three clusters of items: items related to individualization (apc1-3), items related to providing correct data (apc4-6), and items related to disadvantages of personal communication (dpc1-4). The first dimension distinguishes between advantages (apc items) and disadvantages (dpc items) of personal communication.

The solution can also be plotted using `ggplot2`:

```{r ggconfu, message = FALSE, fig.width = 6, fig.height = 6}
library(ggplot2)
conf_items <- as.data.frame(un_privacy$conf.col)
conf_persons <- as.data.frame(un_privacy$conf.row)
p <- ggplot(conf_persons, aes(x = D1, y = D2))
p + geom_point(size = 0.5, colour = "gray") +
  coord_fixed() +
  geom_point(aes(x = D1, y = D2), data = conf_items, colour = "cadetblue") +
  geom_text(aes(x = D1, y = D2, label = rownames(conf_items)),
            data = conf_items, colour = "cadetblue", vjust = -0.8) +
  ggtitle("Unfolding Configuration Internet Privacy")
```

It is important to fix the aspect ratio to 1 so that the distances in the plot are Euclidean. To avoid overlapping labels, the `ggrepel` package provides some good options.
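
As one possible way to use `ggrepel` here, the sketch below swaps `geom_text()` for `geom_text_repel()`, which moves labels away from each other and from the points. It assumes that `ggrepel` is installed (it is not used elsewhere in this vignette) and refits the privacy solution so that it runs on its own; the `item` column and the object name `p_repel` are introduced for illustration only.

```{r ggrepelsketch, message = FALSE, fig.width = 6, fig.height = 6}
library(smacof)
library(MPsychoR)
library(ggplot2)
library(ggrepel)   # assumption: installed; not loaded elsewhere in this vignette

# Same reversed data and fit as above, repeated so that the sketch is self-contained
data(Privacy)
un_privacy <- unfolding(101 - Privacy)
conf_items <- as.data.frame(un_privacy$conf.col)
conf_persons <- as.data.frame(un_privacy$conf.row)
conf_items$item <- rownames(conf_items)   # item labels as an explicit column

p_repel <- ggplot(conf_persons, aes(x = D1, y = D2)) +
  geom_point(size = 0.5, colour = "gray") +
  coord_fixed() +                          # aspect ratio 1 keeps distances Euclidean
  geom_point(data = conf_items, colour = "cadetblue") +
  geom_text_repel(aes(label = item), data = conf_items, colour = "cadetblue") +
  ggtitle("Unfolding Configuration Internet Privacy")
p_repel
```

Because the label positions are adjusted by `ggrepel`, only the points themselves, not the label positions, should be used when reading distances off the plot.
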