Multinomial modeling is a formal approach to measuring cognitive processes, such as the capacity to store, organize, and retrieve items in memory, or to make inferences and logical deductions, or to discriminate and categorize similar stimuli. While such processes are not directly observable, theoretically they can be assumed to interact in certain ways to determine observable behaviors. The goal of multinomial modeling is to identify which underlying factors are important in a cognitive task, explain how those processes combine to create observable behavior, and then use experimental data to estimate the relative contributions of the different cognitive factors. In this way, multinomial models can be used as measurement tools to examine unobservable cognitive processes.
Multinomial models are developed for categorical data, where each participant’s response falls into one and only one of a finite set of observable data categories. Assume there are J such categories and N experimental response observations, where n_{j} observations fall into category C_{j}, j=1,2, …, J. Then if the observations are independent and identically distributed with probability p_{j} of falling into category C_{j}, the category count vector, D = (n_{1}, n_{2}, …, n_{J}), follows the multinomial distribution given by

\Pr(D) = N! \prod_{j=1}^{J} \frac{p_{j}^{n_{j}}}{n_{j}!},

where the category probabilities are nonnegative and sum to one. The count data for a multinomial model usually come from a cognitive experiment where each participant in an experimental group produces a categorical response to a set of items; for example, pictures are ‘recognized’ or ‘not recognized’, or letter strings are judged to be ‘words’ or ‘nonwords’. Most data sets for multinomial modeling involve more than two response categories, and in addition there may be more than one type of item, each with its own system of response categories. For example, in a source monitoring experiment, participants study a list of items from two sources, Source A or Source B (e.g., presented in a male vs. female voice, or presented visually vs. auditorily). Later, participants are given a recognition memory test consisting of three types of items, namely the two types of old list items and new distracter items, and they must classify each tested item as Source A, Source B, or New. The resulting multinomial data structure consists of three category systems, each with three response categories. If the responses in different category systems are independent and category counts within a system follow a multinomial distribution, the probability of the data structure is given by the product of three multinomial distributions, one for each category system.
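As an illustration of the probability of a multinomial data structure, the following Python sketch evaluates the log-probability of one category count vector and of several independent category systems. The function names and the source monitoring counts and probabilities are hypothetical, chosen only for the example; they are not data from any study.

```python
import math

def multinomial_log_prob(counts, probs):
    """Log-probability of a count vector (n_1, ..., n_J) given
    category probabilities (p_1, ..., p_J) under a multinomial distribution."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be nonnegative and sum to one")
    n_total = sum(counts)
    # log N! computed as lgamma(N + 1)
    log_p = math.lgamma(n_total + 1)
    for n_j, p_j in zip(counts, probs):
        log_p -= math.lgamma(n_j + 1)      # subtract log n_j!
        if n_j > 0:
            if p_j == 0:
                return float("-inf")       # impossible outcome observed
            log_p += n_j * math.log(p_j)   # add n_j * log p_j
    return log_p

def joint_log_prob(systems):
    """Log-probability of several independent category systems:
    the product of multinomials becomes a sum of log-probabilities."""
    return sum(multinomial_log_prob(c, p) for c, p in systems)

# Hypothetical source monitoring data: one (counts, probs) pair per item type,
# with responses ordered as (Source A, Source B, New).
example_systems = [
    ([30, 6, 4], [0.70, 0.15, 0.15]),   # Source A items
    ([7, 28, 5], [0.20, 0.65, 0.15]),   # Source B items
    ([3, 4, 33], [0.10, 0.10, 0.80]),   # New distracter items
]
example_log_prob = joint_log_prob(example_systems)
```

Working on the log scale avoids overflow in the factorials when N is large; exponentiating the result recovers the probability itself.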
The key to creating a multinomial model is to take a multinomial data structure and express the category probabilities in terms of underlying, cognitively interpretable parameters. One needs to specify a cognitive processing architecture along with cognitively interpretable parameters and formal computational rules that can generate the count data described by the multinomial distribution. Once the model is constructed and data are collected, standard tools of statistical inference can be used to evaluate how well the model fits the data, to estimate the values of the cognitive parameters that are most likely to have generated the data, and to test hypotheses about the parameters. In this way unobservable cognitive processes can be measured indirectly with the use of the model. Multinomial models of various types have been used in cognitive psychology since the 1960s; however, in the 1980s a particular approach called multinomial processing tree (MPT) modeling was developed at a general level by the authors of this entry. The central characteristic of MPT models is that they have a particular type of cognitive architecture represented as a rooted tree structure. Such a structure assumes that cognitive processes follow one another, and subsequent processes are conditionally dependent on the success or failure of earlier processes. For example, if a model has parameters for item attention, item storage, and item retrieval, then successful storage depends on successful attention. In turn, successful retrieval depends on successful storage. If any of these processes fail, then responses may be governed by guessing biases corresponding to various states of incomplete information. Each series of processing possibilities leads to an observable response, and there are usually many of these processing patterns, each represented by a “branch” of the tree architecture.
One early example of an MPT model is the pair-clustering model developed by William Batchelder and David Riefer. Their model was designed to measure storage capacity separately from retrieval capacity in human memory. The data for the model involve a specially designed free recall task where a participant studies a list of words one at a time, and then at a later time memory is tested by having the participant produce as many of the studied words as they can in any order. The list consists of pairs of exemplars from several categories such as vehicles (taxi, car) or flowers (rose, daisy). Recall of each category pair is scored into four categories: C_{1}, both words of the pair are recalled successively; C_{2}, both words are recalled but not successively; C_{3}, exactly one word of the pair is recalled; and C_{4}, neither word is recalled.
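The model postulates three parameters, each designed to measure a different cognitive process: c, the probability that the two words of a pair are clustered and stored as a cluster; r, the conditional probability that a stored cluster is retrieved at recall; and u, the probability that a word from a pair that was not clustered is stored and recalled individually.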
The connection of the parameters to the category probabilities is based on a combination of psychological considerations and reasonable approximations. In particular, it is assumed that both members of a category pair are recalled successively if and only if the words in the pair were clustered and the cluster is retrieved (joint probability cr). Also, if a cluster was stored but not retrieved, then neither word is recalled. In contrast, with probability (1 - c) the words in a pair are not clustered, and in this case each word in the pair is recalled individually, and independently of the other, with probability u, subject to the condition that if both nonclustered words are recalled they are not recalled successively. These assumptions can be displayed as a processing tree: the root splits on whether the pair is clustered (c) or not (1 - c), the clustered branch then splits on whether the cluster is retrieved (r) or not (1 - r), and the nonclustered branch splits on whether each of the two words is recalled (u) or not (1 - u). From the tree it is easy to express the category probabilities, p_{j} = Pr(C_{j}) for j=1,2,3,4, in terms of the parameters. The result expresses each category probability as a sum of the probabilities of the branches that lead to that category as follows:

p_{1} = cr (both words recalled successively),
p_{2} = (1 - c)u^{2} (both words recalled, not successively),
p_{3} = 2(1 - c)u(1 - u) (exactly one word recalled),
p_{4} = c(1 - r) + (1 - c)(1 - u)^{2} (neither word recalled).
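The branch sums for the pair-clustering model can be sketched in Python as below. The sketch follows the assumptions stated in the text (clustered pairs recalled successively with probability cr, nonclustered words recalled individually with probability u) and additionally assumes that the two nonclustered words are recalled independently. Because the model has three parameters and the four category probabilities carry three degrees of freedom, the equations can also be inverted to obtain parameter estimates from observed counts. The function names and the example counts are hypothetical, and the inversion is only meaningful when the resulting values fall inside the unit interval.

```python
def category_probs(c, r, u):
    """Category probabilities of the pair-clustering model.
    Order: both recalled successively, both recalled non-successively,
    exactly one recalled, neither recalled."""
    p1 = c * r                                   # clustered and retrieved
    p2 = (1 - c) * u ** 2                        # not clustered, both recalled
    p3 = 2 * (1 - c) * u * (1 - u)               # not clustered, exactly one recalled
    p4 = c * (1 - r) + (1 - c) * (1 - u) ** 2    # cluster lost, or neither recalled
    return (p1, p2, p3, p4)

def estimate_parameters(counts):
    """Solve the category equations for (c, r, u), using observed
    proportions n_j / N in place of the category probabilities."""
    n1, n2, n3, n4 = counts
    total = n1 + n2 + n3 + n4
    if total == 0 or n2 == 0:
        raise ValueError("inversion needs at least one non-successive pair (n2 > 0)")
    # p2 / p3 = u / (2 * (1 - u))  =>  u = 2 * p2 / (2 * p2 + p3)
    u = 2 * n2 / (2 * n2 + n3)
    # p2 = (1 - c) * u^2  =>  c = 1 - p2 / u^2
    c = 1 - (n2 / total) / u ** 2
    if not 0 < c <= 1:
        raise ValueError("solution for c is outside (0, 1]; constrained estimation is needed")
    # p1 = c * r  =>  r = p1 / c
    r = (n1 / total) / c
    if not 0 <= r <= 1:
        raise ValueError("solution for r is outside [0, 1]; constrained estimation is needed")
    return (c, r, u)

# Hypothetical counts for 1000 scored category pairs.
example_counts = (300, 64, 192, 444)
c_hat, r_hat, u_hat = estimate_parameters(example_counts)
# Recovers approximately c = 0.6, r = 0.5, u = 0.4 for these counts.
```

When the observed proportions yield a solution outside the parameter space, or when a model has fewer parameters than degrees of freedom, the estimates have to be found by numerical maximization of the multinomial likelihood instead.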
The example of the pair-clustering model illustrates the basic properties of multinomial modeling, which are the tree architecture and the use of observable categorical data to measure underlying cognitive processes. However, the example does not illustrate three aspects typical of most applications of MPT models. First, in the example there were three parameters representing cognitive processes and only three degrees of freedom in the data structure (since the four category probabilities are required to sum to one). In cases where there are more degrees of freedom in the categorical data than parameters, the system of equations expressing category probabilities in terms of parameters is overdetermined, and standard techniques in mathematical statistics are used to estimate the parameters from the data. Second, the pair-clustering model involves just one system of categories, but many MPT models are developed for several category systems, each of which is associated with its own processing tree. For example, MPT models for the source monitoring experiment discussed earlier specify three processing trees, one for each item type (A, B, or New). Finally, unlike the example of the pair-clustering model, most applications of MPT models involve two or more experimental groups of participants, where the same model with possibly different parameter values is assumed to govern each group’s category count data. In this case MPT models are used to conduct hypothesis tests in an effort to discover which cognitive processes account for differences between the groups. The usual approach in experimental psychology for analyzing data from multiple experimental groups is to apply standard statistical tools like analysis of variance or linear regression. While these standard tests are well developed to detect group differences and associate them with experimental manipulations, they do not allow one to pinpoint the cognitive bases for the differences.
Thus MPT modeling can be a valuable supplement to traditional statistical analysis because it explains group differences in terms of differences in cognitive processing parameters. As can be seen, MPT models are simple statistical models that are easy to develop and test. But before an MPT model can be used as a measurement tool it must be validated. A validated model is one where the parameters can be shown to be valid representations of the cognitive processes they stand for. Establishing validity involves conducting simple cognitive studies where experimental manipulations are designed to affect some parameters and not others. These experiments attempt to dissociate the parameters by showing that they can be independently manipulated in ways that are consistent with established psychological theory. For example, if a multinomial model has memory storage and memory retrieval parameters like the pair-clustering model, then providing retrieval cues during recall should increase the value of the retrieval parameter r but not the value of the storage parameter c. Other manipulations such as increased study time should affect the cluster storage parameter but probably not the retrieval parameter. Since the 1990s, MPT modeling has become an increasingly popular approach to cognitive modeling, and its use has been facilitated by several software packages that can perform parameter estimation and hypothesis testing. To date there have been over a hundred examples of the application of MPT modeling. Most of these applications have been in the standard cognitive areas of memory, reasoning, and perception; however, clinical, social, and developmental psychology are also areas where MPT modeling is active. There are also a number of ongoing projects that explore the statistical properties of these models.
For example, there has been recent work creating hierarchical MPT models to handle variation in parameter values due to individual differences in the participants, as well as latent class MPT models that can be used to model subgroups of participants with different cognitive abilities.
