Linear discriminant function analysis (i.e., discriminant analysis) performs a multivariate test of differences between groups. In addition, discriminant analysis is used to determine the minimum number of dimensions needed to describe these differences. A distinction is sometimes made between descriptive discriminant analysis and predictive discriminant analysis. We will be illustrating predictive discriminant analysis on this page. Please note: The purpose of this page is to show how to use various data analysis commands.
The variables include three continuous, numeric variables (outdoor, social and conservative) and one categorical variable (job) with three levels: 1) customer service, 2) mechanic and 3) dispatcher. We are interested in the relationship between the three continuous variables and our categorical variable.
Specifically, we would like to know how many dimensions we would need to express this relationship. Using this relationship, we can predict a classification based on the continuous variables or assess how well the continuous variables separate the categories in the classification.
We will be discussing the degree to which the continuous variables can be used to discriminate between the groups. Some options for visualizing what occurs in discriminant analysis can be found in the Discriminant Analysis Data Analysis Example. To start, we can examine the overall means of the continuous variables.
We are interested in how job relates to outdoor, social and conservative. From this output, we can see that some of the means of outdoor, social and conservative differ noticeably from group to group in job. These differences will hopefully allow us to use these predictors to distinguish observations in one job group from observations in another job group.
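The group-means comparison described above can be sketched in code. This is a minimal illustration with made-up numbers, not the page's actual dataset; the variable names mirror the page's predictors:

```python
from statistics import fmean

# Hypothetical mini-dataset (NOT the page's data):
# each tuple is (job, outdoor, social, conservative) for one case.
cases = [
    ("customer service", 10, 22, 5),
    ("customer service", 12, 20, 6),
    ("mechanic",         18, 15, 10),
    ("mechanic",         20, 14, 11),
    ("dispatch",         15, 10, 15),
    ("dispatch",         14, 12, 14),
]

predictors = ["outdoor", "social", "conservative"]

# Mean of each predictor within each job group.
for group in sorted({c[0] for c in cases}):
    rows = [c for c in cases if c[0] == group]
    means = [fmean(r[i + 1] for r in rows) for i in range(3)]
    print(group, dict(zip(predictors, means)))
```

Group-to-group differences in these means are what give the predictors their discriminating power.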
Next, we can look at the correlations between these three predictors. These correlations will give us some indication of how much unique information each predictor will contribute to the analysis. Uncorrelated variables are likely preferable in this respect. We will also look at the frequency of each job group. The discriminant command in SPSS performs canonical linear discriminant analysis which is the classical form of discriminant analysis. In this example, we specify in the groups subcommand that we are interested in the variable job, and we list in parenthesis the minimum and maximum values seen in job.
We next list the discriminating variables, or predictors, in the variables subcommand. In this example, we have selected three predictors: outdoor, social and conservative. We will be interested in comparing the actual groupings in job to the predicted groupings generated by the discriminant analysis. For this, we use the statistics subcommand. This will provide us with classification statistics in our output.
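The correlation check described above can be sketched as follows. The data are placeholders, not the page's actual values; the point is only that pairwise Pearson correlations among the predictors indicate how much unique information each contributes:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical predictor columns (NOT the page's data).
outdoor      = [10, 12, 18, 20, 15, 14]
social       = [22, 20, 15, 14, 10, 12]
conservative = [5, 6, 10, 11, 15, 14]

print(pearson(outdoor, social))        # negative in this toy data
print(pearson(outdoor, conservative))
```

Correlations near zero suggest each predictor carries mostly unique information; correlations near one suggest redundancy.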
Data Summary a. Analysis Case Processing Summary — This table summarizes the analysis dataset in terms of valid and excluded cases.
In this example, all of the observations in the dataset are valid. Group Statistics — This table presents the distribution of observations into the three groups within job. We can see the number of observations falling into each of the three groups. In this example, we are using the default weight of 1 for each observation in the dataset, so the weighted number of observations in each group is equal to the unweighted number of observations in each group.
Eigenvalues and Multivariate Tests c. Function — This indicates the first or second canonical linear discriminant function. The number of functions is equal to the number of discriminating variables if there are more groups than variables, or to one less than the number of levels in the group variable otherwise; that is, it is the smaller of the two.
In this example, job has three levels and three discriminating variables were used, so two functions are calculated. Each function acts as projections of the data onto a dimension that best separates or discriminates between the groups. Eigenvalue — These are the eigenvalues of the matrix product of the inverse of the within-group sums-of-squares and cross-product matrix and the between-groups sums-of-squares and cross-product matrix. These eigenvalues are related to the canonical correlations and describe how much discriminating ability a function possesses.
See superscript e for the underlying calculations. Each eigenvalue's share of the sum of all the eigenvalues gives the proportion of discriminating ability attributable to that function. For any analysis, the proportions of discriminating ability will sum to one, so the last entry in the cumulative column will also be one. Canonical Correlation — These are the canonical correlations of our predictor variables (outdoor, social and conservative) and the groupings in job.
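The function count and the eigenvalue proportions can be sketched as follows; the eigenvalues below are placeholders, not the page's actual output values:

```python
# Number of discriminant functions: smaller of the number of
# discriminating variables and (number of groups - 1).
n_groups, n_predictors = 3, 3
n_functions = min(n_predictors, n_groups - 1)   # 2 in this example

# Hypothetical eigenvalues for the two functions (placeholders).
eigenvalues = [1.08, 0.23]

total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Cumulative proportions; the final entry is always 1.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print(n_functions)
print(proportions)   # each function's share of discriminating ability
print(cumulative)    # last entry is 1.0
```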
If we consider our discriminating variables to be one set of variables and the set of dummies generated from our grouping variable to be another set of variables, we can perform a canonical correlation analysis on these two sets. From this analysis, we would arrive at these canonical correlations. Test of Function s — These are the functions included in a given test with the null hypothesis that the canonical correlations associated with the functions are all equal to zero.
In this example, we have two functions. Wilks' Lambda — For a given test, Wilks' Lambda is the product of the values of (1 − canonical correlation²) over the functions included in that test. Chi-square — This is the Chi-square statistic testing that the canonical correlation of the given function is equal to zero. In other words, the null hypothesis is that the function, and all functions that follow, have no discriminating ability.
This hypothesis is tested using this Chi-square statistic. The degrees of freedom are based on the number of groups present in the categorical variable and the number of continuous discriminant variables. The Chi-square statistic is compared to a Chi-square distribution with the degrees of freedom stated here. For a given alpha level, such as 0.05, we reject the null hypothesis if the p-value is less than alpha. If not, then we fail to reject the null hypothesis.
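The sequence of tests can be sketched with Bartlett's Chi-square approximation, a standard form of this test: Λ is the product of (1 − r²) over the functions being tested, χ² = −(N − 1 − (p + g)/2)·ln Λ, with (p − k + 1)(g − k) degrees of freedom for the test starting at function k. The canonical correlations below are placeholders; N = 244 matches the group totals reported later on this page (85 + 93 + 66):

```python
from math import log

# Placeholder canonical correlations (NOT the page's actual values).
canonical_corrs = [0.72, 0.49]
N, p, g = 244, 3, 3   # cases, predictors, groups

for k in range(1, len(canonical_corrs) + 1):
    # Wilks' Lambda for functions k..last: product of (1 - r^2).
    lam = 1.0
    for r in canonical_corrs[k - 1:]:
        lam *= 1 - r ** 2
    chi2 = -(N - 1 - (p + g) / 2) * log(lam)
    df = (p - k + 1) * (g - k)
    print(f"test of functions {k}..: Lambda={lam:.4f}, "
          f"chi2={chi2:.2f}, df={df}")
```

With p = g = 3 this yields df = 6 for the test of both functions and df = 2 for the test of the second function alone.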
Discriminant Function Output m. Standardized Canonical Discriminant Function Coefficients — These coefficients can be used to calculate the discriminant score for a given case. The score is calculated in the same manner as a predicted value from a linear regression, using the standardized coefficients and the standardized variables.
For example, let zoutdoor, zsocial and zconservative be the variables created by standardizing our discriminating variables. The discriminant score for a given function is then the sum of these standardized variables, each multiplied by its standardized coefficient. The magnitudes of these coefficients indicate how strongly the discriminating variables affect the score.
For example, we can see that the standardized coefficient for zsocial in the first function is greater in magnitude than the coefficients for the other two variables. Thus, social will have the greatest impact of the three on the first discriminant score.
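The score calculation can be sketched as follows. The coefficients and the case values are placeholders, not the page's actual output; the zsocial coefficient is deliberately the largest in magnitude to mirror the point above:

```python
# Hypothetical standardized coefficients for function 1 (placeholders).
coef = {"zoutdoor": 0.38, "zsocial": -0.83, "zconservative": 0.52}

# Standardized predictor values for one case (placeholders).
case = {"zoutdoor": 1.2, "zsocial": -0.5, "zconservative": 0.3}

# Discriminant score: same form as a regression prediction; no
# intercept is needed because variables and coefficients are standardized.
score1 = sum(coef[v] * case[v] for v in coef)
print(round(score1, 3))
```

A one-standard-deviation change in zsocial moves the score more than the same change in either other predictor, which is what "greatest impact" means here.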
Structure Matrix — This is the canonical structure, also known as canonical loadings or discriminant loadings, of the discriminant functions. It represents the correlations between the observed variables (the three continuous discriminating variables) and the dimensions created with the unobserved discriminant functions. Functions at Group Centroids — These are the means of the discriminant function scores by group for each function calculated. If we calculated the scores of the first function for each case in our dataset, and then looked at the means of the scores by group, we would obtain the group means listed in this table.
Predicted Classifications p.
Classification Processing Summary — The reasons why an observation may not have been processed are listed here. We can see that in this example, all of the observations in the dataset were successfully classified. Prior Probabilities for Groups — This is the distribution of observations into the job groups used as a starting point in the analysis.
The default prior distribution is an equal allocation into the groups, as seen in this example. SPSS allows users to specify different priors with the priors subcommand.
Predicted Group Membership — These are the predicted frequencies of groups from the analysis. The numbers going down each column indicate how many were correctly and incorrectly classified. For example, of the 89 cases that were predicted to be in the customer service group, 70 were correctly predicted, and 19 were incorrectly predicted (16 cases were in the mechanic group and three cases were in the dispatch group). Original — These are the frequencies of groups found in the data.
We can see from the row totals that 85 cases fall into the customer service group, 93 fall into the mechanic group, and 66 fall into the dispatch group. These match the results we saw earlier in the output for the frequencies command. Across each row, we see how many of the cases in the group are classified by our analysis into each of the different groups. For example, of the 85 cases that are in the customer service group, 70 were predicted correctly and 15 were predicted incorrectly (11 were predicted to be in the mechanic group and four were predicted to be in the dispatch group).
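The row-wise reading of the classification table can be sketched in code. The customer service row below uses the counts quoted above (70, 11, 4); the mechanic and dispatch rows are placeholders chosen only to be consistent with the stated row totals (93 and 66) and the 89 cases predicted as customer service:

```python
groups = ["customer service", "mechanic", "dispatch"]

# Rows = original group, columns = predicted group (same order as groups).
counts = {
    "customer service": [70, 11, 4],    # from the text above
    "mechanic":         [16, 62, 15],   # hypothetical
    "dispatch":         [3, 12, 51],    # hypothetical
}

# Percent correctly classified within each original group.
for g in groups:
    row = counts[g]
    total = sum(row)
    correct = row[groups.index(g)]
    print(f"{g}: n={total}, correct={correct} ({100 * correct / total:.1f}%)")
```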
Count — This portion of the table presents the number of observations falling into the given intersection of original and predicted group membership.
For example, we can see in this portion of the table the number of observations originally in the customer service group but predicted to fall into the mechanic group. The row totals of these counts are presented, but column totals are not. % — This portion of the table presents these counts as row percentages. For example, we can see the percent of observations in the mechanic group that were predicted to be in the dispatch group. This is NOT the same as the percent of observations predicted to be in the dispatch group that were in the mechanic group.
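The row-percent versus column-percent distinction can be made concrete with a small slice of such a table. The counts below are placeholders, not the page's values:

```python
# Hypothetical 2x2 slice of a classification table:
# rows = original group, columns = [predicted mechanic, predicted dispatch].
table = {
    "mechanic": [62, 15],
    "dispatch": [12, 51],
}

# Row percent: of observations ORIGINALLY in 'mechanic',
# what share was PREDICTED to be 'dispatch'?
row_pct = table["mechanic"][1] / sum(table["mechanic"])

# Column percent: of observations PREDICTED to be 'dispatch',
# what share was ORIGINALLY in 'mechanic'?
col_total = table["mechanic"][1] + table["dispatch"][1]
col_pct = table["mechanic"][1] / col_total

print(round(100 * row_pct, 1), round(100 * col_pct, 1))  # generally differ
```

The two percentages share a numerator but divide by different totals (the row total versus the column total), which is why the table reports only one of them.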
The latter is not presented in this table.