FACTOR ANALYSIS OF DISCRIMINATION CHARACTERISTICS (word length: 3431)

Snapshot summary: The goal of our project is to perform an exploratory factor analysis to determine which factors influence discrimination and, at the same time, to find the most common grounds on which it occurs. Discrimination is present throughout everyday life, and there are many characteristics on the basis of which a person can be discriminated against. That is why we decided to group these variables into a smaller number of broader factors that describe the problem in a simpler, more straightforward way without sacrificing the quality of the information.
The results of the analysis were satisfactory: we obtained two factors, which we named “Discrimination based on regional characteristics” and “Discrimination based on visible characteristics”. We believe that these factors are a credible representation of the variables used in the research and that they summarize them well.

Introduction: Discrimination is the unjust or prejudicial treatment of different categories of people, especially on the grounds of race, age, or sex. Unfortunately, it still exists today and manifests itself in all spheres of life, from football stadiums to workplaces.
Since this is a topic of great importance to society, we decided to make it the subject of our research. Given that there are many individual characteristics on which people can be discriminated against, the goal of our project is to reduce the available discrimination variables to a few factors that are easier to interpret. The method we use is called factor analysis. It is well suited to this task because it is, in general, a data reduction tool: it uses the correlations between variables to group them into a smaller number of factors, which is just what we need.
The software we used is the one demonstrated in class, SPSS (Statistical Package for the Social Sciences). To confirm the validity of the data and of our research, several tests were used, including the Kaiser-Meyer-Olkin Measure of Sampling Adequacy, Bartlett’s Test of Sphericity and the Determinant of the Correlation Matrix, among others. Two types of analysis are used in this project: factor analysis and regression analysis. The factors obtained from the factor analysis are then used as independent variables in the regression. In short, our goal is to construct a model that explains discrimination in general.
Data description: To complete this project we use secondary data taken from the Fourth Round of the European Social Survey (ESS), conducted in 2008. Secondary data are data that were previously collected by another organization for its own research purposes. Since the ESS data were collected without our specific research topic in mind, some problems with the secondary data could arise. We did not collect the information ourselves, and it can be difficult to assess the accuracy of the figures if the source is a less familiar one.
Since we are using the European Social Survey (ESS), these problems might not even arise, because the ESS provides standardized data and is carried out with the European Science Foundation, the European Commission’s Framework Programmes and the national funding bodies of every country concerned. The definition of the variables could also potentially be a problem. To be sure that the variables are appropriate, we examined the labels and descriptions of all the variables used in the project. Source bias is always a possible issue when using secondary data.
However, the ESS we used was conducted in numerous European countries on a very large sample, which minimizes the chance of favoritism. Furthermore, the survey respondents were randomly drawn, which supports the representativeness of the sample. Our own understanding and interpretation of the data may also be wrong. During the course of the project we tried to minimize this risk through constant verification of the analysis procedures. Taking everything into consideration, we can say that the data we are going to use are accurate and reliable.
Using the ESS data gives us some additional advantages. Due to the expertise and experience of the people conducting the ESS, we can conclude that the data are of much higher quality than if we had collected them ourselves. Another positive aspect of the ESS is that it is repeated every two years (we are using the data from 2008), which makes it more reliable and trusted with every round. The organizers have far greater resources and can afford to perform this kind of study, whereas we could not have secured the resources to run our own survey, especially on such a scale.
As previously mentioned, we are aware that many factors could possibly lead to discrimination, and it is quite possible that we have excluded an important variable. However, since our analysis is based on the ESS, we used the variables provided there, expecting to obtain a decent R-squared. Furthermore, after examining the variables we observed that the most important characteristics, such as religion, language and race, were considered in the survey. All the variables used are listed in the table below.

Variable name   Variable label
DSCRRCE         Discrimination of respondent’s group: colour or race
DSCRNTN         Discrimination of respondent’s group: nationality
DSCRRLG         Discrimination of respondent’s group: religion
DSCRLNG         Discrimination of respondent’s group: language
DSCRETN         Discrimination of respondent’s group: ethnic group
DSCRAGE         Discrimination of respondent’s group: age
DSCRGND         Discrimination of respondent’s group: gender
DSCRSEX         Discrimination of respondent’s group: sexuality
DSCRDSB         Discrimination of respondent’s group: disability

All of the variables used can only take the values 0 and 1, which means they are binary variables.
If a respondent answered that they were not discriminated against on a certain basis, the variable for that respondent takes the value 0. If, on the contrary, the respondent answered that they were in fact discriminated against on that basis, the variable takes the value 1. One of the problems that could potentially arise in our research is the number of missing values in our data. That is why it is important to conduct a Missing Value Analysis to see whether there are too many absent values.
As the table below shows, the dataset we used contains no missing values, which indicates that either the data had already been screened to remove them or all respondents answered each of the questions. If the values were pre-screened, this could pose a difficulty, because omitting too many cases could introduce a certain level of bias. However, since the table shows that our sample size is still more than sufficient, we can continue with our analysis.

Univariate Statistics (N = 56752; Missing: Count 0, Percent .0)

Variable   Std. Deviation
dscrrce
dscrntn    .112
dscrrlg    .100
dscrlng
dscretn    .098
dscrage    .113
dscrgnd    .085
dscrsex    .053
dscrdsb    .075

a. No. of Extremes: number of cases outside the range (Mean - 2*SD, Mean + 2*SD).

For completeness we also include a table of Descriptive Statistics. It does not reveal much, due to the specific nature of the values our variables can take, but it does give us some information. Since we are analyzing discrimination, we expect that only a small share of the people who participated in the ESS have been discriminated against.
That can be confirmed by looking at the mean value of each of the variables. Our mean values are low, and the explanation lies in the coding of the answers: as mentioned previously, each variable takes only the values 0 and 1 (0 means that the respondent was not discriminated against on a certain basis, and 1 means that the respondent was discriminated against on that basis), so the mean of each variable equals the share of respondents who answered 1. From the low means we can therefore conclude that the great majority of people answered 0 to the discrimination questions. We also calculated the approximate number of people who have been discriminated against on each specific ground.
We calculated the approximate number of people who reported discrimination by multiplying the mean value of each variable by the number of people who took the survey. By doing this we can clearly see the main characteristics on which people were discriminated against. Age and nationality stand out with the highest number of reports (732 and 721 respectively), while sexuality was the characteristic reported least often. This could potentially indicate which variables will have the highest factor loadings, something that we discuss further in the multivariate analysis section.
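The calculation described above can be reproduced with a few lines of Python. This is only a minimal sketch: it takes the means from the Descriptive Statistics table and the sample size N = 56752 from the Univariate Statistics table, and the dictionary keys and the function name `approximate_counts` are our own illustrative choices, not part of the SPSS output.

```python
# Sample size and variable means as reported in the SPSS tables of this report.
N_RESPONDENTS = 56752

MEANS = {
    "colour or race": 0.0089,
    "religion": 0.0102,
    "ethnic group": 0.0097,
    "age": 0.0129,
    "sexuality": 0.0028,
    "nationality": 0.0127,
    "language": 0.0082,
    "gender": 0.0073,
    "disability": 0.0056,
}


def approximate_counts(means, n_respondents):
    """For a 0/1 variable the mean is the share of 1s, so mean * N
    approximates the number of respondents who reported discrimination."""
    return {ground: round(mean * n_respondents) for ground, mean in means.items()}


counts = approximate_counts(MEANS, N_RESPONDENTS)

# Print the grounds from most to least frequently reported.
for ground, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{ground:15s} {count}")
```

Because the published means are rounded to four decimals, the resulting counts are approximations rather than exact frequencies.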
Descriptive Statistics

Variable                                                 Mean
Discrimination of respondent’s group: colour or race     .0089
Discrimination of respondent’s group: religion           .0102
Discrimination of respondent’s group: ethnic group       .0097
Discrimination of respondent’s group: age                .0129
Discrimination of respondent’s group: sexuality          .0028
Discrimination of respondent’s group: nationality        .0127
Discrimination of respondent’s group: language           .0082
Discrimination of respondent’s group: gender             .0073
Discrimination of respondent’s group: disability         .0056

MULTIVARIATE ANALYSIS SECTION

This is the main part of the project, in which we determine the factors using SPSS.
Some steps have to be completed before the final separation of the variables into factors; they help us establish whether the variables we used are appropriate and suitable for this type of research. The first thing we should check is the correlations between the variables. If the correlation between any two variables is too high, we should omit one of them, since that implies that they are not very different and might be measuring the same thing. A correlation is usually considered high if it is above 0.8. Since none of our correlations is that high, there is no need to omit any of the variables.
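The screening rule above (drop one variable of any pair correlated above 0.8) can be sketched in Python as follows. The responses below are toy 0/1 data invented for illustration, not the ESS data, and the function names `pearson` and `highly_correlated_pairs` are hypothetical helpers; in the project itself the check was read directly from the SPSS correlation matrix.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists.
    For 0/1 variables this is the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy responses for ten hypothetical respondents (NOT the ESS data).
toy_responses = {
    "dscrrce": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "dscretn": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "dscrage": [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
}

flagged = highly_correlated_pairs(toy_responses)
print(flagged)  # pairs that would be candidates for omitting one variable
```

In the toy data only the first two variables exceed the threshold; in our ESS data no pair did, so all variables were kept at this stage.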
One more important piece of information that comes with the correlation table is the Determinant. This number must not be equal (or very close) to zero, because that would indicate multicollinearity among the variables. Since the Determinant here is equal to 0.575, we can conclude that there is no risk of multicollinearity and we can continue.

Correlation Matrix(a): pairwise correlations between the nine discrimination variables (colour or race, religion, ethnic group, age, sexuality, nationality, language, gender, disability).
a. Determinant = .575

The next test is the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy, which shows whether the factor analysis is likely to yield trustworthy factors. The closer the value is to 1, the more reliable the factors. The threshold we use for this index is 0.55, and the value we obtained is 0.692. Since this is clearly higher than the threshold, we consider it acceptable and continue with our research.
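Both diagnostics discussed above can be computed from a correlation matrix alone. The sketch below is a plain-Python illustration, not the SPSS implementation: it uses a small made-up 3 x 3 correlation matrix (`TOY_CORR`, hypothetical), obtains the determinant and the inverse by Gauss-Jordan elimination, and computes the overall KMO with the standard formula, the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal elements only).

```python
from math import sqrt


def inverse_and_determinant(matrix):
    """Gauss-Jordan elimination with partial pivoting.
    Returns (inverse, determinant) of a square matrix."""
    n = len(matrix)
    # Augment the matrix with the identity matrix: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (determinant is zero)")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix."""
    inv, _ = inverse_and_determinant(corr)
    n = len(corr)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)


# Made-up correlation matrix for illustration (NOT the ESS correlations).
TOY_CORR = [
    [1.0, 0.4, 0.3],
    [0.4, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

_, determinant = inverse_and_determinant(TOY_CORR)
print(round(determinant, 3))  # 0.758, safely away from zero
print(kmo(TOY_CORR))          # a value between 0 and 1
```

A determinant near zero would make the inverse unstable, which is why the determinant check comes before the KMO in the procedure described above.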
Together with the KMO we perform Bartlett’s Test of Sphericity, which tests the null hypothesis that the correlation matrix is an identity matrix, i.e. that each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0). With this test we therefore examine whether the variables are uncorrelated in the population. We want to reject the null hypothesis, and the data in our table show that we can do so, since the significance value is very low (below 0.05).

KMO and Bartlett’s Test (table): Kaiser-Meyer-Olkin Measure of Sampling Adequacy; Bartlett’s Test of Sphericity: Approx. Chi-Square, df, Sig.

Next we should check the Communalities table. It shows how much variance each variable has in common with the rest of the variables. We are concerned with the “Extraction” column, because it shows the amount of variance in each variable that is explained by the extracted factors. The general rule is that communality values should not be very low (lower than 0.3 or 0.4). Here we encounter a problem: the communality of one of our variables, “Discrimination of respondent’s group: sexuality”, is too low, and the best thing we can do is to omit it from further analysis.
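To make the communality rule concrete: with principal component extraction, the communality of a variable is the sum of its squared loadings on the extracted factors. The sketch below applies this to made-up loadings on two factors (the numbers are hypothetical and are not our SPSS output) and flags variables below the 0.3 cut-off mentioned above; the names `communalities` and `low_communality` are illustrative.

```python
def communalities(loadings):
    """Communality of each variable: sum of squared loadings
    over the extracted factors."""
    return {
        variable: sum(loading ** 2 for loading in factor_loadings)
        for variable, factor_loadings in loadings.items()
    }


def low_communality(loadings, cutoff=0.3):
    """Names of variables whose communality falls below the cut-off."""
    return [
        variable
        for variable, value in communalities(loadings).items()
        if value < cutoff
    ]


# Hypothetical loadings on two extracted factors (NOT the SPSS results).
toy_loadings = {
    "dscrrce": (0.70, 0.10),
    "dscrage": (0.05, 0.65),
    "dscrsex": (0.20, 0.25),
}

for variable, value in communalities(toy_loadings).items():
    print(f"{variable}: {value:.3f}")

print(low_communality(toy_loadings))  # candidates for omission
```

A variable flagged this way shares little variance with the extracted factors, which is the reason for considering its removal.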
Communalities (table). Extraction Method: Principal Component Analysis.

Just to be on the safe side, we also performed the extraction with all nine variables (including “sexuality”) and obtained a Component Matrix in which “sexuality” had a very low loading on both factors. The results were no better in the Rotated Component Matrix, which is an additional reason why we omit this variable from further research. A repeated Communalities table, now with only eight variables, is more acceptable and looks as follows: Initial