The Cross-cultural Structural Validity of the Big Five Personality Inventory (BFI-10) in a South African Sample

The study sought to assess the structural validity of the 10-item measure of the Big Five personality (B5P) instrument in the South African context. World Values Survey data, collected in South Africa (N = 3 531), were analysed using exploratory and confirmatory factor analyses to ascertain the factorial structure of the data, including across sub-groups, with a focus on measurement invariance. The factorial structure observed in the South African sample did not mirror the theorised structure of the B5P. This was demonstrated in the inspection report, as well as in the tests of measurement invariance. Even sub-groups typifying the Westernised, educated, industrialised, wealthy, and democratic part of South African society did not provide structures that mirrored the theorised model. The assumption that well-established instruments are valid in settings different to the ones in which they were initially developed should be questioned, and such instruments should not be used unless thoroughly tested.


Introduction
The Big Five personality (B5P) model has a long history, stretching back to Fiske (1949), and including researchers such as Norman (1967), Smith (1967) and Goldberg (1981), as well as McCrae and Costa (1987). Methodologically, the B5P model is based on psycho-lexical studies, such as those conducted by Allport and Odbert (1936), and, later, Cattell (1950). It is a method still used extensively in contemporary studies of personality, including in South Africa (Hill et al., 2013). John (2021: 35) states that, after decades of research, and long debates about the right number of factors and the best labels for these factors, the field has now achieved an initial consensus on a general taxonomy of personality traits: the 'Big Five' personality dimensions. Many versions of B5P measures have been published, of which the 240-item Revised NEO Personality Inventory, the 60-item NEO Five-Factor Inventory, and the 44-item Big Five Inventory, all from Costa and McCrae (1992), are the best known. The Basic Traits Inventory (BTI) questionnaire (comprising 193 items), which is a South African instrument, also assesses the B5P. Several shorter versions of the B5P are available, such as the Big Five Personality Inventory (BFI-10) (Rammstedt and John, 2007) and the 20-item mini-International Personality Item Pool-Five-Factor Model (mini-IPIP-FFM) (Donnellan, Oswald et al., 2006), which have gained popularity in psychology research in recent times (Rammstedt et al., 2018; Soto and John, 2017; Nunes et al., 2018). The World Values Survey (WVS) (6th Wave) included a 10-item version of the B5P (Rammstedt and John, 2007) and applied it across 24 countries. It could be presumed that the designers of the WVS believed that this version of the B5P was applicable to South Africa, or they may have wished to gain data on the psychometric characteristics of the instrument to assess its applicability.
However, no research could be located with specific reference to the (structural) validity of this short version of the B5P in South Africa. This arguably limits the use of this handy instrument within the South African context.
To illustrate the foregoing challenge, Gurven et al.'s (2013) study in Bolivia, which tested the B5P model using the 44-item BFI tool, produced results that were inconsistent with those obtained in Western countries, using the same instrument. Additionally, Laajaj et al.'s (2019) study of data from face-to-face surveys, conducted in 23 low- and middle-income countries, on commonly used measures of the B5P model, discovered that the measures failed to capture the intended personality traits, indicating low validity of the instrument. This finding casts doubt on the cross-cultural validity of the B5P scales, prompting a call for additional research on them, particularly in non-WEIRD (Western, educated, industrialised, rich and democratic) environments, to resolve the discrepancy. Numerous explanations have been advanced in the literature for such cross-cultural measurement non-invariance, or lack of cross-cultural validity, in commonly used personality measures. These include participants' systematic response patterns (Boer et al., 2018), the effect of enumerator interaction (Laajaj et al., 2019) and, finally, participants' low levels of education (Gurven et al., 2013). Taken together, these and other factors distort personality measures, particularly in large-scale studies, thereby increasing the risk of misinterpretation of personality traits in multi-group studies.
While it is preferable to use well-established, comprehensive, and longer multi-item measurement instruments for B5P-related studies, due to their superior content validity and reliability, time constraints may make this impossible in some instances (Rammstedt and John, 2007). The use of lengthy instruments may result in participant exhaustion, annoyance and inattentive responses, and there is a world-wide trend to use shorter instruments (Kemper, 2019; Rossi-Ferrario, 2019). Ultimately, the reliability of the results of longer instruments can potentially be compromised (Soto and John, 2017). As a result, shorter versions of instruments, such as Rammstedt and John's (2007) 10-item BFI-10, Donnellan et al.'s (2006) 20-item mini-IPIP-FFM, and Gerlitz and Schupp's (2005) 15-item Big Five Personality Inventory (BFI-S) have emerged over the years. Robins et al. (2001) attest to the value of brief instruments, particularly in large-scale, multi-group surveys and longitudinal studies. They posit that, in general, brief versions have been found to be just as valid as comprehensive and sophisticated scales. They are also more attractive, due to their lower cost and greater efficiency. Additionally, the psychometric superiority of longer measures is not always reflected in practice (Burisch, 1997).
Despite the advantages of brief instruments in psychology research, some concerns have been raised about them in general and about brief B5P measures in particular. To begin with, Langford (2003) observes that longer B5P questionnaires are, on average, more representative of broad personality constructs and have superior measurement properties when compared to shorter versions. Secondly, Gosling et al. (2003) observe that shorter versions of the B5P have inferior psychometric properties to standard multi-item instruments. Laajaj et al. (2019), who used a brief 15-item instrument to analyse data from 94 751 participants in 23 low- and middle-income countries (non-Western, educated, industrialised, rich and democratic populations), encountered similar instrument validity issues. In the study, commonly used personality questions failed to measure the desired personality traits and demonstrated low validity. The WVS, a multi-country research study that examines people's values and beliefs, their temporal evolution, and their socio-political implications, incorporated the B5P measures for the first time in its sixth wave (covering the years from 2010 to 2014). Rammstedt and John's (2007) 10-item BFI-10 was used to develop the personality measures. The BFI-10 scale adequately measures the extraversion, agreeableness, openness, conscientiousness and neuroticism dimensions of human personality, as demonstrated by several previous studies (Balgiu, 2018; Guido et al., 2015; Sudzina, 2016).
While the BFI-10 has received widespread support for its adequate measurement properties, Ludeke and Larsen (2017), as well as Simha and Parboteeah (2020), expressed reservations about the credibility of the data elicited from participants in the sixth wave of the WVS. Ludeke and Larsen (2017) noted that, in the sixth wave of the WVS, certain measurement items associated with the same personality trait tended not to correlate as expected (in fact, they correlated negatively), contradicting findings from previous research. Disparities were discovered across all five personality dimensions. In the same study, a factor analysis of the ten items did not produce the five dimensions predicted by the B5P model. Rather, the factor analysis of the BFI-10 scale items produced an unwieldy three-factor solution that was difficult to interpret. Another finding from the study was that BFI-10 measures are unreliable in non-Western countries and among less wealthy and less educated populations. As a result, caution should be exercised in future studies that make use of WVS data. These results corroborate Chapman and Elliot's (2019) conclusions on the relationship between BFI-10 measures and mortality in the General Social Survey (USA). They found that using abbreviated versions of longer personality tests, in cross-cultural studies, produced unusual results that did not replicate those obtained using the original versions.
The literature reveals that the global acceptance of the B5P model is questioned; the use of short versions of psychometric tests is sometimes questioned (Chapman and Elliot, 2019; Kemper et al., 2019); and the use of the 10-item measure in the WVS is questioned by some. The literature affirms the need for an analysis of the structural validity of the 10-item measure in the WVS, including in the South African context. This research aims to assess the structural validity of the 10-item measure of the B5P instrument in the South African context, with the intention of advocating for, or against, its use. The following research question guided the study:
• Is the 10-item B5P instrument structurally valid when applied to the South African context?
The purpose is to explore whether factors such as language proficiency (Ægisdóttir, 2008; Abrahams and Mauer, 1999; Király et al., 2019; McDonald, 2011) can result in the BFI-10 not being valid within this national context.

Methodology
Cross-sectional secondary data, obtained from the WVS database, was used in this study. The data was collected during the WVS wave 6 survey, where a probability-based sample of 3 531 participants was interviewed across South Africa. Women made up a small majority of the sample, 1 824 (51.7%), and the mean age of all the participants was 37.72 years, with a standard deviation of 15.675. Worldometer (live data) reports that women are a small majority in South Africa (51.35%), while the median age is 27.10 years (Worldometer, 2023). It should, however, be noted that the inclusion criterion for participation in the study was a minimum age of 18 years, and a large portion of South Africans are below this age. This means that the sample was somewhat representative of the South African population in terms of sex and age. Since WEIRD was used to create subgroups in the sample, the 427 white participants were deemed to be more Western than the other groups and were used to represent the W (Western) dimension in WEIRD; 316 participants, with university-level education, with or without a degree, were used to represent the E (Educated) dimension in WEIRD; 436 participants from the higher and upper middle class groups were combined to represent the R (Rich) dimension in WEIRD; and the participants, totalling 692, who selected responses '9' and '10' to the questionnaire item regarding participants' perceptions of an active democracy in the country were combined (278 and 414) to represent the D (Democratic) dimension in WEIRD. It is acknowledged that "Democratic" should be a constant within a country, but, given the absence of any other option in the dataset, this variable was included.
Participants completed a pre-validated instrument on the Big Five personality factors and self-reported their demographics (these were used to stratify the data). For the personality variable, participants completed the B5P measures of the WVS (6th Wave). The version used was the 10-item BFI-10, by Rammstedt and John (2007). The participants responded to measures, scored on a 5-point Likert-type scale, ranging from 1=disagree strongly to 5=agree strongly. See table 1 for the items of the questionnaire and the way they were coded in the survey. Rammstedt and John (2007) report test-retest reliability coefficients of at least 0.7 for each of the Big Five traits. Other researchers also report respectable test-retest coefficients, namely .515-.873 (Carciofo et al., 2016, 66-87) (Rammstedt and Krebs, 2007), but an aggregated alpha coefficient as low as .55 (Erdle and Rushton, 2011). An alpha coefficient of .55 is hardly acceptable, and was a red flag noted by the authors of this paper.
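The scoring logic implied above (two items per trait, one of them reverse coded, followed by an internal-consistency check) can be sketched as follows. This is a minimal illustration, not the authors' SPSS procedure: the six responses are invented, and the function names are hypothetical.

```python
from statistics import variance

SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert-type scale described in the text


def reverse_score(response: int) -> int:
    """Mirror a response on the 1-5 scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - response


def cronbach_alpha(items: list[list[int]]) -> float:
    """Cronbach's alpha for k items.

    Each inner list holds one item's responses, with respondents in the
    same order across items. Sample variances (n - 1) are used throughout.
    """
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Invented responses from six people to the two items of one trait.
positively_keyed = [4, 5, 2, 3, 4, 1]
negatively_keyed = [2, 1, 4, 3, 1, 5]  # raw responses to the reverse-keyed item

# Recode the reverse-keyed item so that both items point the same way.
recoded = [reverse_score(r) for r in negatively_keyed]
alpha = cronbach_alpha([positively_keyed, recoded])
print(recoded, round(alpha, 3))
```

If the reverse-keyed item were left unrecoded, the two items would correlate negatively and alpha would turn negative, which is one reason the expected pattern of positive and negative loadings matters in the analyses that follow.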
Considering factorial validity, Rammstedt and John (2007) found the expected five-factor structure theorised by the B5P model. Similar positive findings were reported by Balgiu (2018) and Guido et al. (2015), all of whom agreed that the theoretical structure of the BFI-10 is supported by data. However, studies by John et al. (2019), and Ludeke and Larsen (2017), did not demonstrate adequate empirical support for the five-factor structure in non-WEIRD contexts.
The following variables were used as direct measures of WEIRD or non-WEIRD:
• V254 - Ethnic group (5-point categorical data): Whites were seen as more representative of the Western dimension than other groups, such as black and Indian participants.
• V248 - Highest educational level attained (8-point ordinal data): Those with "Some university level education (without a degree)" and "University level education (with a degree)" were selected as more representative of educated than those who had completed secondary school, or who had lower educational achievements.
• V225 - How often one made use of a PC (4-point categorical data): Those who used their PCs frequently were deemed more industrialised than those who used them occasionally or never, or those who did not have access to PCs.
• V238 - Social class (subjective) (8-point ordinal data): Those in the higher and upper-middle classes were categorised as rich, compared to those in the lower classes.
• V141 - How democratically is this country being governed today? (10-point interval scale): Though it is reasonable to assume that a country, in this case South Africa, has a certain level of democracy, perceptions of that level vary among individuals within the country. As a measure of democracy, those with scores of 9 and 10 were seen as representative of the D (Democratic) in the WEIRD acronym.
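The stratification described above can be sketched as a set of membership rules applied to each respondent. The variable names (V254, V248, V225, V238, V141) come from the text, but the numeric category codes below are illustrative assumptions rather than the official WVS codebook values, and the three respondents are invented.

```python
# ASSUMPTION: these numeric codes are placeholders chosen for illustration;
# the real value labels must be taken from the WVS Wave 6 codebook.
WHITE = 2
UNIVERSITY_LEVELS = {7, 8}     # some university / university with degree
FREQUENT_PC_USE = 3
RICH_CLASSES = {1, 2}          # higher and upper-middle class
DEMOCRATIC_SCORES = {9, 10}    # scores of 9 and 10, as stated in the text

WEIRD_RULES = {
    "Western": lambda r: r["V254"] == WHITE,
    "Educated": lambda r: r["V248"] in UNIVERSITY_LEVELS,
    "Industrialised": lambda r: r["V225"] == FREQUENT_PC_USE,
    "Rich": lambda r: r["V238"] in RICH_CLASSES,
    "Democratic": lambda r: r["V141"] in DEMOCRATIC_SCORES,
}


def stratify(respondents):
    """Assign each respondent to every WEIRD subgroup whose rule they meet.

    Subgroups may overlap: one person can be both 'Western' and 'Rich'.
    """
    groups = {name: [] for name in WEIRD_RULES}
    for respondent in respondents:
        for name, rule in WEIRD_RULES.items():
            if rule(respondent):
                groups[name].append(respondent)
    return groups


# Three invented respondents, for illustration only.
respondents = [
    {"id": 1, "V254": 2, "V248": 8, "V225": 3, "V238": 1, "V141": 10},
    {"id": 2, "V254": 1, "V248": 4, "V225": 1, "V238": 4, "V141": 9},
    {"id": 3, "V254": 1, "V248": 7, "V225": 3, "V238": 5, "V141": 3},
]
groups = stratify(respondents)
sizes = {name: len(members) for name, members in groups.items()}
print(sizes)
```

Because the rules are independent, the five subgroups are not mutually exclusive, which is consistent with each subgroup being analysed separately rather than compared as disjoint strata.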
All the data on the measurements were extracted from the WVS Wave 6 (Inglehart et al., 2014). The study was approved by the University of South Africa's School of Business Leadership's Research Ethics Review Committee. The data needed to address the research question was freely and readily available online, on the WVS website. Thus, the researchers downloaded and proceeded to analyse it. In accordance with the requirements of the owners of the database, the researchers acknowledged the data's source.
The researchers used IBM SPSS 28 to perform exploratory factor analyses (EFA), to first establish whether the data would fit the B5P model using the entire South African sample, as recommended by De Roover and Vermunt (2019).
The natural fit of the data was then determined, using Kaiser's rule regarding eigenvalues. Varimax rotation was used. As a rule, the EFA is followed by a confirmatory factor analysis (CFA), and then a multilevel multi-group confirmatory factor analysis (MGCFA), in a stepwise and hierarchical process, in the test for structural validity (Vandenberg and Lance, 2000; Gouveia et al., 2015; Putnick and Bornstein, 2016). Because the EFA results demonstrated that the B5P model did not fit the data of the total sample well, the sample was stratified into five smaller samples, using the WEIRD acronym. Following this, the same sequence of statistical tests was performed to assess whether the B5P model would fit these sub-populations. The expectation was that the same number of factors would be extracted and that loadings would be similar across groups (Bialosiewicz et al., 2013). Neither EFA nor CFA could confirm configural fit; thus, no further tests were performed.
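Kaiser's rule, as used here, retains the factors whose eigenvalues exceed 1 and reports the share of total variance they explain. A minimal sketch follows; the ten eigenvalues are invented for illustration and are not the eigenvalues obtained in this study.

```python
def kaiser_retained(eigenvalues):
    """Return the eigenvalues that pass Kaiser's criterion (greater than 1)."""
    return [value for value in eigenvalues if value > 1.0]


def variance_explained(eigenvalues, n_factors):
    """Percentage of total variance explained by the n largest eigenvalues.

    For a correlation matrix the eigenvalues sum to the number of items,
    so dividing by their sum gives the proportion of variance.
    """
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:n_factors]) / sum(ordered)


# ASSUMPTION: invented eigenvalues of a 10-item correlation matrix (sum = 10).
eigenvalues = [3.9, 1.5, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.2]

retained = kaiser_retained(eigenvalues)
print(len(retained), "factors retained by Kaiser's rule")
print(round(variance_explained(eigenvalues, len(retained)), 2), "% explained")
print(round(variance_explained(eigenvalues, 5), 2), "% if five factors are forced")
```

Forcing more factors than Kaiser's rule retains always raises the explained variance, so a higher percentage under a forced solution is not in itself evidence that the forced structure is the right one.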

Results
For the full South African sample, the outcome of the EFA showed a Kaiser-Meyer-Olkin measure of sampling adequacy of 0.872, and the Bartlett's Test of Sphericity approximate Chi-Square was 10557.903 (df=45), which was statistically significant (p<.0001) (N=3 525). When applying Kaiser's criterion of retaining factors with eigenvalues greater than 1, two factors were retained, explaining 55.64% of the variance. When "forcing" the data into the five-factor solution, the variance explained was 76.60%. Table 2 presents these results. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
An indicator of factorial equivalence is when the same number of factors emerge, as proposed in the theoretical model (Bialosiewicz et al., 2013). As shown in table 2, two instead of five factors were extracted when applying Kaiser's criterion of retaining factors with eigenvalues greater than 1. Another indicator of structural validity and measurement invariance is when items associated with the same latent variable load on the same factor (McGovern and Lowe, 2018). To test this premise, the data was forced into a five-factor solution. The anticipation was that the B5P traits would each load on a separate factor. Considering the "pointer loadings", which are the items with high loadings (in this case higher than 0.60, and bolded in table 2), it was found that none of the loadings mirrored the theoretical B5P model. Also absent from the results in table 2 were the expected positive and negative loadings per factor, since each personality trait was measured with two items, one of which was reverse coded. To test for structural validity in a comprehensive manner, a confirmatory factor analysis (CFA) was performed, where five factors were proposed, with two items loading on each factor. The results were as follows: CFI=0.870 (cut-off CFI>0.90); RMSEA=0.122 (cut-off RMSEA<0.08) (Hu and Bentler, 1999). The results revealed a poor fit.
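For readers less familiar with the two fit indices, the sketch below shows how CFI and RMSEA are conventionally derived from model and baseline chi-square statistics. The chi-square values and sample size are invented for illustration; only the degrees of freedom follow from the design (45 for the ten-item baseline model, and 25 for five correlated factors with two items each, assuming factor variances fixed to 1). Some software uses N rather than N - 1 in the RMSEA denominator.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index: improvement of the model over the baseline."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model, 0.0)
    if d_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_baseline


# ASSUMPTION: invented chi-square values and sample size, for illustration.
example_rmsea = rmsea(chi2=525.0, df=25, n=1001)
example_cfi = cfi(chi2_model=525.0, df_model=25,
                  chi2_baseline=4045.0, df_baseline=45)
print(round(example_rmsea, 3), round(example_cfi, 3))
```

In this invented case both indices fall on the wrong side of the cut-offs cited from Hu and Bentler (1999), which is the pattern the study reports for the full sample.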
The full South African dataset seemed not to mirror the theorised model. Given the WEIRD principle, it was assumed that groups that meet these criteria may better reflect the B5P model. The sample was thus stratified using indicators in the WVS, which could be deemed proxies for Western, educated, industrialised, rich, and democratic. When applying the same statistical testing procedure as for the full sample, the following results (see table 3) emerged for those from a Western background. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
The number of factors extracted, applying Kaiser's criterion, was two, signalling structural fit problems (Bialosiewicz et al., 2013). After "forcing" the data into five factors, neither the "pointer loadings" nor the expected positive and negative loadings per factor were present in the Western sample. When testing for structural validity, using confirmatory factor analyses with five factors, the following results were produced: CFI=1 (cut-off CFI>0.90; a CFI of 1 describes an over-identified model, which suggests that a model with fewer than five factors could describe the data better); RMSEA=0.226 (cut-off RMSEA<0.08). The results revealed a poor fit.
It was then proposed that those in South Africa who are more educated may resemble the B5P configuration. These results are presented in table 4. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
As per the previous analyses, the number of factors extracted, applying Kaiser's rule, was two, signalling structural fit problems. After "forcing" the data into five factors, neither the "pointer loadings" nor the expected positive and negative loadings per factor were present in the educated category. When testing for structural validity, using confirmatory factor analyses with five factors, the following results were produced: CFI=0.758 (cut-off CFI>0.90); RMSEA=0.172 (cut-off RMSEA<0.08). The results revealed a poor fit.
Following the WEIRD principle, attention was next given to the industrialised group. It was expected that those in a more industrialised environment would structurally be close to the B5P. The results are summarised in table 5. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
The number of factors extracted, applying Kaiser's criterion, was again two, signalling structural fit problems. When forcing the data into five factors, neither the "pointer loadings" nor the expected positive and negative loadings per factor were present in the industrialised category. When testing for structural validity, using confirmatory factor analyses with five factors, the following results were produced: CFI=0.764 (cut-off CFI>0.90); RMSEA=0.166 (cut-off RMSEA<0.08). The results revealed a poor fit.
Next, the rich category was explored as a potential group that reflects the B5P taxonomy. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
As summarised in table 6, the number of factors extracted, using Kaiser's criterion, was two, signalling structural fit problems. When the data was forced into five factors, neither the "pointer loadings" nor the expected positive and negative loadings per factor were present in the rich category. When testing for structural validity, using confirmatory factor analyses with five factors, the following results were produced: CFI=0.811 (cut-off CFI>0.90); RMSEA=0.164 (cut-off RMSEA<0.08). The results revealed a poor fit.
Lastly, the idea was explored that those who believe they live within a democratic dispensation may share the personality characteristics portrayed in the B5P theory of personality. In the first column, the items are indicated as numbers, with the measured construct indicated as follows: E - Extraversion; A - Agreeableness; C - Conscientiousness; N - Neuroticism; O - Openness. Reverse coding is indicated by R. The "filled" cells represent the theorised structure of the BFI-10.
The number of factors extracted, based on Kaiser's criterion, was two (see table 7), signalling structural fit problems (Bialosiewicz et al., 2013). When the data was "forced" into five factors, neither the "pointer loadings" nor the expected positive and negative loadings per factor were present in the "democratic" category. The following results were obtained from the confirmatory factor analyses with five factors: CFI=0.861 (cut-off CFI>0.90); RMSEA=0.134 (cut-off RMSEA<0.08). As with the other groups, these values fall short of both cut-offs.
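The CFA results reported above for the full sample and the five subgroups can be checked against the stated cut-offs in one pass. The CFI and RMSEA values below are those reported in the text; requiring both cut-offs to be met at once is an assumption made for this sketch.

```python
CFI_CUTOFF = 0.90    # acceptable fit requires CFI > 0.90
RMSEA_CUTOFF = 0.08  # acceptable fit requires RMSEA < 0.08

# (CFI, RMSEA) as reported in the Results section.
REPORTED = {
    "Full sample": (0.870, 0.122),
    "Western": (1.000, 0.226),
    "Educated": (0.758, 0.172),
    "Industrialised": (0.764, 0.166),
    "Rich": (0.811, 0.164),
    "Democratic": (0.861, 0.134),
}


def acceptable_fit(cfi: float, rmsea: float) -> bool:
    """ASSUMPTION: fit counts as acceptable only if both cut-offs are met."""
    return cfi > CFI_CUTOFF and rmsea < RMSEA_CUTOFF


verdicts = {group: acceptable_fit(*indices) for group, indices in REPORTED.items()}

for group, (cfi, rmsea) in REPORTED.items():
    label = "acceptable" if verdicts[group] else "poor"
    print(f"{group:<15} CFI={cfi:.3f} RMSEA={rmsea:.3f} -> {label}")
```

Under this rule no group reaches acceptable fit. The Western subgroup is the only one to pass the CFI cut-off, yet it also has the highest RMSEA of all six analyses.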

Discussion
The literature is clear that the B5P model is still well respected, but that several concerns exist, particularly with the shorter BFI-10 version. These concerns include that the model was established from the standpoint of Westernised cultural settings, so that its results are not directly comparable to those obtained in populations with distinct cultural traits. Also, the BFI-10, which was used to measure personality traits in the WVS wave 6, was seen to have inferior psychometric properties and, in some instances, failed to capture the Big Five personality traits. These limitations of the research instrument necessitated this study. The integrity of the data collected via the WVS was not questioned, as it was collected by a well-established and reputable service provider. The sample, as per the demographic data, was somewhat representative of the South African population in terms of sex and age.
The results revealed that the B5P model did not fit the data for the total sample. By contrast, longer versions seem to work well (Abrahams and Mauer, 1999; McDonald, 2011; Taylor and De Bruin, 2006). When EFA was applied to the data, the number of factors extracted, applying Kaiser's criterion, was two, and not the expected five, as per the B5P model, thus signalling structural fit problems. Even when the data was "forced" in EFA into a five-factor model, none of the loadings mirrored the theoretical B5P model. Also absent from these EFA results were the expected positive and negative loadings per factor. When CFA was performed, the results echoed those found with the EFA, with none of the indicators fitting the theorised model. These results are contrary to previous research by Guido et al. (2015) and Sudzina (2016), where it was found that the BFI-10 scale properly measured the extraversion, agreeableness, openness, conscientiousness and neuroticism characteristics of human personality. However, the results affirm the research by Ludeke and Larsen (2017), as well as Simha and Parboteeah (2020), where the empirical data in the sixth wave of the WVS was found not to adequately reflect the B5P model.
It was envisaged that creating subgroups (of the South African sample) informed by the WEIRD principle (Westernised, educated, industrialised, wealthy, and democratic) could yield results more in line with expectations. However, the subgroup results mirrored those of the total sample, as demonstrated with the EFA and confirmed with the CFA results. These findings affirm some studies done across cultures and nations (Chapman and Elliot, 2019; Ludeke and Larsen, 2017), where it was found that the BFI-10 research instrument was not structurally valid and that the measure did not demonstrate equivalence. This research is also novel in that it used the WEIRD principle within a specific population and provided insight into the applicability of the BFI-10 instrument to population enclaves with WEIRD attributes but situated in a non-Western country. This can be seen as a unique methodological contribution of the study.
Our study has a limitation in that it relies only on secondary data, and we had no influence over the data collection process. However, multiple research publications in credible journals that have used the WVS datasets corroborate the data's legitimacy. Another issue is that the research instrument employed was written in English, while several of the participants were not native English speakers, which might have hampered their interpretation of the questionnaire items. Future researchers are encouraged to assess the structural validity of all instruments "imported" to their cultures before using them. This may be a concern not only in South Africa, but also in other countries that are not WEIRD. Specifically, regarding the B5P assessments and the BFI-10, the functionality of the theory across cultures may require further investigation, as may the utility of short versions of instruments across cultures and languages. It may be difficult to capture meaning in a few words or sentences across the divide of culture and language (Djiwandono, 2006).

Conclusions
It was concluded that the B5P model was not replicable in the short format in South Africa, or within any of the subgroups created using the WEIRD principle. The analyses revealed that the BFI-10 factor structure is not replicated in the South African sample or its sub-samples, and that the BFI-10 lacks measurement invariance within the South African population. The assumption that well-established instruments are valid in settings different to the ones in which they were initially developed should be questioned, and such instruments should not be used unless thoroughly tested. This study exposes the extent of measurement non-invariance when using an instrument in a foreign setting, and shows that, even in equitable subgroups, invariance does not occur. Those working with foreign individuals or conducting cross-cultural research should be particularly aware of these threats to validity. Though only some research, specific to measurement invariance and the WVS, is available, the decision not to include the BFI-10 in the 7th wave of the survey seems wise.