Trait-specific testing of the equal environment assumption: The case of school grades and upper secondary school attendance

Objective: This paper tests the equal environment assumption for school grades and upper secondary school attendance and describes the conditions under which violations are problematic. Background: A growing number of sociologists use twin-based research designs, particularly the Classical Twin Design (CTD), to differentiate between genetic and social causes of social inequalities. One key assumption of CTD is that environmental influences are shared by monozygotic and dizygotic twins to the same extent; this is called the equal environment assumption (EEA). This assumption is frequently contested and the target of concern, because a violation can result in an overestimation of heritability and an underestimation of the role of the social environment. Method: Using data from the first wave of the German TwinLife study, the paper illustrates two approaches to test EEA for school grades and enrolment in upper secondary school (Gymnasium). The analysis is based on a sample of twins (N = 1,576) aged ten to twelve years. Results: The results show that the approaches are able to detect violations of EEA (though in different ways), depending on the environmental variables that might causally be involved in trait variance. A violation was observed in only one case, and it had no effect on heritability estimates. Conclusion: While EEA holds for school grades, violations do not automatically invalidate CTD in the case of upper secondary school attendance.


Introduction
Twin-based research designs are relatively new to sociology, and a growing number of sociologists use these designs to differentiate between genetic and social causes of social inequalities (e.g. Nielsen 2016; Grätz & Torche 2016; Jaeger & Møllegaard 2017; Schulz et al. 2017; Gil-Hernández 2019). Demographers have used twin-based research designs for longer to differentiate between genetic and social causes of demographic outcomes, such as fertility (e.g. Rodgers et al. 2001; for an overview, see Mills & Tropf 2015). Though there are newer approaches based on molecular genetic variation in unrelated individuals, such as genome-wide association studies (GWAS) and polygenic risk scores derived from such GWAS (e.g. Rietveld et al. 2013), twin-based research designs have great value in understanding the sources of social inequalities and underlying family processes. For example, the study of twins opens up possibilities to examine causal relations in the comorbidity of traits based on discordant twins. In addition, twin studies help us to better understand underlying biological processes. For example, MZ twin designs allow the study of biological discordance against an equivalent genetic background (for a detailed overview see van Dongen et al. 2012). Compared to traditional sibling analysis, which suffers from unobserved heterogeneity (Solon et al. 1991: 512), twin designs have the advantage that they allow us to differentiate between genetic and environmental confounds (Diewald et al. 2016; for an example see Baier 2019). However, this superior control comes at the cost of strong assumptions that are regularly contested.
The most frequently applied genetically informative design is the Classical Twin Design (CTD). CTD is based on a comparison of monozygotic (MZ) and dizygotic (DZ) twin pairs (Keller et al. 2010: 377). One of its key assumptions, and in general the most debated, is the equal environment assumption 1 (EEA). EEA assumes that environmental influences are shared by MZ and DZ twins to the same extent (Derks et al. 2006: 403-404). At the heart of concerns regarding EEA is the observation that MZ twins often experience much more similar home environments and are often treated more alike than DZ twins (e.g. Robin et al. 1994; Evans & Martin 2000; Felson 2009). They more often share the same room, are more often dressed alike, and more often play together than DZ twin pairs (Loehlin & Nichols 1976: 50-51; Robin et al. 1994; LoParo & Waldman 2014). However, greater similarities in the environment of an MZ twin pair do not automatically invalidate CTD. Even if violations occur, these do not necessarily affect estimations, as long as differential treatment of MZ and DZ twins is unrelated or only weakly related to the trait under study; this is referred to as the "trait-relevant" definition of EEA (LoParo & Waldman 2014: 611).
Sociological research in this area often points to the role of family and demographic processes in explaining social inequalities (Kiernan & Mensah 2011; Mare 2011). In this case, there is clear evidence of family environment and parental treatment being relevant for a child's life chances. Apart from investments in children related, for example, to cultural activities, differences in parent-child interactions especially have been shown to cause differences in child outcomes, including their academic achievement (Fan & Chen 2001; Lareau 2002; Spera 2005; Cheadle 2008; Kiernan & Mensah 2011). Accordingly, even small systematic differences in the home environment and in the parental treatment of MZ and DZ twins could, in the long run, cause strong differences in child development (Plomin & Daniels 2011: 576).
1 The literature criticizing EEA includes Joseph (2015), Beckwith & Morris (2008), and Moore & Shenk (2017). The literature supporting it includes Derks et al. (2006).
As argued by Bouchard and McGue (2003: 9) and Joseph (2015), it is "good scientific practice" to test and demonstrate the validity of the "trait-relevant" definition of EEA, especially for sociological research on the genetic and social causes of social inequalities. Previous research has tested EEA mainly for health outcomes and psychological traits 2 (for an overview, see Felson 2014). To my knowledge, there have been only three studies so far that tested the validity of EEA for status-related outcomes, including income, years of education (Felson 2014), high school grade point average (GPA) (Conley et al. 2013; Felson 2014), and qualification test scores (Loehlin & Nichols 1976: 51-52). These studies presented mixed results. While Conley et al. (2013) found no indication of EEA being violated in the case of GPA, Felson (2014) observed EEA being invalid for income and years of education, but described the overall bias as modest. However, previous research is based on relatively small samples. Conley et al. (2013: 421), for example, based their analysis on 392 twin pairs. Small samples make it more difficult to detect violations of EEA (Derks, Dolan, & Boomsma 2006). Further research is therefore needed to validate previous results.
Moreover, since violations of EEA can lead to an upward bias in heritability estimates, they have been regarded as a possible explanation for part of the "missing heritability" problem (Felson 2014). Missing heritability refers to the gap between heritability estimates derived from twin data and those derived from genotyped data (Young 2019). While the research has discussed various reasons for this gap, such as the presence of non-additive genetic effects (Zuk et al. 2012; Zhu et al. 2015) or the effects of rare variants (Zuk et al. 2014; Tropf et al. 2017), researchers have additionally argued that twin studies might simply overestimate heritability when the underlying assumptions are violated (Felson 2014; Young 2019). Therefore, testing the "trait-relevant" definition of EEA is important to underpin the validity of results obtained from twin data.
Given the need to pay greater attention to the problem of differential treatment in cases when EEA is likely to be violated (e.g. Richardson & Norgate 2005), this paper studies to what extent EEA holds for three educational outcomes: child's maths grade, German grade, and enrolment in upper secondary school (Gymnasium). Addressing the "trait-relevant" definition of EEA, I illustrate two different approaches to test EEA and compare the results. Both approaches link possible violations of EEA to differences in experiences of parental treatment among MZ and DZ twins. However, the first approach does so only indirectly, based on the physical similarity between twins. The second approach directly studies the parenting the twins receive by looking at the mother's reports on her parenting style. Mothers are normally the person most knowledgeable about the child (Jenkins et al. 2003: 102). Parenting styles are influenced by family structures (Chan & Koo 2010) and can be understood as the parent's capacity to socialize their children by changing the effectiveness of parenting practices expressed in parenting activities (Darling & Steinberg 1993: 493). Parenting styles have not only been shown to influence educational outcomes such as school grades (Conger et al. 1992: 532, 536-537), they have also been identified as important mediators of the effect of family background on school grades (Kaiser, Li, & Pollmann-Schult 2019).
I test the validity of EEA and evaluate the effect of violations of EEA on heritability estimates using data from the first wave of the German TwinLife panel study. TwinLife includes families across the full range of the social strata and is representative of the German population (Lang & Kottwitz 2017). To my knowledge, this is the first study of its kind to test the validity of EEA for educational outcomes in Germany, and it is one of the few that consider the validity of EEA for status-related outcomes overall. In addition, the study extends previous research by illustrating and comparing the results of two approaches to test EEA.

The classical twin design and its extensions
Independent of the underlying method, whether it is twin correlations, structural equation models, or advanced regression techniques, CTD compares the similarities in a trait between MZ and DZ twin pairs to calculate the narrow sense heritability of a trait (h²) (Keller et al. 2010: 377). Heritability estimates have been criticized for being misleading, because they "convey[s] a sense of direct genetic influence" on traits (Moore & Shenk 2017: 2). Nevertheless, heritability estimates provide a good indicator of possible genetic confounding (Freese, Li, & Wade 2003). In addition, heritability estimates can vary between sub-groups in a given population, and comparing estimates between groups provides information about the nature of between-group differences (Visscher, Hill, & Wray 2008: 257). For example, heritability estimates can be used as an indicator describing the degree to which sub-groups are differently able to fully develop their genetic potential (Scarr-Salapatek 1971). CTD calculates the heritability of a trait (h²) as twice the difference between the intraclass correlations of MZ ($r_{MZ}$) and DZ twins ($r_{DZ}$) (Conley et al. 2013: 416). Since MZ twins share 100% and DZ twins on average 50% of their genetic makeup, the intraclass correlations ($r_{MZ}$, $r_{DZ}$) can be decomposed into a heritability (h²) and a shared environment (C²) component (Felson 2014: 185-186):
$$r_{MZ} = h^2 + C^2 \quad (1)$$
$$r_{DZ} = 0.5\,h^2 + C^2 \quad (2)$$
In this context, the equal environment assumption (EEA) is crucial because it helps to solve the equations. 3 EEA assumes that "the covariance between environment and genetics is zero" (Conley et al. 2013: 416), i.e. that the environment has the same effect on MZ and DZ twins' behaviour. Only when EEA is fulfilled does subtracting equation (2) from equation (1) yield a difference in twin-pair correlations ($r_{MZ} - r_{DZ}$) equal to 0.5 h², so that twice this difference is the heritability of the trait, $h^2 = 2\,(r_{MZ} - r_{DZ})$ (Felson 2009: 4).
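The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the estimation procedure used in this paper: the function name and all numbers are hypothetical, and the non-shared environment term (one minus the MZ correlation) is a standard complement for standardized variances that is added here as an assumption. The second half shows, under the further assumption that MZ pairs share an extra environmental similarity that DZ pairs do not, how such a violation of EEA would inflate the heritability estimate by twice that extra similarity.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Method-of-moments decomposition of twin correlations under the EEA."""
    h2 = 2.0 * (r_mz - r_dz)  # twice the MZ-DZ difference in correlations
    c2 = r_mz - h2            # shared environment, from r_mz = h2 + c2
    e2 = 1.0 - r_mz           # assumption: remainder of standardized variance
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, for illustration only.
estimates = falconer(r_mz=0.8, r_dz=0.5)

# Hypothetical EEA violation: MZ pairs share an extra environmental
# similarity `extra` that DZ pairs do not share.
true_h2, true_c2, extra = 0.4, 0.3, 0.05
r_mz_violated = true_h2 + true_c2 + extra
r_dz = 0.5 * true_h2 + true_c2
inflated_h2 = falconer(r_mz_violated, r_dz)["h2"]  # equals true_h2 + 2 * extra
```

With these made-up inputs the first call gives roughly h² = 0.6, C² = 0.2, and a non-shared component of 0.2, while the violated scenario returns about 0.5 instead of the true 0.4.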
An extension of CTD often used by sociologists is genetically informed linear mixed models, such as ACDE models and their variants. ACDE models partition the total variance (var(y)) in a trait into four components (Rabe-Hesketh et al. 2008: 281): an additive genetic (A), a shared environment (C), a non-additive genetic (D), and a non-shared environment (E) component.
3 Another assumption is the absence of genetic assortative mating, or a random selection of mates in a population (Conley et al. 2013: 415).

Heritability
The degree to which an outcome variable (trait) varies by genetic variation in a given population (Freese & Shostak 2009).

Narrow sense heritability (h²)
The additive genetic effects which represent the averaged effects of single alleles on the phenotype (Neale & Cardon 2013: 12).

Broad sense heritability (H²)
The sum of the additive genetic and non-additive genetic effects (Visscher, Hill, & Wray 2008: 256). Non-additive genetic effects relate mainly to dominance and epistasis.
Dominance relates to interactions between alleles at single loci, whereas epistasis describes the interaction between alleles at different loci (Neale & Cardon 2013: 12).

Missing heritability
The gap in heritability estimates from twin data and genotyped data (Young 2019).
In this context, the additive genetic component represents the main or averaged effects of single alleles on the phenotype (h²). The non-additive genetic component refers to two main types of genetic non-additivity: dominance and epistasis (Neale & Cardon 2013: 12). Dominance relates to interactions between alleles at single loci, 4 whereas epistasis describes the interaction between alleles at different loci (Neale & Cardon 2013: 12). The C component reflects the extent of homogeneous effects of environments shared by the twins on a trait that work in the same direction and make twins more similar. However, even when the twins share an environment, its effect does not necessarily end up in the C component. In cases in which the twins experience the same environment differently, the variance will enter the E component. For example, the same degree of parental control of a child's behaviour can be experienced differently by the twins and finally lead to different outcomes. Therefore, the E component reflects two types of effect that make twins less alike: (1) unshared environments, e.g. different peer groups, and (2) distinct reactions by the twins to the same environment (Turkheimer & Waldron 2000;Freese & Jao 2017).
However, ACDE models and their variants cannot easily be estimated, because there are more unknown parameters than known parameters (Coventry & Keller 2005: 214-215). In this context, EEA again provides a solution, because it reduces the number of estimated parameters. Leaving three parameters and two covariance terms, one for MZ and one for DZ twins, to be estimated, the model is identified by additionally assuming either no non-additive genetic effects (ACE model) or no shared-environment effects (ADE model) (Keller et al. 2010; Zyphur et al. 2013: 575-576). In this context, the twin correlations (ICC) can be used as a first indicator for the presence of non-additive genetic effects. When the MZ correlations ($r_{MZ}$) are more than twice as large as the DZ correlations ($r_{DZ}$), the ADE model applies (Bleidorn et al. 2018). Otherwise, the model reduces to an ACE model.
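The rule of thumb for choosing between the ACE and ADE specification can be illustrated with a short Python sketch. It is not the linear mixed model estimated in this paper, but a simple method-of-moments substitute. It assumes the standard expectations that additive effects correlate 1 in MZ and 0.5 in DZ pairs and dominance effects correlate 1 in MZ and 0.25 in DZ pairs; these expectations, the function names, and the example correlations are assumptions added for illustration.

```python
def choose_model(r_mz: float, r_dz: float) -> str:
    """Rule of thumb: MZ correlation above twice the DZ correlation hints at D."""
    return "ADE" if r_mz > 2.0 * r_dz else "ACE"


def moment_estimates(r_mz: float, r_dz: float) -> dict:
    """Solve the two twin-correlation equations for the chosen model."""
    model = choose_model(r_mz, r_dz)
    e2 = 1.0 - r_mz  # non-shared environment (standardized variance)
    if model == "ACE":
        # r_mz = a2 + c2 and r_dz = 0.5 * a2 + c2
        a2 = 2.0 * (r_mz - r_dz)
        return {"model": model, "a2": a2, "c2": r_mz - a2, "e2": e2}
    # ADE: r_mz = a2 + d2 and r_dz = 0.5 * a2 + 0.25 * d2
    d2 = 2.0 * r_mz - 4.0 * r_dz
    return {"model": model, "a2": r_mz - d2, "d2": d2, "e2": e2}


# Hypothetical twin correlations, for illustration only.
ace_example = moment_estimates(r_mz=0.7, r_dz=0.45)
ade_example = moment_estimates(r_mz=0.6, r_dz=0.2)
```

In practice the choice would be checked against model fit rather than made from the correlations alone; the sketch only makes the identification logic explicit.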
There are different extensions of CTD that also relax EEA and control for gene-environment interactions, e.g. through the inclusion of environmental indices (Boomsma et al. 2002: 875; Conley et al. 2013: 416) or through the inclusion of additional informants, such as parents, siblings, and even other relatives (Keller et al. 2010).

Gene-environment interplay and EEA validity
Testing the "trait-relevant" definition of EEA requires researchers to understand the environmental variables that might causally be involved in trait variance (Richardson & Norgate 2005: 341). While this makes testing the validity of EEA much more complicated, some researchers argue that even if "trait-relevant" influences are found, leading to differential treatment of MZ and DZ twins, this would not necessarily lead to biased estimates (Joseph 2015: chapter 7; Verhulst & Hatemi 2013). These researchers argue that genes can confound environmental similarities (Derks et al. 2006: 403). For example, parents' treatment of their children seems to be influenced by their children's genetic makeup (evocative gene-environment correlation; Plomin, DeFries, & Loehlin 1977), and MZ twins appear to be treated more alike regarding their mother's expression of warmth than DZ twins (Kendler 1996: 15). These differences between MZ and DZ twins could be explained either by the greater genetic similarity in MZ twins leading to greater behavioural similarity in the twins themselves, which impacts the parenting they receive, or by parents of MZ twins being less able to differentiate the behaviour between them (Grätz & Torche 2016: 10). In both cases, the greater similarity in MZ twins' traits would then relate to gene-environment interplay.
Technically, EEA allows for the confounding of genotype and environment, called gene-environment correlation (rGE), as well as environments moderating the effects of genes, or genes affecting the sensitivity to environments, called gene-environment interaction (GxE) (Price & Jaffee 2008: 305-306). In the presence of rGE and GxE, heritability estimates based on CTD encompass not only direct genetic effects, but also indirect effects (Stenberg 2011). However, Joseph (2012) criticizes this argument for circular reasoning, because CTD's premise is the goal of separating variances into a genetic and an environmental component based on EEA, assuming neither rGE nor GxE. In this context, it is a conceptual issue of how the genetic component is understood and defined, and whether greater environmental similarities resulting from gene-environment interplay are understood as reflecting genetic effects. Strictly speaking, effects related to rGE or GxE cannot be clearly allocated to either the environmental or the genetic component.
In addition, as argued by Fosse et al. (2015), for evocative genetic effects to be a valid defence of the twin method, MZ twins themselves must be regarded as the primary causal agents of any increased correlation in a child's "trait-relevant" exposures. However, it often remains an empirical question whether twins' behaviour is more alike because they are treated more alike, because of their more similar appearance, or because of other underlying factors (Matheny et al. 1976). In addition, even if the presence of rGE or GxE is understood as violating EEA, it is debatable whether violations necessarily result in an overestimation of heritability (Walker et al. 2004; Richardson & Norgate 2005; Conley et al. 2013: 415; Joseph 2015). As demonstrated by Verhulst & Hatemi (2013), the presence of GxE and rGE does not in all cases have meaningful effects on the estimated variance components (compare Conley et al. 2013; Felson 2014). Again, this seems to be the case only when the specified environment is substantially correlated with the trait under study. In such cases, extensions of the CTD that deal with GxE and rGE are available and can be utilized (Purcell 2002; Verhulst & Hatemi 2013: 368-369, 371).

Testing EEA validity
Different ways to test EEA have been developed (for an overview see Derks et al. 2006: 403-404; LoParo & Waldman 2014: 606-607). A first method is based on a comparison of the impact of twins' actual and perceived zygosity on trait similarity (Kendler et al. 1993). The twins' zygosity is not automatically obvious to the parents (Bamforth & Machin 2004), and not always determined correctly by professionals during pregnancy or after birth, leading to a substantial proportion of twins being misclassified (Ooki, Yokoyama, & Asaka 2004; Cutler et al. 2015). In cases in which misperceived zygosity leads to differences in trait similarity, EEA is regarded as being violated, because these differences are assumed to relate to treatment effects. Trait similarity is said to be affected by the twins' environments treating them more alike, due to greater perceived similarity and not based on their actual genetic similarity (see Conley et al. 2013 for an example).
An important limitation of the first approach is that in most datasets the number of misperceived twins is too small to actually test EEA. For example, the analysis of Conley et al. (2013: 421) for high school GPA is based on twelve misperceived DZ twins and fifty-six misperceived MZ twins. For the current analysis, the number of misperceived twins in TwinLife can be considered too low to detect violations of EEA (in the cohort studied the sample includes just 222 misperceived twins). A second method is to investigate to what extent the physical resemblance between twins, which often leads to misperceptions of their zygosity, leads to differences in how MZ and DZ twins are treated (Hettema et al. 1995). Applying this method, researchers study the correlation between the physical similarity of twin pairs and trait similarity after controlling for zygosity (LoParo & Waldman 2014: 606). If greater physical resemblance leads to greater trait similarity, after controlling for zygosity, EEA is again assumed to be violated, because the remaining differences probably relate to treatment effects.
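The logic of the second method can be sketched as a partial correlation between physical resemblance and within-pair trait differences, controlling for zygosity. The Python example below is a minimal illustration under stated assumptions, not the analysis carried out in this paper: the data are invented, the helper names are hypothetical, and zygosity is controlled by centring both variables within zygosity groups, which for a single binary control is equivalent to residualizing on it. No significance test is included.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_by_group(values, groups):
    """Subtract the group mean, i.e. residualize on a categorical control."""
    means = {}
    for g in set(groups):
        members = [v for v, gg in zip(values, groups) if gg == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]


# Hypothetical twin pairs: (zygosity, physical resemblance on a 1-3 scale,
# absolute within-pair difference in a school grade).
pairs = [
    ("MZ", 3.0, 0), ("MZ", 2.6, 0), ("MZ", 2.8, 1), ("MZ", 2.4, 1),
    ("DZ", 2.0, 1), ("DZ", 1.4, 2), ("DZ", 1.8, 1), ("DZ", 1.2, 2),
]
zygosity = [p[0] for p in pairs]
resemblance = center_by_group([p[1] for p in pairs], zygosity)
grade_diff = center_by_group([p[2] for p in pairs], zygosity)

# A negative value means that pairs who look more alike also differ less in
# their grades, net of zygosity.
partial_r = pearson(resemblance, grade_diff)
```

Restricting `pairs` to MZ twins only would give the within-MZ variant of the test, in which greater trait similarity can no longer be attributed to greater genetic similarity.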
While there is often more data available to apply the second approach, in relation to both approaches it can be argued that the greater trait similarity in DZ twins can relate to greater genetic similarity (e.g. Plomin, Willerman & Loehlin 1976: 50), leading to more physical resemblance and increasing the likelihood that the twins' zygosity is misperceived. In this case, significant correlations would then point to gene-environment interplay (GxE or rGE), which does not violate EEA (see section 2.2). However, most researchers probably want to describe the extent of GxE or rGE separately from heritability estimates, utilizing the extensions of CTD. In addition, it is still possible to test EEA based on the first two approaches by looking at MZ twins only and regarding the extent to which physical similarity and misperceived zygosity affect trait similarity. In this case, greater trait similarity can no longer be related to greater genetic similarity.
A third method to detect violations of EEA is to determine the extent to which increased environmental similarity in MZ twins relates to the behaviour of the twins themselves or is initiated by others (LoParo & Waldman 2014: 607). While in the first case environmental similarity could again be attributed to gene-environment interplay (GxE or rGE), in the second case environmental similarities would again relate to treatment effects. One problem with this method is the additional information required to determine whether any observed behaviour was initiated by important others.
A fourth method, developed by Derks et al. (2006), suggests that EEA can be evaluated based on multivariate data and by using only DZ twins. Using more than one observed trait variable, one can calculate the shared environmental correlation in DZ twins for these phenotypic traits. As long as this correlation does not deviate significantly from 1 for same-sex DZ twins, indicating that shared environment affects the traits alike, EEA is supported. The advantage of this method is that it does not require information on environmental similarity between twins. However, as demonstrated by Derks et al. (2006: 409), this method requires that (1) "the shared environmental correlation in DZ twins is different from .5", (2) the included trait variables are not perfectly correlated, and (3) the factor loadings of the variance components are not collinear. In addition, regarding the constraints needed to reduce the number of unknown parameters so that the model is identified, (4) an identifying constraint is needed "that does not lead to a significant decrease in model fit" (Derks et al. 2006: 409).
A fifth method, following the approach by Loehlin & Nichols (1976), is to evaluate the associations between similarities in twins' environments and trait similarities within zygosity groups (LoParo & Waldman 2014: 607; Derks et al. 2006: 404). If this correlation is significantly greater than zero, EEA is violated. This is because the greater similarity in the traits studied for MZ twins is no longer linked to only greater similarities in genetic endowments. This frequently applied method has the advantage that it can be applied without extra information on either the physical resemblance of the twins, information about twins' perceived zygosity, or information from different informants, i.e. twins and their parents. In addition, it can be applied in contexts where the focus is on one particular trait and there are no additional traits to which the shared environmental correlation in DZ twins can be compared. Accordingly, the fifth method places fewer demands on the data than most of the other methods. However, precise information on the twins' environments is needed, and researchers need to be sure which facets of the twins' environment are relevant for the traits studied. A slightly improved version of this method is applied by Felson (2014), who estimated heritability for thirty-two different outcomes with and without controls for environmental similarity. Comparing the changes in heritability estimates, he was able to test whether environmental similarity significantly reduced heritability and thus whether EEA was violated or not.
A sixth method is to compare the similarity of how parents report the way they treat their twins with the similarities in the twins' traits for MZ and DZ twin pairs (Kendler & Gardner 1998). If greater similarities in the parental reports relate to greater similarity in the twins' traits, this might point to differential treatment effects. This method is useful when data on the physical similarity of twin pairs, or any related information such as the misperception of the twins' zygosity, is not available. Beyond that, research studying the causes of social inequalities frequently relies on mechanisms related to parenting (Kaiser, Li, & Pollmann-Schult 2019). In this context, many sociological studies have focused particularly on parenting practices in terms of investments in children, such as cultural activities that affect the formation of a child's cultural capital (e.g. Lareau & Weininger 2003; Roksa & Potter 2011). However, more recently, parenting styles, which psychologists have traditionally analysed (e.g. Fan & Chen 2001; Chao 1994, 2001; García & Gracia 2009), have been integrated into sociological research as an important concept to describe the mechanisms through which parents influence a child's skills development and, most importantly, educational outcomes (e.g. Pong, Hao, & Gardner 2005; Chan & Koo 2010; Kiernan & Mensah 2011; Kaiser, Li, & Pollmann-Schult 2019). As described before, parenting styles moderate the relationship between parental activities and child outcomes by transforming parent-child interactions and influencing a child's personality (Darling & Steinberg 1993: 493). For example, when parents support their children with their homework (parenting activity), they might either strictly control their children's behaviours ("authoritarian parents"), provide a high level of support ("indulgent parents"), or do both ("authoritative parents") (Huver et al. 2010). Parents can then actually influence child outcomes.
For example, more nurturant parenting, expressed in terms of greater parental warmth, has been observed to lead to better school performance (Conger et al. 1992: 532, 536-537), while insufficient parental control and over-controlling have been observed to impact negatively on child development through raising levels of child depression and lowering levels of child competence (Schiffrin et al. 2014: 548, 554). In addition, there is evidence that specific dimensions of parenting styles are differently affected by the genetic makeup of children. In their meta-analysis of parent-based designs Kendler & Baker (2007: 619-620) found that in particular parental expression of emotional warmth is more strongly affected by a child's genetic makeup than parental expression of behavioural control. Though the results vary according to whether parental or child reports are taken into account (Kendler & Baker 2007: 619), this makes parenting styles particularly interesting for studying possible violations of the "trait-relevant" definition of EEA related to differences in family processes between MZ and DZ twin families.
Therefore, alongside the second approach (investigating the effects of physical resemblance on trait similarity in MZ and DZ twins), which is used for comparison, this paper tests the "trait-relevant" definition of EEA based on the sixth method.

TwinLife
This paper is based on the first wave of TwinLife, a prospective longitudinal study of twins and their families in Germany (Diewald et al. 2017). The first wave includes four cohorts of about 500 pairs of MZ and about 500 pairs of same-sex DZ twins per cohort (in total N = 8,194 twins, nested in 4,097 families). Sampling was based on a stratified random sampling strategy using administrative data from communal registration offices. TwinLife thus includes families across the full range of the social strata (Lang & Kottwitz 2017). Of the four birth cohorts in the data (C1: born 2009-2010, C2: born 2003-2004, C3: born 1997-1998, C4: born 1991-1992), I focus on the second-youngest cohort, who were aged between ten and twelve years at the time of the first interview (N = 2,086 twins out of 1,041 families 5 ). I focus on mother's reports of parenting styles because the information on fathers is limited. Mothers more often took part in the survey, more often completed the required information on their parenting styles, and can normally be regarded as the person most knowledgeable about the child (Jenkins et al. 2003: 102). Table 1 provides some basic information on the sample demographics of the TwinLife dataset in cohort 2. Children enrolled in primary school (incl. schools with an orientation level for secondary education), schools for special needs, other unspecified school types, or in Waldorf schools were excluded from the analysis (N=302). In addition, in a few cases, the information on school grades or the school track was missing (N=137). Excluding these children and restricting the analysis to full twin pairs reduced the analytical sample to a maximum of 1,576 cases.

School grades and enrolment in upper secondary school
In TwinLife the information on school grades was taken from the most recent report card of the children (Mattheus et al. 2017: 6). For respondents for whom this information was missing, parents were asked to report on their children's academic performance. The performance of school children in the German school system is evaluated based on a six-point grading scale. Grades range from 1 (excellent) to 6 (insufficient). In this paper, I look only at the German and maths grades of children. As demonstrated by Table 2, on average children scored between "good" (2) and "satisfactory" (3) in both subjects. However, looking at differences in grades across school types, children from upper secondary school received better grades than children from lower or intermediate secondary school. Since grades can have different meanings across school types, the following analysis of grades is partly split between lower/intermediate secondary and upper secondary schools (Gymnasium), and restricted to twin pairs enrolled in the same school type.
Apart from looking at school grades, I am also interested in enrolment in upper secondary school (Gymnasium) compared with enrolment in other secondary school types. As demonstrated by Table 2, about 52% of the children (N=823) in cohort 2 attended upper secondary school, which is above the population mean (40% in the school year 2014/15; Malecki 2016: 26). However, taking into account all children, including those still enrolled in primary education or in any other excluded school type (N=302), 43% are enrolled in upper secondary school, which closely matches the general population.

The twins' physical resemblance
The twins' physical resemblance was derived based upon a set of questions included in the physical similarity questionnaire (Lenau et al. 2017). These questions referred to the twins' parents' perceptions of (1) "significant differences", (2) "slight differences", or (3) "no differences" in the twins' hair colour, hair texture, eye colour, and earlobes, and parents' assessments of the twins' similarity based on earlier photographs and resemblance in early childhood; these were recoded into (1) "had no resemblance at all", (2) "looked exactly the same", (3) "had a strong resemblance, like siblings". Taking the mean score of these, the resulting score ranges from one to three, with higher values indicating greater physical resemblance between the twins. As expected, the twins' physical resemblance is generally higher for MZ than for DZ twins (Table 1). Moreover, there is variation in physical resemblance in both MZ and DZ twins, which is crucial for one aspect of the method applied.
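As a minimal sketch of how such a mean score could be computed, the following Python function averages the resemblance items for one twin pair. The function name, the item order, and the handling of unanswered items (averaging over the answered ones) are assumptions made for illustration and are not taken from the TwinLife documentation.

```python
from typing import Optional, Sequence


def resemblance_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the resemblance items, each coded 1 to 3 (higher = more alike).

    Assumption: unanswered items (None) are skipped rather than imputed.
    """
    answered = [v for v in items if v is not None]
    if not answered:
        return None
    if any(v not in (1, 2, 3) for v in answered):
        raise ValueError("items must be coded 1, 2 or 3")
    return sum(answered) / len(answered)


# Hypothetical answers for one pair: hair colour, hair texture, eye colour,
# earlobes, earlier photographs, resemblance in early childhood.
score = resemblance_score([3, 3, 2, 3, 3, 2])
partial = resemblance_score([3, None, 2, 3, None, 2])
```

The resulting score stays within the range of one to three described above.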

Parental treatment (parenting styles)
Research testing EEA has frequently relied upon indicators describing differences in parental treatment, in particular the parenting twins receive (Felson 2014). In this paper, too, I measure similarities in family environment based on parental treatment, more precisely the mother's report on her parenting styles (e.g. how often she "praised" or "scolded" her children). Parenting styles were reported according to ten items from five different subscales (for an overview see Baum et al. 2020). These five subscales identify the mother's parenting styles according to her emotional warmth (Jaursch 2003) (three items), her negative communication (Schwarz et al. 1997) (two items), the degree of inconsistent parenting (Reichle & Franiek 2005) (two items), strict control (Schwarz et al. 1997) (two items), and psychological control (Reitzle et al. 2001) (one item). The ten items measure parenting styles on a scale of one ("never") to five ("very frequent"), according to how often specific parenting behaviours occurred. Aggregating these items by use of mean scores results in five variables that are reliable (Table 3). Previous research suggests that facets of parenting styles are affected differently by the genetic makeup of children (Kendler & Baker 2007: 619-620). Therefore, it seems necessary to analyse the different parenting dimensions separately. In a first step, I look at the five sub-dimensions separately. In a second step, the ten items are aggregated to reflect overall negative parenting styles expressed by mothers ("negative parenting"). For this purpose, the items measuring emotional warmth were recoded so that higher values reflect the absence of emotional warmth.
As demonstrated by Table 3, mothers in TwinLife score comparatively high on the items measuring the expression of emotional warmth, and tend to report negative communication and inconsistent parenting less often. Moreover, the resulting new variable "negative parenting" is normally distributed and reliable.
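The aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the item-to-column mapping, the function names, and the choice to build "negative parenting" as the mean of all ten (partly reverse-coded) items are hypothetical and not taken from the TwinLife documentation.

```python
import numpy as np

# Hypothetical mapping of the ten items to the five subscales
# (column positions are illustrative, not the TwinLife variable order).
SUBSCALES = {
    "emotional_warmth": [0, 1, 2],
    "negative_communication": [3, 4],
    "inconsistent_parenting": [5, 6],
    "strict_control": [7, 8],
    "psychological_control": [9],
}
SCALE_MIN, SCALE_MAX = 1, 5  # "never" to "very frequent"


def subscale_means(items):
    """Return one mean score per subscale for an (n_mothers, 10) item array."""
    items = np.asarray(items, dtype=float)
    return {name: items[:, cols].mean(axis=1) for name, cols in SUBSCALES.items()}


def negative_parenting(items):
    """Overall score: reverse-code warmth items, then average all ten items.

    Averaging all ten items is an assumption made for this sketch.
    """
    items = np.asarray(items, dtype=float).copy()
    warmth = SUBSCALES["emotional_warmth"]
    # After reversal, higher values indicate the absence of emotional warmth.
    items[:, warmth] = SCALE_MIN + SCALE_MAX - items[:, warmth]
    return items.mean(axis=1)


# Two made-up mothers: one very warm and never negative, one at the scale midpoint.
responses = np.array([
    [5, 5, 5, 1, 1, 1, 1, 1, 1, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
])
scores = subscale_means(responses)
overall = negative_parenting(responses)
```

In this toy input the first mother receives the lowest possible negative-parenting score, because her warmth items are reversed before averaging.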

Methods
This paper looks at three different educational traits to test EEA. The analysis is split into two parts. In the first, I derive the results of different multilevel mixed-effects ACE variance decomposition models (Guo & Wang 2002). The analysis is based on the acelong command developed by Lang (2018). Acelong is a wrapper for generalized structural equation models (GSEM) that estimates different types of multilevel mixed-effects ACE variance decomposition models, such as that proposed by Guo and Wang (2002). Based on the variance decomposition model, I calculate the sizes of the different variance components, the additive genetic (A), the shared environment (C), and the non-shared environment (E) component, for the traits of interest. I also report MZ and DZ correlations (intra-class correlation, ICC). While school grades are assumed to be scaled metrically, a child's enrolment in upper secondary school is binary. Therefore, the analysis of a child's track attendance is based on a linear probability model, where an underlying latent variable describing a child's probability of attending the upper secondary school is assumed. In extreme cases of combinations of independent variables, linear probability models have been observed to estimate coefficients that imply probabilities below 0 or above 1. However, such cases are very unlikely to occur (Hellevik 2009). In addition, linear probability models and logistic regression models produce similar results when the proportion of cases with high values on the dependent variable lies between 0.2 and 0.8 (Hellevik 2009: 62-64, 68). In the current case of the binary dependent variable, the share of cases with high values, i.e. those enrolled in upper secondary school, is about 52% (see Table 2). Comparing the results for the linear probability model with those for a corresponding binary logistic regression additionally shows that the results do not differ (see Table 5).
Therefore, for ease of interpretation, I discuss only the results of the linear probability model.
Moreover, the models are based on maximum likelihood estimation, to improve the model fit, and use clustered robust error terms to resolve problems related to heteroscedasticity (Hellevik 2009). Finally, for the analysis of school grades, results of school-type specific models are reported to account for possible differences in the grading between school types.
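The logic of the ACE decomposition can be illustrated with Falconer's formulas, which recover the three components from the MZ and DZ twin correlations alone. This is a swapped-in, back-of-the-envelope technique and not the multilevel mixed-effects estimator used in this paper; the function name and the correlations in the example are made up for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE shares from twin correlations (Falconer's formulas).

    Assumes additive genetic effects only, i.e. r_MZ = A + C and
    r_DZ = 0.5 * A + C, with A + C + E = 1.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (A)
    c2 = 2.0 * r_dz - r_mz     # shared-environment share (C)
    e2 = 1.0 - r_mz            # non-shared environment share (E)
    return {
        "A": a2,
        "C": c2,
        "E": e2,
        # r_MZ larger than twice r_DZ hints at non-additive genetic effects.
        "nonadditive_hint": r_mz > 2.0 * r_dz,
    }


# Illustrative correlations, not estimates from TwinLife.
est = falconer_ace(r_mz=0.8, r_dz=0.5)
```

With these illustrative inputs the shares are roughly A = 0.6, C = 0.2, and E = 0.2. A negative value for C would correspond to the situation in which an ACE model collapses to an AE model.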
In the second part of the analysis, I test the validity of EEA for the outcomes studied based on two different approaches. First, I investigate the effect of twins' physical resemblance on trait similarity at the twin-pair level based on OLS regression models. In this context, I include an interaction term between physical resemblance and the twins' zygosity to study the effects within zygosity groups. Second, I evaluate the associations between similarities in the parenting the twins experience and similarities in school grades as well as the twins' chances of being enrolled in an upper secondary school within zygosity groups. If the calculated correlations are significantly greater than zero, EEA is violated (Derks et al. 2006: 403-404). Similarities in parenting experiences were derived by calculating the absolute difference in the parenting a pair of twins received ($\Delta P = |P_1 - P_2|$, where $P_1$ and $P_2$ denote the parenting scores of the first and the second twin). The higher the values in the resulting variables, the greater the differences in parenting the twins experience, and the smaller the values, the greater the similarities. Following this approach, the differences in school grades were also calculated for each twin pair. For track attendance a binary variable was derived, describing whether the twins were on the same or different educational tracks (upper secondary school or any other track).
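The pair-level variables and the type of regression described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming plain OLS via least squares; all function names, variable names, and numbers are hypothetical, and standard errors and significance tests are omitted.

```python
import numpy as np


def pair_differences(twin1, twin2):
    """Absolute within-pair difference; larger values mean less similarity."""
    return np.abs(np.asarray(twin1, dtype=float) - np.asarray(twin2, dtype=float))


def same_track(track1, track2):
    """1 if both twins attend the same track, 0 otherwise."""
    return (np.asarray(track1) == np.asarray(track2)).astype(int)


def ols(y, X):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Synthetic twin pairs: first four are MZ (dz = 0), last four are DZ (dz = 1).
dz = np.array([0, 0, 0, 0, 1, 1, 1, 1])
parenting_t1 = np.array([2.0, 3.0, 1.0, 4.0, 2.0, 1.0, 3.0, 5.0])
parenting_t2 = np.array([2.0, 2.0, 3.0, 1.0, 2.0, 2.0, 1.0, 1.0])
d_parenting = pair_differences(parenting_t1, parenting_t2)

# Synthetic outcome built from a known rule so the fit can be checked:
# parenting differences matter only among DZ pairs here.
d_grade = 0.5 + 0.2 * dz + 0.3 * dz * d_parenting

# Columns: zygosity, parenting difference, and their interaction.
design = np.column_stack([dz, d_parenting, dz * d_parenting])
beta = ols(d_grade, design)  # [intercept, dz, d_parenting, dz x d_parenting]

tracks_equal = same_track([1, 0, 1], [1, 1, 0])
```

In this parameterisation the coefficient on the parenting difference is the slope for the reference group (MZ pairs), and the interaction term is the additional slope for DZ pairs.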
As demonstrated by Table 4, the majority of twin pairs were enrolled in the same track (85%). In about 47% of cases both twins attended an upper secondary school. Table 4 also shows that there is sufficient variance in school grades and parenting styles within twin pairs as indicated by the mean differences. Interestingly, there are greater differences within twin pairs regarding experiences of maternal control than experiences of emotional warmth. Moreover, differences tended to be smaller when regarding overall parenting style ("negative parenting") instead of looking at the different parenting sub-dimensions. Based on the observed differences, in a final step, EEA is investigated using OLS regression models that examine whether differences in parenting experiences explain differences in the twins' grades or their likelihood of enrolling in higher secondary education, controlling for the child's zygosity. These models are again derived at the twin-pair level and control for the other parenting styles.
Table 5 provides an overview of the results of the multilevel mixed-effects ACE variance decomposition models for all twins for whom there is full information at the twin-pair level. Looking at the reported twin correlations, I find no indication of non-additive genetic effects, suggesting that the ACE model is valid. As demonstrated by Table 5 for all three traits, and independent of whether one looks at the school types separately or combined, and of whether the twins attend the same school type or not, we find substantial heritability estimates. These range from around 37% of the variance for maths grades to about 56% of the variance for German grades being explained by genetic variation. The results additionally suggest that shared and non-shared family environments, too, explain a large proportion of the variance in school grades. For enrolment in upper secondary school, the shared-environment component turns out to be even more important.
The results are consistent with previous research, for example on the heritability of school grades (Eifler, Star, & Riemann 2019). However, the derived confidence intervals turn out to be relatively large, reflecting the relatively low power of the track-specific models (this probably relates to the small sample size). For some of the track-specific models, particularly for German grades, the lower bound of the confidence interval for the shared-environment component (C) is actually negative, suggesting that under specific circumstances the model even reduces to an AE model.
Notes: 1 only full twin pairs, 2 including comprehensive schools, 3 twins enrolled in the same school type.

Results for the Classical Twin Design (CTD)
Reading note: Where the confidence intervals for C cover negative values, the models may reduce to an AE model under specific circumstances.
Table 6 investigates the effect of twins' physical resemblance on trait similarity controlling for the twins' zygosity. The results suggest there is no violation of EEA for school grades. The result is robust for all twins, as well as for twins enrolled in the same school type (not shown), or for both twins enrolled in specific school tracks (Table 6). Similarly, the results suggest no violation of EEA for a child's chance of being enrolled in an upper secondary school.
In a second step, the association between similarities in the mothers' reports on how they treat their twins and similarities in the twins' traits is studied for each zygosity group using different regression models. Table 7 shows that nearly all parenting dimensions correlate with a child's school grades and track attendance. The results in Table 8 show a significant effect for zygosity, suggesting greater differences in the outcomes studied for DZ twins compared to MZ twins. This effect relates to the role of genetic endowments in explaining greater similarities in the traits studied. Only in the case of track attendance is there a significant direct and interaction effect for differences in maternal psychological control, suggesting a violation of EEA. The results are the same regardless of whether the models control for other parenting styles or take into account only twins enrolled in the same school type (not shown). Correcting for multiple comparisons based on the Benjamini-Hochberg method (Benjamini & Hochberg 1995), assuming that in one case the H0 is erroneously rejected, the result for psychological control remains significant (p < 0.03).
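The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is a generic implementation of the step-up procedure, not the code used for this paper, and the p-values in the example are made up for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Illustrative p-values, not results from Table 8.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = adjusted < 0.05
```

In this toy example only the smallest p-value stays below 0.05 after adjustment, which shows how a single robust finding can survive the correction while weaker ones do not.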

Physical resemblance
Following the approach of Felson (2014), I test the extent to which the observed violation of EEA for psychological control leads to an overestimation of heritability. Comparing the results for nested models with and without controls for the extent of psychological control (not presented here), the derived heritability estimates are virtually identical (without controls: A=40.3%, with controls: A=40.6%). Thus, the results suggest that the observed violation of EEA did not result in a meaningful overestimation of heritability.

Discussion
Twin-based research designs are relatively new to sociology, and a growing number of sociologists use these to differentiate between genetic and social causes of social inequalities. This paper tested one of the key and most debated assumptions underlying the most frequently applied genetically informative design, CTD: the equal environment assumption (EEA). The paper extends previous research, which has mainly tested the validity of EEA for psychological and health outcomes (Felson 2014), and tested the "trait-relevant" definition of EEA for school grades and enrolment in higher secondary education based on two different approaches. Both approaches link possible violations of EEA either indirectly or directly to the experiences of differential parental treatment by MZ and DZ twins. In sociology, differences in family environment and the parenting children receive have been integrated as key concepts to describe the mechanisms through which parents influence a child's skills development and educational outcomes (e.g. Lareau 2002; Pong, Hao, & Gardner 2005; Kaiser, Li, & Pollmann-Schult 2019). Systematic differences between MZ and DZ twins could in the long run relate to strong differences in child development (Plomin & Daniels 2011: 576). Therefore, testing for possible violations of EEA based on differences in the parental treatment of twins is particularly relevant to sociological studies based on CTD. In this paper I focused on parenting styles, which have been shown to influence educational outcomes such as school grades (Conger et al. 1992: 532, 536-537), and to mediate the effect of family background on school grades (Kaiser, Li, & Pollmann-Schult 2019).
Given that different facets of parenting styles appear to be affected differently by the genetic makeup of children (Kendler & Baker 2007: 619-620), parenting styles are particularly interesting for studying possible violations of the "trait-relevant" definition of EEA in the context of status-related outcomes. The results demonstrated that, independent of the approach applied, EEA was not violated in the case of school grades. However, there was an indication of EEA being violated in the case of track attendance when taking into account the extent of maternal psychological control children receive. Interestingly, no violation of EEA was detected when testing the effect of physical similarity on track attendance, and violations did not show up for the aggregated measure of negative parenting styles. Comparing heritability estimates for models with and without extra controls to capture the source of violation, following the approach of Felson (2014), the results turned out to be almost identical. Greater similarities in psychological control for MZ twins did not lead to an overestimation of heritability. This probably relates to the weak correlation between maternal psychological control and child's track attendance. Violations of EEA probably need to be much stronger to meaningfully affect the size of the variance components. Though this result is reassuring, because in many applications of CTD associations between the twins' family environments and the traits studied can be expected to be weak or moderate, it remains good practice to test the validity of the "trait-relevant" definition of EEA when using CTD based on one of the available approaches.
One limitation of this study is that the maternal reports on their parenting styles could have been affected by social desirability. Future research should combine reports from fathers and mothers to address this limitation. In the current study the information from fathers is limited and could not be added. Another limitation is that even though the analytical sample was much larger than in previous studies (e.g. Conley et al. 2013; Felson 2014), confidence intervals were still relatively large. Regarding the results presented in Tables 6 and 8, one might detect more violations of EEA with an even larger sample. Thus, future research should corroborate the findings presented using an even larger sample.
Nevertheless, the approaches applied were able to detect a violation of EEA for track attendance and, even more importantly, showed that such violations did not automatically invalidate CTD. The results highlight again the importance of the trait-relevant definition of EEA, which can be tested only by taking into account the environmental variables that might causally be involved and contribute to greater similarity in MZ compared to DZ twins (Richardson & Norgate 2005: 341). In many cases these environmental variables relate to the family environments of the twins. For example, the tracking decision for children is particularly influenced by the family resources and the decisions taken by the parents (Ditton & Krüsken 2006; Jähnen & Helbig 2015). Therefore, the source of a possible bias violating EEA can be located primarily inside the family. Taking this into account, an indicator that is only indirectly linked with the family environment or parental treatment, such as physical similarity, can make it more difficult to detect violations of EEA. Using measures directly related to trait variance, i.e. specific parenting styles, seemingly provides better chances.
To identify these influencing factors, a detailed investigation of the family environments and the underlying processes explaining possible trait variances is necessary. In many cases, however, it is not only the family environment that is decisive; so, too, are environments outside the family, which could contribute to greater similarity in MZ compared to DZ twins. For instance, school grades are influenced by parental resources and parental treatment, but they are also subject to school environments and teacher-student relationships. Thus, if school environments were the main source of bias, one would probably be unable to detect violations of EEA by looking at family environments only.
Taken together, there seems to be a trade-off between the approaches to detect violations of EEA based on how narrowly the environments included in the analysis are defined. Therefore, before testing, researchers need to be clear about where violations of EEA are to be expected. Otherwise, violations of EEA might be overlooked. This insight is relevant not only to research into the genetic and social causes of social inequalities; it extends to all sociological research that relies on CTD.