The bilingual gap in children’s language, emotional, and pro-social development

In this paper we examine whether – conditional on other child endowments and family inputs – bilingual children achieve different language, emotional, and pro-social developmental outcomes. Our data, which allow us to analyze children’s development in a dynamic framework, are extracted from the UK Millennium Cohort Study (MCS). We model the development production functions for bilingual children using cumulative value-added specifications, which account for parental investments and children’s own ability. Analysis based on child age confirms that bilingual children initially have worse language skills than their monolingual peers. The commencement of schooling appears to attenuate these differences, and by age seven, bilingual children have a developmental advantage. We find evidence of a positive relationship between bilingualism and some aspects of emotional development, and it is mainly boys who appear to benefit from their bilingual background.

Current version: November 23, 2020


Introduction
Schools, especially in major cities, serve a large number of children for whom English is a second language. These children speak numerous home languages and often constitute the majority of children in a single classroom (Bialystok, 2006). In the United States, for example, more than 4.9 million English learners were enrolled in public elementary and secondary schools during the 2013-14 school year, representing just over 10% of the total student population (National Center for Education Statistics, 2016), while in the Canadian province of Ontario over 25% of students are identified as English language learners (Ontario Education Department, 2007). Similarly, the UK Department for Education estimates that more than a million children in the 5-18 age group enrolled in UK schools collectively speak more than 360 different languages (Department for Education, 2013).
The rise in global migration has left these countries grappling with the challenges of educating and integrating bilingual children. For instance, Edinburgh City Council has estimated the cost of providing English language support to 50 pupils at approximately £33,000 per year, a sum driven largely by the cost of employing an English language support teacher (Rolfe and Metcalf, 2009). At the same time, there is an ongoing debate about the economic merits of speaking a second language in the context of globalization, which directly addresses the danger facing a country that relies on the primacy of, say, English (see Nuffield Foundation (2002) for one example of the debate in relation to the UK). The wider question is to what extent bilingualism matters in the early phase of a child's development, and whether bilingual children perform better than English-only students (Bialystok et al., 2012; Hoff, 2013).
Studies can be cited to support either side of the debate. On the one hand, dual language learning in early childhood places additional burdens on language development in comparison to the acquisition of a single language (Hoff, 2013). On the other hand, there is some evidence in the neuroscience and developmental psychology literature that bilingual children develop enhanced executive control (i.e., an ability to effectively manage cognitive processes such as problem solving, memory, and thought), resulting in a "bilingual advantage" (Bialystok et al., 2012). Similarly, parents' language choices may influence the way children express their emotions, which in turn affects their emotional development. Shifting from one language to another, for example, helps children to regulate their emotional response by using a less emotional non-native language (Chen et al., 2012). Bilingual children tend to exhibit fewer behavioral problems than monolingual English and Spanish speakers (see Han, 2010 and Huang, 2010) and have stronger family cohesion and higher self-esteem (Portes and Hao, 2002).
Motivated by the importance of early life experience in driving children's life chances, this paper attempts to link these diverse strands of the literature by integrating the role of language in children's skill formation. Building on the theoretical analyses of Todd and Wolpin (2003), Cunha et al. (2006), and Cunha et al. (2010), we examine whether bilingual children have different development outcomes conditional on child endowments and other family inputs. More specifically, we estimate a skills production function for children born in the UK to at least one migrant parent. This focus on the early development of the native-born children of migrant parents allows us to avoid potential confounding factors associated with the time elapsed since migration. It also provides the opportunity to test whether, given the same educational environment, bilingual children have different developmental outcomes from their monolingual peers. Finally, the specific challenges we identify, and the policy solutions they call for, should be relevant to this particular group of children, namely UK-born children with at least one migrant parent. 1
We advance the literature in several important ways. First, we exploit the data available from the UK Millennium Cohort Study (MCS), which allow us to analyze children's language, emotional, and pro-social development in depth and within a dynamic framework. The strength of these data relative to other available data is that they provide unusually detailed longitudinal information on children's home environment.
We estimate the bilingual gap in a child's language, emotional, and pro-social behavior as a cumulative process that depends on current and past endowments of cognitive and noncognitive capacity, as well as on the history of family investments. Previous studies that have relied on cross-sectional data are only able to examine the contemporaneous relationship between bilingualism and child development. However, it is important to study this effect over time. Researchers have argued that although it takes several years of exposure to English for immigrant children to approach native-like fluency in conversational skills, it takes five to nine years to become proficient in academic English (see Collier, 1989 and August and Hakuta, 1997). Our examination of the bilingual gap in the acquisition of language, emotional, and pro-social skills between the ages of three and fourteen complements existing studies that focus solely on the link between bilingualism and developmental skills measured at specific ages. To our knowledge, this study is the first to identify the evolution of bilingual effects across language, emotional, and pro-social behavior assessments from an early age to adolescence.
Second, it is important to study the dynamic consequences of the language spoken at home for socio-emotional development. Behavioral and emotional difficulties have been shown not only to reflect impaired mental health and a high probability of school dropout (Raver, 2003), but also to be linked to adult outcomes, including educational attainment, earnings, and employment (see Najman et al., 2004 and Jones et al., 2015). Pro-sociality is another important aspect of human personality that affects a wide range of economic decisions, with both low and high pro-social behavior conferring a risk of adverse behavior and mental health difficulties in children (see Nantel-Vivier et al., 2014). 2 Engagement in pro-social activities requires a certain level of language competency since it is largely linguistically rooted.
Third, we explore a range of parental behaviors and home environment activities that can potentially drive the estimated language effects. We hypothesize that potential differences in children's language and pro-social behavior could arise from differences in maternal behavior and mother-child interactions. Finally, a potential problem when estimating the skill production function lies in accounting for unobserved heterogeneity that could confound the relationship. We improve upon the existing evidence by utilizing value-added and dynamic panel data models. While we cannot interpret our results as causal, the inclusion of past endowments and the estimation of the models with a comprehensive set of controls should substantially reduce the bias arising from unobserved heterogeneity. 3
The findings in this article have important implications for identifying children (at different ages) who are likely to lag behind and thus need additional support, and for understanding the potential for policies, families, and schools to improve the learning outcomes of these children.
Using an age-specific analysis, we find that children identified as bilingual have lower levels of language development in early childhood. The start of schooling appears to attenuate these differences and at age seven, once we account for past endowments, bilingual children have a language advantage. At ages 11 and 14, bilingual children are not significantly different from their monolingual peers; however, we find a language deficit for boys. Boys' underachievement in verbal skills at age 14 is consistent with some official figures. According to the 2011 National Literacy Trust data, 76% of boys achieved the expected level in English at Key Stage 3 (age 14) compared to 88% of girls (All-Party Parliamentary Literacy Group Commission et al., 2012).
We show that maternal behavior, especially mothers' provision of cognitive stimulation and interactions at home, can explain in part the narrowing gap in early language development between bilingual and monolingual children. Mothers identified as bilingual are consistently more likely to spend their time with the child and to invest in home activities. In the case of emotional development, it is mainly boys who are found to benefit from their bilingual background, though the estimated bilingual effect varies with age. Finally, we document a fair amount of heterogeneity in the estimated effects by mother's education. Highly educated mothers are more likely to speak English at home. Consequently, the language development of their children is found to be statistically equivalent to that of monolingual peers from the age of five onward. Thus, our findings may be particularly useful in developing intervention programs for immigrant families, in helping intervention staff to promote early language, emotional, and pro-social behavior, and in helping immigrant families to remain aware of how the use of different languages can have an impact on their child's development. 4
The rest of the paper is organized as follows. Section 2 discusses the previous literature and Section 3 presents the data, sample selection, and variable construction. Section 4 outlines our conceptual framework and empirical specification. Section 5 discusses the results and Section 6 concludes the paper.

The Literature
While we know little about the relationship between bilingualism and emotional and pro-social development, there is an extensive literature in linguistics and developmental psychology on the impact of bilingualism on children's cognitive development. These studies are usually motivated by the concern that bilingualism confuses children (Genesee, 1989), or that bilingual children are slower to develop language skills because their learning capacity is divided across acquiring multiple languages (Macnamara, 1967; Hoff, 2013). Studies of bilingual children which focus on verbal tests of intelligence appear to lend support to these concerns by concluding that, on average, bilingual children exhibit slower cognitive development (see Darcy, 1953; Hakuta, 1986; and Bhatia and Ritchie, 2008) and initially possess a smaller vocabulary in each of their languages (Oller and Eilers, 2002; Clifton-Sprigg, 2016). A lack of English proficiency is cited as a primary reason for first- and second-generation children's poor academic performance in elementary school (Rosenthal et al., 1983).
3 Estimating the causal effect of bilingualism on children's outcomes is a challenging task. Our data do not provide us with any quasi-experimental variation which is suitable for causal identification. Indeed, it is hard to imagine an experiment that would lead to a random assignment of the language that children speak. Therefore, we regard our results as providing descriptive, non-causal evidence.
4 For example, there is evidence that school-based interventions promote pro-social behavior and therefore help in reducing aggressive behavior (Caprara et al., 2014).
There is emerging evidence, however, of the importance of bilingualism for some forms of cognitive functioning. Bilingualism is associated with higher executive functioning and attention in children (Bialystok, 2001; Yang et al., 2011) and young adults (Costa et al., 2008), and seems to protect against cognitive decline in old age (Bialystok et al., 2012). Further, learning a foreign language may improve the development of analytical and communicative skills (Saiz and Zoido, 2005), while exposure to two languages at an early age is positively correlated with reading, phonological awareness, and language competence in both languages (Kovelman et al., 2008).
Moreover, dual language competency is also positively related to the emotional and behavioral well-being and school-functioning of children born to immigrant parents (Collins et al., 2011). Han (2010) and Leyendecker et al. (2014) find that bilingualism encourages and facilitates children's communication and that this has an impact on their socio-emotional and behavioral skills. At school, young children from immigrant families have been found to have particular strengths in the areas of social skills and behavior (Crosnoe, 2007;De Feyter and Winsler, 2009). Some other studies, however, contradict these findings. For example, Spomer and Cowen (2001) compare mental health referrals for children who were English language learners and native learners, and find that the English language learners were rated higher in shyness/anxiety, and lower in peer social skills and assertiveness, by their teachers.
Most of these studies do not provide causal evidence, since exogenous variation in bilingualism is rarely available. Exceptions include a few cohort and randomized control studies (see Slavin et al. (2011) and Chin et al. (2013) (Plewis et al., 2007). Earlier UK birth cohort studies sampled babies born within a given week; the MCS, by contrast, has the advantage of capturing a birth cohort of infants born across a whole year. To date, six surveys have been conducted, when the children were aged nine months and three, five, seven, eleven, and fourteen years, resulting in a uniquely detailed portrait of children's development.
The MCS collects information about many diverse aspects of children's lives including: children's behavior, cognitive development, health, schooling, and living arrangements; parental employment and education; family income and poverty status; housing, neighborhood, and residential mobility; and social capital and ethnicity. The main unit of observation is the cohort member (the child), and the required information is collected from cohort members and the main respondent (typically the child's biological mother). 5 For example, children's verbal reasoning test scores, which function as a measure of language development, are collected directly from children by trained interviewers, while questions about the children's socio-emotional and pro-social behavior are asked of parents.
5 The "main respondent" is defined as the person who answered the main interview questions regarding the cohort child. In the majority of cases, this was the biological mother.

Bilingualism in our estimation sample
Bilingualism is a multifaceted phenomenon with no commonly accepted definition. We adopt the perspective proposed by Kohnert (2010), who views bilinguals as "individuals who receive regular input in two or more languages during the most dynamic period of communication development - somewhere between birth and adolescence" (p. 457). We use information pertaining to the language spoken at home to construct a measure of bilingual status. 6 The main respondent, usually the mother, is asked: "Is English the language spoken at home?" Children of respondents reporting "Yes - English only." or "Yes - mostly English and sometimes other languages." are classified as monolingual, whereas the children of respondents reporting "Yes - about half English and half other language.", "No - mostly other, sometime English.", or "No - other languages only." are classified as being bilingual. 7 The self-reported "language spoken at home" is not a perfect measure of bilingualism, since bilingualism is more than simply a language experience. Our data, however, do not include measures of English fluency of the parents, nor do we have precise measures of the amount of time for which each language is spoken. 8 Consequently, we follow Locay et al. (2013) and capture "bilingualism" through the language spoken at home.
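The classification rule described above can be sketched in a few lines of Python. This is an illustration only: the response labels are taken from the survey wording quoted in the text (with spacing normalised and trailing full stops dropped), while the function name and the way the MCS data files actually code these answers are assumptions.

```python
# Hypothetical helper illustrating the bilingual/monolingual coding rule.
# The labels mirror the quoted survey answers; the real MCS variable names
# and value codes are not given in the text and are NOT reproduced here.
MONOLINGUAL_RESPONSES = {
    "Yes - English only",
    "Yes - mostly English and sometimes other languages",
}
BILINGUAL_RESPONSES = {
    "Yes - about half English and half other language",
    "No - mostly other, sometime English",
    "No - other languages only",
}


def classify_home_language(response):
    """Return 'monolingual', 'bilingual', or None for an unrecognised answer."""
    label = response.strip().rstrip(".")  # tolerate a trailing full stop
    if label in MONOLINGUAL_RESPONSES:
        return "monolingual"
    if label in BILINGUAL_RESPONSES:
        return "bilingual"
    return None  # e.g. missing or refused; handled outside this sketch


if __name__ == "__main__":
    print(classify_home_language("Yes - English only."))
    print(classify_home_language("No - other languages only."))
```

Returning None for unrecognised answers, rather than guessing a category, keeps missing responses visible so that they can be dropped or handled explicitly.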
Information about the language status of the children constituting our sample is depicted in Figure 1. At age three, 56% of the children are exposed to a foreign language at home; however, by age 14, less than 30% of children continue to be bilingual.
Given our research interests, we focus on a sub-sample of UK born children, for each of whom at least one parent is foreign-born; these are the children most likely to be exposed to multiple languages at home. 9 A total of 3,245 children (at age 9 months) meet these selection requirements. We further restrict the sample to cases with complete information on our language, emotional, and pro-social development measures and other control variables.
These restrictions yield a sample of 1,993 children at age three; with subsequent attrition the sample was reduced to 1,682 at age five; 1,693 at age seven; 1,504 at age eleven; and 1,361 at age fourteen. 10 In our robustness sub-section, we also report some additional checks performed across different sub-samples. In particular, we expect that having both parents come from the same country of birth exposes children to the parental language in a more consistent way than does having parents who both come from an English or non-English background.
6 We assume that a bilingual child who attends full-time school is given adequate exposure to both English and their other language. Research indicates that when children divide their learning time equally between the two languages, this results in 50% exposure to one language and 50% to the other; thus, the two languages develop like those of monolingual children (Genesee et al., 2004).
7 The survey questions on the language spoken at home in the first wave (child at nine months) are slightly different. In particular, the first wave provides only three options to the question: "Yes - English only", "Yes - English and another language", and "No - other languages only". The first option is classified as monolingual and the second and third options are considered as bilingual. Our results are robust to the alternative construction in which the first two options are classified as monolingual and the third as bilingual. We also based our definition on the findings from Ramírez (2016) that at 11 months of age, just before most babies begin to say their first words, babies from monolingual English households are specialized to process the sounds of English, and not the sounds of Spanish, an unfamiliar language. However, babies from bilingual Spanish-English households are specialized to process the sounds of both languages, Spanish and English.
8 Many different definitions of bilingualism have been used in the literature. See Lambert and Macnamara, 1969; Skutnabb-Kangas, 1981; Byram, 2004; and Fromkin et al., 2018. In particular, Skutnabb-Kangas (1981) also focuses on immigrant children, arguing that bilingualism has to do with a person's origin.
9 Since the entirety of the MCS sample children are born in the UK, all children in our sample are exposed to broadly the same institutional and cultural environment outside of the home. Moreover, restricting the sample to children with at least one foreign-born parent is helpful in reducing any heterogeneity associated with the fact that foreign-born parents may differ from native-born parents in ways that are non-random.
10 The longitudinal pattern of response in the MCS is complex, with attrition, re-entry, and a small number of late entrants who were eligible at wave 1 but were not included since they were not recorded on the register for Child Benefit. Child Benefit, a universal provision payable from the child's date of birth, was used as the sampling frame for the MCS. Our regression estimates are weighted to account for non-response rates.
We begin with an analysis by age group that exploits information for these children at every age at which they are observed. We subsequently estimate dynamic models that rely on an unbalanced panel of 9,532 child-wave observations for those children aged three to fourteen for whom we have complete information on the key development measures and additional controls of interest. 11

Language and emotional development
The MCS provides a range of cognitive ability measures that are administered directly to children. We use age-appropriate tests from the British Ability Scales (BAS). Our measure of language development comes from a series of tests of verbal reasoning and knowledge, and is constructed using the following assessments: the BAS naming vocabulary test taken at ages three and five; the BAS word reading test taken at age seven; the BAS verbal similarity test taken at age 11; and the BAS word activities test taken at age 14. The naming vocabulary score reflects expressive language skills, vocabulary (knowledge of nouns), ability to attach verbal labels to pictures, retrieval of names from long-term memory, and general level of language development (Hansen et al., 2012). The word reading score assesses English reading ability, while the verbal similarity score assesses verbal reasoning and verbal knowledge. Finally, the word activities assessment score measures the ability of children to understand the meaning of words. For each of these verbal reasoning tests, we use the age-standardized scores. To facilitate interpretation of the results, we transform the outcomes into z-scores. We are unable to consider outcomes on other cognitive tests (i.e., the school readiness and mathematical reasoning tests) since there are no consistent measures available over time.
11 To maintain statistical power, in the analysis performed by child's age, we do not restrict our sample to children with non-missing information on all co-variates across all waves. Our unbalanced panel of 9,532 child-wave observations is further reduced to 6,310 observations due to the lag instrumental structure utilized in the GMM approach.
i) emotional issues: child complains of headaches, stomach aches/sickness, often seems worried, unhappy, nervous, or clingy in new situations; ii) conduct problems: child often has temper tantrums, fights with or bullies other children, and is often argumentative with adults; iii) hyperactivity: child is restless, over-active, cannot stay still for long, is constantly fidgeting, and is easily distracted; iv) peer problems: child tends to play alone, does not have at least one good friend, is not generally liked by other children, is picked on or bullied by other children, and gets on better with adults; and v) pro-social behavior: child is considerate of others' feelings, shares readily with others, is helpful if someone is hurt, upset, or ill, and is kind to younger children. Respondents indicate whether each item is "not true", "somewhat true", or "certainly true". Scores from the conduct problems, hyperactivity, emotional symptoms, and peer problems sub-scales are summed to construct a total SDQ score which varies between 0 and 40, with higher scores being indicative of more behavioral and emotional difficulties. Note that we exclude the pro-social sub-scale from the SDQ index since it is a positive trait and will be examined separately. The absence of pro-social behaviors is conceptually different from the presence of psychological difficulties (Goodman, 2001). Scores on the pro-social sub-scale range from 0 to 10, with a higher score indicating more engagement in pro-social behavior. The SDQ and pro-social scores are expressed as normalized z-scores.
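As a rough illustration of the scoring just described, the Python sketch below sums four sub-scale scores into a total SDQ score and standardizes a list of scores into z-scores. The dictionary keys and function names are hypothetical, each sub-scale is assumed to have already been scored on a 0 to 10 range, and the choice of the population standard deviation is an assumption, since the text does not say which normalization sample or formula is used.

```python
from statistics import mean, pstdev

# Sub-scales that enter the total difficulties score; the pro-social
# sub-scale is deliberately excluded, as in the text.
DIFFICULTY_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")


def total_difficulties(subscales):
    """Sum the four difficulty sub-scales (each assumed 0-10) into a 0-40 total."""
    for name in DIFFICULTY_SUBSCALES:
        if not 0 <= subscales[name] <= 10:
            raise ValueError(f"{name} score out of range: {subscales[name]}")
    return sum(subscales[name] for name in DIFFICULTY_SUBSCALES)


def z_scores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD of the supplied sample
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    child = {"emotional": 3, "conduct": 2, "hyperactivity": 5, "peer": 1, "prosocial": 8}
    print(total_difficulties(child))  # pro-social score is ignored here
    print(z_scores([4, 8, 12]))
```

In practice the standardization would be applied within each wave, so that a z-score compares a child with others assessed at the same age.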

Control variables
A major advantage of the MCS data is that they include detailed information about socio-economic characteristics, migration background, and parental inputs. Our socio-economic controls include the child's age, gender, ethnicity, birth weight, number of siblings, and whether their biological mother lives in the household; also included are the mother's age and marital status, parental education levels, region of residence, and the family's poverty status (i.e., income being less than 60% of the median UK income). 12 Parental education at birth captures the additional time that educated parents, especially mothers, tend to spend with their child and also the fact that better educated parents are more likely to be fluent in English (Locay et al., 2013).
There is evidence that the language proficiency of one family member is positively associated with that of other family members, and that children's language proficiency is more highly correlated with maternal, rather than paternal, proficiency (Chiswick et al., 2005).
Unfortunately, the MCS does not include measures of parental fluency in English. It does, however, provide information about migrant status, age at (UK) arrival, and country of origin, which may all be correlated with parental language proficiency and, consequently, with the children's propensity to speak English at home. In particular, we control for whether mothers and fathers come from a non-English-speaking country. 13 We also control for whether migrant parents arrived in the UK before the age of 11 (as they are likely to have a higher degree of English proficiency). 14 In our robustness section, we provide an additional sensitivity check, in which we exclude children whose parents arrived in 1997 or later (585 individuals arrived between 1997 and 2002), thus retaining a sample of children having parents who arrived between 1954 and 1996.
In addition, given that the survey provides information on the specific language used at home, such as Cantonese and Arabic, we experiment with constructing a measure of the linguistic distance between the home language and English (see Isphording and Otten, 2014 and Isphording, 2015). 15 However, given the dynamic nature of our research question, this time-invariant measure does not provide any useful insights and is not utilized in the current analysis. 16
Finally, parents' investments in their children's development are captured in responses to detailed questions regarding the interactions that parents have with their children. Specifically, when children were nine months old, the main respondent was asked how important: i) talking; ii) cuddling; iii) stimulating; and iv) establishing regular sleeping and eating times were for the development of the child. The responses to these questions provide us with an insight into parents' approach to child rearing in infancy, and we control for this in our cross-sectional analysis at age three. We measure parental investments at older ages using information about how often parents read to their children or take them to the library. 17
Table 1 reports the summary statistics of our unstandardized measures of development outcomes, along with other characteristics. 18 There are pronounced differences between the two groups of children. Within their age cohort, bilingual children score much lower than their monolingual counterparts in the verbal reasoning exercise at both age three and five, but by age seven the gap in language development between the two groups becomes insignificant. Bilingual children also score higher on the SDQ total difficulties score and lower on the pro-social score, but these differences tend to decline with age. Children whose parents speak a foreign language at home come mainly from non-white, ethnic, and lower educated households.
12 This is the official measure of poverty that is reported in the official Households Below Average Income Report (Department for Work and Pensions, 2018), and is defined as living in a household with net equivalent income less than 60% of the UK median household income.
13 We used The World Almanac and Book of Facts (Park, 2005) to determine whether English was an official or predominant language in each country of origin. We classify the following as English-speaking countries: Australia, Canada, Barbados, Bermuda, Dominica, Ethiopia, Ghana, Guyana, Hong Kong, India, Ireland, U.S., Jamaica, Kenya, New Zealand, Nigeria, Saint Lucia, Saint Vincent and the Grenadines, Seychelles, Singapore, Trinidad and Tobago, and Zambia.
14 The parents in our sample arrived between 1954 and 2002. Parents from non-English-speaking countries who arrive at an early age typically have English language skills comparable to migrant parents from English-speaking countries (Bleakley and Chin, 2008). Age at arrival may also have an impact on cultural assimilation. Those parents who arrived at older ages may differ in their values, views about parenting, etc., all of which may have an impact on the language and emotional development of their children.
15 Isphording (2015) shows that linguistic barriers caused by language differences play a crucial role in the determination of the destination-country language proficiency of immigrants. The argument is that immigrants face very different costs of language acquisition associated with their linguistic origin. Immigrant children learn English more easily and quickly if their language of origin is linguistically closer to that of the host country (Isphording and Otten, 2014). Hence, the cost of acquiring the host country language depends on the distance of the migrants' mother tongue from the dominant majority language, in our case English (Dustmann et al., 2003).
16 Linguistic distance may also be utilized through an instrumental variable (IV) strategy along with other covariates, such as age at arrival interacted with non-English-speaking background (see Bleakley and Chin, 2004). However, the exclusion restriction required to justify IV estimation is unlikely to hold in our setting. Younger migrants are likely to differ from older migrants along non-language dimensions that also affect children's outcomes. Instead, we experiment by controlling for the linguistic distance between both parents' languages and English, rather than using indicators for having parents from non-English backgrounds. These results, available on request, overall reveal a similar pattern by child's age. Only three-year-olds whose parents speak a foreign language at home have a significantly higher probability of reporting emotional issues. They also tend to be less pro-social.
17 The level and quality of parental investments are usually proxied by the Home Observation Measurement of the Environment (HOME) scores that have been shown to be significantly correlated with later cognitive, health, and noncognitive development (Todd and Wolpin, 2007; Cunha and Heckman, 2008). Similar constructs are available in our data but they are inconsistently measured over waves and therefore are not used in our analysis.
18 Corresponding information for native-born children is available upon request. Note that the scaling of the unstandardized verbal measure changes across waves, thereby making it not directly comparable across time.

Parental investment at nine months, as measured by mother's attitudes toward child rearing, differs significantly between mothers of bilingual and monolingual children. Mothers of bilingual infants are more likely to believe that talking to, cuddling, and stimulating their babies and ensuring regular sleeping and feeding habits for infants are important. When bilingual children are aged three to five, mothers score lower on our parental activity measures, whereas when bilingual children are aged seven, parents tend to spend more time reading to their child and visiting the library. These results suggest that there is some variation in parental practices over time.

Methods
Our primary objective is to understand whether, relative to their monolingual peers, there is a gap in the language, emotional, and pro-social development of children exposed to a foreign language at home, and how this gap evolves over time. We address this issue by estimating child development production functions using the approach developed by Todd and Wolpin (2003, 2007) and applied by Fiorini and Keane (2014), Del Boca et al. (2017), and Del Bono et al. (2016). Children's development is assumed to be a cumulative process that depends on both contemporaneous and historical family investments, as well as children's skill endowments.
Our focus is on the developmental consequences of families' decisions to raise their children in a bilingual home environment.
There are well-known econometric challenges involved in generating unbiased estimates of production function parameters from observational data. The first hurdle arises because investments in children are not exogenous, but instead result from the active choices that parents make when trying to maximize their children's human development given the constraints they face. This would not necessarily be a problem if comprehensive data on all relevant inputs pertaining to child development (e.g., parental, school, and community investments, and child endowments) were observed (see Todd and Wolpin, 2003); however, this is rarely, if ever, the case. Unobserved inputs (e.g., children's innate ability) are almost certainly correlated with other inputs (e.g., reading to children) that researchers do observe. This correlation results in the usual omitted variable bias problem. The second problem stems from the threat that both structural endogeneity (e.g., simultaneity and mutual causality) and statistical endogeneity (e.g., unobserved heterogeneity and measurement error) pose to causal estimation of the effect of parental investments on children's development.
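The omitted variable problem can be made concrete with the textbook two-input case. The notation below is purely illustrative and is not taken from the paper: $x_i$ stands for an observed input (such as reading to the child), $a_i$ for an unobserved input (such as innate ability), and $\beta$, $\delta$ for their effects on development $C_i$.

```latex
% Illustrative notation (assumed, not the authors'):
%   true model:      C_i = \beta x_i + \delta a_i + u_i
%   estimated model: C_i = b x_i + e_i   (a_i omitted)
\[
  \operatorname{plim}\,\hat{b}
  \;=\; \beta \;+\; \delta\,\frac{\operatorname{Cov}(x_i, a_i)}{\operatorname{Var}(x_i)} .
\]
% The second term vanishes only if the omitted input has no effect
% (\delta = 0) or is uncorrelated with the observed input.
```

If more able children are also read to more often, both $\delta$ and the covariance are positive, so a regression that omits ability would overstate the effect of reading.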
Our empirical strategy is to account for the cumulative nature of child development using value-added specifications which control for measures of lagged development (see Hanushek et al., 2009 and Harris, 2010 for details). Specifically, our model of children's development is given by:

$$C_{iw} = \lambda C_{i,w-1} + \beta B_{i,w-1} + X'_{iw}\gamma + \varepsilon_{iw}, \qquad (1)$$

where $C_{iw}$ denotes the language, emotional, and pro-social development for child $i$ in waves 2, 3, 4, 5, and 6 (or at child's age 3, 5, 7, 11, and 14), $B_{i,w-1}$ is an indicator of bilingual status in wave $(w-1)$ with coefficient $\beta$, and $X'_{iw}$ is a vector of child- and family-specific covariates with coefficient vector $\gamma$. The advantage of using lagged values (instead of contemporaneous values) of bilingual status is that we can ensure that the lagged values precede the child's performance on the verbal language test and development of emotional difficulties, thereby ruling out reverse-causality concerns. We include a rich list of factors which contribute to child development such as child's age, gender, ethnicity, number of siblings and birth weight, maternal age, parental education measured at nine months, the presence of the biological mother in the household, parents' non-English background, young migrant indicator, reading and library visitation frequency, poverty, and regional indicators.
Finally, $C_{i,w-1}$ captures lagged development, while $\lambda$ is a persistence (autoregressive) parameter that links development across periods, such that $|\lambda| < 1$. Including $C_{i,w-1}$ in the model not only captures persistence in language and emotional development, but also controls for unobserved ability (see Todd and Wolpin, 2003). Under certain conditions, the lagged development measure effectively acts as a sufficient statistic for all historical time-varying inputs (see Sass et al., 2014).
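One way to see why lagged development can stand in for the history of inputs is to substitute the value-added equation into itself repeatedly. The sketch below assumes constant coefficients across waves; the symbols $B$ (lagged bilingual indicator), $\beta$, and $\gamma$ are illustrative names for the terms described in the text and are not necessarily the authors' notation.

```latex
% Assumed value-added form (illustrative notation):
%   C_{iw} = \lambda C_{i,w-1} + \beta B_{i,w-1} + X'_{iw}\gamma + \varepsilon_{iw}
% Substituting backwards down to an initial endowment C_{i0}:
\[
  C_{iw}
  \;=\; \lambda^{w} C_{i0}
  \;+\; \sum_{k=0}^{w-1} \lambda^{k}
        \bigl( \beta B_{i,w-1-k} + X'_{i,w-k}\gamma + \varepsilon_{i,w-k} \bigr).
\]
% Every past input enters with a geometrically declining weight \lambda^{k},
% and all terms with k >= 1, together with C_{i0}, are already contained in
% \lambda C_{i,w-1}.
```

Under these assumptions, conditioning on $C_{i,w-1}$ absorbs the whole input history, which is the sense in which it acts as a sufficient statistic; if input effects do not decay at the common rate $\lambda$, the result no longer holds exactly.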
Ordinary least squares (OLS) estimation of Eq. (1) produces unbiased estimates of the child development production function as long as $\varepsilon_{iw}$ is i.i.d. There may, however, also be unobserved differences in the rate at which children acquire language skills or develop emotionally. This implies that the error term is best modeled as $\varepsilon_{iw} = \mu_i + \nu_{iw}$, where $\mu_i$ captures child-specific differences in the rate of development and $\nu_{iw}$ is an idiosyncratic error. Even if we make the strong assumption that there is no correlation between family inputs and children's unobserved ability, OLS estimation of Eq. (1) will not generally result in unbiased estimates of the child development production function, because lagged development $C_{i,w-1}$ will depend on $\mu_i$, causing it to be correlated with $\varepsilon_{iw}$. Unfortunately, fixed-effect estimation does not resolve the issue (Nickell, 1981).
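The two biases described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the MCS sample: the function names, the true persistence of 0.5, and the omission of bilingual status and covariates are all simplifying assumptions made for illustration.

```python
import numpy as np

def simulate_panel(n_children, n_waves, lam, sigma_mu, seed=0):
    """Simulate C_iw = lam * C_i,w-1 + mu_i + nu_iw (hypothetical data).

    Returns an array of shape (n_children, n_waves + 1); column 0 is the
    initial observation, drawn close to the stationary distribution.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, sigma_mu, size=n_children)        # child-specific effect
    c = np.empty((n_children, n_waves + 1))
    c[:, 0] = mu / (1.0 - lam) + rng.normal(
        0.0, 1.0 / np.sqrt(1.0 - lam**2), size=n_children)
    for w in range(1, n_waves + 1):
        c[:, w] = lam * c[:, w - 1] + mu + rng.normal(size=n_children)
    return c

def pooled_ols_lambda(c):
    """Pooled OLS of C_w on an intercept and C_{w-1}, ignoring mu_i."""
    y = c[:, 1:].ravel()
    x = c[:, :-1].ravel()
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

def within_lambda(c):
    """Fixed-effects (within) estimator: demean each child's series first."""
    y = c[:, 1:]
    x = c[:, :-1]
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x - x.mean(axis=1, keepdims=True)
    return float((x_dm * y_dm).sum() / (x_dm**2).sum())

TRUE_LAMBDA = 0.5   # assumed value, chosen only for the illustration
panel = simulate_panel(n_children=20000, n_waves=5, lam=TRUE_LAMBDA, sigma_mu=1.0)
ols_hat = pooled_ols_lambda(panel)
fe_hat = within_lambda(panel)
print(f"true lambda = {TRUE_LAMBDA}, pooled OLS = {ols_hat:.2f}, within = {fe_hat:.2f}")
```

In this setup pooled OLS overstates persistence, because the lagged outcome is positively correlated with the child effect, while the within estimator understates it when the number of waves is small, which is the bias discussed by Nickell (1981).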
Consequently, in addition to using OLS, we also undertake instrumental variable estimation of Eq. (1) using a generalized method of moments (GMM) framework. This results in consistent estimates even in the presence of child-specific unobserved heterogeneity, because further lags (two or more periods) of the dependent variable and/or of the X-variables are valid instruments for $C_{i,w-1}$ (see Arellano and Bond, 1991 and Blundell and Bond, 1998). This estimation strategy has been successfully applied in related papers which are focused on skill formation (Del Bono et al., 2016) and the education production function (Andrabi et al., 2011). Specifically, we estimate Eq. (1) using a system of equations, referred to as "system-differences and levels" GMM, which includes Eq. (1) along with an additional equation that purges the child-specific effect $\mu_i$ from the model:

$$\Delta C_{iw} = \lambda \Delta C_{i,w-1} + \beta \Delta B_{i,w-1} + \Delta X'_{iw}\gamma + \Delta\varepsilon_{iw}, \qquad (2)$$

where the $\Delta$ prefix denotes the change from one period to the next, i.e., $\Delta C_{iw} = C_{iw} - C_{i,w-1}$, with $B$ the bilingual indicator and $\beta$ and $\gamma$ the coefficients on bilingual status and on the covariates. The instruments used in this system are the two-period lagged measures of language and emotional difficulties (and pro-social behavior), the two-period lag of bilingual status along with contemporaneous inputs (child's age in months and birth weight) for the difference as per Eq. (2), and past inputs (presence of the natural mother in the household) along with invariant inputs such as child's gender, ethnicity, and mother's and father's education at child's birth for the levels as per Eq. (1). 19 Note that the variability exploited by the system-GMM estimation comes from those children who switch their bilingual status in the observation period. 20 Hence, we need a sufficient number of children to change their bilingual status between survey waves in order to accurately estimate the effects. Although such variation may not seem very intuitive, it is established that transition to primary school is a key moment, when linguistic habits are subject to change; the combination of a stronger dominant language and social differences may lead many children to reject their home language (see Hundt, 2006). In our data, we observe 700 children who switched from bilingual to monolingual status, while 235 children switched from monolingual to bilingual status over the observed period. 21 The dynamic GMM estimator is unbiased and consistent under the assumption that there is no second-order serial correlation in the error term. To test this assumption, we use a test procedure outlined in Arellano and Bond (1991). Specifically, we perform two tests: the Sargan-Hansen test of over-identifying restrictions, which assesses the validity of the instruments, and the AR(2) test for second-order autocorrelation. 22
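The logic of instrumenting a differenced equation with deeper lags can be sketched with the simpler Anderson-Hsiao estimator, which uses a single instrument (the two-period lag of the outcome in levels). This is not the system GMM estimator used in the paper: it is a just-identified special case shown only to illustrate the idea, on synthetic data with assumed parameter values and hypothetical function names.

```python
import numpy as np

def simulate_panel(n_children, n_waves, lam, sigma_mu, seed=1):
    """Simulate C_iw = lam * C_i,w-1 + mu_i + nu_iw (hypothetical data)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, sigma_mu, size=n_children)
    c = np.empty((n_children, n_waves + 1))
    c[:, 0] = mu / (1.0 - lam) + rng.normal(
        0.0, 1.0 / np.sqrt(1.0 - lam**2), size=n_children)
    for w in range(1, n_waves + 1):
        c[:, w] = lam * c[:, w - 1] + mu + rng.normal(size=n_children)
    return c

def anderson_hsiao_lambda(c):
    """IV estimate of lambda from the first-differenced equation.

    Differencing removes mu_i. The regressor Delta C_{w-1} is correlated
    with Delta nu_w, so it is instrumented with the level C_{w-2}, which is
    uncorrelated with Delta nu_w when nu is serially uncorrelated.
    """
    d_c = c[:, 2:] - c[:, 1:-1]        # Delta C_w      for w = 2..W
    d_c_lag = c[:, 1:-1] - c[:, :-2]   # Delta C_{w-1}
    z = c[:, :-2]                      # instrument: C_{w-2}
    return float((z * d_c).sum() / (z * d_c_lag).sum())

TRUE_LAMBDA = 0.5   # assumed value, chosen only for the illustration
panel = simulate_panel(n_children=50000, n_waves=5, lam=TRUE_LAMBDA, sigma_mu=1.0)
iv_hat = anderson_hsiao_lambda(panel)
print(f"true lambda = {TRUE_LAMBDA}, Anderson-Hsiao IV = {iv_hat:.3f}")
```

System GMM extends this idea by stacking the levels equation with the differenced one and using many lags as instruments, which is why tests of over-identifying restrictions and of second-order serial correlation become relevant.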

Results and Discussion
We begin our empirical analysis with age-specific models which provide descriptive evidence of the evolution in the bilingual development gap from early childhood (age three) to adolescence (age 14). All estimates are weighted to account for the non-response rate and standard errors are clustered at the family level. We then turn to our dynamic OLS and GMM estimates of the value-added child development production function by pooling all data waves together. In Subsection 5.3, we examine whether there is evidence that the production functions of language, emotional, and pro-social skills are different for subgroups of the population. Subsection 5.4 documents whether there are corresponding patterns of disparities in maternal behavior and home environments, and finally, in Subsection 5.5 we test the robustness of our estimates in relation to several sample restrictions.
19 When experimenting with different instruments, we look for a proliferation of instruments that may overfit the endogenous variables, and we ensure that the model passes both the test for instrument validity (i.e., the Sargan-Hansen test) and the test for second-order serial correlation (see Arellano and Bond, 1991).
20 The GMM can be used for periods of T ≥ 3 since identification comes from large N, and not from large T. Arellano and Bover (1995) and Blundell and Bond (1998) demonstrate that the system-GMM approach, by adding additional moment restrictions, permits lagged first differences to be used as instruments in the levels equations, and this corrects for any bias that would emerge as a consequence of using the standard GMM estimator.
21 In Table A1 in the Appendix, we report the characteristics of those children who switch bilingual status. Children who switch exclusively to the English language, as early as the age of five, score relatively higher in the verbal reasoning exercise, when compared to those who always stayed bilingual or moved to bilingual at the age of seven. A feature of our specification is that there is no differentiation in the effect of bilingualism among those who move into versus those who move out of bilingual status.
22 One limitation of our data is the irregularly spaced survey design; this may have an impact on the validity of the Arellano-Bond tests. Our waves refer to a longer period and there may be some unobservable factors that we have not fully captured. To address the missing data issue, Millimet and McDonough (2017) incorporate imputation of the missing covariates. As an alternative, one might consider imputation of the missing lagged dependent variable. However, this introduces additional complications in that the measurement error, representing the deviation between the actual lagged dependent variable and the imputed variable, is likely to depend on the deviation between the covariates from period t-1 and the covariates used to impute the missing value (see Millimet and McDonough, 2017). We experimented with the Millimet and McDonough (2017) estimator; however, due to the smaller sample size the estimated effects are less precise.
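The weighting and family-level clustering mentioned in this section can be sketched as weighted least squares with a cluster-robust sandwich variance. The code below is an illustration on simulated data, not the authors' estimation code: the function name, the variable names, the simulated coefficient of -0.35, and the absence of any small-sample correction are all assumptions.

```python
import numpy as np

def wls_cluster(y, X, weights, clusters):
    """Weighted least squares with cluster-robust standard errors.

    beta = (X'WX)^{-1} X'Wy, and the variance is the sandwich
    (X'WX)^{-1} [sum_g s_g s_g'] (X'WX)^{-1}, where s_g is the sum of
    weighted score contributions within cluster g. No small-sample
    correction is applied (a simplifying assumption).
    """
    Xw = X * weights[:, None]                     # rows of X scaled by weight
    bread = np.linalg.inv(X.T @ Xw)
    beta = bread @ (Xw.T @ y)
    resid = y - X @ beta
    scores = Xw * resid[:, None]                  # w_i * x_i * e_i per child
    _, cluster_idx = np.unique(clusters, return_inverse=True)
    cluster_scores = np.zeros((cluster_idx.max() + 1, X.shape[1]))
    np.add.at(cluster_scores, cluster_idx, scores)  # sum scores within family
    meat = cluster_scores.T @ cluster_scores
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data: two children per family sharing a family-level shock.
rng = np.random.default_rng(2)
n_families = 2000
family_id = np.repeat(np.arange(n_families), 2)
n = family_id.size
bilingual = rng.integers(0, 2, size=n_families)[family_id].astype(float)
family_shock = rng.normal(size=n_families)[family_id]
weights = rng.uniform(0.5, 2.0, size=n)           # stand-in for survey weights
score = 0.2 - 0.35 * bilingual + family_shock + rng.normal(size=n)

X = np.column_stack([np.ones(n), bilingual])
beta_hat, se_hat = wls_cluster(score, X, weights, family_id)
print("coefficients:", beta_hat, "cluster-robust s.e.:", se_hat)
```

Clustering matters here because siblings share the family shock: treating the two children in a family as independent observations would understate the standard error on a family-level regressor such as home language.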

The bilingual gap in children's development by age
We first investigate how the bilingual gap evolves from early childhood to adolescence by estimating Eq. (1) separately by age group. These results provide evidence of the way in which the bilingual gap in development evolves as children mature. We report the estimated bilingual gap separately by age, for language, emotional development, and pro-social behavior as measured by the standardized verbal reasoning test scores (Table 2), the SDQ score (Table 3), and the pro-social score (Table 4). We consider three specifications: Panel (A) in Tables 2-4 includes only basic controls (child's age in months, mother's age at birth, and gender and ethnicity); Panel (B) adds the full set of child and family controls but omits lagged endowments; and Panel (C) additionally includes lagged measures of development. Children are interviewed approximately every three years; therefore, the coefficients on lagged bilingual status and other family inputs can be interpreted as an effect arising from application of the input over a three-year interval.
The estimates reported in Table 2 clearly point to a strong negative correlation between language spoken at home during early childhood and subsequent language development. We find that three-year-old children whose parents speak a foreign language at home have significantly lower verbal reasoning test scores than their monolingual peers. Other things being equal in a very basic specification (Panel (A) in Tables 2-4), speaking a foreign language at home when the child was nine months old reduced the child's language development, measured at age three, by 0.63 standard deviations. Adding a comprehensive set of controls reduces the estimated bilingual deficit in language development relative to that of an otherwise similar monolingual child to 0.35 standard deviations (see Panels (B) and (C) in Tables 2-4). In comparison, the negative impact of bilingualism on language is greater than the effect of low socio-economic background: a nine-month-old child who is living in poverty can be expected to have a verbal reasoning score that is 0.14 standard deviations below that of a non-poor child at age three. 24
The penalty for not speaking English exclusively at home is reduced at age five, once we fully control for the observable characteristics and past developmental outcomes; however, bilingual children continue to score lower on average in comparison to their monolingual counterparts even if we control for past development levels. The gap is statistically significant at the 1% level. Other things being equal, the results suggest that, on average, bilingual children's verbal reasoning scores are 0.20 standard deviations below those of monolingual children. This provides suggestive evidence that five-year-old bilingual children continue to fall behind in their expressive language and knowledge of names. Consistent with Cunha and Heckman (2008) and Dickerson and Popli (2016), past language endowment has a positive and significant effect on language development as measured at a later age. For example, an increase of one standard deviation in the verbal reasoning test score at age three is associated with greater language development (0.40 of a standard deviation) at age five.
23 The full set of controls included in Panels (B) and (C) at each age is slightly different depending on data availability but tends to include child's age in months, gender, ethnicity and birth weight, mother's age and marital status, mother's and father's highest qualification measured when the child is nine months old, an indicator for young migrants, non-English speaking background, whether the biological mother is present in the household, number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls, along with time-varying characteristics measured at period a − 1, such as poverty indicator, number of siblings, marital status, and presence of biological mother.
24 The full set of estimated coefficients is available upon request. These estimates additionally reveal that the education of the mother and father, measured at the time of the child's birth, has a significant and positive association with language development at age three. For example, having a mother with a higher education degree is associated with a 0.19 standard deviation increase in the verbal reasoning score, as opposed to having a mother with a lower qualification. Children with gross motor, fine motor, and communication delays measured at nine months have not caught up developmentally by age three. Having parents who arrived in the UK at younger ages is associated with a significant increase in verbal development at age three of around 0.13 standard deviations. Parental inputs, as measured by the frequency with which the mother or father reads to the child and/or visits the library, significantly affect the verbal outcomes of five-year-old children. Overall, adding a comprehensive set of controls reduces the estimated coefficient on current bilingual status, and in some specifications, the effect is rendered statistically insignificant at the conventional levels.
Notes: Standard errors, clustered at the family level, are in parentheses. All regressions are weighted to account for the non-response rate. Each panel corresponds to a separate specification, where the bilingual effect with basic controls (child's age in months, gender and ethnicity, and mother's age) is reported in Panel (A); Panel (B) additionally includes child's birth weight, mother's age and marital status, mother's and father's highest qualification obtained when the child is nine months old, a young migrant indicator, mother and father coming from non-English speaking background, biological mother present in the household, child's number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls along with time-varying characteristics measured at period w − 1, such as a poverty indicator, child's number of siblings, mother's marital status, and presence of biological mother; however, it omits lagged endowment effects; Panel (C) in addition to the full set of controls includes lagged input measures for language and emotional development.
Language development in bilingual children at the age of seven is found to be statistically equivalent to that in monolingual children in specifications that omit the prior developmental outcomes (see Panels (A) and (B) in Table 2). However, once we control for the measures of prior language and emotional development, we find that seven-year-olds who were bilingual at age five have significantly higher language development (0.22 standard deviations) than monolingual children, whether or not they continue to be bilingual themselves at age seven. 25 To obtain an understanding of how important these estimates are, we can compare them to the effect of parental education. At the age of seven, having a mother and father with a higher education degree is associated respectively with a 0.10 and 0.11 standard deviation advantage in language development, relative to having parents with lower qualifications. The gap associated with being bilingual (measured via the one-period lag of bilingualism) is nearly twice as large. The persistence in language outcomes, as measured by past verbal reasoning skills, is also statistically significant and supports the notion of self-productivity: skills acquired during previous periods enhance skill formation at a later period (Cunha et al., 2006).
These results reported at the age of seven are consistent with Mumtaz and Humphreys (2001), who show that bilingual children are better at reading as compared to their monolingual counterparts. This suggests a possible "transfer" of first language literacy skills to the development of reading in a second language. It also supports the view that bilingual reading development may aid the acquisition of certain literacy skills such as phonological awareness and memory, and the regular word-reading ability that accumulates over time. 26 At age 11, the advantage in the verbal reasoning tests for children who were bilingual at the age of seven (measured via the one-period lag of bilingualism) is not statistically significant once we fully control for child characteristics, parental characteristics, and children's past developmental outcomes. These results are mirrored for bilingual children at age 14 with a decrease in verbal reasoning scores of 0.09 of a standard deviation, and the effect is insignificant. There is a statistically significant persistence in language development over time. Interestingly, the persistence in language development diminishes as children age; also, the negative association between having emotional difficulties, as measured by the SDQ scores, and language development, which is observed at early ages, is no longer evident once children reach adolescence.
25 Recall that in the verbal reasoning test taken at age seven (the BAS-Word reading test), the child must correctly pronounce words within locally accepted standards, with emphasis on the correct syllable or syllables.
Notes: Standard errors, clustered at the family level, are in parentheses. All regressions are weighted to account for the non-response rate. Each panel corresponds to a separate specification where the bilingual effect with basic controls (child's age in months, gender and ethnicity, and mother's age) is reported in Panel (A); Panel (B) additionally includes child's birth weight, mother's age and marital status, mother's and father's highest qualification obtained when the child is nine months old, a young migrant indicator, mother and father coming from non-English speaking background, biological mother present in the household, child's number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls along with time-varying characteristics measured at period w − 1, such as a poverty indicator, child's number of siblings, mother's marital status, and presence of biological mother; however, it omits lagged endowment effects; Panel (C) in addition to the full set of controls includes lagged input measures for language and emotional development.
Notes: Standard errors, clustered at the family level, are in parentheses. All regressions are weighted to account for the non-response rate. Each panel corresponds to a separate specification where the bilingual effect with basic controls (child's age in months, gender and ethnicity, and mother's age) is reported in Panel (A); Panel (B) additionally includes child's birth weight, mother's age and marital status, mother's and father's highest qualification obtained when the child is nine months old, a young migrant indicator, mother and father coming from non-English speaking background, biological mother present in the household, child's number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls along with time-varying characteristics measured at period w − 1, such as a poverty indicator, child's number of siblings, mother's marital status, and presence of biological mother; however, it omits lagged endowment effects; Panel (C) in addition to the full set of controls includes lagged input measures for pro-social development.
Our results for emotional development, as measured by the standardized SDQ, are presented in Table 3. We find that the discrepancies in SDQ scores between bilingual and monolingual children, originally reported in Table 1, are insignificant once we fully control for observable characteristics and past endowments (see Panels (B) and (C) in Table 3). In other words, being bilingual has no significant impact on the emotional development of children, regardless of their age. Children's past language development is found to be associated with their emotional development, however. An increase of one standard deviation in the past language score is associated with a reduction of 0.11 of a standard deviation in emotional difficulties at age eleven (see Panel (C) in Table 3). There is a strong persistence in emotional development over time; lagged SDQ scores are highly predictive of current SDQ scores. For example, a one standard deviation increase in emotional difficulties at age five is associated with a 0.56 standard deviation increase in emotional difficulties at age seven.
We observe similar results for pro-social development (see Table 4). The pro-social development of bilingual children is statistically equivalent to that of monolingual children in the specification that controls for past endowments of pro-sociality (see Panel (C) in Table 4). The persistence in pro-social behavior is significant and stable over time. Specifically, other things being equal, a one standard deviation increase in the lagged pro-social score increases current pro-social behavior by 0.30-0.40 standard deviations.
In summary, results in Tables 2-4 reveal several important findings. First, there is a negative relationship between bilingualism and early language development. The magnitude of this early disadvantage is significant, and consistent with some previous studies. In a more general context, Lang and Sepulveda (2007) find a black-white gap in cognitive tests, ranging from 0.20 to 0.50 of a standard deviation after controlling for a myriad of mother and family controls. The start of schooling, however, appears to attenuate the language gap and by age 14 the difference in early language development is insignificant. We find that there are no significant differences in children's emotional development and pro-social behavior. Finally, our estimates emphasize that prior language and socio-emotional inputs have a noticeable influence on child development.

Dynamic estimates of the bilingual gap in children's development
Results from the cumulative value-added models which were estimated via OLS and dynamic GMM on our unbalanced panel of 9,532 child-wave observations are presented in Table 5.
Columns (1) and (2) in Table 5 report the OLS estimates of Eq. (1), without and with past language and emotional endowments, respectively. Column (3) in Table 5 reports the system GMM estimates, which combine the levels and the first-difference equations. 27 Omitting the previous language and emotional development and pooling all ages together (Column (1)), we find that lagged bilingual status is associated with a decrease in children's verbal language scores (0.10 of a standard deviation), other things being equal. 28 When past development measures are controlled for, the estimated coefficient is smaller and not statistically different from zero (Column (2)). There is persistence in language development that is statistically significant; a one standard deviation increase in the lagged verbal reasoning scores is associated with an increase in the current verbal score of 0.28 of a standard deviation.
In general, as in our age-specific analysis, the coefficient on past language skills is well below one, indicating that persistence in language achievement, while significant, is far from perfect. Lower persistence in the measures of language development indicates that learning is lost at a greater rate over time. The past emotional development index correlates negatively with language development.
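A persistence coefficient below one implies geometric fade-out. As a rough illustration, assuming the coefficient is constant across waves and nothing else changes, the share of a one standard deviation gain that remains after k waves is the coefficient raised to the power k. The helper below is hypothetical and uses the OLS estimate of 0.28 and the GMM estimate of 0.60 reported for Table 5.

```python
def remaining_share(persistence, waves):
    """Share of an initial gain still present after `waves` periods,
    assuming a constant persistence coefficient (illustrative assumption)."""
    return persistence ** waves

for persistence in (0.28, 0.60):
    shares = [round(remaining_share(persistence, k), 3) for k in (1, 2, 3)]
    print(f"persistence {persistence}: share left after 1, 2, 3 waves = {shares}")
```

Under the OLS estimate less than a tenth of an initial gain is left after two waves, whereas under the GMM estimate roughly a third remains, so the choice of estimator materially changes the implied rate at which early gaps fade.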
The system GMM estimates in Column (3) allow for the possibility of detecting the presence of endogenous feedback effects, i.e., parents may adjust their behavior and resource decisions in response to feedback on children's outcomes. In the presence of such feedback effects, we expect to observe both a reduction in the persistence parameter, λ, and an increase in the bilingual coefficients (see Andrabi et al., 2011 and Del Bono et al., 2016). We do not find evidence of such feedback effects: the point estimate of the coefficient on past language skills increases from 0.28 in the OLS (Column (2)) to 0.60 in the GMM specification (Column (3)). The coefficient associated with past bilingual status increases in absolute value; however, it remains insignificantly different from zero. The system GMM standard errors are much larger than their OLS counterparts. 29 A natural question is how these dynamic estimates compare to the different age effects that are reported in Table 2.
We keep in mind that the dynamic panel GMM estimator identifies the bilingual effect using children who switch between bilingual and monolingual status over time. The insignificant but positive coefficient associated with the past bilingual status suggests that there is no penalty for children who switch their bilingual status.
27 The specifications also control for differences in survey timing by including wave dummies in the regressions.
28 In a specification that includes mother fixed-effects (not reported, but available upon request), the estimates are of a similar order of magnitude as those from the OLS model. This suggests that bilingual differences are not simply the product of unobserved variation in time-invariant family characteristics (see De Haan et al., 2014 for a similar discussion).
29 In terms of precision, this estimator appears much more satisfactory than difference-GMM. This is because lagged differences are stronger instruments for lagged levels (whereas lagged levels are weak instruments for lagged differences).
Notes: Robust standard errors, clustered at the family level, are in parentheses. Columns (1) and (2) report OLS estimates of Eq. (1) without and with past language and emotional endowments, respectively. Column (3) reports the system GMM estimator which assumes that effects have constant correlation with the inputs. All estimates include the full set of controls and time dummies. Verbal reasoning skills refer to language development and the SDQ score refers to emotional development.
Consistent with our age-specific results, using the dynamic GMM model, we find no effect of bilingualism on the emotional and pro-social development of children. The point estimate on the lagged z-SDQ score (z-SDQ score at a − 1) is 0.27 of a standard deviation in the system GMM specification, representing a decrease with respect to the OLS results reported in Column (2). Similarly, the coefficient on past pro-social behavior decreases from 0.34 in the OLS specification (Column (2)) to 0.29 in the system GMM (Column (3)). Consistent with previous results, the dynamic specification also illustrates a significant persistence in language, emotional, and pro-social behavior over time. In particular, about 30% of pro-social development persists over time (a one standard deviation increase in the lagged pro-social score increases the current pro-social behavior by 0.29 of a standard deviation; Panel (C)), while more than 0.60 of a standard deviation of verbal reasoning skills persists across ages. This finding, which is also emphasized by Del Bono et al. (2016) and Fiorini and Keane (2014), is in line with the idea that production functions for cognitive and non-cognitive skills are very different (Cunha et al., 2006). 30

Heterogeneity
Next, we explore whether there is any heterogeneity in the bilingual effects reported so far. First, we consider whether the effects of bilingualism on language, emotional, and pro-social development differ by the children's gender. This additional information is important for identifying whether, for example, gender-specific strategies might be more effective in reducing language difficulties that arise from the language spoken at home. 31 The results, reported in Table 6, are based on age models with the full set of controls, including past endowments as reported in Panel (C) in Tables 2-4. We find that the gap in language development at an early age is not significantly different between boys and girls. Other things being equal, bilingualism, by age three, decreases verbal reasoning scores by 0.35 of a standard deviation for girls and by 0.32 of a standard deviation for boys. The differences become more pronounced as the children age. As indicated by the results in Table 6, boys mainly drive the bilingual language advantage at age seven; the lagged bilingual status is positive and significant at the 5% level. However, by age 14, boys are back to having lower language development. Boys' underachievement at that age, which corresponds to Key Stage 3, is in line with some official figures. For example, international comparisons of 15-year-olds' Programme for International Student Assessment (PISA) results show that girls do better in reading than boys across all of the OECD countries (Bradshaw et al., 2009).
30 We assess the sensitivity of our results by re-estimating our dynamic and child age-specific models using a balanced panel of children, with no missing information in the main family background variables over the five observed waves; that leaves us with 926 children, from whom 4,630 child-year observations could be gathered. We do not observe any substantive change in our bilingual coefficients. The results are available on request.
31 Previous research shows a gender gap in children's performance. For example, boys' attitudes towards reading and writing, the amount of time they spend reading, and their achievement in literacy are all poorer than those of girls (All-Party Parliamentary Literacy Group Commission, 2012). The literature also suggests that gender differences do exist in the maintenance of home language. While girls are more likely to possess higher social skills and academic competence, boys often have more problem behaviors (Margetts, 2005). Furthermore, our estimates, not reported but available on request, consistently suggest that boys are more likely to experience emotional difficulties and that their verbal language development is well below the language development of girls.
In the case of emotional development, it is mainly boys that are found to benefit from their bilingual background, though the effect varies by age. Specifically, at ages five and eleven, the advantage in emotional development for boys is statistically significant when compared to girls. Other things being equal, speaking a foreign language at home when the child was seven years old reduces the total difficulties score by 0.19 of a standard deviation. No significant gap in the emotional development of girls is evident. To gain a better understanding of the domains underlying the emotional development result, we focus on the role of bilingualism by looking at the five SDQ domains separately, rather than at the single SDQ score. Results, available upon request, indicate a significant difference in conduct problems and hyperactivity for boys. Specifically, lagged bilingual status reduces conduct problems and hyperactivity outcomes by 0.18 and 0.26 of a standard deviation, respectively. Finally, for both boys and girls, speaking a foreign language at home is not associated with significantly different pro-social behavior. Although these results should be interpreted with caution due to the small sample sizes, explaining why the emotional development of boys benefits from bilingual status deserves more investigation. It could be hypothesized that the environmental and cultural background may influence children's social competence differently among boys and girls. We leave gender differences in emotional development between bilingual boys and girls to future research.
Notes: Standard errors, which are clustered at the family level, are in parentheses. The specifications include the same controls as in Panel (C) in Tables 2-4. Verbal reasoning skills refer to language development and the SDQ score refers to emotional development.
Furthermore, in Table 7, we report additional results by distinguishing two groups of children based on their mother's education: those whose mother attained a high school qualification, first degree, or diploma, and all the other children (mothers with A-level, GCSE/O-level, or below). We do find evidence of heterogeneous production functions in the development outcomes across children of mothers with different educational qualifications. The early language disadvantage is greater for children whose mothers have a higher educational qualification, but is insignificant by age five. In contrast, the negative association between bilingualism and language development is still evident at age five for children whose mothers have an A-level or lower qualifications; however, the effect size is similar in magnitude for children whose mothers have an A-level or lower qualification. Surprisingly, at the age of seven, a greater positive bilingual effect is evident for children of mothers with a low level of education.
This may be driven by selection, i.e., mothers who are less educated and less proficient in English are more likely to speak a foreign language, and they may invest more heavily in home activities to enhance the learning environment of their children (and possibly try to compensate for the early bilingual cognitive disadvantage). We cannot test this explanation directly since our data do not allow us to pin down the language proficiency level.

Differences in early parental behavior and home environment
Motivated by our findings that bilingual differences in language assessments appear as early as the age of three and that the gap narrows over time, we investigate whether there are corresponding patterns of disparities in maternal behavior and home environment that can explain these early deficits in child development. We hypothesize that potential improvement in children's language development could arise from differences in parent-child interactions, and that a relatively high level of maternal education and time spent with the child has a positive effect on language development. In addition, it is well established that adverse circumstances such as poverty may create a stressful home environment, which contributes to psychological distress in parents and children (Holmes and Kiernan, 2013). This distress may filter into parenting practices, which tend to be punitive, inconsistent, and less nurturing. For example, persistent maternal mental health problems have been most strongly associated with child behavioral issues (Noonan et al., 2018). A leading potential explanation in our analysis is that, in general, immigrant parents foster higher growth rates in learning skills relative to native parents, which could result from different attitudes or parenting behaviors (see Hull and Norris, 2018).
Notes: Standard errors, which are clustered at the family level, are placed in parentheses. The specifications include the same controls as in Panel (C) in Tables 2-4. Verbal reasoning skills refer to language development and the SDQ score refers to emotional development.
For this part of the analysis, we use dependent variables that indicate the mother's home activities and well-being behaviors: (i) the time she spends with the child; (ii) the frequency of reading to the child; (iii) whether she is in a good relationship with her partner; (iv) whether a doctor has ever diagnosed her with depression or anxiety; (v) her life satisfaction; and (vi) whether she is currently at work. 32 In Table 8, we report our estimates of bilingual maternal behavior toward children from age three to fourteen. We find evidence that mothers who were identified as bilinguals are more likely to spend time with their child; they are more likely to be in a good relationship with their partner and to have higher life satisfaction. They are also less likely to experience depressive symptoms. Overall, the evidence on healthy maternal investments and home environment suggests that these inputs may allow children of bilingual families to "catch up" and develop comparable levels of skills relative to their monolingual peers, which is in line with our findings.

Robustness
The differences between the bilingual and monolingual samples, shown in Table 1, highlight the challenge involved in deriving an empirical method that effectively controls for unobserved child characteristics. Given that we have no experimental design, it is crucial to show that our results are robust to different sample restrictions. Firstly, we expect children whose parents are both foreign-born to be exposed to foreign languages more consistently than those with at least one native-born parent. Given this, we restrict the sample to children with two foreign-born parents. For instance, at age three we exclude 607 children on account of having a native-born parent. Table A2 in the Appendix shows that the results by age group, accounting for child characteristics and past endowments, are robust to this sample restriction. The estimated bilingual effect is very similar to those reported previously in Panel (C) in Tables 2-4, and maintains the result that the language disadvantage fades with age. The differences in the SDQ and pro-social outcomes are not significantly different from our main results. The coefficient on past language development ranges from approximately 0.19 to 0.44, indicating that there is a significant loss of language skills over time.
Secondly, we exclude children whose parents arrived in the UK between 1997 and 2002 (this results in 585 children being dropped from the sample). The estimates, which are reported in Table A3 in the Appendix, generally show similar results. We also test the robustness of our sample restriction by re-estimating our model on an expanded sample that includes those who report speaking Welsh (n = 196) or Gaelic (n = 2) in wave 1. This allows us to consider a notion of bilingualism that is not based solely on the language spoken at home. The estimates, available on request, are qualitatively similar. If anything, the magnitude of the estimates increases slightly in absolute value after the inclusion of the Welsh speakers.
32 Specifically, as part of the self-completion module, the main respondent (usually the biological mother) is asked about the amount of time she usually spends with the child, and the possible responses are: "plenty of time", "just enough", "not quite enough", and "nowhere near enough", with higher scores corresponding to less time spent with the child. The variable "frequency reading" is measured on a 5-point scale, with a score of 5 indicating that the mother reads every day and a score of 1 corresponding to "not reading at all". The variable "good relationship with partner" takes the value 1 if the mother responds "strongly agree" or "agree" to the question, "Your partner is usually sensitive to and aware of your needs. Do you agree?", which is asked in waves 1, 2, and 3. In each wave, respondents were asked whether a doctor had ever diagnosed them with depression or anxiety. If they respond positively, the variable "depressive symptoms" is assigned a 1; otherwise the value is 0. Life satisfaction is measured on an 11-point scale, with a higher score corresponding to a better outcome.
Another potential concern is that any misclassification error induced by the categorical assignment of the language spoken at home may cause a bias in our estimates. 33 We address this by testing the robustness of our results with an alternative definition of being bilingual. Specifically, we allow the following categories to enter the model: i) mostly English and sometimes other languages spoken at home; ii) only other languages spoken at home; and iii) only English spoken at home, which serves as the omitted category. The OLS estimates, available on request, lend overall support to our main findings. Specifically, the levels of language development in children who mostly speak English, but also other languages, at home and those who speak only other languages at home, are well behind the corresponding levels for their "English only" counterparts when they are aged three and five. The negative language impact is greater for children who mostly speak languages other than English at home. By the ages of 11 and 14, the differences in their development are not statistically significant.
33 See Dustmann and Van Soest (2004) for an analysis of speaking fluency of immigrants using ordered response models with classification errors.
Notes: Robust standard errors, which are clustered at the family level, are placed in parentheses. All regressions are weighted to account for the non-response rate. Samples have been restricted to children for whom complete information is available on language, emotional, and pro-social behavior. All specifications control for a series of mother cohort dummies, ethnicity, marital status, highest grade completed at child's birth, whether the mother comes from a non-English speaking background, gender of child, number of siblings, poverty indicator, and regional controls. ***Significant at the 1% level.
We further consider a propensity score matching (PSM) analysis. 34 This approach consists of matching "treated" children (i.e., bilingual children) with "untreated" children, based on their observed pretreatment characteristics, and then comparing their language, emotional, and pro-social behaviors. An average treatment effect on the treated (ATT) is obtained by averaging individual-level differences in behavior between the treated and untreated groups. In general, there are several advantages to using PSM methods. First, matching estimators do not impose functional form restrictions, nor do they assume that the treatment effect is homogeneous across the population (Zhao, 2005). Second, with a sufficient vector of observables, matching has been shown to yield estimates that compare favorably with experimental studies (Michalopoulos et al., 2004; Smith and Todd, 2001). Finally, within the context of our study, a benefit of this approach is that it does not rely on children switching their bilingual status for identification, as in our dynamic analysis. Since it is not necessarily clear a priori which matching algorithm should be implemented, it is standard practice to present results from multiple techniques (see Anderson, 2013). We consider multivariate nearest neighbor matching, kernel matching, and inverse probability weighting (IPW). Technically, the IPW approach, for example, increases the similarity of the distribution of covariates in the treated and control groups via reweighting, which reduces reliance on the functional form of the OLS specification. The multivariate nearest neighbor approach allows us to control for prior achievement.
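The reweighting logic behind the IPW estimate of the ATT can be illustrated with a minimal sketch on simulated data. It is not the authors' implementation: the single covariate, the logit propensity model fitted by Newton-Raphson, the assumed treatment effect of -0.3, and all function names are hypothetical. Treated observations receive weight one, and each control is weighted by its estimated odds of treatment, p/(1 - p), so that the reweighted controls resemble the treated group.

```python
import math
import random


def simulate(n, effect, seed=0):
    """One confounder x raises both treatment probability and the outcome."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * xi)))
        di = 1 if rng.random() < p else 0
        x.append(xi)
        d.append(di)
        y.append(xi + effect * di + rng.gauss(0.0, 1.0))
    return x, d, y


def fit_logit(x, d, iterations=25):
    """Newton-Raphson for P(d = 1 | x) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += di - p            # score with respect to the intercept
            g1 += (di - p) * xi     # score with respect to the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def naive_gap(d, y):
    """Raw difference in mean outcomes, treated minus controls."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    controls = [yi for di, yi in zip(d, y) if di == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


def ipw_att(x, d, y):
    """ATT: treated mean minus odds-weighted mean of the controls."""
    a, b = fit_logit(x, d)
    treated_sum = treated_n = control_sum = control_w = 0.0
    for xi, di, yi in zip(x, d, y):
        if di == 1:
            treated_sum += yi
            treated_n += 1.0
        else:
            odds = math.exp(a + b * xi)  # equals p / (1 - p) under a logit
            control_sum += odds * yi
            control_w += odds
    return treated_sum / treated_n - control_sum / control_w


x, d, y = simulate(20000, effect=-0.3, seed=2)
raw = naive_gap(d, y)
att = ipw_att(x, d, y)
print("raw gap:", round(raw, 3), "IPW ATT:", round(att, 3))
```

In this design the raw gap has the wrong sign because treated units have higher values of the confounder, while the reweighted comparison is close to the assumed effect. The result relies on the propensity model being correctly specified and on selection operating only through observables.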
The PSM estimates, performed by age group, are generally consistent with those reported from OLS models, Panel (B) (without past endowments), with slightly larger effects, on average, being reported from the PSM estimator (see Table A4 in the Appendix). In most cases where the OLS estimates are statistically significant, the corresponding PSM estimates are not significantly different from them, suggesting that the degree to which PSM tightens the OLS bound is small, and that OLS seems to perform a reasonable job despite our reservations. Regardless of the matching algorithm used, results are consistent with our main findings.
To examine the potential role of unobserved heterogeneity, we also use recently developed tests that explore the stability of the coefficients of interest while increasing the set of control variables. We report estimates of the parameter δ, developed by Oster (2019), that indicates the level of selection on unobserved variables, as a proportion of the level of selection on observed variables, required to drive the treatment effect to zero. The assumptions underlying the calculation of δ can be varied. In our case, we vary the assumed value of $R_{\max}$, the $R^2$ from a hypothetical regression of the outcomes of interest on treatment (bilingual status) and both observed and unobserved controls. If language, emotional, and pro-social outcomes are fully explained by the language spoken at home and the full set of controls in the specification, then $R_{\max} = 1$. 35 The estimates of δ, which are reported in Table A5 in the Appendix, correspond to the bilingual coefficients which are estimated with full controls and past endowments. The resulting estimates of δ all exceed 1 in absolute value, aside from the estimates for emotional development at age five.
34 Propensity score matching may be thought of as assuming the selection problem away, because it relies on conditional independence, which implies no selection on unobservables conditional on observables.
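As a rough illustration of how δ responds to coefficient and R-squared movements, the sketch below uses the simplified approximation discussed in Oster (2019), in which the bias-adjusted coefficient is approximately the controlled coefficient minus δ times the coefficient movement, scaled by the ratio of unexplained to explained R-squared movement. Setting that adjusted coefficient to zero and solving gives the expression in the code. This is an assumption-laden shortcut, not the exact estimator behind Table A5, and the function name, argument names, and input values are hypothetical.

```python
def approximate_oster_delta(beta_short, r2_short, beta_full, r2_full, r_max):
    """Approximate delta that would drive the adjusted coefficient to zero.

    beta_short, r2_short: coefficient and R-squared without controls.
    beta_full, r2_full:   coefficient and R-squared with the full controls.
    r_max:                assumed R-squared if unobservables were included.
    """
    coefficient_movement = beta_short - beta_full
    explained_movement = r2_full - r2_short
    unexplained_movement = r_max - r2_full
    if coefficient_movement == 0 or unexplained_movement == 0:
        raise ValueError("delta is undefined when either movement is zero")
    return (beta_full * explained_movement) / (
        coefficient_movement * unexplained_movement
    )


# Hypothetical inputs, not taken from the paper's tables.
r2_full = 0.25
r_max = min(1.0, 1.3 * r2_full)  # the 1.3 scaling rule of thumb
delta = approximate_oster_delta(
    beta_short=-0.40, r2_short=0.05,
    beta_full=-0.30, r2_full=r2_full,
    r_max=r_max,
)
print("approximate delta:", round(delta, 2))
```

With these made-up inputs, selection on unobservables would need to be several times as strong as selection on observables to explain away the coefficient. An absolute value above 1 is the conventional benchmark for robustness.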

Attrition
One potential concern is possible non-random attrition in the MCS between waves. In Table A6 in the Appendix, we compare selected descriptive statistics for individuals who attrite from the sample for any of the three outcomes and those who remain in the sample in wave 6 (age 14). In general, the verbal reasoning scores of children who attrite are slightly lower on average than those of children who do not. Even if attrition is not random, it is important to note that if attrition were attributable to fixed individual characteristics, our use of a first-difference estimator in the GMM implies that our estimates remain unbiased.
We also test whether attrition in any of the three outcomes is random using the approach of Fitzgerald et al. (1998). This approach is based on the assumption that all determinants of attrition can be controlled for (that is, selection is based on observables). Specifically, we implement a probit model where our dependent variable takes the value of 1 for individuals who drop out of the sample in wave 6 due to non-response in each of the three outcomes, and the value of zero for individuals who remain in the sample. Results reported in Table A7 in the Appendix correspond to three separate probit models for attrition in the verbal, emotional, and pro-social development outcomes. The pseudo R-squared values from our attrition probits suggest that baseline variables explain less than 3% of attrition in verbal reasoning, emotional, and pro-social development outcomes between waves 2 and 6. 36 Variables that significantly predict attrition in verbal test outcomes include the child's past verbal test score, age, and the marital status and education of the mother. A Wald test of whether the explanatory variables are jointly equal to zero results in $\chi^2 = 68.6$, suggesting the joint significance of these variables in predicting attrition. Note that bilingual status and father's/mother's background are insignificant predictors.
We then use the inverse of the fitted probabilities from these probit models to construct weights and adjust our main dynamic estimates, as reported in Table 5. The results, available on request, show that the inverse probability weighted estimates are numerically similar, and qualitatively identical, to the original estimates; therefore, we conclude that the attrition in the main outcomes is not likely to affect our estimates. Since our weights take into account important observable information, including baseline cognitive scores, we conclude that it is unlikely for unobservable factors to drive an attrition process which may substantially change our results.
35 The suggested cut-off to define an "acceptable" level of selection is an estimate of δ (calculated using $R_{\max} = 1.3\,\tilde{R}^2$, where $\tilde{R}^2$ is the R-squared from the regression with full controls) that exceeds 1, with a higher value indicating increased robustness to selection on unobservables (see Oster, 2019 for details).
36 The results are available on request.
Finally, in the Appendix, we provide quantitative evidence on the importance of missing data and show that this unavailability is, in our case, unlikely to cause bias. We test whether attrition in any of the three outcomes is random using the approach of Fitzgerald et al. (1998) and show that attrition is not likely to affect our results.
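The inverse probability adjustment for attrition described in this section can be sketched on simulated data as follows. This is a simplified stand-in rather than the procedure used in the paper: instead of fitted probabilities from a probit, the sketch uses observed retention rates within three hypothetical baseline-score groups, and all names, group definitions, and retention rates are assumptions made for illustration.

```python
import random


def simulate_cohort(n_children, seed=0):
    """Children in lower baseline-score groups are assumed to drop out more."""
    rng = random.Random(seed)
    retention = {0: 0.5, 1: 0.7, 2: 0.9}  # assumed probability of staying
    rows = []
    for _ in range(n_children):
        group = rng.randrange(3)                # baseline score group
        outcome = group + rng.gauss(0.0, 1.0)   # later outcome rises with group
        stayed = rng.random() < retention[group]
        rows.append((group, outcome, stayed))
    return rows


def retention_rates(rows):
    """Share of each baseline group still observed in the final wave."""
    counts, stayers = {}, {}
    for group, _, stayed in rows:
        counts[group] = counts.get(group, 0) + 1
        stayers[group] = stayers.get(group, 0) + int(stayed)
    return {g: stayers[g] / counts[g] for g in counts}


def weighted_stayer_mean(rows, rates):
    """Mean outcome among stayers, weighting each by 1 / P(stay | group)."""
    num = den = 0.0
    for group, outcome, stayed in rows:
        if stayed:
            weight = 1.0 / rates[group]
            num += weight * outcome
            den += weight
    return num / den


rows = simulate_cohort(30000, seed=3)
full_mean = sum(outcome for _, outcome, _ in rows) / len(rows)
stayer_outcomes = [outcome for _, outcome, stayed in rows if stayed]
naive_mean = sum(stayer_outcomes) / len(stayer_outcomes)
ipw_mean = weighted_stayer_mean(rows, retention_rates(rows))
print("full:", round(full_mean, 3),
      "stayers only:", round(naive_mean, 3),
      "reweighted:", round(ipw_mean, 3))
```

In this design the unweighted mean among stayers overstates the full-cohort mean, and the reweighted mean moves back toward it. The correction works only to the extent that attrition is driven by the observables used to build the weights.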

Conclusion
Relatively few empirical studies evaluate the key issues surrounding the development of language in bilingual children over time, despite a burgeoning literature on child cognitive and non-cognitive development more generally. In this article, we model the language, emotional, and pro-social development production functions for bilingual children, using cumulative value-added specifications which account for parental investments and children's own ability.
We emphasize several aspects of our findings. First, analysis by child age confirms that very young bilingual children initially have worse language skills than their monolingual peers. The start of schooling appears to attenuate these differences, and at age seven bilingual children show a language advantage. Across the whole age range, we find that, on average, there is no bilingual gap in language development once we control for lagged developmental outcomes and for a rich set of controls in a dynamic panel data framework.
There are several possible channels for this initial result. For example, Feng et al. (2014) show that the bilingual gap in early reading scores is explained by family differences in socioeconomic disadvantage as well as in parental, home, and school investment. Both may be particularly important if we consider the improvement in language skills at age seven, and the declining influence of the lag of bilingual status, as being a result of the increasing exposure of bilingual children to English as they age through participation in various social activities, childcare, and starting school. We show that there are corresponding patterns of disparities in parental behavior and home environment that bilingual mothers provide to their children.
Specifically, maternal time inputs are greater for children whose mothers are identified as bilingual.
At the same time, our analysis suggests that bilingual boys have fewer behavioral difficulties on average as compared with their monolingual peers. Specifically, all else being equal, speaking a foreign language at home when the boy is seven years old reduces the total difficulty score by 0.19 standard deviations. The outcome persistence parameters are generally low but slightly higher in the emotional development production function; this suggests that the production functions for language and emotional development are different.
We also find a fair amount of heterogeneity by mother's education. For instance, the language disadvantage for children whose mothers are highly educated is insignificant from the age of five onward; this suggests that parental education does play an important role in children's learning prospects. In comparison, five-year-old bilingual children of mothers with low education continue to fall behind in their expressive verbal skills. Interestingly, at the age of seven, children whose mothers are educated up to A-level or below mainly drive the bilingual language advantage; it might be the case that these parents start investing more heavily in home activities to enhance the learning environment of their children (possibly trying to compensate for the early bilingual cognitive disadvantage).
Ultimately, the most important finding is that, overall, bilingual children are not significantly different from their monolingual peers in language development, taken together with the positive effect of bilingualism on emotional development. With the advent of arguments promoting language acquisition in school from a relatively young age as a means to promote economic competitiveness and growth, the lack of any gap in language, emotional, and pro-social development in bilingual children is an important result.

Availability of data and material
Data from the MCS are made available by the Centre for Longitudinal Studies (CLS) through the Data Archive. Neither the CLS nor the Data Archive bears any responsibility for the analysis or interpretation of the data reported here.

Ethical standards
Not applicable.

Notes: Standard errors, which are clustered at the family level, are placed in parentheses. All regressions are weighted to account for the non-response rate. The results correspond to our full specification, which includes child's age in months, gender, ethnicity and birth weight, mother's age and marital status, mother's and father's highest qualification measured at child's age 9 months, a young migrant indicator, mother and father coming from non-English speaking background, biological mother present in the household, number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls along with time-varying characteristics measured at period a-1, such as poverty indicator, number of siblings, marital status, presence of biological mother, and lagged input measures for language and emotional development. **Significant at the 5% level. *Significant at the 10% level.

Notes: Standard errors, which are clustered at the family level, are placed in parentheses. All regressions are weighted to account for the non-response rate. The results correspond to our full specification, which includes child's age in months, gender, ethnicity and birth weight, mother's age and marital status, mother's and father's highest qualification measured at child's age 9 months, a young migrant indicator, mother and father coming from non-English speaking background, biological mother present in the household, number of siblings, frequency of reading to the child or library visits, poverty indicator, regional controls along with time-varying characteristics measured at period w-1, such as poverty indicator, number of siblings, marital status, presence of biological mother, and lagged input measures for language and emotional development. **Significant at the 5% level. *Significant at the 10% level.

Notes: Bootstrapped standard errors are placed in parentheses. **Significant at the 5% level. *Significant at the 10% level.

Notes: δ is the estimate of the delta parameter developed in Oster (2019), which indicates how much selection on unobserved variables would be required to drive the "bilingual" estimate to zero, measured as a proportion of the selection on observed variables.

Notes: Parentheses contain t-statistics. Brackets contain standard deviations. ***Significant at the 1% level. **Significant at the 5% level. *Significant at the 10% level.