IPUMS-Constructed Family Interrelationship Variables
Basic Family Interrelationship Variables: SPLOC, MOMLOC, and POPLOC
Consider the household in Table 1. RELATE is sufficient to establish that the two daughters are both children of the household head/householder, but to identify the other family interrelationships we must look to the daughters' other characteristics. We can infer that the son-in-law is married to the second daughter rather than the first because they share the same surname and are both listed as married. For analogous reasons, the grandchild is probably the child of the second daughter listed. It is also safe to assume that the two boarders are married to one another: both are married, they share the same surname, they are adults close to the same age, and they are listed adjacently.
Table 1. Family Relationships to Household Head
To let users identify relationships among spouses, parents, and children without resorting to multiple variables and complicated logic, the IPUMS includes a set of pointers called SPLOC, MOMLOC, and POPLOC. These pointers identify the location within the household of each individual's own spouse, mother, and father, respectively. Table 2 illustrates these variables. PERNUM (Person number in unit) is an IPUMS variable that indicates each individual's position within the household as listed on the original census form. SPLOC shows the PERNUM of each individual's own spouse. In Table 2, the son-in-law is married to the second daughter, whose PERNUM is 04; the son-in-law's SPLOC is therefore 04, the same as his wife's PERNUM. MOMLOC and POPLOC show the PERNUMs of own mothers and own fathers; for example, the mother and father of the grandchild are in positions 04 (MOMLOC) and 03 (POPLOC). Of course, many persons do not have a spouse, mother, or father living in the household with them; these cases are assigned a code of 00 for the appropriate variable(s).
Table 2. Family Relationships with SPLOC, MOMLOC, and POPLOC
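A minimal Python sketch of how the pointers work, assuming each household has been read into a list of records: a pointer is resolved by looking up the PERNUM it holds, and code 0 resolves to nobody. The household is hypothetical; only the links among the son-in-law (03), the second daughter (04), and the grandchild follow the text, and the helper names (`follow`, `spouse_age`, `parents_present`) are invented for illustration. `spouse_age` attaches a spouse characteristic, and `parents_present` is the kind of individual-level measure (for example, children living with their mother only) that these pointers make easy to build.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    pernum: int  # PERNUM: position in the household listing
    relate: str  # relationship to head (simplified text label, hypothetical)
    age: int
    sploc: int   # PERNUM of own spouse; 0 = not in household
    momloc: int  # PERNUM of own mother; 0 = not in household
    poploc: int  # PERNUM of own father; 0 = not in household


# Hypothetical household. Only the links among the son-in-law (3), the second
# daughter (4), and the grandchild follow the text; everything else is invented.
household = [
    Person(1, "head", 58, 0, 0, 0),
    Person(2, "daughter", 30, 0, 0, 1),
    Person(3, "son-in-law", 27, 4, 0, 0),
    Person(4, "daughter", 25, 3, 0, 1),
    Person(5, "grandchild", 2, 0, 4, 3),
    Person(6, "boarder", 40, 7, 0, 0),
    Person(7, "boarder", 38, 6, 0, 0),
]


def index_by_pernum(persons: List[Person]) -> Dict[int, Person]:
    """Map PERNUM -> Person within one household (one SERIAL)."""
    return {p.pernum: p for p in persons}


def follow(index: Dict[int, Person], loc: int) -> Optional[Person]:
    """Resolve a SPLOC/MOMLOC/POPLOC pointer; code 0 means nobody in the household."""
    return index.get(loc) if loc != 0 else None


def spouse_age(index: Dict[int, Person], person: Person) -> Optional[int]:
    """Attach a characteristic (here AGE) of the person's own spouse."""
    spouse = follow(index, person.sploc)
    return spouse.age if spouse is not None else None


def parents_present(index: Dict[int, Person], person: Person) -> str:
    """Individual-level measure: which own parents live in the household."""
    has_mom = follow(index, person.momloc) is not None
    has_pop = follow(index, person.poploc) is not None
    if has_mom and has_pop:
        return "both parents"
    if has_mom:
        return "mother only"
    if has_pop:
        return "father only"
    return "no parent"


idx = index_by_pernum(household)
print(spouse_age(idx, idx[3]))       # 25: the son-in-law's wife is person 4
print(parents_present(idx, idx[5]))  # both parents
print(parents_present(idx, idx[2]))  # father only
```

Because SPLOC is reciprocal, a quick sanity check on real data is that following a person's SPLOC and then the spouse's SPLOC leads back to the original PERNUM.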
SPLOC, MOMLOC, and POPLOC can be used to identify conjugal units, to attach characteristics of spouses or parents, to develop specialized own-child measures, or to serve as building blocks for more elaborate measures of family composition. In most cases, users will be able to manipulate these variables to construct their own measures within a statistical package and will not be forced to resort to higher-level programming.1
Most scholarly family classification schemes are built up from information on the presence of immediate kin. The basic Census Bureau classifications focus on the presence of spouses and children of the household head/householder; the Laslett scheme widely used by historians is based on a count of "conjugal family units" consisting of parents and children or married couples.2 SPLOC, MOMLOC, and POPLOC make it relatively simple to construct such classifications.
Family historians are increasingly moving from household-level schemes of family classification toward individual-level measures of family structure. For example, instead of measuring the proportion of households headed by a single female parent, we might assess the proportion of women who were single parents or the proportion of children residing with their mothers only. Such individual-level analyses offer a variety of advantages that have been detailed elsewhere.3 The individual-level IPUMS pointer variables are especially well suited to creating these kinds of measures.
Additional Constructed Family Variables
The additional individual-level constructed variables on family and
household relationships listed in Table 3 are fully illustrated in Table
4. FAMSIZE (Number of own family members) and FAMUNIT (Family unit membership)
use the same definition of family employed for NFAMS. FAMSIZE is useful
for creating a variety of family measures. For example, to determine whether
a family contains extended kin beyond spouse and children, one can subtract
the number of the person's spouse and children present from FAMSIZE. Because
FAMSIZE also counts the person, a result of one means the family consists
only of the person and his or her immediate family; if the result is greater
than one, other kin are present. More complex
measures of extended family configurations can be constructed using FAMUNIT,
which in combination with SERIAL
provides a unique identifier for each related group in the census.
Table 4. Family Relationships with Additional Constructed Family Variables
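The FAMSIZE test and the SERIAL/FAMUNIT identifier described above can be sketched in Python as follows. The function name and records are hypothetical; in practice the spouse and child counts might come from SPLOC and NCHILD. Since FAMSIZE and FAMUNIT use the same definition of family, counting members per (SERIAL, FAMUNIT) pair should agree with each member's FAMSIZE.

```python
from collections import Counter


def has_extended_kin(famsize: int, spouse_present: bool, n_children: int) -> bool:
    """FAMSIZE includes the person, so FAMSIZE - (spouse + children) > 1 means
    at least one other relative belongs to the same family."""
    immediate = int(spouse_present) + n_children
    return famsize - immediate > 1


# Hypothetical (SERIAL, PERNUM, FAMUNIT) records. FAMUNIT numbering restarts in
# each household, so only the (SERIAL, FAMUNIT) pair identifies a family.
records = [
    (101, 1, 1), (101, 2, 1), (101, 3, 1),  # household 101, first family
    (101, 4, 2), (101, 5, 2),               # household 101, a second family
    (102, 1, 1),                            # household 102, one person
]

family_sizes = Counter((serial, famunit) for serial, _pernum, famunit in records)

print(has_extended_kin(5, True, 2))  # True: one relative beyond self, spouse, 2 children
print(has_extended_kin(4, True, 2))  # False
print(family_sizes[(101, 1)], family_sizes[(101, 2)], family_sizes[(102, 1)])  # 3 2 1
```

Keying on the pair rather than on FAMUNIT alone is what keeps family 1 of household 101 distinct from family 1 of household 102.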
The IPUMS also includes the four most commonly requested measures of own children, derived from MOMLOC, POPLOC, and AGE: NCHILD (Number of own children), ELDCH (Age of eldest own child), YNGCH (Age of youngest own child), and NCHLT5 (Number of own children under age 5). Finally, NSIBS (Number of own siblings) counts the number of persons within the household who share a common parent with the individual or who have family relationship codes that imply a sibling relationship.
Creation of MOMLOC and POPLOC
For persons who have family relationships other than head, wife, child, sibling, or sibling-in-law, the relationship information does not identify parental relationships with as much precision. For example, we know that a person listed as a grandchild of the head is the child of one of the household head's children and/or children-in-law. However, if the family contains more than one person listed as child or child-in-law, we cannot always be sure which one(s) are the grandchild's parent(s). Even if only one child is present, there is still room for error, since a grandchild could be the offspring of persons not living in the household. In some cases, such as secondary families consisting of boarders, the relationship codes may provide no information for linking parents and children.
Whenever the family relationship codes are unclear, we must turn to other information to identify parent-child relationships. Every census from 1880 to 1990 contains three additional pieces of information that can be used to clarify ambiguities: age, marital status, and the order in which individuals are listed on the census form.5 For example, if a household contains a widowed daughter followed immediately by a grandchild who is twenty years younger than the daughter, we may reasonably infer a mother-child relationship even if other daughters are present. Each census year also includes other information that can be used to distinguish parental relationships, but the availability of this information is irregular. For example, in the census years 1880, 1910, 1920, 1940, and 1950, we can identify persons who share the same surname; for 1970, 1980, and 1990, on the other hand, we can identify the number of children ever born to every adult woman.
Our procedure for linking parents and children attempted to reconcile two competing goals. The first goal was to create fully compatible links by using only information available across all census years.
The second was to identify, as accurately as possible, as many of the total links as we could for each census year. To accomplish the latter, we had to use all information available for any given year. The IPUMS programming rules for MOMLOC and POPLOC therefore represent a compromise between the conflicting goals of compatibility and completeness. The linking rules are described in detail in the Appendix to this chapter. Compatibility is our first priority, so we begin by establishing all parental relationships that could plausibly be identified using only relationship, age, sex, marital status, and sequence in the household listing, that is, information available in all of the component samples. The first three rules (see the Appendix to this chapter) reflect this part of the process. The next four rules use information available only in some census years to comb out the few parent-child links not identified by the first three rules. We developed these rules through trial-and-error experimentation, continually checking the results of the programming against our own interpretations of a collection of the most problematic household records selected from several census years, and then fine-tuning the rules until we were satisfied. The variables MOMRULE and POPRULE identify which particular logical rule was used to establish a parental relationship in any given case. For analyses comparing multiple census years, users can ensure full compatibility by using only those links that were established under rules 1 through 3. In practice, the additional information available in particular census years does not make a great deal of difference.
For the censuses of 1880 through 1960, 99.5 percent or more of parental links were established by means of the first three logical rules.6 In recent census years, the percentage of cases requiring additional information has risen because marital status has become less of a determinant of parenthood: by 1990, "only" 97.9 percent of maternal links were established by means of the first three rules.
Identifying Stepparents
Table 5. STEPMOM/STEPPOP Values
a See Appendix.
The frequency of value 2 for STEPMOM is lower than the frequency of value 7 for MOMRULE only because most mothers assigned under rule 7 have an improbable age difference and are therefore assigned a STEPMOM value of 1. Where more than one value for STEPMOM or STEPPOP was valid, the lower value was assigned. To analyze biological children, one can eliminate links with a value greater than zero on STEPMOM or STEPPOP. When comparing successive census years, one should use only values 1 and 2 of STEPMOM and STEPPOP, since they are the only ones consistently available.
With the exception of the 1900 and 1910 census years, 2 percent or less of children can be identified as stepchildren or adopted children. The frequency of identifiable stepchildren is somewhat higher in 1900 and 1910, which is not surprising since those census years provide more relevant information than any others; in particular, they are the only years that indicate the number of surviving children for each woman. The true percentage of stepchildren and adopted children is no doubt higher in all census years than STEPMOM and STEPPOP indicate. Because we cannot identify all biological children, own-child fertility estimates derived from the census will be slightly biased. In particular, we would expect estimates of mothers' ages at childbirth to be a bit low, because second and third wives are on average younger than first wives.
Creation of SPLOC
The spouse links were carried out by means of seven IPUMS programming rules described in the Appendix to this chapter. These rules use only information that is available in all census years and are therefore fully compatible. Even though the spousal rules ignore much relevant information available in particular census years (such as surname, marriage duration, and age at first marriage), we nevertheless consider them much more reliable than the rules governing parental links.
Comparison of IPUMS and 1910 Sample Linking Procedures
We experimented extensively with similar probability-based point systems for assigning links, but we found them unsatisfactory. The importance of any particular characteristic depends on its context. For example, surnames assume great significance when the relationship codes are ambiguous, but they should otherwise be ignored. A simple additive point system proved incapable of such distinctions. The 1910 procedure ran into similar difficulties: despite the complexity of its probability-based linking system, it could identify only the most straightforward links. More than 20 percent of individuals in the sample (some 75,000) fell into the gray zone and had to be reexamined by hand. If we had adopted a similar procedure for the IPUMS, it would have meant looking up about 10 million cases individually, which would have multiplied the cost of the IPUMS manyfold.
The logical rules described in the Appendix to this chapter produce results very similar to the 1910 procedure at a fraction of the cost. Excluding stepchildren, the maternal links obtained through the two methods differed in 0.66 percent of cases. We examined each case where the two methods differed and found that in many of them the 1910 links were clearly correct. In most cases, however, the census listings are truly ambiguous, and the links are a matter of guesswork. The spousal links are more clear-cut: the IPUMS and the 1910 procedures produce identical results in over 99.9 percent of cases, even though the IPUMS method ignores all variables that are not available for the entire period from 1880 to 1990.
Imputing Relationships and Interrelationships for 1850, 1860, and 1870: IMPREL, IMPMOM, IMPPOP, and IMPSP
The 1850-1870 census instructions to marshals specified that within each household, "the names are to be written beginning with the father and mother; or, if either, or both, be dead, begin with some other ostensible head of the family; to be followed, as far as practicable, with the name of the oldest child residing at home, then the next oldest, and so on to the youngest, then the other inmates, lodgers and boarders, laborers, domestics, and servants." In addition to this sequential information, the 1850-1870 censuses provide other valuable clues to family relationship: surname, age, sex, occupation, and birthplace. These form the basis for the IPUMS variable IMPREL (Imputed relationship), which in turn is used to create IMPMOM (Imputed location of mother), IMPPOP (Imputed location of father), and IMPSP (Imputed location of spouse).
Imputed relationship (IMPREL): Most of the time, 1850-1870 relationships could be inferred using a rather simple set of logical rules. However, about a quarter of the cases were too ambiguous to determine in this way. For these, we designed a probabilistic "hot deck" imputation procedure similar to the procedures that the Census Bureau uses to allocate missing and inconsistent information.
Logical rules: Seventy-five percent of cases were assigned by the following logical rules:
Note that, contrary to the usual practice in such allocation procedures, we did not include race as a predictor. In the context of the other nineteen variables, race was an insignificant predictor of relationship to head. Moreover, the 1880 black population, which contained many freed slaves, probably differed significantly from the free black population of the 1850 and 1860 samples. (Slaves are not included in the 1850 sample because that census did not collect enough information about them.) This would make the 1880 variable RACE an unreliable predictor for 1850 and 1860.
We tested the imputation procedure by applying it to the 1880 sample, matching each person to another 1880 person. (We instructed the program not to match a person to him/herself or to any other person in the same household.) We also imputed relationships for the 1910 sample, matching persons to 1880 donors in order to see whether thirty years of change in household composition would introduce unacceptable biases. Both tests yielded satisfactory results. For 1880, the imputed relationship matched the relationship listed on the census form (and included in RELATE) 95 percent of the time; for heads, wives, and their children, this figure rises to nearly 99 percent. For 1910, the results were virtually the same. Furthermore, the method is unbiased: it yields the correct distribution of family relationships for both samples. Since 1880 donors produced imputations for 1910, thirty years later, that were just as unbiased and reliable as those for 1880 itself, the imputed relationships for 1850-1870, which are no more than thirty years removed from the 1880 donors, should also be unbiased and reliable. Nevertheless, as with any imputed variable, users should exercise reasonable caution. In particular, they should note that any differentials in household structure between population subgroups (e.g., by class, race, or ethnicity) are likely to be slightly understated due to random error in the imputation.
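The validation design above (imputing each 1880 person's relationship from another 1880 person outside his or her household, then measuring agreement with RELATE) can be illustrated with a generic cell-based random hot deck. This is a simplified stand-in, not the IPUMS procedure: the nineteen actual predictors and the donor-selection details are not listed in this section, so the predictor names `sex`, `adult`, and `position` and the two-household sample are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def hot_deck(recipients: List[dict], donors: List[dict],
             predictors: Sequence[str], rng: random.Random) -> List[Optional[str]]:
    """Cell-based random hot deck (a generic sketch, not the IPUMS procedure).

    Each recipient receives the 'relate' value of a donor drawn at random from
    the donors with identical predictor values. Donors from the recipient's own
    household (same 'serial'), which includes the recipient itself, are excluded.
    Returns None when no eligible donor exists.
    """
    cells: Dict[tuple, List[dict]] = defaultdict(list)
    for d in donors:
        cells[tuple(d[k] for k in predictors)].append(d)
    imputed: List[Optional[str]] = []
    for r in recipients:
        key = tuple(r[k] for k in predictors)
        pool = [d for d in cells.get(key, []) if d["serial"] != r["serial"]]
        imputed.append(rng.choice(pool)["relate"] if pool else None)
    return imputed


def agreement_rate(recipients: List[dict], imputed: List[Optional[str]]) -> float:
    """Share of recipients whose imputed relationship equals the recorded one."""
    hits = sum(1 for r, v in zip(recipients, imputed) if v == r["relate"])
    return hits / len(recipients)


# Hypothetical test sample: two identically structured households, so each
# person's only eligible donor is the counterpart in the other household.
sample = [
    {"serial": s, "sex": sex, "adult": adult, "position": pos, "relate": rel}
    for s in (1, 2)
    for sex, adult, pos, rel in [("M", True, 1, "head"),
                                 ("F", True, 2, "wife"),
                                 ("F", False, 3, "child")]
]

imputed = hot_deck(sample, sample, ["sex", "adult", "position"], random.Random(0))
print(agreement_rate(sample, imputed))  # 1.0
```

Excluding donors by household rather than by person is what enforces both restrictions described in the test at once, since a person always shares a household with him/herself.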
Imputed family interrelationships (IMPMOM, IMPPOP, and IMPSP): Just as the IPUMS uses RELATE (Relationship to head) to create MOMLOC, POPLOC, and SPLOC for 1880 through 1990, it uses IMPREL to create IMPMOM (Imputed location of mother), IMPPOP (Imputed location of father), and IMPSP (Imputed location of spouse) for 1850-1870. The rules are described in the Appendix to this chapter. The 1850-1870 samples also contain imputed versions of the constructed variables mentioned above: NFAMS, NCOUPLES, NMOTHERS, NFATHERS, FAMUNIT, NCHILD, ELDCH, YNGCH, NCHLT5, and NSIBS. These are all located in the same IPUMS columns as their respective later-year counterparts.
Logical Rules for Inferring MOMLOC/IMPMOM, POPLOC/IMPPOP, and SPLOC/IMPSP
In a few instances, however, it is necessary to use additional information available only for a subset of census years. The IPUMS linking procedure is designed so that users can either rely only on links based on information available across all census years or use the extra information available in a particular census year to make additional links.
Parental Links
Rule 1. Unambiguous relationships
If the relationship of the individual to the household head/householder is listed as grandson, granddaughter, or grandchild, then establish a parental link to the most proximate listed ever-married child and/or child-in-law with a plausible age difference. Plausible age differences are defined as 12 to 54 years for women and 15 to 74 years for men. If there is more than one eligible parent, choose the most proximate.
Rule 3. All other relatives and nonrelatives via household position
Rule 4. All other relatives and nonrelatives via surname
Rule 5. Grandchildren via children-ever-born or surviving
Rule 6. All other relatives and nonrelatives via children-ever-born or surviving
Rule 7. Spouse of linked parent
Users who want to limit their analysis to links that could be recognized in all census years can simply ignore links based on Rules 4 through 7. In each year, over 95 percent of links were established on the basis of Rule 1. For the period before 1980, over 99 percent of cases were linked on the basis of Rules 1, 2, or 3, which are fully compatible across census years. With the increase of births to never-married women in recent census years, however, Rules 5 and 6 have become increasingly important, since they substitute information on children-ever-born for information on marital status.
We performed two basic checks for inconsistencies in the family links. First, if two parents were linked to a child but were not married to each other, we unlinked the father. Second, if both partners in a married couple were linked to the same parent, we chose the best parental link based on detailed relationship code, surname, and proximity within the household.
Spousal Links
Rule 1. Link married women to previous adjacent married males with an appropriate relationship.8 Appropriate relationships are defined as follows:
Rule 3. Link married women to nonadjacent married males with an appropriate relationship as defined in Rule 1, provided both are over age 15, the husband is no more than 25 years older than the wife, and the wife is no more than 10 years older than the husband.
Rule 4. Link married women with a relationship not specified on the Rule 1 appropriate relationship list to previous adjacent married men with appropriate ages as defined in Rule 3. Ignore relationship, but do not marry an unrelated person to a relative.
Rule 5. Same as Rule 4, but link married women to subsequent adjacent married men.
ENDNOTES:
SORT CASES BY SERIAL, PERNUM
MATCH FILES TABLE=* /FILE='IPUMS.SYS' /BY SERIAL,PERNUM
SAVE OUTFILE='IPUMS2.SYS'
FINISH
It is almost as easy to use MOMLOC and POPLOC to attach characteristics of own children. The following SPSS-X command file uses similar logic together with the AGGREGATE command to count the number of own children under the age of 10 for each woman:
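The same aggregation can be sketched in Python as a rough analogue of the SPSS-X approach: group children by (SERIAL, MOMLOC), count those under age 10, and attach each count to the woman whose PERNUM matches. Field names mirror the IPUMS variables, but the list-of-dicts layout, the sex code "F", and the household are hypothetical. MOMRULE is assumed to hold the rule number, as this chapter describes it, and the two optional filters follow the chapter's advice to keep rules 1 through 3 for cross-year comparisons and to drop links with STEPMOM greater than zero when studying biological children.

```python
from collections import Counter
from typing import Dict, List, Tuple


def own_children_under(persons: List[dict], max_age: int = 10,
                       compatible_only: bool = False,
                       biological_only: bool = False) -> Counter:
    """Count own children younger than max_age, keyed by (SERIAL, mother's PERNUM).

    compatible_only keeps only links made under rules 1-3 (assumed MOMRULE 1-3);
    biological_only drops links with STEPMOM > 0.
    """
    counts: Counter = Counter()
    for p in persons:
        if p["momloc"] == 0 or p["age"] >= max_age:
            continue  # no mother in household, or child too old
        if compatible_only and p["momrule"] not in (1, 2, 3):
            continue
        if biological_only and p["stepmom"] > 0:
            continue
        counts[(p["serial"], p["momloc"])] += 1
    return counts


def attach_to_women(persons: List[dict], counts: Counter) -> Dict[Tuple[int, int], int]:
    """Give every woman her count, 0 if she has no qualifying own children."""
    return {(p["serial"], p["pernum"]): counts.get((p["serial"], p["pernum"]), 0)
            for p in persons if p["sex"] == "F"}


people = [  # one hypothetical household
    {"serial": 1, "pernum": 1, "sex": "M", "age": 41, "momloc": 0, "momrule": 0, "stepmom": 0},
    {"serial": 1, "pernum": 2, "sex": "F", "age": 36, "momloc": 0, "momrule": 0, "stepmom": 0},
    {"serial": 1, "pernum": 3, "sex": "F", "age": 8, "momloc": 2, "momrule": 1, "stepmom": 0},
    {"serial": 1, "pernum": 4, "sex": "M", "age": 12, "momloc": 2, "momrule": 1, "stepmom": 0},
    {"serial": 1, "pernum": 5, "sex": "M", "age": 3, "momloc": 2, "momrule": 7, "stepmom": 2},
]

print(attach_to_women(people, own_children_under(people)))
# {(1, 2): 2, (1, 3): 0}
print(attach_to_women(people, own_children_under(people, compatible_only=True)))
# {(1, 2): 1, (1, 3): 0}
```

Keying on (SERIAL, MOMLOC) plays the role of the BY variables in an aggregate-then-match workflow: each count is computed on the child's record and then looked up on the mother's record by her own (SERIAL, PERNUM).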