4. Making a Period Life Table

# Making a life table provides lots of R lessons. Here, I step
# through the logic of constructing a period life table taken from the
# Human Mortality Database.

# Learn to Love Your Text Editor

# Here are the first couple lines of a file downloaded from the HMD
# website which I have named "usa_deaths_1x1.txt". There is a comment
# in the first line.

# The United States of America, Deaths (5x1) Last modified: 27-Aug-2004
Year Age Female Male Total
1959 0 47902.43 64566.45 112468.88
1959 1-4 7683.28 9492.76 17176.04
1959 5-9 3714.58 5349.69 9064.27


# Reading Data Tables

#It is always a good idea to include a header row when you read in a
#ASCII table of data

usa.deaths <- read.table("usa_deaths_1x1.txt", header=TRUE, skip=1)

# Here's what the first 5 rows look like:

usa.deaths[1:5,]
Year Age Female Male Total
1 1959 0 47902.43 64566.45 112468.88
2 1959 1 3248.39 3836.93 7085.32
3 1959 2 1856.79 2423.21 4280.00
4 1959 3 1381.59 1750.88 3132.47
5 1959 4 1196.51 1481.74 2678.25

my.data <- read.table("data.txt", header=TRUE)

# Frequently, there will be text before the data begin describing the
# data set, telling you when it was last updated, etc. Here, the
# skip=n option is useful. The file I read had a one-line comment at
# the beginning, so I passed a skip=1 argument to read.table().

# This tells R to ignore the first line of the file and to treat the
# second as a header indicating the column names.

# Make sure that the data file contains only data after the skipped
# portion. For example, in mortality data, the last open age interval
# will often be denoted with something like 85+ or 110+ (as in the
# HMD). You need to use your text editor to remove such
# marks. Otherwise, R will not recognize the contents of the file as
# values. Instead it will treat them as factors. This can lead to all
# sorts of maddening problems.

# Subsetting Data Using Logicals

# Say we wanted to construct a life table for 1959, the first year in
# the file. We need to extract the number of recorded female deaths
# nDx, needed to construct a life table.

nKx <- usa.deaths$Female[usa.deaths$Year == 1959]

# We would use an analogous procedure to get the exposures, nKx, from
# the relevant data frame. Say we had done that and now had three
# vectors: "Age", "nKx", "nDx".

# We can put these into a data frame:

usa.data <- data.frame("Age"=Age, "nKx"=nKx, "nDx"=nDx)

# This will create a data frame with three columns and appropriate
# column names (the things in quotes -- they need not be the same as
# the variable names, of course).

# Vectorized calculations

# The first thing we need to calculate a period life table is the
# central death rate. The calculation is:

nMx <- usa.data$nDx/usa.data$nKx

# If the two vectors are the same size R, will divide the one by the
# other elementwise, no problem. What happens if they're different
# sizes?

# Note that we have to make the variables "nDx" and "nKx" visible to
# R. We do this by using the data.frame name "usa.data" and indicate
# which columns of this data frame to use with the dollar sign. If
# you plan to use the columns of a data frame frequently for a
# session, you can attach the data frame, making all its columns
# visible to R.

attach(usa.data)

# Now to calculate the central death rates, we simply type

nMx <- nDx/nKx

# In order to construct a life table, we need to convert the central
# death rates to age-specific probabilities of death, nqx. The
# Greville equation shows that we can do this with the only additional
# information of the average age of individuals dying in the interval
# x to x+n, nax.

# For a life table with 5 year age classes for age=5,10,..., it is
# usually a reasonable assumption that nax=2.5. That is, deaths are
# uniformly distributed through the interval. Its the first two nax
# values we need to worry about. We can easily do this in R:

nax <- rep(2.5,length(Age))

# Now, we just need to substitute the correct values of nax for nax[1]
# and nax[2].

# Assuming that you have been given values of b0 and b1 (e.g., from
# Keyfitz, or Coale-Demeny), substitute for nax[1] and, depending on the
# model, also substitute an appropriate value for nax[2].

nax[1] <- b0 + b1 *nMx[1]
nax[2] <- 1.5

# The average of of death in an open age interval is probably not
# x+n/2. A standard assumption is that the nax value for the last age
# class is the inverse of the observed nMx value.

nmax <- length(nMx) # create a variable to hold this as it will be useful again
nax[nmax] <- 1/nMx[nmax]

# You need to specify the widths of the age classes. R is very
# flexible with the way it handles concatenated lists. Here's a way
# to do it for the standard age classes of a human life table:

n <- c(1,4, rep(5, nmax - 3),999)

# The first age class is one year wide, the second is four, the next
# nmax-3 is five, and the last is some arbitrarily large number.

# Apply the Greville equation to convert nMx -> nqx:

nqx <- (n*nMx) / (1 + (n-nax)*nMx)
nqx[nmax] <- 1.0 # by definition

# Here's a nifty vectorized trick to calculate lx from nqx. No for loops!

lx <- c(1,1-qx)
lx <- cumprod(lx)

# (we could actually do this in one line: lx <- cumprod(c(1,1-qx)) )

# Vectorized calculation of ndx. This is just lx-lx+n. Do this in
# one line without a loop:

dx <- -diff(lx)

# In order to calculate nLx from lx, we need two things lx and dx.
# Unfortunately, they appear with different indices, which makes for a
# tricky for-loop. We need the lx value for age x but a ndx for age
# x+n. Here's trick that allows us to avoid complicated indexing:

lxpn <- lx[-1]

# If you want to remove a single element i from a list, use a negative
# index. This only works for removing a single element. If, say, we
# wanted to remove the first three elements of nMx, we would use
# nMx[4:nmax].

# Having removed the first element from lx and called it lxpn, we have
# solved the staggered-index problem without a for-loop:

nLx <- n*lxpn + ndx*nax

# Tx is the person years of remaining life in a cohort. To get Tx, we
# need to sum nLx from each age x forward to the end of the life
# table. Using a nifty vectorized trick, quite specific to R, we can
# simply calculate this:

Tx <- rev(cumsum(rev(nLx))

# We reverse nLx, calculate the cumulative sum and then reverse it
# again (thereby making it the right direction).

# ex is Tx divided by lx. lx is one element too long (remember, we
# prepend it with a 1 (by definition)):

ex <- Tx/lx[1:nmax]

# Now all we need to do is put all the columns in a structure. It's
# nice to round the numbers to make it more readable:

lt <- cbind("age"=Age,
"nax" = round(nax,4),
"nMx" = round(nMx,4),
"nqx" = round(nqx[1:nmax],4),
"lx" = round(lx[1:nmax],4),
"ndx" = round(ndx,4),
"nLx" = round(nLx,4),
"Tx" = round(Tx,2),
"ex" = round(ex,2) )

# Note that in order to put a bunch of columns together we don't use
# c(). Instead, we use cbind(). If we wanted rows, we'd have used
# rbind().

Posted by jhj1 at August 18, 2005 11:21 AM

Comments

Post a comment

Thanks for signing in, . Now you can comment. (sign out)

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)


Remember me?