The pedigree routines came out of a simple need: to quickly draw a pedigree structure on the screen, within R, that was ``good enough'' to help with debugging the actual routines of interest, which were those for fitting mixed effects Cox models to large family data. As such, the routine had compactness and automation as primary goals; complete annotation (monozygous twins, multiple types of affected status) and most certainly elegance were not on the list. Other software could do that much better.
It therefore came as a major surprise when these routines proved useful to others. Through their constant feedback, application to more complex pedigrees, and ongoing requests for one more feature, the routine has become what it is today. This routine is still not suitable for really large pedigrees, nor for heavily inbred ones such as in animal studies, and will likely not evolve in that way. The author's fondest hope is that others will pick up the project.
The [[famid]] variable is placed last as it was a later addition to
the code; thus prior invocations of the function that use positional
arguments will not be affected.
If present, this allows a set of pedigrees to be generated, one per
family. The resultant structure will be an object of class
[[pedigreeList]].
Note that a factor variable is not listed as one of the choices for the subject identifier. This is on purpose. Factors were designed to accommodate character strings whose values come from a limited class, things like race or gender, and are not appropriate for a subject identifier. All of their special properties as compared to a character variable turn out to be backwards for this case, in particular the memory of the original level set that persists when subscripting is done.
However, due to the awful decision early on in S to automatically
turn every character into a factor (unless you stood at the door with a
club to head the package off), most users have become accustomed to the
idea of using them for every character variable. (I encourage you to set
the global option stringsAsFactors=FALSE to turn off autoconversion; it
will measurably improve your R experience.) Therefore, to avoid
unnecessary hassle for our users, the code will accept a factor as input
for the id variables, but the final structure does not retain it.
Gender and relation do become factors. Status follows the pattern of the
survival routines and remains an integer.
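The point about factors carrying a memory of their level set can be seen in a few lines of base R. This is only an illustrative sketch with made-up ids; it is not code from the package.

```r
# Hypothetical subject ids stored as a factor (illustration only).
id <- factor(c("joe", "charlie", "fred"))

# Subscripting a factor keeps the full original level set.
sub <- id[1]
sub_levels <- levels(sub)

# The character version carries no such memory, which is why the
# final structure stores ids in a non-factor form.
id_chr  <- as.character(id)
sub_chr <- id_chr[1]
```

Here `sub` holds a single value but still lists all three levels, while `sub_chr` is simply the string for the first subject.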
We will describe the code in a set of blocks.
The code starts out with some checks on the input data:
are all the arguments the same length, are the codes legal, etc. It checks
that the ids are non-missing and that sex uses the expected codes 1-4 for
male/female/unknown/terminated.
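A minimal sketch of this kind of input checking is given below. The helper name [[check_inputs]] and its error messages are hypothetical, chosen for illustration; the package's own checks may differ in detail.

```r
# Hypothetical helper (not the package's code): basic sanity checks
# on the raw input vectors before any structure is built.
check_inputs <- function(id, dadid, momid, sex) {
  n <- length(id)
  # All arguments must describe the same set of subjects.
  if (length(dadid) != n || length(momid) != n || length(sex) != n)
    stop("id, dadid, momid and sex must all have the same length")
  # Subject identifiers may not be missing.
  if (any(is.na(id))) stop("missing value in id")
  # Sex must be one of the four legal numeric codes.
  if (any(is.na(sex)) || !all(sex %in% 1:4))
    stop("sex must be coded 1-4")
  invisible(TRUE)
}

ok <- check_inputs(id = 1:3, dadid = c(0, 0, 1), momid = c(0, 0, 2),
                   sex = c(1, 2, 1))
```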
Create the variables describing a missing father and/or mother, which is what we expect both for people at the top of the pedigree and for marry-ins. It is easier to do this first, before adding in the family id information. If there are multiple families in the pedigree, make a working set of identifiers of the form `family/subject'. Family identifiers can be factor, character, or numeric.
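The ordering matters because the missing-parent code no longer matches once a family prefix has been pasted on. The following sketch assumes a missing-parent code of 0 (the name [[missid]] and all data values here are illustrative assumptions).

```r
# Toy data: two families that reuse the same subject numbers.
missid <- 0                      # assumed code for "no parent given"
famid <- c(1, 1, 1, 2, 2, 2)
id    <- c(1, 2, 3, 1, 2, 3)
dadid <- c(0, 0, 1, 0, 0, 1)
momid <- c(0, 0, 2, 0, 0, 2)

# Flag missing parents first, while the raw codes are still comparable.
nofather <- is.na(dadid) | dadid == missid
nomother <- is.na(momid) | momid == missid

# Then build working identifiers of the form family/subject.
new_id  <- paste(as.character(famid), as.character(id), sep = "/")
new_dad <- ifelse(nofather, NA_character_, paste(famid, dadid, sep = "/"))
new_mom <- ifelse(nomother, NA_character_, paste(famid, momid, sep = "/"))
```

After this step the working ids are unique across families even though the original subject numbers repeat.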
Next check that any mother or father identifiers are found in the identifier list, and are of the right sex. Subjects who don’t have a mother or father are founders. For those people both of the parents should be missing.
Now, paste the parts together into a basic pedigree. The fields for father and mother are not the identifiers of the parents, but their row number in the structure.
The final structure will be in the order of the original data, and all the components except [[relation]] will have the same number of rows as the original data.
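The parent checks and the switch from parent identifiers to row numbers can both be sketched with [[match]]. The data and the character sex labels below are illustrative assumptions, not the package's internal representation.

```r
# Toy pedigree: a and b are founders, c and d are their children.
id    <- c("a", "b", "c", "d")
sex   <- c("male", "female", "male", "female")   # illustrative labels
dadid <- c(NA, NA, "a", "a")
momid <- c(NA, NA, "b", "b")

# Row number of each parent; 0 when no parent was given.
findex <- match(dadid, id, nomatch = 0L)
mindex <- match(momid, id, nomatch = 0L)

# A parent id that was supplied must appear in the id list.
if (any(!is.na(dadid) & findex == 0L)) stop("father id not found")
if (any(!is.na(momid) & mindex == 0L)) stop("mother id not found")

# Parents must be of the right sex (a 0 index selects nothing).
if (any(sex[findex] != "male"))   stop("father is not male")
if (any(sex[mindex] != "female")) stop("mother is not female")

# Founders have neither parent; having exactly one is an error.
if (any((findex == 0L) != (mindex == 0L)))
  stop("subjects must have both parents or neither")
founder <- findex == 0L & mindex == 0L
```

Storing row numbers rather than ids makes later lookups a plain subscript, at the cost of having to rebuild them whenever rows are selected or reordered.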
Subscripting of a pedigree list extracts one or more families from
the list. We treat character subscripts in the same way that dimnames on
a matrix are used. Factors are a problem though: assume that we have a
vector x with names ``joe'', ``charlie'', ``fred''; then
[[x['joe']]] is the first element of the vector, but
[[temp <- factor(c('joe', 'charlie', 'fred')); z <- temp[1]; x[z] ]] will be the
third element! R implicitly uses as.numeric on factors when they
are a subscript, and since the levels are sorted alphabetically `joe' has
code 3. This caught an early version of the code when an
element of a data frame was used to index the pedigree: characters were
turned into factors by default when bundled into a data frame.
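The factor pitfall is easy to reproduce in base R. The sketch below uses the same three names; the values stored in [[x]] are arbitrary.

```r
x <- c(joe = 1, charlie = 2, fred = 3)

by_name <- x["joe"]              # character subscript: matched by name

temp <- factor(c("joe", "charlie", "fred"))
z <- temp[1]                     # prints as joe
code <- as.integer(z)            # position of "joe" among the sorted levels
by_factor <- x[z]                # uses that integer code, not the name

# Converting to character first restores matching by name.
by_char <- x[as.character(z)]
```

Comparing `names(by_factor)` with `names(by_char)` shows that the factor subscript did not select the element named joe, which is why a subscript method should convert factors to character before matching.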
Note:
\begin{enumerate}
\item What should we do if the family id is a numeric: when the user
says [4] do they mean the fourth family in the list or family '4'?
The user is responsible to say ['4'] in this case.
\item In a normal vector invalid subscripts give an NA, e.g. (1:3)[6], but
since there is no such object as an ``NA pedigree'', we emit an error
for this.
\item The [[drop]] argument has no meaning for pedigrees, but must be
a defined argument of any subscript method; we simply ignore it.
\item Updating the father/mother is a minor nuisance;
since they are integer indices to rows they must be
recreated after selection. Ditto for the relationship matrix.
\end{enumerate}
For a pedigree, the subscript operator extracts a subset of individuals.
We disallow selections that retain only 1 of a subject's parents, since %'
they cause plotting trouble later.
Relations are worth keeping only if both parties in the relation were
selected.
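The need to rebuild the parent indices after a selection, and the rule that a subject may keep both parents or neither, can be sketched as follows. The function [[subset_ped]] and the list layout are hypothetical stand-ins for the real pedigree object.

```r
# Toy pedigree as a plain list (hypothetical layout).
ped <- list(id     = c("a", "b", "c", "d", "e"),
            findex = c(0L, 0L, 1L, 1L, 0L),
            mindex = c(0L, 0L, 2L, 2L, 0L))

subset_ped <- function(ped, keep) {
  # Translate row indices back to parent ids before selecting rows;
  # index 0 maps to NA via the padded lookup vector.
  dad <- c(NA, ped$id)[ped$findex + 1L]
  mom <- c(NA, ped$id)[ped$mindex + 1L]

  new_id <- ped$id[keep]
  # Recreate the indices relative to the retained rows.
  new_f <- match(dad[keep], new_id, nomatch = 0L)
  new_m <- match(mom[keep], new_id, nomatch = 0L)

  # Disallow selections that retain only one of a subject's parents.
  if (any((new_f > 0L) != (new_m > 0L)))
    stop("a subset must keep both parents of a subject or neither")
  list(id = new_id, findex = new_f, mindex = new_m)
}

kept <- subset_ped(ped, c(1, 2, 4))     # a, b and their child d
```

In this sketch a selection such as rows 2, 3, 4 fails, because subject c would keep its mother but lose its father.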
Convert the pedigree to a data.frame so it is easy to view when removing or
trimming individuals with their various indicators.
The relation and hints elements of the pedigree object are not easy to
put in a data.frame with one entry per subject, so they are left out. These
items have one entry per subject and so are put in the returned data.frame:
id, findex, mindex, sex, affected, status. The findex and mindex are
converted to the actual id of the parents rather than the index.
The method can be invoked as as.data.frame(ped) or data.frame(ped). The
NAMESPACE file must declare it as an S3 method.
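A rough sketch of such a conversion is shown below on a toy class. The class name [[toyped]], the output column names [[dadid]] and [[momid]], and the data are assumptions made for illustration, chosen so that the example does not collide with the package's own method.

```r
# Toy object with one entry per subject (hypothetical class "toyped").
toy <- structure(
  list(id       = c("a", "b", "c"),
       findex   = c(0L, 0L, 1L),
       mindex   = c(0L, 0L, 2L),
       sex      = factor(c("male", "female", "male")),
       affected = c(0, 1, 0),
       status   = c(0, 0, 1)),
  class = "toyped")

as.data.frame.toyped <- function(x, ...) {
  # Replace the row indices by the parents' ids; index 0 becomes NA.
  dadid <- c(NA, x$id)[x$findex + 1L]
  momid <- c(NA, x$id)[x$mindex + 1L]
  data.frame(id = x$id, dadid = dadid, momid = momid, sex = x$sex,
             affected = x$affected, status = x$status,
             stringsAsFactors = FALSE)
}

df <- as.data.frame(toy)
```

Inside a package, a method like this would be registered with a directive of the form `S3method(as.data.frame, toyped)` in the NAMESPACE file.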
This function is useful for checking the pedigree object with the
$findex$ and $mindex$ vectors left as row indices, rather than replaced by
the ids of the parents. It is not currently included in the package.
It usually doesn't make sense to print a pedigree, since the id is just %'
a repeat of the input data and the family connections are pointers.
Thus we create a simple summary.