A genealogical corpus is a set of individuals linked by relations of kinship and marriage, together with coded basic and supplementary information for each individual:

 Basic Information:

  • A unique identity number (ID)
  • Name(s)
  • Gender: H (man), F (woman), X (gender unknown)
  • Father’s ID number
  • Mother’s ID number
  • ID number(s) of spouse(s)

 Supplementary Information:

  • Biographical information (dates and places of birth, marriage, and death; other properties)
  • Notes
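
As a rough illustration of the record described above, the following Python sketch models one individual with the basic and supplementary fields. The class name `Individual`, the field names, and the sample people are assumptions made for this example; they do not reproduce Puck's internal data model or any of its file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Gender codes as listed above: H (man), F (woman), X (gender unknown).
GENDER_CODES = {"H", "F", "X"}


@dataclass
class Individual:
    """One record of a genealogical corpus (hypothetical layout, not Puck's own)."""
    id: int                                   # unique identity number
    name: str
    gender: str = "X"
    father_id: Optional[int] = None           # None when the father is unknown
    mother_id: Optional[int] = None           # None when the mother is unknown
    spouse_ids: List[int] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)  # biographical data
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject anything other than the three documented gender codes.
        if self.gender not in GENDER_CODES:
            raise ValueError(f"unknown gender code: {self.gender!r}")


# Invented sample data: a couple and their daughter.
corpus = {
    1: Individual(1, "Amadou", "H", spouse_ids=[2]),
    2: Individual(2, "Fatou", "F", spouse_ids=[1]),
    3: Individual(3, "Awa", "F", father_id=1, mother_id=2,
                  properties={"birth_date": "1950"}),
}
```

Kinship and marriage links are stored only as ID numbers, so each record stays small and the network can be rebuilt by looking the numbers up in `corpus`.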


Tips for Collecting Kinship Data 
Data already collected are not only a result but also a tool for further collection. Keep them easily accessible so that they can guide your research and let you cross-check your informants' answers. With archival work this is often fairly simple, since you can take a computer with you, but in many fieldwork situations that is not possible. Noting kinship "by hand" can nevertheless be extremely fast and efficient if some basic principles are observed:

  • Always use a compact medium, such as a notebook. Do not use file cards or loose sheets: they are hard to handle during interviews, and there is a high risk of losing some of them.

  • Separate graphics and text. A good method is to use a notebook with the left page for drawing genealogies and the right page for listing the individuals and their properties, with numbers identifying the individuals on both pages. If the numbers get large, add initial letters to them so that individuals can still be identified when a numbering error occurs.

  • Assign an identity number to each individual and never assign that number to anyone else. If you have "doubles" (the same individual entered twice), link the double to the original number, but do not re-assign its number. Gaps in the series of numbers do no harm, whereas ambiguous identity numbers do much damage and are extremely difficult to detect.

  • Do not use identity numbers as codes. Identity numbers serve to identify individuals and nothing else (except, perhaps, to recall the order in which you entered them and to document the history of your corpus). If you want to record an individual's gender, clan affiliation, residence, etc., do not encode it in the identity number; record it as a property instead.

  • Make copies regularly and store them in different places. This holds for all data, but especially for kinship data because of their network structure: one lost notebook may render twenty others useless.
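
The numbering rules in the tips above (one number per individual, no re-assignment, gaps allowed, doubles linked to the original) can be sketched in a few lines of Python. The class `IdRegistry` and its methods are hypothetical names invented for this illustration and are not part of Puck.

```python
class IdRegistry:
    """Hands out identity numbers and refuses to assign any number twice."""

    def __init__(self):
        self._used = set()      # every number ever assigned, even if later retired
        self._next = 1          # next candidate for automatic allocation
        self.double_of = {}     # double's number -> original's number

    def allocate(self):
        """Return a fresh number that has never been used before."""
        new_id = self._next
        self._next += 1
        self._used.add(new_id)
        return new_id

    def register(self, id_number):
        """Record a number chosen by hand (e.g. in a notebook); gaps are fine."""
        if id_number in self._used:
            raise ValueError(f"identity number {id_number} is already taken")
        self._used.add(id_number)
        self._next = max(self._next, id_number + 1)
        return id_number

    def mark_double(self, double_id, original_id):
        """Link a double to its original instead of re-assigning its number."""
        if double_id not in self._used or original_id not in self._used:
            raise ValueError("both numbers must already be registered")
        self.double_of[double_id] = original_id


registry = IdRegistry()
first = registry.allocate()         # 1
second = registry.allocate()        # 2
registry.register(10)               # leaves a gap between 2 and 10
third = registry.allocate()         # 11, never a previously used number
registry.mark_double(third, second)
```

Because `_used` is never emptied, a number stays blocked even after the individual it belonged to has been removed from the corpus.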

 Frequently Asked Questions:

  • Do I have to number individuals continuously?
    No. Discontinuous numbering is a problem neither for Puck nor for most other genealogical programs. Pajek requires continuous numbering, but Puck can convert datasets into the Pajek file format, renumbering the individuals without losing the information on their original numbers (use the export option "numbered"). You should, however, avoid very large gaps between identity numbers, because some search methods may then take more time.

  • Some individuals in my dataset are doubles, do not exist, or have become obsolete. Can I delete them?
    Yes, but do not reassign their identity numbers to other individuals! Just leave their positions empty. In the case of doubles, it can be useful to keep them in your dataset, so that you can easily find the information on these individuals in the different places in your notebooks. You can mark them as doubles by giving them the identity number of the original as a name. If needed, you can always remove them later with the "eliminate doubles" option.

  • How do I code kinship relations between individuals when I do not know the exact genealogical chain?
    If you know the exact genealogical relation but not the individuals who make up the chain, you may introduce virtual individuals, with "#" as their name, as intermediary links (for instance, if you know that A is B's paternal brother, you may introduce a virtual common father). Make sure, however, that the kinship term people give you really corresponds to the supposed genealogical relation: in many societies, kinship terms designate large classes of relations, some of which may have no genealogical foundation whatsoever. If you are not 100% sure that your "brother" really is a brother in the genealogical sense, store the information in a note or as a relational property of the individuals concerned instead.

  • How do I code divorced spouses?
    Like all other spouses, living or dead, married or divorced. You can store the information on the divorce among the individuals' properties (see also File formats for kinship data).
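
To illustrate the renumbering mentioned in the first question above, here is a minimal Python sketch of how discontinuous identity numbers can be mapped to a continuous series while keeping a way back to the original numbers. It shows the general idea only and is not Puck's export code; the function `renumber` and the sample numbers are assumptions made for this example.

```python
def renumber(original_ids):
    """Map each original identity number to a consecutive number starting at 1."""
    ordered = sorted(set(original_ids))
    return {orig: new for new, orig in enumerate(ordered, start=1)}


# Invented sample: a corpus whose numbering has gaps.
original_ids = [3, 7, 8, 120]

new_number = renumber(original_ids)                        # original -> continuous
original_of = {new: orig for orig, new in new_number.items()}  # continuous -> original

# Kinship links are stored as ID numbers, so they must be translated as well.
father_of = {7: 3, 8: 3}                                   # child ID -> father ID
father_of_renumbered = {
    new_number[child]: new_number[father] for child, father in father_of.items()
}
```

Keeping `original_of` alongside the renumbered data is what allows results obtained in the other program to be traced back to the notebook numbers.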