Survey Manual

Emergency Nutrition Surveys, Afghanistan, 2002




            This package contains most of the computer files you may find useful when conducting a survey evaluating 1) the nutritional status of children less than 5 years of age, 2) the nutritional status of women of reproductive age (15 – 49 years), 3) the recent rate and causes of mortality, 4) the reproductive health of women of reproductive age, 5) child and infant breastfeeding practices, 6) micronutrient status of children less than 5 years of age, and 7) micronutrient status of women of reproductive age.  These materials were used in carrying out a survey in Badghis Province in March 2002.  Distribution of this package of files has two objectives:

1)       To make assessment surveys easier for organizations working in Afghanistan who wish to carry out such surveys.

2)       To assist in the standardization of the types of data collected and the method by which data are collected.  Standardization will then make surveys in different areas of Afghanistan more comparable, allowing more complete assessment of nutrition and health. 

Of course, organizations working in different areas may encounter different situations.  For this reason, revision and adaptation are encouraged to make these survey materials more usable to the organization conducting the survey.  For example, survey managers may wish to revise the local calendar, originally developed for the western provinces, to make it more specific to their geographic area of concern.  However, because one of the objectives of this package is standardization, we encourage the use, as much as possible, of these original materials.  The data gathered by a survey using this package will conform to the consensus recommendations made by the Nutrition Planning Workshop held in Kabul on 11-14 February 2002.  We strongly encourage including in any district or province-wide survey at least the minimum data recommended by this workshop. 

Many documents contained in this package are in both English and Dari.  Because these materials were originally developed for a survey in Badghis Province, we have not translated anything into Pushto.  Others are encouraged to undertake this task!  Documents were originally written in English, then translated to Dari, then backtranslated by a different translator.  Moreover, during the Badghis survey, some errors were found; these should have been corrected after the survey; however, please feel free to correct any errors which remain. 


List and description of computer files in this package


This package contains the computer files listed below.  As a convention, in this document the names of computer files will be in small upper case to distinguish them from other text; however, when you open the ZIP file, these names will appear in normal upper and lower case.  Filenames with "(Eng)" or "(Dari)" in them indicate which language that document is in.  Filenames without such a language designator are in English.  A warning: you may wish to open those files in Dari on a computer equipped with Arabic Windows or Microsoft Word.  When I open them on my computer, they become irretrievably corrupted.

            Documents created originally in Microsoft Word 2000 were converted to Rich Text Format (filenames ending in .rtf) so that they are accessible to earlier versions of Word and other word processing programs.  


            Survey manual – READ ME 1ST.rtf – The document you are reading now

            Data collection form (Eng).rtf

            Data collection form (Dari).rtf

            Cluster control form (Eng).rtf

            Cluster control form (Dari).rtf

            Local calendar (Eng).rtf

            Local calendar (Dari).rtf

            local calendar – blank.rtf

            Verbal autopsy questions (Eng).rtf

            Verbal autopsy questions (Dari).rtf

            data to be collected.rtf

            Random number tables.rtf

            Supplies & equipment list.rtf

            Training schedule.rtf

            Anthropometry standardization form.rtf

            duties & procedures.rtf

            Badghis Report.rtf

            Badghis Report.ppt

            Data files -

Variable names in data coll form.rtf


            death rate calculation.xls

Breastfeeding analysis.xls


Data to be collected


            As mentioned above, the data to be collected by surveys using this package were agreed upon at a meeting of various ministries of the government of Afghanistan, multiple United Nations agencies, various donors, and many non-governmental organizations (NGOs).  The basic variables are listed in the document data to be collected.rtf.


Justification for and description of sampling method


            Because of the general lack of population data at district, provincial, and national levels, the Badghis survey, and probably most other surveys in Afghanistan, will have to use cluster sampling methods.  If each cluster is the same size, this first stage of sampling must be done with probability proportional to size.  That is, clusters must be selected using the method most commonly used for cluster surveys.  This method is recommended by the World Health Organization (WHO) and is described in many other resources. 
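As a minimal sketch of first-stage selection with probability proportional to size, the Python below uses the usual cumulative-population approach with a random start and a fixed sampling interval.  The function name select_clusters_pps and the village figures in the usage note are invented for illustration and are not part of this package.

```python
import random

def select_clusters_pps(villages, n_clusters, rng=None):
    """Systematic PPS selection from a list of (name, population) pairs.

    Returns one village name per cluster.  A large village may appear
    more than once, meaning it contains more than one cluster.
    """
    rng = rng or random.Random()
    total = sum(pop for _, pop in villages)
    interval = total / n_clusters
    start = rng.random() * interval          # random start in [0, interval)
    targets = [start + i * interval for i in range(n_clusters)]

    selected = []
    cumulative = 0
    idx = 0
    for name, pop in villages:
        cumulative += pop
        # Every target falling inside this village's range selects it.
        while idx < n_clusters and targets[idx] < cumulative:
            selected.append(name)
            idx += 1
    # Guard against floating-point rounding on the very last target.
    while idx < n_clusters:
        selected.append(villages[-1][0])
        idx += 1
    return selected
```

For example, select_clusters_pps([("A", 100), ("B", 300), ("C", 600)], 5) returns five names, three of which are always "C", because "C" holds 600 of the 1,000 people and the sampling interval is 200.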

The second stage of sampling is a bit more controversial.  We found during the Badghis survey that village leaders often had or could quickly help us construct a list of all the households currently living in each village selected to contain a cluster.  We then employed a random number table to select households from this list.  For this reason, there is no need to employ the frequently used, albeit incorrect and biased, method for the second stage sampling which is recommended by the Expanded Programme on Immunization (EPI) of the WHO.  The EPI method biases household selection toward the center of the village, which in many cases introduces serious bias into measures of health and nutrition.  We strongly discourage its use!  Moreover, it probably takes as little time to construct a household list and choose households correctly with a random number table as it does to find the center of the village, choose a direction, count the houses along this direction, and choose a starting household, as recommended in the EPI method.  Moreover, truly random selection of households probably decreases the design effect introduced by cluster sampling, since the households in a single cluster are no longer adjacent to each other, but instead scattered throughout the village.  This decrease in design effect can substantially increase the precision of the survey without increasing the sample size. 

Our experience in the Badghis survey demonstrated that people with secondary or university training, but no training or experience in mathematics or statistics, can easily learn the sampling procedures described above and in the Badghis survey report (Badghis Report.rtf).  This sampling method is described in detail in the document duties & procedures.rtf,  which can be given to each English-speaking team supervisor to assist him/her in the field when carrying out the second stage of sampling.  Please give this method a try; you will find that it is not much more difficult, and possibly easier, than the traditional EPI method.  And it is really much more correct. 

Of course, if you already have a list of all the sampling units in your sampling universe, that is, if you have a list of all households which you might want to include in your survey, there is no need whatsoever for cluster sampling.  Using simple or systematic random sampling, you can achieve much greater precision with the same sample size or the same precision with a much smaller sample size.  For example, in Maslakh Camp outside Herat, a new registration exercise in February 2002 has made available a computerized list of all households in the camp.  Each household has an address listed in the computer, and the houses in the camp are all numbered with these addresses.  Therefore, it would be very easy to choose a systematic random sample from the computer, then go to each of the selected households.  Do not blindly rely on cluster sampling for all your survey needs; you will be wasting valuable resources and time in some situations. 
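A systematic random sample from a complete household list can be sketched in a few lines of Python.  This is only an illustration under assumptions: the function name systematic_sample is hypothetical, and the list of 1,000 numbered addresses in the usage note is invented, not taken from the Maslakh registration.

```python
import random

def systematic_sample(units, n, rng=None):
    """Pick n units from a complete list using a random start and a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    rng = rng or random.Random()
    interval = len(units) / n                # sampling interval (may be fractional)
    start = rng.random() * interval          # random start in [0, interval)
    last = len(units) - 1
    # Take the unit at each step; min() guards against rounding at the end.
    return [units[min(int(start + i * interval), last)] for i in range(n)]
```

For example, systematic_sample(list(range(1, 1001)), 50) returns 50 distinct addresses spread evenly through the list, roughly one in every 20.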


Sample size calculations


            The file Badghis Report.rtf contains the report from the Badghis survey.  The sample size calculations used are fully explained therein.  In short, the traditional 30 x 30 nutrition cluster survey (30 children in each of 30 clusters) is sometimes not necessary and probably often wastes resources and time if survey managers do not need extreme precision.

When looking at more than one health and nutrition outcome, you must calculate the sample size needed to achieve your desired precision for each important outcome.  The outcome with the largest sample size dictates the sample size for the entire survey. 

Moreover, you need to distinguish between sampling units and units of analysis.  For example, in the Badghis survey, we sampled households, but we wanted to collect and analyze data on children < 5 years of age.  Assuming each household had, on average, 1.3 children < 5 years of age (as shown in the MICS2 survey), we selected enough households to enroll our target number of 534 children (about 18 children per cluster).  The number of households necessary for the entire survey was 534 divided by 1.3, or about 411 households.  This comes out to only about 13.7 households per cluster.  However, because of the heavy emigration from Badghis Province in the past few years, we expected as many as 25% of households to be absent.  We therefore selected 18 households per cluster. 
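The arithmetic above can be reproduced with a short Python sketch.  The function name is hypothetical, and the way the 25% absence allowance is applied (dividing by 0.75) is an assumption; this manual does not state exactly how the allowance was applied.

```python
def households_per_cluster(target_children, children_per_household,
                           n_clusters, expected_absent):
    """Households to select in each cluster, allowing for absent households."""
    households_needed = target_children / children_per_household   # 534 / 1.3, about 410.8
    per_cluster = households_needed / n_clusters                   # about 13.7
    # Assumption: inflate so that the households actually present still
    # yield the target number of children.
    inflated = per_cluster / (1 - expected_absent)                 # about 18.3
    return round(inflated)
```

With the Badghis figures, households_per_cluster(534, 1.3, 30, 0.25) gives 18.  Multiplying by 1.25 instead of dividing by 0.75 gives about 17.1, which is also close to 18.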

            We chose 30 clusters because with fewer clusters, the design effect tends to rise rapidly.  More clusters is better; however, adding clusters may not provide enough additional precision to make the extra logistic efforts worthwhile.  In Badghis Province, and probably most other rural areas of Afghanistan, the great distances between clusters present serious logistic constraints, making the addition of clusters beyond 30 too costly in time, money, and manpower.  Of course, survey managers who do not face such constraints and who wish to further decrease the design effect (and thus increase precision) could choose more numerous, smaller clusters. 

            A word on whether to select only one child per selected household or include all eligible children in each selected household:  if you choose to select only one child, please do so randomly.  Do not select the oldest or youngest child or the child who is most readily available.  Such procedures introduce serious bias into your sample of children.  If you select only one child, you must record how many eligible children are in each household, and then perform a statistically weighted analysis.  In such situations, the child selected in a household with more than one eligible child represents all the eligible children in that household and therefore must be given more statistical weight during data analysis.  We think it is much easier to just include all eligible children in each household.  The data analysis is much easier, the sample is no longer biased toward single children, and it does not substantially increase the clustering (which increases the design effect).
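If only one child per household is selected, the selection and the weighting might look like the following minimal Python sketch.  The function names and the record layout are hypothetical, and the sketch covers only the point estimate; confidence intervals would still need to account for cluster sampling.

```python
import random

def select_one_child(eligible_children, rng=None):
    """Choose one eligible child at random, never the oldest, youngest, or nearest."""
    rng = rng or random.Random()
    return rng.choice(eligible_children)

def weighted_prevalence(records):
    """Prevalence when one child was selected per household.

    records is a list of (is_malnourished, n_eligible_in_household) pairs.
    Each selected child stands in for all eligible children in the household,
    so the weight is the number of eligible children.
    """
    total_weight = sum(n for _, n in records)
    weighted_cases = sum(n for malnourished, n in records if malnourished)
    return weighted_cases / total_weight
```

For example, with records [(True, 3), (False, 1), (False, 1), (True, 1)] the unweighted prevalence is 2 of 4 (50%), but the weighted prevalence is 4 of 6 (about 67%), because the malnourished child from the three-child household represents three children.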


Training survey workers


            The survey worker training for the Badghis survey took about 4 days.  If you are using survey workers with some experience in surveys or anthropometric measurement, training may take less time.  The training schedule used for the Badghis survey is contained in the file Training schedule.rtf. 

            Training included practice in the use of a local calendar of events to determine children's age more precisely than initially reported by the mother.  This calendar is included in the files Local calendar (Eng).rtf and Local calendar (Dari).rtf.  Also included is the file local calendar – blank.rtf.  This file gives you the format of the local calendar used in Badghis, but leaves the specific events blank so you can create a local calendar more appropriate to the region of your survey.  The Badghis survey workers tended to accept at face value the mother's reported age or month of birth.  They needed substantial encouragement to probe more deeply for the child's month and year of birth in order to calculate a more accurate age.  For this reason, our height-for-age estimates, as shown in the survey report, are probably not very good.  We hope you can do better. 

            An important part of training in anthropometric techniques is the standardization exercise, in which all survey teams weigh and measure the same children to estimate interobserver variability.  The form we used in the Badghis training is contained in the file Anthropometry standardization form.rtf.   Such forms are also available from other sources.  

            Training also included 1) extensive discussion of specific job duties for each category of survey team worker, 2) detailed instructions for the second sampling stage, and 3) common questions and answers.  These topics are covered in some detail in the document duties & procedures.rtf.  Training also included practice using the random number tables (Random number tables.rtf).


Field work


The supplies and equipment we used in the Badghis survey are listed in the file Supplies & equipment list.rtf.  These supplies were, of course, split up among the teams.  Each team requires only one or a few items of some equipment, such as scales, height boards, and copies of instructions.  Therefore, if you have fewer or more teams than the five we had, you may need fewer or more of some items.  The overall requirement for other items, such as data collection forms, depends on the sample size. 

We found that mothers were much more familiar with the Afghan calendar than the Gregorian calendar and that survey workers were more comfortable and could be more accurate if they recorded the Afghan dates.  The data analysis program (nutr.pgm) takes this into account when calculating ages and other time periods. 

After completion of the second stage of sampling at a selected village, the survey workers listed selected households on the cluster control form, contained in the files Cluster control form (Eng).rtf and Cluster control form (Dari).rtf.  They then completed each column as data collection in that cluster progressed.  This form provides an important record of how successful each team was in locating and completing data collection at each household in the cluster and should be submitted with the data collection forms.  The form also provides the team with a list of selected households so that they need not carry the village leader's list of village households during data collection. 
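Because the Afghan calendar also has 12 months per year, an age in months can be calculated directly from Afghan dates without first converting them to Gregorian dates.  The Python below is a rough sketch of that idea only; it is not a copy of the calculation in the analysis program, the function name is hypothetical, and it ignores the day of the month.

```python
def age_in_months(birth_year, birth_month, survey_year, survey_month):
    """Approximate age in months from Afghan-calendar year and month numbers.

    Months are numbered 1 to 12.  Day of month is ignored, so the result
    can be off by up to one month.
    """
    for month in (birth_month, survey_month):
        if not 1 <= month <= 12:
            raise ValueError("month must be between 1 and 12")
    months = (survey_year - birth_year) * 12 + (survey_month - birth_month)
    if months < 0:
        raise ValueError("birth date is after survey date")
    return months
```

For example, a child born in month 3 of 1378 and surveyed in month 12 of 1380 would be recorded as 33 months old.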

            The cause of each death during the recall period was determined by verbal autopsy.  The questions used are contained in the files Verbal autopsy questions (Eng).rtf and Verbal autopsy questions (Dari).rtf.  They are based on validated questions as recommended by WHO and should not be substantially altered.  Of course, should a survey manager wish to explore other causes of death, questions can be added.  But careful consideration should be given to where new questions should be placed in the list.  The list of questions is hierarchical, that is, when the interviewer receives a positive response from the respondent to a specific question, questioning stops and the appropriate code for cause of death is recorded on the data collection forms.  No questions occurring after this question are posed to this respondent.  For example, if a mother said her child died with bloody diarrhea, questions about pneumonia and meningitis would not be asked.  Survey managers who interpose a question must realize that if the mother responds yes to this new question, no questions after it will be asked and exploration of these causes of death will not occur for this specific death. 
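The hierarchical logic described above can be sketched in Python as follows.  The question wording and cause codes in the usage note are invented placeholders, not the actual questions or codes in the verbal autopsy files, and the function name is hypothetical.

```python
def run_verbal_autopsy(questions, ask):
    """Ask questions in order and stop at the first positive response.

    questions is an ordered list of (question_text, cause_code) pairs.
    ask is a function that takes the question text and returns True for yes.
    Returns (cause_code, questions_actually_asked); cause_code is None
    if no question received a positive response.
    """
    asked = []
    for question_text, cause_code in questions:
        asked.append(question_text)
        if ask(question_text):
            # Questioning stops here; later causes are never explored.
            return cause_code, asked
    return None, asked
```

For example, with the ordered list [("Bloody diarrhea?", "DYSENTERY"), ("Cough with fast breathing?", "PNEUMONIA"), ("Stiff neck?", "MENINGITIS")], a yes to the first question returns "DYSENTERY" and the other two questions are never asked.  This is why the position of any added question matters.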


Data entry and checking


            Data collection forms are contained in the files Data collection form (Eng).rtf and Data collection form (Dari).rtf.  Please feel free to add questions, but keep in mind that the data entry programs in this package, including the EpiInfo CHECK files, may no longer work unless you modify them to include the newly added fields.  

            If you wish to use the data collection forms as they are, then the files contained in the Zip file Data files - should all be put into the same folder.  These files include all the EpiInfo files necessary to enter survey data collected on the data collection forms provided in this package.  Set EpiInfo's default path for data files to this folder by using the EpiInfo main menu choice "Setup" and choosing "Path for data files."   You will then be able to enter data with the CHECK files activated.  They will help decrease errors during data entry.  Also, using the data entry files contained in this package allows the data from each household to be entered into four separate data files.  Linking variables will automatically be created so that these files can be related during data analysis.  The four data files are:

hh.rec – Containing the data on general household factors from the top half of page 1 of the data collection form.

hhmember.rec – Containing the household census on the bottom half of page 1 of the data collection form.

women.rec – Containing the data on women of reproductive age from page 2 of the data collection form.

child.rec – Containing the data on children < 5 years of age from page 3 of the data collection form.

You can also enter data into these files separately without using the automatic linking created during data entry.  If you do this, please pay careful attention to correctly entering data into the variables needed for linking the data files during analysis.  These variables are called:

CL – Cluster number

HH – Household number

MEMNUM – The number for each individual in the household

MOTHERS – The MEMNUM for each child's mother

These variables are then used to calculate a unique identification number for each cluster (a variable called CLUSTER), household (CLHH), and individual (CLHHMEM).
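To illustrate why the linking variables must be entered carefully, here is a minimal Python sketch of how a child record can be matched to its mother's record using CL, HH, MEMNUM, and MOTHERS.  It uses simple tuples as keys; how the EpiInfo files actually encode CLUSTER, CLHH, and CLHHMEM is not shown here, and the function name and the AGE field in the usage note are hypothetical.

```python
def link_children_to_mothers(children, members):
    """Pair each child record with its mother's household-member record.

    children and members are lists of dicts.  A child's mother is the
    member in the same cluster (CL) and household (HH) whose MEMNUM
    equals the child's MOTHERS value.  Unmatched children get None.
    """
    members_by_id = {(m["CL"], m["HH"], m["MEMNUM"]): m for m in members}
    linked = []
    for child in children:
        key = (child["CL"], child["HH"], child["MOTHERS"])
        linked.append((child, members_by_id.get(key)))
    return linked
```

For example, a child with CL 4, HH 7, and MOTHERS 2 is paired with the member record that has CL 4, HH 7, and MEMNUM 2.  A single mistyped CL, HH, or MOTHERS value leaves the child unmatched or, worse, matched to the wrong woman.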

These data files may contain some fields, such as child's hemoglobin, child's MUAC, and mother's hemoglobin, which were not used during the Badghis survey; they are included in the data files for the use of others who may wish to collect these data in subsequent surveys. 


Data analysis


The file nutr.pgm will produce the results necessary to complete a report similar to the report of the Badghis survey.  Please carefully read the notes at the top of nutr.pgm.  To use these files, just put the nutr.pgm file along with all the other data files (.REC, .CHK, .DAT, .IX, and .IXT files) in the folder which is the default EpiInfo path, as mentioned above.  Then type "run nutr.pgm" from the command line in the EpiInfo Analysis program.  The output of analysis will be contained in a file called nutr.ana.  If you used the original data entry files, nutr.pgm will relate the four data files (hh.rec, hhmember.rec, women.rec, and child.rec) appropriately during data analysis to combine data from all four data files.  The file zscore.rpt, contained in the file Data files -, must also be placed in the default folder in order to produce the distributions of z-scores necessary to produce figures similar to those in the Badghis report. 

The file Variable names in data coll form.rtf is a copy of the data collection form to which the variable names for all the variables in the datafiles (hh.rec, hhmember.rec, women.rec, and child.rec) have been added in red font in squiggly brackets, for example {age}, {sex}, etc.  The variable names are placed in or adjacent to the space on the data collection form where these data were recorded on the paper data collection forms in the field. 

You can use the Excel spreadsheet death rate calculation.xls to calculate death rates from the results contained in the EpiInfo output (nutr.ana).  The output necessary to do this is found in the section of nutr.ana called "Numerators & denominators for mortality rates."  The spreadsheet identifies which variables from nutr.ana to use.  Just fill in the required data, shown in blue font in the spreadsheet, and it will calculate the rates and 95% confidence intervals for you.

The spreadsheet Breastfeeding analysis.xls can be used to calculate the 3-month moving average percents of children eating solid food and children still breastfeeding.  The data necessary to complete the spreadsheet are found in nutr.ana under the heading "Other breastfeeding analyses."  The numbers can be cut and pasted into the spreadsheet and then into the PowerPoint file to create the graphs as seen in the file Badghis Report.ppt.  With these graphs, you can determine the median age of introduction of solid food and cessation of breastfeeding. 
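For readers who want to see what a 3-month moving average does, here is a small Python sketch.  It assumes a centered window (the month itself plus the month before and after) applied to the monthly percents; the exact windowing used in the spreadsheet is not described in this manual, and the function name is hypothetical.

```python
def moving_average_3(percent_by_month):
    """Centered 3-month moving average of a list of monthly percents.

    The first and last months have no complete window and are returned as None.
    """
    n = len(percent_by_month)
    smoothed = [None] * n
    for i in range(1, n - 1):
        window = percent_by_month[i - 1:i + 2]   # previous, current, next month
        smoothed[i] = sum(window) / 3
    return smoothed
```

For example, moving_average_3([90, 60, 30, 0]) returns [None, 60.0, 30.0, None].  Smoothing of this kind reduces the month-to-month jumps caused by small numbers of children at each age, which makes it easier to read off the age at which the curve crosses 50%.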

Of course, the EpiInfo Analysis program does not generate hypothesis tests (that is, p-values and confidence intervals) which account for the cluster sampling.  As a result, the p-values and confidence intervals in the output from nutr.pgm (the file nutr.ana) should be ignored!  The nutr.pgm program does, however, create a new data file, called childcl.rec, which can be used in the EpiInfo Csample program to generate the correct confidence intervals and p-values shown in the Badghis report.  nutr.pgm also produces a new data file called mortcl.rec which can be used in the EpiInfo CSample program to calculate the design effect for mortality rates.  These design effects will be necessary when using the spreadsheet death rate calculation.xls to calculate death rates and confidence intervals around death rates.  In mortcl.rec, a single variable (agesexgrp) identifies both age group and sex.  This facilitates the use of mortcl.rec in Csample.

The clusters in Badghis were chosen from a list of villages.  This list started with villages in one district, then the villages in another district were listed, then the next district, and so on.  This essentially produced a stratified sample because we guaranteed that the number of clusters in each district would be roughly proportional to the share of the province's population contained in that district.  The data can be analyzed taking into account this equal-probability stratified sample.  Unlike many stratified samples, no statistical weighting needs to be done because the sampling fraction in each stratum (district) is the same.  The big advantage to such an analysis is that the precision often increases if the analysis accounts for this stratification.  This is not difficult; when analyzing data in Csample, in the box called "Strata," choose the variable "DISTRICT."  For some variables, the increase in precision will be well worth the very small trouble. 
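To show how a design effect enters a death rate calculation, here is a rough Python sketch.  It is not a copy of the formulas in the spreadsheet, which are not reproduced in this manual.  It assumes the rate is expressed as deaths per 10,000 persons per day, uses a simple normal approximation for the confidence interval, and widens the interval by the design effect; the function name and the numbers in the usage note are invented.

```python
import math

def death_rate_per_10000_per_day(deaths, population, recall_days, design_effect=1.0):
    """Return (rate, lower, upper) in deaths per 10,000 persons per day.

    Assumption: the proportion dying during the recall period is treated as
    a simple proportion, and its variance is multiplied by the design effect.
    """
    p = deaths / population                               # proportion dying in recall period
    se = math.sqrt(design_effect * p * (1 - p) / population)
    scale = 10000 / recall_days                           # convert to per 10,000 per day
    lower = max(0.0, p - 1.96 * se)
    upper = p + 1.96 * se
    return p * scale, lower * scale, upper * scale
```

For example, 50 deaths among 5,000 people over a 100-day recall period gives a rate of 1.0 per 10,000 per day.  Raising design_effect from 1.0 to 2.0 leaves the rate unchanged but widens the confidence interval, which is why the design effect from CSample is needed.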

Because of the hassle in using Csample, the Badghis report does not contain much hypothesis testing.  It is confined to confidence intervals around the estimates of the prevalence of child malnutrition, the risk ratios for risk factors for child malnutrition, and confidence intervals around the mortality rates.  

The nutr.pgm program also produces some analyses which are not contained in the Badghis report.  Moreover, it does not produce all possible analyses of the data; others are welcome to add to nutr.pgm or suggest additional analyses to be added to it. 


Writing the report


The results of analysis can be presented in many different ways.  The Badghis report (Badghis Report.rtf) provides only one example.  However, the presentation of the anthropometric results should include the prevalence of overall global acute malnutrition as well as the prevalence of moderate and severe acute malnutrition.  The definitions given in the Methods section of the Badghis report should be used to define child and adult malnutrition.  This is necessary to compare the results of different surveys.

Any report of the results of a survey should include a description of the methods used so that readers could, if they wished, duplicate the survey.  This also allows readers to judge the appropriateness of the methods for sampling, data collection, and data analysis.  It also allows others to use the same methods so that surveys can be compared. 

We recommend using something like the figures which display the distribution of z-scores.  Such a display allows survey managers and readers to determine if there is a small subpopulation which is disproportionately malnourished and therefore makes up a large percentage of the malnourished children or, alternately, if all children are relatively malnourished and only those who started a bit thin now fall below the cut-off points defining malnutrition. 


Distributing the results


            Of course, the results of any survey should be distributed as widely as possible.  As mentioned above, one objective of this survey package is to standardize nutrition assessment in Afghanistan.  Your results could save someone else the trouble of conducting a survey if they are widely distributed and conform to the consensus recommendations of the Kabul meeting. 


A final word


            Thank you for looking over this survey package.  We hope it helps you carry out health and nutrition assessment in whatever geographic area you are working in.  And we hope that your revisions will make it more useful and easier to use.  I understand that this may look complicated at first glance, but it really is not.  If you need any help with planning and implementing your survey or analyzing the results, please feel free to contact one of the people listed below:


Felicite Tchibindat

Nutrition Project Officer, UNICEF – Afghanistan Country Office



Bradley A. Woodruff (Woody)

Medical Epidemiologist, U.S. Centers for Disease Control & Prevention