Ask the Experts

IASS members can submit questions on topics related to survey methods and research. We will aim to provide the answer on this website as soon as possible.


What is the state of play on statistical matching with a focus on auxiliary information, complex survey designs and quality issues? By Marcello D’Orazio, Marco Di Zio and Mauro Scanu

Doubly and Multiply Robust Procedures for Missing Survey Data. By Sixia Chen and David Haziza

Mixed mode official surveys. Current status and near future. By Barry Schouten, Jan van den Brakel, Dierdre Giesen, Annemieke Luiten and Vivian Meertens

How to Measure Disclosure Risk in Microdata? By Natalie Shlomo

Data Sources for Business Statistics: What has Changed? By Stefan Bender and Joseph W. Sakshaug

Machine Learning from the Perspective of Official Statistics. By Marco J. H. Puts and Piet J. H. Daas
What are the conditions under which various survey designs that do not use probability samples might still be useful for making inferences to a larger population?
Answer is available as PDF only. Click here to download
What is the best way of combining the results from two surveys, which ask some common questions, possibly on overlapping samples, to improve the variance of the estimates for the common items?
Answer is available as PDF only. Click here to download
Can web surveys provide an adequate alternative to phone and face to face surveys?
Answer is available as PDF only. Click here to download


For autobiographical information, what is the longest reference period one can use in a survey?
Survey research questions concerning behaviors are often dependent upon the retrospective recall ability of respondents. Empirical evidence indicates that the accuracy for temporal information lessens with time; the design tradeoff is often one in which researchers wish to ask about short reference periods so as to reduce response error vs. the need to ask about long reference periods, so as to capture rare events (e.g., hospitalizations; major purchases).
Given this, what can survey designers use as guidelines for determining the ideal reference period for capturing accurate autobiographical information? Unfortunately, there is no simple answer to this question, since the reference period is but one factor that affects the quality of data based on retrospective recall. In addition to the reference period, questionnaire designers need to consider the distinctiveness of the behavior of interest, the saliency of the behavior, and the nature of the task one is asking the respondent to perform. As the length of the reference period increases, the likelihood of multiple, similar events also increases.
Although the occurrence of multiple, similar events may make the task of retrieving information concerning any one event more difficult (lack of distinctiveness), responses to a yes/no or ever/never question may be improved due to the repeated behavior within the reference period. Important or salient events or behaviors tend to be well reported; in part, such events may benefit from more elaborate encoding and more frequent retrieval and reporting of the event (rehearsal).
One of the difficulties facing questionnaire designers is the lack of information concerning the behavioral experience of different respondents. The theoretical and empirical literature suggests that simple experience structures are simple to report, even over long periods of time, and that complex experience structures are quite difficult to report, regardless of the length of the reference period. Hence, questionnaire designers may wish to tailor question sequences, using initial questions that sort respondents according to the complexity of their behavioral experience.
What are the practical implications for questionnaire designers with respect to the length of the reference period? Researchers must consider both the characteristics of the behavior or event and the characteristics of the response task. With respect to the characteristics of the behavior, the length of the reference period can vary as a function of the distinctiveness and saliency of the behavior or event of interest. However, one must also consider the nature of the task facing the respondent when determining the reference period. The retrieval of detailed episodic information will most likely require a different reference period than ever/never occurrence questions. For example, the retrieval of detailed dietary information may require a reference period no longer than 24 hours, whereas the quality of reports of a purchase of a new automobile may be quite high for reference periods of a year or more. In addition, questionnaire designers can improve the quality of retrospective reports of behavior (regardless of reference period) through the use of multiple cues which take advantage of the way in which memories are organized (including cues which focus on details other than when the event occurred) and by allowing respondents sufficient time to adequately search their memory.
The following volumes provide an excellent review of the theoretical and empirical literature on the effects of the length of the reference period, as well as a number of references for additional reading:

Sudman, S., Bradburn, N., and Schwarz, N. (1996). Thinking about Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass. See Chapters 7 and 8.

Tourangeau, R., Rips, L., and Rasinski, K. (2000). The Psychology of Survey Response. Cambridge University Press. See Chapters 3 and 4.

The Survey Statistician, no. 50, pages 14-15, July 2004


When is auxiliary information best used; for sample design or for estimation purposes?
To answer this question, we first need some structure. Suppose we are interested in estimating the population totals for a set of variables based on a random sample of population units drawn from some sampling frame.
Aiding this endeavor is a second set of auxiliary variables for which we already know the population totals. This auxiliary information can be used at the estimation stage to compensate for units selected for the sample that fail to provide adequate responses and for population units missing from the sampling frame. There are no equivalent potential uses of such information at the design stage.
Even in a textbook environment where the sampling frame is complete and every sampled unit provides a usable response, the combination of a simple random sample and an estimator constructed to make prudent use of the auxiliary information will usually produce better results than a cleverly constructed sample design employing the same information coupled with an expansion estimator (a sample-weighted sum where the weights are the reciprocals of the probabilities of selection).
That is not always the case. In one of the simplest examples of the use of auxiliary information, there is a single variable of interest and a single auxiliary. The value of the auxiliary variable is known and positive for every unit in the population. If the unit-by-unit ratios of the variable of interest and the auxiliary variable behave like independent random variables with a common mean and variance, then for a given sample size, the combination of a sample drawn with probabilities proportional to the auxiliary variable and an expansion estimator will tend to be more efficient (have less variance) than a simple random sample combined with a ratio estimator (an expansion estimator for the variable of interest multiplied by the ratio of the population total for the auxiliary variable and the expansion estimator for the auxiliary variable).
Few surveys are conducted to estimate a single variable total, however. If we have two or more variables of interest each with their own auxiliary, then we can construct a different ratio estimator for each variable total, but we can only draw the sample once. The design may be relatively efficient for one of those variables, but not for all of them. Moreover, with only a modest loss of efficiency, we can use the same calibration estimator for each survey variable. A calibration estimator looks like an expansion estimator but the original sampling weights are replaced by calibration weights. Calibration weights are modifications of the original sampling weights constructed so that the calibration-weighted sum across the sample of each auxiliary variable equals its population total. (Note: The ratio estimator is an example of a calibration estimator in which the calibration weight for a unit is its original sampling weight multiplied by the ratio of the population total for a lone auxiliary variable to the expansion estimator for that auxiliary variable.) There is no equivalent way to design a sample to be as efficient for all variables of interest at once.
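The relationship between the expansion estimator, the ratio estimator and calibration weights described above can be sketched in a few lines of Python. This is only an illustration: the toy sample, the sampling weights and the auxiliary total are invented, and the function names are hypothetical rather than taken from any survey software.

```python
def expansion_estimate(values, weights):
    """Sample-weighted sum: each sampled value times its weight."""
    return sum(w * v for w, v in zip(weights, values))


def ratio_calibration_weights(weights, aux, aux_total):
    """Scale the sampling weights so the weighted auxiliary sum equals aux_total."""
    factor = aux_total / expansion_estimate(aux, weights)
    return [w * factor for w in weights]


# Hypothetical simple random sample of n = 4 units from N = 20 (weight N / n = 5).
sampling_weights = [5.0, 5.0, 5.0, 5.0]
aux_sample = [10.0, 20.0, 30.0, 40.0]   # auxiliary variable for the sampled units
y_sample = [12.0, 19.0, 33.0, 41.0]     # variable of interest
aux_population_total = 480.0            # assumed known population total

cal_weights = ratio_calibration_weights(
    sampling_weights, aux_sample, aux_population_total
)

# Calibration property: the calibration-weighted auxiliary sum hits the known total.
calibrated_aux_total = expansion_estimate(aux_sample, cal_weights)

# The ratio estimator computed directly from its definition ...
ratio_estimate = (
    expansion_estimate(y_sample, sampling_weights)
    * aux_population_total
    / expansion_estimate(aux_sample, sampling_weights)
)
# ... agrees with the calibration-weighted sum of the variable of interest.
calibration_estimate = expansion_estimate(y_sample, cal_weights)

print(calibrated_aux_total, ratio_estimate, calibration_estimate)
```

Once computed, the same calibration weights can be applied to every survey variable, which is the practical convenience noted above.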
This is not to say that auxiliary information should be used in estimation exclusively. Far from it. The National Agricultural Statistics Service calibrates its quarterly crops surveys on as many as 20 auxiliary variables in a state. The agency uses the same set of auxiliary variables in its sample design to assure adequate sample sizes for each variable of interest (if the auxiliary value is zero, the corresponding survey value will likely be zero as well). Moreover, the sample design can be used to increase efficiency of the estimator.
As an example of this, recall the single-variable-of-interest-single-auxiliary example discussed above. Suppose again that the unit-by-unit ratios of the variable of interest and the auxiliary variable can be treated as independent random variables with a common mean and variance (formally, this is a model assumption). For a fixed sample size, the design under which the anticipated (model-expected) variance of the ratio estimator is minimized selects units with probability proportional to the auxiliary variable. It turns out that under that design, the original-sample-weighted sum of the auxiliary variable equals its population total, and the ratio estimator collapses into the expansion estimator.
Perhaps the best answer to the question is a Zen one. Mu. Unask it. Auxiliary variables can most profitably be used in sample design and estimation simultaneously.
Kott, Phillip S. and Bailey, Jeffrey T. (2000), “The Theory and Practice of Maximal Brewer Selection with Poisson PRN Sampling,” Paper presented at the International Conference on Establishment Surveys, II, Buffalo, New York.
Phil Kott
703-877-8000 x102
The Survey Statistician, no. 53, pages 13-14, January 2006
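The collapse of the ratio estimator into the expansion estimator in the single-auxiliary example of the answer above can be written out as a short derivation. The notation is an assumption introduced here for illustration: x_i and y_i are the auxiliary and survey values of unit i, X is the known population total of the auxiliary variable, s is a sample of fixed size n, and the selection probabilities are taken to be exactly proportional to x_i, with every n x_i / X no larger than 1.

```latex
\begin{align*}
\pi_i &= \frac{n\,x_i}{X}, \qquad w_i = \frac{1}{\pi_i} = \frac{X}{n\,x_i}, \\
\sum_{i \in s} w_i x_i &= \sum_{i \in s} \frac{X}{n} = X, \\
\hat{t}_{\mathrm{ratio}}
  &= \Bigl(\sum_{i \in s} w_i y_i\Bigr)\,\frac{X}{\sum_{i \in s} w_i x_i}
   = \sum_{i \in s} w_i y_i .
\end{align*}
```

Because the weighted sum of the auxiliary variable already equals X for every possible sample, the ratio adjustment is identically 1 and changes nothing.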
Why collect new data when data are readily available in registers and data banks?
There are three reasons that may limit the suitability of register and other data. The first reason is that certain data in a register may not be accurate if the quality of values of individual fields is not high. Some information such as the value of a sex code or age may be missing or inaccurate because its accuracy is not needed for the day-to-day needs of the database. For instance, a tax file may not need sex code or age to be accurate. The tax file may not accurately track children in those countries where tax breaks are given for dependent children. If the main tax file needs to connect into other files with supplementary tax information, then the quality of the tax id field in each file needs to be high. Any error in the tax id field usually causes the main tax record to not be connected with the correct corresponding supplementary record.
The second reason is that a set of files may not have unique, verified identifiers. If the analyst needs to use joint (x, y) data where x comes from one file and y from another file, then x- and y-data can usually only be easily and accurately linked using the unique identifiers.
The third reason is that an analyst may need an extra z-variable to combine with (x, y) data where z is not in any known file. For instance, z might be the amount of tax savings by some individuals due to a specific tax break, or the result of treating a particular disease with a new drug.
If sex code is in error (or missing), then it might be corrected using the first name. If age or date-of-birth is missing or in error, then it might be corrected using an auxiliary data source. If matching is on name and address, then name would need to be accurate and address would need to be current. If the file contains 30,000 records with the name ‘John Smith,’ then an erroneous or out-of-date address would not allow linkage of records across two files. In some situations, it is virtually impossible to match a record ‘Karen Jones 1964Apr10’ with ‘Susan K. Smith 1985Jan07’ because Karen Jones now has the last name Smith, she usually uses her middle name Karen instead of Susan, and one of the dates-of-birth is completely wrong. With businesses, it may be extremely difficult to match ‘John L. Smith and Sons, Inc 1234 Main Street’ with ‘JLS Co. PO Box 657.’ Business names are often represented in a number of difficult-to-compare variations. An address associated with a business may be associated with a location, a PO Box, or the address of an accountant.
False match and false nonmatch rates are needed to evaluate the quality of matching. If the false match rate is moderate or high, then the resultant merged file may yield substantial analytic errors. Figure 1 illustrates the situation. The line represents the true regression line. Figure 1a shows original (x, y) regression data without matching error. Figure 1b shows the regression data with 10% matching error and Figure 1c shows the regression data with 50% matching error. With a 10% false match rate, the regression coefficient is more than 10% in error and the R² statistic is low by 25%. With a 50% false match rate, the regression coefficient is more than 50% in error and the R² statistic is 75% low. If the false match rate is very low and the false nonmatch rate is moderate, then the intersection A ∩ B may not be a representative subset of either file A or file B. For instance, if the records of low-income individuals in either file A or file B contain disproportionately more typographical variation than those of other individuals, then the low-income individuals will not be well represented in the intersection.
[Figure 1: regression data (a) without matching error, (b) with 10% matching error, (c) with 50% matching error]
The Survey Statistician, no. 53, pages 12-13, January 2006
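The effect of false matches on a regression, as described in the answer above, can be illustrated with a small simulation. This is a sketch under invented assumptions (a true slope of 2, normally distributed noise, and false matches that pair a record with the y-value of another randomly chosen record); it does not reproduce the data behind Figure 1, and the function names are hypothetical.

```python
import random


def ols_slope_and_r2(x, y):
    """Least-squares slope and R-squared for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def link_with_false_matches(y, false_match_rate, rng):
    """Return y-values in which a fraction of records get another record's y."""
    n = len(y)
    wrong = rng.sample(range(n), int(round(false_match_rate * n)))
    donors = wrong[:]
    rng.shuffle(donors)  # a few records may by chance keep their own y-value
    linked = list(y)
    for target, donor in zip(wrong, donors):
        linked[target] = y[donor]
    return linked


rng = random.Random(2006)  # fixed seed so the run is repeatable
n_records = 2000
x = [rng.uniform(0, 10) for _ in range(n_records)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # assumed true relationship

results = {}
for rate in (0.0, 0.1, 0.5):
    slope, r2 = ols_slope_and_r2(x, link_with_false_matches(y, rate, rng))
    results[rate] = (slope, r2)
    print(f"false match rate {rate:.0%}: slope {slope:.2f}, R2 {r2:.2f}")
```

In this simplified setup both the fitted slope and R² shrink as the false match rate grows, because the wrongly linked pairs carry no information about the true relationship.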
We made a survey last year in the housing sector, and we are going to repeat it next year in order to measure changes. What do we need to think of? Should we make an independent sample?
An independent sample is probably not a good idea. Many housing survey variables are positively correlated between the two years being compared, and you could benefit greatly from these correlations by using (a large part of) last year's sample again and basing your inference on the differences for individual households. Such a sampling procedure can reduce sampling variances drastically compared with one based on two independent samples, because positive correlations strongly favor using the same sample on both occasions with an estimator based on differences.
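The size of this gain can be sketched with the textbook variance formulas for a difference of two means. The sketch assumes simple random sampling, the same variance in both years, full overlap of the sample, and no finite population correction; the function names and the numbers are illustrative only.

```python
def variance_of_change_independent(variance, n):
    """Variance of a difference of two means from independent samples of size n."""
    return 2 * variance / n


def variance_of_change_same_sample(variance, n, correlation):
    """Variance of the mean of unit-level differences when the same n units are reused."""
    return 2 * variance * (1 - correlation) / n


def equivalent_independent_size(n, correlation):
    """Size of each independent sample needed to match a reused sample of size n."""
    return n / (1 - correlation)


# Illustrative values: unit variance 4.0, 100 households, year-to-year correlation 0.9.
same_sample = variance_of_change_same_sample(4.0, 100, 0.9)
needed_n = equivalent_independent_size(100, 0.9)
independent = variance_of_change_independent(4.0, needed_n)

print(same_sample, needed_n, independent)
```

Under these assumptions the gain grows quickly as the correlation approaches 1, since the required independent sample size is the reused sample size divided by one minus the correlation.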
In Wallis & Roberts (1965) an example is given showing that 25 units measured twice were comparable in precision to two samples of 2,222 units measured once each! As the authors put it: “This illustrates the potential importance of proper statistical planning before collecting data”.
Another important issue is making sure that the estimators really do estimate the true change. If you change the methodology of the survey, you will have to rule out those changes as an explanation of the difference obtained. So using the same mode of data collection and the same questionnaire on both occasions would probably be a wise design decision.
Wallis, W. A. & Roberts, H. V.: Statistics: A New Approach, twelfth printing, 1965.
The Survey Statistician, no. 52, page 16, July 2005


What is a specification error in the context of a sample survey?
The following excerpt from Biemer and Lyberg (2003) provides some information on this question:
“Specification error occurs when the concept implied by the survey question and the concept that should be measured in the survey differ. When this occurs, the wrong parameter is being estimated in the survey and, thus, inferences based upon the estimate may be erroneous.
Specification error is often caused by poor communication between the researcher, data analyst, or survey sponsor and the questionnaire designer.
For example, in an agricultural survey, the researcher or sponsor may be interested in the value of a parcel of land if it were sold at fair market value. That is, if the land were put up for sale today, what would be a fair price for the land? However, the survey question may simply ask “For what price would you sell this parcel of land?” Thus, instead of measuring the market value of the parcel, the question may instead be measuring how much the parcel is worth to the farm operator.
There may be quite a difference in these two values. The farm operator may not be ready to sell the land unless offered a very high price for it – a price much higher than market value. Since the survey question does not match the concept (or construct) underlying the research question, we say that the question suffers from specification error.
To take this example a step further, suppose the survey analyst is only interested in the value of the parcel without any of the capital improvements that may exist on it such as fences, irrigation equipment, air fields, silos, out buildings and so on. However, the survey question may be mute on this point. For example, it may simply ask “What do you think is the current market value of this parcel of land?” Note that this question does not explicitly exclude capital improvements made to the land and thus, the value of the land may be inflated by these improvements without the knowledge of the researcher. A more appropriate question might be, “What do you think is the current market value of this parcel of land? Do not include any capital improvements in your estimate such as fences, silos, irrigation equipment, and so on.”
The question, “What do you think is the current market value of this parcel of land?” is not necessarily a poorly worded question. Rather, it is the wrong question to ask considering the research objectives.
A questionnaire designer who does not clearly understand the research objectives and how data on land values will be used by agricultural economists and other data users may not recognize this specification error. For that reason, identifying specification errors usually requires that the questions be reviewed thoroughly by the research analyst or someone with a good understanding of the concepts that need to be measured in order to properly address the research objectives. The research analyst should review each question relative to the original intent as it relates to the study objectives and determine whether the question adequately reflects that intent. For the land values example, the agricultural economist or other analyst who will use the data on land values would be the best person to check the survey questionnaire for specification errors. In general, detecting specification error usually requires a review of the survey questions by researchers who are responsible for analyzing the data to address the research objectives and who know best about what concepts should be measured in the survey.
Note that in some disciplines (for example, econometrics), specification error means including the wrong variables in a model, such as a regression model, or leaving important variables out of the model. In our terminology, specification error does not refer to a model, but a question on the questionnaire.”
It should also be noted that specification errors are more common in business and institutional surveys than in household surveys.

The Survey Statistician, no. 51, pages 20-21, January 2005

What purposes can web surveys be used for?
Web surveys can be used in some cases to make inferences to some populations and in some cases as a qualitative research tool. The design of the web survey is the critical factor. For instance, if a client can provide a reasonably complete email list of the target population, then a web survey can be used to make statistical inferences. Some examples of realistic email lists for web surveys include lists of employees for an employee satisfaction survey and lists of purchasers at an e-commerce website to measure customer loyalty.
Another design that can be used for inferential purposes involves two modes of data collection – a Random Digit Dial (RDD) screening survey to identify eligible respondents that are then asked to go to a website to complete the survey.
One important part of web surveys that are designed for inferential purposes is the use of an access code so that respondents can complete the survey only once.
Knowledge Networks uses another web survey design. They use an RDD survey to recruit households to join their panel. They provide recruited households with a web device that allows these households to complete web surveys. Their website has various case studies and white papers that discuss the quality of the data collected in their panel.
Harris Interactive has created a large panel of Internet users. This panel is not representative of Internet users. To adjust for this, they conduct parallel surveys, asking their panel members and respondents to an RDD survey the same set of questions. They employ propensity scoring (Rosenbaum and Rubin, 1983) methods to adjust the results of web surveys of their panel members.
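The general idea of a propensity adjustment with a parallel reference survey can be sketched as follows. This is a deliberately simplified illustration and not a description of Harris Interactive's actual procedure: the propensity is estimated within cells of a single categorical covariate rather than with a fitted model, the reference sample is treated as unweighted, and all names and data are hypothetical.

```python
from collections import Counter


def cell_propensity_weights(web_cells, reference_cells):
    """Weight each web respondent by (1 - e) / e, where e is the estimated
    propensity of being in the web sample given the respondent's cell."""
    web_counts = Counter(web_cells)
    ref_counts = Counter(reference_cells)
    weights = []
    for cell in web_cells:
        propensity = web_counts[cell] / (web_counts[cell] + ref_counts[cell])
        weights.append((1 - propensity) / propensity)
    return weights


# Hypothetical samples: the web panel over-represents the "young" cell.
web_cells = ["young"] * 6 + ["old"] * 2
reference_cells = ["young"] * 3 + ["old"] * 5
web_answers = [1] * 6 + [0] * 2  # e.g. 1 = uses some online service

weights = cell_propensity_weights(web_cells, reference_cells)

unweighted_mean = sum(web_answers) / len(web_answers)
weighted_mean = (
    sum(w * a for w, a in zip(weights, web_answers)) / sum(weights)
)

# After weighting, the web sample reproduces the reference cell distribution.
weighted_cell_totals = Counter()
for cell, w in zip(web_cells, weights):
    weighted_cell_totals[cell] += w

print(unweighted_mean, weighted_mean, dict(weighted_cell_totals))
```

The adjustment can only correct for differences captured by the covariates asked in both surveys; any remaining self-selection effect is left untouched.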
Many web surveys involve a non-random selection of respondents. Many are opt-in surveys where the potential for a self-selection bias exists. These surveys may be implemented as “pop up” windows at a web site. In other cases a web site is open for interviewing and potential respondents may receive email invitations or see invitations (with the URL of the website) at various other websites or in other media. In other cases panels of web users can be created and surveyed as the need arises. These types of design provide qualitative data and generally should not be used for inferential purposes.

Rosenbaum, P.R. and Rubin, D.B., 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41-55.

The Survey Statistician, no. 50, pages 15-16, July 2004

Now that cell phones are so frequently used: What is the current status of telephone surveys?
Editorial note: The use of telephone surveys has varied between countries depending on the degree to which households have had access to a telephone in their homes. The new technology has also progressed differently from country to country. We have therefore asked experts from three different parts of the world to respond to this question.

Dennis Trewin, Australian Bureau of Statistics

An important aspect of all surveys is to have a good sample frame. This has always been problematic with random digit dialing, even more so with the increasing availability of cell phones. For surveys where households are visited more than once (e.g. a monthly labour force survey), area frameworks can be used with the first interview conducted face to face. A telephone number can be obtained, perhaps a cell phone, for subsequent interviews. We have found the additional cost associated with the first interview being face to face to be a very worthwhile investment in terms of the improved quality.
For one-off surveys, random digit dialing is then a possibility if a telephone framework is not available. This could include cell phone numbers. Of course, “probabilities” would have to be carefully assessed if we include cell phone numbers, as it is more likely that households have multiple phones.
I would also caution that all the survey errors and costs be carefully evaluated and compared with the costs of face to face interviewing. The cost of an area frame that is established to support a full range of household surveys, as is often the case for larger survey organisations, can be affordable on a per survey basis as the costs can be amortised over many surveys. It may well be that random digit dialing is a false economy. Certainly in Australia we have not found it an attractive option even though nearly all households have telephones.

Edith de Leeuw, MethodikA Amsterdam (with thanks to Fred Bronner, Albert Emmering, Ger Snijkers, and André Zijdenbos)

The very first telephone surveys were short 10-minute surveys with very simple questions; this was back in 1970. From this simple beginning the telephone survey evolved into a scientific data collection method and became a serious threat to the face-to-face method between 1980 and 1990. Now, in the 21st century, the question is whether telephone surveys still have a future. Changes in technology and in society are threatening the validity of telephone surveys. The use of answering machines and other screening devices makes it more difficult to contact respondents, the growing volume of telephone spam makes it more difficult to convince respondents to cooperate, and the growing number of mobile (cell) phones is a special challenge for coverage and sampling.

At present the number of cell phone-only persons is limited to special groups (e.g., students), who were always difficult to reach in standard consumer research. Most households still have a fixed line in their main dwelling, and only during weekends and in summer, when people are away, does it pay to incorporate cell phones in standard telephone surveys. A good example is the Finnish Labour Force Survey in the month of July. Incorporating cell phones requires an adaptation of the methodology and will increase survey costs. For instance, a cell phone belongs to a person whereas a fixed line belongs to a household, which has implications for sampling. For business-to-business surveys cell phones are less of a problem.

Telephone surveys still have a future, and survey methodologists are working hard to overcome the challenges by adapting old methods and developing new ones. Telephone surveys are necessary because face-to-face surveys are too costly, especially in sparsely populated areas, and are only used in special cases where interviewers have to perform extra tasks (e.g., observe behaviour or administer tests as in health surveys). Web surveys are still limited to special groups only. Telephone surveys are flexible and combine the personal extras of interviewers with lower costs. Especially in mixed mode designs, telephone surveys will be indispensable for the time being. They will be used as the major mode in household surveys, but also as a prenotification or reminder in web surveys of individuals and businesses and in electronic data exchange procedures for establishment surveys. Furthermore, they are an excellent selection or screening tool for internet or access panels.

Recommended reading: Gad Nathan (2001), Telesurvey methodologies for household surveys - A review and some thoughts for the future. Survey Methodology, 27, 1, 7-31.

Mike Brick, Westat, USA

In the United States and Canada researchers use telephones for both sampling households and as a mode for conducting interviews. The increasing prevalence of cell phones has different but substantial effects on both of these uses that are discussed below.

Since Waksberg (1978) first introduced an efficient and valid probability method for random digit dialing (RDD), all RDD methods sample only landline telephone numbers. Blumberg, Luke, and Cynamon (2004) found that in the first half of 2003 only 3% of U.S. adults lived in households with only cell phones. However, this percentage is likely to grow substantially over time and result in greater noncoverage. A related problem associated with the increased use of cell phones is a recently implemented regulation that allows people to switch from landline service to a cell phone and keep the same telephone number. Although a large number of persons may not choose this option, it could still make RDD sampling even more difficult. Furthermore, with cell phones in over 60% of households already, RDD response rates may already be suffering deleterious effects. If cell phone users in households with landlines primarily use their cell phones, it may be more difficult to contact and interview these persons on their landlines.

The effects of the proliferation of cell phones on the use of the telephone as a mode of data collection are less clear. One possibility is that people may be more available and willing to be interviewed from their cell phones. While cell phones may be perceived by respondents as providing a more convenient or private option, government agencies have concerns about the confidentiality of these interviews because of the ability to intercept these conversations. Since a cell phone interview could be done while the respondent is also doing another activity such as driving a car, the interviewer must pay some attention to the concurrent activities of the respondent beyond what is currently required for interviews over landlines. An ethical issue arises because the person receiving a call on a cell phone is responsible for the charges in the U.S. Cell phone users who do not wish to participate in the survey are still responsible for the cost of calls made by the survey organization to that phone. This issue may be resolved by revisions in the costing structure in the U.S., but until then, alternatives such as monetary incentives may be necessary.

The increase in cell phone usage presents serious challenges to both RDD sampling methods and the use of the telephone as a mode of data collection. Research to address these challenges has begun, but much work and ingenuity are required. One direction of research is the renewed interest in mixed mode data collection. The dynamics of the technological changes in telephony are likely to continue to require frequent modifications in survey methodology.

Blumberg, S., Luke, J., and Cynamon, M. (2004). Has cord-cutting cut into random-digit-dialed health surveys? The prevalence and impact of wireless substitution. Proceedings of the Eighth Conference on Health Survey Research Methods, Atlanta, GA.

Waksberg, J. (1978). Sampling methods for random digit dialing. Journal of the American Statistical Association, 73, 40-46.

The Survey Statistician, no. 50, pages 16-18, July 2004

What is the possible impact of question ordering and of adding questions to the end of a questionnaire?
Maurius Cronje provided a very interesting discussion of the effect of questionnaire wording and order in his article in the July 2003 Survey Statistician (pp. 27-31). Not only can the answer to a question be affected by whether or not some other question precedes it, as discussed in the article, but it can even be affected by whether or not other questions come later in the survey. Shapiro (1987) gave three examples of such an occurrence.
The most dramatic example involved the National Health Interview Survey. This survey is conducted every week by the U.S. Bureau of the Census for the National Center for Health Statistics. The survey is conducted face-to-face by well-trained interviewers who generally work on the survey for several years. Among other things, the survey always asks about acute health conditions noticed by the respondent within the two weeks prior to the week of interview. (An acute health condition is defined as a condition which has lasted less than three months and which has involved either medical attention or restricted activity.)
For two years, 1973 and 1974, at least 55 additional questions were asked about the reported acute health conditions. Table 1 is an abbreviated version of the table provided in the 1987 paper. The table shows that the number of acute conditions per 100 persons declined by 20.3 percent from 1972 to 1973 and increased by 20.7 percent from 1974 to 1975.

Table 1. Number of Acute Conditions per 100 Persons per Year

1971   1972   1973   1974   1975
218.5  219.7  175.1  175.7  212.0
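
The two percentage changes quoted from Table 1 can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the dictionary and function names are chosen here for the example and do not come from the 1987 paper.

```python
# Acute conditions per 100 persons per year, copied from Table 1.
rates = {1971: 218.5, 1972: 219.7, 1973: 175.1, 1974: 175.7, 1975: 212.0}


def percent_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return 100.0 * (new - old) / old


# Change when the supplementary questions were introduced (1972 to 1973)
# and when they were dropped again (1974 to 1975).
drop_1972_1973 = percent_change(rates[1972], rates[1973])
rise_1974_1975 = percent_change(rates[1974], rates[1975])

print(f"1972 to 1973: {drop_1972_1973:+.1f} percent")
print(f"1974 to 1975: {rise_1974_1975:+.1f} percent")
```

The rounded results agree with the 20.3 percent decline and the 20.7 percent increase stated in the text.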

The 1987 paper noted that the results were not obtained under controlled experimental conditions, but that it is highly unlikely that health conditions truly changed in 1973 and 1974. It is not possible to determine the cause of the decline in the years when the supplementary questions were asked, but the paper stated that there were two plausible explanations. First, the interviewer did not want to burden him/herself, or to burden the respondent, with more questions and therefore classified some respondents incorrectly. Second, the presence of the long supplement caused interviewers to rush through the questions to complete the interview more quickly, resulting in less complete reporting by respondents.

Shapiro, G. (1987). Interviewer-respondent bias resulting from adding supplemental questions. Journal of Official Statistics, 3(2), 155-168.

The Survey Statistician, no. 50, page 19, July 2004


How can imputation be trusted since it creates artificial values?
In one way or another, surveys have to deal with the problem of missing values. Different reasons may explain the presence of missing values, such as refusal to provide the desired information for at least one question or an impossibility to contact a given unit. Missing values can also be created at the editing stage of the survey in an attempt to resolve problems of inconsistent or suspect responses.
To deal with missing values, many estimation techniques such as maximum likelihood estimation, nonresponse weight adjustment and imputation can be used. Choosing to use imputation is often based on practical considerations. For instance, imputation is convenient for ultimate users since it creates a complete rectangular file, which can be used to obtain estimates of population parameters of interest as if there were no missing value. This property is particularly useful when dealing with item nonresponse, where missing values occur for some but not all variables in the survey. Also, imputation ensures some consistency between estimates produced by different users.
Although imputation is usually a very convenient method of compensating for missing values, it is well known that imputed values cannot be treated as true values when making inferences about unknown population parameters. In fact, the real goal of imputation is to support estimation in order to make appropriate inferences, rather than simply to predict values of micro data. To achieve this goal, imputation does consist in predicting each individual missing value, but this does not necessarily mean that the imputed value for a given unit is a high-quality estimate of the true unknown value. Imputation methods must be developed in such a way as to lead to reasonably high-quality estimates, at least at certain aggregate levels.
In order to make inferences in the presence of missing values, assumptions about the unknown mechanism that generates missing values, i.e. the nonresponse mechanism, are needed. These assumptions are called the nonresponse model. This is to be contrasted with sampling theory, where the mechanism that generates samples is completely known to the statistician.
Often, the nonresponse model only requires that the nonresponse mechanism be independent of the variables of interest after conditioning on some auxiliary variables observed for all sample units. In such a case, a model for the variables of interest, i.e. an imputation model, is needed. The imputation model is usually the key to obtaining efficient predictions, or efficient imputations, for the missing values. In particular, the use of auxiliary variables well correlated with the variables of interest is important to reduce the error in the estimates due to missing values. Therefore, to the extent possible, it is crucial to validate all model assumptions underlying the imputation strategy in order to make valid inferences in the presence of missing values.
If a careful modeling effort is performed, then imputation can be trusted as a method of treatment of missing values.
Finally, it is important to note that missing values lead to estimates that are more variable than those that would be obtained if the entire sample could be observed. As a result, variance estimates derived under the assumption of full response are not valid in the presence of missing values. Therefore, the imputation strategy and/or the variance estimation approach must take imputation into account in order to make valid inferences.
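
As an illustration of an imputation model that uses an auxiliary variable, the sketch below applies ratio imputation to a small made-up sample. The data, the function name and the choice of a ratio model (y roughly proportional to x) are assumptions made for this example only; they are not a recommendation for any particular survey.

```python
from statistics import mean

# Hypothetical sample: x is an auxiliary variable observed for every unit,
# y is the variable of interest, with None marking a missing value.
x = [10, 12, 15, 20, 22, 25, 30, 35]
y = [21, 25, None, 41, None, 49, 61, None]


def ratio_impute(x, y):
    """Fill each missing y with ratio * x, where ratio is the respondent
    total of y divided by the respondent total of x.

    This assumes an imputation model of the form y = R * x + error.
    Returns the completed values, imputation flags and the fitted ratio.
    """
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    ratio = sum(yi for _, yi in respondents) / sum(xi for xi, _ in respondents)
    completed = [yi if yi is not None else ratio * xi for xi, yi in zip(x, y)]
    flags = [yi is None for yi in y]
    return completed, flags, ratio


completed, imputed_flag, ratio = ratio_impute(x, y)

print("fitted ratio:", round(ratio, 4))
print("estimated mean of y:", round(mean(completed), 2))
print("number of imputed values:", sum(imputed_flag))
```

The imputation flags are kept alongside the completed file on purpose: a variance estimate computed as if all eight values had been observed would ignore the nonresponse, so the flags are what a variance estimation method that accounts for imputation would need.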

The Survey Statistician, no. 51, pages 18-19, January 2005


We have designed an establishment survey to provide estimates of value added on the ISIC 1-2 digit level. Is it possible to obtain estimates on the 3 and 4 ISIC level using small area estimation techniques?

Suppose that simple random samples are drawn independently from each ISIC 2-digit level group, treating the latter as strata. Let us also assume that the group-specific direct estimates of value added provide adequate precision at the 2-digit level. The question is whether the sample data can also be used to make reliable estimates of value added for the lower 3 and 4 level groups.

Clearly, direct estimates treating the lower level groups as domains will be inadequate due to unduly small sample sizes in many of the domains (even zero in several 4 level groups). It is therefore necessary to employ indirect estimates based on small area (or domain) techniques. Such estimates “borrow strength” by using the sample values from related domains, thus increasing the “effective” sample size in the domains. These values are brought into the estimation process through a model (explicit or implicit) that provides a link to related domains through the use of supplementary data related to the variable of interest, such as recent census counts and current administrative records.
Availability of good auxiliary data and determination of suitable linking models are crucial to the formulation of reliable indirect domain estimates. Explicit models should be preferred because (1) such models can be validated from the sample data, (2) efficient indirect estimates can be derived under assumed models, (3) estimates of mean squared error can be obtained, and (4) a variety of models can be developed depending on the complexity of data structures.

A model-based indirect estimate typically takes the form of a weighted average of a direct estimate and a “synthetic” estimate if the domain sample size is non-zero; otherwise, it has the form of a synthetic estimate that uses data from all the domains that are linked together. It is better to try small area estimation techniques on 3 level groups first before going to 4 level groups. A detailed account of small area estimation techniques is given in my book “Small Area Estimation”, Wiley 2003. The book “Indirect Estimation in U.S. Federal Programs”, Springer 1996, edited by W. A. Schaible, provides applications of indirect estimation in U.S. Federal Programs.
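
The weighted-average form described above can be sketched as follows. The weight used here, gamma = model variance / (model variance + sampling variance), is the shrinkage weight of a basic area-level model and is an assumption of this example, since the answer does not specify how the weight is chosen. The function name, the domain labels and all numbers are hypothetical.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var, n_domain):
    """Combine a direct and a synthetic estimate for one domain.

    Assumed weight: gamma = model_var / (model_var + sampling_var), so a
    precise direct estimate (small sampling_var) receives more weight.
    With no sample in the domain the synthetic estimate is used alone.
    Returns the estimate and the weight given to the direct estimate.
    """
    if n_domain == 0 or direct is None:
        return synthetic, 0.0
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma


# Hypothetical 3-level groups: (label, direct, synthetic, sampling_var, n).
domains = [
    ("group A", 120.0, 100.0, 25.0, 40),   # well sampled
    ("group B", 120.0, 100.0, 400.0, 3),   # poorly sampled
    ("group C", None, 100.0, None, 0),     # no sample at all
]
MODEL_VAR = 100.0  # assumed between-domain variance under the linking model

for label, direct, synthetic, sampling_var, n in domains:
    estimate, gamma = composite_estimate(direct, synthetic, sampling_var, MODEL_VAR, n)
    print(f"{label}: estimate = {estimate:.1f}, weight on direct = {gamma:.2f}")
```

In this sketch the well-sampled group stays close to its direct estimate, the poorly sampled group is pulled towards the synthetic estimate, and the unsampled group receives the synthetic estimate itself.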

J. N. K. Rao
School of Mathematics and Statistics
Carleton University
Ottawa, Canada

The Survey Statistician, no. 51, pages 21-22, January 2005

What is the bias of p, for estimating a proportion P when misclassifications occur?
The misclassification model can be used as a survey model for dichotomous variables. It postulates the existence of two misclassification probabilities:
α = the probability that an individual who does not actually have the characteristic under study is erroneously classified as having it.
β = the probability that an individual who actually has the characteristic is erroneously classified as not having it.
These two error types are usually called “false positives” and “false negatives”, using terminology from medical diagnosis, where the model has been extensively used.
The mathematical expectation of p is:
E(p) = P(1 – β) + (1 – P)α
and the bias is:
B(p) = E(p) – P = α – P(α + β)
The bias of the estimator p is thus a linear function of P with intercept α and slope –(α + β).
If P = 0, the bias is α, since all positive responses will be false positives.
If P = 1, the bias is –β, since all negative responses will be false negatives.
If P = α/(α + β), that is, if α/β = P/(1 – P), the estimator p will be unbiased.

Example: If both misclassification probabilities are 0.05 and the parameter being estimated is 0.10:
α = β = 0.05, P = 0.10

The expectation of p is E(p) = 0.14 and the bias is B(p) = 0.04.
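
The expectation and bias formulas can be evaluated directly. The short sketch below reproduces the worked example and the three special cases; the function names are chosen for this illustration only.

```python
def expected_p(P, alpha, beta):
    """Expectation of p under the misclassification model:
    true positives kept (1 - beta) plus true negatives misclassified (alpha)."""
    return P * (1 - beta) + (1 - P) * alpha


def bias_p(P, alpha, beta):
    """Bias of p: a linear function of P with intercept alpha
    and slope -(alpha + beta)."""
    return alpha - P * (alpha + beta)


alpha = beta = 0.05

# Worked example from the text: P = 0.10.
print("E(p) =", round(expected_p(0.10, alpha, beta), 4))
print("bias =", round(bias_p(0.10, alpha, beta), 4))

# Special cases: P = 0, P = 1, and the unbiased point P = alpha / (alpha + beta).
for P in (0.0, 1.0, alpha / (alpha + beta)):
    print(f"P = {P:.2f}: bias = {bias_p(P, alpha, beta):+.4f}")
```

With equal misclassification probabilities the unbiased point is P = 0.5, so for a rare characteristic such as P = 0.10 the false positives dominate and p overestimates P.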

The Survey Statistician, no. 51, page 21, January 2005


What are the pros and cons of letting users have access to data micro files?
It is not easy to give a short reply to such a broad question, because one should consider at least the ethical, confidentiality, and usability aspects of the issue. In terms of information content, it is quite evident that the basic micro data are the only source from which all dependencies can be investigated. Every time the data are aggregated to some higher level, say from enterprises to industries, or from individuals or households to geographical or other domain tables, the researcher loses some information.
Ecological fallacy is the term that describes aggregation bias: one cannot draw firm conclusions at the basic level when the model was fitted at the aggregated level, whereas the opposite is normally possible, i.e. conclusions at higher levels can be drawn when basic data were used.
In the context of social surveys it is a tradition to analyse the basic data.
Even when there is a choice between basic data and similar data aggregated into multiway tables, the natural choice is in favour of basic data, despite the fact that nearly or exactly the same model can be applied in the analysis. However, in business surveys and studies using, e.g., national accounts data, the situation is slightly different.
Early econometric methods were developed for aggregated data, and the modifications to fit micro data appeared much later. This was natural because enterprise or local unit micro data were not available. But now micro data exist, and much of the recent econometric analysis is based on them.
Recently a lot of effort has been devoted to merging basic data sets with each other and/or with administrative and statistical registers by record linkage or statistical matching. The resulting data sets provide users (whether statistical agencies or researchers) with much richer data for analysis. But there is a clear drawback: the probability of disclosure increases.
Ethical, legal and confidentiality issues are the major cons of access to basic data, and they are linked together.
The ISI declaration of professional ethics says: “Statisticians are frequently furnished with information by the funder or employer who may legitimately require it to be kept confidential. Statistical methods and procedures that have been utilised to produce published data should not, however, be kept confidential.” That declaration is to be updated, and the coming version will probably contain much clearer ethical guidelines on confidentiality, as do many national and international rules.
Many statistical offices can provide researchers with basic data, but they set conditions on its use. For example, the data may be used only for research purposes (possibly limited by subject and time), researchers are not allowed to try to identify the informants by any means, and the results and publications may have to be reviewed by the data provider. The basic data sets are also often checked by the data providers to ensure that the confidentiality rules are fulfilled. Data may also be perturbed or synthesized in order to avoid confidentiality violations. But even with those methods, there are data sets for which disclosure cannot be totally avoided. This applies especially to very skewed distributions, which are typical of business data, and to rare events in general.
Currently there is a lot of research on confidentiality control methods, which will hopefully give new tools to the data providers. The new methods of data access via the internet and other electronic networks are developing so rapidly that new and strong methods are really needed.

The Survey Statistician, no. 50, pages 18-19, July 2004

What is meant by blurring in the context of statistical disclosure control?
Blurring in the context of statistical disclosure control does not have the negative meaning it has in everyday usage. It means that a reported value is replaced by an average. There are many possible ways to implement blurring. Groups of records for averaging may be formed by matching on other variables or by sorting on the variable of interest. The number of records in a group (whose data will be averaged) may be fixed or random. The average associated with a particular group may be assigned to all members of the group, or to the “middle” member (as in a moving average). Blurring may be performed on more than one variable, with different groupings for each variable. More information on statistical disclosure control can be found in the glossary published under the GLOSSARY button on the home page of the CASC project on the internet:
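
One of the variants described above (groups of fixed size formed by sorting on the variable of interest, with the group average assigned to every member) can be sketched as follows. The function name, the group size and the income values are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean


def blur(values, group_size=3):
    """Replace each value by the average of its group.

    Groups are formed by sorting on the variable itself and cutting the
    sorted records into consecutive blocks of group_size records. The
    result is returned in the original record order. A trailing block
    with fewer records is averaged on its own in this sketch; in practice
    it would probably be merged with the previous block.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    blurred = [None] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        average = mean([values[i] for i in group])
        for i in group:
            blurred[i] = average
    return blurred


# Hypothetical reported incomes, including one extreme value.
incomes = [52, 18, 250, 31, 27, 40]
blurred_incomes = blur(incomes, group_size=3)

for original, released in zip(incomes, blurred_incomes):
    print(f"reported {original:>3}  ->  released {released:.1f}")
```

Because every value is replaced by a group average, the overall total is unchanged, while the extreme value 250 no longer appears in the released file.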

The Survey Statistician, no. 54, page 16, July 2006


What can you tell about survey quality awareness?
Quality awareness among users and producers is, of course, of great importance for statistical surveys, as in any other field. For statistical surveys, however, quality has many facets, some of which can be evaluated by means of a so-called customer satisfaction survey. For instance, the user as well as the producer of a statistical survey can easily assess its timeliness. For other aspects of survey quality, more sophisticated methods must be used. The most important of these aspects is the accuracy of the survey.
Accuracy is an indicator of the degree to which the user can rely on the results of a survey. The accuracy should meet the needs of the users, and the inevitable inaccuracies should be properly reported, thus enabling the users to make their own judgements as to whether the quality of the data supports the intended uses or not.
Perfect survey quality is merely a theoretical ideal. Inaccuracies come from errors in the survey. Sources of error are the sampling procedure (sampling errors) and other steps of the survey (nonsampling errors), particularly nonresponse and measurement errors.
It is crucial that the producer obtains a fair assessment of the accuracy of the survey by means of evaluation efforts such as computing confidence intervals and nonresponse rates, to mention the two most common measures. Thus, the producer’s perceived accuracy is obtained. It is equally important that this is communicated to the users to inform their perceived accuracy. This is done by means of a quality declaration.
Generally, quality measures for random errors, i.e. sampling quality, are readily obtained by computing confidence intervals (or point estimates together with their coefficients of variation, CV), which measure the uncertainty associated with the sampling error, taking advantage of sampling theory. Unfortunately, there is no unified theory that evaluates response quality in a similar way.
In fact, quality awareness varies with the different sources of error. This may be described by the quality awareness ladder, which has the following four steps:
Step 4:   Quantitative measure of inaccuracy (confidence interval) available
Step 3:   Quantitative measure of indicator of issue (nonresponse rate) available
Step 2:   Vague awareness of quality issue occurs (as often for measurement error)
Step 1:   No awareness of quality issue (sometimes detected “by accident”)
Obviously, quality awareness varies from one issue to another. Sampling errors represent the highest step of quality awareness, whereas nonsampling errors are normally evaluated by indicators or judgments. There is no single measure that brings together all the uncertainty of a statistical survey. However, much work in survey methods development today is devoted to moving the awareness of quality issues up the ladder, and much of this work has been successful, though far from complete in terms of a single “total error quality indicator”. In particular, work on measurement error models and on variance due to imputation makes it possible to provide quality information beyond that of rates.
Statistical agencies have developed, and are using, quality guidelines and/or policies to help staff keep quality in mind during all steps of surveys and to provide a framework for correctly informing the users of data quality.
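
The two most common measures mentioned above, corresponding to steps 4 and 3 of the ladder, can be computed as in the following sketch. It assumes a simple random sample, ignores the finite population correction and uses a normal approximation for the confidence interval; the data and function names are made up for the illustration.

```python
from math import sqrt
from statistics import mean, variance


def srs_summary(sample, z=1.96):
    """Point estimate of the mean with its standard error, coefficient of
    variation and an approximate 95 percent confidence interval.

    Assumes simple random sampling with a negligible sampling fraction.
    """
    n = len(sample)
    estimate = mean(sample)
    se = sqrt(variance(sample) / n)
    return {
        "estimate": estimate,
        "se": se,
        "cv": se / estimate,
        "ci": (estimate - z * se, estimate + z * se),
    }


def nonresponse_rate(n_respondents, n_eligible):
    """Share of eligible sample units that did not respond."""
    return 1.0 - n_respondents / n_eligible


# Hypothetical responses from 8 respondents out of 10 eligible sample units.
responses = [12, 15, 9, 20, 14, 16, 11, 15]
summary = srs_summary(responses)

print("estimate:", summary["estimate"])
print("CV: {:.1%}".format(summary["cv"]))
print("95% CI: ({:.2f}, {:.2f})".format(*summary["ci"]))
print("nonresponse rate: {:.0%}".format(nonresponse_rate(len(responses), 10)))
```

The confidence interval and CV quantify only the sampling error of the respondents' data. The nonresponse rate is an indicator of a possible problem rather than a measure of the resulting error, which is the distinction between steps 4 and 3.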

Christianson A and Polfeldt T (1995): Evaluation and Improvement of Response Quality at Statistics Sweden – A Research Approach. In Statistics Sweden: European Harmonization, National Decentralization and Quality. Proceedings of the 1st International Conference on Methodological Issues in Official Statistics. Stockholm, June 12-13 1995.

Christianson A and Tortora R D (1995) Issues in Surveying Businesses: An International Survey. Chapter 14 of Cox B G, Binder D A, Nanjamma Chinnappa B, Christianson A, Colledge M J, and Kott P S. Business Survey methods. Wiley ISBN 0-471-59852-6.
The Survey Statistician, no. 54, pages 16-17, July 2006

What is a tolerable nonresponse rate?

This is a somewhat controversial question. Historically, say 40 years ago, the view of what constitutes a tolerable nonresponse rate was stricter than it is today. Some agencies even applied so-called minimum performance standards, implying that survey results were suppressed when the standards were not met. Improved weighting methods and a hardening survey climate have made the view of what is a tolerable nonresponse rate more liberal. However, there is still common agreement in two respects:
First, whether the nonresponse rate should be considered acceptable depends on the purpose of the survey. Some decisions demand a higher degree of accuracy than others do. It is the survey user’s responsibility to take the uncertainty due to nonresponse into account when she or he makes decisions.

Second, it is therefore the survey producer’s (if other than the user) obligation to communicate a fair account of this uncertainty to the users. There are also limits as to when a probability sample is still a probability sample as nonresponse increases, which jeopardizes the basis of inference from a sample to the population from which it is drawn.

So, there is no straightforward answer to the question in terms of a specific percentage. An interesting discussion of this, giving different points of view, can be found in the “discussion corner” of the July 2002 issue of the Survey Statistician, starting with an article entitled Avoid the Need to Impute.

The Survey Statistician, no. 52, page 16, July 2005

What can I do to enhance the quality of retrospective reports?
The quality of retrospective reports depends on the quality of autobiographical remembering. Errors arise because all remembering is reconstructive rather than reproductive, and the accuracy of memory reconstructions depends (among other things) on the degree to which information is encoded when events occur, how well information is stored over time, and the extent to which the conditions at retrieval provide effective retrieval cues. With regard to retrospective reports, survey researchers only have some degree of control over the conditions of retrieval. As my work has shown, providing more effective retrieval cues in questionnaires can enhance the quality of retrospective reports. One line of studies has been based on using episodic cues to help participants overcome potential source monitoring errors, as can happen when people take remembering that they thought about voting, or their usual voting behavior, as evidence that they voted in the last election (Belli, Traugott, Young, & McGonagle, 1999; Belli & Moore, 2004). Another line of studies focuses on calendar-based interviewing methodologies, which make it possible to use retrieval cues that exist within the structure of autobiographical memory to a greater extent than is possible in traditional standardized question-list (Q-list) methods (Belli, 1998). Studies which have compared calendar-based to standardized Q-list interviewing methodologies have shown higher quality retrospective reports with the calendar-based approaches (Belli, 2004; Belli, Shay, & Stafford, 2001; van der Vaart, 2004). In many cases, additional costs in terms of increased interviewing time are negligible, and there has been no observed increase in interviewer variance, despite the more conversational and flexible nature of calendar-based interviewing (Belli, Lee, Stafford, & Chou, 2004).
Finally, calendar-based interviewing has demonstrated its advantages in comparison to Q-list approaches with 2-year and life course reference periods, in both face-to-face and telephone modes, and in both paper and pencil and computer-assisted interviewing data collection methods.

Belli, R. F. (1998). The structure of autobiographical memory and the event history calendar: Potential improvements in the quality of retrospective reports in surveys. Memory, 6, 383-406.

Belli, R. F. (2004, August). Improving the Quality of Retrospective Reports: Calendar Interviewing Methodologies. Paper presented at the sixth international conference on logic and methodology, Amsterdam, the Netherlands.

Belli, R. F., Lee, E. H., Stafford, F. P., & Chou, C-H. (2004). Calendar and question-list survey methods: Association between interviewer behaviors and data quality. Journal of Official Statistics, 20, 185-218.

Belli, R. F., & Moore, S. E. (2004). An experimental comparison of question formats used to reduce vote overreporting. Manuscript submitted for publication.

Belli, R. F., Shay, W. L., & Stafford, F. P. (2001). Event history calendars and question list surveys: A direct comparison of interviewing methods. Public Opinion Quarterly, 65, 45-74.

Belli, R. F., Traugott, M. W., Young, M., & McGonagle, K. A. (1999). Reducing vote overreporting in surveys: Social desirability, memory failure, and source monitoring. Public Opinion Quarterly, 63, 90-108.

van der Vaart, W. (2004). The time-line as a device to enhance recall in standardized research interviews: A split ballot study. Journal of Official Statistics, 20, 301-317.

The Survey Statistician, no. 51, pages 19-20, January 2005

IASS members are welcome to submit questions using the form below.

The IASS does not commit to answer questions asked by non-members.
Names and email addresses will not be disclosed.
