
Section 8 Robust nonlinear regression with AD Model Builder

The code for the AD Model Builder template for this example is found in the file VONB.TPL. This example is intended to demonstrate the advantages of using AD Model Builder's robust regression routine over standard nonlinear least-squares regression procedures. Further discussion of the underlying theory can be found in the AUTODIF User's Manual, but it is not necessary to understand the theory to make use of the procedure.

Results for nonlinear regression with good data set

 index         value      std dev       1       2       3   
    1   Linf  5.4861e+01 2.4704e+00  1.0000
    2   K     1.7985e-01 2.7127e-02 -0.9191  1.0000
    3   t0    1.8031e-01 2.9549e-01 -0.5856  0.7821  1.0000

This example estimates the parameters describing a growth curve from a set of data consisting of ages and size-at-age observations. The form of the (von Bertalanffy) growth curve is assumed to be
$$ s = L_\infty\bigl(1 - \exp(-K(a - t_0))\bigr) $$
The three parameters of the curve to be estimated are $L_\infty$, $K$, and $t_0$.

Let $s_i$ and $a_i$ be the observed size and age of the $i$'th animal. The predicted size $\hat s_i$ is given by the growth curve above evaluated at $a_i$. The least squares estimates for the parameters are found by minimizing
$$ \sum_{i=1}^{n} \bigl(s_i - \hat s_i\bigr)^2 $$


DATA_SECTION
  init_int nobs;
  init_matrix data(1,nobs,1,2)
  vector age(1,nobs);
  vector size(1,nobs);
PARAMETER_SECTION
  init_number Linf;
  init_number K;
  init_number t0;
  vector pred_size(1,nobs)
  objective_function_value f;
PRELIMINARY_CALCS_SECTION
  // get the data out of the columns of the data matrix 
  age=column(data,1);
  size=column(data,2);
  Linf=1.1*max(size);  // set Linf to 1.1 times the longest observed length
PROCEDURE_SECTION
  pred_size=Linf*(1.-exp(-K*(age-t0)));
  f=regression(size,pred_size);
Notice the use of the regression function, which calculates the log-likelihood function of the nonlinear least-squares regression. This part of the code is formally identical to the code for the linear regression problem in the simple example, even though we are now doing nonlinear regression. A graph of the least-squares estimated growth curve and the observed data is given in figure 1. The parameter estimates and their estimated standard deviations produced by AD Model Builder are also given. For example, the estimate for $L_\infty$ is 54.86 with a standard deviation of 2.47. Since a 95% confidence interval is about two standard deviations on either side of the estimate, the usual 95% confidence limits for $L_\infty$ from this analysis would be about $54.86 \pm 4.94$, that is, roughly $(49.9,\ 59.8)$.
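
To make the objective concrete outside of the TPL template, here is a minimal Python sketch of the same calculation: the von Bertalanffy predictions, the residual sum of squares that least squares minimizes, and the roughly two-standard-deviation interval quoted above. The toy ages and sizes are made up for illustration, and the concentrated log-likelihood shown (proportional to $n/2$ times the log of the residual sum of squares) is one standard way of writing least-squares regression as a likelihood; the exact form used by ADMB's regression function is given in the AUTODIF User's Manual.

  import numpy as np

  def vonb(age, Linf, K, t0):
      # von Bertalanffy growth curve: Linf*(1 - exp(-K*(age - t0)))
      return Linf * (1.0 - np.exp(-K * (age - t0)))

  def rss(params, age, size):
      # residual sum of squares minimized by least squares
      Linf, K, t0 = params
      return np.sum((size - vonb(age, Linf, K, t0)) ** 2)

  def concentrated_nll(params, age, size):
      # one standard concentrated form: n/2 * log(RSS) (illustrative only)
      n = len(age)
      return 0.5 * n * np.log(rss(params, age, size))

  # made-up ages and sizes, purely for illustration
  age = np.arange(1, 11, dtype=float)
  size = vonb(age, 55.0, 0.18, 0.2) + np.random.default_rng(0).normal(0, 2, 10)

  print(rss((55.0, 0.18, 0.2), age, size))
  print(concentrated_nll((55.0, 0.18, 0.2), age, size))

  # a ~95% confidence interval is roughly estimate +/- 2 standard deviations,
  # e.g. for Linf = 54.86 with std dev 2.47:
  print(54.86 - 2 * 2.47, 54.86 + 2 * 2.47)   # about (49.9, 59.8)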

A disadvantage of least-squares regression is the sensitivity of the estimates to a few ``bad'' data points, or outliers. Figure 2 shows the least-squares estimates when the observed sizes for age 2 and age 14 have been moved off the curve. There has been a rather large change in some of the parameter estimates. For example, the estimate for $L_\infty$ has changed from 54.9 to 48.9, and the estimated standard deviation for this parameter has increased from 2.47 to 5.99. This is a common effect of outliers on least-squares estimates: they greatly increase the size of the estimates of the standard deviations. As a result, the confidence limits for the parameters are widened. In this case the 95% confidence limits for $L_\infty$ have been increased from about $(49.9,\ 59.8)$ to about $(36.9,\ 60.9)$.

Of course, for this simple example it could be argued that a visual examination of the residuals would identify the outliers so that they could be removed. This is true, but in larger nonlinear models it is often not possible or convenient to identify and remove all the outliers in this fashion. Also, the process of removing ``inconvenient'' observations from the data can be uncomfortably close to ``cooking'' the data in order to obtain the desired result from the analysis. An alternative approach, which avoids these difficulties, is to employ AD Model Builder's robust regression procedure, which removes the undue influence of outlying points without the need to expressly remove them from the data.
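
To give a feel for how such a procedure can downweight outliers automatically, here is a small Python sketch of one common construction: each residual is assumed to come from a normal distribution with probability $1-a$ and from a much wider-tailed distribution with probability $a$, and the negative log-likelihood of that mixture is minimized. This is only the general idea, with an arbitrary wide component chosen for illustration; the exact robust likelihood used by AD Model Builder's robust_regression function is described in the AUTODIF User's Manual.

  import numpy as np

  def mixture_nll(resid, sigma, a, wide=3.0):
      # negative log-likelihood of residuals under a contaminated normal:
      # with probability 1-a a residual is N(0, sigma^2), with probability a
      # it is N(0, (wide*sigma)^2).  Large residuals are explained by the
      # wide component, so they no longer dominate the fit.  'wide' is an
      # illustrative choice, not ADMB's actual form.
      def normal_pdf(r, s):
          return np.exp(-0.5 * (r / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
      dens = (1.0 - a) * normal_pdf(resid, sigma) + a * normal_pdf(resid, wide * sigma)
      return -np.sum(np.log(dens))

  # an outlying residual adds far less to the mixture objective than to
  # the pure least-squares objective (r/sigma)^2 / 2:
  r = np.array([0.5, -1.0, 0.3, 8.0])        # last point is an outlier
  print(mixture_nll(r, sigma=1.0, a=0.3))
  print(0.5 * np.sum((r / 1.0) ** 2))        # least-squares contribution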


Results for nonlinear regression with bad data set

 index         value      std dev       1       2       3   
    1   Linf  4.8905e+01 5.9938e+00  1.0000
    2   K     2.1246e-01 1.2076e-01 -0.8923  1.0000
    3   t0   -5.9153e-01 1.4006e+00 -0.6548  0.8707  1.0000

Section 9 Modifying the model to use robust nonlinear regression

To invoke the robust regression procedure it is necessary to make three changes to the existing code. The template for the robust regression version of the model can be found in the file VONBR.TPL.


DATA_SECTION
  init_int nobs;
  init_matrix data(1,nobs,1,2)
  vector age(1,nobs)
  vector size(1,nobs)
PARAMETER_SECTION
  init_number Linf
  init_number K
  init_number t0
  vector pred_size(1,nobs)
  objective_function_value f
  init_bounded_number a(0.0,0.7,2)
PRELIMINARY_CALCS_SECTION
  // get the data out of the columns of the data matrix 
  age=column(data,1);
  size=column(data,2);
  Linf=1.1*max(size);  // set Linf to 1.1 times the longest observed length
  a=0.7;
PROCEDURE_SECTION
  pred_size=Linf*(1.-exp(-K*(age-t0)));
  f=robust_regression(size,pred_size,a);
The main modification to the model involves the addition of the parameter a, which is used to estimate the amount of contamination by outliers. This parameter is declared in the PARAMETER_SECTION.

  init_bounded_number a(0.0,0.7,2)
This introduces two concepts: putting bounds on the values that initial parameters can take on, and carrying out the minimization in a number of phases. The value of a should be restricted to lie between 0.0 and 0.7 (see the discussion on robust regression in the AUTODIF User's Manual if you want to know where the 0.0 and 0.7 come from). This is accomplished by declaring a to be of type init_bounded_number. In general it is not possible to estimate the parameter a, which determines the amount of contamination by outliers, until the other parameters of the model have been ``almost'' estimated, that is, until we have done a preliminary fit of the model. This is a common situation in nonlinear modeling and is discussed further in some later examples. So we want to carry out the minimization in two phases, and during the first phase a should be held constant. In general, for any initial parameter the last number in its declaration, if present, determines the number of the phase in which that parameter becomes active. If no number is given, the parameter becomes active in phase 1. (Note: for an init_bounded_number the upper and lower bounds must always be given, so the declaration

  init_bounded_number a(0.0,0.7)
would use the default phase 1.) The 2 in the declaration for a causes a to be held constant until the second phase of the minimization. The second change to the model involves the default initial value of a. The default value for a bounded number is the average of the upper and lower bounds, which for a would be 0.35; that is too small. We want to use the upper bound of 0.7 instead. This is done by adding the line

  a=0.7;
in the PRELIMINARY_CALCS_SECTION. Finally, we modify the statement in the PROCEDURE_SECTION containing the regression function to

  f=robust_regression(size,pred_size,a);
to invoke the robust regression function. That's all there is to it! These three changes will convert any AD Model Builder template from a nonlinear regression model to a robust nonlinear regression model.
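
The phase mechanism itself is easy to picture outside of ADMB. The sketch below, in Python with scipy, mimics the two-phase idea on the illustrative robustified objective from the earlier sketch: in phase 1 the contamination parameter a is frozen at its upper bound of 0.7 and only Linf, K, and t0 are estimated; in phase 2 all four parameters are free, with a bounded between 0.0 and 0.7. The data, starting values, and objective are stand-ins for illustration, not ADMB's internal machinery.

  import numpy as np
  from scipy.optimize import minimize

  rng = np.random.default_rng(1)
  age = np.arange(1.0, 16.0)
  size = 55.0 * (1.0 - np.exp(-0.18 * (age - 0.2))) + rng.normal(0, 2, age.size)
  size[[1, 13]] += np.array([15.0, -15.0])      # two deliberate outliers

  def nll(params, a=None):
      # contaminated-normal negative log-likelihood (illustrative form only)
      if a is None:
          Linf, K, t0, a = params
      else:
          Linf, K, t0 = params
      resid = size - Linf * (1.0 - np.exp(-K * (age - t0)))
      sigma = max(np.std(resid), 1e-6)
      def pdf(r, s):
          return np.exp(-0.5 * (r / s) ** 2) / (np.sqrt(2 * np.pi) * s)
      dens = (1 - a) * pdf(resid, sigma) + a * pdf(resid, 3.0 * sigma)
      return -np.sum(np.log(dens))

  # phase 1: a held constant at 0.7, only Linf, K, t0 active
  phase1 = minimize(nll, x0=[1.1 * size.max(), 0.3, 0.0], args=(0.7,))

  # phase 2: a becomes active, bounded between 0.0 and 0.7
  phase2 = minimize(nll, x0=np.append(phase1.x, 0.7),
                    bounds=[(None, None)] * 3 + [(0.0, 0.7)],
                    method="L-BFGS-B")
  print(phase2.x)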

Results for robust nonlinear regression with bad data set


 index         value      std dev       1       2       3       4   
    1   Linf  5.6184e+01 3.6796e+00  1.0000
    2   K     1.6818e-01 3.4527e-02 -0.9173  1.0000
    3   t0    6.5129e-04 4.5620e-01 -0.5483  0.7724  1.0000
    4   a     3.6144e-01 1.0721e-01 -0.1946  0.0367 -0.2095  1.0000
The results for the robust regression fit to the bad data set are shown in figure 3. The estimate for $L_\infty$ is 56.2 with a standard deviation of 3.68, to give a 95% confidence interval of about $(48.8,\ 63.5)$. Both the parameter estimates and the confidence limits are much less affected by the outliers for the robust regression estimates than they are for the least-squares estimates. The parameter a is estimated to be equal to 0.36, which indicates that the robust procedure has detected some moderately large outliers.

The results for the robust regression fit to the good data set are shown in figure 4. The estimates are almost identical to the least-squares estimates for the same data. This is a property of the robust estimates: they do almost as well as the least-squares estimates when the assumption of normally distributed errors in the statistical model is satisfied exactly, and they do much better than least-squares estimates in the presence of moderate or large outliers. You can lose only a little, and you stand to gain a lot, by using these estimators.
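
For readers who want to see this trade-off numerically, the short Python sketch below compares an ordinary least-squares fit with a robustified fit (here scipy's soft_l1 loss, used as a stand-in for ADMB's robust likelihood) on the same simulated growth data with and without two planted outliers. The simulated data and the choice of loss are illustrative assumptions, not part of the original example.

  import numpy as np
  from scipy.optimize import least_squares

  def resid(p, age, size):
      Linf, K, t0 = p
      return size - Linf * (1.0 - np.exp(-K * (age - t0)))

  rng = np.random.default_rng(2)
  age = np.arange(1.0, 16.0)
  clean = 55.0 * (1.0 - np.exp(-0.18 * (age - 0.2))) + rng.normal(0, 2, age.size)
  bad = clean.copy()
  bad[[1, 13]] += np.array([15.0, -15.0])       # plant two outliers

  x0 = [60.0, 0.2, 0.0]
  for label, size in [("good data", clean), ("bad data", bad)]:
      ls = least_squares(resid, x0, args=(age, size))                 # least squares
      rb = least_squares(resid, x0, args=(age, size),
                         loss="soft_l1", f_scale=2.0)                 # robustified
      print(label, "LS:", np.round(ls.x, 2), "robust:", np.round(rb.x, 2))

On the clean data the two fits should be nearly identical; on the contaminated data the robustified fit stays much closer to the parameters used to generate the curve.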

Results for robust nonlinear regression with good data set

 index         value      std dev       1       2       3       4   
    1   Linf  5.5707e+01 1.9178e+00  1.0000
    2   K     1.7896e-01 1.9697e-02 -0.9148  1.0000
    3   t0    2.1490e-01 2.0931e-01 -0.5604  0.7680  1.0000
    4   a     7.0000e-01 3.2246e-05 -0.0001  0.0000 -0.0000  1.0000
