index value std dev 1 2 3 1 Linf 5.4861e+01 2.4704e+00 1.0000 2 K 1.7985e-01 2.7127e-02 -0.9191 1.0000 3 t0 1.8031e-01 2.9549e-01 -0.5856 0.7821 1.0000\medbreak This example estimates the parameters describing a growth curve from a set of data consisting of ages and size-at-age data. The form of the (von Bertalanffy) growth curve is assumed to be
Let and be the observed size and age of the 'th animal. The predicted size is given by equation . The least squares estimates for the parameters are found by minimizing
DATA_SECTION init_int nobs; init_matrix data(1,nobs,1,2) vector age(1,nobs); vector size(1,nobs); PARAMETER_SECTION init_number Linf; init_number K; init_number t0; vector pred_size(1,nobs) objective_function_value f; PRELIMINARY_CALCS_SECTION // get the data out of the columns of the data matrix age=column(data,1); size=column(data,2); Linf=1.1*max(size); // set Linf to 1.1 times the longest observed length PROCEDURE_SECTION pred_size=Linf*(1.-exp(-K*(age-t0))); f=regression(size,pred_size);Notice the use of the regression function which calculates the log-likelihood function of the nonlinear least-squares regression. This part of the code is formally identical to the code for the linear regression problem in the simple example even though we are now doing nonlinear regression. A graph of the least-square estimated growth curve and the observed data is given in figure 1. The parameter estimates and their estimated standard deviations which are produced by AD Model Builder are also given. For example the estimate for is 54.86 with a standard deviation of 2.47. Since a 95% confidence limit is about two standard deviations the usual 95% confidence limit of for this analysis would be .
A disadvantage of least squares regression is the sensitivity of the estimates to a few ``bad'' data points or outliers. Figure 2 show the least squares estimates when the observed size for age 2 and age 14 have been moved off the curve. There has been a rather large change in some of the parameter estimates. For example the estimate for has changed from to and the estimated standard deviation for this parameter has increased to This is a common effect of outliers on least-squares estimates. They greatly increase the size of the estimates of the standard deviations. As a result the confidence limits for the parameters are increased. In this case the 95% confidence limits for have been increased from to .
\medbreak Of course for this simple example it could be argued that a visual examination of the residuals would identify the outliers so that they could be removed. This is true, but in larger nonlinear models it is often not possible or convenient to identify and remove all the outliers in this fashion. Also the process of removing ``inconvenient'' observations from data can be uncomfortably close to ``cooking'' the data in order to obtain the desired result from the analysis. An alternative approach which avoids these difficulties is to employ AD Model Builder's robust regression procedure which removes the undue influence of outlying points without the need to expressly remove them from the data.
Nonlinear regression with bad data set 1pt
Nonlinear regression with bad data set index value std dev 1 2 3 1 Linf 4.8905e+01 5.9938e+00 1.0000 2 K 2.1246e-01 1.2076e-01 -0.8923 1.0000 3 t0 -5.9153e-01 1.4006e+00 -0.6548 0.8707 1.0000
To invoke the robust regression procedure it is necessary to make three changes to the existing code. The template for the robust regression version of the model can be found in the file VONBR.TPL.
DATA_SECTION init_int nobs; init_matrix data(1,nobs,1,2) vector age(1,nobs) vector size(1,nobs) PARAMETER_SECTION init_number Linf init_number K init_number t0 vector pred_size(1,nobs) objective_function_value f init_bounded_number a(0.0,0.7,2) PRELIMINARY_CALCS_SECTION // get the data out of the columns of the data matrix age=column(data,1); size=column(data,2); Linf=1.1*max(size); // set Linf to 1.1 times the longest observed length a=0.7; PROCEDURE_SECTION pred_size=Linf*(1.-exp(-K*(age-t0))); f=robust_regression(size,pred_size,a);The main modification to the model involves the addition of the parameter a, which is used to estimate the amount of contamination by outliers. This parameter is declared in the PARAMETER_SECTION.
init_bounded_number a(0.0,0.7,2)This introduces two concepts, putting bounds on the values which initial parameters can take on and carrying out the minimization in a number of stages. The value of a should be restricted to lie between 0.0 and 0.7 (See the discussion on robust regression in the AUTODIF user's manual if you want to know where the 0.0 and 0.7 come from). This is accomplished by declaring a to be of type init_bounded_number. In general it is not possible to estimate the parameter a determining the amount of contamination by outliers until the other parameters of the model have been ``almost'' estimated, that is, until we have done a preliminary fit of the model. This is a common situation in nonlinear modeling and is discussed further in some later examples. So we want to carry out the minimization in two phases. During the first phase a should be held constant. In general for any initial parameter the last number in its declaration, if present, determines the number of the phase in which that parameter becomes active. If no number is given the parameter becomes active in phase 1. (Note: For an init_bounded_number the upper and lower bounds must be given so the declaration
init_bounded_number a(0.0,0.7)would use the default phase 1. The 2 in the declaration for a causes a to be constant until the second phase of the minimization. The second change to the model involves the default initial value a. The default value for a bounded number is the average of the upper and lower bounds. For a this would be which is too small. We want to use the upper bound of . This is done by adding the line
a=0.7;in the PRELIMINARY_CALCS_SECTION. Finally we modify the statement in the PROCEDURE_SECTION including the regression function to
f=robust_regression(size,pred_size,a);to invoke the robust regression function. That's all there is to it! These three changes will convert any AD Model builder template from a nonlinear regression model to a robust nonlinear regression model.
Robust Nonlinear regression with bad data set 1pt
index value std dev 1 2 3 4 1 Linf 5.6184e+01 3.6796e+00 1.0000 2 K 1.6818e-01 3.4527e-02 -0.9173 1.0000 3 t0 6.5129e-04 4.5620e-01 -0.5483 0.7724 1.0000 4 a 3.6144e-01 1.0721e-01 -0.1946 0.0367 -0.2095 1.0000The results for the robust regression fit to the bad data set are shown in figure 3. The estimate for is with a standard deviation of to give a 95% confidence interval of about . Both the parameter estimates and the confidence limits are much less affected by the outliers for the robust regression estimates than they are for the least squares estimates. The parameter a is estimated to be equal to which indicates that the robust procedure has detected some moderately large outliers.
The results for the robust regression fit to the good data set are shown in figure 4. The estimates are almost identical to the least-square estimates for the same data. This is a property of the robust estimates. They do almost as well as the least-square estimates when the assumption of normally distributed errors in the statistical model is satisfied exactly, and they do much better than least square estimates in the presence of moderate or large outliers. You can lose only a little and you stand to gain a lot by using these estimators.
index value std dev 1 2 3 4 1 Linf 5.5707e+01 1.9178e+00 1.0000 2 K 1.7896e-01 1.9697e-02 -0.9148 1.0000 3 t0 2.1490e-01 2.0931e-01 -0.5604 0.7680 1.0000 4 a 7.0000e-01 3.2246e-05 -0.0001 0.0000 -0.0000 1.0000