Section 16 A fisheries catch-at-age model

This section describes a simple catch-at age model. The data input to this model include estimates of the numbers at age caught by the fishery each year and estimates the fishing effort each year. This example introduces AD Model Builder's ability to automatically calculate profile likelihoods for carrying out Bayesian inference. To cause the profile likelihood calculations to be carried out use the -lprof command line argument.

Let index fishing years and index age classes with . The instantaneous fishing mortality rate is assumed to have the form where is called the catchability, is the observed fishing effort, is an age-dependent effect termed the selectivity, and the are deviations from the expected relationship between the observed fishing effort and the resulting fishing mortality. The are assumed to be normally distributed with mean 0. The instantaneous natural mortality rate is assumed to be independent of year and age class. It is not estimated in this version of the model. The instantaneous total mortality rate is given by . The survival rate is given by . The number of age class fish in the population in year is denoted by . The relationship is assumed to hold. Note that using this relationship if one knows then all the can be calculated from knowledge of the initial population in year , and knowledge of the recruitment in each year .

The purpose of the model is to estimate quantities of interest to managers such as the population size and exploitation rates and to make projections about the population. In particular we can get an estimate of the numbers of fish in the population in year for age classes 2 or greater from the relationship . If we have estimates for the mean weight at age , then the projected biomass level of age class fish for year can be computed from the relationship .

Besides getting a point estimate for quantities of interest like we also want to get an idea of how well determined the estimate is. AD Model Builder has completely automated the process of deriving good confidence limits for these parameters in a Bayesian context. One simply needs to declare the parameter to be of type likeprof_number. The results are given in the section on Bayesian inference.

The code for the catch-at-age model is:

  // the number of years of data
  init_int nyrs  
  // the number of age classes in the population
  init_int nages
  // the catch-at-age data
  init_matrix obs_catch_at_age(1,nyrs,1,nages)
  //estimates of fishing effort
  init_vector effort(1,nyrs)
  // natural mortality rate
  init_number M
  // need to have relative weight at age to calculate biomass of 2+
  vector relwt(2,nages)
  log_q -1 
  log_P 5
  init_number log_q(1)   // log of the catchability
  init_number log_P(1)  // overall population scaling parameter
  init_bounded_dev_vector log_sel_coff(1,nages-1,-15.,15.,2)
  init_bounded_dev_vector log_relpop(1,nyrs+nages-1,-15.,15.,2)
  init_bounded_dev_vector effort_devs(1,nyrs,-5.,5.,3)
  vector log_sel(1,nages)
  vector log_initpop(1,nyrs+nages-1);
  matrix F(1,nyrs,1,nages)   // the instantaneous fishing mortality
  matrix Z(1,nyrs,1,nages)   // the instantaneous total mortality
  matrix S(1,nyrs,1,nages)   // the survival rate
  matrix N(1,nyrs,1,nages)   // the predicted numbers at age
  matrix C(1,nyrs,1,nages)   // the predicted catch at age 
  objective_function_value f
  sdreport_number avg_F
  sdreport_vector predicted_N(2,nages)
  sdreport_vector ratio_N(2,nages)
  likeprof_number pred_B
  // this is just to invent some relative average
  // weight numbers
  // example of using FUNCTION to structure the procedure section




FUNCTION get_mortality_and_survival_rates // calculate the selectivity from the sel_coffs for (int j=1;j<nages;j++) { log_sel(j)=log_sel_coff(j); } // the selectivity is the same for the last two age classes log_sel(nages)=log_sel_coff(nages-1);

// This is the same as F(i,j)=exp(log_q)*effort(i)*exp(log_sel(j)); F=outer_prod(mfexp(log_q)*effort,mfexp(log_sel)); if (active(effort_devs)) { for (int i=1;i<=nyrs;i++) { F(i)=F(i)*exp(effort_devs(i)); } } // get the total mortality Z=F+M; // get the survival rate S=mfexp(-1.0*Z);

FUNCTION get_numbers_at_age log_initpop=log_relpop+log_P; for (int i=1;i<=nyrs;i++) { N(i,1)=mfexp(log_initpop(i)); } for (int j=2;j<=nages;j++) { N(1,j)=mfexp(log_initpop(nyrs+j-1)); } for (i=1;i<nyrs;i++) { for (j=1;j<nages;j++) { N(i+1,j+1)=N(i,j)*S(i,j); } } // calculated predicted numbers at age for next year for (j=1;j<nages;j++) { predicted_N(j+1)=N(nyrs,j)*S(nyrs,j); ratio_N(j+1)=predicted_N(j+1)/N(1,j+1); } // calculate predicted biomass for profile // likelihood report pred_B=predicted_N *relwt;

FUNCTION get_catch_at_age C=elem_prod(elem_div(F,Z),elem_prod(1.-S,N));

FUNCTION evaluate_the_objective_function // penalty functions to ``regularize '' the solution f+=.01*norm2(log_relpop); avg_F=sum(F)/double(size_count(F)); if (last_phase()) { // a very small penalty on the average fishing mortality f+= .001*square(log(avg_F/.2)); } else { // use a large penalty during the initial phases to keep the // fishing mortality high f+= 1000.*square(log(avg_F/.2)); } // errors in variables type objective function with errors in // the catch at age and errors in the effort fishing mortality // relationship if (active(effort_devs) { // only include the effort_devs in the objective function if // they are active parameters f+=0.5*double(size_count(C)+size_count(effort_devs)) * log( sum(elem_div(square(C-obs_catch_at_age),.01+C)) + 0.1*norm2(effort_devs)); } else { // objective function without the effort_devs f+=0.5*double(size_count(C)) * log( sum(elem_div(square(C-obs_catch_at_age),.01+C))); } REPORT_SECTION report << "Estimated numbers of fish " << endl; report << N << endl; report << "Estimated numbers in catch " << endl; report << C << endl; report << "Observed numbers in catch " << endl; report << obs_catch_at_age << endl; report << "Estimated fishing mortality " << endl; report << F << endl;

This model employs several instances of the init_bounded_dev_vector type. This type consists of a vector of numbers which sum to 0, that is they are deviations from a common mean, and are bounded. For example the quantities log_relpop are used to parameterize the logarithm of the variations in year class strength of the fish population. Putting bounds on the magnitude of the deviations helps to improve the stability of the model. The bounds are from -15.0 to 15.0 which gives the estimates of relative year class strength a dynamic range of .

The FUNCTION keyword has been employed a number of times in the PARAMETER_SECTION to help structure the code. A function is defined simply by using the FUNCTION keyword followed by the name of the function.

FUNCTION get_mortality_and_survival_rates
Don't include the parentheses or semicolon here. To use the function simply write its name in the procedure section.

You must include the parentheses and the semicolon here.

The REPORT_SECTION shows how to generate a report for an AD Model Builder program. The default report generating machinery utilizes the C++ stream formalism. You don't need to know much about streams to make a report, but a few comments are in order. The stream formalism associates stream object with a file. In this case the stream object associated with the AD Model Builder report file is report. To write an object xxx into the report file you insert the line

  report << xxx;
into the REPORT_SECTION. If you want to skip to a new line after writing the object you can include the stream manipulator endl as in

  report << "Estimated numbers of fish " << endl;
Notice that the stream operations know about common C objects such as strings, so that it is a simple matter to put comments or labels into the report file.

Section 17 Bayesian inference and the profile likelihood

AD Model Builder enables one to quickly build models with large numbers of parameters -- this is especially useful for employing Bayesian analysis. Traditionally however it has been difficult to interpret the results of analysis using such models. In a Bayesian context the results are represented by the posterior probability distribution for the model parameters. To get exact results from the posterior distribution it is necessary to evaluate integrals over large dimensional spaces and this can be computationally intractable. AD Model Builder provides an approximations to these integrals in the form of the profile likelihood. The profile likelihood can be used to estimates for extreme values (such as estimating a value so that for a parameter the probability that or the probability that ) for any model parameter. To use this facility simply declare the parameter of interest to be of type likeprof_number in the PARAMETER_SECTION and assign the correct value to the parameter in the PROCEDURE SECTION.

The code for the catch at age model estimates the profile likelihood for the projected biomass of age class 2+ fish. (Age class 2+ has been used to avoid the extra problem of dealing with the uncertainty of the recruitment of age class 1 fish). As a typical application of the method, the user of the model can estimate the probability that the biomass of fish for next year will be larger or smaller than a certain value. Estimates like these are obviously of great interest to managers of natural resources.

The profile likelihood report for a variable is in a file with the same name as the variable (truncated to eight letters, if necessary, with the suffix .PLT appended). For this example the report is in the file PRED_B.PLT. Part of the file is shown here.

Profile likelihood
 -1411.23 1.1604e-09
 -1250.5 1.71005e-09
 -1154.06 2.22411e-09
  ...................      // skip some here
 278.258 2.79633e-05
 324.632 5.28205e-05
 388.923 6.89413e-05
 453.214 8.84641e-05
 517.505 0.0001116
 581.796 0.000138412
 1289 0.000482459
 1353.29 0.000494449
 1417.58 0.000503261
 1481.87 0.000508715
 1546.16 0.0005107
 1610.45 0.000509175
 1674.74 0.000504171
 1739.03 0.000490788
 1803.32 0.000476089
 1867.61 0.000460214
 1931.91 0.000443313
 1996.2 0.000425539
 2060.49 0.000407049
 2124.78 0.000388
 2189.07 0.00036855
 4503.55 2.27712e-05
 4599.98 2.00312e-05
 4760.71 1.48842e-05
 4921.44 1.07058e-05
 5082.16 7.45383e-06
 6528.71 6.82689e-07
 6689.44 6.91085e-07
 6850.17 7.3193e-07
Minimum width confidence limits:
        significance level  lower bound  upper bound
               0.90             572.537     3153.43
               0.95             453.214     3467.07
               0.975            347.024     3667.76

One sided confidence limits for the profile likelihood:

The probability is 0.9 that pred_B is greater than 943.214 The probability is 0.95 that pred_B is greater than 750.503 The probability is 0.975 that pred_B is greater than 602.507

The probability is 0.9 that pred_B is less than 3173.97 The probability is 0.95 that pred_B is less than 3682.75 The probability is 0.975 that pred_B is less than 4199.03

The file contains the probability density function and the approximate confidence limits for the profile likelihood, the profile likelihood and the normal approximation. Since the format is the same for all three, we only discuss the profile likelihood here. The first part of the report contains pairs of numbers which consist of values of the parameter in the report (in this case PRED_B and the estimated value for the probability density associated with that parameter at the point. The probability that the parameter lies in the interval where can be estimated from the sum

The reports of the one and two sided confidence limits for the parameter were produced this way. Also a plot of verses gives the user an indication of what the probability distribution of the parameter looks like.

The the profile likelihood indicates the fact that the biomass can not be less than zero. The normal approximation is not very useful for calculating the probability that the biomass is very low -- a question of great interest to managers who are probably not going to be impressed by the knowledge that there is an estimated probability of 0.975 that the biomass is greater than -52.660.

One sided confidence limits for the normal approximation

The probability is 0.9 that pred_B is greater than 551.235 The probability is 0.95 that pred_B is greater than 202.374 The probability is 0.975 that pred_B is greater than -52.660

Section 18 Modifying the approximation of the profile likelihood

The functions set_stepnumber() and set_stepsize() can be used to modify the number of points used to approximate the profile likelihood or to change the stepsize between the points. This can be carried out in the PRELIMINARY_CALCS_SECTION. If u has been declared to be of type likeprof_number

  u.set_stepnumber(10);   // default value is 8
  u.set_stepsize(0.2);    // default value is 0.5
set_stepnumber option set_stepsize option will set the number of steps equal to 21 (from -10 to 10) and will set the step size equal to 0.2 times the estimated standard deviation for the parameter u.

Section 19 Changing the default file names for data and parameter input

adding a line of the users code The following code fragment illustrates how the files used for input of the data and parameter values can be changed. This code has been taken from the example catage.tpl and modified. In the DATA_SECTION, the data are first read in from the file catch.dat. Then the effort data are read in from the file effort.dat. The remainder of the data are read in from the file catch.dat. It is necessary to save the current file position in an object of type streampos. This object is used to position the file properly. The keyword !!USER_CODE can be used to include one line of the users's code into the DATA_SECTION or PARAMETER_SECTION. It is more compact than the LOCAL_CALCS construction.

 // will read data from file catchdat.dat
 !!USER_CODE ad_comm::change_datafile_name("catchdat.dat");
  init_int nyrs  
  init_int nages
  init_matrix obs_catch_at_age(1,nyrs,1,nages)
 // now read the effort data from the file effort.dat and save the current
 // file position in catchdat.dat in the object tmp
 !!USER_CODE streampos tmp = ad_comm::change_datafile_name("effort.dat");
  init_vector effort(1,nyrs)
 // now read the rest of the data from the file catchdat.dat 
 // including the ioption argument tmp will reset the file to that position
 !!USER_CODE ad_comm::change_datafile_name("catchdat.dat",tmp);
  init_number M

// ....

PARAMETER_SECTION // will read parameters from file catch.par !!USER_CODE ad_comm::change_parfile_name("catch.par");

Section 20 Using the subvector operation to avoid writing loops

If v is a vector object then for integers l and u the expression v(l,u) is a vector object of the same type with minimum valid index l and maximum valid index u (Of course l and u must be within the valid index range for v and l must be less than or equal to u. The subvector formed by this operation ican be used on both sides of the equals sign in an arithmetic expression. The number of loops which must be written can be significantly reduced in this manner. We shall use the subvector operator to remove some of the loops in the catch-at-age model code.

  // calculate the selectivity from the sel_coffs
  for (int j=1;j<nages;j++)
  // the selectivity is the same for the last two age classes

// same code using the subvector operation log_sel(1,nage-1)=log_sel_coff; // the selectivity is the same for the last two age classes log_sel(nages)=log_sel_coff(nages-1);

Notice that log_sel(1,nage-1) is not a distinct vector from log_sel. This means that an assignment to log_sel(1,nage-1) is an assignment to a part of log_sel. The next example is a bit more complicated. It involves taking a row of a matrix, to form a vector, forming a subvector, and changing the valid index range for the vector. \bestbreak

  // loop form of the code
  for (i=1;i<nyrs;i++)
    for (j=1;j<nages;j++)

// can only eliminate the inside loop for (i=1;i<nyrs;i++) { // ++ increments the index bounds by 1 N(i+1)(2,nyrs)=++elem_prod(N(i)(1,nage-1),S(i)(1,nage-1)); }

Notice that N(i+1) is a vector object so that N(i+1)(2,nyrs) is a subvector of N(i). Another point is that elem_prod(N(i)(1,nage-1),S(i)(1,nage-1)) is a vector object with minimum valid index 1 and maximum valid index nyrs-1. The operator ++ applied to a subvector increments the valid index range by 1 so that it has the same range of valid index values as N(i+1)(2,nyrs). The operator -- would decrement the valid index range by 1.

Section 21 The use of higher dimensional arrays

The example contained in the file FOURD.TPL illustrates some aspects of the use of three and four dimensional arrays. There are now examples of the use of arrays up to dimension 7 in the documentation.
  init_4darray d4(1,2,1,2,1,3,1,3)
  init_3darray d3(1,2,1,3,1,3)
  init_matrix M(1,3,1,3) 
  4darray p4(1,2,1,2,1,3,2,3)
  objective_function_value f
 for (int i=1;i<=3;i++)
   M(i,i)=1;   // set M equal to the identity matrix to start
 for (int i=1;i<=2;i++)
   for (int j=1;j<=2;j++)
     // d4(i,j) is a 3x3 matrix -- d3(i) is a 3x3 matrix 
     // d4(i,j)*M is matrix multiplication -- inv(M) is matrix inverse
     f+= norm2( d4(i,j)*M + d3(i)+ inv(M) );
  report << "Printout of a 4 dimensional array" << endl << endl;
  report << d4 << endl << endl;
  report << "Printout of a 3 dimensional array" << endl << endl;
  report << d3 << endl << endl;
In the DATA_SECTION you can use 3darrays, 4darrays, init_3darrays, init_4darrays. In the PARAMETER_SECTION you can use 3darrays, 4darrays, but at the time of writing init_3darrays, init_4darrays had not been implemented for the PARAMETER_SECTION.

If d4 is a 4darray then d4(i) is a three dimensional array and d4(i,j) is a matrix object so that d4(i,j)*M is matrix multiplication. Similarly if d3 is a 3darray then d3(i) is a matrix object so that d4(i,j)*M + d3(i) +inv(M) combines matrix multiplication, matrix inversion, and matrix addition.