Quantal is a set of routines designed to estimate some "common models", for example probit, logit, multinomial logit, ordered probit, and Poisson regression, usually by maximum likelihood.
Given the availability of Maxlik, it is often easiest to write one's own routines for the analyses found in Quantal, and the talk accordingly focused on Maxlik. Again, because of the compactness of GAUSS, the basic steps of a quasi-Newton procedure are easy to program. However, Maxlik does provide a considerable choice of algorithms, and may be worth using for this reason.
The newest version of Maxlik, which provides for constrained optimisation, is not yet available. Maxlik assumes that the likelihood is a sum of identical contributions, one from each observation. The user needs to provide a procedure to calculate the likelihood either for a single observation or for a vector of observations. The default is to use numerical derivatives, but the calculation is speeded up, sometimes considerably, if the user provides a routine which computes the gradient matrix and, optionally, a routine which computes the Hessian.
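As a hedged illustration of what the user supplies (the exact calling conventions are set out in the Maxlik manual; the probit example and the names lpr, b and z are assumptions made here, not part of the package), a procedure returning one log-likelihood contribution per observation might look like:

@ probit log-likelihood contributions, one row per observation          @
@ b: parameter vector; z: data matrix with the dependent variable first @
proc lpr(b,z);
    local y, x, p;
    y = z[.,1];                        @ binary dependent variable @
    x = z[.,2:cols(z)];                @ regressors @
    p = cdfn(x*b);                     @ probit probabilities @
    retp(y.*ln(p) + (1-y).*ln(1-p));   @ vector of contributions @
endp;

Maxlik is then called with a pointer to this procedure and a vector of starting values; the manual gives the full argument list.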
The algorithms used are extensively discussed in, for example, Press et al. (1986). They are all quasi-Newton methods, using an iteration of the general form below, where Theta is the current approximation to the parameter vector at the maximum, d the current direction of search, g the gradient vector, and l the step length:
Theta = Theta + l d
d = H g
Maxlik provides a choice of different approximations to the Hessian H and of the step length l. For the Hessian, and thus the direction, the selection includes BFGS (Broyden, Fletcher, Goldfarb and Shanno) and BHHH (Berndt, Hall, Hall, and Hausman).
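To give the flavour of these updates, the textbook BFGS update of the inverse-Hessian approximation (in the minimisation form discussed in Press et al.; maximising a likelihood amounts to minimising its negative) can be coded in a couple of lines. This is a sketch of the standard formula rather than Maxlik's own code; s, u, H and k are assumed to hold the step just taken, the change in the gradient, the current approximation and the number of parameters:

rho = 1/(u'*s);                                                    @ scalar 1/(u's) @
H   = (eye(k) - rho*(s*u'))*H*(eye(k) - rho*(u*s')) + rho*(s*s');  @ updated inverse-Hessian approximation @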
The iteration stops when the elasticities of the function with respect to the parameters are less than a user-chosen tolerance. This is implemented by first replacing any parameter less than unity in absolute value by unity; the function value is treated in the same way. The use of elasticities rather than derivatives is suggested in Dennis and Schnabel (1983), but is unusual. As the source code is accessible, one can modify the criterion if this is thought desirable. Davidson and MacKinnon (1993), Section 6.8, and the references therein are informative.
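A minimal sketch of the elasticity criterion just described (not Maxlik's actual code; theta, g, f, k and tol are assumed to hold the current parameters, gradient, function value, number of parameters and tolerance):

bb   = maxc(abs(theta)'|ones(1,k));   @ element-wise max(|theta_i|, 1) @
ff   = maxc(abs(f)|1);                @ max(|f|, 1) @
elas = abs(g).*bb./ff;                @ elasticity-style scaled gradient @
conv = maxc(elas) < tol;              @ 1 if every elasticity is below tol @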
The speaker suggested that the user's likelihood routine be tested with simulated data and thus known parameter values. If an analytic derivative routine is also written, it can be tested using the maxgrd utility provided.
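For instance, data for the illustrative probit procedure sketched earlier could be simulated along the following lines (sizes, names and parameter values are again assumptions); the estimates returned by Maxlik can then be checked against btrue:

n     = 1000;
btrue = 0.5|(-1);                      @ known parameter values @
x     = ones(n,1)~rndn(n,1);           @ constant and one regressor @
y     = (x*btrue + rndn(n,1)) .> 0;    @ latent index plus N(0,1) error @
z     = y~x;                           @ data matrix in the form lpr expects @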
The problem of calculating the gradient analytically may be simplified if the derivative matrix of likelihood contributions (rows) with respect to parameters (columns) has a simple structure. For example, for the Tobit model, this matrix is the matrix of regressor variables with rows weighted by a certain vector. In problems of this form, one could numerically approximate the first column, that is, the derivatives with respect to the first parameter, and obtain the rest of the matrix of derivatives, D say, by scaling this column. The outer product approximation for the Hessian is then just D'D, which can also be used. An application of this to the Tobit model was given.
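In GAUSS terms the structure amounts to something like the following sketch, where w is the model-specific weight vector (its Tobit form was given in the talk and is not reproduced here) and x the regressor matrix:

D   = w.*x;        @ n x k derivative matrix: rows of x weighted by w @
Hop = D'*D;        @ outer-product approximation to the Hessian @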
A preference was expressed for BFGS for the Hessian/direction choice and the option STEPBT for the step length, switching from BFGS to BHHH at a later stage. There are facilities for switching at a point chosen by the user, and, as an alternative, for interactive control of the iterative process.
As well as application modules like Maxlik and Optmum, there are third-party products, such as Coint, and GAUSSX, which is for econometric analysis. EMCM (Estimation of Multinomial Choice Models) is described in Weeks (1995). One of the advantages of GAUSS is its flexibility, sufficient to handle a wide variety of problems. Another is that, as a matrix programming language, it corresponds closely to the underlying `algebraic form'. b = inv(X'X)*(X'y); is a valid GAUSS statement calculating OLS coefficients. In general, the distance from algebra to working GAUSS code is very short. GAUSS is widely and increasingly used. Judge et al. (1989) uses GAUSS to work through examples and exercises, and forms a useful GAUSS primer.
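To illustrate how short that distance is, the statement above extends naturally to residuals and standard errors. The following is only a sketch, assuming X is the n x k regressor matrix and y the n x 1 dependent variable:

b  = inv(X'X)*(X'y);               @ OLS coefficients @
e  = y - X*b;                      @ residuals @
s2 = e'e/(rows(X)-cols(X));        @ residual variance @
se = sqrt(diag(s2*inv(X'X)));      @ coefficient standard errors @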
A disadvantage is that one needs GAUSS to run the resulting programs. Authors can distribute their programs in compiled form, for which users will not need the source or run-time libraries, but will need the basic GAUSS language.
The speed of GAUSS is very dependent on the degree to which a calculation is vectorised. For example, a vector ans coded 10, 0 and -10 could be recoded as 1, 2 or 3 in a vector x either by
x = (ans .== 10) + 2*(ans .== 0) + 3*(ans .== -10);
or by using a do-loop. For a vector of length 100 the latter would take 36 times longer. On a 33 MHz 486, the vectorised version takes 0.123 seconds for a vector of length 8,100.
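The loop alternative might look like the following sketch (variable names as above):

x = zeros(rows(ans),1);
i = 1;
do while i <= rows(ans);
    if ans[i] == 10;
        x[i] = 1;
    elseif ans[i] == 0;
        x[i] = 2;
    else;
        x[i] = 3;
    endif;
    i = i + 1;
endo;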
Graphics in GAUSS is adequate, but not nearly as well integrated as, for example, in S-Plus. Development time for the computationally intensive parts of programs is very short compared with, say, C++ or FORTRAN. However, the file handling is, again, adequate but not highly developed, and writing a menu-driven user interface is very laborious. The easiest way to distribute software is as routines plus a driver program or parameter file which users edit for their own data sets and applications.
Whether to use GAUSS clearly depends on the opportunity costs and the availability of assistance. Help is available from ProGAMMA, a not-for-profit cooperation between seven Dutch and Belgian universities.
The unit root and cointegration estimation and testing routines provide Johansen, Stock and Watson, Park, Saikkonen, and Dickey procedures, as well as those associated with P.C.B. Phillips and his co-authors (Ouliaris, Kitamura, Hansen et al.). The routines do the required calculations, leaving the user to display and label the results, which are described in the manual. Routines are provided to calculate critical values or tail-area probabilities where necessary, and these are in general also returned by the routine calculating the statistic(s) required. One omission is p-values for the Hansen stability tests.
The 12 main ARMA routines provide for various combinations of information criterion and estimation method, with graphs of the criterion `surface' for various p and q. The 8 spectral routines estimate the spectrum either by using one of the ten kernels available, or by estimating an ARMA model from which the spectrum is calculated. Smith and Smith (1995) observe that legends can sometimes obscure the main curves on a graph. It is possible to alter the graphs oneself, if one is willing to edit the source and is familiar with GAUSS's graphical facilities.
The Bayes routines provide the posterior densities, with graphs, of the long-run autoregressive coefficient, using Jeffreys' and uniform priors, and the analysis of the residuals of cointegrating regressions, with the addition of Phillips' (1992) e-prior.
GAUSS error messages are not as helpful as they might be. Typing "==" for "=" can result in a request to contact the manufacturer! For this reason, users of COINT, or any other set of GAUSS routines, probably need access to an experienced GAUSS user; given that, it does not matter much that the routines are not user-proof. Omitting a transpose when calling spwxgrf, which calculates spectra using 5 different kernels and graphs the results, gives an error in polyeval, the GAUSS polynomial evaluation routine. An inexperienced user might find this confusing, but familiarity with the language leads to rapid identification of the error. The availability of the source enables users to modify the code themselves, for example for incorporation in Monte Carlo analysis.
COINT represents a considerable stock of intellectual capital, and is a worthwhile investment for any user needing to program or use the routines not available elsewhere. There are rough edges: the routines are not all constructed in the same way, and the manual has some minor slips. These are mostly in parts of examples provided for several different routines, and are easily corrected. The earlier routines also have example files provided.
While it does appear possible to trick one spectral routine into providing negative estimates if the data is sufficiently inappropriate, this package should be very useful in at least moderately experienced hands.
REFERENCES
R.J. O'Brien
University of Southampton
Acknowledgement: My thanks are due to the first two presenters, for providing their slides. I have freely adapted the material, and am solely responsible for any errors.