Use R!

Advisors: Robert Gentleman, Kurt Hornik, Giovanni Parmigiani

For other titles published in this series, go to http://www.springer.com/series/6991

Paul S.P. Cowpertwait · Andrew V. Metcalfe

Introductory Time Series with R

Paul S.P. Cowpertwait
Inst. Information and Mathematical Sciences
Massey University, Albany Campus
Auckland, New Zealand
p.s.cowpertwait@massey.ac.nz

Andrew V. Metcalfe
School of Mathematical Sciences
University of Adelaide
Adelaide SA 5005, Australia
andrew.metcalfe@adelaide.edu.au

Series Editors:
Robert Gentleman, Program in Computational Biology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue, N. M2-B876, Seattle, Washington 98109, USA
Kurt Hornik, Department of Statistik and Mathematik, Wirtschaftsuniversität Wien, Augasse 2-6, A-1090 Wien, Austria
Giovanni Parmigiani, The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, 550 North Broadway, Baltimore, MD 21205-2011, USA

ISBN 978-0-387-88697-8    e-ISBN 978-0-387-88698-5
DOI 10.1007/978-0-387-88698-5
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009928496

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

In memory of Ian Cowpertwait

Preface

R has a command line interface that offers considerable advantages over menu systems in terms of efficiency and speed once the commands are known and the language understood. However, the command line system can be daunting for the first-time user, so there is a need for concise texts to enable the student or analyst to make progress with R in their area of study. This book aims to fulfil that need in the area of time series to enable the non-specialist to progress, at a fairly quick pace, to a level where they can confidently apply a range of time series methods to a variety of data sets. The book assumes the reader has a knowledge typical of a first-year university statistics course and is based around lecture notes from a range of time series courses that we have taught over the last twenty years.
Some of this material has been delivered to postgraduate finance students during a concentrated six-week course and was well received, so a selection of the material could be mastered in a concentrated course, although in general it would be more suited to being spread over a complete semester.

The book is based around practical applications and generally follows a similar format for each time series model being studied. First, there is an introductory motivational section that describes practical reasons why the model may be needed. Second, the model is described and defined in mathematical notation. The model is then used to simulate synthetic data using R code that closely reflects the model definition and then fitted to the synthetic data to recover the underlying model parameters. Finally, the model is fitted to an example historical data set and appropriate diagnostic plots given. By using R, the whole procedure can be reproduced by the reader, and it is recommended that students work through most of the examples.¹ Mathematical derivations are provided in separate frames and starred sections and can be omitted by those wanting to progress quickly to practical applications. At the end of each chapter, a concise summary of the R commands that were used is given, followed by exercises. All data sets used in the book, and solutions to the odd-numbered exercises, are available on the website http://www.massey.ac.nz/~pscowper/ts.

¹ We used the R package Sweave to ensure that, in general, your code will produce the same output as ours. However, for stylistic reasons we sometimes edited our code; e.g., for the plots there will sometimes be minor differences between those generated by the code in the text and those shown in the actual figures.

We thank John Kimmel of Springer and the anonymous referees for their helpful guidance and suggestions, Brian Webby for careful reading of the text and valuable comments, and John Xie for useful comments on an earlier draft. The Institute of Information and Mathematical Sciences at Massey University and the School of Mathematical Sciences, University of Adelaide, are acknowledged for support and funding that made our collaboration possible. Paul thanks his wife, Sarah, for her continual encouragement and support during the writing of this book, and our son, Daniel, and daughters, Lydia and Louise, for the joy they bring to our lives. Andrew thanks Natalie for providing inspiration and her enthusiasm for the project.

Paul Cowpertwait and Andrew Metcalfe
Massey University, Auckland, New Zealand
University of Adelaide, Australia
December 2008

Contents

Preface
1 Time Series Data
   1.1 Purpose
   1.2 Time series
   1.3 R language
   1.4 Plots, trends, and seasonal variation
   1.4.1 A flying start: Air passenger bookings
   1.4.2 Unemployment: Maine
   1.4.3 Multiple time series: Electricity, beer and chocolate data
   1.4.4 Quarterly exchange rate: GBP to NZ dollar
   1.4.5 Global temperature series
   1.5 Decomposition of series
   1.5.1 Notation
   1.5.2 Models
   1.5.3 Estimating trends and seasonal effects
   1.5.4 Smoothing
   1.5.5 Decomposition in R
   1.6 Summary of commands used in examples
   1.7 Exercises
2 Correlation
   2.1 Purpose
   2.2 Expectation and the ensemble
   2.2.1 Expected value
   2.2.2 The ensemble and stationarity
   2.2.3 Ergodic series*
   2.2.4 Variance function

The term filtering is also used for smoothing, particularly in the engineering literature. A more specific use of the term filtering is the process of obtaining the best estimate of some variable now, given the latest measurement of it and past measurements. The measurements are subject to random error and are described as being corrupted by noise. Filtering is an important part of control algorithms which have a myriad of applications. An exotic example is the Huygens probe leaving the Cassini orbiter to land on Saturn's largest moon, Titan, on January 14, 2005.

1.5.5 Decomposition in R

In R, the function decompose estimates trends and seasonal effects using a moving average method. Nesting the function within plot (e.g., using plot(stl())) produces a single figure showing the original series x_t and the decomposed series m_t, s_t, and z_t. For example, with the electricity data, additive and multiplicative decomposition plots are given by the commands below; the last plot, which uses lty to give different line types, is the superposition of the seasonal effect on the trend (Fig. 1.13).

Fig. 1.13.
Electricity production data: trend with superimposed multiplicative seasonal effects.

> plot(decompose(Elec.ts))
> Elec.decom <- decompose(Elec.ts, type = "mult")
> plot(Elec.decom)
> Trend <- Elec.decom$trend
> Seasonal <- Elec.decom$seasonal
> ts.plot(cbind(Trend, Trend * Seasonal), lty = 1:2)

Fig. 1.14. Decomposition of the electricity production data.

In this example, the multiplicative model would seem more appropriate than the additive model because the variance of the original series and trend increase with time (Fig. 1.14). However, the random component, which corresponds to z_t, also has an increasing variance, which indicates that a log-transformation (Equation (1.4)) may be more appropriate for this series (Fig. 1.14). The random series obtained from the decompose function is not precisely a realisation of the random process z_t but rather an estimate of that realisation. It is an estimate because it is obtained from the original time series using estimates of the trend and seasonal effects. This estimate of the realisation of the random process is a residual error series. However, we treat it as a realisation of the random process.

There are many other reasonable methods for decomposing time series, and we cover some of these in Chapter 5 when we study regression methods.

1.6 Summary of commands used in examples

read.table     reads data into a data frame
attach         makes names of column variables available
ts             produces a time series object
aggregate      creates an aggregated series
ts.plot        produces a time plot for one or more series
window         extracts a subset of a time series
time           extracts the time from a time series object
ts.intersect   creates the intersection of one or more time series
cycle          returns the season for each value in a series
decompose      decomposes a series into the components trend, seasonal effect, and residual
stl            decomposes a series using loess smoothing
summary        summarises an R object

1.7 Exercises

1. Carry out the following exploratory time series analysis in R using either the chocolate or the beer production data from §1.4.3.
   a) Produce a time plot of the data. Plot the aggregated annual series and a boxplot that summarises the observed values for each season, and comment on the plots.
   b) Decompose the series into the components trend, seasonal effect, and residuals, and plot the decomposed series. Produce a plot of the trend with a superimposed seasonal effect.

2. Many economic time series are based on indices. A price index is the ratio of the cost of a basket of goods now to its cost in some base year. In the Laspeyre formulation, the basket is based on typical purchases in the base year. You are asked to calculate an index of motoring cost from the following data.
The clutch represents all mechanical parts, and the quantity allows for this.

   item             quantity '00 (q_i0)   unit price '00 (p_i0)   quantity '04 (q_it)   unit price '04 (p_it)
   car              0.33                  18 000                  0.5                   20 000
   petrol (litre)   2 000                 0.80                    1 500                 1.60
   servicing (h)    40                    40                      20                    60
   tyre             3                     80                      2                     120
   clutch           2                     200                     1                     360

   The Laspeyre Price Index at time t relative to base year 0 is

      LI_t = Σ q_i0 p_it / Σ q_i0 p_i0

   Calculate the LI_t for 2004 relative to 2000.

3. The Paasche Price Index at time t relative to base year 0 is

      PI_t = Σ q_it p_it / Σ q_it p_i0

   a) Use the data above to calculate the PI_t for 2004 relative to 2000.
   b) Explain why the PI_t is usually lower than the LI_t.
   c) Calculate the Irving-Fisher Price Index as the geometric mean of LI_t and PI_t. (The geometric mean of a sample of n items is the nth root of their product.)

4. A standard procedure for finding an approximate mean and variance of a function of a variable is to use a Taylor expansion for the function about the mean of the variable. Suppose the variable is y and that its mean and standard deviation are µ and σ respectively.

      φ(y) = φ(µ) + φ′(µ)(y − µ) + φ″(µ)(y − µ)²/2! + φ‴(µ)(y − µ)³/3! + ...

   Consider the case of φ(·) being e^(·). By taking the expectation of both sides of this equation, explain why the bias correction factor given in Equation (1.5) is an overcorrection if the residual series has a negative skewness, where the skewness γ of a random variable y is defined by

      γ = E[(y − µ)³] / σ³

2 Correlation

2.1 Purpose

Once we have identified any trend and seasonal effects, we can deseasonalise the time series and remove the trend. If we use the additive decomposition method of §1.5, we first calculate the seasonally adjusted time series and then remove the trend by subtraction. This leaves the random component, but the random component is not necessarily well modelled by independent random variables. In many cases, consecutive variables will be correlated. If we identify such correlations, we can improve our forecasts, quite dramatically if the correlations are high. We also need to estimate correlations if we are to generate realistic time series for simulations. The correlation structure of a time series model is defined by the correlation function, and we estimate this from the observed time series.

Plots of serial correlation (the 'correlogram', defined later) are also used extensively in signal processing applications. The paradigm is an underlying deterministic signal corrupted by noise. Signals from yachts, ships, aeroplanes, and space exploration vehicles are examples. At the beginning of 2007, NASA's twin Voyager spacecraft were sending back radio signals from the frontier of our solar system, including evidence of hollows in the turbulent zone near the edge.

2.2 Expectation and the ensemble

2.2.1 Expected value

The expected value, commonly abbreviated to expectation, E, of a variable, or a function of a variable, is its mean value in a population.
So E(x) is the mean of x, denoted µ,¹ and E[(x − µ)²] is the mean of the squared deviations about µ, better known as the variance σ² of x.² The standard deviation, σ, is the square root of the variance. If there are two variables (x, y), the variance may be generalised to the covariance, γ(x, y). Covariance is defined by

   γ(x, y) = E[(x − µ_x)(y − µ_y)]    (2.1)

The covariance is a measure of linear association between two variables (x, y). In §1.4.3, we emphasised that a linear association between variables does not imply causality.

¹ A more formal definition of the expectation E of a function φ(x, y) of continuous random variables x and y, with a joint probability density function f(x, y), is the mean value for φ obtained by integrating over all possible values of x and y:

   E[φ(x, y)] = ∫_y ∫_x φ(x, y) f(x, y) dx dy

Note that the mean of x is obtained as the special case φ(x, y) = x.

² For more than one variable, subscripts can be used to distinguish between the properties; e.g., for the means we may write µ_x and µ_y to distinguish between the mean of x and the mean of y.

Sample estimates are obtained by adding the appropriate function of the individual data values and division by n or, in the case of variance and covariance, n − 1, to give unbiased estimators.³ For example, if we have n data pairs, (x_i, y_i), the sample covariance is given by

   Cov(x, y) = Σ (x_i − x̄)(y_i − ȳ) / (n − 1)    (2.2)

³ An estimator is unbiased for a population parameter if its average value, in infinitely repeated samples of size n, equals that population parameter. If an estimator is unbiased, its value in a particular sample is referred to as an unbiased estimate.

If the data pairs are plotted, the lines x = x̄ and y = ȳ divide the plot into quadrants. Points in the lower left quadrant have both (x_i − x̄) and (y_i − ȳ) negative, so the product that contributes to the covariance is positive. Points in the upper right quadrant also make a positive contribution. In contrast, points in the upper left and lower right quadrants make a negative contribution to the covariance. Thus, if y tends to increase when x increases, most of the points will be in the lower left and upper right quadrants and the covariance will be positive. Conversely, if y tends to decrease as x increases, the covariance will be negative. If there is no such linear association, the covariance will be small relative to the standard deviations of {x_i} and {y_i} – always check the plot in case there is a quadratic association or some other pattern. In R we can calculate a sample covariance, with denominator n − 1, from its definition or by using the function cov. If we use the mean function, we are implicitly dividing by n.

Benzoapyrene is a carcinogenic hydrocarbon that is a product of incomplete combustion. One source of benzoapyrene and carbon monoxide is automobile exhaust. Colucci and Begeman (1971) analysed sixteen air samples
from Herald Square in Manhattan and recorded the carbon monoxide concentration (x, in parts per million) and benzoapyrene concentration (y, in micrograms per thousand cubic metres) for each sample. The data are plotted in Figure 2.1.

Fig. 2.1. Sixteen air samples from Herald Square.

> www <- "http://www.massey.ac.nz/~pscowper/ts/Herald.dat"
> Herald.dat <- read.table(www, header = T)
> attach(Herald.dat)

We now use R to calculate the covariance for the Herald Square pairs in three different ways:

> x <- CO; y <- Benzoa; n <- length(x)
> sum((x - mean(x)) * (y - mean(y))) / (n - 1)
[1] 5.51
> mean((x - mean(x)) * (y - mean(y)))
[1] 5.17
> cov(x, y)
[1] 5.51

The correspondence between the R code above and the expectation definition of covariance should be noted:

   mean((x - mean(x)) * (y - mean(y))) → E[(x − µ_x)(y − µ_y)]    (2.3)

Given this correspondence, the more natural estimate of covariance would be mean((x - mean(x))*(y - mean(y))). However, as can be seen above, the values computed using the internal function cov are those obtained using sum with a denominator of n − 1. As n gets large, the difference in denominators becomes less noticeable and the more natural estimate asymptotically approaches the unbiased estimate.⁴

Correlation is a dimensionless measure of the linear association between a pair of variables (x, y) and is obtained by standardising the covariance by dividing it by the product of the standard deviations of the variables. Correlation takes a value between −1 and +1, with a value of 0 indicating no linear association. The population correlation, ρ, between a pair of variables (x, y) is defined by

   ρ(x, y) = E[(x − µ_x)(y − µ_y)] / (σ_x σ_y) = γ(x, y) / (σ_x σ_y)    (2.4)

The sample correlation, Cor, is an estimate of ρ and is calculated as

   Cor(x, y) = Cov(x, y) / (sd(x) sd(y))    (2.5)

In R, the sample correlation for pairs (x_i, y_i) stored in vectors x and y is cor(x,y). A value of +1 or −1 indicates an exact linear association, with the (x, y) pairs falling on a straight line of positive or negative slope, respectively. The correlation between the CO and benzoapyrene measurements at Herald Square is now calculated both from the definition and using cor.

> cov(x,y) / (sd(x)*sd(y))
[1] 0.3551
> cor(x,y)
[1] 0.3551

Although the correlation is small, there is nevertheless a physical explanation for the correlation because both products are a result of incomplete combustion. A correlation of 0.36 typically corresponds to a slight visual impression that y tends to increase as x increases, although the points will be well scattered.

2.2.2 The ensemble and stationarity

The mean function of a time series model is

   µ(t) = E(x_t)    (2.6)

and, in general, is a function of t.
The expectation in this definition is an average taken across the ensemble of all the possible time series that might have been produced by the time series model (Fig. 2.2). The ensemble constitutes the entire population. If we have a time series model, we can simulate more than one time series (see Chapter 4). However, with historical data, we usually only have a single time series so all we can do, without assuming a mathematical structure for the trend, is to estimate the mean at each sample point by the corresponding observed value. In practice, we make estimates of any apparent trend and seasonal effects in our data and remove them, using decompose for example, to obtain time series of the random component. Then time series models with a constant mean will be appropriate.

⁴ In statistics, asymptotically means as the sample size approaches infinity.

Fig. 2.2. An ensemble of time series. The expected value E(x_t) at a particular time t is the average taken over the entire population.

If the mean function is constant, we say that the time series model is stationary in the mean. The sample estimate of the population mean, µ, is the sample mean, x̄:

   x̄ = Σ_{t=1}^{n} x_t / n    (2.7)

Equation (2.7) does rely on an assumption that a sufficiently long time series characterises the hypothetical model. Such models are known as ergodic, and the models in this book are all ergodic.

2.2.3 Ergodic series*

A time series model that is stationary in the mean is ergodic in the mean if the time average for a single time series tends to the ensemble mean as the length of the time series increases:

   lim_{n→∞} Σ x_t / n = µ    (2.8)

This implies that the time average is independent of the starting point. Given that we usually only have a single time series, you may wonder how a time series model can fail to be ergodic, or why we should want a model that is not ergodic. Environmental and economic time series are single realisations of a hypothetical time series model, and we simply define the underlying model as ergodic.

There are, however, cases in which we can have many time series arising from the same time series model. Suppose we investigate the acceleration at the pilot seat of a new design of microlight aircraft in simulated random gusts in a wind tunnel. Even if we have built two prototypes to the same design, we cannot be certain they will have the same average acceleration response because of slight differences in manufacture. In such cases, the number of time series is equal to the number of prototypes. Another example is an experiment investigating turbulent flows in some complex system. It is possible that we will obtain qualitatively different results from different runs because they do depend on initial conditions. It would seem better to run an experiment involving turbulence many times than to run it once for a much longer time. The number of runs is the number of time series. It is straightforward to adapt
a stationary time series model to be non-ergodic by defining the means for the individual time series to be from some probability distribution.

2.2.4 Variance function

The variance function of a time series model that is stationary in the mean is

   σ²(t) = E[(x_t − µ)²]    (2.9)

which can, in principle, take a different value at every time t. But we cannot estimate a different variance at each time point from a single time series. To progress, we must make some simplifying assumption. If we assume the model is stationary in the variance, this constant population variance, σ², can be estimated from the sample variance:

   Var(x) = Σ (x_t − x̄)² / (n − 1)    (2.10)

In a time series analysis, sequential observations may be correlated. If the correlation is positive, Var(x) will tend to underestimate the population variance in a short series because successive observations tend to be relatively similar. In most cases, this does not present a problem since the bias decreases rapidly as the length n of the series increases.

2.2.5 Autocorrelation

The mean and variance play an important role in the study of statistical distributions because they summarise two key distributional properties – a central location and the spread. Similarly, in the study of time series models, a key role is played by the second-order properties, which include the mean, variance, and serial correlation (described below).

Consider a time series model that is stationary in the mean and the variance. The variables may be correlated, and the model is second-order stationary if the correlation between variables depends only on the number of time steps separating them. The number of time steps between the variables is known as the lag. A correlation of a variable with itself at different times is known as autocorrelation or serial correlation. If a time series model is second-order stationary, we can define an autocovariance function (acvf), γ_k, as a function of the lag k:

   γ_k = E[(x_t − µ)(x_{t+k} − µ)]    (2.11)

The function γ_k does not depend on t because the expectation, which is across the ensemble, is the same at all times t. This definition follows naturally from Equation (2.1) by replacing x with x_t and y with x_{t+k} and noting that the mean µ is the mean of both x_t and x_{t+k}. The lag k autocorrelation function (acf), ρ_k, is defined by

   ρ_k = γ_k / σ²    (2.12)

It follows from the definition that ρ_0 is 1.

It is possible to set up a second-order stationary time series model that has skewness; for example, one that depends on time t. Applications for such models are rare, and it is customary to drop the term 'second-order' and use 'stationary' on its own for a time series model that is at least second-order stationary. The term strictly stationary is reserved for more rigorous conditions.

The acvf and acf can be estimated from a time series by their sample equivalents.
The sample acvf, c_k, is calculated as

   c_k = (1/n) Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄)    (2.13)

Note that the autocovariance at lag 0, c_0, is the variance calculated with a denominator n. Also, a denominator n is used when calculating c_k, although only n − k terms are added to form the numerator. Adopting this definition constrains all sample autocorrelations to lie between −1 and 1. The sample acf is defined as

   r_k = c_k / c_0    (2.14)

We will demonstrate the calculations in R using a time series of wave heights (mm relative to still water level) measured at the centre of a wave tank. The sampling interval is 0.1 second and the record length is 39.7 seconds. The waves were generated by a wave maker driven by a pseudo-random signal that was programmed to emulate a rough sea. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.

> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table(www, header = T) ; attach(wave.dat)
> plot(ts(waveht)) ; plot(ts(waveht[1:60]))

The upper plot in Figure 2.3 shows the entire time series. There are no outlying values. The lower plot is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.

Fig. 2.3. Wave height at centre of tank sampled at 0.1 second intervals: (a) wave height over 39.7 seconds; (b) wave height over 6 seconds.

The autocorrelations of x are stored in the vector acf(x)$acf, with the lag k autocorrelation located in acf(x)$acf[k+1]. For example, the lag 1 autocorrelation for waveht is

> acf(waveht)$acf[2]
[1] 0.47

The first entry, acf(waveht)$acf[1], is r_0 and equals 1. A scatter plot, such as Figure 2.1 for the Herald Square data, complements the calculation of the correlation and alerts us to any non-linear patterns. In a similar way, we can draw a scatter plot corresponding to each autocorrelation. For example, for lag 1 we plot(waveht[1:396], waveht[2:397]) to obtain Figure 2.4. Autocovariances are obtained by adding an argument to acf. The lag 1 autocovariance is given by

> acf(waveht, type = c("covariance"))$acf[2]
[1] 33328

Fig. 2.4. Wave height pairs separated by a lag of 1.

2.3 The correlogram

2.3.1 General discussion

By default, the acf function produces a plot of r_k against k, which is called the correlogram. For example, Figure 2.5 gives the correlogram for the wave heights obtained from acf(waveht).
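As a quick check that the plotted values are the sample quantities defined in Equations (2.13) and (2.14), the lag k autocorrelation can be computed directly. The short sketch below is ours rather than the book's: the helper names c.k and r.k are purely illustrative, and the code assumes the waveht data are still attached from the commands above.

> c.k <- function(x, k) {
+   # sample autocovariance with denominator n, as in Equation (2.13)
+   n <- length(x); x.bar <- mean(x)
+   sum((x[1:(n - k)] - x.bar) * (x[(1 + k):n] - x.bar)) / n
+ }
> r.k <- function(x, k) c.k(x, k) / c.k(x, 0)   # Equation (2.14)
> r.k(waveht, 1)                    # should agree with the lag 1 value 0.47 quoted above
> acf(waveht, plot = FALSE)$acf[2]  # the value used in the correlogram

Because c.k uses a denominator n, c.k(waveht, 0) reproduces the variance calculated with denominator n, and the two commands above return the same number up to rounding.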
In general, correlograms have the following features:

Fig. 2.5. Correlogram of wave heights.

• The x-axis gives the lag (k) and the y-axis gives the autocorrelation (r_k) at each lag. The unit of lag is the sampling interval, 0.1 second. Correlation is dimensionless, so there is no unit for the y-axis.
• If ρ_k = 0, the sampling distribution of r_k is approximately normal, with a mean of −1/n and a variance of 1/n.
The dotted lines on the correlogram are drawn at

   −1/n ± 2/√n

If r_k falls outside these lines, we have evidence against the null hypothesis that ρ_k = 0 at the 5% level. However, we should be careful about interpreting multiple hypothesis tests. Firstly, if ρ_k does equal 0 at all lags k, we expect 5% of the estimates, r_k, to fall outside the lines. Secondly, the r_k are correlated, so if one falls outside the lines, the neighbouring ones are more likely to be statistically significant. This will become clearer when we simulate time series in Chapter 4. In the meantime, it is worth looking for statistically significant values at specific lags that have some practical meaning (for example, the lag that corresponds to the seasonal period, when there is one). For monthly series, a significant autocorrelation at lag 12 might indicate that the seasonal adjustment is not adequate.
• The lag 0 autocorrelation is always 1 and is shown on the plot. Its inclusion helps us compare values of the other autocorrelations relative to the theoretical maximum of 1. This is useful because, if we have a long time series, small values of r_k that are of no practical consequence may be statistically significant. However, some discernment is required to decide what constitutes a noteworthy autocorrelation from a practical viewpoint. Squaring the autocorrelation can help, as this gives the percentage of variability explained by a linear relationship between the variables. For example, a lag 1 autocorrelation of 0.1 implies that a linear dependency of x_t on x_{t−1} would only explain 1% of the variability of x_t. It is a common fallacy to treat a statistically significant result as important when it has almost no practical consequence.
• The correlogram for wave heights has a well-defined shape that appears like a sampled damped cosine function. This is typical of correlograms of time series generated by an autoregressive model of order 2. We cover autoregressive models in Chapter 4.

If you look back at the plot of the air passenger bookings, there is a clear seasonal pattern and an increasing trend (Fig. 1.1). It is not reasonable to claim the time series is a realisation of a stationary model. But, whilst the population acf was defined only for a stationary time series model, the sample acf can be calculated for any time series, including deterministic signals. Some results for deterministic signals are helpful for explaining patterns in the acf of time series that we do not consider realisations of some stationary process:
• If you construct a time series that consists of a trend only, the integers from 1 up to 1000 for example, the acf decreases slowly and almost linearly from 1.
• If you take a large number of cycles of a discrete sinusoidal wave of any amplitude and phase, the acf is a discrete cosine function of the same period.
• If you construct a time series that consists of an arbitrary sequence of p numbers repeated many times, the correlogram has a dominant spike of almost 1 at lag p.
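These three results are easy to check numerically. The short sketch below is our own illustration rather than an example from the book; the series lengths, the period of the sine wave, and the repeated sequence are arbitrary choices.

> acf(1:1000)                    # trend only: the acf decays slowly and almost linearly from 1
> acf(sin((1:1000) * pi / 6))    # sinusoid with period 12: the acf is a cosine of period 12
> acf(rep(c(5, 2, 9, 11), 100))  # a sequence of p = 4 numbers repeated: spike of almost 1 at lag 4

Each command draws the corresponding correlogram, so the three patterns listed above can be seen directly.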
Usually a trend in the data will show in the correlogram as a slow decay in the autocorrelations, which are large and positive due to similar values in the series occurring close together in time. This can be seen in the correlogram for the air passenger bookings acf(AirPassengers) (Fig. 2.6). If there is seasonal variation, seasonal spikes will be superimposed on this pattern. The annual cycle appears in the air passenger correlogram as a cycle of the same period superimposed on the gradually decaying ordinates of the acf. This gives a maximum at a lag of 1 year, reflecting a positive linear relationship between pairs of variables (x_t, x_{t+12}) separated by 12-month periods. Conversely, because the seasonal trend is approximately sinusoidal, values separated by a period of 6 months will tend to have a negative relationship. For example, higher values tend to occur in the summer months followed by lower values in the winter months. A dip in the acf therefore occurs at lag 6 months (or 0.5 years). Although this is typical for seasonal variation that is approximated by a sinusoidal curve, other series may have patterns, such as high sales at Christmas, that contribute a single spike to the correlogram.

Fig. 2.6. Correlogram for the air passenger bookings over the period 1949–1960. The gradual decay is typical of a time series containing a trend. The peak at 1 year indicates seasonal variation.

2.3.2 Example based on air passenger series

Although we want to know about trends and seasonal patterns in a time series, we do not necessarily rely on the correlogram to identify them. The main use of the correlogram is to detect autocorrelations in the time series after we have removed an estimate of the trend and seasonal variation. In the code below, the air passenger series is seasonally adjusted and the trend removed using decompose. To plot the random component and draw the correlogram, we need to remember that a consequence of using a centred moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first six and last six terms in the random component cannot be calculated and are thus stored in R as NA.
The random component and correlogram are shown in Figures 2.7 and 2.8, respectively.

> data(AirPassengers)
> AP <- AirPassengers
> AP.decom <- decompose(AP, "multiplicative")
> plot(ts(AP.decom$random[7:138]))
> acf(AP.decom$random[7:138])

Fig. 2.7. The random component of the air passenger series after removing the trend and the seasonal variation.

Fig. 2.8. Correlogram for the random component of air passenger bookings over the period 1949–1960.

The correlogram in Figure 2.8 suggests either a damped cosine shape that is characteristic of an autoregressive model of order 2 (Chapter 4) or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that the standard deviation of the original series from July until June is 109, the standard deviation of the series after subtracting the trend estimate is 41, and the standard deviation after seasonal adjustment is just 0.03.

> sd(AP[7:138])
[1] 109
> sd(AP[7:138] - AP.decom$trend[7:138])
[1] 41.1
> sd(AP.decom$random[7:138])
[1] 0.0335

The reduction in the standard deviation shows that the seasonal adjustment has been very effective.

2.3.3 Example based on the Font Reservoir series

Monthly effective inflows (m³ s⁻¹) to the Font Reservoir in Northumberland for the period from January 1909 until December 1980 have been provided by Northumbrian Water PLC. A plot of the data is shown in Figure 2.9. There was a slight decreasing trend over this period, and substantial seasonal variation. The trend and seasonal variation have been estimated by regression, as described in Chapter 5, and the residual series (adflow), which we analyse here, can reasonably be considered a realisation from a stationary time series model. The main difference between the regression approach and using decompose is that the former assumes a linear trend, whereas the latter smooths the time series without assuming any particular form for the trend. The correlogram is plotted in Figure 2.10.

> www <- "http://www.massey.ac.nz/~pscowper/ts/Fontdsdt.dat"
> Fontdsdt.dat <- read.table(www, header = T)
> attach(Fontdsdt.dat)
> plot(ts(adflow), ylab = 'adflow')
> acf(adflow, xlab = 'lag (months)', main = "")

Fig. 2.9. Adjusted inflows to the Font Reservoir, 1909–1980.

There is a statistically significant correlation at lag 1. The physical interpretation is that the inflow next month is more likely than not to be above average if the inflow this month is above average. Similarly, if the inflow this month is below average it is more likely than not that next month's inflow will be below average. The explanation is that the groundwater supply can be thought of as a slowly discharging reservoir. If groundwater is high one month it will augment inflows, and is likely to do so next month as well.
Given this explanation, you may be surprised that the lag 1 correlation is not higher. The explanation for this is that most of the inflow is runoff following rainfall, and in Northumberland there is little correlation between seasonally adjusted rainfall in consecutive months. An exponential decay in the correlogram is typical of a first-order autoregressive model (Chapter 4). The correlogram of the adjusted inflows is consistent with an exponential decay. However, given the sampling errors for a time series of this length, estimates of autocorrelation at higher lags are unlikely to be statistically significant. This is not a practical limitation because such low correlations are inconsequential. When we come to identify suitable models, we should remember that there is no one correct model and that there will often be a choice of suitable models. We may make use of a specific statistical criterion such as Akaike's information criterion, introduced in Chapter 5, to choose a model, but this does not imply that the model is correct.

Fig. 2.10. Correlogram for adjusted inflows to the Font Reservoir, 1909–1980.

2.4 Covariance of sums of random variables

In subsequent chapters, second-order properties for several time series models are derived using the result shown in Equation (2.15). Let x_1, x_2, ..., x_n and y_1, y_2, ..., y_m be random variables. Then

   Cov(Σ_{i=1}^{n} x_i, Σ_{j=1}^{m} y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(x_i, y_j)    (2.15)

where Cov(x, y) is the covariance between a pair of random variables x and y. The result tells us that the covariance of two sums of variables is the sum of all possible covariance pairs of the variables. Note that the special case of n = m and x_i = y_i (i = 1, ..., n) occurs in subsequent chapters for a time series {x_t}. The proof of Equation (2.15) is left to Exercise 5a.

2.5 Summary of commands used in examples

mean   returns the mean (average)
var    returns the variance with denominator n − 1
sd     returns the standard deviation
cov    returns the covariance with denominator n − 1
cor    returns the correlation
acf    returns the correlogram (or sets the argument to obtain autocovariance function)

2.6 Exercises

1. On the book's website, you will find two small bivariate data sets that are not time series. Draw a scatter plot for each set and then calculate the correlation. Comment on your results.
   a) The data in the file varnish.dat are the amount of catalyst in a varnish, x, and the drying time of a set volume in a petri dish, y.
   b) The data in the file guesswhat.dat are data pairs. Can you see a pattern? Can you guess what they represent?

2.
The following data are the volumes, relative to nominal contents of 750 ml, of 16 bottles taken consecutively from the filling machine at the Serendipity Shiraz vineyard:

      39, 35, 16, 18, 7, 22, 13, 18, 20, 9, −12, −11, −19, −9, −2, 16

   The following are the volumes, relative to nominal contents of 750 ml, of consecutive bottles taken from the filling machine at the Cagey Chardonnay vineyard:

      47, −26, 42, −10, 27, −8, 16, 6, −1, 25, 11, 1, 25, 7, −5, 3

   The data are also available from the website in the file ch2ex2.dat.
   a) Produce time plots of the two time series.
   b) For each time series, draw a lag 1 scatter plot.
   c) Produce the acf for both time series and comment.

3. Carry out the following exploratory time series analysis using the global temperature series from §1.4.5.
   a) Decompose the series into the components trend, seasonal effect, and residuals. Plot these components. Would you expect these data to have a substantial seasonal component? Compare the standard deviation of the original series with the deseasonalised series. Produce a plot of the trend with a superimposed seasonal effect.
   b) Plot the correlogram of the residuals (random component) from part (a). Comment on the plot, with particular reference to any statistically significant correlations.

4. The monthly effective inflows (m³ s⁻¹) to the Font Reservoir are in the file Font.dat. Use decompose on the time series and then plot the correlogram of the random component. Compare this with Figure 2.10 and comment.

5. a) Prove Equation (2.15), using the following properties of summation, expectation, and covariance:

         Σ_{i=1}^{n} x_i Σ_{j=1}^{m} y_j = Σ_{i=1}^{n} Σ_{j=1}^{m} x_i y_j
         E[Σ_{i=1}^{n} x_i] = Σ_{i=1}^{n} E(x_i)
         Cov(x, y) = E(xy) − E(x) E(y)

   b) By taking n = m = 2 and x_i = y_i in Equation (2.15), derive the well-known result

         Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y)

   c) Verify the result in part (b) above using R with x and y (CO and Benzoa, respectively) taken from §2.2.1.

3 Forecasting Strategies

3.1 Purpose

Businesses rely on forecasts of sales to plan production, justify marketing decisions, and guide research. A very efficient method of forecasting one variable is to find a related variable that leads it by one or more time intervals. The closer the relationship and the longer the lead time, the better this strategy becomes. The trick is to find a suitable lead variable. An Australian example is the Building Approvals time series published by the Australian Bureau of Statistics. This provides valuable information on the likely demand over the next few months for all sectors of the building industry. A variation on the strategy of seeking a leading variable is to find a variable that is associated with the variable we need to forecast and easier to predict.

In many applications, we cannot rely on finding a suitable leading variable and have to try other methods. A second approach, common in marketing, is to use information about the sales of similar products in the past. The influential Bass diffusion model is based on this principle.
A third strategy is to make extrapolations based on present trends continuing and to implement adaptive estimates of these trends. The statistical technicalities of forecasting are covered throughout the book, and the purpose of this chapter is to introduce the general strategies that are available.

3.2 Leading variables and associated variables

3.2.1 Marine coatings

A leading international marine paint company uses statistics available in the public domain to forecast the numbers, types, and sizes of ships to be built over the next three years. One source of such information is World Shipyard Monitor, which gives brief details of orders in over 300 shipyards. The paint company has set up a database of ship types and sizes from which it can forecast the areas to be painted and hence the likely demand for paint. The company monitors its market share closely and uses the forecasts for planning production and setting prices.

3.2.2 Building approvals publication

Building approvals and building activity time series

The Australian Bureau of Statistics publishes detailed data on building approvals for each month, and, a few weeks later, the Building Activity Publication lists the value of building work done in each quarter. The data in the file ApprovActiv.dat are the total dwellings approved per month, averaged over the past three months, labelled "Approvals", and the value of work done over the past three months (chain volume measured in millions of Australian dollars at the reference year 2004–05 prices), labelled "Activity", from March 1996 until September 2006. We start by reading the data into R and then construct time series objects and plot the two series on the same graph using ts.plot (Fig. 3.1).

> www <- "http://www.massey.ac.nz/~pscowper/ts/ApprovActiv.dat"
> Build.dat <- read.table(www, header = T) ; attach(Build.dat)
> App.ts <- ts(Approvals, start = c(1996, 1), freq = 4)
> Act.ts <- ts(Activity, start = c(1996, 1), freq = 4)
> ts.plot(App.ts, Act.ts, lty = c(1, 3))

Fig. 3.1. Building approvals (solid line) and building activity (dotted line).

In Figure 3.1, we can see that the building activity tends to lag one quarter behind the building approvals, or equivalently that the building approvals appear to lead the building activity by a quarter. The cross-correlation function, which is abbreviated to ccf, can be used to quantify this relationship. A plot of the cross-correlation function against lag is referred to as a cross-correlogram.

Cross-correlation

Suppose we have time series models for variables x and y that are stationary in the mean and the variance. The variables may each be serially correlated, and correlated with each other at different time lags.
The combined model is second-order stationary if all these correlations depend only on the lag, and then we can define the cross covariance function (ccvf), γ_k(x, y), as a function of the lag, k:

   γ_k(x, y) = E[(x_{t+k} − µ_x)(y_t − µ_y)]    (3.1)

This is not a symmetric relationship, and the variable x is lagging variable y by k. If x is the input to some physical system and y is the response, the cause will precede the effect, y will lag x, the ccvf will be 0 for positive k, and there will be spikes in the ccvf at negative lags. Some textbooks define ccvf with the variable y lagging when k is positive, but we have used the definition that is consistent with R. Whichever way you choose to define the ccvf,

   γ_k(x, y) = γ_{−k}(y, x)    (3.2)

When we have several variables and wish to refer to the acvf of one rather than the ccvf of a pair, we can write it as, for example, γ_k(x, x). The lag k cross-correlation function (ccf), ρ_k(x, y), is defined by

   ρ_k(x, y) = γ_k(x, y) / (σ_x σ_y)    (3.3)

The ccvf and ccf can be estimated from a time series by their sample equivalents. The sample ccvf, c_k(x, y), is calculated as

   c_k(x, y) = (1/n) Σ_{t=1}^{n−k} (x_{t+k} − x̄)(y_t − ȳ)    (3.4)

The sample ccf is defined as

   r_k(x, y) = c_k(x, y) / √(c_0(x, x) c_0(y, y))    (3.5)

Cross-correlation between building approvals and activity

The ts.union function binds time series with a common frequency, padding with 'NA's to the union of their time coverages. If ts.union is used within the acf command, R returns the correlograms for the two variables and the cross-correlograms in a single figure.

> acf(ts.union(App.ts, Act.ts))

Fig. 3.2. Correlogram and cross-correlogram for building approvals and building activity.

In Figure 3.2, the acfs for x and y are in the upper left and lower right frames, respectively, and the ccfs are in the lower left and upper right frames. The time unit for lag is one year, so a correlation at a lag of one quarter appears at 0.25. If the variables are independent, we would expect 5% of sample correlations to lie outside the dashed lines. Several of the cross-correlations at negative lags do pass these lines, indicating that the approvals time series is leading the activity. Numerical values can be printed using the print() function, and are 0.432, 0.494, 0.499, and 0.458 at lags of 0, 1, 2, and 3, respectively. The ccf can be calculated for any two time series that overlap, but if they both have trends or similar seasonal effects, these will dominate (Exercise 1).
It may be that common trends and seasonal effects are precisely what we are looking for, but the population ccf is defined for stationary random processes, and it is usual to remove the trend and seasonal effects before investigating cross-correlations. Here we remove the trend using decompose, which uses a centred moving average of the four quarters (see Fig. 3.3). We will discuss the use of ccf in later chapters.

> app.ran <- decompose(App.ts)$random
> app.ran.ts <- window(app.ran, start = c(1996, 3))
> act.ran <- decompose(Act.ts)$random
> act.ran.ts <- window(act.ran, start = c(1996, 3))
> acf(ts.union(app.ran.ts, act.ran.ts))
> ccf(app.ran.ts, act.ran.ts)

We again use print() to obtain the following table.

> print(acf(ts.union(app.ran.ts, act.ran.ts)))

  app.ran.ts        act.ran.ts
  1.000 ( 0.00)      0.123 ( 0.00)
  0.422 ( 0.25)      0.704 (-0.25)
 -0.328 ( 0.50)      0.510 (-0.50)
 -0.461 ( 0.75)     -0.135 (-0.75)
 -0.400 ( 1.00)     -0.341 (-1.00)
 -0.193 ( 1.25)     -0.187 (-1.25)
  ...
  app.ran.ts        act.ran.ts
  0.123 ( 0.00)      1.000 ( 0.00)
 -0.400 ( 0.25)      0.258 ( 0.25)
 -0.410 ( 0.50)     -0.410 ( 0.50)
 -0.250 ( 0.75)     -0.411 ( 0.75)
  0.071 ( 1.00)     -0.112 ( 1.00)
  0.353 ( 1.25)      0.180 ( 1.25)
  ...

The ccf function produces a single plot, shown in Figure 3.4, and again shows the lagged relationship. The Australian Bureau of Statistics publishes the building approvals by state and by other categories, and specific sectors of the building industry may find higher correlations between demand for their products and one of these series than we have seen here.

Fig. 3.3. Correlogram and cross-correlogram of the random components of building approvals and building activity after using decompose.

Fig. 3.4. Cross-correlogram of the random components of building approvals and building activity after using decompose.

3.2.3 Gas supply

Gas suppliers typically have to place orders for gas from offshore fields 24 hours ahead. Variation about the average use of gas, for the time of year, depends on temperature and, to some extent, humidity and wind speed. Coleman et al. (2001) found that the weather accounts for 90% of this variation in the United Kingdom. Weather forecasts for the next 24 hours are now quite accurate and are incorporated into the forecasting procedure.

3.3 Bass model

3.3.1 Background

Frank Bass published a paper describing his mathematical model, which quantified the theory of adoption and diffusion of a new product by society (Rogers, 1962), in Management Science nearly fifty years ago (Bass, 1969).
The mathe-</p><p>matics is straightforward, and the model has been influential in marketing. An</p><p>entrepreneur with a new invention will often use the Bass model when mak-</p><p>ing a case for funding. There is an associated demand for market research, as</p><p>demonstrated, for example, by the Marketing Science Centre at the Univer-</p><p>sity of South Australia becoming the Ehrenberg-Bass Institute for Marketing</p><p>Science in 2005.</p><p>3.3.2 Model definition</p><p>The Bass formula for the number of people, Nt, who have bought a product at</p><p>time t depends on three parameters: the total number of people who eventually</p><p>buy the product, m; the coefficient of innovation, p; and the coefficient of</p><p>imitation, q. The Bass formula is</p><p>Nt+1 = Nt + p(m−Nt) + qNt(m−Nt)/m (3.6)</p><p>According to the model, the increase in sales, Nt+1 −Nt, over the next time</p><p>period is equal to the sum of a fixed proportion p and a time varying proportion</p><p>qNt</p><p>m of people who will eventually buy the product but have not yet done so.</p><p>The rationale for the model is that initial sales will be to people who are</p><p>interested in the novelty of the product, whereas later sales will be to people</p><p>who are drawn to the product after seeing their friends and acquaintances use</p><p>it. Equation (3.6) is a difference equation and its solution is</p><p>Nt = m</p><p>1− e−(p+q)t</p><p>1 + (q/p)e−(p+q)t</p><p>(3.7)</p><p>It is easier to verify this result for the continuous-time version of the model.</p><p>3.3.3 Interpretation of the Bass model*</p><p>One interpretation of the Bass model is that the time from product launch</p><p>until purchase is assumed to have a probability</p><p>distribution that can be</p><p>parametrised in terms of p and q. A plot of sales per time unit against time is</p><p>obtained by multiplying the probability density by the number of people, m,</p><p>who eventually buy the product. Let f(t), F (t), and h(t) be the density, cumu-</p><p>lative distribution function (cdf), and hazard, respectively, of the distribution</p><p>of time until purchase. The definition of the hazard is</p><p>52 3 Forecasting Strategies</p><p>h(t) =</p><p>f(t)</p><p>1− F (t)</p><p>(3.8)</p><p>The interpretation of the hazard is that if it is multiplied by a small time</p><p>increment it gives the probability that a random purchaser who has not yet</p><p>made the purchase will do so in the next small time increment (Exercise 2).</p><p>Then the continuous time model of the Bass formula can be expressed in terms</p><p>of the hazard:</p><p>h(t) = p+ qF (t) (3.9)</p><p>Equation (3.6) is the discrete form of Equation (3.9) (Exercise 2). The solution</p><p>of Equation (3.8), with h(t) given by Equation (3.9), for F (t) is</p><p>F (t) =</p><p>1− e−(p+q)t</p><p>1 + (q/p)e−(p+q)t</p><p>(3.10)</p><p>Two special cases of the distribution are the exponential distribution and lo-</p><p>gistic distribution, which arise when q = 0 and p = 0, respectively. The logistic</p><p>distribution closely resembles the normal distribution (Exercise 3). Cumula-</p><p>tive sales are given by the product of m and F (t). 
The pdf is the derivative of Equation (3.10):

    f(t) = (p + q)² e^{−(p+q)t} / ( p [1 + (q/p) e^{−(p+q)t}]² )    (3.11)

Sales per unit time at time t are

    S(t) = m f(t) = m (p + q)² e^{−(p+q)t} / ( p [1 + (q/p) e^{−(p+q)t}]² )    (3.12)

The time to peak is

    t_peak = (log(q) − log(p)) / (p + q)    (3.13)

3.3.4 Example

We show a typical Bass curve by fitting Equation (3.12) to yearly sales of VCRs in the US home market between 1980 and 1989 (Bass website) using the R non-linear least squares function nls. The variable T79 is the year from 1979, and the variable Tdelt is the time from 1979 at a finer resolution of 0.1 year for plotting the Bass curves. The cumulative sum function cumsum is useful for monitoring changes in the mean level of the process (Exercise 8).

> T79 <- 1:10
> Tdelt <- (1:100) / 10
> Sales <- c(...)   # yearly sales of VCRs in the US home market, 1980-1989 (Bass website)
> Cusales <- cumsum(Sales)
> Bass.nls <- nls(Sales ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * T79)) /
+                   (1 + (Q/P) * exp(-(P + Q) * T79))^2,
+                 start = list(M = sum(Sales), P = 0.03, Q = 0.38))
> summary(Bass.nls)

Parameters:
   Estimate  Std. Error  t value  Pr(>|t|)
M  6.798e+04  3.128e+03    21.74  1.10e-07 ***
P  6.594e-03  1.430e-03     4.61   0.00245 **
Q  6.381e-01  4.140e-02    15.41  1.17e-06 ***

Residual standard error: 727.2 on 7 degrees of freedom

The final estimates for m, p, and q, rounded to two significant figures, are 68000, 0.0066, and 0.64 respectively. The starting values for P and Q are p and q for a typical product. We assume the sales figures are prone to error and estimate the total sales, m, setting the starting value for M to the recorded total sales. The data and fitted curve can be plotted using the code below (see Fig. 3.5 and 3.6):

> Bcoef <- coef(Bass.nls)
> m <- Bcoef[1]
> p <- Bcoef[2]
> q <- Bcoef[3]
> ngete <- exp(-(p + q) * Tdelt)
> Bpdf <- m * ((p + q)^2 / p) * ngete / (1 + (q/p) * ngete)^2
> plot(Tdelt, Bpdf, xlab = "Year from 1979", ylab = "Sales per year", type = "l")
> points(T79, Sales)
> Bcdf <- m * (1 - ngete) / (1 + (q/p) * ngete)
> plot(Tdelt, Bcdf, xlab = "Year from 1979", ylab = "Cumulative sales", type = "l")
> points(T79, Cusales)

Fig. 3.5. Bass sales curve fitted to sales of VCRs in the US home market, 1980-1989.

Fig. 3.6. Bass cumulative sales curve, obtained as the integral of the sales curve, and cumulative sales of VCRs in the US home market, 1980-1989.

It is easy to fit a curve to past sales data. The importance of the Bass curve in marketing is in forecasting, which needs values for the parameters m, p, and q. Plausible ranges for the parameter values can be based on published data for similar categories of past inventions, and a few examples follow.

  Product                          m              p      q      Reference
  Typical product                  -              0.030  0.380  VBM1
  35 mm projectors, 1965-1986      3.37 million   0.009  0.173  Bass2
  Overhead projectors, 1960-1970   0.961 million  0.028  0.311  Bass
  PCs, 1981-2010                   3.384 billion  0.001  0.195  Bass
  1 Value-Based Management; 2 Frank M. Bass, 1999.

Although the forecasts are inevitably uncertain, they are the best information available when making marketing and investment decisions. A prospectus for investors or a report to the management team will typically include a set of scenarios based on the most likely, optimistic, and pessimistic sets of parameters.

The basic Bass model does not allow for replacement sales and multiple purchases.
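As a brief numerical aside (a sketch, not part of the original example), the difference equation (3.6) can be iterated directly with the rounded VCR estimates quoted above and set beside the closed-form curve (3.7). The two agree only approximately, because (3.7) is derived from the continuous-time version of the model.

# Iterate the Bass difference equation (3.6) with the rounded VCR estimates
# and compare the cumulative-sales path with the closed form (3.7).
m <- 68000; p <- 0.0066; q <- 0.64
Tmax <- 10
N <- numeric(Tmax + 1)                    # N[1] holds N_0 = 0
for (t in 1:Tmax) N[t + 1] <- N[t] + p * (m - N[t]) + q * N[t] * (m - N[t]) / m
tt <- 0:Tmax
N.closed <- m * (1 - exp(-(p + q) * tt)) / (1 + (q/p) * exp(-(p + q) * tt))
round(cbind(year = tt, difference.eqn = N, closed.form = N.closed))

Both columns trace the familiar S-shaped cumulative sales curve saturating at m, with the discrete recursion running slightly ahead of the continuous-time approximation for these parameter values.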
Extensions of the model that allow for replacement sales, multiple</p><p>purchases, and the effects of pricing and advertising in a competitive market</p><p>have been proposed (for example, Mahajan et al. 2000). However, there are</p><p>several reasons why these refinements may be of less interest to investors than</p><p>you might expect. The first is that the profit margin on manufactured goods,</p><p>such as innovative electronics and pharmaceuticals, will drop dramatically</p><p>once patent protection expires and competitors enter the market. A second</p><p>reason is that successful inventions are often superseded by new technology, as</p><p>0 2 4 6 8 10</p><p>0</p><p>20</p><p>00</p><p>0</p><p>40</p><p>00</p><p>0</p><p>60</p><p>00</p><p>0</p><p>Year from 1979</p><p>C</p><p>um</p><p>ul</p><p>at</p><p>iv</p><p>e</p><p>sa</p><p>le</p><p>s</p><p>● ●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>3.4 Exponential smoothing and the Holt-Winters method 55</p><p>VCRs have been by DVD players, and replacement sales are limited. Another</p><p>reason is that many investors are primarily interested in a relatively quick</p><p>return on their money. You are asked to consider Bass models for sales of two</p><p>recent 3G mobile communication devices in Exercise 4.</p><p>3.4 Exponential smoothing & the Holt-Winters method</p><p>3.4.1 Exponential smoothing</p><p>Our objective is to predict some future value xn+k given a past history</p><p>{x1, x2, . . . , xn} of observations up to time n. In this subsection we assume</p><p>there is no systematic trend or seasonal effects in the process, or that these</p><p>have been identified and removed. The mean of the process can change from</p><p>one time step to the next, but we have no information about the likely direction</p><p>of these changes. A typical application is forecasting sales of a well-established</p><p>product in a stable market. The model is</p><p>xt = µt + wt (3.14)</p><p>where µt is the non-stationary mean of the process at time t and wt are</p><p>independent random deviations with a mean of 0 and a standard deviation σ.</p><p>We will follow the notation in R and let at be our estimate of µt. Given that</p><p>there is no systematic trend, an intuitively reasonable estimate of the mean</p><p>at time t is given by a weighted average of our observation at time t and our</p><p>estimate of the mean at time t− 1:</p><p>at = αxt + (1− α)at−1 0</p><p>Strategies</p><p>Equation (3.15), for at, can be rewritten in two other useful ways. Firstly,</p><p>we can write the sum of at−1 and a proportion of the one-step-ahead forecast</p><p>error, xt − at−1,</p><p>at = α(xt − at−1) + at−1 (3.17)</p><p>Secondly, by repeated back substitution we obtain</p><p>at = αxt + α(1− α)xt−1 + α(1− α)2xt−2 + . . . (3.18)</p><p>When written in this form, we see that at is a linear combination of the current</p><p>and past observations, with more weight given to the more recent observations.</p><p>The restriction 0 www Motor.dat Comp.ts plot(Comp.ts, xlab = "Time / months", ylab = "Complaints")</p><p>3.4 Exponential smoothing and the Holt-Winters method 57</p><p>Time (months)</p><p>C</p><p>om</p><p>pl</p><p>ai</p><p>nt</p><p>s</p><p>1996 1997 1998 1999 2000</p><p>5</p><p>10</p><p>15</p><p>20</p><p>25</p><p>30</p><p>35</p><p>Fig. 3.7. 
Monthly numbers of letters of complaint received by a motoring organisation.

There is no evidence of a systematic trend or seasonal effects, so it seems reasonable to use exponential smoothing for this time series. Exponential smoothing is a special case of the Holt-Winters algorithm, which we introduce in the next section, and is implemented in R using the HoltWinters function with the additional parameters set to 0. If we do not specify a value for α, R will find the value that minimises the sum of squared one-step-ahead prediction errors.

> Comp.hw1 <- HoltWinters(Comp.ts, beta = 0, gamma = 0); Comp.hw1
> plot(Comp.hw1)

Holt-Winters exponential smoothing without trend and without seasonal component.

Smoothing parameters:
 alpha: 0.143
 beta : 0
 gamma: 0

Coefficients:
   [,1]
a 17.70

> Comp.hw1$SSE
[1] 2502

The estimated value of the mean number of letters of complaint per month at the end of 1999 is 17.7. The value of α that gives a minimum SS1PE, of 2502, is 0.143. We now compare these results with those obtained if we specify a value for α of 0.2.

> Comp.hw2 <- HoltWinters(Comp.ts, alpha = 0.2, beta = 0, gamma = 0)
> Comp.hw2
...
 alpha: 0.2
 beta : 0
 gamma: 0

Coefficients:
   [,1]
a 17.98

> Comp.hw2$SSE
[1] 2526

Fig. 3.8. Monthly numbers of letters and exponentially weighted moving average.

The estimated value of the mean number of letters of complaint per month at the end of 1999 is now 18.0, and the SS1PE has increased slightly to 2526. The advantage of letting R estimate a value for α is that it is optimum for a practically important criterion, SS1PE, and that it removes the need to make a choice. However, the optimum estimate can be close to 0 if we have a long time series over a stable period, and this makes the EWMA unresponsive to any future change in mean level. From Figure 3.8, it seems that there was a decrease in the number of complaints at the start of the period and a slight rise towards the end, although this has not yet affected the exponentially weighted moving average.

3.4.2 Holt-Winters method

We usually have more information about the market than exponential smoothing can take into account. Sales are often seasonal, and we may expect trends to be sustained for short periods at least. But trends will change. If we have a successful invention, sales will increase initially but then stabilise before declining as competitors enter the market. We will refer to the change in level from one time period to the next as the slope.1 Seasonal patterns can also change due to vagaries of fashion and variation in climate, for example.
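As a brief aside before continuing, the updating equation (3.17) can be applied directly to the complaints series as a check on the output above. The sketch below assumes the Comp.ts series and the α = 0.2 fit Comp.hw2 already created; small differences can arise from how HoltWinters initialises the level.

# Exponential smoothing recursion (3.17) by hand, alpha = 0.2, started at the
# first observation; the sum of squared one-step-ahead errors should be close
# to Comp.hw2$SSE (about 2526).
alpha <- 0.2
x <- as.numeric(Comp.ts)
a <- numeric(length(x)); a[1] <- x[1]
for (t in 2:length(x)) a[t] <- alpha * (x[t] - a[t - 1]) + a[t - 1]
sum((x[-1] - a[-length(a)])^2)    # one-step-ahead errors are x_t - a_{t-1}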
The</p><p>Holt-Winters method was suggested by Holt (1957) and Winters (1960), who</p><p>were working in the School of Industrial Administration at Carnegie Institute</p><p>of Technology, and uses exponentially weighted moving averages to update</p><p>estimates of the seasonally adjusted mean (called the level), slope, and sea-</p><p>sonals.</p><p>The Holt-Winters method generalises Equation (3.15), and the additive</p><p>seasonal form of their updating equations for a series {xt} with period p is</p><p>at = α(xt − st−p) + (1− α)(at−1 + bt−1)</p><p>bt = β(at − at−1) + (1− β)bt−1</p><p>st = γ(xt − at) + (1− γ)st−p</p><p> (3.21)</p><p>where at, bt, and st are the estimated level,2 slope, and seasonal effect at time</p><p>t, and α, β, and γ are the smoothing parameters. The first updating equation</p><p>takes a weighted average of our latest observation, with our existing estimate</p><p>of the appropriate seasonal effect subtracted, and our forecast of the level</p><p>made one time step ago. The one-step-ahead forecast of the level is the sum</p><p>of the estimates of the level and slope at the time of forecast. A typical choice</p><p>of the weight α is 0.2. The second equation takes a weighted average of our</p><p>previous estimate and latest estimate of the slope, which is the difference in</p><p>the estimated level at time t and the estimated level at time t− 1. Note that</p><p>the second equation can only be used after the first equation has been applied</p><p>to get at. Finally, we have another estimate of the seasonal effect, from the</p><p>difference between the observation and the estimate of the level, and we take</p><p>a weighted average of this and the last estimate of the seasonal effect for this</p><p>season, which was made at time t− p. Typical choices of the weights β and γ</p><p>are 0.2. The updating equations can be started with a1 = x1 and initial slope,</p><p>b1, and seasonal effects, s1, . . . , sp, reckoned from experience, estimated from</p><p>the data in some way, or set at 0. The default in R is to use values obtained</p><p>from the decompose procedure.</p><p>The forecasting equation for xn+k made after the observation at time n is</p><p>x̂n+k|n = an + kbn + sn+k−p k ≤ p (3.22)</p><p>1 When describing the Holt-Winters procedure, the R help and many textbooks</p><p>refer to the slope as the trend.</p><p>2 The mean of the process is the sum of the level and the appropriate seasonal</p><p>effect.</p><p>60 3 Forecasting Strategies</p><p>where an is the estimated level and bn is the estimated slope, so an+kbn is the</p><p>expected level at time n+k and sn+k−p is the exponentially weighted estimate</p><p>of the seasonal effect made at time n = k− p. 
For example, for monthly data</p><p>(p = 12), if time n + 1 occurs in January, then sn+1−12 is the exponentially</p><p>weighted estimate of the seasonal effect for January made in the previous year.</p><p>The forecasting</p><p>equation can be used for lead times between (m−1)p+1 and</p><p>mp, but then the most recent exponentially weighted estimate of the seasonal</p><p>effect available will be sn+k−(m−1)p.</p><p>The Holt-Winters algorithm with multiplicative seasonals is</p><p>an = α</p><p>(</p><p>xn</p><p>sn−p</p><p>)</p><p>+ (1− α)(an−1 + bn−1)</p><p>bn = β(an − an−1) + (1− β)bn−1</p><p>sn = γ</p><p>(</p><p>xn</p><p>an</p><p>)</p><p>+ (1− γ)sn−p</p><p> (3.23)</p><p>The forecasting equation for xn+k made after the observation at time n</p><p>becomes</p><p>x̂n+k|n = (an + kbn)sn+k−p k ≤ p (3.24)</p><p>In R, the function HoltWinters can be used to estimate smoothing param-</p><p>eters for the Holt-Winters model by minimising the one-step-ahead prediction</p><p>errors (SS1PE).</p><p>Sales of Australian wine</p><p>The data in the file wine.dat are monthly sales of Australian wine by category,</p><p>in thousands of litres, from January 1980 until July 1995. The categories are</p><p>fortified white, dry white, sweet white, red, rose, and sparkling. The sweet</p><p>white wine time series is plotted in Figure 3.9, and there is a dramatic increase</p><p>in sales in the second half of the 1980s followed by a reduction to a level well</p><p>above the starting values. The seasonal variation looks as though it would be</p><p>better modelled as multiplicative, and comparison of the SS1PE for the fitted</p><p>models confirms this (Exercise 6). Here we present results for the model with</p><p>multiplicative seasonals only. The Holt-Winters components and fitted values</p><p>are shown in Figures 3.10 and 3.11 respectively.</p><p>> www wine.dat sweetw.ts plot(sweetw.ts, xlab= "Time (months)", ylab = "sales (1000 litres)")</p><p>> sweetw.hw sweetw.hw ; sweetw.hw$coef ; sweetw.hw$SSE</p><p>...</p><p>Smoothing parameters:</p><p>alpha: 0.4107</p><p>beta : 0.0001516</p><p>3.4 Exponential smoothing and the Holt-Winters method 61</p><p>gamma: 0.4695</p><p>...</p><p>> sqrt(sweetw.hw$SSE/length(sweetw))</p><p>[1] 50.04</p><p>> sd(sweetw)</p><p>[1] 121.4</p><p>> plot (sweetw.hw$fitted)</p><p>> plot (sweetw.hw)</p><p>Time(months)</p><p>S</p><p>al</p><p>es</p><p>(</p><p>10</p><p>00</p><p>li</p><p>tr</p><p>es</p><p>)</p><p>1980 1985 1990 1995</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.9. Sales of Australian sweet white wine.</p><p>The optimum values for the smoothing parameters, based on minimising</p><p>the one-step ahead prediction errors, are 0.4107, 0.0001516, and 0.4695 for α,</p><p>β, and γ, respectively. It follows that the level and seasonal variation adapt</p><p>rapidly whereas the trend is slow to do so. The coefficients are the estimated</p><p>values of the level, slope, and multiplicative seasonals from January to De-</p><p>cember available at the latest time point (t = n = 187), and these are the</p><p>values that will be used for predictions (Exercise 6). Finally, we have calcu-</p><p>lated the mean square one-step-ahead prediction error, which equals 50, and</p><p>have compared it with the standard deviation of the original time series which</p><p>is 121. The decrease is substantial, but a more testing comparison would be</p><p>with the mean one-step-ahead prediction error if we forecast the next month’s</p><p>sales as equal to this month’s sales (Exercise 6). 
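A minimal sketch of that benchmark calculation follows (it assumes the sweetw sales series read in above and is not a full answer to the exercise): the naive forecast of next month's sales is simply this month's value, so its one-step-ahead errors are the first differences of the series.

# Root mean square one-step-ahead error of the naive forecast "next month
# equals this month", to set alongside the value of about 50 obtained above
# for the Holt-Winters fit.
naive.errors <- diff(sweetw)     # x_t - x_{t-1}
sqrt(mean(naive.errors^2))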
Also, in Exercise 6 you are</p><p>asked to investigate the performance of the Holt-Winters algorithm if the</p><p>three smoothing parameters are all set equal to 0.2 and if the values for the</p><p>parameters are optimised at each time step.</p><p>62 3 Forecasting Strategies</p><p>10</p><p>0</p><p>40</p><p>0</p><p>xh</p><p>at</p><p>10</p><p>0</p><p>30</p><p>0</p><p>Le</p><p>ve</p><p>l</p><p>0.</p><p>40</p><p>0.</p><p>43</p><p>T</p><p>re</p><p>nd</p><p>0.</p><p>8</p><p>1.</p><p>2</p><p>1985 1990 1995</p><p>S</p><p>ea</p><p>so</p><p>n</p><p>Time</p><p>Fig. 3.10. Sales of Australian white wine: fitted values; level; slope (labelled trend);</p><p>seasonal variation.</p><p>Holt−Winters filtering</p><p>Time</p><p>O</p><p>bs</p><p>er</p><p>ve</p><p>d</p><p>/ F</p><p>itt</p><p>ed</p><p>1985 1990 1995</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.11. Sales of Australian white wine and Holt-Winters fitted values.</p><p>3.4.3 Four-year-ahead forecasts for the air passenger data</p><p>The seasonal effect for the air passenger data of §1.4.1 appeared to increase</p><p>with the trend, which suggests that a ‘multiplicative’ seasonal component be</p><p>used in the Holt-Winters procedure. The Holt-Winters fit is impressive – see</p><p>Figure 3.12. The predict function in R can be used with the fitted model to</p><p>make forecasts into the future (Fig. 3.13).</p><p>> AP.hw plot(AP.hw)</p><p>3.4 Exponential smoothing and the Holt-Winters method 63</p><p>> AP.predict ts.plot(AP, AP.predict, lty = 1:2)</p><p>Holt−Winters filtering</p><p>Time</p><p>O</p><p>bs</p><p>er</p><p>ve</p><p>d</p><p>/ F</p><p>itt</p><p>ed</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.12. Holt-Winters fit for air passenger data.</p><p>Time</p><p>1950 1955 1960 1965</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>70</p><p>0</p><p>Fig. 3.13. Holt-Winters forecasts for air passenger data for 1961–1964 shown as</p><p>dotted lines.</p><p>The estimates of the model parameters, which can be obtained from</p><p>AP.hw$alpha, AP.hw$beta, and AP.hw$gamma, are α̂ = 0.274, β̂ = 0.0175,</p><p>and γ̂ = 0.877. It should be noted that the extrapolated forecasts are based</p><p>entirely on the trends in the period during which the model was fitted and</p><p>would be a sensible prediction assuming these trends continue. Whilst the ex-</p><p>64 3 Forecasting Strategies</p><p>trapolation in Figure 3.12 looks visually appropriate, unforeseen events could</p><p>lead to completely different future values than those shown here.</p><p>3.5 Summary of commands used in examples</p><p>nls non-linear least squares fit</p><p>HoltWinters estimates the parameters of the Holt-Winters</p><p>or exponential smoothing model</p><p>predict forecasts future values</p><p>ts.union create the union of two series</p><p>coef extracts the coefficients of a fitted model</p><p>3.6 Exercises</p><p>1. a) Describe the association and calculate the ccf between x and y for k</p><p>equal to 1, 10, and 100.</p><p>> w x y ccf(x, y)</p><p>b) Describe the association between x and y, and calculate the ccf.</p><p>> Time x y</p><p>the sum of n terms of a geometric pro-</p><p>gression tend to a finite sum as n tends to infinity? What is this sum?</p><p>c) Obtain an expression for the sum of the weights in an EWMA if we</p><p>specify a1 = x1 in Equation (3.15).</p><p>d) Suppose xt happens to be a sequence of independent variables with a</p><p>constant mean and a constant variance σ2. 
What is the variance of at</p><p>if we specify a1 = x1 in Equation (3.15)?</p><p>6. Refer to the sweet white wine sales (§3.4.2).</p><p>a) Use the HoltWinters procedure with α, β and γ set to 0.2 and com-</p><p>pare the SS1PE with the minimum obtained with R.</p><p>b) Use the HoltWinters procedure on the logarithms of sales and com-</p><p>pare SS1PE with that obtained using sales.</p><p>66 3 Forecasting Strategies</p><p>c) What is the SS1PE if you predict next month’s sales will equal this</p><p>month’s sales?</p><p>d) This is rather harder: What is the SS1PE if you find the optimum α,</p><p>β and γ from the data available at each time step before making the</p><p>one-step-ahead prediction?</p><p>7. Continue the following exploratory time series analysis using the global</p><p>temperature series from §1.4.5.</p><p>a) Produce a time plot of the data. Plot the aggregated annual mean</p><p>series and a boxplot that summarises the observed values for each</p><p>season, and comment on the plots.</p><p>b) Decompose the series into the components trend, seasonal effect, and</p><p>residuals, and plot the decomposed series. Produce a plot of the trend</p><p>with a superimposed seasonal effect.</p><p>c) Plot the correlogram of the residuals from question 7b. Comment on</p><p>the plot, explaining any ‘significant’ correlations at significant lags.</p><p>d) Fit an appropriate Holt-Winters model to the monthly data. Explain</p><p>why you chose that particular Holt-Winters model, and give the pa-</p><p>rameter estimates.</p><p>e) Using the fitted model, forecast values for the years 2005–2010. Add</p><p>these forecasts to a time plot of the original series. Under what cir-</p><p>cumstances would these forecasts be valid? What comments of cau-</p><p>tion would you make to an economist or politician who wanted to</p><p>use these forecasts to make statements about the potential impact of</p><p>global warming on the world economy?</p><p>8. A cumulative sum plot is useful for monitoring changes in the mean of a</p><p>process. If we have a time series composed of observations xt at times t</p><p>with a target value of τ , the CUSUM chart is a plot of the cumulative</p><p>sums of the deviations from target, cst, against t. The formula for cst at</p><p>time t is</p><p>cst =</p><p>t∑</p><p>i=1</p><p>(xi − τ)</p><p>The R function cumsum calculates a cumulative sum. Plot the CUSUM for</p><p>the motoring organisation complaints with a target of 18.</p><p>9. Using the motor organisation complaints series, refit the exponential</p><p>smoothing model with weights α = 0.01 and α = 0.99. In each case,</p><p>extract the last residual from the fitted model and verify that the last</p><p>residual satisfies Equation (3.19). Redraw Figure 3.8 using the new values</p><p>of α, and comment on the plots, explaining the main differences.</p><p>4</p><p>Basic Stochastic Models</p><p>4.1 Purpose</p><p>So far, we have considered two approaches for modelling time series. The</p><p>first is based on an assumption that there is a fixed seasonal pattern about a</p><p>trend. We can estimate the trend by local averaging of the deseasonalised data,</p><p>and this is implemented by the R function decompose. The second approach</p><p>allows the seasonal variation and trend, described in terms of a level and slope,</p><p>to change over time and estimates these features by exponentially weighted</p><p>averages. 
We used the HoltWinters function to demonstrate this method.</p><p>When we fit mathematical models to time series data, we refer to the dis-</p><p>crepancies between the fitted values, calculated from the model, and the data</p><p>as a residual error series. If our model encapsulates most of the deterministic</p><p>features of the time series, our residual error series should appear to be a re-</p><p>alisation of independent random variables from some probability distribution.</p><p>However, we often find that there is some structure in the residual error series,</p><p>such as consecutive errors being positively correlated, which we can use to im-</p><p>prove our forecasts and make our simulations more realistic. We assume that</p><p>our residual error series is stationary, and in Chapter 6 we introduce models</p><p>for stationary time series.</p><p>Since we judge a model to be a good fit if its residual error series appears</p><p>to be a realisation of independent random variables, it seems natural to build</p><p>models up from a model of independent random variation, known as discrete</p><p>white noise. The name ‘white noise’ was coined in an article on heat radiation</p><p>published in Nature in April 1922, where it was used to refer to series that</p><p>contained all frequencies in equal proportions, analogous to white light. The</p><p>term purely random is sometimes used for white noise series. In §4.3 we define a</p><p>fundamental non-stationary model based on discrete white noise that is called</p><p>the random walk. It is sometimes an adequate model for financial series and is</p><p>often used as a standard against which the performance of more complicated</p><p>models can be assessed.</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 67</p><p>Use R, DOI 10.1007/978-0-387-88698-5 4,</p><p>© Springer Science+Business Media, LLC 2009</p><p>68 4 Basic Stochastic Models</p><p>4.2 White noise</p><p>4.2.1 Introduction</p><p>A residual error is the difference between the observed value and the model</p><p>predicted value at time t. If we suppose the model is defined for the variable</p><p>yt and ŷt is the value predicted by the model, the residual error xt is</p><p>xt = yt − ŷt (4.1)</p><p>As the residual errors occur in time, they form a time series: x1, x2, . . . , xn.</p><p>In Chapter 2, we found that features of the historical series, such as the</p><p>trend or seasonal variation, are reflected in the correlogram. Thus, if a model</p><p>has accounted for all the serial correlation in the data, the residual series would</p><p>be serially uncorrelated, so that a correlogram of the residual series would</p><p>exhibit no obvious patterns. This ideal motivates the following definition.</p><p>4.2.2 Definition</p><p>A time series {wt : t = 1, 2, . . . , n} is discrete white noise (DWN) if the</p><p>variables w1, w2, . . . , wn are independent and identically distributed with a</p><p>mean of zero. This implies that the variables all have the same variance σ2</p><p>and Cor(wi, wj) = 0 for all i 6= j. If, in addition, the variables also follow a</p><p>normal distribution (i.e., wt ∼ N(0, σ2)) the series is called Gaussian white</p><p>noise.</p><p>4.2.3 Simulation in R</p><p>A fitted time series model can be used to simulate data. Time series simulated</p><p>using a model are sometimes called synthetic series to distinguish them from</p><p>an observed historical series.</p><p>Simulation is useful for many reasons. 
For example, simulation can be used to generate plausible future scenarios and to construct confidence intervals for model parameters (sometimes called bootstrapping). In R, simulation is usually straightforward, and most standard statistical distributions are simulated using a function that has an abbreviated name for the distribution prefixed with an 'r' (for 'random').1 For example, rnorm(100) is used to simulate 100 independent standard normal variables, which is equivalent to simulating a Gaussian white noise series of length 100 (Fig. 4.1).

> set.seed(1)
> w <- rnorm(100)
> plot(w, type = "l")

1 Other prefixes are also available to calculate properties for standard distributions; e.g., the prefix 'd' is used to calculate the probability (density) function. See the R help (e.g., ?dnorm) for more details.

Fig. 4.1. Time plot of simulated Gaussian white noise series.

Simulation experiments in R can easily be repeated using the 'up' arrow on the keyboard. For this reason, it is sometimes preferable to put all the commands on one line, separated by ';', or to nest the functions; for example, a plot of a white noise series is given by plot(rnorm(100), type="l").

The function set.seed is used to provide a starting point (or seed) in the simulations, thus ensuring that the simulations can be reproduced. If this function is left out, a different set of simulated data are obtained, although the underlying statistical properties remain unchanged.
To see this, rerun the</p><p>plot above a few times with and without set.seed(1).</p><p>To illustrate by simulation how samples may differ from their underlying</p><p>populations, consider the following histogram of a Gaussian white noise series.</p><p>Type the following to view the plot (which is not shown in the text):</p><p>> x hist(rnorm(100), prob = T); points(x, dnorm(x), type = "l")</p><p>Repetitions of the last command, which can be obtained using the ‘up’ arrow</p><p>on your keyboard, will show a range of different sample distributions that</p><p>arise when the underlying distribution is normal. Distributions that depart</p><p>from the plotted curve have arisen due to sampling variation.</p><p>4.2.4 Second-order properties and the correlogram</p><p>The second-order properties of a white noise series {wt} are an immediate</p><p>consequence of the definition in §4.2.2. However, as they are needed so often</p><p>in the derivation of the second-order properties for more complex models, we</p><p>explicitly state them here:</p><p>70 4 Basic Stochastic Models</p><p>µw = 0</p><p>γk = Cov(wt, wt+k) =</p><p>{</p><p>σ2 if k = 0</p><p>0 if k 6= 0</p><p> (4.2)</p><p>The autocorrelation function follows as</p><p>ρk =</p><p>{</p><p>1 if k = 0</p><p>0 if k 6= 0</p><p>(4.3)</p><p>Simulated white noise data will not have autocorrelations that are exactly</p><p>zero (when k 6= 0) because of sampling variation. In particular, for a simu-</p><p>lated white noise series, it is expected that 5% of the autocorrelations will</p><p>be significantly different from zero at the 5% significance level, shown as dot-</p><p>ted lines on the correlogram. Try repeating the following command to view a</p><p>range of correlograms that could arise from an underlying white noise series.</p><p>A typical plot, with one statistically significant autocorrelation, occurring at</p><p>lag 7, is shown in Figure 4.2.</p><p>> set.seed(2)</p><p>> acf(rnorm(100))</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.2. Correlogram of a simulated white noise series. The underlying autocorre-</p><p>lations are all zero (except at lag 0); the statistically significant value at lag 7 is due</p><p>to sampling variation.</p><p>4.2.5 Fitting a white noise model</p><p>A white noise series usually arises as a residual series after fitting an appropri-</p><p>ate time series model. The correlogram generally provides sufficient evidence,</p><p>4.3 Random walks 71</p><p>provided the series is of a reasonable length, to support the conjecture that</p><p>the residuals are well approximated by white noise.</p><p>The only parameter for a white noise series is the variance σ2, which is</p><p>estimated by the residual variance, adjusted by degrees of freedom, given in</p><p>the computer output of the fitted model. If your analysis begins on data that</p><p>are already approximately white noise, then only σ2 needs to be estimated,</p><p>which is readily achieved using the var function.</p><p>4.3 Random walks</p><p>4.3.1 Introduction</p><p>In Chapter 1, the exchange rate data were examined and found to exhibit</p><p>stochastic trends. A random walk often provides a good fit to data with</p><p>stochastic trends, although even better fits are usually obtained from more</p><p>general model formulations, such as the ARIMA models of Chapter 7.</p><p>4.3.2 Definition</p><p>Let {xt} be a time series. 
Then {xt} is a random walk if

    xt = xt−1 + wt    (4.4)

where {wt} is a white noise series. Substituting xt−1 = xt−2 + wt−1 in Equation (4.4) and then substituting for xt−2, followed by xt−3 and so on (a process known as 'back substitution') gives:

    xt = wt + wt−1 + wt−2 + . . .    (4.5)

In practice, the series above will not be infinite but will start at some time t = 1. Hence,

    xt = w1 + w2 + . . . + wt    (4.6)

Back substitution is used to define more complex time series models and also to derive second-order properties. The procedure occurs so frequently in the study of time series models that the following definition is needed.

4.3.3 The backward shift operator

The backward shift operator B is defined by

    Bxt = xt−1    (4.7)

The backward shift operator is sometimes called the 'lag operator'. By repeatedly applying B, it follows that

    B^n xt = xt−n    (4.8)

Using B, Equation (4.4) can be rewritten as

    xt = Bxt + wt  ⇒  (1 − B)xt = wt  ⇒  xt = (1 − B)^{−1} wt
       ⇒  xt = (1 + B + B² + . . .)wt  ⇒  xt = wt + wt−1 + wt−2 + . . .

and Equation (4.5) is recovered.

4.3.4 Random walk: Second-order properties

The second-order properties of a random walk follow as

    µx = 0,    γk(t) = Cov(xt, xt+k) = tσ²    (4.9)

The covariance is a function of time, so the process is non-stationary. In particular, the variance is tσ² and so it increases without limit as t increases. It follows that a random walk is only suitable for short-term predictions.

The time-varying autocorrelation function for k > 0 follows from Equation (4.9) as

    ρk(t) = Cov(xt, xt+k) / √( Var(xt) Var(xt+k) ) = tσ² / √( tσ² (t + k)σ² ) = 1 / √(1 + k/t)    (4.10)

so that, for large t with k considerably less than t, ρk is nearly 1. Hence, the correlogram for a random walk is characterised by positive autocorrelations that decay very slowly down from unity. This is demonstrated by simulation in §4.3.7.

4.3.5 Derivation of second-order properties*

Equation (4.6) is a finite sum of white noise terms, each with zero mean and variance σ². Hence, the mean of xt is zero (Equation (4.9)). The autocovariance in Equation (4.9) can be derived using Equation (2.15) as follows:

    γk(t) = Cov(xt, xt+k) = Cov( Σ_{i=1}^{t} wi , Σ_{j=1}^{t+k} wj ) = Σ_{i=j} Cov(wi, wj) = tσ²

4.3.6 The difference operator

Differencing adjacent terms of a series can transform a non-stationary series to a stationary series. For example, if the series {xt} is a random walk, it is non-stationary. However, from Equation (4.4), the first-order differences of {xt} produce the stationary white noise series {wt} given by xt − xt−1 = wt. Hence, differencing turns out to be a useful 'filtering' procedure in the study of non-stationary time series. The difference operator ∇ is defined by

    ∇xt = xt − xt−1    (4.11)

Note that ∇xt = (1 − B)xt, so that ∇ can be expressed in terms of the backward shift operator B. In general, higher-order differencing can be expressed as

    ∇^n = (1 − B)^n    (4.12)

The proof of the last result is left to Exercise 7.

4.3.7 Simulation

It is often helpful to study a time series model by simulation.
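For instance, the non-stationary variance in Equation (4.9) can be checked empirically. The following minimal sketch (not part of the book's own code) simulates many independent random walks with σ = 1 and compares the sample variance of x100 across the replicates with the theoretical value tσ² = 100.

# Check Var(x_t) = t * sigma^2 from Equation (4.9): the value of a random walk
# at time 100 is the sum of 100 white noise terms, by Equation (4.6).
set.seed(10)
x.end <- replicate(1000, sum(rnorm(100)))   # x_100 for 1000 independent walks
var(x.end)                                  # should be close to 100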
Simulation enables the main features of the model to be observed in plots, so that when historical data exhibit similar features, the model may be selected as a potential candidate. The following commands can be used to simulate random walk data for x:

> x <- w <- rnorm(1000)
> for (t in 2:1000) x[t] <- x[t - 1] + w[t]
> plot(x, type = "l")

The first command above places a white noise series into w and uses this series to initialise x. The 'for' loop then generates the random walk using Equation (4.4) – the correspondence between the R code above and Equation (4.4) should be noted. The series is plotted and shown in Figure 4.3.2

A correlogram of the series is obtained from acf(x) and is shown in Figure 4.4 – a gradual decay in the correlations is evident in the figure, thus supporting the theoretical results in §4.3.4.

Throughout this book, we will often fit models to data that we have simulated and attempt to recover the underlying model parameters. At first sight, this might seem odd, given that the parameters are used to simulate the data, so that we already know at the outset the values the parameters should take. However, the procedure is useful for a number of reasons. In particular, to be able to simulate data using a model requires that the model formulation be correctly understood. If the model is understood but incorrectly implemented, then the parameter estimates from the fitted model may deviate significantly from the underlying model values used in the simulation. Simulation can therefore help ensure that the model is both correctly understood and correctly implemented.

2 To obtain the same simulation and plot, it is necessary to have run the previous code in §4.2.4 first, which sets the random number seed.

Fig. 4.3. Time plot of a simulated random walk. The series exhibits an increasing trend. However, this is purely stochastic and due to the high serial correlation.

Fig. 4.4. The correlogram for the simulated random walk. A gradual decay from a high serial correlation is a notable feature of a random walk series.

4.4 Fitted models and diagnostic plots

4.4.1 Simulated random walk series

The first-order differences of a random walk are a white noise series, so the correlogram of the series of differences can be used to assess whether a given series is reasonably modelled as a random walk.

> acf(diff(x))

As can be seen in Figure 4.5, there are no obvious patterns in the correlogram, with only a couple of marginally statistically significant values. These significant values can be ignored because they are small in magnitude and about 5% of the values are expected to be statistically significant even when the underlying values are zero (§2.3). Thus, as expected, there is good evidence that the simulated series in x follows a random walk.

Fig. 4.5. Correlogram of differenced series.
If a series follows a random walk, the</p><p>differenced series will be white noise.</p><p>4.4.2 Exchange rate series</p><p>The correlogram of the first-order differences of the exchange rate data from</p><p>§1.4.4 can be obtained from acf(diff(Z.ts)) and is shown in Figure 4.6.</p><p>A significant value occurs at lag 1, suggesting that a more complex model</p><p>may be needed, although the lack of any other significant values in the cor-</p><p>relogram does suggest that the random walk provides a good approximation</p><p>for the series (Fig. 4.6). An additional term can be added to the random</p><p>walk model using the Holt-Winters procedure, allowing the parameter β to</p><p>be non-zero but still forcing the seasonal term γ to be zero:</p><p>> Z.hw acf(resid(Z.hw))</p><p>Figure 4.7 shows the correlogram of the residuals from the fitted Holt-</p><p>Winters model. This correlogram is more consistent with a hypothesis that</p><p>the residual series is white noise (Fig. 4.7). Using Equation (3.21), with the</p><p>parameter estimates obtained from Z.hw$alpha and Z.hw$beta, the fitted</p><p>model can be expressed as</p><p>76 4 Basic Stochastic Models</p><p>0 1 2 3</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.6. Correlogram of first-order differences of the exchange rate series (UK</p><p>pounds to NZ dollars, 1991–2000). The significant value at lag 1 indicates that an</p><p>extension of the random walk model is needed for this series.</p><p>0 1 2 3</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.7. The correlogram of the residuals from the fitted Holt-Winters model for the</p><p>exchange rate series (UK pounds to NZ dollars, 1991–2000). There are no significant</p><p>correlations in the residual series, so the model provides a reasonable approximation</p><p>to the exchange rate data.</p><p>xt = xt−1 + bt−1 + wt</p><p>bt−1 = 0.167(xt−1 − xt−2) + 0.833bt−2</p><p>}</p><p>(4.13)</p><p>where {wt} is white noise with zero mean.</p><p>4.4 Fitted models and diagnostic plots 77</p><p>After some algebra, Equations (4.13) can be expressed as one equation</p><p>in terms of the backward shift operator:</p><p>(1− 0.167B + 0.167B2)(1−B)xt = wt (4.14)</p><p>Equation (4.14) is a special case – the integrated autoregressive model –</p><p>within the important class of models known as ARIMA models (Chap-</p><p>ter 7). The proof of Equation (4.14) is left to Exercise 8.</p><p>4.4.3 Random walk with drift</p><p>Company stockholders generally expect their investment to increase in value</p><p>despite the volatility of financial markets. The random walk model can be</p><p>adapted to allow for this by including a drift parameter δ.</p><p>xt = xt−1 + δ + wt</p><p>Closing prices (US dollars) for Hewlett-Packard Company stock for 672</p><p>trading days up to June 7, 2007 are read into R and plotted (see the code</p><p>below and Fig. 4.8). The lag 1 differences are calculated using diff() and</p><p>plotted in Figure 4.9. The correlogram of the differences is in Figure 4.10, and</p><p>they appear to be well modelled as white noise. The mean of the differences is</p><p>0.0399, and this is our estimate of the drift parameter. The standard deviation</p><p>of the 671 differences is 0.460, and an approximate 95% confidence interval</p><p>for the drift parameter is [0.004, 0.075]. 
4.5 Autoregressive models

4.5.1 Definition

The series {xt} is an autoregressive process of order p, abbreviated to AR(p), if

xt = α1xt−1 + α2xt−2 + . . . + αpxt−p + wt    (4.15)

where {wt} is white noise and the αi are the model parameters with αp ≠ 0 for an order p process. Equation (4.15) can be expressed as a polynomial of order p in terms of the backward shift operator:

θp(B)xt = (1 − α1B − α2B^2 − . . . − αpB^p)xt = wt    (4.16)

The following points should be noted:

(a) The random walk is the special case AR(1) with α1 = 1 (see Equation (4.4)).
(b) The exponential smoothing model is the special case αi = α(1 − α)^i for i = 1, 2, . . . and p → ∞.
(c) The model is a regression of xt on past terms from the same series; hence the use of the term 'autoregressive'.
(d) A prediction at time t is given by x̂t = α1xt−1 + α2xt−2 + . . . + αpxt−p    (4.17)
(e) The model parameters can be estimated by minimising the sum of squared errors.

4.5.2 Stationary and non-stationary AR processes

The equation θp(B) = 0, where B is formally treated as a number (real or complex), is called the characteristic equation. The roots of the characteristic equation (i.e., of the polynomial θp(B) from Equation (4.16)) must all exceed unity in absolute value for the process to be stationary. Notice that the random walk has θ = 1 − B with root B = 1 and is non-stationary. The following four examples illustrate the procedure for determining whether an AR process is stationary or non-stationary:

1. The AR(1) model xt = (1/2)xt−1 + wt is stationary because the root of 1 − (1/2)B = 0 is B = 2, which is greater than 1.
2. The AR(2) model xt = xt−1 − (1/4)xt−2 + wt is stationary. The proof of this result is obtained by first expressing the model in terms of the backward shift operator: (1/4)(B^2 − 4B + 4)xt = wt; i.e., (1/4)(B − 2)^2 xt = wt. The roots of the polynomial are given by solving θ(B) = (1/4)(B − 2)^2 = 0 and are therefore obtained as B = 2. As the roots are greater than unity, this AR(2) model is stationary.
3. The model xt = (1/2)xt−1 + (1/2)xt−2 + wt is non-stationary because one of the roots is unity. To prove this, first express the model in terms of the backward shift operator: −(1/2)(B^2 + B − 2)xt = wt; i.e., −(1/2)(B − 1)(B + 2)xt = wt. The polynomial θ(B) = −(1/2)(B − 1)(B + 2) has roots B = 1, −2. As there is a unit root (B = 1), the model is non-stationary. Note that the other root (B = −2) exceeds unity in absolute value, so only the presence of the unit root makes this process non-stationary.
4. The AR(2) model xt = −(1/4)xt−2 + wt is stationary because the roots of 1 + (1/4)B^2 = 0 are B = ±2i, which are complex numbers with i = √−1, each having an absolute value of 2 exceeding unity.

The R function polyroot finds zeros of polynomials and can be used to find the roots of the characteristic equation to check for stationarity.
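For instance, the examples above can be reproduced numerically by passing the coefficients of θ(B), in increasing powers of B, to polyroot and examining the moduli of the roots. The lines below are a supplementary sketch rather than part of the original text; the expected moduli follow from the hand calculations above.

> # Supplementary sketch: moduli of the characteristic roots for examples 2-4.
> Mod(polyroot(c(1, -1, 0.25)))    # example 2: both roots have modulus 2 (stationary)
> Mod(polyroot(c(1, -0.5, -0.5)))  # example 3: moduli 1 and 2 (unit root, non-stationary)
> Mod(polyroot(c(1, 0, 0.25)))     # example 4: complex roots, each of modulus 2 (stationary)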
4.5.3 Second-order properties of an AR(1) model

From Equation (4.15), the AR(1) process is given by

xt = αxt−1 + wt    (4.18)

where {wt} is a white noise series with mean zero and variance σ^2. It can be shown (§4.5.4) that the second-order properties follow as

µx = 0,    γk = α^k σ^2/(1 − α^2)    (4.19)

4.5.4 Derivation of second-order properties for an AR(1) process*

Using B, a stable AR(1) process (|α| < 1) can be written (1 − αB)xt = wt, so that

xt = (1 − αB)^(−1) wt = wt + αwt−1 + α^2 wt−2 + . . .

Taking expectations gives µx = E(xt) = 0, and the autocovariance function follows as

γk = Cov(xt, xt+k) = σ^2 (α^k + α^(k+2) + α^(k+4) + . . .) = α^k σ^2/(1 − α^2)

which is Equation (4.19).

4.5.5 Correlogram of an AR(1) process

From Equation (4.19), the autocorrelation function follows as

ρk = α^k    (k ≥ 0)    (4.21)

where |α| < 1. The correlogram of an AR(1) process can therefore be plotted for given values of α, for example:

> rho <- function(k, alpha) alpha^k
> layout(1:2)
> plot(0:10, rho(0:10, 0.7), type = "b")
> plot(0:10, rho(0:10, -0.7), type = "b")

Fig. 4.11. Example correlograms for two autoregressive models: (a) xt = 0.7xt−1 + wt; (b) xt = −0.7xt−1 + wt.

Try experimenting using other values for α. For example, use a small value of α to observe a more rapid decay to zero in the correlogram.

4.5.6 Partial autocorrelation

From Equation (4.21), the autocorrelations are non-zero for all lags even though in the underlying model xt only depends on the previous value xt−1 (Equation (4.18)). The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. For example, the partial autocorrelation of an AR(1) process will be zero for all lags greater than 1. In general, the partial autocorrelation at lag k is the kth coefficient of a fitted AR(k) model; if the underlying process is AR(p), then the coefficients αk will be zero for all k > p. Thus, an AR(p) process has a correlogram of partial autocorrelations that is zero after lag p. Hence, a plot of the estimated partial autocorrelations can be useful when determining the order of a suitable AR process for a time series. In R, the function pacf can be used to calculate the partial autocorrelations of a time series and produce a plot of the partial autocorrelations against lag (the 'partial correlogram').

4.5.7 Simulation

An AR(1) process can be simulated in R as follows:

> set.seed(1)
> x <- w <- rnorm(100)
> for (t in 2:100) x[t] <- 0.7 * x[t - 1] + w[t]
> plot(x, type = "l")
> acf(x)
> pacf(x)

The resulting plots of the simulated data are shown in Figure 4.12 and give one possible realisation of the model. The partial correlogram has no significant correlations except the value at lag 1, as expected (Fig. 4.12c – note that the pacf starts at lag 1, whilst the acf starts at lag 0).
The difference between the correlogram of the underlying model (Fig. 4.11a) and the sample correlogram of the simulated series (Fig. 4.12b) shows discrepancies that have arisen due to sampling variation. Try repeating the commands above several times to obtain a range of possible sample correlograms for an AR(1) process with underlying parameter α = 0.7. You are asked to investigate an AR(2) process in Exercise 4.

Fig. 4.12. A simulated AR(1) process, xt = 0.7xt−1 + wt: (a) time plot; (b) correlogram; (c) partial correlogram. Note that in the partial correlogram (c) only the first lag is significant, which is usually the case when the underlying process is AR(1).

4.6 Fitted models

4.6.1 Model fitted to simulated series

An AR(p) model can be fitted to data in R using the ar function. In the code below, the autoregressive model x.ar is fitted to the simulated series of the last section and an approximate 95% confidence interval for the underlying parameter is given, where the (asymptotic) variance of the parameter estimate is extracted using x.ar$asy.var:

> x.ar <- ar(x, method = "mle")
> x.ar$order
[1] 1
> x.ar$ar
[1] 0.601
> x.ar$ar + c(-2, 2) * sqrt(x.ar$asy.var)
[1] 0.4404 0.7615

The method "mle" used in the fitting procedure above is based on maximising the likelihood function (the probability of obtaining the data given the model) with respect to the unknown parameters. The order p of the process is chosen using the Akaike Information Criterion (AIC; Akaike, 1974), which penalises models with too many parameters:

AIC = −2 × log-likelihood + 2 × number of parameters    (4.22)

In the function ar, the model with the smallest AIC is selected as the best-fitting AR model. Note that, in the code above, the correct order (p = 1) of the underlying process is recovered. The parameter estimate for the fitted AR(1) model is α̂ = 0.60. Whilst this is smaller than the underlying model value of α = 0.7, the approximate 95% confidence interval does contain the value of the model parameter as expected, giving us no reason to doubt the implementation of the model.
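The AIC values that ar compares can also be inspected directly. The listing below is a supplementary sketch rather than part of the original example: the aic component of the fitted object holds, for each candidate order, the difference in AIC from the selected model, so the chosen order (here 1) appears with the value 0. The exact values depend on the simulated series.

> # Supplementary sketch: AIC differences for candidate orders 0, 1, 2, ...;
> # the selected order has a difference of 0.
> x.ar$aic[1:6]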
4.6.2 Exchange rate series: Fitted AR model

An AR(1) model is fitted to the exchange rate series, and the upper bound of the confidence interval for the parameter includes 1. This indicates that there would not be sufficient evidence to reject the hypothesis α = 1, which is consistent with the earlier conclusion that a random walk provides a good approximation for this series. However, simulated data from models with values of α > 1, formally included in the confidence interval below, exhibit exponentially unstable behaviour and are not credible models for the New Zealand exchange rate.

> Z.ar <- ar(Z.ts)
> mean(Z.ts)
[1] 2.823
> Z.ar$order
[1] 1
> Z.ar$ar
[1] 0.8903
> Z.ar$ar + c(-2, 2) * sqrt(Z.ar$asy.var)
[1] 0.7405 1.0400
> acf(Z.ar$res[-1])

In the code above, a "−1" is used in the vector of residuals to remove the first item from the residual series (Fig. 4.13). (For a fitted AR(1) model, the first item has no predicted value because there is no observation at t = 0; in general, the first p values will be 'not available' (NA) in the residual series of a fitted AR(p) model.)

By default, the mean is subtracted before the parameters are estimated, so a predicted value ẑt at time t based on the output above is given by

ẑt = 2.8 + 0.89(zt−1 − 2.8)    (4.23)

Fig. 4.13. The correlogram of residual series for the AR(1) model fitted to the exchange rate data.

4.6.3 Global temperature series: Fitted AR model

The global temperature series was introduced in §1.4.5, where it was apparent that the data exhibited an increasing trend after 1970, which may be due to the 'greenhouse effect'. Sceptics may claim that the apparent increasing trend can be dismissed as a transient stochastic phenomenon. For their claim to be consistent with the time series data, it should be possible to model the trend without the use of deterministic functions.

Consider the following AR model fitted to the mean annual temperature series:

> www = "http://www.massey.ac.nz/~pscowper/ts/global.dat"
> Global = scan(www)
> Global.ts = ts(Global, st = c(1856, 1), end = c(2005, 12), fr = 12)
> Global.ar <- ar(aggregate(Global.ts, FUN = mean), method = "mle")
> mean(aggregate(Global.ts, FUN = mean))
[1] -0.1383
> Global.ar$order
[1] 4
> Global.ar$ar
[1] 0.58762 0.01260 0.11117 0.26764
> acf(Global.ar$res[-(1:Global.ar$order)], lag = 50)

Fig. 4.14. The correlogram of the residual series for the AR(4) model fitted to the annual global temperature series. The correlogram is approximately white noise so that, in the absence of further information, a simple stochastic model can 'explain' the correlation and trends in the series.

Based on the output above, a predicted mean annual temperature x̂t at time t is given by

x̂t = −0.14 + 0.59(xt−1 + 0.14) + 0.013(xt−2 + 0.14) + 0.11(xt−3 + 0.14) + 0.27(xt−4 + 0.14)    (4.24)

The correlogram of the residuals has only one (marginally) significant value at lag 27, so the underlying residual series could be white noise (Fig. 4.14). Thus the fitted AR(4) model (Equation (4.24)) provides a good fit to the data. As the AR model has no deterministic trend component, the trends in the data can be explained by serial correlation and random variation, implying that it is possible that these trends are stochastic (or could arise from a purely stochastic process).
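One way to appreciate this point is to simulate from the fitted AR(4) coefficients and note that the simulated series, which by construction contain no deterministic trend, still wander well above or below their mean for long stretches. The code below is a supplementary sketch rather than part of the original analysis; the innovation standard deviation passed to arima.sim is an arbitrary choice made only for display, and the seed is likewise arbitrary.

> # Supplementary sketch: realisations from the fitted AR(4) coefficients can
> # show trend-like excursions even though the process has a constant mean.
> set.seed(3)
> layout(1:3)
> for (i in 1:3) plot(arima.sim(n = 150, list(ar = Global.ar$ar), sd = 0.1),
      ylab = "simulated series")

Trend-like runs in plots of this kind come and go purely by chance.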
Again we emphasise that this does not imply that there is</p><p>no underlying reason for the trends. If a valid scientific explanation is known,</p><p>such as a link with the increased use of fossil fuels, then this information would</p><p>clearly need to be included in any future forecasts of the series.</p><p>4.7 Summary of R commands</p><p>set.seed sets a seed for the random number generator</p><p>enabling a simulation to be reproduced</p><p>rnorm simulates Gaussian white noise series</p><p>diff creates a series of first-order differences</p><p>ar gets the best fitting AR(p) model</p><p>pacf extracts partial autocorrelations</p><p>and partial correlogram</p><p>polyroot extracts the roots of a polynomial</p><p>resid extracts the residuals from a fitted model</p><p>4.8 Exercises</p><p>1. Simulate discrete white noise from an exponential distribution and plot the</p><p>histogram and the correlogram. For example, you can use the R command</p><p>w</p><p>5,</p><p>© Springer Science+Business Media, LLC 2009</p><p>92 5 Regression</p><p>to erroneously high statistical significance being attributed to statistical tests</p><p>in standard computer output (the p values will be smaller than they should</p><p>be). Presenting correct statistical evidence is important. For example, an en-</p><p>vironmental protection group could be undermined by allegations that it is</p><p>falsely claiming statistically significant trends. In this chapter, generalised</p><p>least squares is used to obtain improved estimates of the standard error to</p><p>account for autocorrelation in the residual series.</p><p>5.2 Linear models</p><p>5.2.1 Definition</p><p>A model for a time series {xt : t = 1, . . . n} is linear if it can be expressed as</p><p>xt = α0 + α1u1,t + α2u2,t + . . .+ αmum,t + zt (5.1)</p><p>where ui,t is the value of the ith predictor (or explanatory) variable at time</p><p>t (i = 1, . . . ,m; t = 1, . . . , n), zt is the error at time t, and α0, α1, . . . , αm</p><p>are model parameters, which can be estimated by least squares. Note that the</p><p>errors form a time series {zt}, with mean 0, that does not have to be Gaussian</p><p>or white noise. An example of a linear model is the pth-order polynomial</p><p>function of t:</p><p>xt = α0 + α1t+ α2t</p><p>2 . . .+ αpt</p><p>p + zt (5.2)</p><p>The predictor variables can be written ui,t = ti (i = 1, . . . , p). The term</p><p>‘linear’ is a reference to the summation of model parameters, each multiplied</p><p>by a single predictor variable.</p><p>A simple special case of a linear model is the straight-line model obtained</p><p>by putting p = 1 in Equation (5.2): xt = α0 +α1t+ zt. In this case, the value</p><p>of the line at time t is the trend mt. For the more general polynomial, the</p><p>trend at time t is the value of the underlying polynomial evaluated at t, so in</p><p>Equation (5.2) the trend is mt = α0 + α1t+ α2t</p><p>2 . . .+ αpt</p><p>p.</p><p>Many non-linear models can be transformed to linear models. For example,</p><p>the model xt = eα0+α1t+zt for the series {xt} can be transformed by taking</p><p>natural logarithms to obtain a linear model for the series {yt}:</p><p>yt = log xt = α0 + α1t+ zt (5.3)</p><p>In Equation (5.3), standard least squares regression could then be used to fit</p><p>a linear model (i.e., estimate the parameters α0 and α1) and make predictions</p><p>for yt. To make predictions for xt, the inverse transform needs to be applied</p><p>to yt, which in this example is exp(yt). 
However, this usually has the effect</p><p>of biasing the forecasts of mean values, and we discuss correction factors in</p><p>§5.10.</p><p>Natural processes that generate time series are not expected to be precisely</p><p>linear, but linear approximations are often adequate. However, we are not</p><p>5.2 Linear models 93</p><p>restricted to linear models, and the Bass model (§3.3) is an example of a non-</p><p>linear model, which we fitted using the non-linear least squares function nls.</p><p>5.2.2 Stationarity</p><p>Linear models for time series are non-stationary when they include functions</p><p>of time. Differencing can often transform a non-stationary series with a de-</p><p>terministic trend to a stationary series. For example, if the time series {xt} is</p><p>given by the straight-line function plus white noise xt = α0 + α1t + zt, then</p><p>the first-order differences are given by</p><p>∇xt = xt − xt−1 = zt − zt−1 + α1 (5.4)</p><p>Assuming the error series {zt} is stationary, the series {∇xt} is stationary</p><p>as it is not a function of t. In §4.3.6 we found that first-order differencing</p><p>can transform a non-stationary series with a stochastic trend (the random</p><p>walk) to a stationary series. Thus, differencing can remove both stochastic and</p><p>deterministic trends from time series. If the underlying trend is a polynomial</p><p>of order m, then mth-order differencing is required to remove the trend.</p><p>Notice that differencing the straight-line function plus white noise leads to</p><p>a different stationary time series than subtracting the trend. The latter gives</p><p>white noise, whereas differencing gives a series of consecutive white noise terms</p><p>(which is an example of an MA process, described in Chapter 6).</p><p>5.2.3 Simulation</p><p>In time series regression, it is common for the error series {zt} in Equation</p><p>(5.1) to be autocorrelated. In the code below a time series with an increas-</p><p>ing straight-line trend (50 + 3t) with autocorrelated errors is simulated and</p><p>plotted:</p><p>> set.seed(1)</p><p>> z for (t in 2:100) z[t] Time x plot(x, xlab = "time", type = "l")</p><p>The model for the code above can be expressed as xt = 50 + 3t + zt, where</p><p>{zt} is the AR(1) process zt = 0.8zt−1 +wt and {wt} is Gaussian white noise</p><p>with σ = 20. A time plot of a realisation of {xt} is given in Figure 5.1.</p><p>94 5 Regression</p><p>0 20 40 60 80 100</p><p>10</p><p>0</p><p>20</p><p>0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>time</p><p>x</p><p>Fig. 5.1. Time plot of a simulated time series with a straight-line trend and AR(1)</p><p>residual errors.</p><p>5.3 Fitted models</p><p>5.3.1 Model fitted to simulated data</p><p>Linear models are usually fitted by minimising the sum of squared errors,∑</p><p>z2</p><p>t =</p><p>∑</p><p>(xt−α0−α1u1,t− . . .−αmum,t)2, which is achieved in R using the</p><p>function lm:</p><p>> x.lm coef(x.lm)</p><p>(Intercept) Time</p><p>58.55 3.06</p><p>> sqrt(diag(vcov(x.lm)))</p><p>(Intercept) Time</p><p>4.8801 0.0839</p><p>In the code above, the estimated parameters of the linear model are extracted</p><p>using coef. Note that, as expected, the estimates are close to the underlying</p><p>parameter values of 50 for the intercept and 3 for the slope. The standard</p><p>errors are extracted using the square root of the diagonal elements obtained</p><p>from vcov, although these standard errors are likely to be underestimated</p><p>because of autocorrelation in the residuals. 
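The extent of this underestimation can be seen empirically. The code below is a supplementary sketch rather than part of the original text: it repeats the simulation of §5.2.3 a large number of times, refits the straight line by ordinary least squares each time, and compares the spread of the slope estimates with the average standard error reported by lm. With positively autocorrelated errors the reported standard errors are typically substantially smaller than the actual variability of the estimates. The object names with a .sim suffix are introduced here so as not to overwrite the series simulated above.

> # Supplementary sketch: Monte Carlo comparison of the true variability of
> # the OLS slope estimate with the nominal standard error reported by lm.
> set.seed(1)
> slope.est <- nominal.se <- rep(0, 500)
> Time.sim <- 1:100
> for (i in 1:500) {
      z.sim <- w.sim <- rnorm(100, sd = 20)
      for (t in 2:100) z.sim[t] <- 0.8 * z.sim[t - 1] + w.sim[t]
      x.sim <- 50 + 3 * Time.sim + z.sim
      fit <- lm(x.sim ~ Time.sim)
      slope.est[i] <- coef(fit)[2]
      nominal.se[i] <- sqrt(diag(vcov(fit)))[2]
  }
> sd(slope.est)      # empirical standard deviation of the slope estimates
> mean(nominal.se)   # average standard error reported by lm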
The function summary can also be</p><p>used to obtain this information but tends to give additional information, for</p><p>example t-tests, which may be incorrect for a time series regression analysis</p><p>due to autocorrelation in the residuals.</p><p>After fitting a regression model, we should consider various diagnostic</p><p>plots. In the case of time series regression, an important diagnostic plot is the</p><p>correlogram of the residuals:</p><p>5.3 Fitted models 95</p><p>> acf(resid(x.lm))</p><p>> pacf(resid(x.lm))</p><p>As expected, the residual time series is autocorrelated (Fig. 5.2). In Figure</p><p>5.3, only the lag 1 partial autocorrelation is significant, which suggests that</p><p>the residual series follows an AR(1) process. Again this should be as expected,</p><p>given that an AR(1) process was used to simulate these residuals.</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.2. Residual correlogram for the fitted straight-line model.</p><p>5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.3. Residual partial correlogram for the fitted straight-line model.</p><p>5.3.2 Model fitted to the temperature series (1970–2005)</p><p>In §1.4.5, we extracted temperatures for the period 1970–2005. The follow-</p><p>ing regression model is fitted to the global temperature over this period,</p><p>96 5 Regression</p><p>and approximate 95% confidence intervals are given for the parameters us-</p><p>ing confint. The explanatory variable is the time, so the function time is</p><p>used to extract the ‘times’ from the ts temperature object.</p><p>> www Global Global.ts temp temp.lm coef(temp.lm)</p><p>(Intercept) time(temp)</p><p>-34.9204 0.0177</p><p>> confint(temp.lm)</p><p>2.5 % 97.5 %</p><p>(Intercept) -37.2100 -32.6308</p><p>time(temp) 0.0165 0.0188</p><p>> acf(resid(lm(temp ~ time(temp))))</p><p>The confidence interval for the slope does not contain zero, which would pro-</p><p>vide statistical evidence of an increasing trend in global temperatures if the</p><p>autocorrelation in the residuals is negligible. However, the residual series is</p><p>positively autocorrelated at shorter lags (Fig. 5.4), leading to an underesti-</p><p>mate of the standard error and too narrow a confidence interval for the slope.</p><p>Intuitively, the positive correlation between consecutive values reduces the</p><p>effective record length because similar values will tend to occur together. The</p><p>following section illustrates</p><p>the reasoning behind this but may be omitted,</p><p>without loss of continuity, by readers who do not require the mathematical</p><p>details.</p><p>5.3.3 Autocorrelation and the estimation of sample statistics*</p><p>To illustrate the effect of autocorrelation in estimation, the sample mean will</p><p>be used, as it is straightforward to analyse and is used in the calculation of</p><p>other statistical properties.</p><p>Suppose {xt : t = 1, . . . , n} is a time series of independent random variables</p><p>with mean E(xt) = µ and variance Var(xt) = σ2. Then it is well known in</p><p>the study of random samples that the sample mean x̄ =</p><p>∑n</p><p>t=1 xt/n has mean</p><p>E(x̄) = µ and variance Var(x̄) = σ2/n (or standard error σ/</p><p>√</p><p>n). Now let</p><p>{xt : t = 1, . . . 
, n} be a stationary time series with E(xt) = µ, Var(xt) = σ2,</p><p>and autocorrelation function Cor(xt, xt+k) = ρk. Then the variance of the</p><p>sample mean is given by</p><p>Var (x̄) =</p><p>σ2</p><p>n</p><p>[</p><p>1 + 2</p><p>n−1∑</p><p>k=1</p><p>(1− k/n)ρk</p><p>]</p><p>(5.5)</p><p>5.3 Fitted models 97</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.4. Residual correlogram for the regression model fitted to the global temper-</p><p>ature series (1970–2005).</p><p>In Equation (5.5) the variance σ2/n for an independent random sam-</p><p>ple arises as the special case where ρk = 0 for all k > 0. If ρk > 0, then</p><p>Var(x̄) > σ2/n and the resulting estimate of µ is less accurate than that ob-</p><p>tained from a random (independent) sample of the same size. Conversely, if</p><p>ρk library(nlme)</p><p>> x.gls coef(x.gls)</p><p>(Intercept) Time</p><p>58.23 3.04</p><p>> sqrt(diag(vcov(x.gls)))</p><p>(Intercept) Time</p><p>11.925 0.202</p><p>A lag 1 autocorrelation of 0.8 is used above because this value was used to</p><p>simulate the data (§5.2.3). For historical series, the lag 1 autocorrelation would</p><p>need to be estimated from the correlogram of the residuals of a fitted linear</p><p>model; i.e., a linear model should first be fitted by ordinary least squares</p><p>(OLS) and the lag 1 autocorrelation read off from a correlogram plot of the</p><p>residuals of the fitted model.</p><p>In the example above, the standard errors of the parameters are consid-</p><p>erably greater than those obtained from OLS using lm (§5.3) and are more</p><p>accurate as they take the autocorrelation into account. The parameter esti-</p><p>mates from GLS will generally be slightly different from those obtained with</p><p>OLS, because of the weighting. For example, the slope is estimated as 3.06</p><p>using lm but 3.04 using gls. In principle, the GLS estimators are preferable</p><p>because they have smaller standard errors.</p><p>5.5 Linear models with seasonal variables 99</p><p>5.4.2 Confidence interval for the trend in the temperature series</p><p>To calculate an approximate 95% confidence interval for the trend in the global</p><p>temperature series (1970–2005), GLS is used to estimate the standard error</p><p>accounting for the autocorrelation in the residual series (Fig. 5.4). In the gls</p><p>function, the residual series is approximated as an AR(1) process with a lag</p><p>1 autocorrelation of 0.7 read from Figure 5.4, which is used as a parameter in</p><p>the gls function:</p><p>> temp.gls confint(temp.gls)</p><p>2.5 % 97.5 %</p><p>(Intercept) -39.8057 -28.4966</p><p>time(temp) 0.0144 0.0201</p><p>Although the confidence intervals above are now wider than they were in §5.3,</p><p>zero is not contained in the intervals, which implies that the estimates are</p><p>statistically significant, and, in particular, that the trend is significant. Thus,</p><p>there is statistical evidence of an increasing trend in global temperatures over</p><p>the period 1970–2005, so that, if current conditions persist, temperatures may</p><p>be expected to continue to rise in the future.</p><p>5.5 Linear models with seasonal variables</p><p>5.5.1 Introduction</p><p>As time series are observations measured sequentially in time, seasonal effects</p><p>are often present in the data, especially annual cycles caused directly or indi-</p><p>rectly by the Earth’s movement around the Sun. 
Seasonal effects have already</p><p>been observed in several of the series we have looked at, including the airline</p><p>series (§1.4.1), the temperature series (§1.4.5), and the electricity production</p><p>series (§1.4.3). In this section, linear regression models with predictor variables</p><p>for seasonal effects are considered.</p><p>5.5.2 Additive seasonal indicator variables</p><p>Suppose a time series contains s seasons. For example, with time series mea-</p><p>sured over each calendar month, s = 12, whereas for series measured over</p><p>six-month intervals, corresponding to summer and winter, s = 2. A seasonal</p><p>indicator model for a time series {xt : t = 1, . . . , n} containing s seasons and</p><p>a trend mt is given by</p><p>xt = mt + st + zt (5.6)</p><p>where st = βi when t falls in the ith season (t = 1, . . . , n; i = 1, . . . , s) and</p><p>{zt} is the residual error series, which may be autocorrelated. This model</p><p>100 5 Regression</p><p>takes the same form as the additive decomposition model (Equation (1.2))</p><p>but differs in that the trend is formulated with parameters. In Equation (5.6),</p><p>mt does not have a constant term (referred to as the intercept), i.e., mt could</p><p>be a polynomial of order p with parameters α1, . . . , αp. Equation (5.6) is then</p><p>equivalent to a polynomial trend in which the constant term depends on the</p><p>season, so that the s seasonal parameters (β1, . . . , βs) correspond to s possible</p><p>constant terms in Equation (5.2). Equation (5.6) can therefore be written as</p><p>xt = mt + β1+(t−1) mod s + zt (5.7)</p><p>For example, with a time series {xt} observed for each calendar month</p><p>beginning with t = 1 at January, a seasonal indicator model with a straight-</p><p>line trend is given by</p><p>xt = α1t+ st + zt =</p><p></p><p>α1t+ β1 + zt t = 1, 13, . . .</p><p>α1t+ β2 + zt t = 2, 14, . . .</p><p>...</p><p>α1t+ β12 + zt t = 12, 24, . . .</p><p>(5.8)</p><p>The parameters for the model in Equation (5.8) can be estimated by OLS</p><p>or GLS by treating the seasonal term st as a ‘factor’. In R, the factor function</p><p>can be applied to seasonal indices extracted using the function</p><p>cycle (§1.4.1).</p><p>5.5.3 Example: Seasonal model for the temperature series</p><p>The parameters of a straight-line trend with additive seasonal indices can be</p><p>estimated for the temperature series (1970–2005) as follows:</p><p>> Seas Time temp.lm coef(temp.lm)</p><p>Time factor(Seas)1 factor(Seas)2 factor(Seas)3</p><p>0.0177 -34.9973 -34.9880 -35.0100</p><p>factor(Seas)4 factor(Seas)5 factor(Seas)6 factor(Seas)7</p><p>-35.0123 -35.0337 -35.0251 -35.0269</p><p>factor(Seas)8 factor(Seas)9 factor(Seas)10 factor(Seas)11</p><p>-35.0248 -35.0383 -35.0525 -35.0656</p><p>factor(Seas)12</p><p>-35.0487</p><p>A zero is used within the formula to ensure that the model does not have an</p><p>intercept. If the intercept is included in the formula, one of the seasonal terms</p><p>will be dropped and an estimate for the intercept will appear in the output.</p><p>However, the fitted models, with or without an intercept, would be equivalent,</p><p>as can be easily verified by rerunning the algorithm above without the zero in</p><p>5.6 Harmonic seasonal models 101</p><p>the formula. 
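As a quick supplementary check (this snippet is not part of the original example), the model can be refitted with an intercept and the two sets of fitted values compared. They agree to within rounding error because one seasonal level is absorbed into the intercept, with the remaining seasonal coefficients becoming differences from it.

> # Supplementary sketch: the parameterisation with an intercept gives the
> # same fitted values as the model without one.
> temp.lm.int <- lm(temp ~ Time + factor(Seas))
> max(abs(fitted(temp.lm) - fitted(temp.lm.int)))   # effectively zero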
The parameters can also be estimated by GLS by replacing lm</p><p>with gls in the code above.</p><p>Using the above fitted model, a two-year-ahead future prediction for the</p><p>temperature series is obtained as follows:</p><p>> new.t alpha beta (alpha * new.t + beta)[1:4]</p><p>factor(Seas)1 factor(Seas)2 factor(Seas)3 factor(Seas)4</p><p>0.524 0.535 0.514 0.514</p><p>Alternatively, the predict function can be used to make forecasts provided</p><p>the new data are correctly labelled within a data.frame:</p><p>> new.dat predict(temp.lm, new.dat)[1:24]</p><p>1 2 3 4 5 6 7 8 9 10 11 12</p><p>0.524 0.535 0.514 0.514 0.494 0.504 0.503 0.507 0.495 0.482 0.471 0.489</p><p>13 14 15 16 17 18 19 20 21 22 23 24</p><p>0.542 0.553 0.532 0.531 0.511 0.521 0.521 0.525 0.513 0.500 0.488 0.507</p><p>5.6 Harmonic seasonal models</p><p>In the previous section, one parameter estimate is used per season. However,</p><p>seasonal effects often vary smoothly over the seasons, so that it may be more</p><p>parameter-efficient to use a smooth function instead of separate indices.</p><p>Sine and cosine functions can be used to build smooth variation into a</p><p>seasonal model. A sine wave with frequency f (cycles per sampling interval),</p><p>amplitude A, and phase shift φ can be expressed as</p><p>A sin(2πft+ φ) = αs sin(2πft) + αc cos(2πft) (5.9)</p><p>where αs = A cos(φ) and αc = A sin(φ). The expression on the right-hand</p><p>side of Equation (5.9) is linear in the parameters αs and αc, whilst the left-</p><p>hand side is non-linear because the parameter φ is within the sine function.</p><p>Hence, the expression on the right-hand side is preferred in the formulation</p><p>of a seasonal regression model, so that OLS can be used to estimate the</p><p>parameters. For a time series {xt} with s seasons there are [s/2] possible</p><p>cycles.1 The harmonic seasonal model is defined by</p><p>1 The notation [ ] represents the integer part of the expression within. In most</p><p>practical cases, s is even and so [ ] can be omitted. However, for some ‘seasons’,</p><p>s may be an odd number, making the notation necessary. For example, if the</p><p>‘seasons’ are the days of the week, there would be [7/2] = 3 possible cycles.</p><p>102 5 Regression</p><p>xt = mt +</p><p>[s/2]∑</p><p>i=1</p><p>{</p><p>si sin(2πit/s) + ci cos(2πit/s)</p><p>}</p><p>+ zt (5.10)</p><p>wheremt is the trend which includes a parameter for the constant term, and si</p><p>and ci are unknown parameters. The trend may take a polynomial form as in</p><p>Equation (5.2). When s is an even number, the value of the sine at frequency</p><p>1/2 (when i = s/2 in the summation term shown in Equation (5.10)) will</p><p>be zero for all values of t, and so the term can be left out of the model.</p><p>Hence, with a constant term included, the maximum number of parameters</p><p>in the harmonic model equals that of the seasonal indicator variable model</p><p>(Equation (5.6)), and the fits will be identical.</p><p>At first sight it may seem strange that the harmonic model has cycles of</p><p>a frequency higher than the seasonal frequency of 1/s. However, the addition</p><p>of further harmonics has the effect of perturbing the underlying wave to make</p><p>it less regular than a standard sine wave of period s. This usually still gives</p><p>a dominant seasonal pattern of period s, but with a more realistic underlying</p><p>shape. For example, suppose data are taken at monthly intervals. 
Then the</p><p>second plot given below might be a more realistic underlying seasonal pattern</p><p>than the first plot, as it perturbs the standard sine wave by adding another</p><p>two harmonic terms of frequencies 2/12 and 4/12 (Fig. 5.5):</p><p>> TIME plot(TIME, sin(2 * pi * TIME/12), type = "l")</p><p>> plot(TIME, sin(2 * pi * TIME/12) + 0.2 * sin(2 * pi * 2 *</p><p>TIME/12) + 0.1 * sin(2 * pi * 4 * TIME/12) + 0.1 *</p><p>cos(2 * pi * 4 * TIME/12), type = "l")</p><p>The code above illustrates just one of many possible combinations of harmon-</p><p>ics that could be used to model a wide range of possible underlying seasonal</p><p>patterns.</p><p>5.6.1 Simulation</p><p>It is straightforward to simulate a series based on the harmonic model given</p><p>by Equation (5.10). For example, suppose the underlying model is</p><p>xt = 0.1 + 0.005t+ 0.001t2 + sin(2πt/12)+</p><p>0.2 sin(4πt/12) + 0.1 sin(8πt/12) + 0.1 cos(8πt/12) + wt</p><p>(5.11)</p><p>where {wt} is Gaussian white noise with standard deviation 0.5. This model</p><p>has the same seasonal harmonic components as the model represented in Fig-</p><p>ure 5.5b but also contains an underlying quadratic trend. Using the code</p><p>below, a series of length 10 years is simulated, and it is shown in Figure 5.6.</p><p>> set.seed(1)</p><p>> TIME w Trend Seasonal x plot(x, type = "l")</p><p>5.6.2 Fit to simulated series</p><p>With reference to Equation (5.10), it would seem reasonable to place the</p><p>harmonic variables in matrices, which can be achieved as follows:</p><p>> SIN for (i in 1:6) {</p><p>104 5 Regression</p><p>0 20 40 60 80 100 120</p><p>0</p><p>5</p><p>10</p><p>15</p><p>Fig. 5.6. Ten years of simulated data for the model given by Equation (5.11).</p><p>COS[, i] x.lm1 coef(x.lm1)/sqrt(diag(vcov(x.lm1)))</p><p>(Intercept) TIME I(TIME^2) COS[, 1] SIN[, 1] COS[, 2]</p><p>1.239 1.125 25.933 0.328 15.442 -0.515</p><p>SIN[, 2] COS[, 3] SIN[, 3] COS[, 4] SIN[, 4] COS[, 5]</p><p>3.447 0.232 -0.703 0.228 1.053 -1.150</p><p>SIN[, 5] COS[, 6] SIN[, 6]</p><p>0.857 -0.310 0.382</p><p>The preceding output</p><p>has three significant coefficients. These are used in the</p><p>following model:2</p><p>2 Some statisticians choose to include both the COS and SIN terms for a particular</p><p>frequency if either has a statistically significant value.</p><p>5.6 Harmonic seasonal models 105</p><p>> x.lm2 coef(x.lm2)/sqrt(diag(vcov(x.lm2)))</p><p>(Intercept) I(TIME^2) SIN[, 1] SIN[, 2]</p><p>4.63 111.14 15.79 3.49</p><p>As can be seen in the output from the last command, the coefficients are all</p><p>significant. The estimated coefficients of the best-fitting model are given by</p><p>> coef(x.lm2)</p><p>(Intercept) I(TIME^2) SIN[, 1] SIN[, 2]</p><p>0.28040 0.00104 0.90021 0.19886</p><p>The coefficients above give the following model for predictions at time t:</p><p>x̂t = 0.280 + 0.00104t2 + 0.900 sin(2πt/12) + 0.199 sin(4πt/12) (5.12)</p><p>The AIC can be used to compare the two fitted models:</p><p>> AIC(x.lm1)</p><p>[1] 165</p><p>> AIC(x.lm2)</p><p>[1] 150</p><p>As expected, the last model has the smallest AIC and therefore provides the</p><p>best fit to the data. Due to sampling variation, the best-fitting model is not</p><p>identical to the model used to simulate the data, as can easily be verified by</p><p>taking the AIC of the known underlying model:</p><p>> AIC(lm(x ~ TIME +I(TIME^2) +SIN[,1] +SIN[,2] +SIN[,4] +COS[,4]))</p><p>[1] 153</p><p>In R, the algorithm step can be used to automate the selection of the best-</p><p>fitting model by the AIC. 
For the example above, the appropriate command</p><p>is step(x.lm1), which contains all the predictor variables in the form of the</p><p>first model. Try running this command, and check that the final output agrees</p><p>with the model selected above.</p><p>A best fit can equally well be based on choosing the model that leads to</p><p>the smallest estimated standard deviations of the errors, provided the degrees</p><p>of freedom are taken into account.</p><p>5.6.3 Harmonic model fitted to temperature series (1970–2005)</p><p>In the code below, a harmonic model with a quadratic trend is fitted to the</p><p>temperature series (1970–2005) from §5.3.2. The units for the ‘time’ variable</p><p>are in ‘years’, so the divisor of 12 is not needed when creating the harmonic</p><p>variables. To reduce computation error in the OLS procedure due to large</p><p>numbers, the TIME variable is standardized after the COS and SIN predictors</p><p>have been calculated.</p><p>106 5 Regression</p><p>> SIN for (i in 1:6) {</p><p>COS[, i] TIME mean(time(temp))</p><p>[1] 1988</p><p>> sd(time(temp))</p><p>[1] 10.4</p><p>> temp.lm1 coef(temp.lm1)/sqrt(diag(vcov(temp.lm1)))</p><p>(Intercept) TIME I(TIME^2) COS[, 1] SIN[, 1] COS[, 2]</p><p>18.245 30.271 1.281 0.747 2.383 1.260</p><p>SIN[, 2] COS[, 3] SIN[, 3] COS[, 4] SIN[, 4] COS[, 5]</p><p>1.919 0.640 0.391 0.551 0.168 0.324</p><p>SIN[, 5] COS[, 6] SIN[, 6]</p><p>0.345 -0.409 -0.457</p><p>> temp.lm2 coef(temp.lm2)</p><p>(Intercept) TIME SIN[, 1] SIN[, 2]</p><p>0.1750 0.1841 0.0204 0.0162</p><p>> AIC(temp.lm)</p><p>[1] -547</p><p>> AIC(temp.lm1)</p><p>[1] -545</p><p>> AIC(temp.lm2)</p><p>[1] -561</p><p>Again, the AIC is used to compare the fitted models, and only statistically</p><p>significant terms are included in the final model.</p><p>To check the adequacy of the fitted model, it is appropriate to create a</p><p>time plot and correlogram of the residuals because the residuals form a time</p><p>series (Fig. 5.7). The time plot is used to detect patterns in the series. For</p><p>example, if a higher-ordered polynomial is required, this would show up as a</p><p>curve in the time plot. The purpose of the correlogram is to determine whether</p><p>there is autocorrelation in the series, which would require a further model.</p><p>5.6 Harmonic seasonal models 107</p><p>> plot(time(temp), resid(temp.lm2), type = "l")</p><p>> abline(0, 0, col = "red")</p><p>> acf(resid(temp.lm2))</p><p>> pacf(resid(temp.lm2))</p><p>In Figure 5.7(a), there is no discernible curve in the series, which implies</p><p>that a straight line is an adequate description of the trend. A tendency for the</p><p>series to persist above or below the x-axis implies that the series is positively</p><p>autocorrelated. This is verified in the correlogram of the residuals, which shows</p><p>a clear positive autocorrelation at lags 1–10 (Fig. 5.7b).</p><p>1970 1975 1980 1985 1990 1995 2000 2005</p><p>−</p><p>0.</p><p>4</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>(a)</p><p>R</p><p>es</p><p>id</p><p>ua</p><p>l</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>(b)</p><p>A</p><p>C</p><p>F</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>(c)</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.7. 
Residual diagnostic plots for the harmonic model fitted to the temperature</p><p>series (1970–2005): (a) the residuals plotted against time; (b) the correlogram of the</p><p>residuals (time units are months); (c) partial autocorrelations plotted against lag</p><p>(in months).</p><p>The correlogram in Figure 5.7 is similar to that expected of an AR(p)</p><p>process (§4.5.5). This is verified by the plot of the partial autocorrelations,</p><p>in which only the lag 1 and lag 2 autocorrelations are statistically significant</p><p>(Fig. 5.7). In the code below, an AR(2) model is fitted to the residual series:</p><p>108 5 Regression</p><p>> res.ar res.ar$ar</p><p>[1] 0.494 0.307</p><p>> sd(res.ar$res[-(1:2)])</p><p>[1] 0.0837</p><p>> acf(res.ar$res[-(1:2)])</p><p>The correlogram of the residuals of the fitted AR(2) model is given in Figure</p><p>5.8, from which it is clear that the residuals are approximately white noise.</p><p>Hence, the final form of the model provides a good fit to the data. The fitted</p><p>model for the monthly temperature series can be written as</p><p>xt = 0.175 +</p><p>0.184(t− 1988)</p><p>10.4</p><p>+ 0.0204 sin(2πt) + 0.0162 sin(4πt) + zt (5.13)</p><p>where t is ‘time’ measured in units of ‘years’, the residual series {zt} follow</p><p>an AR(2) process given by</p><p>zt = 0.494zt−1 + 0.307zt−2 + wt (5.14)</p><p>and {wt} is white noise with mean zero and standard deviation 0.0837.</p><p>If we require an accurate assessment of the standard error, we should refit</p><p>the model using gls, allowing for an AR(2) structure for the errors (Exer-</p><p>cise 6).</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.8. Correlogram of the residuals of the AR(2) model fitted to the residuals of</p><p>the harmonic model for the temperature series.</p><p>5.7 Logarithmic transformations 109</p><p>5.7 Logarithmic transformations</p><p>5.7.1 Introduction</p><p>Recall from §5.2 that the natural logarithm (base e) can be used to transform</p><p>a model with multiplicative components to a model with additive components.</p><p>For example, if {xt} is a time series given by</p><p>xt = m′</p><p>t s</p><p>′</p><p>t z</p><p>′</p><p>t (5.15)</p><p>where m′</p><p>t is the trend, s′t is the seasonal effect, and z′t is the residual error,</p><p>then the series {yt}, given by</p><p>yt = log xt = logm′</p><p>t + log s′t + log z′t = mt + st + zt (5.16)</p><p>has additive components, so that if mt and st are also linear functions, the</p><p>parameters in Equation (5.16) can be estimated by OLS. In Equation (5.16),</p><p>logs can be taken only if the series {xt} takes all positive values; i.e., xt > 0 for</p><p>all t. Conversely, a log-transformation may be seen as an appropriate model</p><p>formulation when a series can only take positive values and has values near</p><p>zero because the anti-log forces the predicted and simulated values for {xt}</p><p>to be positive.</p><p>5.7.2 Example using the air passenger series</p><p>Consider the air passenger series from §1.4.1. Time plots of the original series</p><p>and the natural logarithm of the series can be obtained using the code below</p><p>and are shown in Figure 5.9.</p><p>> data(AirPassengers)</p><p>> AP plot(AP)</p><p>> plot(log(AP))</p><p>In Figure 5.9(a), the variance can be seen to increase as t increases, whilst</p><p>after the logarithm is taken the variance is approximately constant over the</p><p>period of the record (Fig. 5.9b). 
Therefore, as the number of people using</p><p>the airline can also only be positive, the logarithm would be appropriate in</p><p>the model formulation for this time series. In the following code, a harmonic</p><p>model with polynomial trend is fitted to the air passenger series. The function</p><p>time is used to</p><p>extract the time and create a standardised time variable TIME.</p><p>> SIN for (i in 1:6) {</p><p>SIN[, i] TIME mean(time(AP))</p><p>110 5 Regression</p><p>(a)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>(b)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1952 1954 1956 1958 1960</p><p>5.</p><p>0</p><p>5.</p><p>5</p><p>6.</p><p>0</p><p>6.</p><p>5</p><p>Fig. 5.9. Time plots of (a) the airline series (1949–1960) and (b) the natural loga-</p><p>rithm of the airline series.</p><p>[1] 1955</p><p>> sd(time(AP))</p><p>[1] 3.48</p><p>> AP.lm1 coef(AP.lm1)/sqrt(diag(vcov(AP.lm1)))</p><p>(Intercept) TIME I(TIME^2) I(TIME^3) I(TIME^4) SIN[, 1]</p><p>744.685 42.382 -4.162 -0.751 1.873 4.868</p><p>COS[, 1] SIN[, 2] COS[, 2] SIN[, 3] COS[, 3] SIN[, 4]</p><p>-26.055 10.395 10.004 -4.844 -1.560 -5.666</p><p>COS[, 4] SIN[, 5] COS[, 5] SIN[, 6] COS[, 6]</p><p>1.946 -3.766 1.026 0.150 -0.521</p><p>> AP.lm2 coef(AP.lm2)/sqrt(diag(vcov(AP.lm2)))</p><p>5.7 Logarithmic transformations 111</p><p>(Intercept) TIME I(TIME^2) SIN[, 1] COS[, 1] SIN[, 2]</p><p>922.63 103.52 -8.24 4.92 -25.81 10.36</p><p>COS[, 2] SIN[, 3] SIN[, 4] COS[, 4] SIN[, 5]</p><p>9.96 -4.79 -5.61 1.95 -3.73</p><p>> AIC(AP.lm1)</p><p>[1] -448</p><p>> AIC(AP.lm2)</p><p>[1] -451</p><p>> acf(resid(AP.lm2))</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>1.</p><p>0</p><p>(a)</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>(b)</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.10. The correlogram (a) and partial autocorrelations (b) of the residual</p><p>series.</p><p>The residual correlogram indicates that the data are positively autocorre-</p><p>lated (Fig. 5.10). As mentioned in §5.4, the standard errors of the parameter</p><p>estimates are likely to be under-estimated if there is positive serial corre-</p><p>lation in the data. This implies that predictor variables may falsely appear</p><p>‘significant’ in the fitted model. In the code below, GLS is used to check the</p><p>significance of the variables in the fitted model, using the lag 1 autocorrelation</p><p>(approximately 0.6) from Figure 5.10.</p><p>112 5 Regression</p><p>> AP.gls coef(AP.gls)/sqrt(diag(vcov(AP.gls)))</p><p>(Intercept) TIME I(TIME^2) SIN[, 1] COS[, 1] SIN[, 2]</p><p>398.84 45.85 -3.65 3.30 -18.18 11.77</p><p>COS[, 2] SIN[, 3] SIN[, 4] COS[, 4] SIN[, 5]</p><p>11.43 -7.63 -10.75 3.57 -7.92</p><p>In Figure 5.10(b), the partial autocorrelation plot suggests that the resid-</p><p>ual series follows an AR(1) process, which is fitted to the series below:</p><p>> AP.ar AP.ar$ar</p><p>[1] 0.641</p><p>> acf(AP.ar$res[-1])</p><p>The correlogram of the residuals of the fitted AR(1) model might be taken</p><p>for white noise given that only one autocorrelation is significant (Fig. 
5.11).</p><p>However, the lag of this significant value corresponds to the seasonal lag (12)</p><p>in the original series, which implies that the fitted model has failed to fully</p><p>account for the seasonal variation in the data. Understandably, the reader</p><p>might regard this as curious, given that the data were fitted using the full</p><p>seasonal harmonic model. However, seasonal effects can be stochastic just</p><p>as trends can, and the harmonic model we have used is deterministic. In</p><p>Chapter 7, models with stochastic seasonal terms will be considered.</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.11. Correlogram of the residuals from the AR(1) model fitted to the residuals</p><p>of the logarithm model.</p><p>5.8 Non-linear models 113</p><p>5.8 Non-linear models</p><p>5.8.1 Introduction</p><p>For the reasons given in §5.2, linear models are applicable to a wide range of</p><p>time series. However, for some time series it may be more appropriate to fit</p><p>a non-linear model directly rather than take logs or use a linear polynomial</p><p>approximation. For example, if a series is known to derive from a known non-</p><p>linear process, perhaps based on an underlying known deterministic law in</p><p>science, then it would be better to use this information in the model formula-</p><p>tion and fit a non-linear model directly to the data. In R, a non-linear model</p><p>can be fitted by least squares using the function nls.</p><p>In the previous section, we found that using the natural logarithm of a</p><p>series could help stabilise the variance. However, using logs can present diffi-</p><p>culties when a series contains negative values, because the log of a negative</p><p>value is undefined. One way around this problem is to add a constant to all</p><p>the terms in the series, so if {xt} is a series containing (some) negative values,</p><p>then adding c0 such that c0 > max{−xt} and then taking logs produces a</p><p>transformed series {log(c0 +xt)} that is defined for all t. A linear model (e.g.,</p><p>a straight-line trend) could then be fitted to produce for {xt} the model</p><p>xt = −c0 + eα0+α1t+zt (5.17)</p><p>where α0 and α1 are model parameters and {zt} is a residual series that may</p><p>be autocorrelated.</p><p>The main difficulty with the approach leading to Equation (5.17) is that</p><p>c0 should really be estimated like any other parameter in the model, whilst in</p><p>practice a user will often arbitrarily choose a value that satisfies the constraint</p><p>(c0 > max{−xt}). If there is a reason to expect a model similar to that in</p><p>Equation (5.17) but there is no evidence for multiplicative residual terms, then</p><p>the constant c0 should be estimated with the other model parameters using</p><p>non-linear least squares; i.e., the following model should be fitted:</p><p>xt = −c0 + eα0+α1t + zt (5.18)</p><p>5.8.2 Example of a simulated and fitted non-linear series</p><p>As non-linear models are generally fitted when the underlying non-linear func-</p><p>tion is known, we will simulate a non-linear series based on Equation (5.18)</p><p>with c0 = 0 and compare parameters estimated using nls with those of the</p><p>known underlying function.</p><p>Below, a non-linear series with AR(1) residuals is simulated and plotted</p><p>(Fig. 
5.12):</p><p>> set.seed(1)</p><p>> w z for (t in 2:100) z[t] Time f x plot(x, type = "l")</p><p>> abline(0, 0)</p><p>0 20 40 60 80 100</p><p>0</p><p>10</p><p>0</p><p>20</p><p>0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>time</p><p>Fig. 5.12. Plot of a non-linear series containing negative values.</p><p>The series plotted in Figure 5.12 has an apparent increasing exponential</p><p>trend but also contains negative values, so that a direct log-transformation</p><p>cannot be used and a non-linear model is needed. In R, a non-linear model is</p><p>fitted by specifying a formula with the parameters and their starting values</p><p>contained in a list:</p><p>> x.nls summary(x.nls)$parameters</p><p>Estimate Std. Error t value Pr(>|t|)</p><p>alp0 1.1764 0.074295 15.8 9.20e-29</p><p>alp1 0.0483 0.000819 59.0 2.35e-78</p><p>The estimates for α0 and α1 are close to the underlying values that were</p><p>used to simulate the data, although the standard errors of these estimates are</p><p>likely to be underestimated because of the autocorrelation in the residuals.3</p><p>3 The generalised least squares function gls can be used to fit non-linear mod-</p><p>els with autocorrelated residuals. However, in practice, computational difficulties</p><p>often arise when using this function with non-linear models.</p><p>5.10 Inverse transform and bias correction 115</p><p>5.9 Forecasting from regression</p><p>5.9.1 Introduction</p><p>A forecast is a prediction into the future. In the context of time series re-</p><p>gression, a forecast involves extrapolating a fitted model into the future by</p><p>evaluating the model function for a new series of times. The main problem</p><p>with this approach is that the trends present in the fitted series may change</p><p>in the future. Therefore, it is better to think of a forecast from a regression</p><p>model as an</p><p>expected value conditional on past trends continuing into the</p><p>future.</p><p>5.9.2 Prediction in R</p><p>The generic function for making predictions in R is predict. The function</p><p>essentially takes a fitted model and new data as parameters. The key to using</p><p>this function with a regression model is to ensure that the new data are</p><p>properly defined and labelled in a data.frame.</p><p>In the code below, we use this function in the fitted regression model</p><p>of §5.7.2 to forecast the number of air passengers travelling for the 10-year</p><p>period that follows the record (Fig. 5.13). The forecast is given by applying</p><p>the exponential function (anti-log) to predict because the regression model</p><p>was fitted to the logarithm of the series:</p><p>> new.t TIME SIN for (i in 1:6) {</p><p>COS[, i] SIN new.dat AP.pred.ts ts.plot(log(AP), log(AP.pred.ts), lty = 1:2)</p><p>> ts.plot(AP, AP.pred.ts, lty = 1:2)</p><p>5.10 Inverse transform and bias correction</p><p>5.10.1 Log-normal residual errors</p><p>The forecasts in Figure 5.13(b) were obtained by applying the anti-log to the</p><p>forecasted values obtained from the log-regression model. However, the process</p><p>116 5 Regression</p><p>(a)</p><p>Lo</p><p>g</p><p>of</p><p>a</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>1950 1955 1960 1965 1970</p><p>5.</p><p>0</p><p>6.</p><p>0</p><p>7.</p><p>0</p><p>(b)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1955 1960 1965 1970</p><p>20</p><p>0</p><p>60</p><p>0</p><p>10</p><p>00</p><p>Fig. 5.13. 
Air passengers (1949–1960; solid line) and forecasts (1961–1970; dotted</p><p>lines): (a) logarithm and forecasted values; (b) original series and anti-log of the</p><p>forecasted values.</p><p>of using a transformation, such as the logarithm, and then applying an inverse</p><p>transformation introduces a bias in the forecasts of the mean values. If the</p><p>regression model closely fits the data, this bias will be small (as shown in the</p><p>next example for the airline predictions). Note that a bias correction is only</p><p>for means and should not be used in simulations.</p><p>The bias in the means arises as a result of applying the inverse transform</p><p>to a residual series. For example, if the time series are Gaussian white noise</p><p>{wt}, with mean zero and standard deviation σ, then the distribution of the</p><p>inverse-transform (the anti-log) of the series is log-normal with mean e</p><p>1</p><p>2 σ2</p><p>.</p><p>This can be verified theoretically, or empirically by simulation as in the code</p><p>below:</p><p>> set.seed(1)</p><p>> sigma w mean(w)</p><p>[1] 4.69e-05</p><p>5.10 Inverse transform and bias correction 117</p><p>> mean(exp(w))</p><p>[1] 1.65</p><p>> exp(sigma^2/2)</p><p>[1] 1.65</p><p>The code above indicates that the mean of the anti-log of the Gaussian</p><p>white noise and the expected mean from a log-normal distribution are equal.</p><p>Hence, for a Gaussian white noise residual series, a correction factor of e</p><p>1</p><p>2 σ2</p><p>should be applied to the forecasts of means. The importance of this correction</p><p>factor really depends on the value of σ2. If σ2 is very small, the correction</p><p>factor will hardly change the forecasts at all and so could be neglected with-</p><p>out major concern, especially as errors from other sources are likely to be</p><p>significantly greater.</p><p>5.10.2 Empirical correction factor for forecasting means</p><p>The e</p><p>1</p><p>2 σ2</p><p>correction factor can be used when the residual series of the fitted</p><p>log-regression model is Gaussian white noise. In general, however, the distri-</p><p>bution of the residuals from the log regression (Exercise 5) is often negatively</p><p>skewed, in which case a correction factor can be determined empirically us-</p><p>ing the mean of the anti-log of the residual series. In this approach, adjusted</p><p>forecasts {x̂′t} can be obtained from</p><p>x̂′t = e</p><p>ˆlog xt</p><p>n∑</p><p>t=1</p><p>ezt/n (5.19)</p><p>where { ˆlog xt : t = 1, . . . , n} is the predicted series given by the fitted log-</p><p>regression model, and {zt} is the residual series from this fitted model.</p><p>The following example illustrates the procedure for calculating the correc-</p><p>tion factors.</p><p>5.10.3 Example using the air passenger data</p><p>For the airline series, the forecasts can be adjusted by multiplying the predic-</p><p>tions by e</p><p>1</p><p>2 σ2</p><p>, where σ is the standard deviation of the residuals, or using an</p><p>empirical correction factor as follows:</p><p>> summary(AP.lm2)$r.sq</p><p>[1] 0.989</p><p>> sigma lognorm.correction.factor empirical.correction.factor lognorm.correction.factor</p><p>[1] 1.001171</p><p>> empirical.correction.factor</p><p>[1] 1.001080</p><p>> AP.pred.ts</p><p>. . . . 113</p><p>xii Contents</p><p>5.9 Forecasting from regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115</p><p>5.9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115</p><p>5.9.2 Prediction in R . . . . . . . 
and comment on the plot. Fit back-to-back Weibull distributions to the errors.
c) Simulate 20 realisations of inflow for the next 10 years.
d) Give reasons why a log transformation may be suitable for the series of inflows.
e) Regress log(inflow) on month using indicator variables and time t (as above), and fit a suitable AR model to the residual error series.
f) Plot a histogram of the residual errors of the fitted AR model, and comment on the plot. Fit a back-to-back Weibull distribution to the residual errors.
g) Simulate 20 realisations of log(inflow) for the next 10 years. Take anti-logs of the simulated values to produce a series of simulated flows.
h) Compare both sets of simulated flows, and discuss which is the more satisfactory.

6. Refit the harmonic model to the temperature series using gls, allowing for errors from an AR(2) process.
a) Construct a 99% confidence interval for the coefficient of time.
b) Plot the residual error series from the model fitted using GLS against the residual error series from the model fitted using OLS.
c) Refit the AR(2) model to the residuals from the fitted (GLS) model.
d) How different are the fitted models?
e) Calculate the annual means. Use OLS to regress the annual mean temperature on time, and construct a 99% confidence interval for its coefficient.

6 Stationary Models

6.1 Purpose

As seen in the previous chapters, a time series will often have well-defined components, such as a trend and a seasonal pattern. A well-chosen linear regression may account for these non-stationary components, in which case the residuals from the fitted model should not contain noticeable trend or seasonal patterns. However, the residuals will usually be correlated in time, as this is not accounted for in the fitted regression model. Similar values may cluster together in time; for example, monthly values of the Southern Oscillation Index, which is closely associated with El Niño, tend to change slowly and may give rise to persistent weather patterns.
Alternatively, adjacent observations may be negatively correlated; for example, an unusually high monthly sales figure may be followed by an unusually low value because customers have supplies left over from the previous month. In this chapter, we consider stationary models that may be suitable for residual series that contain no obvious trends or seasonal cycles. The fitted stationary models may then be combined with the fitted regression model to improve forecasts. The autoregressive models that were introduced in §4.5 often provide satisfactory models for the residual time series, and we extend the repertoire in this chapter. The term stationary was discussed in previous chapters; we now give a more rigorous definition.

6.2 Strictly stationary series

A time series model {xt} is strictly stationary if the joint statistical distribution of xt1, . . . , xtn is the same as the joint distribution of xt1+m, . . . , xtn+m for all t1, . . . , tn and m, so that the distribution is unchanged after an arbitrary time shift. Note that strict stationarity implies that the mean and variance are constant in time and that the autocovariance Cov(xt, xs) only depends on the lag k = |t − s| and can be written γ(k). If a series is not strictly stationary but the mean and variance are constant in time and the autocovariance only depends on the lag, then the series is called second-order stationary.¹ We focus on the second-order properties in this chapter, but the stochastic processes discussed are strictly stationary. Furthermore, if the white noise is Gaussian, the stochastic process is completely defined by the mean and covariance structure, in the same way as any normal distribution is defined by its mean and variance-covariance matrix.

¹ For example, the skewness, or more generally E(xt xt+k xt+l), might change over time.

Stationarity is an idealisation that is a property of models. If we fit a stationary model to data, we assume our data are a realisation of a stationary process. So our first step in an analysis should be to check whether there is any evidence of a trend or seasonal effects and, if there is, remove them. Regression can break down a non-stationary series into a trend, seasonal components, and a residual series. It is often reasonable to treat the time series of residuals as a realisation of a stationary error series. Therefore, the models in this chapter are often fitted to residual series arising from regression analyses.

6.3 Moving average models

6.3.1 MA(q) process: Definition and properties

A moving average (MA) process of order q is a linear combination of the current white noise term and the q most recent past white noise terms and is defined by

xt = wt + β1wt−1 + · · · + βqwt−q     (6.1)

where {wt} is white noise with zero mean and variance σ²w. Equation (6.1) can be rewritten in terms of the backward shift operator B as

xt = (1 + β1B + β2B² + · · · + βqB^q)wt = φq(B)wt     (6.2)

where φq is a polynomial of order q.
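Since φq(B) in Equation (6.2) is an ordinary polynomial in B, its roots can be examined numerically. A minimal sketch (not from the original text) using R's polyroot function is given below; the requirement that all roots exceed unity in absolute value is the invertibility condition discussed shortly.

beta <- c(0.7, 0.5, 0.2)          # hypothetical MA(3) coefficients
ma.poly <- c(1, beta)             # coefficients of 1 + 0.7B + 0.5B^2 + 0.2B^3
Mod(polyroot(ma.poly))            # all moduli exceed 1, so this process is invertible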
Because MA processes consist of a finite sum of stationary white noise terms, they are stationary and hence have a time-invariant mean and autocovariance.

The mean and variance for {xt} are easy to derive. The mean is just zero because it is a sum of terms that all have a mean of zero. The variance is σ²w(1 + β1² + · · · + βq²) because each of the white noise terms has the same variance and the terms are mutually independent. The autocorrelation function, for k ≥ 0, is given by

ρ(k) = 1 for k = 0;  ρ(k) = Σ_{i=0}^{q−k} βi βi+k / Σ_{i=0}^{q} βi²  for k = 1, . . . , q;  ρ(k) = 0 for k > q     (6.3)

where β0 is unity. The function is zero when k > q because xt and xt+k then consist of sums of independent white noise terms and so have covariance zero. The derivation of the autocorrelation function is left to Exercise 1. An MA process is invertible if it can be expressed as a stationary autoregressive process of infinite order without an error term. For example, the MA process xt = (1 − βB)wt can be expressed as

wt = (1 − βB)⁻¹xt = xt + βxt−1 + β²xt−2 + . . .     (6.4)

provided |β| < 1, so that the series on the right-hand side converges.

6.3.2 R examples: Correlogram and simulation

The autocorrelation function given by Equation (6.3) can be calculated and plotted in R. In the code below, this is done for an MA(3) process with parameters β1 = 0.7, β2 = 0.5, and β3 = 0.2, which produces the plot in Figure 6.1(a):

> rho <- function(k, beta) {
    q <- length(beta) - 1
    if (k > q) ACF <- 0 else {
      s1 <- 0; s2 <- 0
      for (i in 1:(q - k + 1)) s1 <- s1 + beta[i] * beta[i + k]
      for (i in 1:(q + 1)) s2 <- s2 + beta[i]^2
      ACF <- s1 / s2
    }
    ACF
  }
> beta <- c(1, 0.7, 0.5, 0.2)
> rho.k <- rep(1, 10)
> for (k in 1:10) rho.k[k] <- rho(k, beta)
> plot(0:10, c(1, rho.k), pch = 4, ylab = expression(rho[k]))
> abline(0, 0)

Fig. 6.1. Plots of the autocorrelation functions for two MA(3) processes: (a) β1 = 0.7, β2 = 0.5, β3 = 0.2; (b) β1 = −0.7, β2 = 0.5, β3 = −0.2.

The plot in Figure 6.1(b) is the autocorrelation function for an MA(3) process with parameters β1 = −0.7, β2 = 0.5, and β3 = −0.2, which has negative correlations at lags 1 and 3. The function expression is used to get the Greek symbol ρ.

The code below can be used to simulate the MA(3) process and plot the correlogram of the simulated series. An example time plot and correlogram are shown in Figure 6.2. As expected, the first three autocorrelations are significantly different from 0 (Fig. 6.2b); other statistically significant correlations are attributable to random sampling variation. Note that in the correlogram plot (Fig. 6.2b) 1 in 20 (5%) of the sample correlations for lags greater than 3, for which the underlying population correlation is zero, are expected to be statistically significantly different from zero at the 5% level because multiple t-test results are being shown on the plot.

> set.seed(1)
> b <- c(0.8, 0.6, 0.4)
> x <- w <- rnorm(1000)
> for (t in 4:1000) {
    for (j in 1:3) x[t] <- x[t] + b[j] * w[t - j]
  }
> plot(x, type = "l")
> acf(x)

Fig. 6.2. (a) Time plot and (b) correlogram for a simulated MA(3) process.

6.4 Fitted MA models

6.4.1 Model fitted to simulated series

An MA(q) model can be fitted to data in R using the arima function with the parameter order set to c(0, 0, q). Unlike the function ar, the
function arima does not subtract the mean by default and estimates an intercept term. MA models cannot be expressed in a multiple regression form, and, in general, the parameters are estimated with a numerical algorithm. The function arima minimises the conditional sum of squares to estimate values of the parameters and will either return these if method = c("CSS") is specified or use them as initial values for maximum likelihood estimation.

A description of the conditional sum of squares algorithm for fitting an MA(q) process follows. For any choice of parameters, the sum of squared residuals can be calculated iteratively by rearranging Equation (6.1) and replacing the errors, wt, with their estimates (that is, the residuals), which are denoted by ŵt:

S(β̂1, . . . , β̂q) = Σ_{t=1}^{n} ŵt² = Σ_{t=1}^{n} { xt − (β̂1ŵt−1 + · · · + β̂qŵt−q) }²     (6.5)

conditional on ŵ0, . . . , ŵ1−q being taken as 0 to start the iteration. A numerical search is used to find the parameter values that minimise this conditional sum of squares.

In the following code, a moving average model, x.ma, is fitted to the simulated series of the last section. Looking at the parameter estimates (coefficients in the output below), it can be seen that the 95% confidence intervals (approximated by coeff. ± 2 s.e. of coeff.) contain the underlying parameter values (0.8, 0.6, and 0.4) that were used in the simulations. Furthermore, also as expected, the intercept is not significantly different from its underlying parameter value of zero.

> x.ma <- arima(x, order = c(0, 0, 3))
> x.ma

Call:
arima(x = x, order = c(0, 0, 3))

Coefficients:
        ma1    ma2    ma3  intercept
      0.790  0.566  0.396     -0.032
s.e.  0.031  0.035  0.032      0.090

sigma^2 estimated as 1.07: log likelihood = -1452, aic = 2915

It is possible to set the value for the mean to zero, rather than estimate the intercept, by using include.mean = FALSE within the arima function. This option should be used with caution and would only be appropriate if you wanted {xt} to represent displacement from some known fixed mean.

6.4.2 Exchange rate series: Fitted MA model

In the code below, an MA(1) model is fitted to the exchange rate series. If you refer back to §4.6.2, a comparison with the output below indicates that the AR(1) model provides the better fit, as it has the smaller variance of the residual series, 0.031 compared with 0.042. Furthermore, the correlogram of the residuals indicates that an MA(1) model does not provide a satisfactory fit, as the residual series is clearly not a realistic realisation of white noise (Fig. 6.3).

> www <- "http://www.massey.ac.nz/~pscowper/ts/pounds_nz.dat"
> x <- read.table(www, header = T)
> x.ts <- ts(x, st = 1991, fr = 4)
> x.ma <- arima(x.ts, order = c(0, 0, 1))
> x.ma

Call:
arima(x = x.ts, order = c(0, 0, 1))

Coefficients:
        ma1  intercept
      1.000      2.833
s.e.  0.072      0.065

sigma^2 estimated as 0.0417: log likelihood = 4.76, aic = -3.53

> acf(x.ma$res[-1])

Fig. 6.3. The correlogram of the residual series for the MA(1) model fitted to the exchange rate data.
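Equation (6.5) is easy to compute directly, which helps to show what the CSS criterion involves. The following is a minimal sketch, not the implementation used by arima: it assumes a zero-mean series x and a candidate coefficient vector beta, runs the residual recursion with the unknown initial errors set to zero, and returns the conditional sum of squares, which a general-purpose optimiser such as optim can then minimise.

css.ma <- function(beta, x) {
  # conditional sum of squares S(beta) of Equation (6.5) for a zero-mean series x
  n <- length(x)
  q <- length(beta)
  w.hat <- rep(0, n)
  for (t in 1:n) {
    past <- 0
    for (j in 1:q) if (t > j) past <- past + beta[j] * w.hat[t - j]
    w.hat[t] <- x[t] - past
  }
  sum(w.hat^2)
}
# e.g. optim(rep(0.1, 3), css.ma, x = x)$par should be close to the estimates from
# arima(x, order = c(0, 0, 3), method = "CSS", include.mean = FALSE)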
6.5 Mixed models: The ARMA process

6.5.1 Definition

Recall from Chapter 4 that a series {xt} is an autoregressive process of order p, an AR(p) process, if

xt = α1xt−1 + α2xt−2 + . . . + αpxt−p + wt     (6.6)

where {wt} is white noise and the αi are the model parameters. A useful class of models is obtained when AR and MA terms are added together in a single expression. A time series {xt} follows an autoregressive moving average (ARMA) process of order (p, q), denoted ARMA(p, q), when

xt = α1xt−1 + α2xt−2 + . . . + αpxt−p + wt + β1wt−1 + β2wt−2 + . . . + βqwt−q     (6.7)

where {wt} is white noise. Equation (6.7) may be represented in terms of the backward shift operator B and rearranged in the more concise polynomial form

θp(B)xt = φq(B)wt     (6.8)

The following points should be noted about an ARMA(p, q) process:

(a) The process is stationary when the roots of θ all exceed unity in absolute value.
(b) The process is invertible when the roots of φ all exceed unity in absolute value.
(c) The AR(p) model is the special case ARMA(p, 0).
(d) The MA(q) model is the special case ARMA(0, q).
(e) Parameter parsimony. When fitting to data, an ARMA model will often be more parameter efficient (i.e., require fewer parameters) than a single MA or AR model.
(f) Parameter redundancy. When θ and φ share a common factor, a stationary model can be simplified. For example, the model (1 − ½B)(1 − ⅓B)xt = (1 − ½B)wt can be written (1 − ⅓B)xt = wt.

6.5.2 Derivation of second-order properties*

In order to derive the second-order properties for an ARMA(p, q) process {xt}, it is helpful first to express the xt in terms of white noise components wt because white noise terms are independent. We illustrate the procedure for the ARMA(1, 1) model.

The ARMA(1, 1) process for {xt} is given by

xt = αxt−1 + wt + βwt−1     (6.9)

where wt is white noise, with E(wt) = 0 and Var(wt) = σ²w. Rearranging Equation (6.9) to express xt in terms of white noise components,

xt = (1 − αB)⁻¹(1 + βB)wt

Expanding the right-hand side,

xt = (1 + αB + α²B² + . . .)(1 + βB)wt
   = (Σ_{i=0}^{∞} αⁱBⁱ)(1 + βB)wt
   = (1 + Σ_{i=0}^{∞} α^{i+1}B^{i+1} + Σ_{i=0}^{∞} αⁱβB^{i+1})wt
   = wt + (α + β) Σ_{i=1}^{∞} α^{i−1}wt−i     (6.10)

With the equation in the form above, the second-order properties follow.
For example, the mean E(xt) is clearly zero because E(wt−i) = 0 for all i, and the variance is given by

Var(xt) = Var[ wt + (α + β) Σ_{i=1}^{∞} α^{i−1}wt−i ] = σ²w + σ²w(α + β)²(1 − α²)⁻¹     (6.11)

The autocovariance γk, for k > 0, is given by

Cov(xt, xt+k) = (α + β)α^{k−1}σ²w + (α + β)²σ²w α^k Σ_{i=1}^{∞} α^{2i−2}
             = (α + β)α^{k−1}σ²w + (α + β)²σ²w α^k (1 − α²)⁻¹     (6.12)

The autocorrelation ρk then follows as

ρk = γk/γ0 = Cov(xt, xt+k)/Var(xt) = α^{k−1}(α + β)(1 + αβ) / (1 + 2αβ + β²)     (6.13)

Note that Equation (6.13) implies ρk = αρk−1 for k ≥ 2.

6.6 ARMA models: Empirical analysis

6.6.1 Simulation and fitting

The ARMA process, and the more general ARIMA processes discussed in the next chapter, can be simulated using the R function arima.sim, which takes a list of coefficients representing the AR and MA parameters. An ARMA(p, q) model can be fitted using the arima function with the parameter order set to c(p, 0, q). The fitting algorithm proceeds similarly to that for an MA process. Below, data from an ARMA(1, 1) process are simulated for α = −0.6 and β = 0.5 (Equation (6.7)), and an ARMA(1, 1) model is fitted to the simulated series. As expected, the sample estimates of α and β are close to the underlying model parameters.

> set.seed(1)
> x <- arima.sim(n = 10000, list(ar = -0.6, ma = 0.5))
> coef(arima(x, order = c(1, 0, 1)))

     ar1      ma1  intercept
-0.59697  0.50270   -0.00657

6.6.2 Exchange rate series

In §6.4.2, a simple MA(1) model failed to provide an adequate fit to the exchange rate series. In the code below, fitted MA(1), AR(1), and ARMA(1, 1) models are compared using the AIC. The ARMA(1, 1) model provides the best fit to the data, followed by the AR(1) model, with the MA(1) model providing the poorest fit. The correlogram in Figure 6.4 indicates that the residuals of the fitted ARMA(1, 1) model have small autocorrelations, which is consistent with a realisation of white noise and supports the use of the model.

> x.ma <- arima(x.ts, order = c(0, 0, 1))
> x.ar <- arima(x.ts, order = c(1, 0, 0))
> x.arma <- arima(x.ts, order = c(1, 0, 1))
> AIC(x.ma)
[1] -3.53
> AIC(x.ar)
[1] -37.4
> AIC(x.arma)
[1] -42.3
> x.arma

Call:
arima(x = x.ts, order = c(1, 0, 1))

Coefficients:
        ar1    ma1  intercept
      0.892  0.532      2.960
s.e.  0.076  0.202      0.244

sigma^2 estimated as 0.0151: log likelihood = 25.1, aic = -42.3

> acf(resid(x.arma))

Fig. 6.4. The correlogram of the residual series for the ARMA(1, 1) model fitted to the exchange rate data.

6.6.3 Electricity production series

Consider the Australian electricity production series introduced in §1.4.3. The data exhibit a clear positive trend and a regular seasonal cycle. Furthermore, the variance increases with time, which suggests a log-transformation may be appropriate (Fig. 1.5).
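Before fitting, it is worth confirming that the log transform does stabilise the variance. A minimal sketch of such a check is given below (not part of the original text; it assumes Elec.ts has been created as in the code that follows). If the log transform is appropriate, the year-by-year standard deviation of the raw series should grow markedly over the record, while that of the logged series should stay roughly constant.

sd.by.year <- function(x) tapply(x, floor(time(x)), sd)
plot(sd.by.year(Elec.ts), type = "h", ylab = "yearly sd, raw series")
plot(sd.by.year(log(Elec.ts)), type = "h", ylab = "yearly sd, logged series")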
A regression model is fitted to the logarithms of the original series in the code below.

> www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"
> CBE <- read.table(www, header = T)
> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
> Time <- 1:length(Elec.ts)
> Imth <- cycle(Elec.ts)
> Elec.lm <- lm(log(Elec.ts) ~ Time + I(Time^2) + factor(Imth))
> acf(resid(Elec.lm))

The correlogram of the residuals appears to cycle with a period of 12 months, suggesting that the monthly indicator variables are not sufficient to account for the seasonality in the series (Fig. 6.5). In the next chapter, we find that this can be accounted for using a non-stationary model with a stochastic seasonal component. In the meantime, we note that the best-fitting ARMA(p, q) model can be chosen using the smallest AIC, either by trying a range of combinations of p and q in the arima function or by using a for loop with upper bounds on p and q, taken as 2 in the code shown below. In each step of the for loop, the AIC of the fitted model is compared with the currently stored smallest value. If the model is found to be an improvement (i.e., has a smaller AIC value), then the new value and model are stored. To start with, best.aic is initialised to infinity (Inf). After the loop is complete, the best model can be found in best.order, and in this case the best model turns out to be an AR(2) model.

Fig. 6.5. Electricity production series: correlogram of the residual series of the fitted regression model.

> best.order <- c(0, 0, 0)
> best.aic <- Inf
> for (i in 0:2) for (j in 0:2) {
    fit.aic <- AIC(arima(resid(Elec.lm), order = c(i, 0, j)))
    if (fit.aic < best.aic) {
      best.order <- c(i, 0, j)
      best.arma <- arima(resid(Elec.lm), order = best.order)
      best.aic <- fit.aic
    }
  }
> best.order
[1] 2 0 0
> acf(resid(best.arma))

The predict function can be used both to forecast future values from the fitted regression model and to forecast the future errors associated with the regression model using the ARMA model fitted to the residuals from the regression. These two forecasts can then be summed to give a forecasted value of the logarithm of electricity production, which would then need to be anti-logged and perhaps adjusted using a bias correction factor. As predict is a generic R function, it works in different ways for different input objects and classes. For a fitted regression model of class lm, the predict function requires the new set of data to be in the form of a data frame (object class data.frame). For a fitted ARMA model of class arima, the predict function requires just the number of time steps ahead for the desired forecast. In the latter case, predict produces an object that has both the predicted values and their standard errors, which can be extracted using pred and se, respectively. In the code below, the electricity production for each month of the next three years is predicted.

> new.time <- seq(length(Elec.ts) + 1, length = 36)
> new.data <- data.frame(Time = new.time, Imth = rep(1:12, 3))
> predict.lm <- predict(Elec.lm, new.data)
> predict.arma <- predict(best.arma, n.ahead = 36)
> elec.pred <- ts(exp(predict.lm + predict.arma$pred), start = 1991, freq = 12)
> ts.plot(cbind(Elec.ts, elec.pred), lty = 1:2)

Fig. 6.6. Electricity production series: correlogram of the residual series of the best-fitting ARMA model.

The plot of the forecasted values suggests that the predicted values for winter may be underestimated by the fitted model (Fig. 6.7), which may be due to the remaining seasonal autocorrelation in the residuals (see Fig.
6.6).</p><p>This problem will be addressed in the next chapter.</p><p>Time</p><p>1960 1965 1970 1975 1980 1985 1990 1995</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>16</p><p>00</p><p>0</p><p>Fig. 6.7. Electricity production series: observed (solid line) and forecasted values</p><p>(dotted line). The forecasted values are not likely to be accurate because of the</p><p>seasonal autocorrelation present in the residuals for the fitted model.</p><p>6.6.4 Wave tank data</p><p>The data in the file wave.dat are the surface height of water (mm), relative</p><p>to the still water level, measured using a capacitance probe positioned at the</p><p>centre of a wave tank. The continuous voltage signal from this capacitance</p><p>probe was sampled every 0.1 second over a 39.6-second period. The objective</p><p>is to fit a suitable ARMA(p, q) model that can be used to generate a realistic</p><p>wave input to a mathematical model for an ocean-going tugboat in a computer</p><p>simulation. The results of the computer simulation will be compared with tests</p><p>using a physical model of the tugboat in the wave tank.</p><p>The pacf suggests that p should be at least 2 (Fig. 6.8). The best-fitting</p><p>ARMA(p, q) model, based on a minimum variance of residuals, was obtained</p><p>with both p and q equal to 4. The acf and pacf of the residuals from this model</p><p>are consistent with the residuals being a realisation of white noise (Fig. 6.9).</p><p>> www wave.dat attach (wave.dat)</p><p>> layout(1:3)</p><p>> plot (as.ts(waveht), ylab = 'Wave height')</p><p>> acf (waveht)</p><p>> pacf (waveht)</p><p>> wave.arma acf (wave.arma$res[-(1:4)])</p><p>> pacf (wave.arma$res[-(1:4)])</p><p>> hist(wave.arma$res[-(1:4)], xlab='height / mm', main='')</p><p>134 6 Stationary Models</p><p>Time</p><p>W</p><p>av</p><p>e</p><p>he</p><p>ig</p><p>ht</p><p>0 100 200 300 400</p><p>−</p><p>50</p><p>0</p><p>50</p><p>0</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>5</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20 25</p><p>−</p><p>0.</p><p>6</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 6.8. Wave heights: time plot, acf, and pacf.</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20 25</p><p>−</p><p>0.</p><p>15</p><p>0.</p><p>00</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>height / mm</p><p>F</p><p>re</p><p>qu</p><p>en</p><p>cy</p><p>−400 −200 0 200 400 600</p><p>0</p><p>40</p><p>80</p><p>Fig. 6.9. Residuals after fitting an ARMA(4, 4) model to wave heights: acf, pacf,</p><p>and histogram.</p><p>6.8 Exercises 135</p><p>6.7 Summary of R commands</p><p>arima.sim simulates data from an ARMA (or ARIMA) process</p><p>arima fits an ARMA (or ARIMA) model to data</p><p>seq generates a sequence</p><p>expression used to plot maths symbol</p><p>6.8 Exercises</p><p>1. Using the relation Cov(</p><p>∑</p><p>xt,</p><p>∑</p><p>yt) =</p><p>∑∑</p><p>Cov(xt, yt) (Equation (2.15))</p><p>for time series {xt} and {yt}, prove Equation (6.3).</p><p>2. The series {wt} is white noise with zero mean and variance σ2</p><p>w. For the</p><p>following moving average models, find the autocorrelation function and</p><p>determine whether they are invertible. 
In addition, simulate 100 observa-</p><p>tions for each model in R, compare the time plots of the simulated series,</p><p>and comment on how the two series might be distinguished.</p><p>a) xt = wt + 1</p><p>2wt−1</p><p>b) xt = wt + 2wt−1</p><p>3. Write the following models in ARMA(p, q) notation and determine whether</p><p>they are stationary and/or invertible (wt is white noise). In each case,</p><p>check for parameter redundancy and ensure that the ARMA(p, q) nota-</p><p>tion is expressed in the simplest form.</p><p>a) xt = xt−1 − 1</p><p>4xt−2 + wt + 1</p><p>2wt−1</p><p>b) xt = 2xt−1 − xt−2 + wt</p><p>c) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + wt − 1</p><p>2wt−1 + 1</p><p>4wt−2</p><p>d) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + 1</p><p>2wt − wt−1</p><p>e) xt = 7</p><p>10xt−1 − 1</p><p>10xt−2 + wt − 3</p><p>2wt−1</p><p>f) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + wt − 1</p><p>3wt−1 + 1</p><p>6wt−2</p><p>4. a) Fit a suitable regression model to the air passenger series. Comment</p><p>on the correlogram of the residuals from the fitted regression model.</p><p>b) Fit an ARMA(p, q) model for values of p and q no greater than 2</p><p>to the residual series of the fitted regression model. Choose the best</p><p>fitting model based on the AIC and comment on its correlogram.</p><p>c) Forecast the number of passengers travelling on the airline in 1961.</p><p>5. a) Write an R function that calculates the autocorrelation function (Equa-</p><p>tion (6.13)) for an ARMA(1, 1) process. Your function should take</p><p>parameters representing α and β for the AR and MA components.</p><p>136 6 Stationary Models</p><p>b) Plot the autocorrelation function above for the case with α = 0.7 and</p><p>β = −0.5 for lags 0 to 20.</p><p>c) Simulate n = 100 values of the ARMA(1, 1) model with α = 0.7</p><p>and β = −0.5, and compare the sample correlogram to the theoretical</p><p>correlogram plotted in part (b). Repeat for n = 1000.</p><p>6. Let {xt : t = 1, . . . , n} be a stationary time series with E(xt) = µ,</p><p>Var(xt) = σ2, and Cor(xt, xt+k) = ρk. Using Equation (5.5) from Chapter</p><p>5:</p><p>a) Calculate Var(x̄) when {xt} is the MA(1) process xt = wt + 1</p><p>2wt−1.</p><p>b) Calculate Var(x̄) when {xt} is the MA(1) process xt = wt − 1</p><p>2wt−1.</p><p>c) Compare each of the above with the variance of the sample mean</p><p>obtained for the white noise case ρk = 0 (k > 0). Of the three mod-</p><p>els, which would have the most accurate estimate of µ based on the</p><p>variances of their sample means?</p><p>d) A simulated example that extracts the variance of the sample mean</p><p>for 100 Gaussian white noise series each of length 20 is given by</p><p>> set.seed(1)</p><p>> m for (i in 1:100) m[i] var(m)</p><p>[1] 0.0539</p><p>For each of the two MA(1) processes, write R code that extracts the</p><p>variance of the sample mean of 100 realisations of length 20. Compare</p><p>them with the variances calculated in parts (a) and (b).</p><p>7. If the sample autocorrelation function of a time series appears to cut off</p><p>after lag q (i.e., autocorrelations at lags higher than q are not significantly</p><p>different from 0 and do not follow any clear patterns), then an MA(q)</p><p>model might be suitable. An AR(p) model is indicated when the partial</p><p>autocorrelation function cuts off after lag p. If there are no convincing</p><p>cutoff points for either function, an ARMA model may provide the best</p><p>fit. 
Plot the autocorrelation and partial autocorrelation functions for the</p><p>simulated ARMA(1, 1) series given in §6.6.1. Using the AIC, choose a</p><p>best-fitting AR model and a best-fitting MA model. Which best-fitting</p><p>model (AR or MA) has the smallest number of parameters? Compare this</p><p>model with the fitted ARMA(1, 1) model of §6.6.1, and comment.</p><p>7</p><p>Non-stationary Models</p><p>7.1 Purpose</p><p>As we have discovered in the previous chapters, many time series are non-</p><p>stationary because of seasonal effects or trends. In particular, random walks,</p><p>which characterise many types of series, are non-stationary but can be trans-</p><p>formed to a stationary series by first-order differencing (§4.4). In this chap-</p><p>ter we first extend the random walk model to include autoregressive and</p><p>moving average terms. As the differenced series needs to be aggregated (or</p><p>‘integrated’) to recover the original series, the underlying stochastic process</p><p>is called autoregressive integrated moving average, which is abbreviated to</p><p>ARIMA.</p><p>The ARIMA process can be extended to include seasonal terms, giving a</p><p>non-stationary seasonal ARIMA (SARIMA) process. Seasonal ARIMA models</p><p>are powerful tools in the analysis of time series as they are capable of modelling</p><p>a very wide range of series. Much of the methodology was pioneered by Box</p><p>and Jenkins in the 1970’s.</p><p>Series may also be non-stationary because the variance is serially corre-</p><p>lated (technically known as conditionally heteroskedastic), which usually re-</p><p>sults in periods of volatility , where there is a clear change in variance. This</p><p>is common in financial series, but may also occur in other series such as cli-</p><p>mate records. One approach to modelling series of this nature is to use an</p><p>autoregressive model for the variance, i.e. an autoregressive conditional het-</p><p>eroskedastic (ARCH) model. We consider this approach, along with the gen-</p><p>eralised ARCH (GARCH) model in the later part of the chapter.</p><p>7.2 Non-seasonal ARIMA models</p><p>7.2.1 Differencing and the electricity series</p><p>Differencing a series {xt} can remove trends, whether these trends are stochas-</p><p>tic, as in a random walk, or deterministic, as in the case of a linear trend. In</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 137</p><p>Use R, DOI 10.1007/978-0-387-88698-5 7,</p><p>© Springer Science+Business Media, LLC 2009</p><p>138 7 Non-stationary Models</p><p>the case of a random walk, xt = xt−1 + wt, the first-order differenced se-</p><p>ries is white noise {wt} (i.e., ∇xt = xt − xt−1 = wt) and so is stationary.</p><p>In contrast, if xt = a + bt + wt, a linear trend with white noise errors, then</p><p>∇xt = xt−xt−1 = b+wt−wt−1, which is a stationary moving average process</p><p>rather than white noise. Notice that the consequence of differencing a linear</p><p>trend with white noise is an MA(1) process, whereas subtraction of the trend,</p><p>a+ bt, would give white noise. This raises an issue of whether or not it is sen-</p><p>sible to use differencing to remove a deterministic trend. The arima function</p><p>in R does not allow the fitted differenced models to include a constant. 
If you</p><p>wish to fit a differenced model to a deterministic trend using R you need to</p><p>difference, then mean adjust the differenced series to have a mean of 0, and</p><p>then fit an ARMA model to the adjusted differenced series using arima with</p><p>include.mean set to FALSE and d = 0.</p><p>A corresponding issue arises with simulations from an ARIMA model.</p><p>Suppose xt = a + bt + wt so ∇xt = yt = b + wt − wt−1. It follows directly</p><p>from the definitions that the inverse of yt = ∇xt is xt = x0 +</p><p>∑t</p><p>i=1 yi. If an</p><p>MA(1) model is fitted to the differenced time series, {yt}, the coefficient of</p><p>wt−1 is unlikely to be identified as precisely −1. It follows that the simulated</p><p>{xt} will have increasing variance (Exercise 3) about a straight line.</p><p>We can take first-order differences in R using the difference function diff.</p><p>For example, with the Australian electricity production series, the code below</p><p>plots the data and first-order differences of the natural logarithm of the series.</p><p>Note that in the layout command below the first figure is allocated two 1s</p><p>and is therefore plotted over half (i.e., the first two fourths) of the frame.</p><p>> www CBE Elec.ts layout(c(1, 1, 2, 3))</p><p>> plot(Elec.ts)</p><p>> plot(diff(Elec.ts))</p><p>> plot(diff(log(Elec.ts)))</p><p>The increasing trend is no longer apparent in the plots of the differenced series</p><p>(Fig. 7.1).</p><p>7.2.2 Integrated model</p><p>A series {xt} is integrated of order d, denoted as I(d), if the dth difference of</p><p>{xt} is white noise {wt}; i.e., ∇dxt = wt. Since ∇d ≡ (1 − B)d, where B is</p><p>the backward shift operator, a series {xt} is integrated of order d if</p><p>(1−B)dxt = wt (7.1)</p><p>The random walk is the special case I(1). The diff command from the pre-</p><p>vious section can be used to obtain higher-order differencing either by re-</p><p>peated application or setting the parameter d to the required values; e.g.,</p><p>7.2 Non-seasonal ARIMA models 139</p><p>(a)</p><p>Time</p><p>S</p><p>er</p><p>ie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>20</p><p>00</p><p>40</p><p>00</p><p>60</p><p>00</p><p>80</p><p>00</p><p>10</p><p>00</p><p>0</p><p>14</p><p>00</p><p>0</p><p>(b)</p><p>Time</p><p>D</p><p>iff</p><p>s</p><p>er</p><p>ie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>−</p><p>15</p><p>00</p><p>0</p><p>10</p><p>00</p><p>(c)</p><p>Time</p><p>D</p><p>iff</p><p>lo</p><p>g−</p><p>se</p><p>rie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>−</p><p>0.</p><p>15</p><p>0.</p><p>05</p><p>0.</p><p>20</p><p>Fig. 7.1. (a) Plot of Australian electricity production series; (b) plot of the first-</p><p>order differenced series; (c) plot of the first-order differenced log-transformed series.</p><p>diff(diff(x)) and diff(x, d=2) would both produce second-order differ-</p><p>enced series of x. Second-order differencing may sometimes successfully reduce</p><p>a series with an underlying curve trend to white noise. A further parameter</p><p>(lag) can be used to set the lag of the differencing. By default, lag is set to</p><p>unity, but other values can be useful for removing additive seasonal effects.</p><p>For example, diff(x, lag=12) will remove both a linear trend and additive</p><p>seasonal effects in a monthly series.</p><p>7.2.3 Definition and examples</p><p>A time series {xt} follows an ARIMA(p, d, q) process if the dth differences of</p><p>the {xt} series are an ARMA(p, q) process. 
If we introduce yt = (1 − B)^d xt, then θp(B)yt = φq(B)wt. We can now substitute for yt to obtain the more succinct form for an ARIMA(p, d, q) process as

θp(B)(1 − B)^d xt = φq(B)wt     (7.2)

where θp and φq are polynomials of orders p and q, respectively. Some examples of ARIMA models are:

(a) xt = xt−1 + wt + βwt−1, where β is a model parameter. To see which model this represents, collect together like terms, factorise them, and express them in terms of the backward shift operator: (1 − B)xt = (1 + βB)wt. Comparing this with Equation (7.2), we can see that {xt} is ARIMA(0, 1, 1), which is sometimes called an integrated moving average model, denoted as IMA(1, 1). In general, ARIMA(0, d, q) ≡ IMA(d, q).

(b) xt = αxt−1 + xt−1 − αxt−2 + wt, where α is a model parameter. Rearranging and factorising gives (1 − αB)(1 − B)xt = wt, which is ARIMA(1, 1, 0), also known as an integrated autoregressive process and denoted as ARI(1, 1). In general, ARI(p, d) ≡ ARIMA(p, d, 0).

7.2.4 Simulation and fitting

An ARIMA(p, d, q) process can be fitted to data using the R function arima with the parameter order set to c(p, d, q). An ARIMA(p, d, q) process can be simulated in R by writing appropriate code. For example, in the code below, data for the ARIMA(1, 1, 1) model xt = 0.5xt−1 + xt−1 − 0.5xt−2 + wt + 0.3wt−1 are simulated and the model fitted to the simulated series to recover the parameter estimates.

> set.seed(1)
> x <- w <- rnorm(1000)
> for (i in 3:1000) x[i] <- 0.5 * x[i - 1] + x[i - 1] - 0.5 * x[i - 2] + w[i] + 0.3 * w[i - 1]
> arima(x, order = c(1, 1, 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
        ar1    ma1
      0.423  0.331
s.e.  0.043  0.045

sigma^2 estimated as 1.07: log likelihood = -1450, aic = 2906

Writing your own code has the advantage that it helps to ensure that you understand the model. However, an ARIMA simulation can be carried out using the inbuilt R function arima.sim, which has the parameters model and n to specify the model and the simulation length, respectively.

> x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 1000)
> arima(x, order = c(1, 1, 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
        ar1    ma1
      0.557  0.250
s.e.  0.037  0.044

sigma^2 estimated as 1.08: log likelihood = -1457, aic = 2921

7.2.5 IMA(1, 1) model fitted to the beer production series

The Australian beer production series is in the second column of the dataframe CBE in §7.2.1. The beer data are dominated by a trend of increasing beer production over the period, so a simple integrated model IMA(1, 1) is fitted to allow for this trend and a carryover of production from the previous month. The IMA(1, 1) model is often appropriate because it represents a linear trend with white noise added. The residuals are analysed using the correlogram (Fig. 7.2), which has peaks at yearly cycles and suggests that a seasonal term is required.

> Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12)
> Beer.ima <- arima(Beer.ts, order = c(0, 1, 1))
> Beer.ima

Call:
arima(x = Beer.ts, order = c(0, 1, 1))

Coefficients:
         ma1
      -0.333
s.e.   0.056

sigma^2 estimated as 360: log likelihood = -1723, aic = 3451

> acf(resid(Beer.ima))

From the output above, the fitted model is xt = xt−1 + wt − 0.33wt−1. Forecasts can be obtained using this model, with t set to the value required for the forecast.
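As a minimal sketch (not part of the original text), a one-step-ahead forecast can be computed directly from the fitted equation: the future white noise term is replaced by its expected value of zero, and the last in-sample residual stands in for the most recent wt.

theta <- coef(Beer.ima)["ma1"]     # approximately -0.33
w.hat <- resid(Beer.ima)           # in-sample residuals standing in for the w_t
n <- length(Beer.ts)
Beer.ts[n] + theta * w.hat[n]      # forecast of x_{n+1}; should agree closely with
                                   # predict(Beer.ima, n.ahead = 1)$pred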
Forecasts can also be obtained using the predict function in R with</p><p>the parameter n.ahead set to the number of values in the future. For example,</p><p>the production for the next year in the record is obtained using predict and</p><p>the total annual production for 1991 obtained by summation:</p><p>> Beer.1991 sum(Beer.1991$pred)</p><p>[1] 2365</p><p>142 7 Non-stationary Models</p><p>0.0 0.5 1.0 1.5 2.0</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.2. Australian beer series: correlogram of the residuals of the fitted IMA(1,</p><p>1) model</p><p>7.3 Seasonal ARIMA models</p><p>7.3.1 Definition</p><p>A seasonal ARIMA model uses differencing at a lag equal to the number of</p><p>seasons (s) to remove additive seasonal effects. As with lag 1 differencing to</p><p>remove a trend, the lag s differencing introduces a moving average term. The</p><p>seasonal ARIMA model includes autoregressive and moving average terms at</p><p>lag s. The seasonal ARIMA(p, d, q)(P , D, Q)s model can be most succinctly</p><p>expressed using the backward shift operator</p><p>ΘP (Bs)θp(B)(1−Bs)D(1−B)dxt = ΦQ(Bs)φq(B)wt (7.3)</p><p>where ΘP , θp, ΦQ, and φq are polynomials of orders P , p, Q, and q, respec-</p><p>tively. In general, the model is non-stationary, although if D = d = 0 and the</p><p>roots of the characteristic equation (polynomial terms on the left-hand side of</p><p>Equation (7.3)) all exceed unity in absolute value, the resulting model would</p><p>be stationary. Some examples of seasonal ARIMA models are:</p><p>(a) A simple AR model with a seasonal period of 12 units, denoted as</p><p>ARIMA(0, 0, 0)(1, 0, 0)12, is xt = αxt−12 + wt. Such a model would</p><p>be appropriate for monthly data when only the value in the month of the</p><p>previous year influences the current monthly value. The model is station-</p><p>ary when |α−1/12| > 1.</p><p>(b) It is common to find series with stochastic trends that nevertheless</p><p>have seasonal influences. The model in (a) above could be extended to</p><p>xt = xt−1 + αxt−12 − αxt−13 + wt. Rearranging and factorising gives</p><p>7.3 Seasonal ARIMA models 143</p><p>(1 − αB12)(1 − B)xt = wt or Θ1(B12)(1 − B)xt = wt, which, on com-</p><p>paring with Equation (7.3), is ARIMA(0, 1, 0)(1, 0, 0)12. Note that this</p><p>model could also be written ∇xt = α∇xt−12 +wt, which emphasises that</p><p>the change at time t depends on the change at the same time (i.e., month)</p><p>of the previous year. The model is non-stationary since the polynomial on</p><p>the left-hand side contains the term (1 − B), which implies that there</p><p>exists a unit root B = 1.</p><p>(c) A simple quarterly seasonal moving average model is xt = (1−βB4)wt =</p><p>wt−βwt−4. This is stationary and only suitable for data without a trend.</p><p>If the data also contain a stochastic trend, the model could be extended</p><p>to include first-order differences, xt = xt−1 + wt − βwt−4, which is an</p><p>ARIMA(0, 1, 0)(0, 0, 1)4 process. Alternatively, if the seasonal terms con-</p><p>tain a stochastic trend, differencing can be applied at the seasonal period</p><p>to give xt</p><p>= xt−4 + wt − βwt−4, which is ARIMA(0, 0, 0)(0, 1, 1)4.</p><p>You should be aware that differencing at lag s will remove a linear trend,</p><p>so there is a choice whether or not to include lag 1 differencing. 
If lag 1</p><p>differencing is included, when a linear trend is appropriate, it will introduce</p><p>moving average terms into a white noise series. As an example, consider a time</p><p>series of period 4 that is the sum of a linear trend, four additive seasonals,</p><p>and white noise:</p><p>xt = a+ bt+ s[t] + wt</p><p>where [t] is the remainder after division of t by 4, so s[t] = s[t−4]. First, consider</p><p>first-order differencing at lag 4 only. Then,</p><p>(1−B4)xt = xt − xt−4</p><p>= a+ bt− (a+ b(t− 4)) + s[t] − s[t−4] + wt − wt−4</p><p>= 4b+ wt − wt−4</p><p>Formally, the model can be expressed as ARIMA(0, 0, 0)(0, 1, 1)4 with a</p><p>constant term 4b. Now suppose we apply first-order differencing at lag 1 before</p><p>differencing at lag 4. Then,</p><p>(1−B4)(1−B)xt = (1−B4)(b+ s[t] − s[t−1] + wt − wt−1)</p><p>= wt − wt−1 − wt−4 + wt−5</p><p>which is a ARIMA(0, 1, 1)(0, 1, 1)4 model with no constant term.</p><p>7.3.2 Fitting procedure</p><p>Seasonal ARIMA models can potentially have a large number of parameters</p><p>and combinations of terms. Therefore, it is appropriate to try out a wide</p><p>range of models when fitting to data and to choose a best-fitting model using</p><p>144 7 Non-stationary Models</p><p>an appropriate criterion such as the AIC. Once a best-fitting model has been</p><p>found, the correlogram of the residuals should be verified as white noise. Some</p><p>confidence in the best-fitting model can be gained by deliberately overfitting</p><p>the model by including further parameters and observing an increase in the</p><p>AIC.</p><p>In R, this approach to fitting a range of seasonal ARIMA models is straight-</p><p>forward, since the fitting criteria can be called by nesting functions and the</p><p>‘up arrow’ on the keyboard used to recall the last command, which can then</p><p>be edited to try a new model. Any obvious terms, such as a differencing term</p><p>if there is a trend, should be included and retained in the model to reduce</p><p>the number of comparisons. The model can be fitted with the arima function,</p><p>which requires an additional parameter seasonal to specify the seasonal com-</p><p>ponents. In the example below, we fit two models with first-order terms to</p><p>the logarithm of the electricity production series. The first uses autoregressive</p><p>terms and the second uses moving average terms. The parameter d = 1 is re-</p><p>tained in both the models since we found in §7.2.1 that first-order differencing</p><p>successfully removed the trend in the series. The seasonal ARI model provides</p><p>the better fit since it has the smallest AIC.</p><p>> AIC (arima(log(Elec.ts), order = c(1,1,0),</p><p>seas = list(order = c(1,0,0), 12)))</p><p>[1] -1765</p><p>> AIC (arima(log(Elec.ts), order = c(0,1,1),</p><p>seas = list(order = c(0,0,1), 12)))</p><p>[1] -1362</p><p>It is straightforward to check a range of models by a trial-and-error approach</p><p>involving just editing a command on each trial to see if an improvement in the</p><p>AIC occurs. Alternatively, we could write a simple function that fits a range of</p><p>ARIMA models and selects the best-fitting model. This approach works better</p><p>when the conditional sum of squares method CSS is selected in the arima</p><p>function, as the algorithm is more robust. To avoid over parametrisation, the</p><p>consistent Akaike Information Criteria (CAIC; see Bozdogan, 1987) can be</p><p>used in model selection. 
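As a rough sketch (the exact penalty used here is an assumption, following the CAIC idea of charging each parameter log(n) + 1 rather than the AIC's 2), the criterion can be computed directly from a fitted arima object:

caic <- function(fit, n) -2 * fit$loglik + (log(n) + 1) * length(fit$coef)

Smaller values indicate the preferred model, just as with the AIC; a penalty of this form is used in the search function below.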
An example program follows.

get.best.arima <- function(x.ts, maxord = c(1, 1, 1, 1, 1, 1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
    for (P in 0:maxord[4]) for (D in 0:maxord[5]) for (Q in 0:maxord[6])
    {
      fit <- arima(x.ts, order = c(p, d, q),
                   seas = list(order = c(P, D, Q), frequency(x.ts)),
                   method = "CSS")
      fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
      if (fit.aic < best.aic)
      {
        best.aic <- fit.aic
        best.fit <- fit
        best.model <- c(p, d, q, P, D, Q)
      }
    }
  list(best.aic, best.fit, best.model)
}

> best.arima.elec <- get.best.arima(log(Elec.ts), maxord = c(2, 2, 2, 2, 2, 2))
> best.fit.elec <- best.arima.elec[[2]]
> acf(resid(best.fit.elec))
> best.arima.elec[[3]]
[1] 0 1 1 2 0 2
> ts.plot(cbind(window(Elec.ts, start = 1981),
      exp(predict(best.fit.elec, 12)$pred)), lty = 1:2)

From the code above, we see that the best-fitting model using terms up to second order is ARIMA(0, 1, 1)(2, 0, 2)12. Although higher-order terms could be tried by increasing the values in maxord, this would seem unnecessary since the residuals are approximately white noise (Fig. 7.3b). For the predicted values (Fig. 7.3a), a bias correction factor could be used, although this would seem unnecessary given that the residual standard deviation is small compared with the predictions.

Fig. 7.3. Electricity production series: (a) time plot for last 10 years, with added predicted values (dotted); (b) correlogram of the residuals of the best-fitting seasonal ARIMA model.

7.4 ARCH models

7.4.1 S&P500 series

Standard and Poors (of the McGraw-Hill companies) publishes a range of financial indices and credit ratings. Consider the following time plot and correlogram of the daily returns of the S&P500 Index¹ (from January 2, 1990 to December 31, 1999), available in the MASS library within R.

> library(MASS)
> data(SP500)
> plot(SP500, type = 'l')
> acf(SP500)

¹ The S&P500 Index is calculated from the stock prices of 500 large corporations. The time series in R is the returns of the S&P500 Index, defined as 100 ln(SPIt/SPIt−1), where SPIt is the value of the S&P500 Index on trading day t.

The time plot of the returns is shown in Figure 7.4(a), and at first glance the series appears to be a realisation of a stationary process. However, on closer inspection, it seems that the variance is smallest in the middle third of the series and greatest in the last third. The series exhibits periods of increased variability, sometimes called volatility in the financial literature, although it does not increase in a regular way. When a variance is not constant in time but changes in a regular way, as in the airline and electricity data (where the variance increased with the trend), the series is called heteroskedastic. If a series exhibits periods of increased variance, so that the variance is correlated in time (as observed in the S&P500 data), the series exhibits volatility and is called conditionally heteroskedastic.

Note that the correlogram of a volatile series does not differ significantly from white noise (Fig. 7.4b), but the series is non-stationary since the variance is different at different times. If a correlogram appears to be white noise (e.g., Fig.
7.4b), then volatility can be detected by looking at the correlogram of the squared values, since the squared values are equivalent to the variance (provided the series is adjusted to have a mean of zero).

Fig. 7.4. Standard and Poors returns of the S&P500 Index: (a) time plot; (b) correlogram.

The mean of the returns of the S&P500 Index between January 2, 1990 and December 31, 1999 is 0.0458. Although this is small compared with the variance, it accounts for an increase in the S&P500 Index from 360 to 1469 over the 2527 trading days. The correlogram of the squared mean-adjusted values of the S&P500 Index is given by

> acf((SP500 - mean(SP500))^2)

From this we can see that there is evidence of serial correlation in the squared values, so there is evidence of conditionally heteroskedastic behaviour and volatility (Fig. 7.5).

Fig. 7.5. Returns of the Standard and Poors S&P500 Index: correlogram of the squared mean-adjusted values.

7.4.2 Modelling volatility: Definition of the ARCH model

In order to account for volatility, we require a model that allows for conditional changes in the variance. One approach to this is to use an autoregressive model for the variance process. This leads to the following definition. A series {ε_t} is first-order autoregressive conditional heteroskedastic, denoted ARCH(1), if

ε_t = w_t √(α_0 + α_1 ε_{t-1}^2)     (7.4)

where {w_t} is white noise with zero mean and unit variance and α_0 and α_1 are model parameters.

To see how this introduces volatility, square Equation (7.4) to calculate the variance:

Var(ε_t) = E(ε_t^2)
         = E(w_t^2) E(α_0 + α_1 ε_{t-1}^2)
         = E(α_0 + α_1 ε_{t-1}^2)
         = α_0 + α_1 Var(ε_{t-1})     (7.5)

since {w_t} has unit variance and {ε_t} has zero mean. If we compare Equation (7.5) with the AR(1) process x_t = α_0 + α_1 x_{t-1} + w_t, we see that the variance of an ARCH(1) process behaves just like an AR(1) model. Hence, in model fitting, a decay in the autocorrelations of the squared residuals should indicate whether an ARCH model is appropriate or not. The model should only be applied to a prewhitened residual series {ε_t} that is uncorrelated and contains no trends or seasonal changes, such as might be obtained after fitting a satisfactory SARIMA model.

7.4.3 Extensions and GARCH models

The first-order ARCH model can be extended to a pth-order process by including higher lags. An ARCH(p) process is given by

ε_t = w_t √(α_0 + Σ_{i=1}^{p} α_i ε_{t-i}^2)     (7.6)

where {w_t} is again white noise with zero mean and unit variance.

A further extension, widely used in financial applications, is the generalised ARCH model, denoted GARCH(q, p), which has the ARCH(p) model as the special case GARCH(0, p).
A series {ε_t} is GARCH(q, p) if

ε_t = w_t √(h_t)     (7.7)

where

h_t = α_0 + Σ_{i=1}^{p} α_i ε_{t-i}^2 + Σ_{j=1}^{q} β_j h_{t-j}     (7.8)

and α_i and β_j (i = 0, 1, . . . , p; j = 1, . . . , q) are model parameters. In R, a GARCH model can be fitted using the garch function in the tseries library (Trapletti and Hornik, 2008). An example now follows.

7.4.4 Simulation and fitted GARCH model

In the following code, data are simulated for the GARCH(1, 1) model a_t = w_t √(h_t), where h_t = α_0 + α_1 a_{t-1}^2 + β_1 h_{t-1}, with α_1 + β_1 < 1.

> set.seed(1)
> alpha0 <- 0.1
> alpha1 <- 0.4
> beta1 <- 0.2
> w <- rnorm(10000)
> a <- rep(0, 10000)
> h <- rep(0, 10000)
> for (i in 2:10000) {
      h[i] <- alpha0 + alpha1 * (a[i - 1]^2) + beta1 * h[i - 1]
      a[i] <- w[i] * sqrt(h[i])
  }
> acf(a)
> acf(a^2)

The series in a exhibits the GARCH characteristics of uncorrelated values (Fig. 7.6a) but correlated squared values (Fig. 7.6b).

In the following example, a GARCH model is fitted to the simulated series using the garch function, which can be seen to recover the original parameters since these fall within the 95% confidence intervals. The default is GARCH(1, 1), which often provides an adequate model, but higher-order models can be specified with the parameter order = c(p, q) for some choice of p and q.

Fig. 7.6. Correlograms for GARCH series: (a) simulated series; (b) squared values of simulated series.

> library(tseries)
> a.garch <- garch(a, grad = "numerical", trace = FALSE)
> confint(a.garch)
    2.5 %  97.5 %
a0  0.0882  0.109
a1  0.3308  0.402
b1  0.1928  0.295

In the example above, we have used the parameter trace = FALSE to suppress output and a numerical estimate of the gradient, grad = "numerical", which is slightly more robust (in the sense of algorithmic convergence) than the default.

7.4.5 Fit to S&P500 series

The GARCH model is fitted to the S&P500 return series. The residual series {ŵ_t} of the GARCH model is calculated from

ŵ_t = ε_t / √(ĥ_t)

If the GARCH model is suitable, the residual series should appear to be a realisation of white noise with zero mean and unit variance. In the case of a GARCH(1, 1) model,

ĥ_t = α̂_0 + α̂_1 ε_{t-1}^2 + β̂_1 ĥ_{t-1}

with ĥ_1 = 0, for t = 2, . . . , n.² The calculations are performed by the function garch. The first value in the residual series is not available (NA), so we remove it using [-1] before finding the correlograms of the residual and squared residual series:

> sp.garch <- garch(SP500, trace = FALSE)
> sp.res <- sp.garch$res[-1]
> acf(sp.res)
> acf(sp.res^2)

Both correlograms suggest that the residuals of the fitted GARCH model behave like white noise, indicating that a satisfactory fit has been obtained (Fig. 7.7).

Fig. 7.7.
GARCH model fitted to mean-adjusted S&P500 returns: (a) correlogram</p><p>of the residuals; (b) correlogram of the squared residuals.</p><p>2 Notice that a residual for time t = 1 cannot be calculated from this formula.</p><p>152 7 Non-stationary Models</p><p>7.4.6 Volatility in climate series</p><p>Recently there have been studies on volatility in climate series (e.g., Romilly,</p><p>2005). Temperature data (1850–2007; see Brohan et al. 2006) for the southern</p><p>hemisphere were extracted from the database maintained by the University</p><p>of East Anglia Climatic Research Unit and edited into a form convenient for</p><p>reading into R. In the following code, the series are read in, plotted (Fig. 7.8),</p><p>and a best-fitting seasonal ARIMA model obtained using the get.best.arima</p><p>function given in §7.3.2. Confidence intervals for the parameters were then</p><p>evaluated (the transpose t() was taken to provide these in rows instead of</p><p>columns).</p><p>Time</p><p>st</p><p>em</p><p>p.</p><p>ts</p><p>1850 1900 1950 2000</p><p>−</p><p>1.</p><p>0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>Fig. 7.8. The southern hemisphere temperature series.</p><p>> stemp stemp.ts plot(stemp.ts)</p><p>> stemp.best stemp.best[[3]]</p><p>[1] 1 1 2 2 0 1</p><p>> stemp.arima t( confint(stemp.arima) )</p><p>ar1 ma1 ma2 sar1 sar2 sma1</p><p>2.5 % 0.832 -1.45 0.326 0.858 -0.0250 -0.97</p><p>97.5 % 0.913 -1.31 0.453 1.004 0.0741 -0.85</p><p>The second seasonal AR component is not significantly different from zero,</p><p>and therefore the model is refitted leaving this component out:</p><p>> stemp.arima t( confint(stemp.arima) )</p><p>ar1 ma1 ma2 sar1 sma1</p><p>2.5 % 0.83 -1.45 0.324 0.924 -0.969</p><p>97.5 % 0.91 -1.31 0.451 0.996 -0.868</p><p>To check for goodness-of-fit, the correlogram of residuals from the ARIMA</p><p>model is plotted (Fig. 7.9a). In addition, to investigate volatility, the correlo-</p><p>gram of the squared residuals is found (Fig. 7.9b).</p><p>0.0 0.5 1.0 1.5 2.0 2.5</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.5 1.0 1.5 2.0 2.5</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.9. Seasonal ARIMA model fitted to the temperature series: (a) correlogram</p><p>of the residuals; (b) correlogram of the squared residuals.</p><p>> stemp.res layout(1:2)</p><p>154 7 Non-stationary Models</p><p>> acf(stemp.res)</p><p>> acf(stemp.res^2)</p><p>There is clear evidence of volatility since the squared residuals are corre-</p><p>lated at most lags (Fig. 7.9b). Hence, a GARCH model is fitted to the residual</p><p>series:</p><p>> stemp.garch t(confint(stemp.garch))</p><p>a0 a1 b1</p><p>2.5 % 1.06e-05 0.0330 0.925</p><p>97.5 % 1.49e-04 0.0653 0.963</p><p>> stemp.garch.res acf(stemp.garch.res)</p><p>> acf(stemp.garch.res^2)</p><p>Based on the output above, we can see that the coefficients of the fitted</p><p>GARCH model are all statistically significant, since zero does not fall in any of</p><p>the confidence intervals. Furthermore, the correlogram of the residuals shows</p><p>no obvious patterns or significant values (Fig. 7.10). Hence, a satisfactory fit</p><p>has been obtained.</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.10. 
GARCH model fitted to the residuals of the seasonal ARIMA model</p><p>of the temperature series: (a) correlogram of the residuals; (b) correlogram of the</p><p>squared residuals.</p><p>7.6 Exercises 155</p><p>7.4.7 GARCH in forecasts and simulations</p><p>If a GARCH model is fitted to the residual errors of a fitted time series</p><p>model, it will not influence the average prediction at some point in time since</p><p>the mean of the residual errors is zero. Thus, single-point forecasts from a</p><p>fitted time series model remain unchanged when GARCH models are fitted</p><p>to the residuals. However, a fitted GARCH model will affect the variance of</p><p>simulated predicted values and thus result in periods of changing variance or</p><p>volatility in simulated series.</p><p>The main application of GARCH models is for simulation studies, espe-</p><p>cially in finance, insurance, teletraffic, and climatology. In all these applica-</p><p>tions, the periods of high variability tend to lead to untoward events, and it is</p><p>essential to model them in a realistic manner. Typical R code for simulation</p><p>was given in §7.4.4.</p><p>7.5 Summary of R commands</p><p>garch fits a GARCH (or ARCH) model to data</p><p>7.6 Exercises</p><p>In each of the following, {wt} is white noise with zero mean.</p><p>1. Identify each of the following as specific ARIMA models and state whether</p><p>or not they are stationary.</p><p>a) zt = zt−1 − 0.25zt−2 + wt + 0.5wt−1</p><p>b) zt = 2zt−1 − zt−2 + wt</p><p>c) zt = 0.5zt−1 + 0.5zt−2 + wt − 0.5wt−1 + 0.25wt−2</p><p>2. Identify the following as certain multiplicative seasonal ARIMA models</p><p>and find out whether they are invertible and stationary.</p><p>a) zt = 0.5zt−1 + zt−4 − 0.5zt−5 + wt − 0.3wt−1</p><p>b) zt = zt−1 + zt−12 − zt−13 + wt − 0.5wt−1 − 0.5wt−12 + 0.25wt−13</p><p>3. Suppose xt = a+ bt+ wt. Define yt = ∇xt.</p><p>a) Show that xt = x0 +</p><p>∑t</p><p>i=1 yi and identify x0.</p><p>b) Now suppose an MA(1) model is fitted to {yt} and the fitted model is</p><p>yt = b+wt + βwt−1. Show that a simulated {xt} will have increasing</p><p>variance about the line a+ bt unless β is precisely −1.</p><p>156 7 Non-stationary Models</p><p>4. The number of overseas visitors to New Zealand is recorded for each month</p><p>over the period 1977 to 1995 in the file osvisit.dat on the book website</p><p>(http://www.massey.ac.nz/∼pscowper/ts/osvisit.dat). Download the file</p><p>into R and carry out the following analysis. Your solution should include</p><p>any R commands, plots, and comments. Let xt be the number of overseas</p><p>visitors in time period t (in months) and zt = ln(xt).</p><p>a) Comment on the main features in the correlogram for {zt}.</p><p>b) Fit an ARIMA(1, 1, 0) model to {zt} giving the estimated AR pa-</p><p>rameter and the standard deviation of the residuals. Comment on the</p><p>correlogram of the residuals of this fitted ARIMA model.</p><p>c) Fit a seasonal ARIMA(1, 1, 0)(0, 1, 0)12 model to {zt} and plot the</p><p>correlogram of the residuals of this model. Has seasonal differencing</p><p>removed the seasonal effect? Comment.</p><p>d) Choose the best-fitting Seasonal ARIMA model from the following:</p><p>ARIMA(1, 1, 0)(1, 1, 0)12, ARIMA(0, 1, 1)(0, 1, 1)12, ARIMA(1, 1,</p><p>0)(0, 1, 1)12, ARIMA(0, 1, 1)(1, 1, 0)12, ARIMA(1, 1, 1)(1, 1, 1)12,</p><p>ARIMA(1, 1, 1)(1, 1, 0)12, ARIMA(1, 1, 1)(0, 1, 1)12. 
Base your choice</p><p>on the AIC, and comment on the correlogram of the residuals of the</p><p>best-fitting model.</p><p>e) Express the best-fitting model in part (d) above in terms of zt, white</p><p>noise components, and the backward shift operator (you will need</p><p>to write this out by hand, but it is not necessary to expand all the</p><p>factors).</p><p>f) Test the residuals from the best-fitting seasonal ARIMA model for</p><p>stationarity.</p><p>g) Forecast the number of overseas visitors for each month in the next</p><p>year (1996), and give the total number of visitors expected in 1996</p><p>under the fitted model. [Hint: To get the forecasts, you will need to use</p><p>the exponential function of the generated seasonal ARIMA forecasts</p><p>and multiply these by a bias correction factor based on the mean</p><p>square residual error.]</p><p>5. Use the get.best.arima function from §7.3.2 to obtain a best-fitting</p><p>ARIMA(p, d, q)(P , D, Q)12 for all p, d, q, P , D, Q ≤ 2 to the</p><p>logarithm of the Australian chocolate production series (in the file at</p><p>http://www.massey.ac.nz/∼pscowper/ts/cbe.dat). Check that the correl-</p><p>ogram of the residuals for the best-fitting model is representative of white</p><p>noise. Check the correlogram of the squared residuals. Comment on the</p><p>results.</p><p>6. This question uses the data in stockmarket.dat on the book website</p><p>http://www.massey.ac.nz/∼pscowper/ts/, which contains stock market</p><p>7.6 Exercises 157</p><p>data for seven cities for the period January 6, 1986 to December 31, 1997.</p><p>Download the data into R and put the data into a variable x. The first</p><p>three rows should be:</p><p>> x[1:3,]</p><p>Amsterdam Frankfurt London HongKong Japan Singapore NewYork</p><p>1 275.76 1425.56 1424.1 1796.59 13053.8 233.63 210.65</p><p>2 275.43 1428.54 1415.2 1815.53 12991.2 237.37 213.80</p><p>3 278.76 1474.24 1404.2 1826.84 13056.4 240.99 207.97</p><p>a) Plot the Amsterdam series and the first-order differences of the series.</p><p>Comment on the plots.</p><p>b) Fit the following models to the Amsterdam series, and select the best</p><p>fitting model: ARIMA(0, 1, 0); ARIMA(1, 1, 0), ARIMA(0, 1, 1),</p><p>ARIMA(1, 1, 1).</p><p>c) Produce the correlogram of the residuals of the best-fitting model and</p><p>the correlogram of the squared residuals. Comment.</p><p>d) Fit the following GARCH models to the residuals, and select the</p><p>best-fitting model: GARCH(0, 1), GARCH(1, 0), GARCH(1, 1), and</p><p>GARCH(0, 2). Give the estimated parameters of the best-fitting</p><p>model.</p><p>e) Plot the correlogram of the residuals from the best fitting GARCH</p><p>model. Plot the correlogram of the squared residuals from the best</p><p>fitting GARCH model, and comment on the plot.</p><p>7. Predict the monthly temperatures for 2008 using the model fitted to the</p><p>climate series in §7.4.6, and add these predicted values to a time plot of</p><p>the temperature series from 1990. Give an upper bound for the predicted</p><p>values based on a 95% confidence level. Simulate ten possible future tem-</p><p>perature scenarios for 2008. This will involve generating GARCH errors</p><p>and adding these to the predicted values from the fitted seasonal ARIMA</p><p>model.</p><p>8</p><p>Long-Memory Processes</p><p>8.1 Purpose</p><p>Some time series exhibit marked correlations at high lags, and they are re-</p><p>ferred to as long-memory processes. Long-memory is a feature of many geo-</p><p>physical time series. 
Flows in the Nile River have correlations at high lags,</p><p>and Hurst (1951) demonstrated that this affected the optimal design capacity</p><p>of a dam. Mudelsee (2007) shows that long-memory is a hydrological prop-</p><p>erty that can lead to prolonged drought or temporal clustering of extreme</p><p>floods. At a rather different scale, Leland et al. (1993) found that Ethernet</p><p>local area network (LAN) traffic appears to be statistically self-similar and a</p><p>long-memory process. They showed that the nature of congestion produced by</p><p>self-similar traffic differs drastically from that predicted by the traffic models</p><p>used at that time. Mandelbrot and co-workers investigated the relationship</p><p>between self-similarity and long term memory and played a leading role in</p><p>establishing fractal geometry as a subject of study.</p><p>8.2 Fractional differencing</p><p>Beran (1994) describes the qualitative features of a typical sample path (real-</p><p>isation) from a long-memory process. There are relatively long periods during</p><p>which the observations tend to stay at a high level and similar long periods</p><p>during which observations tend to be at a low level. There may appear to</p><p>be trends or cycles over short time periods, but these do not persist and the</p><p>entire series looks stationary. A more objective criterion is that sample corre-</p><p>lations rk decay to zero at a rate that is approximately proportional to k−λ</p><p>for some 0</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 159</p><p>Use R, DOI 10.1007/978-0-387-88698-5 8,</p><p>© Springer Science+Business Media, LLC 2009</p><p>160 8 Long-Memory Processes</p><p>its autocorrelation function. A stationary process xt with long-memory has</p><p>an autocorrelation function ρk that satisfies the condition</p><p>lim</p><p>k→∞</p><p>ρk = ck−λ</p><p>for some 0 1</p><p>2 . The</p><p>Hurst parameter, H, is defined by H = 1 − λ/2 and hence ranges from 1</p><p>2</p><p>to 1. The closer H is to 1, the more persistent the time series. If there is no</p><p>long-memory effect, then H = 1</p><p>2 .</p><p>A fractionally differenced ARIMA process {xt}, FARIMA(p, d, q), has the</p><p>form</p><p>φ(B)(1−B)dxt = ψ(B)wt (8.1)</p><p>for some − 1</p><p>2 cf d cf[1] for (i in 1:39) cf[i+1] 4) degrees of freedom has</p><p>kurtosis 6/(ν−4) and so is heavy tailed. If, for example, d = 0.45 and L = 40,</p><p>then</p><p>(1−B)−dwt = wt + 0.45wt−1 + 0.32625wt−2 + 0.2664375wt−3</p><p>+ · · ·+ 0.0657056wt−40</p><p>The autocorrelation function ρk of a FARIMA(0, d, 0) process tends towards</p><p>Γ (1− d)</p><p>Γ (d)</p><p>|k|2d−1</p><p>for large n. The process is stationary provided − 1</p><p>2 library(fracdiff)</p><p>> set.seed(1)</p><p>> fds.sim x fds.fit n L d fdc fdc[1] for (k in 2:L) fdc[k] y for (i in (L+1):n) {</p><p>csm y z.ar ns z par(mfcol = c(2, 2))</p><p>> plot(as.ts(x), ylab = "x")</p><p>> acf(x) ; acf(y) ; acf(z)</p><p>In Figure 8.1, we show the results when we generate a realisation {xt} from</p><p>a fractional difference model with no AR or MA parameters, FARIMA(0, 0.4,</p><p>0). The very slow decay in both the acf and pacf indicates long-memory. The</p><p>estimate of d is 0.3921. The fractionally differenced series, {yt}, appears to be</p><p>a realisation of DWN. If, instead of fitting a FARIMA(0, d, 0) model, we use</p><p>ar, the order selected is 38. 
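A minimal self-contained sketch of the simulation and fitting steps just described may be useful here; the values d = 0.4 and an AR parameter of 0.9 follow the text, but the exact calls are assumptions rather than the original listing:

library(fracdiff)
set.seed(1)
# simulate a long-memory FARIMA(0, 0.4, 0) series
fds.sim <- fracdiff.sim(10000, d = 0.4)
x <- fds.sim$series
# estimate d by maximum likelihood, with no AR or MA terms
fds.fit <- fracdiff(x, nar = 0, nma = 0)
fds.fit$d                 # close to 0.4 for this realisation
# for the FARIMA(1, 0.4, 0) example, include an AR parameter as well
fds.sim2 <- fracdiff.sim(10000, ar = 0.9, d = 0.4)
fds.fit2 <- fracdiff(fds.sim2$series, nar = 1, nma = 0)
summary(fds.fit2)         # estimates of d and of the AR coefficient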
The residuals from AR(38) also appear to be a</p><p>realisation from DWN, but the single-parameter FARIMA model is far more</p><p>parsimonious.</p><p>In Figure 8.2, we show the results when we generate a realisation {xt}</p><p>from a FARIMA(1, 0.4, 0) model with an AR parameter of 0.9. The estimates</p><p>of d and the AR parameter, obtained from fracdiff, are 0.429 and 0.884,</p><p>respectively. The estimate of the AR parameter made from the fractionally</p><p>differenced series {yt} using ar is 0.887, and the slight difference is small by</p><p>comparison with the estimated error and is of no practical importance. The</p><p>residuals appear to be a realisation of DWN (Fig. 8.2).</p><p>8.3 Fitting to simulated data 163</p><p>(a)</p><p>Time</p><p>x</p><p>0 2000 6000 10000</p><p>−</p><p>4</p><p>0</p><p>2</p><p>4</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>(a)</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 8.1. A simulated series with long-memory FARIMA(0, 0.4, 0): (a) time series</p><p>plot (x); (b) correlogram of series x; (c) partial correlogram of y; (d) correlogram</p><p>after fractional differencing (z).</p><p>> summary(fds.fit)</p><p>...</p><p>Coefficients:</p><p>Estimate Std. Error z value Pr(>|z|)</p><p>d 0.42904 0.01439 29.8 ar(y)</p><p>Coefficients:</p><p>1</p><p>0.887</p><p>Order selected 1 sigma^2 estimated as 1.03</p><p>164 8 Long-Memory Processes</p><p>(a)</p><p>Time</p><p>x</p><p>0 2000 6000 10000</p><p>−</p><p>30</p><p>−</p><p>10</p><p>10</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(c)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(d)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 8.2. A time series with long-memory FARIMA(1, 0.4, 0): (a) time series plot</p><p>(x); (b) correlogram of series x; (c) correlogram of the differenced series (y); (d)</p><p>correlogram of the residuals after fitting an AR(1) model (z).</p><p>8.4 Assessing evidence of long-term dependence</p><p>8.4.1 Nile minima</p><p>The data in the file Nilemin.txt are annual minimum water levels (mm)</p><p>of the Nile River for the years 622 to 1284, measured at the Roda Island</p><p>gauge near Cairo. It is likely that there may be a trend over a 600-year period</p><p>due to changing climatic conditions or changes to the channels around Roda</p><p>Island. We start the analysis by estimating and removing a linear trend fitted</p><p>by regression. Having done this, a choice of nar is taken as a starting value</p><p>for using fracdiff on the residuals from the regression.</p><p>Given the iterative</p><p>nature of the fitting process, the choice of initial values for nar and nma should</p><p>not be critical. The estimate of d with nar set at 5 is 0.3457. The best-fitting</p><p>model to the fractionally differenced series is AR(1) with parameter 0.14. We</p><p>now re-estimate d using fracdiff with nar equal to 1, but in this case the</p><p>estimate of d is unchanged. The residuals are a plausible realisation of DWN.</p><p>The acf of the squared residuals indicates that a GARCH model would be</p><p>appropriate. 
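The steps described in this section can be collected into a short sketch. The file layout, the linear detrending by regression, and the use of diffseries for the fractional differencing are assumptions for illustration rather than the original code:

library(fracdiff)
nile <- read.table("Nilemin.txt", header = TRUE)[, 1]  # assumed: a single column of minima (mm)
res.lm <- resid(lm(nile ~ seq_along(nile)))            # remove a fitted linear trend
nile.fd <- fracdiff(res.lm, nar = 1)                   # estimate d; about 0.35 in the text
y <- diffseries(res.lm, d = nile.fd$d)                 # fractionally differenced series
y.ar <- ar(y)                                          # best-fitting AR model (AR(1), parameter about 0.14)
acf(na.omit(y.ar$resid))                               # residuals: plausible white noise
acf(na.omit(y.ar$resid)^2)                             # squared residuals: check for GARCH-type correlation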
There is convincing evidence of long-term memory in the Nile</p><p>River minima flows (Fig. 8.3).</p><p>8.4 Assessing evidence of long-term dependence 165</p><p>0 100 200 300 400 500 600</p><p>10</p><p>00</p><p>13</p><p>00</p><p>Nile minima</p><p>Time</p><p>D</p><p>ep</p><p>th</p><p>(</p><p>m</p><p>m</p><p>)</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Detrended Nile minima</p><p>Fractionally differenced series</p><p>Time</p><p>m</p><p>m</p><p>0 100 200 300 400 500</p><p>−</p><p>20</p><p>0</p><p>10</p><p>0</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.3. Nile River minimum water levels: time series (top left); acf of detrended</p><p>time series (middle left); fractionally differenced detrended series (lower left); acf of</p><p>fractionally differenced series (top right); acf of residuals of AR(1) fitted to frac-</p><p>tionally differenced series (middle right); acf of squared residuals of AR(1) (lower</p><p>right).</p><p>8.4.2 Bellcore Ethernet data</p><p>The data in LAN.txt are the numbers of packet arrivals (bits) in 4000 consecu-</p><p>tive 10-ms intervals seen on an Ethernet at the Bellcore Morristown Research</p><p>and Engineering facility. A histogram of the numbers of bits is remarkably</p><p>skewed, so we work with the logarithm of one plus the number of bits. The</p><p>addition of 1 is needed because there are many intervals in which no pack-</p><p>ets arrive. The correlogram of this transformed time series suggests that a</p><p>FARIMA model may be suitable.</p><p>The estimate of d, with nar set at 48, is 0.3405, and the fractionally dif-</p><p>ferenced series has no substantial correlations. Nevertheless, the function ar</p><p>fits an AR(26) model to this series, and the estimate of the standard devi-</p><p>ation of the errors, 2.10, is slightly less than the standard deviation of the</p><p>fractionally differenced series, 2.13. There is noticeable autocorrelation in the</p><p>series of squared residuals from the AR(26) model, which is a feature of time</p><p>series that have bursts of activity, and this can be modelled as a GARCH</p><p>166 8 Long-Memory Processes</p><p>ln(bits+1)</p><p>Time</p><p>x</p><p>0 1000 2000 3000 4000</p><p>0</p><p>2</p><p>4</p><p>6</p><p>8</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>ln(bits+1)</p><p>Fractionally differenced series</p><p>Time</p><p>y</p><p>0 1000 2000 3000 4000</p><p>−</p><p>6</p><p>0</p><p>4</p><p>8</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.4. 
Bellcore local area network (LAN) traffic, ln(1+number of bits): time</p><p>series (top left); acf of time series (middle left); fractionally differenced series (lower</p><p>left); acf of fractionally differenced series (top right); acf of residuals of AR(26) fitted</p><p>to fractionally differenced series (middle right); acf of squared residuals of AR(26)</p><p>(lower right).</p><p>process (Fig. 8.4). In Exercises 1 and 2, you are asked to look at this case in</p><p>more detail and, in particular, investigate whether an ARMA model is more</p><p>parsimonious.</p><p>8.4.3 Bank loan rate</p><p>The data in mprime.txt are of the monthly percentage US Federal Reserve</p><p>Bank prime loan rate,2 courtesy of the Board of Governors of the Federal</p><p>Reserve System, from January 1949 until November 2007. The time series is</p><p>plotted in the top left of Figure 8.5 and looks as though it could be a realisation</p><p>of a random walk. It also has a period of high variability. The correlogram</p><p>shows very high correlations at smaller lags and substantial correlation up to</p><p>lag 28. Neither a random walk nor a trend is a suitable model for long-term</p><p>2 Data downloaded from Federal Reserve Economic Data at the Federal Reserve</p><p>Bank of St. Louis.</p><p>8.5 Simulation 167</p><p>simulation of interest rates in a stable economy. Instead, we fit a FARIMA</p><p>model, which has the advantage of being stationary.</p><p>Interest rate</p><p>Time</p><p>x</p><p>0 100 200 300 400 500 600 700</p><p>5</p><p>10</p><p>20</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Interest rate</p><p>Fractionally differenced series</p><p>Time</p><p>y</p><p>0 100 200 300 400 500 600</p><p>5</p><p>10</p><p>20</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.5. Federal Reserve Bank interest rates: time series (top left); acf of time series</p><p>(middle left); fractionally differenced series (lower left); acf of fractionally differenced</p><p>series (upper right); acf of residuals of AR(17) fitted to fractionally differenced series</p><p>(middle right); acf of squared residuals of AR(17) (lower right).</p><p>The estimate of d is almost 0, and this implies that the decay of the</p><p>correlations from an initial high value is more rapid than it would be for a</p><p>FARIMA model. The fitted AR model has an order of 17 and is not entirely</p><p>satisfactory because of the statistically significant autocorrelation at lag 1 in</p><p>the residual series. You are asked to do better in Exercise 3. The substantial</p><p>autocorrelations of the squared residuals from the AR(17) model indicate that</p><p>a GARCH model is needed. This has been a common feature of all three time</p><p>series considered in this section.</p><p>8.5 Simulation</p><p>FARIMA models are important for simulation because short-memory models,</p><p>which ignore evidence of long-memory, can lead to serious overestimation of</p><p>168 8 Long-Memory Processes</p><p>system performance. 
This has been demonstrated convincingly at scales from</p><p>reservoirs to routers in telecommunication networks.</p><p>Realistic models for simulation will typically need to incorporate GARCH</p><p>and heavy-tailed distributions for the basic white noise series. The procedure</p><p>is to fit a GARCH model to the residuals from the AR model fitted to the</p><p>fractionally differenced series. Then the residuals from the GARCH model</p><p>are calculated and a suitable probability distribution can be fitted to these</p><p>residuals (Exercise 5). Having fitted the models, the simulation proceeds by</p><p>generating random numbers from the fitted probability model fitted to the</p><p>GARCH residuals.</p><p>8.6 Summary of additional commands used</p><p>fracdiff fits a fractionally differenced, FARIMA(p, d, q), model</p><p>fracdiff.sim simulates a FARIMA model</p><p>8.7 Exercises</p><p>1. Read the LAN data into R.</p><p>a) Plot a boxplot and histogram of the number of bits.</p><p>b) Calculate the skewness and kurtosis of the number of bits.</p><p>c) Repeat (a) and (b) for the logarithm of 1 plus the number of bits.</p><p>d) Repeat (a) for the residuals after fitting an AR model to the fraction-</p><p>ally differenced series.</p><p>e) Fit an ARMA(p, q) model to the fractionally differenced series. Is this</p><p>an improvement on the AR(p) model?</p><p>f) In the text, we set nar in fracdiff at 48. Repeat the analysis with</p><p>nar equal to 2.</p><p>2. Read the LAN data into R.</p><p>a) Calculate the number of bits in 20-ms intervals, and repeat the analysis</p><p>using this time series.</p><p>b) Calculate the number of bits in 40-ms intervals, and repeat the analysis</p><p>using this time series.</p><p>c) Repeat (a) and (b) for realisations from FARIMA(0, d, 0).</p><p>3. Read the Federal Reserve Bank data into R.</p><p>a) Fit a random walk model and comment.</p><p>b) Fit an ARMA(p, q) model and comment.</p><p>8.7 Exercises 169</p><p>4. The rescaled adjusted range is calculated for a time series {xt} of length</p><p>m as follows. First compute the mean, x̄, and standard deviation, s, of</p><p>the series. Then calculate the adjusted partial sums</p><p>Sk =</p><p>k∑</p><p>t=1</p><p>xt − kx̄</p><p>for k = 1, . . . ,m. Notice that S(m) must equal zero and that large devia-</p><p>tions from 0 are indicative of persistence. The rescaled adjusted range</p><p>Rm = {max(S1, . . . , Sm)− min(S1, . . . , Sm)}/s</p><p>is the difference</p><p>. . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>9.8.2 AR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>xiv Contents</p><p>9.8.3 Derivation of spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>9.9 Autoregressive spectrum estimation . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10 Finer details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10.1 Leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10.2 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195</p><p>9.10.3 Daniell windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196</p><p>9.10.4 Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196</p><p>9.10.5 Tapering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197</p><p>9.10.6 Spectral analysis compared with wavelets . . . . . . 
. . . . . . . 197</p><p>9.11 Summary of additional commands used . . . . . . . . . . . . . . . . . . . . 197</p><p>9.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198</p><p>10 System Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2 Identifying the gain of a linear system . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2.1 Linear system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2.2 Natural frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202</p><p>10.2.3 Estimator of the gain function . . . . . . . . . . . . . . . . . . . . . . 202</p><p>10.3 Spectrum of an AR(p) process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203</p><p>10.4 Simulated single mode of vibration system . . . . . . . . . . . . . . . . . . 203</p><p>10.5 Ocean-going tugboat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205</p><p>10.6 Non-linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207</p><p>10.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208</p><p>11 Multivariate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.2 Spurious regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.3 Tests for unit roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214</p><p>11.4 Cointegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216</p><p>11.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216</p><p>11.4.2 Exchange rate series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218</p><p>11.5 Bivariate and multivariate white noise . . . . . . . . . . . . . . . . . . . . . 219</p><p>11.6 Vector autoregressive models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220</p><p>11.6.1 VAR model fitted to US economic series . . . . . . . . . . . . . . 222</p><p>11.7 Summary of R commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227</p><p>11.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227</p><p>12 State Space Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229</p><p>12.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229</p><p>12.2 Linear state space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230</p><p>12.2.1 Dynamic linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230</p><p>12.2.2 Filtering* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231</p><p>12.2.3 Prediction* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232</p><p>12.2.4 Smoothing* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233</p><p>12.3 Fitting to simulated univariate time series . . . . . . . . . . . . . . . . . . 
234</p><p>Contents xv</p><p>12.3.1 Random walk plus noise model . . . . . . . . . . . . . . . . . . . . . . 234</p><p>12.3.2 Regression model with time-varying coefficients . . . . . . . 236</p><p>12.4 Fitting to univariate time series . . . . . . . . . . . . . . . . . . . . . . . . . . . 238</p><p>12.5 Bivariate time series – river salinity . . . . . . . . . . . . . . . . . . . . . . . . 239</p><p>12.6 Estimating the variance matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 242</p><p>12.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243</p><p>12.8 Summary of additional commands used . . . . . . . . . . . . . . . . . . . . 244</p><p>12.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244</p><p>References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247</p><p>Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249</p><p>1</p><p>Time Series Data</p><p>1.1 Purpose</p><p>Time series are analysed to understand the past and to predict the future,</p><p>enabling managers or policy makers to make properly informed decisions.</p><p>A time series analysis quantifies the main features in data and the random</p><p>variation. These reasons, combined with improved computing power, have</p><p>made time series methods widely applicable in government, industry, and</p><p>commerce.</p><p>The Kyoto Protocol is an amendment to the United Nations Framework</p><p>Convention on Climate Change. It opened for signature in December 1997 and</p><p>came into force on February 16, 2005. The arguments for reducing greenhouse</p><p>gas emissions rely on a combination of science, economics, and time series</p><p>analysis. Decisions made in the next few years will affect the future of the</p><p>planet.</p><p>During 2006, Singapore Airlines placed an initial order for twenty Boeing</p><p>787-9s and signed an order of intent to buy twenty-nine new Airbus planes,</p><p>twenty A350s, and nine A380s (superjumbos). The airline’s decision to expand</p><p>its fleet relied on a combination of time series analysis of airline passenger</p><p>trends and corporate plans for maintaining or increasing its market share.</p><p>Time series methods are used in everyday operational decisions. For exam-</p><p>ple, gas suppliers in the United Kingdom have to place orders for gas from the</p><p>offshore fields one day ahead of the supply. Variation about the average for</p><p>the time of year depends on temperature and, to some extent, the wind speed.</p><p>Time series analysis is used to forecast demand from the seasonal average with</p><p>adjustments based on one-day-ahead weather forecasts.</p><p>Time series models often form the basis of computer simulations. Some</p><p>examples are assessing different strategies for control of inventory using a</p><p>simulated time series of demand; comparing designs of wave power devices us-</p><p>ing a simulated series of sea states; and simulating daily rainfall to investigate</p><p>the long-term environmental effects of proposed water management policies.</p><p>P.S.P. Cowpertwait and A.V. 
Metcalfe, Introductory Time Series with R, 1</p><p>Use R, DOI 10.1007/978-0-387-88698-5 1,</p><p>© Springer Science+Business Media, LLC 2009</p><p>2 1 Time Series Data</p><p>1.2 Time series</p><p>In most branches of science, engineering, and commerce, there are variables</p><p>measured sequentially in time. Reserve banks record interest rates and ex-</p><p>change rates each day. The government statistics department will compute</p><p>the country’s gross domestic product on a yearly basis. Newspapers publish</p><p>yesterday’s noon temperatures for capital cities from around the world. Me-</p><p>teorological offices record rainfall at many different sites with differing reso-</p><p>lutions. When</p><p>between the largest surplus and the greatest deficit. If we</p><p>have a long time series of length n, we can calculate Rm for values of m</p><p>from, for example, 20 upwards to n in steps of 10. When m is less than</p><p>n, we can calculate n−m values for Rm by starting at different points in</p><p>the series. Hurst plotted ln(Rm) against ln(m) for many long time series.</p><p>He noticed that lines fitted through the points were usually steeper for</p><p>geophysical series, such as streamflow, than for realisations of independent</p><p>Gaussian variables (Gaussian DWN). The average value of the slope (H)</p><p>of these lines for the geophysical time series was 0.73, significantly higher</p><p>than the average slope of 0.5 for the independent sequences. The linear</p><p>logarithmic relationship is equivalent to</p><p>Rm ∝ mH</p><p>Plot ln(Rm) against ln(m) for the detrended Nile River minimum flows.</p><p>5. a) Refer to the data in LAN.txt and the time series of logarithms of the</p><p>numbers of packet arrivals, with 1 added, in 10-ms intervals calcu-</p><p>lated from the numbers of packet arrivals. Fit a GARCH model to the</p><p>residuals from the AR(26) model fitted to the fractionally differenced</p><p>time series.</p><p>b) Calculate the residuals from the GARCH model, and fit a suitable</p><p>distribution to these residuals.</p><p>c) Calculate the mean number of packets arriving in 10-ms intervals. Set</p><p>up a simulation model for a router that has a realisation of the model</p><p>in (a) as input and can send out packets at a constant rate equal to</p><p>the product of the mean number of packets arriving in 10-ms intervals</p><p>with a factor g, which is greater than 1.</p><p>d) Code the model fitted in (a) so that it will provide simulations of</p><p>time series of the number of packets that are the input to the router.</p><p>Remember that you first obtain a realisation for ln(number of packets</p><p>+ 1) and then take the exponential of this quantity, subtract 1, and</p><p>round the result to the nearest integer.</p><p>170 8 Long-Memory Processes</p><p>e) Compare the results of your simulation with a model that assumes</p><p>Gaussian white noise for the residuals of the AR(26) model for g =</p><p>1.05, 1.1, 1.5, and 2.</p><p>9</p><p>Spectral Analysis</p><p>9.1 Purpose</p><p>Although it follows from the definition of stationarity that a stationary time</p><p>series model cannot have components at specific frequencies, it can never-</p><p>theless be described in terms of an average frequency composition. Spectral</p><p>analysis distributes the variance of a time series over frequency, and there are</p><p>many applications. It can be used to characterise wind and wave forces, which</p><p>appear random but have a frequency range over which most of the power is</p><p>concentrated. 
The British Standard BS6841, “Measurement and evaluation of</p><p>human exposure to whole-body vibration”, uses spectral analysis to quantify</p><p>exposure of personnel to vibration and repeated shocks. Many of the early</p><p>applications of spectral analysis were of economic time series, and there has</p><p>been recent interest in using spectral methods for economic dynamics analysis</p><p>(Iacobucci and Noullez, 2005).</p><p>More generally, spectral analysis can be used to detect periodic signals</p><p>that are corrupted by noise. For example, spectral analysis of vibration signals</p><p>from machinery such as turbines and gearboxes is used to expose faults before</p><p>they cause catastrophic failure. The warning is given by the emergence of new</p><p>peaks in the spectrum. Astronomers use spectral analysis to measure the red</p><p>shift and hence deduce the speeds of galaxies relative to our own.</p><p>9.2 Periodic signals</p><p>9.2.1 Sine waves</p><p>Any signal that has a repeating pattern is periodic, with a period equal to</p><p>the length of the pattern. However, the fundamental periodic signal in mathe-</p><p>matics is the sine wave. Joseph Fourier (1768–1830) showed that sums of sine</p><p>waves can provide good approximations to most periodic signals, and spectral</p><p>analysis is based on sine waves.</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 171</p><p>Use R, DOI 10.1007/978-0-387-88698-5 9,</p><p>© Springer Science+Business Media, LLC 2009</p><p>172 9 Spectral Analysis</p><p>Spectral analysis can be confusing because different authors use different</p><p>notation. For example, frequency can be given in radians or cycles per sam-</p><p>pling interval, and frequency can be treated as positive or negative, or just</p><p>positive. You need to be familiar with the sine wave defined with respect to</p><p>a unit circle, and this relationship is so fundamental that the sine and cosine</p><p>functions are called circular functions.</p><p>Imagine a circle with unit radius and centre at the origin, O, with the</p><p>radius rotating at a rotational velocity of ω radians per unit of time. Let t</p><p>be time. The angle, ωt, in radians is measured as the distance around the</p><p>circumference from the positive real (horizontal) axis, with the anti-clockwise</p><p>rotation defined as positive (Fig. 9.1). So, if the radius sweeps out a full circle,</p><p>it has been rotated through an angle of 2π radians. The time taken for this</p><p>one revolution, or cycle, is 2π/ω and is known as the period.</p><p>The sine function, sin(ωt), is the projection of the radius onto the vertical</p><p>axis, and the cosine function, cos(ωt), is the projection of the radius onto the</p><p>horizontal axis. In general, a sine wave of frequency ω, amplitude A, and phase</p><p>ψ is</p><p>A sin(ωt+ ψ) (9.1)</p><p>The positive phase shift represents an advance of ψ/2π cycles. In spectral</p><p>analysis, it is convenient to refer to specific sine waves as harmonics. 
We rely</p><p>on the trigonometric identity that expresses a general sine wave as a weighted</p><p>sum of sine and cosine functions:</p><p>A sin(ωt+ ψ) = A cos(ψ)sin(ωt) +A sin(ψ)cos(ωt) (9.2)</p><p>Equation (9.2) is fundamental for spectral analysis because a sampled sine</p><p>wave of any given amplitude and phase can be fitted by a linear regression</p><p>model with the sine and cosine functions as predictor variables.</p><p>9.2.2 Unit of measurement of frequency</p><p>The SI1 unit of frequency is the hertz (Hz), which is 1 cycle per second and</p><p>equivalent to 2π radians per second. The hertz is a derived SI unit, and in</p><p>terms of fundamental SI units it has unit s−1. A frequency of f cycles per</p><p>second is equivalent to ω radians per second, where</p><p>ω = 2πf ⇔ f =</p><p>ω</p><p>2π</p><p>(9.3)</p><p>The mathematics is naturally expressed in radians, but Hz is generally used</p><p>in physical applications. By default, R plots have a frequency axis calibrated</p><p>in cycles per sampling interval.</p><p>1 SI is the International System of Units, abbreviated from the French Le Systéme</p><p>International d’Unités.</p><p>9.3 Spectrum 173</p><p>−1.0 −0.5 0.0 0.5 1.0</p><p>−</p><p>1.</p><p>0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>x (Real axis)</p><p>y</p><p>(I</p><p>m</p><p>ag</p><p>in</p><p>ar</p><p>y</p><p>ax</p><p>is</p><p>)</p><p>eiωωt</p><p>ωωt</p><p>cos((ωωt))</p><p>sin((ωωt))</p><p>Fig. 9.1. Angle ωt is the length along the radius. The projection of the radius onto</p><p>the x and y axes is cos(ωt) and sin(ωt), respectively.</p><p>9.3 Spectrum</p><p>9.3.1 Fitting sine waves</p><p>Suppose we have a time series of length n, {xt : t = 1, . . . , n}, where it is</p><p>convenient to arrange that n is even, if necessary by dropping the first or last</p><p>term. We can fit a time series regression with xt as the response and n − 1</p><p>predictor variables:</p><p>cos</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>, cos</p><p>(</p><p>4πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>4πt</p><p>n</p><p>)</p><p>, cos</p><p>(</p><p>6πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>6πt</p><p>n</p><p>)</p><p>, . . . ,</p><p>cos</p><p>(</p><p>2(n/2−1)πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>2(n/2−1)πt</p><p>n</p><p>)</p><p>, cos (πt).</p><p>We will denote the estimated coefficients by a1, b1, a2, b2, a3, b3, . . . , an/2−1,</p><p>bn/2−1, an/2, respectively, so</p><p>xt = a0 + a1cos</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>+ b1sin</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>+ · · ·</p><p>+ an/2−1cos</p><p>(</p><p>2(n/2− 1)πt</p><p>n</p><p>)</p><p>+ bn/2−1sin</p><p>(</p><p>2(n/2− 1)πt</p><p>n</p><p>)</p><p>+ an/2cos (πt)</p><p>Since the number of coefficients equals the length of the time series, there are</p><p>no degrees of freedom for error. The intercept term, a0, is just the mean x. The</p><p>lowest frequency is one cycle, or 2π radians, per record length, which is 2π/n</p><p>174 9 Spectral Analysis</p><p>radians per sampling interval. A general frequency, in this representation, is m</p><p>cycles per record length, equivalent to 2πm/n radians per sampling interval,</p><p>where m is an integer between 1 and n/2. The highest frequency is π radians</p><p>per sampling interval, or equivalently 0.5 cycles per sampling interval, and it</p><p>makes</p><p>n/2 cycles in the record length, alternating between −1 and +1 at the</p><p>sampling points. 
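As a small illustration of this regression view (an invented example, not one from the text), the coefficients a_m and b_m of a single harmonic can be recovered with lm:

n <- 128
t <- 1:n
x <- 2 * sin(2 * pi * 5 * t / n + 0.6) + rnorm(n)  # hypothetical series with a 5-cycle harmonic
C5 <- cos(2 * pi * 5 * t / n)
S5 <- sin(2 * pi * 5 * t / n)
fit <- lm(x ~ C5 + S5)
coef(fit)                      # intercept near the mean, then a_5 and b_5
sqrt(sum(coef(fit)[2:3]^2))    # amplitude A_5, close to 2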
This regression model is a finite Fourier series for a discrete</p><p>time series.2</p><p>We will refer to the sine wave that makes m cycles in the record length</p><p>as the mth harmonic, and the first harmonic is commonly referred to as the</p><p>fundamental frequency . The amplitude of the mth harmonic is</p><p>Am =</p><p>√</p><p>a2</p><p>m + b2m</p><p>Parseval’s Theorem is the key result, and it expresses the variance of the time</p><p>series as a sum of n/2 components at integer frequencies from 1 to n/2 cycles</p><p>per record length:</p><p>1</p><p>n</p><p>∑n</p><p>t=1 x</p><p>2</p><p>t = A2</p><p>0 + 1</p><p>2</p><p>∑(n/2)−1</p><p>m=1 A2</p><p>m +A2</p><p>n/2</p><p>Var(x) = 1</p><p>2</p><p>∑(n/2)−1</p><p>m=1 A2</p><p>m +A2</p><p>n/2</p><p>(9.4)</p><p>Parseval’s Theorem follows from the fact that the sine and cosine terms used</p><p>as explanatory terms in the time series regression are uncorrelated, together</p><p>with the result for the variance of a linear combination of variables (Exer-</p><p>cise 1). A summary of the harmonics, and their corresponding frequencies</p><p>and periods,3 follows:</p><p>harmonic period frequency frequency contribution</p><p>(cycle/samp. int.) (rad/samp. int.) to variance</p><p>1 n 1/n 2π/n 1</p><p>2A</p><p>2</p><p>1</p><p>2 n/2 2/n 4π/n 1</p><p>2A</p><p>2</p><p>2</p><p>3 n/3 3/n 6π/n 1</p><p>2A</p><p>2</p><p>3</p><p>...</p><p>...</p><p>...</p><p>...</p><p>...</p><p>n/2− 1 n/(n/2− 1) (n/2− 1)/n (n− 2)π/n 1</p><p>2A</p><p>2</p><p>n/2−1</p><p>n/2 2 1/n π A2</p><p>n/2</p><p>Although we have introduced the Am in the context of a time series regres-</p><p>sion, the calculations are usually performed with the fast fourier transform</p><p>algorithm (FFT). We say more about this in §9.7.</p><p>2 A Fourier series is an approximation to a signal defined for continuous time over</p><p>a finite period. The signal may have discontinuities. The Fourier series is the sum</p><p>of an infinite number of sine and cosine terms.</p><p>3 The period of a sine wave is the time taken for 1 cycle and is the reciprocal of</p><p>the frequency measured in cycles per time unit.</p><p>9.4 Spectra of simulated series 175</p><p>9.3.2 Sample spectrum</p><p>A plot of A2</p><p>m, as spikes, against m is a Fourier line spectrum. The raw pe-</p><p>riodogram in R is obtained by joining the tips of the spikes in the Fourier</p><p>line spectrum to give a continuous plot and scaling it so that the area equals</p><p>the variance. The periodogram distributes the variance over frequency, but it</p><p>has two drawbacks. The first is that the precise set of frequencies is arbitrary</p><p>inasmuch as it depends on the record length. The second is that the peri-</p><p>odogram does not become smoother as the length of the time series increases</p><p>but just includes more spikes packed closer together. The remedy is to smooth</p><p>the periodogram by taking a moving average of spikes before joining the tips.</p><p>The smoothed periodogram is also known as the (sample) spectrum. We de-</p><p>note the spectrum of {xt} by Cxx(), with an argument ω or f depending on</p><p>whether it is expressed in radians or cycles per sampling interval. However,</p><p>the smoothing will reduce the heights of peaks, and excessive smoothing will</p><p>blur the features we are looking for. It is a good idea to consider spectra</p><p>with different amounts of smoothing, and this is made easy for us with the R</p><p>function spectrum. 
The argument span is the number of spikes in the moving</p><p>average,4 and is a useful guide for an initial value, for time series of lengths</p><p>up to a thousand, is twice the record length.</p><p>The time series should either be mean adjusted (mean subtracted) before</p><p>calculating the periodogram or the a0 spike should be set to 0 before averaging</p><p>spikes to avoid increasing the low-frequency contributions to the variance. In</p><p>R, the spectrum function goes further than this and removes a linear trend</p><p>from the series before calculating the periodogram. It seems appropriate to fit</p><p>a trend and remove it if the existence of a trend in the underlying stochastic</p><p>process is plausible. Although this will usually pertain, there may be cases in</p><p>which you wish to attribute an apparent trend in a time series to a fractionally</p><p>differenced process, and prefer not to remove a fitted trend. You could then use</p><p>the fft function and average the spikes to obtain a spectrum of the unadjusted</p><p>time series (§9.7).</p><p>The spectrum does not retain the phase information, though in the case</p><p>of stationary time series all phases are equally likely and the sample phases</p><p>have no theoretical interest.</p><p>9.4 Spectra of simulated series</p><p>9.4.1 White noise</p><p>We will start by generating an independent random sample from a normal</p><p>distribution. This is a realisation of a Gaussian white noise process. If no span</p><p>is specified in the spectrum function, R will use the heights of the Fourier line</p><p>4 Weighted moving averages can be used, and the choice of weights determines the</p><p>spectral window.</p><p>176 9 Spectral Analysis</p><p>spectrum spikes to construct a spectrum with no smoothing.5 We compare</p><p>this with a span of 65 in Figure 9.2.</p><p>> layout(1:2)</p><p>> set.seed(1)</p><p>> x spectrum(x, log = c("no"))</p><p>> spectrum(x, span = 65, log = c("no"))</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0</p><p>2</p><p>4</p><p>6</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(a)</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0.</p><p>8</p><p>1.</p><p>1</p><p>1.</p><p>4</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(b)</p><p>Fig. 9.2. Realisation of Gaussian white noise: (a) raw periodogram; (b) spectrum</p><p>with span = 65.</p><p>The default is a logarithmic scale for the spectrum, but we have changed</p><p>this by setting the log parameter to "no". The frequency axis is cycles per</p><p>sampling interval.</p><p>The second spectrum is much smoother as a result of the moving average</p><p>of 65 adjacent spikes. Both spectra are scaled so that their area is one-half</p><p>the variance of the time series. The rationale for this is that the spectrum is</p><p>5 By default, spectrum applies a taper to the first 10% and last 10% of the series and</p><p>pads the series to a highly composite length. However, 2048 is highly composite,</p><p>and the taper has little effect on a realisation of this length.</p><p>9.4 Spectra of simulated series 177</p><p>defined from −0.5 to 0.5, and is symmetric about 0. However, in the context of</p><p>spectral analysis, there is no useful distinction between positive and negative</p><p>frequencies, and it is usual to plot the spectrum over [0, 0.5], scaled so that its</p><p>area equals the variance of the signal. So, for a report it is better to multiply</p><p>the R spectrum by a factor of 2 and to use hertz rather than cycles per sampling</p><p>interval for frequency. 
You can easily do this with the following R commands,</p><p>assuming the width of the sampling interval is Del (which would need to be</p><p>assigned first):</p><p>> x.spec spx spy plot (spx, spy, xlab = "Hz", ylab = "variance/Hz", type = "l")</p><p>The theoretical spectrum for independent random variation with variance</p><p>of unity is flat at 2 over the range [0, 0.5]. The name white noise is chosen</p><p>to be reminiscent of white light made up from equal contributions of energy</p><p>across the visible spectrum. An explanation for the flat spectrum arises from</p><p>the regression model. If we have independent random errors, the E[am] and</p><p>E[bm] will all be 0 and the E[A2</p><p>m] are all equal. Notice that the vertical scale</p><p>for the smoothed periodogram is from 0.8 to 1.4, so it is relatively flat (Fig.</p><p>9.2). If longer realisations are generated and the bandwidth is held constant,</p><p>the default R spectra will tend towards a flat line at a height of 1.</p><p>The bandwidths shown in Figure 9.2 are calculated from the R definition</p><p>of bandwidth as span×{0.5/(n/2)}/</p><p>√</p><p>12. A more common definition of band-</p><p>width in the context of spectral analysis is span/(n/2) cycles per sampling</p><p>interval. The latter definition is the spacing between statistically independent</p><p>estimates of the spectrum height, and it is larger than the R bandwidth by a</p><p>factor of 6.92.</p><p>The spectrum distributes variance over frequency, and the expected shape</p><p>does not depend on the distribution that is being sampled. You are asked to</p><p>investigate the effect, if any, of using random numbers from an exponential,</p><p>rather than normal, distribution in Exercise 2.</p><p>9.4.2 AR(1): Positive coefficient</p><p>We generate a realisation of length 1024 from an AR(1) process with α equal</p><p>to 0.9 and compare the time series</p><p>plot, correlogram, and spectrum in Figure</p><p>9.3.</p><p>> set.seed(1)</p><p>> x for (t in 2:1024) x[t] layout(1:3)</p><p>> plot(as.ts(x))</p><p>> acf(x)</p><p>> spectrum(x, span = 51, log = c("no"))</p><p>178 9 Spectral Analysis</p><p>(a)</p><p>Time</p><p>x</p><p>0 200 400 600 800 1000</p><p>−</p><p>6</p><p>−</p><p>2</p><p>2</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0</p><p>20</p><p>40</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(c)</p><p>Fig. 9.3. Simulated AR(1) process with α = 0.9: (a) time plot; (b) correlogram; (c)</p><p>spectrum.</p><p>The plot of the time series shows the tendency for consecutive values to</p><p>be relatively similar, and change is relatively slow, so we might expect the</p><p>spectrum to pick up low-frequency variation. The acf quantifies the tendency</p><p>for consecutive values to be relatively similar. The spectrum confirms that</p><p>low-frequency variation dominates.</p><p>9.4.3 AR(1): Negative coefficient</p><p>We now change α from 0.9 to −0.9. The plot of the time series (Fig. 9.4)</p><p>shows the tendency for consecutive values to oscillate, change is rapid, and we</p><p>expect the spectrum to pick up high-frequency variation. The acf quantifies</p><p>the tendency for consecutive values to oscillate, and the spectrum shows high</p><p>frequency variation.</p><p>9.4.4 AR(2)</p><p>Consider an AR(2) process with parameters 1 and −0.6. 
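Written out in full, the AR(1) simulation above might read as follows (a sketch only: the recursion is reconstructed from the stated model with α = 0.9, and the first value of x is taken equal to the first white noise term):

> set.seed(1)
> x <- w <- rnorm(1024)                               # w is Gaussian white noise
> for (t in 2:1024) x[t] <- 0.9 * x[t - 1] + w[t]     # AR(1) with alpha = 0.9
> layout(1:3)
> plot(as.ts(x))
> acf(x)
> spectrum(x, span = 51, log = "no")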
This can be inter-</p><p>preted as a second-order difference equation describing the motion of a lightly</p><p>damped single mode system (Exercise 3), such as a mass on a spring, subjected</p><p>9.5 Sampling interval and record length 179</p><p>(a)</p><p>Time</p><p>x</p><p>0 200 400 600 800 1000</p><p>−</p><p>8</p><p>−</p><p>2</p><p>2</p><p>6</p><p>0 5 10 15 20 25 30</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>5</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0</p><p>10</p><p>30</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(c)</p><p>Fig. 9.4. Simulated AR(1) process with α = −0.9: (a) time plot; (b) correlogram;</p><p>(c) spectrum.</p><p>to a sequence of white noise impulses. The spectrum in Figure 9.5 shows a</p><p>peak at the natural frequency of the system – the frequency at which the mass</p><p>will oscillate if the spring is extended and then released.</p><p>> set.seed(1)</p><p>> x for (t in 3:1024) x[t] layout (1:3)</p><p>> plot (as.ts(x))</p><p>> acf (x)</p><p>> spectrum (x, span = 51, log = c("no"))</p><p>9.5 Sampling interval and record length</p><p>Many time series are of an inherently continuous variable that is sampled to</p><p>give a time series at discrete time steps. For example, the National Climatic</p><p>Data Center (NCDC) provides 1-minute readings of temperature, wind speed,</p><p>and pressure at meteorological stations throughout the United States. It is</p><p>180 9 Spectral Analysis</p><p>(a)</p><p>Time</p><p>x</p><p>0 200 400 600 800 1000</p><p>−</p><p>4</p><p>0</p><p>4</p><p>0 5 10 15 20 25 30</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>2</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0</p><p>4</p><p>8</p><p>12</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(c)</p><p>Fig. 9.5. Simulated AR(2) process with α1 = 1 and α2 = −0.6: (a) time plot; (b)</p><p>correlogram; (c) spectrum.</p><p>crucial that the continuous signal be sampled at a sufficiently high rate to</p><p>retain all its information. If the sampling rate is too low, we not only lose</p><p>information but will mistake high-frequency variation for variation at a lower</p><p>frequency. This latter phenomenon is known as aliasing and can have serious</p><p>consequences.</p><p>In signal processing applications, the measurement device may return a</p><p>voltage as a continuously varying electrical signal. However, analysis is usu-</p><p>ally performed on a digital computer, and the signal has to be sampled to give</p><p>a time series at discrete time steps. The sampling is known as analog-to-digital</p><p>conversion (A/D). Modern oscilloscopes sample at rates as high as Giga sam-</p><p>ples per second (GS/s) and have anti-alias filters, built from electronic com-</p><p>ponents, that remove any higher-frequency components in the original contin-</p><p>uous signal. Digital recordings of musical performances are typically sampled</p><p>at rates of 1 Mega sample per second (MS/s) after any higher-frequencies</p><p>have been removed with anti-alias filters. Since the frequency range of human</p><p>hearing is from about 15 to 20,000 Hz, sampling rates of 1 MS/s are quite</p><p>adequate for high-fidelity recordings.</p><p>9.5 Sampling interval and record length 181</p><p>9.5.1 Nyquist frequency</p><p>The Nyquist frequency is the cutoff frequency associated with a given sam-</p><p>pling rate and is one-half the sampling frequency. 
Once a continuous signal</p><p>is sampled, any frequency higher than the Nyquist frequency will be indistin-</p><p>guishable from its low-frequency alias.</p><p>To understand this phenomenon, suppose the sampling interval is ∆ and</p><p>the corresponding sampling frequency is 1/∆ samples per second. A sine wave</p><p>with a frequency of 1/∆ cycles per second is generated by the radius in Figure</p><p>9.1 rotating anti-clockwise at a rate of 1 revolution per sampling interval ∆,</p><p>and it follows that it cannot be detected when sampled at this rate. Similarly, a</p><p>sine wave with a frequency of −1/∆ cycles per second, generated by the radius</p><p>in Figure 9.1 rotating clockwise at a rate of 1 revolution per sampling interval</p><p>∆, is also undetectable. Now consider a sine wave with a frequency f that lies</p><p>within the interval [−1/(2∆), 1/(2∆)]. This sine wave will be indistinguishable</p><p>from any sine wave generated by a radius that completes an integer number</p><p>of additional revolutions, anti-clockwise or clockwise, during the sampling</p><p>interval. More formally, the frequency f will be indistinguishable from</p><p>f ± k∆ (9.5)</p><p>where k is an integer. Figure 9.6 shows a sine function with a frequency of 1 Hz,</p><p>sin(2πt), sampled at 0.2 s, together with its alias when k in Equation (9.5)</p><p>equals −1. This alias frequency is 1− 1/0.2, which equals −4 Hz. Physically,</p><p>a frequency of −4 Hz is identical to a frequency of 4 Hz, except for a phase</p><p>difference of half a cycle (sin(−θ) = − sin(θ) = sin(θ − π)).</p><p>> t tc x xc xa plot (t, x)</p><p>> lines (tc, xc)</p><p>> lines (tc, xa, lty = "dashed")</p><p>To summarise, the Nyquist frequency Q is related to the sampling interval</p><p>∆ by</p><p>Q =</p><p>1</p><p>2∆</p><p>(9.6)</p><p>and Q should be higher than any frequency components in the continuous</p><p>signal.</p><p>9.5.2 Record length</p><p>To begin with, we need to establish the highest frequency we can expect to</p><p>encounter and set the Nyquist frequency Q well above this. The Nyquist fre-</p><p>quency determines the sampling interval, ∆, from Equation (9.6). If the time</p><p>182 9 Spectral Analysis</p><p>Fig. 9.6. Aliased frequencies: 1 Hz and 4 Hz with ∆ = 0.2 second.</p><p>series has length n, the record length, T , is n∆. The fundamental frequency</p><p>is 1/T Hz, and this is the spacing between spikes in the Fourier line spec-</p><p>trum. If we wish to distinguish frequencies separated by ε Hz, we should aim</p><p>for independent estimates of the spectrum centred on these frequencies. This</p><p>implies that the bandwidth must be at most ε. If we take a moving average</p><p>of L spikes in the Fourier line spectrum, we have the following relationship:</p><p>2L</p><p>n∆</p><p>=</p><p>2L</p><p>T</p><p>≤ ε (9.7)</p><p>For example, suppose we wish to distinguish frequencies separated by 1 Hz</p><p>in an audio recording. A typical sampling rate for audio recording is 1 MS/s,</p><p>corresponding to ∆ = 0.000001. If we take L equal to 100, it follows from</p><p>Equation (9.7) that n must exceed 200×106. This is a long time series but the</p><p>record length is less than four minutes. 
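A quick check of this arithmetic, evaluating Equation (9.7) for the numbers quoted in the example:

> L <- 100                      # number of spikes averaged
> Delta <- 1e-6                 # sampling interval (s) for 1 MS/s
> eps <- 1                      # required frequency resolution (Hz)
> n <- 2 * L / (eps * Delta)    # smallest n satisfying 2L/(n * Delta) <= eps
> n                             # 2e+08 observations, i.e. 200 million
> n * Delta / 60                # record length of about 3.3 minutes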
If a time series of this length presents</p><p>computational problems, an alternative method for computing a smoothed</p><p>spectrum is to calculate the Fourier line spectrum for the 100 subseries of two</p><p>million observations and average these 100 Fourier line spectra.</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>0.0 0.5 1.0 1.5 2.0</p><p>−</p><p>1.</p><p>0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>t</p><p>x</p><p>9.6 Applications 183</p><p>9.6 Applications</p><p>9.6.1 Wave tank data</p><p>The data in the file wave.dat are the surface height, relative to still water</p><p>level, of water at the centre of a wave tank sampled over 39.6 seconds at a</p><p>rate of 10 samples per second. The aim of the analysis is to check whether the</p><p>spectrum is a realistic emulation of typical sea spectra. Referring to Figure</p><p>9.7, the time series plot gives a general impression of the wave profile over time</p><p>and we can see that there are no obvious erroneous values. The correlogram</p><p>is qualitatively similar to that for a realisation of an AR(2) process,6 but</p><p>an AR(2) model would not account for a second peak in the spectrum at a</p><p>frequency near 0.09.</p><p>> www wavetank.dat attach (wavetank.dat)</p><p>> layout (1:3)</p><p>> plot (as.ts(waveht))</p><p>> acf (waveht)</p><p>> spectrum (waveht)</p><p>The default method of fitting the spectrum used above does not require the</p><p>ar function. However, the ar function is used in §9.9 and selects an AR(13)</p><p>model. The shape of the estimated spectrum in Figure 9.7 is similar to that</p><p>of typical sea spectra.</p><p>9.6.2 Fault detection on electric motors</p><p>Induction motors are widely used in industry, and although they are generally</p><p>reliable, they do require maintenance. A common fault is broken rotor bars,</p><p>which reduce the output torque capability and increase vibration, and if left</p><p>undetected can lead to catastrophic failure of the electric motor. The measured</p><p>current spectrum of a typical motor in good condition will have a spike at</p><p>mains frequency, commonly 50 Hz, with side band peaks at 46 Hz and 54 Hz.</p><p>If a rotor bar breaks, the magnitude of the side band peaks will increase by a</p><p>factor of around 10. This increase can easily be detected in the spectrum.</p><p>Siau et al. (2004) compare current spectra for an induction motor in good</p><p>condition and with one broken bar. They sample the current at 0.0025-second</p><p>intervals, corresponding to a Nyquist frequency of 200 Hz, and calculate spec-</p><p>tra from records of 100 seconds length. The time series have length 40,000,</p><p>and the bandwidth with a span of 60 is 1.2 Hz (Equation (9.7)).</p><p>The data are in the file imotor.txt. R code for drawing the spectra (Fig.</p><p>9.8) follows. The broken bar condition is indicated clearly by the higher side</p><p>band peaks in the spectrum. 
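A sketch of that code is given below (the column names good and broken match the sd() output quoted later, and the factor of 400 converts cycles per sampling interval to Hz for the 0.0025-second interval; the file is assumed to be read in the same way as the other data sets):

> imotor.dat <- read.table("imotor.txt", header = TRUE)
> attach(imotor.dat)
> xg.spec <- spectrum(good, span = 60, plot = FALSE)     # good condition
> xb.spec <- spectrum(broken, span = 60, plot = FALSE)   # one broken rotor bar
> freqg <- 400 * xg.spec$freq                            # frequencies in Hz
> freqb <- 400 * xb.spec$freq
> plot(freqg, 10 * log10(xg.spec$spec), xlim = c(44, 56), type = "l",
    xlab = "Frequency (Hz)", ylab = "Current spectrum (dB)")
> lines(freqb, 10 * log10(xb.spec$spec), lty = "dashed")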
In contrast, the standard deviations of the good</p><p>condition and broken condition time series are very close.</p><p>6 The pacf, not shown here, also suggests that an AR(2) model would be plausible.</p><p>184 9 Spectral Analysis</p><p>(a)</p><p>Time</p><p>W</p><p>av</p><p>e</p><p>he</p><p>ig</p><p>ht</p><p>0 100 200 300 400</p><p>−</p><p>50</p><p>0</p><p>50</p><p>0</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>5</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0e</p><p>+</p><p>00</p><p>3e</p><p>+</p><p>05</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(c)</p><p>Fig. 9.7. Wave elevation series: (a) time plot; (b) correlogram; (c) spectrum.</p><p>> www imotor.dat attach (imotor.dat)</p><p>> xg.spec xb.spec freqg freqb plot(freqg, 10*log10(xg.spec$spec[4400:5600]), main = "",</p><p>xlab = "Frequency (Hz)", ylab = "Current spectrum (dB)", type="l")</p><p>> lines(freqb, 10 * log10(xb.spec$spec[4400:5600]), lty = "dashed")</p><p>> sd(good)</p><p>[1] 7071.166</p><p>> sd(broken)</p><p>[1] 7071.191</p><p>9.6.3 Measurement of vibration dose</p><p>The drivers of excavators in open cast mines are exposed to considerable me-</p><p>chanical vibration. The British Standard Guide BS6841:1987 is routinely used</p><p>to quantify the effects. A small engineering company has developed an active</p><p>9.6 Applications 185</p><p>44 46 48 50 52 54 56</p><p>0</p><p>20</p><p>40</p><p>60</p><p>80</p><p>10</p><p>0</p><p>Frequency (Hz)</p><p>C</p><p>ur</p><p>re</p><p>nt</p><p>s</p><p>pe</p><p>ct</p><p>ru</p><p>m</p><p>(</p><p>dB</p><p>)</p><p>Fig. 9.8. Spectrum of current signal from induction motor in good condition (solid)</p><p>and with broken rotor bar (dotted). Frequency is in cycles per 0.0025 second sam-</p><p>pling interval.</p><p>vibration absorber for excavators and has carried out tests. The company has</p><p>accelerometer measurements of the acceleration in the forward (x), sideways</p><p>(y), and vertical (z) directions during a rock-cutting operation. The estimated</p><p>vibration dose value is defined as</p><p>eV DV =</p><p>[</p><p>(1.4× ā)4 × T</p><p>]1/4</p><p>(9.8)</p><p>where ā is the root mean square value of frequency-weighted acceleration</p><p>(ms−2) and T is the duration (s). The mean square frequency-weighted accel-</p><p>eration in the vertical direction is estimated by</p><p>ā2</p><p>z =</p><p>∫</p><p>Cz̈z̈(f)W (f) df (9.9)</p><p>where the weighting function, W (f), represents the relative severity of vibra-</p><p>tion at different frequencies for a driver, and the acceleration time series is the</p><p>second derivative of the displacement signal, denoted z̈. Components in the</p><p>186 9 Spectral Analysis</p><p>forward and sideways directions are defined similarly, and then ā is calculated</p><p>as</p><p>ā = (ā2</p><p>x + ā2</p><p>y + ā2</p><p>z)</p><p>1/2 (9.10)</p><p>The data in the file zdd.txt are acceleration in the vertical direction (mm</p><p>s−2) measured over a 5-second period during a rock-cutting operation. The</p><p>sampling rate is 200 per second, and analog anti-aliasing filters were used to</p><p>remove any frequencies above 100 Hz in the continuous voltage signal from the</p><p>accelerometer. The frequency-weighting function was supplied by a medical</p><p>consultant. It is evaluated at 500 frequencies to match the spacing of the</p><p>spectrum ordinates and is given in vibdoswt.txt. 
The R routine has been</p><p>written to give diagrams in physical units, as required for a report.7</p><p>> www zdotdot.dat attach (zdotdot.dat)</p><p>> www wt.dat attach (wt.dat)</p><p>> acceln.spec Frequ Sord Time layout (1:3)</p><p>> plot (Time, Accelnz, xlab = "Time (s)",</p><p>ylab = expression(mm~ s^-2),</p><p>main = "Acceleration", type = "l")</p><p>> plot (Frequ, Sord, main = "Spectrum", xlab = "Frequency (Hz)",</p><p>ylab = expression(mm^2~s^-4~Hz^-1), type = "l")</p><p>> plot (Frequ, Weight, xlab = "Frequency (Hz)",</p><p>main = "Weighting function", type = "l")</p><p>> sd (Accelnz)</p><p>[1] 234.487</p><p>> sqrt( sum(Sord * Weight) * 0.2 )</p><p>[1] 179.9286</p><p>Suppose a driver is cutting rock for a 7-hour shift. The estimated root</p><p>mean square value of frequency weighted acceleration is 179.9 (mm s−2). If</p><p>we assume continuous exposure throughout the 7-hour period, the eVDV cal-</p><p>culated using Equation (9.8) is 3.17 (m s−1.75). The British Standard states</p><p>that doses as high as 15 will cause severe discomfort but is non-committal</p><p>about safe doses arising from daily exposure. The company needs to record</p><p>acceleration measurements during rock-cutting operations on different occa-</p><p>sions, with and without the vibration absorber activated. It can then estimate</p><p>the decrease in vibration dose that can be achieved by fitting the vibration</p><p>absorber to an excavator (Fig. 9.9).</p><p>7 Within R, type demo(plotmath) to see a list of mathematical operators that can</p><p>be used by the function expression for plots.</p><p>9.6 Applications 187</p><p>0 1 2 3 4 5</p><p>−</p><p>60</p><p>0</p><p>0</p><p>40</p><p>0</p><p>(a)</p><p>Time (s)</p><p>m</p><p>m</p><p>s</p><p>−−2</p><p>0 20 40 60 80 100</p><p>50</p><p>0</p><p>15</p><p>00</p><p>(b)</p><p>Frequency (Hz)</p><p>m</p><p>m</p><p>2 s</p><p>−−4</p><p>H</p><p>z−−1</p><p>0 20 40 60 80 100</p><p>0.</p><p>4</p><p>0.</p><p>7</p><p>1.</p><p>0</p><p>(c)</p><p>Frequency (Hz)</p><p>W</p><p>ei</p><p>gh</p><p>t</p><p>Fig. 9.9. Excavator series: (a) acceleration in vertical direction; (b) spectrum; (c)</p><p>frequency weighting function.</p><p>9.6.4 Climatic indices</p><p>Climatic indices are strongly related to ocean currents, which have a major</p><p>influence on weather patterns throughout the world. For example, El Niño is</p><p>associated with droughts throughout much of eastern Australia. A statistical</p><p>analysis of these indices is essential for two reasons. Firstly, it helps us assess</p><p>evidence of climate change. Secondly, it allows us to forecast, albeit with</p><p>limited confidence, potential natural disasters such as droughts and to take</p><p>action to mitigate the effects. Farmers, in particular, will modify their plans</p><p>for crop planting if drought is more likely than usual. Spectral analysis enables</p><p>us to identify any tendencies towards periodicities or towards persistence in</p><p>these indices.</p><p>The Southern Oscillation Index (SOI) is defined as the normalised pressure</p><p>difference between Tahiti and Darwin. El Niño events occur when the SOI is</p><p>strongly negative, and are associated with droughts in eastern Australia.</p><p>The</p><p>monthly time series8 from January 1866 until December 2006 are in soi.txt.</p><p>The time series plot in Figure 9.10 is a useful check that the data have been</p><p>read correctly and gives a general impression of the range and variability of</p><p>the SOI. But, it is hard to discern any frequency information. 
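A sketch of the commands behind Figure 9.10 (the column name SOI and the particular span are assumptions rather than details given in the text; the start date follows the text):

> soi.dat <- read.table("soi.txt", header = TRUE)
> soi.ts <- ts(soi.dat$SOI, start = c(1866, 1), freq = 12)
> layout(1:3)
> plot(soi.ts)                                              # (a) time plot
> soi.spec <- spectrum(soi.dat$SOI,
    span = sqrt(2 * length(soi.dat$SOI)))                   # (b) spectrum, log scale by default
> plot(soi.spec$freq[1:60], soi.spec$spec[1:60], type = "l")   # (c) low frequencies only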
The spectrum</p><p>is plotted with a logarithmic vertical scale and includes a 95% confidence in-</p><p>terval for the population spectrum in the upper right. The confidence interval</p><p>can be represented as a vertical line relative to the position of the sample</p><p>8 More details and the data are at http://www.cru.uea.ac.uk/cru/data/soi.htm.</p><p>188 9 Spectral Analysis</p><p>spectrum indicated by the horizontal line, because it has a constant width on</p><p>a logarithmic scale (§9.10.2). The spectrum has a peak at a low-frequency, so</p><p>we enlarge the low frequency section of the spectrum to identify this frequency</p><p>more precisely. It is about 0.022 cycles per month and corresponds to a period</p><p>of 45 months. However, the peak is small and lower frequency contributions</p><p>to the spectrum are substantial, so we cannot expect a regular pattern of El</p><p>Niño events.</p><p>> www soi.dat attach (soi.dat)</p><p>> soi.ts layout (1:3)</p><p>> plot (soi.ts)</p><p>> soi.spec plot (soi.spec$freq[1:60], soi.spec$spec[1:60], type = "l")</p><p>(a)</p><p>Time</p><p>S</p><p>O</p><p>I</p><p>1880 1900 1920 1940 1960 1980 2000</p><p>−</p><p>4</p><p>0</p><p>2</p><p>4</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0.</p><p>5</p><p>2.</p><p>0</p><p>10</p><p>.0</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(b)</p><p>0.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035</p><p>6.</p><p>0</p><p>7.</p><p>0</p><p>8.</p><p>0</p><p>9.</p><p>0</p><p>(c)</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>Fig. 9.10. Southern Oscillation Index: (a) time plot; (b) spectrum; (c) spectrum</p><p>for the low-frequencies.</p><p>The Pacific Decadal Oscillation (PDO) index is the difference between an</p><p>average of sea surface temperature anomalies in the North Pacific Ocean pole-</p><p>ward of 20 ◦N and the monthly mean global average anomaly.9 The monthly</p><p>time series from January 1900 until November 2007 is in pdo.txt. The spec-</p><p>trum in Figure 9.11 has no noteworthy peak and increases as the frequency</p><p>9 The time series data are available from http://jisao.washington.edu/pdo/.</p><p>9.6 Applications 189</p><p>becomes lower. The function spectrum removes a fitted linear trend before</p><p>calculating the spectrum, so the increase as the frequency tends to zero is</p><p>evidence of long-term memory in the PDO.</p><p>(a)</p><p>Time</p><p>P</p><p>D</p><p>O</p><p>1900 1920 1940 1960 1980 2000</p><p>−</p><p>3</p><p>0</p><p>2</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0.</p><p>2</p><p>1.</p><p>0</p><p>5.</p><p>0</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(b)</p><p>Fig. 9.11. Pacific Decadal Oscillation: (a) time plot; (b) spectrum.</p><p>> www pdo.dat attach (pdo.dat)</p><p>> pdo.ts layout (1:2)</p><p>> plot (pdo.ts)</p><p>> spectrum( PDO, span = sqrt( 2 * length(PDO) ) )</p><p>This analysis suggests that a FARIMA model might be suitable for modelling</p><p>the PDO and for generating future climate scenarios.</p><p>9.6.5 Bank loan rate</p><p>The data in mprime.txt are the monthly percentage US Federal Reserve Bank</p><p>prime loan rate,10 courtesy of the Board of Governors of the Federal Reserve</p><p>System, from January 1949 until November 2007. We will plot the time series,</p><p>the correlogram, and a spectrum on a logarithmic scale (Fig. 9.12).</p><p>10 Data downloaded from Federal Reserve Economic Data at the Federal Reserve</p><p>Bank of St. 
Louis.</p><p>190 9 Spectral Analysis</p><p>> www intr.dat attach (intr.dat)</p><p>> layout (1:3)</p><p>> plot (as.ts(Interest), ylab = 'Interest rate')</p><p>> acf (Interest)</p><p>> spectrum(Interest, span = sqrt(length(Interest)) / 4)</p><p>The height of the spectrum increases as the frequency tends to zero (Fig.</p><p>9.12). This feature is similar to that observed in the spectrum of the PDO</p><p>series in §9.6.5 and is again indicative of long-term memory, although it is less</p><p>pronounced in the loan rate series. In §8.4.3, we found that the estimate of the</p><p>fractional differencing parameter was close to 0 and that the apparent long</p><p>memory could be adequately accounted for by high-order ARMA models.</p><p>(a)</p><p>Time</p><p>In</p><p>te</p><p>re</p><p>st</p><p>r</p><p>at</p><p>e</p><p>0 100 200 300 400 500 600 700</p><p>5</p><p>10</p><p>20</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>1e</p><p>−</p><p>02</p><p>1e</p><p>+</p><p>02</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>(c)</p><p>Fig. 9.12. Federal Reserve Bank loan rates: (a) time plot; (b) correlogram; (c) spec-</p><p>trum.</p><p>9.7 Discrete Fourier transform (DFT)*</p><p>The theoretical basis for spectral analysis can be described succinctly in terms</p><p>of the discrete Fourier transform (DFT). The DFT requires the concept of</p><p>9.7 Discrete Fourier transform (DFT)* 191</p><p>complex numbers and Euler’s formula for a complex sinusoid, but the theory</p><p>then follows nicely. In R, complex numbers are handled by typing i following,</p><p>without a space, a numerical value; for example,</p><p>> z1 z2 z1 - z2</p><p>[1] 3+4i</p><p>> z1 * z2</p><p>[1] 1-5i</p><p>> abs(z1)</p><p>[1] 3.61</p><p>Euler’s formula for a complex sinusoid is</p><p>eiθ = cos(θ) + i sin(θ) (9.11)</p><p>If the circle in Figure 9.1 is at the centre of the complex plane, eiθ is the point</p><p>along the circumference. This remarkable formula can be verified using Taylor</p><p>expansions of eiθ, sin(θ), and cos(θ).</p><p>The DFT is usually calculated using the fast fourier transform algorithm</p><p>(FFT), which is very efficient for long time series. The DFT of a time series of</p><p>length n, {xt : t = 0, . . . , n− 1}, and its inverse transform (IDFT) are defined</p><p>by Equation (9.12) and Equation (9.13), respectively.</p><p>Xm =</p><p>n−1∑</p><p>t=0</p><p>xte</p><p>−2πimt/n m = 0, . . . , n− 1 (9.12)</p><p>xt =</p><p>1</p><p>n</p><p>n−1∑</p><p>m=0</p><p>Xme</p><p>2πitm/n t = 0, . . . , n− 1 (9.13)</p><p>It is convenient to start the time series at t = 0 for these definitions be-</p><p>cause m then corresponds to frequency 2πm/n radians per sampling interval.</p><p>The steps in the derivation of the DFT-IDFT transform pair are set out in</p><p>Exercise 5. 
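As a check on Equation (9.12), the transform of a short series can be evaluated directly from the definition and compared with R's built-in routine (the eight numbers are the short series printed in the example that follows; the explicit loop is our own illustration):

> n <- 8
> x <- c(-0.626, 0.184, -0.836, 1.595, 0.330, -0.820, 0.487, 0.738)
> X <- complex(n)                     # will hold X_0, ..., X_{n-1}
> for (m in 0:(n - 1)) X[m + 1] <- sum(x * exp(-2i * pi * m * (0:(n - 1)) / n))
> max(abs(X - fft(x)))                # agrees with fft up to rounding error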
The DFT is obtained in R with the function fft(), where x[t+1]</p><p>corresponds to xt and X[m+1] corresponds to Xm.</p><p>> set.seed(1)</p><p>> n x x</p><p>[1] -0.626 0.184 -0.836 1.595 0.330 -0.820 0.487 0.738</p><p>> X X</p><p>192 9 Spectral Analysis</p><p>[1] 1.052+0.000i -0.852+0.007i 0.051+2.970i -1.060-2.639i</p><p>[5] -2.342+0.000i -1.060+2.639i 0.051-2.970i -0.852-0.007i</p><p>> fft(X, inverse = TRUE)/n</p><p>[1] -0.626-0i 0.184+0i -0.836-0i 1.595-0i 0.330+0i -0.820-0i</p><p>[7] 0.487+0i 0.738+0i</p><p>The complex form of Parseval’s Theorem, first given in Equation (9.4), is</p><p>n−1∑</p><p>t=0</p><p>x2</p><p>t =</p><p>n−1∑</p><p>m=0</p><p>|Xm|2/n (9.14)</p><p>If n is even, the |Xm|2 contribution to the variance corresponds to a frequency</p><p>of 2πm/n for m = 1, . . . , n/2. For m = n/2, . . . , (n − 1), the frequencies</p><p>are greater than the Nyquist frequency, π, and are aliased to the frequencies</p><p>2π(m−n)/n, which lie in the range [−π,−2π/n]. All but two of the Xm occur</p><p>as complex conjugate pairs; that is, Xn−j = X∗</p><p>j for j = 1, . . . , n/2 − 1. The</p><p>following lines of R code give the spikes of the Fourier line spectrum FL at</p><p>frequencies in frq scaled so that FL[1] is mean(x)^2 and the sum of FL[2],</p><p>..., FL[n/2+1] is(n-1)*var(x)/n.</p><p>> fq frq FL FL [1] frq[1] for ( j in 2:(n/2) ) {</p><p>FL [j] FL [n/2 + 1] frq[n/2 + 1]</p><p>193</p><p>9.8.1 Discrete white noise</p><p>The spectrum of discrete white noise with variance σ2 is easily obtained from</p><p>the definition since the only non-zero value of γk is σ2 when k = 0.</p><p>Γ (ω) =</p><p>σ2</p><p>2π</p><p>− π spectrum( waveht, log = c("no"), method = c("ar") )</p><p>The smooth shape is useful for qualitative comparisons with the sea spectra</p><p>(Fig. 9.13). The analysis also indicates that we could use an AR(13) model</p><p>to obtain realisations of time series with this same spectrum in computer</p><p>simulations. A well-chosen probability distribution for the errors could be</p><p>used to give a realistic simulation of extreme values in the series.</p><p>9.10 Finer details</p><p>9.10.1 Leakage</p><p>Suppose a time series is a sampled sine function at a specific frequency. If this</p><p>frequency corresponds to one of the frequencies in the finite Fourier series,</p><p>then there will be a spike in the Fourier line spectrum at this frequency. This</p><p>coincidence is unlikely to arise by chance, so now suppose that the specific</p><p>frequency lies between two of the frequencies in the finite Fourier series. There</p><p>will not only be spikes at these two frequencies but also smaller spikes at</p><p>neighbouring frequencies (Exercise 6). This phenomenon is known as leakage.</p><p>9.10 Finer details 195</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0e</p><p>+</p><p>00</p><p>1e</p><p>+</p><p>05</p><p>2e</p><p>+</p><p>05</p><p>3e</p><p>+</p><p>05</p><p>4e</p><p>+</p><p>05</p><p>frequency</p><p>sp</p><p>ec</p><p>tr</p><p>um</p><p>Series: x</p><p>AR (13) spectrum</p><p>Fig. 9.13. Wave elevation series: spectrum calculated from fitting an AR model.</p><p>9.10.2 Confidence intervals</p><p>Consider a frequency ω0 corresponding to a spike of the Fourier line spec-</p><p>trum. If we average an odd number, L, of scaled spikes to obtain a smoothed</p><p>spectrum, then</p><p>C(ω0) =</p><p>1</p><p>L</p><p>(L−1)/2∑</p><p>l=−(L−1)/2</p><p>CRP (ωl) (9.24)</p><p>where CRP are the raw periodogram, scaled spike estimates. 
Now taking the expectation of both sides of Equation (9.24), and assuming the raw periodogram is unbiased for the population spectrum, we obtain

\[
E\left[C(\omega_0)\right] = \frac{1}{L} \sum_{l=-(L-1)/2}^{(L-1)/2} \Gamma(\omega_l) \qquad (9.25)
\]

Provided the population spectrum does not vary much over the interval \(\left[\omega_{-(L-1)/2},\ \omega_{(L-1)/2}\right]\),

\[
E\left[C(\omega_0)\right] \approx \Gamma(\omega_0) \qquad (9.26)
\]

But notice that if \(\omega_0\) corresponds to a peak or trough of the spectrum, the smoothed spectrum will be biased low or high. The more the smoothing, the more the bias. However, some smoothing is essential to reduce the variability. The following heuristic argument gives an approximate confidence interval for the spectrum. If we divide both sides of Equation (9.24) by \(\Gamma(\omega_0)\) and take the variance, we obtain

\[
\operatorname{Var}\left[C(\omega_0)/\Gamma(\omega_0)\right] \approx \frac{1}{L^2} \sum_{l=-(L-1)/2}^{(L-1)/2} \operatorname{Var}\left[C_{RP}(\omega_l)/\Gamma(\omega_l)\right] \qquad (9.27)
\]

where we have used the fact that spikes in the Fourier line spectrum are independent – a consequence of Parseval's Theorem. Now each spike is an estimate of variance at frequency \(\omega_l\) based on 2 degrees of freedom. So,

\[
\frac{2\,C_{RP}(\omega_l)}{\Gamma(\omega_l)} \sim \chi^2_2 \qquad (9.28)
\]

The variance of a chi-square distribution is twice its degrees of freedom. Hence,

\[
\operatorname{Var}\left[C(\omega_0)/\Gamma(\omega_0)\right] \approx \frac{1}{L} \qquad (9.29)
\]

A scaled sum of L chi-square variables, each with 2 degrees of freedom, is a scaled chi-square variable with 2L degrees of freedom and well approximated by a normal distribution. Thus an approximate 95% confidence interval for \(\Gamma(\omega)\) is

\[
\left[\left(1 - \frac{2}{\sqrt{L}}\right) C(\omega),\ \left(1 + \frac{2}{\sqrt{L}}\right) C(\omega)\right] \qquad (9.30)
\]

We have dropped the subscript on \(\omega\) because the result remains a good approximation for estimates of the spectrum interpolated between the \(C(\omega_l)\).

9.10.3 Daniell windows

The function spectrum uses a modified Daniell window, or smoother, that gives half weight to the end values. If more than one number is specified for the parameter span, it will use a series of Daniell smoothers, and the net result will be a centred moving average with weights decreasing from the centre. The rationale for using a series of smoothers is that it will decrease the bias.

9.10.4 Padding

The simplest FFT algorithm assumes that the time series has a length that is some power of 2. A positive integer is highly composite if it has more divisors than any smaller positive integer. The FFT algorithm is most efficient when the length n is highly composite, and by default spec.pgram pads the mean-adjusted time series with zeros to reach the smallest highly composite number that is greater than or equal to the length of the time series. Padding can be avoided by setting the parameter fast=FALSE. A justification for padding is that the length of the time series is arbitrary and that adding zeros has no effect on the frequency composition. Adding zeros does reduce the variance, and this must be remembered when scaling the spectrum so that its area equals the variance of the original time series.

9.10.5 Tapering

The length of a time series is not usually related to any underlying frequency composition.
However, the discrete Fourier series keeps replicating the original</p><p>time series as −∞</p><p>that are corrupted by noise. Spectral analy-</p><p>sis can be used for spatial series such as surface roughness transects, and</p><p>two-dimensional spectral analysis can be used for measurements of surface</p><p>roughness made over a plane. However, spectral analysis is not suitable for</p><p>non-stationary applications.</p><p>In contrast, wavelets have been developed to summarise the variation in</p><p>frequency composition through time or over space. There are many applica-</p><p>tions, including compression of digital files of images and in speech recognition</p><p>software. Nason (2008) provides an introduction to wavelets using the R pack-</p><p>age WaveThresh4.</p><p>9.11 Summary of additional commands used</p><p>spectrum returns the spectrum</p><p>spec.pgam returns the spectrum with more control of parameters</p><p>fft returns the DFT</p><p>198 9 Spectral Analysis</p><p>9.12 Exercises</p><p>1. Refer to §9.3.1 and take n = 128.</p><p>a) Use R to calculate cos(2πt/n), sin(2πt/n), and cos(4πt/n) for t =</p><p>1, . . . , n. Calculate the three variances and the three correlations.</p><p>b) Assuming the results above generalise, provide an explanation for Par-</p><p>seval’s Theorem.</p><p>c) Explain why the A2</p><p>n/2 term in Equation (9.4) is not divided by 2.</p><p>2. Repeat the investigation of realisations from AR processes in §9.4 using</p><p>random deviates from an exponential distribution with parameter 1 and</p><p>with its mean subtracted, rather than the standard normal distribution.</p><p>3. The differential equation for the oscillatory response x of a lightly damped</p><p>single mode of vibration system, such as a mass on a spring, with a forcing</p><p>term w is</p><p>ẍ+ 2ζΩẋ+Ω2x = w</p><p>where ζ is the damping coefficient, which must be less than 1 for an</p><p>oscillatory response, and Ω is the natural frequency. Approximate the</p><p>derivatives by backward differences:</p><p>ẍ = xt − 2xt−1 + xt−2 ẋ = xt − xt−1</p><p>and set w = wt and rearrange to obtain the form of the AR(2) process in</p><p>§8.4.4. Consider an approximation using central differences.</p><p>4. Suppose that</p><p>xt =</p><p>n−1∑</p><p>m=0</p><p>ame2πimt/n m = 0, . . . , n− 1 (9.31)</p><p>for some coefficients am that we wish to determine. Now multiply both</p><p>sides of this equation by e−2πijt/n and sum over t from 0 to n−1 to obtain</p><p>n−1∑</p><p>t=0</p><p>xte−2πijt/n =</p><p>n−1∑</p><p>t=0</p><p>n−1∑</p><p>m=0</p><p>ame2πi(m−j)t/n (9.32)</p><p>Consider a fixed value of j. Notice that the sum to the right of am is</p><p>a geometric series with sum 0 unless m = j. This is Equation (9.12)</p><p>expressed it terms of naj in place of Xm with a factor of n.</p><p>5. Write R code to average an odd number of spike heights obtained from</p><p>fft and hence plot a spectrum.</p><p>9.12 Exercises 199</p><p>6. Sample the three signals</p><p>a) sin(πt/2)</p><p>b) sin(3πt/4)</p><p>c) sin(5πt/8)</p><p>at times t = 0, . . . , 7, using fft to compare their line spectra.</p><p>7. Sample the signal sin(11πt/32) for t = 0, . . . , 31. Use fft to calculate the</p><p>Fourier line spectrum. 
The cosine bell taper applied to the beginning α</p><p>and ending α of a series is defined by[</p><p>1− cos</p><p>(</p><p>π{t+ 0.5}/{αn}</p><p>)]</p><p>xt (t+ 1) ≤ αn[</p><p>1− cos</p><p>(</p><p>π{n− t− 0.5}/{αn}</p><p>)]</p><p>xt (t+ 1) ≥ (1− α)n</p><p>Investigate the effect of this taper, with α = 0.1, on the Fourier line</p><p>spectrum of the sampled signal.</p><p>8. Sea spectra are sometimes modelled by the Peirson-Moskowitz spectrum,</p><p>which has the form below and is usually only appropriate for deep water</p><p>conditions.</p><p>Γ (ω) = aω−5e−bω−4</p><p>0 ≤ ω ≤ π</p><p>Plot the Peirson-Moskowitz spectrum in R for a few choices of parameters</p><p>a and b. Compare it with the wave elevation spectra (Fig. 9.7).</p><p>10</p><p>System Identification</p><p>10.1 Purpose</p><p>Vibration is defined as an oscillatory movement of some entity about an equi-</p><p>librium state. It is the means of producing sound in musical instruments, it</p><p>is the principle underlying the design of loudspeakers, and it describes the</p><p>response of buildings to earthquakes. The squealing of disc brakes on a car</p><p>is caused by vibration. The up and down motion of a ship at sea is a low-</p><p>frequency vibration. Spectral analysis provides the means for understanding</p><p>and controlling vibration.</p><p>Vibration is generally caused by some external force acting on a system,</p><p>and the relationship between the external force and the system response can</p><p>be described by a mathematical model of the system dynamics. We can use</p><p>spectral analysis to estimate the parameters of the mathematical model and</p><p>then use the model to make predictions of the response of the system under</p><p>different forces.</p><p>10.2 Identifying the gain of a linear system</p><p>10.2.1 Linear system</p><p>We consider systems that have clearly defined inputs and outputs, and aim</p><p>to deduce the system from measurements of the inputs and outputs or to</p><p>predict the output knowing the system and the input. Attempts to under-</p><p>stand economies and to control inflation by increasing interest rates provide</p><p>ambitious examples of applications of these principles.</p><p>A mathematical model of a dynamic system is linear if the output to a</p><p>sum of input variables, x and y, equals the sum of the outputs corresponding</p><p>to the individual inputs. More formally, a mathematical operator L is linear</p><p>if it satisfies</p><p>L (ax+ by) = aL(x) + bL(y)</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 201</p><p>Use R, DOI 10.1007/978-0-387-88698-5 10,</p><p>© Springer Science+Business Media, LLC 2009</p><p>202 10 System Identification</p><p>where a and b are constants. For a linear system, the output response to a</p><p>sine wave input is a sine wave of the same frequency with an amplitude that</p><p>is proportional to the amplitude of the input. The ratio of the output ampli-</p><p>tude to the input amplitude, known as the gain, and the phase lag between</p><p>input and output depend on the frequency of the input, and this dependence</p><p>provides a complete description of a linear system.</p><p>Many physical systems are well approximated by linear mathematical mod-</p><p>els, provided the input amplitude is not excessive. In principle, we can identify</p><p>a linear model by noting the output, commonly referred to as the response,</p><p>to a range of sine wave inputs. But there are practical limitations to such a</p><p>procedure. 
In many cases, while we may be able to measure the input, we certainly cannot specify it. Examples are wave energy devices moored at sea and the response of structures to wind forcing. Even when we can specify the input, recording the output over a range of frequencies is a slow procedure. In contrast, provided we can measure the input and output, and the input has a sufficiently broad spectrum, we can identify the linear system from spectral analysis. Also, spectral methods have been developed for non-linear systems.

A related application of spectral analysis is that we can determine the spectrum of the response if we know the system and the input spectrum. For example, we can predict the output of a wave energy device if we have a mathematical model for its dynamics and know typical sea spectra at its mooring.

10.2.2 Natural frequencies

If a system is set in motion by an initial displacement or impact, it may oscillate, and this oscillation takes place at the natural frequency (or frequencies) of the system. A simple example is the oscillation of a mass suspended by a spring. Linear systems have large gains at natural frequencies and, if large oscillations are undesirable, designers need to ensure that the natural frequencies of the system are far removed from forcing frequencies. Alternatively, in the case of wave energy devices, for example, the designer may aim for the natural frequencies of the device to match predominant frequencies in the sea spectrum. A common example of forcing a system at its natural frequency is pushing a child on a swing.

10.2.3 Estimator of the gain function

If a linear system is forced by a sine wave of amplitude A at frequency f, the response has an amplitude G(f)A, where G(f) is the gain at frequency f. The ratio of the variance of the output to the variance of the input, for sine waves at this frequency, is G(f)\(^2\). If the input is a stationary random process rather than a single sine wave, its variance is distributed over a range of frequencies, and this distribution is described by the spectrum. It seems intuitively reasonable to estimate the square of the gain function by the ratio of the output spectrum to the input spectrum. Consider a linear system with a single input, \(x_t\), and a single output, \(y_t\). The gain function can be estimated by

\[
\hat{G}(f) = \sqrt{\frac{C_{yy}(f)}{C_{xx}(f)}} \qquad (10.1)
\]

A corollary is that the output spectrum can be estimated, if the gain function is known or has been estimated and the input spectrum has been estimated, by

\[
C_{yy} = G^2 C_{xx} \qquad (10.2)
\]

Equation (10.2) also holds if spectra are expressed in radians rather than cycles, in which case the gain is a function \(G(\omega)\) of \(\omega\).

10.3 Spectrum of an AR(p) process

Consider the deterministic part of an AR(p) model with a complex sinusoid input,

\[
x_t - \alpha_1 x_{t-1} - \dots - \alpha_p x_{t-p} = e^{i\omega t} \qquad (10.3)
\]

Assume a solution for \(x_t\) of the form \(A e^{i\omega t}\), where \(A\) is a complex number, and substitute this into Equation (10.3) to obtain

\[
A = \left(1 - \alpha_1 e^{-i\omega} - \dots - \alpha_p e^{-i\omega p}\right)^{-1} \qquad (10.4)
\]
Now</p><p>consider a discrete white noise input, wt, in place of the complex sinusoid. The</p><p>system is now an AR(p) process. Applying Equation (10.2), with population</p><p>spectra rather than sample spectra, and noting that the spectrum of white</p><p>noise with unit variance is 1/π (§9.8.1), gives</p><p>Γxx(ω) = |A|2 Γww =</p><p>1</p><p>π</p><p>(</p><p>1− α1e</p><p>−iω − . . .− αpe</p><p>−iωp</p><p>)−2</p><p>0 ≤ ω m a0 a1 n y set.seed(1)</p><p>> for (i in 3:n) {</p><p>x[i] Sxx Syy Gemp Freq FreH Omeg OmegH Gth Gar plot(FreH, Gth, xlab = "Frequency (Hz)", ylab = "Gain", type="l")</p><p>> lines(FreH, Gemp, lty = "dashed")</p><p>> lines(FreH, Gar, lty = "dotted")</p><p>10.5 Ocean-going tugboat</p><p>The motion of ships and aircraft is described by displacements along the or-</p><p>thogonal x, y, and z axes and rotations about these axes. The displacements</p><p>are surge, sway, and heave along the x, y, and z axes, respectively. The ro-</p><p>tations about the x, y, and z axes are roll, pitch, and yaw, respectively (Fig.</p><p>10.2). So, there are six degrees of freedom for a ship’s motion in the ocean,</p><p>and there are six natural frequencies. However, the natural frequencies will</p><p>not usually correspond precisely to the displacements and rotations, as there</p><p>is a coupling between displacements and rotations. This is typically most pro-</p><p>nounced between heave and pitch. There will be a natural frequency with</p><p>206 10 System Identification</p><p>0 1 2 3 4 5</p><p>0.</p><p>00</p><p>0.</p><p>05</p><p>0.</p><p>10</p><p>0.</p><p>15</p><p>0.</p><p>20</p><p>0.</p><p>25</p><p>Frequency (Hz)</p><p>G</p><p>ai</p><p>n</p><p>Fig. 10.1. Gain of single-mode linear system. The theoretical gain is shown by</p><p>a solid line and the estimate made from the spectra obtained from the difference</p><p>equation is shown by a broken line. The theoretical gain of the difference equation</p><p>is plotted as a dotted line and coincides exactly with the estimate.</p><p>a corresponding mode that is predominantly heave, with a slight pitch, and</p><p>another natural frequency that is predominantly pitch, with a slight heave.</p><p>Naval architects will start with computer designs and then proceed to</p><p>model testing in a wave tank before building a prototype. They will have a</p><p>good idea of the frequency response of the ship from the models, but this will</p><p>have to be validated against sea trials. Here, we analyse some of the data from</p><p>the sea trials of an ocean-going tugboat. The ship sailed over an octagonal</p><p>course, and data were collected on each leg. There was an impressive array</p><p>of electronic instruments and, after processing analog signals through anti-</p><p>aliasing filters, data were recorded at 0.5s intervals for roll (degrees), pitch</p><p>(degrees), heave (m), surge (m), sway (m), yaw</p><p>(degrees), wave height (m),</p><p>and wind speed (knots).</p><p>> www tug.dat attach(tug.dat)</p><p>> Heave.spec Wave.spec G par(mfcol = c(2, 2))</p><p>> plot( as.ts(Wave) )</p><p>> acf(Wave)</p><p>> spectrum(Wave, span = sqrt(length(Heave)), log = c("no"), main = "")</p><p>> plot(Heave.spec$freq, G, xlab="frequency Hz", ylab="Gain", type="l")</p><p>Figure 10.3 shows the estimated wave spectrum and the estimated gain</p><p>from wave height to heave. The natural frequencies associated with the</p><p>heave/pitch modes are estimated as 0.075 Hz and 0.119 Hz, and the cor-</p><p>responding gains from wave to heave are 0.15179 and 0.1323. 
In theory, the</p><p>gain will approach 1 as the frequency approaches 0, but the sea spectrum has</p><p>negligible components very close to 0, and no sensible estimate can be made.</p><p>Also, the displacements were obtained by integrating accelerometer signals,</p><p>and this is not an ideal procedure at very low frequencies.</p><p>10.6 Non-linearity</p><p>There are several reasons why the hydrodynamic response of a ship will not</p><p>be precisely linear. In particular, the varying cross-section of the hull accounts</p><p>208 10 System Identification</p><p>Time</p><p>W</p><p>av</p><p>e</p><p>0 500 1500 2500</p><p>−</p><p>1</p><p>0</p><p>1</p><p>0 5 10 20 30</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0.</p><p>0</p><p>1.</p><p>0</p><p>2.</p><p>0</p><p>Frequency</p><p>S</p><p>pe</p><p>ct</p><p>ru</p><p>m</p><p>bandwidth = 0.0052</p><p>0.0 0.1 0.2 0.3 0.4 0.5</p><p>0.</p><p>00</p><p>0.</p><p>05</p><p>0.</p><p>10</p><p>0.</p><p>15</p><p>Frequency Hz</p><p>G</p><p>ai</p><p>n</p><p>Fig. 10.3. Gain of heave from wave.</p><p>for non-linear buoyancy forces. Metcalfe et al. (2007) investigate this by fitting</p><p>a regression of the heave response on lagged values of the response, squares,</p><p>and cross-products of these lagged values, wave height, and wind speed. The</p><p>probing method looks at the response of the fitted model to the sum of two</p><p>complex sinusoids at frequencies ω1 and ω2. The non-linear response can be</p><p>shown as a three-dimensional plot of the gain surface against frequency ω1</p><p>and ω2 or by a contour diagram. However, in this particular application the</p><p>gain associated with the non-linear terms was small compared with the gain of</p><p>the linear terms (Metcalfe et al., 2007). This is partly because the model was</p><p>fitted to data taken when the ship was in typical weather conditions – under</p><p>extreme conditions, when capsizing is likely, linear models are inadequate.</p><p>10.7 Exercises</p><p>1. The differential equation that describes the motion of a linear system with</p><p>a single mode of vibration, such as a mass on a spring, has the general</p><p>form</p><p>ÿ + 2ζΩẏ +Ω2y = x</p><p>10.7 Exercises 209</p><p>The parameter Ω is the undamped natural frequency, and the parameter</p><p>ζ is the damping coefficient. The response is oscillatory if ζ www CBE Elec.ts Choc.ts plot(as.vector(aggregate(Choc.ts)), as.vector(aggregate(Elec.ts)))</p><p>> cor(aggregate(Choc.ts), aggregate(Elec.ts))</p><p>[1] 0.958</p><p>The high correlation of 0.96 and the scatter plot do not imply that the elec-</p><p>tricity and chocolate production variables are causally related (Fig. 11.1). In-</p><p>stead, it is more plausible that the increasing Australian population accounts</p><p>for the increasing trend in both series. Although we can fit a regression of</p><p>one variable as a linear function of the other, with added random variation,</p><p>such regression models are usually termed spurious because of the lack of any</p><p>causal relationship. In this case, it would be far better to regress the variables</p><p>on the Australian population.</p><p>Fig. 11.1. Annual electricity and chocolate production plotted against each other.</p><p>The term spurious regression is also used when</p><p>underlying stochastic</p><p>trends in both series happen to be coincident, and this seems a more appro-</p><p>priate use of the term. 
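As a complete listing, the electricity and chocolate comparison at the start of this section might read as follows (a sketch only: the file name cbe.dat, the column order, and the January 1958 start are assumptions rather than details given in this section):

> CBE <- read.table("cbe.dat", header = TRUE)
> Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12)    # monthly chocolate production
> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)    # monthly electricity production
> plot(as.vector(aggregate(Choc.ts)), as.vector(aggregate(Elec.ts)))
> cor(aggregate(Choc.ts), aggregate(Elec.ts))         # correlation of annual totals (0.96 in the text)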
Stochastic trends are a feature of an ARIMA process</p><p>with a unit root (i.e., B = 1 is a solution of the characteristic equation). We</p><p>illustrate this by simulating two independent random walks:</p><p>● ●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>30000 40000 50000 60000 70000 80000 90000</p><p>20</p><p>00</p><p>0</p><p>60</p><p>00</p><p>0</p><p>10</p><p>00</p><p>00</p><p>14</p><p>00</p><p>00</p><p>Chocolate production</p><p>E</p><p>le</p><p>ct</p><p>ric</p><p>ity</p><p>p</p><p>ro</p><p>du</p><p>ct</p><p>io</p><p>n</p><p>11.2 Spurious regression 213</p><p>> set.seed(10); x for(i in 2:100) {</p><p>x[i] plot(x, y)</p><p>> cor(x, y)</p><p>[1] 0.904</p><p>The code above can be repeated for different random number seeds though</p><p>you will only sometimes notice spurious correlation. The seed value of 10 was</p><p>selected to provide an example of a strong correlation that could have resulted</p><p>by chance. The scatter plot shows how two independent time series variables</p><p>might appear related when each variable is subject to stochastic trends (Fig.</p><p>11.2).</p><p>Fig. 11.2. The values of two independent simulated random walks plotted against</p><p>each other. (See the code in the text.)</p><p>Stochastic trends are common in economic series, and so considerable care</p><p>is required when trying to determine any relationships between the variables</p><p>in multiple economic series. It may be that an underlying relationship can be</p><p>justified even when the series exhibit stochastic trends because two series may</p><p>be related by a common stochastic trend.</p><p>For example, the daily exchange rate series for UK pounds, the Euro, and</p><p>New Zealand dollars, given for the period January 2004 to December 2007,</p><p>are all per US dollar. The correlogram plots of the differenced UK and EU</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>● ●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>● ●</p><p>●</p><p>●</p><p>●</p><p>● ●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>● ●</p><p>●</p><p>●●</p><p>●</p><p>0 2 4 6 8 10 12</p><p>−</p><p>5</p><p>0</p><p>5</p><p>10</p><p>x</p><p>y</p><p>214 11 Multivariate Models</p><p>series indicate that both exchange rates can be well approximated by random</p><p>walks (Fig. 11.3), whilst the scatter plot of the rates shows a strong linear</p><p>relationship (Fig. 11.4), which is supported by a high correlation of 0.95. Since</p><p>the United Kingdom is part of the European Economic Community (EEC),</p><p>any change in the Euro exchange rate is likely to be apparent in the UK</p><p>pound exchange rate, so there are likely to be fluctuations common to both</p><p>series; in particular, the two series may share a common stochastic trend. 
We</p><p>will discuss this phenomenon in more detail when we look at cointegration in</p><p>§11.4.</p><p>> www xrates xrates[1:3, ]</p><p>UK NZ EU</p><p>1 0.558 1.52 0.794</p><p>2 0.553 1.49 0.789</p><p>3 0.548 1.49 0.783</p><p>> acf( diff(xrates$UK) )</p><p>> acf( diff(xrates$EU) )</p><p>> plot(xrates$UK, xrates$EU, pch = 4)</p><p>> cor(xrates$UK, xrates$EU)</p><p>[1] 0.946</p><p>11.3 Tests for unit roots</p><p>When investigating any relationship between two time series variables we</p><p>should check whether time series models that contain unit roots are suitable.</p><p>If they are, we need to decide whether or not there is a common stochastic</p><p>trend. The first step is to see how well each series can be approximated as</p><p>a random walk by looking at the correlogram of the differenced series (e.g.,</p><p>Fig. 11.3). Whilst this may work for a simple random walk, we have seen in</p><p>Chapter 7 that stochastic trends are a feature of any time series model with</p><p>a unit root B = 1 as a solution of the characteristic equation, which would</p><p>include more complex ARIMA processes.</p><p>Dickey and Fuller developed a test of the null hypothesis that α = 1 against</p><p>an alternative hypothesis that α library(tseries)</p><p>> adf.test(x)</p><p>11.3 Tests for unit roots 215</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 11.3. Correlograms of the differenced exchange rate series: (a) UK rate; (b)</p><p>EU rate.</p><p>Augmented Dickey-Fuller Test</p><p>data: x</p><p>Dickey-Fuller = -2.23, Lag order = 4, p-value = 0.4796</p><p>alternative hypothesis: stationary</p><p>This result is not surprising since we would only expect 5% of simulated</p><p>random walks to provide evidence against a null hypothesis of a unit root</p><p>at the 5% level. However, when we analyse physical time series rather than</p><p>realisations from a known model, we should never mistake lack of evidence</p><p>against a hypothesis for a demonstration that the hypothesis is true. The test</p><p>result should be interpreted with careful consideration of the length of the</p><p>time series, which determines the power of the test, and the general context.</p><p>The null hypothesis of a unit root is favoured by economists because many</p><p>financial time series are better approximated by random walks than by a</p><p>stationary process, at least in the short term.</p><p>An alternative to the augmented Dickey-Fuller test, known as the Phillips-</p><p>Perron test (Perron, 1988), is implemented in the R function pp.test. The</p><p>distinction between the two tests is that the Phillips-Perron procedure esti-</p><p>mates the autocorrelations in the stationary process ut directly (using a kernel</p><p>smoother) rather than assuming an AR approximation, and for this reason</p><p>the Phillips-Perron test is described as semi-parametric. Critical values of the</p><p>test statistic are either based on asymptotic theory or calculated from exten-</p><p>216 11 Multivariate Models</p><p>0.70 0.75 0.80 0.85</p><p>0.</p><p>48</p><p>0.</p><p>52</p><p>0.</p><p>56</p><p>EU rate</p><p>U</p><p>K</p><p>r</p><p>at</p><p>e</p><p>Fig. 11.4. Scatter plot of the UK and EU exchange rates. Both rates are per US</p><p>dollar.</p><p>sive simulations. 
There is no evidence to reject the unit root hypothesis, so</p><p>we conclude that the UK pound and Euro exchange rates are both likely to</p><p>contain unit roots.</p><p>> pp.test(xrates$UK)</p><p>Phillips-Perron Unit Root Test</p><p>data: xrates$UK</p><p>Dickey-Fuller Z(alpha) = -10.6, Truncation lag parameter = 7,</p><p>p-value = 0.521</p><p>alternative hypothesis: stationary</p><p>> pp.test(xrates$EU)</p><p>Phillips-Perron Unit Root Test</p><p>data: xrates$EU</p><p>Dickey-Fuller Z(alpha) = -6.81, Truncation lag parameter = 7,</p><p>p-value = 0.7297</p><p>alternative hypothesis: stationary</p><p>11.4 Cointegration</p><p>11.4.1 Definition</p><p>Many multiple time series are highly correlated in time. For example, in §11.2</p><p>we found the UK pound and Euro exchange rates very highly correlated. This</p><p>is explained by the similarity of the two economies relative to the US economy.</p><p>Another example is the high correlation between the Australian electricity and</p><p>11.4 Cointegration 217</p><p>chocolate production series, which can be reasonably attributed to an increas-</p><p>ing Australian population rather than a causal relationship. In addition, we</p><p>demonstrated that two series that are independent and contain unit roots</p><p>(e.g., they follow independent random walks) can show an apparent linear re-</p><p>lationship, due to chance similarity of the random walks over the period of the</p><p>time series, and stated that such a correlation would be spurious. However,</p><p>as demonstrated by the analysis of the UK pounds and Euro exchange rates,</p><p>it is quite possible for two series to contain unit roots and be related. Such</p><p>series are said to be cointegrated. In the case of the exchange rates, a stochas-</p><p>tic trend in the US economy during a period when the European economy is</p><p>relatively stable will impart a common, complementary, stochastic trend to</p><p>the UK pound and Euro exchange rates. We now state the precise definition</p><p>of cointegration.</p><p>a variable is measured sequentially in time over or at a fixed</p><p>interval, known as the sampling interval , the resulting data form a time series.</p><p>Observations that have been collected over fixed sampling intervals form a</p><p>historical time series. In this book, we take a statistical approach in which the</p><p>historical series are treated as realisations of sequences of random variables. A</p><p>sequence of random variables defined at fixed sampling intervals is sometimes</p><p>referred to as a discrete-time stochastic process, though the shorter name</p><p>time series model is often preferred. The theory of stochastic processes is vast</p><p>and may be studied without necessarily fitting any models to data. However,</p><p>our focus will be more applied and directed towards model fitting and data</p><p>analysis, for which we will be using R.1</p><p>The main features of many time series are trends and seasonal varia-</p><p>tions that can be modelled deterministically with mathematical functions of</p><p>time. But, another important feature of most time series is that observations</p><p>close together in time tend to be correlated (serially dependent). Much of the</p><p>methodology in a time series analysis is aimed at explaining this correlation</p><p>and the main features in the data using appropriate statistical models and</p><p>descriptive methods. 
Once a good model is found and fitted to data, the an-</p><p>alyst can use the model to forecast future values, or generate simulations, to</p><p>guide planning decisions. Fitted models are also used as a basis for statistical</p><p>tests. For example, we can determine whether fluctuations in monthly sales</p><p>figures provide evidence of some underlying change in sales that we must now</p><p>allow for. Finally, a fitted statistical model provides a concise summary of the</p><p>main characteristics of a time series, which can often be essential for decision</p><p>makers such as managers or politicians.</p><p>Sampling intervals differ in their relation to the data. The data may have</p><p>been aggregated (for example, the number of foreign tourists arriving per day)</p><p>or sampled (as in a daily time series of close of business share prices). If data</p><p>are sampled, the sampling interval must be short enough for the time series</p><p>to provide a very close approximation to the original continuous signal when</p><p>it is interpolated. In a volatile share market, close of business prices may not</p><p>suffice for interactive trading but will usually be adequate to show a com-</p><p>pany’s financial performance over several years. At a quite different timescale,</p><p>1 R was initiated by Ihaka and Gentleman (1996) and is an open source implemen-</p><p>tation of S, a language for data analysis developed at Bell Laboratories (Becker</p><p>et al. 1988).</p><p>1.3 R language 3</p><p>time series analysis is the basis for signal processing in telecommunications,</p><p>engineering, and science. Continuous electrical signals are sampled to provide</p><p>time series using analog-to-digital (A/D) converters at rates that can be faster</p><p>than millions of observations per second.</p><p>1.3 R language</p><p>It is assumed that you have R (version 2 or higher) installed on your computer,</p><p>and it is suggested that you work through the examples, making sure your</p><p>output agrees with ours.2 If you do not have R, then it can be installed free</p><p>of charge from the Internet site www.r-project.org. It is also recommended</p><p>that you have some familiarity with the basics of R, which can be obtained</p><p>by working through the first few chapters of an elementary textbook on R</p><p>(e.g., Dalgaard 2002) or using the online “An Introduction to R”, which is</p><p>also available via the R help system – type help.start() at the command</p><p>prompt to access this.</p><p>R has many features in common with both functional and object oriented</p><p>programming languages. In particular, functions in R are treated as objects</p><p>that can be manipulated or used recursively.3 For example, the factorial func-</p><p>tion can be written recursively as</p><p>> Fact Fact(5)</p><p>[1] 120</p><p>In common with functional languages, assignments in R can be avoided,</p><p>but they are useful for clarity and convenience and hence will be used in</p><p>the examples that follow. In addition, R runs faster when ‘loops’ are avoided,</p><p>which can often be achieved using matrix calculations instead. However, this</p><p>can sometimes result in rather obscure-looking code. Thus, for the sake of</p><p>transparency, loops will be used in many of our examples. Note that R is case</p><p>sensitive, so that X and x, for example, correspond to different variables. 
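Returning to the recursive factorial mentioned above, a minimal version is sketched below; the function name Fact follows the text, while the one-line layout is our own.

# Recursive factorial: Fact(n) = n * Fact(n - 1), with Fact(1) = 1.
Fact <- function(n) if (n == 1) 1 else n * Fact(n - 1)
Fact(5)   # 120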
In</p><p>general, we shall use uppercase for the first letter when defining new variables,</p><p>as this reduces the chance of overwriting inbuilt R functions, which are usually</p><p>in lowercase.4</p><p>2 Some of the output given in this book may differ slightly from yours. This is most</p><p>likely due to editorial changes made for stylistic reasons. For conciseness, we also</p><p>used options(digits=3) to set the number of digits to 4 in the computer output</p><p>that appears in the book.</p><p>3 Do not be concerned if you are unfamiliar with some of these computing terms,</p><p>as they are not really essential in understanding the material in this book. The</p><p>main reason for mentioning them now is to emphasise that R can almost certainly</p><p>meet your future statistical and programming needs should you wish to take the</p><p>study of time series further.</p><p>4 For example, matrix transpose is t(), so t should not be used for time.</p><p>4 1 Time Series Data</p><p>The best way to learn to do a time series analysis in R is through practice,</p><p>so we now turn to some examples, which we invite you to work through.</p><p>1.4 Plots, trends, and seasonal variation</p><p>1.4.1 A flying start: Air passenger bookings</p><p>The number of international passenger bookings (in thousands) per month</p><p>on an airline (Pan Am) in the United States were obtained from the Federal</p><p>Aviation Administration for the period 1949–1960 (Brown, 1963). The com-</p><p>pany used the data to predict future demand before ordering new aircraft and</p><p>training aircrew. The data are available as a time series in R and illustrate</p><p>several important concepts that arise in an exploratory time series analysis.</p><p>Type the following commands in R, and check your results against the</p><p>output shown here. To save on typing, the data are assigned to a variable</p><p>called AP.</p><p>> data(AirPassengers)</p><p>> AP AP</p><p>Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec</p><p>1949 112 118 132 129 121 135 148 148 136 119 104 118</p><p>1950 115 126 141 135 125 149 170 170 158 133 114 140</p><p>1951 145 150 178 163 172 178 199 199 184 162 146 166</p><p>1952 171 180 193 181 183 218 230 242 209 191 172 194</p><p>1953 196 196 236 235 229 243 264 272 237 211 180 201</p><p>1954 204 188 235 227 234 264 302 293 259 229 203 229</p><p>1955 242 233 267 269 270 315 364 347 312 274 237 278</p><p>1956 284 277 317 313 318 374 413 405 355 306 271 306</p><p>1957 315 301 356 348 355 422 465 467 404 347 305 336</p><p>1958 340 318 362 348 363 435 491 505 404 359 310 337</p><p>1959 360 342 406 396 420 472 548 559 463 407 362 405</p><p>1960 417 391 419 461 472 535 622 606 508 461 390 432</p><p>All data in R are stored in objects, which have a range of methods available.</p><p>The class of an object can be found using the class function:</p><p>> class(AP)</p><p>[1] "ts"</p><p>> start(AP); end(AP); frequency(AP)</p><p>[1] 1949 1</p><p>[1] 1960 12</p><p>[1] 12</p><p>1.4 Plots, trends, and seasonal variation 5</p><p>In this case, the object is of class ts, which is an abbreviation for ‘time</p><p>series’. Time series objects have a number of methods available, which include</p><p>the functions start, end, and frequency given above. These methods can be</p><p>listed using the function methods, but the output from this function is not</p><p>always helpful. 
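Collected together, the commands described above can be sketched as follows; the commented values are the outputs shown in the text, and the final call simply lists the available ts methods.

AP <- AirPassengers        # assign the built-in monthly series to AP
class(AP)                  # "ts"
start(AP); end(AP); frequency(AP)   # 1949 1; 1960 12; 12
methods(class = "ts")      # methods available for time series objects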
The key thing to bear in mind is that generic functions in R,</p><p>such as plot or summary, will attempt to give the most appropriate output</p><p>to any given input object; try typing summary(AP) now to see what happens.</p><p>As the objective in this book is to analyse time series, it makes sense to</p><p>put our data into objects of class ts. This can be achieved using a function</p><p>also called ts, but this was not necessary for the airline data, which were</p><p>already stored in this form.</p><p>Two non-stationary time series {xt} and {yt} are cointegrated if some</p><p>linear combination axt + byt, with a and b constant, is a stationary</p><p>series.</p><p>As an example consider a random walk {µt} given by µt = µt−1 + wt,</p><p>where {wt} is white noise with zero mean, and two series {xt} and {yt} given</p><p>by xt = µt +wx,t and yt = µt +wy,t, where {wx,t} and {wy,t} are independent</p><p>white noise series with zero mean. Both series are non-stationary, but their</p><p>difference {xt − yt} is stationary since it is a finite linear combination of</p><p>independent white noise terms. Thus the linear combination of {xt} and {yt},</p><p>with a = 1 and b = −1, produced a stationary series, {wx,t−wy,t}. Hence {xt}</p><p>and {yt} are cointegrated and share the underlying stochastic trend {µt}.</p><p>In R, two series can be tested for cointegration using the Phillips-Ouliaris</p><p>test implemented in the function po.test within the tseries library. The</p><p>function requires the series be given in matrix form and produces the results</p><p>for a test of the null hypothesis that the two series are not cointegrated. As an</p><p>example, we simulate two cointegrated series x and y that share the stochastic</p><p>trend mu and test for cointegration using po.test:</p><p>> x for (i in 2:1000) mu[i] x y adf.test(x)$p.value</p><p>[1] 0.502</p><p>> adf.test(y)$p.value</p><p>[1] 0.544</p><p>> po.test(cbind(x, y))</p><p>Phillips-Ouliaris Cointegration Test</p><p>218 11 Multivariate Models</p><p>data: cbind(x, y)</p><p>Phillips-Ouliaris demeaned = -1020, Truncation lag parameter = 9,</p><p>p-value = 0.01</p><p>In the example above, the conclusion of the adf.test is to retain the null</p><p>hypothesis that the series have unit roots. The po.test provides evidence</p><p>that the series are cointegrated since the null hypothesis is rejected at the 1%</p><p>level.</p><p>11.4.2 Exchange rate series</p><p>The code below is an analysis of the UK pound and Euro exchange rate</p><p>series. The Phillips-Ouliaris test shows there is evidence that the series are</p><p>cointegrated, which justifies the use of a regression model. An ARIMA model</p><p>is then fitted to the residuals of the regression model. The ar function is used</p><p>to determine the best order of an AR process. We can investigate the adequacy</p><p>of our cointegrated model by using R to fit a more general ARIMA process to</p><p>the residuals. 
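As an aside, the simulation described in §11.4.1 can be sketched as follows, assuming the tseries package is loaded; the series length of 1000 matches the loop index used above, and cumsum is our shorthand for building the shared random-walk trend.

# Two series sharing the random-walk trend mu: each has a unit root,
# but a linear combination (here x - y) is stationary.
library(tseries)
set.seed(1)
mu <- cumsum(rnorm(1000))      # common stochastic trend
x <- mu + rnorm(1000)
y <- mu + rnorm(1000)
adf.test(x)$p.value            # unit root typically not rejected
adf.test(y)$p.value
po.test(cbind(x, y))           # null of no cointegration typically rejected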
The best-fitting ARIMA model has d = 0, which is consistent</p><p>with the residuals being a realisation of a stationary process and hence the</p><p>series being cointegrated.</p><p>> po.test(cbind(xrates$UK, xrates$EU))</p><p>Phillips-Ouliaris Cointegration Test</p><p>data: cbind(xrates$UK, xrates$EU)</p><p>Phillips-Ouliaris demeaned = -21.7, Truncation lag parameter = 10,</p><p>p-value = 0.04118</p><p>> ukeu.lm ukeu.res ukeu.res.ar ukeu.res.ar$order</p><p>[1] 3</p><p>> AIC(arima(ukeu.res, order = c(3, 0, 0)))</p><p>[1] -9886</p><p>> AIC(arima(ukeu.res, order = c(2, 0, 0)))</p><p>[1] -9886</p><p>> AIC(arima(ukeu.res, order = c(1, 0, 0)))</p><p>[1] -9880</p><p>> AIC(arima(ukeu.res, order = c(1, 1, 0)))</p><p>[1] -9876</p><p>11.5 Bivariate and multivariate white noise 219</p><p>Comparing the AICs for the AR(2) and AR(3) models, it is clear there is</p><p>little difference and that the AR(2) model would be satisfactory. The example</p><p>above also shows that the AR models provide a better fit to the residual</p><p>series than the ARIMA(1, 1, 0) model, so the residual series may be treated</p><p>as stationary. This supports the result of the Phillips-Ouliaris test since a</p><p>linear combination of the two exchange rates, obtained from the regression</p><p>model, has produced a residual series that appears to be a realisation of a</p><p>stationary process.</p><p>11.5 Bivariate and multivariate white noise</p><p>Two series {wx,t} and {wy,t} are bivariate white noise if they are stationary</p><p>and their cross-covariance γxy(k) = Cov(wx,t, wy,t+k) satisfies</p><p>γxx(k) = γyy(k) = γxy(k) = 0 for all k 6= 0 (11.1)</p><p>In the equation above, γxx(0) = γyy(0) = 1 and γxy(0) may be zero or non-</p><p>zero. Hence, bivariate white noise series {wx,t} and {wy,t} may be regarded as</p><p>white noise when considered individually but when considered as a pair may</p><p>be cross-correlated at lag 0.</p><p>The definition of bivariate white noise readily extends to multivariate white</p><p>noise. Let γij(k) = Cov(wi,t, wj,t+k) be the cross-correlation between the se-</p><p>ries {wi,t} and {wj,t} (i, j = 1, . . . n). Then stationary series {w1,t}, {w2,t}, ...,</p><p>{wn,t} are multivariate white noise if each individual series is white noise and,</p><p>for each pair of series (i 6= j), γij(k) = 0 for all k 6= 0. In other words, multi-</p><p>variate white noise is a sequence of independent draws from some multivariate</p><p>distribution.</p><p>Multivariate Gaussian white noise can be simulated with the rmvnorm</p><p>function in the mvtnorm library. The function may take a mean and covari-</p><p>ance matrix as a parameter input, and the dimensions of these determine the</p><p>dimension of the output matrix. In the following example, the covariance ma-</p><p>trix is 2 × 2, so the output variable x is bivariate with 1000 simulated white</p><p>noise values in each of two columns. An arbitrary value of 0.8 is chosen for</p><p>the correlation to illustrate the use of the function.</p><p>> library(mvtnorm)</p><p>> cov.mat w cov(w)</p><p>[,1] [,2]</p><p>[1,] 1.073 0.862</p><p>[2,] 0.862 1.057</p><p>> wx wy ccf(wx, wy, main = "")</p><p>220 11 Multivariate Models</p><p>The ccf function verifies that the cross-correlations are approximately zero</p><p>for all non-zero lags (Fig. 11.5). As an exercise, check that the series in each</p><p>column of x are approximately white noise using the acf function.</p><p>One simple use of bivariate or multivariate white noise is in the method</p><p>of prewhitening. 
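Before looking at prewhitening in more detail, the bivariate white noise simulation described above can be sketched as follows, assuming the mvtnorm package is installed; the sample size of 1000 and the lag 0 correlation of 0.8 follow the text.

library(mvtnorm)
set.seed(1)
cov.mat <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)   # lag 0 correlation of 0.8
w <- rmvnorm(1000, sigma = cov.mat)
cov(w)                   # sample covariance matrix, close to cov.mat
wx <- w[, 1]
wy <- w[, 2]
ccf(wx, wy, main = "")   # cross-correlations near zero except at lag 0
acf(wx); acf(wy)         # each component resembles univariate white noise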
Separate SARIMA models are fitted to multiple time series</p><p>variables so that the residuals of the fitted models appear to be a realisation</p><p>of multivariate white noise. The SARIMA models can then be used to forecast</p><p>the expected values of each time series variable, and multivariate simulations</p><p>can be produced by adding multivariate white noise terms to the forecasts.</p><p>The method works well provided the multiple time series have no common</p><p>stochastic trends and the cross-correlation structure is restricted to the error</p><p>process.</p><p>−20 −10 0 10 20</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 11.5. Cross-correlation of simulated bivariate Gaussian white noise</p><p>11.6 Vector autoregressive models</p><p>Two time series, {xt} and {yt}, follow a vector autoregressive process of order</p><p>1 (denoted VAR(1)) if</p><p>xt = θ11xt−1 + θ12yt−1 + wx,t</p><p>yt = θ21xt−1 + θ22yt−1 + wy,t (11.2)</p><p>where {wx,t} and {wy,t} are bivariate white noise and θij are model param-</p><p>eters. If the white noise sequences are defined with mean 0 and the process</p><p>is stationary, both time series {xt} and {yt} have mean 0 (Exercise 1). The</p><p>simplest way of incorporating a mean is to define {xt} and {yt} as deviations</p><p>from mean values. Equation (11.2) can be rewritten in matrix notation as</p><p>Zt = ΘZt−1 + wt (11.3)</p><p>11.6 Vector autoregressive models 221</p><p>where</p><p>Zt =</p><p>(</p><p>xt</p><p>yt</p><p>)</p><p>Θ =</p><p>(</p><p>θ11 θ12</p><p>θ21 θ22</p><p>)</p><p>wt =</p><p>(</p><p>wx,t</p><p>wy,t</p><p>)</p><p>Equation (11.3) is a vector expression for an AR(1) process; i.e., the process</p><p>is vector autoregressive. Using the backward shift operator, Equation (11.3)</p><p>can also be written</p><p>(I−ΘB)Zt = θ(B)Zt = wt (11.4)</p><p>where θ is a matrix polynomial of order 1 and I is the 2× 2 identity matrix.</p><p>A VAR(1) process can be extended to a VAR(p) process by allowing θ to be</p><p>a matrix polynomial of order p. A VAR(p) model for m time series is also</p><p>defined by Equation (11.4), in which I is the m ×m identity matrix, θ is a</p><p>polynomial of m ×m matrices of parameters, Zt is an m × 1 matrix of time</p><p>series variables, and wt is multivariate white noise. For a VAR model, the</p><p>characteristic equation is given by a determinant of a matrix. Analogous to</p><p>AR models, a VAR(p) model is stationary if the roots of the determinant |θ(x)|</p><p>all exceed unity in absolute value. For the VAR(1) model, the determinant is</p><p>given by ∣∣∣∣ 1− θ11x −θ12x</p><p>−θ21x 1− θ22x</p><p>∣∣∣∣ = (1−</p><p>In the next example, we shall create a ts object</p><p>from data read directly from the Internet.</p><p>One of the most important steps in a preliminary time series analysis is to</p><p>plot the data; i.e., create a time plot. For a time series object, this is achieved</p><p>with the generic plot function:</p><p>> plot(AP, ylab = "Passengers (1000's)")</p><p>You should obtain a plot similar to Figure 1.1 below. Parameters, such as</p><p>xlab or ylab, can be used in plot to improve the default labels.</p><p>Time</p><p>P</p><p>as</p><p>se</p><p>ng</p><p>er</p><p>s</p><p>(1</p><p>00</p><p>0s</p><p>)</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 1.1. 
International air passenger bookings in the United States for the period</p><p>1949–1960.</p><p>There are a number of features in the time plot of the air passenger data</p><p>that are common to many time series (Fig. 1.1). For example, it is apparent</p><p>that the number of passengers travelling on the airline is increasing with time.</p><p>In general, a systematic change in a time series that does not appear to be</p><p>periodic is known as a trend . The simplest model for a trend is a linear increase</p><p>or decrease, and this is often an adequate approximation.</p><p>6 1 Time Series Data</p><p>A repeating pattern within each year is known as seasonal variation, al-</p><p>though the term is applied more generally to repeating patterns within any</p><p>fixed period, such as restaurant bookings on different days of the week. There</p><p>is clear seasonal variation in the air passenger time series. At the time, book-</p><p>ings were highest during the summer months of June, July, and August and</p><p>lowest during the autumn month of November and winter month of February.</p><p>Sometimes we may claim there are cycles in a time series that do not corre-</p><p>spond to some fixed natural period; examples may include business cycles or</p><p>climatic oscillations such as El Niño. None of these is apparent in the airline</p><p>bookings time series.</p><p>An understanding of the likely causes of the features in the plot helps us</p><p>formulate an appropriate time series model. In this case, possible causes of</p><p>the increasing trend include rising prosperity in the aftermath of the Second</p><p>World War, greater availability of aircraft, cheaper flights due to competition</p><p>between airlines, and an increasing population. The seasonal variation coin-</p><p>cides with vacation periods. In Chapter 5, time series regression models will</p><p>be specified to allow for underlying causes like these. However, many time</p><p>series exhibit trends, which might, for example, be part of a longer cycle or be</p><p>random and subject to unpredictable change. Random, or stochastic, trends</p><p>are common in economic and financial time series. A regression model would</p><p>not be appropriate for a stochastic trend.</p><p>Forecasting relies on extrapolation, and forecasts are generally based on</p><p>an assumption that present trends continue. We cannot check this assumption</p><p>in any empirical way, but if we can identify likely causes for a trend, we can</p><p>justify extrapolating it, for a few time steps at least. An additional argument</p><p>is that, in the absence of some shock to the system, a trend is likely to change</p><p>relatively slowly, and therefore linear extrapolation will provide a reasonable</p><p>approximation for a few time steps ahead. Higher-order polynomials may give</p><p>a good fit to the historic time series, but they should not be used for extrap-</p><p>olation. It is better to use linear extrapolation from the more recent values</p><p>in the time series. Forecasts based on extrapolation beyond a year are per-</p><p>haps better described as scenarios. Expecting trends to continue linearly for</p><p>many years will often be unrealistic, and some more plausible trend curves</p><p>are described in Chapters 3 and 5.</p><p>A time series plot not only emphasises patterns and features of the data</p><p>but can also expose outliers and erroneous values. One cause of the latter is</p><p>that missing data are sometimes coded using a negative value. 
Such values</p><p>need to be handled differently in the analysis and must not be included as</p><p>observations when fitting a model to data.5 Outlying values that cannot be</p><p>attributed to some coding should be checked carefully. If they are correct,</p><p>5 Generally speaking, missing values are suitably handled by R, provided they are</p><p>correctly coded as ‘NA’. However, if your data do contain missing values, then it</p><p>is always worth checking the ‘help’ on the R function that you are using, as an</p><p>extra parameter or piece of coding may be required.</p><p>1.4 Plots, trends, and seasonal variation 7</p><p>they are likely to be of particular interest and should not be excluded from</p><p>the analysis. However, it may be appropriate to consider robust methods of</p><p>fitting models, which reduce the influence of outliers.</p><p>To get a clearer view of the trend, the seasonal effect can be removed by</p><p>aggregating the data to the annual level, which can be achieved in R using the</p><p>aggregate function. A summary of the values for each season can be viewed</p><p>using a boxplot, with the cycle function being used to extract the seasons</p><p>for each item of data.</p><p>The plots can be put in a single graphics window using the layout func-</p><p>tion, which takes as input a vector (or matrix) for the location of each plot</p><p>in the display window. The resulting boxplot and annual series are shown in</p><p>Figure 1.2.</p><p>> layout(1:2)</p><p>> plot(aggregate(AP))</p><p>> boxplot(AP ~ cycle(AP))</p><p>You can see an increasing trend in the annual series (Fig. 1.2a) and the sea-</p><p>sonal effects in the boxplot. More people travelled during the summer months</p><p>of June to September (Fig. 1.2b).</p><p>1.4.2 Unemployment: Maine</p><p>Unemployment rates are one of the main economic indicators used by politi-</p><p>cians and other decision makers. For example, they influence policies for re-</p><p>gional development and welfare provision. The monthly unemployment rate</p><p>for the US state of Maine from January 1996 until August 2006 is plotted</p><p>in the upper frame of Figure 1.3. In any time series analysis, it is essential</p><p>to understand how the data have been collected and their unit of measure-</p><p>ment. The US Department of Labor gives precise definitions of terms used to</p><p>calculate the unemployment rate.</p><p>The monthly unemployment data are available in a file online that is read</p><p>into R in the code below. Note that the first row in the file contains the name</p><p>of the variable (unemploy), which can be accessed directly once the attach</p><p>command is given. Also, the header parameter must be set to TRUE so that R</p><p>treats the first row as the variable name rather than data.</p><p>> www Maine.month attach(Maine.month)</p><p>> class(Maine.month)</p><p>[1] "data.frame"</p><p>When we read data in this way from an ASCII text file, the ‘class’ is not</p><p>time series but data.frame. The ts function is used to convert the data to a</p><p>time series object. The following command creates a time series object:</p><p>8 1 Time Series Data</p><p>(a) Aggregated annual series</p><p>1950 1952 1954 1956 1958 1960</p><p>20</p><p>00</p><p>40</p><p>00</p><p>Jan Mar May Jul Sep Nov</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>(b) Boxplot of seasonal values</p><p>Fig. 1.2. International air passenger bookings in the United States for the period</p><p>1949–1960. Units on the y-axis are 1000s of people. 
(a) Series aggregated to the</p><p>annual level; (b) seasonal boxplots of the data.</p><p>> Maine.month.ts Maine.annual.ts layout(1:2)</p><p>> plot(Maine.month.ts, ylab = "unemployed (%)")</p><p>> plot(Maine.annual.ts, ylab = "unemployed (%)")</p><p>We can calculate the precise percentages in R, using window. This</p><p>function</p><p>will extract that part of the time series between specified start and end points</p><p>1.4 Plots, trends, and seasonal variation 9</p><p>and will sample with an interval equal to frequency if its argument is set to</p><p>TRUE. So, the first line below gives a time series of February figures.</p><p>> Maine.Feb Maine.Aug Feb.ratio Aug.ratio Feb.ratio</p><p>[1] 1.223</p><p>> Aug.ratio</p><p>[1] 0.8164</p><p>On average, unemployment is 22% higher in February and 18% lower in</p><p>August. An explanation is that Maine attracts tourists during the summer,</p><p>and this creates more jobs. Also, the period before Christmas and over the</p><p>New Year’s holiday tends to have higher employment rates than the first few</p><p>months of the new year. The annual unemployment rate was as high as 8.5%</p><p>in 1976 but was less than 4% in 1988 and again during the three years 1999–</p><p>2001. If we had sampled the data in August of each year, for example, rather</p><p>than taken yearly averages, we would have consistently underestimated the</p><p>unemployment rate by a factor of about 0.8.</p><p>(a)</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004 2006</p><p>3</p><p>4</p><p>5</p><p>6</p><p>(b)</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004</p><p>3.</p><p>5</p><p>4.</p><p>0</p><p>4.</p><p>5</p><p>5.</p><p>0</p><p>Fig. 1.3. Unemployment in Maine: (a) monthly January 1996–August 2006; (b)</p><p>annual 1996–2005.</p><p>10 1 Time Series Data</p><p>Time</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004 2006</p><p>4.</p><p>0</p><p>4.</p><p>5</p><p>5.</p><p>0</p><p>5.</p><p>5</p><p>6.</p><p>0</p><p>Fig. 1.4. Unemployment in the United States January 1996–October 2006.</p><p>The monthly unemployment rate for all of the United States from January</p><p>1996 until October 2006 is plotted in Figure 1.4. The decrease in the unem-</p><p>ployment rate around the millennium is common to Maine and the United</p><p>States as a whole, but Maine does not seem to be sharing the current US</p><p>decrease in unemployment.</p><p>> www US.month attach(US.month)</p><p>> US.month.ts plot(US.month.ts, ylab = "unemployed (%)")</p><p>1.4.3 Multiple time series: Electricity, beer and chocolate data</p><p>Here we illustrate a few important ideas and concepts related to multiple time</p><p>series data. 
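Returning briefly to the Maine series of §1.4.2, the commands described there can be sketched as follows; the file location is a placeholder for the book's website, and the column name unemploy is taken from the text.

www <- "http://www.massey.ac.nz/~pscowper/ts/Maine.dat"   # placeholder path
Maine.month <- read.table(www, header = TRUE)
attach(Maine.month)
Maine.month.ts <- ts(unemploy, start = c(1996, 1), freq = 12)
Maine.annual.ts <- aggregate(Maine.month.ts) / 12   # annual mean rate
Maine.Feb <- window(Maine.month.ts, start = c(1996, 2), freq = TRUE)
Maine.Aug <- window(Maine.month.ts, start = c(1996, 8), freq = TRUE)
Feb.ratio <- mean(Maine.Feb) / mean(Maine.month.ts)
Aug.ratio <- mean(Maine.Aug) / mean(Maine.month.ts)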
The monthly supply of electricity (millions of kWh), beer (Ml),</p><p>and chocolate-based production (tonnes) in Australia over the period January</p><p>1958 to December 1990 are available from the Australian Bureau of Statistics</p><p>(ABS).6 The three series have been stored in a single file online, which can be</p><p>read as follows:</p><p>www CBE[1:4, ]</p><p>choc beer elec</p><p>1 1451 96.3 1497</p><p>2 2037 84.4 1463</p><p>3 2477 91.2 1648</p><p>4 2785 81.9 1595</p><p>6 ABS data used with permission from the Australian Bureau of Statistics:</p><p>http://www.abs.gov.au.</p><p>1.4 Plots, trends, and seasonal variation 11</p><p>> class(CBE)</p><p>[1] "data.frame"</p><p>Now create time series objects for the electricity, beer, and chocolate data.</p><p>If you omit end, R uses the full length of the vector, and if you omit the month</p><p>in start, R assumes 1. You can use plot with cbind to plot several series on</p><p>one figure (Fig. 1.5).</p><p>> Elec.ts Beer.ts Choc.ts plot(cbind(Elec.ts, Beer.ts, Choc.ts))</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>14</p><p>00</p><p>0</p><p>E</p><p>le</p><p>c.</p><p>ts</p><p>10</p><p>0</p><p>15</p><p>0</p><p>20</p><p>0</p><p>B</p><p>ee</p><p>r.</p><p>ts</p><p>20</p><p>00</p><p>40</p><p>00</p><p>60</p><p>00</p><p>80</p><p>00</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>C</p><p>ho</p><p>c.</p><p>ts</p><p>Time</p><p>Chocolate, Beer, and Electricity Production: 1958−1990</p><p>Fig. 1.5. Australian chocolate, beer, and electricity production; January 1958–</p><p>December 1990.</p><p>The plots in Figure 1.5 show increasing trends in production for all three</p><p>goods, partly due to the rising population in Australia from about 10 million</p><p>to about 18 million over the same period (Fig. 1.6). But notice that electricity</p><p>production has risen by a factor of 7, and chocolate production by a factor of</p><p>4, over this period during which the population has not quite doubled.</p><p>The three series constitute a multiple time series. There are many functions</p><p>in R for handling more than one series, including ts.intersect to obtain the</p><p>intersection of two series that overlap in time. We now illustrate the use of the</p><p>intersect function and point out some potential pitfalls in analysing multiple</p><p>12 1 Time Series Data</p><p>1900 1920 1940 1960 1980 2000</p><p>5</p><p>10</p><p>15</p><p>m</p><p>ill</p><p>io</p><p>ns</p><p>Fig. 1.6. Australia’s population, 1900–2000.</p><p>time series. The intersection between the air passenger data and the electricity</p><p>data is obtained as follows:</p><p>> AP.elec start(AP.elec)</p><p>[1] 1958 1</p><p>> end(AP.elec)</p><p>[1] 1960 12</p><p>> AP.elec[1:3, ]</p><p>AP Elec.ts</p><p>[1,] 340 1497</p><p>[2,] 318 1463</p><p>[3,] 362 1648</p><p>In the code below, the data for each series are extracted and plotted</p><p>(Fig. 1.7).7</p><p>> AP layout(1:2)</p><p>> plot(AP, main = "", ylab = "Air passengers / 1000's")</p><p>> plot(Elec, main = "", ylab = "Electricity production / MkWh")</p><p>> plot(as.vector(AP), as.vector(Elec),</p><p>xlab = "Air passengers / 1000's",</p><p>ylab = "Electricity production / MWh")</p><p>> abline(reg = lm(Elec ~ AP))</p><p>7 R is case sensitive, so lowercase is used here to represent the shorter record of air</p><p>passenger data. 
In the code, we have also used the argument main="" to suppress</p><p>unwanted titles.</p><p>1.4 Plots, trends, and seasonal variation 13</p><p>> cor(AP, Elec)</p><p>[1] 0.884</p><p>In the plot function above, as.vector is needed to convert the ts objects to</p><p>ordinary vectors suitable for a scatter plot.</p><p>Time</p><p>A</p><p>ir</p><p>P</p><p>as</p><p>se</p><p>ng</p><p>er</p><p>s</p><p>(1</p><p>00</p><p>0s</p><p>)</p><p>1958.0 1958.5 1959.0 1959.5 1960.0 1960.5 1961.0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>50</p><p>0</p><p>60</p><p>0</p><p>Time</p><p>E</p><p>le</p><p>ct</p><p>ric</p><p>ity</p><p>p</p><p>ro</p><p>du</p><p>ct</p><p>io</p><p>n</p><p>(G</p><p>M</p><p>kW</p><p>h)</p><p>1958.0 1958.5 1959.0 1959.5 1960.0 1960.5 1961.0</p><p>16</p><p>00</p><p>20</p><p>00</p><p>Fig. 1.7. International air passengers and Australian electricity production for the</p><p>period 1958–1960. The plots look similar because both series have an increasing</p><p>trend and a seasonal cycle. However, this does not imply that there exists a causal</p><p>relationship between the variables.</p><p>The two time series are highly correlated, as can be seen in the plots, with a</p><p>correlation coefficient of 0.88. Correlation will be discussed more in Chapter 2,</p><p>but for the moment observe that the two time plots look similar (Fig. 1.7) and</p><p>that the scatter plot shows an approximate linear association between the two</p><p>variables (Fig. 1.8). However, it is important to realise that correlation does</p><p>not imply causation. In this case, it is not plausible that higher numbers of</p><p>air passengers in the United States cause, or are caused by, higher electricity</p><p>production in Australia. A reasonable explanation for the correlation is that</p><p>the increasing prosperity and technological development in both countries over</p><p>this period accounts for the increasing trends. The two time series also happen</p><p>to have similar seasonal variations. For these reasons, it is usually appropriate</p><p>to remove trends and seasonal effects before comparing multiple series. This</p><p>is often achieved by working with the residuals of a regression model that has</p><p>deterministic terms to represent the trend and seasonal effects (Chapter 5).</p><p>14 1 Time Series Data</p><p>In the simplest cases, the residuals can be modelled as independent random</p><p>variation from a single distribution, but much of the book is concerned with</p><p>fitting more sophisticated models.</p><p>Fig. 1.8. Scatter plot of air passengers and Australian electricity production for</p><p>the period: 1958–1960. The apparent linear relationship between the two variables</p><p>is misleading and a consequence of the trends in the series.</p><p>1.4.4 Quarterly exchange rate: GBP to NZ dollar</p><p>The trends and seasonal patterns in the previous two examples were clear</p><p>from the plots. In addition, reasonable explanations</p><p>could be put forward for</p><p>the possible causes of these features. With financial data, exchange rates for</p><p>example, such marked patterns are less likely to be seen, and different methods</p><p>of analysis are usually required. A financial series may sometimes show a</p><p>dramatic change that has a clear cause, such as a war or natural disaster. 
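For reference, the commands used in this subsection for the production series and for their overlap with the air passenger data can be sketched as follows; the file location is a placeholder for the book's website, and the column order (choc, beer, elec) follows the printout above.

www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"   # placeholder path
CBE <- read.table(www, header = TRUE)
Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12)
Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12)
plot(cbind(Elec.ts, Beer.ts, Choc.ts))
AP.elec <- ts.intersect(AP, Elec.ts)   # overlap of the two series
AP <- AP.elec[, 1]                     # overwrites AP with the shorter record
Elec <- AP.elec[, 2]
plot(as.vector(AP), as.vector(Elec),
     xlab = "Air passengers / 1000's",
     ylab = "Electricity production / MkWh")
abline(reg = lm(Elec ~ AP))
cor(AP, Elec)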
Day-</p><p>to-day changes are more difficult to explain because the underlying causes are</p><p>complex and impossible to isolate, and it will often be unrealistic to assume</p><p>any deterministic component in the time series model.</p><p>The exchange rates for British pounds sterling to New Zealand dollars</p><p>for the period January 1991 to March 2000 are shown in Figure 1.9. The</p><p>data are mean values taken over quarterly periods of three months, with the</p><p>first quarter being January to March and the last quarter being October to</p><p>December. They can be read into R from the book website and converted to</p><p>a quarterly time series as follows:</p><p>> www Z Z[1:4, ]</p><p>[1] 2.92 2.94 3.17 3.25</p><p>> Z.ts plot(Z.ts, xlab = "time / years",</p><p>ylab = "Quarterly exchange rate in $NZ / pound")</p><p>Short-term trends are apparent in the time series: After an initial surge</p><p>ending in 1992, a negative trend leads to a minimum around 1996, which is</p><p>followed by a positive trend in the second half of the series (Fig. 1.9).</p><p>The trend seems to change direction at unpredictable times rather than</p><p>displaying the relatively consistent pattern of the air passenger series and</p><p>Australian production series. Such trends have been termed stochastic trends</p><p>to emphasise this randomness and to distinguish them from more deterministic</p><p>trends like those seen in the previous examples. A mathematical model known</p><p>as a random walk can sometimes provide a good fit to data like these and is</p><p>fitted to this series in §4.4.2. Stochastic trends are common in financial series</p><p>and will be studied in more detail in Chapters 4 and 7.</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1992 1994 1996 1998 2000</p><p>2.</p><p>2</p><p>2.</p><p>6</p><p>3.</p><p>0</p><p>3.</p><p>4</p><p>Fig. 1.9. Quarterly exchange rates for the period 1991–2000.</p><p>Two local trends are emphasised when the series is partitioned into two</p><p>subseries based on the periods 1992–1996 and 1996–1998. The window function</p><p>can be used to extract the subseries:</p><p>> Z.92.96 Z.96.98 layout (1:2)</p><p>> plot(Z.92.96, ylab = "Exchange rate in $NZ/pound",</p><p>xlab = "Time (years)" )</p><p>> plot(Z.96.98, ylab = "Exchange rate in $NZ/pound",</p><p>xlab = "Time (years)" )</p><p>Now suppose we were observing this series at the start of 1992; i.e., we</p><p>had the data in Figure 1.10(a). It might have been tempting to predict a</p><p>16 1 Time Series Data</p><p>(a) Exchange rates for 1992−1996</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1992 1993 1994 1995 1996</p><p>2.</p><p>2</p><p>2.</p><p>6</p><p>3.</p><p>0</p><p>3.</p><p>4</p><p>(b) Exchange rates for 1996−1998</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1996.0 1996.5 1997.0 1997.5 1998.0</p><p>2.</p><p>4</p><p>2.</p><p>8</p><p>Fig. 1.10. Quarterly exchange rates for two periods. The plots indicate that without</p><p>additional information it would be inappropriate to extrapolate the trends.</p><p>continuation of the downward trend for future years. However, this would have</p><p>been a very poor prediction, as Figure 1.10(b) shows that the data started to</p><p>follow an increasing trend. 
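For reference, the quarterly series and the two subseries plotted in Figure 1.10 can be constructed along the following lines; the file name is a placeholder for the book's website, and the window endpoints follow the stated periods 1992 to 1996 and 1996 to 1998.

www <- "http://www.massey.ac.nz/~pscowper/ts/pounds_nz.dat"   # placeholder path
Z <- read.table(www, header = TRUE)
Z.ts <- ts(Z, start = 1991, freq = 4)   # quarterly series from Q1 1991
plot(Z.ts, xlab = "time / years",
     ylab = "Quarterly exchange rate in $NZ / pound")
Z.92.96 <- window(Z.ts, start = c(1992, 1), end = c(1996, 1))
Z.96.98 <- window(Z.ts, start = c(1996, 1), end = c(1998, 1))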
Likewise, without additional information, it would</p><p>also be inadvisable to extrapolate the trend in Figure 1.10(b). This illustrates</p><p>the potential pitfall of inappropriate extrapolation of stochastic trends when</p><p>underlying causes are not properly understood. To reduce the risk of making</p><p>an inappropriate forecast, statistical tests, introduced in Chapter 7, can be</p><p>used to test for a stochastic trend.</p><p>1.4.5 Global temperature series</p><p>A change in the world’s climate will have a major impact on the lives of</p><p>many people, as global warming is likely to lead to an increase in ocean levels</p><p>and natural hazards such as floods and droughts. It is likely that the world</p><p>economy will be severely affected as governments from around the globe try</p><p>1.4 Plots, trends, and seasonal variation 17</p><p>to enforce a reduction in fossil fuel use and measures are taken to deal with</p><p>any increase in natural disasters.8</p><p>In climate change studies (e.g., see Jones and Moberg, 2003; Rayner et al.</p><p>2003), the following global temperature series, expressed as anomalies from</p><p>the monthly means over the period 1961–1990, plays a central role:9</p><p>> www Global Global.ts Global.annual plot(Global.ts)</p><p>> plot(Global.annual)</p><p>It is the trend that is of most concern, so the aggregate function is used</p><p>to remove any seasonal effects within each year and produce an annual series</p><p>of mean temperatures for the period 1856 to 2005 (Fig. 1.11b). We can avoid</p><p>explicitly dividing by 12 if we specify FUN=mean in the aggregate function.</p><p>The upward trend from about 1970 onwards has been used as evidence</p><p>of global warming (Fig. 1.12). In the code below, the monthly time inter-</p><p>vals corresponding to the 36-year period 1970–2005 are extracted using the</p><p>time function and the associated observed temperature series extracted using</p><p>window. The data are plotted and a line superimposed using a regression of</p><p>temperature on the new time index (Fig. 1.12).</p><p>> New.series New.time plot(New.series); abline(reg=lm(New.series ~ New.time))</p><p>In the previous section, we discussed a potential pitfall of inappropriate</p><p>extrapolation. In climate change studies, a vital question is whether rising</p><p>temperatures are a consequence of human activity, specifically the burning</p><p>of fossil fuels and increased greenhouse gas emissions, or are a natural trend,</p><p>perhaps part of a longer cycle, that may decrease in the future without needing</p><p>a global reduction in the use of fossil fuels. We cannot attribute the increase in</p><p>global temperature to the increasing use of fossil fuels without invoking some</p><p>physical explanation10 because, as we noted in §1.4.3, two unrelated time</p><p>series will be correlated if they both contain a trend. 
However, as the general</p><p>consensus among scientists is that the trend in the global temperature series is</p><p>related to a global increase in greenhouse gas emissions, it seems reasonable to</p><p>8 For general policy documents and discussions on climate change, see the website</p><p>(and links) for the United Nations Framework Convention on Climate Change at</p><p>http://unfccc.int.</p><p>9 The data are updated regularly and can be downloaded free of charge from the</p><p>Internet at: http://www.cru.uea.ac.uk/cru/data/.</p><p>10 For example, refer to US Energy Information Administration at</p><p>http://www.eia.doe.gov/emeu/aer/inter.html.</p><p>18 1 Time Series Data</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1900 1950 2000</p><p>−</p><p>1.</p><p>0</p><p>0.</p><p>0</p><p>(a) Monthly series: January 1856 to December 2005</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1900 1950 2000</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>(b) Mean annual series: 1856 to 2005</p><p>Fig. 1.11. Time plots of the global temperature series (oC).</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1970 1975 1980 1985 1990 1995 2000 2005</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Fig. 1.12. Rising mean global temperatures, January 1970–December 2005. Ac-</p><p>cording to the United Nations Framework Convention on Climate Change, the mean</p><p>global temperature is expected to continue to rise in the future unless greenhouse</p><p>gas emissions are reduced on a global scale.</p><p>1.5 Decomposition of series 19</p><p>acknowledge</p><p>a causal relationship and to expect the mean global temperature</p><p>to continue to rise if greenhouse gas emissions are not reduced.11</p><p>1.5 Decomposition of series</p><p>1.5.1 Notation</p><p>So far, our analysis has been restricted to plotting the data and looking for</p><p>features such as trend and seasonal variation. This is an important first step,</p><p>but to progress we need to fit time series models, for which we require some</p><p>notation. We represent a time series of length n by {xt : t = 1, . . . , n} =</p><p>{x1, x2, . . . , xn}. It consists of n values sampled at discrete times 1, 2, . . . , n.</p><p>The notation will be abbreviated to {xt} when the length n of the series</p><p>does not need to be specified. The time series model is a sequence of random</p><p>variables, and the observed time series is considered a realisation from the</p><p>model. We use the same notation for both and rely on the context to make</p><p>the distinction.12 An overline is used for sample means:</p><p>x̄ =</p><p>∑</p><p>xi/n (1.1)</p><p>The ‘hat’ notation will be used to represent a prediction or forecast. For</p><p>example, with the series {xt : t = 1, . . . , n}, x̂t+k|t is a forecast made at time</p><p>t for a future value at time t + k. A forecast is a predicted future value, and</p><p>the number of time steps into the future is the lead time (k). Following our</p><p>convention for time series notation, x̂t+k|t can be the random variable or the</p><p>numerical value, depending on the context.</p><p>1.5.2 Models</p><p>As the first two examples showed, many series are dominated by a trend</p><p>and/or seasonal effects, so the models in this section are based on these com-</p><p>ponents. 
A simple additive decomposition model is given by</p><p>xt = mt + st + zt (1.2)</p><p>where, at time t, xt is the observed series, mt is the trend, st is the seasonal</p><p>effect, and zt is an error term that is, in general, a sequence of correlated</p><p>random variables with mean zero. In this section, we briefly outline two main</p><p>approaches for extracting the trend mt and the seasonal effect st in Equation</p><p>(1.2) and give the main R functions for doing this.</p><p>11 Refer to http://unfccc.int.</p><p>12 Some books do distinguish explicitly by using lowercase for the time series and</p><p>uppercase for the model.</p><p>20 1 Time Series Data</p><p>If the seasonal effect tends to increase as the trend increases, a multiplica-</p><p>tive model may be more appropriate:</p><p>xt = mt · st + zt (1.3)</p><p>If the random variation is modelled by a multiplicative factor and the variable</p><p>is positive, an additive decomposition model for log(xt) can be used:13</p><p>log(xt) = mt + st + zt (1.4)</p><p>Some care is required when the exponential function is applied to the predicted</p><p>mean of log(xt) to obtain a prediction for the mean value xt, as the effect is</p><p>usually to bias the predictions. If the random series zt are normally distributed</p><p>with mean 0 and variance σ2, then the predicted mean value at time t based</p><p>on Equation (1.4) is given by</p><p>x̂t = emt+ste</p><p>1</p><p>2 σ2</p><p>(1.5)</p><p>However, if the error series is not normally distributed and is negatively</p><p>skewed,14 as is often the case after taking logarithms, the bias correction</p><p>factor will be an overcorrection (Exercise 4) and it is preferable to apply an</p><p>empirical adjustment (which is discussed further in Chapter 5). The issue is</p><p>of practical importance. For example, if we make regular financial forecasts</p><p>without applying an adjustment, we are likely to consistently underestimate</p><p>mean costs.</p><p>1.5.3 Estimating trends and seasonal effects</p><p>There are various ways to estimate the trend mt at time t, but a relatively</p><p>simple procedure, which is available in R and does not assume any specific</p><p>form is to calculate a moving average centred on xt. A moving average is</p><p>an average of a specified number of time series values around each value in</p><p>the time series, with the exception of the first few and last few terms. In this</p><p>context, the length of the moving average is chosen to average out the seasonal</p><p>effects, which can be estimated later. For monthly series, we need to average</p><p>twelve consecutive months, but there is a slight snag. Suppose our time series</p><p>begins at January (t = 1) and we average January up to December (t = 12).</p><p>This average corresponds to a time t = 6.5, between June and July. When we</p><p>come to estimate seasonal effects, we need a moving average at integer times.</p><p>This can be achieved by averaging the average of January up to December</p><p>and the average of February (t = 2) up to January (t = 13). This average of</p><p>13 To be consistent with R, we use log for the natural logarithm, which is often</p><p>written ln.</p><p>14 A probability distribution is negatively skewed if its density has a long tail to the</p><p>left.</p><p>1.5 Decomposition of series 21</p><p>two moving averages corresponds to t = 7, and the process is called centring.</p><p>Thus the trend at time t can be estimated by the centred moving average</p><p>m̂t =</p><p>1</p><p>2xt−6 + xt−5 + . . 
.+ xt−1 + xt + xt+1 + . . .+ xt+5 + 1</p><p>2xt+6</p><p>12</p><p>(1.6)</p><p>where t = 7, . . . , n − 6. The coefficients in Equation (1.6) for each month</p><p>are 1/12 (or sum to 1/12 in the case of the first and last coefficients), so that</p><p>equal weight is given to each month and the coefficients sum to 1. By using the</p><p>seasonal frequency for the coefficients in the moving average, the procedure</p><p>generalises for any seasonal frequency (e.g., quarterly series), provided the</p><p>condition that the coefficients sum to unity is still met.</p><p>An estimate of the monthly additive effect (st) at time t can be obtained</p><p>by subtracting m̂t:</p><p>ŝt = xt − m̂t (1.7)</p><p>By averaging these estimates of the monthly effects for each month, we obtain</p><p>a single estimate of the effect for each month. If the period of the time series</p><p>is a whole number of years, the number of monthly effects averaged for each</p><p>month is one less than the number of years of record. At this stage, the twelve</p><p>monthly additive components should have an average value close to, but not</p><p>usually exactly equal to, zero. It is usual to adjust them by subtracting this</p><p>mean so that they do average zero. If the monthly effect is multiplicative, the</p><p>estimate is given by division; i.e., ŝt = xt/m̂t. It is usual to adjust monthly</p><p>multiplicative factors so that they average unity. The procedure generalises,</p><p>using the same principle, to any seasonal frequency.</p><p>It is common to present economic indicators, such as unemployment per-</p><p>centages, as seasonally adjusted series. This highlights any trend that might</p><p>otherwise be masked by seasonal variation attributable, for instance, to the</p><p>end of the academic year, when school and university leavers are seeking work.</p><p>If the seasonal effect is additive, a seasonally adjusted series is given by xt− s̄t,</p><p>whilst if it is multiplicative, an adjusted series is obtained from xt/s̄t, where</p><p>s̄t is the seasonally adjusted mean for the month corresponding to time t.</p><p>1.5.4 Smoothing</p><p>The centred moving average is an example of a smoothing procedure that is</p><p>applied retrospectively to a time series with the objective of identifying an un-</p><p>derlying signal or trend. Smoothing procedures can, and usually do, use points</p><p>before and after the time at which the smoothed estimate is to be calculated.</p><p>A consequence is that the smoothed series will have some points missing at</p><p>the beginning and the end unless the smoothing algorithm is adapted for the</p><p>end points.</p><p>A second smoothing algorithm offered by R is stl. This uses a locally</p><p>weighted regression technique known as loess. The regression, which can be</p><p>a line or higher polynomial, is referred to as local because it uses only some</p><p>22 1 Time Series Data</p><p>relatively small number of points on either side of the point at which the</p><p>smoothed estimate is required. The weighting reduces the influence of outlying</p><p>points and is an example of robust regression. Although the principles behind</p><p>stl are straightforward, the details are quite complicated.</p><p>Smoothing procedures such as the centred moving average and loess do</p><p>not require a predetermined model, but they do not produce a formula that</p><p>can be extrapolated to give forecasts. Fitting a line to model a linear trend</p><p>has an advantage in this respect.</p><p>The term filtering is</p>