<p>Use R!</p><p>Advisors:</p><p>Robert Gentleman</p><p>Kurt Hornik</p><p>Giovanni Parmigiani</p><p>For other titles published in this series, go to</p><p>http://www.springer.com/series/6991</p><p>Paul S.P. Cowpertwait · Andrew V. Metcalfe</p><p>Introductory Time Series</p><p>with R</p><p>123</p><p>Paul S.P. Cowpertwait</p><p>Inst. Information and</p><p>Mathematical Sciences</p><p>Massey University</p><p>Auckland</p><p>Albany Campus</p><p>New Zealand</p><p>p.s.cowpertwait@massey.ac.nz</p><p>Andrew V. Metcalfe</p><p>School of Mathematical</p><p>Sciences</p><p>University of Adelaide</p><p>Adelaide SA 5005</p><p>Australia</p><p>andrew.metcalfe@adelaide.edu.au</p><p>Series Editors</p><p>Robert Gentleman</p><p>Program in Computational Biology</p><p>Division of Public Health Sciences</p><p>Fred Hutchinson Cancer Research Center</p><p>1100 Fairview Avenue, N. M2-B876</p><p>Seattle, Washington 98109</p><p>USA</p><p>Giovanni Parmigiani</p><p>The Sidney Kimmel Comprehensive Cancer</p><p>Center at Johns Hopkins University</p><p>550 North Broadway</p><p>Baltimore, MD 21205-2011</p><p>USA</p><p>Kurt Hornik</p><p>Department of Statistik and Mathematik</p><p>Wirtschaftsuniversität Wien Augasse 2-6</p><p>A-1090 Wien</p><p>Austria</p><p>ISBN 978-0-387-88697-8 e-ISBN 978-0-387-88698-5</p><p>DOI 10.1007/978-0-387-88698-5</p><p>Springer Dordrecht Heidelberg London New York</p><p>Library of Congress Control Number: 2009928496</p><p>c© Springer Science+Business Media, LLC 2009</p><p>All rights reserved. This work may not be translated or copied in whole or in part without the written</p><p>permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,</p><p>NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in</p><p>connection with any form of information storage and retrieval, electronic adaptation, computer</p><p>software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.</p><p>The use in this publication of trade names, trademarks, service marks, and similar terms, even if</p><p>they are not identified as such, is not to be taken as an expression of opinion as to whether or not</p><p>they are subject to proprietary rights.</p><p>Printed on acid-free paper</p><p>Springer is part of Springer Science+Business Media (www.springer.com)</p><p>In memory of Ian Cowpertwait</p><p>Preface</p><p>R has a command line interface that offers considerable advantages over menu</p><p>systems in terms of efficiency and speed once the commands are known and the</p><p>language understood. However, the command line system can be daunting for</p><p>the first-time user, so there is a need for concise texts to enable the student or</p><p>analyst to make progress with R in their area of study. This book aims to fulfil</p><p>that need in the area of time series to enable the non-specialist to progress,</p><p>at a fairly quick pace, to a level where they can confidently apply a range of</p><p>time series methods to a variety of data sets. The book assumes the reader</p><p>has a knowledge typical of a first-year university statistics course and is based</p><p>around lecture notes from a range of time series courses that we have taught</p><p>over the last twenty years. 
Some of this material has been delivered to post-</p><p>graduate finance students during a concentrated six-week course and was well</p><p>received, so a selection of the material could be mastered in a concentrated</p><p>course, although in general it would be more suited to being spread over a</p><p>complete semester.</p><p>The book is based around practical applications and generally follows a</p><p>similar format for each time series model being studied. First, there is an</p><p>introductory motivational section that describes practical reasons why the</p><p>model may be needed. Second, the model is described and defined in math-</p><p>ematical notation. The model is then used to simulate synthetic data using</p><p>R code that closely reflects the model definition and then fitted to the syn-</p><p>thetic data to recover the underlying model parameters. Finally, the model</p><p>is fitted to an example historical data set and appropriate diagnostic plots</p><p>given. By using R, the whole procedure can be reproduced by the reader,</p><p>and it is recommended that students work through most of the examples.1</p><p>Mathematical derivations are provided in separate frames and starred sec-</p><p>1 We used the R package Sweave to ensure that, in general, your code will produce</p><p>the same output as ours. However, for stylistic reasons we sometimes edited our</p><p>code; e.g., for the plots there will sometimes be minor differences between those</p><p>generated by the code in the text and those shown in the actual figures.</p><p>vii</p><p>viii Preface</p><p>tions and can be omitted by those wanting to progress quickly to practical</p><p>applications. At the end of each chapter, a concise summary of the R com-</p><p>mands that were used is given followed by exercises. All data sets used in</p><p>the book, and solutions to the odd numbered exercises, are available on the</p><p>website http://www.massey.ac.nz/∼pscowper/ts.</p><p>We thank John Kimmel of Springer and the anonymous referees for their</p><p>helpful guidance and suggestions, Brian Webby for careful reading of the text</p><p>and valuable comments, and John Xie for useful comments on an earlier draft.</p><p>The Institute of Information and Mathematical Sciences at Massey Univer-</p><p>sity and the School of Mathematical Sciences, University of Adelaide, are</p><p>acknowledged for support and funding that made our collaboration possible.</p><p>Paul thanks his wife, Sarah, for her continual encouragement and support</p><p>during the writing of this book, and our son, Daniel, and daughters, Lydia</p><p>and Louise, for the joy they bring to our lives. Andrew thanks Natalie for</p><p>providing inspiration and her enthusiasm for the project.</p><p>Paul Cowpertwait and Andrew Metcalfe</p><p>Massey University, Auckland, New Zealand</p><p>University of Adelaide, Australia</p><p>December 2008</p><p>Contents</p><p>Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii</p><p>1 Time Series Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1</p><p>1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1</p><p>1.2 Time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2</p><p>1.3 R language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3</p><p>1.4 Plots, trends, and seasonal variation . . . . . . . 
. . . . . . . . . . . . . . . . 4</p><p>1.4.1 A flying start: Air passenger bookings . . . . . . . . . . . . . . . . 4</p><p>1.4.2 Unemployment: Maine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7</p><p>1.4.3 Multiple time series: Electricity, beer and chocolate data 10</p><p>1.4.4 Quarterly exchange rate: GBP to NZ dollar . . . . . . . . . . . 14</p><p>1.4.5 Global temperature series . . . . . . . . . . . . . . . . . . . . . . . . . . 16</p><p>1.5 Decomposition of series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19</p><p>1.5.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19</p><p>1.5.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19</p><p>1.5.3 Estimating trends and seasonal effects . . . . . . . . . . . . . . . 20</p><p>1.5.4 Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21</p><p>1.5.5 Decomposition in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22</p><p>1.6 Summary of commands used in examples . . . . . . . . . . . . . . . . . . . 24</p><p>1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24</p><p>2 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27</p><p>2.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27</p><p>2.2 Expectation and the ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27</p><p>2.2.1 Expected value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27</p><p>2.2.2 The ensemble and stationarity . . . . . . . . . . . . . . . . . . . . . . 30</p><p>2.2.3 Ergodic series* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31</p><p>2.2.4 Variance function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .</p><p>also used for smoothing, particularly in the engi-</p><p>neering literature. A more specific use of the term filtering is the process of</p><p>obtaining the best estimate of some variable now, given the latest measure-</p><p>ment of it and past measurements. The measurements are subject to random</p><p>error and are described as being corrupted by noise. Filtering is an important</p><p>part of control algorithms which have a myriad of applications. An exotic ex-</p><p>ample is the Huygens probe leaving the Cassini orbiter to land on Saturn’s</p><p>largest moon, Titan, on January 14, 2005.</p><p>1.5.5 Decomposition in R</p><p>In R, the function decompose estimates trends and seasonal effects using</p><p>a moving average method. Nesting the function within plot (e.g., using</p><p>plot(stl())) produces a single figure showing the original series xt and the</p><p>decomposed series mt, st, and zt. For example, with the electricity data, addi-</p><p>tive and multiplicative decomposition plots are given by the commands below;</p><p>the last plot, which uses lty to give different line types, is the superposition</p><p>of the seasonal effect on the trend (Fig. 1.13).</p><p>Time</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>14</p><p>00</p><p>0</p><p>Fig. 1.13. 
Electricity production data: trend with superimposed multiplicative sea-</p><p>sonal effects.</p><p>1.5 Decomposition of series 23</p><p>> plot(decompose(Elec.ts))</p><p>> Elec.decom plot(Elec.decom)</p><p>> Trend Seasonal ts.plot(cbind(Trend, Trend * Seasonal), lty = 1:2)</p><p>20</p><p>00</p><p>80</p><p>00</p><p>14</p><p>00</p><p>0</p><p>ob</p><p>se</p><p>rv</p><p>ed</p><p>20</p><p>00</p><p>80</p><p>00</p><p>tr</p><p>en</p><p>d</p><p>0.</p><p>90</p><p>1.</p><p>00</p><p>1.</p><p>10</p><p>se</p><p>as</p><p>on</p><p>al</p><p>0.</p><p>94</p><p>1.</p><p>00</p><p>1.</p><p>06</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>ra</p><p>nd</p><p>om</p><p>Time</p><p>Decomposition of multiplicative time series</p><p>Fig. 1.14. Decomposition of the electricity production data.</p><p>In this example, the multiplicative model would seem more appropriate</p><p>than the additive model because the variance of the original series and trend</p><p>increase with time (Fig. 1.14). However, the random component, which cor-</p><p>responds to zt, also has an increasing variance, which indicates that a log-</p><p>transformation (Equation (1.4)) may be more appropriate for this series (Fig.</p><p>1.14). The random series obtained from the decompose function is not pre-</p><p>cisely a realisation of the random process zt but rather an estimate of that</p><p>realisation. It is an estimate because it is obtained from the original time</p><p>series using estimates of the trend and seasonal effects. This estimate of the</p><p>realisation of the random process is a residual error series. However, we treat</p><p>it as a realisation of the random process.</p><p>There are many other reasonable methods for decomposing time series,</p><p>and we cover some of these in Chapter 5 when we study regression methods.</p><p>24 1 Time Series Data</p><p>1.6 Summary of commands used in examples</p><p>read.table reads data into a data frame</p><p>attach makes names of column variables available</p><p>ts produces a time series object</p><p>aggregate creates an aggregated series</p><p>ts.plot produces a time plot for one or more series</p><p>window extracts a subset of a time series</p><p>time extracts the time from a time series object</p><p>ts.intersect creates the intersection of one or more time series</p><p>cycle returns the season for each value in a series</p><p>decompose decomposes a series into the components</p><p>trend, seasonal effect, and residual</p><p>stl decomposes a series using loess smoothing</p><p>summary summarises an R object</p><p>1.7 Exercises</p><p>1. Carry out the following exploratory time series analysis in R using either</p><p>the chocolate or the beer production data from §1.4.3.</p><p>a) Produce a time plot of the data. Plot the aggregated annual series and</p><p>a boxplot that summarises the observed values for each season, and</p><p>comment on the plots.</p><p>b) Decompose the series into the components trend, seasonal effect, and</p><p>residuals, and plot the decomposed series. Produce a plot of the trend</p><p>with a superimposed seasonal effect.</p><p>2. Many economic time series are based on indices. A price index is the</p><p>ratio of the cost of a basket of goods now to its cost in some base year.</p><p>In the Laspeyre formulation, the basket is based on typical purchases in</p><p>the base year. You are asked to calculate an index of motoring cost from</p><p>the following data. 
The clutch represents all mechanical parts, and the</p><p>quantity allows for this.</p><p>item quantity ’00 unit price ’00 quantity ’04 unit price ’04</p><p>(i) (qi0) (pi0) (qit) (pit)</p><p>car 0.33 18 000 0.5 20 000</p><p>petrol (litre) 2 000 0.80 1 500 1.60</p><p>servicing (h) 40 40 20 60</p><p>tyre 3 80 2 120</p><p>clutch 2 200 1 360</p><p>The Laspeyre Price Index at time t relative to base year 0 is</p><p>LIt =</p><p>∑</p><p>qi0pit∑</p><p>qi0pi0</p><p>1.7 Exercises 25</p><p>Calculate the LIt for 2004 relative to 2000.</p><p>3. The Paasche Price Index at time t relative to base year 0 is</p><p>PIt =</p><p>∑</p><p>qitpit∑</p><p>qitpi0</p><p>a) Use the data above to calculate the PIt for 2004 relative to 2000.</p><p>b) Explain why the PIt is usually lower than the LIt.</p><p>c) Calculate the Irving-Fisher Price Index as the geometric mean of LIt</p><p>and PIt. (The geometric mean of a sample of n items is the nth root</p><p>of their product.)</p><p>4. A standard procedure for finding an approximate mean and variance of a</p><p>function of a variable is to use a Taylor expansion for the function about</p><p>the mean of the variable. Suppose the variable is y and that its mean and</p><p>standard deviation are µ and σ respectively.</p><p>φ(y) = φ(µ) + φ′(µ)(y − µ) + φ′′(µ)</p><p>(y − µ)2</p><p>2!</p><p>+ φ′′′(µ)</p><p>(y − µ)3</p><p>3!</p><p>+ . . .</p><p>Consider the case of φ(.) as e(.). By taking the expectation of both sides</p><p>of this equation, explain why the bias correction factor given in Equation</p><p>(1.5) is an overcorrection if the residual series has a negative skewness,</p><p>where the skewness γ of a random variable y is defined by</p><p>γ =</p><p>E</p><p>[</p><p>(y − µ)3</p><p>]</p><p>σ3</p><p>2</p><p>Correlation</p><p>2.1 Purpose</p><p>Once we have identified any trend and seasonal effects, we can deseasonalise</p><p>the time series and remove the trend. If we use the additive decomposition</p><p>method of §1.5, we first calculate the seasonally adjusted time series and</p><p>then remove the trend by subtraction. This leaves the random component,</p><p>but the random component is not necessarily well modelled by independent</p><p>random variables. In many cases, consecutive variables will be correlated. If</p><p>we identify such correlations, we can improve our forecasts, quite dramatically</p><p>if the correlations are high. We also need to estimate correlations if we are</p><p>to generate realistic time series for simulations. The correlation structure of a</p><p>time series model is defined by the correlation function, and we estimate this</p><p>from the observed time series.</p><p>Plots of serial correlation (the ‘correlogram’, defined later) are also used</p><p>extensively in signal processing applications. The paradigm is an underlying</p><p>deterministic signal corrupted by noise. Signals from yachts, ships, aeroplanes,</p><p>and space exploration vehicles are examples. At the beginning of 2007, NASA’s</p><p>twin Voyager spacecraft were sending back radio signals from the frontier of</p><p>our solar system, including evidence of hollows in the turbulent zone near the</p><p>edge.</p><p>2.2 Expectation and the ensemble</p><p>2.2.1 Expected value</p><p>The expected value, commonly abbreviated to expectation, E, of a variable,</p><p>or a function of a variable, is its mean value in a population. 
So E(x) is the</p><p>mean of x, denoted µ,1 and E</p><p>[</p><p>(x− µ)2</p><p>]</p><p>is the mean of the squared deviations</p><p>1 A more formal definition of the expectation E of a function φ(x, y) of continuous</p><p>random variables x and y, with a joint probability density function f(x, y), is the</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 27</p><p>Use R, DOI 10.1007/978-0-387-88698-5 2,</p><p>© Springer Science+Business Media, LLC 2009</p><p>28 2 Correlation</p><p>about µ, better known as the variance σ2 of x.2 The standard deviation, σ is</p><p>the square root of the variance. If there are two variables (x, y), the variance</p><p>may be generalised to the covariance, γ(x, y). Covariance is defined by</p><p>γ(x, y) = E [(x− µx)(y − µy)] (2.1)</p><p>The covariance is a measure of linear association between two variables</p><p>(x, y). In §1.4.3, we emphasised that a linear association between variables</p><p>does not imply causality.</p><p>Sample estimates are obtained by adding the appropriate function of the</p><p>individual data values and division by n or, in the case of variance and co-</p><p>variance, n− 1, to give unbiased estimators.3 For example, if we have n data</p><p>pairs, (xi, yi), the sample covariance is given by</p><p>Cov(x, y) =</p><p>∑</p><p>(xi − x)(yi − y)/(n− 1) (2.2)</p><p>If the data pairs are plotted, the lines x = x and y = y divide the plot into</p><p>quadrants. Points in the lower left quadrant have both (xi − x) and (yi − y)</p><p>negative, so the product that contributes to the covariance is positive. Points in</p><p>the upper right quadrant also make a positive contribution. In contrast, points</p><p>in the upper left and lower right quadrants make a negative contribution to the</p><p>covariance. Thus, if y tends to increase when x increases, most of the points</p><p>will be in the lower left and upper right quadrants and the covariance will</p><p>be positive. Conversely, if y tends to decrease as x increases, the covariance</p><p>will be negative. If there is no such linear association, the covariance will be</p><p>small relative to the standard deviations of {xi} and {yi} – always check the</p><p>plot in case there is a quadratic association or some other pattern. In R we</p><p>can calculate a sample covariance, with denominator n−1, from its definition</p><p>or by using the function cov. If we use the mean function, we are implicitly</p><p>dividing by n.</p><p>Benzoapyrene is a carcinogenic hydrocarbon that is a product of incom-</p><p>plete combustion. One source of benzoapyrene and carbon monoxide is au-</p><p>tomobile exhaust. Colucci and Begeman (1971) analysed sixteen air samples</p><p>mean value for φ obtained by integrating over all possible values of x and y:</p><p>E [φ(x, y)] =</p><p>∫</p><p>y</p><p>∫</p><p>x</p><p>φ(x, y)f(x, y) dx dy</p><p>Note that the mean of x is obtained as the special case φ(x, y) = x.</p><p>2 For more than one variable, subscripts can be used to distinguish between the</p><p>properties; e.g., for the means we may write µx and µy to distinguish between</p><p>the mean of x and the mean of y.</p><p>3 An estimator is unbiased for a population parameter if its average value, in in-</p><p>finitely repeated samples of size n, equals that population parameter. 
If an esti-</p><p>mator is unbiased, its value in a particular sample is referred to as an unbiased</p><p>estimate.</p><p>2.2 Expectation and the ensemble 29</p><p>from Herald Square in Manhattan and recorded the carbon monoxide con-</p><p>centration (x, in parts per million) and benzoapyrene concentration (y, in</p><p>micrograms per thousand cubic metres) for each sample. The data are plotted</p><p>in Figure 2.1.</p><p>Fig. 2.1. Sixteen air samples from Herald Square.</p><p>> www Herald.dat attach (Herald.dat)</p><p>We now use R to calculate the covariance for the Herald Square pairs in</p><p>three different ways:</p><p>> x sum((x - mean(x))*(y - mean(y))) / (n - 1)</p><p>[1] 5.51</p><p>> mean((x - mean(x)) * (y - mean(y)))</p><p>[1] 5.17</p><p>> cov(x, y)</p><p>[1] 5.51</p><p>The correspondence between the R code above and the expectation defini-</p><p>tion of covariance should be noted:</p><p>mean((x - mean(x))*(y - mean(y)))→ E [(x− µx)(y − µy)] (2.3)</p><p>●</p><p>●</p><p>●●●●</p><p>● ●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>5 10 15 20</p><p>0</p><p>2</p><p>4</p><p>6</p><p>8</p><p>CO</p><p>be</p><p>nz</p><p>oa</p><p>py</p><p>re</p><p>ne</p><p>30 2 Correlation</p><p>Given this correspondence, the more natural estimate of covariance would</p><p>be mean((x - mean(x))*(y - mean(y))). However, as can be seen above,</p><p>the values computed using the internal function cov are those obtained using</p><p>sum with a denominator of n − 1. As n gets large, the difference in denomi-</p><p>nators becomes less noticeable and the more natural estimate asymptotically</p><p>approaches the unbiased estimate.4</p><p>Correlation is a dimensionless measure of the linear association between</p><p>a pair of variables (x, y) and is obtained by standardising the covariance by</p><p>dividing it by the product of the standard deviations of the variables. Corre-</p><p>lation takes a value between −1 and +1, with a value of 0 indicating no linear</p><p>association. The population correlation, ρ, between a pair of variables (x, y)</p><p>is defined by</p><p>ρ(x, y) =</p><p>E [(x− µx)(y − µy)]</p><p>σxσy</p><p>=</p><p>γ(x, y)</p><p>σxσy</p><p>(2.4)</p><p>The sample correlation, Cor, is an estimate of ρ and is calculated as</p><p>Cor(x, y) =</p><p>Cov(x, y)</p><p>sd(x)sd(y)</p><p>(2.5)</p><p>In R, the sample correlation for pairs (xi, yi) stored in vectors x and y is</p><p>cor(x,y). A value of +1 or −1 indicates an exact linear association, with the</p><p>(x, y) pairs falling on a straight line of positive or negative slope, respectively.</p><p>The correlation between the CO and benzoapyrene measurements at Herald</p><p>Square is now calculated both from the definition and using cor.</p><p>> cov(x,y) / (sd(x)*sd(y))</p><p>[1] 0.3551</p><p>> cor(x,y)</p><p>[1] 0.3551</p><p>Although the correlation is small, there is nevertheless a physical expla-</p><p>nation for the correlation because both products are a result of incomplete</p><p>combustion. A correlation of 0.36 typically corresponds to a slight visual im-</p><p>pression that y tends to increase as x increases, although the points will be</p><p>well scattered.</p><p>2.2.2 The ensemble and stationarity</p><p>The mean function of a time series model is</p><p>µ(t) = E (xt) (2.6)</p><p>and, in general, is a function of t. 
The expectation in this definition is an</p><p>average taken across the ensemble of all the possible time series that might</p><p>4 In statistics, asymptotically means as the sample size approaches infinity.</p><p>2.2 Expectation and the ensemble 31</p><p>have been produced by the time series model (Fig. 2.2). The ensemble consti-</p><p>tutes the entire population. If we have a time series model, we can simulate</p><p>more than one time series (see Chapter 4). However, with historical data, we</p><p>usually only have a single time series so all we can do, without assuming a</p><p>mathematical structure for the trend, is to estimate the mean at each sample</p><p>point by the corresponding observed value. In practice, we make estimates of</p><p>any apparent trend and seasonal effects in our data and remove them, using</p><p>decompose for example, to obtain time series of the random component. Then</p><p>time series models with a constant mean will be appropriate.</p><p>If the mean function is constant, we say that the time series model is</p><p>stationary in the mean. The sample estimate of the population mean, µ, is</p><p>the sample mean, x̄:</p><p>x̄ =</p><p>n∑</p><p>t=1</p><p>xt/n (2.7)</p><p>Equation (2.7) does rely on an assumption that a sufficiently long time series</p><p>characterises the hypothetical model. Such models are known as ergodic, and</p><p>the models in this book are all ergodic.</p><p>2.2.3 Ergodic series*</p><p>A time series model that is stationary in the mean is ergodic in the mean if</p><p>the time average for a single time series tends to the ensemble mean as the</p><p>length of the time series increases:</p><p>lim</p><p>n→∞</p><p>∑</p><p>xt</p><p>n</p><p>= µ (2.8)</p><p>This implies that the time average is independent of the starting point. Given</p><p>that we usually only have a single time series, you may wonder how a time</p><p>series model can fail to be ergodic, or why we should want a model that is</p><p>not ergodic. Environmental and economic time series are single realisations of</p><p>a hypothetical time series model, and we simply define the underlying model</p><p>as ergodic.</p><p>There are, however, cases in which we can have many time series arising</p><p>from the same time series model. Suppose we investigate the acceleration at</p><p>the pilot seat of a new design of microlight aircraft in simulated random gusts</p><p>in a wind tunnel. Even if we have built two prototypes to the same design,</p><p>we cannot be certain they will have the same average acceleration response</p><p>because of slight differences in manufacture. In such cases, the number of time</p><p>series is equal to the number of prototypes. Another example is an experiment</p><p>investigating turbulent flows in some complex system. It is possible that we</p><p>will obtain qualitatively different results from different runs because they do</p><p>depend on initial conditions. It would seem better to run an experiment in-</p><p>volving turbulence many times than to run it once for a much longer time.</p><p>The number of runs is the number of time series. It is straightforward to adapt</p><p>32 2 Correlation</p><p>Time</p><p>E</p><p>ns</p><p>em</p><p>bl</p><p>e</p><p>po</p><p>pu</p><p>la</p><p>tio</p><p>n</p><p>t</p><p>Fig. 2.2. An ensemble of time series. 
The expected value E(xt) at a particular time</p><p>t is the average taken over the entire population.</p><p>a stationary time series model to be non-ergodic by defining the means for</p><p>the individual time series to be from some probability distribution.</p><p>2.2.4 Variance function</p><p>The variance function of a time series model that is stationary in the mean is</p><p>σ2(t) = E</p><p>[</p><p>(xt − µ)2</p><p>]</p><p>(2.9)</p><p>which can, in principle, take a different value at every time t. But we cannot</p><p>estimate a different variance at each time point from a single time series. To</p><p>progress, we must make some simplifying assumption. If we assume the model</p><p>is stationary in the variance, this constant population variance, σ2, can be</p><p>estimated from the sample variance:</p><p>Var(x) =</p><p>∑</p><p>(xt − x)2</p><p>n− 1</p><p>(2.10)</p><p>2.2 Expectation and the ensemble 33</p><p>In a time series analysis, sequential observations may be correlated. If the cor-</p><p>relation is positive, Var(x) will tend to underestimate the population variance</p><p>in a short series because successive observations tend to be relatively similar.</p><p>In most cases, this does not present a problem since the bias decreases rapidly</p><p>as the length n of the series increases.</p><p>2.2.5 Autocorrelation</p><p>The mean and variance play an important role in the study of statistical</p><p>distributions because they summarise two key distributional properties – a</p><p>central location and the spread. Similarly, in the study of time series models,</p><p>a key role is played by the second-order properties, which include the mean,</p><p>variance, and serial correlation (described below).</p><p>Consider a time series model that is stationary in the mean and the vari-</p><p>ance. The variables may be correlated, and the model is second-order sta-</p><p>tionary if the correlation between variables depends only on the number of</p><p>time steps separating them. The number of time steps between the variables</p><p>is known as the lag. A correlation of a variable with itself at different times</p><p>is known as autocorrelation or serial correlation. If a time series model is</p><p>second-order stationary, we can define an autocovariance function (acvf ), γk,</p><p>as a function of the lag k:</p><p>γk = E [(xt − µ)(xt+k − µ)] (2.11)</p><p>The function γk does not depend on t because the expectation, which is across</p><p>the ensemble, is the same at all times t. This definition follows naturally from</p><p>Equation (2.1) by replacing x with xt and y with xt+k and noting that the</p><p>mean µ is the mean of both xt and xt+k. The lag k autocorrelation function</p><p>(acf ), ρk, is defined by</p><p>ρk =</p><p>γk</p><p>σ2</p><p>(2.12)</p><p>It follows from the definition that ρ0 is 1.</p><p>It is possible to set up a second-order stationary time series model that</p><p>has skewness; for example, one that depends on time t. Applications for such</p><p>models are rare, and it is customary to drop the term ‘second-order’ and</p><p>use ‘stationary’ on its own for a time series model that is at least second-</p><p>order stationary. The term strictly stationary is reserved for more rigorous</p><p>conditions.</p><p>The acvf and acf can be estimated from a time series by their sample</p><p>equivalents. 
The sample acvf, ck, is calculated as

ck = (1/n) Σ_{t=1}^{n−k} (xt − x̄)(xt+k − x̄)    (2.13)

Note that the autocovariance at lag 0, c0, is the variance calculated with a denominator n. Also, a denominator n is used when calculating ck, although only n − k terms are added to form the numerator. Adopting this definition constrains all sample autocorrelations to lie between −1 and 1. The sample acf is defined as

rk = ck / c0    (2.14)

We will demonstrate the calculations in R using a time series of wave heights (mm relative to still water level) measured at the centre of a wave tank. The sampling interval is 0.1 second and the record length is 39.7 seconds. The waves were generated by a wave maker driven by a pseudo-random signal that was programmed to emulate a rough sea. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.

> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table(www, header = T) ; attach(wave.dat)
> plot(ts(waveht)) ; plot(ts(waveht[1:60]))

The upper plot in Figure 2.3 shows the entire time series. There are no outlying values. The lower plot is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.

Fig. 2.3. Wave height at centre of tank sampled at 0.1 second intervals. (a) Wave height over 39.7 seconds; (b) wave height over 6 seconds.

The autocorrelations of x are stored in the vector acf(x)$acf, with the lag k autocorrelation located in acf(x)$acf[k+1]. For example, the lag 1 autocorrelation for waveht is

> acf(waveht)$acf[2]
[1] 0.47

The first entry, acf(waveht)$acf[1], is r0 and equals 1. A scatter plot, such as Figure 2.1 for the Herald Square data, complements the calculation of the correlation and alerts us to any non-linear patterns. In a similar way, we can draw a scatter plot corresponding to each autocorrelation. For example, for lag 1 we use plot(waveht[1:396], waveht[2:397]) to obtain Figure 2.4. Autocovariances are obtained by adding an argument to acf. The lag 1 autocovariance is given by

> acf(waveht, type = c("covariance"))$acf[2]
[1] 33328

Fig. 2.4. Wave height pairs separated by a lag of 1.

2.3 The correlogram

2.3.1 General discussion

By default, the acf function produces a plot of rk against k, which is called the correlogram. For example, Figure 2.5 gives the correlogram for the wave heights obtained from acf(waveht).
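One way to check Equations (2.13) and (2.14) is to compute the lag 1 quantities directly, along the following lines; this is only a sketch, and it assumes waveht is still attached as in the code above. The final value should agree with acf(waveht)$acf[2].

> n <- length(waveht)
> c0 <- sum((waveht - mean(waveht))^2) / n                                     # Equation (2.13) with k = 0
> c1 <- sum((waveht[1:(n-1)] - mean(waveht)) * (waveht[2:n] - mean(waveht))) / n   # Equation (2.13) with k = 1
> c1 / c0                                                                      # Equation (2.14); compare with acf(waveht)$acf[2]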
Fig. 2.5. Correlogram of wave heights.

In general, correlograms have the following features:

• The x-axis gives the lag (k) and the y-axis gives the autocorrelation (rk) at each lag. The unit of lag is the sampling interval, 0.1 second. Correlation is dimensionless, so there is no unit for the y-axis.
• If ρk = 0, the sampling distribution of rk is approximately normal, with a mean of −1/n and a variance of 1/n.
The dotted lines on the correlogram are drawn at

−1/n ± 2/√n

If rk falls outside these lines, we have evidence against the null hypothesis that ρk = 0 at the 5% level. However, we should be careful about interpreting multiple hypothesis tests. Firstly, if ρk does equal 0 at all lags k, we expect 5% of the estimates, rk, to fall outside the lines. Secondly, the rk are correlated, so if one falls outside the lines, the neighbouring ones are more likely to be statistically significant. This will become clearer when we simulate time series in Chapter 4. In the meantime, it is worth looking for statistically significant values at specific lags that have some practical meaning (for example, the lag that corresponds to the seasonal period, when there is one). For monthly series, a significant autocorrelation at lag 12 might indicate that the seasonal adjustment is not adequate.
• The lag 0 autocorrelation is always 1 and is shown on the plot. Its inclusion helps us compare values of the other autocorrelations relative to the theoretical maximum of 1. This is useful because, if we have a long time series, small values of rk that are of no practical consequence may be statistically significant. However, some discernment is required to decide what constitutes a noteworthy autocorrelation from a practical viewpoint. Squaring the autocorrelation can help, as this gives the percentage of variability explained by a linear relationship between the variables. For example, a lag 1 autocorrelation of 0.1 implies that a linear dependency of xt on xt−1 would only explain 1% of the variability of xt. It is a common fallacy to treat a statistically significant result as important when it has almost no practical consequence.
• The correlogram for wave heights has a well-defined shape that appears like a sampled damped cosine function. This is typical of correlograms of time series generated by an autoregressive model of order 2. We cover autoregressive models in Chapter 4.

If you look back at the plot of the air passenger bookings, there is a clear seasonal pattern and an increasing trend (Fig. 1.1). It is not reasonable to claim the time series is a realisation of a stationary model. But, whilst the population acf was defined only for a stationary time series model, the sample acf can be calculated for any time series, including deterministic signals.
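As a quick illustration, the sketch below calculates sample correlograms for three simple deterministic series; the series are arbitrary choices of ours, and the patterns they produce are the ones described in the list that follows.

> acf(1:1000)                           # a pure trend: slow, almost linear decay from 1
> acf(sin(2 * pi * (1:1000) / 10))      # a sinusoid: cosine-shaped acf with the same period
> acf(rep(c(5, 1, 4, 2, 3), 100))       # a sequence of p = 5 numbers repeated: spike of almost 1 at lag 5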
Some results for deterministic signals are helpful for explaining patterns in the acf of time series that we do not consider realisations of some stationary process:

• If you construct a time series that consists of a trend only, the integers from 1 up to 1000 for example, the acf decreases slowly and almost linearly from 1.
• If you take a large number of cycles of a discrete sinusoidal wave of any amplitude and phase, the acf is a discrete cosine function of the same period.
• If you construct a time series that consists of an arbitrary sequence of p numbers repeated many times, the correlogram has a dominant spike of almost 1 at lag p.

Usually a trend in the data will show in the correlogram as a slow decay in the autocorrelations, which are large and positive due to similar values in the series occurring close together in time. This can be seen in the correlogram for the air passenger bookings acf(AirPassengers) (Fig. 2.6). If there is seasonal variation, seasonal spikes will be superimposed on this pattern. The annual cycle appears in the air passenger correlogram as a cycle of the same period superimposed on the gradually decaying ordinates of the acf. This gives a maximum at a lag of 1 year, reflecting a positive linear relationship between pairs of variables (xt, xt+12) separated by 12-month periods. Conversely, because the seasonal trend is approximately sinusoidal, values separated by a period of 6 months will tend to have a negative relationship. For example, higher values tend to occur in the summer months followed by lower values in the winter months. A dip in the acf therefore occurs at lag 6 months (or 0.5 years). Although this is typical for seasonal variation that is approximated by a sinusoidal curve, other series may have patterns, such as high sales at Christmas, that contribute a single spike to the correlogram.

Fig. 2.6. Correlogram for the air passenger bookings over the period 1949–1960. The gradual decay is typical of a time series containing a trend. The peak at 1 year indicates seasonal variation.

2.3.2 Example based on air passenger series

Although we want to know about trends and seasonal patterns in a time series, we do not necessarily rely on the correlogram to identify them. The main use of the correlogram is to detect autocorrelations in the time series after we have removed an estimate of the trend and seasonal variation. In the code below, the air passenger series is seasonally adjusted and the trend removed using decompose. To plot the random component and draw the correlogram, we need to remember that a consequence of using a centred moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first six and last six terms in the random component cannot be calculated and are thus stored in R as NA.
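Before plotting, it can be worth confirming which entries of the random component are missing. A one-line check along these lines, using the AirPassengers series supplied with R, lists the NA positions:

> which(is.na(decompose(AirPassengers, "mult")$random))   # the first six and last six positions (1 to 6 and 139 to 144)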
The random component and correlogram are shown in Figures 2.7 and 2.8, respectively.

> data(AirPassengers)
> AP <- AirPassengers
> AP.decom <- decompose(AP, "mult")
> plot(ts(AP.decom$random[7:138]))
> acf(AP.decom$random[7:138])

Fig. 2.7. The random component of the air passenger series after removing the trend and the seasonal variation.

Fig. 2.8. Correlogram for the random component of air passenger bookings over the period 1949–1960.

The correlogram in Figure 2.8 suggests either a damped cosine shape that is characteristic of an autoregressive model of order 2 (Chapter 4) or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that the standard deviation of the original series from July until June is 109, the standard deviation of the series after subtracting the trend estimate is 41, and the standard deviation after seasonal adjustment is just 0.03.

> sd(AP[7:138])
[1] 109
> sd(AP[7:138] - AP.decom$trend[7:138])
[1] 41.1
> sd(AP.decom$random[7:138])
[1] 0.0335

The reduction in the standard deviation shows that the seasonal adjustment has been very effective.

2.3.3 Example based on the Font Reservoir series

Monthly effective inflows (m3 s−1) to the Font Reservoir in Northumberland for the period from January 1909 until December 1980 have been provided by Northumbrian Water PLC. A plot of the data is shown in Figure 2.9. There was a slight decreasing trend over this period, and substantial seasonal variation. The trend and seasonal variation have been estimated by regression, as described in Chapter 5, and the residual series (adflow), which we analyse here, can reasonably be considered a realisation from a stationary time series model. The main difference between the regression approach and using decompose is that the former assumes a linear trend, whereas the latter smooths the time series without assuming any particular form for the trend. The correlogram is plotted in Figure 2.10.

> www <- "http://www.massey.ac.nz/~pscowper/ts/Fontdsdt.dat"
> Fontdsdt.dat <- read.table(www, header = T)
> attach(Fontdsdt.dat)
> plot(ts(adflow), ylab = 'adflow')
> acf(adflow, xlab = 'lag (months)', main = "")

Fig. 2.9. Adjusted inflows to the Font Reservoir, 1909–1980.

There is a statistically significant correlation at lag 1. The physical interpretation is that the inflow next month is more likely than not to be above average if the inflow this month is above average. Similarly, if the inflow this month is below average it is more likely than not that next month's inflow will be below average. The explanation is that the groundwater supply can be thought of as a slowly discharging reservoir. If groundwater is high one month it will augment inflows, and is likely to do so next month as well.
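Rather than reading the lag 1 value off the correlogram, it can be extracted directly; a minimal sketch, assuming adflow is attached as in the code above:

> acf(adflow, plot = FALSE)$acf[2]      # the lag 1 sample autocorrelation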
Given this</p><p>2.4 Covariance of sums of random variables 41</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>lag (months)</p><p>A</p><p>C</p><p>F</p><p>Fig. 2.10. Correlogram for adjusted inflows to the Font Reservoir,</p><p>1909–1980.</p><p>explanation, you may be surprised that the lag 1 correlation is not higher.</p><p>The explanation for this is that most of the inflow is runoff following rainfall,</p><p>and in Northumberland there is little correlation between seasonally adjusted</p><p>rainfall in consecutive months. An exponential decay in the correlogram is</p><p>typical of a first-order autoregressive model (Chapter 4). The correlogram of</p><p>the adjusted inflows is consistent with an exponential decay. However, given</p><p>the sampling errors for a time series of this length, estimates of autocorre-</p><p>lation at higher lags are unlikely to be statistically significant. This is not a</p><p>practical limitation because such low correlations are inconsequential. When</p><p>we come to identify suitable models, we should remember that there is no one</p><p>correct model and that there will often be a choice of suitable models. We</p><p>may make use of a specific statistical criterion such as Akaike’s information</p><p>criterion, introduced in Chapter 5, to choose a model, but this does not imply</p><p>that the model is correct.</p><p>2.4 Covariance of sums of random variables</p><p>In subsequent chapters, second-order properties for several time series models</p><p>are derived using the result shown in Equation (2.15). Let x1, x2, . . . , xn and</p><p>y1, y2, . . . , ym be random variables. Then</p><p>Cov</p><p> n∑</p><p>i=1</p><p>xi,</p><p>m∑</p><p>j=1</p><p>yj</p><p> =</p><p>n∑</p><p>i=1</p><p>m∑</p><p>j=1</p><p>Cov(xi, yj) (2.15)</p><p>where Cov(x, y) is the covariance between a pair of random variables x and</p><p>y. The result tells us that the covariance of two sums of variables is the sum</p><p>42 2 Correlation</p><p>of all possible covariance pairs of the variables. Note that the special case of</p><p>n = m and xi = yi (i = 1, . . . , n) occurs in subsequent chapters for a time</p><p>series {xt}. The proof of Equation (2.15) is left to Exercise 5a.</p><p>2.5 Summary of commands used in examples</p><p>mean returns the mean (average)</p><p>var returns the variance with denominator n− 1</p><p>sd returns the standard deviation</p><p>cov returns the covariance with denominator n− 1</p><p>cor returns the correlation</p><p>acf returns the correlogram (or sets the argument</p><p>to obtain autocovariance function)</p><p>2.6 Exercises</p><p>1. On the book’s website, you will find two small bivariate data sets that are</p><p>not time series. Draw a scatter plot for each set and then calculate the</p><p>correlation. Comment on your results.</p><p>a) The data in the file varnish.dat are the amount of catalyst in a var-</p><p>nish, x, and the drying time of a set volume in a petri dish, y.</p><p>b) The data in the file guesswhat.dat are data pairs. Can you see a</p><p>pattern? Can you guess what they represent?</p><p>2. 
The following data are the volumes, relative to nominal contents of 750 ml,</p><p>of 16 bottles taken consecutively from the filling machine at the Serendip-</p><p>ity Shiraz vineyard:</p><p>39, 35, 16, 18, 7, 22, 13, 18, 20, 9, −12, −11, −19, −9, −2, 16.</p><p>The following are the volumes, relative to nominal contents of 750 ml, of</p><p>consecutive bottles taken from the filling machine at the Cagey Chardon-</p><p>nay vineyard:</p><p>47, −26, 42, −10, 27, −8, 16, 6, −1, 25, 11, 1, 25, 7, −5, 3</p><p>The data are also available from the website in the file ch2ex2.dat.</p><p>a) Produce time plots of the two time series.</p><p>b) For each time series, draw a lag 1 scatter plot.</p><p>c) Produce the acf for both time series and comment.</p><p>2.6 Exercises 43</p><p>3. Carry out the following exploratory time series analysis using the global</p><p>temperature series from §1.4.5.</p><p>a) Decompose the series into the components trend, seasonal effect, and</p><p>residuals. Plot these components. Would you expect these data to have</p><p>a substantial seasonal component? Compare the standard deviation of</p><p>the original series with the deseasonalised series. Produce a plot of the</p><p>trend with a superimposed seasonal effect.</p><p>b) Plot the correlogram of the residuals (random component) from part</p><p>(a). Comment on the plot, with particular reference to any statistically</p><p>significant correlations.</p><p>4. The monthly effective inflows (m3s−1) to the Font Reservoir are in the file</p><p>Font.dat. Use decompose on the time series and then plot the correlogram</p><p>of the random component. Compare this with Figure 2.10 and comment.</p><p>5. a) Prove Equation (2.15), using the following properties of summation,</p><p>expectation, and covariance:∑n</p><p>i=1 xi</p><p>∑m</p><p>j=1 yj =</p><p>∑n</p><p>i=1</p><p>∑m</p><p>j=1 xiyj</p><p>E [</p><p>∑n</p><p>i=1 xi] =</p><p>∑n</p><p>i=1E (xi)</p><p>Cov (x, y) = E (xy)− E (x)E (y)</p><p>b) By taking n = m = 2 and xi = yi in Equation (2.15), derive the</p><p>well-known result</p><p>Var (x+ y) = Var (x) + Var (y) + 2 Cov (x, y)</p><p>c) Verify the result in part (b) above using R with x and y (CO and</p><p>Benzoa, respectively) taken from §2.2.1.</p><p>3</p><p>Forecasting Strategies</p><p>3.1 Purpose</p><p>Businesses rely on forecasts of sales to plan production, justify marketing de-</p><p>cisions, and guide research. A very efficient method of forecasting one variable</p><p>is to find a related variable that leads it by one or more time intervals. The</p><p>closer the relationship and the longer the lead time, the better this strategy</p><p>becomes. The trick is to find a suitable lead variable. An Australian example</p><p>is the Building Approvals time series published by the Australian Bureau of</p><p>Statistics. This provides valuable information on the likely demand over the</p><p>next few months for all sectors of the building industry. A variation on the</p><p>strategy of seeking a leading variable is to find a variable that is associated</p><p>with the variable we need to forecast and easier to predict.</p><p>In many applications, we cannot rely on finding a suitable leading variable</p><p>and have to try other methods. A second approach, common in marketing,</p><p>is to use information about the sales of similar products in the past. The in-</p><p>fluential Bass diffusion model is based on this principle. 
A third strategy is to make extrapolations based on present trends continuing and to implement adaptive estimates of these trends. The statistical technicalities of forecasting are covered throughout the book, and the purpose of this chapter is to introduce the general strategies that are available.

3.2 Leading variables and associated variables

3.2.1 Marine coatings

A leading international marine paint company uses statistics available in the public domain to forecast the numbers, types, and sizes of ships to be built over the next three years. One source of such information is World Shipyard Monitor, which gives brief details of orders in over 300 shipyards. The paint company has set up a database of ship types and sizes from which it can forecast the areas to be painted and hence the likely demand for paint. The company monitors its market share closely and uses the forecasts for planning production and setting prices.

3.2.2 Building approvals publication

Building approvals and building activity time series

The Australian Bureau of Statistics publishes detailed data on building approvals for each month, and, a few weeks later, the Building Activity Publication lists the value of building work done in each quarter. The data in the file ApprovActiv.dat are the total dwellings approved per month, averaged over the past three months, labelled "Approvals", and the value of work done over the past three months (chain volume measured in millions of Australian dollars at the reference year 2004–05 prices), labelled "Activity", from March 1996 until September 2006. We start by reading the data into R and then construct time series objects and plot the two series on the same graph using ts.plot (Fig. 3.1).

> www <- "http://www.massey.ac.nz/~pscowper/ts/ApprovActiv.dat"
> Build.dat <- read.table(www, header = T) ; attach(Build.dat)
> App.ts <- ts(Approvals, start = c(1996, 1), freq = 4)
> Act.ts <- ts(Activity, start = c(1996, 1), freq = 4)
> ts.plot(App.ts, Act.ts, lty = c(1, 3))

Fig. 3.1. Building approvals (solid line) and building activity (dotted line).

In Figure 3.1, we can see that the building activity tends to lag one quarter behind the building approvals, or equivalently that the building approvals appear to lead the building activity by a quarter. The cross-correlation function, which is abbreviated to ccf, can be used to quantify this relationship. A plot of the cross-correlation function against lag is referred to as a cross-correlogram.

Cross-correlation

Suppose we have time series models for variables x and y that are stationary in the mean and the variance. The variables may each be serially correlated, and correlated with each other at different time lags.
The combined model is</p><p>second-order stationary if all these correlations depend only on the lag, and</p><p>then we can define the cross covariance function (ccvf ), γk(x, y), as a function</p><p>of the lag, k:</p><p>γk(x, y) = E [(xt+k − µx)(yt − µy)] (3.1)</p><p>This is not a symmetric relationship, and the variable x is lagging variable</p><p>y by k. If x is the input to some physical system and y is the response, the</p><p>cause will precede the effect, y will lag x, the ccvf will be 0 for positive k, and</p><p>there will be spikes in the ccvf at negative lags. Some textbooks define ccvf</p><p>with the variable y lagging when k is positive, but we have used the definition</p><p>that is consistent with R. Whichever way you choose to define the ccvf,</p><p>γk(x, y) = γ−k(y, x) (3.2)</p><p>When we have several variables and wish to refer to the acvf of one rather</p><p>than the ccvf of a pair, we can write it as, for example, γk(x, x). The lag k</p><p>cross-correlation function (ccf ), ρk(x, y), is defined by</p><p>ρk(x, y) =</p><p>γk(x, y)</p><p>σxσy</p><p>. (3.3)</p><p>The ccvf and ccf can be estimated from a time series by their sample</p><p>equivalents. The sample ccvf, ck(x, y), is calculated as</p><p>ck(x, y) =</p><p>1</p><p>n</p><p>n−k∑</p><p>t=1</p><p>(</p><p>xt+k − x</p><p>)(</p><p>yt − y</p><p>)</p><p>(3.4)</p><p>The sample acf is defined as</p><p>rk(x, y) =</p><p>ck(x, y)√</p><p>c0(x, x)c0(y, y)</p><p>(3.5)</p><p>Cross-correlation between building approvals and activity</p><p>The ts.union function binds time series with a common frequency, padding</p><p>with ‘NA’s to the union of their time coverages. If ts.union is used within</p><p>the acf command, R returns the correlograms for the two variables and the</p><p>cross-correlograms in a single figure.</p><p>48 3 Forecasting Strategies</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>App.ts</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>App.ts & Act.ts</p><p>−3.0 −2.0 −1.0 0.0</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Act.ts & App.ts</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>Act.ts</p><p>Fig. 3.2. Correlogram and cross-correlogram for building approvals and building</p><p>activity.</p><p>> acf(ts.union(App.ts, Act.ts))</p><p>In Figure 3.2, the acfs for x and y are in the upper left and lower right</p><p>frames, respectively, and the ccfs are in the lower left and upper right frames.</p><p>The time unit for lag is one year, so a correlation at a lag of one quarter ap-</p><p>pears at 0.25. If the variables are independent, we would expect 5% of sample</p><p>correlations to lie outside the dashed lines. Several of the cross-correlations</p><p>at negative lags do pass these lines, indicating that the approvals time series</p><p>is leading the activity. Numerical values can be printed using the print()</p><p>function, and are 0.432, 0.494, 0.499, and 0.458 at lags of 0, 1, 2, and 3, re-</p><p>spectively. The ccf can be calculated for any two time series that overlap,</p><p>but if they both have trends or similar seasonal effects, these will dominate</p><p>(Exercise 1). 
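For reference, the commands used in this example can be sketched as follows. The file path, column names, and start date are assumptions based on the description of ApprovActiv.dat above; the last four lines remove the trend and seasonal effects with decompose before the cross-correlations are examined, as discussed next.

> Build.dat <- read.table("ApprovActiv.dat", header = TRUE)   # file from the book's website; local path assumed
> App.ts <- ts(Build.dat$Approvals, start = c(1996, 1), freq = 4)   # column names and start date assumed
> Act.ts <- ts(Build.dat$Activity, start = c(1996, 1), freq = 4)
> ts.plot(App.ts, Act.ts, lty = c(1, 3))                       # Figure 3.1
> acf(ts.union(App.ts, Act.ts))                                # Figure 3.2
> app.ran.ts <- na.omit(decompose(App.ts)$random)              # random components; end NAs from the moving average dropped
> act.ran.ts <- na.omit(decompose(Act.ts)$random)
> acf(ts.union(app.ran.ts, act.ran.ts))                        # Figure 3.3
> ccf(app.ran.ts, act.ran.ts)                                  # Figure 3.4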
It may be that common trends and seasonal effects are precisely</p><p>what we are looking for, but the population ccf is defined for stationary ran-</p><p>dom processes and it is usual to remove the trend and seasonal effects before</p><p>investigating cross-correlations. Here we remove the trend using decompose,</p><p>which uses a centred moving average of the four quarters (see Fig. 3.3). We</p><p>will discuss the use of ccf in later chapters.</p><p>3.2 Leading variables and associated variables 49</p><p>> app.ran app.ran.ts act.ran act.ran.ts acf (ts.union(app.ran.ts, act.ran.ts))</p><p>> ccf (app.ran.ts, act.ran.ts)</p><p>We again use print() to obtain the following table.</p><p>> print(acf(ts.union(app.ran.ts, act.ran.ts)))</p><p>app.ran.ts act.ran.ts</p><p>1.000 ( 0.00) 0.123 ( 0.00)</p><p>0.422 ( 0.25) 0.704 (-0.25)</p><p>-0.328 ( 0.50) 0.510 (-0.50)</p><p>-0.461 ( 0.75) -0.135 (-0.75)</p><p>-0.400 ( 1.00) -0.341 (-1.00)</p><p>-0.193 ( 1.25) -0.187 (-1.25)</p><p>...</p><p>app.ran.ts act.ran.ts</p><p>0.123 ( 0.00) 1.000 ( 0.00)</p><p>-0.400 ( 0.25) 0.258 ( 0.25)</p><p>-0.410 ( 0.50) -0.410 ( 0.50)</p><p>-0.250 ( 0.75) -0.411 ( 0.75)</p><p>0.071 ( 1.00) -0.112 ( 1.00)</p><p>0.353 ( 1.25) 0.180 ( 1.25)</p><p>...</p><p>The ccf function produces a single plot, shown in Figure 3.4, and again</p><p>shows the lagged relationship. The Australian Bureau of Statistics publishes</p><p>the building approvals by state and by other categories, and specific sectors of</p><p>the building industry may find higher correlations between demand for their</p><p>products and one of these series than we have seen here.</p><p>3.2.3 Gas supply</p><p>Gas suppliers typically have to place orders for gas from offshore fields 24 hours</p><p>ahead. Variation about the average use of gas, for the time of year, depends</p><p>on temperature and, to some extent, humidity and wind speed. Coleman et al.</p><p>(2001) found that the weather accounts for 90% of this variation in the United</p><p>Kingdom. Weather forecasts for the next 24 hours are now quite accurate and</p><p>are incorporated into the forecasting procedure.</p><p>50 3 Forecasting Strategies</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>app.ran.ts</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>app.ran.ts & act.ran.ts</p><p>−3.0 −2.0 −1.0 0.0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>act.ran.ts & app.ran.ts</p><p>0.0 1.0 2.0 3.0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>act.ran.ts</p><p>Fig. 3.3. Correlogram and cross-correlogram of the random components of building</p><p>approvals and building activity after using decompose.</p><p>−3 −2 −1 0 1 2 3</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>app.ran.ts & act.ran.ts</p><p>Fig. 3.4. Cross-correlogram of the random components of building approvals and</p><p>building activity after using decompose.</p><p>3.3 Bass model 51</p><p>3.3 Bass model</p><p>3.3.1 Background</p><p>Frank Bass published a paper describing his mathematical model, which quan-</p><p>tified the theory of adoption and diffusion of a new product by society (Rogers,</p><p>1962), in Management Science nearly fifty years ago (Bass, 1969). 
The mathe-</p><p>matics is straightforward, and the model has been influential in marketing. An</p><p>entrepreneur with a new invention will often use the Bass model when mak-</p><p>ing a case for funding. There is an associated demand for market research, as</p><p>demonstrated, for example, by the Marketing Science Centre at the Univer-</p><p>sity of South Australia becoming the Ehrenberg-Bass Institute for Marketing</p><p>Science in 2005.</p><p>3.3.2 Model definition</p><p>The Bass formula for the number of people, Nt, who have bought a product at</p><p>time t depends on three parameters: the total number of people who eventually</p><p>buy the product, m; the coefficient of innovation, p; and the coefficient of</p><p>imitation, q. The Bass formula is</p><p>Nt+1 = Nt + p(m−Nt) + qNt(m−Nt)/m (3.6)</p><p>According to the model, the increase in sales, Nt+1 −Nt, over the next time</p><p>period is equal to the sum of a fixed proportion p and a time varying proportion</p><p>qNt</p><p>m of people who will eventually buy the product but have not yet done so.</p><p>The rationale for the model is that initial sales will be to people who are</p><p>interested in the novelty of the product, whereas later sales will be to people</p><p>who are drawn to the product after seeing their friends and acquaintances use</p><p>it. Equation (3.6) is a difference equation and its solution is</p><p>Nt = m</p><p>1− e−(p+q)t</p><p>1 + (q/p)e−(p+q)t</p><p>(3.7)</p><p>It is easier to verify this result for the continuous-time version of the model.</p><p>3.3.3 Interpretation of the Bass model*</p><p>One interpretation of the Bass model is that the time from product launch</p><p>until purchase is assumed to have a probability</p><p>distribution that can be</p><p>parametrised in terms of p and q. A plot of sales per time unit against time is</p><p>obtained by multiplying the probability density by the number of people, m,</p><p>who eventually buy the product. Let f(t), F (t), and h(t) be the density, cumu-</p><p>lative distribution function (cdf), and hazard, respectively, of the distribution</p><p>of time until purchase. The definition of the hazard is</p><p>52 3 Forecasting Strategies</p><p>h(t) =</p><p>f(t)</p><p>1− F (t)</p><p>(3.8)</p><p>The interpretation of the hazard is that if it is multiplied by a small time</p><p>increment it gives the probability that a random purchaser who has not yet</p><p>made the purchase will do so in the next small time increment (Exercise 2).</p><p>Then the continuous time model of the Bass formula can be expressed in terms</p><p>of the hazard:</p><p>h(t) = p+ qF (t) (3.9)</p><p>Equation (3.6) is the discrete form of Equation (3.9) (Exercise 2). The solution</p><p>of Equation (3.8), with h(t) given by Equation (3.9), for F (t) is</p><p>F (t) =</p><p>1− e−(p+q)t</p><p>1 + (q/p)e−(p+q)t</p><p>(3.10)</p><p>Two special cases of the distribution are the exponential distribution and lo-</p><p>gistic distribution, which arise when q = 0 and p = 0, respectively. The logistic</p><p>distribution closely resembles the normal distribution (Exercise 3). Cumula-</p><p>tive sales are given by the product of m and F (t). 
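As a quick numerical check of Equation (3.10), F(t) can be evaluated and plotted for illustrative parameter values; p = 0.03 and q = 0.38, quoted later for a 'typical product', are used here only to show the shape of the curve.

> p <- 0.03; q <- 0.38                                          # illustrative values only
> t <- seq(0, 20, 0.1)
> Ft <- (1 - exp(-(p + q) * t)) / (1 + (q / p) * exp(-(p + q) * t))
> plot(t, Ft, type = "l", xlab = "Time from launch", ylab = "F(t)")   # S-shaped cdf of time until purchase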
The pdf is the derivative</p><p>of Equation (3.10):</p><p>f(t) =</p><p>(p+ q)2e−(p+q)t</p><p>p</p><p>[</p><p>1 + (q/p)e−(p+q)t</p><p>]2 (3.11)</p><p>Sales per unit time at time t are</p><p>S(t) = mf(t) =</p><p>m(p+ q)2e−(p+q)t</p><p>p</p><p>[</p><p>1 + (q/p)e−(p+q)t</p><p>]2 (3.12)</p><p>The time to peak is</p><p>tpeak =</p><p>log(q)− log(p)</p><p>p+ q</p><p>(3.13)</p><p>3.3.4 Example</p><p>We show a typical Bass curve by fitting Equation (3.12) to yearly sales of</p><p>VCRs in the US home market between 1980 and 1989 (Bass website) using</p><p>the R non-linear least squares function nls. The variable T79 is the year from</p><p>1979, and the variable Tdelt is the time from 1979 at a finer resolution of</p><p>0.1 year for plotting the Bass curves. The cumulative sum function cumsum is</p><p>useful for monitoring changes in the mean level of the process (Exercise 8).</p><p>> T79 Tdelt Sales Cusales Bass.nls summary(Bass.nls)</p><p>3.3 Bass model 53</p><p>Parameters:</p><p>Estimate Std. Error t value Pr(>|t|)</p><p>M 6.798e+04 3.128e+03 21.74 1.10e-07 ***</p><p>P 6.594e-03 1.430e-03 4.61 0.00245 **</p><p>Q 6.381e-01 4.140e-02 15.41 1.17e-06 ***</p><p>Residual standard error: 727.2 on 7 degrees of freedom</p><p>The final estimates for m, p, and q, rounded to two significant places, are</p><p>68000, 0.0066, and 0.64 respectively. The starting values for P and Q are p and</p><p>q for a typical product. We assume the sales figures are prone to error and</p><p>estimate the total sales, m, setting the starting value for M to the recorded</p><p>total sales. The data and fitted curve can be plotted using the code below (see</p><p>Fig. 3.5 and 3.6):</p><p>> Bcoef m p q ngete Bpdf plot(Tdelt, Bpdf, xlab = "Year from 1979",</p><p>ylab = "Sales per year", type='l')</p><p>> points(T79, Sales)</p><p>> Bcdf plot(Tdelt, Bcdf, xlab = "Year from 1979",</p><p>ylab = "Cumulative sales", type='l')</p><p>> points(T79, Cusales)</p><p>Fig. 3.5. Bass sales curve fitted to sales of VCRs in the US home market, 1980–1989.</p><p>0 2 4 6 8 10</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>Year from 1979</p><p>S</p><p>al</p><p>es</p><p>p</p><p>er</p><p>y</p><p>ea</p><p>r</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>54 3 Forecasting Strategies</p><p>Fig. 3.6. Bass cumulative sales curve, obtained as the integral of the sales curve,</p><p>and cumulative sales of VCRs in the US home market, 1980–1989.</p><p>It is easy to fit a curve to past sales data. The importance of the Bass</p><p>curve in marketing is in forecasting, which needs values for the parameters m,</p><p>p, and q. Plausible ranges for the parameter values can be based on published</p><p>data for similar categories of past inventions, and a few examples follow.</p><p>Product m p q Reference</p><p>Typical product - 0.030 0.380 VBM1</p><p>35 mm projectors, 1965–1986 3.37 million 0.009 0.173 Bass2</p><p>Overhead projectors, 1960–1970 0.961 million 0.028 0.311 Bass</p><p>PCs, 1981–2010 3.384 billion 0.001 0.195 Bass</p><p>1Value-Based Management; 2Frank M. Bass, 1999.</p><p>Although the forecasts are inevitably uncertain, they are the best informa-</p><p>tion available when making marketing and investment decisions. A prospectus</p><p>for investors or a report to the management team will typically include a set</p><p>of scenarios based on the most likely, optimistic, and pessimistic sets of pa-</p><p>rameters.</p><p>The basic Bass model does not allow for replacement sales and multiple</p><p>purchases. 
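Before turning to extensions of the basic model, the fitting and plotting steps for the VCR example can be sketched as follows. The yearly sales figures are not reproduced here, so an illustrative series simulated from the quoted estimates stands in for the data; the starting values for nls follow the description above (typical-product p and q, with M set to the recorded total sales).

> T79 <- 1:10                                   # years 1980-1989, counted from 1979
> Tdelt <- (1:100) / 10                         # finer time grid for plotting the fitted curves
> m0 <- 68000; p0 <- 0.0066; q0 <- 0.64         # quoted estimates, used only to generate stand-in data
> set.seed(1)
> Sales <- m0 * ((p0 + q0)^2 / p0) * exp(-(p0 + q0) * T79) /
      (1 + (q0 / p0) * exp(-(p0 + q0) * T79))^2 + rnorm(10, sd = 700)   # illustrative data, not the VCR figures
> Cusales <- cumsum(Sales)
> Bass.nls <- nls(Sales ~ M * ((P + Q)^2 / P) * exp(-(P + Q) * T79) /
      (1 + (Q / P) * exp(-(P + Q) * T79))^2,
      start = list(M = sum(Sales), P = 0.03, Q = 0.38))   # Equation (3.12); typical-product starting values
> summary(Bass.nls)
> Bcoef <- coef(Bass.nls)
> m <- Bcoef["M"]; p <- Bcoef["P"]; q <- Bcoef["Q"]
> Bpdf <- m * ((p + q)^2 / p) * exp(-(p + q) * Tdelt) /
      (1 + (q / p) * exp(-(p + q) * Tdelt))^2             # fitted sales-per-year curve
> plot(Tdelt, Bpdf, xlab = "Year from 1979", ylab = "Sales per year", type = "l")
> points(T79, Sales)                                       # Figure 3.5
> Bcdf <- m * (1 - exp(-(p + q) * Tdelt)) / (1 + (q / p) * exp(-(p + q) * Tdelt))   # fitted cumulative sales
> plot(Tdelt, Bcdf, xlab = "Year from 1979", ylab = "Cumulative sales", type = "l")
> points(T79, Cusales)                                     # Figure 3.6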
Extensions of the model that allow for replacement sales, multiple</p><p>purchases, and the effects of pricing and advertising in a competitive market</p><p>have been proposed (for example, Mahajan et al. 2000). However, there are</p><p>several reasons why these refinements may be of less interest to investors than</p><p>you might expect. The first is that the profit margin on manufactured goods,</p><p>such as innovative electronics and pharmaceuticals, will drop dramatically</p><p>once patent protection expires and competitors enter the market. A second</p><p>reason is that successful inventions are often superseded by new technology, as</p><p>0 2 4 6 8 10</p><p>0</p><p>20</p><p>00</p><p>0</p><p>40</p><p>00</p><p>0</p><p>60</p><p>00</p><p>0</p><p>Year from 1979</p><p>C</p><p>um</p><p>ul</p><p>at</p><p>iv</p><p>e</p><p>sa</p><p>le</p><p>s</p><p>● ●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>3.4 Exponential smoothing and the Holt-Winters method 55</p><p>VCRs have been by DVD players, and replacement sales are limited. Another</p><p>reason is that many investors are primarily interested in a relatively quick</p><p>return on their money. You are asked to consider Bass models for sales of two</p><p>recent 3G mobile communication devices in Exercise 4.</p><p>3.4 Exponential smoothing & the Holt-Winters method</p><p>3.4.1 Exponential smoothing</p><p>Our objective is to predict some future value xn+k given a past history</p><p>{x1, x2, . . . , xn} of observations up to time n. In this subsection we assume</p><p>there is no systematic trend or seasonal effects in the process, or that these</p><p>have been identified and removed. The mean of the process can change from</p><p>one time step to the next, but we have no information about the likely direction</p><p>of these changes. A typical application is forecasting sales of a well-established</p><p>product in a stable market. The model is</p><p>xt = µt + wt (3.14)</p><p>where µt is the non-stationary mean of the process at time t and wt are</p><p>independent random deviations with a mean of 0 and a standard deviation σ.</p><p>We will follow the notation in R and let at be our estimate of µt. Given that</p><p>there is no systematic trend, an intuitively reasonable estimate of the mean</p><p>at time t is given by a weighted average of our observation at time t and our</p><p>estimate of the mean at time t− 1:</p><p>at = αxt + (1− α)at−1 0</p><p>Strategies</p><p>Equation (3.15), for at, can be rewritten in two other useful ways. Firstly,</p><p>we can write the sum of at−1 and a proportion of the one-step-ahead forecast</p><p>error, xt − at−1,</p><p>at = α(xt − at−1) + at−1 (3.17)</p><p>Secondly, by repeated back substitution we obtain</p><p>at = αxt + α(1− α)xt−1 + α(1− α)2xt−2 + . . . (3.18)</p><p>When written in this form, we see that at is a linear combination of the current</p><p>and past observations, with more weight given to the more recent observations.</p><p>The restriction 0 www Motor.dat Comp.ts plot(Comp.ts, xlab = "Time / months", ylab = "Complaints")</p><p>3.4 Exponential smoothing and the Holt-Winters method 57</p><p>Time (months)</p><p>C</p><p>om</p><p>pl</p><p>ai</p><p>nt</p><p>s</p><p>1996 1997 1998 1999 2000</p><p>5</p><p>10</p><p>15</p><p>20</p><p>25</p><p>30</p><p>35</p><p>Fig. 3.7. 
Monthly numbers of letters of complaint received by a motoring organi-</p><p>sation.</p><p>There is no evidence of a systematic trend or seasonal effects, so it seems</p><p>reasonable to use exponential smoothing for this time series. Exponential</p><p>smoothing is a special case of the Holt-Winters algorithm, which we intro-</p><p>duce in the next section, and is implemented in R using the HoltWinters</p><p>function with the additional parameters set to 0. If we do not specify a value</p><p>for α, R will find the value that minimises the one-step-ahead prediction error.</p><p>> Comp.hw1 plot(Comp.hw1)</p><p>Holt-Winters exponential smoothing without trend and without seasonal</p><p>component.</p><p>Smoothing parameters:</p><p>alpha: 0.143</p><p>beta : 0</p><p>gamma: 0</p><p>Coefficients:</p><p>[,1]</p><p>a 17.70</p><p>> Comp.hw1$SSE</p><p>[1] 2502</p><p>The estimated value of the mean number of letters of complaint per month</p><p>at the end of 1999 is 17.7. The value of α that gives a minimum SS1PE, of</p><p>2502, is 0.143. We now compare these results with those obtained if we specify</p><p>a value for α of 0.2.</p><p>58 3 Forecasting Strategies</p><p>> Comp.hw2 Comp.hw2</p><p>...</p><p>alpha: 0.2</p><p>beta : 0</p><p>gamma: 0</p><p>Coefficients:</p><p>[,1]</p><p>a 17.98</p><p>> Comp.hw2$SSE</p><p>[1] 2526</p><p>Holt−Winters filtering</p><p>Time</p><p>O</p><p>bs</p><p>er</p><p>ve</p><p>d</p><p>/ F</p><p>itt</p><p>ed</p><p>10 20 30 40</p><p>5</p><p>10</p><p>15</p><p>20</p><p>25</p><p>30</p><p>35</p><p>Fig. 3.8. Monthly numbers of letters and exponentially weighted moving average.</p><p>The estimated value of the mean number of letters of complaint per month</p><p>at the end of 1999 is now 18.0, and the SS1PE has increased slightly to 2526.</p><p>The advantage of letting R estimate a value for α is that it is optimum for a</p><p>practically important criterion, SS1PE, and that it removes the need to make</p><p>a choice. However, the optimum estimate can be close to 0 if we have a long</p><p>time series over a stable period, and this makes the EWMA unresponsive to</p><p>any future change in mean level. From Figure 3.8, it seems that there was a</p><p>decrease in the number of complaints at the start of the period and a slight rise</p><p>towards the end, although this has not yet affected the exponentially weighted</p><p>moving average.</p><p>3.4 Exponential smoothing and the Holt-Winters method 59</p><p>3.4.2 Holt-Winters method</p><p>We usually have more information about the market than exponential smooth-</p><p>ing can take into account. Sales are often seasonal, and we may expect trends</p><p>to be sustained for short periods at least. But trends will change. If we have</p><p>a successful invention, sales will increase initially but then stabilise before de-</p><p>clining as competitors enter the market. We will refer to the change in level</p><p>from one time period to the next as the slope.1 Seasonal patterns can also</p><p>change due to vagaries of fashion and variation in climate, for example. 
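Before moving on, the complaints example of §3.4.1 can be summarised in code as follows; the file path and column name are assumptions based on the description above.

> Motor.dat <- read.table("Motor.dat", header = TRUE)      # file from the book's website
> Comp.ts <- ts(Motor.dat$complaints, start = c(1996, 1), freq = 12)   # column name assumed
> plot(Comp.ts, xlab = "Time / months", ylab = "Complaints")
> Comp.hw1 <- HoltWinters(Comp.ts, beta = FALSE, gamma = FALSE)   # simple exponential smoothing; alpha estimated
> Comp.hw1 ; Comp.hw1$SSE
> plot(Comp.hw1)                                            # Figure 3.8
> Comp.hw2 <- HoltWinters(Comp.ts, alpha = 0.2, beta = FALSE, gamma = FALSE)   # alpha fixed at 0.2
> Comp.hw2 ; Comp.hw2$SSE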
The</p><p>Holt-Winters method was suggested by Holt (1957) and Winters (1960), who</p><p>were working in the School of Industrial Administration at Carnegie Institute</p><p>of Technology, and uses exponentially weighted moving averages to update</p><p>estimates of the seasonally adjusted mean (called the level), slope, and sea-</p><p>sonals.</p><p>The Holt-Winters method generalises Equation (3.15), and the additive</p><p>seasonal form of their updating equations for a series {xt} with period p is</p><p>at = α(xt − st−p) + (1− α)(at−1 + bt−1)</p><p>bt = β(at − at−1) + (1− β)bt−1</p><p>st = γ(xt − at) + (1− γ)st−p</p><p> (3.21)</p><p>where at, bt, and st are the estimated level,2 slope, and seasonal effect at time</p><p>t, and α, β, and γ are the smoothing parameters. The first updating equation</p><p>takes a weighted average of our latest observation, with our existing estimate</p><p>of the appropriate seasonal effect subtracted, and our forecast of the level</p><p>made one time step ago. The one-step-ahead forecast of the level is the sum</p><p>of the estimates of the level and slope at the time of forecast. A typical choice</p><p>of the weight α is 0.2. The second equation takes a weighted average of our</p><p>previous estimate and latest estimate of the slope, which is the difference in</p><p>the estimated level at time t and the estimated level at time t− 1. Note that</p><p>the second equation can only be used after the first equation has been applied</p><p>to get at. Finally, we have another estimate of the seasonal effect, from the</p><p>difference between the observation and the estimate of the level, and we take</p><p>a weighted average of this and the last estimate of the seasonal effect for this</p><p>season, which was made at time t− p. Typical choices of the weights β and γ</p><p>are 0.2. The updating equations can be started with a1 = x1 and initial slope,</p><p>b1, and seasonal effects, s1, . . . , sp, reckoned from experience, estimated from</p><p>the data in some way, or set at 0. The default in R is to use values obtained</p><p>from the decompose procedure.</p><p>The forecasting equation for xn+k made after the observation at time n is</p><p>x̂n+k|n = an + kbn + sn+k−p k ≤ p (3.22)</p><p>1 When describing the Holt-Winters procedure, the R help and many textbooks</p><p>refer to the slope as the trend.</p><p>2 The mean of the process is the sum of the level and the appropriate seasonal</p><p>effect.</p><p>60 3 Forecasting Strategies</p><p>where an is the estimated level and bn is the estimated slope, so an+kbn is the</p><p>expected level at time n+k and sn+k−p is the exponentially weighted estimate</p><p>of the seasonal effect made at time n = k− p. 
For example, for monthly data</p><p>(p = 12), if time n + 1 occurs in January, then sn+1−12 is the exponentially</p><p>weighted estimate of the seasonal effect for January made in the previous year.</p><p>The forecasting</p><p>equation can be used for lead times between (m−1)p+1 and</p><p>mp, but then the most recent exponentially weighted estimate of the seasonal</p><p>effect available will be sn+k−(m−1)p.</p><p>The Holt-Winters algorithm with multiplicative seasonals is</p><p>an = α</p><p>(</p><p>xn</p><p>sn−p</p><p>)</p><p>+ (1− α)(an−1 + bn−1)</p><p>bn = β(an − an−1) + (1− β)bn−1</p><p>sn = γ</p><p>(</p><p>xn</p><p>an</p><p>)</p><p>+ (1− γ)sn−p</p><p> (3.23)</p><p>The forecasting equation for xn+k made after the observation at time n</p><p>becomes</p><p>x̂n+k|n = (an + kbn)sn+k−p k ≤ p (3.24)</p><p>In R, the function HoltWinters can be used to estimate smoothing param-</p><p>eters for the Holt-Winters model by minimising the one-step-ahead prediction</p><p>errors (SS1PE).</p><p>Sales of Australian wine</p><p>The data in the file wine.dat are monthly sales of Australian wine by category,</p><p>in thousands of litres, from January 1980 until July 1995. The categories are</p><p>fortified white, dry white, sweet white, red, rose, and sparkling. The sweet</p><p>white wine time series is plotted in Figure 3.9, and there is a dramatic increase</p><p>in sales in the second half of the 1980s followed by a reduction to a level well</p><p>above the starting values. The seasonal variation looks as though it would be</p><p>better modelled as multiplicative, and comparison of the SS1PE for the fitted</p><p>models confirms this (Exercise 6). Here we present results for the model with</p><p>multiplicative seasonals only. The Holt-Winters components and fitted values</p><p>are shown in Figures 3.10 and 3.11 respectively.</p><p>> www wine.dat sweetw.ts plot(sweetw.ts, xlab= "Time (months)", ylab = "sales (1000 litres)")</p><p>> sweetw.hw sweetw.hw ; sweetw.hw$coef ; sweetw.hw$SSE</p><p>...</p><p>Smoothing parameters:</p><p>alpha: 0.4107</p><p>beta : 0.0001516</p><p>3.4 Exponential smoothing and the Holt-Winters method 61</p><p>gamma: 0.4695</p><p>...</p><p>> sqrt(sweetw.hw$SSE/length(sweetw))</p><p>[1] 50.04</p><p>> sd(sweetw)</p><p>[1] 121.4</p><p>> plot (sweetw.hw$fitted)</p><p>> plot (sweetw.hw)</p><p>Time(months)</p><p>S</p><p>al</p><p>es</p><p>(</p><p>10</p><p>00</p><p>li</p><p>tr</p><p>es</p><p>)</p><p>1980 1985 1990 1995</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.9. Sales of Australian sweet white wine.</p><p>The optimum values for the smoothing parameters, based on minimising</p><p>the one-step ahead prediction errors, are 0.4107, 0.0001516, and 0.4695 for α,</p><p>β, and γ, respectively. It follows that the level and seasonal variation adapt</p><p>rapidly whereas the trend is slow to do so. The coefficients are the estimated</p><p>values of the level, slope, and multiplicative seasonals from January to De-</p><p>cember available at the latest time point (t = n = 187), and these are the</p><p>values that will be used for predictions (Exercise 6). Finally, we have calcu-</p><p>lated the mean square one-step-ahead prediction error, which equals 50, and</p><p>have compared it with the standard deviation of the original time series which</p><p>is 121. The decrease is substantial, but a more testing comparison would be</p><p>with the mean one-step-ahead prediction error if we forecast the next month’s</p><p>sales as equal to this month’s sales (Exercise 6). 
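For reference, the fitting steps for the sweet white wine series can be outlined as follows; the file path, column name, and start date are assumptions based on the description above.

> wine.dat <- read.table("wine.dat", header = TRUE)          # file from the book's website
> sweetw.ts <- ts(wine.dat$sweetw, start = c(1980, 1), freq = 12)   # column name assumed
> plot(sweetw.ts, xlab = "Time (months)", ylab = "sales (1000 litres)")   # Figure 3.9
> sweetw.hw <- HoltWinters(sweetw.ts, seasonal = "mult")     # multiplicative seasonal component
> sweetw.hw ; sweetw.hw$coef ; sweetw.hw$SSE
> sqrt(sweetw.hw$SSE / length(sweetw.ts))                    # mean square one-step-ahead prediction error
> sd(wine.dat$sweetw)                                        # compare with the sd of the original series
> plot(sweetw.hw$fitted)                                     # Figure 3.10
> plot(sweetw.hw)                                            # Figure 3.11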
Also, in Exercise 6 you are</p><p>asked to investigate the performance of the Holt-Winters algorithm if the</p><p>three smoothing parameters are all set equal to 0.2 and if the values for the</p><p>parameters are optimised at each time step.</p><p>62 3 Forecasting Strategies</p><p>10</p><p>0</p><p>40</p><p>0</p><p>xh</p><p>at</p><p>10</p><p>0</p><p>30</p><p>0</p><p>Le</p><p>ve</p><p>l</p><p>0.</p><p>40</p><p>0.</p><p>43</p><p>T</p><p>re</p><p>nd</p><p>0.</p><p>8</p><p>1.</p><p>2</p><p>1985 1990 1995</p><p>S</p><p>ea</p><p>so</p><p>n</p><p>Time</p><p>Fig. 3.10. Sales of Australian white wine: fitted values; level; slope (labelled trend);</p><p>seasonal variation.</p><p>Holt−Winters filtering</p><p>Time</p><p>O</p><p>bs</p><p>er</p><p>ve</p><p>d</p><p>/ F</p><p>itt</p><p>ed</p><p>1985 1990 1995</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.11. Sales of Australian white wine and Holt-Winters fitted values.</p><p>3.4.3 Four-year-ahead forecasts for the air passenger data</p><p>The seasonal effect for the air passenger data of §1.4.1 appeared to increase</p><p>with the trend, which suggests that a ‘multiplicative’ seasonal component be</p><p>used in the Holt-Winters procedure. The Holt-Winters fit is impressive – see</p><p>Figure 3.12. The predict function in R can be used with the fitted model to</p><p>make forecasts into the future (Fig. 3.13).</p><p>> AP.hw plot(AP.hw)</p><p>3.4 Exponential smoothing and the Holt-Winters method 63</p><p>> AP.predict ts.plot(AP, AP.predict, lty = 1:2)</p><p>Holt−Winters filtering</p><p>Time</p><p>O</p><p>bs</p><p>er</p><p>ve</p><p>d</p><p>/ F</p><p>itt</p><p>ed</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 3.12. Holt-Winters fit for air passenger data.</p><p>Time</p><p>1950 1955 1960 1965</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>70</p><p>0</p><p>Fig. 3.13. Holt-Winters forecasts for air passenger data for 1961–1964 shown as</p><p>dotted lines.</p><p>The estimates of the model parameters, which can be obtained from</p><p>AP.hw$alpha, AP.hw$beta, and AP.hw$gamma, are α̂ = 0.274, β̂ = 0.0175,</p><p>and γ̂ = 0.877. It should be noted that the extrapolated forecasts are based</p><p>entirely on the trends in the period during which the model was fitted and</p><p>would be a sensible prediction assuming these trends continue. Whilst the ex-</p><p>64 3 Forecasting Strategies</p><p>trapolation in Figure 3.12 looks visually appropriate, unforeseen events could</p><p>lead to completely different future values than those shown here.</p><p>3.5 Summary of commands used in examples</p><p>nls non-linear least squares fit</p><p>HoltWinters estimates the parameters of the Holt-Winters</p><p>or exponential smoothing model</p><p>predict forecasts future values</p><p>ts.union create the union of two series</p><p>coef extracts the coefficients of a fitted model</p><p>3.6 Exercises</p><p>1. a) Describe the association and calculate the ccf between x and y for k</p><p>equal to 1, 10, and 100.</p><p>> w x y ccf(x, y)</p><p>b) Describe the association between x and y, and calculate the ccf.</p><p>> Time x y</p><p>the sum of n terms of a geometric pro-</p><p>gression tend to a finite sum as n tends to infinity? What is this sum?</p><p>c) Obtain an expression for the sum of the weights in an EWMA if we</p><p>specify a1 = x1 in Equation (3.15).</p><p>d) Suppose xt happens to be a sequence of independent variables with a</p><p>constant mean and a constant variance σ2. 
What is the variance of at</p><p>if we specify a1 = x1 in Equation (3.15)?</p><p>6. Refer to the sweet white wine sales (§3.4.2).</p><p>a) Use the HoltWinters procedure with α, β and γ set to 0.2 and com-</p><p>pare the SS1PE with the minimum obtained with R.</p><p>b) Use the HoltWinters procedure on the logarithms of sales and com-</p><p>pare SS1PE with that obtained using sales.</p><p>66 3 Forecasting Strategies</p><p>c) What is the SS1PE if you predict next month’s sales will equal this</p><p>month’s sales?</p><p>d) This is rather harder: What is the SS1PE if you find the optimum α,</p><p>β and γ from the data available at each time step before making the</p><p>one-step-ahead prediction?</p><p>7. Continue the following exploratory time series analysis using the global</p><p>temperature series from §1.4.5.</p><p>a) Produce a time plot of the data. Plot the aggregated annual mean</p><p>series and a boxplot that summarises the observed values for each</p><p>season, and comment on the plots.</p><p>b) Decompose the series into the components trend, seasonal effect, and</p><p>residuals, and plot the decomposed series. Produce a plot of the trend</p><p>with a superimposed seasonal effect.</p><p>c) Plot the correlogram of the residuals from question 7b. Comment on</p><p>the plot, explaining any ‘significant’ correlations at significant lags.</p><p>d) Fit an appropriate Holt-Winters model to the monthly data. Explain</p><p>why you chose that particular Holt-Winters model, and give the pa-</p><p>rameter estimates.</p><p>e) Using the fitted model, forecast values for the years 2005–2010. Add</p><p>these forecasts to a time plot of the original series. Under what cir-</p><p>cumstances would these forecasts be valid? What comments of cau-</p><p>tion would you make to an economist or politician who wanted to</p><p>use these forecasts to make statements about the potential impact of</p><p>global warming on the world economy?</p><p>8. A cumulative sum plot is useful for monitoring changes in the mean of a</p><p>process. If we have a time series composed of observations xt at times t</p><p>with a target value of τ , the CUSUM chart is a plot of the cumulative</p><p>sums of the deviations from target, cst, against t. The formula for cst at</p><p>time t is</p><p>cst =</p><p>t∑</p><p>i=1</p><p>(xi − τ)</p><p>The R function cumsum calculates a cumulative sum. Plot the CUSUM for</p><p>the motoring organisation complaints with a target of 18.</p><p>9. Using the motor organisation complaints series, refit the exponential</p><p>smoothing model with weights α = 0.01 and α = 0.99. In each case,</p><p>extract the last residual from the fitted model and verify that the last</p><p>residual satisfies Equation (3.19). Redraw Figure 3.8 using the new values</p><p>of α, and comment on the plots, explaining the main differences.</p><p>4</p><p>Basic Stochastic Models</p><p>4.1 Purpose</p><p>So far, we have considered two approaches for modelling time series. The</p><p>first is based on an assumption that there is a fixed seasonal pattern about a</p><p>trend. We can estimate the trend by local averaging of the deseasonalised data,</p><p>and this is implemented by the R function decompose. The second approach</p><p>allows the seasonal variation and trend, described in terms of a level and slope,</p><p>to change over time and estimates these features by exponentially weighted</p><p>averages. 
We used the HoltWinters function to demonstrate this method.</p><p>When we fit mathematical models to time series data, we refer to the dis-</p><p>crepancies between the fitted values, calculated from the model, and the data</p><p>as a residual error series. If our model encapsulates most of the deterministic</p><p>features of the time series, our residual error series should appear to be a re-</p><p>alisation of independent random variables from some probability distribution.</p><p>However, we often find that there is some structure in the residual error series,</p><p>such as consecutive errors being positively correlated, which we can use to im-</p><p>prove our forecasts and make our simulations more realistic. We assume that</p><p>our residual error series is stationary, and in Chapter 6 we introduce models</p><p>for stationary time series.</p><p>Since we judge a model to be a good fit if its residual error series appears</p><p>to be a realisation of independent random variables, it seems natural to build</p><p>models up from a model of independent random variation, known as discrete</p><p>white noise. The name ‘white noise’ was coined in an article on heat radiation</p><p>published in Nature in April 1922, where it was used to refer to series that</p><p>contained all frequencies in equal proportions, analogous to white light. The</p><p>term purely random is sometimes used for white noise series. In §4.3 we define a</p><p>fundamental non-stationary model based on discrete white noise that is called</p><p>the random walk. It is sometimes an adequate model for financial series and is</p><p>often used as a standard against which the performance of more complicated</p><p>models can be assessed.</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 67</p><p>Use R, DOI 10.1007/978-0-387-88698-5 4,</p><p>© Springer Science+Business Media, LLC 2009</p><p>68 4 Basic Stochastic Models</p><p>4.2 White noise</p><p>4.2.1 Introduction</p><p>A residual error is the difference between the observed value and the model</p><p>predicted value at time t. If we suppose the model is defined for the variable</p><p>yt and ŷt is the value predicted by the model, the residual error xt is</p><p>xt = yt − ŷt (4.1)</p><p>As the residual errors occur in time, they form a time series: x1, x2, . . . , xn.</p><p>In Chapter 2, we found that features of the historical series, such as the</p><p>trend or seasonal variation, are reflected in the correlogram. Thus, if a model</p><p>has accounted for all the serial correlation in the data, the residual series would</p><p>be serially uncorrelated, so that a correlogram of the residual series would</p><p>exhibit no obvious patterns. This ideal motivates the following definition.</p><p>4.2.2 Definition</p><p>A time series {wt : t = 1, 2, . . . , n} is discrete white noise (DWN) if the</p><p>variables w1, w2, . . . , wn are independent and identically distributed with a</p><p>mean of zero. This implies that the variables all have the same variance σ2</p><p>and Cor(wi, wj) = 0 for all i 6= j. If, in addition, the variables also follow a</p><p>normal distribution (i.e., wt ∼ N(0, σ2)) the series is called Gaussian white</p><p>noise.</p><p>4.2.3 Simulation in R</p><p>A fitted time series model can be used to simulate data. Time series simulated</p><p>using a model are sometimes called synthetic series to distinguish them from</p><p>an observed historical series.</p><p>Simulation is useful for many reasons. 
For example, simulation can be used</p><p>to generate plausible future scenarios and to construct confidence intervals for</p><p>model parameters (sometimes called bootstrapping). In R, simulation is usu-</p><p>ally straightforward, and most standard statistical distributions are simulated</p><p>using a function that has an abbreviated name for the distribution prefixed</p><p>with an ‘r’ (for ‘random’).1 For example, rnorm(100) is used to simulate 100</p><p>independent standard normal variables, which is equivalent to simulating a</p><p>Gaussian white noise series of length 100 (Fig. 4.1).</p><p>> set.seed(1)</p><p>> w plot(w, type = "l")</p><p>1 Other prefixes are also available to calculate properties for standard distributions;</p><p>e.g., the prefix ‘d’ is used to calculate the probability (density) function. See the</p><p>R help (e.g., ?dnorm) for more details.</p><p>4.2 White noise 69</p><p>0 20 40 60 80 100</p><p>−</p><p>2</p><p>−</p><p>1</p><p>0</p><p>1</p><p>2</p><p>time</p><p>w</p><p>Fig. 4.1. Time plot of simulated Gaussian white noise series.</p><p>Simulation experiments in R can easily be repeated using the ‘up’ arrow</p><p>on the keyboard. For this reason, it is sometimes preferable to put all the</p><p>commands on one line, separated by ‘;’, or to nest the functions; for example,</p><p>a plot of a white noise series is given by plot(rnorm(100), type="l").</p><p>The function set.seed is used to provide a starting point (or seed) in</p><p>the simulations, thus ensuring that the simulations can be</p><p>. . . 32</p><p>2.2.5 Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33</p><p>ix</p><p>x Contents</p><p>2.3 The correlogram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35</p><p>2.3.1 General discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35</p><p>2.3.2 Example based on air passenger series . . . . . . . . . . . . . . . 37</p><p>2.3.3 Example based on the Font Reservoir series . . . . . . . . . . . 40</p><p>2.4 Covariance of sums of random variables . . . . . . . . . . . . . . . . . . . . 41</p><p>2.5 Summary of commands used in examples . . . . . . . . . . . . . . . . . . . 42</p><p>2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42</p><p>3 Forecasting Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45</p><p>3.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45</p><p>3.2 Leading variables and associated variables . . . . . . . . . . . . . . . . . . 45</p><p>3.2.1 Marine coatings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45</p><p>3.2.2 Building approvals publication . . . . . . . . . . . . . . . . . . . . . . 46</p><p>3.2.3 Gas supply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49</p><p>3.3 Bass model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51</p><p>3.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51</p><p>3.3.2 Model definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51</p><p>3.3.3 Interpretation of the Bass model* . . . . . . . . . . . . . . . . . . . 51</p><p>3.3.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52</p><p>3.4 Exponential smoothing and the Holt-Winters method . . . . . . . . 
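Written out in full, the commands for Figure 4.1 are along these lines:

> set.seed(1)
> w <- rnorm(100)          # 100 independent N(0,1) values: a Gaussian white noise series
> plot(w, type = "l")      # Figure 4.1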
To see this, rerun the</p><p>plot above a few times with and without set.seed(1).</p><p>To illustrate by simulation how samples may differ from their underlying</p><p>populations, consider the following histogram of a Gaussian white noise series.</p><p>Type the following to view the plot (which is not shown in the text):</p><p>> x hist(rnorm(100), prob = T); points(x, dnorm(x), type = "l")</p><p>Repetitions of the last command, which can be obtained using the ‘up’ arrow</p><p>on your keyboard, will show a range of different sample distributions that</p><p>arise when the underlying distribution is normal. Distributions that depart</p><p>from the plotted curve have arisen due to sampling variation.</p><p>4.2.4 Second-order properties and the correlogram</p><p>The second-order properties of a white noise series {wt} are an immediate</p><p>consequence of the definition in §4.2.2. However, as they are needed so often</p><p>in the derivation of the second-order properties for more complex models, we</p><p>explicitly state them here:</p><p>70 4 Basic Stochastic Models</p><p>µw = 0</p><p>γk = Cov(wt, wt+k) =</p><p>{</p><p>σ2 if k = 0</p><p>0 if k 6= 0</p><p> (4.2)</p><p>The autocorrelation function follows as</p><p>ρk =</p><p>{</p><p>1 if k = 0</p><p>0 if k 6= 0</p><p>(4.3)</p><p>Simulated white noise data will not have autocorrelations that are exactly</p><p>zero (when k 6= 0) because of sampling variation. In particular, for a simu-</p><p>lated white noise series, it is expected that 5% of the autocorrelations will</p><p>be significantly different from zero at the 5% significance level, shown as dot-</p><p>ted lines on the correlogram. Try repeating the following command to view a</p><p>range of correlograms that could arise from an underlying white noise series.</p><p>A typical plot, with one statistically significant autocorrelation, occurring at</p><p>lag 7, is shown in Figure 4.2.</p><p>> set.seed(2)</p><p>> acf(rnorm(100))</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.2. Correlogram of a simulated white noise series. The underlying autocorre-</p><p>lations are all zero (except at lag 0); the statistically significant value at lag 7 is due</p><p>to sampling variation.</p><p>4.2.5 Fitting a white noise model</p><p>A white noise series usually arises as a residual series after fitting an appropri-</p><p>ate time series model. The correlogram generally provides sufficient evidence,</p><p>4.3 Random walks 71</p><p>provided the series is of a reasonable length, to support the conjecture that</p><p>the residuals are well approximated by white noise.</p><p>The only parameter for a white noise series is the variance σ2, which is</p><p>estimated by the residual variance, adjusted by degrees of freedom, given in</p><p>the computer output of the fitted model. If your analysis begins on data that</p><p>are already approximately white noise, then only σ2 needs to be estimated,</p><p>which is readily achieved using the var function.</p><p>4.3 Random walks</p><p>4.3.1 Introduction</p><p>In Chapter 1, the exchange rate data were examined and found to exhibit</p><p>stochastic trends. A random walk often provides a good fit to data with</p><p>stochastic trends, although even better fits are usually obtained from more</p><p>general model formulations, such as the ARIMA models of Chapter 7.</p><p>4.3.2 Definition</p><p>Let {xt} be a time series. 
Then {xt} is a random walk if</p><p>xt = xt−1 + wt (4.4)</p><p>where {wt} is a white noise series. Substituting xt−1 = xt−2+wt−1 in Equation</p><p>(4.4) and then substituting for xt−2, followed by xt−3 and so on (a process</p><p>known as ‘back substitution’) gives:</p><p>xt = wt + wt−1 + wt−2 + . . . (4.5)</p><p>In practice, the series above will not be infinite but will start at some time</p><p>t = 1. Hence,</p><p>xt = w1 + w2 + . . .+ wt (4.6)</p><p>Back substitution is used to define more complex time series models and</p><p>also to derive second-order properties. The procedure occurs so frequently in</p><p>the study of time series models that the following definition is needed.</p><p>4.3.3 The backward shift operator</p><p>The backward shift operator B is defined by</p><p>Bxt = xt−1 (4.7)</p><p>The backward shift operator is sometimes called the ‘lag operator’. By repeat-</p><p>edly applying B, it follows that</p><p>Bnxt = xt−n (4.8)</p><p>72 4 Basic Stochastic Models</p><p>Using B, Equation (4.4) can be rewritten as</p><p>xt = Bxt + wt ⇒ (1−B)xt = wt ⇒ xt = (1−B)−1wt</p><p>⇒ xt = (1 + B + B2 + . . .)wt ⇒ xt = wt + wt−1 + wt−2 + . . .</p><p>and Equation (4.5) is recovered.</p><p>4.3.4 Random walk: Second-order properties</p><p>The second-order properties of a random walk follow as</p><p>µx = 0</p><p>γk(t) = Cov(xt, xt+k) = tσ2</p><p>}</p><p>(4.9)</p><p>The covariance is a function of time, so the process is non-stationary. In par-</p><p>ticular, the variance is tσ2 and so it increases without limit as t increases. It</p><p>follows that a random walk is only suitable for short term predictions.</p><p>The time-varying autocorrelation function for k > 0 follows from Equation</p><p>(4.9) as</p><p>ρk(t) =</p><p>Cov(xt, xt+k)√</p><p>Var(xt)Var(xt+k)</p><p>=</p><p>tσ2√</p><p>tσ2(t+ k)σ2</p><p>=</p><p>1√</p><p>1 + k/t</p><p>(4.10)</p><p>so that, for large t with k considerably less than t, ρk is nearly 1. Hence, the</p><p>correlogram for a random walk is characterised by positive autocorrelations</p><p>that decay very slowly down from unity. This is demonstrated by simulation</p><p>in §4.3.7.</p><p>4.3.5 Derivation of second-order properties*</p><p>Equation (4.6) is a finite sum of white noise terms, each with zero mean and</p><p>variance σ2. Hence, the mean of xt is zero (Equation (4.9)). The autocovari-</p><p>ance in Equation (4.9) can be derived using Equation (2.15) as follows:</p><p>γk(t) = Cov(xt, xt+k) = Cov</p><p> t∑</p><p>i=1</p><p>wi,</p><p>t+k∑</p><p>j=1</p><p>wj</p><p> =</p><p>∑</p><p>i=j</p><p>Cov(wi, wj) = tσ2</p><p>4.3.6 The difference operator</p><p>Differencing adjacent terms of a series can transform a non-stationary series</p><p>to a stationary series. For example, if the series {xt} is a random walk, it</p><p>is non-stationary. However, from Equation (4.4), the first-order differences of</p><p>{xt} produce the stationary white noise series {wt} given by xt − xt−1 = wt.</p><p>4.3 Random walks 73</p><p>Hence, differencing turns out to be a useful ‘filtering’ procedure in the study</p><p>of non-stationary time series. The difference operator ∇ is defined by</p><p>∇xt = xt − xt−1 (4.11)</p><p>Note that ∇xt = (1−B)xt, so that ∇ can be expressed in terms of the back-</p><p>ward shift operator B. In general, higher-order differencing can be expressed</p><p>as</p><p>∇n = (1−B)n (4.12)</p><p>The proof of the last result is left to Exercise 7.</p><p>4.3.7 Simulation</p><p>It is often helpful to study a time series model by simulation. 
This enables the</p><p>main features of the model to be observed in plots, so that when historical data</p><p>exhibit similar features, the model may be selected as a potential candidate.</p><p>The following commands can be used to simulate random walk data for x:</p><p>> x for (t in 2:1000) x[t] plot(x, type = "l")</p><p>The first command above places a white noise series into w and uses this</p><p>series to initialise x. The ‘for’ loop then generates the random walk using</p><p>Equation (4.4) – the correspondence between the R code above and Equation</p><p>(4.4) should be noted. The series is plotted and shown in Figure 4.3.2</p><p>A correlogram of the series is obtained from acf(x) and is shown in Fig-</p><p>ure 4.4 – a gradual decay in the correlations is evident in the figure, thus</p><p>supporting the theoretical results in §4.3.4.</p><p>Throughout this book, we will often fit models to data that we have simu-</p><p>lated and attempt to recover the underlying model parameters. At first sight,</p><p>this might seem odd, given that the parameters are used to simulate the data</p><p>so that we already know at the outset the values the parameters should take.</p><p>However, the procedure is useful for a number of reasons. In particular, to</p><p>be able to simulate data using a model requires that the model formulation</p><p>be correctly understood. If the model is understood but incorrectly imple-</p><p>mented, then the parameter estimates from the fitted model may deviate</p><p>significantly from the underlying model values used in the simulation. Simu-</p><p>lation can therefore help ensure that the model is both correctly understood</p><p>and correctly implemented.</p><p>2 To obtain the same simulation and plot, it is necessary to</p><p>have run the previous</p><p>code in §4.2.4 first, which sets the random number seed.</p><p>74 4 Basic Stochastic Models</p><p>0 200 400 600 800 1000</p><p>0</p><p>20</p><p>40</p><p>60</p><p>80</p><p>Index</p><p>x</p><p>Fig. 4.3. Time plot of a simulated random walk. The series exhibits an increasing</p><p>trend. However, this is purely stochastic and due to the high serial correlation.</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.4. The correlogram for the simulated random walk. A gradual decay from a</p><p>high serial correlation is a notable feature of a random walk series.</p><p>4.4 Fitted models and diagnostic plots</p><p>4.4.1 Simulated random walk series</p><p>The first-order differences of a random walk are a white noise series, so the</p><p>correlogram of the series of differences can be used to assess whether a given</p><p>series is reasonably modelled as a random walk.</p><p>> acf(diff(x))</p><p>4.4 Fitted models and diagnostic plots 75</p><p>As can be seen in Figure 4.5, there are no obvious patterns in the correlogram,</p><p>with only a couple of marginally statistically significant values. These signif-</p><p>icant values can be ignored because they are small in magnitude and about</p><p>5% of the values are expected to be statistically significant even when the</p><p>underlying values are zero (§2.3). Thus, as expected, there is good evidence</p><p>that the simulated series in x follows a random walk.</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.5. Correlogram of differenced series. 
If a series follows a random walk, the</p><p>differenced series will be white noise.</p><p>4.4.2 Exchange rate series</p><p>The correlogram of the first-order differences of the exchange rate data from</p><p>§1.4.4 can be obtained from acf(diff(Z.ts)) and is shown in Figure 4.6.</p><p>A significant value occurs at lag 1, suggesting that a more complex model</p><p>may be needed, although the lack of any other significant values in the cor-</p><p>relogram does suggest that the random walk provides a good approximation</p><p>for the series (Fig. 4.6). An additional term can be added to the random</p><p>walk model using the Holt-Winters procedure, allowing the parameter β to</p><p>be non-zero but still forcing the seasonal term γ to be zero:</p><p>> Z.hw acf(resid(Z.hw))</p><p>Figure 4.7 shows the correlogram of the residuals from the fitted Holt-</p><p>Winters model. This correlogram is more consistent with a hypothesis that</p><p>the residual series is white noise (Fig. 4.7). Using Equation (3.21), with the</p><p>parameter estimates obtained from Z.hw$alpha and Z.hw$beta, the fitted</p><p>model can be expressed as</p><p>76 4 Basic Stochastic Models</p><p>0 1 2 3</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.6. Correlogram of first-order differences of the exchange rate series (UK</p><p>pounds to NZ dollars, 1991–2000). The significant value at lag 1 indicates that an</p><p>extension of the random walk model is needed for this series.</p><p>0 1 2 3</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.7. The correlogram of the residuals from the fitted Holt-Winters model for the</p><p>exchange rate series (UK pounds to NZ dollars, 1991–2000). There are no significant</p><p>correlations in the residual series, so the model provides a reasonable approximation</p><p>to the exchange rate data.</p><p>xt = xt−1 + bt−1 + wt</p><p>bt−1 = 0.167(xt−1 − xt−2) + 0.833bt−2</p><p>}</p><p>(4.13)</p><p>where {wt} is white noise with zero mean.</p><p>4.4 Fitted models and diagnostic plots 77</p><p>After some algebra, Equations (4.13) can be expressed as one equation</p><p>in terms of the backward shift operator:</p><p>(1− 0.167B + 0.167B2)(1−B)xt = wt (4.14)</p><p>Equation (4.14) is a special case – the integrated autoregressive model –</p><p>within the important class of models known as ARIMA models (Chap-</p><p>ter 7). The proof of Equation (4.14) is left to Exercise 8.</p><p>4.4.3 Random walk with drift</p><p>Company stockholders generally expect their investment to increase in value</p><p>despite the volatility of financial markets. The random walk model can be</p><p>adapted to allow for this by including a drift parameter δ.</p><p>xt = xt−1 + δ + wt</p><p>Closing prices (US dollars) for Hewlett-Packard Company stock for 672</p><p>trading days up to June 7, 2007 are read into R and plotted (see the code</p><p>below and Fig. 4.8). The lag 1 differences are calculated using diff() and</p><p>plotted in Figure 4.9. The correlogram of the differences is in Figure 4.10, and</p><p>they appear to be well modelled as white noise. The mean of the differences is</p><p>0.0399, and this is our estimate of the drift parameter. The standard deviation</p><p>of the 671 differences is 0.460, and an approximate 95% confidence interval</p><p>for the drift parameter is [0.004, 0.075]. 
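A minimal sketch of this calculation, assuming the closing prices are available in a vector Price as in the commands given later in this section:

> DP <- diff(Price)                                   # lag 1 differences of the closing prices
> mean(DP)                                            # estimate of the drift parameter
> sd(DP)                                              # standard deviation of the differences
> mean(DP) + c(-2, 2) * sd(DP) / sqrt(length(DP))     # approximate 95% confidence interval for the drift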
Since this interval does not include 0,</p><p>we have evidence of a positive drift over this period.</p><p>Day</p><p>C</p><p>lo</p><p>si</p><p>ng</p><p>P</p><p>ric</p><p>e</p><p>0 100 200 300 400 500 600</p><p>20</p><p>25</p><p>30</p><p>35</p><p>40</p><p>45</p><p>Fig. 4.8. Daily closing prices of Hewlett-Packard stock.</p><p>78 4 Basic Stochastic Models</p><p>Day</p><p>D</p><p>iff</p><p>er</p><p>en</p><p>ce</p><p>o</p><p>f c</p><p>lo</p><p>si</p><p>ng</p><p>p</p><p>ric</p><p>e</p><p>0 100 200 300 400 500 600</p><p>−</p><p>2</p><p>−</p><p>1</p><p>0</p><p>1</p><p>2</p><p>3</p><p>Fig. 4.9. Lag 1 differences of daily closing prices of Hewlett-Packard stock.</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Series DP</p><p>Fig. 4.10. Acf of lag 1 differences of daily closing prices of Hewlett-Packard stock.</p><p>> www HP.dat plot (as.ts(Price))</p><p>> DP mean(DP) + c(-2, 2) * sd(DP)/sqrt(length(DP))</p><p>[1] 0.004378 0.075353</p><p>4.5 Autoregressive models 79</p><p>4.5 Autoregressive models</p><p>4.5.1 Definition</p><p>The series {xt} is an autoregressive process of order p, abbreviated to AR(p),</p><p>if</p><p>xt = α1xt−1 + α2xt−2 + . . .+ αpxt−p + wt (4.15)</p><p>where {wt} is white noise and the αi are the model parameters with αp 6= 0</p><p>for an order p process. Equation (4.15) can be expressed as a polynomial of</p><p>order p in terms of the backward shift operator:</p><p>θp(B)xt = (1− α1B− α2B2 − . . .− αpBp)xt = wt (4.16)</p><p>The following points should be noted:</p><p>(a) The random walk is the special case AR(1) with α1 = 1 (see Equation</p><p>(4.4)).</p><p>(b) The exponential smoothing model is the special case αi = α(1 − α)i for</p><p>i = 1, 2, . . . and p→∞.</p><p>(c) The model is a regression of xt on past terms from the same series; hence</p><p>the use of the term ‘autoregressive’.</p><p>(d) A prediction at time t is given by</p><p>x̂t = α1xt−1 + α2xt−2 + . . .+ αpxt−p (4.17)</p><p>(e) The model parameters can be estimated by minimising the sum of squared</p><p>errors.</p><p>4.5.2 Stationary and non-stationary AR processes</p><p>The equation θp(B) = 0, where B is formally treated as a number (real or</p><p>complex), is called the characteristic equation. The roots of the characteristic</p><p>equation (i.e., the polynomial θp(B) from Equation (4.16)) must all exceed</p><p>unity in absolute value for the process to be stationary. Notice that the random</p><p>walk has θ = 1−B with root B = 1 and is non-stationary. The following four</p><p>examples illustrate the procedure for determining whether an AR process is</p><p>stationary or non-stationary:</p><p>1. The AR(1) model xt = 1</p><p>2xt−1 + wt is stationary because the root of</p><p>1− 1</p><p>2B = 0 is B = 2, which is greater than 1.</p><p>2. The AR(2) model xt = xt−1 − 1</p><p>4xt−2 +wt is stationary. The proof of this</p><p>result is obtained by first expressing the model in terms of the backward</p><p>shift operator 1</p><p>4 (B2 − 4B + 4)xt = wt; i.e., 1</p><p>4 (B− 2)2xt = wt. The roots</p><p>of the polynomial are given by solving θ(B) = 1</p><p>4 (B − 2)2 = 0 and are</p><p>therefore obtained as B = 2. As the roots are greater than unity this</p><p>AR(2) model is stationary.</p><p>80 4 Basic Stochastic Models</p><p>3. The model xt = 1</p><p>2xt−1 + 1</p><p>2xt−2 + wt is non-stationary because one of</p><p>the roots is unity. 
To prove this, first express the model in terms of the</p><p>backward shift operator − 1</p><p>2 (B2+B−2)xt = wt; i.e., − 1</p><p>2 (B−1)(B+2)xt =</p><p>wt. The polynomial θ(B) = − 1</p><p>2 (B − 1)(B + 2) has roots B = 1,−2. As</p><p>there is a unit root (B = 1), the model is non-stationary. Note that the</p><p>other root (B = −2) exceeds unity in absolute value, so only the presence</p><p>of the unit root makes this process non-stationary.</p><p>4. The AR(2) model xt = − 1</p><p>4xt−2 + wt is stationary because the roots of</p><p>1 + 1</p><p>4B</p><p>2 =</p><p>0 are B = ±2i, which are complex numbers with i =</p><p>√</p><p>−1,</p><p>each having an absolute value of 2 exceeding unity.</p><p>The R function polyroot finds zeros of polynomials and can be used to find</p><p>the roots of the characteristic equation to check for stationarity.</p><p>4.5.3 Second-order properties of an AR(1) model</p><p>From Equation (4.15), the AR(1) process is given by</p><p>xt = αxt−1 + wt (4.18)</p><p>where {wt} is a white noise series with mean zero and variance σ2. It can be</p><p>shown (§4.5.4) that the second-order properties follow as</p><p>µx = 0</p><p>γk = αkσ2/(1− α2)</p><p>}</p><p>(4.19)</p><p>4.5.4 Derivation of second-order properties for an AR(1) process*</p><p>Using B, a stable AR(1) process (|α| rho layout(1:2)</p><p>> plot(0:10, rho(0:10, 0.7), type = "b")</p><p>> plot(0:10, rho(0:10, -0.7), type = "b")</p><p>Try experimenting using other values for α. For example, use a small value of</p><p>α to observe a more rapid decay to zero in the correlogram.</p><p>4.5.6 Partial autocorrelation</p><p>From Equation (4.21), the autocorrelations are non-zero for all lags even</p><p>though in the underlying model xt only depends on the previous value xt−1</p><p>(Equation (4.18)). The partial autocorrelation at lag k is the correlation that</p><p>results after removing the effect of any correlations due to the terms at shorter</p><p>lags. For example, the partial autocorrelation of an AR(1) process will be zero</p><p>for all lags greater than 1. In general, the partial autocorrelation at lag k is</p><p>the kth coefficient of a fitted AR(k) model; if the underlying process is AR(p),</p><p>then the coefficients αk will be zero for all k > p. Thus, an AR(p) process has</p><p>a correlogram of partial autocorrelations that is zero after lag p. Hence, a plot</p><p>of the estimated partial autocorrelations can be useful when determining the</p><p>order of a suitable AR process for a time series. In R, the function pacf can</p><p>be used to calculate the partial autocorrelations of a time series and produce</p><p>a plot of the partial autocorrelations against lag (the ‘partial correlogram’).</p><p>4.5.7 Simulation</p><p>An AR(1) process can be simulated in R as follows:</p><p>> set.seed(1)</p><p>> x for (t in 2:100) x[t] plot(x, type = "l")</p><p>> acf(x)</p><p>> pacf(x)</p><p>The resulting plots of the simulated data are shown in Figure 4.12 and give one</p><p>possible realisation of the model. The partial correlogram has no significant</p><p>correlations except the value at lag 1, as expected (Fig. 4.12c – note that the</p><p>82 4 Basic Stochastic Models</p><p>Fig. 4.11. Example correlograms for two autoregressive models: (a) xt = 0.7xt−1 +</p><p>wt; (b) xt = −0.7xt−1 + wt.</p><p>pacf starts at lag 1, whilst the acf starts at lag 0). The difference between the</p><p>correlogram of the underlying model (Fig. 4.11a) and the sample correlogram</p><p>of the simulated series (Fig. 
4.12b) shows discrepancies that have arisen due</p><p>to sampling variation. Try repeating the commands above several times to</p><p>obtain a range of possible sample correlograms for an AR(1) process with</p><p>underlying parameter α = 0.7. You are asked to investigate an AR(2) process</p><p>in Exercise 4.</p><p>4.6 Fitted models</p><p>4.6.1 Model fitted to simulated series</p><p>An AR(p) model can be fitted to data in R using the ar function. In the code</p><p>below, the autoregressive model x.ar is fitted to the simulated series of the</p><p>last section and an approximate 95% confidence interval for the underlying</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>● ● ● ●</p><p>0 2 4 6 8 10</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>lag k</p><p>ρρ k</p><p>(a) αα == 0.7</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>●</p><p>● ●</p><p>0 2 4 6 8 10</p><p>−</p><p>1</p><p>0</p><p>1</p><p>lag k</p><p>ρρ k</p><p>(b) αα == −− 0.7</p><p>4.6 Fitted models 83</p><p>0 20 40 60 80 100</p><p>0</p><p>2</p><p>(a) Time plot.</p><p>x</p><p>0 5 10 15 20</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>(b) Correlogram: Sample correlation against lag</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>(c) Partial correlogram: Sample partial correlation against lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 4.12. A simulated AR(1) process, xt = 0.7xt−1 + wt. Note that in the partial</p><p>correlogram (c) only the first lag is significant, which is usually the case when the</p><p>underlying process is AR(1).</p><p>parameter is given, where the (asymptotic) variance of the parameter estimate</p><p>is extracted using x.ar$asy.var:</p><p>> x.ar x.ar$order</p><p>[1] 1</p><p>> x.ar$ar</p><p>84 4 Basic Stochastic Models</p><p>[1] 0.601</p><p>> x.ar$ar + c(-2, 2) * sqrt(x.ar$asy.var)</p><p>[1] 0.4404 0.7615</p><p>The method “mle” used in the fitting procedure above is based on max-</p><p>imising the likelihood function (the probability of obtaining the data given the</p><p>model) with respect to the unknown parameters. The order p of the process</p><p>is chosen using the Akaike Information Criterion (AIC; Akaike, 1974), which</p><p>penalises models with too many parameters:</p><p>AIC = −2× log-likelihood + 2× number of parameters (4.22)</p><p>In the function ar, the model with the smallest AIC is selected as the best-</p><p>fitting AR model. Note that, in the code above, the correct order (p = 1)</p><p>of the underlying process is recovered. The parameter estimate for the fitted</p><p>AR(1) model is α̂ = 0.60. Whilst this is smaller than the underlying model</p><p>value of α = 0.7, the approximate 95% confidence interval does contain the</p><p>value of the model parameter as expected, giving us no reason to doubt the</p><p>implementation of the model.</p><p>4.6.2 Exchange rate series: Fitted AR model</p><p>An AR(1) model is fitted to the exchange rate series, and the upper bound</p><p>of the confidence interval for the parameter includes 1. This indicates that</p><p>there would not be sufficient evidence to reject the hypothesis α = 1, which is</p><p>consistent with the earlier conclusion that a random walk provides a good ap-</p><p>proximation for this series. 
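A sketch of such a fit, assuming the exchange rate series is held in the ts object Z.ts used earlier; the ar function is called here with its default settings, which may differ in detail from the call used to produce the output below:

> Z.ar <- ar(Z.ts)                              # AR model with the order chosen by AIC
> Z.ar$order                                    # selected order
> Z.ar$ar                                       # estimated AR coefficient(s)
> se <- sqrt(as.numeric(Z.ar$asy.var.coef))     # asymptotic standard error of the coefficient
> Z.ar$ar + c(-2, 2) * se                       # approximate 95% confidence interval
> acf(Z.ar$resid[-1])                           # correlogram of the residual series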
However, simulated data from models with values</p><p>of α > 1, formally included in the confidence interval below, exhibit exponen-</p><p>tially unstable behaviour and are not credible models for the New Zealand</p><p>exchange rate.</p><p>> Z.ar mean(Z.ts)</p><p>[1] 2.823</p><p>> Z.ar$order</p><p>[1] 1</p><p>> Z.ar$ar</p><p>[1] 0.8903</p><p>> Z.ar$ar + c(-2, 2) * sqrt(Z.ar$asy.var)</p><p>[1] 0.7405 1.0400</p><p>> acf(Z.ar$res[-1])</p><p>4.6 Fitted models 85</p><p>In the code above, a “−1” is used in the vector of residuals to remove the</p><p>first item from the residual series (Fig. 4.13). (For a fitted AR(1) model, the</p><p>first item has no predicted value because there is no observation at t = 0; in</p><p>general, the first p values will be ‘not available’ (NA) in the residual series of</p><p>a fitted AR(p) model.)</p><p>By default, the mean is subtracted before the parameters are estimated,</p><p>so a predicted value ẑt at time t based on the output above is given by</p><p>ẑt = 2.8 + 0.89(zt−1 − 2.8) (4.23)</p><p>0 5 10 15</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.13. The correlogram of residual series for the AR(1) model fitted to the</p><p>exchange rate data.</p><p>4.6.3 Global temperature series: Fitted AR model</p><p>The global temperature series was introduced in §1.4.5, where it was apparent</p><p>that the data exhibited an increasing trend after 1970, which may be due to</p><p>the ‘greenhouse effect’. Sceptics may claim that the apparent increasing trend</p><p>can be dismissed as a transient stochastic phenomenon. For their claim to be</p><p>consistent with the time series data, it should be possible to model the trend</p><p>without the use of deterministic functions.</p><p>Consider the following AR model fitted to the mean annual temperature</p><p>series:</p><p>> www = "http://www.massey.ac.nz/~pscowper/ts/global.dat"</p><p>> Global = scan(www)</p><p>> Global.ts</p><p>= ts(Global, st = c(1856, 1), end = c(2005, 12),</p><p>fr = 12)</p><p>86 4 Basic Stochastic Models</p><p>> Global.ar mean(aggregate(Global.ts, FUN = mean))</p><p>[1] -0.1383</p><p>> Global.ar$order</p><p>[1] 4</p><p>> Global.ar$ar</p><p>[1] 0.58762 0.01260 0.11117 0.26764</p><p>> acf(Global.ar$res[-(1:Global.ar$order)], lag = 50)</p><p>0 10 20 30 40 50</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 4.14. The correlogram of the residual series for the AR(4) model fitted to the</p><p>annual global temperature series. The correlogram is approximately white noise so</p><p>that, in the absence of further information, a simple stochastic model can ‘explain’</p><p>the correlation and trends in the series.</p><p>Based on the output above a predicted mean annual temperature x̂t at</p><p>time t is given by</p><p>x̂t = −0.14 + 0.59(xt−1 + 0.14) + 0.013(xt−2 + 0.14)</p><p>+0.11(xt−3 + 0.14) + 0.27(xt−4 + 0.14) (4.24)</p><p>The correlogram of the residuals has only one (marginally) significant value</p><p>at lag 27, so the underlying residual series could be white noise (Fig. 4.14).</p><p>Thus the fitted AR(4) model (Equation (4.24)) provides a good fit to the</p><p>data. As the AR model has no deterministic trend component, the trends in</p><p>the data can be explained by serial correlation and random variation, implying</p><p>that it is possible that these trends are stochastic (or could arise from a purely</p><p>4.8 Exercises 87</p><p>stochastic process). 
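To see how readily such stochastic trends arise, a series can be simulated from an AR(4) process with the (rounded) coefficients estimated above; this is a sketch only, and the innovation standard deviation and seed are arbitrary choices:

> set.seed(1)
> coeffs <- c(0.588, 0.013, 0.111, 0.268)              # rounded values of Global.ar$ar
> sim <- arima.sim(model = list(ar = coeffs), n = 150, sd = 0.1)
> plot(sim, ylab = "simulated series")                 # apparent 'trends' from a purely stochastic model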
Again we emphasise that this does not imply that there is</p><p>no underlying reason for the trends. If a valid scientific explanation is known,</p><p>such as a link with the increased use of fossil fuels, then this information would</p><p>clearly need to be included in any future forecasts of the series.</p><p>4.7 Summary of R commands</p><p>set.seed sets a seed for the random number generator</p><p>enabling a simulation to be reproduced</p><p>rnorm simulates Gaussian white noise series</p><p>diff creates a series of first-order differences</p><p>ar gets the best fitting AR(p) model</p><p>pacf extracts partial autocorrelations</p><p>and partial correlogram</p><p>polyroot extracts the roots of a polynomial</p><p>resid extracts the residuals from a fitted model</p><p>4.8 Exercises</p><p>1. Simulate discrete white noise from an exponential distribution and plot the</p><p>histogram and the correlogram. For example, you can use the R command</p><p>w</p><p>5,</p><p>© Springer Science+Business Media, LLC 2009</p><p>92 5 Regression</p><p>to erroneously high statistical significance being attributed to statistical tests</p><p>in standard computer output (the p values will be smaller than they should</p><p>be). Presenting correct statistical evidence is important. For example, an en-</p><p>vironmental protection group could be undermined by allegations that it is</p><p>falsely claiming statistically significant trends. In this chapter, generalised</p><p>least squares is used to obtain improved estimates of the standard error to</p><p>account for autocorrelation in the residual series.</p><p>5.2 Linear models</p><p>5.2.1 Definition</p><p>A model for a time series {xt : t = 1, . . . n} is linear if it can be expressed as</p><p>xt = α0 + α1u1,t + α2u2,t + . . .+ αmum,t + zt (5.1)</p><p>where ui,t is the value of the ith predictor (or explanatory) variable at time</p><p>t (i = 1, . . . ,m; t = 1, . . . , n), zt is the error at time t, and α0, α1, . . . , αm</p><p>are model parameters, which can be estimated by least squares. Note that the</p><p>errors form a time series {zt}, with mean 0, that does not have to be Gaussian</p><p>or white noise. An example of a linear model is the pth-order polynomial</p><p>function of t:</p><p>xt = α0 + α1t+ α2t</p><p>2 . . .+ αpt</p><p>p + zt (5.2)</p><p>The predictor variables can be written ui,t = ti (i = 1, . . . , p). The term</p><p>‘linear’ is a reference to the summation of model parameters, each multiplied</p><p>by a single predictor variable.</p><p>A simple special case of a linear model is the straight-line model obtained</p><p>by putting p = 1 in Equation (5.2): xt = α0 +α1t+ zt. In this case, the value</p><p>of the line at time t is the trend mt. For the more general polynomial, the</p><p>trend at time t is the value of the underlying polynomial evaluated at t, so in</p><p>Equation (5.2) the trend is mt = α0 + α1t+ α2t</p><p>2 . . .+ αpt</p><p>p.</p><p>Many non-linear models can be transformed to linear models. For example,</p><p>the model xt = eα0+α1t+zt for the series {xt} can be transformed by taking</p><p>natural logarithms to obtain a linear model for the series {yt}:</p><p>yt = log xt = α0 + α1t+ zt (5.3)</p><p>In Equation (5.3), standard least squares regression could then be used to fit</p><p>a linear model (i.e., estimate the parameters α0 and α1) and make predictions</p><p>for yt. To make predictions for xt, the inverse transform needs to be applied</p><p>to yt, which in this example is exp(yt). 
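As a small sketch of this approach, with made-up numbers purely for illustration, a straight line can be fitted to the logged series and the fitted values mapped back to the original scale with exp:

> set.seed(1)
> Time <- 1:50
> x <- exp(0.1 + 0.02 * Time + rnorm(50, sd = 0.1))    # a series with multiplicative errors
> y <- log(x)                                          # transform to the additive (linear) scale
> y.lm <- lm(y ~ Time)                                 # straight-line model for the logged series
> x.hat <- exp(fitted(y.lm))                           # inverse transform back to the original scale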
However, this usually has the effect</p><p>of biasing the forecasts of mean values, and we discuss correction factors in</p><p>§5.10.</p><p>Natural processes that generate time series are not expected to be precisely</p><p>linear, but linear approximations are often adequate. However, we are not</p><p>5.2 Linear models 93</p><p>restricted to linear models, and the Bass model (§3.3) is an example of a non-</p><p>linear model, which we fitted using the non-linear least squares function nls.</p><p>5.2.2 Stationarity</p><p>Linear models for time series are non-stationary when they include functions</p><p>of time. Differencing can often transform a non-stationary series with a de-</p><p>terministic trend to a stationary series. For example, if the time series {xt} is</p><p>given by the straight-line function plus white noise xt = α0 + α1t + zt, then</p><p>the first-order differences are given by</p><p>∇xt = xt − xt−1 = zt − zt−1 + α1 (5.4)</p><p>Assuming the error series {zt} is stationary, the series {∇xt} is stationary</p><p>as it is not a function of t. In §4.3.6 we found that first-order differencing</p><p>can transform a non-stationary series with a stochastic trend (the random</p><p>walk) to a stationary series. Thus, differencing can remove both stochastic and</p><p>deterministic trends from time series. If the underlying trend is a polynomial</p><p>of order m, then mth-order differencing is required to remove the trend.</p><p>Notice that differencing the straight-line function plus white noise leads to</p><p>a different stationary time series than subtracting the trend. The latter gives</p><p>white noise, whereas differencing gives a series of consecutive white noise terms</p><p>(which is an example of an MA process, described in Chapter 6).</p><p>5.2.3 Simulation</p><p>In time series regression, it is common for the error series {zt} in Equation</p><p>(5.1) to be autocorrelated. In the code below a time series with an increas-</p><p>ing straight-line trend (50 + 3t) with autocorrelated errors is simulated and</p><p>plotted:</p><p>> set.seed(1)</p><p>> z for (t in 2:100) z[t] Time x plot(x, xlab = "time", type = "l")</p><p>The model for the code above can be expressed as xt = 50 + 3t + zt, where</p><p>{zt} is the AR(1) process zt = 0.8zt−1 +wt and {wt} is Gaussian white noise</p><p>with σ = 20. A time plot of a realisation of {xt} is given in Figure 5.1.</p><p>94 5 Regression</p><p>0 20 40 60 80 100</p><p>10</p><p>0</p><p>20</p><p>0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>time</p><p>x</p><p>Fig. 5.1. Time plot of a simulated time series with a straight-line trend and AR(1)</p><p>residual errors.</p><p>5.3 Fitted models</p><p>5.3.1 Model fitted to simulated data</p><p>Linear models are usually fitted by minimising the sum of squared errors,∑</p><p>z2</p><p>t =</p><p>∑</p><p>(xt−α0−α1u1,t− . . .−αmum,t)2, which is achieved in R using the</p><p>function lm:</p><p>> x.lm coef(x.lm)</p><p>(Intercept) Time</p><p>58.55 3.06</p><p>> sqrt(diag(vcov(x.lm)))</p><p>(Intercept) Time</p><p>4.8801 0.0839</p><p>In the code above, the estimated parameters of the linear model are extracted</p><p>using coef. Note that, as expected, the estimates are close to the underlying</p><p>parameter values of 50 for the intercept and 3 for the slope. The standard</p><p>errors are extracted using the square root of the diagonal elements obtained</p><p>from vcov, although these standard errors are likely to be underestimated</p><p>because of autocorrelation in the residuals. 
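For readers following along, a runnable sketch of the simulation and fit described above; the values follow the text (intercept 50, slope 3, AR(1) parameter 0.8, σ = 20), although the exact commands may differ slightly from the authors':

> set.seed(1)
> z <- w <- rnorm(100, sd = 20)                        # Gaussian white noise with sd 20
> for (t in 2:100) z[t] <- 0.8 * z[t - 1] + w[t]       # AR(1) residual errors
> Time <- 1:100
> x <- 50 + 3 * Time + z                               # straight-line trend plus autocorrelated errors
> x.lm <- lm(x ~ Time)
> coef(x.lm)                                           # parameter estimates
> sqrt(diag(vcov(x.lm)))                               # standard errors (likely underestimated here)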
The function summary can also be</p><p>used to obtain this information but tends to give additional information, for</p><p>example t-tests, which may be incorrect for a time series regression analysis</p><p>due to autocorrelation in the residuals.</p><p>After fitting a regression model, we should consider various diagnostic</p><p>plots. In the case of time series regression, an important diagnostic plot is the</p><p>correlogram of the residuals:</p><p>5.3 Fitted models 95</p><p>> acf(resid(x.lm))</p><p>> pacf(resid(x.lm))</p><p>As expected, the residual time series is autocorrelated (Fig. 5.2). In Figure</p><p>5.3, only the lag 1 partial autocorrelation is significant, which suggests that</p><p>the residual series follows an AR(1) process. Again this should be as expected,</p><p>given that an AR(1) process was used to simulate these residuals.</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.2. Residual correlogram for the fitted straight-line model.</p><p>5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.3. Residual partial correlogram for the fitted straight-line model.</p><p>5.3.2 Model fitted to the temperature series (1970–2005)</p><p>In §1.4.5, we extracted temperatures for the period 1970–2005. The follow-</p><p>ing regression model is fitted to the global temperature over this period,</p><p>96 5 Regression</p><p>and approximate 95% confidence intervals are given for the parameters us-</p><p>ing confint. The explanatory variable is the time, so the function time is</p><p>used to extract the ‘times’ from the ts temperature object.</p><p>> www Global Global.ts temp temp.lm coef(temp.lm)</p><p>(Intercept) time(temp)</p><p>-34.9204 0.0177</p><p>> confint(temp.lm)</p><p>2.5 % 97.5 %</p><p>(Intercept) -37.2100 -32.6308</p><p>time(temp) 0.0165 0.0188</p><p>> acf(resid(lm(temp ~ time(temp))))</p><p>The confidence interval for the slope does not contain zero, which would pro-</p><p>vide statistical evidence of an increasing trend in global temperatures if the</p><p>autocorrelation in the residuals is negligible. However, the residual series is</p><p>positively autocorrelated at shorter lags (Fig. 5.4), leading to an underesti-</p><p>mate of the standard error and too narrow a confidence interval for the slope.</p><p>Intuitively, the positive correlation between consecutive values reduces the</p><p>effective record length because similar values will tend to occur together. The</p><p>following section illustrates</p><p>the reasoning behind this but may be omitted,</p><p>without loss of continuity, by readers who do not require the mathematical</p><p>details.</p><p>5.3.3 Autocorrelation and the estimation of sample statistics*</p><p>To illustrate the effect of autocorrelation in estimation, the sample mean will</p><p>be used, as it is straightforward to analyse and is used in the calculation of</p><p>other statistical properties.</p><p>Suppose {xt : t = 1, . . . , n} is a time series of independent random variables</p><p>with mean E(xt) = µ and variance Var(xt) = σ2. Then it is well known in</p><p>the study of random samples that the sample mean x̄ =</p><p>∑n</p><p>t=1 xt/n has mean</p><p>E(x̄) = µ and variance Var(x̄) = σ2/n (or standard error σ/</p><p>√</p><p>n). Now let</p><p>{xt : t = 1, . . . 
, n} be a stationary time series with E(xt) = µ, Var(xt) = σ2,</p><p>and autocorrelation function Cor(xt, xt+k) = ρk. Then the variance of the</p><p>sample mean is given by</p><p>Var (x̄) =</p><p>σ2</p><p>n</p><p>[</p><p>1 + 2</p><p>n−1∑</p><p>k=1</p><p>(1− k/n)ρk</p><p>]</p><p>(5.5)</p><p>5.3 Fitted models 97</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.4. Residual correlogram for the regression model fitted to the global temper-</p><p>ature series (1970–2005).</p><p>In Equation (5.5) the variance σ2/n for an independent random sam-</p><p>ple arises as the special case where ρk = 0 for all k > 0. If ρk > 0, then</p><p>Var(x̄) > σ2/n and the resulting estimate of µ is less accurate than that ob-</p><p>tained from a random (independent) sample of the same size. Conversely, if</p><p>ρk library(nlme)</p><p>> x.gls coef(x.gls)</p><p>(Intercept) Time</p><p>58.23 3.04</p><p>> sqrt(diag(vcov(x.gls)))</p><p>(Intercept) Time</p><p>11.925 0.202</p><p>A lag 1 autocorrelation of 0.8 is used above because this value was used to</p><p>simulate the data (§5.2.3). For historical series, the lag 1 autocorrelation would</p><p>need to be estimated from the correlogram of the residuals of a fitted linear</p><p>model; i.e., a linear model should first be fitted by ordinary least squares</p><p>(OLS) and the lag 1 autocorrelation read off from a correlogram plot of the</p><p>residuals of the fitted model.</p><p>In the example above, the standard errors of the parameters are consid-</p><p>erably greater than those obtained from OLS using lm (§5.3) and are more</p><p>accurate as they take the autocorrelation into account. The parameter esti-</p><p>mates from GLS will generally be slightly different from those obtained with</p><p>OLS, because of the weighting. For example, the slope is estimated as 3.06</p><p>using lm but 3.04 using gls. In principle, the GLS estimators are preferable</p><p>because they have smaller standard errors.</p><p>5.5 Linear models with seasonal variables 99</p><p>5.4.2 Confidence interval for the trend in the temperature series</p><p>To calculate an approximate 95% confidence interval for the trend in the global</p><p>temperature series (1970–2005), GLS is used to estimate the standard error</p><p>accounting for the autocorrelation in the residual series (Fig. 5.4). In the gls</p><p>function, the residual series is approximated as an AR(1) process with a lag</p><p>1 autocorrelation of 0.7 read from Figure 5.4, which is used as a parameter in</p><p>the gls function:</p><p>> temp.gls confint(temp.gls)</p><p>2.5 % 97.5 %</p><p>(Intercept) -39.8057 -28.4966</p><p>time(temp) 0.0144 0.0201</p><p>Although the confidence intervals above are now wider than they were in §5.3,</p><p>zero is not contained in the intervals, which implies that the estimates are</p><p>statistically significant, and, in particular, that the trend is significant. Thus,</p><p>there is statistical evidence of an increasing trend in global temperatures over</p><p>the period 1970–2005, so that, if current conditions persist, temperatures may</p><p>be expected to continue to rise in the future.</p><p>5.5 Linear models with seasonal variables</p><p>5.5.1 Introduction</p><p>As time series are observations measured sequentially in time, seasonal effects</p><p>are often present in the data, especially annual cycles caused directly or indi-</p><p>rectly by the Earth’s movement around the Sun. 
Seasonal effects have already</p><p>been observed in several of the series we have looked at, including the airline</p><p>series (§1.4.1), the temperature series (§1.4.5), and the electricity production</p><p>series (§1.4.3). In this section, linear regression models with predictor variables</p><p>for seasonal effects are considered.</p><p>5.5.2 Additive seasonal indicator variables</p><p>Suppose a time series contains s seasons. For example, with time series mea-</p><p>sured over each calendar month, s = 12, whereas for series measured over</p><p>six-month intervals, corresponding to summer and winter, s = 2. A seasonal</p><p>indicator model for a time series {xt : t = 1, . . . , n} containing s seasons and</p><p>a trend mt is given by</p><p>xt = mt + st + zt (5.6)</p><p>where st = βi when t falls in the ith season (t = 1, . . . , n; i = 1, . . . , s) and</p><p>{zt} is the residual error series, which may be autocorrelated. This model</p><p>100 5 Regression</p><p>takes the same form as the additive decomposition model (Equation (1.2))</p><p>but differs in that the trend is formulated with parameters. In Equation (5.6),</p><p>mt does not have a constant term (referred to as the intercept), i.e., mt could</p><p>be a polynomial of order p with parameters α1, . . . , αp. Equation (5.6) is then</p><p>equivalent to a polynomial trend in which the constant term depends on the</p><p>season, so that the s seasonal parameters (β1, . . . , βs) correspond to s possible</p><p>constant terms in Equation (5.2). Equation (5.6) can therefore be written as</p><p>xt = mt + β1+(t−1) mod s + zt (5.7)</p><p>For example, with a time series {xt} observed for each calendar month</p><p>beginning with t = 1 at January, a seasonal indicator model with a straight-</p><p>line trend is given by</p><p>xt = α1t+ st + zt =</p><p></p><p>α1t+ β1 + zt t = 1, 13, . . .</p><p>α1t+ β2 + zt t = 2, 14, . . .</p><p>...</p><p>α1t+ β12 + zt t = 12, 24, . . .</p><p>(5.8)</p><p>The parameters for the model in Equation (5.8) can be estimated by OLS</p><p>or GLS by treating the seasonal term st as a ‘factor’. In R, the factor function</p><p>can be applied to seasonal indices extracted using the function</p><p>cycle (§1.4.1).</p><p>5.5.3 Example: Seasonal model for the temperature series</p><p>The parameters of a straight-line trend with additive seasonal indices can be</p><p>estimated for the temperature series (1970–2005) as follows:</p><p>> Seas Time temp.lm coef(temp.lm)</p><p>Time factor(Seas)1 factor(Seas)2 factor(Seas)3</p><p>0.0177 -34.9973 -34.9880 -35.0100</p><p>factor(Seas)4 factor(Seas)5 factor(Seas)6 factor(Seas)7</p><p>-35.0123 -35.0337 -35.0251 -35.0269</p><p>factor(Seas)8 factor(Seas)9 factor(Seas)10 factor(Seas)11</p><p>-35.0248 -35.0383 -35.0525 -35.0656</p><p>factor(Seas)12</p><p>-35.0487</p><p>A zero is used within the formula to ensure that the model does not have an</p><p>intercept. If the intercept is included in the formula, one of the seasonal terms</p><p>will be dropped and an estimate for the intercept will appear in the output.</p><p>However, the fitted models, with or without an intercept, would be equivalent,</p><p>as can be easily verified by rerunning the algorithm above without the zero in</p><p>5.6 Harmonic seasonal models 101</p><p>the formula. 
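A sketch of the commands behind this fit, assuming temp is the monthly temperature series for 1970–2005 created in §5.3.2:

> Seas <- cycle(temp)                                  # month indices 1 to 12
> Time <- time(temp)
> temp.lm <- lm(temp ~ 0 + Time + factor(Seas))        # '0 +' removes the intercept
> coef(temp.lm)                                        # slope plus twelve seasonal constants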
The parameters can also be estimated by GLS by replacing lm</p><p>with gls in the code above.</p><p>Using the above fitted model, a two-year-ahead future prediction for the</p><p>temperature series is obtained as follows:</p><p>> new.t alpha beta (alpha * new.t + beta)[1:4]</p><p>factor(Seas)1 factor(Seas)2 factor(Seas)3 factor(Seas)4</p><p>0.524 0.535 0.514 0.514</p><p>Alternatively, the predict function can be used to make forecasts provided</p><p>the new data are correctly labelled within a data.frame:</p><p>> new.dat predict(temp.lm, new.dat)[1:24]</p><p>1 2 3 4 5 6 7 8 9 10 11 12</p><p>0.524 0.535 0.514 0.514 0.494 0.504 0.503 0.507 0.495 0.482 0.471 0.489</p><p>13 14 15 16 17 18 19 20 21 22 23 24</p><p>0.542 0.553 0.532 0.531 0.511 0.521 0.521 0.525 0.513 0.500 0.488 0.507</p><p>5.6 Harmonic seasonal models</p><p>In the previous section, one parameter estimate is used per season. However,</p><p>seasonal effects often vary smoothly over the seasons, so that it may be more</p><p>parameter-efficient to use a smooth function instead of separate indices.</p><p>Sine and cosine functions can be used to build smooth variation into a</p><p>seasonal model. A sine wave with frequency f (cycles per sampling interval),</p><p>amplitude A, and phase shift φ can be expressed as</p><p>A sin(2πft+ φ) = αs sin(2πft) + αc cos(2πft) (5.9)</p><p>where αs = A cos(φ) and αc = A sin(φ). The expression on the right-hand</p><p>side of Equation (5.9) is linear in the parameters αs and αc, whilst the left-</p><p>hand side is non-linear because the parameter φ is within the sine function.</p><p>Hence, the expression on the right-hand side is preferred in the formulation</p><p>of a seasonal regression model, so that OLS can be used to estimate the</p><p>parameters. For a time series {xt} with s seasons there are [s/2] possible</p><p>cycles.1 The harmonic seasonal model is defined by</p><p>1 The notation [ ] represents the integer part of the expression within. In most</p><p>practical cases, s is even and so [ ] can be omitted. However, for some ‘seasons’,</p><p>s may be an odd number, making the notation necessary. For example, if the</p><p>‘seasons’ are the days of the week, there would be [7/2] = 3 possible cycles.</p><p>102 5 Regression</p><p>xt = mt +</p><p>[s/2]∑</p><p>i=1</p><p>{</p><p>si sin(2πit/s) + ci cos(2πit/s)</p><p>}</p><p>+ zt (5.10)</p><p>wheremt is the trend which includes a parameter for the constant term, and si</p><p>and ci are unknown parameters. The trend may take a polynomial form as in</p><p>Equation (5.2). When s is an even number, the value of the sine at frequency</p><p>1/2 (when i = s/2 in the summation term shown in Equation (5.10)) will</p><p>be zero for all values of t, and so the term can be left out of the model.</p><p>Hence, with a constant term included, the maximum number of parameters</p><p>in the harmonic model equals that of the seasonal indicator variable model</p><p>(Equation (5.6)), and the fits will be identical.</p><p>At first sight it may seem strange that the harmonic model has cycles of</p><p>a frequency higher than the seasonal frequency of 1/s. However, the addition</p><p>of further harmonics has the effect of perturbing the underlying wave to make</p><p>it less regular than a standard sine wave of period s. This usually still gives</p><p>a dominant seasonal pattern of period s, but with a more realistic underlying</p><p>shape. For example, suppose data are taken at monthly intervals. 
Then the</p><p>second plot given below might be a more realistic underlying seasonal pattern</p><p>than the first plot, as it perturbs the standard sine wave by adding another</p><p>two harmonic terms of frequencies 2/12 and 4/12 (Fig. 5.5):</p><p>> TIME plot(TIME, sin(2 * pi * TIME/12), type = "l")</p><p>> plot(TIME, sin(2 * pi * TIME/12) + 0.2 * sin(2 * pi * 2 *</p><p>TIME/12) + 0.1 * sin(2 * pi * 4 * TIME/12) + 0.1 *</p><p>cos(2 * pi * 4 * TIME/12), type = "l")</p><p>The code above illustrates just one of many possible combinations of harmon-</p><p>ics that could be used to model a wide range of possible underlying seasonal</p><p>patterns.</p><p>5.6.1 Simulation</p><p>It is straightforward to simulate a series based on the harmonic model given</p><p>by Equation (5.10). For example, suppose the underlying model is</p><p>xt = 0.1 + 0.005t+ 0.001t2 + sin(2πt/12)+</p><p>0.2 sin(4πt/12) + 0.1 sin(8πt/12) + 0.1 cos(8πt/12) + wt</p><p>(5.11)</p><p>where {wt} is Gaussian white noise with standard deviation 0.5. This model</p><p>has the same seasonal harmonic components as the model represented in Fig-</p><p>ure 5.5b but also contains an underlying quadratic trend. Using the code</p><p>below, a series of length 10 years is simulated, and it is shown in Figure 5.6.</p><p>> set.seed(1)</p><p>> TIME w Trend Seasonal x plot(x, type = "l")</p><p>5.6.2 Fit to simulated series</p><p>With reference to Equation (5.10), it would seem reasonable to place the</p><p>harmonic variables in matrices, which can be achieved as follows:</p><p>> SIN for (i in 1:6) {</p><p>104 5 Regression</p><p>0 20 40 60 80 100 120</p><p>0</p><p>5</p><p>10</p><p>15</p><p>Fig. 5.6. Ten years of simulated data for the model given by Equation (5.11).</p><p>COS[, i] x.lm1 coef(x.lm1)/sqrt(diag(vcov(x.lm1)))</p><p>(Intercept) TIME I(TIME^2) COS[, 1] SIN[, 1] COS[, 2]</p><p>1.239 1.125 25.933 0.328 15.442 -0.515</p><p>SIN[, 2] COS[, 3] SIN[, 3] COS[, 4] SIN[, 4] COS[, 5]</p><p>3.447 0.232 -0.703 0.228 1.053 -1.150</p><p>SIN[, 5] COS[, 6] SIN[, 6]</p><p>0.857 -0.310 0.382</p><p>The preceding output</p><p>has three significant coefficients. These are used in the</p><p>following model:2</p><p>2 Some statisticians choose to include both the COS and SIN terms for a particular</p><p>frequency if either has a statistically significant value.</p><p>5.6 Harmonic seasonal models 105</p><p>> x.lm2 coef(x.lm2)/sqrt(diag(vcov(x.lm2)))</p><p>(Intercept) I(TIME^2) SIN[, 1] SIN[, 2]</p><p>4.63 111.14 15.79 3.49</p><p>As can be seen in the output from the last command, the coefficients are all</p><p>significant. The estimated coefficients of the best-fitting model are given by</p><p>> coef(x.lm2)</p><p>(Intercept) I(TIME^2) SIN[, 1] SIN[, 2]</p><p>0.28040 0.00104 0.90021 0.19886</p><p>The coefficients above give the following model for predictions at time t:</p><p>x̂t = 0.280 + 0.00104t2 + 0.900 sin(2πt/12) + 0.199 sin(4πt/12) (5.12)</p><p>The AIC can be used to compare the two fitted models:</p><p>> AIC(x.lm1)</p><p>[1] 165</p><p>> AIC(x.lm2)</p><p>[1] 150</p><p>As expected, the last model has the smallest AIC and therefore provides the</p><p>best fit to the data. Due to sampling variation, the best-fitting model is not</p><p>identical to the model used to simulate the data, as can easily be verified by</p><p>taking the AIC of the known underlying model:</p><p>> AIC(lm(x ~ TIME +I(TIME^2) +SIN[,1] +SIN[,2] +SIN[,4] +COS[,4]))</p><p>[1] 153</p><p>In R, the algorithm step can be used to automate the selection of the best-</p><p>fitting model by the AIC. 
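A sketch of how the simulated series, the harmonic regressors, and the full model can be set up, following Equations (5.10) and (5.11); the object names are our choice:

> set.seed(1)
> TIME <- 1:(10 * 12)                                  # ten years of monthly data
> w <- rnorm(length(TIME), sd = 0.5)                   # Gaussian white noise, sd 0.5
> Trend <- 0.1 + 0.005 * TIME + 0.001 * TIME^2
> Seasonal <- sin(2 * pi * TIME / 12) + 0.2 * sin(2 * pi * 2 * TIME / 12) +
      0.1 * sin(2 * pi * 4 * TIME / 12) + 0.1 * cos(2 * pi * 4 * TIME / 12)
> x <- Trend + Seasonal + w
> SIN <- COS <- matrix(nrow = length(TIME), ncol = 6)  # matrices of harmonic predictors
> for (i in 1:6) {
      SIN[, i] <- sin(2 * pi * i * TIME / 12)
      COS[, i] <- cos(2 * pi * i * TIME / 12)
  }
> x.lm1 <- lm(x ~ TIME + I(TIME^2) + COS[, 1] + SIN[, 1] + COS[, 2] + SIN[, 2] +
      COS[, 3] + SIN[, 3] + COS[, 4] + SIN[, 4] + COS[, 5] + SIN[, 5] +
      COS[, 6] + SIN[, 6])                             # full harmonic model with quadratic trend
> coef(x.lm1) / sqrt(diag(vcov(x.lm1)))                # rough t-ratios for each term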
For the example above, the appropriate command</p><p>is step(x.lm1), which contains all the predictor variables in the form of the</p><p>first model. Try running this command, and check that the final output agrees</p><p>with the model selected above.</p><p>A best fit can equally well be based on choosing the model that leads to</p><p>the smallest estimated standard deviations of the errors, provided the degrees</p><p>of freedom are taken into account.</p><p>5.6.3 Harmonic model fitted to temperature series (1970–2005)</p><p>In the code below, a harmonic model with a quadratic trend is fitted to the</p><p>temperature series (1970–2005) from §5.3.2. The units for the ‘time’ variable</p><p>are in ‘years’, so the divisor of 12 is not needed when creating the harmonic</p><p>variables. To reduce computation error in the OLS procedure due to large</p><p>numbers, the TIME variable is standardized after the COS and SIN predictors</p><p>have been calculated.</p><p>106 5 Regression</p><p>> SIN for (i in 1:6) {</p><p>COS[, i] TIME mean(time(temp))</p><p>[1] 1988</p><p>> sd(time(temp))</p><p>[1] 10.4</p><p>> temp.lm1 coef(temp.lm1)/sqrt(diag(vcov(temp.lm1)))</p><p>(Intercept) TIME I(TIME^2) COS[, 1] SIN[, 1] COS[, 2]</p><p>18.245 30.271 1.281 0.747 2.383 1.260</p><p>SIN[, 2] COS[, 3] SIN[, 3] COS[, 4] SIN[, 4] COS[, 5]</p><p>1.919 0.640 0.391 0.551 0.168 0.324</p><p>SIN[, 5] COS[, 6] SIN[, 6]</p><p>0.345 -0.409 -0.457</p><p>> temp.lm2 coef(temp.lm2)</p><p>(Intercept) TIME SIN[, 1] SIN[, 2]</p><p>0.1750 0.1841 0.0204 0.0162</p><p>> AIC(temp.lm)</p><p>[1] -547</p><p>> AIC(temp.lm1)</p><p>[1] -545</p><p>> AIC(temp.lm2)</p><p>[1] -561</p><p>Again, the AIC is used to compare the fitted models, and only statistically</p><p>significant terms are included in the final model.</p><p>To check the adequacy of the fitted model, it is appropriate to create a</p><p>time plot and correlogram of the residuals because the residuals form a time</p><p>series (Fig. 5.7). The time plot is used to detect patterns in the series. For</p><p>example, if a higher-ordered polynomial is required, this would show up as a</p><p>curve in the time plot. The purpose of the correlogram is to determine whether</p><p>there is autocorrelation in the series, which would require a further model.</p><p>5.6 Harmonic seasonal models 107</p><p>> plot(time(temp), resid(temp.lm2), type = "l")</p><p>> abline(0, 0, col = "red")</p><p>> acf(resid(temp.lm2))</p><p>> pacf(resid(temp.lm2))</p><p>In Figure 5.7(a), there is no discernible curve in the series, which implies</p><p>that a straight line is an adequate description of the trend. A tendency for the</p><p>series to persist above or below the x-axis implies that the series is positively</p><p>autocorrelated. This is verified in the correlogram of the residuals, which shows</p><p>a clear positive autocorrelation at lags 1–10 (Fig. 5.7b).</p><p>1970 1975 1980 1985 1990 1995 2000 2005</p><p>−</p><p>0.</p><p>4</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>(a)</p><p>R</p><p>es</p><p>id</p><p>ua</p><p>l</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>1.</p><p>0</p><p>(b)</p><p>A</p><p>C</p><p>F</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>(c)</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.7. 
Residual diagnostic plots for the harmonic model fitted to the temperature</p><p>series (1970–2005): (a) the residuals plotted against time; (b) the correlogram of the</p><p>residuals (time units are months); (c) partial autocorrelations plotted against lag</p><p>(in months).</p><p>The correlogram in Figure 5.7 is similar to that expected of an AR(p)</p><p>process (§4.5.5). This is verified by the plot of the partial autocorrelations,</p><p>in which only the lag 1 and lag 2 autocorrelations are statistically significant</p><p>(Fig. 5.7). In the code below, an AR(2) model is fitted to the residual series:</p><p>108 5 Regression</p><p>> res.ar res.ar$ar</p><p>[1] 0.494 0.307</p><p>> sd(res.ar$res[-(1:2)])</p><p>[1] 0.0837</p><p>> acf(res.ar$res[-(1:2)])</p><p>The correlogram of the residuals of the fitted AR(2) model is given in Figure</p><p>5.8, from which it is clear that the residuals are approximately white noise.</p><p>Hence, the final form of the model provides a good fit to the data. The fitted</p><p>model for the monthly temperature series can be written as</p><p>xt = 0.175 +</p><p>0.184(t− 1988)</p><p>10.4</p><p>+ 0.0204 sin(2πt) + 0.0162 sin(4πt) + zt (5.13)</p><p>where t is ‘time’ measured in units of ‘years’, the residual series {zt} follow</p><p>an AR(2) process given by</p><p>zt = 0.494zt−1 + 0.307zt−2 + wt (5.14)</p><p>and {wt} is white noise with mean zero and standard deviation 0.0837.</p><p>If we require an accurate assessment of the standard error, we should refit</p><p>the model using gls, allowing for an AR(2) structure for the errors (Exer-</p><p>cise 6).</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.8. Correlogram of the residuals of the AR(2) model fitted to the residuals of</p><p>the harmonic model for the temperature series.</p><p>5.7 Logarithmic transformations 109</p><p>5.7 Logarithmic transformations</p><p>5.7.1 Introduction</p><p>Recall from §5.2 that the natural logarithm (base e) can be used to transform</p><p>a model with multiplicative components to a model with additive components.</p><p>For example, if {xt} is a time series given by</p><p>xt = m′</p><p>t s</p><p>′</p><p>t z</p><p>′</p><p>t (5.15)</p><p>where m′</p><p>t is the trend, s′t is the seasonal effect, and z′t is the residual error,</p><p>then the series {yt}, given by</p><p>yt = log xt = logm′</p><p>t + log s′t + log z′t = mt + st + zt (5.16)</p><p>has additive components, so that if mt and st are also linear functions, the</p><p>parameters in Equation (5.16) can be estimated by OLS. In Equation (5.16),</p><p>logs can be taken only if the series {xt} takes all positive values; i.e., xt > 0 for</p><p>all t. Conversely, a log-transformation may be seen as an appropriate model</p><p>formulation when a series can only take positive values and has values near</p><p>zero because the anti-log forces the predicted and simulated values for {xt}</p><p>to be positive.</p><p>5.7.2 Example using the air passenger series</p><p>Consider the air passenger series from §1.4.1. Time plots of the original series</p><p>and the natural logarithm of the series can be obtained using the code below</p><p>and are shown in Figure 5.9.</p><p>> data(AirPassengers)</p><p>> AP plot(AP)</p><p>> plot(log(AP))</p><p>In Figure 5.9(a), the variance can be seen to increase as t increases, whilst</p><p>after the logarithm is taken the variance is approximately constant over the</p><p>period of the record (Fig. 5.9b). 
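One rough way to see this numerically (a sketch of our own, not a formal test) is to compare the within-year spread of the observations before and after the transformation:

> data(AirPassengers)
> AP <- AirPassengers
> plot(aggregate(AP, FUN = sd))        # within-year standard deviation grows with the level of the series
> plot(aggregate(log(AP), FUN = sd))   # roughly constant after taking logs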
Therefore, as the number of people using</p><p>the airline can also only be positive, the logarithm would be appropriate in</p><p>the model formulation for this time series. In the following code, a harmonic</p><p>model with polynomial trend is fitted to the air passenger series. The function</p><p>time is used to</p><p>extract the time and create a standardised time variable TIME.</p><p>> SIN for (i in 1:6) {</p><p>SIN[, i] TIME mean(time(AP))</p><p>110 5 Regression</p><p>(a)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>(b)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1952 1954 1956 1958 1960</p><p>5.</p><p>0</p><p>5.</p><p>5</p><p>6.</p><p>0</p><p>6.</p><p>5</p><p>Fig. 5.9. Time plots of (a) the airline series (1949–1960) and (b) the natural loga-</p><p>rithm of the airline series.</p><p>[1] 1955</p><p>> sd(time(AP))</p><p>[1] 3.48</p><p>> AP.lm1 coef(AP.lm1)/sqrt(diag(vcov(AP.lm1)))</p><p>(Intercept) TIME I(TIME^2) I(TIME^3) I(TIME^4) SIN[, 1]</p><p>744.685 42.382 -4.162 -0.751 1.873 4.868</p><p>COS[, 1] SIN[, 2] COS[, 2] SIN[, 3] COS[, 3] SIN[, 4]</p><p>-26.055 10.395 10.004 -4.844 -1.560 -5.666</p><p>COS[, 4] SIN[, 5] COS[, 5] SIN[, 6] COS[, 6]</p><p>1.946 -3.766 1.026 0.150 -0.521</p><p>> AP.lm2 coef(AP.lm2)/sqrt(diag(vcov(AP.lm2)))</p><p>5.7 Logarithmic transformations 111</p><p>(Intercept) TIME I(TIME^2) SIN[, 1] COS[, 1] SIN[, 2]</p><p>922.63 103.52 -8.24 4.92 -25.81 10.36</p><p>COS[, 2] SIN[, 3] SIN[, 4] COS[, 4] SIN[, 5]</p><p>9.96 -4.79 -5.61 1.95 -3.73</p><p>> AIC(AP.lm1)</p><p>[1] -448</p><p>> AIC(AP.lm2)</p><p>[1] -451</p><p>> acf(resid(AP.lm2))</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>1.</p><p>0</p><p>(a)</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>(b)</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 5.10. The correlogram (a) and partial autocorrelations (b) of the residual</p><p>series.</p><p>The residual correlogram indicates that the data are positively autocorre-</p><p>lated (Fig. 5.10). As mentioned in §5.4, the standard errors of the parameter</p><p>estimates are likely to be under-estimated if there is positive serial corre-</p><p>lation in the data. This implies that predictor variables may falsely appear</p><p>‘significant’ in the fitted model. In the code below, GLS is used to check the</p><p>significance of the variables in the fitted model, using the lag 1 autocorrelation</p><p>(approximately 0.6) from Figure 5.10.</p><p>112 5 Regression</p><p>> AP.gls coef(AP.gls)/sqrt(diag(vcov(AP.gls)))</p><p>(Intercept) TIME I(TIME^2) SIN[, 1] COS[, 1] SIN[, 2]</p><p>398.84 45.85 -3.65 3.30 -18.18 11.77</p><p>COS[, 2] SIN[, 3] SIN[, 4] COS[, 4] SIN[, 5]</p><p>11.43 -7.63 -10.75 3.57 -7.92</p><p>In Figure 5.10(b), the partial autocorrelation plot suggests that the resid-</p><p>ual series follows an AR(1) process, which is fitted to the series below:</p><p>> AP.ar AP.ar$ar</p><p>[1] 0.641</p><p>> acf(AP.ar$res[-1])</p><p>The correlogram of the residuals of the fitted AR(1) model might be taken</p><p>for white noise given that only one autocorrelation is significant (Fig. 
5.11).</p><p>However, the lag of this significant value corresponds to the seasonal lag (12)</p><p>in the original series, which implies that the fitted model has failed to fully</p><p>account for the seasonal variation in the data. Understandably, the reader</p><p>might regard this as curious, given that the data were fitted using the full</p><p>seasonal harmonic model. However, seasonal effects can be stochastic just</p><p>as trends can, and the harmonic model we have used is deterministic. In</p><p>Chapter 7, models with stochastic seasonal terms will be considered.</p><p>0 5 10 15 20</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 5.11. Correlogram of the residuals from the AR(1) model fitted to the residuals</p><p>of the logarithm model.</p><p>5.8 Non-linear models 113</p><p>5.8 Non-linear models</p><p>5.8.1 Introduction</p><p>For the reasons given in §5.2, linear models are applicable to a wide range of</p><p>time series. However, for some time series it may be more appropriate to fit</p><p>a non-linear model directly rather than take logs or use a linear polynomial</p><p>approximation. For example, if a series is known to derive from a known non-</p><p>linear process, perhaps based on an underlying known deterministic law in</p><p>science, then it would be better to use this information in the model formula-</p><p>tion and fit a non-linear model directly to the data. In R, a non-linear model</p><p>can be fitted by least squares using the function nls.</p><p>In the previous section, we found that using the natural logarithm of a</p><p>series could help stabilise the variance. However, using logs can present diffi-</p><p>culties when a series contains negative values, because the log of a negative</p><p>value is undefined. One way around this problem is to add a constant to all</p><p>the terms in the series, so if {xt} is a series containing (some) negative values,</p><p>then adding c0 such that c0 > max{−xt} and then taking logs produces a</p><p>transformed series {log(c0 +xt)} that is defined for all t. A linear model (e.g.,</p><p>a straight-line trend) could then be fitted to produce for {xt} the model</p><p>xt = −c0 + eα0+α1t+zt (5.17)</p><p>where α0 and α1 are model parameters and {zt} is a residual series that may</p><p>be autocorrelated.</p><p>The main difficulty with the approach leading to Equation (5.17) is that</p><p>c0 should really be estimated like any other parameter in the model, whilst in</p><p>practice a user will often arbitrarily choose a value that satisfies the constraint</p><p>(c0 > max{−xt}). If there is a reason to expect a model similar to that in</p><p>Equation (5.17) but there is no evidence for multiplicative residual terms, then</p><p>the constant c0 should be estimated with the other model parameters using</p><p>non-linear least squares; i.e., the following model should be fitted:</p><p>xt = −c0 + eα0+α1t + zt (5.18)</p><p>5.8.2 Example of a simulated and fitted non-linear series</p><p>As non-linear models are generally fitted when the underlying non-linear func-</p><p>tion is known, we will simulate a non-linear series based on Equation (5.18)</p><p>with c0 = 0 and compare parameters estimated using nls with those of the</p><p>known underlying function.</p><p>Below, a non-linear series with AR(1) residuals is simulated and plotted</p><p>(Fig. 
5.12):</p><p>> set.seed(1)</p><p>> w z for (t in 2:100) z[t] Time f x plot(x, type = "l")</p><p>> abline(0, 0)</p><p>0 20 40 60 80 100</p><p>0</p><p>10</p><p>0</p><p>20</p><p>0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>time</p><p>Fig. 5.12. Plot of a non-linear series containing negative values.</p><p>The series plotted in Figure 5.12 has an apparent increasing exponential</p><p>trend but also contains negative values, so that a direct log-transformation</p><p>cannot be used and a non-linear model is needed. In R, a non-linear model is</p><p>fitted by specifying a formula with the parameters and their starting values</p><p>contained in a list:</p><p>> x.nls summary(x.nls)$parameters</p><p>Estimate Std. Error t value Pr(>|t|)</p><p>alp0 1.1764 0.074295 15.8 9.20e-29</p><p>alp1 0.0483 0.000819 59.0 2.35e-78</p><p>The estimates for α0 and α1 are close to the underlying values that were</p><p>used to simulate the data, although the standard errors of these estimates are</p><p>likely to be underestimated because of the autocorrelation in the residuals.3</p><p>3 The generalised least squares function gls can be used to fit non-linear mod-</p><p>els with autocorrelated residuals. However, in practice, computational difficulties</p><p>often arise when using this function with non-linear models.</p><p>5.10 Inverse transform and bias correction 115</p><p>5.9 Forecasting from regression</p><p>5.9.1 Introduction</p><p>A forecast is a prediction into the future. In the context of time series re-</p><p>gression, a forecast involves extrapolating a fitted model into the future by</p><p>evaluating the model function for a new series of times. The main problem</p><p>with this approach is that the trends present in the fitted series may change</p><p>in the future. Therefore, it is better to think of a forecast from a regression</p><p>model as an</p><p>expected value conditional on past trends continuing into the</p><p>future.</p><p>5.9.2 Prediction in R</p><p>The generic function for making predictions in R is predict. The function</p><p>essentially takes a fitted model and new data as parameters. The key to using</p><p>this function with a regression model is to ensure that the new data are</p><p>properly defined and labelled in a data.frame.</p><p>In the code below, we use this function in the fitted regression model</p><p>of §5.7.2 to forecast the number of air passengers travelling for the 10-year</p><p>period that follows the record (Fig. 5.13). The forecast is given by applying</p><p>the exponential function (anti-log) to predict because the regression model</p><p>was fitted to the logarithm of the series:</p><p>> new.t TIME SIN for (i in 1:6) {</p><p>COS[, i] SIN new.dat AP.pred.ts ts.plot(log(AP), log(AP.pred.ts), lty = 1:2)</p><p>> ts.plot(AP, AP.pred.ts, lty = 1:2)</p><p>5.10 Inverse transform and bias correction</p><p>5.10.1 Log-normal residual errors</p><p>The forecasts in Figure 5.13(b) were obtained by applying the anti-log to the</p><p>forecasted values obtained from the log-regression model. However, the process</p><p>116 5 Regression</p><p>(a)</p><p>Lo</p><p>g</p><p>of</p><p>a</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>1950 1955 1960 1965 1970</p><p>5.</p><p>0</p><p>6.</p><p>0</p><p>7.</p><p>0</p><p>(b)</p><p>A</p><p>ir</p><p>pa</p><p>ss</p><p>en</p><p>ge</p><p>rs</p><p>(</p><p>10</p><p>00</p><p>s)</p><p>1950 1955 1960 1965 1970</p><p>20</p><p>0</p><p>60</p><p>0</p><p>10</p><p>00</p><p>Fig. 5.13. 
Air passengers (1949–1960; solid line) and forecasts (1961–1970; dotted</p><p>lines): (a) logarithm and forecasted values; (b) original series and anti-log of the</p><p>forecasted values.</p><p>of using a transformation, such as the logarithm, and then applying an inverse</p><p>transformation introduces a bias in the forecasts of the mean values. If the</p><p>regression model closely fits the data, this bias will be small (as shown in the</p><p>next example for the airline predictions). Note that a bias correction is only</p><p>for means and should not be used in simulations.</p><p>The bias in the means arises as a result of applying the inverse transform</p><p>to a residual series. For example, if the time series are Gaussian white noise</p><p>{wt}, with mean zero and standard deviation σ, then the distribution of the</p><p>inverse-transform (the anti-log) of the series is log-normal with mean e</p><p>1</p><p>2 σ2</p><p>.</p><p>This can be verified theoretically, or empirically by simulation as in the code</p><p>below:</p><p>> set.seed(1)</p><p>> sigma w mean(w)</p><p>[1] 4.69e-05</p><p>5.10 Inverse transform and bias correction 117</p><p>> mean(exp(w))</p><p>[1] 1.65</p><p>> exp(sigma^2/2)</p><p>[1] 1.65</p><p>The code above indicates that the mean of the anti-log of the Gaussian</p><p>white noise and the expected mean from a log-normal distribution are equal.</p><p>Hence, for a Gaussian white noise residual series, a correction factor of e</p><p>1</p><p>2 σ2</p><p>should be applied to the forecasts of means. The importance of this correction</p><p>factor really depends on the value of σ2. If σ2 is very small, the correction</p><p>factor will hardly change the forecasts at all and so could be neglected with-</p><p>out major concern, especially as errors from other sources are likely to be</p><p>significantly greater.</p><p>5.10.2 Empirical correction factor for forecasting means</p><p>The e</p><p>1</p><p>2 σ2</p><p>correction factor can be used when the residual series of the fitted</p><p>log-regression model is Gaussian white noise. In general, however, the distri-</p><p>bution of the residuals from the log regression (Exercise 5) is often negatively</p><p>skewed, in which case a correction factor can be determined empirically us-</p><p>ing the mean of the anti-log of the residual series. In this approach, adjusted</p><p>forecasts {x̂′t} can be obtained from</p><p>x̂′t = e</p><p>ˆlog xt</p><p>n∑</p><p>t=1</p><p>ezt/n (5.19)</p><p>where { ˆlog xt : t = 1, . . . , n} is the predicted series given by the fitted log-</p><p>regression model, and {zt} is the residual series from this fitted model.</p><p>The following example illustrates the procedure for calculating the correc-</p><p>tion factors.</p><p>5.10.3 Example using the air passenger data</p><p>For the airline series, the forecasts can be adjusted by multiplying the predic-</p><p>tions by e</p><p>1</p><p>2 σ2</p><p>, where σ is the standard deviation of the residuals, or using an</p><p>empirical correction factor as follows:</p><p>> summary(AP.lm2)$r.sq</p><p>[1] 0.989</p><p>> sigma lognorm.correction.factor empirical.correction.factor lognorm.correction.factor</p><p>[1] 1.001171</p><p>> empirical.correction.factor</p><p>[1] 1.001080</p><p>> AP.pred.ts</p><p>. . . . 113</p><p>xii Contents</p><p>5.9 Forecasting from regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115</p><p>5.9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115</p><p>5.9.2 Prediction in R . . . . . . . 
and comment on the plot. Fit back-to-back Weibull distributions to the errors.
c) Simulate 20 realisations of inflow for the next 10 years.
d) Give reasons why a log transformation may be suitable for the series of inflows.
e) Regress log(inflow) on month using indicator variables and time t (as above), and fit a suitable AR model to the residual error series.
f) Plot a histogram of the residual errors of the fitted AR model, and comment on the plot. Fit a back-to-back Weibull distribution to the residual errors.
g) Simulate 20 realisations of log(inflow) for the next 10 years. Take anti-logs of the simulated values to produce a series of simulated flows.
h) Compare both sets of simulated flows, and discuss which is the more satisfactory.

6. Refit the harmonic model to the temperature series using gls, allowing for errors from an AR(2) process.
a) Construct a 99% confidence interval for the coefficient of time.
b) Plot the residual error series from the model fitted using GLS against the residual error series from the model fitted using OLS.
c) Refit the AR(2) model to the residuals from the fitted (GLS) model.
d) How different are the fitted models?
e) Calculate the annual means. Use OLS to regress the annual mean temperature on time, and construct a 99% confidence interval for its coefficient.

6 Stationary Models

6.1 Purpose

As seen in the previous chapters, a time series will often have well-defined components, such as a trend and a seasonal pattern. A well-chosen linear regression may account for these non-stationary components, in which case the residuals from the fitted model should not contain noticeable trend or seasonal patterns. However, the residuals will usually be correlated in time, as this is not accounted for in the fitted regression model. Similar values may cluster together in time; for example, monthly values of the Southern Oscillation Index, which is closely associated with El Niño, tend to change slowly and may give rise to persistent weather patterns.
Alternatively, adjacent observations may</p><p>be negatively correlated; for example, an unusually high monthly sales figure</p><p>may be followed by an unusually low value because customers have supplies</p><p>left over from the previous month. In this chapter, we consider stationary</p><p>models that may be suitable for residual series that contain no obvious trends</p><p>or seasonal cycles. The fitted stationary models may then be combined with</p><p>the fitted regression model to improve forecasts. The autoregressive models</p><p>that were introduced in §4.5 often provide satisfactory models for the residual</p><p>time series, and we extend the repertoire in this chapter. The term stationary</p><p>was discussed in previous chapters; we now give a more rigorous definition.</p><p>6.2 Strictly stationary series</p><p>A time series model {xt} is strictly stationary if the joint statistical distribu-</p><p>tion of xt1 , . . . , xtn is the same as the joint distribution of xt1+m, . . . , xtn+m for</p><p>all t1, . . . , tn and m, so that the distribution is unchanged after an arbitrary</p><p>time shift. Note that strict stationarity implies that the mean and variance</p><p>are constant in time and that the autocovariance Cov(xt, xs) only depends on</p><p>lag k = |t − s| and can be written γ(k). If a series is not strictly stationary</p><p>but the mean and variance are constant in time and the autocovariance only</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 121</p><p>Use R, DOI 10.1007/978-0-387-88698-5 6,</p><p>© Springer Science+Business Media, LLC 2009</p><p>122 6 Stationary Models</p><p>depends on the lag, then the series is called second-order stationary.1 We focus</p><p>on the second-order properties in this chapter, but the stochastic processes</p><p>discussed are strictly stationary. Furthermore, if the white noise is Gaussian,</p><p>the stochastic process is completely defined by the mean and covariance struc-</p><p>ture, in the same way as any normal distribution is defined by its mean and</p><p>variance-covariance matrix.</p><p>Stationarity is an idealisation that is a property of models. If we fit a</p><p>stationary model to data, we assume our data are a realisation of a stationary</p><p>process. So our first step in an analysis should be to check whether there is any</p><p>evidence of a trend or seasonal effects and, if there is, remove them. Regression</p><p>can break down a non-stationary series to a trend, seasonal components, and</p><p>residual series. It is often reasonable to treat the time series of residuals as a</p><p>realisation of a stationary error series. Therefore, the models in this chapter</p><p>are often fitted to residual series arising from regression analyses.</p><p>6.3 Moving average models</p><p>6.3.1 MA(q) process: Definition and properties</p><p>A moving average (MA) process of order q is a linear combination of the</p><p>current white noise term and the q most recent past white noise terms and is</p><p>defined by</p><p>xt = wt + β1wt−1 + . . .+ βqwt−q (6.1)</p><p>where {wt} is white noise with zero mean and variance σ2</p><p>w. Equation (6.1)</p><p>can be rewritten in terms of the backward shift operator B</p><p>xt = (1 + β1B + β2B2 + · · ·+ βqBq)wt = φq(B)wt (6.2)</p><p>where φq is a polynomial of order q. 
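To make the definition concrete, the following sketch simulates an MA(3) process with the standard R function arima.sim (discussed more fully in §6.6) and compares the sample correlogram with the theoretical autocorrelations given by Equation (6.3) below. The parameter values (0.7, 0.5, 0.2) and the helper function rho.ma are illustrative assumptions, not part of the text.

> beta <- c(0.7, 0.5, 0.2)            # illustrative MA(3) parameters
> set.seed(1)
> x.sim <- arima.sim(n = 1000, list(ma = beta))
> rho.ma <- function(k, beta) {       # theoretical ACF of an MA(q) process
+     b <- c(1, beta)                 # b[1] is beta_0 = 1
+     q <- length(b) - 1
+     if (k > q) return(0)
+     sum(b[1:(q - k + 1)] * b[(1 + k):(q + 1)]) / sum(b^2)
+ }
> sapply(0:5, rho.ma, beta = beta)    # theoretical values for lags 0 to 5
> acf(x.sim)                          # sample correlogram for comparison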
Because MA processes consist of a finite</p><p>sum of stationary white noise terms, they are stationary and hence have a</p><p>time-invariant mean and autocovariance.</p><p>The mean and variance for {xt} are easy to derive. The mean is just zero</p><p>because it is a sum of terms that all have a mean of zero. The variance is σ2</p><p>w(1+</p><p>β2</p><p>1 + . . .+β2</p><p>q ) because each of the white noise terms has the same variance and</p><p>the terms are mutually independent. The autocorrelation function, for k ≥ 0,</p><p>is given by</p><p>ρ(k) =</p><p></p><p>1 k = 0∑q−k</p><p>i=0 βiβi+k/</p><p>∑q</p><p>i=0 β</p><p>2</p><p>i k = 1, . . . , q</p><p>0 k > q</p><p>(6.3)</p><p>where β0 is unity. The function is zero when k > q because xt and xt+k</p><p>then consist of sums of independent white noise terms and so have covariance</p><p>1 For example, the skewness, or more generally E(xtxt+kxt+l), might change over</p><p>time.</p><p>6.3 Moving average models 123</p><p>zero. The derivation of the autocorrelation function is left to Exercise 1. An</p><p>MA process is invertible if it can be expressed as a stationary autoregressive</p><p>process of infinite order without an error term. For example, the MA process</p><p>xt = (1− βB)wt can be expressed as</p><p>wt = (1− βB)−1xt = xt + βxt−1 + β2xt−2 + . . . (6.4)</p><p>provided |β| rho q) ACF beta rho.k for (k in 1:10) rho.k[k] plot(0:10, c(1, rho.k), pch = 4, ylab = expression(rho[k]))</p><p>> abline(0, 0)</p><p>The plot in Figure 6.1(b) is the autocovariance function</p><p>for an MA(3) process</p><p>with parameters β1 = −0.7, β2 = 0.5, and β3 = −0.2, which has negative</p><p>124 6 Stationary Models</p><p>0 2 4 6 8 10</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>lag k</p><p>ρρ k</p><p>(a)</p><p>0 2 4 6 8 10</p><p>0</p><p>1</p><p>lag k</p><p>ρρ k</p><p>(b)</p><p>Fig. 6.1. Plots of the autocorrelation functions for two MA(3) processes: (a) β1 =</p><p>0.7, β2 = 0.5, β3 = 0.2; (b) β1 = −0.7, β2 = 0.5, β3 = −0.2.</p><p>correlations at lags 1 and 3. The function expression is used to get the</p><p>Greek symbol ρ.</p><p>The code below can be used to simulate the MA(3) process and plot the cor-</p><p>relogram of the simulated series. An example time plot and correlogram are</p><p>shown in Figure 6.2. As expected, the first three autocorrelations are signif-</p><p>icantly different from 0 (Fig. 6.2b); other statistically significant correlations</p><p>are attributable to random sampling variation. Note that in the correlogram</p><p>plot (Fig. 6.2b) 1 in 20 (5%) of the sample correlations for lags greater than</p><p>3, for which the underlying population correlation is zero, are expected to be</p><p>statistically significantly different from zero at the 5% level because multiple</p><p>t-test results are being shown on the plot.</p><p>> set.seed(1)</p><p>> b x for (t in 4:1000) {</p><p>for (j in 1:3) x[t] plot(x, type = "l")</p><p>> acf(x)</p><p>6.4 Fitted MA models</p><p>6.4.1 Model fitted to simulated series</p><p>An MA(q) model can be fitted to data in R using the arima function with</p><p>the order function parameter set to c(0,0,q). Unlike the function ar, the</p><p>6.4 Fitted MA models 125</p><p>0 200 400 600 800 1000</p><p>−</p><p>4</p><p>−</p><p>2</p><p>0</p><p>2</p><p>4</p><p>(a)</p><p>Time t</p><p>R</p><p>ea</p><p>lis</p><p>at</p><p>io</p><p>n</p><p>at</p><p>t</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 6.2. 
(a) Time plot and (b) correlogram for a simulated MA(3) process.</p><p>function arima does not subtract the mean by default and estimates an in-</p><p>tercept term. MA models cannot be expressed in a multiple regression form,</p><p>and, in general, the parameters are estimated with a numerical algorithm. The</p><p>function arima minimises the conditional sum of squares to estimate values of</p><p>the parameters and will either return these if method=c("CSS") is specified</p><p>or use them as initial values for maximum likelihood estimation.</p><p>A description of the conditional sum of squares algorithm for fitting an</p><p>MA(q) process follows. For any choice of parameters, the sum of squared</p><p>residuals can be calculated iteratively by rearranging Equation (6.1) and re-</p><p>placing the errors, wt, with their estimates (that is, the residuals), which are</p><p>denoted by ŵt:</p><p>S(β̂1, . . . , β̂q) =</p><p>n∑</p><p>t=1</p><p>ŵ2</p><p>t =</p><p>n∑</p><p>t=1</p><p>{</p><p>xt − (β̂1ŵt−1 + · · ·+ β̂qŵt−q)</p><p>}2</p><p>(6.5)</p><p>conditional on ŵ0, . . . , ŵt−q being taken as 0 to start the iteration. A numerical</p><p>search is used to find the parameter values that minimise this conditional sum</p><p>of squares.</p><p>In the following code, a moving average model, x.ma, is fitted to the simu-</p><p>lated series of the last section. Looking at the parameter estimates (coefficients</p><p>in the output below), it can be seen that the 95% confidence intervals (approx-</p><p>imated by coeff. ±2 s.e. of coeff.) contain the underlying parameter values (0.8,</p><p>0.6, and 0.4) that were used in the simulations. Furthermore, also as expected,</p><p>126 6 Stationary Models</p><p>the intercept is not significantly different from its underlying parameter value</p><p>of zero.</p><p>> x.ma x.ma</p><p>Call:</p><p>arima(x = x, order = c(0, 0, 3))</p><p>Coefficients:</p><p>ma1 ma2 ma3 intercept</p><p>0.790 0.566 0.396 -0.032</p><p>s.e. 0.031 0.035 0.032 0.090</p><p>sigma^2 estimated as 1.07: log likelihood = -1452, aic = 2915</p><p>It is possible to set the value for the mean to zero, rather than estimate</p><p>the intercept, by using include.mean=FALSE within the arima function. This</p><p>option should be used with caution and would only be appropriate if you</p><p>wanted {xt} to represent displacement from some known fixed mean.</p><p>6.4.2 Exchange rate series: Fitted MA model</p><p>In the code below, an MA(1) model is fitted to the exchange rate series.</p><p>If you refer back to §4.6.2, a comparison with the output below indicates</p><p>that the AR(1) model provides the better fit, as it has the smaller standard</p><p>deviation of the residual series, 0.031 compared with 0.042. Furthermore, the</p><p>correlogram of the residuals indicates that an MA(1) model does not provide</p><p>a satisfactory fit, as the residual series is clearly not a realistic realisation of</p><p>white noise (Fig. 6.3).</p><p>> www x x.ts x.ma x.ma</p><p>Call:</p><p>arima(x = x.ts, order = c(0, 0, 1))</p><p>Coefficients:</p><p>ma1 intercept</p><p>1.000 2.833</p><p>s.e. 0.072 0.065</p><p>sigma^2 estimated as 0.0417: log likelihood = 4.76, aic = -3.53</p><p>> acf(x.ma$res[-1])</p><p>6.5 Mixed models: The ARMA process 127</p><p>0 5 10 15</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 6.3. 
The correlogram of residual series for the MA(1) model fitted to the</p><p>exchange rate data.</p><p>6.5 Mixed models: The ARMA process</p><p>6.5.1 Definition</p><p>Recall from Chapter 4 that a series {xt} is an autoregressive process of order p,</p><p>an AR(p) process, if</p><p>xt = α1xt−1 + α2xt−2 + . . .+ αpxt−p + wt (6.6)</p><p>where {wt} is white noise and the αi are the model parameters. A useful</p><p>class of models are obtained when AR and MA terms are added together in a</p><p>single expression. A time series {xt} follows an autoregressive moving average</p><p>(ARMA) process of order (p, q), denoted ARMA(p, q), when</p><p>xt = α1xt−1+α2xt−2+. . .+αpxt−p+wt+β1wt−1+β2wt−2+. . .+βqwt−q (6.7)</p><p>where {wt} is white noise. Equation (6.7) may be represented in terms of the</p><p>backward shift operator B and rearranged in the more concise polynomial</p><p>form</p><p>θp(B)xt = φq(B)wt (6.8)</p><p>The following points should be noted about an ARMA(p, q) process:</p><p>(a) The process is stationary when the roots of θ all exceed unity in absolute</p><p>value.</p><p>(b) The process is invertible when the roots of φ all exceed unity in absolute</p><p>value.</p><p>(c) The AR(p) model is the special case ARMA(p, 0).</p><p>(d) The MA(q) model is the special case ARMA(0, q).</p><p>(e) Parameter parsimony. When fitting to data, an ARMA model will often</p><p>be more parameter efficient (i.e., require fewer parameters) than a single</p><p>MA or AR model.</p><p>128 6 Stationary Models</p><p>(e) Parameter redundancy. When θ and φ share a common factor, a stationary</p><p>model can be simplified. For example, the model (1 − 1</p><p>2B)(1 − 1</p><p>3B)xt =</p><p>(1− 1</p><p>2B)wt can be written (1− 1</p><p>3B)xt = wt.</p><p>6.5.2 Derivation of second-order properties*</p><p>In order to derive the second-order properties for an ARMA(p, q) process</p><p>{xt}, it is helpful first to express the xt in terms of white noise components</p><p>wt because white noise terms are independent. We illustrate the procedure for</p><p>the ARMA(1, 1) model.</p><p>The ARMA(1, 1) process for {xt} is given by</p><p>xt = αxt−1 + wt + βwt−1 (6.9)</p><p>where wt is white noise, with E(wt) = 0 and Var(wt) = σ2</p><p>w. Rearranging</p><p>Equation (6.9) to express xt in terms of white noise components,</p><p>xt = (1− αB)−1(1 + βB)wt</p><p>Expanding the right-hand-side,</p><p>xt = (1 + αB + α2B2 + . . .)(1 + βB)wt</p><p>=</p><p>( ∞∑</p><p>i=0</p><p>αiBi</p><p>)</p><p>(1 + βB)wt</p><p>=</p><p>(</p><p>1 +</p><p>∞∑</p><p>i=0</p><p>αi+1Bi+1 +</p><p>∞∑</p><p>i=0</p><p>αiβBi+1</p><p>)</p><p>wt</p><p>= wt + (α+ β)</p><p>∞∑</p><p>i=1</p><p>αi−1wt−i (6.10)</p><p>With the equation in the form above, the second-order properties follow. 
For</p><p>example, the mean E(xt) is clearly zero because E(wt−i) = 0 for all i, and</p><p>the variance is given by</p><p>Var(xt) = Var</p><p>[</p><p>wt + (α+ β)</p><p>∞∑</p><p>i=1</p><p>αi−1wt−i</p><p>]</p><p>= σ2</p><p>w + σ2</p><p>w(α+ β)2(1− α2)−1 (6.11)</p><p>The autocovariance γk, for k > 0, is given by</p><p>Cov (xt, xt+k) = (α+ β)αk−1σ2</p><p>w + (α+ β)2 σ2</p><p>wα</p><p>k</p><p>∞∑</p><p>i=1</p><p>α2i−2</p><p>= (α+ β)αk−1σ2</p><p>w + (α+ β)2 σ2</p><p>wα</p><p>k(1− α2)−1</p><p>(6.12)</p><p>6.6 ARMA models: Empirical analysis 129</p><p>The autocorrelation ρk then follows as</p><p>ρk = γk/γ0 = Cov (xt, xt+k) /Var (xt)</p><p>=</p><p>αk−1(α+ β)(1 + αβ)</p><p>1 + αβ + β2</p><p>(6.13)</p><p>Note that Equation (6.13) implies ρk = αρk−1.</p><p>6.6 ARMA models: Empirical analysis</p><p>6.6.1 Simulation and fitting</p><p>The ARMA process, and the more general ARIMA processes discussed in the</p><p>next chapter, can be simulated using the R function arima.sim, which takes a</p><p>list of coefficients representing the AR</p><p>and MA parameters. An ARMA(p, q)</p><p>model can be fitted using the arima function with the order function param-</p><p>eter set to c(p, 0, q). The fitting algorithm proceeds similarly to that for</p><p>an MA process. Below, data from an ARMA(1, 1) process are simulated for</p><p>α = −0.6 and β = 0.5 (Equation (6.7)), and an ARMA(1, 1) model fitted to</p><p>the simulated series. As expected, the sample estimates of α and β are close</p><p>to the underlying model parameters.</p><p>> set.seed(1)</p><p>> x coef(arima(x, order = c(1, 0, 1)))</p><p>ar1 ma1 intercept</p><p>-0.59697 0.50270 -0.00657</p><p>6.6.2 Exchange rate series</p><p>In §6.3, a simple MA(1) model failed to provide an adequate fit to the exchange</p><p>rate series. In the code below, fitted MA(1), AR(1) and ARMA(1, 1) models</p><p>are compared using the AIC. The ARMA(1, 1) model provides the best fit</p><p>to the data, followed by the AR(1) model, with the MA(1) model providing</p><p>the poorest fit. The correlogram in Figure 6.4 indicates that the residuals of</p><p>the fitted ARMA(1, 1) model have small autocorrelations, which is consistent</p><p>with a realisation of white noise and supports the use of the model.</p><p>> x.ma x.ar x.arma AIC(x.ma)</p><p>[1] -3.53</p><p>> AIC(x.ar)</p><p>130 6 Stationary Models</p><p>[1] -37.4</p><p>> AIC(x.arma)</p><p>[1] -42.3</p><p>> x.arma</p><p>Call:</p><p>arima(x = x.ts, order = c(1, 0, 1))</p><p>Coefficients:</p><p>ar1 ma1 intercept</p><p>0.892 0.532 2.960</p><p>s.e. 0.076 0.202 0.244</p><p>sigma^2 estimated as 0.0151: log likelihood = 25.1, aic = -42.3</p><p>> acf(resid(x.arma))</p><p>0 1 2 3</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 6.4. The correlogram of residual series for the ARMA(1, 1) model fitted to the</p><p>exchange rate data.</p><p>6.6.3 Electricity production series</p><p>Consider the Australian electricity production series introduced in §1.4.3. The</p><p>data exhibit a clear positive trend and a regular seasonal cycle. Furthermore,</p><p>the variance increases with time, which suggests a log-transformation may be</p><p>appropriate (Fig. 1.5). 
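The code referred to below regresses the logarithm of the series on a time trend and monthly indicator variables. As a point of reference, a minimal sketch of such a fit is given here; it assumes the series Elec.ts constructed from the CBE data earlier in the book is available, and the exact form of the trend (linear plus quadratic terms in time) is an assumption made for illustration.

> Time <- 1:length(Elec.ts)
> Imth <- cycle(Elec.ts)              # month (1 to 12) of each observation
> Elec.lm <- lm(log(Elec.ts) ~ Time + I(Time^2) + factor(Imth))
> acf(resid(Elec.lm))                 # correlogram of the regression residuals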
A regression model is fitted to the logarithms of the</p><p>original series in the code below.</p><p>6.6 ARMA models: Empirical analysis 131</p><p>> www CBE Elec.ts Time Imth Elec.lm acf(resid(Elec.lm))</p><p>The correlogram of the residuals appears to cycle with a period of 12 months</p><p>suggesting that the monthly indicator variables are not sufficient to account</p><p>for the seasonality in the series (Fig. 6.5). In the next chapter, we find that this</p><p>can be accounted for using a non-stationary model with a stochastic seasonal</p><p>component. In the meantime, we note that the best fitting ARMA(p, q) model</p><p>can be chosen using the smallest AIC either by trying a range of combinations</p><p>of p and q in the arima function or using a for loop with upper bounds on</p><p>p and q – taken as 2 in the code shown below. In each step of the for loop,</p><p>the AIC of the fitted model is compared with the currently stored smallest</p><p>value. If the model is found to be an improvement (i.e., has a smaller AIC</p><p>value), then the new value and model are stored. To start with, best.aic is</p><p>initialised to infinity (Inf). After the loop is complete, the best model can</p><p>be found in best.order, and in this case the best model turns out to be an</p><p>AR(2) model.</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 6.5. Electricity production series: correlogram of the residual series of the fitted</p><p>regression model.</p><p>> best.order best.aic for (i in 0:2) for (j in 0:2) {</p><p>fit.aic best.order</p><p>[1] 2 0 0</p><p>> acf(resid(best.arma))</p><p>The predict function can be used both to forecast future values from</p><p>the fitted regression model and forecast the future errors associated with the</p><p>regression model using the ARMA model fitted to the residuals from the</p><p>regression. These two forecasts can then be summed to give a forecasted value</p><p>of the logarithm for electricity production, which would then need to be anti-</p><p>logged and perhaps adjusted using a bias correction factor. As predict is</p><p>a generic R function, it works in different ways for different input objects</p><p>and classes. For a fitted regression model of class lm, the predict function</p><p>requires the new set of data to be in the form of a data frame (object class</p><p>data.frame). For a fitted ARMA model of class arima, the predict function</p><p>requires just the number of time steps ahead for the desired forecast. In the</p><p>latter case, predict produces an object that has both the predicted values and</p><p>their standard errors, which can be extracted using pred and se, respectively.</p><p>In the code below, the electricity production for each month of the next three</p><p>years is predicted.</p><p>> new.time new.data predict.lm predict.arma elec.pred ts.plot(cbind(Elec.ts, elec.pred), lty = 1:2)</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>2</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>1.</p><p>0</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 6.6. Electricity production series: correlogram of the residual series of the</p><p>best-fitting ARMA model.</p><p>6.6 ARMA models: Empirical analysis 133</p><p>The plot of the forecasted values suggests that the predicted values for</p><p>winter may be underestimated by the fitted model (Fig. 6.7), which may be</p><p>due to the remaining seasonal autocorrelation in the residuals (see Fig. 
6.6).</p><p>This problem will be addressed in the next chapter.</p><p>Time</p><p>1960 1965 1970 1975 1980 1985 1990 1995</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>16</p><p>00</p><p>0</p><p>Fig. 6.7. Electricity production series: observed (solid line) and forecasted values</p><p>(dotted line). The forecasted values are not likely to be accurate because of the</p><p>seasonal autocorrelation present in the residuals for the fitted model.</p><p>6.6.4 Wave tank data</p><p>The data in the file wave.dat are the surface height of water (mm), relative</p><p>to the still water level, measured using a capacitance probe positioned at the</p><p>centre of a wave tank. The continuous voltage signal from this capacitance</p><p>probe was sampled every 0.1 second over a 39.6-second period. The objective</p><p>is to fit a suitable ARMA(p, q) model that can be used to generate a realistic</p><p>wave input to a mathematical model for an ocean-going tugboat in a computer</p><p>simulation. The results of the computer simulation will be compared with tests</p><p>using a physical model of the tugboat in the wave tank.</p><p>The pacf suggests that p should be at least 2 (Fig. 6.8). The best-fitting</p><p>ARMA(p, q) model, based on a minimum variance of residuals, was obtained</p><p>with both p and q equal to 4. The acf and pacf of the residuals from this model</p><p>are consistent with the residuals being a realisation of white noise (Fig. 6.9).</p><p>> www wave.dat attach (wave.dat)</p><p>> layout(1:3)</p><p>> plot (as.ts(waveht), ylab = 'Wave height')</p><p>> acf (waveht)</p><p>> pacf (waveht)</p><p>> wave.arma acf (wave.arma$res[-(1:4)])</p><p>> pacf (wave.arma$res[-(1:4)])</p><p>> hist(wave.arma$res[-(1:4)], xlab='height / mm', main='')</p><p>134 6 Stationary Models</p><p>Time</p><p>W</p><p>av</p><p>e</p><p>he</p><p>ig</p><p>ht</p><p>0 100 200 300 400</p><p>−</p><p>50</p><p>0</p><p>50</p><p>0</p><p>0 5 10 15 20 25</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>5</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20 25</p><p>−</p><p>0.</p><p>6</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>Fig. 6.8. Wave heights: time plot, acf, and pacf.</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>5 10 15 20 25</p><p>−</p><p>0.</p><p>15</p><p>0.</p><p>00</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>height / mm</p><p>F</p><p>re</p><p>qu</p><p>en</p><p>cy</p><p>−400 −200 0 200 400 600</p><p>0</p><p>40</p><p>80</p><p>Fig. 6.9. Residuals after fitting an ARMA(4, 4) model to wave heights: acf, pacf,</p><p>and histogram.</p><p>6.8 Exercises 135</p><p>6.7 Summary of R commands</p><p>arima.sim simulates data from an ARMA (or ARIMA) process</p><p>arima fits an ARMA (or ARIMA) model to data</p><p>seq generates a sequence</p><p>expression used to plot maths symbol</p><p>6.8 Exercises</p><p>1. Using the relation Cov(</p><p>∑</p><p>xt,</p><p>∑</p><p>yt) =</p><p>∑∑</p><p>Cov(xt, yt) (Equation (2.15))</p><p>for time series {xt} and {yt}, prove Equation (6.3).</p><p>2. The series {wt} is white noise with zero mean and variance σ2</p><p>w. For the</p><p>following moving average models, find the autocorrelation function and</p><p>determine whether they are invertible. 
In addition, simulate 100 observa-</p><p>tions for each model in R, compare the time plots of the simulated series,</p><p>and comment on how the two series might be distinguished.</p><p>a) xt = wt + 1</p><p>2wt−1</p><p>b) xt = wt + 2wt−1</p><p>3. Write the following models in ARMA(p, q) notation and determine whether</p><p>they are stationary and/or invertible (wt is white noise). In each case,</p><p>check for parameter redundancy and ensure that the ARMA(p, q) nota-</p><p>tion is expressed in the simplest form.</p><p>a) xt = xt−1 − 1</p><p>4xt−2 + wt + 1</p><p>2wt−1</p><p>b) xt = 2xt−1 − xt−2 + wt</p><p>c) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + wt − 1</p><p>2wt−1 + 1</p><p>4wt−2</p><p>d) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + 1</p><p>2wt − wt−1</p><p>e) xt = 7</p><p>10xt−1 − 1</p><p>10xt−2 + wt − 3</p><p>2wt−1</p><p>f) xt = 3</p><p>2xt−1 − 1</p><p>2xt−2 + wt − 1</p><p>3wt−1 + 1</p><p>6wt−2</p><p>4. a) Fit a suitable regression model to the air passenger series. Comment</p><p>on the correlogram of the residuals from the fitted regression model.</p><p>b) Fit an ARMA(p, q) model for values of p and q no greater than 2</p><p>to the residual series of the fitted regression model. Choose the best</p><p>fitting model based on the AIC and comment on its correlogram.</p><p>c) Forecast the number of passengers travelling on the airline in 1961.</p><p>5. a) Write an R function that calculates the autocorrelation function (Equa-</p><p>tion (6.13)) for an ARMA(1, 1) process. Your function should take</p><p>parameters representing α and β for the AR and MA components.</p><p>136 6 Stationary Models</p><p>b) Plot the autocorrelation function above for the case with α = 0.7 and</p><p>β = −0.5 for lags 0 to 20.</p><p>c) Simulate n = 100 values of the ARMA(1, 1) model with α = 0.7</p><p>and β = −0.5, and compare the sample correlogram to the theoretical</p><p>correlogram plotted in part (b). Repeat for n = 1000.</p><p>6. Let {xt : t = 1, . . . , n} be a stationary time series with E(xt) = µ,</p><p>Var(xt) = σ2, and Cor(xt, xt+k) = ρk. Using Equation (5.5) from Chapter</p><p>5:</p><p>a) Calculate Var(x̄) when {xt} is the MA(1) process xt = wt + 1</p><p>2wt−1.</p><p>b) Calculate Var(x̄) when {xt} is the MA(1) process xt = wt − 1</p><p>2wt−1.</p><p>c) Compare each of the above with the variance of the sample mean</p><p>obtained for the white noise case ρk = 0 (k > 0). Of the three mod-</p><p>els, which would have the most accurate estimate of µ based on the</p><p>variances of their sample means?</p><p>d) A simulated example that extracts the variance of the sample mean</p><p>for 100 Gaussian white noise series each of length 20 is given by</p><p>> set.seed(1)</p><p>> m for (i in 1:100) m[i] var(m)</p><p>[1] 0.0539</p><p>For each of the two MA(1) processes, write R code that extracts the</p><p>variance of the sample mean of 100 realisations of length 20. Compare</p><p>them with the variances calculated in parts (a) and (b).</p><p>7. If the sample autocorrelation function of a time series appears to cut off</p><p>after lag q (i.e., autocorrelations at lags higher than q are not significantly</p><p>different from 0 and do not follow any clear patterns), then an MA(q)</p><p>model might be suitable. An AR(p) model is indicated when the partial</p><p>autocorrelation function cuts off after lag p. If there are no convincing</p><p>cutoff points for either function, an ARMA model may provide the best</p><p>fit. 
Plot the autocorrelation and partial autocorrelation functions for the</p><p>simulated ARMA(1, 1) series given in §6.6.1. Using the AIC, choose a</p><p>best-fitting AR model and a best-fitting MA model. Which best-fitting</p><p>model (AR or MA) has the smallest number of parameters? Compare this</p><p>model with the fitted ARMA(1, 1) model of §6.6.1, and comment.</p><p>7</p><p>Non-stationary Models</p><p>7.1 Purpose</p><p>As we have discovered in the previous chapters, many time series are non-</p><p>stationary because of seasonal effects or trends. In particular, random walks,</p><p>which characterise many types of series, are non-stationary but can be trans-</p><p>formed to a stationary series by first-order differencing (§4.4). In this chap-</p><p>ter we first extend the random walk model to include autoregressive and</p><p>moving average terms. As the differenced series needs to be aggregated (or</p><p>‘integrated’) to recover the original series, the underlying stochastic process</p><p>is called autoregressive integrated moving average, which is abbreviated to</p><p>ARIMA.</p><p>The ARIMA process can be extended to include seasonal terms, giving a</p><p>non-stationary seasonal ARIMA (SARIMA) process. Seasonal ARIMA models</p><p>are powerful tools in the analysis of time series as they are capable of modelling</p><p>a very wide range of series. Much of the methodology was pioneered by Box</p><p>and Jenkins in the 1970’s.</p><p>Series may also be non-stationary because the variance is serially corre-</p><p>lated (technically known as conditionally heteroskedastic), which usually re-</p><p>sults in periods of volatility , where there is a clear change in variance. This</p><p>is common in financial series, but may also occur in other series such as cli-</p><p>mate records. One approach to modelling series of this nature is to use an</p><p>autoregressive model for the variance, i.e. an autoregressive conditional het-</p><p>eroskedastic (ARCH) model. We consider this approach, along with the gen-</p><p>eralised ARCH (GARCH) model in the later part of the chapter.</p><p>7.2 Non-seasonal ARIMA models</p><p>7.2.1 Differencing and the electricity series</p><p>Differencing a series {xt} can remove trends, whether these trends are stochas-</p><p>tic, as in a random walk, or deterministic, as in the case of a linear trend. In</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 137</p><p>Use R, DOI 10.1007/978-0-387-88698-5 7,</p><p>© Springer Science+Business Media, LLC 2009</p><p>138 7 Non-stationary Models</p><p>the case of a random walk, xt = xt−1 + wt, the first-order differenced se-</p><p>ries is white noise {wt} (i.e., ∇xt = xt − xt−1 = wt) and so is stationary.</p><p>In contrast, if xt = a + bt + wt, a linear trend with white noise errors, then</p><p>∇xt = xt−xt−1 = b+wt−wt−1, which is a stationary moving average process</p><p>rather than white noise. Notice that the consequence of differencing a linear</p><p>trend with white noise is an MA(1) process, whereas subtraction of the trend,</p><p>a+ bt, would give white noise. This raises an issue of whether or not it is sen-</p><p>sible to use differencing to remove a deterministic trend. The arima function</p><p>in R does not allow the fitted differenced models to include a constant. 
If you</p><p>wish to fit a differenced model to a deterministic trend using R you need to</p><p>difference, then mean adjust the differenced series to have a mean of 0, and</p><p>then fit an ARMA model to the adjusted differenced series using arima with</p><p>include.mean set to FALSE and d = 0.</p><p>A corresponding issue arises with simulations from an ARIMA model.</p><p>Suppose xt = a + bt + wt so ∇xt = yt = b + wt − wt−1. It follows directly</p><p>from the definitions that the inverse of yt = ∇xt is xt = x0 +</p><p>∑t</p><p>i=1 yi. If an</p><p>MA(1) model is fitted to the differenced time series, {yt}, the coefficient of</p><p>wt−1 is unlikely to be identified as precisely −1. It follows that the simulated</p><p>{xt} will have increasing variance (Exercise 3) about a straight line.</p><p>We can take first-order differences in R using the difference function diff.</p><p>For example, with the Australian electricity production series, the code below</p><p>plots the data and first-order differences of the natural logarithm of the series.</p><p>Note that in the layout command below the first figure is allocated two 1s</p><p>and is therefore plotted over half (i.e., the first two fourths) of the frame.</p><p>> www CBE Elec.ts layout(c(1, 1, 2, 3))</p><p>> plot(Elec.ts)</p><p>> plot(diff(Elec.ts))</p><p>> plot(diff(log(Elec.ts)))</p><p>The increasing trend is no longer apparent in the plots of the differenced series</p><p>(Fig. 7.1).</p><p>7.2.2 Integrated model</p><p>A series {xt} is integrated of order d, denoted as I(d), if the dth difference of</p><p>{xt} is white noise {wt}; i.e., ∇dxt = wt. Since ∇d ≡ (1 − B)d, where B is</p><p>the backward shift operator, a series {xt} is integrated of order d if</p><p>(1−B)dxt = wt (7.1)</p><p>The random walk is the special case I(1). The diff command from the pre-</p><p>vious section can be used to obtain higher-order differencing either by re-</p><p>peated application or setting the parameter d to the required values; e.g.,</p><p>7.2 Non-seasonal ARIMA models 139</p><p>(a)</p><p>Time</p><p>S</p><p>er</p><p>ie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>20</p><p>00</p><p>40</p><p>00</p><p>60</p><p>00</p><p>80</p><p>00</p><p>10</p><p>00</p><p>0</p><p>14</p><p>00</p><p>0</p><p>(b)</p><p>Time</p><p>D</p><p>iff</p><p>s</p><p>er</p><p>ie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>−</p><p>15</p><p>00</p><p>0</p><p>10</p><p>00</p><p>(c)</p><p>Time</p><p>D</p><p>iff</p><p>lo</p><p>g−</p><p>se</p><p>rie</p><p>s</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>−</p><p>0.</p><p>15</p><p>0.</p><p>05</p><p>0.</p><p>20</p><p>Fig. 7.1. (a) Plot of Australian electricity production series; (b) plot of the first-</p><p>order differenced series; (c) plot of the first-order differenced log-transformed series.</p><p>diff(diff(x)) and diff(x, d=2) would both produce second-order differ-</p><p>enced series of x. Second-order differencing may sometimes successfully reduce</p><p>a series with an underlying curve trend to white noise. A further parameter</p><p>(lag) can be used to set the lag of the differencing. By default, lag is set to</p><p>unity, but other values can be useful for removing additive seasonal effects.</p><p>For example, diff(x, lag=12) will remove both a linear trend and additive</p><p>seasonal effects in a monthly series.</p><p>7.2.3 Definition and examples</p><p>A time series {xt} follows an ARIMA(p, d, q) process if the dth differences of</p><p>the {xt} series are an ARMA(p, q) process. 
If we introduce yt = (1−B)dxt,</p><p>140 7 Non-stationary Models</p><p>then θp(B)yt = φq(B)wt. We can now substitute for yt to obtain the more</p><p>succinct form for an ARIMA(p, d, q) process as</p><p>θp(B)(1−B)dxt = φq(B)wt (7.2)</p><p>where θp and φq are polynomials of orders p and q, respectively. Some examples</p><p>of ARIMA models are:</p><p>(a) xt = xt−1+wt+βwt−1, where β is a model parameter. To see which model</p><p>this represents, collect together like terms, factorise them, and express</p><p>them in terms of the backward shift operator (1 − B)xt = (1 + βB)wt.</p><p>Comparing this with Equation (7.2), we can see that {xt} is ARIMA(0, 1,</p><p>1), which is sometimes called an integrated moving average model, denoted</p><p>as IMA(1, 1). In general, ARIMA(0, d, q) ≡ IMA(d, q).</p><p>(b) xt = αxt−1+xt−1−αxt−2+wt, where α is a model parameter. Rearranging</p><p>and factorising gives (1 − αB)(1 −B)xt = wt, which is ARIMA(1, 1, 0),</p><p>also known as an integrated autoregressive process and denoted as ARI(1,</p><p>1). In general, ARI(p, d) ≡ ARIMA(p, d, 0).</p><p>7.2.4 Simulation and fitting</p><p>An ARIMA(p, d, q) process can be fitted to data using the R function arima</p><p>with the parameter order set to c(p, d, q). An ARIMA(p, d, q) process can</p><p>be simulated in R by writing appropriate code. For example, in the code below,</p><p>data for the ARIMA(1, 1, 1) model xt = 0.5xt−1+xt−1−0.5xt−2+wt+0.3wt−1</p><p>are simulated and the model fitted to the simulated series to recover the</p><p>parameter estimates.</p><p>> set.seed(1)</p><p>> x for (i in 3:1000) x[i] arima(x, order = c(1, 1, 1))</p><p>Call:</p><p>arima(x = x, order = c(1, 1, 1))</p><p>Coefficients:</p><p>ar1 ma1</p><p>0.423 0.331</p><p>s.e. 0.043 0.045</p><p>sigma^2 estimated as 1.07: log likelihood = -1450, aic = 2906</p><p>Writing your own code has the advantage in that it helps to ensure that you</p><p>understand the model. However, an ARIMA simulation can be carried out</p><p>using the inbuilt R function arima.sim, which has the parameters model and</p><p>n to specify the model and the simulation length, respectively.</p><p>7.2 Non-seasonal ARIMA models 141</p><p>> x arima(x, order = c(1, 1, 1))</p><p>Call:</p><p>arima(x = x, order = c(1, 1, 1))</p><p>Coefficients:</p><p>ar1 ma1</p><p>0.557 0.250</p><p>s.e. 0.037 0.044</p><p>sigma^2 estimated as 1.08: log likelihood = -1457, aic = 2921</p><p>7.2.5 IMA(1, 1) model fitted to the beer production series</p><p>The Australian beer production series is in the second column of the dataframe</p><p>CBE in §7.2.1. The beer data is dominated by a trend of increasing beer pro-</p><p>duction over the period, so a simple integrated model IMA(1, 1) is fitted to</p><p>allow for this trend and a carryover of production from the previous month.</p><p>The IMA(1, 1) model is often appropriate because it represents a linear trend</p><p>with white noise added. The residuals are analysed using the correlogram (Fig.</p><p>7.2), which has peaks at yearly cycles and suggests that a seasonal term is</p><p>required.</p><p>> Beer.ts Beer.ima Beer.ima</p><p>Call:</p><p>arima(x = Beer.ts, order = c(0, 1, 1))</p><p>Coefficients:</p><p>ma1</p><p>-0.333</p><p>s.e. 0.056</p><p>sigma^2 estimated as 360: log likelihood = -1723, aic = 3451</p><p>> acf(resid(Beer.ima))</p><p>From the output above the fitted model is xt = xt−1+wt−0.33wt−1. Forecasts</p><p>can be obtained using this model, with t set to the value required for the</p><p>forecast. 
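For example, a one-step-ahead forecast can be computed directly from the fitted equation as the last observation plus the estimated MA coefficient times the last residual. A minimal sketch follows; it assumes the CBE data frame of §7.2.1 is in the workspace, uses the second (beer) column as stated above, and takes the start year 1958 used for the CBE data as an assumption.

> Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12)   # beer column of CBE; start year assumed
> Beer.ima <- arima(Beer.ts, order = c(0, 1, 1))
> n <- length(Beer.ts)
> Beer.ts[n] + coef(Beer.ima)["ma1"] * resid(Beer.ima)[n]   # one-step-ahead forecast by hand

Up to numerical details, this hand-computed value should agree with the one-step forecast produced by the predict function described next.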
Forecasts can also be obtained using the predict function in R with</p><p>the parameter n.ahead set to the number of values in the future. For example,</p><p>the production for the next year in the record is obtained using predict and</p><p>the total annual production for 1991 obtained by summation:</p><p>> Beer.1991 sum(Beer.1991$pred)</p><p>[1] 2365</p><p>142 7 Non-stationary Models</p><p>0.0 0.5 1.0 1.5 2.0</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.2. Australian beer series: correlogram of the residuals of the fitted IMA(1,</p><p>1) model</p><p>7.3 Seasonal ARIMA models</p><p>7.3.1 Definition</p><p>A seasonal ARIMA model uses differencing at a lag equal to the number of</p><p>seasons (s) to remove additive seasonal effects. As with lag 1 differencing to</p><p>remove a trend, the lag s differencing introduces a moving average term. The</p><p>seasonal ARIMA model includes autoregressive and moving average terms at</p><p>lag s. The seasonal ARIMA(p, d, q)(P , D, Q)s model can be most succinctly</p><p>expressed using the backward shift operator</p><p>ΘP (Bs)θp(B)(1−Bs)D(1−B)dxt = ΦQ(Bs)φq(B)wt (7.3)</p><p>where ΘP , θp, ΦQ, and φq are polynomials of orders P , p, Q, and q, respec-</p><p>tively. In general, the model is non-stationary, although if D = d = 0 and the</p><p>roots of the characteristic equation (polynomial terms on the left-hand side of</p><p>Equation (7.3)) all exceed unity in absolute value, the resulting model would</p><p>be stationary. Some examples of seasonal ARIMA models are:</p><p>(a) A simple AR model with a seasonal period of 12 units, denoted as</p><p>ARIMA(0, 0, 0)(1, 0, 0)12, is xt = αxt−12 + wt. Such a model would</p><p>be appropriate for monthly data when only the value in the month of the</p><p>previous year influences the current monthly value. The model is station-</p><p>ary when |α−1/12| > 1.</p><p>(b) It is common to find series with stochastic trends that nevertheless</p><p>have seasonal influences. The model in (a) above could be extended to</p><p>xt = xt−1 + αxt−12 − αxt−13 + wt. Rearranging and factorising gives</p><p>7.3 Seasonal ARIMA models 143</p><p>(1 − αB12)(1 − B)xt = wt or Θ1(B12)(1 − B)xt = wt, which, on com-</p><p>paring with Equation (7.3), is ARIMA(0, 1, 0)(1, 0, 0)12. Note that this</p><p>model could also be written ∇xt = α∇xt−12 +wt, which emphasises that</p><p>the change at time t depends on the change at the same time (i.e., month)</p><p>of the previous year. The model is non-stationary since the polynomial on</p><p>the left-hand side contains the term (1 − B), which implies that there</p><p>exists a unit root B = 1.</p><p>(c) A simple quarterly seasonal moving average model is xt = (1−βB4)wt =</p><p>wt−βwt−4. This is stationary and only suitable for data without a trend.</p><p>If the data also contain a stochastic trend, the model could be extended</p><p>to include first-order differences, xt = xt−1 + wt − βwt−4, which is an</p><p>ARIMA(0, 1, 0)(0, 0, 1)4 process. Alternatively, if the seasonal terms con-</p><p>tain a stochastic trend, differencing can be applied at the seasonal period</p><p>to give xt</p><p>= xt−4 + wt − βwt−4, which is ARIMA(0, 0, 0)(0, 1, 1)4.</p><p>You should be aware that differencing at lag s will remove a linear trend,</p><p>so there is a choice whether or not to include lag 1 differencing. 
If lag 1</p><p>differencing is included, when a linear trend is appropriate, it will introduce</p><p>moving average terms into a white noise series. As an example, consider a time</p><p>series of period 4 that is the sum of a linear trend, four additive seasonals,</p><p>and white noise:</p><p>xt = a+ bt+ s[t] + wt</p><p>where [t] is the remainder after division of t by 4, so s[t] = s[t−4]. First, consider</p><p>first-order differencing at lag 4 only. Then,</p><p>(1−B4)xt = xt − xt−4</p><p>= a+ bt− (a+ b(t− 4)) + s[t] − s[t−4] + wt − wt−4</p><p>= 4b+ wt − wt−4</p><p>Formally, the model can be expressed as ARIMA(0, 0, 0)(0, 1, 1)4 with a</p><p>constant term 4b. Now suppose we apply first-order differencing at lag 1 before</p><p>differencing at lag 4. Then,</p><p>(1−B4)(1−B)xt = (1−B4)(b+ s[t] − s[t−1] + wt − wt−1)</p><p>= wt − wt−1 − wt−4 + wt−5</p><p>which is a ARIMA(0, 1, 1)(0, 1, 1)4 model with no constant term.</p><p>7.3.2 Fitting procedure</p><p>Seasonal ARIMA models can potentially have a large number of parameters</p><p>and combinations of terms. Therefore, it is appropriate to try out a wide</p><p>range of models when fitting to data and to choose a best-fitting model using</p><p>144 7 Non-stationary Models</p><p>an appropriate criterion such as the AIC. Once a best-fitting model has been</p><p>found, the correlogram of the residuals should be verified as white noise. Some</p><p>confidence in the best-fitting model can be gained by deliberately overfitting</p><p>the model by including further parameters and observing an increase in the</p><p>AIC.</p><p>In R, this approach to fitting a range of seasonal ARIMA models is straight-</p><p>forward, since the fitting criteria can be called by nesting functions and the</p><p>‘up arrow’ on the keyboard used to recall the last command, which can then</p><p>be edited to try a new model. Any obvious terms, such as a differencing term</p><p>if there is a trend, should be included and retained in the model to reduce</p><p>the number of comparisons. The model can be fitted with the arima function,</p><p>which requires an additional parameter seasonal to specify the seasonal com-</p><p>ponents. In the example below, we fit two models with first-order terms to</p><p>the logarithm of the electricity production series. The first uses autoregressive</p><p>terms and the second uses moving average terms. The parameter d = 1 is re-</p><p>tained in both the models since we found in §7.2.1 that first-order differencing</p><p>successfully removed the trend in the series. The seasonal ARI model provides</p><p>the better fit since it has the smallest AIC.</p><p>> AIC (arima(log(Elec.ts), order = c(1,1,0),</p><p>seas = list(order = c(1,0,0), 12)))</p><p>[1] -1765</p><p>> AIC (arima(log(Elec.ts), order = c(0,1,1),</p><p>seas = list(order = c(0,0,1), 12)))</p><p>[1] -1362</p><p>It is straightforward to check a range of models by a trial-and-error approach</p><p>involving just editing a command on each trial to see if an improvement in the</p><p>AIC occurs. Alternatively, we could write a simple function that fits a range of</p><p>ARIMA models and selects the best-fitting model. This approach works better</p><p>when the conditional sum of squares method CSS is selected in the arima</p><p>function, as the algorithm is more robust. To avoid over parametrisation, the</p><p>consistent Akaike Information Criteria (CAIC; see Bozdogan, 1987) can be</p><p>used in model selection. 
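The example program referred to below loops over all combinations of orders up to specified maxima, refits the model by conditional sum of squares at each step, and keeps the fit with the smallest criterion value. A minimal sketch of such a search is given here; the CAIC penalty shown (-2 log L + (log n + 1) times the number of parameters), the try() guard, and the variable names are assumptions made for illustration rather than a transcription of the text's program.

get.best.arima <- function(x.ts, maxord = c(1, 1, 1, 1, 1, 1))
{
    best.aic <- Inf
    n <- length(x.ts)
    for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
        for (P in 0:maxord[4]) for (D in 0:maxord[5]) for (Q in 0:maxord[6])
        {
            fit <- try(arima(x.ts, order = c(p, d, q),
                             seas = list(order = c(P, D, Q), frequency(x.ts)),
                             method = "CSS"), silent = TRUE)
            if (inherits(fit, "try-error")) next
            # consistent AIC: -2 log-likelihood + (log n + 1) x number of parameters
            fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
            if (fit.aic < best.aic) {
                best.aic <- fit.aic
                best.fit <- fit
                best.model <- c(p, d, q, P, D, Q)
            }
        }
    list(best.aic, best.fit, best.model)
}

> best.arima.elec <- get.best.arima(log(Elec.ts), maxord = c(2, 2, 2, 2, 2, 2))
> best.fit.elec <- best.arima.elec[[2]]
> acf(resid(best.fit.elec))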
An example program follows.</p><p>get.best.arima best.arima.elec best.fit.elec acf( resid(best.fit.elec) )</p><p>> best.arima.elec [[3]]</p><p>[1] 0 1 1 2 0 2</p><p>> ts.plot( cbind( window(Elec.ts,start = 1981),</p><p>exp(predict(best.fit.elec,12)$pred) ), lty = 1:2)</p><p>From the code above, we see the best-fitting model using terms up to second</p><p>order is ARIMA(0, 1, 1)(2, 0, 2)12. Although higher-order terms could be tried</p><p>by increasing the values in maxord, this would seem unnecessary since the</p><p>residuals are approximately white noise (Fig. 7.3b). For the predicted values</p><p>(Fig. 7.3a), a biased correction factor could be used, although this would seem</p><p>unnecessary given that the residual standard deviation is small compared with</p><p>the predictions.</p><p>7.4 ARCH models</p><p>7.4.1 S&P500 series</p><p>Standard and Poors (of the McGraw-Hill companies) publishes a range of</p><p>financial indices and credit ratings. Consider the following time plot and cor-</p><p>relogram of the daily returns of the S&P500 Index1 (from January 2, 1990 to</p><p>December 31, 1999), available in the MASS library within R.</p><p>> library(MASS)</p><p>> data(SP500)</p><p>> plot(SP500, type = 'l')</p><p>> acf(SP500)</p><p>The time plot of the returns is shown in Figure 7.4(a), and at first glance</p><p>the series appears to be a realisation of a stationary process. However, on</p><p>1 The S&P500 Index is calculated from the stock prices of 500 large corpora-</p><p>tions. The time series in R is the returns of the S&P500 Index, defined as</p><p>100ln(SPIt/SPIt−1), where SPIt is the value of the S&P500 Index on trading</p><p>day t.</p><p>146 7 Non-stationary Models</p><p>(a)</p><p>Time</p><p>1982 1984 1986 1988 1990 1992</p><p>80</p><p>00</p><p>12</p><p>00</p><p>0</p><p>0.0 0.5 1.0 1.5 2.0</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.3. Electricity production series: (a) time plot for last 10 years, with added</p><p>predicted values (dotted); (b) correlogram of the residuals of the best-fitting seasonal</p><p>ARIMA model.</p><p>closer inspection, it seems that the variance is smallest in the middle third of</p><p>the series and greatest in the last third. The series exhibits periods of increased</p><p>variability, sometimes called volatility in the financial literature, although it</p><p>does not increase in a regular way. When a variance is not constant in time</p><p>but changes in a regular way, as in the airline and electricity data (where the</p><p>variance increased with the trend), the series is called heteroskedastic. If a</p><p>series exhibits periods of increased variance, so the variance is correlated in</p><p>time (as observed in the S&P500 data), the series exhibits volatility and is</p><p>called conditional heteroskedastic.</p><p>Note that the correlogram of a volatile series does not differ significantly</p><p>from white noise (Fig. 7.4b), but the series is non-stationary since the variance</p><p>is different at different times. If a correlogram appears to be white noise (e.g.,</p><p>Fig. 
7.4b), then volatility can be detected by looking at the correlogram of the squared values, since the squared values are equivalent to the variance (provided the series is adjusted to have a mean of zero).

Fig. 7.4. Standard and Poors returns of the S&P500 Index: (a) time plot; (b) correlogram.

The mean of the returns of the S&P500 Index between January 2, 1990 and December 31, 1999 is 0.0458. Although this is small compared with the variance, it accounts for an increase in the S&P500 Index from 360 to 1469 over the 2527 trading days. The correlogram of the squared mean-adjusted values of the S&P500 Index is given by

> acf((SP500 - mean(SP500))^2)

From this we can see that there is evidence of serial correlation in the squared values, so there is evidence of conditional heteroskedastic behaviour and volatility (Fig. 7.5).

Fig. 7.5. Returns of the Standard and Poors S&P500 Index: correlogram of the squared mean-adjusted values.

7.4.2 Modelling volatility: Definition of the ARCH model

In order to account for volatility, we require a model that allows for conditional changes in the variance. One approach to this is to use an autoregressive model for the variance process. This leads to the following definition. A series {ε_t} is first-order autoregressive conditional heteroskedastic, denoted ARCH(1), if

ε_t = w_t √(α_0 + α_1 ε_{t-1}^2)     (7.4)

where {w_t} is white noise with zero mean and unit variance and α_0 and α_1 are model parameters.

To see how this introduces volatility, square Equation (7.4) to calculate the variance:

Var(ε_t) = E(ε_t^2)
         = E(w_t^2) E(α_0 + α_1 ε_{t-1}^2)
         = E(α_0 + α_1 ε_{t-1}^2)
         = α_0 + α_1 Var(ε_{t-1})     (7.5)

since {w_t} has unit variance and {ε_t} has zero mean. If we compare Equation (7.5) with the AR(1) process x_t = α_0 + α_1 x_{t-1} + w_t, we see that the variance of an ARCH(1) process behaves just like an AR(1) model. Hence, in model fitting, a decay in the autocorrelations of the squared residuals should indicate whether an ARCH model is appropriate or not. The model should only be applied to a prewhitened residual series {ε_t} that is uncorrelated and contains no trends or seasonal changes, such as might be obtained after fitting a satisfactory SARIMA model.

7.4.3 Extensions and GARCH models

The first-order ARCH model can be extended to a pth-order process by including higher lags. An ARCH(p) process is given by

ε_t = w_t √(α_0 + Σ_{i=1}^p α_i ε_{t-i}^2)     (7.6)

where {w_t} is again white noise with zero mean and unit variance.
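The AR(1)-like behaviour of the squared values is easy to see in a simulation. The sketch below uses made-up parameter values (α_0 = 0.1, α_1 = 0.5, chosen only for illustration and not taken from the text) to generate an ARCH(1) series and compare the correlograms of the series and of its squared values.

set.seed(1)
n <- 5000
alpha0 <- 0.1                  # illustrative value only
alpha1 <- 0.5                  # illustrative value only
w <- rnorm(n)
eps <- rep(0, n)
for (t in 2:n) eps[t] <- w[t] * sqrt(alpha0 + alpha1 * eps[t - 1]^2)
layout(1:2)
acf(eps)      # the series itself shows little autocorrelation
acf(eps^2)    # the squared series decays roughly like an AR(1) correlogram

The series itself should resemble white noise, while the correlogram of the squared series shows the geometric decay anticipated by Equation (7.5).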
A further extension, widely used in financial applications, is the generalised ARCH model, denoted GARCH(q, p), which has the ARCH(p) model as the special case GARCH(0, p). A series {ε_t} is GARCH(q, p) if

ε_t = w_t √h_t     (7.7)

where

h_t = α_0 + Σ_{i=1}^p α_i ε_{t-i}^2 + Σ_{j=1}^q β_j h_{t-j}     (7.8)

and α_i and β_j (i = 0, 1, . . . , p; j = 1, . . . , q) are model parameters. In R, a GARCH model can be fitted using the garch function in the tseries library (Trapletti and Hornik, 2008). An example now follows.

7.4.4 Simulation and fitted GARCH model

In the following code, data are simulated for the GARCH(1, 1) model a_t = w_t √h_t, where h_t = α_0 + α_1 a_{t-1}^2 + β_1 h_{t-1}, with α_1 + β_1 < 1:

> set.seed(1)
> alpha0 <- 0.1   # parameter values consistent with the fitted intervals below
> alpha1 <- 0.4
> beta1 <- 0.2
> w <- rnorm(10000)
> a <- rep(0, 10000)
> h <- rep(0, 10000)
> for (i in 2:10000) {
    h[i] <- alpha0 + alpha1 * (a[i - 1]^2) + beta1 * h[i - 1]
    a[i] <- w[i] * sqrt(h[i])
  }
> acf(a)
> acf(a^2)

The series in a exhibits the GARCH characteristics of uncorrelated values (Fig. 7.6a) but correlated squared values (Fig. 7.6b).

In the following example, a GARCH model is fitted to the simulated series using the garch function, which can be seen to recover the original parameters, since these fall within the 95% confidence intervals. The default is GARCH(1, 1), which often provides an adequate model, but higher-order models can be specified with the parameter order = c(p, q) for some choice of p and q.

Fig. 7.6. Correlograms for GARCH series: (a) simulated series; (b) squared values of simulated series.

> library(tseries)
> a.garch <- garch(a, grad = "numerical", trace = FALSE)
> confint(a.garch)
     2.5 % 97.5 %
a0  0.0882  0.109
a1  0.3308  0.402
b1  0.1928  0.295

In the example above, we have used the parameter trace = FALSE to suppress output and a numerical estimate of the gradient, grad = "numerical", which is slightly more robust (in the sense of algorithmic convergence) than the default.

7.4.5 Fit to S&P500 series

The GARCH model is fitted to the S&P500 return series. The residual series {ŵ_t} of the GARCH model is calculated from

ŵ_t = ε_t / √ĥ_t

If the GARCH model is suitable, the residual series should appear to be a realisation of white noise with zero mean and unit variance. In the case of a GARCH(1, 1) model,

ĥ_t = α̂_0 + α̂_1 ε_{t-1}^2 + β̂_1 ĥ_{t-1}

with ĥ_1 = 0, for t = 2, . . . , n. The calculations are performed by the function garch. The first value in the residual series is not available (NA), so we remove it using [-1], and the correlograms are then found for the resultant residual and squared residual series:

> sp.garch <- garch(SP500, trace = FALSE)
> sp.res <- sp.garch$res[-1]
> acf(sp.res)
> acf(sp.res^2)

Both correlograms suggest that the residuals of the fitted GARCH model behave like white noise, indicating that a satisfactory fit has been obtained (Fig. 7.7).

Fig. 7.7.
GARCH model fitted to mean-adjusted S&P500 returns: (a) correlogram</p><p>of the residuals; (b) correlogram of the squared residuals.</p><p>2 Notice that a residual for time t = 1 cannot be calculated from this formula.</p><p>152 7 Non-stationary Models</p><p>7.4.6 Volatility in climate series</p><p>Recently there have been studies on volatility in climate series (e.g., Romilly,</p><p>2005). Temperature data (1850–2007; see Brohan et al. 2006) for the southern</p><p>hemisphere were extracted from the database maintained by the University</p><p>of East Anglia Climatic Research Unit and edited into a form convenient for</p><p>reading into R. In the following code, the series are read in, plotted (Fig. 7.8),</p><p>and a best-fitting seasonal ARIMA model obtained using the get.best.arima</p><p>function given in §7.3.2. Confidence intervals for the parameters were then</p><p>evaluated (the transpose t() was taken to provide these in rows instead of</p><p>columns).</p><p>Time</p><p>st</p><p>em</p><p>p.</p><p>ts</p><p>1850 1900 1950 2000</p><p>−</p><p>1.</p><p>0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>Fig. 7.8. The southern hemisphere temperature series.</p><p>> stemp stemp.ts plot(stemp.ts)</p><p>> stemp.best stemp.best[[3]]</p><p>[1] 1 1 2 2 0 1</p><p>> stemp.arima t( confint(stemp.arima) )</p><p>ar1 ma1 ma2 sar1 sar2 sma1</p><p>2.5 % 0.832 -1.45 0.326 0.858 -0.0250 -0.97</p><p>97.5 % 0.913 -1.31 0.453 1.004 0.0741 -0.85</p><p>The second seasonal AR component is not significantly different from zero,</p><p>and therefore the model is refitted leaving this component out:</p><p>> stemp.arima t( confint(stemp.arima) )</p><p>ar1 ma1 ma2 sar1 sma1</p><p>2.5 % 0.83 -1.45 0.324 0.924 -0.969</p><p>97.5 % 0.91 -1.31 0.451 0.996 -0.868</p><p>To check for goodness-of-fit, the correlogram of residuals from the ARIMA</p><p>model is plotted (Fig. 7.9a). In addition, to investigate volatility, the correlo-</p><p>gram of the squared residuals is found (Fig. 7.9b).</p><p>0.0 0.5 1.0 1.5 2.0 2.5</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0.0 0.5 1.0 1.5 2.0 2.5</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.9. Seasonal ARIMA model fitted to the temperature series: (a) correlogram</p><p>of the residuals; (b) correlogram of the squared residuals.</p><p>> stemp.res layout(1:2)</p><p>154 7 Non-stationary Models</p><p>> acf(stemp.res)</p><p>> acf(stemp.res^2)</p><p>There is clear evidence of volatility since the squared residuals are corre-</p><p>lated at most lags (Fig. 7.9b). Hence, a GARCH model is fitted to the residual</p><p>series:</p><p>> stemp.garch t(confint(stemp.garch))</p><p>a0 a1 b1</p><p>2.5 % 1.06e-05 0.0330 0.925</p><p>97.5 % 1.49e-04 0.0653 0.963</p><p>> stemp.garch.res acf(stemp.garch.res)</p><p>> acf(stemp.garch.res^2)</p><p>Based on the output above, we can see that the coefficients of the fitted</p><p>GARCH model are all statistically significant, since zero does not fall in any of</p><p>the confidence intervals. Furthermore, the correlogram of the residuals shows</p><p>no obvious patterns or significant values (Fig. 7.10). Hence, a satisfactory fit</p><p>has been obtained.</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 5 10 15 20 25 30</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 7.10. 
GARCH model fitted to the residuals of the seasonal ARIMA model</p><p>of the temperature series: (a) correlogram of the residuals; (b) correlogram of the</p><p>squared residuals.</p><p>7.6 Exercises 155</p><p>7.4.7 GARCH in forecasts and simulations</p><p>If a GARCH model is fitted to the residual errors of a fitted time series</p><p>model, it will not influence the average prediction at some point in time since</p><p>the mean of the residual errors is zero. Thus, single-point forecasts from a</p><p>fitted time series model remain unchanged when GARCH models are fitted</p><p>to the residuals. However, a fitted GARCH model will affect the variance of</p><p>simulated predicted values and thus result in periods of changing variance or</p><p>volatility in simulated series.</p><p>The main application of GARCH models is for simulation studies, espe-</p><p>cially in finance, insurance, teletraffic, and climatology. In all these applica-</p><p>tions, the periods of high variability tend to lead to untoward events, and it is</p><p>essential to model them in a realistic manner. Typical R code for simulation</p><p>was given in §7.4.4.</p><p>7.5 Summary of R commands</p><p>garch fits a GARCH (or ARCH) model to data</p><p>7.6 Exercises</p><p>In each of the following, {wt} is white noise with zero mean.</p><p>1. Identify each of the following as specific ARIMA models and state whether</p><p>or not they are stationary.</p><p>a) zt = zt−1 − 0.25zt−2 + wt + 0.5wt−1</p><p>b) zt = 2zt−1 − zt−2 + wt</p><p>c) zt = 0.5zt−1 + 0.5zt−2 + wt − 0.5wt−1 + 0.25wt−2</p><p>2. Identify the following as certain multiplicative seasonal ARIMA models</p><p>and find out whether they are invertible and stationary.</p><p>a) zt = 0.5zt−1 + zt−4 − 0.5zt−5 + wt − 0.3wt−1</p><p>b) zt = zt−1 + zt−12 − zt−13 + wt − 0.5wt−1 − 0.5wt−12 + 0.25wt−13</p><p>3. Suppose xt = a+ bt+ wt. Define yt = ∇xt.</p><p>a) Show that xt = x0 +</p><p>∑t</p><p>i=1 yi and identify x0.</p><p>b) Now suppose an MA(1) model is fitted to {yt} and the fitted model is</p><p>yt = b+wt + βwt−1. Show that a simulated {xt} will have increasing</p><p>variance about the line a+ bt unless β is precisely −1.</p><p>156 7 Non-stationary Models</p><p>4. The number of overseas visitors to New Zealand is recorded for each month</p><p>over the period 1977 to 1995 in the file osvisit.dat on the book website</p><p>(http://www.massey.ac.nz/∼pscowper/ts/osvisit.dat). Download the file</p><p>into R and carry out the following analysis. Your solution should include</p><p>any R commands, plots, and comments. Let xt be the number of overseas</p><p>visitors in time period t (in months) and zt = ln(xt).</p><p>a) Comment on the main features in the correlogram for {zt}.</p><p>b) Fit an ARIMA(1, 1, 0) model to {zt} giving the estimated AR pa-</p><p>rameter and the standard deviation of the residuals. Comment on the</p><p>correlogram of the residuals of this fitted ARIMA model.</p><p>c) Fit a seasonal ARIMA(1, 1, 0)(0, 1, 0)12 model to {zt} and plot the</p><p>correlogram of the residuals of this model. Has seasonal differencing</p><p>removed the seasonal effect? Comment.</p><p>d) Choose the best-fitting Seasonal ARIMA model from the following:</p><p>ARIMA(1, 1, 0)(1, 1, 0)12, ARIMA(0, 1, 1)(0, 1, 1)12, ARIMA(1, 1,</p><p>0)(0, 1, 1)12, ARIMA(0, 1, 1)(1, 1, 0)12, ARIMA(1, 1, 1)(1, 1, 1)12,</p><p>ARIMA(1, 1, 1)(1, 1, 0)12, ARIMA(1, 1, 1)(0, 1, 1)12. 
Base your choice</p><p>on the AIC, and comment on the correlogram of the residuals of the</p><p>best-fitting model.</p><p>e) Express the best-fitting model in part (d) above in terms of zt, white</p><p>noise components, and the backward shift operator (you will need</p><p>to write this out by hand, but it is not necessary to expand all the</p><p>factors).</p><p>f) Test the residuals from the best-fitting seasonal ARIMA model for</p><p>stationarity.</p><p>g) Forecast the number of overseas visitors for each month in the next</p><p>year (1996), and give the total number of visitors expected in 1996</p><p>under the fitted model. [Hint: To get the forecasts, you will need to use</p><p>the exponential function of the generated seasonal ARIMA forecasts</p><p>and multiply these by a bias correction factor based on the mean</p><p>square residual error.]</p><p>5. Use the get.best.arima function from §7.3.2 to obtain a best-fitting</p><p>ARIMA(p, d, q)(P , D, Q)12 for all p, d, q, P , D, Q ≤ 2 to the</p><p>logarithm of the Australian chocolate production series (in the file at</p><p>http://www.massey.ac.nz/∼pscowper/ts/cbe.dat). Check that the correl-</p><p>ogram of the residuals for the best-fitting model is representative of white</p><p>noise. Check the correlogram of the squared residuals. Comment on the</p><p>results.</p><p>6. This question uses the data in stockmarket.dat on the book website</p><p>http://www.massey.ac.nz/∼pscowper/ts/, which contains stock market</p><p>7.6 Exercises 157</p><p>data for seven cities for the period January 6, 1986 to December 31, 1997.</p><p>Download the data into R and put the data into a variable x. The first</p><p>three rows should be:</p><p>> x[1:3,]</p><p>Amsterdam Frankfurt London HongKong Japan Singapore NewYork</p><p>1 275.76 1425.56 1424.1 1796.59 13053.8 233.63 210.65</p><p>2 275.43 1428.54 1415.2 1815.53 12991.2 237.37 213.80</p><p>3 278.76 1474.24 1404.2 1826.84 13056.4 240.99 207.97</p><p>a) Plot the Amsterdam series and the first-order differences of the series.</p><p>Comment on the plots.</p><p>b) Fit the following models to the Amsterdam series, and select the best</p><p>fitting model: ARIMA(0, 1, 0); ARIMA(1, 1, 0), ARIMA(0, 1, 1),</p><p>ARIMA(1, 1, 1).</p><p>c) Produce the correlogram of the residuals of the best-fitting model and</p><p>the correlogram of the squared residuals. Comment.</p><p>d) Fit the following GARCH models to the residuals, and select the</p><p>best-fitting model: GARCH(0, 1), GARCH(1, 0), GARCH(1, 1), and</p><p>GARCH(0, 2). Give the estimated parameters of the best-fitting</p><p>model.</p><p>e) Plot the correlogram of the residuals from the best fitting GARCH</p><p>model. Plot the correlogram of the squared residuals from the best</p><p>fitting GARCH model, and comment on the plot.</p><p>7. Predict the monthly temperatures for 2008 using the model fitted to the</p><p>climate series in §7.4.6, and add these predicted values to a time plot of</p><p>the temperature series from 1990. Give an upper bound for the predicted</p><p>values based on a 95% confidence level. Simulate ten possible future tem-</p><p>perature scenarios for 2008. This will involve generating GARCH errors</p><p>and adding these to the predicted values from the fitted seasonal ARIMA</p><p>model.</p><p>8</p><p>Long-Memory Processes</p><p>8.1 Purpose</p><p>Some time series exhibit marked correlations at high lags, and they are re-</p><p>ferred to as long-memory processes. Long-memory is a feature of many geo-</p><p>physical time series. 
Flows in the Nile River have correlations at high lags,</p><p>and Hurst (1951) demonstrated that this affected the optimal design capacity</p><p>of a dam. Mudelsee (2007) shows that long-memory is a hydrological prop-</p><p>erty that can lead to prolonged drought or temporal clustering of extreme</p><p>floods. At a rather different scale, Leland et al. (1993) found that Ethernet</p><p>local area network (LAN) traffic appears to be statistically self-similar and a</p><p>long-memory process. They showed that the nature of congestion produced by</p><p>self-similar traffic differs drastically from that predicted by the traffic models</p><p>used at that time. Mandelbrot and co-workers investigated the relationship</p><p>between self-similarity and long term memory and played a leading role in</p><p>establishing fractal geometry as a subject of study.</p><p>8.2 Fractional differencing</p><p>Beran (1994) describes the qualitative features of a typical sample path (real-</p><p>isation) from a long-memory process. There are relatively long periods during</p><p>which the observations tend to stay at a high level and similar long periods</p><p>during which observations tend to be at a low level. There may appear to</p><p>be trends or cycles over short time periods, but these do not persist and the</p><p>entire series looks stationary. A more objective criterion is that sample corre-</p><p>lations rk decay to zero at a rate that is approximately proportional to k−λ</p><p>for some 0</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 159</p><p>Use R, DOI 10.1007/978-0-387-88698-5 8,</p><p>© Springer Science+Business Media, LLC 2009</p><p>160 8 Long-Memory Processes</p><p>its autocorrelation function. A stationary process xt with long-memory has</p><p>an autocorrelation function ρk that satisfies the condition</p><p>lim</p><p>k→∞</p><p>ρk = ck−λ</p><p>for some 0 1</p><p>2 . The</p><p>Hurst parameter, H, is defined by H = 1 − λ/2 and hence ranges from 1</p><p>2</p><p>to 1. The closer H is to 1, the more persistent the time series. If there is no</p><p>long-memory effect, then H = 1</p><p>2 .</p><p>A fractionally differenced ARIMA process {xt}, FARIMA(p, d, q), has the</p><p>form</p><p>φ(B)(1−B)dxt = ψ(B)wt (8.1)</p><p>for some − 1</p><p>2 cf d cf[1] for (i in 1:39) cf[i+1] 4) degrees of freedom has</p><p>kurtosis 6/(ν−4) and so is heavy tailed. If, for example, d = 0.45 and L = 40,</p><p>then</p><p>(1−B)−dwt = wt + 0.45wt−1 + 0.32625wt−2 + 0.2664375wt−3</p><p>+ · · ·+ 0.0657056wt−40</p><p>The autocorrelation function ρk of a FARIMA(0, d, 0) process tends towards</p><p>Γ (1− d)</p><p>Γ (d)</p><p>|k|2d−1</p><p>for large n. The process is stationary provided − 1</p><p>2 library(fracdiff)</p><p>> set.seed(1)</p><p>> fds.sim x fds.fit n L d fdc fdc[1] for (k in 2:L) fdc[k] y for (i in (L+1):n) {</p><p>csm y z.ar ns z par(mfcol = c(2, 2))</p><p>> plot(as.ts(x), ylab = "x")</p><p>> acf(x) ; acf(y) ; acf(z)</p><p>In Figure 8.1, we show the results when we generate a realisation {xt} from</p><p>a fractional difference model with no AR or MA parameters, FARIMA(0, 0.4,</p><p>0). The very slow decay in both the acf and pacf indicates long-memory. The</p><p>estimate of d is 0.3921. The fractionally differenced series, {yt}, appears to be</p><p>a realisation of DWN. If, instead of fitting a FARIMA(0, d, 0) model, we use</p><p>ar, the order selected is 38. 
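As a compact recap of the steps just described, the sketch below (with assumed object names, not the authors' exact listing) computes the coefficients of (1 - B)^{-d} using the recursion c_k = c_{k-1}(d + k - 1)/k, which for d = 0.45 reproduces the values 0.45, 0.32625, 0.2664375, ... quoted above, and then simulates and refits a FARIMA(0, 0.4, 0) series with fracdiff.sim and fracdiff.

library(fracdiff)

# Coefficients of the expansion of (1 - B)^(-d)
d <- 0.45
L <- 40
cf <- numeric(L)
cf[1] <- d
for (k in 2:L) cf[k] <- cf[k - 1] * (d + k - 1) / k
cf[1:3]        # 0.45, 0.32625, 0.2664375

# Simulate a FARIMA(0, 0.4, 0) realisation and re-estimate d
set.seed(1)
fds.sim <- fracdiff.sim(10000, d = 0.4)
x <- fds.sim$series
fds.fit <- fracdiff(x)
fds.fit$d      # the estimate should be close to 0.4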
The residuals from AR(38) also appear to be a</p><p>realisation from DWN, but the single-parameter FARIMA model is far more</p><p>parsimonious.</p><p>In Figure 8.2, we show the results when we generate a realisation {xt}</p><p>from a FARIMA(1, 0.4, 0) model with an AR parameter of 0.9. The estimates</p><p>of d and the AR parameter, obtained from fracdiff, are 0.429 and 0.884,</p><p>respectively. The estimate of the AR parameter made from the fractionally</p><p>differenced series {yt} using ar is 0.887, and the slight difference is small by</p><p>comparison with the estimated error and is of no practical importance. The</p><p>residuals appear to be a realisation of DWN (Fig. 8.2).</p><p>8.3 Fitting to simulated data 163</p><p>(a)</p><p>Time</p><p>x</p><p>0 2000 6000 10000</p><p>−</p><p>4</p><p>0</p><p>2</p><p>4</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>(a)</p><p>Lag</p><p>P</p><p>ar</p><p>tia</p><p>l A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(a)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 8.1. A simulated series with long-memory FARIMA(0, 0.4, 0): (a) time series</p><p>plot (x); (b) correlogram of series x; (c) partial correlogram of y; (d) correlogram</p><p>after fractional differencing (z).</p><p>> summary(fds.fit)</p><p>...</p><p>Coefficients:</p><p>Estimate Std. Error z value Pr(>|z|)</p><p>d 0.42904 0.01439 29.8 ar(y)</p><p>Coefficients:</p><p>1</p><p>0.887</p><p>Order selected 1 sigma^2 estimated as 1.03</p><p>164 8 Long-Memory Processes</p><p>(a)</p><p>Time</p><p>x</p><p>0 2000 6000 10000</p><p>−</p><p>30</p><p>−</p><p>10</p><p>10</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(b)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(c)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>0 10 20 30 40</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>(d)</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 8.2. A time series with long-memory FARIMA(1, 0.4, 0): (a) time series plot</p><p>(x); (b) correlogram of series x; (c) correlogram of the differenced series (y); (d)</p><p>correlogram of the residuals after fitting an AR(1) model (z).</p><p>8.4 Assessing evidence of long-term dependence</p><p>8.4.1 Nile minima</p><p>The data in the file Nilemin.txt are annual minimum water levels (mm)</p><p>of the Nile River for the years 622 to 1284, measured at the Roda Island</p><p>gauge near Cairo. It is likely that there may be a trend over a 600-year period</p><p>due to changing climatic conditions or changes to the channels around Roda</p><p>Island. We start the analysis by estimating and removing a linear trend fitted</p><p>by regression. Having done this, a choice of nar is taken as a starting value</p><p>for using fracdiff on the residuals from the regression.</p><p>Given the iterative</p><p>nature of the fitting process, the choice of initial values for nar and nma should</p><p>not be critical. The estimate of d with nar set at 5 is 0.3457. The best-fitting</p><p>model to the fractionally differenced series is AR(1) with parameter 0.14. We</p><p>now re-estimate d using fracdiff with nar equal to 1, but in this case the</p><p>estimate of d is unchanged. The residuals are a plausible realisation of DWN.</p><p>The acf of the squared residuals indicates that a GARCH model would be</p><p>appropriate. 
There is convincing evidence of long-term memory in the Nile</p><p>River minima flows (Fig. 8.3).</p><p>8.4 Assessing evidence of long-term dependence 165</p><p>0 100 200 300 400 500 600</p><p>10</p><p>00</p><p>13</p><p>00</p><p>Nile minima</p><p>Time</p><p>D</p><p>ep</p><p>th</p><p>(</p><p>m</p><p>m</p><p>)</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Detrended Nile minima</p><p>Fractionally differenced series</p><p>Time</p><p>m</p><p>m</p><p>0 100 200 300 400 500</p><p>−</p><p>20</p><p>0</p><p>10</p><p>0</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.3. Nile River minimum water levels: time series (top left); acf of detrended</p><p>time series (middle left); fractionally differenced detrended series (lower left); acf of</p><p>fractionally differenced series (top right); acf of residuals of AR(1) fitted to frac-</p><p>tionally differenced series (middle right); acf of squared residuals of AR(1) (lower</p><p>right).</p><p>8.4.2 Bellcore Ethernet data</p><p>The data in LAN.txt are the numbers of packet arrivals (bits) in 4000 consecu-</p><p>tive 10-ms intervals seen on an Ethernet at the Bellcore Morristown Research</p><p>and Engineering facility. A histogram of the numbers of bits is remarkably</p><p>skewed, so we work with the logarithm of one plus the number of bits. The</p><p>addition of 1 is needed because there are many intervals in which no pack-</p><p>ets arrive. The correlogram of this transformed time series suggests that a</p><p>FARIMA model may be suitable.</p><p>The estimate of d, with nar set at 48, is 0.3405, and the fractionally dif-</p><p>ferenced series has no substantial correlations. Nevertheless, the function ar</p><p>fits an AR(26) model to this series, and the estimate of the standard devi-</p><p>ation of the errors, 2.10, is slightly less than the standard deviation of the</p><p>fractionally differenced series, 2.13. There is noticeable autocorrelation in the</p><p>series of squared residuals from the AR(26) model, which is a feature of time</p><p>series that have bursts of activity, and this can be modelled as a GARCH</p><p>166 8 Long-Memory Processes</p><p>ln(bits+1)</p><p>Time</p><p>x</p><p>0 1000 2000 3000 4000</p><p>0</p><p>2</p><p>4</p><p>6</p><p>8</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>ln(bits+1)</p><p>Fractionally differenced series</p><p>Time</p><p>y</p><p>0 1000 2000 3000 4000</p><p>−</p><p>6</p><p>0</p><p>4</p><p>8</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25 30 35</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.4. 
Bellcore local area network (LAN) traffic, ln(1+number of bits): time</p><p>series (top left); acf of time series (middle left); fractionally differenced series (lower</p><p>left); acf of fractionally differenced series (top right); acf of residuals of AR(26) fitted</p><p>to fractionally differenced series (middle right); acf of squared residuals of AR(26)</p><p>(lower right).</p><p>process (Fig. 8.4). In Exercises 1 and 2, you are asked to look at this case in</p><p>more detail and, in particular, investigate whether an ARMA model is more</p><p>parsimonious.</p><p>8.4.3 Bank loan rate</p><p>The data in mprime.txt are of the monthly percentage US Federal Reserve</p><p>Bank prime loan rate,2 courtesy of the Board of Governors of the Federal</p><p>Reserve System, from January 1949 until November 2007. The time series is</p><p>plotted in the top left of Figure 8.5 and looks as though it could be a realisation</p><p>of a random walk. It also has a period of high variability. The correlogram</p><p>shows very high correlations at smaller lags and substantial correlation up to</p><p>lag 28. Neither a random walk nor a trend is a suitable model for long-term</p><p>2 Data downloaded from Federal Reserve Economic Data at the Federal Reserve</p><p>Bank of St. Louis.</p><p>8.5 Simulation 167</p><p>simulation of interest rates in a stable economy. Instead, we fit a FARIMA</p><p>model, which has the advantage of being stationary.</p><p>Interest rate</p><p>Time</p><p>x</p><p>0 100 200 300 400 500 600 700</p><p>5</p><p>10</p><p>20</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Interest rate</p><p>Fractionally differenced series</p><p>Time</p><p>y</p><p>0 100 200 300 400 500 600</p><p>5</p><p>10</p><p>20</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fractionally differenced series</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Residuals</p><p>0 5 10 15 20 25</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Squared residuals</p><p>Fig. 8.5. Federal Reserve Bank interest rates: time series (top left); acf of time series</p><p>(middle left); fractionally differenced series (lower left); acf of fractionally differenced</p><p>series (upper right); acf of residuals of AR(17) fitted to fractionally differenced series</p><p>(middle right); acf of squared residuals of AR(17) (lower right).</p><p>The estimate of d is almost 0, and this implies that the decay of the</p><p>correlations from an initial high value is more rapid than it would be for a</p><p>FARIMA model. The fitted AR model has an order of 17 and is not entirely</p><p>satisfactory because of the statistically significant autocorrelation at lag 1 in</p><p>the residual series. You are asked to do better in Exercise 3. The substantial</p><p>autocorrelations of the squared residuals from the AR(17) model indicate that</p><p>a GARCH model is needed. This has been a common feature of all three time</p><p>series considered in this section.</p><p>8.5 Simulation</p><p>FARIMA models are important for simulation because short-memory models,</p><p>which ignore evidence of long-memory, can lead to serious overestimation of</p><p>168 8 Long-Memory Processes</p><p>system performance. 
This has been demonstrated convincingly at scales from</p><p>reservoirs to routers in telecommunication networks.</p><p>Realistic models for simulation will typically need to incorporate GARCH</p><p>and heavy-tailed distributions for the basic white noise series. The procedure</p><p>is to fit a GARCH model to the residuals from the AR model fitted to the</p><p>fractionally differenced series. Then the residuals from the GARCH model</p><p>are calculated and a suitable probability distribution can be fitted to these</p><p>residuals (Exercise 5). Having fitted the models, the simulation proceeds by</p><p>generating random numbers from the fitted probability model fitted to the</p><p>GARCH residuals.</p><p>8.6 Summary of additional commands used</p><p>fracdiff fits a fractionally differenced, FARIMA(p, d, q), model</p><p>fracdiff.sim simulates a FARIMA model</p><p>8.7 Exercises</p><p>1. Read the LAN data into R.</p><p>a) Plot a boxplot and histogram of the number of bits.</p><p>b) Calculate the skewness and kurtosis of the number of bits.</p><p>c) Repeat (a) and (b) for the logarithm of 1 plus the number of bits.</p><p>d) Repeat (a) for the residuals after fitting an AR model to the fraction-</p><p>ally differenced series.</p><p>e) Fit an ARMA(p, q) model to the fractionally differenced series. Is this</p><p>an improvement on the AR(p) model?</p><p>f) In the text, we set nar in fracdiff at 48. Repeat the analysis with</p><p>nar equal to 2.</p><p>2. Read the LAN data into R.</p><p>a) Calculate the number of bits in 20-ms intervals, and repeat the analysis</p><p>using this time series.</p><p>b) Calculate the number of bits in 40-ms intervals, and repeat the analysis</p><p>using this time series.</p><p>c) Repeat (a) and (b) for realisations from FARIMA(0, d, 0).</p><p>3. Read the Federal Reserve Bank data into R.</p><p>a) Fit a random walk model and comment.</p><p>b) Fit an ARMA(p, q) model and comment.</p><p>8.7 Exercises 169</p><p>4. The rescaled adjusted range is calculated for a time series {xt} of length</p><p>m as follows. First compute the mean, x̄, and standard deviation, s, of</p><p>the series. Then calculate the adjusted partial sums</p><p>Sk =</p><p>k∑</p><p>t=1</p><p>xt − kx̄</p><p>for k = 1, . . . ,m. Notice that S(m) must equal zero and that large devia-</p><p>tions from 0 are indicative of persistence. The rescaled adjusted range</p><p>Rm = {max(S1, . . . , Sm)− min(S1, . . . , Sm)}/s</p><p>is the difference</p><p>. . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>9.8.2 AR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>xiv Contents</p><p>9.8.3 Derivation of spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193</p><p>9.9 Autoregressive spectrum estimation . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10 Finer details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10.1 Leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194</p><p>9.10.2 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195</p><p>9.10.3 Daniell windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196</p><p>9.10.4 Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196</p><p>9.10.5 Tapering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197</p><p>9.10.6 Spectral analysis compared with wavelets . . . . . . 
. . . . . . . 197</p><p>9.11 Summary of additional commands used . . . . . . . . . . . . . . . . . . . . 197</p><p>9.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198</p><p>10 System Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2 Identifying the gain of a linear system . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2.1 Linear system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201</p><p>10.2.2 Natural frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202</p><p>10.2.3 Estimator of the gain function . . . . . . . . . . . . . . . . . . . . . . 202</p><p>10.3 Spectrum of an AR(p) process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203</p><p>10.4 Simulated single mode of vibration system . . . . . . . . . . . . . . . . . . 203</p><p>10.5 Ocean-going tugboat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205</p><p>10.6 Non-linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207</p><p>10.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208</p><p>11 Multivariate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.2 Spurious regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211</p><p>11.3 Tests for unit roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214</p><p>11.4 Cointegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216</p><p>11.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216</p><p>11.4.2 Exchange rate series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218</p><p>11.5 Bivariate and multivariate white noise . . . . . . . . . . . . . . . . . . . . . 219</p><p>11.6 Vector autoregressive models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220</p><p>11.6.1 VAR model fitted to US economic series . . . . . . . . . . . . . . 222</p><p>11.7 Summary of R commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227</p><p>11.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227</p><p>12 State Space Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229</p><p>12.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229</p><p>12.2 Linear state space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230</p><p>12.2.1 Dynamic linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230</p><p>12.2.2 Filtering* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231</p><p>12.2.3 Prediction* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232</p><p>12.2.4 Smoothing* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233</p><p>12.3 Fitting to simulated univariate time series . . . . . . . . . . . . . . . . . . 
234</p><p>Contents xv</p><p>12.3.1 Random walk plus noise model . . . . . . . . . . . . . . . . . . . . . . 234</p><p>12.3.2 Regression model with time-varying coefficients . . . . . . . 236</p><p>12.4 Fitting to univariate time series . . . . . . . . . . . . . . . . . . . . . . . . . . . 238</p><p>12.5 Bivariate time series – river salinity . . . . . . . . . . . . . . . . . . . . . . . . 239</p><p>12.6 Estimating the variance matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 242</p><p>12.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243</p><p>12.8 Summary of additional commands used . . . . . . . . . . . . . . . . . . . . 244</p><p>12.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244</p><p>References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247</p><p>Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249</p><p>1</p><p>Time Series Data</p><p>1.1 Purpose</p><p>Time series are analysed to understand the past and to predict the future,</p><p>enabling managers or policy makers to make properly informed decisions.</p><p>A time series analysis quantifies the main features in data and the random</p><p>variation. These reasons, combined with improved computing power, have</p><p>made time series methods widely applicable in government, industry, and</p><p>commerce.</p><p>The Kyoto Protocol is an amendment to the United Nations Framework</p><p>Convention on Climate Change. It opened for signature in December 1997 and</p><p>came into force on February 16, 2005. The arguments for reducing greenhouse</p><p>gas emissions rely on a combination of science, economics, and time series</p><p>analysis. Decisions made in the next few years will affect the future of the</p><p>planet.</p><p>During 2006, Singapore Airlines placed an initial order for twenty Boeing</p><p>787-9s and signed an order of intent to buy twenty-nine new Airbus planes,</p><p>twenty A350s, and nine A380s (superjumbos). The airline’s decision to expand</p><p>its fleet relied on a combination of time series analysis of airline passenger</p><p>trends and corporate plans for maintaining or increasing its market share.</p><p>Time series methods are used in everyday operational decisions. For exam-</p><p>ple, gas suppliers in the United Kingdom have to place orders for gas from the</p><p>offshore fields one day ahead of the supply. Variation about the average for</p><p>the time of year depends on temperature and, to some extent, the wind speed.</p><p>Time series analysis is used to forecast demand from the seasonal average with</p><p>adjustments based on one-day-ahead weather forecasts.</p><p>Time series models often form the basis of computer simulations. Some</p><p>examples are assessing different strategies for control of inventory using a</p><p>simulated time series of demand; comparing designs of wave power devices us-</p><p>ing a simulated series of sea states; and simulating daily rainfall to investigate</p><p>the long-term environmental effects of proposed water management policies.</p><p>P.S.P. Cowpertwait and A.V. 
Metcalfe, Introductory Time Series with R, 1</p><p>Use R, DOI 10.1007/978-0-387-88698-5 1,</p><p>© Springer Science+Business Media, LLC 2009</p><p>2 1 Time Series Data</p><p>1.2 Time series</p><p>In most branches of science, engineering, and commerce, there are variables</p><p>measured sequentially in time. Reserve banks record interest rates and ex-</p><p>change rates each day. The government statistics department will compute</p><p>the country’s gross domestic product on a yearly basis. Newspapers publish</p><p>yesterday’s noon temperatures for capital cities from around the world. Me-</p><p>teorological offices record rainfall at many different sites with differing reso-</p><p>lutions. When</p><p>between the largest surplus and the greatest deficit. If we</p><p>have a long time series of length n, we can calculate Rm for values of m</p><p>from, for example, 20 upwards to n in steps of 10. When m is less than</p><p>n, we can calculate n−m values for Rm by starting at different points in</p><p>the series. Hurst plotted ln(Rm) against ln(m) for many long time series.</p><p>He noticed that lines fitted through the points were usually steeper for</p><p>geophysical series, such as streamflow, than for realisations of independent</p><p>Gaussian variables (Gaussian DWN). The average value of the slope (H)</p><p>of these lines for the geophysical time series was 0.73, significantly higher</p><p>than the average slope of 0.5 for the independent sequences. The linear</p><p>logarithmic relationship is equivalent to</p><p>Rm ∝ mH</p><p>Plot ln(Rm) against ln(m) for the detrended Nile River minimum flows.</p><p>5. a) Refer to the data in LAN.txt and the time series of logarithms of the</p><p>numbers of packet arrivals, with 1 added, in 10-ms intervals calcu-</p><p>lated from the numbers of packet arrivals. Fit a GARCH model to the</p><p>residuals from the AR(26) model fitted to the fractionally differenced</p><p>time series.</p><p>b) Calculate the residuals from the GARCH model, and fit a suitable</p><p>distribution to these residuals.</p><p>c) Calculate the mean number of packets arriving in 10-ms intervals. Set</p><p>up a simulation model for a router that has a realisation of the model</p><p>in (a) as input and can send out packets at a constant rate equal to</p><p>the product of the mean number of packets arriving in 10-ms intervals</p><p>with a factor g, which is greater than 1.</p><p>d) Code the model fitted in (a) so that it will provide simulations of</p><p>time series of the number of packets that are the input to the router.</p><p>Remember that you first obtain a realisation for ln(number of packets</p><p>+ 1) and then take the exponential of this quantity, subtract 1, and</p><p>round the result to the nearest integer.</p><p>170 8 Long-Memory Processes</p><p>e) Compare the results of your simulation with a model that assumes</p><p>Gaussian white noise for the residuals of the AR(26) model for g =</p><p>1.05, 1.1, 1.5, and 2.</p><p>9</p><p>Spectral Analysis</p><p>9.1 Purpose</p><p>Although it follows from the definition of stationarity that a stationary time</p><p>series model cannot have components at specific frequencies, it can never-</p><p>theless be described in terms of an average frequency composition. Spectral</p><p>analysis distributes the variance of a time series over frequency, and there are</p><p>many applications. It can be used to characterise wind and wave forces, which</p><p>appear random but have a frequency range over which most of the power is</p><p>concentrated. 
The British Standard BS6841, “Measurement and evaluation of</p><p>human exposure to whole-body vibration”, uses spectral analysis to quantify</p><p>exposure of personnel to vibration and repeated shocks. Many of the early</p><p>applications of spectral analysis were of economic time series, and there has</p><p>been recent interest in using spectral methods for economic dynamics analysis</p><p>(Iacobucci and Noullez, 2005).</p><p>More generally, spectral analysis can be used to detect periodic signals</p><p>that are corrupted by noise. For example, spectral analysis of vibration signals</p><p>from machinery such as turbines and gearboxes is used to expose faults before</p><p>they cause catastrophic failure. The warning is given by the emergence of new</p><p>peaks in the spectrum. Astronomers use spectral analysis to measure the red</p><p>shift and hence deduce the speeds of galaxies relative to our own.</p><p>9.2 Periodic signals</p><p>9.2.1 Sine waves</p><p>Any signal that has a repeating pattern is periodic, with a period equal to</p><p>the length of the pattern. However, the fundamental periodic signal in mathe-</p><p>matics is the sine wave. Joseph Fourier (1768–1830) showed that sums of sine</p><p>waves can provide good approximations to most periodic signals, and spectral</p><p>analysis is based on sine waves.</p><p>P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 171</p><p>Use R, DOI 10.1007/978-0-387-88698-5 9,</p><p>© Springer Science+Business Media, LLC 2009</p><p>172 9 Spectral Analysis</p><p>Spectral analysis can be confusing because different authors use different</p><p>notation. For example, frequency can be given in radians or cycles per sam-</p><p>pling interval, and frequency can be treated as positive or negative, or just</p><p>positive. You need to be familiar with the sine wave defined with respect to</p><p>a unit circle, and this relationship is so fundamental that the sine and cosine</p><p>functions are called circular functions.</p><p>Imagine a circle with unit radius and centre at the origin, O, with the</p><p>radius rotating at a rotational velocity of ω radians per unit of time. Let t</p><p>be time. The angle, ωt, in radians is measured as the distance around the</p><p>circumference from the positive real (horizontal) axis, with the anti-clockwise</p><p>rotation defined as positive (Fig. 9.1). So, if the radius sweeps out a full circle,</p><p>it has been rotated through an angle of 2π radians. The time taken for this</p><p>one revolution, or cycle, is 2π/ω and is known as the period.</p><p>The sine function, sin(ωt), is the projection of the radius onto the vertical</p><p>axis, and the cosine function, cos(ωt), is the projection of the radius onto the</p><p>horizontal axis. In general, a sine wave of frequency ω, amplitude A, and phase</p><p>ψ is</p><p>A sin(ωt+ ψ) (9.1)</p><p>The positive phase shift represents an advance of ψ/2π cycles. In spectral</p><p>analysis, it is convenient to refer to specific sine waves as harmonics. 
We rely</p><p>on the trigonometric identity that expresses a general sine wave as a weighted</p><p>sum of sine and cosine functions:</p><p>A sin(ωt+ ψ) = A cos(ψ)sin(ωt) +A sin(ψ)cos(ωt) (9.2)</p><p>Equation (9.2) is fundamental for spectral analysis because a sampled sine</p><p>wave of any given amplitude and phase can be fitted by a linear regression</p><p>model with the sine and cosine functions as predictor variables.</p><p>9.2.2 Unit of measurement of frequency</p><p>The SI1 unit of frequency is the hertz (Hz), which is 1 cycle per second and</p><p>equivalent to 2π radians per second. The hertz is a derived SI unit, and in</p><p>terms of fundamental SI units it has unit s−1. A frequency of f cycles per</p><p>second is equivalent to ω radians per second, where</p><p>ω = 2πf ⇔ f =</p><p>ω</p><p>2π</p><p>(9.3)</p><p>The mathematics is naturally expressed in radians, but Hz is generally used</p><p>in physical applications. By default, R plots have a frequency axis calibrated</p><p>in cycles per sampling interval.</p><p>1 SI is the International System of Units, abbreviated from the French Le Systéme</p><p>International d’Unités.</p><p>9.3 Spectrum 173</p><p>−1.0 −0.5 0.0 0.5 1.0</p><p>−</p><p>1.</p><p>0</p><p>−</p><p>0.</p><p>5</p><p>0.</p><p>0</p><p>0.</p><p>5</p><p>1.</p><p>0</p><p>x (Real axis)</p><p>y</p><p>(I</p><p>m</p><p>ag</p><p>in</p><p>ar</p><p>y</p><p>ax</p><p>is</p><p>)</p><p>eiωωt</p><p>ωωt</p><p>cos((ωωt))</p><p>sin((ωωt))</p><p>Fig. 9.1. Angle ωt is the length along the radius. The projection of the radius onto</p><p>the x and y axes is cos(ωt) and sin(ωt), respectively.</p><p>9.3 Spectrum</p><p>9.3.1 Fitting sine waves</p><p>Suppose we have a time series of length n, {xt : t = 1, . . . , n}, where it is</p><p>convenient to arrange that n is even, if necessary by dropping the first or last</p><p>term. We can fit a time series regression with xt as the response and n − 1</p><p>predictor variables:</p><p>cos</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>, cos</p><p>(</p><p>4πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>4πt</p><p>n</p><p>)</p><p>, cos</p><p>(</p><p>6πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>6πt</p><p>n</p><p>)</p><p>, . . . ,</p><p>cos</p><p>(</p><p>2(n/2−1)πt</p><p>n</p><p>)</p><p>, sin</p><p>(</p><p>2(n/2−1)πt</p><p>n</p><p>)</p><p>, cos (πt).</p><p>We will denote the estimated coefficients by a1, b1, a2, b2, a3, b3, . . . , an/2−1,</p><p>bn/2−1, an/2, respectively, so</p><p>xt = a0 + a1cos</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>+ b1sin</p><p>(</p><p>2πt</p><p>n</p><p>)</p><p>+ · · ·</p><p>+ an/2−1cos</p><p>(</p><p>2(n/2− 1)πt</p><p>n</p><p>)</p><p>+ bn/2−1sin</p><p>(</p><p>2(n/2− 1)πt</p><p>n</p><p>)</p><p>+ an/2cos (πt)</p><p>Since the number of coefficients equals the length of the time series, there are</p><p>no degrees of freedom for error. The intercept term, a0, is just the mean x. The</p><p>lowest frequency is one cycle, or 2π radians, per record length, which is 2π/n</p><p>174 9 Spectral Analysis</p><p>radians per sampling interval. A general frequency, in this representation, is m</p><p>cycles per record length, equivalent to 2πm/n radians per sampling interval,</p><p>where m is an integer between 1 and n/2. The highest frequency is π radians</p><p>per sampling interval, or equivalently 0.5 cycles per sampling interval, and it</p><p>makes</p><p>n/2 cycles in the record length, alternating between −1 and +1 at the</p><p>sampling points. 
This regression model is a finite Fourier series for a discrete</p><p>time series.2</p><p>We will refer to the sine wave that makes m cycles in the record length</p><p>as the mth harmonic, and the first harmonic is commonly referred to as the</p><p>fundamental frequency . The amplitude of the mth harmonic is</p><p>Am =</p><p>√</p><p>a2</p><p>m + b2m</p><p>Parseval’s Theorem is the key result, and it expresses the variance of the time</p><p>series as a sum of n/2 components at integer frequencies from 1 to n/2 cycles</p><p>per record length:</p><p>1</p><p>n</p><p>∑n</p><p>t=1 x</p><p>2</p><p>t = A2</p><p>0 + 1</p><p>2</p><p>∑(n/2)−1</p><p>m=1 A2</p><p>m +A2</p><p>n/2</p><p>Var(x) = 1</p><p>2</p><p>∑(n/2)−1</p><p>m=1 A2</p><p>m +A2</p><p>n/2</p><p>(9.4)</p><p>Parseval’s Theorem follows from the fact that the sine and cosine terms used</p><p>as explanatory terms in the time series regression are uncorrelated, together</p><p>with the result for the variance of a linear combination of variables (Exer-</p><p>cise 1). A summary of the harmonics, and their corresponding frequencies</p><p>and periods,3 follows:</p><p>harmonic period frequency frequency contribution</p><p>(cycle/samp. int.) (rad/samp. int.) to variance</p><p>1 n 1/n 2π/n 1</p><p>2A</p><p>2</p><p>1</p><p>2 n/2 2/n 4π/n 1</p><p>2A</p><p>2</p><p>2</p><p>3 n/3 3/n 6π/n 1</p><p>2A</p><p>2</p><p>3</p><p>...</p><p>...</p><p>...</p><p>...</p><p>...</p><p>n/2− 1 n/(n/2− 1) (n/2− 1)/n (n− 2)π/n 1</p><p>2A</p><p>2</p><p>n/2−1</p><p>n/2 2 1/n π A2</p><p>n/2</p><p>Although we have introduced the Am in the context of a time series regres-</p><p>sion, the calculations are usually performed with the fast fourier transform</p><p>algorithm (FFT). We say more about this in §9.7.</p><p>2 A Fourier series is an approximation to a signal defined for continuous time over</p><p>a finite period. The signal may have discontinuities. The Fourier series is the sum</p><p>of an infinite number of sine and cosine terms.</p><p>3 The period of a sine wave is the time taken for 1 cycle and is the reciprocal of</p><p>the frequency measured in cycles per time unit.</p><p>9.4 Spectra of simulated series 175</p><p>9.3.2 Sample spectrum</p><p>A plot of A2</p><p>m, as spikes, against m is a Fourier line spectrum. The raw pe-</p><p>riodogram in R is obtained by joining the tips of the spikes in the Fourier</p><p>line spectrum to give a continuous plot and scaling it so that the area equals</p><p>the variance. The periodogram distributes the variance over frequency, but it</p><p>has two drawbacks. The first is that the precise set of frequencies is arbitrary</p><p>inasmuch as it depends on the record length. The second is that the peri-</p><p>odogram does not become smoother as the length of the time series increases</p><p>but just includes more spikes packed closer together. The remedy is to smooth</p><p>the periodogram by taking a moving average of spikes before joining the tips.</p><p>The smoothed periodogram is also known as the (sample) spectrum. We de-</p><p>note the spectrum of {xt} by Cxx(), with an argument ω or f depending on</p><p>whether it is expressed in radians or cycles per sampling interval. However,</p><p>the smoothing will reduce the heights of peaks, and excessive smoothing will</p><p>blur the features we are looking for. It is a good idea to consider spectra</p><p>with different amounts of smoothing, and this is made easy for us with the R</p><p>function spectrum. 
The argument span is the number of spikes in the moving average,⁴ and a useful guide for an initial value, for time series of lengths up to a thousand, is the square root of twice the record length.

4 Weighted moving averages can be used, and the choice of weights determines the spectral window.

The time series should either be mean adjusted (mean subtracted) before calculating the periodogram, or the a₀ spike should be set to 0 before averaging spikes, to avoid inflating the low-frequency contributions to the variance. In R, the spectrum function goes further than this and removes a linear trend from the series before calculating the periodogram. It seems appropriate to fit a trend and remove it if the existence of a trend in the underlying stochastic process is plausible. Although this will usually pertain, there may be cases in which you wish to attribute an apparent trend in a time series to a fractionally differenced process, and prefer not to remove a fitted trend. You could then use the fft function and average the spikes to obtain a spectrum of the unadjusted time series (§9.7).

The spectrum does not retain the phase information, though in the case of stationary time series all phases are equally likely and the sample phases have no theoretical interest.

9.4 Spectra of simulated series

9.4.1 White noise

We will start by generating an independent random sample from a normal distribution. This is a realisation of a Gaussian white noise process. If no span is specified in the spectrum function, R will use the heights of the Fourier line spectrum spikes to construct a spectrum with no smoothing.⁵ We compare this with a span of 65 in Figure 9.2.

> layout(1:2)
> set.seed(1)
> x <- rnorm(2048)
> spectrum(x, log = c("no"))
> spectrum(x, span = 65, log = c("no"))

5 By default, spectrum applies a taper to the first 10% and last 10% of the series and pads the series to a highly composite length. However, 2048 is highly composite, and the taper has little effect on a realisation of this length.

Fig. 9.2. Realisation of Gaussian white noise: (a) raw periodogram; (b) spectrum with span = 65.

The default is a logarithmic scale for the spectrum, but we have changed this by setting the log parameter to "no". The frequency axis is cycles per sampling interval.

The second spectrum is much smoother as a result of the moving average of 65 adjacent spikes. Both spectra are scaled so that their area is one-half the variance of the time series. The rationale for this is that the spectrum is defined from −0.5 to 0.5, and is symmetric about 0. However, in the context of spectral analysis, there is no useful distinction between positive and negative frequencies, and it is usual to plot the spectrum over [0, 0.5], scaled so that its area equals the variance of the signal. So, for a report it is better to multiply the R spectrum by a factor of 2 and to use hertz rather than cycles per sampling interval for frequency.
You can easily do this with the following R commands, assuming the width of the sampling interval is Del (which would need to be assigned first):

> x.spec <- spectrum(x, span = 65, log = c("no"))
> spx <- x.spec$freq / Del
> spy <- 2 * x.spec$spec * Del
> plot(spx, spy, xlab = "Hz", ylab = "variance/Hz", type = "l")

The theoretical spectrum for independent random variation with variance of unity is flat at 2 over the range [0, 0.5]. The name white noise is chosen to be reminiscent of white light made up from equal contributions of energy across the visible spectrum. An explanation for the flat spectrum arises from the regression model. If we have independent random errors, the E[a_m] and E[b_m] will all be 0 and the E[A_m²] are all equal. Notice that the vertical scale for the smoothed periodogram is from 0.8 to 1.4, so it is relatively flat (Fig. 9.2). If longer realisations are generated and the bandwidth is held constant, the default R spectra will tend towards a flat line at a height of 1.

The bandwidths shown in Figure 9.2 are calculated from the R definition of bandwidth as span × {0.5/(n/2)}/√12. A more common definition of bandwidth in the context of spectral analysis is span/(n/2) cycles per sampling interval. The latter definition is the spacing between statistically independent estimates of the spectrum height, and it is larger than the R bandwidth by a factor of 6.92.

The spectrum distributes variance over frequency, and the expected shape does not depend on the distribution that is being sampled. You are asked to investigate the effect, if any, of using random numbers from an exponential, rather than normal, distribution in Exercise 2.

9.4.2 AR(1): Positive coefficient

We generate a realisation of length 1024 from an AR(1) process with α equal to 0.9 and compare the time series plot, correlogram, and spectrum in Figure 9.3.

> set.seed(1)
> x <- w <- rnorm(1024)
> for (t in 2:1024) x[t] <- 0.9 * x[t - 1] + w[t]
> layout(1:3)
> plot(as.ts(x))
> acf(x)
> spectrum(x, span = 51, log = c("no"))

Fig. 9.3. Simulated AR(1) process with α = 0.9: (a) time plot; (b) correlogram; (c) spectrum.

The plot of the time series shows the tendency for consecutive values to be relatively similar, and change is relatively slow, so we might expect the spectrum to pick up low-frequency variation. The acf quantifies the tendency for consecutive values to be relatively similar. The spectrum confirms that low-frequency variation dominates.

9.4.3 AR(1): Negative coefficient

We now change α from 0.9 to −0.9. The plot of the time series (Fig. 9.4) shows the tendency for consecutive values to oscillate, change is rapid, and we expect the spectrum to pick up high-frequency variation. The acf quantifies the tendency for consecutive values to oscillate, and the spectrum shows high-frequency variation.

9.4.4 AR(2)

Consider an AR(2) process with parameters 1 and −0.6.
Fig. 9.4. Simulated AR(1) process with α = −0.9: (a) time plot; (b) correlogram; (c) spectrum.

This can be interpreted as a second-order difference equation describing the motion of a lightly damped single mode system (Exercise 3), such as a mass on a spring, subjected to a sequence of white noise impulses. The spectrum in Figure 9.5 shows a peak at the natural frequency of the system – the frequency at which the mass will oscillate if the spring is extended and then released.

> set.seed(1)
> x <- w <- rnorm(1024)
> for (t in 3:1024) x[t] <- x[t - 1] - 0.6 * x[t - 2] + w[t]
> layout(1:3)
> plot(as.ts(x))
> acf(x)
> spectrum(x, span = 51, log = c("no"))

9.5 Sampling interval and record length

Many time series are of an inherently continuous variable that is sampled to give a time series at discrete time steps. For example, the National Climatic Data Center (NCDC) provides 1-minute readings of temperature, wind speed, and pressure at meteorological stations throughout the United States. It is crucial that the continuous signal be sampled at a sufficiently high rate to retain all its information. If the sampling rate is too low, we not only lose information but will mistake high-frequency variation for variation at a lower frequency. This latter phenomenon is known as aliasing and can have serious consequences.

Fig. 9.5. Simulated AR(2) process with α₁ = 1 and α₂ = −0.6: (a) time plot; (b) correlogram; (c) spectrum.

In signal processing applications, the measurement device may return a voltage as a continuously varying electrical signal. However, analysis is usually performed on a digital computer, and the signal has to be sampled to give a time series at discrete time steps. The sampling is known as analog-to-digital conversion (A/D). Modern oscilloscopes sample at rates as high as giga-samples per second (GS/s) and have anti-alias filters, built from electronic components, that remove any higher-frequency components in the original continuous signal. Digital recordings of musical performances are typically sampled at rates of 1 mega-sample per second (MS/s) after any higher frequencies have been removed with anti-alias filters. Since the frequency range of human hearing is from about 15 to 20,000 Hz, sampling rates of 1 MS/s are quite adequate for high-fidelity recordings.

9.5.1 Nyquist frequency

The Nyquist frequency is the cutoff frequency associated with a given sampling rate and is one-half the sampling frequency.
Once a continuous signal is sampled, any frequency higher than the Nyquist frequency will be indistinguishable from its low-frequency alias.

To understand this phenomenon, suppose the sampling interval is ∆ and the corresponding sampling frequency is 1/∆ samples per second. A sine wave with a frequency of 1/∆ cycles per second is generated by the radius in Figure 9.1 rotating anti-clockwise at a rate of 1 revolution per sampling interval ∆, and it follows that it cannot be detected when sampled at this rate. Similarly, a sine wave with a frequency of −1/∆ cycles per second, generated by the radius in Figure 9.1 rotating clockwise at a rate of 1 revolution per sampling interval ∆, is also undetectable. Now consider a sine wave with a frequency f that lies within the interval [−1/(2∆), 1/(2∆)]. This sine wave will be indistinguishable from any sine wave generated by a radius that completes an integer number of additional revolutions, anti-clockwise or clockwise, during the sampling interval. More formally, the frequency f will be indistinguishable from

f ± k/∆    (9.5)

where k is an integer. Figure 9.6 shows a sine function with a frequency of 1 Hz, sin(2πt), sampled at 0.2 s, together with its alias when k in Equation (9.5) equals −1. This alias frequency is 1 − 1/0.2, which equals −4 Hz. Physically, a frequency of −4 Hz is identical to a frequency of 4 Hz, except for a phase difference of half a cycle (sin(−θ) = −sin(θ) = sin(θ − π)).

> t <- seq(0, 2, by = 0.2)
> tc <- seq(0, 2, by = 0.01)
> x <- sin(2 * pi * t)
> xc <- sin(2 * pi * tc)
> xa <- sin(-4 * 2 * pi * tc)
> plot(t, x)
> lines(tc, xc)
> lines(tc, xa, lty = "dashed")

Fig. 9.6. Aliased frequencies: 1 Hz and 4 Hz with ∆ = 0.2 second.

To summarise, the Nyquist frequency Q is related to the sampling interval ∆ by

Q = 1/(2∆)    (9.6)

and Q should be higher than any frequency components in the continuous signal.

9.5.2 Record length

To begin with, we need to establish the highest frequency we can expect to encounter and set the Nyquist frequency Q well above this. The Nyquist frequency determines the sampling interval, ∆, from Equation (9.6). If the time series has length n, the record length, T, is n∆. The fundamental frequency is 1/T Hz, and this is the spacing between spikes in the Fourier line spectrum. If we wish to distinguish frequencies separated by ε Hz, we should aim for independent estimates of the spectrum centred on these frequencies. This implies that the bandwidth must be at most ε. If we take a moving average of L spikes in the Fourier line spectrum, we have the following relationship:

2L/(n∆) = 2L/T ≤ ε    (9.7)

For example, suppose we wish to distinguish frequencies separated by 1 Hz in an audio recording. A typical sampling rate for audio recording is 1 MS/s, corresponding to ∆ = 0.000001. If we take L equal to 100, it follows from Equation (9.7) that n must exceed 200 × 10⁶. This is a long time series, but the record length is less than four minutes.
If a time series of this length presents computational problems, an alternative method for computing a smoothed spectrum is to calculate the Fourier line spectrum for the 100 subseries of two million observations and average these 100 Fourier line spectra.

9.6 Applications

9.6.1 Wave tank data

The data in the file wave.dat are the surface height, relative to still water level, of water at the centre of a wave tank sampled over 39.6 seconds at a rate of 10 samples per second. The aim of the analysis is to check whether the spectrum is a realistic emulation of typical sea spectra. Referring to Figure 9.7, the time series plot gives a general impression of the wave profile over time and we can see that there are no obvious erroneous values. The correlogram is qualitatively similar to that for a realisation of an AR(2) process,⁶ but an AR(2) model would not account for a second peak in the spectrum at a frequency near 0.09.

> www <- "wave.dat"                        # location of the wave.dat data file
> wavetank.dat <- read.table(www, header = TRUE)
> attach(wavetank.dat)
> layout(1:3)
> plot(as.ts(waveht))
> acf(waveht)
> spectrum(waveht)

The default method of fitting the spectrum used above does not require the ar function. However, the ar function is used in §9.9 and selects an AR(13) model. The shape of the estimated spectrum in Figure 9.7 is similar to that of typical sea spectra.

9.6.2 Fault detection on electric motors

Induction motors are widely used in industry, and although they are generally reliable, they do require maintenance. A common fault is broken rotor bars, which reduce the output torque capability and increase vibration, and if left undetected can lead to catastrophic failure of the electric motor. The measured current spectrum of a typical motor in good condition will have a spike at mains frequency, commonly 50 Hz, with side band peaks at 46 Hz and 54 Hz. If a rotor bar breaks, the magnitude of the side band peaks will increase by a factor of around 10. This increase can easily be detected in the spectrum.

Siau et al. (2004) compare current spectra for an induction motor in good condition and with one broken bar. They sample the current at 0.0025-second intervals, corresponding to a Nyquist frequency of 200 Hz, and calculate spectra from records of 100 seconds length. The time series have length 40,000, and the bandwidth with a span of 60 is 1.2 Hz (Equation (9.7)).

The data are in the file imotor.txt. R code for drawing the spectra (Fig. 9.8) follows. The broken bar condition is indicated clearly by the higher side band peaks in the spectrum.
In contrast, the standard deviations of the good condition and broken condition time series are very close.

6 The pacf, not shown here, also suggests that an AR(2) model would be plausible.

Fig. 9.7. Wave elevation series: (a) time plot; (b) correlogram; (c) spectrum.

> www <- "imotor.txt"                      # location of the imotor.txt data file
> imotor.dat <- read.table(www, header = TRUE)
> attach(imotor.dat)
> xg.spec <- spectrum(good, span = 60)
> xb.spec <- spectrum(broken, span = 60)
> freqg <- 400 * xg.spec$freq[4400:5600]   # convert to Hz (sampling rate 400 per second)
> freqb <- 400 * xb.spec$freq[4400:5600]
> plot(freqg, 10 * log10(xg.spec$spec[4400:5600]), main = "",
    xlab = "Frequency (Hz)", ylab = "Current spectrum (dB)", type = "l")
> lines(freqb, 10 * log10(xb.spec$spec[4400:5600]), lty = "dashed")
> sd(good)
[1] 7071.166
> sd(broken)
[1] 7071.191

Fig. 9.8. Spectrum of current signal from induction motor in good condition (solid) and with broken rotor bar (dotted). Frequency is in cycles per second; the sampling interval is 0.0025 second.

9.6.3 Measurement of vibration dose

The drivers of excavators in open cast mines are exposed to considerable mechanical vibration. The British Standard Guide BS6841:1987 is routinely used to quantify the effects. A small engineering company has developed an active vibration absorber for excavators and has carried out tests. The company has accelerometer measurements of the acceleration in the forward (x), sideways (y), and vertical (z) directions during a rock-cutting operation. The estimated vibration dose value is defined as

eVDV = [(1.4 × ā)⁴ × T]^{1/4}    (9.8)

where ā is the root mean square value of frequency-weighted acceleration (m s⁻²) and T is the duration (s). The mean square frequency-weighted acceleration in the vertical direction is estimated by

ā_z² = ∫ C_z̈z̈(f) W(f) df    (9.9)

where the weighting function, W(f), represents the relative severity of vibration at different frequencies for a driver, and the acceleration time series is the second derivative of the displacement signal, denoted z̈. Components in the forward and sideways directions are defined similarly, and then ā is calculated as

ā = (ā_x² + ā_y² + ā_z²)^{1/2}    (9.10)

The data in the file zdd.txt are acceleration in the vertical direction (mm s⁻²) measured over a 5-second period during a rock-cutting operation. The sampling rate is 200 per second, and analog anti-aliasing filters were used to remove any frequencies above 100 Hz in the continuous voltage signal from the accelerometer. The frequency-weighting function was supplied by a medical consultant. It is evaluated at 500 frequencies to match the spacing of the spectrum ordinates and is given in vibdoswt.txt.
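Before turning to the data, a small numerical sketch of Equations (9.8)–(9.10) may help; the acceleration values and exposure duration below are illustrative assumptions, not the measured data.

# Hedged sketch of the vibration dose calculation with made-up inputs.
a.x <- 0.05; a.y <- 0.04; a.z <- 0.18       # r.m.s. weighted accelerations (m s^-2), assumed values
T.dur <- 7 * 60 * 60                        # exposure duration in seconds (a 7-hour shift)
a.bar <- sqrt(a.x^2 + a.y^2 + a.z^2)        # Equation (9.10)
eVDV <- ((1.4 * a.bar)^4 * T.dur)^(1/4)     # Equation (9.8), units m s^-1.75
eVDV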
The R routine has been written to give diagrams in physical units, as required for a report.⁷

> www <- "zdd.txt"                          # location of the zdd.txt acceleration data
> zdotdot.dat <- read.table(www, header = TRUE)
> attach(zdotdot.dat)
> www <- "vibdoswt.txt"                     # location of the frequency-weighting function
> wt.dat <- read.table(www, header = TRUE)
> attach(wt.dat)
> acceln.spec <- spectrum(Accelnz, span = sqrt(length(Accelnz)))   # span choice assumed
> Frequ <- 200 * acceln.spec$freq           # convert to Hz (sampling rate 200 per second)
> Sord <- 2 * acceln.spec$spec / 200        # rescale so the area equals the variance, per Hz
> Time <- (1:length(Accelnz)) / 200
> layout(1:3)
> plot(Time, Accelnz, xlab = "Time (s)",
    ylab = expression(mm~ s^-2),
    main = "Acceleration", type = "l")
> plot(Frequ, Sord, main = "Spectrum", xlab = "Frequency (Hz)",
    ylab = expression(mm^2~s^-4~Hz^-1), type = "l")
> plot(Frequ, Weight, xlab = "Frequency (Hz)",
    main = "Weighting function", type = "l")
> sd(Accelnz)
[1] 234.487
> sqrt(sum(Sord * Weight) * 0.2)
[1] 179.9286

7 Within R, type demo(plotmath) to see a list of mathematical operators that can be used by the function expression for plots.

Suppose a driver is cutting rock for a 7-hour shift. The estimated root mean square value of frequency-weighted acceleration is 179.9 (mm s⁻²). If we assume continuous exposure throughout the 7-hour period, the eVDV calculated using Equation (9.8) is 3.17 (m s⁻¹·⁷⁵). The British Standard states that doses as high as 15 will cause severe discomfort but is non-committal about safe doses arising from daily exposure. The company needs to record acceleration measurements during rock-cutting operations on different occasions, with and without the vibration absorber activated. It can then estimate the decrease in vibration dose that can be achieved by fitting the vibration absorber to an excavator (Fig. 9.9).

Fig. 9.9. Excavator series: (a) acceleration in vertical direction; (b) spectrum; (c) frequency weighting function.

9.6.4 Climatic indices

Climatic indices are strongly related to ocean currents, which have a major influence on weather patterns throughout the world. For example, El Niño is associated with droughts throughout much of eastern Australia. A statistical analysis of these indices is essential for two reasons. Firstly, it helps us assess evidence of climate change. Secondly, it allows us to forecast, albeit with limited confidence, potential natural disasters such as droughts and to take action to mitigate the effects. Farmers, in particular, will modify their plans for crop planting if drought is more likely than usual. Spectral analysis enables us to identify any tendencies towards periodicities or towards persistence in these indices.

The Southern Oscillation Index (SOI) is defined as the normalised pressure difference between Tahiti and Darwin. El Niño events occur when the SOI is strongly negative, and are associated with droughts in eastern Australia. The monthly time series⁸ from January 1866 until December 2006 are in soi.txt. The time series plot in Figure 9.10 is a useful check that the data have been read correctly and gives a general impression of the range and variability of the SOI. But it is hard to discern any frequency information.
The spectrum is plotted with a logarithmic vertical scale and includes a 95% confidence interval for the population spectrum in the upper right. The confidence interval can be represented as a vertical line relative to the position of the sample spectrum indicated by the horizontal line, because it has a constant width on a logarithmic scale (§9.10.2). The spectrum has a peak at a low frequency, so we enlarge the low-frequency section of the spectrum to identify this frequency more precisely. It is about 0.022 cycles per month and corresponds to a period of 45 months. However, the peak is small and lower frequency contributions to the spectrum are substantial, so we cannot expect a regular pattern of El Niño events.

8 More details and the data are at http://www.cru.uea.ac.uk/cru/data/soi.htm.

> www <- "soi.txt"                          # location of the soi.txt data file
> soi.dat <- read.table(www, header = TRUE)
> attach(soi.dat)
> soi.ts <- ts(SOI, start = c(1866, 1), freq = 12)
> layout(1:3)
> plot(soi.ts)
> soi.spec <- spectrum(SOI, span = sqrt(2 * length(SOI)))   # span value assumed
> plot(soi.spec$freq[1:60], soi.spec$spec[1:60], type = "l")

Fig. 9.10. Southern Oscillation Index: (a) time plot; (b) spectrum; (c) spectrum for the low frequencies.

The Pacific Decadal Oscillation (PDO) index is the difference between an average of sea surface temperature anomalies in the North Pacific Ocean poleward of 20°N and the monthly mean global average anomaly.⁹ The monthly time series from January 1900 until November 2007 is in pdo.txt. The spectrum in Figure 9.11 has no noteworthy peak and increases as the frequency becomes lower. The function spectrum removes a fitted linear trend before calculating the spectrum, so the increase as the frequency tends to zero is evidence of long-term memory in the PDO.

9 The time series data are available from http://jisao.washington.edu/pdo/.

Fig. 9.11. Pacific Decadal Oscillation: (a) time plot; (b) spectrum.

> www <- "pdo.txt"                          # location of the pdo.txt data file
> pdo.dat <- read.table(www, header = TRUE)
> attach(pdo.dat)
> pdo.ts <- ts(PDO, start = c(1900, 1), freq = 12)
> layout(1:2)
> plot(pdo.ts)
> spectrum(PDO, span = sqrt(2 * length(PDO)))

This analysis suggests that a FARIMA model might be suitable for modelling the PDO and for generating future climate scenarios.

9.6.5 Bank loan rate

The data in mprime.txt are the monthly percentage US Federal Reserve Bank prime loan rate,¹⁰ courtesy of the Board of Governors of the Federal Reserve System, from January 1949 until November 2007. We will plot the time series, the correlogram, and a spectrum on a logarithmic scale (Fig. 9.12).

10 Data downloaded from Federal Reserve Economic Data at the Federal Reserve Bank of St. Louis.

> www <- "mprime.txt"                       # location of the mprime.txt data file
> intr.dat <- read.table(www, header = TRUE)
> attach(intr.dat)
> layout(1:3)
> plot(as.ts(Interest), ylab = 'Interest rate')
> acf(Interest)
> spectrum(Interest, span = sqrt(length(Interest)) / 4)

The height of the spectrum increases as the frequency tends to zero (Fig. 9.12). This feature is similar to that observed in the spectrum of the PDO series in §9.6.4 and is again indicative of long-term memory, although it is less pronounced in the loan rate series. In §8.4.3, we found that the estimate of the fractional differencing parameter was close to 0 and that the apparent long memory could be adequately accounted for by high-order ARMA models.

Fig. 9.12. Federal Reserve Bank loan rates: (a) time plot; (b) correlogram; (c) spectrum.

9.7 Discrete Fourier transform (DFT)*

The theoretical basis for spectral analysis can be described succinctly in terms of the discrete Fourier transform (DFT). The DFT requires the concept of complex numbers and Euler's formula for a complex sinusoid, but the theory then follows nicely. In R, complex numbers are handled by typing i following, without a space, a numerical value; for example,

> z1 <- 2 + 3i
> z2 <- -1 - 1i
> z1 - z2
[1] 3+4i
> z1 * z2
[1] 1-5i
> abs(z1)
[1] 3.61

Euler's formula for a complex sinusoid is

e^{iθ} = cos(θ) + i sin(θ)    (9.11)

If the circle in Figure 9.1 is the unit circle centred at the origin of the complex plane, e^{iθ} is the point along its circumference at angle θ. This remarkable formula can be verified using Taylor expansions of e^{iθ}, sin(θ), and cos(θ).

The DFT is usually calculated using the fast Fourier transform algorithm (FFT), which is very efficient for long time series. The DFT of a time series of length n, {x_t : t = 0, ..., n − 1}, and its inverse transform (IDFT) are defined by Equation (9.12) and Equation (9.13), respectively.

X_m = Σ_{t=0}^{n−1} x_t e^{−2πimt/n},   m = 0, ..., n − 1    (9.12)

x_t = (1/n) Σ_{m=0}^{n−1} X_m e^{2πitm/n},   t = 0, ..., n − 1    (9.13)

It is convenient to start the time series at t = 0 for these definitions because m then corresponds to frequency 2πm/n radians per sampling interval. The steps in the derivation of the DFT-IDFT transform pair are set out in Exercise 5.
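As a quick check on Equation (9.12), the transform can be evaluated directly from the definition and compared with R's fft function; this sketch (not from the book) uses an arbitrary short series.

# Sketch: Equation (9.12) evaluated term by term, compared with fft().
x <- c(2, 1, -1, 3)                       # any short series will do
n <- length(x)
X.direct <- sapply(0:(n - 1), function(m) sum(x * exp(-2i * pi * m * (0:(n - 1)) / n)))
max(Mod(X.direct - fft(x)))               # numerically negligible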
The DFT is obtained in R with the function fft(), where x[t+1] corresponds to x_t and X[m+1] corresponds to X_m.

> set.seed(1)
> n <- 8
> x <- round(rnorm(n), 3)                  # rounding to 3 decimal places assumed, to match the printed values
> x
[1] -0.626 0.184 -0.836 1.595 0.330 -0.820 0.487 0.738
> X <- fft(x)
> X
[1] 1.052+0.000i -0.852+0.007i 0.051+2.970i -1.060-2.639i
[5] -2.342+0.000i -1.060+2.639i 0.051-2.970i -0.852-0.007i
> fft(X, inverse = TRUE)/n
[1] -0.626-0i 0.184+0i -0.836-0i 1.595-0i 0.330+0i -0.820-0i
[7] 0.487+0i 0.738+0i

The complex form of Parseval's Theorem, first given in Equation (9.4), is

Σ_{t=0}^{n−1} x_t² = Σ_{m=0}^{n−1} |X_m|²/n    (9.14)

If n is even, the |X_m|² contribution to the variance corresponds to a frequency of 2πm/n for m = 1, ..., n/2. For m = n/2, ..., (n − 1), the frequencies are greater than or equal to the Nyquist frequency, π, and are aliased to the frequencies 2π(m − n)/n, which lie in the range [−π, −2π/n]. All but two of the X_m occur as complex conjugate pairs; that is, X_{n−j} = X*_j for j = 1, ..., n/2 − 1. The following lines of R code give the spikes of the Fourier line spectrum FL at frequencies in frq scaled so that FL[1] is mean(x)^2 and the sum of FL[2], ..., FL[n/2+1] is (n-1)*var(x)/n.

> fq <- Mod(fft(x))^2                      # squared moduli |X_m|^2
> frq <- rep(0, n/2 + 1)
> FL <- rep(0, n/2 + 1)
> FL[1] <- fq[1] / n^2                     # equals mean(x)^2
> frq[1] <- 0
> for (j in 2:(n/2)) {
    FL[j] <- 2 * fq[j] / n^2
    frq[j] <- (j - 1) / n
  }
> FL[n/2 + 1] <- fq[n/2 + 1] / n^2
> frq[n/2 + 1] <- 0.5

9.8.1 Discrete white noise

The spectrum of discrete white noise with variance σ² is easily obtained from the definition since the only non-zero value of γ_k is σ² when k = 0:

Γ(ω) = σ²/π,   0 ≤ ω ≤ π

9.9 Autoregressive spectrum estimation

An alternative way of estimating a spectrum is to fit a suitable AR(p) model and plot the spectrum of the fitted process; the spectrum function does this when the argument method = c("ar") is supplied. Applied to the wave tank data of §9.6.1:

> spectrum(waveht, log = c("no"), method = c("ar"))

The smooth shape is useful for qualitative comparisons with the sea spectra (Fig. 9.13). The analysis also indicates that we could use an AR(13) model to obtain realisations of time series with this same spectrum in computer simulations. A well-chosen probability distribution for the errors could be used to give a realistic simulation of extreme values in the series.

9.10 Finer details

9.10.1 Leakage

Suppose a time series is a sampled sine function at a specific frequency. If this frequency corresponds to one of the frequencies in the finite Fourier series, then there will be a spike in the Fourier line spectrum at this frequency. This coincidence is unlikely to arise by chance, so now suppose that the specific frequency lies between two of the frequencies in the finite Fourier series. There will not only be spikes at these two frequencies but also smaller spikes at neighbouring frequencies (Exercise 6). This phenomenon is known as leakage.

Fig. 9.13. Wave elevation series: spectrum calculated from fitting an AR model.

9.10.2 Confidence intervals

Consider a frequency ω₀ corresponding to a spike of the Fourier line spectrum. If we average an odd number, L, of scaled spikes to obtain a smoothed spectrum, then

C(ω₀) = (1/L) Σ_{l=−(L−1)/2}^{(L−1)/2} C_RP(ω_l)    (9.24)

where the C_RP(ω_l) are the raw periodogram (scaled spike) estimates.
Now taking the expectation of both sides of Equation (9.24), and assuming the raw periodogram is unbiased for the population spectrum, we obtain

E[C(ω₀)] = (1/L) Σ_{l=−(L−1)/2}^{(L−1)/2} Γ(ω_l)    (9.25)

Provided the population spectrum does not vary much over the interval [ω_{−(L−1)/2}, ω_{(L−1)/2}],

E[C(ω₀)] ≈ Γ(ω₀)    (9.26)

But notice that if ω₀ corresponds to a peak or trough of the spectrum, the smoothed spectrum will be biased low or high. The more the smoothing, the more the bias. However, some smoothing is essential to reduce the variability. The following heuristic argument gives an approximate confidence interval for the spectrum. If we divide both sides of Equation (9.24) by Γ(ω₀) and take the variance, we obtain

Var[C(ω₀)/Γ(ω₀)] ≈ (1/L²) Σ_{l=−(L−1)/2}^{(L−1)/2} Var[C_RP(ω_l)/Γ(ω_l)]    (9.27)

where we have used the fact that spikes in the Fourier line spectrum are independent – a consequence of Parseval's Theorem. Now each spike is an estimate of variance at frequency ω_l based on 2 degrees of freedom. So,

2 C_RP(ω_l)/Γ(ω_l) ~ χ²₂    (9.28)

The variance of a chi-square distribution is twice its degrees of freedom. Hence,

Var[C(ω₀)/Γ(ω₀)] ≈ 1/L    (9.29)

A scaled sum of L chi-square variables, each with 2 degrees of freedom, is a scaled chi-square variable with 2L degrees of freedom and well approximated by a normal distribution. Thus an approximate 95% confidence interval for Γ(ω) is

[(1 − 2/√L) C(ω), (1 + 2/√L) C(ω)]    (9.30)

We have dropped the subscript on ω because the result remains a good approximation for estimates of the spectrum interpolated between the C(ω_l).

9.10.3 Daniell windows

The function spectrum uses a modified Daniell window, or smoother, that gives half weight to the end values. If more than one number is specified for the parameter span, it will use a series of Daniell smoothers, and the net result will be a centred moving average with weights decreasing from the centre. The rationale for using a series of smoothers is that it will decrease the bias.

9.10.4 Padding

The simplest FFT algorithm assumes that the time series has a length that is some power of 2. A positive integer is highly composite if it has more divisors than any smaller positive integer. The FFT algorithm is most efficient when the length n is highly composite, and by default spec.pgram pads the mean-adjusted time series with zeros to reach the smallest highly composite number that is greater than or equal to the length of the time series. Padding can be avoided by setting the parameter fast=FALSE. A justification for padding is that the length of the time series is arbitrary and that adding zeros has no effect on the frequency composition. Adding zeros does reduce the variance, and this must be remembered when scaling the spectrum, so that its area equals the variance of the original time series.

9.10.5 Tapering

The length of a time series is not usually related to any underlying frequency composition.
However, the discrete Fourier series in effect keeps replicating the original time series, as if it extended over −∞ < t < ∞, and any mismatch between the beginning and end of the series then appears as a discontinuity at each replication. Tapering reduces the weight given to values near the two ends of the series and so reduces the effect of these discontinuities on the estimated spectrum.

Spectral analysis is well suited to detecting signals that are corrupted by noise. Spectral analysis can be used for spatial series such as surface roughness transects, and two-dimensional spectral analysis can be used for measurements of surface roughness made over a plane. However, spectral analysis is not suitable for non-stationary applications.

In contrast, wavelets have been developed to summarise the variation in frequency composition through time or over space. There are many applications, including compression of digital files of images and in speech recognition software. Nason (2008) provides an introduction to wavelets using the R package WaveThresh4.

9.11 Summary of additional commands used

spectrum     returns the spectrum
spec.pgram   returns the spectrum with more control of parameters
fft          returns the DFT

9.12 Exercises

1. Refer to §9.3.1 and take n = 128.
   a) Use R to calculate cos(2πt/n), sin(2πt/n), and cos(4πt/n) for t = 1, ..., n. Calculate the three variances and the three correlations.
   b) Assuming the results above generalise, provide an explanation for Parseval's Theorem.
   c) Explain why the A_{n/2}² term in Equation (9.4) is not divided by 2.

2. Repeat the investigation of realisations from AR processes in §9.4 using random deviates from an exponential distribution with parameter 1 and with its mean subtracted, rather than the standard normal distribution.

3. The differential equation for the oscillatory response x of a lightly damped single mode of vibration system, such as a mass on a spring, with a forcing term w is

   ẍ + 2ζΩẋ + Ω²x = w

   where ζ is the damping coefficient, which must be less than 1 for an oscillatory response, and Ω is the natural frequency. Approximate the derivatives by backward differences:

   ẍ = x_t − 2x_{t−1} + x_{t−2},   ẋ = x_t − x_{t−1}

   set w = w_t, and rearrange to obtain the form of the AR(2) process in §8.4.4. Consider an approximation using central differences.

4. Suppose that

   x_t = Σ_{m=0}^{n−1} a_m e^{2πimt/n},   t = 0, ..., n − 1    (9.31)

   for some coefficients a_m that we wish to determine. Now multiply both sides of this equation by e^{−2πijt/n} and sum over t from 0 to n − 1 to obtain

   Σ_{t=0}^{n−1} x_t e^{−2πijt/n} = Σ_{t=0}^{n−1} Σ_{m=0}^{n−1} a_m e^{2πi(m−j)t/n}    (9.32)

   Consider a fixed value of j. Notice that the sum to the right of a_m is a geometric series with sum 0 unless m = j, so the right-hand side equals n a_j. This is Equation (9.12) with n a_j in place of X_j, which accounts for the factor of n in the inverse transform (9.13).

5. Write R code to average an odd number of spike heights obtained from fft and hence plot a spectrum.

6. Sample the three signals
   a) sin(πt/2)
   b) sin(3πt/4)
   c) sin(5πt/8)
   at times t = 0, ..., 7, using fft to compare their line spectra.

7. Sample the signal sin(11πt/32) for t = 0, ..., 31. Use fft to calculate the Fourier line spectrum.
The cosine bell taper applied to the first and last proportion α of a series of length n is defined by

   [1 − cos(π{t + 0.5}/{αn})] x_t   for (t + 1) ≤ αn
   [1 − cos(π{n − t − 0.5}/{αn})] x_t   for (t + 1) ≥ (1 − α)n

   Investigate the effect of this taper, with α = 0.1, on the Fourier line spectrum of the sampled signal.

8. Sea spectra are sometimes modelled by the Pierson-Moskowitz spectrum, which has the form below and is usually only appropriate for deep water conditions.

   Γ(ω) = a ω⁻⁵ e^{−bω⁻⁴},   0 ≤ ω ≤ π

   Plot the Pierson-Moskowitz spectrum in R for a few choices of the parameters a and b. Compare it with the wave elevation spectra (Fig. 9.7).

10 System Identification

10.1 Purpose

Vibration is defined as an oscillatory movement of some entity about an equilibrium state. It is the means of producing sound in musical instruments, it is the principle underlying the design of loudspeakers, and it describes the response of buildings to earthquakes. The squealing of disc brakes on a car is caused by vibration. The up and down motion of a ship at sea is a low-frequency vibration. Spectral analysis provides the means for understanding and controlling vibration.

Vibration is generally caused by some external force acting on a system, and the relationship between the external force and the system response can be described by a mathematical model of the system dynamics. We can use spectral analysis to estimate the parameters of the mathematical model and then use the model to make predictions of the response of the system under different forces.

10.2 Identifying the gain of a linear system

10.2.1 Linear system

We consider systems that have clearly defined inputs and outputs, and aim to deduce the system from measurements of the inputs and outputs or to predict the output knowing the system and the input. Attempts to understand economies and to control inflation by increasing interest rates provide ambitious examples of applications of these principles.

A mathematical model of a dynamic system is linear if the output to a sum of input variables, x and y, equals the sum of the outputs corresponding to the individual inputs. More formally, a mathematical operator L is linear if it satisfies

L(ax + by) = aL(x) + bL(y)

where a and b are constants. For a linear system, the output response to a sine wave input is a sine wave of the same frequency with an amplitude that is proportional to the amplitude of the input. The ratio of the output amplitude to the input amplitude, known as the gain, and the phase lag between input and output depend on the frequency of the input, and this dependence provides a complete description of a linear system.

Many physical systems are well approximated by linear mathematical models, provided the input amplitude is not excessive. In principle, we can identify a linear model by noting the output, commonly referred to as the response, to a range of sine wave inputs. But there are practical limitations to such a procedure.
In many cases, while we may be able to measure the input, we certainly cannot specify it. Examples are wave energy devices moored at sea, and the response of structures to wind forcing. Even when we can specify the input, recording the output over a range of frequencies is a slow procedure. In contrast, provided we can measure the input and output, and the input has a sufficiently broad spectrum, we can identify the linear system from spectral analysis. Also, spectral methods have been developed for non-linear systems.

A related application of spectral analysis is that we can determine the spectrum of the response if we know the system and the input spectrum. For example, we can predict the output of a wave energy device if we have a mathematical model for its dynamics and know typical sea spectra at its mooring.

10.2.2 Natural frequencies

If a system is set in motion by an initial displacement or impact, it may oscillate, and this oscillation takes place at the natural frequency (or frequencies) of the system. A simple example is the oscillation of a mass suspended by a spring. Linear systems have large gains at natural frequencies and, if large oscillations are undesirable, designers need to ensure that the natural frequencies of the system are far removed from forcing frequencies. Alternatively, in the case of wave energy devices, for example, the designer may aim for the natural frequencies of the device to match predominant frequencies in the sea spectrum. A common example of forcing a system at its natural frequency is pushing a child on a swing.

10.2.3 Estimator of the gain function

If a linear system is forced by a sine wave of amplitude A at frequency f, the response has an amplitude G(f)A, where G(f) is the gain at frequency f. The ratio of the variance of the output to the variance of the input, for sine waves at this frequency, is G(f)². If the input is a stationary random process rather than a single sine wave, its variance is distributed over a range of frequencies, and this distribution is described by the spectrum. It seems intuitively reasonable to estimate the square of the gain function by the ratio of the output spectrum to the input spectrum. Consider a linear system with a single input, u_t, and a single output, y_t. The gain function can be estimated by

Ĝ(f) = √(C_yy(f)/C_uu(f))    (10.1)

A corollary is that the output spectrum can be estimated, if the gain function is known or has been estimated and the input spectrum has been estimated, by

C_yy = G² C_uu    (10.2)

Equation (10.2) also holds if spectra are expressed in radians rather than cycles, in which case the gain is a function G(ω) of ω.

10.3 Spectrum of an AR(p) process

Consider the deterministic part of an AR(p) model with a complex sinusoid input,

x_t − α₁x_{t−1} − ... − α_p x_{t−p} = e^{iωt}    (10.3)

Assume a solution for x_t of the form A e^{iωt}, where A is a complex number, and substitute this into Equation (10.3) to obtain

A = (1 − α₁e^{−iω} − ... − α_p e^{−iωp})^{−1}    (10.4)

The gain function, expressed as a function of ω, is the absolute value of A.
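As an illustration (not from the book), the gain function of Equation (10.4) can be evaluated numerically for given coefficients; the sketch below assumes the AR(2) coefficients 1 and −0.6 used in §9.4.4.

# Sketch: gain |A(omega)| of Equation (10.4) for an AR(2) with alpha = (1, -0.6).
alpha <- c(1, -0.6)
omega <- seq(0.01, pi, length.out = 200)   # radians per sampling interval
gain <- sapply(omega, function(w) 1 / Mod(1 - sum(alpha * exp(-1i * w * seq_along(alpha)))))
plot(omega, gain, type = "l",
     xlab = "omega (radians per sampling interval)", ylab = "Gain")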
Now consider a discrete white noise input, w_t, in place of the complex sinusoid. The system is now an AR(p) process. Applying Equation (10.2), with population spectra rather than sample spectra, and noting that the spectrum of white noise with unit variance is 1/π (§9.8.1), gives

Γ_xx(ω) = |A|² Γ_ww(ω) = (1/π) |1 − α₁e^{−iω} − ... − α_p e^{−iωp}|^{−2},   0 ≤ ω ≤ π

10.4 Simulated single mode of vibration system

The theoretical gain, the estimate obtained from the spectra of the simulated difference equation, and the theoretical gain of the difference equation are compared in Figure 10.1:

> plot(FreH, Gth, xlab = "Frequency (Hz)", ylab = "Gain", type = "l")   # theoretical gain
> lines(FreH, Gemp, lty = "dashed")                                     # estimate from the simulated spectra
> lines(FreH, Gar, lty = "dotted")                                      # theoretical gain of the difference equation

Fig. 10.1. Gain of single-mode linear system. The theoretical gain is shown by a solid line and the estimate made from the spectra obtained from the difference equation is shown by a broken line. The theoretical gain of the difference equation is plotted as a dotted line and coincides exactly with the estimate.

10.5 Ocean-going tugboat

The motion of ships and aircraft is described by displacements along the orthogonal x, y, and z axes and rotations about these axes. The displacements are surge, sway, and heave along the x, y, and z axes, respectively. The rotations about the x, y, and z axes are roll, pitch, and yaw, respectively (Fig. 10.2). So, there are six degrees of freedom for a ship's motion in the ocean, and there are six natural frequencies. However, the natural frequencies will not usually correspond precisely to the displacements and rotations, as there is a coupling between displacements and rotations. This is typically most pronounced between heave and pitch. There will be a natural frequency with a corresponding mode that is predominantly heave, with a slight pitch, and another natural frequency that is predominantly pitch, with a slight heave.

Naval architects will start with computer designs and then proceed to model testing in a wave tank before building a prototype. They will have a good idea of the frequency response of the ship from the models, but this will have to be validated against sea trials. Here, we analyse some of the data from the sea trials of an ocean-going tugboat. The ship sailed over an octagonal course, and data were collected on each leg. There was an impressive array of electronic instruments and, after processing analog signals through anti-aliasing filters, data were recorded at 0.5 s intervals for roll (degrees), pitch (degrees), heave (m), surge (m), sway (m), yaw (degrees), wave height (m), and wind speed (knots).

> www <- "tug.dat"                          # location of the tugboat sea-trial data
> tug.dat <- read.table(www, header = TRUE)
> attach(tug.dat)
> Heave.spec <- spectrum(Heave, span = sqrt(length(Heave)), plot = FALSE)   # span and plot = FALSE assumed
> Wave.spec <- spectrum(Wave, span = sqrt(length(Wave)), plot = FALSE)
> G <- sqrt(Heave.spec$spec / Wave.spec$spec)                               # Equation (10.1)
> par(mfcol = c(2, 2))
> plot(as.ts(Wave))
> acf(Wave)
> spectrum(Wave, span = sqrt(length(Heave)), log = c("no"), main = "")
> plot(Heave.spec$freq, G, xlab = "frequency Hz", ylab = "Gain", type = "l")

Figure 10.3 shows the estimated wave spectrum and the estimated gain from wave height to heave. The natural frequencies associated with the heave/pitch modes are estimated as 0.075 Hz and 0.119 Hz, and the corresponding gains from wave to heave are 0.15179 and 0.1323.
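One way such peak frequencies and gains might be read off the estimated gain function is sketched below; this is a convenience, not code from the book, and it assumes the objects G and Heave.spec created above, with frequencies in the units returned by spectrum.

# Sketch: locating the largest estimated gain (assumes G and Heave.spec exist).
peak <- which.max(G)
Heave.spec$freq[peak]    # frequency of the largest gain, in spectrum()'s frequency units
G[peak]                  # the corresponding gain estimate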
Fig. 10.3. Gain of heave from wave.

In theory, the gain will approach 1 as the frequency approaches 0, but the sea spectrum has negligible components very close to 0, and no sensible estimate can be made. Also, the displacements were obtained by integrating accelerometer signals, and this is not an ideal procedure at very low frequencies.

10.6 Non-linearity

There are several reasons why the hydrodynamic response of a ship will not be precisely linear. In particular, the varying cross-section of the hull accounts for non-linear buoyancy forces. Metcalfe et al. (2007) investigate this by fitting a regression of the heave response on lagged values of the response, squares and cross-products of these lagged values, wave height, and wind speed. The probing method looks at the response of the fitted model to the sum of two complex sinusoids at frequencies ω₁ and ω₂. The non-linear response can be shown as a three-dimensional plot of the gain surface against frequencies ω₁ and ω₂ or by a contour diagram. However, in this particular application the gain associated with the non-linear terms was small compared with the gain of the linear terms (Metcalfe et al., 2007). This is partly because the model was fitted to data taken when the ship was in typical weather conditions – under extreme conditions, when capsizing is likely, linear models are inadequate.

10.7 Exercises

1. The differential equation that describes the motion of a linear system with a single mode of vibration, such as a mass on a spring, has the general form

   ÿ + 2ζΩẏ + Ω²y = x

   The parameter Ω is the undamped natural frequency, and the parameter ζ is the damping coefficient. The response is oscillatory if ζ < 1.

11 Multivariate Models

11.2 Spurious regression

As an example, consider the Australian electricity and chocolate production series, aggregated to annual totals:

> www <- "cbe.dat"                                   # file of monthly production data (name assumed)
> CBE <- read.table(www, header = TRUE)
> Elec.ts <- ts(CBE$elec, start = 1958, freq = 12)   # column names and start year assumed
> Choc.ts <- ts(CBE$choc, start = 1958, freq = 12)
> plot(as.vector(aggregate(Choc.ts)), as.vector(aggregate(Elec.ts)))
> cor(aggregate(Choc.ts), aggregate(Elec.ts))
[1] 0.958

The high correlation of 0.96 and the scatter plot do not imply that the electricity and chocolate production variables are causally related (Fig. 11.1). Instead, it is more plausible that the increasing Australian population accounts for the increasing trend in both series. Although we can fit a regression of one variable as a linear function of the other, with added random variation, such regression models are usually termed spurious because of the lack of any causal relationship. In this case, it would be far better to regress the variables on the Australian population.

Fig. 11.1. Annual electricity and chocolate production plotted against each other.

The term spurious regression is also used when underlying stochastic trends in both series happen to be coincident, and this seems a more appropriate use of the term.
Stochastic trends are a feature of an ARIMA process with a unit root (i.e., B = 1 is a solution of the characteristic equation). We illustrate this by simulating two independent random walks:

> set.seed(10); x <- rnorm(100); y <- rnorm(100)     # starting values; the loop builds the walks (reconstructed)
> for (i in 2:100) {
    x[i] <- x[i - 1] + rnorm(1)
    y[i] <- y[i - 1] + rnorm(1)
  }
> plot(x, y)
> cor(x, y)
[1] 0.904

The code above can be repeated for different random number seeds, though you will only sometimes notice spurious correlation. The seed value of 10 was selected to provide an example of a strong correlation that could have resulted by chance. The scatter plot shows how two independent time series variables might appear related when each variable is subject to stochastic trends (Fig. 11.2).

Fig. 11.2. The values of two independent simulated random walks plotted against each other. (See the code in the text.)

Stochastic trends are common in economic series, and so considerable care is required when trying to determine any relationships between the variables in multiple economic series. It may be that an underlying relationship can be justified even when the series exhibit stochastic trends, because two series may be related by a common stochastic trend.

For example, the daily exchange rate series for UK pounds, the Euro, and New Zealand dollars, given for the period January 2004 to December 2007, are all per US dollar. The correlogram plots of the differenced UK and EU series indicate that both exchange rates can be well approximated by random walks (Fig. 11.3), whilst the scatter plot of the rates shows a strong linear relationship (Fig. 11.4), which is supported by a high correlation of 0.95. Since the United Kingdom is part of the European Economic Community (EEC), any change in the Euro exchange rate is likely to be apparent in the UK pound exchange rate, so there are likely to be fluctuations common to both series; in particular, the two series may share a common stochastic trend.
We will discuss this phenomenon in more detail when we look at cointegration in §11.4.

> www <- "..."   # path to the exchange rate data file on the book's website
> xrates <- read.table(www, header = T)
> xrates[1:3, ]
     UK   NZ    EU
1 0.558 1.52 0.794
2 0.553 1.49 0.789
3 0.548 1.49 0.783
> acf( diff(xrates$UK) )
> acf( diff(xrates$EU) )
> plot(xrates$UK, xrates$EU, pch = 4)
> cor(xrates$UK, xrates$EU)
[1] 0.946

11.3 Tests for unit roots

When investigating any relationship between two time series variables, we should check whether time series models that contain unit roots are suitable. If they are, we need to decide whether or not there is a common stochastic trend. The first step is to see how well each series can be approximated as a random walk by looking at the correlogram of the differenced series (e.g., Fig. 11.3). Whilst this may work for a simple random walk, we have seen in Chapter 7 that stochastic trends are a feature of any time series model with a unit root, B = 1, as a solution of the characteristic equation, which would include more complex ARIMA processes.

Fig. 11.3. Correlograms of the differenced exchange rate series: (a) UK rate; (b) EU rate.

Dickey and Fuller developed a test of the null hypothesis that α = 1 against the alternative hypothesis that α < 1, where α is the coefficient in a model of the form xt = αxt−1 + ut. In the augmented version of the test, implemented in the R function adf.test within the tseries library, {ut} is allowed to be a stationary process rather than white noise. Applied to the simulated random walk x of §11.2, it gives:

> library(tseries)
> adf.test(x)

Augmented Dickey-Fuller Test

data: x
Dickey-Fuller = -2.23, Lag order = 4, p-value = 0.4796
alternative hypothesis: stationary

This result is not surprising, since we would only expect 5% of simulated random walks to provide evidence against a null hypothesis of a unit root at the 5% level. However, when we analyse physical time series rather than realisations from a known model, we should never mistake lack of evidence against a hypothesis for a demonstration that the hypothesis is true. The test result should be interpreted with careful consideration of the length of the time series, which determines the power of the test, and the general context. The null hypothesis of a unit root is favoured by economists because many financial time series are better approximated by random walks than by a stationary process, at least in the short term.

An alternative to the augmented Dickey-Fuller test, known as the Phillips-Perron test (Perron, 1988), is implemented in the R function pp.test. The distinction between the two tests is that the Phillips-Perron procedure estimates the autocorrelations in the stationary process ut directly (using a kernel smoother) rather than assuming an AR approximation, and for this reason the Phillips-Perron test is described as semi-parametric. Critical values of the test statistic are either based on asymptotic theory or calculated from extensive simulations.

Fig. 11.4. Scatter plot of the UK and EU exchange rates. Both rates are per US dollar.
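As a cross-check that is not part of the original text, the Phillips-Perron test can also be applied to the same simulated random walk x used with adf.test above; like the augmented Dickey-Fuller test, it will usually fail to reject the unit-root null hypothesis for a realisation of a random walk.

# pp.test is in the tseries package, already loaded above.
pp.test(x)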
There is no evidence to reject the unit root hypothesis, so</p><p>we conclude that the UK pound and Euro exchange rates are both likely to</p><p>contain unit roots.</p><p>> pp.test(xrates$UK)</p><p>Phillips-Perron Unit Root Test</p><p>data: xrates$UK</p><p>Dickey-Fuller Z(alpha) = -10.6, Truncation lag parameter = 7,</p><p>p-value = 0.521</p><p>alternative hypothesis: stationary</p><p>> pp.test(xrates$EU)</p><p>Phillips-Perron Unit Root Test</p><p>data: xrates$EU</p><p>Dickey-Fuller Z(alpha) = -6.81, Truncation lag parameter = 7,</p><p>p-value = 0.7297</p><p>alternative hypothesis: stationary</p><p>11.4 Cointegration</p><p>11.4.1 Definition</p><p>Many multiple time series are highly correlated in time. For example, in §11.2</p><p>we found the UK pound and Euro exchange rates very highly correlated. This</p><p>is explained by the similarity of the two economies relative to the US economy.</p><p>Another example is the high correlation between the Australian electricity and</p><p>11.4 Cointegration 217</p><p>chocolate production series, which can be reasonably attributed to an increas-</p><p>ing Australian population rather than a causal relationship. In addition, we</p><p>demonstrated that two series that are independent and contain unit roots</p><p>(e.g., they follow independent random walks) can show an apparent linear re-</p><p>lationship, due to chance similarity of the random walks over the period of the</p><p>time series, and stated that such a correlation would be spurious. However,</p><p>as demonstrated by the analysis of the UK pounds and Euro exchange rates,</p><p>it is quite possible for two series to contain unit roots and be related. Such</p><p>series are said to be cointegrated. In the case of the exchange rates, a stochas-</p><p>tic trend in the US economy during a period when the European economy is</p><p>relatively stable will impart a common, complementary, stochastic trend to</p><p>the UK pound and Euro exchange rates. We now state the precise definition</p><p>of cointegration.</p><p>a variable is measured sequentially in time over or at a fixed</p><p>interval, known as the sampling interval , the resulting data form a time series.</p><p>Observations that have been collected over fixed sampling intervals form a</p><p>historical time series. In this book, we take a statistical approach in which the</p><p>historical series are treated as realisations of sequences of random variables. A</p><p>sequence of random variables defined at fixed sampling intervals is sometimes</p><p>referred to as a discrete-time stochastic process, though the shorter name</p><p>time series model is often preferred. The theory of stochastic processes is vast</p><p>and may be studied without necessarily fitting any models to data. However,</p><p>our focus will be more applied and directed towards model fitting and data</p><p>analysis, for which we will be using R.1</p><p>The main features of many time series are trends and seasonal varia-</p><p>tions that can be modelled deterministically with mathematical functions of</p><p>time. But, another important feature of most time series is that observations</p><p>close together in time tend to be correlated (serially dependent). Much of the</p><p>methodology in a time series analysis is aimed at explaining this correlation</p><p>and the main features in the data using appropriate statistical models and</p><p>descriptive methods. 
Once a good model is found and fitted to data, the an-</p><p>alyst can use the model to forecast future values, or generate simulations, to</p><p>guide planning decisions. Fitted models are also used as a basis for statistical</p><p>tests. For example, we can determine whether fluctuations in monthly sales</p><p>figures provide evidence of some underlying change in sales that we must now</p><p>allow for. Finally, a fitted statistical model provides a concise summary of the</p><p>main characteristics of a time series, which can often be essential for decision</p><p>makers such as managers or politicians.</p><p>Sampling intervals differ in their relation to the data. The data may have</p><p>been aggregated (for example, the number of foreign tourists arriving per day)</p><p>or sampled (as in a daily time series of close of business share prices). If data</p><p>are sampled, the sampling interval must be short enough for the time series</p><p>to provide a very close approximation to the original continuous signal when</p><p>it is interpolated. In a volatile share market, close of business prices may not</p><p>suffice for interactive trading but will usually be adequate to show a com-</p><p>pany’s financial performance over several years. At a quite different timescale,</p><p>1 R was initiated by Ihaka and Gentleman (1996) and is an open source implemen-</p><p>tation of S, a language for data analysis developed at Bell Laboratories (Becker</p><p>et al. 1988).</p><p>1.3 R language 3</p><p>time series analysis is the basis for signal processing in telecommunications,</p><p>engineering, and science. Continuous electrical signals are sampled to provide</p><p>time series using analog-to-digital (A/D) converters at rates that can be faster</p><p>than millions of observations per second.</p><p>1.3 R language</p><p>It is assumed that you have R (version 2 or higher) installed on your computer,</p><p>and it is suggested that you work through the examples, making sure your</p><p>output agrees with ours.2 If you do not have R, then it can be installed free</p><p>of charge from the Internet site www.r-project.org. It is also recommended</p><p>that you have some familiarity with the basics of R, which can be obtained</p><p>by working through the first few chapters of an elementary textbook on R</p><p>(e.g., Dalgaard 2002) or using the online “An Introduction to R”, which is</p><p>also available via the R help system – type help.start() at the command</p><p>prompt to access this.</p><p>R has many features in common with both functional and object oriented</p><p>programming languages. In particular, functions in R are treated as objects</p><p>that can be manipulated or used recursively.3 For example, the factorial func-</p><p>tion can be written recursively as</p><p>> Fact Fact(5)</p><p>[1] 120</p><p>In common with functional languages, assignments in R can be avoided,</p><p>but they are useful for clarity and convenience and hence will be used in</p><p>the examples that follow. In addition, R runs faster when ‘loops’ are avoided,</p><p>which can often be achieved using matrix calculations instead. However, this</p><p>can sometimes result in rather obscure-looking code. Thus, for the sake of</p><p>transparency, loops will be used in many of our examples. Note that R is case</p><p>sensitive, so that X and x, for example, correspond to different variables. 
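The recursive factorial function mentioned above appears only in abbreviated form in this excerpt; a plausible reconstruction, which may differ slightly from the book's exact code, is:

> Fact <- function(n) if (n == 1) 1 else n * Fact(n - 1)   # the function calls itself
> Fact(5)
[1] 120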
In</p><p>general, we shall use uppercase for the first letter when defining new variables,</p><p>as this reduces the chance of overwriting inbuilt R functions, which are usually</p><p>in lowercase.4</p><p>2 Some of the output given in this book may differ slightly from yours. This is most</p><p>likely due to editorial changes made for stylistic reasons. For conciseness, we also</p><p>used options(digits=3) to set the number of digits to 4 in the computer output</p><p>that appears in the book.</p><p>3 Do not be concerned if you are unfamiliar with some of these computing terms,</p><p>as they are not really essential in understanding the material in this book. The</p><p>main reason for mentioning them now is to emphasise that R can almost certainly</p><p>meet your future statistical and programming needs should you wish to take the</p><p>study of time series further.</p><p>4 For example, matrix transpose is t(), so t should not be used for time.</p><p>4 1 Time Series Data</p><p>The best way to learn to do a time series analysis in R is through practice,</p><p>so we now turn to some examples, which we invite you to work through.</p><p>1.4 Plots, trends, and seasonal variation</p><p>1.4.1 A flying start: Air passenger bookings</p><p>The number of international passenger bookings (in thousands) per month</p><p>on an airline (Pan Am) in the United States were obtained from the Federal</p><p>Aviation Administration for the period 1949–1960 (Brown, 1963). The com-</p><p>pany used the data to predict future demand before ordering new aircraft and</p><p>training aircrew. The data are available as a time series in R and illustrate</p><p>several important concepts that arise in an exploratory time series analysis.</p><p>Type the following commands in R, and check your results against the</p><p>output shown here. To save on typing, the data are assigned to a variable</p><p>called AP.</p><p>> data(AirPassengers)</p><p>> AP AP</p><p>Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec</p><p>1949 112 118 132 129 121 135 148 148 136 119 104 118</p><p>1950 115 126 141 135 125 149 170 170 158 133 114 140</p><p>1951 145 150 178 163 172 178 199 199 184 162 146 166</p><p>1952 171 180 193 181 183 218 230 242 209 191 172 194</p><p>1953 196 196 236 235 229 243 264 272 237 211 180 201</p><p>1954 204 188 235 227 234 264 302 293 259 229 203 229</p><p>1955 242 233 267 269 270 315 364 347 312 274 237 278</p><p>1956 284 277 317 313 318 374 413 405 355 306 271 306</p><p>1957 315 301 356 348 355 422 465 467 404 347 305 336</p><p>1958 340 318 362 348 363 435 491 505 404 359 310 337</p><p>1959 360 342 406 396 420 472 548 559 463 407 362 405</p><p>1960 417 391 419 461 472 535 622 606 508 461 390 432</p><p>All data in R are stored in objects, which have a range of methods available.</p><p>The class of an object can be found using the class function:</p><p>> class(AP)</p><p>[1] "ts"</p><p>> start(AP); end(AP); frequency(AP)</p><p>[1] 1949 1</p><p>[1] 1960 12</p><p>[1] 12</p><p>1.4 Plots, trends, and seasonal variation 5</p><p>In this case, the object is of class ts, which is an abbreviation for ‘time</p><p>series’. Time series objects have a number of methods available, which include</p><p>the functions start, end, and frequency given above. These methods can be</p><p>listed using the function methods, but the output from this function is not</p><p>always helpful. 
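For example, assuming AP has been created from the AirPassengers data as above, the methods available for class "ts" can be listed as follows; the exact list depends on the R version and the packages attached.

AP <- AirPassengers       # reconstruction of the abbreviated assignment above
methods(class = "ts")     # lists the generic functions that have a "ts" method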
The key thing to bear in mind is that generic functions in R, such as plot or summary, will attempt to give the most appropriate output for any given input object; try typing summary(AP) now to see what happens. As the objective in this book is to analyse time series, it makes sense to put our data into objects of class ts. This can be achieved using a function also called ts, but this was not necessary for the airline data, which were already stored in this form.

Two non-stationary time series {xt} and {yt} are cointegrated if some linear combination axt + byt, with a and b constant, is a stationary series.

As an example, consider a random walk {µt} given by µt = µt−1 + wt, where {wt} is white noise with zero mean, and two series {xt} and {yt} given by xt = µt + wx,t and yt = µt + wy,t, where {wx,t} and {wy,t} are independent white noise series with zero mean. Both series are non-stationary, but their difference {xt − yt} is stationary, since it is a finite linear combination of independent white noise terms. Thus the linear combination of {xt} and {yt} with a = 1 and b = −1 produces a stationary series, {wx,t − wy,t}. Hence {xt} and {yt} are cointegrated and share the underlying stochastic trend {µt}.

In R, two series can be tested for cointegration using the Phillips-Ouliaris test implemented in the function po.test within the tseries library. The function requires that the series be given in matrix form and produces the results for a test of the null hypothesis that the two series are not cointegrated. As an example, we simulate two cointegrated series x and y that share the stochastic trend mu and test for cointegration using po.test:

> x <- y <- mu <- rep(0, 1000)
> for (i in 2:1000) mu[i] <- mu[i - 1] + rnorm(1)
> x <- mu + rnorm(1000)
> y <- mu + rnorm(1000)
> adf.test(x)$p.value
[1] 0.502
> adf.test(y)$p.value
[1] 0.544
> po.test(cbind(x, y))

Phillips-Ouliaris Cointegration Test

data: cbind(x, y)
Phillips-Ouliaris demeaned = -1020, Truncation lag parameter = 9,
p-value = 0.01

In the example above, the conclusion of the adf.test is to retain the null hypothesis that the series have unit roots. The po.test provides evidence that the series are cointegrated, since the null hypothesis is rejected at the 1% level.

11.4.2 Exchange rate series

The code below is an analysis of the UK pound and Euro exchange rate series. The Phillips-Ouliaris test shows there is evidence that the series are cointegrated, which justifies the use of a regression model. An ARIMA model is then fitted to the residuals of the regression model. The ar function is used to determine the best order of an AR process. We can investigate the adequacy of our cointegrated model by using R to fit a more general ARIMA process to the residuals.
The best-fitting ARIMA model has d = 0, which is consistent with the residuals being a realisation of a stationary process and hence with the series being cointegrated.

> po.test(cbind(xrates$UK, xrates$EU))

Phillips-Ouliaris Cointegration Test

data: cbind(xrates$UK, xrates$EU)
Phillips-Ouliaris demeaned = -21.7, Truncation lag parameter = 10,
p-value = 0.04118

> ukeu.lm <- lm(UK ~ EU, data = xrates)
> ukeu.res <- resid(ukeu.lm)
> ukeu.res.ar <- ar(ukeu.res)
> ukeu.res.ar$order
[1] 3
> AIC(arima(ukeu.res, order = c(3, 0, 0)))
[1] -9886
> AIC(arima(ukeu.res, order = c(2, 0, 0)))
[1] -9886
> AIC(arima(ukeu.res, order = c(1, 0, 0)))
[1] -9880
> AIC(arima(ukeu.res, order = c(1, 1, 0)))
[1] -9876

Comparing the AICs for the AR(2) and AR(3) models, it is clear there is little difference and that the AR(2) model would be satisfactory. The example above also shows that the AR models provide a better fit to the residual series than the ARIMA(1, 1, 0) model, so the residual series may be treated as stationary. This supports the result of the Phillips-Ouliaris test, since a linear combination of the two exchange rates, obtained from the regression model, has produced a residual series that appears to be a realisation of a stationary process.

11.5 Bivariate and multivariate white noise

Two series {wx,t} and {wy,t} are bivariate white noise if they are stationary and their cross-covariance γxy(k) = Cov(wx,t, wy,t+k) satisfies

γxx(k) = γyy(k) = γxy(k) = 0 for all k ≠ 0    (11.1)

In the equation above, γxx(0) = γyy(0) = 1 and γxy(0) may be zero or non-zero. Hence, bivariate white noise series {wx,t} and {wy,t} may be regarded as white noise when considered individually, but when considered as a pair they may be cross-correlated at lag 0.

The definition of bivariate white noise readily extends to multivariate white noise. Let γij(k) = Cov(wi,t, wj,t+k) be the cross-covariance between the series {wi,t} and {wj,t} (i, j = 1, . . . , n). Then stationary series {w1,t}, {w2,t}, . . . , {wn,t} are multivariate white noise if each individual series is white noise and, for each pair of series (i ≠ j), γij(k) = 0 for all k ≠ 0. In other words, multivariate white noise is a sequence of independent draws from some multivariate distribution.

Multivariate Gaussian white noise can be simulated with the rmvnorm function in the mvtnorm library. The function may take a mean and a covariance matrix as parameter inputs, and the dimensions of these determine the dimension of the output matrix. In the following example, the covariance matrix is 2 × 2, so the output variable w is bivariate, with 1000 simulated white noise values in each of two columns. An arbitrary value of 0.8 is chosen for the correlation to illustrate the use of the function.

> library(mvtnorm)
> cov.mat <- matrix(c(1, 0.8, 0.8, 1), nr = 2)
> w <- rmvnorm(1000, sigma = cov.mat)
> cov(w)
      [,1]  [,2]
[1,] 1.073 0.862
[2,] 0.862 1.057
> wx <- w[, 1]
> wy <- w[, 2]
> ccf(wx, wy, main = "")

The ccf function verifies that the cross-correlations are approximately zero for all non-zero lags (Fig. 11.5). As an exercise, check that the series in each column of w are approximately white noise using the acf function.

One simple use of bivariate or multivariate white noise is in the method of prewhitening.
Separate SARIMA models are fitted to multiple time series</p><p>variables so that the residuals of the fitted models appear to be a realisation</p><p>of multivariate white noise. The SARIMA models can then be used to forecast</p><p>the expected values of each time series variable, and multivariate simulations</p><p>can be produced by adding multivariate white noise terms to the forecasts.</p><p>The method works well provided the multiple time series have no common</p><p>stochastic trends and the cross-correlation structure is restricted to the error</p><p>process.</p><p>−20 −10 0 10 20</p><p>0.</p><p>0</p><p>0.</p><p>2</p><p>0.</p><p>4</p><p>0.</p><p>6</p><p>0.</p><p>8</p><p>Lag</p><p>A</p><p>C</p><p>F</p><p>Fig. 11.5. Cross-correlation of simulated bivariate Gaussian white noise</p><p>11.6 Vector autoregressive models</p><p>Two time series, {xt} and {yt}, follow a vector autoregressive process of order</p><p>1 (denoted VAR(1)) if</p><p>xt = θ11xt−1 + θ12yt−1 + wx,t</p><p>yt = θ21xt−1 + θ22yt−1 + wy,t (11.2)</p><p>where {wx,t} and {wy,t} are bivariate white noise and θij are model param-</p><p>eters. If the white noise sequences are defined with mean 0 and the process</p><p>is stationary, both time series {xt} and {yt} have mean 0 (Exercise 1). The</p><p>simplest way of incorporating a mean is to define {xt} and {yt} as deviations</p><p>from mean values. Equation (11.2) can be rewritten in matrix notation as</p><p>Zt = ΘZt−1 + wt (11.3)</p><p>11.6 Vector autoregressive models 221</p><p>where</p><p>Zt =</p><p>(</p><p>xt</p><p>yt</p><p>)</p><p>Θ =</p><p>(</p><p>θ11 θ12</p><p>θ21 θ22</p><p>)</p><p>wt =</p><p>(</p><p>wx,t</p><p>wy,t</p><p>)</p><p>Equation (11.3) is a vector expression for an AR(1) process; i.e., the process</p><p>is vector autoregressive. Using the backward shift operator, Equation (11.3)</p><p>can also be written</p><p>(I−ΘB)Zt = θ(B)Zt = wt (11.4)</p><p>where θ is a matrix polynomial of order 1 and I is the 2× 2 identity matrix.</p><p>A VAR(1) process can be extended to a VAR(p) process by allowing θ to be</p><p>a matrix polynomial of order p. A VAR(p) model for m time series is also</p><p>defined by Equation (11.4), in which I is the m ×m identity matrix, θ is a</p><p>polynomial of m ×m matrices of parameters, Zt is an m × 1 matrix of time</p><p>series variables, and wt is multivariate white noise. For a VAR model, the</p><p>characteristic equation is given by a determinant of a matrix. Analogous to</p><p>AR models, a VAR(p) model is stationary if the roots of the determinant |θ(x)|</p><p>all exceed unity in absolute value. For the VAR(1) model, the determinant is</p><p>given by ∣∣∣∣ 1− θ11x −θ12x</p><p>−θ21x 1− θ22x</p><p>∣∣∣∣ = (1−</p><p>In the next example, we shall create a ts object</p><p>from data read directly from the Internet.</p><p>One of the most important steps in a preliminary time series analysis is to</p><p>plot the data; i.e., create a time plot. For a time series object, this is achieved</p><p>with the generic plot function:</p><p>> plot(AP, ylab = "Passengers (1000's)")</p><p>You should obtain a plot similar to Figure 1.1 below. Parameters, such as</p><p>xlab or ylab, can be used in plot to improve the default labels.</p><p>Time</p><p>P</p><p>as</p><p>se</p><p>ng</p><p>er</p><p>s</p><p>(1</p><p>00</p><p>0s</p><p>)</p><p>1950 1952 1954 1956 1958 1960</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>Fig. 1.1. 
International air passenger bookings in the United States for the period</p><p>1949–1960.</p><p>There are a number of features in the time plot of the air passenger data</p><p>that are common to many time series (Fig. 1.1). For example, it is apparent</p><p>that the number of passengers travelling on the airline is increasing with time.</p><p>In general, a systematic change in a time series that does not appear to be</p><p>periodic is known as a trend . The simplest model for a trend is a linear increase</p><p>or decrease, and this is often an adequate approximation.</p><p>6 1 Time Series Data</p><p>A repeating pattern within each year is known as seasonal variation, al-</p><p>though the term is applied more generally to repeating patterns within any</p><p>fixed period, such as restaurant bookings on different days of the week. There</p><p>is clear seasonal variation in the air passenger time series. At the time, book-</p><p>ings were highest during the summer months of June, July, and August and</p><p>lowest during the autumn month of November and winter month of February.</p><p>Sometimes we may claim there are cycles in a time series that do not corre-</p><p>spond to some fixed natural period; examples may include business cycles or</p><p>climatic oscillations such as El Niño. None of these is apparent in the airline</p><p>bookings time series.</p><p>An understanding of the likely causes of the features in the plot helps us</p><p>formulate an appropriate time series model. In this case, possible causes of</p><p>the increasing trend include rising prosperity in the aftermath of the Second</p><p>World War, greater availability of aircraft, cheaper flights due to competition</p><p>between airlines, and an increasing population. The seasonal variation coin-</p><p>cides with vacation periods. In Chapter 5, time series regression models will</p><p>be specified to allow for underlying causes like these. However, many time</p><p>series exhibit trends, which might, for example, be part of a longer cycle or be</p><p>random and subject to unpredictable change. Random, or stochastic, trends</p><p>are common in economic and financial time series. A regression model would</p><p>not be appropriate for a stochastic trend.</p><p>Forecasting relies on extrapolation, and forecasts are generally based on</p><p>an assumption that present trends continue. We cannot check this assumption</p><p>in any empirical way, but if we can identify likely causes for a trend, we can</p><p>justify extrapolating it, for a few time steps at least. An additional argument</p><p>is that, in the absence of some shock to the system, a trend is likely to change</p><p>relatively slowly, and therefore linear extrapolation will provide a reasonable</p><p>approximation for a few time steps ahead. Higher-order polynomials may give</p><p>a good fit to the historic time series, but they should not be used for extrap-</p><p>olation. It is better to use linear extrapolation from the more recent values</p><p>in the time series. Forecasts based on extrapolation beyond a year are per-</p><p>haps better described as scenarios. Expecting trends to continue linearly for</p><p>many years will often be unrealistic, and some more plausible trend curves</p><p>are described in Chapters 3 and 5.</p><p>A time series plot not only emphasises patterns and features of the data</p><p>but can also expose outliers and erroneous values. One cause of the latter is</p><p>that missing data are sometimes coded using a negative value. 
Such values</p><p>need to be handled differently in the analysis and must not be included as</p><p>observations when fitting a model to data.5 Outlying values that cannot be</p><p>attributed to some coding should be checked carefully. If they are correct,</p><p>5 Generally speaking, missing values are suitably handled by R, provided they are</p><p>correctly coded as ‘NA’. However, if your data do contain missing values, then it</p><p>is always worth checking the ‘help’ on the R function that you are using, as an</p><p>extra parameter or piece of coding may be required.</p><p>1.4 Plots, trends, and seasonal variation 7</p><p>they are likely to be of particular interest and should not be excluded from</p><p>the analysis. However, it may be appropriate to consider robust methods of</p><p>fitting models, which reduce the influence of outliers.</p><p>To get a clearer view of the trend, the seasonal effect can be removed by</p><p>aggregating the data to the annual level, which can be achieved in R using the</p><p>aggregate function. A summary of the values for each season can be viewed</p><p>using a boxplot, with the cycle function being used to extract the seasons</p><p>for each item of data.</p><p>The plots can be put in a single graphics window using the layout func-</p><p>tion, which takes as input a vector (or matrix) for the location of each plot</p><p>in the display window. The resulting boxplot and annual series are shown in</p><p>Figure 1.2.</p><p>> layout(1:2)</p><p>> plot(aggregate(AP))</p><p>> boxplot(AP ~ cycle(AP))</p><p>You can see an increasing trend in the annual series (Fig. 1.2a) and the sea-</p><p>sonal effects in the boxplot. More people travelled during the summer months</p><p>of June to September (Fig. 1.2b).</p><p>1.4.2 Unemployment: Maine</p><p>Unemployment rates are one of the main economic indicators used by politi-</p><p>cians and other decision makers. For example, they influence policies for re-</p><p>gional development and welfare provision. The monthly unemployment rate</p><p>for the US state of Maine from January 1996 until August 2006 is plotted</p><p>in the upper frame of Figure 1.3. In any time series analysis, it is essential</p><p>to understand how the data have been collected and their unit of measure-</p><p>ment. The US Department of Labor gives precise definitions of terms used to</p><p>calculate the unemployment rate.</p><p>The monthly unemployment data are available in a file online that is read</p><p>into R in the code below. Note that the first row in the file contains the name</p><p>of the variable (unemploy), which can be accessed directly once the attach</p><p>command is given. Also, the header parameter must be set to TRUE so that R</p><p>treats the first row as the variable name rather than data.</p><p>> www Maine.month attach(Maine.month)</p><p>> class(Maine.month)</p><p>[1] "data.frame"</p><p>When we read data in this way from an ASCII text file, the ‘class’ is not</p><p>time series but data.frame. The ts function is used to convert the data to a</p><p>time series object. The following command creates a time series object:</p><p>8 1 Time Series Data</p><p>(a) Aggregated annual series</p><p>1950 1952 1954 1956 1958 1960</p><p>20</p><p>00</p><p>40</p><p>00</p><p>Jan Mar May Jul Sep Nov</p><p>10</p><p>0</p><p>30</p><p>0</p><p>50</p><p>0</p><p>(b) Boxplot of seasonal values</p><p>Fig. 1.2. International air passenger bookings in the United States for the period</p><p>1949–1960. Units on the y-axis are 1000s of people. 
(a) Series aggregated to the</p><p>annual level; (b) seasonal boxplots of the data.</p><p>> Maine.month.ts Maine.annual.ts layout(1:2)</p><p>> plot(Maine.month.ts, ylab = "unemployed (%)")</p><p>> plot(Maine.annual.ts, ylab = "unemployed (%)")</p><p>We can calculate the precise percentages in R, using window. This</p><p>function</p><p>will extract that part of the time series between specified start and end points</p><p>1.4 Plots, trends, and seasonal variation 9</p><p>and will sample with an interval equal to frequency if its argument is set to</p><p>TRUE. So, the first line below gives a time series of February figures.</p><p>> Maine.Feb Maine.Aug Feb.ratio Aug.ratio Feb.ratio</p><p>[1] 1.223</p><p>> Aug.ratio</p><p>[1] 0.8164</p><p>On average, unemployment is 22% higher in February and 18% lower in</p><p>August. An explanation is that Maine attracts tourists during the summer,</p><p>and this creates more jobs. Also, the period before Christmas and over the</p><p>New Year’s holiday tends to have higher employment rates than the first few</p><p>months of the new year. The annual unemployment rate was as high as 8.5%</p><p>in 1976 but was less than 4% in 1988 and again during the three years 1999–</p><p>2001. If we had sampled the data in August of each year, for example, rather</p><p>than taken yearly averages, we would have consistently underestimated the</p><p>unemployment rate by a factor of about 0.8.</p><p>(a)</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004 2006</p><p>3</p><p>4</p><p>5</p><p>6</p><p>(b)</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004</p><p>3.</p><p>5</p><p>4.</p><p>0</p><p>4.</p><p>5</p><p>5.</p><p>0</p><p>Fig. 1.3. Unemployment in Maine: (a) monthly January 1996–August 2006; (b)</p><p>annual 1996–2005.</p><p>10 1 Time Series Data</p><p>Time</p><p>un</p><p>em</p><p>pl</p><p>oy</p><p>ed</p><p>(</p><p>%</p><p>)</p><p>1996 1998 2000 2002 2004 2006</p><p>4.</p><p>0</p><p>4.</p><p>5</p><p>5.</p><p>0</p><p>5.</p><p>5</p><p>6.</p><p>0</p><p>Fig. 1.4. Unemployment in the United States January 1996–October 2006.</p><p>The monthly unemployment rate for all of the United States from January</p><p>1996 until October 2006 is plotted in Figure 1.4. The decrease in the unem-</p><p>ployment rate around the millennium is common to Maine and the United</p><p>States as a whole, but Maine does not seem to be sharing the current US</p><p>decrease in unemployment.</p><p>> www US.month attach(US.month)</p><p>> US.month.ts plot(US.month.ts, ylab = "unemployed (%)")</p><p>1.4.3 Multiple time series: Electricity, beer and chocolate data</p><p>Here we illustrate a few important ideas and concepts related to multiple time</p><p>series data. 
The monthly supply of electricity (millions of kWh), beer (Ml),</p><p>and chocolate-based production (tonnes) in Australia over the period January</p><p>1958 to December 1990 are available from the Australian Bureau of Statistics</p><p>(ABS).6 The three series have been stored in a single file online, which can be</p><p>read as follows:</p><p>www CBE[1:4, ]</p><p>choc beer elec</p><p>1 1451 96.3 1497</p><p>2 2037 84.4 1463</p><p>3 2477 91.2 1648</p><p>4 2785 81.9 1595</p><p>6 ABS data used with permission from the Australian Bureau of Statistics:</p><p>http://www.abs.gov.au.</p><p>1.4 Plots, trends, and seasonal variation 11</p><p>> class(CBE)</p><p>[1] "data.frame"</p><p>Now create time series objects for the electricity, beer, and chocolate data.</p><p>If you omit end, R uses the full length of the vector, and if you omit the month</p><p>in start, R assumes 1. You can use plot with cbind to plot several series on</p><p>one figure (Fig. 1.5).</p><p>> Elec.ts Beer.ts Choc.ts plot(cbind(Elec.ts, Beer.ts, Choc.ts))</p><p>20</p><p>00</p><p>60</p><p>00</p><p>10</p><p>00</p><p>0</p><p>14</p><p>00</p><p>0</p><p>E</p><p>le</p><p>c.</p><p>ts</p><p>10</p><p>0</p><p>15</p><p>0</p><p>20</p><p>0</p><p>B</p><p>ee</p><p>r.</p><p>ts</p><p>20</p><p>00</p><p>40</p><p>00</p><p>60</p><p>00</p><p>80</p><p>00</p><p>1960 1965 1970 1975 1980 1985 1990</p><p>C</p><p>ho</p><p>c.</p><p>ts</p><p>Time</p><p>Chocolate, Beer, and Electricity Production: 1958−1990</p><p>Fig. 1.5. Australian chocolate, beer, and electricity production; January 1958–</p><p>December 1990.</p><p>The plots in Figure 1.5 show increasing trends in production for all three</p><p>goods, partly due to the rising population in Australia from about 10 million</p><p>to about 18 million over the same period (Fig. 1.6). But notice that electricity</p><p>production has risen by a factor of 7, and chocolate production by a factor of</p><p>4, over this period during which the population has not quite doubled.</p><p>The three series constitute a multiple time series. There are many functions</p><p>in R for handling more than one series, including ts.intersect to obtain the</p><p>intersection of two series that overlap in time. We now illustrate the use of the</p><p>intersect function and point out some potential pitfalls in analysing multiple</p><p>12 1 Time Series Data</p><p>1900 1920 1940 1960 1980 2000</p><p>5</p><p>10</p><p>15</p><p>m</p><p>ill</p><p>io</p><p>ns</p><p>Fig. 1.6. Australia’s population, 1900–2000.</p><p>time series. The intersection between the air passenger data and the electricity</p><p>data is obtained as follows:</p><p>> AP.elec start(AP.elec)</p><p>[1] 1958 1</p><p>> end(AP.elec)</p><p>[1] 1960 12</p><p>> AP.elec[1:3, ]</p><p>AP Elec.ts</p><p>[1,] 340 1497</p><p>[2,] 318 1463</p><p>[3,] 362 1648</p><p>In the code below, the data for each series are extracted and plotted</p><p>(Fig. 1.7).7</p><p>> AP layout(1:2)</p><p>> plot(AP, main = "", ylab = "Air passengers / 1000's")</p><p>> plot(Elec, main = "", ylab = "Electricity production / MkWh")</p><p>> plot(as.vector(AP), as.vector(Elec),</p><p>xlab = "Air passengers / 1000's",</p><p>ylab = "Electricity production / MWh")</p><p>> abline(reg = lm(Elec ~ AP))</p><p>7 R is case sensitive, so lowercase is used here to represent the shorter record of air</p><p>passenger data. 
In the code, we have also used the argument main="" to suppress</p><p>unwanted titles.</p><p>1.4 Plots, trends, and seasonal variation 13</p><p>> cor(AP, Elec)</p><p>[1] 0.884</p><p>In the plot function above, as.vector is needed to convert the ts objects to</p><p>ordinary vectors suitable for a scatter plot.</p><p>Time</p><p>A</p><p>ir</p><p>P</p><p>as</p><p>se</p><p>ng</p><p>er</p><p>s</p><p>(1</p><p>00</p><p>0s</p><p>)</p><p>1958.0 1958.5 1959.0 1959.5 1960.0 1960.5 1961.0</p><p>30</p><p>0</p><p>40</p><p>0</p><p>50</p><p>0</p><p>60</p><p>0</p><p>Time</p><p>E</p><p>le</p><p>ct</p><p>ric</p><p>ity</p><p>p</p><p>ro</p><p>du</p><p>ct</p><p>io</p><p>n</p><p>(G</p><p>M</p><p>kW</p><p>h)</p><p>1958.0 1958.5 1959.0 1959.5 1960.0 1960.5 1961.0</p><p>16</p><p>00</p><p>20</p><p>00</p><p>Fig. 1.7. International air passengers and Australian electricity production for the</p><p>period 1958–1960. The plots look similar because both series have an increasing</p><p>trend and a seasonal cycle. However, this does not imply that there exists a causal</p><p>relationship between the variables.</p><p>The two time series are highly correlated, as can be seen in the plots, with a</p><p>correlation coefficient of 0.88. Correlation will be discussed more in Chapter 2,</p><p>but for the moment observe that the two time plots look similar (Fig. 1.7) and</p><p>that the scatter plot shows an approximate linear association between the two</p><p>variables (Fig. 1.8). However, it is important to realise that correlation does</p><p>not imply causation. In this case, it is not plausible that higher numbers of</p><p>air passengers in the United States cause, or are caused by, higher electricity</p><p>production in Australia. A reasonable explanation for the correlation is that</p><p>the increasing prosperity and technological development in both countries over</p><p>this period accounts for the increasing trends. The two time series also happen</p><p>to have similar seasonal variations. For these reasons, it is usually appropriate</p><p>to remove trends and seasonal effects before comparing multiple series. This</p><p>is often achieved by working with the residuals of a regression model that has</p><p>deterministic terms to represent the trend and seasonal effects (Chapter 5).</p><p>14 1 Time Series Data</p><p>In the simplest cases, the residuals can be modelled as independent random</p><p>variation from a single distribution, but much of the book is concerned with</p><p>fitting more sophisticated models.</p><p>Fig. 1.8. Scatter plot of air passengers and Australian electricity production for</p><p>the period: 1958–1960. The apparent linear relationship between the two variables</p><p>is misleading and a consequence of the trends in the series.</p><p>1.4.4 Quarterly exchange rate: GBP to NZ dollar</p><p>The trends and seasonal patterns in the previous two examples were clear</p><p>from the plots. In addition, reasonable explanations</p><p>could be put forward for</p><p>the possible causes of these features. With financial data, exchange rates for</p><p>example, such marked patterns are less likely to be seen, and different methods</p><p>of analysis are usually required. A financial series may sometimes show a</p><p>dramatic change that has a clear cause, such as a war or natural disaster. 
Day-</p><p>to-day changes are more difficult to explain because the underlying causes are</p><p>complex and impossible to isolate, and it will often be unrealistic to assume</p><p>any deterministic component in the time series model.</p><p>The exchange rates for British pounds sterling to New Zealand dollars</p><p>for the period January 1991 to March 2000 are shown in Figure 1.9. The</p><p>data are mean values taken over quarterly periods of three months, with the</p><p>first quarter being January to March and the last quarter being October to</p><p>December. They can be read into R from the book website and converted to</p><p>a quarterly time series as follows:</p><p>> www Z Z[1:4, ]</p><p>[1] 2.92 2.94 3.17 3.25</p><p>> Z.ts plot(Z.ts, xlab = "time / years",</p><p>ylab = "Quarterly exchange rate in $NZ / pound")</p><p>Short-term trends are apparent in the time series: After an initial surge</p><p>ending in 1992, a negative trend leads to a minimum around 1996, which is</p><p>followed by a positive trend in the second half of the series (Fig. 1.9).</p><p>The trend seems to change direction at unpredictable times rather than</p><p>displaying the relatively consistent pattern of the air passenger series and</p><p>Australian production series. Such trends have been termed stochastic trends</p><p>to emphasise this randomness and to distinguish them from more deterministic</p><p>trends like those seen in the previous examples. A mathematical model known</p><p>as a random walk can sometimes provide a good fit to data like these and is</p><p>fitted to this series in §4.4.2. Stochastic trends are common in financial series</p><p>and will be studied in more detail in Chapters 4 and 7.</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1992 1994 1996 1998 2000</p><p>2.</p><p>2</p><p>2.</p><p>6</p><p>3.</p><p>0</p><p>3.</p><p>4</p><p>Fig. 1.9. Quarterly exchange rates for the period 1991–2000.</p><p>Two local trends are emphasised when the series is partitioned into two</p><p>subseries based on the periods 1992–1996 and 1996–1998. The window function</p><p>can be used to extract the subseries:</p><p>> Z.92.96 Z.96.98 layout (1:2)</p><p>> plot(Z.92.96, ylab = "Exchange rate in $NZ/pound",</p><p>xlab = "Time (years)" )</p><p>> plot(Z.96.98, ylab = "Exchange rate in $NZ/pound",</p><p>xlab = "Time (years)" )</p><p>Now suppose we were observing this series at the start of 1992; i.e., we</p><p>had the data in Figure 1.10(a). It might have been tempting to predict a</p><p>16 1 Time Series Data</p><p>(a) Exchange rates for 1992−1996</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1992 1993 1994 1995 1996</p><p>2.</p><p>2</p><p>2.</p><p>6</p><p>3.</p><p>0</p><p>3.</p><p>4</p><p>(b) Exchange rates for 1996−1998</p><p>Time (years)</p><p>E</p><p>xc</p><p>ha</p><p>ng</p><p>e</p><p>ra</p><p>te</p><p>in</p><p>$</p><p>N</p><p>Z</p><p>/</p><p>po</p><p>un</p><p>d</p><p>1996.0 1996.5 1997.0 1997.5 1998.0</p><p>2.</p><p>4</p><p>2.</p><p>8</p><p>Fig. 1.10. Quarterly exchange rates for two periods. The plots indicate that without</p><p>additional information it would be inappropriate to extrapolate the trends.</p><p>continuation of the downward trend for future years. However, this would have</p><p>been a very poor prediction, as Figure 1.10(b) shows that the data started to</p><p>follow an increasing trend. 
Likewise, without additional information, it would</p><p>also be inadvisable to extrapolate the trend in Figure 1.10(b). This illustrates</p><p>the potential pitfall of inappropriate extrapolation of stochastic trends when</p><p>underlying causes are not properly understood. To reduce the risk of making</p><p>an inappropriate forecast, statistical tests, introduced in Chapter 7, can be</p><p>used to test for a stochastic trend.</p><p>1.4.5 Global temperature series</p><p>A change in the world’s climate will have a major impact on the lives of</p><p>many people, as global warming is likely to lead to an increase in ocean levels</p><p>and natural hazards such as floods and droughts. It is likely that the world</p><p>economy will be severely affected as governments from around the globe try</p><p>1.4 Plots, trends, and seasonal variation 17</p><p>to enforce a reduction in fossil fuel use and measures are taken to deal with</p><p>any increase in natural disasters.8</p><p>In climate change studies (e.g., see Jones and Moberg, 2003; Rayner et al.</p><p>2003), the following global temperature series, expressed as anomalies from</p><p>the monthly means over the period 1961–1990, plays a central role:9</p><p>> www Global Global.ts Global.annual plot(Global.ts)</p><p>> plot(Global.annual)</p><p>It is the trend that is of most concern, so the aggregate function is used</p><p>to remove any seasonal effects within each year and produce an annual series</p><p>of mean temperatures for the period 1856 to 2005 (Fig. 1.11b). We can avoid</p><p>explicitly dividing by 12 if we specify FUN=mean in the aggregate function.</p><p>The upward trend from about 1970 onwards has been used as evidence</p><p>of global warming (Fig. 1.12). In the code below, the monthly time inter-</p><p>vals corresponding to the 36-year period 1970–2005 are extracted using the</p><p>time function and the associated observed temperature series extracted using</p><p>window. The data are plotted and a line superimposed using a regression of</p><p>temperature on the new time index (Fig. 1.12).</p><p>> New.series New.time plot(New.series); abline(reg=lm(New.series ~ New.time))</p><p>In the previous section, we discussed a potential pitfall of inappropriate</p><p>extrapolation. In climate change studies, a vital question is whether rising</p><p>temperatures are a consequence of human activity, specifically the burning</p><p>of fossil fuels and increased greenhouse gas emissions, or are a natural trend,</p><p>perhaps part of a longer cycle, that may decrease in the future without needing</p><p>a global reduction in the use of fossil fuels. We cannot attribute the increase in</p><p>global temperature to the increasing use of fossil fuels without invoking some</p><p>physical explanation10 because, as we noted in §1.4.3, two unrelated time</p><p>series will be correlated if they both contain a trend. 
However, as the general</p><p>consensus among scientists is that the trend in the global temperature series is</p><p>related to a global increase in greenhouse gas emissions, it seems reasonable to</p><p>8 For general policy documents and discussions on climate change, see the website</p><p>(and links) for the United Nations Framework Convention on Climate Change at</p><p>http://unfccc.int.</p><p>9 The data are updated regularly and can be downloaded free of charge from the</p><p>Internet at: http://www.cru.uea.ac.uk/cru/data/.</p><p>10 For example, refer to US Energy Information Administration at</p><p>http://www.eia.doe.gov/emeu/aer/inter.html.</p><p>18 1 Time Series Data</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1900 1950 2000</p><p>−</p><p>1.</p><p>0</p><p>0.</p><p>0</p><p>(a) Monthly series: January 1856 to December 2005</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1900 1950 2000</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>2</p><p>0.</p><p>6</p><p>(b) Mean annual series: 1856 to 2005</p><p>Fig. 1.11. Time plots of the global temperature series (oC).</p><p>Time</p><p>te</p><p>m</p><p>pe</p><p>ra</p><p>tu</p><p>re</p><p>in</p><p>o C</p><p>1970 1975 1980 1985 1990 1995 2000 2005</p><p>−</p><p>0.</p><p>4</p><p>0.</p><p>0</p><p>0.</p><p>4</p><p>0.</p><p>8</p><p>Fig. 1.12. Rising mean global temperatures, January 1970–December 2005. Ac-</p><p>cording to the United Nations Framework Convention on Climate Change, the mean</p><p>global temperature is expected to continue to rise in the future unless greenhouse</p><p>gas emissions are reduced on a global scale.</p><p>1.5 Decomposition of series 19</p><p>acknowledge</p><p>a causal relationship and to expect the mean global temperature</p><p>to continue to rise if greenhouse gas emissions are not reduced.11</p><p>1.5 Decomposition of series</p><p>1.5.1 Notation</p><p>So far, our analysis has been restricted to plotting the data and looking for</p><p>features such as trend and seasonal variation. This is an important first step,</p><p>but to progress we need to fit time series models, for which we require some</p><p>notation. We represent a time series of length n by {xt : t = 1, . . . , n} =</p><p>{x1, x2, . . . , xn}. It consists of n values sampled at discrete times 1, 2, . . . , n.</p><p>The notation will be abbreviated to {xt} when the length n of the series</p><p>does not need to be specified. The time series model is a sequence of random</p><p>variables, and the observed time series is considered a realisation from the</p><p>model. We use the same notation for both and rely on the context to make</p><p>the distinction.12 An overline is used for sample means:</p><p>x̄ =</p><p>∑</p><p>xi/n (1.1)</p><p>The ‘hat’ notation will be used to represent a prediction or forecast. For</p><p>example, with the series {xt : t = 1, . . . , n}, x̂t+k|t is a forecast made at time</p><p>t for a future value at time t + k. A forecast is a predicted future value, and</p><p>the number of time steps into the future is the lead time (k). Following our</p><p>convention for time series notation, x̂t+k|t can be the random variable or the</p><p>numerical value, depending on the context.</p><p>1.5.2 Models</p><p>As the first two examples showed, many series are dominated by a trend</p><p>and/or seasonal effects, so the models in this section are based on these com-</p><p>ponents. 
A simple additive decomposition model is given by</p><p>xt = mt + st + zt (1.2)</p><p>where, at time t, xt is the observed series, mt is the trend, st is the seasonal</p><p>effect, and zt is an error term that is, in general, a sequence of correlated</p><p>random variables with mean zero. In this section, we briefly outline two main</p><p>approaches for extracting the trend mt and the seasonal effect st in Equation</p><p>(1.2) and give the main R functions for doing this.</p><p>11 Refer to http://unfccc.int.</p><p>12 Some books do distinguish explicitly by using lowercase for the time series and</p><p>uppercase for the model.</p><p>20 1 Time Series Data</p><p>If the seasonal effect tends to increase as the trend increases, a multiplica-</p><p>tive model may be more appropriate:</p><p>xt = mt · st + zt (1.3)</p><p>If the random variation is modelled by a multiplicative factor and the variable</p><p>is positive, an additive decomposition model for log(xt) can be used:13</p><p>log(xt) = mt + st + zt (1.4)</p><p>Some care is required when the exponential function is applied to the predicted</p><p>mean of log(xt) to obtain a prediction for the mean value xt, as the effect is</p><p>usually to bias the predictions. If the random series zt are normally distributed</p><p>with mean 0 and variance σ2, then the predicted mean value at time t based</p><p>on Equation (1.4) is given by</p><p>x̂t = emt+ste</p><p>1</p><p>2 σ2</p><p>(1.5)</p><p>However, if the error series is not normally distributed and is negatively</p><p>skewed,14 as is often the case after taking logarithms, the bias correction</p><p>factor will be an overcorrection (Exercise 4) and it is preferable to apply an</p><p>empirical adjustment (which is discussed further in Chapter 5). The issue is</p><p>of practical importance. For example, if we make regular financial forecasts</p><p>without applying an adjustment, we are likely to consistently underestimate</p><p>mean costs.</p><p>1.5.3 Estimating trends and seasonal effects</p><p>There are various ways to estimate the trend mt at time t, but a relatively</p><p>simple procedure, which is available in R and does not assume any specific</p><p>form is to calculate a moving average centred on xt. A moving average is</p><p>an average of a specified number of time series values around each value in</p><p>the time series, with the exception of the first few and last few terms. In this</p><p>context, the length of the moving average is chosen to average out the seasonal</p><p>effects, which can be estimated later. For monthly series, we need to average</p><p>twelve consecutive months, but there is a slight snag. Suppose our time series</p><p>begins at January (t = 1) and we average January up to December (t = 12).</p><p>This average corresponds to a time t = 6.5, between June and July. When we</p><p>come to estimate seasonal effects, we need a moving average at integer times.</p><p>This can be achieved by averaging the average of January up to December</p><p>and the average of February (t = 2) up to January (t = 13). This average of</p><p>13 To be consistent with R, we use log for the natural logarithm, which is often</p><p>written ln.</p><p>14 A probability distribution is negatively skewed if its density has a long tail to the</p><p>left.</p><p>1.5 Decomposition of series 21</p><p>two moving averages corresponds to t = 7, and the process is called centring.</p><p>Thus the trend at time t can be estimated by the centred moving average</p><p>m̂t =</p><p>1</p><p>2xt−6 + xt−5 + . . 
.+ xt−1 + xt + xt+1 + . . .+ xt+5 + 1</p><p>2xt+6</p><p>12</p><p>(1.6)</p><p>where t = 7, . . . , n − 6. The coefficients in Equation (1.6) for each month</p><p>are 1/12 (or sum to 1/12 in the case of the first and last coefficients), so that</p><p>equal weight is given to each month and the coefficients sum to 1. By using the</p><p>seasonal frequency for the coefficients in the moving average, the procedure</p><p>generalises for any seasonal frequency (e.g., quarterly series), provided the</p><p>condition that the coefficients sum to unity is still met.</p><p>An estimate of the monthly additive effect (st) at time t can be obtained</p><p>by subtracting m̂t:</p><p>ŝt = xt − m̂t (1.7)</p><p>By averaging these estimates of the monthly effects for each month, we obtain</p><p>a single estimate of the effect for each month. If the period of the time series</p><p>is a whole number of years, the number of monthly effects averaged for each</p><p>month is one less than the number of years of record. At this stage, the twelve</p><p>monthly additive components should have an average value close to, but not</p><p>usually exactly equal to, zero. It is usual to adjust them by subtracting this</p><p>mean so that they do average zero. If the monthly effect is multiplicative, the</p><p>estimate is given by division; i.e., ŝt = xt/m̂t. It is usual to adjust monthly</p><p>multiplicative factors so that they average unity. The procedure generalises,</p><p>using the same principle, to any seasonal frequency.</p><p>It is common to present economic indicators, such as unemployment per-</p><p>centages, as seasonally adjusted series. This highlights any trend that might</p><p>otherwise be masked by seasonal variation attributable, for instance, to the</p><p>end of the academic year, when school and university leavers are seeking work.</p><p>If the seasonal effect is additive, a seasonally adjusted series is given by xt− s̄t,</p><p>whilst if it is multiplicative, an adjusted series is obtained from xt/s̄t, where</p><p>s̄t is the seasonally adjusted mean for the month corresponding to time t.</p><p>1.5.4 Smoothing</p><p>The centred moving average is an example of a smoothing procedure that is</p><p>applied retrospectively to a time series with the objective of identifying an un-</p><p>derlying signal or trend. Smoothing procedures can, and usually do, use points</p><p>before and after the time at which the smoothed estimate is to be calculated.</p><p>A consequence is that the smoothed series will have some points missing at</p><p>the beginning and the end unless the smoothing algorithm is adapted for the</p><p>end points.</p><p>A second smoothing algorithm offered by R is stl. This uses a locally</p><p>weighted regression technique known as loess. The regression, which can be</p><p>a line or higher polynomial, is referred to as local because it uses only some</p><p>22 1 Time Series Data</p><p>relatively small number of points on either side of the point at which the</p><p>smoothed estimate is required. The weighting reduces the influence of outlying</p><p>points and is an example of robust regression. Although the principles behind</p><p>stl are straightforward, the details are quite complicated.</p><p>Smoothing procedures such as the centred moving average and loess do</p><p>not require a predetermined model, but they do not produce a formula that</p><p>can be extrapolated to give forecasts. Fitting a line to model a linear trend</p><p>has an advantage in this respect.</p><p>The term filtering is</p>
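The trend and seasonal-effect calculations of Section 1.5.3 are implemented in the R function decompose, which uses the centred moving average of Equation (1.6). A minimal sketch, again using AirPassengers purely for illustration:

x <- AirPassengers

x.add  <- decompose(x)                           # additive model, as in Equation (1.2)
x.mult <- decompose(x, type = "multiplicative")  # multiplicative seasonal effect

x.add$figure   # the twelve monthly effects, adjusted to average zero
plot(x.add)    # observed, trend, seasonal, and random components

## Seasonally adjusted series
x.adj.add  <- x - x.add$seasonal    # additive case
x.adj.mult <- x / x.mult$seasonal   # multiplicative case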
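For the loess-based decomposition described in Section 1.5.4, R provides stl. A minimal sketch; the logarithm is taken here because the seasonal amplitude of AirPassengers grows with the trend, as in Equation (1.4):

x <- AirPassengers

## s.window = "periodic" constrains the seasonal effect to repeat exactly
## each year; a numeric (odd) value allows it to change slowly over time.
x.stl <- stl(log(x), s.window = "periodic")

head(x.stl$time.series)   # columns: seasonal, trend, remainder
plot(x.stl)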