Autocorrelation is often used in signal processing to analyze functions or series
of values, such as time-domain signals.
It is useful in digital signal processing, in the analysis of stock and other market
prices, and in other applications. Autocorrelation can easily be performed in an Excel
worksheet using the CORREL function, but some of the literature I've read indicates that
there are deficiencies in the formula Excel uses. Consequently, I felt the
need to construct my own autocorrelation method in C# using standard, accepted
numerical methods.
To perform an autocorrelation on a series, we work with windows that are exactly
one-half the length of the original. This is because we pass two arrays of type
double to the correlation routine: the first is always the first half of the series,
and on each successive iteration the second is a window of the same length that has
been shifted forward by "one notch". This gives us our autocorrelation output series,
which consists of the Pearson correlation coefficient at each successive shift.
So, for example, if we have a data series of 1,000 points, we can produce
500 autocorrelation values.
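Written out, the calculation described above looks roughly like the following. This is a sketch in notation introduced here for illustration, not taken from the code: x_0 ... x_{N-1} is the input series, h = floor(N/2) is the window length, k is the shift, and r_k is the Pearson coefficient at that shift.

```latex
\[
r_k = \frac{\sum_{i=0}^{h-1} (x_i - \bar{a})\,(x_{i+k} - \bar{b}_k)}
           {\sqrt{\sum_{i=0}^{h-1} (x_i - \bar{a})^2}\;
            \sqrt{\sum_{i=0}^{h-1} (x_{i+k} - \bar{b}_k)^2}},
\qquad k = 0, 1, \dots, h-1
\]
\[
\bar{a} = \frac{1}{h}\sum_{i=0}^{h-1} x_i,
\qquad
\bar{b}_k = \frac{1}{h}\sum_{i=0}^{h-1} x_{i+k}
\]
```

At k = 0 the two windows are identical, so r_0 = 1. Note that this windowed form is not quite the textbook autocorrelation function, which uses a single mean and variance for the whole series.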
I've used this for years to "scope out" potential cycle periods in series
like sunspot numbers, interest rates, and stock prices, along with other numerical
processing techniques such as FFT (Fast Fourier Transform).
Here is the entire class, in a Console Application in C#:
using System;
using System.IO;

namespace AutoCorrelation
{
    public class AutoCorrelation
    {
        public static void Main()
        {
            Console.WriteLine("Processing input file...");
            // The input file holds one numeric value per line.
            string[] s = File.ReadAllLines("t.txt");
            double[] x = new double[s.Length];
            for (int j = 0; j < s.Length; j++)
            {
                x[j] = double.Parse(s[j]);
                // Progress indicator for long files.
                if (j % 1000 == 0)
                    Console.WriteLine(j);
            }
            Console.WriteLine("Computing Autocorrelation...");
            var q = AutoCorrelation.GetAutoCorrelationOfSeries(x);
            // Start from a clean output file, then append one coefficient per line.
            File.Delete("result.txt");
            for (int i = 0; i < q.Length; i++)
            {
                Console.WriteLine(q[i]);
                File.AppendAllText("result.txt", q[i].ToString() + "\r\n");
            }
            Console.WriteLine("DONE");
        }
        // Arithmetic mean of the series.
        public static double GetAverage(double[] data)
        {
            int len = data.Length;
            if (len == 0)
                throw new Exception("No data");
            double sum = 0;
            for (int i = 0; i < data.Length; i++)
                sum += data[i];
            return sum / len;
        }

        // Population variance (divides by N, not N - 1).
        public static double GetVariance(double[] data)
        {
            int len = data.Length;
            // Get average
            double avg = GetAverage(data);
            double sum = 0;
            for (int i = 0; i < data.Length; i++)
                sum += System.Math.Pow((data[i] - avg), 2);
            return sum / len;
        }

        public static double GetStdev(double[] data)
        {
            return Math.Sqrt(GetVariance(data));
        }

        // Pearson correlation coefficient of two equal-length series.
        // A constant series has zero standard deviation, so the result is NaN.
        public static double GetCorrelation(double[] x, double[] y)
        {
            if (x.Length != y.Length)
                throw new Exception("Length of sources is different");
            double avgX = GetAverage(x);
            double stdevX = GetStdev(x);
            double avgY = GetAverage(y);
            double stdevY = GetStdev(y);
            double covXY = 0;
            double pearson = 0;
            int len = x.Length;
            for (int i = 0; i < len; i++)
                covXY += (x[i] - avgX) * (y[i] - avgY);
            covXY /= len;
            pearson = covXY / (stdevX * stdevY);
            return pearson;
        }
        public static double[] GetAutoCorrelationOfSeries(double[] x)
        {
            int half = x.Length / 2;
            double[] autoCorrelation = new double[half];
            double[] a = new double[half];
            double[] b = new double[half];
            // The first window is fixed: the first half of the series.
            Array.Copy(x, 0, a, 0, half);
            for (int i = 0; i < half; i++)
            {
                // The second window has the same length, shifted forward by i points.
                Array.Copy(x, i, b, 0, half);
                autoCorrelation[i] = GetCorrelation(a, b);
                if (i % 1000 == 0)
                    Console.WriteLine(i);
            }
            return autoCorrelation;
        }
    }
}
We could also do this in .NET 4.0 using the Task Parallel Library as follows:
// Requires: using System.Threading.Tasks;
public static double[] GetAutoCorrelationOfSeries(double[] x)
{
    int half = x.Length / 2;
    double[] autoCorrelation = new double[half];
    // The first window is fixed and only read, so it can be shared safely.
    double[] a = new double[half];
    Array.Copy(x, 0, a, 0, half);
    // Each shift is independent of the others, so Parallel.For can spread
    // them across cores. It returns only when every iteration has finished.
    Parallel.For(0, half, i =>
    {
        // Each iteration gets its own shifted window to avoid shared state.
        double[] b = new double[half];
        Array.Copy(x, i, b, 0, half);
        autoCorrelation[i] = DoSomeWork(a, b);
    });
    return autoCorrelation;
}

public static double DoSomeWork(double[] a, double[] b)
{
    return GetCorrelation(a, b);
}
In order to get the correlation coefficient (Pearson number) of two series, we
need to be able to compute the average, variance, and standard deviation of each
series. The autocorrelation method simply takes the first half of the input series
as a fixed array, and then steps through the computation, each time incrementing the
starting point of the second, equal-length array by one. The first coefficient in the
result set will always be 1.0, since the data is being correlated with an exact copy of
itself. Each subsequent coefficient in the result set will be different as the
second "half" series is moved forward one item. It is the slope and
particularly the peaks in the result series that are of interest. High relative
peaks indicate periodicities or fundamental frequencies of cycles in the data.
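To see why peaks point to cycle periods, it helps to run the same kind of calculation on a series whose period is known in advance. The following is a minimal sketch, not the class from the download: SineDemo, Pearson, AutoCorrelate, MakeSine and Run are hypothetical names made up for this illustration, and the input is a synthetic sine wave with a period of 50 samples. It reads the two windows directly from the input array by offset instead of copying them, which is a design choice of this sketch.

```csharp
using System;

// Hypothetical demo class; none of these names come from the original solution.
public static class SineDemo
{
    // Pearson correlation of x[offsetA .. offsetA+len) with x[offsetB .. offsetB+len).
    public static double Pearson(double[] x, int offsetA, int offsetB, int len)
    {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < len; i++)
        {
            meanA += x[offsetA + i];
            meanB += x[offsetB + i];
        }
        meanA /= len;
        meanB /= len;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < len; i++)
        {
            double da = x[offsetA + i] - meanA;
            double db = x[offsetB + i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        // The 1/len factors cancel between numerator and denominator.
        return cov / Math.Sqrt(varA * varB);
    }

    // One coefficient per shift 0 .. half-1: first half versus shifted window.
    public static double[] AutoCorrelate(double[] x)
    {
        int half = x.Length / 2;
        double[] r = new double[half];
        for (int lag = 0; lag < half; lag++)
            r[lag] = Pearson(x, 0, lag, half);
        return r;
    }

    // Synthetic input: a pure sine wave with the given period in samples.
    public static double[] MakeSine(int length, int period)
    {
        double[] x = new double[length];
        for (int i = 0; i < length; i++)
            x[i] = Math.Sin(2.0 * Math.PI * i / period);
        return x;
    }

    // Prints the coefficient at shift 0, half a period, and one full period.
    public static void Run()
    {
        double[] r = AutoCorrelate(MakeSine(400, 50));
        foreach (int lag in new[] { 0, 25, 50 })
            Console.WriteLine("lag {0}: {1:F4}", lag, r[lag]);
    }
}
```

Calling SineDemo.Run() from a console application's Main should show coefficients very close to 1 at shifts 0 and 50, and very close to -1 at shift 25, where the shifted window is the mirror image of the first. Real data is much noisier, so its peaks are lower and broader, but the reading is the same: a peak at shift k suggests a cycle of about k samples.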
The result is saved to a text file in the \bin\debug folder of the application as
"result.txt". This can then be imported into Excel, for example.
In this demo I've used Dow Jones Industrial Average adjusted daily closing prices
since 1928 as the input series -- 20,574 days of trading data. The resultant
Excel chart of the output looks like this:

You can see above that there are significant peaks at points 2618 and 8364. Since
there are about 252 trading days in a year, these correspond to cycle periods
of roughly 10.39 years and 33.19 years respectively. Additional analysis with other methods
will confirm that these are dominant long-term cycle periods in the stock market
as represented by the DJI.
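The peaks can be read off the chart, but as a rough sketch they could also be located in code. The helper below is hypothetical (PeakFinder, FindPeaks and LagToYears are names invented here, and the neighbourhood size is an arbitrary parameter you would have to tune): it reports every point that is strictly higher than all of its neighbours within a given distance, and converts a shift in trading days to years using the 252-day figure above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the original solution.
public static class PeakFinder
{
    // Returns the indexes of points that are strictly greater than every
    // other point within +/- window positions. Points closer than `window`
    // to either end of the array are not considered.
    public static List<int> FindPeaks(double[] r, int window)
    {
        var peaks = new List<int>();
        for (int i = window; i < r.Length - window; i++)
        {
            bool isPeak = true;
            for (int j = i - window; j <= i + window && isPeak; j++)
            {
                if (j != i && r[j] >= r[i])
                    isPeak = false;
            }
            if (isPeak)
                peaks.Add(i);
        }
        return peaks;
    }

    // Converts a shift measured in trading days to years.
    public static double LagToYears(int lag, double tradingDaysPerYear)
    {
        return lag / tradingDaysPerYear;
    }
}
```

For example, PeakFinder.LagToYears(2618, 252.0) gives about 10.39 and PeakFinder.LagToYears(8364, 252.0) gives about 33.19. A small window will flag many minor wiggles on noisy data, so for long-term cycles a window of several hundred points is probably a more sensible starting point.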
There are other natural cycles on earth that closely match these periods, namely
the Solar cycle and other cycles such as El Niño, La Niña, and the Pacific Decadal
Oscillation.
Are long-term solar and terrestrial climatic cycles reflected in the stock market?
I don't know the answer, but it certainly would not surprise me. The oscillation
of the Sun around the Solar System's gravitational Center of Mass has long been
shown to have a predictable effect on the Sun's behavior, and, by association,
things like crop yields, interest rates, and weather patterns on earth.
Man's influence on earth's climate as described by alarmist global warming pundits
is infinitesimal compared to these long-term natural cosmic cycles.
You can download the Visual Studio 2010 Solution which contains the input series in the \bin\debug folder and experiment for yourself.