Autocorrelation method in C# for signal analysis

Autocorrelation (also known as serial correlation) is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time separation between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal which has been buried under noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.
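To illustrate the "periodic signal buried under noise" idea, here is a minimal, self-contained sketch that is not part of the original program. It hides a sine wave with a period of 50 samples in uniform noise and then looks for the lag with the highest correlation. It follows the same half-length sliding-window scheme described below, but computes the Pearson coefficient directly from index offsets instead of copying arrays. The class name BuriedSignalDemo, the noise level, the random seed and the search range are all illustrative assumptions.

```csharp
using System;

// Hypothetical demo: recover the period of a sine wave hidden in uniform noise.
public static class BuriedSignalDemo
{
    // Pearson correlation of x[0 .. len-1] with x[lag .. lag+len-1].
    static double LagCorrelation(double[] x, int lag, int len)
    {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < len; i++) { meanA += x[i]; meanB += x[i + lag]; }
        meanA /= len;
        meanB /= len;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < len; i++)
        {
            double da = x[i] - meanA, db = x[i + lag] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.Sqrt(varA * varB); // the 1/len factors cancel
    }

    // Returns the lag in [minLag, maxLag] (and below half the length)
    // with the highest correlation.
    public static int FindPeriod(double[] x, int minLag, int maxLag)
    {
        int half = x.Length / 2;
        int best = minLag;
        double bestR = double.NegativeInfinity;
        for (int lag = minLag; lag <= maxLag && lag < half; lag++)
        {
            double r = LagCorrelation(x, lag, half);
            if (r > bestR) { bestR = r; best = lag; }
        }
        return best;
    }

    // Sine of unit amplitude plus uniform noise in [-0.5, 0.5).
    public static double[] NoisySine(int n, int period, int seed)
    {
        Random rng = new Random(seed);
        double[] x = new double[n];
        for (int t = 0; t < n; t++)
            x[t] = Math.Sin(2 * Math.PI * t / period) + (rng.NextDouble() - 0.5);
        return x;
    }
}
```

For example, FindPeriod(NoisySine(1000, 50, 42), 25, 75) should return a lag at or very near 50. The search starts at 25 (half a period) so that the trivially high correlations at small lags are skipped.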

Autocorrelation is often used in signal processing for analyzing functions or series of values, such as time-domain signals.
It is useful in digital signal processing, in the analysis of stock and other market prices, and in many other applications. Autocorrelation can easily be performed in an Excel worksheet using the CORREL function, but some of the literature I've read indicates that there are deficiencies in the formula Excel uses. Consequently, I felt the need to construct my own autocorrelation method in C# using standardized and accepted numerical methods.

Basically, in order to perform an autocorrelation on a series, we need to work with windows that are exactly one-half the length of the original series. This is because we pass two arrays of type double, of equal length, to the correlation routine, and on each successive iteration the second window is shifted forward by "one notch". The second window must still fit inside the data at the largest shift, so both the window length and the number of shifts are limited to half the series. This is what gives us our autocorrelation output series, which consists of the Pearson correlation coefficient for each successive shift (lag) as described above. So, for example, if we have a data series of 1,000 points, we can produce an autocorrelation with 500 lags, each computed over 500 points.
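The window arithmetic can be checked with a few lines of C#. WindowMath is a hypothetical helper, not part of the original program; it assumes window a is x[0..half-1] and window b for lag k is x[k..k+half-1].

```csharp
using System;

// Hypothetical helper illustrating the half-length window bookkeeping.
public static class WindowMath
{
    // Number of lags, which is also the number of points in each window.
    public static int LagCount(int n)
    {
        return n / 2; // integer division: an odd final point is simply unused
    }

    // Last index of x read by window b at the given lag.
    public static int LastIndexOfWindowB(int n, int lag)
    {
        int half = n / 2;
        if (lag < 0 || lag >= half)
            throw new ArgumentOutOfRangeException("lag");
        return lag + half - 1;
    }
}
```

For 1,000 points, LagCount(1000) is 500, and the largest lag (499) reads x[499] through x[998], so window b never runs off the end of the data. For the 20,574-day series used later in this article, there are 10,287 lags.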

I've used this for years to "scope out" potential cycle periods in series like sunspot numbers, interest rates, and stock prices, along with other numerical processing techniques such as FFT (Fast Fourier Transform).

Here is the entire class, in a Console Application in C#:

using System;
using System.Globalization;
using System.IO;

namespace AutoCorrelation
{
    public class AutoCorrelation
    {
        public static void Main()
        {
            Console.WriteLine("Processing input file...");
            // t.txt holds one value per line; '.' is assumed as the decimal separator.
            string[] s = File.ReadAllLines("t.txt");
            double[] x = new double[s.Length];
            for (int j = 0; j < s.Length; j++)
            {
                x[j] = double.Parse(s[j], CultureInfo.InvariantCulture);
                if (j % 1000 == 0)
                    Console.WriteLine(j);
            }

            Console.WriteLine("Computing Autocorrelation...");
            double[] q = GetAutoCorrelationOfSeries(x);

            // Write all coefficients in one call (overwriting any previous result.txt)
            // instead of re-opening the file for every line.
            string[] lines = new string[q.Length];
            for (int i = 0; i < q.Length; i++)
            {
                Console.WriteLine(q[i]);
                lines[i] = q[i].ToString();
            }
            File.WriteAllLines("result.txt", lines);
            Console.WriteLine("DONE");
        }

        public static double GetAverage(double[] data)
        {
            if (data.Length == 0)
                throw new ArgumentException("No data", "data");

            double sum = 0;
            for (int i = 0; i < data.Length; i++)
                sum += data[i];

            return sum / data.Length;
        }

        // Population variance (divides by n, not n - 1).
        public static double GetVariance(double[] data)
        {
            double avg = GetAverage(data);

            double sum = 0;
            for (int i = 0; i < data.Length; i++)
            {
                double d = data[i] - avg;
                sum += d * d;
            }

            return sum / data.Length;
        }

        public static double GetStdev(double[] data)
        {
            return Math.Sqrt(GetVariance(data));
        }

        // Pearson correlation coefficient of two equal-length series.
        // Covariance and both standard deviations use population (divide-by-n)
        // formulas, so the n factors cancel and the result is the usual Pearson r.
        // The result is not meaningful if either series is constant.
        public static double GetCorrelation(double[] x, double[] y)
        {
            if (x.Length != y.Length)
                throw new ArgumentException("Length of sources is different");
            double avgX = GetAverage(x);
            double stdevX = GetStdev(x);
            double avgY = GetAverage(y);
            double stdevY = GetStdev(y);
            int len = x.Length;
            double covXY = 0;
            for (int i = 0; i < len; i++)
                covXY += (x[i] - avgX) * (y[i] - avgY);
            covXY /= len;
            return covXY / (stdevX * stdevY);
        }

        // Correlates the first half of the series (window a, fixed) with an
        // equal-length window b that starts at index 'lag' and moves forward one
        // element per iteration. Returns one coefficient per lag, 0 .. half-1.
        public static double[] GetAutoCorrelationOfSeries(double[] x)
        {
            int half = x.Length / 2;
            double[] autoCorrelation = new double[half];

            // Window a never changes: x[0 .. half-1].
            double[] a = new double[half];
            Array.Copy(x, 0, a, 0, half);

            double[] b = new double[half];
            for (int lag = 0; lag < half; lag++)
            {
                // Window b = x[lag .. lag+half-1]; its last index is at most
                // 2*half - 2, which is always inside x.
                Array.Copy(x, lag, b, 0, half);
                autoCorrelation[lag] = GetCorrelation(a, b);
                if (lag % 1000 == 0)
                    Console.WriteLine(lag);
            }
            return autoCorrelation;
        }
    }
}

We could also do this in .NET 4.0 with the Task Parallel Library. Because each lag is computed independently of the others, Parallel.For can spread the lags across all available cores. Window a is only read, so it can be shared; each iteration gets its own copy of window b:

// Requires "using System.Threading.Tasks;" at the top of the file.
public static double[] GetAutoCorrelationOfSeries(double[] x)
{
    int half = x.Length / 2;
    double[] autoCorrelation = new double[half];

    // Window a = x[0 .. half-1] is shared by all iterations (read-only).
    double[] a = new double[half];
    Array.Copy(x, 0, a, 0, half);

    Parallel.For(0, half, lag =>
    {
        // Each iteration needs its own window b = x[lag .. lag+half-1].
        double[] b = new double[half];
        Array.Copy(x, lag, b, 0, half);

        // Each iteration writes only its own slot, so no locking is needed.
        autoCorrelation[lag] = GetCorrelation(a, b);

        if (lag % 1000 == 0)
            Console.WriteLine(lag);
    });

    return autoCorrelation;
}
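One way to convince yourself that a parallel version is safe is to compare it with a plain loop. The sketch below is self-contained and hypothetical (ParallelCheck and its methods are not from the original download). It computes each lag's Pearson coefficient directly from index offsets rather than copying windows; since every lag is computed by the same code from the same inputs, the sequential and parallel results should agree.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical check: sequential vs. Parallel.For half-window autocorrelation.
public static class ParallelCheck
{
    // Pearson correlation of x[0 .. len-1] with x[lag .. lag+len-1].
    static double Pearson(double[] x, int lag, int len)
    {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < len; i++) { meanA += x[i]; meanB += x[i + lag]; }
        meanA /= len;
        meanB /= len;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < len; i++)
        {
            double da = x[i] - meanA, db = x[i + lag] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.Sqrt(varA * varB); // the 1/len factors cancel
    }

    public static double[] Sequential(double[] x)
    {
        int half = x.Length / 2;
        double[] r = new double[half];
        for (int lag = 0; lag < half; lag++)
            r[lag] = Pearson(x, lag, half);
        return r;
    }

    public static double[] InParallel(double[] x)
    {
        int half = x.Length / 2;
        double[] r = new double[half];
        // Each iteration reads x and writes only r[lag], so no locking is needed.
        Parallel.For(0, half, lag => { r[lag] = Pearson(x, lag, half); });
        return r;
    }
}
```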

In order to get the correlation coefficient (Pearson number) of a time series, we need to be able to compute the average, variance, and standard deviation of the series. The autocorrelation method takes the first half of the input series as a fixed window and compares it with a second window of the same length, incrementing the starting point of the second window by one on each step. The first coefficient in the result set will always be 1.0, since at lag 0 the data is being correlated with an exact copy of itself. Each subsequent coefficient will generally differ as the second window moves forward one item. It is the slope, and particularly the peaks, in the result series that are of interest: high relative peaks indicate periodicities, or fundamental frequencies of cycles, in the data.
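For reference, here is the lag-k coefficient described above, written out for a series x_0, ..., x_{N-1}. Window a is x_0..x_{H-1} and window b is x_k..x_{k+H-1}; GetVariance and GetCorrelation both divide by the window length H, so those factors cancel. Note that this is the Pearson coefficient of two windows, each with its own mean and standard deviation, rather than the textbook sample autocorrelation, which uses a single overall mean and variance.

```latex
\begin{aligned}
H &= \lfloor N/2 \rfloor, \qquad k = 0, 1, \dots, H-1 \\
\bar{a} &= \frac{1}{H}\sum_{i=0}^{H-1} x_i, \qquad
\bar{b}_k = \frac{1}{H}\sum_{i=0}^{H-1} x_{i+k} \\
\sigma_a &= \sqrt{\frac{1}{H}\sum_{i=0}^{H-1} (x_i - \bar{a})^2}, \qquad
\sigma_{b_k} = \sqrt{\frac{1}{H}\sum_{i=0}^{H-1} (x_{i+k} - \bar{b}_k)^2} \\
r_k &= \frac{\frac{1}{H}\sum_{i=0}^{H-1} (x_i - \bar{a})(x_{i+k} - \bar{b}_k)}{\sigma_a \, \sigma_{b_k}}
\end{aligned}
```

At k = 0 the two windows are identical, so r_0 = \sigma_a^2 / \sigma_a^2 = 1.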

The result is saved to a text file named "result.txt" in the \bin\debug folder of the application. It can then be imported into Excel, for example. In this demo I've used Dow Jones Industrial Average adjusted daily closing prices since 1928 as the input series: 20,574 days of trading data. The resultant Excel chart of the output looks like this:

You can see above that there are significant peaks at lags 2618 and 8364. Since there are roughly 252 trading days in a year, these correspond to cycle periods of about 10.39 years and 33.19 years, respectively. In my experience, additional analysis with other methods confirms that these are dominant long-term cycle periods in the stock market as represented by the DJIA.
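Converting lags to years and picking out candidate peaks are both easy to automate. This is a minimal sketch with hypothetical names (PeakTools, LagToYears, LocalMaxima); the 252 trading days per year comes from the text above, while the minHeight threshold is an assumption you would tune for your own data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for reading an autocorrelation result series.
public static class PeakTools
{
    // Converts a lag measured in trading days to years.
    public static double LagToYears(int lag, double daysPerYear = 252.0)
    {
        return lag / daysPerYear;
    }

    // Indices i (endpoints excluded) where r[i] is strictly greater than both
    // neighbours and at least minHeight.
    public static List<int> LocalMaxima(double[] r, double minHeight)
    {
        List<int> peaks = new List<int>();
        for (int i = 1; i < r.Length - 1; i++)
            if (r[i] > r[i - 1] && r[i] > r[i + 1] && r[i] >= minHeight)
                peaks.Add(i);
        return peaks;
    }
}
```

For example, LagToYears(2618) gives about 10.39 and LagToYears(8364) about 33.19. On a noisy autocorrelation series a strict local-maximum test will also flag many small ripples, so in practice you may want to smooth the series first or require a minimum spacing between peaks.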

There are natural cycles that closely match these periods, namely the solar cycle and terrestrial cycles such as El Niño, La Niña, and the Pacific Decadal Oscillation.

Are long-term solar and terrestrial climatic cycles reflected in the stock market? I don't know the answer, but it certainly would not surprise me. The oscillation of the Sun around the Solar System's gravitational center of mass has long been argued to have a predictable effect on the Sun's behavior and, by association, on things like crop yields, interest rates, and weather patterns on Earth.

Man's influence on Earth's climate, as described by alarmist global warming pundits, is infinitesimal compared to these long-term natural cosmic cycles.

You can download the Visual Studio 2010 solution, which contains the input series in its \bin\debug folder, and experiment for yourself.

By Peter Bromberg