How to build multi-dimensional neural network using Python NLTK - python

As part of a larger project, I'm trying to build a neural network that is built using two independent training sets against one input training series. In other words, I have a single input array that I want to build two independent synapses to be trained against.
In simplest terms, it looks something like this:
x = [[0,0,0],[0,1,1],[0,1,0]...] y = [[0,1],[0,0],[1,0]...] z =
[[0,0,0,0,1],[1,0,0,0,0],[1,1,0,1,1]...]
where y and z would be trained as independent synapses against trainset x.
I've been unable to modify my code to get this to work and I know it can't be super complex but I have been stuck for over a day and need some help.
Below is the code that I can't seem to get to work using multiple dimensions.
# Import Necessary Modules for Function
import numpy as np
import time
# Computer a non-linear Sigmoid curve (__--)
def sigmoid(x):
sigmoidOutput = 1/(1+np.exp(-x))
return sigmoidOutput
# Convert the output of the signmoid function to its derivative
def sigmoidDerivative(sigmoidOutput):
return sigmoidOutput*(1-sigmoidOutput)
def cleanSentence(sentence):
# Tokenize the words within the input sentence
sentenceWords = nltk.word_tokenize(sentence)
# Stem the words within the tokenized setence
sentenceWords = [stemmer.stem(userInput.lower()) for userInput in sentenceWords]
return sentenceWords
# Return a binary bag of words [0 or 1] to evaluate whether or not a word exists within
# a sentence.
def bagWordCheck(sentence, userInput, show_details=False):
# Tokenize the sentence
sentenceWords = cleanSentence(sentence)
# Create a word bag using the training-data from the user-input
wordBag = [0]*len(userInput)
for sWord in sentenceWords:
for i,w in enumerate(userInput):
if w == sWord:
wordBag[i] = 1
if show_details:
print ("found in bag: %s" % w)
return(np.array(wordBag))
# Evaluate the user's input
def think(sentence, showDetails=False):
x = bagWordCheck(sentence.lower(), userInput, showDetails)
if showDetails:
print ("sentence:", sentence, "\n bagWordCheck:", x)
# input layer is our bag of words
l0 = x
# matrix multiplication of input and hidden layer
l1 = sigmoid(np.dot(l0, synapse0))
# output layer
l2 = sigmoid(np.dot(l1, synapse2))
return l2
# ANN and Gradient Descent code from https://iamtrask.github.io//2015/07/27/python-network-part2/
def train(X, y, hidden_neurons=10, alpha=1, epochs=50000, dropout=False, dropout_percent=0.5):
print ("Training with %s neurons, alpha:%s, dropout:%s %s" % (hidden_neurons, str(alpha), dropout, dropout_percent if dropout else '') )
print ("Input matrix: %sx%s Output matrix: %sx%s" % (len(X),len(X[0]),1, len(conversationTypes)) )
np.random.seed(1)
lastMeanError = 1
# randomly initialize our weights with mean 0
synapse0 = 2*np.random.random((len(X[0]), hidden_neurons)) - 1
synapse2 = 2*np.random.random((hidden_neurons, len(conversations))) - 1
#synapse2 = 2*np.random.random((hidden_neurons, len(conversationTypes))) - 1
#synapse2 = 2*np.random.random((hidden_neurons, len(conversationSubjects))) - 1
prev_synapse0_weight_update = np.zeros_like(synapse0)
prev_synapse2_weight_update = np.zeros_like(synapse2)
synapse0_direction_count = np.zeros_like(synapse0)
synapse2_direction_count = np.zeros_like(synapse2)
for j in iter(range(epochs+1)):
# Feed forward through layers 0, 1, and 2
layer_0 = X
layer_1 = sigmoid(np.dot(layer_0, synapse0))
if(dropout):
layer_1 *= np.random.binomial([np.ones((len(X),hidden_neurons))],1-dropout_percent)[0] * (1.0/(1-dropout_percent))
layer_2 = sigmoid(np.dot(layer_1, synapse2))
# how much did we miss the target value?
layer_2_error = y - layer_2
if (j% 10000) == 0 and j > 5000:
# if this 10k iteration's error is greater than the last iteration, break out
if np.mean(np.abs(layer_2_error)) < lastMeanError:
print ("delta after "+str(j)+" iterations:" + str(np.mean(np.abs(layer_2_error))) )
lastMeanError = np.mean(np.abs(layer_2_error))
else:
print ("break:", np.mean(np.abs(layer_2_error)), ">", lastMeanError )
break
# in what direction is the target value?
# were we really sure? if so, don't change too much.
layer_2_delta = layer_2_error * sigmoidDerivative(layer_2)
# how much did each l1 value contribute to the l2 error (according to the weights)?
layer_1_error = layer_2_delta.dot(synapse2.T)
# in what direction is the target l1?
# were we really sure? if so, don't change too much.
layer_1_delta = layer_1_error * sigmoidDerivative(layer_1)
synapse2_weight_update = (layer_1.T.dot(layer_2_delta))
synapse0_weight_update = (layer_0.T.dot(layer_1_delta))
if(j > 0):
synapse0_direction_count += np.abs(((synapse0_weight_update > 0)+0) - ((prev_synapse0_weight_update > 0) + 0))
synapse2_direction_count += np.abs(((synapse2_weight_update > 0)+0) - ((prev_synapse2_weight_update > 0) + 0))
synapse2 += alpha * synapse2_weight_update
synapse0 += alpha * synapse0_weight_update
prev_synapse0_weight_update = synapse0_weight_update
prev_synapse2_weight_update = synapse2_weight_update
now = datetime.datetime.now()
# persist synapses
synapse = {'synapse0': synapse0.tolist(), 'synapse2': synapse2.tolist(),
'datetime': now.strftime("%Y-%m-%d %H:%M"),
'userInput': userInput,
'conversations' : conversations,
'conversationTypes': conversationTypes,
'conversationSubjects' : conversationSubjects
}
synapse_file = "synapses.json"
with open(synapse_file, 'w') as outfile:
json.dump(synapse, outfile, indent=4, sort_keys=True)
print ("saved synapses to:", synapse_file)
X = np.array(trainingSet)
y = np.array(completeConversations)
y1 = np.array(completeTypes)
y2 = np.array(completeSubjects)
start_time = time.time()
train(X, y, hidden_neurons=5, alpha=0.1, epochs=30000, dropout=False, dropout_percent=0.2)
#train(X, y1, hidden_neurons=5, alpha=0.1, epochs=30000, dropout=False, dropout_percent=0.2)
#train(X, y2, hidden_neurons=5, alpha=0.1, epochs=30000, dropout=False, dropout_percent=0.2)
elapsed_time = time.time() - start_time
print ("processing time:", elapsed_time, "seconds")
# probability threshold
ERROR_THRESHOLD = 0.75
# load our calculated synapse values
synapse_file = 'synapses.json'
with open(synapse_file) as data_file:
synapse = json.load(data_file)
synapse0 = np.asarray(synapse['synapse0'])
synapse2 = np.asarray(synapse['synapse2'])
def classify(sentence, showDetails=False):
results = think(sentence, showDetails)
results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD ]
results.sort(key=lambda x: x[1], reverse=True)
return_results =[[conversations[r[0]],r[1]] for r in results]
#return_results =[[conversationSubjects[r[0]],r[1]] for r in results]
#return_results =[[conversationTypes[r[0]],r[1]] for r in results]
return return_results
classify("charlotte explorer is not letting me ")
Please help!

Related

How to get shape of previous layer and pass it to next layer?

I want to take the shape of Input data which is passed to Input layer with (None,) shape, and use it in a for loop for some purpose.
Here's part of my code implementation:
lst_rfrm = []
Inpt_lyr = keras.Input(shape = (None,))
for k in range(tm_stp):
F = keras.layers.Lambda(lambda x, i, j: x[:, None, j : j + i])
F.arguments = {'i' : sub_len, 'j' : k}
tmp_rfrm = F(Inpt_lyr)
lst_rfrm.append(tmp_rfrm)
cnctnt_lyr = keras.layers.merge.Concatenate(axis = 1)(lst_rfrm)
#defining other layers ...
because the Input shape is (None,), I don't know what to give to for loop as range( at the code i describe it with 'tm_stp'). how can i get the shape of the input layer (the data that is passed to input layer) in this situation?
any help is deeply appreciated
You can try a different type of loop. It seems you are trying sliding windows, right?
You don't know the "length" to run, but you know the window size and how much of the borders to remove... so....
This function gets the slices following that principle:
windowSize = sub_len
def getWindows(x):
borderCut = windowSize - 1 #lost length in the length dimension
leftCut = range(windowSize) #start of sequence
rightCut = [i - borderCut for i in leftCut] #end of sequence - negative
rightCut[-1] = None #because it can't be zero for slicing
croppedSequences = K.stack([x[:, l: r] for l,r in zip(leftCut, rightCut)], axis=-1)
return croppedSequences
Running test:
from keras.layers import *
from keras.models import Model
import keras.backend as K
import numpy as np
windowSize = 3
batchSize = 5
randomLength = np.random.randint(5,10)
inputData = np.arange(randomLength * batchSize).reshape((batchSize, randomLength))
def getWindows(x):
borderCut = windowSize - 1
leftCut = range(windowSize)
rightCut = [i - borderCut for i in leftCut]
rightCut[-1] = None
croppedSequences = K.stack([x[:, l: r] for l,r in zip(leftCut, rightCut)], axis=-1)
return croppedSequences
inputs = Input((None,))
outputs = Lambda(getWindows)(inputs)
model = Model(inputs, outputs)
preds = model.predict(inputData)
for i, (inData, pred) in enumerate(zip(inputData, preds)):
print('sample: ', i)
print('input sequence: ', inData)
print('output sequence: \n', pred, '\n\n')

My neural network only returns 0, 1 or other constant value

I'm trying to create a network, that would help predict stock prices the following day. My input data are: open, high, low and close stock values, volume, index values, a few technical indicators and exchange rate; the output is closing price from the next day. I'm using data uploaded from Excel file.
I wrote a program, that I will paste below, but it doesn't seem to be working correctly. Network always returns 1, 0 or other constant value (between 0 - 1).
I took the following steps so far:
tried to normalise the data like so: X_norm = X/(10 ** d) where d is the smallest number for which this conditon is met: abs(X_norm) < 1. I did that for the whole set in Excel before dividing it into training and test.
shuffled the data before dividing it into training/test, so that learning examples are not from consecutive days
running the network on a smaller data set and on example data set (I generated random numbers and did a simple math using them for an output and tried running network with that)
changing amount of hidden neurons
chaninging number of iterations (up to a 1000, which was a lot for my computer considering the data set, so I didn't try any more because it would take too much time)
changing learning rate.
No matter what steps I took the outcome was always the same. I think my problem could be that I don't have a bias, but perhaps I also have other mistakes in my code that are contributing to this error.
My program:
import numpy as np
import pandas as pd
df = pd.read_excel(r"path", sheet_name="DATA", index_col=0, header=0)
df = df.to_numpy()
np.random.shuffle(df)
X_data = df[:, 0:15]
X_data = X_data.reshape(1000, 1, 15)
print(f"X_data: {X_data}")
Y_data = df[:, 15]
Y_data = Y_data.reshape(1000, 1, 1)
print(f"Y_data: {Y_data}")
X = X_data[0:801]
x_test = X_data[801:]
y = Y_data[0:801]
y_test = Y_data[801:]
print(f"X_train: {X}")
print(f"x_test: {x_test}")
print(f"Y_train: {y}")
print(f"y_test: {y_test}")
rate = 0.2
class NeuralNetwork:
def __init__(self):
self.input_neurons = 15
self.hidden1_neurons = 10
self.hidden2_neurons = 5
self.output_neuron = 1
self.input_to_hidden1_w = (np.random.random((self.input_neurons, self.hidden1_neurons))) # 14x30
self.hidden1_to_hidden2_w = (np.random.random((self.hidden1_neurons, self.hidden2_neurons))) # 30x20
self.hidden2_to_output_w = (np.random.random((self.hidden2_neurons, self.output_neuron))) # 20x1
def activation(self, x):
sigmoid = 1/(1+np.exp(-x))
return sigmoid
def activation_d(self, x):
derivative = x * (1 - x)
return derivative
def feed_forward(self, X):
self.z1 = np.dot(X, self.input_to_hidden1_w)
self.z1_a = self.activation(self.z1)
self.z2 = np.dot(self.z1_a, self.hidden1_to_hidden2_w)
self.z2_a = self.activation(self.z2)
self.z3 = np.dot(self.z2_a, self.hidden2_to_output_w)
output = self.activation(self.z3)
return output
def backward(self, X, y, rate, output):
error = y - output
z3_error_delta = error * self.activation_d(output)
z2_error = np.dot(z3_error_delta, np.transpose(self.hidden2_to_output_w))
z2_error_delta = z2_error * self.activation_d(self.z2)
z1_error = np.dot(z2_error_delta, np.transpose(self.hidden1_to_hidden2_w))
z1_error_delta = z1_error * self.activation_d(self.z1)
self.input_to_hidden1_w += rate * np.dot(np.transpose(X), z1_error_delta)
self.hidden1_to_hidden2_w += rate * np.dot(np.transpose(self.z1), z2_error_delta)
self.hidden2_to_output_w += rate * np.dot(np.transpose(self.z2), z3_error_delta)
def train(self, X, y):
output = self.feed_forward(X)
self.backward(X, y, rate, output)
def save_weights(self):
np.savetxt("w1.txt", self.input_to_hidden1_w, fmt="%s")
np.savetxt("w2.txt", self.hidden1_to_hidden2_w, fmt="%s")
np.savetxt("w3.txt", self.hidden2_to_output_w, fmt="%s")
def check(self, x_test, y_test):
self.feed_forward(x_test)
np.mean(np.square((y_test - self.feed_forward(x_test))))
Net = NeuralNetwork()
for l in range(100):
for i, pattern in enumerate(X):
for j, outcome in enumerate(y):
print(f"#: {l}")
print(f'''
# {str(l)}
# {str(X[i])}
# {str(y[j])}''')
print(f"Predicted output: {Net.feed_forward(X[i])}")
Net.train(X[i], y[j])
print(f"Error training: {(np.mean(np.square(y - Net.feed_forward(X))))}")
Net.save_weights()
for i, pattern in enumerate(x_test):
for j, outcome in enumerate(y_test):
Net.check(x_test[i], y_test[j])
print(f"Error test: {(np.mean(np.square(y_test - Net.feed_forward(x_test))))}")

Python List repeating the last element inserted for whole list

I'm working around some neural network code. I've wrote my own neuron class, that could be find here. Now, I'm writing the Brain Class, that should sumarize most of code used in a NN. In this class, self.Real_Outputs collects all the outputs and put it into a list to be used after.
I'm boiling my brain to find out why when I add an element in self.Real_Outputs the whole list receive this value. I find here in this topic a discussion similar to mine, but, in my case i've already used 'self' statement. Could you guys help me on that?
class Brain:
def __init__(self, training_set, desired_outputs, bias, learning_tax):
self.Training_Set = training_set
self.Desired_Outputs = desired_outputs
self.Bias = bias
self.Learning_Tax = learning_tax
self.Hidden_Layer = []
self.Hidden_Layer_Outputs = []
self.Hidden_Layer_Errors = []
self.Output_Layer = []
self.Output_Layer_Outputs = []
self.Output_Layer_Errors = []
self.Real_Outputs = [0 for x in self.Desired_Outputs]
def set_hidden_layers(self, number_of_layers, number_of_neurons, activation_function):
self.Hidden_Layer = [[Neuron.Neuron(len(self.Training_Set[0]), activation_function, 1, self.Bias)
for x in range(number_of_neurons)]
for y in range(number_of_layers)]
self.Hidden_Layer_Outputs = [[0 for x in range(number_of_neurons)]
for y in range(number_of_layers)]
self.Hidden_Layer_Errors = [[0 for x in range(number_of_neurons)]
for y in range(number_of_layers)]
def set_output_layer(self, number_of_neurons, activation_function):
self.Output_Layer = [Neuron.Neuron(len(self.Hidden_Layer[0]), activation_function, 0, self.Bias)
for x in range(number_of_neurons)]
self.Output_Layer_Outputs = [0 for x in range(number_of_neurons)]
self.Output_Layer_Errors = [0 for x in range(number_of_neurons)]
def start_converging(self):
j=0
while j < 10:
# Here we're coming inside the training set. If was the n-th time
# you pass here, it's the n-th iteration over the Training Set.
# 'a' represents the Training Set index
for a in range(len(self.Training_Set)):
# Here we're running over the hidden layers
# 'b' represent the layer index
for b in range(len(self.Hidden_Layer)):
# Here we're running over the neurons in the layers
# 'c' represents the neuron index
for c in range(len(self.Hidden_Layer[b])):
if b == 0:
self.Hidden_Layer[b][c].initialize_inputs(self.Training_Set[a])
self.Hidden_Layer[b][c].get_sum()
self.Hidden_Layer_Outputs[b][c] = self.Hidden_Layer[b][c].get_output()
else:
self.Hidden_Layer[b][c].initialize_inputs(self.Hidden_Layer_Outputs[b-1])
self.Hidden_Layer[b][c].get_sum()
self.Hidden_Layer_Outputs[b][c] = self.Hidden_Layer[b][c].get_output()
# Here we're running over the output layer
# 'd' represents the neuron index
for d in range(len(self.Output_Layer)):
self.Output_Layer[d].initialize_inputs(self.Hidden_Layer_Outputs[-1])
self.Output_Layer[d].get_sum()
self.Output_Layer_Outputs[d] = self.Output_Layer[d].get_output()
self.Output_Layer_Errors[d] = self.Output_Layer[d].get_error(0, self.Desired_Outputs[a])
self.Output_Layer[d].update_weights(0, self.Learning_Tax)
self.Real_Outputs[a] = self.Output_Layer_Outputs
# We're updating the hidden layers now. Notice that we should pass backwards, from
# last to first, so, we're using [-(e+1)] indexes.
# '[-(e+1)]' represents the layers index.
for e in range(len(self.Hidden_Layer)):
for f in range(len(self.Hidden_Layer[-(e+1)])):
if e == 0:
self.Hidden_Layer_Errors[-(e + 1)][-(f + 1)] = self.Hidden_Layer[-(e + 1)][-(f + 1)].get_error(0, self.Output_Layer_Errors)
self.Hidden_Layer[-(e + 1)][-(f + 1)].update_weights(0, self.Learning_Tax)
else:
self.Hidden_Layer[-(e + 1)][-(f + 1)].get_error(0, self.Hidden_Layer_Errors[- (e + 1)])
self.Hidden_Layer[-(e + 1)][-(f + 1)].update_weights(0, self.Learning_Tax)
j += 1
print (self.Desired_Outputs)
print (self.Real_Outputs)

Using numpy vectorize

I'm trying to do some bayesian probit code using data augmentation. I can get it to work if I loop over the rows of the output matrix, but I'd like to vectorize it and do it all in one shot (presumably that's faster).
import numpy as np
from numpy import random
import statsmodels.api as sm
from scipy import stats
from scipy.stats import norm, truncnorm
##################################
### Create some simulated data ###
num_leg = 50
num_bills = 20
a = np.random.uniform(-1,1,num_bills).reshape(num_bills, 1)
b = np.random.uniform(-2,2,num_bills).reshape(num_bills, 1)
x = np.random.standard_normal(num_leg).reshape(num_leg, 1)
ystar_base = a + np.dot(b,x.T)
epsilon = np.random.standard_normal(num_leg * num_bills).reshape(num_bills, num_leg)
ystar = ystar_base + epsilon
y = 1*(ystar >0)
### Initialize some stuff I need ###
avec = [0]*num_bills # These are bill parameters
bvec = [0]*num_bills
betavec = [np.matrix(zip(avec,bvec))]
xvec = [0]*num_leg # these are legislator parameters
_ones = np.ones(num_leg)
def init_y(mat): # initialize a latent y matrix
if mat==1: return truncnorm.rvs(0,10000)
else: return truncnorm.rvs(-10000,0)
vectorize_y = np.vectorize(init_y)
latent_y = np.matrix(vectorize_y(y))
burn = 500 # How long to run the MCMC
runs = 500
### define the functions ###
def sample_params(xnow,ynow): # This is the function I'd like to vectorize
if type(xnow) == list:
xnow = np.array(xnow)
if type(ynow) == list:
ynow = np.array(ynow)
ynow = ynow.T #reshape(ynow.shape[0],1)
sigma = np.linalg.inv(np.dot(xnow.T,xnow)) ###This is the line that produces an error###
xy = np.dot(xnow.T,ynow)
mu = np.dot(sigma, xy) # this is just (x'x)inv x'y
return np.random.multivariate_normal(np.array(mu).flatten(), sigma)
vecparams = np.vectorize(sample_params)
def get_mu(xnow, bnow): # getting the updated mean to draw the latent ys
if type(xnow) == list:
xnow = np.array(xnow)
if type(bnow) == list:
bnow = np.array(bnow)
mu = np.dot(xnow,bnow.T)
mu = np.matrix(mu)
return mu
def sample_y(mu, ynow): # generate latent y matrix
if ynow==1:
a, b = (0 - mu),(10000-mu)
else:
a, b = (-10000 - mu),(0-mu)
return truncnorm.rvs(a,b)
vector_sample = np.vectorize(sample_y) # I'd like to be able to do something like this
### Here's the MCMC loop with the internal loop over rows(bills)
for i in range(burn+runs):
this_beta = []
this_x = []
this_y = []
for j in range(num_bills): #I'd like to get rid of this loop
ex = zip(x_ones, x)
newbeta = sample_params(ex, latent_y[j])
this_beta.append(newbeta)
#ex = np.array(zip(x_ones, x))
#this_beta = vecparams(ex, latent_y[:,]) # and call the vectorized function here
betavec.append(this_beta)
#Note, I can vectorize the latent outputs easily enough here
mean = get_mu(ex, betavec[-1])
latent_y = np.matrix(vector_sample(mean, np.matrix(y).T).T.reshape(latent_y.shape[0], latent_y.shape[1]))
### Now a bit of code to check to see if I've recovered what I want ###
test_beta = [zip(*(z)) for z in betavec[burn:]]
test_a = np.array([z[0] for z in test_beta])
test_b = np.array([z[1] for z in test_beta])
amean = test_a.sum(axis = 0)/float(runs)
bmean = test_b.sum(axis = 0)/float(runs)
print 'a mean'
print np.corrcoef([amean, np.array(a)])
print
print 'b mean'
print np.corrcoef([bmean, np.array(b)])
If I comment out the loop and use the commented out lines just above, I get the following error at the line I indicated earlier (the one that defines sigma):
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional

Is there any numpy autocorrellation function with standardized output?

I followed the advice of defining the autocorrelation function in another post:
def autocorr(x):
result = np.correlate(x, x, mode = 'full')
maxcorr = np.argmax(result)
#print 'maximum = ', result[maxcorr]
result = result / result[maxcorr] # <=== normalization
return result[result.size/2:]
however the maximum value was not "1.0". therefore I introduced the line tagged with "<=== normalization"
I tried the function with the dataset of "Time series analysis" (Box - Jenkins) chapter 2. I expected to get a result like fig. 2.7 in that book. However I got the following:
anybody has an explanation for this strange not expected behaviour of autocorrelation?
Addition (2012-09-07):
I got into Python - programming and did the following:
from ClimateUtilities import *
import phys
#
# the above imports are from R.T.Pierrehumbert's book "principles of planetary
# climate"
# and the homepage of that book at "cambridge University press" ... they mostly
# define the
# class "Curve()" used in the below section which is not necessary in order to solve
# my
# numpy-problem ... :)
#
import numpy as np;
import scipy.spatial.distance;
# functions to be defined ... :
#
#
def autocorr(x):
result = np.correlate(x, x, mode = 'full')
maxcorr = np.argmax(result)
# print 'maximum = ', result[maxcorr]
result = result / result[maxcorr]
#
return result[result.size/2:]
##
# second try ... "Box and Jenkins" chapter 2.1 Autocorrelation Properties
# of stationary models
##
# from table 2.1 I get:
s1 = np.array([47,64,23,71,38,64,55,41,59,48,71,35,57,40,58,44,\
80,55,37,74,51,57,50,60,45,57,50,45,25,59,50,71,56,74,50,58,45,\
54,36,54,48,55,45,57,50,62,44,64,43,52,38,59,\
55,41,53,49,34,35,54,45,68,38,50,\
60,39,59,40,57,54,23],dtype=float);
# alternatively in order to test:
s2 = np.array([47,64,23,71,38,64,55,41,59,48,71])
##################################################################################3
# according to BJ, ch.2
###################################################################################3
print '*************************************************'
global s1short, meanshort, stdShort, s1dev, s1shX, s1shXk
s1short = s1
#s1short = s2 # for testing take s2
meanshort = s1short.mean()
stdShort = s1short.std()
s1dev = s1short - meanshort
#print 's1short = \n', s1short, '\nmeanshort = ', meanshort, '\ns1deviation = \n',\
# s1dev, \
# '\nstdShort = ', stdShort
s1sh_len = s1short.size
s1shX = np.arange(1,s1sh_len + 1)
#print 'Len = ', s1sh_len, '\nx-value = ', s1shX
##########################################################
# c0 to be computed ...
##########################################################
sumY = 0
kk = 1
for ii in s1shX:
#print 'ii-1 = ',ii-1,
if ii > s1sh_len:
break
sumY += s1dev[ii-1]*s1dev[ii-1]
#print 'sumY = ',sumY, 's1dev**2 = ', s1dev[ii-1]*s1dev[ii-1]
c0 = sumY / s1sh_len
print 'c0 = ', c0
##########################################################
# now compute autocorrelation
##########################################################
auCorr = []
s1shXk = s1shX
lenS1 = s1sh_len
nn = 1 # factor by which lenS1 should be divided in order
# to reduce computation length ... 1, 2, 3, 4
# should not exceed 4
#print 's1shX = ',s1shX
for kk in s1shXk:
sumY = 0
for ii in s1shX:
#print 'ii-1 = ',ii-1, ' kk = ', kk, 'kk+ii-1 = ', kk+ii-1
if ii >= s1sh_len or ii + kk - 1>=s1sh_len/nn:
break
sumY += s1dev[ii-1]*s1dev[ii+kk-1]
#print sumY, s1dev[ii-1], '*', s1dev[ii+kk-1]
auCorrElement = sumY / s1sh_len
auCorrElement = auCorrElement / c0
#print 'sum = ', sumY, ' element = ', auCorrElement
auCorr.append(auCorrElement)
#print '', auCorr
#
#manipulate s1shX
#
s1shX = s1shXk[:lenS1-kk]
#print 's1shX = ',s1shX
#print 'AutoCorr = \n', auCorr
#########################################################
#
# first 15 of above Values are consistent with
# Box-Jenkins "Time Series Analysis", p.34 Table 2.2
#
#########################################################
s1sh_sdt = s1dev.std() # Standardabweichung short
#print '\ns1sh_std = ', s1sh_sdt
print '#########################################'
# "Curve()" is a class from RTP ClimateUtilities.py
c2 = Curve()
s1shXfloat = np.ndarray(shape=(1,lenS1),dtype=float)
s1shXfloat = s1shXk # to make floating point from integer
# might be not necessary
#print 'test plotting ... ', s1shXk, s1shXfloat
c2.addCurve(s1shXfloat)
c2.addCurve(auCorr, '', 'Autocorr')
c2.PlotTitle = 'Autokorrelation'
w2 = plot(c2)
##########################################################
#
# now try function "autocorr(arr)" and plot it
#
##########################################################
auCorr = autocorr(s1short)
c3 = Curve()
c3.addCurve( s1shXfloat )
c3.addCurve( auCorr, '', 'Autocorr' )
c3.PlotTitle = 'Autocorr with "autocorr"'
w3 = plot(c3)
#
# well that should it be!
#
So your problem with your initial attempt is that you did not subtract the average from your signal. The following code should work:
timeseries = (your data here)
mean = np.mean(timeseries)
timeseries -= np.mean(timeseries)
autocorr_f = np.correlate(timeseries, timeseries, mode='full')
temp = autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]
iact.append(sum(autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]))
In my example temp is the variable you are interested in; it is the forward integrated autocorrelation function. If you want the integrated autocorrelation time you are interested in iact.
I'm not sure what the issue is.
The autocorrelation of a vector x has to be 1 at lag 0 since that is just the squared L2 norm divided by itself, i.e., dot(x, x) / dot(x, x) == 1.
In general, for any lags i, j in Z, where i != j the unit-scaled autocorrelation is dot(shift(x, i), shift(x, j)) / dot(x, x) where shift(y, n) is a function that shifts the vector y by n time points and Z is the set of integers since we're talking about the implementation (in theory the lags can be in the set of real numbers).
I get 1.0 as the max with the following code (start on the command line as $ ipython --pylab), as expected:
In[1]: n = 1000
In[2]: x = randn(n)
In[3]: xc = correlate(x, x, mode='full')
In[4]: xc /= xc[xc.argmax()]
In[5]: xchalf = xc[xc.size / 2:]
In[6]: xchalf_max = xchalf.max()
In[7]: print xchalf_max
Out[1]: 1.0
The only time when the lag 0 autocorrelation is not equal to 1 is when x is the zero signal (all zeros).
The answer to your question is: no, there is no NumPy function that automatically performs standardization for you.
Besides, even if it did you would still have to check it against your expected output, and if you're able to say "Yes this performed the standardization correctly", then I would assume that you know how to implement it yourself.
I'm going to suggest that it might be the case that you've implemented their algorithm incorrectly, although I can't be sure since I'm not familiar with it.

Categories

Resources