This Python 2 code generates random time series data with a certain amount of noise:
from common import arbitrary_timeseries
from commonrandom import generate_trendy_price
from matplotlib.pyplot import show
ans=arbitrary_timeseries(generate_trendy_price(Nlength=180, Tlength=30, Xamplitude=10.0, Volscale=0.1))
ans.plot()
show()
Output:
Does anyone know how I can generate this data in Python 3?
You can use a simple Markov process like this one:
import random
def random_timeseries(initial_value: float, volatility: float, count: int) -> list:
    time_series = [initial_value]
    for _ in range(count):
        time_series.append(time_series[-1] + initial_value * random.gauss(0, 1) * volatility)
    return time_series
ts = random_timeseries(1.2, 0.15, 100)
Now you have a list of random values that can be zipped with any timestamps.
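For instance, a minimal sketch of pairing the values with timestamps using pandas (the daily frequency and start date are arbitrary choices, not from the original post):
import pandas as pd

# pair the generated values with daily timestamps (start date is arbitrary)
timestamps = pd.date_range("2020-01-01", periods=len(ts), freq="D")
series = pd.Series(ts, index=timestamps)
print(series.head())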
I was working with NumPy and Pandas to create some artificial data for testing models.
First, I coded this:
# Constructing some random data for experiments
import math
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(42)
# Rectangular Data
total_n = 500
x = np.random.rand(total_n)*10
y = np.random.rand(total_n)*10
divider = 260
# The two lambda functions shift the data; the numbers are chosen arbitrarily
f = lambda a: a*2
x[divider:] = f(x[divider:])
y[divider:] = f(y[divider:])
g = lambda a: a*3 + 5
x[:divider] = g(x[:divider])
y[:divider] = g(y[:divider])
# Colours array for separating the data
colors = ['blue']*divider + ['red']*(total_n-divider)
squares = np.array([x,y])
plt.scatter(squares[0],squares[1], c=colors, alpha=0.5)
I got what I wanted:
The Data I wanted
But I wanted to add the colors array to the NumPy array, to use it as a label variable, so I added this to the code:
# Constructing some random data for experiments
import math
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(42)
# Rectangular Data
total_n = 500
x = np.random.rand(total_n)*10
y = np.random.rand(total_n)*10
divider = 260
# The two lambda functions shift the data; the numbers are chosen arbitrarily
f = lambda a: a*2
x[divider:] = f(x[divider:])
y[divider:] = f(y[divider:])
g = lambda a: a*3 + 5
x[:divider] = g(x[:divider])
y[:divider] = g(y[:divider])
# Colours array for separating the data
colors = ['blue']*divider + ['red']*(total_n-divider)
squares = np.array([x,y,colors])
plt.scatter(squares[0],squares[1], c=colors, alpha=0.5)
And everything just blows up:
The Blown out Data
I worked around this by separating the labels from the NumPy array, but still, what's going on here?
Alright, so I think I have the answer. A NumPy array can only hold one data type, which is inferred when the array is created if it is not given explicitly. When you create squares with colors in it, squares.dtype becomes '<U32', which means every value is converted to a (little-endian) Unicode string of up to 32 characters.
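A quick sketch (toy values, not the data above) that shows the coercion:
import numpy as np

mixed = np.array([[1.5, 2.5], ['blue', 'red']])
print(mixed.dtype)   # <U32: everything was promoted to fixed-width strings
print(mixed[0])      # ['1.5' '2.5'] -- the floats are now strings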
To avoid that you can:
use a simple list
use a pandas DataFrame, since DataFrames accept columns of different types (see the short sketch after the structured-array example below)
if you want to use NumPy, you can use a structured array as follows:
zipped = list(zip(x, y, colors))
# the input must be a list of tuples/lists representing rows;
# the transformation is made with zip
dtype = np.dtype([('x', float), ('y', float), ('colors', 'U10')])
# dtype of each field; 'U10' is a string of up to 10 characters
squares = np.array(zipped, dtype=dtype)
# create the array, specifying the dtype
plt.scatter(squares["x"], squares["y"], c=squares["colors"], alpha=0.5)
# when plotting, index the corresponding field, just as with a DataFrame column
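For the pandas option mentioned above, a minimal sketch (reusing the x, y, and colors arrays from the question; the column names are chosen here for illustration):
df = pd.DataFrame({'x': x, 'y': y, 'colors': colors})
# each column keeps its own dtype: x and y stay float64, colors stays object
plt.scatter(df['x'], df['y'], c=df['colors'], alpha=0.5)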
Intro
I have some range of frequencies that goes from freq_start_hz = X to freq_stop_hz = Y.
I am trying to logarithmically (base 10) space out samples over the range [freq_start_hz, freq_stop_hz], based on a number of samples per decade (num_samp_per_decade), inclusive of the endpoint.
I noticed numpy has a method logspace (link) which enables you to create logarithmic divisions of some range base ** start to base ** stop based on a total number of samples, num.
Can you help me create Python code that will create even logarithmic spacing per decade?
Example
freq_start_hz = 10, freq_stop_hz = 100, num_samp_per_decade = 5
This is easy, since it's just one decade. So one could create it using the following:
import numpy as np
from math import log10
freq_start_hz = 10
freq_stop_hz = 100
num_samp_per_decade = 5
freq_list = np.logspace(
    start=log10(freq_start_hz),
    stop=log10(freq_stop_hz),
    num=num_samp_per_decade,
    endpoint=False,
    base=10,
)
freq_list = np.append(freq_list, freq_stop_hz)  # append the endpoint
print(freq_list.tolist())
Output is [10.0, 17.78279410038923, 31.622776601683793, 56.23413251903491, 100.0]
Note: this worked nicely because I designed it this way. If freq_start_hz = 8, this method no longer works since it now spans multiple decades.
Conclusion
I am hoping that somewhere out there there's a premade method in math, numpy, scipy, or some other library that my internet searching hasn't turned up.
Calculate the number of points based on the number of decades in the range.
from math import log10
import numpy as np
start = 10
end = 1500
samples_per_decade = 5
ndecades = log10(end) - log10(start)
npoints = int(ndecades) * samples_per_decade
# equivalent manual approach:
# a = np.linspace(log10(start), log10(end), num=npoints)
# points = np.power(10, a)
points = np.logspace(log10(start), log10(end), num=npoints, endpoint=True, base=10)
print(points)
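If you want the spacing within each decade to stay exactly num_samp_per_decade even when the range spans partial decades, one possible sketch (not a built-in method, just np.arange applied to the base-10 exponents) is:
from math import log10
import numpy as np

freq_start_hz = 8
freq_stop_hz = 1500
num_samp_per_decade = 5

# step the base-10 exponent by 1/num_samp_per_decade, then convert back to frequencies
exponents = np.arange(log10(freq_start_hz), log10(freq_stop_hz), 1.0 / num_samp_per_decade)
freqs = np.power(10, exponents)
freqs = np.append(freqs, freq_stop_hz)  # include the endpoint explicitly
print(freqs.tolist())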
I am trying to do exponential smoothing in Python on some detrended data in a Jupyter notebook. I try to import
from statsmodels.tsa.api import ExponentialSmoothing
but the following error comes up
ImportError: cannot import name 'SimpleExpSmoothing'
I don't know how to solve that problem from a Jupyter notebook, so I am trying to declare a function that does the exponential smoothing.
Let's say the function's name is expsmoth(list, a): it takes a list list and a number a and gives back another list explist whose elements are given by the following recurrence relation:
explist[0] == list[0]
explist[i] == a*list[i] + (1-a)*explist[i-1]
I am still learning Python. How do I declare a function that takes a list and a number as arguments and gives back a list whose elements are given by the above recurrence relation?
A simple solution to your problem would be:
def explist(data, a):
    smooth_data = data.copy()  # make a copy to avoid changing the original list
    for i in range(1, len(data)):
        smooth_data[i] = a*data[i] + (1-a)*smooth_data[i-1]
    return smooth_data
The function should work with both native Python lists and NumPy arrays.
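For example, a quick check with a plain list (values chosen arbitrarily):
print(explist([1.0, 2.0, 3.0, 4.0], 0.5))  # [1.0, 1.5, 2.25, 3.125]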
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random(100) # some random data
smooth_data = explist(data, 0.2)
plt.plot(data, label='original')
plt.plot(smooth_data, label='smoothed')
plt.legend()
plt.show()
I am trying to graph the time needed for Python to compute the factorial of integers between 1 and 150.
My script calculates the different times just fine, and I am able to print them, but when I try to graph them I get a value error saying that my sequence is too large.
How can I solve this?
This is my code:
import numpy as np
import time
start_time = time.time()
n = np.linspace(1,151)
for i in range(151):
    np.math.factorial(i)
dt = ((time.time()-start_time))
plot(n,dt)
You need to collect your run times in a list dt to plot them:
import numpy as np
import timeit
from matplotlib import pyplot as plt
start_time = timeit.default_timer()
r = range(1, 151)
dt = []
for i in r:
    np.math.factorial(i)
    dt.append(timeit.default_timer() - start_time)
plt.plot(r, dt)
Result:
For a given Series I want to change the value of each element around its current value and then calculate an arbitrary function (here std), as shown in the following code:
import pandas as pd
import numpy as np
a = pd.Series(np.random.randn(10))
perturb = {}
for item in range(2, len(a)):
    serturb = {}
    for ep in np.arange(-1, 1, 0.1):
        temp = a.ix[0:item]
        temp.iloc[-1] += ep
        serturb[ep] = temp.std()
    perturb[item] = pd.Series(serturb)
perturb = pd.DataFrame(perturb).T
The above code becomes too slow for a large amount of data. The same process, applied to a DataFrame, would return a Panel. Is there an efficient way of doing this, given that a lot of the calculations are repeated?
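Not part of the original post, but since the repeated work is mostly recomputing the same prefix statistics, here is one possible sketch using cumulative sums; it assumes each perturbation starts from the unmodified series and matches pandas' default ddof=1 std:
import numpy as np
import pandas as pd

a = pd.Series(np.random.randn(10))
eps = np.arange(-1, 1, 0.1)

x = a.to_numpy()
S = np.cumsum(x)        # prefix sums
Q = np.cumsum(x ** 2)   # prefix sums of squares

rows = {}
for item in range(2, len(x)):
    k = item + 1                          # a.ix[0:item] is label-inclusive, so k values
    last = x[item]
    S_p = S[item] + eps                   # perturbed prefix sum, one value per epsilon
    Q_p = Q[item] - last ** 2 + (last + eps) ** 2
    var = (Q_p - S_p ** 2 / k) / (k - 1)  # sample variance (ddof=1)
    rows[item] = pd.Series(np.sqrt(var), index=eps)

perturb = pd.DataFrame(rows).T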