Declare a function to do exponential smoothing on data - python

I am trying to do exponential smoothing in Python on some detrended data in a Jupyter notebook. I try to import
from statsmodels.tsa.api import ExponentialSmoothing
but the following error comes up
ImportError: cannot import name 'SimpleExpSmoothing'
I don't know how to solve that problem from a Jupyter notebook, so I am trying to declare a function that does the exponential smoothing.
Let's say the function's name is expsmoth(list,a) and takes a list list and a number a and gives another list called explist whose elements are given by the following recurrence relation:
explist[0] == list[0]
explist[i] == a*list[i] + (1-a)*explist[i-1]
I am still learning Python. How do I declare a function that takes a list and a number as arguments and returns a list whose elements are given by the above recurrence relation?

A simple solution to your problem would be
def explist(data, a):
    smooth_data = data.copy()  # make a copy to avoid changing the original list
    for i in range(1, len(data)):
        smooth_data[i] = a*data[i] + (1-a)*smooth_data[i-1]
    return smooth_data
The function should work with both native Python lists and numpy arrays.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random(100) # some random data
smooth_data = explist(data, 0.2)
plt.plot(data, label='original')
plt.plot(smooth_data, label='smoothed')
plt.legend()
plt.show()
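One caveat with the copy-based version: if `data` is an integer numpy array, the copy keeps the integer dtype and the smoothed values are truncated on assignment. A minimal sketch of a variant that avoids this by building a new list of floats (the name `exp_smooth` is just an illustrative choice, not from the question):

```python
def exp_smooth(data, a):
    """Return a new list with s[0] = data[0] and s[i] = a*data[i] + (1-a)*s[i-1]."""
    smoothed = []
    for i, value in enumerate(data):
        if i == 0:
            smoothed.append(float(value))  # the first element is copied unchanged
        else:
            smoothed.append(a * float(value) + (1 - a) * smoothed[-1])
    return smoothed


# Small worked check with a = 0.5:
# s[0] = 1, s[1] = 0.5*2 + 0.5*1 = 1.5, s[2] = 0.5*3 + 0.5*1.5 = 2.25
print(exp_smooth([1, 2, 3], 0.5))  # [1.0, 1.5, 2.25]
```

Because it only iterates over `data`, this variant accepts lists and numpy arrays alike and always returns a plain list.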

Related

python: Plotting and optimizing the same function

Let's say I have the following function:
from math import exp, log
def f(x):
    return log(3*exp(3*x) + 7*exp(7*x))
I want to do two things:
1) plot the function over a range of x-values
2) find the root of the function using the Newton method from scipy
My problem is that plotting seems best done with a numpy array, x=np.linspace(-2,2,1000), but then evaluating the function results in the error TypeError: only size-1 arrays can be converted to Python scalars. I can fix this by simply changing log and exp to np.log and np.exp, respectively.
But doing so then makes scipy.optimize.newton unhappy.
It seems like I need to define the function twice, once for use in plotting (with np. ...) and once for optimizing in the form given above.
I can't imagine that this is actually the case. Any hints would be greatly appreciated.
Seems legit, you just need to use numpy functions instead of base math functions:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
%matplotlib inline
def f(x):
    return np.log(3*np.exp(3*x) + 7*np.exp(7*x))
x = np.linspace(-2,2,1000)
y = f(x)
plt.scatter(x, y)
optimize.root(f, 1)
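The numpy functions also accept plain Python floats, so a single definition can serve both purposes. A minimal sketch, assuming `scipy.optimize.newton` is called without `fprime` (in which case SciPy falls back to the secant method) and with a starting guess of 0 chosen here purely for illustration:

```python
import numpy as np
from scipy import optimize


def f(x):
    # np.log and np.exp work element-wise on arrays and also on scalars
    return np.log(3 * np.exp(3 * x) + 7 * np.exp(7 * x))


x = np.linspace(-2, 2, 1000)
y = f(x)                          # array in, array out (ready for plotting)

root = optimize.newton(f, 0.0)    # scalar in, scalar out
print(root, f(root))              # f(root) should be very close to zero
```

The root lies where 3*exp(3*x) + 7*exp(7*x) equals 1, which is between -1 and 0 for this function.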

I am a beginner and have a question related to plotting in Python

I am new to Python.
I would like to know the syntax for the following problem.
Suppose I want to plot a quantity x = (a constant with a fixed, given value) * ln(1+z) versus z, where z varies from c to d.
How do I define the variables x and z, and how do I enter an 'ln' function?
I have imported numpy, scipy and matplotlib, but do not know how to proceed from there.
Since you already imported numpy, here is just another answer:
import numpy as np
import matplotlib.pyplot as plt
x_coeff = 10
c = 0
d = 100
z = list(range(c, d))
x = [x_coeff * np.log(1 + v) for v in z]
plt.plot(z, x)
plt.show()
It's always better to check the documentation and show your first attempt:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html
You might also need to understand list comprehensions.
They are a neat and convenient way to create a list in Python.
To plot a curve you need two lists: one holds the domain points for the x-axis and the other holds the corresponding values for the y-axis. First read the constant with Python's built-in input function and convert it to an int, then use the log function from the math library to compute the logarithm.
import math
import matplotlib.pyplot as plt
a = int(input("enter a value for constant : "))
c,d = 0,100
xvals = list(range(c,d,1)) # start,end,step
print(xvals)
yvals = [a*math.log(1+x) for x in xvals]
print(yvals)
plt.plot(xvals,yvals)
plt.show()
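Both answers build the values element by element. As a minimal vectorised sketch of the same plot data, assuming an illustrative constant `k = 10` and range `c = 0`, `d = 100` (none of these values come from the question), numpy can compute every point in one call:

```python
import numpy as np

k = 10           # assumed constant, for illustration only
c, d = 0, 100    # assumed range for z

z = np.linspace(c, d, 200)   # 200 evenly spaced z-values from c to d inclusive
x = k * np.log1p(z)          # log1p(z) computes ln(1 + z) element-wise

print(x[0], x[-1])
```

Passing `z` and `x` to `plt.plot(z, x)` then draws the curve.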

Exporting large array variables (type = object) to CSV files

I have used Gekko from APM in Python to solve an optimization problem. The two main decision variables (DVs) are large arrays. The problem has converged successfully, however, I need the results of these tables in an excel worksheet for further work.
An example variable name is 's'. Since the arrays created within Gekko are GKVariable/Object variable types I cannot simply use:
pd.DataFrame(s).to_csv(r'C:\Users\...\s.csv')
because the result gives every cell of the array the label of each variable defined in the model (i.e. v1, v2, etc.)
Using print 's' within the kernel will show the numbers of the array from the optimization results but in a format that doesn't guarantee that each line is a new row of the matrix because of the many columns.
Is there another solution to copy just the resulting value of the DV 's' so it becomes a normal np.array instead of the object type variable? Open to any ideas for this.
You can use s[i].value[0] for steady state problems (IMODE=1 or IMODE=3) or s[i].value[:] to access the array of values for all other IMODE options. Here is a simple example that writes the results to a file with pandas and numpy.
import numpy as np
from gekko import GEKKO
import pandas as pd
m = GEKKO(remote=False)
# Random 3x3
A = np.random.rand(3,3)
# Random 3x1
b = np.random.rand(3,1)
# Ax = b
y = m.axb(A,b)
m.solve()
yn = [y[i].value[0] for i in range(3)]
print(yn)
pd.DataFrame(yn).to_csv(r'y1.csv')
np.savetxt('y2.csv',yn,delimiter=',',comments='')
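For a two-dimensional decision variable the same idea applies element by element. A minimal sketch, assuming each entry exposes a `.value` list as described above; `FakeVar` is a hypothetical stand-in so the snippet runs without Gekko, and `to_numeric` is an illustrative helper name:

```python
import numpy as np


class FakeVar:
    """Hypothetical stand-in for a solved variable that has a .value list."""
    def __init__(self, v):
        self.value = [v]


def to_numeric(var_array):
    """Copy .value[0] of every element into a plain float array of the same shape."""
    var_array = np.asarray(var_array, dtype=object)
    out = np.empty(var_array.shape, dtype=float)
    for idx in np.ndindex(var_array.shape):
        out[idx] = var_array[idx].value[0]
    return out


s = np.array([[FakeVar(1.5), FakeVar(2.0)],
              [FakeVar(3.0), FakeVar(4.5)]], dtype=object)
s_num = to_numeric(s)
print(s_num)
```

The resulting float array can then be passed to `np.savetxt` or `pd.DataFrame(...).to_csv` in the same way as `yn` in the example above.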

python - What produces the same plot as autocorrelation_plot()?

I need the values of the autocorrelation coefficients that autocorrelation_plot() draws. The problem is that the function does not expose these values, so I need another function to get them. That's why I used acf() from statsmodels, but it didn't produce the same plot as autocorrelation_plot() does. Here is my code:
from statsmodels.tsa.stattools import acf
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt
import numpy as np
y = np.sin(np.arange(1,6*np.pi,0.1))
plt.plot(acf(y))
plt.show()
So the result is not the same as this:
autocorrelation_plot(y)
plt.show()
This seems to be related to the nlags parameter of acf:
nlags: int, optional
Number of lags to return autocorrelation for.
I don't know exactly what this does, but in the source of acf there is a slice that shortens the array:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
acf = avf[:nlags + 1] / avf[0]
If you use statsmodels.tsa.stattools.acovf directly the result is the same as with autocorrelation_plot:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
So you can call it like
plt.plot(acf(y, nlags=len(y)))
to make it work.
An explanation of lag: https://math.stackexchange.com/questions/2548314/what-is-lag-in-a-time-series/2548350
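If only the coefficients are needed, they can also be computed directly with numpy. A minimal sketch, assuming the usual biased estimator (lag-h covariance divided by n, normalised by the lag-0 value); `autocorr` is an illustrative name, and whether it matches a given library's output exactly depends on that library's normalisation:

```python
import numpy as np


def autocorr(x):
    """Autocorrelation coefficients for lags 0 .. len(x)-1 (biased estimator)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                           # remove the mean
    full = np.correlate(xc, xc, mode="full")    # sums of lagged products, lags -(n-1)..(n-1)
    return full[len(x) - 1:] / np.dot(xc, xc)   # keep lags >= 0, normalise by lag 0


y = np.sin(np.arange(1, 6 * np.pi, 0.1))
r = autocorr(y)
print(len(r), r[0])
```

The returned array has one coefficient per lag, starting with the lag-0 value of 1, so it can be plotted with `plt.plot(r)` or inspected directly.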

imshow() returns invalid dimensions for 2D array when using multiprocessing.Pool

I'm trying to use the multiprocessing module to create figures from 2D arrays faster. In the code below I create a 2D array from a hdf5 data file (please message me if you would like a sample file to test on). Using multiprocessing.Pool, I try to pass this array to the map function but it raises TypeError: Invalid dimensions for image data. I've checked to make sure my array is 2 dimensions using da.shape, so I'm not sure why it isn't working for me. Any help is much appreciated!
To import yt, see yt-project.org/#getyt.
P.S. This is my first question on Stack Overflow so please let me know if/how I can improve.
import yt
import numpy as np
import multiprocessing
from multiprocessing import Pool, Process, Array
fl_nm = raw_input("enter filename: ").strip()
level = int(raw_input("resolution level: ").strip())
ds = yt.load(fl_nm)
all_data_level_x = ds.covering_grid(level=level,left_edge=[-3.70281620e+21,0.00000000e+00,-3.70281620e+21],dims=ds.domain_dimensions*2**level)
disp_array = []
for x in xrange(0,16*2**level):
    vbin = []
    for z in xrange(0,80*2**level):
        v = []
        for y in xrange(0,8*2**level):
            vel = all_data_level_x["velocity_magnitude"][x,y,z].in_units("km/s")
            v.append(vel)
        sigma = np.sqrt(np.sum((v - np.mean(v))**2) / np.size(v))
        vbin.append(sigma)
    disp_array.append(vbin)
    print "{0:.1f} %".format((x+1)*100/float(16*2**level))
da = np.array(disp_array)
print "fixed resolution array created"
def __main__(data_array):
    import matplotlib
    matplotlib.use('Agg')
    from matplotlib import pyplot as plt
    plt.imshow(data_array, origin = "lower", aspect = "equal", extent=[-1.2,10.8,-1.2,1.2])
    plt.colorbar(fraction=0.046, pad=0.04)
    print "plot created. Saving figure..."
    fig_nm = 'velocity_disp_{0}_lvl_{1}.png'.format(fl_nm[-4:],level)
    plt.savefig(fig_nm)
    plt.close()
    print "File saved as: " + fig_nm
    return
pool = multiprocessing.Pool(4)
pool.map(__main__,da)
pool.map(func, iterable[, chunksize]) iterates over da. So if da is a 2-D array like [[1,2],[3,4]], your __main__ function is called with [1,2] and [3,4], one row per call. Each call therefore receives a 1-D array, which imshow rejects as image data.
I'm not sure what you want to do, so if you want more complete help, you can upload a runnable version of your project (to GitHub or anywhere else) and I will take a look.
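The row-by-row behaviour is easy to see on a small array. A minimal sketch, using `multiprocessing.dummy.Pool` (a thread-backed pool with the same `map` interface, swapped in here so the snippet runs in a notebook without pickling concerns) and a hypothetical helper `shape_of`; wrapping the array in a one-element list is one way to hand the whole 2-D array to a single call:

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in for multiprocessing.Pool

import numpy as np


def shape_of(arr):
    """Report the shape of whatever a single map() call receives."""
    return np.asarray(arr).shape


da = np.arange(6).reshape(2, 3)   # small 2-D array standing in for the real data

with Pool(2) as pool:
    per_row = pool.map(shape_of, da)    # iterates over da: each call gets one 1-D row
    whole = pool.map(shape_of, [da])    # one-element list: the call gets the full 2-D array

print(per_row)  # [(3,), (3,)]
print(whole)    # [(2, 3)]
```

With a single array there is nothing to parallelise, so a pool only pays off when the iterable holds several independent 2-D arrays, one per figure.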
