I have the following code:
target_ZCR_mean = sample_dataframe_summary['ZCR'][1]
target_ZCR_std = sample_dataframe_summary['ZCR'][2]
lower_ZCR_lim = target_ZCR_mean - target_ZCR_std
upper_ZCR_lim = target_ZCR_mean + target_ZCR_std
target_RMS_mean = sample_dataframe_summary['RMS'][1]
target_RMS_std = sample_dataframe_summary['RMS'][2]
lower_RMS_lim = target_RMS_mean - target_RMS_std
upper_RMS_lim = target_RMS_mean + target_RMS_std
target_TEMPO_mean = sample_dataframe_summary['Tempo'][1]
target_TEMPO_std = sample_dataframe_summary['Tempo'][2]
lower_TEMPO_lim = target_TEMPO_mean - target_TEMPO_std
upper_TEMPO_lim = target_TEMPO_mean + target_TEMPO_std
target_BEAT_SPACING_mean = sample_dataframe_summary['Beat Spacing'][1]
target_BEAT_SPACING_std = sample_dataframe_summary['Beat Spacing'][2]
lower_BEAT_SPACING_lim = target_BEAT_SPACING_mean - target_BEAT_SPACING_std
upper_BEAT_SPACING_lim = target_BEAT_SPACING_mean + target_BEAT_SPACING_std
Each block of four lines is nearly identical to the others, differing only in a few characters.
Can I write a function, a class, or some other piece of code that wraps just a template of those four lines and specializes it at runtime, so that it does the work of the code above?
By the way, I use Python 3.6.
If you find yourself storing lots of related variables like this, there is almost certainly a better way to do it; modifying the source code at runtime is never the solution. One approach is to put the repeated logic in a function and store the resulting data in a namedtuple:
import collections
Data = collections.namedtuple('Data', 'mean, std, lower_lim, upper_lim')
def get_data(key, sample_dataframe_summary):
    mean = sample_dataframe_summary[key][1]
    std = sample_dataframe_summary[key][2]
    lower_lim = mean - std
    upper_lim = mean + std
    return Data(mean, std, lower_lim, upper_lim)
zcr = get_data('ZCR', sample_dataframe_summary)
rms = get_data('RMS', sample_dataframe_summary)
tempo = get_data('Tempo', sample_dataframe_summary)
beat_spacing = get_data('Beat Spacing', sample_dataframe_summary)
Then you can access the data with dot notation, like zcr.mean and tempo.upper_lim.
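If you want to go one step further, you could collect the four results in a dict keyed by column name, so no per-column variables are needed at all; a small sketch using the same get_data and the column names from your code:
stats = {key: get_data(key, sample_dataframe_summary)
         for key in ('ZCR', 'RMS', 'Tempo', 'Beat Spacing')}
print(stats['Tempo'].upper_lim)  # plays the role of upper_TEMPO_lim above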
Related
I'm trying to read data from a CSV and then process it in different ways (for starters, just the average).
Data
(OneDrive) https://1drv.ms/u/s!ArLDiUd-U5dtg0teQoKGguBA1qt9?e=6wlpko
The data looks like this:
ID; Property1; Property2; Property3...
1; ....
1; ...
1; ...
2; ...
2; ...
3; ...
...
Every line is a GPS point. All points with the same ID (for example 1) together form one route. The routes are not all the same length, and some IDs are skipped, so the IDs are not a seamless sequence of numbers.
I should add that the points are ALWAYS the same number of meters apart, and I don't currently need the XY information.
Wanted Result
In the end I want something like this:
[ID, AVG_Property1, AVG_Property2, ...]
[1, 1.00595, 2.9595, ...]
[2, 1.50606, 1.5959, ...]
What I got so far
import os
import numpy
import pandas as pd
data = pd.read_csv(os.path.join('C:\\data' ,'data.csv'), sep=';')
# [id, len, prop1, prop2, ...]
routes = numpy.zeros((data.size, 10)) # 10 properties
sums = numpy.zeros(8)
nr_of_entries = 0;
current_id = 1;
for index, row in data.iterrows():
    if(int(row['id']) != current_id): #after the last point of the route
        routes[current_id-1][0] = current_id;
        routes[current_id-1][1] = nr_of_entries; #how many points are in this route?
        routes[current_id-1][2] = sums[0] / nr_of_entries;
        routes[current_id-1][3] = sums[1] / nr_of_entries;
        routes[current_id-1][4] = sums[2] / nr_of_entries;
        routes[current_id-1][5] = sums[3] / nr_of_entries;
        routes[current_id-1][6] = sums[4] / nr_of_entries;
        routes[current_id-1][7] = sums[5] / nr_of_entries;
        routes[current_id-1][8] = sums[6] / nr_of_entries;
        routes[current_id-1][9] = sums[7] / nr_of_entries;
        current_id = int(row['id']);
        sums = numpy.zeros(8)
        nr_of_entries = 0;
    sums[0] += row[3];
    sums[1] += row[4];
    sums[2] += row[5];
    sums[3] += row[6];
    sums[4] += row[7];
    sums[5] += row[8];
    sums[6] += row[9];
    sums[7] += row[10];
    nr_of_entries = nr_of_entries + 1;
routes
My problem
1.) The way I did it, I have to copy-paste the same code for every other processing approach since, as stated, I need several different ones; average is just an example.
2.) The reading of the data is clumsy and fails when IDs are missing.
3.) I'm a C# developer, so my approach would be to create a class 'Route' which holds all the points and provides methods like 'calculate average for prop 1'. That way I could also tweak the data if needed (extreme values, for example). But I have no idea how this would be done in Python, or whether it is a reasonable approach in this language.
4.) Is there a more elegant way to iterate through the original CSV, getting route ID 1, then route ID 2, and so on? Maybe something like LINQ queries in C#?
Thanks for any help.
Here is a solution and some ideas you can use. The example features multiple options for the same issue, so you have to choose which fits your purpose best. Also, it is Python 3.7; you didn't specify a version, so I hope this works.
class Route(object):
    """description of class"""

    def __init__(self, id, rawdata): # on startup
        self.id = id
        self.rawdata = rawdata
        self.avg_Prop1 = self.calculate_average('Prop1')
        self.sum_Prop4 = None

    def calculate_average(self, Prop_Name): # self reference is the first argument of a class method
        return self.rawdata[Prop_Name].mean()

    def give_Prop_data(self, Prop_Name): # return the Prop data as a list
        return self.rawdata[Prop_Name].tolist()

    def any_function(self, my_function, Prop_Name): # not sure what dataframes support, so turning it into a list first
        return my_function(self.rawdata[Prop_Name].tolist())
#end of class definition
data = pd.read_csv('testdata.csv', sep=';')
# [id, len, prop1, prop2, ...]
route_list = [] #List of all the objects created from the route class
for i in data.id.unique():
    print('Current id:', i, ' with ', len(data[data['id']==i]), 'entries')
    route_list.append(Route(i, data[data['id']==i]))
#created the Prop1 average in initialization of route so just accessing attribute
print(route_list[1].avg_Prop1)
for current_route in route_list:
    print('Route ', current_route.id, ' Properties :')
    for i in current_route.rawdata.columns[1:]: #for all except the first (id)
        print(i, ' has average ', current_route.calculate_average(i)) #i is the string of the column, not just an id
#or pass any function that you want
route_list[1].sum_Prop4 = (route_list[1].any_function(sum,'Prop4'))
print(route_list[1].sum_Prop4)
#which is equivalent to
print(sum(route_list[1].rawdata['Prop4']))
To address your individual problems, out of order:
For 2. and 4.): looping only over the existing IDs (data.id.unique()) solves the problem. I don't know LINQ queries myself, but I assume they are similar. In general, Python has a great way of looping over objects (like for current_route in route_list), which is worth looking into if you want to use it a little more.
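For example, pandas can hand you one route at a time directly; this sketch, using the same data frame as above, is probably the closest thing to a LINQ group-by:
for route_id, route_df in data.groupby('id'):  # route_df holds only the rows of this route
    print('Route', route_id, 'has', len(route_df), 'points')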
For 1. and 3.): again, looping solves the issue. I created a class in the example mostly to show the syntax for classes. The benefits and drawbacks of using classes should be the same in Python as in C#.
As it stands, the class probably isn't great, but this depends on how you want to use it. If the class should just be a practical way of storing and accessing data, it shouldn't have the methods, because you don't need an individual average method for each route; you can just access its data and use it in a function, as in sum(route_list[1].rawdata['Prop4']). If, however, different calculations are necessary depending on the data (the number of rows, for example), it might come in handy to use the calculate_average method and differentiate inside it.
Another example is the use of the attributes: if you need the average of Prop1 every time, creating it at initialization seems a good idea; otherwise I wouldn't bother always calculating it.
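One more aside: if all you ultimately need is the table of per-route averages from your 'Wanted Result', a single pandas call may already do it. A sketch, assuming every column except id is a property:
route_averages = data.groupby('id').mean()               # one row per route, each property averaged
route_averages['n_points'] = data.groupby('id').size()   # number of points in each route
print(route_averages.reset_index())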
I hope this helps!
I want to make a function that builds a CompositeModel by adding together a varying number of GaussianModel instances.
So originally I have:
gauss1 = models.GaussianModel(prefix='g1_')
pars = gauss1.make_params(center=_259V[0][0], amplitude=_259V[1][0])
gauss2 = models.GaussianModel(prefix='g2_')
pars.update(gauss2.make_params(center=_259V[0][1], amplitude=_259V[1][1]))
gauss3 = models.GaussianModel(prefix='g3_')
pars.update(gauss3.make_params(center=_259V[0][2], amplitude=_259V[1][2]))
gauss4 = models.GaussianModel(prefix='g4_')
pars.update(gauss4.make_params(center=_259V[0][3], amplitude=_259V[1][3]))
gauss5 = models.GaussianModel(prefix='g5_')
pars.update(gauss5.make_params(center=_259V[0][4], amplitude=_259V[1][4]))
gauss6 = models.GaussianModel(prefix='g6_')
pars.update(gauss6.make_params(center=_259V[0][5], amplitude=_259V[1][5]))
mod = gauss1 + gauss2 + gauss3 + gauss4 + gauss5 + gauss6
This gives me a model made up of six Gaussian functions, but I want to generalize it for a smaller or larger number of functions. So far I have the following, which builds a list of GaussianModels (Gausslist) whose length depends on peak_data, so the number of peaks determines how many Gaussians I get:
Gausslist = []
Gausslist.append(models.GaussianModel(prefix='g0_'))
pars = Gausslist[0].make_params(center=_259V[0][0], amplitude=peak_data[1][0])
for i in range(1, len(peak_data[1])):
    Gausslist.append(models.GaussianModel(prefix='g{}_'.format(i)))
    pars.update(Gausslist[i].make_params(center=_259V[0][i], amplitude=peak_data[1][i]))
But I don't know how to tackle:
mod = gauss1 + gauss2 + gauss3 + gauss4 + gauss5 + gauss6
I tried summing the Gausslist which replaces gauss1 to gauss6. I've also tried just using the Gausslist in place of mod but that does not work.
I essentially want to add these GaussianModels together to form a CompositeModel, but I don't know how to add model instances, or whether that's even possible.
Try something like this:
Gausslist = []
model, params = None, None
for i, peak in enumerate(peak_data[1]):
    comp = models.GaussianModel(prefix='g{}_'.format(i))
    pars = comp.make_params(center=_259V[0][i], amplitude=peak)  # starting values as in your code
    Gausslist.append(comp)  # keep the individual models around in case you need them
    if model is None:
        model = comp
        params = pars
    else:
        model += comp
        params.update(pars)
Now you should be ready to use model with however many peaks are in your peak_data[1] sequence.
FWIW, I would probably recommend storing both x and y values for your peak data, so that you can do something like:
pars = comp.make_params(center=peak[0], amplitude=peak[1], sigma=1)
as that will probably give better starting values.
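Once model and params are built this way, fitting proceeds exactly as with the hand-written six-Gaussian version; a sketch, where x and y stand for your data arrays (names assumed here):
result = model.fit(y, params, x=x)  # fit the composite model to y measured at x
print(result.fit_report())          # reports parameters per component, with the g0_, g1_, ... prefixes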
You can use functools.reduce (reduce is no longer a built-in in Python 3):
from functools import reduce

model = reduce(lambda m1, m2: m1 + m2, Gausslist)
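operator.add spares you the lambda, and since the models support +, Python's built-in sum with an explicit start value should work too (a sketch, not tested against lmfit):
import operator

model = reduce(operator.add, Gausslist)
# or, without reduce at all: start from the first model and add the rest
model = sum(Gausslist[1:], Gausslist[0])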
I am trying to create arrays of fixed size within a while loop. Since I do not know how many arrays I have to create, I am initializing them inside the loop. The problem I am facing is with the array declaration: I would like the name of each array to end with the index of the while loop, so that it is useful for my later calculations. I do not expect an easy way out, but it would be great if someone could point me in the right direction.
I tried using arrayname + str(i), which returns the error 'can't assign to operator'.
#parse through the Load vector sheet to load the values of the stress vector into the dataframe
Loadvector = x2.parse('Load_vector')
Lvec_rows = len(Loadvector.index)
Lvec_cols = len(Loadvector.columns)
i = 0
while i < Lvec_cols:
    y_values + str(i) = np.zeros(Lvec_rows)
    i = i + 1
I expect arrays with names arrayname1, arrayname2 ... to be created.
I think the title is somewhat misleading.
An easy way to do this would be to use a dictionary:
dict_of_array = {}
i = 0
while i < Lvec_cols:
    dict_of_array['y_values' + str(i)] = np.zeros(Lvec_rows)
    i = i + 1
and you can access, say, the first array via dict_of_array['y_values0'].
If you really want to create a batch of named variables, exec can do it (though it is generally discouraged):
i = 0
while i < Lvec_cols:
    exec('y_values{} = np.zeros(Lvec_rows)'.format(i))
    i = i + 1
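A word of caution: exec works, but variables created this way are awkward to use in later code. Since all your arrays have the same length, another option worth considering is a single 2-D numpy array with one row per would-be variable; a sketch using the names from your code:
y_values = np.zeros((Lvec_cols, Lvec_rows))  # row i plays the role of "y_values<i>"
first_array = y_values[0]                    # instead of a separate variable y_values0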
I am writing a scientific code in Python to calculate the energy of a system.
Here is my function: cte, cte2, cte3 and cte4 are constants computed beforehand, as are t and pii (np.pi, stored in a variable since looking it up inside the loop slows it down). I calculate the 3 components of the total energy, then sum them up.
def calc_energy(diam):
    Energy1 = cte2*((pii*diam**2/4)*t)
    Energy2 = cte4*(pii*diam)*t
    d = diam/t
    u = np.sqrt((d)**2/(1+d**2))
    cc = u**2
    E = sp.special.ellipe(cc)
    K = sp.special.ellipk(cc)
    Id = cte3*d*(d**2+(1-d**2)*E/u-K/u)
    Energy3 = cte*t**3*Id
    total_energy = Energy1+Energy2+Energy3
    return (total_energy, Energy1)
My first idea was to simply loop over all values of the diameter :
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9 #Diametre
diametres = np.arange(start_diam, stop_diam, step_diam)
totalEnergy, Energy1 = [], []  # result lists
for d in diametres:
    res1, res2 = calc_energy(d)
    totalEnergy.append(res1)
    Energy1.append(res2)
In an attempt to speed up calculations, I decided to use numpy to vectorize, as shown below :
diams = diametres.reshape(-1,1) #If not reshaped, calculations won't run
r1 = np.apply_along_axis(calc_energy,1,diams)
However, the "vectorized" solution does not work properly: timing both, I get 5 seconds for the first solution and 18 seconds for the second one.
I guess I'm doing something the wrong way but can't figure out what.
With your current approach, you're applying a Python function to each element of your array, which carries additional overhead. Instead, you can pass the whole array to your function and get an array of answers back. Your existing function appears to work fine without any modification.
import numpy as np
from scipy import special
cte = 2
cte1 = 2
cte2 = 2
cte3 = 2
cte4 = 2
pii = np.pi
t = 2
def calc_energy(diam):
    Energy1 = cte2*((pii*diam**2/4)*t)
    Energy2 = cte4*(pii*diam)*t
    d = diam/t
    u = np.sqrt((d)**2/(1+d**2))
    cc = u**2
    E = special.ellipe(cc)
    K = special.ellipk(cc)
    Id = cte3*d*(d**2+(1-d**2)*E/u-K/u)
    Energy3 = cte*t**3*Id
    total_energy = Energy1+Energy2+Energy3
    return (total_energy, Energy1)
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9 #Diametre
diametres = np.arange(start_diam,stop_diam,step_diam)
a = calc_energy(diametres) # Pass the whole array
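If you want to verify the speed difference yourself, a minimal timing sketch using the definitions above could look like this:
import time

t0 = time.perf_counter()
for d in diametres:      # element-by-element Python loop
    calc_energy(d)
t1 = time.perf_counter()
calc_energy(diametres)   # one vectorized call over the whole array
t2 = time.perf_counter()
print('loop: %.2f s, vectorized: %.2f s' % (t1 - t0, t2 - t1))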
I want to use weave.blitz to improve the performance of the following numpy code:
def fastIteration(self):
    g = self.grid
    nx, ny = g.ux.shape
    uxold = g.old_ux
    ux = g.ux
    ux[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])
    g.setBC()
    g.old_ux = ux.copy()
In this code, g is the computational grid, which consists of two fields: ux and old_ux, the latter simply used for temporary storage. In the complete code, around 95% of the runtime is spent in the fastIteration method, so even a small performance gain would significantly reduce the hours spent executing this code.
The output of the numpy method looks as expected (the plot from the original post is omitted here).
As this code is my bottleneck, I want to improve the speed by using weave.blitz. That method looks like:
def blitzIteration(self):
    ### does not work correctly so far
    g = self.grid
    nx, ny = g.ux.shape
    uxold = g.old_ux
    ux = g.ux
    expr = "ux[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])"
    weave.blitz(expr, check_size=0)
    g.setBC()
    g.old_ux = ux.copy()
However, this does not produce the correct output (again, the comparison plot is omitted here).
It looks like a bug in weave.blitz (I reproduced it, filed it, and it has since been fixed; there is more information about the actual bug in the report).
I thought it was odd to write 0: instead of the shorter : to get a full slice, so I replaced all those slices and voilà, it worked.
I don't really know where the bug lies, but the expr_code generated by weave.blitz is slightly different:
When using 0:
ipdb> expr_code
'ux_blitz_buggy(blitz::Range(0,_end),blitz::Range(1,Nux_blitz_buggy(1)-1-1))=uxold(blitz::Range(0,_end),blitz::Range(1,Nuxold(1)-1-1))+ReI*(uxold(blitz::Range(0,_end),blitz::Range(2,_end))-2*uxold(blitz::Range(0,_end),blitz::Range(1,Nuxold(1)-1-1))+uxold(blitz::Range(0,_end),blitz::Range(0,Nuxold(1)-2-1)));\n'
When using :
ipdb> expr_code
'ux_blitz_not_buggy(_all,blitz::Range(1,Nux_blitz_not_buggy(1)-1-1))=uxold(_all,blitz::Range(1,Nuxold(1)-1-1))+ReI*(uxold(_all,blitz::Range(2,_end))-2*uxold(_all,blitz::Range(1,Nuxold(1)-1-1))+uxold(_all,blitz::Range(0,Nuxold(1)-2-1)));\n'
So, blitz::Range(0,_end) becomes _all and they behave in a different way.
For convenience, here is a complete script that reproduces the problem and will only succeed while the problem exists.
import numpy as np
from scipy.weave import blitz
def test_blitz_bug(N=4):
    ReI = 1.2
    ux_blitz_buggy, ux_blitz_not_buggy, ux_np = np.zeros((N, N)), np.zeros((N, N)), np.zeros((N, N))
    uxold = np.random.randn(N, N)
    ux_np[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])
    expr_buggy = 'ux_blitz_buggy[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])'
    expr_not_buggy = 'ux_blitz_not_buggy[:,1:-1] = uxold[:,1:-1] + ReI* (uxold[:,2:] - 2*uxold[:,1:-1] + uxold[:,0:-2])'
    blitz(expr_buggy)
    blitz(expr_not_buggy)
    assert not np.allclose(ux_blitz_buggy, ux_np)
    assert np.allclose(ux_blitz_not_buggy, ux_np)

if __name__ == '__main__':
    test_blitz_bug()