Python - masking in a for loop?

I have three arrays, r_vals, Tgas_vals, and n_vals. They are all numpy arrays of shape (9998,). The arrays have repeated values, and I want to iterate over the unique values of r_vals and find the corresponding values of Tgas_vals and n_vals, so I can use the last two arrays to calculate the weighted average. This is what I have right now:
def calc_weighted_average (r_vals,Tgas_vals,n_vals):
    for r in r_vals:
        mask = r == r_vals
        count = 0
        count += 1
        for t in Tgas_vals[mask]:
            print (count, np.average(Tgas_vals[mask]*n_vals[mask]))
weighted_average = calc_weighted_average (r_vals,Tgas_vals,n_vals)
The problem I am running into is that the function is only looping through once. Did I implement mask incorrectly, or is the problem somewhere else in the for loop?

I'm not sure exactly what you plan to do with all the averages, so I'll toss this out there and see if it's helpful. The following code calculates a bunch of weighted averages, one per unique value of r_vals, and stores them in a dictionary (which is then printed out).
import numpy as np

def calc_weighted_average (r_vals, z_vals, Tgas_vals, n_vals):
    weighted_vals = {} # new variable to store rval => weighted ave.
    for r in np.unique(r_vals): # loop over each distinct r only once
        mask = r_vals == r # same result as r == r_vals
        weighted_vals[r] = np.average(Tgas_vals[mask]*n_vals[mask])
    return weighted_vals
weighted_averages = calc_weighted_average (r_vals, z_vals, Tgas_vals, n_vals)
for rval in weighted_averages:
    print ('%i : %0.4f' % (rval, weighted_averages[rval])) # assuming rval is integer
Alternatively, you may want to factor "z_vals" in somehow. Your question was not clear on this.
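If the goal is a density-weighted mean temperature per radius, note that np.average(Tgas_vals[mask]*n_vals[mask]) takes the plain mean of the products rather than dividing by the sum of the weights. A minimal sketch using the weights argument of np.average, assuming n_vals is meant as the weight (the function name and the sample arrays below are made up for illustration):

```python
import numpy as np

def weighted_average_by_r(r_vals, Tgas_vals, n_vals):
    """Return {r: sum(T * n) / sum(n)} for each unique value r in r_vals."""
    result = {}
    for r in np.unique(r_vals):
        mask = r_vals == r
        # weights= makes np.average divide by the sum of the weights
        result[r] = np.average(Tgas_vals[mask], weights=n_vals[mask])
    return result

# Hypothetical sample data, not taken from the question
r_demo = np.array([1.0, 1.0, 2.0])
T_demo = np.array([10.0, 20.0, 30.0])
n_demo = np.array([1.0, 3.0, 2.0])
averages = weighted_average_by_r(r_demo, T_demo, n_demo)
```

For r = 1.0 this gives (10*1 + 20*3) / (1 + 3), whereas the unweighted mean of the products would give (10 + 60) / 2.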

Related

Create arrays of fixed size within a while loop in python

I am trying to create arrays of fixed size within a while loop. Since I do not know how many arrays I have to create, I am using a loop to initiate them within a while loop. The problem I am facing is with the array declaration. I would like the name of each array to end with the index of the while loop, so it will be useful later for my calculations. I do not expect to find an easy way out, but it would be great if someone could point me in the right direction.
I tried using arrayname + str(i). This returns the error 'Can't assign to operator'.
#parse through the Load vector sheet to load the values of the stress vector into the dataframe
Loadvector = x2.parse('Load_vector')
Lvec_rows = len(Loadvector.index)
Lvec_cols = len(Loadvector.columns)
i = 0
while i < Lvec_cols:
    y_values + str(i) = np.zeros(Lvec_rows)
    i = i +1
I expect arrays with names arrayname1, arrayname2 ... to be created.
I think the title is somewhat misleading.
An easy way to do this would be using a dictionary, with the name built as a string key:
dict_of_array = {}
i = 0
while i < Lvec_cols:
    dict_of_array['y_values' + str(i)] = np.zeros(Lvec_rows)
    i = i + 1
and you can access the array named y_values1 with dict_of_array['y_values1'].
If you really want to create a batch of separately named arrays, try:
i = 0
while i < Lvec_cols:
    exec('{}{} = np.zeros(Lvec_rows)'.format('y_values', i))
    i = i + 1
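For comparison, here is a minimal sketch of two alternatives that avoid both the manual counter and exec: a dict comprehension, and a single 2-D array with one column per load vector. The sizes below are placeholders, since the real Lvec_rows and Lvec_cols come from the spreadsheet in the question.

```python
import numpy as np

# Placeholder sizes; in the question these come from the parsed sheet
Lvec_rows, Lvec_cols = 4, 3

# One zero-filled array per column, keyed by a generated name
arrays_by_name = {f"y_values{i}": np.zeros(Lvec_rows) for i in range(Lvec_cols)}

# Alternatively, a single 2-D array where column i plays the role of y_values<i>
y_matrix = np.zeros((Lvec_rows, Lvec_cols))
y_matrix[:, 1] = 5.0  # fill the column that would have been "y_values1"
```

The 2-D form is usually easier to use in later calculations, because the column index replaces the name suffix.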

How to efficiently mutate certain num of values in an array?

Given an initial 2-D array:
initial = [
[0.6711999773979187, 0.1949000060558319],
[-0.09300000220537186, 0.310699999332428],
[-0.03889999911189079, 0.2736999988555908],
[-0.6984000205993652, 0.6407999992370605],
[-0.43619999289512634, 0.5810999870300293],
[0.2825999855995178, 0.21310000121593475],
[0.5551999807357788, -0.18289999663829803],
[0.3447999954223633, 0.2071000039577484],
[-0.1995999962091446, -0.5139999985694885],
[-0.24400000274181366, 0.3154999911785126]]
The goal is to multiply some random values inside the array by a random percentage. Let's say only 3 random numbers get replaced by a random multiplier; we should get something like this:
output = [
[0.6711999773979187, 0.52],
[-0.09300000220537186, 0.310699999332428],
[-0.03889999911189079, 0.2736999988555908],
[-0.6984000205993652, 0.6407999992370605],
[-0.43619999289512634, 0.5810999870300293],
[0.84, 0.21310000121593475],
[0.5551999807357788, -0.18289999663829803],
[0.3447999954223633, 0.2071000039577484],
[-0.1995999962091446, 0.21],
[-0.24400000274181366, 0.3154999911785126]]
I've tried doing this:
def mutate(array2d, num_changes):
    for _ in range(num_changes):
        row, col = array2d.shape
        rand_row = np.random.randint(row)
        rand_col = np.random.randint(col)
        cell_value = array2d[rand_row][rand_col]
        array2d[rand_row][rand_col] = random.uniform(0, 1) * cell_value
    return array2d
That works for 2D arrays, but there's a chance that the same value is mutated more than once =(
I also don't think it's efficient, and it only works on 2D arrays.
Is there a way to do such a "mutation" for an array of any shape, and more efficiently?
There's no restriction on which values the "mutation" can choose, but the number of "mutations" should be kept strictly to the user-specified number.
One fairly simple way would be to work with a raveled view of the array. You can generate all your numbers at once that way, and make it easier to guarantee that you won't process the same index twice in one call:
def mutate(array_anyd, num_changes):
    raveled = array_anyd.reshape(-1)
    indices = np.random.choice(raveled.size, size=num_changes, replace=False)
    values = np.random.uniform(0, 1, size=num_changes)
    raveled[indices] *= values
I use array_anyd.reshape(-1) in favor of array_anyd.ravel() because according to the docs, the former is less likely to make an inadvertent copy.
There is of course still such a possibility. You can add an extra check to write back if you need to. A more efficient way would be to use np.unravel_index to avoid creating a view to begin with:
def mutate(array_anyd, num_changes):
    indices = np.random.choice(array_anyd.size, size=num_changes, replace=False)
    indices = np.unravel_index(indices, array_anyd.shape)
    values = np.random.uniform(0, 1, size=num_changes)
    array_anyd[indices] *= values
There is no need to return anything because the modification is done in-place. Conventionally, such functions do not return anything. See for example list.sort vs sorted.
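To make the in-place behaviour concrete, here is a small self-contained sketch of the same idea applied to a 3-D array. It uses np.random.default_rng with a seed so the run is repeatable; the function name mutate_in_place, the rng parameter, and the demo array are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def mutate_in_place(arr, num_changes, rng=None):
    """Scale exactly num_changes distinct elements of arr by a factor in [0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    # Distinct flat positions, so no element is picked twice
    flat_idx = rng.choice(arr.size, size=num_changes, replace=False)
    # Convert flat positions into a tuple of per-axis index arrays
    idx = np.unravel_index(flat_idx, arr.shape)
    arr[idx] *= rng.uniform(0.0, 1.0, size=num_changes)

demo = np.arange(1.0, 25.0).reshape(2, 3, 4)  # made-up 3-D float array
before = demo.copy()
mutate_in_place(demo, 5, rng=np.random.default_rng(0))
num_changed = int(np.count_nonzero(demo != before))
```

Comparing against a copy taken beforehand is a quick way to check that the requested number of elements, and no more, were modified.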
Using shuffle instead of np.random.choice, this would be a different solution. It works on an array of any shape.
def mutate(arrayIn, num_changes):
    mult = np.zeros(arrayIn.ravel().shape[0])
    mult[:num_changes] = np.random.uniform(0,1,num_changes)
    np.random.shuffle(mult)
    mult = mult.reshape(arrayIn.shape)
    # adds a random fraction u of each chosen value to itself, i.e. scales it by (1 + u)
    arrayIn = arrayIn + mult*arrayIn
    return arrayIn  # a new array; the caller's array is not modified in place

Sum of elements of numpy array not same as total

I'm trying to count the number of pairs and save them in two different histograms. One saves the pairs in an array split by parent object, and the other just saves the total. That means I have a loop that looks like this:
for k in range(N_parents):
    pair_hist[k, bin] +=1
    total_pair_hist[bin] +=1
where both pair_hist and total_pair_hist are defined as
pair_hist = np.zeros((N_parents, bins.shape[0]), dtype = np.uint64)
total_pair_hist = np.zeros(bins.shape[0], dtype = np.uint64)
I'd expect that summing the elements of pair_hist across all parents (axis=0), I'd get the total histogram. The funny thing is, if I take the sum of pair_hist:
onehalo_sum_ind = np.sum(pair_hist, axis = 0)
I don't get exactly total_pair_hist, but something slightly different:
total_pair_hist = [ 287248245 448773033 695820015 1070797576 1634146741 2466680801
3667159080 5334307986 7524739978 10206208064 13237161068 16466436715
19231751113 20949333183 21254336387 19497450101 16459529579 13038604111
9783826702 7006904025 4813946458 3207605915 2097437543 1355158303
869077173 555036759 353732683 225171870 143179912 0]
pair_hist = [ 287267022 448887401 696415932 1073435699 1644677789 2503693266
3784008845 5665555755 8380564635 12201977310 17382403650 23929909625
31103373709 36859534246 38146287402 33454446858 25689430007 18142721164
12224099624 8035266046 5211441720 3353187036 2147027818 1370663213
873519714 556182465 353995293 225224668 143189173 0]
Any idea of what's going on? Thank you in advance :)
Sorry for the late reply, but I didn't have time to work on it before. The problem was caused by numba. I was using it with the parallel=True flag to parallelise one of the loops and that caused the error.
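The answer does not say exactly how the parallel loop lost consistency, so the following is only a sketch of one defensive pattern: build the per-parent histogram alone and derive the total from it afterwards, so the two can never disagree. It is plain NumPy with made-up sample data (parent_of_pair and bin_of_pair are hypothetical index arrays), not the numba code from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_parents, n_bins, n_pairs = 4, 6, 1000

# Hypothetical data: the parent and the bin that each counted pair belongs to
parent_of_pair = rng.integers(0, n_parents, size=n_pairs)
bin_of_pair = rng.integers(0, n_bins, size=n_pairs)

# Count every (parent, bin) combination through a flattened index
flat = parent_of_pair * n_bins + bin_of_pair
pair_hist = np.bincount(flat, minlength=n_parents * n_bins).reshape(n_parents, n_bins)

# The total is derived from pair_hist instead of being accumulated separately
total_pair_hist = pair_hist.sum(axis=0)
```

A check such as np.array_equal(pair_hist.sum(axis=0), total_pair_hist) on separately accumulated arrays is also a cheap way to detect this kind of discrepancy early.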

Multiple Notes Simultaneously - Overflow Error

I have successfully generated .wav files in python of sine waves at different frequencies.
If I wish to generate harmonies, for example a C major triad, am I supposed to add the sine waves of the individual notes together?
When adding 2 notes together, say C and G, the program creates the correct harmony. When I attempt to add a third note, though, there is an overflow error. How might this be successfully accomplished?
The Code:
I am placing the data for the sine waves into an array of signed short integers.
wave = array.array('h')
And then adding multiple waves together to generate the harmonies.
for i in range(len(data)):
    wave1[i] += wave2[i]
This works!
But when I add a third array, (wave3), it overflows.
This is because the signed short integer has reached its maximum. I am working with a 16 bit rate. Is the problem simply that the bit rate is too low? When creating complex audio with lots of harmonies, does the bit rate simply need to be much higher? Have I approached the problem in the absolute wrong direction?
Full Source
I don't think it's the bit rate. You just have to normalize the values so that they fit properly. I've rewritten your code a bit so it uses lists at first, then rescales all the values to the range 0 to 32767, taking the volume into account, and puts the result into the array.
import array
import winsound

# create_data, getTime, getFreq, write_wave, VOLUME and SAMPLE_RATE come from the asker's full source
def normalize(nmin, nmax, nums): #this could probably be done a bit shorter
    orange = max(nums)-min(nums) # original range
    nrange = nmax-nmin # new range
    nums = [float(num)/orange*nrange for num in nums]
    omin = min(nums)
    nums = [num-omin+nmin for num in nums]
    return nums
if __name__ == '__main__':
    data, data2, data3 = [], [], []
    data.extend(create_data(getTime(1), getFreq('C', 4)))
    data2.extend(create_data(getTime(1), getFreq('Bb', 4)))
    data3.extend(create_data(getTime(1), getFreq('G', 4)))
    for i in range(len(data)):
        data[i] += data2[i]
        data[i] += data3[i]
    data = array.array('h', [int(val) for val in normalize(0, 32767*VOLUME/100, data)])
    write_wave(data, int(len(data)/SAMPLE_RATE))
    winsound.PlaySound('output.wav', winsound.SND_FILENAME)
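The same idea, summing in floating point and rescaling only once before converting to 16-bit integers, can be sketched with NumPy. This is an illustration under assumptions rather than a drop-in replacement: the sample rate, the note frequencies, and the helper names sine and mix_to_int16 are all chosen here and do not come from the original source.

```python
import numpy as np

SAMPLE_RATE = 44100  # assumed sample rate in Hz

def sine(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Return a float sine wave with amplitude 1.0."""
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)

def mix_to_int16(waves, volume=0.8):
    """Sum float waves, scale the peak to `volume`, then convert to int16."""
    mixed = np.sum(waves, axis=0)      # summing floats cannot overflow here
    peak = np.max(np.abs(mixed))
    if peak > 0:
        mixed = mixed / peak           # now within [-1.0, 1.0]
    return np.round(mixed * volume * 32767).astype(np.int16)

# Approximate frequencies for C4, E4 and G4 (a C major triad)
chord = mix_to_int16([sine(261.63, 1.0), sine(329.63, 1.0), sine(392.00, 1.0)])
```

Scaling symmetrically around zero keeps the signal centred, and because the peak is normalized after mixing, adding more notes does not push the samples past the int16 limit.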

Python, complex looping calculations with lists or arrays

I am converting old pseudo-Fortran code into python and am struggling to create a framework within which I can perform some complex iterative calculations.
As a beginner, my first instinct is to use lists as I find them easier to work with, but I understand that arrays would probably be a more suitable method.
I already have all the input channels as lists and am hoping for a good explanation of how to set up loops for such calculations.
This is an example of the pseudo-Fortran I am replicating. Each (t) indicates a 'time-series channel' that I currently have stored as lists (i.e. ECART2(t) and NNNN(t) are lists). All lists have the same number of entries.
do while ( ecart2(t) > 0.0002 .and. nnnn(t) < 2000. ) ;
mmm(t)=nnnn(t)+1.;
if YRPVBPO(t).ge.0.1 .and. YRPVBPO(t).le.0.999930338 .and. YAEVBPO(t).ge.0.000015 .and. YAEVBPO(t).le.0.000615 then do;
YM5(t) = customFunction(YRPVBPO,YAEVBPO);*
end;
YUEVBO(t) = YU0VBO(t) * YM5(t) ;*m/s
YHEVBO(t) = YCPEVBO(t)*TPO_TGETO1(t)+0.5*YUEVBO(t)*YUEVBO(t);*J/kg
YAVBO(t) = ddnn2(t)*(YUEVBO(t)**2);*
YDVBO(t) = YCPEVBO(t)**2 + 4*YHEVBO(t)*YAVBO(t) ;*
YTSVBPO(t) = (sqrt(YDVBO(t))-YCPEVBO(t))/2./YAVBO(t);*K
YUSVBO(t) = ddnn(t)*YUEVBO(t)*YTSVBPO(t);*m/s
YM7(t) = YUSVBO(t)/YU0VBO(t);*
YPHSVBPOtot(t) = (YPHEVBPO(t) - YPDHVBPO(t))/(1.+((YGAMAEVBO(t)-1)/2)*(YM7(t)**2))**(YGAMAEVBO(t)/(1-YGAMAEVBO(t)));*bar
YPHEVBPOtot(t) = YPHEVBPO(t) / (1.+rss0(t)*YM5(t)*YM5(t))**rss1(t);*bar
YDPVBPOtot(t) = YPHEVBPOtot(t) - YPHSVBPOtot(t) ;*bar
iter(t) = (YPHEVBPOtot(t) - YDPVBPOtot(t))/YPHEVBPOtot(t);*
ecart2(t)= ABS(iter(t)-YRPVBPO(t));*
aa(t)=YRPVBPO(t)+0.0001;
YRPVBPO(t)=aa(t);*
nnnn(t)=mmm(t);*
end;
Understanding the pseudo-Fortran: with 'time-series data' there is an implicit loop iterating through the individual values in each list, as well as a loop over each of those values until the conditions are met.
It will carry out the loop calculations on the first list values until the conditions are met. It then moves on to the second value in the lists and performs the same looping calculations until the conditions are met...
ECART2 = [2,0,3,5,3,4]
NNNN = [6,7,5,8,6,7]
do while ( ecart2(t) > 0.0002 .and. nnnn(t) < 2000. )
MMM = NNNN + 1
This looks at the first values in each list (2 and 6). Because the conditions are met, subsequent calculations are performed on the first values in the new lists, such as MMM = [6+1,...]
Once the rest of the calculations have been performed (looping multiple times while the conditions are still met), only then does the second value in every list get considered. The second values (0 and 7) do not meet the conditions and therefore the second entry for MMM is 0.
MMM=[6+1, 0...]
Because 0 must be entered if the conditions are not met, I am considering setting up all the 'new lists' in advance and populating them with 0s.
NB: 'customFunction()' is a separate function that is called, returning a value from two input values
MY CURRENT SOLUTION
First, set up all the empty lists:
nPts = len(ECART2)
MMM = [0]*nPts
YM5 = [0]*nPts
etc...
Then start performing the calculations:
for i in range(nPts):
    while (ECART2[i] > 0.0002) and (NNNN[i] < 2000):
        MMM[i] = NNNN[i]+1
        if YRPVBPO[i]>=0.1 and YRPVBPO[i]<=0.999930338 and YAEVBPO[i]>=0.000015 and YAEVBPO[i]<=0.000615:
            YM5[i] = MACH_LBP_DIA30(YRPVBPO[i],YAEVBPO[i])
        YUEVBO[i] = YU0VBO[i]*YM5[i]
        YHEVBO[i] = YCPEVBO[i]*TGETO1[i] + 0.5*YUEVBO[i]**2
        YAVBO[i] = DDNN2[i]*YUEVBO[i]**2
        YDVBO[i] = YCPEVBO[i]**2 + 4*YHEVBO[i]*YAVBO[i]
etc etc...
But I'm guessing that there are better ways of doing this, such as the suggestion to use numpy arrays (something I plan on learning in the near future).
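One way to keep this structure manageable is to move the inner "repeat until converged" loop into a function that handles a single sample, and then apply it to each position of the channels. The sketch below shows only that skeleton: the halve function is a made-up stand-in for the real chain of calculations, and iterate_sample is a hypothetical helper name.

```python
def iterate_sample(ecart, n, step, tol=0.0002, max_iter=2000):
    """Repeat step() for one sample while ecart > tol and n < max_iter."""
    while ecart > tol and n < max_iter:
        ecart = step(ecart)  # the real code would recompute ecart2 here
        n += 1               # plays the role of nnnn(t) = mmm(t)
    return ecart, n

def halve(ecart):
    # Made-up stand-in for the physics; it only ensures the loop converges
    return ecart / 2

ECART2 = [2, 0, 3, 5, 3, 4]
NNNN = [6, 7, 5, 8, 6, 7]

# Outer loop over samples; the inner while loop lives in iterate_sample
results = [iterate_sample(e, n, halve) for e, n in zip(ECART2, NNNN)]
```

Samples that already fail the condition, such as the second one (0 and 7), pass through unchanged, which mirrors the behaviour described for the pseudo-Fortran.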
