Python - appending to multiple arrays

If I wanted to perform something like Levene's test of equal variances via scipy stats, which produces two outputs (the test statistic and p-value) for all the data in a dictionary, how would I append the outputs for each test to two different lists? I tried the code below:
test_stat = []
p_value = []
for i in range(0, n_data):
    for j in range(1, n_name):
        test_stat[i], p_value[i] = scipy.stats.levene(data[i][name[j-1]],
                                                      data[i][name[j]],
                                                      center='median')
But this clearly isn't the way to go about it, as I keep getting an IndexError: list assignment index out of range.
Any suggestions would be greatly appreciated. Thanks!

Not everything needs to be in a single line... This should work fine:
test_stats = []
p_values = []
for i in range(0, n_data):
    for j in range(1, n_name):
        test_stat, p_value = scipy.stats.levene(data[i][name[j-1]],
                                                data[i][name[j]],
                                                center='median')
        test_stats.append(test_stat)
        p_values.append(p_value)
Though of course this will add n_data * (n_name - 1) entries to each list, one per pair of adjacent names.
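The same "unpack, then append" pattern can also be written by collecting the returned tuples first and splitting them into two lists at the end. The sketch below is only an illustration: compare_spread is a hypothetical stand-in for any function that returns two values (such as scipy.stats.levene), and the data values are made up so the example runs with the standard library alone.

```python
from statistics import variance


def compare_spread(a, b):
    """Hypothetical stand-in for a test that returns two values."""
    ratio = variance(a) / variance(b)
    gap = abs(variance(a) - variance(b))
    return ratio, gap


# Made-up data: one dict per dataset, keyed by column name.
data = [
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0], "z": [1.0, 1.5, 2.0, 2.5]},
    {"x": [0.0, 1.0, 0.0, 1.0], "y": [5.0, 7.0, 9.0, 11.0], "z": [3.0, 3.0, 4.0, 4.0]},
]
names = ["x", "y", "z"]

# One (ratio, gap) tuple per dataset and per pair of adjacent names.
results = [
    compare_spread(row[first], row[second])
    for row in data
    for first, second in zip(names, names[1:])
]

# zip(*results) transposes the list of pairs into two sequences.
ratios, gaps = (list(column) for column in zip(*results))
```

Pairing adjacent names with zip(names, names[1:]) avoids the j-1 index arithmetic, and each output list ends up with len(data) * (len(names) - 1) entries.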

Related

Python: Use the "i" counter in while loop as digit for expressions

This seems like it should be very simple, but I am not sure of the proper syntax in Python. To streamline my code I want a while loop (or a for loop, if that is better) to cycle through 9 datasets, using the loop counter to select the correct file.
I would like to use the "i" variable within the loop so that, for each of the sequentially named files, I can get the average of 2 arrays, the max-min of that average, and the max-min of another array.
Below is example code of what I am trying to do, but avg(i) and temp(i) inside the loop do not seem proper. Thank you very much for any help; I will continue to look for solutions, but I am unsure how best to phrase this in order to search for them.
temp1 = pd.read_excel("/content/113VW.xlsx")
temp2 = pd.read_excel("/content/113W6.xlsx")
..-> temp9
i=1
while i<=9
    avg(i) =np.mean(np.array([temp(i)['CC_H='],temp(i)['CC_V=']]),axis=0)
    Delta(i)=(np.max(avg(i)))-(np.min(avg(i)))
    deltaT(i)=(np.max(temp(i)['temperature='])-np.min(temp(i)['temperature=']))
    i+= 1
E.g., the slow method would be repeating this code for each file:
avg1 =np.mean(np.array([temp1['CC_H='],temp1['CC_V=']]),axis=0)
Delta1=(np.max(avg1))-(np.min(avg1))
deltaT1=(np.max(temp1['temperature='])-np.min(temp1['temperature=']))
avg2 =np.mean(np.array([temp2['CC_H='],temp2['CC_V=']]),axis=0)
Delta2=(np.max(avg2))-(np.min(avg2))
deltaT2=(np.max(temp2['temperature='])-np.min(temp2['temperature=']))
......
Think of things in terms of lists.
import numpy as np
import pandas as pd

temps = []
for name in ('113VW','113W6',...):
    temps.append( pd.read_excel(f"/content/{name}.xlsx") )
avg = []
Delta = []
deltaT = []
for data in temps:
    # Element-wise mean of the two columns
    avg.append(np.mean(np.array([data['CC_H='],data['CC_V=']]),axis=0))
    # Range (max - min) of that mean, and of the temperature column
    Delta.append(np.max(avg[-1]) - np.min(avg[-1]))
    deltaT.append(np.max(data['temperature=']) - np.min(data['temperature=']))
You could just do your computations inside the first loop, if you don't need the dataframes after that point.
The way that I would tackle this problem would be to create a list of filenames, and then iterate through them to do the necessary calculations as per the following:
import numpy as np
import pandas as pd
# Place the files to read into this list
files_to_read = ["/content/113VW.xlsx", "/content/113W6.xlsx"]
results = []
for i, filename in enumerate(files_to_read):
    temp = pd.read_excel(filename)
    # Element-wise mean of the two columns
    avg_val = np.mean(np.array([temp['CC_H='], temp['CC_V=']]), axis=0)
    Delta = np.max(avg_val) - np.min(avg_val)
    deltaT = np.max(temp['temperature=']) - np.min(temp['temperature='])
    results.append({"avg":avg_val, "Delta":Delta, "deltaT":deltaT})
# Create a dataframe to show the results
df = pd.DataFrame(results)
print(df)
I have included enumerate to grab the index (i) should you want to access it for anything or include it in the results. For example, you could change the results.append line to something like this:
results.append({"index":i, "Filename":filename, "avg":avg_val, "Delta":Delta, "deltaT":deltaT})
Not sure if I understood the question correctly, but if you want to read the files inside a loop using an index (the i variable), you can create a list to hold the contents of the Excel files instead of using 9 different variables.
Something like:
files = []
files.append(pd.read_excel("/content/113VW.xlsx"))
files.append(pd.read_excel("/content/113W6.xlsx"))
...
then use the index variable to iterate over the list (list indexes start at 0):
avg = []
i = 0
while i < len(files):
    avg.append(np.mean(np.array([files[i]['CC_H='],files[i]['CC_V=']]),axis=0))
    ...
    i += 1
P.S.: I am not a Pandas/NumPy expert, so you may have to adapt the code to your needs
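To make the "one container instead of nine numbered variables" idea concrete without needing the Excel files, here is a minimal sketch using only the standard library. The dictionaries below are made-up stand-ins for the loaded spreadsheets (the real code would use the DataFrames returned by pd.read_excel), and keying the results by file name is just one possible design.

```python
# Made-up stand-in data: each inner dict plays the role of one spreadsheet.
datasets = {
    "113VW": {
        "CC_H=": [1.0, 2.0, 3.0],
        "CC_V=": [3.0, 4.0, 5.0],
        "temperature=": [20.0, 25.0, 22.0],
    },
    "113W6": {
        "CC_H=": [10.0, 10.0, 12.0],
        "CC_V=": [14.0, 10.0, 12.0],
        "temperature=": [30.0, 31.5, 33.0],
    },
}

summary = {}
for name, table in datasets.items():
    # Element-wise mean of the two columns.
    avg = [(h + v) / 2 for h, v in zip(table["CC_H="], table["CC_V="])]
    summary[name] = {
        "avg": avg,
        "Delta": max(avg) - min(avg),
        "deltaT": max(table["temperature="]) - min(table["temperature="]),
    }

for name, values in summary.items():
    print(name, values["Delta"], values["deltaT"])
```

Looking results up by name (summary["113VW"]) avoids having to remember which position in a list belongs to which file.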

List of Lists of Coordinates

I am new to Python, and am struggling with a task that I assume is an extremely simple one for an experienced programmer.
I am trying to create a list of lists of coordinates for different lines. For instance:
list = [ [(x,y), (x,y), (x,y)], [Line 2 Coordinates], ....]
I have the following code:
masterlist_x = list(range(-5,6))
oneline = []
data = []
numberoflines = list(range(2))
i = 1
for i in numberoflines:
    slope = randint(-5,5)
    y_int = randint(-10,10)
    for element in masterlist_x:
        oneline.append((element,slope * element + y_int))
    data.append(oneline)
The variable that should hold the coordinates of a single line (oneline) ends up holding the points of two lines.
I know this is an issue with the outer looping mechanism, but I am not sure how to proceed.
Any and all help is much appreciated. Thank you very much!
@khuynh is right: you simply had oneline = [] in the wrong place, so all the coordinates ended up in one list.
Also, you have a couple of unnecessary things in your code:
you don't need to wrap range() in list(); you can iterate over a range directly with for
you don't need to initialize i before the for loop; the loop assigns it itself
i is not actually used, which is fine; the Python convention for an unused variable is _
Fixed version:
from random import randint
masterlist_x = range(-5,6)
data = []
numberoflines = range(2)
for _ in numberoflines:
    oneline = []
    slope = randint(-5,5)
    y_int = randint(-10,10)
    for element in masterlist_x:
        oneline.append((element,slope * element + y_int))
    data.append(oneline)
print(data)
You can also run it online: https://repl.it/repls/GreedyRuralProduct
I suspect the whole thing could also be written with much less code, and more simply, as a list comprehension.
UPDATE: the inner loop is indeed very suitable for a list comprehension. Maybe the outer loop could be made into one as well, so that the whole thing becomes two nested list comprehensions, but I only got confused when I tried that. This version is clear:
from random import randint
masterlist_x = range(-5,6)
data = []
numberoflines = range(2)
for _ in numberoflines:
    slope = randint(-5,5)
    y_int = randint(-10,10)
    oneline = [(element, slope * element + y_int)
               for element in masterlist_x]
    data.append(oneline)
print(data)
Again on repl.it too: https://repl.it/repls/SoupyIllustriousApplicationsoftware
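For what it's worth, the outer loop can become a comprehension too if the per-line random draws are moved into a small helper, so that slope and y_int are chosen once per line rather than once per point. This is only a sketch of one way to do it; random_line is a name made up for this example.

```python
from random import randint


def random_line(xs):
    """Build one line's coordinates; slope and intercept are drawn once."""
    slope = randint(-5, 5)
    y_int = randint(-10, 10)
    return [(x, slope * x + y_int) for x in xs]


masterlist_x = range(-5, 6)
# Outer comprehension: one call (and one fresh list) per line.
data = [random_line(masterlist_x) for _ in range(2)]
print(data)
```

Because every call returns a new list, the original problem of all points piling up in a single shared oneline cannot occur.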

Sum of elements of numpy array not same as total

I'm trying to count the number of pairs and save them in two different histograms: one stores the pairs in an array split by parent object, and the other just stores the total. That means I have a loop that looks like this:
for k in range(N_parents):
    pair_hist[k, bin] +=1
    total_pair_hist[bin] +=1
where pair_hist and total_pair_hist are defined as:
pair_hist = np.zeros((N_parents, bins.shape[0]), dtype = np.uint64)
total_pair_hist = np.zeros(bins.shape[0], dtype = np.uint64)
I'd expect that summing the elements of pair_hist across all parents (axis=0), I'd get the total histogram. The funny thing is, if I take the sum of pair_hist:
onehalo_sum_ind = np.sum(pair_hist, axis = 0)
I don't get exactly total_pair_hist, but something slightly different:
total_pair_hist = [ 287248245 448773033 695820015 1070797576 1634146741 2466680801
3667159080 5334307986 7524739978 10206208064 13237161068 16466436715
19231751113 20949333183 21254336387 19497450101 16459529579 13038604111
9783826702 7006904025 4813946458 3207605915 2097437543 1355158303
869077173 555036759 353732683 225171870 143179912 0]
onehalo_sum_ind = [ 287267022 448887401 696415932 1073435699 1644677789 2503693266
3784008845 5665555755 8380564635 12201977310 17382403650 23929909625
31103373709 36859534246 38146287402 33454446858 25689430007 18142721164
12224099624 8035266046 5211441720 3353187036 2147027818 1370663213
873519714 556182465 353995293 225224668 143189173 0]
Any idea of what's going on? Thank you in advance :)
Sorry for the late reply, but I didn't have time to work on it before. The problem was caused by numba. I was using it with the parallel=True flag to parallelise one of the loops and that caused the error.
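A plausible reading of that fix, stated here as an assumption rather than something the thread confirms, is that several threads were incrementing the same total_pair_hist[bin] element at once, so some updates were lost, while each row of pair_hist was written by only one thread. One way to sidestep shared increments altogether is to fill only the per-parent rows and derive the total afterwards. The sketch below shows that idea with plain lists and made-up events; with NumPy the last step would be np.sum(pair_hist, axis=0).

```python
n_parents, n_bins = 3, 4

# Made-up (parent, bin) events standing in for the counted pairs.
events = [(0, 1), (1, 1), (2, 3), (0, 0), (1, 2), (2, 3), (0, 1)]

# Each parent owns its own row, so no two parents touch the same counter.
pair_hist = [[0] * n_bins for _ in range(n_parents)]
for parent, bin_index in events:
    pair_hist[parent][bin_index] += 1

# Derive the total from the rows instead of incrementing a shared array.
total_pair_hist = [sum(column) for column in zip(*pair_hist)]

# Sanity check: every event is counted exactly once.
assert sum(total_pair_hist) == len(events)
print(total_pair_hist)
```

The final assert is also a cheap consistency check to keep around when experimenting with a parallel version.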

Save all values of a variable (in a loop) in another variable in Python

I have code that takes a folder containing n images and should return the relative frequency histogram of each one.
From there I have a function call:
for image in total_images:
    histogram(image)
Here image is the current image the code is working on, and total_images is the list of all n images in the folder given earlier.
For each one I call the histogram() function, passing the current image as a parameter.
The purpose of histogram() is to return the relative frequency histogram of each image (rel_freq).
Although the returned values are correct, rel_freq should be a 1x256 array with positions ranging from 0 to 255.
How can I transform the rel_freq variable into a 1x256 array, with each value stored in its corresponding position?
When I call len(rel_freq) it returns 256, which is when I realized that it is not in the format I need...
Again, the returned data itself is correct...
After that, I need to create an array store_all of size len(total_images) x 256 to hold every rel_freq...
I need to collect all the rel_freq values in one array so that I can later save it to an external file, such as a .txt file.
I'm thinking of creating another function to do this...
Something like the following; I do not know how to do it correctly, but I believe you will understand the logic...
def store_all_histograms(total_images):
    n = len(total_images)
    store_all = [n][256]
    for i in range(0,n):
        store_all[i] = rel_freq
I know the function store_all_histograms() is wrong, I just wrote it here to show more or less the way I'm thinking of doing... but again, I do not know how to do it properly... At this point, the error I get is:
store_all = [n][256]
IndexError: list index out of range
After all, I need the store_all variable to save all relative frequency histograms for example like this:
position: 0 ... 256
store_all = [
[..., ..., ...],
[..., ..., ...],
.
.
.
n
]
Here is the relevant block of code:
def histogram(path):
    global rel_freq
    #Part of the code that is not relevant to the question...
    rel_freq = [(float(item) / total_size) * 100 if item else 0 for item in abs_freq]
def store_all_histograms(total_images):
    n = len(total_images)
    store_all = [n][256]
    for i in range(0,n):
        store_all[i] = rel_freq
    #Part of the code that is not relevant to the question...
# Call the functions
for fn in total_images:
    histogram(fn)
store_all_histograms(total_images)
I hope I have managed to be clear with the question.
Thanks in advance, if you need any additional information, you can ask me...
Return the result, don't use a global variable:
def histogram(path):
    # ... compute abs_freq and total_size for this image (omitted in the question) ...
    return [(float(item) / total_size) * 100 if item else 0 for item in abs_freq]
Create an empty list:
store_all = []
and append your results:
for fn in total_images:
    store_all.append(histogram(fn))
Alternatively, use a list comprehension:
store_all = [histogram(fn) for fn in total_images]
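Putting the pieces together, here is a minimal end-to-end sketch, including the "save it to a .txt file" step mentioned in the question. It rests on several assumptions: the "images" are made-up flat lists of 8-bit pixel values rather than real files, histogram() here computes the counts itself because that part was omitted from the question, and the output file name and number format are arbitrary choices.

```python
import os
import tempfile


def histogram(pixels):
    """Relative frequency (in percent) of each grey level 0..255."""
    abs_freq = [0] * 256
    for value in pixels:
        abs_freq[value] += 1
    total_size = len(pixels)
    return [count / total_size * 100 for count in abs_freq]


# Made-up stand-ins for decoded images.
total_images = [[0, 0, 255, 128], [10, 10, 10, 20, 30]]

# One 256-entry row per image: an n x 256 list of lists.
store_all = [histogram(pixels) for pixels in total_images]

# Write one line per image, values separated by spaces.
out_path = os.path.join(tempfile.gettempdir(), "histograms.txt")
with open(out_path, "w") as handle:
    for row in store_all:
        handle.write(" ".join(f"{value:.4f}" for value in row) + "\n")
```

Each returned list already has 256 positions (indexes 0 to 255), so no reshaping is needed; store_all[i][g] is the relative frequency of grey level g in image i.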
for i in range(0,n):
    store_all[i+1] = rel_freq
Try this, perhaps? I'm a bit confused by the question, to be honest. Are you trying to shift the indexing by 1, so that instead of accessing the first item with list[0] you access it with list[1]?
So you want it to act like this?
>>list = [0,1,2,3,4]
>>list[1]
0

Append Rows of Different Lengths to the Same Variable

I am trying to append a lengthy list of rows to the same variable. It works great for the first thousand or so iterations in the loop (all of which have the same lengths), but then, near the end of the file, the rows get a bit shorter, and while I still want to append them, I am not sure how to handle it.
The script gives me an out of range error, as expected.
Here is what the part of code in question looks like:
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
print NNCatelogue, ii
Any help on this would be greatly appreciated!
I'll answer the question you didn't ask first ;) : how can this code be more pythonic?
Instead of
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
you should do
NNCat = []
NNCatelogue = []
for ii, line in enumerate(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii],
             nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
During each pass ii will be incremented by one for you and line will be the current line.
As for your short lines, you have two choices:
Use a special value (such as None) to fill in when you don't have a real value
Check the length of nn1, nn2, ..., nn11 to see if they are large enough
The second solution will be much more verbose, hard to maintain, and confusing. I strongly recommend using None (or another special value you create yourself) as a placeholder when there is no data.
def gvop(vals,indx): #get values or padding
    return vals[indx] if indx<len(vals) else None
NNCatelogue = [(gvop(ev_id,ii), gvop(nn1,ii), gvop(nn2,ii), gvop(nn3,ii), gvop(nn4,ii),
                gvop(nn5,ii), gvop(nn6,ii), gvop(nn7,ii), gvop(nn8,ii), gvop(nn9,ii),
                gvop(nn10,ii), gvop(nn11,ii)) for ii in xrange(0, len(lines))]
By defining this other function to return either the correct value or padding, you can ensure rows are the same length. You can change the padding to anything, if None is not what you want.
Then the list comp creates a list of tuples as before, except containing padding in cases where some of the lines in the input are shorter.
from itertools import izip_longest
NNCatelogue = list(izip_longest(ev_id, nn1, nn2, ... nn11, fillvalue=None))
See here for the documentation of izip_longest. Do yourself a favour and skip the list() around the iterator if you don't need it. In many cases the iterator works just as well as the list and saves a lot of memory, especially when the lists you are grouping together are long.
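Note that the snippets above are Python 2 (print statement, xrange, izip_longest). In Python 3 the same function is called itertools.zip_longest. A small sketch with made-up columns of unequal length (only three columns, to keep it short) shows the padding behaviour:

```python
from itertools import zip_longest

# Made-up columns; the later ones are shorter, as in the question.
ev_id = [101, 102, 103, 104]
nn1 = [0.5, 0.7, 0.9]
nn2 = [1.5, 1.7]

# Missing entries are filled with fillvalue so every row has the same length.
catalogue = list(zip_longest(ev_id, nn1, nn2, fillvalue=None))

for row in catalogue:
    print(row)
```

As in the answer, the list() call is only needed if the rows must be indexed or reused; iterating over zip_longest(...) directly avoids building the whole list in memory.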
