So I wrote a model that computes results over various parameters via a nested loop. Each computation returns a list of len(columns) = 10 elements, which is added to a list of lists (res).
Say I compute my results for some parameters len(alpha) = 2, len(gamma) = 2, rep = 3, where rep is the number of repetitions that I run. This yields results in the form of a list of lists like this:
res = [ [elem_1, ..., elem_10], ..., [elem_1, ..., elem_10] ]
I know that len(res) = len(alpha) * len(gamma) * repetitions = 12 and that each inner list has len(columns) = 10 elements. I also know that every 3rd list in res is going to be a repetition (which I know from the way I set up my nested loops to iterate over all parameter combinations, in fact I am using itertools).
I now want to average the result list of lists. What I need to do is take every (len(res) // repetitions) = 4th list, add them together element-wise, and divide by the number of repetitions (3). Easier said than done, for me.
Here is my ugly attempt to do so:
# create a list of lists of lists, where the inner list of lists are lists of the runs with the identical parameters alpha and gamma
res = [res[i::(len(res)//rep)] for i in range(len(res)//rep)]
avg_res = []
for i in res:
    result = []
    for j in (zip(*i)):
        result.append(sum(j))
    avg_res.append([i/repetitions for i in result])
print(len(result_list), avg_res)
This actually yields what I want, but it is surely not the Pythonic way to do it. It is ugly as hell, and 5 minutes later I can hardly make sense of my own code...
What would be the most pythonic way to do it? Thanks in advance!
Whether code is Pythonic is partly a matter of style. One common idiom is to use a list comprehension instead of a loop, so writing result = [sum(j) for j in (zip(*i))] is simpler than looping over zip(*i) and appending.
On the other hand, a nested list comprehension is harder to read, so don't do
avg_res = [[i/repetitions for i in [sum(j) for j in (zip(*j))]] for j in res]
You can write:
res = [res[i::(len(res)//rep)] for i in range(len(res)//rep)]
avg_res = []
for i in res:
    result = [sum(j) for j in (zip(*i))]
    avg_res.append([i/repetitions for i in result])
print(len(result_list), avg_res)
Another idiom in programming in general (and in Python in particular) is to name operations by wrapping them in functions and to choose descriptive variable names, which makes the code more readable:
def sum_columns(list_of_rows):
    return [sum(col) for col in (zip(*list_of_rows))]

def align_alpha_and_gamma(res):
    return [res[i::(len(res)//rep)] for i in range(len(res)//rep)]

aligned_lists = align_alpha_and_gamma(res)
avg_res = []
for aligned_list in aligned_lists:
    sums_of_column = sum_columns(aligned_list)
    avg_res.append([sum_of_column/repetitions for sum_of_column in sums_of_column])
print(len(result_list), avg_res)
Of course, you can choose better names according to what you want to do in the code.
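To tie the pieces above together, here is a minimal sketch of the whole averaging step as one function. The name average_repetitions and the toy data are made up for illustration, and it assumes (as the slicing in the question does) that len(res) is a multiple of rep and that the repetitions form the outermost loop.

```python
def average_repetitions(res, rep):
    """Average rows that share the same parameter combination.

    Assumes len(res) is a multiple of rep and that the repetitions form the
    outermost loop, so rows i, i + step, i + 2*step, ... belong together.
    """
    step = len(res) // rep  # number of distinct parameter combinations
    return [
        [sum(col) / rep for col in zip(*res[i::step])]
        for i in range(step)
    ]


# Toy data: 2 parameter combinations, 3 repetitions, 2 columns each.
res = [[1, 10], [2, 20],   # repetition 1
       [3, 30], [4, 40],   # repetition 2
       [5, 50], [6, 60]]   # repetition 3
print(average_repetitions(res, rep=3))  # [[3.0, 30.0], [4.0, 40.0]]
```

Passing rep as an argument avoids relying on the global names rep and repetitions, which the snippets above use interchangeably.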
It was a bit hard to follow your description, but as I understand it, you want to sum all elements of every N'th list and divide the total by repetitions.
res = [list(range(i,i+10)) for i in range(10)]
N = 4
repetitions = 3
average_of_Nth_lists = sum([num for iter,item in enumerate(res) for num in item if iter%N==0])/repetitions
print(average_of_Nth_lists)
output:
85.0
Explanation of the result: it equals sum(0..9) + sum(4..13) + sum(8..17) = 255, and 255 / 3 = 85.0.
I created res as a list of lists, iterated over every N'th list (in my case the 1st, 5th and 9th; you can change that to the 4th, 8th, etc. if that is what you want, and ask if it is unclear where to change it in the code), summed them up and divided by repetitions.
I have a question about a MemoryError in Python 3.6.
import itertools
input_list = ['a','b','c','d']
group_to_find = list(itertools.product(input_list,input_list))
a = []
for i in range(len(group_to_find)):
    if group_to_find[i] not in a:
        a.append(group_to_find[i])
group_to_find = list(itertools.product(input_list,input_list))
MemoryError
You are creating a list, in full, from the Cartesian product of your input list, so in addition to input_list you now need len(input_list) ** 2 memory slots for all the results. You then filter that list down again to a third list. All in all, for N items you need memory for 2N + (N * N) references. If N is 1000, that's 1 million and 2 thousand references; for N = 1 million, you need 1 million million plus 2 million references, and so on.
Your code doesn't need to create the group_to_find list, at all, for two reasons:
1. You could just iterate and handle each pair individually:
a = []
for pair in itertools.product(input_list, repeat=2):
    if pair not in a:
        a.append(pair)
This is still going to be slow, because pair not in a has to scan the whole list to find matches. You do this N times, for up to K pairs (where K is the product of the number of unique values in input_list, potentially equal to N), so that's N * K time spent checking for duplicates. You could use a = set() to make that faster. But see point 2.
2. Your end product in a is the exact same list of pairs that itertools.product() would produce anyway, unless your input values are not unique. You could just make those unique first:
a = itertools.product(set(input_list), repeat=2)
Again, don't put this in a list. Iterate over it in a loop and use the pairs it produces one by one.
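As a rough sketch of that advice, the helper below (unique_pairs is a made-up name) removes duplicates up front and then hands back the lazy product, so no list of pairs is ever built. It assumes the input values are hashable, and it uses dict.fromkeys instead of set() only to keep the original order.

```python
import itertools


def unique_pairs(values):
    """Lazily yield every ordered pair of the distinct items in values."""
    # dict.fromkeys keeps first-seen order while dropping duplicates
    distinct = list(dict.fromkeys(values))
    return itertools.product(distinct, repeat=2)


input_list = ['a', 'b', 'a', 'c']
count = 0
for pair in unique_pairs(input_list):
    count += 1  # handle each pair here instead of storing it
print(count)  # 9 pairs from 3 distinct values
```

Only the distinct input values are held in memory; each pair is produced when the loop asks for it.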
I'm new to Python and I am trying to generate a list of 4 random numbers with integers between 1 and 9. The list must contain no repeating integers.
The issue I am having is that the program doesn't output exactly 4 numbers every time. Sometimes it generates 3 numbers or 2 numbers and I can't figure out how to fix it.
My code:
import random
lst = []
for i in range(5):
    r = random.randint(1,9)
    if r not in lst: lst.append(r)
print(lst)
Is there a way to do it without the random.sample? This code is part of a larger assignment for school and my teacher doesn't want us using the random.sample or random.shuffle functions.
Your code generates 5 random numbers, but they are not necessarily unique. If a 2 is generated and you already have 2 in list you don't append it, while you should really be generating an alternative digit that hasn't been used yet.
You could use a while loop to test if you already have enough numbers:
result = []  # best not to use list as a variable name!
while len(result) < 5:
    digit = random.randint(1, 9)
    if digit not in result:
        result.append(digit)
but that's all more work than really needed, and could in theory take forever (as millions of repeats of the same 4 initial numbers is still considered random). The standard library has a better method for just this task.
Instead, you can use random.sample() to take 5 unique numbers from a range() object:
result = random.sample(range(1, 10), 5)
This is guaranteed to produce 5 values taken from the range, without duplicate digits, and it does so in 5 steps.
Use random.sample:
import random
random.sample(range(1, 10), 4)
This generates a list of four random values between 1 to 9 with no duplicates.
Your issue is that you're iterating 5 times, drawing from a random range of 1-9. That gives you a high chance of drawing at least one repeated integer (about 74% for five draws from nine values), and your conditional prevents a repeat from being appended to your list.
This will serve you better:
def newRunLst():
    lst = []
    while len(lst) < 4:
        r = random.randint(1,9)
        if r not in lst: lst.append(r)
    print(lst)
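If random.sample and random.shuffle are off limits, another option is to draw from a shrinking pool, so a repeat cannot happen and the loop always runs exactly as many times as the number of values you want. This is only a sketch; pick_unique and its parameters are names made up for this example.

```python
import random


def pick_unique(count, low=1, high=9):
    """Return `count` distinct integers from low..high (inclusive)."""
    pool = list(range(low, high + 1))
    if count > len(pool):
        raise ValueError("not enough distinct values in the range")
    picked = []
    for _ in range(count):
        # randrange picks a valid index; pop removes it so it cannot repeat
        picked.append(pool.pop(random.randrange(len(pool))))
    return picked


print(pick_unique(4))
```

Unlike the retry loops above, this never discards a draw, at the cost of building the pool list first.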
If the random list you need is not too small compared to the total list, you can generate an indexed DataFrame of random numbers, sort it, and select from the top, like this (assuming pandas is imported as pd and NumPy as np):
(pd.DataFrame([(i,np.random.rand()) for i in range(10)]).sort_values(by=1))[0][:5].sort_index()
I have a list of 40 elements. I am trying to estimate how many times I need to sample this list in order to reproduce all elements in that list. However, it is important that I replace the picked element. I.e. it is possible that I will pick the same element 20 times. So far I have the following
import random
l = range(0,40)
seen=[]
x=0
while len(seen)<len(l):
    r = random.choice(l)
    if r not in seen:
        seen.append(r)
        x=x+1
print x
However, this always returns that it took 40 times to accomplish what I want, which suggests that a single element is never selected twice.
Eventually I would run this function 1000 times to get a feel for how often I would have to sample.
as always, thanks
You just need to adjust the indentation of x=x+1. Right now you only increment it if the value was not seen before.
If you do this often with a lot of items, you may want to use a set for your seen variable, because membership tests on a set are faster on average.
import random
l = range(0, 40)
seen = set()
x = 0
while len(seen) < len(l):
    r = random.choice(l)
    if r not in seen:
        seen.add(r)
    x = x + 1  # count every draw, whether or not the value is new
print(x)
Here is a similar method to do it. Initialize a set, which by definition may only contain unique elements (no duplicates). Then keep using random.choice() to choose an element from your list. You can compare your set to the original list, and until they are the same size, you don't have every element. Keep a counter to see how many random choices it takes.
import random
def sampleValues(l):
    counter = 0
    values = set()
    while len(values) < len(l):
        values.add(random.choice(l))
        counter += 1
    return counter
>>> l = list(range(40))
This number will vary from run to run; you could run a Monte Carlo simulation to get some statistics.
>>> sampleValues(l)
180
>>> sampleValues(l)
334
>>> sampleValues(l)
179
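The Monte Carlo idea could look roughly like the sketch below: repeat the experiment 1000 times (the count the question mentions) and compare the average with the textbook coupon-collector expectation n * (1 + 1/2 + ... + 1/n). The function name draws_until_all_seen and the fixed seed are choices made for this example only.

```python
import random
from statistics import mean


def draws_until_all_seen(items):
    """Count draws with replacement until every item has appeared once."""
    seen = set()
    draws = 0
    while len(seen) < len(items):
        seen.add(random.choice(items))
        draws += 1
    return draws


random.seed(0)  # fixed seed only so the run is repeatable
items = list(range(40))
runs = [draws_until_all_seen(items) for _ in range(1000)]

# Coupon-collector expectation: n * (1 + 1/2 + ... + 1/n)
expected = len(items) * sum(1 / k for k in range(1, len(items) + 1))
print(round(mean(runs), 1), round(expected, 1))
```

The simulated mean should land close to the theoretical value, while min(runs) and max(runs) give a feel for how far a single run can stray from it.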
I have many pairs of lists of variable lengths (5,4,6 pairs etc..) inside a single big list, lets call it LIST. Here are two lists among the many inside the big LIST as an example:
[(38.621833, -10.825707),
(38.572191, -10.84311), -----> LIST[0]
(38.580202, -10.860877),
(38.610917, -10.85217),
(38.631526, -10.839338)]
[(38.28152, -10.744559),
(38.246368, -10.744552), -----> LIST[1]
(38.246358, -10.779088),
(38.281515, -10.779096)]
I need to create two separate variables: one will hold the first "column" (i.e. LIST[0][0][0], LIST[0][1][0] and so on) of all the pairs in all the lists (i.e. 38.621833, 38.572191 etc.), and the second will hold the second "column" (i.e. LIST[0][0][1], LIST[0][1][1] and so on) of all the pairs in all the lists.
So finally I will have two variables (say x,y) that will contain all the values of the first and second "columns" of all the lists in the LIST.
The problem I face is that all these lists are not of the same length!!
I tried
x = []
y = []
for i in range(len(LIST)):
    x.append(LIST[i][0][0]) #append all the values of the first numbers
    y.append(LIST[i][1][1]) #append all the values of the second numbers
What I expect:
x = (38.621833,38.572191,38.580202,38.610917,38.631526,38.28152,38.246368,38.246358,38.281515)
y = (-10.825707,-10.84311,-10.860877,-10.85217,-10.839338,-10.744559,-10.744552,-10.779088,-10.779096)
But because the lists contain different numbers of pairs, my loop stops abruptly partway through.
I know I also need to vary j in LIST[i][j][0], and j changes with each list. But because of the different numbers of pairs, I don't know how to go about it.
How do I go about doing this?
I would use two simple for loops (it's also generic for LIST being longer than 2):
x=[]
y=[]
for i in range(len(LIST)):
    for j in LIST[i]:
        x.append(j[0])
        y.append(j[1])
You should transpose the sublists and use itertools.chain to create a single list (here l is the big list of lists):
from itertools import chain
# list(...) is needed on Python 3, where zip returns a one-shot iterator
zipped = [list(zip(*sub)) for sub in l]
x = list(chain.from_iterable(ele[0] for ele in zipped))
y = list(chain.from_iterable(ele[1] for ele in zipped))
print(x, y)
[38.621833, 38.572191, 38.580202, 38.610917, 38.631526, 38.28152, 38.246368, 38.246358, 38.281515] [-10.825707, -10.84311, -10.860877, -10.85217, -10.839338, -10.744559, -10.744552, -10.779088, -10.779096]
for ele1,ele2 in zip(x,y):
    print(ele1,ele2)
38.621833 -10.825707
38.572191 -10.84311
38.580202 -10.860877
38.610917 -10.85217
38.631526 -10.839338
38.28152 -10.744559
38.246368 -10.744552
38.246358 -10.779088
38.281515 -10.779096
Here you go, tuples as requested.
my = [(38.621833, -10.825707),(38.572191, -10.84311),(38.580202, -10.860877),(38.610917, -10.85217),(38.631526, -10.839338)]
my1 = [(38.28152, -10.744559),(38.246368, -10.744552),(38.246358, -10.779088),(38.281515, -10.779096)]
l1, l2 = zip(*my)  # zip(*my) transposes the pairs into two tuples
print(l1, l2)
Output:
(38.621833, 38.572191, 38.580202, 38.610917, 38.631526) (-10.825707, -10.84311, -10.860877, -10.85217, -10.839338)
Use the map function with zip and the * unpacking operator.
l = [(38.621833, -10.825707),
(38.572191, -10.84311),
(38.580202, -10.860877),
(38.610917, -10.85217),
(38.631526, -10.839338)]
x, y = map(list, zip(*l))  # unpack the two transposed columns as lists
print('x = {},\ny = {}'.format(x, y))
x = [38.621833, 38.572191, 38.580202, 38.610917, 38.631526],
y = [-10.825707, -10.84311, -10.860877, -10.85217, -10.839338]
Or, if you don't want to store the columns in separate variables, skip the unpacking in the solution above:
list(map(list, zip(*l))) # will give you a nested list
Your LIST consists of 2 lists.
With
for i in range(len(LIST)):
you run exactly 2 times through your loop.
If you want to solve your problem with for-loops, you need to nest them:
#declare x, y as lists
x = []
y = []
for i_list in LIST:
    #outer for-loop runs 2 times - once for each list appended to LIST.
    #1st run: i_list becomes LIST[0]
    #2nd run: i_list becomes LIST[1]
    for pair in i_list:
        #inner for-loop runs once for each tuple in i_list
        #pair becomes the content of i_list[#run]
        x.append(pair[0]) #adds x-value to x
        y.append(pair[1]) #adds y-value to y
If you prefer working with indexes use:
for i in range(len(LIST)):
    for j in range(len(LIST[i])):
        x.append(LIST[i][j][0])
        y.append(LIST[i][j][1])
Not using indexes when appending the x- or y-values is much easier to write (it saves you from thinking hard about the list structure and the correct use of indexes) and is much easier to understand for other people reading your code.
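For completeness, the nested loops can also be collapsed into a single expression by first flattening LIST and then transposing the resulting stream of pairs. This is just a sketch on shortened sample data; it assumes LIST contains at least one pair, since unpacking into x, y fails on an empty input.

```python
from itertools import chain

LIST = [
    [(38.621833, -10.825707), (38.572191, -10.84311), (38.580202, -10.860877)],
    [(38.28152, -10.744559), (38.246368, -10.744552)],
]

# Flatten all sublists into one stream of pairs, then transpose it.
x, y = zip(*chain.from_iterable(LIST))
print(x)
print(y)
```

Both x and y come out as tuples, in the same order as the pairs appear in LIST, regardless of how long each sublist is.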