I'm a beginner in python and I have a problem with some basics of it.
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
b.append(sum(i[col] for i in m))
print(b)
This code will sum the columns of m.
But if I extract the for i in m and write it as follows:
for col in range(3):
for i in m:
b.append(sum(i[col]))
print(b)
it doesn't work and gives me an error. All I've done is extracting the loop from the parenthesis.
What is the problem and what should I do?
Note the original code has a single call to append each time around the outer for loop.
Try:
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
toadd = 0
for i in m:
toadd += sum(i[col])
b.append(toadd)
print(b)
for col in range(3): # col is an int
for i in m: # each i is a list e.g. [1, 2, 3]
b.append(sum(i[col])) # i[col] is an int, e.g. i[0] = 1 or i[1] = 2
The problem is that sum() receives iterable as argument not an int https://docs.python.org/3/library/functions.html#sum
If in doubt, use some print statements to see whats going on within the for loops:
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
print([i[col] for i in m])
print("----")
for col in range(3):
print("new col")
for i in m:
print(i[col])
outputs:
[1, 4, 7]
[2, 5, 8]
[3, 6, 9]
----
new col
1
4
7
new col
2
5
8
new col
3
6
9
So the first for loop, i[col] is returning a list of numbers and in the second for loop, i[col] is returning an int.
This may give you a clue why sum(i[col]) is returning an error in the second for loop.
Related
I have 2 questions:
I have a dataset that contains some duplicate IDs, but some of them have different actions so they can't be removed. I want for each ID to do some math and store the final value to work with later. I already have duplicate indices, but in this code, it doesn't work properly and gives NaN.
How can I write nested loop using pandas? Cause it takes too much time to run. I've already used iterrows(), but didn't work.
l_list = []
for i in range(len(idx)):
for j in range(len(idx[i])):
if df.at[j,'action'] == 0:
a = df.rank[idx[i]]*50
b = df.study_list[idx[i]].str.strip('[]').str.split(',').str.len()
l_list.append(a + b)
Based on my understanding of what you've provided, see if this works:
In [15]: df
Out[15]:
ID rank action study_list
0 aaa 24 0 [a, b]
1 bbb 6 1 [1, 2, 3]
2 aaa 14 0 [1, 2, 3, 4]
In [16]: def do_thing(row):
...: if row['ID'] == 'aaa' and row['action'] == 0:
...: return row['rank'] * 50 + len(row['study_list'])
...: else:
...: return 100 * row['rank']
...:
In [17]: df['new_value'] = df.apply(do_thing, axis=1)
In [18]: df
Out[18]:
ID rank action study_list new_value
0 aaa 24 0 [a, b] 1202
1 bbb 6 1 [1, 2, 3] 600
2 aaa 14 0 [1, 2, 3, 4] 704
NOTE:
I have made many simplifications as your post doesn't enable a reproducible case. Read this thread to see how to best ask questions about Pandas.
I also can't guarantee speed as you have not provided the details regarding the size of the dataset.
i dont know what does the variable idx or anything. i think your code is wrong,
you have to try this code
l_list = []
for i in range(len(idx)):
for j in range(len(idx[i])):
if df.at[j,'action'] == 0:
a = df.rank[idx[i]]*50
b = df.study_list[idx[i]].str.strip('[]').str.split(',').str.len()
l_list.append(a + b)
I have defined a function to create a dataframe, but I get two lists in each column, how could I get each element of the list as a separate row in the dataframe as shown below.
a = [1, 2, 3, 4]
def function():
result = []
for i in range(0, len(a)):
number = [i for i in a]
operation = [8*i for i in a]
result.append({'number': number, 'operation': operation})
df = pd.DataFrame(result, columns=['number','operation'])
return df
function()
Result:
number operation
0 [1, 2, 3, 4] [8, 16, 24, 32]
What I really want to:
number operation
0 1 8
1 2 16
2 3 24
3 4 34
Can anyone help me please? :)
Your problems are twofold, firstly you are pushing the entire list of values (instead of the "current" value) into the result array on each pass through your for loop, and secondly you are overwriting the dataframe each time as well. It would be simpler to use a list comprehension to generate the values for the dataframe:
import pandas as pd
a = [1, 2, 3, 4]
def function():
result = [{'number' : i, 'operation' : 8*i} for i in a]
df = pd.DataFrame(result)
return df
print(function())
Output:
number operation
0 1 8
1 2 16
2 3 24
3 4 32
import numpy as np
a = [1, 2, 3, 4]
def function():
for i in range(0, len(a)):
number = [i for i in a]
operation = [8*i for i in a]
v=np.rot90(np.array((number,operation)))
result=np.flipud(v)
df = pd.DataFrame(result, columns=['number','operation'])
return df
print (function())
number operation
0 1 8
1 2 16
2 3 24
3 4 32
You are almost there. Just replace number = [i for i in a] with number = a[i] and operation = [8*i for i in a] with operation = 8 * a[i]
(FYI: No need to create pandas dataframe inside loop. You can get same output with pandas dataframe creation outside loop)
Refer to the below code:
a = [1, 2, 3, 4]
def function():
result = []
for i in range(0, len(a)):
number = a[i]
operation = 8*a[i]
result.append({'number': number, 'operation': operation})
df = pd.DataFrame(res, columns=['number','operation'])
return df
function()
number operation
0 1 8
1 2 16
2 3 24
3 4 32
I have:
hi
0 1
1 2
2 4
3 8
4 3
5 3
6 2
7 8
8 3
9 5
10 4
I have a list of lists and single integers like this:
[[2,8,3], 2, [2,8]]
For each item in the main list, I want to find out the index of when it appears in the column for the first time.
So for the single integers (i.e 2) I want to know the first time this appears in the hi column (index 1, but I am not interested when it appears again i.e index 6)
For the lists within the list, I want to know the last index of when the list appears in order in that column.
So for [2,8,3] that appears in order at indexes 6, 7 and 8, so I want 8 to be returned. Note that it appears before this too, but is interjected by a 4, so I am not interested in it.
I have so far used:
for c in chunks:
# different method if single note chunk vs. multi
if type(c) is int:
# give first occurence of correct single notes
single_notes = df1[df1['user_entry_note'] == c]
single_notes_list.append(single_notes)
# for multi chunks
else:
multi_chunk = df1['user_entry_note'].isin(c)
multi_chunk_list.append(multi_chunk)
You can do it with np.logical_and.reduce + shift. But there are a lot of edge cases to deal with:
import numpy as np
def find_idx(seq, df, col):
if type(seq) != list: # if not list
s = df[col].eq(seq)
if s.sum() >= 1: # if something matched
idx = s.idxmax().item()
else:
idx = np.NaN
elif seq: # if a list that isn't empty
seq = seq[::-1] # to get last index
m = np.logical_and.reduce([df[col].shift(i).eq(seq[i]) for i in range(len(seq))])
s = df.loc[m]
if not s.empty: # if something matched
idx = s.index[0]
else:
idx = np.NaN
else: # empty list
idx = np.NaN
return idx
l = [[2,8,3], 2, [2,8]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7]
l = [[2,8,3], 2, [2,8], [], ['foo'], 'foo', [1,2,4,8,3,3]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7, nan, nan, nan, 5]
I'm working on the following code:
mylist = [1,2,3,4,5,6,7,8,9,10.....]
for x in range(0, len(mylist), 3):
value = mylist[x:x + 3]
print(value)
Basically, I'm taking 3 items in mylist at a time, the code is bigger than that, but I'm doing a lot of things with them returning a value from it, then it takes the next 3 items from mylist and keep doing it till the end of this list.
But now I have a problem, I need to identify each iteration, but they follow a rule:
The first loop are from A, the second are from B and the third are from C.
When it reaches the third, it starts over with A, so what I'm trying to do is something like this:
mylist[0:3] are from A
mylist[3:6] are from B
mylist[6:9] are from C
mylist[9:12]are from A
mylist[12:15] are from B......
The initial idea was to implement a identifier the goes from A to C, and each iteration it jumps to the next identifier, but when it reaches C, it backs to A.
So the output seems like this:
[1,2,3] from A
[4,5,6] from B
[6,7,8] from C
[9,10,11] from A
[12,13,14] from B
[15,16,17] from C
[18,19,20] from A.....
My bad solution:
Create identifiers = [A,B,C] multiply it by the len of mylist -> identifiers = [A,B,C]*len(mylist)
So the amount of A's, B's and C's are the same of mylist numbers that it needs to identify. Then inside my for loop I add a counter that adds +1 to itself and access the index of my list.
mylist = [1,2,3,4,5,6,7,8,9,10.....]
identifier = ['A','B','C']*len(mylist)
counter = -1
for x in range(0, len(mylist), 3):
value = mylist[x:x + 3]
counter += 1
print(value, identifier[counter])
But its too ugly and not fast at all. Does anyone know a faster way to do it?
Cycle, zip, and unpack:
mylist = [1,2,3,4,5,6,7,8,9,10]
for value, iden in zip(mylist, itertools.cycle('A', 'B', 'C')):
print(value, iden)
Output:
1 A
2 B
3 C
4 A
5 B
6 C
7 A
8 B
9 C
10 A
You can always use a generator to iterate over your identifiers:
def infinite_generator(seq):
while True:
for item in seq:
yield item
Initialise the identifiers:
identifier = infinite_generator(['A', 'B', 'C'])
Then in your loop:
print(value, next(identifier))
Based on Ignacio's answer fitted for your problem.
You can first reshape your list into a list of arrays containing 3 elements:
import pandas as pd
import numpy as np
import itertools
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = np.reshape(mylist[:len(mylist)-len(mylist)%3],(-1,3))
print(_reshaped)
[[1 2 3]
[4 5 6]
[7 8 9]]
Note that it works since your list contains multiple of 3 elements (so you need to drop the last elements in order to respect this condition, mylist[:len(mylist)-len(mylist)%3]) - Understanding slice notation
See UPDATE section for a reshape that fits to your question.
Then apply Ignacio's solution on the reshaped list
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
print(value, iden)
[1 2 3] A
[4 5 6] B
[7 8 9] C
UPDATE
You can use #NedBatchelder's chunk generator to reshape you array as expected:
def chunks(l, n):
"""Yield successive n-sized chunks from l."""
for i in range(0, len(l), n):
yield l[i:i + n]
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = list(chunks(mylist, 3))
print(_reshaped)
[[1 2 3]
[4 5 6]
[7 8 9]
[10]]
Then:
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
print(value, iden)
[1 2 3] A
[4 5 6] B
[7 8 9] C
[10] A
Performances
Your solution : 1.32 ms ± 94.3 µs per loop
With a reshaped list : 1.32 ms ± 84.6 µs per loop
You notice that there is no sensitive difference in terms of performances for an equivalent result.
You could create a Generator for the slices:
grouped_items = zip(*[seq[i::3] for i in range(3)])
I'm sorry if the title is confusing. Here's a better explanation:
So basically what I need to do is iterate through every number in list and print the biggest number west (list[0:i]) and the biggest number east. If the biggest number is smaller than i, we print i. So for list [1, 3, 2, 4, 3] the output should be:
1 4
3 4
3 4
4 4
4 3
I thought my code was correct but it doesn't work for the last number in list, is anyone able to help?
'a' is the list in my code
a = [1, 3, 2, 4, 3]
for i in a:
west = a[0:i]
east = a[i:int(len(a))]
if max(west) > i:
print(max(west))
else:
print(i)
if max(east) > i:
print(max(east))
else:
print(i)
Try:
for i in range(len(a)):
print(max(a[:i+1]))
print(max(a[i:]))
You are not iterating over the indices in your original code; and thus the partition does not make sense.
The only mistake in your code is the for i in a loop, which loops throgh i = 1,3,2,4,3 and not i=0,1,2,3,4
The following piece of code works
a=[1,3,2,4,3]
for i in range(len(a)) :
print max(i,max(a[:i+1]))
print max(i,max(a[i:]))
this may work... not fully tested but it looks correct
a = [1, 3, 2, 4, 3]
for i in a[:-1]:
west = a[0:i]
east = a[i:int(len(a))]
if max(west) > i:
print(max(west))
else:
print(i)
if max(east) > i:
print(max(east))
else:
print(i)
num = a[-1]
west = a[0:-1]
if max(west) > num:
print(max(west))
else:
print(str(a[-1]))
print(str(a[-1]))
Output: 1 4 3 4 4 4 4 3