Select next N rows in pandas dataframe using iterrows

I need to select the next N rows of a pandas DataFrame on each call, using iterrows.
Something like this:
def func():
    selected = []
    for i in range(N):
        selected.append(next(dataframe.iterrows()))
    yield selected
But this way selected contains N identical elements, and every time I call func I get the same result (the first row of the dataframe).
If the dataframe is:
   A  B  C
0  5  8  2
1  1  2  3
2  4  5  6
3  7  8  9
4  0  1  2
5  3  4  5
6  7  8  6
7  1  2  3
What I want to obtain is:
N = 3
selected = [ [5,8,2], [1,2,3], [4,5,6] ]
then, calling again the function,
selected = [ [7,8,9], [0,1,2], [3,4,5] ]
then,
selected = [ [7,8,6], [1,2,3], [5,8,2] ]

There is no need for .iterrows(); use slicing instead:
import pandas as pd
def flow_from_df(dataframe: pd.DataFrame, chunk_size: int = 10):
    # Yield consecutive chunks of at most chunk_size rows.
    for start_row in range(0, dataframe.shape[0], chunk_size):
        end_row = min(start_row + chunk_size, dataframe.shape[0])
        yield dataframe.iloc[start_row:end_row, :]
To use it:
get_chunk = flow_from_df(dataframe)
chunk1 = next(get_chunk)
chunk2 = next(get_chunk)
Or, without a generator:
def get_chunk(dataframe: pd.DataFrame, chunk_size: int, start_row: int = 0) -> pd.DataFrame:
    end_row = min(start_row + chunk_size, dataframe.shape[0])
    return dataframe.iloc[start_row:end_row, :]
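The generator above stops once the DataFrame is exhausted, whereas the expected output in the question wraps around to the first row. A minimal sketch of that wrap-around behaviour, assuming the rows are first converted to plain lists (for a DataFrame, dataframe.values.tolist()); the helper name cycle_chunks is hypothetical:

```python
from itertools import cycle, islice

def cycle_chunks(rows, n):
    """Yield lists of n consecutive rows forever, wrapping around at the end."""
    if not rows:
        return  # nothing to cycle over
    endless = cycle(rows)  # restarts from the first row after the last one
    while True:
        yield list(islice(endless, n))

# Rows of the example DataFrame from the question.
rows = [[5, 8, 2], [1, 2, 3], [4, 5, 6], [7, 8, 9],
        [0, 1, 2], [3, 4, 5], [7, 8, 6], [1, 2, 3]]
chunks = cycle_chunks(rows, 3)
first = next(chunks)
second = next(chunks)
third = next(chunks)
print(first)
print(second)
print(third)
```

Note that itertools.cycle keeps its own copy of every row, so this sketch suits small frames better than very large ones.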

I am assuming you are calling the function in a loop. You can try this:
def select_in_df(start, end):
    # Slice rows start..end-1 and convert them to a list of lists.
    selected = data_frame[start:end]
    return selected.values.tolist()
print(select_in_df(0, 3))  # rows 0, 1 and 2
# To advance start and end, use any loop that suits you; here is an example:
N = 3
for start in range(0, len(data_frame), N):
    print(select_in_df(start, start + N))

Use return instead of yield. If you want the plain data in selected as a list of lists, you can do this:
def func():
    selected = []
    # Assumes the default integer index 0, 1, 2, ...
    for index, row in df.iterrows():
        if index < N:
            selected.append([row['A'], row['B'], row['C']])
        else:
            break
    return selected

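I think I found an answer, doing this:
def func(rowws=df.iterrows(), N=3):
    # The default iterator is created once, when the function is defined,
    # so successive calls continue where the previous call stopped.
    selected = []
    for i in range(N):
        selected.append(next(rowws))
    yield selected
selected = next(func())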
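The default argument above is evaluated only once, so every call shares one iterator. A sketch that makes the shared iterator explicit instead of hiding it in a default, using itertools.islice; the helper name take is hypothetical, and a plain iterator stands in for dataframe.iterrows():

```python
from itertools import islice

def take(iterator, n):
    """Return the next n items of iterator as a list (fewer if it runs out)."""
    return list(islice(iterator, n))

# Stand-in for dataframe.iterrows(): any iterator works the same way.
rows = iter([(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")])
first = take(rows, 3)
second = take(rows, 3)
print(first)
print(second)
```

Unlike next(), islice does not raise StopIteration when the rows run out; the last list is simply shorter.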

Try using:
import numpy as np
def func(dataframe, N=3):
    # N is the number of sections to split into, not the rows per section.
    return np.array_split(dataframe.values, N)
print(func(dataframe))
Output:
[array([[5, 8, 2],
       [1, 2, 3],
       [4, 5, 6]]), array([[7, 8, 9],
       [0, 1, 2],
       [3, 4, 5]]), array([[7, 8, 6],
       [1, 2, 3]])]
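Note that the second argument of np.array_split is the number of sections, not the number of rows per section; with 8 rows and N=3 the two happen to coincide. A sketch, assuming fixed-size chunks of N rows are what is wanted, that passes split indices instead (the helper name split_every is hypothetical):

```python
import numpy as np

def split_every(values, n):
    """Split a 2-D array into consecutive chunks of n rows (the last may be shorter)."""
    # Passing a list of row indices makes array_split cut at those rows.
    cut_points = list(range(n, len(values), n))
    return np.array_split(values, cut_points)

values = np.arange(30).reshape(10, 3)  # 10 rows, 3 columns

by_sections = np.array_split(values, 3)  # 3 sections of nearly equal size
by_rows = split_every(values, 3)         # chunks of 3 rows each

print([len(part) for part in by_sections])
print([len(part) for part in by_rows])
```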

Related

Function to reverse every sub-array group of size k in python

I wrote this code to reverse sub-array groups of integers, but it only reverses the first sub-array and I don't know why.
Here is the code:
def reverseInGroups(self, arr, N, K):
    rev=list()
    count=0
    reach=K
    limit=0
    while limit<N-1:
        rev[limit:reach]=reversed(arr[limit:reach])
        limit=limit+K
        reach=reach+K
        if reach==N-1 or reach<N-1:
            continue
        elif reach>N-1:
            reach=N-1
    return rev
This is the input, expected output and my output:
For Input:
5 3
1 2 3 4 5
Your Output:
1 2 3 4 5
Expected Output:
3 2 1 5 4

I tried your code online and it runs, but there is one logic error that keeps it from producing the desired output: reach has to be clamped to N, not N-1, because the end of a slice is exclusive.
while limit<N-1:
    rev[limit:reach]=reversed(arr[limit:reach])
    limit=limit+K #3
    reach=reach+K #6
    if reach==N-1 or reach<N-1:
        continue
    elif reach>N-1:
        reach=N #5

You don't have to create a new list rev; you can reverse the items of arr in place. For example:
def reverseInGroups(arr, N, K):
    limit = 0
    while limit < N:
        arr[limit : limit + K] = reversed(arr[limit : limit + K])
        limit += K
    return arr
l = [1, 2, 3, 4, 5]
print(reverseInGroups(l, 5, 3))
Prints:
[3, 2, 1, 5, 4]
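If the caller's list should stay untouched, the same idea can be written without mutation. A minimal sketch; the function name reversed_in_groups is hypothetical, and N is dropped because len(arr) already provides it:

```python
def reversed_in_groups(arr, k):
    """Return a new list with every consecutive group of k items reversed."""
    if k <= 0:
        raise ValueError("k must be positive")
    result = []
    for start in range(0, len(arr), k):
        # The final slice may be shorter than k; slicing handles that.
        result.extend(arr[start:start + k][::-1])
    return result

original = [1, 2, 3, 4, 5]
print(reversed_in_groups(original, 3))
print(original)  # unchanged
```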

I suggest this simpler solution:
arr = [1, 2, 3, 4, 5]
n = 5
k = 3
new_arr = list()
# range(0, n, k) visits the start of every group, including a trailing group of one element.
for index in range(0, n, k):
    new_arr[index:index+k] = arr[index:index+k][::-1]
print(new_arr)
And the output is:
[3, 2, 1, 5, 4]
After putting this code in your function, it looks like this:
def reverseInGroups(self, arr, n, k):
    new_arr = list()
    for index in range(0, n, k):
        new_arr[index:index+k] = arr[index:index+k][::-1]
    return new_arr

We can loop in increments of K and reverse each slice of that size in place:
def reverseInGroups(self, arr, N, K):
    # code here
    for i in range(0, N, K):
        arr[i:i + K] = arr[i:i + K][::-1]

How to generate a matrix with consecutive/continues integers like [ [1, 2, 3], [4, 5, 6], [7, 8, 9]... ]?

I have n rows and m columns. I need a matrix in which each of the n rows holds m consecutive numbers (like 1, 2, 3, 4), continuing in increasing order from one row to the next.
Example: 3 x 4 matrix
[
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
]
The intuition is simple: the first element of each row should be the successor of the last element of the previous row. That is the only tricky part of this problem.
So we generate each new row from arr[i-1][-1] + 1 up to arr[i-1][-1] + m. For the first row we start from 1, since the matrix is still empty.
Code
mat = []
n,m = map(int,input().split())
for row in range(n):
    # The first row starts at 1; every other row starts
    # one past the last element of the previous row.
    if row == 0:
        k = 1
    else:
        k = mat[row-1][-1] + 1
    x = []
    # Fill the new row with m consecutive values starting at k.
    for j in range(k,k+m):
        x.append(j)
    mat.append(x)
print(mat)

First generate a continuous sequence of numbers and then adjust its layout, in either of the following ways (n and m are the number of rows and columns respectively).
Using NumPy: arange generates the sequence and reshape adjusts the layout:
import numpy as np
n, m = map(int, input().split())
res = np.arange(1, n*m+1).reshape(n, m)
print(res)
Using a list comprehension:
items = list(range(1, m*n+1))
res = [items[i:i+m] for i in range(0, len(items), m)]
print(res)

Here's a one-liner to achieve that:
row, col = 3, 4
mat = [[col*i + j for j in range(1, col+1)] for i in range(row)]
print(mat)
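The same result can also be built by pulling values from a single counter, which avoids the index arithmetic. A sketch under the assumption that a plain list of lists is wanted; consecutive_matrix and its start parameter are hypothetical names:

```python
from itertools import count, islice

def consecutive_matrix(n, m, start=1):
    """Build an n x m list of lists filled row by row with consecutive integers."""
    counter = count(start)  # endless start, start + 1, start + 2, ...
    # Each row takes the next m values from the shared counter.
    return [list(islice(counter, m)) for _ in range(n)]

print(consecutive_matrix(3, 4))
```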

Remove smallest value from array after append to New array

I am running a selection sort function in Python that works with numpy arrays instead of lists (so I don't think I can use .pop for this).
The function is:
def selectionSort(arr):
    newArr = []
    for i in range(len(arr)):
        smallest = findSmallest(arr)
        newArr.append((smallest))
        arr = arr[(arr > smallest)]
    return newArr
I want the line "arr = arr[(arr > smallest)]", which obviously doesn't work, to remove the smallest value (i.e. the value just appended to newArr) from the passed array, the same way .pop would with a list.
I've tried things along these lines:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
index = [2, 3, 6]
new_a = np.delete(a, index)
But couldn't get it to work. At the end of the day, I need to get something in the format of:
arr = randint(0,10,20)
to return an array sorted in ascending order. All I can manage is returning the smallest values repeated.
Thanks for any help

Try
arr = arr[np.where(arr > smallest)]

You may try:
arr = arr[arr != np.min(arr)]
This way you take all the elements of arr except the smallest one (including any duplicates of it) and reassign them to arr.

Your algorithm is almost correct. Indeed, it works if there are no duplicate values in arr:
import numpy as np
def selectionSort(arr):
    newArr = []
    for i in range(len(arr)):
        smallest = findSmallest(arr)
        newArr.append((smallest))
        arr = arr[(arr > smallest)]
    return newArr
findSmallest = np.min
# no duplicate values
auniq = np.random.choice(np.arange(20), (10,), replace=False)
print(auniq)
print(selectionSort(auniq))
Sample run:
[ 0 1 7 4 10 14 13 16 9 12]
[0, 1, 4, 7, 9, 10, 12, 13, 14, 16]
If there are duplicates it will crash, because removing a minimum also removes all of its duplicates; the array then runs out of elements before the loop has finished.
# duplicate values
adupl = np.random.randint(0, 9, (10,))
print(adupl)
# next line would crash
#print(selectionSort(adupl))
One fix is to only remove one copy of duplicates. This can for example be done using argmin which returns the index of the/one minimum, not its value.
def selectionSort2(arr):
    arr = np.array(arr)
    sorted = np.empty_like(arr)
    for i in range(len(sorted)):
        j = arr.argmin()
        sorted[i] = arr[j]
        arr = np.delete(arr, j)
    return sorted
print(selectionSort2(adupl))
This works but is terribly inefficient because np.delete is more or less O(n). It is cheaper to swap the minimum element with a boundary element and then cut that off:
def selectionSort3(arr):
    arr = np.array(arr)
    sorted = np.empty_like(arr)
    for i in range(len(sorted)):
        j = arr[i:].argmin()
        sorted[i] = arr[i + j]
        arr[i], arr[i + j] = arr[i + j], arr[i]
    return sorted
print(selectionSort3(adupl))
Looking at selectionSort3 we can observe that the separate output sorted is not actually needed, because arr has been sorted inplace already:
def selectionSort4(arr):
    arr = np.array(arr)
    for i in range(len(arr)):
        j = arr[i:].argmin()
        arr[i], arr[i + j] = arr[i + j], arr[i]
    return arr
print(selectionSort4(adupl))
Sample output (adupl and output of selectionSort2-4):
[0 4 3 8 8 4 5 0 4 2]
[0 0 2 3 4 4 4 5 8 8]
[0 0 2 3 4 4 4 5 8 8]
[0 0 2 3 4 4 4 5 8 8]
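The asker's boolean-mask version can also be kept if the loop runs until the array is empty and every copy of the minimum is recorded before the mask removes them. A sketch of that alternative (it is not one of the variants above); selection_sort_mask is a hypothetical name:

```python
import numpy as np

def selection_sort_mask(arr):
    """Selection sort that removes all copies of the minimum with a boolean mask."""
    arr = np.asarray(arr)
    result = []
    while arr.size > 0:
        smallest = arr.min()
        # Record every copy of the minimum before the mask drops them all.
        copies = int((arr == smallest).sum())
        result.extend([smallest] * copies)
        arr = arr[arr > smallest]
    return np.array(result, dtype=arr.dtype)

adupl = np.array([0, 4, 3, 8, 8, 4, 5, 0, 4, 2])
print(selection_sort_mask(adupl))
```

Each pass still scans the whole remaining array, so this stays quadratic in the worst case; it only repairs the duplicate handling.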

How can I do an operation only for one value on the fly in a numpy array? (or the better way)

I have an array e.g. a = np.arange(0, 5) and it looks like this array([0, 1, 2, 3, 4]).
What I want is, for example, to add a value like 5 only at index 2 and get a new numpy array object.
Like this: b = add_value_on_idx(a, 2, 5) # This should return a new object
I have tried these approaches to achieve this:
#! /usr/bin/python3.5
import numpy as np
from time import time
# Get a numpy array first
a = np.random.randint(0, 10, (5, ))
# Define a new tuple of zeros with only one number at the
# specific index
def do_add_at_idx_1(arr, idx, val):
    return arr+(((0, )*(idx))+(val, )+(0, )*(arr.shape[0]-idx-1))
# Or do it like this
def do_add_at_idx_2(arr, idx, val):
    one_val_vec = np.zeros((arr.shape[0], )).astype(arr.dtype)
    one_val_vec[idx] = val
    return arr+one_val_vec
# Or add it directly to the given array
def do_add_at_idx_3(arr, idx, val):
    arr = arr.copy()
    arr[idx] += val
    return arr
# Or use directly np.add.at function (but need a copy first)
def do_add_at_idx_4(arr, idx, val):
    arr = arr.copy()
    np.add.at(arr, idx, val)
    return arr
a_old = a.copy()
print("a: {}".format(a))
print("do_add_at_idx_1(a, 2, 5): {}".format(do_add_at_idx_1(a, 2, 5)))
print("do_add_at_idx_2(a, 2, 5): {}".format(do_add_at_idx_2(a, 2, 5)))
print("do_add_at_idx_3(a, 2, 5): {}".format(do_add_at_idx_3(a, 2, 5)))
print("do_add_at_idx_4(a, 2, 5): {}".format(do_add_at_idx_4(a, 2, 5)))
print("Was 'a' modified? {}".format("No." if np.all(a==a_old) else "YES!!"))
Which will output something like this:
a: [6 6 9 2 8]
do_add_at_idx_1(a, 2, 5): [ 6 6 14 2 8]
do_add_at_idx_2(a, 2, 5): [ 6 6 14 2 8]
do_add_at_idx_3(a, 2, 5): [ 6 6 14 2 8]
do_add_at_idx_4(a, 2, 5): [ 6 6 14 2 8]
Was 'a' modified? No.
I have also tested the time differences for these functions too:
n = 100000
print("n: {}".format(n))
idxs = np.random.randint(0, a.shape[0], (n, ))
vals = np.random.randint(0, 10, (n, ))
for i in range(1, 5):
    func_name = "do_add_at_idx_{}".format(i)
    func = globals()[func_name]
    start = time()
    for idx, val in zip(idxs, vals):
        func(a, idx, val)
    delta = time()-start
    print("Taken time for func '{}': {:2.4f}s".format(func_name, delta))
With this as an output:
Taken time for func 'do_add_at_idx_1': 2.1899s
Taken time for func 'do_add_at_idx_2': 1.4600s
Taken time for func 'do_add_at_idx_3': 0.5757s
Taken time for func 'do_add_at_idx_4': 2.8043s
Is there a better way to do this?

Adding to a specific variable in a list

I'm trying to add two lists element by element. If the sum at a position reaches 10 or more, the excess needs to carry over to the previous element in the list. For example:
1 / 2 / 3 (List 1)
7 / 8 / 9 (List 2)
Should equal
9 / 1 / 2 not 8/10/12
So far, I have
list1 = [1, 2, 3]
list2 = [7, 8, 9]
SumOfLists = [x+y for x,y in zip(list1, list2)]
That adds the lists together, but I'm not sure how to make the number carry over.

You can try this code:
list1 = [1, 2, 3]
list2 = [7, 8, 9]
def add_list(a,b):
    carry = 0
    res_list = []
    for i,j in zip(a[::-1],b[::-1]):  # Iterate through the lists in reverse
        val = (i+j+carry)%10          # Digit to keep at this position
        carry = (i+j+carry)//10       # Carry for the next position
        res_list.append(val)          # Append to the result list
    return res_list[::-1]             # Reverse back into the original order
print(add_list(list1,list2))
This will print
[9, 1, 2]
Algorithm
Loop through the values in reverse, adding each corresponding pair plus the carry. If a sum is 10 or more, keep its last digit and move the rest into carry. Finally, return the reverse of the list. Note that a carry left over after the first element is discarded.
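The add_list function above drops a carry that remains after the first element and, because of zip, truncates to the shorter list. A minimal sketch that handles both cases, assuming each element is a single decimal digit; add_digit_lists is a hypothetical name:

```python
from itertools import zip_longest

def add_digit_lists(a, b):
    """Add two lists of decimal digits, most significant digit first."""
    result = []
    carry = 0
    # Walk from the least significant digit; pad the shorter list with zeros.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=0):
        carry, digit = divmod(x + y + carry, 10)
        result.append(digit)
    if carry:
        result.append(carry)  # keep the overflow as a new leading digit
    return result[::-1]

print(add_digit_lists([1, 2, 3], [7, 8, 9]))
print(add_digit_lists([9, 9], [1]))
```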

list1 = [1, 2, 3]
list2 = [7, 8, 9]
cur = 0 # num to carry over
result = []
for x,y in zip(reversed(list2),reversed(list1)):
    if x + y + cur >= 10:  # if the sum reaches 10, remember to add 1 on the next loop
        t = x+y + cur
        d = str(t)[1]  # get the rightmost digit
        result.append(int(d))
        cur = 1
    else:  # nothing to carry over, but still add cur, it may be 1
        result.append(x+y+cur)
        cur = 0
print(list(reversed(result)) )
[9, 1, 2]

Just subtract 10 if an element is 10 or more and add 1 to its previous element. Repeat this for every element of the sum list, from right to left:
if SumOfLists[2] >= 10:
    SumOfLists[2] -= 10
    SumOfLists[1] += 1
Finally, check the first element; if it overflows, a new leading 1 is needed:
if SumOfLists[0] >= 10:
    SumOfLists[0] -= 10
    SumOfLists.insert(0, 1)
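The checks above are written out for a three-element list. A sketch of the same carry step as a loop over a list of any length, assuming non-negative integer sums; normalize_carries is a hypothetical helper name:

```python
def normalize_carries(sums):
    """Turn per-position sums (e.g. [8, 10, 12]) into single digits with carries."""
    digits = []
    carry = 0
    for value in reversed(sums):  # right to left
        carry, digit = divmod(value + carry, 10)
        digits.append(digit)
    while carry:  # overflow past the first element becomes new leading digits
        carry, digit = divmod(carry, 10)
        digits.append(digit)
    return digits[::-1]

list1 = [1, 2, 3]
list2 = [7, 8, 9]
SumOfLists = [x + y for x, y in zip(list1, list2)]
print(normalize_carries(SumOfLists))
```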
