Getting the sum of rows until a certain point


I would like some code that increments the count by one from the row above until a new 'SCU_KEY' value comes up. For example, here is the input and the output I would like:
df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
Expected output:
df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 2, 3, 1, 2, 3, 1, 1, 2, 3, 4]})

You can try this:
import pandas as pd
df = pd.DataFrame({
    'SCU_KEY': [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8],
    'count': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})
s = df['SCU_KEY']
df['count'] = s.groupby(s).cumcount() + 1
print(df)
It gives:
    SCU_KEY  count
0         3      1
1         3      2
2         3      3
3         5      1
4         5      2
5         5      3
6         7      1
7         8      1
8         8      2
9         8      3
10        8      4
This assumes that values of the SCU_KEY column cannot reappear once they change, or that they can reappear but then you want to continue counting them where you left off.
If, instead, each contiguous sequence of repeating values should be counted starting from 1, then you can use this instead:
s = df['SCU_KEY']
df['count'] = s.groupby((s.shift() != s).cumsum()).cumcount() + 1
For the above dataframe the result will be the same as before, but you can add, say, 3 at the end of the SCU_KEY column to see the difference.
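To make the difference between the two variants concrete, here is a small sketch that appends a 3 to the keys, as suggested above. The names per_key, run_id and per_run are made up for this illustration and do not appear in the original answer.

```python
import pandas as pd

# Same keys as above, with an extra 3 appended so that a key reappears
s = pd.Series([3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8, 3], name='SCU_KEY')

# Variant 1: count per key value; the trailing 3 continues where the first run left off
per_key = s.groupby(s).cumcount() + 1

# Variant 2: count per contiguous run; a new run id starts whenever the value changes
run_id = (s.shift() != s).cumsum()
per_run = s.groupby(run_id).cumcount() + 1

print(per_key.tolist())  # [1, 2, 3, 1, 2, 3, 1, 1, 2, 3, 4, 4]
print(per_run.tolist())  # [1, 2, 3, 1, 2, 3, 1, 1, 2, 3, 4, 1]
```

Only the last element differs: the first variant gives 4 for the trailing 3, the second gives 1.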

This will do the job:
import pandas as pd
df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
for item in set(df['SCU_KEY']):
    inc = 0  # offset within the rows that share the current key
    for i in range(len(df.index)):
        if df['SCU_KEY'][i] == item:
            # .loc avoids chained assignment (df['count'][i] += ...), which may not update df
            df.loc[i, 'count'] += inc
            inc += 1
P.S. As others have mentioned, it's good practice to show your work before asking others for a solution. It shows your effort, which everyone appreciates, and encourages others to help you.

Related

Equal Less and Greater List python

Hi guys, I'm trying to figure out how to compare each number with the previous one, all the way to the last element.
This is the list:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7]
For each increasing sequence I need the highest number (e.g. in the first one it's 10).
After a sequence ends, the counting starts again from the beginning (1, 2, 3, 4, etc.) until a condition is reached.
The problem is that I get the correct result for every sequence except the very last one, where the max should be 7 (as you can see: 1, 2, 3, 4, 5, 6, 7),
but the algorithm skips it. I tried the zip function and a plain index loop, with the same issue.
Example code that yields the same result:
def printElements(arr, n):
    # Traverse array from index 1 to n-2
    # and check for the given condition
    for i in range(1, n - 1, 1):
        if (arr[i] > arr[i - 1] and
                arr[i] > arr[i + 1]):
            print(arr[i], end = " ")

# Driver Code
if __name__ == '__main__':
    arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7]
    n = len(arr)
    printElements(arr, n)

# Second attempt, using zip
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7]
for prev, current in zip(arr, arr[1:]):
    print(prev,current)
    if prev > current:
        x = prev
        print(prev,'prev greater')
        print(current,'current')
Results of the last algorithm:
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 1
10 prev greater
1 current
1 2
2 3
3 4
4 5
5 6
6 1
6 prev greater
1 current
1 2
2 3
3 4
4 5
5 6
6 7
7 1
7 prev greater
1 current
1 2
2 3
3 1
3 prev greater
1 current
1 2
2 3
3 4
4 5
5 1
5 prev greater
1 current
1 2
2 3
3 4
4 1
4 prev greater
1 current
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 1
8 prev greater
1 current
1 2
2 3
3 4
4 5
5 6
6 7

arr.append(float('-inf'))
def printElements(arr, n):
    # Traverse array from index 1 to n-2
    # and check for the given condition
    for i in range(1, n - 1, 1):
        if (arr[i] > arr[i - 1] and
                arr[i] > arr[i + 1]):
            print(arr[i], end = " ")

# Driver Code
if __name__ == '__main__':
    arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7]
    arr.append(float('-inf'))
    n = len(arr)
    printElements(arr, n)
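The loops in the question miss the final 7 because a maximum is only reported when a smaller value follows it, and the last run has no successor. Appending a sentinel supplies that successor. The same idea can be sketched without modifying the list by using itertools.zip_longest, which pairs the last element with None. The function name run_maxima is made up for this illustration, and the sketch assumes the list itself contains no None values.

```python
from itertools import zip_longest

def run_maxima(values):
    """Return the last (largest) element of each increasing run."""
    # zip_longest pads the shorter iterable with None, so the final
    # element is paired with None instead of being dropped like in zip.
    return [
        cur
        for cur, nxt in zip_longest(values, values[1:])
        if nxt is None or cur > nxt
    ]

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7,
       1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8,
       1, 2, 3, 4, 5, 6, 7]
maxima = run_maxima(arr)
print(maxima)  # [10, 6, 7, 3, 5, 4, 8, 7]
```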

You can use list comprehension to get the maximum value for each sequence.
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7]
maxvals = [lst[x] for x in range(len(lst)) if x == len(lst)-1 or lst[x] > lst[x+1]]
print(maxvals)
Output
[10, 6, 7, 3, 5, 4, 8, 7]
I don't see any way to use zip to find the solution.

R sequence function in Python

pandas version: 1.2
I am trying to take a pandas DataFrame column in Python and reproduce the logic of the following R code:
ss=sequence(df$los)
Which produces for the first two records
[1] 1 2 3 4 5 1 2 3 4 5
Example dataframe:
df = pd.DataFrame([('test', 5), ('t2', 5), ('t3', 2), ('t4', 6)],
columns=['first', 'los'])
df
   first  los
0   test    5
1     t2    5
2     t3    2
3     t4    6
So the first row is sequenced 1-5, the second row 1-5, the third row 1-2, and so on. In R this becomes one sequenced list. I would like the same in Python.
What I have been able to do is:
ss = df['los']
ss.apply(lambda x: np.array(range(1, x + 1)))
18 [1, 2, 3, 4, 5]
90 [1, 2, 3, 4, 5]
105 [1,2]
106 [1, 2, 3, 4, 5, 6]
Which is close but then I need to combine it into a single pd.Series so that it should be:
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6]

Use explode():
df.los.apply(lambda x: np.arange(1, x+1)).explode().tolist()
Output:
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6]
Note - you can skip the ss assignment step, and use np.arange to streamline a bit.

You can just use concatenate:
np.concatenate([np.arange(x)+1 for x in df['los']])
Output:
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6])
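For longer columns, a loop-free variant may be worth trying. The sketch below builds one global counter and subtracts each block's starting offset. The helper name sequence is borrowed from R for illustration and is not a NumPy or pandas function; the sketch assumes non-negative integer lengths.

```python
import numpy as np
import pandas as pd

def sequence(lengths):
    """Concatenate 1..n for every n in lengths, similar to R's sequence()."""
    lengths = np.asarray(lengths)
    # Position at which each block starts in the flattened result
    starts = np.cumsum(lengths) - lengths
    # Global 0-based counter minus the block start, shifted to 1-based
    return np.arange(lengths.sum()) - np.repeat(starts, lengths) + 1

df = pd.DataFrame([('test', 5), ('t2', 5), ('t3', 2), ('t4', 6)],
                  columns=['first', 'los'])
ss = pd.Series(sequence(df['los']))
print(ss.tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6]
```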

Combining data contained in several lists

I am working on a personal project in python 3.6. I used pandas to import the data from an excel file in a dataframe and then I extracted data into several lists.
Now, I will give an example to illustrate exactly what I am trying to achieve.
Let's say I have 3 input lists a, b and c (I added the index and some extra whitespace to the lists so they are easier to follow):
   0  1      2  3           4      5      6
a=[1, 5,     6, [10,12,13], 1,     [5,3], 7]
b=[3, [1,2], 3, [5,6],      [1,3], [5,6], 9]
c=[1, 0,     4, [1,2],      2,     8,     9]
I am trying to combine the data in order to get all the combinations when in one of the lists there is a list containing multiple elements. So the output needs to be like this:
   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22
a=[1,  5,  5,  6,  10, 10, 10, 10, 12, 12, 12, 12, 13, 13, 13, 13, 1,  1,  5,  5,  3,  3,  7]
b=[3,  1,  2,  3,  5,  5,  6,  6,  5,  5,  6,  6,  5,  5,  6,  6,  1,  3,  5,  6,  5,  6,  9]
c=[1,  0,  0,  4,  1,  2,  1,  2,  1,  2,  1,  2,  1,  2,  1,  2,  2,  2,  8,  8,  8,  8,  9]
To make this more clear:
From the original lists if we look at index 1 elements:
a[1]=5, b[1]=[1,2], c[1]=0. These got transformed to the following values on the 1 and 2 index positions: a[1:3]=[ 5, 5 ]; b[1:3]=[1, 2]; c[1:3]=[ 0, 0]
This needs to be applied also to index 3, 4, and 5 in the original input lists in order to obtain something similar to the example output above.
I want to be able to generalize this to more lists (a,b,c.....n). I have been able to do this for two lists, but in a totally not elegant, definitely not pythonic way. Also I think the code I wrote can't be generalized to more lists.
I am looking for some help, at least some pointers to some reading material that can help me achieve what I presented above.
Thank you!

You could do something like this.
Looks at each column, works out the combinations, then output the list:
import pandas as pd
import numpy
a=[1, 5, 6, [10,12,13], 1, [5,3] ,7]
b=[3, [1,2], 3, [5,6], [1,3], [5,6], 9]
c=[1, 0 , 4, [1,2], 2 , 8 , 9]
df = pd.DataFrame([a,b,c])
final_df = pd.DataFrame()
i=0
for col in df.columns:
    temp_df = pd.DataFrame(df[col])
    get_combo = []
    for idx, row in temp_df.iterrows():
        get_combo.append([row[i]])
    combo_list = [list(x) for x in numpy.array(numpy.meshgrid(*get_combo)).T.reshape(-1,len(get_combo))]
    temp_df_alpha = pd.DataFrame(combo_list).T
    i+=1
    if len(final_df) == 0:
        final_df = temp_df_alpha
    else:
        final_df = pd.concat([final_df, temp_df_alpha], axis=1, sort=False)
for idx, row in final_df.iterrows():
    print (row.tolist())
Output:
[1, 5, 5, 6, 10, 10, 12, 12, 13, 13, 10, 10, 12, 12, 13, 13, 1, 1, 5, 5, 3, 3, 7]
[3, 1, 2, 3, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 1, 3, 5, 6, 5, 6, 9]
[1, 0, 0, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 9]
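The combinations above come out in a different order than in the question's expected output. As an alternative sketch using only the standard library, itertools.product yields the combinations with the last list varying fastest, which matches the order asked for, and it works for any number of input lists. The function name expand_combinations is made up for this illustration.

```python
from itertools import product

def expand_combinations(*lists):
    """Expand list-valued entries into all combinations, position by position."""
    out = [[] for _ in lists]
    for column in zip(*lists):
        # Treat scalars as one-element lists so product handles both cases
        options = [v if isinstance(v, list) else [v] for v in column]
        for combo in product(*options):
            for target, value in zip(out, combo):
                target.append(value)
    return out

a = [1, 5, 6, [10, 12, 13], 1, [5, 3], 7]
b = [3, [1, 2], 3, [5, 6], [1, 3], [5, 6], 9]
c = [1, 0, 4, [1, 2], 2, 8, 9]

new_a, new_b, new_c = expand_combinations(a, b, c)
print(new_a)
print(new_b)
print(new_c)
```

Each original position contributes as many output entries as the product of its option counts, so position 3 here expands to 3 * 2 * 2 = 12 entries.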

Sort by frequency using the key argument not working as expected [duplicate]

A given array is to be sorted on the basis of the frequency of occurrence of its elements.
I tried using key=arr.count (arr is the name of the list I want to sort). It works for some inputs. I also tried using the collections.Counter() class object, it behaved similarly to how arr.count did.
>>> arr = [6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
>>> sorted(arr, key=arr.count)
[6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
>>> sorted(arr, key=counts.get)
[6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
Expected output is:
1 1 1 1 1 1 1 2 2 2 2 2 2 2 7 7 7 7 7 7 7 3 3 3 3 3 3 5 5 5 5 4 4 4 6 6 6
Not sure what I am doing wrong here.

Use a tuple key to sort first by frequency and then by value. Negate the value with - so that, among equal frequencies, smaller numbers come first, and pass reverse=True so that the highest count comes first:
sorted(arr, key=lambda x: (arr.count(x), -x), reverse=True)
Output:
[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 7, 7, 7, 7, 7, 7, 7, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 4, 4, 4, 6, 6, 6]

I think the problem is that some entries have the same frequency, e.g.:
arr.count(1) == arr.count(2) == arr.count(7)
To make sure that these entries remain grouped, you have to sort not only by counts, but also by value:
import collections
counts = collections.Counter(arr)
sorted(arr, key=lambda x: (counts[x], x), reverse=True)
Output:
[7, 7, 7, 7, 7, 7, 7, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 4, 4, 4]
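The attempts in the question return the list unchanged because the input already happens to be in non-decreasing order of frequency (counts 3, 4, 6, then 7), and Python's sort is stable. The sketch below combines the two answers: it counts once with Counter instead of calling arr.count for every element, and negates the count so that no reverse=True is needed and ties are broken by ascending value. The variable name by_frequency is made up for this illustration.

```python
from collections import Counter

arr = [6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2,
       7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]

counts = Counter(arr)  # one pass over the list

# Highest count first (negated count), then smallest value first
by_frequency = sorted(arr, key=lambda x: (-counts[x], x))
print(by_frequency)
```

For this input the result starts with the seven 1s, 2s and 7s, followed by the 3s, the 5s, the 4s and the 6s, which is the order requested in the question.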

Create a new series with same index

I have a series with the following
2 [2, 2, 1, 2, 0, 0, 5, 8, 7, 1, 2, 1, 0, 8, 4, ...
5 [3, 1, 5, 0]
8 [9, 0, 0, 0, 9, 0, 6, 1, 7, 0, 1, 4, 6, 1, 3, ...
9 [1, 1, 0, 8, 0, 0, 2, 9, 8, 6, 0, 3, 0]
11 [1, 0, 0, 2, 0, 0, 0, 0, 1, 1, 8, 7, 5, 7, 5, ...
I want to create a new series that keeps the index (2, 5, 8, 9, 11), with values equal to the length of the list in each row
The result would be
2 25
5 4
8 20
9 13
11 18

list(map(lambda x: (x, len(object[x])), indices))
It's somewhat pseudocode because you haven't specified your data type or variable names, but the general approach is this: you have an object of data indexed by some index x, so loop over all the x values and take the length of the data structure stored at each one.
Edit: since you stated it is a pandas Series of integer lists, try this:
import pandas as pd
S = pd.Series([[1,2,3], [2,3]], index=[2,4])
print(S)
# 2 [1, 2, 3]
# 4 [2, 3]
lengths = list(map(lambda x: len(S[x]), S.index))
S2 = pd.Series(lengths, index=S.index)
print(S2)
# 2 3
# 4 2
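A shorter variant is to map len over the Series directly. Series.map applies the function to each value and keeps the original index, so no separate index handling is needed. The variable name lengths_series is made up for this illustration.

```python
import pandas as pd

S = pd.Series([[1, 2, 3], [2, 3]], index=[2, 4])

# map applies len to every list and preserves the index of S
lengths_series = S.map(len)
print(lengths_series)
```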
