Pandas cumsum separated by comma - python

I have a dataframe with a column with data as:
my_column my_column_two
1,2,3 A
5,6,8 A
9,6,8 B
5,5,8 B
If I do:
data = df.astype(str).groupby('my_column_two').agg(','.join).cumsum()
data.iloc[[0]]['my_column'].apply(print)
data.iloc[[1]]['my_column'].apply(print)
I get:
1,2,3,5,6,8
1,2,3,5,6,89,6,8,5,5,8
How can I get 1,2,3,5,6,8,9,6,8,5,5,8, so that the cumulative sum adds a comma when appending to the previous row? (Notice that 89 should be 8,9.)

Is this what you were after? It accumulates the split values within each group:
df['new']=df.groupby('my_column_two')['my_column'].apply(lambda x: x.str.split(',').cumsum())
my_column my_column_two new
0 1,2,3 A [1, 2, 3]
1 5,6,8 A [1, 2, 3, 5, 6, 8]
2 9,6,8 B [9, 6, 8]
3 5,5,8 B [9, 6, 8, 5, 5, 8]
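If the goal is the single comma-separated string from the question rather than per-group lists, one option is to supply the separator explicitly while accumulating. This is a minimal sketch, assuming the sample data from the question; the names joined and cumulative are illustrative, and itertools.accumulate is swapped in for cumsum so that the comma can be inserted between pieces.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "my_column": ["1,2,3", "5,6,8", "9,6,8", "5,5,8"],
    "my_column_two": ["A", "A", "B", "B"],
})

# One comma-joined string per group, as in the question.
joined = df.groupby("my_column_two")["my_column"].agg(",".join)

# Running concatenation across groups, inserting "," between pieces.
cumulative = pd.Series(
    list(accumulate(joined, lambda left, right: left + "," + right)),
    index=joined.index,
)
print(cumulative.tolist())
```

The last element of cumulative is then the full string with 8,9 separated as requested.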

Related

Dataframe age column grouping in pandas [duplicate]

It seems like a simple question, but I need your help.
For example, I have df:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 3, 1, 8, 9, 6, 7, 4, 6]
How can I group 'x' into the ranges 1 to 5 and 6 to 10, and calculate the mean 'y' value for these two bins?
I expect to get a new df like:
x_grpd = [5, 10]
y_grpd = [3, 6.4]
The range of 'x' is given only as an example. Ideally, I want to be able to set any int value to get a different number of bins.
You can use cut and groupby.mean:
bins = [5, 10]
df2 = (df
       .groupby(pd.cut(df['x'], [0]+bins,
                       labels=bins,
                       right=True))
       ['y'].mean()
       .reset_index()
      )
Output:
x y
0 5 3.0
1 10 6.4
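To make the bin width configurable, as the question asks, the same cut and groupby idea can be wrapped in a small function. This is only a sketch: mean_y_by_x_bins and its step parameter are hypothetical names, and it assumes 'x' holds positive integers so that the bins can start at 0.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [2, 1, 3, 1, 8, 9, 6, 7, 4, 6],
})


def mean_y_by_x_bins(frame, step):
    """Average 'y' over bins of width `step` on 'x' (hypothetical helper)."""
    # Upper edges step, 2*step, ... up to the first edge that covers max(x).
    upper = list(range(step, int(frame["x"].max()) + step, step))
    # Right-closed bins (0, step], (step, 2*step], ... labelled by upper edge.
    binned = pd.cut(frame["x"], bins=[0] + upper, labels=upper, right=True)
    return frame.groupby(binned, observed=True)["y"].mean().reset_index()


result = mean_y_by_x_bins(df, 5)
print(result)
```

Calling mean_y_by_x_bins(df, 2) would instead give five bins of width 2.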

extract elements of tuple from a pandas series

I have a pandas Series whose elements are tuples of lists. Each tuple has exactly two elements, and the Series also contains some NaNs. I am trying to split each list in the tuple into its own column.
import pandas as pd
import numpy as np
df = pd.DataFrame({'val': [([1,2,3],[4,5,6]),
([7,8,9],[10,11,12]),
np.nan]
})
Expected Output:
If you know the tuples always have length 2, you can do:
df["x"] = df.val.str[0]
df["y"] = df.val.str[1]
print(df[["x", "y"]])
Prints:
x y
0 [1, 2, 3] [4, 5, 6]
1 [7, 8, 9] [10, 11, 12]
2 NaN NaN
You could also convert the column to a list and pass it to the DataFrame constructor (filling None with np.nan as well):
out = pd.DataFrame(df['val'].tolist(), columns=['x','y']).fillna(np.nan)
Output:
x y
0 [1, 2, 3] [4, 5, 6]
1 [7, 8, 9] [10, 11, 12]
2 NaN NaN
One way using pandas.Series.apply:
new_df = df["val"].apply(pd.Series)
print(new_df)
Output:
0 1
0 [1, 2, 3] [4, 5, 6]
1 [7, 8, 9] [10, 11, 12]
2 NaN NaN
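For comparison, the same split can be written with plain list comprehensions, which makes the NaN handling explicit. This is a sketch under the question's assumption that every non-missing value is a 2-tuple; the helper name element is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [([1, 2, 3], [4, 5, 6]),
                           ([7, 8, 9], [10, 11, 12]),
                           np.nan]})


def element(value, position):
    """Return value[position] for tuples, NaN for anything else (hypothetical helper)."""
    return value[position] if isinstance(value, tuple) else np.nan


# dtype=object keeps each list intact as a single cell.
df["x"] = pd.Series([element(v, 0) for v in df["val"]], index=df.index, dtype=object)
df["y"] = pd.Series([element(v, 1) for v in df["val"]], index=df.index, dtype=object)
print(df[["x", "y"]])
```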

Is there any method to append test data with predicted data?

I have an array of test data, e.g. array=[[5, 6 ,7, 1], [5, 6 ,7, 4], [5, 6 ,7, 3]], and an array of predicted data, e.g. array_pred=[10, 3, 4], both of equal length. I want to append each prediction to its row to get res_array = [[5, 6 ,7, 1, 10], [5, 6 ,7, 4, 3], [5, 6 ,7, 3, 4]]. I am not sure what this operation is called, but I need this result in Python. I then have to store it in a dataframe and generate an Excel file from it. Is this possible?
Use numpy.hstack (or numpy.column_stack) to join the arrays, convert to a Series, and then write to Excel:
a = np.hstack((array, np.array(array_pred)[:, None]))
# alternative, thanks to @Ch3steR
a = np.column_stack([array, array_pred])
print(a)
[[ 5  6  7  1 10]
 [ 5  6  7  4  3]
 [ 5  6  7  3  4]]
s = pd.Series(a.tolist())
print (s)
0 [5, 6, 7, 1, 10]
1 [5, 6, 7, 4, 3]
2 [5, 6, 7, 3, 4]
dtype: object
s.to_excel(file, index=False)
Or, if you need flattened values, convert to a DataFrame and a Series and use concat:
df = pd.concat([pd.DataFrame(array), pd.Series(array_pred)], axis=1, ignore_index=True)
print(df)
0 1 2 3 4
0 5 6 7 1 10
1 5 6 7 4 3
2 5 6 7 3 4
And then:
df.to_excel(file, index=False)
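Putting the pieces together, a self-contained version might look like the sketch below. The column names (feature_0, ..., prediction) and the output file name are assumptions for illustration only; the Excel export is left as a comment because it needs an Excel engine such as openpyxl to be installed.

```python
import numpy as np
import pandas as pd

array = [[5, 6, 7, 1], [5, 6, 7, 4], [5, 6, 7, 3]]
array_pred = [10, 3, 4]

# column_stack treats the 1-D predictions as one extra column.
combined = np.column_stack([array, array_pred])

# Illustrative column names; any labels would do.
columns = [f"feature_{i}" for i in range(len(array[0]))] + ["prediction"]
df = pd.DataFrame(combined, columns=columns)
print(df)

# Hypothetical output path; uncomment once an Excel engine is available.
# df.to_excel("results.xlsx", index=False)
```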

Maximum of an array constituting a pandas dataframe cell

I have a pandas dataframe in which a column is formed by arrays. So every cell is an array.
Say there is a column A in dataframe df, such that
A = [ [1, 2, 3],
[4, 5, 6],
[7, 8, 9],
... ]
I want to operate on each array and get, e.g., the maximum of each array, and store it in another column.
In the example, I would like to obtain another column
B = [ 3,
6,
9,
...]
I have tried these approaches so far, none of them giving what I want.
df['B'] = np.max(df['A']);#
df.applymap (lambda B: A.max())
df['B'] = df.applymap (lambda B: np.max(np.array(df['A'].tolist()),0))
How should I proceed? And is this the best way to have my dataframe organized?
You can just apply(max). It doesn't matter if the values are lists or np.array.
df = pd.DataFrame({'a': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
df['b'] = df['a'].apply(max)
print(df)
Outputs
a b
0 [1, 2, 3] 3
1 [4, 5, 6] 6
2 [7, 8, 9] 9
Here is one way without apply:
df['B']=np.max(df['A'].values.tolist(),axis=1)
A B
0 [1, 2, 3] 3
1 [4, 5, 6] 6
2 [7, 8, 9] 9
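The two answers can be compared side by side in one runnable snippet. This is a sketch using the sample values from the question; the column names B_apply and B_numpy are illustrative, and the NumPy variant assumes every array in the column has the same length, since it stacks them into a 2-D array first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

# Row-wise Python max; also works when the arrays differ in length.
df["B_apply"] = df["A"].apply(max)

# Vectorised alternative; requires equal-length arrays (3x3 here).
df["B_numpy"] = np.max(np.array(df["A"].tolist()), axis=1)

print(df)
```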

How to Replace Pandas Series with Dictionary Values

I want to map the words in a pandas Series to the values in my dictionary. I have already tried the replace and map methods, with no luck. I also followed the linked question "Replace values in pandas Series with dictionary", but it still does not work. My sample data looks like:
index column
0 ESL Literacy
1 Civics Government Team Sports
2 Health Wellness Team Sports
3 Literacy Mathematics
4 Mathematics
Dictionary:
{'civics': 6,
'esl': 5,
'government': 7,
'health': 8,
'literacy': 1,
'mathematics': 4,
'sports': 3,
'team': 2,
'wellness': 9}
Desired Output:
0 [5,1]
1 [6,7,2,3]
2 [8,9,2,3]
3 [1,4]
4 [4]
Any help would be appreciated. Thank you :)
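A fun solution (get_dummies orders its columns alphabetically, so the codes come out in alphabetical word order rather than the original order):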
s=df.column.str.get_dummies(' ')
s.dot(s.columns.str.lower().map(d).astype(str)+',').str[:-1].str.split(',')
Out[413]:
0 [5, 1]
1 [6, 7, 3, 2]
2 [8, 3, 2, 9]
3 [1, 4]
4 [4]
dtype: object
Or, in pandas 0.25.0 and later, we can use explode:
df.column.str.split().explode().str.lower().map(d).groupby(level=0).agg(list)
Out[420]:
0 [5, 1]
1 [6, 7, 2, 3]
2 [8, 9, 2, 3]
3 [1, 4]
4 [4]
Name: column, dtype: object
Using str.lower, str.split, and a comprehension.
u = df['column'].str.lower().str.split(r'\s+')
pd.Series([[d.get(word) for word in row] for row in u])
0 [5, 1]
1 [6, 7, 2, 3]
2 [8, 9, 2, 3]
3 [1, 4]
4 [4]
dtype: object
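To try the comprehension approach end to end, the sample data and dictionary from the question can be rebuilt as below. This is a sketch: the codes column name is made up, and d.get returns None for any word missing from the dictionary, which may or may not be the behaviour you want.

```python
import pandas as pd

df = pd.DataFrame({"column": [
    "ESL Literacy",
    "Civics Government Team Sports",
    "Health Wellness Team Sports",
    "Literacy Mathematics",
    "Mathematics",
]})

d = {"civics": 6, "esl": 5, "government": 7, "health": 8, "literacy": 1,
     "mathematics": 4, "sports": 3, "team": 2, "wellness": 9}

# Lower-case and split on whitespace, then look each word up in order.
words = df["column"].str.lower().str.split()
df["codes"] = pd.Series(
    [[d.get(word) for word in row] for row in words],
    index=df.index,
    dtype=object,
)
print(df["codes"])
```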
