Dataframe with column of ranges. Given number, select row where number occurs - python

I have a dataframe with a column whose cells each hold a range of numbers, plus further columns of data:
[1, 2, 3, ..., 10] | a | b
[11, 12, 13, 14, ...] | c | d
Given a number like 10 or 14, how do I select the row whose range contains that number? For example, for 10 I want the [1, 2, 3, ..., 10] | a | b row to be returned.
So far I've tried dfs['A'].ix[10 in dfs['A']['B']], where dfs is a dictionary of dataframes, 'A' is a dataframe, and 'B' is the column with the ranges.
How do I do this?

Use apply to loop through column B and check each element individually; this gives a boolean mask for subsetting:
df = pd.DataFrame({"B": [list(range(1,11)), list(range(11,21))], "col1":["a", "b"], "col2":["c", "d"]})
df[df["B"].apply(lambda x: 10 in x)]
# B col1 col2
# 0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] a c

df = pd.DataFrame({'ranges':[range(11), range(11,20)], 'dat1':['a','c'], 'dat2':['b','d']})
mask = df.ranges.apply(lambda x: 10 in x)
df.loc[mask]  # .ix has been removed from pandas; plain df[mask] also works
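If the ranges are always contiguous spans of integers, another option (not part of the original answers, just a sketch) is to store only the endpoints in an IntervalIndex and look the number up directly:

import pandas as pd

# Sketch: assumes each row covers a closed, contiguous interval such as 1-10 and 11-20.
df = pd.DataFrame({'dat1': ['a', 'c'], 'dat2': ['b', 'd']},
                  index=pd.IntervalIndex.from_tuples([(1, 10), (11, 20)], closed='both'))
print(df.loc[10])  # row whose interval contains 10 -> dat1 a, dat2 b
print(df.loc[14])  # row whose interval contains 14 -> dat1 c, dat2 d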

Related

Dataframe age column grouping in pandas [duplicate]

It seems like a simple question, but I need your help.
For example, I have df:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 3, 1, 8, 9, 6, 7, 4, 6]
How can I group 'x' into the ranges 1 to 5 and 6 to 10, and calculate the mean 'y' value for these two bins?
I expect to get new df like:
x_grpd = [5, 10]
y_grpd = [3, 6.4]
The range of 'x' is just an example. Ideally I want to be able to set any int value to get a different number of bins.
You can use cut and groupby.mean:
bins = [5, 10]
df2 = (df
       .groupby(pd.cut(df['x'], [0]+bins,
                       labels=bins,
                       right=True))
       ['y'].mean()
       .reset_index()
)
Output:
x y
0 5 3.0
1 10 6.4
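The question also asks for bins of an arbitrary size. A minimal sketch of that (the step variable and the way the edges are built are assumptions, not part of the original answer): derive the bin edges from a chosen width and feed them to the same cut/groupby pattern.

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'y': [2, 1, 3, 1, 8, 9, 6, 7, 4, 6]})

step = 5  # assumed bin width; any positive int works
bins = list(range(step, df['x'].max() + step, step))  # [5, 10] for step=5

df2 = (df.groupby(pd.cut(df['x'], [0] + bins, labels=bins, right=True))['y']
         .mean()
         .reset_index())
print(df2)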

Creating a list in a Dataframe column which is a range of values from two other dataframe columns

I need to create a list in a dataframe column which is a range of numbers. The range limits should be the values in two other dataframe columns.
df = pd.DataFrame({'A': [3, 7, 2, 8], 'B': [1, 3, 9, 3]},index=[1,2,3,4])
Now I need a dataframe column which will be a series of lists like below:
[1,2,3]
[3,4,5,6,7]
[2,3,4,5,6,7,8,9]
[3,4,5,6,7,8]
I'm able to create a list in a dataframe column this way.
df['C'] = (df[['A','B']]).to_numpy().tolist()
This gives a column as below
[3,1]
[7,3]
[2,9]
[8,3]
But I'm not able to figure out how to create a list that is range of these values in a dataframe column.
I have also defined a function which will generate a list covering the range between any two given numbers:
def createlist(r1, r2):
    if r1 == r2:
        return r1
    elif r1 < r2:
        res = []
        while r1 < r2 + 1:
            res.append(r1)
            r1 += 1
        return res
    else:
        res = []
        while r1 + 1 > r2:
            res.append(r2)
            r2 += 1
        return res
But I'm struggling to apply this function to generate a dataframe column while taking inputs from the other columns. Can you please help out? Thanks in advance.
You can try DataFrame.apply on rows
df['C'] = df.apply(lambda row: list(range(row.min(), row.max()+1)), axis=1)
print(df)
A B C
1 3 1 [1, 2, 3]
2 7 3 [3, 4, 5, 6, 7]
3 2 9 [2, 3, 4, 5, 6, 7, 8, 9]
4 8 3 [3, 4, 5, 6, 7, 8]
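If you specifically want to reuse the createlist helper from the question, it can be wired up through the same row-wise apply. A small sketch, assuming df and createlist are defined as above:

# Pass each row's A and B values into the question's own helper.
df['C'] = df.apply(lambda row: createlist(row['A'], row['B']), axis=1)
print(df)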

Pandas cumsum separated by comma

I have a dataframe with a column with data as:
my_column my_column_two
1,2,3 A
5,6,8 A
9,6,8 B
5,5,8 B
if I do:
data = df.astype(str).groupby('my_column_two').agg(','.join).cumsum()
data.iloc[[0]]['my_column'].apply(print)
data.iloc[[1]]['my_column'].apply(print)
I have:
1,2,3,5,6,8
1,2,3,5,6,89,6,8,5,5,8
How can I get 1,2,3,5,6,8,9,6,8,5,5,8, so that the cumulative concatenation adds a comma when appending the previous row? (Notice 89 should be 8,9.)
Were you after this?
df['new']=df.groupby('my_column_two')['my_column'].apply(lambda x: x.str.split(',').cumsum())
my_column my_column_two new
0 1,2,3 A [1, 2, 3]
1 5,6,8 A [1, 2, 3, 5, 6, 8]
2 9,6,8 B [9, 6, 8]
3 5,5,8 B [9, 6, 8, 5, 5, 8]
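If the final result should again be a single comma-separated string per row rather than a list, the accumulated lists can be joined back together. A small follow-up sketch (the new and joined column names are assumptions; group_keys=False simply keeps the original index so the result can be assigned back):

import pandas as pd

df = pd.DataFrame({'my_column': ['1,2,3', '5,6,8', '9,6,8', '5,5,8'],
                   'my_column_two': ['A', 'A', 'B', 'B']})

# Same accumulation as above, then rejoin each accumulated list with commas.
df['new'] = df.groupby('my_column_two', group_keys=False)['my_column'].apply(lambda x: x.str.split(',').cumsum())
df['joined'] = df['new'].str.join(',')
print(df.loc[3, 'joined'])  # 9,6,8,5,5,8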

Using Pandas groupby how can you aggregate a column of lists using addition?

I have a dataframe with a column that contains a list of values. Each row in the dataframe has a list of the same length. I'd like to use Dataframe.groupby to group the data in the dataframe and sum together the lists in the following fashion:
In:
import pandas as pd
#Sample data
a = pd.DataFrame([['a', 'test', list([0,1,2,3,4])],['b', 'test', list([5,6,7,8,9])]], columns=['id', 'grp', 'values'])
print(a)
#Some function to group the dataframe
#b = a.groupby('grp').someAggregationFunction()
#Example of desired output
b = pd.DataFrame([['test', list([5,7,9,11,13])]], columns=['grp', 'values'])
print(b)
Out:
id grp values
0 a test [0, 1, 2, 3, 4]
1 b test [5, 6, 7, 8, 9]
grp values
0 test [5, 7, 9, 11, 13]
You may not like this answer, but it's better not to use lists in dataframes. You should seek, wherever possible, to use numeric series for numeric data:
res = a.join(pd.DataFrame(a.pop('values').tolist()))\
       .groupby('grp').sum().reset_index()
print(res)
grp 0 1 2 3 4
0 test 5 7 9 11 13
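If a list column is still wanted at the end, the per-position columns of res can be folded back into one column (a small follow-up sketch on top of the answer above):

# Collect the per-position columns (0, 1, 2, ...) back into a single list column.
value_cols = [c for c in res.columns if c != 'grp']
res['values'] = res[value_cols].to_numpy().tolist()
print(res[['grp', 'values']])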
Push it into one line
a.groupby('grp')['values'].apply(lambda x : pd.DataFrame(x.values.tolist()).sum().tolist())
Out[286]:
grp
test [5, 7, 9, 11, 13]
Name: values, dtype: object
Also, I recommend not using apply here:
b=pd.DataFrame(a['values'].values.tolist()).groupby(a['grp']).sum()
pd.DataFrame({'grp':b.index,'values':b.values.tolist()})
Out[293]:
grp values
0 test [5, 7, 9, 11, 13]
One solution is to transform your lists into np.arrays and use a simple sum:
import numpy as np

a['v'] = a.v.transform(np.array)
a.groupby('grp').v.apply(lambda x: x.sum())
grp v
0 test [5, 7, 9, 11, 13]
Notice that I renamed the column values to v so it is not mistaken for the .values accessor of pd.DataFrame.
Using numpy.stack:
pd.DataFrame(
    [(i, np.stack(g).sum(0)) for i, g in a.groupby('grp')['values']],
    columns=['grp', 'values']
)
grp values
0 test [5, 7, 9, 11, 13]
Also using apply, but apply will be slow:
a.groupby('grp')['values'].apply(lambda x: np.stack(x).sum(0)).to_frame('values')
values
grp
test [5, 7, 9, 11, 13]

Convert pandas dataframe to a column-based order list

I want to convert a pandas dataframe into a list.
For example, I have a dataframe like below, and I want to make a list from all of the columns.
Dataframe (df)
A B C
0 4 8
1 5 9
2 6 10
3 7 11
Expected result
[[0,1,2,3], [4,5,6,7], [8,9,10,11]]
If I use df.values.tolist(), it returns a row-based list like below.
[[0,4,8], [1,5,9], [2,6,10], [3,7,11]]
It is possible to transpose the dataframe, but I want to know whether there are better solutions.
I think the simplest is to transpose.
Use T or numpy.ndarray.transpose:
df1 = df.T.values.tolist()
print (df1)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Or:
df1 = df.values.transpose().tolist()
print (df1)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Another answer, with a list comprehension (thank you John Galt):
L = [df[x].tolist() for x in df.columns]
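Since the question asks whether there are alternatives to transposing, one more option (a sketch, not from the original answers) goes through a column-oriented dict:

import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [4, 5, 6, 7], 'C': [8, 9, 10, 11]})

# to_dict('list') is column-oriented, so its values are already per-column lists.
L = list(df.to_dict('list').values())
print(L)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]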
