I have the following dataframe:
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
id time col_id col_a col_b col_c
0 1 1 ffp 1 -1 10
1 1 2 ffp 2 -2 20
2 1 3 ffp 3 -3 30
3 2 1 hie 4 -4 40
4 2 2 hie 5 -5 50
5 2 3 ttt 6 -6 60
I would like to create a new column in foo, which takes the value of either col_a, col_b, or col_c, depending on the value of col_id.
I am doing the following:
import numpy as np

foo['col'] = np.where(foo.col_id == "ffp", foo.col_a,
             np.where(foo.col_id == "hie", foo.col_b, foo.col_c))
which gives
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
Since I have a lot of columns, I was wondering if there is a cleaner way to do that, for example using a dictionary:
dict_cols_matching = {"ffp" : "col_a", "hie": "col_b", "ttt": "col_c"}
Any ideas?
You can map the dictionary onto col_id, then perform an indexing lookup:
import numpy as np
# factorize the mapped column names into integer codes + the unique names
idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))
# reindex orders the columns to match, then each row picks its own column
foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]
Output:
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
Alternatively, np.select maps a list of conditions to a list of choices:
foo['col'] = np.select([foo.col_id.eq("ffp"), foo.col_id.eq("hie"), foo.col_id.eq("ttt")],
                       [foo.col_a, foo.col_b, foo.col_c])
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
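Since the question asks for a dictionary-based approach, the condition and choice lists for np.select can also be built directly from dict_cols_matching, so adding a new mapping only means adding a dictionary entry. A sketch using the question's data:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# one condition and one choice per dictionary entry
conditions = [foo.col_id.eq(key) for key in dict_cols_matching]
choices = [foo[col] for col in dict_cols_matching.values()]
foo['col'] = np.select(conditions, choices)
```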
You can use a lambda function to select the column based on col_id, but this method depends on the order of the columns: the +3 offset skips the id, time, and col_id columns, so adjust it if the order changes.
import pandas as pd
import numpy as np

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
idSet = np.unique(foo['col_id'].to_numpy()).tolist()
# positional access via .iloc; plain x[int] on a labeled Series is deprecated
foo['col'] = foo.apply(lambda x: x.iloc[idSet.index(x.col_id) + 3], axis=1)
display(foo)
Output
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
You might use reset_index in combination with a row-wise apply:
foo[["col_id"]].reset_index().apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)
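A variant of the same idea can skip the reset_index entirely and let each row look up its own column through the dictionary (a sketch; being row-wise, it will not be faster than the vectorized answers above):

```python
import pandas as pd

foo = pd.DataFrame({'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# each row selects the value from the column its col_id maps to
foo['col'] = foo.apply(lambda row: row[dict_cols_matching[row['col_id']]], axis=1)
```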
I have a table like this:
id a b
1 -10 -5
2 -5 0
3 0 5
4 5 10
I want to replace the first value of a with -9999999 and the last value of b with 9999999, like this:
id a b
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
Is there any possible way to do that?
If id is a column, use DataFrame.iat to set the first row / second column and the last row / last column by position:
df.iat[0, 1] = -9999999
df.iat[-1, -1] = 9999999
print (df)
id a b
0 1 -9999999 -5
1 2 -5 0
2 3 0 5
3 4 5 9999999
If id is the index, set the first column instead:
df.iat[0, 0] = -9999999
df.iat[-1, -1] = 9999999
print (df)
a b
id
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
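If you prefer label-based access over integer positions, the same assignments can be written with DataFrame.loc (a sketch assuming id is a regular column, as in the first case):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': [-10, -5, 0, 5],
                   'b': [-5, 0, 5, 10]})

# first row of column 'a' and last row of column 'b', addressed by label
df.loc[df.index[0], 'a'] = -9999999
df.loc[df.index[-1], 'b'] = 9999999
```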
Let's say I have a dataframe df:
df = pd.DataFrame({'col1': [1,1,2,2,2], 'col2': ['A','B','A','B','C'], 'value': [2,4,6,8,10]})
col1 col2 value
0 1 A 2
1 1 B 4
2 2 A 6
3 2 B 8
4 2 C 10
I'm looking for a way to create any missing rows among the possible combinations of col1 and col2 with existing values, and fill the missing rows with zeros.
The desired result would be:
col1 col2 value
0 1 A 2
1 1 B 4
2 2 A 6
3 2 B 8
4 2 C 10
5 1 C 0 <- Missing the "1-C" combination, so create it w/ value = 0
I've looked into using stack and unstack to make this work, but I'm not sure that's exactly what I need.
Thanks in advance
Use pivot, then stack (pivot's arguments are keyword-only in recent pandas):
df.pivot(index='col1', columns='col2', values='value').fillna(0).stack().to_frame('values').reset_index()
Out[564]:
col1 col2 values
0 1 A 2.0
1 1 B 4.0
2 1 C 0.0
3 2 A 6.0
4 2 B 8.0
5 2 C 10.0
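The .0 values appear because fillna runs after pivot has introduced NaN, which upcasts the int column to float; casting back restores the integers. A sketch of the same chain with the cast added:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2, 2],
                   'col2': ['A', 'B', 'A', 'B', 'C'],
                   'value': [2, 4, 6, 8, 10]})

out = (df.pivot(index='col1', columns='col2', values='value')
         .fillna(0)
         .stack()
         .astype(int)               # undo the NaN-driven upcast to float
         .to_frame('values')
         .reset_index())
```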
Another way: unstack with fill_value=0, then stack and reset_index:
df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()
Out[311]:
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
You could use reindex + MultiIndex.from_product:
index = pd.MultiIndex.from_product([df.col1.unique(), df.col2.unique()])
result = df.set_index(['col1', 'col2']).reindex(index, fill_value=0).reset_index()
print(result)
Output
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
I have a dataframe with a column populated with groups of 1s and 0s. How can I assign each group a consecutive number beginning from 1?
I have tried a for loop across rows, but I need a column operation for fast performance.
d = {'col1': [1,1,1,0,0,1,1,0,0,0,1,1]}
df1 = pd.DataFrame(data=d)
df1
col1
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 0
8 0
9 0
10 1
11 1
I need the following output:
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
You can compare each value with its shifted neighbour for inequality, then take the cumulative sum with Series.cumsum:
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()
print (df1)
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
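The generated run labels are handy as group keys; for example (an illustration beyond the question itself), the length of each run of identical values:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]})
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()

# group by the run id to get, e.g., the length of each run
run_lengths = df1.groupby('col2')['col1'].size()
```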
I have a Pandas data frame like so:
d = {'doc': [0, 0, 0, 1, 1], 'sent': [0, 1, 2, 0, 1],
     'col1': [5, 6, 1, 6, 5], 'col2': [4, 3, 2, 1, 1], 'col3': [8, 2, 9, 6, 5]}
df = pd.DataFrame(data=d)
Which looks like:
doc sent col1 col2 col3
0 0 0 5 4 8
1 0 1 6 3 2
2 0 2 1 2 9
3 1 0 6 1 6
4 1 1 5 1 5
I'd like to bind the previous row and the next row to each column like so (the "doc" and "sent" columns act as indices, so nothing comes before the first or after the last row of a doc, as seen below):
doc sent col1 col2 col3 p_col1 p_col2 p_col3 n_col1 n_col2 n_col3
0 0 0 5 4 8 0 0 0 6 3 2
1 0 1 6 3 2 5 4 8 1 2 9
2 0 2 1 2 9 6 3 2 6 1 6
3 1 0 6 1 6 0 0 0 5 1 5
4 1 1 5 1 5 6 1 6 0 0 0
Use pd.DataFrame.shift to get the previous/next rows, pd.concat to merge the dataframes, and fillna to set the nulls to zero.
The nulls upcast the ints to floats, since numpy integer arrays cannot hold null values; the columns are cast back to int after the nulls are replaced with 0.
cs = ['col1', 'col2', 'col3']
g = df.groupby('doc')
pd.concat([
df,
g[cs].shift(-1).add_prefix('n'),
g[cs].shift().add_prefix('p')
], axis=1).fillna(0).astype(int)
outputs:
doc sent col1 col2 col3 ncol1 ncol2 ncol3 pcol1 pcol2 pcol3
0 0 0 5 4 8 6 3 2 0 0 0
1 0 1 6 3 2 1 2 9 5 4 8
2 0 2 1 2 9 0 0 0 6 3 2
3 1 0 6 1 6 5 1 5 0 0 0
4 1 1 5 1 5 0 0 0 6 1 6