Right now I have to do it like this:
df = pd.DataFrame({'column': ['A', 'B', 'C', 'D', 'E', 'F', 'G', '-']})
df['column'] = df['column'].str.replace('A', 'cat').replace('B', 'rabit').replace('C', 'octpath').replace('D', 'spider').replace('E', 'mammoth').replace('F', 'snake').replace('G', 'starfish')
But I think this is long and unreadable.
Do you know a simpler solution?
Here is an approach using pandas.Series.replace with a dict:
d = {'A':'cat','B':'rabit', 'C':'octpath','D':'spider','E':'mammoth','F':'snake','G':'starfish'}
df['column'] = df['column'].replace(d)
Output:
column
0 cat
1 rabit
2 octpath
3 spider
4 mammoth
5 snake
6 starfish
7 -
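If the frame has more columns than just this one, DataFrame.replace also accepts a nested {column: mapping} dict, so the same d can be scoped to a single column. A minimal sketch, reusing d on a fresh df as defined above:
# only touch 'column'; values without a key in d (like '-') are left as-is
df = df.replace({'column': d})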
You can define a dict of your replacement values and call map on the column, passing in the dict. Values without a key in the dict are mapped to NaN, so to keep your existing values where no replacement exists, call fillna with the original column. (na_action='ignore' simply skips calling the mapping for entries that are already NaN.)
In[60]:
df = pd.DataFrame({'column': ['A', 'B', 'C', 'D', 'E', 'F', 'G', '-']})
d = {'A':'cat','B':'rabit', 'C':'octpath','D':'spider','E':'mammoth','F':'snake','G':'starfish'}
df['column'] = df['column'].map(d, na_action='ignore').fillna(df['column'])
df
Out[60]:
column
0 cat
1 rabit
2 octpath
3 spider
4 mammoth
5 snake
6 starfish
7 -
df = pd.DataFrame({'column': ['A', 'B', 'C', 'D', 'E', 'F', 'G', '-']})
mapper = {'A':'cat','B':'rabit','C':'octpath','D':'spider','E':'mammoth'}
df['column'] = df.column.apply(lambda x: mapper.get(x))
df.column
0 cat
1 rabit
2 octpath
3 spider
4 mammoth
5 None
6 None
7 None
In case you want to set a default value, dict.get accepts one directly:
df['column'] = df.column.apply(lambda x: mapper.get(x, "pandas"))
df.column
0 cat
1 rabit
2 octpath
3 spider
4 mammoth
5 pandas
6 pandas
7 pandas
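As a side note, Series.map honors a dict subclass that defines __missing__, so a collections.defaultdict can supply the default value without a per-row lambda. A sketch, assuming a fresh df as defined at the top:
from collections import defaultdict

# defaultdict defines __missing__, so values missing from mapper map to 'pandas'
dd = defaultdict(lambda: 'pandas', mapper)
df['column'] = df['column'].map(dd)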
Greetings from Shibuya!
I am trying to rename a column and combine the renamed column with an existing column of the same name. The row indexes will not be the same (i.e. I am not combining 'City' and 'State' from two columns).
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
'Col_2': ['D', 'E', 'F'],
'Col_one':['G', 'H', 'I'],})
df.rename(columns={'Col_one' : 'Col_1'}, inplace=True)
# Desired output:
({'Col_1': ['A', 'B', 'C', 'G', 'H', 'I'],
'Col_2': ['D', 'E', 'F', '-', '-', '-'],})
I've tried pd.concat and a few other things, but it fails to combine the columns in a way I'm expecting. Thank you!
This is melt and pivot after you have renamed:
u = df.melt()
out = (u.assign(k=u.groupby("variable").cumcount())
        .pivot(index="k", columns="variable", values="value")
        .fillna('-'))
out = out.rename_axis(index=None,columns=None)
print(out)
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
Using append without modifying the actual dataframe:
result = (df[['Col_1', 'Col_2']]
          .append(df[['Col_one']].rename(columns={'Col_one': 'Col_1'}),
                  ignore_index=True)
          .fillna('-'))
OUTPUT:
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
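Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions the same result can be built with pd.concat. A sketch:
result = pd.concat(
    [df[['Col_1', 'Col_2']],
     df[['Col_one']].rename(columns={'Col_one': 'Col_1'})],
    ignore_index=True
).fillna('-')  # rows coming from Col_one have no Col_2 value, so fill with '-'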
This might be a slightly longer method than the other answers, but it delivers the required output.
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
'Col_2': ['D', 'E', 'F'],
'Col_one':['G', 'H', 'I'],})
# Grab the column of values we want to retain (a Series, despite the name)
TempList = df['Col_one']
# Append those values to the dataframe as new Col_1 rows
df = df.append(pd.DataFrame({'Col_1': TempList}), ignore_index=True)
# Drop the redundant column
df.drop(columns=['Col_one'], inplace=True)
# Populate NaN with -
df.fillna('-', inplace=True)
Output is
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
Using concat should work.
import pandas as pd
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
'Col_2': ['D', 'E', 'F'],
'Col_one':['G', 'H', 'I'],})
df2 = pd.DataFrame()
df2['Col_1'] = pd.concat([df['Col_1'], df['Col_one']], axis = 0)
df2 = df2.reset_index(drop=True)
df2['Col_2'] = df['Col_2']
df2['Col_2'] = df2['Col_2'].fillna('-')
print(df2)
prints
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
mydf = pd.DataFrame({'dts':['1/1/2000','1/1/2000','1/1/2000','1/2/2000', '1/3/2000', '1/3/2000'],
'product':['A', 'B', 'A','A', 'A','B'],
'value':[1,2,2,3,6,1]})
a = mydf.groupby(['dts','product']).sum()
So a has a MultiIndex now:
a
Out[1]:
value
dts product
1/1/2000 A 3
B 2
1/2/2000 A 3
1/3/2000 A 6
B 1
How do I extract the product level of the index from a? a.index['product'] does not work.
Use get_level_values:
>>> a.index.get_level_values(1)
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
You can also use the name of the level:
>>> a.index.get_level_values('product')
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
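A common follow-up is selecting rows by one level of the MultiIndex. A sketch using the level values as a boolean mask (xs is an alternative that drops the selected level):
# keep only rows whose 'product' level equals 'A'
mask = a.index.get_level_values('product') == 'A'
print(a[mask])
# equivalent selection; 'product' is dropped from the result's index
print(a.xs('A', level='product'))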
I have a DataFrame I created using pandas and want to create a new table based on the original, but filtered based on certain conditions.
df = pd.DataFrame(
[['Y', 'Cat', 'no', 'yes', 6],
['Y', 4, 7, 9, 'dog'],
['N', 6, 4, 6, 'pig'],
['N', 3, 6, 'beer', 8]],
columns = ('Data', 'a', 'b', 'c', 'd')
)
My condition, which doesn't work:
if (df['Data']=='Y') & (df['Data']=='N'):
df3=df.loc[:,['Data', 'a', 'b', 'c']]
else:
df3=df.loc[:,['Data', 'a', 'b']]
I want the new table to contain data matching the following criteria:
If df.Data contains both 'Y' and 'N' values, the new table gets columns ('Data', 'a', 'b', 'c')
If not, the new table gets columns ('Data', 'a', 'b')
Data a b
0 Y Cat no
1 Y 4 7
2 N 6 4
3 N 3 6
Data a b c
0 Y Cat no yes
1 Y 4 7 9
2 N 6 4 6
3 N 3 6 beer
You are comparing a Series with a string rather than checking for existence, so the if gets a Series instead of a single Boolean. You can instead use pd.Series.any, which returns True if any value in a Series is True:
if (df['Data']=='Y').any() & (df['Data']=='N').any():
# do something
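Applied to your column selection, that looks like the sketch below (with two scalar booleans, plain and works as well as &):
if (df['Data'] == 'Y').any() and (df['Data'] == 'N').any():
    df3 = df.loc[:, ['Data', 'a', 'b', 'c']]
else:
    df3 = df.loc[:, ['Data', 'a', 'b']]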
An alternative method is to use pd.DataFrame.drop with a conditional expression:
df = df.drop(['d'] if set(df['Data']) == {'Y', 'N'} else ['c', 'd'], axis=1)
print(df)
Data a b c
0 Y Cat no yes
1 Y 4 7 9
2 N 6 4 6
3 N 3 6 beer
if set(df.Data.unique()) == {'Y', 'N'}:
    df3 = df[['Data', 'a', 'b', 'c']]
else:
    df3 = df[['Data', 'a', 'b']]
I have the following dataframe:
Id Direction Load Unit
1 CN05059815 LoadFWD 0,0 NaN
2 CN05059815 LoadBWD 0,0 NaN
4 ....
....
and the following list:
list =['CN05059830','CN05059946','CN05060010','CN05060064' ...]
I would like to sort the data by the ordering of the given list.
For example, the new data would follow exactly the same order as the list: the column would start with CN05059815, which doesn't belong to the list, then continue with CN05059830, CN05059946, ..., which do belong to the list, with the remaining data kept as well.
One way is to use Categorical Data. Here's a minimal example:
# sample dataframe
df = pd.DataFrame({'col': ['A', 'B', 'C', 'D', 'E', 'F']})
# required ordering
lst = ['D', 'E', 'A', 'B']
# convert to categorical
df['col'] = df['col'].astype('category')
# set the order, putting values not in lst at the front (their relative order is arbitrary)
order = list(set(df['col']) - set(lst)) + lst
# attach ordering information to categorical series
df['col'] = df['col'].cat.reorder_categories(order)
# apply ordering
df = df.sort_values('col')
print(df)
col
2 C
5 F
3 D
4 E
0 A
1 B
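On pandas 1.1+, sort_values also accepts a key callable, which avoids the categorical conversion entirely. A sketch reusing order from the example above:
# rank each value by its position in order, then sort by that rank
rank = {v: i for i, v in enumerate(order)}
df = df.sort_values('col', key=lambda s: s.map(rank))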
Consider the approach and example below:
df = pd.DataFrame({
'col': ['a', 'b', 'c', 'd', 'e']
})
list_ = ['d', 'b', 'a']
print(df)
Output:
col
0 a
1 b
2 c
3 d
4 e
Then, to sort the df by the list's ordering:
df.reindex(df['col'].apply(lambda x: list_.index(x) if x in list_ else -1).sort_values().index)
Output:
col
2 c
4 e
3 d
1 b
0 a
I have a df like this:
ID Cluster Product
1 4 'b'
1 4 'f'
1 4 'w'
2 7 'u'
2 7 'b'
3 5 'h'
3 5 'f'
3 5 'm'
3 5 'd'
4 7 's'
4 7 'b'
4 7 'g'
Here ID is the primary, unique key of another df that is the source for this df. Cluster is not a key; different IDs often have the same Cluster value. In any case, it is information I have to carry along.
What I want to obtain is this dataframe:
ID Cluster Product_List_by_ID
1 4 ['b','f','w']
2 7 ['u','b']
3 5 ['h','f','m','d']
4 7 ['s','b','g']
If this is not possible, also a dictionary like this could be fine:
d = {ID:[1,2,3,4], Cluster:[4,7,5,7],
Product_List_by_ID:[['b','f','w'],['u','b'],['h','f','m','d'],['s','b','g']]}
I have tried many ways unsuccessfully; it seems that it is not possible to insert lists as pandas DataFrame values. Still, I think it should not be so difficult to reach the goal in some tricky way. Sorry if I am missing something obvious, I am new to coding.
Any suggestions? Thanks!
Use groupby with apply(list):
df.groupby(['ID', 'Cluster']).Product.apply(list)
ID Cluster
1 4 ['b', 'f', 'w']
2 7 ['u', 'b']
3 5 ['h', 'f', 'm', 'd']
4 7 ['s', 'b', 'g']
Name: Product, dtype: object
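If you want the flat three-column frame from your example rather than a MultiIndexed Series, reset the index and rename the aggregated column. A sketch:
out = (df.groupby(['ID', 'Cluster'])
         .Product.apply(list)
         .reset_index(name='Product_List_by_ID'))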
Another solution: first remove the ' characters from column Product, if necessary, using str.strip:
df.Product = df.Product.str.strip("'")
Then groupby with apply; finally, if you need a dictionary, use to_dict with orient='list':
print(df.groupby(['ID', 'Cluster'])
        .Product.apply(lambda x: x.tolist())
        .reset_index()
        .to_dict(orient='list'))
{'Cluster': [4, 7, 5, 7],
'ID': [1, 2, 3, 4],
'Product': [['b', 'f', 'w'], ['u', 'b'],
['h', 'f', 'm', 'd'], ['s', 'b', 'g']]}