Set list as index of Pandas DataFrame - python

I have the following list:
index_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
I would like to use it to define an index for the following DataFrame:
df = pd.DataFrame(index= [0,1,4,7])
Each entry in the DataFrame's index corresponds to a position in the list. The end result should look like:
df_target = pd.DataFrame(index= ['a','b','e','h'])
I have tried a number of built in functions for pandas, but no luck.

Use Index.map with a dictionary created by enumerate:
target_df = pd.DataFrame(index=df.index.map(dict(enumerate(index_list))))
print (target_df)
Empty DataFrame
Columns: []
Index: [a, b, e, h]
If need change index in existing DataFrame:
df = pd.DataFrame({'a':range(4)}, index= [0,1,4,7])
print (df)
   a
0  0
1  1
4  2
7  3
df = df.rename(dict(enumerate(index_list)))
print (df)
   a
a  0
b  1
e  2
h  3

Like this:
print(pd.DataFrame(index=[index_list[i] for i in [0,1,4,7]]))
Output:
Empty DataFrame
Columns: []
Index: [a, b, e, h]

You can pass a Series object to Index.map method.
df.index.map(pd.Series(index_list))
# Index(['a', 'b', 'e', 'h'], dtype='object')
Or rename the DataFrame index directly:
df.rename(index=pd.Series(index_list))
# Empty DataFrame
# Columns: []
# Index: [a, b, e, h]
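The dict-based and Series-based mappings above can be checked against each other in a small self-contained sketch (sample data taken from the question):

```python
import pandas as pd

index_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
df = pd.DataFrame(index=[0, 1, 4, 7])

# Map integer index positions to labels via a dict and via a Series
via_dict = df.index.map(dict(enumerate(index_list)))
via_series = df.index.map(pd.Series(index_list))

print(list(via_dict))  # ['a', 'b', 'e', 'h']
```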

Related

Create a list according to another list in panda dataframe

I have two dataframes as below:
df1
Sites
['a']
['b', 'c', 'a', 'd']
['c']
df2
Site cells
a 2
b 4
c 3
d 5
What I need is to add a column in df1 with the list of cells corresponding to each list of sites, like:
result_df
Sites Cells
['a'] [2]
['b', 'c', 'a', 'd'] [4,3,2,5]
['c'] [3]
I wrote the code below, but the resulting column is just a map object:
df1['Cells'] = df1['Sites'].map(lambda x: map(df2.set_index('Site')['cells'],x))
I tried wrapping it in list() to convert it to a list, but I get this error:
df1['Cells'] = df1['Sites'].map(lambda x: list(map(df2.set_index('Site')['cells'],x)))
TypeError: 'Series' object is not callable
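The TypeError arises because the built-in map() expects a callable as its first argument, and a Series is not callable. A minimal sketch of one way around it (dataframes reconstructed from the question) is to build the lookup Series once and index into it per site:

```python
import pandas as pd

df1 = pd.DataFrame({'Sites': [['a'], ['b', 'c', 'a', 'd'], ['c']]})
df2 = pd.DataFrame({'Site': ['a', 'b', 'c', 'd'], 'cells': [2, 4, 3, 5]})

# Build the lookup Series once; plain indexing replaces the non-callable map()
lookup = df2.set_index('Site')['cells']
df1['Cells'] = df1['Sites'].map(lambda sites: [lookup[s] for s in sites])
print(df1['Cells'].tolist())  # [[2], [4, 3, 2, 5], [3]]
```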
Let us do
df1['value'] = df1.Sites.map(lambda x : df2.set_index('Site')['cells'].get(x).tolist())
df1
Out[728]:
Sites value
0 [a] [2]
1 [b, c, a, d] [4, 3, 2, 5]
2 [c] [3]
If some Sites in df1 are missing from df2, try with explode:
df1['new'] = df1.Sites.explode().map(df2.set_index('Site')['cells']).groupby(level=0).agg(list)
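A runnable sketch of the explode route (sample data reconstructed from the question, plus a hypothetical site 'q' that is missing from df2):

```python
import pandas as pd

df1 = pd.DataFrame({'Sites': [['a'], ['b', 'c', 'a', 'd'], ['c', 'q']]})
df2 = pd.DataFrame({'Site': ['a', 'b', 'c', 'd'], 'cells': [2, 4, 3, 5]})

# One row per site, map each site to its cell count, then collect back per row;
# the unknown site 'q' becomes NaN instead of raising a KeyError
exploded = df1['Sites'].explode().map(df2.set_index('Site')['cells'])
df1['new'] = exploded.groupby(level=0).agg(list)
print(df1['new'].tolist())
```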

Finding the common elements in 2 columns which is present in a single dataframe

[image of a data frame]
I want to find the common elements present in the top 30 and played_games columns.
I have used the code below, but it does not give me the right output:
Prediction1['precision_at_30'] = [
    set(a).intersection(b) for a, b in zip(Prediction1['played_games'], Prediction1['top 30'])]
The output I get is sets of single characters instead of game names:
output : {f, 1, , ,, a, s, t, r, g, 2, ', y, o, e, [
You can use apply to check the set intersection:
df['result'] = df.apply(lambda r: set(r['games']).intersection(r['played_games']), axis=1)
Example:
games played_games result
0 [abc, def, ghi] [def, abc] {abc, def}
Your columns seem to be string representations of lists, so you can use pd.eval to convert each string into a real Python list before taking the set intersection:
df = pd.DataFrame({'games': ["['abc', 'bef']", "['b', 'c', 'e', 'f']"],
                   'played_games': ["['abc', 'bef', 'e']", "['b', 'f']"]})
df['result'] = df[['games', 'played_games']].apply(
    lambda x: set(pd.eval(x['games'])).intersection(pd.eval(x['played_games'])),
    axis=1)
Output:
>>> df
games played_games result
0 ['abc', 'bef'] ['abc', 'bef', 'e'] {abc, bef}
1 ['b', 'c', 'e', 'f'] ['b', 'f'] {b, f}
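If you prefer not to rely on pd.eval, the standard library's ast.literal_eval does the same string-to-list parsing; a sketch with the same sample data:

```python
import ast
import pandas as pd

df = pd.DataFrame({'games': ["['abc', 'bef']", "['b', 'c', 'e', 'f']"],
                   'played_games': ["['abc', 'bef', 'e']", "['b', 'f']"]})

# ast.literal_eval safely parses the string form of each list
df['result'] = [set(ast.literal_eval(g)) & set(ast.literal_eval(p))
                for g, p in zip(df['games'], df['played_games'])]
print(df['result'].tolist())  # [{'abc', 'bef'}, {'b', 'f'}]
```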
Update: with your sample data
df = pd.read_csv('https://raw.githubusercontent.com/ajayvd/stack-overflow/main/testing.csv', index_col=0)
df['result'] = df[['top 30', 'played_games']].apply(
    lambda x: set(pd.eval(x['top 30'])).intersection(pd.eval(x['played_games'])),
    axis=1)
print(df['result'])
# Output:
0 {fdtsl, ashhof, ctiv, aeolus, batcat, drgch, a...
1 {aogs, ashace, ashjut, bib, athn}
2 {aogs}
3 {ashjut, anwild}
4 {}
Name: result, dtype: object

How to remove strings from a column matching with strings of another column of dataframe?

I have two dataframes.
first one: df1
df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df1
looks as:
Sample Value
0 Sam1 ak,b,c,k
1 Sam2 d,k,e,b,f,a
2 Sam3 am,x,y,z,a
second one: df2
df2 = pd.DataFrame({
'Remove': ['ak', 'b', 'k', 'a', 'am']})
df2
Looks as:
Remove
0 ak
1 b
2 k
3 a
4 am
I want to remove the strings from df1['Value'] that match the strings in df2['Remove'].
Expected output is:
Sample Value
Sam1 c
Sam2 d,e,f
Sam3 x,y,z
The code I tried did not help me.
Any help, thanks
Using apply as a one-liner:
df1['Value'] = df1['Value'].str.split(',').apply(lambda x:','.join([i for i in x if i not in df2['Remove'].values]))
Output:
>>> df1
Sample Value
0 Sam1 c
1 Sam2 d,e,f
2 Sam3 x,y,z
You can use apply() to remove items from df1's Value column if they appear in df2's Remove column.
import pandas as pd
df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})
remove_list = df2['Remove'].values.tolist()
def remove_value(row, remove_list):
    keep_list = [val for val in row['Value'].split(',') if val not in remove_list]
    return ','.join(keep_list)
df1['Value'] = df1.apply(remove_value, axis=1, args=(remove_list,))
print(df1)
Sample Value
0 Sam1 c
1 Sam2 d,e,f
2 Sam3 x,y,z
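A vectorized alternative is also possible; a sketch using explode instead of a Python-level loop, with the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'Sample': ['Sam1', 'Sam2', 'Sam3'],
                    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# One token per row, drop the unwanted tokens, then re-join per original row
tokens = df1['Value'].str.split(',').explode()
df1['Value'] = tokens[~tokens.isin(df2['Remove'])].groupby(level=0).agg(','.join)
print(df1['Value'].tolist())  # ['c', 'd,e,f', 'x,y,z']
```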
This script will help you (using .at instead of chained assignment, which is not guaranteed to write back to the DataFrame):
for index, elements in enumerate(df1['Value']):
    elements = elements.split(',')
    df1.at[index, 'Value'] = list(set(elements) - set(df2['Remove']))
Just iterate over the DataFrame and take the difference of each row's values with the remove list, like this.
The complete code will be something like this:
import pandas as pd
df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df2 = pd.DataFrame({
    'Remove': ['ak', 'b', 'k', 'a', 'am']})
for index, elements in enumerate(df1['Value']):
    elements = elements.split(',')
    df1.at[index, 'Value'] = list(set(elements) - set(df2['Remove']))
print(df1)
Output:
Sample Value
0 Sam1 [c]
1 Sam2 [e, d, f]
2 Sam3 [y, x, z]

Sort or groupby dataframe in python using given string

I have a given dataframe:
           Id Direction Load Unit
1  CN05059815   LoadFWD  0,0  NaN
2  CN05059815   LoadBWD  0,0  NaN
4  ...
...
and the given list (named lst here so as not to shadow the built-in list):
lst = ['CN05059830', 'CN05059946', 'CN05060010', 'CN05060064', ...]
I would like to sort or group the data by a given element of the list.
For example, the new data should have exactly the same ordering as the list. The Id column would start with CN05059815, which doesn't belong to the list; the next entries, CN05059830 and CN05059946, both belong to the list; and so on for the remaining data.
One way is to use Categorical Data. Here's a minimal example:
# sample dataframe
df = pd.DataFrame({'col': ['A', 'B', 'C', 'D', 'E', 'F']})
# required ordering
lst = ['D', 'E', 'A', 'B']
# convert to categorical
df['col'] = df['col'].astype('category')
# set order, adding values not in lst to the front
order = list(set(df['col']) - set(lst)) + lst
# attach ordering information to categorical series
df['col'] = df['col'].cat.reorder_categories(order)
# apply ordering
df = df.sort_values('col')
print(df)
col
2 C
5 F
3 D
4 E
0 A
1 B
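An alternative sketch that avoids categoricals: sort_values accepts a key callable (pandas 1.1+), so you can rank each value by its position in lst and send values not in lst to the front:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'B', 'C', 'D', 'E', 'F']})
lst = ['D', 'E', 'A', 'B']

# Rank values by position in lst; unknown values get -1 and sort first.
# kind='stable' keeps the original order within ties.
rank = {v: i for i, v in enumerate(lst)}
out = df.sort_values('col', key=lambda s: s.map(rank).fillna(-1), kind='stable')
print(out['col'].tolist())  # ['C', 'F', 'D', 'E', 'A', 'B']
```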
Consider the approach and example below:
df = pd.DataFrame({
'col': ['a', 'b', 'c', 'd', 'e']
})
list_ = ['d', 'b', 'a']
print(df)
Output:
col
0 a
1 b
2 c
3 d
4 e
Then in order to sort the df with the list and its ordering:
df.reindex(df['col'].apply(lambda x: list_.index(x) if x in list_ else -1).sort_values().index)
Output:
col
2 c
4 e
3 d
1 b
0 a

Python Pandas lookup and replace df1 value from df2

I have two dataframes, df and df2.
df's column FOUR matches df2's column LOOKUP COL.
I need to match df column FOUR with df2 column LOOKUP COL and replace df column FOUR with the corresponding values from df2 column RETURN THIS.
The resulting dataframe could overwrite df, but I have it listed as result below.
NOTE: the index does not match between the two dataframes.
df = pd.DataFrame([['a', 'b', 'c', 'd'],
['e', 'f', 'g', 'h'],
['j', 'k', 'l', 'm'],
['x', 'y', 'z', 'w']])
df.columns = ['ONE', 'TWO', 'THREE', 'FOUR']
ONE TWO THREE FOUR
0 a b c d
1 e f g h
2 j k l m
3 x y z w
df2 = pd.DataFrame([['a', 'b', 'd', '1'],
['e', 'f', 'h', '2'],
['j', 'k', 'm', '3'],
['x', 'y', 'w', '4']])
df2.columns = ['X1', 'Y2', 'LOOKUP COL', 'RETURN THIS']
X1 Y2 LOOKUP COL RETURN THIS
0 a b d 1
1 e f h 2
2 j k m 3
3 x y w 4
RESULTING DF
ONE TWO THREE FOUR
0 a b c 1
1 e f g 2
2 j k l 3
3 x y z 4
You can use Series.map. You'll need to create a dictionary or a Series to use in map. A Series makes more sense here but the index should be LOOKUP COL:
df['FOUR'] = df['FOUR'].map(df2.set_index('LOOKUP COL')['RETURN THIS'])
df
Out:
ONE TWO THREE FOUR
0 a b c 1
1 e f g 2
2 j k l 3
3 x y z 4
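A merge-based sketch of the same lookup, using the question's dataframes; how='left' preserves the row order of df:

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'],
                   ['j', 'k', 'l', 'm'], ['x', 'y', 'z', 'w']],
                  columns=['ONE', 'TWO', 'THREE', 'FOUR'])
df2 = pd.DataFrame([['a', 'b', 'd', '1'], ['e', 'f', 'h', '2'],
                    ['j', 'k', 'm', '3'], ['x', 'y', 'w', '4']],
                   columns=['X1', 'Y2', 'LOOKUP COL', 'RETURN THIS'])

# Left-merge on the lookup column, then take the returned values;
# .to_numpy() sidesteps index alignment, since the indexes differ
merged = df.merge(df2[['LOOKUP COL', 'RETURN THIS']],
                  left_on='FOUR', right_on='LOOKUP COL', how='left')
df['FOUR'] = merged['RETURN THIS'].to_numpy()
print(df['FOUR'].tolist())  # ['1', '2', '3', '4']
```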
df['FOUR'] = [df2.loc[df2['LOOKUP COL'] == i, 'RETURN THIS'].iloc[0] for i in df['FOUR']]
Should be sufficient to do the trick, though there's probably a more pandas-native way to do it.
Basically, a list comprehension: we generate a new array of df2['RETURN THIS'] values by filtering on the lookup column as we iterate over each i in df['FOUR'].
