How to select column if string is in column name - python

I have a dict of dataframes with many columns. I want to select all the columns whose names contain the string 'important'.
Some frames may have important_0 or important_9_0 as column names. How can I select those columns and put them into their own new dictionary, keeping all the values each column contains?

import pandas as pd
df = pd.DataFrame(columns=['a', 'b', 'important_c'])
selected_cols = [c for c in df.columns if c.startswith('important_')]
print(selected_cols)
# ['important_c']
dict_df = { x: pd.DataFrame(columns=['a', 'b', 'important_c']) for x in range(3) }
new_dict = { x: dict_df[x][[c for c in dict_df[x].columns if c.startswith('important_')]] for x in dict_df }

important_columns = [x for x in df.columns if 'important' in x]
# keep only the columns you need
df = df[important_columns]
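If you only need the column subset, pandas' built-in `DataFrame.filter` does the substring match for you. A minimal sketch (note `like` matches any substring, not just prefixes):

```python
import pandas as pd

df = pd.DataFrame(columns=['a', 'b', 'important_0', 'important_9_0'])

# filter(like=...) keeps every column whose name contains the substring
subset = df.filter(like='important')
print(list(subset.columns))  # ['important_0', 'important_9_0']
```

The same call can be mapped over a dict of frames: `{k: v.filter(like='important') for k, v in dict_df.items()}`.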

Related

Extract key value pairs from dict in pandas column using list items in another column

Trying to create a new column that is the key/value pairs extracted from a dict in another column using list items in a second column.
Sample Data:
names name_dicts
['Mary', 'Joe'] {'Mary':123, 'Ralph':456, 'Joe':789}
Expected Result:
names name_dicts new_col
['Mary', 'Joe'] {'Mary':123, 'Ralph':456, 'Joe':789} {'Mary':123, 'Joe':789}
I have attempted to use the ast module to convert the name_dicts column to a column of true dictionaries.
This function errored out with a "cannot convert string" error.
Here col is the df['name_dicts'] column:
def get_name_pairs(col):
    for k, v in col.items():
        if k.isin(df['names']):
            return
Using a list comprehension and operator.itemgetter:
from operator import itemgetter
df['new_col'] = [dict(zip(l, itemgetter(*l)(d)))
                 for l, d in zip(df['names'], df['name_dicts'])]
output:
names name_dicts new_col
0 [Mary, Joe] {'Mary': 123, 'Ralph': 456, 'Joe': 789} {'Mary': 123, 'Joe': 789}
used input:
df = pd.DataFrame({'names': [['Mary', 'Joe']],
                   'name_dicts': [{'Mary': 123, 'Ralph': 456, 'Joe': 789}]
                   })
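One caveat worth noting about the `itemgetter` answer: `itemgetter(*l)` returns a bare scalar when `l` has a single element, so the `dict(zip(...))` pattern silently produces the wrong result for one-name rows. A small sketch of a guard (the helper name `pick` is hypothetical):

```python
from operator import itemgetter

def pick(names, d):
    # itemgetter returns a scalar for one key, but a tuple for several
    got = itemgetter(*names)(d)
    if len(names) == 1:
        got = (got,)
    return dict(zip(names, got))

print(pick(['Mary'], {'Mary': 123, 'Joe': 789}))         # {'Mary': 123}
print(pick(['Mary', 'Joe'], {'Mary': 123, 'Joe': 789}))  # {'Mary': 123, 'Joe': 789}
```

Like the original, this still raises KeyError if a name is missing from the dict; the dict-comprehension answers below handle that case.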
You can apply a lambda function with dictionary comprehension at row level to get the values from the dict in second column based on the keys in the list of first column:
# If col values are stored as string:
import ast
for col in df:
    df[col] = df[col].apply(ast.literal_eval)

df['new_col'] = df.apply(lambda x: {k: x['name_dicts'].get(k, 0) for k in x['names']},
                         axis=1)
# To include only key/value pairs whose key is in both the list and the
# dictionary, replace the lambda above with:
# lambda x: {k: x['name_dicts'][k] for k in x['names'] if k in x['name_dicts']}
names ... new_col
0 [Mary, Joe] ... {'Mary': 123, 'Joe': 789}
[1 rows x 3 columns]
PS: ast.literal_eval runs without error for the sample data you have posted for above code.
Your function needs only a small change, and then you can use it with .apply():
import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': [{'Mary': 123, 'Ralph': 456, 'Joe': 789}],
})

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)
print(df.to_string())
Result:
names name_dicts new_col
0 [Mary, Joe] {'Mary': 123, 'Ralph': 456, 'Joe': 789} {'Mary': 123, 'Joe': 789}
EDIT:
If name_dicts holds the string "{'Mary':123, 'Ralph':456, 'Joe':789}", you can replace ' with ", which gives valid JSON that you can convert to a dictionary with json.loads:
import json
df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)
Or directly convert it as Python's code:
import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)
Or, as a last resort, plain eval (it executes arbitrary code, so use it only on trusted input):
df['name_dicts'] = df['name_dicts'].apply(eval)
Full code:
import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': ["{'Mary':123, 'Ralph':456, 'Joe':789}",],  # strings
})

#import json
#df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)
#df['name_dicts'] = df['name_dicts'].apply(eval)
import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)
print(df.to_string())

Find columns in Pandas DataFrame containing dicts

I have a pandas DataFrame with several columns containing dicts. I am trying to identify columns that contain at least 1 dict.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan]
})
# Iterate over cols to find dicts - this only checks the first row, which is NaN here
cdict = [i for i in df.columns if isinstance(df[i][0], dict)]
cdict
[]
How do I find cols with dicts? Is there a solution that does not iterate over every cell / value of the columns?
You can do:
s = df.applymap(lambda x:isinstance(x, dict)).any()
dict_cols = s[s].index.tolist()
print(dict_cols)
['d', 't']
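A side note on `applymap`: it is deprecated in recent pandas (2.1+) in favour of `DataFrame.map`. An equivalent that works on both old and new versions is a per-column `Series.map` inside `apply` (a sketch that rebuilds the sample frame above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan],
})

# For each column, map every cell to a bool and check whether any cell is a dict
mask = df.apply(lambda col: col.map(lambda x: isinstance(x, dict)).any())
dict_cols = mask[mask].index.tolist()
print(dict_cols)  # ['d', 't']
```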
We can apply over the columns; this still iterates, but makes use of apply:
df.apply(lambda x: [any(isinstance(y, dict) for y in x)], axis=0)
EDIT: I think using applymap is more direct. However, we can use our boolean result to get the column names:
any_dct = df.apply(lambda x: [any(isinstance(y, dict) for y in x)],
                   axis=0, result_type="expand")
df.iloc[:, any_dct.iloc[0, :].tolist()].columns.values

Replace Multiple Values of columns

I have a data frame with various column names (asset.new, few, value.issue, etc.), and I want to change some characters or symbols in the column names. I can do it like this:
df.columns = df.columns.str.replace('.', '_', regex=False)
df.columns = df.columns.str.replace('few', 'LOW')
df.columns = df.columns.str.replace('value', 'PRICE')
...
But I think there should be a better and shorter way.
You can create a dictionary with the actual character as key and the replacement as value, and then iterate through the dictionary:
df = pd.DataFrame({'asset.new': [1, 2, 3],
                   'few': [4, 5, 6],
                   'value.issue': [7, 8, 9]})
replaceDict = {'.': '_', 'few': 'LOW', 'value': 'PRICE'}
for k, v in replaceDict.items():
    df.columns = [c.replace(k, v) for c in df.columns]
print(df)
output:
   asset_new  LOW  PRICE_issue
0          1    4            7
1          2    5            8
2          3    6            9
or:
df.columns = df.columns.to_series().replace([r"\.", "few", "value"], ['_', 'LOW', 'PRICE'], regex=True)
Produces the same output.
Use Series.replace with a dictionary - escaping the . is necessary because it is a special regex character:
d = {r'\.': '_', 'few': 'LOW', 'value': 'PRICE'}
df.columns = df.columns.to_series().replace(d, regex=True)
A more general solution with re.escape:
import re
d = {'.': '_', 'few': 'LOW', 'value': 'PRICE'}
d1 = {re.escape(k): v for k, v in d.items()}
df.columns = df.columns.to_series().replace(d1, regex=True)
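Another option is `DataFrame.rename` with a callable, which keeps the replacement logic in one place. A sketch over the same sample frame, using `re.escape` to guard the literal dot:

```python
import re
import pandas as pd

df = pd.DataFrame({'asset.new': [1, 2, 3],
                   'few': [4, 5, 6],
                   'value.issue': [7, 8, 9]})

d = {'.': '_', 'few': 'LOW', 'value': 'PRICE'}
pattern = re.compile('|'.join(re.escape(k) for k in d))

# rename accepts a function that is applied to every column label
df = df.rename(columns=lambda c: pattern.sub(lambda m: d[m.group(0)], c))
print(list(df.columns))  # ['asset_new', 'LOW', 'PRICE_issue']
```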

How to remove one dictionary from dataframe

I have the following dataframe:
And I made a dataframe for each unique appId, as you see below, with this command:
dfs = dict(tuple(timeseries.groupby('appId')))
After that I want to remove all groups with fewer than 30 rows from my dataframe. I removed those entries from my dictionary (dfs) and then tried this code:
pd.concat([dfs]).drop_duplicates(keep=False)
but it doesn't work.
I believe you need GroupBy.transform with 'size' and then filter by boolean indexing:
df = pd.concat(dfs)
df = df[df.groupby('appId')['appId'].transform('size') >= 30]
#alternative 1
#df = df[df.groupby('appId')['appId'].transform('size').ge(30)]
#alternative 2 (slower on large data)
#df = df.groupby('appId').filter(lambda x: len(x) >= 30)
Another approach is to filter the dictionary:
dfs = {k: v for k, v in dfs.items() if len(v) >= 30}
EDIT:
timeseries = timeseries[timeseries.groupby('appId')['appId'].transform('size') >= 30]
dfs = dict(tuple(timeseries.groupby('appId')))
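A self-contained sketch of that filter-then-group order, with hypothetical data and the 30-row threshold lowered to 2 so the example stays small:

```python
import pandas as pd

timeseries = pd.DataFrame({'appId': ['a', 'a', 'a', 'b', 'c', 'c'],
                           'value': [1, 2, 3, 4, 5, 6]})

# Drop groups smaller than the threshold *before* splitting into a dict
min_rows = 2
timeseries = timeseries[timeseries.groupby('appId')['appId'].transform('size') >= min_rows]
dfs = {app_id: g for app_id, g in timeseries.groupby('appId')}

print(sorted(dfs))    # ['a', 'c']  ('b' had only one row)
print(len(dfs['a']))  # 3
```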

How to combine multiple columns from a pandas df into a list

How can you combine multiple columns from a dataframe into a list?
Input:
df = pd.DataFrame(np.random.randn(10000, 7), columns=list('ABCDEFG'))
If I wanted to create a list from column A I would perform:
df1 = df['A'].tolist()
But if I wanted to combine numerous columns into this list, it wouldn't be efficient to write df['A','B','C'...'Z'].tolist().
I have tried the following, but it just adds the column headers to a list:
df1 = list(df.columns)[0:8]
Intended input:
A B C D E F G
0 0.787576 0.646178 -0.561192 -0.910522 0.647124 -1.388992 0.728360
1 0.265409 -1.919283 -0.419196 -1.443241 -2.833812 -1.066249 0.553379
2 0.343384 0.659273 -0.759768 0.355124 -1.974534 0.399317 -0.200278
Intended Output:
[0.787576, 0.646178, -0.561192, -0.910522, 0.647124, -1.388992, 0.728360,
0.265409, -1.919283, -0.419196, -1.443241, -2.833812, -1.066249, 0.553379,
0.343384, 0.659273, -0.759768, 0.355124, -1.974534, 0.399317, -0.200278]
Is this what you are looking for?
lst = df.values.tolist()
flat_list = [item for x in lst for item in x]
print(flat_list)
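For numeric frames, the NumPy route avoids the Python-level double loop. A sketch with tiny data; `ravel` flattens row by row, matching the intended row-major output order:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': [5.0, 6.0]})

# Row-major flatten: all of row 0, then all of row 1, ...
flat = df.to_numpy().ravel().tolist()
print(flat)  # [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
```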
You can use to_dict:
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))
df.to_dict('list')
Out[1036]:
{'A': [-0.5611441440595607,
-0.3785906500723589,
-0.19480328695097676,
-0.7472526275034221,
-2.4232786057647457,
0.10506614562827334,
0.4968179288412277,
1.635737019365132,
-1.4286421753281746,
0.4973223222844811],
'B': [-1.0550082961139444,
-0.1420067090193365,
0.30130476834580633,
1.1271866812852227,
0.38587456174846285,
-0.531163142682951,
-1.1335754634118729,
0.5975963084356348,
-0.7361022807495443,
1.4329395663140427],
...}
Or, adding values.tolist():
df[list('ABC')].values.tolist()
Out[1041]:
[[0.09552771302434987, 0.18551596484768904, -0.5902249875268607],
[-1.5285190712746388, 1.2922627021799646, -0.8347422966138306],
[-0.4092028716404067, -0.5669107267579823, 0.3627970727410332],
[-1.3546346273319263, -0.9352316948439341, 1.3568726575880614],
[-1.3509518030469496, 0.10487182694997808, -0.6902134363370515]]
Edit: np.concatenate(df[list('ABC')].T.values.tolist())
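Note the transpose in that edit concatenates column by column (all of A, then B, then C), unlike the row-major flatten earlier. A small sketch makes the ordering explicit:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Transposing first yields column-major order: all of A, then B, then C
col_major = np.concatenate(df[list('ABC')].T.values.tolist()).tolist()
print(col_major)  # [1, 2, 3, 4, 5, 6]
```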
