Looping dictionary through column using Pandas - python

I have a data frame with a column called "Input", consisting of various numbers.
I created a dictionary that looks like this:
sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"]
}
I am attempting to loop through the "Input" column and check it against this dictionary. If any of the values in the dictionary are found (123, 789, etc.), I would like to create a new column in my data frame that signifies where it was found.
For example, I would like to create a column called "found" whose value is "a" when 456 is found in "Input" and "b" when 789 is found.
I tried the following code, but my logic seems to be off:
for key in sampleDict:
    for p_key in df['Input']:
        if code in p_key:
            if code in sampleDict[key]:
                df = print(code)
print(df)

Use map with a dictionary built by flattening the lists. This works only if all values in the lists are unique:
import pandas as pd

d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print(d)
{'123': 'a', '456': 'a', '789': 'b', '272': 'b'}
df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(d)
print(df)
  Input found
0   789     b
1   456     a
2   100   NaN
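Because the flattened dictionary silently keeps only the last key when a value appears in two lists, it can help to check the uniqueness assumption while inverting. The sketch below uses a hypothetical helper, invert_unique (not part of pandas), that raises instead of overwriting, and then maps exactly as above:

```python
import pandas as pd


def invert_unique(mapping):
    """Hypothetical helper: map each list element back to its key.

    Raises ValueError if an element occurs under two different keys,
    instead of silently keeping the last one.
    """
    inverted = {}
    for key, values in mapping.items():
        for v in values:
            if v in inverted and inverted[v] != key:
                raise ValueError(f"{v!r} appears under both {inverted[v]!r} and {key!r}")
            inverted[v] = key
    return inverted


sampleDict = {"a": ["123", "456"], "b": ["789", "272"]}
d = invert_unique(sampleDict)
df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(d)  # inputs not in d become NaN
```

If the check raises, the aggregation approach that follows is the appropriate one.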
If the same value can appear in more than one list, aggregate first, e.g. join the keys in the first step, and then map by the resulting Series:
sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"]
}
df1 = pd.DataFrame([(k, oldk) for oldk, oldv in sampleDict.items() for k in oldv],
                   columns=['a', 'b'])
s = df1.groupby('a')['b'].apply(', '.join)
print(s)
a
123       a
272       b
456       a
789    a, b
Name: b, dtype: object
df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(s)
print(df)
  Input found
0   789  a, b
1   456     a
2   100   NaN
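The intermediate df1 can also be built with Series.explode (available since pandas 0.25). This is a swapped-in variant, not the answer's code: explode turns the dictionary into one row per (key, value) pair, the two sides are swapped, and the keys are joined per value as before.

```python
import pandas as pd

sampleDict = {"a": ["123", "456", "789"], "b": ["789", "272"]}

# One row per (key, code) pair: the index holds the key, the value holds the code
pairs = pd.Series(sampleDict).explode()

# Swap sides so codes become the index and keys the values
inv = pd.Series(pairs.index.to_numpy(), index=pairs.to_numpy())

# Join all keys that share a code, e.g. '789' -> 'a, b'
s = inv.groupby(level=0).apply(', '.join)

df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(s)
```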

You can use collections.defaultdict to construct a mapping from list values to key(s). Data from @jezrael.
from collections import defaultdict

d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)
print(d)
defaultdict(list,
            {'123': ['a'], '272': ['b'], '456': ['a'], '789': ['a', 'b']})
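Then use pd.Series.map to map inputs to keys in a new series. Pass a plain dict(d) rather than d itself: for a dict subclass that defines __missing__, such as defaultdict, map uses the default value (here an empty list) instead of NaN for unmatched inputs.
df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(dict(d))
print(df)
  Input   found
0   789  [a, b]
1   456     [a]
2   100     NaN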
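A small check of how Series.map treats a defaultdict, assuming a recent pandas. Mapping with the defaultdict itself looks up d['100'], which returns [] and, as a side effect, inserts '100' into d. That is why the plain copy is taken before mapping.

```python
from collections import defaultdict

import pandas as pd

sampleDict = {"a": ["123", "456", "789"], "b": ["789", "272"]}
d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)

inputs = pd.Series(['789', '456', '100'])

# Snapshot as a plain dict BEFORE mapping with d: the lookup d['100']
# would otherwise add an empty-list entry for '100' to the copy too.
plain = dict(d)

with_default = inputs.map(d)  # unmatched '100' -> [] (the defaultdict default)
with_nan = inputs.map(plain)  # unmatched '100' -> NaN
```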

Create a boolean mask with a list comprehension, convert the dictionary's list to a NumPy array, and use the mask to keep only the values that also appear in the search list:
import numpy as np

sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"]
}
search = ['789', '456', '100']
#https://www.techbeamers.com/program-python-list-contains-elements/
#https://stackoverflow.com/questions/10274774/python-elegant-and-efficient-ways-to-mask-a-list
for key, item in sampleDict.items():
    print(item)
    mask = [x in search for x in item]  # True where a dictionary value occurs in search
    arr = np.array(item)
    print(arr[mask])
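The same filtering works without NumPy. This sketch converts the search list to a set for constant-time membership tests and collects the matching values per key in a dictionary comprehension. The names search_set and found are illustrative only.

```python
sampleDict = {"a": ["123", "456"], "b": ["789", "272"]}
search = ['789', '456', '100']

search_set = set(search)  # O(1) membership checks instead of scanning the list

# For each key, keep only the dictionary values that appear in search
found = {key: [x for x in values if x in search_set]
         for key, values in sampleDict.items()}
```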

Related

Using groupby and append values at columns

Consider the following CSV file where there is a duplicate name in the "Name" column:
ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37
I want to group the rows by name and record the I and C columns like this:
K1:
0 I 386 28800
1 C 5522 6485
K2:
0 I 1204
1 C 3257
I have written this code, which groups the rows by the Name column and builds a dictionary:
import pandas as pd

data = {'Value': [0, 1]}
kernel_df = pd.DataFrame(data, index=['C', 'I'])
my_dict = {'dummy': kernel_df}
df = pd.read_csv('test.csv', usecols=['Name', 'I', 'C'])
for name, df_group in df.groupby('Name'):
    my_dict[name] = pd.DataFrame(df_group)
print(my_dict)
But the output is
{'dummy': Value
C 0
I 1, 'K1': Name I C
0 K1 386 5522
1 K1 28800 6485, 'K2': Name I C
2 K2 1204 3257}
As you can see, I and C are written as columns, so each key gets more rows instead of more columns. That is the opposite of what I want. How can I fix that?
I think you need to select the columns and transpose them. I don't use a dict comprehension, because your code adds new DataFrames to an existing dict:
data = {'Value': [0, 1]}
kernel_df = pd.DataFrame(data, index=['C', 'I'])
my_dict = {'dummy': kernel_df}
for name, df_group in df.groupby('Name'):
    my_dict[name] = df_group[['I', 'C']].T
print(my_dict['K1'])
      0      1
I   386  28800
C  5522   6485
If you also need the row labels as a column:
data = {'Value': [0, 1]}
kernel_df = pd.DataFrame(data, index=['C', 'I'])
my_dict = {'dummy': kernel_df}
for name, df_group in df.groupby('Name'):
    my_dict[name] = df_group[['I', 'C']].T.rename_axis('g').reset_index()
print(my_dict['K1'])
   g     0      1
0  I   386  28800
1  C  5522   6485
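When no pre-existing 'dummy' entry is needed, the per-name frames can be built in one dict comprehension. This sketch reads the question's CSV from an in-memory string (io.StringIO stands in for test.csv) and adds reset_index(drop=True) before transposing, an addition not in the answer, so each group's columns are numbered from 0. For example, K2 then gets column 0 instead of its original row label 2, matching the desired output.

```python
import io

import pandas as pd

# The question's CSV, inlined so the example runs without test.csv
csv_text = """ID,Name,T,CA,I,C,IP
129,K1,1.2,64,386,5522,0.07
6,K1,1.1,3072,28800,6485,4.44
157,K2,1.1,512,1204,3257,0.37
"""
df = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'I', 'C'])

# One transposed frame per name: rows I and C, one column per occurrence.
# reset_index(drop=True) renumbers each group's rows (future columns) from 0.
my_dict = {
    name: g[['I', 'C']].reset_index(drop=True).T
    for name, g in df.groupby('Name')
}
```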

How to create a dataframe from a nested dictionary using pandas?

I have the following nested dictionary:
dict1 = {'a': 1,'b': 2,'remaining': {'c': 3,'d': 4}}
I want to create a dataframe using pandas in order to achieve the following:
df = pd.DataFrame(columns=list('abcd'))
df.loc[0] = [1,2,3,4]
You could pop the 'remaining' dict and use it to update dict1, then wrap each value in a list so that every column has one row:
import pandas as pd

nested = dict1.pop('remaining')
dict1.update(nested)
pd.DataFrame({k: [v] for k, v in dict1.items()})
   a  b  c  d
0  1  2  3  4
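You can use pandas.json_normalize. It names the nested columns remaining.c and remaining.d, so rename the columns afterwards:
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'remaining': {'c': 3, 'd': 4}}
df = pd.json_normalize(dict1)
df.columns = list('abcd')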
Result:
a b c d
0 1 2 3 4
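For deeper or irregular nesting, a small recursive generator can collect every leaf without mutating dict1 (unlike pop) and without renaming columns by position. This is a sketch with a hypothetical helper name, flatten_leaves. It assumes leaf keys are unique across levels; a repeated key would overwrite an earlier one.

```python
import pandas as pd


def flatten_leaves(d):
    """Hypothetical helper: yield (key, value) for every non-dict value, at any depth.

    Assumes leaf keys are unique across nesting levels.
    """
    for k, v in d.items():
        if isinstance(v, dict):
            yield from flatten_leaves(v)  # descend into nested dicts
        else:
            yield k, v


dict1 = {'a': 1, 'b': 2, 'remaining': {'c': 3, 'd': 4}}
flat = dict(flatten_leaves(dict1))
df = pd.DataFrame([flat])  # a list with one dict -> a single-row frame
```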

Convert dictionary with nested lists to a specific pandas df structure

How can I convert a dictionary to the "Desired output" below?
I have a dictionary with a nested list as values:
dictionary = {'Person1': ['a', 1, 'b', 2], 'Person2': ['c', 3, 'd', 4]} #(the dict is longer than this, this is just an example)
Desired output:
name       letter  value
'Person1'  'a'     1
'Person1'  'b'     2
'Person2'  'c'     3
'Person2'  'd'     4
Current code:
import pandas as pd
df = pd.DataFrame.from_dict(dictionary)
Convert the dictionary to a list of tuples by zipping the even-indexed items (letters) with the odd-indexed items (values) of each list in a list comprehension, then pass the list to the DataFrame constructor:
L = [(k, *x) for k, v in dictionary.items() for x in zip(v[::2], v[1::2])]
df = pd.DataFrame(L, columns=['name','letter','value'])
print (df)
name letter value
0 Person1 a 1
1 Person1 b 2
2 Person2 c 3
3 Person2 d 4
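An equivalent pairing idiom zips a single iterator with itself, so each step consumes one letter and then one value without building the two slices. This is an alternative sketch, not the answer's code, and it assumes every list has an even length (an unpaired trailing item is silently dropped by zip).

```python
import pandas as pd

dictionary = {'Person1': ['a', 1, 'b', 2], 'Person2': ['c', 3, 'd', 4]}

rows = []
for name, flat in dictionary.items():
    it = iter(flat)
    # zip pulls from the same iterator twice per step -> (letter, value) pairs
    for letter, value in zip(it, it):
        rows.append((name, letter, value))

df = pd.DataFrame(rows, columns=['name', 'letter', 'value'])
```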

Map a Pandas series with a dictionary as argument, where the value is a tuple

I am trying to map a column of my df with a dictionary. My dictionary contains tuples as values, and I only want the first element of each tuple. How can I achieve that?
my_dict = {'foo': (1, 0.1)}
df['original_column'] = 'foo'
What I get so far:
df['mapped column'] = (1, 0.1)
What I want:
df['mapped column'] = 1
Any idea?
Use Series.map with a new dictionary, built by a dictionary comprehension, that keeps only the first value of each tuple:
import pandas as pd

df = pd.DataFrame({
    'original_column': ['foo', 'bar', 'baz']
})
my_dict = {'foo': (1, 0.1), 'bar': (2, 0.5), 'baz': (5, 6)}
d = {k: v[0] for k, v in my_dict.items()}
df['mapped column'] = df['original_column'].map(d)
print(df)
  original_column  mapped column
0             foo              1
1             bar              2
2             baz              5
Another solution is to map with the original dictionary and select the first value of each tuple with str[0], but performance is worse for a large DataFrame:
my_dict = {'foo': (1, 0.1), 'bar':(2,0.5),'baz':(5,6)}
df['mapped column'] = df['original_column'].map(my_dict).str[0]
print (df)
original_column mapped column
0 foo 1
1 bar 2
2 baz 5
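Both approaches handle values that have no entry in the dictionary the same way: the result is NaN. In the sketch below, 'qux' is a hypothetical value added only to show that case. With the dictionary comprehension, the NaN comes from map. With str[0], the NaN produced by map passes through the .str accessor unchanged.

```python
import pandas as pd

my_dict = {'foo': (1, 0.1), 'bar': (2, 0.5), 'baz': (5, 6)}
# 'qux' is a hypothetical value with no entry in my_dict
df = pd.DataFrame({'original_column': ['foo', 'bar', 'qux']})

first = {k: v[0] for k, v in my_dict.items()}
via_dict = df['original_column'].map(first)          # unmatched -> NaN
via_str = df['original_column'].map(my_dict).str[0]  # NaN stays NaN under .str[0]
```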

Creating Python Column based on List of list value

I have a list of lists and a dataframe df:
test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]
and the dataframe is:
ID
B
C
D
E
Each inner list represents a hierarchy. I want to create a new column in the dataframe whose value is each element's parent.
My final DataFrame should look like:
value parent
B A
C B
D B
E B
I have a very large dataset, and test_list is also very large.
As per my comments on using a dictionary, here's the code.
import pandas as pd

test_list = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "E"], ["F", "G"]]
parents = {}  # child -> parent (renamed so it doesn't shadow the built-in dict)
for sublist in test_list:
    for n, elem in enumerate(sublist):
        if n != 0:
            parents[elem] = prev  # the previous element in the sublist is the parent
        prev = elem
df = pd.DataFrame(list(parents.items()), columns=['element', 'parent'])
df.set_index('element', inplace=True)
print(df)
giving the following output.
parent
element
B A
C B
D B
E B
G F
You could use a dictionary. Here is a working example:
import pandas as pd

df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E']})
test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]
parent = {}
for element in test_list:
    for i in range(len(element) - 1):
        parent[element[i + 1]] = element[i]  # each element's predecessor is its parent
df['parent'] = [parent[x] for x in df['ID']]
print(df)
  ID parent
0  B      A
1  C      B
2  D      B
3  E      B
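For a very large dataset, the same parent dictionary can be built in one comprehension over consecutive pairs, and Series.map can replace the list comprehension. map returns NaN for IDs that have no parent instead of raising KeyError. In this sketch, 'Z' is a hypothetical ID added only to show that case. As with the loops above, if an element has different parents in different sublists, the last one wins.

```python
import pandas as pd

test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]

# zip(sub, sub[1:]) yields consecutive (parent, child) pairs within each sublist
parent = {child: par for sub in test_list for par, child in zip(sub, sub[1:])}

# 'Z' is a hypothetical ID with no parent in test_list
df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E', 'Z']})
df['parent'] = df['ID'].map(parent)  # unknown IDs -> NaN rather than KeyError
```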
