I have a data frame like this. What I am trying to do is search the Description column, using for loops, to see if it contains any of the strings in my dictionary. The results look good to me, but I do not know how to save them to a data frame, a list, or any sort of file I can export:
import pandas as pd

data = {'ID': ['1', '2'],
        'Description': ['there is a good book which is best for kids.', 'there is a bad book which worst for kids.'],
        }
df = pd.DataFrame(data, columns=['ID', 'Description'])

myDict = {'A': {'best', 'good'}, 'D': {'bad', 'worst'}}

for i in range(len(df)):
    for key, val in myDict.items():
        for item in val:
            if item in df['Description'][i]:
                print(item)
                print(i)
good
0
best
0
bad
1
worst
1
The output should look like this. How do I create a dataframe or list to capture the results?
#0 good best
#1 bad worst
Instead of printing the matches, append them to a list containing the matches for the current row of the dataframe. Then append that row's list of results to the overall list of results.
result = []
for i in range(len(df)):
    row = [i]
    for key, val in myDict.items():
        for item in val:
            if item in df['Description'][i]:
                row.append(item)
    result.append(row)
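If you also want that result as a DataFrame (the other format the question asks about), one possible follow-up, a minimal sketch that assumes the result list built above, is:

# Join each row's matched words into one string; word order may vary
# because the dictionary values are sets.
matches_df = pd.DataFrame(
    [(row[0], ' '.join(row[1:])) for row in result],
    columns=['row', 'matches'],
)
print(matches_df)
# row 0 -> 'good best', row 1 -> 'bad worst' (word order may vary)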
If I understand correctly, you want to aggregate the results in some data structure, potentially even as a list of tuples? I added two lines to your code snippet:
import pandas as pd

data = {'ID': ['1', '2'],
        'Description': ['there is a good book which is best for kids.', 'there is a bad book which worst for kids.'],
        }
df = pd.DataFrame(data, columns=['ID', 'Description'])

myDict = {'A': {'best', 'good'}, 'D': {'bad', 'worst'}}

results = []  # aggregate results into a list
for i in range(len(df)):
    for key, val in myDict.items():
        for item in val:
            if item in df['Description'][i]:
                print(item)
                print(i)
                results.append((item, i))  # results == [("good", 0), ("best", 0), ...]

# You can print them out like this
for x, y in results:
    print("{} {}".format(x, y))
Say I have a long list and I want to iteratively join the pieces to produce a final dataframe.
The data is originally in a dict, so I need to iterate over the dictionary first.
header = ['apple', 'pear', 'cocoa']

for key, value in data.items():
    for idx in header:
        # Flatten the dictionary to a dataframe
        data_df = pd.json_normalize(data[key][idx])
        # Here I start to get lost...
How can I iteratively join the dataframe?
Manually it can be done like this:
data_df = pd.json_normalize(data["ParentKey"]['apple'])
data_df1 = pd.json_normalize(data["ParentKey"]['pear'])
final_df = data_df1.join(data_df, lsuffix='_left')
# or
final_df = pd.concat([data_df, data_df1], axis=1, sort=False)
Since the list will be large, I want to iterate over it instead. How can I achieve this?
Is this what you're looking for? You can use k as a counter to indicate whether or not it's the first iteration, and for subsequent ones just join onto that same dataframe:
header = ['apple', 'pear', 'cocoa']

k = 0
for key, value in data.items():
    for idx in header:
        data_df = pd.json_normalize(data[key][idx])
        if k == 0:
            # first frame: start the result from it
            final_df = data_df
        else:
            # later frames: join onto the accumulated result
            final_df = final_df.join(data_df, lsuffix='_left')
        k += 1
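An alternative sketch (assuming the same data and header as in the question) is to collect the flattened frames in a list and concatenate once at the end, which avoids repeated joins:

import pandas as pd

frames = []
for key, value in data.items():
    for idx in header:
        # Flatten each nested dict into its own DataFrame
        frames.append(pd.json_normalize(data[key][idx]))

# Single concatenation along the columns, as in the manual example
final_df = pd.concat(frames, axis=1, sort=False)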
I have a for loop in which I need to assign the value of a new key, taken from another dictionary. I am trying to avoid a for loop inside a for loop. My first dictionary looks like the following:
ids = [{'1020': 'ID-2522'}, {'1030': 'ID-2523'}, {'1040': 'ID-2524'}]
The list of dictionaries I am looping through looks like the following:
data = [{'sf_id': '1020', 'TotalPrice': '504'}, {'sf_id': '1030', 'TotalPrice': '400'}, {'sf_id': '1040', 'TotalPrice': '500'}]
Here is my for loop:
for index, my_dict in enumerate(data):
    for key, value in my_dict.items():
        new_id = my_dict["sf_id"]
        opportunity = ids[new_id]
So that it grabs the corresponding value. The desired output would be:
print(opportunity)
ID-2522
ID-2523
ID-2524
Assuming there are no duplicate keys in the entire collection, you can use reduce to merge every id dict into a single dict of ids, so you only iterate over both collections once:
from functools import reduce

ids = [{'1020': 'ID-2522'}, {'1030': 'ID-2523'}, {'1040': 'ID-2524'}]
data = [{'sf_id': '1020', 'TotalPrice': '504'}, {'sf_id': '1030', 'TotalPrice': '400'}, {'sf_id': '1040', 'TotalPrice': '500'}]

ids_map = reduce(lambda x, y: x.update(y) or x, ids, {})

for my_dict in data:
    new_id = my_dict["sf_id"]
    opportunity = ids_map[new_id]
    print(opportunity)
As noted in #mad_'s comment, ids is a list of dicts, not a dict. This makes looking up one of the keys very inefficient, as you would have to go through the whole list to find it.
So, the first step is to merge all of these small dicts into a large one.
Then, we can efficiently build a list with your expected output:
ids = [{'1020': 'ID-2522'}, {'1030': 'ID-2523'}, {'1040': 'ID-2524'}]
data = [{'sf_id': '1020', 'TotalPrice': '504'},
        {'sf_id': '1030', 'TotalPrice': '400'},
        {'sf_id': '1040', 'TotalPrice': '500'}]

ids_dict = {k: v for dct in ids for k, v in dct.items()}

opportunity = []
for my_dict in data:
    new_id = my_dict["sf_id"]
    opportunity.append(ids_dict[new_id])

print(opportunity)
# ['ID-2522', 'ID-2523', 'ID-2524']
(Note that you don't need enumerate if you don't use the index)
Or, shorter, with a list comprehension:
opportunity = [ids_dict[my_dict["sf_id"]] for my_dict in data]
print(opportunity)
# ['ID-2522', 'ID-2523', 'ID-2524']
# Alias the original list of dicts (note: this is not a copy, so popitem
# below will also empty the dicts inside ids)
new_ids = ids
# pop the items from the original list of dicts and save them in a new dict
sftoap_dict = {}
for d in new_ids:
    key, value = d.popitem()
    sftoap_dict[key] = value
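For example, with the merged sftoap_dict you can then look up every opportunity in a single pass (a sketch assuming the ids and data samples shown above):

# sftoap_dict is now {'1020': 'ID-2522', '1030': 'ID-2523', '1040': 'ID-2524'}
opportunities = [sftoap_dict[d['sf_id']] for d in data]
print(opportunities)
# ['ID-2522', 'ID-2523', 'ID-2524']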
If there is a single value, it should become a key with an empty list as its value; if there are multiple values, the first value should be the key and the rest should be the list of values in the dictionary.
Example column:
ram
sneha, vijay, harish
deva
babu, dominic
Expected output:
{
'ram':[],
'sneha': ['vijay', 'harish'],
'deva' : [],
'babu' : ['dominic']
}
You can simply run a loop that takes one row at a time, sets index 0 of the row as the key, and the remaining values as the list of values.
It can be done like this:
col = [['ram'], ['sneha', 'vijay', 'harish'], ['deva'], ['babu', 'dominic']]

out = {}
for row in col:
    out[row[0]] = [x for x in row[1:len(row)]]

print(out)
Assuming your column data is in the format I described, the solution below gives the expected output:
dict = {}
cols = [['ram'], ['sneha', 'vijay', 'harish'], ['deva'], ['babu', 'dominic']]
for row in cols:
    dict[row[0]] = [item for item in row[1:len(row)]]
You can do it like this:
columns_list = [['ram'], ['sneha', 'vijay', 'harish'],
['deva'], ['babu', 'dominic']]
result = {item[0]: item[1:] for item in columns_list}
print(result)
# {'ram': [], 'sneha': ['vijay', 'harish'], 'deva': [], 'babu': ['dominic']}
If the input is a string, then you can do it like this:
rows = '''ram
sneha, vijay, harish
deva
babu, dominic'''
columns_list = [row.split(',') for row in rows.split("\n")]
# if columns have spaces at the beginning or end then `strip` them
columns_list = tuple(map(lambda cols: [c.strip() for c in cols], columns_list))
print(columns_list)
# (['ram'], ['sneha', 'vijay', 'harish'], ['deva'], ['babu', 'dominic'])
result = {item[0]: item[1:] for item in columns_list}
print(result)
# {'ram': [], 'sneha': ['vijay', 'harish'], 'deva': [], 'babu': ['dominic']}
If each row of your source data is a string object, then:
data = ["ram", "sneha, vijay, harish", "deva", "babu, dominic"]
res = {}
for i in data:
val = i.split(",")
res[val[0]]= list(map(str.strip, val[1:]))
print(res)
Output:
{'babu': ['dominic'], 'deva': [], 'ram': [], 'sneha': ['vijay', 'harish']}
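If the source is actually a pandas column rather than a plain Python list, a similar sketch (assuming a hypothetical Series named col holding one comma-separated string per row) would be:

import pandas as pd

col = pd.Series(['ram', 'sneha, vijay, harish', 'deva', 'babu, dominic'])

result = {}
for row in col:
    # Split on commas and strip surrounding whitespace
    parts = [p.strip() for p in row.split(',')]
    result[parts[0]] = parts[1:]

print(result)
# {'ram': [], 'sneha': ['vijay', 'harish'], 'deva': [], 'babu': ['dominic']}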
I am reading in a CSV via csv.DictReader and trying to replace any empty values with None. DictReader seems to treat the file as a sequence of dictionaries, where each row of the CSV is a dictionary (which I am fine with). However, when I try to iterate through it row/dictionary by row/dictionary and replace any empty values ("") with None, I get stuck. I had previously written this as a list comprehension like this:
for row in data:
    row = [None if not x else x for x in row]
But I need to switch to using dictionaries rather than lists. I've not had any experience with dictionary comprehensions before, and when I try to extend this to dictionaries I just can't get it to work. I was thinking something along the lines of:
for row in data:
    row.values() = [None if not x else x for x in row.values()}
but I just get SyntaxError: invalid syntax. I've tried a lot of other things (too many to list here) like:
for row in data:
    row = {k:None for k,v in row if v not v else v}
but this seems to have the same problem.
For reference, my data looks like:
{'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': ''}
{'colour': 'f7dc00', 'line': '3', 'name': '', 'stripe': 'FFFFFF'}
and would ideally end up as:
{'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': None}
{'colour': 'f7dc00', 'line': '3', 'name': None, 'stripe': 'FFFFFF'}
Your issue is that you are rebinding the name row to a new dictionary inside the for loop; this will not change anything inside your original list/DictReader object, data.
If data is a list, you should enumerate over data and change the dictionary inside data (or make that entry reference a new dictionary).
Example -
for i, row in enumerate(data):
    data[i] = {k: (v if v else None) for k, v in row.items()}
Example test -
>>> data = [{1:2 , 3:''},{4:'',5:6}]
>>> for i,row in enumerate(data):
...     data[i] = {k:(v if v else None) for k,v in row.items()}
...
>>> data
[{1: 2, 3: None}, {4: None, 5: 6}]
And since you are using the DictReader class, you cannot directly change the DictReader object, so you should create a new list and add the changed rows to it (or use a DictWriter object; I would prefer the DictWriter object):
Example -
>>> newdata = []
>>> for row in data:
...     newdata.append({k:(v if v else None) for k,v in row.items()})
Your main error is that you are trying to iterate twice over your dictionary whereas you only need to do it once.
Try:
data = {k:(v if v else None) for k,v in data.items()}
without the for-loop.
If you are using CSV and the data is too large, use iteritems() (Python 2 only); this prevents the large intermediate list that items() would build in Python 2. (In Python 3, items() already returns a view, so plain items() is fine.)
Try:
new_data = []
for row in data:
    new_data.append({k: (v if v else None) for k, v in row.iteritems()})
If you don't understand comprehensions, follow this simple for loop:
for row in data:
    for k, v in row.iteritems():
        if not v:
            row[k] = None
The second method is easy to understand and also does not create an additional list, which is better for performance.
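Putting it together, a minimal end-to-end sketch with csv.DictReader (assuming a hypothetical file name lines.csv with the columns shown above, and Python 3) could look like this:

import csv

cleaned = []
with open('lines.csv', newline='') as f:  # hypothetical file name
    reader = csv.DictReader(f)
    for row in reader:
        # Replace empty strings with None, keep everything else as-is
        cleaned.append({k: (v if v != '' else None) for k, v in row.items()})

print(cleaned)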
I have a list of dicts:
list = [{'id':'1234','name':'Jason'},
{'id':'2345','name':'Tom'},
{'id':'3456','name':'Art'}]
How can I efficiently find the index position [0],[1], or [2] by matching on name = 'Tom'?
If this were a one-dimensional list I could do list.index() but I'm not sure how to proceed by searching the values of the dicts within the list.
lst = [{'id':'1234','name':'Jason'}, {'id':'2345','name':'Tom'}, {'id':'3456','name':'Art'}]
tom_index = next((index for (index, d) in enumerate(lst) if d["name"] == "Tom"), None)
# 1
If you need to fetch repeatedly by name, you should index the items by name (using a dictionary); this way, get operations would be O(1) time. An idea:
def build_dict(seq, key):
    return dict((d[key], dict(d, index=index)) for (index, d) in enumerate(seq))

people_by_name = build_dict(lst, key="name")
tom_info = people_by_name.get("Tom")
# {'index': 1, 'id': '2345', 'name': 'Tom'}
A simple readable version is
def find(lst, key, value):
    for i, dic in enumerate(lst):
        if dic[key] == value:
            return i
    return -1
It won't be efficient, as you need to walk the list checking every item in it (O(n)). If you want efficiency, you can use a dict of dicts.
On the question, here's one possible way to find it (though, if you want to stick to this data structure, it's actually more efficient to use a generator as Brent Newey has written in the comments; see also tokland's answer):
>>> L = [{'id':'1234','name':'Jason'},
... {'id':'2345','name':'Tom'},
... {'id':'3456','name':'Art'}]
>>> [i for i,_ in enumerate(L) if _['name'] == 'Tom'][0]
1
Seems most logical to use a filter/index combo:
names=[{}, {'name': 'Tom'},{'name': 'Tony'}]
names.index(next(filter(lambda n: n.get('name') == 'Tom', names)))
1
And if you think there could be multiple matches:
[names.index(item) for item in filter(lambda n: n.get('name') == 'Tom', names)]
[1]
The answer offered by #faham is a nice one-liner, but it doesn't return the index of the dictionary containing the value; instead it returns the dictionary itself. Here is a simple way to get a list of indexes (one or more if there are matches, or an empty list if there are none):
list = [{'id':'1234','name':'Jason'},
{'id':'2345','name':'Tom'},
{'id':'3456','name':'Art'}]
[i for i, d in enumerate(list) if 'Tom' in d.values()]
Output:
>>> [1]
What I like about this approach is that with a simple edit you can get a list of both the indexes and the dictionaries as tuples. This is the problem I needed to solve and found these answers. In the following, I added a duplicate value in a different dictionary to show how it works:
list = [{'id':'1234','name':'Jason'},
{'id':'2345','name':'Tom'},
{'id':'3456','name':'Art'},
{'id':'4567','name':'Tom'}]
[(i, d) for i, d in enumerate(list) if 'Tom' in d.values()]
Output:
>>> [(1, {'id': '2345', 'name': 'Tom'}), (3, {'id': '4567', 'name': 'Tom'})]
This solution finds all dictionaries containing 'Tom' in any of their values.
Here's a function that finds the dictionary's index position if it exists.
dicts = [{'id':'1234','name':'Jason'},
         {'id':'2345','name':'Tom'},
         {'id':'3456','name':'Art'}]

def find_index(dicts, key, value):
    class Null: pass
    for i, d in enumerate(dicts):
        if d.get(key, Null) == value:
            return i
    else:
        raise ValueError('no dict with the key and value combination found')

print(find_index(dicts, 'name', 'Tom'))
# 1

find_index(dicts, 'name', 'Ensnare')
# ValueError: no dict with the key and value combination found
One liner!?
elm = ([i for i in mylist if i['name'] == 'Tom'] or [None])[0]
I needed a more general solution to account for the possibility of multiple dictionaries in the list having the key value, and a straightforward implementation using list comprehension:
dict_indices = [i for i, d in enumerate(dict_list) if d[dict_key] == key_value]
def search(itemID, lst):
    # return every dict in lst whose 'itemID' value matches
    return [i for i in lst if i['itemID'] == itemID]
The following will return the index for the first matching item:
['Tom' in i['name'] for i in list].index(True)
My answer is for the case where you only have a single dictionary to work with:
food_time_dict = {"Lina": 312400, "Tom": 360054, "Den": 245800}
print(list(food_time_dict.keys()).index("Lina"))
I take the keys from the dictionary, convert them to a list, and call index() on it (if the key is not present, this raises an error). But for your code:
lists = [{'id': '1234', 'name': 'Jason'},
         {'id': '2345', 'name': 'Tom'},
         {'id': '3456', 'name': 'Art'}]

def dict_in_lists_index(lists, search):  # function for convenience
    j = 0  # outer index of the [j][i] result
    for i in lists:
        try:
            # format the result as [outer index][position of the value]
            return f"[{j}][{list(i.values()).index(search)}]"
        except ValueError:  # search value not in this dict
            pass
        j += 1  # not found yet, move to the next dict
    return "Not Found"

def dict_cropped_index(lists, search):
    for i in lists:
        try:
            return list(i.values()).index(search)
        except ValueError:
            pass
    return "Not Found"

print(dict_in_lists_index(lists, 'Tom'))
print(dict_cropped_index(lists, 'Tom'))
For a given iterable, more_itertools.locate yields positions of items that satisfy a predicate.
import more_itertools as mit
iterable = [
{"id": "1234", "name": "Jason"},
{"id": "2345", "name": "Tom"},
{"id": "3456", "name": "Art"}
]
list(mit.locate(iterable, pred=lambda d: d["name"] == "Tom"))
# [1]
more_itertools is a third-party library that implements itertools recipes among other useful tools.