I have a dictionary that's something like this.
d = {
"p1": ["$0.00", np.nan, "$25.00"],
"p2": ["$30.25", np.nan, "$12.25"],
"p3": ["$0.00"],
"p4": [np.nan, "$15.00"]
}
I'd like to look up values from the dictionary and assign them to a column in a pandas DataFrame. The DataFrame looks something like this:
df = pd.DataFrame({
'name': ["p1","p2","p3","p4"],
'val': [np.nan, np.nan, 4, np.nan]
})
Assignment:
def func(name):
    # look up from dict
    # apply condition
    return
df['val'] = df.apply(lambda x: func(x['name']), axis=1)
For the assignment, the condition is to take the largest value from the dictionary list, provided the list is non-empty (i.e., it has at least one element). Note that the elements are stored as strings, so the function needs to convert them to float.
Expected output:
First rework your dictionary to keep the "highest" value with a custom key. Then perform a simple map:
d2 = {k: max((e for e in v if pd.notna(e)),
key=lambda x: float(x.strip('$')) if isinstance(x, str) else x)
for k,v in d.items()}
# {'p1': '$25.00', 'p2': '$30.25', 'p3': '$0.00', 'p4': '$15.00'}
df['val'] = df['name'].map(d2)
output:
name val
0 p1 $25.00
1 p2 $30.25
2 p3 $0.00
3 p4 $15.00
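If you want numbers in the column rather than the original strings (the question mentions converting to float), a minimal sketch in the `func(name)` shape from the question could look like this. It assumes every non-missing entry is a string like "$25.00" and that each list has at least one such entry; `price_to_float` is a hypothetical helper name. Using `Series.map` also avoids the row-wise `apply(..., axis=1)`.

```python
import numpy as np
import pandas as pd

d = {
    "p1": ["$0.00", np.nan, "$25.00"],
    "p2": ["$30.25", np.nan, "$12.25"],
    "p3": ["$0.00"],
    "p4": [np.nan, "$15.00"],
}
df = pd.DataFrame({"name": ["p1", "p2", "p3", "p4"],
                   "val": [np.nan, np.nan, 4, np.nan]})

def price_to_float(s):
    # Hypothetical helper: "$25.00" -> 25.0
    return float(s.strip("$"))

def func(name):
    # Look up the list, drop missing entries, and return the largest value as a float.
    # Assumes at least one non-missing entry: max() raises ValueError on an empty list.
    values = [price_to_float(e) for e in d[name] if pd.notna(e)]
    return max(values)

df["val"] = df["name"].map(func)
print(df)
```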
Can anyone advise how to loop over every Nth item in a dictionary?
Essentially I have a dictionary of dataframes and I want to be able to create a new dictionary based on every 3rd dataframe item (including the first) based on index positioning of the original. Once I have this I would like to concatenate the dataframes together.
So, for example, if I have 12 dataframes, I would like the new dataframe to contain the first, fourth, seventh, tenth, etc.
Thanks in advance!
If the dict is required, you can slice a tuple of its keys:
custom_dict = {
'first': 1,
'second': 2,
'third': 3,
'fourth': 4,
'fifth': 5,
'sixth': 6,
'seventh': 7,
'eighth': 8,
'ninth': 9,
'tenth': 10,
'eleventh': 11,
'twelfth': 12,
}
for key in tuple(custom_dict)[::3]:
print(custom_dict[key])
Then you can call pandas.concat:
df = pd.concat(
    [custom_dict[key] for key in tuple(custom_dict)[::3]],
    axis=1,  # append horizontally; use axis=0 to append vertically
)
This assumes custom_dict[key] is a pandas.DataFrame, not an int as in the example above.
What you ask is a bit unusual. Anyway, you have two main options.
Convert your dictionary values to a list and slice it:
out = pd.concat(list(dfs.values())[::3])
output:
a b c
0 x x x
0 x x x
0 x x x
0 x x x
Slice your dictionary keys and build a sub-dictionary; pd.concat then uses the keys as the outer index level:
out = pd.concat({k: dfs[k] for k in list(dfs)[::3]})
output:
a b c
df1 0 x x x
df4 0 x x x
df7 0 x x x
df10 0 x x x
Used input:
dfs = {f'df{i+1}': pd.DataFrame([['x']*3], columns=['a', 'b', 'c']) for i in range(12)}
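As a variant of the slicing approaches above, `itertools.islice` can step through the dictionary's items directly without first building a full list of keys or values. This is a sketch using the same sample `dfs`; like the answers above, it relies on dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
from itertools import islice

import pandas as pd

dfs = {f"df{i+1}": pd.DataFrame([["x"] * 3], columns=["a", "b", "c"]) for i in range(12)}

# islice(iterable, start, stop, step): take items 0, 3, 6, 9 (first, fourth, seventh, tenth).
every_third = dict(islice(dfs.items(), 0, None, 3))

# Passing a dict to concat labels each block with its key, as in the second option above.
out = pd.concat(every_third)
print(out)
```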
I have a dataframe (very large, millions of rows). Here is how it looks:
id value
a1 0:0,1:10,2:0,3:0,4:7
b4 0:5,1:0,2:0,3:0,4:1
c5 0:0,1:3,2:2,3:0,4:0
k2 0:0,1:2,2:0,3:4,4:0
I want to turn those strings into dictionaries, keeping only the key-value pairs that contain no 0. So the desired result is:
id value
a1 {1:10, 4:7}
b4 {4:1}
c5 {1:3, 2:2}
k2 {1:2, 3:4}
How can I do that? When I try to use the dict() function, it raises KeyError: 0:
df["value"] = dict(df["value"])
So I have trouble turning it into a dictionary in the first place.
I have also tried this:
df["value"] = json.loads(df["value"])
but it raises the same error.
This could do the trick, simply using list comprehensions:
import pandas as pd
dt = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
"value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
def to_dict1(s):
    return [dict([map(int, y.split(":")) for y in x.split(",") if "0" not in y.split(":")]) for x in s]
dt["dict"] = to_dict1(dt["value"])
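Another way to obtain the same result is with a regular expression. In the pattern (?!0{1})(\d):(?!0{1})(\d+), the negative lookahead (?!0{1}) rejects a key or value that starts with 0, so zero pairs are never captured (this assumes single-digit keys, as in the sample):
import re
def to_dict2(s):
    return [dict([map(int, y) for y in re.findall(r"(?!0{1})(\d):(?!0{1})(\d+)", x)]) for x in s]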
In terms of performance, to_dict1 is almost 20% faster, according to my tests.
This code produces the result you want. I built a sample input from the one you provided and printed the result at the end.
import pandas as pd
df = pd.DataFrame(
{
'id': ['a1', 'b4', 'c5', 'k2'],
'value': ['0:0,1:10,2:0,3:0,4:7', '0:5,1:0,2:0,3:0,4:1', '0:0,1:3,2:2,3:0,4:0', '0:0,1:2,2:0,3:4,4:0']
}
)
value = []  # temporary list holding only the key, value pairs without 0
for i, row in df.iterrows():
    pairs = row['value'].split(',')
    d = dict()
    for pair in pairs:
        k, v = pair.split(':')
        k = int(k)
        v = int(v)
        if (k != 0) and (v != 0):
            d[k] = v
    value.append(d)
df['value'] = pd.Series(value)
print(df)
# id value
#0 a1 {1: 10, 4: 7}
#1 b4 {4: 1}
#2 c5 {1: 3, 2: 2}
#3 k2 {1: 2, 3: 4}
def make_dict(row):
    """Requires a list of "key:value" strings, e.g.
    ["0:0", "1:10", ...]"""
    return {key: val for key, val
            in map(lambda x: map(int, x.split(":")), row)
            if key != 0 and val != 0}
df["value"] = df.value.str.split(",").apply(make_dict)
This is how I would do it:
def string_to_dict(s):
    d = {}
    pairs = s.split(',')  # get each "key:value" pair
    for pair in pairs:
        key, value = map(int, pair.split(':'))  # split key from value, as ints
        if key and value:  # skip pairs where the key or the value is zero
            d[key] = value
    return d
df['value'] = df['value'].apply(string_to_dict)
Use a dictionary comprehension to drop any pair whose key or value is zero:
txt="""id value
a1 0:0,1:10,2:0,3:0,4:7
b4 0:5,1:0,2:0,3:0,4:1
c5 0:0,1:3,2:2,3:0,4:0
k2 0:0,1:2,2:0,3:4,4:0 """
df = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
"value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
df['value'] = [
    {int(k): int(v)
     for k, v in (x.split(':') for x in s.split(','))
     if int(k) != 0 and int(v) != 0}
    for s in df['value']
]
print(df)
output:
id value
0 a1 {1: 10, 4: 7}
1 b4 {4: 1}
2 c5 {1: 3, 2: 2}
3 k2 {1: 2, 3: 4}
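A different route, sketched here as an alternative and not benchmarked, is to let pandas do the string parsing with `Series.str.extractall`, which returns one row per regex match indexed by (original row label, match number). Unlike the single-digit pattern above, `(\d+):(\d+)` also handles multi-digit keys. The variable names `pairs` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a1", "b4", "c5", "k2"],
    "value": ["0:0,1:10,2:0,3:0,4:7", "0:5,1:0,2:0,3:0,4:1",
              "0:0,1:3,2:2,3:0,4:0", "0:0,1:2,2:0,3:4,4:0"],
})

# One row per "key:value" match; index levels are (row label, match number).
pairs = df["value"].str.extractall(r"(\d+):(\d+)").astype(int)
pairs.columns = ["key", "val"]

# Keep only pairs where neither the key nor the value is zero.
pairs = pairs[(pairs["key"] != 0) & (pairs["val"] != 0)]

# Rebuild one dict per original row; rows with no surviving pair stay {}.
result = {label: {} for label in df.index}
for (label, _match), k, v in zip(pairs.index, pairs["key"], pairs["val"]):
    result[label][int(k)] = int(v)

df["value"] = [result[label] for label in df.index]
print(df)
```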
I have a dataframe in which one column contains a dictionary in every row. I want to select the rows whose dictionary contains a specific value, regardless of which key holds it.
The dictionaries have many levels (they contain a lot of lists, with a lot of dictionaries, again with a lot of lists and so on).
The data could look similar to this, but with the dictionaries being more complex:
specific_value = 2
df = pd.DataFrame({"A": [1,2,3], "B": [{"a":1}, {"b":specific_value}, {"c":3}]})
A B
0 1 {'a': 1}
1 2 {'b': 2}
2 3 {'c': 3}
I tried:
df.B.apply(lambda x : 'specific_value' in x.values())
I get False even for the rows that I know contain 'specific_value'. I am unsure whether it is because of the nested layers.
You could use a recursive function to search for the specific value:
import pandas as pd
def nested_find_value(d, needle=4):
    # we assume d is always a list or dictionary
    haystack = d.values() if isinstance(d, dict) else d
    for hay in haystack:
        if isinstance(hay, (list, dict)):
            yield from nested_find_value(hay, needle)
        else:
            yield hay == needle
def find(d, needle=4):
    return any(nested_find_value(d, needle))
df = pd.DataFrame({"A": [1, 2, 3], "B": [{"a": 1}, {"b": {"d": 4}}, {"c": 3}]})
result = df["B"].apply(find)
print(result)
Output
0 False
1 True
2 False
Name: B, dtype: bool
In the example above the specific value is 4.
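The question's attempt, `'specific_value' in x.values()`, only checks the top-level values, which is why nested matches are missed. To finish the job and actually select the rows, the boolean result can be used as a mask. The sketch below is a variant of the recursive answer that returns a bool directly and also descends into tuples; `contains_value` is a hypothetical name, and only dict values (not keys) are searched.

```python
import pandas as pd

def contains_value(obj, needle):
    # Recurse into dict values, lists and tuples; compare anything else to needle.
    if isinstance(obj, dict):
        return any(contains_value(v, needle) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return any(contains_value(item, needle) for item in obj)
    return obj == needle

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [{"a": 1}, {"b": [{"d": 4}]}, {"c": (3, "x")}]})

# Extra keyword arguments to Series.apply are forwarded to the function.
mask = df["B"].apply(contains_value, needle=4)
matches = df[mask]
print(matches)
```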
I have test data that is gathered from multiple inputs and results in a single output. I'm currently storing this data in a dictionary whose keys are my parameter/result labels and whose values are the test conditions and results. I would like to filter the data so I can generate plots based on isolated conditions.
In my example below, my test conditions would be 'a' and 'b', and the result of the experiment would be 'c'. I want to filter my data so I get a dictionary with the same key, value structure and only my filtered results. However my current dictionary comprehension returns an empty dictionary. Any advice to get the desired result?
Current Code:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered_data = {k:v for k,v in data.iteritems() if v in data['b'] >= 20}
Desired Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Current Result:
{}
Also, is this dictionary of lists a good schema to store data of this type, given that I'm going to want to filter the results, or is there a better way to accomplish this?
Use this:
filtered_data = {k: [v[i] for i in range(len(v)) if data['b'][i] >= 20] for k, v in data.items()}
Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Consider using the pandas module for this type of work.
import pandas as pd
df = pd.DataFrame(data)
df = df[df["b"] >= 20]
print(df)
This appears to give you what you want. You are using the dictionary keys as column names and the values as the rows of each column, so the data maps naturally onto a dataframe.
Result:
a b c
3 0 20 2.3
4 1 20 2.9
5 2 20 3.4
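If you need the filtered data back in the original dict-of-lists shape (the desired result in the question), `DataFrame.to_dict` with `orient="list"` produces exactly that. A short sketch using the question's data:

```python
import pandas as pd

data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}

df = pd.DataFrame(data)
filtered = df[df["b"] >= 20]

# orient="list" gives {column: [values, ...]}, the same shape as the input dict.
filtered_data = filtered.to_dict(orient="list")
print(filtered_data)
```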
Are all of the dictionary value lists in matching order? If so, you could look at whichever list you want to filter by, say 'b' in this case, find the values you want, and then apply those indices (or the same slice) to the other lists in the dictionary.
For example:
matching_indices = []
for i, value in enumerate(data['b']):
    if value >= 20:
        matching_indices.append(i)
new_dict = {}
for key in data:
    new_dict[key] = [data[key][i] for i in matching_indices]
You could probably figure a dictionary comprehension for it if you wanted. Hopefully this is clear.
You can turn this into a function, which gives it more flexibility. With your current logic, datasets a and c are neglected because none of their values is greater than or equal to 20:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filter_vals = ['a', 'b']
new_d = {}
for k, v in data.items():
    if k in filter_vals:
        new_d[k] = [i for i in v if i >= 20]
print(new_d)
I'm not a big fan of many if statements, but something like this is straightforward and can be called many times:
def my_filter(operator, condition, filter_vals, my_dict):
    new_d = {}
    for k, v in my_dict.items():
        if k in filter_vals:
            if operator == '>':
                new_d[k] = [i for i in v if i > condition]
            elif operator == '<':
                new_d[k] = [i for i in v if i < condition]
            elif operator == '<=':
                new_d[k] = [i for i in v if i <= condition]
            elif operator == '>=':
                new_d[k] = [i for i in v if i >= condition]
    return new_d
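The if/elif chain can be replaced by a lookup table into Python's standard `operator` module, which holds function versions of the comparison operators. This is a sketch of that alternative; the `OPS` table and the `op` parameter name are illustrative choices, and the parameter is renamed so it does not shadow the `operator` module.

```python
import operator

# Map operator strings to the comparison functions in the operator module.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

def my_filter(op, condition, filter_vals, my_dict):
    compare = OPS[op]  # raises KeyError for an unsupported operator string
    return {k: [i for i in v if compare(i, condition)]
            for k, v in my_dict.items() if k in filter_vals}

data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
print(my_filter('>=', 20, ['a', 'b'], data))
```

Adding a new comparison then means adding one entry to the table rather than another branch.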
I agree with the pandas approach above.
If for some reason you hate pandas or are an old-school computer scientist, tuples are a good way to store relational data. In your example, the a, b, and c lists are columns rather than rows. With tuples, you would store the rows as:
data = {'a':(0,10,1.3),'b':(1,10,1.9),'c':(2,10,2.3),'d':(0,20,2.3),'e':(1,20,2.9),'f':(2,20,3.4)}
where the tuples are stored in the (condition1, condition2, outcome) format you described and you can call a single test or filter a set as you describe. From there you can get a filtered set of results as follows:
filtered_data = {k: v for k, v in data.items() if v[1] >= 20}
which returns:
{'d': (0, 20, 2.3), 'e': (1, 20, 2.9), 'f': (2, 20, 3.4)}
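If positional indexing like `v[1]` gets hard to read, `collections.namedtuple` keeps the same row-as-tuple layout but adds field names. This is a sketch; the type name `Trial` and the field names `cond1`, `cond2`, `outcome` are hypothetical stand-ins for the (condition1, condition2, outcome) format.

```python
from collections import namedtuple

# Hypothetical field names for the (condition1, condition2, outcome) layout.
Trial = namedtuple("Trial", ["cond1", "cond2", "outcome"])

data = {
    'a': Trial(0, 10, 1.3), 'b': Trial(1, 10, 1.9), 'c': Trial(2, 10, 2.3),
    'd': Trial(0, 20, 2.3), 'e': Trial(1, 20, 2.9), 'f': Trial(2, 20, 3.4),
}

# Same filter as above, but by field name instead of position.
filtered_data = {k: t for k, t in data.items() if t.cond2 >= 20}
print(filtered_data)
```

Namedtuples are still tuples, so they compare equal to the plain tuples shown above.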
I have a dictionary of key-value pairs.
The values are integers. I would like to get the sum of the values that satisfy a condition, say all values > 0.
I've tried a few variations, but unfortunately nothing seems to work.
Try using the values method on the dictionary (which returns a view object in Python 3.x), iterating through each value and summing it if it is greater than 0 (or whatever your condition is):
In [1]: d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
In [2]: sum(v for v in d.values() if v > 0)
Out[2]: 23
>>> a = {'a': 5, 'b': 8}
>>> sum(value for _, value in a.items() if value > 0)
13
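Since the question says "say all values > 0", the condition may change. A small sketch that takes the condition as a function keeps the generator-expression approach from the answers above; `sum_if` is a hypothetical helper name.

```python
def sum_if(d, predicate):
    # Sum only the dictionary values for which predicate(value) is true.
    return sum(v for v in d.values() if predicate(v))

d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}

total_positive = sum_if(d, lambda v: v > 0)   # 1 + 2 + 20
total_even = sum_if(d, lambda v: v % 2 == 0)  # 2 + 20 + (-4)
print(total_positive, total_even)
```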