Replace empty values of a dictionary with NaN - python

I have a dictionary with missing values (the key is there, but the associated value is empty). For example I want the dictionary below:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
to be changed to this form:
dct = {'ID':NaN, 'gender':'male', 'age':'20', 'weight':NaN, 'height':'5.7'}
How can I write that in the most time-efficient way?

You can use a dictionary comprehension. Also, as was noted in the comments, naming a variable dict in Python is not good practice, because it shadows the built-in type:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
dct = {k: None if not v else v for k, v in dct.items() }
print(dct)
Output:
{'ID': None, 'gender': 'male', 'age': '20', 'weight': None, 'height': '5.7'}
Just replace None with whatever you want it to default to.
In your question, you want to replace with NaN.
You can use any of the following:
float('nan') if you are using Python 2.x or Python 3 before 3.5 (it also works on later versions)
math.nan for Python 3.5+
numpy.nan using numpy
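As a minimal sketch of the NaN variant (assuming Python 3.7+ for math.nan and ordered dictionaries; the names cleaned and missing are only illustrative), note that NaN never compares equal to itself, so it is safer to detect it with math.isnan() than with ==:

```python
import math

dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}

# Replace only empty strings; every other value is kept as-is.
cleaned = {k: math.nan if v == '' else v for k, v in dct.items()}

# NaN is not equal to itself, so use math.isnan() to find the replaced entries.
missing = [k for k, v in cleaned.items()
           if isinstance(v, float) and math.isnan(v)]
print(missing)  # ['ID', 'weight']
```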

You can use a more implicit syntax with a boolean or expression:
In [1]: dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
In [2]: {k: v or None for k, v in dct.items()}
Out[2]: {'ID': None, 'age': '20', 'gender': 'male', 'height': '5.7', 'weight': None}
But be aware that in The Zen of Python it's said:
Explicit is better than implicit.
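One practical reason to prefer the explicit form: v or None replaces every falsy value, not only empty strings. A small sketch with made-up data (the sample dictionary is hypothetical, not from the question):

```python
sample = {'ID': '', 'count': 0, 'tags': [], 'name': 'x'}

# Implicit: every falsy value ('', 0, []) becomes None.
implicit = {k: v or None for k, v in sample.items()}

# Explicit: only the empty string is treated as missing.
explicit = {k: None if v == '' else v for k, v in sample.items()}

print(implicit)
print(explicit)
```

If the dictionary can only ever hold strings, as in the question, both forms give the same result.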

You can create a class object to represent NaN:
class NaN:
    def __init__(self, default=None):
        self.val = default
    def __repr__(self):
        return 'NaN'
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
new_d = {a:NaN() if not b else b for a, b in dct.items()}
Output:
{'gender': 'male', 'age': '20', 'ID': NaN, 'weight': NaN, 'height': '5.7'}

You can use a for loop to iterate over all of the keys and values in the dictionary.
dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}
for key, value in dct.items():
    if value == '':
        dct[key] = 'NaN'
print(dct)
You created your dictionary with a series of key value pairs.
I used a for loop and the .items() method to iterate over each key value pair in your dictionary.
If the value of a key/value pair is an empty string, we change that particular value to 'NaN' and leave the rest unchanged.
When we print the new dictionary we get this output:
{'ID': 'NaN', 'gender': 'male', 'age': '20', 'weight': 'NaN', 'height': '5.7'}
This is time efficient because it makes a single pass over the dictionary, so long as you are okay with the 'NaN' values being strings. I am not sure whether you want them to be strings; if not, you can simply change 'NaN' to None.

Related

Adding key and value to dictionary in python based on other dictionaries

I am using for loop in python and every loop creates a dictionary. I have the below set of dictionaries created.
{'name': 'xxxx'}
{'name': 'yyyy','age':'28'}
{'name': 'zzzz','age':'27','sex':'F'}
My requirement is to compare all the dictionaries created, find the keys that are missing from each one, add those keys with empty values, and order every dictionary by key.
Expected output:
{'age':'','name': 'xxxx','sex':''}
{'age':'28','name': 'yyyy','sex':''}
{'age':'27','name': 'zzzz','sex':'F'}
How can I achieve this in Python?
If you want to modify the dicts in-place, dict.setdefault would be easy enough.
my_dicts = [
{'name': 'xxxx'},
{'name': 'yyyy','age':'28'},
{'name': 'zzzz','age':'27','sex':'F'},
]
desired_keys = ['name', 'age', 'sex']
for d in my_dicts:
    for key in desired_keys:
        d.setdefault(key, "")
print(my_dicts)
prints out
[
{'name': 'xxxx', 'age': '', 'sex': ''},
{'name': 'yyyy', 'age': '28', 'sex': ''},
{'name': 'zzzz', 'age': '27', 'sex': 'F'},
]
If you don't want to hard-code the desired_keys list, you can make it a set and gather it from the dicts before the loop above.
desired_keys = set()
for d in my_dicts:
    desired_keys.update(set(d))  # update with keys from `d`
Another option, if you want new dicts instead of modifying them in place, is
desired_keys = ... # whichever method you like
empty_dict = dict.fromkeys(desired_keys, "")
new_dicts = [{**empty_dict, **d} for d in my_dicts]
EDIT based on comments:
This doesn't remove keys that are not there in desired keys.
This will leave only the desired keys:
desired_keys = ... # Must be a set
for d in my_dicts:
    for key in desired_keys:
        d.setdefault(key, "")
    for key in set(d) - desired_keys:
        d.pop(key)
However, at that point it might be easier to just create new dicts:
new_dicts = [
    {key: d.get(key, "") for key in desired_keys}
    for d in my_dicts
]
data = [{'name': 'xxxx'},
{'name': 'yyyy','age':'28'},
{'name': 'zzzz','age':'27','sex':'F'}]
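First take the dictionary with the most keys to get the full set of keys (this assumes the longest dictionary contains every key that appears in the others).
Then use dict.get to default each missing key to an empty string, and sort each dictionary by key. You can combine a list comprehension and a dict comprehension: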
allKD = max(data, key=len)
[dict(sorted({k:d.get(k, '') for k in allKD}.items(), key=lambda x:x[0])) for d in data]
OUTPUT:
[{'age': '', 'name': 'xxxx', 'sex': ''},
{'age': '28', 'name': 'yyyy', 'sex': ''},
{'age': '27', 'name': 'zzzz', 'sex': 'F'}]
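Note that max(data, key=len) only finds every key when the longest dictionary happens to contain all of them. A sketch that takes the union of the keys instead (the fourth record with 'city' is a hypothetical addition, included only to show the difference):

```python
data = [{'name': 'xxxx'},
        {'name': 'yyyy', 'age': '28'},
        {'name': 'zzzz', 'age': '27', 'sex': 'F'},
        {'name': 'wwww', 'city': 'Oslo'}]  # hypothetical extra record

# Union of all keys, sorted so every output dict has the same key order.
all_keys = sorted(set().union(*data))

# Fill in an empty string for every key a dict does not have.
filled = [{k: d.get(k, '') for k in all_keys} for d in data]
print(filled[0])  # {'age': '', 'city': '', 'name': 'xxxx', 'sex': ''}
```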
One approach:
from operator import or_
from functools import reduce
lst = [{'name': 'xxxx'},
{'name': 'yyyy', 'age': '28'},
{'name': 'zzzz', 'age': '27', 'sex': 'F'}]
# find all the keys
keys = reduce(or_, map(dict.keys, lst))
# update each dictionary with the complement of the keys
for d in lst:
    d.update(dict.fromkeys(keys - d.keys(), ""))
print(lst)
Output
[{'name': 'xxxx', 'age': '', 'sex': ''}, {'name': 'yyyy', 'age': '28', 'sex': ''}, {'name': 'zzzz', 'age': '27', 'sex': 'F'}]

Is it possible to write a function that returns multiple values in a specific format?

I'm writing a function that returns two values which will form the key-value pair of a dictionary. This function will be used to create a dictionary with dictionary comprehension. However, using dictionary comprehension the pair of values need to be provided in the format 'key: value'. To accomplish this, I have to call the function twice. Once for the key, and once for the value. For example,
sample_list = [['John', '24', 'M', 'English'],
['Jeanne', '21', 'F', 'French'],
['Yuhanna', '22', 'M', 'Arabic']]
def key_value_creator(sample_list):
    key = sample_list[0]
    value = {'age': sample_list[1],
             'gender': sample_list[2],
             'lang': sample_list[3]}
    return key, value
dictionary = {key_value_creator(item)[0]: \
key_value_creator(item)[1] for item in sample_list}
As you can see, the function is called twice to generate values that can be generated in one run. Is there a way to return the values in a format that can be usable by the comprehension? If that is possible, the function need only be called once, as such:
dictionary = {key_value_creator(item) for item in sample_list}
As far as I have seen, the other ways of returning multiple values are to return them in the form of a dictionary or list,
return {'key': key, 'value': value}
return [key, value]
but either way, to access them we will have to call the function twice.
dictionary = {key_value_creator(item)['key']: \
key_value_creator(item)['value'] for item in sample_list}
dictionary = {key_value_creator(item)[0]: \
key_value_creator(item)[1] for item in sample_list}
Is there a way to format these values so that we can send them to the dictionary comprehension statement in the format that it requires?
EDIT:
Expected Output:
{ 'John': {'age': '24', 'gender': 'M', 'lang': 'English'},
'Jeanne': {'age': '21', 'gender': 'F', 'lang': 'French'},
'Yuhanna': {'age': '22', 'gender': 'M', 'lang': 'Arabic'}}
Just use the dict builtin function, expecting a sequence of (key, value) pairs as returned by your key_value_creator function and making a dict from those:
>>> dict(map(key_value_creator, sample_list))
{'Jeanne': {'age': '21', 'gender': 'F', 'lang': 'French'},
'John': {'age': '24', 'gender': 'M', 'lang': 'English'},
'Yuhanna': {'age': '22', 'gender': 'M', 'lang': 'Arabic'}}
Also works with a generator expression instead of map:
>>> dict(key_value_creator(item) for item in sample_list)
Or use a dictionary comprehension with a nested generator expression and tuple-unpacking:
>>> {k: v for k, v in (key_value_creator(item) for item in sample_list)}
Or without your key_value_creator function, just using a nested dictionary comprehension:
>>> {n: {"age": a, "gender": g, "lang": l} for n, a, g, l in sample_list}

Accessing all values from same key for different dictionaries within nested dictionary

I have a nested dictionary:
d = { 'wing': {'name': 'Ali', 'age': '19'},
'scrumHalf': {'name': 'Bob', 'age': '25'},
'flyHalf': {'name': 'Sam', 'age': '43'},
'prop': {'name': 'rob', 'age': '33'}}
I want to pull out the values for age only to generate a list
[19, 25, 43, 33]
I want to do this using a for loop, and as naively as possible, as I usually find that easiest to understand.
I have managed to print all of the keys using a for loop:
for i in d:
    print i
    for j in d[i]:
        print j
but when I tried to edit it to print the values I got the error
NameError: name 'value' is not defined. How can I get 'value' to mean the value attached to a key?
Here is my edited version
for i in d:
    print (i[value])
    for j in d[i]:
        print (j[value])
I am using python 2.7
You can access values in the dict with the help of the method values():
[i['age'] for i in d.values()]
# ['19', '25', '43', '33']
>>> [d.get(k).get('age') for k, v in d.items()]
['33', '25', '19', '43']
To access a dictionary's values, you first iterate over its keys, which you are already doing correctly with for i in d:. To get the value stored under key i, use d[i], which gives you the inner dictionary, for example {'name': 'rob', 'age': '33'}. To get the field you want, index into that dictionary once more: d[i]['age'].
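The explanation above can be written as a plain for loop. This sketch also converts each age to int, since the question asks for numbers while the dictionary stores strings. The output order shown assumes Python 3.7+, where dictionaries keep insertion order; on Python 2.7 the order may differ.

```python
d = {'wing': {'name': 'Ali', 'age': '19'},
     'scrumHalf': {'name': 'Bob', 'age': '25'},
     'flyHalf': {'name': 'Sam', 'age': '43'},
     'prop': {'name': 'rob', 'age': '33'}}

ages = []
for position in d:                    # iterate over the outer keys
    player = d[position]              # inner dict stored under that key
    ages.append(int(player['age']))   # convert the string to a number

print(ages)  # [19, 25, 43, 33]
```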

How to convert a list to dictionary where some keys are missing their values?

I have a list of objects with their counts given after a colon. I am trying to convert this list into a dictionary, but some keys will be missing their values once converted. I tried using try/except but am not sure how to store each value individually in the dictionary.
Example:
t = ['Contact:10', 'Account:20','Campaign:', 'Country:', 'City:']
The Campaign and Country objects would have no values when converting. I would like to either pass or assign a NaN as the dictionary value.
I tried something like this, but to no avail.
for objects in t:
    try:
        dictionary = dict(objects.split(":") for objects in t)
    except:
        pass
Any suggestion is appreciated.
You do not really need try/except:
t = ['Contact:10', 'Account:20','Campaign:', 'Country:', 'City:']
{ a: b for a,b in (i.split(':') for i in t) }
this yields:
{'Account': '20', 'Campaign': '', 'City': '', 'Contact': '10', 'Country': ''}
If you want None instead of empty string:
{ a: b if b else None for a,b in (i.split(':') for i in t) }
You can use a generator expression with a split of each item and pass the output to the dict constructor:
dict(i.split(':') for i in t)
This returns:
{'Contact': '10', 'Account': '20', 'Campaign': '', 'Country': '', 'City': ''}
If you would like to assign NaN as a default value, you can do it with a dict comprehension instead:
{a: b or float('nan') for i in t for a, b in (i.split(':'),)}
This returns:
{'Contact': '10', 'Account': '20', 'Campaign': nan, 'Country': nan, 'City': nan}
If the value is missing, it will be an empty string
>>> 'foo:'.split(':')
['foo', '']
So this leads us to
data = {}
for pair in t:
    key, value = pair.split(':')
    # int('') raises ValueError, so test for the empty string first
    data[key] = int(value) if value else float('nan')
This could be cleaned up a little with a dictionary comprehension.
pairs = (pair.split(':') for pair in t)
data = {key: int(value) if value else float('nan') for key, value in pairs}
You could also decline to put those keys in the dictionary like so
data = {}
for pair in t:
    key, value = pair.split(':')
    if value:
        data[key] = int(value)
t = ['Contact:10', 'Account:20','Campaign:', 'Country:', 'City:']
d = {}
for obj in t:
    field = obj.split(':')
    d[field[0]] = field[1] if field[1] else None
print(d)
Output:
{'Contact': '10', 'Account': '20', 'Campaign': None, 'Country': None, 'City': None}
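A slightly more defensive sketch of the same idea, wrapped in a helper (the name parse_counts and its missing parameter are hypothetical). It uses str.partition, so an entry with no colon at all is treated as a missing value instead of failing on unpacking. It assumes every non-empty value is an integer:

```python
def parse_counts(items, missing=float('nan')):
    """Turn 'key:value' strings into a dict; empty values become `missing`."""
    result = {}
    for item in items:
        # partition splits on the first ':' and always returns three parts.
        key, _, value = item.partition(':')
        result[key] = int(value) if value.strip() else missing
    return result

t = ['Contact:10', 'Account:20', 'Campaign:', 'Country:', 'City:']
counts = parse_counts(t, missing=None)
print(counts)
```

Passing missing=None gives None for the empty entries; leaving it out falls back to NaN.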

How should I remove all dicts from a list that have None as one of their values?

Suppose I have a list like so:
[{'name': 'Blah1', 'age': x}, {'name': 'Blah2', 'age': y}, {'name': None, 'age': None}]
It is guaranteed that both 'name' and 'age' values will either be filled or empty.
I tried this:
for person_dict in list:
    if person_dict['name'] == None:
        list.remove(person_dict)
But obviously that does not work because the for loop skips over an index sometimes and ignores some blank people.
I am relatively new to Python, and I am wondering if there is a list method that can target dicts with a certain value associated with a key.
EDIT: Fixed tuple notation to list as comments pointed out
Just test for the presence of None in the dict's values to test ALL dict keys for the None value:
>>> ToD=({'name': 'Blah1', 'age': 'x'}, {'name': 'Blah2', 'age': 'y'}, {'name': None, 'age': None})
>>> [e for e in ToD if None not in e.values()]
[{'name': 'Blah1', 'age': 'x'}, {'name': 'Blah2', 'age': 'y'}]
Or, use filter (wrapped in list(), since filter returns an iterator on Python 3):
>>> list(filter(lambda d: None not in d.values(), ToD))
[{'name': 'Blah1', 'age': 'x'}, {'name': 'Blah2', 'age': 'y'}]
Or, if it is a limited test to 'name':
>>> list(filter(lambda d: d['name'], ToD))
[{'name': 'Blah1', 'age': 'x'}, {'name': 'Blah2', 'age': 'y'}]
You can use list comprehension as a filter like this
[c_dict for c_dict in dict_lst if all(c_dict[key] is not None for key in c_dict)]
This will make sure that you get only the dictionaries where all the values are not None.
# iterate backwards so deletions do not shift the items still to be checked
for index in range(len(lis) - 1, -1, -1):
    if lis[index]['name'] is None:
        del lis[index]
You can also try:
lis = [person_dict for person_dict in lis if person_dict['name'] is not None]
Never use list as a variable name.
You can create a new list with the accepted data. If you have a tuple, then you have to create a new list.
List comprehension could be faster but this version is more readable for beginners.
data = ({'name': 'Blah1', 'age': 'x'}, {'name': 'Blah2', 'age': 'y'}, {'name': None, 'age': None})
new_data = []
for x in data:
    if x['name']:  # if x['name'] is not None and x['name'] != ''
        new_data.append(x)
print(new_data)
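If the list has to be modified in place (for example because other code holds a reference to it), one option is slice assignment. A sketch with hypothetical sample data; the alias variable exists only to show that the original list object is the one being updated:

```python
people = [{'name': 'Blah1', 'age': 'x'},
          {'name': None, 'age': None},
          {'name': None, 'age': None},
          {'name': 'Blah2', 'age': 'y'}]
alias = people  # a second reference to the same list object

# Build the filtered list first, then replace the contents in place.
people[:] = [p for p in people if p['name'] is not None]

print(alias)
```

Because the filtered list is built before anything is removed, two blank entries in a row are both dropped, which is the case the original loop missed.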
