I am reading in a CSV via csv.DictReader and trying to replace any empty values with None. DictReader appears to present the file as a sequence of dictionaries, where each row of the CSV is one dictionary (which I am fine with). However, when I try to iterate through it row by row and replace any empty values ("") with None, I come unstuck. I had previously written this as a list comprehension:
for row in data:
    row = [None if not x else x for x in row]
But I need to switch to using dictionaries rather than lists. I have no experience with dictionary comprehensions, and when I try to extend this to dictionaries I just can't get it to work. I was thinking of something along the lines of:
for row in data:
    row.values() = [None if not x else x for x in row.values()}
but I just get SyntaxError: invalid syntax. I've tried a lot of other things (too many to list here), like:
for row in data:
    row = {k:None for k,v in row if v not v else v}
but this seems to have the same problem.
For reference, my data looks like:
{'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': ''}
{'colour': 'f7dc00', 'line': '3', 'name': '', 'stripe': 'FFFFFF'}
and would ideally end up as:
{'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': None}
{'colour': 'f7dc00', 'line': '3', 'name': None, 'stripe': 'FFFFFF'}
Your issue is that you are rebinding the name row to a new dictionary inside the for loop; this does not change anything in your original list/DictReader object, data.
If data is a list, you should enumerate over data and change the dictionary stored inside data (or make that index reference a new dictionary).
Example -
for i,row in enumerate(data):
    data[i] = {k:(v if v else None) for k,v in row.items()}
Example test -
>>> data = [{1:2 , 3:''},{4:'',5:6}]
>>> for i,row in enumerate(data):
...     data[i] = {k:(v if v else None) for k,v in row.items()}
...
>>> data
[{1: 2, 3: None}, {4: None, 5: 6}]
And since you are using the DictReader class, you cannot change the DictReader object directly, so you should create a new list and append each changed row to it (or write the rows out with a DictWriter object, which I would prefer) -
Example -
>>> newdata = []
>>> for row in data:
...     newdata.append({k:(v if v else None) for k,v in row.items()})
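Putting it together, a minimal end-to-end sketch (the file name lines.csv is just a placeholder, and this assumes Python 3):

import csv

# Read the CSV with DictReader and replace empty strings with None in every row.
with open('lines.csv', newline='') as f:
    reader = csv.DictReader(f)
    newdata = [{k: (v if v else None) for k, v in row.items()} for row in reader]

print(newdata)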
Your main error is that you are trying to iterate twice over your dictionary whereas you only need to do it once.
Try:
data = {k:(v if v else None) for k,v in data.items()}
without the for-loop.
If you are using CSV and the data is very large, use iteritems() (Python 2 only); this prevents the large intermediate list that items() builds in Python 2.
Try:
new_data=[]
for row in data:
    new_data.append({k:(v if v else None) for k,v in row.iteritems()})
If you don't understand comprehensions, follow this simple for loop:
for row in data:
    for k,v in row.iteritems():
        if not v:
            row[k]=None
The second method is easy to understand and also does not create an additional list, which is better for performance.
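On Python 3, where iteritems() no longer exists, the same in-place loop would look like this:

# Python 3: dict.iteritems() was removed, so use items() instead
for row in data:
    for k, v in row.items():
        if not v:
            row[k] = None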
New to Stack Overflow, apologies if the title isn't that clear.
Effectively I am working with two Excel-to-CSV files, both converted into nested dictionaries using the to_dict method, where the index is the key for each (main?) dictionary and the columns are the keys for each nested dictionary.
i.e.
DICTA = {0: {x:1, y:2, v:3}, 1: {x:5, y:6, v:7}, 2: {x:8, y:9, v:10}}
DICTB = {0: {a:3, b:12, c:13, d:14}, 1: {a:15, b:16, c:17, d:18}, 2: {a:19, b:20, c:21, d:22}}
Values are arbitrary in the example above (the length of both dictionaries will always be the same; the nested dictionaries have different numbers of keys).
Each nested dictionary in DICTB can only be used once to update a nested DICTA dict, i.e. each nested dict in DICTA 'belongs' to a nested dict in DICTB, but not in any specific order.
My aim is to update values (of nested dicts) in DICTA with values from DICTB (the keys differ between the two) if other conditions/values are met. This is what I have so far:
for k, v in DICTA.items():
    i=0
    h=0
    if DICTA[i].get('v') in (DICTB[h].get('a'), DICTB[h].get('b')):
        if DICTB[h].get('a') != '15':  # another condition I need to put in
            DICTA[i].update({'x': DICTB[h].get('c')})
            DICTA[i].update({'y': DICTB[h].get('d')})
            i+=1
        else:
            DICTA[i].update({'y': DICTB[h].get('c')})
            DICTA[i].update({'x': DICTB[h].get('d')})
            i+=1
    else:
        h+=1
Actual output:
In: DICTA
Out: {0: {x:13, y:14, v:3}, 1: {x:5, y:6, v:7}, 2: {x:8, y:9, v:10}}
Expected Output for the above:
In: DICTA
Out: {0: {x:13, y:14, v:3}, 1: {x:18, y:17, v:7}, 2: {x:21, y:22, v:10}}
My issue is that this works for the first DICTA entry but then fails to update the next two, i.e. it clearly doesn't increment i or h correctly to move on to the next nested dictionary.
Fully aware the above might be painfully un-pythonic and am very much open to easier ways of solving this.
Thanks guys appreciate any help with the above.
If I understand you correctly this should work:
for row_a, row_b in zip(DICTA.values(), DICTB.values()):
    if row_a.get('v') in (row_b.get('a'), row_b.get('b')):
        if row_b.get('a') != '15':
            row_a.update({
                'x': row_b.get('c'),
                'y': row_b.get('d')
            })
        else:
            row_a.update({
                'y': row_b.get('c'),
                'x': row_b.get('d')
            })
Also instead of:
row_a.update({
    'x': row_b.get('c'),
    'y': row_b.get('d')
})
You could use:
row_a['x'] = row_b.get('c')
row_a['y'] = row_b.get('d')
but that's a question of preference.
The output must be like this:
[{'id': '1', 'first_name': 'Heidie','gender': 'Female'}, {'id': '2', 'first_name': 'Adaline', 'gender': 'Female'}, {...}
Here is a code snippet that works and meets this requirement:
import csv

with open('./test.csv', 'r') as file_read:
    reader = csv.DictReader(file_read, skipinitialspace=True)
    listDict = [{k: v for k, v in row.items()} for row in reader]
print(listDict)
However, I can't understand some points about the code above:
List comprehension: listDict = [{k: v for k, v in row.items()} for row in reader]
How does Python interpret this?
How does the interpreter assemble a list where every entry has the header fields (id, first_name, gender) as keys and the row values as values?
What would the implementation of this code look like with nested for loops?
I read these answers, but I still do not understand:
python list comprehension double for
convert csv file to list of dictionaries
My csv file:
id,first_name,last_name,email,gender
1,Heidie,Philimore,hphilimore0#msu.edu,Female
2,Adaline,Wapplington,awapplington1#icq.com,Female
3,Erin,Copland,ecopland2#google.co.uk,Female
4,Way,Buckthought,wbuckthought3#usa.gov,Male
5,Adan,McComiskey,amccomiskey4#theatlantic.com,Male
6,Kilian,Creane,kcreane5#hud.gov,Male
7,Mandy,McManamon,mmcmanamon6#omniture.com,Female
8,Cherish,Futcher,cfutcher7#accuweather.com,Female
9,Dave,Tosney,dtosney8#businesswire.com,Male
10,Torr,Kiebes,tkiebes9#dyndns.org,Male
Your list comprehension:
listDict = [{k: v for k, v in row.items()} for row in reader]
equals:
item_list = []
# go through every row
for row in reader:
    item_dict = {}
    # in every row go through each item
    for k, v in row.items():
        # add each item's k, v to the dict
        item_dict[k] = v
    # append every item_dict to item_list
    item_list.append(item_dict)
print(item_list)
EDIT (some more explanation):
#lets create a list
list_ = [x ** 2 for x in range(0,10)]
print(list_)
this returns:
[0,1,4,9,16,25,36,49,64,81]
You can write this as:
list_ = []
for x in range(0,10):
    list_.append(x ** 2)
So in that example, yes, you read it 'backwards'.
Now assume the next:
#lets create a list
list_ = [x ** 2 for x in range(0,10) if x % 2 == 0]
print(list_)
this returns:
[0,4,16,36,64]
You can write this as:
list_ = []
for x in range(0,10):
    if x % 2 == 0:
        list_.append(x ** 2)
So that's not 100% backwards, but it should be clear what's happening. Hope this helps you!
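As a side note, on Python 3 each DictReader row is already a mapping, so the inner comprehension can be replaced with a plain dict() copy; a rough sketch:

import csv

# dict(row) produces the same plain dictionary as the inner {k: v ...} comprehension
with open('./test.csv', 'r') as file_read:
    reader = csv.DictReader(file_read, skipinitialspace=True)
    listDict = [dict(row) for row in reader]

print(listDict)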
I am reading a dictionary from external source, let's say
data = {'name': 'James', 'gender': 'male'}
And sometimes
data = {'name': 'James', 'gender': 'male', 'article': {'title':'abc'}}
And sometimes
data = {'name': 'James', 'gender': 'male', 'article': None}
I know that I can use .get(key, default) when I am not sure if articles exists in data:
articles = data.get('article', {}).get('title')
But sometimes they provide the element with a None value, so the above doesn't work and causes an error, and it needs to become:
articles = data.get('article') or {}
But this requires me to break it into two statements instead of chaining gets to pull values out of article as mentioned earlier.
Is there a more elegant way to do that, something like:
data.get('article', {}, ignore=[None])
or
data.get_ignore_none('article', {})
By default .get() will return None if the key doesn't exist. In your case you are returning an empty dictionary.
Now, I don't know what error is being raised, but I am sure it's from get_stuff(article) rather than your list comprehension.
You have a few ways to solve this:
Modify get_stuff so that it takes the value directly, rather than each element. This way, you are just passing it [get_stuff(value) for value in data.get('articles')]. Now, in get_stuff, you simply do this:
def get_stuff(foo):
    if not foo:
        return None
    for item in foo:
        ...  # do stuff with each item
    return normal_results
Guard the iterable in your list comprehension so that a missing or None 'articles' falls back to an empty list:
[get_stuff(foo) for foo in data.get('articles') or []]
There's nothing wrong with using exceptions to break out early in this case. I'm assuming you want the title value, or None, no matter what the data is. The following function will work (for Python 3).
def get_title(d):
    try:
        return d.get("article").get("title")
    except AttributeError:
        return None
If the outer lookup returns None, either as a stored value or as the default, the chained .get raises an AttributeError on the None object, which you simply catch.
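A quick check against the three sample dictionaries from the question:

for d in ({'name': 'James', 'gender': 'male'},
          {'name': 'James', 'gender': 'male', 'article': {'title': 'abc'}},
          {'name': 'James', 'gender': 'male', 'article': None}):
    print(get_title(d))   # prints None, abc, None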
First off, you seem to think that using an or expression to discard falsy results from data.get('article') can only be done in two statements, like the following:
temp = data.get('article') or {}
articles = temp.get("title")
However, you can just put parentheses around the first expression and call .get("title") on its return value directly:
articles = (data.get('article') or {}).get("title")
But I feel this is not particularly readable or efficient: when 'article' is missing or None, you create a new mapping and check it for "title" unnecessarily.
One possible solution is to use a function like the following:
def nested_get(mapping, *keys):
    """Gets a value from a nested dictionary.
    If any key along the way is missing or None, None is returned.
    Will raise an AttributeError if a value in the chain is not a dictionary
    (i.e. does not support the .get method)."""
    current = mapping
    for item in keys:
        current = current.get(item)
        if current is None:
            return None
    return current
Then you would do nested_get(data, "article", "title") to try to get data["article"]["title"] without throwing errors if data["article"] is None or missing.
I tested this with the following code:
test_cases = [{'name': 'James', 'gender': 'male'},
              {'name': 'James', 'gender': 'male', 'article': {'title':'abc'}},
              {'name': 'James', 'gender': 'male', 'article': None}]

for case in test_cases:
    print(case)
    print(nested_get(case, "article", "title"))
    print()

# the following will raise an error since mapping["a"] would need to be a dict
nested_get({"a": 5}, "a", "b")
How about this:
>>> data = {1:(42,23)}
>>> [x for x in data.get(1) or []]
[42, 23]
>>> [x for x in data.get(32) or []]
[]
Use or to fall back to your default value in case you get None or something else that evaluates to false.
Edit:
In the same way you can use or and parentheses to get the desired output in one line:
articles = (data.get('article') or {}).get('title')
and with just that you handle all 3 cases.
You can also define get_ignore_none, for example:
def get_ignore_none(data_dict, key, default=None):
    if key in data_dict:
        value = data_dict[key]
        return value if value is not None else default
    return default
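For instance, chaining it with the third sample dictionary from the question:

data = {'name': 'James', 'gender': 'male', 'article': None}

# 'article' exists but is None, so the default {} is returned and the chained .get is safe
articles = get_ignore_none(data, 'article', {}).get('title')
print(articles)   # None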
Since you're loading this data from an external source, one option is a preprocessing step as soon as you load it:
from collections.abc import Mapping

def remove_none(d):
    # iterate over a copy of the items so keys can be deleted while looping
    for k, v in list(d.items()):
        if v is None:
            del d[k]
        elif isinstance(v, Mapping):
            remove_none(v)
data = load_data_from_somewhere()
remove_none(data)
Now you can just use get everywhere you need to:
articles = data.get('article', {}).get('title')
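For example, with the third sample dictionary:

data = {'name': 'James', 'gender': 'male', 'article': None}
remove_none(data)                                # drops the 'article' key entirely
print(data.get('article', {}).get('title'))      # None, no AttributeError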
Thanks to this other thread, I've successfully written my dictionary to a csv as a beginner using Python:
Writing a dictionary to a csv file with one line for every 'key: value'
dict1 = {0 : 24.7548, 1: 34.2422, 2: 19.3290}
The CSV looks like this:
0 24.7548
1 34.2422
2 19.3290
Now, I'm wondering what would be the best approach to organize several dictionaries with the same keys. I'm looking to have the keys as the first column, then the dict values in columns after that, with a header row that labels the columns by dictionary name.
Sure, there are a lot of threads trying to do similar things, such as Trouble writing a dictionary to csv with keys as headers and values as columns, but they don't have my data structured in the same way (yet…). Maybe the dictionaries must be merged first.
dict2 = {0 : 13.422, 1 : 9.2308, 2 : 20.132}
dict3 = {0 : 32.2422, 1 : 23.342, 2 : 32.424}
My ideal output:
ID dict1 dict2 dict3
0 24.7548 13.422 32.2422
1 34.2422 9.2308 23.342
2 19.3290 20.132 32.424
I'm not sure yet how the column name ID for the keys will work its way in there.
Use the csv module and list comprehension:
import csv
dict1 = {0: 33.422, 1: 39.2308, 2: 30.132}
dict2 = {0: 42.2422, 1: 43.342, 2: 42.424}
dict3 = {0: 13.422, 1: 9.2308, 2: 20.132}
dict4 = {0: 32.2422, 1: 23.342, 2: 32.424}
dicts = dict1, dict2, dict3, dict4
with open('my_data.csv', 'wb') as ofile:
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow(['ID', 'dict1', 'dict2', 'dict3', 'dict4'])
    for key in dict1.iterkeys():
        writer.writerow([key] + [d[key] for d in dicts])
Note that dictionaries are unordered by default, so if you want the keys in ascending order, you have to sort them:
for key in sorted(dict1.iterkeys(), key=lambda x: int(x)):
    writer.writerow([key] + [d[key] for d in dicts])
If you need to handle situations where you can't be sure that all dicts have the same keys, you'll need to change some small stuff:
with open('my_data.csv', 'wb') as ofile:
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow(['ID', 'dict1', 'dict2', 'dict3', 'dict4'])
    keys = set().union(*(d.keys() for d in dicts))
    for key in keys:
        writer.writerow([key] + [d.get(key, None) for d in dicts])
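This answer is written for Python 2 ('wb' mode, iterkeys()); a rough Python 3 equivalent, using the same dict1 to dict4 and dicts as above, would be:

import csv

# Python 3: open in text mode with newline='' and use plain keys()
with open('my_data.csv', 'w', newline='') as ofile:
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow(['ID', 'dict1', 'dict2', 'dict3', 'dict4'])
    for key in sorted(dict1.keys()):
        writer.writerow([key] + [d.get(key, None) for d in dicts])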
Use defaultdict(list)
from collections import defaultdict

merged_dict = defaultdict(list)
dict_list = [dict1, dict2, dict3]
for d in dict_list:
    for k, v in d.items():
        merged_dict[k].append(v)
This is what you get:
{0: [24.7548, 13.422, 32.2422], 1: [34.2422, 9.2308, 23.342], 2: [19.329, 20.132, 32.424]}
Then write merged_dict to a CSV file as you previously did for a single dict. This time the writerow method of the csv module will be helpful.
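A rough sketch of that last step (the output file name and header labels are just placeholders):

import csv

# Write one row per key: the key first, then the values collected from each dict.
with open('merged.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['ID', 'dict1', 'dict2', 'dict3'])
    for k in sorted(merged_dict):
        writer.writerow([k] + merged_dict[k])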
Here is one way to do it.
from functools import reduce

my_dicts = [dict1, dict2, dict3]
dict_names = range(1, len(my_dicts)+1)
header = "ID," + ",".join(map(lambda x: "dict"+str(x), dict_names)) + "\n"
all_possible_keys = set(reduce(lambda x, y: x + list(y.keys()), my_dicts, []))
with open("file_to_write.csv", "w") as output_file:
    output_file.write(header)
    for k in all_possible_keys:
        row = ",".join([str(k)] + [str(d.get(k, None)) for d in my_dicts])
        output_file.write(row + "\n")
It has been some time since I used Python, but here's my suggestion.
In Python, dictionary values can be of any type (as far as I remember, don't flame me if I'm wrong). At least it should be possible to map your keys to lists.
So you can loop over your dictionaries and build a new dictionary 'd', and for each key, append the value to the list stored under that key in 'd'.
Then you can write out the new dictionary as: (pseudocode)
for each key,value in dictionary
write key
write TAB
for each v in value
write v + TAB
write new line
end for
This doesn't include the 'header names' though, but I'm sure that's quite easy to add.
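A hedged Python rendering of that pseudocode, using dict1 to dict3 from the question (the output file name is a placeholder, and there is still no header row):

# Merge the dictionaries into a list per key, then write one tab-separated line per key.
d = {}
for source in (dict1, dict2, dict3):
    for key, value in source.items():
        d.setdefault(key, []).append(value)

with open('merged.tsv', 'w') as out:
    for key, values in d.items():
        out.write(str(key) + '\t' + '\t'.join(str(v) for v in values) + '\n')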
Here is my issue:
I am doing an LDAP search in Python. The search will return a dictionary object:
{'mail':['user#mail.com'],'mobile':['07852242242'], 'telephoneNumber':['01112512152']}
As you can see, the returned dictionary contains list values.
Sometimes, no result will be found:
{'mail':None, 'mobile':None, 'telephoneNumber':['01112512152']}
To extract the required values, I am using get() so as to avoid exceptions if the dictionary item does not exist.
return {"uname":x.get('mail')[0], "telephone":x.get('telephoneNumber')[0], "mobile":x.get('mobile')[0]}
I want to return my own dictionary as above, with just the string values, but I'm struggling to find an efficient way to check that the lists are not None, and I keep running into index errors or type errors:
(<type 'exceptions.TypeError'>, TypeError("'NoneType' object is unsubscriptable",)
Is there a way to use a get() method on a list, so that if the list is None it won't throw an exception?
{"uname":x.get('mail').get(0)}
What is the most efficient way of getting the first value of a list or returning None without using:
if isinstance(x.get('mail'),list):
or
if x.get('mail') is not None:
If you want to flatten your dictionary, you can just do:
>>> d = {'mail':None, 'mobile':None, 'telephoneNumber':['01112512152']}
>>>
>>> dict((k,v and v[0] or v) for k,v in d.items())
{'mail': None, 'mobile': None, 'telephoneNumber': '01112512152'}
If you'd also like to filter, cutting off the None values, then you could do:
>>> dict((k,v[0]) for k,v in d.items() if v)
{'telephoneNumber': '01112512152'}
Try the following:
return {"uname":(x.get('mail') or [None])[0], ...
It is a bit unreadable, so you probably want to wrap it into some helper function.
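For instance, one way to wrap it (the helper name is just a suggestion), expanded to the three keys from the question:

def first_or_none(value):
    # return the first element of a list, or None if the list is None or empty
    return (value or [None])[0]

return {"uname": first_or_none(x.get('mail')),
        "telephone": first_or_none(x.get('telephoneNumber')),
        "mobile": first_or_none(x.get('mobile'))}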
You could do something like this:
input_dict = {'mail':None, 'mobile':None, 'telephoneNumber':['01112512152']}
input_key_map = {
    'mail': 'uname',
    'telephoneNumber': 'telephone',
    'mobile': 'mobile',
}
dict((new_name, input_dict[old_name][0])
     for old_name, new_name in input_key_map.items() if input_dict.get(old_name))
# would print:
{'telephone': '01112512152'}
I'm not sure if there's a straightforward way to do this, but you can try:
default_value = [None]
new_x = dict((k, default_value if not v else v) for k, v in x.iteritems())
And use new_x instead of x.
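On Python 3, where iteritems() no longer exists, the same idea as a dict comprehension would be:

# map falsy values (None, empty list) to [None] so indexing with [0] is always safe
new_x = {k: (v if v else [None]) for k, v in x.items()}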