List of list of dictionaries to CSV - python

I know I can write a list of dictionaries directly to a CSV file. Similarly, is there a direct way to write a list of list of dictionaries to a csv file in python without iterating through each line manually?
Sample data
[[{'e': 46, 'p': 100, 'n': 0, 'a': 100, ...},
{'e': 29, 'p': 40, 'n': 1, 'a': 40, ...}, ...],
[{...}, ...]]
Expected format
e,p,n,a,....
46,100,0,100,....
29,40,1,40,...
.......
Note this is not a list of dictionaries, but a list of list of dictionaries

Without Pandas, you can use itertools.chain to get a flattened list of all dictionaries and then write that to your CSV file with csv.DictWriter:
import csv
from itertools import chain
data = [
[{'e': 46, 'p': 100, 'n': 0, 'a': 100},
{'e': 29, 'p': 40, 'n': 1, 'a': 40}],
[{'e': 56, 'p': 200, 'n': 23, 'a': 10},
{'e': 22, 'p': 41, 'n': 11, 'a': 420}]]
fieldnames = ['e', 'p', 'n', 'a']
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('mydata.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(chain.from_iterable(data))
Output (mydata.csv)
e,p,n,a
46,100,0,100
29,40,1,40
56,200,23,10
22,41,11,420

There must be a way to accomplish your task using just core Python, but I would go for Pandas:
import pandas as pd
d = yourListOfListsOfDictionaries
df = pd.concat(map(pd.DataFrame, d), sort=True)
# a e n p
#0 100 46 0 100
#0 40 29 1 40
df.to_csv(index=False)
#'a,e,n,p\n100,46,0,100\n40,29,1,40\n'

If you want the set of keys to be the union of all of the dictionaries in the list of list of dicts, then you can do something like this:
import csv
x = \
[[{'e': 46, 'p': 100, 'n': 0, 'a': 100},
{'e': 29, 'p': 40, 'n': 1, 'a': 40}],
[{'e': 19, 'p': 10, 'n': 1, 'a': 10, 'b':8}]]
key_dict = {}
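# collect every key seen in any dictionary; a dict is used as an ordered set
for l in x:
    for d in l:
        for k in d:
            key_dict[k] = None
with open('file.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, key_dict.keys())
    writer.writeheader()
    for l in x:
        for d in l:
            # keys missing from d are written as empty fields
            writer.writerow(d)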
Result (on Python 3.7+, where dictionaries keep insertion order, the columns follow first-seen key order):
e,p,n,a,b
46,100,0,100,
29,40,1,40,
19,10,1,10,8
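The same union-of-keys idea can be sketched more compactly with itertools.chain and dict.fromkeys. This is only a sketch under a few assumptions: the names nested, rows and buffer are made up for illustration, an in-memory io.StringIO stands in for the real file, and first-seen column order relies on Python 3.7+.

```python
import csv
import io
from itertools import chain

nested = [
    [{'e': 46, 'p': 100, 'n': 0, 'a': 100},
     {'e': 29, 'p': 40, 'n': 1, 'a': 40}],
    [{'e': 19, 'p': 10, 'n': 1, 'a': 10, 'b': 8}],
]

# Flatten once so the rows can be scanned twice (keys first, then writing).
rows = list(chain.from_iterable(nested))

# Iterating a dict yields its keys; dict.fromkeys drops duplicates
# while keeping first-seen order.
fieldnames = list(dict.fromkeys(chain.from_iterable(rows)))

buffer = io.StringIO()  # stand-in for open('file.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

restval='' is already the default; it is spelled out here only to show where the empty fields for missing keys come from.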

If the dictionaries have the same format, just flatten the list like so (assuming it's indeed a list of lists, two-dimensional):
data = [
[{'a': 10, 'b': 20, 'c': 30}],
[{'a': 20, 'b': 30, 'c': 40},
{'a': 30, 'b': 40, 'c': 50}]
]
rows = [item for sublist in data for item in sublist]
Then just write your rows to the CSV:
import csv
# Python 3: open in text mode with newline='' ('wb' only worked on Python 2)
with open('my_data.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, rows[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(rows)
A combination of the two following posts:
How to make a flat list out of list of lists?
How do I convert this list of dictionaries to a csv file?

Related

Is there a way to map strings to integers automatically? [duplicate]

d = {'col': ['ana', 'ben', 'carl', 'dennis', 'earl', ...]}
df = pd.DataFrame(data = d)
I have an example dataframe here. Usually, if there are more than 5 unique values, one-hot encoding (OHE) will not be used (correct me if I'm wrong).
Instead, mapping using a dictionary is used.
An example dictionary would be
dict = {'ana': 1, 'ben': 2, 'carl': 3, ...}
Is there a library or any way to make this automatic (though manual mapping may be better as you know which values are mapped to which number)?
EDIT 1
Using ascii_lowercase, I am able to map single letter strings to integers. But as shown above, what if my strings are not single letters?
original question
You can generate the dictionary programmatically using string.ascii_lowercase and enumerate in a dictionary comprehension:
from string import ascii_lowercase
dic = {k:v for v,k in enumerate(ascii_lowercase, start=1)}
Output:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
Then you can just map:
df['col'].map(dic)
edit: dictionary from an arbitrary Series of values
You can use pandas.factorize:
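codes, uniques = pd.factorize(df['col'])
# number the unique values from 1; codes has one entry per row, so zipping it
# with uniques would give wrong numbers as soon as a value repeats early
dic = {k: i for i, k in enumerate(uniques, start=1)}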
Output: {'ana': 1, 'ben': 2, 'carl': 3, 'dennis': 4, 'earl': 5}
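If pandas is not needed for this step, the same first-seen numbering can be sketched in plain Python. The sample list names is made up for illustration (it deliberately contains repeats), and the ordering relies on dictionaries keeping insertion order (Python 3.7+).

```python
# Made-up sample values with repeats, standing in for df['col']
names = ['ana', 'ben', 'carl', 'ben', 'dennis', 'ana', 'earl']

# dict.fromkeys removes duplicates while keeping first-seen order,
# and enumerate numbers the unique values starting at 1.
mapping = {name: code for code, name in enumerate(dict.fromkeys(names), start=1)}

# Apply the mapping to every value
encoded = [mapping[name] for name in names]
```

With a DataFrame, the resulting mapping could be passed to df['col'].map(mapping) just like the dictionaries above.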

Create a list of dictionaries with random small alphabet letters as a key and as a random value number from 1-50

number of dictionaries in a list random from 2 to 15
dictionary length is random
for example: [{'d': 15, 'c': 9, 'g': 18, 'm': 33, 's': 10}, {'a': 9, 'h': 50, 'r': 15}]
I would like to use list comprehension, and I started from this:
import string
letter_count = dict((key, 0) for key in string.ascii_lowercase)
print(letter_count)
number_of_dictionaries = 3 # should be random
list_of_dictionaries = [dict() for number in range(number_of_dictionaries)]
I have no idea how to make random key and letters not in order.
Since the letters will be the keys, you need to select non-repeating random letters (which you can do using random.sample). The dictionary can then be created by pairing up the letters with random numbers using zip():
import string
import random
size = 10
keys = random.sample(string.ascii_lowercase,size)
values = (random.randint(1,50) for _ in range(size))
result = dict(zip(keys,values))
print(result)
{'r': 43, 't': 40, 'o': 18, 'm': 8, 'f': 47, 'a': 21, 'y': 50, 'b': 44,
'i': 42, 'w': 31}
To get multiple dictionaries in a list, you can combine this approach in a loop selecting a random size for each (in range 1-26).
dictList = []
for _ in range(random.randint(2,15)):  # random number of dictionaries
    size = random.randint(1,26)  # random dictionary size
    keys = random.sample(string.ascii_lowercase,size)  # random letters
    values = (random.randint(1,50) for _ in range(size))  # random numbers
    oneDict = dict(zip(keys,values))  # assemble dict.
    dictList.append(oneDict)  # add it to the list
print(dictList)
[{'u': 14, 'j': 49},
{'y': 32},
{'y': 7, 'c': 26},
{'p': 11, 'k': 20, 'n': 6},
{'h': 4, 'f': 35, 'w': 19, 'n': 19, 'g': 25, 'p': 4, 'k': 36},
{'h': 47}]
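Since the question asked for a list comprehension, the loop above can also be sketched that way. The helper name random_letter_dict and its max_size parameter are made up for illustration; the ranges (2 to 15 dictionaries, values from 1 to 50) are taken from the question.

```python
import random
import string

def random_letter_dict(max_size=26):
    """Build one dict with a random number of unique lowercase-letter keys."""
    size = random.randint(1, max_size)                   # random dictionary length
    keys = random.sample(string.ascii_lowercase, size)   # unique letters, random order
    return {key: random.randint(1, 50) for key in keys}  # random value per key

# Random number of dictionaries (2 to 15), each built independently
dict_list = [random_letter_dict() for _ in range(random.randint(2, 15))]
```

Keeping the per-dictionary logic in a small function leaves the comprehension itself readable; everything could be inlined, but the nested expression gets hard to follow.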

Writing a list of dictionaries in CSV

The problem: you have a list of dictionaries of the format
[{'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14},
{'a': 20, 'b': 21, 'c': 22, 'd': 23, 'e': 24},
{'a': 30, 'b': 31, 'c': 32, 'd': 33, 'e': 34},
{'a': 40, 'b': 41, 'c': 42, 'd': 43, 'e': 44}]
which you want to write to a CSV file that looks like
"a","b","c","d","e"
10,11,12,13,14
20,21,22,23,24
30,31,32,33,34
40,41,42,43,44
The problem is that when you run this code:
def write_csv_from_list_dict(filename, table, fieldnames, separator, quote):
    table = []
    for dit in table:
        a_row = []
        for fieldname in fieldnames:
            a_row.append(dit[fieldname])
        table.append(a_row)
    file_handle = open(filename, 'wt', newline='')
    csv_write = csv.writer(file_handle,
                           delimiter=separator,
                           quotechar=quote,
                           quoting=csv.QUOTE_NONNUMERIC)
    csv_write.writerow(fieldnames)
    for row in table:
        csv_write.writerow(row)
    file_handler.close()
it raises this error:
(Exception: AttributeError) "'list' object has no attribute 'keys'"
at line 148, in _dict_to_list wrong_fields = rowdict.keys() - self.fieldnames
Why is it so hard to tell it explicitly to close a file, not a string?
The code below should work:
data = [{'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14},
{'a': 20, 'b': 21, 'c': 22, 'd': 23, 'e': 24},
{'a': 30, 'b': 31, 'c': 32, 'd': 33, 'e': 34},
{'a': 40, 'b': 41, 'c': 42, 'd': 43, 'e': 44}]
keys = data[0].keys()
with open('data.csv', 'w') as f:
    f.write(','.join(keys) + '\n')
    for entry in data:
        f.write(','.join([str(v) for v in entry.values()]) + '\n')
data.csv
a,b,c,d,e
10,11,12,13,14
20,21,22,23,24
30,31,32,33,34
40,41,42,43,44
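The output above does not quote the header the way the question's expected file does. A sketch that gets closer to that format uses csv.DictWriter with quoting=csv.QUOTE_NONNUMERIC, which quotes every non-numeric field (here, only the header strings). The io.StringIO buffer is an assumption used so the result can be inspected without touching the disk; a real file opened with newline='' would be used the same way.

```python
import csv
import io

data = [{'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14},
        {'a': 20, 'b': 21, 'c': 22, 'd': 23, 'e': 24},
        {'a': 30, 'b': 31, 'c': 32, 'd': 33, 'e': 34},
        {'a': 40, 'b': 41, 'c': 42, 'd': 43, 'e': 44}]

fieldnames = list(data[0])  # column order taken from the first dictionary

buffer = io.StringIO()  # stand-in for open('data.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        quoting=csv.QUOTE_NONNUMERIC)
writer.writeheader()    # header fields are strings, so they get quoted
writer.writerows(data)  # integer values are written without quotes

lines = buffer.getvalue().splitlines()
```

Unlike joining values by hand, DictWriter looks each value up by field name, so dictionaries whose keys are in a different order still end up in the right columns.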

Filter list of dictionaries based on keys with one nested dictionary

Example:
[{"a":{"x":13, "y":32, "z":33}, "b":5, "c":7, "d":8, "e":9}, {"a":{"x":18, "y":28, "z":38}, "b":57, "c":77, "d":87, "e":97}, {"a":{"x":17, "y":72, "z":73}, "b":58, "c":70, "d":80, "e":90}, ...]
This is just a small sample set, but what I would like is a list with a filtered list of items in each dictionary such as below:
Sample Output:
[{"x":13, "b":5, "e":9}, {"x":18, "b":57, "e":97}, {"x":17, "b":58, "e":90}, ...]
I can filter it down to the following:
[{"a":{"x":13, "y":32, "z":33}, "b":5, "e":9}, {"a":{"x":18, "y":28, "z":38}, "b":57, "e":97}, {"a":{"x":17, "y":72, "z":73}, "b":58, "e":90}, ...]
using the following code
for i in range(len(results)):
    desired_keys = ['a', 'b', 'e']
    bigdict = results[i]
    filtered = {x: bigdict[x] for x in desired_keys if x in bigdict}
but have yet to be able to figure out how to get the one element of the nested dictionary out.
You cannot just use your approach since it only works for top-level keys. You will need to specify each key and how to access it from the nested dictionary:
>>> [{'x': e['a']['x'], 'b': e['b'], 'e': e['e']} for e in results]
[{'x': 13, 'b': 5, 'e': 9}, {'x': 18, 'b': 57, 'e': 97}, {'x': 17, 'b': 58, 'e': 90}, ...]
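If there are many keys to pull out, spelling out each access by hand gets tedious. One way to keep "how to access each key" explicit but configurable is a small table of key paths. This is only a sketch: the names paths and dig are made up for illustration, and every path is assumed to exist in every record (a missing key raises KeyError).

```python
from functools import reduce

results = [
    {"a": {"x": 13, "y": 32, "z": 33}, "b": 5, "c": 7, "d": 8, "e": 9},
    {"a": {"x": 18, "y": 28, "z": 38}, "b": 57, "c": 77, "d": 87, "e": 97},
    {"a": {"x": 17, "y": 72, "z": 73}, "b": 58, "c": 70, "d": 80, "e": 90},
]

# Output key -> sequence of keys to follow from the top-level dictionary
paths = {'x': ('a', 'x'), 'b': ('b',), 'e': ('e',)}

def dig(record, path):
    """Follow a sequence of keys into nested dictionaries."""
    return reduce(lambda node, key: node[key], path, record)

filtered = [{name: dig(record, path) for name, path in paths.items()}
            for record in results]
```

Because each path is explicit, this avoids ambiguity when the same key name appears at more than one nesting level.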
As mentioned, all the items in the nested dictionaries must be visited.
This recursive approach
vals = [{"a":{"x":13, "y":32, "z":33}, "b":5, "c":7, "d":8, "e":9}, {"a":{"x":18, "y":28, "z":38}, "b":57, "c":77, "d":87, "e":97}, {"a":{"x":17, "y":72, "z":73}, "b":58, "c":70, "d":80, "e":90}]
def get_items(d, keys):
    res = dict()
    for k, v in d.items():
        if isinstance(v, dict):
            # descend into nested dictionaries and merge what they contain
            res.update(get_items(v, keys))
        elif k in keys:
            res[k] = v
    return res
r = [get_items(d, {'x','b', 'e'}) for d in vals]
print(r)
produces
[{'x': 13, 'b': 5, 'e': 9}, {'x': 18, 'b': 57, 'e': 97}, {'x': 17, 'b': 58, 'e': 90}]
Note: make sure the keys do not appear more than once in any given path along the nested dictionaries.
Another possible recursive approach with a generator function:
def get_vals(d, to_find):
    for a, b in d.items():
        if a in to_find:
            yield (a, b)
        # recurse only into values that are themselves dictionaries
        yield from [] if not isinstance(b, dict) else get_vals(b, to_find)
data = [{"a":{"x":13, "y":32, "z":33}, "b":5, "c":7, "d":8, "e":9}, {"a":{"x":18, "y":28, "z":38}, "b":57, "c":77, "d":87, "e":97}, {"a":{"x":17, "y":72, "z":73}, "b":58, "c":70, "d":80, "e":90}]
result = [dict(get_vals(i, ['x', 'b', 'e'])) for i in data]
Output:
[{'x': 13, 'b': 5, 'e': 9}, {'x': 18, 'b': 57, 'e': 97}, {'x': 17, 'b': 58, 'e': 90}]

make a matrix from a list of dictionaries in python3

I have a list of dictionaries like this example:
example:
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
For every dictionary in the list, I would like to sort the items by key, from A to F, and then make a list of lists containing only the values of each sorted dictionary. Here is the expected output:
expected output:
res = [[38799, 12953, 3742, 848, 140, 66], [23551, 8192, 2319, 568, 87, 33]]
to do so I made the following code in python:
res = []
for i in range(len(a)):
    for e in sorted(a[i].keys()):
        res.append(a[i][e])
But it does not return what I want. Do you know how to fix it?
You want to collect the values from each dictionary in their own list before adding that list to the final result:
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
res = []
for i in range(len(a)):
    sub_res = []
    for e in sorted(a[i].keys()):
        sub_res.append(a[i][e])
    res.append(sub_res)
A shorter version of this would be:
res = [ [i[e] for e in sorted(i.keys())] for i in a ]
Use a list comprehension instead of explicit loops:
y = [[i[key] for key in sorted(i.keys())] for i in a]
To sort items you can use built-in function sorted():
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
b = [[i[k] for k in sorted(i)] for i in a]
Append a list to res instead of appending individual elements:
res = []
for i in range(len(a)):
    temp = []
    for e in sorted(a[i].keys()):
        temp.append(a[i][e])
    res.append(temp)
Here is another method using the function items of dict:
>>> [[i[1] for i in sorted(e.items())] for e in a]
[[38799, 12953, 3742, 848, 140, 66], [23551, 8192, 2319, 568, 87, 33]]
>>>
It sorts the values by keys.
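All of the answers above sort each dictionary's own keys, which works because every dictionary here has the same keys. If the rows should share one fixed column order, a sketch with operator.itemgetter could look like this. The names columns, get_row and matrix are made up for illustration, and the sketch assumes every dictionary contains every column and that there are at least two columns (with a single key, itemgetter returns a bare value instead of a tuple).

```python
from operator import itemgetter

a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140},
     {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]

# One shared, sorted column order taken from the keys of all dictionaries
columns = sorted(set().union(*a))

# itemgetter('A', 'B', ...) returns a tuple of the values in that order
get_row = itemgetter(*columns)

matrix = [list(get_row(d)) for d in a]
```

Here a missing key raises KeyError instead of silently producing a shorter row, which is usually what you want for a matrix.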
