list of dicts to memory for csv reader - python

I have a list of dictionaries. For example
l = [{'date': '2014-01-01', 'spent': '$3'}, {'date': '2014-01-02', 'spent': '$5'}]
I want to make this CSV-like so that I can save it as a CSV file if I need to.
I have other code that gets a response and calls splitlines() on it so I can use csv methods.
For example:
reader = csv.reader(resp.read().splitlines(), delimiter=',')
How can I change my list of dictionaries into a list that looks like a CSV file?
I've been trying to cast the dictionary to a string, but haven't had much luck. It should be something like
"date", "spent"
"2014-01-01", "$3"
"2014-01-02", "$5"
This will also help me print out the list of dictionaries in a nice way for the user.
Update
This is the function I have which made me want to have the list of dicts:
def get_daily_sum(resp):
    rev = ['revenue', 'estRevenue']
    reader = csv.reader(resp.read().splitlines(), delimiter=',')
    first = reader.next()
    for i, val in enumerate(first):
        if val in rev:
            place = i
            break
    else:
        place = None
    if place:
        total = sum(float(r[place]) for r in reader)
    else:
        total = 'Not available'
    return total
So I wanted to total up one column, chosen from a list of possible column names. The problem was that the "revenue" column was not always in the same place.
Is there a better way? I have one object that returns a csv like string, and the other a list of dicts.

You would want to use csv.DictWriter to write the file.
with open('outputfile.csv', 'wb') as fout:
    writer = csv.DictWriter(fout, ['date', 'spent'])
    for dct in lst_of_dict:
        writer.writerow(dct)
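If the goal is CSV-like text in memory rather than a file on disk, one possible Python 3 sketch is to point csv.DictWriter at an io.StringIO buffer. The names rows, buffer and csv_lines are illustrative only, and the field order (date, spent) is an assumption taken from the desired output in the question.

```python
import csv
import io

# Sample data from the question (illustrative).
rows = [
    {'date': '2014-01-01', 'spent': '$3'},
    {'date': '2014-01-02', 'spent': '$5'},
]

# Write into an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['date', 'spent'])
writer.writeheader()
writer.writerows(rows)

# splitlines() gives the same kind of input the existing csv.reader code expects.
csv_lines = buffer.getvalue().splitlines()
parsed = list(csv.reader(csv_lines, delimiter=','))
```

Here csv_lines is a list of plain strings, so it can be printed for the user or written to a file later.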

A solution using a list comprehension. It works for any number of keys, but only if all your dicts have the same keys.
l = [[d[key] for key in dicts[0].keys()] for d in dicts]
To add the key names as a header row (wrapped in a list so they form a single row):
l = [list(dicts[0].keys())] + l
This will return a list of lists which can be exported to csv:
import csv
myfile = csv.writer(open("data.csv", "wb"))
myfile.writerows(l)
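For the follow-up about totalling a column whose position varies, one option is csv.DictReader, which looks values up by header name instead of by index. This is only a sketch under assumptions: the helper name sum_column and the sample text are hypothetical, while the column names revenue and estRevenue come from the question.

```python
import csv

def sum_column(lines, candidates=('revenue', 'estRevenue')):
    """Sum the first column whose header is in candidates, or return None."""
    reader = csv.DictReader(lines)
    # fieldnames is read from the first line; it is None for empty input.
    name = next((f for f in (reader.fieldnames or []) if f in candidates), None)
    if name is None:
        return None
    return sum(float(row[name]) for row in reader)

# Hypothetical sample where the revenue column is not in the first position.
sample = "date,revenue\n2014-01-01,3.5\n2014-01-02,5\n".splitlines()
total = sum_column(sample)
```

Because the lookup is by name, this also works when the revenue column happens to be the first one, a case where a truthiness check on an index of 0 would fail.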

Related

Create a list of dictionaries from csv file without inbuilt function

I want to create a list of dictionaries from a csv using only inbuilt functions. I need my function to return something like: [{col1: [values]}, {col2: [values]}, ...]
Here's my code snippet:
result = []
with open(filename, 'r') as f:
    headers = f.readline().strip().split(',')
    for line in f:
        line_values = line.strip().split(',')
        for i, header in enumerate(headers):
            row = {}
            row[header] = line_values[i]
            result.append(row)
But I get a result like: [{'col1': '16'}, {'col2': '3779'}, {'col3': '60.5'}, ....]
I'm guessing that rather than a list of dictionaries, you want a single dictionary where each key is a column name and each value is a list of the values in that column. Here's how to get that:
with open('/tmp/data.csv', 'r') as f:
    headers = f.readline().strip().split(',')
    result = {k: [] for k in headers}
    for line in f:
        line_values = line.strip().split(',')
        if len(line_values) == len(headers):
            for i, header in enumerate(headers):
                result[header].append(line_values[i])
print(result)
With that code and this input:
Col1,Col2,Col3
row1c1,row1c2,row1c3
row2c1,row2c2,row2c3
row3c1,row3c2,row3c3
You get:
{'Col1': ['row1c1', 'row2c1', 'row3c1'], 'Col2': ['row1c2', 'row2c2', 'row3c2'], 'Col3': ['row1c3', 'row2c3', 'row3c3']}
If you really want the format that you show in your example, you can convert the result to that as follows:
result = [{k: v} for k, v in result.items()]
Which gives you:
[{'Col1': ['row1c1', 'row2c1', 'row3c1']}, {'Col2': ['row1c2', 'row2c2', 'row3c2']}, {'Col3': ['row1c3', 'row2c3', 'row3c3']}]
The first result is more useful, as you can easily look up the values for a column via result[<column name>]. With the second version, you would need to iterate over each of the values in the list and look for the dictionary that contains a key that is the name of the column you're looking for. In this latter case, the inner dictionaries aren't doing you any good and are just making lookup harder and less efficient.
NOTE: Even if you really do want the latter format of the result, you would still compute that result in this same way.

Turning a CSV file with a header into a python dictionary

Let's say I have the following example csv file
a,b
100,200
400,500
How would I make it into a dictionary like the one below:
{a:[100,400],b:[200,500]}
I am having trouble figuring out how to do it manually before I use a package, so that I understand it. Can anyone help?
Some code I tried:
with open("fake.csv") as f:
    index= 0
    dictionary = {}
    for line in f:
        words = line.strip()
        words = words.split(",")
        if index >= 1:
            for x in range(len(headers_list)):
                dictionary[headers_list[i]] = words[i]
                # only returns the last element which makes sense
        else:
            headers_list = words
        index += 1
At the very least, you should be using the built-in csv package for reading csv files without having to bother with parsing. That said, this first approach is still applicable to your .strip and .split technique:
1. Initialize a dictionary with the column names as keys and empty lists as values
2. Read a line from the csv reader
3. Zip the line's contents with the column names you got in step 1
4. For each key:value pair in the zip, update the dictionary by appending
import csv

with open("test.csv", "r") as file:
    reader = csv.reader(file)
    column_names = next(reader) # Reads the first line, which contains the header
    data = {col: [] for col in column_names}
    for row in reader:
        for key, value in zip(column_names, row):
            data[key].append(value)
Your issue was that you were using the assignment operator = to overwrite the contents of your dictionary on every iteration. This is why you either want to pre-initialize the dictionary like above, or use a membership check first to test if the key exists in the dictionary, adding it if not:
key = headers_list[i]
if key not in dictionary:
    dictionary[key] = []
dictionary[key].append(words[i])
An even cleaner shortcut is to take advantage of dict.get:
key = headers_list[i]
dictionary[key] = dictionary.get(key, []) + [words[i]]
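A third variant, sketched here with made-up sample data, is dict.setdefault. It returns the list already stored under the key (storing a new empty one first if the key is missing), so the value can be appended in place instead of building a new list on every iteration as the dict.get version does.

```python
# Hypothetical stand-ins for the header line and the already-split data lines.
headers_list = ['a', 'b']
rows = [['100', '200'], ['400', '500']]

dictionary = {}
for words in rows:
    for i, key in enumerate(headers_list):
        # setdefault returns the list stored under key, creating it if missing.
        dictionary.setdefault(key, []).append(words[i])
```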
Another approach would be to take advantage of the csv package by reading each row of the csv file as a dictionary itself:
with open("test.csv", "r") as file:
    reader = csv.DictReader(file)
    data = {}
    for row_dict in reader:
        for key, value in row_dict.items():
            data[key] = data.get(key, []) + [value]
Another standard library package you could use to clean this up further is collections, with defaultdict(list), where you can directly append to the dictionary at a given key without worrying about initializing with an empty list if the key wasn't already there.
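A minimal sketch of that defaultdict(list) idea, assuming the sample file from the question. The CSV text is held in an io.StringIO buffer purely so the example is self-contained; with a real file you would pass the open file object to csv.DictReader instead.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the file from the question, kept in memory for the example.
csv_text = "a,b\n100,200\n400,500\n"

data = defaultdict(list)
reader = csv.DictReader(io.StringIO(csv_text))
for row_dict in reader:
    for key, value in row_dict.items():
        # A missing key gets an empty list automatically on first access.
        data[key].append(value)

result = dict(data)  # convert to a plain dict if the default behaviour is not wanted later
```

Note that the values stay strings; convert them with int() or float() if numbers are needed.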
To do that, keep the column names and the data separate, then iterate over the columns and add the value at the corresponding index of each data row. I am not sure whether this works with empty values.
However, I am fairly sure that going through pandas would be much easier; it is a widely used library for working with data from external files.
import csv
datas = []
with open('fake.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            cols = row
            line_count += 1
        else:
            datas.append(row)
            line_count += 1
result = {}  # named result so the built-in dict is not shadowed
for index, col in enumerate(cols):  # iterate over the column names with their indices
    result[col] = []
    for data in datas:  # append each row's value for the current column
        result[col].append(data[index])
print(result)

Convert CSV file to a dictionary of lists

My csv file looks like this:
Name,Surname,Fathers_name
Prakash,Patel,sudeep
Rohini,Dalal,raghav
Geeta,vakil,umesh
I want to create a dictionary of lists which should be like this:
dict = {Name: [Prakash,Rohini,Geeta], Surname: [Patel,Dalal,vakil], Fathers_name: [sudeep,raghav,umesh]}
This is my code:
with open(ram_details, 'r') as csv_file:
    csv_content = csv.reader(csv_file,delimiter=',')
    header = next(csv_content)
    if header != None:
        for row in csv_content:
            dict['Name'].append(row[0])
It throws an error saying that the key does not exist. Also, is there a better way to get the desired output? Can someone help me with this?
Your code looks fine and should work. If you still run into trouble, you can always use defaultdict.
import csv
from collections import defaultdict
# dict = {'Name':[],'Surname':[],'FatherName':[]}
d = defaultdict(list)
with open('123.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file,delimiter=',')
    header = next(csv_content)
    if header != None:
        for row in csv_content:
            # dict['Name'].append(row[0])
            # dict['Surname'].append(row[1])
            # dict['FatherName'].append(row[2])
            d['Name'].append(row[0])
            d['Surname'].append(row[1])
            d['FatherName'].append(row[2])
Please don't give a variable the same name as a built-in function or type (such as dict).
The problem is that you haven't initialized a dictionary object yet, so you are trying to add a key and value to something that is not yet a dictionary. In any case you need to do the following:
result = dict() # <-- this is missing
result[key] = value
Since you want to create a dictionary and want to append to it directly you can also use python's defaultdict.
A working example would be:
import csv
from collections import defaultdict
from pprint import pprint
with open('details.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    headers = list(map(str.strip, next(csv_content)))
    result = defaultdict(list)
    if headers != None:
        for row in csv_content:
            for header, element in zip(headers, row):
                result[header].append(element)
pprint(result)
Which leads to the output:
defaultdict(<class 'list'>,
{'Fathers_name': ['sudeep', 'raghav', 'umesh'],
'Name': ['Prakash', 'Rohini ', 'Geeta '],
'Surname': ['Patel ', 'Dalal ', 'vakil ']})
Note 1) My csv file had some extra trailing spaces, which can be removed using strip(), as I did for the headers.
Note 2) I am using the zip function to iterate over the elements and headers at the same time (this saves me from indexing the row).
A possible alternative is the pandas to_dict method.
You can use pandas to achieve that:
import pandas as pd
f = pd.read_csv('todict.csv')
d = f.to_dict(orient='list')
Or, if you like a one-liner:
d = pd.read_csv('todict.csv').to_dict(orient='list')
First you read your csv file into a pandas data frame (I saved your sample to a file named todict.csv). Then you use the data frame's to_dict method to convert it to a dictionary, specifying that you want lists as your dictionary values, as explained in the documentation.

How would I write to a CSV file from a python dictionary of dictionaries while some values are empty

I have a Python dictionary of dictionaries with data stored in it that I need to write to a CSV file.
The problem I'm having is that some of the dictionaries from the file I have read don't contain any information for that particular ID, so my CSV file columns are not lined up properly.
Example:
d["first1"]["title"] = founder
d["first1"]["started"] = 2005
d["second1"]["title"] = CEO
d["second1"]["favcolour"] = blue
and so when i use the following code:
for key, value in d.iteritems():
    ln = [key]
    for ikey, ivalue in value.iteritems():
        ln.append(ikey)
        ln.extend([v for v in ivalue])
    writer.writerow(ln)
My CSV file has all the information, but "started" and "favcolour" end up in the same column. I want each column to contain only one of them.
Thanks all in advance
Here's a suggestion:
d = {"first1": {"title": 'founder', "started": 2005}, "second1": {"title": 'CEO', "favcolour": 'blue'}}
columns = []
output = []
for key, value in d.iteritems():
    for ikey, ivalue in value.iteritems():
        if ikey not in columns:
            columns.append(ikey)
    ln = []
    for col in columns:
        if col not in value:
            ln.append('')
        else:
            ln.append(value[col])
    output.append(ln)
with open('file', 'w') as fl:
    csv_writer = csv.writer(fl)
    csv_writer.writerow(columns)
    for ln in output:
        print ln
        csv_writer.writerow(ln)
file:
started,title,favcolour
2005,founder
,CEO,blue
If it doesn't need to be human-readable, you can alternatively use pickle:
import pickle
# Write:
with open('filename.pickle', 'wb') as handle:
    pickle.dump(d, handle)
# Read:
with open('filename.pickle', 'rb') as handle:
    d = pickle.load(handle)
You can use the DictWriter class in csv to easily append what would be a sparse dictionary into a CSV. The only caveat is you need to know all the possible fields at the beginning.
import csv
data = { "first": {}, "second": {} }
data["first"]["title"] = "founder"
data["first"]["started"] = 2005
data["second"]["title"] = "CEO"
data["second"]["favcolour"] = "blue"
fieldNames = set()
for d in data:
    for key in data[d].keys():
        # Add all possible keys to fieldNames. Because fieldNames is
        # a set, it can't contain duplicate values
        fieldNames.add(key)
with open('csvFile.csv', 'w') as csvfile:
    # Initialize DictWriter with the list of fieldNames
    # You can sort fieldNames to whatever order you wish the CSV
    # headers to be in.
    writer = csv.DictWriter(csvfile, fieldnames=list(fieldNames))
    # Add Header to the CSV file
    writer.writeheader()
    # Iterate through all sub-dictionaries
    for d in data:
        # Add the sub-dictionary to the csv file
        writer.writerow(data[d])
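As a variation on the DictWriter approach, the sketch below sorts the field names so the column order is predictable and uses DictWriter's restval parameter to choose what is written for missing keys. The "N/A" placeholder and the in-memory io.StringIO buffer are assumptions made for illustration; with a real file, open it for writing and pass the file object instead.

```python
import csv
import io

data = {
    "first": {"title": "founder", "started": 2005},
    "second": {"title": "CEO", "favcolour": "blue"},
}

# Collect every key that appears in any sub-dictionary, in a stable order.
field_names = sorted({key for sub in data.values() for key in sub})

buffer = io.StringIO()
# restval is the value written for any field a row does not contain.
writer = csv.DictWriter(buffer, fieldnames=field_names, restval="N/A")
writer.writeheader()
for sub in data.values():
    writer.writerow(sub)

lines = buffer.getvalue().splitlines()
```

Leaving restval at its default of an empty string produces empty cells instead of the placeholder.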
Pandas works really well for things like this, so if it's an option, I would recommend it.
import pandas as pd
#not necessary, but for me it's usually easier to work with a list of dicts than dicts
my_list = [my_dict[key] for key in my_dict]
# When you pass a list of dictionaries to pandas DataFrame class, it will take care of
#alignment issues for you, but if you're wanting to do something specific
#with None values, you will need to further manipulate the frame
df = pd.DataFrame(my_list)
df.to_csv('file_path_to_save_to')

Write dictionary of lists (varying length) to csv in Python

I am currently struggling with dictionaries of lists.
Given a dictionary like this:
GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
           'Seq_B': ['GO:7777', 'GO:8888']}
Now I want to write this dictionary to a csv file as
follows:
EDIT: I have added the whole function to give more information.
def map_GI2GO(gilist, mapped, gi_to_go):
    with open(gilist) as infile:
        read_gi = csv.reader(infile)
        GI_list = {rows[0]:rows[1] for rows in read_gi} # read GI list into dictionary
        GO_list = defaultdict(list) # set up GO list as empty dictionary of lists
    infile.close()
    with open(gi_to_go) as mapping:
        read_go = csv.reader(mapping, delimiter=',')
        for k, v in GI_list.items(): # iterate over GI list and mapping file
            for row in read_go:
                if len(set(row[0]).intersection(v)) > 0 :
                    GO_list[k].append(row[1]) # write found GOs into dictionary
                    break
    mapping.close()
    with open(mapped, 'wb') as outfile: # save mapped SeqIDs plus GOs
        looked_up_go = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
        for key, val in GO_list.iteritems():
            looked_up_go.writerow([key] + val)
    outfile.close()
However this gives me the following output:
Seq_A,GO:1234;GO2345;GO:3456
Seq_B,GO:7777;GO:8888
I would prefer to have the list entries in separate columns, separated by a defined delimiter. I am having a hard time getting rid of the ; characters, which are apparently separating the list entries.
Any ideas are welcome.
If I were you I would try out itertools izip_longest to match up columns of varying length...
from csv import writer
from itertools import izip_longest
GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
           'Seq_B': ['GO:7777', 'GO:8888']}
with open("test.csv","wb") as csvfile:
    wr = writer(csvfile)
    wr.writerow(GO_list.keys())#writes title row
    for each in izip_longest(*GO_list.values()): wr.writerow(each)
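In Python 3, izip_longest is named zip_longest. The sketch below is a Python 3 rendering of the same idea, written to an in-memory io.StringIO buffer so it is self-contained; the empty-string fillvalue is an assumption about how the padded cells should look.

```python
import csv
import io
from itertools import zip_longest

GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
           'Seq_B': ['GO:7777', 'GO:8888']}

buffer = io.StringIO()
wr = csv.writer(buffer)
wr.writerow(GO_list.keys())  # title row: one column per sequence
# zip_longest pads the shorter lists with fillvalue so the columns stay aligned.
for each in zip_longest(*GO_list.values(), fillvalue=''):
    wr.writerow(each)

lines = buffer.getvalue().splitlines()
```

This lays the data out with one column per key. When writing to a real file in Python 3, open it in text mode with newline='' rather than 'wb'.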
