I want to create a list of dictionaries from a CSV file using only built-in functions. I need my function to return something like: [{col1: [values]}, {col2: [values]}, ...]
Here's my code snippet:
result = []
with open(filename, 'r') as f:
    headers = f.readline().strip().split(',')
    for line in f:
        line_values = line.strip().split(',')
        for i, header in enumerate(headers):
            row = {}
            row[header] = line_values[i]
            result.append(row)
But I get a result like: [{'col1': '16'}, {'col2': '3779'}, {'col3': '60.5'}, ....]
I'm guessing that rather than a list of dictionaries, you want a single dictionary where each key is a column name and each value is a list of the values in that column. Here's how to get that:
with open('/tmp/data.csv', 'r') as f:
    headers = f.readline().strip().split(',')
    result = {k: [] for k in headers}
    for line in f:
        line_values = line.strip().split(',')
        if len(line_values) == len(headers):
            for i, header in enumerate(headers):
                result[header].append(line_values[i])
print(result)
With that code and this input:
Col1,Col2,Col3
row1c1,row1c2,row1c3
row2c1,row2c2,row2c3
row3c1,row3c2,row3c3
You get:
{'Col1': ['row1c1', 'row2c1', 'row3c1'], 'Col2': ['row1c2', 'row2c2', 'row3c2'], 'Col3': ['row1c3', 'row2c3', 'row3c3']}
If you really want the format that you show in your example, you can convert the result to that as follows:
result = [{k: v} for k, v in result.items()]
Which gives you:
[{'Col1': ['row1c1', 'row2c1', 'row3c1']}, {'Col2': ['row1c2', 'row2c2', 'row3c2']}, {'Col3': ['row1c3', 'row2c3', 'row3c3']}]
The first result is more useful, as you can easily look up the values for a column via result[<column name>]. With the second version, you would need to iterate over each of the values in the list and look for the dictionary that contains a key that is the name of the column you're looking for. In this latter case, the inner dictionaries aren't doing you any good and are just making lookup harder and less efficient.
NOTE: Even if you really do want the latter format of the result, you would still compute that result in this same way.
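As a minimal sketch, the same approach can be wrapped in a reusable function. The name columns_from_lines is my own, not from the question, and the sketch assumes plain comma-separated values with no quoted fields. It accepts any iterable of lines, so an open file object works as well as a list.

```python
def columns_from_lines(lines):
    """Build {header: [values...]} from an iterable of comma-separated lines."""
    lines = iter(lines)
    headers = next(lines).strip().split(',')
    result = {header: [] for header in headers}
    for line in lines:
        values = line.strip().split(',')
        # Skip blank or malformed lines whose width differs from the header
        if len(values) != len(headers):
            continue
        for header, value in zip(headers, values):
            result[header].append(value)
    return result


sample = [
    "Col1,Col2,Col3\n",
    "row1c1,row1c2,row1c3\n",
    "row2c1,row2c2,row2c3\n",
]
by_column = columns_from_lines(sample)
# Optional conversion to the list-of-single-key-dicts format from the question
as_list_of_dicts = [{name: values} for name, values in by_column.items()]
print(by_column["Col2"])
```

With a real file you would call it as columns_from_lines(f) inside a with open(...) as f: block.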
Related
My csv file looks like this:
Name,Surname,Fathers_name
Prakash,Patel,sudeep
Rohini,Dalal,raghav
Geeta,vakil,umesh
I want to create a dictionary of lists which should be like this:
dict = {Name: [Prakash,Rohini,Geeta], Surname: [Patel,Dalal,vakil], Fathers_name: [sudeep,raghav,umesh]}
This is my code:
with open(ram_details, 'r') as csv_file:
    csv_content = csv.reader(csv_file,delimiter=',')
    header = next(csv_content)
    if header != None:
        for row in csv_content:
            dict['Name'].append(row[0])
It throws an error saying that the key does not exist. Also, is there a better way to get the desired output? Can someone help me with this?
Your code looks mostly fine. If you still run into trouble, you can always use defaultdict, which creates the list for a key the first time that key is accessed.
import csv
from collections import defaultdict
# dict = {'Name':[],'Surname':[],'FatherName':[]}
d = defaultdict(list)
with open('123.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    header = next(csv_content)
    if header is not None:
        for row in csv_content:
            # dict['Name'].append(row[0])
            # dict['Surname'].append(row[1])
            # dict['FatherName'].append(row[2])
            d['Name'].append(row[0])
            d['Surname'].append(row[1])
            d['FatherName'].append(row[2])
Please don't give a variable the same name as a built-in function or type (such as dict).
The problem is that you haven't initialized a dictionary object yet, so you are trying to add a key and value to an object that is not yet a dict. In any case, you need to do the following:
result = dict() # <-- this is missing
result[key] = value
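To make the failure concrete, here is a small sketch (the variable names are illustrative, not taken from the question). It assumes the error reported was a KeyError: appending to a key that was never created raises it, while dict.setdefault creates the empty list on first use.

```python
columns = {}

try:
    columns['Name'].append('Prakash')  # the key 'Name' does not exist yet
except KeyError as exc:
    error_message = 'missing key: {}'.format(exc)

# setdefault returns the existing list, or stores and returns a new empty one
columns.setdefault('Name', []).append('Prakash')
columns.setdefault('Name', []).append('Rohini')
print(error_message)
print(columns)
```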
Since you want to create a dictionary and append to it directly, you can also use Python's defaultdict.
A working example would be:
import csv
from collections import defaultdict
from pprint import pprint
with open('details.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    headers = list(map(str.strip, next(csv_content)))
    result = defaultdict(list)
    if headers is not None:
        for row in csv_content:
            for header, element in zip(headers, row):
                result[header].append(element)
pprint(result)
Which leads to the output:
defaultdict(<class 'list'>,
            {'Fathers_name': ['sudeep', 'raghav', 'umesh'],
             'Name': ['Prakash', 'Rohini ', 'Geeta '],
             'Surname': ['Patel ', 'Dalal ', 'vakil ']})
Note 1) my csv file had some extra trailing spaces, which can be removed using strip(), as I did for the headers.
Note 2) I am using the zip function to iterate over the elements and headers at the same time (this saves me from having to index the row).
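The two notes can be combined in a short sketch. The sample row below is made up to mimic the trailing spaces mentioned in Note 1; each cell is stripped while zip pairs it with its header by position.

```python
headers = ['Name', 'Surname', 'Fathers_name']
row = ['Prakash ', 'Patel ', 'sudeep']  # illustrative row with trailing spaces

# zip pairs items by position, so no index variable is needed;
# strip() removes the stray whitespace from each cell first
pairs = list(zip(headers, (cell.strip() for cell in row)))
print(pairs)
```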
A possible alternative is the pandas to_dict method.
You may try to use pandas to achieve that:
import pandas as pd
f = pd.read_csv('todict.csv')
d = f.to_dict(orient='list')
Or, if you like a one-liner:
d = pd.read_csv('todict.csv').to_dict(orient='list')
First you read your csv file into a pandas data frame (I saved your sample to a file named todict.csv). Then you use the DataFrame's to_dict method to convert it to a dictionary, specifying that you want lists as your dictionary values, as explained in the documentation.
I am preparing for a test and one of the topics is parsing tabular data without using the csv or pandas packages.
The ask is to take data with an arbitrary number of columns and convert it into a dictionary. The delimiter can be a space, colon or comma. For instance, here is some data with comma as the delimiter -
person,age,nationality,language, education
Jack,18,Canadian,English, bs
Rahul,25,Indian,Hindi, ms
Mark,50,American,English, phd
Kyou, 21, Japanese, English, bs
This should be converted into a dictionary format like this -
{'person': ['Jack', 'Rahul', 'Mark', 'Kyou'], 'age': ['18', '25', '50', '21'], 'education': ['bs', 'ms', 'phd', 'bs'], 'language': ['English', 'Hindi', 'English', 'English'], 'nationality': ['Canadian', 'Indian', 'American', 'Japanese']}
Columns can vary among different files. My program should be flexible to handle this variety. For instance, in the next file there might be another column titled "gender".
I was able to get this working but feel my code is very "clunky". It works but I would like to do something more "pythonic".
from collections import OrderedDict
def parse_data(myfile):
    # initialize myd as an ordered dictionary
    myd = OrderedDict()
    # open file with data
    with open(myfile, "r") as f:
        # use readlines to store tabular data in list format
        data = f.readlines()
        # use the first row to initialize the ordered dictionary keys
        for item in data[0].split(','):
            myd[item.strip()] = [] # initializing dict keys with column names
        # variable used to access different column values in remaining rows
        i = 0
        # access each key in the ordered dict
        for key in myd:
            '''Tabular data starting from line # 1 is accessed and
            split on the "," delimiter. The variable "i" is used to access
            each column incrementally. Ordered dict format of myd ensures
            columns are paired appropriately'''
            myd[key] = [ item.split(',')[i].strip() for item in data[1:]]
            i += 1
    print(dict(myd))
# my-input.txt
parse_data("my-input.txt")
Can you please suggest how can I make my code "cleaner"?
Here is a more pythonic way to approach this.
def parse(file):
    with open(file, 'r') as f:
        headings = f.readline().strip().split(',')
        values = [l.strip().split(',') for l in f]
    output_dict = {h: v for h, v in zip(headings, [*zip(*values)])}
    return output_dict
print(parse('test.csv'))
First, take the first line in the file as the headings to use for the keys in the dictionary (this will break with duplicate headings).
Then, all the remaining values are read into a list of lists of strings using a list comprehension.
Finally, the dictionary is compiled by zipping the list of headings with a transpose of the values (that's what [*zip(*values)] represents; if you are willing to use numpy, you can replace this with numpy.array(values).T, for example).
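The transpose step is easier to see on a small in-memory table. This sketch uses made-up rows shaped like the question's data and converts each column tuple to a list so the result matches the format the question asks for.

```python
values = [
    ['Jack', '18', 'Canadian'],
    ['Rahul', '25', 'Indian'],
    ['Mark', '50', 'American'],
]
headings = ['person', 'age', 'nationality']

# zip(*values) passes each row as a separate argument, so zip yields columns
columns = [list(column) for column in zip(*values)]
output_dict = dict(zip(headings, columns))
print(output_dict['age'])
```

Note that zip stops at the shortest row, so a row with a missing cell silently truncates every column.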
Slightly better version
def parse_data(myfile):
    # read lines and strip out extra whitespace and newline characters
    with open(myfile, "r") as f:
        lines = [line.strip() for line in f]
    result = {} # initialize our result (named so it does not shadow the built-in dict)
    # start loop from second line
    for x in range(1,len(lines)):
        # for each line split values and store them in result[col]
        for y in range(len(lines[0].split(","))):
            # if col is not present in result create new column and initialize it with a list
            if lines[0].split(",")[y] not in result:
                result[lines[0].split(",")[y]] = []
            # store the corresponding column value in result
            result[lines[0].split(",")[y]].append(lines[x].split(",")[y])
    return result
print(parse_data("my-input.txt"))
Hope it helps!
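None of the snippets above handle the other delimiters the question mentions (space or colon). Here is a hedged sketch of one way to do it, assuming the delimiter is whichever of the three candidates appears most often in the header line. The names guess_delimiter and parse_lines are hypothetical, and the guess will be wrong for files whose header text itself contains many spaces.

```python
def guess_delimiter(header_line, candidates=(',', ':', ' ')):
    """Pick the candidate that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def parse_lines(lines):
    """Return {column: [values...]} for lines split on a guessed delimiter."""
    rows = [line.strip() for line in lines if line.strip()]
    delimiter = guess_delimiter(rows[0])
    # Split every row and strip stray whitespace around each cell
    table = [[cell.strip() for cell in row.split(delimiter)] for row in rows]
    headers, body = table[0], table[1:]
    # zip(*body) transposes the rows into columns
    return {h: list(col) for h, col in zip(headers, zip(*body))}


comma_data = ["person,age, education\n", "Jack,18, bs\n", "Kyou, 21, bs\n"]
colon_data = ["person:age\n", "Jack:18\n", "Rahul:25\n"]
print(parse_lines(comma_data))
print(parse_lines(colon_data))
```

Because the function takes an iterable of lines, an open file can be passed directly. Space-delimited files with runs of several spaces would need str.split() without an argument instead.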
My file is formatted into three columns of numbers:
2 12345 1.12345
1 54321 1.54321
3 12345 1.12345
I would like to have Python use the first two columns as keys and use the third column as the values. The file is large, meaning that I can't format it by hand. So how do I have Python automatically convert my large file into a dictionary?
Here is my code:
with open ('file name.txt' 'r') as f:
    rows = ( line.split('\t') for line in f )
    d = { row[0]:row[:3] for row in rows}
    print(d)
The output prints the numbers diagonally all over the place. How do I format it properly?
Banana, you're close.
You need a comma separating the arguments of open.
You want to assign the third member of row, i.e. row[2].
You need to decide how to group the first two members of row into a hashable key. Making a tuple out of them, i.e. (row[0],row[1]) works.
Print the dictionary line by line.
Try:
with open('filename.txt','r') as f:
    # drop the trailing newline so it does not end up in the values
    rows = ( line.rstrip('\n').split('\t') for line in f )
    d = { (row[0],row[1]):row[2] for row in rows}
for key in d:
    print(key, d[key])
I'm not sure exactly how you want the keys laid out. Regardless, you should use the csv module, with '\t' as your delimiter.
import csv
with open('data.txt') as file:
    tsvfile = csv.reader(file, delimiter='\t')
    d = { "{},{}".format(row[0], row[1]): row[2] for row in tsvfile }
print(d)
Prints out:
{'3,12345': '1.12345', '1,54321': '1.54321', '2,12345': '1.12345'}
Alternatively, you have this:
with open('data.txt') as file:
    tsvfile = csv.reader(file, delimiter='\t')
    d = {}
    for row in tsvfile:
        d[row[0]] = row[2]
        d[row[1]] = row[2]
print(d)
Prints out:
{'54321': '1.54321', '3': '1.12345', '1': '1.54321', '12345': '1.12345', '2': '1.12345'}
You should try -
import pprint
d = {}
with open ('file name.txt','r') as f:
    for line in f:
        # drop the trailing newline before splitting on tabs
        row = line.rstrip('\n').split('\t')
        if len(row) == 3:
            d[(row[0], row[1])] = row[2]
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d)
First of all, your slicing is wrong. You can get the first two columns with row[:2] and the third with row[2].
Also, you don't need to build your rows in a separate data structure; you can use unpacking and the map function within a dict comprehension:
with open ('ex.txt') as f:
    d={tuple(i):j.strip() for *i,j in map(lambda line:line.split('\t'),f)}
print(d)
result :
{('2', '12345'): '1.12345', ('3', '12345'): '1.12345', ('1', '54321'): '1.54321'}
Note that the starred target i is a list, and lists are unhashable, so you cannot use it as a dictionary key; converting it to a tuple solves that.
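A small sketch of why the tuple conversion is needed, using one row from the sample data (the variable names are illustrative):

```python
row = ['2', '12345', '1.12345']
*key_parts, value = row  # key_parts is a list: ['2', '12345']

d = {}
try:
    d[key_parts] = value  # lists are mutable, so they cannot be dict keys
except TypeError as exc:
    reason = str(exc)

d[tuple(key_parts)] = value  # a tuple of strings is hashable
print(reason)
print(d)
```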
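And if you want to preserve the order, you can use collections.OrderedDict. Pass it the key/value pairs directly rather than a finished dict, because a plain dict does not guarantee insertion order before Python 3.7:
from collections import OrderedDict
with open ('ex.txt') as f:
    d=OrderedDict((tuple(i),j.strip()) for *i,j in map(lambda line:line.split('\t'),f))
print(d)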
OrderedDict([(('2', '12345'), '1.12345'), (('1', '54321'), '1.54321'), (('3', '12345'), '1.12345')])
I have a list of dictionaries. For example
l = [{'date': '2014-01-01', 'spent': '$3'}, {'date': '2014-01-02', 'spent': '$5'}]
I want to make this csv-like so if I want to save it as a csv I can.
I have other code that gets a list and calls splitlines() so I can use csv methods.
For example:
reader = csv.reader(resp.read().splitlines(), delimiter=',')
How can I change my list of dictionaries into a list that is like a csv file?
I've been trying to cast the dictionary to a string, but haven't had much luck. It should be something like
"date", "spent"
"2014-01-01", "$3"
"2014-01-02", "$5"
This will also help me print out the list of dictionaries in a nice way for the user.
update
This is the function I have which made me want to have the list of dicts:
def get_daily_sum(resp):
    rev = ['revenue', 'estRevenue']
    reader = csv.reader(resp.read().splitlines(), delimiter=',')
    first = reader.next()
    for i, val in enumerate(first):
        if val in rev:
            place = i
            break
    else:
        place = None
    if place:
        total = sum(float(r[place]) for r in reader)
    else:
        total = 'Not available'
    return total
So I wanted to total up a column chosen from a list of possible column names. The problem was that the "revenue" column was not always in the same place.
Is there a better way? I have one object that returns a csv-like string, and another that returns a list of dicts.
You would want to use csv.DictWriter to write the file.
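import csv
with open('outputfile.csv', 'w', newline='') as fout:  # on Python 2, open with 'wb' instead
    writer = csv.DictWriter(fout, ['date', 'spent'])
    writer.writeheader()  # write the column titles first
    for dct in lst_of_dict:
        writer.writerow(dct)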
A solution using a list comprehension. It should work for any number of keys, but only if all your dicts have the same keys.
keys = list(dicts[0].keys())
l = [[d[key] for key in keys] for d in dicts]
To attach key names for column titles:
l = [keys] + l
This will return a list of lists which can be exported to csv:
import csv
with open("data.csv", "w", newline="") as fout:  # on Python 2, open with "wb" instead
    csv.writer(fout).writerows(l)
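For the column-total part of the question, csv.DictReader looks columns up by name, so the position of "revenue" no longer matters. This is only a sketch under assumptions: it takes an iterable of lines rather than the resp object, returns None instead of 'Not available', and picks the first candidate name that is present rather than the first matching header cell. It also avoids the original's "if place:" test, which treats column index 0 as missing.

```python
import csv


def get_daily_sum(lines, candidates=('revenue', 'estRevenue')):
    """Total the first candidate column found, wherever it sits in the header."""
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    column = next((name for name in candidates if name in fieldnames), None)
    if column is None:
        return None  # neither column is present
    return sum(float(row[column]) for row in reader)


text = "date,clicks,revenue\n2014-01-01,10,3.50\n2014-01-02,12,5.25\n"
total = get_daily_sum(text.splitlines())
missing = get_daily_sum("date,clicks\n2014-01-01,10\n".splitlines())
print(total, missing)
```

With the response object from the question, the call would presumably be get_daily_sum(resp.read().splitlines()), decoding to text first if the response returns bytes.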
I'm trying to get the top entry (string) in a matrix of data to be the variable name for the rest of the (numerical) data in each column. I've used the following to open the file and create the matrix.
with open('x.dat', 'r') as f:
    row = 0
    for line in f:
        words = line.split(',')
        for col in range(len(words)):
            DataMatrix[row][col] = words[col]
        row += 1
f.close()
However, I can't see how to take the string and have it be recognized as a variable name for the "list" of data which will be filled by the column of numerics. This has got to be simpler than I'm making it. Any help?
The data file looks like: ... (can't seem to get the format to show correctly, but each [] is a row and the rows are stacked on top of one another)
['% Time','FrameNo','Version','X','Y','Z',…]
['66266.265514','948780','2.5','64','0','30',…]
['66266.298785','948785','2.5','63','0','32',…]
…
What you are looking for is the vars built-in function of python. This will give you a dict representing the variables in scope.
I don't follow the code in your example enough to add this solution into it, but here is an example using vars that might help:
# Set up some data (which would actually be read from a file in your example)
headers = ['item', 'quantity', 'cost']
data = [['dress', 'purse', 'puppy'], [1, 2, 15], [27.00, 15.00, 2.00]]
for i in range(len(headers)):
    name = headers[i]
    value = list()
    for data_item in data[i]:
        value.append(data_item)
    # This sets the name of the header to the name of a variable
    vars()[name] = value
# Print to prove that vars() worked
print('Items', item)
print('Quantities', quantity)
print('Costs', cost)
Which produces the following output:
Items ['dress', 'purse', 'puppy']
Quantities [1, 2, 15]
Costs [27.0, 15.0, 2.0]
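One caveat: writing to vars() like this only works reliably at module level; inside a function, changes to the returned dict may be ignored. A common alternative, sketched here with the same sample data, is to keep the columns in an ordinary dictionary keyed by header name instead of creating variables dynamically.

```python
headers = ['item', 'quantity', 'cost']
data = [['dress', 'purse', 'puppy'], [1, 2, 15], [27.00, 15.00, 2.00]]

# One dictionary entry per column; no variables are created dynamically
columns = {name: list(values) for name, values in zip(headers, data)}
print('Items', columns['item'])
print('Costs', columns['cost'])
```

Lookups then use the header string directly, for example columns['quantity'].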
Use the int function
with open('x.dat', 'r') as f:
    row = 0
    for line in f:
        words = line.split(',')
        for col in range(len(words)):
            DataMatrix[row][int(col)] = words[int(col)]
        row += 1
f.close()
Alternatively, you could use a CSV reader to make this marginally easier.
import csv
with open('x.dat', 'r') as csvfile:  # on Python 2, open with 'rb' instead
    theReader = csv.reader(csvfile, delimiter=',')
    row = 0
    for line in theReader:
        for col in range(len(line)):
            DataMatrix[row][col] = line[col]
        row += 1