Convert CSV file to a dictionary of lists - python

My csv file looks like this:
Name,Surname,Fathers_name
Prakash,Patel,sudeep
Rohini,Dalal,raghav
Geeta,vakil,umesh
I want to create a dictionary of lists which should be like this:
dict = {Name: [Prakash,Rohini,Geeta], Surname: [Patel,Dalal,vakil], Fathers_name: [sudeep,raghav,umesh]}
This is my code:
with open(ram_details, 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    header = next(csv_content)
    if header != None:
        for row in csv_content:
            dict['Name'].append(row[0])
It throws an error saying the key does not exist. Also, is there a better way to get the desired output? Can someone help me with this?

Your code is close, but it never creates a dictionary to append to (dict is the built-in type), so dict['Name'].append(...) fails. If you don't want to initialize every key up front, you can always use defaultdict.
import csv
from collections import defaultdict

# dict = {'Name': [], 'Surname': [], 'FatherName': []}
d = defaultdict(list)

with open('123.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    header = next(csv_content)
    if header != None:
        for row in csv_content:
            # dict['Name'].append(row[0])
            # dict['Surname'].append(row[1])
            # dict['FatherName'].append(row[2])
            d['Name'].append(row[0])
            d['Surname'].append(row[1])
            d['FatherName'].append(row[2])
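If you prefer a plain dict afterwards (for printing or passing around), the defaultdict can simply be converted once the loop has finished. A minimal follow-up, assuming the code above has run on the sample file:
result = dict(d)
print(result)
# {'Name': ['Prakash', 'Rohini', 'Geeta'], 'Surname': ['Patel', 'Dalal', 'vakil'], 'FatherName': ['sudeep', 'raghav', 'umesh']}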

Please don't name a variable after a built-in function or type (such as dict).
The problem is that you haven't initialized a dictionary object yet, so you are trying to add a key and a value to something that is not a dictionary. In any case you need to do the following:
result = dict() # <-- this is missing
result[key] = value
Since you want to create a dictionary and append to it directly, you can also use Python's defaultdict.
A working example would be:
import csv
from collections import defaultdict
from pprint import pprint

with open('details.csv', 'r') as csv_file:
    csv_content = csv.reader(csv_file, delimiter=',')
    headers = list(map(str.strip, next(csv_content)))
    result = defaultdict(list)

    if headers != None:
        for row in csv_content:
            for header, element in zip(headers, row):
                result[header].append(element)

pprint(result)
Which leads to the output:
defaultdict(<class 'list'>,
            {'Fathers_name': ['sudeep', 'raghav', 'umesh'],
             'Name': ['Prakash', 'Rohini ', 'Geeta '],
             'Surname': ['Patel ', 'Dalal ', 'vakil ']})
Note 1) My csv file had some extra trailing spaces, which can be removed using strip(), as I did for the headers.
Note 2) I am using the zip function to iterate over the headers and the row elements at the same time (this saves me from indexing the row).
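For illustration, zip just pairs each header with the element in the same position of the row; a tiny standalone sketch of that step:
headers = ['Name', 'Surname', 'Fathers_name']
row = ['Prakash', 'Patel', 'sudeep']
for header, element in zip(headers, row):
    print(header, element)
# Name Prakash
# Surname Patel
# Fathers_name sudeep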
A possible alternative is pandas' to_dict method (docs).

You may try to use pandas to achieve that:
import pandas as pd
f = pd.read_csv('todict.csv')
d = f.to_dict(orient='list')
Or if you like a one liner:
d = pd.read_csv('todict.csv').to_dict(orient='list')
First you read your csv file into a pandas DataFrame (I saved your sample to a file named todict.csv). Then you use the DataFrame's to_dict method to convert it to a dictionary, specifying that you want lists as your dictionary values, as explained in the documentation.
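For reference, with the sample file above the pandas route should give a dictionary of the same shape. A sketch (the exact column names and any stray whitespace depend on the actual file):
import pandas as pd

d = pd.read_csv('todict.csv').to_dict(orient='list')
# d should look roughly like:
# {'Name': ['Prakash', 'Rohini', 'Geeta'],
#  'Surname': ['Patel', 'Dalal', 'vakil'],
#  'Fathers_name': ['sudeep', 'raghav', 'umesh']}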

Related

Writing a function that returns the number of unique names in a column in a dataset - Python

I'm currently trying to write a function that takes a dataset (one that I already have, named data) and looks for a column in this dataset called name. It then has to return the number of different names there are in the column (there are 4 values, but only 3 distinct ones, because two of them are the same).
I'm having a hard time with this program, but this is what I have so far:
def name_count(data):
    unique = []
    for name in data:
        if name.strip() not in unique:
            unique[name] += 1
        else:
            unique[name] = 1
            unique.append(name)
The only import I'm allowed to use for this challenge is math.
Does anyone have any help or advice they can offer with this problem?
You can use a set to keep duplicates out, for example:
data = ['name1', 'name2', 'name3', 'name3 ']
cleaned_data = map(lambda x: x.strip(), data)
count = len(set(cleaned_data))
print(count)
>>> 3
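The same idea can be written as a set comprehension, which strips and deduplicates in one pass; a minimal sketch with the same sample list:
data = ['name1', 'name2', 'name3', 'name3 ']
count = len({name.strip() for name in data})
print(count)  # 3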
You almost had it. Unique should be a dictionary, not a list.
def name_count(data):
    unique = {}
    for name in data:
        name = name.strip()
        if name in unique:
            unique[name] += 1
        else:
            unique[name] = 1
    return unique
#test
print(name_count(['Jack', 'Jill', 'Mary', 'Sam', 'Jack', 'Mary']))
#output
{'Jack': 2, 'Jill': 1, 'Mary': 2, 'Sam': 1}
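If the assignment only needs the number of distinct names (as the question asks), the length of that dictionary gives it; a small follow-up using the test call above:
unique = name_count(['Jack', 'Jill', 'Mary', 'Sam', 'Jack', 'Mary'])
print(len(unique))  # 4 distinct names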
import pandas

def name_count(data):
    df = pandas.DataFrame(data)
    unique = []
    for name in df["name"]:  # if the column name is "name"
        if name:
            if name not in unique:
                unique.append(name)
    return unique
You need to pass the complete dataset to the function and not just the integers.
It is not clear what kind of data variable you already have there.
So, I will suggest a solution, starting from reading the file.
Considering that you have a csv file and that there is a restriction of importing only the math module (as you mentioned), this should work.
def name_count(filename):
    with open(filename, 'r') as fh:
        headers = next(fh).strip().split(',')
        name_col_idx = headers.index('name')
        names = [
            line.strip().split(',')[name_col_idx]
            for line in fh
        ]
    return len(set(names))
Here we read the first line, identify the location of name in the header, collect all items in the name column into a variable names and finally return the length of the set, which contains only unique elements.
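Usage would then be something like this (assuming a file named data.csv with a name column in its header):
print(name_count('data.csv'))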
Here is a solution if you are feeding a csv file to your function. It reads the csv file, skips the header line, collects the names (at index 1 of each line), casts the list to a set to get rid of duplicates, and returns the length of the set, which is the number of unique names.
import csv

def name_count(filename):
    with open(filename, "r") as csvfile:
        csvreader = csv.reader(csvfile)
        names = [row[1] for row in csvreader if row][1:]
    return len(set(names))
Alternatively, if you don't want to use a csv reader, you can read the file as plain text without any imports, as follows. The code splits each line on commas.
def name_count(filename):
    with open(filename, "r") as infile:
        names = [row.rstrip('\n').split(',')[1] for row in infile if row][1:]
    return len(set(names))

How would I write to a CSV file from a python dictionary of dictionaries while some values are empty

I have a Python dictionary of dictionaries with data stored in it that I need to write to a CSV file.
The problem I'm having is that some of the dictionaries from the file I have read don't contain any information for that particular ID, so my CSV file columns are not lined up properly.
example
d["first1"]["title"] = founder
d["first1"]["started"] = 2005
d["second1"]["title"] = CEO
d["second1"]["favcolour"] = blue
and so when I use the following code:
for key, value in d.iteritems():
    ln = [key]
    for ikey, ivalue in value.iteritems():
        ln.append(ikey)
        ln.extend([v for v in ivalue])
    writer.writerow(ln)
my CSV file has all the information, but "started" and "favcolour" end up in the same column. I want each column to contain only one field.
Thanks all in advance
Here's a suggestion:
d = {"first1": {"title": 'founder', "started": 2005}, "second1": {"title": 'CEO', "favcolour": 'blue'}}
columns = []
output = []
for key, value in d.iteritems():
for ikey, ivalue in value.iteritems():
if ikey not in columns:
columns.append(ikey)
ln = []
for col in columns:
if col not in value:
ln.append('')
else:
ln.append(value[col])
output.append(ln)
with open('file', 'w') as fl:
csv_writer = csv.writer(fl)
csv_writer.writerow(columns)
for ln in output:
print ln
csv_writer.writerow(ln)
file:
started,title,favcolour
2005,founder
,CEO,blue
If it doesn't need to be human-readable, you can alternatively use pickle:
import pickle

# Write:
with open('filename.pickle', 'wb') as handle:
    pickle.dump(d, handle)

# Read:
with open('filename.pickle', 'rb') as handle:
    d = pickle.load(handle)
You can use the DictWriter class in csv to easily append what would be a sparse dictionary into a CSV. The only caveat is you need to know all the possible fields at the beginning.
import csv

data = {"first": {}, "second": {}}
data["first"]["title"] = "founder"
data["first"]["started"] = 2005
data["second"]["title"] = "CEO"
data["second"]["favcolour"] = "blue"

fieldNames = set()
for d in data:
    for key in data[d].keys():
        # Add all possible keys to fieldNames; because fieldNames is
        # a set, you can't have duplicate values
        fieldNames.add(key)

with open('csvFile.csv', 'w') as csvfile:
    # Initialize DictWriter with the list of fieldNames.
    # You can sort fieldNames to whatever order you wish the CSV
    # headers to be in.
    writer = csv.DictWriter(csvfile, fieldnames=list(fieldNames))
    # Add the header to the CSV file
    writer.writeheader()
    # Iterate through all sub-dictionaries
    for d in data:
        # Add the sub-dictionary to the csv file
        writer.writerow(data[d])
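For reference, DictWriter fills missing keys with its restval parameter (an empty string by default), so if you want a visible placeholder instead of a blank cell you could change the initialization above along these lines:
writer = csv.DictWriter(csvfile, fieldnames=list(fieldNames), restval='N/A')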
Pandas works really well for things like this, so if it's an option, I would recommend it.
import pandas as pd
# not necessary, but for me it's usually easier to work with a list of dicts than a dict of dicts
my_list = [my_dict[key] for key in my_dict]

# When you pass a list of dictionaries to the pandas DataFrame class, it will take care of
# alignment issues for you, but if you want to do something specific
# with None values, you will need to further manipulate the frame
df = pd.DataFrame(my_list)
df.to_csv('file_path_to_save_to')
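A couple of optional tweaks, assuming the DataFrame above: index=False drops the row numbers from the output, and fillna('') writes empty cells instead of NaN for the missing values:
df.fillna('').to_csv('file_path_to_save_to', index=False)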

Python import csv to lists with multiple rows

I want to import a database.csv that contains 4 values
key,email1,email2,email3
filename,email#example.com,email2#example.com,email3#example.com
filename2,email#yahoo.com,email#google.com,email#outlook.com
etc,etc,etc,etc
Next I want to separate the key column into a list of filenames, and email1, email2, and email3 into another list:
key = [filename]
emails = [email#example.com,email2#example.com,email3#example.com]
Current code
import csv
with open('data.csv') as read_csv:
    reader = csv.reader(read_csv)
    for row in reader:
        key = row[0]
        emails = row[1::]
return key
return emails
Output is
key = [filename2]
emails = [filename2,email#yahoo.com,email#google.com,email#outlook.com]
What I need is for each key to stay matched with its corresponding emails so I can pass them to another function.
A dictionary sounds like the appropriate solution here.
import csv

result = {}
with open('data.csv') as read_csv:
    reader = csv.reader(read_csv)
    for row in reader:
        result[row[0]] = row[1:]
You can then always access or pass on the values like this:
result['filename2']
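To hand each key with its emails on to another function, you can iterate over the dictionary's items; a minimal sketch assuming a hypothetical process() function:
def process(key, emails):
    # placeholder for whatever the other function does with the pair
    print(key, emails)

for key, emails in result.items():
    process(key, emails)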

Convert a csv to a dictionary with multiple values?

I have a csv file like this:
pos,place
6696,266835
6698,266835
938,176299
940,176299
941,176299
947,176299
948,176299
949,176299
950,176299
951,176299
770,272944
2751,190650
2752,190650
2753,190650
I want to convert it to a dictionary like the following:
{266835:[6696,6698],176299:[938,940,941,947,948,949,950,951],190650:[2751,2752,2753]}
And then, fill the missing numbers in the range in the values:
{266835: [6696, 6697, 6698], 176299: [938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951], 190650: [2751, 2752, 2753]}
Right now I have tried to build the dictionary using the solution suggested here, but it overwrites the old value with the new one.
Any help would be great.
Here is a function that I wrote for converting a csv to a dict:
def csv2dict(filename):
    """
    reads in a two column csv file, and then converts it into a dictionary
    """
    import csv
    with open(filename) as f:
        f.readline()  # ignore first line
        reader = csv.reader(f, delimiter=',')
        mydict = dict((rows[1], rows[0]) for rows in reader)
    return mydict
Easiest is to use collections.defaultdict() with a list:
import csv
from collections import defaultdict

data = defaultdict(list)

with open(inputfilename, 'rb') as infh:
    reader = csv.reader(infh)
    next(reader, None)  # skip the header
    for col1, col2 in reader:
        data[col2].append(int(col1))
        if len(data[col2]) > 1:
            data[col2] = range(min(data[col2]), max(data[col2]) + 1)
This also expands the ranges on the fly as you read the data.
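An alternative (a sketch, assuming the data defaultdict built above) is to expand each range once after the whole file has been read, which keeps the reading loop simple:
for place, positions in data.items():
    data[place] = list(range(min(positions), max(positions) + 1))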
Based on what you have tried -
import csv
from collections import defaultdict

# open archive reader
myFile = open("myfile.csv", "rb")
archive = csv.reader(myFile, delimiter=',')

arch_dict = defaultdict(list)
for row in archive:
    arch_dict[row[1]].append(row[0])

print arch_dict

list of dicts to memory for csv reader

I have a list of dictionaries. For example
l = [{'date': '2014-01-01', 'spent': '$3'}, {'date': '2014-01-02', 'spent': '$5'}]
I want to make this csv-like so if I want to save it as a csv I can.
I have other code that gets a response string and calls splitlines() on it so I can use csv methods.
for example:
reader = csv.reader(resp.read().splitlines(), delimiter=',')
How can I change my list of dictionaries into a list of lines like a csv file?
I've been trying to cast the dictionaries to strings, but haven't had much luck. It should be something like
"date", "spent"
"2014-01-01", "$3"
"2014-01-02", "$5"
this will also help me print out the list of dictionaries in a nice way for the user.
update
This is the function I have which made me want to have the list of dicts:
def get_daily_sum(resp):
    rev = ['revenue', 'estRevenue']
    reader = csv.reader(resp.read().splitlines(), delimiter=',')
    first = reader.next()
    for i, val in enumerate(first):
        if val in rev:
            place = i
            break
    else:
        place = None
    if place:
        total = sum(float(r[place]) for r in reader)
    else:
        total = 'Not available'
    return total
so I wanted to total up a column from a list of column names. The problem was that the "revenue" column was not always in the same place.
Is there a better way? I have one object that returns a csv-like string, and the other a list of dicts.
You would want to use csv.DictWriter to write the file.
import csv

with open('outputfile.csv', 'wb') as fout:
    writer = csv.DictWriter(fout, ['date', 'spent'])
    writer.writeheader()
    for dct in lst_of_dict:
        writer.writerow(dct)
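For the update about totalling a column by name, csv.DictReader is the mirror image of DictWriter: each row comes back as a dict keyed by the header, so the revenue column can be found by name regardless of its position. A sketch (not the original code), assuming resp.read() returns the csv text as before:
import csv

def get_daily_sum(resp):
    rev = ('revenue', 'estRevenue')
    reader = csv.DictReader(resp.read().splitlines())
    # find whichever revenue column name is present in the header
    col = next((c for c in reader.fieldnames if c in rev), None)
    if col is None:
        return 'Not available'
    return sum(float(row[col]) for row in reader)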
A solution using a list comprehension; it should work for any number of keys, but only if all your dicts have the same keys.
l = [[d[key] for key in dicts[0].keys()] for d in dicts]
To attach the key names as column titles, prepend them as a header row:
l = [list(dicts[0].keys())] + l
This will return a list of lists which can be exported to csv:
import csv
myfile = csv.writer(open("data.csv", "wb"))
myfile.writerows(l)
