Creating a dictionary using data from a .csv file - python

I have a .csv file with 20 lines, and each line is formatted as follows:
Lucy, 23, F, diabetes
Darwin, 60, M, hypertension
Dave, 35, M, epilepsy
Tiffany, 12, F, asthma
... and so on.
I am looking to convert this .csv file into a dictionary, presented as follows:
dict = {
'Lucy':{
age: 23
gender: 'F'
condition: 'diabetes'
},
'Darwin':{
age: 60
gender: 'M'
condition: 'hypertension'
},
#(and so on for all 20 lines)
}
Each line is in the form: name, age, gender, condition. Here's what I have tried so far.
dict ={}
f = open("medical.csv', mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
line.split(",")
... and this is where I hit a halt. I cannot figure out how to assign the titles to each value in the line such that the dictionary will be displayed as above, with the tags 'age', 'gender' and 'condition'. And when I run the code, there is a SyntaxError: invalid syntax message on "medical.csv".
The age has to be an integer. If it is not an integer, I want the program to skip that line when creating the dictionary.
Any help would be much appreciated!

I recommend not using the names as your dictionary keys, because names can repeat.
At the beginning create the main dict, then iterate over the lines of the CSV. From each line extract the person's name and properties (you already used the split method, and it fits perfectly here, but instead of split(",") use split(", ") so the leading spaces are dropped). Create a dictionary for each person and assign keys and values to it this way:
person = {}
person['age'] = 23
And so on...
Then assign this person's dictionary as a value in the main dictionary, with the person's name as the key. Hope it helps a bit!
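A minimal sketch of that approach (my own, so treat it as a starting point; it strips whitespace instead of relying on split(", "), and it skips lines whose age is not an integer, as the question requires):
result = {}
with open("medical.csv", encoding="utf8") as f:
    for line in f:
        fields = [part.strip() for part in line.split(",")]
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, age, gender, condition = fields
        try:
            age = int(age)
        except ValueError:
            continue  # skip lines where the age is not an integer
        person = {}
        person['age'] = age
        person['gender'] = gender
        person['condition'] = condition
        result[name] = person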

I suggest using the csv module for this purpose. Note the handy skipinitialspace argument.
import csv
from pprint import pprint

def row_to_dict(ts):
    return {k: t for k, t in zip(("age", "gender", "condition"), ts)}

if __name__ == "__main__":
    result = {}
    with open("medical.csv") as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            name, data = row[0], row[1:]
            result[name] = row_to_dict(data)
    pprint(result)
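The question also wants rows with a non-integer age skipped; one way to add that (my suggestion, not part of the original snippet) is to convert the age before storing the row, replacing the loop above with something like:
for row in reader:
    name, data = row[0], row[1:]
    try:
        data[0] = int(data[0])  # age must parse as an integer
    except (ValueError, IndexError):
        continue  # otherwise skip the line
    result[name] = row_to_dict(data)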

First of all, please keep in mind that there might be more "pythonic" answers to your problem.
Well, you are on the right path:
dict = {}
f = open("medical.csv", mode="rt", encoding="utf8")
s = f.readline()
for line in f:
    l = line.split(",")
Let's give a name (l) to the result of line.split(",").
Now l is in this format:
l[0] contains the name
l[1] contains the age
l[2] contains the sex
l[3] contains the condition.
Now, the first element of l is the name, so let's add it to the dictionary:
dict[l[0].strip()] = {}
Note:
I'm using l[0].strip() because there might be unwanted whitespace at the beginning or end of it
I'm initializing a new dictionary inside the dictionary (the data structure you want is a dictionary of dictionaries)
Now, let's add in turn the other fields:
dict[l[0].strip()]['gender'] = l[2].strip()
dict[l[0].strip()]['condition'] = l[3].strip()
This works, unless the age is not an integer, so we need to use a try except block for that beforehand:
try:
    age = int(l[1].strip())
except ValueError:
    continue  # You want to skip the current iteration, right?
Now we can put everything together, polishing the code a bit:
dict = {}
f = open("medical.csv", mode="rt", encoding="utf8")
s = f.readline()
for line in f:
    l = line.split(",")
    try:
        age = int(l[1].strip())
    except ValueError:
        continue
    key = l[0].strip()
    dict[key] = {}  # create the inner dictionary before filling it in
    dict[key]['age'] = age
    dict[key]['gender'] = l[2].strip()
    dict[key]['condition'] = l[3].strip()
Of course this assumes all the names are different (I've just read firanek's answer, and I agree that you should not use names as keys: with this approach you lose the data for every person who shares a name except the last one read).
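If duplicate names are a real concern, a hedged alternative (my own sketch, not from either answer) is to keep a list of records per name:
from collections import defaultdict

records = defaultdict(list)
with open("medical.csv", encoding="utf8") as f:
    for line in f:
        fields = [part.strip() for part in line.split(",")]
        if len(fields) != 4:
            continue  # skip malformed lines
        try:
            age = int(fields[1])
        except ValueError:
            continue  # skip non-integer ages, as the question requires
        records[fields[0]].append({'age': age, 'gender': fields[2], 'condition': fields[3]})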
Oh, I almost forgot: you can build the inner dictionary in one go and replace the dict[key][<string>] = <thing> lines with a dict literal (the dict(...) constructor itself won't work here, because the outer dictionary shadows the name dict):
dict[key] = {'age': age, 'gender': l[2].strip(), 'condition': l[3].strip()}

You may want to check out the Pandas library, and manipulate the data with DataFrames as it has lots of built-in functionality.
import pandas as pd

# read the CSV, supplying column names since the file has no header row
data = pd.read_csv("data.csv", header=None,
                   names=["Name", "Age", "Gender", "Condition"],
                   index_col=False, na_values=["NaN", "null"], verbose=True)
data = pd.DataFrame(data)  # read_csv already returns a DataFrame, so this line is optional
newdata = data.dropna(subset=['Age'])  # drop rows whose Age is missing
print("new data: \n", newdata)
Also a similar question: Pandas: drop columns with all NaN's

Related

reading a file and parsing it into sections

Okay, so I have a file that contains an ID number followed by a name, just like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far, however I can't go any further.
fin = open("ids","r") #Read the file
for line in fin: #Split lines
    string = str.split()
    if len(string) > 1: #Seperate names and grades
        id = map(int, string[0]
        name = string[1:]
        print(id, name) #Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling, you should add that) and read the lines in, so that the list lines holds one 'number firstname secondname' string per line of the file.
Then we create an empty dictionary out and loop over the individual strings in lines, splitting each one on spaces and storing the pieces in the temporary variable tmp (which is then a list of strings: ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted just loop over the list of numbers returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) and then the corresponding value by calling the dictionary with a string representation of the id.
import sys

try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')  # raw_input, since this is Python 2

with open(infile, 'r') as file:
    lines = file.readlines()

out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])

for id in sorted(out, key=int):
    print id, out[str(id)]
This works for python 2.7 with ASCII-strings. I'm pretty sure that it should be able to handle other encodings as well (German Umlaute work at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is somehow formatted differently.
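For what it's worth, a Python 3 version of the same idea (an untested sketch) mainly differs in the input and print calls:
import sys

infile = sys.argv[1] if len(sys.argv) > 1 else input('Enter file name: ')

out = {}
with open(infile, 'r') as file:
    for fullstr in file:
        tmp = fullstr.split()
        if not tmp:
            continue  # skip blank lines
        out[tmp[0]] = ' '.join(tmp[1:])

for key in sorted(out, key=int):
    print(key, out[key])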
Just a suggestion, this code is probably simpler than the other code posted:
import sys

with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()

data = dict([i.strip().split(' ', 1) for i in lines])

for idx in sorted(data, key=int):
    print idx, data[idx]

How to fetch the values below in Python

I have a file in the below format
.aaa b/b
.ddd e/e
.fff h/h
.lop m/n
I'm trying to read this file. My desired output is if I find ".aaa" I should get b/b, if I find ".ddd" I should get e/e and so on.
I know how to fetch 1st column and 2nd column but I don't know how to compare them and fetch the value. This is what I've written.
file = open('some_file.txt')
for line in file:
    fields = line.strip().split()
    print(fields[0])  # This will give the 1st column
    print(fields[1])  # This will give the 2nd column
This is not the right way of doing things. What approach should I follow?
Any time you want to do lookups, a dictionary is going to be your friend.
You could write a function to load the data into a dictionary:
def load_data(filename):
    result = dict()
    with open(filename, 'r') as f:
        for line in f:
            k, v = line.strip().split()  # will fail if not exactly 2 fields
            result[k] = v
    return result
And then use it to perform your lookups like this:
data = load_data('foo.txt')
print data['.aaa']
It sounds like what you may want is to build a dictionary mapping column 1 to column 2. You could try:
file = open('some_file.txt')
field_dict = {}
for line in file:
    fields = line.strip().split()
    field_dict[fields[0]] = fields[1]
Then in your other code, when you see '.ddd' you can simply get the reference from the dictionary (e.g. field_dict['.ddd'] should return 'e/e')
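For example (hypothetical usage), once field_dict is built you can look values up directly, or use .get to supply a fallback for missing keys:
print(field_dict['.ddd'])             # 'e/e'
print(field_dict.get('.zzz', 'n/a'))  # 'n/a' instead of a KeyError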
Just split each line on whitespace and check whether the first item matches the word you gave. If it does, print the second item from the list.
word = input("Enter the word to search : ")
with open(file) as f:
    for line in f:
        m = line.strip().split()
        if m[0] == word:
            print m[1]

Python 2 - iterating through a csv, treating specific lines as dictionary separators

I generated a csv from multiple dictionaries (to keep it readable and editable, too) with the help of this question. The output is simple:
//Dictionary
key,value
key2,value2
//Dictionary2
key4, value4
key5, value5
I want the double-slash lines to be the separators that start a new dictionary, but csv.reader(open("input.csv")) just iterates over the lines one by one, so I have no use for:
import csv
dict = {}
for key, val in csv.reader(open("input.csv")):
    dict[key] = val
Thanks for helping me out..
Edit: I made this piece of... well, "code". I'll be glad if you can check it out and review it:
#! /usr/bin/python
import csv

# list of dictionaries
l = []
# evaluate through csv
for row in csv.reader(open("test.csv")):
    if row[0].startswith("//"):
        # stripped "//" line is name for dictionary
        n = row[0][2:]
        # append stripped "//" line as name for dictionary
        # debug
        print n
        l.append(n)
        # debug print l[:]
    elif len(row) == 2:
        # debug
        print "len(row) %s" % len(row)
        # debug
        print "row[:] %s" % row[:]
        for key, val in row:
            # print key,val
            l[-1] = dic
            dic = {}
            dic[key] = val
# debug
for d in l:
    print l
    for key, value in d:
        print key, value
Unfortunately I got this error:
DictName
len(row) 2
row[:] ['key', ' value']
Traceback (most recent call last):
  File "reader.py", line 31, in <module>
    for key, val in row:
ValueError: too many values to unpack
Consider not using CSV
First of all, your overall strategy to the data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far out of the realm).
For example, it would be really easy to solve this problem using json:
import json
# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))
# Convert and write
js = json.dumps(data)
f = file("data.json", 'w')
f.write(js)
f.close()
# Now read back
f = file("data.json", 'r')
data = json.load(f)
print data
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just use the csv module to do all the work for you, but actually have to go through and filter out (and split by) the "//" lines.
import csv
import re

def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)

# Open the file and ...
f = open("data.csv")
# create some containers we can populate as we iterate
data = []
d = {}

for line in f:
    if not header_matcher(line):
        # We have a non-header row, so we make a new entry in our draft dictionary
        key, val = line.strip().split(',')
        d[key] = val
    else:
        # We've hit a new header, so we should throw our draft dictionary in our data list
        if d:
            # ... but only if we actually have had data since the last header
            data.append(d)
            d = {}

# The very last chunk will need to be captured as well
if d:
    data.append(d)

# And we're done...
print data
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If you needed to, you could probably find a clever way of chunking up the file into generators that you read with CSV readers, but it won't be particularly clean or easy (I started an approach like this but it looked like pain...). This is all a testament to your approach likely being the wrong way to store this data.
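For completeness, here is a rough sketch of that chunking idea using itertools.groupby to split the file at the "//" lines (my own sketch, so double-check it against your real data):
import csv
from itertools import groupby

data = []
with open("data.csv") as f:
    for is_header, chunk in groupby(f, key=lambda line: line.startswith("//")):
        if is_header:
            continue  # "//" lines only mark the boundary between dictionaries
        reader = csv.reader(chunk, skipinitialspace=True)
        data.append({row[0]: row[1] for row in reader if len(row) == 2})
print(data)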
An alternative if you're set on CSV
Another way to go if you really want CSV but aren't stuck on the exact data format you specify: Add a column in the CSV file corresponding to the dictionary the data should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv

data = dict()
for chunk, key, val in csv.reader(file('data2.csv')):
    try:
        # If we already have a dict for the given chunk id, this should add the key/value pair
        data[chunk][key] = val
    except KeyError:
        # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
        data[chunk] = {key: val}
print data
Much nicer...
The only good argument for doing something closer to what you have in mind is if there is LOTS of data and space is a concern, but that is not very likely to be the case in most situations.
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby function, which would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
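A rough sketch of that pandas route, assuming the 3-column data2.csv layout shown above (a suggestion of mine, not something from the answer):
import pandas as pd

df = pd.read_csv("data2.csv", header=None, names=["chunk", "key", "value"])
data = {chunk: dict(zip(group["key"], group["value"]))
        for chunk, group in df.groupby("chunk")}
print(data)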
I decided to use json instead
Reading it is easier for the program and there's no need to filter the text. An external .json file will serve the Python program as its data store.
#! /usr/bin/python
import json

category1 = {"server name1": "ip address1", "server name2": "ip address2"}
category2 = {"server name1": "ip address1", "server name2": "ip address2"}
servers = {"category Alias1": category1, "category Alias2": category2}

js = json.dumps(servers)
f = file("servers.json", "w")
f.write(js)
f.close()

# Now read back
f = file("servers.json", "r")
data = json.load(f)
print data
So the output is a dictionary whose keys are the categories and whose values are further dictionaries. Exactly as I wanted.

a loop to read lines that start with specific letters

I'm using a for loop to read a file, but I only want to read specific lines, say lines that start with "af" and "apn". Is there any built-in feature to achieve this?
How to split this line after reading it ?
How to store the elements from the split into a dictionary?
Let's say the first element of the line after the split is the employee ID, which I store in the dictionary, and the second element is the full name, which I want to store in the dictionary too.
So when I use this line "employee_dict{employee_ID}", will I get the full name?
Thank you.
You can do so very easily
f = open('file.txt', 'r')
employee_dict = {}
for line in f:
    if line.startswith("af") or line.startswith("apn"):
        emprecords = line.split()  # assuming the default separator is a space character
        # assuming that all your records follow a common format, you can then create an employee dict
        employee = {}
        # the first element after the split is the employee id
        employee_id = int(emprecords[0])
        # enter name-value pairs within the employee object - e.g. let's say the second element after the split is the emp name, the third the age
        employee['name'] = emprecords[1]
        employee['age'] = emprecords[2]
        # store this in the global employee_dict
        employee_dict[employee_id] = employee
To retrieve the name of employee id 1 after having done the above use something like:
print employee_dict[1]['name']
Hope this gives you an idea on how to go about
if your file looks like
af, 1, John
ggg, 2, Dave
you could create dict like
d = {z[1].strip() : z[2].strip() for z in [y for y in [x.split(',') for x in open(r"C:\Temp\test1.txt")] if y[0] in ('af', 'apn')]}
More readable version
d = {}
for l in open(r"C:\Temp\test1.txt"):
    x = l.split(',')
    if x[0] not in ('af', 'apn'): continue
    d[x[1].strip()] = x[2].strip()
both solutions give you d = {'1': 'John'} on this example. To get name from the dict, you can do name = d['1']
prefixes = ("af", "apn")
with open('file.txt', 'r') as f:
    employee_dict = dict((line.split()[:2]) for line in f if any(line.startswith(p) for p in prefixes))
dictOfNames = {}
file = open("filename", "r")
for line in file:
    if line.startswith('af') or line.startswith('apn'):
        line = line.split(',')  # split using delimiter of ','
        dictOfNames[line[1]] = line[2]  # take 2nd element of line as id and 3rd as name
The program above will read the file and, if a line starts with 'af' or 'apn', store its second element as the id and its third as the name, assuming comma is the delimiter.
Now you can use dictOfNames[id] to get the name.

convert list of values from a txt file to dictionary

So I have this txt file called "Students.txt", and I want to define a function called load(student),
so I have this code:
def load(student):
    body
I'm not quite sure what to write for the body so that it reads the file and returns the values from the file as a dictionary. I know it would be something like readlines().
anyway, the file students.txt looks like this:
P883, Michael Smith, 1991
L672, Jane Collins, 1992
(added)L322, Randy Green, 1992
H732, Justin Wood, 1995(added)
^key ^name ^year of birth
The function has to return a dictionary that looks like this:
{'P883': ('Michael Smith',1991),
'(key)':('name','year')}
I managed to return the values by trial and error, however I can't get rid of the newlines and it keeps returning \n.
===============
This question has been answered and I used the following code, which works perfectly; however, when there is a space in the values from the txt file (see the added parts) it doesn't work anymore and gives an error saying that the list index is out of range.
Looks like a CSV file. You can use the csv module then:
import csv
studentReader = csv.reader(open('Students.txt', 'rb'), delimiter=',', skipinitialspace=True)
d = dict()
for row in studentReader:
    d[row[0]] = tuple(row[1:])
This won't give you the year as an integer; you have to convert it yourself (note that a two-element tuple is built with parentheses, not the tuple() constructor):
for row in studentReader:
    d[row[0]] = (row[1], int(row[2]))
Something like this should do it, I think:
students = {}
infile = open("students.txt")
for line in infile:
    line = line.strip()
    parts = [p.strip() for p in line.split(",")]
    students[parts[0]] = (parts[1], parts[2])
This might not be 100%, but should give you a starting-point. Error handling was omitted for brevity.
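If the "list index out of range" from the question's edit comes from blank or malformed lines, a slightly more defensive variant (a sketch, assuming the comma-separated layout shown above) could be:
students = {}
with open("students.txt") as infile:
    for line in infile:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            students[parts[0]] = (parts[1], int(parts[2]))
        except ValueError:
            continue  # skip lines where the year is not a number
print(students)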
def load(students_file):
    result = {}
    for line in students_file:
        key, name, year_of_birth = [x.strip() for x in line.split(",")]
        result[key] = (name, year_of_birth)
    return result
I would use the pickle module in python to save your data as a dict so you could load it easily by unpickling it. Or, you could just do:
d = {}
with open('Students.txt', 'r') as f:
    for line in f:
        tmp = line.strip().split(',')
        d[tmp[0]] = (tmp[1], tmp[2])
