So I have this txt file called "Students.txt", and I want to define a function called load(student).
So far I have this code:
def load(student):
    # body goes here
I'm not quite sure what to write for the body so that it reads the file and returns its contents as a dictionary. I know it would involve something like readlines().
Anyway, the file Students.txt looks like this:
P883, Michael Smith, 1991
L672, Jane Collins, 1992
L322, Randy Green, 1992 (added)
H732, Justin Wood, 1995 (added)
^key ^name ^year of birth
The function has to return a dictionary that looks like this:
{'P883': ('Michael Smith', 1991),
 '(key)': ('name', 'year')}
I managed to return the values by trial and error, however I can't get rid of the newlines and the values keep coming back with \n in them.
===============
This question has been answered and I used the following code, which works perfectly. However, when there is a space in the values from the txt file (see the added parts above) it doesn't work anymore and gives an error saying that the list index is out of range.
Looks like a CSV file. You can use the csv module then:
import csv

studentReader = csv.reader(open('Students.txt', newline=''), delimiter=',', skipinitialspace=True)
d = dict()
for row in studentReader:
    d[row[0]] = tuple(row[1:])
This won't give you the year as an integer; you have to convert it yourself:
for row in studentReader:
    d[row[0]] = (row[1], int(row[2]))
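Putting those two pieces together into the load() function the question asks for, a minimal sketch (using Python 3's open, and skipping blank or short lines, which are a common cause of the "list index out of range" error mentioned in the edit) might look like this:

import csv

def load(filename='Students.txt'):
    d = {}
    with open(filename, newline='') as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 3:  # skip blank or malformed lines
                continue
            d[row[0]] = (row[1], int(row[2]))
    return d

Calling load() on the sample file should then give something like {'P883': ('Michael Smith', 1991), ...}.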
Something like this should do it, I think:
students = {}
infile = open("Students.txt")
for line in infile:
    line = line.strip()
    parts = [p.strip() for p in line.split(",")]
    students[parts[0]] = (parts[1], parts[2])
This might not be 100%, but should give you a starting-point. Error handling was omitted for brevity.
def load(students_file):
    result = {}
    for line in students_file:
        key, name, year_of_birth = [x.strip() for x in line.split(",")]
        result[key] = (name, year_of_birth)
    return result
I would use the pickle module in python to save your data as a dict so you could load it easily by unpickling it. Or, you could just do:
d = {}
with open('Students.txt', 'r') as f:
    for line in f:
        tmp = line.strip().split(',')
        d[tmp[0]] = (tmp[1], tmp[2])
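For the pickle suggestion above, a minimal sketch (the students.pkl filename is just an assumption) could be:

import pickle

d = {'P883': ('Michael Smith', 1991), 'L672': ('Jane Collins', 1992)}

# Save the dictionary once...
with open('students.pkl', 'wb') as f:
    pickle.dump(d, f)

# ...and load it back later without re-parsing the text file
with open('students.pkl', 'rb') as f:
    d = pickle.load(f)
print(d)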
Related
I have a .csv file with 20 lines, and each line is formatted as follows:
Lucy, 23, F, diabetes
Darwin, 60, M, hypertension
Dave, 35, M, epilepsy
Tiffany, 12, F, asthma
... and so on.
I am looking to convert this .csv file into a dictionary, presented as follows:
dict = {
    'Lucy': {
        'age': 23,
        'gender': 'F',
        'condition': 'diabetes'
    },
    'Darwin': {
        'age': 60,
        'gender': 'M',
        'condition': 'hypertension'
    },
    # (and so on for all 20 lines)
}
Each line is in the form: name, age, gender, condition. Here's what I have tried so far.
dict = {}
f = open("medical.csv', mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
    line.split(",")
... and this is where I hit a halt. I cannot figure out how to assign the titles to each value in the line such that the dictionary will be displayed as above, with the tags 'age', 'gender' and 'condition'. And when I run the code, there is a SyntaxError: invalid syntax message on "medical.csv".
The age has to be an integer. If it is not an integer, I want the program to skip that line when creating the dictionary.
Any help would be much appreciated!
I recommend not using the names as your dictionary keys, because names can be repeated.
At the beginning create the main dict, then iterate over the lines of the CSV. From each line, extract the person's name and properties (you used the split method, which fits perfectly here, but instead of split(",") use split(", ")). Create a dictionary for each person and assign keys and values to it this way:
person = {}
person['age'] = 23
And so on...
Then assign this person's dictionary as a value in the main dictionary, with the person's name as the key (see the sketch below). Hope it helps a bit!
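A rough sketch of that approach, assuming exactly four comma-separated fields per line and skipping lines whose age isn't an integer (as the question asks):

people = {}
with open("medical.csv", encoding="utf8") as f:
    for line in f:
        name, age, gender, condition = line.strip().split(", ")
        try:
            age = int(age)
        except ValueError:
            continue  # skip lines where the age is not an integer
        person = {}
        person['age'] = age
        person['gender'] = gender
        person['condition'] = condition
        people[name] = person
print(people)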
I suggest using the csv module for this purpose. Note the handy skipinitialspace argument.
import csv
from pprint import pprint

def row_to_dict(ts):
    return {k: t for k, t in zip(("age", "gender", "condition"), ts)}

if __name__ == "__main__":
    result = {}
    with open("medical.csv") as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            name, data = row[0], row[1:]
            result[name] = row_to_dict(data)
    pprint(result)
First of all, please keep in mind that there might be more "pythonic" answers to your problem.
Well, you are on the right path:
dict = {}
f = open("medical.csv", mode="rt", encoding="utf8")
s = f.readline()
for line in f:
    l = line.split(",")
Let's give a name (l) to the result of line.split(",").
Now l is in this format:
l[0] contains the name
l[1] contains the age
l[2] contains the sex
l[3] contains the condition.
Now, the first element of l is the name, so let's add it to the dictionary:
dict[l[0].strip()] = {}
Note:
I'm using l[0].strip() because there might be unwanted whitespace at the beginning or end of it
I'm initializing a new dictionary inside the dictionary (the data structure you want is a dictionary of dictionaries)
Now, let's add in turn the other fields:
dict[l[0].strip()]['gender'] = l[2].strip()
dict[l[0].strip()]['condition'] = l[3].strip()
This works, unless the age is not an integer, so we need to use a try except block for that beforehand:
try:
    age = int(l[1].strip())
except ValueError:
    continue  # You want to skip the current iteration, right?
Now we can put everything together, polishing the code a bit:
dict = {}
f = open("medical.csv", mode="rt", encoding="utf8")
s = f.readline()
for line in f:
    l = line.split(",")
    age = -1
    try:
        age = int(l[1].strip())
    except ValueError:
        continue
    key = l[0].strip()
    dict[key] = {}  # initialize the inner dictionary before filling it in
    dict[key]['age'] = age
    dict[key]['gender'] = l[2].strip()
    dict[key]['condition'] = l[3].strip()
Of course this assumes all the names are different (I've just read firanek's answer: I agree that you should not use names as the key; with this approach you lose the data for everyone with the same name except the last one).
Oh, I was almost forgetting about it: you can use a dict literal and replace the dict[key][<string>] = <thing> lines with:
dict[key] = {'age': age, 'gender': l[2].strip(), 'condition': l[3].strip()}
You may want to check out the Pandas library, and manipulate the data with DataFrames as it has lots of built-in functionality.
import pandas as pd

data = pd.read_csv("data.csv", header=None, names=["Name", "Age", "Gender", "Condition"],
                   index_col=False, na_values=["NaN", "null"], verbose=True)
data = pd.DataFrame(data)  # read_csv already returns a DataFrame, so this line is redundant
newdata = data.dropna(subset=['Age'])
print("new data: \n", newdata)
Also a similar question: Pandas: drop columns with all NaN's
I have a text file with something like a million lines that are formatted like this:
{"_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0","parameterId":"visib_mean_last10min","stationId":"06193","timeCreated":1581590344449633,"timeObserved":1577922600000000,"value":11100}
The file has no headers. I want to be able to observe it as an array.
I've tried this:
df = pd.read_csv("2020-01_2.txt", delimiter = ",", header = None, names = ["_id", "parameterId", "stationId", "timeCreated", "timeObserved", "value"])
and while that does sort the file into columns and rows like I want, it puts "_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0" as the first entry, where I would only want "0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0".
How do I get only the value that comes after each ":" into the array?
As pointed out by @mousetail, this looks like some kind of JSON file. You may want to do as follows:
import json

mylist = []
with open("2020-01_2.txt") as f:
    for line_no, line in enumerate(f):
        mylist.append([])
        mydict = json.loads(line)
        for k in mydict:
            mylist[line_no].append(mydict[k])
        mydict = {}
It will output a list of lists, each one of them corresponding to a file line.
Good luck!
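Since the question already uses pandas, an alternative sketch is to let pandas parse the JSON Lines directly; .values then gives the array of values without the keys:

import pandas as pd

# Each line of the file is a standalone JSON object, i.e. the JSON Lines format
df = pd.read_json("2020-01_2.txt", lines=True)
print(df.values)  # 2-D array containing only the values, one row per line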
I have a file in the below format
.aaa b/b
.ddd e/e
.fff h/h
.lop m/n
I'm trying to read this file. My desired output is if I find ".aaa" I should get b/b, if I find ".ddd" I should get e/e and so on.
I know how to fetch the 1st and 2nd columns, but I don't know how to compare them and fetch the matching value. This is what I've written.
file = open('some_file.txt')
for line in file:
    fields = line.strip().split()
    print(fields[0])  # This will give the 1st column
    print(fields[1])  # This will give the 2nd column
This is not the right way of doing things. What approach should I follow?
Any time you want to do lookups, a dictionary is going to be your friend.
You could write a function to load the data into a dictionary:
def load_data(filename):
    result = dict()
    with open(filename, 'r') as f:
        for line in f:
            k, v = line.strip().split()  # will fail if not exactly 2 fields
            result[k] = v
    return result
And then use it to perform your lookups like this:
data = load_data('foo.txt')
print(data['.aaa'])
It sounds like what you may want is to build a dictionary mapping column 1 to column 2. You could try:
file = open('some_file.txt')
field_dict = {}
for line in file:
    fields = line.strip().split()
    field_dict[fields[0]] = fields[1]
Then in your other code, when you see '.ddd' you can simply get the reference from the dictionary (e.g. field_dict['.ddd'] should return 'e/e')
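For example (a hypothetical lookup, using .get() with a default so a missing key doesn't raise a KeyError):

print(field_dict.get('.ddd', 'not found'))  # prints 'e/e' for the sample file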
Just split each line on whitespace and check whether the first item matches the word you gave. If so, print the second item from the list.
word = input("Enter the word to search : ")
with open(file) as f:
for line in f:
m = line.strip().split()
if m[0] == word:
print m[1]
I generated a CSV from multiple dictionaries (to keep it readable and editable too) with the help of this question. The output is simple:
//Dictionary
key,value
key2,value2
//Dictionary2
key4, value4
key5, value5
I want the double-slash ("//") lines to act as separators that start a new dictionary, but csv.reader(open("input.csv")) just iterates over all the lines, so the following is of no use to me:
import csv

dict = {}
for key, val in csv.reader(open("input.csv")):
    dict[key] = val
Thanks for helping me out..
Edit: I made this piece of... well, "code". I'll be glad if you can check it out and review it:
#! /usr/bin/python
import csv

# list of dictionaries
l = []
# evaluate through the csv
for row in csv.reader(open("test.csv")):
    if row[0].startswith("//"):
        # stripped "//" line is the name for the dictionary
        n = row[0][2:]
        # append stripped "//" line as name for dictionary
        # debug
        print n
        l.append(n)
        # debug print l[:]
    elif len(row) == 2:
        # debug
        print "len(row) %s" % len(row)
        # debug
        print "row[:] %s" % row[:]
        for key, val in row:
            # print key,val
            l[-1] = dic
            dic = {}
            dic[key] = val
# debug
for d in l:
    print l
    for key, value in d:
        print key, value
Unfortunately I got this error:
DictName
len(row) 2
row[:] ['key', ' value']
Traceback (most recent call last):
File "reader.py", line 31, in <module>
for key, val in row:
ValueError: too many values to unpack
Consider not using CSV
First of all, your overall strategy to the data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far out of the realm).
For example, it would be really easy to solve this problem using json:
import json

# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))

# Convert and write
js = json.dumps(data)
f = open("data.json", 'w')
f.write(js)
f.close()

# Now read back
f = open("data.json", 'r')
data = json.load(f)
print(data)
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just use the csv module to do all the work for you, but actually have to go through and filter out (and split by) the "//" lines.
import csv
import re

def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)

# Open the file and ...
f = open("data.csv")
# create some containers we can populate as we iterate
data = []
d = {}

for line in f:
    if not header_matcher(line):
        # We have a non-header row, so we make a new entry in our draft dictionary
        key, val = line.strip().split(',')
        d[key] = val
    else:
        # We've hit a new header, so we should throw our draft dictionary in our data list
        if d:
            # ... but only if we actually have had data since the last header
            data.append(d)
            d = {}

# The very last chunk will need to be captured as well
if d:
    data.append(d)

# And we're done...
print(data)
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If you needed to, you could probably find a clever way of chunking the file into generators that you read with CSV readers, but it won't be particularly clean or easy (I started an approach like this but it looked like pain... a rough sketch is below). This is all a testament to your approach likely being the wrong way to store this data.
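For what it's worth, a hedged sketch of that chunking idea using itertools.groupby (assuming the file format shown in the question; this builds a dict keyed by the "//" names rather than a plain list):

import csv
from itertools import groupby

data = {}
with open("data.csv") as f:
    name = None
    # Group consecutive lines by whether they are "//" header lines or data lines
    for is_header, chunk in groupby(f, key=lambda line: line.startswith("//")):
        if is_header:
            # The last header seen names the next dictionary
            name = list(chunk)[-1].strip()[2:]
        elif name is not None:
            # Feed this chunk's data lines to csv.reader and build the inner dict
            data[name] = {row[0]: row[1]
                          for row in csv.reader(chunk, skipinitialspace=True) if row}
print(data)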
An alternative if you're set on CSV
Another way to go if you really want CSV but aren't stuck on the exact data format you specify: Add a column in the CSV file corresponding to the dictionary the data should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv

data = dict()
for chunk, key, val in csv.reader(open('data2.csv')):
    try:
        # If we already have a dict for the given chunk id, this should add the key/value pair
        data[chunk][key] = val
    except KeyError:
        # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
        data[chunk] = {key: val}
print(data)
Much nicer...
The only good argument for doing something closer to what you have in mind over this is if there is LOTS of data, and space is a concern. But that is not very likely to be the case in most situations.
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby function, which would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
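A hedged sketch of that pandas idea, assuming the 3-column layout of data2.csv above:

import pandas as pd

df = pd.read_csv("data2.csv", header=None, names=["chunk", "key", "value"],
                 skipinitialspace=True)
# One inner dictionary per chunk id, built from that chunk's key/value columns
data = {name: dict(zip(group["key"], group["value"]))
        for name, group in df.groupby("chunk")}
print(data)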
I decided to use json instead
Reading it is easier for the program and there's no need to filter the text. A Python program can generate the data and keep it in an external .json file, which then serves as the database.
#! /usr/bin/python
import json

category1 = {"server name1": "ip address1", "server name2": "ip address2"}
category2 = {"server name3": "ip address3", "server name4": "ip address4"}
servers = {"category Alias1": category1, "category Alias2": category2}

js = json.dumps(servers)
f = open("servers.json", "w")
f.write(js)
f.close()

# Now read back
f = open("servers.json", "r")
data = json.load(f)
print(data)
So the output is a dictionary whose keys are the categories and whose values are other dictionaries. Exactly as I wanted.
This dictionary is supposed to take the three-letter country code of a country, e.g. GRE for Great Britain, and then take the four consecutive numbers after it as a tuple. It should be something like this:
{'GRE': (204, 203, 112, 116)}, and continue doing that for every single country in the list. The txt file goes down like so:
Country,Games,Gold,Silver,Bronze
AFG,13,0,0,2
ALG,15,5,2,8
ARG,40,18,24,28
ARM,10,1,2,9
ANZ,2,3,4,5 etc.;
This isn't actual code; I just wanted to show how the file is formatted.
I need my program to skip the first line because it's a header. Here's what my code looks like thus far:
def medals(goldMedals):
    infile = open(goldMedals, 'r')
    medalDict = {}
    for line in infile:
        if infile[line] != 0:
            key = line[0:3]
            value = line[3:].split(',')
            medalDict[key] = value
    print(medalDict)
    infile.close()
    return medalDict

medals('GoldMedals.txt')
Your for loop should be like:
next(infile)  # Skip the first line
for line in infile:
    words = line.split(',')
    medalDict[words[0]] = tuple(map(int, words[1:]))
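For completeness, here is a sketch of how that loop might slot back into the original medals() function (keeping the question's function name and file; this is only one possible arrangement):

def medals(goldMedals):
    medalDict = {}
    with open(goldMedals, 'r') as infile:
        next(infile)  # skip the header line
        for line in infile:
            words = line.strip().split(',')
            medalDict[words[0]] = tuple(map(int, words[1:]))
    return medalDict

print(medals('GoldMedals.txt'))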
A variation on a theme, I'd convert all the remaining cols to ints, and I'd use a namedtuple:
from collections import namedtuple

with open('file.txt') as fin:
    # The first line names the columns
    lines = iter(fin)
    columns = next(lines).strip().split(',')
    row = namedtuple('Row', columns[1:])

    results = {}
    for line in lines:
        columns = line.strip().split(',')
        results[columns[0]] = row(*(int(c) for c in columns[1:]))

# results is now a dict of named tuples
This has the nice feature of 1) skipping the first line and 2) providing both offset and named access to the rows:
# These both work to return the 'Games' column
results['ALG'].Games
results['ALG'][0]
with open('path/to/file') as infile:
    next(infile)  # skip the header line
    answer = {}
    for line in infile:
        k, v = line.strip().split(',', 1)
        answer[k] = tuple(int(i) for i in v.split(','))
I think inspectorG4dget's answer is the most readable... but for those playing code golf:
with open('medals.txt', 'r') as infile:
    headers = infile.readline()
    medal_dict = dict([(i[0], tuple(i[1:])) for i in [list(line.strip().split(',')) for line in infile]])