Python Converting .csv file with comma delimiter to dictionary - python

So I have tried to fix this problem for quite a while now and did some research into why my code won't work, but I simply can't get the dictionary to print with all the proper key:value pairs I need.
So here's the story. I am reading a .csv file where the first column contains text abbreviations and the second column contains their full English meanings. I have tried multiple ways of opening this file, reading it, and then storing it in a dictionary. My issue is that the file gets read, and when I print the separated pieces I believe it goes through the whole file (the console output gets cut off around line 1007, but the file goes through to 4600). The problem is that when I then want to take all that and put it into key:value pairs inside a dictionary, the only pair that gets stored is the very first line in the file.
Here is the code:
def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        #line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
    print dic
What I assumed was the issue was:
print dic
Since it is printing within the loop; but since it is in the loop, it should just print every time it goes through, again and again. I am confused about what I am doing wrong. The other method I attempted was json, but I don't know much about how to use it, and I also read up on the csv module, but I don't think our professor wants us to use that, so I was hoping someone could spot my error. Thanks in advance!
EDIT
This is the output of my program
going to be late\rg2cu', 'glad to see you\rg2e', 'got to eat\rg2g', 'got to go\rg2g2tb', 'got to go to the bathroom\rg2g2w', 'got to go to work\rg2g4aw', 'got to go for a while\rg2gb', 'got to go bye\rg2gb2wn', 'got to go back to work now\rg2ge', 'got to go eat\rg2gn', 'got to go now\rg2gp', 'got to go pee\rg2gpc', 'got 2 go parents coming\rg2gpp', 'got to go pee pee\rg2gs', 'got to go sorry\rg2k', 'good to know\rg2p', 'got to pee\rg2t2s', 'got to talk to someone\rg4u', 'good for you\rg4y', 'good for you\rg8', 'gate\rg9', 'good night\rga', 'go ahead\rgaalma', 'go away and leave me alone\rgafi', 'get away from it\rgafm', 'Get away from me\rgagp', 'go and get pissed\rgaj'
Which goes on for a bit until the end of the file, and then after that it's supposed to print the entire dictionary, for which I get this
{'$$': 'money\r/.'}
Along with a
None
EDIT 2
Here is the full code:
def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
    print dic

if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print x
EDIT 3
Here is the file I am trying to make into a dictionary
https://1drv.ms/u/s!AqnudQBXpxTGiC9vQEopu1dOciIS

Simply add a return in your function. Also, you will see that the dictionary's length is not the same as the number of CSV rows, due to repeated values in the first column of the CSV. Dictionary keys must be unique, so when a reused key is assigned a value, the latter value replaces the former.
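A quick standalone illustration of that overwrite behaviour (the abbreviations here are made up, not taken from the asker's file):

```python
rows = [("g4u", "good for you"), ("g4y", "good for you"), ("g4u", "good 4 u")]

dic = {}
for key, val in rows:
    dic[key] = val  # a repeated key silently overwrites the earlier value

print(dic)  # {'g4u': 'good 4 u', 'g4y': 'good for you'}
```

Note that repeated *values* are fine ("good for you" appears twice); only repeated *keys* collapse, which is why the dictionary ends up shorter than the file.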
def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        #line = line.strip()
        data = line.split(',')
        print(data)
        dic[data[0]] = data[1]
    return dic

if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print(type(x))
    # <class 'dict'>
    print(len(x))
    # 4255
    for k, v in x.items():
        print(k, v)
And try not to print the dictionary all at once, especially with so many values, since that becomes a heavy memory and console overhead. See how you can iterate through keys and values with a for loop instead.
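One way to spot-check a large dictionary without printing it all is itertools.islice, which takes just the first few items from an iterator (a small sketch with made-up data):

```python
from itertools import islice

dic = {"g2g": "got to go", "g2e": "got to eat", "g4u": "good for you",
       "gn": "good night", "ga": "go ahead"}

# Look at only the first three pairs instead of all 4000+
for k, v in islice(dic.items(), 3):
    print(k, v)
```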

Although there is nothing wrong with the other solutions presented, you could simplify and greatly streamline your solution by using Python's excellent library pandas.
Pandas is a library for handling data in Python, preferred by many data scientists.
Pandas has a simple CSV interface to read and parse files, which can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.
In your case:
import pandas

def createDictionary(filename):
    # Note: pandas.DataFrame.from_csv has been deprecated and removed;
    # pandas.read_csv is the supported interface
    my_data = pandas.read_csv(filename, sep=',', index_col=False)
    list_of_dicts = [item for item in my_data.T.to_dict().values()]
    return list_of_dicts

if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print(type(x))
    # <class 'list'>
    print(len(x))
    # 4255
    print(type(x[0]))
    # <class 'dict'>

Related

Using Python and Json and trying to replace sections

I'm using python to try and locate and change different parts of a Json file.
I have a list with 2 columns and what I want to do is look for the string in the first column, find it in the Json file and then replace it with the second column in the list.
Does anyone have any idea how to do this? Been driving me mad.
for row in new_list:
    if json_str == new_list[row][0]:
        json_str.replace(new_list[row][0], new_list[row][1])
I tried using the .replace() above, but it says that list indices must be integers or slices, not list.
The way that I've managed to print off all the data works...
But this is not referencing anything, so if anyone has any ideas, feel free to lend a hand, thanks.
import json

# I import a json file and a text file...
# with open('json file', 'r', encoding="utf8") as jsonData:
#     data = json.load(jsonData)
jsonData = {"employees": [
    {"firstName": "a"},
    {"firstName": "b"},
    {"firstName": "c"}
]}
# text_file = open('text file', 'r', encoding="utf8")
# list = text_file.readlines()
# jsonString = str(data)
# The text file contains lots of data like 'a|A', 'b|B', 'c|C'
# so column 1 is lower and column 2 is upper
list = ['a|A', 'b|B', 'c|C']

def print_all():
    for value in list:
        new_list = value.split("|")
        print("%s" % value.split("|"))
        # This prints column 1 and 2
        if new_list[0] == 'some value':
            print(new_list[1])
            # This prints off the 'replaced' value

print_all()
edit for the comment, this should be able to run... I think
Without more context it's hard to say for sure, but it sounds as though what you want is to use
for row in range(len(new_list)):
instead of
for row in new_list:
If your JSON file is small enough, just read it into memory and then .replace() in a loop.
# UNTESTED
with open('json.txt') as json_file:
    json_str = json_file.read()
for was, will_be in new_list:
    json_str = json_str.replace(was, will_be)
with open('new-json.txt', 'w') as json_file:
    json_file.write(json_str)

How to update dictionary on python?

So I'm working with creating a master dictionary while running a query for individual information.
Currently I have:
dictionary = {}
user_input =input('enter user id: ')
D = query(user_input)
dictionary[user_input] = D
And if I print dictionary after the assignment dictionary[user_input] = D, I will get something like this:
{'user_input':[info]}
I want to prompt repeatedly and save all the individual information in one master dictionary and put it into a textfile.
How do I format my print so that when I try to print it to the textfile it's all written as one big dictionary?
What I've tried:
output_file = open('output.txt', 'w')
print(dictionary, file=output_file)
output_file.close()
This only seems to print {}
EDIT: Tried something diff.
Since D already returns a dictionary, I tried:
dictionary.update(D)
Which is supposed to add the dictionary that is stored in D to the dictionary right?
However, when I try printing dictionary:
print(dictionary)
#it returns: {}
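For what it's worth (this note is an editor's aside, not from the original post): dict.update does merge the other mapping's pairs in place, but it returns None, so if dictionary still prints as {} afterwards, the D that was passed in was most likely itself empty. A minimal check:

```python
dictionary = {}
D = {"user1": ["info"]}

result = dictionary.update(D)  # merges D's pairs into dictionary in place
print(dictionary)  # {'user1': ['info']}
print(result)      # None -- update does not return the merged dict
```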
Use json.dump to write to the file. Then you can use json.load to load that data back to a dictionary object.
import json

with open('dictionary.txt', 'w') as f:
    json.dump(dictionary, f)
https://docs.python.org/3/library/json.html
EDIT: since you cannot use json maybe you can just separate the questions and answers with new lines like this. That will also be easy and clean to parse later:
with open('dictionary.txt', 'w') as f:
    for k, v in dictionary.items():
        f.write('%s=%s\n' % (k, v))
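Parsing that file back later is equally simple. A self-contained sketch (assuming no '=' appears inside the keys; str.partition splits on the first '=' only):

```python
dictionary = {"id1": "[info1]", "id2": "[info2]"}

# Write one key=value pair per line
with open('dictionary.txt', 'w') as f:
    for k, v in dictionary.items():
        f.write('%s=%s\n' % (k, v))

# Parse it back into a dict
parsed = {}
with open('dictionary.txt') as f:
    for line in f:
        key, _, value = line.rstrip('\n').partition('=')
        parsed[key] = value

print(parsed == dictionary)  # True
```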
Not totally familiar with the issue, so I'm not sure if this is what you're looking for. But you don't need to print the assignment itself in order to get the value. You can just keep adding more things to the dictionary as you go, and then print the whole dictionary to file at the end of your script, like so:
dictionary = {}
user_input =input('enter user id: ')
D = query(user_input)
dictionary[user_input] = D
# do this more times....
# then eventually....
print(dictionary)
# or output to a file from here, as described in the other answer

reading a file and parse them into section

okay so I have a file that contains ID number follows by name just like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far however can't go any further.
fin = open("ids", "r")        # Read the file
for line in fin:              # Split lines
    string = line.split()
    if len(string) > 1:       # Separate names and grades
        id = int(string[0])
        name = string[1:]
        print(id, name)       # Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling, you should add that) and read in the lines individually. Now we have 'number firstname secondname'-strings for each line in the list "lines".
Then create an empty dictionary out and loop over the individual strings in lines, splitting each at every space and storing the result in the temporary variable tmp (which is now a list of strings: ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted just loop over the list of numbers returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) and then the corresponding value by calling the dictionary with a string representation of the id.
import sys

try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')

with open(infile, 'r') as file:
    lines = file.readlines()

out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])

for id in sorted(out, key=int):
    print id, out[id]
This works for python 2.7 with ASCII-strings. I'm pretty sure that it should be able to handle other encodings as well (German Umlaute work at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is somehow formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys

with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()

data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]

Python 2 - iterating through csv, designating specific lines as dictionary separators

I generated csv from multiple dictionaries (to be readable and editable too) with help of this question. Output is simple
//Dictionary
key,value
key2,value2
//Dictionary2
key4, value4
key5, value5
I want the double-slash lines to act as separators that start a new dictionary, but calling csv.reader(open("input.csv")) iterates over every line alike, so I have no use for:
import csv

dict = {}
for key, val in csv.reader(open("input.csv")):
    dict[key] = val
Thanks for helping me out..
Edit: I made this piece of... well, "code"... I'll be glad if you can check it out and review it:
#! /usr/bin/python
import csv

# list of dictionaries
l = []
# evaluate through csv
for row in csv.reader(open("test.csv")):
    if row[0].startswith("//"):
        # stripped "//" line is name for dictionary
        n = row[0][2:]
        # append stripped "//" line as name for dictionary
        # debug
        print n
        l.append(n)
        # debug print l[:]
    elif len(row) == 2:
        # debug
        print "len(row) %s" % len(row)
        # debug
        print "row[:] %s" % row[:]
        for key, val in row:
            # print key,val
            l[-1] = dic
            dic = {}
            dic[key] = val

# debug
for d in l:
    print l
    for key, value in d:
        print key, value
unfortunately i got this Error:
DictName
len(row) 2
row[:] ['key', ' value']
Traceback (most recent call last):
File "reader.py", line 31, in <module>
for key, val in row:
ValueError: too many values to unpack
Consider not using CSV
First of all, your overall strategy to the data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far out of the realm).
For example, it would be really easy to solve this problem using json:
import json

# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))

# Convert and write
js = json.dumps(data)
f = file("data.json", 'w')
f.write(js)
f.close()

# Now read back
f = file("data.json", 'r')
data = json.load(f)
print data
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just use the csv module to do all the work for you, but actually have to go through and filter out (and split by) the "//" lines.
import csv
import re

def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)

# Open the file and ...
f = open("data.csv")

# create some containers we can populate as we iterate
data = []
d = {}

for line in f:
    if not header_matcher(line):
        # We have a non-header row, so we make a new entry in our draft dictionary
        key, val = line.strip().split(',')
        d[key] = val
    else:
        # We've hit a new header, so we should throw our draft dictionary in our data list
        if d:
            # ... but only if we actually have had data since the last header
            data.append(d)
            d = {}

# The very last chunk will need to be captured as well
if d:
    data.append(d)

# And we're done...
print data
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If you needed, you could probably find a clever way of chunking up the file into generators that you read with CSV readers, but it won't be particularly clean/easy (I started an approach like this but it looked like pain...). This is all a testament to your approach likely being the wrong way to store this data.
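As an editor's sketch of that chunking idea (not part of the original answer): itertools.groupby can split the lines into alternating runs of header lines and data lines, keyed on whether a line starts with "//":

```python
from itertools import groupby

lines = [
    "//Dictionary",
    "key,value",
    "key2,value2",
    "//Dictionary2",
    "key4,value4",
]

data = []
# groupby yields consecutive runs: header runs and data runs alternate
for is_header, chunk in groupby(lines, key=lambda s: s.startswith("//")):
    if not is_header:
        data.append(dict(line.split(",") for line in chunk))

print(data)  # [{'key': 'value', 'key2': 'value2'}, {'key4': 'value4'}]
```

This still assumes no embedded commas, so it shares the same fragility the answer warns about.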
An alternative if you're set on CSV
Another way to go if you really want CSV but aren't stuck on the exact data format you specify: Add a column in the CSV file corresponding to the dictionary the data should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv

data = dict()
for chunk, key, val in csv.reader(file('data2.csv')):
    try:
        # If we already have a dict for the given chunk id, this adds the key/value pair
        data[chunk][key] = val
    except KeyError:
        # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
        data[chunk] = {key: val}

print data
Much nicer...
The only good argument for doing something closer to what you have in mind over this is if there is LOTS of data, and space is a concern. But that is not very likely to be case in most situations.
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby function which would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
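A hedged sketch of that pandas route, building the 3-column layout inline for illustration (the column names here are made up):

```python
import pandas as pd

# Same shape as the data2.csv example: chunk id, key, value
df = pd.DataFrame(
    [["dict1", "key1", "value1"],
     ["dict1", "key2", "value2"],
     ["dict2", "key3", "value3"]],
    columns=["chunk", "key", "value"],
)

# One dict per chunk id, grouping on the first column
data = {name: dict(zip(group["key"], group["value"]))
        for name, group in df.groupby("chunk")}

print(data)
# {'dict1': {'key1': 'value1', 'key2': 'value2'}, 'dict2': {'key3': 'value3'}}
```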
I decided to use json instead
Reading this is easier for the program and there's no need to filter the text. For storing the data, an external file.json will serve the Python program as a simple database.
#! /usr/bin/python
import json
category1 = {"server name1":"ip address1","server name2":"ip address2"}
category2 = {"server name1":"ip address1","server name1":"ip address1"}
servers = { "category Alias1":category1,"category Alias2":category2}
js = json.dumps(servers)
f = file("servers.json", "w")
f.write(js)
f.close()
# Now read back
f = file("servers.json", "r")
data = json.load(f)
print data
So the output is a dictionary with the categories as keys and other dictionaries as values. Exactly as I wanted.

How to write to a specific line in a text file

I hope I'm not reposting (I did research beforehand) but I need a little help.
So I'll explain the problem as best as I can.
I have is a text file, and inside it I have information in this format:
a 10
b 11
c 12
I read this file and convert it to a dictionary with the first column as the key, and the second as the value.
Now I'm trying to do the opposite, I need to be able to write the file back with modified values in the same format, the key separated by a space, then the corresponding value.
Why would I want to do this?
Well, all the values are supposed to be changeable by the user of the program. So when they do decide to change the values, I need them to be written back to the text file.
This is where the problem is, I just don't know how to do it.
How might I go about doing this?
I've got my current code for reading the values here:
import csv

T_Dictionary = {}
with open(r"C:\NetSendClient\files\nsed.txt", newline="") as f:
    reader = csv.reader(f, delimiter=" ")
    T_Dictionary = dict(reader)
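As an aside (an editor's sketch, not from the original answers): the reading code above can be mirrored with csv.writer to write the dictionary back in the same space-delimited format, using a local filename here rather than the absolute path above:

```python
import csv

T_Dictionary = {"a": "10", "b": "11", "c": "12"}

# Mirror of the reading code: one "key value" pair per row, space-delimited
with open("nsed.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter=" ")
    writer.writerows(T_Dictionary.items())

# Reading it back reproduces the dictionary
with open("nsed.txt", newline="") as f:
    restored = dict(csv.reader(f, delimiter=" "))

print(restored == T_Dictionary)  # True
```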
OK, supposing the dictionary is called A and the file is text.txt, I would do this:
W = ""
for i in A:  # for each key in the dictionary
    W += "{0} {1}\n".format(i, A[i])  # append the key, a space, its value, and a newline
with open("text.txt", "w") as O:
    O.write(W)
if I understood what you were asking.
However, using this method would leave an empty line at the end of the file, but that can be removed by replacing
O.write(W)
with
O.write(W[0:-1])
I hope it helped.
Something like this:
def txtf_exp2(xlist):
    print("\n", xlist)
    t = open("mytxt.txt", "w+")
    # combines a list of lists into a list
    ylist = []
    for i in range(len(xlist)):
        newstr = xlist[i][0] + "\n"
        ylist.append(newstr)
        newstr = str(xlist[i][1]) + "\n"
        ylist.append(newstr)
    t.writelines(ylist)
    t.seek(0)
    print(t.read())
    t.close()

def txtf_exp3(xlist):
    # does the same as the function above but is simpler
    print("\n", xlist)
    t = open("mytext.txt", "w+")
    for i in range(len(xlist)):
        t.write(xlist[i][0] + "\n" + str(xlist[i][1]) + "\n")
    t.seek(0)
    print(t.read())
    t.close()
You'll have to make some changes, but it's very similar to what you're trying to do.
