Python 2 - iterating through a CSV file, using specific lines as dictionary separators

I generated a CSV file from multiple dictionaries (so that it is readable and editable too) with the help of this question. The output is simple:
//Dictionary
key,value
key2,value2
//Dictionary2
key4, value4
key5, value5
I want the double slash (//) line to act as a separator that starts a new dictionary, but every call to csv.reader(open("input.csv")) just iterates through all the lines, so this is of no use to me:
import csv
dict = {}
for key, val in csv.reader(open("input.csv")):
    dict[key] = val
Thanks for helping me out.
Edit: I wrote this piece of... well, "code". I'd be glad if you could check it out and review it:
#! /usr/bin/python
import csv
# list of dictionaries
l = []
# evaluate through csv
for row in csv.reader(open("test.csv")):
    if row[0].startswith("//"):
        # stripped "//" line is name for dictionary
        n = row[0][2:]
        # append stripped "//" line as name for dictionary
        #debug
        print n
        l.append(n)
        #debug print l[:]
    elif len(row) == 2:
        # debug
        print "len(row) %s" % len(row)
        # debug
        print "row[:] %s" % row[:]
        for key, val in row:
            # print key,val
            l[-1] = dic
            dic = {}
            dic[key] = val
# debug
for d in l:
    print l
    for key, value in d:
        print key, value
Unfortunately, I got this error:
DictName
len(row) 2
row[:] ['key', ' value']
Traceback (most recent call last):
  File "reader.py", line 31, in <module>
    for key, val in row:
ValueError: too many values to unpack

Consider not using CSV
First of all, your overall strategy for this data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far outside what CSV can handle).
For example, it would be really easy to solve this problem using json:
import json
# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))
# Convert and write
with open("data.json", "w") as f:
    json.dump(data, f)
# Now read back
with open("data.json", "r") as f:
    data = json.load(f)
print(data)
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just let the csv module do all the work for you; you actually have to go through the file and filter out (and split on) the "//" lines.
import csv
import re
def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)
# Open the file and ...
f = open("data.csv")
# create some containers we can populate as we iterate
data = []
d = {}
for line in f:
    if not header_matcher(line):
        # We have a non-header row, so we make a new entry in our draft dictionary
        key, val = line.strip().split(',')
        d[key] = val
    else:
        # We've hit a new header, so we should throw our draft dictionary in our data list
        if d:
            # ... but only if we actually have had data since the last header
            data.append(d)
            d = {}
# The very last chunk will need to be captured as well
if d:
    data.append(d)
# And we're done...
print data
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If needed, you could probably find a clever way of chunking the file up into generators that you read with CSV readers, but it won't be particularly clean or easy (I started an approach like this, but it looked painful). This is all a testament to your approach likely being the wrong way to store this data.
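The chunking idea can be sketched with itertools.groupby, which splits the lines into alternating runs of header and non-header lines, so each run of data lines can be handed to its own csv.reader. This is a minimal sketch for Python 3, assuming the "//Name" header format from the question; read_sections and is_header are hypothetical helper names, and unlike the code above it returns a dict keyed by section name rather than a list. Blank lines are skipped, and spaces after the comma (as in "key4, value4") are stripped from values.

```python
import csv
from itertools import groupby


def is_header(line):
    """True for separator lines such as "//Dictionary2"."""
    return line.startswith("//")


def read_sections(lines):
    """Split an iterable of text lines into {section_name: {key: value}}.

    Assumes every section starts with a "//Name" line and every other
    non-blank line is "key,value". If several headers appear in a row,
    only the last one names the data that follows.
    """
    sections = {}
    name = None  # data lines before the first header end up under None
    for header, group in groupby(lines, key=is_header):
        if header:
            for line in group:
                name = line.strip()[2:]  # drop the leading "//"
        else:
            # Hand only this run of data lines (minus blank ones) to a csv reader
            rows = csv.reader(line for line in group if line.strip())
            sections[name] = {key: val.strip() for key, val in rows}
    return sections
```

With a real file this would be used as `with open("input.csv", newline="") as f: print(read_sections(f))`. Each group must be consumed before groupby moves on, which the dict comprehension does.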
An alternative if you're set on CSV
Another way to go if you really want CSV but aren't stuck on the exact data format you specify: Add a column in the CSV file corresponding to the dictionary the data should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv
data = dict()
with open('data2.csv') as f:
    for chunk, key, val in csv.reader(f):
        try:
            # If we already have a dict for the given chunk id, this should add the key/value pair
            data[chunk][key] = val
        except KeyError:
            # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
            data[chunk] = {key: val}
print(data)
Much nicer...
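The try/except pattern can also be written with collections.defaultdict, which creates the inner dictionary automatically the first time a chunk id appears. This is a swapped-in idiom rather than the answer's own code, sketched for Python 3; load_chunks is a hypothetical name.

```python
import csv
from collections import defaultdict


def load_chunks(f):
    """Group rows of the form chunk,key,value into one dict per chunk."""
    # defaultdict(dict) makes an empty inner dict the first time a chunk is seen,
    # which replaces the try/except KeyError step
    data = defaultdict(dict)
    for chunk, key, val in csv.reader(f):
        data[chunk][key] = val
    return dict(data)  # convert back to a plain dict
```

For example, `with open("data2.csv") as f: print(load_chunks(f))`. `data.setdefault(chunk, {})[key] = val` is another one-line alternative that keeps a plain dict throughout.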
The only good argument for sticking closer to what you have in mind is if there is a LOT of data and space is a concern. But that is unlikely to be the case in most situations.
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby method (DataFrame.groupby) that would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
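A minimal sketch of the pandas route, assuming the 3-column data2.csv layout above (no header row) and a reasonably recent pandas; group_chunks is a hypothetical name. DataFrame.groupby yields (group_key, sub_frame) pairs, which can be turned into the same nested dict.

```python
import pandas as pd


def group_chunks(csv_source):
    """Read a headerless chunk,key,value CSV and return {chunk: {key: value}}."""
    # header=None because the file has no header row; names= labels the columns
    df = pd.read_csv(csv_source, header=None, names=["chunk", "key", "value"])
    # groupby yields one (chunk_id, sub-DataFrame) pair per distinct first-column value
    return {
        chunk: dict(zip(group["key"], group["value"]))
        for chunk, group in df.groupby("chunk")
    }
```

pandas infers column types, so numeric-looking values would come back as numbers rather than strings; passing `dtype=str` to `read_csv` keeps everything as text.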

I decided to use json instead
It is easier for the program to read, and there's no need to filter text. The data will be kept in an external .json file that the Python program reads.
#! /usr/bin/python
import json
category1 = {"server name1":"ip address1","server name2":"ip address2"}
category2 = {"server name1":"ip address1","server name2":"ip address2"}
servers = {"category Alias1":category1,"category Alias2":category2}
js = json.dumps(servers)
with open("servers.json", "w") as f:
    f.write(js)
# Now read back
with open("servers.json", "r") as f:
    data = json.load(f)
print(data)
So the output is a dictionary whose keys are the categories and whose values are further dictionaries, exactly as I wanted.

Related

How to read csv data from a file into memory without use of libraries Python

So I have a problem to solve for a practice task. The task is to develop a function which reads CSV data from a file into memory, but we cannot use any libraries to do so. So I can't use csv.reader, pandas, NumPy, etc.
This is what I have come up with, but it does not work; it says 'csv_list is not defined'. I am a bit stuck on where to go from here, and have mainly only coded using libraries, so coding manually and developing functions myself is a struggle! I have looked on here for solutions, but none of them seem to work, or they use libraries which I cannot use.
If anyone has a way to do this I would be so grateful!
#define read csv
def read_csv (file_name):
    with open(file_name) as f:
        csv_list = [[val.strip() for val in r.split (",")] for r in f.readlines()]
#convert file to dictionary structure
(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row
    csv_dict[key] = {key: value for key, value in zip(header, values)}
#insert name of file to be read by user
read_csv (task1.csv)
You see the error because csv_list is local to the function read_csv(file_name). Rather than assigning that variable, you probably want to return the result of the comprehension. Then you get the value you need when you call the function. Note that you need to call it much earlier than you currently do, and pass the file name as a string ("task1.csv").
This will give you a dictionary, but I'm not sure it is structured the way you want. I would expect a list of dictionaries after reading a CSV rather than what you are producing, but that is your call.
#define read csv
def read_csv(file_name):
    with open(file_name) as f:
        return [[val.strip() for val in r.split(",")] for r in f.readlines()]
(_, *header), *data = read_csv("small.csv")
#convert file to dictionary structure
csv_dict = {}
for row in data:
    key, *values = row
    # col (not key) avoids reusing the row key's name inside the comprehension
    csv_dict[key] = {col: value for col, value in zip(header, values)}
print(csv_dict)
If it were me, I might make the following tweaks to get a list-based result. I'm also going to drop the starred unpacking (*header) while I'm at it:
def read_csv(file_name):
    with open(file_name) as f:
        # generator expression: rows are produced one at a time while the file is open
        yield from ([c.strip() for c in r.split(",")] for r in f)
rows = read_csv("small.csv")
header_row = next(rows)
csv_dict = [dict(zip(header_row, row)) for row in rows]
print(csv_dict)

Python Converting .csv file with comma delimiter to dictionary

So I have been trying to fix this problem for quite a while now and have done some research to figure out why my code won't work, but I simply can't get the dictionary to print with all the key:value pairs I need.
So here's the story. I am reading a .csv file where the first column contains text abbreviations and the second column contains their full English meanings. I have tried multiple ways to open this file, read it, and then store it in the dictionary we create. My issue is that the file gets read, and I can print the separated pieces (I believe it goes through the whole file, but I'm not sure, since the printed output gets cut off around line 1007 while the file goes up to line 4600). The problem is that when I then try to put all of that into key:value pairs in a dictionary, only the very first line of the file gets stored.
Here is the code:
def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        #line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
        print dic
What I assumed was the issue was:
print dic
My reasoning: since it is printing within the loop, it should simply print on every pass through the loop. I am confused about what I am doing wrong. I also tried json, but I don't know much about how to use it, and I read up on the csv module, but I don't think our professor wants us to use it, so I was hoping someone could spot my error. Thanks in advance!!!
EDIT
This is the output of my program
going to be late\rg2cu', 'glad to see you\rg2e', 'got to eat\rg2g', 'got to go\rg2g2tb', 'got to go to the bathroom\rg2g2w', 'got to go to work\rg2g4aw', 'got to go for a while\rg2gb', 'got to go bye\rg2gb2wn', 'got to go back to work now\rg2ge', 'got to go eat\rg2gn', 'got to go now\rg2gp', 'got to go pee\rg2gpc', 'got 2 go parents coming\rg2gpp', 'got to go pee pee\rg2gs', 'got to go sorry\rg2k', 'good to know\rg2p', 'got to pee\rg2t2s', 'got to talk to someone\rg4u', 'good for you\rg4y', 'good for you\rg8', 'gate\rg9', 'good night\rga', 'go ahead\rgaalma', 'go away and leave me alone\rgafi', 'get away from it\rgafm', 'Get away from me\rgagp', 'go and get pissed\rgaj'
This goes on until the end of the file, and after that it's supposed to print the entire dictionary, but instead I get this:
{'$$': 'money\r/.'}
along with:
None
EDIT 2
Here is the full code:
def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
        print dic
if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print x
EDIT 3
Here is the file I am trying to make into a dictionary
https://1drv.ms/u/s!AqnudQBXpxTGiC9vQEopu1dOciIS
Simply add a return statement to your function. Also, you will see that the dictionary length is not the same as the number of CSV rows, because of repeated values in the first column of the CSV. Dictionary keys must be unique, so when a reused key is assigned a value, the later value replaces the earlier one.
def createDictionary(filename):
    dic = {}
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            data = line.split(',')
            print(data)
            dic[data[0]] = data[1]
    return dic
if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print(type(x))
    # <class 'dict'>
    print(len(x))
    # 4255
    for k, v in x.items():
        print(k, v)
And try not to print the whole dictionary at once, especially with so many values: it builds one huge string and is hard to read. Instead, iterate through the keys and values with a for loop, as in the last lines above.
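One more thing worth checking, offered as a hedged diagnosis: the printed output in the question shows `\r` between records, and the final dict has a single entry, `'$$': 'money\r/.'`. That pattern suggests the file uses bare `\r` (old Mac-style) line endings, so Python 2's plain `'r'` mode reads the whole file as one "line". Opening the file with universal newlines (mode `'rU'` in Python 2, the default text mode in Python 3) turns every `\r` into a line break. A sketch in Python 3 that keeps the question's function name and avoids the csv module:

```python
def createDictionary(filename):
    r"""Build {abbreviation: meaning} from a two-column, comma-separated file.

    open() in text mode (Python 3) uses universal newlines, so "\r", "\n" and
    "\r\n" all end a line; in Python 2 the equivalent is open(filename, 'rU').
    """
    dic = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # partition splits on the first comma only, so a meaning may contain commas
            abbrev, _, meaning = line.partition(",")
            dic[abbrev] = meaning  # a repeated abbreviation overwrites the earlier one
    return dic
```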
Although there is nothing wrong with the other solutions presented, you could simplify them and make them scale much better by using Python's excellent pandas library.
Pandas is a library for handling data in Python, preferred by many Data Scientists.
Pandas has a simplified CSV interface to read and parse files, which can be used to build a list of dictionaries, one per line of the file. The keys will be the column names, and the values will be the ones in each cell.
In your case:
import pandas
def createDictionary(filename):
    # DataFrame.from_csv was removed in pandas 1.0; read_csv takes the header from the first line
    my_data = pandas.read_csv(filename, sep=',')
    # one dict per row, keyed by column name
    list_of_dicts = my_data.to_dict(orient='records')
    return list_of_dicts
if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print(type(x))
    # <class 'list'>
    print(len(x))
    # 4255
    print(type(x[0]))
    # <class 'dict'>

How to update dictionary on python?

So I'm building a master dictionary while running a query for each user's individual information.
Currently I have:
dictionary = {}
user_input = input('enter user id: ')
D = query(user_input)
dictionary[user_input] = D
And if I print dictionary after the assignment dictionary[user_input] = D, I get something like this:
{'user_input':[info]}
I want to prompt repeatedly and save all the individual information in one master dictionary and put it into a textfile.
How do I format my print so that when I try to print it to the textfile it's all written as one big dictionary?
What I've tried:
output_file = ('output.txt', 'w')
print(dictionary, file = output_file)
output_file.close()
This only seems to print {}
EDIT: I tried something different.
Since D already returns a dictionary, I tried:
dictionary.update(D)
That is supposed to add the contents of the dictionary stored in D to dictionary, right?
However, when I try printing dictionary:
print(dictionary)
#it returns: {}
Use json.dump to write to the file. Then you can use json.load to load that data back into a dictionary object.
import json
with open('dictionary.txt', 'w') as f:
    json.dump(dictionary, f)
https://docs.python.org/3/library/json.html
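A sketch of the full round trip in Python 3; save_and_reload is a hypothetical helper. One caveat: JSON object keys are always strings, so a non-string key such as the integer 42 comes back as "42" after loading.

```python
import json


def save_and_reload(dictionary, path):
    """Write a dict to path as JSON, then read it back into a new dict."""
    with open(path, "w") as f:
        json.dump(dictionary, f)
    with open(path) as f:
        return json.load(f)
```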
EDIT: Since you cannot use json, maybe you can just write each key and value on its own line, like this. That will also be easy and clean to parse later:
with open('dictionary.txt', 'w') as f:
    for k,v in dictionary.items():
        f.write('%s=%s\n' % (k, v,))
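Reading that format back can be sketched as follows (Python 3; load_dictionary is a hypothetical name). It assumes keys never contain "=" and values never contain newlines; every value comes back as a string, since the file stores whatever str(v) produced.

```python
def load_dictionary(path):
    """Read back a file written as one key=value pair per line."""
    result = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            key, _, value = line.partition("=")  # split on the first "=" only
            result[key] = value
    return result
```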
Not totally familiar with the issue, so I'm not sure if this is what you're looking for. But you don't need to print the assignment itself in order to get the value. You can just keep adding more things to the dictionary as you go, and then print the whole dictionary to file at the end of your script, like so:
dictionary = {}
user_input = input('enter user id: ')
D = query(user_input)
dictionary[user_input] = D
# do this more times....
# then eventually....
print(dictionary)
# or output to a file from here, as described in the other answer
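To prompt repeatedly, the same idea can go in a loop. A minimal Python 3 sketch: query is the asker's own lookup function (not shown in the question), while build_master, save_master, and the ask parameter (which defaults to input and can be replaced in tests) are hypothetical names. Note also that the line output_file = ('output.txt', 'w') in the question builds a tuple; it is missing the open(...) call, so there is no file to print to.

```python
def build_master(query, ask=input):
    """Prompt for user ids until a blank answer, storing query(id) under each id."""
    master = {}
    while True:
        user_id = ask("enter user id (blank to stop): ").strip()
        if not user_id:
            break
        master[user_id] = query(user_id)  # adds a new entry; earlier ones are kept
    return master


def save_master(master, path):
    """Write the whole master dict to a text file in one go."""
    # print() needs a real file object from open(), not a ('name', 'w') tuple
    with open(path, "w") as output_file:
        print(master, file=output_file)
```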

Reading a file and parsing it into sections

Okay, so I have a file that contains an ID number followed by a name, like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far, but I can't get any further:
fin = open("ids","r") #Read the file
for line in fin: #Split lines
    string = str.split()
    if len(string) > 1: #Separate names and grades
        id = map(int, string[0]
        name = string[1:]
        print(id, name) #Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Next we open the file (with no error handling; you should add that) and read in the lines. Now the list lines holds one 'number firstname secondname' string per line of the file.
Then we create an empty dictionary, out, and loop over the strings in lines, splitting each one on whitespace and storing the result in the temporary variable tmp (now a list of strings such as ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted, just loop over the list of keys returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) followed by the corresponding value, looking it up in the dictionary with the string form of the id.
import sys
try:
    infile = sys.argv[1]
except IndexError:
    # raw_input, not input: in Python 2, input() evaluates what the user types
    infile = raw_input('Enter file name: ')
with open(infile, 'r') as file:
    lines = file.readlines()
out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])
for id in sorted(out, key=int):
    print id, out[str(id)]
This works in Python 2.7 with ASCII strings. It should be able to handle other encodings as well (German umlauts work, at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys
with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()
data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]
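For Python 3, here is a sketch that also produces the two-column output the question asks for; read_ids, format_table, and main are hypothetical names. Storing the ids as integers makes plain sorted() numeric, so key=int is no longer needed, and blank lines are skipped.

```python
def read_ids(lines):
    """Map id -> name from lines like '10 alex de souza'; blank lines are skipped."""
    data = {}
    for line in lines:
        parts = line.split(None, 1)  # split off the first whitespace-separated field
        if len(parts) == 2:
            data[int(parts[0])] = parts[1].strip()
    return data


def format_table(data):
    """Return one 'id  name' row per entry, sorted numerically by id."""
    width = max((len(str(k)) for k in data), default=0)
    # %*d right-aligns each id to the width of the longest one
    return ["%*d  %s" % (width, k, data[k]) for k in sorted(data)]


def main(argv):
    """Print the table for the file named by the first command-line argument."""
    with open(argv[1]) as handle:
        for row in format_table(read_ids(handle)):
            print(row)
```

From a script, call `main(sys.argv)` (after `import sys`) under an `if __name__ == "__main__":` guard, so the file name comes from the first command-line argument.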

Extracting variable names and data from csv file using Python

I have a csv file that has each line formatted with the line name followed by 11 pieces of data. Here is an example of a line.
CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64
There are 12 lines in total, each with a unique name and data.
What I would like to do is extract the first cell from each line and use that to name the corresponding data, either as a variable equal to a list containing that line's data, or maybe as a dictionary, with the first cell being the key.
I am new to working with input files, so the farthest I have gotten is reading the file in using the stock solution from the documentation:
import csv
path = r'data.csv'
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=' ')
    for row in reader:
        print(row[0])
I am failing to figure out how to assign each row to a new variable, especially when I am not sure what the variable names will be (this is because the csv file will be created by a user other than myself).
The destination for this data is a tool that I have written. It accepts lists as input such as...
CW1 = [0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64]
so this would be the ideal end solution. If it is easier, and considered better to have the output of the file read be in another format, I can certainly re-write my tool to work with that data type.
As Scironic said in their answer, it is best to use a dict for this sort of thing.
However, be aware that in Python 2 (and in Python 3 before 3.7), dict objects do not have any "order" - the order of the rows will be lost if you use one. If this is a problem, you can use an OrderedDict instead (which is just what it sounds like: a dict that "remembers" the order of its contents):
import csv
from collections import OrderedDict as od
path = r'data.csv'
data = od() # ordered dict object remembers the order in the csv file
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter = ',') # the data is comma-separated
    for row in reader:
        data[row[0]] = row[1:] # Slice the row up into 0 (first item) and 1: (remaining)
Now if you loop through your data object, the contents will be in the same order as in the CSV file:
for d in data.values():
    myspecialtool(d) # the tool takes one list per row
You need to use a dict for these kinds of things (dynamic variables):
import csv
path = r'data.csv'
data = {}
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=',') # the data is comma-separated
    for row in reader:
        data[row[0]] = row[1:]
Dicts are especially well suited to "dynamic variables" and are the best way to store things like this. To access a row, you just need to use:
data['CW1']
This solution also means that if you add any extra rows in with new names, you won't have to change anything.
If you are desperate to have the variable names in the global namespace and not within a dict, use exec (N.B. IF ANY OF THIS USES INPUT FROM OUTSIDE SOURCES, USING EXEC/EVAL CAN BE HIGHLY DANGEROUS (rm * level) SO MAKE SURE ALL INPUT IS CONTROLLED AND UNDERSTOOD BY YOURSELF).
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=',')
    for row in reader:
        exec("{} = {}".format(row[0], row[1:]))
In Python you can use slicing: row[1:] contains every element of the row except the first, so you could do the following (use float instead of int for values such as -0.38):
>>> d={}
>>> with open("f") as f:
...     c = csv.reader(f, delimiter=',')
...     for r in c:
...         d[r[0]]=map(int,r[1:])
...
>>> d
{'var1': [1, 3, 1], 'var2': [3, 0, -1]}
Regarding variable variables, check How do I do variable variables in Python? or How to get a variable name as a string in Python?. I would stick to a dictionary, though.
An alternative to using the proper csv library could be as follows:
path = r'data.csv'
csvRows = open(path, "r").readlines()
dataRows = [[float(col) for col in row.rstrip("\n").split(",")[1:]] for row in csvRows]
for dataRow in dataRows: # Where dataRow is a list of numbers
    print dataRow
You could then call your function where the print statement is.
This reads the whole file in and produces a list of lines with trailing newlines. It then removes each newline, splits each row into a list of strings, skips the initial column, and calls float() on each remaining entry, resulting in a list of lists. Whether that is enough depends on how important the first column is to you, since the row names are discarded.
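Combining the suggestions above for the data in the question: a minimal Python 3 sketch that keys each row by its first cell and converts the remaining cells with float(), since values like -0.38 would make int() fail. load_named_rows is a hypothetical name.

```python
import csv


def load_named_rows(f):
    """Map each row's first cell to a list of floats built from the other cells.

    `f` is any iterable of comma-separated text lines, e.g. an open file.
    """
    data = {}
    for row in csv.reader(f):  # csv.reader splits on "," by default
        if not row:
            continue  # csv.reader yields [] for a blank line
        data[row[0]] = [float(cell) for cell in row[1:]]
    return data
```

Usage would be along the lines of `with open("data.csv", newline="") as f: data = load_named_rows(f)`, then `myspecialtool(data["CW1"])`, where myspecialtool stands for the asker's own tool. Since Python 3.7 a plain dict keeps insertion order, so the rows stay in file order without OrderedDict.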
