I am new to Python and I'm trying to create a CSV parsing script.
I pass rows from the CSV to a list, but what currently troubles me is that I need to use the first (header) line as the dictionary keys for each row.
def parse_csv(datafile):
    data = []
    with open(datafile, "r") as f:
        next(f) #skip headerline
        for line in f:
            splitLine = line.strip(',')
            rowL = splitLine.rstrip('\n') #remove the newline char
            data.append(rowL)
    pprint(data)
    return data
If the first (header) line holds the dictionary keys (e.g. Title, Name, etc.), how do I pair them with the values of each row?
e.g. {'Dict1': 'data1', 'Dict2': 'data2' }
This may be considered a duplicate, but I tried various approaches from similar posts and none worked properly in my case.
I strongly recommend using the provided csv library. It will save you a lot of time and effort. Here is what you want to do:
import csv
data = []
with open(datafile, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)
        print(row['Title'], row['Name'])
In this example each row is actually a python dictionary.
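To see what DictReader yields without needing a file on disk, here is a minimal sketch that reads from an in-memory string. The column names Title and Name come from the snippet above; the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a CSV file.
sample = "Title,Name\nDr,Alice\nMr,Bob\n"

# io.StringIO behaves like an open text file, so DictReader can consume it.
reader = csv.DictReader(io.StringIO(sample))

# Each row maps the header names to that row's values.
data = [dict(row) for row in reader]
print(data)
```

With a real file, the io.StringIO(...) part would be replaced by the file object returned by open().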
@GeorgiDimitrov is certainly right that the proper approach is to use the csv module from the standard library, but, if you're doing this only for self-instruction purposes, then...:
from pprint import pprint

def parse_csv(datafile):
    data = []
    with open(datafile, "r") as f:
        # the first line holds the column names; drop its trailing newline
        headers = next(f).rstrip('\n').split(',')
        for line in f:
            splitLine = line.rstrip('\n').split(',')
            # pair each header with the value in the same position
            dd = dict(zip(headers, splitLine))
            data.append(dd)
    pprint(data)
    return data
This will not properly deal with quoted/escaped commas, etc.; these subtleties are definitely best left to the csv module :-).
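A small sketch of the quoted-comma problem mentioned above, comparing a plain split(',') with csv.reader. The sample line is made up for illustration.

```python
import csv
import io

# Hypothetical line whose second field contains a comma inside quotes.
line = 'Title,"Smith, John",42\n'

# Naive splitting cuts the quoted field in two and keeps the quote characters.
naive = line.rstrip("\n").split(",")

# csv.reader honours the quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive version ends up with four fields instead of three, which would misalign the values against the headers when they are zipped together.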
Related
I have playlists that are lists of dictionaries, which look approximately like this:
[{'url':url1,'title':title1}, {'url':url2,'title':title2}]
Of course, the playlist has a name, name1.
What I want to do is save it to a CSV file in one function. I would like the whole dictionary to be a single item in the CSV file, instead of being divided up over the columns.
Then I want to have another function to import the playlists.
I have literally spent hours on this, and I am getting desperate because I can't get it to work.
Here is what I tried:
To save the playlist:
def SavePlaylist():
    name = SearchBox.get() + "§"
    if len(playlist) >1 and name != "":
        newrow = [name, playlist]
        with open(os.path.join(sys.path[0], "AntiTubePlaylists.txt"), "a") as f_object:
            writer_object = writer(f_object)
            writer_object.writerow(newrow)
            f_object.close()
To import the playlist:
with open(os.path.join(sys.path[0], "AntiTubePlaylists.txt"), "r") as csv_file:
    csv_reader = csv.reader(csv_file, quotechar='"', delimiter='§')
    for row in csv_reader:
        playlistt = row[1][2:][:-1]
        print(playlistt[0])
        playlistmenu.add_command(label=row[0], command=(lambda playlistt = playlistt: ImportPlaylist(playlist)))
What it does is create a list containing every single character of my list (i.e. the print command prints just "[").
Though using eval may work, it is a big security risk! You should only use it for testing purposes, and only on files of which you are certain that nobody untrusted has tampered with. The reason for this is that eval actually runs everything in the file as code, so if someone injected some evil code into the csv file you're reading, you would (unknowingly) execute it when you load the file!
Indeed, saving an object using JSON is much easier. Specifically json.dump() and json.load() (https://docs.python.org/3/library/json.html#basic-usage) should be what you want.
An alternative is using the pickle library, which is made for storing Python objects, but as the page says, it is less secure, so you're probably better off sticking to JSON. Added benefit is that JSON is human-readable, whereas pickle will output some garbage:
>>> import json
>>> import pickle
>>> name1 = [{'url':'url1','title':'title1'}, {'url':'url2','title':'title2'}]
>>> json.dumps(name1)
'[{"url": "url1", "title": "title1"}, {"url": "url2", "title": "title2"}]'
>>> pickle.dumps(name1)
b'\x80\x04\x95?\x00\x00\x00\x00\x00\x00\x00]\x94(}\x94(\x8c\x03url\x94\x8c\x04url1\x94\x8c\x05title\x94\x8c\x06title1\x94u}\x94(h\x02\x8c\x04url2\x94h\x04\x8c\x06title2\x94ue.'
Note both json.dump and pickle.dump have a version that dumps to string rather than to file, namely json.dumps and pickle.dumps, which may be convenient.
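As a rough sketch of how json.dump() and json.load() could be used to save and restore playlists, assuming all playlists are kept in one dict keyed by playlist name. The function names save_playlists and load_playlists, the file name, and the temporary directory are assumptions made for this example, not part of the original code.

```python
import json
import os
import tempfile

# Hypothetical playlists, keyed by playlist name.
playlists = {
    "name1": [
        {"url": "url1", "title": "title1"},
        {"url": "url2", "title": "title2"},
    ]
}

# A temporary path is used here only so the sketch can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "playlists.json")

def save_playlists(playlists, path):
    """Write all playlists to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(playlists, f, indent=2)

def load_playlists(path):
    """Read the playlists back as a dict of lists of dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

save_playlists(playlists, path)
restored = load_playlists(path)
print(restored["name1"][0]["url"])
```

Because the whole structure is written and read in one call, no manual slicing or quote replacement is needed when loading.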
Eval solves the problem:
with open(os.path.join(sys.path[0], "AntiTubePlaylists.txt"), "r") as csv_file:
    csv_reader = csv.reader(csv_file, quotechar='"', delimiter='§')
    for row in csv_reader:
        playlistt = row[1][2:][:-1]
        playlistt = eval(playlistt)
        print(playlistt[0])
        playlistmenu.add_command(label=row[0], command=(lambda playlistt = playlistt: ImportPlaylist(playlist)))
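If the stored text really is Python literal syntax (as produced by str() on a list of dicts), one alternative to the eval call above, not suggested in the original posts, is ast.literal_eval from the standard library. It accepts only literals such as strings, numbers, lists and dicts. A minimal sketch, assuming the stored text looks like the made-up string below:

```python
import ast

# Hypothetical text as it might have been written with str(playlist).
stored = "[{'url': 'url1', 'title': 'title1'}, {'url': 'url2', 'title': 'title2'}]"

# literal_eval parses Python literals only; it does not call functions.
playlist = ast.literal_eval(stored)
print(playlist[0]["title"])

# Anything that is not a plain literal is rejected instead of being run.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```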
Do you want a CSV with two columns (url, title)?
I would personally use pandas: create a DataFrame from your list of dicts and then save it to CSV.
For example:
import pandas as pd
name1 = [{'url':'url1','title':'title1'},
{'url':'url2','title':'title2'},
{'url':'url3','title':'title3'},
{'url':'url4','title':'title4'}]
df = pd.DataFrame(name1)
df.to_csv('PATH/name1.csv')
To use JSON, the file needs to be in the proper format.
So I came up with the following solution, mind the str(s).replace; JSON needs double quotes for the items.
with open(os.path.join(sys.path[0], "AntiTubePlaylists.txt"), "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='§')
    for row in csv_reader:
        s = str(row[1][2:][:-1])
        s = str(s).replace("'",'"')
        playlist_import = json.loads(s)
        print(playlist_import[0])
        playlistmenu.add_command(label=row[0], command=(lambda playlistt = playlist_import: ImportPlaylist(playlistt)))
How can I add records from a CSV file to a dictionary in a function whose input argument is the path to that CSV file?
Please help with this incomplete function:
def csv_file (p):
    dictionary={}
    file=csv.reader(p)
    for rows in file:
        dictionary......(rows)
    return dictionary
You need to open the file first:
def csv_file(p):
    dictionary = {}
    with open(p, "rb") as infile: # Assuming Python 2
        file = csv.reader(infile) # Possibly DictReader might be more suitable,
        for row in file: # but that...
            dictionary......(row) # depends on what you want to do.
    return dictionary
It seems you haven't opened the file yet; you need to use open for that.
Try the following code:
import csv
from pprint import pprint
INFO_LIST = []
with open('sample.csv') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for i, row in enumerate(reader):
        if i == 0:
            TITLE_LIST = [var for var in row]
            continue
        INFO_LIST.append({title: info for title, info in zip(TITLE_LIST, row)})
pprint(INFO_LIST)
I use the following csv file as an example:
"REVIEW_DATE","AUTHOR","ISBN","DISCOUNTED_PRICE"
"1985/01/21","Douglas Adams",0345391802,5.95
"1990/01/12","Douglas Hofstadter",0465026567,9.95
"1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99
"1999/12/03","Richard Friedman",0060630353,5.95
"2001/09/19","Karen Armstrong",0345384563,9.95
"2002/06/23","David Jones",0198504691,9.95
"2002/06/23","Julian Jaynes",0618057072,12.50
"2003/09/30","Scott Adams",0740721909,4.95
"2004/10/04","Benjamin Radcliff",0804818088,4.95
"2004/10/04","Randel Helms",0879755725,4.50
You can put all that logic into a function like so:
def csv_file(file_path):
    # Check that file_path is a string; if not, return None
    if not isinstance(file_path, str):
        return None
    # Create the list that will hold one dictionary per row
    _info_list = []
    with open(file_path) as f:
        # Set the delimiter and quotechar
        reader = csv.reader(f, delimiter=',', quotechar='"')
        # We use enumerate here because we know the first row contains the column headings
        for i, row in enumerate(reader):
            # The first row contains the headings
            if i == 0:
                # Create a list from the first row
                title_list = [var for var in row]
                continue
            # Zip title_list and row together, so that a dictionary comprehension is possible
            _info_list.append({title: info for title, info in zip(title_list, row)})
    return _info_list
APPENDIX
open()
zip
Dictionary Comprehension
Delimiter: the character that separates values, in this case ,.
Quotechar: the character that encloses values in a CSV, in this case ".
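For comparison, here is a shorter sketch of the same idea using csv.DictReader, which takes the headings from the first row by itself. The function name csv_to_dicts and the temporary file are assumptions made for this example; the two lines written to it are taken from the sample file above.

```python
import csv
import os
import tempfile

def csv_to_dicts(file_path):
    """Return one dict per data row, keyed by the headings in the first row."""
    # newline="" is what the csv documentation recommends for files passed to the module.
    with open(file_path, newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]

# Write two lines of the sample file to a temporary location for the demo.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write('"REVIEW_DATE","AUTHOR","ISBN","DISCOUNTED_PRICE"\n')
    f.write('"1998/07/15","Timothy ""The Parser"" Campbell",0968411304,18.99\n')

rows = csv_to_dicts(path)
print(rows[0]["AUTHOR"])
```

Note that every value comes back as a string, so the leading zero of the ISBN is preserved.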
I am new to Python (coming from a PHP background) and I am having a hard time figuring out how to put each line of a CSV into a list. I wrote this:
import csv
data=[]
reader = csv.reader(open("file.csv", "r"), delimiter=',')
for line in reader:
    if "DEFAULT" not in line:
        data+=line
print(data)
But when I print out data, I see that it's treated as one string. I want a list. I want to be able to loop and append every line that does not have "DEFAULT" in a given line. Then write to a new file.
How about this?
import csv
reader = csv.reader(open("file.csv", "r"), delimiter=',')
print([line for line in reader if 'DEFAULT' not in line])
or if it's easier to understand:
import csv
reader = csv.reader(open("file.csv", "r"), delimiter=',')
data = [line for line in reader if 'DEFAULT' not in line]
print(data)
and of course the ultimate one-liner:
import csv
print([l for l in csv.reader(open("file.csv"), delimiter=',') if 'DEFAULT' not in l])
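The question also asks about writing the filtered rows to a new file. A minimal sketch of that step, using in-memory buffers in place of real files so it can run anywhere; the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical input standing in for file.csv.
source = io.StringIO("a,1\nDEFAULT,2\nb,3\n")

# Keep rows in which no field is exactly "DEFAULT".
# (`in` on a list compares whole fields, not substrings.)
data = [row for row in csv.reader(source) if "DEFAULT" not in row]

# Write the kept rows; a StringIO stands in for the output file here.
out = io.StringIO(newline="")
csv.writer(out).writerows(data)

print(data)
print(out.getvalue())
```

With real files, the output would be opened with open("new_file.csv", "w", newline="") (the file name being a placeholder) and passed to csv.writer in the same way.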
I have some code that is meant to convert CSV files into tab delimited files. My problem is that I cannot figure out how to write the correct values in the correct order. Here is my code:
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
Now, since both my write statements are in the for row in data loop, my headers are being written multiple times over. If I outdent the first write statement, I'll have an obvious formatting error. If I move the second write statement above the first and then outdent, my data will be out of order. What can I do to make sure that the first write statement gets written once as a header, and the second gets written for each line in the CSV file? How do I extract the first 'write' statement outside of the loop without breaking the dictionary? Thanks!
The csv module contains methods for writing as well as reading, making this pretty trivial:
import csv
with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.reader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow(row)
No need to do it all yourself. Note my use of the with statement, which should always be used when working with files in Python.
Edit: Naturally, if you want to select specific values, you can do that easily enough. You appear to be making your own dictionary to select the values - again, the csv module provides DictReader to do that for you:
import csv
with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.DictReader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow([row["name"], row["order_num"], ...])
As kirelagin points out in the comments, writer.writerows() could also be used, here with a generator expression:
writer.writerows([row["name"], row["order_num"], ...] for row in reader)
Move the code that writes the headers outside the main loop, so that it gets written exactly once at the beginning.
Also, consider using the csv module for writing CSV files (not just for reading); don't reinvent the wheel!
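A minimal sketch of that structure: the header row is written once before the loop, and each data row is written inside it. The column names are borrowed from the question, while the sample rows and the choice of which columns to export are assumptions made for this example.

```python
import csv
import io

# Hypothetical input; a StringIO stands in for the CSV file.
source = io.StringIO("name,order_num,amt_due\nAnn,7,10.00\nBob,8,12.50\n")
out = io.StringIO(newline="")

# Columns to export, in the order they should appear.
columns = ["name", "amt_due"]

reader = csv.DictReader(source)
writer = csv.writer(out, delimiter="\t", lineterminator="\n")

# The header is written once, before the loop starts.
writer.writerow(columns)

# Each data row is written inside the loop, in the same column order.
for row in reader:
    writer.writerow([row[col].strip() for col in columns])

print(out.getvalue())
```

Keeping the column order in one list means the header and the data rows cannot drift out of sync.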
OK, so I figured it out, but it's not the most elegant solution. Basically, I just ran the first loop, wrote to the file, then ran it a second time and appended the results. See my code below. I would love any input on a better way to accomplish what I've done here. Thanks!
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
    tab_file.write(item['name']+'\t'+item['order_num']...)
    tab_file.close()
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
    tab_file.close()
I'm having trouble trying to convert a text file into a list of lists split by commas. Basically, I want:
DATE OF OCCURRENCE,WARD,LONGITUDE,LATITUDE
06/04/2011,3,-87.61619704286184,41.82254380664193
06/04/2011,20,-87.62391924557963,41.79367531770095
to look like:
[["DATE OF OCCURRENCE", "WARD", "LONGITUDE" , "LATITUDE"],
["06/04/2011", "3", "-87.61619704286184", "41.82254380664193"],
["06/04/2011", "20", "-87.62391924557963", "41.79367531770095"]]
Here is the code I have so far:
row = []
crimefile = open(fileName, 'r')
for line in crimefile.readlines():
    row.append([line])
    for i in line.split(","):
        row[-1].append(i)
However, this gets me a result of:
[['DATE OF OCCURRENCE,WARD,LONGITUDE,LATITUDE\n', 'DATE OF OCCURRENCE', 'WARD', 'LONGITUDE', 'LATITUDE\n'],
['06/04/2011,3,-87.61619704286184,41.82254380664193\n', '06/04/2011', '3', '-87.61619704286184', '41.82254380664193\n'],
['06/04/2011,20,-87.62391924557963,41.79367531770095', '06/04/2011', '20', '-87.62391924557963', '41.79367531770095']]
I just want to be able to remove that first part and replace it with the second. How can I do that?
Maybe:
crimefile = open(fileName, 'r')
yourResult = [line.split(',') for line in crimefile.readlines()]
This looks like a CSV file, so you could use the Python csv module to read it. For example:
import csv
crimefile = open(fileName, 'r')
reader = csv.reader(crimefile)
allRows = [row for row in reader]
Using the csv module allows you to specify how things like quotes and newlines are handled. See the documentation I linked to above.
I believe @michael's comment might be a bit outdated. Since I came across this question and it still seems relevant, I would like to provide a more up-to-date solution based on that previous response, which would be something like:
with open(file_name, 'r') as f:
    your_result = [line.split(',') for line in f.read().splitlines()]
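The difference between the two approaches is what happens to the line endings. A small sketch, using a made-up string in place of the file contents:

```python
import io

# Hypothetical file contents; the last line has no trailing newline.
text = "DATE OF OCCURRENCE,WARD\n06/04/2011,3\n06/04/2011,20"

# readlines() keeps the newline on every line but the last.
with_newlines = [line.split(",") for line in io.StringIO(text).readlines()]

# splitlines() drops the line endings before the split.
clean = [line.split(",") for line in text.splitlines()]

print(with_newlines[0])
print(clean[0])
```

In the first result the last field of each row still carries its newline character, which is what the splitlines() version avoids.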
Going with what you've started:
row = []
with open(fileName, 'r') as crimefile:
    for line in crimefile:
        tmp = []
        # strip the trailing newline (if any), then split on commas
        for element in line.rstrip('\n').split(','):
            tmp.append(element)
        row.append(tmp)