Opening an uploaded csv file - python

I am failing to open an uploaded csv file. When I use a file from the pc directory it works fine, but when I upload it from an HTML form I get this error:
TypeError: coercing to Unicode: need string or buffer, file found
when trying to read from the uploaded csv file:
domain_file = request.POST['csv'].file
file = open(domain_file, "r")
csv_file = csv.reader(file, delimiter=",", quotechar='"')
This works fine when I am using a file from the pc:
file = open('/Desktop/csv.csv', "r")
csv_file = csv.reader(file, delimiter=",", quotechar='"')

The value contains a file object, not a path, which is why open() fails. Use the filename property for validation and read from the file object itself: http://flask.pocoo.org/docs/0.10/patterns/fileuploads/
Maybe something like this (reading the werkzeug FileStorage stream directly):
import csv
import io

domain_file = request.files['csv']
if domain_file and allowed_file(domain_file.filename):
    # domain_file is a werkzeug FileStorage; wrap its binary stream so csv gets text
    file = io.TextIOWrapper(domain_file.stream, encoding='utf-8')
    csv_file = csv.reader(file, delimiter=",", quotechar='"')
    # ...
Also see http://werkzeug.pocoo.org/docs/0.9/wrappers/#werkzeug.wrappers.BaseRequest.files
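If you do want a real path on disk, the upload pattern in the linked Flask docs saves the file first. A sketch, where UPLOAD_FOLDER is an assumed config value:
import os
from werkzeug.utils import secure_filename

filename = secure_filename(domain_file.filename)  # sanitise the client-supplied name
domain_file.save(os.path.join(UPLOAD_FOLDER, filename))  # UPLOAD_FOLDER is assumed
with open(os.path.join(UPLOAD_FOLDER, filename), 'r') as file:
    csv_file = csv.reader(file, delimiter=",", quotechar='"')
    # ... iterate csv_file here, while the file is still open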

If you do this you'll be able to iterate through the data in the csv line by line, with each row exposed as a dict.
import csv
csv_contents = request.POST['csv'].value.decode('utf-8')
file = csv_contents.splitlines()
data = csv.DictReader(file)
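Each row then comes back as a dict keyed by the header line. For example, assuming the csv happens to have name and email columns (hypothetical headers):
for row in data:
    print(row['name'], row['email'])  # column names here are illustrative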

Related

How to read the headers of a csv file using csv module in "rb" mode?

I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.
with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
All of this is working fine but now I have to validate the headers in the csv file before making the put call.
When I try to run the code below, I get an error.
with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(DATA)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
This throws
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
As a workaround, I have created a new function which opens the file in "r" mode and validates the csv headers, and this works ok.
def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
I do not want to read the same file twice. Once for header validation and once for uploading to s3. The upload part also doesn't work if I do it in "r" mode.
Is there a way I can achieve this while reading the file only once in "rb" mode ? I have to make this work using the csv module and not the pandas library.
Doing what you want is possible but not very efficient. Simply opening a file isn't that expensive, and the CSV reader only reads one line at a time, not the entire file.
To do what you want you have to:
Read the first line as bytes
Decode it into a string (using the correct encoding)
Convert it to a list of strings
Parse it with csv.reader and finally
Seek to the start of the stream.
Otherwise you'll end up uploading only the data without the headers:
with open(csv_file, 'rb') as DATA:
    # read and decode just the header line, then parse it with csv
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)  # rewind so the upload includes the header line
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.
To ensure only one line is read at a time you can use buffering=1
def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations

with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Or
def check_csv_headers(filePath):
    with open(filePath, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # if successful
        return True

def upload_csv(filePath):
    if check_csv_headers(filePath):
        with open(filePath, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)

How to load multiple csv files to an iterable variable in Python?

I want to open a variable number of csv files and then iterate over them, uploading one row of each file at a time to my sql database.
For example, loop through each file uploading the first row of each file to the database, then loop again through each file uploading the second row of each file to the database.
However, I'm stuck on getting the csv files ready to be uploaded in a single object.
The error happens at 'csv_data[i] = csv.reader...'
Each file is for a different table, so I cannot append them.
import csv
import sys
i = 0
for argv in sys.argv[1:]:
    csv_file = open(argv, newline='', encoding='utf-8-sig')
    csv_data[i] = csv.reader(csv_file, dialect='excel', delimiter=',', quotechar='|')
    csv_file.close()
    i += 1
After this code, I would need something to loop through each file uploading a certain row number.
Zip the readers together and iterate through them:
import csv
import sys

file_handles = [open(name, newline='', encoding='utf-8-sig') for name in sys.argv[1:]]
readers = (csv.reader(handle, dialect='excel', delimiter=',', quotechar='|') for handle in file_handles)

# zip here
for line_group in zip(*readers):
    ...  # line_group is a tuple of line i of each file

# don't forget to close your files
for file_handle in file_handles:
    try:
        file_handle.close()
    except OSError:
        print("Issue closing one of the files")

Printing csv through printer with python

I want to output a csv file with python. I have gone through the code below and it works well with a .txt file, but I am unable to print a csv through it.
import os
import tempfile
filename = tempfile.mktemp(".txt")
open(filename, "w").write("Printing file")
os.startfile(filename, "print")
Actually I want to print a csv file that has already been created; there should be no need to write and create a new file and then print it out.
Edit: By print I meant a hardcopy print through a printer.
If you want to print the content of a csv you can try this:
import csv
file_path = 'a.csv'
with open(file_path) as file:
    content = csv.reader(file)
    for row in content:
        print(row)
I was talking about printing the csv file as a hardcopy with python code.
import csv
import os
import tempfile
from tkinter import messagebox

def printing():
    # reading from csv, writing a " | "-separated version to a txt file
    with open("CSV_files//newfile.txt", "w") as my_output_file:
        with open("CSV_files//attendance.csv", "r") as my_input_file:
            for row in csv.reader(my_input_file):
                my_output_file.write(" | ".join(row) + '\n')

    # reading the file back and storing it as a string, as .write() takes a string
    with open('CSV_files//newfile.txt', "r") as f:
        strnew = f.read()

    # for checking
    with open('CSV_files//print.txt', "w") as f:
        f.write(strnew)

    # printing
    filename = tempfile.mktemp("attendance.txt")  # creating a temp file
    open(filename, "w").write(strnew)
    os.startfile(filename, "print")
    messagebox.showinfo("Print", "Printing Request sent successfully!")
For more info:
github project link

IOError when downloading and decompressing gzip file

I'm trying to download and decompress a gzip file and then convert the resulting decompressed file, which is in tsv format, into CSV, which would be easier to parse. I am trying to gather the data from the "Download Table" link in this URL. My code is as follows, using the same idea as in this post; however, I get the error IOError: [Errno 2] No such file or directory: 'file=data/irt_euryld_d.tsv' at the line with open(outFilePath, 'w') as outfile:
import os
import urllib2
import gzip
import StringIO
baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())
#Now have to deal with tsv file
import csv
with open(outFilePath, 'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout)  # Converting output into CSV format
Thank You
The path you were setting filename to was not a valid path to have a file written to it, so you have to change filename = "data/irt_euryld_d.tsv.gz" to a valid path to wherever you want the irt_euryld_d.tsv.gz file to live. For example, if I wanted the irt_euryld_d.tsv.gz file on my desktop, I would set filename = "/Users/heinst/Desktop/data/irt_euryld_d.tsv.gz". Since this is a valid path, python will not give you the No such file or directory error anymore.
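A minimal sketch of the idea, keeping the server's query string separate from the local output path (the local path here is hypothetical):
import urllib2

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
remoteName = "file=data/irt_euryld_d.tsv.gz"  # query parameter the server expects
filename = "/Users/heinst/Desktop/irt_euryld_d.tsv.gz"  # writable local path (hypothetical)

response = urllib2.urlopen(baseURL + remoteName)
with open(filename, 'wb') as f:
    f.write(response.read())  # save the gzipped payload locally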

reading gzipped csv file in python 3

I'm having problems reading from a gzipped csv file with the gzip and csv libs. Here's what I got:
import gzip
import csv
import json
f = gzip.open(filename)
csvobj = csv.reader(f, delimiter=',', quotechar="'")
for line in csvobj:
    ts = line[0]
    data_json = json.loads(line[1])
but this throws an exception:
File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 64, in download_from_S3
self.parse_dump_file(filename)
File "C:\Users\yaronol\workspace\raw_data_from_s3\s3_data_parser.py", line 30, in parse_dump_file
for line in csvobj:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
gunzipping the file and opening that with csv works fine. I've also tried decoding the file text to convert from bytes to str...
What am I missing here?
The default mode for gzip.open is rb; if you wish to work with strs, you have to specify it explicitly:
f = gzip.open(filename, mode="rt")
Aside: it is good practice to wrap I/O operations in a with block:
with gzip.open(filename, mode="rt") as f:
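Putting it together with the loop from the question (a sketch; filename and the json column layout are taken from the question):
import csv
import gzip
import json

with gzip.open(filename, mode="rt") as f:
    csvobj = csv.reader(f, delimiter=',', quotechar="'")
    for line in csvobj:
        ts = line[0]
        data_json = json.loads(line[1])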
You are opening the file in binary mode (which is the default for gzip).
Try instead:
import gzip
import csv
f = gzip.open(filename, mode='rt')
csvobj = csv.reader(f, delimiter=',', quotechar="'")
Late to the party, but you can use the datatable package in Python:
import datatable as dt
df = dt.fread(filename)
df.head()
