I must be missing something very simple here, but I've been hitting my head against the wall for a while and don't understand where the error is. I am trying to open a csv file and read the data. I am detecting the delimiter, then reading in the data with this code:
with open(filepath, 'r') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read())
    delimiter = repr(dialect.delimiter)[1:-1]
    csvdata = [line.split(delimiter) for line in csvfile.readlines()]
However, my csvfile is being read as having no length. If I run:
print(sum(1 for line in csvfile))
The result is zero. If I run:
print(sum(1 for line in open(filepath, 'r')))
Then I get five lines, as expected. I've checked for name clashes by changing csvfile to other random names, but this does not change the result. Am I missing a step somewhere?
You need to move the file pointer back to the start of the file after sniffing it. You don't need to read the whole file in to do that, just enough to include a few rows:
import csv
with open(filepath, 'r') as f_input:
    dialect = csv.Sniffer().sniff(f_input.read(2048))
    f_input.seek(0)
    csv_input = csv.reader(f_input, dialect)
    csv_data = list(csv_input)
Also, the csv.reader() will do the splitting for you.
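As an aside (not part of the original answer): unlike splitting each line yourself, csv.reader also copes with quoted fields that contain the delimiter, for example:
import csv

line = 'a,"b,c",d'
print(line.split(','))           # ['a', '"b', 'c"', 'd'] -- quoted field broken apart
print(next(csv.reader([line])))  # ['a', 'b,c', 'd']      -- parsed correctly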
I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.
with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
All of this is working fine but now I have to validate the headers in the csv file before making the put call.
When I try to run the code below, I get an error.
with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(DATA)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
This throws
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
As a workaround, I have created a new function which opens the file in "r" mode and does validation on the csv headers and this works ok.
def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
I do not want to read the same file twice: once for header validation and once for uploading to S3. The upload part also doesn't work if I do it in "r" mode.
Is there a way I can achieve this while reading the file only once, in "rb" mode? I have to make this work using the csv module and not the pandas library.
Doing what you want is possible but not very beneficial: simply opening a file isn't that expensive, and the CSV reader only reads one line at a time, not the entire file.
To do what you want you have to:

Read the first line as bytes
Decode it into a string (using the correct encoding)
Wrap it in a list (csv.reader expects an iterable of lines)
Parse it with csv.reader, and finally
Seek back to the start of the stream.
Otherwise you'll end up uploading only the data, without the header line:
with open(csv_file, 'rb') as DATA:
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.
To ensure only one line is read at a time you can use buffering=1:
def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations

with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Or
def check_csv_headers(filePath):
    with open(filePath, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # if successful
        return True

def upload_csv(filePath):
    if check_csv_headers(filePath):
        with open(filePath, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
The raw ECG that I have is in csv format. I need to convert it into a .txt file which will have only the ECG data. I need Python code for this. Can I get some help?
csv_file = 'ECG_data_125Hz_Simulator_Patch_Normal_Sinus.csv'
txt_file = 'ECG_data_125Hz_Simulator_Patch_Normal_Sinus.txt'
import csv
with open(txt_file, "w") as my_output_file:
    with open(csv_file, "r") as my_input_file:
        # need to write data to the output file
        my_output_file.close()
The input ECG data looks like this:
(screenshot: Raw_ECG_data)
What worked for me
import csv
csv_file = 'FL_insurance_sample.csv'
txt_file = 'ECG_data_125Hz_Simulator_Patch_Normal_Sinus.txt'
with open(txt_file, "w") as my_output_file:
    with open(csv_file, "r") as my_input_file:
        [my_output_file.write(" ".join(row) + '\n') for row in csv.reader(my_input_file)]
    my_output_file.close()
A few things:
You can open multiple files with the same context manager (with statement):
with open(csv_file, 'r') as input_file, open(txt_file, 'w') as output_file:
...
When using a context manager to handle files, there's no need to close the file, that's what the with statement is doing; it's saying "with the file open, do the following". So once the block is ended, the file is closed.
You could do something like:
with open(csv_file, 'r') as input_file, open(txt_file, 'w') as output_file:
    for line in input_file:
        output_file.write(line)
... But as @MEdwin says, a csv can just be renamed and the commas will no longer act as separators; it will just become a normal .txt file. You can rename a file in Python using os.rename():
import os
os.rename('file.csv', 'file.txt')
Finally, if you want to remove certain columns from the csv when writing to the txt file, you can use .split(). This lets you split each line on a separator such as a comma into a list of strings. For example:
"Hello, this is a test".split(',')
>>> ['Hello', ' this is a test']
You can then just write certain indices from the list to the new file.
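For instance, a minimal sketch (the column indices here are arbitrary, purely for illustration) that keeps only the first and third fields of each line:
with open(csv_file, 'r') as input_file, open(txt_file, 'w') as output_file:
    for line in input_file:
        fields = line.rstrip('\n').split(',')
        # keep only columns 0 and 2 (illustrative choice)
        output_file.write(' '.join([fields[0], fields[2]]) + '\n')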
For more info on deleting columns en masse, see this post
I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:
When you call csvFilename = gzip.open(filename, 'rb') then reader = csv.reader(open(csvFilename)), is that reader not a valid csv file?
I am trying to solve the problem outlined below, and am getting a "coercing to Unicode: need string or buffer, GzipFile found" error on lines 41 and 7 (highlighted below), leading me to believe that gzip.open and csv.reader do not work as I had previously thought.
Problem I am trying to solve
I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a python dictionary and then combine it with another python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv
def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData = {}
    for row in reader:
        alertData[row[0]] = row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC
*edit: also forgive my non-PEP style of naming in Python
gzip.open returns a file-like object (the same kind of thing plain open returns), not the name of a decompressed file. Simply pass the result directly to csv.reader and it will work; the csv.reader will receive the decompressed lines. csv does expect text though, so on Python 3 you need to open it in text mode (on Python 2, 'rb' is fine; the gzip module doesn't deal with encodings, but then, neither does the csv module). Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
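Applied to the question's dataToDict, the fix would look something like this (a Python 3 sketch; the function and variable names are taken from the question):
import gzip
import csv

def dataToDict(filename):
    alertData = {}
    # gzip.open decompresses transparently; 'rt' with newline='' gives csv the text it expects
    with gzip.open(filename, 'rt', newline='') as csvFile:
        for row in csv.reader(csvFile):
            alertData[row[0]] = row[1:]
    return alertData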
The following worked for me for python==3.7.9:
import gzip
my_filename = 'my_compressed_file.csv.gz'

with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read()  # read decompressed data

with open(my_filename[:-3], 'wt') as out_file:
    out_file.write(data)  # write decompressed data
my_filename[:-3] strips the trailing '.gz' so the output file keeps the actual .csv filename rather than an arbitrary one.
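If the file is large, a streaming variant along the same lines (a sketch using shutil, reusing my_filename from above) avoids holding all the decompressed data in memory at once:
import gzip
import shutil

with gzip.open(my_filename, 'rb') as gz_file:
    with open(my_filename[:-3], 'wb') as out_file:
        shutil.copyfileobj(gz_file, out_file)  # copy in chunks, not all at once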
I am trying to add extra columns in a csv file after processing an input csv file. But, I am getting extra new line added after each line in the output.
What's missing or wrong in my code below?
import csv
with open('test.csv', 'r') as infile:
    with open('test_out.csv', 'w') as outfile:
        reader = csv.reader(infile, delimiter=',')
        writer = csv.writer(outfile, delimiter=',')
        for row in reader:
            colad = row[5].rstrip('0123456789./ ')
            if colad == row[5]:
                col2ad = row[11]
            else:
                col2ad = row[5].split(' ')[-1]
            writer.writerow([row[0], colad, col2ad] + row[1:])
I am processing a huge csv file, so I would like to get rid of those extra lines.
I had the same problem on Windows (your OS as well, I presume?). On Windows, the csv writer's \r\n line terminator combined with text-mode newline translation produces \r\r\n at the end of each line (so: a doubled newline).
You need to open the output file in binary mode:
with open('test_out.csv', 'wb') as outfile:
For other answers:
Python's CSV writer produces wrong line terminator
CSV in Python adding an extra carriage return
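A side note not in the original answer: binary mode only works with Python 2's csv module. On Python 3, where csv.writer needs a text-mode file, the equivalent fix is to suppress newline translation with newline='':
import csv

# Python 3: keep text mode but disable newline translation
with open('test_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=',')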
I am using Blair's Python script which modifies a CSV file to add the filename as the last column (script appended below). However, instead of adding the file name alone, I also get the Path and File name in the last column.
I run the below script in windows 7 cmd with the following command:
python C:\data\set1\subseta\add_filename.py C:\data\set1\subseta\20100815.csv
The resulting ID field is populated with the full path C:\data\set1\subseta\20100815.csv, although all I need is 20100815.csv.
I'm new to python so any suggestion is appreciated!
import csv
import sys
def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    header = reader.next()
    header.append('ID')
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename)
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
map(process_file, sys.argv[1:])
Use os.path.basename(filename). See http://docs.python.org/library/os.path.html for more details.
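In the script above, that amounts to changing the row-processing loop (a minimal sketch of the change):
import os.path

# Append just the base name, e.g. '20100815.csv' instead of
# 'C:\data\set1\subseta\20100815.csv'.
for row in reader:
    row.append(os.path.basename(filename))
    writer.writerow(row)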