Python code for exporting scraped data to CSV

import csv
in_txt = csv.reader(open(post.text, "rb"), delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)
When I execute the code above I get an IOError. I also need to save the CSV in a separate folder.

You don't need to open the file before passing it to csv.reader.
You can pass the file directly to csv.reader and it would work:
import csv
in_txt = csv.reader("post.text", "rb", delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)

Try the following:
import csv
with open(post.text, "rb") as f_input, open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb") as f_output:
    in_csv = csv.reader(f_input, delimiter='\t')
    out_csv = csv.writer(f_output)
    out_csv.writerows(in_csv)
csv.reader() needs an iterable of lines, such as a list or an open file object, and csv.writer() needs an open file object. Neither can open the file for you. Using with ensures the files are closed automatically afterwards.
Also, do not forget to prefix your path string with r so that the backslashes are not treated as escape sequences.
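On Python 3 the csv module expects text-mode files opened with newline='' rather than the binary modes used above. A minimal sketch of the same conversion, assuming a tab-delimited input file; the function name tsv_to_csv and its arguments are hypothetical:

```python
import csv


def tsv_to_csv(src_path, dst_path):
    """Copy a tab-delimited text file to a comma-separated file (Python 3)."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", newline="") as f_input, \
            open(dst_path, "w", newline="") as f_output:
        reader = csv.reader(f_input, delimiter="\t")
        writer = csv.writer(f_output)
        writer.writerows(reader)
```

The destination can be any folder that already exists, for example a raw-string Windows path such as r"C:\Users\sptechsoft\Documents\source3.csv".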

Related

How to read the headers of a csv file using csv module in "rb" mode?

I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.
with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)
All of this is working fine but now I have to validate the headers in the csv file before making the put call.
When I try to run below, I get an error.
with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(file)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url,data=DATA,headers=headers)
This throws
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
As a workaround, I have created a new function which opens the file in "r" mode and does validation on the csv headers and this works ok.
def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
I do not want to read the same file twice. Once for header validation and once for uploading to s3. The upload part also doesn't work if I do it in "r" mode.
Is there a way I can achieve this while reading the file only once in "rb" mode ? I have to make this work using the csv module and not the pandas library.
Doing what you want is possible but not very efficient. Simply opening a file isn't that expensive, and the CSV reader reads only one line at a time, not the entire file.
To do what you want, you have to:
Read the first line as bytes
Decode it into a string (using the correct encoding)
Convert it to a list of strings
Parse it with csv.reader and finally
Seek to the start of the stream.
Otherwise you'll end up uploading only the data without the headers:
with open(csv_file, 'rb') as DATA:
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
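The same steps can be sketched as a small helper that works on any binary stream. The function name, the default UTF-8 encoding, and the in-memory stream standing in for the opened file are assumptions for illustration. It also assumes the header fits on one physical line (no quoted fields containing newlines).

```python
import csv
import io


def read_header_and_rewind(stream, encoding="utf-8"):
    """Return the CSV header columns of a binary stream, then rewind it."""
    first_line = stream.readline()        # first line, still bytes
    text = first_line.decode(encoding)    # decode to str
    columns = next(csv.reader([text]))    # parse that single line
    stream.seek(0)                        # rewind so a later upload includes the header
    return columns


# In-memory stand-in for a file opened in 'rb' mode.
data = io.BytesIO(b"id,name\r\n1,Ann\r\n")
columns = read_header_and_rewind(data)
```

After the call, the stream is positioned at the start again, so it can be handed to the upload request unchanged.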
Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.
You can also pass buffering=1 to request line buffering:
def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Or
def check_csv_headers(filePath):
    with open(filePath, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # If successful
        return True
def upload_csv(filePath):
    if check_csv_headers(filePath):
        with open(filePath, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)

Adding csv filename to a column in python (200 files)

I have 200 files with dates in the file names. I would like to add the date from each file name as a new column in that file.
I created a script in Python:
import pandas as pd
import os
import openpyxl
import csv
os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
    with open(file_name,'r') as csvinput:
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('FileName')
        all.append(row)
        for row in reader:
            row.append(file_name)
            all.append(row)
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
if file_name.endswith('.csv'):
    workbook = openpyxl.load_workbook(file_name)
    workbook.save(file_name)
csv_filename = pd.read_csv(r'\\\\\\')
csv_data= pd.read_csv(csv_filename, header = 0)
csv_data['filename'] = csv_filename
Right now I see "InvalidFileException: File is not a zip file" and only the first file has the added column with the file name.
Can you please advise what I am doing wrong? BTW, I'm using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
should be indented so that it is included in the for loop. As written, it is executed only once, after the loop, which is why only one file gets the new column.
Second problem: the exception is probably caused by openpyxl.load_workbook(file_name). Presumably openpyxl can only open actual Excel files (which are .zip files with a different extension), not CSV files. Why do you want to open and save the file at all? I think you can just remove those three lines.
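A corrected loop might look like the sketch below, assuming Python 3 and a folder whose .csv files should all be updated in place. The function name add_filename_column and the folder argument are hypothetical, and the openpyxl and pandas lines are left out.

```python
import csv
import os


def add_filename_column(folder):
    """Append a 'FileName' column holding each CSV file's own name."""
    for file_name in os.listdir(folder):
        if not file_name.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        path = os.path.join(folder, file_name)

        # Read the whole file first, because it is rewritten in place.
        with open(path, "r", newline="") as csvinput:
            rows = list(csv.reader(csvinput))
        if not rows:
            continue  # empty file: nothing to extend

        rows[0].append("FileName")   # header row gets the column title
        for row in rows[1:]:
            row.append(file_name)    # data rows get the file name

        # The write happens inside the loop, once per file.
        with open(path, "w", newline="") as csvoutput:
            csv.writer(csvoutput, lineterminator="\n").writerows(rows)
```

Building the full path with os.path.join avoids the need for os.chdir.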

CSV NamedTemporaryFile not saving properly

I'm trying to make a csv file using Python's csv and tempfile tools. I've been declaring it as follows:
csvattachment = tempfile.NamedTemporaryFile(suffix='.csv', prefix=('student_' + studentID), delete=False)
with open(csvattachment.name, 'w+') as csvfile:
    filewriter = csv.writer(csvfile, delimiter=',')
    filewriter.writerow([ #WRITE CONTENT HERE])
After that I attach this file and send it out. The problem is that instead of being called 'student_1736823.csv', the attachment name is something uglier like <tempfile._TemporaryFileWrapper object at 0x10cbf5e48>
NamedTemporaryFile() already returns an open file, so you don't have to reopen it:
with tempfile.NamedTemporaryFile(suffix='.csv', prefix=('student_' + studentID),
                                 delete=False, mode='w+') as csvfile:
    filewriter = csv.writer(csvfile, delimiter=',')
    filewriter.writerow([ #WRITE CONTENT HERE])
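A fuller sketch, assuming Python 3: the temporary file is written once, and its path is taken from the .name attribute, which is presumably what the attachment step needs instead of the file object itself. The student ID and the row contents are made up for illustration. Note that tempfile inserts random characters between the prefix and the suffix, so the name will not be exactly 'student_1736823.csv'.

```python
import csv
import os
import tempfile

student_id = "1736823"  # hypothetical value for illustration

# newline="" lets the csv module control line endings.
with tempfile.NamedTemporaryFile(
    mode="w+", suffix=".csv", prefix="student_" + student_id + "_",
    delete=False, newline="",
) as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(["name", "grade"])   # placeholder content
    writer.writerow(["Ann", "A"])
    attachment_path = csvfile.name       # a str path, not the wrapper object

# The file still exists here because delete=False; remove it when done.
attachment_name = os.path.basename(attachment_path)
```

If the attachment must carry an exact file name, one option is to set that name separately in the mail or upload code rather than relying on the temporary file's name.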

erroneous line added while adding new columns python

I am trying to add extra columns to a CSV file after processing an input CSV file, but I am getting an extra blank line after each line in the output.
What's missing or wrong in my code below?
import csv
with open('test.csv', 'r') as infile:
    with open('test_out.csv', 'w') as outfile:
        reader = csv.reader(infile, delimiter=',')
        writer = csv.writer(outfile, delimiter=',')
        for row in reader:
            colad = row[5].rstrip('0123456789./ ')
            if colad == row[5]:
                col2ad = row[11]
            else:
                col2ad = row[5].split(' ')[-1]
            writer.writerow([row[0],colad,col2ad] +row[1:])
I am processing a huge CSV file, so I would like to get rid of those extra lines.
I had the same problem on Windows (your OS as well, I presume?). The combination of the csv module and Windows produces \r\r\n at the end of each line (so: a double newline).
On Python 2, you need to open the output file in binary mode:
with open('test_out.csv', 'wb') as outfile:
For other answers:
Python's CSV writer produces wrong line terminator
CSV in Python adding an extra carriage return
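On Python 3, binary mode does not work with csv.writer. The equivalent fix there is to open the output file in text mode with newline=''. A minimal sketch with a hypothetical helper name:

```python
import csv


def write_rows(path, rows):
    """Write rows to a CSV file without doubled line endings (Python 3)."""
    # newline="" stops Python from translating the writer's "\r\n"
    # into "\r\r\n" on Windows.
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile, delimiter=",").writerows(rows)
```

In the question's code, this corresponds to changing the output line to open('test_out.csv', 'w', newline='').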

How to pull a file with python code in Unix env

I have the file in this place
/home/unica/app/Affinium/Campaign/partitions/partition1/scripts/runscripts/campaigns/cnyr/dev
I want to open it like this:
with open('/home/unica/app/Affinium/Campaign/partitions/partition1/scripts/runscripts/campaigns/cnyr/dev/CNYR_DM_TM_CAMPAIGN_WAVES.csv','rb') as csvfile
But it throws a syntax error. Also, how can I simplify the path name into some alias?
Try this:
fileName = '/home/unica/app/Affinium/Campaign/partitions/partition1/scripts/runscripts/campaigns/cnyr/dev/CNYR_DM_TM_CAMPAIGN_WAVES.csv'
with open(fileName, 'rb') as csvfile: # notice that the line must end with a ':'
    for line in csvfile:
        pass # do something
Or even better, use the csv module:
import csv
with open(fileName, 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|') # specify delimiter, etc.
    for row in reader:
        pass # do something
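For the alias part of the question, one option is to keep the directory in a variable and join the file name onto it. The sketch below assumes Python 3, where the csv module wants a text-mode file; the helper name read_rows and the BASE_DIR constant are hypothetical.

```python
import csv
import os

# Directory from the question, stored once so it can be reused as an "alias".
BASE_DIR = "/home/unica/app/Affinium/Campaign/partitions/partition1/scripts/runscripts/campaigns/cnyr/dev"


def read_rows(file_name, base_dir=BASE_DIR):
    """Return all rows of a CSV file located in base_dir."""
    path = os.path.join(base_dir, file_name)
    with open(path, "r", newline="") as csvfile:
        return list(csv.reader(csvfile, delimiter=","))
```

A call such as read_rows("CNYR_DM_TM_CAMPAIGN_WAVES.csv") then replaces the long literal path.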
