Adding csv filename to a column in python (200 files)

I have 200 files with dates in their file names. I would like to add the date from each file name as a new column in that file.
I wrote this script in Python:
import pandas as pd
import os
import openpyxl
import csv
os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
    with open(file_name,'r') as csvinput:
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('FileName')
        all.append(row)
        for row in reader:
            row.append(file_name)
            all.append(row)
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
if file_name.endswith('.csv'):
    workbook = openpyxl.load_workbook(file_name)
    workbook.save(file_name)
csv_filename = pd.read_csv(r'\\\\\\')
csv_data= pd.read_csv(csv_filename, header = 0)
csv_data['filename'] = csv_filename
Right now I get "InvalidFileException: File is not a zip file", and only the first file gets the new column with the file name.
Can you please advise what I am doing wrong? BTW, I'm using Python 3.4.
Many thanks,
Lukasz

First problem: this section
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
should be indented so that it is part of the for loop. As written, it is executed only once, after the loop has finished. This is why only one file gets the new column.
Second problem: the exception is most likely raised by openpyxl.load_workbook(file_name). openpyxl can only open actual Excel files (.xlsx files are zip archives with a different extension), not CSV files. Why do you want to open and save the file at all? I think you can simply remove those three lines.
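Putting both fixes together, the loop might look like the sketch below. It is only an illustration under some assumptions: the function name add_filename_column and its parameters are made up for this example, the whole file name is written into the new column (extracting just the date depends on a naming pattern the question does not show), and every CSV file is assumed to fit in memory.

```python
import csv
import os


def add_filename_column(directory, column_name="FileName"):
    """Append a column holding the file name to every .csv file in `directory`."""
    for file_name in sorted(os.listdir(directory)):
        if not file_name.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        path = os.path.join(directory, file_name)

        # Read the whole file first, because it is rewritten in place below.
        with open(path, "r", newline="") as csv_input:
            rows = list(csv.reader(csv_input))
        if not rows:
            continue  # leave empty files untouched

        rows[0].append(column_name)   # header row
        for row in rows[1:]:
            row.append(file_name)     # data rows

        # The write step sits inside the loop, so it runs once per file.
        with open(path, "w", newline="") as csv_output:
            csv.writer(csv_output, lineterminator="\n").writerows(rows)
```

Building the path with os.path.join avoids the need for os.chdir, and no openpyxl or pandas call is involved.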

Related

How to load multiple csv files to an iterable variable in Python?

I want to open a variable number of csv files, then iterate over the opened files and upload one row of each file at a time to my SQL database.
For example, loop through each file uploading the first row of each file to the database, then loop again through each file uploading the second row of each file to the database.
However, I'm stuck on getting the csv files ready to be uploaded in a single object.
The error happens at 'csv_data[i] = csv.reader...'
Each file is for a different table, so I cannot append them.
import csv
import sys
i = 0
for argv in sys.argv[1:]:
    csv_file = open(argv, newline='', encoding='utf-8-sig')
    csv_data[i] = csv.reader(csv_file, dialect='excel', delimiter=',', quotechar='|')
    csv_file.close()
    i += 1
After this code, I would need something to loop through each file, uploading a given row number.

Open all the files, zip the readers together, and iterate over the result:
file_handles = [open(file, newline='', encoding='utf-8-sig') for file in sys.argv[1:]]
readers = (csv.reader(file, dialect='excel', delimiter=',', quotechar='|') for file in file_handles)
# zip here
for line_group in zip(*readers):
    # line_group is a tuple of line i of each file
    pass  # upload the rows in line_group here
# don't forget to close your files
for file_handle in file_handles:
    try:
        file_handle.close()
    except OSError:
        print("Issue closing one of the files")
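As a variation on the same idea, contextlib.ExitStack can take care of closing the files. The following is a minimal sketch rather than a drop-in solution: the helper name iter_row_groups is invented for this example, and the reader options are copied from the question.

```python
import csv
from contextlib import ExitStack


def iter_row_groups(paths):
    """Yield tuples holding row i of every file, for i = 0, 1, 2, ..."""
    with ExitStack() as stack:
        # Each file is registered with the stack and closed when the block exits.
        files = [
            stack.enter_context(open(path, newline="", encoding="utf-8-sig"))
            for path in paths
        ]
        readers = [
            csv.reader(f, dialect="excel", delimiter=",", quotechar="|")
            for f in files
        ]
        # zip() stops as soon as the shortest file runs out of rows.
        yield from zip(*readers)
```

A caller could then write for line_group in iter_row_groups(sys.argv[1:]): and upload each row in the group to its own table. If the files can have different lengths and the extra rows matter, itertools.zip_longest is the usual replacement for zip.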

Converting multiple CSVs to TSVs using Python

Trying to convert multiple (5) CSVs to TSVs using python, but when I run this it only creates 1 TSV. Can anyone help?
import csv
import sys
import os
import pathlib
print ("Exercise1.csv"), sys.argv[0]
dirname = pathlib.Path('/Users/Amber/Documents')
for file in pathlib.Path().rglob('*.csv'):
    with open(file,'r') as csvin, open('Exercise1.tsv', 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        for row in csvin:
            print(row)
            tsvout.writerow(row)
exit ()
Thanks!

Your for loop opens each .csv file in turn, but it only ever opens a single file to write to (Exercise1.tsv), so you overwrite the same file on every iteration. You need to create a new output file in each iteration of the loop. You could try something like this:
for i,file in enumerate(pathlib.Path().rglob('*.csv')):
    with open(file,'r') as csvin, open('Exercise_{}.tsv'.format(i), 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        for row in csvin:
            tsvout.writerow(row)
enumerate() adds a counter to the for loop, so the output files are named Exercise_0.tsv, Exercise_1.tsv, and so on, one for each CSV file found.
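An alternative to numbering the outputs is to name each TSV after its source file. The sketch below assumes that is acceptable and that Python 3.6 or later is available (so open() accepts Path objects); the function name convert_csvs_to_tsvs is hypothetical.

```python
import csv
import pathlib


def convert_csvs_to_tsvs(directory):
    """Write a .tsv next to every .csv found under `directory`; return the new paths."""
    written = []
    for csv_path in sorted(pathlib.Path(directory).rglob("*.csv")):
        tsv_path = csv_path.with_suffix(".tsv")  # data.csv -> data.tsv
        with open(csv_path, "r", newline="") as csvin, \
             open(tsv_path, "w", newline="") as tsvout:
            writer = csv.writer(tsvout, delimiter="\t", lineterminator="\n")
            writer.writerows(csv.reader(csvin))
        written.append(tsv_path)
    return written
```

Note that the question defines dirname but then searches pathlib.Path(), which is the current working directory. Passing the intended directory explicitly, as here, avoids that mismatch.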

python code for Exporting scraped data to CSV

import csv
in_txt = csv.reader(open(post.text, "rb"), delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)
When I execute the above code I get an IO error. I also need to save the CSV in a separate folder.

You dont need to open file before passing it to csvreader.
You can directly pass the file to csvreader and it would work
import csv
in_txt = csv.reader("post.text", "rb", delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)

Try the following:
import csv
with open(post.text, "rb") as f_input, open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb") as f_output:
    in_csv = csv.reader(f_input, delimiter='\t')
    out_csv = csv.writer(f_output)
    out_csv.writerows(in_csv)
csv.reader() needs an iterable of lines, such as a file object or a list, and csv.writer() needs a file object to write to. Neither can open the file for you. Using with ensures the files are closed automatically afterwards.
Also, do not forget to prefix your path string with r so that the backslashes are not treated as escape sequences.
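The "rb" and "wb" modes above are what the csv module expects on Python 2. On Python 3 the files are opened in text mode with newline='' instead. A minimal Python 3 sketch of the same conversion follows; the function name tab_text_to_csv and its parameters are assumptions made for this example, not part of the original code.

```python
import csv


def tab_text_to_csv(src_path, dst_path):
    """Copy a tab-delimited text file to a comma-separated file."""
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(src_path, "r", newline="") as f_input, \
         open(dst_path, "w", newline="") as f_output:
        in_csv = csv.reader(f_input, delimiter="\t")
        out_csv = csv.writer(f_output)
        out_csv.writerows(in_csv)
```

The destination would be passed as a raw string, for example r"C:\Users\sptechsoft\Documents\source3.csv", and the target folder has to exist already because open() does not create directories.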

erroneous line added while adding new columns python

I am trying to add extra columns in a csv file after processing an input csv file. But, I am getting extra new line added after each line in the output.
What's missing or wrong in my below code -
import csv
with open('test.csv', 'r') as infile:
    with open('test_out.csv', 'w') as outfile:
        reader = csv.reader(infile, delimiter=',')
        writer = csv.writer(outfile, delimiter=',')
        for row in reader:
            colad = row[5].rstrip('0123456789./ ')
            if colad == row[5]:
                col2ad = row[11]
            else:
                col2ad = row[5].split(' ')[-1]
            writer.writerow([row[0],colad,col2ad] +row[1:])
I am processing a huge csv file, so I would like to get rid of those extra lines.

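I had the same problem on Windows (your OS as well, I presume?). The csv module ends each row with \r\n, and a file opened in text mode on Windows turns the \n into \r\n again, so each line ends in \r\r\n (a double newline).
You need to open the output file in binary mode (this is the Python 2 fix; on Python 3, keep text mode and pass newline='' to open() instead):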
with open('test_out.csv', 'wb') as outfile:
For other answers:
Python's CSV writer produces wrong line terminator
CSV in Python adding an extra carriage return
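For Python 3, where the csv module cannot write to a binary file, the equivalent fix is newline=''. The sketch below shows only the file handling; copy_csv is a hypothetical name, and the per-row column logic from the question would go where the comment indicates.

```python
import csv


def copy_csv(in_path, out_path):
    """Copy a CSV row by row without introducing blank lines on Windows."""
    # newline="" switches off newline translation, so the writer's own
    # "\r\n" terminator reaches the file unchanged.
    with open(in_path, "r", newline="") as infile, \
         open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        for row in reader:
            writer.writerow(row)  # build the extended row here instead
```

Because rows are read and written one at a time, this also works for files that are too large to hold in memory.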

Append rows to csv file

I want to add python list to a csv file with this code:
RESULTS = ['aa','bb','cc']
resultFile = open("c:\\temp\\output4.csv",'wb')
wr = csv.writer(resultFile, dialect='excel')
wr.writerows(RESULTS)
resultFile.flush()
But this code overwrites my previous file. How can I enable appending?
Unfortunately, I can't find a way to do this with this approach.

Open the file in append mode instead:
resultFile = open("c:\\temp\\output4.csv", 'ab')
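The 'ab' mode suits Python 2. On Python 3 the same idea would use text mode 'a' with newline='', as in this small sketch (append_row is a name made up for illustration):

```python
import csv


def append_row(path, row):
    """Append one row to the CSV file at `path`, creating the file if needed."""
    with open(path, "a", newline="") as result_file:
        csv.writer(result_file, dialect="excel").writerow(row)
```

Calling append_row("c:\\temp\\output4.csv", ['aa', 'bb', 'cc']) adds a single row with three cells. This uses writerow rather than writerows because writerows expects a list of rows, and given a flat list of strings it treats each string as a row and each character as a cell.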
