Converting multiple CSVs to TSVs using Python

I'm trying to convert multiple (5) CSVs to TSVs using Python, but when I run this it only creates one TSV. Can anyone help?
import csv
import sys
import os
import pathlib

print ("Exercise1.csv"), sys.argv[0]
dirname = pathlib.Path('/Users/Amber/Documents')
for file in pathlib.Path().rglob('*.csv'):
    with open(file, 'r') as csvin, open('Exercise1.tsv', 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        for row in csvin:
            print(row)
            tsvout.writerow(row)
exit()
Thanks!

Your for loop opens each .csv file it finds, but you only ever open a single file to write to (Exercise1.tsv), so you overwrite the same output on every iteration. You need to open a new output file in each iteration of the loop. You could try something like this:
for i, file in enumerate(pathlib.Path().rglob('*.csv')):
    with open(file, 'r') as csvin, open('Exercise_{}.tsv'.format(i), 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        # ... then write the rows exactly as in your original inner loop
enumerate() adds a counter to the for loop, so each output file gets a distinct name: Exercise_0.tsv, Exercise_1.tsv, and so on, one file per CSV found in your directory.
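If you would rather keep the original names than number the outputs, here is a minimal sketch along the same lines (assuming each .tsv should sit next to its source .csv):

import csv
import pathlib

# Sketch: name each output after its source file, e.g. Exercise1.csv -> Exercise1.tsv.
for csv_path in pathlib.Path().rglob('*.csv'):
    tsv_path = csv_path.with_suffix('.tsv')
    with open(csv_path, 'r', newline='') as csvin, open(tsv_path, 'w', newline='') as tsvout:
        reader = csv.reader(csvin)
        writer = csv.writer(tsvout, delimiter='\t')
        for row in reader:
            writer.writerow(row)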

Related

Adding csv filename to a column in python (200 files)

I have 200 files with dates in the file names. I would like to add the date from each file name as a new column in that file.
I created a macro in Python:
import pandas as pd
import os
import openpyxl
import csv

os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
    with open(file_name, 'r') as csvinput:
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('FileName')
        all.append(row)
        for row in reader:
            row.append(file_name)
            all.append(row)
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
if file_name.endswith('.csv'):
    workbook = openpyxl.load_workbook(file_name)
    workbook.save(file_name)
csv_filename = pd.read_csv(r'\\\\\\')
csv_data = pd.read_csv(csv_filename, header=0)
csv_data['filename'] = csv_filename
Right now I see "InvalidFileException: File is not a zip file" and only the first file has the added column with the file name.
Can you please advise what I am doing wrong? By the way, I'm using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
should be indented so that it is included in the for loop. As written, it only executes once, after the loop has finished, which is why you only get one output file.
Second problem: the exception is probably caused by openpyxl.load_workbook(file_name). Presumably openpyxl can only open actual Excel files (which are .zip files with a different extension), not CSV files. Why do you want to open and save the file there at all? I think you can just remove those three lines.
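Putting both fixes together, a minimal sketch of the corrected loop, with the write block indented into the for loop and the openpyxl lines removed (using the current directory as a stand-in for the redacted path):

import os
import csv

# Sketch: append a FileName column to every CSV in the directory, rewriting each file in place.
for file_name in os.listdir('.'):
    if not file_name.endswith('.csv'):
        continue
    with open(file_name, 'r') as csvinput:
        reader = csv.reader(csvinput)
        rows = []
        header = next(reader)
        header.append('FileName')
        rows.append(header)
        for row in reader:
            row.append(file_name)
            rows.append(row)
    # The write happens inside the loop, once per file.
    with open(file_name, 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        writer.writerows(rows)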

Printing csv through printer with python

I want to print a CSV file with Python. I have gone through the code below; it works well with a .txt file, but I am unable to print a CSV through it.
import os
import tempfile

filename = tempfile.mktemp(".txt")
open(filename, "w").write("Printing file")
os.startfile(filename, "print")
Actually I want to print a CSV file that has already been created; there should be no need to write a new file and then print it.
Edit: By print I mean a hardcopy printed through the printer.
If you want to print the content of a csv you can try this:
import csv

file_path = 'a.csv'
with open(file_path) as file:
    content = csv.reader(file)
    for row in content:
        print(row)
I was talking about printing the CSV file as a hardcopy from Python code.
import os
import csv
import tempfile
import pandas as pd
from tkinter import messagebox  # assumed: messagebox comes from Tkinter, as in typical GUI projects

def printing():
    # Read from the CSV and write a pipe-separated text version.
    with open("CSV_files//newfile.txt", "w") as my_output_file:
        cs = pd.read_csv("CSV_files\\attendance.csv", header=None, index_col=None)
        with open("CSV_files//attendance.csv", "r") as my_input_file:
            [my_output_file.write(" | ".join(row) + '\n') for row in csv.reader(my_input_file)]
    # Read the text back into a string, since .write() takes a string.
    strnew = ""
    with open('CSV_files//newfile.txt', "r") as f:
        strnew = f.read()
    # For checking.
    with open('CSV_files//print.txt', "w") as f:
        f.write(strnew)
    # Printing: write a temp file and send it to the default printer.
    filename = tempfile.mktemp("attendance.txt")  # creating a temp file
    open(filename, "w").write(strnew)
    os.startfile(filename, "print")
    messagebox.showinfo("Print", "Printing Request sent successfully!")
For more info:
github project link
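If the CSV already exists and only needs to reach the printer, it may be enough to hand it to the Windows shell directly. A minimal sketch, assuming Windows and that the application associated with .csv files supports the 'print' verb:

import os

# Sketch: send an existing CSV straight to the default printer (Windows only).
csv_path = r"CSV_files\attendance.csv"  # path taken from the example above
os.startfile(csv_path, "print")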

How to combine several text files into one file?

I want to combine several text files into one output file.
My original code downloads 100 text files, then each time filters the text for several words and writes the result to the output file.
Here is the part of my code that is supposed to combine the new text with the output text. Each time, the result overwrites the output file, deleting the previous content and adding the new text.
import fileinput
import glob

urls = ['f1.txt', 'f2.txt', 'f3.txt']
N = 0
print "read files"
for url in urls:
    read_files = glob.glob(urls[N])
    with open("result.txt", "wb") as outfile:
        for f in read_files:
            with open(f, "rb") as infile:
                outfile.write(infile.read())
    N += 1
I also tried this:
import fileinput
import glob

urls = ['f1.txt', 'f2.txt', 'f3.txt']
N = 0
print "read files"
for url in urls:
    file_list = glob.glob(urls[N])
    with open('result-1.txt', 'w') as file:
        input_lines = fileinput.input(file_list)
        file.writelines(input_lines)
    N += 1
Are there any suggestions?
I need to concatenate/combine approximately 100 text files into one .txt file, in sequence (each time I read one file and add it to result.txt).
The problem is that you are re-opening the output file on each loop iteration, which causes it to be overwritten unless you explicitly open it in append mode.
The glob logic is also unnecessary when you already know the filenames.
Try this instead:
with open("result.txt", "wb") as outfile:
for url in urls:
with open(url, "rb") as infile:
outfile.write(infile.read())
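If the inputs are large, a small variation streams each file into the output instead of reading it fully into memory; shutil.copyfileobj copies in chunks:

import shutil

urls = ['f1.txt', 'f2.txt', 'f3.txt']

# Sketch: stream each input file into the combined output.
with open("result.txt", "wb") as outfile:
    for url in urls:
        with open(url, "rb") as infile:
            shutil.copyfileobj(infile, outfile)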

Using a list of file names from a CSV file to find and copy files using Python

I need to use a list of file names from a CSV file to find and copy the respective files. Below is the code. I'm not getting any errors, but the following is not yielding any results (I have checked and rechecked the sample list I created with the appropriate files). Any idea where I messed up? I appreciate any and all help. (I'm using the latest version of Python)
import os, shutil, csv

files_to_find = []
with open('C:\\pdfsearch.csv') as fh:
    reader = csv.reader(fh)
    files_to_find = list(reader)

for root, dirs, files in os.walk('C:\\mail'):
    for _file in files:
        if _file in files_to_find:
            print ('Found file in: ') + str(root)
            shutil.copy(os.path.abspath(root + '/' + _file), 'C:\\Matches')
The problem is that csv.reader returns rows, so you end up with a list of lists. Instead of the list you expect, ['1003055716CBR201510.pdf', '1003080516CBR201510.pdf'], you get [['1003055716CBR201510.pdf'], ['1003080516CBR201510.pdf']].
One option is to skip csv.reader entirely:
_files_to_find = set(open("pdfsearch.csv").read().splitlines(False))
Alternatively, take the first element of each row:
with open('C:\\pdfsearch.csv') as fh:
    reader = csv.reader(fh)
    for row in reader:
        _files2find.append(row[0])
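Putting that fix back into the walk-and-copy code from the question, a minimal sketch (assuming one file name per row in pdfsearch.csv):

import os, shutil, csv

# Sketch: build a set of names from the first CSV column, then copy matches while walking C:\mail.
files_to_find = set()
with open('C:\\pdfsearch.csv') as fh:
    for row in csv.reader(fh):
        if row:
            files_to_find.add(row[0])

for root, dirs, files in os.walk('C:\\mail'):
    for _file in files:
        if _file in files_to_find:
            print('Found file in: ' + str(root))
            shutil.copy(os.path.join(root, _file), 'C:\\Matches')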

add file name without file path to csv in python

I am using Blair's Python script, which modifies a CSV file to add the filename as the last column (script appended below). However, instead of adding the file name alone, I also get the path and file name in the last column.
I run the below script in windows 7 cmd with the following command:
python C:\data\set1\subseta\add_filename.py C:\data\set1\subseta\20100815.csv
The resulting ID field is populated with the full path, C:\data\set1\subseta\20100815.csv, although all I need is 20100815.csv.
I'm new to python so any suggestion is appreciated!
import csv
import sys

def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()
    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)
    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)
    # Process the header.
    header = reader.next()
    header.append('ID')
    writer.writerow(header)
    # Process each row of the body.
    for row in reader:
        row.append(filename)
        writer.writerow(row)
    # Close the file and we're done.
    f.close()

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
map(process_file, sys.argv[1:])
Use os.path.basename(filename). See http://docs.python.org/library/os.path.html for more details.
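In the script above that means changing the append line to row.append(os.path.basename(filename)). A quick check of what basename returns, assuming the script runs on Windows as in the question:

import os.path

# basename strips the directory part, leaving only the file name (backslash paths are split on Windows).
filename = r'C:\data\set1\subseta\20100815.csv'
print(os.path.basename(filename))   # prints: 20100815.csv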
