How to load multiple csv files to an iterable variable in Python?

I want to open a variable number of CSV files, then iterate over the opened files and upload one row of each file at a time to my SQL database.
For example, loop through each file uploading the first row of each file to the database, then loop again through each file uploading the second row of each file.
However, I'm stuck on getting the CSV files ready to be uploaded in a single object.
The error happens at 'csv_data[i] = csv.reader...'
Each file is for a different table, so I cannot append them.
    import csv
    import sys
    i = 0
    for argv in sys.argv[1:]:
        csv_file = open(argv, newline='', encoding='utf-8-sig')
        csv_data[i] = csv.reader(csv_file, dialect='excel', delimiter=',', quotechar='|')
        csv_file.close()
        i += 1
After this code, I would need something to loop through each file uploading a certain row number.

Zip the readers together and iterate through them:
    import csv
    import sys

    file_handles = [open(path, newline='', encoding='utf-8-sig') for path in sys.argv[1:]]
    readers = (csv.reader(handle, dialect='excel', delimiter=',', quotechar='|') for handle in file_handles)
    # zip here
    for line_group in zip(*readers):
        # line_group is a tuple of line i of each file
        print(line_group)  # placeholder: upload these rows to the database here
    # don't forget to close your files
    for file_handle in file_handles:
        try:
            file_handle.close()
        except OSError:
            print("Issue closing one of the files")
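One caveat with plain zip is that it stops as soon as the shortest file runs out of rows. The following is a minimal sketch of a variant, assuming you want to keep going until the longest file is exhausted; the function name iter_row_groups and the fillvalue parameter are hypothetical, not part of the original answer. It uses contextlib.ExitStack so the files are closed even if an error occurs midway.

```python
import csv
from contextlib import ExitStack
from itertools import zip_longest


def iter_row_groups(paths, fillvalue=None):
    """Yield tuples holding row i of every file; exhausted files yield fillvalue."""
    with ExitStack() as stack:
        # ExitStack closes every opened file when the generator finishes or fails.
        handles = [
            stack.enter_context(open(path, newline='', encoding='utf-8-sig'))
            for path in paths
        ]
        readers = [
            csv.reader(handle, dialect='excel', delimiter=',', quotechar='|')
            for handle in handles
        ]
        # zip_longest keeps going until the longest file is exhausted.
        yield from zip_longest(*readers, fillvalue=fillvalue)
```

It could be used as `for line_group in iter_row_groups(sys.argv[1:]):`, where each element of line_group is the row destined for one table, or None once that file has no more rows.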

Related

Python [WinError 32] with shutil.move(source, destination) during the move (writes most of the file, but not all)

I'm trying to add a reference to an image file to a downloaded csv filled with product names and SKUs.
I downloaded a csv file with 1,055 lines (products), and I have 1,635 gallery images that should be tied to those products. So after the loop I should have a temp csv file with 1,635 lines, where products with multiple images are listed as duplicates.
    base_csv = 'products_gallery.csv'
    temp_csv = NamedTemporaryFile(mode='w', delete=False)
    gallery_path = 'path_to_images/compressed_gallery'
    gallery_imgs = os.listdir(gallery_path)
    fields = ['name','sku','product_id','gallery_image']
    #update all rows to contain the images.
    with open(base_csv, 'r', encoding='UTF-8') as csvfile:
        reader = csv.DictReader(csvfile, fieldnames=fields, delimiter='\t') #read downloaded csv file
        writer = csv.DictWriter(temp_csv, fieldnames=fields, delimiter='\t') #write to temp csv file
        for row in reader:
            for img_name in gallery_imgs: #loop through local images and match product names
                if row['name'] in img_name:
                    row['gallery_image'] = img_name
                    writer.writerow(row)
                else:
                    continue
    time.sleep(10) #doesn't help
    shutil.move(temp_csv.name, base_csv)
The code crashes on the last line above with
PermissionError: [WinError 32] The process cannot access the file
because it is being used by another process:
'C:\path_to_user\AppData\Local\Temp\tmpgu67y173'
Yet somehow it manages to write/update my main csv file to 1,569 lines, missing only the last 39 products. The temp csv file that's created has all 1,635 lines. I don't know why this would work partially instead of not at all. I've used this exact code before to modify other csv files that I've created without any issues.
I've tried:
clearing out Python processes from the Task Manager
rebooting to clear out processes
adding time.sleep(10) before the shutil.move() in case it needs time to close the file
It seems I forgot to include the temp file in my with statement, so the temp file was still open when shutil.move() ran.
with open(base_csv, 'r', encoding='UTF-8') as csvfile, temp_csv:
Changing this line here resolved the issue.
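The pattern can be sketched as a small helper. This is only an illustration under assumptions: the name rewrite_rows and its transform parameter are hypothetical, and it uses plain csv.reader/csv.writer rather than the DictReader setup from the question. The point it shows is that the temp file is closed, and therefore flushed, before shutil.move runs. A probable explanation for the partial result in the question is that the last buffered rows had not yet been written to disk when the still-open file was copied.

```python
import csv
import os
import shutil
from tempfile import NamedTemporaryFile


def rewrite_rows(base_csv, transform, delimiter='\t'):
    """Write transform(row) for every row to a temp file, then replace base_csv."""
    temp_csv = NamedTemporaryFile(mode='w', delete=False, newline='',
                                  encoding='UTF-8')
    # Both files are context managers, so both are closed when the block exits.
    with open(base_csv, 'r', newline='', encoding='UTF-8') as csvfile, temp_csv:
        reader = csv.reader(csvfile, delimiter=delimiter)
        writer = csv.writer(temp_csv, delimiter=delimiter)
        for row in reader:
            writer.writerow(transform(row))
    # Only move once the temp file is closed and its buffer has been flushed.
    shutil.move(temp_csv.name, base_csv)
```

For example, `rewrite_rows('products_gallery.csv', lambda row: row + ['image.jpg'])` would append one column to every row; the file name and the appended value here are placeholders.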

Adding csv filename to a column in python (200 files)

I have 200 files with dates in the file name. I would like to add the date from each file name as a new column in that file.
I created a script in Python:
    import pandas as pd
    import os
    import openpyxl
    import csv
    os.chdir(r'\\\\\\\')
    for file_name in os.listdir(r'\\\\\\'):
        with open(file_name,'r') as csvinput:
            reader = csv.reader(csvinput)
            all = []
            row = next(reader)
            row.append('FileName')
            all.append(row)
            for row in reader:
                row.append(file_name)
                all.append(row)
    with open(file_name, 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        writer.writerows(all)
    if file_name.endswith('.csv'):
        workbook = openpyxl.load_workbook(file_name)
        workbook.save(file_name)
    csv_filename = pd.read_csv(r'\\\\\\')
    csv_data= pd.read_csv(csv_filename, header = 0)
    csv_data['filename'] = csv_filename
Right now I see "InvalidFileException: File is not a zip file" and only the first file has the added column with the file name.
Can you please advise what I am doing wrong? BTW, I'm using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
    with open(file_name, 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        writer.writerows(all)
should be indented so that it is inside the for loop. As written, it is executed only once, after the loop has finished. This is why you only get one output file.
Second problem, the exception is probably caused by openpyxl.load_workbook(file_name). Presumably openpyxl can only open actual Excel files (which are .zip files with a different extension), not CSV files. Why do you want to open and save it at all? I think you can just remove those three lines.
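Putting both fixes together, a minimal sketch could look like the following. The function name add_filename_column and its parameters are hypothetical, and it assumes every CSV file has a header row. The write step sits inside the loop, so it runs once per file, and openpyxl is not involved at all.

```python
import csv
import os


def add_filename_column(directory, column_name='FileName'):
    """Append the file's own name as a last column to every .csv in directory."""
    for file_name in sorted(os.listdir(directory)):
        if not file_name.endswith('.csv'):
            continue  # skip anything that is not a CSV file
        path = os.path.join(directory, file_name)
        with open(path, 'r', newline='') as csvinput:
            rows = list(csv.reader(csvinput))
        if not rows:
            continue  # leave empty files untouched
        rows[0].append(column_name)  # header row gets the column title
        for row in rows[1:]:
            row.append(file_name)  # data rows get the file name
        # The write happens inside the loop, once per file.
        with open(path, 'w', newline='') as csvoutput:
            csv.writer(csvoutput, lineterminator='\n').writerows(rows)
```

If only the date part of the file name is wanted, it would have to be extracted from file_name before appending; how to do that depends on the naming scheme, which the question does not show.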

Why is my created file blank?

I'm having trouble storing data minus the header into a new file. I don't understand Python enough to debug.
Ultimately, I'd like to extract the data from each file and store it in one main csv file, rather than opening each file individually and copying and pasting everything into the main csv file.
My code is as follows:
    import csv, os
    # os.makedirs() command will create a folder titled in green or in apostrophes
    os.makedirs('HeaderRemoved', exist_ok=True)
    # Loop through every file in the current working directory.
    for csvFilename in os.listdir('directory'):
        if not csvFilename.endswith('.csv'):
            continue #skips non-csv files
        print('Removing header from ' + csvFilename + '...')
        ### Read the CSV file in (skipping first Row)###
        csvRows = []
        csvFileObj = open(csvFilename)
        readerObj = csv.reader(csvFileObj)
        for row in readerObj:
            if readerObj.line_num == 1:
                continue # skips first row
            csvRows.append(row)
        print (csvRows) #----------->Check to see if it has anything stored in array
        csvFileObj.close()
        #Todo: Write out the CSV file
        csvFileObj = open(os.path.join('HeaderRemoved', 'directory/mainfile.csv'), 'w',
                          newline='')
        csvWriter = csv.writer(csvFileObj)
        for row in csvRows:
            csvWriter.writerow(row)
        csvFileObj.close()
The csv files that are being "scanned" or "read" have text and numbers. I do not know if this might be preventing the script from properly "reading" and storing the data into the csvRow array.
The problem comes from you reusing the same variable when you loop over your file names. See the documentation for listdir: it returns a list of filenames. After the loop starts, newfile no longer points to the open file but to a filename string from the directory.
https://docs.python.org/3/library/os.html#os.listdir
    with open(scancsvFile, 'w') as newfile:
        array = []
        #for row in scancsvFile
        for newfile in os.listdir('directory'): # <---- you're reassigning the variable newfile here
            if newfile.line_num == 1:
                continue
            array.append(lines)
        newfile.close()
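For the stated goal of collecting the data rows of every file into one main CSV file, a minimal sketch could look like this. The function name merge_without_headers and its parameters are hypothetical, and it assumes every input file has exactly one header row. It keeps the output handle and the loop variable under separate names, and joins each name returned by os.listdir with the directory before opening it.

```python
import csv
import os


def merge_without_headers(source_dir, output_path):
    """Copy every data row (header skipped) from each .csv in source_dir."""
    with open(output_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for name in sorted(os.listdir(source_dir)):
            if not name.endswith('.csv'):
                continue  # skip non-CSV files
            # os.listdir returns bare names, so join them with the directory.
            with open(os.path.join(source_dir, name), newline='') as in_file:
                reader = csv.reader(in_file)
                next(reader, None)  # discard the header row, if any
                writer.writerows(reader)
```

The output file should live outside source_dir; otherwise the loop would pick up the main file itself as one of its inputs.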

add file name without file path to csv in python

I am using Blair's Python script, which modifies a CSV file to add the filename as the last column (script appended below). However, instead of getting the file name alone, I get the full path and file name in the last column.
I run the script below in the Windows 7 cmd with the following command:
python C:\data\set1\subseta\add_filename.py C:\data\set1\subseta\20100815.csv
The resulting ID field is populated with C:\data\set1\subseta\20100815.csv, although all I need is 20100815.csv.
I'm new to Python, so any suggestion is appreciated!
    import csv
    import sys
    def process_file(filename):
        # Read the contents of the file into a list of lines.
        f = open(filename, 'r')
        contents = f.readlines()
        f.close()
        # Use a CSV reader to parse the contents.
        reader = csv.reader(contents)
        # Open the output and create a CSV writer for it.
        f = open(filename, 'wb')
        writer = csv.writer(f)
        # Process the header.
        header = reader.next()
        header.append('ID')
        writer.writerow(header)
        # Process each row of the body.
        for row in reader:
            row.append(filename)
            writer.writerow(row)
        # Close the file and we're done.
        f.close()
    # Run the function on all command-line arguments. Note that this does no
    # checking for things such as file existence or permissions.
    map(process_file, sys.argv[1:])
Use os.path.basename(filename). See http://docs.python.org/library/os.path.html for more details.
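As a small illustration of what basename returns for the path in the question: the snippet below uses ntpath, the Windows flavour of os.path, so that it gives the same result on any platform. On Windows itself, os.path.basename behaves the same way.

```python
import ntpath

full_path = r'C:\data\set1\subseta\20100815.csv'

# On Windows, os.path is ntpath, so os.path.basename(full_path) gives the same
# result; ntpath is used directly here so the example behaves the same anywhere.
file_name = ntpath.basename(full_path)
print(file_name)  # 20100815.csv
```

In the script above, that would mean adding import os and changing row.append(filename) to row.append(os.path.basename(filename)).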

Appending to a csv file in Python

Hi, I have a csv file with names and surnames, and empty username and password columns.
How can I use Python's csv module to write to columns 3 and 4 in each row, just appending to it, not overwriting anything?
The csv module doesn't do that. You'd have to write the rows out to a separate file and then overwrite the old file with the new one, or read the whole file into memory and then write over it.
I'd recommend the first option:
    from csv import writer as csvwriter, reader as csvreader
    from os import rename # add ', remove' on Windows
    with open(infilename, newline='') as infile:
        csvr = csvreader(infile)
        # Python 3: text mode with newline=''; on Python 2 use 'wb' instead
        with open(outfilename, 'w', newline='') as outfile:
            csvw = csvwriter(outfile)
            for row in csvr:
                # do whatever to get the username / password
                # for this row here
                row.append(username)
                row.append(password)
                csvw.writerow(row)
                # or 'csvw.writerow(row + [username, password])' if you want one line
    # only on Windows
    # remove(infilename)
    rename(outfilename, infilename)
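On Python 3.3 and later, the same write-then-swap approach can be sketched with os.replace, which overwrites an existing destination on Windows as well, so the separate remove step is not needed. The function name append_columns, the values_for_row callback, and the '.tmp' suffix are all hypothetical choices for illustration.

```python
import csv
import os


def append_columns(infilename, values_for_row):
    """Rewrite infilename with the values from values_for_row(row) appended."""
    outfilename = infilename + '.tmp'  # hypothetical temporary name
    with open(infilename, newline='') as infile, \
            open(outfilename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            writer.writerow(row + list(values_for_row(row)))
    # os.replace overwrites the destination on Windows as well as on POSIX.
    os.replace(outfilename, infilename)
```

A call such as `append_columns('users.csv', lambda row: (row[0].lower(), 'changeme'))` would append a username and password to every row; the file name and the generated values are placeholders.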
