I'm downloading files from a site and I need to save the original file, then open it and then add the url that the file was downloaded from and the date of the download to the file before saving the file to a different directory.
I've used this answer to amend the csv: how to Add New column in beginning of CSV file by Python
but I'm struggling to redirect the file to a different directory before the write() function is called.
Is the best answer to write the file and then move it, or is there a way to write the file to a different directory within the open() function?
if fileName in fileList:
    print "already got file "+ fileName
else:
    # download the file
    urllib.urlretrieve(csvUrl, os.path.basename(fileName))
    #print "Saving to 1_Downloaded "+ fileName
    # open the file and then add the extra columns
    with open(fileName, 'rb') as inf, open("out_"+fileName, 'wb') as outf:
        csvreader = csv.DictReader(inf)
        # add column names to beginning
        fieldnames = ['url_source','downloaded_at'] + csvreader.fieldnames
        csvwriter = csv.DictWriter(outf, fieldnames)
        csvwriter.writeheader()
        for node, row in enumerate(csvreader, 1):
            csvwriter.writerow(dict(row, url_source=csvUrl, downloaded_at=today))
I believe both would work.
To me it seems the neatest way to do it would be to append to the file and relocate it afterwards.
Have a look at:
shutil.move
I believe rewriting the entire file would be less efficient.
It's not necessary to rebuild the file. Try using the time module to create a timestamp string for your file name, and os.rename to move your file.
Example - this just moves the file to your specified location:
os.rename('filename.csv','NEW_dir/filename.csv')
Hope this helps.
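For the other option raised in the question (writing straight into a different directory from open()), here is a minimal Python 3 sketch. It assumes open() is simply given a full path built with os.path.join; the function name annotate_csv and its parameters are made up for illustration and are not from the original code.

```python
import csv
import os


def annotate_csv(src_path, out_dir, url, downloaded_at):
    """Write a copy of src_path into out_dir with two extra leading columns."""
    # Hypothetical helper: out_dir should differ from the source directory,
    # otherwise the input would be overwritten while it is being read.
    os.makedirs(out_dir, exist_ok=True)  # create the target directory if missing
    out_path = os.path.join(out_dir, os.path.basename(src_path))
    with open(src_path, newline="") as inf, open(out_path, "w", newline="") as outf:
        reader = csv.DictReader(inf)
        fieldnames = ["url_source", "downloaded_at"] + list(reader.fieldnames or [])
        writer = csv.DictWriter(outf, fieldnames)
        writer.writeheader()
        for row in reader:
            # dict(row, ...) copies the row and adds the two new values
            writer.writerow(dict(row, url_source=url, downloaded_at=downloaded_at))
    return out_path
```

With this shape the original download stays where it was saved and the annotated copy is created directly in the second directory, so no separate move step is needed.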
Went with an additional routine using shutil in the end:
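# move and rename the 'out_' files to the right dir
source = os.listdir(downloaded)
for files in source:
    if files.startswith('out_'):
        newName = files.replace('out_', '', 1)  # strip only the leading prefix
        newPath = os.path.join(renamed, newName)
        # os.listdir returns bare names, so join them with the source directory
        shutil.move(os.path.join(downloaded, files), newPath)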
Related
I am processing files in a directory and after processing a file I want to save it using the original name but also add xx to the file name. My purpose is to identify which files have been processed.
Basic suggestions as to how to proceed are appreciated
If the only purpose is to flag the file so you know which files have been processed, I would try another strategy (adding file metadata or something similar). But from your question, I infer the only thing you need is to rename the file after it has been processed. You can use os.rename:
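import os
filename = "example.txt"
flag_suffix = ".xx"
# "rb+" opens for reading and writing without truncating; "wb+" would empty the file
with open(filename, "rb+") as f:
    # process file
    ...
# rename only after the file has been closed
os.rename(filename, f"{filename}{flag_suffix}")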
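If the marker should go before the extension rather than after it (example_xx.txt instead of example.txt.xx), a small pathlib sketch could look like this. The helper name mark_processed and the "_xx" tag are assumptions chosen for illustration.

```python
from pathlib import Path


def mark_processed(path, tag="_xx"):
    """Rename path so that tag sits between the stem and the extension."""
    path = Path(path)
    # e.g. example.txt -> example_xx.txt (hypothetical naming scheme)
    target = path.with_name(path.stem + tag + path.suffix)
    path.rename(target)
    return target
```

Calling mark_processed("example.txt") after processing would leave example_xx.txt in the same directory, which keeps the extension intact so the file still opens with the usual program.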
My first post on StackOverflow, so please be nice. In other words, a super beginner to Python.
So I want to read multiple files from a folder, divide the text and save the output as a new file. I currently have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to use this code on multiple text files in a folder and save it as "output" + a number, for each file in the folder. Is this something that's doable?
with open("file_path") as fReader:
    corpus = fReader.read()
    loc = corpus.find("\n\n")
    print(corpus[:loc], file=open("output.txt","a"))
Possibly work with a list, like:
from pathlib import Path
source_dir = Path("./") # path to the directory
files = [x for x in source_dir.iterdir() if x.is_file()]
for i in range(len(files)):
    file = Path(files[i])
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        fOut.write(corpus[:loc])
** Sorry for the multiple edits.
Welcome to the site. Yes, what you are asking above is completely doable and you are on the right track. You will need to do a little research/practice with the os module, which is highly useful when working with files. The two commands that you will want to research a bit are:
os.path.join()
os.listdir()
I would suggest you put two folders in the same directory as your Python file, one called data and the other called output to catch the results. Start by seeing if you can make the code list all the files in your data directory, and then keep building on that loop. Something like this should list all the files:
# folder file lister/test writer
import os
source_folder_name = 'data' # the folder to be read that is in the SAME directory as this file
output_folder_name = 'output' # will be used later...
files = os.listdir(source_folder_name)
# get this working first
for f in files:
    print(f)
# make output file names and just write a 1-liner into each file...
for f in files:
    output_filename = f.split('.')[0] # the part before the period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
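Putting the listing loop together with the splitting code from the question, one possible sketch is below. The function name write_first_paragraphs and the output1.txt, output2.txt, ... naming are assumptions for illustration; unlike the question's snippet, it keeps the whole text when a file contains no blank line.

```python
import os


def write_first_paragraphs(source_dir, output_dir):
    """Save the text before the first blank line of each file as outputN.txt."""
    os.makedirs(output_dir, exist_ok=True)  # make sure the output folder exists
    written = []
    number = 0
    for name in sorted(os.listdir(source_dir)):
        in_path = os.path.join(source_dir, name)
        if not os.path.isfile(in_path):
            continue  # skip sub-folders
        number += 1
        with open(in_path) as reader:
            corpus = reader.read()
        loc = corpus.find("\n\n")
        # find() returns -1 when there is no blank line; keep everything then
        first_part = corpus if loc == -1 else corpus[:loc]
        out_path = os.path.join(output_dir, "output{}.txt".format(number))
        with open(out_path, "w") as writer:
            writer.write(first_part)
        written.append(out_path)
    return written
```

Something like write_first_paragraphs('data', 'output') would then process every file in data in one go.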
Sorry for not being so clear. I want to read CSV files in a for loop. Each file is then processed with some calculations, after which I want to read the next file and do the same. Instead of manually changing the file names, how can I do this with a loop?
My code below is not working; the way I pass the file names to pd.read_csv is wrong. How can I solve this?
filenumber=0
for files in range(4):
    filenames=["file1","file2",
               "file3","file4"]
    os.chdir(r"/folder")
    results=pd.read_csv('files[filenumber].csv',sep=',',header=0, index_col=None)
    #dosomething with the file and move than to the next file
    filenumber=+1
I guess you are looking for this:
filenames=["file1","file2","file3","file4"]
for i in range(len(filenames)):
    filename = filenames[i]+'.csv'
    results=pd.read_csv(filename,sep=',',header=0, index_col=None)
    # Now do whatever operations you want
Since you have a pattern in your file names, another way would be:
for i in range(4):
    filename = 'file'+str(i+1)+'.csv'
    results=pd.read_csv(filename,sep=',',header=0, index_col=None)
    # Now do whatever operations you want
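If the file names should not be hard-coded at all, the glob module can discover them. The sketch below is only an illustration: process_csv_files, its handle parameter and the "file*.csv" pattern are assumed names, and handle stands for whatever you do with each path (for example pd.read_csv).

```python
import glob
import os


def process_csv_files(folder, handle, pattern="file*.csv"):
    """Call handle(path) for every matching file and collect the results."""
    results = {}
    # sorted() gives a stable order; note it is alphabetical, so
    # file10.csv would come before file2.csv
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        results[os.path.basename(path)] = handle(path)
    return results
```

For example, process_csv_files("/folder", pd.read_csv) would return a dict mapping each file name to its DataFrame, without calling os.chdir.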
You can walk a whole directory tree automatically:
import csv
import os
for root, dirs, files in os.walk(".\\your_directory_to_start\\"):
    # for each directory in the tree...
    for file in files:
        # for each file in that directory
        if file.endswith(".csv"):
            # if the file is a CSV
            ruta_completa = os.path.join(root, file)
            # store the full path to the file in a variable
            print(ruta_completa)
            # show the file name with its location
            with open(ruta_completa, newline="") as mi_archivo:
                # open the file (it is closed automatically after this block)
                mi_csv = csv.reader(mi_archivo)
                # parse the data from the file
                mis_datos = list(mi_csv)
                # convert the data into a list of rows
            print(mis_datos)
            # show all the data on screen
            print(mis_datos[0])
            # show the first row
            print(mis_datos[0][0])
            # show the first cell of the first row
            # do whatever you want... even create a new xlsx file or csv file
I have a list of pathnames to CSV files. I need to open each CSV file, take the data without the header and then merge it all together into a new CSV file.
I have this code which gets me the list of CSV file pathnames:
file_list = []
for folder_name, sub_folders, file_names in os.walk(wd):
    for file_name in file_names:
        file_extention = folder_name + '\\' + file_name
        if file_name.endswith('csv'):
            file_list.append(file_extention)
An example of my list is:
['C:\\Users\\Documents\\GPS_data\\West_coast\\Westland\\GPS_data1.csv',
'C:\\Users\\Documents\\GPS_data\\West_coast\\Westland\\GPS_data2.csv',
'C:\\Users\\Documents\\GPS_data\\West_coast\\Westland\\GPS_data3.csv']
I am struggling to figure out what to do, any help would be greatly appreciated. Thanks.
The main idea is to read each line of a file and write it to the new file, skipping the first line, which holds the column headers. I previously recommended the csv module, but it doesn't seem necessary here, since this task does not require analyzing the data.
file_list = ['data1.csv','data2.csv']
with open('new.csv', 'w') as newfile: # create a new file
    for filename in file_list:
        with open(filename) as csvfile:
            next(csvfile) # skip the header row
            for row in csvfile:
                newfile.write(row) # write to the new csv file
Edit: clarified my answer.
I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files in a folder into one CSV. However, I don't need a header from every file in the combined file.
Is there a way to use this and have it write everything but the first row (the header) of each file?
Use Python's csv module.
Or something like this:
for file_name in file_list:
    file_obj = open(file_name)
    next(file_obj, None)  # skip the header row
    for line in file_obj:  # stream the remaining lines one at a time
        f.write(line)
    file_obj.close()
This solution doesn't load the whole file into memory, whereas file_obj.readlines() loads the whole file content into memory.
Note that it isn't good practice to name variables after builtins.
for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on stackoverflow.
If it's a CSV file, look into Python's csv library.
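For reference, a rough sketch of what the csv-module route could look like is below. The function name merge_csv_bodies and the keep_first_header option are assumptions added for illustration. Compared with skipping the first text line, parsing with csv.reader also copes with a header that spans several physical lines because of quoted newlines.

```python
import csv


def merge_csv_bodies(paths, out_path, keep_first_header=False):
    """Append the data rows of every CSV in paths to out_path, dropping headers."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for index, path in enumerate(paths):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)  # None when the file is empty
                if header is not None and keep_first_header and index == 0:
                    writer.writerow(header)  # optionally keep one header row
                writer.writerows(reader)  # copy the remaining rows
```

Called as merge_csv_bodies(file_list, 'new.csv'), this writes only data rows; passing keep_first_header=True keeps the header of the first file once at the top.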