Writing filepaths to a new csv using Python?

I'm new to python and trying to create a csv file from a list of files within a folder. Essentially, the code is supposed to write rows based on the files in a folder by parsing the file name to isolate the unique identifier (which here is the name of a city), then writing that identifier to a column, followed by the original file name/file path to the attachment. This bit of code writes the headers, "City" and "Attachment", then stops and prints the error statement that no PDFs are in the folder (there are, in fact, 100 PDF files in the folder).
Here is the code that I'm having some trouble editing:
attachments_folder = "H:/Attachments"
attachments_table = attachments_folder + "\\" + "attachments_Table.csv"
for f in os.listdir(attachments_folder):
    file_name, file_ext = os.path.splitext(f)
    f_file, f_city = file_name.split('_')
    writer = csv.writer(open(attachments_table, "wb"), delimiter=",")
    writer.writerow(["City", "Attachment"])
    if str(f).find(".pdf") > -1:
        writer.writerow([f_city, f])
    else:
        print "Error: no PDF's found"
I apologize in advance that this is likely a clunky and/or yucky bit of code. I was curious if I needed to break out the two things going on here (parsing the file name, then writing lines to the csv rows), but I got syntax errors with this reformatted version:
for f in os.listdir(attachments_folder):
    file_name, file_ext = os.path.splitext(f)
    f_file, f_city = file_name.split('_')
writer = csv.writer(open(attachments_table, "wb"), delimiter=",")
writer.writerow(["City", "Attachment"])
for f in os.listdir(attachments_folder):
    writer.writerow(["City", "Attachment"])
    if str(f).find(".pdf") > -1:
        writer.writerow([f_city, f])
    else:
        print "Error: no PDF's found"
Any guidance on what I'm missing here would be much appreciated! Thanks!

There are a couple of issues going on here:
Most importantly, you're overwriting the file in each step of the loop, thereby deleting whatever may have been written in prior steps.
Secondly, and likely adding to the confusion, your warning isn't telling the truth. One or more PDFs may have been found.
I fixed the first issue by opening the file only once, before the loop. I fixed the second issue by modifying the existing warning and adding another positive status when a PDF is found. Here's a working rewrite of your program (in Python 3):
# In Python 3, newline='' prevents extra blank rows. You don't need this in Python 2.
# You can also leave 'wb' as the file mode in Python 2.
with open(attachments_table, 'w', newline='') as output:
    writer = csv.writer(output, delimiter=",")
    writer.writerow(['City', 'Attachment'])
    for f in os.listdir(attachments_folder):
        file_name, file_ext = os.path.splitext(f)
        f_file, f_city = file_name.split('_')
        if str(f).find('.pdf') > -1:
            print('The current file is a PDF.')
            writer.writerow([f_city, f])
        else:
            print('The current file is not a PDF.')
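A slightly more robust variant of the PDF check (just a sketch, reusing the file_ext value that splitext already returns) would match the extension case-insensitively, so files ending in .PDF are caught too:
        if file_ext.lower() == '.pdf':  # compares only the extension, regardless of case
            print('The current file is a PDF.')
            writer.writerow([f_city, f])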

Related

gzipping files from python is changing the file names

I am trying to gzip files using Python 3. When I gzip the files, the code is changing the filename without me doing anything. I am not sure I totally understand the workings of the gzip module.
Below is the code:
import gzip

dir_in = '/localfolder/new_files/'
dir_out = '/localfolder/zippedfiles/'
file_name = 'transactions_may05'

def gzip_files(dir_in, dir_out, file_name):
    with open(dir_in + file_name, 'rb') as f_in, gzip.open(dir_out + 'unprocessed.' + file_name + '.gz', 'wb') as f_out:
        f_out.writelines(f_in)
Expected Output:
Outer file: unprocessed.transactions_may05.gz
when I double click it, I should get the original file transactions_may05
Current Output:
Outer file: unprocessed.transactions_may05.gz -- As expected
when I double click it, the internal file also has unprocessed. prefixed to it. I am not sure why unprocessed. gets added to the internal file name
Internal File:unprocessed.transactions_may05
Any help would be appreciated. Thank you.
That's the expected behavior of gzip and gunzip.
As mentioned in the manual page:
gunzip takes a list of files on its command line and replaces each
file whose name ends with .gz, -gz, .z, -z, or _z (ignoring case) and
which begins with the correct magic number with an uncompressed
file without the original extension.
If you don't want the name to change, you should not modify the filename when you compress it.
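For example, a minimal sketch of the question's function with the archive name left unchanged (paths are illustrative), so that gunzip restores transactions_may05 rather than unprocessed.transactions_may05:
import gzip
import os

def gzip_files(dir_in, dir_out, file_name):
    # Name the archive "<original>.gz"; gunzip will then recreate "<original>".
    # If you need an "unprocessed" marker, keep it in the directory name or
    # rename the file after decompression instead of changing the archive name.
    with open(os.path.join(dir_in, file_name), 'rb') as f_in, \
         gzip.open(os.path.join(dir_out, file_name + '.gz'), 'wb') as f_out:
        f_out.writelines(f_in)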

Create new files, don't overwrite existing files, in python

I'm writing to a file in three functions and I'm trying not to overwrite the file. I want to generate a new file every time I run the code
with open("atx.csv", 'w')as output:
writer = csv.writer(output)
If you want to write to different files each time you execute the script, you need to change the file names, otherwise they will be overwritten.
import os
import csv

filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
    i += 1

with open(f"{filename}{i}.csv", 'w') as output:
    writer = csv.writer(output)
    writer.writerow([1, 2, 3])  # or whatever you want to write in the file, this line is just an example
Here I use os.path.exists() to check if a file is already present on the disk, and increment the counter.
The first time you run the script you get atx0.csv, the second time atx1.csv, and so on.
Replicate this for your three files.
EDIT
Also note that here I'm using formatted string literals, which are available since Python 3.6. If you have an earlier version of Python, use "{}{:d}.csv".format(filename, i) instead of f"{filename}{i}.csv"
EDIT bis after comments
If the same file needs to be manipulated by several functions during the execution of the script, the easiest thing that came to my mind is to open the writer outside the functions and pass it as an argument.
filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
i += 1
with open(f"{filename}{i}.csv", 'w') as output:
writer = csv.writer(output)
foo(writer, ...) #put the arguments of your function instead of ...
bar(writer, ...)
etc(writer, ...)
This way each time you call one of the functions it writes to the same file, appending the output at the bottom of the file.
Of course there are other ways. You might check for the file name existence only in the first function you call, and in the others just open the file with the 'a' options, to append the output.
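A rough sketch of that second approach, with placeholder function names, might look like this:
import csv
import os

def first_function(basename="atx"):
    # Only the first function looks for an unused name and creates the file with 'w'.
    i = 0
    while os.path.exists(f"{basename}{i}.csv"):
        i += 1
    path = f"{basename}{i}.csv"
    with open(path, 'w', newline='') as output:
        csv.writer(output).writerow(["first", "batch"])
    return path

def another_function(path):
    # The other functions open the same file with 'a', so earlier rows are kept.
    with open(path, 'a', newline='') as output:
        csv.writer(output).writerow(["another", "batch"])

path = first_function()
another_function(path)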
You can do something like this so that each file gets a slightly different name and therefore will not be overwritten:
for v in ['1', '2', '3']:
    with open("atx_{}.csv".format(v), 'w') as output:
        writer = csv.writer(output)
You are using just one filename. When you reuse the same name, atx.csv, you will either overwrite it with 'w' or append to it with 'a'.
If you want new files to be created, just check first if the file is there.
import os

files = os.listdir()
files = [f for f in files if 'atx' in f]
num = str(len(files)) if len(files) > 0 else ''
filename = "atx{0}.csv".format(num)

with open(filename, 'w') as output:
    writer = csv.writer(output)
Change with open("atx.csv", 'w') to with open("atx.csv", 'a')
https://www.guru99.com/reading-and-writing-files-in-python.html#2

Read all the text files in a folder and change a character in a string if it presents

I have a folder with csv-formatted documents with a .arw extension. Files are named 1.arw, 2.arw, 3.arw ... etc.
I would like to write code that reads all the files, checks for a forward slash / and replaces it with a dash -, and finally creates new files with the replaced character.
The code I wrote as follows:
for i in range(1,6):
    my_file=open("/path/"+str(i)+".arw", "r+")
    str=my_file.read()
    if "/" not in str:
        print("There is no forwardslash")
    else:
        str_new = str.replace("/","-")
        print(str_new)
        f = open("/path/new"+str(i)+".arw", "w")
        f.write(str_new)
    my_file.close()
But I get an error saying:
'str' object is not callable.
How can I make it work for all the files in a folder? Apparently my for loop does not work.
The actual error is that you are replacing the built-in str with your own variable with the same name, then try to use the built-in str() after that.
Simply renaming the variable fixes the immediate problem, but you really want to refactor the code to avoid reading the entire file into memory.
import logging
import os

for i in range(1, 6):
    seen_slash = False
    input_filename = "/path/" + str(i) + ".arw"
    output_filename = "/path/new" + str(i) + ".arw"
    with open(input_filename, "r+") as input, open(output_filename, "w") as output:
        for line in input:
            if not seen_slash and "/" in line:
                seen_slash = True
            line_new = line.replace("/", "-")
            print(line_new.rstrip('\n'))  # don't duplicate newline
            output.write(line_new)
    if not seen_slash:
        logging.warning("{0}: No slash found".format(input_filename))
        os.unlink(output_filename)
Using logging instead of print for error messages helps because you keep standard output (the print output) separate from the diagnostics (the logging output). Notice also how the diagnostic message includes the name of the file we found the problem in.
Going back and deleting the output filename when you have examined the entire input file and not found any slashes is a mild wart, but should typically be more efficient.
This is how I would do it:
for i in range(1, 6):
    with open(str(i) + '.arw', 'r') as f:
        data = f.readlines()
    # str.replace returns a new string, so collect the results instead of discarding them
    data = [element.replace('/', '-') for element in data]
    with open(str(i) + '.arw', 'w') as f:
        for element in data:
            f.write(element)
this is assuming from your post that you know how many files you have (range(1, 6) covers 1.arw through 5.arw)
If you don't know how many files you have, you can use the os module to find the files in the directory.
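For instance, a minimal sketch using os.listdir (the folder path is illustrative):
import os

folder = "/path"
for name in os.listdir(folder):
    # skip anything that isn't a .arw file, and files this script has already produced
    if not name.endswith(".arw") or name.startswith("new"):
        continue
    with open(os.path.join(folder, name)) as f:
        data = f.read()
    with open(os.path.join(folder, "new" + name), "w") as f:
        f.write(data.replace("/", "-"))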

Using variable as part of name of new file in python

I'm fairly new to python and I'm having an issue with my python script (split_fasta.py). Here is an example of my issue:
list = ["1.fasta", "2.fasta", "3.fasta"]
for file in list:
contents = open(file, "r")
for line in contents:
if line[0] == ">":
new_file = open(file + "_chromosome.fasta", "w")
new_file.write(line)
I've left the bottom part of the program out because it's not needed. My issue is that when I run this program in the same directory as my fasta files, it works great:
python split_fasta.py *.fasta
But if I'm in a different directory and I want the program to output the new files (e.g. 1.fasta_chromosome.fasta) to my current directory...it doesn't:
python /home/bin/split_fasta.py /home/data/*.fasta
This still creates the new files in the same directory as the fasta files. The issue here I'm sure is with this line:
new_file = open(file + "_chromosome.fasta", "w")
Because if I change it to this:
new_file = open("seq" + "_chromosome.fasta", "w")
It creates an output file in my current directory.
I hope this makes sense to some of you and that I can get some suggestions.
You are giving the full path of the old file, plus a new name. So basically, if file == /home/data/something.fasta, the output file will be file + "_chromosome.fasta" which is /home/data/something.fasta_chromosome.fasta
If you use os.path.basename on file, you will get the name of the file (i.e. in my example, something.fasta)
From #Adam Smith
You can use os.path.splitext to get rid of the .fasta
basename, _ = os.path.splitext(os.path.basename(file))
Getting back to the code example, I saw many things that are not recommended in Python. I'll go into the details.
Avoid shadowing builtin names, such as list, str, int... It is not explicit and can lead to potential issues later.
When opening a file for reading or writing, you should use the with syntax. This is highly recommended since it takes care of closing the file.
with open(filename, "r") as f:
data = f.read()
with open(new_filename, "w") as f:
f.write(data)
If you have an empty line in your file, line[0] == ... will raise an IndexError exception. Use line.startswith(...) instead.
Final code:
import os

files = ["1.fasta", "2.fasta", "3.fasta"]
for file in files:
    with open(file, "r") as input:
        for line in input:
            if line.startswith(">"):
                new_name = os.path.splitext(os.path.basename(file))[0] + "_chromosome.fasta"
                with open(new_name, "w") as output:
                    output.write(line)
Often, people come at me and say "that's ugly". Not really :). The levels of indentation make clear which context is which.

python clear content writing on same file

I am a newbie to Python. I have code in which I must write the contents back to the same file, but when I do, it clears my content. Please help me fix it.
How should I modify my code such that the contents will be written back on the same file?
My code:
import re
numbers = {}
with open('1.txt') as f, open('11.txt', 'w') as f1:
    for line in f:
        row = re.split(r'(\d+)', line.strip())
        words = tuple(row[::2])
        if words not in numbers:
            numbers[words] = [int(n) for n in row[1::2]]
        numbers[words] = [n+1 for n in numbers[words]]
        row[1::2] = map(str, numbers[words])
        indentation = (re.match(r"\s*", line).group())
        print (indentation + ''.join(row))
        f1.write(indentation + ''.join(row) + '\n')
In general, it's a bad idea to write over a file you're still processing (or change a data structure over which you are iterating). It can be done...but it requires much care, and there is little safety or restart-ability should something go wrong in the middle (an error, a power failure, etc.)
A better approach is to write a clean new file, then rename it to the old name. For example:
import re
import os

filename = '1.txt'
tempname = "temp{0}_{1}".format(os.getpid(), filename)
numbers = {}
with open(filename) as f, open(tempname, 'w') as f1:
    # ... file processing as before
os.rename(tempname, filename)
Here I've dropped filenames (both original and temporary) into variables, so they can be easily referred to multiple times or changed. This also prepares for the moment when you hoist this code into a function (as part of a larger program), as opposed to making it the main line of your program.
You don't strictly need the temporary name to embed the process id, but it's a standard way of making sure the temp file is uniquely named (temp32939_1.txt vs temp_1.txt or tempfile.txt, say).
It may also be helpful to create backups of the files as they were before processing. In which case, before the os.rename(tempname, filename) you can drop in code to move the original data to a safer location or a backup name. E.g.:
backupname = filename + ".bak"
os.rename(filename, backupname)
os.rename(tempname, filename)
While beyond the scope of this question, if you used a read-process-overwrite strategy frequently, it would be possible to create a separate module that abstracts these file-handling details away from your processing code.
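A minimal sketch of such a helper, with illustrative names rather than an established API:
import os

def rewrite_in_place(filename, process_line):
    # Apply process_line to each line, writing to a temporary file in the same
    # directory, then rename the temporary file over the original.
    directory = os.path.dirname(filename) or "."
    tempname = os.path.join(directory, "temp{0}_{1}".format(os.getpid(), os.path.basename(filename)))
    with open(filename) as f, open(tempname, 'w') as f1:
        for line in f:
            f1.write(process_line(line))
    os.rename(tempname, filename)

# Example usage:
# rewrite_in_place('1.txt', lambda line: line.replace('old', 'new'))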
Use
open('11.txt', 'a')
to append to the file, instead of 'w', which creates a new file or overwrites an existing one.
If you want to read and modify the file in one pass, use 'r+' mode.
f = open('/path/to/file.txt', 'r+')
content = f.read()
content = content.replace('oldstring', 'newstring')  # for example, change some substring in the whole file
f.seek(0)      # move to the beginning of the file
f.write(content)
f.truncate()   # clear the file content "tail" on disk if the new content is shorter than the old
f.close()
