I have a script which does the following: If there is a file that ends with 'Kobo.xlsx' in the same directory as the script, it reads the file, makes some changes to it, defines a 'new_filename' from the name of the file, and spits out a new .xlsx with the new filename. Here is the code:
## Get path of Script ##
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
files = os.listdir(dname)
for file in files:
    if file.endswith('Kobo.xlsx'):
        ## Define New Filename ##
        filename = os.path.basename(file)
        size = len(filename)
        new_filename = filename[:size - 9] + "Sorted.xlsx"
        ## Pandas Things ... ##
    #### Exporting ####
    writer = pd.ExcelWriter(new_filename, engine='xlsxwriter')
    writer.save()
Sometimes when I run the script, I get the following error:
writer = pd.ExcelWriter(new_filename, engine='xlsxwriter')
NameError: name 'new_filename' is not defined
Sometimes when I run the script, I don't. The error only comes up for certain filenames (the directory doesn't matter). However, if I build the new filename by hand at the command prompt, it works in both cases (for the filenames the script handled and the ones it didn't).
How do I avoid the error for good? Thanks in advance - sorry if it's a silly question.
You don't want to do the write unless you actually did the read, so those last two lines need to be indented one more level to be part of the if statement. As written, the export runs for every file in the directory: if the first file the loop sees does not end in 'Kobo.xlsx', the write executes before the variable new_filename has ever been created, hence the NameError.
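As a sketch, the suffix arithmetic the script relies on (the literal 'Kobo.xlsx' is 9 characters, which is where the size - 9 slice comes from) can be isolated in a small helper that returns None for non-matching names, making the skip explicit. The function name and sample filenames here are just illustrations:

```python
def sorted_name(filename):
    # 'Kobo.xlsx' is 9 characters, hence the size - 9 slice in the question
    suffix = 'Kobo.xlsx'
    if not filename.endswith(suffix):
        return None  # no match: the caller should skip this file entirely
    return filename[:-len(suffix)] + 'Sorted.xlsx'

print(sorted_name('SalesKobo.xlsx'))  # SalesSorted.xlsx
print(sorted_name('notes.xlsx'))      # None
```

The ExcelWriter call then only runs when the helper returns a name, which is exactly the indentation fix described above.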
Related
This is likely a fundamental Python question, but I'm stumped (still learning). My script uses Pandas to create txt files from csv cells, and works properly. However, I'd like to write the files to a specific directory, listed as save_path below. However, my efforts to put this together keep running into errors.
Here's my (not) working code:
import os
import pandas as pd

save_path = "C:\users\name\folder\txts"
df = pd.read_csv("C:\users\name\folder\test.csv", sep=",")
df2 = df.fillna('')

for index in range(len(df)):
    with open(df2["text_number"][index] + '.txt', 'w') as output:
        output2 = os.path.join(save_path, output)  # I'm uncertain how to structure or place the os.path.join command.
        output2.write(df2["text"][index])
The resulting error is below:
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'TextIOWrapper'
Thoughts? Any assistance is greatly appreciated.
You need to generate the file name first, and only then open that path in write mode to put the contents in it.
for index in range(len(df)):
    # create the file name
    filename = df2["text_number"][index] + '.txt'
    # then generate the full path using the os lib
    full_path = os.path.join(save_path, filename)
    # now open that file; don't forget 'w+' to create the file if it doesn't exist
    with open(full_path, 'w+') as output_file_handler:
        # and write the contents
        output_file_handler.write(df2["text"][index])
This should work.
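For reference, the TypeError in the question comes from handing os.path.join the open file object instead of a name string; os.path.join only accepts str, bytes, or path-like arguments. A minimal reproduction (the file name is just an example):

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as handle:
    try:
        os.path.join('some_folder', handle)  # a file object, not a str
    except TypeError as exc:
        print(type(exc).__name__)  # TypeError
```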
(But you might want to check out this answer)
for index in range(len(df)):
    filename = df2["text_number"][index] + '.txt'
    fp = os.path.join(save_path, filename)
    with open(fp, 'w') as output:
        output.write(df2["text"][index])
My first post on StackOverflow, so please be nice. In other words, a super beginner to Python.
So I want to read multiple files from a folder, divide the text and save the output as a new file. I currently have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to use this code on multiple text files in a folder and save it as "output" + a number, for each file in the folder. Is this something that's doable?
with open("file_path") as fReader:
    corpus = fReader.read()
    loc = corpus.find("\n\n")
    print(corpus[:loc], file=open("output.txt", "a"))
Possibly work with a list, like:
from pathlib import Path

source_dir = Path("./")  # path to the directory
files = [x for x in source_dir.iterdir() if x.is_file()]
for i in range(len(files)):
    file = Path(files[i])
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        fOut.write(corpus[:loc])
Welcome to the site. Yes, what you are asking above is completely doable, and you are on the right track. You will need to do a little research/practice with the os module, which is highly useful when working with files. The two commands that you will want to research a bit are:
os.path.join()
os.listdir()
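A quick illustration of what those two calls do (the folder and file names below are just examples, created in a temporary directory for the demo):

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # stand-in for your 'data' folder
open(os.path.join(folder, 'a.txt'), 'w').close()
open(os.path.join(folder, 'b.txt'), 'w').close()

# os.path.join glues path pieces together with the right separator for the OS
print(os.path.join('data', 'a.txt'))

# os.listdir returns the names of everything inside a directory
print(sorted(os.listdir(folder)))  # ['a.txt', 'b.txt']
```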
I would suggest you put two folders alongside your python file: one called data and one called output to catch the results. Start by seeing if you can make the code list all the files in your data directory, and just keep building up that loop. Something like this should list all the files:
# folder file lister/test writer
import os

source_folder_name = 'data'    # the folder to be read that is in the SAME directory as this file
output_folder_name = 'output'  # will be used later...

files = os.listdir(source_folder_name)

# get this working first
for f in files:
    print(f)

# make output folder names and just write a 1-liner into each file...
for f in files:
    output_filename = f.split('.')[0]  # the part before the period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
I am trying to create a file in the folder I am running my .py script from. This is the code I am using. The problem is that the open function requires / for directories, but new_file_path uses \ instead. This is causing the open call to fail. How do I fix this?
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
new_file_path = str(os.path.join(dir_path, 'mynewfile.txt'))
open(new_file_path, "x")
First of all, as #buran commented, there is no need to use str, the following suffices:
new_file_path = os.path.join(dir_path, 'mynewfile.txt')
There is a distinction between the directory where the script itself lives, given by __file__, and the current working directory (usually the directory from which the script was invoked), given by os.getcwd(). It is not entirely clear from the wording of the question which one was intended; they are often the same, but not in the following case:
C:\Booboo>python3 test\my_script.py
But they are in the following case:
C:\Booboo>python3 my_script.py
But if you are trying to open a file in the current working directory, why bother calling os.getcwd() at all? Opening a file without specifying any directory places it, by definition, in the current working directory:
with open('mynewfile.txt', "x") as f:
    # do something with file f (it will be closed for you automatically when the block terminates)
    ...
Another possible problem is that the "x" (exclusive creation) mode raises FileExistsError if the file already exists. Try "w" instead:
with open(new_file_path, "w") as f:
    # do something with file f (it will be closed for you automatically when the block terminates)
    ...
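To see the difference between the two modes concretely (the path below is a temporary one, just for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mynewfile.txt')

with open(path, 'x') as f:   # "x" creates the file; it must not exist yet
    f.write('hello')

try:
    open(path, 'x')          # a second "x" on the same path refuses
except FileExistsError:
    print('"x" raises FileExistsError on an existing file')

with open(path, 'w') as f:   # "w" truncates and rewrites without complaint
    f.write('hello again')
```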
Have you tried simply:
import os

dir_path = os.getcwd()
# join inserts the path separator; plain dir_path + 'mynewfile.txt' would not
open(os.path.join(dir_path, 'mynewfile.txt'), "x")
You need to use os.getcwd() to get the current working directory:
import os

dir_path = os.getcwd()
new_file_path = os.path.join(dir_path, 'mynewfile.txt')
open(new_file_path, "x")
I'm trying to save dataframes to csv files, but using a sub-directory in the path gives an error: FileNotFoundError: [Errno 2] No such file or directory.
Using an absolute path works. Also, if I run the whole script, then comment out everything except the code that saves the dataframe (without clearing the variables) and run again, the sub-directory path works too.
So the directory operations performed earlier in the same script somehow affect the save.
Here is a very general version of my script:
check = "Data\\"

def my_function(directory):
    (myfunction)
    return allvars

directories = [os.path.abspath(x[0]) for x in os.walk(check)]
directories.remove(os.path.abspath(check))

list_of_df = []
for i in directories:
    try:
        os.chdir(i)
        x = my_function(i)
        list_of_df.append(x)
    except ValueError:
        continue

# specific savepath needed, otherwise error!
savepath = 'MY_ABSOLUTE_PATH\\'
for a, b in enumerate(list_of_df):
    # b.to_csv(savepath + 'dataframe{}.csv'.format(a))  # WORKS
    b.to_csv('Data\\Random\\dataframe{}.csv'.format(a))  # ERROR
to_csv does create the file if it doesn't exist as you said, but it does not create directories that don't exist. Ensure that the subdirectory you are trying to save your file within has been created first.
I often do something like this in my work:
import os

outname = 'name.csv'
outdir = './dir'
if not os.path.exists(outdir):
    os.mkdir(outdir)

fullname = os.path.join(outdir, outname)
df.to_csv(fullname)
This can easily be wrapped up in a function if you need to do this frequently.
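For instance, such a wrapper might look like this (the name save_df is illustrative; anything with a to_csv method works as the df argument):

```python
import os

def save_df(df, outdir, outname):
    # makedirs handles nested paths, and exist_ok=True avoids the
    # check-then-create race that os.path.exists + os.mkdir has
    os.makedirs(outdir, exist_ok=True)
    fullname = os.path.join(outdir, outname)
    df.to_csv(fullname)
    return fullname
```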
I am new at programming and I have written a script to extract text from a vcf file. I am using a Linux virtual machine running Ubuntu. I run the script from the command line by changing into the directory containing the vcf file and entering python script.py.
My script knows which file to process because the beginning of my script is:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list and then writes it to outputfile. However, I have many input files (all .vcf) and want to write each one to a different output file with a name similar to its input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the python script to accommodate this, i.e. writing the list to an output file?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os

# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(output_filename, 'w')
    # Process the data
    ...
To output the resulting files in a separate directory I would:
import glob
import os

output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)

# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(os.path.join(output_dir, output_filename), 'w')
    # Process the data
    ...
You don't need to write a shell script; maybe this question will help you:
How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in python, just do it there;
If you want to implement it in a shell script, change your python script to accept parameters, and then have the shell script call the python script with the appropriate parameters.
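A sketch of that second option: the script takes the input path from sys.argv, so a shell loop such as for f in *.vcf; do python script.py "$f"; done can drive it. The _processed.txt naming and the pass-through body are just examples:

```python
import os
import sys

def process(path):
    # derive an output name next to the input, e.g. sample.vcf -> sample_processed.txt
    out_path = os.path.splitext(path)[0] + '_processed.txt'
    with open(path) as src, open(out_path, 'w') as dst:
        dst.write(src.read())  # the real extraction logic would go here
    return out_path

if __name__ == '__main__' and len(sys.argv) > 1:
    process(sys.argv[1])
```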
I have a script I frequently use which includes using PyQt5 to pop up a window that prompts the user to select a file... then it walks the directory to find all of the files in the directory:
pathname = first_fname[:first_fname.rfind('/') + 1]  # figure out the pathname by finding the last '/'
new_pathname = pathname + 'for release/'  # new pathname so the renamed output files land in another directory
file_list = [f for f in os.listdir(pathname)
             if f.lower().endswith('.xls')
             and 'map' not in f.lower()
             and 'check' not in f.lower()]  # .xls files whose names lack keywords marking the files I don't want
You need to import os to use the os.listdir command.
You can use listdir (you need to write a condition to filter for the particular extension) or glob. I generally prefer glob. For example:
import os
import glob

for file in glob.glob('*.py'):
    data = open(file, 'r+')
    output_name = os.path.splitext(file)[0]
    output = open(output_name + '.txt', 'w')
    output.write(data.read())
This code reads the content of each input file and stores it in the corresponding output file.