This is my first post on StackOverflow, so please be nice; I'm a complete beginner to Python.
I want to read multiple files from a folder, divide the text, and save each output as a new file. I have figured out the code below, but it only works on one file at a time. I have tried googling but can't figure out how to use this code on multiple text files in a folder and save each result as "output" + a number, one per file. Is this doable?
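with open("file_path") as fReader:
    corpus = fReader.read()
loc = corpus.find("\n\n")  # position of the first blank line
if loc == -1:  # no blank line: keep the whole text instead of dropping the last character
    loc = len(corpus)
with open("output.txt", "a") as fOut:  # closes the output file when done
    print(corpus[:loc], file=fOut)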
You could work with a list of the files, like this:
from pathlib import Path
source_dir = Path("./")  # path to the directory
files = [x for x in source_dir.iterdir() if x.is_file()]  # only files, not sub-folders
for i, file in enumerate(files):
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        fOut.write(corpus[:loc])
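Putting the pieces together, here is a minimal sketch that handles a whole folder. It assumes the inputs are the `.txt` files in one folder and that the results go to a separate folder; the helper names `first_paragraph` and `split_folder` are hypothetical. Unlike the loop above, it keeps the whole text when there is no blank line (`str.find` returns -1 in that case, and slicing with -1 would drop the last character), and it sorts the inputs so the numbering is predictable.

```python
from pathlib import Path


def first_paragraph(text):
    """Return the text up to the first blank line (all of it if there is none)."""
    loc = text.find("\n\n")
    return text if loc == -1 else text[:loc]


def split_folder(source_dir, output_dir):
    """Write output_0.txt, output_1.txt, ... into output_dir, one per input .txt file."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    # sorted() gives a predictable numbering; glob() order is arbitrary
    inputs = sorted(p for p in source_dir.glob("*.txt") if p.is_file())
    outputs = []
    for i, path in enumerate(inputs):
        out_path = output_dir / f"output_{i}.txt"
        out_path.write_text(first_paragraph(path.read_text()))
        outputs.append(out_path)
    return outputs
```

Usage, with hypothetical folder names: `split_folder("data", "output")`. Writing into a separate folder also means a second run will not pick up the previous outputs as inputs.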
Welcome to the site. Yes, what you are asking is completely doable, and you are on the right track. You will need to do a little research and practice with the os module, which is very useful when working with files. The two functions you will want to research are:
os.path.join()
os.listdir()
I would suggest creating two folders in the same directory as your Python file: one called data, and another called output to catch the results. Start by seeing whether you can write code that lists all the files in your data directory, then keep building on that loop. Something like this should list all the files:
# folder file lister/test writer
import os
source_folder_name = 'data'  # folder to read; relative to the current working directory, so run the script from its own folder
output_folder_name = 'output'  # used in the second loop
files = os.listdir(source_folder_name)
# get this working first
for f in files:
    print(f)
# make output file names and just write a one-liner into each file...
os.makedirs(output_folder_name, exist_ok=True)  # create the output folder if it is missing
for f in files:
    output_filename = os.path.splitext(f)[0]  # the part before the last period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
Related
I have multiple folders in a common parent folder, say 'work'. Inside it there are sub-folders named 'sub01', 'sub02', etc. All the sub-folders contain the same files, e.g. mean.txt and sd.txt.
I need to combine the contents of all the 'mean.txt' files into a single file, but I am stuck on how to open the sub-folders one by one. Thanks.
# getting all files as a list
g = open("new_file", "a+")
for files in file_list:  # file_list should hold the paths of all mean.txt files ('list' is a built-in name)
    f = open(files, 'r')
    g.write(f.read())
    f.close()
g.close()
I can't figure out how to get a list of all the files in the sub-folders to make this work.
EDIT:
I found a solution. os.walk() helped, but it did not visit the folders in alphabetical order (the order depends on the file system), so I had to sort the list to put it in order.
import os
p = r"/Users/xxxxx/desktop/bianca_test/"  # main folder
list1 = []
for root, dirs, files in os.walk(p):
    if os.path.basename(root) == 'native_space':  # the sub-folder common to all parent folders
        for file in files:
            if file == "perfusion_calib_gm_mean.txt":
                list1.append(os.path.join(root, file))
list1.sort()  # os.walk() does not visit folders in alphabetical order; sorting fixes that
with open("gm_mean.txt", 'a') as f:
    for item in list1:
        with open(item, 'r') as g:
            f.write(g.read())
        print("writing", item)
Thanks to all who helped.
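As I understand it, you want to collate all the 'mean.txt' files into one file. This should do the job, but beware that there is no guaranteed order in which the files are added, because os.walk() visits folders in arbitrary order. Note also that I'm using StringIO() to buffer all the data: strings are immutable in Python, so repeatedly concatenating them with += builds a new string each time.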
import os
from io import StringIO
def main():
    buffer = StringIO()
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'mean.txt' in filenames:
            fp = os.path.join(dirpath, 'mean.txt')
            with open(fp) as f:
                buffer.write(f.read())
    all_file_contents = buffer.getvalue()
    print(all_file_contents)
if __name__ == '__main__':
    main()
Here's some pseudocode to help you get started. Try to search for, read and understand the solutions; that's how you get better as a programmer:
open mean_combined.txt to write mean.txt contents
open sd_combined.txt to write sd.txt contents
for every subdir inside my_dir:
    for every file inside subdir:
        if file.name is 'mean.txt':
            content = read mean.txt
            write content into mean_combined.txt
        if file.name is 'sd.txt':
            content = read sd.txt
            write content into sd_combined.txt
close mean_combined.txt
close sd_combined.txt
You need to look up how to:
open a file to read its contents (hint: use open)
iterate over the files inside a directory (hint: use pathlib)
write a string into a file (hint: read the "Input and Output" chapter of the Python tutorial)
use context managers to release resources (hint: read about the with statement)
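For reference, here is one way the pseudocode could look in Python once you have looked those up. It is a sketch, assuming the sub-folders sit directly inside one parent folder ('work' in the question); the function name `combine` and the `output_dir` parameter are hypothetical. It loops over the file names first and the sub-folders second, which gives the same result as the pseudocode, and it sorts the sub-folders so sub01 comes before sub02.

```python
from pathlib import Path


def combine(parent_dir, names=("mean.txt", "sd.txt"), output_dir="."):
    """For each file name in `names`, concatenate that file from every sub-folder.

    mean.txt from all sub-folders goes into output_dir/mean_combined.txt, and so on.
    Returns the paths of the combined files.
    """
    parent = Path(parent_dir)
    # sort so that sub01, sub02, ... are read in order (iterdir() order is arbitrary)
    subdirs = sorted(p for p in parent.iterdir() if p.is_dir())
    written = []
    for name in names:
        combined = Path(output_dir) / (Path(name).stem + "_combined.txt")
        with open(combined, "w") as out:  # the with statement closes the file for us
            for subdir in subdirs:
                source = subdir / name
                if source.is_file():  # skip sub-folders that lack this file
                    out.write(source.read_text())
        written.append(combined)
    return written
```

Usage: `combine("work")` writes mean_combined.txt and sd_combined.txt to the current directory.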
I am working on saving text to different files. I have already created several files, and each one has some text/paragraphs in it. Now I want to save these files to a directory. I have already created my own directory, but it is still empty, and I want to save these text files into it.
The partial code is below:
for doc in root:
    docID = doc.find('DOCID').text.strip()
    text = doc.find('TEXT').text.strip()
    with open(docID, 'w') as f:  # creates the file in the current working directory
        f.write(text)
All the files are now created with text in them, and I also have an empty folder. I just don't know how to put these files into that folder.
Any help would be appreciated.
[Solved] Thank you all for your help! I figured it out and have edited my summary here. I had a few problems:
1. My docID was saved as a tuple, and I needed to convert it to a string without any extra symbols. Here is the reference I used: https://stackoverflow.com/a/17426417/9387211
2. I built a new path in the destination folder and wrote the text to it, using this method: https://stackoverflow.com/a/8024254/9387211
Here is my updated code; everything works now. Thanks again, everyone!
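import os
for doc in root:
    docID = doc.find('DOCID').text.strip()  # already a str here, so no tuple conversion is needed
    # '.strip()' with a period: a comma (',strip()') would build a tuple instead
    text = doc.find('TEXT').text.strip()
    filename = os.path.join(dst_folder_path, docID)  # dst_folder_path: the destination folder
    with open(filename, 'w') as f:
        f.write(text)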
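If `root` is an ElementTree element (an assumption; the question doesn't show how it is built), the whole task, including creating the destination folder if it is missing, can be sketched like this. The `<DOCS>`/`<DOC>` layout in the sample and the function name `save_docs` are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def save_docs(root, dst_folder_path):
    """Write the TEXT of each document under root to dst_folder_path/<DOCID>."""
    os.makedirs(dst_folder_path, exist_ok=True)  # create the folder if it is missing
    paths = []
    for doc in root:
        doc_id = doc.find("DOCID").text.strip()
        text = doc.find("TEXT").text.strip()
        filename = os.path.join(dst_folder_path, doc_id)
        with open(filename, "w") as f:
            f.write(text)
        paths.append(filename)
    return paths


# Example input with the (hypothetical) structure the question implies
SAMPLE = """<DOCS>
  <DOC><DOCID> d1 </DOCID><TEXT> First text. </TEXT></DOC>
  <DOC><DOCID> d2 </DOCID><TEXT> Second text. </TEXT></DOC>
</DOCS>"""
root = ET.fromstring(SAMPLE)
```

Usage: `save_docs(root, "docs")` creates docs/d1 and docs/d2.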
Suppose all the text files are in your home directory (~/) and you want to move them to the /path/to/folder folder.
import os
import shutil
docid_list = ['docid-1', 'docid-2']
for did in docid_list:
    shutil.copy(did, '/path/to/folder')  # copy() accepts a folder as the destination; copyfile() does not
    os.remove(did)
This copies the docid files into /path/to/folder and then removes them from the home directory (assuming you run it from your home directory).
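A self-contained sketch of the same move using `shutil.move()`, which copies and removes in one step. The function name `move_files` is hypothetical. The destination folder is created first because, if it doesn't exist, `shutil.move()` would rename the file to that path instead of moving it into a folder.

```python
import os
import shutil


def move_files(names, dst_folder):
    """Move each file in `names` into dst_folder and return the new paths."""
    os.makedirs(dst_folder, exist_ok=True)  # otherwise move() would rename the file to dst_folder
    return [shutil.move(name, dst_folder) for name in names]
```

Usage: `move_files(['docid-1', 'docid-2'], '/path/to/folder')`.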
You can also pass the full file path when you call open, so the file is created directly in the target folder:
doc_file = open(<file path>, 'w')
I'm trying to find a way for the files I write to have the same names as the files I read. The code reads images and does some processing, and the output is the data extracted from that process, saved to a CSV file. I want both files to have the same name. I've come across fname for matching, but that's for existing files.
So if your input file name is in_file = 'myfile.jpg', do this:
my_outfile = ".".join(in_file.split('.')[:-1]) + '.csv'
This splits in_file into a list of parts separated by '.', joins them back together (with the dots) minus the last part, and adds '.csv'.
my_outfile will then be myfile.csv.
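The standard library can also do this without manual splitting. A small sketch using `os.path.splitext()` and `pathlib.Path.with_suffix()`, both of which replace only the last extension; the helper names `csv_name` and `csv_path` are hypothetical.

```python
import os
from pathlib import Path


def csv_name(in_file):
    """Same base name as in_file, with the extension replaced by .csv (os.path version)."""
    return os.path.splitext(in_file)[0] + ".csv"


def csv_path(in_file):
    """pathlib version: a Path in the same folder as in_file, ending in .csv."""
    return Path(in_file).with_suffix(".csv")
```

Both keep any earlier dots, so 'my.photo.jpg' becomes 'my.photo.csv'.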
It's possible to do that in Python, but the original file can be destroyed if the output has exactly the same file name: writing BibleKJV.pdf to the path BibleKJV.pdf overwrites the first file. Take a look at this script to check that I'm on the right track (if I'm totally off, disregard my answer):
import os
from PyPDF2 import PdfFileReader, PdfFileWriter  # older PyPDF2 API; newer releases use PdfReader/PdfWriter
path = "C:/Users/Catrell Washington/Pride"
input_file_name = os.path.join(path, "BibleKJV.pdf")
input_file = PdfFileReader(open(input_file_name, "rb"))
output_PDF = PdfFileWriter()
total_pages = input_file.getNumPages()
for page_num in range(1, total_pages):  # starts at 1, so the first page (index 0) is skipped
    output_PDF.addPage(input_file.getPage(page_num))
output_file_name = os.path.join(path, "BibleKJV.pdf")  # same path as the input file
output_file = open(output_file_name, "wb")  # "wb" truncates the original file right here
output_PDF.write(output_file)
output_file.close()
When I ran the script above, I lost all the data in the original "BibleKJV.pdf". This shows that if the output has the same file name and extension (.pdf, .csv, .docx, etc.) as the input, the original is overwritten: opening it with "w" or "wb" truncates it before any new data is written.
If this doesn't help, please edit your question to include a script showing what you're trying to achieve.
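One way to guard against this is to open the output with mode 'x' (exclusive creation), which raises FileExistsError instead of truncating an existing file. A minimal sketch; the helper name `write_new_file` is hypothetical.

```python
def write_new_file(path, data):
    """Write bytes to path, refusing to overwrite an existing file.

    Mode 'xb' creates the file; if it already exists, open() raises
    FileExistsError and the original file is left untouched.
    """
    with open(path, "xb") as f:
        f.write(data)
```

Catch FileExistsError if you want to fall back to a different output name.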
I am new to programming and have written a script to extract text from a vcf file. I am using a Linux virtual machine running Ubuntu. I run the script from the command line by changing to the directory that contains the vcf file and then entering python script.py.
My script knows which file to process because it begins with:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list, which I then write to outputfile. However, I have many input files (all .vcf) and want to write each to its own output file, with a name similar to the input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the Python script to accommodate this, i.e. writing the list to an output file?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os
# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    # the with statement closes both files after each iteration
    with open(vcf_filename, 'r') as vcf_file, open(output_filename, 'w') as outputfile:
        # Process the data
        ...
To write the resulting files to a separate directory, I would do:
import glob
import os
output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)
# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    output_path = os.path.join(output_dir, output_filename)
    with open(vcf_filename, 'r') as vcf_file, open(output_path, 'w') as outputfile:
        # Process the data
        ...
You don't need to write a shell script. Maybe this question will help you: How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in Python, just do it there.
If you want to implement it in a shell script, change your Python script to accept the file names as parameters, then use the shell script to call the Python script with suitable parameters.
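For the shell-script route, here is a minimal sketch of a script that takes the input files as command-line arguments via `sys.argv`. The `_processed.txt` naming follows the question's example; `process_file` and `main` are hypothetical names, and the processing step is a placeholder that just copies lines, since the real extraction isn't shown.

```python
import os
import sys


def process_file(vcf_filename):
    """Write a processed copy of vcf_filename to <name>_processed.txt; return its name."""
    output_filename = os.path.splitext(vcf_filename)[0] + "_processed.txt"
    with open(vcf_filename) as vcf_file, open(output_filename, "w") as outputfile:
        for line in vcf_file:
            # placeholder: the real script would extract the needed fields here
            outputfile.write(line)
    return output_filename


def main(args):
    """Process every file name given on the command line."""
    return [process_file(name) for name in args]


if __name__ == "__main__":
    main(sys.argv[1:])
```

From a shell, `python script.py *.vcf` processes every file in one call (the shell expands the pattern); a loop such as `for f in *.vcf; do python script.py "$f"; done` works too.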
I have a script I use frequently that uses PyQt5 to pop up a window prompting the user to select a file; it then lists that file's directory to find all the matching files in it:
import os
pathname = first_fname[:first_fname.rfind('/') + 1]  # folder of the file chosen in the dialog (up to the last '/')
new_pathname = pathname + 'for release/'  # another folder for the new files, whose names will be altered
# .xls files in the folder, skipping names with key words ('map', 'check') that mark files I don't want
file_list = [f for f in os.listdir(pathname)
             if f.lower().endswith('.xls') and 'map' not in f.lower() and 'check' not in f.lower()]
You need to import os to use the os.listdir command.
You can use os.listdir() (you need to write a condition to filter for the particular extension) or glob. I generally prefer glob. For example:
import os
import glob
for file in glob.glob('*.py'):
    output_name = os.path.splitext(file)[0]
    # the with statement closes both files after each iteration
    with open(file, 'r') as data, open(output_name + '.txt', 'w') as output:
        output.write(data.read())
This code reads the content of each matching input file and writes it to an output file with the same base name and a .txt extension.
I want to open a file to write to.
with open('test.txt','a') as textfile:
    ...
It works like this.
Now I want this file to be opened/created inside a directory whose name is given by args.runkeyword:
with open(os.path.join(args.runkeyword, 'test.txt'),'a') as textfile:
It says it can't find test/test.txt (supposing runkeyword is test).
I also tried by appending with os.getcwd() but it still can't find or create the file.
Any ideas?
os.getcwd() is actually irrelevant here. Use os.listdir() to see every folder in a directory; if something named test already exists there, it may be the problem.
A recursive function like this may be useful for you:
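import os
def tara(directory):
    """Return the names of all files under directory, recursing into sub-folders."""
    start = os.getcwd()
    files = []
    os.chdir(directory)
    for oge in os.listdir(os.curdir):
        if not os.path.isdir(oge):
            files.append(oge)
        else:
            files.extend(tara(oge))
    os.chdir(start)  # go back to the folder we started in
    return files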
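The same listing can be done without changing the working directory, using `os.walk()`, which already recurses into sub-folders. Like `tara()`, this sketch returns bare file names (the name `list_files` is hypothetical); append `os.path.join(dirpath, name)` instead if you need full paths.

```python
import os


def list_files(directory):
    """Return the names of all files under directory, including sub-folders."""
    files = []
    for dirpath, dirnames, filenames in os.walk(directory):
        files.extend(filenames)  # os.walk() descends into dirnames by itself
    return files
```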
file = open('test.txt', 'a+')
Note that 'a' already appends (and creates the file if it doesn't exist); the + in 'a+' additionally lets you read from the file.
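For the original question, the error usually means the folder itself doesn't exist yet: open() with 'a' or 'a+' creates a missing file, but never a missing directory. A minimal sketch that creates the folder first; `append_line` is a hypothetical helper and `runkeyword` stands in for `args.runkeyword`.

```python
import os


def append_line(runkeyword, line, filename="test.txt"):
    """Append line to <runkeyword>/<filename>, creating the folder if needed."""
    os.makedirs(runkeyword, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(runkeyword, filename)
    with open(path, "a") as textfile:
        textfile.write(line + "\n")
    return path
```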