Load files from a list of file paths in python - python

I have a text file with a couple hundred file paths to text files which I would like to open, write / cut up pieces from it and save under a new name.
I've been Googling how to do this and found the module glob, but I can't figure out exactly how to use this.
Could you guys point me in the right direction?

If you have specific paths to files, you won't need to glob module. The glob module is useful when you want to use path like /user/home/someone/pictures/*.jpg. From what I understand you have a file with normal paths.
You can use this code as a start:
with open('file_with_paths', 'r') as paths_list:
for file_path in paths_list:
with open(file_path, 'r') as file:
# Do what you want with one of the files here.

You can just traverse the file line by line and then take out what you want from that name. Later save/create it . Below sample code might help
with open('file_name') as f:
for file_path in f:
import os
file_name = os.path.basename(file_path)
absolute path = os.path.dirname(file_path)
# change whatever you want to with above two and save the file
# os.makedirs to create directry
# os.open() in write mode to create the file
Let me know if it helps you

Related

Python search files in multiple subdirectories for specific string and return file path(s) if present

I would be very grateful indeed for some help for a frustrated and confused Python beginner.
I am trying to create a script that searches a windows directory containing multiple subdirectories and different file types for a specific single string (a name) in the file contents and if found prints the filenames as a list. There are approximately 2000 files in 100 subdirectories, and all the files I want to search don't necessarily have the same extension - but are all in essence, ASCII files.
I've been trying to do this for many many days but I just cannot figure it out.
So far I have tried using glob recursive coupled with reading the file but I'm so very bewildered. I can successfully print a list of all the files in all subdirectories, but don't know where to go from here.
import glob
files = []
files = glob.glob('C:\TEMP' + '/**', recursive=True)
print(files)
Can anyone please help me? I am 72 year old scientist trying to improve my skills and "automate the boring stuff", but at the moment I'm just losing the will.
Thank you very much in advance to this community.
great to have you here!
What you have done so far is found all the file paths, now the simplest way is to go through each of the files, read them into the memory one by one and see if the name you are looking for is there.
import glob
files = glob.glob('C:\TEMP' + '/**', recursive=True)
target_string = 'John Smit'
# itereate over files
for file in files:
try:
# open file for reading
with open(file, 'r') as f:
# read the contents
contents = f.read()
# check if contents have your target string
if target_string in conents:
print(file)
except:
pass
This will print the file path each time it found a name.
Please also note I have removed the second line from your code, because it is redundant, you initiate the list in line 3 anyway.
Hope it helps!
You could do it like this, though i think there must be a better approach
When you find all files in your directory, you iterate over them and check if they contain that specific string.
for file in files:
if(os.path.isfile(file)):
with open(file,'r') as f:
if('search_string' in f.read()):
print(file)

Edit multiple text files, and save as new files

My first post on StackOverflow, so please be nice. In other words, a super beginner to Python.
So I want to read multiple files from a folder, divide the text and save the output as a new file. I currently have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to use this code on multiple text files in a folder and save it as "output" + a number, for each file in the folder. Is this something that's doable?
with open("file_path") as fReader:
corpus = fReader.read()
loc = corpus.find("\n\n")
print(corpus[:loc], file=open("output.txt","a"))
Possibly work with a list, like:
from pathlib import Path
source_dir = Path("./") # path to the directory
files = list(x for x in filePath.iterdir() if x.is_file())
for i in range(len(files)):
file = Path(files[i])
outfile = "output_" + str(i) + file.suffix
with open(file) as fReader, open(outfile, "w") as fOut:
corpus = fReader.read()
loc = corpus.find("\n\n")
fOut.write(corpus[:loc])
** sorry for multiple editting....
welcome to the site. Yes, what you are asking above is completely doable and you are on the right track. You will need to do a little research/practice with the os module which is highly useful when working with files. The two commands that you will want to research a bit are:
os.path.join()
os.listdir()
I would suggest you put two folders within your python file, one called data and the other called output to catch the results. Start and see if you can just make the code to list all the files in your data directory, and just keep building that loop. Something like this should list all the files:
# folder file lister/test writer
import os
source_folder_name = 'data' # the folder to be read that is in the SAME directory as this file
output_folder_name = 'output' # will be used later...
files = os.listdir(source_folder_name)
# get this working first
for f in files:
print(f)
# make output folder names and just write a 1-liner into each file...
for f in files:
output_filename = f.split('.')[0] # the part before the period
output_filename += '_output.csv'
output_path = os.path.join(output_folder_name, output_filename)
with open(output_path, 'w') as writer:
writer.write('some data')

Creating empty text files with specific names in a directory

I have two directories:
dir = path/to/annotations
and
dir_img = path/to/images
The format of image names in dir_img is image_name.jpg.
I need to create empty text files in dir as: image_name.txt, wherein I can later store annotations corresponding to the images. I am using Python.
I don't know how to proceed. Any help is highly appreciated. Thanks.
[Edit]: I tried the answer given here. It ran without any error but didn't create any files either.
This should create empty files for you and then you can proceed further.
import os
for f in os.listdir(source_dir):
if f.endswith('.jpg'):
file_path = os.path.join(target_dir, f.replace('.jpg', '.txt'))
with open(file_path, "w+"):
pass
You can use the module os to list the existing files, and then just open the file in mode w+ which will create the file even if you're not writing anything into it. Don't forget to close your file!
import os
for f in os.listdir(source_dir):
if f.endswith('.jpg'):
open(os.path.join(target_dir, f.replace('.jpg', '.txt')), 'w+').close()

How can I run a python script on many files to get many output files?

I am new at programming and I have written a script to extract text from a vcf file. I am using a Linux virtual machine and running Ubuntu. I have run this script through the command line by changing my directory to the file with the vcf file in and then entering python script.py.
My script knows which file to process because the beginning of my script is:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list and then I write it to outputfile. However, I have many input files (all .vcf) and want to write them to different output files with a similar name to the input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so how would I change the python script to accommodate this? I.e writing the list to an outputfile?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os
# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
vcf_file = open(vcf_filename, 'r+')
# Similar name with a different extension
output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
outputfile = open(output_filename, 'w')
# Process the data
...
To output the resulting files in a separate directory I would:
import glob
import os
output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)
# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
vcf_file = open(vcf_filename, 'r+')
# Similar name with a different extension
output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
outputfile = open(os.path.join(output_dir, output_filename), 'w')
# Process the data
...
You don't need write shell script,
maybe this question will help you?
How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in python, just do it;
If you want to implement it in a shell script, just change your python script to accept parameters, and then use shell script to call the python script with your suitable parameters.
I have a script I frequently use which includes using PyQt5 to pop up a window that prompts the user to select a file... then it walks the directory to find all of the files in the directory:
pathname = first_fname[:(first_fname.rfind('/') + 1)] #figures out the pathname by finding the last '/'
new_pathname = pathname + 'for release/' #makes a new pathname to be added to the names of new files so that they're put in another directory...but their names will be altered
file_list = [f for f in os.listdir(pathname) if f.lower().endswith('.xls') and not 'map' in f.lower() and not 'check' in f.lower()] #makes a list of the files in the directory that end in .xls and don't have key words in the names that would indicate they're not the kind of file I want
You need to import os to use the os.listdir command.
You can use listdir(you need to write condition to filter the particular extension) or glob. I generally prefer glob. For example
import os
import glob
for file in glob.glob('*.py'):
data = open(file, 'r+')
output_name = os.path.splitext(file)[0]
output = open(output_name+'.txt', 'w')
output.write(data.read())
This code will read the content from input and store it in outputfile.

Opening a file in a folder in Python

I want to open a file to write to.
with open('test.txt','a') as textfile:
...
It works like this.
Now I want this file to be opened/created from a directory called args.runkeyword.
with open(os.path.join(args.runkeyword, 'test.txt'),'a') as textfile:
t says it can't find test/test.txt (supposing runkeyword is test).
I also tried by appending with os.getcwd() but it still can't find or create the file.
Any ideas?
os.getcwd() is irrelevant on your work actually. Use os.listdir() to see every folder in a directory. If anything named by test before it may be problem.
A recursive function like this may usefull for you;
import os
def tara(directory):
start = os.getcwd()
files = []
os.chdir(directory)
for oge in os.listdir(os.curdir):
if not os.path.isdir(oge):
files.append(oge)
else:
files.extend(tara(oge))
os.chdir(start)
return files
file = open('test.txt', 'a+')
You should have 'a+' not 'a', the + allows you to append.

Categories

Resources