Process only the files listed in a single text file - python

Is there a way to read, using Python, the files whose paths I have saved in a text file?
For example, I have a file called filenames.txt. It contains the paths of other files, such as:
/home/ikhwan/acespc.c
/home/ikhwan/trloc.cpp
/home/ikhwan/Makefile.sh
/home/ikhwan/Readme.txt
What I want is a Python script that changes the header of certain files. filenames.txt acts as the control list: whenever I run the script, it should change only the files listed there. I have a great many files in the directory and its subdirectories, and I want Python to read and modify only the ones I put in filenames.txt. Later, if I want to run the script on other files, I can simply add or replace paths in filenames.txt.
So the flow of the script is as follows:
Run script --> script reads the file paths from filenames.txt --> script adds or changes the header of each listed file.
Currently I use os.walk, but it searches every directory and subdirectory. Here is my current function:
def read_file(file):
    skip = 0
    headStart = None
    headEnd = None
    yearsLine = None
    haveLicense = False
    extension = os.path.splitext(file)[1]
    logging.debug("File extension is %s",extension)
    type = ext2type.get(extension)
    logging.debug("Type for this file is %s",type)
    if not type:
        return None
    settings = typeSettings.get(type)
    with open(file,'r') as f:
        lines = f.readlines()

You don't need to walk the file system if your file paths are already listed in filenames.txt. Just open it, read it line by line, and process each path, e.g.:
# this is your method that will be called with each file path from the filenames.txt
def process_file(path):
    # do whatever you want with `path` in terms of processing
    # let's just print it to STDOUT as an example
    with open(path, "r") as f:
        print(f.read())

with open("filenames.txt", "r") as f:  # open filenames.txt for reading
    for line in f:  # read filenames.txt line by line
        process_file(line.rstrip())  # send the path stored on the line to process_file()
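A slightly more defensive variant of the loop above, as a minimal sketch: it skips blank lines and reports listed paths that do not exist instead of failing on the first one. The helper names read_path_list and split_existing are hypothetical, and treating lines that start with # as comments is an assumption made here, not something the question asks for.

```python
import os
import tempfile


def read_path_list(list_file):
    """Return the paths listed in list_file, one per line.

    Blank lines are skipped. Lines starting with '#' are treated as
    comments (an assumption made for this sketch).
    """
    paths = []
    with open(list_file, "r") as f:
        for line in f:
            path = line.strip()
            if path and not path.startswith("#"):
                paths.append(path)
    return paths


def split_existing(paths):
    """Split paths into (found, missing) according to os.path.isfile."""
    found = [p for p in paths if os.path.isfile(p)]
    missing = [p for p in paths if not os.path.isfile(p)]
    return found, missing


# Demo in a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "acespc.c")
    with open(real, "w") as f:
        f.write("int main(void) { return 0; }\n")

    list_file = os.path.join(tmp, "filenames.txt")
    with open(list_file, "w") as f:
        f.write(real + "\n\n# a comment\n" + os.path.join(tmp, "gone.cpp") + "\n")

    found, missing = split_existing(read_path_list(list_file))
    print("will process:", found)
    print("not found:", missing)
```

In the real script, each path in found would be handed to the header-changing function, and missing could be logged so that typos in filenames.txt are easy to spot.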

Related

Python change strings line by line in all files within a directory

I have some files in a directory.
Some of the files are empty, and others contain paths like this: C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module.c
What is the best way to trim such a path by counting backslashes from the end? In this case I want to keep only what follows the fourth backslash, counting backward:
Folder1\Folder2\Folder3\Module.c
I need a function that goes through all the files and does this on each line of each file.
My current code, which does not work for some reason, is:
directory = os.listdir(//path_to_dir//)
for file in directory:
    with open (file) as f:
        for s in f:
            print('\\'.join(s.split('\\')[-4:]))
I would try something like this:
from pathlib import Path

def change(s):
    return '\\'.join(s.split('\\')[-4:])

folder = Path.cwd() / "folder" # here is your folder with files
files = folder.glob("*")
for f in files:
    with open(f, "r") as file:
        content = file.read()
    lines = content.split('\n')
    new_lines = []
    for line in lines:
        new_lines.append(change(line))
    with open(f, "w") as file:
        file.write("\n".join(new_lines))
It looks for all files in the subfolder named folder, applies the replacement to every line of every file, and saves the files.
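One likely reason the code in the question fails is that os.listdir returns bare names, which open() then resolves against the current working directory. The sketch below follows the same split-and-join idea with two small additions that are assumptions of this example: the number of components to keep is a parameter, and subdirectories are skipped, since trying to open one as a file would raise an error. The names last_components and trim_folder are hypothetical.

```python
import tempfile
from pathlib import Path


def last_components(line, keep=4, sep="\\"):
    """Keep only the last `keep` separator-delimited parts of line."""
    return sep.join(line.split(sep)[-keep:])


def trim_folder(folder, keep=4):
    """Rewrite every regular file directly inside folder, line by line."""
    for entry in Path(folder).iterdir():
        if not entry.is_file():
            continue  # iterdir() also yields subdirectories; skip them
        lines = entry.read_text().split("\n")
        entry.write_text("\n".join(last_components(l, keep) for l in lines))


# Demo in a temporary directory with one path file, one empty file
# and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "paths.txt"
    sample.write_text(
        "C:\\d\\folder\\project\\folder\\Folder1\\Folder2\\Folder3\\Module.c\n"
    )
    (Path(tmp) / "empty.txt").write_text("")
    (Path(tmp) / "subdir").mkdir()

    trim_folder(tmp)
    print(sample.read_text())
```

Lines with fewer than four backslashes, including empty lines, pass through unchanged because slicing with [-keep:] simply returns the whole list.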

Getting FileNotFoundError when trying to open a file for reading in Python 3

I am using the os module to open a file for reading, but I'm getting a FileNotFoundError.
I am trying to:
1. find all the files in a given sub-directory whose names contain "mda"
2. for each of those files, grab the string in the filename just after the two "_"s (it indicates a specific code called an SIC)
3. open that file for reading
4. write to a master file for some MapReduce processing later
When I try to open the file, I get the following error:
  File "parse_mda_SIC.py", line 16, in <module>
    f = open(file, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'mda_3357_2017-03-08_1000230_000143774917004005__3357.txt'
I suspect the issue is either with the "file" variable or with the fact that the file is one directory down, but I'm confused about why this would happen when I am using os to address that lower directory.
I have the following code:
working_dir = "data/"
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall("__(\d+)", file)
        f = open(file, 'r')
I would expect to be able to open the file without issue and then create my list from the data. Thanks for your help.
This should work for you. os.listdir returns bare file names, so open() looks for each name in the current working directory rather than in working_dir. You need to join the directory onto the file name:
for file in os.listdir(working_dir):
    if file.find("mda") != -1:
        SIC = re.findall(r"__(\d+)", file)
        f = open(os.path.join(working_dir, file), 'r')
It is also good practice to open files with a with statement (a context manager), as it closes the file for you when it is no longer needed:
for file in os.listdir(working_dir):
    if file.find("mda") != -1:
        SIC = re.findall(r"__(\d+)", file)
        with open(os.path.join(working_dir, file), 'r') as f:
            data = f.read()  # do stuff with f here
You need to join the directory onto the file name, like this:
f = open(os.path.join(working_dir, file), 'r')
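To see the fix end to end, here is a small self-contained sketch that builds a throwaway directory, lists it, and opens each match through os.path.join. The function name find_mda_files is hypothetical, and the file content is made up for the demo; only the file name pattern comes from the question.

```python
import os
import re
import tempfile


def find_mda_files(working_dir):
    """Return (sic_codes, full_path) pairs for names containing 'mda'."""
    results = []
    for name in os.listdir(working_dir):  # bare names, no directory part
        if "mda" not in name:
            continue
        sic_codes = re.findall(r"__(\d+)", name)
        results.append((sic_codes, os.path.join(working_dir, name)))
    return results


with tempfile.TemporaryDirectory() as tmp:
    name = "mda_3357_2017-03-08_1000230_000143774917004005__3357.txt"
    with open(os.path.join(tmp, name), "w") as f:
        f.write("sample text\n")

    for sic_codes, path in find_mda_files(tmp):
        # The joined path opens correctly from any working directory.
        with open(path, "r") as f:
            print(sic_codes, f.read().strip())
```

Note that re.findall returns a list, so the SIC code itself is the first element when the pattern matches.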

Creating a .txt file of file directories

I am trying to create a .txt file listing the directories at a location, remove the prefixes, and save the text file.
I use os.walk to write the directories under a location into a .txt file. That part always works: I get the text file of directories.
The next chunk of code, which is supposed to remove the prefixes from those lines, doesn't work. It creates its own .txt file (as it is supposed to), but that file is always empty.
If there is a solution that does all of this with one .txt file and one block of code, that would be even better!
Here is what I have so far; I'm using dummy directories for privacy's sake.
import os
from datetime import datetime
# this is to create a filename with the timestamp_directory_list for a .txt file
now = datetime.now()
filename = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_directory_list.txt")
# uses os module to walk the directories and files
# within a given location, then writes it line by line to a .txt file
with open(filename, "w") as directory_list:
    for path, subdirs, files in os.walk(r"C:/Users"):
        for filenameX in files:
            f = os.path.join(path)
            directory_list.write(str(f) + os.linesep)
# Open up .txt file, read a line, trim the prefix, then save it
# this is to create a filename with the timestamp_directory_list for a .txt file
trim = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_trimmed_directories.txt")
def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if prefix in text:
        return text[len(prefix):]
    return text
with open(filename, "r") as trim_input, \
        open(trim, "a") as trim_output:
    for line in trim_input:
        print line
        if "C" in line:
            print line
            trim_output = remove_prefix(trim_input, 'C')
            trim_output.write(line+ os.linesep)
You've mixed up the variable names; I'd actually expect the code to raise an exception when run.
You use trim_output for both the output file and the trimmed line.
You call remove_prefix on the input file object (trim_input), not on the line.
You assign the trimmed result to trim_output (overwriting, I think, the output file reference), but then write the untrimmed line to the output file.
Your code should be something like:
with open(filename, "r") as trim_input, \
        open(trim, "a") as trim_output:
    for line in trim_input:
        print(line)
        if "C" in line:
            # this if is a bit useless: there is another if inside remove_prefix,
            # and it also skips all the lines without the prefix
            print(line)
            trimmed_line = remove_prefix(line, 'C')
            trim_output.write(trimmed_line + os.linesep)
Later edit:
To make the code behave as stated in the initial description, the "if" should be removed and the code under it unindented by one level.
Also, remove_prefix is flawed:
def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if prefix in text:
        # if prefix is "C", this is true for "Ctest" and also for "testC"
        return text[len(prefix):]  # but it removes the first chars
    return text
It should be:
def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if text and text.startswith(prefix):
        return text[len(prefix):]
    return text
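A short sketch of the corrected helper in use. The helper trim_lines and the sample paths are hypothetical and only there for illustration. Line endings are stripped before trimming, so each result can be written back with a single newline.

```python
def remove_prefix(text, prefix):
    """Return text without a leading prefix; leave other text unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def trim_lines(lines, prefix):
    """Strip line endings, then the prefix, from each line."""
    return [remove_prefix(line.rstrip("\r\n"), prefix) for line in lines]


# Hypothetical sample lines, as they would come from iterating over a file.
sample = ["C:/Users/alice\n", "C:/Users/bob/docs\n", "D:/data\n"]
for trimmed in trim_lines(sample, "C:/Users/"):
    print(trimmed)
```

On Python 3.9 and later, the built-in str.removeprefix method does the same job as remove_prefix.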

Python3 Walking A Directory and Writing To Files In That Directory

I have created a script that writes the binary content of one file over other files in the same directory tree that end with certain extensions. While doing this I ran into an issue: when the script opens the output file to save it, it creates a new file in the wrong folder instead of overwriting the existing one.
Here is an example. There is a folder, FOLDER1, with test.py inside. FOLDER1 also contains a folder called OPTION1 with the files bye.sh and hello. When I execute test.py, my program sees bye.sh and hello in OPTION1 (hello's binary data is being copied to bye.sh) and reads them, but it creates a new bye.sh in FOLDER1 alongside test.py. How do I make it overwrite the bye.sh in OPTION1? I am running Mac OS X.
Code here:
import os
class FileHandler:
    #import all needed modules
    def __init__(self):
        import os
    #Replaces second file with the binary data in file one
    """
    Trys To Check If a File Exists
    If It exists, it will open the file that will be copied in binary form and the output file in reading/writing binary form
    If the input file doesnt exist it will tell the user, then exit the program
    Once the files are open, it reads the input file binary and copies it to the output file
    Then it checks to see if the two binaries match which they should if it all worked correctly, the program will then return true
    The Files are then closed
    If it failed to try it will tell the user the file was skipped
    """
    def replaceBinary(self,file_in="",file_out=""):
        try:
            if(os.path.isfile(file_in)):
                in_obj = open(file_in,"rb") #Opens The File That Will Be Copied
                out_obj = open(file_out,"wb+") #Opens The File That Will Be Written To
                object_data = in_obj.read()#Get Contents of In File
                out_obj.write(object_data)# Write Contents of In File To Out File
                print("SPECIAL FILE:"+file_out)
                print(os.getcwd())
                if out_obj.read() == in_obj.read():
                    print("Its the same...")
                    return True
                print("Done Copying...")
            else:
                print("Usage: Enter an input file and output file...")
                print(file_in+" Doesn't Exist...")
                raise SystemExit
                return False
            in_obj.close()
            out_obj.close()
        except:
            print("File, "+file_out+" was skipped.")
    """
    Procedurally continues down an entire location and all its locations in a tree like manner
    Then For each file found it checks if it matches of the extensions
    It checks a file for the extensions and if it has one of them, it calls the replace binary function
    """
    def WalkRoot(self,file1="",root_folder=""):
        for root,dirs,files in os.walk(root_folder):
            for f in files:
                print(os.path.abspath(f))
                array = [".exe",".cmd",".sh",".bat",".app",".lnk",".command",".out",".msi",".inf",".com",".bin"]
                for extension in array:
                    if(f.endswith(extension)):
                        print("Match|\n"+os.path.abspath(f)+"\n")
                        self.replaceBinary(file1,os.path.abspath(f))
                        break #If The right extension is met skip rest of extensions and move to next file
        print("Done...")
def main():
    thing = FileHandler()
    #path = os.path.join(os.getcwd())
    path = input(str("Enter path:"))
    thing.WalkRoot("Execs/hello","/Users/me/Documents/Test")
main()
Thanks!
The list of files returned by os.walk() does not include directory information. However, the dirpath (which you call root) is updated as you go through the list. Quoting the manual,
filenames is a list of the names of the non-directory files in dirpath. Note
that the names in the lists contain no path components. To get a full path
(which begins with top) to a file or directory in dirpath, do
os.path.join(dirpath, name).[1]
So, to get the correct, full path to the files within your top-level directory, try replacing
self.replaceBinary(file1,os.path.abspath(f))
with
self.replaceBinary(file1, os.path.join(root, f))
[1] https://docs.python.org/3.5/library/os.html#os.walk
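The difference between the two calls can be seen in a small sketch that recreates the FOLDER1/OPTION1 layout from the question in a temporary directory. The function name matching_paths and the shortened extension list are assumptions of this example.

```python
import os
import tempfile


def matching_paths(root_folder, extensions=(".sh", ".bat", ".cmd")):
    """Return full paths of files under root_folder with a listed extension."""
    matches = []
    for root, dirs, files in os.walk(root_folder):
        for name in files:
            # str.endswith accepts a tuple of suffixes
            if name.endswith(tuple(extensions)):
                matches.append(os.path.join(root, name))
    return matches


with tempfile.TemporaryDirectory() as folder1:
    option1 = os.path.join(folder1, "OPTION1")
    os.mkdir(option1)
    for name in ("bye.sh", "hello"):
        with open(os.path.join(option1, name), "wb") as f:
            f.write(b"data")

    for path in matching_paths(folder1):
        print("joined with root:", path)
        # abspath on the bare name resolves against the current working
        # directory, which is why a new file appeared next to the script.
        print("abspath of name: ", os.path.abspath(os.path.basename(path)))
```

Passing a tuple to endswith also replaces the inner loop over extensions and its break.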

Python - create next file

I am writing a small script that creates .txt files, and I do not want it to replace existing files. Python should check whether the file already exists. If it does not, it can proceed and create it. If the file does exist, Python should increment the number in the name and check again, repeating until it finds a name that is not taken, and then create that file.
EXAMPLE:
current dir has these files in it:
file_001.txt
file_002.txt
I want python to see that the two files exists and make the next file:
file_003.txt
creating files can be done like this:
f = open("file_001.txt", "w")
f.write('something')
f.close()
checking if a file exists:
import os.path
os.path.isfile(fname)
If you want to check that the path both exists and is a file, os.path.isfile alone is enough, since it returns False for a missing path; combining it with os.path.exists is harmless but redundant. If any kind of existing path will do, os.path.exists suffices. The following might help:
import os.path as op
print(op.exists(fname) and op.isfile(fname))
or just print(op.exists(fname))
Here is some code that gets the job done (I am answering my own question).
import os.path

def next_file(filename):
    """
    filename: string. Name only of the current file
    returns: string. A message naming the file that was created
    assumes the padding of the file is filename_001.txt. The number of leading zeros does not matter
    """
    fill_exists = True
    current = '001'
    padding = len(current)  # length of digits
    file = '{}_{}.txt'.format(filename, current)  # the actual name of the file, incl. extension
    while fill_exists:
        if not os.path.isfile(file):  # if the file does not already exist
            f = open(file, 'w')  # create file
            f.write(filename)
            f.close()
            return 'Created new file: {}_{}.txt'.format(filename, current)  # shows the name of file just created
        else:
            current = str(int(current)+1).zfill(padding)  # try the next number
            file = '{}_{}.txt'.format(filename, current)
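Checking with os.path.isfile and then opening with 'w' leaves a small window in which another process could create the same name. A possible alternative, sketched here under the assumption that Python 3.3 or later is available, is to open with mode 'x', which fails with FileExistsError if the file already exists. The function name create_next_file and its parameters are hypothetical.

```python
import os
import tempfile


def create_next_file(prefix, directory=".", padding=3, content=""):
    """Create the first free prefix_NNN.txt in directory and return its path."""
    number = 1
    while True:
        name = "{}_{}.txt".format(prefix, str(number).zfill(padding))
        path = os.path.join(directory, name)
        try:
            # Mode "x" creates the file and raises if it already exists,
            # so checking and creating happen in one step.
            with open(path, "x") as f:
                f.write(content)
            return path
        except FileExistsError:
            number += 1  # name taken, try the next number


# Demo: with file_001.txt and file_002.txt present, the next file is created.
with tempfile.TemporaryDirectory() as tmp:
    for taken in ("file_001.txt", "file_002.txt"):
        open(os.path.join(tmp, taken), "w").close()
    print(os.path.basename(create_next_file("file", tmp, content="something")))
```

Like the original, this starts counting from 1 on every call, so it fills the first gap in the numbering rather than always appending after the highest number.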
