I am trying to search for a very specific string in a folder full of binary files. The goal is to have the program open each binary file, search for the specific string, and then print out the file that the string is located in.
I think I have something that is close to working, but it is not there yet. I tried playing with the bytes of the string I want to search for, but I still am not finding anything. I have also tried struct.unpack, but that didn't seem to work either.
Any help is much appreciated. Thank you for your time.
Code:
import os
toSearch = bytes("find me", "unicode_escape")
folderToSearch = "C:\\dir\\for\\bin\\files"
for root, dirs, files in os.walk(folderToSearch):
    for file in files:
        if file.endswith(".ROM"):
            with open(root + "\\" + file, "rb") as binary_file:
                fileContent = binary_file.read()
                if fileContent.find(toSearch) != -1:
                    print(os.path.join(root, file))
I'm not sure why using find() doesn't work, but the following does on my system:
import os
toSearch = b"find me"
folderToSearch = "C:\\dir\\for\\bin\\files"
for root, dirs, files in os.walk(folderToSearch):
    for file in files:
        if file.endswith(".ROM"):
            print(f'checking file {file}')
            filepath = os.path.join(root, file)
            with open(filepath, "rb") as binary_file:
                fileContent = binary_file.read()
                if toSearch in fileContent:
                    print(filepath)
print('done')
This might help you do some debugging. (I also refactored your code to use pathlib instead of os to make it cleaner).
from pathlib import Path
encoding = "unicode_escape"
search_dir = Path("C:\\dir\\for\\bin\\files")
search_bytes = bytes("find me", encoding)
roms = {"match": [], "no_match": []}
for rom_file in search_dir.glob("**/*.ROM"):
    with open(rom_file, 'rb') as rom_handle:
        rom_contents = rom_handle.read()
    match = "match" if (search_bytes in rom_contents) else "no_match"
    roms[match].append({
        str(rom_file.resolve()): rom_contents
    })
If you run this, you can manually inspect the bytes that are read in for matching/non-matching results.
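One possible reason for a miss, offered here only as an assumption about the files in question, is that the text is stored in a different encoding (for example UTF-16), so its bytes differ from the ASCII needle. A minimal sketch that makes the needle's encoding explicit; the function name find_files_containing and the default "*.ROM" pattern are illustrative, not part of the original code:

```python
from pathlib import Path


def find_files_containing(search_dir, needle, pattern="*.ROM"):
    """Return paths under search_dir whose raw bytes contain needle."""
    matches = []
    for path in sorted(Path(search_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # read_bytes() opens the file in binary mode, so nothing is decoded
        # before the comparison.
        if needle in path.read_bytes():
            matches.append(path)
    return matches


# The same text gives different byte sequences in different encodings.
ascii_needle = "find me".encode("ascii")
utf16_needle = "find me".encode("utf-16-le")
```

Trying both needles against the same folder shows quickly whether the encoding is the culprit.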
Related
My folder structure is: C:/Users/Desktop/SampleTestFiles/ProjectFiles/ExceptionLogFiles/
Using the code below, I am trying to create the file ExceptionLog.txt in the ExceptionLogFiles folder if it does not exist, and if it does exist, to open it and write some text to it. But for some reason the code is unable to resolve the relative path.
Can anyone please help me correct the code:
fileDir = 'C:/Users/Desktop/SampleTestFiles'
filename = os.path.join(fileDir, '\..\ExceptionLogFiles\ExceptionLog.txt')
#print(filename) gives: C:/Users/Desktop/SampleTestFiles/../ExceptionLog.txt
if os.path.exists(filename):
    print(filename, 'exists')
    #Open file and write something to the file
    f = open(file, 'w')
    f.write("Exception Text")
    f.close()
else:
    print('file not exists')
    #Create File and Write something to the file.
    f = open(file, 'w+')
    f.write("Exception Text")
    f.close()
What you tried to do was kind of like this, in an addition-like fashion
(
C:/Users/Desktop/SampleTestFiles
+
.. (which is up one directory)
)
+ ExceptionLogFiles\ExceptionLog.txt
The "parenthesized" addition will actually resolve to C:/Users/Desktop/, and we add 'ExceptionLogFiles\ExceptionLog.txt' to that. So we'd be looking at: 'C:/Users/Desktop/ExceptionLogFiles\ExceptionLog.txt'
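However, even if you dropped the ..\ from your string, you should not rely on unescaped backslashes. They survive here only because \E is not a recognized escape sequence; a folder name starting with t or n would turn into a tab or a newline, and recent Python versions warn about unrecognized escapes.
Try this (and NOTE the backslashes are doubled so as to escape the backslash, which is the escape character!)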
fileDir = 'C:/Users/Desktop/SampleTestFiles'
filename = os.path.join(fileDir, 'ExceptionLogFiles\\ExceptionLog.txt')
Looks like you're looking for normpath
import os
fileDir = 'C:/Users/Desktop/SampleTestFiles'
filename = os.path.join(fileDir, '../ExceptionLogFiles/ExceptionLog.txt')
print(filename)
print(os.path.normpath(filename))
result:
C:/Users/Desktop/SampleTestFiles/../ExceptionLogFiles/ExceptionLog.txt
C:/Users/Desktop/ExceptionLogFiles/ExceptionLog.txt
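The output above uses forward slashes, which is what POSIX path rules produce. As a hedged illustration of how the result differs under Windows rules, the sketch below calls the ntpath and posixpath modules directly (os.path is one of these two, depending on the platform); the variable names are illustrative:

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules

file_dir = "C:/Users/Desktop/SampleTestFiles"
relative = "../ExceptionLogFiles/ExceptionLog.txt"

# normpath collapses the ".." component; under Windows rules it also
# converts forward slashes to backslashes.
windows_path = ntpath.normpath(ntpath.join(file_dir, relative))
posix_path = posixpath.normpath(posixpath.join(file_dir, relative))

print(windows_path)  # C:\Users\Desktop\ExceptionLogFiles\ExceptionLog.txt
print(posix_path)    # C:/Users/Desktop/ExceptionLogFiles/ExceptionLog.txt
```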
You can use "with open('path', 'a+') as f". Whether the file exists or not, you can write to it: the file is created if it is missing, and anything you write is appended to the end.
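A minimal sketch of the append-mode approach, assuming the log folder may not exist yet; append_log is a hypothetical helper name, and plain "a" is used because the file is only written, never read back:

```python
import os


def append_log(path, text):
    """Append text to path, creating the file (and its folder) if missing."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # "a" creates the file when it does not exist and never truncates it,
    # so no os.path.exists() check is needed.
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(text + "\n")
```

Calling it twice with the same path leaves both lines in the file, in order.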
I want to write a program for this: In a folder I have n number of files; first read one file and perform some operation then store result in a separate file. Then read 2nd file, perform operation again and save result in new 2nd file. Do the same procedure for n number of files. The program reads all files one by one and stores results of each file separately. Please give examples how I can do it.
I think what you miss is how to retrieve all the files in that directory.
To do so, use the glob module.
Here is an example which will duplicate all the files with extension *.txt to files with extension *.out
import glob
list_of_files = glob.glob('./*.txt') # create the list of file
for file_name in list_of_files:
    FI = open(file_name, 'r')
    FO = open(file_name.replace('txt', 'out'), 'w')
    for line in FI:
        FO.write(line)
    FI.close()
    FO.close()
import sys
# argv is your commandline arguments, argv[0] is your program name, so skip it
for n in sys.argv[1:]:
    print(n) #print out the filename we are currently processing
    input = open(n, "r")
    output = open(n + ".out", "w")
    # do some processing
    input.close()
    output.close()
Then call it like:
./foo.py bar.txt baz.txt
You may find the fileinput module useful. It is designed for exactly this problem.
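As a rough sketch of what fileinput offers, the function below iterates over the lines of several files as one stream and uses filename() to tell which file each line came from. The name count_lines_per_file is made up for illustration:

```python
import fileinput


def count_lines_per_file(paths):
    """Return {filename: line_count} for the given text files."""
    if not paths:
        # With an empty file list, fileinput falls back to reading stdin.
        return {}
    counts = {}
    with fileinput.input(files=paths) as stream:
        for _line in stream:
            # filename() reports the file the current line was read from.
            name = stream.filename()
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Note that this treats all inputs as a single stream, so it suits line-by-line processing better than writing one output file per input.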
I've just learned of the os.walk() command recently, and it may help you here.
It allows you to walk down a directory tree structure.
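import os
OUTPUT_DIR = 'C:\\RESULTS'
for path, dirs, files in os.walk('.'):
    for file in files:
        # os.path.join (not os.join) builds the full path to each file
        read_f = open(os.path.join(path, file), 'r')
        # the output file must be opened in write mode
        write_f = open(os.path.join(OUTPUT_DIR, file), 'w')
        # Do stuff
        read_f.close()
        write_f.close()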
Combined answer incorporating directory or specific list of filenames arguments:
import sys
import os.path
import glob
def processFile(filename):
    fileHandle = open(filename, "r")
    for line in fileHandle:
        # do some processing
        pass
    fileHandle.close()
def outputResults(filename):
    output_filemask = "out"
    fileHandle = open("%s.%s" % (filename, output_filemask), "w")
    # do some processing
    fileHandle.write('processed\n')
    fileHandle.close()
def processFiles(args):
    input_filemask = "log"
    directory = args[1]
    if os.path.isdir(directory):
        print("processing a directory")
        list_of_files = glob.glob('%s/*.%s' % (directory, input_filemask))
    else:
        print("processing a list of files")
        list_of_files = sys.argv[1:]
    for file_name in list_of_files:
        print(file_name)
        processFile(file_name)
        outputResults(file_name)
if __name__ == '__main__':
    if (len(sys.argv) > 1):
        processFiles(sys.argv)
    else:
        print('usage message')
from pylab import *
import csv
import os
import glob
import re
x=[]
y=[]
f=open("one.txt",'w')
for infile in glob.glob(('*.csv')):
    # print "" +infile
    csv23=csv2rec(""+infile,'rb',delimiter=',')
    for line in csv23:
        x.append(line[1])
    # print len(x)
    for i in range(3000,8000):
        y.append(x[i])
    print ""+infile,"\t",mean(y)
    print >>f,""+infile,"\t\t",mean(y)
    del y[:len(y)]
    del x[:len(x)]
I know I saw this double with open() somewhere but couldn't remember where. So I built a small example in case someone needs.
""" A module to clean code(js, py, json or whatever) files saved as .txt files to
be used in HTML code blocks. """
from os import listdir
from os.path import abspath, dirname, splitext
from re import sub, MULTILINE
def cleanForHTML():
    """ This function will search a directory for text files to be edited. """
    ## define some regex for our search and replace. We are looking for <, > and &
    ## to be replaced with &lt;, &gt; and &amp;. We might want to replace proper whitespace
    ## chars as well? (r'\t', ' ') and (r'\n', '<br>')
    ## & comes first so the & in &lt; and &gt; is not escaped a second time.
    search_ = ((r'(&)', '&amp;'), (r'(<)', '&lt;'), (r'(>)', '&gt;'))
    ## Read and loop our file location. Our location is the same one that our python file is in.
    for loc in listdir(abspath(dirname(__file__))):
        ## Here we split our filename into its parts ('fileName', '.txt')
        name = splitext(loc)
        if name[1] == '.txt':
            ## we found our .txt file so we can start file operations.
            with open(loc, 'r') as file_1, open(f'{name[0]}(fixed){name[1]}', 'w') as file_2:
                ## read our first file
                retFile = file_1.read()
                ## find and replace some text.
                for find_ in search_:
                    retFile = sub(find_[0], find_[1], retFile, count=0, flags=MULTILINE)
                ## finally we can write to our newly created text file.
                file_2.write(retFile)
This approach also works for reading multiple files. My files are named federalist_1.txt, federalist_2.txt and so on, 84 files in total, up to federalist_84.txt.
And I'm reading each file as f.
for number in range(1, 85):  # file numbers 1..84
    with open(f'federalist_{number}.txt', 'r') as f:
        text = f.read()
I'm trying to use python to search for a string in a folder which contains multiple .txt files.
My objective is to find those files containing the string and move/or re-write them in another folder.
what I have tried is:
import os
for filename in os.listdir('./*.txt'):
    if os.path.isfile(filename):
        with open(filename) as f:
            for line in f:
                if 'string/term to be searched' in line:
                    f.write
                    break
probably there is something wrong with this but, of course, cannot figure it out.
os.listdir argument must be a path, not a pattern. You can use glob to accomplish that task:
import os
import glob
for filename in glob.glob('./*.txt'):
    if os.path.isfile(filename):
        with open(filename) as f:
            for line in f:
                if 'string/term to be searched' in line:
                    # You cannot write with f, because it is open in read mode
                    # and write() must be given an argument.
                    # Your actions
                    break
As Antonio says, you cannot write with f because it is open in read mode.
A possible solution to avoid the problem is the following:
import os
import shutil
source_dir = "your/source/path"
destination_dir = "your/destination/path"
for top, dirs, files in os.walk(source_dir):
    for filename in files:
        file_path = os.path.join(top, filename)
        check = False
        with open(file_path, 'r') as f:
            if 'string/term to be searched' in f.read():
                check = True
        if check is True:
            shutil.move(file_path, os.path.join(destination_dir, filename))
Remember that if your source_dir or destination_dir contains backslashes, you have to double them (or use a raw string), because a backslash starts an escape sequence such as \t (tab).
For example, this:
source_dir = "C:\documents\test"
should be
source_dir = "C:\\documents\\test"
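To make the escaping rule concrete, here is a small sketch comparing three spellings of a Windows path; the paths themselves are placeholders:

```python
escaped = "C:\\documents\\test"   # every backslash doubled
raw = r"C:\documents\test"        # raw string: backslashes kept as written
broken = "C:\test"                # "\t" is interpreted as a TAB character

print(escaped == raw)    # True
print("\t" in broken)    # True
print(len(broken))       # 6, because "\t" collapsed into one character
```

A raw string cannot end in a single backslash, so for a trailing separator the doubled form (or os.path.join) is still needed.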
Currently I am trying to write a function that will walk through the requested directory and print all the text of all the files.
Right now, the function works in displaying the file_names as a list so the files surely exist (and there is text in the files).
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir,dir_names,file_names) in os.walk(starting_dir):
        for file_name in file_names:
            fpath = os.path.join(os.path.abspath(starting_dir), file_name)
            fileobj = open(fpath, 'r')
            text = fileobj.read()
            print(text)
Here is my output with some checking of the directory contents:
>>> PopularWordWalk ('text_dir', word_dict)
In /Users/normanwei/Documents/Python for Programmers/Homework 4/text_dir
>>> os.listdir()
['.DS_Store', 'cats.txt', 'zen_story.txt']
The problem is that whenever I try to print the text, I get nothing. Eventually I want to push the text through some other functions, but as of now that seems moot without any text. Can anyone explain why no text is appearing? (Opening, reading, storing and printing the files manually in IDLE works, i.e. if I manually input 'cats.txt' instead of file_name.) I am currently running Python 3.
EDIT - The question has been answered - just have to remove the os.chdir line - see jojo's answer for explanation.
This line won't work
file = open(file_name, 'r')
Because it would require that these files exist in the same folder you are running the script from. You would have to provide the path to those files, as well as the file names
with open(os.path.join(starting_dir, file_name), 'r') as file:
    # do stuff
    pass
This way it will build the full path from the directory and the file name.
If you do os.chdir(os.path.abspath(starting_dir)) you go into starting_dir. Then for (this_dir,dir_names,file_names) in os.walk(starting_dir): will loop over nothing since starting_dir is not in starting_dir.
Long story short, comment the line os.chdir(os.path.abspath(starting_dir)) and you should be good.
Alternatively if you want to stick to the os.chdir, this should do the job:
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir,dir_names,file_names) in os.walk('.'):
        for file_name in file_names:
            # After the chdir, a relative starting_dir no longer resolves,
            # so build the path from the directory os.walk is visiting.
            fpath = os.path.join(this_dir, file_name)
            with open(fpath, 'r') as fileobj:
                text = fileobj.read()
                print(text)
You'll want to join the root path with the file path. I'd change:
file = open(file_name, 'r')
to
fpath = os.path.join(this_dir, file_name)
file = open(fpath, 'r')
You may also want to use a name other than file, as it shadows a built-in in Python 2 (the file type). I'd recommend fileobj.
Just to add on to the previous answer, you will have to join the absolute path and the relative path of the walk.
Try this:
fpath = os.path.abspath(os.path.join(this_dir, file_name))
f = open(fpath, 'r')
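Putting the answers above together, a minimal sketch of the walk without any os.chdir call might look like this. The function name read_all_text and the choice of UTF-8 with errors="replace" are assumptions made for illustration, not part of the original question:

```python
import os


def read_all_text(starting_dir):
    """Yield (path, text) for every file below starting_dir."""
    for this_dir, _dir_names, file_names in os.walk(starting_dir):
        for file_name in file_names:
            # this_dir is the folder currently being walked, so joining it
            # with file_name also works for files in subfolders.
            fpath = os.path.join(this_dir, file_name)
            with open(fpath, "r", encoding="utf-8", errors="replace") as fileobj:
                yield fpath, fileobj.read()
```

Usage would be along the lines of `for path, text in read_all_text('text_dir'): print(text)`.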
I have a directory of text files that all have the extension .txt. My goal is to print the contents of the text file. I wish to be able use the wildcard *.txt to specify the file name I wish to open (I'm thinking along the lines of something like F:\text\*.txt?), split the lines of the text file, then print the output.
Here is an example of what I want to do, but I want to be able to change somefile when executing my command.
f = open('F:\text\somefile.txt', 'r')
for line in f:
    print line,
I had checked out the glob module earlier, but I couldn't figure out how to actually do anything to the files. Here is what I came up with, not working.
filepath = "F:\irc\as\*.txt"
txt = glob.glob(filepath)
lines = string.split(txt, '\n') #AttributeError: 'list' object has no attribute 'split'
print lines
import os
import re
path = "/home/mypath"
for filename in os.listdir(path):
    # raw string; the dot is escaped so it only matches a literal "."
    if re.match(r"text\d+\.txt", filename):
        with open(os.path.join(path, filename), 'r') as f:
            for line in f:
                print line,
Although you ignored my perfectly fine solution, here you go:
import glob
path = "/home/mydir/*.txt"
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            print line,
You can use the glob module to get a list of files for wildcards:
File Wildcards
Then you just do a for-loop over this list and you are done:
filepath = r"F:\irc\as\*.txt"  # raw string: otherwise "\a" is read as the BEL escape
txt = glob.glob(filepath)
for textfile in txt:
    f = open(textfile, 'r') #Maybe you need a os.joinpath here, see Uku Loskit's answer, I don't have a python interpreter at hand
    for line in f:
        print line,
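The snippets in this thread use the Python 2 print statement. For readers on Python 3, a hedged equivalent of the glob approach could look like the following; iter_txt_lines is an illustrative name, and UTF-8 is an assumed encoding:

```python
import glob
import os


def iter_txt_lines(folder):
    """Yield (filename, line) for each line of each .txt file in folder."""
    # glob.escape protects folder names that contain [, ] , * or ?.
    pattern = os.path.join(glob.escape(folder), "*.txt")
    for filename in sorted(glob.glob(pattern)):
        # glob already returns the folder joined with the file name,
        # so no extra os.path.join is needed here.
        with open(filename, "r", encoding="utf-8") as f:
            for line in f:
                yield filename, line
```

Printing is then `for _name, line in iter_txt_lines(folder): print(line, end="")`, where end="" plays the role of the trailing comma in the Python 2 version.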
This code addresses both issues in the initial question: it looks for .txt files in the current directory and then lets the user search them with a regular expression.
#! /usr/bin/python3
# regex search.py - opens all .txt files in a folder and searches for any line
# that matches a user-supplied regular expression
import re, os
def search(regex, txt):
    searchRegex = re.compile(regex, re.I)
    result = searchRegex.findall(txt)
    print(result)
user_search = input('Enter the regular expression\n')
path = os.getcwd()
folder = os.listdir(path)
for file in folder:
    if file.endswith('.txt'):
        print(os.path.join(path, file))
        # read-only access is enough, and "with" closes the file afterwards
        with open(os.path.join(path, file), 'r') as txtfile:
            msg = txtfile.read()
        search(user_search, msg)
Check out "glob — Unix style pathname pattern expansion"
http://docs.python.org/library/glob.html
This problem just came up for me and I was able to fix it with pure python:
Link to the python docs is found here: 10.8. fnmatch — Unix filename pattern matching
Quote: "This example will print all file names in the current directory with the extension .txt:"
import fnmatch
import os
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
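When the names are already in a list, fnmatch.filter does the same matching in one call. A small sketch with made-up file names:

```python
import fnmatch

# Hypothetical directory listing, used only for illustration.
names = ["notes.txt", "script.py", "todo.txt", "archive.tar.gz"]

# filter() keeps the names that match the shell-style pattern.
txt_files = fnmatch.filter(names, "*.txt")
print(txt_files)  # ['notes.txt', 'todo.txt']
```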