Having trouble looping through os.walk(./) - python

I am trying to write a script that loops through the current directory and checks whether each file has a clone (a copy of the same file). Here's the code:
import os
import filecmp
scan_result = os.walk("./")
checked_files = []
for parent, dirs, files in scan_result:
    for file in files:
        file = parent + "\\" + file
        checked_files.append(file)
        for par, d, f in scan_result:
            for fi in f:
                data = par + "\\" + fi
                if data not in checked_files:
                    result = filecmp.cmp(file, data)
                    print(f"Comparing {file} with {data}")
                    if result:
                        print(f"Dupe Found for {file} and {data}")
                    else:
                        print(f"No luck for the file {file} with {data}")
But whenever I run the code, the script only checks one file in the same folder, never goes into any other directory, and then ends abruptly. Is there a way to fix this? Thanks.

Thanks to #rdas for answering this question in the comments. os.walk() returns a generator object, which can be consumed only once. Because I iterated over the same object in both the outer and the inner loop, the inner loop exhausted it on its first pass, so the outer loop had nothing left to visit and stopped after the first file. The simple solution to this problem is to replace this line of code
scan_result = os.walk("./")
with
scan_result = list(os.walk("./"))
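To see the difference in isolation, here is a minimal sketch written for Python 3. It assumes a throwaway directory created with tempfile, and the file names are made up for illustration. It iterates over the same os.walk() result twice, first as a generator and then as a list:

```python
import os
import tempfile

# Build a small throwaway tree: root/a.txt and root/sub/b.txt (hypothetical names).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("data")

# A generator can be consumed only once.
walker = os.walk(root)
first_pass = [parent for parent, dirs, files in walker]
second_pass = [parent for parent, dirs, files in walker]  # already exhausted

# A list can be iterated as often as needed.
snapshot = list(os.walk(root))
list_first = [parent for parent, dirs, files in snapshot]
list_second = [parent for parent, dirs, files in snapshot]

print(len(first_pass), len(second_pass))   # prints: 2 0
print(len(list_first), len(list_second))   # prints: 2 2
```

The list version holds every (parent, dirs, files) tuple in memory, which is fine for a modest directory tree but may be worth reconsidering for a very large one.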

Related

How to use os.system to convert all files in a folder at once using external python script

I've managed to find out the method to convert a file from one file extension to another (.evtx to .xml) using an external script. Below is what I am using:
os.system("file_converter.py file1.evtx > file1.xml")
This successfully converts a file from .evtx to .xml using the external script I called (file_converter.py).
I am now looking for a way to use os.system, or some other method, to convert more than one file at once. I would like my program to go into a folder and convert all 10 files in it to .xml format in one run.
My questions are these. os.system only takes one argument, so how can I pass it several files? The first file I converted was in my home directory, but the 10 files are in a folder inside another folder, so how do I point the script at that directory? I also want each file to keep its name, with only the extension changing from '.evtx' to '.xml'.
The file "file_converter.py" is downloadable from here
import threading
import os
def file_converter(file):
    os.system("file_converter.py {0} > {1}".format(file, file.replace(".evtx", ".xml")))
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
for file in os.listdir(base_dir):
    # os.listdir() returns bare names, so join each one with base_dir
    threading.Thread(target=file_converter, args=(os.path.join(base_dir, file),)).start()
Here is my sample code.
You can start multiple threads to run the conversions concurrently. The program goes through all files in the directory and converts each one.
EDIT python2.7 version
Now that we have more information about what you want, I can help you.
This program converts multiple files from one folder concurrently, and it also looks into the subfolders.
import subprocess
import os
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
commands_to_run = list()
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if is directory search for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file have the right extension and append the command to execute later
            if(entry.endswith(".evtx")):
                commands_to_run.append("C:\\Python27\\python.exe file_converter.py {0} > {1}".format(fullPath, fullPath.replace(".evtx", ".xml")))
    return allFiles
print "Searching for files"
file_list(base_dir)
print "Running conversion"
processes = [subprocess.Popen(command, shell=True) for command in commands_to_run]
print "Waiting for converted files"
for process in processes:
    process.wait()
print "Conversion done"
The subprocess module can be used in two ways:
subprocess.Popen: runs the process and continues execution without waiting for it
subprocess.call: runs the process and waits for it to finish; this function returns the exit status, and a value of zero indicates that the process terminated successfully
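The difference between the two calls can be sketched as follows. This is written for Python 3 and uses a harmless one-line child process in place of file_converter.py; the command is an assumption made purely for illustration:

```python
import subprocess
import sys

# Hypothetical child command: a Python one-liner that exits successfully.
command = [sys.executable, "-c", "print('converted')"]

# Popen starts the child and returns immediately; wait() blocks until it ends
# and returns the exit status.
process = subprocess.Popen(command, stdout=subprocess.DEVNULL)
popen_status = process.wait()

# call() starts the child, waits for it, and returns the exit status.
call_status = subprocess.call(command, stdout=subprocess.DEVNULL)

print(popen_status, call_status)
```

Starting several Popen objects before waiting on any of them is what lets the conversions run side by side, whereas a loop of call() invocations would run them one after another.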
EDIT python3.7 version
If you want to solve all your problems, just build the code you shared from GitHub into your program. You can easily wrap it in a function.
import threading
import os
import Evtx.Evtx as evtx
import Evtx.Views as e_views
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        tmp_list.append(e_views.XML_HEADER)
        tmp_list.append("<Events>")
        for record in log.records():
            try:
                tmp_list.append(record.xml())
            except Exception as e:
                print(e)
        tmp_list.append("</Events>")
    with open(file_out, 'w') as final:
        final.writelines(tmp_list)
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if is directory search for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file have the right extension and append the command to execute later
            if(entry.endswith(".evtx")):
                threading.Thread(target=convert, args=(fullPath, fullPath.replace(".evtx", ".xml"))).start()
    return allFiles
print("Searching and converting files")
file_list(base_dir)
If you want the output files to appear while they are being generated, change the convert function as follows:
def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        with open(file_out, 'a') as final:
            final.write(e_views.XML_HEADER)
            final.write("<Events>")
            for record in log.records():
                try:
                    final.write(record.xml())
                except Exception as e:
                    print(e)
            final.write("</Events>")
UPDATE
If you want to delete the '.evtx' files after the conversion, you can simply add the following lines at the end of the convert function:
    try:
        os.remove(file_in)
    except Exception as ex:
        raise ex
Here you only need try .. except, because the thread is started only if the input value is a file.
If the file doesn't exist, os.remove() raises an exception, so it's necessary to check os.path.isfile() first.
import os, sys
DIR = "D:/Test"
# ...or as a command line argument
DIR = sys.argv[1]
for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    name, ext = os.path.splitext(f)
    if ext == ".txt":
        new_path = os.path.join(DIR, f"{name}.xml")
        os.rename(path, new_path)
This iterates over a directory and renames every .txt file to .xml. Note that os.rename() only changes the file name; the contents are not converted.
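The answers above build the output name either with str.replace or with os.path.splitext. As a rough sketch of how the two ideas could be combined, the hypothetical helper below (plan_conversions is not part of any answer) walks a tree and pairs every .evtx file with its .xml target. It uses splitext so that only the final extension is swapped, and it runs on a throwaway directory with made-up file names:

```python
import os
import tempfile

def plan_conversions(base_dir, src_ext=".evtx", dst_ext=".xml"):
    """Return (source, target) path pairs for every src_ext file under base_dir."""
    pairs = []
    for parent, dirs, files in os.walk(base_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() == src_ext:
                pairs.append((os.path.join(parent, name),
                              os.path.join(parent, stem + dst_ext)))
    return sorted(pairs)

# Usage on a throwaway tree with made-up file names.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "logs"))
for rel in ("system.evtx", os.path.join("logs", "app.evtx"), "notes.txt"):
    open(os.path.join(root, rel), "w").close()

pairs = plan_conversions(root)
for source, target in pairs:
    print(source, "->", target)
```

Each pair could then be handed to whichever conversion routine is in use, for example one thread or one subprocess per pair.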

Find files and then find a string in those files

I have written a function that finds all of the version.php files in a path. I am trying to take the output of that function and find a line from that file. The function that finds the files is:
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                print os.path.join(root,file)
find_file()
There are several version.php files in the path and I would like to return a string from each of those files.
Edit:
Thank you for the suggestions, but my implementation of the code didn't fit my needs. I was able to figure it out by creating a list and passing each item to the second part. This may not be the best way to do it, as I've only been doing Python for a few days.
import os
import re
def cmsoutput():
    fileList = []
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                fileList.append(os.path.join(root,file))
    for path in fileList:
        with open(path) as f:
            for line in f:
                if line.startswith("$wp_version ="):
                    version_number = line[15:20]
                    inst_path = re.sub('wp-includes/version.php', '', path)
                    version_number = re.sub('\';', '', version_number)
                    print inst_path + " = " + version_number
cmsoutput()
Since you want to use the output of your function, you have to return something. Printing it does not cut it. Assuming everything works it has to be slightly modified as follows:
import os
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                return os.path.join(root,file)
foundfile = find_file()
Now variable foundfile contains the path of the file we want to look at. Looking for a string in the file can then be done like so:
with open(foundfile, 'r') as f:
    content = f.readlines()
    for lines in content:
        if '$wp_version =' in lines:
            print(lines)
Or in function version:
def find_in_file(string_to_find, file_to_search):
    with open(file_to_search, 'r') as f:
        content = f.readlines()
        for lines in content:
            if string_to_find in lines:
                return lines
# which you can call like this:
find_in_file("$wp_version =", find_file())
Note that both functions above stop at the first match: find_file returns the first version.php it encounters, and find_in_file returns the first line containing the string you are looking for. If you want to get them all, the code has to be modified.
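One possible modification that collects every match is sketched below, written for Python 3. The helper find_files and the list-returning variant of find_in_file are illustrative names rather than the answerer's code, and the sample tree with its site names and version numbers is made up:

```python
import os
import tempfile

def find_files(top, target="version.php"):
    """Yield the path of every file named target below top."""
    for root, folders, files in os.walk(top):
        if target in files:
            yield os.path.join(root, target)

def find_in_file(string_to_find, file_to_search):
    """Return every line of file_to_search that contains string_to_find."""
    with open(file_to_search, "r") as f:
        return [line.rstrip("\n") for line in f if string_to_find in line]

# Usage on a throwaway tree; the site names and version numbers are made up.
top = tempfile.mkdtemp()
for site, version in (("site1", "4.9.8"), ("site2", "5.2.1")):
    includes = os.path.join(top, site, "wp-includes")
    os.makedirs(includes)
    with open(os.path.join(includes, "version.php"), "w") as f:
        f.write("<?php\n$wp_version = '%s';\n" % version)

results = {}
for path in find_files(top):
    results[path] = find_in_file("$wp_version =", path)
    print(path, results[path])
```

Using yield instead of return lets the caller loop over all matching files without building the full list first.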

command line arguments using python

I have this code that lets the user choose which file to update by passing an argument on the command line. It then does some more things, but I have not included those here:
import sys
import os
from sys import argv
path = "/home/Desktop/python/test"
files = os.walk( path )
filename = argv[1]
if filename in files:
    inputFile = open(filename, 'r')
else:
    print "no match found"
    sys.exit()
inputFile.close()
When I run the script it keeps giving me "no match found", but I'm pretty sure the file is there. I can't see what I'm doing wrong.
os.walk() returns a generator that produces a (root, directories, files) tuple for each directory it visits.
You can't test that generator for a single file with a simple in membership test.
You'll also need to restore the full path; you can't just open an unqualified filename without the directory it lives in. Just use a for loop here, and break once you have found the file. The else suite on a for loop only executes when you did not use break (i.e. the file was not found):
path = "/home/Desktop/python/test"
filename = argv[1]
for root, directories, files in os.walk(path):
    if filename in files:
        full_path = os.path.join(root, filename)
        break
else:
    print "no match found"
    sys.exit()
with open(full_path) as input_file:
    # do something with the file
I added a with statement to handle the lifetime of the file object; once the with block is exited the file is automatically closed for you.
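The for ... else pattern described above can be tried out on its own with the following Python 3 sketch. The locate function is a hypothetical wrapper added for illustration, and the directory and file names are made up:

```python
import os
import tempfile

def locate(filename, top):
    """Return the full path of the first file named filename below top, or None."""
    for root, directories, files in os.walk(top):
        if filename in files:
            full_path = os.path.join(root, filename)
            break
    else:
        # Reached only when the loop finished without hitting break.
        return None
    return full_path

# Usage on a throwaway tree; "report.txt" is a made-up name.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "nested"))
with open(os.path.join(top, "nested", "report.txt"), "w") as fh:
    fh.write("hello")

found = locate("report.txt", top)
missing = locate("absent.txt", top)
print(found)
print(missing)
```

Returning None instead of calling sys.exit() leaves the decision about what to do with a missing file to the caller.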
Alternatively, you may use the following code snippet. Note that it only checks the top-level directory and does not search subdirectories.
import os.path
from sys import argv
filename = argv[1]
path = "/home/Desktop/python/test/"
if os.path.isfile(path + filename):
    inputFile = open(path + filename, "r")
else:
    print "File Not Found"

Matching MD5 Hashes from another script

OK, so I'm trying to create a script that searches a directory for files with known hashes. Here is my first script:
Hash.py
import hashlib
from functools import partial
#call another python script
execfile("knownHashes.py")
def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()
print "Hash of is: "
print(md5sum('photo.jpg'))
if md5List == md5sum:
    print "Match"
knownHashes.py
print ("Call worked\n")
md5List = "01071709f67193b295beb7eab6e66646" + "5d41402abc4b2a76b9719d911017c592"
The problem at the moment is that I have to manually type in the name of the file I want to hash, where it says photo.jpg. Also, I haven't got the md5List to work yet.
I want the script to eventually work like this:
python hash.py <directory>
1 match
cookies.jpg matches hash
So how can I get the script to search a directory rather than manually type in what file to hash? Also, how can I fix the md5List because that is wrong?
You can get a list of files in the current working directory using the following. This is the directory that you run the script from.
import os
#Get list of files in working directory
files_list = os.listdir(os.getcwd())
You can iterate through the list using a for loop:
for file in files_list:
    #do something
As equinoxel also mentioned below, you can use os.walk() as well.
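Putting these pieces together, a minimal Python 3 sketch of the whole task might look like the following. It assumes the known hashes are kept as a set of separate strings rather than one concatenated string, and it compares each file's digest against that set. The find_matches helper and the sample file names are hypothetical; the two hash values are the ones from the question:

```python
import hashlib
import os
import tempfile
from functools import partial

# Known hashes kept as a set of separate strings (values from the question).
KNOWN_HASHES = {
    "01071709f67193b295beb7eab6e66646",
    "5d41402abc4b2a76b9719d911017c592",
}

def md5sum(filename, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(filename, mode="rb") as f:
        for buf in iter(partial(f.read, chunk_size), b""):
            digest.update(buf)
    return digest.hexdigest()

def find_matches(directory, known=KNOWN_HASHES):
    """Return the paths below directory whose MD5 digest is in known."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if md5sum(path) in known:
                matches.append(path)
    return matches

# Usage on a throwaway directory; the file names are made up.
directory = tempfile.mkdtemp()
with open(os.path.join(directory, "cookies.jpg"), "wb") as f:
    f.write(b"hello")  # the MD5 of b"hello" is the second known hash
with open(os.path.join(directory, "other.jpg"), "wb") as f:
    f.write(b"something else")

matches = find_matches(directory)
print(len(matches), "match")
for path in matches:
    print(os.path.basename(path), "matches hash")
```

To run it as python hash.py <directory>, the directory could be taken from sys.argv[1] instead of the temporary folder used here.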
This simple little gist should solve most of your problems. It's understandable if you don't like using OOP for this problem, but I believe all of the important conceptual pieces are here in a pretty clean, concise representation. Let me know if you have any questions.
import os
class PyGrep:
    def __init__(self, directory):
        self.directory = directory
    def grab_all_files_with_ending(self, file_ending):
        """Will return absolute paths to all files with given file ending in self.directory"""
        walk_results = os.walk(self.directory)
        file_check = lambda walk: len(walk[2]) > 0
        ending_prelim = lambda walk: file_ending in " ".join(walk[2])
        relevant_results = (entry for entry in walk_results if file_check(entry) and ending_prelim(entry))
        return (self.grab_files_from_os_walk(result, file_ending) for result in relevant_results)
    def grab_files_from_os_walk(self, os_walk_tuple, file_ending):
        format_check = lambda file_name: file_ending in file_name
        directory, subfolders, file_paths = os_walk_tuple
        return [os.path.join(directory, file_path) for file_path in file_paths if format_check(file_path)]

Why is my write function not creating a file?

According to all the sources I've read, the open method creates a file or overwrites one with an existing name. However, when I try to use it I get an error:
File not found - newlist.txt (Access is denied)
I/O operation failed.
I tried to read a file, and couldn't. Are you sure that file exists? If it does exist, did you specify the correct directory/folder?
def getIngredients(path, basename):
    ingredient = []
    filename = path + '\\' + basename
    file = open(filename, "r")
    for item in file:
        if item.find("name") > -1:
            startindex = item.find("name") + 5
            endindex = item.find("<//name>") - 7
            ingredients = item[startindex:endindex]
            ingredient.append(ingredients)
    del ingredient[0]
    del ingredient[4]
    for item in ingredient:
        printNow(item)
    file2 = open('newlist.txt', 'w+')
    for item in ingredient:
        file2.write("%s \n" % item)
As you can see, I'm trying to write the list I've made into a file, but it's not creating it like it should. I've tried all the different modes for the open function and they all give me the same error.
It looks like you do not have write access to the current working directory. You can get the Python working directory with import os; print os.getcwd().
You should then check whether you have write access in this directory. This can be done in Python with
import os
cwd = os.getcwd()
print "Write access granted to current directory", cwd, '>', os.access(cwd, os.W_OK)
If you get False (no write access), then you must put your newlist.txt file somewhere else (maybe at path + '/newlist.txt'?).
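As a rough Python 3 sketch of that check-then-fall-back idea, the hypothetical helper writable_dir below returns the preferred directory when it exists and is writable, and the system temporary directory otherwise. A freshly created temporary folder stands in for the current working directory, and the ingredient names are made-up sample data:

```python
import os
import tempfile

def writable_dir(preferred):
    """Return preferred if it exists and is writable, else the system temp dir."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()

ingredients = ["flour", "sugar", "butter"]  # made-up sample data

preferred = tempfile.mkdtemp()  # stands in for os.getcwd()
missing = os.path.join(preferred, "does-not-exist")

target = os.path.join(writable_dir(preferred), "newlist.txt")
with open(target, "w") as out:
    for item in ingredients:
        out.write("%s\n" % item)

print("Wrote", target)
print("Fallback for a missing directory:", writable_dir(missing))
```

The with statement also closes the file when the block ends, which guarantees the written lines are flushed to disk.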
Are you certain that the directory you're trying to create the file in exists?
If it does NOT, then the OS won't be able to create the file.
This looks like a permissions problem.
Either the directory does not exist or your user doesn't have permission to write to this directory.
I guess the possible problems may be:
1) You are passing the path and basename as parameters. If you pass them as plain strings like this, you may get this problem:
For example:
def getIngredients(path, basename):
    ingredient = []
    filename = path + '\\' + basename
getIngredients("D","newlist.txt")
If you pass the parameters this way, you are effectively doing this:
filename = "D" + "\\" + "newlist.txt"
2) You did not include a colon (:) after the drive letter in path, so the filename above becomes D\newlist.txt instead of D:\newlist.txt.
3) Maybe the file does not exist.
