I have the following requirements:
Read the files from a directory. I need to find them with a wildcard search because there are a lot of files in the directory.
If no file matches the pattern, I need to raise an exception. Please look at the code below; I am struggling with the exception.
I am able to find the files with the fnmatch function, but if no matching file exists I don't know how to raise an exception. Please look at the readstatus() function and help me add exception logic for the case where the file does not exist.
import os
import sys
import boto3
from botocore.client import Config
import configparser
import re
import os.path
import glob
import aws_encryption_sdk
import fnmatch
## Initialize the Parameters
def initconfig(input):
    config = configparser.ConfigParser()
    config.read_file(open( 'CONFIG_AIRBILLING.conf'))
    print('Code Name is :'+ input)
    global REMOTE_DIR,ACCESS_KEY_ID,ACCESS_SECRET_KEY,BUCKET_NAME,TARGET_DIR,FILENAME,SRC_DIR,FILEPATH
    ACCESS_KEY_ID = config.get('ACCESS', 'ACCESS_KEY_ID')
    print('ACCESS_ID_IS:'+ ACCESS_KEY_ID)
    ACCESS_SECRET_KEY = config.get('ACCESS', 'ACCESS_SECRET_KEY')
    BUCKET_NAME = config.get('ACCESS', 'BUCKET_NAME')
    SRC_DIR = config.get(input, 'SRC_DIR')
    FILENAME = config.get(input, 'FILENAME')
    # FILENAME=FILENAME+'*.txt'
    FILEPATH=SRC_DIR+'\\'+FILENAME
    print('File Path is:'+FILEPATH)
    TARGET_DIR = config.get(input, 'TARGET_DIR')
## This function will make sure file exist in Source directory
def readstatus():
    for file in os.listdir(SRC_DIR):
        if fnmatch.fnmatch(file,FILENAME+'*.txt'):
            result='True'
            print('****'+file)
            movefiles(file)
## This function will move the files to AWS S3 bucket
def movefiles(result):
    s3 = boto3.resource(
        's3',
        aws_access_key_id=ACCESS_KEY_ID,
        aws_secret_access_key=ACCESS_SECRET_KEY,
        config=Config(signature_version='s3v4')
    )
    s3.Bucket(BUCKET_NAME).put_object(Key=TARGET_DIR + '/' + result, Body=result)
    print('***File Moved***')
if __name__ == '__main__':
    initconfig(sys.argv[1])
    readstatus()
How about something like this:
The snippet below filters the list of files in SRC_DIR using fnmatch and the pattern, and stores the resulting list. It then checks whether the list is empty, i.e. no files matching the pattern were found, and if so raises an Exception. Otherwise, it goes on to process the individual files.
def readstatus():
    files = list(filter(lambda f: fnmatch.fnmatch(f, FILENAME+"*.txt"), os.listdir(SRC_DIR)))
    if not files:
        raise Exception("Files matching pattern not found!")
    for file in files:
        print(f"***{file}")
        movefiles(file)
In case the file is not found, you could expect an error like
Traceback (most recent call last):
File "main.py", line 20, in <module>
readstatus()
File "main.py", line 10, in readstatus
raise Exception("Files matching pattern not found!")
Exception: Files matching pattern not found!
TRY:
def readstatus():
    result = False
    for file in os.listdir(SRC_DIR):
        if fnmatch.fnmatch(file, FILENAME + '*.txt'):
            result = True
            print('****' + file)
            movefiles(file)
    if result != True:
        print("your file doesn't exist")
        #add exception logic here
The logic is that if a file matches the given filename pattern, the result variable is set to True; otherwise it stays False.
We use the value of result as a flag for whether a matching file existed or not.
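To fill in the `#add exception logic here` placeholder, one option is to collect the matches first and raise if there are none. This is only a minimal sketch: the function name `find_matching_files` and the choice of `FileNotFoundError` are assumptions, not part of the original code.

```python
import fnmatch
import os


def find_matching_files(src_dir, prefix):
    """Return names in src_dir matching '<prefix>*.txt', or raise if none match."""
    pattern = prefix + "*.txt"
    matches = [name for name in os.listdir(src_dir) if fnmatch.fnmatch(name, pattern)]
    if not matches:
        # FileNotFoundError is one reasonable choice; any exception type would work
        raise FileNotFoundError(f"No file matching {pattern!r} in {src_dir!r}")
    return matches
```

The caller can then loop over the returned names and pass each one to movefiles(), knowing the list is never empty.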
If you're using Python 3.4+, consider using pathlib to make a lot of these pains easier.
import pathlib
def readstatus():
    root = pathlib.Path(SRC_DIR)
    for fpath in root.glob(f"{FILENAME}*.txt"):
        movefiles(str(fpath))
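Note that Path.glob simply yields nothing when no file matches, so the exception still has to be raised explicitly. A hedged sketch of how that might look; `matching_paths` is a hypothetical helper name and `FileNotFoundError` is an assumed choice of exception:

```python
import pathlib


def matching_paths(src_dir, prefix):
    """Return sorted Paths matching '<prefix>*.txt' in src_dir, or raise if none."""
    # Path.glob returns a lazy generator, so materialise it before testing for emptiness
    paths = sorted(pathlib.Path(src_dir).glob(f"{prefix}*.txt"))
    if not paths:
        raise FileNotFoundError(f"No '{prefix}*.txt' files in {src_dir}")
    return paths
```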
Related
I've managed to find out the method to convert a file from one file extension to another (.evtx to .xml) using an external script. Below is what I am using:
os.system("file_converter.py file1.evtx > file1.xml")
This successfully converts a file from .evtx to .xml using the external script I called (file_converter.py).
I am now trying to find a way to use 'os.system', or perhaps another method, to convert more than one file at once. I would like my program to go into a folder and convert all 10 of the files I have there to .xml format.
My questions: how is this possible, given that os.system only takes one argument? And how do I make it look in a directory? The first file I converted was in my home directory, but the folder with the 10 files is inside another folder. I want the conversion to be done in one go, and I want each file to keep its name, with only the extension changed from '.evtx' to '.xml'.
The file "file_converter.py" is downloadable from here
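import threading
import os
def file_converter(path):
    # Write the XML next to the input file, swapping only the extension
    os.system("file_converter.py {0} > {1}".format(path, path.replace(".evtx", ".xml")))
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
for name in os.listdir(base_dir):
    # os.listdir returns bare file names, so join them with base_dir
    if name.endswith(".evtx"):
        threading.Thread(target=file_converter, args=(os.path.join(base_dir, name),)).start()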
Here is my sample code.
You can start multiple threads to run the conversions concurrently. The program goes through all the files in the directory and converts them.
EDIT python2.7 version
Now that we have more information about what you want, I can help you.
This program handles multiple files from one folder concurrently; it also searches the subfolders.
import subprocess
import os
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
commands_to_run = list()
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if is directory search for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file have the right extension and append the command to execute later
            if(entry.endswith(".evtx")):
                commands_to_run.append("C:\\Python27\\python.exe file_converter.py {0} > {1}".format(fullPath, fullPath.replace(".evtx", ".xml")))
    return allFiles
print "Searching for files"
file_list(base_dir)
print "Running conversion"
processes = [subprocess.Popen(command, shell=True) for command in commands_to_run]
print "Waiting for converted files"
for process in processes:
    process.wait()
print "Conversion done"
The subprocess module can be used in two ways:
subprocess.Popen: it starts the process and continues execution without waiting for it.
subprocess.call: it runs the process and waits for it to finish, then returns the exit status. An exit status of zero indicates that the process terminated successfully.
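The difference between the two can be seen in a small self-contained sketch. The child commands here are just the current Python interpreter running one-line scripts, chosen for illustration; they stand in for the real converter call.

```python
import subprocess
import sys

# Trivial child commands: the current interpreter running a one-line script
cmd_ok = [sys.executable, "-c", "print('converted')"]
cmd_fail = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Popen returns immediately; wait() blocks until the child exits and returns its exit status
proc = subprocess.Popen(cmd_ok, stdout=subprocess.DEVNULL)
popen_status = proc.wait()

# call() runs the command, waits for it, and returns the exit status directly
call_status = subprocess.call(cmd_fail)
```

Here `popen_status` ends up as the exit status of the first child and `call_status` as that of the second, so a non-zero value can be used to detect a failed conversion.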
EDIT python3.7 version
If you want to solve all your problems, just integrate the code you shared from GitHub into your program. You can easily implement it as a function.
import threading
import os
import Evtx.Evtx as evtx
import Evtx.Views as e_views
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        tmp_list.append(e_views.XML_HEADER)
        tmp_list.append("<Events>")
        for record in log.records():
            try:
                tmp_list.append(record.xml())
            except Exception as e:
                print(e)
        tmp_list.append("</Events>")
    with open(file_out, 'w') as final:
        final.writelines(tmp_list)
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if is directory search for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file have the right extension and append the command to execute later
            if(entry.endswith(".evtx")):
                threading.Thread(target=convert, args=(fullPath, fullPath.replace(".evtx", ".xml"))).start()
    return allFiles
print("Searching and converting files")
file_list(base_dir)
If you want to see the output files grow while they are being generated, just change convert as below:
def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        with open(file_out, 'a') as final:
            final.write(e_views.XML_HEADER)
            final.write("<Events>")
            for record in log.records():
                try:
                    final.write(record.xml())
                except Exception as e:
                    print(e)
            final.write("</Events>")
UPDATE
If you want to delete the '.evtx' files after the conversion, you can simply add the following lines at the end of the convert function:
try:
    os.remove(file_in)
except Exception as ex:
    raise ex
A try .. except is enough here, because the thread is started only when the input path is a file.
If the file doesn't exist, this function throws an exception, so it's necessary to check os.path.isfile() first.
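The check described above could be wrapped in a small helper. This is a sketch only; `remove_if_file` is a hypothetical name, not something from the original answer.

```python
import os


def remove_if_file(path):
    """Delete path if it is an existing regular file; return True if it was removed."""
    if not os.path.isfile(path):
        return False
    os.remove(path)
    return True
```

There is still a small window between the check and the removal in which another process could delete the file, so code that must be robust may prefer to catch FileNotFoundError around os.remove instead.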
import os, sys
# Default directory, or pass one as a command line argument
DIR = sys.argv[1] if len(sys.argv) > 1 else "D:/Test"
for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    name, ext = os.path.splitext(f)
    if ext == ".txt":
        new_path = os.path.join(DIR, f"{name}.xml")
        os.rename(path, new_path)
This iterates over a directory and renames every .txt file to .xml. It changes only the extension, not the file contents.
I am fairly new to Python and need some help. This is what I need:
We want to look for a file in a directory. If the file exists and its size is not zero bytes, I want to encrypt it using AWS KMS encryption and upload it to a bucket.
If the file doesn't exist, raise an exception.
If the file exists but is zero bytes, raise an exception.
Below is the code I could come up with, but I am sure there is a better way to write this, and I need your help. One thing I couldn't achieve is the encryption.
import os
import sys
import boto3
from botocore.client import Config
import configparser
import re
import os.path
import glob
## Initialize the Parameters
def initconfig(input):
config = configparser.ConfigParser()
config.read_file(open( 'CONFIG_AIRBILLING.conf'))
print('Code Name is :'+ input)
global REMOTE_DIR,ACCESS_KEY_ID,ACCESS_SECRET_KEY,BUCKET_NAME,TARGET_DIR,FILENAME,SRC_DIR,File,FILEPATH
ACCESS_KEY_ID = config.get('ACCESS', 'ACCESS_KEY_ID')
print('ACCESS_ID_IS:'+ ACCESS_KEY_ID)
ACCESS_SECRET_KEY = config.get('ACCESS', 'ACCESS_SECRET_KEY')
BUCKET_NAME = config.get('ACCESS', 'BUCKET_NAME')
SRC_DIR = config.get(input, 'SRC_DIR')
FILENAME = config.get(input, 'FILENAME')
FILENAME=FILENAME+'*.txt'
FILEPATH=SRC_DIR+'\\'+FILENAME
print('File Path is:'+FILEPATH)
TARGET_DIR = config.get(input, 'TARGET_DIR')
File='demo.txt'
## This function will make sure file exist in Source directory
def readstatus():
print('Startibg')
try:
with open(FILEPATH,'r') as f:
f.closed
result='True'
movefiles(result)
except (Exception) as e:
print('***Error:File Not Found or Accessible***')
result='False*'
raise e
## This function will move the files to AWS S3 bucket
def movefiles(result):
if result=='True':
s3 = boto3.resource(
's3',
aws_access_key_id=ACCESS_KEY_ID,
aws_secret_access_key=ACCESS_SECRET_KEY,
config=Config(signature_version='s3v4')
)
s3.Bucket(BUCKET_NAME).put_object(Key=TARGET_DIR + '/' + File, Body=File)
print('***File Moved***')
print("Done")
if __name__ == '__main__':
print(len(sys.argv))
initconfig(sys.argv[1])
print(sys.argv)
readstatus()
#initconfig(input=input())
#readstatus()
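The two checks in the requirements (file missing, file of zero bytes) can be sketched independently of the upload. This is a minimal illustration under assumptions: the helper name `require_non_empty_file` and the use of `ValueError` for the empty-file case are not from the original code, and the encryption step is not covered here.

```python
import os


def require_non_empty_file(path):
    """Return path if it exists and is non-empty; otherwise raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"File not found: {path}")
    if os.path.getsize(path) == 0:
        # ValueError is an assumed choice; a custom exception class would also work
        raise ValueError(f"File is empty (zero bytes): {path}")
    return path
```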
I have a folder with few articles and I would like to map text of each article into a common list in order to use the list for the tf-idf transformation. For example:
folder = [article1, article2, article3]
into list
list = ['text_of_article1', 'text_of_article2', 'text_of_article3']
def multiple_file(arg): #arg is path to the folder with multiple files
    '''Function opens multiple files in a folder and maps each of them to a list
    as a string'''
    import glob, sys, errno
    path = arg
    files = glob.glob(path)
    list = [] #list where file string would be appended
    for item in files:
        try:
            with open(item) as f: # No need to specify 'r': this is the default.
                list.append(f.read())
        except IOError as exc:
            if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
                raise # Propagate other kinds of IOError.
    return list
When I set the path to the folder with my articles I get an empty list. However, when I set it directly to one article, then that article appears in the list. How could I get all of them mapped into my list. :S
This is the code, not sure if this is what you had in mind:
def multiple_files(arg): #arg is path to the folder with multiple files
    '''Function opens multiple files in a folder and maps each of them to a list
    as a string'''
    import glob, sys, errno, os
    path = arg
    files = os.listdir(path)
    list = [] #list where file string would be appended
    for item in files:
        try:
            with open(item) as f: # No need to specify 'r': this is the default.
                list.append(f.read())
        except IOError as exc:
            if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
                raise # Propagate other kinds of IOError.
    return list
And this is the error:
Traceback (most recent call last):
File "<ipython-input-7-13e1457699ff>", line 1, in <module>
x = multiple_files(path)
File "<ipython-input-5-6a8fab5c295f>", line 10, in multiple_files
with open(item) as f: # No need to specify 'r': this is the default.
IOError: [Errno 2] No such file or directory: 'u02.txt'
Article No. 2 is actually the first one in the newly created list.
Suppose path == "/home/docs/guzdeh". If you just say glob.glob(path) you only get [path] because nothing else matches the pattern. You want glob.glob(path + "/*") to get everything in that directory, or glob.glob(path + "/*.txt") for all the txt files.
Alternatively you could use import os; os.listdir(path), which I think makes more sense.
UPDATE:
Regarding the new code, the problem is that os.listdir only returns the path relative to the directory listed. Therefore you need to combine the two for python to know where you're talking about. Add:
item = os.path.join(path, item)
before trying to open(item). You might also want to name your variables better.
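Putting the os.listdir and os.path.join advice together, a version of the function might look like the sketch below. The name `read_all_texts` and the decision to skip sub-directories with os.path.isfile (rather than catching EISDIR) are choices made for this illustration.

```python
import os


def read_all_texts(folder):
    """Return the contents of every regular file in folder, in sorted name order."""
    texts = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)  # listdir yields bare names
        if os.path.isfile(full_path):           # skip sub-directories
            with open(full_path) as handle:
                texts.append(handle.read())
    return texts
```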
I am working on a script to watch a folder, grab any new .zip files, and then upload them via FTP to a predetermined area. Right now FTP testing is being performed locally, since the environment isn't created yet.
The strategy I am taking is to first unzip into a local folder, then call ftplib's storbinary on the file from the local folder to send it to the FTP destination. The unzipping appears to work, but I am getting a "file does not exist" error, although I can see the file in the folder itself.
Also, is there any way to unzip directly into an FTP location? I haven't been able to find one, hence the approach I am taking.
Thanks. Local FTP info has been removed from the code. All paths in this code will be changed, most likely to be dynamic, but for now this is a local environment.
extractZip2.py
import zipfile
import ftplib
import os
import logging
import time
from socket import error as socket_error
#Logging Setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('__name__')
FTPaddress = ''
FTPusername = ''
FTPpassword = ''
ftp_destination_location = ''
path_to_watch = "C:/Users/206420055/Desktop/test2/"
before = dict ([(f,None) for f in os.listdir(path_to_watch)])
temp_destination_location = "C:/Users/206420055/Desktop/temp/"
def unzip(fullPath,temporaryPath):
with zipfile.ZipFile(fullPath, "r") as z :
logger.info("Unzipping {0}".format(fullPath))
z.extractall(temporaryPath)
logger.info("Unzipped into local directory {0}".format(temp_destination_location))
def check_or_create_ftp(session, folder):
"""
Checks to see if necessary folder for currenttab is available.
Creates the folder if not found, and enters it.
"""
if folder not in session.nlst():
logger.info('Directory for {0} does not exist, creating directory\n'.format(folder))
session.mkd(folder)
session.cwd(folder)
def check_or_create(temp_destination):
"""
Checks to see if local savepath exists. Will create savepath if not exists.
"""
if not os.path.exists(temp_destination):
logger.info('Directory for %s does not exist, creating directory\n' % temp_destination)
os.makedirs(str(temp_destination))
def transfer(address,username,password,filename,destination):
logger.info("Creating Session")
try:
session = session_init(address,username,password,destination)
except (socket_error,ftplib.error_perm) as e:
logger.error(str(e))
logger.error("Error in Session Init")
else:
try:
logger.info("Sending File {0}".format(filename))
send_file(session,filename)
except (IOError, OSError, ftplib.error_perm) as e:
logger.error(e)
def session_init(address,username,password,path):
session = ftplib.FTP(address,username,password)
check_or_create_ftp(session,path)
logger.info("Session Established")
return session
def send_file(session,filename):
file = open(filename,'rb')
logger.info('Sending File : STOR '+filename)
session.storbinary('STOR '+ filename, file)
file.close()
def delete_local_files(savepath, file):
logger.info("Cleaning Up Folder {0}".format(savepath))
os.remove(file)
while 1:
time.sleep(5)
after = dict ([(f,None) for f in os.listdir(path_to_watch)])
added = [f for f in after if not f in before]
removed = [f for f in before if not f in after]
if added: print "Added: ",", ".join(added)
before = after
check_or_create(temp_destination_location)
if added :
for file in added:
print file
if file.endswith('.zip'):
unzip(path_to_watch+file, temp_destination_location)
temp_files = os.listdir(temp_destination_location)
print("Temp Files {0}".format(temp_files))
for tf in temp_files:
print("TF {0}".format(tf))
transfer(FTPaddress,FTPusername,FTPpassword,tf,ftp_destination_location)
#delete_local_files(temp_destination_location,tf)
else:
pass
edit: adding error image
Seen above, we see the file in the temp folder. But the console obviously shows the error.
Just change it to
from glob import glob
zips_in_path = dict([(f, None) for f in glob("{base_path}/*.zip".format(base_path=path_to_watch))])
os.listdir does not include the path_to_watch part of the path; it returns just the file names. glob, however, does include it.
So you could also do
after = dict([(os.path.join(path_to_watch, f), None) for f in os.listdir(path_to_watch)])
Using either of these methods, you should be able to get the full path to the files.
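As a rough sketch of how the watch loop could track full paths instead of bare names, the helper below compares the current glob result against a set of paths already seen. `new_zip_paths` and its `seen` parameter are hypothetical names introduced for this example.

```python
import glob
import os


def new_zip_paths(watch_dir, seen):
    """Return full paths of .zip files in watch_dir that are not in the set `seen`."""
    current = set(glob.glob(os.path.join(watch_dir, "*.zip")))
    return sorted(current - seen)
```

Each returned path can be passed straight to zipfile.ZipFile or open(), regardless of the script's current working directory.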
I'm trying to write a Python code that will import LANDSAT satellite images into Grass GIS by adapting this code: http://grass.osgeo.org/wiki/LANDSAT
LANDSAT tiles are downloaded as folders, each containing 7 tiff images (Band 1-7). I therefore have a directory which contains several subdirectories (one for each LANDSAT tile).
My code at present is as follows:
#!/usr/bin/python
import os
import sys
import glob
import grass.script as grass
def import_tifs(dirpath):
    for dirpath, dirname, filenames in os.walk(dirpath):
        for dirname in dirpath:
            dirname = os.path.join(dirpath,dirname)
            for file in os.listdir(dirname):
                if os.path.splitext(file)[-1] != '.TIF':
                    continue
                ffile = os.path.join(dirname, file)
                name = os.path.splitext(file)[0].split(dirname)[-1]
                grass.message('Importing %s -> %s#%s...' % (file, name, dirpath))
                grass.run_command('r.in.gdal',
                                  flags = 'o',
                                  input = ffile,
                                  output = name,
                                  quiet = True,
                                  overwrite = True)
def main():
    if len(sys.argv) == 1:
        for directory in filter(os.path.isdir, os.listdir(os.getcwd())):
            import_tifs(directory)
    else:
        import_tifs(sys.argv[1])
if __name__ == "__main__":
    main()
I'm getting the following error:
Traceback (most recent call last):
  File "C:/Users/Simon/Documents/import_landsat2.py", line 40, in <module>
    main()
  File "C:/Users/Simon/Documents/import_landsat2.py", line 37, in main
    import_tifs(sys.argv[1])
  File "C:/Users/Simon/Documents/import_landsat2.py", line 17, in import_tifs
    for file in os.listdir(dirname):
WindowsError: [Error 3] The system cannot find the path specified: 'dirpath\\C/*.*'
Can anyone explain what is happening and what I need to do to fix it, or suggest an alternative? Thanks.
I believe your main problem is that dirname in os.walk() is a list (not a string), so the paths you build from it afterwards (namely dirname = os.path.join(dirpath,dirname)) are malformed. Here is one possible alternative. To test it, I used the full path to the directory as sys.argv[1], but you can make it more dynamic to suit your case. Also, avoid variable names such as file: it is not a keyword, but it shadows a built-in name in Python 2. I couldn't test your grass.* functions, but hopefully this is a clear enough example for you to tweak as needed. os.walk() already handles the directory traversal, so you can remove some of the directory-manipulating code:
def import_tifs(dirpath):
    for dirpath, dirname, filenames in os.walk(dirpath):
        # Iterate through the files in the current dir returned by walk()
        for tif_file in filenames:
            # If the suffix is '.TIF' (in any letter case), process
            if tif_file.upper().endswith('.TIF'):
                # This will contain the full path to your file
                full_path = os.path.join(dirpath, tif_file)
                # tif_file will already contain the name, so you can call from here
                grass.message('Importing %s -> %s#%s...' % (full_path, tif_file, dirpath))
                grass.run_command('r.in.gdal',
                                  flags = 'o',
                                  input = full_path,
                                  output = tif_file,
                                  quiet = True,
                                  overwrite = True)
I've just rewritten your code to walk the whole directory tree and look for a file extension, in this case '.tif':
#!/usr/bin/python
import os
import sys
def import_tifs(dirpath):
    for dirpath, dirname, filenames in os.walk(dirpath):
        for filename in filenames:
            name, extension = os.path.splitext(filename)
            if extension.lower() == ".tif":
                filepath = os.path.join(dirpath, filename)
                print(filepath, name, dirpath)
def main():
    if len(sys.argv) == 1:
        import_tifs(os.getcwd())
    else:
        import_tifs(sys.argv[1])
if __name__ == "__main__":
    main()
please check if that is what you are looking for...