I have the code below, which downloads attachments into parent_directory over an API connection.
Problem: The code works fine, but it gets stuck when some of the folders already exist.
Solution: How can I make this code skip existing folders? If a folder exists, it should do nothing and move on to the next iteration.
import pandas as pd
import os
import zipfile
parent_directory = "folderpath"
csv_file_dir = "myfilepath.csv"
user = "API_username"
key = "API_password"
os.chdir(parent_directory)
bdr_data = pd.read_csv(csv_file_dir)
api_first = "… " + user + ":" + key + "…"
for index, row in bdr_data.iterrows():
    #print(row['url_attachment'])
    name = row['Ref_Num']
    os.makedirs(parent_directory + name)
    os.chdir(parent_directory + name)
    url = api_first + row['url_attachment'] + " -o attachments.zip"
    os.system(url)
    os.chdir(parent_directory)
You can do it like this.
for index, row in bdr_data.iterrows():
    name = row['Ref_Num']
    child_dir = parent_directory + name
    if os.path.exists(child_dir):  # check if the folder exists
        print(f'{child_dir} already exists')  # you may want to know what was skipped
        continue  # skip this iteration
    os.makedirs(child_dir)  # if the folder is not found, do what you need
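If, instead of skipping the whole iteration, you only want to avoid the error when the folder already exists, os.makedirs also accepts exist_ok=True (Python 3.2+). A minimal sketch, not from the original answer:

for index, row in bdr_data.iterrows():
    name = row['Ref_Num']
    child_dir = parent_directory + name
    os.makedirs(child_dir, exist_ok=True)  # no error if child_dir already exists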
config.yml example:

DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload

GUI image (screenshot not included).
This may not be the cleanest route to take.
I'm making a GUI that creates a config.yml file for another Python script I'm working with.
Using PySimpleGUI, my button isn't functioning the way I'd expect it to. It currently and accurately checks for the reference name (in this example, CurrentMinuteLoad) and rejects it if it already exists, but it skips the check for the table name (so the elif branch never runs). Adding the table still works; I'm just not getting the double check that I want. Also, I have to hit the Okay button twice in the GUI for it to work, a weird quirk that doesn't quite make sense to me.
def add_table():
    window2.read()
    with open("config.yml", "r") as h:
        if values['new_ref'] in h.read():
            sg.popup('Reference name already exists')
        elif values['new_db'] in h.read():
            sg.popup('Table name already exists')
        else:
            with open("config.yml", "a+") as f:
                f.write("\n " + values['new_ref'] + ":")
                f.write("\n CSV_File:" + values['new_csv'])
                f.write("\n Table_Name:" + values['new_db'])
                f.close()
            sg.popup('The reference "' + values['new_ref'] + '" has been included and will add the table "' + values['new_db'] + '" to PG Admin during the next scheduled upload')
When you use h.read(), save the returned value: the file is read like a stream, so subsequent calls to the method return an empty string.
Try editing the code like this:
with open ("config.yml","r") as h:
content = h.read()
if values['new_ref'] in content:
sg.popup('Reference name already exists')
elif values['new_db'] in content:
sg.popup('Table name already exists')
else:
# ...
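To see why the original version misbehaves, here is a small standalone illustration (not part of the original code) of the stream behaviour of read():

with open("config.yml", "r") as h:
    first = h.read()   # returns the whole file
    second = h.read()  # returns '' because the file position is now at the end
print(second == '')    # True, so the elif ends up checking against an empty string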
You should update the YAML file using a real YAML parser. That allows you to check for duplicate values without using in, which gives false positives when a new value is a substring of an existing value (or key).
In the following I add the values twice and show the resulting YAML. The first time around, the check on new_ref and new_db does not find a match, even though each is a substring of an existing value. The second time, using the same values, there is of course a match on the previously added values.
import sys
import ruamel.yaml
from pathlib import Path
def add_table(filename, values, verbose=False):
    error = False
    yaml = ruamel.yaml.YAML()
    data = yaml.load(filename)
    dbtables = data['DBtables']
    if values['new_ref'] in dbtables:
        print(f'Reference name "{values["new_ref"]}" already exists')  # use sg.popup in your code
        error = True
    for k, v in dbtables.items():
        if values['new_db'] in v.values():
            print(f'Table name "{values["new_db"]}" already exists')
            error = True
    if error:
        return
    dbtables[values['new_ref']] = d = {}
    for x in ['new_cv', 'new_db']:
        d[x] = values[x]
    yaml.dump(data, filename)
    if verbose:
        sys.stdout.write(filename.read_text())

values = dict(new_ref='CurrentMinuteL', new_cv='trend_csv', new_db='currentminutel')
add_table(Path('config.yaml'), values, verbose=True)
print('========')
add_table(Path('config.yaml'), values, verbose=True)
which gives:
DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload
  CurrentMinuteL:
    new_cv: trend_csv
    new_db: currentminutel
========
Reference name "CurrentMinuteL" already exists
Table name "currentminutel" already exists
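The substring false positive that the plain-text check suffers from is easy to reproduce on its own (a small illustration, not part of the answer's code):

content = "CurrentMinuteLoad:\n  CSV_File: trend.csv\n  Table_Name: currentminuteload"
print('CurrentMinuteL' in content)   # True, although only CurrentMinuteLoad exists
print('currentminutel' in content)   # True, although only currentminuteload exists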
So I have an application that generally does a good job of collecting information, moving files from one AWS S3 bucket to another, and then processing them, but it doesn't do a good job when people prefix the file name with an arbitrary string.
Currently, I look for files with glob:
dump_files = glob.glob('docker-support*.zip')
The logic I have built only looks for file names that use docker-support as the main identifier.
However, I need it to account for times when people do something like
super_Secret123-Production-whatever-docker-support*.zip
Basically, I would like the function to rename such files via that dump_files variable.
Should I just set the variable to something like this:
dump_files = glob.glob('*docker-support*.zip')
or
dump_files = glob.glob('/^(.*?)\docker-support*.zip')
The main thing is that I want to pick the file up, rename it, and strip off the part of the name that comes before the actual file name needed for processing (docker-support*.zip), because the application needs to look for files in S3 named in exactly that format.
Code that handles this:
#!/usr/bin/env python3
# main execution loop for dump analysis tool
# Author: Bryce Ryan, Mirantis Inc.
#
# checks for new files in dump_originals, when found, runs run-parts against that file
# v1.1
# pause.main.loop check
# improved error handling
# escape file name to run-parts to avoid metacharacters
#
#
import os
import tempfile
import time
import zipfile
import logging
import shutil
import glob
from datetime import date
import sys
from os import path
logging.basicConfig(filename='/dump/logs/analyzer_logs.txt', level=logging.DEBUG, format='%(asctime)s %(message)s', datefmt='%Y-%m-%dT%H:%M:%S%z' )
ROOT_DIR = os.path.abspath('..')
logging.debug("ROOT_DIR: {}".format(ROOT_DIR))
DUMP_DIR = os.path.join(ROOT_DIR, 'dump_originals')
logging.debug("DUMP_DIR: {}".format(DUMP_DIR))
WORK_DIR = os.path.join(ROOT_DIR, 'work_dir')
logging.debug("WORK_DIR: {}".format(WORK_DIR))
# can we actually create a file? just because we have perms, or think we do, doesn't mean there are
# enough inodes or capacity to do basic stuff.
with open(os.path.join(DUMP_DIR, "testfile"), 'w'):
    pass
logging.info("Beginning event loop for lodestone. Looking for new files in {}".format(DUMP_DIR))
print("Beginning event loop for lodestone.")
sys.stdout.flush()
os.chdir(DUMP_DIR)
logging.basicConfig(filename="analyzer.logs", level=logging.DEBUG)
while True:
    # here at the top of the loop, check to see if we should wait for a bit
    # typically, because of testing or maintenance
    # if the magic, undocumented file exists, wait for 5 sec and check again
    # do this forever
    while path.exists("/dump/pause.main.loop"):
        print("Pausing main loop for 60s, waiting on /dump/pause.main.loop")
        time.sleep(60)
    dump_files = glob.glob('docker-support*.zip')
    try:
        if dump_files[0] != '':
            logging.debug("files found")
            print("================== BEGIN PROCESSING NEW FILE ========= ")
            print("File found:", dump_files[0])
            print(" ")
            logging.info("Processing new file: " + dump_files[0])
            sys.stdout.flush()
            support_dump_file = dump_files[0]
            # check that it's an actual zip; if not, ignore it
            if not zipfile.is_zipfile(support_dump_file):
                print("File: " + str(support_dump_file))
                print("Inbound file is not recognized as a zip file.\n\n")
                logging.info("Inbound file not recognized as a zip file.")
                # now move it out of the way so we don't see it again;
                # ok if exists on destination and we ignore the error
                shutil.move(support_dump_file, "../dump_complete/")
                # no further processing, so back to the top
                sys.stdout.flush()
                continue
            temp_dir = tempfile.mkdtemp(prefix='dump.', dir=WORK_DIR)
            os.chmod(temp_dir, 0o777)
            logging.info("temp_dir is: " + temp_dir)
            # cmd = ROOT_DIR + "/utilities/run-parts --exit-on-error --arg=analyze --arg=" + DUMP_DIR + " --arg=" + support_dump_file + " --arg=" + temp_dir + " " + ROOT_DIR + "/analysis"
            cmd = ROOT_DIR + "/utilities/run-parts --arg=analyze --arg=" + DUMP_DIR + " --arg=\'" + support_dump_file + "\' --arg=" + temp_dir + " " + ROOT_DIR + "/analysis"
            print(cmd)
            logging.info("Will execute: " + cmd)
            sys.stdout.flush()
            try:
                retcode = os.system(cmd)
                tempdir = temp_dir
                if retcode == 1:
                    print("Removing temporary work_dir")
                    logging.debug("Removing temporary work_dir %s", tempdir)
                    shutil.rmtree(tempdir, ignore_errors=True)
                    sys.stdout.flush()
            finally:
                print("Finally block for cmd. . .")
                print("Removing temporary work_dir")
                logging.debug("Removing work_dir " + tempdir)
                print(tempdir)
                sys.stdout.flush()
                # shutil.rmtree(tempdir, ignore_errors=True)
                os.system('/bin/rm -rf ' + tempdir)
                sys.stdout.flush()
    except:
        pass
    # pause for a moment; save some processor cycles
    sys.stdout.flush()
    time.sleep(1)
Right now the script does not include the function that will do this renaming.
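A minimal sketch of what that renaming step might look like; the strip_prefix helper and the use of str.find are my own assumptions, not part of the existing script:

def strip_prefix(filename, marker='docker-support'):
    # Cut everything before the marker, so a prefixed upload such as
    # 'super_Secret123-Production-whatever-docker-support.zip' becomes 'docker-support.zip';
    # files already starting with the marker are returned unchanged.
    idx = filename.find(marker)
    return filename if idx <= 0 else filename[idx:]

for dump_file in glob.glob('*docker-support*.zip'):
    wanted = strip_prefix(dump_file)
    if wanted != dump_file:
        os.rename(dump_file, wanted)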
I have the following mailboxes on my IMAP server (refer to the attached screenshot).
I want to select only the mailbox Folder1 and check whether it has any sub-directories. I already tried the following code:
svr = imaplib.IMAP4_SSL(imap_address)
svr.login(user, pwd)
svr.select('inbox')  # <<<<<<<<<<<<<<<<<
rv, data = svr.search(None, "ALL")
test, folders = svr.list('""', '*')
print(folders)
I thought changing 'inbox' to 'folder1' (the statement indicated with arrows) would select Folder1, and I could then retrieve its sub-directories. But nothing happens, and it still shows the same result as 'inbox'.
Can somebody help me understand what I am doing wrong here?
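One thing worth noting (my observation, not from the answer below): select() only chooses which mailbox subsequent search()/fetch() calls operate on, while list() takes its own reference name as the first argument. Listing the children of Folder1 would look roughly like this, assuming the mailbox is literally named Folder1:

rv, children = svr.list('Folder1', '*')  # independent of whatever was select()ed
print(children)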
Since I would not know the name of the folder in advance, I tried a different approach: first collect all the folders in the root directory, then parse them one by one to check whether any sub-directory exists.
root_folders = []
svr = imaplib.IMAP4_SSL(imap_address)
svr.login(user, pwd)
svr.select('inbox')
response, folders = svr.list('""', '*')

def parse_mailbox(data):
    flags, b, c = data.partition(' ')
    separator, b, name = c.partition(' ')
    return flags, separator.replace('"', ''), name.replace('"', '')

def subdirectory(folder):
    # For directories such as 'Deleted Items' or 'Sent Items' that contain whitespace,
    # the name needs to be passed in double quotes, hence '"' + folder + '"'
    test, folders = svr.list('""', '"' + folder + '/*"')
    if folders is not None:
        print('Subdirectory exists')  # you can also call parse_mailbox to find the name of the sub-directory

for mbox in folders:
    flags, separator, name = parse_mailbox(bytes.decode(mbox))
    fmt = '{0} : [Flags = {1}; Separator = {2}'
    if len(name.split('/')) > 1:
        continue
    else:
        root_folders.append(name)

for folder in root_folders:
    subdirectory(folder)
Although this code is tailored from my script, it should serve as a solution to the question as posed.
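For example, a typical decoded LIST response line is parsed like this (illustrative values, assuming a single flag and '/' as the hierarchy separator):

line = '(\\HasNoChildren) "/" "Folder1/Sub1"'
flags, separator, name = parse_mailbox(line)
# flags == '(\\HasNoChildren)', separator == '/', name == 'Folder1/Sub1'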
I'm working on a SQLAlchemy dialect for Apache Drill and I've run into an issue that I can't quite seem to figure out.
The basic problem is that SQLAlchemy is generating a query like the one below:
SELECT `field1`, `field2`
FROM dfs.test.data.csv LIMIT 100
which fails because data.csv needs backticks around it as shown below:
SELECT `field1`, `field2`
FROM dfs.test.`data.csv` LIMIT 100
I've defined the various visit_() functions in the dialect's compiler but these seem to have no effect.
This took some time to figure out, and I thought I'd post the result so that if anyone else runs into this issue, they'll have a point of reference as to how to solve it.
Here is the final working code:
https://github.com/JohnOmernik/sqlalchemy-drill/blob/master/sqlalchemy_drill/base.py
Here is what ultimately solved the issue:
def __init__(self, dialect):
    super(DrillIdentifierPreparer, self).__init__(dialect, initial_quote='`', final_quote='`')

def format_drill_table(self, schema, isFile=True):
    formatted_schema = ""
    num_dots = schema.count(".")
    schema = schema.replace('`', '')
    # For a file, the last section will be the file extension
    schema_parts = schema.split('.')
    if isFile and num_dots == 3:
        # Case for file + workspace
        plugin = schema_parts[0]
        workspace = schema_parts[1]
        table = schema_parts[2] + "." + schema_parts[3]
        formatted_schema = plugin + ".`" + workspace + "`.`" + table + "`"
    elif isFile and num_dots == 2:
        # Case for file and no workspace
        plugin = schema_parts[0]
        formatted_schema = plugin + "." + schema_parts[1] + ".`" + schema_parts[2] + "`"
    else:
        # Case for non-file plugins or incomplete schema parts
        for part in schema_parts:
            quoted_part = "`" + part + "`"
            if len(formatted_schema) > 0:
                formatted_schema += "." + quoted_part
            else:
                formatted_schema = quoted_part
    return formatted_schema
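So, for example, a workspace-qualified file path comes out with the file name quoted (here preparer stands for an instance of the identifier preparer above):

preparer.format_drill_table("dfs.test.data.csv", isFile=True)
# -> dfs.`test`.`data.csv`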
I have this snippet of code that looks like this:
server_directory = "/Users/storm/server"
def get_directory(self, username):
    home = server_directory + "/" + username
    typic = os.getcwd()
    if typic == server_directory:
        return "/"
    elif typic == home:
        return "~"
    else:
        return typic
Every time I change into a directory other than the two nice ones (the server directory and the user's home directory), it comes back as something like /Users/storm/server/svr_user. How do I make it /svr_user2 instead of /Users/storm/server/svr_user, since I would like to emulate a home directory and a virtual "root" directory?
Although you can do a lot with string manipulation, a better way would be using os.path:
import os
src = '/Users/storm/server/svr_user'
dst = '/svr_user2'
a = '/Users/storm/server/svr_user/x/y/z'
os.path.join(dst, os.path.relpath(a, src))
returns
'/svr_user2/x/y/z'
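Applied to the original get_directory, that could look roughly like this (a sketch, assuming anything below server_directory should be shown relative to that virtual root):

def get_directory(self, username):
    home = server_directory + "/" + username
    typic = os.getcwd()
    if typic == server_directory:
        return "/"
    elif typic == home:
        return "~"
    else:
        # show the path relative to the server root as a virtual absolute path,
        # e.g. /Users/storm/server/svr_user -> /svr_user
        return "/" + os.path.relpath(typic, server_directory)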
The not-so-politically-correct alternative to eumiro's answer would be:
import re
src = '/Users/storm/server/svr_user'
dst = '/svr_user2'
a = '/Users/storm/server/svr_user/x/y/z'
re.sub(src, dst, a, 1)
Which yields:
'/svr_user2/x/y/z'
Notice the 1, which means replace at most once. (Since re.sub treats src as a regular expression, a path containing regex metacharacters would need re.escape first.)