Simplifying code blocks with functions [Python]

I'm looking for a way to simplify my code with functions. About 90% of my operations are identical and differ only in the if condition.
For example:
if isFile:
    fFound = False
    for key in files:
        if item["path"] in key:
            fFound = True
            for c in cmds.keys():
                if item["path"] in cmds[c]["files"]:
                    ifxchecker(item["requiredIFX"], cmds[c]["ifx_match"])
                    outputCFG()
    if not fFound:
        notFound.append(item['path'])
else:
    dir = item["path"][:-1]
    pFound = False
    for key in files:
        if dir in key:
            pFound = True
            for c in cmds.keys():
                for file in cmds[c]["files"]:
                    if dir in file:
                        ifxchecker(item["requiredIFX"], cmds[c]["ifx_match"])
                        outputCFG()
    if not pFound:
        notFound.append(dir)
My code works fine. I'm just trying to move most of it into a function so that only these small if conditions differ. I can't find a way to simplify it, and I'm not even sure there is one.
I wrote some small functions, as you can see, but I think there should be a better way to simplify the whole construct.

Unfortunately I can't test this, because several variables and functions are not defined, but it should work. If you would rather pass a boolean is_dir than elem, replace elem with is_dir in the signature and add the following line at the start of the function:
elem = item["path"][:-1] if is_dir else item["path"]
def do_stuff(elem, files, item, cmds):
    fFound = False
    for key in files:
        if elem in key:
            fFound = True
            for c in cmds.keys():
                if elem in cmds[c]["files"]:
                    ifxchecker(item["requiredIFX"], cmds[c]["ifx_match"])
                    outputCFG()
    if not fFound:
        return elem  # report the element that was not found
    return None

if isFile:
    res = do_stuff(item["path"], files, item, cmds)
    if res is not None:
        notFound.append(res)
else:
    res = do_stuff(item["path"][:-1], files, item, cmds)
    if res is not None:
        notFound.append(res)

I solved it like this, using @azro's method:
def cfgFunction(x):
    global file
    fFound = False
    for file in files:
        if x in file:
            fFound = True
            for group in cmds.keys():
                if x in cmds[group]["files"]:
                    ifxchecker(item["requiredIFX"], cmds[group]["ifx_match"])
                    outputCFG()
    if not fFound:
        notFound.append(x)
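The two branches in the question also differ in how they test a group's files: the file branch checks list membership, while the directory branch checks whether the directory is a substring of any file. A minimal sketch of one way to keep that difference is to pass the test in as a function. The names collect_matches, exact_match and prefix_match and the sample data are hypothetical, and the sketch leaves out ifxchecker and outputCFG and looks up the groups once per item rather than once per matching file.

```python
def collect_matches(target, files, cmds, match_in_files):
    """Return (found, groups): whether target occurs in any known file path,
    and the names of the cmds groups whose files match target."""
    found = any(target in path for path in files)
    groups = [name for name, cmd in cmds.items()
              if match_in_files(target, cmd["files"])]
    return found, groups


def exact_match(target, group_files):
    # File case from the question: the path must be an element of the list.
    return target in group_files


def prefix_match(target, group_files):
    # Directory case from the question: the directory is a substring of a file.
    return any(target in path for path in group_files)


# Hypothetical sample data, only for illustration.
files = ["src/a.c", "src/lib/b.c"]
cmds = {"build": {"files": ["src/a.c"]}, "lib": {"files": ["src/lib/b.c"]}}
not_found = []

for item, is_file in [({"path": "src/a.c"}, True),
                      ({"path": "src/lib/"}, False),
                      ({"path": "docs/"}, False)]:
    target = item["path"] if is_file else item["path"][:-1]
    matcher = exact_match if is_file else prefix_match
    found, groups = collect_matches(target, files, cmds, matcher)
    if not found:
        not_found.append(target)
    print(target, found, groups)
```

Only the choice of matcher depends on whether the item is a file, so the shared loop is written once.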

Related

How to recreate the tree organization in nested dictionaries

I have a problem I have been struggling with for some time now. I need to check things for a large amount of data spread over many folders. To keep track of what has been done, I want to create a YAML file containing the tree organization of my data structure. The objective is therefore to build nested dictionaries of the folders containing data.
The script I wrote works, but it duplicates each folder, and I don't know how to call the function recursively to avoid this. Here is the code:
def load_tree_structure_as_dictionnary(current_dict):
    for dir_name in current_dict.keys():
        lst_sub_dir = [f.path for f in os.scandir(dir_name) if f.is_dir()]
        if lst_sub_dir == []:
            current_dict[dir_name]['correct_calibration'] = None
        else:
            for sub_dir in lst_sub_dir:
                current_dict[dir_name][sub_dir] = load_tree_structure_as_dictionnary({sub_dir: {}})
    return current_dict

init_dict = {data_path: {}}
full_dict = load_tree_structure_as_dictionnary(init_dict)
I know the error is in the recursive call, but I can't create a new 'sub_dir' key if there isn't a dictionary initialized (hence the {sub_dir: {}}).
Also, I am new to writing Stack Overflow questions, so let me know if something needs to be improved in the syntax.

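Change current_dict[dir_name][sub_dir] = load_tree_structure_as_dictionnary({sub_dir: {}}) to current_dict[dir_name].update(load_tree_structure_as_dictionnary({sub_dir: {}})). The recursive call already returns a dictionary keyed by sub_dir, so assigning it under the sub_dir key nests that key twice. update() merges the returned dictionary in instead, and the duplication disappears.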
def load_tree_structure_as_dictionnary(current_dict):
    for dir_name in current_dict.keys():
        lst_sub_dir = [f.path for f in os.scandir(dir_name) if f.is_dir()]
        if lst_sub_dir == []:
            current_dict[dir_name]['correct_calibration'] = None
        else:
            for sub_dir in lst_sub_dir:
                current_dict[dir_name].update(load_tree_structure_as_dictionnary({sub_dir: {}}))
    return current_dict

init_dict = {"venv": {}}
full_dict = load_tree_structure_as_dictionnary(init_dict)
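Another way to avoid the double key, sketched here under the assumption that the function may be restructured, is to have it return only the contents of one directory and let the caller choose the key. The name build_tree and the temporary folder layout are hypothetical and exist only to make the example runnable.

```python
import os
import tempfile


def build_tree(path):
    """Return a nested dict mirroring the sub-directories of path."""
    sub_dirs = sorted(e.path for e in os.scandir(path) if e.is_dir())
    if not sub_dirs:
        # Leaf folder: mark it the same way the question does.
        return {"correct_calibration": None}
    return {sub: build_tree(sub) for sub in sub_dirs}


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/a/b and root/c
    os.makedirs(os.path.join(root, "a", "b"))
    os.makedirs(os.path.join(root, "c"))
    tree = {root: build_tree(root)}

    a = os.path.join(root, "a")
    b = os.path.join(root, "a", "b")
    c = os.path.join(root, "c")
    expected = {root: {a: {b: {"correct_calibration": None}},
                       c: {"correct_calibration": None}}}
    print(tree == expected)
```

Because each call returns the dictionary for exactly one folder, no placeholder such as {sub_dir: {}} is needed.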

Use of 'and' in if statements

I need to check thousands of directories for two kinds of files. I have restricted the index, idx, to less than four, since the two kinds of files I am looking for, the '.jpg' and the '.thmb', should appear within that range. I need the if statement to require that both kinds of files are in the directory. The if statement:
if ('.jpg' in val) and ('thmb' in val):
works, except that I keep getting output from the else branch saying that data is missing when that is not true:
Data missing W:\\North2015\200\10 200001000031.jpg 0
Data missing W:\\North2015\200\10 200001000032.jpg 1
Data missing W:\\North2015\200\100 200014000001.jpg 0
Data missing W:\\North2015\200\100 200014000002.jpg 1
Data missing W:\\North2015\200\101 200014100081.jpg 2
Here is the code:
def missingFileSearch():
    for folder in setFinder():
        for idx, val in enumerate(os.listdir(folder)):
            if idx < 4:
                if ('.jpg' in val) and ('thmb' in val):
                    pass
                else:
                    print 'Data missing', folder, val, idx
So I am wondering why I am getting output from the else branch.
Also, this code never prints anything:
if val.endswith('.jpg') and ('thmb' in val):
    print 'Data is here!', folder, val, idx
This is chiefly what I need the code to do.

Your condition tests each file name on its own, so it only passes when a single name contains both '.jpg' and 'thmb'. I would track the two kinds of file separately for each folder:
def missingFileSearch():
    folders_with_missing = []
    for folder in setFinder():
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(folder):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break  # break inner loop, move on to check next folder
        else:  # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
I have tested a slightly modified version of this code (no setFinder() function):
def missingFileSearch():
    folders_with_missing = []
    for folder in os.listdir(my_dir):
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(os.path.join(my_dir, folder)):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break  # break inner loop, move on to check next folder
        else:  # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
I created four test folders with self-explanatory names:
>>> os.listdir(my_dir)
['both_thmb_jpg', 'missing_jpg', 'missing_thmb', 'no_files']
Then ran the function:
>>> missingFileSearch()
[('missing_jpg', 'no .jpg'), ('missing_thmb', 'no thmb'), ('no_files', 'no thmb, no .jpg')]
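The same per-folder check can be reproduced without any existing data. The sketch below is an alternative formulation using any() rather than the answer's for/else loop. The function name folders_missing_files and the file names 1.jpg and 1.thmb are hypothetical, and it assumes every entry directly under the root is a folder.

```python
import os
import tempfile


def folders_missing_files(root):
    """Return (folder, description) pairs for incomplete sub-folders of root."""
    missing = []
    for folder in sorted(os.listdir(root)):
        names = os.listdir(os.path.join(root, folder))
        # Check the folder as a whole, not each file name on its own.
        has_thmb = any("thmb" in name for name in names)
        has_jpg = any(name.endswith(".jpg") for name in names)
        if has_thmb and has_jpg:
            continue
        parts = []
        if not has_thmb:
            parts.append("no thmb")
        if not has_jpg:
            parts.append("no .jpg")
        missing.append((folder, ", ".join(parts)))
    return missing


# Hypothetical test layout mirroring the four folders named above.
layout = {"both_thmb_jpg": ["1.jpg", "1.thmb"], "missing_jpg": ["1.thmb"],
          "missing_thmb": ["1.jpg"], "no_files": []}
with tempfile.TemporaryDirectory() as root:
    for folder, names in layout.items():
        os.makedirs(os.path.join(root, folder))
        for name in names:
            open(os.path.join(root, folder, name), "w").close()
    result = folders_missing_files(root)
print(result)
```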

How to speed up dictionary build in Python

I have looked at the suggested links, but nothing appears to apply. I am doing what I thought would be a simple build of three dictionaries that I use elsewhere. They are not all that large, but this function takes almost 4 minutes to complete. I am likely missing something, and I would like this to run faster. This is Python 3.4.
class VivifiedDictionary(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

def dict_build(exclude_chrY):
    coordinate_intersection_dict = VivifiedDictionary()
    aberration_list_dict = VivifiedDictionary()
    gene_list_dict = VivifiedDictionary()
    if eval(exclude_chrY):
        chr_y = ""
    else:
        chr_y = "chrY"
    abr_type_list = ["del", "ins"]
    mouse_list = ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chrX", chr_y]
    for chrom in mouse_list:
        for aberration in abr_type_list:
            coordinate_intersection_dict[chrom][aberration] = []
            aberration_list_dict[chrom][aberration] = []
            gene_list_dict[chrom][aberration] = []

Please check "The Python Profilers" in the Python documentation. I think it may help you find the bottleneck in your bigger script.
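As a rough sketch of what profiling looks like with the standard library's cProfile and pstats modules: build_nested below is a hypothetical stand-in for dict_build, not the original function, and the chromosome list is abbreviated in the same spirit.

```python
import cProfile
import io
import pstats


def build_nested(chroms, kinds):
    """Hypothetical stand-in for dict_build: one empty list per (chrom, kind)."""
    return {chrom: {kind: [] for kind in kinds} for chrom in chroms}


profiler = cProfile.Profile()
profiler.enable()
result = build_nested(["chr%d" % i for i in range(1, 20)] + ["chrX"],
                      ["del", "ins"])
profiler.disable()

# Collect the report in a string so it can be printed or logged.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)  # limit the report to the ten most expensive entries
print(buffer.getvalue())
```

Wrapping the real script's entry point between enable() and disable() in the same way should show which calls account for the four minutes.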

Python - Getting Attributes From A File of Constants

I have a file of constant variables that I need to query and I am not sure how to go about it.
I have a database query which is returning user names and I need to find the matching user name in the file of constant variables.
The file looks like this:
SALES_MANAGER_01 = {"user_name": "BO01", "password": "password", "attend_password": "BO001",
"csm_password": "SM001", "employee_num": "BOSM001"}
The file contains many more users like the one above.
My function looks like this:
@attr("user_test")
def test_get_user_for_login(self):
    application_code = 'BO'
    user_from_view = self.select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[-4:]
    password = ""
    global_users = dir(gum)
    for item in global_users:
        if user_wo_ent not in item.__getattr__("user_name"):
            user_with_ent = choice(users)
            user_wo_ent = user_with_ent[-4:]
        else:
            password = item.__getattr__("password")
    print(user_wo_ent, password)
gum is my file of constants, and global_users = dir(gum) is how I read it. I know I am doing something wrong because I get AttributeError: 'str' object has no attribute '__getattr__', but I am not sure how to resolve it.

You should reverse your looping as you want to compare each item to your match condition. Also, you have a dictionary, so use it to do some heavy lifting.
You need to add some imports:
import re
from ast import literal_eval
I've replaced the dir(gum) bit with this function:
def get_global_users(filename):
    gusers = {}  # create a global users dict
    p_key = re.compile(r'\b\w*\b')  # regex to get first part, e.g. SALES_MANAGER_01
    p_value = re.compile(r'\{.*\}')  # regex to grab everything in {}
    with open(filename) as f:  # open the file and work through it
        for line in f:  # for each line
            gum_key = p_key.match(line)  # pull out the key
            gum_value = p_value.search(line)  # pull out the value
            # Here is the real action: update the dictionary with the
            # match of gum_key and the match of gum_value.
            gusers[gum_key.group()] = literal_eval(gum_value.group())
    return gusers  # return the dictionary
The bottom of your existing code is replaced with this:
global_users = get_global_users(gum)  # gum must be the path to the constants file here
for key, value in global_users.items():  # walk through all key, value pairs
    if value['user_name'] != user_wo_ent:
        user_with_ent = choice(users)
        user_wo_ent = user_with_ent[-4:]
    else:
        password = value['password']

A very simple answer was to get the dir of the constants file and then loop over it like so:
global_users = dir(gum)
for item in global_users:
    o = gum.__dict__[item]
    if type(o) is not dict:
        continue
    if gum.__dict__[item].get("user_name") == user_wo_ent:
        print(user_wo_ent, o.get("password"))
    else:
        print("User was not in global_user_mappings")

I was able to find the answer by doing the following:
def get_user_for_login(application_code='BO'):
    user_from_view = BaseServiceTest().select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[4:]
    global_users = dir(gum)
    user_dict = {'user_name': '', 'password': ''}
    for item in global_users:
        o = gum.__dict__[item]
        if type(o) is not dict:
            continue
        if user_wo_ent == o.get("user_name"):
            user_dict['user_name'] = user_wo_ent
            user_dict['password'] = o.get("password")
    return user_dict
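The lookup itself can be isolated into a small helper. This is a minimal sketch, assuming the constants live in a module: here gum is simulated with types.ModuleType so the example runs on its own, and the second user, the TIMEOUT constant and the name find_user are hypothetical.

```python
import types

# Hypothetical stand-in for the constants module (called gum in the question).
gum = types.ModuleType("gum")
gum.SALES_MANAGER_01 = {"user_name": "BO01", "password": "password"}
gum.SALES_MANAGER_02 = {"user_name": "BO02", "password": "secret"}
gum.TIMEOUT = 30  # non-dict constants are skipped by the lookup


def find_user(module, user_name):
    """Return the first dict constant in module whose user_name matches, else None."""
    for value in vars(module).values():
        if isinstance(value, dict) and value.get("user_name") == user_name:
            return value
    return None


print(find_user(gum, "BO02"))
print(find_user(gum, "XX99"))
```

vars(module) yields the attribute values directly, which avoids the original error: dir() returns only attribute names as strings.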

Why do I keep getting this error in map-reduce while using mincemeat?

I just want to calculate word count from some 7500 files with some condition on which words to count. The program goes like this.
import glob
import mincemeat

text_files = glob.glob('../fldr/2/*')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

source = dict((file_name, file_contents(file_name))
              for file_name in text_files)

def mapfn(key, value):
    for line in value.splitlines():
        list2 = []
        for temp in line.split("::::"):
            list2.append(temp)
        if (list2[0] == '5'):
            for review in list2[1].split():
                yield [review.lower(), 1]

def reducefn(key, value):
    return key, len(value)

s = mincemeat.Server()
s.datasource = source
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="wola")
print results
The error I get while running this program is
error: uncaptured python exception, closing channel <__main__.Client connected at 0x250f990>
(<type 'exceptions.IndexError'>:list index out of range
[C:\Python27\lib\asyncore.py|read|83]
[C:\Python27\lib\asyncore.py|handle_read_event|444]
[C:\Python27\lib\asynchat.py|handle_read|140]
[mincemeat.py|found_terminator|96]
[mincemeat.py|process_command|194]
[mincemeat.py|call_mapfn|170]
[projminc2.py|mapfn|21])

Take a look at what's in list2, e.g. by doing
print(list2)
or with a debugger. If you do this, you'll see that list2 has only one element, so list2[1] isn't valid.
(You probably don't want to split on "::::"; that looks like a typo in your script.)
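Whatever the right separator turns out to be, the map step can guard against lines that do not split into two fields. The sketch below is a plain generator, independent of mincemeat. The name words_from_five_star_lines and the sample text are hypothetical, and the separator is a parameter because the correct value is not known from the question. Unlike the original, it splits only at the first separator, so the review is everything after it.

```python
def words_from_five_star_lines(text, sep="::::"):
    """Yield (word, 1) for each word of the lines whose first field is '5'."""
    for line in text.splitlines():
        fields = line.split(sep, 1)  # at most two fields: rating, review
        if len(fields) < 2:
            continue  # separator missing: skip instead of raising IndexError
        rating, review = fields
        if rating == "5":
            for word in review.split():
                yield word.lower(), 1


# Hypothetical sample input, including one line without the separator.
sample = "5::::Great Phone\n3::::meh\nmalformed line\n5::::great"
print(list(words_from_five_star_lines(sample)))
```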
