I am new to Python and I was wondering if there was a way I could shorten/optimise the below loops:
for breakdown in data_breakdown:
    for data_source in data_source_ids:
        for camera in camera_ids:
            if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id')):
                for res in result:
                    if res.get("camera_id") == camera.get("id"):
                        res.get('data').update({breakdown.get('name'): breakdown.get('total')})
I tried this oneliner, but it doesn't seem to work:
res.get('data').update({breakdown.get('name'): breakdown.get('total')}) for camera in camera_ids if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id'))
You can use itertools.product to handle the nested loops for you, and I think (although I'm not sure because I can't see your data) you can skip all the .get and .update and just use the [] operator:
from itertools import product
for b, d, c in product(data_breakdown, data_source_ids, camera_ids):
    if c["id"] != d["parent_id"] or d["id"] != b["parent_id"]:
        continue
    for res in result:
        if res["camera_id"] == c["id"]:
            res['data'][b['name']] = b['total']
If anything, to optimize the performance of those loops, you should make them longer and more nested, with the data_source.get("id") == breakdown.get('parent_id') happening outside of the camera loop.
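A sketch of what that hoisting could look like, assuming the same list-of-dicts shapes as in the question. The sample data is made up for illustration:

```python
# Made-up sample data in the shape the question implies.
data_breakdown = [{"parent_id": 10, "name": "cars", "total": 4}]
data_source_ids = [{"id": 10, "parent_id": 1}, {"id": 11, "parent_id": 2}]
camera_ids = [{"id": 1}, {"id": 2}]
result = [{"camera_id": 1, "data": {}}, {"camera_id": 2, "data": {}}]

for breakdown in data_breakdown:
    for data_source in data_source_ids:
        # Check the breakdown/source match once, before looping over cameras.
        if data_source["id"] != breakdown["parent_id"]:
            continue
        for camera in camera_ids:
            if camera["id"] != data_source["parent_id"]:
                continue
            for res in result:
                if res["camera_id"] == camera["id"]:
                    res["data"][breakdown["name"]] = breakdown["total"]
```

The work done is the same as before, but the camera loop is skipped entirely for every source that does not belong to the current breakdown.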
But there is perhaps an alternative: you could change the structure of your data so that you don't need to loop nearly as much to find matching ID values. Convert each of your current lists (of dicts) into a single dict whose keys are the 'id' values you'll be trying to match in that loop, and whose values are the whole dicts.
sources_dict = {source.get("id"): source for source in data_source_ids}
cameras_dict = {camera.get("id"): camera for camera in camera_ids}
results_dict = {res.get("camera_id"): res for res in result}
Now the whole loop only needs one level:
for breakdown in data_breakdown:
    source = sources_dict[breakdown["parent_id"]]
    camera = cameras_dict[source["parent_id"]]
    res = results_dict[camera["id"]]
    res["data"][breakdown["name"]] = breakdown["total"]
This code assumes that all the lookups with get in your current code were going to succeed in getting a value. You weren't actually checking if any of the values you were getting from a get call was None, so there probably wasn't much benefit to it.
I'd further note that it's not clear if the camera loop in your original code was at all necessary. You might have been able to skip it and just directly compare data_source['parent_id'] against res['camera_id'] without comparing them both to a camera['id'] in between. In my updated version, that would translate to leaving out the creation of the cameras_dict and just directly indexing results_dict with source["parent_id"] rather than indexing to find camera first.
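A minimal sketch of that shorter variant, assuming every parent_id has a matching entry and that each camera has exactly one result. The sample data is invented for illustration:

```python
# Made-up sample data in the shape the question implies.
data_breakdown = [
    {"parent_id": 10, "name": "cars", "total": 4},
    {"parent_id": 11, "name": "bikes", "total": 7},
]
data_source_ids = [{"id": 10, "parent_id": 1}, {"id": 11, "parent_id": 2}]
result = [{"camera_id": 1, "data": {}}, {"camera_id": 2, "data": {}}]

# Index sources by their own id and results by camera id; no cameras_dict needed.
sources_dict = {source["id"]: source for source in data_source_ids}
results_dict = {res["camera_id"]: res for res in result}

for breakdown in data_breakdown:
    source = sources_dict[breakdown["parent_id"]]
    res = results_dict[source["parent_id"]]  # a source's parent is the camera
    res["data"][breakdown["name"]] = breakdown["total"]
```

If some IDs may be missing, a KeyError is raised at the failing lookup, which is usually easier to diagnose than a silently skipped update.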
I've followed a tutorial to write a Flask REST API and have a special request about a Python code.
The offered code is following:
# data list is where my objects are stored
def put_one(name):
    list_by_id = [list for list in data_list if list['name'] == name]
    list_by_id[0]['name'] = [new_name]
    print({'list_by_id' : list_by_id[0]})
It works, which is nice, and even though I understand what line 2 is doing, I would like to rewrite it so that it's clear how the function iterates over the different lists. I already have an approach, but it raises KeyError: 0.
def put(name):
    list_by_id = []
    list = []
    for list in data_list:
        if(list['name'] == name):
            list_by_id = list
    list_by_id[0]['name'] = request.json['name']
    return jsonify({'list_by_id' : list_by_id[0]})
My goal with this is also to be able to put other elements that don't necessarily have the key 'name'. If I can rewrite the function another way, I'll be more likely to be able to adapt it to my needs.
I looked for tools that convert one coding style into the other, and for answers in forums, before coming here, and couldn't find any.
It may not be beautiful code, but it gets the job done:
def put(value):
    for i in range(len(data_list)):
        key_list = list(data_list[i].keys())
        if data_list[i][key_list[0]] == value:
            print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
            data_list[i][key_list[0]] = request.json[test_key]
            print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
            break
Now it doesn't matter what the key is: with this iteration the method only changes the value when it finds it in data_list. Before, the code broke on every iteration because the keys were different and the lookup depended on them.
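If the key to update is known, the same idea can be written without index arithmetic. A minimal sketch, assuming data_list is a list of dicts. The function name update_field and its parameters are hypothetical, and the Flask request handling is left out so the example runs on its own:

```python
def update_field(data_list, key, old_value, new_value):
    """Replace the first item[key] equal to old_value; return that item or None."""
    for item in data_list:
        # Items that lack the key are skipped instead of raising KeyError.
        if key in item and item[key] == old_value:
            item[key] = new_value
            return item
    return None


data_list = [{"name": "groceries"}, {"owner": "sam"}]  # made-up sample data
updated = update_field(data_list, "owner", "sam", "alex")
```

In a Flask view, new_value would presumably come from request.json, as in the code above.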
Set-up
I'm scraping apartment ads using Scrapy. For certain housing characteristics, I loop over the elements of a list BC I obtain per ad. If the characteristic is in the list I assign a 'yes' and if not a 'no'. E.g.
for x in BC:
    if 'Terraza' in x:
        terrace = 'yes'
        break
    else:
        terrace = 'no'
For each 'yes-no' characteristic I have a copy of the above loop.
Problem
Besides looping over the elements of the list, I'd like to loop over the characteristics themselves. I.e. I'd like to 'merge' all the loops per characteristic into one loop.
I've tried the following (my actual bcl does contain multiple elements):
found = False
bcl = ['Terraza']
for x in l: # l is a list of strings containing housing characteristics
    for y in bcl:
        if y in x:
            y = 'yes'
            found = True
            break
        else:
            y = 'no'
    if found:
        break
terrace = Terrazza
but this loop does not create the variable Terrazza. I'm not sure I can solve this with globals.
How do I make this loop work?
Depending on the wanted result, you may take a different approach. In a case like this, I tend to use a more functional style of coding. I am not sure if this is what you intend, but I think you could do it this way:
list_characteristics = ['Terraza', "Swimming pool"] # sample ad text - list of strings
bcl = ["test1", "test2", "Terraza", "test3"] # sample checklist - list of strings
def check_characteristics(checklist, adlist):
    # use the parameters, not the module-level lists, so the function is reusable
    list_of_found_items = []
    for characteristic in adlist:
        print("characteristic:", characteristic)
        for item in checklist:
            print("item:", item)
            if item in characteristic:
                list_of_found_items.append(item)
    return list_of_found_items
result_list = check_characteristics(bcl, list_characteristics)
print("This ad has the following characteristics:", result_list)
Using the code above, you have a function that takes two lists of strings and lists all the items found. If you only want to know whether at least one of them is present, you can use this other function as a faster, short-circuiting alternative:
list_characteristics = ['Terraza', "Swimming pool"] # ad text - list of strings
bcl = ["test1", "test2", "Terraza", "test3"] # checklist - list of strings
def has_any_characteristic(checklist, adlist):
    # return as soon as the first match is found
    for characteristic in adlist:
        for item in checklist:
            if item in characteristic:
                return True
    return False
result = has_any_characteristic(bcl, list_characteristics)
print("This ad has at least one of the wanted characteristics?", result)
It seems like a lot of code, but you write it once and then use it whenever you need it, in a clean and easy-to-read way, IMHO. The definitions of these two functions can even be placed in a separate module that you import where needed, so the main code only needs one line to call each function. Each function answers a simple question in a way that is easy to understand, as the print() statements in both code samples above show.
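To get one yes/no answer per characteristic in a single pass, as the question asks, a dict keyed by characteristic name is one option. A minimal sketch, assuming each ad yields a list of strings. The sample data and the name flag_characteristics are made up for illustration:

```python
def flag_characteristics(checklist, adlist):
    """Map each checklist item to 'yes' if it occurs in any ad string, else 'no'."""
    return {
        item: "yes" if any(item in text for text in adlist) else "no"
        for item in checklist
    }


ad_strings = ["Terraza de 10 m2", "Ascensor"]  # hypothetical ad text
wanted = ["Terraza", "Piscina"]                # hypothetical checklist
flags = flag_characteristics(wanted, ad_strings)
terrace = flags["Terraza"]
```

Looking values up in the dict replaces the need to create one variable per characteristic, so globals are not required.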
Your problem probably isn't in merging the loops but in breaking out of the outer loop. You can break out of the outer loop by raising a custom exception and catching it. Take a look at this piece of code:
bcl = ['Terraza']
class FoundCharacteristicException(Exception):
    pass
for x in li:
    try:
        for y in bcl:
            if y in x:
                raise FoundCharacteristicException
    except FoundCharacteristicException:
        break
I have a huge dict that I keep adding data to. I check whether a key already exists in the dict, but this takes too long as the dictionary grows. How can I run this search in parallel on a multiprocessor system?
def __getVal(self, key, val):
    ret= 0
    if key in self.mydict:
        ret= val + self.mydict[key]
    else:
        ret = val
    return ret
Before trying to split the work across multiple processes, you could try this:
Instead of checking whether the key is in the dictionary, access it directly inside a try...except block.
On the machines I have tried, this has been faster than checking for the key first.
So your final code would be something like:
try:
    ret = val + self.mydict[key]
except KeyError:
    ret = val
Just use .get with a default value of 0:
return self.mydict.get(key, 0) + val
Using ret = 0 and adding to it is pointless, just return as above.
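A quick sketch showing that the .get form returns the same values as the original if/else. The class name Accumulator and the method name get_val are hypothetical stand-ins for the asker's class:

```python
class Accumulator:
    def __init__(self):
        self.mydict = {}

    def get_val(self, key, val):
        # Missing keys count as 0, so the result is just val.
        return self.mydict.get(key, 0) + val


acc = Accumulator()
acc.mydict["a"] = 5
present = acc.get_val("a", 2)  # key exists: 5 + 2
missing = acc.get_val("b", 2)  # key absent: 0 + 2
```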
The problem is, as Nick Bastin said, that "it is not search speed, but the cost of making the dictionary larger as you continue to add elements".
The cost comes from the hash map entry that has to be created for each new item. Because the hash map is small, collisions eventually occur and make inserts more expensive.
One solution is to rebuild the hash map so that it is larger.
In this case switching to a list was sufficient, since a list grows without the inconvenience of collisions.
I am new to python and really programming in general and am learning python through a website called rosalind.info, which is a website that aims to teach through problem solving.
Here is the problem, wherein you're asked to calculate the percentage of guanine and cytosine in the DNA string given for each ID, then return the ID of the sample with the greatest percentage.
I'm working on the sample problem on the page and am experiencing some difficulty. I know my code is probably really inefficient and cumbersome but I take it that's to be expected for those who are new to programming.
Anyway, here is my code.
gc = open("rosalind_gcsamp.txt","r")
biz = gc.readlines()
i = 0
gcc = 0
d = {}
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
        del biz[i+2]
What I'm trying to accomplish here is, given input such as this:
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
Break what's given into a list based on the lines and concatenate the two lines of DNA like so:
['>Rosalind_6404', 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG', 'TCCCACTAATAATTCTGAGG\n']
And delete the entry two indices after the ID, which is >Rosalind. What I do with it later I still need to figure out.
However, I keep getting an index error and can't, for the life of me, figure out why. I'm sure it's a trivial reason, I just need some help.
I've even attempted the following to limited success:
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
    elif biz[i].startswith("A" or "C" or "G" or "T") and biz[i+1].startswith(">"):
        del biz[i]
which still gives me an index error but at least gives me the biz value I want.
Thanks in advance.
It is very easy to do with itertools.groupby, using the lines that start with > as both the keys and the delimiters:
from itertools import groupby
with open("rosalind_gcsamp.txt","r") as gc:
    # group elements using lines that start with ">" as the delimiter
    groups = groupby(gc, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        # if k is False the line did not match our not x.startswith(">"),
        # so use the value v as the key and call next on the grouper object
        # to get the next value
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1]))
            d[key] = val
print(d)
{'>Rosalind_0808': 'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT', '>Rosalind_5959': 'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC', '>Rosalind_6404': 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG'}
If you need order use a collections.OrderedDict in place of d.
You are looping over the length of biz. So in your last iteration biz[i+1] and biz[i+2] don't exist. There is no item after the last.
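One way to avoid the index arithmetic altogether is to build the records while reading and never delete from the list being looped over. A minimal sketch, assuming FASTA-style input like the sample. The function names parse_fasta and gc_percent are illustrative, and the lines are passed in as a list of shortened, made-up records so the example runs without a file:

```python
def parse_fasta(lines):
    """Return {record_id: sequence}, joining wrapped sequence lines."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]          # drop the leading ">"
            records[current] = ""
        elif current is not None:
            records[current] += line    # append a wrapped sequence line
    return records


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


sample = [">Rosalind_6404\n", "CCTGCGGAAG\n", "ATCGG\n", ">Rosalind_0001\n", "ATAT\n"]
records = parse_fasta(sample)
best_id = max(records, key=lambda rid: gc_percent(records[rid]))
```

With a real file, `parse_fasta(open("rosalind_gcsamp.txt"))` should work the same way, since a file object yields its lines one at a time.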
I have code that generates a list of 28 dictionaries. It cycles thru 28 files and links data points from each file in the appropriate dictionary. In order to make my code more flexible I wanted to use:
tegDics = [dict() for x in range(len(files))]
But when I run the code the first 27 dictionaries are blank and only the last, tegDics[27], has data. Below is the code including the clumsy, yet functional, code I'm having to use that generates the dictionaries:
x=0
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}] # THIS WORKS!!!
#tegDics = [dict() for x in range(len(files))] - THIS WON'T WORK!!!
allRads=[]
while x<len(tegDics): # now builds dictionaries
    for line in open(files[x]):
        z=line.split('\t')
        allRads.append(z[2])
        tegDics[x][z[2]]=z[4] # pairs catNo with locNo
    x+=1
Does anybody know why the more elegant code doesn't work?
Since you're using x within the list comprehension, it will no longer be zero by the time you reach the while loop; it will be len(files)-1 instead. (In Python 2, the loop variable of a list comprehension leaks into the enclosing scope.) I suggest changing the variable you use to something else. It's traditional to use a single underscore for a value you don't care about.
tegDics = [dict() for _ in range(len(files))]
It could be useful to eliminate your use of x entirely. It's customary in python to iterate directly over the objects in a sequence, rather than using a counter variable. You might do something like:
for tegDic in tegDics:
    #do stuff with tegDic here
It's slightly trickier in your case, though, since you want to iterate through tegDics and files at the same time. You can use zip to do that.
import os
files=os.listdir("DirPath")
os.chdir("DirPath")
tegDics = [dict() for _ in range(len(files))]
allRads=[]
for file, tegDic in zip(files,tegDics):
    for line in open(file):
        z=line.split('\t')
        allRads.append(z[2])
        tegDic[z[2]]=z[4] # pairs catNo with locNo
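A shorter spelling might seem to be the following, but note that it fills every slot with the same dict object, so it is not a drop-in replacement for the comprehension:
tegDics = [{}]*len(files)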
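One thing worth checking before relying on list multiplication here: it repeats a reference to one dict rather than creating separate dicts. A small sketch (variable names are illustrative) that shows the difference from the comprehension:

```python
shared = [{}] * 3                       # three references to one dict
separate = [dict() for _ in range(3)]   # three independent dicts

shared[0]["a"] = 1      # visible through every slot of shared
separate[0]["a"] = 1    # only changes the first dict

same_object = shared[0] is shared[1]
```

With the multiplied list, data from all 28 files would end up merged into a single dictionary, so the comprehension is the safer choice.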