Set-up
I'm scraping apartment ads using Scrapy. For certain housing characteristics, I loop over the elements of a list BC I obtain per ad. If the characteristic is in the list I assign a 'yes' and if not a 'no'. E.g.
for x in BC:
if 'Terraza' in x:
terrace = 'yes'
break
else:
terrace = 'no'
For each 'yes-no' characteristic I have a copy of the above loop.
Problem
Besides looping over the elements of the list, I'd like to loop over the characteristics themselves. I.e. I'd like to 'merge' all the loops per characteristic into one loop.
I've tried the following (my actual bcl does contain multiple elements):
found = False
bcl = ['Terraza']
for x in l: # l is a list of strings containing housing characteristics
for y in bcl:
if y in x:
y = 'yes'
found = True
break
else:
y = 'no'
if found:
break
terrace = Terrazza
but this loop does not create the variable Terrazza. I'm not sure I can solve this with globals.
How do I make this loop work?
Depending on the wanted result, you may take a different approach. In a case like this, I tend to use a more functional style of coding. I am not sure if this is what you intend, but I think you could do it this way:
list_characteristics = ['Terraza', "Swimming pool"] # sample ad text - list of strings
bcl = ["test1", "test2", "Terraza", "test3"] # sample checklist - list of strings
def check_characteristics(checklist, adlist):
list_of_found_items = []
for characteristic in list_characteristics:
print("characteristic:", characteristic)
for item in bcl:
print("item:", item)
if item in characteristic:
list_of_found_items.append(item)
return list_of_found_items
result_list = check_characteristics(bcl, list_characteristics)
print("This ad has the following characteristics:", result_list)
Using the code above, you have a function that takes two lists of strings and lists all the items found. In case you one want to know if there is at least one of those, you can use this other function as a faster, short-circuit, way:
list_characteristics = ['Terraza', "Swimming pool"] # ad text - list of strings
bcl = ["test1", "test2", "Terraza", "test3"] # checklist - list of strings
def has_any_characteristic(checklist, adlist):
for characteristic in list_characteristics:
for item in bcl:
if item in characteristic:
return True
return False
result = has_any_characteristic(bcl, list_characteristics)
print("This ad has at least one of the wanted characteristics?", result)
Seems like a lot of code, but you code it once and then you use it whenever you need it, in a clean and easy to read way, IMHO. The definition of these two functions can even be placed in a separate module that you import where needed. So, in the main code, you will only need to use one line to call the function. Each function allows to answer a simple question in a way that is easy to understand, as presented in the print()statements in both code samples above.
Your problem isn't in merging loops but probably in breaking from outer loop. You can break out of a top loop by raising custom exception and then trying to catch it. Take a look at this peace of code:
bcl = ['Terraza']
class FoundCharacteristicException(Exception):
pass
for x in li:
try:
for y in bcl:
if y in x:
raise FoundCharacteristicException
except FoundCharacteristicException:
break
Related
I've followed a tutorial to write a Flask REST API and have a special request about a Python code.
The offered code is following:
# data list is where my objects are stored
def put_one(name):
list_by_id = [list for list in data_list if list['name'] == name]
list_by_id[0]['name'] = [new_name]
print({'list_by_id' : list_by_id[0]})
It works, which is nice, and even though I understand what line 2 is doing, I would like to rewrite it in a way that it's clear how the function iterates over the different lists. I already have an approach but it returns Key Error: 0
def put(name):
list_by_id = []
list = []
for list in data_list:
if(list['name'] == name):
list_by_id = list
list_by_id[0]['name'] = request.json['name']
return jsonify({'list_by_id' : list_by_id[0]})
My goal with this is also to be able to put other elements, that don't necessarily have the type 'name'. If I get to rewrite the function in an other way I'll be more likely to adapt it to my needs.
I've looked for tools to convert one way of coding into the other and answers in forums before coming here and couldn't find it.
It may not be beatiful code, but it gets the job done:
def put(value):
for i in range(len(data_list)):
key_list = list(data_list[i].keys())
if data_list[i][key_list[0]] == value:
print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
data_list[i][key_list[0]] = request.json[test_key]
print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
break
Now it doesn't matter what the key value is, with this iteration the method will only change the value when it finds in the data_list. Before the code breaked at every iteration cause the keys were different and they played a role.
I am new to Python and I was wondering if there was a way I could shorten/optimise the below loops:
for breakdown in data_breakdown:
for data_source in data_source_ids:
for camera in camera_ids:
if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id')):
for res in result:
if res.get("camera_id") == camera.get("id"):
res.get('data').update({breakdown.get('name'): breakdown.get('total')})
I tried this oneliner, but it doesn't seem to work:
res.get('data').update({breakdown.get('name'): breakdown.get('total')}) for camera in camera_ids if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id'))
You can use itertools.product to handle the nested loops for you, and I think (although I'm not sure because I can't see your data) you can skip all the .get and .update and just use the [] operator:
from itertools import product
for b, d, c in product(data_breakdown, data_source_ids, camera_ids):
if c["id"] != d["parent_id"] or d["id"] != b["parent_id"]:
continue
for res in result:
if res["camera_id"] == c["id"]:
res['data'][b['name']] = b['total']
If anything, to optimize the performance of those loops, you should make them longer and more nested, with the data_source.get("id") == breakdown.get('parent_id') happening outside of the camera loop.
But there is perhaps an alternative, where you could change the structure of your data so that you don't need to loop nearly as much to find matching ID values. Convert each of your current lists (of dicts) into a single dict with its keys equal to the 'id' value you'll be trying to match in that loop, and the value being whole dict.
sources_dict = {source.get("id"): source for source in data_source_ids}
cameras_dict = {camera.get("id"): camera for camera in camera_ids}
results_dict = {res.get("camera_id"): res for res in result}
Now the whole loop only needs one level:
for breakdown in data_breakdown:
source = sources_dict[breakdown["parent_id"]]
camera = cameras_dict[source["parent_id"]]
res = results_dict[camera["id"]]
res.data[breakdown["name"]] = breakdown["total"]
This code assumes that all the lookups with get in your current code were going to succeed in getting a value. You weren't actually checking if any of the values you were getting from a get call was None, so there probably wasn't much benefit to it.
I'd further note that it's not clear if the camera loop in your original code was at all necessary. You might have been able to skip it and just directly compare data_source['parent_id'] against res['camera_id'] without comparing them both to a camera['id'] in between. In my updated version, that would translate to leaving out the creation of the cameras_dict and just directly indexing results_dict with source["parent_id"] rather than indexing to find camera first.
I'm making a very simple graphical program and am quite new.
nums = ["second", "third", "fourth"]
for colours in range(3):
numString = nums[colours]
inputWindow.getMouse()
colour1 = inputBox.getText()
inputBox.setText("")
instruction.setText("Please enter the {0} colour: ".format(numString))
Where I put 'colour1', I want it to cycle through colour1, colour2, colour3 and colour4 on each iteration (without using a long if statement). Dictionary cannot be used for this particular program.
At the end, the function returns all of these colours inputted from the user. I tried using a list but realised you can't use them for variable names.
Thanks for help.
Edit: Seeing as there's a lot of confusion (which I'm sorry for), I'll try and explain better:
My code up there is kind of strange and comes from nowhere. I'll simplify it:
def getInputs():
for colours in range(3):
colour1 = userInput()
return colour1, colour2, colour3, colour4
This is basically the gist. I wanna know if there's a way to cycle through the different variables where 'colour1 = userinput()' is (without using dictionary).
Editing to reflect the new information. The principal thing to keep in mind here is that you can use sequence types (list, dict, etc.) to collect your results.
def get_inputs():
# a list to collect the inputs
rval = []
for colours in range(4):
# range(4) will walk through [0,1,2,3]
# this will add to the end of the list
rval.append(userInput())
# after the for loop ran four times, there will be four items in the list
return rval
In case you really want to want return a tuple, the last line can be return tuple(rval)
I'm a computing student who posted a question here the other day about helping me with my function to sort scores in order and I got some great help and it now works but I would also like it to sort the names according to the scores (so if James gets 10 then it prints "James 10". Right now what is happening is that the scores are sorting and printing to the screen properly but the names are just printing in the order that they are entered. I've tried this:
def sortlist():
global scorelist, namelist, hss
namelist = []
scorelist = []
hs = open("hstname.txt", "r")
namelist = hs.read().splitlines()
hss = open("hstscore.txt","r")
for line in hss:
scorelist.append(int(line))
switched = True
while switched:
switched = False
for i in range(len(scorelist)-1):
for j in range(len(namelist)-1):
if scorelist[i] < scorelist[i+1]:
scorelist[i],scorelist[i+1] = scorelist[i+1],scorelist[i]
namelist[j],namelist[j+1] = namelist[j+1],namelist[j]
switched = True
The score part works fine and it took me ages to get it and I'm not allowed to use a pre defined function like .sort(). Can anyone offer any help/advice? Or if you can see what I'm doing wrong then can you offer a solution? I can't work this out for the life of me
You don't need to use nested loops in order to go through both lists.
Since you should be manipulating the two lists in exactly the same way, you should only be using one for loop, and using the i variable to index into both lists.
If yoy turn this:
for i in range(len(scorelist)-1):
for j in range(len(namelist)-1):
if scorelist[i] < scorelist[i+1]:
scorelist[i],scorelist[i+1] = scorelist[i+1],scorelist[i]
namelist[j],namelist[j+1] = namelist[j+1],namelist[j]
switched = True
into this:
for i in range(len(scorelist)-1):
if scorelist[i] < scorelist[i+1]:
scorelist[i],scorelist[i+1] = scorelist[i+1],scorelist[i]
namelist[i],namelist[i+1] = namelist[i+1],namelist[i]
switched = True
then you should get both lists sorted as you want them.
The only time when this will cause an error is if your two lists are of different lengths. If namelist is somehow shorter than scorelist then this code will throw an exception. You could guard against that by checking before your sort routine that
len(scorelist) == len(namelist)
Imagine, you have a function that returns a single sorted list e.g. here's an implementation of Quicksort algorithm in Python:
def qsorted(L):
return L and (qsorted([x for x in L[1:] if x < L[0]]) + # lesser items
[L[0]] + # pivot
qsorted([x for x in L[1:] if x >= L[0]])) # greater or equal
Then you could use it to sort your scorelist:
qsorted_scorelist = qsorted(scorelist)
To sort namelist according to the order of scorelist; you could use Schwartzian transform:
qsorted_namelist = [name for score, name in qsorted(zip(scorelist, namelist))]
Note that the same function qsorted() is used in both case: to sort the scorelist along and to sort both lists together. You should try to extract common functionality into separate functions instead of modifying your sorting algorithm inplace for a slighty different task.
To test that the result is correct; you could use the builtin sorted() function:
sorted_namelist = [name for score, name in sorted(zip(scorelist, namelist))]
i ran into a little logic problem and trying to figure it out.
my case is as follows:
i have a list of items each item represents a Group
i need to create a set of nested groups,
so, for example:
myGroups = ["head", "neck", "arms", "legs"]
i need to get them to be represented like this:
(if you can imaging a folder structure)
head
|_> neck
|_> arms
|_>legs
and so on until i hit the last element.
what i thought would work (but don't know really how to advance here) is:
def createVNTgroups(self, groupsData):
for i in range(len(groupsData)):
print groupsData[i]
for q in range(1, len(groupsData)):
print groupsData[q]
but in this case, i am running over same elements in 'i' that i already took with 'q'.
could someone give me a hint?
thanks in advance!
If I understood well, you want a nested structure. For this case, you can use a recursive function:
myGroups = ["head", "neck", "arms", "legs"]
def createVNTgroups(alist):
temp = alist[:] # needed because lists are mutable
first = [temp.pop(0)] # extract first element from list
if temp: # if the list still contains more items,
second = [createVNTgroups(temp)] # do it recursively
return first + second # returning the second object attached to the first.
else: # Otherwise,
return first # return the last element
print createVNTgroups(myGroups)
this produces a nested list:
['head', ['neck', ['arms', ['legs']]]]
Is that what you were looking for?
>>> m
['head', 'neck', 'arms', 'legs']
>>> reduce(lambda x,y:[x,y][::-1] if x!=y else [x], m[::-1],m[-1])
['head', ['neck', ['arms', ['legs']]]]