Running the same program until condition satisfied - python

I am trying to create a small program that searches a folder of images, chooses one, checks its size and finishes if the chosen image is at least 5KB. If it is not then I need it to loop back to the choosing step (and then the size check, and so on..)
I am using functions for the choosing and the size-check but when I try to use them in a while loop I get all sorts of indentation errors and now I'm very confused. I've commented the section where I was using the function, but really I guess I want the whole thing to loop back, to the comment at the top..
Here's my code -
#CHOOSE POINT
def chosen():
    random.choice(os.listdir(r"/Users/me/p1/images"))
def size():
    os.path.getsize(r"/Users/me/p1/images/"+chosen)
thresh = 5000
while size < thresh:
    print(chosen + " is too small")
    # loop back to CHOOSE POINT
else:
    print(chosen + " is at least 5KB")
Am I thinking about this all wrong? Will using the function in my while-loop do what I want? What's the best way to achieve what I'm trying to do? I'm quite new to this and getting very confused.

The first thing to realise is that code like this:
def chosen():
    random.choice(os.listdir(r"/Users/me/p1/images"))
is only the definition of a function. It only runs each time you actually call it, with chosen().
Secondly, random.choice() will make a random choice from the list provided. (It's fairly inefficient to re-read the directory from disk every time you call it, and it's unclear why you'd pick a file at random, but that's OK.) However, since you don't actually return the value, the function isn't very useful: a choice is made and then discarded. Instead you probably wanted:
def chosen():
    return random.choice(os.listdir(r"/Users/me/p1/images"))
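To make the difference concrete, here is a small sketch of what the caller gets back with and without return. The two function names and the hard-coded file names are made up for the illustration and are not part of the original code:

```python
import random

# Hypothetical stand-in for os.listdir(...), so the sketch runs anywhere.
FILE_NAMES = ["a.png", "b.png", "c.png"]

def chosen_without_return():
    # The choice is made, but nothing is handed back to the caller.
    random.choice(FILE_NAMES)

def chosen_with_return():
    # The choice is handed back to the caller.
    return random.choice(FILE_NAMES)

missing = chosen_without_return()
picked = chosen_with_return()

print(missing)  # a function without a return statement gives back None
print(picked)   # one of the names in FILE_NAMES
```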
Thirdly, this function definition:
def size():
    os.path.getsize(r"/Users/me/p1/images/"+chosen)
It tries to use chosen, but that's just the name of a function you previously defined, not a file name. You probably want to get the size of the actual file that was chosen, which the function needs to receive as a parameter:
def size(fn):
    return os.path.getsize(r"/Users/me/p1/images/"+fn)
Now to use those functions:
file_size = 0
threshold = 5000
while file_size < threshold:
    a_file = chosen()
    file_size = size(a_file)
    if file_size < threshold:
        print(a_file + " is too small")
    else:
        print(a_file + " is at least 5KB")
print('Done')
The variable file_size is initialised to 0 so that the loop condition is true on the first pass. The loop then keeps going until a file of at least threshold bytes has been chosen.
On each pass, chosen() is executed and the returned value is remembered in the variable a_file, which you can then use in the rest of the code to refer back to it.
It then gets passed to size() to obtain the file size, and finally the test is performed to print the right message.
A more efficient way to achieve the same:
threshold = 5000
while True:
    a_file = chosen()
    file_size = size(a_file)
    if file_size < threshold:
        print(a_file + " is too small")
    else:
        break
print(a_file + " is at least 5KB")
The break just exits the while loop, which would otherwise keep going forever since it tests for True. This avoids testing the same thing twice.
So, you'd end up with:
import random
import os
def chosen():
    return random.choice(os.listdir(r"/Users/me/p1/images/"))
def size(fn):
    return os.path.getsize(r"/Users/me/p1/images/"+fn)
threshold = 5000
while True:
    a_file = chosen()
    file_size = size(a_file)
    if file_size < threshold:
        print(a_file + " is too small")
    else:
        break
print(a_file + " is at least 5KB")
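If re-reading the directory on every pass bothers you, or if the folder might contain no file of 5KB or more (in which case the loop above never ends), one possible variation is to list the folder once, shuffle the names and take the first one that is big enough. This is only a sketch: the function name pick_large_enough and the temporary demo folder are my own assumptions, used so that the example runs without /Users/me/p1/images.

```python
import os
import random
import tempfile

def pick_large_enough(folder, threshold=5000):
    """Return the name of a random file of at least `threshold` bytes, or None."""
    names = os.listdir(folder)   # read the directory only once
    random.shuffle(names)        # random order, each name is tried at most once
    for name in names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue             # skip sub-folders and other non-files
        if os.path.getsize(path) >= threshold:
            return name
        print(name + " is too small")
    return None                  # nothing in the folder qualifies

# Demo on a throwaway folder holding one small and one large file.
with tempfile.TemporaryDirectory() as demo_folder:
    for demo_name, demo_size in [("small.png", 100), ("big.png", 6000)]:
        with open(os.path.join(demo_folder, demo_name), "wb") as handle:
            handle.write(b"\0" * demo_size)
    picked = pick_large_enough(demo_folder)

print(picked + " is at least 5KB")
```

Because every name is tried at most once, the function always terminates, and the caller can check for None to detect the case where no file is large enough.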

Related

Algorithm to return all possible paths in this program to a nested list

So I have a game with a function findViableMoves(base). If I call this function at the start with the parameter base, I get an output [move1, move2 ... moven] which denotes all of the n viable moves that the user can perform given the state base. (There are in fact 2 root moves.)
Upon performing a move, say move2 - base gets changed depending on the move, the function gets called again and now we have an output for findViableMoves(base) of [move21,move22 .... move2n].
[Figure: depth-first tree diagram]
If you look at this diagram, it's a very similar situation - there's no looping back, it's just a normal tree. I need a program that performs a depth-first search (I think?) on all the possible moves given a starting state of base, and then returns them in a list like this:
[[move1,move11,move111],[move1,move11,move112],....[moven,moven1,moven11],...]
There will be more elements in these lists (14 at most), but I was just wondering if someone could provide any hints on how I can build an algorithm to do this? Efficiency doesn't really matter to me as there aren't too many paths, I just want it done for now.
I'm not 100% clear on what you're after, but if you have a list or similar iterable that is changing while the loop is happening you could try something like the below.
This example allows the list and the loop condition to both remain dynamic during the loop execution.
import random
import sys
import time
changing_list = ['A', 27, 0.12]
def perform_operation(changing_list, counter):
    sometimes_add_another_element_threshold = 0.6
    if random.random() > sometimes_add_another_element_threshold:
        changing_list.append(random.random())
    print(changing_list[counter])
def main(z=0):
    safety_limit = 100
    counter = 0
    condition = True
    while condition and counter < safety_limit:
        perform_operation(changing_list, counter)
        counter += 1
        condition = counter<len(changing_list)
    print("loop finished")
if __name__ == '__main__':
    start_time = time.time()
    main(int(sys.argv[1])) if len(sys.argv)>1 else main()
    print(time.time() - start_time)
which provides output of varying length that looks something like:
A
27
0.12
0.21045788812161237
0.20230442292518247
loop finished
0.0058634281158447266
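For the depth-first enumeration the question actually asks about, a recursive sketch along these lines might be a starting point. The names find_viable_moves and apply_move, and the toy TREE used to exercise them, are assumptions made for illustration; the real game would supply its own findViableMoves and its own way of producing the next base after a move.

```python
def all_paths(state, find_viable_moves, apply_move):
    """Return every move sequence from `state` down to a state with no moves."""
    moves = find_viable_moves(state)
    if not moves:
        return [[]]  # dead end: exactly one (empty) continuation
    paths = []
    for move in moves:
        next_state = apply_move(state, move)
        # Depth first: finish everything below this move before the next one.
        for tail in all_paths(next_state, find_viable_moves, apply_move):
            paths.append([move] + tail)
    return paths

# Toy game (hypothetical): the state is simply the name of the last move made.
TREE = {"": ["1", "2"], "1": ["11", "12"], "11": ["111", "112"]}

def find_viable_moves(state):
    return TREE.get(state, [])

def apply_move(state, move):
    return move  # returns a new state and leaves the old one untouched

paths = all_paths("", find_viable_moves, apply_move)
print(paths)
```

The important design point is that apply_move returns a new state instead of changing base in place. If the real game mutates base, each move would have to be undone (or base copied) before trying the next sibling move.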

how can i speed up my code?

It's a program that suggests to the user a player's name if the user made a typo. It's extremely slow.
First it has to issue a get request, then it checks whether the player's name is within the json data. If it is, pass. Otherwise, it takes every player's first and last name and appends them to names. Then it checks whether first_name and last_name closely resemble any of the names in the list using get_close_matches. I knew from the start this would be very slow, but there has to be a faster way to do this, it's just I couldn't come up with one. Any suggestions?
from difflib import get_close_matches
def suggestion(first_name, last_name):
    names = []
    my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")
    for n in my_request['activeplayers']['playerentry']:
        if last_name == n['player']['LastName'] and first_name == n['player']['FirstName']:
            pass
        else:
            names.append(n['player']['FirstName'] + " " + n['player']['LastName'])
    suggest = get_close_matches(first_name + " " + last_name, names)
    return "did you mean " + "".join(suggest) + "?"
print suggestion("mattthews ", "stafffford") #should return Matthew Stafford
Well, since it turned out my suggestion in the comments worked out, I might as well post it as an answer with some other ideas included.
First, take your I/O operation out of the function so that you're not wasting time making the request every time your function is run. Instead, you will get your json and load it into local memory when you start the script. If at all possible, downloading the json data beforehand and instead opening a text file might be a faster option.
Second, you should get a set of unique candidates per loop because there is no need to compare them multiple times. When a name is discarded by get_close_matches(), we know that same name does not need to be compared again. (It would be a different story if the criteria with which the name is being discarded depends on the subsequent names, but I doubt that's the case here.)
Third, try to work with batches. Given that get_close_matches() is reasonably efficient, comparing to, say, 10 candidates at once shouldn't be any slower than to 1. But reducing the for loop from going over 1 million elements to over 100K elements is quite a significant boost.
Fourth, I assume that you're checking for last_name == n['player']['LastName'] and first_name == n['player']['FirstName'] because in that case there would have been no typo. So why not simply return from the function at that point?
Putting them all together, I'd write code that looks like this:
from difflib import get_close_matches
# I/O operation ONCE when the script is run
my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")
# Creating batches of 10 names; this also happens only once
# As a result, the script might take longer to load but run faster.
# I'm sure there is a better way to create batches, but I don't know any.
batch = [] # This will contain 10 names.
names = [] # This will contain the batches.
for entry in my_request['activeplayers']['playerentry']:
    # Each entry nests the name fields under 'player', as in the question's code.
    player = entry['player']
    name = player['FirstName'] + " " + player['LastName']
    batch.append(name)
    if len(batch) == 10:
        names.append(batch)
        batch = []
# Keep the last, partially filled batch when the number of names
# is not a multiple of 10.
if batch:
    names.append(batch)
def suggest(first_name, last_name, names):
    desired_name = first_name + " " + last_name
    suggestions = []
    for batch in names:
        # Just return the name if there is no typo
        # Alternatively, you can create a flat list of names outside of the function
        # and see if the desired_name is in the list of names to immediately
        # terminate the function. But I'm not sure which method is faster. It's
        # a quick profiling task for you, though.
        if desired_name in batch:
            return desired_name
        # This way, we only match with new candidates, 10 at a time.
        best_matches = get_close_matches(desired_name, batch)
        suggestions.append(best_matches)
    # We need to flatten the list of suggestions to print.
    # Alternatively, you could use a for loop to append in the first place.
    suggestions = [name for batch in suggestions for name in batch]
    return "did you mean " + ", ".join(suggestions) + "?"
# The function is called suggest and needs the batches passed in.
print(suggest("mattthews ", "stafffford", names)) # should suggest Matthew Stafford
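The flat-list alternative mentioned in the comments could look roughly like the sketch below: build the list of names (plus a set for the exact-match check) once, then make a single get_close_matches call per lookup. I haven't profiled it against the batched version. The helper build_name_index and the hand-written sample_entries are assumptions for illustration; they only mimic the 'player' / 'FirstName' / 'LastName' layout used in the question and are not real API data.

```python
from difflib import get_close_matches

def build_name_index(player_entries):
    """Build the candidate names once, plus a set for fast exact lookups."""
    names = [entry["player"]["FirstName"] + " " + entry["player"]["LastName"]
             for entry in player_entries]
    return names, set(names)

def suggest(first_name, last_name, names, known_names):
    desired_name = first_name.strip() + " " + last_name.strip()
    if desired_name in known_names:
        return desired_name  # no typo, nothing to suggest
    # One call over the whole list; n and cutoff are difflib's own parameters.
    matches = get_close_matches(desired_name, names, n=3, cutoff=0.6)
    if not matches:
        return "no close match found"
    return "did you mean " + ", ".join(matches) + "?"

# Hand-written sample data standing in for the downloaded json.
sample_entries = [
    {"player": {"FirstName": "Matthew", "LastName": "Stafford"}},
    {"player": {"FirstName": "John", "LastName": "Smith"}},
    {"player": {"FirstName": "Jane", "LastName": "Doe"}},
]
names, known_names = build_name_index(sample_entries)

typo_result = suggest("mattthews ", "stafffford", names, known_names)
exact_result = suggest("John", "Smith", names, known_names)
print(typo_result)
print(exact_result)
```

Note that get_close_matches is case-sensitive, so lowercasing both the input and the candidates before matching may improve the results for real user input.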

Iteration over an array is behaving as if the array is being modified during iteration, when this is not happening

I’m writing some software for a weather station in python. Every minute, on the minute, new values are taken from sensors (the values are actually taken from a file that updates each minute) and added to an array. The items in this array are to be uploaded to an online database. The array acts as a buffer to stop the current latest data being overwritten by a newer version if, for example the internet connection goes down. When the connection comes back up, the items in the array can all be uploaded and nothing will have been lost. The population of the array happens in a separate thread.
Back on the main thread, 1 second after the minute a copy of the array is made. This copy is then iterated over and each item is uploaded to the database. Following the upload of an item, it is removed from the ORIGINAL array but not the copy, to prevent errors caused by manipulating an array while iterating over it. If an error occurs with uploading any of the items then the entire iteration stops and is left to be resumed on the next minute, in order to preserve the chronology of data in the database. This is thread safe because I’m deleting from the original array, but I am iterating over a copy of it, and I see no problems with the threading side of things anywhere in the code.
However, I am having a very hard to remove problem that looks like something you would find when modifying a list while iterating over it, even though this is not happening anywhere. When supplied with one piece of data every minute (this is normal operation), everything uploads fine. Additionally, it also works fine if there are two items in the array (done by removing the internet connection for one minute). Both items upload in the right order. However, if there are more than two items to upload in the list (internet is down for more than a minute) then the iteration seems to skip every other item in the list. This results in every other item being uploaded, then every other item of the remaining items in the array being uploaded and so on until the array returns to a normal one per minute. This leads to the uploads in the database being out of order.
I have narrowed the error down to the line that deletes the item from the uploadBuffer array, as without this line it works fine (except obviously the items aren’t removed from the buffer once uploaded). I don’t see how this problem can exist because I’m iterating over a copy of the original list. Removal of the item from the original list happens using its value and not its index, so that can’t be the problem. Also, the copy of the original array is made once, before the iteration begins, so I don’t see how removing items from the original can affect the iteration over the copy.
I really would like to get this fixed as I’ve been trying for over two weeks with no luck, so thanks for any help, and my code is below. Also, please mention if you notice any problems with how I am using the threading locks as I’m not completely sure if I’m using them correctly.
import urllib2
from lxml import etree
import os
import datetime
import time
import httplib
import socket
import threading
timeFormat = "%d/%m/%Y at %H:%M:%S.%f"
rollingData = "/home/pi/weatherstation/other/latest.xml"
bufferLock = threading.Lock()
print("MOD_UPLOAD: file watching started on " +
      datetime.datetime.now().strftime(timeFormat))
uploadBuffer = []
def watchForReport():
    previousModified = os.stat(rollingData)[8]
    while True:
        try:
            currentModified = os.stat(rollingData)[8]
            # New data needs to be uploaded
            if not currentModified == previousModified:
                newData = open(rollingData, "r").read()
                bufferLock.acquire()
                uploadBuffer.append(newData)
                bufferLock.release()
                previousModified = currentModified
        except:
            print("OSError")
# Watch for new data in separate thread
threading.Thread(target = watchForReport).start()
while True:
    currentSecond = datetime.datetime.now().strftime("%S")
    if currentSecond == "01" or currentSecond == "02":
        if bufferLock.locked() == False:
            bufferLock.acquire()
            bufferCopy = uploadBuffer
            bufferLock.release()
            # If there are records to be uploaded
            if len(bufferCopy) > 0:
                for item in bufferCopy:
                    print("\nreport: " + item[:35])
                    print("bufferCount: " + str(len(bufferCopy)))
                    if item == "":
                        # This isn't the problem, as the text 'empty' is almost never printed
                        print("empty")
                        bufferCopy.remove(item)
                        break
                    else:
                        report = etree.fromstring(item)
                        # Separate date and time and format
                        timestamp = report.xpath("#time")[0]
                        tDate = timestamp.rsplit("T", 2)[0]
                        tTime = timestamp.rsplit("T", 2)[1]
                        tTime = tTime.replace(":", "-")
                        # Generate URL to write data to the database
                        url = "http://www.example.com/uploadfile.php?date="
                        url += tDate + "&time=" + tTime + "&shieldedtemp="
                        url += report.xpath("#shieldedtemp")[0] + "&exposedtemp="
                        url += report.xpath("#exposedtemp")[0] + "&soil10temp="
                        url += report.xpath("#soil10temp")[0] + "&soil30temp="
                        url += report.xpath("#soil30temp")[0] + "&soil100temp="
                        url += report.xpath("#soil100temp")[0]
                        try:
                            # Upload report to database
                            response = urllib2.urlopen(url, timeout = 9).read()
                            print("response: " + response)
                            # Response[4] means success
                            if response == "response[4]":
                                bufferLock.acquire()
                                # This is the line causing the problem:
                                uploadBuffer.remove(item)
                                bufferLock.release()
                            else:
                                # Break out of loop on failure to preserve chronology
                                break
                        except urllib2.HTTPError:
                            print("urllib2.HTTPError")
                            print("bufferReCount: " + str(len(bufferCopy)))
                            break
                        except httplib.BadStatusLine:
                            print("httplib.BadStatusLine")
                            print("bufferReCount: " + str(len(bufferCopy)))
                            break
                        except urllib2.URLError:
                            print("urllib2.URLError")
                            print("bufferReCount: " + str(len(bufferCopy)))
                            break
                        except socket.timeout:
                            print("timeout")
                            print("bufferReCount: " + str(len(bufferCopy)))
                            break
                print("bufferReCount: " + str(len(bufferCopy)))
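One thing that stands out in the code above is the line bufferCopy = uploadBuffer. In Python, plain assignment does not copy a list; it only binds a second name to the same list object, so removing from uploadBuffer also shrinks the list being iterated. The sketch below reproduces the "every other item" symptom with made-up report names and then shows the same loop over a real copy made with list(). The variable names are mine and only loosely follow the question's code.

```python
# Case 1: assignment only creates an alias, as in bufferCopy = uploadBuffer.
upload_buffer = ["r1", "r2", "r3", "r4"]
alias = upload_buffer
uploaded_via_alias = []
for item in alias:
    uploaded_via_alias.append(item)   # stands in for a successful upload
    upload_buffer.remove(item)        # also removes the item from `alias`
print(uploaded_via_alias)             # every other report was skipped
print(upload_buffer)                  # the skipped reports are still queued
remaining_after_alias = list(upload_buffer)

# Case 2: list(...) makes an independent shallow copy to iterate over.
upload_buffer = ["r1", "r2", "r3", "r4"]
snapshot = list(upload_buffer)
uploaded_via_copy = []
for item in snapshot:
    uploaded_via_copy.append(item)
    upload_buffer.remove(item)        # `snapshot` is unaffected
print(uploaded_via_copy)              # all four reports, in order
print(upload_buffer)                  # the buffer is now empty
```

In the original program the copy would presumably be taken while holding bufferLock, for example bufferCopy = list(uploadBuffer) or bufferCopy = uploadBuffer[:], so that the watcher thread cannot append halfway through the copy.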

If statement doesn't take place

I have been testing some of my code and for some reason my if statement is being ignored. The first if statement works but the second if statement doesn't. I have tried changing it to elif and it still doesn't work. Thanks in advance.
import random
diff = input("What is the ritual difficulty? ")
level = input("How many ritual levels do you have that pertain to this ritual? ")
bag = []
for success in xrange(10):
bag.append("Success")
bag.append("Flaw")
bag.append("Fail")
extra = level - diff
if extra >= 1:
extra = extra / 2
int(extra)
for chance in xrange(extra):
bag.append("Success")
if extra < 0:
for chance in xrange(extra):
bag.append("Flaw")
bag.append("Fail")
bag.append("Backlash")
print bag
random.shuffle(bag)
outcome = bag.pop()
print "The outcome of the ritual is: ", outcome
Your second if will be entered if the diff is larger than the level. However, even if it does get entered it won't actually do anything:
if extra < 0:
    for chance in xrange(extra):
        bag.append("Flaw")
        bag.append("Fail")
        bag.append("Backlash")
The xrange function will yield an empty "list" (it's not actually a list, as the documentation for xrange explains) for negative values, i.e. nothing will be appended to the bag.
It is not ignored, but when you run a for loop on xrange(extra) with extra being negative, then obviously this loop is immediately terminated.
P.S.: you might wanna try xrange(-extra) instead...
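A quick way to see this for yourself is the sketch below. It is written for Python 3, where range plays the role of Python 2's xrange, and the value of extra is made up for the example:

```python
extra = -3  # e.g. level 2 minus difficulty 5

# A negative stop value yields no iterations at all.
iterations_with_negative = list(range(extra))
print(iterations_with_negative)

# Negating the value gives the intended number of passes.
bag = []
for chance in range(-extra):
    bag.append("Flaw")
    bag.append("Fail")
    bag.append("Backlash")
print(len(bag))
```

abs(extra) would work just as well as -extra here, since this branch only runs when extra is negative.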

Python define and call function - method behavior changing upon processing user input

I defined a method, like so:
class MyDatastructure(object):
    # init method here
    def appending(self, elem):
        self.data.append(elem)
        if self.count >= self.size:
            print "popping " + str(self.data[0])
            print "inserting " + str(elem)
            self.data.pop(0)
        elif self.count < self.size:
            self.count += 1
        print "count after end of method " + str(self.count)
I tested it out, and it worked as it was supposed to.
Underneath this definition, I wanted to process some user input and use this method. However, it doesn't enter the if case anymore! Any idea why?
# in the same file
def process_input():
    while True:
        # getting user input
        x = raw_input()
        ds = MyDatastructure(x) # creating data structure of size, which was taken from user input, count initially 0
        ds.appending(1)
        ds.appending(2)
        ds.appending(3)
        # Still appending and NOT popping, even though the if in appending doesn't allow it!
        # This functionality works if I test it without user input!
The problem is with this line:
x = raw_input()
Calling raw_input will return a string. If I type in the number 3, that means that the data structure will assign the string "3" to the size.
In Python 2, comparing a string to a number does not raise an error. Objects of different types are ordered in an arbitrary but consistent way (in CPython 2, numbers always compare as smaller than strings), so self.count >= self.size is never true -- see this StackOverflow answer. Note that Python 3 fixes this piece of weirdness -- attempting to order a string against an int will cause a TypeError exception to occur.
Instead, you want to convert the input into an int so you can do your size comparison properly.
x = int(raw_input())
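Here is a small sketch of the same idea in Python 3, where input() replaces raw_input(). Since the question leaves out the init method, the __init__ below is only a guess at what it does, and the class name BoundedBuffer and the helper parse_size are made up for the example; the text "2" stands in for what the user typed.

```python
class BoundedBuffer:
    def __init__(self, size):
        # Assumed initialiser: the original post does not show it.
        self.size = size
        self.count = 0
        self.data = []

    def appending(self, elem):
        self.data.append(elem)
        if self.count >= self.size:
            self.data.pop(0)      # buffer is full: drop the oldest element
        else:
            self.count += 1

def parse_size(text):
    """Convert raw user input such as '2\n' into an int."""
    return int(text.strip())

# In Python 3 the mixed comparison fails loudly instead of silently misbehaving.
try:
    0 >= "2"
    mixed_comparison_fails = False
except TypeError:
    mixed_comparison_fails = True

ds = BoundedBuffer(parse_size("2\n"))
for value in (1, 2, 3):
    ds.appending(value)
print(ds.data)  # the oldest element has been popped
```

int() raises a ValueError for input such as "abc", so a real program would probably wrap the conversion in a try/except and ask again.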
