Concatenating strings more efficiently in Python - python

I've been learning Python for a couple of months, and wanted to understand a cleaner and more efficient way of writing this function. It's just a basic thing I use to look up bus times near me, then display the contents of mtodisplay on an LCD, but I'm not sure about the mtodisplay=mtodisplay+... line. There must be a better, smarter, more Pythonic way of concatenating a string, without resorting to lists (I want to output this string direct to LCD. Saves me time. Maybe that's my problem ... I'm taking shortcuts).
Similarly, my method of using countit and thebuslen seems a bit ridiculous! I'd really welcome some advice or pointers in making this better. Just wanna learn!
Thanks
json_string = requests.get(busurl)
the_data = json_string.json()
mtodisplay='220 buses:\n'
countit=0
for entry in the_data['departures']:
    for thebuses in the_data['departures'][entry]:
        if thebuses['line'] == '220':
            thebuslen=len(the_data['departures'][entry])
            print 'buslen',thebuslen
            countit += 1
            mtodisplay=mtodisplay+thebuses['expected_departure_time']
            if countit != thebuslen:
                mtodisplay=mtodisplay+','
return mtodisplay

Concatenating strings like this
mtodisplay = mtodisplay + thebuses['expected_departure_time']
used to be very inefficient, but for a long time now CPython has been able to reuse the string being concatenated to (as long as there are no other references to it), so it runs in linear time instead of the older quadratic time, which should definitely be avoided.
In this case it looks like you already have a list of items that you want to put commas between, so
','.join(some_list)
is probably more appropriate (and automatically means you don't get an extra comma at the end).
So the next problem is to construct the list (it could also be a generator, etc.). @bgporter shows how to make the list, so I'll show the generator version:
def mtodisplay(busurl):
    json_string = requests.get(busurl)
    the_data = json_string.json()
    for entry in the_data['departures']:
        for thebuses in the_data['departures'][entry]:
            if thebuses['line'] == '220':
                thebuslen=len(the_data['departures'][entry])
                print 'buslen',thebuslen
                yield thebuses['expected_departure_time']
# This is where you would normally just call the function
result = '220 buses:\n' + ','.join(mtodisplay(busurl))
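To see why join removes the need for the countit/thebuslen bookkeeping, here is a minimal sketch in Python 3 syntax. The `sample` dictionary is made-up data that only imitates the shape of the JSON in the question, and `departure_times` is a hypothetical helper name; the real API response may well differ.

```python
# Hypothetical data shaped like the 'departures' JSON in the question.
sample = {
    "departures": {
        "220": [
            {"line": "220", "expected_departure_time": "10:05"},
            {"line": "220", "expected_departure_time": "10:20"},
        ],
        "295": [
            {"line": "295", "expected_departure_time": "10:11"},
        ],
    }
}


def departure_times(data, line):
    """Yield the expected departure time of every bus on the given line."""
    for buses in data["departures"].values():
        for bus in buses:
            if bus["line"] == line:
                yield bus["expected_departure_time"]


# join() puts the separator only between items, so no counter is needed.
display = "220 buses:\n" + ",".join(departure_times(sample, "220"))
print(display)
```

If no bus matches, join simply returns an empty string, so the header is displayed on its own.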

I'm not sure what you mean by 'resorting to lists', but something like this:
json_string = requests.get(busurl)
the_data = json_string.json()
mtodisplay= []
for entry in the_data['departures']:
    for thebuses in the_data['departures'][entry]:
        if thebuses['line'] == '220':
            thebuslen=len(the_data['departures'][entry])
            print 'buslen',thebuslen
            mtodisplay.append(thebuses['expected_departure_time'])
return '220 buses:\n' + ", ".join(mtodisplay)

Related

Deleting/Removing element from a list when comparing to another list Python

So I have a good one. I'm trying to build two lists (ku_coins and bin_coins) of crypto tickers from two different exchanges, but I don't want to double up, so if it appears on both exchanges I want to remove it from ku_coins.
A slight complication occurs as Kucoin symbols come in as AION-BTC, while Binance symbols come in as AIONBTC, but it's no problem.
So firstly, I create the two lists of symbols, which runs fine, no problem. What I then try and do is loop through the Kucoin symbols and convert them to the Binance style symbol, so AIONBTC instead of AION-BTC. Then if it appears in the Binance list I want to remove it from the Kucoin list. However, it appears to randomly refuse to remove a handful of symbols that match the requirement. For example AION.
It removes the majority of doubled up symbols but in AIONs case for example it just won't delete it.
If I just do print(i) after this loop:
for i in ku_coins:
    if str(i[:-4] + 'BTC') in bin_coins:
It will happily print AION-BTC as one of the symbols, as it fits the requirement perfectly. However, when I stick the ku_coins.remove(i) command in before printing, it suddenly decides not to print AION, suggesting it doesn't match the requirements. And it's doing my head in. Obviously the remove command is causing the problem, but I can't for the life of me figure out why. Any help really appreciated.
import requests
import json
ku_dict = json.loads(requests.get('https://api.kucoin.com/api/v1/market/allTickers').text)
ku_syms = ku_dict['data']['ticker']
ku_coins = []
for x in range(0, len(ku_syms)):
    if ku_syms[x]['symbol'][-3:] == 'BTC':
        ku_coins.append(ku_syms[x]['symbol'])
bin_syms = json.loads(requests.get('https://www.binance.com/api/v3/ticker/bookTicker').text)
bin_coins = []
for i in bin_syms:
    if i['symbol'][-3:] == 'BTC':
        bin_coins.append(i['symbol'])
ku_coins.sort()
bin_coins.sort()
for i in ku_coins:
    if str(i[:-4] + 'BTC') in bin_coins:
        ku_coins.remove(i)
@top bantz, @Fourier has already mentioned that you shouldn't modify a list you're iterating over. What you can do in this case is create a copy of ku_coins first, iterate over that copy, and remove from the original ku_coins each element that matches your if condition. See below:
ku_coins.sort()
bin_coins.sort()
# Create a copy
ku_coins_ = ku_coins[:]
# Then iterate over that copy
for i in ku_coins_:
    if str(i[:-4] + 'BTC') in bin_coins:
        ku_coins.remove(i)
How about modifying the code to:
while ku_coins:
    i = ku_coins.pop()
    if str(i[:-4] + 'BTC') in bin_coins:
        pass
    else:
        # do something
the pop() method removes i from the ku_coins list
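The skipped symbols come from the list shifting left each time an item is removed while the loop's internal index keeps advancing. The sketch below (Python 3) reproduces the effect with made-up ticker names, not real exchange data, and then builds a new list instead of mutating the one being iterated. Here `replace("-", "")` stands in for the slicing used in the question, and the set is only an optional speed-up for membership tests.

```python
# Made-up symbols for illustration only.
ku_coins = ["AAA-BTC", "AION-BTC", "ZZZ-BTC"]
bin_coins = ["AAABTC", "AIONBTC"]

# Removing while iterating: after "AAA-BTC" is removed, "AION-BTC" slides
# into index 0, but the loop moves on to index 1 and never looks at it.
buggy = list(ku_coins)
for sym in buggy:
    if sym.replace("-", "") in bin_coins:
        buggy.remove(sym)

# Building a new list avoids the problem entirely.
bin_set = set(bin_coins)
unique_to_ku = [sym for sym in ku_coins if sym.replace("-", "") not in bin_set]

print(buggy)         # "AION-BTC" survives even though it is on both exchanges
print(unique_to_ku)  # only the symbol that is missing from bin_coins
```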

Python Printing on the same Line

The problem that I have is printing phone_sorter() and number_calls() all on the same lines. For instance it will print the two lines of phone_sorter but the number_calls will be printed right below it. I have tried the end='' method but it does not seem to work.
customers=open('customers.txt','r')
calls=open('calls.txt.','r')
def main():
    print("+--------------+------------------------------+---+---------+--------+")
    print("| Phone number | Name | # |Duration | Due |")
    print("+--------------+------------------------------+---+---------+--------+")
    print(phone_sorter(), number_calls())
def time(x):
    m, s = divmod(seconds, x)
    h, m = divmod(m, x)
    return "%d:%02d:%02d" % (h, m, s)
def phone_sorter():
    sorted_no={}
    for line in customers:
        rows=line.split(";")
        sorted_no[rows[1]]=rows[0]
    for value in sorted(sorted_no.values()):
        for key in sorted_no.keys():
            if sorted_no[key] == value:
                print(sorted_no[key],key)
def number_calls():
    no_calls={}
    for line in calls:
        rows=line.split(";")
        if rows[1] not in no_calls:
            no_calls[rows[1]]=1
        else:
            no_calls[rows[1]]+=1
    s={}
    s=sorted(no_calls.keys())
    for key in s:
        print(no_calls[key])
main()
Your key problem is that both phone_sorter and number_calls do their own printing, and return None. So, printing their return values is absurd and should just end with a None None line that makes no sense, after they've done all their own separate-line printing.
A better approach is to restructure them to return, not print, the strings they determine, and only then arrange to print those strings with proper formatting in the "orchestrating" main function.
It looks like they'll each return a list of strings (which they are now printing on separate lines) and you'll likely want to zip those lists if they are in corresponding order, to prepare the printing.
But your code is somewhat opaque, so it's hard to tell if the orders of the two are indeed corresponding. They'd better be, if the final printing is to make sense...
Added: let me exemplify with some slight improvement and one big change in phone_sorter...:
def phone_sorter():
    sorted_no={}
    for line in customers:
        rows=line.split(";")
        sorted_no[rows[1]]=rows[0]
    sorted_keys = sorted(sorted_no, key=sorted_no.get)
    results = [(sorted_no[k], k) for k in sorted_keys]
    return results
Got it? Apart from doing the computations better, the core idea is to put together a list and return it -- it's main's job to format and print it appropriately, in concert with a similar list returned by number_calls (which appears to be parallel).
import collections
def number_calls():
    no_calls=collections.Counter(
        line.split(';')[1] for line in calls)
    # Index the Counter with [] (it is a dict subclass, not a callable)
    return [no_calls[k] for k in sorted(no_calls)]
Now the relationship between the two lists is not obvious to me, but, assuming they're parallel, main can do e.g:
nc = number_calls()
ps = phone_sorter()
for (duration, name), numcalls in zip(ps, nc):
    print(...however you want to format the fields here...)
Those headers you printed in main don't tell me what data should be printed under each, and how the printing should be formatted (width of each field, for example). But, main, and only main, should be intimately familiar with these presentation issues and control them, while the other functions deal with the "business logic" of extracting the data appropriately. "Separation of concerns" -- a big issue in programming!
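As a rough illustration of keeping the formatting in one place, here is a sketch in Python 3. The `ps` and `nc` lists are invented stand-ins for what the two functions might return, `format_rows` is a hypothetical helper, and the column widths are arbitrary; the real layout depends on the headers you want.

```python
# Hypothetical results, standing in for what phone_sorter() and
# number_calls() might return once they build lists instead of printing.
ps = [("Alice", "555-0101"), ("Bob", "555-0102")]
nc = [3, 1]


def format_rows(people, call_counts):
    """Return one formatted table line per (name, phone) pair and call count."""
    lines = []
    for (name, phone), numcalls in zip(people, call_counts):
        # Column widths are arbitrary choices for this sketch.
        lines.append("| {:<12} | {:<10} | {:>3} |".format(phone, name, numcalls))
    return lines


for line in format_rows(ps, nc):
    print(line)
```

Because each row is built from a single zip, the two lists only line up if both functions return their results in the same order.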

python loop optimization - iterate dirs 3 levels and delete

Hi I have the following procedure,
Questions:
- How to make it elegant, more readable, compact.
- What can I do to extract common loops to another method.
Assumptions:
From a given rootDir the dirs are organized as in ex below.
What the proc does:
If input is 200, it deletes all DIRS that are OLDER than 200 days. NOT based on modifytime, but based on dir structure and dir name [I will later delete by brute force "rm -Rf" on each dir that are older]
e.g dir structure:
-2009(year dirs) [will force delete dirs e.g "rm -Rf" later]
-2010
  -01...(month dirs)
  -05 ..
    -01.. (day dirs)
      -many files. [I won't check mtime at file level - takes more time]
    -31
  -12
-2011
-2012 ...
Code that I have:
def get_dirs_to_remove(dir_path, olderThanDays):
    today = datetime.datetime.now();
    oldestDayToKeep = today + datetime.timedelta(days= -olderThanDays)
    oldKeepYear = int(oldestDayToKeep.year)
    oldKeepMonth =int(oldestDayToKeep.month);
    oldKeepDay = int(oldestDayToKeep.day);
    for yearDir in os.listdir(dirRoot):
        #iterate year dir
        yrPath = os.path.join(dirRoot, yearDir);
        if(is_int(yearDir) == False):
            problemList.append(yrPath); # can't convert year to an int, store and report later
            continue
        if(int(yearDir) < oldKeepYear):
            print "old Yr dir: " + yrPath
            #deleteList.append(yrPath); # to be bruteforce deleted e.g "rm -Rf"
            yield yrPath;
            continue
        elif(int(yearDir) == oldKeepYear):
            # iterate month dir
            print "process Yr dir: " + yrPath
            for monthDir in os.listdir(yrPath):
                monthPath = os.path.join(yrPath, monthDir)
                if(is_int(monthDir) == False):
                    problemList.append(monthPath);
                    continue
                if(int(monthDir) < oldKeepMonth):
                    print "old month dir: " + monthPath
                    #deleteList.append(monthPath);
                    yield monthPath;
                    continue
                elif (int(monthDir) == oldKeepMonth):
                    # iterate Day dir
                    print "process Month dir: " + monthPath
                    for dayDir in os.listdir(monthPath):
                        dayPath = os.path.join(monthPath, dayDir)
                        if(is_int(dayDir) == False):
                            problemList.append(dayPath);
                            continue
                        if(int(dayDir) < oldKeepDay):
                            print "old day dir: " + dayPath
                            #deleteList.append(dayPath);
                            yield dayPath
                            continue
print [ x for x in get_dirs_to_remove(dirRoot, olderThanDays)]
print "probList" % problemList # how can I get this list also from the same proc?
This actually looks pretty nice, except for the one big thing mentioned in this comment:
print "probList" % problemList # how can I get this list also from the same proc?
It sounds like you're storing problemList in a global variable or something, and you'd like to fix that. Here are a few ways to do this:
- Yield both delete files and problem files, e.g., yield a tuple where the first member says which kind it is, and the second what to do with it.
- Take the problemList as a parameter. Remember that lists are mutable, so appending to the argument will be visible to the caller.
- Yield the problemList at the end, which means you need to restructure the way you use the generator, because it's no longer just a simple iterator.
- Code the generator as a class instead of a function, and store problemList as a member variable.
- Peek at the internal generator information and cram problemList in there, so the caller can retrieve it.
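The first option can be sketched roughly as follows (Python 3). The function name `classify`, the tag strings, and the sample directory names are all made up for illustration, and `str.isdigit` stands in for the question's `is_int` helper, whose implementation isn't shown.

```python
def classify(names, keep_year):
    """Yield ('problem', name) or ('delete', name) for each directory name."""
    for name in names:
        if not name.isdigit():
            yield ("problem", name)
        elif int(name) < keep_year:
            yield ("delete", name)


# The caller sorts the tagged items into two lists as they arrive.
to_delete, problems = [], []
for kind, name in classify(["2009", "2010", "tmp", "2012"], 2010):
    (to_delete if kind == "delete" else problems).append(name)

print(to_delete, problems)
```

The generator stays a plain iterator, and no global list is needed; the cost is that every consumer has to look at the tag.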
Meanwhile, there are a few ways you could make the code more compact and readable.
Most trivially:
print [ x for x in get_dirs_to_remove(dirRoot, olderThanDays)]
This list comprehension is exactly the same as the original iteration, which you can write more simply as:
print list(get_dirs_to_remove(dirRoot, olderThanDays))
As for the algorithm itself, you could partition the listdir, and then just use the partitioned lists. You could do it lazily:
yearDirs = os.listdir(dirRoot)
problemList.extend(yearDir for yearDir in yearDirs if not is_int(yearDir))
yield from (yearDir for yearDir in yearDirs if is_int(yearDir) and int(yearDir) < oldKeepYear)
for year in (yearDir for yearDir in yearDirs if is_int(yearDir) and int(yearDir) == oldKeepYear):
    # next level down
Or strictly:
yearDirs = os.listdir(dirRoot)
problems, older, eq, newer = partitionDirs(yearDirs, oldKeepYear)
problemList.extend(problems)
yield from older
for year in eq:
    # next level down
The latter probably makes more sense, especially given that yearDirs is already a list, and isn't likely to be that big anyway.
Of course you need to write that partitionDirs function—but the nice thing is, you get to use it again in the months and days levels. And it's pretty simple. In fact, I might actually do the partitioning by sorting, because it makes the logic so obvious, even if it's more verbose:
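def partitionDirs(dirs, keyvalue):
    problems = [dir for dir in dirs if not is_int(dir)]
    # A generator expression must be parenthesized when other arguments follow it
    values = sorted((dir for dir in dirs if is_int(dir)), key=int)
    older, eq, newer = partitionSortedListAt(values, keyvalue, key=int)
    return problems, older, eq, newer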
If you look around (maybe search "python partition sorted list"?), you can find lots of ways to implement the partitionSortedListAt function, but here's a sketch of something that I think is easy to understand for someone who hasn't thought of the problem this way:
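def partitionSortedListAt(vals, keyvalue, key=int):
    keys = [key(v) for v in vals]
    # bisect_left returns the index of the first entry that is >= keyvalue
    i = bisect.bisect_left(keys, keyvalue)
    if i < len(vals) and keys[i] == keyvalue:
        return vals[:i], [vals[i]], vals[i+1:]
    else:
        return vals[:i], [], vals[i:]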
If you search for "python split predicate" you can also find other ways to implement the initial split—although keep in mind that most people are either concerned with being able to partition arbitrary iterables (which you don't need here), or, rightly or not, worried about efficiency (which you don't care about here either). So, don't look for the answer that someone says is "best"; look at all of the answers, and pick the one that seems most readable to you.
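For comparison, one of those other ways is a plain single pass with no sorting at all. This is only a sketch in Python 3: `partition_dirs` is a hypothetical name, the sample directory names are invented, and the try/except replaces the `is_int` helper from the question.

```python
def partition_dirs(names, key_value):
    """Split directory names into (problems, older, equal, newer) in one pass."""
    problems, older, equal, newer = [], [], [], []
    for name in names:
        try:
            number = int(name)
        except ValueError:
            # Not a number at all: report it instead of comparing it.
            problems.append(name)
            continue
        if number < key_value:
            older.append(name)
        elif number > key_value:
            newer.append(name)
        else:
            equal.append(name)
    return problems, older, equal, newer


result = partition_dirs(["01", "05", "12", "lost+found", "07"], 5)
print(result)
```

Unlike the sorted version, each output list keeps the names in listing order, and the original strings (including leading zeros) are preserved for building paths.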
Finally, you may notice that you end up with three levels that look almost identical:
yearDirs = os.listdir(dirRoot)
problems, older, eq, newer = partitionDirs(yearDirs, oldKeepYear)
problemList.extend(problems)
yield from older
for year in eq:
    monthDirs = os.listdir(os.path.join(dirRoot, str(year)))
    problems, older, eq, newer = partitionDirs(monthDirs, oldKeepMonth)
    problemList.extend(problems)
    yield from older
    for month in eq:
        dayDirs = os.listdir(os.path.join(dirRoot, str(year), str(month)))
        problems, older, eq, newer = partitionDirs(dayDirs, oldKeepDay)
        problemList.extend(problems)
        yield from older
        yield from eq
You can simplify this further through recursion—pass down the path so far, and the list of further levels to check, and you can turn this 18 lines into 9. Whether that's more readable or not depends on how well you manage to encode the information to pass down and the appropriate yield from. Here's a sketch of the idea:
def doLevel(pathSoFar, dateComponentsLeft):
    if not dateComponentsLeft:
        return
    dirs = os.listdir(pathSoFar)
    problems, older, eq, newer = partitionDirs(dirs, dateComponentsLeft[0])
    problemList.extend(problems)
    yield from older
    if eq:
        yield from doLevel(os.path.join(pathSoFar, str(eq[0])), dateComponentsLeft[1:])
yield from doLevel(rootPath, [oldKeepYear, oldKeepMonth, oldKeepDay])
If you're on an older Python version that doesn't have yield from, the earlier stuff is almost trivial to transform; the recursive version as written will be uglier and more painful. But there's really no way to avoid this when dealing with recursive generators, because a sub-generator cannot "yield through" a calling generator.
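To make the "no yield through" point concrete, here is a small sketch of a recursive generator written without yield from. The nested dictionary is an invented stand-in for a directory tree and `iter_paths` is a hypothetical name; the inner for loop is what each `yield from` has to be expanded into.

```python
def iter_paths(tree, prefix=""):
    """Yield every path in a nested dict, parents before children."""
    for name, subtree in sorted(tree.items()):
        path = prefix + "/" + name if prefix else name
        yield path
        # Without "yield from", every level must re-yield its children's items.
        for sub_path in iter_paths(subtree, path):
            yield sub_path


# Invented tree: year -> month -> day.
tree = {"2010": {"01": {}, "05": {"01": {}}}, "2011": {}}
paths = list(iter_paths(tree))
print(paths)
```

Each item from a deeply nested call is passed up one level at a time, which is the extra ugliness the paragraph above refers to.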
I would suggest not using generators unless you are absolutely sure you need them. In this case, you don't need them.
In the below, newer_list isn't strictly needed. While categorizeSubdirs could be made recursive, I don't feel that the increase in complexity is worth the repetition savings (but that's just a personal style issue; I only use recursion when it's unclear how many levels of recursion are needed or the number is fixed but large; three isn't enough IMO).
def categorizeSubdirs(keep_int, base_path):
    older_list = []
    equal_list = []
    newer_list = []
    problem_list = []
    for subdir_str in os.listdir(base_path):
        subdir_path = os.path.join(base_path, subdir_str)
        try:
            # Convert the directory name, not the full path
            subdir_int = int(subdir_str)
        except ValueError:
            problem_list.append(subdir_path)
        else:
            if subdir_int < keep_int:
                older_list.append(subdir_path)
            elif subdir_int > keep_int:
                newer_list.append(subdir_path)
            else:
                equal_list.append(subdir_path)
    # Note that for your case, you don't need newer_list,
    # and it's not clear if you need problem_list
    return older_list, equal_list, newer_list, problem_list
def get_dirs_to_remove(dir_path, olderThanDays):
    oldest_dt = datetime.datetime.now() + datetime.timedelta(days= -olderThanDays)
    remove_list = []
    problem_list = []
    olderYear_list, equalYear_list, newerYear_list, problemYear_list = categorizeSubdirs(oldest_dt.year, dir_path)
    remove_list.extend(olderYear_list)
    problem_list.extend(problemYear_list)
    for equalYear_path in equalYear_list:
        olderMonth_list, equalMonth_list, newerMonth_list, problemMonth_list = categorizeSubdirs(oldest_dt.month, equalYear_path)
        remove_list.extend(olderMonth_list)
        problem_list.extend(problemMonth_list)
        for equalMonth_path in equalMonth_list:
            olderDay_list, equalDay_list, newerDay_list, problemDay_list = categorizeSubdirs(oldest_dt.day, equalMonth_path)
            remove_list.extend(olderDay_list)
            problem_list.extend(problemDay_list)
    return remove_list, problem_list
The three nested loops at the end could be made less repetitive at the cost of code complexity. I don't think that it's worth it, though reasonable people can disagree. All else being equal, I prefer simpler code to slightly more clever code; as they say, reading code is harder than writing it, so if you write the most clever code you can, you're not going to be clever enough to read it. :/

Python: print variable together from different for loop

for synset in wn.synsets(wordstr):
    len_lemma_names = len (synset.lemma_names)
    #print len_lemma_names, synset.lemma_names
    count_lemma = count_lemma + len_lemma_names
for synset_scores in swn_senti_synset:
    count_synset = count_synset + 1
    #print count_synset, synset_scores
I am trying to print len_lemma_names in front of count_synset but it did not work. Is there any way possible for printing them together? Thank you...
I think that you are wanting to iterate over the two, together. If this is the case, you want to use zip, or to avoid turning it all into one big list at once, itertools.izip.
from itertools import izip
for synset, synset_scores in izip(wn.synsets(wordstr), swn_senti_synset):
    # Now you can deal with both at once in this loop.
    len_lemma_names = len(synset.lemma_names)
    count_lemma += len_lemma_names
    count_synset += 1
    # Mix to taste.
    print len_lemma_names, count_synset
Note that the count_synset part may be better done with enumerate (I don't know its initial value or whether you're wanting to use it outside this code).
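A sketch of the enumerate variant, written for Python 3, where the built-in zip is already lazy so izip is not needed. The two lists are invented stand-ins for wn.synsets(wordstr) and swn_senti_synset, and starting the count at 1 is an assumption about how count_synset was initialised.

```python
# Stand-ins for wn.synsets(wordstr) and swn_senti_synset; the real objects
# come from NLTK/SentiWordNet and are not reproduced here.
lemma_name_lists = [["dog", "domestic_dog"], ["frump"], ["cad", "bounder", "hound"]]
scores = [0.0, 0.25, 0.5]

count_lemma = 0
rows = []
# enumerate(..., 1) supplies the running count, starting from 1.
for count_synset, (names, score) in enumerate(zip(lemma_name_lists, scores), 1):
    count_lemma += len(names)
    rows.append((len(names), count_synset, score))

for row in rows:
    print(*row)
```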

What is the Pythonic way to implement a simple FSM?

Yesterday I had to parse a very simple binary data file - the rule is, look for two bytes in a row that are both 0xAA, then the next byte will be a length byte, then skip 9 bytes and output the given amount of data from there. Repeat to the end of the file.
My solution did work, and was very quick to put together (even though I am a C programmer at heart, I still think it was quicker for me to write this in Python than it would have been in C) - BUT, it is clearly not at all Pythonic and it reads like a C program (and not a very good one at that!)
What would be a better / more Pythonic approach to this? Is a simple FSM like this even still the right choice in Python?
My solution:
#! /usr/bin/python
import sys
f = open(sys.argv[1], "rb")
state = 0
if f:
    for byte in f.read():
        a = ord(byte)
        if state == 0:
            if a == 0xAA:
                state = 1
        elif state == 1:
            if a == 0xAA:
                state = 2
            else:
                state = 0
        elif state == 2:
            count = a;
            skip = 9
            state = 3
        elif state == 3:
            skip = skip -1
            if skip == 0:
                state = 4
        elif state == 4:
            print "%02x" %a
            count = count -1
            if count == 0:
                state = 0
                print "\r\n"
The coolest way I've seen to implement FSMs in Python has to be via generators and coroutines. See this Charming Python post for an example. Eli Bendersky also has an excellent treatment of the subject.
If coroutines aren't familiar territory, David Beazley's A Curious Course on Coroutines and Concurrency is a stellar introduction.
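As a rough idea of what the coroutine style could look like for the file format in the question, here is a sketch in Python 3. It is not taken from any of the linked articles; `packet_parser` and `parse` are hypothetical names, and the position in the code plays the role of the state variable.

```python
def packet_parser(sink):
    """Coroutine: receives byte values via send() and appends payloads to sink."""
    while True:
        # Wait for two 0xAA bytes in a row; anything else restarts the search.
        byte = yield
        if byte != 0xAA:
            continue
        byte = yield
        if byte != 0xAA:
            continue
        length = yield            # the length byte
        for _ in range(9):        # skip the next 9 bytes
            yield
        payload = bytearray()
        for _ in range(length):   # collect the payload
            payload.append((yield))
        sink.append(bytes(payload))


def parse(data):
    """Feed a bytes object through the coroutine and return the payloads."""
    payloads = []
    parser = packet_parser(payloads)
    next(parser)                  # advance to the first yield
    for byte in data:             # iterating bytes gives ints in Python 3
        parser.send(byte)
    return payloads
```

Calling `parse(open(path, "rb").read())` would return a list of bytes objects, one per block; a block that is cut off at the end of the input is simply never appended.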
You could give your states constant names instead of using 0, 1, 2, etc. for improved readability.
You could use a dictionary to map (current_state, input) -> (next_state), but that doesn't really let you do any additional processing during the transitions. Unless you include some "transition function" too to do extra processing.
Or you could do a non-FSM approach. I think this will work as long as 0xAA 0xAA only appears when it indicates a "start" (doesn't appear in data).
with open(sys.argv[1], 'rb') as f:
    contents = f.read()
    for chunk in contents.split('\xaa\xaa')[1:]:
        length = ord(chunk[0])
        data = chunk[10:10+length]
        print data
If it does appear in data, you can instead use string.find('\xaa\xaa', start) to scan through the string, setting the start argument to begin looking where the last data block ended. Repeat until it returns -1.
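The find-based scan described above might look roughly like this in Python 3, where the file contents are a bytes object. `extract_blocks` is a hypothetical name, and the layout (marker, one length byte, 9 skipped bytes, payload) is taken from the question's description.

```python
def extract_blocks(data):
    """Return the payload of every 0xAA 0xAA block found in a bytes object."""
    marker = b"\xaa\xaa"
    blocks = []
    start = data.find(marker)
    while start != -1:
        length_pos = start + len(marker)
        if length_pos >= len(data):
            break  # marker at the very end: no length byte follows
        length = data[length_pos]            # indexing bytes gives an int
        payload_start = length_pos + 1 + 9   # skip length byte and 9 more bytes
        payload_end = payload_start + length
        blocks.append(data[payload_start:payload_end])
        # Resume the search after this block so payload bytes are never
        # mistaken for a new marker.
        start = data.find(marker, payload_end)
    return blocks
```

Because each search starts where the previous payload ended, a 0xAA 0xAA pair inside a payload does not start a new block.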
I am a little apprehensive about telling anyone what's Pythonic, but here goes. First, keep in mind that in python functions are just objects. Transitions can be defined with a dictionary that has the (input, current_state) as the key and the tuple (next_state, action) as the value. Action is just a function that does whatever is necessary to transition from the current state to the next state.
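A deliberately small sketch of the dictionary idea, in Python 3. It only detects 0xAA 0xAA pairs rather than implementing the whole parser, the state names and action functions are invented for the example, and the key is ordered (current_state, input) here, which is an arbitrary choice.

```python
def note_start(found, index):
    """Action: record the index at which the second marker byte was seen."""
    found.append(index)


def do_nothing(found, index):
    """Action: placeholder for transitions with no side effect."""


# (current_state, input_class) -> (next_state, action)
TRANSITIONS = {
    ("idle", "marker"): ("one_marker", do_nothing),
    ("idle", "other"): ("idle", do_nothing),
    ("one_marker", "marker"): ("idle", note_start),
    ("one_marker", "other"): ("idle", do_nothing),
}


def find_marker_pairs(data):
    """Return the indexes where a 0xAA 0xAA pair ends."""
    state = "idle"
    found = []
    for index, byte in enumerate(data):
        input_class = "marker" if byte == 0xAA else "other"
        state, action = TRANSITIONS[(state, input_class)]
        action(found, index)
    return found
```

The driver loop never changes; adding behaviour means adding rows to the table, which is the main attraction of this style and also why it can feel heavier than a handful of if statements.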
There's a nice looking example of doing this at http://code.activestate.com/recipes/146262-finite-state-machine-fsm. I haven't used it, but from a quick read it seems like it covers everything.
A similar question was asked/answered here a couple of months ago: Python state-machine design. You might find looking at those responses useful as well.
I think your solution looks fine, except you should replace count = count - 1 with count -= 1.
This is one of those times where fancy code show-offs will come up with ways of having dicts map states to callables, with a small driver function, but it isn't better, just fancier, and uses more obscure language features.
I suggest checking out chapter 4 of Text Processing in Python by David Mertz. He implements a state machine class in Python that is very elegant.
I think the most pythonic way would be like what FogleBird suggested, but mapping from (current state, input) to a function which would handle the processing and transition.
You can use regexps. Something like this code will find the first block of data. Then it's just a case of starting the next search from after the previous match.
import re
find_header = re.compile('\xaa\xaa(.).{9}', re.DOTALL)
m = find_header.search(input_text)
if m:
    # The captured length byte lives on the match object; ord() turns it into an int
    length = ord(m.group(1))
    data = input_text[m.end():m.end() + length]
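Continuing the search after each match can be sketched like this in Python 3, using a bytes pattern and the optional start position accepted by a compiled pattern's search method. `find_payloads` is a hypothetical name, and the header layout is the one described in the question.

```python
import re

# Two marker bytes, one captured length byte, then 9 bytes to skip.
HEADER = re.compile(b"\xaa\xaa(.).{9}", re.DOTALL)


def find_payloads(data):
    """Return every payload in a bytes object, scanning header by header."""
    payloads = []
    pos = 0
    while True:
        m = HEADER.search(data, pos)
        if m is None:
            break
        length = m.group(1)[0]           # first byte of the group, as an int
        payloads.append(data[m.end():m.end() + length])
        pos = m.end() + length           # next search starts after the payload
    return payloads
```

A header with fewer than 9 bytes after its length byte does not match the pattern at all, so a block truncated that early is silently ignored.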
