I was coding something in Python and have a piece of code that I wanted to know could be done more elegantly...
# Statistics format is - done|remaining|200's|404's|size
statf = open(STATS_FILE, 'r').read()
starf = statf.strip().split('|')
done = int(starf[0])
rema = int(starf[1])
succ = int(starf[2])
fails = int(starf[3])
size = int(starf[4])
...
This goes on. I wanted to know if, after splitting the line into a list, there is any better way to assign each list index to a var. I have close to 30 lines assigning index values to vars. Just trying to learn more about Python, that's all...
done, rema, succ, fails, size, ... = [int(x) for x in starf]
Better:
labels = ("done", "rema", "succ", "fails", "size")
data = dict(zip(labels, [int(x) for x in starf]))
print data['done']
What I don't like about the answers so far is that they stick everything in one expression. You want to reduce the redundancy in your code, without doing too much at once.
If all of the items on the line are ints, then convert them all together, so you don't have to write int(...) each time:
starf = [int(i) for i in starf]
If only certain items are ints--maybe some are strings or floats--then you can convert just those:
for i in 0, 1, 2, 3, 4:
    starf[i] = int(starf[i])
Assigning in blocks is useful; if you have many items--you said you had 30--you can split it up:
done, rema, succ = starf[0:3]
fails, size = starf[3:5]
I might use the csv module with a separator of | (though that might be overkill if you're "sure" the format will always be super-simple, single-line, no-strings, etc, etc). Like your low-level string processing, the csv reader will give you strings, and you'll need to call int on each (with a list comprehension or a map call) to get integers. Other tips include using the with statement to open your file, to ensure it won't cause a "file descriptor leak" (not indispensable in current CPython version, but an excellent idea for portability and future-proofing).
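A minimal sketch of that combination, reusing STATS_FILE and the field order from the question (only the first five fields shown):
import csv

# Read the single stats line with '|' as the delimiter; `with` closes
# the file even if an exception is raised.
with open(STATS_FILE, 'r') as f:
    reader = csv.reader(f, delimiter='|')
    row = next(reader)   # the one and only line, as a list of strings

done, rema, succ, fails, size = [int(x) for x in row[:5]]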
But I question the need for 30 separate barenames to represent 30 related values. Why not, for example, make a collections.namedtuple type with appropriately-named fields, initialize an instance of it, and then use qualified names for the fields, i.e., a nice namespace? Remember the last koan in the Zen of Python (import this at the interpreter prompt): "Namespaces are one honking great idea -- let's do more of those!" Barenames have their (limited;-) place, but representing dozens of related values is not one of them -- rather, this situation cries out for the "let's do more of those" approach: add one appropriate namespace grouping the related fields, a much better way to organize your data.
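A sketch of what that namespace approach might look like with the field names from the question (a real version would list all 30 fields):
from collections import namedtuple

# One namespace for all the related fields instead of 30 barenames.
Stats = namedtuple('Stats', ['done', 'rema', 'succ', 'fails', 'size'])

stats = Stats(*[int(x) for x in starf[:5]])
print(stats.done)   # qualified name instead of a bare `done`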
Using a Python dict is probably the most elegant choice.
If you put your keys in a list as such:
keys = ("done", "rema", "succ" ... )
somedict = dict(zip(keys, [int(v) for v in values]))
That would work. :-) Looks better than 30 lines too :-)
EDIT: I think there are dict comprehensions now, so that may look even better too! :-)
EDIT Part 2: Also, for the keys collection, you'd want to break it into multiple lines.
EDIT Again: fixed the buggy part :)
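For reference, the dict-comprehension form mentioned in the first edit might look like this (same keys and values as above):
somedict = {k: int(v) for k, v in zip(keys, values)}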
Thanks for all the answers. So here's the summary -
Glenn's answer was to handle this issue in blocks, i.e. done, rema, succ = starf[0:3] etc.
Leoluk's approach was more short & sweet, taking advantage of Python's immensely powerful dict comprehensions.
Alex's answer was more design-oriented. Loved this approach. I know it should be done the way Alex suggested, but a lot of code refactoring needs to take place for that. Not a good time to do it now.
townsean -- same as 2 (the dict approach).
I have taken up Leoluk's approach. I am not sure what the speed implication of this is; I have no idea whether list/dict comprehensions take a hit on execution speed. But it reduces the size of my code considerably for now. I'll optimize when the need comes :) Going by "Premature optimization is the root of all evil"...
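If the speed question ever needs a real answer, the standard library's timeit module gives hard numbers. A quick sketch, where the 30-element list of string fields is made up for illustration:
import timeit

setup = "vals = ['1', '2', '3', '4', '5'] * 6"   # 30 made-up string fields

# Compare the list comprehension against an explicit append loop.
comp = timeit.timeit("[int(x) for x in vals]", setup=setup, number=100000)
loop = timeit.timeit("out = []\nfor x in vals:\n    out.append(int(x))",
                     setup=setup, number=100000)

print("comprehension: %f  loop: %f" % (comp, loop))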
I read that
lis2 = map(str.strip, lis1)
is faster and better-written than
lis2 = []
for z in lis1:
    lis2.append(z.strip())
now, I have the following code:
for item in sel:
    name = item.text
    songs = []
    for song in item.find_next_siblings('div', class_="listalbum-item"):
        if song.find_previous_sibling('div', class_='album') == item:
            if 'www.somesite.com/lyrics' in song.find('a')['href']:
                songs.append([song.text, song.find('a')['href']])
            else:
                songs.append([song.text, 'https://www.somesite.com/' + song.find('a')['href'][3:]])
    album[name] = songs
How can I apply the concept above to that piece of code? To be honest, the first question should be: is it necessary? Is it really possible to optimize that? But anyway, any advice?
Thanks in advance!
You should consider the real difference between the two fragments:
lis2 = map(str.strip, lis1)
And:
lis2 = []
for z in lis1:
    lis2.append(z.strip())
It's fair to say that, to many programmers, the first is more clearly written. After all, it literally says what is being done, not how it's done. Whenever you can do that, it's the better choice, as long as you're not sacrificing anything that's important (like, possibly, speed).
Another way of improving the readability would be to use better names and follow PEP8 coding conventions:
list2 = map(str.strip, list1)
But if it's 'better' or 'faster', really depends. The real difference is that the map example leaves it up to the implementation of map to decide how a list is constructed. On the inside of map, you could find code that just uses a for loop and list.append() as well, but it could also be some really complicated, but very efficient code that does it faster or using fewer resources somehow.
The reason people say using something like map is better (and why it's often faster) is that you're leaving the work up to a library function that has been specifically written to keep the implementation details away from you and others that read your code, and to allow you to benefit from improvements to that function over time.
It's quite possible though that, for your specific case, with your specific data, you know how to write something that performs even better. So, if speed is the objective, you may be able to improve on map for your case.
What the 'best' code is for you, depends on how you rank your code. Does readability (for others and your future self) get trumped by speed?
In your case, since you call an external library to search through an HTML document, it's very likely that that work will cost orders of magnitude more time than a simple for loop in Python; that much is actually pretty clear as well.
Have you looked at ways of doing that work more optimally? For example, this bit:
for song in item.find_next_siblings('div', class_="listalbum-item"):
    if song.find_previous_sibling('div', class_='album') == item:
find_next_siblings finds all document element siblings of the item that are a listalbum-item (apparently, a song) and for each of those, you check if the first sibling immediately before it that is an album is the item.
In other words, it appears that you're trying to loop over all the songs of the album, but do you expect songs in sel that are not on the album? If not, you should be able to optimise this code quite a bit - but it's hard to help improve the code without knowing what the content looks like.
This can be done as a list comprehension, and will be faster as a list comprehension ... but is certainly not necessarily clearer as a list comprehension:
[[song.text,
  (song.find('a')['href']
   if 'www.somesite.com/lyrics' in song.find('a')['href']
   else 'https://www.somesite.com/' + song.find('a')['href'][3:])]
 for song in item.find_next_siblings('div', class_="listalbum-item")]
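If you want a middle ground between speed and readability, one option (a sketch, under the same assumptions about the page structure) is to hoist the repeated song.find('a')['href'] lookup into a small helper:
def song_entry(song):
    # Look up the link once instead of three times.
    href = song.find('a')['href']
    if 'www.somesite.com/lyrics' not in href:
        href = 'https://www.somesite.com/' + href[3:]
    return [song.text, href]

songs = [song_entry(song)
         for song in item.find_next_siblings('div', class_="listalbum-item")]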
Probably a stupid question, but I am wondering in general, and if anyone knows, how much foresight the Python interpreter has, specifically in the field of regular expressions and text parsing.
Suppose my code at some point looks like this:
mylist = ['a', 'b', 'c', ... ]
if 'g' in mylist: print(mylist.index('g'))
Is there any safer way to do this, with a while loop or similar? I mean, will the index be looked up with a second pass from the beginning, or are the two 'g's (in the above line) the same thing in Python's mind?
It'll do the lookup both times. If it's worth it (for, say, a very big list), use try:
try:
    print(mylist.index('g'))
except ValueError:
    pass
The result of the containment check is not cached, and so the index will need to be discovered anew. And the dynamic nature of Python makes implicit caching of such a thing unreliable since the __contains__() method may mutate the object (although it would be a violation of several programming principles to do so).
Your code will result in two lookups, first to determine if 'g' is in the list and second to find the index. Python won't try to consolidate them into a single lookup. If you're worried about efficiency you can use a dictionary instead of a list which will make both lookups O(1) instead of O(n).
You can easily make a dict to look up. Something like this:
mydict = {k:v for v,k in enumerate(mylist)}
The overhead of creating the dict won't be worthwhile unless you are doing several such lookups on the same list.
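With that dict built, both the membership test and the lookup run in constant time:
if 'g' in mydict:        # O(1) membership test
    print(mydict['g'])   # O(1) index lookup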
try is the better option to find the index of an element in a list:
try:
    print(mylist.index('g'))
except ValueError:
    print("value not in list")
Yeah, it will be looked up twice; the Python interpreter doesn't cache instructions (though I've been wondering if that's possible for certain things). If this is an issue, you can use sets or dicts, both of which have constant lookup time.
Either way, it seems you are doing LBYL ("look before you leap"); in Python we tend toward EAFP ("easier to ask forgiveness than permission"), so it's quite common to wrap such things in try ... except blocks.
From web2py example 33 we see:
db.purchase.insert(buyer_id=form.vars.buyer_id,
product_id=form.vars.product_id,
quantity=form.vars.quantity)
but I think there should be some way to make this less repetitive. Perhaps this?
db.purchase.insert(**dict((k, getattr(form.vars, k)) for k in "buyer_id product_id quantity".split()))
For me, DRY means 1) not repeating actual code, and 2) (and more importantly) not repeating information; i.e., each item of information should exist in exactly one place.
In this instance, you're really just repeating a pattern, and I think that's fine. The second example is much harder to read; why complicate it just to save a few characters?
You could avoid repeating form.vars:
vars = form.vars
db.purchase.insert(
buyer_id=vars.buyer_id,
product_id=vars.product_id,
quantity=vars.quantity)
There is still some repetition, but I think it's better to leave it with some repetition rather than making your code hard to read.
If those 3 things are all there is, you could do
db.purchase.insert(**form.vars)
Otherwise I think the original code is plenty DRY.
But I guess you could do
to_insert = {"product_id": form.vars.product_id,
             "quantity": form.vars.quantity,
             "buyer_id": form.vars.buyer_id}
db.purchase.insert(**to_insert)
which is similar to your second example but more readable and simple (some of the tenets of Python).
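If you really do want the field names in exactly one place, a readable version of the question's second example might look like this (a sketch; it assumes form.vars supports getattr, as the question's own attempt does):
fields = ("buyer_id", "product_id", "quantity")

# Build the keyword arguments from a single tuple of field names.
db.purchase.insert(**dict((k, getattr(form.vars, k)) for k in fields))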
I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list:
    if i == thingIAmLookingFor:
        return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list:
    # do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to know which, if either, is preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
    print "Found it"
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
    print "Found it"
the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
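The identity test matters for objects that are not equal to themselves; float('nan') is the classic example:
nan = float('nan')
print(nan == nan)     # False: NaN is never equal to itself
print(nan in [nan])   # True: the `is` identity check finds it anyway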
for i in list:
    if i == thingIAmLookingFor:
        return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python-level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary, thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.
I'm currently writing a program in Python to track statistics on video games. An example of the dictionary I'm using to track the scores :
ten = 1
sec = 9
fir = 10
thi5 = 6
sec5 = 8
games = {
    'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"],
    'nethack': [fir+fir+fir+sec+thi5, "Nethack"]
}
Right now, I'm going about this the hard way, making a big long list of nested ifs, but I don't think that's the proper way to go about it. I was trying to figure out a way to sort the dictionary via the lists, and then find a way to display the first ten that pop up, instead of having to work deep in the if statements.
So... basically, my question is: do you have any ideas I could use to make this easier, instead of wayyyy, way harder?
===== EDIT ====
The ten+fir produces numbers. I want to find a way to sort the lists (I lack the knowledge of proper terminology) by that number (basically, whichever ones have the highest number in the first part of the list go first).
Here's an example of my current way of going about it (though it's incomplete, due to it being very tiresome): Example Nests (paste2)
==== SECOND EDIT ====
In case someone doesn't see my comment below :
ten, fir, et cetera - these are just variables for scores. Basically, it goes from a top ten list into a variable number.
ten = 1, nin = 2, fir = 10, fir5 = 10, sec5 = 8, sec = 9...
so : 'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"] actually registers as : 'adom': [1+10+9+8, "Ancient Domain of Mysteries"] , which ends up looking like :
'adom': [28, "Ancient Domain of Mysteries"]
So, basically, if I ended up doing the "top two" out of my example, it'd be :
((1)) Nethack (48)
((2)) ADOM (28)
I'd write an actual number, but I'm thinking of changing a few things up, so the numbers might be a touch different, and I wouldn't want to rewrite it.
== THIRD (AND HOPEFULLY THE FINAL) EDIT ==
Fixed my original code example.
How about something like this:
scores = sorted(games.items(), key=lambda item: item[1][0], reverse=True)
return scores[:10]
This will return the top 10 items, sorted by the score at the start of each list (highest first).
I'm not sure if this is what you want though, please update the question (and fix the example link) if you need something else...
import heapq
return heapq.nlargest(10, games.items(), key=lambda kv: kv[1][0])
is the most direct way to get the top ten key / value pairs, sorted by the first item of each "value" list. If you can define more precisely what output you want (just the names, the name / value pairs, or what else?) and the sorting criterion, this is easy to adjust, of course.
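For instance, to produce the ranked output format shown in the question (a sketch; it assumes the games dict defined above, so Nethack scores 10+10+10+9+6 = 45):
import heapq

top = heapq.nlargest(10, games.items(), key=lambda kv: kv[1][0])
for rank, (key, (score, title)) in enumerate(top, 1):
    print("((%d)) %s (%d)" % (rank, title, score))
# ((1)) Nethack (45)
# ((2)) Ancient Domain of Mysteries (28)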
Wim's solution is good, but I'd say that you should probably go the extra mile and push this work off onto a database, rather than relying on Python. Python interfaces well with most types of databases, where much of what you're exploring is already a solved problem.
For example, instead of worrying about shifting your dictionaries to various other data types in order to properly sort them, you can simply get all the data for each pertinent entry pre-sorted based on the criteria of your query. There goes the need for convoluted sorting and resorting right there.
While dictionaries are tempting to use, because they give the illusion of database-like abilities to access data based on its attributes, I still think they stumble quite a bit with respect to implementation. I don't really have any numbers to throw at you, but just from personal experience, anything you do in Python when it comes to manipulating large amounts of data can be done much faster and more efficiently, both in code and computation, with something like MySQL.
I'm not sure what you have planned as far as the structure of your data goes, but along with adding data, changing its structure is a lot easier using a database, too.
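As a concrete illustration of that route, the standard library's sqlite3 module is enough to experiment with (a sketch; the table layout and the two sample rows are made up):
import sqlite3

conn = sqlite3.connect(':memory:')   # use a file path to persist the data
conn.execute("CREATE TABLE games (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)",
                 [("Nethack", 45), ("Ancient Domain of Mysteries", 28)])

# The database does the sorting and limiting; no Python-side sort needed.
for name, score in conn.execute(
        "SELECT name, score FROM games ORDER BY score DESC LIMIT 10"):
    print("%s (%d)" % (name, score))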