Python - Generating/iterating a combination


I'd like to start by saying that I'm fairly new to Python and am very interested in coding in general. I have some familiarity with basic concepts, but the functions used specifically in Python are largely unknown to me. Fwiw I'm a visual/experiential learner.
I'd also like to state right off the bat that I apologize if this question has already been asked and answered. I found a lot of "similar" questions, but none that really helped me find a solution.
Problem
As stated in my topic title, I'm interested in creating an "output" (correct terminology?) of an nCr as a list, or whatever is the best way to view it.
It would be a 10 choose 5. The ten variables could be numbers, letters, names, words, etc. There are no repeats and no order to the combinations.
Research
I would like to say that I'd looked at similar topics like this question/answer and found the concepts helpful, but have discovered that in the example code:
from itertools import izip
reduce(lambda x, y: x * y[0] / y[1], izip(xrange(n - r + 1, n+1), xrange(1, r+1)), 1)
The reduce tool/function isn't a built-in anymore. I think I read that it moved to the functools module as of Python 3.
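For reference, here is a minimal sketch of that snippet ported to Python 3, assuming the goal is just the number nCr: reduce is imported from functools, izip is replaced by the built-in zip, and xrange by range. Floor division (//) keeps the result an integer. Note that this only counts the combinations (252 for 10 choose 5); it does not list them. The wrapper name n_choose_r is hypothetical.

```python
from functools import reduce  # reduce is no longer a built-in in Python 3
import math


def n_choose_r(n, r):
    """Count the r-element combinations of n items (nCr) with the reduce idiom."""
    # Pairs (n-r+1, 1), (n-r+2, 2), ..., (n, r). After k steps the running value
    # equals C(n-r+k, k), so every floor division is exact.
    pairs = zip(range(n - r + 1, n + 1), range(1, r + 1))
    return reduce(lambda acc, pair: acc * pair[0] // pair[1], pairs, 1)


print(n_choose_r(10, 5))  # 252
print(math.comb(10, 5))   # 252; math.comb exists in Python 3.8+
```

On Python 3.8 or newer, math.comb is the simpler choice when you only need the count.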
Question
How would the above code (examples are helpful), or any other code, be updated to account for the change to reduce? Are there any other ways to output the combination results?
***Edit***
I think I didn't clearly connect the content of the Problem and Question headings. Basically my main question is under the Problem heading. While it is helpful to see how a person can use the itertools to make combinations from a list, I don't have any idea how to output the 10 choose 5. :\

Are you trying to do:
>>> import itertools
>>> for combination in itertools.combinations('ABCDEFGHIJ', 5):
...     print(''.join(combination))
...
ABCDE
ABCDF
ABCDG
ABCDH
ABCDI
ABCDJ
ABCEF
ABCEG
ABCEH
ABCEI
If so, that's it: the loop prints all 252 combinations (only the first ten are shown above). If you want to learn how to implement the combinations function yourself, you're in luck: the itertools.combinations documentation includes an equivalent pure-Python implementation.
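If you'd rather see every combination at once, one option is to collect them into a list. The sketch below uses ten made-up names as sample data (any ten distinct items would do), checks the count against math.comb (Python 3.8+), and includes a hypothetical combinations_recursive helper: a simple recursive version written for illustration, not the implementation from the itertools docs.

```python
import itertools
import math


def combinations_recursive(items, r):
    """Yield r-length combinations of items, in the same order as itertools.combinations."""
    items = list(items)
    if r == 0:
        yield ()
        return
    # Fix items[i] as the first element, then choose r-1 more from the items after it.
    for i in range(len(items) - r + 1):
        for rest in combinations_recursive(items[i + 1:], r - 1):
            yield (items[i],) + rest


names = ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay", "Gus", "Hal", "Ivy", "Jon"]  # sample data
combos = list(itertools.combinations(names, 5))

print(len(combos), math.comb(10, 5))  # 252 252
print(combos[0])                      # ('Ann', 'Bob', 'Cat', 'Dan', 'Eve')
```

Each element of combos is a tuple of five names, so you can loop over the list, index into it, or write it out however you like.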

Related

How to extract these sub-strings from a string with regex in python?

I'm building a module in Python that focuses mainly on mathematics. I thought it would be a nice touch to add support for mathematical series. I had no issues implementing arithmetic progressions and geometric series, but I stumbled upon a problem when attempting to implement recursive series. I've come up with a solution to that, but first I need to extract the elements of the series from a user-input string that represents the series. I think that regex might be the best option, but it is my biggest phobia in the world, so I'd really appreciate the help.
For example, for a string like
"a_n = a_{n-1} + a_{n-2}"
I want to have a set
{"a_n","a_{n-1}","a_{n-2}"}
It also needs to support more complicated recursive definitions, like:
"a_n*a_{n-1} = ln(a_{n-2} * a_n)*a_{n-3}"
the set will be:
{"a_n","a_{n-1}","a_{n-2}","a_{n-3}"}
Feel free to do some minor syntax changes if you think it'll make it easier for the task.

The regex is easy: a_(?:n|\{n-\d+\})
That is, the literal a_ followed by either n or {n-k}, where k is one or more digits (so offsets such as n-12 also match).
import re
ptn = re.compile(r"a_(?:n|\{n-\d+\})")
print(set(ptn.findall("a_n = a_{n-1} + a_{n-2}")))
# {'a_{n-1}', 'a_n', 'a_{n-2}'}  (set display order may vary)
print(set(ptn.findall("a_n*a_{n-1} = ln(a_{n-2} * a_n)*a_{n-3}")))
# {'a_{n-1}', 'a_{n-3}', 'a_n', 'a_{n-2}'}
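If the series may use other names, the same idea extends. The sketch below assumes a term is a single lowercase letter followed by _n or _{n-k}; the helper names extract_terms and offset are hypothetical, and sorting by offset only gives a predictable display order, since sets are unordered.

```python
import re

# Assumption: a term is one lowercase letter, then _n or _{n-k} with integer k.
# \b stops a match from starting in the middle of a longer word (e.g. "max_n").
TERM = re.compile(r"\b[a-z]_(?:n|\{n-\d+\})")


def extract_terms(expr):
    """Return the set of sequence terms such as a_n or a_{n-12} found in expr."""
    return set(TERM.findall(expr))


def offset(term):
    """a_n has offset 0; a_{n-k} has offset k."""
    m = re.search(r"n-(\d+)", term)
    return int(m.group(1)) if m else 0


terms = extract_terms("a_n = a_{n-1} + a_{n-12}")
print(sorted(terms, key=offset))  # ['a_n', 'a_{n-1}', 'a_{n-12}']
```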

Optimizing code that contains several for loops and if-else statements in Python

I read that
lis2= map(str.strip, lis1)
is faster and better-written than
lis2= []
for z in lis1:
    lis2.append(z.strip())
Now, I have the following code:
for item in sel:
    name = item.text
    songs = []
    for song in item.find_next_siblings('div', class_="listalbum-item"):
        if song.find_previous_sibling('div', class_='album') == item:
            if 'www.somesite.com/lyrics' in song.find('a')['href']:
                songs.append([song.text, song.find('a')['href']])
            else:
                songs.append([song.text, 'https://www.somesite.com/' + song.find('a')['href'][3:]])
    album[name] = songs
How can I apply the concept above to that piece of code? To be honest, the first question should be: is it necessary? Is it really possible to optimize it? Either way, any advice?
Thanks in advance!

You should consider the real difference between the two fragments:
lis2= map(str.strip, lis1)
And:
lis2= []
for z in lis1:
    lis2.append(z.strip())
It's fair to say that, to many programmers, the first is more clearly written. After all, it literally says what is being done, not how it's done. Whenever you can do that, it's the better choice, as long as you're not sacrificing anything that's important (like, possibly, speed).
Another way of improving the readability would be to use better names and follow PEP8 coding conventions:
list2 = map(str.strip, list1)
But whether it's 'better' or 'faster' really depends. The real difference is that the map example leaves it up to the implementation of map to decide how the result is constructed (in Python 3, map returns a lazy iterator, so wrap it in list() if you need an actual list). Inside map, you could find code that just uses a for loop and list.append() as well, but it could also be some really complicated but very efficient code that does it faster or with fewer resources.
The reason people say using something like map is better (and why it's often faster) is that you're leaving the work up to a library function that has been specifically written to keep the implementation details away from you and others that read your code, and to allow you to benefit from improvements to that function over time.
It's quite possible though that, for your specific case, with your specific data, you know how to write something that performs even better. So, if speed is the objective, you may be able to improve on map for your case.
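One way to settle the 'faster' question for your own data is to check that the variants agree and then time them. The sketch below uses made-up strings; in Python 3 map is lazy, so it is wrapped in list() to build a real list. Timings depend on your machine and Python version, so no numbers are claimed here.

```python
import timeit

lines = ["  alpha ", "beta  ", " gamma"]  # made-up sample data

via_map = list(map(str.strip, lines))  # map is lazy in Python 3; list() materializes it
via_comprehension = [s.strip() for s in lines]
via_loop = []
for s in lines:
    via_loop.append(s.strip())

# All three approaches produce the same list.
assert via_map == via_comprehension == via_loop

setup = "data = ['  padded  '] * 1000"
for label, stmt in [
    ("map", "list(map(str.strip, data))"),
    ("comprehension", "[s.strip() for s in data]"),
]:
    seconds = timeit.timeit(stmt, setup=setup, number=200)
    print(f"{label}: {seconds:.4f}s")
```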
What the 'best' code is for you depends on how you rank your code. Does readability (for others and your future self) get trumped by speed?
In your case, since you call an external library to search through an HTML document, that search is very likely to cost orders of magnitude more time than a simple for loop in Python, and your for loop is already pretty clear.
Have you looked at ways of doing that work more optimally? For example, this bit:
for song in item.find_next_siblings('div', class_="listalbum-item"):
    if song.find_previous_sibling('div', class_='album') == item:
find_next_siblings finds all following siblings of item that are listalbum-item divs (apparently, songs), and for each of those you check whether the nearest preceding album div is item itself. Because find_next_siblings also returns the songs of every later album, this does work proportional to all the remaining songs for each album.
In other words, it appears that you're trying to loop over all the songs of the album, but do you expect songs in sel that are not on the album? If not, you should be able to optimise this code quite a bit, but it's hard to help improve the code without knowing what the content looks like.
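A sketch of one such optimisation, assuming albums and songs are flat siblings in document order: walk them once and attach each song to the most recent album, instead of re-scanning and checking backwards for every album. To keep it runnable without BeautifulSoup, the page is modelled as a hypothetical list of (css_class, text, href) tuples; with real elements you would build that list from the parsed divs. The href[3:] rule is copied from the question (presumably stripping a "../" prefix).

```python
# Hypothetical stand-in for the parsed page: sibling divs in document order.
siblings = [
    ("album", "First Album", None),
    ("listalbum-item", "Song A", "../lyrics/a.html"),
    ("listalbum-item", "Song B", "https://www.somesite.com/lyrics/b.html"),
    ("album", "Second Album", None),
    ("listalbum-item", "Song C", "../lyrics/c.html"),
]


def full_url(href):
    """Same rule as the question: keep absolute lyrics links, else rebuild from the site root."""
    if "www.somesite.com/lyrics" in href:
        return href
    return "https://www.somesite.com/" + href[3:]


def group_songs(siblings):
    """Single pass: every song belongs to the most recent album seen before it."""
    album = {}
    current = None
    for css_class, text, href in siblings:
        if css_class == "album":
            current = text
            album[current] = []
        elif css_class == "listalbum-item" and current is not None:
            album[current].append([text, full_url(href)])
    return album


album = group_songs(siblings)
print(album["Second Album"])  # [['Song C', 'https://www.somesite.com/lyrics/c.html']]
```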

The inner loop can be written as a list comprehension, which is usually a little faster, but it is not necessarily clearer:
songs = [[song.text,
          (song.find('a')['href']
           if 'www.somesite.com/lyrics' in song.find('a')['href']
           else 'https://www.somesite.com/' + song.find('a')['href'][3:])]
         for song in item.find_next_siblings('div', class_="listalbum-item")
         if song.find_previous_sibling('div', class_='album') == item]

Built-in way to do immutable shuffle in Python (CPython)? [duplicate]

This question already has an answer here:
How to shuffle a copied list without shuffling the original list?
The standard-library random.shuffle() shuffles in place, which is fine for many purposes. But suppose we want to leave the original collection intact and generate a random permutation based on the original sequence: is there a preferred way to do this in the standard library?
When I look at CPython's random.py I see an initial comment that reads:
sequences
---------
pick random element
pick random sample
pick weighted random sample
generate random permutation
The last line is of particular interest. However, I struggle to see which method in this class achieves this.
Naturally, this is not a hard problem to solve, even for a novice Python programmer. But it would be nice to have a standard way of doing it in the standard library, and I'm sure it must exist somewhere. Perhaps someplace other than random.py?

According to the docs of random.shuffle(), you could use random.sample():
To shuffle an immutable sequence and return a new shuffled list, use sample(x, k=len(x)) instead of shuffle().
The same thing was analyzed in this post.
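A minimal sketch of that documented approach, with made-up data: random.sample with k=len(x) returns a new list in random order and leaves the input untouched. random.sample requires a sequence, so for a generator (or, since Python 3.11, a set) convert to a list or tuple first.

```python
import random

original = (3, 1, 4, 1, 5, 9, 2, 6)  # a tuple cannot be shuffled in place
shuffled_copy = random.sample(original, k=len(original))

print(original)       # unchanged
print(shuffled_copy)  # a new list with the same items in random order
```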

This seems like the obvious solution that shouldn't do more work than necessary:
import random

def shuffled(gen):
    # Copy into a new list, then shuffle only the copy in place
    ls = list(gen)
    random.shuffle(ls)
    return ls
Since it's so simple to build from stdlib primitives, I'm not sure it would make sense to include this as a separate primitive.

Pyenchant Module - Spell checker

How do I trim the output of the Python Pyenchant module's suggested-words list?
Quite often it gives me a huge list of 20 suggested words that looks awkward when displayed on the screen and tends to run off the edge of the screen.

Like sentinel, I'm not sure if the problem you're having is specific to pyenchant or a python-familiarity issue. If I assume the latter, you could simply select the number of values you'd like as part of your program. In simple form, this could be as easy as:
suggestion_list = pyenchant_function(document_filled_with_typos)  # placeholder for your actual pyenchant call
number_of_suggestions = len(suggestion_list)
MAX_SUGGESTIONS = 3  # you choose what you like
if number_of_suggestions > MAX_SUGGESTIONS:
    # Python lists are indexed from 0 and the slice end is exclusive,
    # so this keeps exactly the first MAX_SUGGESTIONS items
    answer = suggestion_list[0:MAX_SUGGESTIONS]
else:
    answer = suggestion_list
Note: I'm choosing to be clear rather than concise here, since I'm guessing the asker will value that if list indices are unfamiliar.
Hope this helps, and good luck with Python.

Assuming it returns a standard Python list, you use standard Python slicing syntax. E.g. suggestedwords[:10] gets just the first 10.
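Slicing also makes an explicit length check unnecessary: a slice whose end lies past the end of the list simply stops at the last element. A minimal sketch, assuming the suggestions arrive as a plain list of strings (such as the list pyenchant's Dict.suggest() returns); top_suggestions is a hypothetical helper name and the words are made up.

```python
def top_suggestions(suggestions, limit=10):
    """Return at most `limit` suggestions, keeping their original order."""
    # A slice never raises IndexError; if limit exceeds len(suggestions),
    # the whole list is returned (as a new list).
    return suggestions[:limit]


words = ["hello", "help", "helot", "hell"]  # made-up sample suggestions
print(top_suggestions(words, 3))   # ['hello', 'help', 'helot']
print(top_suggestions(words, 10))  # ['hello', 'help', 'helot', 'hell']
```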

A better way to assign list into a var

Was coding something in Python. Have a piece of code, wanted to know if it can be done more elegantly...
# Statistics format is - done|remaining|200's|404's|size
statf = open(STATS_FILE, 'r').read()
starf = statf.strip().split('|')
done = int(starf[0])
rema = int(starf[1])
succ = int(starf[2])
fails = int(starf[3])
size = int(starf[4])
...
This goes on. I wanted to know whether, after splitting the line into a list, there is a better way to assign each list item to a variable. I have close to 30 lines assigning index values to variables. Just trying to learn more about Python, that's it...

done, rema, succ, fails, size, ... = [int(x) for x in starf]
Better:
labels = ("done", "rema", "succ", "fails", "size")
data = dict(zip(labels, [int(x) for x in starf]))
print(data['done'])

What I don't like about the answers so far is that they stick everything in one expression. You want to reduce the redundancy in your code, without doing too much at once.
If all of the items on the line are ints, then convert them all together, so you don't have to write int(...) each time:
starf = [int(i) for i in starf]
If only certain items are ints (maybe some are strings or floats), then you can convert just those:
for i in (0, 1, 2, 3, 4):
    starf[i] = int(starf[i])
Assigning in blocks is useful; if you have many items (you said you had 30), you can split it up (the slice end is exclusive):
done, rema, succ = starf[0:3]
fails, size = starf[3:5]

I might use the csv module with a separator of | (though that might be overkill if you're "sure" the format will always be super-simple: single-line, no strings, etc.). Like your low-level string processing, the csv reader will give you strings, and you'll need to call int on each (with a list comprehension or a map call) to get integers. Other tips include using the with statement to open your file, to ensure it won't cause a "file descriptor leak" (not indispensable in current CPython versions, but an excellent idea for portability and future-proofing).
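A minimal sketch of the csv approach, assuming the stats file holds a single '|'-separated line of integers as in the question; read_stats is a hypothetical helper, and an in-memory file with a made-up line stands in for STATS_FILE so the example runs on its own.

```python
import csv
import io


def read_stats(f):
    """Parse the first line of a '|'-separated stats file into a list of ints."""
    reader = csv.reader(f, delimiter="|")
    row = next(reader)  # the stats file is assumed to be a single line
    return [int(field) for field in row]


# With a real file, the with statement guarantees it gets closed:
# with open(STATS_FILE, newline="") as f:
#     stats = read_stats(f)

stats = read_stats(io.StringIO("10|5|9|1|2048\n"))
print(stats)  # [10, 5, 9, 1, 2048]
```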
But I question the need for 30 separate barenames to represent 30 related values. Why not, for example, make a collections.namedtuple type with appropriately-named fields, and initialize an instance thereof, then use qualified names for the fields, i.e., a nice namespace? Remember the last koan in the Zen of Python (import this at the interpreter prompt): "Namespaces are one honking great idea -- let's do more of those!"... barenames have their (limited;-) place, but representing dozens of related values is not one; rather, this situation "cries out" for the "let's do more of those" approach (i.e., add one appropriate namespace grouping the related fields, a much better way to organize your data).
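A minimal sketch of the namedtuple idea, using only the five fields visible in the question (the real file has about 30) and a made-up stats line. Stats._make builds an instance from any iterable and raises TypeError if the number of values does not match the number of fields, which also catches a malformed line.

```python
from collections import namedtuple

# Only the five fields shown in the question; extend the list for the real file.
Stats = namedtuple("Stats", ["done", "rema", "succ", "fails", "size"])

line = "10|5|9|1|2048"  # made-up sample line in the question's format
stats = Stats._make(int(x) for x in line.strip().split("|"))

print(stats.done, stats.size)  # 10 2048
print(stats)  # Stats(done=10, rema=5, succ=9, fails=1, size=2048)
```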

Using a Python dict is probably the most elegant choice.
If you put your keys in a list as such:
keys = ("done", "rema", "succ" ... )
somedict = dict(zip(keys, [int(v) for v in values]))
That would work. :-) Looks better than 30 lines too :-)
EDIT: I think there are dict comprehensions now, so that may look even better too! :-)
EDIT Part 2: Also, for the keys collection, you'd want to break that into multiple lines.
EDIT Again: fixed buggy part :)
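With a dict comprehension (available since Python 2.7 and 3.0), the conversion and the pairing happen in one step. A minimal sketch with only the five keys from the question and a made-up stats line. Note that zip stops at the shorter input, so a missing field would silently drop a key rather than raise an error.

```python
keys = ("done", "rema", "succ", "fails", "size")
values = "10|5|9|1|2048".strip().split("|")  # made-up sample line

stats = {key: int(value) for key, value in zip(keys, values)}
print(stats["done"])  # 10
```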

Thanks for all the answers. So here's the summary -
Glenn's answer was to handle this issue in blocks, i.e. done, rema, succ = starf[0:3] etc.
Leoluk's approach was more short & sweet, building a dict from zip() and a list comprehension.
Alex's answer was more design-oriented. Loved this approach. I know it should be done the way Alex suggested, but a lot of code refactoring needs to take place for that. Not a good time to do it now.
townsean: same as Leoluk's.
I have taken up Leoluk's approach. I am not sure what the speed implication of this is; I have no idea whether list/dict comprehensions take a hit on execution speed. But it reduces the size of my code considerably for now. I'll optimize when the need comes :) Going by "Premature optimization is the root of all evil"...
