Built-in way to do immutable shuffle in Python (CPython)? [duplicate]

This question already has an answer here:
How to shuffle a copied list without shuffling the original list?
The random.shuffle() function shuffles in place, which is fine for many purposes. But suppose we want to leave the original collection intact and generate a random permutation based on the original sequence. Is there a preferred way to do this in the standard library?
When I look at CPython's random.py I see an initial comment that reads:
sequences
---------
pick random element
pick random sample
pick weighted random sample
generate random permutation
Particularly, the last line is of interest. However, I struggle to see what method in this class achieves this.
Naturally, this is not a hard problem to solve, even for a novice Python programmer. But it would be nice to have a standard way of doing it in the standard library, and I'm sure it must exist somewhere. Perhaps someplace other than random.py?

According to the docs of random.shuffle(), you could use random.sample():
To shuffle an immutable sequence and return a new shuffled list, use sample(x, k=len(x)) instead of shuffle().
The same thing was analyzed in this post.
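A minimal sketch of the documented approach, assuming a hypothetical tuple named `original` as the input:

```python
import random

original = (1, 2, 3, 4, 5)  # hypothetical immutable input

# sample() with k=len(x) returns a new list that holds every element
# of the input exactly once, in random order.
permutation = random.sample(original, k=len(original))

print(original)     # the tuple itself is never modified
print(permutation)  # order varies from run to run
```

Because sample() builds a new list, this also works for tuples and strings, which shuffle() cannot modify.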

This seems like the obvious solution that shouldn't do more work than necessary:
import random

def shuffled(gen):
    ls = list(gen)  # copy the input so the original is left untouched
    random.shuffle(ls)  # shuffle the copy in place
    return ls
Since it's so simple to build from stdlib primitives, I'm not sure it would make sense to include this as a separate primitive.
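As a rough usage sketch of such a helper (the variable names here are made up for illustration), copying first means it accepts any iterable, including a generator, and never touches the caller's data:

```python
import random

def shuffled(iterable):
    """Return a new shuffled list; the input is left untouched."""
    items = list(iterable)  # copy the input (this also consumes generators)
    random.shuffle(items)   # shuffle only the copy
    return items

original = ["a", "b", "c", "d"]
from_list = shuffled(original)
from_generator = shuffled(n * n for n in range(5))

print(original)        # still ['a', 'b', 'c', 'd']
print(from_list)       # same elements, order varies
print(from_generator)  # the squares 0, 1, 4, 9, 16 in some order
```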

Related

Query about syntax used to append items to a list [duplicate]

This question already has answers here:
Why do these list operations (methods: clear / extend / reverse / append / sort / remove) return None, rather than the resulting list?
I started learning Python a few weeks ago, and one of the first things I was taught was how to manipulate lists. Normally, if I have a list and want to append a new element, it looks something like this:
test_list=[1,2,3]
test_list.append(4)
I just wanted to know why we don't instead say
test_list=test_list.append(4)
This thought came to me because normally for a variable 'x' if we want to update its value, we do the following:
x=initial_value
x=new_value
In the same way, for the list case, doesn't test_list.append(4) represent a list with the same elements as test_list plus an extra element 4? If so, why can't we use the same syntax in both cases to update the variable storing the information? In the list case, if I try to print the list I get None, while in the x case, if I print x I get new_value.
In Python, list is a class and append is one of its methods. The method adds an element to the existing list in place and returns None rather than the list.
This is all about object-oriented programming. If you are not familiar with it, read about it. In short, append is a method of objects that belong to the list class. Basically, this method could behave in 2 ways:
1) like it does, mutating self object
and
2) how you described, generating new object (btw, this is the behaviour of similar Array.concat method in JavaScript)
So, this implementation is just part of the language design, a decision made by the language creators, and we just have to accept it and use it this way :)
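A short sketch of both behaviours described above, using the list from the question. The name `extended` is introduced here only for illustration:

```python
test_list = [1, 2, 3]

# append() mutates the list in place and returns None,
# so assigning its result would throw the list away.
result = test_list.append(4)
print(result)     # None
print(test_list)  # [1, 2, 3, 4]

# The + operator is the non-mutating counterpart: it builds a new list
# and leaves the original unchanged.
extended = test_list + [5]
print(extended)   # [1, 2, 3, 4, 5]
print(test_list)  # [1, 2, 3, 4]
```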

Selecting One String from data list in Python

I am relatively new to programming, so be easy on me.
I have made a program (much like a magic 8 ball) in which the user asks a question, and I have created a list containing all my answers (14 of them). My program shuffles the answers, and I have written a random number generator that produces a number from 1 to 14. I now need to match that random number to one of the shuffled answers and print it.
The random number generator works, and the answer list shuffles. I just need to know how to assign the number to the strings in my list, and then print that one string.
Lists have indices; if your number is between 0 and 13 (inclusive), then you can just use that directly on your list:
print(answers[random_number])
However, the random module has a dedicated function for just this use case; random.choice() picks one value from a sequence at random:
print(random.choice(answers))
No need to shuffle anything that way.
Demo:
>>> import random
>>> answers = ['Without a doubt!', 'Hmmm, not so sure', 'By the winds, set sail now!', 'Oh no, no NO NO!!']
>>> print(random.choice(answers))
Hmmm, not so sure
>>> print(random.choice(answers))
Oh no, no NO NO!!
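If the 1-based random number from the question is kept, a possible sketch of the index approach looks like this. It assumes the number comes from random.randint and reuses the demo list above:

```python
import random

answers = ['Without a doubt!', 'Hmmm, not so sure',
           'By the winds, set sail now!', 'Oh no, no NO NO!!']

# randint() includes both endpoints, so this yields 1..len(answers).
number = random.randint(1, len(answers))

# List indices start at 0, so subtract 1 before indexing.
picked = answers[number - 1]
print(picked)
```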

Does it pay off to use a generator as input to sorted() instead of a list-comprehension [duplicate]

Possible Duplicate:
sorted() using Generator Expressions Rather Than Lists
We all know using generators instead of instantiating lists all the time saves time and memory, especially if we use comprehensions a lot.
Here's a question though, consider the following code:
output = SomeExpensiveCallEgDatabase()
results = [result[0] for result in output]
return sorted(results)
The call to sorted will return a sorted list of the results. Would it be better or worse to declare results as below and then call sorted?
results = (result[0] for result in output)
My guess is the call to sorted() would traverse the generator and instantiate a list itself in order to run quicksort or mergesort on it. So there would be no advantage in using the generator here. Is this assumption correct?
I believe your assumption to be true, since there is no easy way of ordering the collection without first having the whole list in memory (at least certainly not with the default sorting algorithm, Timsort if I'm not mistaken).
Check this out:
sorted() using Generator Expressions Rather Than Lists
To create the new List, the builtin sorted method uses PySequence_List:
PyObject* PySequence_List(PyObject *o) Return value: New reference.
Return a list object with the same contents as the arbitrary sequence
o. The returned list is guaranteed to be new.
Pros and cons of both approaches:
Memory-wise:
The returned list is the one used for the sorted version, so this would mean that in this case, only one list is stored completely in memory at any given time, using the generator version.
This makes the generator version more efficient memory-wise.
Speed:
Here the version with the whole list wins.
To create a new list based on a generator, an empty list must be created (or at best one holding the first element), and each following element appended to it, with the resizing steps this may trigger.
To create a new list based on a previous list, the size of the list is known beforehand, and thus can be allocated at once and each of the entries assigned (possibly, there are other optimizations at work here, but I can't back that up).
So regarding speed, the list wins.
The answer to "what's the best", comes down to the most common answer in any field of engineering... it depends....
No, you are still creating a brand new list with sorted().
output = SomeExpensiveCallEgDatabase()
results = [result[0] for result in output]
results.sort()
return results
would be closer to the generator version.
I believe it's better to use the generator version because some future version of Python may be able to take advantage of this to work more efficiently. It's always nice to get a speed up for free.
Yes, you are correct (although I believe the sorting routine is still called tim-sort, after uncle timmy <wink-ly y'rs>)
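A small sketch comparing the three variants discussed in this thread. The `rows` list is a hypothetical stand-in for the expensive database output:

```python
# Hypothetical stand-in for the result of the expensive call.
rows = [("pear", 3), ("apple", 1), ("fig", 2)]

# List comprehension: one intermediate list, then sorted() builds a second.
from_list = sorted([row[0] for row in rows])

# Generator expression: sorted() builds the only list itself.
from_generator = sorted(row[0] for row in rows)

# In-place sort: the comprehension's list is reused; sort() returns None.
in_place = [row[0] for row in rows]
in_place.sort()

print(from_list)       # ['apple', 'fig', 'pear']
print(from_generator)  # ['apple', 'fig', 'pear']
print(in_place)        # ['apple', 'fig', 'pear']
```

All three give the same result. They differ only in how many lists exist along the way.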

Difference between two "contains" operations for python lists

I'm fairly new to Python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list:
    if i == thingIAmLookingFor:
        return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list:
    # do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to know which, if either, is preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
    print("Found it")
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
    print("Found it")
The "if x in thing:" format is strongly preferred, not just because it takes less code, but because it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on data types that are stored in a more searchable form, e.g. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
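The identity-then-equality rule can be seen with NaN, which is not equal to itself. This is a small sketch, and the helper name `contains` is made up here to mirror the expression above:

```python
def contains(seq, thing):
    """Pure-Python equivalent of `thing in seq` for a list."""
    return any((x is thing or x == thing) for x in seq)

nan = float("nan")
values = [nan]

equal_to_itself = (nan == nan)        # False: NaN never compares equal
same_object = nan in values           # True: the identity check succeeds
other_object = float("nan") in values # False: different object, not equal

print(equal_to_itself, same_object, other_object)
print(contains(values, nan))          # matches the built-in operator
```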
for i in list:
    if i == thingIAmLookingFor:
        return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
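One way to see the difference is to count equality comparisons. The `Tracked` class below is a hypothetical helper written only for this sketch, and the exact count for the set is a CPython implementation detail:

```python
class Tracked:
    """Wrapper that counts how many times == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Tracked.comparisons += 1
        return isinstance(other, Tracked) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

items = [Tracked(n) for n in range(1000)]
item_set = set(items)
target = Tracked(999)  # equal to the last element, but a distinct object

Tracked.comparisons = 0
found_in_list = target in items       # linear scan over every element
list_comparisons = Tracked.comparisons

Tracked.comparisons = 0
found_in_set = target in item_set     # hash lookup, then a confirming ==
set_comparisons = Tracked.comparisons

print(found_in_list, list_comparisons)  # True, one comparison per element
print(found_in_set, set_comparisons)    # True, only a handful at most
```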
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.

Pyenchant Module - Spell checker

How do I trim the output of the Python PyEnchant module's suggested words list?
Quite often it gives me a huge list of 20 suggested words that looks awkward when displayed and also tends to run off the screen.
Like sentinel, I'm not sure if the problem you're having is specific to pyenchant or a python-familiarity issue. If I assume the latter, you could simply select the number of values you'd like as part of your program. In simple form, this could be as easy as:
suggestion_list = pyenchant_function(document_filled_with_typos)
number_of_suggestions = len(suggestion_list)
MAX_SUGGESTIONS = 3  # you choose what you like
if number_of_suggestions > MAX_SUGGESTIONS:
    answer = suggestion_list[0:MAX_SUGGESTIONS]  # slices start at index 0 and exclude the end index
else:
    answer = suggestion_list
Note: I'm choosing to be clear rather than concise here, since I'm guessing that will be valued by asker, if asker is unclear on using list indices.
Hope this helps and good luck with python.
Assuming it returns a standard Python list, you use standard Python slicing syntax. E.g. suggestedwords[:10] gets just the first 10.
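A brief sketch of the slicing approach, using made-up suggestion lists rather than real PyEnchant output. Slicing never raises when the list is shorter than the limit, so no length check is needed:

```python
MAX_SUGGESTIONS = 3

# Hypothetical suggestion lists standing in for the spell checker's output.
long_list = ["hello", "help", "hell", "held", "helm"]
short_list = ["hello", "help"]

trimmed_long = long_list[:MAX_SUGGESTIONS]
trimmed_short = short_list[:MAX_SUGGESTIONS]

print(trimmed_long)   # ['hello', 'help', 'hell']
print(trimmed_short)  # ['hello', 'help']
```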
