Let's say I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
In Python terms, the equivalent of sorted(rows, key=itemgetter(1, 2, 3, 4, ...)).
I tried writing my own generator but sorted doesn't iterate over my generator as it does with the tuple itemgetter returns. Any advice?
For the reasons noted in the comments, you cannot sort a list of things that hasn't been created yet. Generators exist to yield results when they are asked for, so you can't sort an iterable that hasn't been iterated over (as with list(generator())).
To put it in more ordinary terms: I'm thinking of ten names but am not telling you what they are yet; please sort them into alphabetical order. You should respond "how can I sort them when you haven't given them to me?" and you'd be correct: you can't.
OK, here's what you say you want to do:
I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
Note, first, that the documentation for the key argument says the following:
key specifies a function of one argument that is used to extract a comparison key from each list element
So your itemgetter idea isn't quite right, since you want to move through the list only when a comparison is equal.
However, things are actually much easier than you think. Check out the Python docs (See also this SO question.):
Sequence types also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.)
Which, I think, is exactly what you want if you just make sure that each row is an equal-length sequence (list or tuple).
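As a minimal sketch of that lexicographic rule (the rows and the key_from_columns helper are made up for illustration, not taken from the question): sorting tuples already breaks ties column by column, and a key built lazily has to be turned into a tuple before sorted can compare it.

```python
from operator import itemgetter

# Hypothetical sample rows; every row has the same length.
rows = [
    ("b", 2, 9),
    ("a", 3, 1),
    ("a", 1, 7),
    ("a", 1, 2),
]

# Tuples compare element by element, so a tie on column 0 falls
# through to column 1, and a tie there falls through to column 2.
by_all_columns = sorted(rows)

# itemgetter(1, 2) returns a tuple, so the same rule applies to the key.
by_cols_1_2 = sorted(rows, key=itemgetter(1, 2))


def key_from_columns(row, columns=(1, 2)):
    """Hypothetical key function: build the key with a generator
    expression, then materialise it as a tuple so it can be compared."""
    return tuple(row[i] for i in columns)


by_generated_key = sorted(rows, key=key_from_columns)

print(by_all_columns)
print(by_cols_1_2)
```

Returning the bare generator expression from the key function would not work, since generator objects do not define an ordering; wrapping it in tuple() is what makes the comparison possible.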
(Aha, I just read the comment regarding the die-roll function producing the keys. Confusing -- not sure if the above is helpful in that case, but I'm not sure what you are asking actually makes sense...)
Related
In Python 2.7, I would like to verify whether a subset list of elements is included in a longer nested list when comparing, let's say, only the first two elements.
Let's say we have a big list of nested elements (this big_list will have over 10k elements, so looping for every comparison is very inefficient and I'd like to avoid this). For this example, let's say we only have 4 nested lists in big_list:
big_list = ((2,3,5,6,7), (4,5,6,7,8), (6,7,8,8), (8,4,2,7))
If I have a single list, let's say (4,5,11,11,11), I am looking for an operation that will return True when compared to big_list, since the second list in big_list starts with (4,5,...) and matches the first two elements of my single_list. Essentially I want to know whether the first two elements of a single list (e.g. (4,5,11,11,11)) are repeated in my big list regardless of the numbers that follow (e.g. 11,11, ...).
My operation should also return False if another single_list (e.g. (4,8,11,11,11)) does not match the first two elements of any list in big_list.
I hope this is clearer. Any help?
Thanks in advance,
Since you have a huge list, you can avoid iterating over the whole thing on every search (O(n) time per search) by building a set once and then doing constant-time lookups against it.
tup_truth_set = set([tup[:2] for tup in big_list]) # set of the two-element prefixes of interest
then you would simply do something like this to check in constant time:
tuple_of_interest[:2] in tup_truth_set
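Put together with the data from the question, the idea might look like the sketch below. The has_matching_prefix name is made up for illustration; the set is built once and reused for every lookup.

```python
big_list = ((2, 3, 5, 6, 7), (4, 5, 6, 7, 8), (6, 7, 8, 8), (8, 4, 2, 7))

# Build the set of two-element prefixes once: O(n) up front.
tup_truth_set = {tup[:2] for tup in big_list}


def has_matching_prefix(candidate, prefix_set=tup_truth_set):
    """Hypothetical helper: True if the first two elements of
    candidate also start some tuple in big_list."""
    # tuple() lets the helper accept lists as well; lists are not hashable.
    return tuple(candidate[:2]) in prefix_set


print(has_matching_prefix((4, 5, 11, 11, 11)))
print(has_matching_prefix((4, 8, 11, 11, 11)))
```

Each lookup is then an average-case constant-time hash lookup, at the cost of keeping the set in memory and rebuilding it if big_list changes.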
I don't think you can avoid looping over your list. Even if there is a built-in function I am not aware of that does what you are asking, I am pretty sure it would loop over the list in the background. So I suggest a single line of code that does it, loop included, obviously.
(4,5,11,11,11)[:2] in [i[:2] for i in big_list]
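A possible variation on the same one-liner, offered only as a sketch: any() with a generator expression still loops, but it stops at the first match instead of building the whole list of prefixes first. The starts_like_any name is an assumption for illustration.

```python
big_list = ((2, 3, 5, 6, 7), (4, 5, 6, 7, 8), (6, 7, 8, 8), (8, 4, 2, 7))


def starts_like_any(single_list, nested=big_list):
    """Hypothetical helper: linear scan that short-circuits on the first hit."""
    prefix = single_list[:2]
    return any(tup[:2] == prefix for tup in nested)


print(starts_like_any((4, 5, 11, 11, 11)))
print(starts_like_any((4, 8, 11, 11, 11)))
```

This remains O(n) per search in the worst case, so it suits occasional checks better than repeated lookups against a 10k-element list.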
Is the program below guaranteed to always produce the same output?
s = 'fgvhlsdagfcisdghfjkfdshfsal'
for c in s:
    print(c)
Yes, it is. This is because the str type is an immutable sequence. Sequences represent a finite ordered set of elements (see Sequences in the Data model chapter of the Reference guide).
Iteration through a given string (any Sequence) is guaranteed to always produce the same results in the same order for different runs of the CPython interpreter, versions of CPython and implementations of Python.
Yes. Internally, the string you have there is stored in a C-style array (depending on the interpreter implementation). Because it is a sequential array of data, an iterator can be created over it. To use the for ... in ... syntax, you need to be able to iterate over the object after the in. A string supplies its own iterator, which lets the for ... in syntax walk it in sequential order, as do all Python sequences.
The same is true for lists, and even for custom objects that you create. However, not all iterable Python objects are necessarily in order or yield the values they store; a clear example of this is the dictionary. Dictionary iteration yields keys instead of sequential values like list, tuple and string, and those keys may or may not be in the order you added them in (it depends on the version of Python you use, among other things: insertion order is a language guarantee only from Python 3.7 onward, so on older versions don't assume it's ordered unless you use OrderedDict).
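To make the contrast concrete, here is a small sketch. It assumes Python 3.7 or later for the plain dict behaviour; on older versions only the OrderedDict part is guaranteed.

```python
from collections import OrderedDict

# Sequences iterate by position, so the order is always the same.
chars = list("abc")

# Iterating a dict yields its keys, not its values.
plain = {}
plain["x"] = 1
plain["y"] = 2
plain["z"] = 3
plain_keys = list(plain)  # insertion order, guaranteed on Python 3.7+

# OrderedDict keeps insertion order on every version that provides it.
ordered = OrderedDict([("x", 1), ("y", 2), ("z", 3)])
ordered_keys = list(ordered)

print(chars, plain_keys, ordered_keys)
```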
Yes, it is. Over a string, a for-loop iterates over the characters in order. This is also true for lists and tuples -- a for-loop will iterate over the elements in order.
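You may be thinking of sets and dictionaries. Sets don't specify a particular order, and dictionaries guarantee insertion order only from Python 3.7 onward, so:
for x in {"a","b","c"}: # over a set
    print(x)
for key in {"x":1, "y":2, "z":3}: # over a dict
    print(key)
will iterate over the set in some arbitrary order that you can't easily predict in advance, and on older Python versions the same may be true of the dict.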
See this Stack Overflow answer for some additional information on what guarantees are made about the order for dictionaries and sets.
Yes. The for loop is sequential.
Yes, the loop will always print each letter one by one starting from the first character and ending with the last.
I am currently reading Learning Python, 5th Edition - by Mark Lutz and have come across the phrase "Physically Stored Sequence".
From what I've learnt so far, a sequence is an object that contains items that can be indexed in sequential order from left to right e.g. Strings, Tuples and Lists.
So in regards to a "Physically Stored Sequence", would that be a Sequence that is referenced by a variable for use later on in a program? Or am I not getting it?
Thank you in advance for your answers.
A Physically Stored Sequence is best explained by contrast. It is one type of "iterable" with the main example of the other type being a "generator."
A generator is an iterable, meaning you can iterate over it as in a "for" loop, but it does not actually store anything--it merely spits out values when requested. Examples of this would be a pseudo-random number generator, the whole itertools package, or any function you write yourself using yield. Those sorts of things can be the subject of a "for" loop but do not actually "contain" any data.
A physically stored sequence then is an iterable which does contain its data. Examples include most data structures in Python, like lists. It doesn't matter in the Python parlance if the items in the sequence have any particular reference count or anything like that (e.g. the None object exists only once in Python, so [None, None] does not exactly "store" it twice).
A key feature of physically stored sequences is that you can usually iterate over them multiple times, and sometimes get items other than the "first" one (the one any iterable gives you when you call next() on it).
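A short sketch of that difference, using a made-up count_up_to generator function alongside an ordinary list:

```python
def count_up_to(n):
    """Hypothetical generator: produces 0..n-1 on demand, stores nothing."""
    for i in range(n):
        yield i


stored = [0, 1, 2]     # physically stored: the items live in the list
lazy = count_up_to(3)  # generator: items are produced when requested

# A stored sequence can be walked as often as you like...
first_pass_stored = list(stored)
second_pass_stored = list(stored)

# ...while a generator is exhausted after one pass.
first_pass_lazy = list(lazy)
second_pass_lazy = list(lazy)

# Indexing works on the list; the generator does not support it.
third_item = stored[2]
try:
    count_up_to(3)[2]
    generator_is_indexable = True
except TypeError:
    generator_is_indexable = False

print(second_pass_stored, second_pass_lazy, generator_is_indexable)
```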
All that said, this phrase is not very common--certainly not something you'd expect to see or use as a workaday Python programmer.
I am very new to Python, and my apologies if this has already been answered. I can see a lot of previous answers to 'sort' questions, but my problem seems a little different from these questions and answers.
I have a list of keys, with each key contained in a tuple, that I am trying to sort. Each key is derived from a subset of the columns in a CSV file, but this subset is determined by the user at runtime and can't be hard coded as it will vary from execution to execution. I also have a datetime value that will always form part of the key as the last item in the tuple (so there will be at least one item to sort on - even if the user provides no additional items).
The tuples to be sorted look like:
(col0, col1, .... colN, datetime)
Where col0 to colN are based on the values found in columns in a CSV file, and the 'N' can change from run to run.
In each execution, the tuples in the list will always have the same number of items in each tuple. However, they need to be able to vary from run to run based on user input.
The sort looks like:
sorted(concurrencydict.keys(), key=itemgetter(0, 1, 2))
... when I do hard-code the sort based on the first three columns. The issue is that I don't know in advance of execution that 3 items will need to be sorted - it may be 1, 2, 3 or more.
I hope this description makes sense.
I haven't been able to think of how I can get itemgetter to accept a variable number of values.
Does anyone know whether there is an elegant way of performing a sort based on a variable number of items in python where the number of sort items is determined at run time (and not based on fixed column numbers or attribute names)?
I guess I'll turn my comment into an answer.
You can pass a variable number of arguments by unpacking an iterable with the * operator in the function call (the counterpart of *args in a function definition, which packs the arguments it receives into a tuple). In your specific case, you can put your user-supplied selection of column numbers to sort by into a sort_columns list or tuple, then call:
sorted_keys = sorted(concurrencydict.keys(), key=itemgetter(*sort_columns))
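A self-contained sketch of that call, with invented sample keys and an invented user_columns selection. Appending -1 for the trailing datetime is an assumption based on the question's description; it also keeps the column list from ever being empty, which matters because itemgetter requires at least one argument.

```python
from datetime import datetime
from operator import itemgetter

# Hypothetical data shaped like the question: (col0, col1, datetime) keys.
concurrencydict = {
    ("web", "eu", datetime(2020, 1, 2)): 3,
    ("api", "us", datetime(2020, 1, 1)): 5,
    ("api", "eu", datetime(2020, 1, 3)): 1,
}

user_columns = [0, 1]               # assumed to come from user input at runtime
sort_columns = user_columns + [-1]  # the datetime is always the last item

# The * unpacks the list, so this is itemgetter(0, 1, -1).
sorted_keys = sorted(concurrencydict.keys(), key=itemgetter(*sort_columns))

for key in sorted_keys:
    print(key)
```

With a single index, itemgetter returns the bare item rather than a one-element tuple, which still works as a sort key.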
I'm trying to build a solution to properly order an array of value pairs so that they end up in the correct sequence. Consider this example in Python:
theArray = [['Dempster St','Main St'],['Dempster St','Church St'],['Emerson St','Church St']]
I need to order the array so that in the end it looks like this:
theArray = [['Emerson St','Church St'],['Church St','Dempster St'],['Dempster St','Main St']]
Some considerations:
There is no guarantee that the order within each pair points in the same direction. Ex: in the example above, the second array element has its pair pointing in the opposite direction from the rest (Dempster to Church instead of Church to Dempster)
The code should be built so that it could be used in both Python and C, so ideally it should be done without any language-specific tricks
At the end, it doesn't matter in which order the final array will be built, as long as the elements follow the correct order. For example, the solution below would also work:
theArray = [['Main St','Dempster St'],['Dempster St','Church St'],['Church St','Emerson St']]
Ideas?
I managed to make it work. I compared each element of every pair against all the others using multiple nested loops, so that I could check for their uniqueness (and in order to do that, I incremented an associated variable whenever an item was found more than once, like a refcount); at the end, the two elements with the lowest count are the beginning and end of the route. From there it was quite easy to find the remaining connections.
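One possible reading of that approach, as a sketch rather than the code actually used: count how often each street appears, start from a street that appears only once, then repeatedly pick the remaining pair that touches the current street. The order_route name is made up, and the code sticks to plain loops so it could be ported to C.

```python
def order_route(pairs):
    """Hypothetical helper: arrange unordered pairs into one chain."""
    # Count how many pairs each name takes part in.
    counts = {}
    for a, b in pairs:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1

    # A name that appears exactly once is an end of the route;
    # take the first such name found as the starting point.
    current = None
    for a, b in pairs:
        for name in (a, b):
            if current is None and counts[name] == 1:
                current = name

    remaining = list(pairs)
    ordered = []
    while remaining:
        for i in range(len(remaining)):
            a, b = remaining[i]
            if a == current or b == current:
                # Flip the pair if needed so it starts at the current name.
                nxt = b if a == current else a
                ordered.append([current, nxt])
                current = nxt
                del remaining[i]
                break
        else:
            # No remaining pair touches the current name.
            raise ValueError("pairs do not form a single chain")
    return ordered


theArray = [['Dempster St', 'Main St'],
            ['Dempster St', 'Church St'],
            ['Emerson St', 'Church St']]
print(order_route(theArray))
```

On the question's data this starts from Main St, which is one of the two acceptable orderings. The scan is quadratic in the number of pairs, in keeping with the nested-loop description above.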