What is a "Physically Stored Sequence" in Python?


I am currently reading Learning Python, 5th Edition, by Mark Lutz and have come across the phrase "Physically Stored Sequence".
From what I've learnt so far, a sequence is an object that contains items that can be indexed in sequential order from left to right, e.g. strings, tuples and lists.
So with regard to a "Physically Stored Sequence", would that be a sequence that is referenced by a variable for use later on in a program? Or am I not getting it?
Thank you in advance for your answers.

A physically stored sequence is best explained by contrast. It is one kind of iterable; the main example of the other kind is a generator.
A generator is an iterable, meaning you can iterate over it as in a for loop, but it does not actually store anything: it merely produces values when requested. Examples include a pseudo-random number generator, the iterators produced by the itertools module, or any function you write yourself using yield. Such things can be the subject of a for loop but do not actually "contain" any data.
A physically stored sequence, then, is an iterable that does contain its data. Examples include most of Python's built-in data structures, such as lists, tuples and strings. In Python parlance it doesn't matter whether the items in the sequence have any particular reference count or anything like that (e.g. the None object exists only once in Python, so [None, None] holds two references to the same object rather than "storing" it twice).
A key feature of physically stored sequences is that you can usually iterate over them multiple times, and you can often get at items other than the "first" one (the one an iterator hands you on its first next() call), for example by indexing.
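To make the contrast concrete, here is a minimal sketch that holds the same five values once in a list and once in a generator expression; the variable names are arbitrary.

```python
# A list physically stores its items: it can be indexed and iterated repeatedly.
squares_list = [n * n for n in range(5)]

# A generator expression computes each item on demand and stores none of them.
squares_gen = (n * n for n in range(5))

print(squares_list[2])                         # random access works: 4
print(list(squares_list), list(squares_list))  # both passes see every item

print(list(squares_gen))  # the first pass consumes the generator
print(list(squares_gen))  # the second pass finds it exhausted: []
```

The generator cannot be indexed at all (squares_gen[2] raises TypeError), and once consumed it has nothing left to give.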
All that said, this phrase is not very common; it's certainly not something you'd expect to see or use as a workaday Python programmer.

Related

What is the meaning of this code segment?

I am trying to implement a function in Python that takes an iterable as input and loops through it to perform some operation. I was confused about how to handle different iterables (for example, lists and dictionaries cannot be looped over in the same general way), so I looked at the statistics library in Python and found that it handles this situation like this:
def variance(data, xbar=None):
    if iter(data) is data:  # <-----1
        data = list(data)
    ...
Then it treats data as a list everywhere.
So my questions are:
What is the meaning of (1); and
Is this the right method, given that it makes a new list out of data every time? Can't they simply use the iterator to loop through the data?

iter(something) returns an iterator object that returns the elements of something. If something is already an iterator, it simply returns it unchanged. So
if iter(data) is data:
is a way of telling whether data is an iterator object. If it is, it converts it to a list of all the elements.
It's doing this because the code after that needs a real list of the elements. There are things you can do with a list that you can't do with an iterator, such as access specific elements, insert/delete elements, and loop over it multiple times. Iterators can only be processed sequentially.
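To see why the conversion matters, here is a minimal sketch of a two-pass sample variance. sample_variance is a hypothetical helper written for illustration, not the actual code of the statistics module: the first pass computes the mean and the second sums squared deviations, and a one-shot iterator would already be empty by the second pass.

```python
def sample_variance(data):
    """Hypothetical two-pass sample variance (not the statistics module's code)."""
    if iter(data) is data:   # data is an iterator, e.g. a generator
        data = list(data)    # materialize it so it can be traversed twice
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n                                   # first pass
    return sum((x - mean) ** 2 for x in data) / (n - 1)   # second pass

print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))     # a list is used as-is
print(sample_variance(x for x in [1, 2, 3, 4]))      # a generator is copied first
```

Note that a list is not copied: iter(list_obj) returns a new list iterator, so the identity test is False and the original list is reused.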

3 questions about generators and iterators in Python

Everyone says you lose the benefit of generators if you put the result into a list.
But you need a list, or a sequence, to even have a generator to begin with, right? For example, if I need to go through the files in a directory, don't I have to make them into a list first, as with os.listdir()? If so, how is that more efficient? (I am always working with strings and files, so I really hate that all the examples use range and integers, but I digress.)
Taking it a step further, the mere presence of the yield keyword is supposed to make a generator. So if I do:
for x in os.listdir():
    yield x
Is a list still being created? Or is os.listdir() itself now also magically a generator? Is it possible that, because os.listdir() hasn't been called yet, there isn't really a list here yet?
Finally, we are told that iterators need __iter__() and __next__() methods. But doesn't that also mean they need an index? If not, what is next() operating on? How does it know what comes next without an index? Before 3.6, dict keys had no order, so how did that iteration work?

No.
See, there's no list here:
def one():
    while True:
        yield 1
Indexing and next() are two independent tools for performing an iteration. If you have an object whose iterator's next() always returns 1, you don't need any indices.
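Here is a minimal sketch of that point as an explicit iterator class (the name Ones is made up for illustration): its __next__() returns 1 every time without tracking any position, and itertools.islice() is used only to stop the otherwise infinite iteration.

```python
from itertools import islice

class Ones:
    """An iterator that yields 1 forever; it keeps no index and no list."""

    def __iter__(self):
        # An iterator's __iter__ returns the iterator itself.
        return self

    def __next__(self):
        # No position is tracked: "what comes next" is simply always 1.
        return 1

# islice() takes just the first five values from the infinite iterator.
print(list(islice(Ones(), 5)))  # [1, 1, 1, 1, 1]
```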
In deeper detail...
See, technically, you can always associate a list and an index with any generator or iterator: simply write down all its returned values, and you'll get at most a countable sequence of values a₀, a₁, ... But that is merely a mathematical formalism, which need not have anything in common with how a real generator works. For instance, take a generator that always yields one. You can count how many ones you have got from it so far and call that an index. You can write down all those ones, comma-separated, and call that a list. Do those two objects correctly describe the generator's output so far? Apparently so. Are they the least bit important to the generator itself? Not really.
Of course, a real generator will probably have some state. You can call it an index, provided you don't insist that an index must be a non-negative integer: if the generator works deterministically, you can write down all its states, number them, and call the current state's number the index (yes, approximately that). A generator will always have some source for its states and returned values. So indices and lists can be regarded as abstractions that describe an object's behaviour, but they are not necessarily concrete implementation details that are actually used.
Consider an unbuffered file reader. It retrieves a single byte from the disk and immediately yields it. There's no real list in memory, only the file contents on the disk (and there may not even be that, if the reader is connected to a network socket instead of a real disk drive, with the Oracle of Delphi at the other end of the connection). You can call the file position an index, until you read from stdin, which is only forward-traversable, so indexing it makes no real physical sense; the same goes for network connections over an unreliable protocol, by the way.
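A minimal sketch of such a reader as a generator; read_bytes is a hypothetical name. io.BytesIO stands in for the data source so the example is self-contained; with a real file you could pass something like open(path, "rb", buffering=0), which gives an unbuffered binary file. Either way, the generator itself holds at most one byte at a time.

```python
import io

def read_bytes(stream):
    """Yield one byte at a time from a binary stream, keeping no list."""
    while True:
        chunk = stream.read(1)   # ask the source for exactly one byte
        if not chunk:            # b"" signals end of stream
            return
        yield chunk[0]           # indexing a bytes object gives an int

# An in-memory stream stands in for a disk file or socket here.
for byte in read_bytes(io.BytesIO(b"abc")):
    print(byte)   # 97, then 98, then 99
```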
Something like this.
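On the os.listdir() part of the question, a small sketch may help (both function names are hypothetical). Calling a generator function runs none of its body, so os.listdir() is not called until the first next(); at that point, though, it builds the complete list of names. os.scandir() instead returns an iterator of os.DirEntry objects, so the loop does not require a full list up front.

```python
import os
import tempfile

def names_via_listdir(path):
    # os.listdir() returns a complete list before the first name is yielded.
    for name in os.listdir(path):
        yield name

def names_via_scandir(path):
    # os.scandir() returns an iterator of os.DirEntry objects instead of a list.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name

with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.txt", "b.txt"):
        open(os.path.join(tmp, fname), "w").close()
    # Directory order is arbitrary, so sort before printing.
    print(sorted(names_via_listdir(tmp)))
    print(sorted(names_via_scandir(tmp)))
```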

1) This is wrong; a list is just the easiest thing from which to explain a generator. If you think of the eight queens problem and you return each solution as soon as the program finds it, I can't see a result list anywhere. Note that the Python standard library often offers an iterator-based alternative (e.g. itertools.islice() vs. ordinary slicing), and an easy example of something not representable by a list is itertools.cycle().
Consequently, 2 and 3 are also wrong.
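A short sketch of the two standard-library tools mentioned above: ordinary slicing copies items out of a stored list, whereas cycle() produces an endless stream that could never be held in a list, and islice() lazily takes only the part you ask for.

```python
from itertools import cycle, islice

letters = ["a", "b", "c"]

# Ordinary slicing copies items out of a physically stored list.
print(letters[1:3])                      # ['b', 'c']

# cycle() repeats the items forever, so its output is not a finite list;
# islice() lazily takes just the first seven values.
print(list(islice(cycle(letters), 7)))   # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```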

Is the order of execution guaranteed when looping over a string?

Is the program below guaranteed to always produce the same output?
s = 'fgvhlsdagfcisdghfjkfdshfsal'
for c in s:
    print(c)

Yes, it is. This is because the str type is an immutable sequence. Sequences represent a finite ordered set of elements (see Sequences in the Data model chapter of the Reference guide).
Iteration through a given string (any Sequence) is guaranteed to always produce the same results in the same order for different runs of the CPython interpreter, versions of CPython and implementations of Python.
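One way to see the guarantee: for a sequence, the for-loop visits exactly the items you would get by indexing from 0 to len(s) - 1. A minimal check using the string from the question:

```python
s = 'fgvhlsdagfcisdghfjkfdshfsal'

# A for-loop over a sequence visits s[0], s[1], ..., s[len(s) - 1] in order.
by_loop = [c for c in s]
by_index = [s[i] for i in range(len(s))]
print(by_loop == by_index)  # True
```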

Yes. Internally, the string you have there is stored in a C-style array (depending on the interpreter implementation); because it is a sequential array of data, an iterator can walk over it in order. To use the for ... in ... syntax, you need to be able to iterate over the object after the in. A string supplies its own iterator, which lets a for loop go through it in sequential order, as do all Python sequences.
The same is true for lists, and even for custom sequence objects that you create. However, not all iterable Python objects will necessarily yield items in order or yield the values they store; a clear example is the dictionary. Iterating over a dictionary yields its keys rather than sequential values as with lists, tuples and strings, and the key order depends on the Python version: since Python 3.7, dicts are guaranteed to preserve insertion order (in CPython 3.6 this was an implementation detail), while on older versions you should not assume any order unless you use OrderedDict.
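A small illustration of the dictionary point, assuming Python 3.7 or later for the plain-dict line; the fruit names are arbitrary. Note that both loops yield keys, not values, and in insertion order rather than sorted order.

```python
from collections import OrderedDict

d = {}
d["banana"] = 1
d["apple"] = 2
d["cherry"] = 3

# Python 3.7+: plain dicts iterate over their keys in insertion order.
print(list(d))   # ['banana', 'apple', 'cherry']

# OrderedDict also keeps insertion order, including on older versions.
od = OrderedDict([("banana", 1), ("apple", 2), ("cherry", 3)])
print(list(od))  # ['banana', 'apple', 'cherry']
```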

Yes, it is. Over a string, a for-loop iterates over the characters in order. This is also true for lists and tuples -- a for-loop will iterate over the elements in order.
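You may be thinking of sets and dictionaries. A set doesn't specify a particular order, so:
for x in {"a", "b", "c"}:  # over a set
    print(x)
will iterate in some arbitrary order that you can't easily predict in advance. Dictionaries used to behave the same way, but since Python 3.7 iterating over a dict is guaranteed to yield its keys in insertion order:
for key in {"x": 1, "y": 2, "z": 3}:  # over a dict: x, y, z
    print(key)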
See this Stack Overflow answer for some additional information on what guarantees are made about the order for dictionaries and sets.

Yes. The for loop is sequential.

Yes, the loop will always print each letter one by one starting from the first character and ending with the last.

Why python for loops don't default to one iteration for single objects

This may seem like an odd question, but why doesn't Python by default "iterate" through a single object?
I feel it would increase the resilience of for loops for low-level programming/simple scripts.
At the same time, it would promote sloppiness in defining data structures properly. It would also clash with strings being iterable by character.
E.g.
x = 2
for a in x:
    print(a)
As opposed to:
x = [2]
for a in x:
    print(a)
Are there any reasons?
FURTHER INFO: I am writing a function that takes one or more columns from a database and puts them into a list of lists. It would just be visually "nice" to have a number instead of a single-element list without putting type checking into the function (probably just me being OCD again, though).
Pardon the slightly ambiguous question; this is a "why is it so?", not a "how to?". But in my ignorant world, I would prefer integers to be iterable for the case of the above-mentioned function. So why isn't it implemented? Is it because adding an __iter__ to the integer type would put an extra strain on computation?
Discussion Points
Is an __iter__ too much of a drain on machine resources?
Do programmers want an exception to be thrown because they expect integers to be non-iterable?
It raises the idea that if you can't do it already, why not just allow it, since people in the status quo would keep doing what they've been doing unaffected (unless, of course, the previous point is what they want); and
From a set-theory perspective, I guess strictly a set does not contain itself, so it may not be mathematically "pure".

Python cannot iterate over an object that is not iterable.
The for loop actually calls special methods defined by the iterable data type (such as __iter__, and then __next__ on the iterator it returns) that allow it to extract elements from the iterable.
Non-iterable data types don't have these methods, so there is no way to extract elements from them.
This Stack Overflow question on checking whether an object is iterable is a great resource.
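A common way to test this in practice is simply to try the same iter() call that a for loop makes and catch the TypeError; is_iterable is a hypothetical helper name.

```python
def is_iterable(obj):
    """Return True if iter() accepts obj, i.e. a for loop could loop over it."""
    try:
        iter(obj)   # a for loop starts by calling iter() on its target
    except TypeError:
        return False
    return True

print(is_iterable("foo"))  # True
print(is_iterable([2]))    # True
print(is_iterable(2))      # False: int is not iterable
```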

The problem is with the definition of "single object". Is "foo" a single object (Hint: it is an iterable with three strings)? Is [[1, 2, 3]][0] a single object (It is only one object, with 3 elements)?

The short answer is that there is no generalizable way to do it. However, you can write functions that have knowledge of your problem domain and can do conversions for you. I don't know your specific case, but suppose you want to handle an integer or list of integers transparently. You can create your own iterator:
def col_iter(item):
    if isinstance(item, int):
        yield item        # a lone int is treated as a one-item column
    else:
        yield from item   # anything else is assumed to be iterable
x = 2
for a in col_iter(x):
    print(a)
y = [1, 2, 3, 4]
for a in col_iter(y):
    print(a)
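One caveat with col_iter: a string is not an int, so col_iter("foo") yields 'f', 'o', 'o', which is exactly the "single object" ambiguity raised above. A hypothetical variant, as_iterable, treats str and bytes as single values and wraps anything that is not an Iterable. It relies on collections.abc.Iterable, which detects classes defining __iter__ but not classes that are iterable only through __getitem__.

```python
from collections.abc import Iterable

def as_iterable(item):
    """Hypothetical helper: yield item itself if it is a scalar, else its elements."""
    # Strings and bytes are iterable, but here they count as single values.
    if isinstance(item, (str, bytes)) or not isinstance(item, Iterable):
        yield item
    else:
        yield from item

print(list(as_iterable(2)))          # [2]
print(list(as_iterable("foo")))      # ['foo']
print(list(as_iterable([1, 2, 3])))  # [1, 2, 3]
```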

The only thing I can think of is that Python for loops are looking for something to iterate through, not just a value. If you think about it, what would the value of a be? If you want it to be the number 2, then you don't need the for loop in the first place. If you want it to count through the numbers below x (0, 1 for x = 2), then you want for a in range(x):. I'm not positive that's the answer you're looking for, but it's what I've got.

iterable as comparison key in sorted()?

Let's say I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
In Python terms, the equivalent of sorted(rows, key=itemgetter(1, 2, 3, 4, ...)).
I tried writing my own generator but sorted doesn't iterate over my generator as it does with the tuple itemgetter returns. Any advice?

For the reasons noted in the comments, you cannot sort a list of things that hasn't been created yet. Generators exist to yield results when they are asked for, so you can't sort an iterable that hasn't been iterated (as with list(generator())).
To put in more ordinary terms, I'm thinking of ten names but am not telling you what they are yet, please sort them into alphabetical order. You should respond "how can I sort them when you haven't given them to me?" and you'd be correct: you can't.
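Assuming the key function in the question returned a generator, this sketch shows the failure and the fix: sorted() has to compare keys with <, and generator objects do not support ordering comparisons, whereas a tuple of the same values does. The sample rows are made up for illustration.

```python
rows = [(1, 3, 2), (1, 2, 9)]

# A generator as the key: comparing two generator objects raises TypeError.
try:
    sorted(rows, key=lambda row: (x for x in row[1:]))
except TypeError as exc:
    print("TypeError:", exc)

# Materializing the same values into a tuple gives comparable keys.
print(sorted(rows, key=lambda row: tuple(x for x in row[1:])))
# [(1, 2, 9), (1, 3, 2)]
```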

OK, here's what you say you want to do:
I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
Note, first, that the documentation for the key argument says the following:
key specifies a function of one argument that is used to extract a comparison key from each list element
So your itemgetter idea isn't quite right, since you want to move through the list only when a comparison is equal.
However, things are actually much easier than you think. Check out the Python docs (See also this SO question.):
Sequence types also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.)
Which, I think, is exactly what you want if you just make sure that each row is an equal-length sequence (list or tuple).
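A minimal sketch of that idea, assuming each row is a tuple of equal length whose first element is a label (the sample data is made up). Slicing with row[1:] covers every column from index 1 onward without listing them, and gives the same order as spelling them out with itemgetter.

```python
from operator import itemgetter

rows = [
    ("r1", 2, 5, 1),
    ("r2", 1, 7, 3),
    ("r3", 2, 5, 0),
    ("r4", 1, 4, 8),
]

# Tuples compare element by element, so later columns only matter on ties.
print((2, 5, 1) < (2, 5, 3))   # True: first two equal, then 1 < 3

# Sort on every column from index 1 onward, however many there are.
by_slice = sorted(rows, key=lambda row: row[1:])
by_getter = sorted(rows, key=itemgetter(1, 2, 3))
print(by_slice == by_getter)          # True
print([row[0] for row in by_slice])   # ['r4', 'r2', 'r3', 'r1']
```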
(Aha, I just read the comment regarding the die-roll function producing the keys. Confusing; I'm not sure the above is helpful in that case, and I'm not sure what you are asking actually makes sense...)
