What is the meaning of this code segment? - python

I am trying to implement a function in Python that takes an iterable as input and loops through it to perform some operation. I was confused about how to handle different iterables (for example, lists and dictionaries cannot be looped over in the same general way), so I looked in Python's statistics library and found that it handles the situation like this:
def variance(data, xbar=None):
    if iter(data) is data:  # <-----1
        data = list(data)
    ...
After that, data is handled as a list everywhere.
So, my questions are:
What is the meaning of (1); and
Is this the right method, given that it makes a new list out of data every time? Can't they simply use the iterator to loop through the data?

iter(something) returns an iterator object that returns the elements of something. If something is already an iterator, it simply returns it unchanged. So
if iter(data) is data:
is a way of telling whether data is an iterator object. If it is, it converts it to a list of all the elements.
It's doing this because the code after that needs a real list of the elements. There are things you can do with a list that you can't do with an iterator, such as access specific elements, insert/delete elements, and loop over it multiple times. Iterators can only be processed sequentially.
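A minimal sketch of the check in isolation may make this clearer. The helper name `ensure_reusable` is made up for illustration and is not part of the statistics module:

```python
def ensure_reusable(data):
    """Copy one-shot iterators into a list; pass everything else through."""
    # iter() returns its argument unchanged only when it is already an iterator.
    if iter(data) is data:
        return list(data)
    return data


numbers = [1, 2, 3]
gen = (n * n for n in numbers)

print(iter(numbers) is numbers)  # False: a list hands out a fresh iterator each time
print(iter(gen) is gen)          # True: a generator is its own iterator

squares = ensure_reusable(gen)
print(squares, len(squares))     # [1, 4, 9] 3
print(ensure_reusable(numbers) is numbers)  # True: the list is not copied
```

Note that the copy only happens for iterators. A list, tuple, or dict passed in fails the `is` test and is used as-is, so a new list is not built on every call.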

Related

what does generator's yield return in python and how is this different from return [duplicate]

I am really confused about what the keyword "yield" returns in a generator. What are the real use cases for it, and when should I use it?
How is it different from the "return" keyword?
What I have learnt is that generators are better in terms of performance, but I cannot think of any real use case if asked in interviews!
Thanks in advance!
This may be useful for text processing. If you have a large corpus and you want to normalize the characters in the text, you apply a normalize function to every text, for example.
You would like a function that loads a text just when you are going to use it, rather than the complete corpus, because the corpus may be too large for your computer.
Example:
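import os

from lxml import etree

def get_data(data_directory, parser):
    # Lazily parse one XML file at a time instead of loading the whole corpus.
    for filename in os.listdir(data_directory):
        if filename.endswith("xml"):
            tree = etree.parse(os.path.join(data_directory, filename), parser=parser)
            yield tree.getroot()
        # Other files are skipped; a return here would end the generator at the first non-XML file.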
You have a directory where all your files are. You want to parse only the XML files.
You can do such processing with a yield statement as if you loaded all your data:
for root in get_data(DATA_DIRECTORY, parser):
    result = process(root)
    save_result(result)
return sends a specified value back to its caller, whereas yield can produce a sequence of values. We should use yield when we want to iterate over a sequence but don't want to store the entire sequence in memory.
You can read more about the differences here
The difference between yielding a single value and returning a single value is that a function containing yield hands its values back through an iterator, which is also called a stream or enumerator in other languages. A list is not itself an iterator, but it can be looped over in the same way, so to simplify this answer you can pretend that all iterators are just lists.
The difference between yielding many values (say, inside a for loop) and returning an iterator (or list) is when the values are calculated. With yield, one value is calculated and handed to the caller at a time. If the caller doesn't need the whole list of values, the rest of the list is never calculated.
However, when returning a list, the entire list must be calculated beforehand. Say you have this function:
def findIndex(enumerator, item):
    idx = 0
    for value in enumerator:
        if (value == item):
            return idx
        idx = idx + 1
It takes an iterator, and searches for an item, returning the index of that item.
Now, here's where iterators make a difference. Imagine that you are going to call findIndex like this:
findIndex(gimme_the_values(), 3)
Say that gimme_the_values is some function which calculates a list of integers; however, let's also say that, the process of calculating those integers takes a long time, for some reason. Maybe, you're scanning through a 1500 page document, looking for every number that occurs in it, and that's the list of values that you're returning.
Now, let's say that the first several numbers to occur in this document are 7, 1998, 3, and 18, and that the 3 occurs on the 40th page. If you define gimme_the_values to use yield, you can stop generating that "list" at page 40; you'll never even scan for and return the 18. However, if gimme_the_values returns a list instead of yielding, you have to scan every page and generate the whole list, even though you really only need the first three values in this case.
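The scenario can be sketched in a few lines. Here `gimme_the_values` is a stand-in that yields the four numbers from the example rather than scanning a real document, and the `scanned` list is an illustrative addition that records which values were actually produced:

```python
scanned = []  # records every value the generator actually produced


def gimme_the_values():
    # Stand-in for the slow document scan; each step does its work only on demand.
    for number in [7, 1998, 3, 18]:
        scanned.append(number)
        yield number


def find_index(values, item):
    # Same search as findIndex above, written with enumerate.
    for idx, value in enumerate(values):
        if value == item:
            return idx
    return None


position = find_index(gimme_the_values(), 3)
print(position)  # 2
print(scanned)   # [7, 1998, 3]
```

The 18 never shows up in `scanned` because the search returned before the generator was asked for a fourth value.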

Why can a zip() variable in python be parsed only once? [duplicate]

I seem to have found a weird bug in Python, and I do not know whether it is already known or whether I am doing something wrong. Please explain.
We know that we can zip two lists in Python to combine them as tuples, and we can then loop over them easily. When I try to loop over the same zipped variable more than once, Python doesn't seem to do it and ends up giving empty lists []. It works the first time, but not after that.
Example:
lis1=[1,2,3,4,5]
lis2=['a','b','a','b','a']
zip_variable=zip(lis1,lis2)
op1=[val2 for (val1,val2) in zip_variable if val1<4]
op2=[val1 for (val1,val2) in zip_variable if val2=='a']
op3=[val1 for (val1,val2) in zip_variable if val2=='b']
print(op1,"\n",op2,"\n",op3)
Output:
['a','b','a']
[]
[]
I have a fix, which is to make multiple variables for the same zip, as below:
lis1=[1,2,3,4,5]
lis2=['a','b','a','b','a']
zip_variable1=zip(lis1,lis2)
zip_variable2=zip(lis1,lis2)
zip_variable3=zip(lis1,lis2)
op1=[val2 for (val1,val2) in zip_variable1 if val1<4]
op2=[val1 for (val1,val2) in zip_variable2 if val2=='a']
op3=[val1 for (val1,val2) in zip_variable3 if val2=='b']
print(op1,"\n",op2,"\n",op3)
Output:
['a','b','a']
[1,3,5]
[2,4]
This solution is always possible if we don't care about memory.
But the main question is: why does this happen?
zip() returns an iterator in Python 3. It produces only one tuple at a time from the source iterables, as needed, and when those have been iterated over, zip() has nothing more to yield. This approach reduces memory needs and can improve performance as well (especially if you don't actually ever request all the zipped tuples).
If you need the same sequence again, either call zip() again, or convert zip() to a list like list(zip(...)).
You could also use itertools.tee() to create "copies" of a zip() iterator. However, behind the scenes, this stores any items that haven't been requested by all iterators. If you're going to do that, you might as well just use a list to begin with.
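As a short sketch of both options, using the lists from the question (the variable names `pairs`, `first`, and `second` are chosen here for illustration):

```python
from itertools import tee

lis1 = [1, 2, 3, 4, 5]
lis2 = ['a', 'b', 'a', 'b', 'a']

# Option 1: materialize the pairs once; a list can be looped over repeatedly.
pairs = list(zip(lis1, lis2))
op1 = [v2 for v1, v2 in pairs if v1 < 4]
op2 = [v1 for v1, v2 in pairs if v2 == 'a']
op3 = [v1 for v1, v2 in pairs if v2 == 'b']
print(op1, op2, op3)  # ['a', 'b', 'a'] [1, 3, 5] [2, 4]

# Option 2: tee() splits one iterator into independent iterators,
# buffering items that one copy has consumed and the other has not.
first, second = tee(zip(lis1, lis2))
tee_a = [v1 for v1, v2 in first if v2 == 'a']
tee_b = [v1 for v1, v2 in second if v2 == 'b']
print(tee_a, tee_b)  # [1, 3, 5] [2, 4]
```

Because `first` is fully consumed before `second` is touched, tee() ends up buffering every pair here, which is why a plain list is usually the simpler choice.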
Because the zip function returns an iterator.
This kind of object can only be iterated over once.
If you want to iterate over the same zip multiple times, I recommend creating a list or a tuple from it (list(zip(a, b)) or tuple(zip(a, b))).

similar list.append statements returning different results

I have the two expressions below, which to me are basically the same, but the first gives a list with a generator inside rather than the values, while the second works fine.
I just wanted to know why this happens, what a generator is, and how it is used.
newer_list.append([sum(i)] for i in new_list)
for i in new_list:
    newer_list.append([sum(i)])
The first one passes a generator expression, ([sum(i)] for i in new_list), to append as a single argument, while the second one just loops, appending each sum.
It is possible you wanted something like newer_list.extend([sum(i) for i in new_list]), where extend concatenates lists instead of just appending, and the whole thing is wrapped in brackets so it's a list comprehension instead of a generator.
A generator is a way for Python to keep from storing everything in memory. The expression ([sum(i)] for i in new_list) is a formula for generating the items in a list. To keep from storing that list in memory, it just stores the function it would need to execute, which has less of a memory footprint.
To turn a generator into a list, you can just do list([sum(i)] for i in new_list), or in this case write the list comprehension [[sum(i)] for i in new_list].
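The difference is easy to see side by side. This sketch uses made-up sample data for `new_list`, and passes the generator to extend with `[sum(i)]` (rather than the flat `sum(i)` variant suggested above) so that the result matches the loop exactly:

```python
from types import GeneratorType

new_list = [[1, 2], [3, 4]]  # sample data, assumed for illustration

# append() adds its argument as ONE element, here the generator object itself.
with_generator = []
with_generator.append([sum(i)] for i in new_list)
print(len(with_generator), isinstance(with_generator[0], GeneratorType))  # 1 True

# The explicit loop appends one single-element list per row.
with_loop = []
for i in new_list:
    with_loop.append([sum(i)])
print(with_loop)  # [[3], [7]]

# extend() consumes the generator and appends each item it produces.
with_extend = []
with_extend.extend([sum(i)] for i in new_list)
print(with_extend)  # [[3], [7]]

# The stored generator still works; it just has not been run yet.
recovered = list(with_generator[0])
print(recovered)  # [[3], [7]]
```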

How to convert a tuple (in place) to first item in list?

I have boilerplate code that performs my SQL queries and returns results (this is based on working code written years ago by someone else). Generally, that code will return a list of tuples, which is totally fine for what I need.
However, if there's only one result, the code returns a single tuple instead, which breaks code that expects to loop through a list of tuples.
I need an easy way to convert the tuple into the first item of a list, so I can use it in my code that expects to loop through lists.
What's the most straightforward way to do this in a single line of code?
I figured there must be a straightforward way to do this, and there is. If my result set is called rows:
if not isinstance(rows, list):
    rows = [rows]
I don't know if this is the most Pythonic construction, or if there's a way of combining the isinstance and rows = [rows] lines into a single statement.
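One way to fold the two lines into a single statement is a conditional expression. This is only a sketch; the helper name `as_list` and the sample rows are made up for illustration:

```python
def as_list(rows):
    # Wrap a lone result in a list; leave an existing list untouched.
    return rows if isinstance(rows, list) else [rows]


single = as_list(("alice", 1))
many = as_list([("alice", 1), ("bob", 2)])

print(single)  # [('alice', 1)]
print(many)    # [('alice', 1), ('bob', 2)]

for name, user_id in single:
    print(name, user_id)  # alice 1
```

Inline, the same thing reads `rows = rows if isinstance(rows, list) else [rows]`. Whether that is clearer than the two-line if is a matter of taste.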

What is a "Physically Stored Sequence" in Python?

I am currently reading Learning Python, 5th Edition - by Mark Lutz and have come across the phrase "Physically Stored Sequence".
From what I've learnt so far, a sequence is an object that contains items that can be indexed in sequential order from left to right e.g. Strings, Tuples and Lists.
So with regard to a "Physically Stored Sequence", would that be a sequence that is referenced by a variable for use later on in a program? Or am I not getting it?
Thank you in advance for your answers.
A Physically Stored Sequence is best explained by contrast. It is one type of "iterable" with the main example of the other type being a "generator."
A generator is an iterable, meaning you can iterate over it as in a "for" loop, but it does not actually store anything--it merely spits out values when requested. Examples of this would be a pseudo-random number generator, the whole itertools package, or any function you write yourself using yield. Those sorts of things can be the subject of a "for" loop but do not actually "contain" any data.
A physically stored sequence then is an iterable which does contain its data. Examples include most data structures in Python, like lists. It doesn't matter in the Python parlance if the items in the sequence have any particular reference count or anything like that (e.g. the None object exists only once in Python, so [None, None] does not exactly "store" it twice).
A key feature of physically stored sequences is that you can usually iterate over them multiple times, and sometimes get items other than the "first" one (the one an iterator over it gives you when you call next() on it).
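The contrast can be sketched with a list and a small generator. The generator function `count_up_to` is made up for illustration:

```python
def count_up_to(limit):
    # Produces values on demand; nothing is stored.
    for n in range(1, limit + 1):
        yield n


stored = [1, 2, 3]
lazy = count_up_to(3)

# A stored sequence can be traversed as many times as you like.
stored_totals = (sum(stored), sum(stored))
print(stored_totals)  # (6, 6)

# A generator is exhausted after the first pass.
lazy_totals = (sum(lazy), sum(lazy))
print(lazy_totals)  # (6, 0)

# Stored sequences support indexing; next() needs an iterator made from them.
print(stored[1])           # 2
print(next(iter(stored)))  # 1

# Generators cannot be indexed at all.
try:
    count_up_to(3)[1]
    indexable = True
except TypeError:
    indexable = False
print(indexable)  # False
```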
All that said, this phrase is not very common--certainly not something you'd expect to see or use as a workaday Python programmer.
