How can I restart a string iterator endlessly? [duplicate] - python

This question already has answers here:
Circular list iterator in Python
(9 answers)
Closed last month.
This question is somewhat related to this, this, and this one. Assume I have two iterables of different lengths:
>>> s = "abcde"
>>> r = range(0, 16)
I now want to repeat iterating over the shorter one until the longer one is exhausted. The standard zip() function terminates once the shorter of the two is exhausted:
>>> for c, i in zip(s, r):
...     print(c, i)
...
a 0
b 1
c 2
d 3
e 4
The best I can come up with is wrapping the string into a generator like so:
>>> def endless_s(s):
...     while True:
...         for c in s:
...             yield c
which gives me the desired result of
>>> _s = endless_s(s)
>>> for c, i in zip(_s, r):
...     print(c, i)
...
a 0
b 1
c 2
d 3
e 4
a 5
b 6
c 7
d 8
e 9
a 10
b 11
c 12
d 13
e 14
a 15
Now I wonder: is there a better and more compact way of doing this? Like an endless string join, or some such?

You could do this with itertools.cycle:
Make an iterator returning elements from the iterable and saving a
copy of each. When the iterable is exhausted, return elements from the
saved copy. Repeats indefinitely.
which is able to replace your function entirely:
from itertools import cycle as endless_s
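A minimal sketch of the loop from the question rewritten with cycle (the name pairs is only illustrative). zip stops as soon as the finite range r is exhausted, which is what keeps the infinite cycle from running forever.

```python
from itertools import cycle

s = "abcde"
r = range(0, 16)

# cycle(s) repeats "abcde" indefinitely; zip stops once r runs out
pairs = list(zip(cycle(s), r))
for c, i in pairs:
    print(c, i)
```

This prints the same 16 lines as the endless_s generator above, ending with a 15.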

Related

Python - Multiple Assignment [duplicate]

This question already has answers here:
Multiple assignment and evaluation order in Python
(11 answers)
Closed 2 years ago.
Recently I was reading through the official Python documentation when I came across the example on how to code the Fibonacci series as follows:
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a + b
which outputs 0, 1, 1, 2, 3, 5, 8.
Since I've never used multiple assignment myself, I decided to hop into Visual Studio to figure out how it worked. I noticed that if I changed the notation to...
a = 0
b = 1
while a < 10:
    print(a)
    a, b = b, a + b
... the output remains the same.
However, if I change the notation to...
a = 0
b = 1
while a < 10:
    print(a)
    a = b
    b = a + b
... the output changes to 0, 1, 2, 4, 8
The way I understand multiple assignment is that it compresses what would otherwise take two lines into one. But obviously this reasoning must be flawed, since I can't apply it to the two assignments under the print(a) call.
It would be much appreciated if someone could explain why this is/what is wrong with my reasoning.
a = 0
b = 1
while a < 10:
    print(a)
    a = b
    b = a + b
In this case, a becomes b, and then b becomes a + b computed with the already-changed a.
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a + b
In this case, a becomes b and at the same time b becomes the original a + b.
That's why, in your case, b becomes the new a + b, which, since a is now equal to b, effectively means b = b + b. That's why the value of b doubles every time.
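To see the two behaviours side by side, here is a small sketch that runs both loops from the question and collects the values they print instead of printing them. The function names are hypothetical.

```python
def with_tuple_assignment(limit=10):
    """a, b = b, a + b: both right-hand values use the old a and b."""
    out = []
    a, b = 0, 1
    while a < limit:
        out.append(a)
        a, b = b, a + b
    return out


def with_sequential_assignment(limit=10):
    """a = b, then b = a + b: the second line already sees the new a."""
    out = []
    a, b = 0, 1
    while a < limit:
        out.append(a)
        a = b
        b = a + b  # a already equals b here, so this is b + b
    return out


print(with_tuple_assignment())       # [0, 1, 1, 2, 3, 5, 8]
print(with_sequential_assignment())  # [0, 1, 2, 4, 8]
```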
When you do a, b = d, e, both d and e are evaluated first, and only then are the results assigned to a and b (from left to right). So a, b = b, a + b is not the same as running two separate assignments in either order; what you are effectively writing is
temp = a + b
a = b
b = temp
Hence the difference.
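You can verify this by doing
a = 0
b = 1
while a < 10:
    a, b = b, a + b
    print(a, b)
The first output is 1 1 and the second is 1 2. If b were assigned first (b = a + b, then a = b), the second line would be 2 2 instead, so both values on the right are computed from the old a and b before either name is reassigned.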
If you want more details on how this works, you can check out this question.
In a multiple assignment, the right side is always computed first.
In effect,
a, b = b, a + b
is the same as:
temp = (b, a + b)
a = temp[0]
b = temp[1]
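To make the order visible, here is a sketch that logs each evaluation and each assignment. Recorder and value are hypothetical helpers, and subscript targets are used only because __setitem__ lets us hook the assignment step.

```python
events = []


def value(name, v):
    """Record that a right-hand-side expression was evaluated."""
    events.append(f"evaluate {name}")
    return v


class Recorder:
    """Record each item assignment made to it."""

    def __setitem__(self, key, v):
        events.append(f"assign {key}")


r = Recorder()
r["a"], r["b"] = value("first", 1), value("second", 2)
print(events)
# ['evaluate first', 'evaluate second', 'assign a', 'assign b']
```

Both right-hand expressions are evaluated before either target is assigned, and the targets are then assigned from left to right.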

How to loop through .csv file and extract certain values in python?

I'm trying to loop through the 11th column in a CSV file and search for the term "abc" (as an example). For every "abc" it finds, I want it to return the value of the first column of the same row, unless it's empty. If it's empty, I want it to go up the first column row by row until it finds a cell that's not empty and return the value of that cell.
I've already imported the needed CSV file. Here's my code trying to do the above.
for row in csvReader:
    if row[10] == 'abc':
        colAVal = row
        while colAVal[0] == '' and colAVal != 0:
            colAVal -= 1
        print(colAVal[0])
My question is does this code do what it's supposed to do?
And for the second part of what I'm trying to do, I want to be able to manipulate the values that it returns. Is there a way of storing these values so that I can write code that does something for every colAVal[0] that the first part returned?
What you have there won't quite do what you want. Invoking
colAVal -= 1
does not give you the previous row of an iterator; because colAVal is a list rather than a row index, that line raises a TypeError whenever it runs. In languages with a more traditional index-based for loop, you could step backwards from the current row until you found what you wanted, but in Python this is not the recommended approach. Python's for loop is really a for-each loop, so once you've moved from one row to the next, the previous row is inaccessible unless you saved it or index the input data directly by row number. Mixing these kinds of access is strongly discouraged and gets confusing fast.
You also have two questions in your question above, and I'll try my best to answer both.
Given a dataset that looks like the following:
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12
0,0,0,0,0,0,0,0,0,0,abc,0
1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4
,5,5,5,5,5,5,5,5,5,abc,5
,6,6,6,6,6,6,6,6,6,abc,6
7,7,7,7,7,7,7,7,7,7,7,7
you would expect the answers to be 0, 4, and 4, if I'm understanding your question correctly. You could accomplish that and save the data for later use with something like the following:
#! /usr/bin/env python
import csv

results = []
lastValidFirstColumn = None  # no non-empty first-column value seen yet
with open('example.csv', newline='') as file_handler:
    for row in csv.reader(file_handler):
        # remember the most recent non-empty value in the first column
        if row[0] != '':
            lastValidFirstColumn = row[0]
        if row[10] == 'abc':
            results.append(lastValidFirstColumn)
print(results)
# prints ['0', '4', '4']
The data you want, if I understood correctly, is now stored in the results variable. It's not too difficult to write it to a file or manipulate it further; I'd recommend looking that up yourself, as it'd be a better learning experience.
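A self-contained sketch of the same carry-forward idea, with io.StringIO standing in for the real file and a hypothetical helper first_column_for_matches. It then does one example thing with every stored value (writing them out one per row with csv.writer); swap the StringIO for open("out.csv", "w", newline="") to write a real file.

```python
import csv
import io

# The sample data from above; io.StringIO stands in for example.csv.
SAMPLE = """\
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12
0,0,0,0,0,0,0,0,0,0,abc,0
1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4
,5,5,5,5,5,5,5,5,5,abc,5
,6,6,6,6,6,6,6,6,6,abc,6
7,7,7,7,7,7,7,7,7,7,7,7
"""


def first_column_for_matches(lines, term="abc", search_col=10):
    """Hypothetical helper: for each row whose column `search_col` equals
    `term`, return the closest non-empty first-column value at or above it."""
    results = []
    last_valid = None  # most recent non-empty first-column value
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if row[0] != "":
            last_valid = row[0]
        if len(row) > search_col and row[search_col] == term:
            results.append(last_valid)
    return results


results = first_column_for_matches(io.StringIO(SAMPLE))
print(results)  # ['0', '4', '4']

# Do something with every stored value: here, write them out one per row.
out = io.StringIO()
writer = csv.writer(out)
for value in results:
    writer.writerow([value])
print(out.getvalue())
```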
You can do this in pandas pretty easily
import pandas as pd
import numpy as np
df = pd.read_csv('my.csv', header=None)
Using a made-up CSV, we have these values:
0 1 2 3 4 5 6 7 8 9 10
0 20.0 b a b a b a b a b abc
1 NaN c d c d c d c d c def
2 10.0 d e d e d e d e d ghi
3 NaN e f e f e f e f e abc
df['has_abc'] = np.where(df[10]=='abc', df.ffill()[0], np.nan)
df.dropna(subset=['has_abc'], inplace=True)
Output
0 1 2 3 4 5 6 7 8 9 10 has_abc
0 20.0 b a b a b a b a b abc 20.0
3 NaN e f e f e f e f e abc 10.0

Explain how multiple variable assignment in single line works? (Example: a, b = b, a+b)

Say I have this Python code
def fib2(n):  # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
For n=1000 this returns:
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
But I don't understand why it's 1 1 2 3.
The issue is with this line:
a, b = b, a+b
What is the order of execution?
The two options I see are:
1:
a = b
b = a+b
2:
b = a+b
a = b
But neither gives me the correct result when I try it manually.
What am I missing?
Neither of the two options you shared actually describes how this works:
a, b = b, a+b
The code above assigns the value of b to a, and assigns to b the sum a+b computed with the old value of a. You may consider it equivalent to:
>>> temp_a, temp_b = a, b
>>> a = temp_b
>>> b = temp_a + temp_b
Example: Dual variable assignment in one line:
>>> a, b = 3, 5
>>> a, b = b, a+b
>>> a
5
>>> b
8
Equivalent Explicit Logic:
>>> a, b = 3, 5
>>> temp_a, temp_b = a, b
>>> a = temp_b
>>> b = temp_a + temp_b
>>> a
5
>>> b
8
The order of operations in a, b = b, a+b is that the tuple (b, a+b) is constructed, and then that tuple is assigned to the variables (a, b). In other words, the right side of the assignment is entirely evaluated before the left side.
(Actually, starting with Python 2.6, no tuple is actually constructed in cases like this with up to 3 variables - a more efficient series of bytecode operations gets substituted. But this is, by design, not a change that has any observable differences.)
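As a sketch of that two-step view, the loop from the question can be written with the right-hand tuple built explicitly before it is unpacked. fib2_explicit and pair are illustrative names; the result should match the original fib2.

```python
def fib2_explicit(n):
    """Same as fib2, with the tuple packing and unpacking spelled out."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        pair = (b, a + b)  # step 1: right-hand side built from the old a and b
        a, b = pair        # step 2: unpack into the targets, left to right
    return result


print(fib2_explicit(1000))
```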
It's Python's standard way to swap two variables. Here is a working example to clear up your doubt:
Python evaluates expressions from left to right. Notice that while
evaluating an assignment, the right-hand side is evaluated before the
left-hand side.
http://docs.python.org/3/reference/expressions.html#evaluation-order
a = [1, 2, 3, 4, 5]
for i, j in enumerate(a):
    if i == 1:
        a[i-1], a[i] = a[i], a[i-1]
print(a)
output:
[2, 1, 3, 4, 5]
For more info, read this tutorial.
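A minimal sketch of the swap idiom on its own, without the loop; the names x, y, and items are illustrative.

```python
x, y = 1, 2
x, y = y, x  # the right side (2, 1) is built before either name changes
print(x, y)  # 2 1

items = [1, 2, 3, 4, 5]
items[0], items[1] = items[1], items[0]  # same idiom on list elements
print(items)  # [2, 1, 3, 4, 5]
```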

python, rank a list of number/string (convert list elements to ordinal value)

Say I have a list (or numpy array or pandas series) as below
l = [1,2,6,6,4,2,4]
I want to return a list of each value's ordinal: 1 --> 1 (smallest), 2 --> 2, 4 --> 3, 6 --> 4, so that
to_ordinal(l) == [1, 2, 4, 4, 3, 2, 3]
and I want it to also work for list of strings input.
I can try
s = numpy.unique(l)
then loop over each element in l and find its index in s. Just wonder if there is a direct method?
In pandas you can call rank and pass method='dense':
In [18]:
l = [1,2,6,6,4,2,4]
s = pd.Series(l)
s.rank(method='dense')
Out[18]:
0 1
1 2
2 4
3 4
4 3
5 2
6 3
dtype: float64
This also works for strings:
In [19]:
l = ['aaa','abc','aab','aba']
s = pd.Series(l)
s
Out[19]:
0 aaa
1 abc
2 aab
3 aba
dtype: object
In [20]:
s.rank(method='dense')
Out[20]:
0 1
1 4
2 2
3 3
dtype: float64
I don't think that there is a "direct method" for this.[1] The most straightforward way I can think of is to sort a set of the elements:
sorted_unique = sorted(set(l))
Then make a dictionary mapping each value to its ordinal:
ordinal_map = {val: i for i, val in enumerate(sorted_unique, 1)}
Now one more pass over the data and we can get your list:
ordinals = [ordinal_map[val] for val in l]
Note that this is roughly an O(N log N) algorithm (due to the sort), and the more duplicate elements you have, the closer it gets to O(N), since only the unique values are sorted.
[1] Certainly not in vanilla Python, and I don't know of anything in numpy. I'm less familiar with pandas, so I can't speak to that.
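The three steps above can be wrapped into the to_ordinal function the question asks for. This is a minimal sketch assuming the elements are hashable and mutually comparable (all numbers, or all strings); unlike pandas' rank, it returns ints rather than floats.

```python
def to_ordinal(values):
    """Dense rank: equal values share an ordinal and the smallest gets 1."""
    sorted_unique = sorted(set(values))
    ordinal_map = {val: i for i, val in enumerate(sorted_unique, 1)}
    return [ordinal_map[val] for val in values]


print(to_ordinal([1, 2, 6, 6, 4, 2, 4]))         # [1, 2, 4, 4, 3, 2, 3]
print(to_ordinal(['aaa', 'abc', 'aab', 'aba']))  # [1, 4, 2, 3]
```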

Creating Simultaneous Loops in Python

I want to create a loop that does this:
for i in xrange(0,10):
    for k in xrange(0,10):
        z=k+i
        print z
where the output should be
0
2
4
6
8
10
12
14
16
18
You can use zip to turn multiple lists (or iterables) into pairwise* tuples:
>>> for a,b in zip(xrange(10), xrange(10)):
...     print a+b
...
0
2
4
6
8
10
12
14
16
18
But zip will not scale as well as izip (that sth mentioned) on larger sets. zip's advantage is that it is a built-in and you don't have to import itertools -- and whether that is actually an advantage is subjective.
*Not just pairwise, but n-wise. The tuples' length will be the same as the number of iterables you pass in to zip.
The itertools module contains an izip function that combines iterators in the desired way:
from itertools import izip
for (i, k) in izip(xrange(0,10), xrange(0,10)):
    print i+k
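For reference, the snippets here are Python 2. In Python 3, xrange and itertools.izip no longer exist: range replaces xrange, and the built-in zip already returns a lazy iterator, as izip did. A minimal Python 3 sketch of the same loop (the list sums is only there to collect the output):

```python
sums = []
for i, k in zip(range(0, 10), range(0, 10)):
    sums.append(i + k)  # pairs are (0, 0), (1, 1), ... so each sum is 2 * i
    print(i + k)
```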
You can also do this in Python with a single loop; just get the indentation right and use the step argument of xrange:
for i in xrange(0, 20, 2):
    print i
What about this?
i = range(0,10)
k = range(0,10)
for x in range(0,10):
    z=k[x]+i[x]
    print z
0
2
4
6
8
10
12
14
16
18
What you want is two arrays and one loop that walks over both arrays once, adding the corresponding elements.
