Pythonic way to count specific neighbors in a list - python

I have a list. For example:
[0, 0, 1, 0, 0, 1, 0]
I'd like to know the most efficient way to count the 1 -> 0 transitions. In this example the answer is 2 (at positions 2-3 and 5-6, counting from 0).
I tried the following:
stat = [0, 0, 1, 0, 0, 1, 0]
pair1 = stat[:-1]
pair2 = stat[1:]
result = len([i for i in zip(pair1, pair2) if i == (1, 0)])
I'm wondering if there is a better way.

Here are 3 ways:
from itertools import islice
import numpy as np
lst = [0, 0, 1, 0, 0, 1, 0]
res1 = sum(i - j == 1 for i, j in zip(lst, lst[1:])) # 2
res2 = sum(i - j == 1 for i, j in zip(lst, islice(lst, 1, None))) # 2
res3 = np.sum(np.diff(lst) == -1) # 2
Explanation
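The first method uses sum with a generator expression and zip to iterate over pairs of adjacent elements.
The second method is similar to the first, but uses itertools.islice instead of the slice lst[1:], so it avoids copying the list; this mainly saves memory on large inputs.
The third method uses the third-party numpy library and is a vectorised approach: np.diff returns the differences between consecutive elements, and a 1 -> 0 step shows up as -1.
Note that all three count every drop of exactly 1, which matches the 1 -> 0 transitions only because the list contains nothing but 0s and 1s.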
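On Python 3.10 or newer, itertools.pairwise yields the overlapping pairs directly, so neither a slice nor islice is needed; for older versions the sketch below falls back to the pairwise recipe from the itertools documentation. It compares each pair with (1, 0) explicitly instead of relying on the difference being 1, so it stays correct if the list ever holds other values. The function name count_transitions and its start/end parameters are illustrative choices, not part of any answer above.

```python
from itertools import tee

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    def pairwise(iterable):
        """Fallback based on the itertools recipe: s -> (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def count_transitions(values, start=1, end=0):
    """Count adjacent pairs (a, b) with a == start and b == end."""
    return sum(1 for a, b in pairwise(values) if a == start and b == end)


print(count_transitions([0, 0, 1, 0, 0, 1, 0]))  # 2
print(count_transitions([2, 1, 1, 0]))  # 1: the 2 -> 1 drop is not counted
```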

Transforming your input data with slices and zips and cuts and folds is one way to approach it. And it's awesome to see how these generic actions can be combined to build a machine that represents our intended action, even if it arrives at the desired result in a roundabout way.
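However, I think a more direct approach yields a more natural program. You can express your intention using natural descriptors and operations. Another benefit is that you can more clearly visualize the space-time requirements of the process your function creates. For example, it's easy to see that switches below makes one recursive call per element of the input; by comparison, it's very hard to estimate the space-time requirements of the "machine" implementation. (One caveat: unpacking into first and *rest copies the remaining list on every call, so these recursive versions actually do O(n²) work in total, and Python's default recursion limit of about 1000 frames caps the input length.)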
A simple recursive function
def switches(seq, last=0):
    if not seq:
        return 0
    first, *rest = seq
    if first == last:
        return switches(rest, last)
    else:
        return 1 + switches(rest, first)

print(switches([0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))
# 4 :(
Above, the answer is 4 because the function counts both switches from 0 to 1 and switches from 1 to 0. You only want to count switches in one direction, so we could modify our function like this:
def switches(seq, last=0):
    if not seq:
        return 0
    first, *rest = seq
    if first == last:
        return switches(rest, last)
    else:
        if last == 1:  # only count when switching from 1 (to 0)
            return 1 + switches(rest, first)
        else:
            return 0 + switches(rest, first)

print(switches([0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))
# 2 :)
But since last is 1 exactly when we switch from 1 to 0 (and 0 when we switch from 0 to 1), there's a clever way to condense the conditional: just add last.
def switches(seq, last=0):
    if not seq:
        return 0
    first, *rest = seq
    if first == last:
        return switches(rest, last)
    else:
        return last + switches(rest, first)

print(switches([0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))
# 2 :)
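Because each call of the recursive versions copies the rest of the list and adds a stack frame, long inputs will hit Python's recursion limit. Here is a minimal iterative sketch of the same idea, reusing the name switches and assuming, as above, that the list holds only 0s and 1s. It tracks last in a loop instead of passing it down:

```python
def switches(seq, last=0):
    """Count 1 -> 0 switches in a sequence of 0s and 1s, iteratively."""
    count = 0
    for first in seq:
        if first != last:
            count += last  # last is 1 only when switching from 1 to 0
            last = first
    return count


print(switches([0, 0, 1, 1, 0, 0, 1, 1, 1, 0]))  # 2
print(switches([1, 0] * 5000))  # 5000, with no RecursionError
```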

You can use sum over the indices of adjacent pairs:
s = [0, 0, 1, 0, 0, 1, 0]
new_s = sum(s[i] - s[i + 1] == 1 for i in range(len(s) - 1))
Output:
2

Related

How to find the maximum probability of satisfying the conditions in all combinations of arrays

For example, I have a list of tokens, and each token's number of characters (length) is
length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
and here is each token's probability of [not inserting a linefeed, inserting a linefeed] after it:
prob = [[9.9978e-01, 2.2339e-04], [9.9995e-01, 4.9344e-05], [0.9469, 0.0531],
[9.9994e-01, 5.8422e-05], [0.9964, 0.0036], [9.9991e-01, 9.4295e-05],
[9.9980e-01, 1.9620e-04], [1.0000e+00, 5.2492e-08], [9.9998e-01, 1.8293e-05],
[9.9999e-01, 5.1220e-06], [1.0000e+00, 3.9795e-06], [0.0142, 0.9858]]
and taking the more likely option for each token gives
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
which means inserting a linefeed after the last token.
The whole length of this line is 21, and I would like to have a maximum of 20 characters per line.
In that case, I have to insert at least one more linefeed (one in this example, possibly more in other situations) so that every line has at most 20 characters.
In this example, the best answer is
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
because, apart from the last token, the 3rd token has the highest probability of being followed by a linefeed.
My idea is to enumerate all combinations of these choices and multiply (not add) their probabilities. I have 12 tokens in this example, and each token has its own 0-1 classification probability, so there are 2^12 combinations. I use a binary sequence to record each combination (since it's a 0-1 classification problem) and store them in a dictionary that maps each binary sequence to the product of its probabilities:
n = len(prob_task2)  # number of tokens
dic = {}
for mask in range(2 ** n):
    # n-digit binary string, e.g. '000000000001'; '0' = no linefeed, '1' = linefeed
    str1 = format(mask, f'0{n}b')
    probb = 1
    for k, x in enumerate(str1):
        if x == '0':  # prob_task2[k] is [p_no_linefeed, p_linefeed]
            probb *= prob_task2[k][0]
        else:
            probb *= prob_task2[k][1]
    dic[str1] = probb
Then I sort all the combinations and search them from the most probable to the least probable.
I use two loops to build all the combinations, and another two loops to search them from the top down for one that meets the character limit. But this is inefficient: with 40 tokens, I would have to evaluate 2^40 combinations.
I am not good at algorithms, so I'd like to ask whether there is an efficient way to solve this problem.
To rephrase, you have a list of tokens of given lengths, each with an
independent probability of being followed by a line break, and you want
to find the maximum likelihood outcome whose longest line doesn’t exceed
the given max.
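There is an efficient dynamic program (O(n L) where n is the number of
tokens and L is the line length). The idea is that we can prevent the
search tree from blowing up exponentially by pruning the less likely
possibilities that have the same current line length. This pruning is safe
because which continuations are still allowed, and what they add to the
likelihood, depend only on the current line length, not on where the earlier
breaks fell. Summing log2 probabilities instead of multiplying the
probabilities also avoids floating-point underflow on long inputs. In Python: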
import collections
import math

length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
prob = [
    [9.9978e-1, 2.2339e-4],
    [9.9995e-1, 4.9344e-5],
    [0.9469, 0.0531],
    [9.9994e-1, 5.8422e-5],
    [0.9964, 0.0036],
    [9.9991e-1, 9.4295e-5],
    [9.998e-1, 1.962e-4],
    [1.0e0, 5.2492e-8],
    [9.9998e-1, 1.8293e-5],
    [9.9999e-1, 5.122e-6],
    [1.0e0, 3.9795e-6],
    [0.0142, 0.9858],
]
max_line_length = 20

# Map each possible current line length to the best (log-likelihood, breaks) so far.
line_length_to_best = {length[0]: (0, [])}
for i, (p_no_break, p_break) in enumerate(prob[:-1]):
    line_length_to_options = collections.defaultdict(list)
    for line_length, (likelihood, breaks) in line_length_to_best.items():
        length_without_break = line_length + length[i + 1]
        if length_without_break <= max_line_length:
            line_length_to_options[length_without_break].append(
                (likelihood + math.log2(p_no_break), breaks + [0])
            )
        line_length_to_options[length[i + 1]].append(
            (likelihood + math.log2(p_break), breaks + [1])
        )
    # Prune: keep only the most likely option for each line length.
    line_length_to_best = {
        line_length: max(options)
        for (line_length, options) in line_length_to_options.items()
    }
_, breaks = max(line_length_to_best.values())
print(breaks + [1])
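To gain some confidence in the pruning, one option is to wrap the same dynamic program in a function and compare it with a brute-force enumeration (the approach from the question) on the small example. In the sketch below, best_breaks, brute_force, fits and log_likelihood are illustrative names. Like the code above, it always puts a linefeed after the last token and does not score that final break.

```python
import itertools
import math


def log_likelihood(breaks, prob):
    """Sum of log2 probabilities of the choices after every token but the last."""
    return sum(math.log2(prob[k][b]) for k, b in enumerate(breaks[:-1]))


def fits(breaks, length, max_line_length):
    """True if no line is longer than max_line_length."""
    line = 0
    for token_length, b in zip(length, breaks):
        line += token_length
        if line > max_line_length:
            return False
        if b:
            line = 0
    return True


def best_breaks(length, prob, max_line_length):
    """Dynamic program: keep the best history for each current line length."""
    best = {length[0]: (0.0, [])}
    for i, (p_no_break, p_break) in enumerate(prob[:-1]):
        options = {}
        for line_length, (ll, breaks) in best.items():
            candidates = [(length[i + 1], ll + math.log2(p_break), breaks + [1])]
            joined = line_length + length[i + 1]
            if joined <= max_line_length:
                candidates.append((joined, ll + math.log2(p_no_break), breaks + [0]))
            for new_len, new_ll, new_breaks in candidates:
                if new_len not in options or new_ll > options[new_len][0]:
                    options[new_len] = (new_ll, new_breaks)
        best = options
    _, breaks = max(best.values(), key=lambda entry: entry[0])
    return breaks + [1]  # a linefeed always follows the last token


def brute_force(length, prob, max_line_length):
    """Try all 2**(n-1) break patterns; only feasible for small n."""
    candidates = (
        list(head) + [1] for head in itertools.product([0, 1], repeat=len(length) - 1)
    )
    feasible = [b for b in candidates if fits(b, length, max_line_length)]
    return max(feasible, key=lambda b: log_likelihood(b, prob))


length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
prob = [
    [9.9978e-1, 2.2339e-4], [9.9995e-1, 4.9344e-5], [0.9469, 0.0531],
    [9.9994e-1, 5.8422e-5], [0.9964, 0.0036], [9.9991e-1, 9.4295e-5],
    [9.998e-1, 1.962e-4], [1.0e0, 5.2492e-8], [9.9998e-1, 1.8293e-5],
    [9.9999e-1, 5.122e-6], [1.0e0, 3.9795e-6], [0.0142, 0.9858],
]
print(best_breaks(length, prob, 20))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

The brute force only serves as a check here. It grows as 2^n, while the dynamic program keeps at most one entry per possible line length at each step.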

Count specific value in pandas rolling window

I have a dataframe with thousands of rows. One column contains only 3 values: -1, 0, 1. I would like to count, in a rolling window (say of size 100), how many times a specific value (say 0) occurs.
How can I do it? I don't see such a method on the Rolling object, and I don't know how to do it with apply.
It's pretty simple; here is a quick demo that should give you the idea.
Example
# Parameters
# iterable - column
# size - window size (100)
def window(iterable, size=2):
    i = iter(iterable)
    win = []
    for e in range(0, size):
        win.append(next(i))
    yield win
    for e in i:
        win = win[1:] + [e]
        yield win

# Sample data
a = [1, 0, 0, 0, 1, 1]

from collections import Counter

result = []
value = 1  # Value to keep count (-1, 0, 1)
for i in window(a, 2):
    count = Counter(i)[value]
    result.append(count)

# Sample output
print(result)
[1, 0, 0, 1, 2]
This should help; I tested it and it works:
def cnt(x):
    prev_count = 0
    for i in x:
        if i == 0:
            prev_count += 1
    return prev_count

df['col'].rolling(100, min_periods=1).apply(cnt)
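Both answers recount the whole window at every step. Here is a minimal pure-Python sketch that keeps a running count instead, so each step does constant work. It assumes the column can be iterated as a plain sequence, and rolling_count is a hypothetical helper name. Like rolling(..., min_periods=1), it also reports the shorter windows at the start. Within pandas itself, df['col'].eq(0).astype(int).rolling(100, min_periods=1).sum() should give the same counts (as floats) without calling a Python function per window.

```python
from collections import deque


def rolling_count(values, target, size):
    """Yield how many of the last `size` values equal `target`, one result per position."""
    window = deque()
    count = 0
    for v in values:
        window.append(v)
        if v == target:
            count += 1
        if len(window) > size:
            # drop the value that just left the window
            if window.popleft() == target:
                count -= 1
        yield count


print(list(rolling_count([1, 0, 0, 0, 1, 1], target=1, size=2)))
# [1, 1, 0, 0, 1, 2]
```

Dropping the first size - 1 results reproduces the full-window output [1, 0, 0, 1, 2] of the window demo.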

Replace 0 with 01 and 1 with 10 in a List[int]

I am trying to solve the LeetCode problem K-th Symbol in Grammar:
On the first row, we write a 0. Now in every subsequent row, we look at the previous row and replace each occurrence of 0 with 01, and each occurrence of 1 with 10.
Explanation:
row 1: 0
row 2: 01
row 3: 0110
row 4: 01101001
I wrote this replace function:
def replace(row: "List[int]") -> "List[int]":
    """
    rtype: row
    """
    for i in range(len(row)):
        if row[i] == 0:  # 0 -> 01
            row.insert(i+1, 1)
        elif row[i] == 1:  # 1 -> 10
            row.insert(i+1, 0)
    return row
However, it does not work properly.
In [3]: r2 = replace([0])
In [4]: r2
Out[4]: [0, 1]
In [5]: r3 = replace(r2); print(r3)
[0, 1, 0, 1] # correct is [0, 1, 1, 0]
In [6]: r4 = replace(r3); print(r4)
[0, 1, 0, 1, 0, 1, 0, 1]  # correct is [0, 1, 1, 0, 1, 0, 0, 1]
Using a new list doesn't work either:
def replace(row: "List[int]") -> "List[int]":
    """
    rtype: row
    """
    copy = list(row)
    for i in range(len(copy)):
        if copy[i] == 0:  # 0 -> 01
            row.insert(i+1, 1)
        elif copy[i] == 1:  # 1 -> 10
            row.insert(i+1, 0)
    return row
What's the problem?
Use str.join with a dict:
a = '0'
d = {'0': '01', '1': '10'}
# Run the line below repeatedly; each run turns `a` into the next row
a = ''.join(d[s] for s in a)
a
Output:
01
0110
01101001
...
The problem with your code is that you are inserting elements into a list inside a loop without taking into account that the list gets longer.
This can be solved by adding a simple "index shifter":
def replace(row: "List[int]") -> "List[int]":
    for en, i in enumerate(range(len(row))):
        if row[i] == 0:  # 0 -> 01
            row.insert(i+1 + en, 1)
        elif row[i] == 1:  # 1 -> 10
            row.insert(i+1 + en, 0)
    return row
replace([0, 1, 1, 0])
>>> [0, 1, 1, 0, 1, 0, 0, 1]
Some explanation:
I gave this answer because you asked what was wrong with your code, and I assumed you wanted (or were asked) to solve this task with this approach.
That said, I'd definitely go with one of the other approaches that many have proposed. Anyway:
if you really want to understand why it works, print row and row[i] inside the for loop
you'll probably find that this approach works only for this specific "K-th symbol" task (try, for example, replace([0, 0])), which is quite mind-blowing
P.S. If you were wondering about the en variable, it is used here only for "didactic" purposes: in this loop, en == i always holds.
Update:
I had a look at the link you provided, and the task you were asked for was:
Given row N and index K, return the K-th indexed symbol in row N. (The values of K are 1-indexed).
with:
row 1: 0
row 2: 01
row 3: 0110
row 4: 01101001
# etc.
and the following constraints:
N will be an integer in the range [1, 30]
K will be an integer in the range [1, 2^(N-1)]
So I don't think any of the proposed solutions will work with a list of 2^29 elements.
In order to solve the quiz, we have to go a bit deeper.
Basically, when we replace 0 with 01 and 1 with 10, starting from 0, the n-th row is just the (n-1)-th row extended with its own inverse. In fact:
def replace(row):
    return row + list(map(lambda x: int(not x), row))
replace([0, 1, 1, 0])
>>> [0, 1, 1, 0, 1, 0, 0, 1]
produces the correct output, and that's why your corrected algorithm works.
But this still wouldn't let you handle the N = 30 constraint (a row of 2^29 elements).
As the link hints, a recursive approach works best.
Here I give a possible solution with no explanation, just as a curiosity:
def solution(n, k):
    if n == 1:
        return 0
    half = 2 ** (n - 2)  # row n has 2**(n-1) symbols; its first half is row n-1
    if k <= half:
        return solution(n - 1, k)
    else:
        return 1 - solution(n - 1, k - half)
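Here is a related observation, offered as an alternative rather than as the approach the link suggests. Each step of the recursion into the second half of a row flips the symbol, and those steps correspond exactly to the 1 bits of K - 1. So the K-th symbol is the parity of the number of 1 bits in K - 1 (this is the Thue-Morse sequence). The sketch below computes that directly and checks it against rows built by repeated replacement; kth_symbol and build_row are illustrative names.

```python
def kth_symbol(n, k):
    """Symbol at 1-indexed position k of row n (n is only used as a bound check)."""
    if not 1 <= k <= 2 ** (n - 1):
        raise ValueError("k must be in [1, 2**(n-1)]")
    return bin(k - 1).count("1") % 2


def build_row(n):
    """Build row n by repeated replacement; only practical for small n."""
    row = [0]
    for _ in range(n - 1):
        row = [bit for x in row for bit in ((0, 1) if x == 0 else (1, 0))]
    return row


print(build_row(4))  # [0, 1, 1, 0, 1, 0, 0, 1]
print(kth_symbol(30, 2 ** 29))  # 1, computed without building the row
```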
Just as an explanation of how to use another list to write a result while iterating over a list:
def replace(row: "List[int]") -> "List[int]":
    ret = []
    for i in range(len(row)):
        if row[i] == 0:  # 0 -> 01
            ret += [0, 1]
        elif row[i] == 1:  # 1 -> 10
            ret += [1, 0]
    return ret
You shouldn't modify an iterable while iterating over it unless you take the length changes into account. Most of the time it is easier to create another list and fill it little by little.
And the result is:
replace([0,1,1,0])
>>> [0, 1, 1, 0, 1, 0, 0, 1]
def replace(row: "List[int]") -> "List[int]":
    """
    rtype: row
    """
    outList = []
    for val in row:
        if val == 0:  # 0 -> 01
            outList.append(0)
            outList.append(1)
        elif val == 1:  # 1 -> 10
            outList.append(1)
            outList.append(0)
    return outList
Your version modified the list while reading it, but its range was fixed to the length of the original list. For [0, 1] it looked at only two positions: it inserted a 1 after the first 0, then at position 1 it saw that newly inserted 1 (not the original one) and inserted a 0 after it, giving [0, 1, 0, 1].

Creating a function that changes values in a list based on probability

I'm new to Python (and to programming, for that matter). I'm looking to write a function with two parameters: a list of bits and an error probability.
For each element in the list of bits, there is a chance (the error probability) that it is flipped from 0 to 1 or vice versa. The function should return the new list together with the number of bits that were actually flipped.
I've been experimenting for about an hour and a half and couldn't really come up with anything. I know we're supposed to use random.random and a for loop inside the function, but nothing has worked.
The result should look something like this:
>>>x
[0,0,1,1,0,0,0,0]
>>>(NewList,FlipTimes)=TheFunction(x,0.2)
>>>NewList
[0,0,1,0,1,0,0,1]
>>>FlipTimes
3
Again, I'm very new to programming, so my attempt here is pretty futile.
def addNoise(a,b):
    c=0
    for x in a:
        y=random.random
        if y<b:
            if x==1:
                x=0
            else:
                x=1
        for i in x:
            if y<b== True:
                c+=1
    return(x,c)
import random

def flipbit(x, prob):
    count = 0
    out = []
    for e in x:
        if random.random() < prob:  # true with probability prob
            count += 1
            out.append(int(not e))  # flip 0 <-> 1
        else:
            out.append(e)
    return out, count

x = [0, 0, 1, 0, 1, 0, 0, 1]
new_list, flip_times = flipbit(x, 0.2)
print('original:', x)
print('new list:', new_list)
print('flip times:', flip_times)
# Sample output (random, so it varies between runs):
# original: [0, 0, 1, 0, 1, 0, 0, 1]
# new list: [0, 0, 1, 0, 1, 1, 1, 1]
# flip times: 2
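Because flipbit draws from the global random module, its output changes from run to run. If you need reproducible results (for example in tests), one option is to pass in a random.Random instance. flipbit_seeded below is a hypothetical variant of flipbit, not part of the original answer. It counts the flips by comparing the input and output lists.

```python
import random


def flipbit_seeded(bits, prob, rng=None):
    """Flip each bit with probability prob; rng is an optional random.Random."""
    if rng is None:
        rng = random.Random()
    out = [1 - b if rng.random() < prob else b for b in bits]
    flips = sum(a != b for a, b in zip(bits, out))
    return out, flips


x = [0, 0, 1, 1, 0, 0, 0, 0]
print(flipbit_seeded(x, 0.2, random.Random(42)))
print(flipbit_seeded(x, 0.2, random.Random(42)))  # same seed, same result
```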

How to shuffle an array of numbers without two consecutive elements repeating?

I'm currently trying to get an array of numbers like this one randomly shuffled:
label_array = np.repeat(np.arange(6), 12)
The only constraint is that no two consecutive elements of the shuffled array may be the same number. For that I'm currently using this code:
# Check if there are any occurrences of two consecutive
# elements being of the same category (same number)
num_occurrences = np.sum(np.diff(label_array) == 0)
# While there are any occurrences of this...
while num_occurrences != 0:
    # ...shuffle the array...
    np.random.shuffle(label_array)
    # ...create a flag for occurrences...
    flag = np.hstack(([False], np.diff(label_array) == 0))
    flag_array = label_array[flag]
    # ...and shuffle them.
    np.random.shuffle(flag_array)
    # Then re-assign them to the original array...
    label_array[flag] = flag_array
    # ...and check the number of occurrences again.
    num_occurrences = np.sum(np.diff(label_array) == 0)
Although this works for an array of this size, I don't know whether it would work for much bigger arrays, and even if it did, it might take a long time.
So, is there a better way of doing this?
This may not technically be the best answer, but hopefully it suffices for your requirements.
import numpy as np

def generate_random_array(block_length, block_count):
    randoms_array = []
    for _ in range(block_count):
        nums = np.arange(block_length)
        np.random.shuffle(nums)
        # avoid a repeat at the boundary between two blocks
        if randoms_array and nums[0] == randoms_array[-1]:
            nums[0], nums[-1] = nums[-1], nums[0]
        randoms_array.extend(nums)
    return randoms_array

generate_random_array(block_length=1000, block_count=1000)
Here is a way to do it, for Python >= 3.6, using random.choices, which lets you choose from a population with weights.
The idea is to generate the numbers one by one. Each time we generate a new number, we exclude the previous one by temporarily setting its weight to zero. Then, we decrement the weight of the chosen one.
As @roganjosh duly noted, we have a problem at the end when we are left with more than one instance of the last value, and that can happen quite often, especially with a small number of values and a large number of repeats.
The solution I used is to insert these values back into the list where they don't create a conflict, using the short send_back function.
import random

def send_back(value, number, lst):
    idx = len(lst)-2
    for _ in range(number):
        while lst[idx] == value or lst[idx-1] == value:
            idx -= 1
        lst.insert(idx, value)

def shuffle_without_doubles(nb_values, repeats):
    population = list(range(nb_values))
    weights = [repeats] * nb_values
    out = []
    prev = None
    for i in range(nb_values * repeats):
        if prev is not None:
            # remove prev from the list of possible choices
            # by turning its weight temporarily to zero
            old_weight = weights[prev]
            weights[prev] = 0

        if not any(weights):
            # All weights are 0, which means that all that is left
            # to choose from is old_weight times the previous value.
            # (Checking explicitly works on every Python version; how
            # random.choices reacts to all-zero weights has changed.)
            send_back(prev, old_weight, out)
            break
        chosen = random.choices(population, weights)[0]

        out.append(chosen)
        weights[chosen] -= 1
        if prev is not None:
            # restore weight
            weights[prev] = old_weight
        prev = chosen
    return out
print(shuffle_without_doubles(6, 12))
[5, 1, 3, 4, 3, 2, 1, 5, 3, 5, 2, 0, 5, 4, 3, 4, 5,
3, 4, 0, 4, 1, 0, 1, 5, 3, 0, 2, 3, 4, 1, 2, 4, 1,
0, 2, 0, 2, 5, 0, 2, 1, 0, 5, 2, 0, 5, 0, 3, 2, 1,
2, 1, 5, 1, 3, 5, 4, 2, 4, 0, 4, 2, 4, 0, 1, 3, 4,
5, 3, 1, 3]
Some crude timing: shuffle_without_doubles(600, 1200), i.e. 720,000 values, takes about 30 seconds.
I came here from Creating a list without back-to-back repetitions from multiple repeating elements (referred to as "problem A") while organising my notes; there was no correct answer under "problem A" or under the current question. The two problems also seem different, because problem A requires the same elements.
Basically, what you asked is the same as an algorithm problem (link), except that randomness is not required there. But when almost half of all the numbers are the same, the result can only look like "ABACADAEA...", where "ABCDE" are numbers. The most voted answer to that problem uses a priority queue, so the time complexity is O(n log m), where n is the length of the output and m is the number of distinct values.
For this problem, an easier way is to use itertools.permutations and randomly pick permutations so that each one starts with a different value than the previous one ended with, which makes the result look "random".
I wrote some draft code here, and it works.
from itertools import permutations
from random import choice

def no_dup_shuffle(ele_count: int, repeat: int):
    """
    Return a shuffle of `ele_count` elements repeating `repeat` times.
    """
    # Materialise the permutations once: calling choice(list(p)) on the
    # iterator would exhaust it after the first pick.
    # This holds ele_count! tuples, so keep ele_count small.
    perms = list(permutations(range(ele_count)))
    res = []
    last = -1  # dummy value, so the first permutation is always accepted
    for _ in range(repeat):
        curr = choice(perms)
        while curr[0] == last:  # avoid a repeat at the block boundary
            curr = choice(perms)
        res.extend(curr)
        last = curr[-1]
    return res

def test_no_dup_shuffle(count, rep):
    r = no_dup_shuffle(count, rep)
    assert len(r) == count * rep  # check result length
    assert len(set(r)) == count  # check all elements are used and in `range(count)`
    for a, b in zip(r, r[1:]):  # check no adjacent duplicates
        assert a != b
    print(r)

if __name__ == "__main__":
    test_no_dup_shuffle(5, 3)
    test_no_dup_shuffle(3, 17)
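For comparison, the priority-queue idea mentioned in the last answer can be sketched as follows. This is an illustrative version with a hypothetical name, rearrange_no_adjacent. It greedily takes the value with the most remaining copies that differs from the one just placed, which costs O(log m) heap work per element. Ties between equal counts are broken with a random key, so results vary, but they are not uniformly random over all valid arrangements.

```python
import heapq
import random
from collections import Counter


def rearrange_no_adjacent(items, rng=None):
    """Greedy max-heap rearrangement with no two equal neighbours.

    Raises ValueError if no such arrangement exists.
    """
    if rng is None:
        rng = random.Random()
    # heapq is a min-heap, so store negated counts; the random number breaks ties
    heap = [(-count, rng.random(), value) for value, count in Counter(items).items()]
    heapq.heapify(heap)
    out = []
    held = None  # the value just placed, kept out of the heap for one step
    while heap:
        neg_count, _, value = heapq.heappop(heap)
        out.append(value)
        if held is not None:
            heapq.heappush(heap, held)
        # hold back the value we just used, if any copies remain
        held = (neg_count + 1, rng.random(), value) if neg_count + 1 < 0 else None
    if held is not None:
        raise ValueError("no arrangement without adjacent duplicates exists")
    return out


labels = [n for n in range(6) for _ in range(12)]
print(rearrange_no_adjacent(labels))
```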
