Python: forward fill the data in a list - python

I have a list named x and would like to replace each zero with the previous value, i.e.:
x = [x[t]=x[t-1] if x[t] == 0.0 for t in range(1,len(x)-2)]
But this raises: SyntaxError: invalid syntax
What is wrong with my code? Thanks a lot.

The problem is the assignment x[t] = x[t-1]: assignment is a statement, and a list comprehension can only contain expressions. Instead, just use a for loop:
for t in range(1, len(x)):
    if x[t] == 0:
        x[t] = x[t-1]
Although it would probably be considered more Pythonic to use enumerate to do this:
for idx, val in enumerate(x):
    if idx == 0: continue # skip the first element
    if val == 0:
        x[idx] = x[idx-1]
# DEMO
In [1]: x = [1,0,3,0,4,0,5,0]
In [2]: for idx,val in enumerate(x):
   ...:     if idx==0: continue
   ...:     if val == 0:
   ...:         x[idx] = x[idx-1]
   ...:
In [3]: x
Out[3]: [1, 1, 3, 3, 4, 4, 5, 5]
You could also make this work with a list comprehension by implementing a pairwise iterator:
from itertools import tee
def pairwise(iterable):
    a, b = tee(iterable)
    next(b) # advance one iterator
    return zip(a, b)
x = [x[0]] + [val if val else lastval for lastval, val in pairwise(x)]
We need to add the first element explicitly because the pairwise iterator skips it. Alternatively, we could define pairwise differently, e.g.
import itertools
def pairwise(iterable):
    iterable = itertools.chain([None], iterable)
    a, b = itertools.tee(iterable)
    next(b)
    return zip(a, b)
x = [val if val else lastval for lastval, val in pairwise(x)]
# ta-da!

Here's a list comprehension to do what you require:
x = [xi if xi or i==0 else x[i-1]
     for i, xi in enumerate(x)]
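The comprehension-based versions above read the previous element of the original list, so in a run of consecutive zeros only the first zero gets filled. A minimal sketch of a forward fill that carries the last non-zero value along, assuming itertools.accumulate with a two-argument function is acceptable here (the name forward_fill is made up for this illustration):

```python
from itertools import accumulate


def forward_fill(values):
    """Replace each zero with the most recent non-zero value seen so far."""
    # accumulate passes (previous result, current item) to the function:
    # keep the current item unless it is zero, else reuse the previous result.
    return list(accumulate(values, lambda prev, cur: cur if cur != 0 else prev))


print(forward_fill([1, 0, 0, 3, 0]))  # [1, 1, 1, 3, 3]
```

Leading zeros are left untouched, since there is no earlier value to copy.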

For full forward and backward filling (backward only when nothing is found before), the following will give you a filled element even if the element before it is zero:
#     ∨        ∨        ∨  ∨  ∨     ∨
x = [ 0, 1, 2, 0, 3, 5, 0, 0, 0, 9, 0 ]
y = []
for i,e in enumerate(x):
    if e == 0:
        # search left, get first non zero
        for left_e in reversed(x[:i]):
            if left_e != 0:
                e = left_e
                break
        # backward fill if all elements on the left are zeros
        if e == 0:
            # search right, get first non zero
            for right_e in x[i+1:]:
                if right_e != 0:
                    e = right_e
                    break
    y.append(e)
print(y)
# [1, 1, 2, 2, 3, 5, 5, 5, 5, 9, 9]
If you only need forward filling that looks at the single previous element, Alex's answers suffice.
You can also do the same thing more simply with next():
x = [ 0, 1, 2, 0, 3, 5, 0, 0, 0, 9, 0 ]
y = []
for i,e in enumerate(x):
    if e == 0:
        e = next((item for item in x[i:] if item != 0), e)
        e = next((item for item in reversed(x[:i]) if item != 0), e)
    y.append(e)
print(y)
# [1, 1, 2, 2, 3, 5, 5, 5, 5, 9, 9]
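Both versions above rescan part of the list for every zero. A possible linear-time sketch of the same forward-then-backward fill, written as two passes (the function name fill_zeros is hypothetical, not from the answers above):

```python
def fill_zeros(values):
    """Forward-fill zeros; back-fill any leading zeros from the first non-zero."""
    filled = list(values)  # work on a copy, leave the input untouched

    # Pass 1: forward fill, remembering the last non-zero value seen.
    last = 0
    for i, value in enumerate(filled):
        if value != 0:
            last = value
        else:
            filled[i] = last

    # Pass 2: any zeros still present are leading zeros; back-fill them.
    first_nonzero = next((v for v in filled if v != 0), 0)
    for i, value in enumerate(filled):
        if value != 0:
            break
        filled[i] = first_nonzero
    return filled


print(fill_zeros([0, 1, 2, 0, 3, 5, 0, 0, 0, 9, 0]))
# [1, 1, 2, 2, 3, 5, 5, 5, 5, 9, 9]
```

A list made only of zeros is returned unchanged, because there is nothing to fill from.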

Related

Finding indices of first non-zero items in a list

I have the following list :
list_test = [0,0,0,1,0,2,5,4,0,0,5,5,3,0,0]
I would like to find the indices of all the first numbers in the list that are not equal to zero.
In this case the output should be:
output = [3,5,10]
Is there a Pythonic way to do this?
Judging by the output, I think you want the first index of each contiguous run of non-zero values.
As for "Pythonic", I take that to mean a list comprehension, although it is not very readable.
# works with starting with non-zero element.
# list_test = [1, 0, 0, 1, 0, 2, 5, 4, 0, 0, 5, 5, 3, 0, 0]
list_test = [0, 0, 0, 1, 0, 2, 5, 4, 0, 0, 5, 5, 3, 0, 0]
output = [i for i in range(len(list_test)) if list_test[i] != 0 and (i == 0 or list_test[i - 1] == 0)]
print(output)
There is also a numpy based solution:
import numpy as np
l = np.array([0,0,0,1,0,2,5,4,0,0,5,5,3,0,0])
non_zeros = np.where(l != 0)[0]
diff = np.diff(non_zeros)
np.append(non_zeros[0], non_zeros[1 + np.where(diff >= 2)[0]]) # array([ 3, 5, 10], dtype=int64)
Explanation:
First we find the positions of the non-zero elements, then we compute the differences between consecutive positions. A difference of 2 or more means at least one zero sits between the two positions, so the later position starts a new run; we add 1 to the index because np.diff computes out[i] = a[i+1] - a[i]. Finally, we prepend the first non-zero position, which always starts a run.
Note:
This also works when the array starts with a non-zero element or contains no zeros at all.
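From the Link.
l = [0,0,0,1,0,2,5,4,0,0,5,5,3,0,0]
v = {}
for i, x in enumerate(l):
    if x != 0 and x not in v:
        v[x] = i
# v maps each distinct non-zero value to the index of its first occurrence
list_test = [0,0,0,1,0,2,5,4,0,0,5,5,3,0,0]
res = {}
for index, item in enumerate(list_test):
    if item > 0:
        res.setdefault(index, None)
print(res.keys())
# prints the index of every positive item, not only the first of each run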
I don't know what you mean by "Pythonic", but here is an answer using a simple loop:
list_test = [0,0,0,1,0,2,5,4,0,0,5,5,3,0,0]
out = []
if list_test[0] != 0:
    out.append(0)
for i in range(1, len(list_test)):
    if (list_test[i-1] == 0) and (list_test[i] != 0):
        out.append(i)
Feel free to clarify what you mean by "Pythonic"!
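For comparison, here is a sketch of the same idea with itertools.groupby, grouping the list into alternating zero and non-zero runs and recording where each non-zero run begins. The helper name run_starts is an assumption made for this example:

```python
from itertools import groupby


def run_starts(values):
    """Return the start index of every contiguous run of non-zero values."""
    starts = []
    position = 0  # index of the first element of the current group
    for is_nonzero, group in groupby(values, key=lambda v: v != 0):
        length = sum(1 for _ in group)
        if is_nonzero:
            starts.append(position)
        position += length
    return starts


list_test = [0, 0, 0, 1, 0, 2, 5, 4, 0, 0, 5, 5, 3, 0, 0]
print(run_starts(list_test))  # [3, 5, 10]
```

This also handles a list that starts with a non-zero element, and it returns an empty list for empty input.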

Delete occurrences of an element if it occurs more than n times in Python

How can I fix my code to pass the test cases for "Delete occurrences of an element if it occurs more than n times"?
My current code passes one test case, and I'm sure the problem is caused by order.remove(check_list[i]).
However, I can't delete a specific occurrence with pop(), because pop() requires an index rather than the element.
Test case
Test.assert_equals(delete_nth([20,37,20,21], 1), [20,37,21])
Test.assert_equals(delete_nth([1,1,3,3,7,2,2,2,2], 3), [1, 1, 3, 3, 7, 2, 2, 2])
Program
def delete_nth(order, max_e):
    # code here
    check_list = [x for x in dict.fromkeys(order) if order.count(x) > 1]
    print(check_list)
    print(order)
    for i in range(len(check_list)):
        while(order.count(check_list[i]) > max_e):
            order.remove(check_list[i])
            #order.pop(index)
    return order
Your assertions fail because the order is not preserved. Here is a simple example of how this could be done without redundant inner loops that count the occurrences of each number:
def delete_nth(order, max_e):
    # Get a new list that we will return
    result = []
    # Get a dictionary to count the occurrences
    occurrences = {}
    # Loop through all provided numbers
    for n in order:
        # Get the count of the current number, or assign it to 0
        count = occurrences.setdefault(n, 0)
        # If we reached the max occurrences for that number, skip it
        if count >= max_e:
            continue
        # Add the current number to the list
        result.append(n)
        # Increase the count for that number
        occurrences[n] += 1
    # We are done, return the list
    return result
assert delete_nth([20,37,20,21], 1) == [20, 37, 21]
assert delete_nth([1, 1, 1, 1], 2) == [1, 1]
assert delete_nth([1, 1, 3, 3, 7, 2, 2, 2, 2], 3) == [1, 1, 3, 3, 7, 2, 2, 2]
assert delete_nth([1, 1, 2, 2], 1) == [1, 2]
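To make the ordering problem concrete: list.remove() always drops the first matching element, whereas the task wants to keep the first max_e occurrences and drop the later ones. The sketch below contrasts the two behaviours; both function names are invented for this illustration, and the second is just a collections.Counter variant of the approach above:

```python
from collections import Counter


def delete_nth_with_remove(order, max_e):
    """Trim extras with list.remove(), which drops the EARLIEST occurrence."""
    order = list(order)  # work on a copy
    for value in dict.fromkeys(order):
        while order.count(value) > max_e:
            order.remove(value)
    return order


def delete_nth_keep_first(order, max_e):
    """Keep only the first max_e occurrences of each value, in order."""
    seen = Counter()
    result = []
    for value in order:
        if seen[value] < max_e:
            result.append(value)
            seen[value] += 1
    return result


print(delete_nth_with_remove([20, 37, 20, 21], 1))  # [37, 20, 21]
print(delete_nth_keep_first([20, 37, 20, 21], 1))   # [20, 37, 21]
```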
A version which maintains the order:
from collections import defaultdict
def delete_nth(order, max_e):
    count = defaultdict(int)
    delet = []
    for i, v in enumerate(order):
        count[v] += 1
        if count[v] > max_e:
            delet.append(i)
    for i in reversed(delet): # start deleting from the end
        order.pop(i)
    return order
print(delete_nth([1,1,2,2], 1))
print(delete_nth([20,37,20,21], 1))
print(delete_nth([1,1,3,3,7,2,2,2,2], 3))
This should do the trick:
from itertools import groupby
import numpy as np
def delete_nth(order, max_e):
    if(len(order)<=max_e):
        return order
    elif(max_e<=0):
        return []
    return np.array(
        sorted(
            np.concatenate(
                [list(v)[:max_e]
                 for k,v in groupby(
                     sorted(
                         zip(order, list(range(len(order)))),
                         key=lambda k: k[0]),
                     key=lambda k: k[0])
                ]
            ),
            key=lambda k: k[1])
    )[:,0].tolist()
Outputs:
print(delete_nth([2,3,4,5,3,2,3,2,1], 2))
[2, 3, 4, 5, 3, 2, 1]
print(delete_nth([2,3,4,5,5,3,2,3,2,1], 1))
[2, 3, 4, 5, 1]
print(delete_nth([2,3,4,5,3,2,3,2,1], 3))
[2, 3, 4, 5, 3, 2, 3, 2, 1]
print(delete_nth([2,2,1,1], 1))
[2, 1]
Originally my answer only worked for one test case; this one is quick (not the prettiest), but it works for both:
def delete_nth(x, e):
    x = x[::-1]
    for i in x:
        while x.count(i) > e:
            x.remove(i)
    return x[::-1]

How to efficiently make a numbered list in python Class-list

How can I make the following code more compact and efficient?
The code finds the position at which each value sits in a nested list, counting the elements in order.
For example, given the list
ListNo = [[100,2,5], [50,10], 4, 1, [6,6,500]]
the values 100, 50 and 500 are at positions 0, 3 and 9, respectively.
The test code is as follows:
ListNo = [[100,2,5], [50,10], 4, 1, [6,6,500]]
NumberedList = ListNo
Const = 0
items = 0
for i, item in enumerate(ListNo):
    MaxRange = len(item) if isinstance(item, list) else 1
    for x in range(0, MaxRange):
        if MaxRange > 1:
            NumberedList[i][x] = Const
        else:
            NumberedList[i] = Const
        Const = Const + 1
print(NumberedList)
[[0, 1, 2], [3, 4], 5, 6, [7, 8, 9]]
My question is whether there is a more compact and efficient way to write this.
You can use itertools.count:
from itertools import count
i = count()
print([[next(i) for _ in range(len(l))] if isinstance(l, list) else next(i) for l in ListNo])
This outputs:
[[0, 1, 2], [3, 4], 5, 6, [7, 8, 9]]
A recursive solution would be more elegant, and would handle more cases (such as deeper nesting):
import itertools
def nested_list_ordinal_recurse(l, it):
    if isinstance(l, list):
        return [nested_list_ordinal_recurse(item, it) for item in l]
    else:
        return next(it)
def nested_list_ordinal(l, _it=None):
    return nested_list_ordinal_recurse(l, itertools.count())
Without itertools:
ListNo = [[100,2,5], [50,10], 4, 1, [6,6,500]]
count = -1
def counter(l=[]):
    global count
    if l:
        return [counter() for i in l]
    else:
        count += 1
        return count
print([counter(item) if isinstance(item, list) else counter() for item in ListNo])
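If the goal is to look up the position of a given value rather than to rebuild the nested structure, one option is to flatten the list lazily and record the first position of each value. This is only a sketch; flatten and positions are names chosen for the example:

```python
def flatten(nested):
    """Yield the leaf items of an arbitrarily nested list, in order."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


ListNo = [[100, 2, 5], [50, 10], 4, 1, [6, 6, 500]]

# Map each value to the position of its first occurrence in flattened order.
positions = {}
for position, value in enumerate(flatten(ListNo)):
    positions.setdefault(value, position)

print(positions[100], positions[50], positions[500])  # 0 3 9
```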

Finding longest run in a list

Given a list of data, I'm trying to create a new list in which the value at position i is the length of the longest run starting from position i in the original list. For instance, given
x_list = [1, 1, 2, 3, 3, 3]
Should return:
run_list = [2, 1, 1, 3, 2, 1]
My solution:
freq_list = []
current = x_list[0]
count = 0
for num in x_list:
    if num == current:
        count += 1
    else:
        freq_list.append((current,count))
        current = num
        count = 1
freq_list.append((current,count))
run_list = []
for i in freq_list:
    z = i[1]
    while z > 0:
        run_list.append(z)
        z -= 1
First I create a list freq_list of tuples, where each tuple's first element is a value from x_list and its second element is the length of that value's run.
In this case:
freq_list = [(1, 2), (2, 1), (3, 3)]
Having this, I create a new list and append appropriate values.
However, I was wondering if there is a shorter way/another way to do this?
Here's a simple solution that iterates over the list backwards and increments a counter each time a number is repeated:
last_num = None
result = []
for num in reversed(x_list):
    if num != last_num:
        # if the number changed, reset the counter to 1
        counter = 1
        last_num = num
    else:
        # if the number is the same, increment the counter
        counter += 1
    result.append(counter)
# reverse the result
result = list(reversed(result))
Result:
[2, 1, 1, 3, 2, 1]
This is possible using itertools:
from itertools import groupby, chain
x_list = [1, 1, 2, 3, 3, 3]
gen = (range(len(list(j)), 0, -1) for _, j in groupby(x_list))
res = list(chain.from_iterable(gen))
Result
[2, 1, 1, 3, 2, 1]
Explanation
First use itertools.groupby to group identical items in your list.
For each item in your groupby, create a range object which counts backwards from the length of the number of consecutive items to 1.
Turn this all into a generator to avoid building a list of lists.
Use itertools.chain to chain the ranges from the generator.
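The steps above can be traced one at a time. The sketch below materialises each intermediate result as a list purely for inspection (the original keeps everything lazy); the variable names are illustrative:

```python
from itertools import chain, groupby

x_list = [1, 1, 2, 3, 3, 3]

# Step 1: length of each group of consecutive identical items.
run_lengths = [len(list(group)) for _, group in groupby(x_list)]
print(run_lengths)  # [2, 1, 3]

# Step 2: a countdown from each run length to 1.
countdowns = [list(range(n, 0, -1)) for n in run_lengths]
print(countdowns)  # [[2, 1], [1], [3, 2, 1]]

# Step 3: chain the countdowns into a single flat list.
run_list = list(chain.from_iterable(countdowns))
print(run_list)  # [2, 1, 1, 3, 2, 1]
```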
Performance note
Performance will be inferior to @Aran-Fey's solution. Although itertools.groupby is O(n), it makes heavy use of expensive __next__ calls, which do not scale as well as iteration in a simple for loop. See the itertools docs for groupby pseudo-code.
If performance is your main concern, stick with the for loop.
You are performing a reverse cumulative count on contiguous groups. We can create a Numpy cumulative count function with
import numpy as np
def cumcount(a):
    a = np.asarray(a)
    b = np.append(False, a[:-1] != a[1:])
    c = b.cumsum()
    r = np.arange(len(a))
    return r - np.append(0, np.flatnonzero(b))[c] + 1
and then generate our result with
a = np.array(x_list)
cumcount(a[::-1])[::-1]
array([2, 1, 1, 3, 2, 1])
I would use a generator for this kind of task because it avoids building the resulting list incrementally and can be used lazily if one wanted:
def gen(iterable): # you have to think about a better name :-)
    iterable = iter(iterable)
    # Get the first element, in case that fails
    # we can stop right now.
    try:
        last_seen = next(iterable)
    except StopIteration:
        return
    count = 1
    # Go through the remaining items
    for item in iterable:
        if item == last_seen:
            count += 1
        else:
            # The consecutive run finished, return the
            # desired values for the run and then reset
            # counter and the new item for the next run.
            yield from range(count, 0, -1)
            count = 1
            last_seen = item
    # Return the result for the last run
    yield from range(count, 0, -1)
This will also work if the input cannot be reversed (certain generators/iterators cannot be reversed):
>>> x_list = (i for i in range(10)) # it's a generator despite the variable name :-)
>>> ... arans solution ...
TypeError: 'generator' object is not reversible
>>> list(gen((i for i in range(10))))
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
And it works for your input:
>>> x_list = [1, 1, 2, 3, 3, 3]
>>> list(gen(x_list))
[2, 1, 1, 3, 2, 1]
This can actually be made simpler by using itertools.groupby:
import itertools
def gen(iterable):
    for _, group in itertools.groupby(iterable):
        length = sum(1 for _ in group) # or len(list(group))
        yield from range(length, 0, -1)
>>> x_list = [1, 1, 2, 3, 3, 3]
>>> list(gen(x_list))
[2, 1, 1, 3, 2, 1]
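Because the generator is lazy, it can also be pointed at an unbounded input and consumed only as far as needed. A small sketch of that, assuming the groupby-based version (renamed run_countdown here to keep the example self-contained) and itertools.cycle as a stand-in for an endless data source:

```python
from itertools import cycle, groupby, islice


def run_countdown(iterable):
    """Yield, for each position, the remaining length of its run."""
    for _, group in groupby(iterable):
        length = sum(1 for _ in group)
        yield from range(length, 0, -1)


# cycle() never ends, so list(run_countdown(endless)) would never return;
# islice() takes just the first five results.
endless = cycle([1, 1, 2])
print(list(islice(run_countdown(endless), 5)))  # [2, 1, 1, 2, 1]
```

Each run still has to be read completely before its first value can be yielded, so a single never-ending run would block.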
I also ran some benchmarks; according to them, Aran-Fey's solution is the fastest, except for long lists, where piRSquared's solution wins.
This was my benchmarking setup if you want to confirm the results:
from itertools import groupby, chain
import numpy as np
def gen1(iterable):
    iterable = iter(iterable)
    try:
        last_seen = next(iterable)
    except StopIteration:
        return
    count = 1
    for item in iterable:
        if item == last_seen:
            count += 1
        else:
            yield from range(count, 0, -1)
            count = 1
            last_seen = item
    yield from range(count, 0, -1)
def gen2(iterable):
    for _, group in groupby(iterable):
        length = sum(1 for _ in group)
        yield from range(length, 0, -1)
def mseifert1(iterable):
    return list(gen1(iterable))
def mseifert2(iterable):
    return list(gen2(iterable))
def aran(x_list):
    last_num = None
    result = []
    for num in reversed(x_list):
        if num != last_num:
            counter = 1
            last_num = num
        else:
            counter += 1
        result.append(counter)
    return list(reversed(result))
def jpp(x_list):
    gen = (range(len(list(j)), 0, -1) for _, j in groupby(x_list))
    res = list(chain.from_iterable(gen))
    return res
def cumcount(a):
    a = np.asarray(a)
    b = np.append(False, a[:-1] != a[1:])
    c = b.cumsum()
    r = np.arange(len(a))
    return r - np.append(0, np.flatnonzero(b))[c] + 1
def pirsquared(x_list):
    a = np.array(x_list)
    return cumcount(a[::-1])[::-1]
from simple_benchmark import benchmark
import random
funcs = [mseifert1, mseifert2, aran, jpp, pirsquared]
args = {2**i: [random.randint(0, 5) for _ in range(2**i)] for i in range(1, 20)}
bench = benchmark(funcs, args, "list size")
%matplotlib notebook
bench.plot()
Python 3.6.5, NumPy 1.14
Here's a simple iterative approach using collections.Counter (note that it counts all remaining occurrences of a value, so it only matches the run lengths when equal values are always adjacent, as in the example):
from collections import Counter
x_list = [1, 1, 2, 3, 3, 3]
x_counter, run_list = Counter(x_list), []
for x in x_list:
    run_list.append(x_counter[x])
    x_counter[x] -= 1
which gives run_list as:
[2, 1, 1, 3, 2, 1]
Alternatively, here's a one-liner using a list comprehension with enumerate, but it is inefficient because list.count(..) is called on a fresh slice for every element:
>>> [x_list[i:].count(x) for i, x in enumerate(x_list)]
[2, 1, 1, 3, 2, 1]
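One thing to check before relying on the Counter approach: it counts remaining occurrences anywhere in the list, not the length of the current run. The sketch below compares it with a groupby-based run count on an input where a value reappears after a gap; both function names are made up for this comparison:

```python
from collections import Counter
from itertools import groupby


def remaining_counts(x_list):
    """Counter approach: occurrences of each item left in the whole list."""
    counter = Counter(x_list)
    result = []
    for x in x_list:
        result.append(counter[x])
        counter[x] -= 1
    return result


def run_lengths_from_each_position(x_list):
    """Length of the run of equal items starting at each position."""
    result = []
    for _, group in groupby(x_list):
        length = sum(1 for _ in group)
        result.extend(range(length, 0, -1))
    return result


data = [1, 2, 1]  # the value 1 appears twice, but never consecutively
print(remaining_counts(data))                # [2, 1, 1]
print(run_lengths_from_each_position(data))  # [1, 1, 1]
```

The two agree whenever equal values are always adjacent, as in the question's example.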
You can count the consecutive equal items and then add a countdown from count-of-items to 1 to the result:
def runs(p):
    old = p[0]
    n = 0
    q = []
    for x in p:
        if x == old:
            n += 1
        else:
            q.extend(range(n, 0, -1))
            n = 1
            old = x
    q.extend(range(n, 0, -1))
    return q
(A couple of minutes later) Oh, that's the same as MSeifert's code but without the iterable aspect. This version seems to be almost as fast as the method shown by Aran-Fey.

find the start position of the longest sequence of 1's

I want to find the start position of the longest sequence of 1's in my array:
a1=[0,0,1,1,1,1,0,0,1,1]
#2
I am following this answer to find the length of the longest sequence. However, I was not able to determine the position.
Inspired by this solution, here's a vectorized approach to solve it (a1 must be a NumPy array for a1==1 to compare element-wise) -
import numpy as np
a1 = np.asarray(a1)
# Get start, stop index pairs for islands/seq. of 1s
idx_pairs = np.where(np.diff(np.hstack(([False],a1==1,[False]))))[0].reshape(-1,2)
# Get the island lengths, whose argmax would give us the ID of longest island.
# Start index of that island would be the desired output
start_longest_seq = idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0]
Sample run -
In [89]: a1 # Input array
Out[89]: array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])
In [90]: idx_pairs # Start, stop+1 index pairs
Out[90]:
array([[ 2,  6],
       [ 8, 10]])
In [91]: np.diff(idx_pairs,axis=1) # Island lengths
Out[91]:
array([[4],
       [2]])
In [92]: np.diff(idx_pairs,axis=1).argmax() # Longest island ID
Out[92]: 0
In [93]: idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0] # Longest island start
Out[93]: 2
A more compact one-liner using groupby(). It uses enumerate() on the raw data to carry the starting positions through the pipeline, eventually producing the tuples (2, 4) and (8, 2), each holding the starting position and length of a non-zero run:
from itertools import groupby
L = [0,0,1,1,1,1,0,0,1,1]
print(max(((lambda y: (y[0][0], len(y)))(list(g)) for k, g in groupby(enumerate(L), lambda x: x[1]) if k), key=lambda z: z[1])[0])
lambda x is the key function for groupby(), needed since we enumerated L
lambda y packages up the results we need, since we can only iterate over g once without saving it
lambda z is the key function for max(), to pull out the lengths
Prints 2 as expected.
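The one-liner above can be unrolled into a plain loop that tracks the running position by hand. This is a sketch rather than a drop-in replacement: the name longest_run_of_ones and the (start, length) return value are choices made for the example, and ties go to the earliest run:

```python
from itertools import groupby


def longest_run_of_ones(values):
    """Return (start, length) of the longest run of 1s, or (None, 0) if none."""
    best_start, best_length = None, 0
    position = 0  # index of the first element of the current group
    for key, group in groupby(values):
        length = sum(1 for _ in group)
        if key == 1 and length > best_length:
            best_start, best_length = position, length
        position += length
    return best_start, best_length


L = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
print(longest_run_of_ones(L))  # (2, 4)
```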
This seems to work, using groupby from itertools, this only goes through the list once:
from itertools import groupby
pos, max_len, cum_pos = 0, 0, 0
for k, g in groupby(a1):
    if k == 1:
        pat_size = len(list(g))
        pos, max_len = (pos, max_len) if pat_size < max_len else (cum_pos, pat_size)
        cum_pos += pat_size
    else:
        cum_pos += len(list(g))
pos
# 2
max_len
# 4
You could use a for loop and check whether the next m items (where m is the length of the longest run of 1s) are all 1s:
# Using your list and the answer from the post you referred to
from itertools import groupby
L = [0,0,1,1,1,1,0,0,1,1]
m = max(sum(1 for i in g) for k, g in groupby(L) if k == 1)
# Here is the for loop
for i, s in enumerate(L):
    if len(L) - i < m:
        # fewer than m items remain, so no match is possible
        break
    if s == 1 and 0 not in L[i:i+m]:
        print(i)
        break
This will give:
2
Another way of doing in a single loop, but without resorting to itertool's groupby.
max_start = 0
max_reps = 0
start = 0
reps = 0
for (pos, val) in enumerate(a1):
    start = pos if reps == 0 else start
    reps = reps + 1 if val == 1 else 0
    max_reps = max(reps, max_reps)
    max_start = start if reps == max_reps else max_start
This could also be done as a one-liner using reduce (Python 2 only, because the lambda unpacks tuples in its parameter list):
max_start = reduce(lambda (max_start, max_reps, start, reps), (pos, val): (start if reps == max(reps, max_reps) else max_start, max(reps, max_reps), pos if reps == 0 else start, reps + 1 if val == 1 else 0), enumerate(a1), (0, 0, 0, 0))[0]
In Python 3, you cannot unpack tuples in a lambda's parameter list, and reduce has moved to functools, so it's preferable to define the function with def first:
from functools import reduce
def func(acc, x):
    max_start, max_reps, start, reps = acc
    pos, val = x
    return (start if reps == max(reps, max_reps) else max_start,
            max(reps, max_reps),
            pos if reps == 0 else start,
            reps + 1 if val == 1 else 0)
max_start = reduce(func, enumerate(a1), (0, 0, 0, 0))[0]
In any of the three cases, max_start gives your answer (i.e. 2).
Using more_itertools, a third-party library:
Given
import itertools as it
import more_itertools as mit
lst = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
Code
longest_contiguous = max([tuple(g) for k, g in it.groupby(lst) if k == 1], key=len)
longest_contiguous
# (1, 1, 1, 1)
pred = lambda w: w == longest_contiguous
next(mit.locate(mit.windowed(lst, len(longest_contiguous)), pred=pred))
# 2
See also the more_itertools.locate docstring for details on how these tools work.
For another solution that uses only Numpy, I think this should work in all the cases. The most upvoted solution is probably faster though.
tmp = np.cumsum(np.insert(np.array(a1) != 1, 0, False)) # value of tmp[i+1] was not incremented when a1[i] is 1
# [0, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4]
values, counts = np.unique(tmp, return_counts=True)
# [0, 1, 2, 3, 4], [1, 1, 5, 1, 3]
counts_idx = np.argmax(counts)
longest_sequence_length = counts[counts_idx] - 1
# 4
longest_sequence_idx = np.argmax(tmp == values[counts_idx])
# 2
I've implemented a run-searching function for numpy arrays in haggis.npy_util.mask2runs. You can use it like this:
runs, lengths = mask2runs(a1, return_lengths=True)
result = runs[lengths.argmax(), 0]
