Renumber a sequence to remove gaps, but keep identical numbers - python

Assume an unordered list of numbers, with duplicates being allowed. I want to patch all gaps or sudden jumps in it. Some examples:
def renum(arr):
    # magic happens here
    pass
renum(np.array([1, 1, 1, 2, 2, 2])) # already in correct shape
> [1, 1, 1, 2, 2, 2]
renum(np.array([1, 1, 2, 2, 4, 4, 5, 5, 5])) # A jump between 2 and 4
> [1, 1, 2, 2, 3, 3, 4, 4, 4]
renum(np.array([1, 1, 2, 2, 5, 2, 2])) # A forward and backward jump
> [1, 1, 2, 2, 3, 4, 4]
Finding gaps is easy, but when I process the sequence element by element I have a hard time renumbering a gap that is followed by the same number several times. For example, the attempt below fails because numbers can occur many times:
def renum(arr):
    new_arr = np.zeros(len(arr))
    prev_num = new_arr[0]
    for idx, num in enumerate(arr):
        diff = num - prev_num
        if diff == 0 or diff == 1:
            new_arr[idx] = num
        else:
            new_arr[idx] = prev_num + 1
        prev_num = new_arr[idx]
    return new_arr
renum(np.array([1, 1, 2, 2, 4, 4, 5, 5, 5]))
> [1, 1, 2, 2, 3, 4, 5, 5, 5] # should actually be [1, 1, 2, 2, 3, 3, 4, 4, 4]
Also, I think this implementation is not very efficient.
Any ideas?

This seems to do the trick:
def renum(input_array):
    diff = np.diff(input_array)
    diff[diff != 0] = 1
    return np.hstack((input_array[0], diff)).cumsum()
If I understood correctly, you want the differences between consecutive values to stay 0 wherever they are 0 in the original array, and to become 1 wherever they are non-zero. This happens in the first two lines. You can then take the first original element together with the newly created differences and compute their cumulative sum to build the new array.
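To see the idea without NumPy, here is a minimal pure-Python sketch of the same approach. The name `renum_pure` is made up for this illustration; it keeps the first value and then adds 1 whenever the value changes and 0 otherwise.

```python
def renum_pure(seq):
    """Renumber seq so every change in value becomes a step of exactly +1."""
    if not seq:
        return []
    out = [seq[0]]  # the first value is kept as the starting point
    for prev, cur in zip(seq, seq[1:]):
        # a change of any size (forward or backward) counts as one step
        step = 0 if cur == prev else 1
        out.append(out[-1] + step)
    return out


print(renum_pure([1, 1, 2, 2, 4, 4, 5, 5, 5]))  # [1, 1, 2, 2, 3, 3, 4, 4, 4]
print(renum_pure([1, 1, 2, 2, 5, 2, 2]))        # [1, 1, 2, 2, 3, 4, 4]
```

Like the NumPy version, this starts from whatever the first element is, so a sequence beginning at 7 is renumbered from 7 rather than from 1.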

Related

Split one list of numbers into several based on a simple condition, lead and lag in lists [duplicate]

Is there an easy way to split the list l below into 3 lists? I want to cut the list whenever the sequence starts over, so every list should start with 1.
l = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4]
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3, 4]
l3 = [1, 2, 3, 4]
My original thought was to look at the lead value and implement a condition inside a for loop that would cut the list when x.lead < x. But how do I use lead and lag when using lists in python?
NumPy solution
import numpy as np
l = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4]
parts = [list(i) for i in np.split(l,np.flatnonzero(np.diff(l)-1)+1)]
print(parts)
output
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4]]
Explanation: numpy.diff gives the differences between adjacent elements. Subtracting 1 turns every ordinary step of +1 into 0, so numpy.flatnonzero returns the positions where the difference is anything other than 1. Adding 1 converts those positions into split indices for numpy.split (the output of numpy.diff is one element shorter than its input). Finally, each part is converted to a list, as otherwise you would end up with NumPy arrays.
What about this:
l = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4]
one_indices = [i for i, e in enumerate(l) if e == 1]
slices = []
for count, item in enumerate(one_indices):
    if count == len(one_indices) - 1:
        slices.append((item, None))
    else:
        slices.append((item, one_indices[count + 1]))
sequences = [l[x[0] : x[1]] for x in slices]
print(sequences)
Out:
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4]]
Another way without numpy,
l = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4]
start = 0
newlist = []
for i, v in enumerate(l):
    if i != 0 and v == 1:
        newlist.append(l[start:i])
        start = i
newlist.append(l[start:i+1])
print(newlist)
Working Demo: https://rextester.com/RYCV85570
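The lead/lag idea from the question can also be written directly. As a rough sketch: pair each element with its successor using zip and cut wherever the next value is smaller than the current one. The function name `split_on_restart` is an assumption made for this example, and it cuts on any decrease rather than specifically on the value 1.

```python
def split_on_restart(values):
    """Split values into runs, starting a new run whenever the sequence drops."""
    if not values:
        return []
    runs = [[values[0]]]
    # zip(values, values[1:]) yields (current, lead) pairs
    for current, lead in zip(values, values[1:]):
        if lead < current:
            runs.append([lead])    # sequence started over: open a new run
        else:
            runs[-1].append(lead)  # still rising (or equal): extend the run
    return runs


l = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4]
print(split_on_restart(l))
# [[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4]]
```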

Remove duplicate numbers from a list

I was attempting to remove all duplicated numbers in a list.
I was trying to understand what is wrong with my code.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers)
The result I got was:
[1, 1, 6, 5, 2, 3]
I guess the idea is to write the code yourself without using library functions. Even then, I would suggest using an additional set to store the items you have already seen, and going over your list only once:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
unique = set()
for x in numbers:
    if x not in unique:
        unique.add(x)
numbers = list(unique)
print(numbers)
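Note that converting the set back with list(unique) does not guarantee the original order. A minimal sketch that keeps the order of first appearance, using a second list next to the set (the names `seen` and `result` are chosen here for illustration):

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

seen = set()    # values already encountered, for fast membership tests
result = []     # unique values in order of first appearance
for x in numbers:
    if x not in seen:
        seen.add(x)
        result.append(x)

print(result)  # [1, 6, 5, 2, 3]
```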
If you want to use your code, then the problem is that you modify the collection inside the for loop that iterates over it, which is a big no-no in most programming languages. Although Python allows you to do that, the problem and solution are already described in this answer: How to remove items from a list while iterating?:
Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
    if x < 0: a.remove(x)
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
Using a shallow copy of the list:
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers) # [1, 6, 5, 2, 3]
Alternatives:
Preserving the order of the list:
Using dict.fromkeys()
print(list(dict.fromkeys(numbers).keys())) # [1, 6, 5, 2, 3]
Using more_itertools.unique_everseen(iterable, key=None):
from more_itertools import unique_everseen
print(list(unique_everseen(numbers))) # [1, 6, 5, 2, 3]
Using pandas.unique:
import pandas as pd
print(pd.unique(numbers).tolist()) # [1, 6, 5, 2, 3]
Using collections.OrderedDict([items]):
from collections import OrderedDict
print(list(OrderedDict.fromkeys(numbers))) # [1, 6, 5, 2, 3]
Using itertools.groupby(iterable[, key]) (this only collapses consecutive duplicates, so it works here because equal values are adjacent):
from itertools import groupby
print([k for k,_ in groupby(numbers)]) # [1, 6, 5, 2, 3]
Ignoring the order of the list:
Using numpy.unique:
import numpy as np
print(np.unique(numbers).tolist()) # [1, 2, 3, 5, 6]
Using set():
print(list(set(numbers))) # [1, 2, 3, 5, 6]
Using frozenset([iterable]):
print(list(frozenset(numbers))) # [1, 2, 3, 5, 6]
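One caveat about the groupby variant is worth a quick illustration. The sketch below uses a made-up list, `scattered`, in which equal values are not next to each other, and compares groupby with dict.fromkeys on it.

```python
from itertools import groupby

scattered = [1, 6, 1, 5, 6]  # duplicates are not adjacent here

# groupby only merges runs of equal neighbours, so nothing is removed
collapsed = [k for k, _ in groupby(scattered)]
print(collapsed)  # [1, 6, 1, 5, 6]

# dict.fromkeys keeps the first occurrence of every value
deduped = list(dict.fromkeys(scattered))
print(deduped)  # [1, 6, 5]

# sorting first makes equal values adjacent, at the cost of the original order
sorted_unique = [k for k, _ in groupby(sorted(scattered))]
print(sorted_unique)  # [1, 5, 6]
```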
Why don't you simply use a set:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(set(numbers))
print(numbers)
Before anything else, the first advice I can give is to never modify a list while you are looping over it. All kinds of wacky stuff happens. Apart from that, your code is fine (I recommend reading the other answers though; there's an easier way to do this with a set, which pretty much handles the duplication for you).
Instead of removing numbers from the list you are looping over, loop over a clone of it, created with slicing directly in the for statement.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
    print(numbers)
print("Final")
print(numbers)
The answer there is numbers[:], which gives back a clone of the array. Here's the print output:
[1, 1, 1, 6, 5, 5, 2, 3]
[1, 1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
Final
[1, 6, 5, 2, 3]
Leaving a placeholder here until I figure out how to explain why in your particular case it's not working, like the actual step by step reason.
Another way to solve this making use of the beautiful language that is Python, is through list comprehension and sets.
Why a set? Because this data structure is defined so that its elements are unique: even if you try to put in several elements that are the same, they won't appear repeated in the set. Cool, right?
List comprehension is some syntax sugar for looping in one line, get used to it with Python, you'll either use it a lot, or see it a lot :)
So with list comprehension you will iterate an iterable and return that item. In the code below, x represents each number in numbers, x is returned to be part of the set. Because the set handles duplicates...voila, your code is done.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers_as_set = {x for x in numbers}
print(numbers_as_set)
This seems like homework but here is a possible solution:
import numpy as np
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
filtered = np.unique(numbers).tolist()  # tolist() yields plain Python ints
print(filtered)
#[1, 2, 3, 5, 6]
This solution does not preserve the ordering. If you also need to keep the order, use:
filtered_with_order = list(dict.fromkeys(numbers))
Why don't you use fromkeys?
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(dict.fromkeys(numbers))
Output: [1,6,5,2,3]
The flow is as follows.
Now the list is [1, 1, 1, 1, 6, 5, 5, 2, 3] and Index is 0.
The x is 1. The numbers.count(1) is 4 and thus the 1 at index 0 is removed.
Now the numbers list becomes [1, 1, 1, 6, 5, 5, 2, 3] but the Index will +1 and becomes 1.
The x is 1. The numbers.count(1) is 3 and thus the 1 at index 1 is removed.
Now the numbers list becomes [1, 1, 6, 5, 5, 2, 3] but the Index will +1 and becomes 2.
The x will be 6.
etc...
So that's why there are two 1's.
Please correct me if I am wrong. Thanks!
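The flow described above can be reproduced with an explicit counter. This is only a sketch: the variable `index` stands in for the internal counter that the for loop keeps, and the while loop mimics how that counter keeps advancing while the list shrinks.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

index = 0  # stands in for the internal counter of the for loop
while index < len(numbers):
    x = numbers[index]
    if numbers.count(x) >= 2:
        numbers.remove(x)  # removes the first x; later items shift left
    print(index, x, numbers)
    index += 1  # the counter advances even though the list just shrank

print(numbers)  # [1, 1, 6, 5, 2, 3]
```

Each removal shifts the remaining items one position to the left, so the item that slides into the current index is never examined. That is how two 1's survive.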
A fancy method is to use collections.Counter:
>>> from collections import Counter
>>> numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> c = Counter(numbers)
>>> list(c.keys())
[1, 6, 5, 2, 3]
This method has linear time complexity (O(n)) and relies on a well-optimized standard-library class.
You can try:
from more_itertools import unique_everseen
items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
list(unique_everseen(items))
or
>>> from collections import OrderedDict
>>> items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> list(OrderedDict.fromkeys(items))
[1, 6, 5, 2, 3]
You can find more here:
How do you remove duplicates from a list whilst preserving order?

Modifying list with numbers in python

I am trying to modify a list. Currently, I have a list of random numbers, and I would like to rearrange it so that it contains the maximum number of increases between neighbouring numbers. Maybe I worded that badly. For example, if the list is [2,3,1,2,1], I would rearrange it into [1,2,3,1,2], since 1->2, 2->3 and 1->2 are increases, which gives a total of 3 increasing steps. Any suggestions?
I would approach your problem with this recursive algorithm. What I am doing is sorting my list, putting all duplicates at the end, and repeating the same excluding the sorted, duplicate-free list.
def sortAndAppendDuplicates(l):
    l.sort()
    ll = list(dict.fromkeys(l)) # this is 'l' without duplicates
    i = 0
    while i < (len(ll)-1):
        # move a duplicate of the current value to the end of the list
        if l[i] == l[i+1]:
            a = l.pop(i)
            l.append(a)
            i = i - 1
        i = i + 1
    if hasNoDuplicates(l):
        return l
    return ll + sortAndAppendDuplicates(l[len(ll):])

def hasNoDuplicates(l):
    return len(l) == len(list(dict.fromkeys(l)))
print(sortAndAppendDuplicates([2,3,6,3,4,5,5,8,7,3,2,1,3,4,5,6,7,7,0,1,2,3,4,4,5,5,6,5,4,3,3,5,1,2,1]))
# this would print [0, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 3, 4, 5, 3, 5, 3, 5]
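For comparison, here is an iterative sketch of the same idea built on collections.Counter. It is not the answer's method, and the name `ascending_runs` is invented for this example. Each pass takes every value that is still available exactly once, in sorted order.

```python
from collections import Counter


def ascending_runs(values):
    """Arrange values into successive strictly increasing runs."""
    remaining = Counter(values)
    result = []
    while remaining:
        # one pass: take each value that is still available exactly once
        run = sorted(remaining)
        result.extend(run)
        remaining -= Counter(run)  # in-place subtraction drops counts that reach zero
    return result


print(ascending_runs([2, 3, 1, 2, 1]))  # [1, 2, 3, 1, 2]
```

Unlike the recursive version, this sketch leaves the input list unmodified.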

Skip list and modify a global index in python

I am having an issue with the last part of some code I'm creating. For example, I want to iterate over the list normally up to item 3, then check whether the index is 3 and another condition holds (which doesn't matter right now), and if so change the index so that iteration continues from, say, 10.
I made a lot of attempts but it doesn't seem to work.
li = [3, 8, 1, 2, 6, 2, 2, 3, 3, 5, 4, 5, 5, 4, 2, 1, 5, 5, 3, 5, 4, 6]
'''
HERE COMES OTHER CODE WHICH WORKS BASED ON THE ITERATION
'''
for i in range(0,len(li)):
    print(i)
    if i == 3: #along with other condition
        def g(li):
            global i
            i = li[9]
        g()
        print(i)
In case it wasn't clear: what I am looking for is that when the index is 3 and the other condition is met, the loop skips to index 9 and the rest of the script keeps iterating from 9, which would be the new value.
I am sure I have not got your question right, but a while loop should be preferred here:
i = 0
while i < len(li):
    if i == 3: #along with other condition
        i = li[9]
        print(i)
        continue
    i += 1
A simple way to do what you want is to set a flag when that condition is met and continue through the skipped indices while that flag is true:
li = [3, 8, 1, 2, 6, 2, 2, 3, 3, 5, 4, 5, 5, 4, 2, 1, 5, 5, 3, 5, 4, 6]
'''
HERE COMES OTHER CODE WHICH WORKS BASED ON THE ITERATION
'''
do_skip = False
for i in range(len(li)):
    if i == 3: #along with other condition
        do_skip = True
    # don't skip past a certain point
    if do_skip and i < 9:
        continue
    print(i)
Alternatively, you can use a while loop:
li = [3, 8, 1, 2, 6, 2, 2, 3, 3, 5, 4, 5, 5, 4, 2, 1, 5, 5, 3, 5, 4, 6]
'''
HERE COMES OTHER CODE WHICH WORKS BASED ON THE ITERATION
'''
i = 0
while i < len(li):
    if i == 3: #along with other condition
        i = 9
    print(i)
    # other loop operations go here
    i += 1
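To check which indices the while-loop version actually visits, the jump can be wrapped in a small function. This is a sketch only: the name `visited_indices` and its parameters are hypothetical, and it assumes jump_to is larger than jump_from (otherwise the loop would never terminate).

```python
def visited_indices(length, jump_from=3, jump_to=9):
    """Return the indices a while loop visits when it jumps forward once.

    Assumes jump_to > jump_from; a backward jump would loop forever.
    """
    visited = []
    i = 0
    while i < length:
        if i == jump_from:  # the "other condition" would be checked here too
            i = jump_to
            if i >= length:  # the jump target lies beyond the end of the list
                break
        visited.append(i)
        i += 1
    return visited


li = [3, 8, 1, 2, 6, 2, 2, 3, 3, 5, 4, 5, 5, 4, 2, 1, 5, 5, 3, 5, 4, 6]
print(visited_indices(len(li)))
# [0, 1, 2, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
```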
Yet another way to do it:
li = [3, 8, 1, 2, 6, 2, 2, 3, 3, 5, 4, 5, 5, 4, 2, 1, 5, 5, 3, 5, 4, 6]
flag = True # Conditional Flag
for x, i in enumerate(li):
    if x > 2 and not flag: break
    if 3 > x or x > 8: print(x, "has a value of", i)

Swap two values randomly in list

I have the following list:
a = [1, 2, 5, 4, 3, 6]
And I want to know how I can swap any two values at a time randomly within a list regardless of position within the list. Below are a few example outputs on what I'm thinking about:
[1, 4, 5, 2, 3, 6]
[6, 2, 3, 4, 5, 1]
[1, 6, 5, 4, 3, 2]
[1, 3, 5, 4, 2, 6]
Is there a way to do this in Python 2.7? My current code is like this:
import random
n = len(a)
if n:
    i = random.randint(0,n-1)
    j = random.randint(0,n-1)
    a[i] += a[j]
    a[j] = a[i] - a[j]
    a[i] -= a[j]
The issue with the code I currently have, however, is that it starts setting all values to zero given enough swaps and iterations, which I do not want; I want the values to stay the same in the array, but do something like 2opt and only switch around two with each swap.
You are over-complicating it, it seems. Just randomly sample two indices from the list, then swap the values at those indices:
>>> def swap_random(seq):
...     idx = range(len(seq))
...     i1, i2 = random.sample(idx, 2)
...     seq[i1], seq[i2] = seq[i2], seq[i1]
...
>>> a
[1, 2, 5, 4, 3, 6]
>>> swap_random(a)
>>> a
[1, 2, 3, 4, 5, 6]
>>> swap_random(a)
>>> a
[1, 2, 6, 4, 5, 3]
>>> swap_random(a)
>>> a
[1, 2, 6, 5, 4, 3]
>>> swap_random(a)
>>> a
[6, 2, 1, 5, 4, 3]
Note, I used the Python swap idiom, which doesn't require an intermediate variable. It is equivalent to:
temp = seq[i1]
seq[i1] = seq[i2]
seq[i2] = temp
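A likely reason the original code produces zeros is that the two randint calls can return the same index, and the add/subtract trick then wipes out the value at that position. The sketch below isolates that case; `arithmetic_swap` is a name invented here for the question's three arithmetic lines, and `swap_random` is restated so the example is self-contained.

```python
import random


def arithmetic_swap(seq, i, j):
    """The add/subtract swap from the question, kept to show its failure mode."""
    seq[i] += seq[j]
    seq[j] = seq[i] - seq[j]
    seq[i] -= seq[j]


def swap_random(seq):
    """Swap two distinct, randomly chosen positions in place."""
    i1, i2 = random.sample(range(len(seq)), 2)  # sample never repeats an index
    seq[i1], seq[i2] = seq[i2], seq[i1]


broken = [1, 2, 5, 4, 3, 6]
arithmetic_swap(broken, 2, 2)  # randint can return the same index twice
print(broken)  # [1, 2, 0, 4, 3, 6]

a = [1, 2, 5, 4, 3, 6]
swap_random(a)
print(sorted(a))  # [1, 2, 3, 4, 5, 6]
```

Because random.sample draws without replacement, the two indices always differ, so every call changes exactly two positions and no value is lost.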
