Remove cycle/repetition from list python - python

I have a list of values which might or might not have certain cycles of elements in it.
I have written below code to extract the index and length of values after that which repeats in the list. My problem is that i now have multiple index and length of values which are repeating. How to remove these elements from main list to remove those cycle
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
minrun = 1
lendata = len(data)
for runlen in range(minrun, lendata // 2):
i = 0
while i < lendata - runlen * 2:
# print("i",i)
# print("runlen", runlen)
# print(lendata - runlen * 2)
s1 = data[i:i + runlen]
# print("s1",s1)
s2 = data[i + runlen:i + runlen * 2]
# print("s2",s2)
if s1 == s2:
print(i, runlen, s1)
i += runlen
else:
i += 1
Is there a better way to do this?

This could work, but I didn't test for other sequence, I only tested it for your example.
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
print(data)
checkindex = []
for j in range(1, len(data)):
for k in range(len(data)):
try:
if data[k] == data[j+k]:
if j+k not in checkindex:
checkindex.append(j+k)
except IndexError:
continue
checkindex = sorted(checkindex)
for i in range(len(checkindex)-1, -1, -1):
del data[checkindex[i]]
print(data)
Output:
[1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
Explanation:
for k in range(len(data)):
if data[i+k] == data[j+k]:
Basically means like this for example for j=3:
data[0] == data[3+0]: True
data[1] == data[3+1]: True
data[2] == data[3+2]: True
data[3] == data[3+3]: False
For every True, append (j+k) --> if it is not duplicate. I recommend try printing out here and there to understand more, for example:
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
print(data)
checkindex = []
for j in range(1, len(data)):
for k in range(len(data)):
try:
if data[k] == data[j+k]:
print(j, k, j+k)
if j+k not in checkindex:
checkindex.append(j+k)
except IndexError:
continue
print(checkindex)
checkindex = sorted(checkindex)
for i in range(len(checkindex)-1, -1, -1):
del data[checkindex[i]]
print(data)
Your output will be:
[1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
[]
[]
3 0 3
3 1 4
3 2 5
[3, 4, 5]
4 6 10
4 7 11
4 8 12
4 9 13
4 10 14
4 11 15
4 12 16
4 13 17
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
8 6 14
8 7 15
8 8 16
8 9 17
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
10 14 24
10 15 25
10 16 26
10 17 27
10 18 28
10 19 29
10 20 30
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
14 10 24
14 11 25
14 12 26
14 13 27
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
18 3 21
18 4 22
18 5 23
18 6 24
18 7 25
18 8 26
18 9 27
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
21 0 21
21 1 22
21 2 23
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
I personally think the print statement helps me understanding more.

Related

plot multiple boxplot in one graph ( one column is list of data) by Plotly or other way

I don't know how to plot multiple boxplot in one graph by Plotly. I guess the point is that the value in the 'list_hour' column is the list. My dataframe is like this:
Main_Category list_hour
0 Active Life [22, 21, 18, 23, 20, 5, 13, 12, 20, 0, 22, 19,...
1 Auto Repair [17, 15, 19, 23, 16, 22, 22, 19, 15, 20, 18, 1...
2 Automotive [17, 18, 19, 0, 20, 5, 23, 17, 15, 18, 22, 16,...
3 Beauty & Spas [17, 18, 17, 0, 4, 16, 17, 23, 23, 23, 18, 2, ...
4 Event Planning & Services [22, 20, 1, 17, 1, 4, 18, 20, 18, 17, 21, 5, 2...
5 Fast Food [17, 2, 2, 2, 5, 5, 2, 2, 7, 10, 2, 2, 22, 1, ...
6 Food [21, 17, 0, 23, 5, 2, 3, 19, 22, 22, 0, 19, 22...
7 Hair Salons [17, 23, 18, 3, 0, 0, 23, 6, 23, 15, 19, 7, 3,...
8 Health & Medical [10, 21, 18, 18, 20, 14, 15, 22, 18, 23, 15, 1...
9 Home Services [2, 2, 3, 16, 4, 20, 14, 1, 2, 20, 2, 21, 7, 1...
10 Local Services [2, 17, 2, 15, 15, 15, 20, 21, 15, 17, 18, 1, ...
11 Nightlife [23, 13, 2, 1, 13, 3, 5, 19, 3, 3, 19, 22, 18,...
12 Pizza [18, 23, 0, 4, 20, 17, 3, 19, 2, 20, 20, 23, 4...
13 Restaurants [16, 0, 16, 16, 15, 21, 22, 18, 7, 15, 23, 1, ...
14 Shopping [0, 22, 18, 23, 17, 14, 21, 19, 2, 22, 3, 21, ...
Or is there any other way to plot multiple boxplot for this dataframe instead of Plotly? Appreciate your help!!!
Here is the way I solve this problem.
nn = list_hour_df.explode('list_hour').reset_index(drop=True)
nn['list_hour'] = nn['list_hour'].astype(int)
nn
0 Active Life 22
1 Active Life 21
2 Active Life 18
3 Active Life 23
4 Active Life 20
... ... ...
396416 Shopping 8
396417 Shopping 13
396418 Shopping 21
396419 Shopping 19
396420 Shopping 4
fig = px.box(nn, x="list_hour", y="Main_Category")
fig.show()
enter image description here

Python speed up permutation of bits within 32Bit number

I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.
Here are the details: given a number num and an order matrix order.
order contains up to ~6000 lines of permutations of the numbers 0..31.
These are the positions to which the bits change.
Simplified example: binary(num) = 1001, order[1]=[0,1,3,2], reordered number for order[1] would be 1010 (binary).
Now I want to know, if my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-Bit numbers which fullfill this criterion.
My current approach is to slow, so I'm looking for a speedup.
minimal-reproducible-example:
num = 1753251840
order = [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]]
patterns=set()
bits = format(num, '032b')
for perm in order:
bitsn = [bits[perm[i]] for i in range(32)]
patterns.add(int(''.join(bitsn),2))
print( min(patterns)==num)
Where can I start to improve this?
Extracting bits using string is generally very inefficient (whatever the language). The same thing also apply for parsing. Moreover, for such a fast low-level operation, you need to use a JIT or a compiled language as comments already pointed out.
Here is a prototype using the Numba's JIT (assume all numbers are unsigned):
npOrder = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]], dtype=np.uint32)
#njit
def extractBits(num):
bits = np.empty(32, dtype=np.int32)
for i in range(32):
bits[i] = (num >> i) & 0x01
return bits
#njit
def permuteAndMerge(bits, perm):
bitsnFinal = 0
for i in range(32):
bitsnFinal |= bits[31-perm[i]] << i
return bitsnFinal
#njit
def computeOptimized(num):
bits = extractBits(num)
permCount = npOrder.shape[0]
patterns = np.empty(permCount, dtype=np.uint32)
for i in range(permCount):
patterns[i] = permuteAndMerge(bits, npOrder[i])
# The array can be converted to a set if needed here with: set(patterns)
return min(patterns) == num
This code is about 25 time faster than the original one on my machine (ran 5 000 000 times).
You can also use Numba to accelerate and parallelize the loop that run the function computeOptimized resulting in a significant additional speed-up.
Note that this code can be again much faster in C or C++ using low-level processor instructions (available for example on many x86_64 processors). With that and parallelism, the order of magnitude of the execution speed should be close to a billion of permutation per second.
Couple of possible speed-ups, staying with Python and the current algorithm:
Bail out as soon as you find a pattern less than num; once one like that is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag whether an equal one was found, if that's not guaranteed by the problem.)
bitsn could be a generator expression, and doesn't need to be in a variable; you'll have to measure whether that's faster.
More fundamental improvements:
If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm by considering what the bits mean. A couple of hours thinking could potentially let you process just the 6000 lists, rather than all 2³² integers.
As others have written, if you're after pure speed, python is not the ideal language. That depends on the balance of how much time you want to spend on programming vs on running the program.
Side note:
Are the 32-bit integers signed or unsigned?

Python list of objects sharing reference to other objects?

I have the following code, which on first glance should produce 10 Jobs with 3 Tasks each.
class Job:
id = None
tasks = {}
class Task:
id = None
cnt = 0
jobs = []
for i in range(0, 10):
job = Job()
job.id = i
for ii in range(0, 3):
task = Task()
task.id = cnt
job.tasks[task.id] = task
cnt += 1
jobs.append(job)
for job in jobs:
print("job {}, tasks: {}".format(job.id, job.tasks.keys()))
The result is somehow surprising - we have 30 Tasks shared by each Job:
job 0, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 1, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 2, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 3, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 4, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 5, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 6, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 7, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 8, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 9, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
Can someone explain what is going on in here?
UPDATE
tasks is a class variable shared by all the instances.
In your Job class you need to do this
class Job:
id = None
def __init__(self):
self.tasks = {}
tasks is in your class and each time you are appending to the class tasks which is shared by all the instances.

Best way to split this list into smaller lists?

I've been trying to wrap my head around the best way to split this list of numbers up that are ordered but broken up in sections. Ex:
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
I'd like the output to be this..
sliced_data = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],[29, 30, 31, 32, 33],[35, 36],[44, 45, 46, 47]]
I've been trying a while look until it's empty but that isn't working too well..
Edit:
for each_half_hour in half_hour_blocks:
if next_number != each_half_hour:
skippers.append(half_hour_blocks[:next_number])
del half_hour_blocks[:next_number]
next_number = each_half_hour + 1
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
>>> from itertools import groupby, count
>>> [list(g) for k,g in groupby(data, key=lambda i, c=count():i-next(c))]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [29, 30, 31, 32, 33], [35, 36], [44, 45, 46, 47]]
I don't see why a while-loop wouldn't work here, unless you're going for something more efficient or succinct.
Something like:
slice = [data.pop(0)]
sliced_data = []
while data:
if data[0] == slice[-1] + 1:
slice.append(data.pop(0))
else:
sliced_data.append(slice)
slice = [data.pop(0)]
sliced_data.append(slice)

How to split a list into N random-but-min-sized chunks

For example: I want to split range(37) in n=5 chunks, which each chunk having
len(chunk) >= 4.
>>> def divide(lst, min_size, split_size):
it = iter(lst)
from itertools import islice
size = len(lst)
for i in range(split_size - 1,0,-1):
s = random.randint(min_size, size - min_size * i)
yield list(islice(it,0,s))
size -= s
yield list(it)
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3], [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], [23, 24, 25, 26, 27], [28, 29, 30, 31], [32, 33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22], [23, 24, 25, 26], [27, 28, 29, 30, 31], [32, 33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28], [29, 30, 31, 32], [33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36]]
>>>
For example you could initialy set each of n chunks size to 4 and then calculate: r = (m=37 mod n), if m>=20. And then just add 1 to the first chunk and decrease r, 1 to second chunk and decrease r....and repeat until r = 0. Then you have your chunks and you can fill them.
def divide(val, num=5, minSize=4):
''' Divides val into # num chunks with each being at least of size minSize.
It limits max size of a chunk using math.ceil(val/(num-len(chunks)))'''
import random
import math
chunks = []
for i in xrange(num-1):
maxSize = math.ceil(val/(num-len(chunks)))
newSize = random.randint(minSize, maxSize)
val = val - newSize
chunks.append(newSize)
chunks.append(val)
return chunks
Calling divide with different parameters:
>>> divide(37,5,4)
>>> [7, 5, 4, 10, 11]
>>> divide(37,5,4)
>>> [4, 5, 4, 10, 14]
>>> divide(50,6,5)
>>> [6, 8, 8, 5, 9, 14]

Categories

Resources