Average length of sequence with consecutive values >100 (Python)

I am trying to identify the lengths of consecutive sequences within an array whose values are >100. I have found the longest sequence using the following code, but need to alter it to also find the average length.
def getLongestSeq(a, n):
    maxIdx = 0
    maxLen = 0
    currLen = 0
    currIdx = 0
    for k in range(n):
        if a[k] > 100:
            currLen += 1
            # New sequence, store
            # beginning index.
            if currLen == 1:
                currIdx = k
        else:
            if currLen > maxLen:
                maxLen = currLen
                maxIdx = currIdx
            currLen = 0
    if maxLen > 0:
        print('Index : ', maxIdx, ',Length : ', maxLen)
    else:
        print("No positive sequence detected.")

# Driver code
arrQ160 = resultsQ1['60s']
n = len(arrQ160)
getLongestSeq(arrQ160, n)
arrQ260 = resultsQ2['60s']
n = len(arrQ260)
getLongestSeq(arrQ260, n)
arrQ360 = resultsQ3['60s']
n = len(arrQ360)
getLongestSeq(arrQ360, n)
arrQ460 = resultsQ4['60s']
n = len(arrQ460)
getLongestSeq(arrQ460, n)
Output:
Index : 12837 ,Length : 1879
Index : 6179 ,Length : 3474
Index : 1164 ,Length : 1236
Index : 2862 ,Length : 617

This should work:
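def get_100_lengths(arr):
    # Encode each value as '1' if it is above 100, otherwise '0'.
    s = ''.join('1' if i > 100 else '0' for i in arr)
    # Splitting on '0' leaves the runs of '1's; drop the empty pieces.
    parts = s.split('0')
    return [len(p) for p in parts if len(p) > 0]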
After that you may calculate an average or do whatever you like.
The result:
>>> get_100_lengths( [120,120,120,90,90,120,90,120,120] )
[3, 1, 2]

That might be a little tricky. Use one variable to keep track of the summed length and another to count how many sequences occurred.
A sequence has terminated when the current number is not greater than 100 and the previous number was greater than 100.
def getLongestSeq(array):
    total_length = total_ct = 0
    last_is_greater = False
    for number in array:
        if number > 100:
            total_length += 1
            last_is_greater = True
        elif last_is_greater:
            # A run of values > 100 has just ended.
            total_ct += 1
            last_is_greater = False
    if last_is_greater:
        # Count a run that extends to the end of the array.
        total_ct += 1
    # Avoid dividing by zero when no value exceeds 100.
    return round(total_length / total_ct) if total_ct else 0
Did not test this code, please comment if there is any issue

You want to find all the sequences, take their lengths, and get the average. Each of those steps is relatively straightforward.
items = [1, 101, 1, 101, 101, 1, 101, 101, 101, 1]
Finding sequences: use groupby.
from itertools import groupby
groups = groupby(items, lambda x: x > 100) # (False, [1]), (True, [101]), ...
Find lengths (careful, each group is an iterator, not a list, so len() does not work on it):
lens = [sum(1 for _ in g) for k, g in groups if k] # [1, 2, 3]
Find average (assumes at least one):
avg = float(sum(lens)) / len(lens) # 2.0
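Putting the three steps together, a minimal sketch might look like the following. The function name average_run_length and the threshold parameter are illustrative choices, not part of the answer above, and returning 0.0 when no run exists is an assumption about the desired behaviour.

```python
from itertools import groupby


def average_run_length(values, threshold=100):
    """Average length of the runs of consecutive values above threshold."""
    # groupby yields (key, iterator) pairs; count each iterator's items
    # because the groups themselves have no len().
    lengths = [
        sum(1 for _ in run)
        for above, run in groupby(values, key=lambda v: v > threshold)
        if above
    ]
    # Assumption: report 0.0 when there is no run at all.
    return sum(lengths) / len(lengths) if lengths else 0.0


items = [1, 101, 1, 101, 101, 1, 101, 101, 101, 1]
print(average_run_length(items))  # 2.0
```

The same function can be called once per array (for example on each of the resultsQ1..resultsQ4 columns from the question).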

Related

How can i count the first and the last index of max sequence?

import random

monets = []
for i in range(20):
    choices = ['Tails', 'Eagle']
    monets.append(random.choice(choices))
cnt = 0
prev = 0
for i, e in enumerate(monets):
    if e == 'Eagle':
        cnt += 1
    if e == 'Eagle' and i == len(monets) - 1 and cnt > prev:
        prev = cnt
    elif e != 'Eagle':
        if prev < cnt:
            prev = cnt
        cnt = 0
print(monets)
print(prev)
My code calculates the longest sequence of 'Eagle' in a randomly generated list, but I am stuck on how to calculate the first and last index of this sequence. I figured that enumerate might help, but I got mixed up. Example: ['Tails', 'Eagle','Eagle','Tails','Eagle'] => output: 1,2
This should work. It is a simple algorithm, and you don't need any sophisticated libraries:
m = 0   # length of the longest run so far
c = 0   # length of the current run
p = -1  # index at which the longest run ends
for i, s in enumerate(monets):
    if s == 'Eagle':
        c += 1
    else:
        c = 0
    if c > m:
        m = c
        p = i
print('max Eagle:', m, 'from:', p + 1 - m, 'to:', p)
You could also use itertools.groupby to get groups of consecutive "Eagles". Combine that with enumerate, as in your approach, to pair them with the indices, and use max to find the longest sequence. Finally, get the indices from the first and last elements of that list.
>>> from itertools import groupby
>>> monets = ['Tails', 'Eagle','Eagle','Tails','Eagle']
>>> max((list(g) for k, g in groupby(enumerate(monets), key=lambda x: x[1]) if k == "Eagle"), key=len)
[(1, 'Eagle'), (2, 'Eagle')]
>>> _[0][0], _[-1][0]
(1, 2)
Just reading your code, it looks like you've got the following computation working (i.e. generally correct, but I didn't actually run it and test for bugs):
['Tails', 'Eagle','Eagle','Tails','Eagle'] # monets list
[ 0, 1, 2, 0, 1] # 'Eagle' sequence lengths
There are a few different ways to do what you want, but continuing on your existing methodology, you can indeed use enumerate to generate the following:
[ (0, 0), (1, 1), (2, 2), (3, 0), (4, 1)] # seq lengths from before, enumerated
Where each pair represents: (index, length)
From that, find the pair with the largest length, and you'll have the end index of the sequence, in this case: (2, 2).
The first instance of length == 1, searching backwards from the end index, will give you the start index.
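As a rough sketch of that idea: the helper name longest_run_bounds is hypothetical, and instead of searching backwards for length == 1 it computes the start arithmetically from the end index and the run length, which gives the same position.

```python
def longest_run_bounds(items, target):
    """Return (start, end) of the first longest run of target, or None."""
    # Running length of the current run of target at each position,
    # e.g. [0, 1, 2, 0, 1] for the example list.
    lengths = []
    run = 0
    for item in items:
        run = run + 1 if item == target else 0
        lengths.append(run)
    if not lengths or max(lengths) == 0:
        return None  # assumption: no run at all is reported as None
    # The (index, length) pair with the largest length marks the end of the run.
    end, longest = max(enumerate(lengths), key=lambda pair: pair[1])
    return end - longest + 1, end


monets = ['Tails', 'Eagle', 'Eagle', 'Tails', 'Eagle']
print(longest_run_bounds(monets, 'Eagle'))  # (1, 2)
```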
Sidenote: @tobias_k's answer is written in a more functional style (which I also personally prefer). It's a different methodology than the one you started with, but I highly recommend learning it. Here is that method written more (IMO) readably:
import itertools as it
monets = ['Tails', 'Eagle','Eagle','Tails','Eagle']
grouped = it.groupby(enumerate(monets), key=lambda pair: pair[1])
eagle_seqs = [list(seq) for v, seq in grouped if v == 'Eagle']
longest_seq = max(eagle_seqs, key=len)
seq_idxs = [i for i, _ in longest_seq]
start_idx, end_idx = seq_idxs[0], seq_idxs[-1]
Another option uses NumPy and pandas:
import random
import numpy as np
import pandas as pd
monets = []
for i in range(20):
    choices = ['Tails', 'Eagle']
    monets.append(random.choice(choices))
The only additional step is to encode the sequence as numeric values and then find the longest contiguous run of indices:
encode_ = {'Tails': 0, 'Eagle': 1}
df = pd.DataFrame(monets).replace(encode_)
A = np.where(df == 1)[0]
result = max(np.split(A, np.where(np.diff(A) != 1)[0] + 1), key=len).tolist()
start_idx, end_idx = result[0],result[-1]
Using a down-to-earth approach (it returns the position of the first maximal sequence of consecutive terms):
lst = ['Tails', 'Eagle', 'Eagle','Tails', 'Eagle', 'Eagle','Eagle', 'Eagle', 'Tails', 'Eagle', 'Eagle','Tails']
index, counter = -1, 0
tmp_i, tmp_c = -1, 0
for i, v in enumerate(lst):
    if v == 'Eagle':
        # tmp-update
        tmp_c += 1
        if tmp_i == -1:
            tmp_i = i
    else:
        if tmp_c > counter:
            # global update
            counter = tmp_c
            index = tmp_i
        # reset
        tmp_i, tmp_c = -1, 0
# final check for occurrence of max sequence at the end of the list
if tmp_c > counter:
    # global update
    counter = tmp_c
    index = tmp_i
boundaries_max_seq = (index, index + counter - 1)
print(boundaries_max_seq)
# (4, 7)

How to split a series by the longest repetition of a number in python?

df = pd.DataFrame({
    'label':[f"subj_{i}" for i in range(28)],
    'data':[i for i in range(1, 14)] + [1,0,0,0,2] + [0,0,0,0,0,0,0,0,0,0]
})
I have a dataset that looks something like this.
I want to cut it at where the longest repetitions of 0s occur, so I want to cut at index 18, but I want to leave index 14-16 intact. So far I've tried stuff like:
# Counters
cad_recorder = 0
new_index = []
for i, row in tqdm(temp_df.iterrows()):
    if row['cadence'] == 0:
        cad_recorder += 1
        new_index.append(i)
But obviously that won't work, since the indices will be rewritten at each occurrence of zero.
I also tried a dictionary, but I'm not sure how to compare previous and next values using iterrows.
I also took the rolling mean for X rows at a time, and if it's zero then I got an index. But then I got stuck at actually inferring the range of indices, or finding the longest sequence of zeroes.
Edit: A friend of mine suggested the following logic, which gave the same results as @shubham-sharma's answer. That poster's solution is much more pythonic and elegant.
def find_longest_zeroes(df):
    '''
    Finds the index at which the longest repetition of <=1 values begins
    '''
    current_length = 0
    max_length = 0
    start_idx = 0
    max_idx = 0
    for i in range(len(df['data'])):
        # column 9 is presumably where the values sit in the full dataset
        if df.iloc[i,9] <= 1:
            if current_length == 0:
                start_idx = i
            current_length += 1
            if current_length > max_length:
                max_length = current_length
                max_idx = start_idx
        else:
            current_length = 0
    return max_idx
The code I went with, following @shubham-sharma's solution:
cut_us_sof = {}
og_df_sof = pd.DataFrame()
cut_df_sof = pd.DataFrame()
for lab in df['label'].unique():
    temp_df = df[df['label'] == lab].reset_index(drop=True)
    mask = temp_df['data'] <= 1 # some values in actual dataset were 0.0000001
    counts = temp_df[mask].groupby((~mask).cumsum()).transform('count')['data']
    idx = counts.idxmax()
    # my dataset's trailing zeroes are usually after 200th index. But I also didn't want to remove trailing zeroes < 500 in length
    if (idx > 2000) & (counts.loc[idx] > 500):
        cut_us_sof[lab] = idx
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        og_df_sof = pd.concat([og_df_sof, temp_df])
        cut_df_sof = pd.concat([cut_df_sof, temp_df.iloc[:idx,:]])
We can use boolean masking and cumsum to identify the blocks of zeros, then groupby and transform these blocks using count followed by idxmax to get the starting index of the block having the maximum consecutive zeros
m = df['data'].eq(0)
idx = m[m].groupby((~m).cumsum()).transform('count').idxmax()
print(idx)
18
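To see what each step of that one-liner does, here is a plain-Python sketch of the same block-labelling idea, using itertools.accumulate in place of cumsum. It is only an illustration under that substitution, not the pandas implementation, and the variable names are made up for this example.

```python
from collections import Counter
from itertools import accumulate

data = list(range(1, 14)) + [1, 0, 0, 0, 2] + [0] * 10

# Boolean mask: True where the value is zero (the role of m above).
is_zero = [v == 0 for v in data]

# Running count of non-zero values (the role of (~m).cumsum()).
# It only increases on non-zero rows, so every zero inside one
# consecutive block shares the same label.
block_id = list(accumulate(not z for z in is_zero))

# Size of each block of zeros (the role of groupby(...).transform('count')).
sizes = Counter(b for b, z in zip(block_id, is_zero) if z)

# Largest block, then the first zero position carrying that label (idxmax).
best_block, best_len = max(sizes.items(), key=lambda kv: kv[1])
start = next(i for i, (b, z) in enumerate(zip(block_id, is_zero))
             if z and b == best_block)
print(start, best_len)  # 18 10
```

The three zeros at index 14-16 form their own block of size 3, which is why they are left intact.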

Summation from sub list

If n = 4, m = 3, I have to select 4 elements (basically n elements) from a list from start and end. From below example lists are [17,12,10,2] and [2,11,20,8].
Then between these two lists I have to select the highest value element and after this the element has to be deleted from the original list.
The above step has to be performed m times and take the summation of the highest value elements.
A = [17,12,10,2,7,2,11,20,8], n = 4, m = 3
O/P: 20+17+12=49
I have written the following code. However, it performs poorly and times out for larger lists. Could you please help?
A = [17,12,10,2,7,2,11,20,8]
m = 3
n = 4
scoreSum = 0
count = 0
firstGrp = []
lastGrp = []
while(count<m):
    firstGrp = A[:n]
    lastGrp = A[-n:]
    maxScore = max(max(firstGrp), max(lastGrp))
    scoreSum = scoreSum + maxScore
    if(maxScore in firstGrp):
        A.remove(maxScore)
    else:
        # index of the last occurrence of maxScore in A
        ai = len(A) - 1 - A[::-1].index(maxScore)
        A.pop(ai)
    count = count + 1
    firstGrp.clear()
    lastGrp.clear()
print(scoreSum)
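I would do it this way; you can generalize it later. Note that this simply sums the largest values of the whole list and ignores the restriction to the first and last n elements, so it can differ from the required result on other inputs:
a = [17,12,10,2,7,2,11,20,8]
a.sort(reverse=True)
sums=0
for i in range(3):
    sums +=a[i]
print(sums)
If you are concerned about performance, consider a specialized library such as numpy, which can be much faster.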
A = [17,12,10,2,7,2,11,20,8]
n = 4
m = 3
score = 0
for _ in range(m):
    # candidates: the first n and the last n elements, paired with their indices in A
    sublist = A[:n] + A[-n:]
    subidx = [x for x in range(n)] + [x for x in range(len(A) - n, len(A))]
    sub = zip(sublist, subidx)
    maxval = max(sub, key=lambda x: x[0])
    score += maxval[0]
    del A[maxval[1]]
print(score)
Your method uses a lot of max() calls. Combining the slices of the front and back lists reduces the number of max() searches to one pass, plus a second pass to find the index at which the maximum occurs so it can be removed from the list.

Find longest sequence of 0's in the integer list

A = [1,2,0,0,3,4,5,-1,0,2,-1,-3,0,0,0,0,0,0,0,0,-2,-3,-4,-5,0,0,0]
Return initial and ending index of longest sequence of 0's in the list.
The longest sequence of 0's in the list above is 0,0,0,0,0,0,0,0, so it should return 12,19 as the starting and ending index. Please help with a one-line Python solution.
I tried:
k = max(len(list(y)) for (c,y) in itertools.groupby(A) if c==0)
print(k)
which returns 8 as the max length.
Now, how do I find the start and end index of the longest sequence?
You can first use enumerate to pair each item with its index,
then itertools.groupby(list,operator.itemgetter(1)) to group by item,
keep only the 0s using list(y) for (x,y) in list if x == 0,
and finally max(list, key=len) to get the longest sequence.
import itertools,operator
r = max((list(y) for (x,y) in itertools.groupby((enumerate(A)),operator.itemgetter(1)) if x == 0), key=len)
print(r[0][0]) # prints 12
print(r[-1][0]) # prints 19
You can try this:
A = [1,2,0,0,3,4,5,-1,0,2,-1,-3,0,0,0,0,0,0,0,0,2,-3,-4,-5,0,0,0]
count = 0
prev = 0
indexend = 0
for i in range(0,len(A)):
    if A[i] == 0:
        count += 1
    else:
        if count > prev:
            prev = count
            indexend = i
        count = 0
print("The longest sequence of 0's is "+str(prev))
print("index start at: "+ str(indexend-prev))
print("index ends at: "+ str(indexend-1))
Output:
The longest sequence of 0's is 8
index start at: 12
index ends at: 19
A nice concise native Python approach (it returns the length of the longest run):
target = 0
A = [1,2,0,0,3,4,5,-1,0,2,-1,-3,0,0,0,0,0,0,0,0,2,-3,-4,-5,0,0,0]
def longest_seq(A, target):
    """ input list of elements, and target element, return length of longest sequence of target """
    cnt, max_val = 0, 0 # running count, and max count
    for e in A:
        cnt = cnt + 1 if e == target else 0 # add to or reset running count
        max_val = max(cnt, max_val) # update max count
    return max_val
Now that you have the length, find that k-length sequence of 0's in the original list. Expanding the stuff you'll eventually work into one line:
# k is given in your post
k_zeros = [0]*k
for i in range(len(A)-k+1):  # +1 so that a run ending at the last element is checked too
    if A[i:i+k] == k_zeros:
        break
# i is the start index; i+k-1 is the end
Can you wrap this into a single statement now?
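One possible way to collapse that loop into a single statement is a generator expression passed to next(). This is only a sketch: it assumes A contains at least one 0 (otherwise max() raises ValueError), and it computes k the same way the question does.

```python
from itertools import groupby

A = [1, 2, 0, 0, 3, 4, 5, -1, 0, 2, -1, -3,
     0, 0, 0, 0, 0, 0, 0, 0, 2, -3, -4, -5, 0, 0, 0]

# Length of the longest run of zeros, as computed in the question.
k = max(len(list(run)) for value, run in groupby(A) if value == 0)

# First index at which a window of k consecutive zeros starts.
start = next(i for i in range(len(A) - k + 1) if A[i:i + k] == [0] * k)
end = start + k - 1
print(start, end)  # 12 19
```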
Ok, as one long disgusting line!
"-".join([sorted([list(y) for c,y in itertools.groupby([str(v)+"_"+str(i) for i,v in enumerate(A)], lambda x: x.split("_")[0]) if c[0] == '0'],key=len)[-1][a].split("_")[1] for a in [0,-1]])
It keeps track of indices by turning [1,2,0...] into ["1_0","2_1","0_2",..] and then doing some splitting and parsing.
Yes it's very ugly and you should go with one of the other answers but I wanted to share
I submitted this solution (C#) to Codility, where it scored 100 percent.
class Solution {
    public int solution(int N) {
        int i = 0;
        int gap = 0;
        bool startZeroCount = false;
        List<int> binaryArray = new List<int>();
        while (N > 0)
        {
            binaryArray.Add(N % 2);
            N = N / 2;
            i++;
        }
        List<int> gapArr = new List<int>();
        for (int j = i-1; j >= 0; j--)
        {
            if (binaryArray[j] == 1)
            {
                if(startZeroCount)
                {
                    gapArr.Add(gap);
                    gap = 0;
                }
                startZeroCount = true;
            }
            else if(binaryArray[j] == 0)
            {
                if (startZeroCount)
                    gap++;
            }
        }
        gapArr.Sort();
        if (gapArr.Count != 0)
            return gapArr[gapArr.Count - 1];
        else return 0;
    }
}
A = [1,2,0,0,3,4,5,-1,0,2,-1,-3,0,0,0,2,-3,-4,-5,0,0,0,0]
count = 0
prev = 0
indexend = 0
indexcount = 0
for i in range(0,len(A)):
    if A[i] == 0:
        count += 1
        indexcount = i
    else:
        if count > prev:
            prev = count
            indexend = i
        count = 0
if count > prev:
    prev = count
    # indexend is one past the last zero, matching the in-loop case
    indexend = indexcount + 1
print("The longest sequence of 0's is "+str(prev))
print("index start at: "+ str(indexend-prev))
print("index ends at: "+ str(indexend-1))
This also handles the case where the longest sequence of 0's is at the end of the list.
Output
The longest sequence of 0's is 4
index start at: 19
index ends at: 22
If you would like to avoid Python-level iteration completely, you can do it with NumPy. For very long sequences, Python for loops may be relatively slow, whereas this method uses pre-compiled C loops under the hood. The disadvantage is that it makes several passes over the data. Nonetheless, the algorithm below should be faster overall on longer sequences.
import numpy as np
def longest_sequence(bool_array):
    where_not_true = np.where(~bool_array)[0]
    lengths_plus_1 = np.diff(np.hstack((-1,where_not_true,len(bool_array))))
    index = np.cumsum(np.hstack((0,lengths_plus_1)))
    start_in_lngth = np.argmax(lengths_plus_1)
    start = index[ start_in_lngth]
    length = lengths_plus_1[start_in_lngth] - 1
    return start, length
t = np.array((0,1,0,1,1,1,0,0,1,1,0,1))
print(longest_sequence(t==0))
print(longest_sequence(t==1))
p = np.array((0,0,0,1,0,1,1,1,0,0,0,1,1,0,1,1,1,1))
print(longest_sequence(p==0))
print(longest_sequence(p==1))

Count consecutive characters

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?
At first, I thought I could do something like:
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"
So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.
In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.
How could I do this with just built-in functions?
Consecutive counts:
You can use itertools.groupby:
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
After which, result looks like:
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
And you could format with something like:
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
Total counts:
Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
from collections import Counter
s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
Your method:
As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
The only problem is that the final run is never appended, because the loop ends before its else branch can fire, so you would have to special-case it. That can be fixed with itertools.zip_longest:
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
counts = []
count = 1
for a, b in pairwise(s):
    ...
A solution "that way", with only basic statements:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
        if word[i-1]==word[i]:
            count+=1
        else :
            length += word[i-1]+" repeats "+str(count)+", "
            count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)
Output :
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'and 1 repeats 1'
Totals (without sub-groupings)
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
print(distros)
I'll provide a brief explanation for the interesting lines.
distros = { c:1 for c in charseq }
The line above is a dictionary comprehension. It iterates over the characters in charseq and creates a key/value pair for each one, where the key is the character and the value is its count, initialized to 1.
Then comes the loop:
for c in range(len(charseq)-1):
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1
You can see the program output below with the correct counts:
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:
sum(word[i] == word[i+1] for i in range(len(word)-1))
For word = '1000' this produces the sum of (False, True, True), where False is 0 and True is 1, which is what you're after.
For an empty word, range(len(word)-1) is range(-1), which is already empty, but you can make the guard explicit:
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
And this can be improved with zip:
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
If we want to count consecutive characters without looping, we can make use of pandas:
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32
This is my simple code for finding the maximum number of consecutive 1's in a binary string in Python 3:
count = 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count
maxcount
There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4', '5']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # counts
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
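The same boundary-difference idea can be wrapped in a small function that pairs each character with its run length. This is a sketch; the name run_lengths and the empty-string guard are additions for illustration, since w[-1] in the snippets above would fail on an empty string.

```python
def run_lengths(w):
    """Return [(char, count), ...] for each run of equal characters in w."""
    if not w:
        return []  # assumption: an empty input yields no runs
    # Positions where a new run starts, plus the end of the string.
    bounds = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]] + [len(w)]
    # Each run spans bounds[j] .. bounds[j+1]; its length is the difference.
    return [(w[a], b - a) for a, b in zip(bounds, bounds[1:])]


print(run_lengths("111000222334455555"))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```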
A one liner that returns the amount of consecutive characters with no imports:
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
That returns the amount of times any repeated character in a list is in a consecutive run of characters.
alternatively, this accomplishes the same thing, albeit much slower:
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
In terms of performance, I ran them with
import copy, timeit, urllib.request
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
which resulted in:
12.528256356999918
5.351301653001428
This method can definitely be improved, but without using any external libraries, this was the best I could come up with.
In Python:
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
    current = loop
    count = count + 1
    if index == len(your_string)-1:
        print(f"{count}{current}", end ='')
        break
    if your_string[index+1] != current:
        print(f"{count}{current}",end ='')
        count = 0
    continue
This will output
5w1e4a2w4b1n
# I wrote the code using simple loops and if statements
s='feeekksssh' #len(s) = 10
count=1 #f:1, e:3, k:2, s:3, h:1
l=[]
for i in range(1,len(s)): #range(1,10)
    if s[i-1]==s[i]:
        count = count+1
    else:
        l.append(count)
        count=1
    if i == len(s)-1: #To check the last character sequence we need to loop in reverse order
        reverse_count=1
        for i in range(-1,-(len(s)),-1): #Looping only for the last character
            if s[i] == s[i-1]:
                reverse_count = reverse_count+1
            else:
                l.append(reverse_count)
                break
print(l)
Today I had an interview and was asked the same question. I struggled with this first attempt:
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
Sadly this solution always gave unexpected results on edge cases (can anyone fix the code? maybe I need to post another question), and I eventually ran out of time in the interview.
After the interview I calmed down and soon arrived at what I think is a stable solution (though I like the groupby one best).
s = 'abbcccda'
olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1
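For what it's worth, the first attempt can probably be repaired by flushing the previous run before counting the new character, and by flushing once more after the loop. The sketch below is one possible fix, not the only one; the function name compress is made up for this example.

```python
def compress(s):
    """Encode runs of equal characters as <char><count>, e.g. 'abb' -> 'a1b2'."""
    res = ''
    old = ''
    cnt = 0
    for c in s:
        # A different character ends the previous run: emit it first.
        if c != old and cnt:
            res += f'{old}{cnt}'
            cnt = 0
        old = c
        cnt += 1
    # The final run is never followed by a change, so emit it here.
    if cnt:
        res += f'{old}{cnt}'
    return res


print(compress('abbcccda'))  # a1b2c3d1a1
```

The two differences from the first attempt are the order of operations inside the loop (emit, then count) and the extra flush after the loop for the last run.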
Here is my simple solution:
def count_chars(s):
    size = len(s)
    count = 1
    op = ''
    for i in range(1, size):
        if s[i] == s[i-1]:
            count += 1
        else:
            op += "{}{}".format(count, s[i-1])
            count = 1
    if size:
        op += "{}{}".format(count, s[size-1])
    return op
data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
    if data_input[start] == data_input[end]:
        end = end + 1
    if end == len(data_input):
        value = data_input[start:end]
        temp_dict[value] = len(value)
        break
    if data_input[start] != data_input[end]:
        value = data_input[start:end]
        temp_dict[value] = len(value)
        start = end
print(temp_dict)
PROBLEM: we need to count consecutive characters and return the characters with their counts.
def countWithString(input_string:str)-> str:
    count = 1
    output = ''
    for i in range(1,len(input_string)):
        if input_string[i]==input_string[i-1]:
            count +=1
        else:
            output += f"{count}{input_string[i-1]}"
            count = 1
    # Add the count for the final run (the else branch never fires for it)
    output += f"{count}{input_string[-1]}"
    return output
countWithString('aaabbbaabbcc')
input:'aaabbbaabbcc'
output:'3a3b2a2b2c'
Time Complexity: O(n)
Space Complexity: O(1) extra, apart from the output string, which can grow to O(n)
temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
    counter = 0
    temp_list = []
    for i in range(len(input_str)):
        if i==0:
            counter+=1
        elif input_str[i]== input_str[i-1]:
            counter+=1
            if i == len(input_str)-1:
                temp_list.extend([input_str[i - 1], str(counter)])
        else:
            temp_list.extend([input_str[i-1],str(counter)])
            counter = 1
    print("".join(temp_list))
consecutive_charcounter(temp_str)
