Validating the value of several variables - python

What I am after: The user is allowed to input only 0 or 1 (for a total of 4 variables). If the user inputs for example 2, 1, 1, 0 it should throw an error saying Only 0 and 1 allowed.
What I've tried so far:
if (firstBinary != 0 or firstBinary != 1 and secondBinary != 0
        or secondBinary != 1 and thirdBinary != 0 or thirdBinary != 1
        and forthBinary != 0 or forthBinary != 1):
    print('Only 0 and 1 allowed')
else:
    print('binary to base 10:', result)
Problem: When I use such a statement, I get either the result even when I input for example 5, or I get 'only 0 and 1 allowed' even though I wrote all 1 or 0.
I found this which seemed to be what I was after, but it is still not working like I want it to:
if 0 in {firstBinary, secondBinary, thirdBinary, forthBinary} or 1 in \
{firstBinary, secondBinary, thirdBinary, forthBinary}:
print("Your result for binary to Base 10: ", allBinaries)
else:
print('Only 0 and 1 allowed')
This code basically gives me the same result as what I get with the first code sample.

Use any:
v1, v2, v3, v4 = 0, 1, 1, 2
if any(x not in [0, 1] for x in [v1, v2, v3, v4]):
    print("bad")
Of course, if you use a list it will look even better:
inputs = [1, 1, 0, 2]
if any(x not in [0, 1] for x in inputs):
    print("bad")

This is due to operator precedence in Python. The and operator has higher precedence than the or operator. From lowest to highest precedence, the relevant operators are:
or
and
not
!=, ==
(Source: https://docs.python.org/3/reference/expressions.html#operator-precedence)
So, Python interprets your expression like this (the brackets are to clarify what is going on):
if (firstBinary != 0 or (firstBinary != 1 and secondBinary != 0) or
        (secondBinary != 1 and thirdBinary != 0) or
        (thirdBinary != 1 and forthBinary != 0) or forthBinary != 1):
    ...
This results in a different logic than what you want. On top of that, a test such as firstBinary != 0 or firstBinary != 1 is true for every value, because no number equals both 0 and 1; an invalid value is one that differs from 0 and also differs from 1. There are 2 possible solutions to this. The first one is to fix the individual tests and add brackets to make the expression unambiguous. This is quite tedious and long-winded:
if ((firstBinary != 0 and firstBinary != 1) or (secondBinary != 0 and secondBinary != 1) or
        (thirdBinary != 0 and thirdBinary != 1) or (forthBinary != 0 and forthBinary != 1)):
    print("Only 0 or 1 allowed")
The other approach is to use the built-in all function:
values = [firstBinary, secondBinary, thirdBinary, forthBinary]
if not all(x in (0, 1) for x in values):
    print("Only 0 or 1 allowed")
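As a small self-contained check of the points above (a sketch only: the helper name only_bits is made up for illustration, and the inputs are assumed to have been converted with int() already), a test of the form x != 0 or x != 1 holds for every x, whereas a membership test separates valid from invalid values:

```python
def only_bits(values):
    """Return True when every value is 0 or 1 (hypothetical helper)."""
    return all(v in (0, 1) for v in values)


# `x != 0 or x != 1` can never be False: no value equals both 0 and 1.
always_true = all((x != 0 or x != 1) for x in (0, 1, 2, 5))

for candidate in ([0, 1, 1, 0], [2, 1, 1, 0]):
    if only_bits(candidate):
        print(candidate, "is valid")
    else:
        print(candidate, "-> Only 0 and 1 allowed")
```

Keeping the check in one function means the number of inputs can change without touching the condition.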

I'd break it down into the two parts that you're trying to solve:
Is a particular piece of input valid?
Are all the pieces of input taken together valid?
>>> okay = [0,1,1,0]
>>> bad = [0,1,2,3]
>>> def validateBit(b):
... return b in (0, 1)
>>> def checkInput(vals):
... return all(validateBit(b) for b in vals)
...
>>> checkInput(okay)
True
>>> checkInput(bad)
False
>>>

values = [firstBinary, secondBinary, thirdBinary, forthBinary]
if set(values) - {0, 1}:
    print("Only 0 or 1, please")

Related

If, else return else value even when the condition is true, inside a for loop

Here is the function i defined:
def count_longest(field, data):
    l = len(field)
    count = 0
    final = 0
    n = len(data)
    for i in range(n):
        count = 0
        if data[i:i + l] is field:
            while data[i - l: i] == data[i:i + l]:
                count = count + 1
                i = i + 1
        else:
            print("OK")
        if final == 0 or count >= final:
            final = count
    return final
a = input("Enter the field - ")
b = input("Enter the data - ")
print(count_longest(a, b))
It works in some cases and gives incorrect output in most cases. I checked by printing the strings being compared, and even when they match, the loop prints "OK", which should only happen when the condition is not true! I don't get it!
Taking the simplest example: if I enter 'as' when prompted for field and 'asdf' when prompted for data, I should get count = 1, as the longest run of the substring 'as' in the string 'asdf' is one. But I still get final as 0 at the end of the program. I added the else statement just to check whether the if condition was being satisfied, and the program printed 'OK', meaning it was not, even though right at the beginning data[0 : 0 + 2] is equal to 'as', 2 being the length of the field.
There are a few things I notice when looking at your code.
First, use == rather than is to test for equality. The is operator checks if the left and right are referring to the very same object, whereas you want to properly compare them.
The following code shows that even numerical results that are equal might not be one and the same Python object:
print(2 ** 31 is 2 ** 30 + 2 ** 30) # <- False
print(2 ** 31 == 2 ** 30 + 2 ** 30) # <- True
(note: the first expression could either be False or True—depending on your Python interpreter).
Second, the while-loop looks rather suspicious. Once you know you have found your sequence "as" at position i, the loop only keeps going as long as that slice equals the slice just before it (data[i - l: i]), which is probably something else. So, a better way to write the while-loop might be like so:
while data[i: i + l] == field:
    count = count + 1
    i = i + l # <- increase by l (length of field) !
Finally, something that might be surprising: changing the variable i inside the while-loop has no effect on the for-loop. That is, in the following example, the output will still be 0, 1, 2, 3, ..., 9, although it looks like it should skip every other element.
for i in range(10):
    print(i)
    i += 1
It does not affect the outcome of the function, but when debugging you might observe that the function seems to go backward after having found a run and goes through parts of it again, resulting in additional "OK"s printed out.
UPDATE: Here is the complete function according to my remarks above:
def count_longest(field, data):
    l = len(field)
    count = 0
    final = 0
    n = len(data)
    for i in range(n):
        count = 0
        while data[i: i + l] == field:
            count = count + 1
            i = i + l
        if count >= final:
            final = count
    return final
Note that I made two additional simplifications. With my changes, you end up with an if and while that share the same condition, i.e:
if data[i:i+l] == field:
    while data[i:i+l] == field:
        ...
In that case, the if is superfluous since it is already included in the condition of while.
Secondly, the condition if final == 0 or count >= final: can be simplified to just if count >= final:.
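For comparison, here is a regex-based sketch of the same idea. It is an alternative technique, not the approach used above, and the name count_longest_re is made up for illustration. A lookahead is tried at every position so that runs starting inside an earlier match are still considered, which mirrors the for-loop visiting every index:

```python
import re


def count_longest_re(field, data):
    """Largest number of back-to-back repetitions of `field` found anywhere in `data`."""
    if not field:
        return 0
    # re.escape keeps characters such as '.' or '*' in `field` literal.
    # The lookahead matches at every index; group 1 is the greedy run starting there.
    run = re.compile(f"(?=((?:{re.escape(field)})+))")
    return max((len(m.group(1)) // len(field) for m in run.finditer(data)), default=0)


print(count_longest_re("as", "asdf"))
print(count_longest_re("ab", "ababxab"))
```

The empty-field guard is an added assumption; the loop version above does not define what should happen in that case.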

Formatting and transforming data in Python

I am trying to reformat somewhat inconsistent values (OCR) to a standard form based on a set of rules. These values come over typically as a fraction i.e. 4 3/4 but the values are sometimes polluted with other random characters i.e. 4 .3/4. The values can also be non fractional floats (4.75). The goal is to grab the values and produce the number; input = 'T 3/' output = 3. The values will never exceed 11.
I'm sure there's a better way to do this, but this is what I have so far, and it works on most but it doesn't catch everything. Any help to help handle exceptions like 'T 3/' would be appreciated.
a = ' Y 3/'
def get_num(t):
return str(''.join(ele for ele in t if ele.isdigit()))
t = get_num(a)
y = int(t)
z = a.split('.')[0]
print len(t)
if '/' in a:
if int(str(y)[:1]) == 1 and int(str(t)[1]) != 1 and len(t) == 4 and t <1199:
print '{}{} {}/{}'.format(*t)
if int(str(y)[:1]) == 1 and int(str(t)[1]) != 1 and len(t) == 4 and t >1199:
print '{} {}/{}'.format(*t[1:])
if int(str(y)[:1]) != 1 and int(str(t)[1]) != 1 and len(t) == 3:
print '{} {}/{}'.format(*t)
if int(str(y)[:1]) != 1 and int(str(t)[1]) == 1 and len(t) == 3:
print '{} {}/{}'.format(*t)
elif '.' in a and '/' not in a:
if int(z) == 1 and len(z) == 1:
print a.replace(' ','')
if int(z) > 11 and len(z) > 1 and int(t[:1]) == 1:
print a.replace(' ','')[1:]
if int(z) != 1 and len(z) <= 2:
print a.replace(' ','')
elif '.' not in a and '/' not in a:
if int(str(y)[:1]) == 1 and int(str(t)[1]) != 1 and len(t) == 4 and t <1199:
print '{}{} {}/{}'.format(*t)
if int(str(y)[:1]) == 1 and int(str(t)[1]) != 1 and len(t) == 4 and t >1199:
print '{} {}/{}'.format(*t[1:])
if int(str(y)[:1]) != 1 and int(str(t)[1]) != 1 and len(t) == 3:
print '{} {}/{}'.format(*t)
if int(str(y)[:1]) != 1 and int(str(t)[1]) == 1 and len(t) == 3:
print '{} {}/{}'.format(*t)
else:
print(a)
Common Sample Inputs/Outputs (many more combos):
In: 4 3/4 | Out: 4 3/4
In: 4.75 | Out: 4.75
In: T 3/ | Out: 3
In: 14 3/4 | Out: 4 3/4 (leading 1 does not belong, since values never exceed 11)
In: 4 .33 | Out: 4.33
In: 3 2./3 | Out: 3 2/3
In: 3 ..33 | Out: 3.33
Essentially it's a fraction if there is a '/' in the string, even if it also contains a '.'. If there is no '/' but there is a '.', it's a decimal. If neither is present, there is an opportunity for some logic there as well.
Alright #FanScience, got an answer here. It works with all the inputs you gave, but it's slightly brittle -- happy to try to expand if you find there are inputs that it doesn't get along with.
First I imported two helpful libraries:
import re
from fractions import Fraction
Next I defined a dictionary with your inputs + expected outputs, so I could write a function that tested my methodology:
expected_results = {
"4 3/4": 4.75,
"4.75": 4.75,
"T 3/": 3,
"14 3/4": 4.75,
"4 .33": 4.33,
"3 2./3": 3.67,
"3 ..33": 3.33,
}
Next I got to writing the actual function -- let me know if you have any questions, as this is somewhat complex:
def generate_correct_float(to_convert):
    # Removes any alphabetical character (A, B, a, b...)
    to_convert = re.sub("[a-zA-Z]", "", to_convert)
    # Removes extra whitespace on either end
    to_convert = to_convert.strip()
    final_answer = 0
    if " " in to_convert:  # Two-part, i.e. 4 3/4
        primary_number, secondary_number = to_convert.split()
        final_answer += int(primary_number)
        if "/" in secondary_number:  # Fraction case
            secondary_number = secondary_number.replace(".", "")
            secondary_number = secondary_number.strip("/")
            secondary_number = Fraction(secondary_number)
        else:  # Decimal case
            secondary_number = Fraction(int(secondary_number.replace(".", "")), 100)
        final_answer += round(float(secondary_number), 2)
    else:  # Single element, i.e. 3/
        final_answer = to_convert.strip("/")
        final_answer = float(final_answer)
    # Eliminate cases where we're higher than 11
    while final_answer > 11:
        final_answer -= 10
    return final_answer
Finally I wrote a loop that tests inputs vs outputs and gives us a correct vs incorrect count:
incorrect = 0
for elem in expected_results:
    answer = generate_correct_float(elem)
    if answer != expected_results[elem]:
        print('------')
        print(elem)
        print(expected_results[elem])
        print(answer)
        incorrect += 1
correct = len(expected_results) - incorrect
print(f"correct: {correct}")
print(f"incorrect: {incorrect}")
Running this with our code gives us 7/7, all with floats!
Hopefully this helps -- let me know if there are any points of clarification I can make #FanScience

Finding the longest repetitive piece in a string python

I want to write a function "longest" where my input doc test looks like this (Python):
"""
>>> longest('1211')
1
>>> longest('1212')
2
>>> longest('212111212112112121222222212212112121')
2
>>> longest('1')
0
>>> longest('121')
0
>>> longest('12112')
0
"""
What I am trying to achieve is that for example in the first case the 1 is repeated in the back with "11" so the repeated part is 1 and this repeated part is 1 character long it is this length that this function should return.
So in the case of the second you got "1212" so the repeated part is "12" which is 2 characters long.
The tricky thing here is that the longest is "2222222" but this doesn't matter since it is not in the front nor the back. The solution for the last doc test is that 21 is being repeated which is 2 characters long.
The code I have created this far is following
import re
def repetitions(s):
r = re.compile(r"(.+?)\1+")
for match in r.finditer(s):
yield (match.group(1), len(match.group(0)) / len(match.group(1)))
def longest(s):
"""
>>> longest('1211')
1
"""
nummer_hoeveel_keer = dict(repetitions(s)) #gives a dictionary with as key the number (for doctest 1 this be 1) and as value the length of the key
if nummer_hoeveel_keer == {}: #if there are no repetitive nothing should be returnd
return 0
sleutels = nummer_hoeveel_keer.keys() #here i collect the keys to see which has has the longest length
lengtes = {}
for sleutel in sleutels:
lengte = len(sleutel)
lengtes[lengte] = sleutel
while lengtes != {}: #as long there isn't a match and the list isn't empty i keep looking for the longest repetitive which is or in the beginning or in the back
maximum_lengte = max(lengtes.keys())
lengte_sleutel = {v: k for k, v in lengtes.items()}
x= int(nummer_hoeveel_keer[(lengtes[maximum_lengte])])
achter = s[len(s) - maximum_lengte*x:]
voor = s[:maximum_lengte*x]
combinatie = lengtes[maximum_lengte]*x
if achter == combinatie or voor == combinatie:
return maximum_lengte
del lengtes[str(maximum_lengte)]
return 0
When the following doc test is run against this code
"""
>>> longest('12112')
0
"""
there is a KeyError at the line "del lengtes[str(maximum_lengte)]".
After a suggestion from #theausome I used his code as a base to work further with (see answer). This makes my code right now look like this:
def longest(s):
if len(s) == 1:
return 0
longest_patt = []
k = s[-1]
longest_patt.append(k)
for c in s[-2::-1]:
if c != k:
longest_patt.append(c)
else:
break
rev_l = list(reversed(longest_patt))
character = ''.join(rev_l)
length = len(rev_l)
s = s.replace(' ','')[:-length]
if s[-length:] == character:
return len(longest_patt)
else:
return 0
l = longest(s)
print l
Still there are some doc tests that are troubling me like for example:
>>>longest('211211222212121111111')
3 #I get 1
>>>longest('2111222122222221211221222112211')
4 #I get 1
>>>longest('122211222221221112111')
4 #I get 1
>>>longest('121212222112222112')
6 #I get 1
Anyone has ideas how to deal with/ approach this problem, maybe find a more graceful way around the problem ?
Try the below code. It works perfectly for your input doc tests.
def longest(s):
    if len(s) == 1:
        return 0
    longest_patt = []
    k = s[-1]
    longest_patt.append(k)
    for c in s[-2::-1]:
        if c != k:
            longest_patt.append(c)
        else:
            break
    rev_l = list(reversed(longest_patt))
    character = ''.join(rev_l)
    length = len(rev_l)
    s = s.replace(' ','')[:-length]
    if s[-length:] == character:
        return len(longest_patt)
    else:
        return 0
l = longest(s)
print(l)
Output:
longest('1211')
1
longest('1212')
2
longest('212111212112112121222222212212112121')
2
longest('1')
0
longest('121')
0
longest('12112')
0
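The additional doctests in the question (expected results 3, 4, 4 and 6) are consistent with reading the task as: find the longest block that occurs twice in a row at the very start or the very end of the string. That reading is an assumption, not something the question states outright, and the function name below is made up for illustration. Under that assumption, a brute-force sketch that tries the largest block length first could look like this:

```python
def longest_edge_repeat(s):
    """Length of the longest block that appears twice in a row at the start or end of s."""
    for k in range(len(s) // 2, 0, -1):
        starts_doubled = s[:k] == s[k:2 * k]
        ends_doubled = s[-k:] == s[-2 * k:-k]
        if starts_doubled or ends_doubled:
            return k
    return 0


for sample in ('1211', '1212', '12112', '121212222112222112'):
    print(sample, longest_edge_repeat(sample))
```

Each candidate length costs one pair of slice comparisons, so the whole search is quadratic in the worst case, which is fine for strings of this size.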

Calculate the total resistance of a circuit given in a string

I have really been struggling to solve this problem. This is the problem:
Given a string describing the circuit, calculate the total resistance
of the circuit.
Here is an example:
input: 3 5 S
expected output: 8
The operands in the string are trailed by the operator, denoting if the resistors are either in Series or Parallel. However let's analyze a more complicated circuit:
input: 3 5 S 0 P 3 2 S P
expected output: 0
Step by step:
The 3 5 S at the beginning of the input gives us 8 and hence the first intermediate step is the string 8 0 P 3 2 S P.
8 0 P gives us 0, as one resistor is short-circuited and consequently we get 0 3 2 S P.
3 2 S is 5, which leaves 0 5 P.
and finally 0 5 P is 0.
Here is my attempt. I tried using recursion as it seemed like a problem that can be solved that way. Firstly I wrote a helper function:
def new_resistance(a,b,c):
if c == '':
if int(a) == 0 or int(b) == 0:
return 0
else:
return 1/(1/int(a) + 1/int(b))
else:
return int(a) + int(b)
And the function that calculates the new resistance of the circuit:
def resistance(array):
if isinstance(array, int):
return array
else:
if isinstance(array,list):
temp = array
else:
temp = array.split(" ")
i = 0
while True:
try:
a = new_resistance(temp[i], temp[i+1], temp[i+2])
except Exception as e:
i += 1
if len(temp[i+3:]) == 0:
return resistance(new_resistance(temp[i], temp[i+1], temp[i+2]))
else:
return resistance(temp[:i] + [new_resistance(temp[i], temp[i+1], temp[i+2])] + temp[i+3:])
The idea behind the program is to start at the beginning of the list and calculate the resistance of the first three elements of the list, then to append them at the beginning of a new list (without the three elements) and call the function again with the new list. Do this until only a single integer remains and return the integers.
Any help is appreciated.
UPDATE:
The solution to the problem, using a stack and a parser similar to an RPN parser.
operator_list = set('PS')
def resistance(circuit):
temp = circuit.split(" ")
stack = []
for char in temp:
if char in operator_list:
a = new_resistance(stack.pop(), stack.pop(), char)
print(a)
stack.append(a)
else:
print(char)
stack.append(char)
return stack[-1]
def new_resistance(a,b,c):
if c == 'P':
if float(a) == 0 or float(b) == 0:
return 0
else:
return 1/(1/float(a) + 1/float(b))
else:
return float(a) + float(b)
circuit = '3 5 S 0 P 3 2 S P'
resistance(circuit)
# 3
# 5
# 8.0
# 0
# 0
# 3
# 2
# 5.0
# 0
The problem is that once you reach 0 3 2 S P, you cannot simply take the first 3 elements. You need to look for number number S_or_P, wherever it is in the string.
You can use a regex for this task:
import re
circuit = '3 5 S 0 P 3 2 S P'
pattern = re.compile(r'(\d+) +(\d+) +([SP])')
def parallel_or_serie(m):
a, b, sp = m.groups()
if sp == 'S':
return str(int(a) + int(b))
else:
if a == '0' or b == '0':
return '0'
else:
return str(1/(1/int(a) + 1/int(b)))
while True:
print(circuit)
tmp = circuit
circuit = re.sub(pattern, parallel_or_serie, circuit, count=1)
if tmp == circuit:
break
# 3 5 S 0 P 3 2 S P
# 8 0 P 3 2 S P
# 0 3 2 S P
# 0 5 P
# 0
Note that 1 1 P will output 0.5. You could replace int by float and modify the regex in order to parse floats.
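A minimal sketch of that float variant follows. The names NUMBER, TRIPLE, combine and resistance_from_string are made up for illustration, and the number pattern is an assumption: it accepts plain integers and simple decimals only, with no signs or exponents. Intermediate results are written back in fixed-point form so the pattern can read them again, which limits precision to ten decimal places:

```python
import re

NUMBER = r"\d+(?:\.\d+)?"  # assumption: unsigned integers or simple decimals only
TRIPLE = re.compile(rf"({NUMBER}) +({NUMBER}) +([SP])")


def combine(match):
    """Replace one 'a b S' or 'a b P' triple by its combined resistance."""
    a, b, op = float(match.group(1)), float(match.group(2)), match.group(3)
    if op == "S":
        value = a + b
    elif a == 0 or b == 0:
        value = 0.0  # a short circuit in parallel wins
    else:
        value = 1 / (1 / a + 1 / b)
    # Fixed-point text avoids exponent notation, which NUMBER would not match.
    return f"{value:.10f}"


def resistance_from_string(circuit):
    """Reduce the leftmost triple repeatedly until nothing changes."""
    while True:
        reduced = TRIPLE.sub(combine, circuit, count=1)
        if reduced == circuit:
            return float(circuit)
        circuit = reduced


print(resistance_from_string("3 5 S 0 P 3 2 S P"))
print(resistance_from_string("1 1 P"))
```

A malformed circuit ends up in float() with leftover tokens and raises ValueError, which is a crude but visible failure mode.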
Your program, or more specifically your parser, seems to be relying on the Reverse Polish Notation, which in turn is a small variant of the Normal Polish Notation. Simply put, the RPN is an abstract representation where the operators of an arithmetical expression follow their operands, unlike in the Normal Polish Notation where the operators precede their operands. Parsers based on this representation can be easily implemented by using stacks (and usually do not need to interpret parentheses).
If you are tasked with developing that parser you may get some input from the Wikipedia article I linked above.
Credits to #none who first recognized the RPN.
Old memories came to my mind. I was playing with the FORTH language on 8-bit computers in 1980s. OK, back to Python:
circuit = '3 5 S 0 P 3 2 S P'
stack = []
for n in circuit.split():
if n == 'S':
r1 = stack.pop()
r2 = stack.pop()
stack.append(r1+r2)
elif n == 'P':
r1 = stack.pop()
r2 = stack.pop()
stack.append(0.0 if (r1 == 0 or r2 == 0) else 1/(1/r1+1/r2))
else:
stack.append(float(n))
assert len(stack) == 1
print(stack[0])
Or, in the spirit of VPfB, for any combination of series/parallel (not only in pairs):
def calculateresistor(dataString):
    stack = []
    r = []
    circuit = dataString
    for n in circuit.split():
        if n == 'S':
            stackSize = len(stack)
            if stackSize >= 2:
                for k in range(0, stackSize - 1):
                    r.append(float(stack.pop()))
                    r.append(float(stack.pop()))
                    stack.append((r[-1] + r[-2]))
        elif n == 'P':
            stackSize = len(stack)
            if stackSize >= 2:
                for k in range(0, stackSize - 1):
                    r.append(float(stack.pop()))
                    r.append(float(stack.pop()))
                    r.append(0.0 if (r[-1] == 0 or r[-2] == 0) else (1/(1/r[-1] + 1/r[-2])))
                    stack.append(r[-1])
        else:
            stack.append(float(n))
    assert len(stack) == 1
    return(stack)

Count consecutive characters

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?
At first, I thought I could do something like:
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
while word[i] == word[i + 1]:
counter += 1
print counter * "0"
else:
counter = 1
print counter * "1"
So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.
In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.
How could I do this with just built-in functions?
Consecutive counts:
You can use itertools.groupby:
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
After which, result looks like:
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
And you could format with something like:
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
Total counts:
Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
from collections import Counter
s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
Your method:
As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
The only problem is that the loop ends before the final run is appended, so you'd have to special-case it afterwards. That can be fixed with itertools.zip_longest:
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
counts = []
count = 1
for a, b in pairwise(s):
    ...
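Putting the pieces together, the elided loop body can be filled in the same way as in the zip_longest version. This is only a sketch: the names pairwise_padded and run_lengths are made up for illustration, and the input is assumed not to contain None, since None is used as the padding value:

```python
import itertools


def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)


def run_lengths(s):
    """Return a list of (character, run length) tuples for consecutive runs in s."""
    counts = []
    count = 1
    for a, b in pairwise_padded(s):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts


print(run_lengths("111000222334455555"))
```

Because the final character is paired with the padding value, the last run is flushed inside the loop and needs no special case.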
A solution "that way", with only basic statements:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
for i in range(1,len(word)):
if word[i-1]==word[i]:
count+=1
else :
length += word[i-1]+" repeats "+str(count)+", "
count=1
length += ("and "+word[i]+" repeats "+str(count))
else:
i=0
length += ("and "+word[i]+" repeats "+str(count))
print (length)
Output :
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'
Totals (without sub-groupings)
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
print(distros)
I'll provide a brief explanation for the interesting lines.
distros = { c:1 for c in charseq }
The line above is a dictionary comprehension, and it basically iterates over the characters in charseq and creates a key/value pair for a dictionary where the key is the character and the value is the number of times it has been encountered so far.
Then comes the loop:
for c in range(len(charseq)-1):
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
if charseq[c] == charseq[c+1]:
distros[charseq[c]] += 1
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
# replacing vars for their values
if charseq[1] == charseq[1+1]:
distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
distros['b'] += 1
You can see the program output below with the correct counts:
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:
sum(word[i] == word[i+1] for i in range(len(word)-1))
This produces the sum of (False, True, True, False) where False is 0 and True is 1 - which is what you're after.
An empty word is already safe here, because range(-1) is empty and nothing gets indexed. If you prefer to make that explicit, clamp the range at 0:
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
And this can be improved with zip:
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
If we want to count consecutive characters without looping, we can make use of pandas:
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32
This is my simple code for finding the maximum number of consecutive 1's in a binary string in Python 3:
count = 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count
maxcount
There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4', '5']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # counts
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
A one liner that returns the amount of consecutive characters with no imports:
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
That returns, for each repeated character, the number of times it appears inside a consecutive run of characters.
Alternatively, this accomplishes the same thing, albeit much slower:
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
In terms of performance, I ran them with
import copy
import timeit
import urllib.request
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
which resulted in:
12.528256356999918
5.351301653001428
This method can definitely be improved, but without using any external libraries, this was the best I could come up with.
In python
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
current = loop
count = count + 1
if index == len(your_string)-1:
print(f"{count}{current}", end ='')
break
if your_string[index+1] != current:
print(f"{count}{current}",end ='')
count = 0
continue
This will output
5w1e4a2w4b1n
#I wrote the code using simple loops and if statement
s='feeekksssh' #len(s) = 10
count=1 #f:1, e:3, k:2, s:3, h:1
l=[]
for i in range(1,len(s)): #range(1,10)
    if s[i-1]==s[i]:
        count = count+1
    else:
        l.append(count)
        count=1
    if i == len(s)-1: #To check the last character sequence we need to loop in reverse order
        reverse_count=1
        for i in range(-1,-(len(s)),-1): #Looping only for last character
            if s[i] == s[i-1]:
                reverse_count = reverse_count+1
            else:
                l.append(reverse_count)
                break
print(l)
Today I had an interview and was asked the same question. I was struggling with the original solution in mind:
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
Sadly this solution kept giving unexpected results on edge cases (can anyone fix the code? Maybe I need to post another question), and in the end I ran out of time in the interview.
After the interview I calmed down and soon got a solution that I think is stable (though I like the groupby one best).
s = 'abbcccda'
olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1
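The first attempt can also be repaired while keeping its shape. The sketch below (the function name compress is made up for illustration) changes two things: the finished run is written out before the counter is reset, and the last run is flushed once the loop ends. The original counted the new character first and never wrote the final run, which is why neither 0 nor 1 worked as the reset value:

```python
def compress(s):
    """Run-length encode s as character followed by count, e.g. 'abb' -> 'a1b2'."""
    res = ''
    old = ''
    cnt = 0
    for c in s:
        if c != old:
            if cnt:  # skip the empty "run" before the first character
                res += f'{old}{cnt}'
            old = c
            cnt = 0
        cnt += 1
    if cnt:  # flush the final run, which the loop never reaches
        res += f'{old}{cnt}'
    return res


print(compress('abbcccda'))
```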
Here is my simple solution:
def count_chars(s):
size = len(s)
count = 1
op = ''
for i in range(1, size):
if s[i] == s[i-1]:
count += 1
else:
op += "{}{}".format(count, s[i-1])
count = 1
if size:
op += "{}{}".format(count, s[size-1])
return op
data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
if data_input[start] == data_input[end]:
end = end + 1
if end == len(data_input):
value = data_input[start:end]
temp_dict[value] = len(value)
break
if data_input[start] != data_input[end]:
value = data_input[start:end]
temp_dict[value] = len(value)
start = end
print(temp_dict)
PROBLEM: we need to count consecutive characters and return characters with their count.
def countWithString(input_string: str) -> str:
    count = 1
    output = ''
    for i in range(1, len(input_string)):
        if input_string[i] == input_string[i-1]:
            count += 1
        else:
            output += f"{count}{input_string[i-1]}"
            count = 1
    # Add the count of the last run (the else branch never runs for it, so it would otherwise be missing from the output string)
    output += f"{count}{input_string[-1]}"
    return output
countWithString('aaabbbaabbcc')
input: 'aaabbbaabbcc'
output: '3a3b2a2b2c'
Time Complexity: O(n)
Space Complexity: O(n) for the output string, O(1) extra beyond it
temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
counter = 0
temp_list = []
for i in range(len(input_str)):
if i==0:
counter+=1
elif input_str[i]== input_str[i-1]:
counter+=1
if i == len(input_str)-1:
temp_list.extend([input_str[i - 1], str(counter)])
else:
temp_list.extend([input_str[i-1],str(counter)])
counter = 1
print("".join(temp_list))
consecutive_charcounter(temp_str)
