Break a binary string down to segments - python

The task here is to break down a string 110011110110000 into a list:
['11', '00', '1111', '0', '11', '0000']
My solution is
str1='110011110110000'
seg = []
a0=str1[0]
seg0=''
for a in str1:
    print('a=',a)
    if a==a0:
        seg0=seg0+a
    else:
        print('seg0=',seg0)
        seg.append(seg0)
        seg0=a
    a0=a
seg.append(seg0)
print(seg)
It's ugly and I am sure you guys out there have a one-liner for this. Maybe regex?

You can use itertools.groupby:
str1='110011110110000'
from itertools import groupby
l = [v * len([*g]) for v, g in groupby(str1)]
print(l)
Prints:
['11', '00', '1111', '0', '11', '0000']
EDIT: version with regex:
str1='110011110110000'
import re
print([g[0] for g in re.findall(r'((\d)\2*)', str1)])

Here is a regex solution:
import re
result = [x[0] for x in re.findall(r'(([10])\2*)', str1)]
The regex (([10])\2*) finds a 0 or 1 and then keeps matching that same character. Because the pattern contains capturing groups, findall returns a tuple of all groups for each match, so we take the first element, group 1, which is the whole run (group 2 is just the ([10]) character).
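To see why the list comprehension is needed, here is a small sketch comparing what re.findall returns for this pattern with re.finditer, which yields match objects whose group(0) is the whole run:

```python
import re

str1 = '110011110110000'
pattern = r'(([10])\2*)'

# With two capturing groups, findall returns one (group1, group2) tuple per match
pairs = re.findall(pattern, str1)
print(pairs[:2])  # [('11', '1'), ('00', '0')]

# Keep only group 1, which is the whole run
runs = [whole for whole, _char in pairs]

# finditer yields match objects, so group(0) is the full match directly
runs_iter = [m.group(0) for m in re.finditer(pattern, str1)]

print(runs)               # ['11', '00', '1111', '0', '11', '0000']
print(runs == runs_iter)  # True
```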

Here is an iterative regex approach, using the simple pattern 1+|0+:
import re
str1 = "110011110110000"
pattern = re.compile(r'(1+|0+)')
result = []
for m in pattern.finditer(str1):
    result.append(m.group(0))
print(result)
This prints:
['11', '00', '1111', '0', '11', '0000']
Note that we might instead want to use re.split here. Before Python 3.7, re.split could not split on zero-width matches such as lookarounds, so it was not an option in older versions. In Python 3.7 and later, as in other languages such as Java, we can split on this pattern:
(?<=0)(?=1)|(?<=1)(?=0)
This nicely generates the list we expect.
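A minimal sketch of the split-based version, assuming Python 3.7 or newer (older versions do not split on empty matches):

```python
import re

str1 = '110011110110000'

# Zero-width split points: between a 0 and a 1, or between a 1 and a 0.
# Requires Python 3.7+, where re.split can split on empty matches.
boundary = r'(?<=0)(?=1)|(?<=1)(?=0)'
parts = re.split(boundary, str1)
print(parts)  # ['11', '00', '1111', '0', '11', '0000']
```

The lookbehind needs a character before the split point and the lookahead needs one after it, so no empty strings appear at either end of the result.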

One-line solution using groupby:
from itertools import groupby
text='1100111101100001'
sol = [''.join(group) for key, group in groupby(text)]
print(sol)
output
['11', '00', '1111', '0', '11', '0000', '1']
Not a regex solution, but an improvement on your code:
str1='110011110110000'
def func(string):
    if not string:
        return []
    tmp = string[0]
    res = []
    # start from the second character; the first one is already in tmp
    for v in string[1:]:
        if v == tmp[-1]:
            tmp += v
        else:
            res.append(tmp)
            tmp = v
    res.append(tmp)
    return res
print(func(str1))
Output:
['11', '00', '1111', '0', '11', '0000']

You can use the general regex (.)\1*
(.) - matches any single character and stores it in the first capturing group
\1* - repeats what was captured in the first capturing group zero or more times
The collection of full matches is your desired result.
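In Python, re.findall with this pattern returns only the captured character for each run, so a sketch that collects the whole matches uses re.finditer and group(0):

```python
import re

str1 = '110011110110000'

# (.) captures one character, \1* repeats that same character zero or more times
runs = [m.group(0) for m in re.finditer(r'(.)\1*', str1)]
print(runs)  # ['11', '00', '1111', '0', '11', '0000']

# For comparison, findall returns only group 1 (the repeated character)
print(re.findall(r'(.)\1*', str1))  # ['1', '0', '1', '0', '1', '0']
```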

Related

Python group repeating characters in list together

I want to split a string of 1s and 0s into a list of repeated groups of these characters.
Example:
m = '00001100011111000'
Goes to:
m = ['0000', '11', '000', '11111', '000']
How would I do this in efficient concise code? Would regex work for this?
Yes, you can use re. For example:
import re
m = "00001100011111000"
print([v for v, _ in re.findall(r"((.)\2*)", m)])
Prints:
['0000', '11', '000', '11111', '000']
Other regex:
print(re.findall(r"0+|1+", m))
Prints:
['0000', '11', '000', '11111', '000']
You can use itertools.groupby:
from itertools import groupby
m = '00001100011111000'
["".join(g) for _, g in groupby(m)]
Another method using itertools.groupby:
from itertools import groupby
["".join(g) for k, g in groupby(m)]
You can make use of re.finditer():
import re
m = "00001100011111000"
[s.group(0) for s in re.finditer(r"(\d)\1*", m)]
Output:
['0000', '11', '000', '11111', '000']

How to separate decimal numbers written back to back

How would it be possible to separate a string of values (in my case, only corresponding to Roman numeral values) into elements of a list?
'10010010010100511' -> [100, 100, 100, 10, 100, 5, 1, 1,]
I want to create something that goes like:
if it is a zero, append it to the current element
if it's not a zero, create a new element for it
You were on the right track with zeros; notice that every base Roman numeral is either a 1 or a 5 followed by some number of zeros. You can represent that as a very simple regex.
import re
s = '10010010010100511'
pattern = r"[15]0*"  # a 1 or 5 followed by zero or more 0s (no | inside a character class)
matches = re.finditer(pattern=pattern, string=s)
l = [match[0] for match in matches]
print(l) # ['100', '100', '100', '10', '100', '5', '1', '1']
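The question asked for integers rather than strings. A small sketch using the same idea (a 1 or 5 followed by zeros, written as [15]0*) converts each match with int():

```python
import re

s = '10010010010100511'

# Each Roman-numeral value is a 1 or 5 followed by zero or more 0s
values = [int(tok) for tok in re.findall(r'[15]0*', s)]
print(values)  # [100, 100, 100, 10, 100, 5, 1, 1]
```

Since the pattern has no capturing group, findall returns the full matches. Characters that do not fit the pattern (for example a leading 0) are silently skipped rather than reported.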
If for some reason you don't want to use regex, you can simply iterate over each character using the same principle:
string = '10010010010100511'
lst = []
for char in string:
    if char in ['1', '5']:
        lst.append(char)
    elif char == '0':
        lst[-1] += '0'
print(lst) # ['100', '100', '100', '10', '100', '5', '1', '1']
Code:
s='10010010010100511'
d=[]
c=0  # start index of the current number
for i in range(len(s)):  # basic idea: look at the NEXT character
    if i+1 < len(s):
        if s[i+1] != '0':  # if the next character is not zero,
            d.append(s[c:i+1])  # the current number ends here; add s[c:i+1] to d
            c = i+1
    else:
        d.append(s[c:])  # the last number runs to the end of the string
print(d)
Output:
['100', '100', '100', '10', '100', '5', '1', '1']

Regex for split or findall each digit python

What is the best solution to split this string into a list of runs of consecutive identical digits?
My solution:
>>> import re
>>> s = '2223334441214844'
>>> list(filter(None, re.split("(0+)|(1+)|(2+)|(3+)|(4+)|(5+)|(6+)|(7+)|(8+)|(9+)", s)))
['222', '333', '444', '1', '2', '1', '4', '8', '44']
A more flexible way is to use itertools.groupby, which is made to group consecutive equal items in iterables:
>>> s = '2223334441214844'
>>> import itertools
>>> [''.join(group) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The key would be the single key that is being grouped on (in your case, the digit). And the group is an iterable of all the items in the group. Since the source iterable is a string, each item is a character, so in order to get back the fully combined group, we need to join the characters back together.
You could also repeat the key for the length of the group to get this output:
>>> [key * len(list(group)) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
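The key/group pairs are also handy for run-length encoding. A minimal sketch (the function name run_lengths is hypothetical) that keeps each key together with the length of its run:

```python
from itertools import groupby

def run_lengths(s):
    """Return (character, run length) pairs for consecutive runs in s."""
    # sum(1 for _ in group) counts the run without building a list
    return [(key, sum(1 for _ in group)) for key, group in groupby(s)]

pairs = run_lengths('2223334441214844')
print(pairs)
# [('2', 3), ('3', 3), ('4', 3), ('1', 1), ('2', 1), ('1', 1), ('4', 1), ('8', 1), ('4', 2)]

# Rebuilding the runs from the pairs gives the same list as above
print([key * n for key, n in pairs])
```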
If you wanted to use regular expressions, you could make use of backreferences to find consecutive characters without having to specify them explicitly:
>>> re.findall('((.)\\2*)', s)
[('222', '2'), ('333', '3'), ('444', '4'), ('1', '1'), ('2', '2'), ('1', '1'), ('4', '4'), ('8', '8'), ('44', '4')]
For finding consecutive characters in a string, this is essentially the same as what groupby does. You can then pick out the combined match to get the desired result:
>>> [x for x, *_ in re.findall('((.)\\2*)', s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
One solution without regex (that is not specific to digits) would be to use itertools.groupby():
>>> from itertools import groupby
>>> s = '2223334441214844'
>>> [''.join(g) for _, g in groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
If you only need to extract consecutive identical digits, you may use a matching approach using r'(\d)\1*' regex:
import re
s='2223334441214844'
print([x.group() for x in re.finditer(r'(\d)\1*', s)])
# => ['222', '333', '444', '1', '2', '1', '4', '8', '44']
Here,
(\d) - matches and captures into Group 1 any digit
\1* - a backreference to Group 1 matching the same value, 0+ repetitions.
This solution can be customized to match any specific consecutive chars (instead of \d, you may use \S - non-whitespace, \w - word characters, [a-fA-F] - a specific set, etc.). If you replace \d with . and use the re.DOTALL flag, it will behave like the itertools solutions posted above.
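For example, with . and re.DOTALL the pattern groups any repeated characters, newlines included, just as groupby does; without re.DOTALL the newline run is not matched at all:

```python
import re
from itertools import groupby

text = 'aa\n\nbb  c'

# . plus re.DOTALL matches every character, newlines included
regex_runs = [m.group() for m in re.finditer(r'(.)\1*', text, re.DOTALL)]
groupby_runs = [''.join(g) for _, g in groupby(text)]

print(regex_runs)                  # ['aa', '\n\n', 'bb', '  ', 'c']
print(regex_runs == groupby_runs)  # True

# Without re.DOTALL, . does not match newlines, so that run is missing
print([m.group() for m in re.finditer(r'(.)\1*', text)])  # ['aa', 'bb', '  ', 'c']
```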
Use a capture group and backreference.
import re
s = '2223334441214844'
print([i[0] for i in re.findall(r'((\d)\2*)', s)])
\2 matches whatever the (\d) capture group matched. The list comprehension is needed because when the RE contains capture groups, findall returns a list of the capture groups, not the whole match. So we need an extra group to get the whole match, and then need to extract that group from the result.
What about a solution without importing any module?
You can write your own logic in pure Python without importing any module. Here is a recursive approach:
string_1='2223334441214844'
list_2=[i for i in string_1]
def con(list_1):
    group = []
    if not list_1:
        return []
    else:
        track=list_1[0]
        for j,i in enumerate(list_1):
            if i==track:
                group.append(i)
            else:
                print(group)
                return con(list_1[j:])
        return group
print(con(list_2))
output:
['2', '2', '2']
['3', '3', '3']
['4', '4', '4']
['1']
['2']
['1']
['4']
['8']
['4', '4']
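A variation of the same recursive idea, sketched here, returns the groups as a list instead of printing them (the name split_runs is hypothetical). Each recursive call adds a stack frame, so a very long string with many runs can hit Python's recursion limit; the groupby answers above avoid that.

```python
def split_runs(s):
    """Recursively split s into runs of identical consecutive characters."""
    if not s:
        return []
    first = s[0]
    # Find where the first run ends
    end = 1
    while end < len(s) and s[end] == first:
        end += 1
    # The first run, followed by the runs of the remaining string
    return [s[:end]] + split_runs(s[end:])

print(split_runs('2223334441214844'))
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```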

trying to prepend zeroes in python but getting strange results

I am trying to take a list of strings, and prepend an amount of zeroes to the front so that they are all the same length. I have this:
def parity(binlist):
    print(binlist)
    for item in binlist:
        if len(item)==0:
            b='000'
        elif len(item)==1:
            b='00{}'.format(item)
        elif len(item)==2:
            b='0{}'.format(item)
        binlist.remove(item)
        binlist.append(b)
        return binlist
This is binlist:
['1', '10', '11', '11']
and I want to get this after running it:
['001', '010', '011', '011']
but I get this:
['10', '11', '11', '001']
which really confuses me.
Thanks for any help at all.
Try this:
>>> n = "7"
>>> print(n.zfill(3))
007
This way you will always have a 3-character string (as long as the number is less than 1000).
http://www.tutorialspoint.com/python/string_zfill.htm
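A quick check of how str.zfill behaves: it pads on the left to the given width, never truncates longer strings, and keeps a leading sign in front of the zeros:

```python
print('7'.zfill(3))     # 007
print('1234'.zfill(3))  # 1234 (already longer than 3, left unchanged)
print('-7'.zfill(3))    # -07 (zeros go after the sign)
print([b.zfill(3) for b in ['1', '10', '11', '11']])  # ['001', '010', '011', '011']
```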
The native string formatting operations allow you to do this without all the trouble you're putting in. Here's an example.
x = ['1', '10', '11', '11']
print(["{:>03s}".format(t) for t in x])
['001', '010', '011', '011']
This happens because you are removing elements from the list while iterating over it with a for loop, so the loop does not visit every element of the list. You can use a while loop to solve this problem.
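Another way around the problem, sketched here, is to build and return a new list instead of removing and appending inside the loop. The width parameter is an assumption added for illustration; the question pads to 3.

```python
def parity(binlist, width=3):
    """Return a new list with every item left-padded with zeros to `width`."""
    padded = []
    for item in binlist:
        # zfill adds leading zeros; the input list is never modified
        padded.append(item.zfill(width))
    return padded

original = ['1', '10', '11', '11']
print(parity(original))  # ['001', '010', '011', '011']
print(original)          # ['1', '10', '11', '11'] (unchanged)
```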
You can do this in a one-liner using zfill:
>>> list(map(lambda binlist_item: binlist_item.zfill(3), ['1', '10', '11', '11']))
['001', '010', '011', '011']
Fill each item in the list with zeros:
binlist = [i.zfill(3) for i in binlist]

Best way of getting the longest sequence of even digits in an integer

What would be the most efficient way of getting the longest sequence of even digits in an integer in Python? For example, if I have a number 2456890048, the longest sequence should be 0048.
Should the integer be converted to a string to determine the longest sequence? Or should it be converted into a list and then, based on the indexes of each item, we would determine which sequence is the longest? Or is there a more efficient way that I am not aware of? (I am quite new to Python, and I am not sure what the best way to tackle this problem would be.)
You can use itertools.groupby and max:
>>> from itertools import groupby
>>> def solve(strs):
...     return max((list(g) for k, g in groupby(strs, key=lambda x: int(x) % 2) if not k),
...                key=len)
...
>>> solve('2456890048')  # or pass str(2456890048) if you have integers
['0', '0', '4', '8']
>>> solve('245688888890048')
['6', '8', '8', '8', '8', '8', '8']
Here:
[list(g) for k, g in groupby('2456890048', key=lambda x:int(x)%2) if not k]
returns:
[['2', '4'], ['6', '8'], ['0', '0', '4', '8']]
Now we can apply max to this list (with key=len) to get the longest sequence. (Note that in the original code I am using a generator expression with max, so the intermediate list is never created in memory.)
I think this is one of the most efficient ways:
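One edge case: if the input has no even digits, the generator is empty and max raises ValueError. A sketch that returns the run as a string and falls back to an empty string through max's default argument (available since Python 3.4); the name longest_even_run is hypothetical:

```python
from itertools import groupby

def longest_even_run(n):
    """Return the longest run of consecutive even digits in n as a string."""
    digits = str(abs(n))  # abs() drops a minus sign, which int() could not parse
    runs = (''.join(g) for is_odd, g in groupby(digits, key=lambda ch: int(ch) % 2)
            if not is_odd)
    # default='' avoids ValueError when there are no even digits at all
    return max(runs, key=len, default='')

print(longest_even_run(2456890048))  # 0048
print(repr(longest_even_run(13579)))  # ''
```

When several runs share the maximum length, max returns the first one it encounters.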
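def longest(i):
    # returns the LENGTH of the longest run of even digits
    i = abs(i)  # a negative i would never reach 0 with floor division
    curMax = m = 0
    while i != 0:
        d = i % 10 % 2
        i = i // 10  # integer division; i / 10 gives a float in Python 3
        if d == 0:
            curMax += 1
        else:
            m = max(m, curMax)
            curMax = 0
    return max(m, curMax)
print(longest(2456890048))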
You can extract all runs of even digits using a regexp, and find the longest using max.
import re
def longest_run(d):
    return max(re.findall(r'[02468]+', str(d)), key=len)
