Greedy String Tiling in Python


I am trying to learn the greedy string tiling algorithm.
I have two lists as follows:
a=['a','b','c','d','e','f']
b=['d','e','a','b','c','f']
I would like to retrieve c=['a','b','c','d','e']
Another example would be
a = ['1','2','3','4','5','6','7','8','9','1','3']
b = ['3','4','5','2','1','7','8','9','1']
c should be ['3','4','5','7','8','9','1']
Right now I am using the following code, which works for the latter example but not the former. Can someone help?
def gct(a, b):
    if len(a) == 0 or len(b) == 0:
        return []
    if a[0] == b[0]:
        return [a[0]] + gct(a[1:], b[1:])
    return max(gct(a, b[1:]), gct(a[1:], b), key=len)
I am calling the function with
gct( ['a','b','c','d','e','f'], ['d','e','a','b','c','f'] )
which gives ['a', 'b', 'c', 'f'], when it should be ['a','b','c','d','e'] or ['d','e','a','b','c'].
P.S. The order doesn't matter when printing the result. The order is only important while doing the comparisons. The minimum pattern length should be 2.
NOTE: intersect will not solve my problem.

The following code solves the problem. A minlength parameter ensures that every matched substring has at least minlength items.
maxsub finds the longest matching substring and returns the indices of its items in a.
markit.a marks the positions in a that have already been matched.
The while loop keeps finding substrings until none of length >= minlength remains.
#! /usr/bin/env python
# first example
a=['a','b','c','d','e','f']
b=['d','e','a','b','c','f']
# second example (overrides the first; run with one pair at a time)
a = ['1','2','3','4','5','6','7','8','9','1','3']
b = ['3','4','5','2','1','7','8','9','1']
def gct(a,b,minlength=2):
    if len(a) == 0 or len(b) == 0:
        return []
    # if py>3.0, nonlocal is better
    class markit:
        a=[0]
        minlen=2
    markit.a=[0]*len(a)
    markit.minlen=minlength
    #output char index
    out=[]
    # To find the max length substr (index)
    # apos is the position of a[0] in origin string
    def maxsub(a,b,apos=0,lennow=0):
        if (len(a) == 0 or len(b) == 0):
            return []
        if (a[0]==b[0] and markit.a[apos]!=1 ):
            return [apos]+maxsub(a[1:],b[1:],apos+1,lennow=lennow+1)
        elif (a[0]!=b[0] and lennow>0):
            return []
        return max(maxsub(a, b[1:],apos), maxsub(a[1:], b,apos+1), key=len)
    while True:
        findmax=maxsub(a,b,0,0)
        if (len(findmax)<markit.minlen):
            break
        else:
            for i in findmax:
                markit.a[i]=1
            out+=findmax
    return [ a[i] for i in out]
print(gct(a,b))
>> ['a', 'b', 'c', 'd', 'e']
>> ['7', '8', '9', '1', '3', '4', '5']
print(gct(a,b,3))
>> ['a', 'b', 'c']
>> ['7', '8', '9', '1', '3', '4', '5']
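The recursive maxsub above slices both lists on every call and branches twice whenever the heads differ, so it can get slow on longer inputs. For comparison, here is a minimal iterative sketch of the same greedy idea: repeatedly find the longest run of matching, still-unmarked items, mark it as a tile, and stop once the longest run is shorter than the minimum length. The name greedy_string_tiling and its min_length parameter are illustrative assumptions, not part of the answer above, and the sketch makes no attempt at further optimization.

```python
def greedy_string_tiling(a, b, min_length=2):
    """Return the tiles (lists of matched items), longest first."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        best_len = 0
        matches = []  # start positions (i, j) of the longest unmarked runs
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the run while items match and are still unmarked
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, matches = k, [(i, j)]
                elif k == best_len and k > 0:
                    matches.append((i, j))
        if best_len < min_length:
            break
        for i, j in matches:
            # Skip a run that overlaps a tile placed earlier in this round
            if any(marked_a[i:i + best_len]) or any(marked_b[j:j + best_len]):
                continue
            marked_a[i:i + best_len] = [True] * best_len
            marked_b[j:j + best_len] = [True] * best_len
            tiles.append(a[i:i + best_len])
    return tiles


tiles = greedy_string_tiling(list('abcdef'), list('deabcf'))
print([item for tile in tiles for item in tile])
```

Returning the tiles as separate lists keeps the tile boundaries visible; flattening them, as in the last line, gives the same kind of flat result as the answer above.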

Related

how to separate alternating digits and characters in string into dict or list?

'L134e2t1C1o1d1e1'
the original string is "LeetCode"
but I need to separate the letters from the digits. The numbers are not always single-digit; they can also be 3-4 digit numbers like 345.
My code needs to build a dict of key-value pairs: the keys are the characters and the values are the numbers right after each character. It should also create 2 separate lists, one with the numbers only and one with the letters only.
expected output:
letters = ['L', 'e', 't', 'C', 'o', 'd', 'e']
digits = [134,2,1,1,1,1,1]
This code is not processing this properly.
def f(s):
    d = dict()
    letters = list()
    # letters = list(filter(lambda x: not x.isdigit(), s))
    i = 0
    while i < len(s):
        print('----------------------')
        if not s[i].isdigit():
            letters.append(s[i])
        else:
            j = i
            temp = ''
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            print(substr)
        i += 1
        print('----END -')
    print(letters)

With the following modification, your function separates letters from digits in s:
def f(s):
    letters = list()
    digits = list()
    i = 0
    while i < len(s):
        if not s[i].isdigit():
            letters.append(s[i])
            i += 1
        else:
            j = i
            temp = ''
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            i = j
            digits.append(substr)
    print(letters)
    print(digits)
f('L134e2t1C1o1d1e1')
As I said in my comments, you didn't update i after the inner loop terminated, which made i go back to an index that had already been processed.
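The same scan can be written without manual index bookkeeping. The following is a minimal sketch, assuming the standard library's itertools.groupby is acceptable; the function name split_letters_digits is made up for illustration. groupby collects consecutive characters that agree on str.isdigit, so each run of digits arrives as one chunk.

```python
from itertools import groupby


def split_letters_digits(s):
    """Return (letters, numbers) with multi-digit numbers kept whole."""
    letters, numbers = [], []
    for is_digit, group in groupby(s, key=str.isdigit):
        chunk = ''.join(group)
        if is_digit:
            numbers.append(int(chunk))   # e.g. '134' -> 134
        else:
            letters.extend(chunk)        # one list entry per character
    return letters, numbers


print(split_letters_digits('L134e2t1C1o1d1e1'))
```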

If I were limited to not using regex, I would do it the following way:
text = 'L134e2t1C1o1d1e1'
letters = [i for i in text if i.isalpha()]
digits = ''.join(i if i.isdigit() else ' ' for i in text).split()
print(letters)
print(digits)
output
['L', 'e', 't', 'C', 'o', 'd', 'e']
['134', '2', '1', '1', '1', '1', '1']
Explanation: for letters I use a simple list comprehension with a condition; .isalpha() is a str method which checks whether a string (here consisting of one character) is alphabetic. For digits (which should rather be called numbers) I replace every non-digit with a single space, turn the result into a string using ''.join, then use .split() (which splits on runs of one or more whitespace characters). Note that digits is now a list of strs rather than ints; if ints are desired, add the following line:
digits = list(map(int,digits))

Your string only had two e's, so I've added one more to complete the example. This is one way you could do it:
import re
t = 'L1e34e2t1C1o1d1e1'
print(re.sub('[^a-zA-Z]', '', t))
Result:
LeetCode
I know you cannot use regex, but to complete this answer, I'll just add a solution:
def f(s):
    d = re.findall('[0-9]+', s)
    l = re.findall('[a-zA-Z]', s)
    print(d)
    print(l)
f('L134e2t1C1o1d1e1')  # the original string from the question
Result:
['134', '2', '1', '1', '1', '1', '1']
['L', 'e', 't', 'C', 'o', 'd', 'e']
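If regex is allowed, the dict the question asks for can be built in one pass as well. This is a sketch under the assumption that every letter is immediately followed by its number; the function name letters_to_numbers is hypothetical. With two capture groups, re.findall returns (letter, number) tuples.

```python
import re


def letters_to_numbers(s):
    """Map each letter to the list of numbers that follow it in s."""
    result = {}
    # Each match is a (letter, digits) tuple because the pattern has two groups
    for letter, number in re.findall(r'([A-Za-z])(\d+)', s):
        result.setdefault(letter, []).append(int(number))
    return result


print(letters_to_numbers('L134e2t1C1o1d1e1'))
```

A repeated letter such as e collects all of its numbers in one list, which is why the values are lists rather than single ints.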

You edited your question and I got a bit confused, so here is some really exhaustive code giving you a list of the letters, a list of the numbers, a dict with the numbers associated with each letter, and finally the sentence with each letter repeated the corresponding number of times:
def f(s):
    letters = [c for c in s if c.isalpha()]
    # keep multi-digit numbers such as '134' together
    numbers = ''.join(c if c.isdigit() else ' ' for c in s).split()
    mydict = {}
    # assumes every letter is followed by exactly one number
    for letter, number in zip(letters, numbers):
        mydict.setdefault(letter, []).append(number)
    sentence = ""
    for letter, number in zip(letters, numbers):
        sentence += letter * int(number)
    print(letters)
    print(numbers)
    print(mydict)
    print(sentence)

letters = []
digits = []
dig = ""
for letter in 'L134e2t1C1o1d1e1':
    if letter.isalpha():
        # do not add empty string to list
        if dig:
            # append dig to list of digits
            digits.append(dig)
            dig = ""
        letters.append(letter)
        # if it is an actual letter, continue
        continue
    # add digits to `dig`
    dig = dig + letter
# append the trailing number, which no later letter flushes
if dig:
    digits.append(dig)
Try this. The idea is to skip all actual letters and add the digits to dig.

I know there's an accepted answer but I'll throw this one in anyway:
letters = []
digits = []
dct = {}
lc = 'L134e2t1C1o1d1e1'
n = None
for c in lc:
    if c.isalpha():
        if n is not None:
            digits.append(n)
            n = None
        letters.append(c)
    else:
        if n is None:
            n = int(c)
        else:
            n *= 10
            n += int(c)
if n is not None:
    digits.append(n)
for k, v in zip(letters, digits):
    dct.setdefault(k, []).append(v)
print(letters)
print(digits)
print(dct)
Output:
['L', 'e', 't', 'C', 'o', 'd', 'e']
[134, 2, 1, 1, 1, 1, 1]
{'L': [134], 'e': [2, 1], 't': [1], 'C': [1], 'o': [1], 'd': [1]}

Subwords of a string in Python

I am trying to create a list of every contiguous substring of a string in a fast way. I don't mean subwords in general. For example, from the string "ABC", I want to get:
['C', 'B', 'BC', 'A', 'AB', 'ABC']
(without "AC", which is a subword but not contiguous)
Same goes for "123":
I want to get ['3', '2', '23', '1', '12', '123'] instead of ['3', '2', '23', '1', '13', '12', '123']

Here is a simple loop and slice based generator function:
def subs(s):
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            yield s[i:j]
>>> list(subs("ABC"))
['A', 'AB', 'ABC', 'B', 'BC', 'C']
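The two nested loops above enumerate every pair of cut positions i < j, so the same result can be sketched with itertools.combinations over the positions 0..len(s). The name subs_combinations is an illustrative assumption; a string of length n yields n*(n+1)/2 substrings either way.

```python
from itertools import combinations


def subs_combinations(s):
    """Return all contiguous substrings of s, in the same order as subs."""
    # Every pair of cut positions i < j selects one slice s[i:j]
    return [s[i:j] for i, j in combinations(range(len(s) + 1), 2)]


print(subs_combinations("ABC"))
```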

It might be faster to extend the substrings instead of slicing each one afresh:
def subs(s):
    while s:
        t = ''
        for c in s:
            t += c
            yield t
        s = s[1:]
Benchmark results for s = "z" * 5000:
8.4 seconds subs_slice
1.5 seconds subs_extend
Benchmark code:
from timeit import timeit
from collections import deque
def subs_slice(s):
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            yield s[i:j]
def subs_extend(s):
    while s:
        t = ''
        for c in s:
            t += c
            yield t
        s = s[1:]
funcs = subs_slice, subs_extend
for func in funcs:
    print(list(func('ABCD')))
s = "z" * 5000
for _ in range(3):
    for func in funcs:
        t = timeit(lambda: deque(func(s), 0), number=1)
        print(t, func.__name__)
    print()

For ABC you can just generate ['C', 'B', 'BC', 'A', 'AB', 'ABC', 'AC'] and then use remove() to remove the subword from your list, e.g.:
abc_list = ['C', 'B', 'BC', 'A', 'AB', 'ABC', 'AC']
abc_list.remove('AC')
Output: ['C', 'B', 'BC', 'A', 'AB', 'ABC']
The question lacks the context needed for a full answer. Do all of your strings have 3 characters or more? How do you define what you don't need?
If all the strings are 3 characters long, then you can use this:
def subwording(word: str):
    subword = word[0]+word[2]
    return subword
Then you can remove subword from your list.

Code works only sometimes for removing odd or even index items

Question: Write a Python program to remove the characters which have odd or even index values of a given string.
I tried to make a copy of the list with a deep copy.
I ran a loop over the first list, checked the parity of each index, and then used the pop method on the second list to remove that specific index.
This code works for some inputs, I think mostly those which don't have any repeated characters, and doesn't work for others.
Code
#!/usr/bin/python3
import copy
list1 = input("Enter a string ")
list1 = list(list1)
list2 = copy.deepcopy(list1)
for i in list1:
    if list1.index(i)%2 != 0:
        list2.pop(list2.index(i))
print(list2)
The outputs for some samples are:
123456789 -> ['1', '3', '5', '7', '9']
qwertyuiop -> ['q', 'e', 't', 'u', 'o']
saurav -> ['s', 'u']
11112222333344445555 -> ['1', '1', '1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '4', '4', '4', '4', '5', '5', '5', '5']

Read the documentation for index. It returns the index of the first occurrence of the given value. A simple print inside the loop will show you what's going on, in appropriate detail. This is a basic debugging skill you need to learn for programming in any language.
import copy
list1 = input("Enter a string ")
list1 = list(list1)
list2 = copy.deepcopy(list1)
for i in list1:
    if list1.index(i)%2 != 0:
        print(i, list1.index(i), list2.index(i))
        list2.pop(list2.index(i))
        print(list2)
print(list2)
output:
Enter a string google
o 1 1
['g', 'o', 'g', 'l', 'e']
o 1 1
['g', 'g', 'l', 'e']
e 5 3
['g', 'g', 'l']
['g', 'g', 'l']
... and that's your trouble. Fix your logic. You already know the needed index to save or remove. There is no need to extract the character, and then search for it again. You already know where it is.
Even better, simply slice the original string for the characters you want:
print(list1[::2])

Your problem is the list.index function. The documentation states that it "returns zero-based index in the list of the first item whose value is equal to x." Because you are calling it on list1, which is never modified, the result is always the position of the first occurrence, for example list1.index('a') == 1 for the input saurav, even when the loop has reached the second a.
The correct solution would be to use enumerate. A further problem exists here: because you are indexing from an array that you have not modified, your indexes will be off after the first list.pop operation. Every item after the one removed will have been shifted by 1. To correct this, you could build a new list instead of emptying one:
#!/usr/bin/python3
list1 = input("Enter a string ")
list2 = []
for i, item in enumerate(list1):
    if i % 2 == 0:
        list2.append(item)
print(list2)
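To make the difference concrete, this small sketch compares what list.index reports with the real positions from enumerate, using the saurav sample from the question. The variable names are illustrative only.

```python
chars = list("saurav")

# list.index always reports the FIRST occurrence of a value
first_positions = [chars.index(c) for c in chars]

# enumerate reports the position actually being visited
true_positions = [i for i, _ in enumerate(chars)]

print(first_positions)  # the second 'a' is reported at index 1, not 4
print(true_positions)
```

The second a sits at index 4 (even) but is looked up as index 1 (odd), which is why the original loop removes it.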

You don't need to iterate at all. Just reference the string elements directly.
st="123456789"
print('Odd: ', list(st[::2]))
print('Even: ', list(st[1::2]))
Output:
Odd: ['1', '3', '5', '7', '9']
Even: ['2', '4', '6', '8']

The method list.index(i) returns the index of the first item in the list whose value is equal to i.
For example, "saurav".index('a') returns 1. When the loop reaches the second a, index still returns 1, so list2.pop(list2.index(i)) removes a character that should have been kept.
I think it can be made simple using the built-in range function.
list1 = list(input("Enter a string "))
list2 = list()
for i in range(len(list1)):
    if i % 2 == 0:
        list2.append(list1[i])
print(list2)
It works the same way with a step of 2:
list1 = list(input("Enter a string "))
list2 = list()
for i in range(0, len(list1), 2):
    list2.append(list1[i])
print(list2)
Also, you can use extended slices in Python 2.3 or above.
list1 = list(input("Enter a string "))
list2 = list1[::2]
print(list2)
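Since the exercise asks for either odd or even indices to be removed from a string, the slicing idea can be wrapped in one small function. This is only a sketch; the function name remove_by_index_parity and its remove parameter are assumptions made for illustration.

```python
def remove_by_index_parity(text, remove="odd"):
    """Return text without the characters at odd (or even) indices."""
    if remove == "odd":
        return text[::2]    # keep indices 0, 2, 4, ...
    if remove == "even":
        return text[1::2]   # keep indices 1, 3, 5, ...
    raise ValueError("remove must be 'odd' or 'even'")


print(remove_by_index_parity("saurav", "odd"))
print(remove_by_index_parity("saurav", "even"))
```

Slicing a str returns a str, so no conversion to a list and back is needed.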

Print 2 lists side by side

I'm trying to output the values of 2 lists side by side using list comprehension. I have an example below that shows what I'm trying to accomplish. Is this possible?
code:
#example lists, the real lists will either have less or more values
a = ['a', 'b', 'c']
b = ['1', '0', '0']
str = ('``` \n'
       'results: \n\n'
       'options votes \n'
       #this line is the part I need help with: list comprehension for the 2 lists to output the values as shown below
       '```')
print(str)
#what I want it to look like:
'''
results:
options votes
a 1
b 0
c 0
'''

You can use the zip() function to join lists together.
a = ['a', 'b', 'c']
b = ['1', '0', '0']
res = "\n".join("{} {}".format(x, y) for x, y in zip(a, b))
The zip() function will iterate tuples with the corresponding elements from each of the lists, which you can then format as Michael Butscher suggested in the comments.
Finally, just join() them together with newlines and you have the string you want.
print(res)
a 1
b 0
c 0
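If the option names differ in length, a single space between the columns will not line them up. A minimal sketch of padding the first column to a common width, assuming Python 3.6+ for f-strings; the function name format_columns and its headers parameter are hypothetical.

```python
def format_columns(options, votes, headers=("options", "votes")):
    """Return a two-column table as one string, left column padded."""
    # Width of the widest entry in the first column, header included
    width = max([len(headers[0])] + [len(str(o)) for o in options])
    lines = [f"{headers[0]:<{width}}  {headers[1]}"]
    for option, vote in zip(options, votes):
        lines.append(f"{str(option):<{width}}  {vote}")
    return "\n".join(lines)


print(format_columns(['a', 'b', 'c'], ['1', '0', '0']))
```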

This works:
a = ['a', 'b', 'c']
b = ['1', '0', '0']
print("options votes")
for i in range(len(a)):
    print(a[i] + '\t ' + b[i])
Outputs:
options votes
a 1
b 0
c 0

from __future__ import print_function # if using Python 2
a = ['a', 'b', 'c']
b = ['1', '0', '0']
print("""results:
options\tvotes""")
for x, y in zip(a, b):
    print(x, y, sep='\t\t')

[print(x,y) for x,y in zip(list1, list2)]
Note the square brackets enclosing the print statement.

using python 3.6 to slice substring with same char [duplicate]

I am not very experienced with regex, but I have been reading a lot about it. Assume there's a string s = '111234'. I want a list with the string split into L = ['111', '2', '3', '4']. My approach was to make a group checking whether it's a digit or not, and then check for a repetition of the group. Something like this:
L = re.findall('\d[\1+]', s)
I think that \d[\1+] will basically check for either "digit" or "digit +" the same repetitions. I think this might do what I want.

Use re.finditer():
>>> s='111234'
>>> [m.group(0) for m in re.finditer(r"(\d)\1*", s)]
['111', '2', '3', '4']
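A short sketch of why re.finditer is used here rather than re.findall: when a pattern contains a capture group, findall returns only the group's text, whereas group(0) on each match object gives the whole run. The variable names are illustrative.

```python
import re

s = '111234'
pattern = r"(\d)\1*"   # one digit, then zero or more repeats of that same digit

# findall returns the captured group only, i.e. the first digit of each run
only_groups = re.findall(pattern, s)

# group(0) is the full match, i.e. the whole run
runs = [m.group(0) for m in re.finditer(pattern, s)]

print(only_groups)
print(runs)
```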

If you want to group all the repeated characters, then you can also use itertools.groupby, like this:
from itertools import groupby
print(["".join(grp) for num, grp in groupby('111234')])
# ['111', '2', '3', '4']
If you want to make sure that you get only digits, then
print(["".join(grp) for num, grp in groupby('111aaa234') if num.isdigit()])
# ['111', '2', '3', '4']

Try this one:
import re
s = '111234'
l = re.findall(r'((.)\2*)', s)
## at this stage I have [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')] in l
## now I am keeping only the first value from each tuple in the list
lst = [x[0] for x in l]
print(lst)
output:
['111', '2', '3', '4']

If you don't want to use any libraries then here's the code:
s = "AACBCAAB"
L = []
temp = s[0]
for i in range(1,len(s)):
    if s[i] == s[i-1]:
        temp += s[i]
    else:
        L.append(temp)
        temp = s[i]
    if i == len(s)-1:
        L.append(temp)
print(L)
Output:
['AA', 'C', 'B', 'C', 'AA', 'B']
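The loop above starts from s[0] and appends only inside the loop body, so it needs a non-empty string of at least two characters. A variant sketch that also copes with empty and one-character input, still without any libraries; the function name split_runs is an assumption for illustration.

```python
def split_runs(s):
    """Split s into runs of identical consecutive characters."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            runs[-1] += ch      # same character: extend the current run
        else:
            runs.append(ch)     # different character: start a new run
    return runs


print(split_runs("AACBCAAB"))
```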
