I'm looking for a solution to my problem. I want to write a program where someone inputs a string and I convert it into something like this:
'ZZZZYYYZZ' -> 'Z4Y3Z2'
I'm open for any suggestions.
The code I did:
def compress(s):
    e = {}
    if s.isalpha():
        for i in s:
            if i in e:
                e[i] += 1
            else:
                e[i] = 1
    else:
        return None
    return ''.join(['{0}{1}'.format(k, v) for k, v in e.items()])

s = input("Write string: ")
print(compress(s))
This produces the wrong output
Write string: ZZZZYYYZZ
Y3Z6
Grouping of unsorted data into chunks is a job for itertools.groupby.
>>> from itertools import groupby
>>>
>>> s = 'ZZZZYYYZZ'
>>> ''.join('{}{}'.format(c, len(list(g))) for c, g in groupby(s))
'Z4Y3Z2'
Details on what groupby produces here:
>>> [(c, list(g)) for c, g in groupby(s)]
[('Z', ['Z', 'Z', 'Z', 'Z']), ('Y', ['Y', 'Y', 'Y']), ('Z', ['Z', 'Z'])]
~edit~
Slight memory optimization without intermediary lists:
>>> ''.join('{}{}'.format(c, sum(1 for _ in g)) for c, g in groupby(s))
'Z4Y3Z2'
~edit 2~
Instead of C1 can we have just C?
>>> s = 'XYXYXXX'
>>> to_join = []
>>> groups = groupby(s)
>>>
>>> for char, group in groups:
...:     group_len = sum(1 for _ in group)
...:     if group_len == 1:
...:         to_join.append(char)
...:     else:
...:         to_join.append('{}{}'.format(char, group_len))
...:
>>> ''.join(to_join)
'XYXYX3'
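The groupby snippets above can be wrapped in one reusable function. This is only a minimal sketch: the name rle_encode and the keep_ones flag are made up for illustration and are not part of the answer.

```python
from itertools import groupby


def rle_encode(s, keep_ones=True):
    """Run-length encode s (hypothetical helper, not from the answer)."""
    parts = []
    for char, group in groupby(s):
        run_length = sum(1 for _ in group)  # count the run without building a list
        if run_length == 1 and not keep_ones:
            parts.append(char)  # write 'X' instead of 'X1'
        else:
            parts.append('{}{}'.format(char, run_length))
    return ''.join(parts)


print(rle_encode('ZZZZYYYZZ'))
print(rle_encode('XYXYXXX', keep_ones=False))
```

With keep_ones left at its default the output matches the first one-liner; with keep_ones=False it matches the "edit 2" variant.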
This lends itself to a neat use of zip, allowing you to iterate over each character and the next character:
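s = 'ZZZZYYYZZ'
out = ''
count = 1
# Pair each character with its successor.
for a, b in zip(s[:-1], s[1:]):
    if a != b:
        # The run of `a` ends here: emit it and restart the counter.
        out += a + str(count)
        count = 1
    else:
        count += 1
# Emit the final run, which the loop never closes.
out += s[-1] + str(count)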
which gives out as 'Z4Y3Z2'.
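One caveat with the zip approach is that s[-1] raises an IndexError on an empty string. Below is a sketch of the same idea as a function with a guard for that case; the function name compress_pairs is an assumption, not something from the answer.

```python
def compress_pairs(s):
    """Zip-based run-length encoding (hypothetical wrapper around the idea above)."""
    if not s:
        return ''  # nothing to encode; avoids indexing s[-1] on an empty string
    parts = []
    count = 1
    for a, b in zip(s, s[1:]):  # zip stops at the shorter input, so s works like s[:-1]
        if a != b:
            parts.append(a + str(count))
            count = 1
        else:
            count += 1
    parts.append(s[-1] + str(count))  # the last run is never closed inside the loop
    return ''.join(parts)


print(compress_pairs('ZZZZYYYZZ'))
```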
I am trying to compare the two strings: 'apple' and 'pear' and return letters that do not belong to the other string.
For example, 'apple' does not contain the 'r' in 'pear'.
'pear' does not contain the 'l' and one 'p' in 'apple' (pear contains a p but does not contain two p's).
So I want to have a function that returns 'r', 'l', and 'p'.
I tried set, but it ignores the duplicates (p, in this example).
def solution(A, B):
    N = len(A)
    M = len(B)
    letters_not_in_B = list(set([c for c in A if c not in B]))
    letters_not_in_A = list(set([c for c in B if c not in A]))
    answer = len(letters_not_in_B) + len(letters_not_in_A)
    return answer
You can iterate over the concatenation of a and b and keep every character whose count differs between the two strings:
def get_results(a, b):
    return list(set([i for i in a+b if a.count(i) != b.count(i)]))

print(get_results('apple', 'pear'))
Output:
['p', 'r', 'l']
Use a Counter
from collections import Counter
Counter('apple') - Counter('pear') # --> Counter({'p': 1, 'l': 1})
Counter('pear') - Counter('apple') # --> Counter({'r': 1})
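To get the single combined result the question asks for, the two one-sided differences can be added together. A minimal sketch, assuming a sorted list of letters is an acceptable return format (the name unmatched_letters is made up here):

```python
from collections import Counter


def unmatched_letters(a, b):
    """Letters of a and b left over after pairing off common ones (hypothetical helper)."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction drops zero and negative counts, so each side
    # keeps only its surplus letters; adding the two merges them.
    leftover = (count_a - count_b) + (count_b - count_a)
    return sorted(leftover.elements())


print(unmatched_letters('apple', 'pear'))
```

Unlike the set-based attempts, this keeps duplicates, so the second 'p' of 'apple' is reported.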
def solution(a, b):
    # create mutable list copies of a and b
    list_a = list(a)
    list_b = list(b)
    for ch in a:
        if ch in list_b:
            list_b.remove(ch)
    for ch in b:
        if ch in list_a:
            list_a.remove(ch)
    return list_a + list_b
My question is: how can we find the intersection of two strings?
For example, "address" and "dress" should return "dress".
I used a dict to implement my function, but it only outputs the characters in sorted order, not in their original order. How should I modify my code?
def IntersectStrings(first, second):
    a = {}
    b = {}
    # Count how often each character occurs in each string.
    for c in first:
        if c in a:
            a[c] = a[c]+1
        else:
            a[c] = 1
    for c in second:
        if c in b:
            b[c] = b[c]+1
        else:
            b[c] = 1
    l = []
    print(a, b)
    # Iterating over sorted(a) is what loses the original character order.
    for key in sorted(a):
        if key in b:
            cnt = min(a[key], b[key])
            while cnt > 0:
                l.append(key)
                cnt = cnt-1
    return ''.join(l)

print(IntersectStrings('address', 'dress'))
There are lots of intersecting strings. One approach is to create the set of all substrings of each string and then intersect the two sets. If you want the biggest intersection, just find the max of the resulting set, e.g.:
def substrings(s):
    for i in range(len(s)):
        for j in range(i, len(s)):
            yield s[i:j+1]

def intersect(s1, s2):
    return set(substrings(s1)) & set(substrings(s2))
Then you can see the intersections:
>>> intersect('address', 'dress')
{'re', 'ss', 'ess', 'es', 'ress', 'dress', 'dres', 'd', 'e', 's', 'res', 'r', 'dre', 'dr'}
>>> max(intersect('address', 'dress'), key=len)
'dress'
>>> max(intersect('sprinting', 'integer'), key=len)
'int'
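If the goal is instead the character-by-character intersection from the question, kept in the order of the first string, a Counter can replace the two hand-built dicts. This is a sketch under that reading of the question; the name intersect_in_order is hypothetical.

```python
from collections import Counter


def intersect_in_order(first, second):
    """Common characters (with multiplicity) in the order they appear in first."""
    # & keeps, for each character, the smaller of its two counts.
    remaining = Counter(first) & Counter(second)
    kept = []
    for ch in first:
        if remaining[ch] > 0:  # a missing key reads as 0 on a Counter
            kept.append(ch)
            remaining[ch] -= 1
    return ''.join(kept)


print(intersect_in_order('address', 'dress'))
```

Walking over first rather than over sorted keys is what preserves the original order.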
Hi I am new to programming and want to learn python. I am working on a code that should return items that are most redundant in a list. If there are more than 1 then it should return all.
Ex.
List = ['a','b','c','b','d','a'] #then it should return both a and b.
List = ['a','a','b','b','c','c','d'] #then it should return a b and c.
List = ['a','a','a','b','b','b','c','c','d','d','d'] #then it should return a b and d.
Note: We don't know what element is most common in the list so we have to find the most common element and if there are more than one it should return all. If the list has numbers or other strings as elements then also the code has to work
I have no idea how to proceed. I can use a little help.
Here is the whole program:
from collections import Counter

def redundant(List):
    c = Counter(List)
    maximum = c.most_common()[0][1]
    return [k for k, v in c.items() if v == maximum]

def find_kmers(DNA_STRING, k):
    length = len(DNA_STRING)
    a = 0
    List_1 = []
    string_1 = ""
    # Collect every substring of length k.
    while a <= length - k:
        string_1 = DNA_STRING[a:a+k]
        List_1.append(string_1)
        a = a + 1
    # Return the result; without `return` the caller gets None.
    return redundant(List_1)
This program should take a DNA string and a k-mer length, and find which k-mers of that length occur most often in that DNA string.
Sample Input:
ACGTTGCATGTCGCATGATGCATGAGAGCT
4
Sample Output:
CATG GCAT
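For comparison, here is a compact sketch of the same task that can be checked against the sample. The name most_frequent_kmers is an assumption, and sorting the result is a choice made here only so that the output order is predictable.

```python
from collections import Counter


def most_frequent_kmers(dna, k):
    """Return the k-mers with the highest count in dna, sorted alphabetically."""
    # Slide a window of width k over the string and count each slice.
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    if not counts:
        return []  # k is longer than the string
    top = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == top)


print(' '.join(most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4)))
```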
You can use collections.Counter:
from collections import Counter
def solve(lis):
    c = Counter(lis)
    mx = c.most_common()[0][1]
    # or mx = max(c.values())
    return [k for k, v in c.items() if v == mx]

print(solve(['a','b','c','b','d','a']))
print(solve(['a','a','b','b','c','c','d']))
print(solve(['a','a','a','b','b','b','c','c','d','d','d']))
Output:
['a', 'b']
['a', 'c', 'b']
['a', 'b', 'd']
A slightly different version of the above code using itertools.takewhile:
from collections import Counter
from itertools import takewhile
def solve(lis):
    c = Counter(lis)
    mx = max(c.values())
    return [k for k, v in takewhile(lambda x: x[1]==mx, c.most_common())]
inputData = [['a','b','c','b','d','a'], ['a','a','b','b','c','c','d'], ['a','a','a','b','b','b','c','c','d','d','d'] ]
from collections import Counter
for myList in inputData:
    temp, result = -1, []
    # most_common() yields items from highest to lowest count.
    for char, count in Counter(myList).most_common():
        if temp == -1:
            temp = count
        if temp == count:
            result.append(char)
        else:
            break
    print(result)
Output
['a', 'b']
['a', 'c', 'b']
['a', 'b', 'd']
>>> import collections
>>> def maxs(L):
...     counts = collections.Counter(L)
...     maxCount = max(counts.values())
...     return [k for k,v in counts.items() if v==maxCount]
...
>>> L = ['a','b','c','b','d','a']
>>> maxs(L)
['a', 'b']
>>> L = ['a','a','b','b','c','c','d']
>>> maxs(L)
['a', 'b', 'c']
>>> L = ['a','a','a','b','b','b','c','c','d','d','d']
>>> maxs(L)
['d', 'a', 'b']
Just for the sake of giving a solution not using collections & using list comprehensions.
given_list = ['a','b','c','b','d','a']
redundant = [(each, given_list.count(each)) for each in set(given_list) if given_list.count(each) > 1]
count_max = max(redundant, key=lambda x: x[1])[1]
final_list = [char for char, count in redundant if count == count_max]
PS - I myself haven't used Counters yet :( Time to learn!
Given a list of strings, where each string is in the format "A - something" or "B - somethingelse", and list items mostly alternate between pieces of "A" data and "B" data, how can irregularities be removed?
Irregularities being any sequence that breaks the A B pattern.
If there are multiple A's, the next B should also be removed.
If there are multiple B's, the preceding A should also be removed.
After removal of these invalid sequences, list order should be kept.
Example: A B A B A A B A B A B A B A B B A B A B A A B B A B A B
In this case, AAB (see rule 2), ABB (see rule 3) and AABB should be removed.
I'll give it a try with regexp returning indexes of sequences to be removed
>>> import re
>>> data = 'ABABAABABABABABBABABAABBABAB'
>>> [(m.start(0), m.end(0)) for m in re.finditer('(AA+B+)|(ABB+)', data)]
[(4, 7), (13, 16), (20, 24)]
or result of stripping
>>> re.sub('(AA+B+)|(ABB+)', '', data)
'ABABABABABABABABAB'
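To apply the regular expression to the actual list of strings rather than to a bare 'ABAB...' string, the match spans can be mapped back to list indexes. A sketch, assuming each item starts with its 'A' or 'B' marker as described in the question; the name drop_irregular is made up.

```python
import re


def drop_irregular(items):
    """Remove items whose A/B markers fall inside an irregular run (hypothetical helper)."""
    # Build the marker string, e.g. ['A - x', 'B - y'] -> 'AB'.
    keys = ''.join(item[0] for item in items)
    bad = set()
    for m in re.finditer(r'AA+B+|ABB+', keys):
        bad.update(range(m.start(), m.end()))  # every index covered by the match
    return [item for i, item in enumerate(items) if i not in bad]


items = ['A - 1', 'B - 2', 'A - 3', 'A - 4', 'B - 5', 'A - 6', 'B - 7']
print(drop_irregular(items))
```

List order is preserved because the filter walks the original list once.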
The drunk-on-itertools solution:
>>> s = 'ABABAABABABABABBABABAABBABAB'
>>> from itertools import groupby, takewhile, islice, repeat, chain
>>> groups = (list(g) for k,g in groupby(s))
>>> pairs = takewhile(bool, (list(islice(groups, 2)) for _ in repeat(None)))
>>> kept_pairs = (p for p in pairs if len(p[0]) == len(p[1]) == 1)
>>> final = list(chain(*chain(*kept_pairs)))
>>> final
['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
(Unfortunately I'm now in no shape to think about corner cases and trailing As etc..)
I'd write it as a generator. Repeat:
read as many A's as possible,
read as many B's as possible,
if you've read exactly 1 A and 1 B, yield them; otherwise ignore and proceed.
Also this needs an additional special case in case you want to allow the input to end with an A.
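The steps above can be sketched as a generator. This version leans on itertools.groupby to read the runs, treats any non-'A' key as a 'B', and drops a trailing lone A (so it does not implement the special case just mentioned); the name clean_pairs and the key parameter are assumptions.

```python
from itertools import groupby


def clean_pairs(items, key=lambda item: item[0]):
    """Yield only items that belong to a clean 'one A, then one B' pair."""
    pending_a = None  # a lone A still waiting for its B
    for marker, group in groupby(items, key=key):
        run = list(group)
        if marker == 'A':
            # Remember the A only if the run has exactly one element.
            pending_a = run[0] if len(run) == 1 else None
        else:
            if pending_a is not None and len(run) == 1:
                yield pending_a
                yield run[0]
            pending_a = None  # this B run closes the pair either way


print(''.join(clean_pairs('ABABAABABABABABBABABAABBABAB')))
```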
Using itertools.groupby:
from itertools import groupby
def solve(strs):
    drop_next = False
    ans = []
    for k, g in groupby(strs):
        lis = list(g)
        if drop_next:
            # if True then don't append the current set to `ans`
            drop_next = False
        elif len(lis) > 1 and k == 'A':
            # if current group contains more than 1 'A' then skip the next set of 'B'
            drop_next = True
        elif len(lis) > 1 and k == 'B':
            # if current group contains more than 1 'B' then pop the last appended item
            if ans:
                ans.pop(-1)
        else:
            ans.append(k)
    return ''.join(ans)

strs = 'ABABAABABABABABBABABAABBABAB'
print(solve(strs))
#ABABABABABABABABAB
How to find all intersections (also called the longest common substrings) of two strings and their positions in both strings?
For example, if S1="never" and S2="forever", then the resulting intersection must be ["ever"] and its positions are [(1,3)]. If S1="address" and S2="oddness", then the resulting intersections are ["dd","ess"] and their positions are [(1,1),(4,4)].
The shortest solution that does not use any library is preferable, but any correct solution is welcome.
Well, you're saying that you can't include any library. However, Python's standard difflib contains a function which does exactly what you expect. Considering that it is a Python interview question, familiarity with difflib might be what the interviewer expected.
In [31]: import difflib
In [32]: difflib.SequenceMatcher(None, "never", "forever").get_matching_blocks()
Out[32]: [Match(a=1, b=3, size=4), Match(a=5, b=7, size=0)]
In [33]: difflib.SequenceMatcher(None, "address", "oddness").get_matching_blocks()
Out[33]: [Match(a=1, b=1, size=2), Match(a=4, b=4, size=3), Match(a=7, b=7, size=0)]
You can always ignore the last Match tuple, since it is a dummy (according to the documentation).
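To turn the matching blocks into the substrings and positions the question asks for, the zero-size dummy can be filtered out and each block sliced from the first string. A minimal sketch; the name common_blocks is made up, and passing autojunk=False is a choice made here so that long inputs are not affected by the junk heuristic.

```python
from difflib import SequenceMatcher


def common_blocks(s1, s2):
    """Return (substring, position in s1, position in s2) for each matching block."""
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    return [
        (s1[m.a:m.a + m.size], m.a, m.b)
        for m in matcher.get_matching_blocks()
        if m.size  # skip the trailing zero-size dummy block
    ]


print(common_blocks('never', 'forever'))
print(common_blocks('address', 'oddness'))
```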
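This can be done in O(n+m) time with a generalized suffix tree, where n and m are the lengths of the input strings. The simpler dynamic-programming approach shown here takes O(n*m) time.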
The pseudocode is:
function LCSubstr(S[1..m], T[1..n])
    L := array(1..m, 1..n)
    z := 0
    ret := {}
    for i := 1..m
        for j := 1..n
            if S[i] = T[j]
                if i = 1 or j = 1
                    L[i,j] := 1
                else
                    L[i,j] := L[i-1,j-1] + 1
                if L[i,j] > z
                    z := L[i,j]
                    ret := {}
                if L[i,j] = z
                    ret := ret ∪ {S[i-z+1..i]}
            else
                L[i,j] := 0
    return ret
See the Longest_common_substring_problem wikipedia article for more details.
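The pseudocode translates to Python fairly directly. The sketch below keeps only the previous row of the table to save memory, which is a change from the pseudocode, and the function name is an assumption. Note that it returns only the longest common substrings, so for "address" and "oddness" it reports "ess" but not the shorter "dd".

```python
def longest_common_substrings(s, t):
    """Return the set of longest common substrings of s and t (O(len(s)*len(t)) time)."""
    prev = [0] * (len(t) + 1)  # row i-1 of the table, padded with a zero column
    best = 0
    found = set()
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix ending at s[i-2], t[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best = curr[j]
                    found = set()  # a longer match invalidates earlier ones
                if curr[j] == best:
                    found.add(s[i - best:i])
        prev = curr
    return found


print(longest_common_substrings('never', 'forever'))
```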
Here's what I could come up with:
import itertools
def longest_common_substring(s1, s2):
    set1 = set(s1[begin:end] for (begin, end) in
               itertools.combinations(range(len(s1)+1), 2))
    set2 = set(s2[begin:end] for (begin, end) in
               itertools.combinations(range(len(s2)+1), 2))
    common = set1.intersection(set2)
    maximal = [com for com in common
               if sum((s.find(com) for s in common)) == -1 * (len(common)-1)]
    return [(s, s1.index(s), s2.index(s)) for s in maximal]
Checking some values:
>>> longest_common_substring('address', 'oddness')
[('dd', 1, 1), ('ess', 4, 4)]
>>> longest_common_substring('never', 'forever')
[('ever', 1, 3)]
>>> longest_common_substring('call', 'wall')
[('all', 1, 1)]
>>> longest_common_substring('abcd1234', '1234abcd')
[('abcd', 0, 4), ('1234', 4, 0)]
Batteries included!
The difflib module might have some help for you - here is a quick and dirty side-by-side diff:
>>> import difflib
>>> list(difflib.ndiff("never","forever"))
['- n', '+ f', '+ o', '+ r', '  e', '  v', '  e', '  r']
>>> diffs = list(difflib.ndiff("never","forever"))
>>> for d in diffs:
...     print({' ': ' ', '-':'', '+':' '}[d[0]]+d[1:])
...
n
f
o
r
e
v
e
r
I'm assuming you only want substrings to match if they have the same absolute position within their respective strings. For example, "abcd", and "bcde" won't have any matches, even though both contain "bcd".
a = "address"
b = "oddness"
# matches[x] is True if a[x] == b[x]
matches = list(map(lambda x: x[0] == x[1], zip(list(a), list(b))))
positions = list(filter(lambda x: matches[x], range(len(a))))
substrings = list(filter(lambda x: x.find("_") == -1 and x != "", "".join(map(lambda x: ["_", a[x]][matches[x]], range(len(a)))).split("_")))
# positions == [1, 2, 4, 5, 6]
# substrings == ['dd', 'ess']
If you only want substrings, you can squish it into one line:
list(filter(lambda x: x.find("_") == -1 and x != "", "".join(map(lambda x: ["_", a[x]][list(map(lambda x: x[0] == x[1], zip(list(a), list(b))))[x]], range(len(a)))).split("_")))
def IntersectStrings(first, second):
    x = list(first)
    #print x
    y = list(second)
    lst1 = []
    lst2 = []
    for i in x:
        if i in y:
            lst1.append(i)
    lst2 = sorted(lst1) + []
    # The sorting step above is optional: keep it if the result should be sorted alphabetically, otherwise remove it
    return ''.join(lst2)

print(IntersectStrings('hello', 'mello'))