get occurrence of all substring of matching characters from string - python

e.g. find substring containing 'a', 'b', 'c' in a string 'abca', answer should be 'abc', 'abca', 'bca'
Below code is what I did, but is there better, pythonic way than doing 2 for loops?
Another e.g. for 'abcabc' count should be 10
def test(x):
counter = 0
for i in range(0, len(x)):
for j in range(i, len(x)+1):
if len((x[i:j]))>2:
print(x[i:j])
counter +=1
print(counter)
test('abca')

You can condense it down with list comprehension:
s = 'abcabc'
substrings = [s[b:e] for b in range(len(s)-2) for e in range(b+3, len(s)+1)]
substrings, len(substrings)
# (['abc', 'abca', 'abcab', 'abcabc', 'bca', 'bcab', 'bcabc', 'cab', 'cabc', 'abc'], 10)

You can use combinations from itertools:
from itertools import combinations
string = "abca"
result = [string[x:y] for x, y in combinations(range(len(string) + 1), r = 2)]
result = [item for item in result if 'a' in item and 'b' in item and 'c' in item]

Related

Count duplicated value in Python considering sequence [duplicate]

This question already has an answer here:
What's the most Pythonic way to identify consecutive duplicates in a list?
(1 answer)
Closed 10 months ago.
I have string value as:
s = 'asdabbdasfababbabb'
I've split the str by using below code, than get result as below :
n = 3
split_strings = [s[index : index + n] for index in range(0, len(s), n)]
['asd', 'abb', 'das', 'fab', 'abb', 'abb']
What I need to achieve:
I want to count duplicated value consiering the sequence such as :
({'asd': 1, 'abb': 1, 'das': 1, 'fab': 1, 'abb' : 2})
However, if I use Counter() it counts the duplicated value but, does not seems to consider the sequence of list:
Counter({'asd': 1, 'abb': 3, 'das': 1, 'fab': 1})
How can I achieve what I need?
You cannot store duplicate keys in a dict. If you are willing to have a list of tuples, you can use itertools.groupby:
from itertools import groupby
lst = ['asd', 'abb', 'das', 'fab', 'abb', 'abb']
counts = [(k, len([*g])) for k, g in groupby(lst)]
print(counts) # [('asd', 1), ('abb', 1), ('das', 1), ('fab', 1), ('abb', 2)]
The itertools.groupby function is a favorite, but perhaps future readers might appreciate an algorithm for actually finding these groupings:
def groups(*items):
i = 0
groups = []
while i < len(items):
item = items[i]
j = i + 1
count = 1
while j < len(items):
if items[j] == item:
count += 1
j += 1
else:
break
i = j
groups.append((item, count))
return groups

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return anything in this list that's x > 2
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this. But here's two methods I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
a = ['book','book','cards','book','foo','foo','computer']
import collections
def update_item_counts(item_counts, itemset):
for a in itemset:
item_counts[a] +=1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
A very barebone answer is that you should replace a.count by a.count(k) in your second solution.
Although, do not attempt to use list.count for this, as this will traverse the list for each item. Instead count occurences first with collections.Counter. This has the advantage of traversing the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
if n > 2:
output.extend(repeat(item, n))
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
if a_list.count(a) > 2:
b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned list comprehension. You are on the right track! You can do it with list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in test if test[k] > 2]
Now that you have built this by hand, you should check out the builtin Counter class that does all of the work for you.
If you just want to print there are better answers already, if you want to remove you can try this.
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
if word not in countdict:
countdict[word] = 1
else:
countdict[word] += 1
for x, y in countdict.items():
if (2 >= y):
for i in range(y):
a.remove(x)
You can try this.
def my_filter(my_list, my_freq):
'''Filter a list of strings by frequency'''
# use set() to unique my_list, then turn set back to list
unique_list = list(set(my_list))
# count frequency in unique_list
frequencies = []
for value in unique_list:
frequencies.append(my_list.count(value))
# filter frequency
return_list = []
for i, frequency in enumerate(frequencies):
if frequency > my_freq:
for _ in range(frequency):
return_list.append(unique_list[i])
return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']

Adding certain lengthy elements to a list

I'm doing a project for my school and for now I have the following code:
def conjunto_palavras_para_cadeia1(conjunto):
acc = []
conjunto = sorted(conjunto, key=lambda x: (len(x), x))
def by_size(words, size):
result = []
for word in words:
if len(word) == size:
result.append(word)
return result
for i in range(0, len(conjunto)):
if i > 0:
acc.append(("{} ->".format(i)))
acc.append(by_size(conjunto, i))
acc = ('[%s]' % ', '.join(map(str, acc)))
print( acc.replace(",", "") and acc.replace("'", "") )
conjunto_palavras_para_cadeia1(c)
I have this list: c = ['A', 'E', 'LA', 'ELA'] and what I want is to return a string where the words go from the smallest one to the biggest on in terms of length, and in between they are organized alphabetically. I'm not being able to do that...
OUTPUT: [;1 ->, [A, E], ;2 ->, [LA], ;3 ->, [ELA]]
WANTED OUTPUT: ’[1->[A, E];2->[LA];3->[ELA]]’
Taking a look at your program, the only issue appears to be when you are formatting your output for display. Note that you can use str.format to insert lists into strings, something like this:
'{}->{}'.format(i, sublist)
Here's my crack at your problem, using sorted + itertools.groupby.
from itertools import groupby
r = []
for i, g in groupby(sorted(c, key=len), key=len):
r.append('{}->{}'.format(i, sorted(g)).replace("'", ''))
print('[{}]'.format(';'.join(r)))
[1->[A, E];2->[LA];3->[ELA]]
A breakdown of the algorithm stepwise is as follows -
sort elements by length
group consecutive elements by length
for each group, sort sub-lists alphabetically, and then format them as strings
at the end, join each group string and surround with square brackets []
Shortest solution (with using of pure python):
c = ['A', 'E', 'LA', 'ELA']
result = {}
for item in c:
result[len(item)] = [item] if len(item) not in result else result[len(item)] + [item]
str_result = ', '.join(['{0} -> {1}'.format(res, sorted(result[res])) for res in result])
I will explain:
We are getting items one by one in loop. And we adding them to dictionary by generating lists with index of word length.
We have in result:
{1: ['A', 'E'], 2: ['LA'], 3: ['ELA']}
And in str_result:
1 -> ['A', 'E'], 2 -> ['LA'], 3 -> ['ELA']
Should you have questions - ask

Getting the nth char of each string in a list of strings

Let's they I have the list ['abc', 'def', 'gh'] I need to get a string with the contents of the first char of the first string, the first of the second and so on.
So the result would look like this: "adgbehcf" But the problem is that the last string in the array could have two or one char.
I already tried to nested for loop but that didn't work.
Code:
n = 3 # The encryption number
for i in range(n):
x = [s[i] for s in partiallyEncrypted]
fullyEncrypted.append(x)
a version using itertools.zip_longest:
from itertools import zip_longest
lst = ['abc', 'def', 'gh']
strg = ''.join(''.join(item) for item in zip_longest(*lst, fillvalue=''))
print(strg)
to get an idea why this works it may help having a look at
for tpl in zip_longest(*lst, fillvalue=''):
print(tpl)
I guess you can use:
from itertools import izip_longest
l = ['abc', 'def', 'gh']
print "".join(filter(None, [i for sub in izip_longest(*l) for i in sub]))
# adgbehcf
Having:
l = ['abc', 'def', 'gh']
This would work:
s = ''
In [18]: for j in range(0, len(max(l, key=len))):
...: for elem in l:
...: if len(elem) > j:
...: s += elem[j]
In [28]: s
Out[28]: 'adgbehcf'
Please don't use this:
''.join(''.join(y) for y in zip(*x)) +
''.join(y[-1] for y in x if len(y) == max(len(j) for j in x))

Intersection with order between two strings

My Question is that if we need to find the intersect between two strings?
How could we do that?
For example "address" and "dress" should return "dress".
I used a dict to implement my function, but I can only sort these characters and not output them with the original order? So how should I modify my code?
def IntersectStrings(first,second):
a={}
b={}
for c in first:
if c in a:
a[c] = a[c]+1
else:
a[c] = 1
for c in second:
if c in b:
b[c] = b[c]+1
else:
b[c] = 1
l = []
print a,b
for key in sorted(a):
if key in b:
cnt = min(a[key],b[key])
while(cnt>0):
l.append(key)
cnt = cnt-1
return ''.join(l)
print IntersectStrings('address','dress')
There are lots of intersecting strings. One way you could create a set of all substrings of each string and then intersect. If you want the biggest intersection just find the max from the resulting set, e.g.:
def substrings(s):
for i in range(len(s)):
for j in range(i, len(s)):
yield s[i:j+1]
def intersect(s1, s2):
return set(substrings(s1)) & set(substrings(s2))
Then you can see the intersections:
>>> intersect('address', 'dress')
{'re', 'ss', 'ess', 'es', 'ress', 'dress', 'dres', 'd', 'e', 's', 'res', 'r', 'dre', 'dr'}
>>> max(intersect('address', 'dress'), key=len)
'dress'
>>> max(intersect('sprinting', 'integer'), key=len)
'int'

Categories

Resources