Phone number sequence predictor - python

I have a phone number, say 98888888xx. The last two digits (xx) are the ones I want to generate a sequence for, like 9888888811, 9888888812, 9888888813 and so on.
I am trying to learn Python programming, so can someone help me with how I would go about writing a script for this?

You could use a list comprehension and range():
['98888888' + str(number).zfill(2) for number in range(100)]
['9888888800',
'9888888801',
'9888888802',
...
'9888888897',
'9888888898',
'9888888899']

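A robust solution is to use recursion to handle any number of occurrences of "x":
import re
s = '98888888xx'
# Number of placeholders: strip the digits and count what is left
_len = len(re.sub(r'\d+', '', s))
def combos(d, current = []):
    # current is never mutated (current+[i] builds a new list), so the default is safe
    if len(current) == _len:
        yield current
    else:
        for i in d[0]:
            yield from combos(d[1:], current+[i])
# range(1, 10) uses the digits 1-9 only; use range(10) to include 0
_c = combos([range(1, 10)]*_len)
# Replace each 'x' in turn with the next digit of the combination
new_result = [(lambda d:re.sub('x', lambda _:str(next(d)), s))(iter(i)) for i in _c]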
Output:
['9888888811', '9888888812', '9888888813', '9888888814', '9888888815', '9888888816', '9888888817', '9888888818', '9888888819', '9888888821', '9888888822', '9888888823', '9888888824', '9888888825', '9888888826', '9888888827', '9888888828', '9888888829', '9888888831', '9888888832', '9888888833', '9888888834', '9888888835', '9888888836', '9888888837', '9888888838', '9888888839', '9888888841', '9888888842', '9888888843', '9888888844', '9888888845', '9888888846', '9888888847', '9888888848', '9888888849', '9888888851', '9888888852', '9888888853', '9888888854', '9888888855', '9888888856', '9888888857', '9888888858', '9888888859', '9888888861', '9888888862', '9888888863', '9888888864', '9888888865', '9888888866', '9888888867', '9888888868', '9888888869', '9888888871', '9888888872', '9888888873', '9888888874', '9888888875', '9888888876', '9888888877', '9888888878', '9888888879', '9888888881', '9888888882', '9888888883', '9888888884', '9888888885', '9888888886', '9888888887', '9888888888', '9888888889', '9888888891', '9888888892', '9888888893', '9888888894', '9888888895', '9888888896', '9888888897', '9888888898', '9888888899']
Note that a simple solution in the form of a nested list comprehension presents itself when there are only two 'x':
d = [s.replace('x', '{}').format(a, b) for a in range(1, 10) for b in range(1, 10)]
However, multiple nested loops are not a clean approach when the input string contains three or more 'x's. Instead, recursion works best:
s = '98888888xxxxx'
_len = len(re.sub(r'\d+', '', s))
_c = combos([range(1, 10)]*_len)
new_result = [(lambda d:re.sub('x', lambda _:str(next(d)), s))(iter(i)) for i in _c]
Output (first twenty strings):
['9888888811111', '9888888811112', '9888888811113', '9888888811114', '9888888811115', '9888888811116', '9888888811117', '9888888811118', '9888888811119', '9888888811121', '9888888811122', '9888888811123', '9888888811124', '9888888811125', '9888888811126', '9888888811127', '9888888811128', '9888888811129', '9888888811131', '9888888811132']
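As an alternative to the hand-written recursion above, the same enumeration can be sketched with itertools.product from the standard library. The function name expand_placeholders and its digits and placeholder parameters are made up for this illustration, and the sketch assumes the template contains no literal '{' or '}' characters.

```python
from itertools import product

def expand_placeholders(template, digits="0123456789", placeholder="x"):
    """Yield every string obtained by filling each placeholder with a digit."""
    slots = template.count(placeholder)
    # Turn '98888888xx' into '98888888{}{}' so str.format can fill the slots.
    pattern = template.replace(placeholder, "{}")
    # product(...) walks through all digit tuples, last position varying fastest.
    for combo in product(digits, repeat=slots):
        yield pattern.format(*combo)

numbers = list(expand_placeholders("98888888xx"))
print(len(numbers))   # 100
print(numbers[:3])    # ['9888888800', '9888888801', '9888888802']
```

Passing digits="123456789" reproduces the 1-9 range used in the recursive answer, and because the function is a generator, long templates can be consumed lazily instead of building the whole list.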

Related

finding repeated substring in k length in a string using function

I just started using functions and I'm trying to build one that finds a repeated substring whose length is at least k and returns the result as a tuple that contains a dict.
The keys need to be the substrings and the values how many times each was repeated; the length of the substring is then added to the tuple.
I just started and didn't really know how to continue, but this is what I tried to do:
def longest_repeat(string, K)
longest = {} ,
if isinstance(K, int) and isinstance(string, str)
for sub_str in string:
if sub_str >= K:
longest[0][sub_seq] = DNA_seq_slic = []
a=0
b=k
for nuc in range(len(DNA_seq)-k+1):
DNA_seq_slic.append(DNA_seq[a:b])
a +=1
b +=1
import collections
for sub_seq in DNA_seq_slic:
repeated = [item for item, count in collections.Counter(DNA_seq_slic).items() if count > 1]
repeated_subseq_dict = dict(zip(repeated,[0 for x in range(0,len(repeated))]))
for key in repeated_subseq_dict:
repeated_subseq_dict[key] = DNA_seq_slic.count(key)
return(repeated_subseq_dict)
I'm sorry if it's a little bit messed up; I didn't really have a direction, and I tried to use another function I built to solve this, but it didn't really work. I can clarify more if needed.
The output should be something like this:
longest_repeated("ATAATACATAATA", 5)
output: longest = {ATAATA: 2} , 6
Really appreciate any kind of help! Thanks!
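You can try the re module:
import re
def longest_repeated(s, k):
    # Capture at least k characters that occur again later in the string
    m = re.findall(f"(.{{{k},}})(?=.*\\1)", s)
    if m:
        mx = max(m, key=len)
        return {mx: s.count(mx)}, len(mx)
    # Falls through and returns None when nothing is repeated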
Some tests:
print(longest_repeated("ATAATACATAATA", 5))
({'ATAATA': 2}, 6)
print(longest_repeated("XXXXXATAATACATAATAXXXXX", 5))
({'ATAATA': 2}, 6)
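For comparison, here is a minimal sketch of the same task without regular expressions, using collections.Counter over all windows of a given length. The name longest_repeated_counter is hypothetical, the sketch assumes k >= 1, and unlike s.count() it also counts overlapping occurrences. It is a brute-force approach, so it is only meant for short strings.

```python
from collections import Counter

def longest_repeated_counter(s, k):
    """Return ({substring: count}, length) for the longest substrings of
    length >= k that occur at least twice, or None if there are none."""
    # Try the longest candidate lengths first; a repeated substring can be
    # at most len(s) - 1 characters long.
    for length in range(len(s) - 1, k - 1, -1):
        windows = (s[i:i + length] for i in range(len(s) - length + 1))
        counts = Counter(windows)
        repeated = {sub: n for sub, n in counts.items() if n > 1}
        if repeated:
            return repeated, length
    return None

print(longest_repeated_counter("ATAATACATAATA", 5))  # ({'ATAATA': 2}, 6)
```

If several substrings of the same maximal length repeat, all of them end up in the returned dict.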

Remove similar items in a list in Python

How do you remove repeated items in a list in Python, but only for a given item? For example,
l = list('need')
If 'e' is the given item then
l = list('nd')
The set() function will not do the trick since it will remove all duplicates.
Using count() and remove() is not efficient.
Use filter, assuming you write a function that decides which items you want to keep in the list.
For your example:
def pred(x):
    return x != "e"
l = list("need")
l = list(filter(pred, l))
Assuming given = 'e' and l = list('need'):
for i in range(l.count(given)):
    l.remove(given)
If you just want to remove 'e' from each word in a list of words, you can use the regex function re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, you can use re.subn(). The first gives you a list of strings. The second gives you a list of tuples (string, n), where n is the number of occurrences removed.
import re
lst = list(('need','feed','seed','deed','made','weed','said'))
j = [re.sub('e','',i) for i in lst]
k = [re.subn('e','',i) for i in lst]
The output for j and k are :
j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]
If you want to count the total number of changes made, just iterate through k and sum the counts. There are simpler ways too. You can simply apply re.subn() to the joined words:
re.subn('e','',''.join(lst))[1]
This will give you the total number of items replaced in the list.
List comprehension method. I'm not sure if the size/complexity is less than that of count and remove.
def scrub(l, given):
    return [i for i in l if i not in given]
Filter method. Again, I'm not sure.
def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))
Brute force with recursion, but there are a lot of potential downfalls. Still an option. Again, I don't know the size/complexity.
def bruteforce(l, given):
    try:
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        return bruteforce(l, given[1:])
    except IndexError:
        return l
    return l  # unreachable
For those of you curious about the actual time taken by the above methods, I've taken the liberty of testing them below!
Below is the timing helper I've chosen to use.
import datetime
def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        x = func()
        end = datetime.datetime.now()
        # .microseconds is only the sub-second part of the elapsed time
        print((end-start).microseconds)
    except Exception as e:
        print("Failed: {}".format(e))
    print("\r")
This is the dataset we are testing against, where l is our original list, q is the items we want to remove, and r is our expected result.
l = list("need"*50000)
q = list("ne")
r = list("d"*50000)
For posterity I've added the count / remove method the OP was against. (For good reason!)
def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l
All that's left to do is test!
timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)
timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)
timer(lambda : count_remove(l, q), "Count/Remove")
assert(count_remove(l,q) == r)
timer(lambda: bruteforce(l, q), "Bruteforce")
assert(bruteforce(l,q) == r)
And our results
-------List Comp-------
10000
-------Filter-------
28000
-------Count/Remove-------
199000
-------Bruteforce-------
Failed: maximum recursion depth exceeded
Process finished with exit code 0
The Recursion method failed with a larger dataset, but we expected this. I tested on smaller datasets, and Recursion is marginally slower. I thought it would be faster.
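Two details of the test above can skew the numbers: count_remove and bruteforce modify l in place, so later calls see an already-shortened list, and timedelta.microseconds reports only the sub-second part of the elapsed time. A minimal sketch of a fairer comparison with the standard timeit module follows. It re-declares two of the methods so it runs on its own, uses a smaller dataset than the one above, and turns given into a set, which is a small change of my own rather than part of the original answer.

```python
import timeit

def scrub(items, given):
    """Keep only the items that are not in `given` (list comprehension)."""
    unwanted = set(given)  # set membership is O(1) on average
    return [x for x in items if x not in unwanted]

def count_remove(items, given):
    """Remove in place with count()/remove(); each remove() rescans the list."""
    for g in given:
        for _ in range(items.count(g)):
            items.remove(g)
    return items

data = list("need" * 2000)   # smaller than the 50000 repetitions used above
unwanted = list("ne")
expected = list("d" * 2000)

# Check correctness on copies so the in-place method cannot affect the other.
assert scrub(list(data), unwanted) == expected
assert count_remove(list(data), unwanted) == expected

for name, func in [("List Comp", scrub), ("Count/Remove", count_remove)]:
    # Every run gets a fresh copy of the data.
    seconds = timeit.timeit(lambda: func(list(data), unwanted), number=3)
    print(f"{name}: {seconds / 3:.6f} s per run")
```

The absolute timings depend on the machine, so no expected output is shown.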

Generate a dictionary of all possible Kakuro solutions

I'm just starting out with Python and had an idea to try to generate a dictionary of all the possible solutions for a Kakuro puzzle. There are a few posts out there about these puzzles, but none that show how to generate said dictionary. What I'm after is a dictionary that has keys from 3-45, with their values being tuples of the integers which sum to the key (so for example mydict[6] = ([1,5],[2,4],[1,2,3])). It is essentially a Subset Sum Problem - https://mathworld.wolfram.com/SubsetSumProblem.html
I've had a go at this myself and have it working for tuples up to three digits long. My method requires a loop for each additional integer in the tuple, so it would require me to write some very repetitive code! Is there a better way to do this? I feel like I want to loop the creation of loops, if that is a thing?
def kakuro():
    L = [i for i in range(1,10)]
    mydict = {}
    for i in L:
        L1 = L[i:]
        for j in L1:
            if i+j in mydict:
                mydict[i+j].append((i,j))
            else:
                mydict[i+j] = [(i,j)]
            L2 = L[j:]
            for k in L2:
                if i+j+k in mydict:
                    mydict[i+j+k].append((i,j,k))
                else:
                    mydict[i+j+k] = [(i,j,k)]
    for i in sorted (mydict.keys()):
        print(i,mydict[i])
    return
My attempt, round 2 - getting better!
def kakurodict():
    from itertools import combinations as combs
    L = [i for i in range(1,10)]
    mydict={}
    mydict2={}
    for i in L[1:]:
        mydict[i] = list(combs(L,i))
        for j in combs(L,i):
            val = sum(j)
            if val in mydict2:
                mydict2[val].append(j)
            else:
                mydict2[val] = [j]
    return mydict2
So this is written with the following assumptions.
dict[n] cannot have a list with the value [n].
Each element in the subset has to be unique.
I hope there is a better solution offered by someone else, because when we generate all subsets for values 3-45, it takes quite some time. I believe the time complexity of the subset sum generation problem is 2^n so if n is 45, it's not ideal.
import itertools
def subsetsums(target):
    # Candidate numbers are 1..target-1, capped at 44 (named target to avoid shadowing the built-in max)
    numbers = list(range(1, min(target, 45)))
    # Try every subset size, largest first, and keep the subsets that sum to target
    result = [list(seq) for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == target]
    return result
mydict = {}
for i in range(3, 46):
    mydict[i] = subsetsums(i)
print(mydict)
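If the entries are restricted to the digits 1-9 with no repeats, as in the question's own attempts, there are only 2^9 = 512 subsets in total, so a single pass over them is enough and nothing has to be searched per target sum. The following is a minimal sketch under that assumption; the name kakuro_sums and its parameters are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

def kakuro_sums(digits=range(1, 10), min_len=2):
    """Map each achievable sum to the digit combinations that produce it."""
    digits = list(digits)
    table = defaultdict(list)
    # One pass over every combination of min_len..9 distinct digits.
    for size in range(min_len, len(digits) + 1):
        for combo in combinations(digits, size):
            table[sum(combo)].append(combo)
    return dict(table)

table = kakuro_sums()
print(table[6])                 # [(1, 5), (2, 4), (1, 2, 3)]
print(min(table), max(table))   # 3 45
```

Grouping by sum while enumerating avoids filtering the full set of combinations once for every key from 3 to 45.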

Remove adjacent duplicates given a condition

I'm trying to write a function that will take a string, and given an integer, will remove all the adjacent duplicates larger than the integer and output the remaining string. I have this function right now that removes all the duplicates in a string, and I'm not sure how to put the integer constraint into it:
def remove_duplicates(string):
    s = set()
    list = []
    for i in string:
        if i not in s:
            s.add(i)
            list.append(i)
    return ''.join(list)
string = "abbbccaaadddd"
print(remove_duplicates(string))
This outputs
abc
What I would want is a function like
def remove_duplicates(string, int):
    .....
where, if I pass int=2 for the same string, each run of adjacent duplicates is cut down to at most 2 characters instead of being removed entirely. The output should be
abbccaadd
I'm also concerned about run time and complexity for very large strings, so if my initial approach is bad, please suggest a different approach. Any help is appreciated!
Not sure I understand your question correctly. I think that, given m repetitions of a character, you want to remove up to k*n duplicates such that k*n < m.
You could try this, using groupby:
>>> from itertools import groupby
>>> string = "abbbccaaadddd"
>>> n = 2
>>> ''.join(c for k, g in groupby(string) for c in k * (len(list(g)) % n or n))
'abccadd'
Here, k * (len(list(g)) % n or n) means len(g) % n repetitions, or n if that number is 0.
Oh, you changed it... now my original answer with my "interpretation" of your output actually works. You can use groupby together with islice to get at most n characters from each group of duplicates.
>>> from itertools import groupby, islice
>>> string = "abbbccaaadddd"
>>> n = 2
>>> ''.join(c for _, g in groupby(string) for c in islice(g, n))
'abbccaadd'
Create groups of letters, but compute the length of each group, capped by your parameter.
Then rebuild the groups and join:
import itertools
def remove_duplicates(string,maxnb):
    groups = ((k,min(len(list(v)),maxnb)) for k,v in itertools.groupby(string))
    return "".join(itertools.chain.from_iterable(v*k for k,v in groups))
string = "abbbccaaadddd"
print(remove_duplicates(string,2))
this prints:
abbccaadd
It can be a one-liner as well (cover your eyes!):
    return "".join(itertools.chain.from_iterable(v*k for k,v in ((k,min(len(list(v)),maxnb)) for k,v in itertools.groupby(string))))
I'm not sure about the min(len(list(v)),maxnb) repeat value; it can be adapted to suit your needs with a modulo (like len(list(v)) % maxnb), etc.
You should avoid using int as a variable name, as it shadows the built-in type int.
Here is a vanilla function that does the job:
def deduplicate(string: str, threshold: int) -> str:
    res = ""
    last = ""
    count = 0
    for c in string:
        if c != last:
            # New run: keep its first character and start counting at 1
            count = 1
            res += c
            last = c
        elif count < threshold:
            # Same run: keep the character only while under the threshold
            res += c
            count += 1
    return res
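Another way to cap runs of adjacent duplicates is a regular expression with a backreference. This is only a sketch: the name limit_runs is hypothetical, and it assumes n is at least 1.

```python
import re

def limit_runs(text, n):
    """Shorten every run of a repeated character to at most n characters."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (.) captures one character; \1{n,} requires at least n more copies,
    # so the pattern only matches runs longer than n.
    pattern = re.compile(r"(.)\1{%d,}" % n, re.DOTALL)
    # Replace each over-long run with exactly n copies of its character.
    return pattern.sub(lambda m: m.group(1) * n, text)

print(limit_runs("abbbccaaadddd", 2))  # abbccaadd
```

The re.DOTALL flag lets the dot match newlines too, so runs of blank lines are capped like any other character.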

How to write all_subsets function using recursion?

Given the function all_subsets(lst), how can I write this function using recursion?
For example, for the input [1,2,3], the output should be: [[], [1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]]
The assignment is to use recursive function. Please help. This is part of a lab assignment, so I am not graded on this, but at the same time, I am dying to learn how to write this code, and I don't know anyone in my lab class that's figured it out.
So far, I've got:
def all_subsets(b):
    if len(b) == 0:
        return ''
    else:
        lst = []
        subsets = all_subsets(b[1:])
        for i in b:
            lst.append([i])
        for i in subsets:
            if b[0] not in i:
                lst.append([b[0]] + i)
        for i in subsets:
            if b[1] not in i:
                lst.append([b[1]] + i)
        return lst
It can handle [1,2,3], but it can't handle anything bigger; plus this code also has weird output order
This should work:
def all_subsets(b):
    if len(b) == 0:
        return [[]]  # the empty set has exactly one subset: itself
    else:
        s = all_subsets(b[:-1])  # calculate subsets of the set without its last element
        # and compose the remaining subsets by adding the last element to each
        return sorted(s + [e + [b[-1]] for e in s], key=len)  # you can omit sorting if you want
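The recursion can also be written by splitting off the first element instead of the last: every subset either contains the head or it does not. The sketch below uses a hypothetical name, subsets_recursive, and checks the result against itertools.combinations; it is one possible formulation, not the only one.

```python
from itertools import combinations

def subsets_recursive(items):
    """Return every subset of `items` as a list of lists, shortest first."""
    if not items:
        return [[]]          # the empty list has exactly one subset: itself
    head, tail = items[0], items[1:]
    without_head = subsets_recursive(tail)
    with_head = [[head] + subset for subset in without_head]
    # sorted() is stable, so subsets of equal length keep their relative order
    return sorted(with_head + without_head, key=len)

result = subsets_recursive([1, 2, 3])
print(result)  # [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]

# Cross-check against the standard library, ignoring order.
reference = [list(c) for r in range(4) for c in combinations([1, 2, 3], r)]
assert sorted(result) == sorted(reference)
```

Each call doubles the number of subsets, so a list of n elements yields 2^n results, and the recursion depth equals the length of the list.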
