Longest common prefix using buffer? - python

If I have an input string and an array:
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
I am trying to find the longest common prefix between the consecutive elements of the array pos referencing the original s. I am trying to get the following output:
longest = [3,1]
The way I obtained this is by computing the longest common prefix of the following pairs:
s[15:] which is _be and s[2:] which is _be_or_not_to_be giving 3 ( _be )
s[2:] which is _be_or_not_to_be and s[8:] which is _not_to_be giving 1 ( _ )
However, if s is huge, I don't want to create multiple copies when I do something like s[x:]. After hours of searching, I found the function buffer that maintains only one copy of the input string but I wasn't sure what is the most efficient way to utilize it here in this context. Any suggestions on how to achieve this?

Here is a method without buffer which doesn't copy, as it only looks at one character at a time:
from itertools import islice, izip
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
length = len(s)
for start1, start2 in izip(pos, islice(pos, 1, None)):
    pref = 0
    for pos1, pos2 in izip(xrange(start1, length), xrange(start2, length)):
        if s[pos1] == s[pos2]:
            pref += 1
        else:
            break
    print pref
# prints 3 1
I use islice, izip, and xrange in case you're talking about potentially very long strings.
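For readers on Python 3, where izip and xrange no longer exist, the same character-by-character idea might look like the sketch below. This is not the answer's original code: the function name lcp_lengths is made up for illustration, and it assumes s is an ordinary str indexed in place, so no slices are created.

```python
def lcp_lengths(s, pos):
    """Return the common-prefix length of s[a:] and s[b:] for each
    consecutive pair (a, b) in pos, without building any slices."""
    n = len(s)
    result = []
    for a, b in zip(pos, pos[1:]):
        k = 0
        # Walk both suffixes in lockstep until they differ or one ends.
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        result.append(k)
    return result


print(lcp_lengths("to_be_or_not_to_be", [15, 2, 8]))  # [3, 1]
```

Only integer offsets are kept around, so memory use does not depend on the length of s.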
I also couldn't resist this "One Liner" which doesn't even require any indexing:
[next((i for i, (a, b) in
       enumerate(izip(islice(s, start1, None), islice(s, start2, None)))
       if a != b),
      length - max((start1, start2)))
 for start1, start2 in izip(pos, islice(pos, 1, None))]
One final method, using os.path.commonprefix together with buffer:
from os.path import commonprefix
[len(commonprefix((buffer(s, n), buffer(s, m)))) for n, m in zip(pos, pos[1:])]

>>> import os
>>> os.path.commonprefix([s[i:] for i in pos])
'_'
Let Python manage memory for you. Don't optimize prematurely.
To get the exact output you could do (as @agf suggested):
print [len(os.path.commonprefix([buffer(s, i) for i in adj_indexes]))
       for adj_indexes in zip(pos, pos[1:])]
# -> [3, 1]

I think your worrying about copies is unfounded. See below:
>>> s = "how long is a piece of string...?"
>>> t = s[12:]
>>> print t
a piece of string...?
>>> id(t[0])
23295440
>>> id(s[12])
23295440
>>> id(t[2:20]) == id(s[14:32])
True
Unless you're copying the slices and leaving references to the copies hanging around, I wouldn't think it could cause any problem.
edit: There are technical details with string interning and stuff that I'm not really clear on myself. But I'm sure that a string slice is not always a copy:
>>> x = 'google.com'
>>> y = x[:]
>>> x is y
True
I guess the answer I'm trying to give is: let Python manage its memory itself to begin with; you can look at memory buffers and views later if needed. And if this is already a real problem for you, update your question with details of what the actual problem is.

One way of doing this using buffer is given below. However, there could be much faster ways.
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
lcp = []
length = len(pos) - 1
for index in range(0, length):
    pre = buffer(s, pos[index])
    cur = buffer(s, pos[index+1], pos[index+1]+len(pre))
    count = 0
    shorter, longer = min(pre, cur), max(pre, cur)
    for i, c in enumerate(shorter):
        if c != longer[i]:
            break
        else:
            count += 1
    lcp.append(count)
print
print lcp
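Note that buffer only exists in Python 2. A rough Python 3 counterpart, assuming the text can be held as bytes, is memoryview: slicing a memoryview gives another view onto the same memory rather than a copy. The sketch below is only an illustration of that idea, and the helper name lcp_memoryview is hypothetical.

```python
def lcp_memoryview(data, pos):
    """Common-prefix lengths for consecutive offsets in pos.

    data is assumed to be a bytes object; slicing the memoryview
    creates new views onto the same buffer, not copies of it.
    """
    view = memoryview(data)
    result = []
    for a, b in zip(pos, pos[1:]):
        left, right = view[a:], view[b:]  # zero-copy suffix views
        count = 0
        for x, y in zip(left, right):     # iterating yields ints
            if x != y:
                break
            count += 1
        result.append(count)
    return result


print(lcp_memoryview(b"to_be_or_not_to_be", [15, 2, 8]))  # [3, 1]
```

If the input is a str, it would have to be encoded first, which does make one copy up front.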

Related

Removing whitespaces in string representation of the list in Python

In this Python code
def addToArrayForm(num,k):
    num_string = ""
    answer = []
    for n in num:
        num_string += str(n)
    num_string = int(num_string) + k # This is an integer
    for i in str(num_string):
        answer.append(int(i))
    print(answer)
addToArrayForm([1,2,0,0], 34)
I get this output => [1, 2, 3, 4]
How can I turn this output [1, 2, 3, 4] into [1,2,3,4]? (What I want to do is remove the spaces between the items.)
You can use the str.replace method.
>>> answer = [1, 2, 3, 4]
>>> print(answer)
[1, 2, 3, 4]
>>> new_answer = str(answer).replace(' ', '')
>>> print(new_answer)
[1,2,3,4]
Good luck in leetcode ;)
You want to join all elements of answer with a comma, and then surround it in brackets. Let's do that!
def addToArrayForm(num,k):
    num_string = ""
    answer = []
    for n in num:
        num_string += str(n)
    num_string = int(num_string) + k # This is an integer
    for i in str(num_string):
        answer.append(int(i))
    # print(answer)
    # Convert each element of answer to string, then join them all by comma
    answer_s = ",".join(str(i) for i in answer) # 1,2,3,4
    # Format answer_s into a string, surround it by brackets, print
    print(f"[{answer_s}]")
Your entire script can be reduced as below.
Working backwards through the list, start with k and increase with d * (powers of 10).
ex: k + 0*1 + 0*10 + 2*100 + 1*1000
def addToArrayForm(num,k):
    for i,d in enumerate(num[::-1]):
        k += d*((10**i) or 1)
    #reformat as you requested
    k = f"[{','.join(str(k))}]"
    print(k)
addToArrayForm([1,2,0,0], 34) #[1,2,3,4]
One thing to note is: You initially have a list and the spaces you are trying to get rid of are just how a list is printed. ALL of the answers are converting your list to a str, in order to provide the result you requested. This ignores your list. You may want to consider that spaces being printed in your console isn't affecting anything, and jumping through hoops just to get rid of them is a waste of effort. You end up with something that looks like a list in the console, but isn't a list, at all.
All that being said: If you decide to keep the results that you already have, but want to get to those results in a much cleaner way, you can do this:
def addToArrayForm(num,k):
    for i,d in enumerate(num[::-1]):
        k += d*((10**i) or 1)
    #reformat to list
    num = list(map(int, str(k)))
    print(num)
addToArrayForm([1,2,0,0], 34)
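Following the point above that the spaces are only a matter of how a list is displayed, one option is to keep the real list and format it only at print time. A minimal sketch, assuming the list holds plain integers; the helper name format_compact is made up here, while json.dumps and its separators parameter are standard library.

```python
import json


def format_compact(values):
    """Render a list without spaces after the commas.

    The list itself is left untouched; only the returned string differs.
    """
    return json.dumps(values, separators=(",", ":"))


answer = [1, 2, 3, 4]
print(format_compact(answer))  # [1,2,3,4]
print(answer[0] + answer[1])   # still a real list of ints: 3
```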

finding repeated substring in k length in a string using function

I just started using functions, and I'm trying to build one that finds a repeated substring whose length is at least k and returns the result as a tuple that contains a dict.
The keys need to be the substrings and the values how many times each one is repeated; the length of the substring is then added to the tuple.
I've only just started and didn't really know how to continue, but this is what I tried:
def longest_repeat(string, K)
    longest = {} ,
    if isinstance(K, int) and isinstance(string, str)
        for sub_str in string:
            if sub_str >= K:
                longest[0][sub_seq] = DNA_seq_slic = []
    a=0
    b=k
    for nuc in range(len(DNA_seq)-k+1):
        DNA_seq_slic.append(DNA_seq[a:b])
        a +=1
        b +=1
    import collections
    for sub_seq in DNA_seq_slic:
        repeated = [item for item, count in collections.Counter(DNA_seq_slic).items() if count > 1]
        repeated_subseq_dict = dict(zip(repeated,[0 for x in range(0,len(repeated))]))
    for key in repeated_subseq_dict:
        repeated_subseq_dict[key] = DNA_seq_slic.count(key)
    return(repeated_subseq_dict)
I'm sorry if it's a bit messy. I didn't really have a direction, and I tried to reuse another function I had built to solve this, but it didn't really work. I can clarify more if needed.
the output should be something like this:
longest_repeated("ATAATACATAATA", 5)
output: longest = {ATAATA: 2} , 6
Really appreciate any kind of help! Thanks!
You can try the re module:
import re
def longest_repeated(s, k):
    # capture k or more characters that appear again later in the string
    m = re.findall(f"(.{{{k},}})(?=.*\\1)", s)
    if m:
        mx = max(m, key=len)
        return {mx: s.count(mx)}, len(mx)
Some tests:
print(longest_repeated("ATAATACATAATA", 5))
({'ATAATA': 2}, 6)
print(longest_repeated("XXXXXATAATACATAATAXXXXX", 5))
({'ATAATA': 2}, 6)
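For comparison, here is a regex-free sketch that counts fixed-length windows with collections.Counter and grows the window length from k upwards. It is only one possible reading of the requirement: it counts overlapping occurrences, returns every substring of the longest repeated length, and the name longest_repeated_windows is made up for illustration.

```python
from collections import Counter


def longest_repeated_windows(s, k):
    """Return ({substring: count, ...}, length) for the longest repeated
    substrings of length >= k, or None if nothing of length k repeats."""
    best = None
    for length in range(k, len(s)):
        # Count every window of the current length (overlaps included).
        counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
        repeated = {sub: c for sub, c in counts.items() if c > 1}
        if not repeated:
            # A longer repeat would imply a repeat at this length too.
            break
        best = (repeated, length)
    return best


print(longest_repeated_windows("ATAATACATAATA", 5))  # ({'ATAATA': 2}, 6)
```

This trades speed for simplicity: each window length costs a full pass over the string.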

find all possible rotation of a given string using python

If the given string is "abc", then it should print out "abc", "bca", "cab".
My approach: find the length of the given string and rotate it that many times.
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b-1):
        c = a[:i] + a[i:]
        print c
The above code simply prints abc, abc. Any idea what I am missing here?
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b):
        c = a[i:]+a[:i]
        print c
possible_rotation()
Output:
abc
bca
cab
You have 2 issues: the range and the rotation logic. The rotation should be a[i:]+a[:i], not the other way round, and range(b-1) should be range(b).
You have two errors:
range(b-1) should be range(b);
a[:i] + a[i:] should be a[i:] + a[:i].
This is what I did. I used a deque, a class in collections, and then used its rotate method like this:
from collections import deque
string = 'abc'
for i in range(len(string)):
    c = deque(string)
    c.rotate(i)
    print ''.join(list(c))
And gives me this output.
abc
cab
bca
What it does: it creates a deque (double-ended queue) object, which has a rotate method. rotate takes the number of steps and shifts the elements to the right by that many steps in place, kind of like a circular right shift in binary operations. On each pass through the loop a fresh deque is rotated by i steps, and I then convert it to a list and finally to a string.
Hope this helps
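If the left-rotation order from the question (abc, bca, cab) is wanted with a deque, a negative step does it. A small Python 3 sketch, with the function name rotations_with_deque being an illustrative assumption:

```python
from collections import deque


def rotations_with_deque(text):
    """Collect every left rotation of text using deque.rotate(-1)."""
    d = deque(text)
    result = []
    for _ in range(len(text)):
        result.append("".join(d))
        d.rotate(-1)  # rotates in place: first element moves to the end
    return result


print(rotations_with_deque("abc"))  # ['abc', 'bca', 'cab']
```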
for i in range(b):
    print(a[i:] + a[:i])
0 - [a,b,c] + []
1 - [b,c] + [a]
2 - [c ] + [a,b]
In other words, swap the two slices: a[i:] comes first, then a[:i].
There is no need to use (b-1). You can simply do it like this:
def possible_rotation():
    a = "abc"
    for i in range(0,len(a)):
        strng = a[i:]+a[:i]
        print strng
possible_rotation()
This looks to be homework, but here's a solution using the built-in collections.deque:
from collections import deque
def possible_rotations(string):
    rotated = deque(string)
    joined = None
    while joined != string:
        rotated.rotate(1)
        joined = ''.join(x for x in rotated)
        print(joined)
Test it out (the function prints the rotations itself and returns None, so call it directly):
>>> possible_rotations('abc')
cab
bca
abc
Two things:
Firstly, as already pointed out in the comments, you should iterate over range(b) instead of range(b-1). In general, range(b) is equal to [0, 1, ..., b-1], so in your example that would be [0, 1, 2].
Secondly, you switched around the two terms, it should be: a[i:] + a[:i].
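Putting both fixes together, a Python 3 version that returns the rotations instead of printing them could look like this. The function name rotations is just an illustrative choice.

```python
def rotations(text):
    """Return all left rotations of text, starting with text itself."""
    # range(len(text)) covers every offset; a[i:] + a[:i] moves the
    # first i characters to the end.
    return [text[i:] + text[:i] for i in range(len(text))]


print(rotations("abc"))  # ['abc', 'bca', 'cab']
```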

Python: Check the occurrences in a list against a value

lst = [1,2,3,4,1]
I want to know whether 1 occurs twice in this list. Is there an efficient way to do this?
lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function on the list is list.count(x), and will return the number of times x occurs in a list.
Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?
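The short-circuiting check mentioned above (stop counting as soon as enough occurrences have been seen) might be sketched like this. The helper name occurs_at_least is hypothetical; itertools.islice is what cuts the scan short.

```python
from itertools import islice


def occurs_at_least(lst, value, n):
    """True if value occurs at least n times in lst.

    The generator is consumed lazily, so scanning stops once the
    n-th match has been found.
    """
    matches = (1 for item in lst if item == value)
    return sum(islice(matches, n)) >= n


lst = [1, 2, 3, 4, 1]
print(occurs_at_least(lst, 1, 2))  # True
print(occurs_at_least(lst, 2, 2))  # False
```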
For multiple occurrences, this gives you the index of each occurrence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
...     if(tgt==suspect):
...         found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
...     count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}
def valCount(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res
u = [ x for x,y in valCount(lst).iteritems() if y > 1 ]
u is now a list of all values which appear more than once.
Edit:
@katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time
def test1(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res
def test2(lst):
    res = collections.defaultdict(lambda: 0)
    for v in lst:
        res[v] += 1
    return res
def test3(lst):
    return collections.Counter(lst)
def rndLst(lstLen):
    r = random.randint
    return [r(0,lstLen) for i in xrange(lstLen)]
def timeFn(fn, *args):
    st = time.clock()
    res = fn(*args)
    return time.clock() - st
def main():
    reps = 5000
    res = []
    tests = [test1, test2, test3]
    for t in xrange(reps):
        lstLen = random.randint(10,50000)
        lst = rndLst(lstLen)
        res.append( [lstLen] + [timeFn(fn, lst) for fn in tests] )
    res.sort()
    return res
And the results, for random lists containing up to 50,000 items, are as follows:
[Figure: run time in seconds (vertical axis) against number of items in the list (horizontal axis)]
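The harness above is Python 2 (xrange, time.clock). To repeat the comparison on Python 3, something like the sketch below could be used. It is an adaptation, not the original benchmark: the function names are made up, time.perf_counter stands in for time.clock, and no timings are claimed here since they depend on the machine and interpreter.

```python
import collections
import random
import time


def count_try(lst):
    """Plain dict with try/except, as in test1."""
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res


def count_defaultdict(lst):
    """collections.defaultdict, as in test2."""
    res = collections.defaultdict(int)
    for v in lst:
        res[v] += 1
    return res


def count_counter(lst):
    """collections.Counter, as in test3."""
    return collections.Counter(lst)


def time_call(fn, *args):
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    sample = [random.randint(0, 10000) for _ in range(10000)]
    for fn in (count_try, count_defaultdict, count_counter):
        print(fn.__name__, time_call(fn, sample))
```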
Another way to get all items that occur more than once:
lst = [1,2,3,4,1]
d = {}
for x in lst:
    d[x] = x in d
print d[1] # True
print d[2] # False
print [x for x in d if d[x]] # [1]
You could also sort the list, which is O(n*log(n)), then check the adjacent elements for equality, which is O(n). The result is O(n*log(n)). This has the disadvantage of requiring the entire list to be sorted before possibly bailing out when a duplicate is found.
For a large list with relatively rare duplicates, this could be about the best you can do. The best way to approach this really does depend on the size of the data involved and its nature.
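The sort-then-compare-neighbours idea can be sketched as follows. For contrast, a second helper shows a set-based scan that can stop at the first duplicate; that one is my own addition and assumes the items are hashable. Both function names are made up for illustration.

```python
def has_duplicate_sorted(items):
    """Sort a copy, then look for two equal neighbours: O(n log n)."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicate_seen(items):
    """Scan once with a set of seen values; stops at the first repeat.

    Assumes the items are hashable.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


print(has_duplicate_sorted([1, 2, 3, 4, 1]))  # True
print(has_duplicate_seen([1, 2, 3, 4]))       # False
```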

Number of elements in Python Set

I have a list of phone numbers that have been dialed (nums_dialed).
I also have a set of phone numbers which are the numbers in a client's office (client_nums).
How do I efficiently figure out how many times I've called a particular client (total)?
For example:
>>>nums_dialed=[1,2,2,3,3]
>>>client_nums=set([2,3])
>>>???
total=4
Problem is that I have a large-ish dataset: len(client_nums) ~ 10^5; and len(nums_dialed) ~10^3.
Which client has 10^5 numbers in his office? Do you do work for an entire telephone company?
Anyway:
print sum(1 for num in nums_dialed if num in client_nums)
That will give you the number as fast as possible.
If you want to do it for multiple clients, using the same nums_dialed list, then you could cache the data on each number first:
import collections
nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1
Then just sum the ones on each client:
sum(nums_dialed_dict[num] for num in this_client_nums)
That would be a lot quicker than iterating over the entire list of numbers again for each client.
>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
...     if num in client_nums:
...         count += 1
...
>>> count
4
>>>
Should be quite efficient even for the large numbers you quote.
Using collections.Counter from Python 2.7:
dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
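With the sizes mentioned in the question (about 10^3 dialed numbers against about 10^5 client numbers), it may be cheaper to loop over the distinct dialed numbers and test each one against the set, rather than looping over every client number. A sketch of that variant in Python 3, with the function name calls_to_client made up for illustration:

```python
from collections import Counter


def calls_to_client(nums_dialed, client_nums):
    """Total calls made to any number in client_nums (a set)."""
    dialed_count = Counter(nums_dialed)
    # Iterate over the smaller side: the distinct dialed numbers.
    return sum(count for num, count in dialed_count.items()
               if num in client_nums)


print(calls_to_client([1, 2, 2, 3, 3], {2, 3}))  # 4
```

The Counter can be built once and reused for every client.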
That's a very popular way to combine sorted lists in a single pass:
nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2,3]
nums_dialed.sort()
client_nums.sort()
c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
    a = i.next()
    b = j.next()
    while True:
        if a < b:
            a = i.next()
            continue
        if a > b:
            b = j.next()
            continue
        # a == b
        c += 1
        a = i.next() # next dialed
except StopIteration:
    pass
print c
I used lists because "set" is an unordered collection (I don't know why it uses hashes rather than a binary tree or a sorted list), so it isn't a fair fit for this approach. You can implement your own "set" on top of a list with "bisect" if you like lists, or use something more complicated that produces an ordered iterator.
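The bisect idea mentioned above might look like the following Python 3 sketch: the client numbers are kept in a sorted list and each dialed number is looked up by binary search. The function name count_calls_bisect is hypothetical.

```python
from bisect import bisect_left


def count_calls_bisect(nums_dialed, client_nums):
    """Count dialed numbers that appear in client_nums, using a sorted
    list and binary search instead of a hash-based set."""
    ordered = sorted(client_nums)
    total = 0
    for num in nums_dialed:
        i = bisect_left(ordered, num)
        # bisect_left gives the insertion point; check for an exact hit.
        if i < len(ordered) and ordered[i] == num:
            total += 1
    return total


print(count_calls_bisect([1, 2, 2, 3, 3], [2, 3]))  # 4
```

Each lookup is O(log n) here, whereas a set lookup is O(1) on average, so this is mainly of interest when an ordered structure is needed anyway.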
The method I use is to simply convert the set into a list and then use the len() function to count its values.
set_var = {"abc", "cba"}
print(len(list(set_var)))
Output:
2
