Looking for a way to use a Python regex to extract all the characters in a string between two indexes. My code is below:
import re
txt = "Hula hoops are fun."
x = re.search(r"hoops", txt)
c = x.span()
a = c[0]
b = c[1]
print(a) # prints 5
print(b) # prints 10
txt2 = "Hula loops are fun."
z = re.???(a, b, txt2) #<------ Incorrect
print(z)
What I am trying to figure out is to somehow use a and b to get z = "loops" in txt2 (the rewrite of txt). Is there a python regex command to do this?
You can use z = txt2[a:b] to extract all characters between indices a and b.
Why not use slices (the obvious way)?
z = txt2[a:b]
print(z) # loops
If you really want to use regex, you need to consume any character (.) a times to reach index a, because regex has no direct indexing. Then match the next b - a characters. In your case you end up with the pattern (?<=.{5}).{5}, where (?<=.{5}) is a positive lookbehind assertion. Note that . does not match a newline unless re.DOTALL is set.
pat = rf"(?<=.{{{str(a)}}}).{{{str(b - a)}}}"
print(re.search(pat, txt2))
output:
<re.Match object; span=(5, 10), match='loops'>
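A minimal sketch comparing the lookbehind pattern with plain slicing, assuming the indices come from a match on the first string. The helper name slice_with_regex is hypothetical and only illustrates the pattern; re.DOTALL is added so that . also consumes newlines.

```python
import re

def slice_with_regex(text, a, b):
    """Return text[a:b] via a lookbehind pattern instead of slicing (illustrative only)."""
    # Lookbehind of exactly `a` characters, then match the next `b - a` characters.
    pattern = re.compile(rf"(?<=.{{{a}}}).{{{b - a}}}", re.DOTALL)
    match = pattern.search(text)
    # Unlike slicing, this yields None when the text is shorter than b.
    return match.group(0) if match else None

txt = "Hula hoops are fun."
txt2 = "Hula loops are fun."
a, b = re.search(r"hoops", txt).span()

print(slice_with_regex(txt2, a, b))  # loops
print(txt2[a:b])                     # loops
```

Slicing remains the simpler choice; the regex version is mainly useful to see how the pattern is assembled.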
import re
txt = "Hula hoops are fun."
x = re.search(r"hoops", txt)
c = x.span()
a = c[0]
b = c[1]
print(a) # prints 5
print(b) # prints 10
txt2 = "Hula loops are fun."
txt3 = list(txt2)
xy = txt3[a:b]
z = ""
for item in xy:
    z = z + item
print(z)
I am searching for a short and cool rot13 function in Python ;-)
I've written this function:
def rot13(s):
    chars = "abcdefghijklmnopqrstuvwxyz"
    trans = chars[13:]+chars[:13]
    rot_char = lambda c: trans[chars.find(c)] if chars.find(c)>-1 else c
    return ''.join( rot_char(c) for c in s )
Can anyone make it better? E.g supporting uppercase characters.
It's very simple:
>>> import codecs
>>> codecs.encode('foobar', 'rot_13')
'sbbone'
maketrans()/translate() solutions…
Python 2.x
import string
rot13 = string.maketrans(
"ABCDEFGHIJKLMabcdefghijklmNOPQRSTUVWXYZnopqrstuvwxyz",
"NOPQRSTUVWXYZnopqrstuvwxyzABCDEFGHIJKLMabcdefghijklm")
string.translate("Hello World!", rot13)
# 'Uryyb Jbeyq!'
Python 3.x
rot13 = str.maketrans(
'ABCDEFGHIJKLMabcdefghijklmNOPQRSTUVWXYZnopqrstuvwxyz',
'NOPQRSTUVWXYZnopqrstuvwxyzABCDEFGHIJKLMabcdefghijklm')
'Hello World!'.translate(rot13)
# 'Uryyb Jbeyq!'
This works on Python 2 (but not Python 3):
>>> 'foobar'.encode('rot13')
'sbbone'
The maketrans and translate methods of str are handy for this type of thing.
Here's a general solution:
import string
def make_rot_n(n):
    lc = string.ascii_lowercase
    uc = string.ascii_uppercase
    trans = str.maketrans(lc + uc,
                          lc[n:] + lc[:n] + uc[n:] + uc[:n])
    return lambda s: str.translate(s, trans)
rot13 = make_rot_n(13)
rot13('foobar')
# 'sbbone'
From the builtin module this.py (import this):
s = "foobar"
d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)
print("".join([d.get(c, c) for c in s])) # sbbone
As of Python 3.1, string.translate and string.maketrans no longer exist. However, these methods can be used with bytes instead.
Thus, an up-to-date solution directly inspired from Paul Rubel's one, is:
rot13 = bytes.maketrans(
b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
b"nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM")
b'Hello world!'.translate(rot13)
Conversion from string to bytes and vice versa can be done with the str.encode and bytes.decode methods.
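As a rough sketch of that round trip (Python 3), the bytes table can be wrapped so that it accepts and returns str. The wrapper name rot13_via_bytes and the default UTF-8 encoding are assumptions made for illustration:

```python
# Translation table built with bytes.maketrans (lowercase and uppercase letters).
ROT13_TABLE = bytes.maketrans(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
    b"nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM")

def rot13_via_bytes(text, encoding="utf-8"):
    """Encode str -> bytes, translate the bytes, then decode bytes -> str."""
    return text.encode(encoding).translate(ROT13_TABLE).decode(encoding)

print(rot13_via_bytes("Hello world!"))  # Uryyb jbeyq!
```

Only the ASCII letters are remapped, so other bytes pass through unchanged.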
Try this:
import codecs
codecs.encode("text to be rot13()'ed", "rot_13")
In Python 3, the str codec that @amber mentioned has moved to the codecs standard-library module:
> import codecs
> codecs.encode('foo', 'rot13')
sbb
The following function rot(s, n) encodes a string s with ROT-n encoding for any integer n, with n defaulting to 13. Both upper- and lowercase letters are supported. Values of n over 26 or negative values are handled appropriately, e.g., shifting by 27 positions is equal to shifting by one position. Decoding is done with invrot(s, n).
import string
def rot(s, n=13):
    '''Encode string s with ROT-n, i.e., by shifting all letters n positions.
    When n is not supplied, ROT-13 encoding is assumed.
    '''
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    upper_start = ord(upper[0])
    lower_start = ord(lower[0])
    out = ''
    for letter in s:
        if letter in upper:
            out += chr(upper_start + (ord(letter) - upper_start + n) % 26)
        elif letter in lower:
            out += chr(lower_start + (ord(letter) - lower_start + n) % 26)
        else:
            out += letter
    return(out)
def invrot(s, n=13):
    '''Decode a string s encoded with ROT-n-encoding
    When n is not supplied, ROT-13 is assumed.
    '''
    return(rot(s, -n))
A one-liner to rot13 a string S:
S.translate({a : a + (lambda x: 1 if x>=0 else -1)(77 - a) * 13 for a in range(65, 91)})
For arbitrary values, something like this works for 2.x
from string import ascii_uppercase as uc, ascii_lowercase as lc, maketrans
rotate = 13 # ROT13
rot = "".join([(x[:rotate][::-1] + x[rotate:][::-1])[::-1] for x in (uc,lc)])
def rot_func(text, encode=True):
    ascii = uc + lc
    src, trg = (ascii, rot) if encode else (rot, ascii)
    trans = maketrans(src, trg)
    return text.translate(trans)
text = "Text to ROT{}".format(rotate)
encode = rot_func(text)
decode = rot_func(encode, False)
This works for uppercase and lowercase. I don't know how elegant you deem it to be.
def rot13(s):
    rot=lambda x:chr(ord(x)+13) if chr(ord(x.lower())+13).isalpha()==True else chr(ord(x)-13)
    s=[rot(i) for i in filter(lambda x:x!=',',map(str,s))]
    return ''.join(s)
You can support uppercase letters on the original code posted by Mr. Walter by alternating the upper case and lower case letters.
chars = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"
Notice that the indexes of the uppercase letters are all even numbers, while the indexes of the lowercase letters are odd.
A = 0, a = 1,
B = 2, b = 3,
C = 4, c = 5,
...
This odd-even pattern allows us to safely add the amount needed without having to worry about the case.
trans = chars[26:] + chars[:26]
The reason you add 26 is because the string has doubled in letters due to the upper case letters. However, the shift is still 13 spaces on the alphabet.
The full code:
def rot13(s):
    chars = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"
    trans = chars[26:]+chars[:26]
    rot_char = lambda c: trans[chars.find(c)] if chars.find(c) > -1 else c
    return ''.join(rot_char(c) for c in s)
OUTPUT (Tested with python 2.7):
print rot13("Hello World!") --> Uryyb Jbeyq!
Interesting exercise ;-) I think I have the best solution because:
no modules needed, uses only built-in functions --> no deprecation
it can be used as a one liner
based on ascii, no mapping dicts/strings etc.
Python 2 & 3 (probably Python 1):
def rot13(s):
    return ''.join([chr(ord(n) + (13 if 'Z' < n < 'n' or n < 'N' else -13)) if n.isalpha() else n for n in s])
def rot13_verbose(s):
    x = []
    for n in s:
        if n.isalpha():
            # 'n' is the 14th character in the alphabet so if a character is bigger we can subtract 13 to get rot13
            ort = 13 if 'Z' < n < 'n' or n < 'N' else -13
            x.append(chr(ord(n) + ort))
        else:
            x.append(n)
    return ''.join(x)
# crazy .min version (99 characters) disclaimer: not pep8 compatible^
def r(s):return''.join([chr(ord(n)+(13if'Z'<n<'n'or'N'>n else-13))if n.isalpha()else n for n in s])
def rot13(s):
    lower_chars = ''.join(chr(c) for c in range (97,123)) #ASCII a-z
    upper_chars = ''.join(chr(c) for c in range (65,91)) #ASCII A-Z
    lower_encode = lower_chars[13:] + lower_chars[:13] #shift 13 bytes
    upper_encode = upper_chars[13:] + upper_chars[:13] #shift 13 bytes
    output = "" #outputstring
    for c in s:
        if c in lower_chars:
            output = output + lower_encode[lower_chars.find(c)]
        elif c in upper_chars:
            output = output + upper_encode[upper_chars.find(c)]
        else:
            output = output + c
    return output
Another solution with shifting. Maybe this code helps other people to understand rot13 better.
Haven't tested it completely.
from string import maketrans, lowercase, uppercase
def rot13(message):
    lower = maketrans(lowercase, lowercase[13:] + lowercase[:13])
    upper = maketrans(uppercase, uppercase[13:] + uppercase[:13])
    return message.translate(lower).translate(upper)
I found this post when I started wondering about the easiest way to implement
rot13 into Python myself. My goals were:
Works in both Python 2.7.6 and 3.3.
Handle both upper and lower case.
Not use any external libraries.
This meets all three of those requirements. That being said, I'm sure it's not winning any code golf competitions.
def rot13(string):
    CLEAR = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    ROT13 = 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm'
    TABLE = {x: y for x, y in zip(CLEAR, ROT13)}
    return ''.join(map(lambda x: TABLE.get(x, x), string))
if __name__ == '__main__':
    CLEAR = 'Hello, World!'
    R13 = 'Uryyb, Jbeyq!'
    r13 = rot13(CLEAR)
    assert r13 == R13
    clear = rot13(r13)
    assert clear == CLEAR
This works by creating a lookup table and simply returning the original character for any character not found in the lookup table.
Update
I got to worrying about someone wanting to use this to encrypt an arbitrarily-large file (say, a few gigabytes of text). I don't know why they'd want to do this, but what if they did? So I rewrote it as a generator. Again, this has been tested in both Python 2.7.6 and 3.3.
def rot13(clear):
    CLEAR = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    ROT13 = 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm'
    TABLE = {x: y for x, y in zip(CLEAR, ROT13)}
    for c in clear:
        yield TABLE.get(c, c)
if __name__ == '__main__':
    CLEAR = 'Hello, World!'
    R13 = 'Uryyb, Jbeyq!'
    r13 = ''.join(rot13(CLEAR))
    assert r13 == R13
    clear = ''.join(rot13(r13))
    assert clear == CLEAR
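For the large-file case, one possible sketch is to translate the text chunk by chunk rather than character by character. This swaps in codecs.encode with the rot_13 codec in place of the lookup table above; the function name rot13_stream and the chunk_size parameter are assumptions for illustration (Python 3 only):

```python
import codecs
import io

def rot13_stream(reader, writer, chunk_size=64 * 1024):
    """Read text from `reader` in chunks, ROT13 each chunk, and write it to `writer`."""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # an empty string means end of input
            break
        writer.write(codecs.encode(chunk, "rot_13"))

# In-memory streams stand in for real text files here.
source = io.StringIO("Hello, World!")
target = io.StringIO()
rot13_stream(source, target, chunk_size=4)
print(target.getvalue())  # Uryyb, Jbeyq!
```

Because ROT13 maps each character independently, splitting the input at arbitrary chunk boundaries does not change the result.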
I couldn't leave this question here without a single statement using the modulo operator.
def rot13(s):
    return ''.join([chr(x.islower() and ((ord(x) - 84) % 26) + 97
                        or x.isupper() and ((ord(x) - 52) % 26) + 65
                        or ord(x))
                    for x in s])
This is not pythonic nor good practice, but it works!
>> rot13("Hello World!")
Uryyb Jbeyq!
You can also use this:
def n3bu1A(n):
    o=""
    key = {
        'a':'n', 'b':'o', 'c':'p', 'd':'q', 'e':'r', 'f':'s', 'g':'t', 'h':'u',
        'i':'v', 'j':'w', 'k':'x', 'l':'y', 'm':'z', 'n':'a', 'o':'b', 'p':'c',
        'q':'d', 'r':'e', 's':'f', 't':'g', 'u':'h', 'v':'i', 'w':'j', 'x':'k',
        'y':'l', 'z':'m', 'A':'N', 'B':'O', 'C':'P', 'D':'Q', 'E':'R', 'F':'S',
        'G':'T', 'H':'U', 'I':'V', 'J':'W', 'K':'X', 'L':'Y', 'M':'Z', 'N':'A',
        'O':'B', 'P':'C', 'Q':'D', 'R':'E', 'S':'F', 'T':'G', 'U':'H', 'V':'I',
        'W':'J', 'X':'K', 'Y':'L', 'Z':'M'}
    for x in n:
        v = x in key.keys()
        if v == True:
            o += (key[x])
        else:
            o += x
    return o
Yes = n3bu1A("N zhpu fvzcyre jnl gb fnl Guvf vf zl Zragbe!!")
print(Yes)
Short solution:
def rot13(text):
    return "".join([x if ord(x) not in range(65, 91)+range(97, 123) else
                    chr(((ord(x)-97+13)%26)+97) if x.islower() else
                    chr(((ord(x)-65+13)%26)+65) for x in text])
I want to swap each pair of characters in a string. '2143' becomes '1234', 'badcfe' becomes 'abcdef'.
How can I do this in Python?
oneliner:
>>> s = 'badcfe'
>>> ''.join([ s[x:x+2][::-1] for x in range(0, len(s), 2) ])
'abcdef'
s[x:x+2] returns string slice from x to x+2; it is safe for odd len(s).
[::-1] reverses the string in Python
range(0, len(s), 2) returns 0, 2, 4, 6 ... while x < len(s)
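The three steps above can be spelled out one at a time. This is only an illustrative sketch using an odd-length input to show why the slice is safe; the intermediate variable names are not from the original answer:

```python
s = "badcf"  # odd length on purpose

# Step 1: cut the string into slices of (at most) two characters.
chunks = [s[x:x + 2] for x in range(0, len(s), 2)]
print(chunks)   # ['ba', 'dc', 'f']

# Step 2: reverse each slice; a one-character slice is unchanged.
swapped = [chunk[::-1] for chunk in chunks]
print(swapped)  # ['ab', 'cd', 'f']

# Step 3: glue the pieces back together.
result = "".join(swapped)
print(result)   # abcdf
```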
The usual way to swap two items in Python is:
a, b = b, a
So it would seem to me that you would just do the same with an extended slice. However, it is slightly complicated because strings aren't mutable; so you have to convert to a list and then back to a string.
Therefore, I would do the following:
>>> s = 'badcfe'
>>> t = list(s)
>>> t[::2], t[1::2] = t[1::2], t[::2]
>>> ''.join(t)
'abcdef'
Here's one way...
>>> s = '2134'
>>> def swap(c, i, j):
...     c = list(c)
...     c[i], c[j] = c[j], c[i]
...     return ''.join(c)
...
>>> swap(s, 0, 1)
'1234'
>>>
''.join(s[i+1]+s[i] for i in range(0, len(s), 2)) # 10.6 usec per loop
or
''.join(x+y for x, y in zip(s[1::2], s[::2])) # 10.3 usec per loop
or if the string can have an odd length:
''.join(x+y for x, y in itertools.izip_longest(s[1::2], s[::2], fillvalue=''))
Note that this won't work with old versions of Python (older than 2.5, if I'm not mistaken).
The benchmark was run on python-2.7-8.fc14.1.x86_64 and a Core 2 Duo 6400 CPU with s='0123456789'*4.
If performance or elegance is not an issue, and you just want clarity and have the job done then simply use this:
def swap(text, ch1, ch2):
    text = text.replace(ch2, '!',)
    text = text.replace(ch1, ch2)
    text = text.replace('!', ch1)
    return text
This allows you to swap or simply replace chars or substring.
For example, to swap 'ab' <-> 'de' in a text:
_str = "abcdefabcdefabcdef"
print swap(_str, 'ab','de') #decabfdecabfdecabf
Loop over length of string by twos and swap:
def oddswap(st):
    s = list(st)
    for c in range(0,len(s),2):
        t=s[c]
        s[c]=s[c+1]
        s[c+1]=t
    return "".join(s)
giving:
>>> s
'foobar'
>>> oddswap(s)
'ofbora'
and fails on odd-length strings with an IndexError exception.
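One way to avoid that IndexError, sketched here under the assumption that a trailing unpaired character should simply stay where it is, is to stop the loop one position early. The name oddswap_safe is hypothetical:

```python
def oddswap_safe(st):
    """Swap adjacent pairs; a trailing unpaired character stays in place."""
    s = list(st)
    # Stop one short of the end so that s[c + 1] always exists.
    for c in range(0, len(s) - 1, 2):
        s[c], s[c + 1] = s[c + 1], s[c]
    return "".join(s)

print(oddswap_safe("foobar"))  # ofbora
print(oddswap_safe("abc"))     # bac
```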
There is no need to make a list. The following works for even-length strings:
r = ''
for i in range(0, len(s), 2):
    r += s[i + 1] + s[i]
s = r
A more general answer... you can do any single pairwise swap with tuples or strings using this approach:
# item can be a string or tuple and swap can be a list or tuple of two
# indices to swap
def swap_items_by_copy(item, swap):
    s0 = min(swap)
    s1 = max(swap)
    if isinstance(item,str):
        return item[:s0]+item[s1]+item[s0+1:s1]+item[s0]+item[s1+1:]
    elif isinstance(item,tuple):
        return item[:s0]+(item[s1],)+item[s0+1:s1]+(item[s0],)+item[s1+1:]
    else:
        raise ValueError("Type not supported")
Then you can invoke it like this:
>>> swap_items_by_copy((1,2,3,4,5,6),(1,2))
(1, 3, 2, 4, 5, 6)
>>> swap_items_by_copy("hello",(1,2))
'hlelo'
>>>
Thankfully python gives empty strings or tuples for the cases where the indices refer to non existent slices.
To swap the characters at positions l and r (with l < r) of a string a:
def swap(a, l, r):
    a = a[0:l] + a[r] + a[l+1:r] + a[l] + a[r+1:]
    return a
Example:
swap("aaabcccdeee", 3, 7) returns "aaadcccbeee"
Do you want the digits sorted? Or are you swapping odd/even indexed digits? Your example is totally unclear.
Sort:
s = '2143'
p=list(s)
p.sort()
s = "".join(p)
s is now '1234'. The trick is here that list(string) breaks it into characters.
Like so:
>>> s = "2143658709"
>>> ''.join([s[i+1] + s[i] for i in range(0, len(s), 2)])
'1234567890'
>>> s = "badcfe"
>>> ''.join([s[i+1] + s[i] for i in range(0, len(s), 2)])
'abcdef'
re.sub(r'(.)(.)',r"\2\1",'abcdef1234')
However re is a bit slow.
def swap(s):
    i=iter(s)
    for a in i:
        b=next(i, '')  # '' when the string has odd length
        yield b
        yield a
''.join(swap("abcdef1234"))
One more way:
>>> s='123456'
>>> ''.join([''.join(el) for el in zip(s[1::2], s[0::2])])
'214365'
>>> import ctypes
>>> s = 'abcdef'
>>> mutable = ctypes.create_string_buffer(s)
>>> for i in range(0,len(s),2):
...     mutable[i], mutable[i+1] = mutable[i+1], mutable[i]
...
>>> s = mutable.value
>>> print s
badcfe
def revstr(a):
    b=''
    if len(a)%2==0:
        for i in range(0,len(a),2):
            b += a[i + 1] + a[i]
        a=b
    else:
        c=a[-1]
        for i in range(0,len(a)-1,2):
            b += a[i + 1] + a[i]
        b=b+a[-1]
        a=b
    return b
a=raw_input('enter a string')
n=revstr(a)
print n
A bit late to the party, but there is actually a pretty simple way to do this:
The index sequence you are looking for can be expressed as the sum of two sequences:
0 1 2 3 ...
+1 -1 +1 -1 ...
Both are easy to express. The first one is just range(N). A sequence that toggles for each i in that range is i % 2. You can adjust the toggle by scaling and offsetting it:
i % 2 -> 0 1 0 1 ...
1 - i % 2 -> 1 0 1 0 ...
2 * (1 - i % 2) -> 2 0 2 0 ...
2 * (1 - i % 2) - 1 -> +1 -1 +1 -1 ...
The entire expression simplifies to i + 1 - 2 * (i % 2), which you can use to join the string almost directly:
result = ''.join(string[i + 1 - 2 * (i % 2)] for i in range(len(string)))
This will work only for an even-length string, so you can check for overruns using min:
N = len(string)
result = ''.join(string[min(i + 1 - 2 * (i % 2), N - 1)] for i in range(N))
Basically a one-liner, doesn't require any iterators beyond a range over the indices, and some very simple integer math.
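To make the index arithmetic concrete, here is a small sketch that exposes the partner index on its own before using it to build the result. The helper names partner_index and swap_pairs are hypothetical:

```python
def partner_index(i, n):
    """Index of the character that ends up at position i (clamped for odd n)."""
    return min(i + 1 - 2 * (i % 2), n - 1)

def swap_pairs(string):
    n = len(string)
    return "".join(string[partner_index(i, n)] for i in range(n))

print([partner_index(i, 6) for i in range(6)])  # [1, 0, 3, 2, 5, 4]
print(swap_pairs("badcfe"))   # abcdef
print(swap_pairs("2143657"))  # 1234567
```

For an odd-length string the last index would point one past the end, so min clamps it back to the final character, which therefore stays in place.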
While the above solutions work, there is a simpler solution in layman's terms. Someone still learning Python and strings can use the other answers without really understanding how they work or what each part of the code does, since most are posted as "this works" without a full explanation. The following swaps every pair of characters in a string and is easy for beginners to follow.
It simply iterates through the string in steps of two (starting from index 0) and builds a new string (swapped_pair) by adding the character at the current index + 1 (the second character of the pair) followed by the character at the current index (the first character). In other words, index 1 is put at index 0, index 0 is put at index 1, and this repeats for every pair in the string.
Code is also included to handle odd-length strings, because the loop on its own only works for even lengths.
DrSanjay Bhakkad post above is also a good one that works for even or odd strings and is basically doing the same function as below.
string = "abcdefghijklmnopqrstuvwxyz123"
# option 1: use this before the loop if the string needs to be even but is possibly odd
# (drops the last character)
if len(string) % 2 != 0:
    string = string[:-1]
# iteration to swap every pair of characters in string
swapped_pair = ""
for i in range(0, len(string) - 1, 2):
    swapped_pair += (string[i + 1] + string[i])
# option 2: use this after the loop (instead of option 1) for any even or odd length of string
# (keeps the trailing unpaired character)
if len(string) % 2 != 0:
    swapped_pair += string[-1]
print(swapped_pair)
badcfehgjilknmporqtsvuxwzy21 # output if the "needs to be even" code used
badcfehgjilknmporqtsvuxwzy213 # output if the "even or odd" code used
One of the easiest ways to swap the first two characters of a string is:
inputString = '2134'
extractChar = inputString[0:2]
swapExtractedChar = extractChar[::-1]  # reverse the order of the two characters
swapFirstTwoChar = swapExtractedChar + inputString[2:]
# swapFirstTwoChar = inputString[0:2][::-1] + inputString[2:]  # the same thing in one line
print(swapFirstTwoChar)
#Works on even/odd size strings
str = '2143657'
newStr = ''
for i in range(len(str)//2):
    newStr += str[i*2+1] + str[i*2]
if len(str)%2 != 0:
    newStr += str[-1]
print(newStr)
#Think about how index works with string in Python,
>>> a = "123456"
>>> a[::-1]
'654321'
I'm trying to slowly knock out all of the intricacies of python. Basically, I'm looking for some way, in python, to take a string of characters and push them all over by 'x' characters.
For example, inputting abcdefg will give me cdefghi (if x is 2).
My first version:
>>> key = 2
>>> msg = "abcdefg"
>>> ''.join( map(lambda c: chr(ord('a') + (ord(c) - ord('a') + key)%26), msg) )
'cdefghi'
>>> msg = "uvwxyz"
>>> ''.join( map(lambda c: chr(ord('a') + (ord(c) - ord('a') + key)%26), msg) )
'wxyzab'
(Of course it works as expected only if msg is lowercase...)
edit: I definitely second David Raznick's answer:
>>> import string
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> key = 2
>>> tr = string.maketrans(alphabet, alphabet[key:] + alphabet[:key])
>>> "abcdefg".translate(tr)
'cdefghi'
I think your best bet is to look at string.translate. You may have to use string.maketrans to make the mapping you like.
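A minimal sketch of that translate-based mapping for Python 3, where str.maketrans builds the table. The function name shift is an assumption for illustration, and only ASCII letters are handled:

```python
import string

def shift(text, key):
    """Caesar-shift ASCII letters by `key` positions; other characters are unchanged."""
    key %= 26  # also copes with negative keys and keys above 26
    lc, uc = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lc + uc, lc[key:] + lc[:key] + uc[key:] + uc[:key])
    return text.translate(table)

print(shift("abcdefg", 2))  # cdefghi
print(shift("uvwxyz", 2))   # wxyzab
```

Shifting by the negated key reverses the operation, for example shift(shift(msg, 2), -2) gives back msg.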
I would do it this way (for conceptual simplicity):
def encode(s):
    l = [ord(i) for i in s]
    return ''.join([chr(i + 2) for i in l])
Point being that you convert the letter to ASCII, add 2 to that code, convert it back, and "cast" it into a string (create a new string object). This also makes no conversions based on "case" (upper vs. lower).
Potential optimizations/research areas:
Use of StringIO module for large strings
Apply this to Unicode (not sure how)
This solution works for both lowercase and uppercase:
from string import lowercase, uppercase
def caesar(text, key):
    result = []
    for c in text:
        if c in lowercase:
            idx = lowercase.index(c)
            idx = (idx + key) % 26
            result.append(lowercase[idx])
        elif c in uppercase:
            idx = uppercase.index(c)
            idx = (idx + key) % 26
            result.append(uppercase[idx])
        else:
            result.append(c)
    return "".join(result)
Here is a test:
>>> caesar("abcdefg", 2)
'cdefghi'
>>> caesar("z", 1)
'a'
Another version. Allows for definition of your own alphabet, and doesn't translate any other characters (such as punctuation). The ugly part here is the loop, which might cause performance problems. I'm not sure about python but appending strings like this is a big no in other languages like Java and C#.
def rotate(data, n):
    alphabet = list("abcdefghijklmnopqrstuvwxyz")
    n = n % len(alphabet)
    target = alphabet[n:] + alphabet[:n]
    translation = dict(zip(alphabet, target))
    result = ""
    for c in data:
        if c in translation:
            result += translation[c]
        else:
            result += c
    return result
print rotate("foobar", 1)
print rotate("foobar", 2)
print rotate("foobar", -1)
print rotate("foobar", -2)
Result:
gppcbs
hqqdct
ennazq
dmmzyp
The maketrans() solution suggested by others is the way to go here.
This is a generalization of the "string contains substring" problem to (more) arbitrary types.
Given a sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:
Example usage (Sequence in Sequence):
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1 # or None, or whatever
So far, I just rely on brute force and it seems slow, ugly, and clumsy.
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s
(nothing)
Here's a brute-force approach, O(n*m) (similar to @mcella's answer). For small input sequences it might be faster than the pure-Python implementation of the Knuth-Morris-Pratt algorithm, O(n+m) (see @Gregg Lind's answer).
#!/usr/bin/env python
def index(subseq, seq):
    """Return an index of `subseq`uence in the `seq`uence.
    Or `-1` if `subseq` is not a subsequence of the `seq`.
    The time complexity of the algorithm is O(n*m), where
        n, m = len(seq), len(subseq)
    >>> index([1,2], range(5))
    1
    >>> index(range(1, 6), range(5))
    -1
    >>> index(range(5), range(5))
    0
    >>> index([1,2], [0, 1, 0, 1, 2])
    3
    """
    i, n, m = -1, len(seq), len(subseq)
    try:
        while True:
            i = seq.index(subseq[0], i + 1, n - m + 1)
            if subseq == seq[i:i + m]:
                return i
    except ValueError:
        return -1
if __name__ == '__main__':
    import doctest; doctest.testmod()
I wonder how large is the small in this case?
A simple approach: Convert to strings and rely on string matching.
Example using lists of strings:
>>> f = ["foo", "bar", "baz"]
>>> g = ["foo", "bar"]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True
Example using tuples of strings:
>>> x = ("foo", "bar", "baz")
>>> y = ("bar", "baz")
>>> xx = str(x).strip("()")
>>> yy = str(y).strip("()")
>>> yy in xx
True
Example using lists of numbers:
>>> f = [1 , 2, 3, 4, 5, 6, 7]
>>> g = [4, 5, 6]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True
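One caveat with the string-conversion approach: substring matching on the printed form can report a match that is not an element-wise match. A small sketch of that failure mode, alongside an element-wise check for comparison (both helper names are hypothetical):

```python
def contains_via_str(sub, seq):
    """String-based containment test, as in the examples above."""
    return str(list(sub)).strip("[]") in str(list(seq)).strip("[]")

def contains_elementwise(sub, seq):
    """Compare every window of len(sub) items against sub."""
    sub, seq = list(sub), list(seq)
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

# "1, 2" is a substring of "11, 2", so the string test gives a false positive.
print(contains_via_str([1, 2], [11, 2]))      # True
print(contains_elementwise([1, 2], [11, 2]))  # False
print(contains_elementwise([4, 5, 6], [1, 2, 3, 4, 5, 6, 7]))  # True
```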
Same thing as string matching sir...Knuth-Morris-Pratt string matching
>>> def seq_in_seq(subseq, seq):
...     offset = 0  # number of items already trimmed from the front of seq
...     while subseq[0] in seq:
...         index = seq.index(subseq[0])
...         if subseq == seq[index:index + len(subseq)]:
...             return offset + index
...         else:
...             offset += index + 1
...             seq = seq[index + 1:]
...     else:
...         return -1
...
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1
Sorry I'm not an algorithm expert, it's just the fastest thing my mind can think about at the moment, at least I think it looks nice (to me) and I had fun coding it. ;-)
Most probably it's the same thing your brute force approach is doing.
Brute force may be fine for small patterns.
For larger ones, look at the Aho-Corasick algorithm.
Here is another KMP implementation:
from itertools import tee
def seq_in_seq(seq1,seq2):
    '''
    Return the index where seq1 appears in seq2, or -1 if
    seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm
    based heavily on code by Neale Pickett <neale@woozle.org>
    found at: woozle.org/~neale/src/python/kmp.py
    >>> seq_in_seq(range(3),range(5))
    0
    >>> seq_in_seq(range(3)[-1:],range(5))
    2
    >>> seq_in_seq(range(6),range(5))
    -1
    '''
    def compute_prefix_function(p):
        m = len(p)
        pi = [0] * m
        k = 0
        for q in range(1, m):
            while k > 0 and p[k] != p[q]:
                k = pi[k - 1]
            if p[k] == p[q]:
                k = k + 1
            pi[q] = k
        return pi
    t,p = list(tee(seq2)[0]), list(tee(seq1)[0])
    m,n = len(p),len(t)
    pi = compute_prefix_function(p)
    q = 0
    for i in range(n):
        while q > 0 and p[q] != t[i]:
            q = pi[q - 1]
        if p[q] == t[i]:
            q = q + 1
        if q == m:
            return i - m + 1
    return -1
I'm a bit late to the party, but here's something simple using strings:
>>> def seq_in_seq(sub, full):
...     f = ''.join([repr(d) for d in full]).replace("'", "")
...     s = ''.join([repr(d) for d in sub]).replace("'", "")
...     #return f.find(s) #<-- not reliable for finding indices in all cases
...     return s in f
...
>>> seq_in_seq([5,6], [4,'a',3,5,6])
True
>>> seq_in_seq([5,7], [4,'a',3,5,6])
False
>>> seq_in_seq([4,'abc',33], [4,'abc',33,5,6])
True
As noted by Ilya V. Schurov, the find method in this case will not return the correct indices with multi-character strings or multi-digit numbers.
For what it's worth, I tried using a deque like so:
from collections import deque
from itertools import islice
def seq_in_seq(needle, haystack):
    """Generator of indices where needle is found in haystack."""
    needle = deque(needle)
    haystack = iter(haystack) # Works with iterators/streams!
    length = len(needle)
    # Deque will automatically call deque.popleft() after deque.append()
    # with the `maxlen` set equal to the needle length.
    window = deque(islice(haystack, length), maxlen=length)
    if needle == window:
        yield 0 # Match at the start of the haystack.
    for index, value in enumerate(haystack, start=1):
        window.append(value)
        if needle == window:
            yield index
One advantage of the deque implementation is that it makes only a single linear pass over the haystack. So if the haystack is streaming then it will still work (unlike the solutions that rely on slicing).
The solution is still brute-force, O(n*m). Some simple local benchmarking showed it was ~100x slower than the C-implementation of string searching in str.index.
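Another approach, using sets (note that this only checks that every element is present; it ignores order and contiguity and does not return an index):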
set([5,6])== set([5,6])&set([4,'a',3,5,6])
True
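A short sketch of what the set-based test does and does not establish, compared with a plain sliding-window search that respects order. The helper names set_test and window_index are hypothetical:

```python
def set_test(sub, seq):
    """The set-based check from above: are all elements of sub present in seq?"""
    return set(sub) == set(sub) & set(seq)

def window_index(sub, seq):
    """Index of the first contiguous occurrence of sub in seq, or -1."""
    sub, seq = list(sub), list(seq)
    m = len(sub)
    for i in range(len(seq) - m + 1):
        if seq[i:i + m] == sub:
            return i
    return -1

haystack = [4, 'a', 3, 5, 6]
print(set_test([6, 5], haystack))      # True, although [6, 5] never appears in that order
print(window_index([6, 5], haystack))  # -1
print(window_index([5, 6], haystack))  # 3
```

The set version also requires hashable elements, which the window comparison does not.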