How to make a simple pattern detector in python

Let's say I have a list: list=["r","s","r","s"]
I want to print the next element of the sequence. The expected output would, of course, be "r". Is there any way to do this in Python?
I tried a couple of programs I found online, but none of them helped.

Assuming that your pattern starts at the beginning of your array, here is a way to find the next element:
def repeat(pattern, length):
    return (length//len(pattern))*pattern + pattern[:length%len(pattern)]
def find_pattern(array):
    # we successively try longer and longer patterns, starting with length 1
    for len_attempt, _ in enumerate(array, 1):
        pattern = array[:len_attempt]
        if repeat(pattern, len(array)) == array:
            return repeat(pattern, len(array)+1)[-1]
Here is the output of this function for various patterns:
arr = ['r', 's', 'r', 's']
print(find_pattern(arr))
>>> r
arr = ['r', 's', 'w', 'r', 's']
print(find_pattern(arr))
>>> w
arr = ['r', 's', 'w', 'w', 's']
print(find_pattern(arr))
>>> r # considering a pattern of length 5
Explanation:
First of all, we define a repeat function which will be useful later. It repeats a pattern to a given length. For example, if we give ['r', 's'] as a pattern and a length of 5, it will return ['r', 's', 'r', 's', 'r'].
Then we try patterns of length 1, 2, 3, ... until repeating the pattern reproduces the original array. At that point we have found the shortest pattern that fits, and we return the next predicted element. In the worst case, the program will consider a pattern of length len(array), in which case it will just return the first element of the array.
You can easily tweak this program to give:
not only the next element of the array, but the nth one.
The length of the pattern
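Both tweaks can be sketched as follows. This is a minimal sketch rather than the code above: find_shortest_pattern and predict_nth are hypothetical names, and n is assumed to be a 0-based index into the endlessly repeated sequence.

```python
def find_shortest_pattern(array):
    """Return the shortest prefix that reproduces array when repeated."""
    for length in range(1, len(array) + 1):
        pattern = array[:length]
        # The prefix fits if every element equals the pattern element
        # at the same position modulo the pattern length.
        if all(array[i] == pattern[i % length] for i in range(len(array))):
            return pattern
    return []  # only reached for an empty array


def predict_nth(array, n):
    """Predict the element at 0-based index n of the repeated sequence."""
    pattern = find_shortest_pattern(array)
    if not pattern:
        raise ValueError("cannot predict from an empty array")
    return pattern[n % len(pattern)]


arr = ['r', 's', 'w', 'r', 's']
print(len(find_shortest_pattern(arr)))  # length of the pattern
print(predict_nth(arr, len(arr)))       # the next element
```

Calling predict_nth(arr, len(arr)) gives the same prediction as find_pattern(arr), because index len(arr) is the first position past the end of the array.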
If the pattern doesn't necessarily start at the beginning of your array, it shouldn't be too difficult to make this program work for that case too. (Hint: remove the first n elements of the array and find a pattern that ends with these elements.)
I hope this is what you are looking for!

Related

How do I check if a sequence of characters exists in a list?

I have a string in which some sequences of characters recur. I know that strings are immutable, so I turn the string into a list. However, I'm not sure how to iterate through the list, find each occurrence, and change the first letter of the occurrence.
message: DDMCAXQVEKGYBNDDMZUH
Occurrence is: DDM
list: ['D', 'D', 'M', 'C', 'A', 'X', 'Q', 'V', 'E', 'K', 'G', 'Y', 'B', 'N', 'D', 'D', 'M', 'Z', 'U', 'H']
What I have currently simply turns the message into the list. I've tried different approaches, but they were unsuccessful, which is why I didn't post them. I'm not really asking you to write the code, but at least to explain how to achieve this.

It's a lot easier to check if a string exists in another string since you can simply use the in operator:
if 'DDM' in message:
    # do something
But since your goal is to change the first letter of the occurrence, you can use the str.index method to obtain the index of the occurrence and then assemble a new string with slices of the current string and the new letter:
try:
    i = message.index('DDM')
    message = message[:i] + new_letter + message[i + 1:]
except ValueError:
    raise RuntimeError("Sequence 'DDM' not found in message.")

You can use re.sub():
import re
s = 'DDMCAXQVEKGYBNDDMZUH'
re.sub(r'DDM', '$DM', s)
# $DMCAXQVEKGYBN$DMZUH

A simple solution with a for-loop would be:
msg = 'DDMCAXQVEKGYBNDDMZUH'
occ = 'DDM'
for i in range(len(msg)):
    if msg[i:i+len(occ)] == occ:
        msg = msg[:i] + 'x' + msg[i+1:]
resulting in xDMCAXQVEKGYBNxDMZUH
This also works with overlapping substrings. For example:
msg = 'AAABAA'
occ = 'AA'
will give xxABxA

The simplest way would be to use the string replace() method.
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
message = "DDMCAXQVEKGYBNDDMZUH"
print(message.replace("DDM", "ABC", 1))
With a count of 1, replace() replaces only the first occurrence of DDM in the message string.
output: ABCCAXQVEKGYBNDDMZUH
If I read your question carefully, you want to find the first occurrence of DDM in your message and replace its first character. In that case use the following:
message = "DDMCAXQVEKGYBNDDMZUH"
print(message.replace("DDM", "ADM", 1))
output: ADMCAXQVEKGYBNDDMZUH
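Since the question starts from a list of characters, here is a hedged sketch that follows that route. mark_occurrences and its first_only flag are hypothetical names introduced for illustration. It searches the original message with str.find, so overlapping occurrences are marked as well.

```python
def mark_occurrences(message, occurrence, new_letter, first_only=False):
    """Replace the first letter of each occurrence with new_letter."""
    if not occurrence:
        return message  # nothing sensible to search for
    chars = list(message)  # strings are immutable, so edit a list copy
    start = message.find(occurrence)
    while start != -1:
        chars[start] = new_letter
        if first_only:
            break
        # Continue one position later so overlapping matches are found too.
        start = message.find(occurrence, start + 1)
    return "".join(chars)


print(mark_occurrences("DDMCAXQVEKGYBNDDMZUH", "DDM", "x"))
print(mark_occurrences("DDMCAXQVEKGYBNDDMZUH", "DDM", "x", first_only=True))
```

Joining the list back into a string at the end gives a result with the same type as the input.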

Complexity of reverse sentence algorithm

I was working on a data-structure problem in Python where I have to reverse the order of the words in the array in the most efficient manner. I came up with the following solution to the problem
def reverse(arr, st, end):
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        end -= 1
        st += 1
def reverse_arr(arr):
    arr = arr[::-1]
    st_index = 0
    length = len(arr)
    for i, val in enumerate(arr):
        if val == ' ':
            end_index = i-1
            reverse(arr, st_index, end_index)
            st_index = end_index + 2
        if i == length - 1:
            reverse(arr, st_index, length-1)
    return arr
If the arr is:
arr = [ 'p', 'e', 'r', 'f', 'e', 'c', 't', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'r', 'a', 'c', 't', 'i', 'c', 'e' ]
It returns:
['p', 'r', 'a', 'c', 't', 'i', 'c', 'e', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'e', 'r', 'f', 'e', 'c', 't']
The solution works fine, but I don't understand why the complexity of this algorithm is O(n). The explanation says that traversing the array twice with a constant number of actions for each item is linear, i.e. O(n), where n is the length of the array.
I think it should be more than O(n), because the length of each word is not fixed and the time to reverse each word depends on its length. Can someone explain this in a better way?

reverse will get called once for each word. During that call, it will do a constant amount of work per character.
You can either represent this in terms of the number of words and the average word length (i.e. O(wordCount*averageWordLength)), or in terms of the total number of characters in the array. If you do the latter, it's easy to see that you're still doing a constant amount of work per character (since both reverse and reverse_arr do a constant amount of work per character, and no two reverse calls will include the same character), leading to O(characterCount) complexity.
I would assume that "the length of the array" in the explanation refers to the number of characters rather than the number of words; otherwise they're assuming the word length has a fixed upper bound (in which case the complexity is indeed O(wordCount)).
TL;DR: n in O(n) is characterCount, not wordCount.
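To see this concretely, here is a hedged sketch that instruments the question's algorithm to count swaps. reverse_counting and reverse_words_counting are hypothetical names, not part of the original code, and the sketch reverses the final word after the loop instead of testing i == length - 1 inside it.

```python
def reverse_counting(arr, st, end):
    """Reverse arr[st..end] in place and return the number of swaps made."""
    swaps = 0
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        st += 1
        end -= 1
        swaps += 1
    return swaps


def reverse_words_counting(chars):
    """Reverse the word order; return (result, total swaps across all words)."""
    arr = chars[::-1]  # one pass over all characters
    total_swaps = 0
    start = 0
    for i, val in enumerate(arr):
        if val == ' ':
            total_swaps += reverse_counting(arr, start, i - 1)
            start = i + 1
    total_swaps += reverse_counting(arr, start, len(arr) - 1)  # last word
    return arr, total_swaps


result, swaps = reverse_words_counting(list("perfect makes practice"))
print(''.join(result), swaps, len(result) // 2)
```

Each word of length k costs k // 2 swaps and the words do not overlap, so the total number of swaps can never exceed half the number of characters, however the characters are divided into words.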

def reverse(arr, st, end):
    while st < end:
        arr[st], arr[end] = arr[end], arr[st]
        end -= 1
        st += 1
def reverse_Cha(arr):
    arr = arr[::-1]
    st_index = 0
    length = len(arr)
    for i, val in enumerate(arr):
        if val == ' ':
            end_index = i-1
            reverse(arr, st_index, end_index)
            st_index = end_index + 2
        if i == length - 1:
            reverse(arr, st_index, length-1)
    return arr
def reverse_Jon(arr):
    r = [ch for word in ' '.join(''.join(arr).split()[::-1]) for ch in word]
    return r
def reverse_Nua(arr):
    rev_arr = list(' '.join(''.join(arr).split()[::-1]))
    return rev_arr
Consider the 3 proposed solutions: yours as reverse_Cha, Jon Clements' as reverse_Jon, and mine as reverse_Nua.
Note that an operation is O(n) when it uses [::-1], when it examines each element of a list (of length n), etc.
reverse_Cha uses [::-1], then examines each element twice (once to read, once to swap), so its complexity depends on the total number of elements: O(3n+c), which we write as O(n) (the +c comes from the O(1) operations).
reverse_Jon uses [::-1], then examines each element twice (each character of each word), so its complexity depends on the total number of elements and the number of words: O(3n+m), which we write as O(n+m) (with m the number of words). Since m is at most n, this is still O(n).
reverse_Nua uses [::-1], then sticks to Python's built-in list functions, so its complexity still depends on the total number of elements (just O(n) directly this time).
In terms of performance (1e6 loops), we got reverse_Cha: 2.785867s; reverse_Jon: 4.11845s (due to the for loops); reverse_Nua: 1.185973s.

I assume this is a purely theoretical question, because in real world applications you would probably rather split your list into one-word sublists, then rejoin the sublists in reverse order - that requires more memory, but is much faster.
Having said that, I'd like to point out that the algorithm you've shown is, indeed, O(n): it depends on the total length of your words, not on the lengths of individual words. In other words, it will take the same time for 20 3-letter words, 6 10-letter words, 10 6-letter words… you always go through every letter only twice: once during the reversal of the whole array (the arr[::-1] slice in reverse_arr) and once during the reversal of the individual words (the calls to reverse).

Can this function return an int value by linking the elements of a list?

I am creating a function for a tiny word game. I got stuck when I tried to write the body. I was looking for information about whether a return statement in Python can do this; it seems that it is possible, but I didn't find anything clear about it. This is my current progress. Am I close, or should I try another method?
def num_words_on_board(board, words):
    """ (list of list of str, list of str) -> int
    Return how many words appear on board.
    >>> num_words_on_board([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], ['ANT', 'BOX', 'SOB', 'TO'])
    3
    """
    count = 0
    for word_list in board:
        if words in ''.join(word_list):
            count = count + 1
    return count

Your question is lacking in explanation, but I'll answer as best I understood it.
I think you are trying to do something like a word-search solver, mixed with a scramble puzzle?
Anyway, my recommendation is to write a separate function for each thing you need to solve. For example:
The way I see it, you need to know whether the letters in the board can make up each of the words in the words variable. That can be done by one function. If you don't care about the order of the letters, only that enough of them match, then we can do it like this.
def same(word, letters):
    temp = []
    for every_letter in word:
        if every_letter in letters and every_letter not in temp:
            temp.append(every_letter)
    return len(temp) >= len(word)
This function takes one word and a "bunch of letters" (for example a row from board ;) ) as parameters. It checks each letter of the word against the "bunch of letters" and, if it finds a match, adds the letter to a temp list. At the end of the iterations, if temp holds at least as many letters as the initial word, then it's safe to say that the word can be built.
*There is a problem with this function: if the original word has repeated letters, for example the word "butter", it will not work. But since this is not your case, we are good to continue.
For the second part we have to use that function for every word in words against every row in board, so we'll use another function for that:
def num_words_on_board(board, words):
    count = 0
    for word in words:
        for letters in board:
            if same(word, letters):
                count += 1
    print(count) # This is not needed, just here for testing.
    return count
And there we go. This function should return the count which is 3. I hope this helps you.
(if anyone wanna correct my code please feel free, it's not optimal by any means. Just thought it would be easy to understand like this, rather than the answer in the duplicate question mentioned by Stefan Pochmann)
:)

I had a previous function; can I use it to create this new one?
My previous function is this:
def board_contains_word(board, word):
    """ (list of list of str, str) -> bool
    Return True if and only if word appears in board.
    Precondition: board has at least one row and one column.
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'ANT')
    True
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'NNT')
    False
    >>> board_contains_word([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'NTT')
    True
    """
    for word_list in board:
        if word in ''.join(word_list):
            return True
    return False
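Reusing that function could look like the sketch below. One assumption is labeled loudly here: the row-only check above finds just 'ANT' and 'SOB' in the docstring example, so this sketch also reads columns top to bottom (which picks up 'TO') in order to reach the expected count of 3. The column check is a guess about the game's rules, not something the question states.

```python
def board_contains_word(board, word):
    """Return True if word appears in a row or (assumption) in a column."""
    rows = [''.join(row) for row in board]
    # ASSUMPTION: words may also read top to bottom; zip(*board) yields columns.
    columns = [''.join(col) for col in zip(*board)]
    return any(word in line for line in rows + columns)


def num_words_on_board(board, words):
    """Return how many of the given words appear on the board."""
    count = 0
    for word in words:  # test each word on its own, not the whole list
        if board_contains_word(board, word):
            count += 1
    return count


board = [['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']]
print(num_words_on_board(board, ['ANT', 'BOX', 'SOB', 'TO']))
```

The key difference from the attempt in the question is that the loop runs over words and passes one word at a time to the helper, instead of testing whether the whole words list is contained in a row.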

Longest repeating substring using for-loops and if-statements

I'm in an introductory level programming class that teaches python. I was introduced to a longest repeating substring problem for a project and I can't seem to crack it. I've looked on here for a solution, but I haven't learned suffix trees yet so I wouldn't be able to use them. So far, I've gotten here:
msg = "kalhfdlakdhfklajdf"  # (anything)
reps = []
for i in range(len(msg)):
    if msg[i] == msg[i + 1]:
        reps.append(msg[i])
What this does is scan my string, msg, and check whether the current character matches the next character in sequence. If the characters match, it appends msg[i] to the list "reps". My problem is that:
a) The loop I created always appends one less than the repetition count, and
b) my program always crashes because msg[i+1] goes out of bounds once it reaches the last position in the string.
In essence, I want my program to find repeats and append them to a list, so that the most-repeated character can be counted and returned to the user.

You need to use len(msg)-1 as your range, but your condition will still omit one character from each run. To get rid of that, you can add another condition that checks the preceding character too.
With your condition you'll have 8 h's in reps although there are 9 in msg:
>>> msg = "kalhfdlakdhhhhhhhhhfklajdf"
>>> reps = []
>>> for i in range(len(msg)-1):
...     if msg[i] == msg[i + 1]:
...         reps.append(msg[i])
...
>>> reps
['h', 'h', 'h', 'h', 'h', 'h', 'h', 'h']
And with another condition :
>>> reps=[]
>>> for i in range(len(msg)-1):
...     if msg[i] == msg[i + 1] or msg[i] == msg[i - 1]:
...         reps.append(msg[i])
...
>>> reps
['h', 'h', 'h', 'h', 'h', 'h', 'h', 'h', 'h']

For the groupby answer I alluded to on @Kasra's excellent response:
from itertools import groupby
msg = "kalhfdlakdhhhhhhhhhfklajdf"
maxcount = 0
for lett, group in groupby(msg):
    count = len(list(group))
    if count > maxcount:
        maxcountlett = lett
        maxcount = count
result = [maxcountlett] * maxcount
But note that this only works for substrings of length 1. msg = 'hahahaha' should give ['ha', 'ha', 'ha', 'ha'] by my understanding.

a) Think about what happens when it makes the first match.
For example, given abcdeeef, it sees that msg[4] matches msg[5] and appends msg[4] to reps. Then msg[5] matches msg[6] and it appends msg[5] to reps. However, msg[6] does not match msg[7], so it does not append msg[6]. You are one short.
To fix this you need to append one extra for each run of matches. A good way to do this is to check whether the character you're currently matching already exists in reps. If it does, append it once. If it does not, append it twice.
if msg[i] == msg[i+1]:
    if msg[i] in reps:
        reps.append(msg[i])
    else:
        reps.append(msg[i])
        reps.append(msg[i])
b) You need to ensure that you do not exceed your boundaries. This can be accomplished by taking 1 off of your range.
for i in range(len(msg)-1):
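For the single-character case the question describes, a different approach is to track the length of the current run instead of appending matches. The sketch below uses only a for loop and if statements; longest_run is a hypothetical name, and when two runs tie it is assumed that the first one should win.

```python
def longest_run(msg):
    """Return the longest run of one repeated character as a list."""
    if not msg:
        return []
    best_char, best_len = msg[0], 1
    cur_char, cur_len = msg[0], 1
    # Start at index 1 and look backwards, so no index goes out of bounds.
    for i in range(1, len(msg)):
        if msg[i] == cur_char:
            cur_len += 1
        else:
            cur_char, cur_len = msg[i], 1
        if cur_len > best_len:  # strict '>' keeps the first run on a tie
            best_char, best_len = cur_char, cur_len
    return [best_char] * best_len


print(longest_run("kalhfdlakdhhhhhhhhhfklajdf"))
```

Because the loop compares each character with the run that ends just before it, both problems from the question disappear: the full run length is counted, and msg[i + 1] is never read.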

Solving jumbled word puzzles with python?

I have an interesting programming puzzle for you:
You will be given two things:
A word containing a list of English words put together, e.g.:
word = "iamtiredareyou"
Possible subsets:
subsets = [
    'i', 'a', 'am', 'amt', 'm', 't', 'ti', 'tire', 'tired', 'i',
    'ire', 'r', 're', 'red', 'redare', 'e', 'd', 'da', 'dar', 'dare',
    'a', 'ar', 'are', 'r', 're', 'e', 'ey', 'y', 'yo', 'you', 'o', 'u'
]
Challenges:
Level-1: I need to programmatically find the members of subsets which, together and in order, make "iamtiredareyou", i.e. ['i', 'am', 'tired', 'are', 'you']
Level-2: The original string may contain some extra characters in sequence which are not present in subsets, e.g. "iamtired12aareyou". The subsets given are the same as above; the solution should automatically include the extra characters in the right place in the result array, i.e. ['i', 'am', 'tired', '12a', 'are', 'you']
How can I do this?

Generally, a recursive algorithm would do.
Start by checking all subsets against the start of the given word; if one matches, append it to the found values and recurse with the remaining part of the word and the current found values.
Or, if it's the end of the string, record the found values.
Something like this:
all=[]
def frec(word, values=[]):
    global all
    if word == "": # got result.
        all+=[values]
        return
    for s in subsets:
        if word.startswith(s):
            frec(word[len(s):], values+[s])
frec(word)
Note that there are lots of possible solutions, since subsets includes many one-character strings. You might want to find the shortest of the results. (13146 solutions... use "all.sort(key=len)" to get the shortest first.)
For level 2, you need another step: if no subset matches, add the next symbol to the values on its own (and recurse into that) until a match is found.
all=[]
def frec(word, values=[]):
    global all
    if word == "": # got result.
        all+=[values]
        return True
    match = False
    for s in subsets:
        if word.startswith(s):
            match = True
            frec(word[len(s):], values+[s])
    if not match:
        return frec(word[1:], values+[word[0]])
frec(word)
This does not try to combine non-subset values into one string, though.

I think you should do your own programming exercises...

For the Level 1 challenge you could do it recursively. Probably not the most efficient solution, but the easiest:
word = "iamtiredareyou"
subsets = ['i', 'a', 'am', 'amt', 'm', 't', 'ti', 'tire', 'tired', 'i', 'ire', 'r', 're', 'red', 'redare', 'e', 'd', 'da', 'dar', 'dare', 'a', 'ar', 'are', 'r', 're', 'e', 'ey', 'y', 'yo', 'you', 'o', 'u']
def findsubset():
    global word
    for subset in subsets:
        if word.startswith(subset):
            setlist.append(subset)
            word = word[len(subset):]
            if word == "":
                print(setlist)
            else:
                findsubset()
            word = subset + word
            setlist.pop()
# Remove duplicate entries by making a set
subsets = set(subsets)
setlist = []
findsubset()
Your list of subsets has duplicates in it - e.g. 'a' appears twice - so my code makes it a set to remove the duplicates before searching for results.

Sorry about the lack of programming snippet, but I'd like to suggest dynamic programming. Attack level 1 and level 2 at the same time by giving each word a cost, and adding all the single characters not present as single character high cost words. The problem is then to find the way of splitting the sequence up into words that gives the least total cost.
Work from left to right along the sequence, at each point working out and saving the least cost solution up to and including the current point, and the length of the word that ends that solution. To work out the answer for the next point in the sequence, consider all of the known words that are suffixes of the sequence. For each such word, work out the best total cost by adding the cost of that word to the (already worked out) cost of the best solution ending just before that word starts. Note the smallest total cost and the length of the word that produces it.
Once you have the best cost for the entire sequence, use the length of the last word in that sequence to work out what the last word is, and then step back that number of characters to inspect the answer worked out at that point and get the word just preceding the last word, and so on.
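The dynamic-programming idea above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: segment is a hypothetical name, and the costs (1 per known word, 10 per unknown character) are arbitrary illustrative values. Adjacent unknown characters are merged into one piece while backtracking.

```python
def segment(word, subsets, word_cost=1, unknown_cost=10):
    """Split word into pieces at the least total cost (illustrative costs)."""
    vocab = set(subsets)
    n = len(word)
    # best[i]: least cost of splitting word[:i].
    # last[i]: (length, is_known) of the final piece in that best split.
    best = [0] + [float("inf")] * n
    last = [(0, True)] * (n + 1)
    for end in range(1, n + 1):
        # Fallback: treat the single character word[end - 1] as unknown.
        best[end] = best[end - 1] + unknown_cost
        last[end] = (1, False)
        # Try every known word that ends exactly at position end.
        for start in range(end):
            if word[start:end] in vocab and best[start] + word_cost < best[end]:
                best[end] = best[start] + word_cost
                last[end] = (end - start, True)
    # Walk backwards through last to recover the pieces.
    pieces, end, prev_unknown = [], n, False
    while end > 0:
        length, is_known = last[end]
        piece = word[end - length:end]
        if not is_known and prev_unknown:
            pieces[-1] = piece + pieces[-1]  # merge adjacent unknown chars
        else:
            pieces.append(piece)
        prev_unknown = not is_known
        end -= length
    pieces.reverse()
    return pieces, best[n]


subsets = ['i', 'a', 'am', 'amt', 'm', 't', 'ti', 'tire', 'tired', 'ire',
           'r', 're', 'red', 'redare', 'e', 'd', 'da', 'dar', 'dare',
           'ar', 'are', 'ey', 'y', 'yo', 'you', 'o', 'u']
print(segment("iamtired12aareyou", subsets))
```

With these costs the level 2 input splits as ['i', 'am', 'tired', '12', 'a', 'are', 'you'] rather than the question's '12a', because 'a' is itself a listed word and is cheaper as a known piece. For the level 1 input several five-word splits tie on cost, so which one is returned depends on the tie-breaking in the inner loop.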

Isn't it just the same as finding the permutations, but with some conditions? You start a (recursive) permutation algorithm and check whether the string you have built so far matches the first X characters of the word you are looking for; if it does, you continue the recursion until you find the whole word, otherwise you backtrack.
Level 2 is a bit silly if you ask me, because then you could actually write anything as the "word to be found". Basically it would be just like level 1, except that if you can't find a substring in your list you simply add it letter by letter. For example, you have "love" and a list of ['l','e']: you match 'l' but you lack 'o', so you add it and check whether any of the words in the list start with 'v' and match your word to be found; they don't, so you add 'v' to 'o', and so on.
And if you're bored you can implement a genetic algorithm; it's really fun but not really efficient.

Here is a recursive, inefficient Java solution:
private static void findSolutions(Set<String> fragments, String target, HashSet<String> solution, Collection<Set<String>> solutions) {
    if (target.isEmpty()) {
        solutions.add(solution);
        return;
    }
    for (String frag : fragments) {
        if (target.startsWith(frag)) {
            HashSet<String> solution2 = new HashSet<String>(solution);
            solution2.add(frag);
            findSolutions(fragments, target.substring(frag.length()), solution2, solutions);
        }
    }
}
public static Collection<Set<String>> findSolutions(Set<String> fragments, String target) {
    HashSet<String> solution = new HashSet<String>();
    Collection<Set<String>> solutions = new ArrayList<Set<String>>();
    findSolutions(fragments, target, solution, solutions);
    return solutions;
}
