Generating permutations in lexicographic order in Python

I am struggling with the following task. Is my code correct, and how can I test whether it works?
Task: Take a string as a single input argument. You may assume the string consists of distinct lowercase letters in alphabetical order.
Return a list of strings where each string represents a permutation of the input string. The list of permutations must be in lexicographic order. (This is basically the ordering that dictionaries use: order by the first letter alphabetically; if there is a tie, use the second letter, and so on.)
If the string contains a single character, return a list containing that string.
Loop through all character positions of the string containing the characters to be permuted. For each character:
Form a simpler string by removing the character.
Generate all permutations of the simpler string recursively.
Add the removed character to the front of each permutation of the simpler word, and add the resulting permutation to a list.
Return all these newly constructed permutations.
[My code]
def perm_gen_lex(in_string):
    if (len(in_string) <= 1):
        return(in_string)
    # List of all new combinations
    empty_list = []
    # All permutations
    final_perm = perm_gen_lex(in_string[1:])
    # Character to be removed
    remove_char = in_string(0)
    # Remaining part of string
    remaining_string = in_string[1:]
    for perm in final_perm[1:]:
        for i in range(len(in_string) + 1):
            return empty_list.append(perm[:i] + remove_char + perm[i:])
    return empty_list

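Some variation on this will get you moving. Note that itertools.permutations, not itertools.product, is the function that uses each letter exactly once per result; when the input is sorted, it emits the permutations in lexicographic order:
from itertools import permutations

def perm_gen_lex(in_string):
    # permutations() yields tuples of characters, so join each one back into a string
    return [''.join(p) for p in permutations(in_string)]

print(perm_gen_lex("abc"))
See https://docs.python.org/3/library/itertools.html#itertools.permutations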
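For the recursive procedure described in the task, here is a minimal sketch. It is one possible reading of the steps, not the only one. Compared with the posted attempt, it returns a list in the base case, indexes the string with square brackets, removes each character in turn (not only the first), and returns only after the loops have finished. The helper name perm_gen_lex_recursive is made up for this example.

```python
from itertools import permutations

def perm_gen_lex_recursive(in_string):
    # Base case: a string of length 0 or 1 has exactly one permutation.
    if len(in_string) <= 1:
        return [in_string]
    result = []
    # Remove each character in turn; because the input is sorted, picking the
    # leading character in this order keeps the output in lexicographic order.
    for i, ch in enumerate(in_string):
        simpler = in_string[:i] + in_string[i + 1:]
        for perm in perm_gen_lex_recursive(simpler):
            result.append(ch + perm)
    return result

print(perm_gen_lex_recursive("abc"))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# One way to test it: compare against itertools.permutations and check the ordering.
expected = ["".join(p) for p in permutations("abcd")]
assert perm_gen_lex_recursive("abcd") == expected
assert expected == sorted(expected)
```

Comparing against itertools.permutations and checking that the result equals its own sorted copy covers both correctness and ordering for small inputs.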

Related

using for loop to replace bad nucleotides from DNA sequence

I have a list of sequences (simplified to the following for illustration):
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
and I want to use a for loop to replace every character other than the nucleotides ["A","C","G","T"] with "N".
My code so far:
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            seqList[x][i].replace(seqList[x][i],"N")
print(seqList)
The problem is that the nucleotides are not replaced and nothing changes in the original sequences,
and I can't figure out the reason.
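Strings in Python are immutable, so str.replace returns a new string instead of modifying the original, and your loop discards that result.
You can make it work like this:
seqList = ["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    # work on a mutable list of characters, then join it back into a string
    stringl = list(seqList[x])
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            stringl[i] = "N"
    seqList[x] = "".join(stringl)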
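An approach without looping over every letter would be to replace each distinct letter that is not one of ACGT:
def replace_bad(seq):
    # collect the distinct invalid letters once
    unique = [
        letter
        for letter in set(seq)
        if letter not in "ACGT"
    ]
    # str.replace returns a new string, so rebind seq each time
    for each in unique:
        seq = seq.replace(each, "N")
    return seq

if __name__ == '__main__':
    seqList = ["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
    for seq in seqList:
        print(replace_bad(seq))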
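Another option, sketched here as an alternative to the approaches above, is a regular expression that matches any character outside the set ACGT. The function name mask_invalid is made up for this example, and it assumes the sequences are uppercase.

```python
import re

def mask_invalid(seq):
    # [^ACGT] matches any single character that is not A, C, G or T
    return re.sub(r"[^ACGT]", "N", seq)

seq_list = ["ACCTGCCSSSTTTCCT", "ACCTGCCFFFTTTCCT"]
# Build a new list, since the strings themselves cannot be changed in place
cleaned = [mask_invalid(seq) for seq in seq_list]
print(cleaned)
# ['ACCTGCCNNNTTTCCT', 'ACCTGCCNNNTTTCCT']
```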

How to repeat a char at an index in a python string the number of times given as an argument to the function?

I am trying to learn Python and was wondering how to repeat certain characters of a string, where the characters to repeat and the repeat count are given as arguments. An example is below:
def repeatChar(origString, repeatCount, lettersToRepeat):
>>> repeatChar('cat', 3, 'cr')
'cccat'
I already have the comparison of origString and lettersToRepeat. I am trying to figure out how to get each char that is in lettersToRepeat to actually repeat repeatCount times.
Here's a succinct way to do it:
def repeatChar(origString, repeatCount, lettersToRepeat):
    return "".join([l * repeatCount if l in lettersToRepeat else l for l in origString])
In Python, strings can be iterated over character by character. This also uses a list comprehension, which you might want to look up if you haven't seen one before. Here the code between the square brackets produces a list. The "".join() call concatenates the strings in the list (see: How to concatenate items in a list to a single string?).
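If the list comprehension is hard to read, the same logic can be spelled out as an explicit loop. This is only an illustrative expansion; the name repeat_char_loop is made up for this example.

```python
def repeat_char_loop(orig_string, repeat_count, letters_to_repeat):
    pieces = []
    for letter in orig_string:
        if letter in letters_to_repeat:
            # multiplying a string by an int repeats it: 'c' * 3 == 'ccc'
            pieces.append(letter * repeat_count)
        else:
            pieces.append(letter)
    # join the pieces back into a single string
    return "".join(pieces)

print(repeat_char_loop('cat', 3, 'cr'))  # cccat
```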
You could use a recursive approach that performs the repetition for the first character (using replace()) and recurses for the rest of the letters:
def repeatChar(origString, repeatCount, lettersToRepeat):
    if not lettersToRepeat: return origString
    repeated = origString.replace(lettersToRepeat[0],lettersToRepeat[0]*repeatCount)
    return repeatChar(repeated,repeatCount,lettersToRepeat[1:])

repeatChar('cat', 3, 'cr') # 'cccat'

Python algorithm in list

Given a list of N strings, implement an algorithm that outputs the largest n such that the first n characters of every string are the same (i.e., print how many leading characters all the given strings have in common).
My code:
def solution(a):
    import numpy as np
    for index in range(0,a):
        if np.equal(a[index], a[index-1]) == True:
            i += 1
            return solution
        else:
            break
    return 0
# Test code
print(solution(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(solution(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0
Some comments on your code:
There is no need to use numpy if it is only used for string comparison.
range(0, a) raises a TypeError because a is a list; you would need range(len(a)).
i is undefined when i += 1 is about to be executed, so that will not run. There is no actual use of i in your code.
index-1 is -1 in the first iteration of the loop, which in Python refers to the last element of the list, not a preceding one.
solution is your function, so return solution will return a function object. You need to return a number.
The if condition only compares complete words, so there is no attempt to compare just a prefix.
A possible way to do this, is to be optimistic and assume that the first word is a prefix of all other words. Then as you detect a word where this is not the case, reduce the size of the prefix until it is again a valid prefix of that word. Continue like that until all words have been processed. If at any moment you find the prefix is reduced to an empty string, you can actually exit and return 0, as it cannot get any less than that.
Here is how you could code it:
def solution(words):
    prefix = words[0]  # if there was only one word, this would be the prefix
    for word in words:
        while not word.startswith(prefix):
            prefix = prefix[:-1]  # reduce the size of the prefix
            if not prefix:  # is there any sense in continuing?
                return 0  # ...: no.
    return len(prefix)
The description is somewhat convoluted but it does seem that you're looking for the length of the longest common prefix.
You can get the length of the common prefix between two strings using the next() function. It can find the first index where characters differ which will correspond to the length of the common prefix:
def maxCommon(S):
    cp = S[0] if S else ""      # first string is common prefix (cp)
    for s in S[1:]:             # go through other strings (s)
        # index of the first mismatch; if there is none, the shorter string is the prefix
        cs = next((i for i,(a,b) in enumerate(zip(s,cp)) if a!=b), min(len(s),len(cp)))
        cp = cp[:cs]            # truncate to new common size (cs)
    return len(cp)              # return length of common prefix
output:
print(maxCommon(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg'])) # 3
print(maxCommon(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg'])) # 0
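A column-wise variant of the same idea, given here as a sketch and not taken from either answer: zip(*words) yields the first characters of all words, then the second characters, and so on, stopping at the shortest word. Counting the leading columns in which all characters agree gives the prefix length. The name common_prefix_length is made up for this example.

```python
def common_prefix_length(words):
    count = 0
    # each `chars` is a tuple holding the i-th character of every word
    for chars in zip(*words):
        if len(set(chars)) != 1:
            break  # first column where the words disagree
        count += 1
    return count

print(common_prefix_length(['abcd', 'abce', 'abchg', 'abcfwqw', 'abcdfg']))  # 3
print(common_prefix_length(['abcd', 'gbce', 'abchg', 'abcfwqw', 'abcdfg']))  # 0
```

Because zip stops at the shortest input, a word that is itself a prefix of the others is handled without any extra length check, and an empty list gives 0.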

Step words anagram python

A step word is formed by taking a given word, adding a letter, and anagramming the result. For example, starting with the word "APPLE", you can add an "A" and anagram to get "APPEAL".
Given a global dictionary of words, create a function step(word) that returns a list of all unique, valid step words appearing in the dictionary.
Dictionary: https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt
I made a dictionary using the link using:
>>> words = open('words.txt', encoding='ascii').read().upper().split()
This assignment should be completed without any other library function calls. There are several solutions, but some are better and faster than others. How can you speed up your solution?
The solution should look like this:
>>> step("APPLE")
['APPEAL', 'CAPPLE', 'PALPED', 'LAPPED', 'DAPPLE', 'ALEPPO', 'LAPPER', 'RAPPEL', 'LAPPET', 'PAPULE', 'UPLEAP']
As we know, anagrams are exactly the same once their letters are sorted.
Logic:
1. Create a dictionary for look-up, where the key is a sorted string and the value is the list of anagrams of that key.
2. Add each letter of the alphabet (A..Z) to the input string, one at a time, such that the resulting string contains one extra letter and is in sorted form. Then find the anagrams of each resulting string in the dictionary created in the previous step.
3. The combination of all the values you got from step 2 is your expected output.
Regarding the run-time complexity of the code (excluding the time to create constants such as the look-up dictionary and the alphabet):
It will take O(N log N) + O(26 N) ~ O(N log N)
26: number of letters in the alphabet
N: length of the input string
Sorting the input string once with the sorted function takes N log N time in the worst case.
Creating the new sorted strings by inserting one letter into the sorted input string takes 26 x N time.
Code:
# array of all valid and unique words from the dictionary
valid_words = set(open('words.txt', encoding='ascii').read().upper().split())

look_up = {}
for word in valid_words:
    try:
        look_up[''.join(sorted(word))].append(word)
    except KeyError:
        look_up[''.join(sorted(word))] = [word]

alphabet_array = []
alphabet_dict = {}
for i in range(65, 91):
    alphabet_dict[chr(i)] = i
    alphabet_array.append(chr(i))

def step(word):
    sorted_string = sorted(word)
    length_of_input_string = len(sorted_string)
    output_values = []
    for i in alphabet_array:
        new_str = ''
        value_added = 0
        # insert letter i at its sorted position within the sorted input
        for j in range(0, length_of_input_string):
            if value_added == 0 and (alphabet_dict[sorted_string[j]] > alphabet_dict[i]):
                new_str += i
                value_added = 1
            new_str += sorted_string[j]
        if value_added == 0:
            new_str += i
        try:
            output_values += look_up[new_str]
        except KeyError:
            pass
    return output_values

if __name__ == '__main__':
    input_string = 'APPLE'
    print(step(input_string))
Since anagrams have the same letters, if you alphabetically sort the letters in a word, you get the same string for all words that are anagrams of each other.
For example:
LEAP -> alphabetically sorted -> AELP
PALE -> alphabetically sorted -> AELP
1) Iterate through all the words in your dictionary and build a lookup from each alphabetically sorted key to the list of words that share that key.
Given a list of words
["PALE","LEAP"]
you will get the anagram lookup as follows
{
    "AELP": ["PALE","LEAP"],
    ...
}
2) Next, take the input word and append each letter of the alphabet in turn to create candidate strings. Sort each candidate and look it up in the anagram dictionary. Concatenate the lists returned into one list and return that list.
Let's say the input word is PEA; generate all candidates
["PEAA","PEAB",...,"PEAL",...]
Alphabetically sort every candidate
["AAEP","ABEP",...,"AELP",...]
Then look up and concatenate the lists returned
["PALE","LEAP"]
Let me know if you want the Python code here as well, but it should be easy to code this up. The speedup comes primarily from preprocessing the anagram lookup dictionary, so each final lookup runs in near-constant time, at the cost of additional space proportional to the number of words in the input list.
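The two steps above can be sketched as follows. This is an illustrative outline under a few assumptions: the names build_lookup, step and ALPHABET are made up for this example, the lookup is passed in explicitly, not held in a global, and a tiny in-memory word list stands in for words.txt.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def build_lookup(words):
    # Step 1: map each sorted-letter key to the words that share it.
    lookup = {}
    for w in words:
        lookup.setdefault("".join(sorted(w)), []).append(w)
    return lookup

def step(word, lookup):
    # Step 2: add each letter, sort the candidate, and collect any matches.
    found = []
    for letter in ALPHABET:
        key = "".join(sorted(word + letter))
        found.extend(lookup.get(key, []))
    return found

# Small stand-in word list (an assumption; the real input is words.txt).
lookup = build_lookup(["APPEAL", "DAPPLE", "TEA", "LEAP", "PALE"])
print(step("APPLE", lookup))  # ['APPEAL', 'DAPPLE']
print(step("PEA", lookup))    # ['LEAP', 'PALE']
```

The lookup is built once and reused, so each call to step does 26 sorts of a short string and 26 dictionary lookups regardless of how large the word list is.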
Please see the implementation and explanations below.
We can divide the problem into two parts. First, build a word map using defaultdict that groups anagrams into lists; words that have the same letters, such as TEA and EAT, share the same key.
Building the word map requires looping through the N words and sorting each one, so the run time is O(N * k log k), assuming the average word length is k.
Second, loop through each letter, add it to the given word, and check whether the sorted result is a key in the map. If so, the words stored under that key are step words, and we add them to the results.
from collections import defaultdict
from string import ascii_uppercase as uppers

def make_wordmap(dictionary):  # first part - build up the lookup hash map
    maps = defaultdict(list)
    for word in dictionary:
        maps[tuple(sorted(word))].append(word)
    return maps

def step_words(word, dc):  # search the word from dict by using the maps
    word_map = make_wordmap(dc)
    results = []
    for letter in uppers:
        key = tuple(sorted(word + letter))
        if key in word_map:  # membership test avoids adding empty entries to the defaultdict
            results.extend(word_map[key])
    return results

if __name__ == '__main__':
    dictionary = ['APPEAL', 'TEA', 'DAVY']
    word = 'APPLE'
    print(step_words(word, dictionary))
    print(step_words('DAY', dictionary))
Here is a simple piece of code which reads the file and builds a lookup table.
There is some code to handle special cases found in the file.
## letters which will be added to the word to lookup for anagrams
alphabet = 'abcdefghijklmnopqrstuvwxyz'
## read the file, lowercase the letters, and split it in case you have some blanks
dictionary = open('words.txt', encoding='ascii').read().lower().split()
## build a lookup dict which holds all words for a given sorted set of letters
lookup = {}
for word in dictionary:
    try:
        if word not in lookup[''.join(sorted(word))]:  ## avoid duplicates already stored under this key, e.g. upper/lowercase variants
            lookup[''.join(sorted(word))].append(word)
    except KeyError:
        lookup[''.join(sorted(word))] = [word]  ## create new dictionary entry if key does not exist

## step function to find anagrams with one more letter
def step(word):
    word = word.lower()  ## works with lowercase word only
    output = []
    for i in alphabet:  ## try to find anagrams with one extra letter added to the word
        try:
            output += lookup[''.join(sorted(word+i))]  ## add the words found if the key exists in the lookup dict
        except KeyError:
            pass
    return list(set(output))  ## be sure to return only unique answers in a list

## main
if __name__ == '__main__':
    print(step('aPPle'))
    print(step('OV'))  ## test for the dupes 'Ova' and 'ova' found in the file
    print(step('a'))  ## test for the one letter dupes 'a' and 'A' found in the file
    print(step('A'))  ## test for the one letter dupes 'a' and 'A' found in the file
The program below searches the permutations of the input string and prints those that appear in the word list. This can be further fine-tuned, but it is a basic starting point for checking valid anagram values. Note that the number of permutations grows factorially with the length of the word.
# Program: Anagram Finder
from itertools import permutations

# Store the words text file in a set for fast membership tests
with open('words.txt', 'r') as f:
    dictionary = {x.lower() for x in f.read().split('\n')}

# Get permutations of the input word for every length from 1 up to the full length
def get_perms(value, length):
    for l in range(1, length + 1):
        for perm in permutations(value, l):
            yield ''.join(perm)

# Search the dictionary for possible anagrams and list the valid ones
def fncSearchForAnagram():
    y = ["Apple"]  # Anagram check sample input.
    for i in y:
        perms = get_perms(i, len(i))
        for item in perms:
            if item.lower() in dictionary:  # converting search string to lower since the dictionary was lowercased
                # This output will be your anagram combination word listed in the words.txt file
                print(item)

fncSearchForAnagram()
To check whether a word is an anagram of another, we sort both words and check whether the sorted results are equal, since an anagram is just a rearrangement of characters.
To check whether characters can be added to a word to create another word, we check whether the characters of the first word are all present in the other word.
list_of_words = ['handsome', 'handy', 'notright', 'and']
word_to_check = 'adn'
anagram_can_add_chars = []
anagram_cannot_add_chars = []
# loop through the list of words
for word in list_of_words:
    # we sort the string so that we know that we have all the chars
    if sorted(word_to_check) == sorted(word):
        anagram_cannot_add_chars.append(word)
    # we convert the strings to set
    # this will remove duplicate chars on each
    # but we can always add them
    if set(sorted(word_to_check)).issubset(set(sorted(word))):
        anagram_can_add_chars.append(word)
print(anagram_can_add_chars)
print('---')
print(anagram_cannot_add_chars)
result :
['handsome', 'handy', 'and']
---
['and']

Calling different functions python

I need to call several functions that I have already created in order to solve the problem below, but I am really unsure how to program this. The question is:
Write find_two_word_anagrams_in_wordlist(str, str_list), where the input parameters should be a string and a list of strings. The output should be a list of all strings made up of two words separated by a space, such that both of these words are in str_list and the combination of the two words is an anagram of str.
The output I expect is:
wordlist = ('and','band','nor','born')
find_two_word_anagrams_in_wordlist("brandon", wordlist)
['and born', 'band nor']
How to achieve this is:
Initialise a list two_word_anagrams to the empty list, [].
Call find_partial_anagrams_in_word_list to get a list of all the partial anagrams of str that can be found in the word list.
Then, do a loop that runs over all these partial anagrams. For each partial anagram part_anag do the following:
    Remove the letters of part_anag from the input string, str, to get a string, rem, that is the remaining letters after taking away the partial anagram;
    Call find_anagrams_in_word_list on rem (and the input word list) to get a list of anagrams of the remaining letters;
    For each anagram rem_anag of the remaining letters, form the string part_anag + " " + rem_anag and add it to the list two_word_anagrams.
At this point the list two_word_anagrams should have all the two word anagrams in it, so return that list from your function.
Code i have already created is:
def find_partial_anagrams_in_word_list(str1,str_list):
    partial_anagrams = []
    for word in str_list:
        if (partial_anagram(word, str1)):
            partial_anagrams.append(word)
    print(partial_anagrams)

def remove_letters(str1,str2):
    str2_list = list(str2)
    for char in str2:
        if char in str1:
            str2_list.remove(char)
    return "".join(str2_list)

def find_anagrams_in_word_list(str1,str_list):
    anagrams = []
    for word in str_list:
        if (anagram(str1,word)):
            anagrams.append(word)
            print(word)
Any step by step help or input would be appreciated.
You have two choices: you can look for anagrams among pairs of two different words, or among all pairs, including a word paired with itself.
The choice is yours; here I have implemented both.
from itertools import combinations, combinations_with_replacement

def ana2(s,wl):
    rl = []
    # every unordered pair of two different words
    for w1, w2 in combinations(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl

def ana2wr(s,wl):
    rl = []
    # every unordered pair, also allowing a word to be paired with itself
    for w1, w2 in combinations_with_replacement(wl,2):
        w = w1+w2
        if len(w) != len(s): continue
        if sorted(w) == sorted(s): rl.append((w1, w2))
    return rl
And here is some testing:
wl = ('and','band','nor','born')
s1 = 'brandon'
s2 = 'naddan'
print(ana2(s1,wl))
print(ana2(s2,wl))
print(ana2wr(s1,wl))
print(ana2wr(s2,wl))
which produces the following output:
[('and', 'born'), ('band', 'nor')]
[]
[('and', 'born'), ('band', 'nor')]
[('and', 'and')]
You wrote
I need to call various different functions that i have already created
in order to achieve the question below
def find_2_word_anagram(word, wordlist):
    # Initialise a list two word anagrams to the empty list, [].
    analist = []
    # Call find partial anagrams in word list to get a list of all the
    # partial anagrams of str that can be found in the word list.
    candidates = find_partial_anagrams_in_word_list(word,wordlist)
    # Then, do a loop that runs over all these partial anagrams
    for candidate in candidates:
        # Remove the letters of part anag from the input string, str, to
        # get a string, rem that is the remaining letters after taking
        # away the partial anagram
        remaining = remove_letters(candidate,word)
        # Call find anagrams in word list on rem (and the input word list)
        # to get a list of anagrams of the remaining letters;
        matches = find_anagrams_in_word_list(remaining, wordlist)
        for other in matches:
            # str.join takes a single iterable, hence the tuple
            analist.append(" ".join((candidate,other)))
    return analist
Note that:
You still have to write the inner functions following your specifications.
When you write a function that returns, e.g., a list of words, you MUST RETURN A LIST OF WORDS; in particular, it isn't enough to print the matches from your function.
To find an anagram, the idiom sorted(w1)==sorted(w2) is all you need, but the story with finding a partial anagram is more complex...
I took the liberty of inlining one of your functions, the one that computes the remainder.
When you strip the comments, which are your specs verbatim, you have very few lines of code.
Post Scriptum
Have a thorough look at Daniel's comment to my previous answer, there's a lot in it...
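For completeness, here is a self-contained sketch of the whole procedure with the helper functions filled in. It is one possible implementation, not the asker's own helpers: the names is_partial_anagram, remaining_letters and find_two_word_anagrams are made up for this example, and collections.Counter is used so that repeated letters are counted correctly.

```python
from collections import Counter

def is_partial_anagram(word, letters):
    # True if `word` can be built from `letters`, respecting letter counts.
    # Counter subtraction drops non-positive counts, so an empty result
    # means no letter of `word` is missing from `letters`.
    return not (Counter(word) - Counter(letters))

def remaining_letters(word, letters):
    # Letters of `letters` left over after taking away the letters of `word`.
    return "".join(sorted((Counter(letters) - Counter(word)).elements()))

def find_two_word_anagrams(target, wordlist):
    results = []
    for first in wordlist:
        if not is_partial_anagram(first, target):
            continue
        rest = remaining_letters(first, target)
        for second in wordlist:
            if sorted(second) == sorted(rest):
                results.append(first + " " + second)
    return results  # return the list; printing it is not enough

wordlist = ('and', 'band', 'nor', 'born')
print(find_two_word_anagrams("brandon", wordlist))
# ['and born', 'band nor', 'nor band', 'born and']
```

Followed literally, the steps produce each pair in both orders. If only one order is wanted, as in the expected output of the question, the duplicates would have to be filtered out in an extra step.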
