using for loop to replace bad nucleotides from DNA sequence - python

I have a list of sequences (for simplicity like the following one)
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
and I want to use for looping to replace every instance of a nucleotide other than ["A","C","G","T"] with "N"
my code so far
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            seqList[x][i].replace(seqList[x][i],"N")
print(seqList)
The problem is that the nucleotides are not replaced and the original sequences stay unchanged, and I can't figure out why.

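Strings in Python are immutable: str.replace returns a new string instead of modifying the original, and your loop discards that result.
You can make it work like this: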
seqList= ["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    stringl=list(seqList[x])
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            stringl[i]="N"
    seqList[x]="".join(stringl)
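To illustrate why the loop in the question has no visible effect, here is a minimal sketch (the variable names seq and result are only for this example). It shows that str.replace leaves the original string untouched and hands back a new one, which has to be stored somewhere:

```python
seq = "ACCTGCCSSSTTTCCT"

# str.replace builds and returns a new string; seq itself is not modified.
result = seq.replace("S", "N")

print(seq)     # ACCTGCCSSSTTTCCT
print(result)  # ACCTGCCNNNTTTCCT
```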

An approach that avoids looping over every letter is to replace each distinct letter that is not one of ACGT:
def replace_bad(seq):
    unique = [
        letter
        for letter in set(seq)
        if letter not in "ACGT"
    ]
    for each in unique:
        seq = seq.replace(each, "N")
    return seq
if __name__ == '__main__':
    for seq in seqList:
        print(replace_bad(seq))
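Two other common ways to do the same masking are sketched here, assuming the valid alphabet is exactly A, C, G and T. The function names mask_with_join and mask_with_regex are made up for this example; re.sub and str.join are standard library calls.

```python
import re

VALID = set("ACGT")

def mask_with_join(seq):
    # Keep valid nucleotides and substitute "N" for everything else.
    return "".join(base if base in VALID else "N" for base in seq)

def mask_with_regex(seq):
    # [^ACGT] matches any single character that is not A, C, G or T.
    return re.sub(r"[^ACGT]", "N", seq)

seq_list = ["ACCTGCCSSSTTTCCT", "ACCTGCCFFFTTTCCT"]
print([mask_with_join(s) for s in seq_list])
print([mask_with_regex(s) for s in seq_list])
```

Both versions build a new list of strings rather than modifying the sequences in place, which fits the fact that strings are immutable.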

Related

Print every two letters pairs in python3

I'm new to Python and I'm stuck on an exercise that asks for a script printing every possible pair of two lowercase letters, one per line, in alphabetical order. This is the closest I could get:
import string
x=string.ascii_lowercase
y=list(x)
for i in y:
    print(i,end='')
    for g in y:
        print(g)
You only print the first letter of each pair once.
from string import ascii_lowercase as lowercase_letters
for first_letter in lowercase_letters:
    for second_letter in lowercase_letters:
        print(first_letter + second_letter)
Additionally:
You don't need to convert the string to a list, you can loop over a string just fine. In fact, that's how list(some_string) works!
I used more readable variable names.
Using from ... import means you don't need to have the additional assignment.
You need to print the i letter inside the second for loop:
import string
x=string.ascii_lowercase
for i in x:
    for g in x:
        print(i+g)
So the program goes through every letter in the first loop and, for each one, prints the whole alphabet, one letter at a time, as the second letter in the second loop.
word_list = ['WELCOME']
double_letters = []
for word in word_list:
    for i,j in enumerate(word):
        x = word[i:i+2]
        if len(x) == 2:
            double_letters.append(x)
print(double_letters)
If you are given a list of words instead, this is one possible way to collect every pair of adjacent letters.
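A shorter way to collect adjacent letter pairs from a word is to zip the word with itself shifted by one position. This is only a sketch; the function name adjacent_pairs is an assumption made for the example.

```python
def adjacent_pairs(word):
    # zip stops at the shorter input, so the last letter is never paired alone.
    return [first + second for first, second in zip(word, word[1:])]

print(adjacent_pairs("WELCOME"))  # ['WE', 'EL', 'LC', 'CO', 'OM', 'ME']
```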
Try this code; it prints the pairs in alphabetical order.
It loops through the ASCII codes 97 to 122, which cover the lowercase letters a to z, converts each code to a character with chr() and joins the two characters.
for firstchar in range(97, 123):
    for secondchar in range(97, 123):
        print(chr(firstchar) + chr(secondchar))
If you use the string module, it is a very simple task:
import string
for firstchar in string.ascii_lowercase:
    for secondchar in string.ascii_lowercase:
        print(firstchar + secondchar)
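The nested loops used in the answers above can also be expressed with itertools.product, which yields tuples in the same order as nested for loops. This is a sketch rather than a required solution; the helper name letter_pairs is an assumption.

```python
from itertools import product
from string import ascii_lowercase

def letter_pairs(alphabet=ascii_lowercase):
    # product(alphabet, repeat=2) iterates like two nested loops over alphabet.
    return ["".join(pair) for pair in product(alphabet, repeat=2)]

pairs = letter_pairs()
print(len(pairs))   # 676, i.e. 26 * 26
print(pairs[:3])    # ['aa', 'ab', 'ac']
```

To print one pair per line, as the exercise asks, loop over the returned list and print each item.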

How to split a list from the console by lowercase,uppercase and mix-case strings in python 3

This is my code, but it doesn't work. It should read text from the console, split it into words and distribute them into 3 lists and use separators between them.
words = list(map(str, input().split(" ")))
lowercase_words = []
uppercase_words = []
mixedcase_words = []
def split_symbols(list):
    from operator import methodcaller
    list = words
    map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space"))
    return list
for word in words:
    if words[word] == word.lower():
        words[word] = lowercase_words
    elif words[word] == word.upper():
        words[word] = uppercase_words
    else:
        words[word] = mixedcase_words
print(f"Lower case: {split_symbols(lowercase_words)}")
print(f"Upper case: {split_symbols(uppercase_words)}")
print(f"Mixed case: {split_symbols(mixedcase_words)}")
There are several issues in your code.
1) words is a list and word is string. And you are trying to access the list with the index as string which will throw an error. You must use integer for indexing a list. In this case, you don't even need indexes.
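2) To check lower or upper case you can just do word == word.lower() or word == word.upper(). Another approach would be to use the islower() or isupper() methods, which return a boolean. Note that these return False for a string with no cased characters, such as "123", whereas the comparison returns True.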
3) You are trying to assign an empty list to that element of list. What you want is to append the word to that particular list. You want something like lowercase_words.append(word). Same for uppercase and mixedcase
So, to fix these issues you can write the code like this -
for word in words:
    if word == word.lower(): # similar to word.islower()
        lowercase_words.append(word)
    elif word == word.upper(): # similar to word.isupper()
        uppercase_words.append(word)
    else:
        mixedcase_words.append(word)
My advice would be to refrain from naming variables after built-ins such as list. Also, in split_symbols() you overwrite the list parameter with words (list = words). I think you meant it the other way around.
Now I am not sure about the "use separators between them" part of the question. But the line map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space")) is definitely wrong. map() takes a function and an iterable. In your code the iterable part is absent and I think this is where the input param list fits in. So, it may be something like -
map(methodcaller("split"," "), list)
But then again, I am not sure what you are trying to achieve with that many separators.
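If the intent was to split the input on several separator characters at once, one possible sketch uses re.split with a character class. The set of separators below is a guess based on the characters listed in the question, and the function name classify_words is hypothetical.

```python
import re

def classify_words(text):
    # Split on any run of the assumed separators: space , : ; . ! ( ) ' \ / [ ]
    words = [w for w in re.split(r"[ ,:;.!()'\\/\[\]]+", text) if w]
    lower, upper, mixed = [], [], []
    for word in words:
        if word == word.lower():
            lower.append(word)
        elif word == word.upper():
            upper.append(word)
        else:
            mixed.append(word)
    return lower, upper, mixed

lower, upper, mixed = classify_words("hello WORLD, Mixed;case/ABC")
print(f"Lower case: {', '.join(lower)}")
print(f"Upper case: {', '.join(upper)}")
print(f"Mixed case: {', '.join(mixed)}")
```

In the original program the text would come from input() instead of a literal string.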

Step words anagram python

A step word is formed by taking a given word, adding a letter, and anagramming the result. For example, starting with the word "APPLE", you can add an "A" and anagram to get "APPEAL".
Given a global dictionary of words, create a function step(word) that returns a list of all unique, valid step words appearing in the dictionary.
Dictionary: https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt
I made a dictionary using the link using:
>>> words = open('words.txt', encoding='ascii').read().upper().split()
This assignment should be completed without any other library function calls. There are several solutions, but some are better and faster than others. How can you speed up your solution?
The solution should look like this.
>>> step("APPLE")
['APPEAL', 'CAPPLE', 'PALPED', 'LAPPED', 'DAPPLE', 'ALEPPO', 'LAPPER', 'RAPPEL', 'LAPPET', 'PAPULE', 'UPLEAP']
As we know, anagrams are identical once their letters are sorted.
Logic:
1) Create a dictionary for look-up, where each key is a sorted string and the value is the list of words that are anagrams of that key.
2) Add each letter of the alphabet (A..Z), one at a time, to the input string so that the resulting string has exactly one extra letter and is in sorted order. Then look that string up in the dictionary created in step 1.
3) The combination of all the values found in step 2 is the expected output.
Run-time complexity (excluding the one-off cost of building constants such as the look-up dictionary and the alphabet):
O(NxLogN) + O(26xN) ~ O(NxLogN)
26: number of letters in the alphabet
N: length of the input string
Sorting the input string once takes NlogN time in the worst case.
Building the new sorted string takes N time for each of the 26 letters, so 26xN in total.
Code:
# set of all valid and unique words from the dictionary
valid_words = set(open('words.txt', encoding='ascii').read().upper().split())
look_up = {}
for word in valid_words:
    try:
        look_up[''.join(sorted(word))].append(word)
    except KeyError:
        look_up[''.join(sorted(word))] = [word]
alphabet_array = []
alphabet_dict = {}
for i in range (65, 91):
    alphabet_dict[chr(i)]=i
    alphabet_array.append(chr(i))
def step(word):
    sorted_string = sorted(word)
    length_of_input_string = len(sorted_string)
    output_values = []
    for i in alphabet_array:
        new_str = ''
        value_added = 0
        for j in range (0, length_of_input_string):
            if value_added==0 and (alphabet_dict[sorted_string[j]] > alphabet_dict[i]):
                new_str += i
                value_added = 1
            new_str += sorted_string[j]
        if value_added==0:
            new_str += i
        try:
            output_values+=look_up[new_str]
        except KeyError:
            pass
    return output_values
if __name__ == '__main__':
    input_string = 'APPLE'
    print (step(input_string))
Since anagrams have the same letters, if you alphabetically sort the letters in a word, you get the same string for all words that are anagrams of each other.
For example:
LEAP -> alphabetically sorted -> AELP
PALE -> alphabetically sorted -> AELP
1) Iterate through all the words in your dictionary and create a lookup from the alphabetically sorted string (the key) to the list of words that share that key.
Given a list of words
["PALE","LEAP"]
you will get the anagram lookup as follows
{
    "AELP": ["PALE","LEAP"],
    ...
}
2) Next, take the input word and append each letter of the alphabet in turn to create a new candidate string. Sort this string and look it up in the anagram dictionary. Concatenate the lists returned into one list and return that list.
Let's say the input word is PEA; generate all candidates
["PEAA","PEAB",...,"PEAL",...]
Alphabetically sort every candidate word
["AAEP","ABEP",...,"AELP",...]
Then look them up and concatenate the lists returned
["PALE","LEAP"]
Let me know if you want the Python code here as well, but it should be easy to code this up. The speedup comes primarily from preprocessing the anagram lookup dictionary, so that each final lookup runs in near constant time, at the cost of additional space on the order of the number of words in the input list.
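The two steps described in this answer could be coded roughly as follows. This is a sketch under stated assumptions: the function names build_lookup and step and the small in-memory word list are made up for the example, and a real run would load the words from words.txt instead.

```python
from string import ascii_uppercase

def build_lookup(words):
    # Map each sorted-letter key to the list of words sharing those letters.
    lookup = {}
    for word in words:
        key = "".join(sorted(word))
        lookup.setdefault(key, []).append(word)
    return lookup

def step(word, lookup):
    # Try each extra letter, sort the candidate, and collect any matches.
    results = []
    for letter in ascii_uppercase:
        key = "".join(sorted(word + letter))
        results.extend(lookup.get(key, []))
    return results

words = ["PALE", "LEAP", "PLEA", "APPEAL", "TAPE", "APE"]  # toy word list
lookup = build_lookup(words)
print(step("PEA", lookup))    # ['PALE', 'LEAP', 'PLEA', 'TAPE']
print(step("APPLE", lookup))  # ['APPEAL']
```

Building the lookup once and passing it in keeps each call to step at 26 sorts and 26 dictionary lookups, independent of the size of the word list.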
Please see the implementation and explanations below.
The problem can be divided into two parts. First, build a word map using defaultdict that groups all words that are anagrams of each other into one list. For example, words that have the same letters, such as TEA and EAT, should have the same key.
Building the word map requires looping through the N words and sorting each one, so the run time is O(N * k log k), assuming the average word length is k.
Second, loop through each letter of the alphabet, add it to the given word, and check whether the sorted result is a key in the map. If so, the words stored under that key are step words and are added to the results.
from collections import defaultdict
from string import ascii_uppercase as uppers
def make_wordmap(dictionary): # first part - build up the lookup hash map
    maps = defaultdict(list)
    for word in dictionary:
        maps[tuple(sorted(word))].append(word)
    return maps
def step_words(word, dc): # search the word from dict by using the maps
    word_map = make_wordmap(dc)
    results = []
    for letter in uppers:
        key = tuple(sorted(word + letter))
        if key in word_map: # membership test avoids inserting empty lists into the defaultdict
            results.extend(word_map[key])
    return results
if __name__ == '__main__':
    dictionary = ['APPEAL', 'TEA', 'DAVY']
    word = 'APPLE'
    print(step_words(word, dictionary))
    print(step_words('DAY', dictionary))
Here is a simple piece of code which reads the file and builds a lookup table.
There is some code to handle special cases found in the file.
## letters which will be added to the word to lookup for anagrams
alphabet = 'abcdefghijklmnopqrstuvwxyz'
## read the file, lowercase the letters, and split it in case you have some blanks
dictionary = open('words.txt', encoding='ascii').read().lower().split()
## build a lookup dict which holds all words for a given sorted set of letters
lookup = {}
for word in dictionary:
    try:
        if word not in lookup[''.join(sorted(word))]: ## avoid duplicates when the word is already listed, e.g. upper/lowercase variants in the file
            lookup[''.join(sorted(word))].append(word)
    except KeyError:
        lookup[''.join(sorted(word))] = [word] ## create a new dictionary entry if the key does not exist
## step function to find anagrams with one more letter
def step(word):
    word = word.lower() ## works with lowercase word only
    output = []
    for i in alphabet: ## try to find anagrams with one extra letter added to the word
        try:
            output += lookup[''.join(sorted(word+i))] ## add the words found if an anagram is found in lookup dict
        except KeyError:
            pass
    return list(set(output)) ## be sure to return only unique answers in a list
## main
if __name__ == '__main__':
    print (step('aPPle'))
    print (step('OV')) ## test for the dupes 'Ova' and 'ova' found in the file
    print (step('a')) ## test for the one letter dupes 'a' and 'A' found in the file
    print (step('A')) ## test for the one letter dupes 'a' and 'A' found in the file
The program below generates permutations of the letters of the input string and prints those that appear in the word list. It can be tuned further, but it is a basic starting point for checking valid anagrams.
#Program: Anagram Finder
from itertools import permutations
#Store the words text file in to an object for searching
with open('words.txt', 'r') as f:
    dictionary = f.read()
dictionary = [x.lower() for x in dictionary.split('\n')]
#Get permutations of input word
def get_perms(value, length):
    for l in range(length):
        for perm in permutations(value, l):
            yield ''.join(perm)
    else:
        return []
# Search the dictionary for possible Anagrams and list the valid ones
def fncSearchForAnagram():
    y = ["Apple"] # Anagram check Sample input.
    for i in y:
        perms = get_perms(i, len(i))
        for item in perms:
            if item.lower() in dictionary: # converting search string to lower since the word file has all lower case chars
                # This output will be your Anagram combination word listed in the Words.txt File
                print(item)
fncSearchForAnagram()
To easily check whether a word is an anagram of another, we sort both words first and check whether they are equal after sorting, since an anagram is just a rearrangement of characters.
To check whether some characters can be added to a word to create another word, we check whether the characters of the first word are all present in the other word.
list_of_words = ['handsome', 'handy', 'notright', 'and']
word_to_check = 'adn'
anagram_can_add_chars= []
anagram_cannot_add_chars= []
# loop through the list of words
for word in list_of_words:
    # we sort the string so that we know that we have all the chars
    if sorted(word_to_check) == sorted(word):
        anagram_cannot_add_chars.append(word)
    # we convert the strings to set
    # this will remove duplicate chars on each
    # but we can always add them
    if set(sorted(word_to_check)).issubset(set(sorted(word))):
        anagram_can_add_chars.append(word)
print(anagram_can_add_chars)
print('---')
print(anagram_cannot_add_chars)
Result:
['handsome', 'handy', 'and']
---
['and']
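The set-based check above ignores how many times each letter occurs and how many letters are added. For a strict step-word test (exactly one extra letter), a multiset comparison with collections.Counter is one option. This is a sketch; the function name is_step_word and the sample words are assumptions made for the example.

```python
from collections import Counter

def is_step_word(base, candidate):
    # A step word has exactly one more letter than the base word...
    if len(candidate) != len(base) + 1:
        return False
    # ...and contains every letter of the base at least as many times.
    # Counter subtraction keeps only positive counts, so an empty result
    # means no letter of base is missing from candidate.
    missing = Counter(base) - Counter(candidate)
    return not missing

candidates = ["hand", "handy", "and", "dana"]
print([w for w in candidates if is_step_word("adn", w)])  # ['hand', 'dana']
print(is_step_word("APPLE", "APPEAL"))                    # True
```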

Generating permutations in lexicographic order in Python

I am struggling with the following, is my code correct and how to test if it works?
Task: Take a string as a single input argument. You may assume the string consists of distinct lower case letters in alphabetical order.
Return a list of strings where each string represents a permutation of the input string. The list of permutations must be in lexicographic order. (This is basically the ordering that dictionaries use: order by the first letter alphabetically, if there is a tie then use the second letter, etc.)
1) If the string contains a single character, return a list containing that string.
2) Loop through all character positions of the string containing the characters to be permuted. For each character:
   a) Form a simpler string by removing the character.
   b) Generate all permutations of the simpler string recursively.
   c) Add the removed character to the front of each permutation of the simpler word, and add the resulting permutation to a list.
3) Return all these newly constructed permutations.
[My code]
def perm_gen_lex(in_string):
    if (len(in_string) <= 1):
        return(in_string)
    # List of all new combinations
    empty_list = []
    # All permutations
    final_perm = perm_gen_lex(in_string[1:])
    # Character to be removed
    remove_char = in_string(0)
    # Remaining part of string
    remaining_string = in_string[1:]
    for perm in final_perm[1:]:
        for i in range(len(in_string) + 1):
            return empty_list.append(perm[:i] + remove_char + perm[i:])
    return empty_list
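Some variation on this will get you moving. Note that product also yields strings with repeated letters (such as "aaa"), so its output still has to be filtered down to strings that use each letter exactly once.
from itertools import product
def combinations(string):
    return [''.join(i) for i in product(string, repeat = len(string))]
print(combinations("abc"))
See https://docs.python.org/3/library/itertools.html#itertools.product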
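For comparison, the recursive procedure described in the task could be sketched as follows. It keeps the asker's function name perm_gen_lex; the local variable names are my own. One way to test it is to compare the result with the sorted output of itertools.permutations.

```python
from itertools import permutations

def perm_gen_lex(in_string):
    # Base case: a string of length 0 or 1 has exactly one permutation.
    if len(in_string) <= 1:
        return [in_string]
    result = []
    for i, char in enumerate(in_string):
        # Form the simpler string by removing the character at position i.
        remaining = in_string[:i] + in_string[i + 1:]
        # Put the removed character in front of each shorter permutation.
        for perm in perm_gen_lex(remaining):
            result.append(char + perm)
    return result

print(perm_gen_lex("abc"))  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# Cross-check against the standard library for a slightly longer input.
expected = sorted("".join(p) for p in permutations("abcd"))
print(perm_gen_lex("abcd") == expected)  # True
```

Because the input is assumed to be in alphabetical order and characters are removed from left to right, the output is already in lexicographic order without an extra sort.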

Rm duplication in list comprehension

Input is a string, the idea is to count the letters A-z only, and print them alphabetically with the count of appearances.
As usual I kept at this 'till I got a working result, but now seek to optimize it in order to better understand the Python way of doing things.
def string_lower_as_list(string):
    """
    >>> string_lower_as_list('a bC')
    ['a', ' ', 'b', 'c']
    """
    return list(string.lower())
from sys import argv
letters = [letter for letter in string_lower_as_list(argv[1])
           if ord(letter) < 124 and ord(letter) > 96]
uniques = sorted(set(letters))
for let in uniques:
    print let, letters.count(let)
How do I remove the duplication of ord(letter) in the list comprehension?
Would there have been any benefit in using a Dictionary or Tuple in this instance, if so, how?
EDIT
Should have said, Python 2.7 on win32
You can compare letters directly and you actually only need to compare lower case letters
letters = [letter for letter in string_lower_as_list(argv[1])
           if "a" <= letter <= "z"]
But better would be to use a dictionary to count the values. letters.count has to traverse the list every time you call it. But you are already traversing the list to filter out the right characters, so why not count them at the same time?
letters = {}
for letter in string_lower_as_list(argv[1]):
    if "a" <= letter <= "z":
        letters[letter] = letters.get(letter, 0) + 1
for letter in sorted(letters):
    print letter, letters[letter]
Edit: As the others said, you don't have to convert the string to a list. You can iterate over it directly: for letter in argv[1].lower().
How do I remove the duplication of ord(letter) in the list comprehension?
You can use a chained comparison, a Python idiom that most other languages do not support: if 96 < ord(letter) < 123. (The upper bound should be 123 rather than 124, because ord('z') is 122 and 123 is the character '{'.)
Would there have been any benefit in using a Dictionary or Tuple in this instance, if so, how?
You could try using the collections.Counter class added in Python 2.7.
P.S. You don't need to convert the string to a list in order to iterate over it in the list comprehension. Any iterable will work, and strings are iterable.
P.S. 2. To get the property 'this letter is alphabetic', instead of lowercasing and comparing to a range, just use str.isalpha. Unicode objects provide the same method, which allows the same code to Just Work with text in foreign languages, without having to know which characters are "letters". :)
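A sketch of the collections.Counter suggestion follows. It is written for Python 3 (print as a function), whereas the question targets Python 2.7; the function name letter_counts and the sample text are assumptions, and a script would pass argv[1] instead.

```python
from collections import Counter

def letter_counts(text):
    # Count only the letters a-z after lowercasing, in a single pass.
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    # Sorting the (letter, count) pairs orders them alphabetically by letter.
    return sorted(counts.items())

for letter, count in letter_counts("Hello, World!"):
    print(letter, count)
```

Replacing the range test with ch.isalpha() would also count non-ASCII letters, as the answer above points out.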
You don't have to convert string to list, string is iterable:
letters = {}
for letter in argv[1].lower():
    if "a" <= letter <= "z":
        letters[letter] = letters.get(letter, 0) + 1
for letter in sorted(letters.keys()):
    print letter, letters[letter]
