I have the following program:
from collections import Counter
counter=0
lst=list()
fhandle=open('DNAInput.txt','r')
for line in fhandle:
    if line.startswith('>'):
        continue
    else:
        lst.append(line)
while counter != len(lst[0]):
    lst2=list()
    for word in lst:
        lst2.append(word[counter])
    for letter in lst2:
        mc=Counter(lst).most_common(5)
    counter=counter +1
    print(mc)
which takes the following input file:
>1
GATCA
>2
AATC
>3
AATA
>4
ACTA
and prints the letter that occurs most often in each column.
How can I write the same program without the "from collections import Counter"?
If I understand correctly, you want to find the most common character in each column. Here is one way to do it:
def most_common(col, exclude_char='N'):
    col = list(filter((exclude_char).__ne__, col))
    return max(set(col), key=col.count)
sequences = []
with open('DNAinput.txt', 'r') as file:
    for line in file:
        if line[0] == '>':
            continue
        else:
            sequences.append(line.strip())
m = max([len(v) for v in sequences])
matrix = [list(v) for v in sequences]
for seq in matrix:
    seq.extend(list('N' * (m - len(seq))))
transposed_matrix = [[matrix[j][i] for j in range(len(matrix))] for i in range(m)]
for column in transposed_matrix:
    print(most_common(column))
This works by:
Opening your file and reading it into a list like this:
# This is the `sequences` list
['GATCA', 'AATC', 'AATA', 'ACTA']
Get the length of the longest DNA sequence:
# m = max([len(v) for v in sequences])
5
Create a matrix (list of lists) from these sequences:
# matrix = [list(v) for v in sequences]
[['G', 'A', 'T', 'C', 'A'],
['A', 'A', 'T', 'C'],
['A', 'A', 'T', 'A'],
['A', 'C', 'T', 'A']]
Pad the matrix so all the sequences are the same length:
# for seq in matrix:
# seq.extend(list('N' * (m - len(seq))))
[['G', 'A', 'T', 'C', 'A'],
['A', 'A', 'T', 'C', 'N'],
['A', 'A', 'T', 'A', 'N'],
['A', 'C', 'T', 'A', 'N']]
Transpose the matrix so columns go top -> bottom (not left -> right). This places all the characters from the same position into a list together.
# [[matrix[j][i] for j in range(len(matrix))] for i in range(m)]
[['G', 'A', 'A', 'A'],
['A', 'A', 'A', 'C'],
['T', 'T', 'T', 'T'],
['C', 'C', 'A', 'A'],
['A', 'N', 'N', 'N']]
Finally, iterate over each list in the transposed matrix and call most_common with the sub-list as input:
# for column in transposed_matrix:
# print(most_common(column))
A
A
T
C
A
There are caveats to this approach. First, when two nucleotides are tied at a position, the most_common function I have included returns whichever of them happens to come first when iterating over the set, which is effectively arbitrary (see position four, which could have been either A or C). Second, most_common calls col.count once per distinct value, so it can be noticeably slower than Counter from collections, which tallies the column in a single pass.
For these reasons, I would strongly recommend using the following script instead, as collections is included with Python on installation.
from collections import Counter
sequences = []
with open('DNAinput.txt', 'r') as file:
    for line in file:
        if line[0] == '>':
            continue
        else:
            sequences.append(line.strip())
m = max([len(v) for v in sequences])
matrix = [list(v) for v in sequences]
for seq in matrix:
    seq.extend(list('N' * (m - len(seq))))
transposed_matrix = [[matrix[j][i] for j in range(len(matrix))] for i in range(m)]
for column in transposed_matrix:
    print(Counter(column).most_common(5))
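If the import really has to go, a plain dictionary can do the tallying in a single pass and also make ties repeatable. This is only a sketch: column_counts is a hypothetical helper name, and breaking ties alphabetically is an assumption, not something the question asked for.

```python
def column_counts(column, exclude_char='N'):
    """Tally one column with a plain dict (hypothetical helper, not from the answer)."""
    counts = {}
    for letter in column:
        if letter != exclude_char:          # skip the padding character
            counts[letter] = counts.get(letter, 0) + 1
    # Highest count first; ties are broken alphabetically so the result is repeatable.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Three of the transposed columns from the walkthrough above.
columns = [['G', 'A', 'A', 'A'], ['C', 'C', 'A', 'A'], ['A', 'N', 'N', 'N']]
for column in columns:
    print(column_counts(column))
```

The return value has the same shape as Counter(column).most_common(), a list of (letter, count) pairs, so it can be dropped into the final loop of either script.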
You would have to go to the collections module, which in my case is located here:
C:\Python27\Lib\collections.py
Then grab the parts you need and copy them into your script; in your case you need the Counter class.
This could get complicated if the Counter class relies on other things from that script or from other imported modules. You could go to those modules and copy their code into your script as well, but they in turn could be referencing more modules.
Why don't you want to import a module in your script? Maybe there is a better solution to your problem than not importing anything.
I have a list of lists and I want to remove duplicates within each nested list.
Input: [['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]
Output: [['c', 'p'], ['a'], ['t','p']]
The key here is that I cannot use the set() function or fromkeys().
Here is the code I have,
ans = []
for i in letters:
    [ans.append([x]) for x in i if x not in ans]
which returns
[['c'], ['p'], ['p'], ['a'], ['a'], ['a'], ['t'], ['t'], ['p']]
which isn't what I want.
You tripped yourself up with the nested lists. A second loop is necessary to filter the elements. Although it's quite inefficient, you can write your attempt as
ans = []
for i in letters:
    k = []
    for j in i:
        if j not in k:
            k.append(j)
    ans.append(k)
You can likely shorten this code, but not reduce its complexity.
To do that, you can sort each sublist and use itertools.groupby. This is still less efficient than a hash table, but better than a linear lookup (although it likely doesn't matter much for short lists). Note that sorting changes the order of the elements:
from itertools import groupby
ans = [[k for k, _ in groupby(sorted(i))] for i in letters]
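The hash-table idea mentioned above can be sketched with a plain dict, which avoids both set() and fromkeys() and keeps the original order. The function name dedupe_sublists is made up for this example, and it assumes the items are hashable.

```python
def dedupe_sublists(nested):
    """Remove duplicates inside each sublist, keeping first occurrences in order."""
    result = []
    for sub in nested:
        seen = {}        # dict used as a hash table: key present means already seen
        unique = []
        for item in sub:
            if item not in seen:   # average O(1) lookup instead of scanning a list
                seen[item] = True
                unique.append(item)
        result.append(unique)
    return result

letters = [['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]
print(dedupe_sublists(letters))
```

For sublists this short the difference is negligible; the dict only pays off when the sublists get long.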
You can iterate over the inner list and check if that character is already present or not
inputList = [['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]
result = []
for l in inputList:
    # create an empty list to store the intermediate result
    tmp = []
    # iterate over the sublist
    for ch in l:
        if ch not in tmp: tmp.append(ch)
    result.append(tmp)
print(result)
Since you can't use set() or fromkeys(), I would suggest a normal loop iteration, each time checking if value is already present:
lst = [['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]
new_lst = []
for x in lst:
    res = []
    for y in x:
        if y not in res:
            res.append(y)
    new_lst.append(res)
print(new_lst)
Ideally, new_lst here should be a set.
data=[['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]  # renamed from `list` to avoid shadowing the built-in
ans=[]
for sublist in data:
    temp=[]
    for ch in sublist:
        if ch not in temp:
            temp.append(ch)
    ans.append(temp)
print(ans)
# I think it should work, very simple, it could be more complex
Just ignore every instance of a letter until it is the last one.
For every sublist of input: [[...] for sub in input]
Store the letter if it isn't in the rest of the sublist: [ltr for i, ltr in enumerate(sub) if ltr not in sub[i+1:]]
Put it together and you have:
input = [['c', 'p', 'p'], ['a', 'a', 'a'], ['t', 't', 'p']]
output = [[ltr for i, ltr in enumerate(sub) if ltr not in sub[i+1:]] for sub in input]
print(output) #[['c', 'p'], ['a'], ['t', 'p']]
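One thing to be aware of: because this keeps the last occurrence of each letter, the order can differ from the loop-based answers, which keep the first. A small comparison, using two hypothetical helper names (keep_last and keep_first) and a sample sublist that is not in the question:

```python
def keep_last(sub):
    # Same comprehension as above: keep a letter only if it does not appear later.
    return [ltr for i, ltr in enumerate(sub) if ltr not in sub[i + 1:]]

def keep_first(sub):
    # Loop-based variant: keep a letter only if it has not been kept already.
    kept = []
    for ltr in sub:
        if ltr not in kept:
            kept.append(ltr)
    return kept

sample = ['p', 'c', 'p']
print(keep_last(sample))   # ['c', 'p']
print(keep_first(sample))  # ['p', 'c']
```

For the input in the question both variants give the same result, since every duplicate sits next to its twin.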
When solving the following problem:
"Assuming you have a random list of strings (for example: a, b, c, d, e, f, g), write a program that will sort the strings in alphabetical order.
You may not use the sort command."
I run into a problem: running strings through the following code sometimes gives me duplicated strings in the final list.
I am fairly new to Python, and our class has just started to look at numpy and the functions in that module, so I'm not sure whether any of them can be used in the code (other than the sort functions, which are off limits).
import numpy as np
list=[]
list=str(input("Enter list of string(s): "))
list=list.split()
print() # for format space purposes
listPop=list
min=listPop[0]
newFinalList=[]
if(len(list2)!=1):
    while(len(listPop)>=1):
        for i in range(len(listPop)):
            #setting min=first element of list
            min=listPop[0]
            if(listPop[i]<=min):
                min=listPop[i]
                print(min)
        listPop.pop(i)
        newFinalList.append(min)
    print(newFinalList)
else:
    print("Only one string inputted, so already alphabatized:",list2)
Expected result of ["a","y","z"]
["a","y","z"]
Actual result...
Enter list of string(s): a y z
a
a
a
['a', 'a', 'a']
Enter list of string(s): d e c
d
c
d
d
['c', 'd', 'd']
Selection sort: for each index i of the list, select the smallest item at or after i and swap it into the ith position. Here's an implementation in three lines:
# For each index i...
for i in range(len(list)):
    # Find the position of the smallest item after (or including) i.
    j = list[i:].index(min(list[i:])) + i
    # Swap it into the i-th place (this is a no-op if i == j).
    list[i], list[j] = list[j], list[i]
list[i:] is a slice (subset) of list starting at the ith element.
min(list) gives you the smallest element in list.
list.index(element) gives you the (first) index of element in list.
a, b = b, a swaps the values of a and b in a single statement, with no temporary variable.
The trickiest part of this implementation is that when you're using index to find the index of the smallest element, you need to find the index within the same list[i:] slice that you found the element in, otherwise you might select a duplicate element in an earlier part of the list. Since you're finding the index relative to list[i:], you then need to add i back to it to get the index within the entire list.
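The same selection sort can also be written with an explicit inner loop, which sidesteps the slice-offset bookkeeping described above and does not rely on the built-in min or index. This is a sketch; selection_sort is a name chosen here, and it sorts a copy rather than the input list.

```python
def selection_sort(items):
    """Return a sorted copy of items using selection sort (no sort() or sorted())."""
    items = list(items)                      # work on a copy, leave the input alone
    for i in range(len(items)):
        smallest = i                         # index of the smallest item seen so far
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        # Swap the smallest remaining item into position i.
        items[i], items[smallest] = items[smallest], items[i]
    return items

print(selection_sort(['d', 'e', 'c']))       # ['c', 'd', 'e']
print(selection_sort(['a', 'y', 'z', 'a']))  # ['a', 'a', 'y', 'z']
```

Because the search only ever looks at positions i and later, duplicates earlier in the list cannot be picked up by mistake.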
You can also implement quicksort for this:
def partition(arr, low, high):
    i = low - 1          # index of the last element known to be <= pivot
    pivot = arr[high]    # use the last element as the pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i + 1
def quickSort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quickSort(arr, low, pi-1)
        quickSort(arr, pi+1, high)
arr = ['a', 'x', 'p', 'o', 'm', 'w']
n = len(arr)
quickSort(arr, 0, n-1)
print("Sorted list is:")
for i in range(n):
    print(arr[i], end=" ")
output:
Sorted list is:
a m o p w x
Mergesort:
from heapq import merge
from itertools import islice
def _ms(a, n):
    return islice(a,n) if n<2 else merge(_ms(a,n//2),_ms(a,n-n//2))
def mergesort(a):
    return type(a)(_ms(iter(a),len(a)))
# example
import string
import random
L = list(string.ascii_lowercase)
random.shuffle(L)
print(L)
print(mergesort(L))
Sample run:
['h', 'g', 's', 'l', 'a', 'f', 'b', 'z', 'x', 'c', 'r', 'j', 'q', 'p', 'm', 'd', 'k', 'w', 'u', 'v', 'y', 'o', 'i', 'n', 't', 'e']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
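The version above delegates the merge step to heapq.merge. For anyone who wants to see that step spelled out, a plain recursive merge sort might look like the following sketch; merge_sorted and merge_sort are names chosen for this example.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever is left on the other side is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    """Return a sorted copy of items without calling sort() or sorted()."""
    if len(items) < 2:
        return list(items)
    mid = len(items) // 2
    return merge_sorted(merge_sort(items[:mid]), merge_sort(items[mid:]))

print(merge_sort(['d', 'e', 'c', 'a', 'd']))  # ['a', 'c', 'd', 'd', 'e']
```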
I have a problem trying to transform a list.
The original list is like this:
[['a','b','c',''],['c','e','f'],['c','g','h']]
now I want to make the output like this:
[['a','b','c','e','f'],['a','b','c','g','h']]
When a blank ('') is found, merge the three lists into two lists.
I need to write a function to do this for me.
Here is what I tried:
for x in mylist:
    if x[len(x) - 1] == '':
        m = x[len(x) - 2]
        for y in mylist:
            if y[0] == m:
                combine(x, y)
def combine(x, y):
    for m in y:
        if not m in x:
            x.append(m)
    return(x)
but it's not working the way I want.
Try this:
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
def combine(x, y):
    for m in y:
        if not m in x:
            x.append(m)
    return(x)
result = []
for x in mylist:
    if x[len(x) - 1] == '':
        m = x[len(x) - 2]
        for y in mylist:
            if y[0] == m:
                result.append(combine(x[0:len(x)-2], y))
print(result)
Your problem was with the call to combine: it needs a trimmed copy of x, and its result has to be stored:
result.append(combine(x[0:len(x)-2], y))
Output:
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
So you basically want to merge two lists? If so, there are two ways: either use the + operator or use the extend() method.
Then you put it into a function.
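A minimal illustration of the two options, wrapped in functions as suggested. The function names are invented for this sketch, and it only shows plain concatenation; it does not handle the '' placeholder from the question.

```python
def merge_with_plus(first, second):
    # + builds and returns a new list; neither input is modified.
    return first + second

def merge_with_extend(first, second):
    # extend() modifies a list in place, so copy first to leave the input untouched.
    merged = list(first)
    merged.extend(second)
    return merged

prefix = ['a', 'b']
suffix = ['c', 'e', 'f']
print(merge_with_plus(prefix, suffix))    # ['a', 'b', 'c', 'e', 'f']
print(merge_with_extend(prefix, suffix))  # ['a', 'b', 'c', 'e', 'f']
```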
I did it using only the standard library, with comments. Please take a look.
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
# I can't be sure whether there is only one item ending in '', so I find all of them.
# Note that [-1] gets the last value of a list.
xlist = [x for x in mylist if x[-1] == '']
ylist = [x for x in mylist if x[-1] != '']
result = []
# combine every x with every y
for x in xlist:
    for y in ylist:
        c = x + y                # merge
        c = [i for i in c if i]  # drop ''
        c = list(set(c))         # drop duplicates
        c.sort()                 # sort
        result.append(c)         # add to result
print(result)
The result is
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
Your code almost works, except you never do anything with the result of combine (print it, or add it to some result list), and you do not remove the '' element. However, for a longer list, this might be a bit slow, as it has quadratic complexity O(n²).
Instead, you can use a dictionary to map first elements to the remaining elements of the lists. Then you can use a loop or list comprehension to combine the lists with the right suffixes:
lst = [['a','b','c',''],['c','e','f'],['c','g','h']]
import collections
replacements = collections.defaultdict(list)
for first, *rest in lst:
    replacements[first].append(rest)
result = [l[:-2] + c for l in lst if l[-1] == "" for c in replacements[l[-2]]]
# [['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
If the list can have more than one placeholder '', and if those can appear in the middle of the list, then things get a bit more complicated. You could make this a recursive function. (This could be made more efficient by using an index instead of repeatedly slicing the list.)
def replace(lst, last=None):
    if lst:
        first, *rest = lst
        if first == "":
            for repl in replacements[last]:
                yield from replace(repl + rest)
        else:
            for res in replace(rest, first):
                yield [first] + res
    else:
        yield []
for l in lst:
    for x in replace(l):
        print(x)
Output for lst = [['a','b','c','','b',''],['c','b','','e','f'],['c','g','b',''],['b','x','y']]:
['a', 'b', 'c', 'b', 'x', 'y', 'e', 'f', 'b', 'x', 'y']
['a', 'b', 'c', 'g', 'b', 'x', 'y', 'b', 'x', 'y']
['c', 'b', 'x', 'y', 'e', 'f']
['c', 'g', 'b', 'x', 'y']
['b', 'x', 'y']
Try my solution. It changes the order of the elements, but the code is quite simple:
lst = [['a', 'b', 'c', ''], ['c', 'e', 'f'], ['c', 'g', 'h']]
lst[0].pop(-1)
print([list(set(lst[0]+lst[1])), list(set(lst[0]+lst[2]))])
I have a file with a list of letters corresponding to another letter:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
I need to import these lists to their corresponding letter, to hopefully have variables that look like this:
vertexA = ['B', 'D']
vertexB = ['A', 'E']
vertexC = []
vertexD = ['A', 'G']
vertexE = ['B', 'H']
vertexF = []
vertexG = ['D']
vertexH = ['E']
What would be the best way to do this? I tried searching for an answer but was unlucky in doing so. Thanks for any help.
You can try using dictionaries rather than variables, and I think it makes it easier as well to populate your data from your textfile.
vertex = {}
vertex['A'] = ['B', 'D']
>>> vertex['A']
['B', 'D']
When you read your input file, the inputs should look like this:
string='A["B","C"]'
So, we know that the first letter is the name of the list.
import ast
your_list=ast.literal_eval(string[1:])
your_list:
['B', 'C']
You can take care of the looping, reading file, and string manipulation for proper naming...
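The looping and string handling left open above could be sketched as follows. To keep the example self-contained it parses a list of lines rather than a real file; parse_vertices is a hypothetical name, and the sketch assumes every non-empty line is one letter followed by a valid Python list literal.

```python
import ast

def parse_vertices(lines):
    """Build {letter: list_of_neighbours} from lines such as "A['B', 'D']"."""
    vertices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        name = line[0]                        # the first character names the list
        vertices[name] = ast.literal_eval(line[1:])
    return vertices

sample = ["A['B', 'D']", "B['A', 'E']", "C[]"]
vertices = parse_vertices(sample)
print(vertices['A'])  # ['B', 'D']
print(vertices['C'])  # []
```

With a real file, passing the open file object (which yields lines) in place of sample should work the same way.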
Building a dictionary would probably be best. Each letter of the alphabet would be a key, and then the value would be a list of associated letters. Here's a proof of concept (not tested):
from string import ascii_uppercase
vertices = {}
# instantiate dict with uppercase letters of alphabet
for c in ascii_uppercase:
    vertices[c] = []
# iterate over file and populate dict (text mode, Python 3)
with open("out.txt", "r") as f:
    for i, line in enumerate(f):
        if line[0].upper() not in ascii_uppercase:
            # you probably want to do some additional error checking
            print("Error on line {}: {}".format(i, line))
        else: # valid uppercase letter at beginning of line
            list_open = line.index('[')
            list_close = line.rindex(']') + 1 # one past end
            # probably would want to validate record is in correct format before getting here
            # translate hack to remove unwanted chars
            row_values = line[list_open:list_close].translate(str.maketrans('', '', "[] '")).split(',')
            # do some validation for cases where row_values is empty
            vertices[line[0].upper()].extend([e for e in row_values if e.strip() != ''])
Using it would then be easy:
for v in vertices['B']:
    print(v)  # do something with v
File A.txt:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
The code:
with open('A.txt','r') as file:
    file=file.read().splitlines()
listy=[[elem[0],elem[1:].strip('[').strip(']').replace("'",'').replace(' ','').split(',')] for elem in file]
This makes a nested list, but as Christian Dean said, a dictionary is a better way to go.
Result:
[['A', ['B', 'D']], ['B', ['A', 'E']], ['C', ['']], ['D', ['A', 'G']], ['E', ['B', 'H']], ['F', ['']], ['G', ['D']], ['H', ['E']]]
I have a dataset along the lines of:
data = [['a', 'b', 'c'], ['a', 'x', 'y', 'z'], ['a', 'x', 'e', 'f'], ['a']]
I've searched SO and found ways to return duplicates across all lists using intersection_update() (so, in this example, 'a'), but I actually want to return duplicates from any lists, i.e.,:
retVal = ['a', 'x']
Since 'a' and 'x' are duplicated at least once among all lists. Is there a built-in for Python 2.7 that can do this?
Use a Counter to determine the number of each item and chain.from_iterable to pass the items from the sublists to the Counter.
from itertools import chain
from collections import Counter
data=[['a', 'b', 'c'], ['a', 'x', 'y', 'z'], ['a', 'x', 'e', 'f'], ['a']]
c = Counter(chain.from_iterable(data))
retVal = [k for k, count in c.items() if count >= 2]
print(retVal)
# ['x', 'a'] on Python 2.7, where the order is arbitrary; ['a', 'x'] on Python 3.7+
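One assumption worth checking: the Counter above also counts repeats inside a single sublist, so ['q', 'q'] on its own would make 'q' a duplicate. If an item should only count once per sublist, a variant could dedupe each sublist first. This is a sketch; shared_items is a made-up name, the extra ['q', 'q'] sublist is added purely for illustration, and the result is sorted only to make the order predictable.

```python
from collections import Counter
from itertools import chain

def shared_items(lists):
    """Return the items that appear in at least two different sublists."""
    # set(sub) collapses repeats within one sublist before counting.
    counts = Counter(chain.from_iterable(set(sub) for sub in lists))
    return sorted(item for item, n in counts.items() if n >= 2)

data = [['a', 'b', 'c'], ['a', 'x', 'y', 'z'], ['a', 'x', 'e', 'f'], ['a'], ['q', 'q']]
print(shared_items(data))  # ['a', 'x']
```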