Split list of lines into 2d array - python

I have a set of sequences in a list that looks like this:
[agghd,gjg,tomt]
How can I split it so that my output looks like the following:
[[a,g,g,h,d],[g,j,g],[t,o,m,t]]
I have written the following code so far:
agghd
gjh
tomt
list2=[]
list2 = [str(sequences.seq).split() for sequences in family]

You can split a string into characters by calling list() on it:
list1 = ['agghd', 'gjg', 'tomt']
list2 = [list(string) for string in list1]
# output: [['a', 'g', 'g', 'h', 'd'], ['g', 'j', 'g'], ['t', 'o', 'm', 't']]
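As a rough illustration of why the code in the question does not produce the desired result: str.split() with no arguments splits on whitespace, while list() iterates over the characters. The sketch below assumes the sequences are plain strings like the ones in the question; the variable names are only illustrative.

```python
# Sample data taken from the question.
sequences = ['agghd', 'gjg', 'tomt']

# str.split() with no arguments splits on whitespace, so a string
# without spaces comes back as a single-element list.
split_result = [s.split() for s in sequences]

# list() iterates over the string and yields one element per character.
char_result = [list(s) for s in sequences]

print(split_result)  # [['agghd'], ['gjg'], ['tomt']]
print(char_result)   # [['a', 'g', 'g', 'h', 'd'], ['g', 'j', 'g'], ['t', 'o', 'm', 't']]
```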

You can try:
[list(str(sequences.seq)) for sequences in family]

Related

How to iterate over position through Numpy- Python

I am wondering if there is a way to iterate over individual positions in a sequence list using NumPy. For example, if I had a list of sequences:
a = ['AGHT','THIS','OWKF']
The function should go through each individual character by position. For the first sequence, 'AGHT', that means breaking it down into 'A','G','H','T'. The ultimate goal is to create individual grids based on character abundance in each of these sequences. So far I have only been able to write a loop that goes through each character, but I need this in NumPy:
b = np.array(a)
for c in b:
    for d in c:
        print(d)
I would prefer this in NumPy, but if there are other ways I would like to know as well. Thanks!
list expands a string into a list:
In [406]: a = ['AGHT','THIS','OWKF']
In [407]: [list(item) for item in a]
Out[407]: [['A', 'G', 'H', 'T'], ['T', 'H', 'I', 'S'], ['O', 'W', 'K', 'F']]
You can use join() to join the array into a sequence of characters, then iterate over each character or print it like this:
>>> a = ['AGHT','THIS','OWKF']
>>> print(''.join(a))
AGHTTHISOWKF
Or to turn it into an array of individual characters:
>>> out = ''.join(a)
>>> b = np.array(list(out))
>>> b
array(['A', 'G', 'H', 'T', 'T', 'H', 'I', 'S', 'O', 'W', 'K', 'F'],
      dtype='<U1')
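Since the question also asks about other ways, here is a minimal pure-Python sketch of iterating by position and counting character abundance. It assumes all sequences have the same length (zip stops at the shortest one); the names columns and position_counts are illustrative, not from the original posts.

```python
from collections import Counter

a = ['AGHT', 'THIS', 'OWKF']

# zip(*a) groups the characters by position: all first characters
# together, all second characters together, and so on.
columns = list(zip(*a))

# Count how often each character occurs at each position.
position_counts = [Counter(column) for column in columns]

print(columns[0])          # ('A', 'T', 'O')
print(position_counts[0])  # counts for the first position
```

Each Counter behaves like a dictionary, so position_counts[0]['A'] gives the number of sequences that start with 'A'.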

How can I split a list in two unique lists in Python?

Hi, I have a list as follows:
listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
15 members.
I want to turn it into 3 lists. I used the code below and it works, but I want unique lists; this gives me 3 lists that share members.
import random
listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
print(random.sample(listt,5))
print(random.sample(listt,5))
print(random.sample(listt,5))
Try this:
from random import shuffle
def randomise():
    listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
    shuffle(listt)
    return listt[:5], listt[5:10], listt[10:]
print(randomise())
This will print (for example, since it is random):
(['i', 'k', 'c', 'b', 'a'], ['d', 'j', 'h', 'n', 'f'], ['e', 'l', 'o', 'g', 'm'])
If it doesn't matter to you which items go in each list, then you're better off partitioning the list into thirds:
In [23]: L = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
In [24]: size = len(L)
In [25]: L[:size//3]
Out[25]: ['a', 'b', 'c', 'd', 'e']
In [26]: L[size//3:2*size//3]
Out[26]: ['f', 'g', 'h', 'i', 'j']
In [27]: L[2*size//3:]
Out[27]: ['k', 'l', 'm', 'n', 'o']
If you want them to have random elements from the original list, you'll just need to shuffle the input first:
random.shuffle(L)
Instead of sampling your list three times, which will always give you three independent results where individual members may be selected for more than a single list, you could just shuffle the list once and then split it in three parts. That way, you get three random subsets that will not share any items:
>>> random.shuffle(listt)
>>> listt[0:5]
['b', 'a', 'f', 'e', 'h']
>>> listt[5:10]
['c', 'm', 'g', 'j', 'o']
>>> listt[10:15]
['d', 'l', 'i', 'n', 'k']
Note that random.shuffle will shuffle the list in place, so the original list is modified. If you don’t want to modify the original list, you should make a copy first.
If your list is larger than the desired result set, then of course you can also sample your list once with the combined result size and then split the result accordingly:
>>> sample = random.sample(listt, 5 * 3)
>>> sample[0:5]
['h', 'm', 'i', 'k', 'd']
>>> sample[5:10]
['a', 'b', 'o', 'j', 'n']
>>> sample[10:15]
['c', 'l', 'f', 'e', 'g']
This solution will also avoid modifying the original list, so you will not need a copy if you want to keep it as it is.
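The shuffle-a-copy-then-slice idea described above could be wrapped in a small helper along these lines. This is only a sketch: the function name split_random and its parts and seed parameters are hypothetical, and it assumes any leftover items should go into the last sublist.

```python
import random

def split_random(items, parts, seed=None):
    """Return `parts` disjoint random sublists without modifying `items`."""
    rng = random.Random(seed)      # seed is optional, only for reproducibility
    shuffled = list(items)         # shallow copy, so the original stays intact
    rng.shuffle(shuffled)
    size = len(shuffled) // parts  # any remainder goes into the last sublist
    chunks = [shuffled[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(shuffled[(parts - 1) * size:])
    return chunks

listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
first, second, third = split_random(listt, 3)
print(first, second, third)
```

Because every item of the shuffled copy ends up in exactly one slice, the three lists cannot share members.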
Use [:] for slicing all members out of the list which basically copies everything into a new object. Alternatively just use list(<list>) which copies too:
print(random.sample(listt[:],5))
In case you want to shuffle only once, store the shuffle result into a variable and copy later:
output = random.sample(listt,5)
first = output[:]
second = output[:]
print(first is second, first is output)  # prints: False False
and then the original list can be modified without the first or second being modified.
For nested lists you might want to use copy.deepcopy().

Python: Get all combinations of sequential elements of list

Given an array, say x = ['A','I','R']
I want the output to be:
[['A','I','R'],['A','I'],['I','R'],['A'],['I'],['R']]
What I don't want as output is :
[['A','I','R'],['A','I'],['I','R'],['A','R'],['A'],['I'],['R']] # extra ['A','R'] which is not in sequence .
Below is the code which gives the output I don't want:
letter_list = [a for a in str]
all_word = []
for i in xrange(0,len(letter_list)):
    all_word = all_word + (map(list, itertools.combinations(letter_list,i))) # dont use append. gives wrong result.
all_word = filter(None,all_word) # remove empty combination
all_word = all_word + [letter_list] # add original list
My point is I only want combinations of sequences. Is there any way to use itertools or should I write custom function ?
Yes, you can use itertools:
>>> x = ['A', 'I', 'R']
>>> xs = [x[i:j] for i, j in itertools.combinations(range(len(x)+1), 2)]
>>> xs
[['A'], ['A', 'I'], ['A', 'I', 'R'], ['I'], ['I', 'R'], ['R']]
>>> sorted(xs, key=len, reverse=True)
[['A', 'I', 'R'], ['A', 'I'], ['I', 'R'], ['A'], ['I'], ['R']]
Credit: answer by hochl
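To see why the itertools one-liner above yields only contiguous runs, it may help to look at the index pairs it generates: each pair (i, j) with i < j is used as the start and stop of a slice. The variable names pairs and sublists below are illustrative.

```python
import itertools

x = ['A', 'I', 'R']

# Every pair (i, j) with i < j is a valid slice boundary for x[i:j].
pairs = list(itertools.combinations(range(len(x) + 1), 2))
print(pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Slicing never skips elements, so ['A', 'R'] cannot appear.
sublists = [x[i:j] for i, j in pairs]
print(sublists)  # [['A'], ['A', 'I'], ['A', 'I', 'R'], ['I'], ['I', 'R'], ['R']]
```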
Try to use yield:
x = ['A','I','R']
def groupme(x):
    s = tuple(x)
    for size in range(1, len(s) + 1):
        for index in range(len(s) + 1 - size):
            yield list(x[index:index + size])
list(groupme(x))
# output: [['A'], ['I'], ['R'], ['A', 'I'], ['I', 'R'], ['A', 'I', 'R']]
Don't try to be so magical: two loops will do what you want, the outer over possible sequence starts and the inner over possible sequence lengths:
x = "AIR" # strings are iterables/sequences, too!
all_words = []
for begin in range(len(x)):
    for length in range(1, len(x) - begin + 1):
        all_words.append(x[begin:begin+length])
Using a list comprehension:
letters = ['A', 'I', 'R']
[letters[start:end+1]
 for start in range(len(letters))
 for end in range(start, len(letters))]
[['A'], ['A', 'I'], ['A', 'I', 'R'], ['I'], ['I', 'R'], ['R']]
If the order you proposed is important (from longest to shortest, and by starting position within the same length), you can do this instead:
[letters[start:start+l+1]
 for l in range(len(letters))[::-1]
 for start in range(len(letters)-l)]
[['A', 'I', 'R'], ['A', 'I'], ['I', 'R'], ['A'], ['I'], ['R']]
To address Holroy's comment: if you use a generator expression instead of a list comprehension (just replace the outer [] with ()), the code needs much less memory. In that case, though, you can iterate over the result only once, and you cannot call len() on it or use list methods such as remove().
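A minimal sketch of the generator-expression variant and its single-use behaviour; the names first_pass and second_pass are only for illustration.

```python
letters = ['A', 'I', 'R']

# Parentheses instead of brackets: items are produced lazily, one at a time.
sublists = (letters[start:end + 1]
            for start in range(len(letters))
            for end in range(start, len(letters)))

first_pass = list(sublists)   # consumes the generator
second_pass = list(sublists)  # already exhausted, so this is empty

print(first_pass)   # [['A'], ['A', 'I'], ['A', 'I', 'R'], ['I'], ['I', 'R'], ['R']]
print(second_pass)  # []
```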

Python merging sublist

I've got the following list :
[['a','b','c'],['d','e'],['f','g','h','i','j']]
I would like a list like this :
['abc','de','fghij']
How is it possible?
[Edit] : in fact, my list could have strings and numbers,
l = [[1,2,3],[4,5,6], [7], [8,'a']]
and would be :
l = [123,456, 7, 8a]
thx to all,
You can apply the ''.join method to each sublist.
This can be done using either the map function or a list comprehension.
map applies the function passed as its first argument to every element of an iterable:
initial = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h', 'i', 'j']]
result = list(map(''.join, initial))  # list() is needed on Python 3, where map returns an iterator
Alternatively, use a list comprehension:
initial = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h', 'i', 'j']]
result = [''.join(sublist) for sublist in initial]
Try
>>> L = [['a','b','c'],['d','e'],['f','g','h','i','j']]
>>> [''.join(x) for x in L]
['abc', 'de', 'fghij']
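For the edited case where sublists mix numbers and strings, ''.join alone raises a TypeError because it accepts only strings. One possible approach, assuming the merged results should all be strings (the question does not say whether '123' should stay a string or become a number), is to convert each item with str() first:

```python
l = [[1, 2, 3], [4, 5, 6], [7], [8, 'a']]

# str.join() accepts only strings, so convert every item with str() first.
merged = [''.join(map(str, sublist)) for sublist in l]

print(merged)  # ['123', '456', '7', '8a']
```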

Multi-item sort based with two different list items and reorder/reshuffle?

I have a nested list that looks like this:
li = [['m', 'z', 'asdgwergerwhwre'],
['j', 'h', 'asdgasdgasdgasdgas'],
['u', 'a', 'asdgasdgasdgasd'],
['i', 'o', 'sdagasdgasdgdsag']]
I would like to sort this list alphabetically, BUT using either the first or second element in each sublist. For the above example, the desired output would be:
['a', 'u', 'asdgasdgasdgasd']
['h', 'j', 'asdgasdgasdgasdgas']
['i', 'o', 'sdagasdgasdgdsag']
['m', 'z', 'asdgwergerwhwre']
What is the best way to achieve this sort?
As a first step we apply a transformation (swapping the first two items if needed), and as a second step we apply a simple sort:
>>> sorted(map(lambda x: sorted(x[:2]) + [x[2]], li))
[['a', 'u', 'asdgasdgasdgasd'],
['h', 'j', 'asdgasdgasdgasdgas'],
['i', 'o', 'sdagasdgasdgdsag'],
['m', 'z', 'asdgwergerwhwre']]
You can use the built-in sorted() to accomplish this. First swap the first two items of any sublist where they are out of order, then sort the outer list. Note that rev() modifies the sublists in place:
def rev(li):
    for l in li:
        if l[0] > l[1]:  # swap only when the pair is out of order
            l[0], l[1] = l[1], l[0]
    return li
new_list = sorted(rev(li))
If you wanted to sort the list based on a specific index, you can use sorted(li, key=lambda li: li[index]).
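As a small sketch of sorting by a specific index, the example below sorts the question's data by the second item of each sublist. operator.itemgetter(1) is equivalent to lambda row: row[1]; the name by_second is illustrative. Note that this only reorders the sublists and does not swap items within them.

```python
from operator import itemgetter

li = [['m', 'z', 'asdgwergerwhwre'],
      ['j', 'h', 'asdgasdgasdgasdgas'],
      ['u', 'a', 'asdgasdgasdgasd'],
      ['i', 'o', 'sdagasdgasdgdsag']]

# Sort by the item at index 1; sorted() returns a new list and
# leaves li unchanged.
by_second = sorted(li, key=itemgetter(1))

for row in by_second:
    print(row)
```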
import pprint
li = [['m', 'z', 'asdgwergerwhwre'],
['j', 'h', 'asdgasdgasdgasdgas'],
['u', 'a', 'asdgasdgasdgasd'],
['i', 'o', 'sdagasdgasdgdsag']]
for _list in li:
    _list[:2] = sorted(_list[:2])
pprint.pprint(sorted(li))
# output:
[['a', 'u', 'asdgasdgasdgasd'],
['h', 'j', 'asdgasdgasdgasdgas'],
['i', 'o', 'sdagasdgasdgdsag'],
['m', 'z', 'asdgwergerwhwre']]
