One-liner for splicing lists - python

I am looking for a pythonic way to splice two lists based on the values in one of them. A one-liner would be preferred.
Say we have
[0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
and
['a', 'b', 'c', 'd', 'e', 'f']
and the result has to look like this:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']

You can use next with iter:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
new_d = iter(d1)
result = [i if not i else next(new_d) for i in d]
Output:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
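If you need this in more than one place, the same next/iter idea can be wrapped in a small helper. This is only a sketch; the name splice and its parameter names are made up here and are not part of the original answer. It assumes every truthy entry in the first list should be replaced, and next() will raise StopIteration if the second list runs out of items.

```python
def splice(mask, values):
    """Replace each truthy entry of mask with the next item from values."""
    it = iter(values)  # a fresh iterator, so values itself is not modified
    return [next(it) if flag else flag for flag in mask]

d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
result = splice(d, d1)
print(result)
```

Because the helper creates its own iterator on every call, it can be called repeatedly with the same lists.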

One liner:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
print( [d1.pop(0) if i==1 else i for i in d] )
Prints:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
EDIT (More efficient approach):
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f'][::-1]  # reversed, so pop() from the end yields 'a' first
print( [d1.pop() if i==1 else i for i in d] )
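Another way to avoid the cost of pop(0) on a list is collections.deque, whose popleft() removes from the front in constant time. A minimal sketch, assuming it is fine to consume a copy of the second list (the variable name queue is just illustrative):

```python
from collections import deque

d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
queue = deque(['a', 'b', 'c', 'd', 'e', 'f'])

# popleft() takes items from the front, so no reversing is needed
result = [queue.popleft() if i == 1 else i for i in d]
print(result)
```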

Similar to @Ajax1234's answer, but on a single line:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
result = [d[i] if not d[i] else d1[d[:i].count(1)] for i in range(len(d))]
Result:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']

Related

Convert a list of string to category integer in Python

Given a list of string,
['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
I would like to convert to an integer-category form
[0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
This can be achieved using numpy.unique as below:
import numpy as np
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
_, opt = np.unique(np.array(ipt), return_inverse=True)
But I am curious whether there is an alternative that does not require importing numpy.
If you are solely interested in finding integer representation of factors, then you can use a dict comprehension along with enumerate to store the mapping, after using set to find unique values:
lst = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
d = {x: i for i, x in enumerate(set(lst))}
lst_new = [d[x] for x in lst]
print(lst_new)
# e.g. [3, 3, 0, 3, 3, 3, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 1, 1, 1, 2, 1, 1, 1]
# (the exact integers depend on set iteration order and can differ between runs)
This approach can be used for general factors, i.e., the factors do not have to be 'a', 'b' and so on, but can be 'dog', 'bus', etc. One drawback is that the integers are assigned in arbitrary order, because a set is unordered. If you want the integers to follow the sorted order of the factors, you can use sorted:
d = {x: i for i, x in enumerate(sorted(set(lst)))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
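If you would rather number the factors in order of first appearance instead of sorted order, dict.fromkeys can replace set, since dicts keep insertion order in Python 3.7+. This is a sketch on a shortened version of the list, not something from the answer above:

```python
lst = ['a', 'a', 'c', 'a', 'd', 'c', 'b']

# dict.fromkeys drops duplicates but keeps the order of first appearance
d = {x: i for i, x in enumerate(dict.fromkeys(lst))}
lst_new = [d[x] for x in lst]
print(d)
print(lst_new)
```

Unlike the set-based version, this gives the same mapping on every run.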
You could take a page out of the functional programming book:
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
opt = list(map(lambda x: ord(x)-97, ipt))
This code iterates through the input list and passes each element through the lambda function, which takes the ASCII value of the character and subtracts 97 (the code of 'a'), mapping 'a' to 'z' onto 0 to 25.
If each string isn't a single lowercase character, then the lambda function will need to be adapted, since ord() only accepts a single character.
You could write a custom function to do the same thing as you are using numpy.unique() for.
def unique(my_list):
    ''' Takes a list and returns two lists, a list of each unique entry and the index of
    each unique entry in the original list
    '''
    unique_list = []
    int_cat = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Or, if you want the indexing to follow sorted order:
def unique_ordered(my_list):
    ''' Takes a list and returns two lists, an ordered list of each unique entry and the
    index of each unique entry in the original list
    '''
    # Unique list
    unique_list = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
    # Sorting unique list alphabetically
    unique_list.sort()
    # Integer category list
    int_cat = []
    for item in my_list:
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Comparing the computation time for these two vs numpy.unique() for 100,000 iterations of your example list, we get:
numpy = 2.236004s
unique = 0.460719s
unique_ordered = 0.505591s
This shows that either option is faster than numpy for simple lists. More complicated strings slow down unique() and unique_ordered() much more than numpy.unique(). Doing 10,000 iterations of a random, 100-element list of 20-character strings, we get times of:
numpy = 0.45465s
unique = 1.56963s
unique_ordered = 1.59445s
So if efficiency is important and your list has more complex strings or a larger variety of them, it would likely be better to use numpy.unique().
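Part of the slowdown with many distinct strings probably comes from the "item not in unique_list" check and unique_list.index(item), both of which scan the list. A dict lookup avoids that scan. The sketch below is a hypothetical variant (unique_fast is a made-up name) and was not included in the timings above, so treat any speed-up as something to measure yourself:

```python
def unique_fast(my_list):
    """Return (unique entries in order of first appearance, integer category per item)."""
    index_of = {}  # maps each entry to its integer category
    int_cat = []
    for item in my_list:
        if item not in index_of:
            index_of[item] = len(index_of)
        int_cat.append(index_of[item])
    return list(index_of), int_cat

uniques, cats = unique_fast(['a', 'a', 'c', 'd', 'c', 'b'])
print(uniques)
print(cats)
```

This requires the items to be hashable, which holds for strings.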

Plotting strings in a list as datapoints in a linegraph

I have a list of lists as so:
lsofls = [0,0,0,0,0],["a",0,0,0,0],[0,"a",0,0,0],["b","a",0,0,0],["b",0,"a",0,0],["b",0,0,"a",0],[0,"b",0,"a",0],["c","b",0,"a",0],["c",0,"b","a",0],[0,"c","b","a",0],["d","c","b","a",0], ["d","c","b",0,"a"],["d","c","b",0,0],["d","c",0,"b",0]
And I wish to plot this, whereby each string in each list acts as its own datapoint. Each list in the list of lists is a point in time starting at t0 at the zeroth list. Each element in a list within the list of lists is a point in the sequence. I struggle to explain what I mean, but by printing the list of lists with each list as a new line it becomes clearer:
for s in lsofls:
    print(s)
This gives:
[0, 0, 0, 0, 0]
['a', 0, 0, 0, 0]
[0, 'a', 0, 0, 0]
['b', 'a', 0, 0, 0]
['b', 0, 'a', 0, 0]
['b', 0, 0, 'a', 0]
[0, 'b', 0, 'a', 0]
['c', 'b', 0, 'a', 0]
['c', 0, 'b', 'a', 0]
[0, 'c', 'b', 'a', 0]
['d', 'c', 'b', 'a', 0]
['d', 'c', 'b', 0, 'a']
['d', 'c', 'b', 0, 0]
['d', 'c', 0, 'b', 0]
I essentially want to rotate this output 90 degrees anticlockwise, as a linegraph.
I am unsure how to do this, as I usually plot using integers.
I hope I am being clear enough, I am unsure how to phrase the question.
EDIT:
The solution provided by @ce.teuf is very close to what I need. However, I need a string to be able to rejoin at position 1 in the graph. So if you look at this list here:
lsofls = [0, 0, 0, 0, 0], ['a', 0, 0, 0, 0], [0, 'a', 0, 0, 0], ['b', 'a', 0, 0, 0], ['b', 0, 'a', 0, 0], ['b', 0, 0, 'a', 0], [0, 'b', 0, 'a', 0], ['c', 'b', 0, 'a', 0], ['c', 0, 'b', 'a', 0], [0, 'c', 'b', 'a', 0], ['d', 'c', 'b', 'a', 0], ['d', 'c', 'b', 0, 'a'], ['d', 'c', 'b', 0, 0], ['d', 'c', 0, 'b', 0], ['d', 'c', 0, 0, 'b'], ['d', 0, 'c', 0, 'b'], [0, 'd', 'c', 0, 'b'], ['a', 'd', 'c', 0, 'b']
for s in lsofls:
    print(s)
So I need a way for each string to rejoin in the graph if that makes sense.
Essentially, using numpy:
import numpy as np
import matplotlib.pyplot as plt
z = [[0, 0, 0, 0, 0],
['a', 0, 0, 0, 0],
[0, 'a', 0, 0, 0],
['b', 'a', 0, 0, 0],
['b', 0, 'a', 0, 0],
['b', 0, 0, 'a', 0],
[0, 'b', 0, 'a', 0],
['c', 'b', 0, 'a', 0],
['c', 0, 'b', 'a', 0],
[0, 'c', 'b', 'a', 0],
['d', 'c', 'b', 'a', 0],
['d', 'c', 'b', 0, 'a'],
['d', 'c', 'b', 0, 0],
['d', 'c', 0, 'b', 0]]
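flat_list = [item for sublist in z for item in sublist]
# Collect the distinct labels, excluding the 0 placeholder explicitly
# (set order is arbitrary, so slicing with [1:] is not reliable)
series = sorted(set(flat_list) - {0})
y_len = len(z[0])
z2 = np.rot90(z)
for s in series:
    z3 = np.argwhere(z2 == s)
    z3[:, 0] = (z3[:, 0] - y_len) * -1
    y, x = z3[:, 0], z3[:, 1]
    plt.plot(np.sort(x[::-1]), y[::-1])
plt.show()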
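For the "rejoin" case in the edit, one possible reading is that a label which disappears and later comes back should be drawn as a separate line segment rather than connected across the gap. The sketch below only prepares the data and leaves plotting aside. The function name trajectories and the segment format are assumptions made for illustration, and it assumes each label appears at most once per row:

```python
from collections import defaultdict

def trajectories(rows, empty=0):
    """Map each label to a list of segments; a segment is a list of (t, position)."""
    segments = defaultdict(list)
    for t, row in enumerate(rows):
        for pos, label in enumerate(row):
            if label == empty:
                continue
            runs = segments[label]
            # Extend the current segment only if the label was present at t - 1;
            # otherwise the label has (re)joined, so start a new segment.
            if runs and runs[-1][-1][0] == t - 1:
                runs[-1].append((t, pos))
            else:
                runs.append([(t, pos)])
    return dict(segments)

rows = [[0, 0, 0], ['a', 0, 0], [0, 'a', 0], [0, 0, 0], ['a', 0, 0]]
print(trajectories(rows))
```

Each segment can then be passed to plt.plot separately (for example with the same colour per label), so a returning string starts a new line at position 1 instead of being joined to its old one.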

Get right label using indices?

Really stupid question as I am new to python:
If I have labels = ['a', 'b', 'c', 'd'],
and indices = [2, 3, 0, 1]
How should I get the corresponding label using each index so I can get: ['c', 'd', 'a', 'b']?
There are a few alternatives. One is to use a list comprehension:
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = [labels[i] for i in indices]
print(result)
Output
['c', 'd', 'a', 'b']
Basically iterate over each index and fetch the item at that position. The above is equivalent to the following for loop:
result = []
for i in indices:
    result.append(labels[i])
A third option is to use operator.itemgetter:
from operator import itemgetter
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = list(itemgetter(*indices)(labels))
print(result)
Output
['c', 'd', 'a', 'b']
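One thing to watch with itemgetter: when given a single index it returns the item itself rather than a tuple, so wrapping it in list() splits a multi-character label into characters. The list comprehension does not have this edge case. A small sketch with longer labels (the helper name pick is made up for illustration):

```python
from operator import itemgetter

def pick(labels, indices):
    """Return the labels at the given indices; works for any number of indices."""
    return [labels[i] for i in indices]

labels = ['aa', 'bb', 'cc', 'dd']

single = itemgetter(*[2])(labels)  # one index: returns the item, not a tuple
print(single)                      # cc
print(list(single))                # ['c', 'c'], probably not what was intended
print(pick(labels, [2]))           # ['cc']
```

itemgetter also needs at least one index, so an empty index list would have to be handled separately.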

Pairwise similarity

I have a pandas dataframe that looks like this:
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
name cards
0 ['A', 'B', 'C', 'D']
1 ['B', 'C', 'D', 'E']
2 ['E', 'F', 'G', 'H']
3 ['A', 'A', 'E', 'F']
And I'd like to create a matrix that looks like this:
name 0 1 2 3
name
0 4 3 0 1
1 3 4 1 1
2 0 1 4 2
3 1 1 2 4
Where the values are the number of items in common.
Any ideas?
Using the .apply method and a lambda, we can directly get a dataframe:
def func(df, j):
    return pd.Series([len(set(i)&set(j)) for i in df.cards])
newdf = df.cards.apply(lambda x: func(df, x))
newdf
0 1 2 3
0 4 3 0 1
1 3 4 1 1
2 0 1 4 2
3 1 1 2 3
With a list comprehension that iterates over all pairs, we can build the result:
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result=[[len(list(set(x) & set(y))) for x in df['cards']] for y in df['cards']]
print(result)
output :
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 3]]
'&' is used to calculate intersection of two sets
This is exactly what you want:
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result=[[len(x)-max(len(set(y) - set(x)),len(set(x) - set(y))) for x in df['cards']] for y in df['cards']]
print(result)
output:
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 4]]
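If duplicates should count (so that ['A', 'A', 'E', 'F'] has 4 items in common with itself), another option is a multiset intersection with collections.Counter. This sketch works on plain lists so it runs without pandas; with the dataframe from the question you could pass df['cards'] instead of the cards list:

```python
from collections import Counter

cards = [['A', 'B', 'C', 'D'],
         ['B', 'C', 'D', 'E'],
         ['E', 'F', 'G', 'H'],
         ['A', 'A', 'E', 'F']]

counters = [Counter(row) for row in cards]
# Counter & Counter keeps the minimum count per item (multiset intersection)
result = [[sum((x & y).values()) for x in counters] for y in counters]
print(result)
# [[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 4]]
```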
import pandas as pd
import numpy as np
df = pd.DataFrame([['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']])
nrows = df.shape[0]
# Initialization
matrix = np.zeros((nrows,nrows),dtype= np.int64)
for i in range(0,nrows):
    for j in range(0,nrows):
        matrix[i,j] = sum(df.iloc[:,i] == df.iloc[:,j])
print(matrix)
Output:
[[4 1 0 0]
[1 4 0 0]
[0 0 4 0]
[0 0 0 4]]

Transform character array into integers with python

I have a piece of data which is in the form of character array:
cgcgcg
aacacg
cgcaag
cgcacg
agaacg
cacaag
agcgcg
cgcaca
cacaca
agaacg
cgcacg
cgcgaa
Notice that each column consists of only two types of characters. I need to transform them into integers 0 or 1, based on their percentage in the column. For instance, in the 1st column there are 8 c's and 4 a's, so c is in the majority; we then need to code it as 0 and the other as 1.
Using zip() I can transpose this array in python, and get each column into a list:
In [28]: lines = [l.strip() for l in open(inputfn)]
In [29]: list(zip(*lines))
Out[29]:
[('c', 'a', 'c', 'c', 'a', 'c', 'a', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'g', 'g', 'g', 'a', 'g', 'g', 'a', 'g', 'g', 'g'),
('c', 'c', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'a', 'a', 'a', 'a', 'g', 'a', 'a', 'a', 'a', 'g'),
('c', 'c', 'a', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'c', 'a'),
('g', 'g', 'g', 'g', 'g', 'g', 'g', 'a', 'a', 'g', 'g', 'a')]
It's not necessary to transform them strictly into integers, i.e. 'c' to '0' or 'c' to int(0) will both be ok, since we are going to write them to a tab delimited file anyway.
Something like this:
lis = [('c', 'a', 'c', 'c', 'a', 'c', 'a', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'g', 'g', 'g', 'a', 'g', 'g', 'a', 'g', 'g', 'g'),
('c', 'c', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'a', 'a', 'a', 'a', 'g', 'a', 'a', 'a', 'a', 'g'),
('c', 'c', 'a', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'c', 'a'),
('g', 'g', 'g', 'g', 'g', 'g', 'g', 'a', 'a', 'g', 'g', 'a')]
def solve(lis):
    for row in lis:
        item1, item2 = set(row)
        c1, c2 = row.count(item1), row.count(item2)
        dic = {item1 : int(c1 < c2), item2 : int(c2 < c1)}
        yield [dic[x] for x in row]
>>> list(solve(lis))
[[0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1]]
Using collections.Counter:
from collections import Counter
from pprint import pprint
def solve(lis):
    for row in lis:
        c = Counter(row)
        maxx = max(c.values())
        yield [int(c[x] < maxx) for x in row]
>>> pprint(list(solve(lis)))
[[0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1]]
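To go all the way from the input lines to tab-delimited rows, the columns can be encoded and then transposed back with a second zip(). A sketch on the first three lines of the data; the function name encode_columns is made up for illustration, and on a tie it simply treats whichever character Counter.most_common returns first as the majority:

```python
from collections import Counter

def encode_columns(lines):
    """Code the majority character of each column as 0 and anything else as 1."""
    encoded = []
    for col in zip(*lines):  # transpose: iterate over columns
        majority = Counter(col).most_common(1)[0][0]
        encoded.append([int(ch != majority) for ch in col])
    return [list(row) for row in zip(*encoded)]  # transpose back to rows

lines = ["cgcgcg", "aacacg", "cgcaag"]
rows = encode_columns(lines)
for row in rows:
    print("\t".join(map(str, row)))
```

Note that the majority is computed only over the lines passed in, so the codes for these three lines can differ from those for the full 12-line input.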
