Iterating on the next item in a sublist with a condition in Python

I have a list that is sorted and grouped by the second element of each sublist, like below:
[[[2178393, 'a', 'online', 0, 20], [2178394, 'a', 'away', 0, 30], [2178395, 'a', 'away', 0, 40]],[[2178389, 'b', 'online', 0, 10], [2178390, 'b', 'online', 0, 15], [2178392, 'b', 'online', 1, 25], [2178391, 'b', 'online', 1, 30], [2178397, 'b', 'away', 1, 40]], [[2178388, 'c', 'online', 0, 15], [2178396, 'c', 'away', 0, 20], [2178402, 'c', 'online', 0,25], [2178408, 'c', 'online', 1, 50]]]
The list above contains 3 groups, each holding several inner lists. Within each group, I want to append the 5th element (index 4) of the next inner list to the current inner list. The last inner list of each group has no successor, so it stays unchanged.
The output should be:
[[[2178393, 'a', 'online', 0, 20,30], [2178394, 'a', 'away', 0, 30,40], [2178395, 'a', 'away', 0, 40]],[[2178389, 'b', 'online', 0, 10,15], [2178390, 'b', 'online', 0, 15,25], [2178392, 'b', 'online', 1, 25,30], [2178391, 'b', 'online', 1, 30,40], [2178397, 'b', 'away', 1, 40]], [[2178388, 'c', 'online', 0, 15,20], [2178396, 'c', 'away', 0, 20,25], [2178402, 'c', 'online', 0,25,50], [2178408, 'c', 'online', 1, 50]]]
Please help me.

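Here is code that does this in place (the list is named data here to avoid shadowing the built-in list):
for outer in range(len(data)):
    # Stop one short of the end: the last inner list has no successor.
    for inner in range(len(data[outer]) - 1):
        data[outer][inner].append(data[outer][inner + 1][4])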

Use a nested list comprehension along with zip_longest. This takes advantage of the fact that each of the innermost lists just needs the last element of the next list to be appended to it, with the last innermost list being unchanged.
from itertools import zip_longest
data = [[[2178393, 'a', 'online', 0, 20], [2178394, 'a', 'away', 0, 30], [2178395, 'a', 'away', 0, 40]],[[2178389, 'b', 'online', 0, 10], [2178390, 'b', 'online', 0, 15], [2178392, 'b', 'online', 1, 25], [2178391, 'b', 'online', 1, 30], [2178397, 'b', 'away', 1, 40]], [[2178388, 'c', 'online', 0, 15], [2178396, 'c', 'away', 0, 20], [2178402, 'c', 'online', 0,25], [2178408, 'c', 'online', 1, 50]]]
expected = [[[2178393, 'a', 'online', 0, 20,30], [2178394, 'a', 'away', 0, 30,40], [2178395, 'a', 'away', 0, 40]],[[2178389, 'b', 'online', 0, 10,15], [2178390, 'b', 'online', 0, 15,25], [2178392, 'b', 'online', 1, 25,30], [2178391, 'b', 'online', 1, 30,40], [2178397, 'b', 'away', 1, 40]], [[2178388, 'c', 'online', 0, 15,20], [2178396, 'c', 'away', 0, 20,25], [2178402, 'c', 'online', 0,25,50], [2178408, 'c', 'online', 1, 50]]]
result = [[bottom_list_first + bottom_list_second[-1:]
           for bottom_list_first, bottom_list_second
           in zip_longest(middle_list, middle_list[1:], fillvalue=[])]
          for middle_list in data]
print(result == expected)
Output:
True
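The same pairing can also be written with plain zip, as a small helper that leaves the input untouched. This is only a sketch: the function name append_next_value and its index parameter are made up for illustration and are not part of either answer above.

```python
def append_next_value(groups, index=4):
    """Return a copy of groups where each row gains following_row[index]."""
    result = []
    for group in groups:
        # zip stops at the shorter input, so the last row is never paired.
        new_group = [row + [following[index]]
                     for row, following in zip(group, group[1:])]
        if group:
            # The last row has no successor; copy it unchanged.
            new_group.append(list(group[-1]))
        result.append(new_group)
    return result


sample = [[[1, 'a', 'online', 0, 20], [2, 'a', 'away', 0, 30]],
          [[3, 'b', 'online', 1, 10]]]
print(append_next_value(sample))
# [[[1, 'a', 'online', 0, 20, 30], [2, 'a', 'away', 0, 30]], [[3, 'b', 'online', 1, 10]]]
```

Because new lists are built with +, the rows in sample are not modified, unlike the in-place append loop.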

Related

Plotting strings in a list as datapoints in a linegraph

I have a list of lists as so:
lsofls = [0,0,0,0,0],["a",0,0,0,0],[0,"a",0,0,0],["b","a",0,0,0],["b",0,"a",0,0],["b",0,0,"a",0],[0,"b",0,"a",0],["c","b",0,"a",0],["c",0,"b","a",0],[0,"c","b","a",0],["d","c","b","a",0], ["d","c","b",0,"a"],["d","c","b",0,0],["d","c",0,"b",0]
And I wish to plot this, whereby each string in each list acts as its own datapoint. Each list in the list of lists is a point in time starting at t0 at the zeroth list. Each element in a list within the list of lists is a point in the sequence. I struggle to explain what I mean, but by printing the list of lists with each list as a new line it becomes clearer:
for s in lsofls:
    print(s)
This gives:
[0, 0, 0, 0, 0]
['a', 0, 0, 0, 0]
[0, 'a', 0, 0, 0]
['b', 'a', 0, 0, 0]
['b', 0, 'a', 0, 0]
['b', 0, 0, 'a', 0]
[0, 'b', 0, 'a', 0]
['c', 'b', 0, 'a', 0]
['c', 0, 'b', 'a', 0]
[0, 'c', 'b', 'a', 0]
['d', 'c', 'b', 'a', 0]
['d', 'c', 'b', 0, 'a']
['d', 'c', 'b', 0, 0]
['d', 'c', 0, 'b', 0]
I essentially want to rotate this output 90 degrees anticlockwise, as a linegraph.
I am unsure how to do this, as I usually plot using integers.
I hope I am being clear enough, I am unsure how to phrase the question.
EDIT:
The solution provided by @ce.teuf is very close to what I need. However, I need a string to be able to rejoin at position 1 in the graph. So if you look at this list here:
lsofls = [0, 0, 0, 0, 0], ['a', 0, 0, 0, 0], [0, 'a', 0, 0, 0], ['b', 'a', 0, 0, 0], ['b', 0, 'a', 0, 0], ['b', 0, 0, 'a', 0], [0, 'b', 0, 'a', 0], ['c', 'b', 0, 'a', 0], ['c', 0, 'b', 'a', 0], [0, 'c', 'b', 'a', 0], ['d', 'c', 'b', 'a', 0], ['d', 'c', 'b', 0, 'a'], ['d', 'c', 'b', 0, 0], ['d', 'c', 0, 'b', 0], ['d', 'c', 0, 0, 'b'], ['d', 0, 'c', 0, 'b'], [0, 'd', 'c', 0, 'b'], ['a', 'd', 'c', 0, 'b']
for s in lsofls:
    print(s)
So I need a way for each string to rejoin in the graph if that makes sense.
Essentially, using numpy:
import numpy as np
import matplotlib.pyplot as plt
z = [[0, 0, 0, 0, 0],
['a', 0, 0, 0, 0],
[0, 'a', 0, 0, 0],
['b', 'a', 0, 0, 0],
['b', 0, 'a', 0, 0],
['b', 0, 0, 'a', 0],
[0, 'b', 0, 'a', 0],
['c', 'b', 0, 'a', 0],
['c', 0, 'b', 'a', 0],
[0, 'c', 'b', 'a', 0],
['d', 'c', 'b', 'a', 0],
['d', 'c', 'b', 0, 'a'],
['d', 'c', 'b', 0, 0],
['d', 'c', 0, 'b', 0]]
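flat_list = [item for sublist in z for item in sublist]
# Keep only the string labels; set order is arbitrary, so drop the 0 filler explicitly.
series = sorted(set(flat_list) - {0})
y_len = len(z[0])
z2 = np.rot90(z)  # mixed ints/strings become a string array, rotated 90 degrees anticlockwise
for s in series:
    z3 = np.argwhere(z2 == s)
    z3[:, 0] = (z3[:, 0] - y_len) * -1
    y, x = z3[:, 0], z3[:, 1]
    plt.plot(np.sort(x[::-1]), y[::-1])
plt.show()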
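For the rejoining case in the question's edit, one option is to collect each label's (time, position) points first and break them into runs of consecutive time steps, so that a label which disappears and comes back gets separate line segments. The sketch below rests on an assumption about what "rejoin" should mean; the helper names trajectories and split_on_gaps are hypothetical, position is simply the index within each row, and each label is assumed to appear at most once per row.

```python
def trajectories(rows):
    """Map each string label to its (time, position) points in time order."""
    paths = {}
    for t, row in enumerate(rows):
        for pos, item in enumerate(row):
            if isinstance(item, str):
                paths.setdefault(item, []).append((t, pos))
    return paths


def split_on_gaps(points):
    """Split points into runs whose time steps are consecutive."""
    segments = []
    for t, pos in points:
        if segments and t == segments[-1][-1][0] + 1:
            # Directly follows the previous time step: extend the current run.
            segments[-1].append((t, pos))
        else:
            # First point, or the label was absent for a while: start a new run.
            segments.append([(t, pos)])
    return segments


rows = [[0, 0, 0], ['a', 0, 0], [0, 'a', 0], [0, 0, 0], ['a', 0, 0]]
paths = trajectories(rows)
print(split_on_gaps(paths['a']))
# [[(1, 0), (2, 1)], [(4, 0)]]
```

Each segment could then be drawn with plt.plot(*zip(*segment)), reusing one color per label so the separate runs still read as the same series.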

How to convert "index" to a "string"?

Here is a working example:
import numpy as np
import pandas as pd
data = {'name': ['Joe', 'Mike', 'Jack', 'Hack', 'David', 'Marry', 'Wansi', 'Sidy', 'Jason', 'Even'],
'age': [25, 32, 18, np.nan, 15, 20, 41, np.nan, 37, 32],
'gender': [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
'isMarried': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
print(df)
print("---------------------------")
obj = df[df["age"]>40].index.format()
print("obj is",type(obj))
I want obj to be a string, but the result above is a list.
How can I fix this?
format() returns a list of strings, so take its first element with obj = obj[0] to get a single string:
import numpy as np
import pandas as pd
data = {'name': ['Joe', 'Mike', 'Jack', 'Hack', 'David', 'Marry', 'Wansi', 'Sidy', 'Jason', 'Even'],
'age': [25, 32, 18, np.nan, 15, 20, 41, np.nan, 37, 32],
'gender': [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
'isMarried': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
print(df)
print("---------------------------")
obj = df[df["age"]>40].index.format()
obj = obj[0]
print("obj is",type(obj))
Alternatively, index the result directly:
obj = df[df["age"]>40].index.format()[0]
print("obj is",obj,type(obj))
Output:
obj is g <class 'str'>

One-liner for splicing lists

I am looking for a pythonic way to splice two lists based on the values in one of them. One-liner would be preferred.
Say we have
[0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
and
['a', 'b', 'c', 'd', 'e', 'f']
and the result has to look like this:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
You can use next with iter:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
new_d = iter(d1)
result = [i if not i else next(new_d) for i in d]
Output:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
One liner:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
print( [d1.pop(0) if i==1 else i for i in d] )
Prints:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
EDIT (more efficient approach): reverse d1 once so that items are taken with pop() from the end instead of pop(0) from the front:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f'][::-1]
print( [d1.pop() if i==1 else i for i in d] )
Similar to @Ajax1234's answer, but on a single line:
d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
result = [d[i] if not d[i] else d1[d[:i].count(1)] for i in range(len(d))]
Result:
[0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
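The next/iter idea can be wrapped in a small function that leaves both inputs unmodified, unlike the pop-based versions, which empty d1. This is a sketch only; the name splice is hypothetical, and it treats any truthy entry of the first list as a slot to fill.

```python
def splice(mask, values):
    """Replace each truthy entry of mask with the next item from values."""
    it = iter(values)
    # Falsy entries (the zeros) are kept as they are.
    return [next(it) if flag else flag for flag in mask]


d = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
d1 = ['a', 'b', 'c', 'd', 'e', 'f']
print(splice(d, d1))
# [0, 'a', 'b', 0, 0, 'c', 'd', 'e', 0, 'f']
```

If values has fewer items than mask has truthy entries, next() raises StopIteration, so the lengths need to match up as they do in the example.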

generate a list with 6-dimensional unique elements in python

I need a list of unique elements that are each 6 characters long, like 000001, 000002, 000003 etc. They don't have to be digits; strings such as AAAAAA, AAAAAB, ABCDEF would work too.
If I generate a list with np.arange(), I won't get 6-character elements. The only approach I came up with is nested 'for' loops,
but I think there are more convenient ways to do this.
You need the Cartesian product of the string "ABCDEF" with itself, six copies in total. It can be computed with the product() function from the itertools module. The product yields 6-tuples of individual characters, which are converted to strings with join().
from itertools import product
symbols = "ABCDEF"
[''.join(x) for x in product(*([symbols] * len(symbols)))]
#['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE',
# 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD',...
# 'FFFFFA', 'FFFFFB', 'FFFFFC', 'FFFFFD', 'FFFFFE', 'FFFFFF']
You can change the value of symbols to any other combination of distinct characters.
You can use the function combinations_with_replacement():
from itertools import combinations_with_replacement
list(map(''.join, combinations_with_replacement('ABC', r=3)))
# ['AAA', 'AAB', 'AAC', 'ABB', 'ABC', 'ACC', 'BBB', 'BBC', 'BCC', 'CCC']
If you need all possible combinations use the function product():
from itertools import product
list(map(''.join, product('ABC', repeat=3)))
# ['AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB', 'ACC', 'BAA', 'BAB', 'BAC', 'BBA', 'BBB', 'BBC', 'BCA', 'BCB', 'BCC', 'CAA', 'CAB', 'CAC', 'CBA', 'CBB', 'CBC', 'CCA', 'CCB', 'CCC']
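For the full 6-character case, product() with repeat=6 yields 6**6 = 46656 strings for a 6-symbol alphabet, so it can help to keep the result lazy and take only what is needed. The sketch below also covers the numeric form from the question (000001, 000002, ...) with zero-padded formatting. The helper name codes is hypothetical.

```python
from itertools import islice, product


def codes(symbols, length):
    """Lazily yield every string of the given length over symbols."""
    return map(''.join, product(symbols, repeat=length))


# Take only the first three 6-character codes instead of building all of them.
print(list(islice(codes('ABCDEF', 6), 3)))
# ['AAAAAA', 'AAAAAB', 'AAAAAC']

# Zero-padded numbers, as in the question's 000001, 000002, ... example.
padded = [f"{n:06d}" for n in range(1, 4)]
print(padded)
# ['000001', '000002', '000003']
```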
You can use np.unravel_index to get an index array:
import numpy as np
# Covers the first 30000 of the 6**6 = 46656 possible index tuples.
idx = np.array(np.unravel_index(np.arange(30000), 6*(6,)), order='F').T
idx
# array([[0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 1],
# [0, 0, 0, 0, 0, 2],
# ...,
# [3, 5, 0, 5, 1, 3],
# [3, 5, 0, 5, 1, 4],
# [3, 5, 0, 5, 1, 5]])
You can replace the indices with more or less anything you like afterwards:
symbols = np.fromiter('ABCDEF', 'U1')
symbols
# array(['A', 'B', 'C', 'D', 'E', 'F'], dtype='<U1')
symbols[idx]
# array([['A', 'A', 'A', 'A', 'A', 'A'],
# ['A', 'A', 'A', 'A', 'A', 'B'],
# ['A', 'A', 'A', 'A', 'A', 'C'],
# ...,
# ['D', 'F', 'A', 'F', 'B', 'D'],
# ['D', 'F', 'A', 'F', 'B', 'E'],
# ['D', 'F', 'A', 'F', 'B', 'F']], dtype='<U1')
If you need the result as a list of words:
final = symbols[idx].view('U6').ravel().tolist()
final[:20]
# ['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE', 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD', 'AAAABE', 'AAAABF', 'AAAACA', 'AAAACB', 'AAAACC', 'AAAACD', 'AAAACE', 'AAAACF', 'AAAADA', 'AAAADB']

Transform character array into integers with python

I have a piece of data which is in the form of character array:
cgcgcg
aacacg
cgcaag
cgcacg
agaacg
cacaag
agcgcg
cgcaca
cacaca
agaacg
cgcacg
cgcgaa
Notice that each column consists of only two types of characters. I need to transform them into the integers 0 or 1 based on their frequency in the column. For instance, in the 1st column there are 8 c's and 4 a's, so c is in the majority and is coded as 0, and the other character as 1.
Using zip() I can transpose this array in python, and get each column into a list:
In [28]: lines = [l.strip() for l in open(inputfn)]
In [29]: list(zip(*lines))
Out[29]:
[('c', 'a', 'c', 'c', 'a', 'c', 'a', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'g', 'g', 'g', 'a', 'g', 'g', 'a', 'g', 'g', 'g'),
('c', 'c', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'a', 'a', 'a', 'a', 'g', 'a', 'a', 'a', 'a', 'g'),
('c', 'c', 'a', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'c', 'a'),
('g', 'g', 'g', 'g', 'g', 'g', 'g', 'a', 'a', 'g', 'g', 'a')]
It's not necessary to transform them strictly into integers, i.e. 'c' to '0' or 'c' to int(0) will both be ok, since we are going to write them to a tab delimited file anyway.
Something like this:
lis = [('c', 'a', 'c', 'c', 'a', 'c', 'a', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'g', 'g', 'g', 'a', 'g', 'g', 'a', 'g', 'g', 'g'),
('c', 'c', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'a', 'c', 'c'),
('g', 'a', 'a', 'a', 'a', 'a', 'g', 'a', 'a', 'a', 'a', 'g'),
('c', 'c', 'a', 'c', 'c', 'a', 'c', 'c', 'c', 'c', 'c', 'a'),
('g', 'g', 'g', 'g', 'g', 'g', 'g', 'a', 'a', 'g', 'g', 'a')]
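def solve(lis):
    for row in lis:
        # Assumes each row holds exactly two distinct characters.
        item1, item2 = set(row)
        c1, c2 = row.count(item1), row.count(item2)
        # The more frequent character maps to 0, the other to 1.
        dic = {item1: int(c1 < c2), item2: int(c2 < c1)}
        yield [dic[x] for x in row]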
...
>>> list(solve(lis))
[[0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1]]
Using collections.Counter:
from collections import Counter
from pprint import pprint
def solve(lis):
    for row in lis:
        c = Counter(row)
        maxx = max(c.values())
        # 0 for the majority character, 1 for the other one
        yield [int(c[x] < maxx) for x in row]
...
>>> pprint(list(solve(lis)))
[[0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1]]
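Both answers work on the transposed columns, while the question wants a tab-delimited file with one line per input row. A possible end-to-end sketch is shown below: it encodes each column and then transposes back. The function name encode_columns is hypothetical, and ties are resolved in favor of whichever character Counter.most_common() lists first, which is an arbitrary choice made here rather than something the question specifies.

```python
from collections import Counter


def encode_columns(lines):
    """Return rows of 0/1 where 0 marks the majority character of each column."""
    encoded_columns = []
    for column in zip(*lines):
        majority, _count = Counter(column).most_common(1)[0]
        encoded_columns.append([0 if ch == majority else 1 for ch in column])
    # Transpose back so each output row corresponds to one input line.
    return [list(row) for row in zip(*encoded_columns)]


lines = ["cgcgcg", "aacacg", "cgcaag"]
for row in encode_columns(lines):
    print("\t".join(map(str, row)))
```

A column with a single distinct character is encoded as all zeros here, whereas the set-unpacking version above would raise an error on it.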
