How to order a nested list in R - python

I need the code to be in R.
Example:
I have a list:
[[0,'A',50.1],
[1,'B',50.0],
[2,'C',50.2],
[3,'D',50.7],
[4,'E',50.3]]
I want to order it by the 3rd element only, so I get a result like this:
[[1,'B',50.0],
[0,'A',50.1],
[2,'C',50.2],
[4,'E',50.3],
[3,'D',50.7]]
and then reorder the index so the Final result would be
Final = [[0,'B',50.0],
[1,'A',50.1],
[2,'C',50.2],
[3,'E',50.3],
[4,'D',50.7]]
and then I have the indexes in some grouping
G = [[0,1],[1,3],[2,3,4]]
Using G as indexes into Final, I want the grouping to look like this:
[['B','A'],['A','E'],['C','E','D']]
I already have the code in Python, but I need the same code in R:
L = [[i, *x[1:]] for i, x in enumerate(sorted(L, key=lambda x: x[2]))]
print (L)
[[0, 'B', 50.0], [1, 'A', 50.1], [2, 'C', 50.2], [3, 'E', 50.3], [4, 'D', 50.7]]
out = [[L[y][1] for y in x] for x in G]
print (out)
[['B', 'A'], ['A', 'E'], ['C', 'E', 'D']]

You can try:
LL <- L |>
  as.data.frame() |>
  arrange(x) |>
  mutate(id = sort(L$id))
lapply(G, \(x) LL$v1[LL$id %in% x])
[[1]]
[1] "B" "A"
[[2]]
[1] "A" "E"
[[3]]
[1] "C" "E" "D"
Data:
L <- list(id=0:4, v1=LETTERS[1:5], x = c(50.1, 50.0, 50.2, 50.7, 50.3))
G <- list(c(0,1), c(1,3), c(2,3,4))
Libraries:
library(dplyr)

Related

select element in a column A while column B does not have a value

I want to select the products that never have a 0 in x.
Input:
import pandas as pd
test = pd.DataFrame(
[
['a', 0],
['a', 3],
['a', 4],
['b', 3],
['b', 2],
['c', 1],
['d', 0]
]
)
test.columns = ['product', 'x']
test.query("select distinct (product) where x not in (0) ")
Expected outcome:
b,c
How can I do this in both pandas and SQL?
In SQL, you would use:
select product
from t
group by product
having min(x) > 0;
This works assuming x is never negative. A more general formulation is:
having sum(case when x = 0 then 1 else 0 end) = 0
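Both HAVING formulations can be checked against the sample data with a small in-memory SQLite database. This is only a sketch: the table name t comes from the query above, while the column types and the ORDER BY clause are assumptions added to make the output deterministic.

```python
import sqlite3

# Sample rows from the question: (product, x)
rows = [("a", 0), ("a", 3), ("a", 4), ("b", 3), ("b", 2), ("c", 1), ("d", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product TEXT, x INTEGER)")  # column types are assumed
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Variant 1: only valid when x is never negative.
min_query = """
    SELECT product FROM t
    GROUP BY product
    HAVING MIN(x) > 0
    ORDER BY product
"""

# Variant 2: counts the zero rows per product, so negative values are fine.
case_query = """
    SELECT product FROM t
    GROUP BY product
    HAVING SUM(CASE WHEN x = 0 THEN 1 ELSE 0 END) = 0
    ORDER BY product
"""

products_min = [row[0] for row in conn.execute(min_query)]
products_case = [row[0] for row in conn.execute(case_query)]
conn.close()

print(products_min)   # ['b', 'c']
print(products_case)  # ['b', 'c']
```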
In pandas, you can do this with isin:
test.loc[~test['product'].isin(test.loc[test.x.eq(0),'product']),'product'].unique()
Out[41]: array(['b', 'c'], dtype=object)
Or use set:
set(test['product'].tolist())-set(test.loc[test.x.eq(0),'product'].tolist())
Out[47]: {'b', 'c'}
If you want to filter your dataframe, you can use groupby with .any():
test[~test.groupby('product')['x'].transform(lambda x: x.eq(0).any())]
Output:
  product  x
3       b  3
4       b  2
5       c  1
If you only want the unique product names, append ['product'].unique().tolist() to the expression above.
That gives:
['b', 'c']

Making equal size lists in Python

I have 3 different lists of unequal length.
I want to pad the shorter lists with "X" so that every list is as long as the longest one.
A = [10,20,30,40,50]
B = ["A", "B", "C"]
C = ["X1", "X2"]
After appending "X" , it should be like the following:
A = [10,20,30,40,50]
B = ["A", "B", "C", "X","X"]
C = ["P1", "P2", "X", "X", "X"]
I have used the code below to achieve it:
for i, a in enumerate(A):
    if i < len(B):
        pass
    else:
        B.append('X')
How can I do this efficiently in Python?
Use the extend method
B.extend(['X'] * (len(A)-len(B)))
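The same extend call can be applied to every list in one loop. The sketch below assumes the three lists from the question and a padding target equal to the longest length. It relies on the fact that multiplying a list by zero (or a negative number) gives an empty list, so the longest list is left unchanged.

```python
A = [10, 20, 30, 40, 50]
B = ["A", "B", "C"]
C = ["X1", "X2"]

target = max(len(A), len(B), len(C))
for lst in (A, B, C):
    # A non-positive repeat count yields [], so extend() is a no-op for the longest list.
    lst.extend(["X"] * (target - len(lst)))

print(A)  # [10, 20, 30, 40, 50]
print(B)  # ['A', 'B', 'C', 'X', 'X']
print(C)  # ['X1', 'X2', 'X', 'X', 'X']
```

Because extend() mutates each list in place, the names A, B, and C do not need to be reassigned.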
Calculate the maximum length and, for each list, append the difference.
Python lists support the + operator to concatenate lists and the * operator to repeat a list.
A = [10,20,30,40,50]
B = ["A", "B", "C"]
C = ["X1", "X2"]
max_length = max(max(len(A), len(B)), len(C))
A += ['X'] * (max_length - len(A))
B += ['X'] * (max_length - len(B))
C += ['X'] * (max_length - len(C))
Then put them in a container list to reduce repetition and make the code easier to extend.
A = [10,20,30,40,50]
B = ["A", "B", "C"]
C = ["X1", "X2"]
arrays = [A, B, C]
max_length = 0
for array in arrays:
    max_length = max(max_length, len(array))
for array in arrays:
    array += ['X'] * (max_length - len(array))
Result:
print(A) # [10, 20, 30, 40, 50]
print(B) # ['A', 'B', 'C', 'X', 'X']
print(C) # ['X1', 'X2', 'X', 'X', 'X']
The Python itertools module has a lot of nifty functions that are good for cases like this. For example:
>>> from itertools import zip_longest
>>> A = [10, 20, 30, 40, 50]
>>> B = ["A", "B", "C"]
>>> C = ["X1", "X2"]
>>> A, B, C = (list(x) for x in zip(*zip_longest(A, B, C, fillvalue='X')))
>>> A
[10, 20, 30, 40, 50]
>>> B
['A', 'B', 'C', 'X', 'X']
>>> C
['X1', 'X2', 'X', 'X', 'X']
Write a function that does this for you:
A = [10, 20, 30, 40, 50]
B = ["A", "B", "C"]
C = ["X1", "X2"]
def extend_with_extra_elements(*some_lists):
    max_some_lists_length = max(map(len, some_lists))
    for some_list in some_lists:
        extra_elements_count = max_some_lists_length - len(some_list)
        extra_elements = ['X'] * extra_elements_count
        yield some_list + extra_elements
A, B, C = extend_with_extra_elements(A, B, C)
efficient enough
Use max() to get the maximum length, then append a padding list to B and C.
If you want to replace X with P, you can use a list comprehension [i.replace('X','P') for i in C] to get ['P1','P2']:
>>> m=max(len(A),len(B),len(C))
>>> B+['X']*(m-len(B))
['A', 'B', 'C', 'X', 'X']
>>> [i.replace('X','P') for i in C]+['X']*(m-len(C))
['P1', 'P2', 'X', 'X', 'X']

python compare items in 2 list of different length - order is important

list_1 = ['a', 'a', 'a', 'b']
list_2 = ['a', 'b', 'b', 'b', 'c']
In the lists above, the items at index 0 and index 3 are the same, while the items at index 1 and index 2 differ. Also, list_2 has an extra item 'c'.
I want to count the number of positions at which the two lists differ. In this case I should get 3.
I tried doing this:
x = 0
for i in max(len(list_1),len(list_2)):
    if list_1[i]==list_2[i]:
        continue
    else:
        x+=1
I am getting an error.
Use the zip() function to pair up the lists, counting all the differences, then add the difference in length.
zip() will only iterate over the items that can be paired up, but there is little point in iterating over the remainder; you know those are all to be counted as different:
differences = sum(a != b for a, b in zip(list_1, list_2))
differences += abs(len(list_1) - len(list_2))
The sum() sums up True and False values; this works because Python's boolean type is a subclass of int and False equals 0, True equals 1. Thus, for each differing pair of elements, the True values produced by the != tests add up as 1s.
Demo:
>>> list_1 = ['a', 'a', 'a', 'b']
>>> list_2 = ['a', 'b', 'b', 'b', 'c']
>>> sum(a != b for a, b in zip(list_1, list_2))
2
>>> abs(len(list_1) - len(list_2))
1
>>> difference = sum(a != b for a, b in zip(list_1, list_2))
>>> difference += abs(len(list_1) - len(list_2))
>>> difference
3
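The two steps from the demo can be wrapped in a small helper. This is a minimal sketch; the function name count_differences is hypothetical and not part of the original answer.

```python
def count_differences(first, second):
    """Count positions where two sequences differ, including unmatched tail items."""
    # Each comparison yields a bool; sum() treats True as 1 and False as 0.
    paired = sum(a != b for a, b in zip(first, second))
    # Items beyond the end of the shorter sequence have no partner,
    # so each of them counts as one difference.
    return paired + abs(len(first) - len(second))


list_1 = ['a', 'a', 'a', 'b']
list_2 = ['a', 'b', 'b', 'b', 'c']
result = count_differences(list_1, list_2)
print(result)  # 3
```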
You can try this:
list1 = [1,2,3,5,7,8,23,24,25,32]
list2 = [5,3,4,21,201,51,4,5,9,12,32,23]
list3 = []
for i in range(len(list2)):
    if list2[i] not in list1:
        pass
    else:
        list3.append(list2[i])
print(list3)
print(len(list3))
As ZdaR commented, you should get 3 as the result and zip_longest can help here if you don't have Nones in the lists.
from itertools import zip_longest
list_1=['a', 'a', 'a', 'b']
list_2=['a', 'b', 'b', 'b', 'c']
x = sum(a != b for a,b in zip_longest(list_1,list_2))
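If the lists may contain None, the default fill value of zip_longest becomes ambiguous, because a padded slot looks the same as a real None. One way around this, sketched below, is to pad with a private sentinel object instead. The names _MISSING and count_mismatches are hypothetical.

```python
from itertools import zip_longest

# Hypothetical sentinel: a fresh object() compares unequal to every real list item.
_MISSING = object()


def count_mismatches(first, second):
    """Count differing positions; a padded slot always counts as a mismatch."""
    pairs = zip_longest(first, second, fillvalue=_MISSING)
    return sum(a != b for a, b in pairs)


list_1 = ['a', 'a', 'a', 'b']
list_2 = ['a', 'b', 'b', 'b', 'c']
print(count_mismatches(list_1, list_2))        # 3

# With the default fillvalue (None) the extra None goes unnoticed:
naive = sum(a != b for a, b in zip_longest([None], [None, None]))
print(naive)                                   # 0
print(count_mismatches([None], [None, None]))  # 1
```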
You can also try it this way, using a for loop:
>>> count = 0
>>> ls1 = ['a', 'a', 'a', 'b']
>>> ls2 = ['a', 'b', 'b', 'b', 'c']
>>> for i in range(0, max(len(ls1),len(ls2)), 1):
...     if ls1[i:i+1] != ls2[i:i+1]:
...         count += 1
...
>>> print(count)
3
Or try this (it doesn't change the lists):
dif = 0
# compare lengths, not the lists themselves (min() on lists compares their contents)
shorter = min(len(list_1), len(list_2))
for i in range(shorter):
    if list_1[i] != list_2[i]:
        dif += 1
        #print(list_1[i], " != ", list_2[i], " --> Dif = ", dif)
dif += abs(len(list_1) - len(list_2))
print("Difference = ", dif)
(Output: Difference = 3)
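Not much better, but here's another option, which counts the matching positions:
a, b = list_1, list_2
if len(a) < len(b):
    b = b[0:len(a)]
else:
    a = a[0:len(b)]
# plain lists need an element-wise comparison; a == b alone gives a single bool
correct = sum(x == y for x, y in zip(a, b))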

How to cluster list-of-list by distance condition in Python

I have the following list of lists that contains 6 entries:
lol = [['a', 3, 1.01],
['x', 5, 1.00],
['k', 7, 2.02],
['p', 8, 3.00],
['b', 10, 1.09],
['f', 12, 2.03]]
Each sublist in lol contains 3 elements:
['a', 3, 1.01]
e1 e2 e3
The list above is already sorted by e2 (i.e., the 2nd element).
I'd like to 'cluster' the above list following roughly these steps:
Pick the lowest entry (with respect to e2) in lol as the key of the first cluster.
Assign it as the first member of that cluster (a dictionary of lists).
Calculate the difference between e3 of the next list and e3 of the first member of each existing cluster.
If the difference is less than the threshold, assign that list as a member of the corresponding cluster.
Otherwise, create a new cluster with the current list as the new key.
Repeat for the remaining lists until finished.
The final result will look like this, with threshold <= 0.1.
dol = {'a':['a', 'x', 'b'],
'k':['k', 'f'],
'p':['p']}
I'm stuck with this. What's the right way to do it?
import json
from collections import defaultdict
thres = 0.1
tmp_e3 = 0
tmp_e1 = "-"
lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
dol = defaultdict(list)
for thelist in lol:
    e1, e2, e3 = thelist
    if tmp_e1 == "-":
        tmp_e1 = e1
    else:
        diff = abs(tmp_e3 - e3)
        if diff > thres:
            tmp_e1 = e1
    dol[tmp_e1].append(e1)
    tmp_e1 = e1
    tmp_e3 = e3
print(json.dumps(dol, indent=4))
I would first ensure lol is sorted on the second element, then iterate, keeping in the list only the entries that are not within the threshold of the first element:
import json
thres = 0.1
tmp_e3 = 0
tmp_e1 = "-"
lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],
       ['p',8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
# ensure lol is sorted
lol.sort(key = (lambda x: x[1]))
dol = {}
while len(lol) > 0:
    # the first remaining entry becomes the key of a new cluster
    x = lol.pop(0)
    lol2 = []
    dol[x[0]] = [ x[0] ]
    for i in lol:
        if abs(i[2] - x[2]) < thres:
            dol[x[0]].append(i[0])
        else:
            # not close enough: keep it for a later cluster
            lol2.append(i)
    lol = lol2
print(json.dumps(dol, indent=4))
Result:
{
    "a": [
        "a",
        "x",
        "b"
    ],
    "k": [
        "k",
        "f"
    ],
    "p": [
        "p"
    ]
}
Leaving e2/e3 aside, here's a rough draft.
First, a generator groups the data by value; it needs the data to be sorted by value, though.
Then an example of its use, first on the raw data and then with the data re-sorted by value.
In [32]: def cluster(lol, threshold=0.1):
    ...:     cl, start = None, None
    ...:     for e1, e2, e3 in lol:
    ...:         if cl and abs(start - e3) <= threshold:
    ...:             cl.append(e1)
    ...:         else:
    ...:             if cl: yield cl
    ...:             cl = [e1]
    ...:             start = e3
    ...:     if cl: yield cl
In [33]: list(cluster(lol))
Out[33]: [['a', 'x'], ['k'], ['p'], ['b'], ['f']]
In [34]: list(cluster(sorted(lol, key = lambda ar:ar[-1])))
Out[34]: [['x', 'a', 'b'], ['k', 'f'], ['p']]
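The generator above needs the data sorted by value. A single pass that keeps the e2 order and compares each entry with the first member of every existing cluster, as the steps in the question describe, might look like the sketch below. The function name cluster_by_first_member and the anchors dictionary are hypothetical, and the comparison uses <= to match the "threshold <= 0.1" wording in the question.

```python
def cluster_by_first_member(rows, threshold=0.1):
    """Group rows of [label, e2, e3] by closeness of e3 to each cluster's first member."""
    clusters = {}  # cluster key -> list of member labels
    anchors = {}   # cluster key -> e3 of the cluster's first member
    # Process rows in e2 order, as the question requires.
    for label, _, value in sorted(rows, key=lambda row: row[1]):
        for key, anchor in anchors.items():
            if abs(value - anchor) <= threshold:
                clusters[key].append(label)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters[label] = [label]
            anchors[label] = value
    return clusters


lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
dol = cluster_by_first_member(lol)
print(dol)  # {'a': ['a', 'x', 'b'], 'k': ['k', 'f'], 'p': ['p']}
```

An entry joins the first matching cluster in creation order, which is a design choice the question does not spell out.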

Python: Looking for more efficient method of sorting list into sublists accounting for missing input

I'm still somewhat new to Python 3, and while I have something that works, I think it could be a lot more efficient and readable.
For example, if I have the input:
A1, B1, C1, A2, B2, A3, C3, C4
I want to convert this into columns (so I can ultimately put it into an Excel spreadsheet) that look like this:
1 A B C
2 A B None
3 A None C
4 None None C
My code looks like this:
locations = ["A", "B", "C"]
log = [["A",1],["B",1],["C",1],["A",2],["B",2],["A",3],["C",3],["C",4]]
day = [ [] for x in range(10) ] # can I dynamically allocate this as I go?
i = 0
for index, element in enumerate(log):
    if (log[index][1] != log[index-1][1] and index != 0): # if the number changes
        for place in locations:
            if place not in day[i]: # if something's missing
                day[i].insert(locations.index(place),None) # insert a None where its missing
        i += 1
    if element[0] in locations:
        day[i].append(element[0])
for place in locations: # the loop ends without doing the last list so I call
    if place not in day[i]: # this again, is there a way to keep it in the loop?
        day[i].insert(locations.index(place),None)
day = [x for x in day if x != []] # strips empty lists from list
columns = list(zip(*day)) # transposes matrix
And my output is:
[('A', 'A', 'A', None), ('B', 'B', None, None), ('C', None, 'C', 'C')]
So my question is: How can I make this more efficient? Can I allocate the lists inside the list as I go? And how do I keep it all inside the for loop?
Thanks in advance!
Here's an example that builds out the array while reading the log:
days = []
for loc, day in log:
    for i in range(len(days), day):
        days.append([i+1] + [None for _ in locations])
    days[day - 1][1 + locations.index(loc)] = loc
print(days)
[[1, 'A', 'B', 'C'], [2, 'A', 'B', None], [3, 'A', None, 'C'], [4, None, None, 'C']]
I am not sure if this is more efficient, but it's more concise:
>>> locations = ["A", "B", "C"]
>>> log = [["A",1],["B",1],["C",1],["A",2],["B",2],["A",3],["C",3],["C",4]]
>>> maxi=max(i for [_,i] in log)
>>> d = {i:list() for i in locations} #d={'B': [], 'A': [], 'C': []}
>>> for [letter,i] in log:
...     d[letter].append(i)
...     #d={'B': [1, 2], 'A': [1, 2, 3], 'C': [1, 3, 4]}
>>> [tuple(letter if i in d[letter] else None for i in range(1,maxi+1)) for letter in locations]
[('A', 'A', 'A', None), ('B', 'B', None, None), ('C', None, 'C', 'C')]
I would do a dictionary with location tuples as keys, and start with a default None value.
columns = ("A","B","C")
rows = (1,2,3,4)
table = { (col, row):None for col in columns for row in rows}
Then you can fill in your table by looping through your log, setting each table entry whose (column, row) key matches a log entry to that entry's column value.
log = [["A",1],["B",1],["C",1],["A",2],["B",2],["A",3],["C",3],["C",4]]
for cell in log:
    if tuple(cell) in table:
        # if tuple(cell) in table.keys(): if python 2.7
        table[tuple(cell)] = cell[0]
print([tuple(table[col,row] for row in rows) for col in columns])
[('A', 'A', 'A', None), ('B', 'B', None, None), ('C', None, 'C', 'C')]
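On the question of allocating as you go, another option is a defaultdict that creates an entry the first time a day appears. The sketch below assumes the same locations and log as the question; the names seen and rows are hypothetical. Note that days with no log entries at all are skipped rather than filled with None.

```python
from collections import defaultdict

locations = ["A", "B", "C"]
log = [["A", 1], ["B", 1], ["C", 1], ["A", 2], ["B", 2], ["A", 3], ["C", 3], ["C", 4]]

# day -> set of locations logged on that day; entries are created on first use
seen = defaultdict(set)
for loc, day in log:
    seen[day].add(loc)

# One row per day: the day number followed by one cell per location
rows = [[day] + [loc if loc in seen[day] else None for loc in locations]
        for day in sorted(seen)]

# Transpose the location cells to get one tuple per location
columns = list(zip(*(row[1:] for row in rows)))

print(rows)
# [[1, 'A', 'B', 'C'], [2, 'A', 'B', None], [3, 'A', None, 'C'], [4, None, None, 'C']]
print(columns)
# [('A', 'A', 'A', None), ('B', 'B', None, None), ('C', None, 'C', 'C')]
```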
