Remove duplicates from a 3-deep nested list (Python / SymPy)

I'm working with SymPy and I have a list with some duplicates (the order doesn't matter; I still consider them duplicates), and I'm looking for a way to remove them.
The list is as follows:
A=[[[m, b], [f, g]],
[[g, h], [f, b]],
[[f, g], [m, b]]]
I would consider [[m, b], [f, g]] and [[f, g], [m, b]] to be the same, and I'm trying to figure out a way to make a list with the duplicates removed. It would look like this:
B=[[[m, b], [f, g]],
[[g, h], [f, b]]].
It doesn't matter which of the duplicates it keeps, as long as only one remains.
I've tried using the set function, but it gives a
TypeError: unhashable type: 'list' error and I'm not sure why. Any input or advice is appreciated.
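(The error arises because Python lists are not hashable, so they cannot be put into a set; tuples can. A minimal illustration, not from the original post:)

pair = [['m', 'b'], ['f', 'g']]
# set() rejects lists directly: TypeError: unhashable type: 'list'
# Converting the nested lists to nested tuples gives a hashable value
hashable = tuple(tuple(inner) for inner in pair)
print({hashable})  # {(('m', 'b'), ('f', 'g'))}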

A = [[['m', 'b'], ['f', 'g']], [['g', 'h'], ['f', 'b']], [['f', 'g'], ['m', 'b']], [['l', 'k'], ['d', 'c']]]
B = []
C = []
# Flatten A into B, keeping each inner pair only once
for i in A:
    for j in i:
        if j not in B:
            B = B + [j]
# Re-group the de-duplicated pairs two at a time to rebuild the outer structure
c = 0
c1 = 1
counter = int(len(B) / 2)
for k in range(counter):
    C.append([B[k + c], B[k + c1]])
    c = c + 1
    c1 = c + 1
print(B)
print(C)
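(An alternative, offered only as a sketch and not part of the answer above: give each outer pair an order-insensitive key, e.g. a frozenset of tuples, and keep the first pair seen per key. It assumes the innermost elements, strings or SymPy symbols, are hashable.)

A = [[['m', 'b'], ['f', 'g']],
     [['g', 'h'], ['f', 'b']],
     [['f', 'g'], ['m', 'b']]]

seen = set()
B = []
for pair in A:
    # frozenset ignores the order of the two inner lists; tuple() makes each of them hashable
    key = frozenset(tuple(inner) for inner in pair)
    if key not in seen:
        seen.add(key)
        B.append(pair)

print(B)  # [[['m', 'b'], ['f', 'g']], [['g', 'h'], ['f', 'b']]]

If a pair could ever contain two identical inner lists, a sorted tuple of tuples is a safer key, since a frozenset would collapse the duplicate.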

Related

How to order a nested list in R

I need the code to be in R.
Example:
I have a list:
[[0,'A',50.1],
[1,'B',50.0],
[2,'C',50.2],
[3,'D',50.7],
[4,'E',50.3]]
I want to order it based on the 3rd element only, so I get a result like this:
[[1,'B',50.0],
[0,'A',50.1],
[2,'C',50.2],
[4,'E',50.3],
[3,'D',50.7]]
and then reorder the indexes so the final result would be:
Final = [[0,'B',50.0],
[1,'A',50.1],
[2,'C',50.2],
[3,'E',50.3],
[4,'D',50.7]]
and then I have the indexes in some grouping:
G = [[0,1],[1,3],[2,3,4]]
Based on G as indexes into Final, I want the grouping to look like this:
[['B','A'],['A','E'],['C','E','D']]
I already have the code in Python, but I need the same code in R:
L = [[i, *x[1:]] for i, x in enumerate(sorted(L, key=lambda x: x[2]))]
print (L)
[[0, 'B', 50.0], [1, 'A', 50.1], [2, 'C', 50.2], [3, 'E', 50.3], [4, 'D', 50.7]]
out = [[L[y][1] for y in x] for x in G]
print (out)
[['B', 'A'], ['A', 'E'], ['C', 'E', 'D']]
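(For reference, the Python version runs end-to-end like this; L and G are rebuilt from the values shown above so the snippet is self-contained.)

L = [[0, 'A', 50.1],
     [1, 'B', 50.0],
     [2, 'C', 50.2],
     [3, 'D', 50.7],
     [4, 'E', 50.3]]
G = [[0, 1], [1, 3], [2, 3, 4]]

# Sort by the 3rd element, then renumber the first element from 0
L = [[i, *x[1:]] for i, x in enumerate(sorted(L, key=lambda x: x[2]))]
# Pick the letter (2nd element) for every index in each group
out = [[L[y][1] for y in x] for x in G]
print(L)    # [[0, 'B', 50.0], [1, 'A', 50.1], [2, 'C', 50.2], [3, 'E', 50.3], [4, 'D', 50.7]]
print(out)  # [['B', 'A'], ['A', 'E'], ['C', 'E', 'D']]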
You can try:
LL <- L |>
  as.data.frame() |>
  arrange(x) |>
  mutate(id = sort(L$id))
lapply(G, \(x) LL$v1[LL$id %in% x])
[[1]]
[1] "B" "A"
[[2]]
[1] "A" "E"
[[3]]
[1] "C" "E" "D"
Data:
L <- list(id=0:4, v1=LETTERS[1:5], x = c(50.1, 50.0, 50.2, 50.7, 50.3))
G <- list(c(0,1), c(1,3), c(2,3,4))
Libraries:
library(dplyr)

Compare 2 list columns in pandas and find the diff

DataFrame
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Col_1': ['AD11', 'BZ23', 'CQ45', 'DL36', 'LM34', 'MM23', 'DL35', 'AD11', 'BP23', 'CQ45'],
    'Col_2': ['AD11', nan, nan, 'DL36', nan, nan, 'DL35', nan, nan, 'CQ45']
}, columns=['Id', 'Col_1', 'Col_2'])
The original data frame looks like the one built above.
Please note that Col_1 and Col_2 hold alphanumeric values with more than one character, e.g. 'AD34', 'EC45', etc.
After groupby and reset_index:
g = df.groupby('Id')['Col_1','Col_2'].agg(['unique'])
g= g.reset_index(drop=True)
g.columns = [''.join(col).strip() for col in g.columns.values]
I want to store the values that match in a Match column and the values that do not match in a No_match column.
I tried to use some logic from this post, but it doesn't solve my issue.
Is there a better way to do this transformation for my requirement?
I appreciate the help.
First remove the missing values from the lists and then use set.intersection and set.difference:
g = df.groupby('Id')[['Col_1','Col_2']].agg([lambda x: x.dropna().unique().tolist()])
g= g.reset_index(drop=True)
g.columns = [f'{a}_unique' for a, b in g.columns]
z = list(zip(g['Col_1_unique'], g['Col_2_unique']))
g['Match'] = [list(set(a).intersection(b)) for a, b in z]
g['No_Match'] = [list(set(a).difference(b)) for a, b in z]
print (g)
Col_1_unique Col_2_unique Match No_Match
0 [AD11, BZ23, CQ45, DL36] [AD11, DL36] [DL36, AD11] [CQ45, BZ23]
1 [LM34, MM23] [] [] [LM34, MM23]
2 [DL35] [DL35] [DL35] []
3 [AD11, BP23, CQ45] [CQ45] [CQ45] [AD11, BP23]
Here, my simple logic is to compare both lists by value at the same position.
For example, with [a,b,c] and [b,a,c] the match will be [c] only.
Code:
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Col_1': ['A', 'B', 'C', 'D', 'L', 'M', 'D', 'A', 'B', 'C'],
    'Col_2': ['A', '', '', 'D', '', '', 'D', '', '', 'C']
}, columns=['Id', 'Col_1', 'Col_2'])
# In order to compare the lists by value and position, I needed to put a unique value in place of each null value
# so that both lists end up with the same length
df['Col_2'] = df.apply(lambda x: x.name if x.Col_2 == '' else x.Col_2, axis=1)
g = df.groupby('Id')['Col_1', 'Col_2'].agg(['unique'])
g = g.reset_index(drop=True)
g.columns = [''.join(col).strip() for col in g.columns.values]
g['Match'] = g.apply(lambda x: [a for a, b in zip(x.Col_1unique, x.Col_2unique) if a == b], axis=1)
g['Not_Match'] = g.apply(lambda x: [a for a, b in zip(x.Col_1unique, x.Col_2unique) if a != b], axis=1)
g
Output:
Col_1unique Col_2unique Match Not_Match
0 [A, B, C, D] [A, 1, 2, D] [A, D] [B, C]
1 [L, M] [4, 5] [] [L, M]
2 [D] [D] [D] []
3 [A, B, C] [7, 8, C] [C] [A, B]
You can try the code below, though it could be made more efficient; for the time being this is what I tried:
import pandas as pd
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Col_1': ['A', 'B', 'C', 'D', 'L', 'M', 'D', 'A', 'B', 'C'],
    'Col_2': ['A', 'nan', 'nan', 'D', 'nan', 'nan', 'D', 'nan', 'nan', 'C']})
print(df)
df['Match'] = ''
df['No-Match'] = ''
for i, row in df.iterrows():
    if row['Col_1'] == row['Col_2']:
        df.at[i, 'Match'] = row['Col_1']
    else:
        df.at[i, 'No-Match'] = row['Col_1']
print(df)
g = df.groupby('Id')['Id','Col_1','Col_2','Match','No-Match'].agg(['unique'])
g= g.reset_index(drop=True)
g.columns = [''.join(col).strip() for col in g.columns.values]
print(g)
Once you run this, you will get the below output:
Idunique Col_1unique Col_2unique Matchunique No-Matchunique
0 [1] [A, B, C, D] [A, nan, D] [A, D] [B, C]
1 [2] [L, M] [nan] [] [L, M]
2 [3] [D] [D] [D] []
3 [4] [A, B, C] [nan, C] [C] [A, B]
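(For comparison, a per-group variant that avoids the row-by-row iterrows loop; this is only a sketch alongside the answers above and assumes the same frame as in the question.)

import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Col_1': ['AD11', 'BZ23', 'CQ45', 'DL36', 'LM34', 'MM23', 'DL35', 'AD11', 'BP23', 'CQ45'],
    'Col_2': ['AD11', nan, nan, 'DL36', nan, nan, 'DL35', nan, nan, 'CQ45']})

rows = []
for id_, grp in df.groupby('Id'):
    # Compare the two columns of each Id as sets, ignoring missing values
    col1 = set(grp['Col_1'].dropna())
    col2 = set(grp['Col_2'].dropna())
    rows.append({'Id': id_, 'Match': sorted(col1 & col2), 'No_Match': sorted(col1 - col2)})

out = pd.DataFrame(rows)
print(out)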

Pandas Dataframe explode List, add new columns and count values

I'm a little bit stuck. I have a DataFrame with a list in a column.
id  list
1   [a, b]
2   [a, a, a, b]
3   [c, b, b]
4   [c, a]
5   [f, f, b]
In general the values are a, b, c, d, e, f.
I want to count when two values appear in a list together, and also when a value appears more than once in the same list.
I want to use that to create a heatmap with all values on the x and y axes, and the counts of e.g. how many times a is in a list with itself, or how many times a and b are together.
I tried this so far, but it is not exactly the solution I want.
Make new columns and count the values:
df['a'] = df['list'].explode().str.contains('a').groupby(level=0).any().astype('int')
df['b'] = df['list'].explode().str.contains('b').groupby(level=0).any().astype('int')
df['c'] = df['list'].explode().str.contains('c').groupby(level=0).any().astype('int')
df['d'] = df['list'].explode().str.contains('d').groupby(level=0).any().astype('int')
df['e'] = df['list'].explode().str.contains('e').groupby(level=0).any().astype('int')
df['f'] = df['list'].explode().str.contains('f').groupby(level=0).any().astype('int')
Here I get the first problem: I create a new DataFrame whose rows are named after the list values and count the values per list, but I also get a count when a value appears only once in the list.
Make the x axis:
df_explo = df.explode(['list'], ignore_index=True)
Get the sum of each column:
df2 = df_explo.groupby(['list']).agg({'a': 'sum', 'b': 'sum', 'c': 'sum', 'd': 'sum', 'e': 'sum', 'f': 'sum'}).reset_index()
Set the index to list:
df3 = df2.set_index('list')
Create the heatmap:
sns.heatmap(df3, cmap='RdYlGn_r', linewidths=0.5, annot=True, fmt="d")
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
from itertools import combinations

data = [
    ['a', 'b'],
    ['a', 'a', 'a', 'b'],
    ['b', 'b', 'b'],
    ['c', 'a'],
    ['f', 'f', 'b']
]
letters = ['a', 'b', 'c', 'd', 'e', 'f']
duplicate_occurrences = pd.DataFrame(0, index=[0], columns=letters)
co_occurrences = pd.DataFrame(0, index=letters, columns=letters)
for l in data:
    # Letters that appear at least twice in this list
    duplicates = [k for k, v in Counter(l).items() if v > 1]
    for d in duplicates:
        duplicate_occurrences[d] += 1
    # Every unordered pair of distinct letters in this list
    co = list(combinations(set(l), 2))
    for a, b in co:
        co_occurrences.loc[a, b] += 1
        co_occurrences.loc[b, a] += 1
plt.figure(figsize=(7, 1))
sns.heatmap(duplicate_occurrences, cmap='RdYlGn_r', linewidths=0.5, annot=True, fmt="d")
plt.title('Duplicate Occurrence Counts')
plt.show()
sns.heatmap(co_occurrences, cmap='RdYlGn_r', linewidths=0.5, annot=True, fmt="d")
plt.title('Co-Occurrence Counts')
plt.show()
The first plot shows how often each letter occurs at least twice in a list; the second shows how often each pair of letters occurs together in a list.
In case you want to plot the duplicate occurrences on the diagonal, you could do it e.g. as follows:
df = pd.DataFrame(0, index=letters, columns=letters)
for l in data:
    for k, v in Counter(l).items():
        if v > 1:
            df.loc[k, k] += 1
    for a, b in combinations(set(l), 2):
        df.loc[a, b] += 1
        df.loc[b, a] += 1
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True, fmt="d")
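(If the lists live in a DataFrame column as in the question, the data list used above can be taken straight from that column. A small sketch, assuming the frame is called df and the column is called list as in the question's table:)

import pandas as pd

# Hypothetical reconstruction of the frame from the question
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'list': [['a', 'b'], ['a', 'a', 'a', 'b'], ['c', 'b', 'b'], ['c', 'a'], ['f', 'f', 'b']]})

# Feed the column into the counting loops above
data = df['list'].tolist()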

A column in my dataframe does not seem to correspond to the input List (python)

I want to assign one of the columns of my dataframe to a list. I used the code below.
listone = [['a', 'b', 'c'], ['m', 'g'], ['h'], ['y', 't', 'r']]
df['Letter combinations'] = listone
The 'Letter combinations' column in the DataFrame doesn't correspond to the list; instead it seems to assign random elements to each row of the column. I was wondering whether this method indexes the elements differently, causing a change in order, or whether there is something wrong with my code. Any help would be appreciated!
Edit: Here is my complete code
listone = [[a, b, c], [m, g], [h], [y, t, r]]
numbers = [1, 2, 3, 4]
my_matrix = {'Numbers': numbers}
sample = pd.DataFrame(my_matrix)
sample['Letter combinations'] = listone
sample
My output looks like:
```
Numbers Letter combination
0 1 [b]
1 2 [m, g]
2 3 []
3 4 [r]
```
You need to make listone a Series, i.e.:
sample['Letter combinations'] = pd.Series(listone)
sample
Numbers Letter combinations
0 1 [a, b, c]
1 2 [m, g]
2 3 [h]
3 4 [y, t, r]
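(One detail worth adding, not stated in the answer above: pd.Series(listone) gets the default index 0..3, and column assignment aligns on the index. If sample ever has a non-default index, pass index=sample.index so every row still gets its list; a minimal sketch:)

import pandas as pd

listone = [['a', 'b', 'c'], ['m', 'g'], ['h'], ['y', 't', 'r']]
sample = pd.DataFrame({'Numbers': [1, 2, 3, 4]}, index=[10, 20, 30, 40])  # non-default index

# Align explicitly on the frame's own index
sample['Letter combinations'] = pd.Series(listone, index=sample.index)
print(sample)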

Create an array that replaces each 1 with its column header and skips the zeros, for each row

a b c
1 1 0
0 0 1
1 0 1
where a, b and c are the headers.
I have the data frame shown above, and I need the result in the format below:
[[a,b],
[c],
[a,c]]
As you can see, the headers with value 1 are kept and the ones with value 0 (zero) are skipped.
Here's one way
In [96]: df.astype(bool).apply(lambda x: df.columns[x.tolist()].tolist(), axis=1)
Out[96]:
0 [a, b]
1 [c]
2 [a, c]
dtype: object
For array of values, use .values
In [102]: df.astype(bool).apply(lambda x: df.columns[x.tolist()].tolist(), axis=1)
...: .values
Out[102]: array([['a', 'b'], ['c'], ['a', 'c']], dtype=object)
Or, use iterrows
In [114]: [x[x].index.tolist() for i,x in df.astype(bool).iterrows()]
Out[114]: [['a', 'b'], ['c'], ['a', 'c']]
main_list = []
for ind in df.index:
    sublist = []
    for column in df.columns:
        if df.loc[ind, column]:
            sublist.append(column)
    main_list.append(sublist)
Output:
[['a', 'b'], ['c'], ['a', 'c']]
Hope it helps.
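(To run the snippets above, the frame from the question can be rebuilt like this; df is assumed to hold exactly the 0/1 values shown in the question.)

import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1], 'b': [1, 0, 0], 'c': [0, 1, 1]})

# Same result as the answers above
print(df.astype(bool).apply(lambda x: df.columns[x.tolist()].tolist(), axis=1).tolist())
# [['a', 'b'], ['c'], ['a', 'c']]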
