python fixed array of dynamic strings list - python

I would like to iteratively fill a fixed-size array where each item is a list of strings. For example, consider the following list of strings:
arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
I want to obtain the following array of 3 items (no ordering is required):
res = [['A1', 'A2', 'A3', 'A4'],
['B2', 'B1'],
['C3', 'C1', 'C2']]
I have the following piece of code:
arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
res = [[]] * 3
for i in range(len(arr)):
    # Calculate index corresponding to A, B or C
    j = ord(arr[i][0])-65
    # Extend corresponding string list
    res[j].extend([arr[i]])
for i in range(len(res)):
    print(res[i])
But I get this result:
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
Where am I wrong please?
Thank you for your help!

You can use itertools.groupby to group the elements of the list (after sorting it) by their first character. operator.itemgetter(0) is an efficient way to fetch the first character of each string:
from itertools import groupby
from operator import itemgetter
[list(v) for k,v in groupby(sorted(arr), key=itemgetter(0))]
# [['A1', 'A2', 'A3', 'A4'], ['B1', 'B2'], ['C1', 'C2', 'C3']]
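The sorting step matters because groupby only merges adjacent items with equal keys. A minimal sketch of the difference, reusing the arr from the question (the variable names unsorted_groups and sorted_groups are only for illustration):

```python
from itertools import groupby
from operator import itemgetter

arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']

# Without sorting, groupby starts a new group every time the key changes,
# so the same letter can end up in several separate groups.
unsorted_groups = [list(v) for _, v in groupby(arr, key=itemgetter(0))]

# Sorting first places equal first letters next to each other,
# which yields exactly one group per letter.
sorted_groups = [list(v) for _, v in groupby(sorted(arr), key=itemgetter(0))]

print(len(unsorted_groups))  # 9, one group per element for this input
print(sorted_groups)
```

Note that sorting also reorders the items inside each group, which is fine here because the question does not require any ordering.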

The problem is due to the following:
res = [[]] * 3 creates an outer list with three items, but all three refer to the same inner list object. So whenever you append to or extend one of them, the change shows up in "all" of them (they are the same object, after all).
You can easily check this by replacing it with:
res = [[],[],[]]
which will then give you the expected answer.
Consider these snippets:
res = [[]]*2
res[0].append(1)
print(res)
Out:
[[1], [1]]
While
res = [[],[]]
res[0].append(1)
print(res)
Out:
[[1], []]
Alternatively you can create the nested list like this: res = [[] for i in range(3)]
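The aliasing can be checked directly with the is operator. The following is a small sketch, not the asker's exact code: it uses the list comprehension suggested above and ord('A') in place of the literal 65.

```python
arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']

# Multiplying the outer list repeats a reference to one inner list.
shared = [[]] * 3
print(shared[0] is shared[1])  # True

# The comprehension evaluates [] once per iteration, giving three distinct lists.
res = [[] for _ in range(3)]
print(res[0] is res[1])  # False

for item in arr:
    j = ord(item[0]) - ord('A')  # 0 for 'A', 1 for 'B', 2 for 'C'
    res[j].append(item)

print(res)
```

With independent inner lists, each string lands only in the bucket for its own letter.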

You can use a list comprehension:
[[k for k in arr if k[0]==m] for m in sorted(set([i[0] for i in arr]))]
Output:
[['A1', 'A2', 'A3', 'A4'], ['B2', 'B1'], ['C3', 'C1', 'C2']]
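The comprehension above rescans arr once per distinct first letter. If that becomes a concern, a single pass with a dictionary of lists is a possible alternative. This is only a sketch using collections.defaultdict; the name groups is illustrative.

```python
from collections import defaultdict

arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']

# Map each first letter to the list of strings that start with it.
groups = defaultdict(list)
for item in arr:
    groups[item[0]].append(item)

# Emit the groups in alphabetical order of their prefix.
res = [groups[prefix] for prefix in sorted(groups)]
print(res)
```

Unlike the fixed-size version, this does not need to know the number of prefixes in advance.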

Related

Split data in list based on condition

I have following list :
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in different lists. The data can be split in any ratio, such as 50-50 or 80-20.
Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
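Note that c in d appears to be a substring test, so it would also match a letter that occurs later in the string. A plain-Python sketch that looks only at the first character and reproduces the split from the question is shown here; held_out_prefixes is a hypothetical name, and which prefixes go into it is an assumption.

```python
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']

# Prefixes whose items should all go to the second split (assumed for illustration).
held_out_prefixes = {'D'}

# Every item is routed by its first character only, so items
# sharing a prefix always land in the same list.
split1 = [d for d in data if d[0] not in held_out_prefixes]
split2 = [d for d in data if d[0] in held_out_prefixes]

print(split1)
print(split2)
```

Changing the ratio then comes down to choosing which prefixes go into held_out_prefixes.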

Append lists to list of lists

I want to append lists of dataframes in an existing list of lists:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
fr2 = [[] for x in range(2)]
fr2[0].append(df1)
fr2[1].append(df1)
fr_list.append(fr2) # <-- here is the problem
Output: fr_list = [[df1, df1], [df1], [fr2[0], fr2[1]]] List contains 3 elements
Expected: fr_list = [[df1, df1, fr2[0]],[df1, fr2[1]]] List contains 2 elements
fr_list=[a+b for a,b in zip(fr_list,fr2)]
Replace fr_list.append(fr2) with the line above.
Explanation: zip pairs up the corresponding inner lists of fr_list and fr2, and the list comprehension concatenates each pair. Your code appended the whole outer list fr2 to fr_list as a single new element, instead of adding its inner lists to the matching inner lists.
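The difference can be seen without pandas. In this sketch, plain strings stand in for the DataFrames (an assumption made only to keep the example self-contained):

```python
# Strings are placeholders for DataFrame objects.
fr_list = [['df1', 'df1'], ['df1']]
fr2 = [['x'], ['y']]

# Equivalent to fr_list.append(fr2): the whole of fr2 becomes a third element.
appended = fr_list + [fr2]

# Pairwise concatenation: each inner list of fr2 joins its counterpart.
merged = [a + b for a, b in zip(fr_list, fr2)]

print(len(appended))  # 3
print(merged)         # [['df1', 'df1', 'x'], ['df1', 'y']]
```

zip stops at the shorter of the two lists, so both outer lists should have the same number of inner lists.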

Python List of Lists why do I get this Error: IndexError: list index out of range? [duplicate]

This question already has answers here:
Python: Editing list while iterating over it
(7 answers)
Closed 2 years ago.
I wrote a function that is supposed to delete duplicate entries from a list of lists (treating a pair and its reverse as the same), but I was surprised by the following error: IndexError: list index out of range. Why does it happen?
for example:
input: [['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
expected output:[['a0', 'a3'], ['a1', 'a2'], ['a3', 'a1']]
def getList(a):
    b=a
    lena = len(a)
    print(len(a))
    for i in range(lena):
        for j in range (i+1,lena):
            print(i,j)
            print(a[i],a[j])
            if(a[i][0],a[i][1])==(a[j][1],a[j][0]) or (a[i][0],a[i][1])==(a[j][0],a[j][1]):
                print(a)
a = [['a0', 'a3'], ['a1', 'a2'], ['a0','a3'], ['a2','a1'], ['a3', 'a1']]
getList(a)
OUTPUT:
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
5
0 1
['a0', 'a3'] ['a1', 'a2']
0 2
['a0', 'a3'] ['a0', 'a3']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
0 3
['a0', 'a3'] ['a2', 'a1']
0 4
['a0', 'a3'] ['a3', 'a1']
1 2
['a1', 'a2'] ['a0', 'a3']
1 3
['a1', 'a2'] ['a2', 'a1']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
1 4
['a1', 'a2'] ['a3', 'a1']
2 3
['a0', 'a3'] ['a2', 'a1']
2 4
['a0', 'a3'] ['a3', 'a1']
3 4
['a2', 'a1'] ['a3', 'a1']
When I modify the code by adding b.pop(j) (or any similar removal), for example:
def getList(a):
    b=a
    lena = len(a)
    print(len(a))
    for i in range(lena):
        for j in range (i+1,lena):
            print(i,j)
            print(a[i],a[j])
            if(a[i][0],a[i][1])==(a[j][1],a[j][0]) or (a[i][0],a[i][1])==(a[j][0],a[j][1]):
                print(a)
                b.pop(j)
a = [['a0', 'a3'], ['a1', 'a2'], ['a0','a3'], ['a2','a1'], ['a3', 'a1']]
getList(a)
RESULT:
5
0 1
['a0', 'a3'] ['a1', 'a2']
0 2
['a0', 'a3'] ['a0', 'a3']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
0 3
['a0', 'a3'] ['a3', 'a1']
0 4
Traceback (most recent call last):
  File "C:/Users/I/Desktop/papers/test.py", line 21, in <module>
    getList(a)
  File "C:/Users/I/Desktop/papers/test.py", line 13, in getList
    print(a[i],a[j])
IndexError: list index out of range
I am wondering what could be the problem?
Manipulating a list while iterating over it is a recipe for disaster and can almost always be avoided by accumulating results in a separate structure, or at least by iterating over a copy of the original list. Note that b = a creates an alias, not a copy; a copy is made with a.copy() or a[:]. When you pop, the list gets shorter, but range(lena) was built from the original length, so the loop indices eventually refer to elements that no longer exist.
Also, it's best not to conflate printing with programmatic output. The result of most algorithms should not be stdout. Instead, results should be written to a data structure and returned for the caller to use or dump as they see fit.
Another problem is efficiency: the nested loops mean O(n^2) running time. Using extra space can give you a linear-time algorithm.
If you convert each sublist into tuples, then they become hashable and you can stick the data into a set to eliminate duplicates, then convert everything back to lists:
>>> [list(x) for x in set(tuple(sorted(x)) for x in a)]
[['a1', 'a2'], ['a1', 'a3'], ['a0', 'a3']]
The problem is that order is lost. If order should be kept, you might use a set as a lookup table:
>>> lookup = set()
>>> result = []
>>> for pair in a:
...     key = tuple(sorted(pair))
...     if key not in lookup:
...         lookup.add(key)
...         result.append(pair)
...
>>> result
[['a0', 'a3'], ['a1', 'a2'], ['a3', 'a1']]
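If you're using CPython 3.6+ (or any Python 3.7+, where insertion order is guaranteed by the language), you can take advantage of dictionary ordering to shorten the set approach shown above. Note that the result contains the sorted form of each pair: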
>>> [list(x) for x in dict([tuple(sorted(x)), None] for x in a)]
[['a0', 'a3'], ['a1', 'a2'], ['a1', 'a3']]
Pre-3.6 versions can use collections.OrderedDict to achieve the same result:
>>> from collections import OrderedDict
>>> [list(x) for x in OrderedDict([tuple(sorted(x)), None] for x in a)]
[['a0', 'a3'], ['a1', 'a2'], ['a1', 'a3']]
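Putting the advice above together (return the result instead of printing it, and do not mutate the input), the order-preserving approach could be wrapped in a function. This is a sketch; the name unique_pairs is hypothetical.

```python
def unique_pairs(pairs):
    """Return the pairs with order-insensitive duplicates removed.

    The first occurrence of each pair is kept and the input is not modified.
    """
    seen = set()
    result = []
    for pair in pairs:
        key = tuple(sorted(pair))  # ('a1', 'a2') for both ['a1', 'a2'] and ['a2', 'a1']
        if key not in seen:
            seen.add(key)
            result.append(pair)
    return result


a = [['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
result = unique_pairs(a)
print(result)  # [['a0', 'a3'], ['a1', 'a2'], ['a3', 'a1']]
```

The caller can then decide whether to print the result or use it further.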
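Your code no longer raises the IndexError if you use b = a.copy() and return b. Be aware, though, that the indices in b shift after each pop, so a later b.pop(j) can remove the wrong element.
b = a means that the variable name b points to the same object (list) as a.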
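A short sketch of the difference between an alias and a copy (the names alias and shallow are illustrative):

```python
a = [['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3']]

# An alias is just a second name for the same list object.
alias = a
alias.pop()
print(len(a))  # 2, because popping through the alias changed a

# copy() creates a new outer list, so popping from it leaves a untouched.
shallow = a.copy()
shallow.pop()
print(len(a), len(shallow))  # 2 1

# The copy is shallow: the inner lists are still shared objects.
print(shallow[0] is a[0])  # True
```

If the inner lists must be independent as well, copy.deepcopy from the standard library is the usual tool.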

Remove element from every list in a column in pandas dataframe based on another column

I'd like to remove from each list in column B the value that appears in column A of the same row, and I'm wondering how to do it.
Given:
import pandas as pd
df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3', 'a4'],
    'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []]
})
I want:
result = pd.DataFrame({
'A': ['a1', 'a2', 'a3', 'a4'],
'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []],
'Output': [['a2'], ['a1', 'a3'], ['a1'], []]
})
One way of doing that is applying a filtering function to each row via DataFrame.apply:
df['Output'] = df.apply(lambda x: [i for i in x.B if i != x.A], axis=1)
Another solution using iterrows():
for i,value in df.iterrows():
    try:
        value['B'].remove(value['A'])
    except ValueError:
        pass
print(df)
Output:
    A         B
0  a1      [a2]
1  a2  [a1, a3]
2  a3      [a1]
3  a4        []
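The two answers are not quite equivalent. The comprehension builds a new list and drops every occurrence of the value, while list.remove changes the existing list in place (which is why column B itself is modified in the output above) and removes only the first occurrence. A plain-Python sketch, with a single list standing in for one cell of column B:

```python
values = ['a1', 'a2', 'a1']  # stand-in for one cell of column B
target = 'a1'                # stand-in for the value in column A

# Comprehension: new list, every match dropped, original untouched.
filtered = [v for v in values if v != target]

# remove(): mutates the list it is called on and drops only the first match.
mutated = list(values)
try:
    mutated.remove(target)
except ValueError:
    pass  # target was not in the list

print(filtered)  # ['a2']
print(mutated)   # ['a2', 'a1']
print(values)    # ['a1', 'a2', 'a1']
```

For the data in the question each value occurs at most once per list, so both approaches produce the same lists there.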

How can I find the smallest list among some lists generated by my program?

I wrote a program that generates some lists, something like
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5', 'b5', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2']
and I want to find the shortest list, that is, the list with the fewest elements.
Thanks,
You can use the min function:
min(data, key = len)
If you want to handle cases where there are multiple elements having the shortest length, you can sort the list in ascending order by length:
sorted(data, key = len)
You can sort by list length and take the first element, but this returns only one list even when several lists share the minimum length.
smallest_list = sorted(list_of_list, key=len)[0]
Another option is to get the length of the smallest list and then use it as a filter:
len_smallest_list = min(len(x) for x in list_of_list)
smallest_lists = [lst for lst in list_of_list if len(lst) == len_smallest_list]
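A compact sketch of both behaviours on made-up data (the sample lists and variable names are assumptions for illustration):

```python
list_of_lists = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g'], ['h', 'i', 'j', 'k']]

# min with key=len returns the first list that has the minimum length.
shortest = min(list_of_lists, key=len)

# Filtering on that length collects every list tied for shortest.
min_len = len(shortest)
all_shortest = [lst for lst in list_of_lists if len(lst) == min_len]

print(shortest)      # ['d', 'e']
print(all_shortest)  # [['d', 'e'], ['f', 'g']]
```

Both steps are linear in the number of lists, so no sorting is needed.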
