keep random list elements and remove the others

I have a large list of elements:
list= [a1, b, c, b, c, d, a2, c,...a3........]
I want to remove specific elements from it: a1, a2, a3.
Suppose that I can get the indexes of the elements that start with a:
a_indexes = [0,6, ...]
Now I want to remove most of the elements that start with a, but not all of them: I want to keep 20 of them, chosen arbitrarily. How can I do so?
I know that to remove an element from a list list_ I can use:
list_.remove(list_[element position])
But I am not sure how to work with the a_indexes list.

Here's an approach that will work if I understand the question correctly.
We have a list containing numerous items. We want to remove some elements that match a certain criterion - but not all.
So:
from random import sample
li = ['a','b','a','b','a','b','a']
dc = 'a'
keep = 1 # this is how many we want to keep in the list
if (k := li.count(dc) - keep) > 0: # sanity check
    il = [i for i, v in enumerate(li) if v == dc]
    for i in sorted(sample(il, k), reverse=True):
        li.pop(i)
print(li)
Note that the sampled indices are sorted in descending order. This matters because we're popping elements: each pop shifts every later element one position to the left, so popping in any other order could remove the wrong elements.
An example of output might be:
['b', 'b', 'a', 'b']
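The same idea can be wrapped in a function that builds a new list instead of popping in place. This is only a sketch: the name keep_random, the predicate argument and the rng parameter are illustrative assumptions, not part of the answer above. The predicate makes it fit the question's case of elements that start with a.

```python
import random

def keep_random(items, predicate, keep, rng=random):
    """Return a copy of items in which only `keep` randomly chosen
    elements matching `predicate` survive; other elements are untouched."""
    matching = [i for i, v in enumerate(items) if predicate(v)]
    if len(matching) <= keep:
        return list(items)  # nothing needs to be removed
    kept = set(rng.sample(matching, keep))  # indices of the matches we keep
    return [v for i, v in enumerate(items)
            if not predicate(v) or i in kept]

li = ['a1', 'b', 'c', 'b', 'c', 'd', 'a2', 'c', 'a3']
result = keep_random(li, lambda s: s.startswith('a'), keep=2)
print(result)
```

Because nothing is popped, there is no index shifting to worry about, and the original list is left unchanged.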

Suppose you have this list:
li=['d', 'a', 'c', 'a', 'g', 'b', 'f', 'a', 'c', 'g', 'e', 'f', 'e', 'g', 'b', 'b', 'c', 'e', 'a', 'd', 'g', 'd', 'd', 'a', 'c', 'e', 'a', 'c', 'f', 'a', 'b', 'a', 'a', 'f', 'b', 'd', 'd', 'b', 'f', 'a', 'd', 'g', 'd', 'b', 'e']
You can define a character to delete and a count k of how many to delete:
delete='a'
k=3
Then use random.shuffle to generate a random group of k indices to delete:
import random
idx=[i for i,c in enumerate(li) if c==delete]
random.shuffle(idx)
idx=idx[:k]
>>> idx
[3, 7, 31]
Then delete those indices:
new_li=[e for i,e in enumerate(li) if i not in idx]
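Since the question asks to keep a fixed number of matches rather than delete a fixed number, k can be derived from the count. The following is a hedged variant, not the answer's own code: the keep variable and the shorter sample list are assumptions made for illustration, and random.sample replaces the shuffle-and-slice step.

```python
import random

li = ['a', 'b', 'a', 'c', 'a', 'd', 'a', 'b']
delete = 'a'
keep = 1  # assumed: how many occurrences should survive

idx = [i for i, c in enumerate(li) if c == delete]
k = max(len(idx) - keep, 0)           # how many occurrences to delete
to_drop = set(random.sample(idx, k))  # a set gives fast membership tests
new_li = [e for i, e in enumerate(li) if i not in to_drop]
print(new_li)
```

Converting the chosen indices to a set avoids scanning a list for every element, which matters when the list is large.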

Related

How to efficiently split a list that has a certain periodicity, into multiple lists?

For example the original list:
['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
We want to split the list into lists started with 'a' and ended with 'a', like the following:
['a','b','c','a']
['a','d','e','a']
['a','b','e','f','j','a']
['a','c','a']
The final output can also be a list of lists. I have tried a double for loop approach with 'a' as the condition, but this is inefficient and not Pythonic.

One possible solution uses re (regex). Because it joins the list into a string, it assumes every element is a single character:
import re
l = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
r = [list(f"a{_}a") for _ in re.findall("(?<=a)[^a]+(?=a)", "".join(l))]
print(r)
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]

You can do this in one loop:
lst = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
out = [[]]
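for i in lst:
    if i == 'a':
        out[-1].append(i)   # close the current group
        out.append([])      # and open a new one
    out[-1].append(i)
# the first group is the prefix before the first 'a' and the last group is
# never closed (even when lst ends with 'a'), so drop both
out = out[1:-1]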
Also using numpy.split:
import numpy as np
out = [ary.tolist() + ['a'] for ary in np.split(lst, np.where(np.array(lst) == 'a')[0])[1:-1]]
Output:
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]

Firstly you can store the indices of 'a' from the list.
oList = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
idx_a = list()
for idx, char in enumerate(oList):
    if char == 'a':
        idx_a.append(idx)
Then for each pair of consecutive indices you can take the sub-list and store it in a list:
# stop one short of the end so that idx_a[x + 1] always exists
ans = [oList[idx_a[x]:idx_a[x + 1] + 1] for x in range(len(idx_a) - 1)]
You can also get more such lists if you pair non-adjacent indices as well.
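The index-pair idea can also be written with zip, which avoids manual index arithmetic. This is a minimal sketch; the function name split_between and its marker parameter are hypothetical. Unlike the string-based approaches, it does not require the elements to be single characters.

```python
def split_between(items, marker):
    """Return the slices of items that run from one marker to the next,
    with both markers included."""
    positions = [i for i, v in enumerate(items) if v == marker]
    # zip(positions, positions[1:]) pairs each marker index with the next one
    return [items[start:end + 1]
            for start, end in zip(positions, positions[1:])]

data = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
print(split_between(data, 'a'))
```

With fewer than two markers, zip produces no pairs and the result is an empty list.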

You can do this with a single iteration and a simple state machine:
original_list = list('kabcadeabefjacab')
multiple_lists = []
for c in original_list:
    if multiple_lists:
        multiple_lists[-1].append(c)
    if c == 'a':
        multiple_lists.append([c])
# the last list is never closed by a following 'a', so always discard it
if multiple_lists:
    multiple_lists.pop()
print(multiple_lists)
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]

We can use str.split() to split the list once we str.join() it into a string, and then use an f-string to add back the stripped "a"s. Note that even if the list starts or ends with an "a", the split list will have an empty string representing the substring before or after that "a", so our unpacking logic that discards the first and last subsequences will still work as intended. Like any join-based approach, this assumes every element is a single character.
def split(data):
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]
Output:
>>> from pprint import pprint
>>> testdata = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
>>> pprint(split(testdata))
[['a', 'b', 'c', 'a'],
 ['a', 'd', 'e', 'a'],
 ['a', 'b', 'e', 'f', 'j', 'a'],
 ['a', 'c', 'a']]

how to terminate the program when similar elements are found

I have the following 2D list:
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
I want to compare the elements of the first list, test_list[0], with all the other lists. If the elements ['A', 'B', 'C'] are present in all the other lists, it should print a message such as "All elements are similar", and the program should terminate when that condition is met.
I have tried this piece of code, but it still needs a termination condition; it only handles the best-case scenario in which all elements are present.
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
s = test_list[0]
for e in test_list[1:]:
    if all(v in e for v in s):
        print(e, "contains all elements of ", s)
#the program should terminate only if all the members are present.

You can use another all() call to test all of test_list[1:]
s = test_list[0]
if all(all(v in e for v in s) for e in test_list[1:]):
    print("All elements are similar")
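The nested all() can also be expressed with sets. This is a sketch under the assumption that the elements are hashable (as the strings here are); the helper name contains_all is hypothetical.

```python
def contains_all(reference, others):
    """True if every element of reference occurs in each list of others."""
    needed = set(reference)
    # set.issubset accepts any iterable, so the other lists need no conversion
    return all(needed.issubset(other) for other in others)

test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
if contains_all(test_list[0], test_list[1:]):
    print("All elements are similar")
    # to stop the script at this point, `import sys` and call sys.exit() here
```

The sys.exit() call is left as a comment so the snippet can be pasted into a larger script without ending it.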

List of indices of tuples of tuples that contain certain tuples

I have a list list1 of 3 sublists of tuples like
[[(['A', 'B', 'A'], ['B', 'O', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'O']),
(['A', 'B', 'O'], ['B', 'O', 'A']),
(['A', 'B', 'O'], ['B', 'A', 'O']),
(['A', 'B', 'A'], ['B', 'O', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'O'])],
[(['A', 'B', 'A'], ['B', 'A', 'A']),
(['A', 'B', 'O'], ['B', 'A', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'A'])],
[['A', 'B', 'A'], ['A', 'B', 'O']],
[['A', 'B', 'B']],
[['B', 'A', 'A']]]
Assume list2 = ['A', 'B', 'A']. My goal is to obtain a list of indices of any pairs of tuples (or a singleton set of tuple) in list1 that contain the tuple list2. I tried to use the enumerate function as follows but the result is not correct
print([i for i, j in enumerate(bigset) if ['A', 'B', 'A'] in j[0] or
['A', 'B', 'A'] == j[0] or [['A', 'B', 'A']] in j[0]])
Can anyone please help me with this problem? I'm quite stuck due to the mismatch in the different sizes of tuples of tuples appearing in list1.
Another question I have: I want to find the total number of 3-element lists in list1. If I count by hand, the answer is 22, but how can I do it in code? I guess we need to use two for loops?
Expected output: for list1 above with the given list2, the list of indices containing list2 would be [0,1,5,6,7,9,10].

Ok, so here you go
This uses recursion because we don't know the depth of your list1, so the indices are counted like this:
0,1
2,3,4,
6,7
8,
9,10,11,12
etc... (The same order you have by writing it in 1 row)
Here the result will be:
[0, 2, 8, 10, 12, 16, 18]
Now the code:
def foo(l,ref):
    global s
    global indexes
    for items in l: #if it's an element of 3 letters
        if len(items)== 3 and len(items[0])==1:
            if items == ref:
                indexes.append(s) #save its index if it matches the ref
            s+= 1 #next index
        else: #We need to go deeper
            foo(items,ref)
    return(s)
list1 = [[(['A', 'B', 'A'], ['B', 'O', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'O']),
(['A', 'B', 'O'], ['B', 'O', 'A']),
(['A', 'B', 'O'], ['B', 'A', 'O']),
(['A', 'B', 'A'], ['B', 'O', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'O'])],
[(['A', 'B', 'A'], ['B', 'A', 'A']),
(['A', 'B', 'O'], ['B', 'A', 'A']),
(['A', 'B', 'A'], ['B', 'A', 'A'])],
[['A', 'B', 'A'], ['A', 'B', 'O']],
[['A', 'B', 'B']],
[['B', 'A', 'A']]]
list2 = ['A', 'B', 'A']
indexes = []
s=0
count= foo(list1,list2)
print(indexes)
s is the index we are currently working on.
count is the total number of elements (22).
indexes is the list of indices you want.
This works even if you make a list3 = [list1,list1,[list1,[list1],list1]]; you may want to try it.
Good luck finishing your script.
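The same recursive walk can be written without global variables by using a generator. This is a sketch, not the answer's code: the name iter_leaves is hypothetical, and it assumes a leaf is any non-empty list whose items are all strings.

```python
def iter_leaves(nested):
    """Yield every innermost list of strings, in left-to-right order."""
    for item in nested:
        if item and all(isinstance(x, str) for x in item):
            yield item                    # a leaf such as ['A', 'B', 'A']
        else:
            yield from iter_leaves(item)  # a list or tuple: go deeper

list1 = [[(['A', 'B', 'A'], ['B', 'O', 'A']),
          (['A', 'B', 'A'], ['B', 'A', 'O']),
          (['A', 'B', 'O'], ['B', 'O', 'A']),
          (['A', 'B', 'O'], ['B', 'A', 'O']),
          (['A', 'B', 'A'], ['B', 'O', 'A']),
          (['A', 'B', 'A'], ['B', 'A', 'O'])],
         [(['A', 'B', 'A'], ['B', 'A', 'A']),
          (['A', 'B', 'O'], ['B', 'A', 'A']),
          (['A', 'B', 'A'], ['B', 'A', 'A'])],
         [['A', 'B', 'A'], ['A', 'B', 'O']],
         [['A', 'B', 'B']],
         [['B', 'A', 'A']]]
list2 = ['A', 'B', 'A']

leaves = list(iter_leaves(list1))
indexes = [i for i, leaf in enumerate(leaves) if leaf == list2]
print(indexes)      # [0, 2, 8, 10, 12, 16, 18]
print(len(leaves))  # 22
```

Flattening first and filtering afterwards keeps the count and the index search independent, so the function can be called repeatedly without resetting any state.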

Would it work for your implementation if we sort out your list1 into a more friendly format first? If so, you could do that in a pretty simple way:
Go through each element of list1, if the element is itself a big list of tuples, then we want to unpack further. If the element is a tuple (so the first element of that tuple is a list), or it is itself one of your 3-element lists, then we just want to append that as it is.
nice_list = []
for i in list1:
    if type(i[0]) == str or type(i[0]) == list:
        # i.e. i is one of your 3-element lists, or a tuple of lists
        nice_list.append(i)
    else:
        #If i is still a big list of other tuples, we want to unpack further
        for j in i:
            nice_list.append(j)
Then you could search for the indices much more easily:
for idx, i in enumerate(nice_list):
    if ['A', 'B', 'A'] in i:
        print(idx) #Or append them to a list, whatever you wanted to do
For a not-particularly-elegant solution to your question about finding how many 3-element lists there are, yes you could use a for loop:
no_of_lists = 0
for n in nice_list:
    if type(n) == tuple:
        no_of_lists += len(n)
    elif type(n) == list and type(n[0]) == list:
        # if it is a list of lists
        no_of_lists += len(n)
    elif type(n) == list and type(n[0]) == str:
        #if it is a 3-element list
        no_of_lists += 1
print('Number of 3-element lists contained:', no_of_lists)
Edit: to answer the question you asked in the comments about how the for n in nice_list part works, this just iterates through each element of the list. To explore this, try writing some code to print out nice_list[0], nice_list[1] etc, or a for loop which prints out each n so you can see what that looks like. For example, you could do:
for n in nice_list:
    print(n)
to understand how that's working.

Slightly unconventional approach, due to unknown depth, and/or lack of known array flattening operation - I would try with regex:
import re
def getPos(el, arr):
    el=re.escape(str(el))
    # raw f-string so that \( and \) reach the regex engine unchanged
    el=rf"(\({el})|({el}\))"
    i=0
    for s in re.finditer(r"\([^\)]+\)", str(arr)):
        if(re.match(el,s.group(0))):
            yield i
        i+=1
Which yields:
>>> print(list(getPos(list2, list1)))
[0, 1, 4, 5, 6, 8, 9]
(Which I believe is the actual result you want).

Remove common item in list of lists based on different index position

If I have a list of lists and I want to remove all the items after 'd', based on the index location of 'd' in each list, how would I do that if the index location of 'd' is different in each list?
Is there a better way than indexing?
ab_list = ['a', 'b', 'c' ,'d','e', 'f'], ['a', 'd', 'e', 'f', 'g']
loc=[]
for i in ab_list:
    loc.append(i.index('d'))
print(loc)
# output is [3, 1]
for i in ab_list:
    for l in loc:
        ab_list_keep=(i[0:l])
        print(ab_list_keep)
## output is
#['a', 'b', 'c']
#['a']
#['a', 'd', 'e']
#['a']
The first two lines of the output are what I want, but making a list out of the index locations of 'd' doesn't seem to be right.

Python's built-in itertools.takewhile function is designed for cases like this one:
import itertools
ab_list = ['a', 'b', 'c' ,'d','e', 'f'],['a', 'd', 'e', 'f', 'g']
print([list(itertools.takewhile(lambda i: i != "d", sublist)) for sublist in ab_list])
output:
[['a', 'b', 'c'], ['a']]
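If you prefer to stay with indexing, each sublist can be sliced at its own position of 'd' rather than looping over a shared list of positions. This is a minimal sketch; the helper name cut_before is hypothetical, and returning the whole list when 'd' is missing is an assumed choice.

```python
def cut_before(sublist, stop):
    """Return the items of sublist that come before the first `stop`;
    the whole list is returned if `stop` is absent."""
    try:
        return sublist[:sublist.index(stop)]
    except ValueError:  # list.index raises ValueError when stop is missing
        return list(sublist)

ab_list = ['a', 'b', 'c', 'd', 'e', 'f'], ['a', 'd', 'e', 'f', 'g']
print([cut_before(sub, 'd') for sub in ab_list])
```

The question's attempt went wrong because every sublist was sliced with every position; here each sublist only ever sees its own.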

Find the same elements from two lists and print the elements from both lists

There are two lists:
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
I want to find the same elements from these two lists, that is:
['a', 'c', 'e']
then I want to print out the element we found, for example, 'a' from both lists, that is: ['a', 'a', 'a'].
The result I want is as follows:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
I tried doing it this way:
c = []
for item_k in k:
    for item_j in l:
        if item_k== item_j:
            c.append(item_k)
            c.append(item_j)
However, the result is ['a', 'a', 'c', 'c', 'e', 'e']
I also tried this:
c=[]
for item_k in k:
    if item_k in l:
        c.append(item_k)
        d=l.count(item_k)
        c.append(item_k*d)
print(c)
But it does not work. Can anybody tell me how to do it? I would really appreciate your help.

result = [x for x in sorted(k + l) if x in k and x in l]
print(result)
results:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']

Since you want to pick up elements from both lists, the most straightforward way is probably to iterate over both while checking the other one (this can be optimized considerably if speed matters):
merged = []
for el in list1:
    if el in list2:
        merged.append(el)
for el in list2:
    if el in list1:
        merged.append(el)
If the order of the elements is important, you'll have to define an iteration order (in what order do you look at which element from which list?).

If the lists are sorted and you want the result to be sorted:
sorted([x for x in list1 if x in set(list2)] + [x for x in list2 if x in set(list1)] )

You can use set operations to intersect and then loop through, appending to a new list any that match the intersected list
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
common_list = list(set(k).intersection(set(l)))
all_results = []
for item in k:
    if item in common_list:
        all_results.append(item)
for item in l:
    if item in common_list:
        all_results.append(item)
print(sorted(all_results))
output:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']

Here's a compact way. Readability might suffer a little, but what fun are comprehensions without a little deciphering?
import itertools
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
combined = [letter for letter in itertools.chain(k,l) if letter in l and letter in k]
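The comprehension above scans both lists for every element. A sketch that computes the intersection once as a set and sorts the result to match the expected output (the variable name common is an assumption, and the elements are assumed to be hashable):

```python
from itertools import chain

k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']

common = set(k) & set(l)  # elements present in both lists
combined = sorted(x for x in chain(k, l) if x in common)
print(combined)  # ['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
```

Drop the sorted() call if you want the elements of k followed by the elements of l in their original order.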

Here is an implementation that matches your initial algorithm:
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l=['a', 'c', 'e']
c=[]
for x in l:
    count = 0
    for y in k:
        if x == y:
            count += 1
    # appends count + 1 copies: the matches in k plus the one in l
    while count>=0:
        c.append(x)
        count = count -1
print(c)
