Remove some duplicates from list in python

UPDATE: I believe I found the solution. I've put it at the end.
Let’s say we have this list:
a = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c']
I want to create another list to remove the duplicates from list a, but at the same time, keep the ratio approximately intact AND maintain order.
The output should be:
b = ['a', 'b', 'a', 'c']
EDIT: To explain better, the ratio doesn't need to stay exactly intact. All that's required is the output of ONE single letter for each thing a letter represents in the data. However, two runs of the same letter might represent two different things, and the counts are what identify this, as I explain later. Letters representing ONE unique variable appear between 3000 and 3400 times, so when I divide the total count by 3500 and round it, I know how many times the letter should appear in the end. The problem is that I don't know what order they should be in.
To illustrate this I'll include one more input and desired output:
Input: ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'a', 'a', 'd', 'd', 'a', 'a']
Desired Output: ['a', 'a', 'b', 'c', 'a', 'd', 'a']
Note that 'c' is repeated three times. The ratio need not be preserved exactly; all I need to represent is how many times that variable occurs, and because it appears only 3 times in this example, that isn't enough for it to count as two.
The only difference is that here I'm assuming all letters repeating exactly twice are unique, although in the data set, again, uniqueness depends on a letter appearing 3000-3400 times.
Note(1): This doesn't necessarily need to be considered, but there's a possibility that not all letters will be grouped together nicely. For example, taking 4 letters for uniqueness to keep it short: ['a','a','b','a','a','b','b','b','b'] should still be represented as ['a','b']. This is a minor problem in this case, however.
EDIT:
Example of what I've tried and successfully done:
full_list = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c']
# full_list is a list containing around 10k items, just using this as example
rep = 2  # number of estimated repetitions for a unique item,
         # in the real list this was set to 3500
quant = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0}
for x in set(full_list):
    quant[x] = round(full_list.count(x) / rep)
final = []
for x in range(len(full_list)):
    if full_list[x] in final:
        lastindex = len(full_list) - 1 - full_list[::-1].index(full_list[x])
        if lastindex == x and final.count(full_list[x]) < quant[full_list[x]]:
            final.append(full_list[x])
    else:
        final.append(full_list[x])
print(final)
My problem with the above code is two-fold:
If there are more than 2 repetitions of the same data, it will not count them correctly. For example: ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a'] should become ['a','b','a','c','a'] but instead it becomes ['a','b','c','a'].
It takes a very long time to finish, as I'm sure it's a very inefficient way to do this.
Final remark: The code I've tried was more of a little hack to achieve the desired output on the most common input, however it doesn't do exactly what I intended it to. It's also important to note that the input changes over time. Repetitions of single letters aren't always the same, although I believe they're always grouped together, so I was thinking of making a flag that is True when it hits a letter and becomes false as soon as it changes to a different one, but this also has the problem of not being able to account for the fact that two letters that are the same might be put right next to each other. The count for each letter as an individual is always between 3000-3400, so I know that if the count is above that, there are more than 1.
UPDATE: Solution
Following hiro protagonist's suggestion with minor modifications, the following code seems to work:
from itertools import groupby
full = ['a', 'a', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']
letters_pre = [key for key, _group in groupby(full)]
letters_post = []
for x in range(len(letters_pre)):
    if x > 0 and letters_pre[x] != letters_pre[x-1]:
        letters_post.append(letters_pre[x])
    if x == 0:
        letters_post.append(letters_pre[x])
print(letters_post)
One problem is that it doesn't consider that sometimes letters can appear in between unique ones, as described in "Note(1)", but that's only a very minor issue. The bigger issue is that it doesn't handle the case where two separate occurrences of the same letter are consecutive. For example (using two for uniqueness): ['a','a','a','a','b','b'] gets turned into ['a','b'] when the desired output is ['a','a','b'].

This is where itertools.groupby may come in handy:
from itertools import groupby
a = ["a", "a", "b", "b", "a", "a", "c", "c"]
res = [key for key, _group in groupby(a)]
print(res) # ['a', 'b', 'a', 'c']
This is a version where you can 'scale' down the unique keys (but are guaranteed to have at least one of each run in the result):
from itertools import groupby, repeat, chain
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'a', 'a',
     'd', 'd', 'a', 'a']
scale = 0.4
key_count = tuple((key, sum(1 for _item in group)) for key, group in groupby(a))
# (('a', 4), ('b', 2), ('c', 5), ('a', 2), ('d', 2), ('a', 2))
res = tuple(
    chain.from_iterable(
        (repeat(key, round(scale * count) or 1)) for key, count in key_count
    )
)
# ('a', 'a', 'b', 'c', 'c', 'a', 'd', 'a')
There may be smarter ways to determine the scale (probably based on the length of the input list a and the average group length).
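If the rough length of one run is known up front (the question mentions 3000-3400 repeats per unique item, divided by 3500), the scale could simply be 1 over that length. Here is a minimal sketch under that assumption; `compress_runs` and its `rep` parameter are names made up for illustration, not something itertools provides:

```python
from itertools import groupby


def compress_runs(items, rep):
    """Collapse each run of equal items to about len(run) / rep copies.

    `rep` is the assumed number of repeats that stand for one unique item.
    Every run yields at least one copy, even if it is much shorter than `rep`.
    """
    result = []
    for key, group in groupby(items):
        count = sum(1 for _ in group)        # length of this run
        copies = max(1, round(count / rep))  # never drop a run entirely
        result.extend([key] * copies)
    return result


print(compress_runs(['a', 'a', 'a', 'a', 'b', 'b'], rep=2))
# ['a', 'a', 'b']
```

Keep in mind that Python's round() sends exact halves to the nearest even number, so a run of 3 with rep=2 would give 2 copies. With real counts that sit well away from a half this should not matter. Short stray runs like the ones in Note(1) would still produce one copy each.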

Might be a strange one, but:
b = []
for i in a:
    if next(iter(b[::-1]), None) != i:
        b.append(i)
print(b)
Output:
['a', 'b', 'a', 'c']

Related

Enumerating all possible scenarios

I am trying to find all of the possible combinations for a set. Suppose I have 2 vehicles (A and B) and I want to use them by sending them and then return. Send and return are two distinct actions, and I want to enumerate all of the possible sequences of sending and returning this vehicle. Thus the set is [ A, A, B, B]. I use this code to enumerate:
from itertools import permutations
a = permutations(['A', 'A', 'B', 'B'])
# Collect the permutations
seq = []
for i in list(a):
    seq.append(i)
seq = list(set(seq))  # remove duplicates
The result is as follows:
('A', 'B', 'B', 'A')
('A', 'B', 'A', 'B')
('A', 'A', 'B', 'B')
('B', 'A', 'B', 'A')
('B', 'B', 'A', 'A')
('B', 'A', 'A', 'B')
Now suppose I assume the two vehicles are identical. Then it doesn't matter which one comes first (i.e. ABBA is the same as BAAB). Here's the result I expect:
('A', 'B', 'B', 'A')
('A', 'B', 'A', 'B')
('A', 'A', 'B', 'B')
I can do this easily by removing the last three elements. However, I run into a problem when I try to do the same thing for three vehicles (a = permutations(['A', 'A', 'B', 'B', 'C', 'C'])). How can I ensure that the result treats the three vehicles as identical?

One way would be to generate all the permutations, then keep only those where the first mention of each vehicle is in alphabetical order.
In recent versions of Python, dict retains first-insertion order, so we can use it to determine the first mention; something like:
from itertools import permutations
seq = set()
for i in permutations(['A', 'A', 'B', 'B']):
    first_mentions = {car: None for car in i}.keys()
    if list(first_mentions) == sorted(first_mentions):
        seq.add(i)
(This works in practice since CPython 3.6, and is officially guaranteed since Python 3.7.)
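To check that the same filter carries over to the three-vehicle case from the question, here is a small sketch. The helper name `canonical_sequences` is made up for this example; `dict.fromkeys` is used as a shorter way to get the first mentions in order.

```python
from itertools import permutations


def canonical_sequences(vehicles):
    """Keep only orderings whose first mentions appear in sorted order."""
    result = set()
    for perm in permutations(vehicles):
        # dict.fromkeys keeps keys in first-seen order (Python 3.7+)
        first_mentions = list(dict.fromkeys(perm))
        if first_mentions == sorted(first_mentions):
            result.add(perm)
    return sorted(result)


two = canonical_sequences(['A', 'A', 'B', 'B'])
three = canonical_sequences(['A', 'A', 'B', 'B', 'C', 'C'])
print(len(two), len(three))  # 3 15
```

The counts match what you would expect by hand: 6 distinct orderings divided by 2! relabelings for two vehicles, and 90 divided by 3! for three.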

from itertools import permutations
a = permutations(['A', 'A', 'B', 'B'])
seq = []
for i in list(a):
    if i[0] == 'A':
        seq.append(i)
seq = list(set(seq))
print(seq)
Try this; I think this should do it.

Function for creating a random order list in python

I am new to Python. For an experiment, I need to build a random selector function that determines the order of runs the athlete will perform. We will have four courses (A, B, C, D) and we want the athlete to perform these in random order. There will be a total of 12 runs for each athlete and each course must have 3 runs each session. How can I build this function?
This is what I have tried so far. It works, but I need to run the script several times until I get what I want. So if someone has a better idea, I would be really happy.
Best
Christian
import random
runs = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
diffCourses = ['A', 'B', 'C', 'D']
myRandom = []
for run in runs:
    x = random.choice(diffCourses)
    myRandom.append(x)
if myRandom.count('A') != 3 or myRandom.count('B') != 3 or myRandom.count('C') != 3 or myRandom.count('D') != 3:
    print('The run order does not satisfy the requirement')
else:
    print('satisfied')
print(myRandom)

To keep things simple, I would create the total set of runs first, then shuffle it:
from random import shuffle
diffCourses = ['A', 'B', 'C', 'D']
runs = diffCourses*3
shuffle(runs)
print(runs)
For example, it produces:
['C', 'C', 'D', 'C', 'D', 'A', 'A', 'D', 'B', 'B', 'B', 'A']
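Since the question asks for a function, the shuffle approach could be wrapped up roughly like this. The name `make_run_order` and the optional `seed` argument are my own additions for illustration (a seed makes a draw reproducible, which can be handy when documenting an experiment):

```python
import random
from collections import Counter


def make_run_order(courses, runs_per_course, seed=None):
    """Return a shuffled list in which every course appears runs_per_course times."""
    rng = random.Random(seed)  # separate generator so the seed stays local
    order = list(courses) * runs_per_course
    rng.shuffle(order)
    return order


order = make_run_order(['A', 'B', 'C', 'D'], runs_per_course=3)
print(order)
print(Counter(order))  # each course should show a count of 3
```

Because the list is built first and only then shuffled, the counts are right by construction and the script never has to be re-run.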

You choose a random ordering of A,B,C,D three times and collect them into a run:
import random
diffCourses = ['A', 'B', 'C', 'D']
runs = [a for b in (random.sample(diffCourses, k=4) for _ in range(3)) for a in b]
print(runs)
Output (additional spaces added between each sample):
['A', 'D', 'C', 'B',   'A', 'B', 'D', 'C',   'B', 'A', 'D', 'C']
The random.sample(diffCourses, k=4) part shuffles ABCD in a random fashion and the nested list comprehension creates a flat list from the three sublists.
This automatically ensures you get every letter thrice and in random order. You might still get A A back to back if A comes last in one sample and first in the next.
See "What does "list comprehension" mean? How does it work and how can I use it?" and "Understanding nested list comprehension" for how list comps work.

Index of a list item that occurs multiple times

I have the following code
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for x in items:
    print(x, end='')
    print(items.index(x), end='')
## outputs: a0a0b2a0c4c4d6
I understand that python finds the first item in the list to index, but is it possible for me to get an output of a0a1b2a3c4c5d6 instead?
It would be optimal for me to keep using the for loop because I will be editing the list.
edit: I made a typo with the c indexes

And in case you really feel like doing it in one line:
EDIT - using .format or format-strings makes this shorter / more legible, as noted in the comments
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join("{}{}".format(e,i) for i,e in enumerate(items)))
For Python 3.6 and later (where f-strings are available) you can do
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join(f"{e}{i}" for i, e in enumerate(items)))
ORIGINAL
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
print("".join((str(e) for item_with_index in enumerate(items) for e in item_with_index[::-1])))
Note that the reversal is needed (item_with_index[::-1]) because you want the items printed before the index but enumerate gives tuples with the index first.

I think you're looking for a0a1b2a3c4c5d6 instead.
for i, x in enumerate(items):
    print("{}{}".format(x, i), end='')

Don't add or remove items from your list as you are traversing it. If you want the output specified, you can use enumerate to get the items and the indices of the list.
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']
for idx, x in enumerate(items):
    print("{}{}".format(x, idx), end='')
# outputs a0a1b2a3c4c5d6
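If the list does need editing, one common pattern is to build a new list from enumerate instead of changing the one being looped over. A small sketch; the particular edit (dropping index 3 and upper-casing the rest) is just a made-up example, not something from the question:

```python
items = ['a', 'a', 'b', 'a', 'c', 'c', 'd']

# Hypothetical edit: skip the item at index 3, upper-case everything else.
drop_index = 3
edited = [x.upper() for idx, x in enumerate(items) if idx != drop_index]

# The original list is untouched, so its indices are still valid here.
labels = "".join(f"{x}{idx}" for idx, x in enumerate(items))
print(labels)  # a0a1b2a3c4c5d6
print(edited)  # ['A', 'A', 'B', 'C', 'C', 'D']
```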

How to sort a list alphabetically by treating same letters in different case as same in python

If input is like ['z','t','Z','a','b','A','d'],then after sorting I want to get output like ['a','A','b','d','t','z','Z'] or ['A','a','b','d','t','Z','z'].

This will always sort the upper-case letter first:
lst = ['z','t','Z','a','b','A','d']
print(sorted(lst, key=lambda k: 2*ord(k.lower()) + k.islower()))
Prints:
['A', 'a', 'b', 'd', 't', 'Z', 'z']
EDIT: Thanks to @MadPhysicist in the comments, another variant:
print(sorted(lst, key=lambda k: (k.lower(), k.islower())))

There are two options on how this sorting could be done. Option 1 is stable, meaning that the order of elements is preserved regardless of case:
['A', 'b', 'a', 'B'] -> ['A', 'a', 'b', 'B']
The other option is to always put uppercase before or after lowercase:
['A', 'b', 'a', 'B'] -> ['A', 'a', 'B', 'b'] or ['a', 'A', 'b', 'B']
Both are possible with the key argument to list.sort (or the builtin sorted).
A stable sort is simply:
['A', 'b', 'a', 'B'].sort(key=str.lower)
A fully ordered sort requires you to check the original status of the letter, in addition to comparing the lowercased values:
['A', 'b', 'a', 'B'].sort(key=lambda x: (x.lower(), x.islower()))
This uses the fact that tuples are compared lexicographically, i.e. element by element. The first difference determines the order. If two letters have different values for x.lower(), they will be sorted as usual. If they have the same lowercase representation, x.islower() will be compared. Since uppercase letters return False (0) and lowercase letters return True (1), lowercase letters will come after uppercase. To switch that, invert the sense of the comparison:
['A', 'b', 'a', 'B'].sort(key=lambda x: (x.lower(), not x.islower()))
OR
['A', 'b', 'a', 'B'].sort(key=lambda x: (x.lower(), x.isupper()))
OR
['A', 'b', 'a', 'B'].sort(key=lambda x: (x.lower(), -x.islower()))
etc...
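One caveat when trying these out: list.sort sorts in place and returns None, so calling it on a list literal throws the result away. A quick sketch using sorted instead, so the three keys can be compared side by side (the variable names are only for this example):

```python
letters = ['A', 'b', 'a', 'B']

# Case-insensitive only: ties keep their original relative order (stable sort).
stable = sorted(letters, key=str.lower)

# Tie-break on case: False sorts before True.
upper_first = sorted(letters, key=lambda x: (x.lower(), x.islower()))
lower_first = sorted(letters, key=lambda x: (x.lower(), x.isupper()))

print(stable)       # ['A', 'a', 'b', 'B']
print(upper_first)  # ['A', 'a', 'B', 'b']
print(lower_first)  # ['a', 'A', 'b', 'B']
```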

You could use sorted's (or list.sort's) extra keyword - key. You can pass to key a function according to which the sort will be performed. So for example:
l = ['z','t','Z','a','b','A','d']
print(sorted(l, key=str.lower))
Gives:
['a', 'A', 'b', 'd', 't', 'z', 'Z']
Note: this does not enforce any particular order between the upper- and lower-case forms of the same letter. Because the sort is stable, they keep their relative order from the original input.

python: compare lists in a sequence using nested for loops

So I have two lists where I compare a person's answers to the correct answers:
correct_answers = ['A', 'C', 'A', 'B', 'D']
user_answers = ['B', 'A', 'C', 'B', 'D']
I need to compare the two of them (without using sets, if that's even possible) and keep track of how many of the person's answers are wrong (in this case, 3).
I tried using the following for loops to count how many were correct:
correct = 0
for i in correct_answers:
    for j in user_answers:
        if i == j:
            correct += 1
print(correct)
but this doesn't work and I'm not sure what I need to change to make it work.

Just count them:
correct_answers = ['A', 'C', 'A', 'B', 'D']
user_answers = ['B', 'A', 'C', 'B', 'D']
incorrect = sum(1 if correct != user else 0
                for correct, user in zip(correct_answers, user_answers))

I blame @alecxe for convincing me to post this, the ultra-efficient solution:
# Python 2 only: from future_builtins import map  (lazy map, avoids intermediate lists)
# On Python 3 the built-in map is already lazy, so no extra import is needed.
from operator import ne
numincorrect = sum(map(ne, correct_answers, user_answers))
This pushes all the work to the C layer (making it crazy fast, modulo the initial cost of setting it all up; no byte code is executed if the values processed are Python built-in types, which removes a lot of overhead), and one-lines it without getting too cryptic.

The less pythonic, more generic (and readable) solution is pretty simple too.
correct_answers = ['A', 'C', 'A', 'B', 'D']
user_answers = ['B', 'A', 'C', 'B', 'D']
incorrect = 0
for i in range(len(correct_answers)):
    if correct_answers[i] != user_answers[i]:
        incorrect += 1
This assumes your lists are the same length. If you need to validate that, you can do it before running this code.
EDIT: The following code does the same thing, provided you are familiar with zip:
correct_answers = ['A', 'C', 'A', 'B', 'D']
user_answers = ['B', 'A', 'C', 'B', 'D']
incorrect = 0
for answer_tuple in zip(correct_answers, user_answers):
    if answer_tuple[0] != answer_tuple[1]:
        incorrect += 1
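For the length validation mentioned above, something along these lines might do. The function name `count_incorrect` and the choice of raising ValueError are my own; zip on its own would silently stop at the shorter list, which is why the check comes first.

```python
def count_incorrect(correct_answers, user_answers):
    """Count positions where the user's answer differs from the correct one."""
    if len(correct_answers) != len(user_answers):
        raise ValueError("answer lists must have the same length")
    # Each comparison is True (1) or False (0), so sum() counts mismatches.
    return sum(c != u for c, u in zip(correct_answers, user_answers))


correct_answers = ['A', 'C', 'A', 'B', 'D']
user_answers = ['B', 'A', 'C', 'B', 'D']
print(count_incorrect(correct_answers, user_answers))  # 3
```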
