"Balancing" a list of symbols - python

Consider a list with elements drawn from a set of symbols, e.g. {A, B, C}:
List --> A, A, B, B, A, A, A, A, A, B, C, C, B, B
Indexing indices --> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
How can I re-order this list so that, for every symbol, approximately half of its occurrences fall in the first half of the list, i.e. indices [0, N/2), and the rest in the second half, i.e. indices [N/2, N)?
Note that there could be multiple solutions to this problem. We also want to compute the resulting list of indices of the permutation, so that we can apply the new ordering to any list associated with the original one.
Is there a name for this problem? Any efficient algorithms for it? Most of the solutions I can think of are very brute-force.

You can use a dictionary here; this takes O(N) time:
from collections import defaultdict
lst = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', ' C', ' C', ' B', 'B']
d = defaultdict(list)
for i, x in enumerate(lst):
    d[x].append(i)
items = []
indices = []
for k, v in d.items():
    n = len(v)//2
    items.extend([k]*n)
    indices.extend(v[:n])
for k, v in d.items():
    n = len(v)//2
    items.extend([k]*(len(v)-n))
    indices.extend(v[n:])
print items
print indices
Output:
['A', 'A', 'A', ' C', 'B', 'B', 'A', 'A', 'A', 'A', ' C', 'B', 'B', ' B']
[0, 1, 4, 10, 2, 3, 5, 6, 7, 8, 11, 9, 13, 12]
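For reference, here is a minimal Python 3 sketch of the same dictionary idea that returns only the permutation, so it can be applied to any associated list. The function name balance and the tie-breaking rule (an odd leftover goes to whichever half is currently shorter) are my own assumptions, not part of the answer above.

```python
from collections import defaultdict

def balance(symbols):
    """Return a permutation of indices that splits each symbol across both halves."""
    positions = defaultdict(list)
    for i, s in enumerate(symbols):
        positions[s].append(i)

    first, second = [], []
    for idxs in positions.values():
        n = len(idxs) // 2
        # Assumption: for an odd count, give the extra item to the shorter half
        # so the two halves stay close to N/2 elements each.
        if len(idxs) % 2 and len(first) <= len(second):
            n += 1
        first.extend(idxs[:n])
        second.extend(idxs[n:])
    return first + second

symbols = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
perm = balance(symbols)
reordered = [symbols[i] for i in perm]
print(reordered)
print(perm)
```

The same perm can be used to reorder any other list of the same length, e.g. [other[i] for i in perm].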

You can do this by getting the rank order of the symbols, then picking alternate ranks for each half of the output array:
import numpy as np
x = np.array(['A', 'A', 'B', 'B', 'A', 'A', 'A',
              'A', 'A', 'B', 'C', 'C', 'B', 'B'])
order = np.argsort(x)
idx = np.r_[order[0::2], order[1::2]]
print(x[idx])
# ['A' 'A' 'A' 'A' 'B' 'B' 'C' 'A' 'A' 'A' 'B' 'B' 'B' 'C']
print(idx)
# [ 0 4 6 8 3 12 10 1 5 7 2 9 13 11]
By default np.argsort uses a quicksort-style algorithm, with average time complexity O(N log N). The indexing step is O(N), since it copies every element once, so the sort dominates the total cost.
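The alternate-rank idea does not strictly need NumPy. Below is a rough pure-Python sketch of it, assuming the symbols are comparable with each other; the variable names are mine. Python's sorted is stable, so equal symbols keep their original relative order.

```python
symbols = ['A', 'A', 'B', 'B', 'A', 'A', 'A',
           'A', 'A', 'B', 'C', 'C', 'B', 'B']

# Rank order: indices of the list sorted by symbol (stable sort).
order = sorted(range(len(symbols)), key=symbols.__getitem__)

# Even ranks go to the first half, odd ranks to the second half.
idx = order[0::2] + order[1::2]

reordered = [symbols[i] for i in idx]
print(reordered)
print(idx)
```

Because each run of equal symbols is dealt out alternately, the two halves differ by at most one occurrence per symbol.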

You can use collections.Counter which is even better than just a defaultdict -- and you can place items into the first half and second half separately. That way, if you prefer, you can shuffle the first half and second half as much as you want (and just keep track of the shuffling permutation, with e.g. NumPy's argsort).
import collections
L = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
idx_L = list(enumerate(L))
ctr = collections.Counter(L)
fh = []
fh_idx = []
sh = []
sh_idx = []
for k, v in ctr.iteritems():
    idxs = [i for i,e in idx_L if e == k]
    fh = fh + [k for i in range(v//2)]
    fh_idx = fh_idx + idxs[:v//2]
    sh = sh + [k for i in range(v // 2, v)]
    sh_idx = sh_idx + idxs[v//2:]
shuffled = fh + sh
idx_to_shuffled = fh_idx + sh_idx
print shuffled
print idx_to_shuffled
which gives
['A', 'A', 'A', 'C', 'B', 'B', 'A', 'A', 'A', 'A', 'C', 'B', 'B', 'B']
[0, 1, 4, 10, 2, 3, 5, 6, 7, 8, 11, 9, 12, 13]

Shuffle the list with the indices, then split it in half. This method won't perfectly split the symbols every time, but as the number of repeats of each symbol gets larger, it will approach a perfect split.
import random
symbols = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
indices = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
both = zip(symbols, indices)
random.shuffle(both)
symbols2, indices2 = zip(*both)
print symbols2
print indices2
Some sample outputs:
Trial #1:
('A', 'C', 'B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C')
( 7, 10, 2, 4, 1, 13, 8, 0, 5, 6, 9, 3, 12, 11)
# |
Trial #2
('A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'C', 'A', 'A', 'B', 'B')
( 6, 0, 9, 3, 11, 1, 8, 4, 13, 10, 7, 5, 2, 12)
# |
Trial #3
('A', 'A', 'C', 'C', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B')
( 4, 5, 11, 10, 2, 3, 0, 13, 12, 6, 7, 8, 1, 9)
# |
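To see how far a given shuffle is from a perfect split, you could measure the per-symbol difference between the two halves. This is only a sketch in Python 3; half_imbalance is a hypothetical helper name, and the fixed seed is there only to make the run repeatable.

```python
import random
from collections import Counter

def half_imbalance(symbols):
    """For each symbol: count in the first half minus count in the second half."""
    mid = len(symbols) // 2
    first, second = Counter(symbols[:mid]), Counter(symbols[mid:])
    # Counter returns 0 for symbols missing from one half.
    return {s: first[s] - second[s] for s in set(symbols)}

symbols = ['A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'B', 'B']
pairs = list(zip(symbols, range(len(symbols))))  # list() is needed in Python 3
random.Random(0).shuffle(pairs)
shuffled, perm = zip(*pairs)
print(half_imbalance(shuffled))
```

Values close to zero mean the shuffle happened to split that symbol evenly; you could reshuffle until every value is within a tolerance you choose.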

Related

Convert a list of string to category integer in Python

Given a list of strings,
['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
I would like to convert to an integer-category form
[0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
This can be achieved using numpy.unique as below:
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
_, opt = np.unique(np.array(ipt), return_inverse=True)
But I am curious whether there is an alternative that does not require importing numpy.
If you are solely interested in finding integer representation of factors, then you can use a dict comprehension along with enumerate to store the mapping, after using set to find unique values:
lst = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
d = {x: i for i, x in enumerate(set(lst))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [3, 3, 0, 3, 3, 3, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 1, 1, 1, 2, 1, 1, 1]
This approach can be used for general factors, i.e., the factors do not have to be 'a', 'b' and so on, but can be 'dog', 'bus', etc. One drawback is that the integer assigned to each factor is arbitrary, because a set has no defined order. If you want the integers to follow the sorted order of the factors, you can use sorted:
d = {x: i for i, x in enumerate(sorted(set(lst)))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
You could take a note out of the functional programming book:
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
opt = list(map(lambda x: ord(x)-97, ipt))
This code iterates through the input array and passes each element through the lambda function, which takes the ascii value of the character, and subtracts 97 (to convert the characters to 0-25).
If each string isn't a single character, then the lambda function may need to be adapted.
You could write a custom function to do the same thing as you are using numpy.unique() for.
def unique(my_list):
    ''' Takes a list and returns two lists: a list of the unique entries and, for each
    entry of the original list, its index in the unique list
    '''
    unique_list = []
    int_cat = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Or if you wanted your indexing to be ordered.
def unique_ordered(my_list):
    ''' Takes a list and returns two lists: a sorted list of the unique entries and, for
    each entry of the original list, its index in the sorted unique list
    '''
    # Unique list
    unique_list = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
    # Sorting unique list alphabetically
    unique_list.sort()
    # Integer category list
    int_cat = []
    for item in my_list:
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Comparing the computation time for these two vs numpy.unique() for 100,000 iterations of your example list, we get:
numpy = 2.236004s
unique = 0.460719s
unique_ordered = 0.505591s
This shows that either option would be faster than numpy for simple lists. More complicated strings slow down unique() and unique_ordered() much more than numpy.unique(). Doing 10,000 iterations of a random, 100-element list of 20-character strings, we get times of:
numpy = 0.45465s
unique = 1.56963s
unique_ordered = 1.59445s
So if efficiency was important and your list had more complex/a larger variety of strings, it would likely be better to use numpy.unique()
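One likely reason the list-based functions slow down with many distinct strings is that `item not in unique_list` and `unique_list.index(item)` both scan the list. A dictionary lookup avoids that scan. The sketch below is my own variant (unique_dict is a hypothetical name), and I have not benchmarked it against the numbers above.

```python
def unique_dict(my_list):
    """Return (unique entries in first-seen order, integer code for each entry)."""
    codes = {}
    # setdefault stores len(codes) as the code for a new item and
    # returns the existing code for an item seen before.
    int_cat = [codes.setdefault(item, len(codes)) for item in my_list]
    return list(codes), int_cat

ipt = ['a', 'a', 'c', 'a', 'd', 'c', 'b', 'b', 'd']
uniques, opt = unique_dict(ipt)
print(uniques)
print(opt)
```

As with unique() above, the codes follow first appearance, not alphabetical order; sort the keys first if you need the numpy.unique numbering.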

Python Iterating through two lists only iterates through last element

I am trying to iterate through a double list but am getting the incorrect results. I am trying to get the count of each element in the list.
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = words.count(letters)
for x in dict:
    print(x + ":" + str(dict[x]))
at the moment, I am getting:
<s>:1
a:1
b:2
c:2
</s>:1
It seems as if it is only iterating through the last list in 'l' : ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']
but I am trying to get:
<s>: 3
a: 4
b: 5
c: 6
</s>:3
In each inner for loop, you are not adding to the current value of dict[letters]; you are setting it to the count for the current sublist (peculiarly named words).
Fixing your code with a vanilla dict:
>>> l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
>>> d = {}
>>> for sublist in l:
...     for x in sublist:
...         d[x] = d.get(x, 0) + 1
...
>>> d
{'<s>': 3, 'a': 4, 'b': 5, 'c': 6, '</s>': 3}
Note that I am not calling list.count in each inner for loop. Calling count will iterate over the whole list again and again. It is far more efficient to just add 1 every time a value is seen, which can be done by looking at each element of the (sub)lists exactly once.
Using a Counter.
>>> from collections import Counter
>>> Counter(x for sub in l for x in sub)
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
Using a Counter and not manually unnesting the nested list:
>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(l))
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
The dictionary entry is overwritten on every iteration; it should be incremented instead:
count_dict[letters] += 1
Initialize the dictionary with defaultdict so that missing keys start at 0:
from collections import defaultdict
count_dict = defaultdict(int)
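Put together, the defaultdict approach could look like the sketch below. Note that it adds 1 per element seen; adding words.count(letters) for every element would count a letter that repeats within a sublist more than once.

```python
from collections import defaultdict

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

count_dict = defaultdict(int)  # missing keys start at 0
for words in l:
    for letter in words:
        count_dict[letter] += 1

for key, n in count_dict.items():
    print(key + ":" + str(n))
```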
As @Vishnudev said, you must add to the current count. But dict[letters] must already exist, otherwise you'll get a KeyError exception. You can use the get method of dict with a default value to avoid this:
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = dict.get(letters, 0) + 1
As per your question, you seem to know that it only takes on the result of the last sublist. This happens because after every iteration your previous dictionary values are replaced and overwritten by the next iteration values. So, you need to maintain the previous states values and add it to the newly calculated values.
You can try this-
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
d={}
for lis in l:
    for x in lis:
        if x in d:
            d[x]+=1
        else:
            d[x]=1
So the resulting dictionary d will be as-
{'<s>': 3, 'a': 4, 'c': 6, 'b': 5, '</s>': 3}
I hope this helps!

python: output data from a list

I'm trying to figure out how to output list items. The code below takes answers and checks them against a key to see which answers are correct. For each student, the number of correct answers is stored in correct_count. Then I sort in ascending order based on the correct count.
def main():
    answers = [
        ['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
        ['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
        ['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
        ['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
    keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
    grades = []
    # Grade all answers
    for i in range(len(answers)):
        # Grade one student
        correct_count = 0
        for j in range(len(answers[i])):
            if answers[i][j] == keys[j]:
                correct_count += 1
        grades.append([i, correct_count])
        grades.sort(key=lambda x: x[1])
        # print("Student", i, "'s correct count is", correct_count)
if __name__ == '__main__':
    main()
If I print out grades, the output looks like this:
[[0, 7]]
[[1, 6], [0, 7]]
[[2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
What I'm interested in is the last row. The first number of each pair is a student id, and the row is sorted in ascending order by the second number, which is the grade (4, 5, 6, 7, 7, 7, 7, 8).
I'm not sure how to grab that last row and iterate through it so that I get output like
student 3 has a grade of 4 and student 2 has a grade of 5
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
def main():
    answers = [
        ['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
        ['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
        ['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
        ['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
    keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
    grades = []
    # Grade all answers
    for i in range(len(answers)):
        # Grade one student
        correct_count = 0
        for j in range(len(answers[i])):
            if answers[i][j] == keys[j]:
                correct_count += 1
        grades.append([i, correct_count])
    grades.sort(key=lambda x: x[1])
    for student, correct in grades:
        print("Student", student,"'s correct count is", correct)
if __name__ == '__main__':
    main()
You were printing grades while still inside the loop. Had you printed grades after both loops, you would have seen only the last line: [[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]. You can then loop through grades, and Python will "unpack" each inner list into the student and the grade, respectively, as shown above.
Here is the output:
Student 3 's correct count is 4
Student 2 's correct count is 5
Student 1 's correct count is 6
Student 0 's correct count is 7
Student 5 's correct count is 7
Student 6 's correct count is 7
Student 7 's correct count is 7
Student 4 's correct count is 8
Don't forget to click the check mark if you like this answer.
What about something like the following:
students_grade = {}
for id, ans in enumerate(answers):
    students_grade[id] = sum([x == y for x, y in zip(ans, keys)])
Now you have a dictionary with the id of students mapping to their score ;)
Of course, you can change the enumerate to have the true list of ids instead!
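For instance, with a real list of ids you could zip the ids with the answers instead of using enumerate. In this sketch the ids in student_ids are made up for illustration, and only the first three answer rows from the question are used.

```python
answers = [
    ['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
    ['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
    ['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D']]
keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
student_ids = ['s101', 's102', 's103']  # hypothetical ids

# True counts as 1 when summed, so this is the number of matching answers.
students_grade = {
    sid: sum(a == k for a, k in zip(ans, keys))
    for sid, ans in zip(student_ids, answers)
}

# Print in ascending order of score.
for sid, score in sorted(students_grade.items(), key=lambda kv: kv[1]):
    print("Student", sid, "has a grade of", score)
```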
While MMelvin0581 already addressed the problem in your code, you can also use a nested list comprehension to achieve the same results:
>>> out = [(a, sum([1 if k == i else 0 for k, i in zip(keys, j)])) for a, j in enumerate(answers)]
This will produce a list like:
[(0, 7), (1, 6), (2, 5), (3, 4), (4, 8), (5, 7), (6, 7), (7, 7)]
Then you can sort your results based on the criteria:
>>> from operator import itemgetter
>>> sorted_list = sorted(out, key=itemgetter(1))
Note: itemgetter may have a slight performance benefit over lambda. The sorted list looks like:
[(3, 4), (2, 5), (1, 6), (0, 7), (5, 7), (6, 7), (7, 7), (4, 8)]
Then finally print your list like:
for item in sorted_list:
    print("Student: {} Scored: {}".format(item[0],item[1]))

printing corresponding item in list to another list containing numbers

My problem is this.
These are the two lists
codes = ['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l']
pas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
How would I find the positions of all the 'a' entries in the codes list and then print the corresponding items in the pas list?
This is what the output should be. They should also be sorted with the .sort() function.
1
4
8
11
I have come up with this code (which doesn't work):
qwer = [i for i,x in enumerate(codes) if x == common]
qwe = [qwer[i:i+1] for i in range(0, len(qwer), 1)]
print(pas[qwe])
What would be the best way to get the correct output?
>>> pas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> codes = ['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l']
>>> result = sorted(i for i,j in zip(pas,codes) if j=='a')
>>> for i in result:
...     print i
...
1
4
8
11
There are many ways to achieve it. Your example lists are:
>>> codes = ['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l']
>>> pas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Approach 1: Using enumerate:
>>> indices = [pas[i] for i, x in enumerate(codes) if x == "a"]
>>> indices
[1, 4, 8, 11]
Approach 2: Using zip:
>>> [p for p, c in zip(pas, codes) if c == 'a']
[1, 4, 8, 11]
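Since the question also asks for the result to be sorted with .sort(), here is a small sketch of the zip approach followed by an in-place sort. The pas list is deliberately reversed here (my assumption, not the question's data) so that the sort actually changes something.

```python
codes = ['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l']
pas = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # reversed for illustration

# Keep the pas value wherever the matching code is 'a'.
matches = [p for p, c in zip(pas, codes) if c == 'a']
matches.sort()  # sorts the list in place and returns None

for value in matches:
    print(value)
```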
Here is another way, using numpy:
import numpy as np
codes = np.array(['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l'])
pas = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
index = np.where(codes=='a')
values = pas[index]
In [122]: print(values)
[ 1 4 8 11]
codes = ['a', 'b', 'c', 'a', 'e', 'f', 'g', 'a', 'i', 'j', 'a', 'l']
pas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[pas[index] for index, element in enumerate(codes) if element == "a"]

Mapping one value to another in a list

In a Python list, how can I map all instances of one value to another value?
For example, suppose I have this list:
x = [1, 3, 3, 2, 3, 1, 2]
Now, perhaps I want to change all 1's to 'a', all 2's to 'b', and all 3's to 'c', to create another list:
y = ['a', 'c', 'c', 'b', 'c', 'a', 'b']
How can I do this mapping elegantly?
You should use a dictionary and a list comprehension:
>>> x = [1, 3, 3, 2, 3, 1, 2]
>>> d = {1: 'a', 2: 'b', 3: 'c'}
>>> [d[i] for i in x]
['a', 'c', 'c', 'b', 'c', 'a', 'b']
>>>
>>> x = [True, False, True, True, False]
>>> d = {True: 'a', False: 'b'}
>>> [d[i] for i in x]
['a', 'b', 'a', 'a', 'b']
>>>
The dictionary serves as a translation table of what gets converted into what.
An alternative solution is to use the built-in function map which applies a function to a list:
>>> x = [1, 3, 3, 2, 3, 1, 2]
>>> subs = {1: 'a', 2: 'b', 3: 'c'}
>>> list(map(subs.get, x)) # list() not needed in Python 2
['a', 'c', 'c', 'b', 'c', 'a', 'b']
Here the dict.get method was applied to the list x and each number was exchanged for its corresponding letter in subs.
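One difference from the d[i] version is worth knowing: dict.get returns None for a value that is missing from the table instead of raising KeyError. A short sketch, using an input with an unmapped value (4) that I added for illustration:

```python
x = [1, 3, 4, 2]                      # 4 has no entry in subs
subs = {1: 'a', 2: 'b', 3: 'c'}

with_none = list(map(subs.get, x))            # missing values become None
with_default = [subs.get(i, '?') for i in x]  # explicit placeholder
keep_original = [subs.get(i, i) for i in x]   # leave unmapped values unchanged

print(with_none)
print(with_default)
print(keep_original)
```

Use [subs[i] for i in x] instead if an unmapped value should be treated as an error.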
In [255]: x = [1, 3, 3, 2, 3, 1, 2]
In [256]: y = ['a', 'c', 'c', 'b', 'c', 'a', 'b']
In [257]: [dict(zip(x,y))[i] for i in x]
Out[257]: ['a', 'c', 'c', 'b', 'c', 'a', 'b']
