Generate a list with 6-dimensional unique elements in Python

I need a list of unique 6-character elements, like 000001, 000002, 000003 etc. They don't necessarily have to be digits; they can be strings, like AAAAAA, AAAAAB, ABCDEF etc.
If I generate a list with np.arange() I won't get 6-character elements. So far I have only thought of using 'for' loops,
but I think there are much more convenient ways to do this.

You need the Cartesian product of the string "ABCDEF" with itself, taken five times (in other words, the product of six identical strings). It can be calculated with the product() function from the itertools module. The result is an iterator of 6-tuples of individual characters, which are converted to strings with join().
from itertools import product
symbols = "ABCDEF"
[''.join(x) for x in product(*([symbols] * len(symbols)))]
#['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE',
# 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD',...
# 'FFFFFA', 'FFFFFB', 'FFFFFC', 'FFFFFD', 'FFFFFE', 'FFFFFF']
You can change the value of symbols to any other combination of distinct characters.
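The same product can also be written with the repeat argument of product(), which avoids building the argument list by hand. A minimal sketch, assuming Python 3; the names length, words and first_ten are illustrative and not from the answer above:

```python
from itertools import islice, product

symbols = "ABCDEF"
length = 6  # illustrative name: number of characters per element

# product(symbols, repeat=length) is equivalent to product(symbols, symbols, ...)
words = [''.join(chars) for chars in product(symbols, repeat=length)]

print(len(words))   # 6 ** 6 = 46656 unique strings
print(words[:3])    # ['AAAAAA', 'AAAAAB', 'AAAAAC']

# If only the first few elements are needed, slice the lazy iterator instead
first_ten = [''.join(chars) for chars in islice(product(symbols, repeat=length), 10)]
print(first_ten[-1])  # 'AAAABD'
```

Slicing the iterator with islice avoids materialising all 46656 strings when only a handful are required.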

You can use the function combinations_with_replacement():
from itertools import combinations_with_replacement
list(map(''.join, combinations_with_replacement('ABC', r=3)))
# ['AAA', 'AAB', 'AAC', 'ABB', 'ABC', 'ACC', 'BBB', 'BBC', 'BCC', 'CCC']
If you need all possible combinations use the function product():
from itertools import product
list(map(''.join, product('ABC', repeat=3)))
# ['AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB', 'ACC', 'BAA', 'BAB', 'BAC', 'BBA', 'BBB', 'BBC', 'BCA', 'BCB', 'BCC', 'CAA', 'CAB', 'CAC', 'CBA', 'CBB', 'CBC', 'CCA', 'CCB', 'CCC']
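For the numeric form mentioned in the question (000001, 000002, ...), a format specification with zero padding may be simpler than a product. A small sketch; the size of 20 and the names count and codes are arbitrary choices for illustration:

```python
# Zero-padded, 6-character decimal strings; each integer yields a distinct string
count = 20  # arbitrary illustrative size
codes = [f"{i:06d}" for i in range(1, count + 1)]

print(codes[:3])   # ['000001', '000002', '000003']
print(codes[-1])   # '000020'
```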

You can use np.unravel_index to get an index array:
import numpy as np
idx = np.array(np.unravel_index(np.arange(30000), 6*(6,)), order='F').T
idx
# array([[0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 1],
# [0, 0, 0, 0, 0, 2],
# ...,
# [3, 5, 0, 5, 1, 3],
# [3, 5, 0, 5, 1, 4],
# [3, 5, 0, 5, 1, 5]])
You can replace the indices with more or less anything you like afterwards:
symbols = np.fromiter('ABCDEF', 'U1')
symbols
# array(['A', 'B', 'C', 'D', 'E', 'F'], dtype='<U1')
symbols[idx]
# array([['A', 'A', 'A', 'A', 'A', 'A'],
# ['A', 'A', 'A', 'A', 'A', 'B'],
# ['A', 'A', 'A', 'A', 'A', 'C'],
# ...,
# ['D', 'F', 'A', 'F', 'B', 'D'],
# ['D', 'F', 'A', 'F', 'B', 'E'],
# ['D', 'F', 'A', 'F', 'B', 'F']], dtype='<U1')
If you need the result as a list of words:
final = symbols[idx].view('U6').ravel().tolist()
final[:20]
# ['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE', 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD', 'AAAABE', 'AAAABF', 'AAAACA', 'AAAACB', 'AAAACC', 'AAAACD', 'AAAACE', 'AAAACF', 'AAAADA', 'AAAADB']

Related

Convert a list of string to category integer in Python

Given a list of string,
['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
I would like to convert to an integer-category form
[0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
This can be achieved using numpy.unique as below
import numpy as np
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
_, opt = np.unique(np.array(ipt), return_inverse=True)
But I am curious whether there is an alternative that does not require importing numpy.
If you are solely interested in finding integer representation of factors, then you can use a dict comprehension along with enumerate to store the mapping, after using set to find unique values:
lst = ['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
d = {x: i for i, x in enumerate(set(lst))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [3, 3, 0, 3, 3, 3, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 1, 1, 1, 2, 1, 1, 1]
# (one possible result: set iteration order is arbitrary, so the exact integers can differ between runs)
This approach can be used for general factors, i.e., the factors do not have to be 'a', 'b' and so on, but can be 'dog', 'bus', etc. One drawback is that it does not care about the order of factors. If you want the integers to follow the sorted order of the factors, you can use sorted:
d = {x: i for i, x in enumerate(sorted(set(lst)))}
lst_new = [d[x] for x in lst]
print(lst_new)
# [0, 0, 2, 0, 0, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 1, 1, 1, 3, 1, 1, 1]
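If "order" should instead mean order of first appearance (the first distinct value gets 0, the next gets 1, and so on), dict.fromkeys() can take the place of set(), since dictionaries preserve insertion order in Python 3.7+. This is a variation on the answer above rather than part of it, shown on a shortened list:

```python
lst = ['a', 'a', 'c', 'a', 'd', 'c', 'b', 'd']

# dict.fromkeys keeps one key per distinct value, in order of first appearance
d = {x: i for i, x in enumerate(dict.fromkeys(lst))}
lst_new = [d[x] for x in lst]

print(d)        # {'a': 0, 'c': 1, 'd': 2, 'b': 3}
print(lst_new)  # [0, 0, 1, 0, 2, 1, 3, 2]
```

Unlike the set-based version, this mapping is the same on every run.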
You could take a note out of the functional programming book:
ipt=['a', 'a', 'c', 'a', 'a', 'a', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'd', 'd', 'd', 'd', 'c', 'b', 'b', 'b', 'd', 'b', 'b', 'b']
opt = list(map(lambda x: ord(x)-97, ipt))
This code iterates through the input list and passes each element through the lambda function, which takes the ASCII value of the character and subtracts 97 (the code of 'a') to map the lowercase letters to 0-25.
If the strings are not single lowercase characters, the lambda function will need to be adapted.
You could write a custom function to do the same thing as you are using numpy.unique() for.
def unique(my_list):
    ''' Takes a list and returns two lists, a list of each unique entry and the index of
    each unique entry in the original list
    '''
    unique_list = []
    int_cat = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Or if you wanted your indexing to be ordered.
def unique_ordered(my_list):
    ''' Takes a list and returns two lists, an ordered list of each unique entry and the
    index of each unique entry in the original list
    '''
    # Unique list
    unique_list = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
    # Sorting unique list alphabetically
    unique_list.sort()
    # Integer category list
    int_cat = []
    for item in my_list:
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat
Comparing the computation time for these two vs numpy.unique() for 100,000 iterations of your example list, we get:
numpy = 2.236004s
unique = 0.460719s
unique_ordered = 0.505591s
This shows that either option is faster than numpy for simple lists. More complicated strings slow down unique() and unique_ordered() much more than numpy.unique(). Doing 10,000 iterations on a random 100-element list of 20-character strings, we get times of:
numpy = 0.45465s
unique = 1.56963s
unique_ordered = 1.59445s
So if efficiency is important and your list contains longer strings or a larger variety of them, it is likely better to use numpy.unique().
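One likely reason the pure-Python functions slow down as the variety of strings grows is that both "item not in unique_list" and unique_list.index(item) scan the list linearly. A dictionary lookup avoids that scan. The sketch below is an illustration under assumptions rather than a benchmark: unique_dict is a hypothetical name, the data are randomly generated, and timings differ by machine, so none are shown.

```python
import random
import string
import timeit


def unique_list_based(my_list):
    """List-based version, following the unique() function above."""
    unique_list = []
    int_cat = []
    for item in my_list:
        if item not in unique_list:
            unique_list.append(item)
        int_cat.append(unique_list.index(item))
    return unique_list, int_cat


def unique_dict(my_list):
    """Hypothetical dict-based variant: same output, hash-based lookups."""
    index_of = {}
    int_cat = []
    for item in my_list:
        # setdefault assigns the next free integer the first time an item is seen
        int_cat.append(index_of.setdefault(item, len(index_of)))
    return list(index_of), int_cat


random.seed(0)
data = [''.join(random.choices(string.ascii_lowercase, k=20)) for _ in range(100)]

# Both functions should agree on the result
assert unique_list_based(data) == unique_dict(data)

for func in (unique_list_based, unique_dict):
    seconds = timeit.timeit(lambda: func(data), number=200)
    print(f"{func.__name__}: {seconds:.4f}s")
```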

Split a list into various slices and concatenate it

I have seen similar questions to mine, but nothing I researched really fixed my issue.
So, basically I want to split a list, in order to remove some items and concatenate it back. Those items correspond to indexes that are given by a list of tuples.
import numpy as np
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
print ([list1[0:s] +list1[s+1:e] for s,e in indices])
#Returns: [['x', 'y', 'z', 'a'], ['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f']]
This code, which I got from one of the answers to this post, nearly does what I need. I tried to adapt it to loop over the first index of indices once, but it loops twice and doesn't include the rest of the list.
I want my final list to be split from index zero to the first item of the first tuple, and so on, using a for loop or some iterator.
Something like this:
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:]
#Returns: [['x','y','a','b','c','f','g',2,3,4]]
If someone could do it using for loops, it would be easier for me to see how you understand the problem; afterwards I can try to adapt it to shorter code.
Sort the indices using sorted and del the slices. You need reverse=True otherwise the indices of the later slices are incorrect.
for x, y in sorted(indices, reverse=True):
    del arr[x:y]
print(arr)
# ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
This is the same result as you get (on the original, unmodified arr) with
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
# ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
Another option is to collect every index to drop in a set and filter with enumerate:
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
import itertools
ignore = set(itertools.chain.from_iterable(map(lambda i: range(*i), indices)))
out = [c for idx, c in enumerate(arr) if idx not in ignore]
print(out)
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
Output,
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
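Since the question asked for an explicit for loop, the same result can also be built without modifying arr, by copying the pieces that lie between the cut-out ranges. A sketch, assuming the (start, end) pairs do not overlap and that end is exclusive, as in the slicing above; kept, start, cut_start and cut_end are illustrative names:

```python
arr = ['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 2, 3, 4]
indices = [(2, 4), (7, 9)]  # (start, end) ranges to cut out, end exclusive

kept = []
start = 0  # beginning of the next piece to keep
for cut_start, cut_end in sorted(indices):
    kept.extend(arr[start:cut_start])  # copy everything before this cut
    start = cut_end                    # resume after the cut
kept.extend(arr[start:])               # copy the tail after the last cut

print(kept)  # ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
```

Because the loop walks the ranges in ascending order and never deletes anything, the positions in indices stay valid throughout.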
Like this:
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
# keep an element only if its position falls inside none of the ranges
print([v for i, v in enumerate(arr) if not any(s <= i < e for s, e in indices)])
Output:
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
1- If you can remove the list items:
I am using JimithyPicker's example. I changed the index list to the positions of the items to remove, because every time an item is removed the size of the list changes.
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [2,5,5] #INDEXES THAT NEED TO BE CUT OUT
for index in indices:
    arr.pop(index)
final_arr = [arr]
print(final_arr)
Output:
[['x', 'y', 'a', 'b', 'c', 'f', 'g', 2, 3, 4]]
2- If you can't remove items:
In this case it is necessary to change the second index, because the numbers don't match the output that you want.
With indices = [(2,4),(7,9)] the output is: ['x', 'y', 'a', 'b', 'c', 'd', 'f', 'g', 2, 3, 4]
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(6,9)] #INDEXES THAT NEED TO BE CUT OUT
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]-1:indices[1][0]] + arr[indices[1][1]-1:]
print(final_arr)
Output:
['x','y','a','b','c','f','g',2,3,4]

Get right label using indices?

Really stupid question as I am new to python:
If I have labels = ['a', 'b', 'c', 'd'],
and indices = [2, 3, 0, 1]
How should I get the corresponding label using each index so I can get: ['c', 'd', 'a', 'b']?
There are a few alternatives. One is to use a list comprehension:
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = [labels[i] for i in indices]
print(result)
Output
['c', 'd', 'a', 'b']
Basically iterate over each index and fetch the item at that position. The above is equivalent to the following for loop:
result = []
for i in indices:
    result.append(labels[i])
A third option is to use operator.itemgetter:
from operator import itemgetter
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = list(itemgetter(*indices)(labels))
print(result)
Output
['c', 'd', 'a', 'b']
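One caveat with itemgetter: when given a single index it returns the item itself rather than a 1-tuple, so wrapping the result in list() can split a string into characters. A short sketch with made-up multi-character labels to make the difference visible:

```python
from operator import itemgetter

labels = ['ant', 'bee', 'cat', 'dog']  # made-up labels for illustration

# Several indices: itemgetter returns a tuple of items
several = list(itemgetter(2, 3)(labels))
print(several)   # ['cat', 'dog']

# A single index: itemgetter returns the bare item, so list() splits the string
single = list(itemgetter(2)(labels))
print(single)    # ['c', 'a', 't']

# The list comprehension behaves the same for any number of indices
comprehension = [labels[i] for i in [2]]
print(comprehension)  # ['cat']
```

For index lists whose length is not known in advance, the list comprehension is therefore the safer default.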

How can I apply a permutation to a list?

How might one get a SymPy Permutation to act on a list? E.g.,
from sympy.combinatorics import Permutation
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
perm = Permutation([[0, 2, 8, 6], [1, 5, 7, 3]])
# Then something like...
perm * lst # This doesn't work. Throws AttributeError because of list
I'd like something like this that returns (in this example):
['g', 'd', 'a', 'h', 'e', 'b', 'i', 'f', 'c']
I have read https://docs.sympy.org/latest/modules/combinatorics/permutations.html, and don't see how.
Any suggestions as to how one might go about this?
You can just do perm(lst)
>>> from sympy.combinatorics import Permutation
>>> lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> perm = Permutation([[0, 2, 8, 6], [1, 5, 7, 3]])
>>> perm(lst)
['c', 'f', 'i', 'b', 'e', 'h', 'a', 'd', 'g']
Your example output appears to be the result of applying the inverse of the given Permutation to the list. If that is your required output, you need to either reverse the final list (which happens to work for this particular permutation) or reverse each cycle within the permutation.
From the SymPy documentation:
The permutation can be ‘applied’ to any list-like object, not only Permutations.
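To see what perm(lst) computes without SymPy, the cycles can be expanded into array form by hand. The sketch below is plain Python, not SymPy's API; array_form_from_cycles and apply are hypothetical helper names. It follows the convention visible in the output above, where position i of the result receives lst[p[i]], and it shows that the inverse permutation produces the list the question asked for:

```python
def array_form_from_cycles(cycles, size):
    """Expand disjoint cycles into a list p where each i maps to p[i]."""
    p = list(range(size))
    for cycle in cycles:
        # each element maps to the next one in its cycle, wrapping around
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            p[a] = b
    return p


def apply(p, items):
    """Position i of the result receives items[p[i]]."""
    return [items[j] for j in p]


lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
p = array_form_from_cycles([[0, 2, 8, 6], [1, 5, 7, 3]], len(lst))

# Inverse permutation: if i maps to j under p, then j maps to i
p_inv = [0] * len(p)
for i, j in enumerate(p):
    p_inv[j] = i

print(apply(p, lst))      # ['c', 'f', 'i', 'b', 'e', 'h', 'a', 'd', 'g']
print(apply(p_inv, lst))  # ['g', 'd', 'a', 'h', 'e', 'b', 'i', 'f', 'c']
```

SymPy permutations also support inversion (for example via ~perm), so (~perm)(lst) would be expected to give the second list.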

Joining Lists of Lists of Strings

I've a list of lists, in which each element is a single character:
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]
From this, I want to generate a new single list with the content ['aa','ab','ac','ba','bb','bc','ca','cb','cc']. The individual elements of each list are appended to each other but in reverse order of the lists. I've come up with this (where np = 2):
for cnt in range(np-2,-1,-1):
    thisngrams[-1] = [a+b for (a,b) in zip(thisngrams[-1],thisngrams[cnt])]
My solution needs to handle np higher than just 2. I expect this is O(np), which isn't bad. Can someone suggest a more efficient and pythonic way to do what I want (or is this a good pythonic approach)?
You can try this:
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]
new = map(''.join, zip(*ngrams))
Output:
['aa', 'ba', 'ca', 'ab', 'bb', 'cb', 'ac', 'bc', 'cc']
For more than two elements:
n = [["a", "b", "c"], ["a", "c", "d"], ["e", "f", "g"]]
new = map(''.join, zip(*reversed(n)))
#in Python3
#new = list(map(''.join, zip(*reversed(n))))
Output:
['eaa', 'fcb', 'gdc']
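Applied to the ngrams list from the question, reversing the outer list before zipping gives the ordering the question asked for. A minimal Python 3 sketch; joined is an illustrative name:

```python
ngrams = [['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
          ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']]

# reversed() puts the last inner list first; zip(*...) groups elements by position
joined = [''.join(chars) for chars in zip(*reversed(ngrams))]

print(joined)  # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

The same expression works unchanged for any number of inner lists, since zip(*...) accepts however many sequences it is given.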
