I have a rectangular matrix containing digits only, and I want to calculate the number of different (unique) 2 × 2 square matrices in it.
I stored all possible 2x2 matrices in a new list. Now I want to remove all duplicate matrices from this new list, but I don't know how to do it. If I use set() it gives me the error "unhashable type: 'list'".
def differentSquares(matrix):
    squares_list = []
    for i in range(len(matrix) - 1):
        for j in range(len(matrix[i]) - 1):
            temp = [[matrix[i][j], matrix[i][j+1]],
                    [matrix[i+1][j], matrix[i+1][j+1]]]
            squares_list.append(temp)
    return len(squares_list)
I know this problem can be solved with different logic, but I still want to know how to remove duplicate matrices from a list of matrices.
If I enter the following input
Matrix = [[1, 2, 1],
          [2, 2, 2],
          [2, 2, 2],
          [1, 2, 3],
          [2, 2, 1]]
The value returned is 8, since I return the length of the list without removing the duplicates.
After removing the duplicates, the answer becomes 6 (the correct answer).
As mentioned by Alex, you can only hash immutable objects in a set; a list is mutable whereas a tuple is immutable.
For more info - Hashable, immutable
Also, you can add your immutable objects directly to a set. Since a set only ever contains unique elements, the add operation will not introduce any duplicates.
def differentSquares(matrix):
    unique_squares = set()  # Create a set for collecting unique 2x2 matrices
    for i in range(len(matrix) - 1):
        for j in range(len(matrix[i]) - 1):
            temp = ((matrix[i][j], matrix[i][j+1]),
                    (matrix[i+1][j], matrix[i+1][j+1]))
            unique_squares.add(temp)  # Adding an already-seen square has no effect
    return len(unique_squares)  # Returns 6 for the given example
Lists are mutable (can be changed) and therefore cannot be hashed. Instead, use tuples, which are immutable and therefore can be hashed.
def differentSquares(matrix):
    squares_list = []
    for i in range(len(matrix) - 1):
        for j in range(len(matrix[i]) - 1):
            temp = ((matrix[i][j], matrix[i][j+1]),
                    (matrix[i+1][j], matrix[i+1][j+1]))
            squares_list.append(temp)
    return len(set(squares_list))
Related
I've created a list of objects, each of which has an attribute that is a 5x5 numpy array with named columns, as well as an "ID" attribute. I want to write a function that checks whether specific elements of the array (selected by position) are in a (different) list of variable length, but exactly which array elements are searched for can vary based on certain conditions.
Ideally I'd like to pass the list of subscripts used to retrieve the array elements as an argument to the function, which would then attach each of the desired subscripts to the object and check for their presence in the list.
Here's what I mean:
# lst is the list we're checking against (i.e. are the array elements in this list)
# objs_list is the list of objects with the array and ID attributes
# "A" is the name of one of the columns in the numpy array
def check_list_membership(obj_id):
    # create object_to_check, which is a numpy array
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    if (object_to_check["A"][0] in lst) & (object_to_check["A"][1] in lst) & \
       (object_to_check["A"][2] in lst) & (object_to_check["A"][3] in lst) & \
       (object_to_check["A"][4] in lst):
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")
In this example, I want to see if the array elements ["A"][0], ["A"][1], etc. are in the list. But in other cases I may want to check if ["A"][0], ["B"][1], and others are in the list. Obviously it's really clunky to have all of these conditional statements written out and I want to avoid this.
What I really want is to take the list of desired subscripts, attach each of them to "object_to_check" in sequence, and check for membership in the list. I don't think string concatenation as described here will do what I want, because I don't want the result to be strings; I want each of these elements to be evaluated and checked for membership in the list. I don't think repeating "object_to_check" once per subscript and zipping would help either, because I'd still end up with strings (disembodied subscripts), and I'm not sure what would let my list of object/subscript pairs be evaluated (safely). I've looked into evaluating functions in string form, which seems to be veering into controversial territory.
The desired function might look something like:
def check_list_membership(obj_id, list_of_subscripts):
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    # pass 'list_of_subscripts' in here and attach to 'object_to_check'
    if object_to_check[various subscripts] in lst:
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")
How can I accomplish this without dozens of lines of hard-coding?
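One way this could be done without hard-coding every condition is to pass the subscripts as (column, row) pairs and let all() iterate over them. This is only a sketch against the names used in the question (objs_list, lst, arrayattribute); the pair format for the subscripts is my assumption:

def check_list_membership(obj_id, list_of_subscripts):
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    # all() short-circuits as soon as one selected element is missing from lst
    if all(object_to_check[col][row] in lst for col, row in list_of_subscripts):
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")

# The first example from the question would then be:
# check_list_membership(some_id, [("A", 0), ("A", 1), ("A", 2), ("A", 3), ("A", 4)])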
I have a list (which I call chunks) with len(chunks) = 195 and len(chunks[0]) = 32. The elements inside chunks[0] are of type numpy.ndarray and have shape (9, 103).
type(chunks[0][0])
<class 'numpy.ndarray'>
type(chunks[0][0][0])
<class 'numpy.ndarray'>
type(chunks[0][0][0][0])
<class 'numpy.float64'>
I'm trying to find out whether there are duplicates in chunks[0]. The most appropriate way I could think of was len(chunks[0]) != len(set(chunks[0])), but that throws an error: TypeError: unhashable type.
Is there another workable way to investigate whether elements inside chunks[0] are equal and, if so, to eliminate the duplicates from the list? Would transforming them to tensors be advisable for checking for duplicates quickly?
The problem
Hashable data types, i.e., those that can be used as elements of sets or keys of dicts, have to be immutable. That's because you have to get the same hash value each time you look an object up, but if you could modify the object, its hash value would change. For example, lists and arrays can be changed and are therefore not hashable, but tuples are immutable, so they are hashable.
One possible solution
You can create a tuple containing the values from your list or array or list of arrays, and use that in your set.
Sample code
You could use functions like these to solve your problem:
def array_2d_to_tuples(a):
    return tuple(tuple(row) for row in a)

def list_of_2d_arrays_to_tuples(a_list):
    return tuple(array_2d_to_tuples(a) for a in a_list)
These two functions return "2D" and "3D" tuples, which are hashable. You can insert their return values into sets.
And then this could work to detect if two chunks contain the same 32 arrays in the same order:
len(chunks) != len(set(list_of_2d_arrays_to_tuples(chunk) for chunk in chunks))
Or if you want to look for duplicate arrays within chunks[0]:
len(chunks[0]) != len(set(array_2d_to_tuples(a) for a in chunks[0]))
Eliminating the duplicates
If you want to eliminate the duplicates in the list, I would unroll that code a bit. Let chunk = chunks[0] and say you want uniq_chunk to hold the arrays from chunk without the duplicates. This code should do the trick:
found = set()
uniq_chunk = []
for a in chunk:
    as_tuple = array_2d_to_tuples(a)
    if as_tuple not in found:
        found.add(as_tuple)
        uniq_chunk.append(a)
You can adjust this approach to the exact thing you're trying to deduplicate.
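On the "fast way" part of the question: since every array in chunks[0] has the same shape (9, 103), NumPy's own deduplication may also work. This is a sketch under that assumption; note that np.unique sorts its output, so the original order is not preserved:

import numpy as np

chunk = chunks[0]
stacked = np.stack(chunk)                    # shape (32, 9, 103)
unique_arrays = np.unique(stacked, axis=0)   # deduplicate along the first axis
has_duplicates = len(unique_arrays) != len(chunk)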
I want to use three lists as an index for a table in Python. To do so, I decided to hash the lists to an int, because you cannot use lists as an index.
The problem is that the hashed value is a number like -103692953590217654, which cannot be used as an index either.
How can I turn this large int into a smaller number so that it is usable as an index into a table?
I need this to fill a Q-table for a reinforcement learning framework. My state is defined by three lists.
IndexError: list index out of range
Tuples are hashable, and it sounds like you should be using them in your case.
As an arbitrary example:
a = (1,2)
b = (3,4)
q_learning_dict = {}
q_learning_dict[(a, b)] = 0.1
To convert your lists to tuples, you can simply pass them to the tuple() function like tuple([1,2,3]).
Warning: Tuples are IMMUTABLE. This means that you cannot change their content after you initialize them (which is also what makes them hashable).
Hashing a list doesn't make sense because you can change the contents of the list or append/remove values to/from it, which would render your previous hash invalid.
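For the Q-table itself, a dict (or a defaultdict) keyed by tuples is usually all that is needed; no integer index is required. A sketch, where list1, list2 and list3 stand in for the three state lists from the question:

from collections import defaultdict

q_table = defaultdict(float)  # unseen states default to 0.0

# Convert the three mutable lists into one hashable key
state = (tuple(list1), tuple(list2), tuple(list3))

q_table[state] = 0.1      # store a Q-value
value = q_table[state]    # look it up later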
I know this is a commonly asked question and there are numerous posts discussing the hashability of set elements, but I am trying to understand why set() accepts a 1D list but not a multi-dimensional list for initialization.
Look at the code below: Case 1 and Case 2 work (they take a 1D list), while Case 3 does not (it takes a 2D list). What role does dimension play in set initialization?
#Case1:
cities = set(["Frankfurt", "Basel","Freiburg"])
print(cities)
#Case2:
citylist = list(["Frankfurt", "Basel","Freiburg"])
setofcitites = set(citylist)
print(setofcitites)
#Case3:
more_cities = set([["Frankfurt", "Basel","Freiburg"], ["Dubai", "Toronto","Sydney"]])
print(more_cities)
Short answer
Strings are hashable, but lists are not.
Longer answer
Understand precisely what is being hashed when you use the set function.
In Case 1 and Case 2, you are hashing the elements of the list, which are actually strings.
In Case 3, you are hashing the elements of the list, which are lists themselves.
Since lists are mutable objects, they are not hashable.
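A quick illustration of that distinction (the exact hash values will differ from run to run):

hash("Frankfurt")             # works: strings are immutable, hence hashable
hash(("Frankfurt", "Basel"))  # works: a tuple of hashable elements is hashable
hash(["Frankfurt", "Basel"])  # raises TypeError: unhashable type: 'list'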
In Case 1 and Case 2, no lists are being hashed. The list is iterated, and its elements are hashed. The hashed elements aren't lists and are otherwise hashable, so it's fine.
In Case 3, though, the outer list is iterated as before, but each of its elements is another list, which set() then attempts to hash. As you know, that won't end well.
That's because in the first two cases you're essentially converting a list into a set, i.e., the individual elements of the list become elements of the set. Since those elements are strings, and strings are hashable, they are allowed in the set.
In the third case, the elements of the list you're trying to convert into a set are themselves lists, and a list is not hashable, hence the error. It's equivalent to doing:
your_set = set()
your_set.add("Frankfurt") # OK
your_set.add(["Frankfurt", "Basel"]) # Err
A list is a mutable object and thus can't be hashed.
This will work:
more_cities = set([("Frankfurt", "Basel","Freiburg"), ("Dubai", "Toronto","Sydney")])
In python, I am generating lists to represent states (for example a state could be [1,3,2,2,5]).
The value of each element can range from 1 to some specific number.
Based on certain rules, these states can evolve in particular ways. I am not interested in states I have already encountered.
Right now my code just checks whether a list is in a list of lists, and if it isn't, it appends it. But this list of lists is getting very large, and checking against it uses a lot of resources.
I would like to create a multidimensional array of zeros, check a particular location in that array, and if that location is 0, set it to 1.
If I take the state stored as a list or as an array, adjust it by 1 to correspond to index values, and try to pass that value as an index into the array of zeros, it doesn't just change the one element. I think this is because the list is in brackets, whereas indexing wants just integers separated by commas.
Is there a way to pass a list or array of integers as an index into an array that won't complicate my code? Or, failing that, some more efficient way to store and check the states I have already generated?
Yes, you can define a class for your state and define its hash function. That way a lookup takes O(1) time and the space required is O(N), where N is the number of states rather than the number of states times the state size.
A sample class for the state:
class State(object):
    def __init__(self, state_array):
        self.state_array = state_array

    def getHash(self):
        # hash a tuple copy, since lists themselves are not hashable;
        # this uses Python's default hash, but you can design your own
        return hash(tuple(self.state_array))
When storing the current state:
# current_state is the state you currently have
all_states = {}
if current_state.getHash() not in all_states:
    all_states[current_state.getHash()] = 1
else:
    pass  # Do something else
Note that in Python, checking membership in a dict takes O(1) on average, because a dict is actually a hash map.
You should maintain the states you have already encountered in a set. This ensures that a containment check (state in visited) is O(1), not O(N). Since lists are not hashable, you would have to convert them to tuples, or use tuples in the first place:
visited = set()
states = [[1, 3, 2, 2, 5], [...], ...]

for state in states:
    tpl = tuple(state)
    if tpl not in visited:  # instant check no matter the size of visited
        # process state
        visited.add(tpl)
Your idea of creating this 5-dimensional array and passing the state values as indexes is valid, but it would create a large matrix (n^5 cells, where n is the number of values for each slot) that is presumably sparse and therefore wasteful.
By the way, you access a slot in such a matrix via m[1][3][2][2][5], not m.index([1,3,2,2,5]).
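If you do go the array route anyway, indexing with a tuple (rather than a list) is the NumPy way to address a single cell. A sketch, assuming n possible values per slot and 1-based state values as in the question:

import numpy as np

n = 5                                   # assumed number of values per slot
seen = np.zeros((n,) * 5, dtype=bool)   # one cell per possible 5-element state

state = [1, 3, 2, 2, 5]
idx = tuple(v - 1 for v in state)       # shift to 0-based indices

if not seen[idx]:       # a tuple addresses one cell; a list would fancy-index
    seen[idx] = True    # mark this state as visited
    # process the new state here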