In python, I am generating lists to represent states (for example a state could be [1,3,2,2,5]).
The value of each element can range from 1 to some specific number.
Based on certain rules, these states can evolve in particular ways. I am not interested in states I have already encountered.
Right now my code just checks whether a list is in a list of lists and if it isn't, it appends it. But this list is getting very large and using a lot of resources to check against.
I would like to
create a multidimensional array of zeros,
check a particular location in that array, and if that location is 0 set it to 1.
If I take the state stored as a list or as an array, adjust it by 1
to correspond to index values, and try to pass that value as an index
for the zeros array, it doesn't just change the one element.
I think this is because the list is in brackets, where index() wants an
argument of just integers separated by commas.
Is there a way to pass a list or array of integers to check an index of an array that won't add complication to my code? Or even just some more efficient way to
store and check the states I have already generated?
Yes, you can define a class for your state and define its hash function. This way, the running time of a lookup drops to O(1) and the space required drops to O(N), where N is the number of states rather than number of states * state size.
A sample class of the state:
class State(object):
    def __init__(self, state_array):
        self.state_array = state_array

    def getHash(self):
        # hash a tuple copy, since lists are not hashable;
        # using Python's default hash function here, you can totally design your own
        return hash(tuple(self.state_array))
When storing current state:
# current_state is the state you currently have
all_states = {}
if current_state.getHash() not in all_states:
    all_states[current_state.getHash()] = 1
else:
    pass  # do something else
Note that in Python, element in dict takes O(1) on average because a dict in Python is actually a hash map.
You should maintain the states you have already encountered in a set. This ensures that a contains check (state in visited) is O(1), and not O(N). Since lists are not hashable, you would have to convert them to tuples, or use tuples in the first place:
visited = set()
states = [[1,3,2,2,5], [...], ...]

for state in states:
    tpl = tuple(state)
    if tpl not in visited:  # instant check no matter the size of visited
        # process state
        visited.add(tpl)
Your idea of creating this 5-dimensional list and passing the state values as indexes is valid, but it would create a large matrix (n^5 entries, where n is the number of values for each slot) that is presumably sparse and therefore wasteful.
BTW, you access a slot in such a matrix via m[1][3][2][2][5], not m.index([1,3,2,2,5]).
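If you do go the array route with NumPy, you can pass a list of indices by converting it to a tuple first; here is a minimal sketch, assuming 5 slots whose values run from 1 to n:

import numpy as np

n = 5                                    # assumed number of values per slot
seen = np.zeros((n,) * 5, dtype=bool)    # one axis per slot

state = [1, 3, 2, 2, 5]
idx = tuple(v - 1 for v in state)        # shift 1-based values to 0-based indices

if not seen[idx]:                        # indexing with a tuple touches one element
    seen[idx] = True                     # mark the state as visited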
I've created a list of objects each of which has an attribute which is a 5x5 numpy array with named columns and also an "ID" attribute. I want to create a function which checks if specific elements in the array (based on position) are in a (different) list of variable length, but exactly which array elements are being searched for can vary based on certain conditions.
Ideally I'd like to pass the list of subscripts used to retrieve the array elements as an argument to the function which will then attach each of the desired subscripts to the object and check for their presence in the list.
Here's what I mean:
# lst is the list we're checking against (i.e. are the array elements in this list)
# objs_list is the list of objects with the array and ID attributes
# "A" is the name of one of the columns in the numpy array
def check_list_membership(obj_id):
    # create object_to_check, which is a numpy array
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    if (object_to_check["A"][0] in lst) & (object_to_check["A"][1] in lst) & \
       (object_to_check["A"][2] in lst) & (object_to_check["A"][3] in lst) & \
       (object_to_check["A"][4] in lst):
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")
In this example, I want to see if the array elements ["A"][0], ["A"][1], etc. are in the list. But in other cases I may want to check if ["A"][0], ["B"][1], and others are in the list. Obviously it's really clunky to have all of these conditional statements written out and I want to avoid this.
What I really want is to take the list of desired subscripts, attach each of them to "object_to_check" in sequence, and check for their membership in the list. I don't think string concatenation as described here will do what I want, because I don't want the result to be strings; I want each of these elements to be evaluated and checked for membership in the list. I don't think multiplying "object_to_check" by the number of subscripts and zipping would help either, because I'd end up with strings (disembodied subscripts), and I'm not sure what would allow my list of object/subscript pairs to be evaluated (safely). I've looked into evaluating functions in string form, which seems to be veering into controversial territory.
The desired function might look something like:
def check_list_membership(obj_id, list_of_subscripts):
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    # pass 'list_of_subscripts' in here and attach to 'object_to_check'
    if object_to_check[various subscripts] in lst:
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")
How can I accomplish this without dozens of lines of hard-coding?
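For what it's worth, one hedged sketch of such a function: it assumes list_of_subscripts is a list of (column_name, row_index) pairs such as [("A", 0), ("B", 1)], and that objs_list and lst exist as in the code above:

def check_list_membership(obj_id, list_of_subscripts):
    object_to_check = next((x for x in objs_list if x.ID == obj_id), None).arrayattribute
    # all() evaluates each subscripted element and checks its membership in lst
    if all(object_to_check[col][idx] in lst for col, idx in list_of_subscripts):
        print("all selected elements are in the list")
    else:
        print("one or more selected elements are not in the list")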
I'm currently learning Depth-First Search in Python and a problem asked that, given a Binary Search Tree and a number N, find all paths from root-to-leaf such that the sum of all the node values of each path equals N.
I did everything right, but my code didn't work (it resulted in an empty 2D array). When looking at the solution, the only difference was "allPath.append(list(currPath))", while the code I wrote was simply 'allPath.append(currPath)'. When I made this change, the code worked perfectly. Here's the full code:
def findPathSum(root, sum):
    allPath = []
    _findPathSum(root, sum, [], allPath)
    return allPath

def _findPathSum(currNode, sum, currPath, allPath):
    if currNode is None:
        return

    currPath.append(currNode.val)

    if currNode.val == sum and currNode.left is None and currNode.right is None:
        print(currPath)
        allPath.append(list(currPath))
    else:
        _findPathSum(currNode.left, sum - currNode.val, currPath, allPath)
        _findPathSum(currNode.right, sum - currNode.val, currPath, allPath)

    del currPath[-1]
What I'm confused about is that currPath is already a list, and only contains integers (which are the node values). When I print currPath before it is appended to allPaths, it also correctly displays a list with integer values. Yet after I append it to allPaths, allPaths is just an empty array. However, using the list() method on it, for some reason, displays the correct 2D array with the right integer values. I have no clue why this would work.
From my understanding, the list() method simply takes an iterable and turns it into a list...however currPath was already a list. I feel like I'm missing something really obvious.
list creates a brand new list (although the elements are not brand new), a new list object. In your case, without using list you will simply be appending the exact same list object to allPath on each recursive call.
Therefore, since all the elements of allPath are the exact same list, changing that list changes all of the elements of allPath. For example, when at the end of _findPathSum you do del currPath[-1], you are effectively deleting the final element of every element of allPath. Since in the end currPath will be empty, that is what you see at the end in allPath - a list containing empty lists.
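A tiny standalone demonstration of that aliasing (not from the original code):

allPath = []
currPath = [1, 2]
allPath.append(currPath)         # stores a reference to the same list object
allPath.append(list(currPath))   # stores an independent copy
del currPath[-1]                 # mutates currPath, and therefore the first entry too
print(allPath)                   # [[1], [1, 2]]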
I am trying to work with a function called countryByPop() that takes an integer as a parameter. The function does two things:
First, get the list of countries (from a text file called "countries.text") and sort it in descending order of population using the selection sort algorithm.
Second, use the integer parameter and return the nth most populous country.
readCountries() is a function that I created to open the file "countries.text" and read its contents.
The result eventually looks like this [Name, Area, Population]:
[["Afghanistan",647500.0,25500100],["Albania",28748.0,2821977].......["Zimbabwe",390580.0,12973808]]
Now, I am done with the first (sorting) part of the function:
def countryByPop():
    Countries = readCountries()
    for i in range(0, len(Countries)):
        madeSwap = False
        for j in range(0, len(Countries) - (i + 1)):
            if Countries[j][2] < Countries[j+1][2]:
                temp = Countries[j+1]
                Countries[j+1] = Countries[j]
                Countries[j] = temp
                madeSwap = True
        if not madeSwap:
            return Countries
    return Countries
I cannot seem to figure out how to return the nth most populous country.
Let's say I pass 18 as the integer parameter to the function, and Turkey is the 18th most populated country in the list; it should print out something like:
>>> countryByPop(18)
["Turkey",780580.0,75627384]
>>> countryByPop(-1)
Invalid parameter: -1
Python already has methods for sorting lists of sequences in the way you need. For example,
# sort the list of lists countries, ordering by the third element in reverse
countries.sort(key=lambda e: -e[2])
# the 10th most populous country (list indices start at 0):
countries[9]
That is, you can tell the sort method to order the elements of countries (which happen to be lists of 3 elements themselves) in order of increasing (-population), i.e. decreasing population. Then the first element of the sorted list will be the most populous country, and so on.
First of all, there is no n parameter in the signature of the function, which should instead be
def countryByPop(n):
then, at the end, assuming Countries is ordered, you need to return its nth value, remembering that list indices start at 0
return Countries[n - 1]
Note: Python allows you to return tuples, so in case you need to return both the sorted list and the nth element you can do
return Countries, Countries[n - 1]
In my humble opinion, it is simpler to just return the list and have the caller access its n-th element
Further notes:
please follow the PEP-8 guidelines for naming variables
do not re-invent the wheel and use Python's own sorting functionality (in-place with the list sort method, or using sorted)
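Putting those notes together, a rough sketch of the whole function might look like this; it assumes readCountries() returns the list shown earlier and that n is 1-based:

def countryByPop(n):
    countries = readCountries()
    if n < 1 or n > len(countries):
        print("Invalid parameter:", n)
        return
    countries.sort(key=lambda e: e[2], reverse=True)  # most populous first
    return countries[n - 1]                           # nth most populous, 0-based index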
I am trying to wrap my head around recursion and have posted a working algorithm to produce all the subsets of a given list.
def genSubsets(L):
    res = []
    if len(L) == 0:
        return [[]]
    smaller = genSubsets(L[:-1])
    extra = L[-1:]
    new = []
    for i in smaller:
        new.append(i + extra)
    return smaller + new
Let's say my list is L = [0,1], correct output is [[],[0],[1],[0,1]]
Using print statements I have narrowed down that genSubsets is called twice before I ever get to the for loop. That much I get.
But why does the first for loop initiate a value of L as just [0] and the second for loop use [0,1]? How exactly do the recursive calls work that incorporate the for loop?
I think this would actually be easier to visualize with a longer source list. If you use [0, 1, 2], you'll see that the recursive calls repeatedly cut off the last item from the list. That is, recursion builds up a stack of recursive calls like this:
genSubsets([0,1,2])
    genSubsets([0,1])
        genSubsets([0])
            genSubsets([])
At this point it hits the "base case" of the recursive algorithm. For this function, the base case is when the list given as a parameter is empty. Hitting the base case means it returns a list containing an empty list, [[]]. Here's how the stack looks when it returns:
genSubsets([0,1,2])
    genSubsets([0,1])
        genSubsets([0]) <- gets [[]] returned to it
So that return value gets back to the previous level, where it is saved in the smaller variable. The variable extra gets assigned to be a slice including only the last item of the list, which in this case is the whole contents, [0].
Now, the loop iterates over the values in smaller, and adds their concatenation with extra to new. Since there's just one value in smaller (the empty list), new ends up with just one value too, []+[0] which is [0]. I assume this is the value you're printing out at some point.
Then the last statement returns the concatenation of smaller and new, so the return value is [[],[0]]. Another view of the stack:
genSubsets([0,1,2])
    genSubsets([0,1]) <- gets [[],[0]] returned to it
The return value gets assigned to smaller again, extra is [1], and the loop happens again. This time, new gets two values, [1] and [0,1]. They get concatenated onto the end of smaller again, and the return value is [[],[0],[1],[0,1]]. The last stack view:
genSubsets([0,1,2]) <- gets [[],[0],[1],[0,1]] returned to it
The same thing happens again, this time adding 2s onto the end of each of the items found so far. new ends up as [[2],[0,2],[1,2],[0,1,2]].
The final return value is [[],[0],[1],[0,1],[2],[0,2],[1,2],[0,1,2]]
I am no big fan of trying to visualize the entire call graph of recursive functions to understand what they do.
I believe there is a much simpler way:
Enter fairy tale land where recursive functions do the right thing™.
Just assume that genSubsets(L) works:
# This computes the powerset of the list L minus the last element
smaller = genSubsets(L[:-1])
Because this magically worked, the only entries that are missing are those that contain the last element.
This fragment constructs all those missing subsets:
new = []
for i in smaller:
    new.append(i + extra)
Now we have those subsets containing the last element in new and we have those subsets not containing the last element in smaller.
It follows that we must now have all subsets, so we can return smaller + new.
The only thing left is the base case to make sure the recursion stops. Because the empty set (or list, in this case) is an element of every power set, we can use that to stop the recursion: the power set of an empty set is a set containing just the empty set. So our base case is correct. Since every recursive step removes one element from the list, the base case must be reached at some point.
Thus, the code really does produce the power set.
Note: The principle behind this is induction. If something works for some known n0, and we can prove that it working for n implies it works for n+1, then it works for all n ≥ n0.
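As a quick sanity check (not part of the original answer), the result can be compared against the subsets produced by itertools:

from itertools import chain, combinations

L = [0, 1, 2]
expected = [list(c) for c in chain.from_iterable(
    combinations(L, r) for r in range(len(L) + 1))]

print(sorted(genSubsets(L)) == sorted(expected))  # True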
Raised by this question's comments (I can see that this is irrelevant), I am now aware that using dictionaries for data that needs to be queried/accessed regularly is not good, speedwise.
I have a situation of something like this:
someDict = {}
someDict[(-2, -2)] = something
someDict[(3, -10)] = something_else
I am storing keys of coordinates to objects that act as arrays of tiles in a game. These are going to be negative at some point, so I can't use a list or some kind of sparse array (I think that's the term?).
Can I either:
Speed up dictionary lookups, so this would not be an issue
Find some kind of container that will support sparse, negative indices?
I would use a list, but then the querying would go from O(log n) to O(n) to find the area at (x, y). (I think my timings are off here too).
Python dictionaries are very, very fast, and using a tuple of integers as a key is not going to be a problem. However, it sounds like in your use case you sometimes need to do a single-coordinate check, and doing that by traversing the whole dict is of course slow.
Instead of doing a linear search, you can speed up the data structure for the kind of access you need by using three dictionaries:
class Grid(object):
    def __init__(self):
        self.data = {}  # (i, j) -> data
        self.cols = {}  # i -> set of j
        self.rows = {}  # j -> set of i

    def __getitem__(self, ij):
        return self.data[ij]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[ij] = value
        try:
            self.cols[i].add(j)
        except KeyError:
            self.cols[i] = set([j])
        try:
            self.rows[j].add(i)
        except KeyError:
            self.rows[j] = set([i])

    def getRow(self, i):
        return [(i, j, self.data[(i, j)])
                for j in self.cols.get(i, [])]

    def getCol(self, j):
        return [(i, j, self.data[(i, j)])
                for i in self.rows.get(j, [])]
Note that there are many other possible data structures depending on exactly what you are trying to do, how frequent is reading, how frequent is updating, if you query by rectangles, if you look for nearest non-empty cell and so on.
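A short usage sketch for the class above (values made up for illustration):

g = Grid()
g[(2, 7)] = "foo4"
g[(2, 5)] = "foo3"

print(g[(2, 7)])     # 'foo4'
print(g.getRow(2))   # [(2, 7, 'foo4'), (2, 5, 'foo3')] (order may vary)
print(g.getCol(5))   # [(2, 5, 'foo3')]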
To start off with
Speed up dictionary lookups, so this would not be an issue
Dictionary lookups are pretty fast, O(1), but (from your other question) you're not relying on the hash-table lookup of the dictionary; you're relying on a linear search of the dictionary's keys.
Find some kind of container that will support sparse, negative indices?
This isn't indexing into the dictionary. A tuple is an immutable object, and you are hashing the tuple as a whole. The dictionary really has no idea of the contents of the keys, just their hash.
I'm going to suggest, as others did, that you restructure your data.
For example, you could create objects that encapsulate the data you need and arrange them in a binary tree for O(lg n) searches. You can even go so far as to wrap the entire thing in a class that will give you the nice if foo in Bar: syntax you're looking for.
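As a sketch of that wrapping idea (using a plain set of coordinate tuples instead of a hand-rolled binary tree, purely for illustration):

class TileIndex:
    def __init__(self):
        self._coords = set()

    def add(self, coord):
        self._coords.add(coord)

    def __contains__(self, coord):   # enables the "if foo in bar:" syntax
        return coord in self._coords

tiles = TileIndex()
tiles.add((3, -10))
print((3, -10) in tiles)   # True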
You probably need a couple coordinated structures to accomplish what you want. Here's a simplified example using dicts and sets (tweaking user 6502's suggestion a bit).
# this will be your dict that holds all the data
matrix = {}
# and each of these will be a dict of sets, pointing to coordinates
cols = {}
rows = {}
def add_data(coord, data):
    matrix[coord] = data
    try:
        cols[coord[0]].add(coord)
    except KeyError:
        # wrap coord in a list to prevent set() from iterating over it
        cols[coord[0]] = set([coord])
    try:
        rows[coord[1]].add(coord)
    except KeyError:
        rows[coord[1]] = set([coord])
# now you can find all coordinates from a row or column quickly
>>> add_data((2, 7), "foo4")
>>> add_data((2, 5), "foo3")
>>> 2 in cols
True
>>> 5 in rows
True
>>> [matrix[coord] for coord in cols[2]]
['foo4', 'foo3']
Now just wrap that in a class or a module, and you'll be off, and as always, if it's not fast enough profile and test before you guess.
Dictionary lookups are very fast. Searching for part of the key (e.g. all tiles in row x) is what's not fast. You could use a dict of dicts. Rather than a single dict indexed by a 2-tuple, use nested dicts like this:
somedict = {0: {}, 1:{}}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"
Then if you want all the tiles in a given "row" it's just somedict[0].
You will need some logic to add the secondary dictionaries where necessary and so on. Hint: check out __getitem__() and setdefault() on the standard dict type, or possibly the collections.defaultdict type.
This approach gives you quick access to all tiles in a given row. It's still slow-ish if you want all the tiles in a given column (though at least you won't need to look through every single cell, just every row). However, if needed, you could get around that by having two dicts of dicts (one in column, row order and the other in row, column order). Updating then becomes twice as much work, which may not matter for a game where most of the tiles are static, but access is very easy in either direction.
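Following up on the defaultdict hint above, a minimal sketch of the nested-dict setup (tile values made up for illustration):

from collections import defaultdict

somedict = defaultdict(dict)   # row -> {column -> tile}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"

print(somedict[0])             # all tiles in row 0: {-5: 'thingy'}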
If you only need to store numbers and most of your cells will be 0, check out scipy's sparse matrix classes.
One alternative would be to simply shift the index so it's positive.
E.g. if your indices are contiguous like this:
...
-2 -> a
-1 -> c
0 -> d
1 -> e
2 -> f
...
Just do something like LookupArray[Index + MinimumIndex], where MinimumIndex is the absolute value of the smallest index you would use.
That way, if your minimum was say, -50, it would map to 0. -20 would map to 30, and so forth.
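A small sketch of that offset idea, assuming indices run from -50 to 49:

MinimumIndex = 50                 # abs(smallest index in use)
LookupArray = [None] * 100        # covers indices -50 .. 49

def store(index, value):
    LookupArray[index + MinimumIndex] = value

def fetch(index):
    return LookupArray[index + MinimumIndex]

store(-20, "tile")
print(fetch(-20))                 # 'tile' (stored in slot 30)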
Edit:
An alternative would be to use a trick with how you use the indices. Define the following key function
Key(n) = 2 * n         (n >= 0)
Key(n) = -2 * n - 1    (n < 0)
This maps all positive keys to the positive even indices, and all negative elements to the positive odd indices. This may not be practical though, since if you add 100 negative keys, you'd have to expand your array by 200.
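In Python that key function might look like this (sketch only):

def key(n):
    return 2 * n if n >= 0 else -2 * n - 1

print(key(0), key(1), key(2))     # 0 2 4  (positives -> even slots)
print(key(-1), key(-2), key(-3))  # 1 3 5  (negatives -> odd slots)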
One other thing to note: if you plan on doing lookups and the number of keys is constant (or very slowly changing), stick with an array. Otherwise, dictionaries aren't bad at all.
Use multi-dimensional lists -- usually implemented as nested objects. You can easily make this handle negative indices with a little arithmetic. It might use more memory than a dictionary, since something has to be put in every possible slot (usually None for empty ones), but access will be done via simple indexing lookup rather than hashing as it would be with a dictionary.