Combining two sets and splitting them in pairs - python

I need to take two sets of data and produce one set of pairs(tuples) from both sets. This result set will only have one possible pair, i.e. for two sets: 1,2 and 3, 4 the result should be: ((1, 3), (2, 4)). Full exercise text can be found here:http://pastebin.com/mUaKV4G7
I need to do this using pop. Here's what I have so far:
def mating_pairs(males, females):
pairs = set()
tmp_males, tmp_females = males.copy(), females.copy()
for male in tmp_males:
for female in tmp_females:
pairs.add(males.pop())
pairs.add(females.pop())
zip(pairs[::2], pairs[1::2])
return pairs
This function works fine up to the point when it reaches:
zip(pairs[::2], pairs[1::2])
without it given two sets it'll combine them together but when I try to use zip to split them in pairs I get this error:
'set' object is not subscriptable
Which leads me to believe that it's somewhere returning None instead of correct result.
This function need to work with both integers and strings( I don't think it needs to pairing values in a specific order), also both sets will have equal number of values.
Can someone advise what I'm doing wrong?

The error tells you what is going on: pairs is a set and the expression pairs[::2] means "every 2nd element of the set". The problem is that sets have no defined order, so "every 2nd element of the set" makes no sense. As the order of element is undefined, Python raises a exception instead of making up a random order.
What you probably wanted to do is either to pair up males and females in the order they appear:
def mating_pairs(males, females):
return zip(males, females)
or all possible pairs of males and females (the product of both lists):
from itertools import product
def mating_pairs(males, females):
return product(males, females)
Your homework seem to be to implement either zip or product :-)

If you must use pop, try popping both males and females at once, into a tuple that you add to pairs (and, by the way, I'm not sure why you make copies of your sets and destroy the originals, but I suppose you have your reasons). Also, iterating both males and females will fail to give you the answer you're looking for - rather, check the emptiness of each set as you pop from it. What you're looking for is more like this:
def mating_pairs(males, females):
pairs = set()
tmp_males, tmp_females = males.copy(), females.copy()
while tmp_males and tmp_females:
pairs.add((tmp_males.pop(),tmp_females.pop()))
return pairs
though this would be a touch simpler if you can avoid using set.pop:
def mating_pairs(males, females):
return set(zip(males,females))
Also, please note that this can't be a complete answer unless you are using some sort of ordered set datatype. As it is using a set, you're not guaranteed to preserve any order of the males and females that were passed in.

If using pop is forced restriction I'd go with something like that:
def mating_pairs(males, females):
res = set()
males_copy, females_copy = males.copy(), females.copy()
while males_copy and females_copy:
res.add((males_copy.pop(), females_copy.pop()))
return res
print mating_pairs(set([1, 2, 3]), set(['a', 'b', 'c', 'd']))
# => set([(1, 'a'), (2, 'b'), (3, 'c')])
But set(zip(males, females)) is much more easier anyway.

I’m a bit unsure what you are trying to do. But analyzing your posted code, I guess you want to combine every male with every female once, and put tuples of each combination into a final list.
If that is the case, then what you are trying to do is creating a Cartesian product. The Python library has a nice function for this: itertools.product. You can use it like this:
def mating_pairs(males, females):
return set(itertools.product(males, females))
If you want to do it the manual way, then you can use two nested for loops to get all combinations. However the way you did it, by utilizing pop won’t work. What you did in your code is that you iterated over all males and females (after copying the parameters) and then you pop the items from the original sets. That way, very quickly both sets males and females will be empty, as you keep popping from them for every possible combination, without considering that you only get all combinations if you keep reusing individual items.
You could fix your code like this, without using pop:
def mating_pairs(males, females):
pairs = set()
# We only iterate over the items, so we don’t modify the original sets.
for male in males:
for female in females:
# And instead of adding individual items by once, and zipping them
# later, we just directly add tuples to the set.
pairs.add((male, female))
return pairs

Related

Is there a better way than a while loop to perform this function?

I was attempting some python exercises and I hit the 5s timeout on one of the tests. The function is pre-populated with the parameters and I am tasked with writing code that is fast enough to run within the max timeframe of 5s.
There are N dishes in a row on a kaiten belt, with the ith dish being of type Di​. Some dishes may be of the same type as one another. The N dishes will arrive in front of you, one after another in order, and for each one you'll eat it as long as it isn't the same type as any of the previous K dishes you've eaten. You eat very fast, so you can consume a dish before the next one gets to you. Any dishes you choose not to eat as they pass will be eaten by others.
Determine how many dishes you'll end up eating.
Issue
The code "works" but is not fast enough.
Code
The idea here is to add the D[i] entry if it is not in the pastDishes list (which can be of size K).
from typing import List
# Write any import statements here
def getMaximumEatenDishCount(N: int, D: List[int], K: int) -> int:
# Write your code here
numDishes=0
pastDishes=[]
i=0
while (i<N):
if(D[i] not in pastDishes):
numDishes+=1
pastDishes.append(D[i])
if len(pastDishes)>K:
pastDishes.pop(0)
i+=1
return numDishes
Is there a more effective way?
After much trial and error, I have finally found a solution that is fast enough to pass the final case in the puzzle you are working on. My previous code was very neat and quick, however, I have finally found a module with a tool that makes this much faster. Its from collections just as deque is, however it is called Counter.
This was my original code:
def getMaximumEatenDishCount(N: int, D: list, K: int) -> int:
numDishes=lastMod=0
pastDishes=[0]*K
for Dval in D:
if Dval in pastDishes:continue
pastDishes[lastMod] = Dval
numDishes,lastMod = numDishes+1,(lastMod+1)%K
return numDishes
I then implemented Counter like so:
from typing import List
# Write any import statements here
from collections import Counter
def getMaximumEatenDishCount(N: int, D: 'list[int]', K: int) -> int:
eatCount=lastMod = 0
pastDishes=[0]*K
eatenCounts = Counter({0:K})
for Dval in D:
if Dval in eatenCounts:continue
eatCount +=1
eatenCounts[Dval] +=1
val = pastDishes[lastMod]
if eatenCounts[val] <= 1: eatenCounts.pop(val)
else: eatenCounts[val] -= 1
pastDishes[lastMod]=Dval
lastMod = (lastMod+1)%K
return eatCount
Which ended up working quite well. I'm sure you can make it less clunky, but this should work fine on its own.
Some Explanation of what i am doing:
Typically while loops are actually marginally faster than a for loop, however since I need to access the value at an index multiple times if i used it, using a for loop I believe is actually better in this situation. You can see i also initialised the list to the max size it needs to be and am writing over the values instead of popping+appending which saves alot of time. Additionally, as pointed out by #outis, another small improvement was made in my code by using the modulo operator in conjunction with the variable which removes the need for an additional if statement. The Counter is essentially a special dict object that holds a hashable as the key and an int as the value. I use the fact that lastMod is an index to what would normally be accesed through list.pop(0) to access the object needed to either remove or decrement in the counter
Note that it is not considered 'pythonic' to assign multiple variable on one line, however I believe it adds a slight performance boost which is why I have done it. This can be argued though, see this post.
If anyone else is interested the problem that we were trying to solve, it can be found here: https://www.facebookrecruiting.com/portal/coding_puzzles/?puzzle=958513514962507
Can we use an appropriate data structure? If so:
Data structures
Seems like an ordered set which you have to shrink to a capacity restriction of K.
To meet that, if exceeds (len(ordered_set) > K) we have to remove the first n items where n = len(ordered_set) - K. Ideally the removal will perform in O(1).
However since removal on a set is in unordered fashion. We first transform it to a list. A list containing unique elements in the order of appearance in their original sequence.
From that ordered list we can then remove the first n elements.
For example: the function lru returns the least-recently-used items for a sequence seq limited by capacity-limit k.
To obtain the length we can simply call len() on that LRU return value:
maximumEatenDishCount = len(lru(seq, k))
See also:
Does Python have an ordered set?
Fastest way to get sorted unique list in python?
Using set for uniqueness (up to Python 3.6)
def lru(seq, k):
return list(set(seq))[:k]
Using dict for uniqueness (since Python 3.6)
Same mechanics as above, but using the preserved insertion order of dicts since 3.7:
using OrderedDict explicitly
from collections import OrderedDict
def lru(seq, k):
return list(OrderedDict.fromkeys(seq).keys())[:k]
using dict factory-method:
def lru(seq, k):
return list(dict.fromkeys(seq).keys())[:k]
using dict-comprehension:
def lru(seq, k):
return list({i:0 for i in seq}.keys())[:k]
See also:
The order of keys in dictionaries
Using ordered dictionary as ordered set
How do you remove duplicates from a list whilst preserving order?
Real Python: OrderedDict vs dict in Python: The Right Tool for the Job
As the problem is an exercise, exact solutions are not included. Instead, strategies are described.
There are at least a couple potential approaches:
Use a data structure that supports fast containment testing (a set in use, if not in name) limited to the K most recently eaten dishes. Fortunately, since dict preserves insertion order in newer Python versions and testing key containment is fast, it will fit the bill. dict requires that keys be hashable, but since the problem uses ints to represent dish types, that requirement is met.
With this approach, the algorithm in the question remains unchanged.
Rather than checking whether the next dish type is any of the last K dishes, check whether the last time the next dish was eaten is within K of the current plate count. If it is, skip the dish. If not, eat the dish (update both the record of when the next dish was last eaten and the current dish count). In terms of data structures, the program will need to keep a record of when any given dish type was last eaten (initialized to -K-1 to ensure that the first time a dish type is encountered it will be eaten; defaultdict can be very useful for this).
With this approach, the algorithm is slightly different. The code ends up being slightly shorter, as there's no shortening of the data structure storing information about the dishes as there is in the original algorithm.
There are two takeaways from the latter approach that might be applied when solving other problems:
More broadly, reframing a problem (such as from "the dish is in the last K dishes eaten" to "the dish was last eaten within K dishes of now") can result in a simpler approach.
Less broadly, sometimes it's more efficient to work with a flipped data structure, swapping keys/indices and values.
Approach & takeaway 2 both remind me of a substring search algorithm (the name escapes me) that uses a table of positions in the needle (the string to search for) of where each character first appears (for characters not in the string, the table has the length of the string); when a mismatch occurs, the algorithm uses the table to align the substring with the mismatching character, then starts checking at the start of the substring. It's not the most efficient string search algorithm, but it's simple and more efficient than the naive algorithm. It's similar to but simpler and less efficient than the skip search algorithm, which uses the positions of every occurrence of each character in the needle.
from typing import List
# Write any import statements here
from collections import deque, Counter
def getMaximumEatenDishCount(N: int, D: List[int], K: int) -> int:
# Write your code here
q = deque()
cnt = 0
dish_counter = Counter()
for d in D:
if dish_counter[d] == 0:
cnt += 1
q.append(d)
dish_counter[d] += 1
if len(q) == K + 1:
remove = q.popleft()
dish_counter[remove] -= 1
return cnt

Display the top 2 highest difference car records

How to display the top 2 rows of highest difference from a text file in python
For example here is a text file:
Mazda 64333 53333
Merce 74321 54322
BMW 52211 31432
The expected output would be
Merce 74321 54322
BMW 52211 31432
I tried multiple codes but only managed to display the actual difference and not the whole row.
would this work for you?
from operator import itemgetter
with open("x.txt", "r+") as data:
data = [i.split() for i in data.readlines()]
top = sorted([[row[0], int(row[1])-int(row[2])]for row in data],key=itemgetter(1), reverse=True)
print(top)
print(top[:2])
[['BMW', 20779], ['Merce', 19999], ['Mazda', 11000]]
[['BMW', 20779], ['Merce', 19999]]
So, at a glance, this might seem slightly complicated but it's really not!
let's break down each step of the following program
from operator import itemgetter
with open("x.txt", "r+") as data:
data = [i.split() for i in data.readlines()]
top = sorted([[row[0], int(row[1])-int(row[2])]for row in data],key=itemgetter(1), reverse=True)
now let's first note that operator is a built-in package, it's not an external import such as libraries like requests, and itemgetter is a pretty straightforward function.
with open("x.txt", "r+") as data should be pretty straight forward as well... all this does is open a text file with reading permissions and store that object in data.
we then use our first list comprehension which might look new to you...
data = [i.split() for i in data.readlines()]
all this is doing is going through each line for example car 123 122 and splitting it by spaces into a list like so ["car", "123", "122"].
Now if you look closely at the product of that, there's something wrong. The last 2 elements (which need to be integers to find the difference) are strings! hence, why we are going to have to use the next list comprehension to change that.
top = sorted([[row[0], int(row[1])-int(row[2])]for row in data],key=itemgetter(1), reverse=True)
This is a bit more complicated... but all it's really doing is sorting a simple list comprehension.
It goes through each value in data and gets the differences! Let's see how it does that.
As you know, our data looks something like [["car", "123", "122"], ["car1", "1234", "1223"]] right now. So, we would be able to access the integer values of ["car", "123", "122"] with [1] and [2], with this knowledge we can loop through the data, and get the difference of those when they are casted to integers. E.g int(row[1])-int(row[2]) of ["car", "123", "122"] would return 1 (the difference).
With this knowledge, we can create a new list with the comprehension that contains: the car's name row[0] and the difference int(row[1])-int(row[2]) represented by [row[0], int(row[1])-int(row[2])] in the list comp. while using row as each iterable in data we can easily form this! Heres that list comprehension by itself:
[[row[0], int(row[1])-int(row[2])] for row in data]
Finally, we have arrived at the last piece of this little program... the sorted function! sorted() will return a sorted list based on the key you give it (and you can use reverse=True to have the greatest values first). it's really not that hard to understand when it's abbreviated as follows:
sorted([the list comprehension],key=itemgetter(1), reverse=True)
So while you might know that yes, it's sorting that list comprehension we made and listing the biggest values first, you might not know how its sorting this! To know how it's being sorted we need to look at the key.
itemgetter() is a function of the operator class. All you really need to know about it is that it's getting the 1st index of the lists given and therefore sorting by that. If you can recall each element of our data looks like ["car", difference] (difference is just a placeholder for what actually is the integer difference). Since we want the greatest differences then it makes sense to sort by them right?
using itemgetter(1) it will sort by the 1st index; the difference! and that pretty much sums it up :)
we store all of that to the variable top and then print the first two elements with print(top[:2])
I hope this helped!
Create a dict that contains the distances of each row with the car brand as key.
Then you can sort the dict.items() using the values and return the top 2

Random DNA mutation Generator

I'd like to create a dictionary of dictionaries for a series of mutated DNA strands, with each dictionary demonstrating the original base as well as the base it has mutated to.
To elaborate, what I would like to do is create a generator that allows one to input a specific DNA strand and have it crank out 100 randomly generated strands that have a mutation frequency of 0.66% (this applies to each base, and each base can mutate to any other base). Then, what I would like to do is create a series of dictionary, where each dictionary details the mutations that occured in a specific randomly generated strand. I'd like the keys to be the original base, and the values to be the new mutated base. Is there a straightforward way of doing this? So far, I've been experimenting with a loop that looks like this:
#yields a strand with an A-T mutation frequency of 0.066%
def mutate(string, mutation, threshold):
dna = list(string)
for index, char in enumerate(dna):
if char in mutation:
if random.random() < threshold:
dna[index] = mutation[char]
return ''.join(dna)
dna = "ATGTCGTACGTTTGACGTAGAG"
print("DNA first:", dna)
newDNA = mutate(dna, {"A": "T"}, 0.0066)
print("DNA now:", newDNA)
But I can only yield one strand with this code, and it only focuses on T-->A mutations. I'm also not sure how to tie the dictionary into this. could someone show me a better way of doing this? Thanks.
It sounds like there are two parts to your issue. The first is that you want to mutate your DNA sequence several times, and the second is that you want to gather some additional information about the mutations in a data structure of some kind. I'll handle each of those separately.
Producing 100 random results from the same source string is pretty easy. You can do it with an explicit loop (for instance, in a generator function), but you can just as easily use a list comprehension to run a single-mutation function over and over:
results = [mutate(original_string) for _ in range(100)]
Of course, if you make the mutate function more complicated, this simple code may not be appropriate. If it returns some kind of more sophisticated data structure, rather than just a string, you may need to do some additional processing to combine the data in the format you want.
As for how to build those data structures, I think the code you have already is a good start. You'll need to decide how exactly you're going to be accessing your data, and then let that guide you to the right kind of container.
For instance, if you just want to have a simple record of all the mutations that happen to a string, I'd suggest a basic list that contains tuples of the base before and after the mutation. On the other hand, if you want to be able to efficiently look up what a given base mutates to, a dictionary with lists as values might be more appropriate. You could also include the index of the mutated base if you wanted to.
Here's a quick attempt at a function that returns the mutated string along with a list of tuples recording all the mutations:
bases = "ACGT"
def mutate(orig_string, mutation_rate=0.0066):
result = []
mutations = []
for base in orig_string:
if random.random() < mutation_rate:
new_base = bases[bases.index(base) - random.randint(1, 3)] # negatives are OK
result.append(new_base)
mutations.append((base, new_base))
else:
result.append(base)
return "".join(result), mutations
The most tricky bit of this code is how I'm picking the replacement of the current base. The expression bases[bases.index(base) - random.randint(1, 3)] does it all in one go. Lets break down the different bits. bases.index(base) gives the index of the previous base in the global bases string at the top of the code. Then I subtract a random offset from this index (random.randint(1, 3)). The new index may be negative, but that's OK, as when we use it to index back into the bases string (bases[...]), negative indexes count from the right, rather than the left.
Here's how you could use it:
string = "ATGT"
results = [mutate(string) for _ in range(100)]
for result_string, mutations in results:
if mutations: # skip writing out unmutated strings
print(result_string, mutations)
For short strings, like "ATGT" you're very unlikely to get more than one mutation, and even one is pretty rare. The loop above tends to print between 2 and 4 results on each run (that is, more than 95% of length-four strings are not mutated at all). Longer strings will have mutations more often, and it's more plausible that you'll see multiple mutations in one string.

Python: Sorting a list by class objects

Working on a project for CS1, and I am close to cracking it, but this part of the code has stumped me! The object of the project is to create a list of the top 20 names in any given year by referencing a file with thousands of names on it. Each line in each file contains the name, gender, and how many times it occurs. This file is seperated by gender (so female names in order of their occurences followed by male names in order of their occurences). I have gotten the code to a point where each entry is contained within a class in a list (so this list is a long list of memory entries). Here is the code I have up to this point.
class entry():
__slots__ = ('name' , 'sex' , 'occ')
def mkEntry( name, sex, occ ):
dat = entry()
dat.name = name
dat.sex = sex
dat.occ = occ
return dat
##test = mkEntry('Mary', 'F', '7065')
##print(test.name, test.sex, test.occ)
def readFile(fileName):
fullset = []
for line in open(fileName):
val = line.split(",")
sett = mkEntry(val[0] , val[1] , int(val[2]))
fullset.append(sett)
return fullset
fullset = readFile("names/yob1880.txt")
print(fullset)
What I am wondering if I can do at this point is can I sort this list via usage of sort() or other functions, but sort the list by their occurrences (dat.occ in each entry) so in the end result I will have a list sorted independently of gender and then at that point I can print the first entries in the list, as they should be what I am seeking. Is it possible to sort the list like this?
Yes, you can sort lists of objects using sort(). sort() takes a function as an optional argument key. The key function is applied to each element in the list before making the comparisons. For example, if you wanted to sort a list of integers by their absolute value, you could do the following
>>> a = [-5, 4, 6, -2, 3, 1]
>>> a.sort(key=abs)
>>> a
[1, -2, 3, 4, -5, 6]
In your case, you need a custom key that will extract the number of occurrences for each object, e.g.
def get_occ(d): return d.occ
fullset.sort(key=get_occ)
(you could also do this using an anonymous function: fullset.sort(key=lambda d: d.occ)). Then you just need to extract the top 20 elements from this list.
Note that by default sort returns elements in ascending order, which you can manipulate e.g. fullset.sort(key=get_occ, reverse=True)
This sorts the list by using the occ property in descending order:
fullset.sort(key=lambda x: x.occ, reverse=True)
You mean you want to sort the list only by the occ? sort() has a parameter named key, you can do like this:
fullset.sort(key=lambda x: x.occ)
I think you just want to sort on the value of the 'occ' attribute of each object, right? You just need to use the key keyword argument to any of the various ordering functions that Python has available. For example
getocc = lambda entry: entry.occ
sorted(fullset, key=getocc)
# or, for in-place sorting
fullset.sort(key=getocc)
or perhaps some may think it's more pythonic to use operator.attrgetter instead of a custom lambda:
import operator
getocc = operator.attrgetter('occ')
sorted(fullset, key=getocc)
But it sounds like the list is pretty big. If you only want the first few entries in the list, sorting may be an unnecessarily expensive operation. For example, if you only want the first value you can get that in O(N) time:
min(fullset, key=getocc) # Same getocc as above
If you want the first three, say, you can use a heap instead of sorting.
import heapq
heapq.nsmallest(3, fullset, key=getocc)
A heap is a useful data structure for getting a slice of ordered elements from a list without sorting the whole list. The above is equivalent to sorted(fullset, key=getocc)[:3], but faster if the list is large.
Hopefully it's obvious you can get the three largest with heapq.nlargest and the same arguments. Likewise you can reverse any of the sorts or replace min with max.

Optimised Python dictionary / negative index storage

Raised by this question's comments (I can see that this is irrelevant), I am now aware that using dictionaries for data that needs to be queried/accessed regularly is not good, speedwise.
I have a situation of something like this:
someDict = {}
someDict[(-2, -2)] = something
somedict[(3, -10)] = something else
I am storing keys of coordinates to objects that act as arrays of tiles in a game. These are going to be negative at some point, so I can't use a list or some kind of sparse array (I think that's the term?).
Can I either:
Speed up dictionary lookups, so this would not be an issue
Find some kind of container that will support sparse, negative indices?
I would use a list, but then the querying would go from O(log n) to O(n) to find the area at (x, y). (I think my timings are off here too).
Python dictionaries are very very fast, and using a tuple of integers is not going to be a problem. However your use case seems that sometimes you need to do a single-coordinate check and doing that traversing all the dict is of course slow.
Instead of doing a linear search you can however speed up the data structure for the access you need using three dictionaries:
class Grid(object):
def __init__(self):
self.data = {} # (i, j) -> data
self.cols = {} # i -> set of j
self.rows = {} # j -> set of i
def __getitem__(self, ij):
return self.data[ij]
def __setitem__(self, ij, value):
i, j = ij
self.data[ij] = value
try:
self.cols[i].add(j)
except KeyError:
self.cols[i] = set([j])
try:
self.rows[j].add(i)
except KeyError:
self.rows[j] = add([i])
def getRow(self, i):
return [(i, j, data[(i, j)])
for j in self.cols.get(i, [])]
def getCol(self, j):
return [(i, j, data[(i, j)])
for i in self.rows.get(j, [])]
Note that there are many other possible data structures depending on exactly what you are trying to do, how frequent is reading, how frequent is updating, if you query by rectangles, if you look for nearest non-empty cell and so on.
To start off with
Speed up dictionary lookups, so this would not be an issue
Dictionary lookups are pretty fast O(1), but (from your other question) you're not relying on the hash-table lookup of the dictionary, your relying on a linear search of the dictionary's keys.
Find some kind of container that will support sparse, negative indices?
This isn't indexing into the dictionary. A tuple is an immutable object, and you are hashing the tuple as a whole. The dictionary really has no idea of the contents of the keys, just their hash.
I'm going to suggest, as others did, that you restructure your data.
For example, you could create objects that encapsulate the data you need, and arrange them in a binary tree for O(n lg n) searches. You can even go so far as to wrap the entire thing in a class that will give you the nice if foo in Bar: syntax your looking for.
You probably need a couple coordinated structures to accomplish what you want. Here's a simplified example using dicts and sets (tweaking user 6502's suggestion a bit).
# this will be your dict that holds all the data
matrix = {}
# and each of these will be a dict of sets, pointing to coordinates
cols = {}
rows = {}
def add_data(coord, data)
matrix[coord] = data
try:
cols[coord[0]].add(coord)
except KeyError:
# wrap coords in a list to prevent set() from iterating over it
cols[coord[0]] = set([coord])
try:
rows[coord[1]].add(coord)
except KeyError:
rows[coord[1]] = set([coord])
# now you can find all coordinates from a row or column quickly
>>> add_data((2, 7), "foo4")
>>> add_data((2, 5), "foo3")
>>> 2 in cols
True
>>> 5 in rows
True
>>> [matrix[coord] for coord in cols[2]]
['foo4', 'foo3']
Now just wrap that in a class or a module, and you'll be off, and as always, if it's not fast enough profile and test before you guess.
Dictionary lookups are very fast. Searching for part of the key (e.g. all tiles in row x) is what's not fast. You could use a dict of dicts. Rather than a single dict indexed by a 2-tuple, use nested dicts like this:
somedict = {0: {}, 1:{}}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"
Then if you want all the tiles in a given "row" it's just somedict[0].
You will need some logic to add the secondary dictionaries where necessary and so on. Hint: check out getitem() and setdefault() on the standard dict type, or possibly the collections.defaultdict type.
This approach gives you quick access to all tiles in a given row. It's still slow-ish if you want all the tiles in a given column (though at least you won't need to look through every single cell, just every row). However, if needed, you could get around that by having two dicts of dicts (one in column, row order and the other in row, column order). Updating then becomes twice as much work, which may not matter for a game where most of the tiles are static, but access is very easy in either direction.
If you only need to store numbers and most of your cells will be 0, check out scipy's sparse matrix classes.
One alternative would be to simply shift the index so it's positive.
E.g. if your indices are contiguous like this:
...
-2 -> a
-1 -> c
0 -> d
1 -> e
2 -> f
...
Just do something like LookupArray[Index + MinimumIndex], where MinimumIndex is the absolute value of the smallest index you would use.
That way, if your minimum was say, -50, it would map to 0. -20 would map to 30, and so forth.
Edit:
An alternative would be to use a trick with how you use the indices. Define the following key function
Key(n) = 2 * n (n >= 0)
Key(n) = -2 * n - 1. (n < 0)
This maps all positive keys to the positive even indices, and all negative elements to the positive odd indices. This may not be practical though, since if you add 100 negative keys, you'd have to expand your array by 200.
One other thing to note: If you plan on doing look ups and the number of keys is constant (or very slowly changing), stick with an array. Otherwise, dictionaries aren't bad at all.
Use multi-dimensional lists -- usually implemented as nested objects. You can easily make this handle negative indices with a little arithmetic. It might use a more memory than a dictionary since something has to be put in every possible slot (usually None for empty ones), but access will be done via simple indexing lookup rather than hashing as it would with a dictionary.

Categories

Resources