Understanding Hash Tables with Python as a reference

I am going through online lectures on data structures and I want to confirm my understanding of the hash table.
I understand that a hash table will use a hashing function to reduce the universe of all possible keys down to a set m and use chaining to resolve collisions.
I can't seem to visualize the m part of it. Say I create an empty dict() in Python. Does Python create a table with some predefined number of empty slots?

Overview
An overview of how Python's dictionaries are implemented can be found in the 2017 PyCon talk "Modern Python Dictionaries: A confluence of a dozen great ideas".
How to visualize reduction
I understand that a hash table will use a hashing function to reduce the universe of all possible keys down to a set m and use chaining to resolve collisions. ... I can't seem to visualize the m part of it.
The easiest visualization is with m == 2 so that hashing divides keys into two groups:
>>> from pprint import pprint
>>> def hash(n):
...     'Hash a number into evens or odds'
...     return n % 2
...
>>> table = [[], []]
>>> for x in [10, 15, 12, 41, 80, 13, 40, 9]:
...     table[hash(x)].append(x)
...
>>> pprint(table, width=25)
[[10, 12, 80, 40],
 [15, 41, 13, 9]]
In the above example, the eight keys all get divided into two groups (the evens and the odds).
The example also works with bigger values of m such as m == 7:
>>> table = [[], [], [], [], [], [], []]
>>> for x in [10, 15, 12, 41, 80, 13, 40, 9]:
...     table[x % 7].append(x)
...
>>> pprint(table, width=25)
[[],
 [15],
 [9],
 [10, 80],
 [],
 [12, 40],
 [41, 13]]
As you can see, the above example has two empty slots (0 and 4) and three slots with a collision (3, 5, and 6).
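To connect the bucket lists above to an actual key/value table, here is a minimal sketch of a hash table with m slots and chaining. The class name ChainedHashTable and its methods are made up for illustration. CPython's own dict resolves collisions with open addressing rather than chaining, so this mirrors the lecture model, not the dict internals.

```python
class ChainedHashTable:
    """Toy hash table: m buckets, each a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        # Reduce the huge range of hash values down to 0..m-1
        return hash(key) % self.m

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key: chain it onto the bucket

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(m=7)
for x in [10, 15, 12, 41, 80, 13, 40, 9]:
    table.put(x, str(x))

print([len(bucket) for bucket in table.buckets])  # [0, 1, 1, 2, 0, 2, 2]
print(table.get(80))  # 80
```

Small integers hash to themselves in CPython, so the chain lengths match the m == 7 example above.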
Table for an empty dict
Say I create an empty dict() in Python. Does Python create a table with some predefined number of null entries?
Yes, Python creates eight slots for an empty table. In Python's source code, we see #define PyDict_MINSIZE 8 in cpython/Objects/dictobject.c.
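One rough way to see that the table is allocated in fixed-size steps rather than one slot per key is to watch the reported size of a dict as it grows. This is only a sketch and assumes CPython, where sys.getsizeof reports the memory held by the dict object. The exact byte counts and growth points differ between Python versions, so no output is shown.

```python
import sys

d = {}
sizes = []
for i in range(50):
    sizes.append(sys.getsizeof(d))  # size before inserting key i
    d[i] = None

# Print only the points where the reported size changes
previous = None
for count, size in enumerate(sizes):
    if size != previous:
        print(f"{count:2d} items: {size} bytes")
        previous = size
```

The size stays flat across several insertions and then jumps, which suggests that the table is resized in chunks.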

Related

Is it possible to find the original name of lists put in another list, by interacting with the latter list?

I have a couple lists (raw data from elsewhere), that I collect in another list, to do stuff with later in the code. (So if I were to edit the raw data I am using, I can just change the original lists, edit the everything-list to reflect added/removed lists, and have all the subsequent code reflect those changes without me having to change anything in the rest of the code.)
Like so:
a=[1,2,3]
b=[55,9,18]
c=[15,234,2]
everything=[a,b,c]
At one point I would like to use the NAMES of my original lists ('a','b', and 'c' in my example).
Is there a way for me to use my list 'everything' to access the names of the lists put in it?
(So for the code
for i in range(len(everything)):
    print('The data from',???,'is',everything[i])
??? would be replaced by something to ideally print
The data from a is [1, 2, 3]
The data from b is [55, 9, 18]
The data from c is [15, 234, 2]
)
You can use dictionaries for this.
a=[1,2,3]
b=[55,9,18]
c=[15,234,2]
everything={'a':a,'b': b,'c': c}
for i in range(len(everything['a'])):
    everything['a'][i] += 10
print(everything)
# >> {'a': [11, 12, 13], 'b': [55, 9, 18], 'c': [15, 234, 2]}
print(a)
# >> [11, 12, 13]
for var, val in everything.items():
    print(f'The data from {var} is {val}')
"""
>>The data from a is [11, 12, 13]
The data from b is [55, 9, 18]
The data from c is [15, 234, 2]
"""
There are ways to look up a variable's name, but a dictionary is a better fit for your case: its keys are unique and can serve as your variable names. With a dictionary you can retrieve the values and print them in any format you need:
a = [1,2,3]
b = [55,9,18]
c = [15,234,2]
everything= {'a': a, 'b': b, 'c': c}
for k, v in everything.items():
    print(f'The data from {k} is {v}')
If you want to look up the variable name from the object's id, you can use this:
a=[1,2,3]
b=[55,9,18]
c=[15,234,2]
everything = [a,b,c]
def get_name(your_id):
    # Collect every global name whose object has the given id
    names = [name for name, value in globals().items() if id(value) == your_id]
    return names[0]
for i in range(len(everything)):
    print('The data from', get_name(id(everything[i])), 'is', everything[i])
Under Python 2, where the parenthesized print arguments form a tuple, this outputs:
('The data from', 'a', 'is', [1, 2, 3])
('The data from', 'b', 'is', [55, 9, 18])
('The data from', 'c', 'is', [15, 234, 2])
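globals is a built-in function that returns a dict mapping names to values in the global namespace, so you can look up the variable name from the id. Note that this only finds objects bound to a global name, and if several names refer to the same object it returns whichever one comes first.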
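The ambiguity of name lookup can be made visible with a small sketch. The helper names_bound_to and the namespace dict are hypothetical and stand in for globals(). The helper compares with "is", which tests the same object identity that comparing id values does.

```python
def names_bound_to(obj, namespace):
    """Return every name in `namespace` that refers to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)


namespace = {'a': [1, 2, 3], 'b': [55, 9, 18]}
namespace['alias'] = namespace['a']        # a second name for the same list
namespace['copy'] = list(namespace['a'])   # equal contents, different object

print(names_bound_to(namespace['a'], namespace))  # ['a', 'alias']
print(names_bound_to(namespace['b'], namespace))  # ['b']
```

Because one object can have several names (or none), keeping the names explicitly as dictionary keys tends to be more reliable than recovering them afterwards.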

List intersection in Django ORM how to?

Let's say I have two tables, all_my_friends_ids and my_facebook_friends_ids, which represent two lists of my friends in the database:
all_my_friends_ids = self.user.follows.values_list('pk', flat=True)
(e.g. all_my_friends_ids = [1, 4, 9, 16, 18, 20, 24, 70])
my_facebook_friends_ids = User.objects.filter(facebook_uid__in=my_facebook_friends_uids)
(e.g. my_facebook_friends_ids = [4, 16, 28, 44, 39])
I want to check whether all elements of the my_facebook_friends_ids list have an entry in all_my_friends_ids, and if not, return the ids that are not in the all_my_friends_ids list (and add them to all_my_friends_ids later).
How can I solve this task in the Django ORM with a QuerySet? I tried to extract the ids and apply this function to them:
def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))
but it doesn't seem like the right way, especially for my case.
# all_my_friends_ids holds primary keys, so exclude on pk rather than facebook_uid
facebook_exclusives = (User.objects
    .filter(facebook_uid__in=my_facebook_friends_uids)
    .exclude(pk__in=all_my_friends_ids))
If you want, you can offload it to your database completely, without creating a (potentially huge) intermediate list in Python:
# Passing a queryset to __in lets the database run it as a subquery
facebook_exclusives = (User.objects
    .filter(facebook_uid__in=my_facebook_friends_uids)
    .exclude(pk__in=self.user.follows.values('pk')))
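If the ids are already in Python (for example after values_list('pk', flat=True) has been evaluated), the same check can be sketched with plain sets. This uses the example ids from the question and involves no ORM calls. It also shows the intent that the string-joining sublistExists function cannot express.

```python
all_my_friends_ids = [1, 4, 9, 16, 18, 20, 24, 70]
my_facebook_friends_ids = [4, 16, 28, 44, 39]

# Ids that are Facebook friends but not yet in the full friends list
missing = set(my_facebook_friends_ids) - set(all_my_friends_ids)

print(sorted(missing))  # [28, 39, 44]
print(not missing)      # False: not every Facebook friend is already present
```

For large tables the database-side exclude above is likely the better choice, since it avoids loading both id lists into memory.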

Generating a list of prime numbers using list comprehension

I'm trying to create a list of all the prime numbers less than or equal to a given number. I did that successfully using for loops. I was trying to achieve the same with a list comprehension in Python, but my output has some unexpected values.
Here is my code:
pr=[2]
pr+=[i for i in xrange(3,num+1) if not [x for x in pr if i%x==0]]
where num is the number I took as input from the user.
The output of the above code for num=20 is this: [2, 3, 5, 7, 9, 11, 13, 15, 17, 19]
I'm puzzled as to why 9 and 15 are there in the output. What am I doing wrong here?
It simply doesn’t work that way. The list comprehension on the right-hand side is evaluated in full before pr is extended, so you can imagine it like this:
pr = [2]
tmp = [i for i in xrange(3,num+1) if not [x for x in pr if i%x==0]]
pr += tmp
By the time tmp is evaluated, pr only contains 2, so you only ever check if a number is divisible by 2 (i.e. if it’s even). That’s why you get all the odd numbers.
You simply can’t solve this nicely† using list comprehensions.
† Not nicely, but ugly and in a very hackish way, by abusing that you can call functions inside a list comprehension:
pr = [2]
[pr.append(i) for i in xrange(3,num+1) if not [x for x in pr if i%x==0]]
print(pr) # [2, 3, 5, 7, 11, 13, 17, 19]
This abuses list comprehensions and basically collects a None value for each prime number you add to pr. So it’s essentially like your normal for loop except that we unnecessarily collect None values in a list… so you should rather allow yourself to use a line break and just use a normal loop.
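If the comprehension does not have to refer to the list it is building, a single expression is still possible. The sketch below swaps in a different technique: each candidate is tested by trial division against every integer up to its square root, not against the primes found so far. The function name primes_up_to is made up for illustration, and math.isqrt assumes Python 3.8 or later.

```python
import math


def primes_up_to(num):
    """Primes <= num, testing each candidate by trial division up to its square root."""
    return [i for i in range(2, num + 1)
            if all(i % d != 0 for d in range(2, math.isqrt(i) + 1))]


print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

This does more divisions than dividing only by known primes, but it keeps the comprehension free of side effects.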
Your list pr isn't updated until the entire list comprehension has finished. This means that while the comprehension runs, pr only contains 2, so only the numbers divisible by 2 are left out (as you can see). You should update the list whenever you find a new prime number.
This is because the pr += [...] is evaluated approximately as this:
pr = [2]
tmp = [i for i in xrange(3,num+1) if not [x for x in pr if i%x==0]]
pr.extend(tmp)
So while tmp is generated, the contents of pr remain the same ([2]).
I would go with a function like this:
>>> import itertools
>>> def primes():
...     results = []
...     for i in itertools.count(2):
...         if all(i%x != 0 for x in results):
...             results.append(i)
...             yield i
...
# And then you can fetch first 10 primes
>>> list(itertools.islice(primes(), 10))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
# Or get primes smaller than X
>>> list(itertools.takewhile(lambda x: x < 50, primes()))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
Note that using all is more efficient than building a list and testing whether it's empty, because all stops at the first divisor it finds.
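The difference can be illustrated by counting how many divisibility tests each approach performs. Both helper functions are hypothetical and exist only to do the counting. They are not part of the answer's primes() generator.

```python
def checks_with_list(i, candidates):
    """Count divisibility tests when the full list of divisors is built."""
    checks = 0
    divisors = []
    for x in candidates:
        checks += 1
        if i % x == 0:
            divisors.append(x)
    return checks


def checks_with_all(i, candidates):
    """Count divisibility tests when all() consumes a generator."""
    checks = 0

    def tests():
        nonlocal checks
        for x in candidates:
            checks += 1
            yield i % x != 0

    all(tests())  # stops pulling values at the first False
    return checks


small_primes = [2, 3, 5, 7, 11, 13]
print(checks_with_list(100, small_primes))  # 6
print(checks_with_all(100, small_primes))   # 1
print(checks_with_all(101, small_primes))   # 6
```

For a composite such as 100, all stops after the first test. For a prime such as 101, both approaches have to test every candidate.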

Python: slicing a string of numbers into sections based on lengths within the string

I have a string of numbers that I want to read from a file and parse into sub-sections, with lengths of the subsections based on numbers within the string. The first number in the string is the length of the first sub-section. So for example, if I have a string of data as follows:
4, 11, 22, 33, 3, 44, 55, 5, 44, 55, 66, 77
I want to divide up as follows:
first subsection is length 4, so, 4, 11, 22, 33
second subsection is length 3, so 3, 44, 55
third subsection is length 5, so 5, 44, 55, 66, 77
I tried using variables in slice, so that I could increment the start/stop values as I march through the data, but it doesn't take vars. I worked out a way to delete each subsection as I go so that the first value will always be the length of the next subsection, but it seems sort of clunky.
I'd appreciate any suggestions - thx
You can do something like:
your_list = [4, 11, 22, 33, 3, 44, 55, 5, 44, 55, 66, 77]
subsec = []
it = iter(your_list)
for n in it:
    # list(...) is needed on Python 3, where map returns an iterator
    subsec.append([n] + list(map(lambda _: next(it), range(n - 1))))
This way you only loop once over your list.
or
for n in it:
    subsec.append([n] + [next(it) for _ in range(int(n-1))])
When dealing with more complex logic, I prefer to use regular loops.
In this case I would go with a while loop, running until the list is empty, and removing the elements already processed. If the sections are wrong (i.e. the last section goes beyond the size of the string), the assert will tell you.
sequence = [4, 11, 22, 33, 3, 44, 55, 5, 44, 55, 66, 77]
sections = []
while sequence:
    section_size = sequence[0]
    assert len(sequence) >= section_size
    sections.append(sequence[:section_size])
    sequence = sequence[section_size:]
print(sections)
This splits the sections and saves them in a list called sections, with the size as the first element, like in your examples.
Edit: added error checking.
Just thought I'd throw this out there. Very similar to BoppreH's solution, but it avoids the overhead of creating n additional lists by iterating over indices:
def generateSlices(seq):
    i = 0
    while i < len(seq):
        n = seq[i]          # the first element of each section is its length
        yield seq[i:i + n]
        i += n
You can check for errors after generating a list of sublists by doing:
mySubLists = [[5, 23, 33, 44, 2], [10]]
all(len(x) == x[0] for x in mySubLists)
Incidentally, why is your data structured in this strange way? It seems error-prone.
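The approaches above can be combined into one sketch that starts from the comma-separated string in the question, consumes it in a single pass with itertools.islice, and reports a bad length prefix directly. The function name split_sections and the choice of ValueError are assumptions made for illustration.

```python
from itertools import islice


def split_sections(numbers):
    """Yield length-prefixed sections; each section starts with its own length."""
    it = iter(numbers)
    for n in it:
        if n < 1:
            raise ValueError(f"invalid section length: {n}")
        # The prefix counts itself, so take n - 1 further items
        section = [n] + list(islice(it, n - 1))
        if len(section) != n:
            raise ValueError(f"section of length {n} is truncated")
        yield section


text = "4, 11, 22, 33, 3, 44, 55, 5, 44, 55, 66, 77"
numbers = [int(token) for token in text.split(",")]

print(list(split_sections(numbers)))
# [[4, 11, 22, 33], [3, 44, 55], [5, 44, 55, 66, 77]]
```

Because it reads from an iterator, the same function would also work on numbers parsed lazily from a file, one token at a time.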

How to randomize a list using Python

This is my code:
import random
a = [12,2,3,4,5,33,14,124,55,233,565]
b=[]
for i in a:
    b.append(random.choice(a))
print(a, b)
But I think there may be a built-in method for this, something like sort but named randomList.
Does Python have such a method?
Thanks
import random
a = [12,2,3,4,5,33,14,124,55,233,565]
b = a[:]
random.shuffle(b)
# b: [55, 12, 33, 5, 565, 3, 233, 2, 124, 4, 14]
This will not modify a.
To modify a in place, just do random.shuffle(a).
I think you are looking for random.shuffle.
You could use random.shuffle:
random.shuffle(a)
which would give a random order of a.
>>> random.sample(a, len(a))
[14, 124, 565, 233, 55, 12, 5, 33, 4, 3, 2]
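This has some advantages over random.shuffle:
A new list is returned (no changes to the original a).
The resulting list is in selection order, so all sub-slices are also valid random samples.
One caveat: the documentation's warning for random.shuffle, that most permutations of a long sequence can never be generated, comes from the finite period of the underlying generator, so it applies to random.sample as well.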
All elements of a are part of the returned list. See more here.
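The three approaches from this thread can be compared side by side in a short sketch. The seeded random.Random instance is used here only to make the run repeatable, and the variable names are illustrative. The loop from the question draws with replacement, so it is not a true shuffle.

```python
import random

a = [12, 2, 3, 4, 5, 33, 14, 124, 55, 233, 565]
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Independent draws with replacement: elements may repeat or be missing
drawn = [rng.choice(a) for _ in a]

# Copy, then shuffle the copy in place: a itself is untouched
shuffled = a[:]
rng.shuffle(shuffled)

# sample with k == len(a) returns a new list holding every element once
sampled = rng.sample(a, len(a))

print(sorted(shuffled) == sorted(a))  # True
print(sorted(sampled) == sorted(a))   # True
print(a[0])                           # 12, the original order is preserved
```

Both shuffle on a copy and sample leave a unchanged, so the choice between them is mostly about whether a new list or an in-place reorder is wanted.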
