Understanding the in operator in Python

index = []
def add_to_index(index,keyword,url):
    if len(index) == 0:
        index.append([keyword, [url]])
    elif keyword in index:
        find_key_pos = index.find(keyword)
        index.insert(find_key_pos + len(keyword), url)
add_to_index(index,'udacity','http://udacity.com')
add_to_index(index,'udacity','http://npr.org')
print(index)
My output is:
[['udacity', ['http://udacity.com']]]
The expected output is:
[['udacity', ['http://udacity.com', 'http://npr.org']]]
Whenever the keyword already exists in the index list, the url should be added to the list that sits next to that keyword.
In these two calls,
add_to_index(index,'udacity','http://udacity.com')
add_to_index(index,'udacity','http://npr.org')
the keyword 'udacity' is the same, so both URLs should end up in the list after that keyword.

Your bugs:
index.insert(find_key_pos + len(keyword), url)
The first parameter to list.insert() is the position of the new element in the outer list. What you actually want is to get the nested list for your keyword and append the new URL to it.
Since each entry is a [keyword, urls] pair, the URL list sits at position 1 of the entry, so what you want instead is:
index[find_key_pos][1].append(url)
The second bug lies in the reuse of the name index. Your function parameter shadows the list from the enclosing scope. Use different names. Your code will work, because lists are mutable and you are passing around references to the same list, but it will create a lot of confusion down the road.
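One more thing to watch for, given the title of the question: keyword in index compares the keyword against each top-level element of index, and those elements are [keyword, urls] pairs, so the test is never true here. Lists also have no find() method. A minimal sketch of a list-based version that searches the pairs explicitly might look like this (the parameter name entries is my own choice, picked to avoid shadowing the global index):

```python
def add_to_index(entries, keyword, url):
    """Append url to the entry for keyword, creating the entry if needed."""
    for entry in entries:
        # Each entry is a [keyword, [urls]] pair.
        if entry[0] == keyword:
            entry[1].append(url)
            return
    # Keyword not seen before: start a new pair.
    entries.append([keyword, [url]])


index = []
add_to_index(index, 'udacity', 'http://udacity.com')
add_to_index(index, 'udacity', 'http://npr.org')
print(index)
```

This scans the whole list on every call, which is one reason a dictionary is usually the better fit.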
What you should really do, though, is look up Python dictionaries. They offer keyword lookup out of the box.
Here's a small dict wrapper that will make your life easier:
class ListDict():
    def __init__(self):
        # Must be a dict; a tuple would not support item assignment.
        self.index = {}
    def addEntry(self, key, entry):
        if key in self.index:
            self.index[key].append(entry)
        else:
            self.index[key] = [entry]
    def getEntries(self, key):
        if key in self.index:
            return self.index[key]
        else:
            return []
Usage:
websiteUrls = ListDict()
websiteUrls.addEntry("udemy", "foo")
websiteUrls.addEntry("udemy", "bar")
websiteUrls.getEntries("udemy")
# ["foo", "bar"]
websiteUrls.getEntries("nope")
# []

Related

Adding items to a list if it's not a function

I'm trying to write a function right now, and its purpose is to go through an object's __dict__ and add an item to a dictionary if the item is not a function.
Here is my code:
def dict_into_list(self):
    result = {}
    for each_key,each_item in self.__dict__.items():
        if inspect.isfunction(each_key):
            continue
        else:
            result[each_key] = each_item
    return result
If I'm not mistaken, inspect.isfunction is supposed to recognize lambdas as functions as well, correct? However, if I write
c = some_object(3)
c.whatever = lambda x : x*3
then my function still includes the lambda. Can somebody explain why this is?
For example, if I have a class like this:
class WhateverObject:
    def __init__(self,value):
        self._value = value
    def blahblah(self):
        print('hello')
a = WhateverObject(5)
So if I say print(a.__dict__), it should give back {'_value': 5}
You are checking whether each_key is a function, which it will not be: the keys of __dict__ are attribute names (strings). You have to check the value instead, like this
if inspect.isfunction(each_item):
You can confirm this by including a print, like this
import inspect
def dict_into_list(self):
    result = {}
    for each_key, each_item in self.__dict__.items():
        print(type(each_key), type(each_item))
        if not inspect.isfunction(each_item):
            result[each_key] = each_item
    return result
Also, you can write your code with a dictionary comprehension, like this
def dict_into_list(self):
    return {key: value for key, value in self.__dict__.items()
            if not inspect.isfunction(value)}
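To see the difference concretely, here is a small self-contained sketch. The class name Sample and the method name non_function_attrs are made up for this illustration; only inspect.isfunction and __dict__ come from the question.

```python
import inspect


class Sample:
    """Hypothetical class used only for this illustration."""

    def __init__(self, value):
        self._value = value

    def non_function_attrs(self):
        # Filter on the attribute values, not on the attribute names.
        return {key: value for key, value in self.__dict__.items()
                if not inspect.isfunction(value)}


c = Sample(3)
c.whatever = lambda x: x * 3   # stored in the instance __dict__
print(c.non_function_attrs())  # the lambda is filtered out
```

Methods defined in the class body, such as non_function_attrs itself, never show up here because they live in the class __dict__, not in the instance __dict__.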
An easy alternative that avoids the inspect module is to find the data attributes of an object with the built-in functions dir and callable.
{var: getattr(self, var) for var in dir(self) if not callable(getattr(self, var))}
Note that dir() also lists non-callable special attributes such as __doc__ and __module__, so you may want to skip names that start with a double underscore.
Please also note that this assumes you have not overridden the __getattr__ method of the class to do something other than getting the attributes.

Why can't I pass my parameter in Python

I wrote a simple function to calculate the mode, but it seems one parameter is not passed successfully.
I initialize countdict = dict() in the main function, then pass it in with mod = mode(magList, countdict).
In mode(alist, countdict), I assign countdict = dict(zip(alist,[0]*len(alist))), and countdict prints correctly inside mode.
But when I try to print(countdict) in the main function, the output is empty. When I check my code, it reports that countdict is unused in the function mode. How can that be?
The whole code is as follows:
def mode(alist, countdict):
    countdict= dict(zip(alist,[0]*len(alist)))
    for x in alist:
        countdict[x]+=1
    maxcount =max(countdict.values())
    modelist = [ ]
    for item in countdict:
        if countdict[item] == maxcount:
            modelist.append(item)
    return modelist
def makeMagnitudeList():
    quakefile = open("earthquakes.txt","r")
    headers = quakefile.readline()
    maglist = [ ]
    for aline in quakefile:
        vlist = aline.split()
        maglist.append(float(vlist[1]))
    return maglist
def mymain():
    magList = makeMagnitudeList()
    print(magList)
    countdict= dict()
    mod = mode(magList, countdict)
    print("mode: ", mod)
    print(countdict)
if __name__=='__main__':
    mymain()
As I said earlier, the line:
countdict= dict(zip(alist,[0]*len(alist)))
rebinds the local name countdict to a brand-new dictionary, so the function no longer refers to the one you passed in. Because of this, the countdict variable you are printing is the original, empty dictionary. The question Liarez linked to, "How do I pass a variable by reference?", will help explain why this is happening.
To get around this, you could change the return statement in the mode function to:
return (modelist, countdict)
which will return a tuple containing both modelist and countdict. When calling this function, you would write:
(mod, countdict) = mode(magList, countdict)
ensuring that the modified countdict is returned, meaning that your print function call should not output an empty dictionary.
The other thing to note is that the countdict you are passing into mode is empty anyway, so you may find it better to simply not pass this argument in and have mode take only one parameter.
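Following that last suggestion, a one-parameter version could look roughly like the sketch below. It assumes alist is non-empty (max() raises ValueError on an empty sequence), and the sample magnitudes are made up for the demonstration.

```python
def mode(alist):
    """Return (modelist, countdict) for the values in alist."""
    countdict = {}
    for x in alist:
        # dict.get supplies 0 the first time a value is seen.
        countdict[x] = countdict.get(x, 0) + 1
    maxcount = max(countdict.values())
    modelist = [item for item in countdict if countdict[item] == maxcount]
    return modelist, countdict


# Made-up sample data standing in for the magnitudes read from the file.
mod, countdict = mode([4.5, 5.1, 4.5, 6.0])
print("mode: ", mod)
print(countdict)
```

Returning both values keeps the data flow explicit: the caller never has to wonder whether the function changed an argument behind its back.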
This line of code is your problem.
countdict= dict(zip(alist,[0]*len(alist)))
Python dictionaries are mutable objects and can be changed inside a function; however, the dictionary itself is not passed to the function. Only a reference to it is passed, and that reference is passed by value. This means that when you assign a new dictionary to your countdict parameter, the parameter stops pointing at the dictionary created in mymain, which is therefore left unchanged.
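The difference between rebinding a parameter and mutating the object it refers to can be shown in a few lines. The function names rebind and mutate are invented for this sketch.

```python
def rebind(d):
    # Assignment makes the local name d point at a new dict;
    # the caller's dict is untouched.
    d = {"a": 1}


def mutate(d):
    # Item assignment changes the dict object the caller also sees.
    d["a"] = 1


first = {}
rebind(first)
second = {}
mutate(second)
print(first, second)
```

In the question, mode() behaves like rebind: everything after the assignment operates on the new dictionary only.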

Python set with the ability to pop a random element

I need a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but can also return a random value. Previous questions on Stack Overflow have answers like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren't random enough (they depend on the internal representation of Python sets, which produces very non-random results):
for e in mySet:
    break
# e is now an element from mySet
I coded my own rudimentary class which has constant time lookup, deletion, and random values.
import random
class randomSet:
    def __init__(self):
        self.dict = {}
        self.list = []
    def add(self, item):
        if item not in self.dict:
            self.dict[item] = len(self.list)
            self.list.append(item)
    def addIterable(self, item):
        for a in item:
            self.add(a)
    def delete(self, item):
        if item in self.dict:
            index = self.dict[item]
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                del self.list[index]
            else:
                self.list[index] = self.list.pop()
                self.dict[self.list[index]] = index
                del self.dict[item]
    def getRandom(self):
        if self.list:
            return self.list[random.randint(0,len(self.list)-1)]
    def popRandom(self):
        if self.list:
            index = random.randint(0,len(self.list)-1)
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                return self.list.pop()
            returnValue = self.list[index]
            self.list[index] = self.list.pop()
            self.dict[self.list[index]] = index
            del self.dict[returnValue]
            return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?
I think the best way to do this would be to use the MutableSet abstract base class in collections (collections.abc in Python 3). Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides built-in definitions of the remaining abstract-set methods and operators (such as remove, pop, clear, |= and the comparison operators) based on those methods. That way you get most of the set interface cheaply. (MutableSet does not give you a bulk-add method such as update, but the in-place union operator |= it provides covers what addIterable does.)
discard in the standard set interface appears to be what you have called delete here. So rename delete to discard. Also, instead of giving popRandom its own removal logic, you could just define popRandom like so:
def popRandom(self):
    item = self.getRandom()
    self.discard(item)
    return item
That way you don't have to maintain two separate item removal methods.
Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:
def discard(self, item):
    if item in self.dict:
        index = self.dict[item]
        self.list[index], self.list[-1] = self.list[-1], self.list[index]
        self.dict[self.list[index]] = index
        # The next two lines could also be written as one:
        # del self.dict[self.list.pop()]
        del self.list[-1]
        del self.dict[item]
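Putting these suggestions together, a MutableSet-based version might look like the sketch below. It is written for Python 3, where the base class lives in collections.abc (the question targets 2.7, where it is collections.MutableSet). The attribute names _items and _index and the method names get_random and pop_random are my own, not part of any standard interface.

```python
import random
from collections.abc import MutableSet


class RandomSet(MutableSet):
    """Set with O(1) add, discard, membership test and random choice."""

    def __init__(self, iterable=()):
        self._items = []   # elements, in arbitrary order
        self._index = {}   # element -> position in self._items
        for item in iterable:
            self.add(item)

    def __contains__(self, item):
        return item in self._index

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)

    def discard(self, item):
        if item in self._index:
            position = self._index.pop(item)
            last = self._items.pop()
            if position < len(self._items):
                # The removed item was not the last one: move the
                # former last element into the freed slot.
                self._items[position] = last
                self._index[last] = position

    def get_random(self):
        # random.choice raises IndexError on an empty set.
        return random.choice(self._items)

    def pop_random(self):
        item = self.get_random()
        self.discard(item)
        return item


s = RandomSet([1, 2, 3])
s |= [4, 5]          # in-place union is supplied by MutableSet
s.discard(2)
picked = s.pop_random()
print(picked, sorted(s))
```

The mixin's own pop() removes whichever element iteration yields first, so the random variant is kept under a separate name here.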
One approach you could take is to derive a new class from set which salts itself with random objects of a type derived from int.
You can then use pop to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).
This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.
Can't we implement a new class inheriting from set with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit from object, i.e. use class randomSet(object). Also PEP8 is something to consider for you :-)
Edit:
For getting some ideas of what hackish solutions might be capable of, this thread is worth reading:
http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html
Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.
from random import randint
class RandomSet(object):
    """
    Implements a set in which elements can be
    added and drawn uniformly and randomly in
    constant time.
    """
    def __init__(self, seq=None):
        self.dict = {}
        self.list = []
        if seq is not None:
            for x in seq:
                self.add(x)
    def add(self, x):
        if x not in self.dict:
            self.dict[x] = len(self.list)
            self.list.append(x)
    def pop(self, x=None):
        if x is None:
            i = randint(0,len(self.list)-1)
            x = self.list[i]
        else:
            i = self.dict[x]
        self.list[i] = self.list[-1]
        self.dict[self.list[-1]] = i
        self.list.pop()
        self.dict.pop(x)
        return x
    def __contains__(self, x):
        return x in self.dict
    def __iter__(self):
        return iter(self.list)
    def __repr__(self):
        return "{" + ", ".join(str(x) for x in self.list) + "}"
    def __len__(self):
        return len(self.list)
Yes, I'd implement an "ordered set" in much the same way you did, using a list as an internal data structure.
However, I'd inherit straight from set and just keep track of the added items in an internal list (as you did), leaving the methods I don't use alone.
Maybe add a "sync" method to update the internal list whenever the set is updated by set-specific operations, like the *_update methods.
That is, if using an "ordered dict" does not cover your use cases. (I just found that casting ordered_dict keys to a regular set is not optimized, so if you need set operations on your data that is not an option.)
If you don't mind only supporting comparable elements, then you could use blist.sortedset.

Accessing list items with getattr/setattr in Python

Trying to access/assign items in a list with the getattr and setattr functions in Python.
Unfortunately there seems to be no way of passing the list index along with the list name.
Here are some of my attempts, with example code:
class Lists (object):
    def __init__(self):
        self.thelist = [0,0,0]
Ls = Lists()
# Trying this passes only 't' as the second argument, so Python raises an error.
# Interesting that you can slice a string inside the getattr/setattr call:
# here one could access 'thelist' with [0:7].
print(getattr(Ls, 'thelist'[0]))
# Tried these two as well, to no avail.
# No error message ensues, but the list isn't altered.
# Instead new attributes with those literal names are created; printed them out to show they now exist.
setattr(Lists, 'thelist[0]', 3)
setattr(Lists, 'thelist\[0\]', 3)
print(Ls.thelist)
print(getattr(Ls, 'thelist[0]'))
print(getattr(Ls, 'thelist\[0\]'))
Also note that you can't concatenate a string and an integer in the second argument of the attr functions.
Cheers
getattr(Ls, 'thelist')[0] = 2
getattr(Ls, 'thelist').append(3)
print(getattr(Ls, 'thelist')[0])
If you want to be able to do something like getattr(Ls, 'thelist[0]'), you have to override __getattr__ or use the built-in eval function.
You could do:
l = getattr(Ls, 'thelist')
l[0] = 2 # for example
l.append("bar")
l is getattr(Ls, 'thelist') # True
# so, no need to setattr, Ls.thelist is l and will thus be changed by ops on l
getattr(Ls, 'thelist') gives you a reference to the same list that can be accessed with Ls.thelist.
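If the attribute name and the position both arrive as data, the two steps can be wrapped in small helpers. This is only a sketch; get_item and set_item are hypothetical names, not built-ins.

```python
class Lists(object):
    def __init__(self):
        self.thelist = [0, 0, 0]


def get_item(obj, attr_name, position):
    """Look up the attribute by name, then index the result."""
    return getattr(obj, attr_name)[position]


def set_item(obj, attr_name, position, value):
    """Mutate the list in place; no setattr is needed."""
    getattr(obj, attr_name)[position] = value


Ls = Lists()
set_item(Ls, 'thelist', 0, 3)
print(get_item(Ls, 'thelist', 0))
print(Ls.thelist)
```

Because getattr returns the very same list object, the assignment inside set_item is visible through Ls.thelist.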
As you discovered, __getattr__ doesn't work this way. If you really want to use list indexing, use __getitem__ and __setitem__, and forget about getattr() and setattr(). Something like this:
class Lists (object):
    def __init__(self):
        self.thelist = [0,0,0]
    def __getitem__(self, index):
        return self.thelist[index]
    def __setitem__(self, index, value):
        self.thelist[index] = value
    def __repr__(self):
        return repr(self.thelist)
Ls = Lists()
print(Ls)
print(Ls[1])
Ls[2] = 9
print(Ls)
print(Ls[2])

Python Scoping/Static Misunderstanding

I'm really stuck on why the following code block 1 results in output 1 instead of output 2.
Code block 1:
class FruitContainer:
    def __init__(self,arr=[]):
        self.array = arr
    def addTo(self,something):
        self.array.append(something)
    def __str__(self):
        ret = "["
        for item in self.array:
            ret = "%s%s," % (ret,item)
        return "%s]" % ret
arrayOfFruit = ['apple', 'banana', 'pear']
arrayOfFruitContainers = []
while len(arrayOfFruit) > 0:
    tempFruit = arrayOfFruit.pop(0)
    tempB = FruitContainer()
    tempB.addTo(tempFruit)
    arrayOfFruitContainers.append(tempB)
for container in arrayOfFruitContainers:
    print(container)
**Output 1 (actual):**
[apple,banana,pear,]
[apple,banana,pear,]
[apple,banana,pear,]
**Output 2 (desired):**
[apple,]
[banana,]
[pear,]
The goal of this code is to iterate through an array and wrap each in a parent object. This is a reduction of my actual code which adds all apples to a bag of apples and so forth. My guess is that, for some reason, it's either using the same object or acting as if the fruit container uses a static array. I have no idea how to fix this.
You should never use a mutable value (like []) for a default argument to a method. The value is computed once, and then used for every invocation. When you use an empty list as a default value, that same list is used every time the method is invoked without the argument, even as the value is modified by previous function calls.
Do this instead:
def __init__(self,arr=None):
    self.array = arr or []
Your code has a default argument to initialize the class. The value of the default argument is evaluated once, when the def statement is executed (not on each call), so every instance is initialized with the same list. Change it like so:
def __init__(self, arr=None):
    if arr is None:
        self.array = []
    else:
        self.array = arr
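The effect is easy to reproduce outside of a class. In this sketch the function names append_shared and append_fresh are invented for the comparison.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A new list is created on every call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared('apple')
shared_second = append_shared('banana')
fresh_first = append_fresh('apple')
fresh_second = append_fresh('banana')

print(shared_first is shared_second)   # same list object both times
print(shared_second)
print(fresh_first, fresh_second)
```

The shared default is stored on the function object itself (in append_shared.__defaults__), which is why it survives between calls.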
I discussed this more fully here: How to define a class in Python
As Ned says, the problem is you are using a list as a default argument. There is more detail here. The solution is to change __init__ function as below:
def __init__(self,arr=None):
    if arr is not None:
        self.array = arr
    else:
        self.array = []
A better solution than passing in None (in this particular instance, rather than in general) is to treat the arr parameter to __init__ as an iterable of items to pre-initialize the FruitContainer with, rather than an array to use for internal storage:
class FruitContainer:
    def __init__(self, arr=()):
        self.array = list(arr)
    ...
This will allow you to pass in other iterable types to initialize your container, which more advanced Python users will expect to be able to do:
myFruit = ('apple', 'pear') # Pass a tuple
myFruitContainer = FruitContainer(myFruit)
myOtherFruit = open('fruitFile', 'r') # Pass a file
myOtherFruitContainer = FruitContainer(myOtherFruit)
It will also defuse another potential aliasing bug:
myFruit = ['apple', 'pear']
myFruitContainer1 = FruitContainer(myFruit)
myFruitContainer2 = FruitContainer(myFruit)
myFruitContainer1.addTo('banana')
'banana' in str(myFruitContainer2)
With all other implementations on this page, this will return True, because you have accidentally aliased the internal storage of your containers.
Note: This approach is not always the right answer; the "is None" check is better in other cases. Just ask yourself: am I passing in a set of objects, or a mutable container? If the class or function I'm passing my objects to changes the storage I gave it, would that be (a) surprising or (b) desirable? In this case, I would argue that it is (a); thus, the list(...) call is the best solution. If (b), the "is None" check would be the right approach.
