Undesired python feedparser instantiation relic

Question: How do I kill an instance, or ensure I'm creating a new instance, of the Python Universal Feed Parser (feedparser)?
Info:
I'm working on a program right now that downloads and catalogs large numbers of blogs. It has worked well so far except for an unfortunate bug. My code is set up to take a list of blog URLs and run them through a for loop. On each iteration it picks a URL and passes it to a separate class, which manages downloading, extracting, and saving the data to a file.
The first URL works just fine: it downloads the entire blog and saves it to a file. But the second blog it downloads also contains all the data from the first one, and I'm totally clueless as to why.
Code snippets:
class BlogHarvester:
    def __init__(self,folder):
        f = open(folder,'r')
        stop = folder[len(folder)-1]
        while stop != '/':
            folder = folder[0:len(folder)-1]
            stop = folder[len(folder)-1]
        blogs = []
        for line in f:
            blogs.append(line)
        for herf in blogs:
            blog = BlogParser(herf)
            sPath = ""
            uid = newguid()##returns random hash.
            sPath = uid
            sPath = sPath + " - " + blog.posts[0].author[1:5] + ".blog"
            print sPath
            blog.storeAsFile(sPath)
class BlogParser:
    def __init__(self, blogherf='null', path='null', posts = []):
        self.blogherf = blogherf
        self.blog = feedparser.parse(blogherf)
        self.path = path
        self.posts = posts
        if blogherf != 'null':
            self.makeList()
        elif path != 'null':
            self.loadFromFile()
class BlogPeices:
    def __init__(self,title,author,post,date,publisher,rights,comments):
        self.author = author
        self.title = title
        self.post = post
        self.date = date
        self.publisher = publisher
        self.rights = rights
        self.comments = comments
I included the snippets I figured would probably be useful. Sorry if there are any confusing artifacts; this program has been a pain in the butt.

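The problem is posts=[]. Default argument values are evaluated once, when the def statement is executed, not on every call. Every call that relies on the default therefore shares the same list object, and mutations to it persist for the lifetime of the function. Instead, use posts=None and test for it:
if posts is None:
    posts = []
self.posts = posts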
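A self-contained sketch of the problem and the fix, assuming a stripped-down parser class: feedparser is left out entirely, and appending blogherf is a hypothetical stand-in for makeList() filling in posts.

```python
class BuggyParser:
    def __init__(self, blogherf, posts=[]):   # one list, created once, shared by all calls
        self.blogherf = blogherf
        self.posts = posts
        self.posts.append(blogherf)           # hypothetical stand-in for makeList()


class FixedParser:
    def __init__(self, blogherf, posts=None):
        self.blogherf = blogherf
        if posts is None:                     # build a fresh list for every instance
            posts = []
        self.posts = posts
        self.posts.append(blogherf)


a, b = BuggyParser("blog-a"), BuggyParser("blog-b")
print(b.posts)   # ['blog-a', 'blog-b']: the first blog leaks into the second

c, d = FixedParser("blog-a"), FixedParser("blog-b")
print(d.posts)   # ['blog-b']
```

Passing an explicit list still works with the fixed version; only the "no argument" case changes.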

As Ignacio said, any mutations made to a default argument stay around for the lifetime of the function.
From http://docs.python.org/reference/compound_stmts.html#function-definitions:
Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function.
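The "evaluated once" behaviour can be observed directly: the default value is stored on the function object in __defaults__, so mutating it changes what every later call sees. add_item is a hypothetical example function, not from the question:

```python
def add_item(item, bucket=[]):
    # `bucket` defaults to the single list that was created when `def` ran
    bucket.append(item)
    return bucket


print(add_item.__defaults__)  # ([],)
add_item(1)
add_item(2)
print(add_item.__defaults__)  # ([1, 2],)
```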
But this brings up a related gotcha: you are modifying a reference, so you may also be modifying a list that the caller of the method passed in and did not expect to be changed.
For example:
class A:
    def foo(self, x = [] ):
        x.append(1)
        self.x = x
a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1,1] # !!!! Consumer would expect this to be [1]
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3, 1] # !!!! My list was modified
If you copy it instead (see http://docs.python.org/library/copy.html):
import copy
class A:
    def foo(self, x = [] ):
        x = copy.copy(x)
        x.append(1)
        self.x = x
a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1] # !!! Much better =)
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3] # !!!! My list is how I made it

Related

In python, is there a way to automatically log information any time you create a variable?

Not sure if this makes sense at all, but here's an example:
Let's say I have a script in which I create a list:
list = [1,2,3,4]
Maybe I just don't have the technical vocabulary to find what I'm looking for, but is there any way I could set some logging up so that any time I created a variable I could store information in a log file? Given the above example, maybe I'd want to see how many elements are in the list?
I understand that I could simply write a function and call that over and over again, but let's say I might want to know information about a ton of different data types, not just lists. It wouldn't be clean to call a function repeatedly.

This is hackery, but what the heck:
class _LoggeryType(type):
    def __setattr__(cls,attr,value):
        print("SET VAR: {0} = {1}".format(attr,value))
        globals().update({attr:value})
# Python 3
class Loggery(metaclass=_LoggeryType):
    pass
# Python 2: use this definition instead of the Python 3 one above
# class Loggery:
#     __metaclass__=_LoggeryType
Loggery.x = 5
print("OK set X={0}".format(x))
Note: I wouldn't really recommend using this.

One method would be to use the powerful sys.settrace. I've written up a small (but somewhat incomplete) example:
tracer.py:
import inspect
import sys
import os
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('tracing-logger')
FILES_TO_TRACE = [os.path.basename(__file__), 'tracee.py']
print(FILES_TO_TRACE)
def new_var(name, value, context):
    logger.debug(f"New {context} variable called {name} = {value}")
    # do some analysis here, for example
    if type(value) == list:
        logger.debug(f"\tNumber of elements: {len(value)}")
def changed_var(name, value, context):
    logger.debug(f"{context} variable called {name} was changed to: {value}")
def make_tracing_func():
    current_locals = {}
    current_globals = {}
    first_line_executed = False
    def tracing_func(frame, event, arg):
        nonlocal first_line_executed
        frame_info = inspect.getframeinfo(frame)
        filename = os.path.basename(frame_info.filename)
        line_num = frame_info.lineno
        if event == 'line':
            # check for difference with locals
            for var_name in frame.f_code.co_varnames:
                if var_name in frame.f_locals:
                    var_value = frame.f_locals[var_name]
                    if var_name not in current_locals:
                        current_locals[var_name] = var_value
                        new_var(var_name, var_value, 'local')
                    elif current_locals[var_name] != var_value:
                        current_locals[var_name] = var_value
                        changed_var(var_name, var_value, 'local')
            # check for difference with globals
            for var_name, var_value in frame.f_globals.items():
                if var_name not in current_globals:
                    current_globals[var_name] = var_value
                    if first_line_executed:
                        new_var(var_name, var_value, 'global')
                elif current_globals[var_name] != var_value:
                    current_globals[var_name] = var_value
                    changed_var(var_name, var_value, 'global')
            first_line_executed = True
            return tracing_func
        elif event == 'call':
            # start a fresh tracer for every new frame in a traced file
            if os.path.basename(filename) in FILES_TO_TRACE:
                return make_tracing_func()
        return None
    return tracing_func
sys.settrace(make_tracing_func())
import tracee
tracee.py:
my_list = [1, 2, 3, 4]
a = 3
print("tracee: I have a list!", my_list)
c = a + sum(my_list)
print("tracee: A number:", c)
c = 12
print("tracee: I changed it:", c)
Output:
DEBUG:tracing-logger:New global variable called my_list = [1, 2, 3, 4]
DEBUG:tracing-logger: Number of elements: 4
DEBUG:tracing-logger:New global variable called a = 3
tracee: I have a list! [1, 2, 3, 4]
DEBUG:tracing-logger:New global variable called c = 13
tracee: A number: 13
DEBUG:tracing-logger:global variable called c was changed to: 12
tracee: I changed it: 12
There are some additional cases you may want to handle (duplicated changes to globals due to function calls, closure variables, etc.). You can also use linecache to find the contents of the lines, or use the line_num variable in the logging.
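A less invasive option than either answer, offered as a sketch rather than a recommendation: run the script with exec() and pass a dict subclass as the locals mapping (the exec() documentation allows any mapping there), so every top-level assignment goes through its __setitem__. LoggingNamespace and SCRIPT are hypothetical names. Assignments inside functions use fast locals and are not seen, and with separate globals and locals the code runs as if inside a class body, so functions defined in the script cannot see its top-level names.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("assignment-logger")


class LoggingNamespace(dict):
    """Mapping used as exec()'s locals; logs every top-level name binding."""

    def __init__(self):
        super().__init__()
        self.events = []  # keep the messages so they can be inspected later

    def __setitem__(self, name, value):
        message = f"{name} = {value!r}"
        if isinstance(value, (list, tuple, dict, set)):
            message += f" ({len(value)} elements)"
        self.events.append(message)
        logger.debug(message)
        super().__setitem__(name, value)


SCRIPT = """
my_list = [1, 2, 3, 4]
a = 3
c = a + sum(my_list)
c = 12
"""

namespace = LoggingNamespace()
exec(SCRIPT, {}, namespace)  # top-level names are stored via namespace.__setitem__
```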

Python object is being referenced by an object I cannot find

I am trying to remove an object from memory in Python, and I am coming across an object that is not being removed. From my understanding, if there are no references to the object, the garbage collector will deallocate the memory when it runs. However, after I have removed all of the references, if I run
bar = Foo()
print gc.get_referrers(bar)
del bar
baz = gc.collect()
print baz
I get a reply of
[<frame object at 0x7f1eba291e50>]
0
So why is the object not deleted?
I get the same output for every instance if I do
bar = [Foo() for i in range(0, 10)]
for x in range(0,len(bar)):
    baz = bar[x]
    del bar[x]
    print gc.get_referrers(baz)
How do I completely remove all referrers from an object, and does anyone have an idea what the frame object that shows up every time is?
I thought it might be an object frame(?) that contains a list of all objects in the program, but I have not been able to confirm that, or find a way to stop objects from being referenced by said mystical (to me) object frame.
Any help would be greatly appreciated.
Edit:
Okay, I rewrote the code into a simple form, pulling out everything except the basics:
import random, gc
class Object():
    def __init__(self):
        self.n=None
        self.p=None
        self.isAlive=True
    def setNext(self,object):
        self.n=object
    def setPrev(self, object):
        self.p=object
    def getNext(self):
        return self.n
    def getPrev(self):
        return self.p
    def simulate(self):
        if random.random() > .90:
            self.isAlive=False
    def remove(self):
        if self.p is not None and self.n is not None:
            self.n.setPrev(self.p)
            self.p.setNext(self.n)
        elif self.p is not None:
            self.p.setNext(None)
        elif self.n is not None:
            self.n.setPrev(None)
        del self
class Grid():
    def __init__(self):
        self.cells=[[Cell() for i in range(0,500)] for j in range(0,500)]
        for x in range(0,100):
            for y in range(0,100):
                for z in range(0,100):
                    self.cells[x][y].addObject(Object())
    def simulate(self):
        for x in range(0,500):
            for y in range(0,500):
                self.cells[x][y].simulate()
        num=gc.collect()
        print " " + str(num) +" deleted today."
class Cell():
    def __init__(self):
        self.objects = None
        self.objectsLast = None
    def addObject(self, object):
        if self.objects is None:
            self.objects = object
        else:
            self.objectsLast.setNext(object)
            object.setPrev(self.objectsLast)
        self.objectsLast = object
    def simulate(self):
        current = self.objects
        while current is not None:
            if current.isAlive:
                current.simulate()
                current = current.getNext()
            else:
                delete = current
                current = current.getNext()
                if delete.getPrev() is None:
                    self.objects = current
                elif delete.getNext() is None:
                    self.objectsLast = delete.getPrev()
                delete.remove()
def main():
    print "Building Map..."
    x = Grid()
    for y in range (1,101):
        print "Simulating day " + str(y) +"..."
        x.simulate()
if __name__ == "__main__":
    main()

gc.get_referrers takes the objects whose referrers it should find (here, a single one).
I cannot think of any circumstance in which gc.get_referrers would return no results, because in order to pass an object to gc.get_referrers, there has to be a reference to the object.
In other words, if there were no reference to the object, it would not be possible to pass it to gc.get_referrers.
At the very least, there will be a reference from globals() or from the current execution frame (which contains the local variables):
A code block is executed in an execution frame. An execution frame contains some administrative information (used for debugging), determines where and how execution continues after the code block's execution has completed, and (perhaps most importantly) defines two namespaces, the local and the global namespace, that affect execution of the code block.
See an extended version of the example from the question:
import gc
class Foo(object):
    pass
def f():
    bar = [Foo() for i in range(0, 10)]
    # iterate backwards so "del bar[x]" never shifts an index still to be visited
    for x in reversed(range(0, len(bar))):
        # at this point there is one reference to bar[x]: it is bar
        print len(gc.get_referrers(bar[x])) # prints 1
        baz = bar[x]
        # at this point there are two references to baz:
        # - bar references it, because it is in the list
        # - this "execution frame" references it, because it is in variable "baz"
        print len(gc.get_referrers(bar[x])) # prints 2
        del bar[x]
        # at this point, only the execution frame (variable baz) references the object
        print len(gc.get_referrers(baz)) # prints 1
        print gc.get_referrers(baz) # prints a frame object
        del baz
        # now there are no more references to it, but there is no way to call get_referrers
f()
How to test it properly?
There is a better trick to detect whether there are referrers or not: weakref.
The weakref module provides a way to create weak references to an object, which do not count. This means that even if there is a weak reference to an object, the object will still be deleted when there are no other references to it. Weak references also do not show up in gc.get_referrers.
So:
>>> x = Foo()
>>> weak_x = weakref.ref(x)
>>>
>>> gc.get_referrers(x) == [globals()] # only one reference from global variables
True
>>> x
<__main__.Foo object at 0x000000000272D2E8>
>>> weak_x
<weakref at 0000000002726D18; to 'Foo' at 000000000272D2E8>
>>> del x
>>> weak_x
<weakref at 0000000002726D18; dead>
The weak reference says that the object is dead, so it was indeed deleted.
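A Python 3 sketch of the same weakref check, extended to a reference cycle, which is what the question's doubly linked Object nodes form while they are linked together. Node is a hypothetical stand-in class, and the behaviour described in the comments is that of CPython, where reference counting frees acyclic objects immediately and only cycles need gc.collect().

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for the question's Foo/Object classes."""


def freed_without_cycle():
    obj = Node()
    ref = weakref.ref(obj)   # weak reference: does not keep obj alive
    del obj                  # last strong reference is gone
    return ref() is None     # CPython frees it right away via refcounting


def freed_with_cycle():
    was_enabled = gc.isenabled()
    gc.disable()             # keep automatic collection out of the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a           # a <-> b reference cycle
        ref = weakref.ref(a)
        del a, b
        alive_after_del = ref() is not None   # the cycle keeps both alive
        gc.collect()                          # the cycle detector reclaims them
        return alive_after_del, ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(freed_without_cycle())  # True
print(freed_with_cycle())     # (True, True)
```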

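Okay, thanks to cjhanks and user2357112 I came up with this answer.
The problem was that when you run the program, gc.collect() does not collect anything after each day, even though objects were deleted. (Objects whose last reference disappears are freed immediately by reference counting; gc.collect() only reports unreachable objects it finds in reference cycles, so there is nothing left for it to count.)
To test whether objects are really deleted, I instead run
print len(gc.get_objects())
each time I go through a "day". This shows how many objects Python is tracking.
Now, with that information and thanks to a comment, I tried changing Grid to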
class Grid():
    def __init__(self):
        self.cells=[[Cell() for i in range(0,500)] for j in range(0,500)]
        self.add(100)
    def add(self, num):
        for x in range(0, 100):
            for y in range(0, 100):
                for z in range(0, num):
                    self.cells[x][y].addObject(Object())
    def simulate(self):
        for x in range(0,500):
            for y in range(0,500):
                self.cells[x][y].simulate()
        num=gc.collect()
        print " " + str(num) +" deleted today."
        print len(gc.get_objects())
and then called Grid.add(50) halfway through the process. The memory allocated to the program did not increase (watching top in Bash). So my learning points:
The GC was running without my knowledge
Memory Python has allocated is not necessarily returned to the system until the program is done
Python will reuse that memory
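The gc.get_objects() measurement above can be narrowed to one class, which makes "were my objects really deleted?" easier to read than a total count. A small sketch, assuming a hypothetical Node class in place of the question's Object; gc.get_objects() returns the objects tracked by the cyclic collector, which in CPython includes instances of ordinary user-defined classes.

```python
import gc


class Node:
    """Hypothetical stand-in for the question's Object class."""


def count_instances(cls):
    # Walk every object the cyclic collector tracks and count instances of cls.
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))


nodes = [Node() for _ in range(1000)]
before = count_instances(Node)
del nodes[:500]                 # drop half of the references
after = count_instances(Node)   # those 500 were freed at once by refcounting
print(before, after)
```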

NameError: name 'getTempo' is not defined

I'm getting a NameError for the function getTempo and I don't know why... Thanks for the help.
Example:
L=[Musica("aerossol",4.9),Musica("lua",5.3),Musica("monte",3.2),Musica("rita",4.7)];getTempo("lua",L)
should give:
lua:5.3
5.3
class Musica:
    def __init__(self,nome,tempo):
        self.nome=nome
        self.tempo=tempo
    def __repr__(self):
        return self.nome+":"+str(self.tempo)
    def getTempo(nomeMusica,ListaMusicas):
        if ListaMusicas==[]:
            print ("Inexistente")
        else:
            meio=len(ListaMusicas)//2
            print (ListaMusicas[meio])
            A = [i[0] for i in ListaMusicas]
            B = [i[1] for i in ListaMusicas]
            if nomeMusica==A[meio]:
                print (B[meio])
            elif nomeMusica<A[meio]:
                return getTempo(nomeMusica,ListaMusicas[:meio])
            else:
                return getTempo(nomeMusica,ListaMusicas[(meio+1):])

In Python, unlike languages such as Java or C++, instance attributes and methods must be accessed through the instance, so you must write self.getTempo in order for getTempo to resolve.
EDIT - Selective Reading Failure
You also need to make sure that every method definition includes a parameter for the instance itself, which is passed as the first argument. By convention it is named self, but it can be any name you choose. Here is the modified function definition:
def getTempo(self, nomeMusica,ListaMusicas): # Changed
    if ListaMusicas==[]:
        print ("Inexistente")
    else:
        meio=len(ListaMusicas)//2
        print (ListaMusicas[meio])
        A = [i[0] for i in ListaMusicas]
        B = [i[1] for i in ListaMusicas]
        if nomeMusica==A[meio]:
            print (B[meio])
        elif nomeMusica<A[meio]:
            return self.getTempo(nomeMusica,ListaMusicas[:meio]) # Changed
        else:
            return self.getTempo(nomeMusica,ListaMusicas[(meio+1):]) # Changed
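Since the example call getTempo("lua",L) is made at module level, another reading is that getTempo was meant to be a plain function rather than a method. A sketch of that version, assuming (as the binary search requires) that the list is sorted by name. It also reads the name and duration through attributes, because Musica objects are not indexable, so i[0] and i[1] would raise a TypeError once the NameError is gone. Returning the tempo instead of printing it is a design choice of this sketch.

```python
class Musica:
    def __init__(self, nome, tempo):
        self.nome = nome
        self.tempo = tempo

    def __repr__(self):
        return self.nome + ":" + str(self.tempo)


def getTempo(nomeMusica, ListaMusicas):
    """Binary search by name; assumes ListaMusicas is sorted by nome."""
    if not ListaMusicas:
        print("Inexistente")
        return None
    meio = len(ListaMusicas) // 2
    atual = ListaMusicas[meio]
    if nomeMusica == atual.nome:          # attribute access instead of atual[0]
        print(atual)
        return atual.tempo
    elif nomeMusica < atual.nome:
        return getTempo(nomeMusica, ListaMusicas[:meio])
    else:
        return getTempo(nomeMusica, ListaMusicas[meio + 1:])


L = [Musica("aerossol", 4.9), Musica("lua", 5.3), Musica("monte", 3.2), Musica("rita", 4.7)]
print(getTempo("lua", L))  # prints lua:5.3 and then 5.3
```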

Python Adding Unique Class Objects to a List

I'm having problems when adding objects to a list. When I append objects to the end of a list and then try to loop through it, every position in the list just gives me back the most recently added object.
The script compares info from different projects in Excel spreadsheets. I'm using Python for Windows and win32com.client to access the spreadsheets I'm interested in. I read about a few others on Stack Overflow having problems adding unique objects to a list, but I'm pretty sure I haven't made the same mistakes they did (initializing the list inside the loop, or not passing input attributes when creating a class object).
I can comment out the object creation in the loop and simply add numbers to the list, and then all three unique values print out; but as soon as I put the object creation call back in, things go wrong. The code below just prints the most recently added project three times. Any help would be greatly appreciated, thanks!
class Project:
    """
    Creates an instance for each project
    in the spreadsheet
    """
    def __init__(self, bldg, zone, p_num, p_name, p_mgr,
                 const_mgr, ehs_lias, ehs_const, status,
                 p_type, start, finish):
        self.bldg = bldg
        self.zone = zone
        self.p_num = p_num
        self.p_name = p_name
        self.p_mgr = p_mgr
        self.const_mgr = const_mgr
        self.ehs_lias = ehs_lias
        self.ehs_const = ehs_const
        self.status = status
        self.p_type = p_type
        self.start = start
        self.finish = finish
    def quickPrint(self):
        """ prints quick glance projects details """
        if p_name is None:
            pass
        else:
            print 'Building ' + str(bldg.Value)
            print str(p_name.Value)
            print str(p_type.Value) + " -- " + str(p_mgr.Value)
            print str(start.Value) + " - " + str(finish.Value)
projects = []
for i in range(25, 28):
    bldg = excel.Cells(i,1)
    zone = excel.Cells(i,2)
    p_num = excel.Cells(i,3)
    p_name = excel.Cells(i,4)
    p_mgr = excel.Cells(i,5)
    const_mgr = excel.Cells(i,6)
    ehs_lias = excel.Cells(i,7)
    ehs_const = excel.Cells(i,8)
    status = excel.Cells(i,9)
    p_type = excel.Cells(i,10)
    start = excel.Cells(i,11)
    finish = excel.Cells(i,12)
    projects.append(Project(bldg, zone, p_num, p_name, p_mgr,
                            const_mgr, ehs_lias, ehs_const,
                            status, p_type, start, finish))
projects[0].quickPrint()
projects[1].quickPrint()
projects[2].quickPrint()

I think you have defined quickPrint incorrectly. Inside the method, p_name, p_type, p_mgr, etc. are not local variables, so Python looks them up in the enclosing (global) scope and eventually finds them where you last assigned them in the for loop, which is why you get the last row's values.
Because you have used the same variable names in your loop, you are hiding this issue and making it more confusing. Read the values from the instance through self instead:
def quickPrint(self):
    """ prints quick glance projects details """
    if self.p_name is None:
        pass
    else:
        print 'Building ' + str(self.bldg.Value)
        print str(self.p_name.Value)
        print str(self.p_type.Value) + " -- " + str(self.p_mgr.Value)
        print str(self.start.Value) + " - " + str(self.finish.Value)
Example:
class Project(object):
    def __init__(self, argument):
        self.argument = argument
    def __repr__(self):
        return str(argument)
projects = []
for i in range(10):
    argument = i
    projects.append(Project(argument))
print projects
This outputs [9, 9, 9, 9, 9, 9, 9, 9, 9, 9].
Changing the __repr__(self) definition to use self.argument fixes it.
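A Python 3 sketch of the example above with the buggy and fixed classes side by side, so the difference is visible in one run. BrokenProject is a hypothetical name for the version with the bug:

```python
class BrokenProject:
    def __init__(self, argument):
        self.argument = argument

    def __repr__(self):
        # Bug: `argument` is not local here, so it is looked up in the
        # global scope and every instance shows the loop's last value.
        return str(argument)


class Project:
    def __init__(self, argument):
        self.argument = argument

    def __repr__(self):
        return str(self.argument)  # the value stored on this instance


broken = []
fixed = []
for i in range(3):
    argument = i  # module-level name that BrokenProject.__repr__ picks up
    broken.append(BrokenProject(argument))
    fixed.append(Project(argument))

print(broken)  # [2, 2, 2]
print(fixed)   # [0, 1, 2]
```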

How can I refer to a function not by name in its definition in python?

I am maintaining a little library of useful functions for interacting with my company's APIs and I have come across (what I think is) a neat question that I can't find the answer to.
I frequently have to request large amounts of data from an API, so I do something like:
class Client(object):
    def __init__(self):
        self.data = []
    def get_data(self, offset = 0):
        done = False
        while not done:
            data = get_more_starting_at(offset)
            self.data.extend(data)
            offset += 1
            if not data:
                done = True
This works fine and allows me to restart the retrieval where I left off if something goes horribly wrong. However, since Python functions are just regular objects, we can do stuff like:
def yo():
    yo.hi = "yo!"
    return None
and then we can interrogate yo about its properties later, like:
yo.hi => "yo!"
My question is: can I rewrite my class-based example to pin the data to the function itself, without referring to the function by name? I know I can do this by:
def get_data(offset=0):
    done = False
    get_data.data = []
    while not done:
        data = get_more_starting_from(offset)
        get_data.data.extend(data)
        offset += 1
        if not data:
            done = True
    return get_data.data
but I would like to do something like:
def get_data(offset=0):
    done = False
    self.data = [] # <===== this is the bit I can't figure out
    while not done:
        data = get_more_starting_from(offset)
        self.data.extend(data) # <====== also this!
        offset += 1
        if not data:
            done = True
    return self.data # <======== want to refer to the "current" object
Is it possible to refer to the "current" object by anything other than its name?
Something like "this", "self", or "memememe!" is what I'm looking for.

I don't understand why you want to do this, but it's what a fixed point combinator allows you to do:
import functools
def Y(f):
    @functools.wraps(f)
    def Yf(*args):
        return inner(*args)
    inner = f(Yf)
    return Yf
@Y
def get_data(f):
    def inner_get_data(*args):
        # This is your real get data function
        # define it as normal
        # but just refer to it as 'f' inside itself
        print 'setting get_data.foo to', args
        f.foo = args
    return inner_get_data
get_data(1, 2, 3)
print get_data.foo
So you call get_data as normal, and it "magically" knows that f means itself.

You could do this, but (a) the data is not per function invocation, but per function, and (b) it's much easier to achieve this sort of thing with a class.
If you had to do it, you might do something like this:
def ybother(a,b,c,yrselflambda = lambda: ybother):
    yrself = yrselflambda()
    #other stuff
The lambda is necessary, because you need to delay evaluation of the term ybother until something has been bound to it.
Alternatively, and increasingly pointlessly:
from functools import partial
def ybother(a,b,c,yrself=None):
    #whatever
    yrself.data = [] # this will blow up if the default argument is used
    #more stuff
bothered = partial(ybother, yrself=ybother)
Or:
def unbothered(a,b,c):
    def inbothered(yrself):
        #whatever
        yrself.data = []
    return inbothered, inbothered(inbothered)
This last version gives you a different function object each time, which you might like.
There are almost certainly introspective tricks to do this, but they are even less worthwhile.
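Point (b) above can be sketched with a callable class: instances can be called like a function, and inside __call__ the name self is exactly the "current object" the question asks for. DataFetcher and fetch_page are hypothetical names, with fetch_page standing in for get_more_starting_from. Unlike the question's function version, data is not reset on each call, so a retry after a failure keeps what was already fetched, in the spirit of the original Client class.

```python
class DataFetcher:
    """Callable object; `self` inside __call__ refers to the fetcher itself."""

    def __init__(self, fetch_page):
        self.fetch_page = fetch_page  # hypothetical stand-in for get_more_starting_from
        self.data = []

    def __call__(self, offset=0):
        while True:
            page = self.fetch_page(offset)
            if not page:          # an empty page means there is nothing more to fetch
                break
            self.data.extend(page)
            offset += 1
        return self.data


pages = {0: [1, 2], 1: [3], 2: []}
get_data = DataFetcher(lambda offset: pages.get(offset, []))
result = get_data()
print(result)         # [1, 2, 3]
print(get_data.data)  # [1, 2, 3]
```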

Not sure what doing it like this gains you, but what about using a decorator?
import functools
def add_self(f):
    @functools.wraps(f)
    def wrapper(*args,**kwargs):
        if not getattr(f, 'content', None):
            f.content = []
        return f(f, *args, **kwargs)
    return wrapper
@add_self
def example(self, arg1):
    self.content.append(arg1)
    print self.content
example(1)
example(2)
example(3)
OUTPUT
[1]
[1, 2]
[1, 2, 3]
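A Python 3 variant of the decorator above, offered as a sketch: it passes wrapper (the object that the name example actually refers to after decoration) instead of the undecorated f, and creates the list up front. In the version above, content ends up on the inner function, so example.content from outside would raise AttributeError; here the state can be inspected like yo.hi in the question.

```python
import functools


def add_self(f):
    """Call f with the decorated function object as its first argument."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(wrapper, *args, **kwargs)  # `wrapper` is what callers see as `example`
    wrapper.content = []                    # state lives on the public object
    return wrapper


@add_self
def example(self, arg1):
    self.content.append(arg1)
    return self.content


example(1)
example(2)
print(example.content)  # [1, 2]
```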
