How to use python-magic 5.19-1 - python

I need to determine the MIME types of files without a suffix in Python 3, and I thought python-magic would be a fitting solution.
Unfortunately it does not work as described here:
https://github.com/ahupp/python-magic/blob/master/README.md
What happens is this:
>>> import magic
>>> magic.from_file("testdata/test.pdf")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'from_file'
So I had a look at the object, which provides me with the class Magic for which I found documentation here:
http://filemagic.readthedocs.org/en/latest/guide.html
I was surprised that this did not work either:
>>> with magic.Magic() as m:
...     pass
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'ms'
>>> m = magic.Magic()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'ms'
>>>
I could not find any information about how to use the Magic class anywhere, so I resorted to trial and error until I figured out that ms only accepts instances of LP_magic_set.
Some of these are returned by the module's functions
magic.magic_set() and magic_t().
So I tried to instantiate Magic with either of them.
When I then call the file() method on the instance, it always returns an empty result, and the errlvl() method reports error no. 22.
So how do I use magic anyway?

I think that you are confusing different implementations of "python-magic".
You appear to have installed python-magic-5.19.1; however, you reference first the documentation for python-magic-0.4.6 and then the documentation for filemagic-1.6. I think you are better off using python-magic-0.4.6, as it is readily available on PyPI and easily installed via pip into virtualenv environments.
Documentation for python-magic-5.19.1 is hard to come by, but I managed to get it to work like this:
>>> import magic
>>> m=magic.open(magic.MAGIC_NONE)
>>> m.load()
0
>>> m.file('/etc/passwd')
'ASCII text'
>>> m.file('/usr/share/cups/data/default.pdf')
'PDF document, version 1.5'
You can also get different magic descriptions, e.g. MIME type:
>>> m=magic.open(magic.MAGIC_MIME)
>>> m.load()
0
>>> m.file('/etc/passwd')
'text/plain; charset=us-ascii'
>>> m.file('/usr/share/cups/data/default.pdf')
'application/pdf; charset=binary'
Or, for more recent versions such as python-magic-5.30:
>>> import magic
>>> magic.detect_from_filename('/etc/passwd')
FileMagic(mime_type='text/plain', encoding='us-ascii', name='ASCII text')
>>> magic.detect_from_filename('/etc/passwd').mime_type
'text/plain'
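Since the answer above covers three differently shaped modules that are all imported as magic, a small dispatch helper may be useful. The following is only a sketch: the name guess_mime is hypothetical, and it relies solely on the calls shown in this thread (from_file from the PyPI package's README, detect_from_filename, and open/load/file). The module is passed in as an argument so the helper itself does not depend on which package is installed.

```python
def guess_mime(path, magic_module):
    """Return a bare MIME type for `path` using whichever API the module offers.

    `guess_mime` is a hypothetical helper, not part of any magic package.
    """
    # PyPI python-magic (ahupp): magic.from_file(path, mime=True)
    if hasattr(magic_module, "from_file"):
        return magic_module.from_file(path, mime=True)
    # Newer bindings shipped with file(1): detect_from_filename(path).mime_type
    if hasattr(magic_module, "detect_from_filename"):
        return magic_module.detect_from_filename(path).mime_type
    # Older bindings shipped with file(1): open(), load(), file()
    if hasattr(magic_module, "open"):
        cookie = magic_module.open(magic_module.MAGIC_MIME)
        cookie.load()
        # MAGIC_MIME yields e.g. 'text/plain; charset=us-ascii'; keep the type only
        return cookie.file(path).split(";")[0].strip()
    raise RuntimeError("unrecognised magic module")
```

After `import magic`, it would be called as `guess_mime('/etc/passwd', magic)`.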

Related

Function not being defined even though ive saved it

I have a fully working program that I wish to run. It executes on my friend's laptop but not on mine (I've saved it to my Documents folder). The following is the program:
def DigitCount(n):
    #how many decimal digits in integer 'n'
    if n<0:
        n=-n
    digitCount=1
    powerOfTen=10
    while powerOfTen<=n:
        digitCount+=1
        powerOfTen*=10
    return digitCount
But I keep getting the following error:
>>> DigitCount(100)
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
DigitCount(100)
NameError: name 'DigitCount' is not defined
Wait, are you saying you do the following from the command line?
$ python DigitCount.py
$ python
>>> DigitCount(100)
That won't work. You have to do this:
$ python
>>> import DigitCount
>>> DigitCount.DigitCount(100)
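To make the point concrete, here is a self-contained sketch of what the import step does. It writes the function to a temporary DigitCount.py, makes that folder importable, and only then calls the function through the module. The helper name load_and_call and the use of a temporary folder are assumptions for illustration; in practice the file would simply sit in the directory Python is started from.

```python
import importlib
import os
import sys
import tempfile

# The asker's function, as it would be saved in DigitCount.py.
SOURCE = """\
def DigitCount(n):
    # how many decimal digits in integer 'n'
    if n < 0:
        n = -n
    digitCount = 1
    powerOfTen = 10
    while powerOfTen <= n:
        digitCount += 1
        powerOfTen *= 10
    return digitCount
"""


def load_and_call(n):
    """Save DigitCount.py in a scratch folder, import it, and call the function."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "DigitCount.py"), "w") as handle:
            handle.write(SOURCE)
        sys.path.insert(0, folder)  # saving a file is not enough; Python must find it
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("DigitCount")
            # The function lives inside the module, hence module.DigitCount
            return module.DigitCount(n)
        finally:
            sys.path.remove(folder)
            sys.modules.pop("DigitCount", None)


print(load_and_call(100))
```

Writing `from DigitCount import DigitCount` instead would bind the function name directly, so that `DigitCount(100)` works without the module prefix.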

Splitting Lines in Python in a List?

I'm new-ish to Python and I'm having trouble achieving a result that I want. I'm opening a text file called urldata.txt which contains URLs that I need to break down by scheme, server, and path.
I have retrieved the data from the file:
urls = open("urldata.txt").read()
print(urls)
this returns:
http://www.google.com
https://twitter.com/search?q=%23ASUcis355
https://github.com/asu-cis-355/course-info
I want to break these URLs into 3 pieces each so that when I enter
urls.scheme()
urls.server()
urls.path()
It will return me the scheme of each URL when I enter
urls.scheme()
'http','https','https'
Then it will return the server when I enter
urls.server()
'google.com'
'twitter.com'
'github.com'
Finally, it will return the path when I enter
urls.path()
'/'
'/search?q=%23ASUcis355'
'/asu-cis-355/course-info'
I have defined a class to do this; however, I receive the error "scheme() missing 1 required positional argument: 'self'". Below is the class I have created, with its methods.
class urls:
    def __init__(self,url):
        self.urls=urls
    def scheme(self):
        return urls.split("://")[0]
    def server(self):
        return urls.split("/")[2]
    def path(self):
        return urls.split(".com/")[1]
Any help at all is greatly appreciated!
This exists already. It's called urlparse:
from urllib.parse import urlparse
d = urlparse('https://twitter.com/search?q=%23ASUcis355')
print(d)
Output:
ParseResult(scheme='https', netloc='twitter.com', path='/search', params='', query='q=%23ASUcis355', fragment='')
If you call a method on the class itself (which is what urls is) without first creating an instance of that class, then in Python 3 you get this error:
>>> urls.scheme()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: scheme() missing 1 required positional argument: 'self'
>>>
But if you create an instance of urls and call the method on that instance, the original error goes away and a different one surfaces:
>>> url_instance = urls("http://www.google.com")
>>> url_instance.scheme()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in scheme
AttributeError: type object 'urls' has no attribute 'split'
Note that this fixes your current error, but your code isn't correct as is. I'll leave you to figure out what's happening with this error.
The difference between a class definition (or type) and an instance of the class has some interesting nuances, but generally speaking
class Thing:
    pass
is a class definition and
thing_instance = Thing()
is an instance of the class.
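For reference, one way the two answers could be combined is sketched below: a class that is instantiated once with the file's contents, stores the lines on self, and delegates the parsing to urlparse. The class name Urls and its list-returning methods are assumptions rather than the asker's final design. Note also that urlparse reports the host exactly as written (www.google.com) and puts the query string in a separate field instead of in the path.

```python
from urllib.parse import urlparse


class Urls:
    """Hypothetical wrapper around a block of text holding one URL per line."""

    def __init__(self, text):
        # Store the data on the instance (self), not on the class name
        self.urls = [line.strip() for line in text.splitlines() if line.strip()]

    def scheme(self):
        return [urlparse(u).scheme for u in self.urls]

    def server(self):
        return [urlparse(u).netloc for u in self.urls]

    def path(self):
        # urlparse gives '' for a bare host, so fall back to '/'
        return [urlparse(u).path or "/" for u in self.urls]


text = """http://www.google.com
https://twitter.com/search?q=%23ASUcis355
https://github.com/asu-cis-355/course-info
"""
parsed = Urls(text)  # create an instance first, then call methods on it
print(parsed.scheme())
print(parsed.server())
print(parsed.path())
```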

python BaseHTTPRequestHandler instance error

I am trying to create an instance of BaseHTTPRequestHandler, but I get an error message.
Here is what I did:
>>> from BaseHTTPServer import BaseHTTPRequestHandler
>>> obj=BaseHTTPRequestHandler()
>>> obj.send_response(200)
I got:
>>> obj=BaseHTTPRequestHandler()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() takes exactly 4 arguments (1 given)
Could you please give me some advice?
The signature of BaseHTTPRequestHandler is as follows:
class BaseHTTPServer.BaseHTTPRequestHandler(request, client_address, server)
The problem is that you're not providing the values it needs.
You can check https://docs.python.org/2/library/basehttpserver.html#module-BaseHTTPServer for detailed information on exactly what it requires.
As @iCodez commented, self counts as one of the four needed arguments but is passed implicitly, so you just need to pass the other three.
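In practice those three arguments are rarely passed by hand: the handler class is given to a server, and the server creates one handler instance per incoming request. The sketch below illustrates this under the assumption of Python 3, where the module is called http.server rather than BaseHTTPServer. HelloHandler and fetch_once are hypothetical names, and the server binds to an ephemeral local port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the server supplies request, client_address, server."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a server on a free local port, issue one GET, and shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # pass the class, not an instance
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


print(fetch_once())
```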

Python hasattr vs getattr

I have lately been reading some tweets and the Python documentation about hasattr, which says:
hasattr(object, name)
The arguments are an object and a string. The result is True if the string is the name of one of the object's attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.)
There is a motto in Python that it is "easier to ask for forgiveness than permission", with which I usually agree.
I tried to do a performance test in this case with a really simple python code:
import timeit
definition="""\
class A(object):
    a = 1
a = A()
"""
stm="""\
hasattr(a, 'a')
"""
print timeit.timeit(stmt=stm, setup=definition, number=10000000)
stm="""\
getattr(a, 'a')
"""
print timeit.timeit(stmt=stm, setup=definition, number=10000000)
With the results:
$ python test.py
hasattr(a, 'a')
1.26515984535
getattr(a, 'a')
1.32518696785
I've also tried what happens if the attribute doesn't exist, and the differences between getattr and hasattr are bigger. So what I've seen so far is that getattr is slower than hasattr, but the documentation says that hasattr calls getattr.
I've searched the CPython implementation of hasattr and getattr, and it seems that both call the following function:
v = PyObject_GetAttr(v, name);
but there is more boilerplate in getattr than in hasattr, which probably makes it slower.
Does anyone know why the documentation says that hasattr calls getattr, and why we seem to encourage users to use getattr instead of hasattr when it really isn't for performance reasons? Is it just because it is more Pythonic?
Maybe I am doing something wrong in my test :)
Thanks,
Raúl
The documentation does not encourage anything; it just states the obvious. hasattr is implemented this way, and raising an AttributeError from a property getter can make it look like the attribute does not exist. This is an important detail, which is why it is explicitly stated in the documentation. Consider for example this code:
class Spam(object):
    sausages = False
    @property
    def eggs(self):
        if self.sausages:
            return 42
        raise AttributeError("No eggs without sausages")
    @property
    def invalid(self):
        return self.foobar
spam = Spam()
print(hasattr(Spam, 'eggs'))
print(hasattr(spam, 'eggs'))
spam.sausages = True
print(hasattr(spam, 'eggs'))
print(hasattr(spam, 'invalid'))
The result is
True
False
True
False
That is, the Spam class has a property descriptor for eggs, but since the getter raises AttributeError if not self.sausages, an instance of that class does not "hasattr" eggs.
Other than that, use hasattr only when you don't need the value; if you need the value, use getattr with 2 arguments and catch the exception, or 3 arguments, the third being a sensible default value.
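The two getattr patterns recommended above might look like the following sketch. The helper names eggs_or_default and eggs_or_message are hypothetical, and Spam is reduced to the single property needed here.

```python
class Spam(object):
    sausages = False

    @property
    def eggs(self):
        if self.sausages:
            return 42
        raise AttributeError("No eggs without sausages")


def eggs_or_default(obj, default=None):
    # Three-argument getattr: an AttributeError is replaced by the default
    return getattr(obj, "eggs", default)


def eggs_or_message(obj):
    # Two-argument getattr: catch the exception yourself
    try:
        return getattr(obj, "eggs")
    except AttributeError as exc:
        return "missing: %s" % exc


spam = Spam()
print(eggs_or_default(spam))
print(eggs_or_message(spam))
spam.sausages = True
print(eggs_or_default(spam))
```

Either form fetches the value in a single lookup, which avoids calling the property getter twice as a hasattr check followed by an attribute access would.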
The results using getattr() (2.7.9):
>>> spam = Spam()
>>> print(getattr(Spam, 'eggs'))
<property object at 0x01E2A570>
>>> print(getattr(spam, 'eggs'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in eggs
AttributeError: No eggs without sausages
>>> spam.sausages = True
>>> print(getattr(spam, 'eggs'))
42
>>> print(getattr(spam, 'invalid'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 10, in invalid
AttributeError: 'Spam' object has no attribute 'invalid'
>>>
It seems that hasattr has a problem with swallowing exceptions (at least in Python 2.7), so it is probably better to stay away from it until that is fixed.
Take, for instance, the following code:
>>> class Foo(object):
...     @property
...     def my_attr(self):
...         raise ValueError('nope, nope, nope')
...
>>> bar = Foo()
>>> bar.my_attr
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in my_attr
ValueError: nope, nope, nope
>>> hasattr(Foo, 'my_attr')
True
>>> hasattr(bar, 'my_attr')
False
>>> getattr(bar, 'my_attr', None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in my_attr
ValueError: nope, nope, nope
>>>
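As far as I am aware, Python 3.2 and later narrowed hasattr so that it catches only AttributeError, which means the transcript above is specific to Python 2. A minimal sketch to check this on Python 3 follows; the helper probe is a hypothetical name.

```python
class Foo(object):
    @property
    def my_attr(self):
        raise ValueError('nope, nope, nope')


def probe(obj, name):
    """Report what hasattr does, including any exception it lets through."""
    try:
        return hasattr(obj, name)
    except ValueError as exc:
        return "propagated: %s" % exc


bar = Foo()
print(probe(Foo, 'my_attr'))   # the class has the property descriptor
print(probe(bar, 'my_attr'))   # on Python 3 the ValueError is not swallowed
print(probe(bar, 'missing'))   # a genuinely absent attribute
```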

How do I disable and then re-enable a warning?

I'm writing some unit tests for a Python library and would like certain warnings to be raised as exceptions, which I can easily do with the simplefilter function. However, for one test I'd like to disable the warning, run the test, then re-enable the warning.
I'm using Python 2.6, so I'm supposed to be able to do that with the catch_warnings context manager, but it doesn't seem to work for me. Even failing that, I should also be able to call resetwarnings and then re-set my filter.
Here's a simple example which illustrates the problem:
>>> import warnings
>>> warnings.simplefilter("error", UserWarning)
>>>
>>> def f():
...     warnings.warn("Boo!", UserWarning)
...
>>>
>>> f() # raises UserWarning as an exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
UserWarning: Boo!
>>>
>>> f() # still raises the exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
UserWarning: Boo!
>>>
>>> with warnings.catch_warnings():
...     warnings.simplefilter("ignore")
...     f() # no warning is raised or printed
...
>>>
>>> f() # this should raise the warning as an exception, but doesn't
>>>
>>> warnings.resetwarnings()
>>> warnings.simplefilter("error", UserWarning)
>>>
>>> f() # even after resetting, I'm still getting nothing
>>>
Can someone explain how I can accomplish this?
EDIT: Apparently this is a known bug: http://bugs.python.org/issue4180
After reading through the docs a few times and poking around the source and shell, I think I've figured it out. The docs could probably be improved to make the behavior clearer.
The warnings module keeps a registry at __warningregistry__ to keep track of which warnings have been shown. If a warning (message) is not listed in the registry before the 'error' filter is set, any calls to warn() will not result in the message being added to the registry. Also, the warning registry does not appear to be created until the first call to warn:
>>> import warnings
>>> __warningregistry__
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
NameError: name '__warningregistry__' is not defined
>>> warnings.simplefilter('error')
>>> __warningregistry__
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
NameError: name '__warningregistry__' is not defined
>>> warnings.warn('asdf')
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
UserWarning: asdf
>>> __warningregistry__
{}
Now if we ignore warnings, they will get added to the warnings registry:
>>> warnings.simplefilter("ignore")
>>> warnings.warn('asdf')
>>> __warningregistry__
{('asdf', <type 'exceptions.UserWarning'>, 1): True}
>>> warnings.simplefilter("error")
>>> warnings.warn('asdf')
>>> warnings.warn('qwerty')
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
UserWarning: qwerty
So the error filter will only apply to warnings that aren't already in the warnings registry. To make your code work, you'll need to clear the appropriate entries out of the warnings registry when you're done with the context manager (or, in general, any time after you've used the ignore filter and want a previously used message to be picked up by the error filter). Seems a bit unintuitive...
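The behavior described above was observed on Python 2.6. To my knowledge, the Python 3.4 series and later discard the registry whenever the filters change, so the sequence from the question should raise again there. A small sketch to check this on whichever interpreter is in use is shown below; raises_after_ignore is a hypothetical helper, and the outer catch_warnings only keeps the demo from altering global filter state.

```python
import warnings


def f():
    warnings.warn("Boo!", UserWarning)


def raises_after_ignore():
    """Return True if f() raises again after having been ignored once."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", UserWarning)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            f()  # silently ignored inside this block
        try:
            f()  # the "error" filter is active again here
        except UserWarning:
            return True
        return False


print(raises_after_ignore())
```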
Brian Luft is correct about __warningregistry__ being the cause of the problem. But I wanted to clarify one thing: the way the warnings module appears to work is that it sets module.__warningregistry__ for each module where warn() is called. Complicating things even more, the stacklevel option to warnings causes the attribute to be set for the module the warning was issued "in the name of", not necessarily the one where warn() was called... and that's dependent on the call stack at the time the warning was issued.
This means you may have a lot of different modules where the __warningregistry__ attribute is present, and depending on your application, they may all need clearing before you'll see the warnings again. I've been relying on the following snippet of code to accomplish this... it clears the warnings registry for all modules whose name matches the regexp (which defaults to everything):
def reset_warning_registry(pattern=".*"):
    "clear warning registry for all matching modules"
    import re
    import sys
    key = "__warningregistry__"
    for mod in sys.modules.values():
        if hasattr(mod, key) and re.match(pattern, mod.__name__):
            getattr(mod, key).clear()
Update: CPython issue 21724 addresses the problem that resetwarnings() doesn't clear warning state. I attached an expanded "context manager" version to that issue; it can be downloaded from reset_warning_registry.py.
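As a usage illustration, the sketch below applies the same idea to a stand-in module so that the effect is visible without depending on which real modules have issued warnings. It is a variant rather than the snippet above verbatim: it iterates over a copy of sys.modules and returns the names it cleared, and the module name demo_warning_mod and the hand-filled registry entry are assumptions made for the demo.

```python
import re
import sys
import types


def reset_warning_registry(pattern=".*"):
    """Clear __warningregistry__ in every loaded module whose name matches."""
    cleared = []
    for mod in list(sys.modules.values()):  # copy: sys.modules may change meanwhile
        registry = getattr(mod, "__warningregistry__", None)
        name = getattr(mod, "__name__", None)
        if isinstance(registry, dict) and isinstance(name, str) and re.match(pattern, name):
            registry.clear()
            cleared.append(name)
    return cleared


# Stand-in module with a registry entry shaped like the ones shown earlier
demo = types.ModuleType("demo_warning_mod")
demo.__warningregistry__ = {("asdf", UserWarning, 1): True}
sys.modules[demo.__name__] = demo
try:
    print(reset_warning_registry(r"demo_warning_mod$"))
    print(demo.__warningregistry__)
finally:
    del sys.modules[demo.__name__]
```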
Brian is spot on about the __warningregistry__. So you need to extend catch_warnings to save/restore the global __warningregistry__ too.
Something like this may work:
class catch_warnings_plus(warnings.catch_warnings):
    def __enter__(self):
        log = super(catch_warnings_plus,self).__enter__()
        # snapshot this module's registry; it may not exist yet
        self._warningregistry=dict(globals().get('__warningregistry__',{}))
        return log
    def __exit__(self, *exc_info):
        super(catch_warnings_plus,self).__exit__(*exc_info)
        # restore the registry to the state it had on entry
        registry=globals().setdefault('__warningregistry__',{})
        registry.clear()
        registry.update(self._warningregistry)
Following on from Eli Collins' helpful clarification, here is a modified version of the catch_warnings context manager that clears the warnings registry in a given sequence of modules when entering the context manager, and restores the registry on exit:
from warnings import catch_warnings
class catch_warn_reset(catch_warnings):
    """ Version of ``catch_warnings`` class that resets warning registry
    """
    def __init__(self, *args, **kwargs):
        self.modules = kwargs.pop('modules', [])
        self._warnreg_copies = {}
        super(catch_warn_reset, self).__init__(*args, **kwargs)
    def __enter__(self):
        for mod in self.modules:
            if hasattr(mod, '__warningregistry__'):
                mod_reg = mod.__warningregistry__
                self._warnreg_copies[mod] = mod_reg.copy()
                mod_reg.clear()
        return super(catch_warn_reset, self).__enter__()
    def __exit__(self, *exc_info):
        super(catch_warn_reset, self).__exit__(*exc_info)
        for mod in self.modules:
            if hasattr(mod, '__warningregistry__'):
                mod.__warningregistry__.clear()
            if mod in self._warnreg_copies:
                mod.__warningregistry__.update(self._warnreg_copies[mod])
Use with something like:
import my_module_raising_warnings
with catch_warn_reset(modules=[my_module_raising_warnings]):
    # Whatever you'd normally do inside ``catch_warnings``
I've run into the same issues, and while all of the other answers are valid, I chose a different route. I don't want to test the warnings module, nor know about its inner workings. So I just mocked it instead:
import warnings
import unittest
from unittest.mock import patch
from unittest.mock import call
class WarningTest(unittest.TestCase):
    @patch('warnings.warn')
    def test_warnings(self, fake_warn):
        warn_once()
        warn_twice()
        fake_warn.assert_has_calls(
            [call("You've been warned."),
             call("This is your second warning.")])
def warn_once():
    warnings.warn("You've been warned.")
def warn_twice():
    warnings.warn("This is your second warning.")
if __name__ == '__main__':
    unittest.main()
This code is for Python 3; for 2.6 you need to use an external mocking library (such as the mock package), as unittest.mock was only added in Python 3.3.
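If mocking is not an option, another route that stays within the standard library is to record the warnings instead of letting them be printed or raised. This is a different technique from the patch-based test above, shown only as a sketch; collect_warnings is a hypothetical helper name.

```python
import warnings


def warn_once():
    warnings.warn("You've been warned.")


def collect_warnings(func):
    """Call func and return (category, message) for each warning it issued."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        func()
    return [(w.category, str(w.message)) for w in caught]


print(collect_warnings(warn_once))
```

Because the "always" filter is installed inside the context manager, repeated calls keep reporting the warning and the global filter list is restored on exit.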
