Python urllib won't download file due to permissions, but wget will - python

I'm trying to download an MP3 file, via its URL, using Python's urllib2.
mp3file = urllib2.urlopen(url)
output = open(dst,'wb')
output.write(mp3file.read())
output.close()
I'm getting a urllib2.HTTPError: HTTP Error 403: Forbidden error.
Trying urllib also fails, but silently.
urllib.urlretrieve(url, dst)
However, if I use wget, I can download the file successfully.
I've noted the general differences between the two methods mentioned in "Difference between Python urllib.urlretrieve() and wget", but they don't seem to apply here.
Is wget doing something to negotiate permissions that urllib2 doesn't do? If so, what, and how do I replicate this in urllib2?

This could be something on the server side, for example the server blocking Python's default user agent. Try sending wget's user agent instead: Wget/1.13.4 (linux-gnu).
In Python 2:
import urllib
# Change header for User-Agent
class AppURLopener(urllib.FancyURLopener):
    version = "Wget/1.13.4 (linux-gnu)"
url = "http://www.example.com/test_file"
fname = "test_file"
urllib._urlopener = AppURLopener()
urllib.urlretrieve(url, fname)

The above didn't work for me (I'm using python3.5). wget works fine.
It's not (I assume) a huge problem for me - surely I can still do a system() and use wget to get the data, with some file renaming and munging.
But in case anyone else is suffering from the same problem, these are the errors I get from the above snippet:
Traceback (most recent call last):
  File "./mksynt.py", line 10, in <module>
    class AppURLopener(urllib.FancyURLopener):
AttributeError: module 'urllib' has no attribute 'FancyURLopener'
I see that the original answer was only promised to work in python2.
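For Python 3, where urllib.FancyURLopener does not exist (as the traceback shows), one way to send the same User-Agent is through urllib.request.Request. This is a minimal sketch and has not been tested against the server from the question; build_request and download are hypothetical helper names, and the URL and filename are the placeholders from the Python 2 answer.

```python
import shutil
import urllib.request

WGET_USER_AGENT = "Wget/1.13.4 (linux-gnu)"  # user agent string suggested in the answer


def build_request(url, user_agent=WGET_USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dst, user_agent=WGET_USER_AGENT):
    """Stream the response body for url into the file dst."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(dst, "wb") as output:
        shutil.copyfileobj(response, output)


if __name__ == "__main__":
    # Placeholder URL and filename, as in the answer above.
    download("http://www.example.com/test_file", "test_file")
```

If the server still answers 403 with this header, the block is probably based on something other than the user agent (cookies, referrer, or IP), which this sketch does not address.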

Related

python remoteconfig unable to parse file from Gitlab

I am trying to get remoteconfig working, following this guide:
https://pypi.org/project/remoteconfig/
As a control, I have this code that works:
from localconfig import config
config.read('./config.ini')
for section in config:
    print(section)
When I put the same config file in a remote Gitlab, this code does not work:
from remoteconfig import config
config.read('https://myorg.org/path/repo/~/blob/app/config.ini')
for section in config:
    print(section)
What could I be doing wrong here? The error message I am getting is:
configParser.MissingSectionHeaderError: File contains no section headers
So it seems to be reaching the file path (network connectivity is OK) but rejecting the contents of the file, or possibly the file format. The exact same file works with localconfig.
For now I am going to use the 'gitlab' pip module and simply consume the API for the file (with private_token):
f = project.files.get(file_path='path/file', ref='master')
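One plausible cause, which the post does not confirm: a GitLab blob URL may return an HTML page (the file viewer or a login page) rather than the raw file, and an INI parser fed HTML fails with exactly this error. The sketch below uses only the standard-library configparser, not remoteconfig, and the INI_TEXT and HTML_TEXT samples and the try_parse helper are made up for illustration.

```python
import configparser

# Made-up samples: a valid INI file and the kind of HTML a web page returns.
INI_TEXT = "[app]\nname = demo\n"
HTML_TEXT = "<!DOCTYPE html>\n<html><body>Sign in</body></html>\n"


def try_parse(text):
    """Return the section names, or None if the text has no section headers."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return None
    return parser.sections()


if __name__ == "__main__":
    print(try_parse(INI_TEXT))
    print(try_parse(HTML_TEXT))
```

Fetching the URL with a browser or curl and looking at the first few lines of the response is a quick way to check whether this is what is happening.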

dnf.base.fill_sack() -- how to use the certs in yum.conf?

I have cert information for using HTTPS in my repos stored in /etc/yum.conf at the bottom:
[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
...
sslclientcert=/path/to/cert.pem
sslclientkey=/path/to/privatekey.pem
sslcacert=/path/to/ca.pem
When I use dnf via the terminal, it can communicate with the repos and retrieve repodata/repomd.xml (and package information and all) just fine. However, when I do it via python:
import dnf
with dnf.Base() as base:
    base.read_all_repos()
    base.fill_sack()
I get:
Errors during downloading metadata for repository '<reponame>':
- Curl error (60): Peer certificate cannot be authenticated with given CA certificates for <repo-path>/repodata/repomd.xml [SSL certificate problem: self signed certificate in certificate chain]
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/dnf/repo.py", line 574, in load
    ret = self._repo.load()
  File "/usr/lib64/python3.6/site-packages/libdnf/repo.py", line 397, in load
    return _repo.Repo_load(self)
libdnf._error.Error: Failed to download metadata for repo '<reponame>': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/dnf/base.py", line 399, in fill_sack
    self._add_repo_to_sack(r)
  File "/usr/lib/python3.6/site-packages/dnf/base.py", line 139, in _add_repo_to_sack
    repo.load()
  File "/usr/lib/python3.6/site-packages/dnf/repo.py", line 581, in load
    raise dnf.exceptions.RepoError(str(e))
dnf.exceptions.RepoError: Failed to download metadata for repo '<reponame>': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
How I know it's an SSL/HTTPS problem:
- I implemented HTTPS on my personal repo a few days ago, and that's when the issues started (it worked just fine prior to that).
- If I change the URL to http:// instead of https://, it works fine (but this isn't a viable solution).
- If I run dnf from the command line, it also works just fine.
I know that dnf is able to pull those certs in from the yum.conf file, but does anyone know (or can anyone figure out) how it's actually done? I've spent a good while digging through the code and can't figure it out (I'm not particularly familiar with how swig works, which is what I'm getting caught up on within the dnf code itself).
Any help is appreciated.
Figured it out a little while ago, so maybe this'll help someone else out there:)
import dnf
from libdnf.conf import Option
with dnf.Base() as base:
    # Read in the config from yum.conf
    with open('/etc/yum.conf') as inf:
        yum_config = inf.read()
    # Split the config into a dictionary of key=value settings
    yum_settings = yum_config.split('\n')
    yum_settings = {k.split('=', 1)[0]: k.split('=', 1)[1] for k in yum_settings if '=' in k}
    # The settings below require a priority. This is the third highest one out of about 10, so I went with this one
    priority = Option.Priority_DROPINCONFIG
    base.conf._config.sslclientcert().set(priority, yum_settings['sslclientcert'])
    base.conf._config.sslclientkey().set(priority, yum_settings['sslclientkey'])
    base.conf._config.sslcacert().set(priority, yum_settings['sslcacert'])
    base.read_all_repos()
    base.fill_sack()
And now it works! It uses the SSL cert files listed in /etc/yum.conf to connect to the repos and all is well!
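The manual split on '=' could also be done with the standard-library configparser, which handles the [main] header and surrounding whitespace. This is only a sketch of the parsing step and does not touch dnf; read_ssl_settings is a hypothetical helper, and SAMPLE_CONF reuses the placeholder paths from the question.

```python
import configparser

SSL_KEYS = ("sslclientcert", "sslclientkey", "sslcacert")

# Placeholder config text modelled on the yum.conf excerpt in the question.
SAMPLE_CONF = """\
[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
sslclientcert=/path/to/cert.pem
sslclientkey=/path/to/privatekey.pem
sslcacert=/path/to/ca.pem
"""


def read_ssl_settings(conf_text, section="main"):
    """Return the SSL-related settings found in the given section."""
    # interpolation=None keeps values such as $basearch untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    options = parser[section]
    return {key: options[key] for key in SSL_KEYS if key in options}


if __name__ == "__main__":
    print(read_ssl_settings(SAMPLE_CONF))
```

The returned dictionary could then feed the same .set(priority, ...) calls used in the answer.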

Get a list of YouTube Music library uploads with ytmusicapi

I need to get a list of all the albums in my YouTube Music uploads library with ytmusicapi, and I have never worked with Python before. I created the headers_auth.json and a test.py file with the following code from the example:
from ytmusicapi import YTMusic
ytmusic = YTMusic('headers_auth.json')
playlistId = ytmusic.create_playlist("test", "test description")
search_results = ytmusic.search("Oasis Wonderwall")
ytmusic.add_playlist_items(playlistId, [search_results[0]['videoId']])
I ran from the Ubuntu terminal: python /home/do/Desktop/ytmusicapi/ytmusicapi-master/test.py
Result error:
Traceback (most recent call last):
  File "/home/do/Desktop/ytmusicapi/ytmusicapi-master/test.py", line 1, in <module>
    from ytmusicapi import YTMusic
  File "/home/do/Desktop/ytmusicapi/ytmusicapi-master/ytmusicapi/__init__.py", line 2, in <module>
    from ytmusicapi.ytmusic import YTMusic
  File "/home/do/Desktop/ytmusicapi/ytmusicapi-master/ytmusicapi/ytmusic.py", line 28
    auth: str = None,
        ^
SyntaxError: invalid syntax
How to fix that?
This is because you are running the script with a different version of Python than the library expects.
The library you are using requires Python >= 3.5, whereas you seem to be using Python 2.
If you are curious, this code
def f(x: int):
    return
is valid in Python 3 but is a syntax error in Python 2, which has no function annotations.
You will have to either switch to Python 3 to use that library, or clone the repo and convert the whole codebase to Python 2 with a conversion tool (though there might be other ways too).
I'd recommend you simply switch to Python 3.5 or higher.
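A quick way to confirm which interpreter the python command actually starts is to print its version. The sketch below is illustrative only; is_supported is a hypothetical helper, and the (3, 5) minimum is taken from the answer above rather than from the library's own metadata.

```python
import sys

MINIMUM = (3, 5)  # minimum version stated in the answer above


def is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the interpreter's major.minor meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version)
    if is_supported():
        print("This interpreter is new enough.")
    else:
        print("Too old: run the script with python3 instead of python.")
```

On many Ubuntu releases of that era, python refers to Python 2 and python3 to Python 3, so running the script as python3 test.py is often all that is needed.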

Building the SeeingWand on Raspberry Pi Zero and have coding issues

This is my first post, so please forgive any lack of decorum.
I am building a SeeingWand as outlined in MagPi issue #71.
I have installed and tested all the hardware. I then installed the Python code. The original code was Python 2.7; I have updated it to run under Python 3, but I get a strange error when I run it:
The system reports that the http module does not have a .client attribute.
The documentation says it does. I have tried both the .client and .server attributes, and both give the same error. What am I doing wrong?
I have tried several coding variations and several builds of the Raspberry Pi OS (Raspbian); most give the same errors.
import picamera, http, urllib, base64, json, re
from os import system
from gpiozero import Button
# CHANGE {MS_API_KEY} BELOW WITH YOUR MICROSOFT VISION API KEY
ms_api_key = "{MS_API_KEY}"
# camera button - this is the BCM number, not the pin number
camera_button = Button(27)
# setup camera
camera = picamera.PiCamera()
# setup vision API
headers = {
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': ms_api_key,
}
params = urllib.parse.urlencode({
    'visualFeatures': 'Description',
})
# loop forever waiting for button press
while True:
    camera_button.wait_for_press()
    camera.capture('/tmp/image.jpg')
    body = open('/tmp/image.jpg', "rb").read()
    try:
        conn = http.client.HTTPSConnection('westcentralus.api.cognitive.microsoft.com')
        conn.request("POST", "/vision/v1.0/analyze?%s" % params, body, headers)
        response = conn.getresponse()
        analysis = json.loads(response.read())
        image_caption = analysis["description"]["captions"][0]["text"].capitalize()
        # validate text before system() call; use subprocess in next version
        if re.match("^[a-zA-Z ]+$", image_caption):
            system('espeak -ven+f3 -k5 -s120 "' + image_caption + '"')
        else:
            system('espeak -ven+f3 -k5 -s120 "i do not know what i just saw"')
        conn.close()
    except Exception as e:
        print(e.args)
Expected results are:
- When I push button 1, I expect the camera to take a picture.
- When I push button 2, I expect to access Microsoft Azure to identify the picture using AI.
- The final output is for the Wand to access the audio HAT and describe what the Wand is "looking" at.
Try adding an import like this:
import http.client
Edit: http is a Python package. Even if a package contains some modules, importing the package does not automatically import those modules, unless the package's __init__.py does so on your behalf. In the case of http, the __init__.py does not import client or server, so you get nothing gratis just for importing the package.
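The same reasoning applies to urllib.parse, which the script also reaches through a bare import urllib. Below is a minimal sketch of the explicit submodule imports applied to the request setup from the question; build_analyze_request is a hypothetical helper, and nothing is sent over the network until conn.request() is called. Note that the class is spelled HTTPSConnection, with a capital S.

```python
import http.client   # explicit submodule import, as the answer suggests
import urllib.parse  # same pattern for urllib's parse submodule


def build_analyze_request(host, features="Description"):
    """Return an unopened HTTPS connection and the request path with its query."""
    params = urllib.parse.urlencode({"visualFeatures": features})
    conn = http.client.HTTPSConnection(host)  # no connection is made yet
    path = "/vision/v1.0/analyze?%s" % params
    return conn, path


if __name__ == "__main__":
    conn, path = build_analyze_request("westcentralus.api.cognitive.microsoft.com")
    print(conn.host, path)
```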

How to make screenshot from web page? [duplicate]

How can I take a screenshot of any URL (web page)?
I tried:
from .ghost import Ghost
ghost = Ghost(wait_timeout=4)
ghost.open('http://www.google.com')
ghost.capture_to('screen_shot.png')
Result:
No module named '__main__.ghost'; '__main__' is not a package
I also tried the approaches from:
Python Webkit making web-site screenshots using virtual framebuffer
Take screenshot of multiple URLs using selenium (python)
Fastest way to take a screenshot with python on windows
Take a screenshot of open website in python script
I've also tried other methods that are not listed here.
Nothing worked: each attempt ended either with an error or with a module that could not be found.
I'm tired. Is there an easy way to make a screenshot of a web page using Python 3.X?
upd1:
C:\prg\PY\PUMA\tests>py save-web-html.py
Traceback (most recent call last):
  File "save-web-html.py", line 2, in <module>
    from .ghost import Ghost
ModuleNotFoundError: No module named '__main__.ghost'; '__main__' is not a package
upd2:
C:\prg\PY\PUMA\tests>py save-web-html.py
Exception ignored in: <bound method Ghost.__del__ of <ghost.ghost.Ghost object at 0x0000020A169CF860>>
Traceback (most recent call last):
  File "C:\Users\Coar\AppData\Local\Programs\Python\Python36\lib\site-packages\ghost\ghost.py", line 325, in __del__
    self.exit()
  File "C:\Users\Coar\AppData\Local\Programs\Python\Python36\lib\site-packages\ghost\ghost.py", line 315, in exit
    self._app.quit()
AttributeError: 'NoneType' object has no attribute 'quit'
Traceback (most recent call last):
  File "save-web-html.py", line 4, in <module>
    ghost = Ghost(wait_timeout=4)
TypeError: __init__() got an unexpected keyword argument 'wait_timeout'
In the early days of the web this may have been a simple task: just render some HTML to an image instead of the screen.
But these days web pages require client-side execution to build parts of their DOM and re-render based on client-initiated AJAX (or equivalent) requests... it's a whole "web 2.0" thing.
Rendering a website such as http://google.com from a simple HTML response should be easy, but rendering something like https://www.facebook.com/ or https://www.kogan.com/ involves a lot of back-and-forth communication before it displays what you're expecting to see.
So restricting this to a pure Python solution may not be feasible; I'm not aware of a Python-based browser.
Consider running a separate service to take the screenshots, and use your core application (in Python) to fetch the requested screenshots.
I just tried a few with Docker; many of them struggle with HTTPS and the aforementioned AJAX behaviour.
earlyclaim/docker-manet appears to work (demo page).
Edit: from your comments, you need the data from a graph that's rendered using a second request.
You just need the JSON returned from https://www.minnowbooster.net/limit/chart
try:
    from urllib.request import urlopen  # py3
except ImportError:
    from urllib2 import urlopen  # py2
import json
url = 'https://www.minnowbooster.net/limit/chart'
response = urlopen(url)
data_str = response.read().decode()
data = json.loads(data_str)
print(data)
