Problems retrieving XML with Python requests

I am probably overlooking something obvious, but I can't seem to figure it out. I am trying a simple verification to start with, using the following URL.
http://myanimelist.net/api/account/verify_credentials.xml
http://myanimelist.net/modules.php?go=api#verifycred
(Here's the full documentation regarding this URL).
This is the code used for testing it out.
import requests
# username and password are assumed to be defined elsewhere.
class Foobar():
    def __init__(self):
        pass
    def bar(self):
        client = requests.get('http://myanimelist.net/api/account/verify_credentials.xml',
                              auth=(username, password))
        if client.status_code == 200:
            print("Successful authentication. %i" % client.status_code)
        else:
            print("Authentication failed %i" % client.status_code)
        print(client.text)
Foo = Foobar()
Foo.bar()
I got a correct response once and assumed this was the right approach. However, from that point on I only receive responses like this.
This happens for every request sent, regardless of whether the user credentials are correct.
I've tried various encodings, and none of them affected the response in any way.
EDIT: I seem to have solved the issue. After wiping my cookies and clearing my cache, it returned a valid response with status code 401.

The issue was caused by a cookie placed by the site itself. I am unsure which cookie specifically caused the problem, but once I find it I will add it here.
EDIT: They have a bot checking third-party connections, which bans you when you try to connect, rendering their API useless.
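
For readers wondering what the code above actually sends: passing auth=(username, password) to requests.get makes the library use HTTP Basic authentication, so the credentials travel in an Authorization header. Below is a minimal standard-library sketch of how such a header value is built. The function name basic_auth_header is made up for illustration, and the latin-1 encoding is an assumption rather than something stated in the question.

```python
import base64

def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    # The credentials are joined with a colon, then base64-encoded.
    raw = f"{username}:{password}".encode("latin-1")
    return "Basic " + base64.b64encode(raw).decode("ascii")

print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so over plain http:// the credentials are readable by anyone on the path.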

Related

Why don't I get a response from my request?

I'm trying to make one simple request:
from fake_useragent import UserAgent
import requests
ua = UserAgent()
req = requests.get('https://www.casasbahia.com.br/', headers={'User-Agent': ua.random})
I would understand if I received <Response [403]> or something like that, but instead I receive nothing; the code keeps running with no response.
Using logging I see:
I know I could use a timeout to avoid keeping the code running, but I just want to understand why I don't get a response.
Thanks in advance.

I have never used this API before, but from what I researched here just now, some sites block requests from fake user agents.
To reproduce this example on my PC, I installed the fake_useragent and requests modules on Python 3.10 and ran your script. It turns out that with my authentic User-Agent string, the request succeeds. When printed to the console, req.text shows the entire HTML document received from the request.
But if I try again with a fake user agent, using ua.random, it fails. The site was probably built to detect and reject requests from fake agents (or bots).
Again, though, this is just a theory. I have no way to access this site's server files to confirm it.

Problem with detecting if link is invalid

Is there any way to detect if a link is invalid using webbot?
I need to tell the user that the link they provided was unreachable.

The only way to be completely sure that a URL sends you to a valid page is to fetch that page and check that it works. You could try making a request other than GET (such as HEAD) to avoid wasting bandwidth downloading the page, but not all servers will respond: the only way to be absolutely sure is to GET and see what happens. Something like:
import requests
from requests.exceptions import RequestException
def check_url(url):
    try:
        r = requests.get(url, timeout=1)
        return r.status_code == 200
    except RequestException:
        # Covers connection errors and timeouts alike.
        return False
Is this a good idea? It's only a GET request, and GET is supposed to be idempotent, so you shouldn't cause anybody any harm. On the other hand, what if a user sets up a script to add a new link every second pointing to the same website? Then you're DDoSing that website. So when you allow users to cause your server to do things like this, you need to think about how you might protect it. (In this case: you could keep a cache of valid links expiring every n seconds, and only look up if the cache doesn't hold the link.)
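The caching idea could be sketched roughly as follows. This is only an illustration: the LinkCache class, its parameters, and the injected checker callable (which could be the check_url function above) are names invented here, not part of any library.

```python
import time

class LinkCache:
    """Remember link-check results for ttl seconds."""

    def __init__(self, checker, ttl=60.0, clock=time.monotonic):
        self._checker = checker   # callable: url -> bool (hypothetical)
        self._ttl = ttl
        self._clock = clock       # injectable so expiry can be tested
        self._entries = {}        # url -> (expires_at, result)

    def is_valid(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]       # fresh cached answer, no network call
        result = self._checker(url)
        self._entries[url] = (now + self._ttl, result)
        return result
```

Keying the cache on the full URL only stops repeated checks of the same link. To limit traffic to a whole website, one might key on the host name instead.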
Note that if you just want to check that the link points to a valid domain, it's a bit easier: you can just do a DNS query. (The same point about caching and avoiding abuse probably applies.)
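A DNS-only check might look like the sketch below, which asks the system resolver through the standard library. The function name domain_resolves is an assumption made for this example.

```python
import socket

def domain_resolves(hostname):
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name is malformed.
        return False
```

This only says that the name resolves. It says nothing about whether a web server answers there, or whether the specific page exists.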
Note that I used requests because it is easy, but you likely want to do this in the background, either with requests in a thread, or with one of the asyncio HTTP libraries and an asyncio event loop. Otherwise your code can block for the full timeout (or longer) on a slow server.
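As a rough sketch of the thread-based option, the check can be handed to a small worker pool so the caller gets a future back immediately. The names check_in_background and checker are hypothetical, and checker stands in for something like the check_url function above.

```python
from concurrent.futures import ThreadPoolExecutor

# A small shared pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)

def check_in_background(url, checker):
    """Schedule checker(url) on a worker thread and return a Future at once."""
    return _executor.submit(checker, url)
```

The caller can carry on with other work and later call future.result(timeout=...) to collect the answer, or attach a callback with future.add_done_callback.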
(Another attack: this gets the whole page. What if a user links to a massive page? See this question for a discussion of protecting from oversize responses. For your use case you likely just want to get a few bytes. I've deliberately not complicated the example code here because I wanted to illustrate the principle.)
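One way to read only a few bytes is to consume the body in chunks and stop at a cap. The helper below is a sketch with an invented name, read_capped, and it works on any iterable of byte chunks. With requests, such an iterable could come from response.iter_content(chunk_size=1024) on a response obtained with stream=True.

```python
def read_capped(chunks, max_bytes):
    """Collect byte chunks until max_bytes is reached, then stop reading."""
    collected = bytearray()
    for chunk in chunks:
        # Take only as much of this chunk as still fits under the cap.
        collected.extend(chunk[: max_bytes - len(collected)])
        if len(collected) >= max_bytes:
            break  # do not pull any further chunks from the source
    return bytes(collected)
```

Because the loop stops pulling chunks once the cap is hit, the rest of a huge body is never downloaded, provided the connection is closed afterwards.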
Note that this just checks that something is available on that page. What if it's one of the many dead links which redirects to a domain-name website? You could enforce 'no redirects', but then some redirects are valid. (Likewise, you could try to detect redirects up to the main domain or to a blacklist of vendors' domains, but this will always be imperfect.) There is a tradeoff here to consider, which depends on your concrete use case, but it's worth being aware of.
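One such imperfect heuristic is sketched here: compare the URL that was requested with the URL the client finally landed on (with requests, response.url after redirects). The function name and the two rules it applies are assumptions chosen for illustration, not an established method.

```python
from urllib.parse import urlsplit

def redirect_looks_suspicious(requested_url, final_url):
    """Heuristic: flag redirects that change host or collapse to the site root."""
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    if requested.hostname != final.hostname:
        return True  # landed on a different host entirely
    requested_has_path = requested.path not in ("", "/")
    final_is_root = final.path in ("", "/")
    # A deep link that ends up on the front page is often a dead link.
    return requested_has_path and final_is_root
```

This flags legitimate cases too, for example a redirect from example.com to www.example.com, which is exactly the kind of tradeoff described above.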

You could try sending an HTTP request, reading the result, and checking it against a list of known error codes (404, etc.). This is easy to implement in Python and is efficient and quick. Be warned that sometimes (quite rarely) a website might detect your scraper and deliberately return an error code to confuse you.

django spotify api python http post 500 error

Hello, I am trying to make a Django website using the Spotify API, so I am trying to get some simple example code working using the spotipy Python library, but I keep getting an HTTP 500 on the POST whenever my spotipy code is called.
Right now, if you click a button on the website, it makes a POST request to one of my endpoints, which calls a Python function that is supposed to return text. This is the code in the Python function:
import spotipy
from django.http import HttpResponse
def spotifyleastplayed_py(request):
    print("spotifyleastplayed_py()")
    if request.method == 'POST':
        print("0")
        sp = spotipy.Spotify()
        print("1")
        results = sp.search(q='weezer', limit=20)
        print("2")
        print(results)
    data = "temp spotifyleastplayed_py() Return Data"
    return HttpResponse(data) #HttpResponse(json.dumps("{test:bobo}"))
When the function is called, my console outputs the following error message:
[06/Oct/2019 21:49:03] "GET /spotifyleastplayed HTTP/1.1" 200 1992
spotifyleastplayed_py()
0
1
[06/Oct/2019 21:49:07] "POST /spotifyleastplayed_py/ HTTP/1.1" 500 6326
Do I need to add the spotipy URL to Django somewhere so the library can make calls successfully? It seems like it's failing to make the HTTP request to Spotify.

First of all, I would advise you to learn more about debugging your Python code, as this is a critical skill to have as a developer, and it might help you get further into the problem next time. One thing you could deduce from your example is that your program does not execute anything beyond the following line:
results = sp.search(q='weezer', limit=20)
But the only information you are getting is a 500 return code, which doesn't tell you exactly what is going wrong, only that something is not right.
A first step you could take is to find out what exactly is causing your code to terminate. If you wrap the statement in a try/except block, you'll be able to see exactly what kind of error is occurring, like this:
try:
    results = sp.search(q='weezer', limit=20)
except Exception as e:
    print(e)
This catches the error raised by the statement and prints it, which gives the following:
http status: 401, code:-1 -
https://api.spotify.com/v1/search?q=weezer&limit=20&offset=0&type=track:
No token provided
That's already a lot more telling than simply a 500 error, right?
I would not recommend this method for every issue in your code, but it's a start.
To learn more about how to debug your code, you can read articles like this.
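A slightly more informative variant of the same idea is to log the full traceback, not just the message. The sketch below uses only the standard logging module. The helper name run_and_log is made up for this example, and in the view above func would be something like a lambda wrapping the sp.search call.

```python
import logging

logger = logging.getLogger(__name__)

def run_and_log(func, *args, **kwargs):
    """Call func; on failure log the full traceback and return None."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("call to %s failed", getattr(func, "__name__", func))
        return None
```

Unlike print(e), this shows the exception type and the line that raised it, which is usually what you need first.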
Anyways:
When I run your code, a spotipy.client.SpotifyException is raised, because the Spotify API returns a 401 error code.
401 (Unauthorized) means you have no authorization to access the requested resource, and for the Spotify API specifically, it means that you'll need to supply a valid token.
You'll need to request a token from the user, and pass that token when initializing spotify like this:
...
sp = spotipy.Spotify(auth=token)
results = sp.search(q='weezer', limit=20)
...
How exactly you get this token from the user depends on the rest of your implementation.
I would recommend reading up on Spotify's authentication flow.
There are also plenty of other examples on how people implemented the authorization flow in spotipy, for example in this StackOverflow thread.

requests.post is not giving any response in python?

I'm using Python requests 2.19.1.
I'm facing an intermittent issue where I get no response at all when I post to a specific URL.
I'm trying to check whether LDAP gives me the expected output for invalid credentials.
Here's the call I'm making:
requests.post('https://oxhp-member.uhc.com/Member/MemberPortal/j_acegi_security_check',
              credentials_payload)
Almost every time, it works fine. But sometimes it doesn't give any response. Even network issues give us some response, right? Why am I not getting any response for the above call?
Is there an existing bug in requests?
Could somebody please point me in the right direction?

requests is not responsible for "giving back a response". The server you are posting to with requests is.
To see the response you have to keep it in a variable and handle it somehow.
resp = requests.post('https://oxhp-member.uhc.com/Member/MemberPortal/j_acegi_security_check',
                     credentials_payload)
print(resp.status_code)
print(resp.content)
Whatever resp contains is the responsibility of the server.
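If the server sometimes never answers at all, the call will wait indefinitely unless the client sets a time limit. With requests that means passing timeout=... to requests.post and catching requests.exceptions.Timeout. The sketch below shows the same behaviour with only the standard library so it can run offline: a local socket accepts connections but never replies, and the client gives up after the timeout. The names start_silent_server and post_with_timeout are invented for this demonstration.

```python
import socket
import threading
import urllib.error
import urllib.request

def start_silent_server():
    """Open a local port that accepts connections but never answers."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(5)
    held = []                          # keep accepted sockets open

    def accept_forever():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:            # listening socket was closed
                return
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return server, server.getsockname()[1]

# Ignore proxy settings from the environment so the demo talks to the
# local server directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def post_with_timeout(url, payload, timeout):
    """POST payload (bytes); return the status code, or None if no answer."""
    try:
        with _opener.open(url, data=payload, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code                # the server did answer, with an error
    except (socket.timeout, urllib.error.URLError):
        return None                    # timed out or could not connect
```

Calling post_with_timeout against the port returned by start_silent_server with a short timeout returns None instead of hanging, which is the situation described in the question.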

Testing external URLs in Django

I'm currently having a hard time getting some of my Django tests to work.
What I'm trying to do is test whether a given URL (the REST API I want to consume) is up and running (returning status code 200) and, later on, whether it responds with the expected values.
However, all I get returned is a status code 404 (Page not found), even though the URL is definitely the right one. (Tried the exact string in my browser)
This is the code:
from django.test import TestCase
class RestTest(TestCase):
    def test_api_test_endpoint(self):
        response = self.client.get("http://ip.to.my.api:8181/test/")
        self.assertEqual(response.status_code, 200, "Status code not equals 200")
It always returns a 404 instead of a 200...
Does anyone know what I'm doing wrong here?

self.client is not a real HTTP client; it's the Django test client, which simulates requests to your own app for the purposes of testing. It doesn't make HTTP requests, and it only accepts a path, not a full URL.
If you really needed to check that an external URL was up, you would need a proper HTTP client like requests. However this doesn't seem to be an appropriate thing to do in a test case, since it depends on an external API; if that API went down, suddenly your tests would fail, which seems odd. This could be something that you do in a monitoring job, but not in a test.
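If the goal is to test your own code that consumes the API, one common approach is to replace the HTTP call with a stand-in during the test, so nothing depends on the external service being up. The sketch below does this with unittest.mock and the standard library's urllib. The api_is_up function and the test class are hypothetical examples, not code from the question.

```python
import unittest
import urllib.request
from unittest import mock

def api_is_up(url):
    """Return True if the endpoint answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200

class ApiIsUpTest(unittest.TestCase):
    def _fake_response(self, status):
        # MagicMock supports the context-manager protocol used by `with`.
        fake = mock.MagicMock()
        fake.__enter__.return_value.status = status
        return fake

    def test_true_on_200(self):
        with mock.patch("urllib.request.urlopen",
                        return_value=self._fake_response(200)):
            self.assertTrue(api_is_up("http://ip.to.my.api:8181/test/"))

    def test_false_on_other_status(self):
        with mock.patch("urllib.request.urlopen",
                        return_value=self._fake_response(204)):
            self.assertFalse(api_is_up("http://ip.to.my.api:8181/test/"))
```

No request leaves the machine in these tests. Checking the real endpoint is then left to a separate monitoring job, as suggested above.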
