Can't get the full table with Python requests

I'm trying to get the whole table from this website: https://br.investing.com/commodities/aluminum-historical-data
But when I run this code:
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    r = s.post('https://br.investing.com/commodities/aluminum-historical-data',
               headers={"curr_id": "49768", "smlID": "300586",
                        "header": "Alumínio Futuros Dados Históricos",
                        'User-Agent': 'Mozilla/5.0',
                        'st_date': '01/01/2017', 'end_date': '29/09/2018',
                        'interval_sec': 'Daily', 'sort_col': 'date',
                        'sort_ord': 'DESC', 'action': 'historical_data'})
    bs2 = BeautifulSoup(r.text, 'lxml')
    tb = bs2.find('table', {"id": "curr_table"})
It only returns a piece of the table, not all the data for the dates I just filtered.
I also inspected the POST request in the browser's developer tools.
Can anyone help me get the whole table I just filtered?

You made two mistakes in your code.
The first one is the URL: you need to use the correct URL to request data from investing.com.
Your current URL is 'https://br.investing.com/commodities/aluminum-historical-data'.
However, if you open the browser inspector and click the 'Network' tab, the request URL is https://br.investing.com/instruments/HistoricalDataAjax.
Your second mistake is in s.post(...). As Federico Rubbi noted in the other answer, what you assigned to headers must be assigned to data instead.
With those two mistakes fixed, only one step remains: add {'X-Requested-With': 'XMLHttpRequest'} to your_headers. Since you have clearly already checked the Network tab in the inspector, you can probably see why this header is needed.
So the entire code should be as follows.
import requests
import bs4 as bs

with requests.Session() as s:
    url = 'https://br.investing.com/instruments/HistoricalDataAjax'  # Fixes the first mistake.
    your_headers = {'User-Agent': 'Mozilla/5.0'}
    s.get(url, headers=your_headers)  # Collect the session cookies.
    cookie_list = [key + '=' + value for key, value in s.cookies.get_dict().items()]
    cookie = '; '.join(cookie_list)  # Cookie pairs are separated by '; ', not ','.
    your_headers = {'X-Requested-With': 'XMLHttpRequest', **your_headers}
    your_headers['Cookie'] = cookie
    data = {  # The form data from the question -- fixes the second mistake.
        'curr_id': '49768',
        'smlID': '300586',
        'header': 'Alumínio Futuros Dados Históricos',
        'st_date': '01/01/2017',
        'end_date': '29/09/2018',
        'interval_sec': 'Daily',
        'sort_col': 'date',
        'sort_ord': 'DESC',
        'action': 'historical_data',
    }
    response = s.post(url, data=data, headers=your_headers)
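The Ajax endpoint returns an HTML fragment containing the table, which you can then parse just as in the question. A minimal sketch of pulling the rows out of such a fragment (the markup below is an invented stand-in for the real response, and the stdlib `html.parser` backend is used so no lxml install is needed):

```python
from bs4 import BeautifulSoup

# Invented stand-in for the HTML fragment the Ajax endpoint returns.
html = """
<table id="curr_table">
  <tr><th>Data</th><th>Último</th></tr>
  <tr><td>28.09.2018</td><td>2.062,50</td></tr>
  <tr><td>27.09.2018</td><td>2.047,50</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id': 'curr_table'})
# One list per <tr>, one string per cell.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        for tr in table.find_all('tr')]
print(rows)
```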

The problem is that you're passing form data as headers.
You have to send it with the data keyword argument of requests.Session.post:
import requests
from bs4 import BeautifulSoup

with requests.Session() as session:
    url = 'https://br.investing.com/commodities/aluminum-historical-data'
    data = {
        "curr_id": "49768",
        "smlID": "300586",
        "header": "Alumínio Futuros Dados Históricos",
        'st_date': '01/01/2017',
        'end_date': '29/09/2018',
        'interval_sec': 'Daily',
        'sort_col': 'date',
        'sort_ord': 'DESC',
        'action': 'historical_data',
    }
    your_headers = {'User-Agent': 'Mozilla/5.0'}  # The User-Agent belongs in the headers, not the form data.
    response = session.post(url, data=data, headers=your_headers)
    bs2 = BeautifulSoup(response.text, 'lxml')
    tb = bs2.find('table', {"id": "curr_table"})
I'd also recommend including your headers (especially a User-Agent) in the POST request, because the site may not allow bots; sending browser-like headers makes the bot harder to detect.
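To see why the distinction matters: anything passed as data= is form-encoded into the request body, while headers travel separately as HTTP headers. A quick standard-library sketch with a few of the question's form fields:

```python
from urllib.parse import urlencode

# A few of the question's form fields.
data = {'curr_id': '49768', 'st_date': '01/01/2017', 'action': 'historical_data'}

# This is what requests puts in the POST body when you pass data=data.
body = urlencode(data)
print(body)  # curr_id=49768&st_date=01%2F01%2F2017&action=historical_data
```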

Related

requests.post returns isSuccess:false even though Postman returns true

I am trying to post some information to an API in their recommended format. When I use Postman (a tool for testing APIs), I see that the response has the isSuccess flag set to true. However, when I write the same code in Python using the requests library, I get the isSuccess flag as false.
As mentioned above, I verified the headers and the JSON data object; both are the same, yet the results differ.
import requests

data = {"AccountNumber": "100007777",
        "ActivityID": "78",
        "ActivityDT": "2019-08-07 12:00:00",
        "ActivityValue": "1"
        }
url = "http://<IP>/<API_PATH>"
headers = {
    "X-Tenant": "Default",
    "Content-Type": "application/json"
}
response = requests.post(url, data=data, headers=headers)
print(response.content)
This code should successfully post the data and I should get a isSuccess:true in my response variable.
Can anyone help me figure out what might be wrong?
Can you try changing
response = requests.post(url, data=data, headers=headers)
to
response = requests.post(url, json=data, headers=headers)
With data=, the dictionary is form-encoded; with json=, it is serialized as JSON, which is what your "Content-Type": "application/json" header declares. (Note that requests.post has no body parameter.)
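The difference between the two calls is how the body is serialized: data= form-encodes the dictionary, while json= sends a JSON document (and sets the Content-Type header to application/json for you). A standard-library sketch of the two encodings, using a subset of the question's payload:

```python
import json
from urllib.parse import urlencode

payload = {"AccountNumber": "100007777", "ActivityID": "78"}

form_body = urlencode(payload)   # what data=payload puts in the body
json_body = json.dumps(payload)  # what json=payload puts in the body

print(form_body)  # AccountNumber=100007777&ActivityID=78
print(json_body)  # {"AccountNumber": "100007777", "ActivityID": "78"}
```

An API that parses its body as JSON will reject the form-encoded variant, which is consistent with the isSuccess:false result.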

POST request using urllib2 doesn't correctly send data (401 error)

I am trying to make a POST request in Python 2, using urllib2. My code is currently as follows:
url = 'http://' + server_url + '/playlists/upload?'
data = urllib.urlencode(OrderedDict([("sectionID", section_id), ("path", current_playlist), ("X-Plex-Token", plex_token)]))
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
d = response.read()
print(d)
'url' and 'data' come out correctly formatted with the variables; I know this because I can copy their output into Postman and the POST works fine (see the example URL below):
http://192.168.1.96:32400/playlists/upload?sectionID=11&path=D%3A%5CMedia%5CPPP%5Ctmp%5Cplex%5CAmbient.m3u&X-Plex-Token=XXXXXXXXX
When I run my Python code I get a 401 error returned, presumably meaning the X-Plex-Token parameter was not correctly sent, hence I am not allowed access.
Can anyone tell me where I'm going wrong? Help is greatly appreciated.
Have you tried removing the question mark and not using OrderedDict (no idea why you would need that)?
url = 'http://' + server_url + '/playlists/upload'
data = urllib.urlencode({"sectionID": section_id, "path": current_playlist, "X-Plex-Token": plex_token})
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
d = response.read()
print(d)
Of course you should be using requests instead anyway:
import requests
r = requests.post('http://{}/playlists/upload'.format(server_url),
                  data={"sectionID": section_id, "path": current_playlist, "X-Plex-Token": plex_token})
print r.url
print r.text
print r.json()
I've ended up switching to Python 3, as I hadn't realised the requests module was available there. Still no idea why the above wasn't working, but maybe it was something to do with the lack of headers:
headers = {'cache-control': "no-cache"}
edit:
This is what I'm using now; as mentioned above, I probably don't need OrderedDict.
import requests
import urllib.parse
from collections import OrderedDict

url = 'http://' + server_url + '/playlists/upload'
headers = {'cache-control': "no-cache"}
querystring = urllib.parse.urlencode(OrderedDict([("sectionID", section_id), ("path", current_playlist), ("X-Plex-Token", plex_token)]))
response = requests.request("POST", url, data="", headers=headers, params=querystring)
print(response.text)
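As a sanity check on the query string, the standard library's urlencode produces exactly the percent-encoding seen in the working Postman URL above (the section ID, path, and token here are the stand-in values from that URL):

```python
from urllib.parse import urlencode

# Stand-in values matching the example Postman URL.
querystring = urlencode({
    "sectionID": "11",
    "path": r"D:\Media\PPP\tmp\plex\Ambient.m3u",
    "X-Plex-Token": "XXXXXXXXX",
})
print(querystring)
# sectionID=11&path=D%3A%5CMedia%5CPPP%5Ctmp%5Cplex%5CAmbient.m3u&X-Plex-Token=XXXXXXXXX
```

Passing this string via params= (or a plain dict, letting requests do the encoding) keeps the parameters out of the URL path, where a stray '?' can break the request.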

How to call an API using Python Requests library

I can't figure out how to call this API correctly using Python urllib or requests.
Let me give you the code I have now:
import requests

url = "http://api.cortical.io:80/rest/expressions/similar_terms?retina_name=en_associative&start_index=0&max_results=1&sparsity=1.0&get_fingerprint=false"
params = {"positions": [0, 6, 7, 29]}
headers = {"api-key": key,
           "Content-Type": "application/json"}
# Make a GET request with the parameters.
response = requests.get(url, params=params, headers=headers)
# Print the content of the response.
print(response.content)
I've even added in the rest of the parameters to the params variable:
url = 'http://api.cortical.io:80/rest/expressions/similar_terms?'
params = {
    "retina_name": "en_associative",
    "start_index": 0,
    "max_results": 1,
    "sparsity": 1.0,
    "get_fingerprint": False,
    "positions": [0, 6, 7, 29]}
I get this message back:
An internal server error has been logged # Sun Apr 01 00:03:02 UTC 2018
So I'm not sure what I'm doing wrong. You can test out their API here, but even with testing I can't figure it out. If I go to http://api.cortical.io/, click on the Expression tab, click the POST /expressions/similar_terms option, paste {"positions":[0,6,7,29]} into the body textbox and hit the button, it gives a valid response, so nothing is wrong with their API.
I don't know what I'm doing wrong. Can you help me?
The problem is that you're mixing query string parameters and post data in your params dictionary.
Instead, you should use the params parameter for your query string data, and the json parameter (since the content type is json) for your post body data.
When using the json parameter, the Content-Type header is set to 'application/json' by default. Also, when the response is json you can use the .json() method to get a dictionary.
An example:
import requests

url = 'http://api.cortical.io:80/rest/expressions/similar_terms?'
params = {
    "retina_name": "en_associative",
    "start_index": 0,
    "max_results": 1,
    "sparsity": 1.0,
    "get_fingerprint": False
}
data = {"positions": [0, 6, 7, 29]}
r = requests.post(url, params=params, json=data)
print(r.status_code)
print(r.json())
200
[{'term': 'headphones', 'df': 8.991197733061748e-05, 'score': 4.0, 'pos_types': ['NOUN'], 'fingerprint': {'positions': []}}]
So, I can't speak to why there's a server error in a third-party API, but I followed your suggestion to try using the API UI directly, and noticed you're using a totally different endpoint than the one you're trying to call in your code. In your code you GET from http://api.cortical.io:80/rest/expressions/similar_terms but in the UI you POST to http://api.cortical.io/rest/expressions/similar_terms/bulk. It's apples and oranges.
Calling the endpoint you mention in the UI works for me, using the following variation on your code. It requires requests.post and, as t.m. adam also pointed out, the json parameter for the payload, which additionally needs to be wrapped in a list:
import requests

url = "http://api.cortical.io/rest/expressions/similar_terms/bulk?retina_name=en_associative&start_index=0&max_results=1&sparsity=1.0&get_fingerprint=false"
params = [{"positions": [0, 6, 7, 29]}]
headers = {"api-key": key,
           "Content-Type": "application/json"}
# Make a POST request with the parameters.
response = requests.post(url, json=params, headers=headers)
# Print the content of the response.
print(response.content)
Gives:
b'[[{"term":"headphones","df":8.991197733061748E-5,"score":4.0,"pos_types":["NOUN"],"fingerprint":{"positions":[]}}]]'

Python web scrape data request error

I'm trying to retrieve the JSON response data from a web site that I call.
The site is this:
WebSite DriveNow
This page shows some data on a map. With the browser debugger I can see the endpoint that sends the JSON response data.
I have use this python to try scrape the json response data:
import requests
import json

headers = {
    'Host': 'api2.drive-now.com',
    'X-Api-Key': 'adf51226795afbc4e7575ccc124face7'
}
r = requests.get('https://api2.drive-now.com/cities/42756?expand=full', headers=headers)
json_obj = json.loads(r.content)
but I get this error:
hostname doesn't match either of 'activityharvester.com'
How I can retrieve this data?
Thanks
I have tried calling the endpoint that shows the JSON response using Postman, passing only Host and the API key in the header. The result is the JSON that I want. But if I try the same call in Python, I receive the error hostname doesn't match either of 'activityharvester.com'.
I don't understand your script, nor your question. Why two requests and three headers? Did you mean something like this?
import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0',
    'X-Api-Key': 'adf51226795afbc4e7575ccc124face7',
}
res = requests.get('https://api2.drive-now.com/cities/4604?expand=full', headers=headers, allow_redirects=False)
print(res.status_code, res.reason)
json_obj = json.loads(res.content)
print(json_obj)

How to request the "load more" function with Python?

I'm doing research on Zhihu, a Chinese Q&A website like Quora, using social network analysis, and I'm writing a crawler in Python, but I've run into a problem:
I want to scrape the info of the users that follow a specific user, like Kaifu-Lee. Kaifu-Lee's followers page is http://www.zhihu.com/people/kaifulee/followers
The load-more button is at the bottom of the followers list, and I need to get the full list.
Here's the way I do with python requests:
import requests
import re

s = requests.session()
login_data = {'email': '***', 'password': '***'}
# Post the login data.
s.post('http://www.zhihu.com/login', login_data)
# Verify that I've logged in successfully. This step has succeeded.
r = s.get('http://www.zhihu.com')
Then, I jumped to the target page:
r = s.get('http://www.zhihu.com/people/kaifulee/followers')
and get 200 return:
In [7]: r
Out[7]: <Response [200]>
So the next step is to analyze the request behind load-more under the Network tab of Chrome's developer tools. Here's the information:
Request URL: http://www.zhihu.com/node/ProfileFollowersListV2
Request Method: POST
Request Headers
Connection:keep-alive
Host:www.zhihu.com
Origin:http://www.zhihu.com
Referer:http://www.zhihu.com/people/kaifulee/followers
Form data
method:next
params:{"hash_id":"12135f10b08a64c54e8bfd537dd7bee7","order-by":"created","offset":20}
_xsrf:ea63beee3a3444bfb853f36b7d968ad1
So I try to POST:
global header_info
header_info = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1581.2 Safari/537.36',
    'Host': 'www.zhihu.com',
    'Origin': 'http://www.zhihu.com',
    'Connection': 'keep-alive',
    'Referer': 'http://www.zhihu.com/people/zihaolucky/followers',
    'Content-Type': 'application/x-www-form-urlencoded',
}
# Form data.
data = r.text
raw_hash_id = re.findall('hash_id(.*)', data)
hash_id = raw_hash_id[0][14:46]
payload = {"method": next, "hash_id": str(hash_id), "order_by": "created", "offset": 20}
# Post with parameters.
url = 'http://www.zhihu.com/node/ProfileFollowersListV2'
r = requests.post(url, data=payload, headers=header_info)
But it returns <Response [404]>.
Did I make a mistake somewhere?
Someone said I made a mistake in handling the params. The form data has three parameters: method, params, and _xsrf. I had left out _xsrf, so I put all of them into a dictionary.
So I modified the code:
# Form data.
data = r.text
raw_hash_id = re.findall('hash_id(.*)', data)
hash_id = raw_hash_id[0][14:46]
raw_xsrf = re.findall('xsrf(.*)', r.text)
_xsrf = raw_xsrf[0][9:-3]
payload = {"method": "next", "params": {"hash_id": hash_id, "order_by": "created", "offset": 20}, "_xsrf": _xsrf}
# Reuse the session object, but still an error.
>>> r = s.post(url, data=payload, headers=header_info)
>>> <Response [500]>
You can't pass nested dictionaries to the data parameter. Requests just doesn't know what to do with them.
It's not clear, but it looks like the value of the params key is probably JSON. This means your payload code should look like this:
import json

params = json.dumps({"hash_id": hash_id, "order_by": "created", "offset": 20})
payload = {"method": "next", "params": params, "_xsrf": _xsrf}
Give that a try.
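A standard-library sketch of why the nested dictionary has to be serialized first: form encoding only handles flat key/value pairs, so the params value is JSON-encoded into a single string before the whole payload is form-encoded (the hash_id and _xsrf values are the ones captured in the question's devtools dump):

```python
import json
from urllib.parse import urlencode

# Serialize the nested part to a single JSON string...
params = json.dumps({"hash_id": "12135f10b08a64c54e8bfd537dd7bee7",
                     "order_by": "created", "offset": 20})
# ...then the payload is a flat dict that form-encodes cleanly.
payload = {"method": "next", "params": params,
           "_xsrf": "ea63beee3a3444bfb853f36b7d968ad1"}

# What requests would send as the POST body for data=payload.
body = urlencode(payload)
print(body)
```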
