Python POST doesn't post anything - python

I'm trying to make a program that will connect to a website using this code:
import requests

url = "https://website/cours/login/index.php"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Mobile Safari/537.36"
}
brack = requests.post(url, headers=headers, data={"username": "e19****", "password": "0****"})
content = brack.content
print(content)
The thing is that when I'm posting, nothing happens, as if I had just done a GET. There's no error when I run it (200 OK).

Depending on the API definition, a POST request may not return any content. Since you're getting a 200 response, most probably a JSON object was returned with your POST request.
Try this:
import requests

url = "https://website/cours/login/index.php"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Mobile Safari/537.36"
}
response = requests.post(url, headers=headers,
                         data={"username": "e19****", "password": "0****"})
print(response.json())
Usually, if the POST request is not expected to return the data instantly, you might instead get back a session/job ID that you can use to poll for or fetch the completed job at a later time.
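As a rough illustration of that pattern, a polling loop might look like the sketch below; the /jobs endpoint, the job_id field, and the "done" state are all hypothetical names, not part of any real API:
import time
import requests

BASE = "https://api.example.com"  # hypothetical API base URL

# the POST kicks off the work and returns a job ID instead of the result
job = requests.post(BASE + "/jobs", json={"task": "export"}).json()
job_id = job["job_id"]  # hypothetical field name

# poll until the server reports the job is finished, then fetch the result
while True:
    status = requests.get(BASE + "/jobs/" + job_id).json()
    if status["state"] == "done":  # hypothetical status value
        result = requests.get(BASE + "/jobs/" + job_id + "/result")
        print(result.content)
        break
    time.sleep(2)  # wait a bit before asking again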
These tutorials might be helpful:
https://realpython.com/flask-connexion-rest-api/
https://rapidapi.com/blog/how-to-use-an-api-with-python/
https://www.w3schools.com/tags/ref_httpmethods.asp

Related

Python requests html 403 response

I'm using the requests module in Python to try and make a search on the following website, http://musicpleer.audio/. However, this website appears to be blocking me, as it issues nothing but a 403 when I attempt to access it. I'm wondering how I can get around this; I've tried sending it the user agent of my web browser (Chrome) and it still returns error 403. Any suggestions on how I could get around this, and an example of downloading a song from the site, would be very helpful. Thanks in advance.
My code:
import requests, os

def funGetList():
    start_path = 'C:/Users/Jordan/Music/'  # directory to scan
    songs = []
    for path, dirs, files in os.walk(start_path):
        for filename in files:
            temp = os.path.join(path, filename)
            # keep the part between the start_path prefix and the extension
            songs.append(temp[22:len(temp) - 4])
    return songs

def funDownloadMP3(songs):
    for i in songs:
        print(i)
        payload = {'searchQuery': 'meme', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
        url = 'http://musicpleer.audio/'
        print(requests.post(url, data=payload))
Putting the User-Agent in the headers seems to work:
In []:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
r = requests.get('{}#!{}'.format(url, 'meme'), headers=headers)
r.status_code
Out[]:
200
Note: It looks like the search URL is simply '#!<search-term>'.
HTTP 403 Forbidden error code.
The server might be expecting some more request headers, like Host or Cookie.
You might want to use Postman to debug it with ease.
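As a rough sketch of that idea, you can copy the full header set your browser (or Postman) sends and replay it with requests; the header values below are placeholders to be replaced with whatever your own browser actually sends:
import requests

url = 'http://musicpleer.audio/'
# copy these values from your browser's dev tools; these are placeholders
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'http://musicpleer.audio/',
}
# a Session also carries along any cookies the server sets between requests
with requests.Session() as s:
    r = s.get(url, headers=headers)
    print(r.status_code)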

Read HTTPS URLs in R, like LinkedIn

I am trying to read a LinkedIn company page, for example, https://www.linkedin.com/company/facebook, getting the company name, location, type of industry, etc.
This is my code below:
urlCreate1 <- "https://www.linkedin.com/company/facebook"
parse_rvest <- getURL(urlCreate1, 'useragent' = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36")
nameRest <- content %>% html_nodes(".industry") %>% html_text()
nameRest
The output I get for this is character(0), which from previous posts I understand means it is not getting the .industry tag, as I am reading the raw HTTPS source.
I have also tried this:
parse_rvest <- content(GET(urlCreate1), encoding = 'UTF-8')
but it doesn't help.
I have Python code that works, but I need this to be done in R.
This is part of the Python code I got online:
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
response = requests.get(url, headers=headers)
formatted_response = response.content.replace('<!--', '').replace('-->', '')
doc = html.fromstring(formatted_response)
datafrom_xpath = doc.xpath('//code[@id="stream-promo-top-bar-embed-id-content"]//text()')
if datafrom_xpath:
    try:
        json_formatted_data = json.loads(datafrom_xpath[0])
        company_name = json_formatted_data['companyName'] if 'companyName' in json_formatted_data.keys() else None
        size = json_formatted_data['size'] if 'size' in json_formatted_data.keys() else None
Please help me read the page. I am using SelectorGadget to get the selector (.industry).
Have a look at the LinkedIn API:
https://cran.r-project.org/web/packages/Rlinkedin/Rlinkedin.pdf
Then, you should be able to easily, and legally, do whatever you want to do.
Here are some ideas to get you started.
http://thinktostart.com/analyze-linkedin-with-r/
https://github.com/hadley/httr/issues/200
https://www.reddit.com/r/datascience/comments/3rufk5/pulling_data_from_linkedin_api/

Convert curl command to Python [pycurl or requests or anything!]

I'm trying to post data to a specific URL, get the response, and save it to a file with curl, and there's no problem with it.
The webpage posts the data; if the data is correct, it shows the response webpage, and if not, it redirects to a URL like this:
http://example.com/url/foo/bar/error
My code is:
curl --fail --user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" --data "mydataexample" --referer "http://example.com/url/foo/bar" http://example.com/url/foo/bar --output test.html
But when I code it in Python with requests, the status_code is always OK [200], even with wrong data, and there is no correct response to save!
Here is my Python code:
import requests

data = 'myexampledata'
headers = {'referer': 'http://example.com/url/foo/bar', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'}
url = 'http://example.com/url/foo/bar'
r = requests.post(url, params=data, headers=headers)
# check headers and status_code to test if data is wrong...
print(r.headers)
print(r.status_code)
And now, how do I write Python code to solve this problem and work exactly the same as the curl command? Any advice with requests or pycurl to fix it?
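A minimal requests translation of that curl command might look like the sketch below. Two things matter here: data= sends the payload in the POST body like curl --data (params= puts it in the query string instead, which is likely why the code above behaves differently from curl), and requests follows the redirect to the error page, so the final status is 200 either way and you have to inspect r.history or r.url:
import requests

url = 'http://example.com/url/foo/bar'
headers = {
    'Referer': 'http://example.com/url/foo/bar',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36',
}

# data= puts the payload in the POST body, like curl --data
r = requests.post(url, data='mydataexample', headers=headers)

# requests follows the redirect, so check where we actually ended up
if r.history and r.url.endswith('/error'):
    print('wrong data: redirected to the error page')
else:
    with open('test.html', 'wb') as f:
        f.write(r.content)  # like curl --output test.html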

Login to website via Python Requests

For a university project I am currently trying to log in to a website and scrape a little detail (a list of news articles) from my user profile.
I am new to Python, but I have done this before on some other website. My first two approaches deliver different HTTP errors. I have considered problems with the headers my request is sending, but my understanding of this site's login process appears to be insufficient.
This is the login page: http://seekingalpha.com/account/login
My first approach looks like this:
import requests

with requests.Session() as c:
    requestUrl = 'http://seekingalpha.com/account/orthodox_login'
    USERNAME = 'XXX'
    PASSWORD = 'XXX'
    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
    login_data = {
        "slugs[]": None,
        "rt": None,
        "user[url_source]": None,
        "user[location_source]": "orthodox_login",
        "user[email]": USERNAME,
        "user[password]": PASSWORD
    }
    c.post(requestUrl, data=login_data, headers={"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})
    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)
This results in "403 Forbidden"
My second approach looks like this:
from requests import Request, Session

requestUrl = 'http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
# c.get(requestUrl)
login_data = {
    "slugs[]": None,
    "rt": None,
    "user[url_source]": None,
    "user[location_source]": "orthodox_login",
    "user[email]": USERNAME,
    "user[password]": PASSWORD
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin": "http://seekingalpha.com",
    "referer": "http://seekingalpha.com/account/login",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",  # header values must be strings, not ints
    "user-agent": userAgent
}
s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)
prepped = s.prepare_request(req)
prepped.body = "slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"
resp = s.send(prepped)
print(resp.status_code)
In this approach I was trying to prepare the headers exactly as my browser would. Sorry for the redundancy. This results in HTTP error 400.
Does someone have an idea what went wrong? Probably a lot.
Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your cookies.
When you log in, there are usually cookies added to your request to identify you.
Your code will be like this:
import requests

response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part"
})
print(response.content)
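After logging in manually, you can read the cookie names and values for the site from your browser's developer tools (in Chrome, the Application tab under Cookies). Note that c_user and xs above are just example names from one particular site; yours will differ.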

Image URL does not return an image using Python requests

I use Python requests to get images, but in some cases it doesn't work, and it seems to be happening more often. An example is:
http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg
It loads fine in my browser, but using requests, it returns HTML that says "403 Forbidden" and "nginx/1.7.11".
import requests

image_url = "<the_url>"
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch'
}
r = requests.get(image_url, headers=headers)
# r.content is the HTML for '403 Forbidden', not an image
I have also tried with this header, which has been necessary in some cases. Same result.
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch'
}
(I had a similar question a few weeks ago, but that one turned out to be caused by particular image file types not being supported by PIL. This is different.)
EDIT: Based on comments:
It seems the link only works if you have already visited the original site, http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/, which embeds the image; I suppose the browser then uses the cached version. Is there any workaround?
The site is validating the Referer header. This prevents other sites from including the image in their web pages and using the image host's bandwidth. Set it to the site you mentioned in your post, and it will work.
More info:
https://en.wikipedia.org/wiki/HTTP_referer
import requests
image_url = "http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg"
headers = {
'User-agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding' : 'gzip,deflate,sdch',
'Referer' : 'http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/'
}
r = requests.get(image_url, headers=headers)
print(r)
For me, this prints
<Response [200]>
