I am not an expert with Python, but this is what I did with python-requests. I am trying to call this URL, which returns the email address of a user when given first_name, last_name, and domain:
https://dry-tor-58240.herokuapp.com
However, when I request it with Python I get a 200 response code, but when I convert response.text to a BeautifulSoup object I don't see the email address anywhere in it.
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
payload = {"first_name": "nandish", "last_name": "ajani", "domain": "atyantik.com"}

r = requests.get("https://dry-tor-58240.herokuapp.com/", headers=headers, params=payload)
soup = BeautifulSoup(r.text, 'lxml')
Can anyone let me know what is it that I am doing wrong?
It should be a POST request, sent to the /find endpoint. The response comes back as JSON, so I also used requests' .json() method:
import requests

request_url = 'https://dry-tor-58240.herokuapp.com/find'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
payload = {"first_name": "nandish", "last_name": "ajani", "domain": "atyantik.com"}

# POST the payload as JSON; the endpoint responds with JSON as well
jsonObj = requests.post(request_url, headers=headers, json=payload).json()
You can then print the email field from the parsed response:
print(jsonObj['email'])
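If the lookup ever fails, print the whole object to see what the API actually returned; a defensive .get() avoids a KeyError. A small sketch, assuming the same jsonObj as above:

# Inspect the full payload if the key lookup fails
print(jsonObj)

# Defensive lookup: returns None instead of raising KeyError
email = jsonObj.get('email')
if email is None:
    print('No email field in the response')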
I am trying to scrape a website using Beautiful Soup. While making the request, I need to send an Instagram user ID, which would normally be typed into a search box, and then scrape the response HTML. How should I send the user ID along with the request? Thanks.
import requests
from bs4 import BeautifulSoup
URL = "https://product.influencer.in/price-index"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'lxml')
print(soup.prettify())
It's a simple POST request to the API to get that data. You'll need to enter your email address in the value below:
import requests

instagramHandles = ['leomessi', 'cristiano']
URL = "https://team.influencer.in/api/v1/price-index/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

for instagramHandle in instagramHandles:
    payload = {
        'email': "name#email.com",
        'insta_handle': instagramHandle}
    jsonData = requests.post(URL, headers=headers, data=payload).json()
    cost_min = jsonData['data']['cost_min']
    cost_max = jsonData['data']['cost_max']
    print(f'{instagramHandle}: {cost_min} - {cost_max}')
Output:
leomessi: 5.4 Cr. - 6.5 Cr.
cristiano: 8.5 Cr. - 10.1 Cr.
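If a handle has no price data, the jsonData['data'] lookup will raise a KeyError. A guarded variant of the same loop (a sketch, reusing the URL and headers above and assuming the response shape shown) keeps it running:

for instagramHandle in instagramHandles:
    payload = {'email': "name#email.com", 'insta_handle': instagramHandle}
    jsonData = requests.post(URL, headers=headers, data=payload).json()
    # Fall back to an empty dict if 'data' is missing or null
    data = jsonData.get('data') or {}
    if 'cost_min' in data and 'cost_max' in data:
        print(f"{instagramHandle}: {data['cost_min']} - {data['cost_max']}")
    else:
        print(f'{instagramHandle}: no price data returned')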
I need to scrape info from an Instagram user page; specifically, I need to use this URL: "https://www.instagram.com/cristiano/?__a=1"
The problem is that I need to be logged in with my Instagram account to execute this script.
import json
from requests import get
from bs4 import BeautifulSoup

url_user = "https://www.instagram.com/cristiano/?__a=1"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43'}

response = get(url_user, headers=headers)
print(response)

soup = BeautifulSoup(response.text, 'html.parser')
jsondata = json.loads(str(soup))
I get this error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I avoid this problem so that I can scrape the info and access the data?
Thank you
Adding the __a=1 parameter gets you a JSON response, so you do not need to go through BeautifulSoup; you can simply load the JSON directly:
response = get(url_user, headers=headers)
jsondata = json.loads(response.text)
Alternatively, you can use the response's json() method to load it:
response = get(url_user, headers=headers)
jsondata = response.json()
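One caveat: if Instagram redirects an anonymous client to its HTML login page, the body will not be JSON and .json() raises the same JSONDecodeError. Checking the status and content type first makes that failure explicit (a minimal sketch under that assumption):

response = get(url_user, headers=headers)

# Instagram may answer with an HTML login page instead of JSON,
# so verify the content type before parsing
if response.ok and 'application/json' in response.headers.get('Content-Type', ''):
    jsondata = response.json()
else:
    print('Did not receive JSON, status:', response.status_code)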
I'm trying to get data from a JSON link, but I'm getting this error: TypeError: can't concat str to bytes
This is my code:
l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l+".json"+"?porcoiddio")
req = urllib.request.Request(url, headers)
response = urllib.request.urlopen(req)
size_opts = json.loads(response.decode('utf-8'))['available_sizes']
How can I solve this error?
The direct answer to your question is to change that last line to:
size_opts = json.loads(response.read().decode('utf-8'))['available_sizes']
Edit (2018-10-02 22:55): I ran your code and got a 503 response; the reason you get the 503 is that the request does not contain cookies:
req = urllib.request.Request(url, headers=headers)
You have to update your headers so they include cookies:
headers.update({"Cookie": cookie_value})
req = urllib.request.Request(url, headers=headers)  # the headers must include cookies
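If you don't want to copy a cookie value by hand, urllib can manage cookies across requests itself. A sketch, assuming the site sets its cookies when the plain product page is fetched first:

import urllib.request
from http.cookiejar import CookieJar

l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = l + ".json" + "?porcoiddio"

# An opener with a cookie jar stores any Set-Cookie header from the first
# response and sends it back automatically on later requests
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.open(l)               # first visit collects the cookies
response = opener.open(url)  # cookies are sent along automatically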
You are providing the data argument by mistake. You'll have to pass headers as a keyword argument; otherwise the second positional parameter gets filled, and that parameter happens to be data. Try this:
req = urllib.request.Request(url, headers=headers)
See https://docs.python.org/3/library/urllib.request.html#urllib.request.Request for the documentation of Request's signature.
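Putting the two fixes together, a corrected version of the original snippet (a sketch; the minimal headers are an assumption) passes headers by keyword and reads the body before decoding:

import json
import urllib.request

l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = l + ".json" + "?porcoiddio"
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed minimal headers

# headers must be a keyword argument; the second positional parameter is data
req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)

# urlopen returns a file-like object: read() the bytes, then decode them
size_opts = json.loads(response.read().decode('utf-8'))['available_sizes']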
You could have a go with requests instead:
import requests

l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l + ".json" + "?porcoiddio")

session = requests.Session()
# Retry transient failures; the URL is https, so mount the adapter on https://
session.mount('https://', requests.adapters.HTTPAdapter(max_retries=10))

size_opts = session.get(url, headers={'Referer': 'off---white.com/it/IT/login', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}).json()['available_sizes']
To check the response:
size_opts = session.get(url, headers= {'Referer': 'off---white.com/it/IT/login', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'})
print(size_opts)
Gives
<Response [503]>
This response means: "503 Service Unavailable. The server is currently unable to handle the request due to a temporary overload or scheduled maintenance"
I would suggest the problem isn't the code but the server?
I am trying to write code that sends a POST request to a website and reads the result.
The POST request has three form-data parameters: d, n, and q.
I have tried the following code but always get an error.
import requests

url = 'http://www.kloth.net/services/nslookup.php'
payload = {'d': 'google.com', 'n': 'localhost', 'q': 'SOA'}

session = requests.Session()
session.post(url, headers=headers, data=payload)
Can you help me fix this issue?
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
           'Host': 'www.kloth.net',
           'Origin': 'http://www.kloth.net',
           'Proxy-Connection': 'keep-alive',
           'Referer': 'http://www.kloth.net/services/nslookup.php'}
payload = {'d': 'google.com',
           'n': 'localhost',
           'q': 'SOA'}

session = requests.Session()
# 'return' is a reserved word in Python, so store the result under another name
response = session.post('http://www.kloth.net/services/nslookup.php', data=payload, headers=headers)
print(response.content)
You didn't specify your headers.
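In practice the endpoint may not need every one of those headers; a minimal variant with just a User-Agent (an assumption, untested against the site) is often enough:

import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # assumed minimal headers
payload = {'d': 'google.com', 'n': 'localhost', 'q': 'SOA'}

response = requests.post('http://www.kloth.net/services/nslookup.php',
                         data=payload, headers=headers)
print(response.status_code)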
I would like to get the response data from a specific website.
I have this site:
https://enjoy.eni.com/it/milano/map/
and if I open the browser debugger console I can see a POST request that gives a JSON response.
How can I capture this response in Python when scraping the website?
Thanks
Apparently the web service validates a PHPSESSID cookie, so we need to obtain it first using a proper user agent:
import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}

# First request: the map page sets the PHPSESSID cookie
r = requests.get('https://enjoy.eni.com/it/milano/map/', headers=headers)
session_id = r.cookies['PHPSESSID']

# Send the session id back as a cookie on the AJAX endpoint
headers['Cookie'] = 'PHPSESSID={};'.format(session_id)
res = requests.post('https://enjoy.eni.com/ajax/retrieve_vehicles', headers=headers, allow_redirects=False)
json_obj = json.loads(res.content)
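A requests.Session would handle the cookie hand-off automatically, so the PHPSESSID never has to be copied into the headers by hand. An equivalent sketch of the same flow:

import requests

# The session stores the PHPSESSID cookie from the first response and
# sends it back on the POST without manual header juggling
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    s.get('https://enjoy.eni.com/it/milano/map/')
    res = s.post('https://enjoy.eni.com/ajax/retrieve_vehicles', allow_redirects=False)
    json_obj = res.json()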