Scrapy: trying to get a JSON response - python

I am using a scraper to scrape the Steam gaming platform, and I am having trouble with pagination. The comment thread at this link: https://steamcommunity.com/sharedfiles/filedetails/comments/2460661464
uses pagination, and I believe it makes a POST request to some server. I would like to simulate this request using Scrapy's FormRequest and get all of the comments at once, but I don't know how to do this. What should my headers and formdata look like? Currently they look like this:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Host': 'steamcommunity.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0'
}
data = {
    "start": "0",
    "totalcount": comment_number,
    "count": comment_number,
    "sessionid": "d880ab2338b70926db0a9591",
    "extended_data": "{\"contributors\":[\"{contributor_id}\",{}],\"appid\":289070,\"sharedfile\":{\"m_parentsDetails\":null,\"m_parentBundlesDetails\":null,\"m_bundledChildren\":[],\"m_ownedBundledItems\":[]},\"parent_item_reported\":false}",
    "feature2": "-1"
}
yield FormRequest(
    url,
    formdata=data,
    headers=headers,
    callback=self.parse_paginated_comments,
    dont_filter=True,
    meta={'app_id': app_id, 'game': game, 'workshop_id': workshop_id, 'workshop_name': workshop_name},
)
What are the correct headers/data and how do I set up my FormRequest to get all of the comments (in this case 1-134)?

I don't know anything about Scrapy, but here's how you could do it using just basic requests and BeautifulSoup.
The API doesn't seem to be very strict about the POSTed payload, and it doesn't mind if some parameters are omitted. I've found that you can assign an impossibly large number to the count parameter to make the API return all comments in one response (assuming there will never be more than 99999999 comments in a thread, in this case). I haven't played around with the request headers much - you could probably trim them down even further.
def get_comments(thread_id):
    import requests
    from bs4 import BeautifulSoup as Soup

    url = "https://steamcommunity.com/comment/PublishedFile_Public/render/76561198401810552/{}/".format(thread_id)
    headers = {
        "Accept": "text/javascript, text/html, application/xml, text/xml, */*",
        "Accept-Encoding": "gzip, deflate",
        "Content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest"
    }
    payload = {
        "start": "0",
        "count": "99999999",
    }

    def to_clean_comment(element):
        return element.text.strip()

    response = requests.post(url, headers=headers, data=payload)
    response.raise_for_status()
    soup = Soup(response.json()["comments_html"], "html.parser")
    yield from map(to_clean_comment, soup.select("div.commentthread_comment_text"))

def main():
    for comment in get_comments("2460661464"):
        print(comment)
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
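If you'd rather keep everything in Scrapy, the same trimmed-down POST should translate directly to a FormRequest. Here's a minimal, untested sketch (the spider boilerplate is mine; the endpoint URL, payload, and CSS class are taken from the requests version above):
import json
import scrapy
from scrapy import FormRequest

class SteamCommentsSpider(scrapy.Spider):
    name = "steam_comments"

    def start_requests(self):
        thread_id = "2460661464"
        url = ("https://steamcommunity.com/comment/PublishedFile_Public/"
               "render/76561198401810552/{}/".format(thread_id))
        yield FormRequest(
            url,
            formdata={"start": "0", "count": "99999999"},
            headers={"X-Requested-With": "XMLHttpRequest"},
            callback=self.parse_comments,
        )

    def parse_comments(self, response):
        # The endpoint returns JSON; the rendered comments sit in "comments_html".
        html = json.loads(response.text)["comments_html"]
        for div in scrapy.Selector(text=html).css("div.commentthread_comment_text"):
            # string(.) concatenates all text nodes inside the comment div
            yield {"comment": div.xpath("string(.)").get().strip()}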

Related

How can I webscrape the website investing.com, which receives data from an external API?

I am trying to webscrape the following website: https://www.investing.com/economic-calendar/.
Through research, I know there are two methods to scrape this website:
requests
selenium
This website receives data from an external API:
https://www.investing.com/economic-calendar/Service/getCalendarFilteredData
With requests, the website sends additional information about the API call in the payload; for example, if I want the economic data from January 1st till January 2nd, the website sends these two dates with the payload.
I tried to make an API call to that link, but I got a 403 error code. After researching, I found out that my headers were incorrect, but I don't know how to fix them.
Can you explain how to webscrape this website?
I have the feeling that this website is protected by Cloudflare or something like that.
My code is below.
import json
import datetime
import requests
from bs4 import BeautifulSoup

def Get_calendar_data():
    headers = {
        'authority': 'www.investing.com',
        'method': 'POST',
        'path': '/economic-calendar/Service/getCalendarFilteredData',
        'scheme': 'https',
        'accept': '*/*',
        # 'accept-encoding': 'gzip, deflate, br',
        # 'accept-language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7',
        'content-length': '439',
        'content-type': 'application/x-www-form-urlencoded',
        'origin': 'https://www.investing.com',
        'Referer': 'https://www.investing.com/economic-calendar/',
        # 'sec-ch-ua': '"Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
        'sec-ch-ua-platform': 'Windows',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
        'x-requested-with': 'XMLHttpRequest',
    }
    url = 'https://www.investing.com/economic-calendar/Service/getCalendarFilteredData'
    body = {
        'dateFrom': '2023-02-07',
        'dateTo': '2023-02-08'
    }
    with requests.session() as r:
        s = r.post(url, data=body, headers=headers)
        print(s)

Get_calendar_data()
fetch("https://www.investing.com/economic-
calendar/Service/getCalendarFilteredData", {
"headers": {
"accept": "*/*",
"content-type": "application/x-www-form-urlencoded",
"sec-ch-ua": "\"Not_A Brand\";v=\"99\", \"Google Chrome\";v=\"109\", \"Chromium\";v=\"109\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"x-requested-with": "XMLHttpRequest"
},
"referrer": "https://www.investing.com/economic-calendar/",
"referrerPolicy": "strict-origin-when-cross-origin",
"body": "country%5B%5D=25&country%5B%5D=32&country%5B%5D=6&country%5B%5D=37&country%5B%5D=72&country%5B%5D=22&country%5B%5D=17&country%5B%5D=39&country%5B%5D=14&country%5B%5D=10&country%5B%5D=35&country%5B%5D=43&country%5B%5D=56&country%5B%5D=36&country%5B%5D=110&country%5B%5D=11&country%5B%5D=26&country%5B%5D=12&country%5B%5D=4&country%5B%5D=5&dateFrom=2023-03-01&dateTo=2023-03-09&timeZone=8&timeFilter=timeRemain&currentTab=custom&limit_from=0",
"method": "POST",
"mode": "cors",
"credentials": "omit"
}); ;
Assuming you're already providing valid headers, cookies and POST data:
As a Cloudflare check shows, the website does seem to be protected by Cloudflare.
I ran into this issue some time ago as well, and the only workaround for me was running JavaScript in a browser.
You can use Selenium or Selenium-Profiles with driver.execute_async_script("some_js") for that. An advantage of Selenium-Profiles is that it is mostly undetected and already includes a function for making single requests.
With JavaScript, it would look something like this:
fetch("https://www.investing.com/economic-calendar/Service/getCalendarFilteredData", {
"headers": {
"accept": "*/*",
"content-type": "application/x-www-form-urlencoded",
"sec-ch-ua": "\"Not_A Brand\";v=\"99\", \"Google Chrome\";v=\"109\", \"Chromium\";v=\"109\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"x-requested-with": "XMLHttpRequest"
},
"referrer": "https://www.investing.com/economic-calendar/",
"referrerPolicy": "strict-origin-when-cross-origin",
"body": "country%5B%5D=25&country%5B%5D=32&country%5B%5D=6&country%5B%5D=37&country%5B%5D=72&country%5B%5D=22&country%5B%5D=17&country%5B%5D=39&country%5B%5D=14&country%5B%5D=10&country%5B%5D=35&country%5B%5D=43&country%5B%5D=56&country%5B%5D=36&country%5B%5D=110&country%5B%5D=11&country%5B%5D=26&country%5B%5D=12&country%5B%5D=4&country%5B%5D=5&dateFrom=2023-03-01&dateTo=2023-03-09&timeZone=8&timeFilter=timeRemain&currentTab=custom&limit_from=0",
"method": "POST",
"mode": "cors",
"credentials": "omit"
})
Note that this is JavaScript, NOT Python.
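If you go the plain-Selenium route, wiring that fetch into driver.execute_async_script would look roughly like this. This is a sketch under assumptions (chromedriver is installed, and I've shortened the POST body to just the date/filter parameters):
from selenium import webdriver

driver = webdriver.Chrome()
# Load the real page first so the fetch runs in page context, past Cloudflare.
driver.get("https://www.investing.com/economic-calendar/")

js = """
const done = arguments[arguments.length - 1];  // Selenium's async callback
fetch("https://www.investing.com/economic-calendar/Service/getCalendarFilteredData", {
    method: "POST",
    headers: {
        "content-type": "application/x-www-form-urlencoded",
        "x-requested-with": "XMLHttpRequest"
    },
    body: "dateFrom=2023-03-01&dateTo=2023-03-09&timeZone=8&timeFilter=timeRemain&currentTab=custom&limit_from=0"
}).then(r => r.text()).then(done);
"""
result = driver.execute_async_script(js)
print(result)
driver.quit()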

How to login into Instagram using Python Requests?

I am using the following code to make a Python request to log into my Instagram account. I am running this locally.
import requests
from datetime import datetime
import re
from pprint import pprint
import json

time = int(datetime.now().timestamp())
link = 'https://www.instagram.com/accounts/login/'
login_url = "https://www.instagram.com/accounts/login/ajax/"
payload = {
    'username': 'username',
    'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{time}:password',
    'queryParams': "{}",
    'optIntoOneTap': 'false',
    'stopDeletionNonce': "",
    'trustedDeviceRecords': "{}"
}
response = requests.get(link)
csrf = response.cookies['csrftoken']
print(csrf)
response = requests.post(login_url, data=payload, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://www.instagram.com/",
    "X-CSRFToken": csrf,
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "www.instagram.com",
    "Origin": "https://www.instagram.com"
})
response_json = json.loads(response.text)
pprint(response_json)
The response I receive after running the above code shows that my request is not authenticated:
{'authenticated': False, 'status': 'ok', 'user': True}
How can I log in to Instagram using requests? Is there an updated method?
In general, these use cases lend themselves perfectly to Selenium, Scrapy, Playwright or Puppeteer. I don't have an Instagram account, so I can't verify this works, but in theory it might return a valid response:
import requests

cookies = {
    'csrftoken': '9e7U8qRNqAbazRC0kwrRgyN2okh1kihx',
    'mid': 'YsM1_AALAAEG2fGCvkPXE5DVlJD0',
    'ig_did': '494394E2-A583-4F01-BC32-5E4344FE2C4D',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'X-CSRFToken': '9e7U8qRNqAbazRC0kwrRgyN2okh1kihx',
    'X-Instagram-AJAX': 'c6412f1b1b7b',
    'X-IG-App-ID': '936619743392459',
    'X-ASBD-ID': '198387',
    'X-IG-WWW-Claim': '0',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'https://www.instagram.com',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Referer': 'https://www.instagram.com/accounts/login/?',
    # Requests sorts cookies= alphabetically
    # 'Cookie': 'csrftoken=9e7U8qRNqAbazRC0kwrRgyN2okh1kihx; mid=YsM1_AALAAEG2fGCvkPXE5DVlJD0; ig_did=494394E2-A583-4F01-BC32-5E4344FE2C4D',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    # Requests doesn't support trailers
    # 'TE': 'trailers',
}
data = {
    'enc_password': '#PWD_INSTAGRAM_BROWSER:10:1656960533:ARxQAMMwb3Yd6w3UdaFGt0Q3mTZ7lMDJHHmZFLaGEfQahXJOTxqb35Q/ZGC3B70DxZRhKcnaf3xImXyL7EFseRF/yZG4dvauui/LzLU7oAHK3rYHSYsjPjQTham/5DFXq6m4foqB5fIiJoChT+ng58EDUkFA1A==',
    'username': 'somerandomemail@hotmail.com',
    'queryParams': '{}',
    'optIntoOneTap': 'false',
    'stopDeletionNonce': '',
    'trustedDeviceRecords': '{}',
}
response = requests.post('https://www.instagram.com/accounts/login/ajax/', cookies=cookies, headers=headers, data=data)
If you run into security issues, try the same but with cloudscraper instead of the requests library.
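For reference, a minimal sketch of that swap (reusing the cookies/headers/data dicts from above); cloudscraper exposes a requests-compatible session:
import cloudscraper  # pip install cloudscraper

# create_scraper() returns a requests.Session subclass that transparently
# solves Cloudflare's JavaScript challenges.
scraper = cloudscraper.create_scraper()
response = scraper.post('https://www.instagram.com/accounts/login/ajax/',
                        cookies=cookies, headers=headers, data=data)
print(response.status_code)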
I don't think you can access Instagram with only requests, as far as I know.
Last time I tried, I had to create an app within the Facebook developer account and generate an access token from the Facebook / Instagram Graph API to access Instagram and handle the login. With that, you can not only log in to your account but also post content from it.
Long story short, refer to the Instagram Graph API and this should get your job done!
Edit:
import facebook  # the facebook-sdk package provides facebook.GraphAPI

# Sharing on Instagram...
insta = facebook.GraphAPI(facebookUserAccessToken)
InstaSend = insta.put_photo(open(IMAGEPATH, 'rb'), message=TEXT)
if InstaSend:
    print('\nInstagram Share Successful!')

# Sharing on Facebook...
face = facebook.GraphAPI(facebookPageAccessToken)
face.put_object(
    parent_object=facebookPageID,
    connection_name="feed",
    message=TEXT,
)
faceSend = face.put_photo(open(IMAGEPATH, 'rb'), message=TEXT)
if faceSend:
    print('\nFacebook Share Successful!')
The code above dates back to 2019; I wrote it to automatically share content on different social media platforms once a video was published on my YouTube channel. I haven't used it since, and I doubt it will work for you as-is. Some of it might need to be changed, as the Graph API is actively updated by Meta. However, I believe the process has gotten easier.
Additionally, you can check out "Justin Stolpe" on YouTube for more insights on this topic.

How to get an h3 tag with a class in web scraping with Python

I want to scrape the text of an h3 with a class, as shown in the attached photo.
I modified the code based on the posted recommendation:
import requests
import urllib

session = requests.session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
    'Accept': '*/*',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Content-Type': 'application/json',
    'Origin': 'https://auth.fool.com',
    'Connection': 'keep-alive',
})

response1 = session.get("https://www.fool.com/secure/login.aspx")
assert response1
response1.cookies
# <RequestsCookieJar[Cookie(version=0, name='_csrf', value='8PrzU3pSVQ12xoLeq2y7TuE1', port=None, port_specified=False, domain='auth.fool.com', domain_specified=False, domain_initial_dot=False, path='/usernamepassword/login', path_specified=True, secure=True, expires=1609597114, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)]>

params = urllib.parse.parse_qs(response1.url)
params
payload = {
    "client_id": params["client"][0],
    "redirect_uri": "https://www.fool.com/premium/auth/callback/",
    "tenant": "fool",
    "response_type": "code",
    "scope": "openid email profile",
    "state": params["https://auth.fool.com/login?state"][0],
    "_intstate": "deprecated",
    "nonce": params["nonce"][0],
    "password": "XXX",
    "connection": "TMF-Reg-API",
    "username": "XXX",
}
formatted_payload = "{" + ",".join([f'"{key}":"{value}"' for key, value in payload.items()]) + "}"
url = "https://auth.fool.com/usernamepassword/login"
response2 = session.post(url, data=formatted_payload)
response2.cookies
# <RequestsCookieJar[]>
response2.cookies is empty, so it seems that the login fails.
I can only give you some partial advice, but you might be able to find the "last missing piece" yourself (I have no access to the premium content of your target page). It's correct that you need to log in first in order to get the content.
What's usually useful is using a session that handles cookies. A proper header also often does the trick:
import requests
import urllib

session = requests.session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
    'Accept': '*/*',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Content-Type': 'application/json',
    'Origin': 'https://auth.fool.com',
    'Connection': 'keep-alive',
})
Next we get some cookies for our session from the "official" login page:
response = session.get("https://www.fool.com/secure/login.aspx")
assert response
We will use some of the response URL's parameters (yes, there are a couple of redirects) to build a valid payload for the actual login:
params = urllib.parse.parse_qs(response.url)
params
payload = {
    "client_id": params["client"][0],
    "redirect_uri": "https://www.fool.com/premium/auth/callback/",
    "tenant": "fool",
    "response_type": "code",
    "scope": "openid email profile",
    "state": params["https://auth.fool.com/login?state"][0],
    "_intstate": "deprecated",
    "nonce": params["nonce"][0],
    "password": "#pas$w0яδ",
    "connection": "TMF-Reg-API",
    "username": "seralouk@stackoverflow.com",
}
formatted_payload = "{" + ",".join([f'"{key}":"{value}"' for key, value in payload.items()]) + "}"
Finally, we can log in:
url = "https://auth.fool.com/usernamepassword/login"
response = session.post(url, data=formatted_payload)
Let me know if you are able to log in or if we need to tweak the script. Some general comments: I normally use an incognito tab to inspect the browser requests, then copy them over to Postman, where I play around with the parameters and see how they influence the HTTP response.
I rarely use Selenium; I'd rather invest the time to build a proper request to be used with Python's requests library, and then use BeautifulSoup.
Edit:
After logging in, you can use BeautifulSoup to parse the content of the actual site:
# add BeautifulSoup to our project
from bs4 import BeautifulSoup
# use the session with the login cookies to fetch the data
the_url = "https://www.fool.com/premium/stock-advisor/coverage/tags/buy-recommendation"
data = BeautifulSoup(session.get(the_url).text, 'html.parser')
my_h3 = data.find("h3", "content-item-headline")
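And to read the headline text out of that tag, a short follow-up (assuming the login worked; find() returns None when the element is missing):
# Guard against None: the headline won't be there if the login failed.
if my_h3 is not None:
    print(my_h3.get_text(strip=True))
else:
    print("h3.content-item-headline not found - check that the login succeeded")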

What is the correct header/payload for this voucher site in python requests?

I'm setting up a Python script that checks if voucher codes are still valid.
(The site is: "https://www.lieferando.de/checkVoucher.php")
It works in Postman and ARC, but I can't get it to work with Python Requests. I also tried the "create code" function from Postman, but it's still not working.
url = "https://www.lieferando.de/checkVoucher.php"
payload = {'vouchercode': "TRF5RCF6VRLZ7552"}
headers = {
'vouchercode': "TRF5RCF6VRLZ7552",
'Content-Type': "application/x-www-form-urlencoded",
'User-Agent': "PostmanRuntime/7.11.0",
'Accept': "*/*",
'Cache-Control': "no-cache",
'Postman-Token': "143f10f9-4bfc-4bfe-9cb9-ae4159118c7c,14eebeb3-f79b-4dea-9279-328e5dad1850",
'Host': "www.lieferando.de",
'cookie': "visid_incap_1716123=fad1eRraQbSyEro92B7ouuB0y1wAAAAAQUIPAAAAAAAhvXPqviZx2wjoycs1g4Fc; incap_ses_727_1716123=+tNFCxebHDoMdSkWn9MWCljCy1wAAAAAzwDNwJi0+rHL/bgMW1zj3Q==; incap_ses_184_1716123=geD7AxnPrHLB4TighrSNAnuFy1wAAAAAFTCb2kBj03wyR2BVXlobyg==; incap_ses_876_1716123=tlZZBSxfnSPJPB4gFi4oDI6Ly1wAAAAAWxnq9RAJRBvFTuNF7EhDEw==; incap_ses_730_1716123=JW8oXiBsrk8SYz8T/3shCmCRy1wAAAAApG2tibhMTuqnZBYjb+JDGg==; incap_ses_536_1716123=GY3ddNoWphYa0bcoG0JwB+mXy1wAAAAAxqvjmrYrd4ZqhbHGH418eQ==; nlbi_1716123=4oBPV9c8liHrbOgrX9BzAQAAAADFGnUou8G0vVD66E07GFpV; incap_ses_246_1716123=Oka1Xjj8WAEkqd1TwPdpA/qly1wAAAAAWjqXqiPrP3pj1mpDS572Lg==; incap_ses_108_1716123=madBJ0JEly173VQl8LN/Ab+1y1wAAAAAzTICVw2c/Vk5RibweBnRHQ==; incap_ses_877_1716123=atGOOty1yBkTqVcPrLsrDG+KzFwAAAAAtCkMsl02gWsI0TCmJVWhjQ==; PHPSESSID=j812qmhlang0kvh8rfdulhkm56",
'accept-encoding': "gzip, deflate",
'content-length': "1376",
'Connection': "keep-alive",
'cache-control': "no-cache"
}
response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)
The server should respond with
{"basketResponse":null,"status":"error","value":"Alle Gutscheine mit diesem Gutscheincode wurden bereits verwendet. Es sind keine Gutscheine verf\u00fcgbar und somit ist der Code nicht mehr g\u00fcltig."}
But instead it responds with
{"basketResponse":null,"status":"error","value":"Bitte gib den Gutscheincode ein","markfields":["ivouchercode"]}
So you have a couple of problems in your code:
You should send the payload as a JSON string (replace data= with json=)
The headers you are using aren't correct
You should include cookies in your POST request (you can handle them automatically using requests.session())
All in all, your code should look something like this:
import requests

session = requests.session()
url = "https://www.lieferando.de/checkVoucher.php"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
# Visit an order page first so the session picks up the required cookies.
session.request("GET", "https://www.lieferando.de/bestellung-abschliessen-miran-pizza-doener", headers=headers)
payload = {'vouchercode': "TRF5RCF6VRLZ7552"}
response = session.request("POST", url, json=payload, headers=headers)
print(response.json())
(PS: response.json() parses the JSON response body into a Python object, which makes it easier to work with. If that's not needed, you can use .text instead.)
Hope this helps

How could I send two consecutive requests including redirecting

I tried to use Python requests to mimic the search function in the browser.
However, it's not as simple as other basic requests.
I opened developer mode in the Chrome browser, copied the two requests as cURL, and converted them into Python requests form.
I can only get a 500 error via Python, but I get the correct response in the browser.
My current code only returns a 500 error:
cookies = {
    'optimizelyEndUserId': 'oeu1454030467608r0.5841516454238445',
    # ~~~ (more cookies omitted)
    '_gat': '1',
}
headers = {
    'Origin': 'https://m.flyscoot.com',
    # ~~~~ (more headers omitted)
}
data = 'origin=KHH&destination=KIX&departureDate=20160309&returnDate=&roundTrip=false&adults=1&children=0&infants=0&promoCode='
req = requests.session()
resp_1 = req.post('https://m.flyscoot.com/search', headers=headers, cookies=cookies, data=data)
headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    # ~~~~ (more headers omitted)
}
# Because the first request gets redirected to an unknown status, I copied the
# first response's set-cookie values for the second request to use.
resp_2 = req.get('https://m.flyscoot.com/select', headers=headers, cookies=resp_1.history[0].cookies)
It seems that's the mobile URL. Most likely you should set a web user agent. Try this (Python 3):
import urllib
import requests

FF_USER_AGENT = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:21.0.0) '
                  'Gecko/20121011 Firefox/21.0.0',
    "Origin": "http://makeabooking.flyscoot.com",
    "Referer": "http://makeabooking.flyscoot.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
}
req = requests.session()
resp_1 = req.get('http://makeabooking.flyscoot.com/', headers=FF_USER_AGENT)

# form urlencoded data
raw_data = (
    "availabilitySearch.SearchInfo.SearchStations%5B0%5D.DepartureStationCode"
    "=ADL"
    "&availabilitySearch.SearchInfo.SearchStations%5B0%5D.ArrivalStationCode"
    "=SIN"
    "&availabilitySearch.SearchInfo.SearchStations%5B0%5D.DepartureDate=2%2F17"
    "%2F2016&availabilitySearch.SearchInfo.SearchStations%5B1%5D"
    ".DepartureStationCode=SIN&availabilitySearch.SearchInfo.SearchStations%5B1"
    "%5D.ArrivalStationCode=ADL&availabilitySearch.SearchInfo.SearchStations"
    "%5B1"
    "%5D.DepartureDate=3%2F17%2F2016&availabilitySearch.SearchInfo.Direction"
    "=Return&Singapore+%28SIN%29=Singapore+%28SIN%29&availabilitySearch"
    ".SearchInfo.AdultCount=1&availabilitySearch.SearchInfo.ChildrenCount=0"
    "&availabilitySearch.SearchInfo.InfantCount=0&availabilitySearch.SearchInfo"
    ".PromoCode=")
dict_data = dict(urllib.parse.parse_qsl(raw_data))
final = req.post('http://makeabooking.flyscoot.com/',
                 headers=FF_USER_AGENT,
                 data=dict_data)
print(final.status_code)
print(final.url)
[MOBILE Version]
import urllib
import requests

# debug request
import http.client
http.client.HTTPConnection.debuglevel = 1
import logging
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

FF_USER_AGENT = {
    'User-Agent': "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4",
    "Origin": "https://m.flyscoot.com",
    "Referer": "https://m.flyscoot.com/search",
    "Host": "m.flyscoot.com",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip,deflate",
    "Accept-Language": "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
}

req = requests.session()
resp_1 = req.get('https://m.flyscoot.com', headers=FF_USER_AGENT)

# form urlencoded data
raw_data = (
    "origin=MEL&destination=CAN&departureDate=20160220&returnDate=20160227&roundTrip=true&adults=1&children=0&infants=0&promoCode=")
dict_data = dict(urllib.parse.parse_qsl(raw_data))
final = req.post('https://m.flyscoot.com/search',
                 headers=FF_USER_AGENT,
                 data=dict_data)
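From there, completing the two-request flow from the question should just be a matter of reusing the same session, which already carries the cookies set by the search POST. A minimal sketch, assuming /select is the page the search redirects to (as in the question):
# No manual cookie copying from resp_1.history needed: the session
# sends its accumulated cookies automatically.
resp_2 = req.get('https://m.flyscoot.com/select', headers=FF_USER_AGENT)
print(resp_2.status_code)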
