Python HTTP Header Content-Type boundary - python

Here is my code:
headers={
'Host': 'cafe.upphoto.naver.com',
'Content-Length': '879990',
'Accept': '*/*',
'Origin': 'http://cafe.upphoto.naver.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
'Content-Type':content,
'Content-Type': 'multipart/form-data;',# boundary=----WebKitFormBoundary3oLjjtLvU7AzQqTF',
'Referer': write,
'Accept-Language': 'ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4',
}
files = {'image':('test.jpg',open('C:\\Users\\Public\\Pictures\\Sample Pictures\\test.jpg','rb'),'Content-Type: image/jpeg'),'filename':(None,'test.jpg'),'autorotate':(None,'true'),'extractAnimatedCnt':(None,'true'),'userId':(None,'beg1995')}
resp=self.post(url2+'upload/0',files=files,headers=headers)
When you run this code, the following packet is created:
POST http://cafe.upphoto.naver.com/MjAxNzA3MDcwMTExNDAHMTQ5OTM1ODQzNjkyNwdjYWZlMgdiZWcxOTk1BzAHMgdhODA1MzhiZmMyMGMyYTFlYTlhODE1NGY5OTc1ZDRkZA/upload/0 HTTP/1.1
Host: cafe.upphoto.naver.com
Proxy-Connection: keep-alive
Content-Length: 879990
Accept: */*
Origin: http://cafe.upphoto.naver.com
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary9xhUsyQOPYJrPr3R
Referer: http://cafe.upphoto.naver.com/MjAxNzA3MDcwMTExNDAHMTQ5OTM1ODQzNjkyNwdjYWZlMgdiZWcxOTk1BzAHMgdhODA1MzhiZmMyMGMyYTFlYTlhODE1NGY5OTc1ZDRkZA/startup?mode=base&width=960
Accept-Language: ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4
--e2f306a6b5a3485fb70bc2f7f1af2e9a
Content-Disposition: form-data; name="image"; filename="test.jpg"
Content-Type: Content-Type: image/jpeg
ÿØÿà
Content-Disposition: form-data; name="filename"
test.jpg
--e2f306a6b5a3485fb70bc2f7f1af2e9a
Content-Disposition: form-data; name="autorotate"
true
--e2f306a6b5a3485fb70bc2f7f1af2e9a
Content-Disposition: form-data; name="extractAnimatedCnt"
true
--e2f306a6b5a3485fb70bc2f7f1af2e9a
Content-Disposition: form-data; name="userId"
beg1995
--e2f306a6b5a3485fb70bc2f7f1af2e9a-
Note that the boundary set in the Content-Type header and the boundary actually used in the body are different.
What is causing this?

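I assume you are using the requests library. When you pass files=, requests builds the multipart body itself and generates a fresh boundary for every request, so a boundary hard-coded in your own Content-Type header will never match the one in the body. Leave Content-Type and Content-Length out of your headers dict and let requests set both. Also, the third element of the file tuple should be just the MIME type ('image/jpeg'), not 'Content-Type: image/jpeg'; that is why the part header in your packet reads Content-Type: Content-Type: image/jpeg.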
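A minimal sketch of this behavior, assuming the requests library. The URL and the in-memory file contents are placeholders, not the asker's real values. The request is only prepared, not sent, so the generated header and body can be compared:

```python
import io
import requests

# Placeholder URL and fake image bytes (assumptions for illustration only).
url = "http://example.invalid/upload/0"
fake_image = io.BytesIO(b"\xff\xd8\xff\xe0 not a real image")

files = {
    # (filename, file object, MIME type): the MIME type has no "Content-Type:" prefix
    "image": ("test.jpg", fake_image, "image/jpeg"),
    # (None, value) sends a plain form field without a filename
    "filename": (None, "test.jpg"),
    "autorotate": (None, "true"),
}

# No Content-Type or Content-Length here: requests fills in both itself.
headers = {"Accept": "*/*"}

prepared = requests.Request("POST", url, files=files, headers=headers).prepare()

content_type = prepared.headers["Content-Type"]
boundary = content_type.split("boundary=", 1)[1]

print(content_type)
# The body starts with the same boundary that the header announces.
print(prepared.body.startswith(b"--" + boundary.encode("ascii")))
```

If the server really does require a specific boundary string, the body would have to be encoded by hand; for an ordinary upload the server only needs the header and the body to agree, which they do here.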

Related

Making the right post request

I need your help in putting together a post request.
The output I get is html, but the plan was to get the following:
Below are all the data for the desired item:
General
Request URL: https://dgslivebetting.betonline.ag/ngwbet.aspx/gvFrameHtml
Request Method: POST
Status Code: 200
Remote Address: 104.17.64.19:443
Referrer Policy: strict-origin-when-cross-origin
Response Headers
cache-control: no-cache
cf-cache-status: DYNAMIC
cf-ray: 76800ae95afc35b3-DME
content-encoding: br
content-type: application/json; charset=utf-8
date: Thu, 10 Nov 2022 16:07:42 GMT
expires: -1
pragma: no-cache
server: cloudflare
set-cookie: server_persistent=!zk3OrErnBetHZkiKJcby5Il79pzHsf7dxKD0PcVuB54Z2dznuEbqgGAVDWLDvoqpVSDnVq+Jtf91LHo=; path=/; Httponly; Secure
x-newrelic-app-data: PxQFUFRTDQMHR1NRBQkOVVABDhFORDQHUjZKA1ZLVVFHDFYPHjZWADdTRRcPAF0cXgMWAFJFaAcXQU4cBRAlEFEPXSpMVVgQH1UXUR1RHVBUAA9QVloUHgFIQ1YCAg9fAAgFAFZXUFYDUQBAFF5VXkAAZA==
Request Headers
:authority: dgslivebetting.betonline.ag
:method: POST
:path: /ngwbet.aspx/gvFrameHtml
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
content-length: 12
content-type: application/json; charset=UTF-8
cookie: _xpid=574830729; _xpkey=K_F3GRHECOTdjT306mOafHByLTxopGhY; LPVID=MxZmQyM2Q5OTFlOTU0ZTJk; _hjSessionUser_2115245=eyJpZCI6IjQ3MzAxYmQwLTQ4ODgtNWNjMC1hZGZjLWJlZDBmNDgwZDJjZCIsImNyZWF0ZWQiOjE2NjY1NTY0MjQwOTIsImV4aXN0aW5nIjp0cnVlfQ==; CT.CONTENT.NA.STATUS=1; _gid=GA1.2.1666042031.1667883501; PreviousUrlNav=%2Fsportsbook%2Flive-betting; chQuickBet=undefined; inputAmount=100.00; kameleoonVisitorCode=_js_ti27yqxpj7dd4k1x; DD-LINK-NAREDIRECT=0; ASP.NET_SessionId=5acflzzgqtjdvsnjc5wtwuys; tz=Eastern%20Standard%20Time; btpdb.1PR3l09.dGZjLjY2ODI2ODU=U0VTU0lPTg; oddsfmt=dec; _hjSession_2115245=eyJpZCI6Ijk2NzBiMjNkLWY4MGQtNDM5OS1hYWNhLWQyODBjNmZlYzNkMSIsImNyZWF0ZWQiOjE2NjgwOTM2NzY4OTUsImluU2FtcGxlIjpmYWxzZX0=; _hjAbsoluteSessionInProgress=0; _hjIncludedInSessionSample=0; LPSID-90263191=bLgFHbiuTjOcwCg1FgR16g; __cf_bm=5LozQOf4P4COCn1rVD5emsVzukFSNbWdS7kvBVodzJ4-1668096251-0-AQ+nY5HeihIwV+gAI1oaFKJJxOtgXWs5czIr198Ffrh18P1q4nriEcszp/j7dwjuDjVuki1jlT6IByy2ewOCcXSUWavF+3MCcBF4Yb8sfDPVkvoSufxJ46feYuPiCiPcw0eW9oTUnrmZNcEkZ1732RDx6LWq1OElUvT0Uk6sk1n1; _gat_UA-190679354-1=1; _ga_KC6V6402HY=GS1.1.1668096234.18.1.1668096460.0.0.0; _ga=GA1.1.1142263304.1666556424; server_persistent=!Tdbrpsz3tJ8jlNmKJcby5Il79pzHsfLVz91fFnDrXObiJE45d6idCUAVcW4Qmd/g598vNFaqTVuVRvk=
origin: https://dgslivebetting.betonline.ag
referer: https://dgslivebetting.betonline.ag/ngwbet.aspx
sec-ch-ua: "Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36
x-newrelic-id: VgcFUVNTDxACV1NaDgIDVlw=
x-requested-with: XMLHttpRequest
Please help me figure out how I can get what I want.
My code:
import requests
import cloudscraper
scraper = cloudscraper.create_scraper()
url = 'https://dgslivebetting.betonline.ag/ngwbet.aspx/gvFrameHtml'
data = {"gameID":0}
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
'Referer': "https://dgslivebetting.betonline.ag/ngwbet.aspx/gvFrameHtml"
}
r = requests.post(url, data=data, headers=headers)
print(r.text)
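To get JSON back, you need to send the request as JSON, which includes the Content-Type header.
Your current example shows you are only sending these headers:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
'Referer': "https://dgslivebetting.betonline.ag/ngwbet.aspx/gvFrameHtml"
}
At the very least, you'll need to add Content-Type: application/json; charset=UTF-8 to the request. Otherwise, because you pass a dict via data=, requests does an application/x-www-form-urlencoded form post, which is why you're getting back HTML from this site instead of JSON. Setting the header alone is not enough if the body is still form-encoded: pass the payload with json=data (or data=json.dumps(data)) so that the body is actually JSON. Note also that your code creates a cloudscraper scraper but never uses it; the post is made with plain requests.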
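A minimal sketch of the difference, assuming the requests library. The request is only prepared, not sent, so it shows what would go over the wire; whether the site additionally needs cookies or other headers from the browser capture is not verified here:

```python
import json
import requests

url = "https://dgslivebetting.betonline.ag/ngwbet.aspx/gvFrameHtml"
payload = {"gameID": 0}

# Headers taken from the browser capture; Content-Type is left to requests.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "X-Requested-With": "XMLHttpRequest",
}

# json= serializes the dict as JSON and sets Content-Type: application/json.
as_json = requests.Request("POST", url, json=payload, headers=headers).prepare()

# data= with a dict form-encodes it instead (what the original code did).
as_form = requests.Request("POST", url, data=payload, headers=headers).prepare()

print(as_json.headers["Content-Type"], as_json.body)
print(as_form.headers["Content-Type"], as_form.body)
```

To actually send the JSON variant, requests.post(url, json=payload, headers=headers) does the same preparation and transmits it in one call.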

Fatal error in POST using requests module in Python 3

I'm working on a web scraper built in Python. So far I have written the following code:
import requests
headers = {
'authority': 'truegamedata.com',
'accept': '*/*',
'x-requested-with': 'XMLHttpRequest',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'sec-gpc': '1',
'origin': 'https://truegamedata.com',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://truegamedata.com/weapon_builder.php',
'accept-language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
}
data = {
'weapon_name': '^%^5B^%^22Kilo 141^%^22^%^2C^%^22wz^%^22^%^5D'
}
response = requests.post('https://truegamedata.com/SQL_calls/base_data.php', headers=headers, data=data)
print(response.text)
For some reason, I get the following error:
<br />
<b>Fatal error</b>: Uncaught Error: Call to a member function execute() on bool in /home/customer/www/truegamedata.com/public_html/SQL_calls/base_data.php:29
Stack trace:
#0 {main}
thrown in <b>/home/customer/www/truegamedata.com/public_html/SQL_calls/base_data.php</b> on line <b>29</b><br />
Does anyone know why this is happening, and how I can get the expected response?
Here is the request from Chrome DevTools:
Request URL: https://truegamedata.com/SQL_calls/base_data.php
Request Method: POST
Status Code: 200
Remote Address: 127.0.0.1:61696
Referrer Policy: strict-origin-when-cross-origin
cache-control: no-store, no-cache, must-revalidate
content-encoding: br
content-type: text/html; charset=UTF-8
date: Fri, 12 Feb 2021 20:08:45 GMT
expires: Thu, 19 Nov 1981 08:52:00 GMT
host-header: 8441280b0c35cbc1147f8ba998a563a7
pragma: no-cache
server: nginx
vary: Accept-Encoding
x-httpd-modphp: 1
x-proxy-cache-info: DT:1
:authority: truegamedata.com
:method: POST
:path: /SQL_calls/base_data.php
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: pt-BR,pt;q=0.9
content-length: 42
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: PHPSESSID=375e8ebdfa9174d6db5eb8c1cda4411b; game=wz
origin: https://truegamedata.com
referer: https://truegamedata.com/weapon_builder.php
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
sec-gpc: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36
x-requested-with: XMLHttpRequest
weapon_name: ["FR 5.56","wz"]
I tried to give as much information as possible. If anything is missing, let me know.
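One possible cause, offered as an assumption rather than a confirmed diagnosis: the ^ characters in the weapon_name value look like Windows cmd escape characters left over from copying the request as a cURL command, so the server receives a string it cannot parse. The sketch below uses only the standard library to strip the carets and decode the value, and checks that form-encoding the value from the DevTools capture gives the 42-byte body that the captured content-length reports:

```python
from urllib.parse import unquote, urlencode

# Value exactly as it appears in the script above.
captured = '^%^5B^%^22Kilo 141^%^22^%^2C^%^22wz^%^22^%^5D'

# Drop the cmd escape carets, then undo the percent-encoding.
cleaned = unquote(captured.replace('^', ''))
print(cleaned)

# Form-encode the value shown in the DevTools capture and measure it.
form_body = urlencode({'weapon_name': '["FR 5.56","wz"]'})
print(form_body, len(form_body))
```

Since requests form-encodes a data dict itself, the decoded string would be passed directly, for example data = {'weapon_name': cleaned}. The browser capture also sends a PHPSESSID cookie, which the script does not; whether the server requires it is not known from the information given.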

Issue in passing session information for scraping

I went to this website:
www4.fmovies.to
Then I clicked a movie, checked its CDN URL via Inspect -> Network, and got the details below:
https://cdn.mcloud.to/stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0013.ts
:authority: cdn.mcloud.to
:method: GET
:path: /stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0001.ts
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: __cfduid=d0847f9ac6d9a8da1dd131d1a0a91ea991542533053; _ga=GA1.2.485859786.1542533055; _gid=GA1.2.1916946057.1542533055; _gat=1
origin: https://mcloud.to
referer: https://mcloud.to/embed/#P#O8SE2916SEOA5?sub.file=https%253A%252F%252Fstatic1.akacdn.ru%252Fsubtitle%252F40039.vtt%253Fv1&ui=oAhi567w9OQEhJWEdbl0s%40Ep0Ir2VvG1xiK9JqKx
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
I built a headers dict from the information above and then ran
request = requests.get(url, headers=headers)
But I am getting a 403 (Forbidden) response. What is the issue?
You need to pass a referer header whose value is the src attribute of the iframe that embeds the video, which looks like this:
<iframe src="https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x" allow="autoplay; fullscreen" scrolling="no" allowfullscreen="yes" style="width: 100%; height: 100%;" frameborder="no"></iframe>
The code looks like
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0)Gecko/20100101 Firefox/60.', 'pragma': 'no-cache', 'connection': 'keep-alive', 'cache-control': 'no-cache', 'referer': 'https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x'}
requests.get('https://cdn.mcloud.to/stream/sf:i0:q2:h2:p24:l1/WjLDZuCBHmtyv63lT-RoVQ/1542603600/g/c/0/rj0m0m/hls/480/480-0000.ts', headers=headers)

Not able to upload tar.gz file using Python Request Module

Here is what my XHR data looks like when captured in chrome
Request Header
POST my_url?X-Progress-ID=ee821652321919bc7ae61fbe0b625990&userpkgname=file_name.tar.gz HTTP/1.1
Host: 10.110.134.28
Connection: keep-alive
Content-Length: 17461
Accept: application/json, text/plain, */*
Origin: https://10.110.134.28
X-XSRF-TOKEN: bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarybfK9jSdLoc2Mpj0i
Referer: https://10.110.134.28/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Cookie: XSRF-TOKEN=bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw; sid=s%3AXglfJNLQ9zzp3eHjQ2QOpk19kFKDDMvJ.ZMKyZd1Gx13lz2MnJgty5WncnilySzfoThGktkhlk4w
Payload
------WebKitFormBoundarybfK9jSdLoc2Mpj0i
Content-Disposition: form-data; name="package"; filename="file_name.tar.gz"
Content-Type: application/x-gzip
------WebKitFormBoundarybfK9jSdLoc2Mpj0i--
And this is how I am building my request.
files = {'package': (<file_name>, open(config_path, 'rb'), 'application/x-gzip')}
request.post(url, files=files)
This is what my request headers look like:
{
'Content-Length' : '17449',
'Accept-Encoding' : 'gzip, deflate',
'Accept' : '*/*',
'User-Agent' : 'python-requests/2.10.0',
'Connection' : 'keep-alive',
'Cookie' : 'XSRF-TOKEN=zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0; sid=s%3AC2JZDCfpg_CgkU7qSlS5YTvWXwpgMX35.5nU7W02TPNYtMkIQ4W%2B1bjd87A7KyJbh3shoNqqADXE',
'Content-Type' : 'multipart/form-data; boundary=270d9e02bf214dc7a09c3081cba5b0e0',
'XSRF-TOKEN' : 'zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0'
}
When I make the request, I get a 502 Bad Gateway response after a few seconds, whereas in Chrome I get 200 OK instantly.
So most probably I am not building my request correctly. Any suggestions?
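Comparing the two captures above, two differences stand out: Chrome sends the token in a header named X-XSRF-TOKEN while the script sends XSRF-TOKEN, and Chrome's URL carries the X-Progress-ID and userpkgname query parameters. The following is a sketch of a request that mirrors the browser more closely, assuming the requests library. The URL, token, progress ID, and file bytes are placeholders, and it is not confirmed that these differences cause the 502:

```python
import io
import requests

# Placeholders (assumptions): real values come from the running session.
url = "https://example.invalid/my_url"
xsrf_token = "hypothetical-token"
package = io.BytesIO(b"placeholder archive bytes")

# Query parameters seen in the browser capture.
params = {
    "X-Progress-ID": "hypothetical-progress-id",
    "userpkgname": "file_name.tar.gz",
}

files = {"package": ("file_name.tar.gz", package, "application/x-gzip")}

# Header name as used by the browser; the cookie carries the same token value.
headers = {"X-XSRF-TOKEN": xsrf_token}
cookies = {"XSRF-TOKEN": xsrf_token}

prepared = requests.Request(
    "POST", url, params=params, files=files, headers=headers, cookies=cookies
).prepare()

print(prepared.url)
print(dict(prepared.headers))
```

Printing the prepared request makes it easy to diff against the Chrome capture line by line before sending it with a requests.Session.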

How to specify the "Content-Type" and "Accept" on FormRequest?

Using FormRequest, I need to specify that the Content-Type is application/json; charset=UTF-8 and that Accept is */*.
How can I do this?
Currently, my code looks like this:
yield scrapy.FormRequest(url='...',
formdata={
...
},
cookies={...},
callback=self.parse_second)
Using browser, the request is:
POST /PaginasPublicas/_SBC.aspx/pesquisaLoteIntegracaoTPCL HTTP/1.1
Host: geosampa.prefeitura.sp.gov.br
Connection: keep-alive
Content-Length: 118
Accept: */*
Origin: http://geosampa.prefeitura.sp.gov.br
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36
Content-Type: application/json; charset=UTF-8
Referer: http://geosampa.prefeitura.sp.gov.br/PaginasPublicas/_SBC.aspx
Accept-Encoding: gzip, deflate
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4,ar;q=0.2,de;q=0.2,es;q=0.2,fr;q=0.2,it;q=0.2,ja;q=0.2,pl;q=0.2,tr;q=0.2,zh-TW;q=0.2
Cookie: ASP.NET_SessionId=bvvghxvsxgwzuyaudsqn5m5q
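Scrapy requests accept a headers argument for setting explicit headers. Note that charset is a parameter of the Content-Type value, not a separate header:
yield FormRequest(..., headers={'Content-Type': 'application/json; charset=UTF-8', 'Accept': '*/*'})
Be aware that FormRequest still URL-encodes formdata into the body, so the header alone does not make the body JSON. If the server expects a JSON body, as the browser request above suggests, send a plain scrapy.Request with a serialized body instead (this needs import json at the top of the spider module):
yield scrapy.Request(url='...',
method='POST',
body=json.dumps({
...
}),
cookies={...},
headers={'Content-Type': 'application/json; charset=UTF-8', 'Accept': '*/*'},
callback=self.parse_second)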
