I can't get all HTML with Python Requests - python

import requests as req

url = 'https://servidor.aternos.me/'  # the page I am trying to fetch
html = req.get(url)
texto = html.text
print(texto)
I can't get all the HTML with Python Requests; it only fetches a little part of the HTML file.

You need to add headers to your request to get the same response your browser receives. Try:
import requests as req

headers = {
    'Host': 'servidor.aternos.me',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Cookie': 'axcaccess=e75a91ffeda624d3a1e24c1d9fb31734',
    'Upgrade-Insecure-Requests': '1',
}
url = 'https://servidor.aternos.me/'
html = req.get(url, headers=headers, timeout=10)
print(html.status_code)
texto = html.text
print(texto)
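Note that the hardcoded Cookie value (axcaccess) will eventually expire; it is usually more robust to let a Session collect cookies itself. A minimal check, assuming the page is not built by JavaScript (requests never executes scripts, so script-rendered content will always be missing):

import requests

url = 'https://servidor.aternos.me/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

# Let the Session gather fresh cookies instead of copying a possibly expired one
with requests.Session() as s:
    resp = s.get(url, headers=headers, timeout=10)
    print(resp.status_code, len(resp.text))  # compare this length with the browser's view-source
    print(resp.text[:500])                   # inspect the first bytes of what actually came back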

Related

Problems with API access of OLX

Good afternoon.
I am trying to scrape the information for each item found in this link, but when sending the requests to obtain the links to the information I need, I can't get them. While inspecting the page I saw that it calls an API, but I couldn't access it. Can someone help me with this? I really don't handle APIs very well.
This is my request to verify access:
import requests

url = 'https://www.olx.com.co/api/relevance/search?category=16&facet_limit=100&location=1000001&location_facet_limit=20&page=1&user=1776310a947x4a045a04'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'es-ES,es;q=0.9',
}
req = requests.get(url, headers=headers)
print(req.content)
Note: excuse my English.
Thank you!
It works fine; just print(req.json()):
import requests

url = 'https://www.olx.com.co/api/relevance/search?category=16&facet_limit=100&location=1000001&location_facet_limit=20&page=1&user=1776310a947x4a045a04'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'es-ES,es;q=0.9',
}
req = requests.get(url, headers=headers)
print(req.json())
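If you need more than the first page, the page query parameter can be incremented. A sketch, under the assumption that the response JSON keeps the listings under a 'data' key (inspect one real response first to confirm the actual field names):

import requests

base = 'https://www.olx.com.co/api/relevance/search'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}

for page in range(1, 4):  # first three pages as an example
    params = {
        'category': 16,
        'facet_limit': 100,
        'location': 1000001,
        'location_facet_limit': 20,
        'page': page,
    }
    resp = requests.get(base, headers=headers, params=params, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    print(page, len(payload.get('data', [])))  # 'data' is an assumed key; verify it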

Account authentication POST request in Python

In the following code, I am trying to send a POST request to a Microsoft online account, starting with a page that requires posting an email address. This is my attempt so far:
import requests
from bs4 import BeautifulSoup

url = 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,ar;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': '__RequestVerificationToken=vdS3aPPg5qQ2bH9ADTppeKIVJfclPsMI6dqB6_Ru11-2XJPpLfs7jBlejK3n0PZuYl-CwuM2hmeCsXzjZ4bVfj2HGLs2KOfBUphZHwO9cOQ1; .AspNet.MOEEXAMREGFORM=ekeG7UWLA6OSbT8ZoOBYpC_qYMrBQMi3YOwrPGsZZ_3XCuCsU1BP4uc5QGGE2gMnFgmiDIbkIk_8h9WtTi-P89V7ME6t_mBls6T3uR2jlllCh0Ob-a-a56NaVNIArqBLovUnLGMWioPYazJ9DVHKZY7nR_SvKVKg2kPkn6KffkpzzHOUQAatzQ2FcStZBYNEGcfHF6F9ZkP3VdKKJJM-3hWC8y62kJ-YWD0sKAgAulbKlqcgL1ml6kFoctt2u66eIWNm3ENnMbryh8565aIk3N3UrSd5lBoO-3Qh8jdqPCCq38w3cURRzCd1Z1rhqYb3V2qYs1ULRT1_SyRXFQLrJs5Y9fsMNkuZVeDp_CKfyzM',
    'Host': 'moe-register.emis.gov.eg',
    'Origin': 'https://moe-register.emis.gov.eg',
    'Referer': 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
}

with requests.Session() as s:
    # r = s.post(url)
    # soup = BeautifulSoup(r.content, 'lxml')
    data = {'EmailAddress': '476731809@matrouh1.moe.edu.eg'}
    r_post = s.post(url, data=data, headers=headers, verify=False)
    soup = BeautifulSoup(r_post.content, 'lxml')
    print(soup)
What I got is the same page asking me to post the email again. I expected to get the page that asks for the sign-in password.
This is the starting page, and this is an example of the email that needs to be posted: 476731809@matrouh1.moe.edu.eg
I have also tried such code before, but I got the sign-in page again (although the credentials are correct).
Can you please try this code:
import requests
from bs4 import BeautifulSoup

url = 'https://login.microsoftonline.com/common/login'
s = requests.Session()
res = s.get('https://login.microsoftonline.com')
cookies = dict(res.cookies)
res = s.post(url,
             auth=('476731809@matrouh1.moe.edu.eg', 'Std#050202'),
             verify=False,
             cookies=cookies)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup)
I checked out the page and the following seems to work:
import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://moe-register.emis.gov.eg',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://moe-register.emis.gov.eg/account/login',
    'Accept-Language': 'en-US,en;q=0.9,gl;q=0.8,fil;q=0.7,hi;q=0.6',
}
data = {
    'EmailAddress': '476731809@matrouh1.moe.edu.eg',
}
response = requests.post('https://moe-register.emis.gov.eg/account/authenticate', headers=headers, data=data, verify=False)
Your POST endpoint seems to be wrong: the form submits from /login to /authenticate, so you need to post to /authenticate to proceed with the request. (I am on a Mac, so my User-Agent may differ from the one you need; you can change it in the headers variable.)
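Since the site is ASP.NET, the __RequestVerificationToken cookie is usually paired with a hidden form field of the same name. A hedged sketch that fetches a fresh token inside one Session instead of hardcoding the Cookie header (the input name is an assumption; check the actual form in your browser's dev tools):

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    # Load the login page so the Session receives fresh cookies
    login_page = s.get('https://moe-register.emis.gov.eg/account/login', verify=False)
    soup = BeautifulSoup(login_page.content, 'lxml')
    # Assumed hidden field name, typical for ASP.NET anti-forgery tokens
    token_input = soup.find('input', {'name': '__RequestVerificationToken'})
    data = {'EmailAddress': '476731809@matrouh1.moe.edu.eg'}
    if token_input is not None:
        data['__RequestVerificationToken'] = token_input['value']
    r = s.post('https://moe-register.emis.gov.eg/account/authenticate',
               data=data, verify=False)
    print(r.status_code)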

How to generate UserLoginType[_token] for login request

I'm trying to log in to a website using a POST request like this:
import requests

cookies = {
    '_SID': 'c1i73k2mg3sj0ugi5ql16c3sp7',
    'isCookieAllowed': 'true',
}
headers = {
    'Host': 'service.premiumsim.de',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://service.premiumsim.de/',
    'Content-Type': 'application/x-www-form-urlencoded',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}
data = [
    ('_SID', 'c1i73k2mg3sj0ugi5ql16c3sp7'),
    ('UserLoginType[alias]', 'username'),
    ('UserLoginType[password]', 'password'),
    ('UserLoginType[logindata]', ''),
    ('UserLoginType[_token]', '1af70f3d0e5b9e6c39e1475b6d84e9d125d076de'),
]
requests.post('https://service.premiumsim.de/public/login_check', headers=headers, cookies=cookies, data=data)
The problem is 'UserLoginType[_token]' in the form data. The above code works, but I don't have that token and have no clue how to generate it, so when I make my request without the _token, the request fails.
Google did not turn up any helpful information about UserLoginType.
Does anyone know how to generate it (e.g. with another request first) to be able to log in?
Edit:
Thanks to t.m.adam's suggestion, I used bs4 to get the token:
import requests
from bs4 import BeautifulSoup as bs

tokenRequest = requests.get('https://service.premiumsim.de')
soup = bs(tokenRequest.text, 'lxml')
token = soup.find('input', {'id': 'UserLoginType__token'})['value']
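Such tokens are usually bound to the session that issued them, so fetching the token and logging in should happen within the same Session rather than hardcoding the _SID cookie. A sketch combining the two steps ('username' and 'password' are placeholders):

import requests
from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    # Fetch the login page and its fresh token within one session
    page = s.get('https://service.premiumsim.de')
    soup = bs(page.text, 'lxml')
    token = soup.find('input', {'id': 'UserLoginType__token'})['value']
    data = {
        'UserLoginType[alias]': 'username',
        'UserLoginType[password]': 'password',
        'UserLoginType[logindata]': '',
        'UserLoginType[_token]': token,
    }
    # The session already carries the matching _SID cookie
    r = s.post('https://service.premiumsim.de/public/login_check', data=data)
    print(r.status_code)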

How can I send lots of POST requests very fast in Python?

This is my code for sending lots of POST requests in Python. It works correctly, but it is very slow; I think that is because it sends the POST requests one by one.
I have two questions:
1. How can I make this code send POST requests faster?
2. How can I send a POST request without waiting for its response?
import sys
import requests

url = "http://club.raakdarou.com/users/main/GettingGift"
number = 2237499999938616
# note: no hardcoded Content-Length header; requests computes it per request
headers = {
    'Host': 'club.raakdarou.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'http://club.raakdarou.com/users/main',
    'Cookie': 'ASP.NET_SessionId=4sirmg5n05fkszmvbasr4vnt; __RequestVerificationToken=AsWNgNzLVjU8kCNvWvTjn_tJiBlEKE5NH_xO-o7eGkq3h3av7I1e0_qu6NO80SNKfiV-c5Ajm6nlrDE7pFcKFkdr6ZHVX9zXAWYZt79c_pw1',
    'Connection': 'keep-alive',
}

def sendRequest(number):
    try:
        while number >= 2237400000000000:
            data = {'gift': number, 'X-Requested-With': 'XMLHttpRequest'}
            # pass the dict directly so requests form-encodes it to match the
            # Content-Type header (json.dumps would send a JSON string instead)
            r = requests.post(url, data=data, headers=headers)
            print(number)
            number = number - 1
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])
        sendRequest(number + 1)  # retry by recursing to resume the loop

sendRequest(number)
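One way to speed this up, offered as a sketch rather than a drop-in answer: send the requests concurrently from a thread pool. This also addresses the second question in practice, since each worker waits only for its own response while the others keep sending. The batch size and worker count below are arbitrary:

import requests
from concurrent.futures import ThreadPoolExecutor

url = 'http://club.raakdarou.com/users/main/GettingGift'

def send_one(number):
    data = {'gift': number, 'X-Requested-With': 'XMLHttpRequest'}
    try:
        r = requests.post(url, data=data, timeout=10)
        return number, r.status_code
    except requests.RequestException as exc:
        return number, exc

# A small batch of 100 numbers as an example
numbers = range(2237499999938616, 2237499999938516, -1)
with ThreadPoolExecutor(max_workers=20) as pool:
    for number, result in pool.map(send_one, numbers):
        print(number, result)

For true fire-and-forget you would need an async client such as aiohttp; plain requests always blocks until the response arrives.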

using python urllib2 to send POST request and get response

I am trying to get the HTML page back from sending a POST request:
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

headers = {
    'Host': 'digitalvita.pitt.edu',
    'Connection': 'keep-alive',
    'Content-Length': '325',
    'Origin': 'https://digitalvita.pitt.edu',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1',
    'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Accept': 'text/javascript, text/html, application/xml, text/xml, */*',
    'Referer': 'https://digitalvita.pitt.edu/index.php',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Cookie': 'PHPSESSID=lvetilatpgs9okgrntk1nvn595',
}
data = {
    'action': 'search',
    'xdata': '<search id="1"><context type="all" /><results><ordering>familyName</ordering><pagesize>100000</pagesize><page>1</page></results><terms><name>d</name><school>All</school></terms></search>',
    'request': 'search',
}
data = urllib.urlencode(data)
print data
req = urllib2.Request('https://digitalvita.pitt.edu/dispatcher.php', data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
soup = BeautifulSoup(the_page)
print soup
Can anyone tell me how to make it work?
Do not specify a Content-Length header; urllib2 calculates it for you. As it is, your header specifies the wrong length:
>>> data = urllib.urlencode(data)
>>> len(data)
319
Without that header the rest of the posted code works fine for me.
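For comparison, the same POST with the requests library sidesteps the problem entirely, because requests form-encodes the dict and computes Content-Length itself (a sketch; whether the server needs the remaining headers is left to testing):

import requests

url = 'https://digitalvita.pitt.edu/dispatcher.php'
data = {
    'action': 'search',
    'xdata': '<search id="1"><context type="all" /><results><ordering>familyName</ordering><pagesize>100000</pagesize><page>1</page></results><terms><name>d</name><school>All</school></terms></search>',
    'request': 'search',
}
resp = requests.post(url, data=data)  # Content-Length is computed automatically
print(resp.text)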
