Log in to a website and download a file with Python requests - python

I have a website with an HTML form. After logging in it takes me to a start.php page and then redirects me to an overview.php.
I want to download files from that server. When I click on the download link of a ZIP file, the address behind the link is:
getimage.php?path="vol/img"&id="4312432"
How can I do that with requests? I tried to create a session and issue the GET request with the right params, but the response is just the page I would see when I'm not logged in.
import requests

c = requests.Session()
c.auth = ('myusername', 'myPass')

request1 = c.get('myUrlToStart.php')
tex = request1.text

with open('data.zip', 'wb') as handle:
    request2 = c.get('urlToGetImage.php', params=payload2, stream=True)
    print(request2.headers)
    for block in request2.iter_content(1024):
        if not block:
            break
        handle.write(block)

What you're doing is a request with basic authentication. This does not fill out the form that is displayed on the page.
If you know the URL that your form sends a POST request to, you can try sending the form data directly to that URL, as in the sketch below.
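A rough sketch of that idea, based on the URLs in the question (the form field names 'username'/'password' and the login.php action URL are assumptions; check the form's HTML for the real names and action):

import requests

s = requests.Session()

# POST the form fields to the form's action URL (field names assumed)
s.post('https://example.com/login.php',
       data={'username': 'myusername', 'password': 'myPass'})

# the session now carries the login cookie, so the download link should work
r = s.get('https://example.com/getimage.php',
          params={'path': 'vol/img', 'id': '4312432'}, stream=True)
with open('data.zip', 'wb') as handle:
    for block in r.iter_content(1024):
        handle.write(block)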

Those who are looking for the same thing could try this...
import requests
import bs4
site_url = 'site_url_here'
userid = 'userid'
password = 'password'
file_url = 'getimage.php?path="vol/img"&id="4312432"'
o_file = 'abc.zip'
# create session
s = requests.Session()
# GET request; this will generate a cookie for you
s.get(site_url)
# log in to the site
s.post(site_url, data={'_username': userid, '_password': password})
# next, visit the URL of the file you would like to download
r = s.get(file_url)
# download the file
with open(o_file, 'wb') as output:
    output.write(r.content)
print(f"requests:: File {o_file} downloaded successfully!")
# close the session once all work is done
s.close()

Related

Extract media files from cache using Selenium

I'm trying to download some videos from a website using Selenium.
Unfortunately I can't download them from the source because the videos are stored in a directory with restricted access; trying to retrieve them using urllib, requests or ffmpeg returns a 403 Forbidden error, even after injecting my user data into the website.
I was thinking of playing the video in its entirety and storing the media file from the cache.
Would that be a possibility? Where can I find the cache folder in a custom profile? How do I discriminate among the files in the cache?
EDIT: This is what I attempted to do using requests
import requests

def main():
    s = requests.Session()
    login_page = '<<login_page>>'
    login_data = dict()
    login_data['username'] = '<<username>>'
    login_data['password'] = '<<psw>>'
    login_r = s.post(login_page, data=login_data)
    video_src = '<<video_src>>'
    cookies = dict(login_r.cookies)  # contains the session cookie
    # static cookies for every session
    cookies['_fbp'] = 'fb.1.1630500067415.734723547'
    cookies['_ga'] = 'GA1.2.823223936.1630500067'
    cookies['_gat'] = '1'
    cookies['_gid'] = 'GA1.2.1293544716.1631011551'
    cookies['user'] = '66051'
    video_r = s.get(video_src, cookies=cookies)
    print(video_r.status_code)

if __name__ == '__main__':
    main()
The print() function returns:
403
This is the network tab for the video:
Regarding video_r = s.get(video_src, cookies=cookies): have you tried streaming the response? That sends the correct byte-range headers to download the video; most websites prevent downloading the file as one block.
with open('...', 'wb') as f:
    response = s.get(url=link, stream=True)
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
You can send a HEAD request beforehand if you want; that way you can build a progress bar, since you retrieve the full content length from the headers (see the sketch below).
Also, a 403 is commonly used by anti-bot systems; maybe your Selenium is being detected.
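A minimal sketch of the HEAD idea, reusing the session s and cookies from the question (the link variable and the output file name are placeholders):

# HEAD first to learn the size, then stream with a simple progress readout
head = s.head(link, cookies=cookies, allow_redirects=True)
total = int(head.headers.get('Content-Length', 0))

downloaded = 0
with open('video.mp4', 'wb') as f:
    response = s.get(link, cookies=cookies, stream=True)
    for chunk in response.iter_content(chunk_size=512):
        if chunk:
            f.write(chunk)
            downloaded += len(chunk)
            if total:
                print(f'\rprogress: {100 * downloaded / total:.1f}%', end='')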
You are blocked because you forgot about the headers.
You must use:
s.get('https://httpbin.org/headers', headers={'user-agent': <the user-agent value, e.g. the last line of your uploaded image>})
or:
s.headers.update({'user-agent': <the user-agent value, e.g. the last line of your uploaded image>})
before sending a request.
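For example (a sketch; the user-agent string below is only an illustrative desktop Chrome value, copy the real one your browser sends from the network tab):

s = requests.Session()
s.headers.update({
    # hypothetical example value; use the user-agent your own browser sends
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
})
video_r = s.get(video_src, cookies=cookies)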

Web Scraping using Requests - Python

I am trying to get data using the Requests library, but I'm doing something wrong. First, my explanation of the manual search:
URL - https://www9.sabesp.com.br/agenciavirtual/pages/template/siteexterno.iface?idFuncao=18
I fill in the "Informe o RGI" field and click on the Prosseguir button (like Next), which takes me to the result page (screenshot omitted).
Before coding, I did the manual search and checked the Form Data in the browser's developer tools (screenshot omitted).
And then I tried it with this code:
import requests
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data)
print(res.text)
My output is:
<session-expired/>
What am I doing wrong?
Many thanks.
When you go to the site using the browser, a session is created and stored in a cookie on your machine, and those cookies are sent along with every request the browser makes. You receive a session-expired error because you're not sending any session data with your request.
Try this code. It requests the entry page first and stores the cookies. The cookies are then sent with the POST request.
import requests
session = requests.Session() # start session
# get entry page with cookies
response = session.get('https://www9.sabesp.com.br/agenciavirtual/pages/home/paginainicial.iface', timeout=30)
cks = session.cookies # save cookies with Session data
print(session.cookies.get_dict())
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data, cookies=cks) # send cookies with request
print(res.text)
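One note on the design: because session already stores every cookie it receives, you can also send the POST through the same session object and drop the explicit cookies argument; a minimal variant of the last two calls above:

res = session.post(url, data=data)  # the session re-sends its stored cookies automatically
print(res.text)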

Python: Remain logged in a website using mechanicalsoup

I am a newbie in Python. I want to write a script in Python that runs through thousands of URLs and saves the responses. In order to access these URLs, credentials are required. So I have written a basic script which goes to a URL and prints the response. When I go through multiple URLs, the website returns an error that multiple users are logged in. So I would like to log in once and fetch the other URLs in the same logged-in session. Is there any way I can do that using MechanicalSoup?
Here is my script:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("mywebsite1")
browser.select_form('form[action="/login"]')
browser.get_current_form().print_summary()
browser["userId"] = "myusername"
browser["password"] = "mypassword"
response = browser.submit_selected()
browser.launch_browser()
print(response.text)
browser.open("mywebsite2")
print(response.text)
...... so on for all the URLs
How can I save my session? Thanks in advance for your help.
Just save & load session.cookies:
import mechanicalsoup
from requests.utils import cookiejar_from_dict

def save_cookies(browser):
    return browser.session.cookies.get_dict()

def load_cookies(browser, cookies):
    browser.session.cookies = cookiejar_from_dict(cookies)

browser = mechanicalsoup.StatefulBrowser()
browser.open("https://www.google.com")

cookies = save_cookies(browser)
load_cookies(browser, cookies)
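If the goal is to stay logged in across separate runs of the script, one option is to dump that cookie dict to disk and reload it next time; a sketch, assuming the file name cookies.json is your own choice:

import json

# after logging in once
with open('cookies.json', 'w') as f:
    json.dump(save_cookies(browser), f)

# in a later run, before visiting the other URLs
with open('cookies.json') as f:
    load_cookies(browser, json.load(f))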

Log in to a site and navigate other pages

I have a Python 2 script that logs in to a webpage and then moves inside it to reach a couple of files pointed to on the same site, but on different pages. Python 2 let me open the site with my credentials and then use opener.open() to keep the connection available to navigate to the other pages.
Here's the code that worked in Python 2:
import cookielib
import urllib
import urllib2

# Your admin login and password
LOGIN = "*******"
PASSWORD = "********"
ROOT = "https:*********"

# The client has to take care of the cookies.
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# POST login query on '/login_handler' (post data are: 'login' and 'password').
req = urllib2.Request(ROOT + "/login_handler",
                      urllib.urlencode({'login': LOGIN,
                                        'password': PASSWORD}))
opener.open(req)

# Set the right accountcode
for accountcode, queues in QUEUES.items():
    req = urllib2.Request(ROOT + "/switch_to" + accountcode)
    opener.open(req)
I need to do the same thing in Python 3. I have tried with the requests module and with urllib, but although I can establish the first login, I don't know how to keep the opener around to navigate the site. I found OpenerDirector, but it seems I don't know how to use it, because I haven't reached my goal.
I have used this Python 3 code to get the desired result, but unfortunately I can't get the CSV file to print it.
Question: I don't know how to keep the opener to navigate the site.
Python 3.6 documentation: urllib.request.build_opener
Use of Basic HTTP Authentication:
import urllib.request

# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)

# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.example.com/login.html')
csv_content = f.read()
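Since the original Python 2 script relied on session cookies rather than Basic auth, the closer Python 3 port uses HTTPCookieProcessor instead; a sketch with ROOT, LOGIN, PASSWORD and QUEUES as in the question:

import http.cookiejar
import urllib.parse
import urllib.request

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# POST login query on '/login_handler'; the data must be bytes in Python 3
data = urllib.parse.urlencode({'login': LOGIN, 'password': PASSWORD}).encode()
opener.open(ROOT + "/login_handler", data)

# the same opener keeps the session cookie for the following requests
for accountcode, queues in QUEUES.items():
    opener.open(ROOT + "/switch_to" + accountcode)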
Use the Python requests library for Python 3 and a session.
http://docs.python-requests.org/en/master/user/advanced/#session-objects
Once you log in, your session will be managed automatically. You don't need to create your own cookie jar. Following is the sample code.
import requests

s = requests.Session()
auth = {"login": LOGIN, "pass": PASS}
url = ROOT + "/login_handler"
r = s.post(url, data=auth)
print(r.status_code)

for accountcode, queues in QUEUES.items():
    req = s.get(ROOT + "/switch_to" + accountcode)
    print(req.text)  # response text

Python Requests GET fails after successful session login

So I'm using the Python Requests library to log in to a PHP web CMS. So far I was able to log in using a POST command with a payload, and I am able to download a file.
The problem is: when I run a GET command right after logging in via POST, it tells me that I'm not logged in anymore, although I'm still using the same session! Please have a look at the code.
import requests

# Let's log in
with requests.session() as s:
    payload = {'username': strUserName, 'password': strUserPass, 'Submit': 'Login'}
    r = s.post(urlToLoginPHP, data=payload, stream=True)
    # OK, we are logged in. If I ran the "DOWNLOADING files" code right here, I would get a correct zip file.
    # But since the following r2 GET command delivers the "please login" page, it doesn't work anymore.
    r2 = s.get("urlToAnotherPageOfThisWebsite", stream=True)
    # We are not logged in anymore

    # DOWNLOADING files: this now just delivers a 5 KB file which contains the content of the "please login" page
    local_filename = 'image.zip'
    # NOTE the stream=True parameter
    r1 = s.get(downloadurl, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r1.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
I found the solution: I was logged in in the browser and was trying to log in via Python at the same time. The site noticed that I was already logged in, and I misunderstood its response.
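A quick way to check whether the login POST actually established a session before issuing further GETs (a sketch; the 'Logout' marker string is an assumption, use any text that only appears while logged in):

r = s.post(urlToLoginPHP, data=payload)
print(r.status_code)
print(s.cookies.get_dict())   # should now contain the PHP session cookie
print('Logout' in r.text)     # assumed marker that only a logged-in page shows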
