Extract media files from cache using Selenium - Python

I'm trying to download some videos from a website using Selenium.
Unfortunately I can't download it from the source because the video is stored in a directory with restricted access; trying to retrieve it with urllib, requests or ffmpeg returns a 403 Forbidden error, even after injecting my user data into the website.
I was thinking of playing the video in its entirety and storing the media file from the cache.
Would that be possible? Where can I find the cache folder of a custom profile? How do I discriminate among the files in the cache?
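For reference, a minimal sketch of what that could look like with Chrome; the flags and /tmp paths below are my own assumptions, and note that Chrome stores cache entries in its own block format, so they are not directly playable media files:
from selenium import webdriver

# Sketch: give Chrome a custom profile and a known cache folder, so whatever
# is cached while the video plays ends up in one predictable place.
options = webdriver.ChromeOptions()
options.add_argument("--user-data-dir=/tmp/selenium-profile")  # custom profile location
options.add_argument("--disk-cache-dir=/tmp/selenium-cache")   # cache lands here

driver = webdriver.Chrome(options=options)
driver.get("<<video_page>>")  # play the video in full, then inspect /tmp/selenium-cache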
EDIT: This is what I attempted to do using requests
import requests

def main():
    s = requests.Session()
    login_page = '<<login_page>>'
    login_data = dict()
    login_data['username'] = '<<username>>'
    login_data['password'] = '<<psw>>'
    login_r = s.post(login_page, data=login_data)  # submit the credentials
    video_src = '<<video_src>>'
    cookies = dict(login_r.cookies)  # contains the session cookie
    # static cookies for every session
    cookies['_fbp'] = 'fb.1.1630500067415.734723547'
    cookies['_ga'] = 'GA1.2.823223936.1630500067'
    cookies['_gat'] = '1'
    cookies['_gid'] = 'GA1.2.1293544716.1631011551'
    cookies['user'] = '66051'
    video_r = s.get(video_src, cookies=cookies)
    print(video_r.status_code)

if __name__ == '__main__':
    main()
The print() function returns:
403
This is the network tab for the video (screenshot not reproduced here; it showed the request headers, ending with the User-Agent referenced in the answers below).

Regarding video_r = s.get(video_src, cookies=cookies): have you tried streaming the response? With stream=True the content is fetched in chunks rather than in one read; most websites prevent downloading the file as "one" block.
with open('...', 'wb') as f:
    response = s.get(url=link, stream=True)
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
You can send a HEAD request first if you want; that way you retrieve the full content length from the headers and can build a progress bar.
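A rough sketch of that idea, assuming s is the session from the question and link holds the video URL ('video.mp4' is just a placeholder filename):
# HEAD first: the Content-Length header gives the total size for a progress readout
head = s.head(link, allow_redirects=True)
total = int(head.headers.get('Content-Length', 0))

downloaded = 0
response = s.get(link, stream=True)
with open('video.mp4', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            downloaded += len(chunk)
            if total:
                print(f"\rProgress: {100 * downloaded / total:.1f}%", end='')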
Also, a 403 is commonly returned by anti-bot systems; maybe your Selenium is being detected.
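If Selenium is already logged in, one workaround (a sketch, assuming driver is an authenticated WebDriver and video_src is the URL from the question) is to hand the browser's cookies over to requests:
import requests

s = requests.Session()
# copy every cookie from the logged-in browser into the requests session
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'], domain=cookie.get('domain'))

video_r = s.get(video_src, stream=True)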

You are blocked because you forgot about headers.
You must use:
s.get('https://httpbin.org/headers', headers={'user-agent': '<the User-Agent value (for example: last line of your uploaded image)>'})
or:
s.headers.update({'user-agent': '<the User-Agent value (for example: last line of your uploaded image)>'})
before sending a request.
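Applied to the session from the question, that could look like this (the User-Agent below is only an example of a typical browser string; substitute the value your own browser sends):
s = requests.Session()
# example browser-like User-Agent; replace with the one from your network tab
s.headers.update({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                                'AppleWebKit/537.36 (KHTML, like Gecko) '
                                'Chrome/93.0.4577.63 Safari/537.36'})
video_r = s.get(video_src, cookies=cookies)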

Related

Can't download csv.gz from website using Python

I'm currently trying to download a csv.gz file from the following link: https://www.cryptoarchive.com.au/bars/pair. As you can see, opening the link with a browser simply opens the save file dialogue. However, passing the link to requests or urllib simply downloads HTML as opposed to the actual file.
This is the current approach I'm trying:
EDIT: Updated to reflect changes I've made.
url = "https://www.cryptoarchive.com.au/bars/pair"
file_name = "test.csv.gz"
headers = {"PLAY_SESSION": play_session}

r = requests.get(url, stream=True, headers=headers)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
The only saved cookie I can find is the PLAY_SESSION. Setting that as a header doesn't change the result I'm getting.
Further, I've tried posting a request to the login page like this:
login = "https://www.cryptoarchive.com.au/signup"
data = {"email": email,
        "password": password,
        "accept": "checked"}

with requests.Session() as s:
    p = s.post(login, data=data)
    print(p.text)
However, this also doesn't seem to work and I especially can't figure out what to pass to the login page or how to actually check the checkbox...
Just browsing that URL in a private window shows the error:
Please login/register first.
To get that file, you need to log in to the site first. The login will probably give you a session token, a cookie or something similar that you then need to include in the request.
Both @Daniel Argüelles' and @Abhyudaya Sharma's answers helped me. The solution was simply getting the PLAY_SESSION cookie after logging into the website and passing it to the request function.
cookies = {"PLAY_SESSION": play_session}
url = "https://www.cryptoarchive.com.au/bars/pair"

r = requests.get(url, stream=True, cookies=cookies)
with open(file_name, "wb") as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)
            f.flush()
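For completeness, a sketch of how the PLAY_SESSION cookie could be captured after logging in with a Session (the login URL and form fields are taken from the attempt above and may need adjusting):
import requests

with requests.Session() as s:
    s.post("https://www.cryptoarchive.com.au/signup",
           data={"email": email, "password": password, "accept": "checked"})
    play_session = s.cookies.get("PLAY_SESSION")  # reuse this in the download request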

Grab auto Download Links Using requests

I'm trying to grab the auto-started direct download link from Yourupload using BS4 (BeautifulSoup).
The direct download link is auto-generated every time, and the download starts automatically after 5 seconds.
I want to get the direct download link and store it in a "Link.txt" file.
import requests
import bs4

req = requests.get('https://www.yourupload.com/download?file=2573285', stream=True)
req = bs4.BeautifulSoup(req.text, 'lxml')
print(req)
Well, actually the site runs JavaScript code that handles the redirect to the final destination URL and streams the download after a simple token validation.
We can work through that: first send a GET request while maintaining the session via requests.Session(), extract the token from the page, and then send a second GET request to download the video.
That means you end up holding the final URL, so you can download it now or later, whenever you like.
import requests
from bs4 import BeautifulSoup

def Main():
    main = "https://www.yourupload.com/download?file=2573285"
    with requests.Session() as req:
        r = req.get(main)
        soup = BeautifulSoup(r.text, 'html.parser')
        # the token is embedded in the page's third <script> tag
        token = soup.findAll("script")[2].text.split("'")[1][-4:]
        headers = {
            'Referer': main  # the site checks the Referer before serving the file
        }
        r = req.get(
            f"https://www.yourupload.com/download?file=2573285&sendFile=true&token={token}",
            stream=True, headers=headers)
        print(f"Downloading From {r.url}")
        # the server supplies the filename via Content-Disposition
        name = r.headers.get("Content-Disposition").split('"')[1]
        with open(name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)
        print(f"File {name} Saved.")

Main()
Output:
Downloading From https://s205.vidcache.net:8166/play/a202003090La0xSot1Kl/okanime-2107-HD-19_99?&attach=okanime-2107-HD-19_99.mp4
File okanime-2107-HD-19_99.mp4 Saved.
Confirmation by size: as you can see, the file is 250M (screenshot omitted).
Notice that the download link is only callable one time, as the token is validated only once by the back-end.
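Since the original goal was to store the link in "Link.txt", the final URL from the response above can simply be written out (keeping in mind that the token embedded in it is single-use):
with open("Link.txt", "w") as link_file:
    link_file.write(r.url)  # the token in this URL is only valid once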

How to send a POST request with an image using Python requests

I am trying to get my image hosted online, and for that I am using Python:
import requests

url = 'http://imgup.net/'
data = {'image[image][]': 'http://www.webhost-resources.com/wp-content/uploads/2015/01/dedicated-hosting-server.jpg'}
r = requests.post(url, files=data)
I am not able to get the URL of the hosted image from the response.
Please help!
The files parameter of requests.post needs a:
Dictionary of 'name': file-like-objects (or {'name': ('filename', fileobj)}) for multipart encoding upload.
There's more data you'll need to send than just the file, most importantly the "authenticity token". If you look at the source code of the page, it'll show you all other parameters as <input type="hidden"> tags.
The upload URL is http://imgup.net/upload, as you can see from the action attribute of <form>.
So what you need to do is:
1. Download the image you want to upload (I'll call it dhs.jpg).
2. Do a GET request on the main page and extract the authenticity_token.
3. Once you have that, send the request with files= and data=:
url = "http://imgup.net/upload"
data = {'utf8': '✓', 'authenticity_token': '<put your scraped token here>', '_method': 'put'}
f = open("dhs.jpg", "rb") # open in binary mode
files = {'image[image][]': f}
r = requests.post(url, files=files, data=data)
f.close()
print(r.json()["image_link"])
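Step 2 is not shown in code above; a minimal sketch of scraping the token, assuming it sits in a hidden <input> named authenticity_token as described:
import requests
from bs4 import BeautifulSoup

page = requests.get("http://imgup.net/")
soup = BeautifulSoup(page.text, "html.parser")
# the hidden <input type="hidden" name="authenticity_token"> carries the token
token = soup.find("input", {"name": "authenticity_token"})["value"]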
Final note: While I couldn't find any rule against this behaviour in their T&C, the presence of an authenticity token makes it seem likely that imgup doesn't really want you to do this automatically.

Python Requests GET fails after successful session login

So I'm using the Python Requests library to log in to a PHP web CMS. So far I was able to log in using the POST command with a payload, and I am able to download a file.
The problem is: when I run the GET command right after logging in via POST, it tells me that I'm not logged in anymore - although I'm still using the same session! Please have a look at the code:
# Let's log in
with requests.session() as s:
    payload = {'username': strUserName, 'password': strUserPass, 'Submit': 'Login'}
    r = s.post(urlToLoginPHP, data=payload, stream=True)

    # OK, we are logged in. If I ran the downloading code right here I would get a correct zip file.
    # But since the r2 GET command delivers the "please login" page, it doesn't work anymore.
    r2 = s.get("urlToAnotherPageOfThisWebsite", stream=True)
    # We are not logged in anymore

    # DOWNLOADING files: this now just delivers a 5KB file containing the "please login" page
    local_filename = 'image.zip'
    # NOTE the stream=True parameter
    r1 = s.get(downloadurl, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r1.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
I found the solution: I was logged in in the browser and was trying to log in via Python at the same time. The site noticed that I was already logged in, and I had misunderstood its response.

Log in to website and download file with Python requests

I have a website with an HTML form. After logging in it takes me to a start.php page and then redirects me to an overview.php.
I want to download files from that server... When I click on the download link of a ZIP file, the address behind the link is:
getimage.php?path="vol/img"&id="4312432"
How can I do that with requests? I tried to create a session and issue the GET command with the right params, but the answer is just the website I would see when I'm not logged in.
c = requests.Session()
c.auth = ('myusername', 'myPass')
request1 = c.get(myUrlToStart.PHP)
tex = request1.text

with open('data.zip', 'wb') as handle:
    request2 = c.get(urlToGetImage.Php, params=payload2, stream=True)
    print(request2.headers)
    for block in request2.iter_content(1024):
        if not block:
            break
        handle.write(block)
What you're doing is a request with basic authentication. This does not fill out the form that is displayed on the page.
If you know the URL that your form sends a POST request to, you can try sending the form data directly to this URL:
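A hedged sketch of that advice, with placeholder names (the action URL and field names must be read from the form's source; example.com stands in for the real site):
import requests

with requests.Session() as c:
    # POST the form fields straight to the form's action URL
    c.post("https://example.com/login.php",  # hypothetical action URL
           data={"username": "myusername", "password": "myPass"})
    # the session cookie set by the login is now reused for the download
    r = c.get('https://example.com/getimage.php?path="vol/img"&id="4312432"', stream=True)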
Those who are looking for the same thing could try this...
import requests
import bs4
site_url = 'site_url_here'
userid = 'userid'
password = 'password'
file_url = site_url + 'getimage.php?path="vol/img"&id="4312432"'  # requests needs an absolute URL
o_file = 'abc.zip'
# create session
s = requests.Session()
# GET request. This will generate cookie for you
s.get(site_url)
# login to site.
s.post(site_url, data={'_username': userid, '_password': password})
# Next thing will be to visit URL for file you would like to download.
r = s.get(file_url)
# Download file
with open(o_file, 'wb') as output:
    output.write(r.content)
print(f"requests:: File {o_file} downloaded successfully!")
# Close session once all work done
s.close()
