I am currently using the script below to download regular files from Google Drive. It works fine; it's basically the code from user #user115202 (kudos).
Now I need it to work for WhatsApp backups, which are stored under "Backup" in Google Drive rather than as regular files.
The tool WhatsApp Google Drive Extractor (Google Drive API) doesn't seem to work anymore.
Does anyone know an alternative?
import requests

def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        # Google may answer large files with a warning page; the confirm token comes in a cookie
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value
        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768
        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)
    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)
    save_response_content(response, destination)

if __name__ == "__main__":
    import sys
    if len(sys.argv) != 3:
        print("Usage: python google_drive.py drive_file_id destination_file_path")
    else:
        # TAKE ID FROM SHAREABLE LINK
        file_id = sys.argv[1]
        # DESTINATION FILE ON YOUR DISK
        destination = sys.argv[2]
        download_file_from_google_drive(file_id, destination)
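For regular files that an OAuth access token can see, Drive API v3 also serves the raw content at `files/{fileId}?alt=media`, which avoids relying on the `download_warning` cookie. A minimal sketch, assuming you already hold a valid access token; `media_request` and `download_with_api` are hypothetical helper names. It does not reach the WhatsApp backups the question asks about, since those are not stored as regular Drive files.

```python
import requests

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def media_request(file_id, access_token):
    """Build (but do not send) a Drive API v3 request for a file's content."""
    return requests.Request(
        "GET",
        f"{DRIVE_FILES_URL}/{file_id}",
        params={"alt": "media"},  # alt=media asks for the bytes, not the metadata
        headers={"Authorization": "Bearer " + access_token},
    )


def download_with_api(file_id, access_token, destination, chunk_size=32768):
    """Stream the file to `destination`, using the same chunk size as the script above."""
    with requests.Session() as session:
        prepared = session.prepare_request(media_request(file_id, access_token))
        with session.send(prepared, stream=True) as response:
            response.raise_for_status()
            with open(destination, "wb") as f:
                for chunk in response.iter_content(chunk_size):
                    f.write(chunk)
```

Usage: `download_with_api("<file_id>", "<access_token>", "out.bin")`.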
I managed to work out the API, made some changes to the code, and it works now. The code will be available at https://github.com/EliteAndroidApps/WhatsApp-GD-Extractor
This is my code for uploading to Google Drive with Python requests using the Google Drive API.
import sys
import json
import requests
from tqdm import tqdm
import requests_toolbelt
from requests.exceptions import JSONDecodeError

class ProgressBar(tqdm):
    def update_to(self, n: int) -> None:
        self.update(n - self.n)

def upload_file(access_token: str, filename: str, filedirectory: str):
    metadata = {
        "title": filename,
    }
    files = {}
    session = requests.session()
    with open(filedirectory, "rb") as fp:
        files["file"] = fp
        files["data"] = ('metadata', json.dumps(metadata), 'application/json')
        encoder = requests_toolbelt.MultipartEncoder(files)
        with ProgressBar(
            total=encoder.len,
            unit="B",
            unit_scale=True,
            unit_divisor=1024,
            miniters=1,
            file=sys.stdout,
        ) as bar:
            monitor = requests_toolbelt.MultipartEncoderMonitor(
                encoder, lambda monitor: bar.update_to(monitor.bytes_read)
            )
            r = session.post(
                "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
                data=monitor,
                allow_redirects=False,
                headers={"Authorization": "Bearer " + access_token},
            )
    try:
        resp = r.json()
        print(resp)
    except JSONDecodeError:
        sys.exit(r.text)

upload_file("access_token", "test.txt", "test.txt")
When I send the file with the data argument of the POST request, the file name is not sent; when I use the files argument instead, requests-toolbelt does not work. How can I fix this?
Looking at your script, I think the content type is not included in the request headers. In that case, I think the raw request body ends up as the content of the uploaded file, which might be the reason for your current issue. To fix this, how about the following modification?
From:
r = session.post(
url,
data=monitor,
allow_redirects=False,
headers={"Authorization": "Bearer " + access_token},
)
To:
r = session.post(
url,
data=monitor,
allow_redirects=False,
headers={
"Authorization": "Bearer " + access_token,
"Content-Type": monitor.content_type,
},
)
Note that metadata = { "title": filename } is the Drive API v2 format, so it assumes the URL is https://www.googleapis.com/upload/drive/v2/files?uploadType=multipart. Please be careful about this.
When you want to use Drive API v3, please modify metadata = { "title": filename } to metadata = { "name": filename }, and use the endpoint of https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart.
When the file is uploaded with Drive API v3, the value of {'kind': 'drive#file', 'id': '###', 'name': 'test.txt', 'mimeType': 'text/plain'} is returned.
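Drive's documentation describes `uploadType=multipart` as a `multipart/related` body with the JSON metadata part first and the file content second. The sketch below builds such a body with the standard library only, to show why the header matters: the boundary that separates the parts exists only in the `Content-Type` value. `build_related_body` is a hypothetical helper, and unlike the script above it has no progress bar.

```python
import json
import uuid


def build_related_body(metadata, content, mime_type="application/octet-stream"):
    """Return (body, content_type) for a Drive uploadType=multipart request."""
    boundary = uuid.uuid4().hex  # random, so it is very unlikely to appear in the payload
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b"Content-Type: application/json; charset=UTF-8\r\n\r\n",
        json.dumps(metadata).encode("utf-8"),  # part 1: metadata
        f"\r\n--{boundary}\r\n".encode(),
        f"Content-Type: {mime_type}\r\n\r\n".encode(),
        content,  # part 2: the file bytes
        f"\r\n--{boundary}--\r\n".encode(),  # closing delimiter
    ])
    # The boundary is only announced here; without this header the server cannot split the parts
    content_type = f"multipart/related; boundary={boundary}"
    return body, content_type
```

The result would be sent with `session.post(url, data=body, headers={"Authorization": "Bearer " + access_token, "Content-Type": content_type})`, where `url` is the v3 `uploadType=multipart` endpoint.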
By the way, if an error like badContent occurs in your testing, please try the following modification. It seems that an error occurs when, in the multipart/form-data request body, the file content comes before the file metadata. I'm not sure whether this is the current specification, and I didn't know that the order of the request body parts is checked.
From
files = {}
files["file"] = fp
files["data"] = ('metadata', json.dumps(metadata), 'application/json')
To
files = collections.OrderedDict(data=("metadata", json.dumps(metadata), "application/json"), file=fp)
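To check the part order without uploading anything, requests can prepare a request locally and expose its body. This sketch uses requests' own `files=` argument (not requests_toolbelt) with a list of tuples, which makes the order explicit; since Python 3.7 a plain dict also keeps insertion order, so the `OrderedDict` mainly documents intent. The file content is an in-memory stand-in.

```python
import json

import requests

metadata = {"name": "test.txt"}
files = [
    ("data", ("metadata", json.dumps(metadata), "application/json")),  # metadata part first
    ("file", ("test.txt", b"FILE-CONTENT", "text/plain")),  # then the file content (stand-in)
]
# prepare() only builds the request locally; nothing is sent
prepared = requests.Request(
    "POST",
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
    files=files,
).prepare()
body = prepared.body
print(body.index(b'{"name": "test.txt"}') < body.index(b"FILE-CONTENT"))  # True
```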
Note:
I think that in your script an error might occur at file_size = os.path.getsize(filename). Please check this again.
When I tested your script with the above modifications, I confirmed that a test file was uploaded to Google Drive with the expected filename. For that test I also used the following line (it requires import collections):
files = collections.OrderedDict(data=("metadata", json.dumps(metadata), "application/json"), file=fp)
References:
Files: insert of Drive API v2
Files: create of Drive API v3
Upload file data
The metadata needs to be sent in the POST body as JSON.
Python Requests post() Method
data = Optional. A dictionary, list of tuples, bytes or a file object to send to the specified url
json = Optional. A JSON object to send to the specified url
metadata = {
    "name": filename,
}
r = session.post(
    url,
    json=metadata,  # pass the dict itself; requests serializes it to JSON
    allow_redirects=False,
    headers={"Authorization": "Bearer " + access_token},
)
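Note that `json=` expects the dict itself: passing `json.dumps(metadata)` serializes the metadata twice, because requests JSON-encodes whatever it receives, so the server gets a JSON string literal instead of an object. The difference is visible by preparing both requests locally (the URL is Drive's v3 metadata endpoint; on its own this request carries only metadata, no file content).

```python
import json

import requests

metadata = {"name": "test.txt"}
url = "https://www.googleapis.com/drive/v3/files"

right = requests.Request("POST", url, json=metadata).prepare()
wrong = requests.Request("POST", url, json=json.dumps(metadata)).prepare()

# json= serializes whatever it receives, so an already-serialized string is encoded twice
print(json.loads(right.body))  # {'name': 'test.txt'}
print(json.loads(wrong.body))  # {"name": "test.txt"}  <- a str, not an object
```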
Future readers can find below a complete script that also contains details on how to get access to the bearer token for HTTP authentication.
Most of the credit goes to the OP and answers to the OPs question.
"""
Goal: For one time upload of a large file (as the GDrive UI hangs up)
Step 1 - Create OAuth 2.0 Client ID + Client Secret
- by following the "Authentication" part of https://pythonhosted.org/PyDrive/quickstart.html
Step 2 - Get Access Token
- from the OAuth playground -> https://developers.google.com/oauthplayground/
--> Select Drive API v3 -> www.googleapis.com/auth/drive --> Click on "Authorize APIs"
--> Click on "Exchange authorization code for tokens" --> "Copy paste the access token"
--> Use it in the script below
Step 3 - Run file as daemon process
- nohup python -u upload_gdrive.py > upload_gdrive.log 2>&1 &
- tail -f upload_gdrive.log
"""
import sys
import json
import requests
from tqdm import tqdm
import requests_toolbelt  # pip install requests_toolbelt
from requests.exceptions import JSONDecodeError
import collections

class ProgressBar(tqdm):
    def update_to(self, n: int) -> None:
        self.update(n - self.n)

def upload_file(access_token: str, filename: str, filepath: str):
    metadata = {
        "name": filename,
    }
    session = requests.session()
    with open(filepath, "rb") as fp:
        # metadata part first, then the file content
        files = collections.OrderedDict(data=("metadata", json.dumps(metadata), "application/json"), file=fp)
        encoder = requests_toolbelt.MultipartEncoder(files)
        with ProgressBar(
            total=encoder.len,
            unit="B",
            unit_scale=True,
            unit_divisor=1024,
            miniters=1,
            file=sys.stdout,
        ) as bar:
            monitor = requests_toolbelt.MultipartEncoderMonitor(
                encoder, lambda monitor: bar.update_to(monitor.bytes_read)
            )
            r = session.post(
                "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
                data=monitor,
                allow_redirects=False,
                headers={
                    "Authorization": "Bearer " + access_token,
                    "Content-Type": monitor.content_type,
                },
            )
    try:
        resp = r.json()
        print(resp)
    except JSONDecodeError:
        sys.exit(r.text)

upload_file("<access_token>", "<upload_filename>", "<path_to_file>")
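For very large files, Drive also offers resumable uploads (`uploadType=resumable`): one request opens an upload session whose URI comes back in the `Location` header, then the file goes up in chunks with `Content-Range` headers, and a `308` response means the server wants more. A minimal sketch without a progress bar, assuming a valid access token; `resumable_upload`, `content_range` and the 8 MiB chunk size are hypothetical choices, with chunks kept to multiples of 256 KiB as Drive's documentation asks.

```python
import os

import requests

UPLOAD_URL = "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable"
CHUNK_SIZE = 32 * 256 * 1024  # 8 MiB; Drive wants multiples of 256 KiB except for the last chunk


def content_range(start, length, total):
    """Content-Range value for bytes start..start+length-1 of a `total`-byte file."""
    return f"bytes {start}-{start + length - 1}/{total}"


def resumable_upload(access_token, filename, filepath):
    total = os.path.getsize(filepath)
    if total == 0:
        raise ValueError("empty file: use the simple multipart upload instead")
    auth = {"Authorization": "Bearer " + access_token}
    # 1) open an upload session; its URI is returned in the Location header
    init = requests.post(UPLOAD_URL, json={"name": filename}, headers=auth)
    init.raise_for_status()
    session_uri = init.headers["Location"]
    # 2) send chunks until the server answers 200/201 instead of 308
    with open(filepath, "rb") as fp:
        start = 0
        while start < total:
            fp.seek(start)
            chunk = fp.read(CHUNK_SIZE)
            r = requests.put(
                session_uri,
                data=chunk,
                headers={**auth, "Content-Range": content_range(start, len(chunk), total)},
                allow_redirects=False,  # 308 here means "resume incomplete", not a redirect
            )
            if r.status_code == 308:
                # Range, e.g. "bytes=0-8388607", says what the server has stored so far
                stored = r.headers.get("Range")
                start = int(stored.rsplit("-", 1)[1]) + 1 if stored else 0
                print(f"{start}/{total} bytes uploaded")
                continue
            r.raise_for_status()
            return r.json()
    raise RuntimeError("upload session ended without a final response")
```

It is called like the script above: `resumable_upload("<access_token>", "<upload_filename>", "<path_to_file>")`.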
I have a YAML file, file.yaml, structured as follows:
index:
  - uid: "uid"
    name: "name"
    headline: "headline"
    overview: "overview"
    features: "features"
    instructions: "instructions"
    callback_url: "https://some-url.com/params"
    edit_url: "https://edit-url/params"
    uninstall_hook: "https://uninstall-url/params"
    svg:
    screenshot1:
    screenshot2:
    screenshot3:
I have to upload this information to an API endpoint with a PUT request. I first managed to do it with the following register.py script, which I simply run with python register.py:
import json
import requests
from pathlib import Path
import base64
import yaml

BASE_URL = "https://url.com"  # API Host
FILE_FOLDER = Path.cwd()  # Current working directory

if __name__ == "__main__":
    public_key = "<public_key>"
    private_key = "<private_key>"
    auth_key = "{}:{}".format(public_key, private_key).encode("utf-8")
    encodedKey = base64.b64encode(auth_key).decode("utf-8")
    headers = {"Authorization": f"Basic {encodedKey}", "Content-type": "application/json"}

    def update_app_info():
        infos_file = FILE_FOLDER / "file.yaml"
        with open(infos_file) as infos_file_data:
            yamlcontent = yaml.safe_load(infos_file_data)  # Parse file.yaml and produce a dictionary of it
            file_infos = yamlcontent["index"][0]  # retrieve the actual configuration information
            response = requests.put(
                f"{BASE_URL}/path/to/api_endpoint/{public_key}", data=json.dumps(file_infos), headers=headers
            )
            print(response)
            print(response.json())

    update_app_info()
That gives a 202 success response.
As you can see, I load the content of the YAML file as a dictionary and send it as data. I did it that way based on the format of the data returned by GET https://url.com/path/to/api_endpoint (a mock URL for illustration). Sending the dictionary file_infos seemed more appropriate and gives me a success response; sending the file itself or infos_file_data directly gave me errors, which I got past with the script above.
The problem appears when I fill in svg, screenshot1, screenshot2 and screenshot3, so that file.yaml becomes:
index:
  - uid: "uid"
    name: "name"
    headline: "headline"
    overview: "overview"
    features: "features"
    instructions: "instructions"
    callback_url: "https://some-url.com/params"
    edit_url: "https://edit-url/params"
    uninstall_hook: "https://uninstall-url/params"
    svg: "icon.svg"
    screenshot1: "screenshot1.png"
    screenshot2: "screenshot2.png"
    screenshot3: "screenshot3.png"
Now the response is:
<Response [400]>
{'error': {'message': {'svg': ['The submitted data was not a file. Check the encoding type on the form.'], 'screenshot1': ['The submitted data was not a file. Check the encoding type on the form.'], 'screenshot2': ['The submitted data was not a file. Check the encoding type on the form.'], 'screenshot3': ['The submitted data was not a file. Check the encoding type on the form.']}, 'code': 400}}
I've done multiple searches (1, 2, 3, 4, 5...), and applying what I found, plus fixing a few other errors, eventually got me to this:
import base64
import json
from pathlib import Path
import requests
import yaml
from requests_toolbelt.multipart.encoder import MultipartEncoder

BASE_URL = "https://url.com"  # API Host
FILE_FOLDER = Path.cwd()  # Current working directory

if __name__ == "__main__":
    public_key = "<public_key>"
    private_key = "<private_key>"
    auth_key = "{}:{}".format(public_key, private_key).encode("utf-8")
    encodedKey = base64.b64encode(auth_key).decode("utf-8")

    def update_app_info():
        infos_file = FILE_FOLDER / "file.yaml"
        with open(infos_file) as infos_file_data:
            yamlcontent = yaml.safe_load(infos_file_data)  # Parse file.yaml and produce a dictionary of it
            file_infos = yamlcontent["index"][0]  # retrieve the actual configuration information
            m = MultipartEncoder(fields=file_infos)
            # print(m.content_type)
            headers = {
                "Authorization": f"Basic {encodedKey}",
                "Content-Type": m.content_type,
            }
            response = requests.put(
                f"{BASE_URL}/path/to/api_endpoint/{public_key}",
                data=json.dumps(file_infos),
                headers=headers
            )
            print(response)
            print(response.json())

    update_app_info()
That also gives me a 202 success response, but the svg, screenshot1, screenshot2 and screenshot3 fields are not updated.
I can share more information if needed. Your help is very welcome.
I found additional resources that helped.
While trying to solve my issue, I found this. It turns out I hadn't written the files part the way it should be, and I was also passing data as a JSON string, which causes a ValueError: Data must not be a string. error. This was useful to get it fixed.
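That error comes from requests itself while it builds the multipart body, before anything is sent, so it can be reproduced offline. A small sketch with an in-memory stand-in for icon.svg (the URL is the question's mock endpoint): a JSON string as `data` next to `files` raises the ValueError, while a plain dict becomes ordinary form fields alongside the file part.

```python
import json

import requests

url = "https://url.com/path/to/api_endpoint/public_key"  # mock endpoint from the question
fields = {"name": "name", "headline": "headline"}
icon = ("icon.svg", b"<svg/>", "image/svg+xml")  # stands in for open("icon.svg", "rb")

# data as a JSON string + files: rejected while the body is being built
error = None
try:
    requests.Request("PUT", url, data=json.dumps(fields), files={"svg": icon}).prepare()
except ValueError as exc:
    error = str(exc)
print(error)  # Data must not be a string.

# data as a dict + files: a multipart/form-data body with text fields and a file part
ok = requests.Request("PUT", url, data=fields, files={"svg": icon}).prepare()
print(ok.headers["Content-Type"].split(";")[0])  # multipart/form-data
```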
Now, for what it's worth, here's the working script:
import base64
from pathlib import Path
import requests
import yaml

BASE_URL = "https://url.com"  # API Host
FILE_FOLDER = Path.cwd()  # Current working directory

if __name__ == "__main__":
    public_key = "<public_key>"
    private_key = "<private_key>"
    auth_key = "{}:{}".format(public_key, private_key).encode("utf-8")
    encodedKey = base64.b64encode(auth_key).decode("utf-8")

    def update_app_info():
        infos_file = FILE_FOLDER / "file.yaml"
        with open(infos_file) as infos_file_data:
            yamlcontent = yaml.safe_load(infos_file_data)  # Parse file.yaml and produce a dictionary of it
        if "index" in yamlcontent:
            file_infos = yamlcontent["index"][0]  # retrieve the actual configuration information
            # No explicit Content-Type: requests sets multipart/form-data (with its boundary) when files= is used
            headers = {
                "Authorization": f"Basic {encodedKey}",
            }
            files = {
                "svg": open("icon.svg", "rb"),
                "screenshot1": open("screenshot1.png", "rb"),
                "screenshot2": open("screenshot2.png", "rb"),
                "screenshot3": open("screenshot3.png", "rb"),
            }
            try:
                response = requests.put(
                    f"{BASE_URL}/path/to/api_endpoint/{public_key}", data=file_infos, files=files, headers=headers
                )
            finally:
                for f in files.values():
                    f.close()  # release the file handles
            print("\n", response)
            print("\n", response.headers)
            print("\n", response.json())

    update_app_info()
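The working script hardcodes the four file names even though file.yaml already lists them. A sketch that reads them from the YAML entry and closes the handles afterwards with `contextlib.ExitStack`; `split_fields`, `put_app_info` and `FILE_FIELDS` are hypothetical names, and unlike the script above it assumes the API wants those four keys only as file parts, not also as text fields.

```python
from contextlib import ExitStack
from pathlib import Path

import requests

FILE_FIELDS = ("svg", "screenshot1", "screenshot2", "screenshot3")


def split_fields(file_infos, folder):
    """Separate plain form fields from file fields whose YAML values are file names."""
    data = {k: v for k, v in file_infos.items() if k not in FILE_FIELDS}
    paths = {k: Path(folder) / file_infos[k] for k in FILE_FIELDS if file_infos.get(k)}
    return data, paths


def put_app_info(url, encoded_key, file_infos, folder):
    data, paths = split_fields(file_infos, folder)
    with ExitStack() as stack:
        # every file opened here is closed when the with-block ends
        files = {k: stack.enter_context(open(p, "rb")) for k, p in paths.items()}
        return requests.put(
            url,
            data=data,
            files=files,
            headers={"Authorization": f"Basic {encoded_key}"},
        )
```

With the script's variables: `put_app_info(f"{BASE_URL}/path/to/api_endpoint/{public_key}", encodedKey, file_infos, FILE_FOLDER)`.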
My Firebase Storage security rules are:
rules_version = '2';
service firebase.storage {
  match /b/{bucket}/o {
    // This will be defined for everything else
    match /{allPaths=**} {
      allow write: if request.auth != null;
      allow read: if request.auth != null;
    }
  }
}
I would like to send the request with an "authid" in it, so that the request gets verified.
I am using Python to make the requests.
Sample code:
import requests
import shutil

def fetch_RESTAPI(auth_token, url):
    headers = {"Authorization": "Bearer " + auth_token}
    # r = requests.get(url=url, stream=True, auth=HTTPBasicAuth('test#abc.com', 'somePASS#888'))
    # r = requests.get(url=url+"&auth="+auth_token, stream=True,auth=HTTPBasicAuth('test#abc.com', 'somePASS#888'))
    # stream=True leaves the body unread, so r.raw can still be copied to the file below
    r = requests.get(url=url, headers=headers, stream=True)
    path = "down.jpg"
    print(r)
    if r.status_code == 200:
        print("File Downloaded")
        with open(path, 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)
    else:
        print("Something went wrong")
NOTE: I created the sample users with the firebase-admin SDK in Python. Is there a way to generate the ID token in Python when the username and password are known in advance?
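A minimal sketch, assuming the Firebase Auth REST endpoint `accounts:signInWithPassword` and the project's Web API key (placeholder `api_key`): signing in with email and password returns an `idToken`, which is then sent to the Storage download URL with the `Bearer` header from the question. The object path is URL-encoded as one segment (`images/down.jpg` becomes `images%2Fdown.jpg`). `sign_in_request`, `storage_download_request` and `download` are hypothetical helpers, and whether your Storage setup accepts the token exactly this way is worth checking against the Firebase documentation.

```python
from urllib.parse import quote

import requests

SIGN_IN_URL = "https://identitytoolkit.googleapis.com/v1/accounts:signInWithPassword"


def sign_in_request(api_key, email, password):
    """Email/password sign-in; the JSON response carries the ID token as 'idToken'."""
    return requests.Request(
        "POST",
        SIGN_IN_URL,
        params={"key": api_key},
        json={"email": email, "password": password, "returnSecureToken": True},
    )


def storage_download_request(bucket, object_path, id_token):
    """GET the object's bytes; the object path is encoded as a single URL segment."""
    url = f"https://firebasestorage.googleapis.com/v0/b/{bucket}/o/{quote(object_path, safe='')}"
    return requests.Request(
        "GET",
        url,
        params={"alt": "media"},
        headers={"Authorization": "Bearer " + id_token},
    )


def download(api_key, email, password, bucket, object_path, dest):
    with requests.Session() as s:
        r = s.send(s.prepare_request(sign_in_request(api_key, email, password)))
        r.raise_for_status()
        id_token = r.json()["idToken"]
        req = s.prepare_request(storage_download_request(bucket, object_path, id_token))
        with s.send(req, stream=True) as r:
            r.raise_for_status()
            with open(dest, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
```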
I am trying to download a file from steamworkshopdownloader.io with requests, but it always returns a 500 error. What am I doing wrong? I am not very familiar with requests.
Code:
import requests

def downloadMap(map_id):
    session = requests.session()
    file = session.post("https://backend-02-prd.steamworkshopdownloader.io/api/details/file",
                        data={"publishedfileid": map_id})
    print(file)

downloadMap("814218628")
If you want to download a file from this API, try this code. It's adapted from the link in the comment I posted earlier (https://greasyfork.org/en/scripts/396698-steam-workshop-downloader/code) and converted into Python:
import requests
import json
import time

def download_map(map_id):
    s = requests.session()
    data = {
        "publishedFileId": map_id,
        "collectionId": None,
        "extract": True,
        "hidden": False,
        "direct": False,
        "autodownload": False
    }
    r = s.post('https://backend-01-prd.steamworkshopdownloader.io/api/download/request', data=json.dumps(data))
    print(r.json())
    uuid = r.json()['uuid']
    data = f'{{"uuids":["{uuid}"]}}'
    while True:
        r = s.post('https://backend-01-prd.steamworkshopdownloader.io/api/download/status', data=data)
        print(r.json())
        if r.json()[uuid]['status'] == 'prepared':
            break
        time.sleep(1)
    params = (('uuid', uuid),)
    r = s.get('https://backend-01-prd.steamworkshopdownloader.io/api/download/transmit', params=params, stream=True)
    print(r.status_code)
    with open(f'./{map_id}.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

download_map(814218628)
The code demonstrates how to use the API: it downloads a file named 814218628.zip (or whatever map_id was provided) into the directory the script is run from. The zip archive contains the .udk file (a game map created with the Unreal Development Kit).
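The status loop above keeps polling forever if the archive is never prepared. A small generic helper with a deadline, as a sketch; `poll_until` and its defaults are hypothetical choices, not part of the steamworkshopdownloader.io API.

```python
import time


def poll_until(check, timeout=120.0, interval=1.0):
    """Call check() until it returns something truthy; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

In `download_map`, the `while True` loop could become `poll_until(lambda: s.post(status_url, data=data).json()[uuid]['status'] == 'prepared')`, with `status_url` set to the `/api/download/status` URL.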
I'm a new programmer who has been writing scripts to automate work tasks.
Scope of Problem:
I get bi-monthly Excel reports from an outside vendor by email. The vendor uses ZixMail for encryption, which my company does not use. As a result, I have to read these emails in a Secure Mail Center website, where I log on with my username and password. I am trying to connect to this server and download the attachment files.
What I have tried:
Tried an IMAP connection to the "server" (I am not sure whether the website is a mail server).
Struck out many times, as I could never get a connection (if you have suggestions to try, please share).
Accessing the site over HTTP using sessions.
I am able to connect to the site, but when I .get the file and write it, the Excel file comes out blank and corrupted.
On the Mail Center website, clicking the link downloads the file automatically. I am not sure why this has to be so challenging.
The source code of the website where you download the file looks like this:
<a rel="external" href="/s/attachment?name=Random Letters and Numbers=emdeon" title="File Title.xlsx">
The href looks nothing like a normal URL and does not end in .xlsx or any other file extension, unlike most of the examples I have seen.
I'm really just looking for any ideas, thoughts, or solutions.
Here is my HTTP connection code:
import requests
import urllib.request
import shutil
import os

# Fill in your details here to be posted to the login form.
payload = {
    'em': 'Username',
    'passphrase': 'Password',
    'validationKey': 'Key'
}

# This reads your URL and returns whether the file is downloadable
def is_downloadable(URL_D):
    h = requests.head(URL_D, allow_redirects=True)
    header = h.headers
    content_type = header.get('content-type')
    if 'text' in content_type.lower():
        return False
    if 'html' in content_type.lower():
        return False
    return True

def download_file(URL_D):
    with requests.get(URL_D, stream=True) as r:
        r.raise_for_status()
        with open(FileName, 'wb') as f:
            for chunk in r.iter_content(chunk_size=None):
                if chunk:
                    f.write(chunk)
    return FileName

def Main():
    with requests.Session() as s:
        p = s.post(URL, data=payload, allow_redirects=True)
        print(is_downloadable(URL_D))
        download_file(URL_D)

if __name__ == '__main__':
    Path = "<path>"
    FileName = os.path.join(Path, "Testing File.xlsx")
    URL = 'login URL'
    URL_D = 'Attachment URL'
    Main()
is_downloadable(URL_D) returns False, and the Excel file is empty and corrupted.
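One likely reason for the empty file: `is_downloadable` and `download_file` call `requests.head`/`requests.get` directly, so they do not send the cookies that the `requests.Session` `s` received when logging in, and the server probably answers with an HTML login page instead of the spreadsheet. A sketch that reuses the logged-in session and refuses text/HTML responses; `looks_downloadable`, `download_with_session` and `main` are hypothetical names, and the form fields are still whatever the login page expects.

```python
import requests


def looks_downloadable(content_type):
    """False when the server says it is sending text or HTML (e.g. a login page)."""
    ct = (content_type or "").lower()
    return "text" not in ct and "html" not in ct


def download_with_session(session, url, dest):
    # Use the SAME session that posted the login form, so its cookies are sent along
    with session.get(url, stream=True, allow_redirects=True) as r:
        r.raise_for_status()
        if not looks_downloadable(r.headers.get("Content-Type")):
            raise RuntimeError("got a text/HTML page instead of the file; the login probably did not stick")
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return dest


def main(login_url, payload, attachment_url, dest):
    with requests.Session() as s:
        s.post(login_url, data=payload).raise_for_status()
        return download_with_session(s, attachment_url, dest)
```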
Here is my code for the IMAP attempt:
import email
import imaplib
import os

class FetchEmail():
    connection = None
    error = None

    def __init__(self, mail_server, username, password):
        self.connection = imaplib.IMAP4_SSL(mail_server, port=993)
        self.connection.login(username, password)
        self.connection.select('inbox', readonly=False)  # so we can mark mails as read

    def close_connection(self):
        """
        Close the connection to the IMAP server
        """
        self.connection.close()

    def save_attachment(self, msg, download_folder):
        att_path = "No attachment found."
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            att_path = os.path.join(download_folder, filename)
            if not os.path.isfile(att_path):
                with open(att_path, 'wb') as fp:
                    fp.write(part.get_payload(decode=True))
        return att_path

    def fetch_messages(self):
        emails = []
        (result, messages) = self.connection.search(None, "(ON 20-Nov-2020)")
        if result == "OK":
            for message in messages[0].split():  # IDs come back as one space-separated bytes string
                try:
                    ret, data = self.connection.fetch(message, '(RFC822)')
                except Exception:
                    print("No emails to read for date.")
                    self.close_connection()
                    exit()
                msg = email.message_from_bytes(data[0][1])
                if not isinstance(msg, str):
                    emails.append(msg)
                response, data = self.connection.store(message, '+FLAGS', '\\Seen')
            return emails
        self.error = "Failed to retrieve emails."
        return emails

def Main():
    p = FetchEmail(mail_server, username, password)
    for msg in p.fetch_messages():  # save_attachment handles one message at a time
        p.save_attachment(msg, download_folder)
    p.close_connection()

if __name__ == "__main__":
    mail_server = "Server"
    username = "username"
    password = "password"
    download_folder = "<path>"
    Main()
Error Message: TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
In case I wrote the IMAP script wrong, I also tried connecting over IMAP from the command prompt, with the same result.
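`WinError 10060` means nothing answered on the port before the timeout. A quick way to see whether a host accepts connections on the IMAP-over-SSL port at all, before debugging the script, is a plain TCP check; `port_open` is a hypothetical helper. If it returns False, the Secure Mail Center may simply be a web portal without an IMAP endpoint (an assumption, not something the question confirms).

```python
import socket


def port_open(host, port=993, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False
```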
To recap, all I am looking for is some pointers and ideas to solve this problem. Thank you!
For anyone who stumbles upon this because of a similar issue (probably no one, since I have a really weird habit of making simple things complicated):
I was able to solve the problem by using Selenium WebDriver to log in to the website and navigate it with the "click" mechanism. This was the only way I could successfully download the reports.
import time
import os
import re
import datetime
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

today = datetime.date.today()
first = today.replace(day=1)
year = today.strftime('%Y')
month = today.strftime('%B')
lastMonth = (first - datetime.timedelta(days=1)).strftime('%b')

def Main():
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)
    # Selenium 4: the driver path goes through a Service object, and the options must be passed in
    s = Chrome(service=Service("<path to chromedriver>"), options=chrome_options)
    s.get("Website login page")
    s.find_element(By.ID, "loginname").send_keys('username')
    s.find_element(By.ID, "password").send_keys('password')
    s.find_element(By.CLASS_NAME, "button").click()
    for i in range(50):
        s.get("landing page post login")
        n = str(i)
        subject = ("mailsubject" + n)
        sent = ("mailsent" + n)
        title = s.find_element(By.ID, subject).text
        date = s.find_element(By.ID, sent).text
        regex = "Bi Monthly"
        regex_pr = "PR"
        match = re.search(regex, title)
        match_pr = re.search(regex_pr, title)
        if match and not match_pr:
            match_m = re.search(r"(\D{3})", date)
            match_d = re.search(r"(\d{1,2})", date)
            day = int(match_d.group())
            m = (match_m.group(1))
            if (day <= 15) and (m == lastMonth):
                print("All up-to-date files have been downloaded")
                break
            else:
                name = ("messageItem" + n)
                s.find_element(By.ID, name).click()
                s.find_element(By.PARTIAL_LINK_TEXT, "xlsx").click()
        else:
            continue
    time.sleep(45)

if __name__ == "__main__":
    Main()
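The date check in the loop can be pulled out into plain functions and tested without opening a browser. A sketch, assuming the `mailsent` element text looks like `Dec 3` (month abbreviation, then day), which is what the regexes above expect; `previous_month_abbr` and `reached_old_reports` are hypothetical names.

```python
import datetime
import re


def previous_month_abbr(today):
    """Abbreviated name of the month before `today`, computed like lastMonth above."""
    first = today.replace(day=1)
    return (first - datetime.timedelta(days=1)).strftime("%b")


def reached_old_reports(date_text, last_month):
    """True for a report from the first half of last month, i.e. the loop can stop."""
    month = re.search(r"(\D{3})", date_text)
    day = re.search(r"(\d{1,2})", date_text)
    if not (month and day):
        return False  # unexpected date text: do not stop on it
    return int(day.group()) <= 15 and month.group(1) == last_month
```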