This is my first question, so please bear with me. I am working with an API that authenticates using an access token that expires after 15 minutes; there is no refresh token to use in lieu of a re-login. So far I have been able to get the access token and insert it into the requests.get call, but I cannot work out how to renew it.
All of the work with this API, and in general, is done in Python, so I am hoping to keep everything in Python and in the same file.
I get a 401 response code once the 15 minutes are up, and a 200 on success. My only ideas so far are: put the renewal on a timer (but I cannot make heads or tails of the Stack Overflow posts or the documentation on doing that), run the login in a separate script that this script calls for the current header variable (which would still require a timer), or call the login function again whenever response.status_code != 200.
Example script for getting the access token
import requests, os, json, time, csv

def login(url, payload):
    # this will log into the API and get an access token
    auth = requests.post(url, data=payload).json()
    sessionToken = auth["token"]
    sessionTimer = auth["validFor"]
    headers = {'Access-Token': sessionToken}
    return headers

# calling the function to generate the token
if __name__ == '__main__':
    url = "url inserted here"
    u = input("Enter your username: ")
    p = input("Enter your password: ")
    t = input("Enter your tenancy name: ")
    payload = {'username': u, 'password': p, 'tenant': t}
    print("Logging in")
    headers = login(url, payload)

    # the actual work, as pulled from a csv file
    valuables = input("CSV file with filepath: ")
    file = open(valuables, 'r', encoding='utf-8')
    csvin = csv.reader(file)
    for row in csvin:
        try:
            uuidUrl = row[0]
            output_file = row[1]
            response = requests.get(uuidUrl, headers=headers)
            print(response.status_code)
            with open(output_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
        except requests.exceptions.RequestException:
            print(output_file, "may have failed")
            login(url, payload)
            continue
I couldn't get it to recognize if response.status_code != 200: as a way to call back into login(), and I also couldn't get it to exit a while True: loop.
I apologize that I cannot give more details on accessing the API for others to try out; it is non-public.
Eventually I was able to figure out the answer to my own question. Posting this for later users; the updated snippet is below.
Short version of the story: response.status_code returns an integer, but I made the faulty assumption that it would be a string, so my comparison never matched.
for row in csvin:
    try:
        uuidUrl = row[0]
        xip_file = row[1]
        response = requests.get(uuidUrl, headers=headers)
        status = response.status_code
        print(status)
        if status == 401:
            print(xip_file, "may have failed, logging back in")
            headers = login(url, payload)
            response = requests.get(uuidUrl, headers=headers)
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
        else:
            with open(xip_file, 'wb') as fd:
                for chunk in response.iter_content(chunk_size=128):
                    fd.write(chunk)
    except requests.exceptions.RequestException:
        print(xip_file, "may have failed")
        headers = login(url, payload)
        continue
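For anyone adapting this later: the same retry-on-401 idea can be pulled into a small helper so the re-login logic lives in one place. This is only a sketch built on the login() function above; the helper name get_with_relogin is mine, not part of the API.

import requests

def get_with_relogin(uuid_url, headers, url, payload):
    # Fetch a URL; if the token has expired (401), log in again and retry once.
    response = requests.get(uuid_url, headers=headers)
    if response.status_code == 401:
        headers = login(url, payload)  # refresh the Access-Token header
        response = requests.get(uuid_url, headers=headers)
    return response, headers

# usage inside the csv loop:
# response, headers = get_with_relogin(uuidUrl, headers, url, payload)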
I am a new programmer who has been writing scripts to automate work responsibilities.
Scope of the problem:
I get bi-monthly Excel reports from an outside vendor, sent via email. This vendor uses ZixMail for encryption, which my company does not use. As a result, I have to access these emails through a Secure Mail Center, logging on to the Mail Center website with my username and password. I am trying to establish a connection to this server and download the attachment files.
What I have tried:
Tried an IMAP connection into the "server" (I am not sure whether the website is a mail server).
Struck out many times, as I could never get a connection (if there are suggestions to try, please share).
Accessing the site over HTTP using sessions.
I am able to connect to the site, but when I go to .get and .write the file, my Excel file comes back blank and corrupted.
On the Mail Center website, clicking the link/URL automatically downloads the file. I am not sure why this has to be so challenging.
The source code from the website where you download the file looks like:
<a rel="external" href="/s/attachment?name=Random Letters and Numbers=emdeon" title="File Title.xlsx">
The href looks nothing like a normal URL and does not end in .xlsx or any other file extension, unlike most of the examples I have seen.
I guess I am just really looking for any ideas, thoughts, or help with solutions.
Here is my HTTP connection code
import requests
import urllib.request
import shutil
import os

# Fill in your details here to be posted to the login form.
payload = {
    'em': 'Username',
    'passphrase': 'Password',
    'validationKey': 'Key'
}

# This reads your URL and returns whether the file is downloadable
def is_downloadable(URL_D):
    h = requests.head(URL_D, allow_redirects=True)
    header = h.headers
    content_type = header.get('content-type')
    if 'text' in content_type.lower():
        return False
    if 'html' in content_type.lower():
        return False
    return True

def download_file(URL_D):
    with requests.get(URL_D, stream=True) as r:
        r.raise_for_status()
        with open(FileName, 'wb') as f:
            for chunk in r.iter_content(chunk_size=None):
                if chunk:
                    f.write(chunk)
    return FileName

def Main():
    with requests.Session() as s:
        p = s.post(URL, data=payload, allow_redirects=True)
        print(is_downloadable(URL_D))
        download_file(URL_D)

if __name__ == '__main__':
    Path = "<path>"
    FileName = os.path.join(Path, "Testing File.xlsx")
    URL = 'login URL'
    URL_D = 'Attachment URL'
    Main()
is_downloadable(URL_D) returns False, and the Excel file comes out empty and corrupted.
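One observation, offered as a sketch rather than a confirmed fix: is_downloadable() and download_file() above call the bare requests module, so those requests run outside the logged-in session and likely get an HTML login page back, which would explain the blank, corrupted file. Reusing the session for the download and joining the relative href onto the site's base URL would look roughly like this (base_url and the href value are placeholders, and URL, payload and FileName are the same variables as in the script above):

from urllib.parse import urljoin
import requests

base_url = 'https://mail-center.example.com'  # placeholder for the Mail Center host
href = '/s/attachment?name=Random Letters and Numbers=emdeon'  # relative href from the page source

with requests.Session() as s:
    s.post(URL, data=payload, allow_redirects=True)  # log in first, as in Main()
    attachment_url = urljoin(base_url, href)  # turn the relative href into an absolute URL
    r = s.get(attachment_url, stream=True)  # reuse the session so the login cookies are sent
    r.raise_for_status()
    with open(FileName, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)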
Here is my code for the IMAP attempt:
import email
import imaplib
import os

class FetchEmail():
    connection = None
    error = None

    def __init__(self, mail_server, username, password):
        self.connection = imaplib.IMAP4_SSL(mail_server, port=993)
        self.connection.login(username, password)
        self.connection.select('inbox', readonly=False)  # so we can mark mails as read

    def close_connection(self):
        """
        Close the connection to the IMAP server
        """
        self.connection.close()

    def save_attachment(self, msg, download_folder):
        att_path = "No attachment found."
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            att_path = os.path.join(download_folder, filename)
            if not os.path.isfile(att_path):
                fp = open(att_path, 'wb')
                fp.write(part.get_payload(decode=True))
                fp.close()
        return att_path

    def fetch_messages(self):
        emails = []
        (result, messages) = self.connection.search(None, "(ON 20-Nov-2020)")
        if result == "OK":
            for message in messages[0].split():
                try:
                    ret, data = self.connection.fetch(message, '(RFC822)')
                except:
                    print("No emails to read for date.")
                    self.close_connection()
                    exit()
                msg = email.message_from_bytes(data[0][1])
                if isinstance(msg, str) == False:
                    emails.append(msg)
                response, data = self.connection.store(message, '+FLAGS', '\\Seen')
            return emails
        self.error = "Failed to retrieve emails."
        return emails

def Main():
    p = FetchEmail(mail_server, username, password)
    msgs = p.fetch_messages()
    for msg in msgs:
        p.save_attachment(msg, download_folder)
    p.close_connection()

if __name__ == "__main__":
    mail_server = "Server"
    username = "username"
    password = "password"
    download_folder = Path
    Main()
Error Message: TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Even if I wrote the IMAP script wrong, I also tried connecting over IMAP from the command prompt and got the same result.
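The timeout suggests the host or port is simply not reachable (a firewall, the wrong hostname, or the site not actually running an IMAP server). A quick way to test reachability independently of imaplib, sketched below with a placeholder hostname, is to open a plain TCP connection to port 993:

import socket

host = "mail-center.example.com"  # placeholder: the host you are trying to reach
port = 993  # standard IMAP-over-SSL port

try:
    with socket.create_connection((host, port), timeout=10):
        print("TCP connection to {}:{} succeeded".format(host, port))
except OSError as e:
    print("Could not reach {}:{} - {}".format(host, port, e))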
To recap, all I am looking for is some pointers and ideas to solve this problem. Thank you!
For anyone who stumbled upon this because of a similar issue (probably no one, since I have a really weird habit of making everything simple complicated): I was able to solve the problem by using the Selenium webdriver to log into the website and navigate through it using clicks. This was the only way I could successfully download the reports.
import time
import os
import re
import datetime
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

today = datetime.date.today()
first = today.replace(day=1)
year = today.strftime('%Y')
month = today.strftime('%B')
lastMonth = (first - datetime.timedelta(days=1)).strftime('%b')

def Main():
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)
    s = Chrome(executable_path="path to chromedriver", options=chrome_options)
    s.get("Website login page")
    s.find_element_by_id("loginname").send_keys('username')
    s.find_element_by_id("password").send_keys('password')
    s.find_element_by_class_name("button").click()
    for i in range(50):
        s.get("landing page post login")
        n = str(i)
        subject = ("mailsubject" + n)
        sent = ("mailsent" + n)
        title = s.find_element_by_id(subject).text
        date = s.find_element_by_id(sent).text
        regex = "Bi Monthly"
        regex_pr = "PR"
        match = re.search(regex, title)
        match_pr = re.search(regex_pr, title)
        if match and not match_pr:
            match_m = re.search(r"(\D{3})", date)
            match_d = re.search(r"(\d{1,2})", date)
            day = int(match_d.group())
            m = (match_m.group(1))
            if (day <= 15) and (m == lastMonth):
                print("All up to date files have been downloaded")
                break
            else:
                name = ("messageItem" + n)
                s.find_element_by_id(name).click()
                s.find_element_by_partial_link_text("xlsx").click()
        else:
            continue
        time.sleep(45)

if __name__ == "__main__":
    Main()
I have a .txt file that contains a list of URLs. The structure of the URLs varies - some may begin with https, some with http, others with just www and others with just the domain name (stackoverflow.com). So an example of the .txt file content is:-
www.google.com
microsoft.com
https://www.yahoo.com
http://www.bing.com
What I want to do is parse through the list and check whether the URLs are live. In order to do that, the structure of the URL must be correct, otherwise the request will fail. Here's my code so far:-
import requests

with open('urls.txt', 'r') as f:
    urls = f.readlines()

for url in urls:
    url = url.replace('\n', '')
    if not url.startswith('http'):  # This is to handle bare domain names and those that begin with 'www'
        url = 'http://' + url
    if url.startswith('http:'):
        print("trying url {}".format(url))
        response = requests.get(url, timeout=10)
        status_code = response.status_code
        if status_code == 200:
            continue
        else:
            print("URL {} has a response code of {}".format(url, status_code))
            print("encountered error. Now trying with https")
            url = url.replace('http://', 'https://')
            print("Now replacing http with https and trying again")
            response = requests.get(url, timeout=10)
            status_code = response.status_code
            print("URL {} has a response code of {}".format(url, status_code))
    else:
        response = requests.get(url, timeout=10)
        status_code = response.status_code
        print("URL {} has a response code of {}".format(url, status_code))
I feel like I've overcomplicated this somewhat, and there must be an easier way of trying the variants (i.e. bare domain, domain with 'www' at the beginning, with 'http://' at the beginning, and with 'https://' at the beginning) until a site is identified as live or not (i.e. all variants have been exhausted).
Any suggestions on my code, or a better way to approach this? In essence, I want to handle the formatting of the URL so that I can then check its status.
Thanks in advance
This is a little too long for a comment, but yes, it can be simplified, starting by replacing the startswith part:
if '//' not in url:
    url = 'http://' + url
response = requests.get(url, timeout=10)
etc.
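Building on that, a minimal sketch of one way to try the scheme variants in order and stop at the first one that responds; the helper name check_url and the candidate order are illustrative, not part of the original post:

import requests

def check_url(line, timeout=10):
    # Try https first, then http, for a line that may or may not include a scheme.
    bare = line.strip()
    candidates = [bare] if '//' in bare else ['https://' + bare, 'http://' + bare]
    for candidate in candidates:
        try:
            response = requests.get(candidate, timeout=timeout)
            return candidate, response.status_code
        except requests.exceptions.RequestException:
            continue  # this variant failed, try the next one
    return bare, None  # nothing responded

with open('urls.txt', 'r') as f:
    for line in f:
        if line.strip():
            url, status = check_url(line)
            print("{} -> {}".format(url, status))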
So, as it stands, I am able to get the content of the webpage at the PDF link (EXAMPLE OF THE LINK HERE). But I don't want the content of the webpage; I want the content of the PDF itself, so I can save it into a PDF file in a folder on my computer.
I have been successful in doing this on sites that I don't need to log into and without a proxy server.
Relevant CODE:
import os
import urllib2
import time
import requests
import urllib3
from random import *
s = requests.Session()
data = {"Username":"username", "Password":"password"}
url = "https://login.url.com"
print "doing things"
r2 = s.post(url, data=data, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#I get a response 200 from printing r2
print r2
download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"
file = open("F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)
#This prints out the response from the proxy server (i.e. 200)
print temp
something = uniform(5,6)
print something
time.sleep(something)
#This gets me the content of the web page, not the content of the PDF
print temp.content
file.write(temp.content)
file.close()
I need help figuring out how to "download" the content of the PDF
try this:
import requests
url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'
pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
Edit
Try again with a requests session to manage cookies (assuming they send you those after login), and maybe also a different proxy:
proxy_dict = {'https': 'ip:port'}

with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # data variable should hold user/password information
    auth = session.get(login_url, data=data, proxies=proxy_dict, verify=False)
    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)  # we're continuing the same session
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))
I'm trying to write a tiny piece of software that logs into mintos.com and saves the account overview page (which is displayed after a successful login) to an HTML file. I tried several different approaches, and this is my current version.
import requests
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
username = 'abc'
password = '123'
loginUrl = 'https://www.mintos.com/en/login'
resp = requests.get(loginUrl, auth=(username, password))
file = codecs.open("mint.html", "w", "UTF-8")
file.write(resp.text)
file.close()
When I run the code, I only save the original page, not the one I should get when logged in. I guess I'm messing up the login (I mean...there's not much else to mess up). I spent an embarrassing amount of time on this problem already.
Edit:
I also tried something along the lines of:
import requests
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
loginUrl = "https://www.mintos.com/en/login"
username = "abc"
password = "123"
payload = {"username": username, "password": password}
with requests.session() as s:
    resp = s.post(loginUrl, data=payload)
    file = codecs.open("mint.html", "w", "UTF-8")
    file.write(resp.text)
    file.close()
Edit 2: Another non-working version, this time with _csrf_token:
with requests.session() as s:
    resp = s.get(loginUrl)
    toFind = '_csrf_token" value="'
    splited = resp.text.split(toFind)[1]
    _csrf_token = splited.split('"', 1)[0]
    payload = {"_username": username, "_password": password, "_csrf_token": _csrf_token}
    final = s.post(loginUrl, data=payload)
    file = codecs.open("mint.html", "w", "UTF-8")
    file.write(final.text)
    file.close()
But I still get the same result. The downloaded page has the same token as the one I extract, though.
Final Edit: I made it work, and I feel stupid now. I needed to use 'https://www.mintos.com/en/login/check' as my loginUrl.
The auth parameter is just a shorthand for HTTPBasicAuth, which is not what most websites use. Most of them use cookies or session data to store your login info on your computer, so they can check who you are while you browse the pages.
If you want to log in to the website, you'll have to make a POST request to the login form and then store (and send back on every request) the cookies they give you. This also assumes they don't have any kind of "anti-bot filter" (which would prevent you from logging in without a real browser, or at least make it much harder).
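Putting that together with the asker's final edit, a minimal sketch of the working flow might look like the following: fetch the login page with a Session (so its cookies are stored), pull out the CSRF token, and POST the credentials to the /en/login/check endpoint. The form field names and the token extraction mirror the snippets above and are assumptions about the site's form, not something verified here:

import requests

login_page = "https://www.mintos.com/en/login"
check_url = "https://www.mintos.com/en/login/check"  # endpoint from the asker's final edit
username = "abc"
password = "123"

with requests.Session() as s:
    # GET the login page so the session picks up its cookies and we can read the CSRF token
    resp = s.get(login_page)
    marker = '_csrf_token" value="'
    csrf_token = resp.text.split(marker)[1].split('"', 1)[0]

    payload = {"_username": username, "_password": password, "_csrf_token": csrf_token}
    final = s.post(check_url, data=payload)  # cookies from the GET are sent automatically

    with open("mint.html", "w", encoding="utf-8") as f:
        f.write(final.text)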
I'm trying to get all users' information from the GitHub API using the Python Requests library. Here is my code:
import requests
import json
url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}
r = requests.get(url, headers=headers)
users = r.json()
with open('users.json', 'w') as outfile:
    json.dump(users, outfile)
I can now dump the first page of users into a JSON file. I can also find the 'next' page's URL:
next_url = r.links['next'].get('url')
r2 = requests.get(next_url, headers=headers)
users2 = r2.json()
Since I don't know how many pages there are, how can I append the 2nd, 3rd, ... pages to 'users.json' sequentially in a while loop, as fast as possible?
Thanks!
First, you need to open the file in 'a' mode, otherwise each subsequent write will overwrite everything.
import requests
import json

url = 'https://api.github.com/users'
token = "my_token"
headers = {'Authorization': 'token %s' % token}

outfile = open('users.json', 'a')
while True:
    r = requests.get(url, headers=headers)
    users = r.json()
    json.dump(users, outfile)
    url = r.links.get('next', {}).get('url')
    # I don't know what GitHub returns when there are no more users, so double check this yourself
    if not url:
        break
outfile.close()
Append the data you get from the requests query to a list and move on to the next query.
Once you have all of the data you want, concatenate it into a file or an object. You can also use threading to run multiple queries in parallel, but most likely the API will rate-limit you.
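As a sketch of that accumulate-then-write approach, reusing the url and headers from the question (the empty-page stop condition is an assumption about the GitHub users endpoint, so double-check it):

import requests
import json

url = 'https://api.github.com/users'
headers = {'Authorization': 'token %s' % "my_token"}

all_users = []
while url:
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    page = r.json()
    if not page:
        break  # empty page: nothing left to fetch
    all_users.extend(page)  # accumulate in memory
    url = r.links.get('next', {}).get('url')  # None when there is no next page

# write once at the end, so users.json stays a single valid JSON array
with open('users.json', 'w') as outfile:
    json.dump(all_users, outfile)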