Downloading PDFs and tracking downloads with Python

I'm creating an application that downloads PDFs from a website and saves them to disk. I understand the Requests module can do this, but that it can't handle the logic around the download (file size, progress, time remaining, etc.).
I've built the program with Selenium so far and would like to eventually incorporate it into a Tkinter GUI app.
What would be the best way to handle the downloading, tracking and eventually creating a progress bar?
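For what it's worth, requests can stream a response in chunks (`stream=True` plus `iter_content`), so the file size, progress and time-remaining logic can live in your own code. Below is a minimal sketch, assuming the server sends a `Content-Length` header. If it doesn't, the total is unknown and only bytes-so-far can be reported. `Progress`, `copy_with_progress` and `download` are hypothetical names. The `on_progress` callback is where a Tkinter progress bar could later be updated. Tkinter widgets should only be touched from the GUI thread, so if the download runs in a worker thread, hand updates over with `root.after`.

```python
import io
import time
from dataclasses import dataclass
from typing import BinaryIO, Callable, Iterable, Optional


@dataclass
class Progress:
    done: int               # bytes written so far
    total: Optional[int]    # total bytes, or None if the size is unknown
    elapsed: float          # seconds since the copy started

    @property
    def percent(self) -> Optional[float]:
        if not self.total:
            return None
        return 100.0 * self.done / self.total

    @property
    def eta(self) -> Optional[float]:
        # Time remaining = remaining bytes / average speed so far.
        if not self.total or self.done == 0 or self.elapsed <= 0:
            return None
        speed = self.done / self.elapsed
        return (self.total - self.done) / speed


def copy_with_progress(chunks: Iterable[bytes], out: BinaryIO,
                       total: Optional[int],
                       on_progress: Callable[[Progress], None]) -> int:
    """Write chunks to `out`, reporting progress after each non-empty chunk."""
    start = time.monotonic()
    done = 0
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        out.write(chunk)
        done += len(chunk)
        on_progress(Progress(done, total, time.monotonic() - start))
    return done


def download(url: str, path: str, on_progress: Callable[[Progress], None],
             session=None) -> int:
    """Stream `url` to `path` using requests (imported only when called)."""
    import requests
    http = session if session is not None else requests.Session()
    with http.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        length = resp.headers.get("Content-Length")
        total = int(length) if length else None
        with open(path, "wb") as fh:
            return copy_with_progress(resp.iter_content(chunk_size=8192),
                                      fh, total, on_progress)


if __name__ == "__main__":
    # Offline demo: fake chunks stand in for a real HTTP response.
    buf = io.BytesIO()
    copy_with_progress([b"ab", b"", b"cde"], buf, 5,
                       lambda p: print(f"{p.done}/{p.total} bytes ({p.percent:.0f}%)"))
```

Keeping the copy loop independent of requests makes it easy to test offline and to reuse the same callback for the console now and a GUI later.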
This is my code so far:
from selenium import webdriver
from time import sleep
import requests
import secrets  # the asker's own secrets.py holding username and password
class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """
    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')
    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')
        sleep(1)
        # Locate elements and enter login details
        # (Selenium 3 API; Selenium 4 uses find_element(By.XPATH, ...) instead)
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)
        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)
        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()
        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()
    def download_file(self):
        """Access file tree and navigate to PDFs and download"""
        # Wait for all elements to load
        sleep(3)
        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)
        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()
bot = manual_grabber()
bot.login()
bot.download_file()
So in summary, I'd like this code to download PDFs from a website, store them in a specific directory (named after their parent folder in the jQuery File Tree) and keep track of the progress (file size, time remaining, etc.).
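Because the PDFs sit behind the Selenium login, a plain requests call won't be authenticated on its own. One common approach, assumed here rather than anything the site documents, is to copy the browser's cookies from `driver.get_cookies()` into a `requests.Session` with `session.cookies.set(name, value)`. The sketch below keeps that conversion and the per-folder path building as plain functions. `cookies_from_selenium`, `safe_name` and `target_path` are hypothetical names, and the folder name is assumed to be the text of the jQuery File Tree node you clicked:

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Mapping

# Characters Windows does not allow in file or folder names.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def cookies_from_selenium(cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Turn driver.get_cookies() output (a list of dicts) into {name: value}."""
    return {str(c["name"]): str(c["value"]) for c in cookies}


def safe_name(name: str) -> str:
    """Replace invalid characters and trim trailing dots/spaces."""
    cleaned = _INVALID.sub("_", name).strip().rstrip(". ")
    return cleaned or "_"


def target_path(base_dir: str, folder: str, file_name: str) -> Path:
    """Build base_dir/<folder>/<file_name> from names scraped off the page."""
    return Path(base_dir) / safe_name(folder) / safe_name(file_name)
```

In use, you would loop over the returned dict calling `session.cookies.set(name, value)`, then call `path.parent.mkdir(parents=True, exist_ok=True)` before opening the file for writing.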
Here is the DOM:
I hope this is enough information. If anything else is required, please let me know.

How to open web API calls to the default browser window in PyQt5 Python

I have a browser built in Python with PyQt5, and I want to implement the following: whenever the browser receives an API call from a website to some other service, it should open that call in the default browser.
For example, when we log in to a website with Google and click the Google option, a new window opens where we select our Google account. That's the functionality I want to implement.
Find the complete code here: https://pastebin.com/41n9eghQ
class WebPage(QWebEnginePage):
    linkClicked = Signal(QUrl)
    def acceptNavigationRequest(self, url, navigation_type, isMainFrame):
        if navigation_type == QWebEnginePage.NavigationTypeLinkClicked:
            self.linkClicked.emit(url)
            return False
        return super(WebPage, self).acceptNavigationRequest(
            url, navigation_type, isMainFrame)
The class above, from the full code, accepts and processes the in-browser navigation requests.
def createWindow(self, webwindowtype):
    import webbrowser
    try:
        webbrowser.open(to_text_string(self.url().toString()))
    except ValueError:
        pass
The function above opens the window in the default browser whenever an API call is triggered. The only problem is that I'm currently passing the URL of the current website, i.e. self.url(), not the URL of the API call, and I don't know how to get that.
Is there a way to capture the API request and pass its URL to the webbrowser.open() function?
Any help would be appreciated, thanks for your attention!

Windows toast notifications with actions using the Python winrt module

I've been trying to get this working for a long time now and I always get stuck at detecting button presses. I made a toast notification that looks like this:
Here's my code:
import winrt.windows.ui.notifications as notifications
import winrt.windows.data.xml.dom as dom
app = '{1AC14E77-02E7-4E5D-B744-2EB1AE5198B7}\\WindowsPowerShell\\v1.0\\powershell.exe'
#create notifier
nManager = notifications.ToastNotificationManager
notifier = nManager.create_toast_notifier(app)
#define your notification as string
tString = """
<toast>
<visual>
<binding template='ToastGeneric'>
<text>New notifications</text>
<text>Text</text>
<text>Second text</text>
</binding>
</visual>
<actions>
<action
content="test1"
arguments="test1"
activationType="backround"/>
<action
content="test2"
arguments="test2"
activationType="backround"/>
</actions>
</toast>
"""
print(type(notifier.update))
#convert notification to an XmlDocument
xDoc = dom.XmlDocument()
xDoc.load_xml(tString)
#display notification
notifier.show(notifications.ToastNotification(xDoc))
I don't know how to detect button presses.
The only thing I figured out is that if I change a button's arguments to a link like this:
arguments="https://google.com"
then clicking the button opens the link.
Is there any way I could implement this? Or is there documentation for the XML format these toast notifications use that explains how arguments work?
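On the XML side, Microsoft documents this as the toast content schema (`<toast>`, `<visual>`, `<actions>`, `<action>`). `activationType` accepts values such as `foreground`, `background` and `protocol`. The XML in the question spells it `backround`, which isn't one of the documented values. One way to avoid such typos is to generate the string with `xml.etree.ElementTree` and check the activation type first. A minimal sketch; `build_toast_xml` is a hypothetical helper, and the allowed set is limited to the three values named here:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# activationType values used in this sketch (the schema also has "system").
ALLOWED_ACTIVATION = {"foreground", "background", "protocol"}


def build_toast_xml(lines: Iterable[str],
                    actions: Iterable[Tuple[str, str, str]]) -> str:
    """Build a ToastGeneric toast; actions are (content, arguments, activationType)."""
    toast = ET.Element("toast")
    binding = ET.SubElement(ET.SubElement(toast, "visual"), "binding",
                            template="ToastGeneric")
    for line in lines:
        ET.SubElement(binding, "text").text = line
    actions_el = ET.SubElement(toast, "actions")
    for content, arguments, activation in actions:
        if activation not in ALLOWED_ACTIVATION:
            raise ValueError(f"unknown activationType: {activation!r}")
        ET.SubElement(actions_el, "action", content=content,
                      arguments=arguments, activationType=activation)
    return ET.tostring(toast, encoding="unicode")
```

The returned string can be passed to `xDoc.load_xml(...)` in the same way as the hand-written `tString`.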
Alright, so I know it's been a while, but I was trying to figure out the same thing and couldn't find a good, conclusive answer anywhere. I've finally gotten something to work with WinRT in Python 3.9, so I wanted there to be an answer somewhere that people could find!
So to start, I'm not intimately familiar with how the 'arguments' attribute works, but it doesn't seem to matter for simple use cases. Most of what I know came from the Windows toast docs. Here's some code that should produce a notification and open your Documents folder when you click the button. I got a head start from an answer in this thread, but it was missing some very important steps.
import os, sys, time
import subprocess
import threading
import winrt.windows.ui.notifications as notifications
import winrt.windows.data.xml.dom as dom
# this is not called on the main thread!
def handle_activated(sender, _):
    path = os.path.join(os.path.expanduser("~"), "Documents")
    subprocess.Popen('explorer "{}"'.format(path))
def test_notification():
    # define your notification as a string
    tString = """
    <toast duration="short">
        <visual>
            <binding template='ToastGeneric'>
                <text>New notifications</text>
                <text>Text</text>
                <text>Second text</text>
            </binding>
        </visual>
        <actions>
            <action
                content="Test Button!"
                arguments=""
                activationType="foreground"/>
        </actions>
    </toast>
    """
    # convert notification to an XmlDocument
    xDoc = dom.XmlDocument()
    xDoc.load_xml(tString)
    notification = notifications.ToastNotification(xDoc)
    # register the activation handler
    notification.add_activated(handle_activated)
    # create notifier
    nManager = notifications.ToastNotificationManager
    # link it to your Python executable (or whatever you want I guess?)
    notifier = nManager.create_toast_notifier(sys.executable)
    # display notification
    notifier.show(notification)
    duration = 7  # "short" duration for Toast notifications
    # We have to wait for the results from the notification
    # If we don't, the program will just continue and maybe even end before a button is clicked
    thread = threading.Thread(target=lambda: time.sleep(duration))
    thread.start()
    print("We can still do things while the notification is displayed")
if __name__ == "__main__":
    test_notification()
The key thing to note here is that you need a way to wait for the response to the notification, since the notification is handled by a different thread than the program that produces it. This is why your "https://google.com" example worked while the others didn't: opening a link doesn't involve the Python program at all.
There's likely a more elegant solution, but a quick and easy way is to create a Python thread that waits for a duration. Because it's a regular (non-daemon) thread, the interpreter waits for it to finish before exiting, which keeps the process alive long enough to receive the callback, and it doesn't interfere with the rest of your program in case you need to be doing something else. If you want your program to wait for a response, use time.sleep(duration) without the threading code to pause the whole program.
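A possible alternative to sleeping for a fixed duration is a `threading.Event` that the activation callback sets. The main thread then wakes as soon as the user clicks, or gives up after a timeout. A minimal sketch; `ActivationWaiter` is a hypothetical name, and the `(sender, args)` signature mirrors `handle_activated` above:

```python
import threading
from typing import Any, Optional


class ActivationWaiter:
    """Lets the main thread block until a toast activation callback fires."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self.last_args: Optional[Any] = None

    def handler(self, sender: Any, args: Any) -> None:
        # Runs on a WinRT worker thread: record the args and wake the waiter.
        self.last_args = args
        self._event.set()

    def wait(self, timeout: float) -> bool:
        """Return True if activated within `timeout` seconds, else False."""
        return self._event.wait(timeout)
```

In `test_notification()` that would mean `waiter = ActivationWaiter()`, `notification.add_activated(waiter.handler)` and `notifier.show(notification)`, followed by `if waiter.wait(duration): ...` in place of the sleeping thread. The follow-up work, such as opening Explorer, then runs on the main thread.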
I'm not sure how it works exactly, but it seems like the add_activated function just assigns a callback handler to the next available block in the XML. So if you wanted to add another button, my guess is that you can just do add_activated with another callback handler in the same order as you've listed your buttons.
Edit: I played around with it some more, and it turns out this lets you click anywhere on the notification, not just on the button. I'm not sure where to go from there, but it's worth a heads-up.
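Regarding the guess above: as far as I can tell from the Windows docs, the `Activated` event fires once per activation, whichever button (or the toast body) was clicked. The event args carry an `Arguments` string. For a button, that string comes from the clicked `<action>`'s `arguments` attribute. For the body, it comes from the toast's `launch` attribute. That would explain why clicking anywhere triggers the single handler. It also suggests giving each button a distinct `arguments` value and routing on it, instead of registering several handlers. How you read that string from the Python `args` object depends on your winrt package; it typically involves casting to `ToastActivatedEventArgs`. The sketch below therefore only shows the routing step, using a hypothetical `make_dispatcher`:

```python
from typing import Callable, Dict


def make_dispatcher(routes: Dict[str, Callable[[], None]],
                    default: Callable[[], None]) -> Callable[[str], None]:
    """Return a function that runs the handler registered for an arguments string."""
    def dispatch(arguments: str) -> None:
        # Unknown strings (e.g. a click on the toast body) fall back to `default`.
        routes.get(arguments, default)()
    return dispatch


if __name__ == "__main__":
    dispatch = make_dispatcher({"test1": lambda: print("button 1"),
                                "test2": lambda: print("button 2")},
                               default=lambda: print("toast body"))
    dispatch("test2")  # prints "button 2"
    dispatch("")       # prints "toast body"
```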

Navigate to new page hosted on bokeh server from within bokeh app

So I'm writing an application that runs on the Bokeh server, and I'm having difficulty navigating between the different pages being hosted. Let's say I have this simple class that loads a single button:
class navigateWithButton(HBox):
    extra_generated_classes = [["navigateWithButton", "navigateWithButton", "HBox"]]
    myButton = Instance(Button)
    inputs = Instance(VBoxForm)
    @classmethod
    def create(cls):
        obj = cls()
        obj.myButton = Button(
            label="Go"
        )
        obj.inputs = VBoxForm(
            children=[
                obj.myButton
            ]
        )
        obj.children.append(obj.inputs)
        return obj
    def setup_events(self):
        super(navigateWithButton, self).setup_events()
        if not self.myButton:
            return
        self.myButton.on_click(self.navigate)
    def navigate(self, *args):
        ###################################################################
        ### want to redirect to 'http://localhost:5006/other_app' here! ###
        ###################################################################
        pass
and further down I have, as would be expected:
@bokeh_app.route("/navigate/")
@object_page("navigate")
def navigate_button_test():
    nav = navigateWithButton.create()
    return nav
Along with a route to an additional app I've created in the same script:
@bokeh_app.route("/other_app/")
@object_page("other_app")
def some_other_app():
    app = otherApp.create()
    return app
Running this code, I can (obviously) navigate between the two applications just by typing in the address, and they both work beautifully, but I cannot for the life of me find an example of programmatically navigating between the two pages. I'm certain the answer is simple and I must be overlooking something very obvious, but if someone could tell me precisely where I am being ridiculous, or whether I'm barking waaay up the wrong tree, I'd be extremely appreciative!!
And please bear in mind: I'm certain there are better ways of doing this, but I'm tasked with finishing inherited code and I'd like to find a solution before having to rebuild from scratch.

Managing multiple instances of Selenium in Python

I am trying to manage multiple instances of Selenium at the same time, but haven't had much luck, and I'm not 100% sure it's possible. I have an application with a GUI built with PyQt that retrieves our clients' information from our SQL database. It's a fairly simple app that lets our users easily log in and out of our clients' accounts. They click the client's name and press "Login"; it launches an instance of Firefox, logs into the account, and stays open so the user can do whatever they need to do. When they are done, they click the "Logout" button, and it logs out of the account and quits the webdriver instance.
What I'm trying to provide is a way for them to log into multiple accounts at once, while still maintaining the ability to click one of the client's names that they are logged into, process the logout on that account, and close that browser instance.
One thing I was hoping for is to be able to control each webdriver by a process ID or some other unique ID that I could store in a dictionary linked to that client. Then, when users click a client's name in the app and press Logout, the app would use something in PyQt like "client_name = self.list_item.currentItem().text()" to get the name of the selected client (which I'm already using for other things, too), find the unique ID or process ID, send the logout command to that instance, and then close that instance.
This may not be the best way to go about doing it, but it's the only thing I could think of.
EDIT: I also know that you can retrieve the Selenium session ID with driver.session_id (assuming your webdriver instance is assigned to 'driver'), but I have seen nothing so far about controlling a webdriver instance by this session_id.
EDIT2: Here is an incredibly stripped down version of what I have:
import sys
from selenium import webdriver
from PyQt4 import QtGui, QtCore
# client_names and get_credentials() are defined elsewhere (stripped for brevity)
class ClientAccountManager(QtGui.QMainWindow):
    def __init__(self):
        super(ClientAccountManager, self).__init__()
        grid = QtGui.QGridLayout()
        # Creates the list box
        self.client_list = QtGui.QListWidget(self)
        # Populates the list box with owner data
        for name in client_names.itervalues():
            item = QtGui.QListWidgetItem(name)
            self.client_list.addItem(item)
        # Creates the login button
        login_btn = QtGui.QPushButton("Login", self)
        login_btn.connect(login_btn, QtCore.SIGNAL('clicked()'), self.login)
        # Creates the logout button
        logout_btn = QtGui.QPushButton("Logout", self)
        logout_btn.connect(logout_btn, QtCore.SIGNAL('clicked()'), self.logout)
    def login(self):
        # Finds the owner info based on who is selected
        client_name = self.client_list.currentItem().text()
        client_username, client_password = get_credentials(client_name)
        # Creates browser instance (a local variable, so logout() cannot see it)
        driver = webdriver.Firefox()
        # Logs in
        driver.get('https://www.....com/login.php')
        driver.find_element_by_id('userNameId').send_keys(client_username)
        driver.find_element_by_id('passwordId').send_keys(client_password)
        driver.find_element_by_css_selector('input[type=submit]').click()
    def logout(self):
        # Finds the owner info based on who is selected
        client_name = self.client_list.currentItem().text()
        # Logs out: `driver` is undefined here, which is the open question
        driver.get('https://www....com/logout.php')
        # Closes the browser instance
        driver.quit()
def main():
    app = QtGui.QApplication(sys.argv)
    cpm = ClientAccountManager()
    cpm.show()
    sys.exit(app.exec_())
if __name__ == '__main__':
    main()
You can have multiple drivers. Just call webdriver.Firefox() multiple times and keep references to each driver. Some people report oddball behavior, but it basically works.
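driver.close() closes the current browser window and driver.quit() shuts down the whole browser session; neither takes an id, so you call them on the driver object you kept a reference to.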
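To follow that suggestion in the PyQt app, the drivers can live in a dictionary keyed by client name, so `logout()` looks up the right instance instead of relying on a local `driver` variable. A minimal sketch; `DriverRegistry` is a hypothetical helper. The factory is injected: in the real app you would pass `webdriver.Firefox`, and in tests any object with a `quit()` method works.

```python
from typing import Any, Callable, Dict, Optional


class DriverRegistry:
    """Keeps one webdriver instance per client name."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory          # e.g. webdriver.Firefox
        self._drivers: Dict[str, Any] = {}

    def open(self, client_name: str) -> Any:
        """Return the client's driver, starting a new browser if needed."""
        if client_name not in self._drivers:
            self._drivers[client_name] = self._factory()
        return self._drivers[client_name]

    def get(self, client_name: str) -> Optional[Any]:
        """Return the client's driver, or None if no browser is open for them."""
        return self._drivers.get(client_name)

    def close(self, client_name: str) -> bool:
        """Quit and forget the client's driver; False if none was open."""
        driver = self._drivers.pop(client_name, None)
        if driver is None:
            return False
        driver.quit()                    # shuts down that browser instance
        return True

    def close_all(self) -> None:
        for name in list(self._drivers):
            self.close(name)

    def __len__(self) -> int:
        return len(self._drivers)
```

In `ClientAccountManager.__init__` you could create `self.drivers = DriverRegistry(webdriver.Firefox)`. `login()` then starts with `driver = self.drivers.open(client_name)`, followed by the existing `driver.get(...)` calls. `logout()` fetches `self.drivers.get(client_name)`, visits the logout page if the result isn't `None`, and finishes with `self.drivers.close(client_name)`.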

HTML page vastly different when using a headless webkit implementation using PyQT

I was under the impression that using a headless WebKit implementation with PyQt would automatically get me the full HTML for each URL, even with heavy JS in it. But I am only getting part of it, compared with the page I get when I save it from a Firefox window.
I am using the following code:
class JabbaWebkit(QWebPage):
    # 'html' is a class variable
    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''
        if wait:
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            self.loadFinished.connect(app.quit)
        self.mainFrame().load(QUrl(url))
    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()
    def userAgentForUrl(self, url):
        return USER_AGENT
def get_page(url, wait=None):
    # here is the trick how to call it several times
    app = QApplication.instance()  # checks if QApplication already exists
    if not app:  # create QApplication if it doesn't exist
        app = QApplication(sys.argv)
    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html
Can someone see anything obviously wrong with the code?
After running the code through a few URLs, here is one I found that shows the problems I am running into quite clearly - http://www.chilis.com/EN/Pages/menu.aspx
Thanks for any pointers.
The page has AJAX code: when it finishes loading, it still needs some time to update the page via AJAX, but your code quits as soon as loading finishes.
You should add some code like this to wait a while and process WebKit events:
import time
for i in range(200):  # wait about 2 seconds
    app.processEvents()
    time.sleep(0.01)
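The loop above waits roughly two seconds (200 × 0.01 s, plus event-processing time). That may be too short for some pages and wasteful for others. A variation is to keep pumping events until a condition holds or a timeout expires. A minimal sketch, assuming you can express "the AJAX content has arrived" as a predicate; `pump_events_until` is a hypothetical helper, and in the Qt code `process_events` would be `app.processEvents`:

```python
import time
from typing import Callable


def pump_events_until(process_events: Callable[[], None],
                      done: Callable[[], bool],
                      timeout: float = 10.0,
                      interval: float = 0.01) -> bool:
    """Call process_events() until done() is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        process_events()                 # let the page run its AJAX callbacks
        if done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, after the page's `loadFinished` fires you could call `pump_events_until(app.processEvents, lambda: marker in form.mainFrame().toHtml())`. Here `marker` is hypothetical: some text you expect to appear only once the AJAX content has loaded.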
