I am trying to capture the rendering time for pages in an automated way. I have tried two approaches:
a) Create a Selenium script, attach HttpWatch to the browser window, and collect metrics. (The free version of HttpWatch doesn't give me all the metrics I need.)
b) Use Selenium to launch a Chrome window, collect performance logs using Chrome's performance logging capability, and then try to read the contents of the log file by loading it back into Chrome or some other tool that suits the purpose.
The code I used in Selenium is:
driver = webdriver.Chrome(executable_path="C:\\IEDriverServer\\chromedriver.exe",desired_capabilities={'loggingPrefs': {'performance': 'ALL'}})
The problem is that Chrome does not show any data when I save the output to a file and load it.
What am I doing wrong? Or please let me know if there is a better way to do this. The metrics I need are basically the rendering time in the browser (rendering start, onload event start).
I am using Python for the Selenium scripts.
Related
I'd like to ask somebody with experience with headless browsers and Python whether it's possible to extract the box info with the distance from the closest strike on the webpage below. Until now I was using Python's bs4, but since everything here is driven by jQuery, simply downloading the webpage doesn't work. I found PhantomJS, but I wasn't able to extract it with that either, so I am not sure if it's possible. Thanks for any hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be easily done with Selenium. Selenium has both a headless mode and a headed mode (where you can watch it open your browser and click on things). The DOM query API is a bit less extensive than bs4's, but it does have nice visual query (location on screen) functions. So you would write a Python script that initializes Selenium, goes to your website and interacts with it. You may need to do some image recognition on screenshots at some point. It may be as simple as finding a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out which Selenium features you can use to do what you want. That depends on luck and on whether what you want happens to be easy or hard for that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.
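As a rough illustration of constructing such a request from Python, here is a minimal sketch that only builds the URL. The helper name build_spark_url and every parameter value shown are placeholders made up for this example; the real values have to be copied from a request captured in your browser.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
SPARK_ENDPOINT = "https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark"


def build_spark_url(params):
    """Return the endpoint URL with the given parameters as a query string.

    Parameters set to None are left out, which makes it easy to test by
    trial and error which ones the server actually requires.
    """
    query = {key: value for key, value in params.items() if value is not None}
    return f"{SPARK_ENDPOINT}?{urlencode(query)}"


# Placeholder values only (assumptions, not the real API contract).
example_url = build_spark_url({
    "location": "49.13688,16.56522",
    "units": "metric",
    "verbose": "true",
    "authid": None,  # unknown here, so omitted from the query
})
print(example_url)
```

The resulting URL can then be fetched with urllib.request.urlopen or any HTTP client once the required parameters are known.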
When I go to the following website: https://www.bvl.com.pe/mercado/movimientos-diarios and use Selenium's page_source option, or urllib.request.urlopen what I get is a different string than if I go to Google Chrome, and open the INSPECT option in the contextual menu and copy the entire thing.
From my research, I understand it has to do with Javascript running on the webpage and what I am getting is the base HTML.
What code can I use (Python) to get the same information?
That behavior is entirely browser-dependent. The browser takes the raw HTML, processes it, (usually) runs JavaScript, styles it with CSS and does many other things. So to get such a result in Python you'd have to make your own web browser.
After much digging around, I came upon a solution that works in most cases. Use Headless Chrome with the --dump-dom switch.
https://developers.google.com/web/updates/2017/04/headless-chrome
To do this programmatically in Python, use the subprocess module to run Chrome and either assign the output to a variable or redirect it to a text file.
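A minimal sketch of that approach might look like the following. The function names and the default binary name google-chrome are assumptions for illustration; the executable name or path differs between systems.

```python
import subprocess


def dump_dom_command(url, chrome_binary="google-chrome"):
    """Build the argument list for a headless Chrome DOM dump."""
    # --headless, --disable-gpu and --dump-dom are the switches described
    # in the linked Headless Chrome article.
    return [chrome_binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, chrome_binary="google-chrome", timeout=60):
    """Run headless Chrome and return the serialized DOM as a string."""
    result = subprocess.run(
        dump_dom_command(url, chrome_binary),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Usage (requires Chrome to be installed and on the PATH):
# html = dump_dom("https://www.bvl.com.pe/mercado/movimientos-diarios")
# with open("page.html", "w", encoding="utf-8") as fh:
#     fh.write(html)
```

Passing the arguments as a list avoids going through a shell, so the URL does not need to be quoted.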
I have a program that I want to run using Selenium specifically; it takes a series of actions on a password-protected website. Basically, I need to be able to input a unique link and password when I get them, which will take me to the main website, which I have automated. The issue is that Selenium takes a very long time to load a webpage when it starts up, and time is very important in this application. Inputting the link and launching the browser directly to that link takes a long time. What I have tried is preloading the browser to a different website (e.g. https://google.com) beforehand and then waiting on user input for the link to the actual page. This works a lot quicker, but I'm having trouble getting it to work inside a function and with multiprocessing. I am using multiprocessing to run this at scale with lots of instances, and I am trying to start all of my functions the second I define a link. I am on Windows 10, using Python 3.8.3 and Chrome as my Selenium browser.
from selenium import webdriver

global link
link = input('Paste Link Here: ')

def instance_1():
    browser1 = webdriver.Chrome(*my webdriver file path*)
    browser1.get('https://google.com')
    #need something that waits here until the link variable is defined by me
    browser1.get(link)
    #the rest of the automation works fine from here
Ideally, the solution would be able to work with multiprocessing. The ideal flow would be something like this:
1. All Selenium instances (written as their own functions) start up and preload a website (this part works fine)
2. They wait until the link to go to is specified (this is where the issue is)
3. They then go to the link and execute the automation (this part works fine)
TL;DR: basically, anything that would let the program continue while waiting on the input would be nice.
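One possible shape for the three-step flow above is sketched below. It uses threads and a threading.Event rather than processes, purely to keep the example short; the same idea should carry over to multiprocessing.Event with the link passed through a multiprocessing.Queue, though that is not shown here. The names run_instance, launch, preload and navigate are hypothetical, and the two callables stand in for the browser.get calls.

```python
import threading


def run_instance(name, link_ready, shared, preload, navigate):
    """One automation instance: preload, wait for the link, then go."""
    preload(name)                   # e.g. browser.get('https://google.com')
    link_ready.wait()               # blocks until the event is set
    navigate(name, shared["link"])  # e.g. browser.get(link)


def launch(instance_names, get_link, preload, navigate):
    """Start every instance first, then ask for the link and release them."""
    link_ready = threading.Event()
    shared = {}
    workers = [
        threading.Thread(
            target=run_instance,
            args=(name, link_ready, shared, preload, navigate),
        )
        for name in instance_names
    ]
    for worker in workers:
        worker.start()             # instances preload while we wait for input
    shared["link"] = get_link()    # e.g. input('Paste Link Here: ')
    link_ready.set()               # wake every waiting instance at once
    for worker in workers:
        worker.join()


# Usage with stand-in callables instead of real browsers:
launch(
    ["instance_1", "instance_2"],
    get_link=lambda: "https://example.com/unique-link",
    preload=lambda name: print(f"{name}: preloaded"),
    navigate=lambda name, link: print(f"{name}: going to {link}"),
)
```

The key point is that the browsers are started before the blocking input call, so the slow startup overlaps with the time spent waiting for the link.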
I'm new at Python and I need expert guidance for the project I'm trying to finish at work, as none of my coworkers are programmers.
I'm making a script that logs into a website and pulls a CSV dataset. Here are the steps that I'd like to automate:
Open chrome, go to a website
Login with username/password
Navigate to another internal site via menu dropdown
Input text into a search tag box or delete search tags, e.g. "Hours", press "Enter" or "Tab" to select (repeat this for 3-4 search tags)
Click "Run data"
Wait until data loads, then click "Download" to get a CSV file with 40-50k rows of data
Repeat this process 3-4 times for different data pulls, different links and different search tags
This process usually takes 30-40 minutes for a total of 4 or 5 data pulls each week so it's like watching paint dry.
I've tried to automate this using the pyautogui module, but it isn't working out for me. It works too fast, or doesn't work at all. I think I'm using it wrong.
This is my code:
import webbrowser
import pyautogui
#pyautogui.position()
#print(pyautogui.position())
#1-2
pyautogui.FAILSAFE = True
chrome_path = 'open -a /Applications/Google\ Chrome.app %s'
#2-12
url = 'http://Google.com/'
webbrowser.get(chrome_path).open(url)
pyautogui.moveTo(185, 87, duration=0.25)
pyautogui.click()
pyautogui.typewrite('www.linkedin.com')
pyautogui.press('enter')
#loginhere? Research
In case pyautogui is not suited for this task, can you recommend an alternative way?
The way you are going about grabbing your data is very error-prone and not how people generally grab data from websites. What you want is a web scraper, which lets you pull information from websites. Alternatively, some companies provide APIs that give you easier access to the data.
LinkedIn has a built-in API for grabbing information from it. You did mention that you were navigating to another site, though, in which case I would see if that site has an API, or look into using Scrapy, a web scraper that should allow you to pull the information you need.
Side note: you can also look into synchronous and asynchronous programming with Python to make multiple requests faster and easier.
I want to find the load time of whatever loads first (an object, image, text, link, DB or anything else) on a requested website, using Python and Selenium.
Check out performance.timing. It's JavaScript and is built into your browser. It exposes a lot of values, such as:
navigationStart
connectStart
connectEnd
domLoading
domInteractive
domComplete
Just go to your console window in your browser and type performance.timing. Might be of use to you.
If you find something you can use, you can use selenium to execute the JavaScript inside the browser using execute_script:
driver.execute_script('return performance.timing.domComplete')
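The values in performance.timing are timestamps in milliseconds, so durations come from subtracting one from another. Here is a small sketch of that arithmetic; the function name timing_durations and the sample numbers are made up for illustration, and the dict is assumed to have been fetched with something like driver.execute_script('return window.performance.timing.toJSON()').

```python
def timing_durations(timing):
    """Turn performance.timing timestamps (ms) into durations (ms)."""
    start = timing["navigationStart"]
    return {
        # Time spent establishing the connection.
        "connect": timing["connectEnd"] - timing["connectStart"],
        # Navigation start until the DOM is parsed and interactive.
        "dom_interactive": timing["domInteractive"] - start,
        # Navigation start until the document is fully loaded.
        "dom_complete": timing["domComplete"] - start,
        # Time between the start of DOM parsing and completion.
        "dom_processing": timing["domComplete"] - timing["domLoading"],
    }


# Made-up sample values standing in for a real browser result.
sample = {
    "navigationStart": 1000,
    "connectStart": 1010,
    "connectEnd": 1030,
    "domLoading": 1100,
    "domInteractive": 1400,
    "domComplete": 1900,
}
print(timing_durations(sample))
```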