Selenium - Can't get text from element (Python)

I'm trying to get the result of an input from:
https://web2.0calc.com/
But I can't get the result. I've tried:
result = browser.find_element_by_id("input")
result.text
result.get_attribute("textContent")
result.get_attribute("innerHtml")
result.get_attribute("textContent")
But it doesn't work and returns an empty string...

The required element is a Base64 image, so you can either take the Base64 value from its src attribute, convert it to an image and read the value with a tool like PIL (a rather complicated approach), or get the result with a direct API call:
import requests
url = 'https://web2.0calc.com/calc'
data = {'in[]': '45*23'}  # Pass your expression as a value
response = requests.post(url, data=data).json()
print(response['results'][0]['out'])
# 1035
If you need the value of #input:
print(browser.find_element_by_id('input').get_attribute('value'))

My preference would be the POST example already given (+1 for that), but you can also grab the expression and evaluate it with asteval. asteval may have limitations, but it is safer than eval.
from selenium import webdriver
from asteval import Interpreter
d = webdriver.Chrome()
url = 'https://web2.0calc.com/'
d.get(url)
d.maximize_window()
d.find_element_by_css_selector('[name=cookies]').click()
d.find_element_by_id('input').send_keys(5)
d.find_element_by_id('BtnPlus').click()
d.find_element_by_id('input').send_keys(50)
d.find_element_by_id('BtnCalc').click()
expression = ''
while len(expression) == 0:
    expression = d.find_element_by_id('result').get_attribute('title')
aeval = Interpreter()
print(aeval(expression))
d.quit()
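The while loop in the script above polls without any upper bound, so it never finishes if the result's title is never filled in. A minimal sketch of a bounded polling helper follows; the name wait_for_value and its timeout and interval defaults are assumptions for illustration, not part of Selenium.

```python
import time


def wait_for_value(getter, timeout=10.0, interval=0.2):
    """Call getter() repeatedly until it returns a non-empty value.

    Raises TimeoutError if nothing non-empty arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = getter()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("no value within %.1f seconds" % timeout)
        # Pause briefly so the loop does not hammer the browser.
        time.sleep(interval)
```

With the driver from the script above, the loop could then be replaced by something like expression = wait_for_value(lambda: d.find_element_by_id('result').get_attribute('title')).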

Related

How to get BuildingElementProxys of ifc-files?

I have to extract some convection coolers from an IFC file. They're saved as IfcBuildingElementProxy entities and are related to the building via IfcRelContainedInSpatialStructure.
My approach is to get all the IfcRelContainedInSpatialStructures and search with a for-loop through the RelatedObjects for objects that are also an IfcBuildingElementProxy.
But I'm not sure exactly which commands I have to use for the for-loop.
It would be great if someone could help.
That's what I've got so far:
import ifcopenshell
import ifcopenshell.util
from ifcopenshell.util.selector import Selector
import ifcopenshell.file
ifc = ifcopenshell.open(...)
selector = Selector()
buildingelementproxies = selector.parse(ifc, ".IfcBuildingElementProxy")
spaces = selector.parse(ifc, ".IfcSpace")
containedrelation = selector.parse(ifc, ".IfcRelContainedInSpatialStructure")
print (containedrelation)
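One way the loop described in the question could look is sketched below. It assumes the model object offers by_type() and that entities offer is_a(), as ifcopenshell files and entities do. Note that in the IFC schema the contained products of an IfcRelContainedInSpatialStructure are listed under RelatedElements, and the containing storey or space under RelatingStructure. The function name is made up for this example.

```python
def proxies_in_spatial_structures(model):
    """Return (structure, proxy) pairs for every contained IfcBuildingElementProxy."""
    pairs = []
    for relation in model.by_type("IfcRelContainedInSpatialStructure"):
        # RelatedElements holds the products placed in RelatingStructure.
        for element in relation.RelatedElements:
            if element.is_a("IfcBuildingElementProxy"):
                pairs.append((relation.RelatingStructure, element))
    return pairs


# Usage with the file opened in the question (ifc = ifcopenshell.open(...)):
# for structure, proxy in proxies_in_spatial_structures(ifc):
#     print(structure.Name, proxy.GlobalId, proxy.Name)
```

Returning pairs keeps the spatial container next to each proxy, which helps when the coolers need to be grouped by storey or space afterwards.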

Mess with lists of variables

Recently I've been working on code that saves sets of values in lists, with each list stored in an outer list that contains all the others. I then remove some characters from the values, convert them to float, and finally take the smallest number of each list and save it in another list. The problem is that when I move those numbers to the new list, it only shows one number and not the entire list. Can somebody help me?
Here's the code:
from typing import List
from bs4 import BeautifulSoup
import requests
import pandas as pd
from decimal import Decimal
ListaPreciosCromos = list()
ListaUrl = ['https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_495570&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc', 'https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_540190&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc', 'https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_607210&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc',]
PageCromos = [requests.get(x) for x in ListaUrl]
SoupCromos = [BeautifulSoup(x.content, "html.parser") for x in PageCromos]
PrecioCromos = [x.find_all("span", {"data-price": True}) for x in SoupCromos]
for x in PrecioCromos:
    for i in x: #
        Cromolist2 = [h.replace("$","") for h in i]
        CromoList3 = [h.replace("USD","") for h in Cromolist2]
        CromoList4 = [float(h) for h in CromoList3]
        CantidadCromos = len(CromoList4)
        CromoList5 = sorted(CromoList4)
        CromoList6 = CromoList5[0]
print(CromoList6)
Output:
0.06
Change CromoList6 = CromoList5[0] to CromoList6.append(CromoList5[0]), with CromoList6 created as an empty list before the loop. The plain assignment overwrites the value on every iteration, which is why only one number is printed; append() collects the minimum of every list.
Instead of using replace() twice, you can use strip('USD$') to remove the $ and USD from the string.
You can use the min() function to get the minimum value of a list instead of sorting: min() takes O(N) time, while sorted() takes O(N log N).
Here is a shortened version of your code with the changes mentioned above.
from typing import List
from bs4 import BeautifulSoup
import requests
ListaPreciosCromos = list()
ListaUrl = ['https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_495570&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc', 'https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_540190&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc', 'https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_607210&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc',]
PageCromos = [requests.get(x) for x in ListaUrl]
SoupCromos = [BeautifulSoup(x.content, "html.parser") for x in PageCromos]
PrecioCromos = [x.find_all("span", {"data-price": True}) for x in SoupCromos]
min_CromoList = []
for item in PrecioCromos:
    CromoList = [float(i.text.strip('USD$')) for i in item]
    min_CromoList.append(min(CromoList))
print(min_CromoList)
[0.04, 0.05, 0.05]
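The strip-and-min idea can be tried without any network access. The sketch below uses made-up price strings in place of the scraped span text, so the values and the exact string format are assumptions.

```python
# Hypothetical price strings, standing in for the text of the scraped <span> tags.
scraped_prices = [
    ["$0.06 USD", "$0.04 USD", "$0.10 USD"],
    ["$0.07 USD", "$0.05 USD"],
]

lowest_prices = []
for prices in scraped_prices:
    # strip("USD$") removes any leading/trailing U, S, D or $ characters;
    # float() ignores the whitespace that is left over.
    values = [float(price.strip("USD$")) for price in prices]
    lowest_prices.append(min(values))

print(lowest_prices)  # [0.04, 0.05]
```

Keep in mind that strip() treats its argument as a set of characters to trim from both ends, not as a literal prefix or suffix.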

Separate python tuple with newlines

I am working on a simple script that collects earthquake data and sends me a text with the info. For some reason I am not able to get my data to be separated by new lines. I'm sure I am missing something easy, but I'm still pretty new to programming, so any help is greatly appreciated! Some of the script is below:
import urllib.request
import json
from twilio.rest import Client
import twilio
events_list = []
def main():
    #Site to pull quake json data
    urlData = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"
    webUrl = urllib.request.urlopen(urlData)
    if (webUrl.getcode() == 200):
        data = webUrl.read()
        # Use the json module to load the string data into a dictionary
        theJSON = json.loads(data)
        # collect the events that only have a magnitude greater than 4
        for i in theJSON["features"]:
            if i["properties"]["mag"] >= 4.0:
                events_list.append(("%2.1f" % i["properties"]["mag"], i["properties"]["place"]))
        print(events_list)
        # send with twilio
        body = events_list
        client = Client(account_sid, auth_token)
        if len(events_list) > 0:
            client.messages.create (
                body = body,
                to = my_phone_number,
                from_ = twilio_phone_number
            )
    else:
        print ("Received an error from server, cannot retrieve results " + str(webUrl.getcode()))
if __name__ == "__main__":
    main()
To split the tuple with newlines, you need to call the "\n".join() function. However, you need to first convert all of the elements in the tuple into strings.
The following expression should work on a given tuple:
"\n".join(str(el) for el in mytuple)
Note that this is different from converting the entire tuple into a string. Instead, it iterates over the tuple and converts each element into its own string.
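Applied to a list of (magnitude, place) tuples like events_list in the question, the same idea can build the whole message body in one step. This is a minimal sketch with made-up sample data; format_events is a name chosen for this example.

```python
def format_events(events):
    """Turn a list of (magnitude, place) tuples into one line per event."""
    return "\n".join(f"{magnitude} {place}" for magnitude, place in events)


# Hypothetical sample in the same shape as events_list.
sample_events = [("4.1", "10km near Florida"), ("5.0", "4km near Bay")]
body = format_events(sample_events)
print(body)
```

The resulting string, rather than the list itself, is what would then be passed as the body of the text message.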
Since you have the list of tuples stored in "events_list" you could probably do something like this:
for event in events_list:
    print(event[0],event[1])
It will give you something like this:
4.12 10km near Florida
5.00 4km near Bay

How can I get the text from a table in HTML?

I am trying to scrape data from https://fortnitetracker.com/events/epicgames_S10_FNCS_Week5_NAE. Specifically, I am trying to get the placement and number of points earned by a specific player. I went to the website and found the instance where the specific player ("Nickmercs") was located in the HTML which looked like this:
[Image: HTML Text]
You can see the "rank" is shown above his name as 56, and the points are shown a few lines below his name which is also 56. I then wrote the following Python 3 program to scrape the data from the website:
import requests
class tracker:
    url = "https://fortnitetracker.com/events/epicgames_S10_FNCS_Week5_NAE"
    def getReq(website):
        req = requests.get(website)
        if req:
            return req
    req = getReq(url)
    text = req.text
    index = text.find("nickmercs")
    split = text[index:index+1000]
    print (split)
Running the program printed a large portion of the HTML code, but the instance of "Nickmercs" that it found was not the one I was looking for. The one in the picture of the HTML code above was the actual first instance of the "Nickmercs" string on the page, but for some reason it was not in req.text, the response to my request. As a result I went back and modified my code to print out where the first instance actually was, and found that the line was different from what was shown in the HTML code picture. The line that was supposed to list the names "Nate Hill, Nickmercs, SypherPK" actually looked like this:
<span :style="{ 'color': '#' + metadata.primary_color }">{{ getPlayerNameList(entry.teamAccountIds, 4) }}</span>
I have little knowledge of how HTML works, so I am wondering if it is possible to fix this problem. It seems to be calling some (what I imagine is a) method called getPlayerNameList() which places the names in the correct spot, but makes it so I can't easily search the names / scrape the data. Is there a way to get around this? Any help is much appreciated!
The site is dynamic, so you need some way to access the data that is populated after the page originally loads. One way is to use selenium:
from selenium import webdriver
from bs4 import BeautifulSoup as soup
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://fortnitetracker.com/events/epicgames_S10_FNCS_Week5_NAE')
h, *r = [[i.text for i in b.find_all('th' if b.td is None else 'td')] for b in soup(d.page_source, 'html.parser').find('div', {'id':'leaderboard'}).table.find_all('tr')]
new_data = {tuple(b.split(', ')):dict(zip([h[0], *h[2:]], [a[1:-1], *c])) for a, b, *c in r}
Now, to look up a player by name:
data = [b for a, b in new_data.items() if 'Nickmercs' in a][0]
Output:
{'Rank': '56', 'Points': '56 Top 0.373%', 'Matches': '10', 'Wins': '0', 'K/D': '3.50', 'Avg Place': '16.10'}
For your specific target value (Rank):
rank = [b for a, b in new_data.items() if 'Nickmercs' in a][0]['Rank']
Output:
56
The data is loaded dynamically from script tags, so the content is already present in the response. You can regex out the leaderboard/session info and the accounts info and connect the two via account_id. You find the right account_id from the name of the player of interest:
import requests, re, json
def get_json(pattern):
    p = re.compile(pattern, re.DOTALL)
    return p.findall(r.text)[0]
r = requests.get('https://fortnitetracker.com/events/epicgames_S10_FNCS_Week5_NAE')
player = 'Nickmercs'
session_info = json.loads(get_json('imp_leaderboard = (.*?);'))
player_info = json.loads(get_json('imp_accounts = (.*?);'))
account_id = [i['accountId'] for i in player_info if i['playerName'] == player][0]
team_info = [i for i in session_info['entries'] if account_id in i['teamId']]
print(team_info)
This gives you all the relevant info. Specific items:
print(team_info[0]['pointsEarned'])
print(team_info[0]['rank'])
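The regex approach can be tried offline on a small stand-in for the page source. In the sketch below, the sample script content, the account id and the teamId format are all made up for illustration; only the variable names imp_accounts and imp_leaderboard and the key names come from the answer above.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the page source (r.text).
SAMPLE_SOURCE = """
<script>
var imp_accounts = [{"accountId": "abc123", "playerName": "Nickmercs"}];
var imp_leaderboard = {"entries": [{"teamId": "abc123_def456", "rank": 56, "pointsEarned": 56}]};
</script>
"""


def extract_json(source, variable):
    """Parse the JSON literal assigned to `variable` inside a script tag."""
    match = re.search(re.escape(variable) + r" = (.*?);", source, re.DOTALL)
    if match is None:
        raise ValueError(f"{variable} not found in page source")
    return json.loads(match.group(1))


accounts = extract_json(SAMPLE_SOURCE, "imp_accounts")
leaderboard = extract_json(SAMPLE_SOURCE, "imp_leaderboard")

# Find the player's account id, then the team entry that contains it.
account_id = next(a["accountId"] for a in accounts if a["playerName"] == "Nickmercs")
team = next(e for e in leaderboard["entries"] if account_id in e["teamId"])
print(team["rank"], team["pointsEarned"])
```

Because the non-greedy pattern stops at the first semicolon, it would cut the JSON short if any string value in the data contained a semicolon; that is a limitation of this simple pattern.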
You are scraping the raw HTML together with its JavaScript code, which has not been rendered.
For this task you could use computer vision to extract the table from the page.
Otherwise, you can use PhantomJS (https://phantomjs.org/), which gives you the rendered page, to scrape the table without using images.

Fuzzy URL matching in Python

I'd like to find a tool that does a good job of fuzzy matching URLs that are the same except for extra parameters. For instance, for my use case, these two URLs are the same:
atest = ('http://www.npr.org/templates/story/story.php?storyId=4231170', 'http://www.npr.org/templates/story/story.php?storyId=4231170&sc=fb&cc=fp')
At first blush, fuzz.partial_ratio and fuzz.token_set_ratio from fuzzywuzzy get the job done with a threshold of 100:
ratio = fuzz.ratio(atest[0], atest[1])
partialratio = fuzz.partial_ratio(atest[0], atest[1])
sortratio = fuzz.token_sort_ratio(atest[0], atest[1])
setratio = fuzz.token_set_ratio(atest[0], atest[1])
print('ratio: %s' % (ratio))
print('partialratio: %s' % (partialratio))
print('sortratio: %s' % (sortratio))
print('setratio: %s' % (setratio))
>>>ratio: 83
>>>partialratio: 100
>>>sortratio: 83
>>>setratio: 100
But this approach fails, because it also returns 100 in other cases, like:
atest = ('yahoo.com','http://finance.yahoo.com/news/earnings-preview-monsanto-report-2q-174000816.html')
The URLs in my data and the parameters added vary a great deal. I'm interested to know if anyone has a better approach using URL parsing or similar.
If all you want is check that all query parameters in the first URL are present in the second URL, you can do it in a simpler way by just doing set difference:
import urllib.parse as urlparse
base_url = 'http://www.npr.org/templates/story/story.php?storyId=4231170'
check_url = 'http://www.npr.org/templates/story/story.php?storyId=4231170&sc=fb&cc=fp'
base_url_parameters = set(urlparse.parse_qs(urlparse.urlparse(base_url).query).keys())
check_url_parameters = set(urlparse.parse_qs(urlparse.urlparse(check_url).query).keys())
print(base_url_parameters - check_url_parameters)
This will return an empty set, but if you change the base url to something like
base_url = 'http://www.npr.org/templates/story/story.php?storyId=4231170&test=1'
it will return {'test'}, which means that there are extra parameters in the base URL that are missing from the second URL.
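The set difference above only compares parameter names. A slightly stricter sketch, under the assumption that two URLs should count as the same only when scheme, host and path are identical and every parameter of the first URL appears with the same value in the second, could look like this; the function name is made up for this example.

```python
from urllib.parse import parse_qs, urlparse


def matches_with_extra_params(base_url, check_url):
    """True if check_url equals base_url apart from additional query parameters."""
    base = urlparse(base_url)
    check = urlparse(check_url)
    # Scheme, host and path must be identical.
    if (base.scheme, base.netloc, base.path) != (check.scheme, check.netloc, check.path):
        return False
    base_params = parse_qs(base.query)
    check_params = parse_qs(check.query)
    # Every parameter of the base URL must be present with the same values.
    return all(check_params.get(key) == values for key, values in base_params.items())


print(matches_with_extra_params(
    'http://www.npr.org/templates/story/story.php?storyId=4231170',
    'http://www.npr.org/templates/story/story.php?storyId=4231170&sc=fb&cc=fp',
))  # True
print(matches_with_extra_params(
    'yahoo.com',
    'http://finance.yahoo.com/news/earnings-preview-monsanto-report-2q-174000816.html',
))  # False
```

Unlike the fuzzy ratios, this check rejects the yahoo.com pair from the question, because the parsed scheme, host and path differ.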
