Trying to read data from War Thunder local host with python - python

Basically, I'm using Python to send serial data to an Arduino so that I can drive moving dials with data from the game. This should work because, while you are in game, the URL "localhost:8111" gives you a list of these stats. The problem is that I'm using urllib and BeautifulSoup, but they only read the page's source code, which doesn't contain the data I need.
The data I need does show up when I inspect the element on that page. Other pages suggest that using something to run the HTML in Python would fix this, but I have found no way of doing that. Any help here would be great, thanks.

I'm not the poster, but I have been working on this with him, and we managed to get it working. In case anyone else is having this problem, here is the code that got it to display our speed:
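from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("http://localhost:8111")
time.sleep(1)  # give the page a moment to build its elements
while True:
    # find_element_by_id is gone from recent Selenium 4 releases; use By.ID
    element = driver.find_element(By.ID, "stt-IAS, km/h")
    print(element.text)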
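I don't know for certain why the time.sleep is needed, but the code doesn't seem to work without it. Most likely the page needs a moment for its JavaScript to create the elements before they can be found.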

Your problem might be that the page elements are dynamic (for example, created by JavaScript after the page loads).
Why is this a problem? Because those tags and their data are not in the HTML that urllib downloads, so you can't access them that way. You'll have to use a headless or automated browser (learn more about Selenium).
Then open a session through Selenium and keep feeding the data to the Arduino the way you wanted.
Summary: if you can see the tag when you inspect the element but not when you view the page source, the content is generated dynamically. This can't be solved using bs4 or requests alone; you'll have to use a module such as Selenium or something similar.

Here is a Python module that you can use to get all air vehicle telemetry data from War Thunder localhost server pages "indicators" and "status". The contents of each of these pages are static JSON descriptions of the vehicle's current telemetry values.
The Python package uses the requests module to query the localhost server for the data, converts the returned JSON data into dictionaries, and then consolidates all the data into a single telemetry dictionary. This data can then be used by other Python processes, such as data logging or graphing.
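As a rough illustration of the idea (not the package's actual code), the fetch-and-merge step could look something like the sketch below. It uses the standard library's urllib in place of requests so that it runs without extra installs. The page names are the ones mentioned in this answer and should be checked in a browser first; the function names and the sample keys are made up for the example.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8111"  # address from the question


def fetch_json(page, base_url=BASE_URL, timeout=1.0):
    """Download one telemetry page and decode its JSON body into a dict."""
    with urlopen(f"{base_url}/{page}", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def combined_telemetry(pages=("indicators", "status"), fetch=fetch_json):
    """Merge the dictionaries returned by each page into a single dict.

    If two pages share a key, the page listed later wins.
    """
    telemetry = {}
    for page in pages:
        telemetry.update(fetch(page))
    return telemetry


# Demonstration with a stand-in fetcher, so nothing here needs the game running.
# The keys and values are hypothetical.
fake_pages = {
    "indicators": {"speed": 310.0, "altitude_hour": 1200.0},
    "status": {"IAS, km/h": 305, "valid": True},
}
print(combined_telemetry(fetch=lambda page: fake_pages[page]))
```

With the game running, calling combined_telemetry() with no arguments would query the local server instead of the stand-in data.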

Related

Extracting info from webpage via python

I'd like to ask somebody with experience with headless browsers and Python whether it's possible to extract the info box with the distance from the closest strike on the webpage below. Until now I was using Python with bs4, but since everything here is driven by jQuery, simply downloading the webpage doesn't work. I found PhantomJS, but I wasn't able to extract it with that either, so I am not sure if it's possible. Thanks for any hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be easily done with Selenium. Selenium has both a headless mode and a headed mode (where you can watch it open your browser and click on things). The DOM query API is a bit less extensive than bs4, but it does have nice visual query (location on screen) functions. So you would write a Python script that initializes Selenium, goes to your website, and interacts with it. You may need to do some image recognition on screenshots at some point. It may be as simple as searching for a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out which Selenium features you can use to do what you want; that depends on luck and on whether what you want happens to be easy or hard for that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.
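A small sketch of the trial-and-error step, assuming you have already captured a real request in your browser's network tab. The endpoint is the one named above; every parameter value below is a placeholder, and build_url is a made-up helper, so treat this as a starting point rather than a working client.

```python
from urllib.parse import urlencode

ENDPOINT = "https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark"


def build_url(params, endpoint=ENDPOINT):
    """Return the endpoint URL with every non-None parameter in the query string."""
    kept = {name: value for name, value in params.items() if value is not None}
    return f"{endpoint}?{urlencode(kept)}"


# Placeholder values only: copy the real ones from a captured browser request.
trial = {
    "location": "49.13688,16.56522",  # lat/lon taken from the page URL in the question
    "units": "metric",                # guess, to be confirmed by trial and error
    "verbose": "true",                # guess
    "authid": None,                   # left out of this trial on purpose
}
print(build_url(trial))
```

Setting a parameter to None drops it from the URL, which makes it easy to test one at a time whether the server still answers without it.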

Url request does not parse every information in HTML using Python

I am trying to extract information from an exchange website (chiliz.net) using Python (requests module) and the following code:
data = requests.get(url,time.sleep(15)).text
I used time.sleep since the website is not directly connecting to the exchange main page, but I am not sure it is necessary.
The thing is that I cannot find anything written under <body style> in the HTML text (which is the data variable in this case). How can I reach the full HTML code and then start to extract the price information from this website?
I know Python, but not familiar with websites/HTML that much. So I would appreciate if you explain the website related info like you are talking to a beginner. Thanks!
There could be a few reasons for this.
The website runs behind a proxy server from what I can tell, so this does interfere with your request loading time. This is why it's not directly connecting to the main page.
It might also be the case that the elements are rendered using JavaScript AFTER the page has loaded, so you only get the initial page and not the JavaScript-rendered parts. You can try to increase your sleep() time, but I don't think that will help: requests never runs the page's JavaScript, so waiting longer does not change what it downloads.
You can also use a library called Selenium. It simply automates browsers and you can use the page_source property to obtain the HTML source code.
Code (taken from here)
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://example.com")
html_source = browser.page_source
With Selenium, you can also use an XPath expression to locate the element that holds the price information you want to extract; you can see a tutorial on that here. Alternatively, once you extract the HTML code, you can use a parser such as bs4 to extract the required data.
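To give a feel for the parsing step, here is a minimal sketch that pulls the text out of elements with a given class. It uses the standard library's html.parser as a stand-in for bs4 so that it runs without extra installs. The HTML string and the class name "price" are invented for the example; on the real site you would feed it browser.page_source and use whatever class the price element actually has.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.matches = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


# Hypothetical page source; with Selenium this would be browser.page_source.
html_source = '<body><div class="price last"><span>1.2345</span> USD</div></body>'
parser = ClassTextExtractor("price")
parser.feed(html_source)
print(parser.matches)
```

With bs4 installed, the same lookup is a single find call on the parsed document, which is why it is usually the more convenient choice.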

Python webscraping - realtime data

I am trying to scrape the live data at the top of this page:
https://www.wallstreet-online.de/devisen/euro-us-dollar-eur-usd-kurs/realtime
My current method:
import time
import re
from bs4 import BeautifulSoup as soup
import requests

while True:
    con = requests.request('get', 'https://www.wallstreet-online.de/devisen/euro-us-dollar-eur-usd-kurs/realtime', stream=True)
    page = con.text
    kursSoup = soup(page, "html.parser")
    kursDiv = kursSoup.find("div", {"class": "pull-left quoteValue"})
    print(kursDiv.span)
    del con
    del page
    del kursSoup
    del kursDiv
    #time.sleep(2)
print("end")
This works, but it is not in sync with the data on the website. I don't really get why, because I delete all the variables at the end of the loop, so the result should change when the data on the website changes, but it seems to stay the same for a fixed number of iterations. Does anyone know why, or have a better way of doing this? (I'm a complete beginner and have no idea how the site even works, which is why I'm parsing the HTML.)
It looks like that web page may be using JavaScript to populate and update that number. I'm not familiar with BeautifulSoup but I don't think it will run the JavaScript on the page to update that number.
You may want to use something like Chrome Developer Tools to keep an eye on the network tab. I looked and it looks like there is a websocket connection to wss://push.wallstreet-online.de/lightstreamer going on behind the scenes. You may want to use a websocket client Python library to read from this socket and either find some API docs or reverse engineer the data that comes from the socket. Good luck!

Python 3.X Extract Source Code ONLY when page is done loading

I submit a query on a web page. The query takes several seconds before it is done. Only when it is done does it display an HTML table that I would like to get the information from. Let's say this query takes a maximum of 4 seconds to load. While I would prefer to get the data as soon as it is loaded, it would be acceptable to wait 4 seconds then get the data from the table.
The issue I have is when I make my urlread request, the page hasn't finished loading yet. I tried loading the page, then issuing a sleep command, then loading it again, but that does not work either.
My code is
import urllib.request
import time

uf = urllib.request.urlopen(urlname)
time.sleep(3)
# read() returns bytes; decode the bytes, not the response object
text = uf.read().decode('UTF-8')
print(text)
The webpage I am looking at is http://bookscouter.com/prices.php?isbn=9781111835811 (feel free to ignore the interesting textbook haha)
And I am using Python 3.X on a Raspberry Pi
The prices you want are not in the page you're retrieving, so no amount of waiting will make them appear. Instead, the prices are retrieved by a JavaScript in that page after it has loaded. The urllib module is not a browser, so it won't run that script for you. You'll want to figure out what the URL is for the AJAX request (a quick look at the source code gives a pretty big hint) and retrieve that instead. It's probably going to be in JSON format so you can just use Python's json module to parse it.
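Once you have found the AJAX URL and downloaded its response, the parsing step might look roughly like this. The payload below is invented to show the shape of the code; the real field names will differ and have to be read off the actual response. best_offer is a made-up helper name.

```python
import json


def best_offer(payload_text):
    """Return the offer with the highest price, or None if there are no offers."""
    payload = json.loads(payload_text)
    offers = payload.get("prices", [])  # "prices" is a hypothetical field name
    if not offers:
        return None
    return max(offers, key=lambda offer: offer["price"])


# Hypothetical response body; in practice this is the text returned by the AJAX URL.
sample = '{"prices": [{"vendor": "A", "price": 12.5}, {"vendor": "B", "price": 20.0}]}'
print(best_offer(sample))
```

Because the AJAX response only arrives once the query has finished on the server side, fetching that URL directly also removes the need for the sleep.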

Grabbing non-HTML data from a website using python

I'm trying to get the current contract prices on this page to a string: http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html
I would really like a python 2.6 solution.
It was easy to get the page html using urllib, but it seems like this number is live and not in the html. I inspected the element in Chrome and it's some td class thing.
But I don't know how to get at this with python. I tried beautifulsoup (but after several attempts gave up getting a tar.gz to work on my windows x64 system), and then elementtree, but really my programming interest is data analysis. I'm not a website designer and don't really want to become one, so it's all kind of a foreign language. Is this live price XML?
Any assistance gratefully received. Ideally a simple to install module and some actual code, but all hints and tips very welcome.
It looks like the numbers in the table are filled in by Javascript, so just fetching the HTML with urllib or another library won't be enough since they don't run the javascript. You'll need to use a library like PyQt to simulate the browser rendering the page/executing the JS to fill in the numbers, then scrape the output HTML of that.
See this blog post on working with PyQt: http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/
If you look at that website with something like Firebug, you can see the AJAX calls it's making. For instance, the initial values are being filled in with an AJAX call (at least for me) to:
http://www.cmegroup.com/CmeWS/md/MDServer/V1/Venue/G/Exchange/XCME/FOI/FUT/Product/ES?currentTime=1292780678142&contractCDs=,ESH1,ESM1,ESU1,ESZ1,ESH2,ESH1,ESM1,ESU1,ESZ1,ESH2
This returns a JSON response, which is then parsed by JavaScript to fill in the table. It would be pretty simple to do that yourself with urllib and then use simplejson to parse the response.
Also, you should read this disclaimer very carefully. What you are trying to do is probably not cool with the owners of the web-site.
It's hard to know what to tell you without knowing where the number is coming from. It could be PHP or ASP also, so you are going to have to figure out which language the number is in.
