I am trying to make a small crawler app.
So I keep comparing the page source shown in Firefox with the same page downloaded using urllib in Python.
Sample URL: https://store.steampowered.com/search/?term=arma 3
I checked using two separate Python libraries, with the same result: the file saved by the Python app looks different from the one saved with the browser.
Is the browser making the code more readable, or does the server treat a non-browser client differently?
Thanks!
I tried saving the page using two different Python libraries for accessing the web page.
I won't paste the result here, because it's 200K of HTML without newlines.
When I download the page with a normal browser, the HTML does contain newlines.
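A minimal sketch of the kind of fetch involved (not my exact code; the User-Agent header and the output filename are just for illustration), with the space in "arma 3" percent-encoded:

    # Sketch only: fetch the Steam search page with urllib, sending a
    # browser-like User-Agent and percent-encoding the query term.
    import urllib.request
    import urllib.parse

    base = "https://store.steampowered.com/search/"
    query = urllib.parse.urlencode({"term": "arma 3"})  # -> term=arma+3
    url = f"{base}?{query}"

    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    with open("steam_search.html", "w", encoding="utf-8") as f:
        f.write(html)

Keep in mind that the raw HTTP response can legitimately differ from what the browser's "Save Page As" produces, since the browser saves the DOM after JavaScript has run, and some servers vary their markup depending on the request headers.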
Related
I want to make a script in Python that interacts with a webpage that has quite a lot of JavaScript in it (it's a webpage that computes a bunch of physics stuff).
I don't want my code to break if the page formatting changes, and I want it to run offline, so I would prefer my script to work on a local HTML copy of the page I got (all the JS code is accessible in the HTML source; there is no call to an external server). I wanted to use the requests library for this, but it only works with URLs. Is there any library to do this? Note that I want to interact with the HTML (input values, look at the outputs, etc.); I know that I can parse the file, but that's not what I'm asking. I'm also totally new to web bots or anything related.
Right now I can open my .html version of the page offline with Chrome and interact with it, so there has to be a way to automate this somehow. I'm also not against using something other than Python if there is a better library for this in another language.
Interesting question. The best way I can think of to do that is to use a web framework to serve the local page and then just scrape the data using requests. I am familiar with Flask, and it's simple to use, but I'm sure there are other options as well.
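A minimal sketch of that idea, assuming the saved page is a local file named physics_page.html (a made-up name): Flask serves the file, and requests fetches it over HTTP.

    # Sketch only: "physics_page.html" is a hypothetical filename.
    import threading
    import time

    import requests
    from flask import Flask, send_file

    app = Flask(__name__)

    @app.route("/")
    def page():
        return send_file("physics_page.html")

    # Run the development server in a background thread so the same script can query it.
    threading.Thread(
        target=lambda: app.run(port=5000, use_reloader=False),
        daemon=True,
    ).start()
    time.sleep(1)  # give the server a moment to start

    html = requests.get("http://127.0.0.1:5000/").text
    print(html[:200])  # raw markup only; requests does not execute the JavaScript

Note that requests still won't run the page's JavaScript, so for actually typing into the inputs and reading the computed outputs, a browser-automation tool such as Selenium pointed at the local file is the more common route.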
I'm running Python 3.6 and having a lot of issues logging in to a site, primarily due to a captcha. I really only need to look up URLs and retrieve the HTML on the page, but I need to be logged in for certain additional information to appear on the accessible URLs.
I was using urllib to read the URLs, but now I'm looking for a way to log in and then request information. The fully automatic route won't work because of the captcha, so I'm looking for a method where I am already logged in in an open browser and Python opens new tabs to search for URLs (the searches can be hidden; they don't have to literally open new tabs). When I open new tabs manually on the site it still shows I'm logged in, so if I can manually log in each time I want to run the script and then work off that session, it would work just fine.
Thanks
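One way to get that "log in by hand, then let the script take over" workflow is browser automation with Selenium: the script opens a real browser window, waits while you log in (and solve the captcha) manually, and then reuses that logged-in session for the URL lookups. A minimal sketch, with placeholder URLs:

    # Sketch only: the URLs below are placeholders, not from the question.
    # Requires: pip install selenium, plus a matching browser driver on PATH.
    from selenium import webdriver

    driver = webdriver.Firefox()  # or webdriver.Chrome()
    driver.get("https://example.com/login")

    # Log in by hand (including the captcha) in the window that just opened,
    # then come back to the terminal and press Enter.
    input("Press Enter once you are logged in... ")

    for url in ["https://example.com/page1", "https://example.com/page2"]:
        driver.get(url)            # same browser session, so still logged in
        html = driver.page_source  # HTML after any JavaScript has run
        print(url, len(html))

    driver.quit()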
I am trying to write a web scraper in Python, but I have an issue: the contents of the site are not coded into the HTML; they seem to be coming from a different source. I want to know if there's any Python library that can fetch the contents for me, or if such a tool exists in any other language, which I'm willing to learn.
See: Is this possible to load the page after the javascript execute using python?
You'll have to execute the JS and whatever else it is that generates the HTML you want. You can do this in a lot of ways, but the answer I linked above suggests using Selenium WebDriver.
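A minimal sketch of that approach, assuming the selenium package and a compatible browser driver are installed (the URL is a placeholder):

    # Sketch only of the Selenium WebDriver approach; the URL is a placeholder.
    # Requires: pip install selenium, plus a browser driver such as geckodriver.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)

    driver.get("https://example.com/page-built-by-javascript")
    rendered_html = driver.page_source  # HTML after the JavaScript has executed
    driver.quit()

    # rendered_html can now be parsed with BeautifulSoup, lxml, etc.
    print(rendered_html[:500])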
I have written a scraper tool in Python which, when executed, produces a CSV file of information. I wish to embed it in an HTML page, so that within the page the user is able to run it and the results from the CSV file are then displayed on the page. How can I do this?
If you only need to display the information on a website, with no interaction back with your Python script, all you need to do is write the results into an HTML file and .format() it appropriately.
However, since your case requires the user to run the script and view the results, i.e. interact with your Python script, you will need a web server.
You could try the two most popular Python web frameworks that can accomplish this, Flask and Django, or you could use the less common but more lightweight SimpleHTTPServer.
But do keep in mind that an easier alternative (if possible) would be to just use a GUI toolkit such as Tkinter, PyQt, or EasyGUI.
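As a rough illustration of the Flask route, here is a minimal sketch. run_scraper() and results.csv are made-up names; substitute your own scraper function and output file. The page shows a button that triggers the scrape and then renders the CSV as an HTML table.

    # Sketch only: run_scraper() and "results.csv" are hypothetical names.
    import csv
    from flask import Flask

    app = Flask(__name__)

    def run_scraper():
        # Stand-in for the real scraper: writes a small results.csv so the demo runs.
        with open("results.csv", "w", newline="") as f:
            csv.writer(f).writerows([["name", "value"], ["example", "42"]])

    @app.route("/")
    def index():
        return '<form action="/run" method="post"><button>Run scraper</button></form>'

    @app.route("/run", methods=["POST"])
    def run():
        run_scraper()
        with open("results.csv", newline="") as f:
            rows = list(csv.reader(f))
        body = "".join(
            "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>" for row in rows
        )
        return f"<table border='1'>{body}</table>"

    if __name__ == "__main__":
        app.run(debug=True)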
I am writing a Python script that can take a Facebook URL and locally save an HTML file of that Facebook page, based on the answer to this question: Inherent way to save web page source
I tried using urllib2, but the resulting HTML file is different (missing some parts) from the HTML file I get by manually right-clicking on the Facebook page and saving the entire webpage. Do you know why they would be different, and what other Python libraries I could use instead of urllib2?
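As for other libraries, requests is the usual replacement for urllib2; a minimal sketch with a browser-like User-Agent (the URL is a placeholder):

    # Sketch only: using requests instead of urllib2; the URL is a placeholder.
    import requests

    url = "https://www.facebook.com/somepage"  # placeholder
    headers = {"User-Agent": "Mozilla/5.0"}    # look more like a regular browser

    resp = requests.get(url, headers=headers)
    resp.raise_for_status()

    with open("facebook_page.html", "w", encoding="utf-8") as f:
        f.write(resp.text)

Even so, a plain HTTP fetch will generally not match the browser's "save entire webpage", because the browser saves the DOM after JavaScript has run (and while logged in); reproducing that usually takes a JavaScript-capable tool such as Selenium, mentioned above.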