Python search script in an HTML webpage - python

I have an HTML webpage. It has a search textbox. I want to allow the user to search within a dataset. The dataset is represented by a bunch of files on my server. I wrote a python script which can make that search.
Unfortunately, I'm not familiar with how to combine the HTML page and the Python script.
The task is to hook a Python script up to the HTML page so that:
Python code runs on the server side
Python code can take values from the HTML page as input
Python code can put the search results into the HTML page as output
Question 1: How can I do this?
Question 2: How should the Python code be stored on the website?
Question 3: How should it take HTML values as input?
Question 4: How can it output the results to the webpage? Do I need to install/use any additional frameworks?
Thanks!

There are too many things to get wrong if you try to implement that by yourself with only what the standard library provides.
I would recommend using a web framework, like Flask or Django. I linked to the quickstart sections of the comprehensive documentation of both. Basically, you write code and URL specifications that are mapped to the code, e.g. an HTTP GET on /search is mapped to a method returning the HTML page.
You can then use a form submit button to GET /search?query=<param>, with <param> being the user's input. Based on that input, you search the dataset and return a new HTML page with the results.
Both frameworks have template languages that help you put the search results into HTML.
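To make that concrete, here is a minimal sketch using Flask. The DATASET dictionary, the search_dataset helper and the inline PAGE template are assumptions made up for illustration (in your case the search function would call your existing script), so treat it as a starting point rather than the one right way to do it.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical stand-in for the files on the server: file name -> contents.
DATASET = {
    "notes.txt": "flask maps URLs to Python functions",
    "todo.txt": "load the dataset into a database",
}

# Inline Jinja2 template: a search form plus a list of matching file names.
PAGE = """
<form action="/search" method="get">
  <input type="text" name="query" value="{{ query }}">
  <input type="submit" value="Search">
</form>
<ul>
{% for name in results %}
  <li>{{ name }}</li>
{% endfor %}
</ul>
"""


def search_dataset(query):
    """Return the names of entries whose text contains the query (case-insensitive)."""
    needle = query.lower()
    return [name for name, text in DATASET.items() if needle in text.lower()]


@app.route("/search")
def search():
    # request.args holds the parameters from the URL, e.g. /search?query=flask
    query = request.args.get("query", "")
    results = search_dataset(query) if query else []
    return render_template_string(PAGE, query=query, results=results)
```

Assuming the file is saved as search_app.py and a recent Flask release is installed, `flask --app search_app run` should start the development server, and the page is then served at /search.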
For testing purposes, web frameworks usually come with a simple web server you can use. For production purposes, there are better solutions like uWSGI and Gunicorn.
Also, you should consider putting the data into a database; parsing files for each query can be quite inefficient.
I'm sure you will have more questions along the way, but that's what Stack Overflow is for, and if you can ask more specific questions, it is easier to provide more focused answers.

I would look at the cgi library in python.
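One caveat: the cgi module was deprecated in Python 3.11 and removed in Python 3.13. A CGI script can still read the form value itself from the QUERY_STRING environment variable that the web server sets. The sketch below does that with urllib.parse; the parameter name "query" and the render_page helper are assumptions for illustration.

```python
import html
import os
from urllib.parse import parse_qs


def render_page(query_string):
    """Build a full CGI response (header, blank line, HTML) from a raw query string."""
    params = parse_qs(query_string)
    # "query" is a hypothetical form field name; use whatever your <input> is called.
    query = params.get("query", [""])[0]
    escaped = html.escape(query)  # never echo user input into HTML unescaped
    body = "<html><body><p>You searched for: {}</p></body></html>".format(escaped)
    return "Content-Type: text/html; charset=utf-8\n\n" + body


# The web server passes everything after "?" in the URL via QUERY_STRING.
print(render_page(os.environ.get("QUERY_STRING", "")))
```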

You should check out Django; it's a very flexible and easy Python web framework.

Related

Import python projects to a HTML page

Suppose I have a Python game and I want to "post" it on a site like Friv that I am making. Is there any way for me to import "game.py" into "site.html" so that it shows when I open the site? I searched and found that I could use Django, but then I would need to move all the HTML code that I already have to another application.
The language of browsers is JavaScript.
There is a project called PyJs which translates Python code to JavaScript and is useful in your case that you want to run Python code inside web browsers.
Finally you can use your resulting JavaScript files to fill up your HTML page.
In addition to PyJs, there are numerous other projects that "run Python code in a browser", like Brython. However, none of them has been standardized, and if you want a robust game in your browser, use JavaScript!
There are a number of projects that compile Python into JavaScript so that it can run in the browser.
Here are two links that might help
Web Browser Programming: https://wiki.python.org/moin/WebBrowserProgramming
PyGame Trinket: https://trinket.io/features/pygame
The way I integrate Python code into HTML is to use a templating language like Jinja2. If you want to write full Python code in HTML, you need a transpiler like PyJS. But since you want to integrate the same code into multiple programs, why not use Flask and make an API? It is much easier. Django is an option, but it has a steep learning curve. You can build the UI in HTML and get the data from Python through the API.

HTML scraping vs json file in aspnet framework?

I would like to download the data in this table:
http://portal.ujn.gov.rs/Izvestaji/IzvestajiVelike.aspx
I know how to use selenium to go through the pages and the CSS selectors are helpful enough that it shouldn't be too difficult to get all the data...
However, I am curious whether anyone knows a way of getting at the JSON, or whatever intermediary object is used to build the HTML. In other words, the raw data format that the server exports. Is this possible with ASP.NET frameworks?
I have found such solutions in the past, but with much simpler web pages and web pages with get requests...
Thank you!
Taking a look at the website (I have no experience with Serbian at all, not that it matters much), it looks to me like it is pulling the information from a database on the server side (in my book, the "old" way of doing it), not from a JSON file. That means you're basically stuck either going the normal web scraping route, as you said, or finding a SQL injection (which I am IN NO WAY SUGGESTING, as it is illegal) to bypass the limitations of their crappy search page.

Code for web crawling with Python 2.7.3 in mac terminal?

I am a social scientist and a complete newbie/noob when it comes to coding. I have searched through the other questions/tutorials but am unable to get the gist of how to crawl a news website targeting the comments section specifically. Ideally, I'd like to tell python to crawl a number of pages and return all the comments as a .txt file. I've tried
from bs4 import BeautifulSoup
import urllib2
url="http://www.xxxxxx.com"
and that's as far as I can go before I get an error message saying bs4 is not a module. I'd appreciate any kind of help on this, and please, if you decide to respond, DUMB IT DOWN for me!
I can run wget on terminal and get all kinds of text from websites which is awesome IF I could actually figure out how to save the individual output html files into one big .txt file. I will take a response to either question.
Try Scrapy. It is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
You will most likely encounter this as you go, but in some cases, if the site is employing 3rd party services for comments, like Disqus, you will find that you will not be able to pull the comments down in this manner. Just a heads up.
I've gone down this route before and have had to tailor the script to a particular site's layout/design/etc.
I've found libcurl to be extremely handy, if you don't mind doing the post-processing using Python's string handler functions.
If you don't need to implement it purely in Python, you can make use of wget's recursive mirroring option to handle the content pull, then write your python code to parse the downloaded files.
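As a rough sketch of that second step, the code below walks a folder of downloaded .html files and writes their visible text into one big .txt file. It assumes Python 3 and uses only the standard library; the folder and output names are whatever you pass in, and the extraction is deliberately crude (it keeps all text outside script and style tags, not just comments).

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


def combine(folder, out_file):
    """Write the text of every .html file under folder into out_file."""
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).rglob("*.html")):
            markup = path.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(markup) + "\n")
```

Calling combine("downloaded_site", "all_pages.txt") (both names are placeholders) would then produce the single text file.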
I'll add my two cents here as well.
The first things to check are that you installed Beautiful Soup, and that it lives somewhere it can be found. There are all kinds of things that can go wrong here.
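A quick way to check both points is to ask the interpreter you are actually running which executable it is and whether it can see the module. This sketch assumes Python 3 (importlib.util.find_spec is not available on Python 2.7); the is_installed helper is just an illustrative name.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if this interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# If this path is not the Python you installed bs4 for, that is the problem.
print("Interpreter:", sys.executable)
print("bs4 importable:", is_installed("bs4"))
```

If it reports False, installing with the same interpreter (python -m pip install beautifulsoup4) usually avoids the "installed it, but for a different Python" trap.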
My experience is similar to yours: I work at a web startup, and we have a bunch of users who register, but give us no information about their job (which is actually important for us). So my idea was to scrape the homepage and the "About us" page from the domain in their email address, and try to put a learning algorithm around the data that I captured to predict their job. The results for each domain are stored as a text file.
Unfortunately (for you...sorry), the code I ended up with was a bit complicated. The problem is that you'll end up getting a lot of garbage when you do the scraping, and you'll have to filter it out. You'll also end up with encoding issues, and (assuming you want to do some learning here) you'll have to get rid of low-value words. The total code is about 1000 lines, and I'll post some important pieces that may help you out here, if you're interested.
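To give a flavor of the filtering step (this is an illustrative sketch, not the code described above), the snippet below decodes scraped bytes defensively, lowercases the text, and drops a small made-up list of low-value words before counting what is left. The STOP_WORDS set and the function names are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "we", "our"}


def decode(raw_bytes):
    """Decode scraped bytes, replacing anything that is not valid UTF-8."""
    return raw_bytes.decode("utf-8", errors="replace")


def keywords(text, top_n=5):
    """Return the most frequent words after removing stop words and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```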

Python scripting for XBMC

I am new to programming and to Python itself. I have no programming experience. I have managed to read up on Python and have done some fairly basic Python tutorials, and now I am ready for my first project in Python.
I am basing my project around XBMC, I want to develop some addons for this awesome media center.
I have a few websites that I want to scrape and display in XBMC. One is a music website and one is a paid TV website which is only available to people who have accounts with them. I have managed to scrape a website with feedparser, but I have no idea how to output these titles and links so they play in XBMC.
My question is: where do I start, how do I construct the script for these websites, and what tools/libraries/modules do I need? And what do I need to do to include it in XBMC?
On the general topic that has been asked a ton of times regarding webpage scraping, the common answer is always Mechanize/Beautiful Soup for python. That would allow you to actually get your data.
Once you have your data, it's then just a matter of formatting it the way you want for your XBMC app: http://wiki.xbmc.org/index.php?title=HOW-TO:Write_Python_Scripts_for_XBMC
It's a two-step process:
1. Get your data from a source and format it into some common structure
2. Use the common structure to populate your elements in the XBMC script
What you actually want to do with your script will determine how you use your data. If it's simply providing information, then the link above pretty much explains it.
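The two steps could be sketched like this. To keep it runnable outside XBMC, step 1 parses a made-up RSS snippet with the standard library (instead of feedparser or Beautiful Soup), and step 2 takes a callback; inside an add-on, that callback would wrap whatever list-item call the XBMC API provides. SAMPLE_RSS, parse_items and populate are all hypothetical names.

```python
import xml.etree.ElementTree as ET

# Made-up feed standing in for the scraped website.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Track One</title><link>http://example.com/1.mp3</link></item>
  <item><title>Track Two</title><link>http://example.com/2.mp3</link></item>
</channel></rss>"""


def parse_items(rss_text):
    """Step 1: turn the source into a common structure (a list of dicts)."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def populate(items, add_entry):
    """Step 2: hand each item to the UI layer through a callback."""
    for item in items:
        add_entry(item["title"], item["url"])


# Stand-in for the XBMC side: just collect what would be displayed.
shown = []
populate(parse_items(SAMPLE_RSS), lambda title, url: shown.append((title, url)))
```

Keeping the common structure independent of XBMC means the scraping part can be tested on its own before it goes into the add-on.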

How do I create a web interface to a simple python script?

I am learning python. I have created some scripts that I use to parse various websites that I run daily (as their stats are updated), and look at the output in the Python interpreter. I would like to create a website to display the results. What I want to do is run my script when I go to the site, and display a sortable table of the results.
I have looked at Django and am part way through the tutorial, but it seems like an awful lot of overhead for what should be a simple problem. I know that I could just write a Python script to output simple HTML, but is that really the best way? I would like to be able to sort the table by various columns.
I have years of programming experience (C, Java, etc.), but have very little web development experience.
Thanks in advance.
Have you considered Flask? Like Tornado, it is both a "micro-framework" and a simple web server, so it has everything you need right out of the box. http://flask.pocoo.org/
This example (right off the homepage) pretty much sums up how simple the code can be:
from flask import Flask

app = Flask(__name__)

# Map the URL "/" to the hello() function.
@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    # Start Flask's built-in development server.
    app.run()
If you are creating non-interactive pages, you can easily set up any modern web server to execute your Python script as a CGI. Instead of loading a static file, your web server will return the output of your Python script.
This isn't very sophisticated, but if you are simply returning the output without needing browser-submitted data, this is the easiest way (scaling under load is a different story).
You don't even need the "cgi" module from python, if you aren't receiving any data from the browser. Anything more complicated than this and you should use a web framework.
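For the no-input case, a CGI script only has to write a Content-Type header, a blank line, and then the page to standard output. Here is a minimal sketch; the sample lines and the cgi_response helper are placeholders for whatever your own script produces.

```python
import html
import sys


def cgi_response(lines):
    """Return a complete CGI response: header, blank line, then an HTML list."""
    items = "".join("<li>{}</li>".format(html.escape(line)) for line in lines)
    body = "<html><body><ul>{}</ul></body></html>".format(items)
    # The blank line after the header tells the web server where the body starts.
    return "Content-Type: text/html; charset=utf-8\n\n" + body


# Placeholder results; in practice these would come from your parsing script.
sys.stdout.write(cgi_response(["site-a: 120 visits", "site-b: 95 visits"]))
```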
Examples and other methods
Simple Example: hardest part is webserver configuration
mod_python: Cut down on CGI overhead (otherwise, apache execs the python interpreter for each hit)
python module cgi: sending data to your python script from the browser.
Sorting
JavaScript-side sorting: I've used this JavaScript library to add sortable tables. This is the easiest way to add sorting without requiring additional work or another HTTP GET.
Instructions:
Download this file
Add to your HTML
Add class="sortable" to any table you'd like to make sortable
Click on the headers to sort
You might consider Tornado if Django is too much overhead. I've used both and agree that, if you have something simple/small to do and don't already know Django, it's going to exponentially increase your time to production. On the other hand, you can 'get' Tornado in a couple of hours and get something relatively simple done in a day or two with no prior experience with it. At least, that's been my experience with it.
Note that Tornado is still a tradeoff: you get a lot of simplicity in exchange for the huge cornucopia of features and shortcuts you get w/ Django.
PS - in addition to being a 'micro-framework', Tornado is also its own web server, so there's no mucking with wsgi/mod-cgi/fcgi.... just write your request handlers and run it. Be sure to see the demos included in the distribution.
Have you seen bottle framework? It is a micro framework and very simple.
If I correctly understood your requirements you might find Wooey very interesting.
Wooey is a Django app that creates automatic web UIs for Python scripts:
http://wooey.readthedocs.org
Here you can check a demo:
https://wooey.herokuapp.com/
Django is a big web framework, meant to include loads of things because you often need them, even though sometimes you don't.
Look at Pyramid, earlier known as BFG. It's much smaller.
http://pypi.python.org/pypi/pyramid/1.0a1
Other microframeworks to check out are here: http://wiki.python.org/moin/WebFrameworks
On the other hand, in this case it's probably also overkill. It sounds like you can run the script once every ten minutes, write a static HTML file, and just use Apache.
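A sketch of the static-file approach: build the table as a string and write it to a file that the web server already serves. The column names and the output path are up to you; the class="sortable" attribute is only there so the table can be picked up by a JavaScript sorting library like the one mentioned in another answer. Writing to a temporary name first and then renaming means visitors never see a half-written page.

```python
import html
import os


def render_table(headers, rows):
    """Return an HTML table with one header row and one row per result."""
    head = "".join("<th>{}</th>".format(html.escape(str(h))) for h in headers)
    body = "".join(
        "<tr>{}</tr>".format(
            "".join("<td>{}</td>".format(html.escape(str(cell))) for cell in row)
        )
        for row in rows
    )
    return '<table class="sortable"><tr>{}</tr>{}</table>'.format(head, body)


def write_page(path, headers, rows):
    """Write the page to path, swapping it in only once it is complete."""
    page = "<!DOCTYPE html><html><body>{}</body></html>".format(
        render_table(headers, rows)
    )
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(page)
    os.replace(tmp_path, path)  # rename over the old file in one step
```

A cron job (or any scheduler) can then run the script at whatever interval suits the data.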
If you are not willing to write your own tool, there is a pretty advanced tool for executing your scripts: http://rundeck.org/
It's pretty simple to start and can be configured for complex scenarios as well.
For the custom view requirement (sortable results), I believe you can implement a simple plugin that translates script output into HTML elements.
Also, for simple setups I can recommend my own tool: https://github.com/bugy/script-server. It doesn't have tons of features, but it is very easy for end users and supports interactive execution.
If you don't need any input from the browser, this sounds like an almost-static webpage that just happens to change once a day. You'll only need some way to get HTML out of your script, in a place where your web server can access it.
So you'd use some form of templating; if you need some structure beyond the single page, there are static site/blog generators that you can feed your output into (in, say, Markdown format) and then call their make html command or the like.
You can use DicksonUI: https://dicksonui.gitbook.io
Remi GUI is another option (search for it on Google), but I think DicksonUI is better.
Disclosure: I am the author of DicksonUI.
