I have wanted to start a blog for a long time. But knowing myself, I know I won't update it often. So I would like to combine my blog with an "ebook": a beginner-level ebook/course on Biostatistics.
Here are some examples of other blogs (+ebooks) that follow this approach:
A Byte of Python: http://www.swaroopch.com/notes/Python
BabyPips (FX trading): http://www.babypips.com/school/
Learn Python the Hard way (ebook only): http://learnpythonthehardway.org/book/
I could simply use WordPress, Tumblr, or some other blog site to create a blog and write my tutorials there, one post per tutorial. BUT I am leaning towards creating a more structured book with a table of contents, sequential chapters, prev/next navigation, and even quizzes (if possible). The blog-post style is better suited to independent tutorials that don't follow a structured course/book format.
For the blog section, there is WordPress etc. But I haven't figured out how to create a structured ebook like the ones above. What software/plugin/CMS/wiki plugin should I use for this?
PS: Eventually, I'd also like to convert my web ebook to PDF, MOBI, and EPUB formats, but that is probably not hard. The most important thing is to publish a web ebook like the ones above.
UPDATE:
Ideally, I just want to be able to log in, click create -> new book or new chapter or something, and just write as I would in a WYSIWYG editor. The script should take care of generating the table of contents, navigation, etc. I think this probably resembles a wiki script, but a wiki script probably won't take care of next/prev navigation.
http://www.swaroopch.com/notes/Python uses MediaWiki, which is okay for writing books. If you want to publish in several formats, a markup language like Markdown or reStructuredText might be more appropriate. You can use various tools to create static web pages from those, such as Sphinx, Hyde, or Jinja templates.
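The table of contents and prev/next navigation asked about in the question are what such generators produce from an ordered list of chapters. A minimal sketch of that idea, using only the standard library; the chapter slugs and titles are made up for illustration, and real tools like Sphinx do considerably more:

```python
from html import escape

# Hypothetical chapter list: (slug, title) pairs in reading order.
CHAPTERS = [
    ("intro", "Introduction"),
    ("descriptive", "Descriptive Statistics"),
    ("inference", "Inference"),
]


def build_toc(chapters):
    """Return an HTML ordered list linking to every chapter page."""
    items = "\n".join(
        f'  <li><a href="{slug}.html">{escape(title)}</a></li>'
        for slug, title in chapters
    )
    return f"<ol>\n{items}\n</ol>"


def build_nav(chapters, index):
    """Return prev/next links for the chapter at position `index`."""
    links = []
    if index > 0:
        slug, title = chapters[index - 1]
        links.append(f'<a href="{slug}.html">&laquo; {escape(title)}</a>')
    if index < len(chapters) - 1:
        slug, title = chapters[index + 1]
        links.append(f'<a href="{slug}.html">{escape(title)} &raquo;</a>')
    return " | ".join(links)
```

Calling `build_nav(CHAPTERS, 1)` yields a link back to the introduction and a link forward to the inference chapter; the first and last chapters get only one link each.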
The more research I do, the more grim the outlook becomes.
I am trying to Flat Save, or Static Save a webpage with Python. This means merging all the styles to inline properties, and changing all links to absolute URLs.
I've tried nearly every free conversion website, API, and even libraries on GitHub. None are that impressive. The best Python implementation I could find for flattening styles is https://github.com/davecranwell/inline-styler. I adapted that slightly for Flask, but the generated file isn't that great. Here's how it looks:
Obviously, it should look better. Here's what it should look like:
https://dzwonsemrish7.cloudfront.net/items/3U302I3Y1H0J1h1Z0t1V/Screen%20Shot%202012-12-19%20at%205.51.44%20PM.png?v=2d0e3d26
It seems like a never-ending struggle dealing with malformed HTML, unrecognized CSS properties, Unicode errors, etc. So does anyone have a suggestion on a better way to do this? I understand I can go to File -> Save in my local browser, but when I am trying to do this en masse and extract a particular XPath, that's not really viable.
It looks like Evernote's web clipper uses iFrames, but that seems more complicated than I think it should be. But at least the clippings look decent on Evernote.
After walking away for a while, I managed to install a Ruby library that flattens the CSS much, much better than anything else I've used. It's the library behind the very slow web interface here: http://premailer.dialect.ca/
Thank goodness they released the source on GitHub; it's the best, hands down.
https://github.com/alexdunae/premailer
It flattens styles, creates absolute URLs, works with a URL or string, and can even create plain text email templates. Very impressed with this library.
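For the absolute-URL half of the problem only, a rough Python sketch using the standard library is below. This is not how premailer works internally and it does no style inlining at all; it simply re-emits the parsed HTML with `href` and `src` values resolved against a base URL. The class and function names are illustrative.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

URL_ATTRS = {"href", "src"}


class AbsoluteLinks(HTMLParser):
    """Re-emit HTML, resolving href/src attributes against a base URL."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:          # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def make_links_absolute(html, base_url):
    """Return `html` with relative href/src values made absolute."""
    parser = AbsoluteLinks(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Badly malformed markup may not round-trip cleanly through this approach, which is part of why a dedicated library is the more robust choice.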
Update Nov 2013
I ended up writing my own bookmarklet that works purely client side. It is compatible with WebKit and Firefox only. It recurses through each node and adds inline styles, then sends the flattened HTML to the clippy.in API to save to the user's dashboard.
Client Side Bookmarklet
It sounds like inline styles might be a deal-breaker for you, but if not, I suggest taking another look at Evernote Web Clipper. The desktop app has an Export HTML feature for web clips. The output is a bit messy as you'd expect with inline styles, but I've found the markup to be a reliable representation of the saved page.
Regarding inline vs. external styles, for something like this I don't see any way around inline if you're doing a lot of pages from different sites where class names would have conflicting style rules.
You mentioned that Web Clipper uses iFrames, but I haven't found this to be the case for the HTML output. You'd likely have to embed the static page as an iFrame if you're re-publishing on another site (legally I assume), but otherwise that shouldn't be an issue.
Some automation would certainly help so you could go straight from the browser to the HTML output, and perhaps for relocating the saved images to a single repo with updated src links in the HTML. If you end up working on something like this, I'd be grateful to try it out myself.
I want to update/rewrite a small (10 page), simple website; 8 pages are entirely static and could be written in HTML, 1 page has a contact form and the other has to display a filterable list of clubs. At the moment the site is written in classic ASP and uses Dreamweaver templates for consistent pages.
My requirements are
A "masterpage" / Templating system, so all shared page elements are written in only 1 place.
Lightweight / low overhead framework
To learn a new language
I could use ASP.NET Webforms or ASP.NET MVC to get the masterpage, but they both come with overhead that isn't necessary for such a small site, and on my GoDaddy hosting spinning up a site from cold is noticeably slower than serving a pure HTML page.
The clubs page will show a list of clubs filterable by location, but I don't want to use a database to store this list. There is another site that has the official list of clubs, but that system isn't capable of providing it as a service or other consumable resource, so I would need to scrape the details periodically and cache them locally, or use an iframe or something.
I thought maybe Python or Django might be good candidates but don't know enough to know. I now think that what I'm looking for is a "micro web framework". I've taken a quick look at the Mercurial Web Server, which is written in Python and looks quite straightforward, but I don't have access to the hosted web server on GoDaddy, so I can't install Python...
Edit
I need this to run on my current shared hosting with GoDaddy (IIS7).
Edit2
The list of clubs is maintained by the official HQ website; they occasionally add / remove clubs. I just need to keep my list up to date with theirs. I have been checking every few months (if I remember) and updating an MS SQL database, but that's hugely over the top. I was thinking of just pulling the details down into JSON format and persisting them in a text file (once a month, or something), which I could then use as the basis of a table with jQuery filtering on it. The club details are just text: name of club, main contact, phone number, address and email address.
I would also like publishing to be simple: commit the code to Mercurial (or git) and have that run the site. I know Bitbucket (and GitHub) both serve static page sites. I'm not sure how I would get a contact us form to work in that environment, but it's the deployment model I would like.
The site I am looking to update is Seika Dojo
There is no need to run a heavyweight framework to serve 10 almost-static pages. If you plan to pull and cache some data from the web, periodically regenerating static HTML is the way to go.
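The pull-and-cache step for the club list might look roughly like this in Python. It is only a sketch: the cache file name and the one-month age limit are assumptions, and `fetch` stands in for whatever scraper reads the HQ site, which is not shown.

```python
import json
import os
import time

CACHE_FILE = "clubs.json"          # hypothetical cache path
MAX_AGE_SECONDS = 30 * 24 * 3600   # refresh roughly once a month


def load_clubs(fetch, cache_file=CACHE_FILE, max_age=MAX_AGE_SECONDS):
    """Return the club list, refreshing the cache file when it is stale.

    `fetch` is any callable returning a list of dicts; the actual scraper
    for the HQ site is not part of this sketch.
    """
    fresh = (
        os.path.exists(cache_file)
        and time.time() - os.path.getmtime(cache_file) < max_age
    )
    if fresh:
        with open(cache_file, encoding="utf-8") as fh:
            return json.load(fh)
    clubs = fetch()
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(clubs, fh, indent=2)
    return clubs


def filter_by_location(clubs, text):
    """Case-insensitive substring match on each club's address."""
    needle = text.lower()
    return [c for c in clubs if needle in c.get("address", "").lower()]
```

The same JSON file can be served as-is to the browser, so the filtering could equally be done client side.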
As another author mentioned, HTML5 can help you. Take a look at jQuery for table filtering. As for regenerating pages with common elements, consider either Jekyll/Hyde or org-mode (using batch processing mode with Emacs). You have plenty of languages to choose from.
Well, I don't know about the other frameworks, but I have had good experiences building a small site in NancyFx.
NancyFx supports multiple view engines. You could use the SuperSimpleViewEngine; masterpages come out of the box.
Getting started with Nancy is super easy.
I think you already know .NET/C#, but Nancy makes heavy use of the new dynamic features, which are fun to play with.
Python is my favorite language, but I wouldn't use it for creating a simple website. I would recommend going with a ready-made CMS solution, like WordPress.
You would learn something new.
You won't have to implement any features (CMS + plug-ins will provide all you need).
You will get all the support you need.
It is easy to deploy on any hosting (since it's PHP based; yep, sorry, PHP).
Cactus:
https://github.com/koenbok/Cactus
is my current favourite static site generator - it uses Django templates to create a set of pages (in the 'build' directory) from a set of templates (in the 'pages' directory) and all the usual images and css in a 'static' directory.
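The underlying idea, one shared layout wrapped around per-page content and written out as plain files, can be approximated with the standard library alone. This is a sketch of the concept rather than of Cactus itself, which uses Django templates; the layout and page names are invented.

```python
from pathlib import Path
from string import Template

# Shared layout: written once, reused by every page.
MASTER = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<nav><a href="index.html">Home</a> | <a href="contact.html">Contact</a></nav>
<main>
$content
</main>
<footer>Example footer</footer>
</body>
</html>
""")


def build_site(pages, out_dir):
    """Write one HTML file per page, each wrapped in the master layout.

    `pages` maps an output file name to a (title, body_html) pair.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, (title, body) in pages.items():
        html = MASTER.substitute(title=title, content=body)
        (out / name).write_text(html, encoding="utf-8")
    return sorted(p.name for p in out.glob("*.html"))
```

The resulting directory contains only static files, so it can be uploaded to any host.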
Do your filterable table with Javascript on the client - it doesn't sound too complex. This lovely table grid component:
http://datatables.net/
might be just the ticket.
Pelican wasn't mentioned. It produces static pages and is done in Python. It meets all of the stated criteria.
See http://wiki.python.org/moin/WebFrameworks and play with a couple of the choices that tickle your fancy. I would tend toward the lighter frameworks based on what you described.
Although not exactly on topic, since you don't need an entirely static website, StaticMatic may be of interest as one of the best lightweight tools for static content generation.
Flask and web.py would both be good choices. I don't think that Django is really suited for such a small project but you can definitely use it.
Perhaps look at CherryPy or web.py? They are very minimalist Python web frameworks designed for something like this. (I think Django is too big for an app this small, IMHO.)
Also, take a look at Sinatra if you want to learn some Ruby.
Express is good for some JavaScript experience.
For single-page apps, backbone.js is popular and really powerful, but might not be what you want.
But most importantly, have fun learning a new language!
I've been looking around further and the goal of a templating system can be met by just using xml includes, or Embedded JavaScript to pull in the header / footer / menu sections. All the 'work' is then done on the client browser, so the web server only needs to serve up static files which is about as lightweight as you can get.
Based on my experience, any 8 page static website someday needs dynamic features so it is always a good idea to start from something extensible.
Then the first decision is whether to use a framework or a CMS. This depends on how much you expect the site to grow in the future, your ability to develop custom code for dynamic features, and the structure and requirements of the project at hand.
When we need a CMS we use:
Orchard for ASP.NET MVC 3: Take a look at our site www.dreamrain.com, which looks like a static website but actually uses Orchard themes and modules for the dynamic features. If you don't need themes and modules, you can still create an 8-page website and then extend it in the future. Btw, we built this site in 5 days with 4 hours of development effort.
Note: Orchard may need Full Trust, so check its documentation and ask GoDaddy if they allow it.
WordPress for PHP.
When we need a framework we use:
ASP.NET MVC 3 for .NET,
Django or Pyramid for Python,
Zend for PHP,
Grails for Groovy/Java/J2EE,
Play for Scala.
I am new to programming and to Python itself. I have no programming experience. I have managed to read up on Python and done some fairly basic Python tutorials, and now I am ready for my first project in Python.
I am basing my project around XBMC, I want to develop some addons for this awesome media center.
I have a few websites that I want to scrape and display in XBMC. One is a music website and one is a paid TV website which is only available to people who have accounts with them. I have managed to scrape a website with feedparser, but I have no idea how to output these titles and links to play in XBMC.
My question here is: where do I start, how do I construct the script for these websites, and what tools/libraries/modules do I need? And what do I need to do to include it in XBMC?
Webpage scraping has been asked about a ton of times, and the common answer is always Mechanize/Beautiful Soup for Python. That would allow you to actually get your data.
Once you have your data, it's then just a matter of formatting it the way you want for your XBMC app: http://wiki.xbmc.org/index.php?title=HOW-TO:Write_Python_Scripts_for_XBMC
It's a two-step process:
Get your data from a source and format it into some common structure
Use the common structure to populate your elements in the xbmc script
What you actually want to do with your script will determine how you would use your data. If it's just simply providing information, then the link above pretty much explains it.
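A rough sketch of the two steps, assuming the source is an RSS feed and using only the standard library. The sample feed is invented, and `add_entry` is a hypothetical callback: in a real addon it would wrap whatever XBMC call adds a playable item, which is deliberately left out here.

```python
import xml.etree.ElementTree as ET

# Made-up feed used only to exercise the functions below.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First track</title><link>http://example.com/1.mp3</link></item>
  <item><title>Second track</title><link>http://example.com/2.mp3</link></item>
</channel></rss>"""


def parse_items(feed_xml):
    """Step 1: turn RSS text into a list of {'title', 'url'} dicts."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title", ""), "url": item.findtext("link", "")}
        for item in root.iter("item")
    ]


def populate(items, add_entry):
    """Step 2: hand each item to whatever adds an entry to the UI."""
    for item in items:
        add_entry(item["title"], item["url"])
```

Keeping the common structure (a list of small dicts) separate from the display code means each website only needs its own version of step 1.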
I am planning to develop a web-based application which could crawl Wikipedia to find relations and store them in a database. By relations, I mean searching for a name, say 'Bill Gates', finding his page, downloading it, pulling out various pieces of information from the page and storing them in a database. The information may include his date of birth, his company and a few other things. But I need to know if there is any way to find these unique data on the page, so that I can store them in a database. Any specific books or algorithms would be greatly appreciated. Mentions of good open-source libraries would also be helpful.
Thank You
If you haven't already, you should have a look at DBpedia. Many categories of wiki articles have "Infoboxes" for the kinds of information you describe, and they've made a database out of it:
http://en.wikipedia.org/wiki/DBpedia
You might also leverage some of the information in Metaweb's Freebase (which overlaps and I believe may even integrate the info from DBpedia.) They have an API for querying their graph database, and there's a Python wrapper for it called freebase-python.
UPDATE: Freebase is no more; they were acquired by Google and eventually folded into the Google Knowledge Graph. There is an API but I don't think they have anything like the formal sync'ing Freebase had with public sources like Wikipedia. I'm personally disappointed in how this looks to have turned out. :-/
As for the natural language processing bit, if you do make headway on that problem you might consider these databases as repositories for any information you do mine.
You mention Python and open source, so I would investigate NLTK (the Natural Language Toolkit). Text mining and natural language processing is one of those things where you can do a lot with a dumb algorithm (e.g. pattern matching), but if you want to go a step further and do something more sophisticated, i.e. trying to extract information that is stored in a flexible manner or trying to find information that might be interesting but is not known a priori, then natural language processing should be investigated.
NLTK is intended for teaching, so it is a toolkit. This approach suits Python very well. There are a couple of books for it as well. The O'Reilly book is also published online with an open license. See NLTK.org
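To illustrate the "dumb algorithm" end of that spectrum, here is a small pattern-matching sketch over infobox-style wikitext. The sample text and its values are made up and far tidier than real Wikipedia markup, where values often span lines or sit inside nested templates.

```python
import re

# Invented, simplified infobox excerpt; real pages are messier.
SAMPLE = """{{Infobox person
| name        = Example Person
| birth_date  = 1970-01-01
| occupation  = Engineer
}}"""

# One "| key = value" entry per line; spaces and tabs only, so a match
# never runs over a line break.
FIELD = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def infobox_fields(wikitext):
    """Return {field: value} for simple one-line '| key = value' entries."""
    return {key: value for key, value in FIELD.findall(wikitext)}
```

This breaks down quickly on free-form article text, which is the point at which NLP tooling starts to pay off.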
Jvc, there are existing Python modules that can do everything you mentioned above.
For pulling information from webpages, I like to use Selenium, http://seleniumhq.org/projects/ide/. Basically, you can locate and retrieve information on any webpage using a number of identifiers (id, XPath, etc).
However, like winwaed said, it can be inflexible if you are simply "pattern matching", especially since some websites use dynamic code, meaning the identifiers can change with each subsequent reload of the page. But this problem can be solved by adding regular expressions, e.g. (.*), to your code. Check out this YouTube video, http://www.youtube.com/watch?v=Ap_DlSrT-iE. Even though he is using BeautifulSoup to scrape the website, you can see how he uses regular expressions to pull the information from the page.
Also, I'm not sure what type of database you are working with, but pyodbc, http://code.google.com/p/pyodbc/, can work with SQL types, and also mainstream databases like Microsoft Access.
So, my advice is to look into Selenium for finding the info on the webpage, pyodbc to store and retrieve it, and regular expressions when the identifiers are dynamic.
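A small sketch of the regular-expression and storage parts of that advice. Two substitutions are made so that it runs with the standard library alone: the page is a hard-coded, made-up string rather than something fetched with Selenium, and `sqlite3` stands in for pyodbc (both follow Python's DB-API, so the calls have a similar shape, but the connection setup would differ).

```python
import re
import sqlite3

# Made-up markup: the id suffix is assumed to change on every page load.
PAGE = '<span id="born-8f3a2c">1 January 1970</span>'

# Match the stable prefix of the id and allow any suffix after it.
BORN = re.compile(r'<span id="born-\w+">(.*?)</span>')


def extract_born(html):
    """Return the text of the 'born' span, or None if it is absent."""
    match = BORN.search(html)
    return match.group(1) if match else None


def store(conn, name, born):
    """Insert one row using DB-API parameter binding."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, born TEXT)")
    conn.execute("INSERT INTO people (name, born) VALUES (?, ?)", (name, born))
    conn.commit()


conn = sqlite3.connect(":memory:")
store(conn, "Example Person", extract_born(PAGE))
rows = conn.execute("SELECT name, born FROM people").fetchall()
```

Passing values as parameters instead of formatting them into the SQL string avoids quoting problems with scraped text.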
What are my choices for frameworks for doing Python web development and having a nice language for writing templates for CSS/HTML? A key goal for me is not to have to run a server or install many extra dependencies -- I'd like something that works just by using CGI and hopefully does not force me to do any fancy reconfiguration of Apache etc.
My goal is to easily write pages that look pretty, using templates to generate nice-looking HTML with CSS, as opposed to painfully writing out HTML using print statements, and to have it be modular. I don't need fancy database support and I am not planning complex forms for user input that I need to process.
The ideal framework will also have a set of templates written in it that I can use for my website.
I essentially just want to make pages programmatically from Python that look good using CSS/HTML without much work.
How can I do this? Something like Django for example would be overkill, since what I am doing is very simple. (Django is great, don't get me wrong, but my purposes are way too simple).
More specifics about my app:
I want to make a gallery of photos and also display Python code next to each photo. So I'd like to have a way to easily get syntax highlighting etc. in HTML for Python code. Just like WordPress has many nice templates for blogs, I'd like a combination of web framework and templating language that has a gallery of example components I can reuse, so that I don't have to write my own CSS/HTML for making menus/headers/other components of a page look good.
thanks.
Well, you're probably not going to find a framework with templates like that included, simply because that's out of most frameworks' scopes. The page structure, variables, and the like of any given Web application are going to be considerably different from each other, so good generic templates are hard to write. The reason people have so many templates and themes for Wordpress (which, though its authors sometimes promote it as a framework, is just an application) is because there are limits on what you can do with it. Frameworks don't have as many such limits. You are probably going to have to find the templates somewhere else and adapt them to the template language you want to use.
On the subject of template languages, as far as a good, modular template language is concerned, Jinja2 is hard to beat. It's fast, easy to write in, and powerful. I have taken quite a few templates from other Web sites and added the Jinja2 markup relatively effortlessly. Flask is a nice, light framework that works well with it, and it can deploy to CGI. And as for syntax highlighting, I'm going to have to go with Ignacio and recommend Pygments. All of these libraries are well-documented, so you should be able to figure them out easily.
Unfortunately, as much as I would like to have a gallery of reusable theme components, those are not easy to find. You're going to have to scrounge around the Web and hack stuff together yourself.
There's some docs, some tools, and some more tools. Plus, flup can turn any WSGI framework into a CGI app. And there's Pygments for syntax highlighting.
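As an illustration of running a WSGI application under plain CGI, here is a minimal sketch. Note that it uses the standard library's `wsgiref.handlers.CGIHandler` rather than flup, and `string.Template` rather than a full template language, purely to stay dependency-free; the page content is a placeholder.

```python
from string import Template
from wsgiref.handlers import CGIHandler

# Placeholder page; a real site would load templates from files.
PAGE = Template("<html><body><h1>$heading</h1></body></html>")


def app(environ, start_response):
    """A minimal WSGI application that renders one templated page."""
    body = PAGE.substitute(heading="Photo gallery").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def main():
    """Serve a single request when the web server runs this file as CGI."""
    CGIHandler().run(app)
```

A CGI script would end with a call to `main()`; the same `app` callable could later be served by any other WSGI server without changes.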