I've been trying to make an application in Python, and I'm new to the language.
What I actually want to do is have feedparser read the values from a website's RSS feed... say Reddit... then turn that output into a string and pass the value further along in my code. My code right now:
import feedparser
import webbrowser
feed = feedparser.parse('http://www.reddit.com/.rss')
print feed['entries'][1]['title']
print feed['entries'][1]['link']
It is working right now: it parses the feed and I get the output I want. Now I want to take the "link" from "print feed['entries'][1]['link']" and use it further in the code.
How can I do so? To be more specific, I want to open that URL in my browser.
I got as far as something like this:
import feedparser
import webbrowser
feed = feedparser.parse('http://www.reddit.com/.rss')
print feed['entries'][1]['title']
print feed['entries'][1]['link']
mystring = 'feed['entries'][1]['link']'
webbrowser.open('mystring')
It is of course not working... Please help! If you need to know anything else, please let me know.
This is Reddit-specific, so it won't work on other RSS feeds, but I thought this might help you.
from __future__ import print_function
import praw
r = praw.Reddit("my_cool_user_agent")
submissions = r.get_front_page()
for x in submissions:
    print("Title: {0} URL: {1} Permalink: {2}".format(x, x.url, x.permalink))
    print("------------------------------------------------------------")
For Reddit there are two URLs that you might be interested in: the actual link that is submitted (the 'external' link... think imgur, etc.) and the permalink to the Reddit post itself.
Instead of passing feed['entries'][1]['link'] as a string literal, just pass the value itself to webbrowser.
Example:
webbrowser.open(feed['entries'][1]['link'])
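Putting it together, here is a minimal sketch of the whole flow; it assumes the feed has at least two entries, and the index and feed URL are the ones from your question:

import feedparser
import webbrowser

# Parse the feed and pull out the link of the second entry.
feed = feedparser.parse('http://www.reddit.com/.rss')
link = feed['entries'][1]['link']  # already a plain string, no extra quotes needed
webbrowser.open(link)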
So I'm new to Python and am working on a simple program that will read a text file of protein names (PDB IDs) and create a URL to search a database (the PDB) for each protein and some associated data.
Unfortunately, as a newbie, I forgot to save my script, so I can't recall what I did to make my code work!
Below is my code so far:
import urllib
import urllib.parse
import urllib.request
import os
os.chdir("C:\\PythonProjects\\Samudrala Lab Projects")
protein_file = open("protein_list.txt","r")
protein_list = protein_file.read()
for item in protein_list:
    item = item[0:4]
    query_string = urlencode('customReportColumns', 'averageBFactor', 'resolution', 'experimentalTechnique', 'service=wsfile', 'format=csv')
    **final_URL = url + '?pdbid={}{}'.format(url, item, query_string)**
    print(final_URL)
The line of code I'm stuck on is starred.
The object "final_URL" within the loop is missing some modification to indicate that I'd like the URL to search for the item as a PDB ID. Can anyone give me a hint as to how I can tell the URL to plug in each item on the list as a PDB ID?
I'm getting a type error indicating that it's not a valid non-string sequence or mapping object. Original post was edited to add this info.
Please let me know if this is an unclear question, or if you need any additional info.
Thanks!
How about something like this?
final_URL = "{}?pdbids={}{}".format(url, item, query_string)
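As for the TypeError: urlencode expects a mapping (or a sequence of key/value pairs), not a series of separate strings. Also note that protein_file.read() returns one big string, so "for item in protein_list" loops over characters rather than names. Here is a hedged sketch of the loop with both issues addressed; the report parameters are copied from your snippet, and the base URL is my assumption since url is undefined in the code shown:

from urllib.parse import urlencode

url = "http://www.rcsb.org/pdb/rest/customReport"  # assumed base URL
with open("protein_list.txt") as protein_file:
    protein_list = protein_file.read().splitlines()  # one PDB ID per line

for item in protein_list:
    item = item[0:4]
    # urlencode wants a dict of parameter names to values
    query_string = urlencode({
        "pdbids": item,
        "customReportColumns": "averageBFactor,resolution,experimentalTechnique",
        "service": "wsfile",
        "format": "csv",
    })
    final_URL = "{}?{}".format(url, query_string)
    print(final_URL)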
Let me get this straight: I'm trying to make a reader web app like Google Reader, Feedly, etc. Hence I'm trying to get RSS in Python using the feedparser library. The thing is, not every website's RSS is in the same format; some feeds have no title, and some have no publish date. However, I found that digg.com/reader is very useful: Digg's reader gets the RSS with a publish date and a title too. I wonder how this works? Any clue or any little help would be appreciated.
I've recently done some projects with the feedparser library, and it can be very frustrating since many RSS feeds are different. What works best for me is something like this:
#to get posts from hackaday.com
import feedparser
feed = feedparser.parse("http://www.hackaday.com/blog/feed/") #get feed from hackaday
feed = feed['items'] #Get items in feed (this is the best way I've found)
print feed[0]['title'] #print post title
print feed[0]['summary'] #print post summary
print feed[0]['published'] #print date published
These are just a few of the different "fields" that feedparser has. To find the one you want, just run these commands in the Python shell and see what fits your needs.
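Because fields vary from feed to feed, it can also help to read them defensively instead of letting a missing key raise an exception. A minimal sketch (the fallback strings are just placeholders I picked):

import feedparser

feed = feedparser.parse("http://www.hackaday.com/blog/feed/")
for item in feed['items']:
    # .get() returns the fallback instead of raising KeyError when a field is missing
    title = item.get('title', 'no title')
    published = item.get('published', 'no publish date')
    print title, '|', published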
You can use feedparser to find out whether a website serves Atom or RSS, and then deal with each type. If a feed has no publish date or title, you can extract them from the article itself using other libraries, such as newspaper (goose-extractor is a similar option). As an example:
from newspaper import Article
import feedparser

def extract_date(url):
    article = Article(url)
    article.download()
    article.parse()
    date = article.publish_date
    return date

d = feedparser.parse("http://feeds.feedburner.com/webnewsit")  # an Italian website
d.entries[0]  # the last entry

try:
    d.entries[0].published
except AttributeError:
    link_last_entry = d.entries[0].link
    publish_date = extract_date(link_last_entry)
Let me know if you still don't get the publication date
I am trying to export a category from the Turkish Wikipedia by following http://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export . Here is the code I am using:
# -*- coding: utf-8 -*-
import requests
from BeautifulSoup import BeautifulStoneSoup
from sys import version

link = "http://tr.wikipedia.org/w/index.php?title=%C3%96zel:D%C4%B1%C5%9FaAktar&action=submit"

def get(pages=[], category=False, curonly=True):
    params = {}
    if pages:
        params["pages"] = "\n".join(pages)
    if category:
        params["addcat"] = 1
        params["category"] = category
    if curonly:
        params["curonly"] = 1
    headers = {"User-Agent": "Wiki Downloader -- Python %s, contact: Yaşar Arabacı: yasar11732#gmail.com" % version}
    r = requests.post(link, headers=headers, data=params)
    return r.text

print get(category="Matematik")
Since I am trying to get data from the Turkish Wikipedia, I have used its URL. Other things should be self-explanatory. I am getting the form page that you can use to export data, instead of the actual XML. Can anyone see what I am doing wrong here? I have also tried making a GET request.
There is no parameter named category; the category name should be in the catname parameter.
But Special:Export was not built for bots, it was built for humans. So if you use catname correctly, it will return the form again, this time with the pages from the category filled in. Then you are supposed to click "Submit" again, which will return the XML you want.
I think doing this in code would be too complicated. It would be easier if you used the API instead. There are some Python libraries that can help you with that: Pywikipediabot or wikitools.
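If you would rather stay with plain requests than adopt a library, here is a minimal sketch of the API route; list=categorymembers is a standard MediaWiki module, but the Turkish Kategori: namespace prefix is my assumption:

import requests

api = "http://tr.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Kategori:Matematik",  # assumed Turkish "Category:" prefix
    "cmlimit": "500",
    "format": "json",
}
r = requests.get(api, params=params)
# Print the title of every page in the category.
for member in r.json()["query"]["categorymembers"]:
    print member["title"]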
Sorry, my original answer was horribly flawed. I misunderstood the original intent.
I did some more experimenting because I was curious. It seems that the code you have above is not necessarily incorrect; rather, the Special:Export documentation is misleading. The documentation states that using catname and addcat will add the categories to the output, but instead it only lists the pages and categories within the specified catname inside an HTML form. It seems that Wikipedia actually requires that the pages you wish to download be specified explicitly. Granted, the documentation doesn't appear to be very thorough on that matter. I would suggest that you parse the page for the pages within the category and then explicitly download those pages with your script. I do see an issue with this approach in terms of efficiency: due to the nature of Wikipedia's data, you'll get a lot of pages which are simply category pages of other pages.
As an aside, it could possibly be faster to use the actual corpus of data from Wikipedia, which is available for download.
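A hedged sketch of fetching such a dump with requests; the trwiki URL below follows the usual dumps.wikimedia.org naming pattern, but check the site for the actual file name:

import requests

# Stream the latest Turkish Wikipedia articles dump to disk (it is large).
dump_url = "http://dumps.wikimedia.org/trwiki/latest/trwiki-latest-pages-articles.xml.bz2"
r = requests.get(dump_url, stream=True)
with open("trwiki-latest-pages-articles.xml.bz2", "wb") as f:
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        f.write(chunk)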
Good luck!
I am using the xgoogle Python library to try to search a specific site. The code works for me when I do not use the "site:" indicator in the keyword search. If I do use it, the result set is empty. Does anyone have any thoughts on how to get the code below to work?
from xgoogle.search import GoogleSearch, SearchError

gs = GoogleSearch("site:reddit.com fun")
gs.results_per_page = 50
results = gs.get_results()
print results
for res in results:
    print res.title.encode("utf8")
    print
A simple url with the "q" parameter (e.g. "http://www.google.com/search?&q=site:reddit.com+fun") works, so I assume it's some other problem.
If you are using pkrumins/xgoogle, a quick (and dirty) fix is to modify search.py line 240 as follows:
if not title or not url:
This is because Google changed their SERP layout, which breaks the _extract_description() function.
You can also take a look at this fork.
Put the keyword before site:XX. It works for me.
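In other words, a one-line change to the query from the question (the rest of the xgoogle code stays the same):

gs = GoogleSearch("fun site:reddit.com")  # keyword first, then the site: filter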
I have never worked with JSON before. I am trying http://api.worldbank.org//topics?format=JSON and want to do things with it, but I don't even know how to get started.
Following some manuals, I did this:
import urllib
import urllib2
import simplejson

urlb = 'http://api.worldbank.org/topics'
param = 'format=json'  # ask the API for JSON (the default response is XML)
datab = urllib2.urlopen(urlb + '?' + param)
resultb = simplejson.load(datab)
but I have no clue how to parse and work with it now. How do I list the individual items? Count them? Filter them? Is there any simple tutorial that you can point me to, or any advice? I checked diveintopython, JSON's website, and most of the obvious ones, but I am still struggling with it. Is there any simple step-by-step guide that somebody could point me to?
Thanks
Try printing resultb. It's just a Python list with dictionaries inside it. Treat it like you would any list.
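For instance, here is a small sketch of listing, counting, and filtering. It assumes the response shape is a two-element list of [metadata, records] with 'id' and 'value' keys on each record, which is worth confirming by printing resultb first:

meta, topics = resultb   # assumed shape: [metadata dict, list of topic dicts]
print len(topics)        # count the items
for topic in topics:     # list them
    print topic['id'], topic['value']
# filter: keep only topics whose name mentions "health"
health = [t for t in topics if 'health' in t['value'].lower()]
print health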