I am new to Python & AppEngine.
I am trying to use Feedparser to cache a feed to a datastore.
My code is at http://pastebin.com/uWPdWUm2
For some reason it doesn't work - it does not add the data to the datastore.
Any ideas? I am stumped.
You just forgot the parentheses in your model declaration. Without them, each attribute refers to the property class itself rather than to a property instance.
Your code:
class FeedEntry3(db.Model):
    title = db.StringProperty
    link = db.StringProperty
    content = db.TextProperty
What it should be:
class FeedEntry3(db.Model):
    title = db.StringProperty()
    link = db.StringProperty()
    content = db.TextProperty()
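To see why the parentheses matter, here is a minimal sketch that runs without App Engine. The StringProperty class below is only a stand-in written for this illustration, not the real db.StringProperty, and the helper is_property_instance is a made-up name:

```python
class StringProperty(object):
    """Stand-in for db.StringProperty (hypothetical, NOT the App Engine class)."""


class Broken(object):
    title = StringProperty      # no parentheses: the attribute is the class itself


class Fixed(object):
    title = StringProperty()    # parentheses: the attribute is a property instance


def is_property_instance(model, name):
    """Return True if the named class attribute is a StringProperty instance."""
    return isinstance(getattr(model, name), StringProperty)
```

Code that looks for property instances on a model class would find one on Fixed and none on Broken.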
Are you sure you are getting the values correctly, or at all, from feedparser? Have you tried logging them? Also, for the purpose of discussion, if you think x.put() is not working, separate that out and test it on its own, e.g.
x = FeedEntry3()
x.title = "test title"
x.link = "test link"
x.content = "test content"
x.put()
Have you tried that, and does it work? If it works, most probably you are not getting values from feedparser, so debug and log that part.
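One way to do that logging is a small check run on each entry before calling put(). This is only a sketch: it assumes each parsed entry can be read like a dictionary (plain dicts stand in for parsed entries here), and the names REQUIRED_FIELDS and missing_fields are made up for this example.

```python
import logging

# The three fields the model in the question stores.
REQUIRED_FIELDS = ('title', 'link', 'content')


def missing_fields(entry, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or empty in entry."""
    missing = [name for name in required if not entry.get(name)]
    if missing:
        # Log the whole entry so you can see what the parser actually returned.
        logging.warning('entry is missing %s: %r', ', '.join(missing), entry)
    return missing
```

If this logs a warning for every entry, the problem is in how the feed is being read rather than in the datastore write.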
I am using a REST API with Python to get custom field content from WordPress. When I pull a non custom field like "Title", I can successfully render this with the code:
post_title = post_data_dict.get('title', {}).get('rendered', '')
However, with a custom field, I haven't been able to figure this out. Right now my code looks like this:
video_content = post_data_dict.get('acf', {}).get('content', '')
However this contains the HTML formatting.
Any hints would be greatly appreciated. Thank you.
I found this post that worked for me:
Python code to remove HTML tags from a string
I just used the function:
import re

def cleanhtml(raw_html):
    # Non-greedy match of anything that looks like a tag, e.g. <p> or </div>
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, '', raw_html)
    return cleantext
So, my new code looked like this:
video_content = cleanhtml(post_data_dict.get('acf', {}).get('content', ''))
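The regex removes tags but leaves entities such as &amp;amp; untouched. If that matters, a possible alternative is the standard library's HTML parser. This is a sketch written for Python 3 (the module is named html.parser there); the names _TextExtractor and strip_tags are made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of the HTML it is fed."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into their characters.
        HTMLParser.__init__(self, convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(raw_html):
    """Return raw_html with tags removed and entities decoded."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)
```

It would be called the same way as cleanhtml, for example strip_tags(post_data_dict.get('acf', {}).get('content', '')).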
I am trying to use PRAW to get new posts from subreddits on Reddit. The following code snippet shows how I get new items on a particular subreddit.
Is there a way to also get the URL of the particular submission?
submissions = r.get_subreddit('todayilearned')
submission = submissions.get_new(limit=1)
sub = [str(x) for x in submission]
print sub
PRAW allows you to do this:
To get the submitted link you can use submission.url
[submission] = submissions.get_new(limit=1)
print submission.url
Or if you're looking for the URL for the actual post to Reddit then you can use permalink
[submission] = submissions.get_new(limit=1)
print submission.permalink
The documentation lists a short_link property that returns a shortened version of the url to the submission. It does not appear that the full url is similarly provided, though it seems that it could be reconstructed from the subreddit name and the submission's id, which is stored in submission.id.
In summary, use:
[submission] = submissions.get_new(limit=1)
submission.short_link
to get a link to the submission.
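If you do want to rebuild a longer link from the subreddit name and submission.id as described above, a minimal sketch could look like this. It assumes Reddit's /r/NAME/comments/ID/ URL layout, and the function name comments_url is made up for this example; it is not part of PRAW.

```python
def comments_url(subreddit_name, submission_id):
    """Build a comments-page URL from a subreddit name and a submission id.

    Assumes the /r/<subreddit>/comments/<id>/ layout; not a PRAW function.
    """
    return 'https://www.reddit.com/r/%s/comments/%s/' % (subreddit_name, submission_id)
```

For instance, comments_url('todayilearned', submission.id) would give a link for the submission fetched above.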
Can anyone help me with "extracting" stuff from a site using Python? Here is the info:
I have a folder name made up of a set of numbers (the ID of an item), and I have to use that ID to open a page and then scrape info from the page into Notepad. The page looks like this: http://www.somesite.com/pic.mhtml?id=[ID]. I need to extract the picture link from it (the picture link always has ID.jpg at the end), write it to a Notepad text file, and then rename that .txt file to the name of the picture. The picture is always in the title tags. Thanks in advance.
What you need is a data scraper - http://www.crummy.com/software/BeautifulSoup/ will help you pull data off of websites. You can then load that data into a variable, write it to a file, or do anything you normally do with data.
You could try parsing the html source for images.
Try something similar:
import re
import urllib

class Parser(object):
    # Group 2 is the image URL without its extension, group 3 is the extension.
    __rx = r'(url|src)="(http://www\.page\.com/path/\?ID=\d*)\.(jpeg|jpg|gif|png)"'

    def __crawl(self, url):
        images = []
        code = urllib.urlopen(url).read()
        # Search the page source line by line for image URLs
        for line in code.split('\n'):
            imagesearch = re.search(self.__rx, line)
            if imagesearch:
                image = '%s.%s' % (imagesearch.group(2), imagesearch.group(3))
                images.append(image)
        return images
It's untested, so you may want to check the regex.
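Closer to the question itself, here is a rough sketch that builds the page URL from an ID and pulls out the first link ending in ID.jpg. It works on an HTML string you have already downloaded, so it does no network access. The function names page_url and find_picture_link are made up for this example, and the exact shape of the picture link is an assumption.

```python
import re

# Page address pattern taken from the question.
PAGE_TEMPLATE = 'http://www.somesite.com/pic.mhtml?id=%s'


def page_url(item_id):
    """Return the page URL for the given item ID."""
    return PAGE_TEMPLATE % item_id


def find_picture_link(html, item_id):
    """Return the first http(s) link in html ending in '<item_id>.jpg', or None."""
    pattern = r'https?://[^\s"\'<>]*' + re.escape(str(item_id)) + r'\.jpg'
    match = re.search(pattern, html)
    return match.group(0) if match else None
```

You would loop over your IDs, download page_url(item_id), pass the source to find_picture_link, and write the result to your file.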
I am having trouble getting a video entry which includes a link rel="edit". I need such an entry in order to be able to call DeleteVideoEntry(...) on it.
I am retrieving the video using GetYouTubeVideoEntry(youtube_id=XXXXXXX). My yt_service is initialized with a username, password, and a developer key. I use ProgrammaticLogin. This part seems to work fine. I use the same yt_service to upload said video earlier. Also, if I change the developer key to something bogus (during debugging) and try to authenticate, I get a 403 error. This leads me to believe that authentication works OK.
Needless to say, the video entry retrieved with GetYouTubeVideoEntry(youtube_id=XXXXXXX) does not contain the edit link and I cannot use the entry in a DeleteVideoEntry(...) call.
Is there some special way to get a video entry which will contain a link element with a rel="edit"? Can anyone suggest some way to resolve my issue? Could this possibly be a bug?
Update:
For the record, when I tried getting the feed of all my uploads and then looping through the video entries, the video entries did have an edit link. So this works:
uri = 'http://gdata.youtube.com/feeds/api/users/%s/uploads' % username
feed = yt_service.GetYouTubeVideoFeed(uri)
for entry in feed.entry:
    yt_service.DeleteVideoEntry(entry)
But this does not:
entry = yt_service.GetYouTubeVideoEntry(video_id = video.youtube_id)
yt_service.DeleteVideoEntry(entry)
Using the same yt_service.
I've just deleted a YouTube video using gdata and ProgrammaticLogin().
Here are the steps to reproduce:
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
yt_service.developer_key = 'developer_key'
yt_service.email = 'email'
yt_service.password = 'password'
yt_service.ProgrammaticLogin()
# video_id should look like 'iu6Gq-tUsTc'
uri = 'https://gdata.youtube.com/feeds/api/users/%s/uploads/%s' % (username, video_id)
entry = yt_service.GetYouTubeUserEntry(uri=uri)
response = yt_service.DeleteVideoEntry(entry)
print response # True
yt_service.GetYouTubeVideoFeed(uri) works because GetYouTubeVideoFeed doesn't check the uri and just calls self.Get(uri, ...), although I think it was originally expected to receive a 'https://gdata.youtube.com/feeds/api/videos' uri.
Conversely, yt_service.GetYouTubeVideoEntry() uses YOUTUBE_VIDEO_URI = 'https://gdata.youtube.com/feeds/api/videos', but the entry it returns doesn't contain rel="edit".
Hope that helps you out
You can view the HTTP headers of the generated requests by setting the debug flag to true. This is as simple as:
yt_service = gdata.youtube.service.YouTubeService()
yt_service.debug = True
You can read about this in the documentation here.
As the title suggests, I'm trying to pass parameters into my cgi script so that when you type in (for example): www.helloworld.com/cgi-bin/world.py?post=101, the script will display that post
I've tried the following:
link = 'test' % postNumber
link = cgi.FieldStorage()
id = link.getvalue('post')
print id
but the value of id is nothing. It's like it's not reading the link properly or something.
Please help!
How about:
id = link["post"].value
print id
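If you want to check what the script is actually receiving, you can also parse the query string yourself. A CGI script gets the part after the ? in the QUERY_STRING environment variable. This is a sketch written for Python 3 (parse_qs lives in urllib.parse there); the function name get_post_id is made up for this example.

```python
import os
from urllib.parse import parse_qs


def get_post_id(environ=None):
    """Return the 'post' query parameter as a string, or None if it is absent.

    environ defaults to os.environ, where a CGI server puts QUERY_STRING.
    """
    if environ is None:
        environ = os.environ
    params = parse_qs(environ.get('QUERY_STRING', ''))
    values = params.get('post')
    return values[0] if values else None
```

For the URL in the question, QUERY_STRING would be post=101. If get_post_id() returns None, the parameter is not reaching the script at all.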