I have a lot of external links from users, and I need to scrap somehow the video content and generate thumbnail for it to display on my site.
I know this is a simple task with django-oembed, but the list of providers is limited. I need to support sites without oembed too.
So the main task is to determine if page contains the video or no, and generate thumbnail for it.
Can anybody suggest the best way?
You can use Pymedia to extract a frame from the video and PIL to create the thumbnail like suggested here
Related
I have some videos and I want to create closed-Captions(cc) for theme.
Two ways have come to my mind:
use some libraries like autosub to generate subtitles.(i try it!it's very inaccurate way and full of bad translations!)
upload each video to youtube and let it to generate cc for you.(it is a boring process and you have to upload each video one by one!)
I want an appropriate way to do it offline.like autosub.is it exists any Library or Api to use youtube translation mechanism?
Thanks.
I have a parent file type that is folderish, and I would like to include a thumbnail of the first page of a child pdf in the template. Can someone roughly outline the tools and process you can imagine would achieve this, so that I can investigate further?
Getting out the first page of pdf can be achieved by using ghostscript.
This is an example script which forms an gostscript command and stores the images. I took this from collective.pdfpeek. Which by the way could solve your problem right away :-)
Until few days ago I would have recommended you not to use it, since it was a little bit buggy, but they recently shipped a new version, so give it a try! I'm not sure whether they now support DX or not.
So the workflow for you should be.
Uploading a PDF
Subscribe modified/creation events.
create image of first page using ghostscript (check my command, or collective.pdfpeek)
store it as blob (NamedBlobImage) on your uploaded pdf.
Also implement some queueing like collective.pdfpeek to not block all your threads with ghostscript commands.
OR
Give collective.pdfpeek a shot!
BTW:
imho on a large scale the preview generation for pdfs needs to be implemented as a service, which stores/manages the images for you.
I am trying to see if it is possible to make a thumbnail pic from an online video (videos run in flash) programmatically. Just to elaborate on what I am exactly looking to do:
I have a link to to a video that plays in a browser using adobe flash. I would like to take a screenshot of the video (at any frame would do for now) using any programming language (prefer python). Also I have noticed that when one uploads a video on youtube, it automatically generates a thumbnail. Any pointers as to if this is a viable option and how to go about it?
Thanks
Similar to Reddit's r/pic sub-reddit, I want to aggregate media from various sources. Some sites use OEmbed specs to expose media on the page but not all sites do it. I was browsing through Reddit's source because essentially they 'scrape' links that users submit, retrieve images, videos etc. They create thumbnails which are then displayed along the link on their site. Now, I would like to do something similar and I looked at their code[1] and it seems that they have custom scrapers for each domain that they recognize and then they have a generic Scraper class that uses simple logic to get images from any domain (basically they retrieve the web-page, parse the html and then determine the largest image on the page which they then use to generate a thumbnail).
Since it's open source I can probably reuse the code for my application but unfortunately I have chosen Perl as this is a hobby project and I'm trying to learn Perl. Is there a Perl module which has similar functionality? If not, is there a Perl module that is similar to Python Imaging Library? It would be handy to determine the image sizes without actually downloading the whole image & thumbnail generation.
Thanks!
[1] https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py
Image::Size is the specialised module for determining image sizes from various format. It should be enough to read the first 1000 octets or so from a resource, enough for the diverse image headers, into a buffer and operating on that. I have not tested this.
I do not know any general scraping module that has an API for HTTP range requests in order to avoid downloading the whole image resource, but it is easy to subclass WWW::Mechanize.
Try PerlMagick, installation instruction is also listed there.
I'm writing a CLI for a music-media-platform. One of the features is going to be that you can directly play YouTube videos from the CLI. I don't really have an idea of how to do it, but this one sounded the most reasonable:
I'm going to use of those sites where you can download music from YouTube, for example, http://keepvid.com/ and then I directly stream and play this, but I have one problem. Is there any Python library capable of doing this and if so, do you have any concrete examples?
I've been looking, but I found nothing, even not with GStreamer.
You need two things to be able to download a YouTube video, the video id, which is represented by the v= section of the URL, and a hidden field t= which is present in the page source. I have no idea what this t value is, but it's what you need :)
You can then download the video using a URL in the format;
http://www.youtube.com/get_video?video_id=*******&t=*******
Where the stars represent the values obtained.
I'm guessing you can ask for the video id from user input, as it's straightforward to obtain. Your program would then download the HTML source for that video, parse the source for the t value, then download the video using the newly constructed URL.
For example, if you open this link in your browser, it should download the video, or you can use a downloading program such as Wget;
http://www.youtube.com/get_video?video_id=3HrSN7176XI&t=vjVQa1PpcFNM4c8MbEhsnGaNvYKoYERIJ-hK7ErLpUI=
It appears that KeepVid is simply a JavaScript bookmarklet that links you to a KeepVid download page where you can then download the YouTube video in any one of a variety of formats. So, unless you want to figure out how to stream the file that it links you to, it's not easily doable. You'd have to scrape the page returned and figure out which URL you wanted to download, and then you'd have to stream from that URL (and some of the formats may or may not be streamable anyway).
And as an aside, even though they don't have a terms of service specified, I'd imagine that since they appear to be mostly advertisement-supported that abusing their functionality by going around their advertisement-supported webpage would be ethically questionable.