Problem
Hi! Most answers refer to using pytube when using Python. But the problem is that pytube doesn't work for many videos on youtube now. It's outdated, and I always get errors. I also want to be able to get other free videos from other sites that are not on youtube.
And I know there are free sites and paid programs that let you put in a url, and it'll download it for you. But I want to understand the process of what's happening.
The following code works for easy things. Obviously it can be improved, but let's keep it super simple...
import requests
good_url = 'https://www.w3schools.com/tags/movie.mp4'
bad_url = 'https://r2---sn-vgqsknes.googlevideo.com/videoplayback?expire=1585432044&ei=jHF_XoXwBI7PDv_xtsgN&ip=12.345.678.99&id=743bcee1c959e9cd&itag=244&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278%2C298%2C299%2C302%2C303&source=youtube&requiressl=yes&mh=1w&mm=31%2C26&mn=sn-vgqsknes%2Csn-ab5szn7z&ms=au%2Conr&mv=m&mvi=4&pl=23&pcm2=yes&initcwndbps=3728750&vprv=1&mime=video%2Fwebm&gir=yes&clen=22135843&dur=283.520&lmt=1584701992110857&mt=1585410393&fvip=5&keepalive=yes&beids=9466588&c=WEB&txp=5511222&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cpcm2%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=ADKhkGMwRgIhAI3WtBFTf4kklX4xl859U8yzqavSzu-2OEn8tvHPoqAWAiEAlSDPhPdb5y4xPxPoXJFCNKr-h2c4jxKU8sAaaxxa7ok%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=ABSNjpQwRQIhAJkFK4xhfLraysF13jSZpHCoklyhJrwLjNSCQ1v7IzeXAiBLpVpYf72Gp-dlvwTM2tYzMcVl4Axzm2ARd7fN1gPW-g%3D%3D&alr=yes&cpn=EvFJNwgO-zNQOWkz&cver=2.20200327.05.01&ir=1&rr=12&fexp=9466588&range=15036316-15227116&rn=14&rbuf=0'
r = requests.get(good_url, stream=True)
# Save the response body to a local file.
with open('my_video.mp4', 'wb') as file:
    file.write(r.content)
This works. But when I want a youtube video (and I obviously can't use a regular youtube url because the document request is different from the video request)...
Steps taken
I'll check the network tab in the dev tools, and it's all filled with a bunch of xhr requests. The headers for them always have the very long url for the request, accept-ranges: bytes, and content-type: video/webm, or something similar for mp4, etc.
Then I'll copy the url for that xhr, change the file extension to the correct one, and run the program.
Result
Sometimes that downloads a small chunk of the video with no sound (few seconds long), and other times it will download a bigger chunk but with no image. But I want the entire video with sound.
Question
Can someone please help me understand how to do this, and explain what's happening, whether it's on another site or youtube??
Why does good_url work, but not bad_url??? I figured it might be a timeout thing, so I got that xhr, and immediately tested it from python, but still no luck.
A related question (don't worry about answering this one, unless required)...
Sometimes youtube has Blob urls in the html too, example: <video src='blob:https://www.youtube.com/f4484c06-48ed-4531-a6ee-6a3ae0291d26'>...
I've read various answers for what blobs are, and I'm not understanding it, because it looks to me like a blob url is doing an xhr to change a url in the DOM, as if it was trying to do the equivalent of an internal redirect on a webserver for a private file that would be served based on view/object level permissions. Is that what's happening? Cause I don't see the point, especially when these videos are free? The way I've done that, such as with lazy loading, is to have a data-src attribute with the correct value, and an onload event handler runs a function switching the data-src value to the src value.
You can try this video:
https://www.youtube.com/watch?v=GAvr5_EtOnI
Remove &range=3478828-4655264 from its bad_url, so that the request is no longer limited to that one byte range of the stream, then try this code:
import requests
good_url = 'https://www.w3schools.com/tags/movie.mp4'
bad_url = 'https://r1---sn-gvcp5mp5u5-q5js.googlevideo.com/videoplayback?expire=1631119417&ei=2ZM4Yc33G8S8owPa1YiYDw&ip=42.0.7.242&id=o-AKG-sNstgjok92lJp_o4pF_iJ2MWD4skzEvFcTLl8LX8&itag=396&aitags=133,134,135,136,137,160,242,243,244,247,248,278,394,395,396,397,398,399&source=youtube&requiressl=yes&mh=gB&mm=31,29&mn=sn-gvcp5mp5u5-q5js,sn-i3belney&ms=au,rdu&mv=m&mvi=1&pl=24&initcwndbps=82500&vprv=1&mime=video/mp4&ns=mh_mFY1G7qq0apTltxepCQ8G&gir=yes&clen=7874090&dur=213.680&lmt=1600716258020131&mt=1631097418&fvip=1&keepalive=yes&fexp=24001373,24007246&beids=9466586&c=WEB&txp=5531432&n=Q3AfqZKoEoXUzw&sparams=expire,ei,ip,id,aitags,source,requiressl,vprv,mime,ns,gir,clen,dur,lmt&lsparams=mh,mm,mn,ms,mv,mvi,pl,initcwndbps&lsig=AG3C_xAwRQIgYZMQz5Tc2kucxFsorprl-3e4mCxJ3lpX1pbX-HnFloACIQD-CuHGtUeWstPodprweaA4sUp8ZikyxySZp1m3zlItKg==&alr=yes&sig=AOq0QJ8wRQIhAJZg4q9vLal64LO6KAyWkpY1T8OTlJRd9wNXrgDpNOuQAiB77lqm4Ka9uz2CAgrPWMSu6ApTf5Zqaoy5emABYqCB_g==&cpn=5E-Sqvee9UG2ZaNQ&cver=2.20210907.00.00&rn=19&rbuf=90320'
# Request bad_url (with the range parameter removed), not good_url.
r = requests.get(bad_url, stream=True)
with open('my_video.mp4', 'wb') as file:
    file.write(r.content)
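The "remove &range=..." step can also be done in code rather than by hand. What follows is a minimal sketch, not a definitive method: strip_query_param is a hypothetical helper name, and the example URL is a shortened, made-up stand-in for a real googlevideo.com URL. It uses only the standard library's urllib.parse.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url, name="range"):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # urlencode re-encodes the remaining parameters (e.g. "," becomes "%2C").
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical, shortened URL used only for illustration.
example = "https://example.com/videoplayback?itag=396&range=3478828-4655264&rn=19"
print(strip_query_param(example))
```

The result can then be passed to requests.get in place of the hand-edited bad_url. Note that a URL whose mime parameter says video/... most likely carries only the picture, with the audio served as a separate stream, which would explain a download with no sound.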
I am new to Python and am trying to create a program that will read in changing information from a webpage. I'm not sure if what I want to do is simple or even possible, but in my head it seems doable and relatively simple. Specifically, I am interested in pulling in the song names from Pandora as they change. I have tried looking into just reading in information from a webpage using something like
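import re
import urllib.request
# Python 3: urlopen lives in urllib.request, and read() returns bytes, so decode to text.
page = urllib.request.urlopen("http://google.com").read().decode("utf-8", errors="replace")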
re.findall("Shopping", page)
['Shopping']
page.find("Shopping")
However, this isn't really what I want, because it only reads information that doesn't change. Any advice or a link to helpful information about reading in changing info from a webpage would be greatly appreciated.
The only way this is possible (without some type of advanced algorithm) is if there are some elements of the page that do NOT change, which you can specify your program to look for. Otherwise, I believe you will need some sort of advanced logic. After all, computers can only do what we instruct them to do. Sorry :)
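As a rough illustration of relying on the parts of the page that do not change, the sketch below pulls out whatever text sits between two fixed markers. The markers, the snapshots, and the helper name extract_between are all hypothetical; a real page will use different markup, and this only helps when the value is actually present in the HTML the server returns.

```python
import re


def extract_between(page, before, after):
    """Return the text found between two fixed markers, or None if absent."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    match = re.search(pattern, page, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical page snapshots: only the song title changes between them.
snapshot_1 = '<div class="now-playing"><span class="title">Song A</span></div>'
snapshot_2 = '<div class="now-playing"><span class="title">Song B</span></div>'

for snapshot in (snapshot_1, snapshot_2):
    print(extract_between(snapshot, '<span class="title">', "</span>"))
```

Fetching the page again every few seconds and comparing the extracted value with the previous one is then enough to notice a change.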
I am working with Python 3.4.4.
I tried to use a Merriam-Webster API, and here is an example link:
http://www.dictionaryapi.com/api/v1/references/collegiate/xml/purple?key=bf534d02-bf4e-49bc-b43f-37f68a0bf4fd
There is a file name under the tag, which you will see after you open the URL.
I am wondering how I can retrieve that wav file,
because it is just a string to me.
Thank you very much!
Okay, I just sorted it out.
Usually you need to look at the instructions for the API. I looked it up on the official website, and it tells you how to retrieve the file. In this case you request another URL, and voilà.
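A minimal sketch of that idea follows, under loudly stated assumptions: that the file name sits in an element named wav, and that the audio lives under the base URL shown, in a folder named after the file's first letter. Neither is confirmed here, so check both against the API's own instructions. The sample XML and the helper name wav_urls are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed base URL for the audio files; confirm it in the API documentation.
AUDIO_BASE = "http://media.merriam-webster.com/soundc11"


def wav_urls(xml_text, base=AUDIO_BASE):
    """Build a download URL for every file name found in a <wav> element."""
    root = ET.fromstring(xml_text)
    urls = []
    for node in root.iter("wav"):  # assumed element name
        name = (node.text or "").strip()
        if name:
            # Assumed layout: <base>/<first letter of file name>/<file name>
            urls.append("{}/{}/{}".format(base, name[0], name))
    return urls


# Made-up sample shaped like a dictionary response.
sample = (
    "<entry_list><entry id='purple'>"
    "<sound><wav>purple01.wav</wav></sound>"
    "</entry></entry_list>"
)
print(wav_urls(sample))
```

Each resulting URL can then be downloaded like any other file and written to disk in binary mode.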
I'm learning a practice called 'web scraping' using python. From what I can tell so far the idea is to send out a request to load the site data from a server, store the DOM html in a variable, and then basically data mine the s*** out of the resulting string until you are able to quickly access exactly and only the information you need.
Well, I'm ready to start fiddling with statements that might help me do the actual data mining, but first I need to see and understand all of the html in my string. After I've got the hang of it I won't care what the html looks like, but right now I need to be able to reference it to properly analyze my output. So far I've tried google, python.net, youtube, various blogs, etc., but they all look like alien-ese.
I'm just looking for the typical stuff you know?
<html><head><meta><script src=""><style src=""><title></title></head><body><div class=""><img src=""></div><div><h1>my page</h1><li></li><li></li><li></li><li></li><li></li><li></li><p>click here</p></div></body></html>
You get what I'm saying? Just a website... that uses like... html... to render some simple structured data.
P.S. This is kind of neat. I went to give this post some tags and I discovered 'simple-html-dom'. So I googled it. Apparently it's some kind of language that lets you parse html from online sources in exactly the way I am trying to. I may check that out later, but I still want to figure out how to do this with python.
EDIT Actually something like this would work fine but it's just so big. I would prefer something smaller to work with.
While it would probably be nice to build your own web pages to use, you can also try looking for pages "optimized for lynx". Lynx is a text-only browser with which "simple" pages naturally work best.
Most of the links you'll find will be dead already, but I found this list for instance, which still has many alive and equally simple pages: http://www.put.com/dead.html (please ignore the content itself... there is no particular reason I chose this example other than that it probably works nicely for your purposes!)
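For practice without any website at all, a small HTML string like the one in the question can be parsed directly. This is only a sketch using the standard library's html.parser; the class name TextCollector and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect (enclosing tag, text) pairs for every non-empty piece of text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags that are currently open
        self.found = []  # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Drop the closed tag and anything left open inside it.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


html_text = (
    "<html><head><title>demo</title></head><body><div>"
    "<h1>my page</h1><li>one</li><li>two</li><p>click here</p>"
    "</div></body></html>"
)

parser = TextCollector()
parser.feed(html_text)
parser.close()
print(parser.found)
```

Once this makes sense on a hand-written string, the same parser can be fed the text of a downloaded page.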
Good day. How can I attach an image/photo to an answerInlineQuery result of type 'article', the way it works in the #imdb bot?
When I send the image in the 'message_text' parameter, it does not always load.
The image in your example was taken from the link included in the message.
You can set parse_mode=HTML and wrap a non-breaking space in a link to your image, like this:
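A minimal sketch of that idea, assuming the Bot API's InlineQueryResultArticle fields (type, id, title, input_message_content). The helper name article_with_image and the example.com image URL are hypothetical, and whether a preview actually appears depends on the client being able to fetch the link.

```python
import html
import json


def article_with_image(result_id, title, caption, image_url):
    """Build one InlineQueryResultArticle dict whose message links to an image."""
    # &#160; is a non-breaking space: the link text is nearly invisible,
    # but the link itself is still there for a preview to be built from.
    hidden_link = '<a href="{}">&#160;</a>'.format(html.escape(image_url, quote=True))
    return {
        "type": "article",
        "id": result_id,
        "title": title,
        "input_message_content": {
            "message_text": hidden_link + html.escape(caption),
            "parse_mode": "HTML",
        },
    }


results = [
    article_with_image("1", "Example", "Some caption", "https://example.com/poster.jpg")
]
# answerInlineQuery expects `results` as a JSON-serialized array.
print(json.dumps(results, indent=2))
```

The printed JSON is what would be sent as the results parameter of answerInlineQuery.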