I am trying to save livestreams using the youtube-dl API in Python with the following code. Since it's a continuous live stream there is no end to the video, so I am using hls-use-mpegts as a way to periodically read the video for processing; that flag makes .mp4.part files playable.
Although the hls-use-mpegts option works well on the command line, like this:
youtube-dl -f worst <some URL> --retries infinite --continue --hls-use-mpegts
it doesn't seem to work with this code. I don't see any errors, but I also don't see the file being saved in MPEG-TS format. Do I have the options set correctly?
ydl_opts = {
    'format': 'worst',
    'retries': 99,
    'continue': True,
    'hls-use-mpegts': True
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])
It's because (sorry for saying this) the docs are good and bad at the same time.
I found that every switch/CLI option you want to use from Python has to have its dashes (-) replaced with underscores (_).
Solution
In your case, hls_use_mpegts is the solution.
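Applied to your snippet, the options would look like this (a sketch; note that, per the same __init__.py, the --continue flag maps to the continuedl key, so a plain 'continue' is likely ignored as well):

ydl_opts = {
    'format': 'worst',
    'retries': 99,
    'continuedl': True,     # --continue maps to 'continuedl' in __init__.py
    'hls_use_mpegts': True  # underscores, not dashes
}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])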
Why?
Read/explore about that here: https://github.com/ytdl-org/youtube-dl/blob/5208ae92fc3e2916cdccae45c6b9a516be3d5796/youtube_dl/downloader/common.py#L50
and
here: https://github.com/ytdl-org/youtube-dl/blob/5208ae92fc3e2916cdccae45c6b9a516be3d5796/youtube_dl/__init__.py#L428
or just browse the way I usually do for these inconveniences: https://github.com/ytdl-org/youtube-dl/search?q=hls_use_mpegts%3A (fortunately GitHub does a really good job at this, and you don't have to download the source code to search it).
Otherwise it's fun to use youtube-dl, thanks to its authors!
Related
Whilst reading through the youtube-dl docs I saw an option for format that I don't quite understand, and I cannot find the options.py file either.
| format: Video format code. See options.py for more information.
This module is not widely discussed and few posts exist (from what I can find), so for whoever knows about it: is this something you give the YoutubeDL class in the dictionary of options? Like this:
youtube_dl.YoutubeDL({'format':'mp3'})
Format refers to the "video format options". If we look at options.py, you'll see the argument defined there with help='Video format code, see the "FORMAT SELECTION" for all the info'.
So you can read more about it in the FORMAT SELECTION section of the README.
Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT SELECTION" for all the info
-F, --list-formats List all available formats of requested videos
From the manual, you can use it as follows:
youtube-dl -F link_to_video
You will get all the formats available for the video, each with a format code; then you choose your desired format and download the video:
youtube-dl -f format_code link_to_video
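In the Python API, the same flow might look like this (a sketch; the URL and format code 18 are placeholders for illustration):

import youtube_dl

url = 'https://www.youtube.com/watch?v=EXAMPLE'  # hypothetical URL

# Equivalent of `youtube-dl -F <url>`: print the available formats
with youtube_dl.YoutubeDL({'listformats': True}) as ydl:
    ydl.extract_info(url, download=False)

# Equivalent of `youtube-dl -f 18 <url>`: download a specific format code
with youtube_dl.YoutubeDL({'format': '18'}) as ydl:
    ydl.download([url])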
Firstly, I understand that comments aren't valid json. That said, for some reason this .json file I have to process has comments at the start of lines and at the end of lines.
How can I handle this in Python and basically load the .json file but ignore the comments so that I can process it? I am currently doing the following:
import json

with open('/home/sam/Lean/Launcher/bin/Debug/config.json', 'r') as f:
    config_data = json.load(f)
But this crashes at the json.load(f) command because the file f has comments in it.
I thought this would be a common problem, but I can't find much online about how to handle it in Python. Someone suggested commentjson, but that makes my script crash, saying
ImportError: cannot import name 'dump'
when I import commentjson.
Thoughts?
Edit:
Here is a snippet of the JSON file I must process.
{
    // this configuration file works by first loading all top-level
    // configuration items and then will load the specified environment
    // on top, this provides a layering affect. environment names can be
    // anything, and just require definition in this file. There's
    // two predefined environments, 'backtesting' and 'live', feel free
    // to add more!
    "environment": "backtesting", // "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"
    // algorithm class selector
    "algorithm-type-name": "BasicTemplateAlgorithm",
    // Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
    "algorithm-language": "CSharp"
}
Switch to json5. JSON5 is a small superset of JSON that supports comments and a few other features you can simply ignore.
import json5 as json
# and the rest is the same
It is beta, and it is slower, but if you just need to read a short configuration once when starting the program, this can be considered an option. It is better to switch to another standard than to follow none.
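A minimal sketch of loading your file with it (assuming pip install json5; json5 mirrors the standard json module's load/loads API):

import json5

with open('/home/sam/Lean/Launcher/bin/Debug/config.json', 'r') as f:
    config_data = json5.load(f)  # // comments are valid JSON5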
Kind of a hack (if // appears inside the JSON string data then it will fail), but simple enough for most cases:
import json, re
s = """{
// this configuration file works by first loading all top-level
// configuration items and then will load the specified environment
// on top, this provides a layering affect. environment names can be
// anything, and just require definition in this file. There's
// two predefined environments, 'backtesting' and 'live', feel free
// to add more!
"environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"
// algorithm class selector
"algorithm-type-name": "BasicTemplateAlgorithm",
// Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
"algorithm-language": "CSharp"
}
"""
result = json.loads(re.sub("//.*", "", s, flags=re.MULTILINE))
print(result)
gives:
{'environment': 'backtesting', 'algorithm-type-name': 'BasicTemplateAlgorithm', 'algorithm-language': 'CSharp'}
This applies the regular expression to all lines, removing the double slashes and everything that follows them.
A state machine parsing each line would be better, to make sure the // isn't inside quotes, but that's slightly more complex (though doable); see the sketch below.
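For reference, a minimal sketch of that state-machine approach (assuming double-quoted strings with standard backslash escapes, as in the sample above):

import json

def strip_line_comments(text):
    # Remove // comments, but only when they appear outside of strings.
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text[i:i + 2] == '//':
            while i < len(text) and text[i] != '\n':
                i += 1  # skip to end of line; the newline itself is kept
            continue
        else:
            out.append(ch)
        i += 1
    return ''.join(out)

result = json.loads(strip_line_comments(s))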
I haven't used it personally, but you can have a look at the jsoncomment Python package, which supports parsing a JSON file with comments. Use it in place of the standard json parser:
import json
from jsoncomment import JsonComment

parser = JsonComment(json)
parsed_object = parser.loads(jsonString)
You can take out the comments with the following:
import re

data = re.sub(r"//.*?\n", "", data)                     # line comments
data = re.sub(r"/\*.*?\*/", "", data, flags=re.DOTALL)  # block comments (re.DOTALL lets .*? span newlines)
This should remove all comments from the data. It could cause problems if // or /* appear inside your strings, though.
I'm trying to deliver videos through the Slack API using Python's slackclient library.
I often use slack.api_call('chat.postMessage', ...), and I am familiar with 'files.upload', but when I execute
from slackclient import SlackClient

slack = SlackClient(TOKEN)
slack.api_call('files.upload', file=open('video.mp4', 'rb'), ...)
the file is uploaded to the given channel but is not posted as a message.
What I am trying to achieve is to create a message, which I can send as a private message or to a channel, that would look something like this
and maybe add some text above it if possible.
I've explored the Attachment section in the docs, but couldn't find anything related to files.
If there is a way to not supply the file in binary format, but as a link that would also be ok (as long as it is displayed in an embedded fashion).
How about this sample script? It uses io.BytesIO(f.read()) for the file. In order to use this, files:write:user has to be included in the token's scopes. For the text, you can include it using initial_comment. In my environment, attachments could not be used with files.upload. The API documentation is at https://api.slack.com/methods/files.upload.
Script:
import io

from slackclient import SlackClient

slack = SlackClient(TOKEN)

with open('./sample.mp4', 'rb') as f:
    slack.api_call(
        "files.upload",
        channels='#sample',
        filename='sample.mp4',
        title='sampletitle',
        initial_comment='sampletext',
        file=io.BytesIO(f.read())
    )
Result:
If I misunderstand your question, I'm sorry.
I came across this question because I had the same issue: my file would upload and I would get a response, but the file would not be posted to the channel I had specified. It turned out I had read the Slack API documentation poorly. I had used chat.postMessage many times, which takes a single 'channel' argument. Here is that API: https://api.slack.com/methods/chat.postMessage
The files.upload method instead wants a comma-separated list of channels in a 'channels' argument. See https://api.slack.com/methods/files.upload. Once I changed 'channel' to 'channels' and made sure to pass it as a list, I was successfully posting the image to the channel I wanted.
To the original question then: in your link to the code you used (https://ibb.co/hwH5hF), try changing channel='bla' to channels=['bla'].
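In other words, something along these lines (a sketch; 'bla' stands in for your channel name):

slack.api_call(
    'files.upload',
    channels=['bla'],  # plural 'channels', passed as a list
    file=open('video.mp4', 'rb')
)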
This works for me:
import slack

client = slack.WebClient(token='xoxb-XXX')

with open('/path/to/attachment.jpeg', 'rb') as att:
    r = client.api_call("files.upload", files={
        'file': att,
    }, data={
        'channels': '#my_channel',
        'filename': 'downloaded_filename.jpeg',
        'title': 'Attachment\'s title',
        'initial_comment': 'Attachment\'s description',
    })

assert r.status_code == 200
Summary:
I have an issue where sometimes the google-drive-sdk for Python does not detect the end of the document being exported. It seems to think that the Google document is of infinite size.
Background, source code and tutorials I followed:
I am working on my own Python-based Google Drive backup script (one with a nice CLI interface for browsing around). git link for source code
It's still in the making and currently only finds new files and downloads them (with the 'pull' command).
To implement the most important Google Drive commands, I followed the official Google Drive API tutorial for downloading media, here.
What works:
When a document or file is a non-Google-Docs document, the file is downloaded properly. However, when I try to "export" a file, I see that I need to use a different mimeType. I have a dictionary for this.
For example: I map application/vnd.google-apps.document to application/vnd.openxmlformats-officedocument.wordprocessingml.document when exporting a document.
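For illustration, such a map might look like this (a sketch; the dict name is hypothetical and only a few common Google-to-Office export pairs are shown):

# Maps Google Docs mimeTypes to the Office mimeTypes used for export
EXPORT_MIMETYPES = {
    'application/vnd.google-apps.document':
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/vnd.google-apps.spreadsheet':
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.google-apps.presentation':
        'application/vnd.openxmlformats-officedocument.presentationml.presentation',
}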
When downloading Google documents from Google Drive, this seems to work fine. By this I mean: my while loop with the code status, done = downloader.next_chunk() will eventually set done to True and the download completes.
What does not work:
However, on some files the done flag never becomes True and the script downloads forever. This eventually amounts to several GB. Perhaps I am looking for the wrong flag that says the file is complete when doing an export. I am surprised that Google Drive never throws an error. Does anybody know what could cause this?
Current status
For now I have exporting of google documents disabled in my code.
Scripts like "drive" by rakyll (at least the version I have) just put a link to the online copy. I would really like to do a proper export so that my offline system can maintain a complete backup of everything on Drive.
P.S. It's fine to suggest "you should use this service instead of the API" for the sake of others finding this page. I know there are other services out there for this, but I'm really looking to explore the Drive API functions for integration with my own systems.
OK, I found a pseudo-solution here.
The problem is that the Google API never returns the Content-Length, and the response is delivered in chunks. However, either the chunk returned is wrong or the Python API is not able to process it correctly.
What I did was grab the code for MediaIoBaseDownload from here.
I left everything the same, but changed this part:
if 'content-range' in resp:
    content_range = resp['content-range']
    length = content_range.rsplit('/', 1)[1]
    self._total_size = int(length)
elif 'content-length' in resp:
    self._total_size = int(resp['content-length'])
else:
    # PSEUDO BUG FIX: no content-length, no chunk info; cut the response here.
    self._total_size = self._progress
The else at the end is what I've added. I've also changed the default chunk size by setting DEFAULT_CHUNK_SIZE = 2*1024*1024. You will also have to copy a few imports from that file, including this one: from googleapiclient.http import _retry_request, _should_retry_response.
Of course this is not a real solution; it just says "if I don't understand the response, stop it here". This will probably make some exports incomplete, but at least it doesn't hammer the server. It's only until we can find a proper solution.
UPDATE:
Bug is already reported here: https://github.com/google/google-api-python-client/issues/15
and as of January 2017, the only workaround is to not use MediaIoBaseDownload and do this instead (not suitable for large files):
req = service.files().export(fileId=file_id, mimeType=mimeType)
resp = req.execute(http=http)
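With this approach, resp holds the entire exported file in memory (which is why it's unsuitable for large files). A sketch of saving it, where export.docx is just a placeholder name:

with open('export.docx', 'wb') as out:
    out.write(resp)  # resp is the raw exported bytes returned by execute()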
I'm using this, and it works with the following libraries:
google-auth-oauthlib==0.4.1
google-api-python-client
google-auth-httplib2
This is the snippet I'm using:
import io

from apiclient import errors
from googleapiclient.http import MediaIoBaseDownload
from googleapiclient.discovery import build

def download_google_document_from_drive(self, file_id):
    try:
        request = self.service.files().get_media(fileId=file_id)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print('Download %d%%.' % int(status.progress() * 100))
        return fh
    except Exception as e:
        print('Error downloading file from Google Drive: %s' % e)
You can then consume the returned stream directly, for example reading it with xlrd:
import xlrd
workbook = xlrd.open_workbook(file_contents=fh.getvalue())
I'm writing my second Python script to try to parse the contents of a config file and would like some noob advice. I'm not sure if it's best to use regex to parse it, since it spans multiple lines. I've also been reading about dictionaries and wondered if this would be good practice. I'm not necessarily looking for the code, just a push in the right direction.
Example: My config file looks like this.
Job {
    Name = "host.domain.com-foo"
    Client = host.domain.com-fd
    JobDefs = "DefaultJob"
    FileSet = "local"
    Write Bootstrap = "/etc/foo/host.domain.com-foo.bsr"
    Pool = storage-disk1
}
Should I use regex, line splitting, or maybe a module? If I had multiple jobs in my config file, would I use a dictionary to correlate a job to a pool?
If you can change the configuration file format, you can directly write your file as a Python file.
config.py
job = {
    'Name': "host.domain.com-foo",
    'Client': "host.domain.com-fd",
    'JobDefs': "DefaultJob",
    'FileSet': "local",
    'Write Bootstrap': "/etc/foo/host.domain.com-foo.bsr",
    'Pool': 'storage-disk1'
}
yourscript.py
from config import job

print(job['Name'])
There are numerous existing alternatives for this task: json, pickle and yaml, to name three. Unless you really want to implement this yourself, you should use one of them. Even if you do roll your own, following the format of one of the above is still a good idea.
Also, it's a much better idea to use a parser/generator or similar tool to do the parsing; regexes are going to be harder to maintain and less efficient for this type of task.
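For example, here is a sketch of the same job definition in YAML (assumes PyYAML, i.e. pip install pyyaml; the key spellings are just one possible choice):

import yaml

config_text = """
job:
  name: host.domain.com-foo
  client: host.domain.com-fd
  job_defs: DefaultJob
  file_set: local
  write_bootstrap: /etc/foo/host.domain.com-foo.bsr
  pool: storage-disk1
"""

config = yaml.safe_load(config_text)
print(config['job']['name'])  # -> host.domain.com-foo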
If your config file can be turned into a python file, just make it a dictionary and import the module.
Job = {
    "Name": "host.domain.com-foo",
    "Client": "host.domain.com-fd",
    "JobDefs": "DefaultJob",
    "FileSet": "local",
    "Write BootStrap": "/etc/foo/host.domain.com-foo.bsr",
    "Pool": "storage-disk1"
}
You can access the options by simply using Job["Name"], etc.
The ConfigParser module is easy to use as well. You can create a text file that looks like this:
[Job]
Name=host.domain.com-foo
Client=host.domain.com-fd
JobDefs=DefaultJob
FileSet=local
Write BootStrap=/etc/foo/host.domain.com-foo.bsr
Pool=storage-disk1
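Reading it back is then short (a sketch; Python 3 spells the module configparser, and job.ini is a placeholder filename):

import configparser

config = configparser.ConfigParser()
config.read('job.ini')

print(config['Job']['Name'])  # -> host.domain.com-foo
print(config['Job']['Pool'])  # -> storage-disk1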
Just keep it simple like one of the above.
The ConfigParser module from the standard library is probably the most Pythonic and straightforward way to parse a configuration file that your Python script is using.
If you are restricted to using the particular format you have outlined, then using pyparsing is pretty good.
I don't think a regex is adequate for parsing something like this. You could look at a true parser, such as pyparsing (a rough sketch follows). Or, if the file format is within your control, you might consider XML; there are standard Python libraries for parsing that.
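A rough pyparsing sketch for the Job { ... } format above (assumes pip install pyparsing; the grammar is guessed from the single sample, so treat it as a starting point):

from pyparsing import (Group, QuotedString, Regex, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

key = Regex(r'[A-Za-z]+(?: [A-Za-z]+)*')             # allows "Write Bootstrap"
value = QuotedString('"') | Word(alphanums + '.-/')  # quoted or bare values
entry = Group(key + Suppress('=') + value)
block = (Word(alphas)('kind') + Suppress('{')
         + ZeroOrMore(entry)('entries') + Suppress('}'))

text = '''
Job {
    Name = "host.domain.com-foo"
    Client = host.domain.com-fd
    Pool = storage-disk1
}
'''

parsed = block.parseString(text)
job = {k: v for k, v in parsed.entries}
print(parsed.kind, job['Name'], job['Pool'])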