I've been attempting to download a dataset I found on PapersWithCode, and when I run the download program I get the following error message:
usage: download_bios.py [-h] [-o OUT] [-r RETRIES] [-p N] wetpaths
download_bios.py: error: the following arguments are required: wetpaths
and I am not sure how to fix it.
I reached out to a couple of coder friends and searched the internet, but nobody seemed to know what "wetpaths" were, so I thought I would ask here.
The wetpaths argument of the download_bios.py script refers to WET files, a file type used by CommonCrawl. The source code says that it expects a "common_crawl date like 2017-43 or a path to a -wet.paths file".
So you should pass a valid crawl date as the argument (e.g. 2022-49, which at the time of writing is the latest crawl, covering Nov/Dec 2022).
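To illustrate what "a date or a path" means, here is a minimal sketch of how such an argument could be resolved. It assumes the current Common Crawl URL layout (https://data.commoncrawl.org/crawl-data/CC-MAIN-<date>/wet.paths.gz); the helper names resolve_wetpaths and read_wet_paths are hypothetical and are not taken from download_bios.py, which may build its URLs differently.

```python
import gzip
import re
import urllib.request
from typing import List

# Assumed Common Crawl layout; download_bios.py may use a different base URL.
CC_BASE = "https://data.commoncrawl.org/crawl-data"


def resolve_wetpaths(arg: str) -> str:
    """Hypothetical helper: map a crawl date such as '2017-43' to the URL of
    that crawl's wet.paths.gz listing; anything else is treated as a path."""
    if re.fullmatch(r"\d{4}-\d{2}", arg):
        return f"{CC_BASE}/CC-MAIN-{arg}/wet.paths.gz"
    return arg


def read_wet_paths(url: str) -> List[str]:
    """Download a gzipped wet.paths listing and return one WET file path per line."""
    with urllib.request.urlopen(url) as resp:
        text = gzip.decompress(resp.read()).decode("utf-8")
    return [line for line in text.splitlines() if line]
```

On the command line this simply means running something like python download_bios.py 2017-43, or passing the path to a downloaded -wet.paths file instead of the date. read_wet_paths needs network access and is only there to show what the listing contains.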
To understand where the WET format comes from and why it's used, some background information is required.
Web crawls (e.g. those done by CommonCrawl) were originally stored in the ARC (ARChive) format developed by the Internet Archive. The Web ARChive (WARC) format is a revision of ARC that can also store secondary data such as metadata, abbreviated duplicate-detection events, and later-date transformations. Since 2013, CommonCrawl has used the WARC format, which allows for more efficient storage and processing of the archives. The full WARC specification is standardized as ISO 28500.
One can think of WARC files as providing the raw data from the crawl process by CommonCrawl. Two additional formats are offered, namely WET and WAT:
The WAT file format contains the metadata about the records stored in the WARC format.
The WET file format contains the extracted plain text from the records stored in the WARC format.
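To make the WET description concrete, here is a minimal, stdlib-only sketch of pulling the plain text out of a WET file. It assumes well-formed, already-decompressed records (a WARC/ version line, header lines, a blank line, then exactly Content-Length bytes of body) and that the extracted-text records carry WARC-Type: conversion. The function names are my own; for real work a dedicated library such as warcio is more robust.

```python
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) for each record in an uncompressed WARC/WET stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            raw = stream.readline()
            if raw.strip() == b"":  # blank line (or EOF) ends the header block
                break
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        body = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, body


def iter_plain_text(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (url, text) for the extracted-text ('conversion') records."""
    for headers, body in iter_records(stream):
        if headers.get("WARC-Type") == "conversion":
            url = headers.get("WARC-Target-URI", "")
            yield url, body.decode("utf-8", errors="replace")
```

Because the WET files are distributed gzipped, you would open one with gzip.open(path, "rb") and pass the resulting file object to iter_plain_text.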
I am trying to save livestreams using the youtube-dl API in Python with the following code. Since it's a continuous live stream, there is no end to the video, so I am using hls-use-mpegts as a way to periodically read the video for processing; that flag makes the .mp4.part files playable.
The --hls-use-mpegts option works well on the command line:
youtube-dl -f worst <some URL> --retries infinite --continue --hls-use-mpegts
However, it doesn't seem to work with this code. I don't see any errors, but the file isn't being saved in MPEG-TS format. Do I have the options set correctly?
import youtube_dl

ydl_opts = {
    'format': 'worst',
    'retries': 99,
    'continue': True,
    'hls-use-mpegts': True
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])
It's because the docs are, sorry to say, good and bad at the same time.
I found that for the switches/CLI options you want to use from Python, you generally have to replace - (dash) with _ (underscore).
Solution
In your case, hls_use_mpegts is the solution.
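Putting that together, a sketch of the corrected options. The helper cli_to_opt_key is hypothetical (it is not part of youtube-dl) and only applies the dash-to-underscore rule of thumb, so it is worth double-checking each key against youtube_dl/__init__.py, because some options are renamed rather than just re-spelled. As far as I can tell from that file, --continue maps to the 'continuedl' key, so the question's 'continue': True probably has no effect either.

```python
def cli_to_opt_key(switch: str) -> str:
    """Hypothetical helper: '--hls-use-mpegts' -> 'hls_use_mpegts'.

    Only applies the dash-to-underscore rule of thumb; some options are
    renamed instead (e.g. --continue), so check youtube_dl/__init__.py.
    """
    return switch.lstrip("-").replace("-", "_")


ydl_opts = {
    "format": "worst",       # -f worst
    "retries": 99,           # --retries (99, as in the question)
    "continuedl": True,      # --continue (the key is not 'continue')
    "hls_use_mpegts": True,  # --hls-use-mpegts, dashes replaced by underscores
}
```

The resulting dict is then passed to youtube_dl.YoutubeDL(ydl_opts) exactly as in the question.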
Why?
You can see this in the source here: https://github.com/ytdl-org/youtube-dl/blob/5208ae92fc3e2916cdccae45c6b9a516be3d5796/youtube_dl/downloader/common.py#L50
and
here: https://github.com/ytdl-org/youtube-dl/blob/5208ae92fc3e2916cdccae45c6b9a516be3d5796/youtube_dl/__init__.py#L428
Or do what I usually do for these inconveniences and just search the repository: https://github.com/ytdl-org/youtube-dl/search?q=hls_use_mpegts%3A (fortunately GitHub does a really good job at this, and you don't need to download the source code to search it).
Otherwise, youtube-dl is fun to use; thanks to its developers!
Is there any way I can extract the localised name from a TTF/OTF font file?
A solution in Python would be preferred, but I am fine with any language.
Thank you very much.
Go to this page
http://wotsit.org/list.asp?al=T
Note that there are several specifications for the TTF file format; pick the one that is relevant for you. You will then have to devise a method (e.g. in C with a struct) to read and extract what you need.
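In Python, the struct module plays the role of the C struct mentioned above. Here is a minimal sketch that reads names directly from the font's 'name' table. It assumes a single font file (not a .ttc collection), decodes only the Unicode/Windows platforms (UTF-16BE) and Mac Roman, and ignores the language-tag records of name-table format 1. The function name read_names is my own; for production use, a library such as fontTools is more thorough. nameID 1 is the family name and 4 the full name; Windows language IDs include 0x0409 (US English) and 0x0412 (Korean).

```python
import struct
from typing import Dict, Tuple


def read_names(data: bytes, name_id: int = 4) -> Dict[Tuple[int, int], str]:
    """Return {(platformID, languageID): text} for one nameID of a TTF/OTF font."""
    # Offset table: sfntVersion (uint32), then numTables (uint16); big-endian.
    (num_tables,) = struct.unpack_from(">H", data, 4)
    # Table records follow at byte 12: tag, checksum, offset, length (16 bytes each).
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            break
    else:
        raise ValueError("font has no 'name' table")

    # 'name' header: format, record count, offset to the string storage.
    _fmt, count, string_offset = struct.unpack_from(">3H", data, offset)
    storage = offset + string_offset
    names = {}
    for i in range(count):
        # Each 12-byte name record: platform, encoding, language, nameID, length, offset.
        pid, eid, lid, nid, length, str_off = struct.unpack_from(
            ">6H", data, offset + 6 + 12 * i)
        if nid != name_id:
            continue
        raw = data[storage + str_off: storage + str_off + length]
        if pid in (0, 3):  # Unicode and Windows platforms store UTF-16BE
            names[(pid, lid)] = raw.decode("utf-16-be", errors="replace")
        elif pid == 1 and eid == 0:  # Macintosh platform, Roman encoding
            names[(pid, lid)] = raw.decode("mac_roman", errors="replace")
        # other legacy Mac encodings are skipped in this sketch
    return names
```

Usage would be along the lines of reading the file with open(path, "rb") and passing its bytes to read_names; each localised full name then shows up under its own (platform, language) key.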
PyPI, the Python Package Index, is a good place to look for Python tools. I found a package called TTFQuery, which from the description sounds like it would do what you want.
It looks like font files may have multiple localised names.
Example with the fontconfig tools:
$ fc-query -f '%{fullname} (%{fullnamelang}): %{file}\n' /usr/share/fonts/truetype/unfonts-core/UnBatang.ttf
Un Batang,은 바탕 (en,ko): /usr/share/fonts/truetype/unfonts-core/UnBatang.ttf
I can select the Korean (ko) name using its position in fullnamelang:
$ fc-query -f '%{fullname[1]}\n' /usr/share/fonts/truetype/unfonts-core/UnBatang.ttf
은 바탕
Is there a way to use the win32clipboard module in Python to store a reference to a file in the Windows clipboard? My goal is to paste an image in a way that preserves transparency. If I drag and drop a PNG file into OneNote, or copy the file and then paste it into OneNote, transparency seems to be preserved. As far as I can tell, the clipboard can't store transparent images, which is why it has to be a reference to a file.
My research suggests that it might involve the win32clipboard.CF_HDROP constant, but I'm not sure.
So, just to summarize, my goal is to have a bit of python code which I can click and which uses a specific file on my Desktop named 'img.png' for instance. The result is that 'img.png' gets stored in the clipboard and can be pasted into other programs. Essentially, the same behavior as if I selected the file on the Desktop myself, right-clicked and selected 'Copy'.
EDIT:
This page seems to suggest there is a way using win32clipboard.CF_HDrop somehow:
http://timgolden.me.uk/pywin32-docs/win32clipboard__GetClipboardData_meth.html
It says "CF_HDROP" is associated with "a tuple of Unicode filenames"
from PythonMagick import Image
Image("img.png").write("clipboard:")
Grab the Windows binaries for PythonMagick.
I write this as an answer, although it's just a step that might help you, because comments don't have a lot of formatting options.
I wrote this sample script:
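import win32api
import win32clipboard as clp

clp.OpenClipboard(None)
try:
    # Walk every format currently on the clipboard (0 starts the enumeration)
    rc = clp.EnumClipboardFormats(0)
    while rc:
        try:
            format_name = clp.GetClipboardFormatName(rc)
        except win32api.error:
            # Predefined formats have no registered name
            format_name = "?"
        print("format", rc, format_name)
        rc = clp.EnumClipboardFormats(rc)
finally:
    clp.CloseClipboard()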
Then I selected an image file in Explorer and copied it; the script then reported the following available clipboard formats:
format 49161 DataObject
format 49268 Shell IDList Array
format 15 ?
format 49519 DataObjectAttributes
format 49292 Preferred DropEffect
format 49329 Shell Object Offsets
format 49158 FileName
format 49159 FileNameW
format 49171 Ole Private Data
This “Preferred DropEffect” seems suspicious, although I'm far from a Windows expert. I would try FileNameW first, though, since it might do the job for you (I don't have OneNote installed, sorry). It seems to expect as data only the full pathname, encoded as 'utf-16-le' and terminated with a null character (i.e. encoded as '\0\0').
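For the CF_HDROP route the question mentions: format 15 in the listing above is the standard CF_HDROP format, whose data is a DROPFILES structure followed by a NUL-separated, double-NUL-terminated file list. The sketch below builds that structure by hand with struct and hands it to pywin32. The packing follows the documented DROPFILES layout; that SetClipboardData accepts a bytes object for CF_HDROP and copies it into global memory is my understanding of pywin32 rather than something verified here, and I haven't tried pasting the result into OneNote. The function names are hypothetical.

```python
import struct


def build_dropfiles(paths):
    """Pack file paths into the DROPFILES layout that CF_HDROP expects.

    DROPFILES is pFiles (DWORD, offset of the file list), pt (two LONGs),
    fNC (BOOL) and fWide (BOOL); fWide=1 means the list is UTF-16-LE.
    The list is NUL-separated and ends with an extra NUL.
    """
    header = struct.pack("<IiiII", 20, 0, 0, 0, 1)  # 20 = size of this header
    files = ("\0".join(paths) + "\0\0").encode("utf-16-le")
    return header + files


def copy_files_to_clipboard(paths):
    """Put the files on the clipboard like Explorer's Copy (Windows only)."""
    import win32clipboard  # pywin32; imported here so the module loads elsewhere
    import win32con

    data = build_dropfiles(paths)
    win32clipboard.OpenClipboard()
    try:
        win32clipboard.EmptyClipboard()
        win32clipboard.SetClipboardData(win32con.CF_HDROP, data)
    finally:
        win32clipboard.CloseClipboard()
```

You would call copy_files_to_clipboard with a list containing the full path of img.png on your Desktop. Explorer also places other formats (such as "Preferred DropEffect") on the clipboard, so some applications may want more than CF_HDROP alone.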
I'm looking for a resource in Python or Bash that will make it easy to take, for example, mp3 file X and m4a file Y and say "copy X's tags to Y".
Python's "mutagen" module is great for manipulating tags in general, but it has no abstract concept of an "artist field" that spans different tag types; I want a library that handles all the fiddly bits and knows fieldname equivalences. For things not all tag systems can express, I'm okay with information being lost or best-guessed.
(Use case: I encode lossless files to mp3, then go use the mp3s for listening. Every month or so, I want to be able to update the 'master' lossless files with whatever tag changes I've made to the mp3s. I'm tired of stubbing my toes on implementation differences among formats.)
I needed this exact thing, and I, too, quickly realized that mutagen does not provide a high enough level of abstraction for this kind of thing. Fortunately, the authors of mutagen needed the same thing for their media player, QuodLibet.
I had to dig through the QuodLibet source to find out how to use it, but once I understood it, I wrote a utility called sequitur which is intended to be a command line equivalent to ExFalso (QuodLibet's tagging component). It uses this abstraction mechanism and provides some added abstraction and functionality.
If you want to check out the source, here's a link to the latest tarball. The package is actually a set of three command line scripts and a module for interfacing with QL. If you want to install the whole thing, you can use:
easy_install QLCLI
One thing to keep in mind about exfalso/quodlibet (and consequently sequitur) is that they implement audio metadata properly, which means that all tags support multiple values (unless the file type prohibits it, and few do). So, doing something like:
print qllib.AudioFile('foo.mp3')['artist']
will not output a single string, but a list of strings like:
[u'The First Artist', u'The Second Artist']
The way you might use it to copy tags would be something like:
import os.path
import qllib  # this is the module that comes with QLCLI

def update_tags(mp3_fn, flac_fn):
    mp3 = qllib.AudioFile(mp3_fn)
    flac = qllib.AudioFile(flac_fn)
    # you can iterate over the tag names;
    # they will be the same for all file types
    for tag_name in mp3:
        flac[tag_name] = mp3[tag_name]
    flac.write()

mp3_filenames = ['foo.mp3', 'bar.mp3', 'baz.mp3']
for mp3_fn in mp3_filenames:
    flac_fn = os.path.splitext(mp3_fn)[0] + '.flac'
    # only sync when the mp3 was modified more recently than the flac
    if os.path.getmtime(mp3_fn) > os.path.getmtime(flac_fn):
        update_tags(mp3_fn, flac_fn)
I have a bash script that does exactly that, atwat-tagger. It supports flac, mp3, ogg and mp4 files.
usage: `atwat-tagger.sh inputfile.mp3 outputfile.ogg`
I know your project is already finished, but somebody who finds this page through a search engine might find it useful.
Here's some example code, a script that I wrote to copy tags between files using Quod Libet's music format classes (not mutagen's!). To run it, just do copytags.py src1 dest1 src2 dest2 src3 dest3, and it will copy the tags in src1 to dest1 (after deleting any existing tags on dest1!), and so on. Note the blacklist, which you should tweak to your own preference. The blacklist will not only prevent certain tags from being copied, it will also prevent them from being clobbered in the destination file.
To be clear, Quod Libet's format-agnostic tagging is not a feature of mutagen; it is implemented on top of mutagen. So if you want format-agnostic tagging, you need to use quodlibet.formats.MusicFile to open your files instead of mutagen.File.
Code can now be found here: https://github.com/DarwinAwardWinner/copytags
If you also want to do transcoding at the same time, use this: https://github.com/DarwinAwardWinner/transfercoder
One critical detail for me was that Quod Libet's music format classes expect QL's configuration to be loaded, hence the config.init line in my script. Without that, I get all sorts of errors when loading or saving files.
I have tested this script for copying between flac, ogg, and mp3, with "standard" tags, as well as arbitrary tags. It has worked perfectly so far.
As for why I didn't use QLLib: it didn't work for me. I suspect it was hitting the same config-related errors I was, but silently ignoring them and simply failing to write tags.
You can just write a simple app with a mapping of each tag name in each format to an "abstract tag" type; then it's easy to convert from one to the other. You don't even have to know all available types, just those that you are interested in.
Seems to me like a weekend-project type of time investment, possibly less. Have fun, and I wouldn't mind taking a peek at your implementation and even using it, if you don't mind releasing it of course :-) .
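A minimal sketch of that mapping idea, working on plain dicts of tag values rather than real files (reading and writing the files, e.g. with mutagen, is left out). The native keys are the usual ID3v2.4 frame IDs, MP4 atom names and Vorbis comment names for a few common fields; the table is deliberately tiny, and the names TAG_MAP and convert_tags are hypothetical. Values are kept as lists so multi-valued tags survive; Vorbis comment names are case-insensitive, which this sketch ignores.

```python
from typing import Dict, List

Tags = Dict[str, List[str]]

# Native key -> abstract field name, per tag system (deliberately incomplete).
TAG_MAP: Dict[str, Dict[str, str]] = {
    "id3": {"TPE1": "artist", "TIT2": "title", "TALB": "album", "TDRC": "date"},
    "mp4": {"\xa9ART": "artist", "\xa9nam": "title", "\xa9alb": "album", "\xa9day": "date"},
    "vorbis": {"ARTIST": "artist", "TITLE": "title", "ALBUM": "album", "DATE": "date"},
}


def to_abstract(fmt: str, native: Tags) -> Tags:
    """Keep only the tags we know how to map, renamed to abstract fields."""
    mapping = TAG_MAP[fmt]
    return {mapping[key]: values for key, values in native.items() if key in mapping}


def from_abstract(fmt: str, abstract: Tags) -> Tags:
    """Rename abstract fields back to the native keys of the target format."""
    reverse = {field: key for key, field in TAG_MAP[fmt].items()}
    return {reverse[field]: values for field, values in abstract.items() if field in reverse}


def convert_tags(src_fmt: str, native: Tags, dst_fmt: str) -> Tags:
    """Translate tags between formats; unmapped tags are dropped (lossy)."""
    return from_abstract(dst_fmt, to_abstract(src_fmt, native))
```

Routing everything through the abstract names means each new format only needs one table row, instead of a separate converter for every pair of formats.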
There's also tagpy, which seems to work well.
Since the other solutions have mostly fallen off the net, here is what I came up with, based on the Python mediafile library (python3-mediafile in Debian GNU/Linux).
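#!/usr/bin/python3
import sys

from mediafile import MediaFile

src = MediaFile(sys.argv[1])
dst = MediaFile(sys.argv[2])

# Copy every field mediafile knows about; skip any field that fails
# to copy instead of aborting the whole merge.
for field in src.fields():
    try:
        setattr(dst, field, getattr(src, field))
    except Exception:
        pass

dst.save()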
Usage: mediafile-mergetags srcfile dstfile
It copies (merges) all tags from srcfile into dstfile, and seems to work properly with flac, opus, mp3 and so on, including copying album art.