I have a sequence of about 1000 CR2 images that I need to convert to 16-bit TIFF. The following command line works:
darktable-cli input_image.CR2 colorcard.xmp output.tiff --core --conf plugins/imageio/format/tiff/bpp=16
But when I execute that command in parallel via the Python code below, I get the following error after one image has been converted:
[init] the database lock file contains a pid that seems to be alive in your system: 31531
[init] database is locked, probably another process is already using it
ERROR: can't acquire database lock, aborting.
Here is my Python code:
#!/usr/bin/env python3
import glob
import shlex
import subprocess
import multiprocessing as mp
from multiprocessing import Pool
def call_proc(cmd):
    subprocess.run(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
app = '/Applications/darktable.app/Contents/MacOS/darktable-cli '
xmp = ' colorcard.xmp '
opt = ' --core --conf plugins/imageio/format/tiff/bpp=16 --conf plugins/imageio/storage/disk/overwrite=true --library /tmp/darktable.db'
raw_images = glob.glob('indata/*')
procs = []
for raw_image in raw_images:
    tif_image = raw_image.replace('.CR2', '.tif').replace('indata', 'outdata')
    cmd = app + raw_image + xmp + tif_image + opt
    procs.append(cmd)
pool = Pool(mp.cpu_count())
pool.map(call_proc, procs)
pool.close()
pool.join()
Platform:
Darktable Version: darktable-cli 3.0.0
OS: macOS Mojave 10.14.3 (18D42)
NVIDIA GeForce GTX 680MX 2048 MB
I found the following thread but had no luck with the given solution.
Any help is highly appreciated.
In the thread, #miguev gave the answer that helped me. It is not pretty, but it works: I create a tmp directory for each image and pass it to the --configdir option, like so:
import os

for raw_image in raw_images:
    os.mkdir('/tmp/' + os.path.basename(raw_image).split('.')[0])

cmds_list = []
for raw_image in raw_images:
    tif_image = raw_image.replace('.CR2', '.tif').replace('indata', 'outdata')
    cmd = (app + raw_image + ' ' + xmp + ' ' + tif_image + opt
           + ' --configdir /tmp/' + os.path.basename(raw_image).split('.')[0])
    cmds_list.append(cmd)
All you need to do at the end, when you are done, is clean up after yourself and remove those temporary directories.
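For example, a minimal cleanup sketch (assuming the same /tmp layout created above):

import shutil

# Remove the per-image config directories once all conversions have finished.
for raw_image in raw_images:
    shutil.rmtree('/tmp/' + os.path.basename(raw_image).split('.')[0], ignore_errors=True)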
Related
My requirement is to kill a process. I have the process name.
Below is my code:
import os

def kill_process(name):
    os.system(f"TASKKILL /F /IM {name}")
It works on Windows but not on Mac. My requirement is that it should work on both operating systems.
Is there a way to make the above code OS-independent, or how can I write it for Mac?
Any help is appreciated.
Thank you.
Regards,
Rushikesh Kadam.
psutil supports a number of platforms (including Windows and Mac).
The following solution should fit the requirement:
import psutil

def kill_process(name):
    for proc in psutil.process_iter():
        if proc.name() == name:
            proc.kill()
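For example (the process names here are hypothetical; use whatever name your target process reports):

kill_process("notepad.exe")  # Windows
kill_process("Safari")       # macOS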
You can try this:
import os, signal

def kill_process(name):
    for line in os.popen("ps ax | grep " + name + " | grep -v grep"):
        fields = line.split()
        pid = fields[0]
        os.kill(int(pid), signal.SIGKILL)
I am learning multiprocessing and have had no issues until I encountered this one when working with queues. Essentially, the queue gets filled up, but then something seems to go wrong and it crashes.
I am running Python 3.6.8 on Windows 10. multiprocessing seemed to work when I was not using queues (I built a similar code snippet to the one below, without queues, to learn).
import glob, multiprocessing, os

def appendFilesThreaded(inputDirectory, outputDirectory, inputFileType=".txt", outputFileName="appended_files.txt"):
    files = glob.glob(inputDirectory+'*'+inputFileType)
    fileQueue = multiprocessing.Queue()
    for file in files:
        fileQueue.put(file)
    threadsToUse = max(1, multiprocessing.cpu_count()-1)
    print("Using " + str(threadsToUse) + " worker threads.")
    processes = []
    for i in range(threadsToUse):
        p = multiprocessing.Process(target=appendFilesWorker, args=(fileQueue, outputDirectory+"temp-" + str(i) + outputFileName))
        processes.append(p)
        p.start()
    for process in processes:
        process.join()
    with open(outputDirectory + outputFileName, 'w') as outputFile:
        for i in range(threadsToUse):
            with open(outputDirectory+"temp-" + str(i) + outputFileName) as fileToAppend:
                outputFile.write(fileToAppend.read())
            os.remove(outputDirectory+"temp-" + str(i) + outputFileName)
    print('Done')

def appendFilesWorker(fileQueue, outputFileNamePath):
    with open(outputFileNamePath, 'w') as outputFile:
        while not fileQueue.empty:
            with open(fileQueue.get()) as fileToAppend:
                outputFile.write(fileToAppend.read())

if __name__ == '__main__':
    appendFilesThreaded(inputDir, outputDir)
I would expect this to successfully append files, but it crashes. It results in BrokenPipeError: [WinError 232] The pipe is being closed
Found the issue: calling queue.empty is incorrect. You need parentheses (e.g. queue.empty())
I'll leave my embarrassing mistake up in case it helps others :)
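Note that even with the parentheses added, empty() can race when several workers drain the queue at once. A sketch of a slightly more defensive worker (same signature as above, giving up once the queue has been idle for a second) could look like this:

import queue  # only needed for the Empty exception

def appendFilesWorker(fileQueue, outputFileNamePath):
    with open(outputFileNamePath, 'w') as outputFile:
        while True:
            try:
                # Block briefly; stop once no more work arrives.
                path = fileQueue.get(timeout=1)
            except queue.Empty:
                break
            with open(path) as fileToAppend:
                outputFile.write(fileToAppend.read())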
I have a Python app with a variable that contains multiple URLs.
At this moment I use something like this:
for v in arr:
    cmd = 'youtube-dl -u ' + email + ' -p ' + password + ' -o "' + v['path'] + '" ' + v['url']
    os.system(cmd)
But this way I download just one video after another. How can I download, let's say, 3 videos at the same time? (They are not from YouTube, so no playlists or channels.)
I don't necessarily need multithreading in Python; I just want to call youtube-dl multiple times, splitting the array. So from a Python perspective it can be a single thread.
Use a Pool:
import multiprocessing.dummy
import subprocess

arr = [
    {'vpath': 'example/%(title)s.%(ext)s', 'url': 'https://www.youtube.com/watch?v=BaW_jenozKc'},
    {'vpath': 'example/%(title)s.%(ext)s', 'url': 'http://vimeo.com/56015672'},
    {'vpath': '%(playlist_title)s/%(title)s-%(id)s.%(ext)s',
     'url': 'https://www.youtube.com/playlist?list=PLLe-WjSmNEm-UnVV8e4qI9xQyI0906hNp'},
]
email = 'my-email@example.com'
password = '123456'
concurrent = 3  # how many downloads to run at the same time

def download(v):
    # Remove 'echo' to actually run youtube-dl instead of just printing the command.
    subprocess.check_call([
        'echo', 'youtube-dl',
        '-u', email, '-p', password,
        '-o', v['vpath'], '--', v['url']])

p = multiprocessing.dummy.Pool(concurrent)
p.map(download, arr)
multiprocessing.dummy.Pool is a lightweight thread-based version of a Pool, which is more suitable here because the work tasks are just starting subprocesses.
Note that subprocess.check_call is used instead of os.system; this avoids the command injection vulnerability in your previous code.
Also note that youtube-dl output templates are really powerful. In most cases, you don't actually need to define and manage file names yourself.
I achieved the same thing using the threading library, which is a lighter-weight way to run tasks concurrently than spawning new processes.
Assumption:
Each task will download videos to a different directory.
import os
import threading

import youtube_dl

COOKIE_JAR = "path_to_my_cookie_jar"

def download_task(videos, output_dir):
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    if not os.path.isfile(COOKIE_JAR):
        raise FileNotFoundError("Cookie Jar not found\n")

    ydl_opts = {
        'cookiefile': COOKIE_JAR,
        'outtmpl': f'{output_dir}/%(title)s.%(ext)s'
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download(videos)

if __name__ == "__main__":
    root_dir = "./root_dir"
    threads = []
    for playlist in many_playlists:
        output_dir = f"{root_dir}/{playlist.name}"
        thread = threading.Thread(target=download_task, args=(playlist, output_dir))
        threads.append(thread)

    # Actually start downloading
    for thread in threads:
        thread.start()

    # Wait for all the downloads to complete
    for thread in threads:
        thread.join()
My current for loop removes snapshots from my 16 VMs one by one:
for vmName in vmList:
    snapshots = vmServer.get_vm_by_name(vmName).get_snapshots()
    for i in range(len(snapshots)-3):
        snapshotName = snapshots[i].get_name()
        print "Deleting snapshot " + snapshotName + " of " + vmName
        vmServer.get_vm_by_name(vmName).delete_named_snapshot(snapshotName)
I need to run it in parallel (so it doesn't wait for the previous job to finish before starting the next one).
I was trying to apply multiprocessing; here's the full code:
import argparse
from pysphere import VIServer  # Tested with vCenter Server 5.5.0 and pysphere package 0.1.7
from CONFIG import *  # Contains username and password for vCenter connection, list of VM names to take snapshot
from multiprocessing.pool import ThreadPool as Pool

def purgeSnapshotStage(vmList):
    # Connect to vCenter
    vmServer = VIServer()
    vmServer.connect("VM_ADDRESS", username, password)
    snapshots = vmServer.get_vm_by_name(vmName).get_snapshots()
    for i in range(len(snapshots) - 3):
        snapshotName = snapshots[i].get_name()
        print "Deleting snapshot " + snapshotName + " of VM: " + vmName
        vmServer.get_vm_by_name(vmName).delete_named_snapshot(snapshotName)
    vmServer.disconnect()

# Get the environment to delete snapshot from command line
parser = argparse.ArgumentParser(description="Take snapshot of VMs for stage or stage2")
parser.add_argument('env', choices=("stage", "stage2", "stage3"), help="Valid value stage or stage2 or stage3")
env = parser.parse_args().env

vmList = globals()[env + "VmList"]

pool_size = 5  # your "parallelness"
pool = Pool(pool_size)

for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmList,))

pool.close()
pool.join()
But there is a mistake: it tries to execute the delete command only on the last VM.
I didn't find a good guide about multiprocessing, and I can't figure out how to debug it.
I need help finding the mistake.
You have an error here:
for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmList,))
It should be:
for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmName,))
And then your function header needs to take a single VM name:
def purgeSnapshotStage(vmName):
Then, there might be other errors in your code.
Generally, I doubt that parallelizing this will give you any performance benefit. Your bottleneck will be the VMware server; it will not get faster when you start many delete jobs at the same time.
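That said, if you do parallelize it, a corrected sketch of the worker could look like this (assuming the same pysphere calls as in the question; each worker uses the VM name it is given and opens its own connection):

def purgeSnapshotStage(vmName):
    # One vCenter connection per worker
    vmServer = VIServer()
    vmServer.connect("VM_ADDRESS", username, password)
    snapshots = vmServer.get_vm_by_name(vmName).get_snapshots()
    for i in range(len(snapshots) - 3):
        snapshotName = snapshots[i].get_name()
        print("Deleting snapshot " + snapshotName + " of VM: " + vmName)
        vmServer.get_vm_by_name(vmName).delete_named_snapshot(snapshotName)
    vmServer.disconnect()

for vmName in vmList:
    pool.apply_async(purgeSnapshotStage, (vmName,))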
I am writing a Python/Django application which transfers files from the server to the local machine using the rsync protocol. We will be dealing with large files, so a progress bar is mandatory. The --progress argument in the rsync command does this beautifully: all the detailed progress is shown in the terminal. How can I show that progress in a web browser? Is there a hook function or something like that? Or can I store the progress in a log file, read it back, and update the page every minute or so?
The basic principle is to run rsync in a subprocess, expose a web API, and get updates via JavaScript.
Here's an example.
import subprocess
import re
import sys

print('Dry run:')
cmd = 'rsync -az --stats --dry-run ' + sys.argv[1] + ' ' + sys.argv[2]
proc = subprocess.Popen(cmd,
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)  # read str instead of bytes
remainder = proc.communicate()[0]
mn = re.findall(r'Number of files: (\d+)', remainder)
total_files = int(mn[0])
print('Number of files: ' + str(total_files))

print('Real rsync:')
cmd = 'rsync -avz --progress ' + sys.argv[1] + ' ' + sys.argv[2]
proc = subprocess.Popen(cmd,
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
while True:
    output = proc.stdout.readline()
    if output == '' and proc.poll() is not None:
        break  # rsync has exited and there is nothing left to read
    if 'to-check' in output:  # newer rsync releases print 'to-chk' instead
        m = re.findall(r'to-check=(\d+)/(\d+)', output)
        progress = (100 * (int(m[0][1]) - int(m[0][0]))) / total_files
        sys.stdout.write('\rDone: ' + str(progress) + '%')
        sys.stdout.flush()
        if int(m[0][0]) == 0:
            break
print('\rFinished')
But this only shows us the progress in our standard output (stdout).
We can, however, modify this code to report the progress as JSON, and make that output available via a progress webservice/API that we create.
On the client side, we then write JavaScript (ajax) to poll our progress webservice/API from time to time and use that info to update something client-side, e.g. a text message, the width of an image, the color of some div, etc.
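For instance, here is a minimal server-side sketch (the file path and function names are hypothetical): it reuses the parsing loop above but publishes the percentage as JSON instead of printing it, so any web view can read it back.

import json
import re
import subprocess

PROGRESS_FILE = '/tmp/rsync_progress.json'  # hypothetical location read by the web app

def report_progress(percent):
    # Write the current percentage where the web layer can pick it up.
    with open(PROGRESS_FILE, 'w') as f:
        json.dump({'percent': percent}, f)

def rsync_with_progress(src, dest, total_files):
    # Same parsing loop as above, but publishing JSON instead of printing.
    proc = subprocess.Popen('rsync -avz --progress ' + src + ' ' + dest,
                            shell=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    while True:
        line = proc.stdout.readline()
        if line == '' and proc.poll() is not None:
            break
        m = re.findall(r'to-check=(\d+)/(\d+)', line)
        if m:
            report_progress(100 * (int(m[0][1]) - int(m[0][0])) // total_files)
    report_progress(100)

# A Django view polled by the browser could then be as simple as:
# from django.http import JsonResponse
# def rsync_progress(request):
#     with open(PROGRESS_FILE) as f:
#         return JsonResponse(json.load(f))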