file upload from container using webdav results into empty file upload

file upload from container using webdav results into empty file upload - python

I'm attempting to wrap my brain around this because the identical code generates two different sets of outcomes, implying that there must be a fundamental difference between the settings in which the code is performed.
This is the code I use:
from webdav3.client import Client
if __name__ == "__main__":
client = Client(
{
"webdav_hostname": "http://some_address"
+ "project"
+ "/",
"webdav_login": "somelogin",
"webdav_password": "somepass",
}
)
ci = "someci"
version = "someversion"
directory = f'release-{ci.replace("/", "-")}-{version}'
client.webdav.disable_check = (
True # Need to be disabled as the check can not be performed on the root
)
f = "a.rst"
with open(f, "r") as fh:
contents = fh.read()
print(contents)
evaluated = contents.replace("#PIPELINE_URL#", "DUMMY PIPELINE URL")
with open(f, "w") as fh:
fh.write(evaluated)
print(contents)
client.upload(local_path=f, remote_path=f)
The file a.rst contains some text like:
Please follow instruction link below
#####################################
`Click here for instructions <https://some_website>`_
When I execute this code from macOS, a file with the same contents of a.rst appears on my website.
When I execute this script from within a container with a base image of Python 3.9 and the webdav dependencies, it creates a file on my website, but the content is always empty. I'm not sure why, but it could have something to do with the fact that I'm running it from within a Docker container that on top of it can't handle the special characters in the file (plain text seems to work though)?
Anyone have any ideas as to why this is happening and how to fix it?
EDIT:
It seems that the character ":" is creating the problem..

Related

how to do docx to pdf conversion using python library without subprocess in linux? [duplicate]

I'am tasked with converting tons of .doc files to .pdf. And the only way my supervisor wants me to do this is through MSWord 2010. I know I should be able to automate this with python COM automation. Only problem is I dont know how and where to start. I tried searching for some tutorials but was not able to find any (May be I might have, but I don't know what I'm looking for).
Right now I'm reading through this. Dont know how useful this is going to be.

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:
import sys
import os
import comtypes.client
wdFormatPDF = 17
in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])
word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
You could also use pywin32, which would be the same except for:
import win32com.client
and then:
word = win32com.client.Dispatch('Word.Application')

You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.
from docx2pdf import convert
convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")
pip install docx2pdf
docx2pdf input.docx output.pdf
Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf

I have tested many solutions but no one of them works efficiently on Linux distribution.
I recommend this solution :
import sys
import subprocess
import re
def convert_to(folder, source, timeout=None):
args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf', '--outdir', folder, source]
process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
filename = re.search('-> (.*?) using filter', process.stdout.decode())
return filename.group(1)
def libreoffice_exec():
# TODO: Provide support for more platforms
if sys.platform == 'darwin':
return '/Applications/LibreOffice.app/Contents/MacOS/soffice'
return 'libreoffice'
and you call your function:
result = convert_to('TEMP Directory', 'Your File', timeout=15)
All resources:
https://michalzalecki.com/converting-docx-to-pdf-using-python/

I have worked on this problem for half a day, so I think I should share some of my experience on this matter. Steven's answer is right, but it will fail on my computer. There are two key points to fix it here:
(1). The first time when I created the 'Word.Application' object, I should make it (the word app) visible before open any documents. (Actually, even I myself cannot explain why this works. If I do not do this on my computer, the program will crash when I try to open a document in the invisible model, then the 'Word.Application' object will be deleted by OS. )
(2). After doing (1), the program will work well sometimes but may fail often. The crash error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM Server may not be able to response so quickly. So I add a delay before I tried to open a document.
After doing these two steps, the program will work perfectly with no failure anymore. The demo code is as below. If you have encountered the same problems, try to follow these two steps. Hope it helps.
import os
import comtypes.client
import time
wdFormatPDF = 17
# absolute path is needed
# be careful about the slash '\', use '\\' or '/' or raw string r"..."
in_file=r'absolute path of input docx file 1'
out_file=r'absolute path of output pdf file 1'
in_file2=r'absolute path of input docx file 2'
out_file2=r'absolute path of outputpdf file 2'
# print out filenames
print in_file
print out_file
print in_file2
print out_file2
# create COM object
word = comtypes.client.CreateObject('Word.Application')
# key point 1: make word visible before open a new document
word.Visible = True
# key point 2: wait for the COM Server to prepare well.
time.sleep(3)
# convert docx file 1 to pdf file 1
doc=word.Documents.Open(in_file) # open docx file 1
doc.SaveAs(out_file, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 1
word.Visible = False
# convert docx file 2 to pdf file 2
doc = word.Documents.Open(in_file2) # open docx file 2
doc.SaveAs(out_file2, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 2
word.Quit() # close Word Application

unoconv (writen in Python) and OpenOffice running as a headless daemon.
https://github.com/unoconv/unoconv
http://dag.wiee.rs/home-made/unoconv/
Works very nicely for doc, docx, ppt, pptx, xls, xlsx.
Very useful if you need to convert docs or save/convert to certain formats on a server.

As an alternative to the SaveAs function, you could also use ExportAsFixedFormat which gives you access to the PDF options dialog you would normally see in Word. With this you can specify bookmarks and other document properties.
doc.ExportAsFixedFormat(OutputFileName=pdf_file,
ExportFormat=17, #17 = PDF output, 18=XPS output
OpenAfterExport=False,
OptimizeFor=0, #0=Print (higher res), 1=Screen (lower res)
CreateBookmarks=1, #0=No bookmarks, 1=Heading bookmarks only, 2=bookmarks match word bookmarks
DocStructureTags=True
);
The full list of function arguments is: 'OutputFileName', 'ExportFormat', 'OpenAfterExport', 'OptimizeFor', 'Range', 'From', 'To', 'Item', 'IncludeDocProps', 'KeepIRM', 'CreateBookmarks', 'DocStructureTags', 'BitmapMissingFonts', 'UseISO19005_1', 'FixedFormatExtClassPtr'

It's worth noting that Stevens answer works, but make sure if using a for loop to export multiple files to place the ClientObject or Dispatch statements before the loop - it only needs to be created once - see my problem: Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs

If you don't mind using PowerShell have a look at this Hey, Scripting Guy! article. The code presented could be adopted to use the wdFormatPDF enumeration value of WdSaveFormat (see here).
This blog article presents a different implementation of the same idea.

I have modified it for ppt support as well. My solution support all the below-specified extensions.
word_extensions = [".doc", ".odt", ".rtf", ".docx", ".dotm", ".docm"]
ppt_extensions = [".ppt", ".pptx"]
My Solution: Github Link
I have modified code from Docx2PDF

I tried the accepted answer but wasn't particularly keen on the bloated PDFs Word was producing which was usually an order of magnitude bigger than expected. After looking how to disable the dialogs when using a virtual PDF printer I came across Bullzip PDF Printer and I've been rather impressed with its features. It's now replaced the other virtual printers I used previously. You'll find a "free community edition" on their download page.
The COM API can be found here and a list of the usable settings can be found here. The settings are written to a "runonce" file which is used for one print job only and then removed automatically. When printing multiple PDFs we need to make sure one print job completes before starting another to ensure the settings are used correctly for each file.
import os, re, time, datetime, win32com.client
def print_to_Bullzip(file):
util = win32com.client.Dispatch("Bullzip.PDFUtil")
settings = win32com.client.Dispatch("Bullzip.PDFSettings")
settings.PrinterName = util.DefaultPrinterName # make sure we're controlling the right PDF printer
outputFile = re.sub("\.[^.]+$", ".pdf", file)
statusFile = re.sub("\.[^.]+$", ".status", file)
settings.SetValue("Output", outputFile)
settings.SetValue("ConfirmOverwrite", "no")
settings.SetValue("ShowSaveAS", "never")
settings.SetValue("ShowSettings", "never")
settings.SetValue("ShowPDF", "no")
settings.SetValue("ShowProgress", "no")
settings.SetValue("ShowProgressFinished", "no") # disable balloon tip
settings.SetValue("StatusFile", statusFile) # created after print job
settings.WriteSettings(True) # write settings to the runonce.ini
util.PrintFile(file, util.DefaultPrinterName) # send to Bullzip virtual printer
# wait until print job completes before continuing
# otherwise settings for the next job may not be used
timestamp = datetime.datetime.now()
while( (datetime.datetime.now() - timestamp).seconds < 10):
if os.path.exists(statusFile) and os.path.isfile(statusFile):
error = util.ReadIniString(statusFile, "Status", "Errors", '')
if error != "0":
raise IOError("PDF was created with errors")
os.remove(statusFile)
return
time.sleep(0.1)
raise IOError("PDF creation timed out")

I was working with this solution but I needed to search all .docx, .dotm, .docm, .odt, .doc or .rtf and then turn them all to .pdf (python 3.7.5). Hope it works...
import os
import win32com.client
wdFormatPDF = 17
for root, dirs, files in os.walk(r'your directory here'):
for f in files:
if f.endswith(".doc") or f.endswith(".odt") or f.endswith(".rtf"):
try:
print(f)
in_file=os.path.join(root,f)
word = win32com.client.Dispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open(in_file)
doc.SaveAs(os.path.join(root,f[:-4]), FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
word.Visible = True
print ('done')
os.remove(os.path.join(root,f))
pass
except:
print('could not open')
# os.remove(os.path.join(root,f))
elif f.endswith(".docx") or f.endswith(".dotm") or f.endswith(".docm"):
try:
print(f)
in_file=os.path.join(root,f)
word = win32com.client.Dispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open(in_file)
doc.SaveAs(os.path.join(root,f[:-5]), FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
word.Visible = True
print ('done')
os.remove(os.path.join(root,f))
pass
except:
print('could not open')
# os.remove(os.path.join(root,f))
else:
pass
The try and except was for those documents I couldn't read and won't exit the code until the last document.

You should start from investigating so called virtual PDF print drivers.
As soon as you will find one you should be able to write batch file that prints your DOC files into PDF files. You probably can do this in Python too (setup printer driver output and issue document/print command in MSWord, later can be done using command line AFAIR).

import docx2txt
from win32com import client
import os
files_from_folder = r"c:\\doc"
directory = os.fsencode(files_from_folder)
amount = 1
word = client.DispatchEx("Word.Application")
word.Visible = True
for file in os.listdir(directory):
filename = os.fsdecode(file)
print(filename)
if filename.endswith('docx'):
text = docx2txt.process(os.path.join(files_from_folder, filename))
print(f'{filename} transfered ({amount})')
amount += 1
new_filename = filename.split('.')[0] + '.txt'
try:
with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
t.write(text)
except:
os.mkdir(files_from_folder + r'\txt_files')
with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
t.write(text)
elif filename.endswith('doc'):
doc = word.Documents.Open(os.path.join(files_from_folder, filename))
text = doc.Range().Text
doc.Close()
print(f'{filename} transfered ({amount})')
amount += 1
new_filename = filename.split('.')[0] + '.txt'
try:
with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
t.write(text)
except:
os.mkdir(files_from_folder + r'\txt_files')
with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
t.write(text)
word.Quit()
The Source Code, see here:
https://neculaifantanaru.com/en/python-full-code-how-to-convert-doc-and-docx-files-to-pdf-from-the-folder.html

I would suggest ignoring your supervisor and use OpenOffice which has a Python api. OpenOffice has built in support for Python and someone created a library specific for this purpose (PyODConverter).
If he isn't happy with the output, tell him it could take you weeks to do it with word.

How can I get Helm's binary from their GitHub repo?

I'm trying to download Helm's latest release using a script. I want to download the binary and copy it to a file. I tried looking at the documentation, but it's very confusing to read and I don't understand this. I have found a way to download specific files, but nothing regarding the binary. So far, I have:
from github import Github
def get_helm(filename):
f = open(filename, 'w') # The file I want to copy the binary to
g = Github()
r = g.get_repo("helm/helm")
# Get binary and use f.write() to transfer it to the file
f.close
return filename
I am also well aware of the limits of queries that I can do since there are no credentials.

For Helm in particular, you're not going to have a good time since they apparently don't publish their release files via GitHub, only the checksum metadata.
See https://github.com/helm/helm/releases/tag/v3.6.0 ...
Otherwise, this would be as simple as:
get the JSON data from https://api.github.com/repos/{repo}/releases
get the first release in the list (it's the newest)
look through the assets of that release to find the file you need (e.g. for your architecture)
download it using your favorite HTTP client (e.g. the one you used to get the JSON data in the first step)
Nevertheless, here's a script that works for Helm's additional hoops-to-jump-through:
import requests
def download_binary_with_progress(source_url, dest_filename):
binary_resp = requests.get(source_url, stream=True)
binary_resp.raise_for_status()
with open(dest_filename, "wb") as f:
for chunk in binary_resp.iter_content(chunk_size=524288):
f.write(chunk)
print(f.tell(), "bytes written")
return dest_filename
def download_newest_helm(desired_architecture):
releases_resp = requests.get(
f"https://api.github.com/repos/helm/helm/releases"
)
releases_resp.raise_for_status()
releases_data = releases_resp.json()
newest_release = releases_data[0]
for asset in newest_release.get("assets", []):
name = asset["name"]
# For a project using regular releases, this would be simplified to
# checking for the desired architecture and doing
# download_binary_with_progress(asset["browser_download_url"], name)
if desired_architecture in name and name.endswith(".tar.gz.asc"):
tarball_filename = name.replace(".tar.gz.asc", ".tar.gz")
tarball_url = f"https://get.helm.sh/{tarball_filename}"
return download_binary_with_progress(
source_url=tarball_url, dest_filename=tarball_filename
)
raise ValueError("No matching release found")
download_newest_helm("darwin-arm64")

rst2html on full python project

How can I configure rst2html to run on a full Python project instead of a single file?
I'm used to epydoc generating my docs, but thinking to use reStructuredText because it's PyCharm default (I know we as well could change PyCharm to epytext)

rst2html cannot do that by itself. It's a ridiculously thin wrapper over docutils.core.publish_cmdline. Check it's source:
https://github.com/docutils-mirror/docutils/blob/master/tools/rst2html.py
If you want to process several .rst files, you could write a short shell script that calls rst2html on each file.
Alternatively you can write a Python script. Here is an example:
# https://docs.python.org/3/library/pathlib.html
from pathlib import Path
# https://pypi.org/project/docutils/
import docutils.io, docutils.core
def rst2html(source_path):
# mostly taken from
# https://github.com/getpelican/pelican/
pub = docutils.core.Publisher(
source_class=docutils.io.FileInput,
destination_class=docutils.io.StringOutput)
pub.set_components('standalone', 'restructuredtext', 'html')
pub.process_programmatic_settings(None, None, None)
pub.set_source(source_path=str(source_path))
pub.publish()
html = pub.writer.parts['whole']
return html
SRC_DIR = Path('.')
DST_DIR = Path('.')
for rst_file in SRC_DIR.iterdir():
if rst_file.is_file() and rst_file.suffix == '.rst':
html = rst2html(rst_file)
with open(DST_DIR / (rst_file.stem + '.html'), 'w') as f:
f.write(html)
A useful reference could be:
http://docutils.sourceforge.net/docs/api/publisher.html

Run script on multiple files sequentially?

I have this code which along with the rest of it runs on a single file in a folder.
I want to try and run this code on 11 files in the folder. I have to pass parameters to the whole script via an .sh script which I've written.
I've searched on here and found various solutions which have not worked.
def get_m3u_name():
m3u_name = ""
dirs = os.listdir("/tmp/")
for m3u_file in dirs:
if m3u_file.endswith(".m3u") or m3u_file.endswith(".xml"):
m3u_name = m3u_file
return m3u_name
def remove_line(filename, what):
if os.path.isfile(filename):
file_read = open(filename).readlines()
file_write = open(filename, 'w')
for line in file_read:
if what not in line:
file_write.write(line)
file_write.close()
m3ufile = get_m3u_name()
I have tried a different method of deleting the file just processed then looping the script to run again on the next file as I can do this manually but when I use
os.remove(m3ufile)
I get file not found either method of improving my code would be of great help to me. I'm just a newbie at this but pointing me in the right direction would be of great help.

File management

I am working on python and biopython right now. I have a file upload form and whatever file is uploaded suppose(abc.fasta) then i want to pass same name in execute (abc.fasta) function parameter and display function parameter (abc.aln). Right now i am changing file name manually, but i want to have it automatically.
Workflow goes like this.
----If submit is not true then display only header and form part
--- if submit is true then call execute() and get file name from form input
--- Then displaying result file name is same as executed file name but only change in extension
My raw code is here -- http://pastebin.com/FPUgZSSe
Any suggestions, changes and algorithm is appreciated
Thanks

You need to read the uploaded file out of the cgi.FieldStorage() and save it onto the server. Ususally a temp directory (/tmp on Linux) is used for this. You should remove these files after processing or on some schedule to clean up the drive.
def main():
import cgi
import cgitb; cgitb.enable()
f1 = cgi.FieldStorage()
if "dfile" in f1:
fileitem = f1["dfile"]
pathtoTmpFile = os.path.join("path/to/temp/directory", fileitem.filename)
fout = file(pathtoTmpFile, 'wb')
while 1:
chunk = fileitem.file.read(100000)
if not chunk: break
fout.write (chunk)
fout.close()
execute(pathtoTmpFile)
os.remove(pathtoTmpFile)
else:
header()
form()
This modified the execute to take the path to the newly saved file.
cline = ClustalwCommandline("clustalw", infile=pathToFile)
For the result file, you could also stream it back so the user gets a "Save as..." dialog. That might be a little more usable than displaying it in HTML.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

file upload from container using webdav results into empty file upload - python

Related

how to do docx to pdf conversion using python library without subprocess in linux? [duplicate]

How can I get Helm's binary from their GitHub repo?

rst2html on full python project

Run script on multiple files sequentially?

File management

Categories

Resources