Save Visio Document as HTML - python

I'm trying to convert a lot of Visio files from .vsd to .html, but each file has a lot of pages, so I need to convert all pages to a single .html file.
Using the Python code below, I'm able to convert to PDF, but what I really need is HTML. I noticed I can use win32com.client.Dispatch("SaveAsWeb.VisSaveAsWeb"), but how do I use it? Any ideas?
import sys
import win32com.client
from os.path import abspath
f = abspath(sys.argv[1])
visio = win32com.client.Dispatch("Visio.InvisibleApp")
doc = visio.Documents.Open(f)
doc.ExportAsFixedFormat(1, '{}.pdf'.format(f), 0, 0)
visio.Quit()
exit(0)

Visio cannot do that. You cannot "convert all pages into a single HTML file". You'll have a "root" file and a folder of "supporting" files.
VisSaveAsWeb is pretty well documented, no need to guess:
https://msdn.microsoft.com/en-us/vba/visio-vba/articles/vissaveasweb-object-visio-save-as-web
-- update
With Python, dealing with SaveAsWeb turned out to be less trivial than expected. It seems to default to a custom (non-dispatch) interface. I don't think it's possible to deal with this using the win32com library, but comtypes seems to work (the comtypes library builds the client from the type library, so it also supports "custom" interfaces):
import sys
import comtypes
from comtypes import client
from os.path import abspath
f = abspath(sys.argv[1])
visio = comtypes.client.CreateObject("Visio.InvisibleApp")
doc = visio.Documents.Open(f)
comtypes.client.GetModule("{}\\SAVASWEB.DLL".format(visio.Path))
saveAsWeb = visio.SaveAsWebObject.QueryInterface(comtypes.gen.VisSAW.IVisSaveAsWeb)
webPageSettings = saveAsWeb.WebPageSettings.QueryInterface(comtypes.gen.VisSAW.IVisWebPageSettings)
webPageSettings.TargetPath = "{}.html".format(f)
webPageSettings.QuietMode = True
saveAsWeb.AttachToVisioDoc(doc)
saveAsWeb.CreatePages()
visio.Quit()
exit(0)
Other than that, you can try "command line" interface:
http://visualsignals.typepad.co.uk/vislog/2010/03/automating-visios-save-as-web-output.html
import sys
import win32com.client
from os.path import abspath
f = abspath(sys.argv[1])
visio = win32com.client.Dispatch("Visio.InvisibleApp")
doc = visio.Documents.Open(f)
visio.Addons("SaveAsWeb").Run("/quiet=True /target={}.htm".format(f))
visio.Quit()
exit(0)
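Since the question mentions many Visio files, the add-on call above could be wrapped in a loop. The following is only a sketch (written for Python 3) and has not been run against Visio here: the helper names `html_target_for`, `save_as_web_args` and `convert_folder` are hypothetical, only the `/quiet` and `/target` switches shown above are used, and paths containing spaces may need quoting that is not handled.

```python
import os


def html_target_for(vsd_path):
    """Return the .htm path next to the given Visio file (hypothetical helper)."""
    base, _ext = os.path.splitext(os.path.abspath(vsd_path))
    return base + ".htm"


def save_as_web_args(vsd_path):
    """Build the argument string passed to the SaveAsWeb add-on."""
    return "/quiet=True /target={}".format(html_target_for(vsd_path))


def convert_folder(folder):
    """Run the SaveAsWeb add-on on every .vsd file in `folder` (Windows only)."""
    # Imported lazily so the helpers above stay importable without pywin32.
    import win32com.client

    visio = win32com.client.Dispatch("Visio.InvisibleApp")
    try:
        for name in sorted(os.listdir(folder)):
            if not name.lower().endswith(".vsd"):
                continue
            path = os.path.abspath(os.path.join(folder, name))
            doc = visio.Documents.Open(path)
            visio.Addons("SaveAsWeb").Run(save_as_web_args(path))
            # Assumption: the export leaves the document unmodified,
            # so closing it does not raise a save prompt.
            doc.Close()
    finally:
        visio.Quit()
```

Each drawing still produces a root .htm file plus a folder of supporting files, as noted above.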
Other than that you could give a try to my visio svg-export :)

Related

Using python to run Latex compiler, why does it hang if there are errors in the latex?

I have a python script that takes the (latex source) content of a google doc and creates a pdf.
This is the function I use for the pdf:
# -*- coding: utf-8 -*-
#!/usr/bin/python
"""One of the main activating files of IMPS - this downloads all the files in a directory, creates the input.tex file and archives them in a tar file
TODO
Before we send to stackoverflow we should make sure that everything is in a function and that the if __main__ trigger is working
I'd also like to have doc strings for all of the methods
"""
import os
import glob
import tarfile
import time
import datetime
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import urlparse
import argparse
import re
def generate_pdf(filename,destination=""):
    """
    Generates the pdf from string
    from http://stackoverflow.com/q/19683123/170243
    """
    import subprocess
    import shutil
    current = os.getcwd()
    this_dir=os.path.dirname(os.path.realpath(__file__))
    os.chdir(this_dir+"/latex")
    proc=subprocess.Popen(['pdflatex',filename+'.tex'],stdout=subprocess.PIPE)
    # subprocess.Popen(['pdflatex',tex])
    temp=proc.communicate()
    #Have to do it twice so that the contents pages are right
    proc=subprocess.Popen(['pdflatex',filename+'.tex'],stdout=subprocess.PIPE)
    temp=proc.communicate()
    shutil.copy(filename+'.pdf',"../../live/"+destination+filename+ str(datetime.datetime.now()).replace(".", "-").replace(":", "_") + ".pdf")
    trace_file = open("../../live/"+destination+"trace.txt", "w")
    print >>trace_file, temp[0]
    print >>trace_file, temp[1]
    trace_file.close()
    os.chdir(current)
Everything runs fine if the latex has NO errors, but if there is a problem, the function hangs and nothing gets done. What I want is for problems to be noted and exported into the trace. Any ideas what's going wrong?
When it encounters errors, pdflatex asks the user about how to proceed, so your script "hangs" because it is expecting input. Use pdflatex -interaction=nonstopmode -halt-on-error. See this TeX StackExchange question.
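A minimal sketch of that suggestion, written for Python 3 (the question's code is Python 2). The function names `pdflatex_command` and `run_pdflatex` are made up for illustration; the two flags are the ones named above. Redirecting stdin from the null device is an extra precaution so that pdflatex can never wait for keyboard input.

```python
import subprocess


def pdflatex_command(tex_file):
    """Build a pdflatex command line that never stops to ask the user."""
    return ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_file]


def run_pdflatex(tex_file, workdir="."):
    """Run pdflatex once and return (exit code, stdout bytes, stderr bytes)."""
    proc = subprocess.run(
        pdflatex_command(tex_file),
        cwd=workdir,
        stdin=subprocess.DEVNULL,   # no terminal input is ever available
        stdout=subprocess.PIPE,     # captured so it can be written to a trace file
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

A non-zero exit code then signals a LaTeX error, and the captured output can be written to the trace file instead of the script blocking.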
I think what you are missing is that you also need to set up a pipe for STDERR. This will let you see the error messages from pdflatex. You could also try explicitly setting the buffer size to zero when calling Popen.
self.child = subprocess.Popen(command
,bufsize=0
,stdout=subprocess.PIPE
,stderr=subprocess.PIPE)

Python: Given a link, how to save locally if the link is an image and do not save if it is not an image?

I am currently using
import urllib
urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
Is there a way to check whether the link points to an image or not? If it does, download it; if not, skip the download.
Thanks!
The extension does not mean a file is an actual image. If you want to check that the file is indeed an image, you could use ImageMagick's identify:
import os
from subprocess import check_output, CalledProcessError
from tempfile import NamedTemporaryFile
import requests
from shutil import move
r = requests.get("http://www.digimouth.com/news/media/2011/09/google-logo.jpg").content
tmp = NamedTemporaryFile("wb", delete=False, dir=".")
tmp.write(r)
tmp.close()  # flush to disk so identify sees the whole file
try:
    # identify prints the format name (e.g. JPEG) and fails on non-images
    out = check_output(["identify", "-format", "%m", tmp.name]).decode().strip()
    print(out)
    move(tmp.name, "whatever.{}".format(out.lower()))
except CalledProcessError:
    os.remove(tmp.name)  # not an image: discard the download
To see all the supported formats, run identify -list format.
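If installing ImageMagick is not an option, a rougher check is to look at the first bytes of the downloaded data. This is only a sketch: `sniff_image_type` is a hypothetical helper, it knows just three common formats, and matching a signature does not prove the rest of the file is a valid image.

```python
def sniff_image_type(data):
    """Guess an image type from the leading bytes, or return None."""
    signatures = [
        (b"\xff\xd8\xff", "jpeg"),          # JPEG start-of-image marker
        (b"\x89PNG\r\n\x1a\n", "png"),      # PNG file signature
        (b"GIF87a", "gif"),
        (b"GIF89a", "gif"),
    ]
    for magic, name in signatures:
        if data.startswith(magic):
            return name
    return None


def save_if_image(data, stem):
    """Write `data` to '<stem>.<type>' only when it looks like an image."""
    kind = sniff_image_type(data)
    if kind is None:
        return None
    filename = "{}.{}".format(stem, kind)
    with open(filename, "wb") as fh:
        fh.write(data)
    return filename
```

With requests, `save_if_image(requests.get(url).content, "local-filename")` would then keep the download only when it looks like an image.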

Python : Function to pull a sound clip from URL and save it in local machine

Would like to create a function that pulls a sound from given url and saves it in my machine locally
Use the urllib module:
import urllib
urllib.urlretrieve(url,sound_clip_name)
The file will be saved under the name you provide.
Alternatively, using urllib2:
import urllib2
file = urllib2.urlopen(url).read()
f = open('sound_clip','wb')
f.write(file)
f.close()
Don't forget to give your file an extension.
If you are on Python 2.7, the urllib2 module is your friend; on Python 3, use urllib.request.
Example in 2.7:
import urllib2
f = urllib2.urlopen('http://www.python.org/')
with open(filename, 'wb') as fd:
    fd.write(f.read())
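For Python 3, a rough equivalent using urllib.request might look like the sketch below. `download_file` is a made-up name; the file is opened in binary mode because sound clips are not text, and the response is streamed to disk rather than read into memory in one go.

```python
import shutil
import urllib.request


def download_file(url, dest_path):
    """Stream the resource at `url` into `dest_path` and return the path."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

The caller chooses the destination name, including its extension, for example `download_file(url, "clip.wav")`.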

How to display a pdf that has been downloaded in python

I have grabbed a pdf from the web using for example
import requests
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")
I would like to modify this code to display it
from gi.repository import Poppler, Gtk
def draw(widget, surface):
page.render(surface)
document = Poppler.Document.new_from_file("file:///home/me/some.pdf", None)
page = document.get_page(0)
window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)
window.show_all()
Gtk.main()
How do I modify the document = line to use the variable pdf that contains the pdf?
(I don't mind using popplerqt4 or anything else if that makes it easier.)
It all depends on the OS you're using. One of these will usually help:
import os
os.system('my_pdf.pdf')
or
os.startfile('path_to_pdf.pdf')
or
import webbrowser
webbrowser.open(r'file:///my_pdf.pdf')
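The options above can be combined into one helper that picks a method per platform. This is a sketch under assumptions: `viewer_command` and `open_with_default_app` are hypothetical names, `os.startfile` exists only on Windows, and the `open` (macOS) and `xdg-open` (most Linux desktops) commands are assumed to be installed.

```python
import os
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the launcher command for `path`, or None when os.startfile applies."""
    if platform.startswith("win"):
        return None                 # Windows: use os.startfile instead
    if platform == "darwin":
        return ["open", path]       # macOS
    return ["xdg-open", path]       # assumed available on Linux desktops


def open_with_default_app(path):
    """Open `path` in whatever application the OS associates with it."""
    command = viewer_command(path)
    if command is None:
        os.startfile(path)
    else:
        subprocess.Popen(command)
```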
How about using a temporary file?
import tempfile
import urllib
import urlparse
import requests
from gi.repository import Poppler, Gtk
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")
with tempfile.NamedTemporaryFile() as pdf_contents:
    pdf_contents.write(pdf.content)  # write the response body, not the Response object
    pdf_contents.flush()  # make sure the bytes are on disk before Poppler reads them
    file_url = urlparse.urljoin(
        'file:', urllib.pathname2url(pdf_contents.name))
    document = Poppler.Document.new_from_file(file_url, None)
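A Python 3 variant of the same idea is sketched below. `write_temp_pdf` is a hypothetical helper. It uses `delete=False` and closes the file before handing out the URI, because on Windows a NamedTemporaryFile generally cannot be opened a second time while it is still open; the caller is then responsible for deleting the file.

```python
import tempfile
from pathlib import Path


def write_temp_pdf(data):
    """Write PDF bytes to a closed temporary file; return (path, file URI)."""
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(data)
    finally:
        tmp.close()                 # closed so another library can open it
    path = Path(tmp.name)
    return path, path.as_uri()
```

The returned URI can be passed to `Poppler.Document.new_from_file(uri, None)`, and `path.unlink()` removes the file once the document is no longer needed.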
Try this and tell me if it works:
document = Poppler.Document.new_from_data(str(pdf.content),len(pdf.content),None)
If you want to open the PDF with Acrobat Reader, the code below should work:
import subprocess
process = subprocess.Popen(['<here path to acrobat.exe>', '/A', 'page=1', '<here path to pdf>'], shell=False, stdout=subprocess.PIPE)
process.wait()
Since there is a library named pyPdf, you should be able to load the PDF file using that.
If you have any further questions, send me a message.
August 2015: On a fresh installation in Windows 7, the problem is still the same:
Poppler.Document.new_from_data(data, len(data), None)
returns : Type error: must be strings not bytes.
Poppler.Document.new_from_data(str(data), len(data), None)
returns : PDF document is damaged (4).
I have been unable to use this function.
I tried to use a NamedTemporaryFile instead of a file on disk, but for an unknown reason, it returns an unknown error.
So I am using a temporary file. Not the prettiest way, but it works.
Here is the test code for Python 3.4, if anyone has an idea :
from gi.repository import Poppler
import tempfile, urllib
from urllib.parse import urlparse
from urllib.request import urljoin
testfile = "d:/Mes Documents/en cours/PdfBooklet3/tempfiles/preview.pdf"
document = Poppler.Document.new_from_file("file:///" + testfile, None) # Works fine
page = document.get_page(0)
print(page) # OK
f1 = open(testfile, "rb")
data1 = f1.read()
f1.close()
data2 = "".join(map(chr, data1)) # converts bytes to string
print(len(data1))
document = Poppler.Document.new_from_data(data2, len(data2), None)
page = document.get_page(0) # returns None
print(page)
pdftempfile = tempfile.NamedTemporaryFile()
pdftempfile.write(data1)
file_url = urllib.parse.urljoin('file:', urllib.request.pathname2url(pdftempfile.name))
print( file_url)
pdftempfile.seek(0)
document = Poppler.Document.new_from_file(file_url, None) # unknown error

How to open a password protected excel file using python?

I looked at the previous threads regarding this topic, but they have not helped solve the problem.
how to read password protected excel in python
How to open write reserved excel file in python with win32com?
I'm trying to open a password protected file in excel without any user interaction. I searched online, and found this code which uses win32com.client
When I run this, I still get the prompt to enter the password...
from xlrd import *
import win32com.client
import csv
import sys
xlApp = win32com.client.Dispatch("Excel.Application")
print "Excel library version:", xlApp.Version
filename,password = r"\\HRA\Myfile.xlsx", 'caa team'
xlwb = xlApp.Workbooks.Open(filename, Password=password)
I don't think that named parameters work in this case. So you'd have to do something like:
xlwb = xlApp.Workbooks.Open(filename, False, True, None, password)
See http://msdn.microsoft.com/en-us/library/office/ff194819.aspx for details on the Workbooks.Open method.
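The positional call can be wrapped as in the sketch below (Python 3, Windows with Excel and pywin32 installed). `workbook_open_args` and `open_protected_workbook` are hypothetical names; the argument order follows the Workbooks.Open documentation linked above: FileName, UpdateLinks, ReadOnly, Format, Password.

```python
def workbook_open_args(filename, password, read_only=True):
    """Positional arguments for Workbooks.Open, up to and including Password."""
    # Order: FileName, UpdateLinks, ReadOnly, Format, Password
    return (filename, False, read_only, None, password)


def open_protected_workbook(filename, password):
    """Open a password-protected workbook through Excel's COM interface."""
    # Imported lazily: only available on Windows with pywin32 installed.
    import win32com.client

    excel = win32com.client.Dispatch("Excel.Application")
    workbook = excel.Workbooks.Open(*workbook_open_args(filename, password))
    return excel, workbook
```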
I recently discovered a Python library that makes this task simple.
It does not require Excel to be installed and, because it's pure Python, it's cross-platform too!
msoffcrypto-tool supports password-protected (encrypted) Microsoft Office documents, including the older XLS binary file format.
Install msoffcrypto-tool:
pip install msoffcrypto-tool
You could create an unencrypted version of the workbook from the command line:
msoffcrypto-tool Myfile.xlsx Myfile-decrypted.xlsx -p "caa team"
Or, you could use msoffcrypto-tool as a library. While you could write an unencrypted version to disk like above, you may prefer to create a decrypted in-memory file and pass this to your Python Excel library (openpyxl, xlrd, etc.).
import io
import msoffcrypto
import openpyxl
decrypted_workbook = io.BytesIO()
with open('Myfile.xlsx', 'rb') as file:
    office_file = msoffcrypto.OfficeFile(file)
    office_file.load_key(password='caa team')
    office_file.decrypt(decrypted_workbook)
# `filename` can also be a file-like object.
workbook = openpyxl.load_workbook(filename=decrypted_workbook)
If the file is small, you can probably save it as ".csv" and then read that.
It worked for me :)
The openpyxl package works if you are using a Linux system. You can secure the file by setting a password and then open the file using the same password.
For more info:
https://www.quora.com/How-do-I-open-read-password-protected-xls-or-xlsx-Excel-file-using-python-in-Linux
Thank you so much for the great answers on this topic. I'm trying to collate all of it. My requirement was to open a bunch of password-protected Excel files (all with the same password) so that I could do some more processing on them. Please find the code below.
import pandas as pd
import os
from xlrd import *
import win32com.client as w3c
import csv
import sys
from tempfile import NamedTemporaryFile
# The next three names were undefined in the original post; these definitions are assumptions.
xlCSVWindows = 23  # value of Excel's XlFileFormat.xlCSVWindows constant
xlapp = w3c.Dispatch("Excel.Application")
files = os.listdir('C:\\users\\files')  # folder holding the protected workbooks
df_list=[]
# print(len(files))
for f in files:
    # print(f)
    if('.xlsx' in f):
        xlwb = xlapp.Workbooks.Open('C:\\users\\files\\'+f, False, True, None, 'password')
        temp_f = NamedTemporaryFile(delete=False, suffix='.csv')
        temp_f.close()
        os.unlink(temp_f.name)
        xlwb.SaveAs(Filename=temp_f.name, FileFormat=xlCSVWindows)
        df = pd.read_csv(temp_f.name,encoding='Latin-1') # Read that CSV from Pandas
        df.to_excel('C:\\users\\files\\password_removed\\'+f)
