text formatting using programming - python

Is there a way to change the text formatting of an .odt file programmatically? I'm trying to make some text edits that are time-consuming to do manually in LibreOffice (the text is big). Using R or Python I can edit spaces and line breaks by brute force, but for edits like left or right justification I still have to go back to the editor. I can see there are functions in Python to insert tabs, change case, etc., but is it possible to do left, right, or centre justification in an .odt text file indirectly, from a program?

This is a great job for a scripting language like Python. I think you want string methods like str.ljust, which left-justifies a string within a given width, and the built-in open to open the file.
Alternately, you might try defining a macro in OpenOffice (if they exist and your task is simple) - or investing some time in learning emacs or something and using that (this link shows at least some degree of emacs support for .odt files). Or learn vim, that is the one true way!
Edit: after some research I found this. It seems you could unzip the .odt file, read files within it, and manually edit the text nodes of the XML there. However, it seems like it might be easier to use a library - here are two:
https://pypi.python.org/pypi/odfpy
http://ooopy.sourceforge.net/
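The unzip-and-read approach can be sketched with only the standard library. This is a minimal sketch, assuming the document body lives in content.xml inside the archive, as it does in OpenDocument files. The function name list_paragraphs is made up for illustration, and the code only reads paragraph text; it does not change any alignment.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespace for text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def list_paragraphs(odt_path):
    """Return the plain text of every <text:p> paragraph in an .odt file."""
    # An .odt file is a zip archive; the body is stored in content.xml.
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() also collects text nested in spans inside each paragraph.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Writing changes back means rewriting content.xml and re-zipping the archive, which is where a library such as odfpy is likely to save effort.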
Edit 2: in terms of actually justifying the text, presuming you have extracted it, this:
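    def justify(string, left=True):
        lines = string.splitlines()
        if left:
            # Left-justify: drop leading whitespace from every line.
            return "\n".join(line.lstrip() for line in lines)
        # Right-justify: pad every line to the width of the longest line.
        longest_line = max((len(line) for line in lines), default=0)
        return "\n".join(line.rjust(longest_line) for line in lines)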
should work.
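The question also asks about centre justification. A minimal sketch along the same lines, using str.center, might look like this. The name center_lines is made up here, and like the function above it only pads plain text; it does not set paragraph alignment inside the .odt itself.

```python
def center_lines(text):
    """Centre every line within the width of the longest line."""
    lines = [line.strip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    # rstrip() drops the padding str.center adds on the right-hand side.
    return "\n".join(line.center(width).rstrip() for line in lines)
```

Centring like this only looks right in a monospaced font, since it counts characters rather than measuring rendered width.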

Related

Displaying reports for files using Python

Question about general possibilities using Python here, I don't really know enough about programming to know whether it's something that's doable, and if so, how do I go about it.
I have a simple desktop program that you load files into. The program can then output various properties of the thing that's in the file, and depending on what you ask it to do will output a report. It outputs the report as text, but not as a file; instead it just displays the report inside the program itself.
My question is about getting this text output for a large number of files. Currently I load the files into the program one at a time, generate the report, copy it to a text file, and save the text file by hand.
Basically I want to know whether it's extremely difficult to get Python to do this for me, or not. If it is doable, are the resources available for me to read about how it might be done? Are there conditions about being able to run my program and various commands from the Python command box?
Hope my question's clear enough. Sorry if it's a bit garbled.
The tricky part here is
The program can then output various properties of the thing that's in the file, and depending on what you ask it to do will output a report.
Basically, if the desktop application you use has a command line interface, it is possible and relatively easy.
If the program has a command-line option to open a document and output a report in any format (print it on standard output, write it to a file on disk, etc.), you can call that command from a Python script for each file in a list.
If your software doesn't have a CLI (Command Line Interface), it might be possible but more difficult. In that case, you have to automate the actions with a library that emulates clicks on the window of your software (1. Click on Open 2. Click, click, click to select the file to load 3. Click on the button to generate a report, etc.). It's a pain, but it can be considered.
You will find plenty of resources to teach yourself how to write a Python script. You will probably need to learn about lists, loops, file manipulation, and maybe the subprocess module, which lets you call any command from your Python script.
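For the command-line case, the loop over files could look roughly like the sketch below. It assumes the program prints its report on standard output when given a file name as its last argument; the function name save_reports and the "_report.txt" naming scheme are made up for illustration, and the real command depends entirely on your software.

```python
import subprocess
from pathlib import Path


def save_reports(command, input_files, output_dir):
    """Run `command <file>` for each input file and save stdout as text."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for input_file in input_files:
        # check=True raises CalledProcessError if the program fails.
        result = subprocess.run(
            [*command, str(input_file)],
            capture_output=True, text=True, check=True,
        )
        # Name each report after its input file (hypothetical scheme).
        report_path = output_dir / (Path(input_file).stem + "_report.txt")
        report_path.write_text(result.stdout)
        saved.append(report_path)
    return saved
```

Here command is a list such as ["myprogram", "--report"], where both the executable and the flag are placeholders for whatever your program actually accepts.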
I suggest you start with Python 3 instead of Python 2 because it has better support for Unicode, which could quickly become an issue if you have non-ASCII characters in your input files or in the reports from your software.
Good luck ;)
If the only way you can get the report is by selecting and copy/pasting it from the program's GUI, the situation just begs for AutoIt instead of Python.
With Python it would be much more difficult. Unless you want to improve your Python knowledge, of course...
By simulating keypresses, you can open a specific file in the program (by sending Ctrl+O, or Alt and navigating the File menu). Simulate a mouse click or keypress to start report generation. Then simulate a mouse click in the text area, and perform something like:
(just a skeleton of a script; it will probably need to be modified to suit your situation and needs)
send("^{a}^{c}") ; select all and copy (if these keys are supported in this program)
$text = ClipGet() ; get contents of clipboard
$fout = FileOpen("somefile.txt",2)
FileWrite($fout,$text)
FileClose($fout)
To fully automate the task, the script can get a list of source files in a specific folder and run this macro for each of them, automatically naming the resulting txt files.

Locate the Containing Package or Module **Before** Importing

I often find myself needing to import something, but not quite sure of its fully qualified name. I usually end up opening a browser, performing an internet search like python [target_of_import], and scanning a page or two until I find it.
This works, but causes a relatively long break in my workflow, especially if I have to search for a few in a row. How do other people address this?
Is there something like Haskell's Hoogle for Python?
[Note: I currently use vim, in case anyone suggests an IDE-based solution.]
EDIT: For answers concerning autocomplete, please specify this. In general, autocomplete is probably a non-starter solution since in the particular case I am asking about the leftmost characters of the string to be autocompleted are not known.
EDIT 2: While I will not categorically rule out suggestions concerning switching to/learning a new IDE, I'm pretty unlikely to completely change the way I work to accomplish this (e.g., switching from vim on the command line to something like Eclipse + plugins).
You can do this in vim using the Unite.vim plugin.
Enable fuzzy file searching by adding the following to your .vimrc:
call unite#filters#matcher_default#use(['matcher_fuzzy'])
Search for file:
:UniteWithInput file_rec/async:/base/path:!<cr>
Search within files:
:UniteWithInput grep:/base/path<cr>
Search file names and within files
:UniteWithInput file_rec/async:/base/path:! grep:/base/path<cr>
(Use to change between sources)
See also :h :UniteWithCursorWord
This will open a buffer with the file matches. You can open the file by pressing enter, but since you only want to copy the file name, simply use y$ to yank the line, q to close the buffer, and then p to paste the yanked line.
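Outside of vim, a rough lookup can also be done from Python itself. The sketch below is not part of Unite.vim; it simply checks a list of candidate modules you supply for a given name. The function name modules_defining is made up for illustration. Note that it imports each candidate, which runs that module's top-level code.

```python
import importlib


def modules_defining(name, candidates):
    """Return the candidate module names that expose `name` as an attribute."""
    found = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # skip modules that are not installed
        if hasattr(module, name):
            found.append(module_name)
    return found
```

For example, modules_defining("chain", ["math", "itertools"]) reports that the name lives in itertools, so the import would be from itertools import chain.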

generating simulation input files with Python

I am using a scientific simulation package that requires several text-based input files for each 'experiment' to be conducted. These files can be quite lengthy and have a lot of boilerplate sections in them; however, specific 'experiment-specific' values must be entered at many locations within these files.
I would like to automate the generation of these files and do so in a way that is maintainable.
Right now, I am using a Python script I wrote that employs triple quoted blocks of text and variable substitution (using % and .format()) to create sections in the files. I then write out these blocks to the appropriate files.
Accounting for proper aesthetic indentation in the resulting input files is proving to be difficult; moreover, the autogenerator script is becoming more and more opaque as I enhance the types of simulations and options that can be handled.
Does anyone have suggestions about how to manage this task in a more elegant and maintainable way?
I am aware of templating packages like jinja. Do these have benefits outside of generating html-like files? Has anyone used these for the above-stated purpose?
Perhaps a totally different approach would be better.
Any suggestions would be greatly appreciated.
Jinja doesn't care what type of file you make. Text is text is text, unless it's binary. Not even sure Jinja cares then either.
IPython, and in particular, nbconvert, uses Jinja2 to export LaTeX, ipynb, markdown, etc.
There is also an IPython notebook with Jinja2 magics in case you want a demo.
My usual approach to this sort of problem is to create a small library of functions that help me generate and customise the boiler-plate. I don't know what your experiment-definition language looks like but generally I'd need to write a function that writes out the text to initialise the simulation, a function that writes out the text to wrap up the simulation and some other functions to write out the different chunks of text that define each type of experiment.
Having put those functions in a file called mysim, say, I could then use them like this:
    from mysim import sim_init, sim_conclude, experimentType1, experimentType2

    sim_init(name="Today's Simulation", author="Simon")
    for param1 in [0,1,2,3,4,5,6,7,8,20,30,40,50,60,70]:
        experimentType1(param1)
        for param2 in ["A", "B", "C"]:
            experimentType2(param1, param2)
    sim_conclude(savefile="output.txt")
This Python script would generate a simulation input file that would run experiment type 1 for each value of param1 and experiment type 2 for each combination of param1 and param2.
The function implementations themselves might look messy, but the script that creates a particular simulation file will be simple and clear.
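To make the idea concrete, here is one way part of mysim could be written. This is only a sketch: the input-file syntax below is invented, since the real simulation format is not known, and the helper _emit is a made-up name. It uses textwrap.dedent so the templates can be indented nicely in the Python source without that indentation leaking into the output.

```python
import textwrap

_chunks = []  # text blocks collected in the order they are emitted


def _emit(template, **values):
    # dedent() lets templates be indented in this file but not in the output.
    _chunks.append(textwrap.dedent(template).strip().format(**values))


def sim_init(name, author):
    _chunks.clear()  # start a fresh input file
    _emit("""
        # simulation: {name}
        # author: {author}
    """, name=name, author=author)


def experimentType1(param1):
    # Hypothetical block syntax; replace with the real boilerplate.
    _emit("""
        experiment type1
          param1 = {param1}
        end
    """, param1=param1)


def sim_conclude(savefile):
    # Blocks are separated by one blank line in the generated file.
    with open(savefile, "w") as handle:
        handle.write("\n\n".join(_chunks) + "\n")
```

experimentType2 would follow the same pattern as experimentType1 with two parameters. Relative indentation inside a template (the two spaces before param1) survives, which addresses the aesthetic-indentation problem from the question.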

Which format should I save my python script output?

I have an executable (converted to exe from python using py2exe) that outputs lists of numbers that could be from 0-50K lines long or a little bit more.
While developing, I just saved them to a TXT file using simple f.write.
The person wants to print this output on paper! (don't ask why lol)
So, I'm wondering if I can output it to something like HTML? XML? Something that could display tables of 50K lines and maybe 3 columns and that would also run in any PC without additional programs?
Suggestions?
EDIT:
Regarding CSV:
In most situations the best way, in my opinion, would be to make a CSV. I'm not opposing it in any way; rather, I think others might find Lott's answer useful for their cases. Sorry I didn't explain my constraints that well in my question.
My constraints are: the user doesn't have an office suite, no python installed. Just think of a PC that has the bare minimum after a clean windows xp/vista installation, maybe Internet Explorer 7 or 8. This PC has to be able to open my output file and allow for reasonable viewing, searching, and printing.
CSV.
http://docs.python.org/library/csv.html
http://en.wikipedia.org/wiki/Comma-separated_values
They can load a spreadsheet and print anything they want.
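Writing the numbers out as CSV takes only a few lines with the csv module. A minimal sketch, where the function name write_csv and the column layout are assumptions for illustration:

```python
import csv


def write_csv(path, header, rows):
    """Write a header row followed by data rows to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

Each row is any sequence of values, for example write_csv("output.csv", ["a", "b", "c"], [[1, 2, 3], [4, 5, 6]]).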
If you can't install anything on the computer, then you might be best off outputting an HTML file with the data in a <table> that the user could view/search/print in IE.
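Generating such a page needs nothing beyond the standard library. A minimal sketch, assuming three or so columns of values; the function name rows_to_html and the bare-bones markup are illustrative choices, not a tested layout for IE 7/8:

```python
from html import escape


def rows_to_html(header, rows, title="Output"):
    """Build a standalone HTML page containing one table."""
    # escape() keeps characters like < and & from breaking the markup.
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f'<table border="1">\n<tr>{head}</tr>\n{body}\n</table>\n'
        "</body></html>\n"
    )
```

Write the returned string to a file ending in .html and it can be opened, searched, and printed from the browser.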
You could use LaTeX to produce a PDF, maybe? But why exactly isn't a text file good enough?
You can produce a PDF using Reportlab. After all if you really want full control of the printed output, there's nothing that beats PDF.
Does 50k lines make too large a file? If not, just continue writing text files. Otherwise an easy solution would be to continue spitting out text files and compress them, e.g. with zip. You could use the zipfile library in Python. Most computers have no trouble reading zip files.
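Compressing the text output with zipfile could be sketched like this. The function name zip_text_file is made up, and placing the archive next to the original file is just one possible choice:

```python
import zipfile
from pathlib import Path


def zip_text_file(text_path):
    """Compress one file into <name>.zip next to it and return the zip path."""
    text_path = Path(text_path)
    zip_path = text_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps only the file name, not the full directory path.
        archive.write(text_path, arcname=text_path.name)
    return zip_path
```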

Editing Photoshop PSD text layers programmatically

I have a multi-layered PSD, with one specific layer being non-rasterized text. I'm trying to figure out a way I can, from a bash/perl/python/whatever-else program:
load the PSD
edit the text in said layer
flatten all layers in the image
save as a web-friendly format like PNG or JPG
I immediately thought of ImageMagick, but I don't think I can edit the text layer through IM. If I can accomplish the first two steps some other programmatic way, I can always use ImageMagick to perform the last two steps.
After a couple of hours of googling and searching CPAN and PyPI, I still have found nothing promising. Does anyone have advice or ideas on the subject?
If you don't like to use the officially supported AppleScript, JavaScript, or VBScript, then there is also the possibility to do it in Python. This is explained in the article Photoshop scripting with Python, which relies on Photoshop's COM interface.
I have not tried it, so in case it does not work for you:
If your text is preserved after conversion to SVG then you can simply replace it by whatever tool you like. Afterwards, convert it to PNG (eg. by inkscape --export-png=...).
The only way I can think of to automate the changing of text inside of a PSD would be to use a regex based substitution.
Create a very simple picture in Photoshop, perhaps a white background and a text layer, with the text being a known length.
Search the file for your text, and with a hex editor, search nearby for the length of the text (which may or may not be part of the file format).
Try changing the text, first to a string of the same length, then to something shorter/longer.
Open in Photoshop after each change to see if the file is corrupt.
This method, if viable, will only work if the layer in question contains a known string, which can be substituted for your other value. Note that I have no idea whether this will work, as I don't have Photoshop on this computer to try this method out. Perhaps you can make it work?
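The first experiment in the steps above (swapping in a string of the same length) can be sketched on raw bytes as follows. The function name replace_same_length is made up, and how the text is actually encoded inside the PSD is not assumed here; you would have to find the exact byte sequence with a hex editor first. Whether Photoshop accepts the result is just as untested as the method itself.

```python
def replace_same_length(data, old, new):
    """Replace `old` with `new` in a bytes object without changing its size."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    if old not in data:
        raise ValueError("old text not found")
    # Same-length replacement keeps every other byte offset unchanged.
    return data.replace(old, new)
```

Read the file with open(path, "rb"), pass the bytes through this function, and write the result to a copy so the original PSD stays intact.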
As for converting to PNG, I am at a loss. If the replacing script is in Python, you may be able to do it with the Python Imaging Library (PIL, which seems to support it), but otherwise you may just have to open Photoshop to do the conversion. Which means that it probably wouldn't be worth it to change the text programmatically in the first place.
Have you considered opening and editing the image in The GIMP? It has very good PSD support, and can be scripted in several languages.
Which one you use depends in part on your platform; the Perl interface didn't work on Windows the last I knew. I believe Scheme is supported in all ports.
You can use Photoshop itself to do this with OLE. You will need to install Photoshop, of course. Win32::OLE in Perl or similar module in Python. See http://www.adobe.com/devnet/photoshop/pdfs/PhotoshopScriptingGuide.pdf
If you're going to automate Photoshop, you pretty much have to use Photoshop's own scripting systems. I don't think there's a way around that.
Looking at the problem a different way, can you export from Photoshop to some other format which supports layers, like PNG, which is editable by ImageMagick?
You can also try this using Node.js. I made a PSD command-line tool
One-line command install (needs NodeJS/NPM installed)
npm install -g psd-cli
You can then use it by typing in your terminal
psd myfile.psd -t
You can check out the code to use it from another Node script, or use it through your shell from another Bash/Perl/whatever script.
