I was wondering whether it would be possible to write a Python script that retrieves header information from an .exe file. I tried Googling but didn't really find any usable results.
Thanks.
Sept
There is pefile: a multi-platform Python module to read and work with Portable Executable (aka PE) files. Most of the information in the PE header is accessible, as well as all of the sections, their information, and their data.
Looks like I'm almost 2 years and a dollar short! If you still need to solve this, Michał Niklas was right on point above. pefile was written for this very purpose. Here is an example from my interactive session:
ipython
import pefile
pe = pefile.PE('file.exe')
pe.print_info()
The output is too verbose to put up here, but the above gives you all header information from a PE.
Download pefile here: pefile
Of course it is possible to write a Python script to retrieve header information from an XYZ file. Three simple steps:
(1) Find docs for the header part of an XYZ file; read them.
(2) Read the docs for the Python struct module or ctypes module or both.
(3) Write and test the script.
Which step are you having trouble with?
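To make step (2) concrete, here is a minimal sketch using only the standard-library struct module. The offsets come from the published PE/COFF layout (the MZ magic, the e_lfanew pointer at 0x3C, and the COFF file header after the PE signature); the synthetic bytes at the bottom are only there so the snippet is self-contained rather than requiring a real .exe:

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Parse the DOS stub and COFF header from raw PE file bytes.

    Offsets follow the published PE/COFF specification; only a few
    representative fields are extracted here.
    """
    if data[:2] != b"MZ":
        raise ValueError("Not a DOS/PE executable (missing MZ magic)")
    # e_lfanew at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("Missing PE signature")
    # The COFF file header immediately follows the 4-byte signature.
    machine, nsections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {"machine": machine, "sections": nsections, "timestamp": timestamp}

# Build a minimal synthetic header just to demonstrate (not a runnable .exe):
fake = bytearray(0x60)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)            # e_lfanew -> 0x40
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HHI", fake, 0x44, 0x8664, 3, 0)  # x86-64, 3 sections
print(parse_pe_headers(bytes(fake)))
```

With a real file you would simply pass `open('file.exe', 'rb').read()` to the function instead.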
I'm looking for a way to convert excel to html while preserving formatting.
I know this is doable on Windows due to the availability of some underlying Win32 libraries (e.g. via xlwings; see Python - Excel to HTML (keeping format)). But I'm looking for a solution on Linux.
I've also come across Aspose Cells, but it requires a paid license, or else it adds a lot of extra junk to the output that needs to be scrubbed out.
And lastly I tried the Python library xlsx2html, but it does a very poor job of preserving formatting.
Are there any suggestions for a Linux based solution? I'd also be interested in tools written in other languages that can be easily wrapped around via python.
Thanks in advance!
Update:
Here is an example of a random Excel sheet I converted via Excel itself that I would like to reproduce. It has some colors, some border variations, some merged cells, and some font sizes to see whether they all work.
You can use LibreOffice to convert an Excel file to a HTML file using the command line:
# --convert-to implies --headless so it's not mandatory to specify --headless
soffice --headless --convert-to html data.xlsx
You can refer to the documentation to know more about other CLI parameters.
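If you want to drive that same conversion from Python, a small subprocess wrapper could look like the sketch below. The data.xlsx filename is just a placeholder, and the real conversion is only attempted when soffice is actually on the PATH and the input file exists:

```python
import os
import shutil
import subprocess

def libreoffice_html_command(xlsx_path, outdir="."):
    """Build the soffice argument list for an xlsx -> html conversion.

    This mirrors the CLI call above; --headless is implied by
    --convert-to but is kept here for clarity.
    """
    return ["soffice", "--headless", "--convert-to", "html",
            "--outdir", outdir, xlsx_path]

cmd = libreoffice_html_command("data.xlsx", "out")
print(" ".join(cmd))

# Only attempt the real conversion when LibreOffice and the input exist:
if shutil.which("soffice") and os.path.exists("data.xlsx"):
    subprocess.run(cmd, check=True)
```

The output HTML lands in the directory given by --outdir, named after the input workbook.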
I think you should search for Excel-to-HTML solutions in the JS world rather than Python (I'm not saying it's impossible in Python, but it's more common in JS); I promise you will get better results.
In my opinion, finding a JS-based solution and writing a Python wrapper around it would be more helpful, because the JS community has struggled more than other communities to import and work with Excel files.
Another idea is to change your approach: look for how you can embed an Excel file in an iframe inside an HTML page with JS, and then export it.
But again, I highly recommend checking JS libraries or GitHub repositories; some of them care about formatting.
There's great stuff out there for handling Excel files from Python, and I think I'm just falling into a funny little crack: I need to write out a multi-worksheet workbook in the Excel 2003 XML format using pure Python (not win32com or VBA or something). Just like the poster here, I'm taking nasty proprietary files and having to spit them out in precisely the same nasty proprietary way, or else the nasty proprietary software won't take them back. I'm manipulating the data along the way, so this isn't just a format conversion; I need to be in Python to do real work on the files, and then write them back out in the same format they came in. A simpler version of the question was asked here but not directly answered.
The xlsxwriter docs have a nice summary of the current state of the art, which agrees with my own Googling: xlwt will handle the old non-XML formats, openpyxl specifically does Excel 2010 formats, xlsxwriter itself is for 2007+, pythonOffice hasn't been touched since 2012.
Please tell me I don't have to parse everything manually with BeautifulSoup or something to get back to Excel 2003! I can use Python 2, or 3, or both, if needed. Thanks. These are the relevant bits of namespace:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
...
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
I've also been dealing with similarly annoying proprietary files. After doing a lot of digging through all of the same Python Excel extensions, I've also come to the conclusion that yes, you will have to handle the XML manually.
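For what it's worth, emitting the format manually is not too painful with just the standard library. Below is a minimal sketch that produces a SpreadsheetML workbook string using the namespaces quoted in the question; the worksheet name and cell values are illustrative, only String/Number cell types are handled, and real files will likely need the extra DocumentProperties/ExcelWorkbook elements your proprietary software expects:

```python
from xml.sax.saxutils import escape

def workbook_xml(sheets):
    """Emit a minimal Excel 2003 SpreadsheetML workbook.

    `sheets` maps worksheet names to lists of rows (lists of cell values).
    Only String and Number cell types are handled in this sketch.
    """
    parts = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
        ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
    ]
    for name, rows in sheets.items():
        parts.append(' <Worksheet ss:Name="%s">' % escape(name, {'"': '&quot;'}))
        parts.append('  <Table>')
        for row in rows:
            parts.append('   <Row>')
            for value in row:
                dtype = "Number" if isinstance(value, (int, float)) else "String"
                parts.append('    <Cell><Data ss:Type="%s">%s</Data></Cell>'
                             % (dtype, escape(str(value))))
            parts.append('   </Row>')
        parts.append('  </Table>')
        parts.append(' </Worksheet>')
    parts.append('</Workbook>')
    return "\n".join(parts)

doc = workbook_xml({"Sheet1": [["Name", "Qty"], ["widget", 3]]})
print(doc)
```

Round-tripping the proprietary extras is then mostly a matter of copying those elements through verbatim from the input file.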
I had the same problem, but found an approach that I think is both:
Faster to implement
Easier to modify and evolve over time (especially if your CSV changes, e.g. new columns get added)
I had to convert a regular CSV to .xml in SpreadsheetML format (XML Spreadsheet 2003), and I found a nice tutorial on how to do it several ways. For Python 3, I chose ffe (Flat File Extractor).
The essential:
To use ffe, you must run it in a Linux environment; install it with, for example, sudo apt-get install ffe. (FYI: there is also a binary for Windows.)
You need to create a .fferc configuration file in a specific format, which acts as an XML template (see the article or documentation links provided).
You can then convert your input CSV file into an output XML file with a shell command such as: ffe -o output.xml -c csv2xml.fferc input.csv
If you want to prototype quickly, I succeeded in doing this in a Google Colab notebook; you can install ffe there using the sudo command above.
Happy coding!
Link to the full article: Converting CSV to XML on the Ubuntu Community Wiki.
Problem
On the Mac OS X platform, I would like to write a script, either in Python or Tcl to search for text within a PDF file and extract the relevant parts. I appreciate any help.
Background
I am writing scripts to look inside a PDF to determine whether it is a bill, from what company, and for what period. Based on this information, I rename the PDF and move it to an appropriate directory. For example, a file such as Statement_03948293929384.pdf might become 2012-07-15 Water Bill.pdf and be moved to my Utilities folder.
What have I done so far?
I have searched for PDF-to-plain-text tools, but haven't found anything yet
I have looked into the Tcl wiki and found an example, but could not get it to work (it did not find text that I knew was in the PDF).
I am looking into pdf-parser.py by Didier Stevens
I heard of a Python package called pyPdf and will look at it next.
Update
I have found a command-line tool called pdftotext, written by Glyph & Cog, LLC, and built and packaged by Carsten Bluem. This tool is straightforward and solves my problem. I am still on the lookout for tools that can search a PDF directly, without having to convert it to a text file first.
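Once pdftotext (or any other extractor) has produced plain text, the classify-and-rename step can be sketched with stdlib regexes alone. The company keywords and the date pattern below are purely hypothetical; they would need to match whatever your real bills actually contain:

```python
import re

# Hypothetical matching rules -- adjust the keywords and the date
# pattern to whatever your own bills actually say.
COMPANIES = {
    "Water Bill": re.compile(r"city water|water utility", re.I),
    "Power Bill": re.compile(r"electric|power company", re.I),
}
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def bill_filename(text):
    """Return a '<date> <kind>.pdf' name for recognised bill text, else None."""
    for kind, pattern in COMPANIES.items():
        if pattern.search(text):
            m = DATE.search(text)
            if m:
                return "%s %s.pdf" % (m.group(0), kind)
    return None

sample = "City Water Utility\nStatement date: 2012-07-15\nAmount due: $42.10"
print(bill_filename(sample))  # -> 2012-07-15 Water Bill.pdf
```

From there, os.rename (or shutil.move) into the right folder completes the workflow described in the question.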
I have successfully used PyODConverter to convert to/from PDFs (there is also a more powerful Java version). Once you have the PDF converted to text it should be trivial to do the searching. Also I believe iText should be capable of doing similar things, but I haven't tested it.
I would like to compare the data of two files and store the report in another file. I tried using WinMerge by invoking cmd.exe with the subprocess module in Python 3.2. I was able to get the difference report but wasn't able to save it. Is there a way, with WinMerge or with any other comparison tool (DiffMerge/KDiff3), to save the difference report using cmd.exe on Windows 7? Please help.
Though your question is quite old, I wonder why it wasn't answered yet. I was searching for an answer myself and, funnily enough, found your question. You packed quite a lot of questions into one post, so I decided to answer the main headline, where I suppose you are trying to compare human-readable file contents.
To compare two files, there is a difflib library which is part of the Python distribution.
By the way an example how to generate a utility to compare files can be found on Python's documentation website.
The link is here: Helpers for computing deltas
From there you can learn to create an option and save the deltas to, e.g., a text file. Some of these examples also produce git-diff-like output, which may help you solve your question.
This means that if you can run your own script, other delta tools are not required. It doesn't make much sense to call other tools from Python via CMD and try to control them... :)
Maybe also this Website with explanations and code examples may help you:
difflib – Compare sequences
I hope that helps you a bit.
EDIT: I forgot to mention, that the last site contains a straightforward example how to generate an HTML output:
HTML Output
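To make that concrete, here is a minimal difflib sketch that saves both a plain-text unified diff and a side-by-side HTML report; the file names and sample lines are placeholders:

```python
import difflib

old = ["red", "green", "blue"]
new = ["red", "teal", "blue", "cyan"]

# Plain-text report in the familiar unified-diff format:
report = "\n".join(difflib.unified_diff(old, new,
                                        fromfile="a.txt", tofile="b.txt",
                                        lineterm=""))
with open("report.txt", "w") as f:
    f.write(report)

# Self-contained HTML report with side-by-side highlighting:
html = difflib.HtmlDiff().make_file(old, new, "a.txt", "b.txt")
with open("report.html", "w") as f:
    f.write(html)

print(report)
```

In a real script you would read the two lists of lines with open(...).readlines() instead of hard-coding them.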
How can I convert a .csv file into .dbf file using a python script? I found this piece of code online but I'm not certain how reliable it is. Are there any modules out there that have this functionality?
Using the dbf package you can create a basic table from a csv file with code similar to this:
import dbf
some_table = dbf.from_csv(csvfile='/path/to/file.csv', to_disk=True)
This will create a table with the same name, with either Character or Memo fields, and field names of f0, f1, f2, etc.
For a different filename use the filename parameter, and if you know your field names you can also use the field_names parameter.
some_table = dbf.from_csv(csvfile='data.csv', filename='mytable',
field_names='name age birth'.split())
Rather basic documentation is available here.
Disclosure: I am the author of this package.
You won't find anything on the net that reads a CSV file and writes a DBF file such that you can just invoke it and supply 2 file-paths. For each DBF field you need to specify the type, size, and (if relevant) number of decimal places.
Some questions:
What software is going to consume the output DBF file?
There is no such thing as "the" (one and only) DBF file format. Do you need dBase III ? dBase 4? 7? Visual FoxPro? etc?
What is the maximum length of text field that you need to write? Do you have non-ASCII text?
Which version of Python?
If your requirements are minimal (dBase III format, no non-ASCII text, text <= 254 bytes long, Python 2.X), then the cookbook recipe that you quoted should do the job.
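For that minimal dBase III case, the whole file can in fact be written with the stdlib struct module alone. Below is a sketch of the layout (a 32-byte header, one 32-byte descriptor per field, a 0x0D terminator, then fixed-width ASCII records flagged with a leading space, and a 0x1A end-of-file byte); the field definitions and data are illustrative, and memo fields, codepages, and index files are deliberately out of scope:

```python
import datetime
import struct

def write_dbf(path, fields, rows):
    """Write a minimal dBase III table (no memo file, ASCII only).

    `fields` is a list of (name, type, length, decimals) tuples, e.g.
    ("NAME", "C", 11, 0); `rows` is a list of matching value tuples.
    """
    today = datetime.date.today()
    record_size = 1 + sum(f[2] for f in fields)   # +1 for the deletion flag
    header_size = 32 + 32 * len(fields) + 1       # header + descriptors + 0x0D
    out = bytearray()
    # Fixed 32-byte header: version, last-update date, counts and sizes.
    out += struct.pack("<B3BIHH20x", 0x03,
                       today.year - 1900, today.month, today.day,
                       len(rows), header_size, record_size)
    # One 32-byte descriptor per field: name, type, length, decimal count.
    for name, ftype, length, decimals in fields:
        out += struct.pack("<11sc4xBB14x", name.encode("ascii"),
                           ftype.encode("ascii"), length, decimals)
    out += b"\x0d"                                # end of field descriptors
    for row in rows:
        out += b" "                               # record-not-deleted flag
        for (name, ftype, length, decimals), value in zip(fields, row):
            if ftype == "N":                      # numbers are right-justified
                text = ("%*.*f" % (length, decimals, value)) if decimals \
                       else ("%*d" % (length, value))
            else:                                 # text is left-justified
                text = "%-*s" % (length, value)
            out += text[:length].encode("ascii")
    out += b"\x1a"                                # end-of-file marker
    with open(path, "wb") as f:
        f.write(bytes(out))
    return len(out)

size = write_dbf("people.dbf",
                 [("NAME", "C", 11, 0), ("AGE", "N", 3, 0)],
                 [("Alice", 30), ("Bob", 7)])
print(size)
```

Pairing this with csv.reader on the input side gives you the whole CSV-to-DBF pipeline in pure Python, provided you can answer the field-definition questions above up front.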
Use the csv library to read your data from the csv file. The third-party dbf library can write a dbf file for you.
Edit: Originally, I listed dbfpy, but the library above seems to be more actively updated.
None that are well-polished, to my knowledge. I have had to work with xBase files many times over the years, and I keep finding myself writing code to do it when I have to do it. I have, somewhere in one of my backups, a pretty functional, pure-Python library to do it, but I don't know precisely where that is.
Fortunately, the xBase file format isn't all that complex. You can find the specification on the Internet, of course. At a glance the module that you linked to looks fine, but of course make copies of any data that you are working with before using it.
A solid, read/write, fully functional xBase library with all the bells and whistles is something that has been on my TODO list for a while... I might even get to it in what is left this year, if I'm lucky... (probably not, though, sadly).
I have created a Python script here; it should be customizable for any CSV layout. You do need to know your DBF data structure before this will be possible. The script requires two CSV files: one for your DBF header setup and one for your body data. Good luck.
https://github.com/mikebrennan/csv2dbf_python