Comparing two files and saving the report in another file - Python

I would like to compare the data of two files and store the report in another file. I tried using WinMerge by invoking cmd.exe with the subprocess module in Python 3.2. I was able to get the difference report, but I wasn't able to save it. Is there a way, with WinMerge or any other comparison tool (DiffMerge/KDiff3), to save the difference report using cmd.exe on Windows 7? Please help.

Though your question is quite old, I wonder why it wasn't answered yet. I was searching for an answer myself and, funnily enough, found your question. You mix quite a lot of questions into one post, so I decided to answer the main headline, where I suppose you are trying to compare human-readable file contents.
To compare two files, there is the difflib library, which is part of the Python standard distribution.
By the way, an example of how to build a file-comparison utility can be found on Python's documentation website.
The link is here: Helpers for computing deltas
From there you can learn to add options and save the deltas to, e.g., a text file. Some of those examples also produce a git-diff-like output, which may help you solve your question.
This means that if you are able to run such a script, other delta tools are not required. It makes little sense to call other tools from Python via CMD and try to control them... :)
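To give you a concrete starting point, here is a minimal sketch (the file names and the report path are just placeholders) that writes a unified diff of two text files into a report file with difflib:

import difflib

# Read both input files as lists of lines (placeholder file names).
with open("file1.txt") as f1, open("file2.txt") as f2:
    left = f1.readlines()
    right = f2.readlines()

# Generate a unified diff and write it straight into a report file.
diff = difflib.unified_diff(left, right, fromfile="file1.txt", tofile="file2.txt")
with open("diff_report.txt", "w") as report:
    report.writelines(diff)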
This website, with explanations and code examples, may also help you:
difflib – Compare sequences
I hope that helps you a bit.
EDIT: I forgot to mention that the last site also contains a straightforward example of how to generate HTML output:
HTML Output
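For completeness, here is a minimal sketch of that HTML variant (again with placeholder file names), using difflib.HtmlDiff to write a side-by-side comparison into an HTML report:

import difflib

with open("file1.txt") as f1, open("file2.txt") as f2:
    left = f1.readlines()
    right = f2.readlines()

# make_file() returns a complete HTML page with a side-by-side diff table.
html = difflib.HtmlDiff().make_file(left, right, "file1.txt", "file2.txt")
with open("diff_report.html", "w") as report:
    report.write(html)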

Related

Best way to get text of a pdf file?

How can I read a PDF in Python? I know one way of converting it to text, but I want to read the content directly from the PDF.
Can anyone explain which Python module is best for PDF extraction?
I tried the PyPDF2 package, but it gives me inconsistent results. I would also like a way to get the tables and the images, and to remove the headers and the footers at least somewhat consistently; it doesn't need to work 100% of the time. Thanks for your answers, I just need to find the right library!
From another post that asked pretty much the same thing:
The answer depends on whether the question is general or specific to a single form. Your approach is reasonable for the general case, but there will be variability. If you have a PDF that is a single form or report created with different data at each iteration, consider converting it from PDF to PostScript and then see if you can parse the PostScript.
Two utilities do this: pdf2ps and pdftops. Try each. This approach may pay off if you know some PostScript; with some luck the needed fields may be simple text strings. Worth a try.
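If you want to drive that conversion from Python, a minimal sketch might look like the following. The file names and the search string are placeholders, and it assumes the pdftops utility (from Poppler) is installed and on your PATH:

import subprocess

# Convert the PDF to PostScript with the external pdftops utility.
subprocess.run(["pdftops", "report.pdf", "report.ps"], check=True)

# PostScript is plain text, so simple fields may show up as readable strings.
with open("report.ps", errors="ignore") as ps:
    for line in ps:
        if "Invoice" in line:  # hypothetical marker for a field of interest
            print(line.strip())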

TexSoup for Bib-Files

This is my first question, so I will try to do everything as properly as possible.
I am currently using LaTeX to write my documents at my university because I want to use the powerful citing capabilities provided by BibTeX. For ease of use, I am writing scripts that make it easier to pull my .bib files into my .tex files and to manage my .bib files. As I am using Arch Linux, I did this in Bash, but it is a little clunky. Therefore I wanted to switch to Python, as I came across the TexSoup library for Python.
My issue is that I cannot find resources on using TexSoup for .bib files; I can only find resources on .tex files. Does anybody know if, and if yes how, I can use TexSoup to find books, articles or other entries in my .bib files with Python (or the TexSoup library)?
with open("bib_complete.bib") as f:
soup = TexSoup(f)
print(soup)
This is the code sample I am trying to use, but I don't know how to look for entry names or entry types with the package. I would really appreciate it if someone could point me to good resources, if they exist.
I hope my writing was comprehensive enough and not too long.
Thanks everybody!

Bentley ProjectWise for Data Retrieval

This is my first post on Stack Overflow.
I'm looking to gather a large amount of data from a multitude of files on PW so I can quantify a few things about the records.
The directories I'm working with have unique numbers and offer files that are all similar to files in other folders.
Is there a library from python I can use or any other useful tips for taking on this task?
It could potentially save many hours of work if I can do this with code.
A pseudocode example may look like this:
for element in dataField:
    search(folder)
    if folder found:
        search(file)
        if file found:
            extract certain data from file X
            extractedData.append(data)
Thank you,
R
Based on a quick web search for "projectwise api", there is a web-based REST API available, so you'll definitely want to look into that. You'll need to read the docs carefully to figure out which endpoint does what, but once you know what information you need to send and what kind of data you'll receive, programming a basic Python interface shouldn't be too difficult. One may already exist; I didn't look too hard.
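As a very rough illustration of what such an interface could look like, here is a minimal sketch using the requests library. The base URL, endpoint path, folder number and authentication scheme are hypothetical placeholders; the real values have to come from the ProjectWise API documentation:

import requests

BASE_URL = "https://example.com/projectwise/api"  # hypothetical server URL
session = requests.Session()
session.auth = ("username", "password")  # replace with the documented auth scheme

# Hypothetical endpoint: list the documents in one folder, then pull fields from each record.
resp = session.get(f"{BASE_URL}/folders/12345/documents")
resp.raise_for_status()

extracted_data = []
for doc in resp.json():
    extracted_data.append({"name": doc.get("name"), "size": doc.get("size")})

print(extracted_data)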

Python pefile member question

Gang,
I apologize if this is a really dumb question... I want to use the very convenient Python module pefile (http://code.google.com/p/pefile/), which parses an executable and lists particular information about the PE structure. My question is: where can I find information about how to access particular members of the executable? I've scoured the wiki and read the usage examples, but that documentation only covers 4-5 members. What I am wondering is whether there is a list of members I can access to display the information I care about. Specifically, if I wanted to list the stack commit size of an executable, would it look like this: pe.FILE_HEADER.StackCommitSize? Obviously I can run the code and figure it out, but have you seen an API doc floating around where I can find the members I need?
THANKS!
From the PE docstring:
Basic headers information will be available in the attributes:
DOS_HEADER
NT_HEADERS
FILE_HEADER
OPTIONAL_HEADER
All of them will contain among their attributes the members of the
corresponding structures as defined in WINNT.H
So, look at winnt.h and you'll see which attributes are available.
Or just read the source code for the module. It's big, but everything you need to know is in there.
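For example, a minimal sketch (the path is just an example) that prints a few of those WINNT.H-named members:

import pefile

pe = pefile.PE("C:\\Windows\\Notepad.exe")

# Attribute names mirror the structure members defined in WINNT.H.
print(hex(pe.DOS_HEADER.e_magic))       # the "MZ" signature
print(hex(pe.FILE_HEADER.Machine))
print(pe.FILE_HEADER.NumberOfSections)
print(hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint))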
You can generally find everything that is in a PE file in the Microsoft PE/COFF specification.
Once you've looked there, you know that the StackCommitSize is in the optional image header. Then all you have to do is to look for the corresponding structure in pefile, which usually bears a similar name, if not indeed the very same name. In this case:
import pefile

pe = pefile.PE("C:\\Windows\\Notepad.exe")
print(pe.OPTIONAL_HEADER.SizeOfStackCommit)
This will give you what you want.
If you have trouble finding SizeOfStackCommit (after you've found it in the specification), just use a quick find on the source code. It's about as easy to read as code gets, and I don't think you'll have any trouble finding the required structure.
Now, there probably aren't any API docs for pefile itself, but as you can see there's really no need for them, since it's just a nice Pythonic wrapper around the PE specification itself.
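If you simply want to see every parsed member at once, pefile can also dump its whole parse as text, which is a quick way to discover attribute names (a short sketch; check the source if the method name differs in your version):

import pefile

pe = pefile.PE("C:\\Windows\\Notepad.exe")
print(pe.dump_info())  # textual dump of every parsed header and its members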

Shapefile/ArcInfo to SVG? Preferably in Python

I'm interested in taking these census cartographic files and converting them into SVG files. So far I've found this shptosvg Perl script, but I'd really prefer to do any coding or data wrangling in Python.
Also, I know shpUtils.py can be used for parsing .shp files in Python, but I'm unaware how to take that output and create SVG paths.
Anyway, I'd definitely be interested in any advice you have or any modules you know of.
Late response, but here is exactly what you want, in Python with a wonderful API:
https://github.com/kartograph/kartograph.py
As comments have noted it was previously available at https://github.com/svgmap/svgmap.py
The svgmap link was broken for me on GitHub, but kartograph.py works for ESRI .shp files.
Not python, but you may be interested in these links:
http://egb13.net/2009/07/shapefile-to-svg-translator-project/
http://www.carto.net/svg/utils/shp2svg/
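If you'd rather roll your own conversion, here is a minimal sketch of the manual approach. It uses the pyshp library (rather than shpUtils.py), assumes single-part shapes, and maps coordinates straight into SVG path strings with no projection or y-axis flipping, which a real map would still need:

import shapefile  # the pyshp package

sf = shapefile.Reader("tl_2010_us_state10")  # placeholder shapefile name

paths = []
for shape in sf.shapes():
    # Build an SVG path string from the shape's coordinate list.
    points = " L ".join(f"{x},{y}" for x, y in shape.points)
    paths.append(f'<path d="M {points} Z" fill="none" stroke="black"/>')

svg = '<svg xmlns="http://www.w3.org/2000/svg">\n' + "\n".join(paths) + "\n</svg>"
with open("output.svg", "w") as out:
    out.write(svg)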
