How do I upload a graphviz (.dot) file to Neo4j? - python

I have a large number of Graphviz files that I need to convert to Neo4j. At first blush, it looks like it should be easy enough to read it as a text file and convert to cypher but I am hoping that one of the python graphviz libraries would make it easier to "parse" the input, or that someone is aware of a prebuilt library.
Is anyone aware of, or has anyone already built, a parser for this conversion? Partial examples are fine. Thanks.

You can probably hack this together pretty easily using NetworkX. They implement a read_dot to read in the graphviz format, then I'm sure you can use one of their graph exporters to dump that back into a format that neo4j can use. For example, here's a package that attempts to simplify that export process (disclaimer: I've never tried this package, it just showed up in Google).
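A minimal sketch of that route, assuming networkx and pydot are installed for the loading step. The helper names (`load_dot`, `to_cypher`), the `Node` label, the `LINKS_TO` relationship type and the `name` property are all made up for illustration, and every attribute value is stored as a string. Using `json.dumps` for quoting is a shortcut that happens to line up with Cypher's double-quoted string literals; it is not an official escaping routine.

```python
import json


def cypher_literal(value):
    """Render a value as a double-quoted string literal (stringified first)."""
    return json.dumps(str(value))


def cypher_map(attrs):
    """Render a dict as a Cypher map with backtick-quoted keys."""
    body = ", ".join(
        f"`{key}`: {cypher_literal(val)}" for key, val in sorted(attrs.items())
    )
    return "{" + body + "}"


def to_cypher(nodes, edges, label="Node", rel_type="LINKS_TO"):
    """Turn (name, attrs) nodes and (source, target, attrs) edges into Cypher."""
    statements = []
    for name, attrs in nodes:
        statements.append(
            f"MERGE (n:{label} {{name: {cypher_literal(name)}}}) "
            f"SET n += {cypher_map(attrs)};"
        )
    for source, target, attrs in edges:
        statements.append(
            f"MATCH (a:{label} {{name: {cypher_literal(source)}}}), "
            f"(b:{label} {{name: {cypher_literal(target)}}}) "
            f"MERGE (a)-[r:{rel_type}]->(b) SET r += {cypher_map(attrs)};"
        )
    return statements


def load_dot(path):
    """Read a .dot file with NetworkX (needs networkx and pydot installed)."""
    from networkx.drawing.nx_pydot import read_dot

    graph = read_dot(path)
    return list(graph.nodes(data=True)), list(graph.edges(data=True))


if __name__ == "__main__":
    # Hand-written data standing in for the result of load_dot("some.dot").
    demo_nodes = [("a", {"color": "red"}), ("b", {})]
    demo_edges = [("a", "b", {"weight": 2})]
    for statement in to_cypher(demo_nodes, demo_edges):
        print(statement)
```

The generated statements could then be run through cypher-shell or a Neo4j driver. For untrusted input, parameterized queries would be safer than building strings like this. Note that parallel edges in the .dot file collapse into one relationship with this MERGE pattern.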

Related

Search office documents for string python

I've been looking for a fast and relatively easy way of searching (grep-ish) for user-defined strings in files of varying formats, e.g. xlsx, docx, pptx, and pdf, using Python.
My research has led me to believe that there might not be a convenient way of doing this, such as a single module. Am I forced to use a separate module for each file type? And if so, are these appropriate?
docx
openpyxl
pptx
slate
I also looked at forms of decompression to get to the xml-files containing actual text but it seems unwieldy. I just want to be sure that there is no simple, uniform way of handling all of these different filetypes.
Well, I've mostly figured it out. In the end I decided to use PowerShell combined with "itextsharp.dll" to process the files. It turned out to be simpler than using portable Python. Thanks for the answers :-)
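For the three Office formats (docx, xlsx, pptx), the decompression route mentioned in the question can be sketched with the standard library alone, since those files are zip archives of XML parts. This is a rough sketch, not a full text extractor: the function name `ooxml_contains` is made up, tags are stripped with a regular expression rather than a real XML parser, and PDF is not covered at all.

```python
import html
import re
import zipfile

TAG = re.compile(r"<[^>]+>")


def ooxml_contains(path_or_file, needle):
    """Return True if any XML part of a docx/xlsx/pptx file contains needle.

    Tags are removed without inserting spaces so that a word split across
    several formatting runs still matches. The flip side is that text from
    adjacent paragraphs or cells runs together.
    """
    needle = needle.lower()
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            raw = archive.read(name).decode("utf-8", errors="ignore")
            text = html.unescape(TAG.sub("", raw))
            if needle in text.lower():
                return True
    return False
```

Usage would look like `ooxml_contains("report.docx", "quarterly")`, where the file name is only a placeholder.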

Python Library - json to json transformations

Does anyone know of a python library to convert JSON to JSON in an XSLT/Velocity template style?
JSON + transformation template = JSON (New)
Thanks!
Sorry if it's old, but you can use this module https://github.com/Onyo/jsonbender
Basically, it transforms a dict into another dict using a mapping. You can load the JSON into a dict, transform it into another dict, and then dump that back to JSON.
I did not find the transformer libraries suitable for my needs and spent a couple of days trying to create my own. Then I realized that creating a transformation schema is more difficult than writing native Python code that transforms one JSON-like Python object into another.
I understand that this is not an answer to the original question, and that my approach has certain limitations. For example, it wouldn't work if you need to generate documentation.
But if you just need to transform json-like objects consider the possibility to just write python code that does it. Chances are that the code would be cleaner and easier to understand than transformation schema description.
I wish I had considered this approach more seriously a couple of days ago.
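A small sketch of what the plain-Python approach can look like, using only the standard library. The input and output shapes (`customer`, `items`, `fullName`, and so on) are invented for the example and do not come from any of the libraries mentioned here.

```python
import json


def transform(order):
    """Map a source order document onto the target shape."""
    customer = order.get("customer", {})
    items = order.get("items", [])
    full_name = " ".join(
        part
        for part in (customer.get("first_name"), customer.get("last_name"))
        if part
    )
    return {
        "fullName": full_name,
        "itemCount": len(items),
        "total": round(sum(item["price"] * item["qty"] for item in items), 2),
    }


def transform_json(text):
    """JSON string in, JSON string out."""
    return json.dumps(transform(json.loads(text)), sort_keys=True)


if __name__ == "__main__":
    source = json.dumps(
        {
            "customer": {"first_name": "Ada", "last_name": "Lovelace"},
            "items": [{"price": 2.5, "qty": 2}, {"price": 1.0, "qty": 1}],
        }
    )
    print(transform_json(source))
```

The transformation is an ordinary function, so it can be unit-tested and debugged like any other code, which is the main argument for this approach over a template language.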
I found pyjq library very magical, you can feed it a template and json file and it will map it out for you.
https://pypi.org/project/pyjq/
The only annoying thing about it was the dependencies I had to install: it worked perfectly on my local machine, but the dependencies failed to build when I tried to package it for AWS Lambda.

Is there a reliable python library for taking a BibTex entry and outputting it into specific formats?

I'm developing using Python and Django for a website. I want to take a BibTex entry and output it in a view in 3 different formats, MLA, APA, and Chicago. Is there a library out there that already does this or am I going to have to manually do the string formatting?
There are the following projects:
BibtexParser
Pybtex
Pybliographer
BabyBib
If you need complex parsing and output, Pybtex is recommended. Example:
>>> from pybtex.database.input import bibtex
>>> parser = bibtex.Parser()
>>> bib_data = parser.parse_file('examples/foo.bib')
>>> list(bib_data.entries.keys())
['ruckenstein-diffusion', 'viktorov-metodoj', 'test-inbook', 'test-booklet']
>>> print(bib_data.entries['ruckenstein-diffusion'].fields['title'])
Predicting the Diffusion Coefficient in Supercritical Fluids
Good luck.
Having tried them, all of these projects are bad, for various reasons: terrible APIs, bad documentation, and a failure to parse valid BibTeX files. The implementation you want doesn't show up in most Google searches, from my own searching: it's biblib. This text from the README should sell it:
There are a lot of BibTeX parsers out there. Most of them are complete nonsense based on some imaginary grammar made up by the module's author that is almost, but not quite, entirely unlike BibTeX's actual grammar. BibTeX has a grammar. It's even pretty simple, though it's probably not what you think it is. The hardest part of BibTeX's grammar is that it's only written down in one place: the BibTeX source code.
The accepted answer of using pybtex is fraught with danger as Pybtex does not preserve the bibtex format of even simple bibtex files. (https://bitbucket.org/pybtex-devs/pybtex/issues/130/need-to-specially-represent-bibtex-markup)
Pybtex is therefore losing bibtex information when reading and re-writing a simple .bib file without making any changes. Users should be very careful following the recommendations to use pybtex.
I will try biblib as well and report back but the accepted answer should be edited to not recommend pybtex.
Edit:
I was able to import the data using Bibtex Parser, without any loss of data. However, I had to compile from https://github.com/sciunto-org/python-bibtexparser as the version installed via pip was bugged at the time. Users should verify that pip is getting the latest version.
As for exporting, once the data has been imported via BibTex Parser, it's in a dictionary, and can be exported as the user desires. BibTex Parser does not have built in functions for exporting in common formats. As I did not need this functionality, I didn't specifically test it. However, once imported into a dictionary, the string output can be converted to any citation format rather easily.
Here, pybtex and a custom style file can help. I used the style file provided by the journal and compiled in LaTeX instead, but Pybtex has Python style files (and also allows ingesting BibTeX .bst files). So I would recommend taking the Bibtex Parser input and transferring it to Pybtex (or similar) for outputting in a certain style.
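To illustrate the "it's in a dictionary" point above, here is a rough sketch of formatting one parsed entry by hand. It assumes the entry is a plain dict with `author`, `title`, `year` and `journal` keys, which is roughly the shape such parsers hand back. The function names and the sample entry are made up, and the output is only an APA-like approximation for a journal article, not a complete implementation of any citation style.

```python
def split_authors(author_field):
    """Split a BibTeX author field into (last, first) pairs.

    Handles both "Last, First" and "First Last", separated by " and ".
    """
    people = []
    for raw in author_field.split(" and "):
        raw = raw.strip()
        if "," in raw:
            last, first = (part.strip() for part in raw.split(",", 1))
        else:
            *first_parts, last = raw.split()
            first = " ".join(first_parts)
        people.append((last, first))
    return people


def initials(first):
    """Reduce given names to initials: "John Q." becomes "J. Q."."""
    return " ".join(f"{name[0]}." for name in first.split())


def apa_like(entry):
    """Format a journal-article entry in an APA-like layout."""
    names = [
        f"{last}, {initials(first)}" if first else last
        for last, first in split_authors(entry["author"])
    ]
    if len(names) > 1:
        authors = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        authors = names[0]
    return f"{authors} ({entry['year']}). {entry['title']}. {entry['journal']}."


if __name__ == "__main__":
    # Hypothetical entry, not taken from a real .bib file.
    sample = {
        "author": "Doe, Jane and Smith, John Q.",
        "title": "An Example Title",
        "year": "2020",
        "journal": "Journal of Examples",
    }
    print(apa_like(sample))
```

Real styles have many more rules (title casing, volume and page ranges, "et al." thresholds, other entry types), which is where a style engine such as the one in Pybtex becomes worthwhile.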
The closest thing I know of is the pybtex package.

Stringing together C and Python code in Go?

Update
I'm trying to create a simple Go function which will simply take in a string of reddit-style Markdown and return the appropriate HTML.
Right now, I know that having Discount installed is a prerequisite and that at least the following three files are used by reddit as wrappers around Discount:
https://github.com/reddit/reddit/blob/master/r2/r2/lib/c/reddit-discount-wrapper.c
https://github.com/reddit/reddit/blob/master/r2/r2/lib/c_markdown.py
https://github.com/reddit/reddit/blob/master/r2/r2/lib/py_markdown.py
Based on this, does anyone know how I can sort of glue all this together with Cgo and go-python to create a simple Markdown function? (independent of the rest of the reddit source code)
If all you want is Markdown, I don't see how Python fits into this. Maybe there's more to it, but if at all possible you should leave Python out of this. If there's a reason to use Python that wasn't in the question, I can edit this answer and address that.
First, try this native Go Markdown package: https://github.com/knieriem/markdown
If that doesn't work, the next easiest thing is to take Discount (or any other Markdown library written in C, such as GitHub's Upskirt fork) and wrap it with cgo or SWIG.

What's a good document standard to use programmatically?

I'm writing a program that requires input in the form of a document, it needs to replace a few values, insert a table, and convert it to PDF. It's written in Python + Qt (PyQt). Is there any well known document standard which can be easily used programmatically? It must be cross platform, and preferably open.
I have looked into Microsoft Doc and Docx, which are binary formats and I can't edit them. Python has bindings for it, but they're only on Windows.
Open Office's ODT/ODF is XML inside a zip archive, so I can edit that one, but there are no command-line utilities or any other way to programmatically convert the file to a PDF. Open Office provides bindings, but you need to run Open Office from the command line, start a server, etc. And my clients may not have Open Office installed.
RTF is readable from Python, but I couldn't find any way/libraries to convert RTF documents to PDF.
At the moment I'm exporting from Microsoft Word to HTML, replacing the values and using PyQt to convert it to a PDF. However it loses formatting features and looks awful. I'm surprised there isn't a well known library which lets you edit a variety of document formats and convert them into other formats, am I missing something?
Update: Thanks for the advice, I'll have a look at using Latex.
Thanks,
Jackson
Have you looked into using LaTeX documents?
They are perfect to use programmatically (compiling documents? You gotta love that...), and you have several Python frameworks you can use, such as plasTeX and PyTex.
Exporting a LaTeX document to PDF is almost immediate.
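A minimal sketch of the LaTeX route for the "replace a few values, insert a table, convert to PDF" workflow, using only the standard library plus an installed `pdflatex` binary. The template, the `@` placeholder convention and the function names are assumptions made for this example. Values are inserted as-is, so characters that are special in LaTeX (such as `&`, `%` or `_`) would need escaping in real input.

```python
import subprocess
from pathlib import Path
from string import Template


class TexTemplate(Template):
    """string.Template with "@" placeholders, since "$" is common in LaTeX."""

    delimiter = "@"


DOCUMENT = TexTemplate(r"""\documentclass{article}
\begin{document}
Invoice for @customer

\begin{tabular}{lr}
Item & Price \\
\hline
@rows
\end{tabular}
\end{document}
""")

ROW_END = " \\\\"  # LaTeX table row terminator (two backslashes)


def render(customer, items):
    """Fill in the customer name and build one table row per (name, price)."""
    rows = "\n".join(f"{name} & {price:.2f}{ROW_END}" for name, price in items)
    return DOCUMENT.substitute(customer=customer, rows=rows)


def compile_pdf(tex_source, out_dir):
    """Write the source to out_dir and run pdflatex on it (must be on PATH)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tex_path = out_dir / "document.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        [
            "pdflatex",
            "-interaction=nonstopmode",
            "-output-directory",
            str(out_dir),
            str(tex_path),
        ],
        check=True,
    )
    return out_dir / "document.pdf"


if __name__ == "__main__":
    print(render("ACME", [("Widget", 9.5), ("Gadget", 20)]))
```

Calling `compile_pdf(render(...), "build")` would then produce `build/document.pdf`, provided a TeX distribution is installed on the machine, which is the same kind of deployment caveat the question raises about Open Office.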
Since you're already using PyQt anyway, it might be worth looking at Qt's built-in RTF processing module which looks decent. Here's the documentation on detailed content manipulation including inserting tables. Also the QPrinter module's default print-to-file format happens to be PDF.
Without knowing more about your particular needs it's hard to say if these would do what you want, but since your application already has PyQt as a dependency, seems silly to introduce any more without evaluating the functionality you've already got available.
The non-GUI parts of the Qt framework are often overlooked though.
edit: included more links.
You might want to try ReportLab. The open source version can write PDFs, and the commercial version has a lot of really nice abstractions to allow output to a variety of different formats from a single input.
I don't know what kind of audience your program has, but TeX is good and I would go with it.
Another possible choice is the Excel format, parsed with xlrd.
I've used it a couple of times and it's pretty straightforward.
An Excel file is a good choice for the following reasons:
It is a well-known format that is easy to edit
You can prepare a predefined template with constraints and tables
Another option is creating XML documents, transforming them to XSL-FO, and rendering with FOP or RenderX. If you use DocBook as the primary input, there are toolchains freely available for converting that to PDF, RTF, HTML and so forth.
It is rather quirky to use and not my idea of fun, but it does deliver and can be embedded in an application, AFAICT.
Creating docbook is very straightforward as it has a wide range of semantic tags, table support etc to give a "meaningful" markup which can be reliably formatted. The XSL stylesheets are modular and allow parts to be customized or replaced to generate your own look and feel.
It works well for relatively free flow documents with lots of text.
For fill-in-the-blanks kinds of documents, a regular reporting engine may be a better fit, or some straightforward XSL stylesheets spitting out the XSL-FO directly.
