TexSoup for .bib files - Python

This is my first question, so I will try to do everything as properly as possible.
I am currently using LaTeX to write my documents at university because I want to use the powerful citation capabilities provided by BibTeX. For ease of use, I am writing scripts that make it easier to integrate my .bib files into my .tex files and to manage the .bib files themselves. As I am using Arch Linux, I did this in bash, but it is a little clunky. Therefore I wanted to switch to Python, as I came across the TexSoup library for Python.
My issue is that I cannot find resources on using TexSoup for .bib files; I can only find resources on .tex files. Does anybody know if, and if so how, I can use TexSoup to find books, articles, or other entries in my .bib files with Python (or the TexSoup library)?
from TexSoup import TexSoup

with open("bib_complete.bib") as f:
    soup = TexSoup(f)

print(soup)
This is the code sample I am trying to use, but I don't know how to search for entry names or entry types with the package. I would really appreciate it if someone could point me to good resources, if they exist.
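For reference, a dedicated BibTeX parser such as the bibtexparser package may handle .bib files more directly than TexSoup. A minimal sketch, assuming the bibtexparser 1.x API:

import bibtexparser

with open("bib_complete.bib") as f:
    bib_database = bibtexparser.load(f)

# Each entry is a plain dict: 'ENTRYTYPE' holds e.g. 'book' or 'article',
# and 'ID' holds the citation key.
for entry in bib_database.entries:
    if entry["ENTRYTYPE"] == "book":
        print(entry["ID"], entry.get("title"))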
I hope my writing was comprehensive enough and not too long.
Thanks everybody!

Related

Parse and import a data structure in Python from Franca Interface Definition Language (.fidl) files

Is there a Python library for reading and parsing .fidl files? Ideally, I would like to represent the data structure described in the .fidl file as a Python dictionary so that it can be manipulated easily.
I was looking into pyfranca, but it seems unmaintained and lacks proper documentation even for basic things.
Honestly, I could not find much more, so any help would be appreciated.

Comparing two files and saving the report in another file

I would like to compare the data of two files and store the report in another file. I tried using WinMerge by invoking cmd.exe with the subprocess module in Python 3.2. I was able to get the difference report but wasn't able to save it. Is there a way, with WinMerge or any other comparison tool (DiffMerge/KDiff3), to save the difference report using cmd.exe on Windows 7? Please help.
Though your question is quite old, I'm surprised it hasn't been answered yet; I was searching for an answer myself and happened to find yours. You mix quite a few questions into one post, so I decided to answer the main headline, where I assume you want to compare human-readable file contents.
To compare two files, there is the difflib library, which is part of the Python standard library.
An example of how to build a file-comparison utility can be found on Python's documentation website: Helpers for computing deltas.
From there you can learn to add options and save the deltas to e.g. a text file. Some of the examples also produce git-diff-like output, which may help you solve your problem.
This means that if you can run a Python script, no other delta tools are required. It makes little sense to drive other tools from Python via cmd.exe and try to control them... :)
This website, with explanations and code examples, may also help you:
difflib – Compare sequences
I hope that helps you a bit.
EDIT: I forgot to mention that the last site contains a straightforward example of how to generate HTML output:
HTML Output
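For instance, here is a minimal sketch of the difflib approach; the filenames are placeholders for your own two input files:

import difflib

# Placeholder filenames; substitute your own files.
with open("file1.txt") as f1, open("file2.txt") as f2:
    left = f1.readlines()
    right = f2.readlines()

# Generate a unified (git-style) diff and save it as a text report.
diff = difflib.unified_diff(left, right, fromfile="file1.txt", tofile="file2.txt")
with open("report.txt", "w") as report:
    report.writelines(diff)

# Alternatively, write an HTML side-by-side report.
html = difflib.HtmlDiff().make_file(left, right, "file1.txt", "file2.txt")
with open("report.html", "w") as report:
    report.write(html)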

Extract text from Webpages with Python 3.x

I am working with Python 3.x
I want to extract text from several webpages. What is a good library that would allow me to do just that?
Thanks,
Barry.
http://www.crummy.com/software/BeautifulSoup/
and the documentation to get you started
http://www.crummy.com/software/BeautifulSoup/documentation.html
mechanize is a good library, but unfortunately it's not ready for Python 3; you could take a look at lxml.html instead.
I would suggest using Beautiful Soup; then it's just a matter of going through the returned structure for anything similar to an email address.
You could also just use urllib2 (urllib.request in Python 3) for this, but Beautiful Soup takes care of a lot of syntax issues for you.
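A minimal sketch of that approach for Python 3, assuming the bs4 package is installed; the URL is a placeholder:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Placeholder URL; substitute the page you want to scrape.
html = urlopen("http://www.example.com").read()
soup = BeautifulSoup(html, "html.parser")

# get_text() strips all markup and returns the page's visible text.
print(soup.get_text())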
You don't say what you want to do with the extracted text, and that makes a big difference in how much effort you are willing to go to in order to get it out.
If you are trying to get the body text of a web page minus all of the site-related cruft (a nontrivial task), take a look at boilerpipe. It is written in Java, but it does an amazingly good job at getting essential text out of random web pages.
One of my hobbies over the next few weeks is recreating the core logic of boilerpipe in Python. We need the functionality it provides for a project, but don't want to haul the 10-ton rock that is the JVM around with it. I'm pretty certain we will be releasing it once it is fairly stable.

Shapefile/ArcInfo to SVG? Preferably in Python

I'm interested in taking these census cartographic files and converting them into SVG files. So far I've found this shptosvg Perl script, but I'd really prefer to do any coding or data wrangling in Python.
Also, I know shpUtils.py can be used for parsing .shp files in Python, but I'm unaware how to take that output and create SVG paths.
Anyways, I'd definitely be interested in any advice you guys have or modules you know of.
Late response, but here is exactly what you want, in Python with a wonderful API:
https://github.com/kartograph/kartograph.py
As comments have noted it was previously available at https://github.com/svgmap/svgmap.py
The svgmap link was broken for me on GitHub, but kartograph.py works for ESRI .shp files
Not python, but you may be interested in these links:
http://egb13.net/2009/07/shapefile-to-svg-translator-project/
http://www.carto.net/svg/utils/shp2svg/
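If you would rather do the conversion by hand, here is a minimal sketch using the pyshp package (my suggestion, not mentioned above) to turn shapefile geometry into SVG path elements; the filenames are placeholders, and it ignores multi-part shapes and map projections:

import shapefile  # the pyshp package

# Placeholder input filename.
sf = shapefile.Reader("counties.shp")

paths = []
for shape in sf.shapes():
    # Build an SVG path from the shape's (x, y) points.
    # The y coordinate is negated because the SVG y axis points down.
    d = "M " + " L ".join(f"{x},{-y}" for x, y in shape.points) + " Z"
    paths.append(f'<path d="{d}" fill="none" stroke="black"/>')

svg = '<svg xmlns="http://www.w3.org/2000/svg">' + "".join(paths) + "</svg>"
with open("counties.svg", "w") as out:
    out.write(svg)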

Open source data mining/text analysis tools in Python

I have a database full of reviews of various products. My task is to perform various calculations and "create" another "database/XML export" with aggregated data. I am thinking of writing command-line programs in Python to do that. But I know someone has done this before, and I know there are open-source Python solutions or similar that probably produce far more interesting "aggregated data" than I could possibly think of.
The problem is that I don't really know much about this area beyond basic data manipulation from the command line, nor do I know what terms I should use to even search for this. I am really not looking for scientific/visualization stuff (not that I'd mind if the tool provides it), just something simple to start with, from which I can gradually see and develop what I need.
My only requirement is that the "end aggregated data" be in a database or exported as an XML file, with no proprietary formats. It needs to be a bit more robust than my Python scripts, as I have to deal with "lots" of data across 4 machines.
Any hint where should I start my research?
Thanks.
Looks like you are looking for a Data Integration solution.
One suggestion is the open source Kettle project part of the Pentaho suite.
For python, a quick search yielded PyDI and SnapLogic
What kind of analysis are you trying to do?
If you're analyzing text take a look at the Natural Language Toolkit (NLTK).
If you want to index and search the data, take a look at the whoosh search engine.
Please provide some more detail on what kind of analysis you're looking to do.
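For the NLTK route, here is a minimal sketch of a simple aggregation over review text, assuming nltk is installed and its 'punkt' tokenizer data has been downloaded; the reviews list is a placeholder for rows from your database:

from nltk import FreqDist
from nltk.tokenize import word_tokenize

# One-time setup: nltk.download("punkt")

# Placeholder data; in practice, pull the review text from your database.
reviews = [
    "Great product, works as advertised.",
    "Terrible battery life, would not buy again.",
]

# Tokenize all reviews and count word frequencies.
tokens = [w.lower() for r in reviews for w in word_tokenize(r)]
print(FreqDist(tokens).most_common(10))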
