Does anyone know of a python library to convert JSON to JSON in an XSLT/Velocity template style?
JSON + transformation template = new JSON
Thanks!
Sorry if this is old, but you can use this module: https://github.com/Onyo/jsonbender
Basically it transforms one dict into another dict using a mapping. What you can do is load the JSON into a dict, transform it into another dict, and then dump it back out as JSON.
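For illustration, here is a minimal sketch of that workflow, assuming jsonbender's bend/S/K API as described in its README (the field names and values are made up):

import json
from jsonbender import bend, S, K  # S selects from the source, K injects a constant

# made-up input document
source = json.loads('{"user": {"first": "Ada", "last": "Lovelace"}, "age": 36}')

# the mapping describes the shape of the output dict
MAPPING = {
    'name': S('user', 'first'),   # pull a nested value out of the source
    'age': S('age'),
    'origin': K('legacy-api'),    # constant injected into the output
}

result = bend(MAPPING, source)
print(json.dumps(result))  # {"name": "Ada", "age": 36, "origin": "legacy-api"}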
I have not found a transformer library suitable for my needs, and I spent a couple of days trying to create my own. And then I realized that creating a transformation scheme is more difficult than writing native Python code that transforms one json-like Python object into another.
I understand that this is not the answer to the original question. And I also understand that my approach has certain limitations. For example, if you need to generate documentation, it wouldn't work.
But if you just need to transform json-like objects, consider the possibility of simply writing Python code that does it. Chances are the code will be cleaner and easier to understand than a transformation schema description.
I wish I had considered this approach more seriously a couple of days ago.
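For example, a transformation that might otherwise need a dedicated schema language is often just a few lines of ordinary Python (the field names here are invented for illustration):

import json

def transform(src):
    # reshape one json-like dict into another: plain code instead of a mapping DSL
    return {
        'id': src['user_id'],
        'name': '{} {}'.format(src['first_name'], src['last_name']),
        'emails': [c['address'] for c in src.get('contacts', []) if c.get('type') == 'email'],
    }

source = json.loads('{"user_id": 7, "first_name": "Ada", "last_name": "Lovelace", '
                    '"contacts": [{"type": "email", "address": "ada@example.com"}]}')
print(json.dumps(transform(source)))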
I found the pyjq library very magical: you can feed it a template and a JSON file and it will map it out for you.
https://pypi.org/project/pyjq/
The only annoying thing about it was the set of requirements I had to install for it. It worked perfectly on my local machine, but it failed to build its dependencies when I tried to build it for an AWS Lambda.
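For anyone curious what that looks like, here is a tiny sketch of the kind of call I mean (the jq filter and data are made up; pyjq.all is one of the documented entry points):

import pyjq  # building it needs jq's C dependencies, which is the painful part

data = {"users": [{"name": "Ada", "active": True}, {"name": "Bob", "active": False}]}

# the jq filter acts as the "template": it reshapes the input into a new structure
result = pyjq.all('{names: [.users[] | select(.active) | .name]}', data)
print(result)  # [{'names': ['Ada']}]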
Related
I have a huge collection of .json files, each containing hundreds or thousands of documents, that I want to import into ArangoDB collections. Can I do it using Python, and if the answer is yes, can anyone show an example of how to do it from a list of files? i.e.:
for i in filelist:
import i to collection
I have read the documentation but I couldn't find anything even resembling that.
After a lot of trial and error I found out that I had the answer in front of me. I didn't need to import the .json file itself; I just needed to read it and then do a bulk import of the documents. The code looks like this:
import json

a = db.collection('collection_name')   # db is an existing python-arango database handle
for x in list_of_json_files:
    with open(x, 'r') as json_file:
        data = json.load(json_file)    # each file holds a list of documents
    a.import_bulk(data)                # bulk-insert them all in one call
So actually it was quite simple. In my implementation I am collecting the .json files from multiple folders and importing them into multiple collections. I am using the python-arango 5.4.0 driver.
I had this same problem. Though your implementation will be slightly different, the answer you need (maybe not the one you're looking for) is to use the "bulk import" functionality.
Since ArangoDB doesn't have an "official" Python driver (that I know of), you will have to peruse other sources to get a good idea of how to solve this.
The HTTP bulk import/export docs provide curl commands, which can be neatly translated to Python web requests. Also see the section on headers and values.
ArangoJS has a bulk import function, which works with an array of objects, so there's no special processing or preparation required.
I have also used the arangoimport tool to great effect. It's command-line, so it could be controlled from Python, or used stand-alone in a script. For me, the key here was making sure my data was in JSONL or "JSON Lines" format (each line of the file is a self-contained JSON object, no bounding array or comma separators).
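For completeness, here is a rough sketch of what the curl-to-Python-requests translation can look like against the HTTP import endpoint (the URL, database, credentials, and collection name are placeholders for your own setup):

import json
import requests

# assumptions: local ArangoDB, _system database, basic auth; adjust to your deployment
url = "http://localhost:8529/_db/_system/_api/import"
docs = [{"_key": "a", "value": 1}, {"_key": "b", "value": 2}]

# type=documents expects JSONL: one self-contained JSON object per line, no bounding array
body = "\n".join(json.dumps(d) for d in docs)

resp = requests.post(
    url,
    params={"collection": "collection_name", "type": "documents"},
    data=body,
    auth=("root", "password"),
)
print(resp.json())  # reports how many documents were created and any errors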
I have a large number of Graphviz files that I need to convert to Neo4j. At first blush it looks like it should be easy enough to read them as text files and convert them to Cypher, but I am hoping that one of the Python graphviz libraries would make it easier to parse the input, or that someone is aware of a prebuilt library.
Is anyone aware of, or has anyone already built, a parser for this conversion? Partial examples are fine. Thanks.
You can probably hack this together pretty easily using NetworkX. They implement a read_dot to read in the graphviz format, then I'm sure you can use one of their graph exporters to dump that back into a format that neo4j can use. For example, here's a package that attempts to simplify that export process (disclaimer: I've never tried this package, it just showed up in Google).
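As a rough illustration of that route (not the package linked above, just NetworkX plus hand-rolled Cypher; "example.dot" is a placeholder file):

import networkx as nx

# read_dot needs pydot installed; nx.nx_agraph.read_dot is the pygraphviz variant
g = nx.nx_pydot.read_dot("example.dot")

# emit very naive Cypher; real code would also carry over attributes and escape values
for node in g.nodes():
    print(f"CREATE (:Node {{name: '{node}'}});")
for src, dst in g.edges():
    print(f"MATCH (a:Node {{name: '{src}'}}), (b:Node {{name: '{dst}'}}) CREATE (a)-[:EDGE]->(b);")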
I am trying to load an Avro file into a Spark DataFrame so I can convert it to a pandas DataFrame and eventually a dictionary. The method I want to use:
df = spark.read.format("avro").load(avro_file_in_memory)
(Note: the Avro file data I'm trying to load into the DataFrame is already in memory, as the body of a response from python requests)
However, this function relies on Spark as set up in the Databricks environment, which I am not working in (I looked into PySpark for a similar function/code but could not see anything myself).
Is there any similar function that I can use outside of Databricks to produce the same results?
That Databricks library is open source, but it was actually added to core Spark in 2.4 (though still as an external library).
In any case, there's a native avro Python library, as well as fastavro, so I'm not entirely sure you want to be starting up a JVM (because you're using Spark) just to load Avro data into a dictionary. Besides that, an Avro file consists of multiple records, so it would at the very least be a list of dictionaries.
Basically, I think you're better off using the approach from your previous question, but start with writing the Avro data to disk, since that seems to be your current issue
Otherwise, maybe a little more searching for what you're looking for would solve this XY problem you're having
https://github.com/ynqa/pandavro
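As a sketch of the fastavro route, reading the in-memory response body straight into a list of dicts (the URL is a placeholder for your existing requests call):

import io
import requests
from fastavro import reader  # no JVM needed

# hypothetical endpoint that returns an Avro file in the response body
resp = requests.get("https://example.com/data.avro")

# fastavro's reader takes a file-like object and yields one dict per record
records = list(reader(io.BytesIO(resp.content)))
print(records[:3])  # a list of dictionaries, as noted above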
I'm a beginner in Python, and I'm currently working on a project scheduler: I get the data entered by the user, then put it in a table in a file so I can print it later.
Unfortunately I have no idea how to do this; I've searched a lot on the internet without success.
So if someone can help me, it would be really cool.
Your request seems rather broad and not overly specific, so this answer may not be what you're looking for, but I will try to help anyway.
Saving Files in Python
If you want to learn about saving files, look up a tutorial on Pickle. Pickle lets you save data while maintaining its data type; for example, you can save a list in a file using Pickle and then load the file using Pickle to get the list back. To use Pickle, make sure to have the line import pickle at the top of your code and use pickle. before every Pickle function, e.g. pickle.dump().
Here's a useful tutorial I found on Pickle https://pythonprogramming.net/python-pickle-module-save-objects-serialization/
You will also want to ensure you know about file handling in Python. Here's a cheat sheet with all the basic file handling functions https://www.pythonforbeginners.com/cheatsheet/python-file-handling
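For example, a minimal save-and-load round trip with Pickle (the file name and list contents are just placeholders):

import pickle

tasks = ["write report", "book meeting room"]  # any Python object works, e.g. your table data

# save the list to a file ('wb' because pickle writes bytes)
with open("tasks.pkl", "wb") as f:
    pickle.dump(tasks, f)

# load it back later with its original data type intact
with open("tasks.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded)  # ['write report', 'book meeting room']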
Dates and Times in Python
A helpful module called Datetime will enable you to check your system's current date and/or time in Python. There are many functions in Datetime, but you will probably only need its basic aspects. Again, make sure you have the line import datetime at the top of your code and use datetime. before every Datetime function.
Here's a useful tutorial I found on Datetime https://www.tutorialspoint.com/python/python_date_time.htm
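A quick example of the basics, which is probably all a scheduler needs to start with:

import datetime

now = datetime.datetime.now()          # current date and time from the system clock
print(now.strftime("%Y-%m-%d %H:%M"))  # format it for display, e.g. 2024-01-31 14:05
print(datetime.date.today())           # just today's date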
If you find yourself stuck or you're not sure what to do feel free to ask more questions. Hope this has been somewhat helpful for you.
I've parsed a big corpus and saved the data I needed in a dictionary structure. But at the end of my code I saved it as a .txt file, because I needed to check something manually. Now, in another part of my work, I need that dictionary as my input. I wanted to know if there are ways other than just opening the text file and rebuilding it as a dictionary structure, i.e. whether I can keep the dictionary as it is. Is Pickle the right thing for my case, or am I going about it entirely the wrong way? Sorry if my question is naive; I'm really new to Python and I'm still learning it.
Copied and pasted from Pickle or json? for ease of reading:
If you do not have any interoperability requirements (i.e. you're just going to use the data with Python), and a binary format is fine, go with cPickle, which gives you really fast Python object serialization.
If you want interoperability, or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
Based on the above, I guess you would prefer cPickle over json.
However, here is another interesting article I found: http://kovshenin.com/2010/pickle-vs-json-which-is-faster/, which shows that json is a lot faster than pickle (the author states in the article that cPickle is faster than pickle but still slower than json).
This SO answer, What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?, compares 6 different libraries:
pickle
cPickle
json
simplejson
ujson
yajl
In addition, if you use pypy, json can be really fast.
Finally, here is some very recent profiling data: https://gist.github.com/schlamar/3134391.
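Either way, for the case in the question (a dictionary you want to write out and load back unchanged), a minimal json round trip looks like this; pickle.dump/pickle.load work the same way, just with 'wb'/'rb' file modes (the file name and contents are placeholders):

import json

counts = {"cat": 12, "dog": 7}  # placeholder for the parsed-corpus dictionary

# write it out; the file stays human-readable, unlike a pickle
with open("counts.json", "w") as f:
    json.dump(counts, f)

# read it back as a dict in the other part of the work
with open("counts.json") as f:
    counts_again = json.load(f)
print(counts_again == counts)  # True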