Re-reading a dictionary structure saved as a text file, again as a dictionary - python

I've parsed a big corpus and saved the data I needed in a dictionary structure. But at the end of my code I saved it as a .txt file, because I needed to manually check something. Now, in another part of my work, I need that dictionary as my input. I wanted to know if there are other ways than just opening the text file and rebuilding the dictionary structure by hand, i.e. whether I can keep the object itself around as well. Is pickle the right thing for my case, or am I on totally the wrong track? Sorry if my question is naive; I'm really new to Python and still learning it.

Copying from Pickle or json? for ease of reading:
If you do not have any interoperability requirements (i.e. you're just going to use the data with Python), and a binary format is fine, go with cPickle, which gives you really fast Python object serialization.
If you want interoperability, or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
Based on the above, I guess you would prefer cPickle to json.
However, another interesting article I found, http://kovshenin.com/2010/pickle-vs-json-which-is-faster/, argues that json is a lot faster than pickle (the author states in the article that cPickle is faster than pickle but still slower than json).
This SO answer, What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?, compares 6 different libraries:
pickle
cPickle
json
simplejson
ujson
yajl
In addition, if you use PyPy, json can be really fast.
Finally, here is some very recent profiling data: https://gist.github.com/schlamar/3134391.
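For concreteness, here is a minimal sketch of both round trips (file names are placeholders); note that in Python 3, cPickle is no longer a separate module: pickle automatically uses its C implementation where available.

import json
import pickle

data = {"word": "corpus", "count": 42}

# pickle: fast binary serialization, Python-only, not human-readable.
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)

# json: text serialization, interoperable and human-readable.
with open("data.json", "w") as f:
    json.dump(data, f)
with open("data.json") as f:
    restored = json.load(f)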

Related

Bulk import of .json files in arangodb with python

I have a huge collection of .json files containing hundreds or thousands of documents I want to import into arangodb collections. Can I do it using python, and if the answer is yes, can anyone send an example of how to do it from a list of files? I.e.:
for i in filelist:
    import i to collection
I have read the documentation, but I couldn't find anything even resembling that.
So after a lot of trial and error I found out that I had the answer in front of me: I didn't need to import the .json file, I just needed to read it and then do a bulk import of the documents. The code is like this:
import json

# `db` is assumed to be an already-initialised python-arango database
# handle, and `list_of_json_files` a list of file paths.
a = db.collection('collection_name')
for x in list_of_json_files:
    with open(x, 'r') as json_file:
        data = json.load(json_file)
    a.import_bulk(data)
So actually it was quite simple. In my implementation I am collecting the .json files from multiple folders and importing them into multiple collections. I am using the python-arango 5.4.0 driver.
I had this same problem. Though your implementation will be slightly different, the answer you need (maybe not the one you're looking for) is to use the "bulk import" functionality.
Since ArangoDB doesn't have an "official" Python driver (that I know of), you will have to consult other sources to get a good idea of how to solve this.
The HTTP bulk import/export docs provide curl commands, which can be neatly translated to Python web requests. Also see the section on headers and values.
ArangoJS has a bulk import function, which works with an array of objects, so there's no special processing or preparation required.
I have also used the arangoimport tool to great effect. It's command-line, so it could be controlled from Python, or used stand-alone in a script. For me, the key here was making sure my data was in JSONL or "JSON Lines" format (each line of the file is a self-contained JSON object, no bounding array or comma separators).
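As an illustration, a small sketch (file names invented) that turns a file holding a single JSON array into the JSON Lines layout described above:

import json

# Rewrite a JSON-array file as JSON Lines: one self-contained
# object per line, no bounding array or comma separators.
with open("documents.json") as src, open("documents.jsonl", "w") as dst:
    for doc in json.load(src):
        dst.write(json.dumps(doc) + "\n")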

Saving objects to a file without using binary serialization, Python 3.7.1

I use Python 3.7.1.
What I am looking for is to save a transformed/processed object in a file, then open this file in any text editor.
JSON is not quite what I am looking for. "JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format". That sounds perfect, but my problem is that JSON does not support a namedtuple object. As far as I know it supports dict, list, and so on, but not this kind of object.
Besides, as you can read above, pickle produces a binary serialization, which is unreadable in any text editor.
So, I want to know if there are other ways of human-readable serialization of Python objects, hoping this is not too complicated to implement.
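For what it's worth, a common workaround is to convert the namedtuple to a dict before dumping, via its _asdict() method; a minimal sketch:

import json
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# A namedtuple is a tuple subclass, so json.dumps(p) would emit
# [1, 2] and lose the field names; _asdict() preserves them.
text = json.dumps(p._asdict(), indent=2)

# json.loads returns a plain dict; rebuild the namedtuple explicitly.
restored = Point(**json.loads(text))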

How to append data to a nested JSON file in Python

I'm creating a program that will need to store different objects in a logical structure on a file which will be read by a web server and displayed to users.
Since the file will contain a lot of information, loading the whole file to memory, appending information and writing the whole file back to the filesystem - as some answers stated - will prove problematic.
I'm looking for something of this sort:
foods = [{
    "fruits": {
        "apple": "red",
        "banana": "yellow",
        "kiwi": "green"
    },
    "vegetables": {
        "cucumber": "green",
        "tomato": "red",
        "lettuce": "green"
    }
}]
I would like to be able to add additional data to the table like so:
newFruit = {"cherry": "red"}
foods[0]["fruits"].update(newFruit)
Is there any way to do this in python with JSON without loading the whole file?
That is not possible with pure JSON; appending to a JSON list will always require reading the whole file into memory.
But you could use JSON Lines for that. It's a format where each line is a valid JSON document on its own; it's what AWS uses for some of their APIs. Your vegetables.json could then be written like this:
{"cucumber":"green"}
{"tomato":"red"}
{"lettuce":"green"}
Adding a new entry then becomes very easy, because it is just appending a new line to the end of the file.
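A minimal sketch of such an append (reusing the vegetables.json example above):

import json

# Appending one record only touches the end of the file; nothing
# already written has to be re-read or re-serialized.
with open("vegetables.json", "a") as f:
    f.write(json.dumps({"carrot": "orange"}) + "\n")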
Since the file will contain a lot of information, loading the whole file to memory, appending information and writing the whole file back to the filesystem - as some answers stated - will prove problematic
If your file really is too huge to fit in memory, then either the source json should have been split into smaller independent parts, or it's just not a proper use case for json. IOW, what you have in this case is a design issue, not a coding one.
There's at least one streaming json parser that may or may not allow you to solve the issue, depending on the source data structure and the updates you actually have to do (see the sketch below).
This being said, given today's computers, you need a really huge json file to end up eating all your RAM, so before anything else you should probably just check the actual file size and how much memory it takes to parse it into Python.
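For instance, the third-party ijson package can iterate over a large top-level array without loading the whole document; a rough sketch, assuming big.json holds one big JSON array and process() is a hypothetical handler:

import ijson  # third-party: pip install ijson

def process(item):
    print(item)  # placeholder handler

with open("big.json", "rb") as f:
    # Yield the elements of the top-level array one at a time,
    # keeping only the current element in memory.
    for item in ijson.items(f, "item"):
        process(item)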

Python Library - json to json transformations

Does anyone know of a python library to convert JSON to JSON in an XSLT/Velocity template style?
JSON + transformation template = JSON (New)
Thanks!
Sorry if this is old, but you can use this module: https://github.com/Onyo/jsonbender
Basically it transforms a dict into another dict object using a mapping. What you can do is load the json into a dict, transform it into another dict, and then dump it back to json.
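A minimal sketch of that flow, based on the jsonbender README (field names invented for illustration):

import json
from jsonbender import bend, S  # third-party: pip install JSONBender

# The mapping declares the output shape; S() selects values from
# the source dict by key path.
MAPPING = {
    "fullName": S("user", "name"),
    "city": S("user", "address", "city"),
}

source = json.loads('{"user": {"name": "Ada", "address": {"city": "London"}}}')
result = bend(MAPPING, source)
print(json.dumps(result))  # {"fullName": "Ada", "city": "London"}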
I did not find a transformer library suitable for my needs and spent a couple of days trying to create my own. And then I realized that creating a transformation scheme is more difficult than writing native Python code that transforms one json-like Python object into another.
I understand that this is not an answer to the original question. And I also understand that my approach has certain limitations. E.g. if you need to generate documentation, it wouldn't work.
But if you just need to transform json-like objects, consider the possibility of just writing Python code that does it. Chances are the code will be cleaner and easier to understand than a transformation schema description.
I wish I had considered this approach more seriously a couple of days ago.
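To make the point concrete, the same transformation as the jsonbender sketch above is just a few lines of ordinary Python:

import json

def transform(source):
    # Plain Python: the output shape is explicit, debuggable, and
    # needs no schema language.
    user = source["user"]
    return {
        "fullName": user["name"],
        "city": user["address"]["city"],
    }

source = json.loads('{"user": {"name": "Ada", "address": {"city": "London"}}}')
print(json.dumps(transform(source)))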
I found the pyjq library very magical: you can feed it a template and a json file and it will map it out for you.
https://pypi.org/project/pyjq/
The only annoying thing about it was the set of requirements I had to install for it: it worked perfectly on my local machine, but it failed to build its dependencies when I tried to package it for AWS Lambda.
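A small sketch of pyjq usage (filter and data invented for illustration):

import pyjq  # third-party: pip install pyjq (builds against libjq)

data = {"users": [{"name": "Ada"}, {"name": "Grace"}]}

# Apply a jq filter to a Python object; all() returns every result.
names = pyjq.all(".users[].name", data)
print(names)  # ['Ada', 'Grace']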

Save formatted (not serialized) json to disk to use git for storage

I'm working on a project where I will be using OrderedDict to store the data the code will be working on.
However, I also want to save many of these OrderedDicts as .json files and commit them to a git-enabled content repository.
But since json.dumps just serializes, and in effect places the JSON content on one line, it will be tricky to track the differences between committed versions in a visually appealing way for users.
Is there a way to make the json output use more readable formatting, with indents and line feeds, so the output file is easy to read as well as making better use of git's versioning and diff features?
Or are there any 3rd-party libraries, preferably with Python 3.5 support, that I've missed?
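For reference, the standard json module already supports this through dump's formatting arguments; a minimal sketch:

import json
from collections import OrderedDict

data = OrderedDict([("name", "example"), ("tags", ["a", "b"])])

# indent writes one key per line, which keeps git diffs small and
# line-oriented; the OrderedDict's insertion order is preserved
# (pass sort_keys=True instead if you prefer deterministic order).
with open("data.json", "w") as f:
    json.dump(data, f, indent=2)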
