Currently I am using the copy_to(..) function to get the following output:
>>> cur.copy_to(sys.stdout, 'test', sep="|")
1|100|abc'def
2|\N|dada
...
What I would like to achieve is to use the copy_to(..) function for selecting large amounts of data. I reviewed the psycopg2 documentation, but I could not find a way to use bind arguments with this function. Any suggestions?
From the psycopg docs:
Reads data from a file-like object appending them to a database table
"What's a file-like object?" you might ask. File objects are described in the python documentation and the distinction between them and file-like objects are indicated there. In general, it's an object that supports methods like open()/read()/write()/close() with the signatures matching that of the file object.
So the way to bind it would be either to use a real file (consider the tempfile module), or an in-memory "file" like that in StringIO.
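For instance, a minimal sketch of reading a whole table into an in-memory buffer; the connection string is a placeholder, and the table name 'test' comes from the question:
import io
import psycopg2

conn = psycopg2.connect("dbname=mydb")    # placeholder connection string
cur = conn.cursor()

buf = io.StringIO()                       # in-memory "file-like" object
cur.copy_to(buf, 'test', sep='|')         # write the table's rows into the buffer
buf.seek(0)                               # rewind before reading
data = buf.read()                         # the same text that went to sys.stdout before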
Let's say that memory address 0x0A7F03E4 stores the value 124. How do I change it to 300 using Python? Is there a module supplied for this kind of task?
>>> import ctypes
>>> memfield = (ctypes.c_char).from_address(0x0A7F03E4)
Now you can read memfield, assign to it, do whatever you want. As long as you have access to that memory location, of course.
you can also get a memarray with
>>> memarray = (ctypes.c_char*memoryfieldlen).from_address(0x0A7F03E4)
which gives you a list of memfields.
Update: I just read that you want to retrieve an integer. In this case, use ctypes.c_int, of course. In general: use the appropriate ctype ;)
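For example, a sketch with c_int; the address is the hypothetical one from the question and must actually be mapped into the current process:
>>> import ctypes
>>> value_field = ctypes.c_int.from_address(0x0A7F03E4)
>>> value_field.value        # read the integer currently stored there
124
>>> value_field.value = 300  # overwrite it in place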
Have you tried the built-in ctypes module?
It provides a memset function which should give you what you need.
On UNIX, you can try to open() the /dev/mem device to access physical memory, and then use the seek() method of the file object to move the file pointer to the right location. Then use the write() method to change the value. Don't forget to encode the number as raw bytes!
Generally, only the root user has access to this device, though.
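A rough sketch of that approach, assuming a 4-byte little-endian integer at the address from the question and root privileges:
import struct

ADDRESS = 0x0A7F03E4                          # hypothetical physical address
with open('/dev/mem', 'r+b', buffering=0) as mem:
    mem.seek(ADDRESS)                         # move the file pointer to the location
    mem.write(struct.pack('<i', 300))         # write 300 as 4 raw little-endian bytes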
Imagine the following problem: you have a dictionary of some content in Python and want to generate Python code that would create this dict (which is like eval, but in reverse).
Is there something that can do this?
Scenario:
I am working with a remote python interpreter. I can give source files to it but no input. So I am now looking for a way to encode my input data into a python source file.
Example:
d = {'a': [1,4,7]}
str_d = reverse_eval(d)
# "{'a': [1, 4, 7]}"
eval(str_d) == d
repr(thing)
will output text that when executed will (in most cases) reproduce the dictionary.
Actually, it matters which types of data you want this reverse function to work for. For built-in/standard classes, the .__repr__() method usually returns exactly the code you want. But if your goal is to save something in a human-readable format and still load it back into Python with an eval-like function, there is the json library.
Using json is also the better choice here because eval is not safe.
json's limitation is that it can only store standard objects, not arbitrary types. On the other hand, for non-built-in types you never know what their .__repr__() returns, so there is no reliable way to use repr/eval with that kind of data either.
So there is no reverse function for all types of data. You can use repr/eval for built-in data, but even for built-in data the json library is better, at least because it is safe.
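To make the comparison concrete, here is a round-trip of the dictionary from the question using both approaches:
import json

d = {'a': [1, 4, 7]}

str_d = repr(d)                 # "{'a': [1, 4, 7]}"
assert eval(str_d) == d         # works for built-ins, but eval is unsafe on untrusted input

json_d = json.dumps(d)          # '{"a": [1, 4, 7]}'
assert json.loads(json_d) == d  # safe, but limited to JSON-serialisable objects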
What is the use of .digest() in this statement? Why do we use it? I searched on Google (and in the documentation as well) but I am still not able to figure it out.
train_hashes = [hashlib.sha1(x).digest() for x in train_dataset]
What I found is that it converts to a string. Am I right or wrong?
The .digest() method returns the actual digest the hash is designed to produce.
It is a separate method because the hashing API is designed to accept data in multiple pieces:
import hashlib

hash = hashlib.sha1()
for chunk in large_amount_of_data:
    hash.update(chunk)
final_digest = hash.digest()
The above code creates a hashing object without passing any initial data in, then uses the hash.update() method to feed in chunks of data in a loop. This avoids having to load all of the data into memory at once, so you can hash anything between 1 byte and the entire Google index, if you ever had access to something that large.
If hashlib.sha1(x) produced the digest directly, you could never add additional data to the hash first. Moreover, there is also an alternative way of accessing the digest, as a hexadecimal string, using the hash.hexdigest() method (equivalent to hash.digest().hex(), but more convenient).
The code you found uses the fact that the constructor of the hash object also accepts data; since that's all of the data you wanted to hash, you can call .digest() immediately.
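For illustration, a small example showing both ways of reading the result (the input bytes are arbitrary):
import hashlib

h = hashlib.sha1(b'some data')   # the constructor accepts the data directly
print(len(h.digest()))           # 20 raw bytes for SHA-1
print(h.hexdigest())             # the same digest as a 40-character hex string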
The module documentation covers it this way:
There is one constructor method named for each type of hash. All return a hash object with the same simple interface. For example: use sha256() to create a SHA-256 hash object. You can now feed this object with bytes-like objects (normally bytes) using the update() method. At any point you can ask it for the digest of the concatenation of the data fed to it so far using the digest() or hexdigest() methods.
Currently I am expensively parsing a file, which generates a dictionary of ~400 key-value pairs and is only seldom updated. Previously I had a function which parsed the file, wrote it to a text file in dictionary syntax (i.e. dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}, etc.), and copied and pasted that into another function whose sole purpose was to return that parsed dictionary.
Hence, in every file where I would use that dictionary, I would import that function and assign it to a variable, which is now that dictionary. I am wondering if there is a more elegant way to do this that does not involve explicitly copying and pasting code around. Using a database seems unnecessary, and the text file gave me the benefit of seeing whether the parsing was done correctly before adding it to the function. But I'm open to suggestions.
Why not dump it to a JSON file, and then load it from there where you need it?
import json

with open('my_dict.json', 'w') as f:
    json.dump(my_dict, f)

# elsewhere...
with open('my_dict.json') as f:
    my_dict = json.load(f)
Loading from JSON is fairly efficient.
Another option would be to use pickle, but unlike JSON, the files it generates aren't human-readable so you lose out on the visual verification you liked from your old method.
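For comparison, a pickle version of the same round-trip could look like this (my_dict and the filename are placeholders, as above):
import pickle

with open('my_dict.pkl', 'wb') as f:
    pickle.dump(my_dict, f)

# elsewhere...
with open('my_dict.pkl', 'rb') as f:
    my_dict = pickle.load(f)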
Why mess with all these serialization methods? It's already written to a file as a Python dict (although with the unfortunate name 'dict'). Change your program to write out the data with a better variable name - maybe 'data', or 'catalog', and save the file as a Python file, say data.py. Then you can just import the data directly at runtime without any clumsy copy/pasting or JSON/shelve/etc. parsing:
from data import catalog
JSON is probably the right way to go in many cases; but there might be an alternative. It looks like your keys and your values are always strings, is that right? You might consider using dbm/anydbm. These are "databases" but they act almost exactly like dictionaries. They're great for cheap data persistence.
>>> import anydbm
>>> dict_of_strings = anydbm.open('data', 'c')
>>> dict_of_strings['foo'] = 'bar'
>>> dict_of_strings.close()
>>> dict_of_strings = anydbm.open('data')
>>> dict_of_strings['foo']
'bar'
If the keys are all strings, you can use the shelve module
A shelf is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.
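A minimal sketch of that, reusing the room data from the question (the filename is arbitrary):
import shelve

with shelve.open('rooms') as db:    # behaves like a persistent dict
    db['Adam'] = 'Room 430'
    db['Bob'] = 'Room 404'

with shelve.open('rooms') as db:
    print(db['Adam'])               # 'Room 430'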
JSON would be a good choice if you need to use the data from other languages.
If storage efficiency matters, use pickle or cPickle (for an execution performance gain). As Amber pointed out, you can also dump/load via JSON; it will be human-readable, but takes more disk space.
I suggest you consider using the shelve module since your data-structure is a mapping.
That was my answer to a similar question titled If I want to build a custom database, how could I? There's also a bit of sample code in another answer of mine promoting its use for the question How to get a object database?
ActiveState has a highly rated PersistentDict recipe which supports csv, json, and pickle output file formats. It's pretty fast since all three of those formats are implemented in C (although the recipe itself is pure Python), so the fact that it reads the whole file into memory when it's opened might be acceptable.
JSON (or YAML, or whatever) serialisation is probably better, but if you're already writing the dictionary to a text file in python syntax, complete with a variable name binding, you could just write that to a .py file instead. Then that python file would be importable and usable as is. There's no need for the "function which returns a dictionary" approach, since you can directly use it as a global in that file. e.g.
# generated.py
please_dont_use_dict_as_a_variable_name = {'Adam': 'Room 430', 'Bob': 'Room 404'}
rather than:
# manually_copied.py
def get_dict():
    return {'Adam': 'Room 430', 'Bob': 'Room 404'}
The only difference is that manually_copied.get_dict gives you a fresh copy of the dictionary every time, whereas generated.please_dont_use_dict_as_a_variable_name[1] is a single shared object. This may matter if you're modifying the dictionary in your program after retrieving it, but you can always use copy.copy or copy.deepcopy to create a new copy if you need to modify one independently of the others.
[1] dict, list, str, int, map, etc. are generally viewed as bad variable names. The reason is that these are already defined as built-ins and are used very commonly. So if you give something a name like that, at the least it's going to cause cognitive dissonance for people reading your code (including you after you've been away for a while), as they have to keep in mind that "dict doesn't mean what it normally does here". It's also quite likely that at some point you'll get an infuriating-to-solve bug reporting that dict objects aren't callable (or something), because some piece of code is trying to use the type dict but is getting the dictionary object you bound to the name dict instead.
On the JSON front, there is also something called simplejson. The first time I used JSON in Python, the standard json library didn't work for me / I couldn't figure it out; simplejson was... easier to use.
I have a large object I'd like to serialize to disk. I'm finding marshal works quite well and is nice and fast.
Right now I'm creating my large object and then calling marshal.dump. I'd like to avoid holding the large object in memory if possible - I'd like to dump it incrementally as I build it. Is that possible?
The object is fairly simple, a dictionary of arrays.
The bsddb module's 'hashopen' and 'btopen' functions provide a persistent dictionary-like interface. Perhaps you could use one of these, instead of a regular dictionary, to incrementally serialize the arrays to disk?
import bsddb
import marshal
db = bsddb.hashopen('file.db')
db['array1'] = marshal.dumps(array1)
db['array2'] = marshal.dumps(array2)
...
db.close()
To retrieve the arrays:
db = bsddb.hashopen('file.db')
array1 = marshal.loads(db['array1'])
...
If all your object has to do is be a dictionary of lists, then you may be able to use the shelve module. It presents a dictionary-like interface where the keys and values are stored in a database file instead of in memory. One limitation which may or may not affect you is that keys in Shelf objects must be strings. Value storage will be more efficient if you specify protocol=-1 when creating the Shelf object to have it use a more efficient binary representation.
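A short sketch of that for a dictionary of lists (the filename and keys are placeholders):
import shelve

db = shelve.open('big_object', protocol=-1)   # -1 selects the newest, most compact pickle protocol
db['array1'] = [1.5, 2.5, 3.5]                # each value is pickled to disk when assigned
db['array2'] = list(range(1000))
db.close()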
This very much depends on how you are building the object. Is it an array of sub-objects? You could marshal/pickle each array element as you build it. Is it a dictionary? The same idea applies (marshal/pickle per key).
If it is just a big, complex, hairy object, you might want to marshal-dump each piece of the object and then apply whatever your 'building' process is when you read it back in.
You should be able to dump the item piece by piece to the file. The two design questions that need settling are:
How are you building the object when you're putting it in memory?
How do you need your data when it comes out of memory?
If your build process populates the entire array associated with a given key at a time, you might just dump the key:array pair in a file as a separate dictionary:
big_hairy_dictionary['sample_key'] = pre_existing_array
with open('central_file', 'ab') as f:    # append one marshalled record per update
    marshal.dump({'sample_key': big_hairy_dictionary['sample_key']}, f)
Then, when reading back, each call to marshal.load on an open file object for 'central_file' will return one such dictionary, which you can use to update a central dictionary. But this is really only going to be helpful if, when you need the data back, you want to handle reading 'central_file' once per key.
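For illustration, a read-back loop under those assumptions (the filename comes from the snippet above; each record is merged into one central dictionary):
import marshal

central = {}
with open('central_file', 'rb') as f:
    while True:
        try:
            central.update(marshal.load(f))   # each dump wrote one {'key': array} dict
        except EOFError:                      # raised once the file is exhausted
            break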
Alternately, if you are populating arrays element by element in no particular order, maybe try:
big_hairy_dictionary['sample_key'].append(single_element)
with open('marshaled_files/' + 'sample_key', 'ab') as f:   # one file per key
    marshal.dump(single_element, f)
Then, when you load it back, you don't necessarily need to build the entire dictionary to get back what you need; you just keep calling marshal.load on the open 'marshaled_files/sample_key' file until it raises EOFError, and then you have everything associated with that key.
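A matching sketch for loading one key's elements back (assuming the per-key file written above):
import marshal

elements = []
with open('marshaled_files/sample_key', 'rb') as f:
    while True:
        try:
            elements.append(marshal.load(f))  # one element per dump call
        except EOFError:                      # end of the per-key file
            break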