I'm using Google App Engine and Python for a web service. Some of the models (tables) in my web service have several binary data fields, and I'd like to send all of those fields to a requesting client at the same time. The problem is that I don't know how to write them out in a way that lets the other computer tell where one field's data ends and the next begins. I've been using JSON for everything that isn't binary, but as far as I know JSON doesn't handle binary data. So how do you get around this?
You could of course separate the data out into its own model and reference it back to some metadata model. That would let you serve a single page that prints just one data field of one of the items, but that is clumsy to implement on both the server and the client.
Another solution would be to put in some kind of separator and split the data on that. I suppose that would work, but isn't there a standardized way to do this? Are there any libraries I could use?
In short, I'd like to be able to do something like this:
binaryDataField1: data data data ...
binaryDataField2: data data data ...
etc
Several easy options:
base64 encode your data - meaning you can still use JSON.
Use Protocol Buffers.
Prefix each field with its length - either as a 4- or 8-byte integer, or as a numeric string (see the sketch below).
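For the length-prefix option, here is a minimal sketch in Python; the 4-byte big-endian prefix and the helper names are assumptions, not a fixed protocol:

import struct

def pack_fields(fields):
    # Prefix each binary field with its length as a 4-byte big-endian integer.
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)

def unpack_fields(payload):
    # Walk the payload: read a 4-byte length, then that many bytes of data.
    fields = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        fields.append(payload[offset:offset + length])
        offset += length
    return fields

packed = pack_fields([b"binary field 1", b"binary field 2"])
assert unpack_fields(packed) == [b"binary field 1", b"binary field 2"]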
One solution that would leverage your JSON investment would be to simply convert the binary data to something JSON can support. For example, Base64 encoding might work well for you: you can treat the output of your Base64 encoder just like a normal string in JSON. It looks like Python has Base64 support built in, though I only use Java on App Engine, so I can't guarantee whether the linked library works in the sandbox.
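To illustrate, a minimal sketch of the Base64-over-JSON approach using only the Python standard library (the field names and byte values are made up):

import base64
import json

payload = {
    "binaryDataField1": base64.b64encode(b"\x00\x01\x02").decode("ascii"),
    "binaryDataField2": base64.b64encode(b"\xff\xfe\xfd").decode("ascii"),
}
encoded = json.dumps(payload)  # safe to send: every value is now plain text

# The client reverses the process to recover the raw bytes.
decoded = {k: base64.b64decode(v) for k, v in json.loads(encoded).items()}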
I am starting a project where I want to upload a PNG or a JPG to a table in my PostgreSQL DB. The problem is that I don't know which libraries I should use. I am able to insert/update 'normal' data into my table. Can you give me any advice on what I should focus on first?
Thanks in advance.
Can you give me any advice on what I should focus on first?
Uploading a PNG/JPG to a PostgreSQL DB is a specific case of storing a binary file in a database, and PostgreSQL has an article pertaining to it.
Generally there are two distinct approaches to this:
using a column type designed for holding binary data, bytea in the case of PostgreSQL
converting the binary data to text and then storing it like any other long text
The first requires less memory, but not all programming tools for PostgreSQL support it smoothly (according to the author of the linked article, Python and .NET do). For the second, Base64 is the most commonly used encoding and is widely available across programming languages (in Python it is part of the standard library); however, you may choose any other encoding, as long as it turns a sequence of bytes into text and that generated text back into the original bytes.
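As a hedged sketch of the first approach, an insert into a bytea column with psycopg2 might look like this; the table, column, file name, and connection string are assumptions:

import psycopg2

# Assumes a table like: CREATE TABLE images (id serial PRIMARY KEY, data bytea);
conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical connection string
with conn, conn.cursor() as cur:
    with open("photo.png", "rb") as f:
        # psycopg2.Binary wraps raw bytes for the bytea column.
        cur.execute("INSERT INTO images (data) VALUES (%s)", (psycopg2.Binary(f.read()),))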
I have an Excel spreadsheet that basically acts as a UI: it lets the user enter some parameters, which are then passed, along with a whole tab full of data, to some Python code on a server via a web service.
I am by no means a VBA expert, but I managed to get my data and individual variables submitted. My question is: what is the best-suited VBA data structure to use? Ideally I would like something like a dictionary where the keys are my defined Names for the Excel cells, plus the data, which in some cases will be a single value and in others a Variant array.
I have to be able to distinguish between keys and their corresponding values in Python eventually.
So far I have been playing around with Collections:
Dim Main_tab_vars As Collection
Set Main_tab_vars = New Collection
Main_tab_vars.Add Range("Start_Date").Value, "Start_Date_var", "Start_Date_var"
Main_tab_vars.Add Range("Definitions").Value, "Definitions_var"
If I look at the collection in my Watches window, I can see the values correctly stored in Item1 and Item2, but it looks like my key information gets lost.
I would recommend either JSON or XML when sending data to a web service; these are the industry standards. If choosing JSON, you'd use nested dictionaries and then build a string when ready (there is plenty of code on the internet for this). If using XML, you could build up the XML document as you go.
I do not know how well Python handles JSON so probably I'd opt for XML.
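For what it's worth, Python handles JSON natively through its standard json module. A minimal sketch of the receiving side, with field names assumed from the question:

import json

# Hypothetical request body the VBA client might send.
raw = '{"Start_Date_var": "2024-01-01", "Definitions_var": ["a", "b", "c"]}'

params = json.loads(raw)  # arrives as an ordinary dict, keys mapped to values
start_date = params["Start_Date_var"]    # a single value
definitions = params["Definitions_var"]  # an array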
I would like to preface my question by saying this is the first time I've interacted with an API and JSON, as I'm typically more on the database side of things.
With that said, I'm a little confused by one of the APIs I'm currently working with.
I have a vendor whose API lets me pull down some information about users of that service. The problem is that the response does not seem to be JSON, or if it is, it isn't a version of JSON I have seen.
The response looks like this.
{"Header":"Field1,Field2,Field3,Field4", "Rows":["Row1Value1,Row1Value2,Row1Value3,Row1Value4","Row2Value1,Row2Value2,Row2Value3,Row2Value4"]}
This seems at odds with everything I've done with JSON so far, and I'm unable to interpret it as anything usable in Python or PowerShell.
Is this a known format? Or is this some weird thing this vendor has generated that isn't JSON and needs to be interpreted as its own thing?
It looks like a half-JSON implementation: the outer container is valid JSON, and you get a JSON list for the rows, but the inner contents of Header and of each row in Rows are strings you'll need to tokenize yourself (split on commas).
I think there is a bit of confusion here. JSON literally means JavaScript Object Notation. Anything that parses to a valid object in JS and is limited to the JSON data types (strings, numbers, booleans, null, arrays, and objects) is JSON.
So, is this JSON? Yes, beyond doubt. Is this good JSON? Not really. Ideally you would be able to parse a JSON object straight into tabular form, but here you have to split things yourself.
Using simple string manipulation (split()), you can easily parse the rows and restructure them to your heart's content, as in the sketch below.
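A minimal sketch of that parsing in Python, using the exact response from the question:

import json

raw = ('{"Header":"Field1,Field2,Field3,Field4", '
       '"Rows":["Row1Value1,Row1Value2,Row1Value3,Row1Value4",'
       '"Row2Value1,Row2Value2,Row2Value3,Row2Value4"]}')

doc = json.loads(raw)                # the outer layer parses as valid JSON
columns = doc["Header"].split(",")   # the inner CSV has to be split by hand
records = [dict(zip(columns, row.split(","))) for row in doc["Rows"]]
# records[0] == {"Field1": "Row1Value1", "Field2": "Row1Value2", ...}

Note that this naive split assumes no commas inside values; the standard csv module would handle quoting properly.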
I am indexing data in Elasticsearch using the official Python library for this: elasticsearch-py. The data is taken directly from Oracle using the cx_Oracle Python library, cast into a document format, and sent to Elasticsearch for indexing. For the most part this works great, but sometimes I encounter problems with characters like ö. Sometimes this character is indexed as \xc3\xb8 and sometimes as ö. This happens even within the same database entry: one field can have the ö indexed correctly while another does not.
Does anyone have an idea what might cause this?
Thanks in advance.
If your "ö" is sometimes right - and sometimes not, the data must be corrupted in your database. This is not a problem of Elasticsearch. (I had the exact same problem one month ago!)
Strings with various encodings were likely put into your database without first being converted to a single format.
text = "ö"
asUtf=text.encode('UTF-8')
print(asUtf)
print(asUtf.decode())
Result:
b'\xc3\xb6'
ö
This problem should be solved before the insertion into Elasticsearch: find the text sequences matching '\xXX\xXX', treat them as UTF-8, and decode them to unicode. Try to sanitize your database and fix the way you put information into it. A sketch of such a repair follows.
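A hedged sketch of that repair for the classic double-encoding case (UTF-8 bytes that were decoded as Latin-1); it assumes that specific failure mode and nothing else:

def fix_double_encoded(value):
    # If the str actually holds UTF-8 bytes decoded as Latin-1
    # (e.g. "ö" instead of "ö"), round-trip it to repair the text.
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value  # already clean, or not this kind of corruption

print(fix_double_encoded("ö"))  # -> ö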
PS: a better practice for moving information from a database to Elasticsearch is to use rivers, or to write a script that sends the data directly to Elasticsearch without saving it to a file first.
2016 edit: rivers are deprecated now, so you should find an alternative such as Logstash.
I want to save some text to the database using the Django ORM wrappers. The problem is that this text is generated by scraping external websites, and often it seems the pages are served with the wrong declared encoding. I would like to store the raw bytes so I can improve my encoding detection over time without redoing the scrapes, but Django seems to want everything stored as unicode. Can I get around that somehow?
You can store the data encoded as base64, for example. Or try to analyze the HTTP headers of the response; it may be simpler to get the proper encoding from there.
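A minimal sketch of the base64 route with a hypothetical Django model (the model and field names are made up, and this assumes an already-configured Django app):

import base64
from django.db import models

class ScrapedPage(models.Model):  # hypothetical model
    url = models.URLField()
    raw_base64 = models.TextField()  # raw bytes stored as base64 text

    def set_raw(self, data):
        self.raw_base64 = base64.b64encode(data).decode("ascii")

    def get_raw(self):
        return base64.b64decode(self.raw_base64)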
Create a file with the data, and use a Django models.FileField to hold a reference to it.
No, it does not involve a ton of I/O. If your file is small, it adds two or three I/Os (the directory read, the inode read, and the data read).
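A hedged sketch of the FileField approach; the model and field names are assumptions:

from django.core.files.base import ContentFile
from django.db import models

class RawPage(models.Model):  # hypothetical model
    url = models.URLField()
    raw = models.FileField(upload_to="scrapes/")

# Saving the scraped bytes exactly as received, bypassing unicode handling:
# page = RawPage(url="http://example.com/")
# page.raw.save("page.html", ContentFile(raw_bytes), save=True)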