Save JSON file into structured database with Python

I am building a warehouse consisting of data retrieved from a public-facing API. In order to store and analyze the data, I'd like to save the JSON files I'm receiving into a structured SQL database. That is, the JSON contents shouldn't all be contained in one column; the contents should be parsed out and stored across various tables in a relational database.
From a process standpoint, I need to do the following:
Call API
Receive JSON
Parse JSON file
Insert/Update table(s) in a SQL database
(This process will be repeated hundreds and hundreds of times)
Is there a best practice to accomplish this - from either a process or resource standpoint? I'd like to do this in Python if possible.
Thanks.
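For concreteness, here is roughly what I mean, as a minimal sketch of steps 1-4 using requests and sqlite3 (the endpoint URL, the orders/order_items tables, and the field names are placeholders, not from the actual API):
import sqlite3

import requests

# 1-2. Call the API and receive JSON (placeholder URL)
response = requests.get('https://api.example.com/orders')
response.raise_for_status()
payload = response.json()

# 3-4. Parse the JSON and insert/update the related tables (placeholder schema)
conn = sqlite3.connect('warehouse.db')
cur = conn.cursor()
for order in payload['orders']:
    cur.execute(
        'INSERT OR REPLACE INTO orders (id, customer, created_at) VALUES (?, ?, ?)',
        (order['id'], order['customer'], order['created_at']),
    )
    for item in order['items']:
        cur.execute(
            'INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)',
            (order['id'], item['sku'], item['qty']),
        )
conn.commit()
conn.close()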

You should be able to use json.dumps(json_value) to convert your JSON object into a JSON string that can be put into an SQL database.
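A minimal illustration of that with sqlite3 and a hypothetical raw_payloads table (note this stores the whole document in a single column, which the question above was hoping to avoid):
import json
import sqlite3

json_value = {'id': 1, 'status': 'open'}  # hypothetical API response

conn = sqlite3.connect('warehouse.db')
conn.execute('CREATE TABLE IF NOT EXISTS raw_payloads (id INTEGER PRIMARY KEY, body TEXT)')
conn.execute('INSERT INTO raw_payloads (body) VALUES (?)', (json.dumps(json_value),))
conn.commit()
conn.close()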

Related

How to set parameters for an API POST query using Python and JSON

I'm querying a real estate API using Python (requests), with POST data submitted in JSON format.
I'm getting responses as expected; however, each time I want to make a query I'm editing the fields of a hardcoded JSON object in the .py file.
I'd like to do something a bit more robust, e.g. using a user prompt to populate the JSON object to be submitted, based on the API search schema (see JSON file (pastebin)) (open to alternative Python-based solutions to this).
The linked schema includes the full list of parameters available to query. I'll likely trim this down to the ones most relevant to the queries I'm building/POSTing, so that there are fewer parameters to deal with. Is there a Pythonic way to cycle through the parameters in the schema and add the ones I wish to submit for a query to the JSON object?
TIA.
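One possible (untested) approach: iterate over the schema's parameter definitions, prompt for each one, and only include non-empty answers in the payload. A sketch, assuming the schema file is a JSON object whose 'parameters' key maps parameter names to metadata; the file name, key layout, and endpoint are placeholders:
import json

import requests

with open('search_schema.json') as fp:  # placeholder schema file
    schema = json.load(fp)

payload = {}
for name, meta in schema['parameters'].items():  # placeholder key layout
    value = input(f"{name} ({meta.get('description', 'no description')}), blank to skip: ")
    if value:
        payload[name] = value

response = requests.post('https://api.example.com/search', json=payload)  # placeholder URL
print(response.json())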

Insert Information to MySQL from multiple related excel files

So I have this huge DB schema for vehicle board cards. The data is actually stored in multiple Excel files, and my job was to create a database schema to dump all of it into MySQL. Now I need to create the process to insert the data into the DB.
This is an example of how the Excel tables are laid out:
The thing is that all these Excel files are not well tagged.
My question is: what do I need to do to create a script that dumps all this data from Excel into the DB?
I'm also using IDs, foreign keys, primary keys, joins, etc.
I've thought about this so far:
1. Normalize the structure of the tables in Excel so that the data can be inserted with SQL.
2. Create a script in Python to insert the data of each table.
Can you help with where I should start and how? What topics should I google?
With pandas you can easily read the Excel files (and CSVs, via read_csv) and dump the data into any database supported by SQLAlchemy:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:password@localhost/mydb')  # placeholder connection string
df = pd.read_excel('file.xlsx')
df.to_sql('sql_table', con=engine, if_exists='append', index=False)
If you have performance issues dumping to MySQL, you can find another way of doing the dump here
python pandas to_sql with sqlalchemy : how to speed up exporting to MS SQL?
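That linked thread focuses on MS SQL, but a common first thing to try with any backend (just a sketch, reusing the df and engine from the snippet above) is batching the inserts with to_sql's chunksize and method parameters:
# Send rows in batches and pack several rows into each INSERT statement
df.to_sql('sql_table', con=engine, if_exists='append', index=False,
          chunksize=1000, method='multi')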

CSV vs JSON vs DB - which is fastest and most scalable for loading into memory and retrieving data?

I have a large 1.5 GB data file with multiple fields separated by tabs.
I need to do lookups in this file from a web interface/AJAX queries, like an API, with possibly a large number of AJAX requests coming in each second, so responses need to be fast.
What is the fastest option for retrieving this data? Is there performance-tested info or benchmarking?
The tab-separated file is a flat file that would be loaded into memory, but it cannot be indexed.
JSON has more text overhead, but an 'indexed' JSON could be created by grouping entries by a certain field.
Neither. They are both horrible for your stated purpose. JSON cannot be partially loaded; TSV can be scanned without loading it into memory, but only with sequential access. Use a proper database.
If, for some reason, you can't use a database, you can MacGyver it by using TSV or JSONL (not JSON) with an additional index file that records the byte position of the start of the record for each ID (or another searchable field).
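A rough sketch of that index-file idea for JSONL, assuming each line is a JSON object with an 'id' field (the file name and key are placeholders):
import json

# Build the index once: record id -> byte offset of that line in the JSONL file
index = {}
with open('data.jsonl', 'rb') as f:
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        record = json.loads(line)
        index[record['id']] = pos

# Lookup: seek straight to the record instead of scanning or loading the whole file
def lookup(record_id):
    with open('data.jsonl', 'rb') as f:
        f.seek(index[record_id])
        return json.loads(f.readline())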

Insert CSV file into SQL Server (not import!)

I would like to store CSV files in SQL Server. I've created a table with column "myDoc" as varbinary(max). I generate the CSV's on a server using Python/Django. I would like to insert the actual CSV (not the path) as a BLOB object so that I can later retrieve the actual CSV file.
How do I do this? I haven't been able to make much headway with this documentation, as it mostly refers to .jpg's
https://msdn.microsoft.com/en-us/library/a1904w6t(VS.80).aspx
Edit:
I wanted to add that I'm trying to avoid FILESTREAM. The CSVs are too small (about 5 KB) and I don't need full-text search over them.
Not sure why you want varbinary over varchar, but it will work either way
Insert Into YourTable (myDoc)
Select doc = BulkColumn FROM OPENROWSET(BULK 'C:\Working\SomeXMLFile.csv', SINGLE_BLOB) x
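Since the CSVs are generated from Python/Django anyway, the file doesn't have to be on the SQL Server machine for OPENROWSET; a hedged sketch of inserting the bytes directly with pyodbc (placeholder connection string and file name, table/column taken from the answer above):
import pyodbc

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=localhost;DATABASE=mydb;UID=user;PWD=secret'  # placeholder credentials
)

with open('report.csv', 'rb') as f:  # placeholder file name
    data = f.read()

# Python bytes are sent as varbinary, so the file body lands in the myDoc column
conn.execute('INSERT INTO YourTable (myDoc) VALUES (?)', data)
conn.commit()
conn.close()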

Method for converting diverse JSON files into RDBMS schema?

I have a large number of JSON documents. I would like to store them in an RDBMS for querying. Once there they will never change; it's a data warehousing issue. I have lots of RDBMS data that I want to match the JSON data with, so it would be inefficient to store the JSON in a more traditional manner (e.g. CouchDB).
From hunting the web, I gather that the best approach might be to create JSON schema files using a tool such as JSON Schema Generator and then use that to build a structured RDBMS series of tables. My data is sufficiently limited in scope (minimal JSON nesting) that I could do this by hand if needed, but a tool that automatically converted from JSON schema to DB DDL statements would be great if it is out there.
My question has two parts, but is aimed at the first issue: is there a tool or method by which I can create a master schema that describes all of my data, given that many instances are missing various fields (and I have tens of gigabytes of JSON data)? The second part concerns the serialization process: does there exist a library (ideally Python) that would take a schema file and a JSON object and output the DML to insert it into an RDBMS?
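For the first part, one simple approach (a sketch, not a full tool) is to union the field names and observed value types across all documents; this assumes one JSON document per file in a placeholder directory and only flat objects, which fits the minimal nesting described above:
import json
from collections import defaultdict
from pathlib import Path

# field name -> set of value type names observed across all documents
master = defaultdict(set)

for path in Path('json_docs').glob('*.json'):  # placeholder directory
    with open(path) as fp:
        doc = json.load(fp)
    for field, value in doc.items():
        master[field].add(type(value).__name__)

for field, types in sorted(master.items()):
    print(field, types)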
We just published this package at https://github.com/deepstartup/jsonutils. Maybe you will find it useful. If you need us to update something, open up a JIRA.
Try:
pip install DDLJ

from DDLj import genddl
genddl(*param1, param2, *param3, *param4)

where:
param1 = JSON schema file
param2 = database (default Oracle)
param3 = glossary file
param4 = DDL output script
Some draft Python for converting JSON to DDL. You'll have to adapt it for JSON schema.
import json
import sys

with open(sys.argv[1]) as fp:
    jsobj = json.load(fp)

# Join the column definitions so there is no trailing comma before ');'
columns = ['{} {}'.format(elt['name'], elt['type']) for elt in jsobj['fields']]
print('CREATE TABLE mytable (')  # placeholder table name
print(',\n'.join(columns))
print(');')
