Upload a DynamoDB array from a JSON string in boto3 - Python

To upload a JSON file to an AWS DynamoDB table in Python I am happily using the script found on this page, but I can't work out whether it is possible to tell Python to split a single string from the JSON file on a specific character, in order to create an array of elements in DynamoDB.
For example, let's use this data.json file
[
  {
    "artist": "Romero Allen",
    "song": "Atomic Dim",
    "id": "b4b0da3f-36e3-4569-b196-3ad982f72bbd",
    "priceUsdCents": 392,
    "publisher": "QUAREX|IME|RUME"
  },
  {
    "artist": "Hilda Barnes",
    "song": "Almond Dutch",
    "id": "eeb58c73-603f-4d6b-9e3b-cf587488f488",
    "priceUsdCents": 161,
    "publisher": "LETPRO|SOUNDSCARE"
  }
]
and this script.py file
import boto3
import json

dynamodb = boto3.client('dynamodb')

def upload():
    with open('data.json', 'r') as datafile:
        records = json.load(datafile)
        for song in records:
            print(song)
            item = {
                'artist': {'S': song['artist']},
                'song': {'S': song['song']},
                'id': {'S': song['id']},
                'priceUsdCents': {'S': str(song['priceUsdCents'])},
                'publisher': {'S': song['publisher']}
            }
            print(item)
            response = dynamodb.put_item(
                TableName='basicSongsTable',
                Item=item
            )
            print("UPLOADING ITEM")
            print(response)

upload()
My goal is to edit the script so that the publisher column does not hold the single string
publisher: "QUAREX|IME|RUME"
but an array of elements
publisher: ["QUAREX", "IME", "RUME"]
For me, an extra edit of the JSON file with Python before running the upload script is an option.

You can just use .split('|') and wrap each piece as a typed string value, since the low-level client's list type 'L' expects a list of attribute-value maps:
item = {
    'artist': {'S': song['artist']},
    'song': {'S': song['song']},
    'id': {'S': song['id']},
    'priceUsdCents': {'S': str(song['priceUsdCents'])},
    'publisher': {'L': [{'S': p} for p in song['publisher'].split('|')]}
}

Related

JSON.py is deleting my json file when I run code

I am helping my teacher by creating a game where the K-2 kids can learn their passwords. I added this JSON file to create a save for the teacher so he doesn't have to re-add all the computers' names and passwords... But when I run my code, the JSON file gets wiped and I lose all of it... Luckily I had backups, but I can't have it erase for my teacher.
Python Code:
import json

with open("accounts.json", "w") as f:
    accountData = json.dumps(f)
    type(accountData)
JSON:
{ // not real names and passwords for security
  "accounts": [
    {
      "id": "1",
      "uname": "scarlett",
      "pword": "k,",
      "points": "0"
    },
    {
      "id": "2",
      "uname": "santiago",
      "pword": "k,",
      "points": "0"
    },
    {
      "id": "3",
      "uname": "harper",
      "pword": "k,",
      "points": "0"
    }
  ]
}
It's a wrong use of the json.dump method, and opening the file with mode "w" truncates it immediately, which is why the file gets wiped.
Here is part of the help for json.dump in Python 3.7.6:
Help on function dump in module json:

dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
    Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).
The first parameter should be the dict object which contains the names and passwords, and you missed this parameter.
The file object f should come second.
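As a minimal sketch of the intended flow (assuming accounts.json already holds the structure shown above): read with json.load in "r" mode, modify the dict, and only then write it back with json.dump.
import json

# Read the saved accounts (mode "r" does not truncate the file).
with open("accounts.json", "r") as f:
    accountData = json.load(f)

# ... modify accountData here, e.g. update a student's points ...
accountData["accounts"][0]["points"] = "5"

# Write the updated dict back out; note the argument order: (obj, fp).
with open("accounts.json", "w") as f:
    json.dump(accountData, f, indent=2)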

How to convert JSON data to PDF using python script

I want to convert JSON data, which I am getting from an API, to PDF.
Example JSON data:
{
  "data": [
    {
      "state": "Manchester",
      "quantity": 20
    },
    {
      "state": "Surrey",
      "quantity": 46
    },
    {
      "state": "Scotland",
      "quantity": 36
    },
    {
      "state": "Kent",
      "quantity": 23
    },
    {
      "state": "Devon",
      "quantity": 43
    },
    {
      "state": "Glamorgan",
      "quantity": 43
    }
  ]
}
I found this script:
http://code.activestate.com/recipes/578979-convert-json-to-pdf-with-python-and-xtopdf/
but I am getting the error
no module PDFWriter
Is there any other way to convert JSON data to PDF?
Please help.
The module PDFWriter is in xtopdf:
PDFWriter - a core class of the xtopdf toolkit - can now be used with
a Python context manager, a.k.a. the Python with statement.
( http://code.activestate.com/recipes/578790-use-pdfwriter-with-context-manager-support/ )
How to install xtopdf is described at https://bitbucket.org/vasudevram/xtopdf:
Installation and usage:
To install the files, first make sure that you have downloaded and
installed all the prerequisites mentioned above, including setup
steps such as adding needed directories to your PYTHONPATH. Then, copy
all the files in xtopdf.zip into a directory which is on your
PYTHONPATH.
To use any of the Python programs, run the .py file as:
python filename.py
This will give a usage message about the correct usage and arguments
expected.
To run the shell script(s), do the same as above.
Developers can look at the source code for further information.
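If PDFWriter is on your PYTHONPATH, a minimal sketch along the lines of the context-manager recipe linked above might look like this; the method names (setFont, writeLine, savePage) are taken from that recipe and should be treated as assumptions rather than verified API:
from PDFWriter import PDFWriter

# Sketch only: write a few lines of text into a PDF using xtopdf's PDFWriter.
with PDFWriter('quantities.pdf') as pw:
    pw.setFont('Courier', 12)
    pw.writeLine('Manchester: 20')
    pw.writeLine('Surrey: 46')
    pw.savePage()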
An alternative is to use pdfdocument to create the PDF; it can be installed using pip ( https://pypi.python.org/pypi/pdfdocument ).
Parse the data from the JSON ( How can I parse GeoJSON with Python, Parse JSON in Python ) and print it as a PDF using pdfdocument; a sketch tying the two together follows the snippet below.
import json
data = json.loads(datastring)

from io import BytesIO
from pdfdocument.document import PDFDocument

def say_hello():
    f = BytesIO()
    pdf = PDFDocument(f)
    pdf.init_report()
    pdf.h1('Hello World')
    pdf.p('Creating PDFs made easy.')
    pdf.generate()
    return f.getvalue()
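A minimal sketch tying the two together, assuming the example JSON above is in a string called datastring and using only the pdfdocument calls shown in the snippet (init_report, h1, p, generate): parse the "data" list and write one paragraph per state.
import json
from io import BytesIO
from pdfdocument.document import PDFDocument

def json_to_pdf(datastring):
    # Pull out the list of {"state", "quantity"} records.
    records = json.loads(datastring)["data"]

    f = BytesIO()
    pdf = PDFDocument(f)
    pdf.init_report()
    pdf.h1('Quantities by state')
    for record in records:
        # One paragraph per record, e.g. "Manchester: 20"
        pdf.p('{}: {}'.format(record['state'], record['quantity']))
    pdf.generate()
    return f.getvalue()

with open('report.pdf', 'wb') as out:
    out.write(json_to_pdf(datastring))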
from json2html import *
import json
import pdfkit

class PdfConverter(object):

    def __init__(self):
        pass

    def to_html(self, json_doc):
        return json2html.convert(json=json_doc)

    def to_pdf(self, html_str):
        return pdfkit.from_string(html_str, None)

def main():
    stowflw = {
        "data": [
            {
                "state": "Manchester",
                "quantity": 20
            },
            {
                "state": "Surrey",
                "quantity": 46
            },
            {
                "state": "Scotland",
                "quantity": 36
            },
            {
                "state": "Kent",
                "quantity": 23
            },
            {
                "state": "Devon",
                "quantity": 43
            },
            {
                "state": "Glamorgan",
                "quantity": 43
            }
        ]
    }
    pdfc = PdfConverter()
    with open("sample.pdf", "wb") as pdf_fl:
        pdf_fl.write(pdfc.to_pdf(pdfc.to_html(json.dumps(stowflw))))

if __name__ == '__main__':
    main()
Install json2html and pdfkit (pdfkit requires the wkhtmltopdf binary, distributed as wkhtmltox).
When you run this code, it will generate a PDF for this URL (an API):
import pdfkit
pdfkit.from_url('https://api.covid19api.com/summary', 'india.pdf')
You can also generate a PDF from different inputs, such as a file, an HTML string, plain text, or multiple URLs:
import json
import requests
response = requests.get('https://api.covid19api.com/summary').text
# loads converts a string to a JSON object
json_object = json.loads(response)
# json.dumps converts a JSON object to a string
print(json.dumps(json_object, indent=1))
#different formats
pdfkit.from_url('http://aj7t.me', 'output.pdf')
pdfkit.from_file('test.html', 'output.pdf')
pdfkit.from_string('Hello!', 'output.pdf')
👍 For more information, please check the documentation!

How can I parse JSON into a binary Avro file using the Python Avro api?

I am able to use the avro-tools-1.7.7.jar to take JSON data and an Avro schema and output a binary Avro file, as shown here https://github.com/miguno/avro-cli-examples#json-to-avro. However, I want to be able to do this programmatically using the Avro Python API: https://avro.apache.org/docs/1.7.7/gettingstartedpython.html.
In their example they show how you can write a record at a time into a binary avro file.
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
schema = avro.schema.parse(open("user.avsc").read())
writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
My use case is writing all of the records at once like the avro-tools jar does from a json file, just in python code. I do not want to shell out and execute the jar. This will be deployed to Google App Engine if that matters.
This can be accomplished with fastavro. For example, given the schema in the link:
twitter.avsc
{
  "type": "record",
  "name": "twitter_schema",
  "namespace": "com.miguno.avro",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "doc": "Name of the user account on Twitter.com"
    },
    {
      "name": "tweet",
      "type": "string",
      "doc": "The content of the user's Twitter message"
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "Unix epoch time in seconds"
    }
  ],
  "doc:": "A basic schema for storing Twitter messages"
}
And the json file:
twitter.json
{"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp": 1366150681 }
{"username":"BlizzardCS","tweet":"Works as intended. Terran is IMBA.","timestamp": 1366154481 }
You can use something like the following script to write out an avro file:
import json
from fastavro import json_reader, parse_schema, writer

with open("twitter.avsc") as fp:
    schema = parse_schema(json.load(fp))

with open("twitter.avro", "wb") as avro_file:
    with open("twitter.json") as fp:
        writer(avro_file, schema, json_reader(fp, schema))
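To sanity-check the output, a small sketch (assuming the twitter.avro file written above) reads the records back with fastavro's binary reader:
from fastavro import reader

# Read the binary Avro file back and print each record as a Python dict.
with open("twitter.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record["username"], record["timestamp"])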

Extract JSON Data in Python - Example Code Included

I am brand new to using JSON data and fairly new to Python. I am struggling to parse the following JSON data in Python, in order to import the data into a SQL Server database. I already have a program that will import the parsed data into SQL Server using pyodbc; however, I can't for the life of me figure out how to correctly parse the JSON data into a Python dictionary.
I know there are a number of threads that address this issue, however I was unable to find any examples of the same JSON data structure. Any help would be greatly appreciated as I am completely stuck on this issue. Thank you SO! Below is a cut of the JSON data I am working with:
{
  "data": [
    {
      "name": "Mobile Application",
      "url": "https://www.example-url.com",
      "metric": "users",
      "package": "example_pkg",
      "country": "USA",
      "data": [
        [ 1396137600000, 5.76 ],
        [ 1396224000000, 5.79 ],
        [ 1396310400000, 6.72 ],
        ....
        [ 1487376000000, 7.15 ]
      ]
    }
  ],
  "as_of": "2017-01-22"
}
Again, I apologize if this thread is repetitive, however as I mentioned above, I was not able to work out the logic from other threads as I am brand new to using JSON.
Thank you again for any help or advice in regard to this.
import json

with open("C:\\Pathyway\\7Park.json") as json_file:
    data = json.load(json_file)
    assert data["data"][0]["metric"] == "users"
The above code results in the following error:
Traceback (most recent call last):
  File "JSONpy", line 10, in <module>
    data = json.load(json_file)
  File "C:\json\__init__.py", line 291, in load
    **kw)
  File "C:\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\json\decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 7 column 1 (char 23549 - 146249)
Assuming the data you've described (less the ... ellipsis) is in a file called j.json, this code parses the JSON document into a Python object:
import json

with open("j.json") as json_file:
    data = json.load(json_file)
    assert data["data"][0]["metric"] == "users"
From your error message it seems possible that your file is not a single JSON document, but a sequence of JSON documents separated by newlines. If that is the case, then this code might be more helpful:
import json

with open("j.json") as json_file:
    for line in json_file:
        data = json.loads(line)
        print(data["data"][0]["metric"])
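Since the end goal is a SQL Server import, here is a minimal sketch (table and column names are hypothetical placeholders, and it assumes the single-document layout parsed above) that flattens each inner [timestamp, value] pair into a row suitable for a parameterized pyodbc insert:
import json

with open("j.json") as json_file:
    doc = json.load(json_file)

rows = []
for entry in doc["data"]:
    # Each entry carries metadata plus a nested "data" list of [timestamp_ms, value] pairs.
    for timestamp_ms, value in entry["data"]:
        rows.append((entry["name"], entry["country"], entry["metric"], timestamp_ms, value))

# With an existing pyodbc cursor (table/columns are placeholders):
# cursor.executemany(
#     "INSERT INTO AppMetrics (AppName, Country, Metric, TimestampMs, Value) VALUES (?, ?, ?, ?, ?)",
#     rows,
# )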

Do file size requirements change when importing a CSV file to MongoDB?

Background:
I'm attempting to follow a tutorial in which I'm importing a CSV file that's approximately 324MB
to MongoLab's sandbox plan (capped at 500MB), via pymongo in Python 3.4.
The file holds ~ 770,000 records, and after inserting ~ 164,000 I hit my quota and received:
raise OperationFailure(error.get("errmsg"), error.get("code"), error)
OperationFailure: quota exceeded
Question:
Would it be accurate to say the JSON-like structure of NoSQL takes more space to hold the same data as a CSV file? Or am I doing something screwy here?
Further information:
Here are the database metrics:
Here's the Python 3.4 code I used:
import sys
import pymongo
import csv

MONGODB_URI = '***credentials removed***'

def main(args):
    client = pymongo.MongoClient(MONGODB_URI)
    db = client.get_default_database()
    projects = db['projects']
    with open('opendata_projects.csv') as f:
        records = csv.DictReader(f)
        projects.insert(records)
    client.close()

if __name__ == '__main__':
    main(sys.argv[1:])
Yes, JSON takes up much more space than CSV. Here's an example:
name,age,job
Joe,35,manager
Fred,47,CEO
Bob,23,intern
Edgar,29,worker
Translated into JSON, it would be:
[
  {
    "name": "Joe",
    "age": 35,
    "job": "manager"
  },
  {
    "name": "Fred",
    "age": 47,
    "job": "CEO"
  },
  {
    "name": "Bob",
    "age": 23,
    "job": "intern"
  },
  {
    "name": "Edgar",
    "age": 29,
    "job": "worker"
  }
]
Even with all whitespace removed, the JSON is 158 characters, while the CSV is only 69 characters.
Not accounting for things like compression, a set of JSON documents takes up more space than a CSV, because the field names are repeated in each record, whereas in the CSV the field names appear only in the first row.
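As a quick check of those numbers, here is a small sketch that builds both representations in Python and counts their characters:
import csv
import io
import json

people = [
    {"name": "Joe", "age": 35, "job": "manager"},
    {"name": "Fred", "age": 47, "job": "CEO"},
    {"name": "Bob", "age": 23, "job": "intern"},
    {"name": "Edgar", "age": 29, "job": "worker"},
]

# JSON with all whitespace removed.
json_text = json.dumps(people, separators=(',', ':'))

# CSV with a single header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age", "job"], lineterminator="\n")
writer.writeheader()
writer.writerows(people)
csv_text = buf.getvalue().rstrip("\n")

print(len(json_text), len(csv_text))  # 158 vs 69 characters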
The way files are allocated is another factor:
In the filesize section of the Database Metrics screenshot you attached, notice that it says that the first file allocated is 16MB, then the next one is 32MB, and so on. So when your data grew past 240MB total, you had 5 files, of 16MB, 32MB, 64MB, 128MB, and 256MB. This explains why your filesize total is 496MB, even though your data size is only about 317MB. The next file that would be allocated would be 512MB, which would put you way past the 500MB limit.
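If you want to check these numbers against your own deployment, a minimal sketch (assuming a pymongo database handle db, as in the question's main()) reads the server's dbstats to compare the logical data size with the allocated storage:
# Sizes are reported in bytes when no scale is passed.
stats = db.command("dbstats")
print("dataSize: %.1f MB" % (stats["dataSize"] / 1024 / 1024))
print("storageSize: %.1f MB" % (stats["storageSize"] / 1024 / 1024))
# On MMAPv1 deployments the preallocated files are reported as fileSize.
print("fileSize: %.1f MB" % (stats.get("fileSize", 0) / 1024 / 1024))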
