I want to convert JSON data, fetched from an API, to a PDF file.
Example JSON data:
{
    "data": [
        {
            "state": "Manchester",
            "quantity": 20
        },
        {
            "state": "Surrey",
            "quantity": 46
        },
        {
            "state": "Scotland",
            "quantity": 36
        },
        {
            "state": "Kent",
            "quantity": 23
        },
        {
            "state": "Devon",
            "quantity": 43
        },
        {
            "state": "Glamorgan",
            "quantity": 43
        }
    ]
}
I found this script:
http://code.activestate.com/recipes/578979-convert-json-to-pdf-with-python-and-xtopdf/
but I get the error:
no module PDFWriter
Is there another way to convert JSON data to PDF? Please help.
The module PDFWriter is part of xtopdf.
PDFWriter - a core class of the xtopdf toolkit - can be used with a Python context manager, i.e. the Python with statement
( http://code.activestate.com/recipes/578790-use-pdfwriter-with-context-manager-support/ ).
Installation instructions for xtopdf are at https://bitbucket.org/vasudevram/xtopdf :
Installation and usage:
To install the files, first make sure that you have downloaded and
installed all the prerequisites mentioned above, including setup
steps such as adding needed directories to your PYTHONPATH. Then, copy
all the files in xtopdf.zip into a directory which is on your
PYTHONPATH.
To use any of the Python programs, run the .py file as:
python filename.py
This will give a usage message about the correct usage and arguments
expected.
To run the shell script(s), do the same as above.
Developers can look at the source code for further information.
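Once xtopdf and its prerequisites are installed, a minimal sketch for writing the example JSON to a PDF could look like the following. The PDFWriter calls (setFont, setHeader, setFooter, writeLine, savePage) and the context-manager usage are taken from the linked recipes, so treat them as assumptions if your xtopdf version differs:

import json
from PDFWriter import PDFWriter  # provided by xtopdf; must be on your PYTHONPATH

# datastring is the JSON text returned by the API
data = json.loads(datastring)

# Context-manager usage per the linked recipe; it closes the writer on exit
with PDFWriter('quantities.pdf') as pw:
    pw.setFont('Courier', 12)
    pw.setHeader('Quantities by state')
    pw.setFooter('Generated from API JSON')
    for row in data['data']:
        pw.writeLine('{}: {}'.format(row['state'], row['quantity']))
    pw.savePage()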
An alternative is to use pdfdocument to create the PDF; it can be installed with pip ( https://pypi.python.org/pypi/pdfdocument ).
Parse the JSON data (see "How can I parse GeoJSON with Python" and "Parse JSON in Python") and write it out as a PDF using pdfdocument:
import json
from io import BytesIO
from pdfdocument.document import PDFDocument

# datastring is the JSON text received from the API
data = json.loads(datastring)

def say_hello():
    f = BytesIO()
    pdf = PDFDocument(f)
    pdf.init_report()
    pdf.h1('Hello World')
    pdf.p('Creating PDFs made easy.')
    pdf.generate()
    return f.getvalue()
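To apply the same idea to the JSON from the question, a minimal sketch (using only the PDFDocument calls shown above; the output filename is just an example) could be:

import json
from pdfdocument.document import PDFDocument

def json_to_pdf(datastring, filename='report.pdf'):
    # datastring is the JSON text returned by the API
    data = json.loads(datastring)
    with open(filename, 'wb') as f:
        pdf = PDFDocument(f)
        pdf.init_report()
        pdf.h1('Quantities by state')
        for row in data['data']:
            pdf.p('{}: {}'.format(row['state'], row['quantity']))
        pdf.generate()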
from json2html import *
import json
import pdfkit

class PdfConverter(object):
    def __init__(self):
        pass

    def to_html(self, json_doc):
        return json2html.convert(json=json_doc)

    def to_pdf(self, html_str):
        # from_string with output_path=None returns the PDF as bytes
        return pdfkit.from_string(html_str, None)

def main():
    stowflw = {
        "data": [
            {
                "state": "Manchester",
                "quantity": 20
            },
            {
                "state": "Surrey",
                "quantity": 46
            },
            {
                "state": "Scotland",
                "quantity": 36
            },
            {
                "state": "Kent",
                "quantity": 23
            },
            {
                "state": "Devon",
                "quantity": 43
            },
            {
                "state": "Glamorgan",
                "quantity": 43
            }
        ]
    }
    pdfc = PdfConverter()
    with open("sample.pdf", "wb") as pdf_fl:
        pdf_fl.write(pdfc.to_pdf(pdfc.to_html(json.dumps(stowflw))))

if __name__ == "__main__":
    main()
Install json2html and pdfkit (pdfkit needs the wkhtmltopdf binary, which ships in the wkhtmltox package).
Running the code below will generate a PDF from this URL (an API endpoint):
import pdfkit
pdfkit.from_url('https://api.covid19api.com/summary', 'india.pdf')
You can also generate a PDF from other inputs, such as a file, an HTML string, plain text, or multiple URLs.
import json
import requests
response = requests.get('https://api.covid19api.com/summary').text
# json.loads converts a JSON string to a Python object
json_object = json.loads(response)
# json.dumps converts a Python object back to a JSON string
print(json.dumps(json_object, indent=1))
#different formats
pdfkit.from_url('http://aj7t.me', 'output.pdf')
pdfkit.from_file('test.html', 'output.pdf')
pdfkit.from_string('Hello!', 'output.pdf')
👍 For more information, please check the documentation!
To upload a JSON file to an AWS DynamoDB table in Python, I am happily using the script found on this page, but I can't work out whether it is possible to have Python split a single string field of the JSON file on a specific character, so that it becomes an array of elements in DynamoDB.
For example, let's use this data.json file
[
    {
        "artist": "Romero Allen",
        "song": "Atomic Dim",
        "id": "b4b0da3f-36e3-4569-b196-3ad982f72bbd",
        "priceUsdCents": 392,
        "publisher": "QUAREX|IME|RUME"
    },
    {
        "artist": "Hilda Barnes",
        "song": "Almond Dutch",
        "id": "eeb58c73-603f-4d6b-9e3b-cf587488f488",
        "priceUsdCents": 161,
        "publisher": "LETPRO|SOUNDSCARE"
    }
]
and this script.py file
import boto3
import json

dynamodb = boto3.client('dynamodb')

def upload():
    with open('data.json', 'r') as datafile:
        records = json.load(datafile)
    for song in records:
        print(song)
        item = {
            'artist': {'S': song['artist']},
            'song': {'S': song['song']},
            'id': {'S': song['id']},
            'priceUsdCents': {'S': str(song['priceUsdCents'])},
            'publisher': {'S': song['publisher']}
        }
        print(item)
        response = dynamodb.put_item(
            TableName='basicSongsTable',
            Item=item
        )
        print("UPLOADING ITEM")
        print(response)

upload()
My target is to edit the script so the publisher column won't include the string
publisher: "QUAREX|IME|RUME"
but a nested array of elements
publisher:["QUAREX","IME","RUME"]
For me, an extra edit of the JSON file with Python before running the upload script is an option.
You can just use .split('|'). Note that with the low-level boto3 client, a DynamoDB list ('L') must contain typed values, so each split element has to be wrapped as {'S': ...}:
item = {
    'artist': {'S': song['artist']},
    'song': {'S': song['song']},
    'id': {'S': song['id']},
    'priceUsdCents': {'S': str(song['priceUsdCents'])},
    'publisher': {'L': [{'S': p} for p in song['publisher'].split('|')]}
}
I am helping my teacher by creating a game where the K-2 kids can learn their passwords. I added this JSON file as a save file so the teacher doesn't have to re-enter all the computers' names and passwords. But when I run my code, the JSON file gets wiped and I lose everything in it. Luckily I had backups, but I can't have it erase the file for my teacher.
Python Code:
import json

with open("accounts.json", "w") as f:
    accountData = json.dumps(f)
    type(accountData)
JSON (not the real names and passwords, for security):
{
    "accounts": [
        {
            "id": "1",
            "uname": "scarlett",
            "pword": "k,",
            "points": "0"
        },
        {
            "id": "2",
            "uname": "santiago",
            "pword": "k,",
            "points": "0"
        },
        {
            "id": "3",
            "uname": "harper",
            "pword": "k,",
            "points": "0"
        }
    ]
}
This is an incorrect use of json.dump.
Here is part of the help for json.dump in Python 3.7.6:
Help on function dump in module json:
dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
    Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).
The first parameter should be the dict containing the names and passwords, which you are missing, and the file object f should be the second argument. Also note that opening the file with mode "w" truncates it immediately, which is why accounts.json gets wiped before anything is written.
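A minimal sketch of the intended round trip, assuming you want to read the accounts at startup and only overwrite the file when saving (file and variable names follow the question):

import json

# Read the saved accounts without destroying the file
with open('accounts.json', 'r') as f:
    accountData = json.load(f)   # dict with the "accounts" list

# ... the game updates accountData['accounts'] here ...

# Write the data back only when saving
with open('accounts.json', 'w') as f:
    json.dump(accountData, f, indent=2)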
I'm trying to split a JSON file into two different XML files; an example is below.
I'm trying to use a Python script to do this, though a Groovy script would work as well. This split is part of a file transformation in Apache NiFi.
JSON file :
{
    "Cars": {
        "Car": [{
            "Brand": "Volkswagon",
            "Country": "Germany",
            "Type": "All",
            "Models": [{
                "Polo": {
                    "Type": "Hatchback",
                    "Color": "White",
                    "Cost": "10000"
                }
            }, {
                "Golf": {
                    "Type": "Hatchback",
                    "Color": "White",
                    "Cost": "12000"
                }
            }]
        }]
    }
}
Split into two XML files:
XML 1 :
<VehicleEntity>
<VehicleEntity>
<GlobalBrandId>Car123</GlobalBrandId>
<Name>Random Value</Name>
<Brand>Volkswagon</Brand>
</VehicleEntity>
</VehicleEntity>
XML 2 :
<VehicleEntityDetail>
<VehicleEntityDetailsEntity>
<GlobalBrandId>Car123</GlobalBrandId>
<Brand>Volkswagon</Brand>
<Type>Hatchback</Type>
<Color>White</Color>
<Cost>10000</Cost>
</VehicleEntityDetailsEntity>
</VehicleEntityDetail>
The XML tag names are a little different from the elements in the JSON file.
I'm looking for the best possible way to achieve this, but would prefer a Python script since I have some experience working with Python.
Any other solution for Apache NiFi is also appreciated.
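There is no accepted approach here, but a minimal sketch with the standard library's xml.etree.ElementTree could look like the following. The GlobalBrandId and Name values do not exist in the JSON, so the placeholders below are assumptions, as are the input filename and the rule that the detail file is built from the first model entry:

import json
import xml.etree.ElementTree as ET

with open("cars.json") as fp:          # assumed input filename
    cars = json.load(fp)["Cars"]["Car"]

car = cars[0]

# XML 1: the vehicle entity
root1 = ET.Element("VehicleEntity")
entity = ET.SubElement(root1, "VehicleEntity")
ET.SubElement(entity, "GlobalBrandId").text = "Car123"   # placeholder, not in the JSON
ET.SubElement(entity, "Name").text = "Random Value"      # placeholder, not in the JSON
ET.SubElement(entity, "Brand").text = car["Brand"]
ET.ElementTree(root1).write("vehicle_entity.xml")

# XML 2: the detail entity, taken here from the first model (e.g. "Polo")
model_name, model = next(iter(car["Models"][0].items()))
root2 = ET.Element("VehicleEntityDetail")
detail = ET.SubElement(root2, "VehicleEntityDetailsEntity")
ET.SubElement(detail, "GlobalBrandId").text = "Car123"   # placeholder
ET.SubElement(detail, "Brand").text = car["Brand"]
ET.SubElement(detail, "Type").text = model["Type"]
ET.SubElement(detail, "Color").text = model["Color"]
ET.SubElement(detail, "Cost").text = model["Cost"]
ET.ElementTree(root2).write("vehicle_entity_detail.xml")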
I have JSON files in S3, each containing an array of objects, as shown below.
[{
    "id": "c147162a-a304-11ea-aa90-0242ac110028",
    "clientId": "xxx",
    "contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
    "tags": {},
    "timestamp": 1592855898
}, {
    "id": "c147162a-a304-11ea-aa90-0242ac110028",
    "clientId": "yyy",
    "contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
    "tags": {},
    "timestamp": 1592855898
}]
I used a crawler to detect and load the schema into the catalog. It was successful and created a schema with a single column named array with data type array<struct<id:string,clientId:string,contextUUID:string,tags:string,timestamp:int>>.
Now I tried to load the data using the glueContext.create_dynamic_frame.from_catalog function, but I cannot see any data. I tried printing the schema and the data, as shown below.
ds = glueContext.create_dynamic_frame.from_catalog(
    database="dbname",
    table_name="tablename")

ds.printSchema()
# root

ds.schema()
# StructType([], {})

ds.show()
# empty

ds.toDF().show()
# ++
# ||
# ++
# ++
Any idea what I am doing wrong? I am planning to extract each object in the array and transform it to a different schema.
You can try passing a jsonPath in format_options to tell Glue how it should read the data. The following code has worked for me:
glueContext.create_dynamic_frame_from_options(
    's3',
    {'paths': ["s3://glue-test-bucket-12345/events/101-1.json"]},
    format="json",
    format_options={"jsonPath": "$[*]"}
).toDF()
I hope it solves the problem.
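Once the records load (jsonPath "$[*]" yields one row per array element), mapping them to a different schema can be sketched with plain Spark column operations; the target column names below are made up for illustration:

from pyspark.sql.functions import col

df = glueContext.create_dynamic_frame_from_options(
    's3',
    {'paths': ["s3://glue-test-bucket-12345/events/101-1.json"]},
    format="json",
    format_options={"jsonPath": "$[*]"}
).toDF()

# Select and rename fields into the target layout (names are illustrative)
transformed = df.select(
    col("id").alias("event_id"),
    col("clientId").alias("client_id"),
    col("contextUUID").alias("context_uuid"),
    col("timestamp").alias("event_time")
)
transformed.show()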
I am able to use avro-tools-1.7.7.jar to take JSON data and an Avro schema and output a binary Avro file, as shown here: https://github.com/miguno/avro-cli-examples#json-to-avro. However, I want to be able to do this programmatically using the Avro Python API: https://avro.apache.org/docs/1.7.7/gettingstartedpython.html.
In their example they show how to write one record at a time into a binary Avro file.
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
schema = avro.schema.parse(open("user.avsc").read())
writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
My use case is writing all of the records from a JSON file at once, like the avro-tools jar does, just in Python code. I do not want to shell out and execute the jar. This will be deployed to Google App Engine, if that matters.
This can be accomplished with fastavro. For example, given the schema in the link:
twitter.avsc
{
    "type": "record",
    "name": "twitter_schema",
    "namespace": "com.miguno.avro",
    "fields": [{
        "name": "username",
        "type": "string",
        "doc": "Name of the user account on Twitter.com"
    }, {
        "name": "tweet",
        "type": "string",
        "doc": "The content of the user's Twitter message"
    }, {
        "name": "timestamp",
        "type": "long",
        "doc": "Unix epoch time in seconds"
    }],
    "doc:": "A basic schema for storing Twitter messages"
}
And the json file:
twitter.json
{"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp": 1366150681 }
{"username":"BlizzardCS","tweet":"Works as intended. Terran is IMBA.","timestamp": 1366154481 }
You can use something like the following script to write out an avro file:
import json
from fastavro import json_reader, parse_schema, writer

with open("twitter.avsc") as fp:
    schema = parse_schema(json.load(fp))

with open("twitter.avro", "wb") as avro_file:
    with open("twitter.json") as fp:
        writer(avro_file, schema, json_reader(fp, schema))
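To verify the result, the file can be read back with fastavro's reader (a quick check, not part of the original answer):

from fastavro import reader

# Print every record that was written to twitter.avro
with open("twitter.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record)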