I'm trying to split a JSON file into two different XML files. Example below.
I'm trying to use a Python script to do this; a Groovy script would work as well. The split is part of a file transformation in Apache NiFi.
JSON file :
{
"Cars": {
"Car": [{
"Brand": "Volkswagon",
"Country": "Germany",
"Type": "All",
"Models":
[{
"Polo": {
"Type": "Hatchback",
"Color": "White",
"Cost": "10000"
}
}, {
"Golf": {
"Type": "Hatchback",
"Color": "White",
"Cost": "12000"
}
}
]
}
]
}
}
Split to two XML files :
XML 1 :
<VehicleEntity>
<VehicleEntity>
<GlobalBrandId>Car123</GlobalBrandId>
<Name>Random Value</Name>
<Brand>Volkswagon</Brand>
</VehicleEntity>
</VehicleEntity>
XML 2 :
<VehicleEntityDetail>
<VehicleEntityDetailsEntity>
<GlobalBrandId>Car123</GlobalBrandId>
<Brand>Volkswagon</Brand>
<Type>Hatchback</Type>
<Color>White</Color>
<Cost>10000</Cost>
</VehicleEntityDetailsEntity>
</VehicleEntityDetail>
The XML tag names are a little different to the elements in the JSON file.
I'm looking for the best possible way to achieve this, but I would prefer a Python script since I have some experience with Python.
Any other solution for Apache NiFi is also appreciated.
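For illustration, here is a minimal Python 3 sketch of one way to build the two documents with the standard library. It assumes the JSON has first been corrected so that it parses, that GlobalBrandId and Name come from outside the JSON (they are passed in as the hypothetical parameters brand_id and name), and that one VehicleEntityDetailsEntity per model is wanted. The function names are made up for the example, and the NiFi wiring is left out.

```python
import json
import xml.etree.ElementTree as ET


def build_vehicle_xml(car, brand_id, name):
    """Return XML 1 as a string for one entry of Cars.Car."""
    root = ET.Element("VehicleEntity")
    entity = ET.SubElement(root, "VehicleEntity")
    ET.SubElement(entity, "GlobalBrandId").text = brand_id
    ET.SubElement(entity, "Name").text = name
    ET.SubElement(entity, "Brand").text = car["Brand"]
    return ET.tostring(root, encoding="unicode")


def build_detail_xml(car, brand_id):
    """Return XML 2 as a string, with one detail entity per model."""
    root = ET.Element("VehicleEntityDetail")
    for model in car["Models"]:
        # Each entry is a one-key dict such as {"Polo": {...}}.
        for details in model.values():
            entity = ET.SubElement(root, "VehicleEntityDetailsEntity")
            ET.SubElement(entity, "GlobalBrandId").text = brand_id
            ET.SubElement(entity, "Brand").text = car["Brand"]
            for tag in ("Type", "Color", "Cost"):
                ET.SubElement(entity, tag).text = details[tag]
    return ET.tostring(root, encoding="unicode")


def split_cars(json_text, brand_id, name):
    """Return a list of (xml1, xml2) string pairs, one per car in the input."""
    cars = json.loads(json_text)["Cars"]["Car"]
    return [
        (build_vehicle_xml(car, brand_id, name), build_detail_xml(car, brand_id))
        for car in cars
    ]
```

Calling split_cars(json_text, "Car123", "Random Value") on the example above would give one pair of XML strings for the single car; the output is not pretty-printed.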
I have, for example, a log that changes each time it is run; an example is below. I would like to take one of the values (the id, say) as a variable and log only the id to the console, or use that value somewhere else.
[
{
"#type": "type",
"href": [
{
"#url": "url1",
"#method": "get"
},
{
"#url": "url2",
"#method": "post"
},
{
"#url": "url3",
"#method": "post"
}
],
"id": "3",
"meta": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key2",
"value": "value2"
}
]
}
]
I want to store the id in a variable because it changes every time the Robot Framework test is run.
You can see here that the JSON you are getting is a list. That means that to get a value out of it, you first need to parse the JSON, then take the dictionary out of the list, and only then access the key you want to extract.
Robot Framework supports Python expressions through the Evaluate keyword. When we simply need to parse a string as JSON, we can use
${DATA}=    Evaluate    json.loads("""${DATA}""")    modules=json
Note that ${DATA} here should contain your JSON as a string.
Now that we have the parsed JSON, we can do whatever we want with it. Your JSON is a dictionary nested inside a list (see the surrounding []). So first extract the dictionary from the list, then access the dictionary key as usual. The following should work (Get From List and Get From Dictionary come from the Collections library).
${DICT}=    Get From List    ${DATA}    0
${ID}=    Get From Dictionary    ${DICT}    id
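For comparison, the same three steps in plain Python, which is roughly what the keywords above do. The function name extract_id is made up for this sketch.

```python
import json


def extract_id(raw):
    """Parse a JSON string holding a list and return the id of its first item."""
    data = json.loads(raw)   # JSON text -> Python list
    first = data[0]          # the dictionary nested inside the list
    return first["id"]       # the value that changes between runs


print(extract_id('[{"id": "3", "meta": []}]'))  # prints 3
```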
I have JSON files in S3, each containing an array of objects, as shown below.
[{
"id": "c147162a-a304-11ea-aa90-0242ac110028",
"clientId": "xxx",
"contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
"tags": {},
"timestamp": 1592855898
}, {
"id": "c147162a-a304-11ea-aa90-0242ac110028",
"clientId": "yyy",
"contextUUID": "1bb6b39e-b181-4a6d-b43b-4040f9d254b8",
"tags": {},
"timestamp": 1592855898
}]
I used a crawler to detect the schema and load it into the catalog. It succeeded and created a schema with a single column named array with data type array<struct<id:string,clientId:string,contextUUID:string,tags:string,timestamp:int>>.
I then tried to load the data using the glueContext.create_dynamic_frame.from_catalog function, but I could not see any data. I tried printing the schema and data as shown below.
ds = glueContext.create_dynamic_frame.from_catalog(
database = "dbname",
table_name = "tablename")
ds.printSchema()
root
ds.schema()
StructType([], {})
ds.show()
empty
ds.toDF().show()
++
||
++
++
Any idea what I am doing wrong? I am planning to extract each object in the array and transform it to a different schema.
You can try passing a JsonPath expression in format_options to tell Glue how it should read the data. The following code worked for me:
glueContext.create_dynamic_frame_from_options('s3',
{
'paths': ["s3://glue-test-bucket-12345/events/101-1.json"]
},
format="json",
format_options={"jsonPath": "$[*]"}
).toDF()
I hope it solves the problem.
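As a rough illustration of what the JsonPath $[*] selects, here is a plain-Python sketch (no Glue involved) that treats each element of the top-level array as one record and maps it to a different shape. The function names and the target field names client and ts are invented for the example.

```python
import json


def records_from_array(text):
    """Return the elements of a top-level JSON array, i.e. what $[*] selects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return data


def to_target_schema(record):
    """Map one source object to a hypothetical target schema."""
    return {
        "id": record["id"],
        "client": record["clientId"],
        "ts": record["timestamp"],
    }
```

Each object in the file becomes one row, e.g. [to_target_schema(r) for r in records_from_array(text)].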
I have a .geojson file (call it data.geojson) which I use to manually update a dataset on Mapbox.
Suppose that my data.geojson file is structured as follows:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"suburb": "A",
"unemployed": 10
},
"geometry": {
"type": "Point",
"coordinates": [
0,
0
]
}
},
{
"type": "Feature",
"properties": {
"suburb": "B",
"unemployed": 20
},
"geometry": {
"type": "Point",
"coordinates": [
1,
1
]
}
}
]
}
data.geojson is stored locally, and every 12 hours the 'unemployed' property of each feature is updated by another Python script that scrapes data from the web.
Currently, in order to update these properties within the online dataset (stored at mapbox.com), I manually navigate to the Mapbox website and re-upload the data.geojson file. I am looking for a way to do this from Python.
Any help would be greatly appreciated!
You can set up a timer of some sort to automatically update the data using JavaScript functions. Here I am using a source and layer named "STI", which is just GeoJSON line data.
The function would first add the source of the data as well as the layer:
var STI_SOURCE = 'json/sti/STI.json'; // declare URL for data
map.addSource('sti', { type: 'geojson', data: STI_SOURCE }); // Add source using URL
// Add the actual layer using the source
map.addLayer({
"id": "sti",
"type": "line",
"source": "sti",
"layout": {
"line-join": "miter",
"line-cap": "round"
},
"paint": {
"line-color": "#fff",
"line-width": 1,
"line-dasharray": [6, 2]
}
});
Then, when you want to refresh the data, remove them:
map.removeLayer('sti');
map.removeSource('sti');
Then you can re-add them by starting again from the beginning. There are other (and better) ways to do this, but this is one way that works. GeoJSON sources also have a setData() method, which is a cleaner way to refresh the data. Hopefully this can get you started.
My solution, in the end, was simply to point the source of the Mapbox layer to the locally stored data.geojson file rather than the corresponding dataset stored online at mapbox.com.
I was able to edit the locally stored data.geojson using Python's json package. Since the Mapbox layer source points directly to the local file, all updates to it are reflected in the Mapbox layer. This way, there is no need to upload any data to Mapbox.
@David also posted a helpful solution if you wish to go down that route.
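The local-file update described above could look something like this minimal sketch. It assumes the scraper produces a dict mapping suburb names to new counts (called counts here); both function names are made up for the example.

```python
import json


def apply_unemployed(collection, counts):
    """Set properties.unemployed for every feature whose suburb is in counts."""
    for feature in collection["features"]:
        props = feature["properties"]
        if props["suburb"] in counts:
            props["unemployed"] = counts[props["suburb"]]
    return collection


def update_geojson_file(path, counts):
    """Read the GeoJSON at path, apply the new counts, and write it back."""
    with open(path) as fh:
        collection = json.load(fh)
    apply_unemployed(collection, counts)
    with open(path, "w") as fh:
        json.dump(collection, fh, indent=2)
```

For example, update_geojson_file("data.geojson", {"A": 12, "B": 18}) would rewrite the file in place; suburbs missing from counts keep their old value.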
I have a webpage that takes JSON input, which I submit with a button.
When I enter the JSON with the send_keys method, it doesn't work.
EMPTY_METADATAJSON=get_link("./appconfig.json")
wait.until(EC.presence_of_all_elements_located((By.ID, UIAppPublish.metadata_page_id)))
driver.find_element_by_id(UIAppPublish.metadata_input).send_keys(EMPTY_METADATAJSON)
Could you please help me load the JSON?
Assign this JSON to a variable like below:
jsonToEnter = {
"system_service": false,
"version": "1.0.0",
"checksum": "",
"machineConfig": {
"subscriptions": {
"sinumerik_hf_data": {
"payload": [{
"sinumerikUid": "hfdd_data",
"period": 2
}],
"source": "communicationAdapter",
"quality": "high_performance",
"isCloudMessage": false
}
}
}
}
Pass the same object as a string into the text box using WebDriver's sendKeys(), stringifying your JSON object like below:
driver.findElement(ElementLocator Of Text box).sendKeys(JSON.stringify(jsonToEnter));
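You can also try setting it via JavaScript, if the element has a value attribute.
Something like this (don't know Python, sorry; elementId stands for the actual id of the input, and the values are passed in as script arguments because jsonToEnter does not exist inside the page):
webdriver.executeScript("document.getElementById(arguments[0]).setAttribute('value', arguments[1])", elementId, JSON.stringify(jsonToEnter));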
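Since the question uses the Python bindings, here is a hedged Python sketch of the same two ideas: serialise the JSON to a single-line string first, then either type it or set the value through JavaScript. It assumes appconfig.json is readable from disk; the helper names compact_json and set_value_via_js are made up for the example.

```python
import json


def compact_json(path):
    """Read a JSON file and return it as a single-line string."""
    with open(path) as fh:
        return json.dumps(json.load(fh))


def set_value_via_js(driver, element, text):
    """Set the element's value through JavaScript instead of typing it."""
    # arguments[0] and arguments[1] are the extra values passed to execute_script.
    driver.execute_script("arguments[0].value = arguments[1];", element, text)
```

With element = driver.find_element_by_id(UIAppPublish.metadata_input) from the question, you could try element.send_keys(compact_json("./appconfig.json")) first and fall back to set_value_via_js(driver, element, compact_json("./appconfig.json")). Setting the value from a script does not fire the events that typing does, so a page that listens for them may not notice the change.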
I'm trying to encode a somewhat large JSON in Python (v2.7) and I'm having trouble putting in my variables!
As the JSON is multi-line and to keep my code neat I've decided to use the triple double quotation mark to make it look as follows:
my_json = """{
"settings": {
"serial": "1",
"status": "2",
"ersion": "3"
},
"config": {
"active": "4",
"version": "5"
}
}"""
Encoding and outputting this works well for me, but I'm not sure how to replace the numbers I have there with string variables. I've tried:
"settings": {
"serial": 'json_serial',
but to no avail. Any help would be appreciated!
Why don't you make it a dictionary, set the variables, and then use the json library to turn it into JSON?
import json
json_serial = "123"
my_json = {
'settings': {
"serial": json_serial,
"status": '2',
"ersion": '3',
},
'config': {
'active': '4',
'version': '5'
}
}
print(json.dumps(my_json))
If you absolutely insist on generating JSON with string concatenation -- and, to be clear, you absolutely shouldn't -- the only way to be entirely certain that your output is valid JSON is to generate the substrings being substituted with a JSON generator. That is (note the doubled braces, which str.format needs for a literal { or }):
'''"settings" : {{
"serial" : {serial},
"version" : {version}
}}'''.format(serial=json.dumps("5"), version=json.dumps(1))
But don't. Really, really don't. The answer by @davidejones is the Right Thing for this scenario.
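To see why, here is a small sketch comparing naive substitution with json.dumps for a value that contains a double quote. The serial value is invented for the example.

```python
import json

serial = 'A"1'  # hypothetical value containing a double quote

# Naive substitution: the embedded quote ends the string early.
naive = '{"settings": {"serial": "%s"}}' % serial
try:
    json.loads(naive)
    naive_is_valid = True
except ValueError:
    naive_is_valid = False

# Letting json.dumps build the document escapes the quote correctly.
safe = json.dumps({"settings": {"serial": serial}})
round_trip = json.loads(safe)["settings"]["serial"]

print(naive_is_valid)        # prints False
print(round_trip == serial)  # prints True
```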