I have a JSON config file in which I store the path to my data.
The data is partitioned by month and day, so without the JSON I would use an f-string like:
spark.read.parquet(f"home/data/month={MONTH}/day={DAY}")
Now I want to read that path from the JSON file. However, I run into problems with the MONTH and DAY variables. I do not want to split the path in the JSON.
But writing it like this:
{
"path":"home/data/month={MONTH}/day={DAY}"
}
and loading with:
DAY="1"
MONTH="12"
conf_path=pandas.read_json("...")
path=conf_path["path"]
data=spark.read.parquet(f"{path}")
does not work as intended.
Could you point me to a way of retrieving a path with variable elements and filling them in after reading? How would you store or retrieve the path without splitting it? Thanks
------- EDIT: SOLUTION --------
Thanks to Deepak Tripathi's answer below, the solution is to use str.format,
with code like this:
day="1"
month="12"
conf_path=pandas.read_json("...")
path=conf_path["path"]
data=spark.read.parquet(path.format(MONTH=month, DAY=day))
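The same idea can be tried without Spark. This is only a minimal sketch: it assumes the config is parsed with the standard json module instead of pandas, and the config text and the helper name load_path_template are made up for illustration.

```python
import json

# Hypothetical config contents; in practice this would come from a file.
CONFIG_TEXT = '{"path": "home/data/month={MONTH}/day={DAY}"}'


def load_path_template(config_text):
    """Return the path template stored under the "path" key."""
    return json.loads(config_text)["path"]


template = load_path_template(CONFIG_TEXT)

# str.format fills the named placeholders at call time,
# so the values can be defined after the config is read.
resolved = template.format(MONTH="12", DAY="1")
print(resolved)  # home/data/month=12/day=1
```

The resolved string can then be passed to the reader in place of the f-string.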
You should use str.format() instead of f-strings.
If you still want to use f-strings, you can use eval like this, but it is unsafe:
import pandas as pd

DAY="1"
MONTH="12"
df = pd.DataFrame(
[{
"path":"home/data/month={MONTH}/day={DAY}"
},
{
"path":"home/data/month={MONTH}/day={DAY}"
}
]
)
a = df['path'][0]
print(eval(f"f'{a}'"))
#home/data/month=12/day=1
My JSON data looks like this (for example):
{
"data":
1:
"name": "Stackoverflow"
}
I want to get the highlighted data but I don't know how to do it. If I use print(json["singledata"]), it works. But if I use print(json["multiple": "datas"]), it does not work. How can I get the nested data?
Found the answer! Just index it one level at a time, like this:
data1 = json["data"]
print(type(data1["name"]))
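As a rough illustration of nested lookups: the JSON in the question is incomplete, so the sample payload below is an assumption, not the asker's real data.

```python
import json

# Assumed sample payload; the question's JSON is incomplete.
raw = '{"data": {"name": "Stackoverflow", "tags": ["python", "json"]}}'

parsed = json.loads(raw)

# Each pair of brackets goes one level deeper.
name = parsed["data"]["name"]
first_tag = parsed["data"]["tags"][0]

print(name)       # Stackoverflow
print(first_tag)  # python
```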
I am writing Python code to scrape website data through cURL. I converted the cURL command into Python code using https://curlconverter.com/ . The code works fine, but I want to customize it to my needs, for example in this line of code:
data = '{"appDate":{"startDate":"2022-01-05T18:30:00.000Z","endDate":"2022-01-06T18:30:00.000Z"},"page_number":1,"page_size":20,"sort":{"key":"AppointmentStartTime","order":-1}}'
After "startDate" I want to insert my variable (startdate), which I created like this:
variable code
I tried to add variables like this
data = '{"appDate":{"startDate":'+ startdate +,"endDate":'+ enddate +'},"page_number":1,"page_size":20,"sort":{"key":"AppointmentStartTime","order":-1}}' but this did not work.
Adding '+ str(startdate) +' did not help either.
Can anyone tell me how this should be done?
You might want to transform the json data string into a dictionary using the json module. Then you can freely manipulate and export your data.
import json
raw_data = '{"appDate":{"startDate":"2022-01-05T18:30:00.000Z","endDate":"2022-01-06T18:30:00.000Z"},"page_number":1,"page_size":20,"sort":{"key":"AppointmentStartTime","order":-1}}'
data = json.loads(raw_data) # load json from string to dict
data['appDate']['startDate'] = 23
data['appDate']['endDate'] = 42
print(json.dumps(data)) # export dict to json string
In the example you have shown, there is probably only a small mistake right after + startdate +, with the apostrophe. Compare carefully:
Your code (with the mistake):
data = '{"appDate":{"startDate":'+ startdate +,"endDate":'+ enddate +'},"page_number":1,"page_size":20,"sort":{"key":"AppointmentStartTime","order":-1}}'
                                              ^
SyntaxError: invalid syntax
Fixed code:
data = '{"appDate":{"startDate":'+ startdate +',"endDate":'+ enddate +'},"page_number":1,"page_size":20,"sort":{"key":"AppointmentStartTime","order":-1}}'
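Note that with manual concatenation the date values also need their own JSON quotes inside the string, otherwise the result is not valid JSON. One way to avoid that, sketched here with example values copied from the question, is to build a dictionary and let json.dumps produce the string.

```python
import json

# Example values; in the question these come from the asker's own variables.
startdate = "2022-01-05T18:30:00.000Z"
enddate = "2022-01-06T18:30:00.000Z"

payload = {
    "appDate": {"startDate": startdate, "endDate": enddate},
    "page_number": 1,
    "page_size": 20,
    "sort": {"key": "AppointmentStartTime", "order": -1},
}

# json.dumps adds the quotes and escaping that manual concatenation has to get right.
data = json.dumps(payload)
print(data)
```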
I want to filter the JSON for entries whose operatingSystem is linux, but I have two problems. I don't know how to address this part of the JSON in a dictionary:
'' : {
and I don't know how to represent keys like this one with a wildcard:
"DQ578CGN99KG6ECF" : {
Could anyone help me, please?
import json
import urllib2
response=urllib2.urlopen('https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/index.json')
url=response.read()
urlj=json.loads(url)
filterx=[x for x in urlj if x['??']['??']["attributes"]["operatingSystem"] == 'linux']
I'm not sure about the wildcard representation. I'll look into it and get back to you. Meanwhile, I have already worked with this json before so I can tell you how to access the information you need.
The information you need can be obtained as follows:
for each_product in urlj['products']:  # urlj is the parsed JSON from the question
    if urlj['products'][each_product]['attributes']['operatingSystem'] == "linux":
        pass  # your code here
If you need pricing information from the json you need to take the product code string and look into the priceDimensions field for it. Look at the sample json and code accordingly.
https://aws.amazon.com/blogs/aws/new-aws-price-list-api/
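No wildcard is needed for the SKU keys, because iterating over the dictionary visits every key. The sketch below assumes the layout described above (a top-level "products" object keyed by SKU). The sample data and the helper name filter_by_os are made up, and the case-insensitive comparison and the tolerance for a missing operatingSystem attribute are precautions, not documented behavior of the price list.

```python
# Tiny stand-in for the "products" section of the price list; SKUs and values are made up.
sample = {
    "products": {
        "SKU0001": {"attributes": {"operatingSystem": "Linux", "instanceType": "t2.micro"}},
        "SKU0002": {"attributes": {"operatingSystem": "Windows", "instanceType": "t2.micro"}},
        "SKU0003": {"attributes": {"location": "US East (N. Virginia)"}},  # no operatingSystem
    }
}


def filter_by_os(price_list, os_name):
    """Return {sku: product} for products whose operatingSystem matches, ignoring case."""
    wanted = os_name.lower()
    return {
        sku: product
        for sku, product in price_list["products"].items()
        if product.get("attributes", {}).get("operatingSystem", "").lower() == wanted
    }


linux_products = filter_by_os(sample, "linux")
print(sorted(linux_products))  # ['SKU0001']
```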
I have a JSON object saved in test_data and I need to know whether the string in test_data['sign_in_info']['package_type'] contains "vacation_package". I assumed that in could help, but I'm not sure how to use it properly or whether it's correct to use it here. This is an example of the JSON object:
"checkout_details": {
"file_name" : "pnc04",
"test_directory" : "test_pnc04_package_today3_signedout_noinsurance_cc",
"scope": "wdw",
"number_of_adults": "2",
"number_of_children": "0",
"sign_in_info": {
"should_login": false,
"package_type": "vacation_package"
},
Here the package type has "vacation_package" in it, but that is not always the case.
For now I'm only saving the data this way:
package_type = test_data['sign_in_info']['package_type']
Now, is it ok to do something like:
p= "vacation_package"
if(p in package_type):
....
Or do I have to use 're' to cut the string and find it that way?
Your answer depends on what exactly you expect to get from test_data['sign_in_info']['package_type']. Will 'vacation_package' always be by itself? Then in is fine, and == would be stricter. Could it be part of a larger string? Then in still works, since it tests for a substring; re.search is only needed if you have to match a pattern rather than fixed text (and it can be a good opportunity to practice regular expressions).
No need to use re, assuming you are using the json package. Yes, it's okay to do that, but are you trying to see if there is a "package type" listed, or if the package type contains vacation_package, possibly among other things? If not, this might be closer to what you want, as it checks for exact matches:
import json

with open('file.json') as f:  # close the file when done
    data = json.load(f)

if data['sign_in_info'].get('package_type') == "vacation_package":
    pass  # do something
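To make the difference between the checks concrete, here is a small sketch with a made-up package_type value.

```python
import re

package_type = "vacation_package_deluxe"  # made-up value for illustration

# Substring test: True whenever the text appears anywhere in the string.
contains = "vacation_package" in package_type

# Exact match: True only if the whole string is equal.
exact = package_type == "vacation_package"

# Regex is only needed for pattern rules, e.g. the value must start with the text.
starts_with = re.match(r"vacation_package", package_type) is not None

print(contains, exact, starts_with)  # True False True
```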
I'm fairly new to JavaScript, so I don't know if this is worded correctly, but I'm trying to parse a JSON object that I read from a database. A Python script using Django sends the variable to the HTML page, where the variable looks like this:
{
"data":{
"nodes":[
{
"id":"n0",
"label":"Redditor(user_name='awesomeasianguy')"
},
...
]
}
}
Currently, the response looks like:
"{u'data': {u'nodes': [{u'id': u'n0', u'label': u"Redditor(user_name='awesomeasianguy')"}, ...
I tried to strip out characters like u' with a chain of replace calls, as seen below. However, this is not a clean solution, and it seems like there has to be a better way to handle those characters.
var networ_json = JSON.parse("{{ networ_json }}".replace(/u'/g, '"').replace(/'/g, '"').replace(/u"/g, '"').replace(/"/g, '"'));
If there are any suggestions on a method I'm not using or even a tool to use for this, it would be greatly appreciated.
Use the template filter "|safe" to disable escaping, like this:
var networ_json = JSON.parse("{{ networ_json|safe }}");
Read up on it here: https://docs.djangoproject.com/en/dev/ref/templates/builtins/#safe
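The u'...' prefixes in the response suggest that the template is rendering the Python dict's text representation rather than JSON. A possible complement, sketched under the assumption that the view can serialize the dict with json.dumps before putting it in the template context (the view itself is not shown in the question; the variable name networ_json follows the question):

```python
import json

# Same shape as the data in the question.
network = {"data": {"nodes": [{"id": "n0", "label": "Redditor(user_name='awesomeasianguy')"}]}}

# str() gives Python syntax (single quotes), which JSON.parse cannot read.
as_repr = str(network)

# json.dumps gives real JSON text that can be passed to the template context.
networ_json = json.dumps(network)

print(as_repr)
print(networ_json)
```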