Filtering JSON in python

Filtering JSON in python - python

I want to filter a json file where it only show me entries where content-type is application/json.
For now this is my code :
import json
with open('rob.json', 'r', encoding="utf8") as original_file:
data = json.load(original_file)
for line in data:
if line['value'] == 'application/json':
print(line)
The code I have written is very basic as I am quite a beginner when it comes to scripting. However it is not working and I have an error:
TypeError: string indices must be integers
I require some help on why I am having this error and whether there is a better alternative to filter a JSON file.
TIA

You have to understand the structure of the returned data. It is a dictionary containing one key ("log") that is also a dictionary. That dictionary contains an "entries" key which is a list. That list consists of dictionaries that have keys for "request" and "response". The "request" key has a "headers" key, which is a list of dictionaries containing "name" and "value" keys.
import json
with open('rob.json',encoding='utf8') as f:
data = json.load(f)
# Traverse the list of log entries:
for entry in data['log']['entries']:
# Traverse the list of headers:
for header in entry['response']['headers']:
# Look for the appropriate name and value.
if header['name'] == 'Content-Type' and header['value'] == 'application/json':
# I just print the request as the response is very long...
print(json.dumps(entry['request'],indent=2))
Output:
{
"method": "GET",
"url": "http://ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min.map",
"httpVersion": "HTTP/1.1",
"headers": [
{
"name": "Pragma",
"value": "no-cache"
},
{
"name": "Accept-Encoding",
"value": "gzip,deflate,sdch"
},
{
"name": "Host",
"value": "ajax.googleapis.com"
},
{
"name": "Accept-Language",
"value": "en-US,en;q=0.8"
},
{
"name": "User-Agent",
"value": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
},
{
"name": "Accept",
"value": "*/*"
},
{
"name": "Referer",
"value": "http://ericduran.github.io/chromeHAR/"
},
{
"name": "Connection",
"value": "keep-alive"
},
{
"name": "Cache-Control",
"value": "no-cache"
}
],
"queryString": [],
"cookies": [],
"headersSize": 412,
"bodySize": 0
}

import json
with open('rob.json','r',encoding="utf8") as original_file:
data = json.load(original_file)
for entry in data["log"]["entries"]:
res = entry["response"]
for header in res["headers"]:
if "application/json" in header["value"]:
print(header)
I really don't know what you want to look for but pretty sure this code will print out the header which has value including "application/json".

Related

Preserving sort order of .json file that gets created from api response

I am having a problem with getting the correct sort order of a .json file that gets created from an api response using PyCharm Community Edition with python 3.7.
This is the api request:
import requests
import json
url = "https://pokemon-go1.p.rapidapi.com/pokemon_names.json"
headers = {
'x-rapidapi-key': "c061ae2dffmshc2a33d10b00cee7p121f42jsn11f39d53dd1e",
'x-rapidapi-host': "pokemon-go1.p.rapidapi.com"
}
response = requests.request("GET", url, headers=headers)
If i now print(response.text), I get the following output: (That's how I want my .json file to look like later)
{
"1": {
"id": 1,
"name": "Bulbasaur"
},
"2": {
"id": 2,
"name": "Ivysaur"
},
"3": {
"id": 3,
"name": "Venusaur"
},
"4": {
"id": 4,
"name": "Charmander"
},
"5": {
"id": 5,
"name": "Charmeleon"
}
}
After that I write the response to the file "pokemondata.json" by doing this:
response_json = json.loads(response.text)
writeFile = open("pokemondata.json", "w")
writeFile.write(json.dumps(response_json, indent=4, sort_keys=True))
writeFile.close()
And then the file looks like this:
{
"1": {
"id": 1,
"name": "Bulbasaur"
},
"10": {
"id": 10,
"name": "Caterpie"
},
"100": {
"id": 100,
"name": "Voltorb"
},
"101": {
"id": 101,
"name": "Electrode"
},
"102": {
"id": 102,
"name": "Exeggcute"
}
}
I could not figure out how to get it done so that the file is sorted by ids (or the numbers before the id, e.g. "1") correctly. Could anybody please explain to me, how I can fix it?

import requests
import json
url = "https://pokemon-go1.p.rapidapi.com/pokemon_names.json"
headers = {
'x-rapidapi-key': "c061ae2dffmshc2a33d10b00cee7p121f42jsn11f39d53dd1e",
'x-rapidapi-host': "pokemon-go1.p.rapidapi.com"
}
response = requests.request("GET", url, headers=headers)
response_json = response.json()
int_keys = {int(k):v for k,v in response_json.items()}
with open("sample.json", 'w') as file_obj:
json.dump(int_keys, file_obj, indent=4, sort_keys=True)
Issue is with keys in your json which are in string format. Convert them to integers and save in the file

Unable to send an application in the right way using post requests having multiple parameters

I'm trying to send an application after filling in a form available in a webpage using python. I've tried to mimic the process that I see in chrome dev tools but it seems I've gone somewhere wrong and that is the reason when I execute the following script I get this error:
{
"message":"415 Unsupported Media Type returned for /apply-app/rest/jobs/PIDFK026203F3VBQB79V77VIY-87592/submissions with message: ",
"key":"Exception_server_error",
"errorId":"d6b128bd-426d-4bee-8dbb-03e232829f5e"
}
It seems to me that I need to use the value of token and version in an automatic manner as they are different in every application but I don't find them in page source and stuff.
I've selected No as value for all the dropdowns (when there is any) within Additional Information.
Link to the application page
Link to the attachment that I've used thrice.
I've tried with:
import requests
main_link = "https://karriere.hsbc.de/stellenangebote/stellenboerse/apply?jobId=PIDFK026203F3VBQB79V77VIY-87592&langCode=de_DE"
post_link = "https://emea3.recruitmentplatform.com/apply-app/rest/jobs/PIDFK026203F3VBQB79V77VIY-87592/submissions"
payload = {
"candidateIdentity":{"firstName":"syed","lastName":"mushfiq","email":"mthmt80#gmail.com"},
"answeredDocuments":[{"documentType":"answeredForm","formId":"hsbc_bewerbungsprozess_pers_nliche_daten",
"answers":[
{"questionId":"form_of_address","type":"options","value":["form_of_address_m"]},
{"questionId":"academic_title","type":"simple","value":" Dr.","questionIds":[]},
{"questionId":"first_name","type":"simple","value":"syed","questionIds":[]},
{"questionId":"last_name","type":"simple","value":"mushfiq","questionIds":[]},
{"questionId":"e-mail_address","type":"simple","value":"mthmt80#gmail.com","questionIds":[]},
{"questionId":"phone__mobile_","type":"phone","countryCode":"+880","isoCountryCode":"BD","subscriberNumber":"1790128884"}]},
{"documentType":"answeredForm","formId":"hsbc_bewerbungsprozess_standard_fragebogen","answers":[{"questionId":"custom_question_450","type":"options","value":["custom_question_450_ja"]},
{"questionId":"custom_question_451","type":"options","value":["custom_question_451_nein"]},
{"questionId":"custom_question_452","type":"options","value":["custom_question_452_unter_keine_der_zuvor_genannten"]},
{"questionId":"custom_question_580","type":"options","value":["custom_question_580_nein_978"]},
{"questionId":"custom_question_637","type":"options","value":["custom_question_637_nein"]},
{"questionId":"custom_question_579","type":"options","value":["custom_question_579_nein"]},
{"questionId":"custom_question_583","type":"options","value":["custom_question_583_hsbc_deutschland_karriereseite"]}]},
#============The following three lines are supposed to help upload three files============
{"documentType":"attachment","attachmentId":"cover_letter","token":"2d178469-cdb5-4d65-9f67-1e7637896953","filename": open("demo.pdf","rb")},
{"documentType":"attachment","attachmentId":"attached_resume","token":"81a5a661-66bb-4918-a35c-ec260ffb7d02","filename": open("demo.pdf","rb")},
{"documentType":"attachment","attachmentId":"otherattachment","token":"4c3f7500-b072-48d4-83cf-0af1399bc8ba","filename": open("demo.pdf","rb")}],
#============The version's value should not be hardcoded=========================
"version":"V2:3:14dfac80702d099625d0274121b0dba68ac0fd96:861836b7d86adae8cc1ce69198b69b8ca59e2ed5","lastModifiedDate":1562056029000,"answeredDataPrivacyConsents":[{"identifier":"urn:lms:ta:tlk:data-privacy-consent:mtu531:101","consentProvided":True},
{"identifier":"urn:lms:ta:tlk:data-privacy-consent:mtu531:102","consentProvided":True}],
"metaInformation":{"applicationFormUrl":"https://karriere.hsbc.de/stellenangebote/stellenboerse/apply?jobId=PIDFK026203F3VBQB79V77VIY-87592&langCode=de_DE","jobsToLink":[]}
}
def send_application(s,link):
res = s.post(link,data=payload)
print(res.text)
if __name__ == '__main__':
with requests.Session() as s:
send_application(s,post_link)
How can I send the application in the right way?
PS I can send the application manually multiple times using the same documents to the same email.

The best way to go about something like this is to open the page in a browser and view the network tab in the developer tools. From there as you're filling out the form you'll be able to see that each time you attach a document it sends an ajax request and receives the token in a json response. With those tokens you can build the final payload which should be submitted in json format.
Here's some example code that's working:
import requests
headers = {
'Host': 'emea3.recruitmentplatform.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate',
'apply-config-key': 'AAACEwAA-55cd88d4-c9fd-41ce-95a4-f238402b898f',
'Origin': 'https://karriere.hsbc.de',
'DNT': '1',
'Connection': 'close',
'Referer': 'https://karriere.hsbc.de/',
'Cookie': 'lumesse_language=de_DE'
}
main_link = "https://karriere.hsbc.de/stellenangebote/stellenboerse/apply?jobId=PIDFK026203F3VBQB79V77VIY-87592&langCode=de_DE"
post_link = "https://emea3.recruitmentplatform.com/apply-app/rest/jobs/PIDFK026203F3VBQB79V77VIY-87592/submissions"
ajax_link = "https://emea3.recruitmentplatform.com/apply-app/rest/jobs/PIDFK026203F3VBQB79V77VIY-87592/attachments"
def build_payload(cover_letter_token, attached_resume_token, otherattachment_token):
return {
"candidateIdentity": {
"firstName": "Syed",
"lastName": "Mushfiq",
"email": "mthmt80#gmail.com"
},
"answeredDocuments": [
{
"documentType": "answeredForm",
"formId": "hsbc_bewerbungsprozess_pers_nliche_daten",
"answers": [
{
"questionId": "form_of_address",
"type": "options",
"value": [
"form_of_address_m"
]
},
{
"questionId": "academic_title",
"type": "simple",
"value": "prof",
"questionIds": []
},
{
"questionId": "first_name",
"type": "simple",
"value": "Syed",
"questionIds": []
},
{
"questionId": "last_name",
"type": "simple",
"value": "Mushfiq",
"questionIds": []
},
{
"questionId": "e-mail_address",
"type": "simple",
"value": "mthmt80#gmail.com",
"questionIds": []
},
{
"questionId": "phone__mobile_",
"type": "phone",
"countryCode": "+49",
"isoCountryCode": "DE",
"subscriberNumber": "30 33850062"
}
]
},
{
"documentType": "answeredForm",
"formId": "hsbc_bewerbungsprozess_standard_fragebogen",
"answers": [
{
"questionId": "custom_question_450",
"type": "options",
"value": [
"custom_question_450_ja"
]
},
{
"questionId": "custom_question_451",
"type": "options",
"value": [
"custom_question_451_nein"
]
},
{
"questionId": "custom_question_452",
"type": "options",
"value": [
"custom_question_452_unter_keine_der_zuvor_genannten"
]
},
{
"questionId": "custom_question_580",
"type": "options",
"value": [
"custom_question_580_ja"
]
},
{
"questionId": "custom_question_637",
"type": "options",
"value": [
"custom_question_637_nein"
]
},
{
"questionId": "custom_question_579",
"type": "options",
"value": [
"custom_question_579_nein"
]
},
{
"questionId": "custom_question_583",
"type": "options",
"value": [
"custom_question_583_linkedin"
]
}
]
},
{
"documentType": "attachment",
"attachmentId": "cover_letter",
"token": cover_letter_token,
"filename": "demo.pdf"
},
{
"documentType": "attachment",
"attachmentId": "attached_resume",
"token": attached_resume_token,
"filename": "demo.pdf"
},
{
"documentType": "attachment",
"attachmentId": "otherattachment",
"token": otherattachment_token,
"filename": "demo.pdf"
}
],
"version": "V2:3:14dfac80702d099625d0274121b0dba68ac0fd96:861836b7d86adae8cc1ce69198b69b8ca59e2ed5",
"lastModifiedDate": "1562056029000",
"answeredDataPrivacyConsents": [
{
"identifier": "urn:lms:ta:tlk:data-privacy-consent:mtu531:101",
"consentProvided": "true"
},
{
"identifier": "urn:lms:ta:tlk:data-privacy-consent:mtu531:102",
"consentProvided": "true"
}
],
"metaInformation": {
"applicationFormUrl": "https://karriere.hsbc.de/stellenangebote/stellenboerse/apply?jobId=PIDFK026203F3VBQB79V77VIY-87592&langCode=de_DE",
"jobsToLink": []
}
}
def submit_attachment(s, link, f):
d = open(f, 'rb').read()
r = s.post(link, files={'file':('demo.pdf', d),'applicationProcessVersion':(None, 'V2:3:14dfac80702d099625d0274121b0dba68ac0fd96:861836b7d86adae8cc1ce69198b69b8ca59e2ed5')})
r_data = r.json()
return r_data.get('token')
def send_application(s,link,p):
res = s.post(link, json=p)
return res
if __name__ == '__main__':
attachment_list = ["cover_letter_token", "attached_resume_token", "otherattachment_token"]
token_dict = {}
with requests.Session() as s:
s.headers.update(headers)
for at in attachment_list:
rt = submit_attachment(s, ajax_link, "demo.pdf")
token_dict[at] = rt
payload = build_payload(token_dict['cover_letter_token'], token_dict['attached_resume_token'], token_dict['otherattachment_token'])
rd = send_application(s, post_link, payload)
print(rd.text)
print(rd.status_code)

Python post request, problem with posting

I'm trying to write a typeform bot but I am a totally beginner so I have problems with request.post
I am trying to fill this typeform: https://typeformtutorial.typeform.com/to/aA7Vx9
by this code
import requests
token = requests.get("https://typeformtutorial.typeform.com/app/form/result/token/aA7Vx9/default")
data = {"42758279": "true",
"42758410": "text",
"token": token}
r = requests.post("https://typeformtutorial.typeform.com/app/form/submit/aA7Vx9", data)
print(r)
I think that something is wrong with "data" and I am not sure if I use token in a good way. Could you help me?

So, first of all, you need to get another field with the token. To do that, you should pass the header 'accept': 'application/json' in your first request. In the response, you'll get the json object with the token and landed_at parameters. You should use them in the next step.
Then, the post data shoud be different from what you're passing. See the network tab in the browser's developer tools to find out the actual template. It has a structure like that:
{
"signature": <YOUR_SIGNATURE>,
"form_id": "aA7Vx9",
"landed_at": <YOUR_LANDED_AT_TIME>,
"answers": [
{
"field": {
"id": "42758279",
"type": "yes_no"
},
"type": "boolean",
"boolean": True
},
{
"field": {
"id": "42758410",
"type": "short_text"
},
"type": "text",
"text": "1"
}
]
}
And finally, you should convert that json to text so the server would successfully parse it.
Working example:
import requests
import json
token = json.loads(requests.post(
"https://typeformtutorial.typeform.com/app/form/result/token/aA7Vx9/default",
headers={'accept': 'application/json'}
).text)
signature = token['token']
landed_at = int(token['landed_at'])
data = {
"signature": signature,
"form_id": "aA7Vx9",
"landed_at": landed_at,
"answers": [
{
"field": {
"id": "42758279",
"type": "yes_no"
},
"type": "boolean",
"boolean": True
},
{
"field": {
"id": "42758410",
"type": "short_text"
},
"type": "text",
"text": "1"
}
]
}
json_data = json.dumps(data)
r = requests.post("https://typeformtutorial.typeform.com/app/form/submit/aA7Vx9", data=json_data)
print(r.text)
Output:
{"message":"success"}

Need read some data from JSON

I need to make a get (id, name, fraction id) for each deputy in this json
{
"id": "75785",
"title": "(за основу)",
"asozdUrl": null,
"datetime": "2011-12-21T12:20:26+0400",
"votes": [
{
"deputy": {
"id": "99111772",
"name": "Абалаков Александр Николаевич",
"faction": {
"id": "72100004",
"title": "КПРФ"
}
},
"result": "accept"
},
{
"deputy": {
"id": "99100491",
"name": "Абдулатипов Рамазан Гаджимурадович",
"faction": {
"id": "72100024",
"title": "ЕР"
}
},
"result": "none"
}
.......,` etc
My code is looks like that:
urlData = "https://raw.githubusercontent.com/data-dumaGovRu/vote/master/poll/2011-12-21/75785.json"
response = urllib.request.urlopen(urlData)
content = response.read()
data = json.loads(content.decode("utf8"))
for i in data:
#print(data["name"])
`
And i dont know what to do with that #print line, how I should write it?

You can access the list containing the deputies with data['votes']. Iterating through the list, you can access the keys you're interested in as you would with dict key lookups. Nested dicts imply you have to walk through the keys starting from the root to your point of interest:
for d in data['votes']:
print(d['deputy']['id'], d['deputy']['name'], d['deputy']['faction']['id'])

How to parse log file using python and store data in database?

I am trying to parse a log file .which contains the structure like given below
i want to do it with python and want to store extracted data in database how can i do this ?
i am able to parse simple key value pair but facing some problem.
1: How can i parse nested structure for example context field in the sample file is nested in main group?
2: How to tackle with condition if separator comes as a string . like for key:value pair separator is colon (:) and in the "site" key there is a key:value pair site_url:http://something.com here url also contains colon (:) which gives the wrong answer.
{
"username": "lavania",
"host": "10.105.22.32",
"event_source": "server",
"event_type": "/courses/XYZ/CS101/2014_T1/xblock
/i4x:;_;_XYZ;_CS101;_video;_d333fa637a074b41996dc2fd5e675818/handler/xmodule_handler/save_user_state",
"context": {
"course_id": "XYZ/CS101/2014_T1",
"course_user_tags": {},
"user_id": 42,
"org_id": "XYZ"
},
"time": "2014-06-20T05:49:10.468638+00:00",
"site":"http://something.com",
"ip": "127.0.0.1",
"event": "{\"POST\": {\"saved_video_position\": [\"00:02:10\"]}, \"GET\": {}}",
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0",
"page": null
}
{
"username": "rihana",
"host": "10.105.22.32",
"event_source": "server",
"event_type": "problem_check",
"context": {
"course_id": "XYZ/CS101/2014_T1",
"course_user_tags": {},
"user_id": 40,
"org_id": "XYZ",
"module": {
"display_name": ""
}
},
"time": "2014-06-20T06:43:52.716455+00:00",
"ip": "127.0.0.1",
"event": {
"submission": {
"i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"input_type": "choicegroup",
"question": "",
"response_type": "multiplechoiceresponse",
"answer": "MenuInflater.inflate()",
"variant": "",
"correct": true
}
},
"success": "correct",
"grade": 1,
"correct_map": {
"i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"hint": "",
"hintmode": null,
"correctness": "correct",
"npoints": null,
"msg": "",
"queuestate": null
}
},
"state": {
"student_answers": {},
"seed": 1,
"done": null,
"correct_map": {},
"input_state": {
"i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {}
}
},
"answers": {
"i4x-XYZ-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": "choice_0"
},
"attempts": 1,
"max_grade": 1,
"problem_id": "i4x://XYZ/CS101/problem/33e4aac93dc84f368c93b1d08fa984fc"
},
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
"page": "x_module"
}
{
"username": "troysa",
"host": "localhost",
"event_source": "server",
"event_type": "/courses/XYZ/CS101/2014_T1/instructor_dashboard/api/list_instructor_tasks",
"context": {
"course_id": "XYZ/CS101/2014_T1",
"course_user_tags": {},
"user_id": 6,
"org_id": "XYZ"
},
"time": "2014-06-20T05:49:26.780244+00:00",
"ip": "127.0.0.1",
"event": "{\"POST\": {}, \"GET\": {}}",
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
"page": null
}

Your data is in the JSON format. Use the json module in the standard library to parse it.
However, your data seems to be several JSON dicts concatenated together. Hopefully you just pasted from several individual entries, otherwise you're going to have to do some data cleanup before you start parsing in great detail.
Supposing these are individual files, I'll give an example of the "username": "raeha" set, which has been loaded into the data variable:
>>> import json
>>> newdata = json.loads(data)
>>> print(newdata["context"])
{'course_id': 'XYZ/CS101/2014_T1', 'course_user_tags': {}, 'org_id': 'XYZ', 'user_id': 40, 'module': {'display_name': ''}}
>>> print(newdata["context"]["user_id"])
40
The json.loads() method takes raw JSON data (as a string) and formats it into Python datatypes. Typically, the outermost type is a dict, each key of which is a string, and each value can be a string, list, dict, numeric value, or item like True, False, or None. These correspond to true, false, and null in JSON.

As has been pointed out this is a JSON data structure. I wrote some quick code that will read your log file line by line and attempt to find complete multi-line json objects. Once all the lines are read it is finished. I use pprint on the objects so that the output is human readable to ensure the dict that is returned looks correct.
import json
import pprint
with open("log.txt") as infile:
# Loop until we have parsed all the lines.
for line in infile:
# Read lines until we find a complete object
while (True):
try:
json_data = json.loads(line)
# We have a complete onject here
pprint.pprint(json_data)
# Try and find a new JSON object
break
except ValueError:
# We don't have a complete JSON object
# read another line and try again
line += next(infile)
This code is a bit of a kludge. It reads a line and sees if we have a complete parseable object. If not it reads the next line and concatenates it with the last. This continues until a parseable object can be loaded. It then does this over and over until all the lines are consumed and all objects have been found.
At this point in the code you have read a complete JSON object into json_data:
pprint.pprint(json_data)
I pprint the dict out but it is a standard python dictionary that can be processed for data as using normal dict traversal. For example you could retrieve the course_id with something like:
json_data['context']['course_id']
or the host via:
json_data['host']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Filtering JSON in python - python

Related

Preserving sort order of .json file that gets created from api response

Unable to send an application in the right way using post requests having multiple parameters

Python post request, problem with posting

Need read some data from JSON

How to parse log file using python and store data in database?

Categories

Resources