Converting content from text file to list of objects in Python

I have cookie.txt with content like below
[
{
"domain": "example.com",
"expirationDate": 1683810439,
"hostOnly": false,
"httpOnly": false,
"name": "__adroll_fpc",
"path": "/",
"sameSite": "lax",
"secure": false,
"session": false,
"storeId": null,
"value": "2123213-1651041941056"
},
{
"domain": "example.com",
"expirationDate": 1715324838,
"hostOnly": false,
"httpOnly": false,
"name": "_ga",
"path": "/",
"sameSite": null,
"secure": false,
"session": false,
"storeId": null,
"value": "12332.1651041940"
}
]
I'm trying to access each object of that txt like below
def initCookies(self):
    with open('cookie.txt', encoding='utf8') as f:
        cookies = f.readlines()
    mystring = ' '.join([str(item) for item in cookies])
    data = json.loads(mystring)
    print(type(data))
    for cookie in data:
        print(cookie)
but print(cookie) seems to print the whole content.
How can I access each object within {}?
I should be able to access them like this: cookie.get('name', ''), cookie.get('value', '')

You can simplify your code, even though it already works...
import json
with open('cookie.txt', encoding='utf8') as f:
    cookies = json.load(f)
for cookie in cookies:
    print(cookie.get("name"))  # '__adroll_fpc', '_ga'

Why convert the list to one big string first? You can parse the file directly with json.load instead of json.loads:
import json
with open('cookie.txt', encoding='utf8') as f:
    data = json.load(f)
data  # list of 2 dictionaries
for dic in data:
    print(dic.get('name'))
Output:
__adroll_fpc
_ga
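
For completeness, here is a minimal sketch of how the parsed result can be used once it has been loaded. It assumes the same cookie structure as in the question; the inline cookie_text string stands in for the file contents, and the variable names are only illustrative.

```python
import json

# Stand-in for the contents of cookie.txt (trimmed to the relevant keys).
cookie_text = """
[
  {"domain": "example.com", "name": "__adroll_fpc", "value": "2123213-1651041941056"},
  {"domain": "example.com", "name": "_ga", "value": "12332.1651041940"}
]
"""

cookies = json.loads(cookie_text)  # a list of dicts, one per {...} object

# Each element is a plain dict, so .get() works with a default value.
pairs = [(c.get("name", ""), c.get("value", "")) for c in cookies]

# Build a name -> value lookup for direct access by cookie name.
by_name = {name: value for name, value in pairs}

print(type(cookies).__name__, len(cookies))
print(by_name["_ga"])
```

When reading from the real file, json.load(f) would replace the json.loads(cookie_text) call.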

Related

obtain data from localstorage in Python

I'm still a bit new to Python and trying to understand it. Any help would be appreciated!
My issue is that I want to grab my authorization token from twitter.com, which is stored in my local storage. Is there a way in Python to obtain the auth_token from local storage and save it to a text file?
I know how to write to a text file, but I'm having trouble grabbing the auth token. I'm using Playwright async and have already tried going through the cookies and saving them to a JSON file, but the "auth_token" entry sometimes ends up in the JSON file at:
['cookies'][9]['value']
or:
['cookies'][7]['value'] ['cookies'][8]['value']
Is there an easier way to find it? The format looks like:
{
"cookies": [{
"name": "auth_token",
"value": "22b23d52e7c639f123456ed451dfe9ebd9d439d3",
"domain": ".twitter.com",
"path": "/",
"expires": 1816242826,
"httpOnly": true,
"secure": true,
"sameSite": "None"
}, {
"name": "ct0",
"value": "1547663d5b6a5b5c857b726964d9e10c7eb4654c1b210c345d008d28d526f43e7c5a8f4dcfaaead4281bac844cfee5a642fa5a7e7e9824405817de778bbd970f712f5de0cf01bf352de94989da6eb349",
"domain": ".twitter.com",
"path": "/",
"expires": 1816242827,
"httpOnly": false,
"secure": true,
"sameSite": "Lax"
}, {
"name": "twid",
"value": "u%3D1550750937400360960",
"domain": ".twitter.com",
"path": "/",
"expires": 1690098828,
"httpOnly": false,
"secure": true,
"sameSite": "None"
}, {
"name": "_s",
"value": "CgdiXZmky9MlkvLhFqnr4TxEU0eoZnJT4Eir8QAH%2FZ4SZENccyKmnFwtUXTz9BKd",
"domain": ".app.link",
"path": "/",
"expires": 1690098784,
"httpOnly": false,
"secure": true,
"sameSite": "None"
}]
}
The issue is that the "auth_token" is stored at index [7/8/9], which is different every time. Is there a way to do something like
['cookies']['auth_token']['value']
Right now I have:
with open('t.json') as auth_obtainer:
    authfile = json.load(auth_obtainer)
auth_token = json.dumps(authfile['cookies'][9]['value']).replace('"',"")
print(auth_token)
but sometimes it's located at a different index in the JSON file, so it gives me the wrong value.

Use a for loop to iterate over the dictionaries in cookies and check whether each one has a name equal to "auth_token"; if it does, read the "value" key of that dictionary.
import json
with open('t.json') as f:
    content = json.load(f)
cookies = content['cookies']
auth_token = None  # stays None if no matching cookie is found
for data in cookies:
    if data['name'] == 'auth_token':
        auth_token = data['value']
        break  # stop at the first match
print(auth_token)
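
As a variation on the loop above, the lookup can be wrapped in a small helper that does not depend on the cookie's position in the list. This is only a sketch: find_cookie_value is a hypothetical name, and the inline sample mirrors the structure from the question with shortened values.

```python
import json

def find_cookie_value(content, name, default=None):
    """Return the 'value' of the first cookie called `name`, or `default`."""
    return next(
        (c["value"] for c in content.get("cookies", []) if c.get("name") == name),
        default,
    )

# Shortened stand-in for t.json; the order of the entries does not matter.
sample = json.loads("""
{"cookies": [
  {"name": "ct0", "value": "1547663d"},
  {"name": "auth_token", "value": "22b23d52"},
  {"name": "twid", "value": "u%3D1550750937400360960"}
]}
""")

auth_token = find_cookie_value(sample, "auth_token")
missing = find_cookie_value(sample, "does_not_exist")
print(auth_token)
```

Returning a default instead of raising keeps the caller in control of what happens when the cookie is absent.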

How can I extract URLs from a one line JSON text file using regex?

I've been trying to work this out, but I can only seem to get one URL to be exported to the output file.
The code I am currently using is...
import glob, re
with open('urls.txt', 'a') as output:
    for file in glob.glob('json.txt'):
        with open(file, 'r') as f:
            for line in f.readlines():
                pattern = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])"
                find = re.findall(pattern, line)
                if find:
                    try:
                        output.write(str(find[0]))
                    except UnicodeEncodeError:
                        pass
I've tested the regex and it matches all the URLs; the code just won't write them all to the file.
The file I've been trying to extract URLs from contains the following (indented for legibility):
{
"items": [{
"schema": "Event",
"source_id": "99558834",
"event_id": "7103414757044987314",
"start_time": "2022-05-30T06:37:10Z",
"end_time": "2022-05-30T06:37:24Z",
"event_type": "motion",
"source_type": null,
"duration_ms": 14400,
"session_duration": 14000,
"state": "timed_out",
"had_subscription": true,
"is_favorite": false,
"recording_status": "ready",
"cv": {
"person_detected": true,
"stream_broken": false,
"detection_type": "human",
"cv_triggers": null,
"detection_types": [{
"detection_type": "human",
"verified_timestamps": [1653892632153]
}]
},
"properties": {
"is_alexa": false,
"is_sidewalk": false,
"is_autoreply": false
},
"origin": null,
"error_message": null,
"updated_at": "2022-05-30T06:37:28.958Z",
"visualizations": {
"cloud_media_visualization": {
"schema": "CloudMediaVisualization",
"media": [{
"schema": "Media",
"url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/8cbfaccd-9b1a-458b-88b9-5d12976f4293.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=b42e734ab24ce1c8057038a171c995326de1a8cf219810c33c0d0883e2ea38b2",
"custom_metadata": null,
"is_e2ee": false,
"manifest_id": null,
"file_type": "VIDEO",
"file_family": "VIDEO",
"preroll_duration_ms": 0,
"playback_duration": 14000,
"source": "Apsara"
}, {
"schema": "Media",
"url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/b22b3c85-5de3-4e91-92b5-d91db479df55.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=7724f211de257f1f13fb585158f0f241e47daa1f1f67a3e48527e45883889a8b",
"custom_metadata": null,
"is_e2ee": false,
"manifest_id": null,
"file_type": "LQ_VIDEO",
"file_family": "LQ_VIDEO",
"preroll_duration_ms": 0,
"playback_duration": 14400,
"source": "Apsara"
}, {
"schema": "Media",
"url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/564fb900-0d78-4521-8a3d-b760fff7ee8d.iframe?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=8ccd9823cd6d2fe0e386b843a700bd05cc3a694c6986a55b75c797cbf846b7c6",
"custom_metadata": null,
"is_e2ee": false,
"manifest_id": null,
"file_type": "THUMBNAIL",
"file_family": "THUMBNAIL",
"preroll_duration_ms": 0,
"playback_duration": 14000,
"source": "Apsara"
}]
},
"local_media_visualization": {
"schema": "LocalMediaVisualization",
"media": []
},
"radar_visualization": null,
"single_coordinate_visualization": null,
"map_visualization": null
},
"device": {
"id": 99558834,
"description": "Front",
"type": "cocoa_camera"
},
"owner_id": "71616327"
}]
}

I think it might be easier to make the data you have valid JSON and then use the object_hook parameter that json.loads() supports. For more details see my answer to How to find a particular JSON value by key?.
Here's how to apply it to your data:
import json
def find_values(id, json_repr):
    results = []
    def _decode_dict(a_dict):
        try:
            results.append(a_dict[id])
        except KeyError:
            pass
        return a_dict
    json.loads(json_repr, object_hook=_decode_dict)  # Return value ignored.
    return results
with open('filename.json') as file:
    jstr = file.read()
json_repr = jstr + ']}'  # Make jstr valid JSON.
results = find_values('url', json_repr)
print(f'{len(results)} URLs found')
for i, url in enumerate(results, start=1):
    print(f'{i}: {url}')
Output:
3 URLs found
1: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/8cbfaccd-9b1a-458b-88b9-5d12976f4293.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=b42e734ab24ce1c8057038a171c995326de1a8cf219810c33c0d0883e2ea38b2
2: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/b22b3c85-5de3-4e91-92b5-d91db479df55.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=7724f211de257f1f13fb585158f0f241e47daa1f1f67a3e48527e45883889a8b
3: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/564fb900-0d78-4521-8a3d-b760fff7ee8d.iframe?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=8ccd9823cd6d2fe0e386b843a700bd05cc3a694c6986a55b75c797cbf846b7c6
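
If the structure is known in advance, as in the sample from the question, the URLs can also be read by walking the keys directly instead of using object_hook. The sketch below assumes the file is complete, valid JSON with the items, visualizations, cloud_media_visualization, media layout shown in the question; the URLs are shortened placeholders.

```python
import json

# Shortened stand-in for the one-line JSON file (placeholder URLs).
text = (
    '{"items": [{"visualizations": {'
    '"cloud_media_visualization": {"media": ['
    '{"url": "https://example.com/a.mp4", "file_type": "VIDEO"},'
    '{"url": "https://example.com/b.mp4", "file_type": "LQ_VIDEO"}]},'
    '"local_media_visualization": {"media": []},'
    '"radar_visualization": null}}]}'
)

data = json.loads(text)

urls = []
for item in data.get("items", []):
    visualizations = item.get("visualizations") or {}
    for vis in visualizations.values():
        # Some visualizations are null in the sample, so skip non-dicts.
        if not isinstance(vis, dict):
            continue
        for media in vis.get("media", []):
            if "url" in media:
                urls.append(media["url"])

print(urls)
```

This is more rigid than the object_hook approach, but it only picks up URLs from the media entries rather than from any "url" key anywhere in the document.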

As mentioned by others, there are better ways to parse/read json, but given your code, it could do what you want with a small tweak.
import glob, re
with open('urls.txt', 'a') as output:
    for file in glob.glob('json.txt'):
        with open(file, 'r') as f:
            for line in f.readlines():
                pattern = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])"
                find = re.findall(pattern, line)
                if find:
                    try:
                        for result in find:
                            output.write(str(result) + "\n")
                    except UnicodeEncodeError:
                        pass
Your code only writes the first of the matched results (find[0]). To get all of them, loop through the matches and write each one.
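
One detail worth checking with this approach: because the pattern contains capturing groups, re.findall returns a tuple of groups per match rather than the full URL, so str(result) writes a tuple representation. A minimal sketch of one way around this is re.finditer with group(0), which gives the whole matched text. The sample line and its URLs are placeholders, not data from the question.

```python
import re

# Same pattern as above, without the unnecessary escaping of "/".
pattern = re.compile(
    r"(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))"
    r"([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])"
)

# Placeholder for one line of the JSON text file.
line = '{"a": {"url": "https://example.com/x.mp4?k=v&z=1"}, "b": {"url": "http://foo.org/y"}}'

# findall returns one tuple of captured groups per match ...
groups = pattern.findall(line)

# ... while finditer gives match objects whose group(0) is the full URL.
urls = [m.group(0) for m in pattern.finditer(line)]

for url in urls:
    print(url)
```

Writing each entry of urls followed by a newline would then put one complete URL per line in the output file.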

Sending cookies in Python

I have some cookies in python stored like this:
cookie = [
{"""
"domain": ".justdial.com",
"expirationDate": 1577653041.993055,
"hostOnly": false,
"httpOnly": true,
"name": "_ctk",
"path": "/",
"sameSite": "no_restriction",
"secure": false,
"session": false,
"storeId": "0",
"value": "893b0b69e25c0359d6e1fd88f16fea90a4bd2e0e8f8356e80bfc572e7f7e1343",
"id": 1"""
},
{"""
"domain": ".justdial.com",
"expirationDate": 1546136368,
"hostOnly": false,
"httpOnly": false,
"name": "_fbp",
"path": "/",
"sameSite": "no_restriction",
"secure": false,
"session": false,
"storeId": "0",
"value": "fb.1.1546114608524.1389346931",
"id": 2"""
}
]
requests.post(URL, cookies=cookie)
I am trying to send these cookies using Requests, but that does not work. Is the format wrong, or the way that I am sending it?
Thanks for the help! Using RequestsCookieJar worked, but I found another way: I saved it to a separate file, then, using the json library I got it in the right format and was able to send the cookies.

In your code, cookie is a list. You need to send a dict, or you can use the requests.cookies.RequestsCookieJar() object:
From the docs:
>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'https://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
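
Since the cookies in the question are a list of browser-export style entries, one way to get the dict form mentioned above is to map each entry's name to its value. This is a sketch using only the standard library; the cookie text is valid JSON adapted from the question with shortened values, and the final requests call is shown only as a comment because it needs a real URL.

```python
import json

cookie_text = """
[
  {"domain": ".justdial.com", "name": "_ctk", "path": "/", "value": "893b0b69", "id": 1},
  {"domain": ".justdial.com", "name": "_fbp", "path": "/", "value": "fb.1.1546114608524.1389346931", "id": 2}
]
"""

# json.loads turns the JSON array into a list of dicts
# (false/true/null become False/True/None).
cookie_list = json.loads(cookie_text)

# Collapse the list into a simple {name: value} mapping.
cookie_dict = {c["name"]: c["value"] for c in cookie_list}
print(cookie_dict)

# With the requests library installed, this dict could then be passed as:
# requests.post(URL, cookies=cookie_dict)   # URL as in the question
```

Note that a plain dict keeps only names and values; domain and path information is kept only when a cookie jar is used.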

Why do I get an error "'unicode' object does not support item deletion" when trying to delete values from a JSON object?

I am trying to loop through a list of objects, deleting an element from each object. Each object is on its own line. I then want to save the file without that element in any of the objects.
{
"business_id": "fNGIbpazjTRdXgwRY_NIXA",
"full_address": "1201 Washington Ave\nCarnegie, PA 15106",
"hours": {
"Monday": {
"close": "23:00",
"open": "11:00"
},
"Tuesday": {
"close": "23:00",
"open": "11:00"
},
"Friday": {
"close": "23:00",
"open": "11:00"
},
"Wednesday": {
"close": "23:00",
"open": "11:00"
},
"Thursday": {
"close": "23:00",
"open": "11:00"
},
"Saturday": {
"close": "23:00",
"open": "11:00"
}
},
"open": true,
"categories": ["Bars", "American (Traditional)", "Nightlife", "Lounges", "Restaurants"],
"city": "Carnegie",
"review_count": 7,
"name": "Rocky's Lounge",
"neighborhoods": [],
"longitude": -80.0849416,
"state": "PA",
"stars": 4.0,
"latitude": 40.3964688,
"attributes": {
"Alcohol": "full_bar",
"Noise Level": "average",
"Music": {
"dj": false
},
"Attire": "casual",
"Ambience": {
"romantic": false,
"intimate": false,
"touristy": false,
"hipster": false,
"divey": false,
"classy": false,
"trendy": false,
"upscale": false,
"casual": false
},
"Good for Kids": true,
"Wheelchair Accessible": true,
"Good For Dancing": false,
"Delivery": false,
"Dogs Allowed": false,
"Coat Check": false,
"Smoking": "no",
"Accepts Credit Cards": true,
"Take-out": true,
"Price Range": 1,
"Outdoor Seating": false,
"Takes Reservations": false,
"Waiter Service": true,
"Wi-Fi": "free",
"Caters": false,
"Good For": {
"dessert": false,
"latenight": false,
"lunch": false,
"dinner": false,
"brunch": false,
"breakfast": false
},
"Parking": {
"garage": false,
"street": false,
"validated": false,
"lot": true,
"valet": false
},
"Has TV": true,
"Good For Groups": true
},
"type": "business"
}
I need to remove the information contained within the hours element, however the information is not always the same. Some contain all the days and some only contain one or two day information.
This is the code I've tried:
import json
with open('data.json') as data_file:
    data = json.load(data_file)
    for element in data:
        del element['hours']
However, I am getting an error when running the code:
TypeError: 'unicode' object does not support item deletion

Let's assume you want to overwrite the same file:
import json
with open('data.json', 'r') as data_file:
    data = json.load(data_file)
for element in data:
    element.pop('hours', None)
with open('data.json', 'w') as data_file:
    json.dump(data, data_file)  # json.dump returns None, so nothing to assign
dict.pop(key, default) is probably what you were looking for, if I understood your requirements: called as pop('hours', None), it removes the hours key if present and does not fail if it is absent.
However I am not sure I understand why it makes a difference to you whether the hours key contains some days or not, because you just want to get rid of the whole key/value pair, right?
Now, if you really want to use del instead of pop, here is how you could make your code work:
import json
with open('data.json') as data_file:
    data = json.load(data_file)
for element in data:
    if 'hours' in element:
        del element['hours']
with open('data.json', 'w') as data_file:
    json.dump(data, data_file)  # json.dump returns None, so nothing to assign
If you want to write it to another file, just change the filename in the second open statement.
I had to change the indentation, as you might have noticed, so that the file has been closed during the data cleanup phase and can be overwritten at the end.
The with statement works with what is called a context manager: whatever it provides (here the data_file file object) is usable only within that block. As soon as the indentation of the with block ends, the file gets closed and the context ends, and the file object can no longer be read from or written to.
Without doing this, you wouldn't be able to open the file in write mode and get a new file descriptor to write into.
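
The error message in the question is consistent with data being a single dict rather than a list: iterating over a dict yields its keys, which are strings, and strings do not support item deletion. The sketch below illustrates that, and then shows one way to handle a file with one JSON object per line, which is how the question describes the data. The sample records are shortened, and the one-object-per-line layout is an assumption.

```python
import json

record = json.loads(
    '{"business_id": "fNGIbpazjTRdXgwRY_NIXA", '
    '"hours": {"Monday": {"close": "23:00", "open": "11:00"}}, '
    '"city": "Carnegie"}'
)

# Iterating over a dict yields its keys (strings), not nested objects.
keys = list(record)

# Removing the key from the dict itself works, whatever days it contains.
record.pop("hours", None)

# Assumed layout: one JSON object per line (shortened sample lines).
lines = [
    '{"business_id": "a", "hours": {"Friday": {"close": "23:00", "open": "11:00"}}}',
    '{"business_id": "b", "hours": {}}',
    '{"business_id": "c"}',
]

cleaned_lines = []
for line in lines:
    obj = json.loads(line)
    obj.pop("hours", None)  # no error if the key is absent
    cleaned_lines.append(json.dumps(obj))

print("\n".join(cleaned_lines))
```

With a real file, the lines list would come from iterating over the opened file, and cleaned_lines would be written back out one per line.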

JSON extraction of specific field via Python

I'm trying to get the "externalCode" field from the incomplete JSON file below, but I'm lost. With Python I only get as far as the second element before hitting the error below. I'm not sure how to traverse nested JSON like this.
output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
KeyError: 'benefitValueSets'
import csv, json, sys
input = open('C:/Users/kk/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
JSON file
[
{
"benefitCategories": [
{
"benefits": [
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "$500",
"externalCode": null,
"id": null,
"internalCode": "$500",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "DEDUCTIBLE",
"displayType": null,
"externalCode": "IndividualInNetdeductibleAmount",
"id": null,
"key": "IndividualInNetdeductibleAmount",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "$500"
}
]
},
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "100%",
"externalCode": null,
"id": null,
"internalCode": "100%",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "COINSURANCE",
"displayType": null,
"externalCode": "PhysicianOfficeInNetCoInsurancePct",
"id": null,
"key": "PhysicianOfficeInNetCoInsurancePct",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "100%"
}
]
},
{

Try this code:
import csv, json, sys
input = open('C:/Users/spolireddy/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow([row['benefitCategories'], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['disabled']])
    # for externalCode:
    row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['externalCode']

I'm not quite sure I understand what you're looking to do with your code. There are multiple externalCode values for each element in the array, at least from the sample you've posted. But you can get the data you're looking for with this syntax:
data[0]["benefitCategories"][0]["benefits"][0]["externalCode"]
data[0]["benefitCategories"][0]["benefits"][1]["externalCode"]
The code below iterates through the data you're interested in (with a slightly modified JSON file so that it's complete) and works as desired:
import json
with open('junk.json', 'r') as input_file:
    data = json.load(input_file)
for x in data[0]["benefitCategories"][0]["benefits"]:
    print(x["externalCode"] + "\n\n")  # print() call works on Python 2 and 3
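
To collect every benefit-level externalCode rather than indexing [0] and [1] by hand, the nested lists can be walked with nested loops. This is a sketch on a trimmed version of the sample data; it assumes each top-level element has a benefitCategories list whose entries each hold a benefits list.

```python
import json

data = json.loads("""
[{"benefitCategories": [{"benefits": [
  {"displayName": "DEDUCTIBLE",
   "externalCode": "IndividualInNetdeductibleAmount",
   "benefitValueSets": [{"externalCode": null, "displayValue": "$500"}]},
  {"displayName": "COINSURANCE",
   "externalCode": "PhysicianOfficeInNetCoInsurancePct",
   "benefitValueSets": [{"externalCode": null, "displayValue": "100%"}]}
]}]}]
""")

external_codes = []
for row in data:
    for category in row.get("benefitCategories", []):
        for benefit in category.get("benefits", []):
            # .get() avoids a KeyError when a benefit lacks the field.
            external_codes.append(benefit.get("externalCode"))

print(external_codes)
```

The externalCode inside each benefitValueSets entry is null in the sample, which is why the loop reads the field on the benefit itself.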
