How to get string-formatted JSON into a table (Python)

I have the following string-formatted JSON data. How can I convert the data into a table in R or Python?
I've tried df = pd.DataFrame(data), but that doesn't work, because data is a string.
data = '{"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20, "CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh", "OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0, "UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily", "TimeZone":"E. South America Standard Time","Days":["All"], "Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]}, "IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
Desired Output:
output =
------------------------------------------------------------------
ID | RecordType | CreationTime
048f7de7-81a4-464d-bd6d-df3be3b1e7e8 | 20 | 2019-10-08T12:12:32
Error:
ValueError Traceback (most recent call last)
<ipython-input-26-039b238b38ef> in <module>
----> 1 df = pd.DataFrame(data)
e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
483 )
484 else:
--> 485 raise ValueError("DataFrame constructor not properly called!")
486
487 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!

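In Python:
Given data:
Use str.replace to change true to True, so that the string becomes a valid Python literal
Use ast.literal_eval to convert data from a str to a dict
Use json_normalize to convert the dict to a pandas dataframe (pandas.json_normalize from pandas 1.0 onward, pandas.io.json.json_normalize in earlier versions)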
import pandas as pd
from ast import literal_eval
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
data = '{"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20, "CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh", "OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0, "UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily", "TimeZone":"E. South America Standard Time","Days":["All"], "Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]}, "IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
data = data.replace('true', 'True')
data = literal_eval(data)
{'ActivityId': '4e8b4514-24be-4ba5-a7d3-a69e8cb8229e',
'CreationTime': '2019-10-08T12:12:32',
'Id': '048f7de7-81a4-464d-bd6d-df3be3b1e7e8',
'IsSuccess': True,
'ItemName': 'ASO Daily Statistics',
'Operation': 'SetScheduledRefresh',
'OrganizationId': '39b03722-b836-496a-85ec-850f0957ca6b',
'RecordType': 20,
'Schedules': {'Days': ['All'],
'RefreshFrequency': 'Daily',
'Time': ['07:30:00',
'10:30:00',
'13:30:00',
'16:30:00',
'19:30:00',
'22:30:00'],
'TimeZone': 'E. South America Standard Time'},
'UserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
'UserType': 0}
Create the dataframe:
df = json_normalize(data)
Id RecordType CreationTime Operation OrganizationId UserType UserAgent ItemName IsSuccess ActivityId Schedules.RefreshFrequency Schedules.TimeZone Schedules.Days Schedules.Time
048f7de7-81a4-464d-bd6d-df3be3b1e7e8 20 2019-10-08T12:12:32 SetScheduledRefresh 39b03722-b836-496a-85ec-850f0957ca6b 0 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 ASO Daily Statistics True 4e8b4514-24be-4ba5-a7d3-a69e8cb8229e Daily E. South America Standard Time [All] [07:30:00, 10:30:00, 13:30:00, 16:30:00, 19:30:00, 22:30:00]
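As an alternative to the str.replace and literal_eval steps, the standard-library json module can parse the string directly, since it already understands true, false and null. The following is a minimal sketch rather than the answer's own method: it assumes pandas 1.0 or later (for pd.json_normalize), uses an abbreviated copy of the record from the question, and the variable name record is only illustrative.

```python
import json

import pandas as pd

# Abbreviated copy of the record from the question (fewer keys, same shape).
data = (
    '{"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20,'
    '"CreationTime":"2019-10-08T12:12:32",'
    '"Schedules":{"RefreshFrequency":"Daily","Days":["All"]},'
    '"IsSuccess":true}'
)

# json.loads handles JSON literals such as true/false/null,
# so no string replacement is needed before parsing.
record = json.loads(data)

# Nested objects are flattened into dotted column names,
# for example "Schedules.RefreshFrequency".
df = pd.json_normalize(record)

print(df[["Id", "RecordType", "CreationTime"]])
```

Parsing with json.loads also avoids accidentally rewriting the text "true" when it appears inside a string value, which a blanket str.replace would do.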

In R, you can use the reticulate library to evaluate the string as a Python expression. You first need to change every true to True, as in the code below:
a <- 'string = {"Id":"048f7de7-81a4-464d-bd6d-df3be3b1e7e8","RecordType":20,
"CreationTime":"2019-10-08T12:12:32","Operation":"SetScheduledRefresh",
"OrganizationId":"39b03722-b836-496a-85ec-850f0957ca6b","UserType":0,
"UserAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
"ItemName":"ASO Daily Statistics","Schedules":{"RefreshFrequency":"Daily",
"TimeZone":"E. South America Standard Time","Days":["All"],
"Time":["07:30:00","10:30:00","13:30:00","16:30:00","19:30:00","22:30:00"]},
"IsSuccess":true,"ActivityId":"4e8b4514-24be-4ba5-a7d3-a69e8cb8229e"}'
data.frame(reticulate::py_eval(gsub('true','True',sub('.*=\\s+','',a))))

Related

Create batches of pandas dataframe based on timestamp

I have a dataframe of the following form:
#timestamp ISP cache_result client_ip client_request_host client_request_method client_ua client_url client_user content_type ... http_response_code major os os_name querystring reply_length_bytes ts_process_time ts_timestamp type ua_name
2018-04-17T08:12:32.000Z cuaerH c rt,nlEIrnii.cec TCP_REFRESH_MISS 25.204.184.124 testhost.net GET Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... /wp-content/themes/Avada/includes/lib/assets/m... - application/javascript ... 200 65.0 Windows 10 Windows 10 ?ver=2.2.3 25204 321 17/Apr/2018:08:12:32 -0000 testdata Chrome
2018-04-17T08:12:32.000Z HeE iclirueIc rat,nrncc. TCP_REFRESH_MISS 8.157.89.174 testhost.net GET Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... /wp-content/plugins/fusion-core/js/min/avada-p... - application/javascript ... 200 65.0 Windows 10 Windows 10 ?ver=1 2825 177 17/Apr/2018:08:12:32 -0000 testdata Chrome
2018-04-17T08:12:33.000Z ,rrnI EnH.ceeiuclcicrat TCP_REFRESH_MISS 37.151.22.36 testhost.net GET Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... /wp-content/themes/Avada/includes/lib/assets/m... - application/javascript ... 200 65.0 Windows 10 Windows 10 ?ver=1 267 275 17/Apr/2018:08:12:33 -0000 testdata Chrome
2018-04-17T08:12:34.000Z tn.cHer uE,lecnir aircIc TCP_REFRESH_MISS 202.165.110.43 testhost.net GET Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... /wp-content/themes/Avada/includes/lib/assets/m... - application/javascript ... 200 65.0 Windows 10 Windows 10 ?ver=1 341 172 17/Apr/2018:08:12:34 -0000 testdata Chrome
2018-04-17T08:12:34.000Z rneecHuraci ctInir cl.,E TCP_REFRESH_MISS 174.201.44.32 testhost.net GET Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... /wp-content/plugins/fusion-builder/assets/js/m... - application/javascript ... 200 65.0 Windows 10 Windows 10 ?ver=1 302 180 17/Apr/2018:08:12:34 -0000 testdata Chrome
Is it possible to split it into 2-minute intervals? For example, a function that takes the whole dataframe and returns a df with the rows from the first 2 minutes, then, when called again, returns a df with the rows from the next 2 minutes, and so on.
EDIT: A larger portion of my data is the following:
{"#timestamp":"2018-04-17T08:12:32.000Z","ISP":"cuaerH c rt,nlEIrnii.cec","cache_result":"TCP_REFRESH_MISS","client_ip":"25.204.184.124","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.ilightbox.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"ecftdl1e","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=2.2.3","reply_length_bytes":25204,"ts_process_time":321,"ts_timestamp":"17\/Apr\/2018:08:12:32 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:32.000Z","ISP":"HeE iclirueIc rat,nrncc.","cache_result":"TCP_REFRESH_MISS","client_ip":"8.157.89.174","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/fusion-core\/js\/min\/avada-portfolio.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"ced1tlef","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":2825,"ts_process_time":177,"ts_timestamp":"17\/Apr\/2018:08:12:32 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:33.000Z","ISP":" ,rrnI EnH.ceeiuclcicrat","cache_result":"TCP_REFRESH_MISS","client_ip":"37.151.22.36","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/general\/fusion-waypoints.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"lde1ftce","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":267,"ts_process_time":275,"ts_timestamp":"17\/Apr\/2018:08:12:33 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:34.000Z","ISP":"tn.cHer uE,lecnir aircIc","cache_result":"TCP_REFRESH_MISS","client_ip":"202.165.110.43","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.requestAnimationFrame.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"cl1etefd","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":341,"ts_process_time":172,"ts_timestamp":"17\/Apr\/2018:08:12:34 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:34.000Z","ISP":"rneecHuraci ctInir cl.,E","cache_result":"TCP_REFRESH_MISS","client_ip":"174.201.44.32","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/fusion-builder\/assets\/js\/min\/general\/fusion-countdown.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"ctl1fdee","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":302,"ts_process_time":180,"ts_timestamp":"17\/Apr\/2018:08:12:34 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:35.000Z","ISP":"ri enuaHccecrcnl,.tir EI","cache_result":"TCP_REFRESH_MISS","client_ip":"170.122.151.169","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/fusion-builder\/assets\/js\/min\/general\/fusion-flip-boxes.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"cl1feted","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":376,"ts_process_time":178,"ts_timestamp":"17\/Apr\/2018:08:12:35 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:36.000Z","ISP":"earr ec,ulIriccnH.ci ntE","cache_result":"TCP_REFRESH_MISS","client_ip":"177.120.159.58","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.appear.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"t1lceedf","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":1331,"ts_process_time":179,"ts_timestamp":"17\/Apr\/2018:08:12:36 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:36.000Z","ISP":"a, uEr.cnIlHeictrecrcni ","cache_result":"TCP_REFRESH_MISS","client_ip":"94.247.12.106","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/fusion-builder\/assets\/js\/min\/general\/fusion-tabs.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"fetel1dc","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":1154,"ts_process_time":86,"ts_timestamp":"17\/Apr\/2018:08:12:36 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:37.000Z","ISP":"rlcEt.icree ncaI uHi,crn","cache_result":"TCP_REFRESH_MISS","client_ip":"149.218.159.35","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.hoverintent.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"lecte1df","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":463,"ts_process_time":172,"ts_timestamp":"17\/Apr\/2018:08:12:37 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:38.000Z","ISP":"e,ir ctuE iccnanrceIHlr.","cache_result":"TCP_REFRESH_MISS","client_ip":"138.228.110.199","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.cycle.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"e1ftlecd","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=3.0.3","reply_length_bytes":7523,"ts_process_time":179,"ts_timestamp":"17\/Apr\/2018:08:12:38 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:39.000Z","ISP":"nirEei,latnu.cr cIH recc","cache_result":"TCP_REFRESH_MISS","client_ip":"117.81.45.92","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/library\/jquery.placeholder.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"cte1efdl","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=2.0.7","reply_length_bytes":874,"ts_process_time":178,"ts_timestamp":"17\/Apr\/2018:08:12:39 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:39.000Z","ISP":"Eic,e rlHccacrnuntI .rie","cache_result":"TCP_REFRESH_MISS","client_ip":"62.189.164.148","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/general\/fusion-tooltip.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"fe1eltdc","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":452,"ts_process_time":89,"ts_timestamp":"17\/Apr\/2018:08:12:39 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:40.000Z","ISP":"It.crue,lare rHiic cncnE","cache_result":"TCP_REFRESH_MISS","client_ip":"136.44.153.177","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/includes\/lib\/assets\/min\/js\/general\/fusion-ie1011.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"1dcetlef","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=1","reply_length_bytes":526,"ts_process_time":89,"ts_timestamp":"17\/Apr\/2018:08:12:40 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:41.000Z","ISP":"nIr,erecluiiHac cr.Ec nt","cache_result":"TCP_REFRESH_MISS","client_ip":"228.104.233.205","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/assets\/min\/js\/library\/bootstrap.scrollspy.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"ec1edltf","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=3.3.2","reply_length_bytes":1060,"ts_process_time":172,"ts_timestamp":"17\/Apr\/2018:08:12:41 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:42.000Z","ISP":"lrne,tEcuc eircIHc.air n","cache_result":"TCP_REFRESH_MISS","client_ip":"168.41.158.162","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/assets\/min\/js\/library\/jquery.sticky-kit.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"d1efctle","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=5.4.2","reply_length_bytes":1208,"ts_process_time":185,"ts_timestamp":"17\/Apr\/2018:08:12:42 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:27.000Z","ISP":".cccti a neuleEc,rriHnrI","cache_result":"TCP_REFRESH_MISS","client_ip":"113.202.240.119","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/revslider\/public\/assets\/js\/jquery.themepunch.tools.min.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"d1eflcet","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=5.4.7","reply_length_bytes":38335,"ts_process_time":313,"ts_timestamp":"17\/Apr\/2018:08:12:27 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:28.000Z","ISP":"lnniueeiIH.ca rtrc ,ccEr","cache_result":"TCP_REFRESH_HIT","client_ip":"190.220.94.243","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2017\/10\/viettan.png","client_user":"-","content_type":"image\/png","device":"Other","dnet":"delcfet1","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":1549,"ts_process_time":170,"ts_timestamp":"17\/Apr\/2018:08:12:28 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:29.000Z","ISP":"ein.rcaelc uEn tIHcrcr,i","cache_result":"TCP_REFRESH_HIT","client_ip":"31.13.51.177","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/03\/facebookviettan.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"edteclf1","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":6705,"ts_process_time":178,"ts_timestamp":"17\/Apr\/2018:08:12:29 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:31.000Z","ISP":"clr,Hncie uaIciEncr. ter","cache_result":"TCP_REFRESH_HIT","client_ip":"128.129.21.211","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/03\/chantroimoimedia.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"edce1tlf","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":6216,"ts_process_time":90,"ts_timestamp":"17\/Apr\/2018:08:12:31 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:43.000Z","ISP":"tnrI.ccenruiirlE He,c ca","cache_result":"TCP_REFRESH_MISS","client_ip":"225.14.12.26","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/themes\/Avada\/assets\/min\/js\/general\/avada-contact-form-7.js","client_user":"-","content_type":"application\/javascript","device":"Other","dnet":"ec1tfled","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":"?ver=5.4.2","reply_length_bytes":504,"ts_process_time":178,"ts_timestamp":"17\/Apr\/2018:08:12:43 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:43.000Z","ISP":"tariirs oftpooorCoMcn","cache_result":"ERR_CLIENT_ABORT","client_ip":"173.38.196.130","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit\/537.51.1 (KHTML, like Gecko) Version\/7.0 Mobile\/11A465 Safari\/9537.53 BingPreview\/1.0b","client_url":"\/amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it","client_user":"-","content_type":"text\/html","device":"Spider","dnet":"1ceetlfd","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":404,"major":1.0,"os":"iOS","os_name":"iOS","querystring":"?1523952720000","reply_length_bytes":43261,"ts_process_time":1075,"ts_timestamp":"17\/Apr\/2018:08:12:43 -0000","type":"testdata","ua_name":"BingPreview"}
{"#timestamp":"2018-04-17T08:12:44.000Z","ISP":"i.nae,crHntc uiEcrlr ecI","cache_result":"TCP_REFRESH_HIT","client_ip":"217.198.69.197","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/04\/dong-tam-bat-giu-cong-an-640x360.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"1teedfcl","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":38627,"ts_process_time":228,"ts_timestamp":"17\/Apr\/2018:08:12:44 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:45.000Z","ISP":"c rcEn.,er reHciulitcanI","cache_result":"TCP_MISS","client_ip":"204.99.48.109","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/04\/TMDuc.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"ceteldf1","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":206,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":141770,"ts_process_time":512,"ts_timestamp":"17\/Apr\/2018:08:12:45 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:47.000Z","ISP":"Ht, eri enaErurcIcc.ciln","cache_result":"TCP_REFRESH_HIT","client_ip":"20.204.32.235","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/03\/f1-13.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"tecfe1dl","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":161573,"ts_process_time":593,"ts_timestamp":"17\/Apr\/2018:08:12:47 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:47.000Z","ISP":"Ei .ne,cHncrterarilccu I","cache_result":"TCP_REFRESH_HIT","client_ip":"224.60.44.234","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/uploads\/2018\/04\/f1-9-177x142.jpg","client_user":"-","content_type":"image\/jpeg","device":"Other","dnet":"1fecldet","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":10410,"ts_process_time":170,"ts_timestamp":"17\/Apr\/2018:08:12:47 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:12:48.000Z","ISP":"irre,n Iec.rciu ntlcHacE","cache_result":"TCP_REFRESH_HIT","client_ip":"68.18.239.120","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/65.0.3325.181 Safari\/537.36","client_url":"\/wp-content\/plugins\/contact-form-7\/images\/ajax-loader.gif","client_user":"-","content_type":"image\/gif","device":"Other","dnet":"fcledet1","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":65.0,"os":"Windows 10","os_name":"Windows 10","querystring":null,"reply_length_bytes":847,"ts_process_time":89,"ts_timestamp":"17\/Apr\/2018:08:12:48 -0000","type":"testdata","ua_name":"Chrome"}
{"#timestamp":"2018-04-17T08:21:23.000Z","ISP":"nnH rGmelbeen iOHzt","cache_result":"TCP_MISS","client_ip":"234.197.117.162","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit\/534.58.2 (KHTML, like Gecko) Version\/5.1.8 Safari\/534.58.2","client_url":"\/Nhin-Thay-Gi-Tu-Mot-Hoi-Nghi.html","client_user":"-","content_type":"text\/html","device":"Other","dnet":"1ecfeldt","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":301,"major":5.0,"os":"Mac OS X","os_name":"Mac OS X","querystring":null,"reply_length_bytes":0,"ts_process_time":523,"ts_timestamp":"17\/Apr\/2018:08:21:23 -0000","type":"testdata","ua_name":"Safari"}
{"#timestamp":"2018-04-17T08:22:03.000Z","ISP":"Tx osoy1bcy dPdx hreah ia nOo ra-et51XsiaPttt","cache_result":"ERR_CLIENT_ABORT","client_ip":"218.202.132.77","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Linux; Android 5.1; A1601 Build\/LMY47I; wv) AppleWebKit\/537.36 (KHTML, like Gecko) Version\/4.0 Chrome\/64.0.3282.137 Mobile Safari\/537.36 [FB_IAB\/FB4A;FBAV\/166.0.0.66.95;]","client_url":"\/bat-binh-voi-toa-an-len-lut-giao-hat-van-hanh-noi-lua-hiep-thong-voi-tu-nhan-luong-tam\/","client_user":"-","content_type":"text\/html","device":"A1601","dnet":"ftee1dlc","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":301,"major":166.0,"os":"Android","os_name":"Android","querystring":null,"reply_length_bytes":17707,"ts_process_time":31255,"ts_timestamp":"17\/Apr\/2018:08:22:03 -0000","type":"testdata","ua_name":"Facebook"}
{"#timestamp":"2018-04-17T08:21:25.000Z","ISP":"ne z briltHmOnHGeen","cache_result":"TCP_MISS","client_ip":"69.10.61.78","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit\/534.58.2 (KHTML, like Gecko) Version\/5.1.8 Safari\/534.58.2","client_url":"\/nhin-thay-gi-tu-mot-hoi-nghi\/","client_user":"-","content_type":"text\/html","device":"Other","dnet":"cfdt1lee","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":5.0,"os":"Mac OS X","os_name":"Mac OS X","querystring":null,"reply_length_bytes":19351,"ts_process_time":1302,"ts_timestamp":"17\/Apr\/2018:08:21:25 -0000","type":"testdata","ua_name":"Safari"}
{"#timestamp":"2018-04-17T08:21:29.000Z","ISP":"gooLLeG lC","cache_result":"TCP_HIT","client_ip":"167.182.156.107","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (compatible; Google-Apps-Script)","client_url":"\/-","client_user":"-","content_type":"text\/html","device":"Other","dnet":"cee1tfld","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":null,"os":"Other","os_name":"Other","querystring":null,"reply_length_bytes":16962,"ts_process_time":0,"ts_timestamp":"17\/Apr\/2018:08:21:29 -0000","type":"testdata","ua_name":"Other"}
{"#timestamp":"2018-04-17T08:21:28.000Z","ISP":"eLLol GCgo","cache_result":"TCP_HIT","client_ip":"207.89.148.171","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (compatible; Google-Apps-Script)","client_url":"\/-","client_user":"-","content_type":"text\/html","device":"Other","dnet":"c1dleeft","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":200,"major":null,"os":"Other","os_name":"Other","querystring":null,"reply_length_bytes":16962,"ts_process_time":0,"ts_timestamp":"17\/Apr\/2018:08:21:28 -0000","type":"testdata","ua_name":"Other"}
{"#timestamp":"2018-04-17T08:28:51.000Z","ISP":"oeClL LgoG","cache_result":"TCP_IMS_HIT","client_ip":"98.217.204.182","client_request_host":"testhost.net","client_request_method":"GET","client_ua":"Mozilla\/5.0 (Linux; Android 6.0.1; Nexus 5X Build\/MMB29P) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/41.0.2272.96 Mobile Safari\/537.36 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)","client_url":"\/wp-content\/plugins\/accelerated-mobile-pages\/templates\/design-manager\/design-3\/fonts\/ptserif\/PT_Serif-Web-Regular.ttf","client_user":"-","content_type":"-","device":"Spider","dnet":"efd1etlc","host":"testhost.deflect.ca","http_request_scheme":"http","http_request_version":"HTTP\/1.1","http_response_code":304,"major":2.0,"os":"Android","os_name":"Android","querystring":null,"reply_length_bytes":0,"ts_process_time":0,"ts_timestamp":"17\/Apr\/2018:08:28:51 -0000","type":"testdata","ua_name":"Googlebot"}
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
{
"timestamp": [
"2018-04-17T08:12:32.000Z",
"2018-04-17T08:11:33.000Z",
"2018-04-17T08:14:31.000Z",
"2018-04-17T08:25:35.000Z",
"2018-04-17T08:16:36.000Z",
"2018-04-17T08:10:42.000Z",
"2018-04-17T08:18:38.000Z",
"2018-04-17T08:09:29.000Z",
"2018-04-17T08:30:40.000Z",
"2018-04-17T08:21:21.000Z",
],
"value": [9, 2, 3, 4, 7, 8, 1, 2, 0, 3],
}
)
Here is one way to do it by defining a generator function:
def chunk(df, delta_in_min):
    """Helper function.
    Args:
        df: dataframe to split in chunks.
        delta_in_min: size of chunk in minute (at least one).
    Yields:
        Chunk of input dataframe of the given size.
    """
    start = df.index[0]
    while True:
        if delta_in_min <= 0:
            yield df
            break
        end = start + pd.Timedelta(value=delta_in_min, unit="m")
        if end > df.index[-1]:
            yield df.loc[(df.index >= start), :]
            break
        yield df.loc[(df.index >= start) & (df.index < end), :]
        start = end
        if start > df.index[-1]:
            break
And then:
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp").sort_index()
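From here, create the generator once with gen = chunk(df, 2) and call print(next(gen)) repeatedly to get each chunk (calling next(chunk(df, 2)) directly would build a new generator each time and always return the first chunk), or use a for loop, like this:
for s in chunk(df, 2):
    print(s)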
# Output
value
timestamp
2018-04-17 08:09:29+00:00 2
2018-04-17 08:10:42+00:00 8
value
timestamp
2018-04-17 08:11:33+00:00 2
2018-04-17 08:12:32+00:00 9
value
timestamp
2018-04-17 08:14:31+00:00 3
value
timestamp
2018-04-17 08:16:36+00:00 7
value
timestamp
2018-04-17 08:18:38+00:00 1
value
timestamp
2018-04-17 08:21:21+00:00 3
Empty DataFrame
Columns: [value]
Index: []
Empty DataFrame
Columns: [value]
Index: []
value
timestamp
2018-04-17 08:25:35+00:00 4
Empty DataFrame
Columns: [value]
Index: []
value
timestamp
2018-04-17 08:30:40+00:00 0
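For comparison, the same fixed-width windows can probably also be produced with pandas' own time grouping. The following is a minimal sketch, not the method used above: it assumes pandas 1.1 or later (for the origin="start" option of pd.Grouper) and rebuilds the toy dataframe so that it runs on its own. The name chunks is only illustrative.

```python
import pandas as pd

# Same toy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(
    {
        "timestamp": [
            "2018-04-17T08:12:32.000Z",
            "2018-04-17T08:11:33.000Z",
            "2018-04-17T08:14:31.000Z",
            "2018-04-17T08:25:35.000Z",
            "2018-04-17T08:16:36.000Z",
            "2018-04-17T08:10:42.000Z",
            "2018-04-17T08:18:38.000Z",
            "2018-04-17T08:09:29.000Z",
            "2018-04-17T08:30:40.000Z",
            "2018-04-17T08:21:21.000Z",
        ],
        "value": [9, 2, 3, 4, 7, 8, 1, 2, 0, 3],
    }
)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp").sort_index()

# origin="start" anchors the first 2-minute bin at the first timestamp,
# mirroring how chunk() starts at df.index[0].
grouper = pd.Grouper(freq="2min", origin="start")

# Unlike chunk(), this sketch drops windows that contain no rows.
chunks = [group for _, group in df.groupby(grouper) if not group.empty]

for group in chunks:
    print(group)
```

Whether to keep or drop the empty windows depends on the use case; removing the `if not group.empty` filter is one way to keep them, although this has not been checked against every pandas version.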

How to get correct date format from JSON string in Python?

I am trying to get some data from a JSON URL using Python and convert it into a pandas DataFrame. Everything works, except that the date column comes out looking weird. How can I convert it into a proper date format? My code is given below:
sym_1 = 'NIFTY'
headers_gen = {"accept-encoding": "gzip, deflate, br",
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}
def PCR(sym_1):
    url_pcr = "https://opstra.definedge.com/api/futures/pcr/chart/" + sym_1
    req_pcr = requests.get(url_pcr, headers=headers_gen)
    text_data_pcr= req_pcr.text
    json_dict_pcr= json.loads(text_data_pcr)
    df_pcr = pd.DataFrame.from_dict(json_dict_pcr['data'])
    print(df_pcr)
    return df_pcr
pd.to_datetime(..., unit="ms") fixes things.
I also simplified the requests code a tiny bit and added error handling.
import pandas as pd
import requests
headers_gen = {
"accept-encoding": "gzip, deflate",
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36",
}
def PCR(sym_1):
    req_pcr = requests.get(f"https://opstra.definedge.com/api/futures/pcr/chart/{sym_1}", headers=headers_gen)
    req_pcr.raise_for_status()
    data = req_pcr.json()
    df_pcr = pd.DataFrame.from_dict(data['data'])
    df_pcr[0] = pd.to_datetime(df_pcr[0], unit='ms')
    return df_pcr
if __name__ == '__main__':
    print(PCR('NIFTY'))
outputs
0 1 2 ... 6 7 8
0 2019-04-26 05:30:00 11813.50 1.661348 ... NaN NaN NaN
1 2019-04-30 05:30:00 11791.55 1.587803 ... NaN NaN NaN
2 2019-05-02 05:30:00 11765.40 1.634619 ... NaN NaN NaN
.. ... ... ... ... ... ... ...
735 2022-04-18 00:00:00 17229.60 1.169555 ... 0.963420 0.771757 1.328892
736 2022-04-19 00:00:00 16969.35 1.014768 ... 1.385167 0.980847
import json
from datetime import datetime
import pandas as pd
import pytz
import requests
sym_1 = 'NIFTY'
headers_gen = {"accept-encoding": "gzip, deflate, br",
               "accept-language": "en-US,en;q=0.9",
               "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36"}
def PCR(sym_1):
    url_pcr = "https://opstra.definedge.com/api/futures/pcr/chart/" + sym_1
    req_pcr = requests.get(url_pcr, headers=headers_gen)
    text_data_pcr= req_pcr.text
    json_dict_pcr= json.loads(text_data_pcr)
    df_pcr = pd.DataFrame.from_dict(json_dict_pcr['data'])
    # Build timezone-aware IST datetimes directly from epoch milliseconds.
    # (Calling astimezone() on a naive utcfromtimestamp() result would treat it as local time.)
    df_pcr[0] = df_pcr[0].apply(lambda x: datetime.fromtimestamp(x / 1000, tz=pytz.timezone('Asia/Kolkata')))
    print(df_pcr)
    return df_pcr
Updated to use apply and return datetime instead of string but AKX's answer is much more elegant.
Updated to use IST
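The two answers can likely be combined into a vectorized form that also yields timezone-aware values. A minimal sketch, assuming the first column holds epoch milliseconds in UTC; the two sample values below are made up for illustration and do not come from the API.

```python
import pandas as pd

# Hypothetical epoch-millisecond values (midnight UTC on two dates).
raw = pd.Series([1556236800000, 1556582400000])

# unit="ms" interprets the integers as milliseconds since the epoch;
# utc=True marks them as UTC so that tz_convert can shift them to IST.
ist = pd.to_datetime(raw, unit="ms", utc=True).dt.tz_convert("Asia/Kolkata")
print(ist)
```

This avoids the per-row apply call and needs no explicit pytz import, at the cost of relying on pandas' own timezone handling.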

Webscraping website - can't print price - api & json i think

I'm having trouble getting the price from this website to print. I think I'm close, but the API returns an error. Please help, thanks.
{'statusDetails': {'state': 'FAILURE', 'errorCode': 'SYS-3003', 'correlationid': 'rrt-5636881267628447407-b-gsy1-18837-18822238-1', 'description': 'Invalid key identifier or token'}}
code:
import requests
import json
s = requests.Session()
url = 'https://www.bunnings.com.au/ozito-pxc-2-x-18v-cordless-line-trimmer-skin-only_p0167719'
header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
resp = s.get(url,headers=header)
api_url = f'https://api.prod.bunnings.com.au/v1/products/0167719/fulfillment/6400/radius/100000?isToggled=true'
price_resp = s.get(api_url,headers=header).json()
print(price_resp)
#price = price_resp['data']['price']['value']
#print(price)

I can't send form data to Python Jupyter

I am trying to build a Python script that sends a POST request with parameters and extracts the result, but I can't work out where the problem is or why I don't get the result page with the HTML that I need.
import requests
url = ('https://ar.ec.universal-assistance.com/cotizar-asistencia-al-viajero')
data = {
'__RequestVerificationToken':'QWsTn0wqFmW9_jFfaBuuOjaWM4TE2Xk1XGn-oDTp0TENBO725YSkGnK8WeiAN53-jiPnjTDJ6zbZQjb6SzpprdCT4OlJg9jjZJKx1Wh7fGkZ5yCLkArUWCp6AIwq0t12gsonhP3orHzFJ2_1YqvIfJMcnzn2aXCb1-ZrDOzHM701',
'CCTLD':'.ar',
'CodigoOrganizacion':"",
'CodigoConvenio':"",
'OcultarTipoViaje':'false',
'CantidadPasajeros':1,
'CantidadDias':3,
'Origen':'ARGENTINA',
'Destino':'Centro america/Caribe',
'TipoViaje':'Un viaje',
'FechaInicio':'20/06/2019',
'FechaFin':'22/06/2019',
'Edad1':27,
'Edad2':"",
'Edad3':"",
'Edad4':"",
'Edad5':"",
'Edad6':"",
'Edad7':"",
'Edad8':"",
'Edad9':"",
'Edad10':"",
'Email':'no@no.com',
'Nombre':'PEDRO',
'Apellido':'PEREZ',
'CodigoArea':800,
'NumeroTelefono':9997777,
'dr':"",
'cn':'(direct)',
'cs':'(direct)',
'cm':'(none)',
'ck':'(not set 5)',
'cc':'(not set 5)',
'ua':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
'ref':'ar.ec.universal-assistance.com',
'sr':'1366x768',
'vp':'1366x728'
}
resp = requests.post(url = url, data = data )
print(resp.text)
And I tried:
import requests
url = ('https://ar.ec.universal-assistance.com/cotizar-asistencia-al-viajero')
header = {
    ":authority": "ar.ec.universal-assistance.com",
    ":method": "POST",
    ":path": "/cotizar-asistencia-al-viajero",
    ":scheme": "https",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "es-ES,es;q=0.9",
    "cache-control": "max-age=0",
    "content-length": "921",
    "content-type": "application/x-www-form-urlencoded",
    "cookie":"__cfduid=db577552fb94c6b34d51ff081f56060601559763003; _ga=GA1.2.1051767382.1559763017; _fbp=fb.1.1559763016835.1821187230; ASP.NET_SessionId=zirpp0zdbsvt4p102zvfknic; __RequestVerificationToken=RuGNfaFUJxBI4FDOaVsMJBdBNwbqzUt_AMjdUu6Am3T6kpBrZ5__wM8CiDO3Ttw6z6iBseVrGvzsyD-GCoWI2XRuhHpJB3-qu7qXvjoDu3NQL6onXupDL1E4ZkUXuHpDSPi0mjQ7F5PSFf2l_SGtDA2; _gid=GA1.2.1308535569.1560799988; _gat=1",
    "origin": "https://ar.ec.universal-assistance.com",
    "referer": "https://ar.ec.universal-assistance.com/cotizar-asistencia-al-viajero",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
data = {
'__RequestVerificationToken':'QWsTn0wqFmW9_jFfaBuuOjaWM4TE2Xk1XGn-oDTp0TENBO725YSkGnK8WeiAN53-jiPnjTDJ6zbZQjb6SzpprdCT4OlJg9jjZJKx1Wh7fGkZ5yCLkArUWCp6AIwq0t12gsonhP3orHzFJ2_1YqvIfJMcnzn2aXCb1-ZrDOzHM701',
'CCTLD':'.ar',
'CodigoOrganizacion':"",
'CodigoConvenio':"",
'OcultarTipoViaje':'false',
'CantidadPasajeros':1,
'CantidadDias':3,
'Origen':'ARGENTINA',
'Destino':'Centro america/Caribe',
'TipoViaje':'Un viaje',
'FechaInicio':'20/06/2019',
'FechaFin':'22/06/2019',
'Edad1':27,
'Edad2':"",
'Edad3':"",
'Edad4':"",
'Edad5':"",
'Edad6':"",
'Edad7':"",
'Edad8':"",
'Edad9':"",
'Edad10':"",
'Email':'no@no.com',
'Nombre':'PEDRO',
'Apellido':'PEREZ',
'CodigoArea':800,
'NumeroTelefono':9997777,
'dr':"",
'cn':'(direct)',
'cs':'(direct)',
'cm':'(none)',
'ck':'(not set 5)',
'cc':'(not set 5)',
'ua':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
'ref':'ar.ec.universal-assistance.com',
'sr':'1366x768',
'vp':'1366x728'
}
resp = requests.post(url = url, data = data )
print(resp.text)
I expect to get the HTML of "https://ar.ec.universal-assistance.com/ofertas-asistencia-al-viajero", which is the page that follows the first URL.
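One possible cause, offered as an assumption rather than a diagnosis: an ASP.NET __RequestVerificationToken form field is normally paired with a cookie from the same session, so a value copied from the browser may be rejected later. Below is a minimal sketch of reading a fresh token from the form page inside a requests.Session and posting it back. The helper names TokenParser, extract_token and submit_quote are hypothetical, and it is assumed, not verified, that the site embeds the token in a hidden input field.

```python
from html.parser import HTMLParser

import requests

URL = "https://ar.ec.universal-assistance.com/cotizar-asistencia-al-viajero"


class TokenParser(HTMLParser):
    """Collects the value of the hidden __RequestVerificationToken input."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "__RequestVerificationToken":
            self.token = attributes.get("value")


def extract_token(html):
    """Return the token found in the HTML, or None if there is none."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


def submit_quote(form_fields):
    """Fetch the form, then post the fields with a token from the same session."""
    with requests.Session() as session:
        # The session keeps the cookies set by the GET for the following POST.
        page = session.get(URL)
        page.raise_for_status()
        fields = dict(form_fields, __RequestVerificationToken=extract_token(page.text))
        response = session.post(URL, data=fields)
        response.raise_for_status()
        return response.text
```

Calling submit_quote(data) with the data dictionary from the question would be the intended usage. If the page builds its form with JavaScript, this approach would not find a token and a different technique would be needed.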

Apache log file data analysis with python pandas

My problem is a bit hard to explain. I'm analyzing an Apache log file; the following is one line from it.
112.135.128.20 - [13/May/2013:23:55:04 +0530] "GET /SVRClientWeb/ActionController HTTP/1.1" 302 2 "https://www.example.com/sample" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B329" GET /SVRClientWeb/ActionController - HTTP/1.1 www.example.com
Some parts from my code:
df = df.rename(columns={'%>s': 'Status', '%b':'Bytes Returned',
    '%h':'IP', '%l':'Username', '%r': 'Request', '%t': 'Time', '%u': 'Userid', '%{Referer}i': 'Referer', '%{User-Agent}i': 'Agent'})
df.index = pd.to_datetime(df.pop('Time'))
test = df.groupby(['IP', 'Agent']).size()
test = test.sort_values()
print(test[-20:])
I read log file to a data frame and get the following output with hit counts and user agents.
IP Agent
74.86.158.106 Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/) 369
203.81.107.103 Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20100101 Firefox/21.0 388
173.199.120.155 Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/) 417
124.43.84.242 Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31 448
112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36 454
124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0 461
124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0 467
Then I want to:
Get the three highest hit counts (and their IPs) and find how frequently those hits occur (for example, the time difference between successive hits from each IP).
Find out whether a single IP is used with different agents.
Could you at least explain how to approach these problems?
To do the first part you could just sort the DataFrame (by count) and take the top three rows:
In [11]: df.sort_values('Count', ascending=False).head(3)
Out[11]:
IP Agent Count
6 124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20... 467
5 124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G... 461
4 112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3... 454
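The same top-three selection can also be written with DataFrame.nlargest, which is available in current pandas versions. A minimal sketch using the counts from the question; the Agent column is left out for brevity, so this frame is a simplified stand-in for the real one.

```python
import pandas as pd

# Counts copied from the question's output; agent strings omitted for brevity.
df = pd.DataFrame(
    {
        "IP": [
            "74.86.158.106",
            "203.81.107.103",
            "173.199.120.155",
            "124.43.84.242",
            "112.135.196.223",
            "124.43.155.138",
            "124.43.104.198",
        ],
        "Count": [369, 388, 417, 448, 454, 461, 467],
    }
)

# nlargest returns the n rows with the largest values in the given column,
# ordered from largest to smallest.
top_3 = df.nlargest(3, "Count")
print(top_3)
```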
To test whether there are multiple rows (Agents) for a single IP you can use groupby:
In [12]: g = df.groupby('IP')
In [13]: repeated = g.count().Count != 1
In [14]: repeated
Out[14]:
IP
112.135.196.223 False
124.43.104.198 False
124.43.155.138 False
124.43.84.242 False
173.199.120.155 False
203.81.107.103 False
74.86.158.106 False
Name: Count, dtype: bool
In [15]: repeated[repeated]
Out[15]: Series([], dtype: bool)
There are none in this example.
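The check above counts rows of the already aggregated (IP, Agent) table. When starting from the raw hits instead, counting distinct agents per IP might be more direct. A minimal sketch with made-up rows; the hits frame and its values are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical raw hits: one row per request.
hits = pd.DataFrame(
    {
        "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"],
        "Agent": ["Firefox", "Chrome", "Firefox", "Firefox"],
    }
)

# Number of distinct user agents seen for each IP.
agents_per_ip = hits.groupby("IP")["Agent"].nunique()

# Keep only the IPs that were seen with more than one agent.
multiple_agents = agents_per_ip[agents_per_ip > 1]
print(multiple_agents)
```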
To avoid sorting the entire DataFrame, you can use heapq, although this turned out not to be more efficient (there was no nlargest in pandas when this was written; current versions provide DataFrame.nlargest):
In [21]: from heapq import nlargest
In [22]: top_3 = nlargest(3, df.iterrows(), key=lambda x: x[1]['Count'])
In [23]: pd.DataFrame([row for _, row in top_3])
Out[23]:
IP Agent Count
6 124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20... 467
5 124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G... 461
4 112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3... 454
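The question also asks for the time difference between successive hits from the same IP, which the steps above do not address. A minimal sketch, under the assumption that each request is one row with an already parsed Time column; the sample rows and the GapSeconds column name are invented for illustration.

```python
import pandas as pd

# Hypothetical raw hits: one row per request.
hits = pd.DataFrame(
    {
        "IP": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"],
        "Time": pd.to_datetime(
            [
                "2013-05-13 23:55:04",
                "2013-05-13 23:55:10",
                "2013-05-13 23:55:34",
                "2013-05-13 23:57:04",
            ]
        ),
    }
)

# Sort chronologically, then take the difference to the previous hit from
# the same IP. The first hit of each IP has no predecessor, so its gap is NaN.
hits = hits.sort_values("Time")
hits["GapSeconds"] = hits.groupby("IP")["Time"].diff().dt.total_seconds()

# Average gap per IP as a rough measure of how often each IP hits.
mean_gap = hits.groupby("IP")["GapSeconds"].mean()
print(mean_gap)
```

To restrict this to the three busiest IPs, the hits frame could first be filtered with isin on the IPs found in the top-three step.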
