Extract value in Python

My Code:
import requests
import json
web_page = requests.get("http://api.bart.gov/api/etd.aspx?cmd=etd&orig=mont&key=MW9S-E7SL-26DU-VV8V&json=y")
response = web_page.text
parsed_json = json.loads(response)
#print(parsed_json)
print(parsed_json['root']['date'])
print(parsed_json['root']['time'])
print(parsed_json['root']['station']['name'])
How can I extract the destination and minutes values from the data below in Python?
[{'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'WHITE', 'hexcolor': '#ffffff', 'bikeflag': '1', 'delay': '220'}]}, {'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '16', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '132'}, {'minutes': '26', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '69'}]}]}]

Try this:
json_obj = {'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '1', 'platform': '2', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '254'}]},
{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]},
{'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '38', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0'}]}]}
for item in json_obj['etd']:
    dest = item['destination']
    minute = item['estimate'][0]['minutes']
    print(dest, minute)
Output:
Antioch 1
Daly City 39
SF Airport 38

The problem is in parsed_json['root']['station']['name']. parsed_json['root']['station'] is a list, not a dict, so it has no 'name' key. You need to index it with [0] or iterate over it:
for station in parsed_json['root']['station']:
    for etd in station['etd']:
        for estimate in etd['estimate']:
            print(etd['destination'], estimate['minutes'])
Output:
Daly City 35
SF Airport 16
SF Airport 26

Try this to get json data:
import json
# some JSON:
json_data= {'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]}
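# json_data is already a Python dict, so it can be indexed directly.
# The dumps/loads round trip below only shows how JSON text would be parsed:
data = json.dumps(json_data)
extract_json = json.loads(data)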
print("Destination: "+extract_json["destination"])
print("Minutes: "+extract_json["estimate"][0]["minutes"])
Output:
Destination: Daly City
Minutes: 39
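The round trip through json.dumps and json.loads is only needed when the data arrives as JSON text; a dict can be indexed as it is. A minimal sketch of that distinction, using a shortened, illustrative sample in the same shape as the data above (raw_text is a made-up name):

```python
import json

# Shortened sample in the same shape as the answer's data (values are illustrative).
raw_text = '{"destination": "Daly City", "estimate": [{"minutes": "39", "platform": "1"}]}'

# json.loads turns JSON text into Python objects (dicts, lists, strings, ...).
parsed = json.loads(raw_text)

# Once parsed, plain indexing is enough; no further json calls are required.
destination = parsed["destination"]
minutes = parsed["estimate"][0]["minutes"]
print("Destination: " + destination)
print("Minutes: " + minutes)
```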

Assuming the data is in d_MONT:
d_MONT = {'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '1', 'platform': '2', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '254'}]},
{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]},
{'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '38', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0'}]}]}
This will find the next train to destinationRequired:
destinationList = d_MONT['etd']
destinationRequired = 'Daly City'
for destinationDict in destinationList:
    if destinationDict['destination'] == destinationRequired:
        earliest = None
        for estimate in destinationDict['estimate']:
            # 'minutes' is a string in this data, so compare as integers
            minutes = int(estimate['minutes'])
            if earliest is None or minutes < earliest:
                earliest = minutes
        print("Next train to {0}: {1} minutes".format(destinationRequired, earliest))
        break
else:
    # Runs only if the loop finished without hitting break
    print("No trains to {0}".format(destinationRequired))
Note that there are more Pythonic ways to do this, and the code above does not follow PEP 8, but I think it is more important that you understand the basic logic than a complex Python one-liner.
You do not document the JSON object format, so I don't think it is safe to assume that the list of trains to a destination is in order; the safest approach is therefore to step through each one and find the earliest. It isn't even clear whether more than one train will ever be returned in the list; if only one is, a simple [0] would be sufficient rather than stepping through each one.
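One of the more compact alternatives mentioned above could look like the sketch below. It assumes every 'minutes' value is a numeric string, which holds for the data shown here but is not confirmed for the API in general; the helper name next_train and the shortened sample dict are made up for illustration.

```python
def next_train(station, destination):
    """Return the smallest 'minutes' value (as int) for destination, or None."""
    waits = [
        int(estimate["minutes"])  # assumption: always a numeric string
        for etd in station["etd"]
        if etd["destination"] == destination
        for estimate in etd["estimate"]
    ]
    # default=None covers the case where no train goes to that destination
    return min(waits, default=None)


# Shortened sample in the same shape as d_MONT.
sample = {
    "name": "Montgomery St.",
    "etd": [
        {"destination": "Daly City", "estimate": [{"minutes": "39"}]},
        {"destination": "SF Airport", "estimate": [{"minutes": "16"}, {"minutes": "26"}]},
    ],
}

print(next_train(sample, "SF Airport"))
print(next_train(sample, "Antioch"))
```

Taking the minimum over all estimates keeps the "do not assume the list is ordered" behaviour of the loop version.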

Related

How can I import a nested json object into a pandas dataframe?

I have a json object like this:
[{'currency_pair': 'UOS_USDT',
'orders': [{'account': 'spot',
'amount': '1282.84',
'create_time': '1655394430',
'create_time_ms': 1655394430129,
'currency_pair': 'UOS_USDT',
'fee': '0',
'fee_currency': 'UOS',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208865523',
'left': '1282.84',
'point_fee': '0',
'price': '0.1949',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394430',
'update_time_ms': 1655394430129}],
'total': 1},
{'currency_pair': 'RMRK_USDT',
'orders': [{'account': 'spot',
'amount': '79.365',
'create_time': '1655394431',
'create_time_ms': 1655394431249,
'currency_pair': 'RMRK_USDT',
'fee': '0',
'fee_currency': 'RMRK',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208877018',
'left': '79.365',
'point_fee': '0',
'price': '2.52',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394431',
'update_time_ms': 1655394431249}],
'total': 1}]
I want to convert it to a DataFrame.
The data comes from an API call to a crypto exchange. I converted the response using the .json() method, so it's proper JSON. I have tried:
df = pd.DataFrame(data)
df = pd.DataFrame(data["orders"])
df = pd.DataFrame(data["currency_pair"]["orders"])
and every other imaginable path.
I want a df with the columns ["currency_pair", "amount", "create_time", "price", "side"].
I sometimes get the error TypeError: list indices must be integers or slices, not str, or the df is created but the orders object is not unpacked. All help gratefully received. Thank you.
import pandas as pd
data = [{'currency_pair': 'UOS_USDT',
'orders': [{'account': 'spot',
'amount': '1282.84',
'create_time': '1655394430',
'create_time_ms': 1655394430129,
'currency_pair': 'UOS_USDT',
'fee': '0',
'fee_currency': 'UOS',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208865523',
'left': '1282.84',
'point_fee': '0',
'price': '0.1949',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394430',
'update_time_ms': 1655394430129}],
'total': 1},
{'currency_pair': 'RMRK_USDT',
'orders': [{'account': 'spot',
'amount': '79.365',
'create_time': '1655394431',
'create_time_ms': 1655394431249,
'currency_pair': 'RMRK_USDT',
'fee': '0',
'fee_currency': 'RMRK',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208877018',
'left': '79.365',
'point_fee': '0',
'price': '2.52',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394431',
'update_time_ms': 1655394431249}],
'total': 1}]
Use:
df = pd.json_normalize(data, record_path=['orders'])
and keep the columns you need.
It's only one line, and it should cover your case, since the 'currency_pair' you want is already in each 'orders' dictionary and, from what I understand of your data, it will always be the same as the 'currency_pair' value outside 'orders'. As you said, you don't need 'total' either.
If you want them all, use:
df = pd.json_normalize(data, record_path=['orders'], meta=['currency_pair', 'total'], record_prefix='orders_')
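As a rough illustration of "keep the columns you need", the sketch below runs pd.json_normalize on a shortened copy of the data and then selects the five columns named in the question. The trimmed data list and the name wanted are stand-ins for illustration only.

```python
import pandas as pd

# Shortened stand-in for the API response; only a few order fields are kept.
data = [
    {"currency_pair": "UOS_USDT", "total": 1,
     "orders": [{"currency_pair": "UOS_USDT", "amount": "1282.84",
                 "create_time": "1655394430", "price": "0.1949", "side": "buy"}]},
    {"currency_pair": "RMRK_USDT", "total": 1,
     "orders": [{"currency_pair": "RMRK_USDT", "amount": "79.365",
                 "create_time": "1655394431", "price": "2.52", "side": "buy"}]},
]

# One row per order; each key of an order dict becomes a column.
df = pd.json_normalize(data, record_path=["orders"])

# Keep only the columns asked for in the question.
wanted = ["currency_pair", "amount", "create_time", "price", "side"]
df = df[wanted]
print(df)
```

Unlike indexing orders[0], this yields one row for every order, so it also works when a currency pair has more than one open order.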
import pandas as pd
data = [{'currency_pair': 'UOS_USDT',
'orders': [{'account': 'spot',
'amount': '1282.84',
'create_time': '1655394430',
'create_time_ms': 1655394430129,
'currency_pair': 'UOS_USDT',
'fee': '0',
'fee_currency': 'UOS',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208865523',
'left': '1282.84',
'point_fee': '0',
'price': '0.1949',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394430',
'update_time_ms': 1655394430129}],
'total': 1},
{'currency_pair': 'RMRK_USDT',
'orders': [{'account': 'spot',
'amount': '79.365',
'create_time': '1655394431',
'create_time_ms': 1655394431249,
'currency_pair': 'RMRK_USDT',
'fee': '0',
'fee_currency': 'RMRK',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208877018',
'left': '79.365',
'point_fee': '0',
'price': '2.52',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394431',
'update_time_ms': 1655394431249}],
'total': 1}]
df = pd.DataFrame(data)
df['amount'] = df.apply( lambda row: row.orders[0]['amount'] , axis=1)
df['create_time'] = df.apply( lambda row: row.orders[0]['create_time'] , axis=1)
df['price'] = df.apply( lambda row: row.orders[0]['price'] , axis=1)
df['side'] = df.apply( lambda row: row.orders[0]['side'] , axis=1)
required_df = df[['currency_pair', 'amount', 'create_time', 'price', 'side']]
required_df
Result:
  currency_pair   amount create_time   price side
0      UOS_USDT  1282.84  1655394430  0.1949  buy
1     RMRK_USDT   79.365  1655394431    2.52  buy
Hi, I hope this process helps you.
#Import pandas library
import pandas as pd
#Your data
data = [{'currency_pair': 'UOS_USDT',
'orders': [{'account': 'spot',
'amount': '1282.84',
'create_time': '1655394430',
'create_time_ms': 1655394430129,
'currency_pair': 'UOS_USDT',
'fee': '0',
'fee_currency': 'UOS',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208865523',
'left': '1282.84',
'point_fee': '0',
'price': '0.1949',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394430',
'update_time_ms': 1655394430129}],
'total': 1},
{'currency_pair': 'RMRK_USDT',
'orders': [{'account': 'spot',
'amount': '79.365',
'create_time': '1655394431',
'create_time_ms': 1655394431249,
'currency_pair': 'RMRK_USDT',
'fee': '0',
'fee_currency': 'RMRK',
'fill_price': '0',
'filled_total': '0',
'gt_discount': False,
'gt_fee': '0',
'iceberg': '0',
'id': '169208877018',
'left': '79.365',
'point_fee': '0',
'price': '2.52',
'rebated_fee': '0',
'rebated_fee_currency': 'USDT',
'side': 'buy',
'status': 'open',
'text': 'apiv4',
'time_in_force': 'gtc',
'type': 'limit',
'update_time': '1655394431',
'update_time_ms': 1655394431249}],
'total': 1}]
#Accessing nested values
#You could transform the specific column
#into a DataFrame and access its values by index,
#then parse each value to the type you need,
#i.e.
float(pd.DataFrame(data[0]['orders'])['amount'].values[0])
int(pd.DataFrame(data[0]['orders'])['create_time'].values[0])
float(pd.DataFrame(data[0]['orders'])['price'].values[0])
pd.DataFrame(data[0]['orders'])['side'].values[0]
#Create a dictionary with your chosen structure
#["currency_pair", "amount", "create_time", "price", "side"]
# then insert the corresponding columns
custom_dictionary = {
'currency_pair': [data[0]['currency_pair'], data[1]['currency_pair']],
'amount': [float(pd.DataFrame(data[0]['orders'])['amount'].values[0]),
float(pd.DataFrame(data[1]['orders'])['amount'].values[0])],
'create_time': [int(pd.DataFrame(data[0]['orders'])['create_time'].values[0]),
int(pd.DataFrame(data[1]['orders'])['create_time'].values[0])],
'price': [float(pd.DataFrame(data[0]['orders'])['price'].values[0]),
float(pd.DataFrame(data[1]['orders'])['price'].values[0])],
'side': [pd.DataFrame(data[0]['orders'])['side'].values[0],
pd.DataFrame(data[1]['orders'])['side'].values[0]]}
#Create a DataFrame with your custom dictionary and voila
df = pd.DataFrame(custom_dictionary)
df
The resulting DataFrame (df) has one row per currency pair, with the columns currency_pair, amount, create_time, price and side.

Grabbing second tree of information and to make a new table with pandas Dataframe

I need help writing a function that pulls a second level of information out of the td entries and puts it into its own separate columns.
This is the code that generates the HTML text:
def viewer(df,orig_dict):
    cur_dict = orig_dict
    prev_dicts = []
    prev_dfs = []
    columns = []
    while(True):
        df_for_html = df
        print(df_for_html.to_html())
        display(df)
        select_dict = {}
        print("Selectable options: ")
        options = " " + "(q)uit, (b)ack, (p)rint tree, "
        if type(cur_dict) == dict:
            columns = df.columns.values
            for i,opt in enumerate(columns):
                select_dict[str(i)] = opt
                options += str(i) + ") " + str(opt) + " "
        elif type(cur_dict) == list:
            options += "row "
            for j in range(0,len(cur_dict)):
                select_dict[str(j)] = j
                options += str(j) + " "
        options = options[:-1]
        print(options)
        selection = input()
        if selection == 'q':
            quit()
        elif selection == 'b':
            if len(prev_dicts) > 0:
                cur_dict = prev_dicts.pop()
                df = prev_dfs.pop()
            else:
                print_start_menu(orig_dict)
        elif selection == 'p':
            print_tree(cur_dict)
        elif selection.isdigit():
            select_str = select_dict.get(selection)
            if select_str == None:
                print("Invalid number, please try again.")
            else:
                print("You selected " + str(select_str))
                prev_dicts.append(cur_dict)
                if type(cur_dict) == dict:
                    cur_dict = cur_dict[select_str]
                elif type(cur_dict) == list:
                    cur_dict = cur_dict[select_str]
                prev_dfs.append(df)
                try:
                    df = pd.DataFrame.from_dict(cur_dict)
                except ValueError:
                    print(str(select_str) + "= " + str(cur_dict))
def main():
    in_file = sys.argv[1]
    assert(os.path.isfile(in_file))
    with open(in_file,"r") as f:
        parser = etree.XMLParser()
        tree = etree.parse(f,parser)
        root = tree.getroot()
    orig_dict = parse_xml(root)
    df = pd.DataFrame.from_dict(orig_dict)
    df_for_html = df
    print(df_for_html.to_html())
As you can see, the Entries row should also be split into columns based on its dictionaries, so I need to know whether there is a function that goes deeper into an XML file and sorts the nested values into columns.
<table border="1" class="dataframe">
<thead>
<tr style="text-align: center;">
<th></th>
<th>DDMap</th>
</tr>
</thead>
<tbody>
<tr>
<th>Entries</th>
<td>{'DDEntry': [{'Name': 'CH_0', 'Channel': '0', 'Mode': 'INPUT', 'InitialValue': 'LOW', 'Hidden': 'false', 'VrefVoltage': '0', 'Description': 'NONE'}, {'Name': 'CH_1', 'Channel': '1', 'Mode': 'INPUT', 'InitialValue': 'LOW', 'Hidden': 'false', 'VrefVoltage': '0', 'Description': 'NONE'}, {'Name': 'CH_2', 'Channel': '2', 'Mode': 'INPUT', 'InitialValue': 'LOW', 'Hidden': 'false', 'VrefVoltage': '0', 'Description': 'NONE'}, {'Name': 'CH_3', 'Channel': '3', 'Mode': 'INPUT', 'InitialValue': 'LOW', 'Hidden': 'false', 'VrefVoltage': '0', 'Description': 'NONE'}, {'Name': 'PPV_BMC_RESET_N',
'Channel': '16', 'Mode': 'OUTPUT', 'InitialValue': 'HIGH', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'Pulse LOW to reset the BMC'}, {'Name': 'PPV_CPU_RESET_N', 'Channel': '17', 'Mode': 'OUTPUT', 'InitialValue': 'HIGH', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'Pulse LOW to reset the CPU'}, {'Name': 'PPV_FPGA_INIT_DONE', 'Channel': '34', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_FPGA_CONF_DONE', 'Channel': '35', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_FPGA_NSTATUS', 'Channel': '36', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_FPGA_NCONFIG', 'Channel': '37', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_FPGA_PMBUS_EN', 'Channel': '38', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'Monitor only'}, {'Name': 'NC_PPV_VCCL_PG', 'Channel': '39', 'Mode': 'INPUT', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_MSEL0', 'Channel': '40', 'Mode': 'OUTPUT', 'InitialValue': 'HIGHZ', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_MSEL1', 'Channel': '41', 'Mode': 'OUTPUT', 'InitialValue': 'HIGHZ', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_MSEL2', 'Channel': '42', 'Mode': 'OUTPUT', 'InitialValue': 'HIGHZ', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'PPV_MSEL3', 'Channel': '43', 'Mode': 'OUTPUT', 'InitialValue': 'HIGHZ', 'Hidden': 'false', 'VrefVoltage': '1.8', 'Description': 'NONE'}, {'Name': 'P5V_IN', 'Channel': '0', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '5.5', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P12V_IN', 'Channel': '1', 'Gain': '2', 'LowerLimit': '-0.2', 'UpperLimit': '13.2', 'Hidden': 'false', 'Description': 
None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P3P3V_IN', 'Channel': '2', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '3.63', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'true', 'ThresholdLimit': '0.3'}, {'Name': 'P1P2V_IN', 'Channel': '3', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.32', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'PVCCL', 'Channel': '4', 'Gain': '1', 'LowerLimit': '-0.2',
'UpperLimit': '0.97', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P2P8V_SYS', 'Channel': '5', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '3.08', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'PVCCL_HPS', 'Channel': '6', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '0.99', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'PVCCL_SDM', 'Channel': '7', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '0.88', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P1P8V_SYS', 'Channel': '9', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.98', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'PVCCN1V_IO_RNR', 'Channel': '10', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.1', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'PVCCA', 'Channel': '11', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.98', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P1PV8_VCCPLL_HPS', 'Channel': '12', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.98', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P2P5V_SYS', 'Channel': '13', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '2.75', 'Hidden':
'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P1P2V_SYS_A', 'Channel': '14', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.32', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P1P2V_SYS_B', 'Channel': '15', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.32', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P1P8V_VCCN', 'Channel': '16', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '1.98', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P12V_SYS', 'Channel': '18', 'Gain': '2', 'LowerLimit': '-0.2', 'UpperLimit': '13.2', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.4'}, {'Name': 'P3P3V_SYS', 'Channel': '19', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '3.63', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P0P6V_DDR4_VTT_CH0', 'Channel': '20', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '0.66', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}, {'Name': 'P0P6V_DDR4_VTT_CH1', 'Channel': '21', 'Gain': '1', 'LowerLimit': '-0.2', 'UpperLimit': '0.66', 'Hidden': 'false', 'Description': None, 'DeactuateCheck': 'false', 'ThresholdLimit': '0.2'}]}</td>
</tr>
<tr>
<th>IsFASEMap</th>
<td>true</td>
</tr>
<tr>
<th>IsMultiDut</th>
<td>false</td>
</tr>
<tr>
<th>MapHWType</th>
<td>STDIO</td>
</tr>
<tr>
<th>MapVersion</th>
<td>0.5</td>
</tr>
<tr>
<th>TIUCount</th>
<td>1</td>
</tr>
</tbody>
</table>
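One possible way to get the Entries row into its own columns is to pass the nested list of dictionaries straight to pd.DataFrame. The sketch below assumes parse_xml returns a dict shaped like the table above, with the entries under ['DDMap']['Entries']['DDEntry']; the shortened orig_dict is a hypothetical stand-in, not the real parser output.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the dict returned by parse_xml(root).
orig_dict = {
    "DDMap": {
        "Entries": {
            "DDEntry": [
                {"Name": "CH_0", "Channel": "0", "Mode": "INPUT", "InitialValue": "LOW"},
                {"Name": "P5V_IN", "Channel": "0", "Gain": "1", "UpperLimit": "5.5"},
            ]
        },
        "IsFASEMap": "true",
        "MapHWType": "STDIO",
    }
}

# The second level of the tree: a list with one dict per entry.
entries = orig_dict["DDMap"]["Entries"]["DDEntry"]

# A list of dicts becomes one row per dict and one column per key;
# keys missing from an entry are filled with NaN.
entries_df = pd.DataFrame(entries)
print(entries_df.to_html())
```

Because the entries do not all share the same keys (some have Mode, others Gain), the resulting table contains NaN cells where a key is absent.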

Scrape values from json python requests

I am building a scraper for sizes on a site, and I am confused about how to extract the "eur" and "pieces" values from this JSON. Later I want to print all sizes like "EU 41 = Pieces 6". I probably need a for loop.
Here ist the output of the json : "{'translations': {'en': {'lang': 'en', 'title': 'Nike Dunk Low Retro Premium', 'subtitle': 'Black / Pure Platinum-Anthracite', 'slug': 'nike-dunk-low-retro-premium', 'description': 'DH7913-001'}}, 'id': 'vpEW0nkBHBhvh4GFDXSb', 'prices': {'EUR': {'currency': 'EUR', 'value': 119}}, 'sizeSets': {'Men': {'name': 'Men', 'sizes': [{'id': '685d200c-c470-11eb-b5ee-a66da43170c1', 'us': '8', 'eur': '41', 'uk': '7', 'cm': '26', 'ean': '194955875308', 'pieces': 6}, {'id': '685d21c4-c470-11eb-9f9f-a66da43170c1', 'us': '8.5', 'eur': '42', 'uk': '7.5', 'cm': '26.5', 'ean': '194955875315', 'pieces': 18}, {'id': '685d232c-c470-11eb-8bda-a66da43170c1', 'us': '9', 'eur': '42.5', 'uk': '8', 'cm': '27', 'ean': '194955875322', 'pieces': 10}, {'id': '685d248a-c470-11eb-bf78-a66da43170c1', 'us': '9.5', 'eur': '43', 'uk': '8.5', 'cm': '27.5', 'ean': '194955875339', 'pieces': 17}, {'id': '685d25de-c470-11eb-8741-a66da43170c1', 'us': '10', 'eur': '44', 'uk': '9', 'cm': '28', 'ean': '194955875346', 'pieces': 15}, {'id': '685d2732-c470-11eb-bfb5-a66da43170c1', 'us': '10.5', 'eur': '44.5', 'uk': '9.5', 'cm': '28.5', 'ean': '194955875353', 'pieces': 5}, {'id': '685d2886-c470-11eb-ac68-a66da43170c1', 'us': '11', 'eur': '45', 'uk': '10', 'cm': '29', 'ean': '194955875360', 'pieces': 1}, {'id': '685d29e4-c470-11eb-8578-a66da43170c1', 'us': '11.5', 'eur': '45.5', 'uk': '10.5', 'cm': '29.5', 'ean': '194955875377', 'pieces': 2}, {'id': '685d2b38-c470-11eb-a729-a66da43170c1', 'us': '12', 'eur': '46', 'uk': '11', 'cm': '30', 'ean': '194955875384', 'pieces': 3}]}, 'Unisex': {'name': 'Unisex', 'sizes': []}, 'Women': {'name': 'Women', 'sizes': []}, 'Kids': {'name': 'Kids', 'sizes': []}}, 'images': ['0/08/083/0837c383a3212d52f2e4455e0d876f47.jpeg', 'c/ca/ca0/ca01c2ca1dfb35013a06723b60c062cc.jpeg', '8/8e/8e9/8e9d04f6d1e8712da6d85c3db98ff989.jpeg', '3/37/376/3769e3f56186e46b91c725d09dff3252.jpeg', 'a/aa/aa5/aa5a8934a05be2badfe9cff5e07f122c.jpeg', 
'7/7b/7b0/7b088912b94bb0b2e41d527d573d568d.jpeg', 'b/b8/b8b/b8b214b6e1a33d56880e412b7ef8fe01.jpeg', '5/56/562/562809e497cc98b69cf8789e3238e482.jpeg'], 'imagesPortrait': ['a/ab/abc/abc1eac4bcdf74bd899f8e2f7827f30c.jpeg'], 'createdAt': '2021-06-03T13:34:22+00:00', 'publishAt': '2021-06-10T10:00:00+00:00', 'openRegistrationAt': '2021-06-10T10:00:00+00:00', 'closeRegistrationAt': '2021-06-18T23:00:00+00:00', 'finished': True, 'headliner': False, 'code': 'DH7913-001', 'footshopLink': 'https://www.footshop.eu/en/723-limited-edition/orderby-activated_at/orderway-desc', 'soldout': False, 'deleted': False, 'limitedShipping': True, 'delayedExport': False, 'productIdentifier': '115147', 'status': 'Closed',
'resultAt': '2021-06-19T03:00:00+00:00'}"
from os import error
import requests
from bs4 import BeautifulSoup
from discord_webhook import DiscordWebhook,DiscordEmbed
import time
import json
URL= "https://releases.footshop.com/api/raffles/vpEW0nkBHBhvh4GFDXSb"
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"}
page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
site_info= page.json()
print(site_info)
For the JSON, I would recommend first pasting it into one of the JSON viewers you can find online, so you can see the structure of the data more clearly and where the information you want is located.
Something like this should get you the information you want:
for s in site_info['sizeSets']['Men']['sizes']:
    print(s['eur']+' '+ str(s['pieces']))
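To get the exact "EU 41 = Pieces 6" format from the question, an f-string is one option. The sketch below uses a shortened stand-in for page.json() and, as an assumption beyond the answer above, loops over every size set rather than only 'Men'.

```python
# Shortened stand-in for page.json(); only the fields used below are kept.
site_info = {
    "sizeSets": {
        "Men": {"name": "Men", "sizes": [
            {"us": "8", "eur": "41", "pieces": 6},
            {"us": "8.5", "eur": "42", "pieces": 18},
        ]},
        "Women": {"name": "Women", "sizes": []},
    }
}

lines = []
for size_set in site_info["sizeSets"].values():
    # Empty size sets (such as 'Women' here) simply contribute nothing.
    for size in size_set["sizes"]:
        lines.append(f"EU {size['eur']} = Pieces {size['pieces']}")

print("\n".join(lines))
```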

Web scraping using Beautifulsoup to collect dropdown values

I am new to Python and am trying to get a list of all the dropdown values from the website "https://www.sfma.org.sg/member/category", but I am failing to do so.
The code below produces an empty list:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re
import pandas as pd
page = "https://www.sfma.org.sg/member/category"
information = requests.get(page)
soup = BeautifulSoup(information.content, 'html.parser')
categories = soup.find_all('select', attrs={'class' :'w3-select w3-border'})
The desired output is the list below:
['Alcoholic Beverage','Beer','Bottled Beverage',..........,'Trader','Wholesaler']
Thanks!
The options are loaded through JavaScript, but the data is on the page. With some crude regexes you can extract it:
import re
import json
import requests
url = 'https://www.sfma.org.sg/member/category/'
text = requests.get(url).text
d = re.findall(r'var\s*cObject\s*=\s*(.*)\s*;', text)[0]
d = re.sub(r'(\w+)(?=:)', r'"\1"', d)
d = json.loads(d.replace("'", '"'))
from pprint import pprint
pprint(d, width=200)
Prints:
{'category': [{'cat_type': '1', 'id': '1', 'name': 'Alcoholic Beverage', 'permalink': 'alcoholic-beverage', 'status': '2'},
{'cat_type': '1', 'id': '2', 'name': 'Beer', 'permalink': 'beer', 'status': '2'},
{'cat_type': '1', 'id': '3', 'name': 'Bottled Beverage', 'permalink': 'bottled-beverage', 'status': '2'},
{'cat_type': '1', 'id': '4', 'name': 'Canned Beverage', 'permalink': 'canned-beverage', 'status': '2'},
{'cat_type': '1', 'id': '5', 'name': 'Carbonated Beverage', 'permalink': 'carbonated-beverage', 'status': '2'},
{'cat_type': '1', 'id': '6', 'name': 'Cereal / Grain Beverage', 'permalink': 'cereal-grain-beverage', 'status': '2'},
{'cat_type': '1', 'id': '7', 'name': 'Cider', 'permalink': 'cider', 'status': '2'},
{'cat_type': '1', 'id': '8', 'name': 'Coffee', 'permalink': 'coffee', 'status': '2'},
{'cat_type': '1', 'id': '9', 'name': 'Distilled Water', 'permalink': 'distilled-water', 'status': '2'},
{'cat_type': '1', 'id': '10', 'name': 'Fruit / Vegetable Juice', 'permalink': 'fruit-vegetable-juice', 'status': '2'},
{'cat_type': '1', 'id': '11', 'name': 'Herbal Beverage', 'permalink': 'herbal-beverage', 'status': '2'},
{'cat_type': '1', 'id': '12', 'name': 'Instant Beverage', 'permalink': 'instant-beverage', 'status': '2'},
{'cat_type': '1', 'id': '13', 'name': 'Milk', 'permalink': 'milk', 'status': '2'},
{'cat_type': '1', 'id': '14', 'name': 'Mineral Water', 'permalink': 'mineral-water', 'status': '2'},
...and so on.
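The three regex steps can be tried without a network request. The sketch below applies them to a made-up page fragment that imitates the inline script; the real page is assumed to contain a similar `var cObject = {...};` assignment, as the answer's regex implies.

```python
import json
import re

# Made-up page fragment imitating the inline script; the real page is much longer.
text = "<script>var cObject = {category: [{id: '1', name: 'Beer'}, {id: '2', name: 'Cider'}]};</script>"

# 1. Capture everything between "var cObject =" and the last ";" on that line.
raw = re.findall(r'var\s*cObject\s*=\s*(.*)\s*;', text)[0]

# 2. Wrap bare keys (a word directly followed by ":") in double quotes.
quoted = re.sub(r'(\w+)(?=:)', r'"\1"', raw)

# 3. Swap single quotes for double quotes so the text is valid JSON, then parse it.
d = json.loads(quoted.replace("'", '"'))

names = [c["name"] for c in d["category"]]
print(names)
```

The approach is crude, as the answer says: a value containing an apostrophe, or a word directly followed by a colon, would break steps 2 and 3.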
EDIT: To print just the names of the categories, you can do this:
for c in d['category']:
    print(c['name'])
Prints:
Alcoholic Beverage
Beer
Bottled Beverage
Canned Beverage
Carbonated Beverage
Cereal / Grain Beverage
Cider
...
Manufacturer
Restaurant
Retail Outlet
Supplier
Trader
Wholesaler
This is not really a proper question but still.
categories = soup.find("select", attrs={"name": "ctype"}).find_all('option')
result = [cat.get_text() for cat in categories]

Python with Json, If Statement

I have the JSON data below and a list of selected sizes. I want a for loop with an if statement that does this:
if label in selected_size:
    fsize = id
The list selected_size contains:
[7, 7.5, 4, 4.5]
The JSON data:
(removed)
print(json_data)
for size in json_data:
    if ['label'] in select_size:
        fsize = ['id']
print(fsize)
I have no idea how to do it.
You need to index into the list first and then into each dict, for example:
json_data = [{'id': '91', 'label': '10.5', 'price': '0', 'oldPrice': '0', 'products': ['81278']}, {'id': '150', 'label': '9.5', 'price': '0', 'oldPrice': '0', 'products': ['81276']}, {'id': '28', 'label': '4', 'price': '0', 'oldPrice': '0', 'products': ['81270']}, {'id': '29', 'label': '5', 'price': '0', 'oldPrice': '0', 'products': ['81271']}, {'id': '22', 'label': '8', 'price': '0', 'oldPrice': '0', 'products': ['81274']}, {'id': '23', 'label': '9', 'price': '0', 'oldPrice': '0', 'products': ['81275']}, {'id': '24', 'label': '10', 'price': '0', 'oldPrice': '0', 'products': ['81277']}, {'id': '25', 'label': '11', 'price': '0', 'oldPrice': '0', 'products': ['81279']}, {'id': '26', 'label': '12', 'price': '0', 'oldPrice': '0', 'products': ['81280']}]
fsize = []
select_size = [7, 7.5, 4, 4.5]
select_size = [float(i) for i in select_size] #Convert all select_size values to float
for size in json_data:
    if float(size['label']) in select_size: #Convert size['label'] to float so the comparison works
        fsize.append(size['id']) #Add the matching id to the list
print(fsize) #Print the whole list; with this data I get only ['28']
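The same filter can be written more compactly with a set and a list comprehension. This is only a sketch on a shortened copy of json_data; the name wanted is made up, and it assumes every 'label' can be converted with float().

```python
# Shortened copy of json_data; only the keys used below are kept.
json_data = [
    {"id": "91", "label": "10.5"},
    {"id": "28", "label": "4"},
    {"id": "29", "label": "5"},
]
select_size = [7, 7.5, 4, 4.5]

# A set of floats gives fast membership tests and a consistent type.
wanted = {float(s) for s in select_size}

# Keep the id of every entry whose label, read as a float, is a wanted size.
fsize = [item["id"] for item in json_data if float(item["label"]) in wanted]
print(fsize)
```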
