Parsing JSON with multiple "levels" with Python

I'm trying to parse a JSON file from an API call.
I found this code that fits my need and am trying to adapt it to what I want:
import math, urllib2, json, re

def download():
    graph = {}
    page = urllib2.urlopen("http://fx.priceonomics.com/v1/rates/?q=1")
    jsrates = json.loads(page.read())
    pattern = re.compile("([A-Z]{3})_([A-Z]{3})")
    for key in jsrates:
        matches = pattern.match(key)
        conversion_rate = -math.log(float(jsrates[key]))
        from_rate = matches.group(1).encode('ascii','ignore')
        to_rate = matches.group(2).encode('ascii','ignore')
        if from_rate != to_rate:
            if from_rate not in graph:
                graph[from_rate] = {}
            graph[from_rate][to_rate] = float(conversion_rate)
    return graph
And I've turned it into:
import math, urllib2, json, re

def download():
    graph = {}
    page = urllib2.urlopen("https://bittrex.com/api/v1.1/public/getmarketsummaries")
    jsrates = json.loads(page.read())
    for pattern in jsrates['result'][0]['MarketName']:
        for key in jsrates['result'][0]['Ask']:
            matches = pattern.match(key)
            conversion_rate = -math.log(float(jsrates[key]))
            from_rate = matches.group(1).encode('ascii','ignore')
            to_rate = matches.group(2).encode('ascii','ignore')
            if from_rate != to_rate:
                if from_rate not in graph:
                    graph[from_rate] = {}
                graph[from_rate][to_rate] = float(conversion_rate)
    return graph
Now the problem is that there are multiple levels in the JSON ("result" > 0, 1, 2, etc.).
(JSON screenshot)
for key in jsrates['result'][0]['Ask']:
I want the zero to be able to be any index (I don't know if that's clear), so I can match every ask price to its market name.
I have shortened the code so the post doesn't get too long.
Thanks.
PS: Sorry for my English, it's not my native language.

You could loop through all of the result entries that are returned; since result is a JSON array, you can iterate over it directly and ignore the meaningless numeric index:
for result in jsrates['result']:
    ask = result.get('Ask')
    if ask is not None:
        # Do things with your ask...
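Building on that, here is a minimal sketch of how the full conversion graph from the original code could be rebuilt from the Bittrex response, assuming each MarketName has the form 'BASE-QUOTE' (e.g. 'BTC-LTC'):
import math

def build_graph(jsrates):
    # Sketch: build {from_currency: {to_currency: -log(ask)}} from the parsed
    # getmarketsummaries response. Assumes 'result' is a list of market
    # objects and each 'MarketName' looks like 'BTC-LTC'.
    graph = {}
    for result in jsrates['result']:
        market = result.get('MarketName')
        ask = result.get('Ask')
        if not market or not ask:  # skip markets with a missing or zero ask
            continue
        from_rate, to_rate = market.split('-', 1)
        if from_rate != to_rate:
            graph.setdefault(from_rate, {})[to_rate] = -math.log(float(ask))
    return graph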

Related

My Merkle Tree calculation does not match the actual one

I am studying Bitcoin.
https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
I read the above page and implemented the Merkle root calculation in Python.
Using the API below, I collected all transactions in block 641150 and calculated the Merkle root.
https://www.blockchain.com/explorer/api/blockchain_api
The expected value is:
67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4
(https://www.blockchain.com/btc/block/641150)
The value I calculated was:
f2a2207a1e8360b75729fd2f23659b1b79b14940b6e4982a985cf6aa6f941ad7
What is wrong?
My Python code is:
from hashlib import sha256
import requests, json

base_url = 'https://blockchain.info/rawblock/'
block_hash = '000000000000000000042cef688cf40b4a70ac814e4222e6646bd6bb79d18168'
end_point = base_url + block_hash

def reverse(hex_be):
    bytes_be = bytes.fromhex(hex_be)
    bytes_le = bytes_be[::-1]
    hex_le = bytes_le.hex()
    return hex_le

def dhash(hash):
    return sha256(sha256(hash.encode('utf-8')).hexdigest().encode('utf-8')).hexdigest()

def culculate_merkle(hash_list):
    if len(hash_list) == 1:
        return dhash(hash_list[0])
    hashed_list = list(map(dhash, hash_list))
    if len(hashed_list) % 2 == 1:
        hashed_list.append(hashed_list[-1])
    parent_hash_list = []
    it = iter(hashed_list)
    for hash1, hash2 in zip(it, it):
        parent_hash_list.append(hash1 + hash2)
    hashed_list = list(map(dhash, hash_list))
    return culculate_merkle(parent_hash_list)

data = requests.get(end_point)
jsondata = json.loads(data.text)
tx_list = list(map(lambda tx_object: tx_object['hash'], jsondata['tx']))
markleroot = '67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4'
tx_list = list(map(reverse, tx_list))
output = culculate_merkle(tx_list)
output = reverse(output)
print(output)
Result:
$ python merkleTree.py
f2a2207a1e8360b75729fd2f23659b1b79b14940b6e4982a985cf6aa6f941ad7
I expect the following output instead:
67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4
Self-solved.
First: there was a mistake in the dhash calculation method (it hashes hex strings and hex digests rather than raw bytes and raw digests).
Second: the transaction hashes obtained from the API have already had the double hash applied once, so the leaves must not be run through dhash again; my first attempt missed this.
The complete source code is summarized in the following blog post (in Japanese):
https://tec.tecotec.co.jp/entry/2022/12/25/000000
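A minimal sketch of those two fixes (double SHA-256 over raw bytes, and leaf txids used as-is since the API already returns double-hashed transaction ids):
from hashlib import sha256

def dhash_bytes(b):
    # double SHA-256 over raw bytes, not over hex strings
    return sha256(sha256(b).digest()).digest()

def merkle_root(txids_be):
    # txids_be: big-endian hex txids as shown by block explorers. The leaves
    # are the txids themselves (already double-SHA-256 of the raw transactions)
    # converted to little-endian bytes; dhash is only applied when combining
    # pairs into parent nodes.
    level = [bytes.fromhex(h)[::-1] for h in txids_be]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [dhash_bytes(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0][::-1].hex()  # back to big-endian hex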

Using jsonpath in api.ai when we already have a JSON file to query?

import json

def makeWebhookResult(req):
    if req.get("result").get("action") != "Phdapp":
        return {}
    result = req.get("result")
    parameters = result.get("parameters")
    Progr = parameters.get("PhDsubjects")
    time = parameters.get("PhdTime")
    Levp = parameters.get("PhDDegLevp")
    with open('Sheet1.json') as f:
        data = f.read()
    jsondata = json.loads(data)
    match = jsonpath.jsonpath(jsondata,
        '$.features[[?(@.ProgramName == Progr && @.Level == Levp && @.StartDate == time)]].UniversityName')
    speech = "This is the universities you were looking for " + match
This is the part of my Python code that has errors I can't figure out. I have an intent whose action is "Phdapp", with three parameters whose values I need to use in my jsonpath query against "Sheet1.json", a JSON file in the same GitHub repository. But I can't get the data from my intent, nor access my JSON file for querying. Is it because api.ai is not compatible with jsonpath, or is it a problem in my code? If there is a better or easier way to do this, I would be glad to know it. Thanks
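One thing to note: in the path string above, Progr, Levp and time are written inside the string literal, so their values are never actually substituted into the query. For comparison, here is a minimal plain-Python sketch of the same filter (the field names are taken from the jsonpath expression above, so treat them as assumptions about Sheet1.json):
import json

def find_universities(progr, levp, time, path='Sheet1.json'):
    # Plain-Python equivalent of the intended jsonpath filter: return every
    # UniversityName whose feature matches the three parameter values.
    with open(path) as f:
        data = json.load(f)
    return [feature.get('UniversityName')
            for feature in data.get('features', [])
            if feature.get('ProgramName') == progr
            and feature.get('Level') == levp
            and feature.get('StartDate') == time]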

How to parse a single-column text file into a table using python?

I'm new here to StackOverflow, but I have found a LOT of answers on this site. I'm also a programming newbie, so I figured I'd join and finally become part of this community, starting with a question about a problem that's been plaguing me for hours.
I log in to a website and scrape a big body of text within the <b> tag, to be converted into a proper table. The layout of the resulting Output.txt looks like this:
BIN STATUS
8FHA9D8H 82HG9F RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
INVENTORY CODE: FPBC *SOUP CANS LENTILS
BIN STATUS
HA8DHW2H HD0138 RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8SHDNADU 00A123 #2956- INVALID STOCK COUPON CODE (MISSING).
93827548 096DBR RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
There are a bunch of pages with the exact same blocks, but i need them to be combined into an ACTUAL table that looks like this:
BIN INV CODE STATUS
HA8DHW2HHD0138 FPBC-*SOUP CANS LENTILS RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8SHDNADU00A123 FPBC-*SOUP CANS LENTILS #2956- INVALID STOCK COUPON CODE (MISSING).
93827548096DBR FPBC-*SOUP CANS LENTILS RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8FHA9D8H82HG9F SSXR-98-20LM NM CORN CREAM RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
Essentially, all the separate text blocks in this example would become part of this table, with the inventory code repeated alongside its BIN values. I would post my attempts at parsing this data (I have tried pandas / BeautifulSoup / openpyxl / csv writer), but I'll admit they are a little embarrassing, as I cannot find any information on this specific problem. Is there any benevolent soul out there who can help me out? :)
(Also, I am using Python 2.7.)
A simple custom parser like the following should do the trick.
from __future__ import print_function

def parse_body(s):
    line_sep = '\n'
    getting_bins = False
    inv_code = ''
    for l in s.split(line_sep):
        if l.startswith('INVENTORY CODE:') and not getting_bins:
            inv_data = l.split()
            inv_code = inv_data[2] + '-' + ' '.join(inv_data[3:])
        elif l.startswith('INVENTORY CODE:') and getting_bins:
            print("unexpected inventory code while reading bins:", l)
        elif l.startswith('BIN') and l.endswith('MESSAGE'):
            getting_bins = True
        elif getting_bins == True and l:
            bin_data = l.split()
            # need to add exception handling here to make sure:
            # 1) we have an inv_code
            # 2) bin_data is at least 3 items big (assuming two for
            #    bin_id and at least one for message)
            # 3) maybe some constraint checking to ensure that we have
            #    a valid instance of an inventory code and bin id
            bin_id = ''.join(bin_data[0:2])
            message = ' '.join(bin_data[2:])
            # we now have a bin, an inv_code, and a message to add to our table
            print(bin_id.ljust(20), inv_code.ljust(30), message, sep='\t')
        elif getting_bins == True and not l:
            # done getting bins for current inventory code
            getting_bins = False
            inv_code = ''
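A quick usage sketch with a made-up block (note that this parser keys on a header line ending in MESSAGE; with the BIN STATUS headers shown in the question, the check would need to be l.endswith('STATUS') instead):
sample = "\n".join([
    "INVENTORY CODE: FPBC *SOUP CANS LENTILS",
    "BIN                MESSAGE",
    "HA8DHW2H HD0138    RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS",
    "8SHDNADU 00A123    #2956- INVALID STOCK COUPON CODE (MISSING).",
    "",  # a blank line marks the end of the block
])
parse_body(sample)
# prints one tab-separated row per BIN line, e.g.
# HA8DHW2HHD0138  FPBC-*SOUP CANS LENTILS  RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS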
A rather complex one, but this might get you started:
import re, pandas as pd
from pandas import DataFrame

rx = re.compile(r'''
    (?:INVENTORY\ CODE:)\s*
    (?P<inv>.+\S)
    [\s\S]+?
    ^BIN.+[\n\r]
    (?P<bin_msg>(?:(?!^\ ).+[\n\r])+)
    ''', re.MULTILINE | re.VERBOSE)

string = your_string_here

# set up the dataframe
df = DataFrame(columns = ['BIN', 'INV', 'MESSAGE'])

for match in rx.finditer(string):
    inv = match.group('inv')
    bin_msg_raw = match.group('bin_msg').split("\n")
    rxbinmsg = re.compile(r'^(?P<bin>(?:(?!\ {2}).)+)\s+(?P<message>.+\S)\s*$', re.MULTILINE)
    for item in bin_msg_raw:
        for m in rxbinmsg.finditer(item):
            # append it to the dataframe
            df.loc[len(df.index)] = [m.group('bin'), inv, m.group('message')]
print(df)
Explanation
It looks for INVENTORY CODE and sets up the groups (inv and bin_msg) for further processing (note: it would be easier if you had only one bin/msg line, as the group has to be split afterwards).
It then splits the bin and msg parts and appends each row to the df object.
I had code written for website scraping which may help you.
Basically, what you need to do is right-click on the web page, view the HTML, and find the tag for the table you are looking for, then extract the information using a module (I am using BeautifulSoup). I am creating JSON because I need to store it in MongoDB; you can create a table instead.
#! /usr/bin/python
import sys
import requests
import re
from BeautifulSoup import BeautifulSoup
import pymongo

def req_and_parsing():
    url2 = 'http://businfo.dimts.in/businfo/Bus_info/EtaByRoute.aspx?ID='
    list1 = ['534UP','534DOWN']
    for Route in list1:
        final_url = url2 + Route
        #r = requests.get(final_url)
        #parsing_file(r.text,Route)
    outdict = []
    outdict = [parsing_file( requests.get(url2+Route).text,Route) for Route in list1 ]
    print outdict
    conn = f_connection()
    for i in range(len(outdict)):
        insert_records(conn,outdict[i])

def parsing_file(txt,Route):
    soup = BeautifulSoup(txt)
    table = soup.findAll("table",{"id" : "ctl00_ContentPlaceHolder1_GridView2"})
    #trtags = table[0].findAll('tr')
    tdlist = []
    trtddict = {}
    """
    for trtag in trtags:
        print 'print trtag- ' , trtag.text
        tdtags = trtag.findAll('td')
        for tdtag in tdtags:
            print tdtag.text
    """
    divtags = soup.findAll("span",{"id":"ctl00_ContentPlaceHolder1_ErrorLabel"})
    for divtag in divtags:
        print "div tag - " , divtag.text
        # test both error messages explicitly (a bare '== ... or "..."' is always truthy)
        if divtag.text in ("Currently no bus is running on this route",
                           "This is not a cluster (orange bus) route"):
            print "Page not displayed. Errored with below message for Route-", Route, " , ", divtag.text
            sys.exit()
    trtags = table[0].findAll('tr')
    for trtag in trtags:
        tdtags = trtag.findAll('td')
        if len(tdtags) == 2:
            trtddict[tdtags[0].text] = sub_colon(tdtags[1].text)
    return trtddict

def sub_colon(tag_str):
    return re.sub(';',',',tag_str)

def f_connection():
    try:
        conn=pymongo.MongoClient()
        print "Connected successfully!!!"
    except pymongo.errors.ConnectionFailure, e:
        print "Could not connect to MongoDB: %s" % e
    return conn

def insert_records(conn,stop_dict):
    db = conn.test
    print db.collection_names()
    mycoll = db.stopsETA
    mycoll.insert(stop_dict)

if __name__ == "__main__":
    req_and_parsing()

Python - Getting Attributes From A File of Constants

I have a file of constant variables that I need to query and I am not sure how to go about it.
I have a database query which is returning user names and I need to find the matching user name in the file of constant variables.
The file looks like this:
SALES_MANAGER_01 = {"user_name": "BO01", "password": "password", "attend_password": "BO001",
                    "csm_password": "SM001", "employee_num": "BOSM001"}
There is just a bunch of users just like the one above.
My function looks like this:
@attr("user_test")
def test_get_user_for_login(self):
    application_code = 'BO'
    user_from_view = self.select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[-4:]
    password = ""
    global_users = dir(gum)
    for item in global_users:
        if user_wo_ent not in item.__getattr__("user_name"):
            user_with_ent = choice(users)
            user_wo_ent = user_with_ent[-4:]
        else:
            password = item.__getattr__("password")
    print(user_wo_ent, password)
In global_users = dir(gum), gum is my file of constants. So I know I am doing something wrong, since I am getting AttributeError: 'str' object has no attribute '__getattr__'; I am just not sure how to resolve it.
You should reverse your looping, as you want to compare each item to your match condition. Also, you have a dictionary, so use it to do some heavy lifting.
You need to add some imports:
import re
from ast import literal_eval
I've changed the dir(gum) bit to be this function.
def get_global_users(filename):
    gusers = {}  # create a global users dict
    p_key = re.compile(ur'\b\w*\b')  # regex to get the first part, e.g. SALES_MANAGER_01
    p_value = re.compile(ur'\{.*\}')  # regex to grab everything in {}
    with (open(filename)) as f:  # open the file and work through it
        for line in f:  # for each line
            gum_key = p_key.match(line)  # pull out the key
            gum_value = p_value.search(line)  # pull out the value
            ''' Here is the real action: update the dictionary
            with the match of gum_key and the match of gum_value '''
            gusers[gum_key.group()] = literal_eval(gum_value.group())
    return(gusers)  # return the dictionary
The bottom of your existing code is replaced with this.
global_users = get_global_users(gum)  # assign return to global_users
for key, value in global_users.iteritems():  # walk through all key, value pairs
    if value['user_name'] != user_wo_ent:
        user_with_ent = choice(users)
        user_wo_ent = user_with_ent[-4:]
    else:
        password = value['password']
So a very simple answer was to get the dir() of the constants module and then iterate over it, like so:
global_users = dir(gum)

for item in global_users:
    o = gum.__dict__[item]
    if type(o) is not dict:
        continue
    if gum.__dict__[item].get("user_name") == user_wo_ent:
        print(user_wo_ent, o.get("password"))
    else:
        print("User was not in global_user_mappings")
I was able to find the answer by doing the following:
def get_user_for_login(application_code='BO'):
    user_from_view = BaseServiceTest().select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[4:]
    global_users = dir(gum)
    user_dict = {'user_name': '', 'password': ''}
    for item in global_users:
        o = gum.__dict__[item]
        if type(o) is not dict:
            continue
        if user_wo_ent == o.get("user_name"):
            user_dict['user_name'] = user_wo_ent
            user_dict['password'] = o.get("password")
    return user_dict
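A slightly tidier sketch of the same lookup using vars() instead of dir() plus __dict__ (gum and the field names are taken from the question, so treat this as an illustration):
def find_user(gum_module, user_wo_ent):
    # Scan every dict-valued constant in the module and return the first
    # entry whose user_name matches; None if there is no match.
    for name, value in vars(gum_module).items():
        if isinstance(value, dict) and value.get("user_name") == user_wo_ent:
            return {'user_name': user_wo_ent, 'password': value.get("password")}
    return None
Called as find_user(gum, user_wo_ent), it returns the same shape of dictionary as user_dict above.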

Why are the videos on the most_recent standard feed so out of date?

I'm trying to grab the most recently uploaded videos. There's a standard feed for that - it's called most_recent. I don't have any problems grabbing the feed, but when I look at the entries inside, they're all half a year old, which is hardly recent.
Here's the code I'm using:
import requests
import os.path as P
import sys
from lxml import etree
import datetime

namespaces = {"a": "http://www.w3.org/2005/Atom", "yt": "http://gdata.youtube.com/schemas/2007"}
fmt = "%Y-%m-%dT%H:%M:%S.000Z"

class VideoEntry:
    """Data holder for the video."""
    def __init__(self, node):
        self.entry_id = node.find("./a:id", namespaces=namespaces).text
        published = node.find("./a:published", namespaces=namespaces).text
        self.published = datetime.datetime.strptime(published, fmt)
    def __str__(self):
        return "VideoEntry[id='%s']" % self.entry_id

def paginate(xml):
    root = etree.fromstring(xml)
    next_page = root.find("./a:link[@rel='next']", namespaces=namespaces)
    if next_page == None:
        next_link = None
    else:
        next_link = next_page.get("href")
    entries = [VideoEntry(e) for e in root.xpath("/a:feed/a:entry", namespaces=namespaces)]
    return entries, next_link

prefix = "https://gdata.youtube.com/feeds/api/standardfeeds/"
standard_feeds = set("top_rated top_favorites most_shared most_popular most_recent most_discussed most_responded recently_featured on_the_web most_viewed".split(" "))
feed_name = sys.argv[1]
assert feed_name in standard_feeds
feed_url = prefix + feed_name
all_video_ids = []

while feed_url is not None:
    r = requests.get(feed_url)
    if r.status_code != 200:
        break
    text = r.text.encode("utf-8")
    video_ids, feed_url = paginate(text)
    all_video_ids += video_ids

all_upload_times = [e.published for e in all_video_ids]
print min(all_upload_times), max(all_upload_times)
As you can see, it prints the min and max timestamps for the entire feed.
misha@misha-antec$ python get_standard_feed.py most_recent
2013-02-02 14:40:02 2013-02-02 14:54:00
misha@misha-antec$ python get_standard_feed.py top_rated
2006-04-06 21:30:53 2013-07-28 22:22:38
I've glanced through the downloaded XML and it appears to match the output. Am I doing something wrong?
Also, on an unrelated note, the feeds I'm getting are all about 100 entries (I'm paginating through them 25 at a time). Is this normal? I expected the feeds to be a bit bigger.
Regarding the most_recent feed topic: there is a ticket for this one here. Unfortunately, the YouTube API team hasn't responded to or solved the problem so far.
Regarding the number of entries: that depends on the type of standard feed, but for the most_recent feed it's usually around 100.
Note: you could try using the orderby=published parameter to get recent videos, although I don't know how "recent" they are.
https://gdata.youtube.com/feeds/api/videos?orderby=published&prettyprint=True
You can combine this query with the category parameter or others (region-specific queries, like for the standard feeds, are not possible, afaik).
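A minimal sketch of that workaround (the gdata feeds used here have since been retired, so treat the URL and parameters as historical; the category value is just an illustration):
import requests

params = {
    "orderby": "published",  # sort by upload date instead of relying on the stale standard feed
    "prettyprint": "True",
    # "category": "News",    # optional: restrict the query to a category
}
r = requests.get("https://gdata.youtube.com/feeds/api/videos", params=params)
print(r.status_code)  # the raw Atom XML would be in r.text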
