Pandas include Key to json file - python

import requests
import pandas as pd
import json
url = 'http://www.fundamentus.com.br/resultado.php'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
fundamentus = requests.get(url, headers=headers)
dfs = pd.read_html(fundamentus.text)
table = dfs[0]
table.to_json('table7.json', orient='records', indent=2)
This is giving me the following:
[{
"Papel":"VNET3",
"Cota\u00e7\u00e3o":0.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"0,00%",
"Mrg. L\u00edq.":"0,00%",
"Liq. Corr.":0,
"ROIC":"0,00%",
"ROE":"12,99%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"9.257.250.00000",
"D\u00edv.Brut\/ Patrim.":0.0,
"Cresc. Rec.5a":"-2,71%"
},
{
"Papel":"CFLU4",
"Cota\u00e7\u00e3o":1.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"8,88%",
"Mrg. L\u00edq.":"10,72%",
"Liq. Corr.":110,
"ROIC":"17,68%",
"ROE":"32,15%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"60.351.00000",
"D\u00edv.Brut\/ Patrim.":6.0,
"Cresc. Rec.5a":"8,14%"
}
]
But I need the following.
[ VNET3 = {
"Cota\u00e7\u00e3o":0.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"0,00%",
"Mrg. L\u00edq.":"0,00%",
"Liq. Corr.":0,
"ROIC":"0,00%",
"ROE":"12,99%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"9.257.250.00000",
"D\u00edv.Brut\/ Patrim.":0.0,
"Cresc. Rec.5a":"-2,71%"
},
CFLU4 = {
"Cota\u00e7\u00e3o":1.0,
"P\/L":0.0,
"P\/VP":0.0,
"PSR":0.0,
"Div.Yield":"0,00%",
"P\/Ativo":0.0,
"P\/Cap.Giro":0,
"P\/EBIT":0.0,
"P\/Ativ Circ.Liq":0,
"EV\/EBIT":0.0,
"EV\/EBITDA":0.0,
"Mrg Ebit":"8,88%",
"Mrg. L\u00edq.":"10,72%",
"Liq. Corr.":110,
"ROIC":"17,68%",
"ROE":"32,15%",
"Liq.2meses":"000",
"Patrim. L\u00edq":"60.351.00000",
"D\u00edv.Brut\/ Patrim.":6.0,
"Cresc. Rec.5a":"8,14%"
}
]
The encoding is coming out wrong as well.
For example: "Cota\u00e7\u00e3o"
I tried: table.to_json('table7.json', force_ascii=True, orient='records', indent=2)
I also tried:
table.to_json('table7.json', encoding='utf8', orient='records', indent=2)
But no success.
So I tried to read it with json, because the idea was to read it and convert it.
This is the json reader statement.
jasonfile = open('table7.json', 'r')
stocks = jasonfile.read()
jason_object = json.loads(stocks)
print(str(jason_object['Papel']))
But I got:
print(str(jason_object['Papel']))
TypeError: list indices must be integers or slices, not str
Thanks in advance.

You have a list with many dictionaries, so you have to use an index like [0] to get one dictionary:
print( jason_object[0]['Papel'] )
And the text Cota\u00e7\u00e3o can be correct. It is how JSON stores non-ASCII characters.
But if you print it
print('Cota\u00e7\u00e3o')
then you should get
Cotação
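As a quick sanity check (a minimal standard-library sketch), the escaped form round-trips to the native characters:
import json
# json.loads turns the \uXXXX escapes back into native characters
print(json.loads(r'"Cota\u00e7\u00e3o"'))           # Cotação
# json.dumps escapes them again unless ensure_ascii=False is passed
print(json.dumps('Cotação', ensure_ascii=False))    # "Cotação"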
When I run
for key in jason_object[0].keys():
    print(key)
then I get on screen
Papel
Cotação
P/L
P/VP
PSR
Div.Yield
P/Ativo
P/Cap.Giro
P/EBIT
P/Ativ Circ.Liq
EV/EBIT
EV/EBITDA
Mrg Ebit
Mrg. Líq.
Liq. Corr.
ROIC
ROE
Liq.2meses
Patrim. Líq
Dív.Brut/ Patrim.
Cresc. Rec.5a
But if I open table7.json in a text editor then I see Cota\u00e7\u00e3o.
The list [ VNET3 = { .. } ] is not a correct JSON or Python structure.
The correct JSON and Python structure is a dictionary: { "VNET3": { .. } }
new_data = dict()
for item in jason_object:
    key = item['Papel']
    item.pop('Papel')
    val = item
    new_data[key] = val
print(new_data)
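If you then want the file itself to show readable characters, a small follow-up sketch (the output file name is just an example) is to dump new_data with json instead of pandas, since json.dump accepts ensure_ascii=False:
import json
with open('table7_by_ticker.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False writes Cotação instead of Cota\u00e7\u00e3o
    json.dump(new_data, f, ensure_ascii=False, indent=2)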
Minimal working code
import requests
import pandas as pd
import json
url = 'http://www.fundamentus.com.br/resultado.php'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
response = requests.get(url, headers=headers)
dfs = pd.read_html(response.text)
table = dfs[0]
table.to_json('table7.json', orient='records', indent=2)
jasonfile = open('table7.json', 'r')
jason_object = json.loads(jasonfile.read())
#print(jason_object[0]['Papel'])
#for key in jason_object[0].keys():
#    print(key)
new_data = dict()
for item in jason_object:
    key = item['Papel']
    item.pop('Papel')
    val = item
    new_data[key] = val
print(new_data)
Tested on Python 3.7, Linux Mint, which uses UTF-8 by default in the console/terminal.
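As an alternative sketch (assuming each Papel value is unique, as it is for tickers), pandas can build the ticker-keyed structure in one step: make Papel the index and export with orient='index'; force_ascii=False also keeps the native characters in the file:
table.set_index('Papel').to_json('table7.json', orient='index', indent=2, force_ascii=False)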

Related

Unable to query graphql with a sha256 hash to scrape property links from a webpage

After visiting this website, when I fill out the input box with Sydney CBD, NSW and hit the search button, I can see the required results displayed on that site.
I wish to scrape the property links using the requests module. With the following attempt, I can get the property links from the first page.
The problem here is that I hardcoded the value of sha256Hash within params, which is not what I want to do. I don't know if the ID retrieved by issuing a GET request to the suggestion URL needs to be converted to a sha256Hash.
However, when I do that using the function get_hashed_string(), the value it produces is different from the hardcoded one that is available within params. As a result, the script spits out a KeyError on this line: container = res.json().
import requests
import hashlib
from pprint import pprint
from bs4 import BeautifulSoup
url = 'https://suggest.realestate.com.au/consumer-suggest/suggestions'
link = 'https://lexa.realestate.com.au/graphql'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
payload = {
    'max': '7',
    'type': 'suburb,region,precinct,state,postcode',
    'src': 'homepage-web',
    'query': 'Sydney CBD, NSW'
}
params = {"operationName":"searchByQuery","variables":{"query":"{\"channel\":\"buy\",\"page\":1,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true,\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false,\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false,\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}","testListings":False,"nullifyOptionals":False},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"}}}
def get_hashed_string(keyword):
    hashed_str = hashlib.sha256(keyword.encode('utf-8')).hexdigest()
    return hashed_str
with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url, params=payload)
    hashed_id = r.json()['_embedded']['suggestions'][0]['id']
    # params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id)
    res = s.post(link, json=params)
    container = res.json()['data']['buySearch']['results']['exact']['items']
    for item in container:
        print(item['listing']['_links']['canonical']['href'])
If I run the script as is, it works beautifully. When I uncomment the line params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id) and run the script again, the script breaks.
How can I generate the value of sha256Hash and use the same within the script above?
This is not how GraphQL works. The sha value stays the same across all requests; what you're missing is a valid GraphQL query.
You have to reconstruct that first and then just use the API pagination - that's the key.
Here's how:
import json
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "application/graphql+json, application/json",
    "Content-Type": "application/json",
    "Host": "lexa.realestate.com.au",
    "Referer": "https://www.realestate.com.au/",
}
endpoint = "https://lexa.realestate.com.au/graphql"
graph_query = "{\"channel\":\"buy\",\"page\":page_number,\"pageSize\":25,\"filters\":{\"surroundingSuburbs\":true," \
"\"excludeNoSalePrice\":false,\"ex-under-contract\":false,\"ex-deposit-taken\":false," \
"\"excludeAuctions\":false,\"excludePrivateSales\":false,\"furnished\":false,\"petsAllowed\":false," \
"\"hasScheduledAuction\":false},\"localities\":[{\"searchLocation\":\"sydney cbd, nsw\"}]}"
graph_json = {
    "operationName": "searchByQuery",
    "variables": {
        "query": "",
        "testListings": False,
        "nullifyOptionals": False
    },
    "extensions": {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"
        }
    }
}
if __name__ == '__main__':
    with requests.Session() as s:
        for page in range(1, 3):
            graph_json['variables']['query'] = graph_query.replace('page_number', str(page))
            r = s.post(endpoint, headers=headers, data=json.dumps(graph_json))
            listing = r.json()['data']['buySearch']['results']['exact']['items']
            for item in listing:
                print(item['listing']['_links']['canonical']['href'])
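A side note on a design choice here: requests can serialize the payload itself and set the Content-Type header for you, so the explicit json.dumps is equivalent, just more verbose:
r = s.post(endpoint, headers=headers, json=graph_json)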
This should give you:
https://www.realestate.com.au/property-apartment-nsw-sydney-140558991
https://www.realestate.com.au/property-apartment-nsw-sydney-141380404
https://www.realestate.com.au/property-apartment-nsw-sydney-140310979
https://www.realestate.com.au/property-apartment-nsw-sydney-141259592
https://www.realestate.com.au/property-apartment-nsw-barangaroo-140555291
https://www.realestate.com.au/property-apartment-nsw-sydney-140554403
https://www.realestate.com.au/property-apartment-nsw-millers+point-141245584
https://www.realestate.com.au/property-apartment-nsw-haymarket-139205259
https://www.realestate.com.au/project/hyde-metropolitan-by-deicorp-sydney-600036803
https://www.realestate.com.au/property-apartment-nsw-haymarket-140807411
https://www.realestate.com.au/property-apartment-nsw-sydney-141370756
https://www.realestate.com.au/property-apartment-nsw-sydney-141370364
https://www.realestate.com.au/property-apartment-nsw-haymarket-140425111
https://www.realestate.com.au/project/greenland-centre-sydney-600028910
https://www.realestate.com.au/property-apartment-nsw-sydney-141364136
https://www.realestate.com.au/property-apartment-nsw-sydney-139367203
https://www.realestate.com.au/property-apartment-nsw-sydney-141156696
https://www.realestate.com.au/property-apartment-nsw-sydney-141362880
https://www.realestate.com.au/property-studio-nsw-sydney-141311384
https://www.realestate.com.au/property-apartment-nsw-haymarket-141354876
https://www.realestate.com.au/property-apartment-nsw-the+rocks-140413283
https://www.realestate.com.au/property-apartment-nsw-sydney-141350552
https://www.realestate.com.au/property-apartment-nsw-sydney-140657935
https://www.realestate.com.au/property-apartment-nsw-barangaroo-139149039
https://www.realestate.com.au/property-apartment-nsw-haymarket-141034784
https://www.realestate.com.au/property-apartment-nsw-sydney-141230640
https://www.realestate.com.au/property-apartment-nsw-barangaroo-141340768
https://www.realestate.com.au/property-apartment-nsw-haymarket-141337684
https://www.realestate.com.au/property-unitblock-nsw-millers+point-141337528
https://www.realestate.com.au/property-apartment-nsw-sydney-141028828
https://www.realestate.com.au/property-apartment-nsw-sydney-141223160
https://www.realestate.com.au/property-apartment-nsw-sydney-140643067
https://www.realestate.com.au/property-apartment-nsw-sydney-140768179
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406051
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406047
https://www.realestate.com.au/property-apartment-nsw-sydney-139652067
https://www.realestate.com.au/property-apartment-nsw-sydney-140032667
https://www.realestate.com.au/property-apartment-nsw-sydney-127711002
https://www.realestate.com.au/property-apartment-nsw-sydney-140903924
https://www.realestate.com.au/property-apartment-nsw-walsh+bay-139130519
https://www.realestate.com.au/property-apartment-nsw-sydney-140285823
https://www.realestate.com.au/property-apartment-nsw-sydney-140761223
https://www.realestate.com.au/project/111-castlereagh-sydney-600031082
https://www.realestate.com.au/property-apartment-nsw-sydney-140633099
https://www.realestate.com.au/property-apartment-nsw-haymarket-141102892
https://www.realestate.com.au/property-apartment-nsw-sydney-139522379
https://www.realestate.com.au/property-apartment-nsw-sydney-139521259
https://www.realestate.com.au/property-apartment-nsw-sydney-139521219
https://www.realestate.com.au/property-apartment-nsw-haymarket-140007279
https://www.realestate.com.au/property-apartment-nsw-haymarket-139156515

How to change Json data output in table format

import requests
from pprint import pprint
import pandas as pd
baseurl = "https://www.nseindia.com/"
url = f'https://www.nseindia.com/api/live-analysis-oi-spurts-underlyings'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                         'like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
session = requests.Session()
request = session.get(baseurl, headers=headers, timeout=30)
cookies = dict(request.cookies)
res = session.get(url, headers=headers, timeout=30, cookies=cookies)
print(res.json())
I tried df = pd.DataFrame(res.json()) but couldn't get the data in table format. How do I do that? Also, how can I select only a few particular columns in the output instead of all of them?
Try this:
import json
import codecs
df = pd.DataFrame(json.loads(codecs.decode(bytes(res.text, 'utf-8'), 'utf-8-sig'))['data'])
And to select specific columns, you can use:
mini_df = df[['symbol', 'latestOI', 'prevOI', 'changeInOI', 'avgInOI']]
>>> print(mini_df)
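For what it's worth, the codecs round-trip above is there to strip a UTF-8 byte order mark (BOM) that this endpoint appears to prepend, which is what trips up a plain res.json(). A slightly shorter sketch with the same effect:
import json
import pandas as pd
# decode('utf-8-sig') drops the BOM if present, then parse as usual
df = pd.DataFrame(json.loads(res.content.decode('utf-8-sig'))['data'])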

Yandex Spellchecker API Returns Empty Array

I am trying to harness a Russian language spellcheck API, Yandex.Speller.
The request seems to work fine in my browser. However, when I use a Python script, the response is empty.
I am stumped as to what I am doing wrong.
Here is my code:
import urllib
from urllib.request import urlopen
import json
def main():
    api(text_preproc())
def text_preproc():
    """ Takes misspelled word/phrase,
    “t”, and prepares it for
    API request
    """
    t = "синхрафазатрон в дубне"
    text = t.replace(" ", "+")
    return text
def diff_api(text):
    my_url = "https://speller.yandex.net/services/spellservice.json/checkText?text="
    my_headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
    my_data = {
        "text" : text,
        "lang" : "ru",
        "format" : "plain"}
    my_uedata = urllib.parse.urlencode(my_data)
    my_edata = my_uedata.encode('ascii')
    req = urllib.request.Request(url=my_url, data=my_edata, headers=my_headers)
    response = urlopen(req)
    data = json.load(response)
    print(data)
The response is always an empty array, no matter how I tinker with my request.
Any insight into what I might be doing wrong?
my_uedata has to be a part of the URL you send the request to.
Also, in:
def main():
    api(text_preproc())
you call api(), but that function is not defined. I've used diff_api().
Try this:
import json
import urllib
from urllib.request import urlopen
def main():
    diff_api(text_preproc("синхрафазатрон в дубне"))
def text_preproc(phrase):
    """ Takes misspelled word/phrase,
    “t”, and prepares it for
    API request
    """
    return phrase.replace(" ", "+")
def diff_api(text):
    my_url = "https://speller.yandex.net/services/spellservice.json/checkText?text="
    my_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
    my_data = {
        "text": text,
        "lang": "ru",
        "format": "plain"}
    my_uedata = urllib.parse.urlencode(my_data)
    req = urllib.request.Request(url=my_url+my_uedata, headers=my_headers)
    data = json.load(urlopen(req))
    print(data)
main()
Output:
[{'code': 1, 'pos': 5, 'row': 0, 'col': 5, 'len': 14, 'word': 'синхрафазатрон', 's': ['синхрофазотрон', 'синхрофазатрон', 'синхрофазотрона']}]
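One quirk worth flagging (an observation, not a required change): my_url already ends in ?text=, and my_uedata itself starts with text=..., so the final URL carries text=text=... The service still finds the misspelling, which is likely why pos is 5 rather than 0 - the stray text= prefix is five characters long. A cleaner variant keeps only the ? in the base URL:
my_url = "https://speller.yandex.net/services/spellservice.json/checkText?"
req = urllib.request.Request(url=my_url + my_uedata, headers=my_headers)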

CSV file and loop list of URL in Python

I've been trying to loop over a CSV file, a list of URLs, with this code, to scrape and store data in Excel. With one URL I could do it, but I can't seem to find a way to do that with a list of URLs (stock market tickers). This is my code:
import requests
import json
import csv
import pandas as pd
Urls = open('AcoesURLJsonCompleta.csv')
for row in Urls:
    obj_id = row.strip().split(',')
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    jsonData = requests.get(row, headers=headers).json()
    data = {
        'Ticker': [],
        'Beta': [],
        'DY': [],
        'VOL': [],
        'P/L': [],
        'Cresc5A': [],
        'LPA': [],
        'VPA': [],
        'Ultimo': []
    }
    ticker = jsonData['ric']
    beta = jsonData['beta']
    DY = jsonData['current_dividend_yield_ttm']
    VOL = jsonData['share_volume_3m']
    PL = jsonData['pe_normalized_annual']
    cresc5a = jsonData['eps_growth_5y']
    LPA = jsonData['eps_normalized_annual']
    VPA = jsonData['book_value_share_quarterly']
    Ultimo = jsonData['last']
    data['Ticker'].append(ticker)
    data['Beta'].append(beta)
    data['DY'].append(DY)
    data['VOL'].append(VOL)
    data['P/L'].append(PL)
    data['Cresc5A'].append(cresc5a)
    data['LPA'].append(LPA)
    data['VPA'].append(VPA)
    data['Ultimo'].append(Ultimo)
table = pd.DataFrame(data, columns=['Ticker', 'Beta', 'DY', 'VOL', 'P/L', 'Cresc5A', 'LPA', 'VPA', 'Ultimo'])
table.index = table.index + 1
table.to_csv('CompleteData.csv', sep=',', encoding='utf-8', index=False)
print(table)
The output is always a KeyError on those jsonData lookups, KeyError: 'beta' for example. How do I fix this?
Assuming your URLs are valid and you don't have other validation errors (like KeyError), you need to loop through all of them and build a dataframe for each. Then append the dataframe to the CSV file, with a structure such as:
for row in Urls:
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    jsonData = requests.get(row, headers=headers).json()
    data = {
        'Ticker': [],
        'Beta': [],
        'DY': [],
        'VOL': [],
        'P/L': [],
        'Cresc5A': [],
        'LPA': [],
        'VPA': [],
        'Ultimo': []
    }
    ticker = jsonData['ric']
    beta = jsonData['beta']
    DY = jsonData['current_dividend_yield_ttm']
    VOL = jsonData['share_volume_3m']
    PL = jsonData['pe_normalized_annual']
    cresc5a = jsonData['eps_growth_5y']
    LPA = jsonData['eps_normalized_annual']
    VPA = jsonData['book_value_share_quarterly']
    Ultimo = jsonData['last']
    data['Ticker'].append(ticker)
    data['Beta'].append(beta)
    data['DY'].append(DY)
    data['VOL'].append(VOL)
    data['P/L'].append(PL)
    data['Cresc5A'].append(cresc5a)
    data['LPA'].append(LPA)
    data['VPA'].append(VPA)
    data['Ultimo'].append(Ultimo)
    table = pd.DataFrame(data, columns=['Ticker', 'Beta', 'DY', 'VOL', 'P/L', 'Cresc5A', 'LPA', 'VPA', 'Ultimo'])
    with open("append_to_csv.csv", 'a') as f:
        table.to_csv(f, mode='a', header=not f.tell(), index=False)
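The header=not f.tell() trick writes the column header exactly once: f.tell() returns 0 only while the appended-to file is still empty, so the header row is emitted on the first pass and suppressed on every later append.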
Seems to me you're using beta instead of Beta. Just fix the capital letter.
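If some tickers really do lack certain fields (an assumption worth verifying), dict.get is a gentler lookup than indexing, because it returns None instead of raising KeyError:
# hypothetical defensive lookup; key names as used in the question
beta = jsonData.get('beta')
if beta is None:
    print('Missing beta; available keys:', list(jsonData.keys()))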

adding x-www-form-urlencoded to post request

[screenshot: example from Postman]
import urllib2
url = "http://www.example.com/posts"
req = urllib2.Request(url,headers={'User-Agent' : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Ubuntu/11.04 Chromium/12.0.742.112 Chrome/12.0.742.112 Safari/534.30" , "Content-Type": "application/x-www-form-urlencoded"})
con = urllib2.urlopen(req)
print con.read()
Now this code works fine, but I want to add the value you see in the Postman picture to get the response I want. I don't know how to add the key and value postid = 134686 to the Python POST request the way it is done in Postman.
Form-encoded is the normal way to send a POST request with data. Build the fields as a dict and URL-encode them (urllib2 needs the encoded string, not the dict itself); once a body is present, urllib2 issues a POST and you don't even need to specify the content type.
import urllib
data = urllib.urlencode({'postid': 134786})
headers = {'User-Agent' : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Ubuntu/11.04 Chromium/12.0.742.112 Chrome/12.0.742.112 Safari/534.30"}
req = urllib2.Request(url, headers=headers, data=data)
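For comparison, a minimal sketch with the requests library (assuming it is available) accepts the dict directly and form-encodes it for you:
import requests
response = requests.post(url, headers=headers, data={'postid': 134786})
print(response.text)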
One important thing to note: for nested JSON data you will need to convert the nested JSON object to a string.
data = {
    'key1': 'value',
    'key2': {
        'nested_key1': 'nested_value1',
        'nested_key2': 123
    }
}
The dictionary needs to be transformed into this format:
import json
import requests
inner_dictionary = {
    'nested_key1': 'nested_value1',
    'nested_key2': 123
}
data = {
    'key1': 'value',
    'key2': json.dumps(inner_dictionary)
}
r = requests.post(URL, data=data)
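Note the distinction: the pattern above form-encodes the outer fields and embeds the inner object as a JSON string, which some form-based APIs expect. If the endpoint actually consumes a JSON body, requests can send the whole nested structure as JSON directly, with no manual dumps:
r = requests.post(URL, json=data)  # with the original nested dict, not the stringified one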
