I have a large dictionary that contains weather data. The data covers multiple days, and I want to get all of the values for one key across every day. How would I do this?
Here is a simplified version of the dictionary:
{
    'data': {
        'day1': {'weather_discription': 'cloudy'},
        'day2': {'weather_discription': 'clear'}
    }
}
I tried to use this code:
import requests
r = requests.get('data website')
res = r.json()
print(res['weather_discription'])
You need a loop to get them all. Note that the key in your data is spelled 'weather_discription', so the lookup has to match that spelling exactly:

for day, data in res['data'].items():
    print(f"Weather on {day} was {data['weather_discription']}")
I'm collecting some market data from Binance's API. My goal is to collect the list of all markets and use the 'status' key included in each row to detect if the market is active or not. If it's not active, I must search the last trade to collect the date of the market's shutdown.
I wrote this code
import requests
import pandas as pd
import json
import csv
url = 'https://api.binance.com/api/v3/exchangeInfo'
trade_url = 'https://api.binance.com/api/v3/trades?symbol='
response = requests.get(url)
data = response.json()
df = data['symbols'] #list of dict
json_data=[]
with open(r'C:\Users\Utilisateur\Desktop\json.csv', 'a', encoding='utf-8', newline='') as j:
    wr = csv.writer(j)
    wr.writerow(["symbol", "last_trade"])
    for i in data['symbols']:
        if data[i]['status'] != "TRADING":
            trades_req = requests.get(trade_url + i)
            print(trades_req)
but I got this error
TypeError: unhashable type: 'dict'
How can I avoid it?
That's because i is a dictionary. Since data['symbols'] is a list of dictionaries, when you do in the loop:

for i in data['symbols']:
    if data[i]['status'] ...

you are trying to hash i to use it as a key of data. I think you actually want the status of each dictionary in the list. That is:

for i in data['symbols']:
    if i['status'] ...

In such a case, it would also be better to use a more descriptive variable name, e.g., d, s, or symbol instead of i.
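
Folding that fix back into your original script, it would look something like the sketch below. The "symbol" and "status" fields are part of Binance's exchangeInfo response; the rest keeps your structure as-is:

import csv
import requests

url = 'https://api.binance.com/api/v3/exchangeInfo'
trade_url = 'https://api.binance.com/api/v3/trades?symbol='

data = requests.get(url).json()

with open(r'C:\Users\Utilisateur\Desktop\json.csv', 'a', encoding='utf-8', newline='') as j:
    wr = csv.writer(j)
    wr.writerow(["symbol", "last_trade"])
    for symbol in data['symbols']:  # each entry is a dict describing one market
        if symbol['status'] != "TRADING":
            # look up the recent trades for this market by its name
            trades_req = requests.get(trade_url + symbol['symbol'])
            print(trades_req.json())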
I'm trying to use an API that converts dates. I've retrieved data from a file that contained full dates, then used split and slice to get the day, month, and year separately. I need to send each date and return the conversion to the user.
What I currently have is:
import json
import urllib.request

def convert(day, month, year):
    gr_to_hb_url = 'https://www.hebcal.com/converter?cfg=json&gy=' + year + '&gm=' + month + '&gd=' + day + '&g2h=1'
    with urllib.request.urlopen(gr_to_hb_url) as response:
        data = response.read()
    obj = json.loads(data)
    results = [(result['hd'], result['hm'], result['hy']) for result in obj]
    return results

hby, hbm, hbd = convert(prep_day, prep_month, prep_year)
print(hby, hbm, hbd)
prep_day / prep_month / prep_year are the day, month, and year I retrieved from each date separately, as mentioned above.
The error I get is: TypeError: string indices must be integers.
Appreciate any help. Thanks!
Having a look at the output of the request:
{"gy":2020,"gm":1,"gd":1,"afterSunset":false,"hy":5780,"hm":"Tevet","hd":4,"hebrew":"ד׳ בְּטֵבֵת תש״פ","events":["Parashat Vayigash"]}
I think you might want the following instead:
def convert(day, month, year):
    gr_to_hb_url = 'https://www.hebcal.com/converter?cfg=json&gy=' + year + '&gm=' + month + '&gd=' + day + '&g2h=1'
    with urllib.request.urlopen(gr_to_hb_url) as response:
        data = response.read()
    obj = json.loads(data)
    results = (obj['hd'], obj['hm'], obj['hy'])
    return results
The reason for the error you were seeing is that when you iterate over a dictionary you get its keys, not its values. In this case that would be the strings "gy", "gm", "gd", "afterSunset", and so on. So the first result your comprehension saw was something like "gy", and running "gy"['hd'], which indexes a string with another string, produces exactly the error you were seeing.
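
One more thing to watch, separate from the TypeError: the function returns the tuple in (hd, hm, hy) order, i.e. day, month, year, but your call unpacks it as hby, hbm, hbd, which swaps the day and year. A matching call would look like this (prep_* are the variables from the question):

hbd, hbm, hby = convert(prep_day, prep_month, prep_year)
print(hbd, hbm, hby)  # e.g. 4 Tevet 5780 for 2020-01-01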
I'll just start from scratch since I feel like I'm lost with all the different possibilities. What I will be talking about is a leaderboard, but it could apply to price tracking as well.
My goal is to scrape data from a website (the all-time leaderboard / hidden), put it in a .csv file, and update it daily at noon.
What I have succeeded in so far: scraping the data.
I tried scraping with BS4, but since the data is hidden, I couldn't be specific enough to only get the all-time points. I'd call it a partial success because I'm able to get a table with all the data I need and the date as a header. My problems with this solution are 1) useless data populating the csv and 2) the table is vertical, not horizontal.
I also scraped the data with a CSS selector, but I abandoned this idea because sometimes the page wouldn't load and the data wasn't scraped. Then I found out there's a JSON file containing the data right away.
JSON scraping seems to be the best option, but I'm having trouble creating a .csv file that's suitable for making a graph.
This brings me to what I'm struggling with: storing the data in a table where the grey area is the points and DATE1 is the moment the data was scraped:
I'd like not to manipulate the data in the csv file too much. If the table looked like what I picture above, it would be easier to make a graph afterwards, but I'm having trouble. The best I've managed is a table that looks like this AND that's vertical, not horizontal:
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
name,points,date
Dennis,52570,10-23-2020
Dinh,40930,10-23-2020
Thank you for your help.
Here's the code
import pandas as pd
import time
timestr = time.strftime("%Y-%m-%d %H:%M")
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
table.to_csv('products.csv', index=True, encoding='utf-8')
If what I want is not possible, I might just scrape individually for each member, make one CSV file per member and make a graph that refers to those different files.
So, I've played around with your question a bit and here's what I came up with.
Basically, your best bet for data storage is a lightweight database, as suggested in the comments. However, with a bit of planning, a few hoops to jump through, and some hacky code, you could get away with a simple (sort of) JSON that eventually ends up as a .csv file that looks like this:
Note: the values are all the same since I don't want to wait a day or two for the leaderboard to actually update.
What I did was rearrange the data that came back from the API request and build a structure that looks like this:
"BobTheElectrician": {
"id": 7160010,
"rank": 14,
"score_data": {
"2020-10-24 18:45": 4187,
"2020-10-24 18:57": 4187,
"2020-10-24 19:06": 4187,
"2020-10-24 19:13": 4187
}
Every player is your main key and has, among others, a score_data value. This in turn is a dict that holds a points value for each time you run the script.
Now, the trick is to get this JSON to look like the .csv you want. The question is - how?
Well, since you intend to update all players' data (I just assumed that), they should all have the same number of entries in score_data.
The keys for score_data are your timestamps. Grab any player's score_data keys and you have the date headers, right?
Having said that, you can build your .csv rows the same way: grab player's name and all their point values from score_data. This should get you a list of lists, right? Right.
Then, when you have all this, you just dump that to a .csv file and there you have it!
Putting it all together:
import csv
import json
import os
import random
import time
from urllib.parse import urlencode

import requests

API_URL = "https://community.koodomobile.com/widget/pointsLeaderboard?"
LEADERBOARD_FILE = "leaderboard_data.json"


def get_leaderboard(period: str = "allTime", max_results: int = 20) -> list:
    payload = {"period": period, "maxResults": max_results}
    return requests.get(f"{API_URL}{urlencode(payload)}").json()


def dump_leaderboard_data(leaderboard_data: dict) -> None:
    with open(LEADERBOARD_FILE, "w") as jf:
        json.dump(leaderboard_data, jf, indent=4, sort_keys=True)


def read_leaderboard_data(data_file: str) -> dict:
    with open(data_file) as f:
        return json.load(f)


def parse_leaderboard(leaderboard: list) -> dict:
    # keyed by player name; score_data maps this run's timestamp to the points value
    return {
        item["name"]: {
            "id": item["id"],
            "score_data": {
                time.strftime("%Y-%m-%d %H:%M"): item["points"],
            },
            "rank": item["leaderboardPosition"],
        } for item in leaderboard
    }


def update_leaderboard_data(target: dict, new_data: dict) -> dict:
    # merge the fresh scrape into the stored data, appending the new timestamp
    for player, stats in new_data.items():
        target[player]["rank"] = stats["rank"]
        target[player]["score_data"].update(stats["score_data"])
    return target


def leaderboard_to_csv(leaderboard: dict) -> None:
    data_rows = [
        [player] + list(stats["score_data"].values())
        for player, stats in leaderboard.items()
    ]
    # every player carries the same timestamps, so any one of them yields the headers
    random_player = random.choice(list(leaderboard.keys()))
    dates = list(leaderboard[random_player]["score_data"])
    with open("the_data.csv", "w", newline="") as output:
        w = csv.writer(output)
        w.writerow([""] + dates)
        w.writerows(data_rows)


def script_runner():
    if os.path.isfile(LEADERBOARD_FILE):
        fresh_data = update_leaderboard_data(
            target=read_leaderboard_data(LEADERBOARD_FILE),
            new_data=parse_leaderboard(get_leaderboard()),
        )
        leaderboard_to_csv(fresh_data)
        dump_leaderboard_data(fresh_data)
    else:
        dump_leaderboard_data(parse_leaderboard(get_leaderboard()))


if __name__ == "__main__":
    script_runner()
The script also checks if you have a JSON file that pretends to be a proper database. If not, it'll write the leader-board data. Next time you run the script, it'll update the JSON and spit out a fresh .csv file.
Hope this answer will nudge you in the right direction.
Hey, since you are loading it into a pandas DataFrame, the operations are fairly simple. I ran your code first:
import pandas as pd
import time
timestr = time.strftime("%Y-%m-%d %H:%M")
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
Then I added a few more lines of code to modify the DataFrame to your needs:
idxs = table['date'].index
for i, val in enumerate(idxs):
    table.at[val, table['date'][i]] = table['points'][i]
table = table.drop(['date', 'points'], axis=1)
In the above snippet I am using the DataFrame's ability to assign values by index with .at. First I get the index values for the date column; then I go through each of them, adding a column named after the required date (the value from the date column) and filling in the corresponding points according to the indexes pulled earlier.
This gives me the following output:
name 10-24-2020
Dennis 52570.0
Dinh 40930.0
Sophia 26053.0
Mayumi 25300.0
Goran 24689.0
Robert T 19843.0
Allan M 19768.0
Bernard Koodo 14404.0
nim4165 13629.0
Timo Tuokkola 11216.0
rikkster 7338.0
David AKU 5774.0
Ranjan Koodo 4506.0
BobTheElectrician 4170.0
Helen Koodo 3370.0
Mihaela Koodo 2764.0
Fred C 2542.0
Philosoraptor 2122.0
Paul Deschamps 1973.0
Emilia Koodo 1755.0
I can then save this to CSV using the last line from your code. Similarly, you can pull data for more dates and format it to add to the same DataFrame:
table.to_csv('products.csv', index=True, encoding='utf-8')
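
To extend this across days, one option is to rebuild today's single-date frame on each run and join it onto the stored file. A minimal sketch under that assumption (the read_json and from_records steps mirror the code above, and products.csv is assumed to already hold the earlier columns):

import pandas as pd

# load what was stored previously; 'name' was written as the index column
stored = pd.read_csv('products.csv', index_col='name')

# build today's one-column frame the same way as above
data = pd.read_json('https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles=')
today = pd.DataFrame.from_records(data, index=['name'], columns=['points', 'name'])
today.columns = [pd.Timestamp.today().strftime('%m-%d-%Y')]

# join on the player name so each run adds one dated column
merged = stored.join(today, how='outer')
merged.to_csv('products.csv', index=True, encoding='utf-8')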
Pytrends for Google Trends data does not return a column if there is no data for a search parameter on a specific region.
The code below is from pytrends.request
def interest_over_time(self):
    """Request data from Google's Interest Over Time section and return a dataframe"""
    over_time_payload = {
        # convert to string as requests will mangle
        'req': json.dumps(self.interest_over_time_widget['request']),
        'token': self.interest_over_time_widget['token'],
        'tz': self.tz
    }
    # make the request and parse the returned json
    req_json = self._get_data(
        url=TrendReq.INTEREST_OVER_TIME_URL,
        method=TrendReq.GET_METHOD,
        trim_chars=5,
        params=over_time_payload,
    )
    df = pd.DataFrame(req_json['default']['timelineData'])
    if (df.empty):
        return df
    df['date'] = pd.to_datetime(df['time'].astype(dtype='float64'), unit='s')
    df = df.set_index(['date']).sort_index()
From the code above, if there is no data, it just returns df, which will be empty.
My question is, how can I make it return a column with "No data" on every line and the search term as header, so that I can clearly see for which search terms there is no data?
Thank you.
I hit this problem, then I hit this web page. My solution was to ask Google Trends for data on a search term it would definitely have data for, then rename the column and zero the data.
I used the .drop method to get rid of the "isPartial" column and the .rename method to change the column name. To zero the data in the column, I created a function:
# make every value zero
def MakeZero(x):
    return x * 0

Then I used the .apply method on the dataframe to zero the column:

ThisYrRslt = BlankResult.apply(MakeZero)
:) But the question is, which search term do you ask Google Trends about that will always return a value? I chose "Google". :)
I'm sure you can think of better ones, but it's hard to leave those words in commercial code.
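
Pulling those steps together, here is a minimal sketch of the whole workaround; the timeframe and the stand-in column name my_missing_term are assumptions for illustration:

from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)

# ask for a term that always has data; 'Google' is the placeholder chosen above
pytrends.build_payload(['Google'], timeframe='today 12-m')
blank_result = pytrends.interest_over_time()

# drop the isPartial flag, relabel the column as the term that had no data,
# and zero out the copied values
no_data = blank_result.drop(columns=['isPartial'])
no_data = no_data.rename(columns={'Google': 'my_missing_term'})
no_data = no_data.apply(lambda x: x * 0)
print(no_data.head())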
I have a pretty big JSON object in the following format:
[
{
"A":"value",
"TIME":1551052800000,
"C":35,
"D":36,
"E":34,
"F":35,
"G":33
},
{
"B":"value",
"TIME":1551052800000,
"C":36,
"D":56,
"E":44,
"F":75,
"G":38
}, ...
...
]
Converted to JSON with the help of pandas:

df.to_json(orient='records')

I want to loop through the JSON body, update a specific key inside each object, and send it back to the client through my API.
I want to do something like this (pseudocode):

for i in range(len(obj)):
    obj[i]["TIME"] = update_calculation(obj[i]["TIME"])

I am new to Python and have tried this. It iterates through the object, but the update doesn't persist, and the recursion takes a lot of time.
First, pd.read_sql_query returns a pd.DataFrame, not JSON.
As per your question:
Say you have a sample function update_calculation:

def update_calculation(time):
    return time
You could then update the TIME column like so:

df["TIME"] = df["TIME"].apply(update_calculation)