I want to create a DataFrame with data for the tennis matches of a specific player, 'Lenny Hampel'. For this I downloaded a lot of .json files with data for his matches - all in all there are around 100 files. Since each is a JSON file, I need to convert every single file into a dict to get it into the DataFrame in the end. Finally I need to concatenate each file's data into the DataFrame. I could hard-code it, but that seems rather silly, and I could not find a proper way to iterate through this.
Could you help me understand how I could write a loop or something similar in order to code this the smart way?
import json
import pandas as pd
with open('lenny/2016/lenny2016_match (1).json') as json_file:
    lennymatch1 = json.load(json_file)

player1 = [item
           for item in lennymatch1["stats"]
           if item["player_fullname"] == "Lenny Hampel"]

with open('lenny/2016/lenny2016_match (2).json') as json_file:
    lennymatch2 = json.load(json_file)

player2 = [item
           for item in lennymatch2["stats"]
           if item["player_fullname"] == "Lenny Hampel"]

with open('lenny/2016/lenny2016_match (3).json') as json_file:
    lennymatch3 = json.load(json_file)

player3 = [item
           for item in lennymatch3["stats"]
           if item["player_fullname"] == "Lenny Hampel"]

with open('lenny/2016/lenny2016_match (4).json') as json_file:
    lennymatch4 = json.load(json_file)

player4 = [item
           for item in lennymatch4["stats"]
           if item["player_fullname"] == "Lenny Hampel"]

tabelle1 = pd.DataFrame.from_dict(player1)
tabelle2 = pd.DataFrame.from_dict(player2)
tabelle3 = pd.DataFrame.from_dict(player3)
tabelle4 = pd.DataFrame.from_dict(player4)

tennisstats = [tabelle1, tabelle2, tabelle3, tabelle4]
result = pd.concat(tennisstats)
result
This is basic looping: create the list before the loop, append each per-file DataFrame inside the loop, and concatenate once after the loop.
# --- before loop ---
tennisstats = []

# --- loop ---
for filename in ["lenny/2016/lenny2016_match (1).json", "lenny/2016/lenny2016_match (2).json"]:
    with open(filename) as json_file:
        lennymatch = json.load(json_file)

    player = [item
              for item in lennymatch["stats"]
              if item["player_fullname"] == "Lenny Hampel"]

    tabelle = pd.DataFrame.from_dict(player)
    tennisstats.append(tabelle)

# --- after loop ---
result = pd.concat(tennisstats)
If the filenames are similar and differ only by a number, you can generate them:

for number in range(1, 101):
    filename = f"lenny/2016/lenny2016_match ({number}).json"
    with open(filename) as json_file:

and the rest is the same as in the first version.
If all files are in the same folder, then you can use os.listdir():

import os

directory = "lenny/2016/"
for name in os.listdir(directory):
    filename = directory + name
    with open(filename) as json_file:

and the rest is the same as in the first version.
I'm scraping an API for NBA player props. It's a nested JSON that I filtered down to my desired output.
import json
from urllib.request import urlopen
import csv
import os

# Delete the old CSV
os.remove('Path')

jsonurl = urlopen('https://sportsbook.draftkings.com//sites/US-SB/api/v4/eventgroups/88670846/categories/583/subcategories/5001')
games = json.loads(jsonurl.read())['eventGroup']['offerCategories'][8]['offerSubcategoryDescriptors'][0]['offerSubcategory']['offers']

# Open a new file as our CSV file
with open('Path', "a", newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)
    # Add a header
    csv_writer.writerow(["participant", "line", "oddsDecimal"])
    for game in games:
        for in_game in game:
            outcomes = in_game.get('outcomes')
            for outcome in outcomes:
                # Write one CSV row per outcome
                csv_writer.writerow([
                    outcome["participant"],
                    outcome["line"],
                    outcome["oddsDecimal"]
                ])
My issue is that the index 8 at "offerCategories" is hardcoded (as is the 0 at the subcategory), and it changes from day to day at this provider. I'm not familiar with this stuff and can't figure out how to look the category up by the string "name": "Player Combos" (which happens to be index 8 in this example).
Thanks in advance!
Given the data structure you have, you need to iterate through the 'offerCategories' sub-list looking for the map with the appropriate 'name' key. It seems that your use of '0' as an index is fine, since there is only a single value to choose in that case:
import json
from urllib.request import urlopen

jsonurl = urlopen(
    'https://sportsbook.draftkings.com//sites/US-SB/api/v4/eventgroups/88670846/categories/583/subcategories/5001')
games = json.loads(jsonurl.read())

for offerCategory in games['eventGroup']['offerCategories']:
    if offerCategory['name'] == 'Player Combos':
        offers = offerCategory['offerSubcategoryDescriptors'][0]['offerSubcategory']['offers']
        print(offers)
        break
Result:
[[{'providerOfferId': '129944623', 'providerId': 2, 'providerEventId': '27323732', 'providerEventGroupId': '42648', 'label': 'Brandon Clarke Points + Assists + Rebounds...
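You can also wrap the name lookup in a small reusable helper. This is a sketch; the miniature categories list below is made up to mirror the shape of the real response:

```python
def find_by_name(items, name):
    # Return the first dict whose 'name' key matches, or None if absent.
    return next((item for item in items if item.get("name") == name), None)

# Made-up miniature of the offerCategories structure from the response above
categories = [
    {"name": "Game Lines", "offerSubcategoryDescriptors": []},
    {"name": "Player Combos",
     "offerSubcategoryDescriptors": [{"offerSubcategory": {"offers": [["..."]]}}]},
]

combos = find_by_name(categories, "Player Combos")
offers = combos["offerSubcategoryDescriptors"][0]["offerSubcategory"]["offers"]
```

Using next() with a default of None means a missing category is reported explicitly instead of raising StopIteration.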
I'm trying to save some API responses to a DataFrame, but after running the code it is still empty. What am I missing? Could someone help me?
import requests
import json
import pandas as pd
from pandas import json_normalize

lista = []
lista = pd.DataFrame(lista)
contador = 0
data_inicial = 'https://public2-api2.ploomes.com/Deals'
while contador <= 3:
    data_text = requests.get(data_inicial, headers={'User-Key': ...}).text
    json_object = json.loads(data_text)
    json_formatted_str = json.dumps(json_object, indent=2)
    data_teste = requests.get(data_inicial, headers={'User-Key': ...})
    dictr = data_teste.json()
    recs = dictr['value']
    code = json_normalize(recs)
    lista.append(code)
    next_link = dictr['#odata.nextLink']
    data_inicial = next_link
    contador = contador + 1
I deleted the header value to avoid problems, but I think it's something in the syntax.
When you use lista.append(code), you are appending to a DataFrame because of the earlier lines:

lista = []
lista = pd.DataFrame(lista)

DataFrame appends are not in place (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html), and DataFrame.append is also deprecated. Keep lista as a plain list (list.append is in place) by omitting the second assignment, then combine everything with pd.concat after the loop.
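A minimal sketch of that pattern, with a made-up DataFrame standing in for json_normalize(dictr['value']) since the real API requires a key:

```python
import pandas as pd

pages = []                      # plain Python list; list.append mutates in place
for contador in range(3):
    # stand-in for json_normalize(dictr['value']) on each page of results
    page = pd.DataFrame({"deal_id": [contador * 2, contador * 2 + 1]})
    pages.append(page)

# build the DataFrame once, after the loop
lista = pd.concat(pages, ignore_index=True)
```

Accumulating frames in a list and concatenating once is also much faster than growing a DataFrame row by row inside the loop.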
I am using a Pokemon API: https://pokeapi.co/api/v2/pokemon/
I am trying to make a list which stores 6 pokemon IDs, then, using a for loop, call the API and retrieve data for each pokemon. Finally, I want to save this info in a txt file. This is what I have so far:
import random
import requests
from pprint import pprint

pokemon_number = []
for i in range(0, 6):
    pokemon_number.append(random.randint(1, 10))

url = 'https://pokeapi.co/api/v2/pokemon/{}/'.format(pokemon_number)
response = requests.get(url)
pokemon = response.json()
pprint(pokemon)

with open('pokemon.txt', 'w') as pok:
    pok.write(pokemon_number)
I don't understand how to get the API to read the IDs from the list.
I hope this is clear, I am in a right pickle.
Thanks
You are formatting pokemon_number, which is a list, into the url. You need to iterate over the list instead.
Also, to actually save each pokemon, you can use either its name or its ID as the filename. The json library makes it easy to save objects to JSON files.
import random
import requests
import json

# renamed to indicate it's not a single number
pokemon_numbers = []
for i in range(0, 6):
    pokemon_numbers.append(random.randint(1, 10))

# loop over the generated IDs
for id in pokemon_numbers:
    url = f"https://pokeapi.co/api/v2/pokemon/{id}/"
    # resp holds the requests Response object for this pokemon
    resp = requests.get(url)
    pokemon = resp.json()
    print(pokemon['name'])
    with open(f"{pokemon['name']}.json", "w") as outfile:
        json.dump(pokemon, outfile, indent=4)
I now have this:
import random
import requests

pokemon_number = []
for i in range(0, 6):
    pokemon_number.append(random.randint(1, 50))

x = 0
while x < len(pokemon_number):
    print(pokemon_number[x])
    url = 'https://pokeapi.co/api/v2/pokemon/{}/'.format(pokemon_number[x])
    response = requests.get(url)
    pokemon = response.json()
    print(pokemon)
    print(pokemon['name'])
    print(pokemon['height'])
    print(pokemon['weight'])
    x = x + 1

with open('pokemon.txt', 'w') as p:
    p.write(pokemon['name'])  # note: only the last pokemon is still in scope here
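If the goal is still a single pokemon.txt, the writing has to happen inside the loop, and file.write needs strings, not dicts. A sketch with hard-coded sample dicts standing in for live API responses (note the API's key is 'abilities', a list, rather than 'ability'):

```python
# made-up stand-ins for the dicts returned by response.json()
pokemon_list = [
    {"name": "bulbasaur", "height": 7, "weight": 69},
    {"name": "pikachu", "height": 4, "weight": 60},
]

with open("pokemon.txt", "w") as out:
    for pokemon in pokemon_list:
        # one line per pokemon; f-strings turn the values into text
        out.write(f"{pokemon['name']} height={pokemon['height']} weight={pokemon['weight']}\n")
```

Opening the file once, outside the loop, avoids truncating it on every iteration with mode 'w'.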
I am iterating over a dict created from a JSON file, which works fine, but as soon as I remove some of the entries in the else clause, the results change (normally it prints 35 nuts_ids, but with the remove in the else only 32 are printed). So it seems that the remove influences the iteration - but why? Shouldn't the key be safe? How can I do this properly without losing data?
import json

with open("test.json") as json_file:
    json_data = json.load(json_file)

for g in json_data["features"]:
    poly = g["geometry"]
    cntr_code = g["properties"]["CNTR_CODE"]
    nuts_id = g["properties"]["NUTS_ID"]
    name = g["properties"]["NUTS_NAME"]
    if cntr_code == "AT":
        print(nuts_id)
        # do plotting etc
    else:  # delete it if it is not part of a specific country
        json_data["features"].remove(g)  # line in question

# do something else with the json_data
It is not good practice to delete items while iterating over the object. Instead, filter out the elements you do not need.
Ex:

import json

with open("test.json") as json_file:
    json_data = json.load(json_file)

# Filter out other country codes.
json_data["features"] = [g for g in json_data["features"]
                         if g["properties"]["CNTR_CODE"] == "AT"]

for g in json_data["features"]:
    poly = g["geometry"]
    cntr_code = g["properties"]["CNTR_CODE"]
    nuts_id = g["properties"]["NUTS_ID"]
    name = g["properties"]["NUTS_NAME"]
    # do plotting etc

# do something else with the json_data
Always remember the cardinal rule: never modify an object you are iterating over.
You can take a copy of the features list with list.copy() and iterate over that:

import json

with open("test.json") as json_file:
    json_data = json.load(json_file)

# Take a copy of the features list
json_data_copy = json_data['features'].copy()

# Iterate over the copy
for g in json_data_copy:
    poly = g["geometry"]
    cntr_code = g["properties"]["CNTR_CODE"]
    nuts_id = g["properties"]["NUTS_ID"]
    name = g["properties"]["NUTS_NAME"]
    if cntr_code == "AT":
        print(nuts_id)
        # do plotting etc
    else:  # delete it if it is not part of a specific country
        json_data["features"].remove(g)  # safe now: we iterate over the copy
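To see why removing during iteration skips elements, here is a small self-contained demonstration using plain strings in place of the GeoJSON features:

```python
codes = ["AT", "DE", "FR", "AT"]

items = list(codes)
for g in items:                 # BUG on purpose: mutating the list being iterated
    if g != "AT":
        items.remove(g)         # later elements shift left, so "FR" is skipped

# items is now ["AT", "FR", "AT"]: after removing "DE", the iterator's next
# position already points past "FR", so "FR" is never visited or removed.

safe = [g for g in codes if g == "AT"]   # the comprehension visits every element
# safe is ["AT", "AT"]
```

This is exactly the mechanism behind the "35 vs 32 nuts_ids" difference in the question.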
I want to edit a CSV file using values read from a JSON file, in Python 2.7.
My CSV is a.csv:

a,b,c,d
,10,12,14
,11,14,15

My JSON file is a.json:

{"a": 20}

I want the column 'a' to be matched against the JSON file. If there is a match, the value should be copied from the JSON into my CSV file, so the final output of my CSV file looks like this:

a,b,c,d
20,10,12,14
20,11,14,15

What I have tried so far:
fileCSV = open('a.csv', 'a')
fileJSON = open('a.json', 'r')
jsonData = fileJSON.json()

for k in range(jsonData):
    for i in csvRow:
        for j in jsonData.keys():
            if i == j:
                if self.count == 0:
                    self.data = jsonData[j]
                    self.count = 1
                else:
                    self.data = self.data + "," + jsonData[j]
    self.count = 0
    fileCSV.write(self.data)
    fileCSV.write("\n")
    k += 1

fileCSV.close()
print("File created successfully")
I would be really thankful if anyone can help me with this.
Please ignore any syntax and indentation errors.
Thank you.
Some basic string parsing will get you there. I wrote a script which works for the simple scenario you refer to; check whether it solves your problem:
import json
from collections import OrderedDict

def list_to_csv(listdat):
    csv = ""
    for val in listdat:
        csv = csv + "," + str(val)
    return csv[1:]

lines = []
csvfile = "csvfile.csv"
outcsvfile = "outcsvfile.csv"
jsonfile = "jsonfile.json"

with open(csvfile, encoding='UTF-8') as a_file:
    for line in a_file:
        lines.append(line.strip())

columns = lines[0].split(",")
data = lines[1:]

whole_data = []
for row in data:
    fields = row.split(",")
    i = 0
    rowData = OrderedDict()
    for column in columns:
        rowData[columns[i]] = fields[i]
        i += 1
    whole_data.append(rowData)

with open(jsonfile) as json_file:
    jsondata = json.load(json_file)

keys = list(jsondata.keys())
for key in keys:
    value = jsondata[key]
    for each_row in whole_data:
        each_row[key] = value

with open(outcsvfile, mode='w', encoding='UTF-8') as b_file:
    b_file.write(list_to_csv(columns) + '\n')
    for row_data in whole_data:
        row_list = []
        for ecolumn in columns:
            row_list.append(row_data.get(ecolumn))
        b_file.write(list_to_csv(row_list) + '\n')
The CSV output is not written to the source file but to a different file, and the output file is always truncated and rewritten, hence the 'w' mode.
I would recommend using csv.DictReader and csv.DictWriter classes which will read into and out of python dicts. This would make it easier to modify the dict values that you read in from the JSON file.
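As a sketch of that suggestion (written for Python 3, since 2.7 is end-of-life; the file names and sample contents are taken from the question):

```python
import csv
import json

# Recreate the sample inputs from the question
with open("a.csv", "w") as f:
    f.write("a,b,c,d\n,10,12,14\n,11,14,15\n")
with open("a.json", "w") as f:
    json.dump({"a": 20}, f)

with open("a.json") as jf:
    overrides = json.load(jf)              # {"a": 20}

with open("a.csv", newline="") as src:
    reader = csv.DictReader(src)           # each row becomes a dict keyed by header
    fieldnames = reader.fieldnames
    rows = []
    for row in reader:
        for key, value in overrides.items():
            if key in row:                 # only touch columns present in the CSV
                row[key] = value
        rows.append(row)

with open("a_out.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

Because DictReader/DictWriter key everything by column name, there is no manual splitting, joining, or index bookkeeping, and extra JSON keys that have no matching column are simply ignored.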