Read JSON data from covid19 API using Python

I was trying to import time-series data from the link Covid_data to get the daily historical and 7-day moving average data, but my code doesn't work. I am new to this, so maybe my key-value pair is not correct. The structure of the file is given here: json_structure_link.
My code:
import requests
import pandas as pd

response = requests.get("https://api.covid19india.org/v4/min/timeseries.min.json")

if response.status_code == 200:
    historical_day_numbers = response.json()

    DATE = []
    STATE = []
    TOTAL_CASES = []
    RECOVERED = []
    DECEASED = []
    TESTED = []
    VACCINATED = []

    for state in historical_day_numbers.keys():
        STATE.append(state)
        DATE.append(historical_day_numbers[state]["dates"])
        TOTAL_CASES.append(historical_day_numbers[state]["dates"]["delta"]["confirmed"])
        RECOVERED.append(historical_day_numbers[state]["dates"]["delta"]["recovered"])
        DECEASED.append(historical_day_numbers[state]["dates"]["delta"]["deceased"])
        TESTED.append(historical_day_numbers[state]["dates"]["delta"]["tested"])
        VACCINATED.append(historical_day_numbers[state]["dates"]["delta"]["vaccinated"])

    Covid19_historical_data = pd.DataFrame(
        {
            "STATE/UT": STATE,
            "DATE": DATE,
            "TOTAL_CASES": TOTAL_CASES,
            "RECOVERED": RECOVERED,
            "DECEASED": DECEASED,
            "TESTED": TESTED,
            "VACCINATED": VACCINATED,
        }
    )
    #print(Covid19_historical_data.head())
else:
    print("Error while calling API: {} {}".format(response.status_code, response.reason))
The error I am getting:
KeyError: 'delta'
But I can see that delta is present in the data.

historical_day_numbers[state]['dates'].keys()
Output: dict_keys(['2020-04-06', '2020-04-07', '2020-04-08', '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12', '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16', '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20', '2020-04-21',...])
When you inspect this, you will see that there is a key for each date and no key called 'delta' at this level.
If you edit your code as follows, you will not get this error.
historical_day_numbers[state]['dates']['2021-07-25']['delta']
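To build the DataFrame of daily (delta) values from the question, one option is to loop over the per-date keys. A minimal sketch, using .get() with defaults on the assumption that some dates may lack a delta block or individual fields:

import requests
import pandas as pd

response = requests.get("https://api.covid19india.org/v4/min/timeseries.min.json")
response.raise_for_status()
historical_day_numbers = response.json()

rows = []
for state, state_data in historical_day_numbers.items():
    for date, day_data in state_data["dates"].items():
        delta = day_data.get("delta", {})  # some dates may have no delta block
        rows.append({
            "STATE/UT": state,
            "DATE": date,
            "TOTAL_CASES": delta.get("confirmed"),
            "RECOVERED": delta.get("recovered"),
            "DECEASED": delta.get("deceased"),
            "TESTED": delta.get("tested"),
            "VACCINATED": delta.get("vaccinated"),
        })

Covid19_historical_data = pd.DataFrame(rows)
print(Covid19_historical_data.head())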

Related

How to write a dataframe to dynamodb using AWS Lambda

I have a Lambda function set up in AWS CloudFormation. The runtime is python3.8.
The purpose is to pull some weather data from an API and write it to DynamoDB once a day.
So far the Lambda test on AWS checks out, all green... but the function doesn't write any values to DynamoDB.
Is there maybe an indentation error?
Here is the code:
import boto3
import pyowm
import time
import json
import requests
from datetime import datetime, date, timedelta, timezone
import pandas as pd
from geopy.geocoders import Nominatim

def lambda_handler(event, context):
    api_key = "xxxxxxx"  # Enter your own API key
    owm = pyowm.OWM(api_key)

    city = 'Berlin, DE'
    geolocator = Nominatim(user_agent='aerieous#myserver.com')
    location = geolocator.geocode(city)
    lat = location.latitude
    lon = location.longitude

    # set the date to pull the data from to yesterday
    # format = '2021-09-09 00:00:00'
    x = (datetime.now() - timedelta(days=1))
    d = x.isoformat(' ', 'seconds')

    # convert time to epoch
    p = '%Y-%m-%d %H:%M:%S'
    dt = int(time.mktime(time.strptime(d, p)))

    url = "https://api.openweathermap.org/data/2.5/onecall/timemachine?lat=%s&lon=%s&dt=%s&appid=%s&units=metric" % (lat, lon, dt, api_key)
    response = requests.get(url)
    data_history = json.loads(response.text)

    # here we flatten only the nested list "hourly"
    df_history2 = pd.json_normalize(data_history, record_path='hourly',
                                    meta=['lat', 'lon', 'timezone'],
                                    errors='ignore')

    # convert epoch to timestamp
    df_history2['dt'] = pd.to_datetime(df_history2['dt'], unit='s').dt.strftime("%m/%d/%Y %H:%M:%S")

    # replace the column header
    df_history2 = df_history2.rename(columns={'dt': 'timestamp'})

    df_history2['uuid'] = df_history2[['timestamp', 'timezone']].agg('-'.join, axis=1)

    df_select_hist2 = df_history2[['uuid', 'lat', 'lon', 'timezone', 'timestamp', 'temp', 'feels_like', 'humidity', 'pressure']]
    df_select_hist2 = df_select_hist2.astype(str)
    df_select_hist2

    content = df_select_hist2.to_dict('records')
    return content

    dynamodb = boto3.resource(
        'dynamodb',
        aws_access_key_id='xx',
        aws_secret_access_key='xx',
        region_name='eu-west-1')

    table = dynamodb.Table("Dev_Weather")

    for item in content:
        uuid = item['uuid']
        timezone = item['timezone']
        timestamp = item['timestamp']
        lat = item['lat']
        lon = item['lon']
        temp = item['temp']
        feels_like = item['feels_like']
        humidity = item['humidity']
        pressure = item['pressure']

        table.put_item(
            Item={
                'pk_id': uuid,
                'sk': timestamp,
                'gsi_1_pk': lat,
                'gsi_1_sk': lon,
                'gsi_2_pk': temp,
                'gsi_2_sk': feels_like,
                'humidity': humidity,
                'pressure': pressure,
                'timezone': timezone
            }
        )
Thank you for any help in advance.
The line return content ends your Lambda function. It basically tells the script: I'm done, and this is the result. Nothing after it is executed. Remove that line (or move it to the end) to be able to execute the code after it. Also, the indentation in your code example seems off (a space too little when starting the dynamodb stuff), so I'm a bit confused about why this wouldn't give syntax errors.
Also: there is no need to specify an access key, region, etc. when creating the dynamodb resource. It's fetched by Lambda automatically. Just make sure the Lambda role has the right permissions to call DynamoDB.
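Putting that together, a minimal sketch of the corrected control flow, with the DynamoDB write happening before the return and the hard-coded credentials removed. Here build_content is a hypothetical stand-in for the DataFrame logic in the question; the table name Dev_Weather and the item fields are taken from the question:

import boto3

def write_weather_items(content):
    # inside Lambda, boto3 gets credentials and region from the execution role,
    # so no explicit keys are needed here
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table("Dev_Weather")
    for item in content:
        table.put_item(
            Item={
                'pk_id': item['uuid'],
                'sk': item['timestamp'],
                'gsi_1_pk': item['lat'],
                'gsi_1_sk': item['lon'],
                'gsi_2_pk': item['temp'],
                'gsi_2_sk': item['feels_like'],
                'humidity': item['humidity'],
                'pressure': item['pressure'],
                'timezone': item['timezone'],
            }
        )

def lambda_handler(event, context):
    content = build_content()  # hypothetical: the weather/DataFrame logic above
    write_weather_items(content)
    return content  # return only after the writes are done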

How to handle multiple missing keys in a dict?

I'm using an API to get basic information about shops in my area: name of shop, address, postcode, phone number, etc. The API returns a long list of data about each shop, but I only want some of it.
I created a for loop that just takes the information that I want for every shop that the API has returned. This all works fine.
The problem is that not all shops have a phone number or a website, so I get a KeyError because the key 'website' does not exist in every shop's response. I tried using try and except, which works, but only if I handle one thing; a shop might be missing both a phone number and a website, which leads to a second KeyError.
What can I do to check every key in my for loop and, if a key is missing, just add the value "none"?
My code:
import requests
import geocoder
import pprint

g = geocoder.ip('me')
print(g.latlng)
latitude, longitude = g.latlng

URL = "https://discover.search.hereapi.com/v1/discover"
latitude = xxxx
longitude = xxxx
api_key = 'xxxxx'  # Acquire from developer.here.com
query = 'food'
limit = 12

PARAMS = {
    'apikey': api_key,
    'q': query,
    'limit': limit,
    'at': '{},{}'.format(latitude, longitude)
}

# sending get request and saving the response as response object
r = requests.get(url=URL, params=PARAMS)
data = r.json()
#print(data)

for x in data['items']:
    title = x['title']
    address = x['address']['label']
    street = x['address']['street']
    postalCode = x['address']['postalCode']
    position = x['position']
    access = x['access']
    typeOfBusiness = x['categories'][0]['name']
    contacts = x['contacts'][0]['phone'][0]['value']
    try:
        website = x['contacts'][0]['www'][0]['value']
    except KeyError:
        website = "none"

    resultList = {
        'BUSINESS NAME:': title,
        'ADDRESS:': address,
        'STREET NAME:': street,
        'POSTCODE:': postalCode,
        'POSITION:': position,
        'POSITION2:': access,
        'TYPE:': typeOfBusiness,
        'PHONE:': contacts,
        'WEBSITE:': website
    }
    print("--" * 80)
    pprint.pprint(resultList)
I think a good way to handle it would be to use operator.itemgetter() to create a callable that will attempt to retrieve all the keys at once; if any aren't found, it will raise a KeyError.
A short demonstration of what I mean:
from operator import itemgetter
test_dict = dict(name="The Shop", phone='123-45-6789', zipcode=90210)
keys = itemgetter('name', 'phone', 'zipcode')(test_dict)
print(keys) # -> ('The Shop', '123-45-6789', 90210)
keys = itemgetter('name', 'address', 'phone', 'zipcode')(test_dict)
# -> KeyError: 'address'
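If the goal is to substitute "none" rather than fail, another option is a small helper built around .get()-style lookups. A sketch (get_nested is a hypothetical helper; data is the HERE API response from the question):

def get_nested(d, *path, default="none"):
    # walk a path of keys/list indices, returning `default` if any step is missing
    for step in path:
        try:
            d = d[step]
        except (KeyError, IndexError, TypeError):
            return default
    return d

for x in data['items']:
    resultList = {
        'BUSINESS NAME:': x.get('title', "none"),
        'POSTCODE:': get_nested(x, 'address', 'postalCode'),
        'PHONE:': get_nested(x, 'contacts', 0, 'phone', 0, 'value'),
        'WEBSITE:': get_nested(x, 'contacts', 0, 'www', 0, 'value'),
    }
    print(resultList)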

Parse GitHub API... getting "string indices must be integers" error

I need to loop through commits and get name, date, and message info from the GitHub API:
https://api.github.com/repos/droptable461/Project-Project-Management/commits
I have tried many different things, but I keep getting stuck on the "string indices must be integers" error:
def git():
    #name, date, message
    #https://api.github.com/repos/droptable461/Project-Project-Management/commits
    #commit { author { name and date
    #commit { message
    #with urlopen('https://api.github.com/repos/droptable461/Project Project-Management/commits') as response:
    #    source = response.read()
    #    data = json.loads(source)
    #state = []
    #for state in data['committer']:
    #    state.append(state['name'])
    #print(state)
    link = 'https://api.github.com/repos/droptable461/Project-Project-Management/events'
    r = requests.get('https://api.github.com/repos/droptable461/Project-Project-Management/commits')
    #print(r)
    #one = r['commit']
    #print(one)
    for item in r.json():
        for c in item['commit']['committer']:
            print(c['name'], c['date'])
    return 'suc'
I need to get the person who made the commit, the date, and the message.
item['commit']['committer'] is a dictionary object, so the line for c in item['commit']['committer']: iterates over its keys.
Since you are then indexing with [] into a string (the dictionary key), you get the error.
Instead that code should look more like:
import requests

def git():
    r = requests.get('https://api.github.com/repos/droptable461/Project-Project-Management/commits')
    for item in r.json():
        print(item['commit']['committer']['name'])
        print(item['commit']['committer']['date'])
        print(item['commit']['message'])
    return 'suc'
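If the commit info should be collected rather than just printed, a small variation (a sketch; the repository URL is taken from the question):

import requests

def get_commit_info():
    url = 'https://api.github.com/repos/droptable461/Project-Project-Management/commits'
    r = requests.get(url)
    r.raise_for_status()
    return [
        {
            'name': item['commit']['committer']['name'],
            'date': item['commit']['committer']['date'],
            'message': item['commit']['message'],
        }
        for item in r.json()
    ]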

Python - iterate through a list

I'm trying to automate email reporting using Python. My problem is that I can't pull the subject from the data that my email client outputs.
Abbreviated dataset:
[(messageObject){
   id = "0bd503eb00000000000000000000000d0f67"
   name = "11.26.17 AM [TXT-CAT]{Shoppers:2}"
   status = "active"
   messageFolderId = "0bd503ef0000000000000000000000007296"
   content[] =
      (messageContentObject){
         type = "html"
         subject = "Early Cyber Monday – 60% Off Sitewide "
      }
 }]
I can pull the other fields like this:
messageId = []
messageName = []
subject = []

for info in messages:
    messageId.append(str(info['id']))
    messageName.append(str(info['name']))
    subject.append(str(info[content['subject']]))

data = pd.DataFrame({
    'id': messageId,
    'name': messageName,
    'subject': subject
})
data.head()
I've been trying to iterate through content[] using a for loop, but I can't get it to work. Let me know if you have any suggestions.
#FamousJameous gave the correct answer:
That format is called SOAP. My guess for the syntax would be info['content']['subject'] or maybe info['content'][0]['subject']
info['content'][0]['subject'] worked with my data.
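Applied to the loop in the question, the fix looks something like this (a sketch, assuming messages is the list shown above):

messageId = []
messageName = []
subject = []

for info in messages:
    messageId.append(str(info['id']))
    messageName.append(str(info['name']))
    # content is a list, so index the first entry before taking the subject
    subject.append(str(info['content'][0]['subject']))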

How to stream a CSV file into BigQuery?

The examples I have found so far stream JSON to BQ, e.g. https://cloud.google.com/bigquery/streaming-data-into-bigquery
How do I stream CSV or any other file type into BQ? Below is a block of code for streaming; the "issue" seems to be in insert_all_data, where 'row' is defined as JSON. Thanks.
# [START stream_row_to_bigquery]
def stream_row_to_bigquery(bigquery, project_id, dataset_id, table_name, row,
                           num_retries=5):
    insert_all_data = {
        'rows': [{
            'json': row,
            # Generate a unique id for each row so retries don't accidentally
            # duplicate insert
            'insertId': str(uuid.uuid4()),
        }]
    }
    return bigquery.tabledata().insertAll(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_name,
        body=insert_all_data).execute(num_retries=num_retries)
# [END stream_row_to_bigquery]
This is how I wrote it using the bigquery-python library; it was very easy.
def insert_data(datasetname, table_name, DataObject):
    client = get_client(project_id, service_account=service_account,
                        private_key_file=key, readonly=False,
                        swallow_results=False)
    insertObject = DataObject
    try:
        result = client.push_rows(datasetname, table_name, insertObject)
    except Exception as err:
        print(err)
        raise
    return result
Here insertObject is a list of dictionaries, where one dictionary contains one row.
eg: [{field1:value1, field2:value2},{field1:value3, field2:value4}]
The CSV can be read as follows:
import pandas as pd

fileCsv = pd.read_csv(file_path + '/' + filename, parse_dates=C,
                      infer_datetime_format=True)
data = []
for row_x in range(len(fileCsv.index)):
    i = 0
    row = {}
    for col_y in schema:
        row[col_y['name']] = _sorted_list[i]['col_data'][row_x]
        i += 1
    data.append(row)

insert_data(datasetname, table_name, data)
The data list can then be passed to insert_data.
This will do it, but there is still a limitation, which I already raised here.
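As an aside, a simpler way to turn a CSV into the list-of-dicts shape that insert_data expects is csv.DictReader. A minimal sketch, assuming the CSV header row matches the BigQuery column names ('data.csv' is a placeholder file name):

import csv

with open('data.csv', newline='') as f:
    # each row becomes a dict keyed by the header row, one dict per row
    rows = [dict(row) for row in csv.DictReader(f)]

insert_data(datasetname, table_name, rows)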
