Querying from Soda database using Socrata client.get in Python - python

I am trying to query a database, and I've tried to look up the right way to format a SoQL string, but I am failing. I try the following:
from __future__ import division, print_function
from sodapy import Socrata
import pandas as pd
import numpy as np
client = Socrata("data.cityofchicago.org", None)
df = client.get("kkgn-a2j4", query="WHERE traffic > -1")
and receive an error saying it "Could not parse SoQL query "WHERE traffic > -1" at line 1 character 1". If I do the following, however, it works:
from __future__ import division, print_function
from sodapy import Socrata
import pandas as pd
import numpy as np
client = Socrata("data.cityofchicago.org", None)
df = client.get("kkgn-a2j4", where="traffic > -1")
But I want to know how to get the query argument to work so I can use more complex queries. Specifically, I want to query where traffic > -1 AND the timestamp is BETWEEN '2013-01-19T23:50:32.000' AND '2014-12-14T23:50:32.000'.

You can use the sodapy where parameter ($where in SoQL) to combine multiple filters; just use AND to combine them:
traffic > -1 AND last_update BETWEEN '2013-01-19T23:50:32.000' AND '2014-12-14T23:50:32.000'
(As for the query argument: it expects a complete SoQL statement, e.g. starting with SELECT, which is why a bare WHERE clause fails to parse. For simple filtering, where is the intended route.)
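Putting that together, the full call might look like the sketch below. It is not run against the live dataset here, so treat the column name last_update as an assumption from the dataset's schema; the network call is shown commented out.

```python
# Build the combined $where clause as a plain string; sodapy passes it
# through to Socrata's $where parameter unchanged.
where_clause = (
    "traffic > -1 "
    "AND last_update BETWEEN '2013-01-19T23:50:32.000' "
    "AND '2014-12-14T23:50:32.000'"
)

# The actual request (requires sodapy and network access):
# from sodapy import Socrata
# client = Socrata("data.cityofchicago.org", None)
# results = client.get("kkgn-a2j4", where=where_clause)

print(where_clause)
```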

Related

how to create a custom metric in data dog using aws lambda

I am using AWS Lambda to calculate the number of days until RDS maintenance. I now get an integer, and I would like to send it to Datadog so that it creates a metric. I am new to Datadog and not sure how to do it. I have already created a Lambda layer for Datadog. Here is my code; since my Lambda does a lot of other things, I will only include the problematic block:
import boto3
import json
import collections
import datetime
from dateutil import parser
import time
from datetime import timedelta
from datetime import timezone
from datadog_lambda.metric import lambda_metric
from datadog_lambda.wrapper import datadog_lambda_wrapper
import urllib.request, urllib.error, urllib.parse
import os
import sys
from botocore.exceptions import ClientError
from botocore.config import Config
from botocore.session import Session
import tracemalloc
<lambda logic, only including the else block at which point I want to send the data>
print("WARNING! ForcedApplyDate is: ", d_fapply)
rdsForcedDate = parser.parse(d_fapply)
print(rdsForcedDate)
current_dateTime = datetime.datetime.now(timezone.utc)
difference = rdsForcedDate - current_dateTime
#print(difference)
if difference < timedelta(days=7):
    rounded = difference.days
    print(rounded)
    lambda_metric(
        "rds_maintenance.days",              # Metric name
        rounded,                             # Metric value
        tags=['key:value', 'key:value']      # Associated tags
    )
Here I would like to send the number of days, which could be 5, 10, 15, or any number. I have also added the Lambda extension layer; the function runs perfectly, but I don't see the metric in Datadog.
I have also tried using this
from datadog import statsd
if difference < timedelta(days=7):
    try:
        rounded = difference.days
        statsd.gauge('rds_maintenance_alert', round(rounded, 2))
        print("data sent to datadog")
    except Exception as e:
        print(f"Error sending metric to Datadog: {e}")
Again, I don't get an error for this block either, but I can't see the metric. The Datadog API key and site are set in Lambda environment variables.
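Independent of the delivery question, the days computation itself can be sanity-checked locally with plain datetime. This is a minimal, self-contained sketch of the same logic with made-up dates:

```python
from datetime import datetime, timedelta, timezone

def days_until(forced_apply, now):
    """Whole days between now and the forced-apply date (what .days returns)."""
    return (forced_apply - now).days

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
forced = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)

difference = forced - now
if difference < timedelta(days=7):
    print(days_until(forced, now))  # prints 5: .days truncates the extra 12 hours
```

If the value is correct locally, the missing metric is more likely a delivery issue (for example, the handler not actually being wrapped with datadog_lambda_wrapper, which the code imports but may not apply) than a problem in this block.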

Accessing nested dictionary from a JSON, with variable headers

I am trying to use json_normalize to parse data from the yahoofinancials package. I seem to be running into an issue when trying to separate the columns out from the last object, a variable date key. Each date, I believe, is a dictionary which contains various balance sheet line items.
My code is:
import json
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import yfinance as yf
from yahoofinancials import YahooFinancials
tickerinput = "AAPL"
ticker = yf.Ticker(tickerinput)
tickerfin = YahooFinancials(tickerinput)
balancesheet = tickerfin.get_financial_stmts('annual', 'balance')
''' Flattening w json_normalize'''
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput])
I have also tried the code below, but receive a KeyError, despite the key being present in the original JSON output.
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput], meta=['2021-09-25', ['totalLiab','totalStockholderEquity','totalAssets','commonStock','otherCurrentAssets','retainedEarnings','otherLiab','treasuryStock','otherAssets','cash','totalCurrentLiabilities','shortLongTermDebt','otherStockholderEquity','propertyPlantEquipment','totalCurrentAssets','longTermInvestments','netTangibleAssets','shortTermInvestments','netReceivables','longTermDebt','inventory','accountsPayable']], errors='ignore')
The main issue is that I am returned the data frame below:
[screenshot: returned dataframe from balsheet]
Sample output of the JSON file:
[screenshot: JSON output (the balancesheet variable)]
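Since the inner dictionaries are keyed by a date that changes per entry, json_normalize's record_path/meta arguments are awkward here. One workaround is to lift the date keys out first and build the frame directly. The sketch below uses synthetic data mimicking the structure described in the question; the line items and values are made up:

```python
import pandas as pd

# Synthetic stand-in for the yahoofinancials output described above
balancesheet = {
    "balanceSheetHistory": {
        "AAPL": [
            {"2021-09-25": {"totalLiab": 287912, "cash": 34940}},
            {"2020-09-26": {"totalLiab": 258549, "cash": 38016}},
        ]
    }
}

# Collapse the list of one-key dicts into {date: line_items}
records = {
    date: line_items
    for entry in balancesheet["balanceSheetHistory"]["AAPL"]
    for date, line_items in entry.items()
}

# One row per reporting date, one column per balance sheet line item
balsheet = pd.DataFrame.from_dict(records, orient="index")
print(balsheet)
```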

Where am I going wrong retrieving stock data from Quandl?

ValueError: The Quandl API key must be provided either through the api_key variable or through the environmental variable QUANDL_API_KEY.
I am trying to retrieve some simple stock data from Quandl. I have put the actual API key in place of the 'x' in the example code below, but I am still getting the error. Am I missing something?
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web
style.use('ggplot')
symbol = 'AAPL'
api_key = 'x'
start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
df = web.DataReader(symbol, 'quandl', start, end, api_key)
print(df.head())
From the quandl docs:
AUTHENTICATION The Quandl Python module is free but you must have a
Quandl API key in order to download data. To get your own API key, you
will need to create a free Quandl account and set your API key.
After importing the Quandl module, you can set your API key with the
following command: quandl.ApiConfig.api_key = "YOURAPIKEY"
So you will need to pip install and import quandl. Then you can set the api_key attribute as above. (Also note that in your original call, api_key is passed positionally, where it lands in DataReader's retry_count parameter instead; pass it as a keyword, api_key=api_key.)
If you only want to get the data from Quandl, maybe you can try another approach.
import pandas as pd
import quandl

api_key = 'yoursuperamazingquandlAPIkey'
df = quandl.get('heregoesthequandlcode', authtoken=api_key)
print(df.head())
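As the ValueError itself suggests, a third option is to supply the key through the QUANDL_API_KEY environment variable, which pandas_datareader falls back to when no api_key argument is given. A minimal sketch ("x" is a placeholder for a real key):

```python
import os

# Placeholder key; replace with a real Quandl API key
os.environ["QUANDL_API_KEY"] = "x"

# With the variable set, the original DataReader call works without
# passing api_key explicitly (requires pandas_datareader and network):
# df = web.DataReader('AAPL', 'quandl', start, end)

print(os.environ["QUANDL_API_KEY"])
```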

Parse requests.get() output into a pandas dataframe

I am following a tutorial and am stuck at parsing the output of requests.get().
My goal is to connect to the API below to pull historical cryptocurrency prices and put them into a pandas DataFrame for further analysis.
[API: https://www.cryptocompare.com/api/#-api-data-histoday-]
Here's what I have.
import requests
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(response.text)
Now I want to output into a dataframe...
pd.DataFrame.from_dict(response)
But I get...
PandasError: DataFrame constructor not properly called!
You can use the json package to convert to dict:
import requests
from json import loads
import pandas as pd
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
dic = loads(response.text)
print(type(dic))
pd.DataFrame.from_dict(dic)
However, as jonrsharpe noted, a much simpler way would be:
import requests
import pandas as pd
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(type(response.json()))
pd.DataFrame.from_dict(response.json())
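A way to sidestep query-string typos (such as dropping the '?' between the path and the parameters) is to build the URL from a dict of parameters; a stdlib-only sketch:

```python
from urllib.parse import urlencode

base = "https://min-api.cryptocompare.com/data/histoday"
params = {"fsym": "ETC", "tsym": "USD", "limit": 10, "aggregate": 3, "e": "CCCAGG"}

# urlencode inserts the '=' and '&' separators; we add the '?' ourselves
url = f"{base}?{urlencode(params)}"
print(url)

# requests can also do this for you:
# response = requests.get(base, params=params)
```

Also note that for this endpoint the price rows appear to live under a "Data" key in the JSON, so pd.DataFrame(response.json()["Data"]) may give a cleaner frame than calling from_dict on the whole payload (an assumption about the response shape, not verified here).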

Python retrieving data from web HTTP 400: Bad Request Error (Too many Requests?)

I am using a Python module (googlefinance) to retrieve stock information. In my code, I create a symbols list, which then gets sent into a loop to collect the information for each symbol.
The symbols list contains about 3000 symbols, which is why I think I am getting this error. When I shorten the range of the loop (24 requests), it works fine. I have also tried using a time delay between requests, but no luck. How can I retrieve the information for all the specified symbols without getting the HTTP 400 error?
from googlefinance import getQuotes
import pandas as pd
import pymysql
import time
import threading
import urllib.request
def createSymbolList(csvFile):
    df = pd.read_csv(csvFile)
    saved_column = df['Symbol']
    return saved_column

def getSymbolInfo(symbolList):
    newList = []
    for i in range(int(24)):
        newList.append(getQuotes(symbolList[i]))
    return newList

nyseList = createSymbolList("http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download")
try:
    l = getSymbolInfo(nyseList)
    print(l)
    print(len(l))
except urllib.error.HTTPError as err:
    print(err)
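One common mitigation for server-side rate limits is to batch the symbols and pause between batches rather than firing thousands of requests back to back. A sketch of the batching shape (the getQuotes call is left commented out, since it needs the googlefinance module and network access; a placeholder stands in for it):

```python
import time

def chunked(seq, size):
    """Yield successive slices of `size` items from `seq`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def getSymbolInfo(symbolList, batch_size=24, pause_seconds=2):
    newList = []
    for batch in chunked(list(symbolList), batch_size):
        for symbol in batch:
            # newList.append(getQuotes(symbol))  # real call, needs network
            newList.append(symbol)               # placeholder for this sketch
        time.sleep(pause_seconds)  # back off between batches
    return newList

symbols = [f"SYM{i}" for i in range(10)]
print(getSymbolInfo(symbols, batch_size=4, pause_seconds=0))
```

Whether this avoids the 400 depends on the server's actual limits, so batch_size and pause_seconds here are guesses to tune, not known-good values.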
