I have a stock database of the following form where data is given by the minute.
I want to fetch all results which have at least N rows for a particular ticker, for a particular date. (when grouped by ticker and date)
The purpose is to ignore missing data for a given ticker on a given date. There should be at least 629 rows for any given ticker,date combination. Anything less than this, I want to ignore that combination.
I tried something like the following but it wasn't working
start_date = '2021-11-08'
SQL = "SELECT datetime, datetime_utc, ticker,open,close,session_type, high,low,volume FROM candles WHERE id in (SELECT id from candles WHERE session_type='Live' GROUP BY datetime, ticker HAVING COUNT(ID) >= 629) AND datetime >= '{} 00:00:00'::timestamp ORDER BY ticker, datetime_utc ASC".format(start_date)
Could anyone point me at the right SQL incantation? I am using postgres > 9.6.
sample data (apologies i tried to set up an SQL fiddle but it wasn't allowing me to due to character limitations, then data type limitations. The data is below in json, set to be read by pandas)
df = pd.io.json.read_json('{"datetime":{"12372":1636572720000,"12351":1636571460000,"12493":1636590240000,"15210":1636633380000,"16212":1636642500000,"16009":1636560060000,"12213":1636554180000,"16386":1636657800000,"13580":1636572660000,"14733":1636659000000,"12086":1636537200000,"14802":1636667880000,"14407":1636585980000,"14356":1636577640000,"16086":1636574280000,"15437":1636661040000,"14115":1636577400000,"14331":1636576140000,"12457":1636582800000,"14871":1636678080000},"datetime_utc":{"12372":1636572720000,"12351":1636571460000,"12493":1636590240000,"15210":1636633380000,"16212":1636642500000,"16009":1636560060000,"12213":1636554180000,"16386":1636657800000,"13580":1636572660000,"14733":1636659000000,"12086":1636537200000,"14802":1636667880000,"14407":1636585980000,"14356":1636577640000,"16086":1636574280000,"15437":1636661040000,"14115":1636577400000,"14331":1636576140000,"12457":1636582800000,"14871":1636678080000},"ticker":{"12372":"AAPL","12351":"AAPL","12493":"AAPL","15210":"AAPL","16212":"NET","16009":"NET","12213":"AAPL","16386":"NET","13580":"NET","14733":"AAPL","12086":"AAPL","14802":"AAPL","14407":"AAPL","14356":"AAPL","16086":"NET","15437":"AAPL","14115":"NET","14331":"AAPL","12457":"AAPL","14871":"AAPL"},"open":{"12372":148.29,"12351":148.44,"12493":148.1,"15210":148.68,"16212":199.72,"16009":202.27,"12213":150.21,"16386":198.65,"13580":194.9,"14733":147.94,"12086":150.2,"14802":147.87,"14407":148.1,"14356":148.09,"16086":193.82,"15437":148.01,"14115":194.64,"14331":148.07,"12457":148.12,"14871":148.2},"close":{"12372":148.32,"12351":148.44,"12493":148.15,"15210":148.69,"16212":199.32,"16009":202.52,"12213":150.25,"16386":198.57,"13580":194.96,"14733":147.99,"12086":150.17,"14802":147.9,"14407":148.1,"14356":147.99,"16086":194.43,"15437":148.01,"14115":194.78,"14331":148.05,"12457":148.11,"14871":148.28},"session_type":{"12372":"Live","12351":"Live","12493":"Post","15210":"Pre","16212":"Live","16009":"Live","12213":"Pre","16386":"Live","13580":"Live","14733":"Live","12086":"Pre","14802":"Post","14407":"Post","14356":"Live","16086":"Live","15437":"Live","14115":"Live","14331":"Live","12457":"Post","14871":"Post"},"high":{"12372":148.3600006104,"12351":148.4700012207,"12493":148.15,"15210":148.69,"16212":199.8399963379,"16009":202.5249938965,"12213":150.25,"16386":198.6499938965,"13580":195.0299987793,"14733":147.9900054932,"12086":150.2,"14802":147.9,"14407":148.1,"14356":148.1049957275,"16086":194.4400024414,"15437":148.0399932861,"14115":195.1699981689,"14331":148.0800018311,"12457":148.12,"14871":148.28},"low":{"12372":148.26,"12351":148.38,"12493":148.06,"15210":148.68,"16212":199.15,"16009":202.27,"12213":150.2,"16386":198.49,"13580":194.79,"14733":147.93,"12086":150.16,"14802":147.85,"14407":148.1,"14356":147.98,"16086":193.82,"15437":148.0,"14115":194.64,"14331":148.01,"12457":148.07,"14871":148.2},"volume":{"12372":99551.0,"12351":68985.0,"12493":0.0,"15210":0.0,"16212":9016.0,"16009":2974.0,"12213":0.0,"16386":1395.0,"13580":5943.0,"14733":59854.0,"12086":0.0,"14802":0.0,"14407":0.0,"14356":341196.0,"16086":8715.0,"15437":45495.0,"14115":16535.0,"14331":173785.0,"12457":0.0,"14871":0.0},"date":{"12372":1636502400000,"12351":1636502400000,"12493":1636502400000,"15210":1636588800000,"16212":1636588800000,"16009":1636502400000,"12213":1636502400000,"16386":1636588800000,"13580":1636502400000,"14733":1636588800000,"12086":1636502400000,"14802":1636588800000,"14407":1636502400000,"14356":1636502400000,"16086":1636502400000,"15437":1636588800000,"14115":1636502400000,"14331":1636502400000,"12457":1636502400000,"14871":1636588800000}}')
Expected result
if N = 4, then the results will exclude NET on 2021-11-11. (count of all shown here for illustration)
Like below (sample only)
You can use analytical function (window function) to count rows for each date and ticker combination.
select datetime, datetime_utc, ticker,open,close,session_type, high,low,volume
from (
SELECT datetime, datetime_utc, ticker,open,close,session_type, high,low,volume,
COUNT(*) OVER(PARTITION BY CAST(datetime AS DATE), ticker) as C
FROM candles
)
WHERE C >= 5
I am not familiar with postgres, so it may contain syntax problem.
You can use the pandasql with a nested groupby to count the daily data points for all tickers and use an inner join
import pandasql as ps
N=4
df_select = ps.sqldf(f"select df.* from (select count(*) as count, ticker, date from df group by date, ticker having count >= {N}) df_counts left join df on df.ticker=df_counts.ticker and df.date=df_counts.date")
Output:
I have a postgres table that stores recurring appointments called appointments. I also have an enum type called day_of_week.
day_of_week is an enum that looks like:
select enum_range(null::ck_day_of_week);
returns
{Monday,Tuesday,Thursday,Sunday,Saturday,Friday,Wednesday}
The appointments table's DDL is:
create table appointments
(
id serial primary key not null,
title varchar not null
recurrence_start_date timestampz not null,
recurrence_end_date timestampz,
recurrence_days_of_week _ck_day_of_week -- this is actually an array of enums, not just an enum. I don't fully understand why the DDL looks like this
);
A query of
select id, title, recurrence_days_of_week from appointments where recurrence_start_date <= now() and (recurrence_end_date is null or recurrence_end_date >= now());
returns something like this:
1,My Recurring Appointment,{Wednesday,Thursday,Friday,Saturday,Sunday}
My question is, I need to 'materialize' this result as a set of rows for my UI view. Specifically I'd like to query the DB and have it return rows like:
id,My Recurring Appointment,date
where a unique key would likely be the appointment id and specific date and where date is a essentially a combination of recurrence_start_date + day of week (day of week is an offset between 0 and 6).
In postgres, how do I select from a table that has a column which is an enum array and have it return a --virtual-- single row of the --real-- same row for each enum value in the array? I'm thinking of a view or something, but I'm now sure how to write the select statement.
Bonus points if this can be translated into SQLAlchemy....
I'm also open and have considered just creating an appointments table that stores each appointment with its specific date. But if I do this...in theory, for appointments with recurring dates where there is no end date, the table could be infinitely large, so I've been trying to avoid that.
First, I'd not suggest using id as a column name ever.
Second, I'd use built in date/time functions to achieve the 1st objective. More specificly the combination extract function with dow parameter (day of week) and generate_series function with.
Make sure to check fiddle in the References section.
Query returning list of appointments for each day of week:
create or replace function fnc_appointment_list_dow_by_id(_app_id int)
returns table (__app_id int, __title varchar, __recurrence_start_date timestamptz, __recurrence_appointment_day timestamptz, __reccurence_dow_offset int) as
$body$
select
app.app_id,
app.app_title,
app.app_recurrence_start_date,
_appointment_days._day,
extract(dow from _day)::int
from
appointments app,
generate_series(app.app_recurrence_start_date, app.app_recurrence_end_date, interval '1 day') _appointment_days(_day)
where app.app_id=_app_id;
$body$
language sql stable; -- consider changing function volatility as needed.
-- select * from fnc_appointment_list_dow_by_id(1);
Query returning array of appointment days of week:
create or replace function fnc_appointment_array_dow_by_id(_app_id int)
returns table (__app_id int, __title varchar, __recurrence_start_date timestamptz, __recurrence_appointment_day timestamptz[], __reccurence_dow_offset int[]) as
$body$
select
app.app_id,
app.app_title,
app.app_recurrence_start_date,
array_agg(_day::timestamptz),
array_agg(extract(dow from _day)::int)
from
appointments app,
generate_series(app.app_recurrence_start_date, app.app_recurrence_end_date, interval '1 day') as _appointment_days(_day)
where app.app_id=_app_id
group by app.app_id, app.app_title, app.app_recurrence_start_date;
$body$
language sql stable; -- consider changing function volatility as needed.
-- select * from fnc_appointment_array_dow_by_id(1);
If either of functions is what you are looking for let me know and we can move on to next objectives.
References:
Documentation on date/time functions (look for extract
function)
Documentation on generate_series functions (look for
generate_series function with arguments of timestamp type)
Fiddle (Don't mind array of timestamptz being displayed as array of ints)