Python / Bloomberg API - historical security data incl. weekends (Sat. and Sun.)

I'm currently working with the blpapi, trying to get a bdh of an index including weekends. (I'll later need to match this df with another date vector.)
I'm already using
con.bdh([Index],['PX_LAST'],'19910102', today.strftime('%Y%m%d'), [("periodicitySelection", "DAILY")])
but this will return only weekdays (Mon-Fri). I know how this works in Excel with the BBG function builder, but I'm not sure about the wording within the blpapi.
Since I'll always need the first of each month,
con.bdh([Index],['PX_LAST'],'19910102', today.strftime('%Y%m%d'), [("periodicitySelection", "MONTHLY")])
won't work either, because it will return the 28th, 30th, 31st and so on.
Can anyone help here? THX!

You can use a combination of:
"nonTradingDayFillOption", "ALL_CALENDAR_DAYS" # include all days
"nonTradingDayFillMethod", "PREVIOUS_VALUE" # fill non-trading days with previous value

Related

How can I get all the dates of the current week?

I have a use case where I need to count the server failures for the current week. I am counting by reading a file where all the dates and failures are given. But I need to calculate weekly failures, so I thought of getting all the dates in the current week and comparing them with the dates in the file to count the failures. So the question is: how can I get all the dates of the current week? Also, how can I check if a given date falls in that week?
Can anyone please help?
Using pandas:
df.loc[df["dates"].dt.week == week_number]
This simply gets all the rows where the week equals the specified week number (you can find that week number by trying a dummy value and using .dt.week). Note that in recent pandas versions Series.dt.week has been removed; use df["dates"].dt.isocalendar().week instead.
from datetime import date, timedelta

print(date.today())
for x in range(7):
    print(date.today() + timedelta(days=x))
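Note that this prints today plus the six following days. If "current week" means the Monday-to-Sunday week containing today, a small sketch like this works (date.weekday() is 0 for Monday):

from datetime import date, timedelta

today = date.today()
monday = today - timedelta(days=today.weekday())       # back up to this week's Monday
week_dates = [monday + timedelta(days=i) for i in range(7)]

some_date = date(2024, 1, 15)      # hypothetical date to test
print(some_date in week_dates)     # membership check for the week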

Python groupby to dataframe (just groupby, no additional functions) to export to Excel

I am at a total loss as to why this is impossible to find, but I really just want to be able to groupby and then export to Excel. I don't need counts, or sums, or anything else, and I can only find examples including those functions. I tried removing those functions and the whole code just breaks.
Anyways:
I have a set of monthly metrics: metric name, volumes, date, productivity, and FTE need. Simple calcs got the data looking nice, good to go. Currently it is grouped in one-month sections, so all metrics from Jan are one after the other, etc. I just want to change the grouping so the first section is a single metric from Jan to Dec, and so on for each metric.
Initial data I want to export to Excel (returns a "not a DataFrame" error):
dfcon = pd.concat([PmDf,ReDf])
dfcon['Need'] = dfcon['Volumes'] / (dfcon['Productivity']*21*8*.80)
dfcon[['Date','Current Team','Metric','Productivity','Volumes','Need']]
dfg = dfcon.groupby(['Metric','Date'])
dfg.to_excel(r'S:\FilePATH\GroupBy.xlsx', sheet_name='pandas_group', index = 0)
The error I get here is: 'DataFrameGroupBy' object has no attribute 'to_excel'. (I have tried a variety of conversions to DataFrames, and the closest I can get is a correct grouping displaying counts only for each one, which I do not need in the slightest.)
I have also tried:
dfcon.sort('Metric').to_excel(r'S:\FILEPATH\Grouped_Output.xlsx', sheet_name='FTE Need', index = 0)
this returns the error: AttributeError: 'DataFrame' object has no attribute 'sort'
Any help you can give to get this exported, grouped, to Excel would be great. I am at my wits' end here after over an hour of googling. I am also self-taught, so I feel like I may be missing something very, very basic/simple, so here I am!
Thank you for any help you can provide!
PS: I know I can just sort afterwards in Excel, but I would rather learn how to make this work in Python!
I am pretty sure sort() doesn't work anymore; try sort_values().
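For example, a sketch reusing the path, sheet name, and column names from the question; sorting by Metric and then Date produces the grouping described:

dfcon.sort_values(['Metric', 'Date']).to_excel(
    r'S:\FILEPATH\Grouped_Output.xlsx', sheet_name='FTE Need', index=False)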

Validating user access logs with pandas DataFrames

I am not very proficient in pandas, but I have been using it for various projects over the last year or so. I like it so far, but I have not really gotten a firm grip on it, so I would love some help. I have tried googling for days, but I'm approaching a point where I just want to use pandas as an iterator, which seems like a waste. I feel I might just be missing some basic terminology and don't know what to search for, but I am getting fed up with reading and searching.
What I am working on right now requires me to check some logs for valid access, by comparing IDs and dates of access with something like a user registry. I'm using Python and pandas because they are the tools I feel most comfortable with, but I am open to suggestions on other approaches. The registry is parsed from a few Excel sheets managed by someone else, and the logs are nicely ordered CSVs.
Loading these into two DataFrames, I want to check each log entry for validity. One DataFrame acts as a registry, providing user IDs, a creation date and an end date, and the other contains the logs as rows, with a user ID and a timestamp:
Registry
      created        end                 ID
1  2018-09-04        NaT  66f56efc5cc6353ba
2  2018-10-09 2018-11-09  167a2c65133d9f4a3
3  2018-10-09 2018-11-09  f0efc501e52e7b1e1
Logs
                 Timestamp             ID
0  2019-08-01 00:01:48.027  4459eeab695a2
1  2019-08-01 00:06:03.981  e500df5f2c2ed
2  2019-08-01 00:06:36.100  e500df5f2c2ed
I want to check each log entry against my registry to see if access was permitted when it occurred. I have written a function to check an ID and a date against my registry, but I need to figure out how to apply the check to the whole log DataFrame:
def validate(userid, date):  # e.g. 'wayf-1234', datetime.date(2019, 11, 23)
    df_target = df_registry[df_registry['created'].notnull() & ~(df_registry['end'] < date)]
    return (df_target.values == userid).any()
My first inclination was to use the function directly like a row selector (not sure what to call it), but it doesn't work:
df_logs[validate(df_logs['id'], df_logs['Timestamp']) == True]
I am pretty sure it would be incredibly inefficient to instantiate a DataFrame for every row to check against a specific date, but I'm just hacking and trying to make something work, and inefficient is fine for now. But I would really love to know if someone has any input or perspectives on how to work this.
Should I just iterate through the rows of the DataFrame and apply my logic to each line (which seems to run counter to how pandas is supposed to be used), or is there a smarter way to go about it?
Thanks.
merge_asof is the tool here. It allows an exact merge on a list of columns (with by) and then finds, for each row, the highest value in one column immediately below the value in the other DataFrame.
Here you could use:
tmp = pd.merge_asof(logs, registry, left_on='Timestamp',
                    right_on='created', by='ID')
For each line from logs, you get the line from registry with the same ID and a created date immediately below the Timestamp. Then an access is valid if end is NaT or greater than the Timestamp (after adding one day...):
tmp['valid'] = tmp.end.isna() | (tmp.end + pd.Timedelta('1D') > tmp.Timestamp)
Beware: this only works if the date columns are true pd.Timestamp values (dtype datetime64)...
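Here is a self-contained sketch of the whole approach, with made-up rows shaped like the question's frames (merge_asof also requires both frames to be sorted on the asof keys):

import pandas as pd

registry = pd.DataFrame({
    'created': pd.to_datetime(['2018-09-04', '2018-10-09']),
    'end': pd.to_datetime([pd.NaT, '2018-11-09']),
    'ID': ['66f56efc5cc6353ba', '167a2c65133d9f4a3'],
})
logs = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2018-10-15 00:01:48', '2019-08-01 00:06:03']),
    'ID': ['167a2c65133d9f4a3', '66f56efc5cc6353ba'],
})

# both frames must be sorted on the asof keys before merging
logs = logs.sort_values('Timestamp')
registry = registry.sort_values('created')

tmp = pd.merge_asof(logs, registry, left_on='Timestamp',
                    right_on='created', by='ID')
tmp['valid'] = tmp.end.isna() | (tmp.end + pd.Timedelta('1D') > tmp.Timestamp)
print(tmp[['ID', 'Timestamp', 'valid']])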

Is there a replacement for this syntax?

With the new pandas update I can't use this function that I used in a DataCamp learning course (DAYOFWEEK doesn't exist anymore):
days_of_week = pd.get_dummies(dataframe.index.dayofweek,
                              prefix='weekday',
                              drop_first=True)
How can I change the syntax of my 'formula' to get the same results?
Sorry about the silly question, but I've spent a lot of time on this and I'm stuck...
Thanks in advance!
I already tried just using the dataframe with the index, but it doesn't get the days of the week into get_dummies.
I also used a DatetimeIndex but am messing up the formulation as well:
`days_of_week = pd.get_dummies(dataframe.index.dayofweek, prefix='weekday', drop_first=True)`
The dataframe is fairly big, and I need the output to give me the weekdays, because I'm dealing with stock prices.
Try weekday instead of dayofweek.
So:
days_of_week = pd.get_dummies(dataframe.index.weekday,
                              prefix='weekday',
                              drop_first=True)
See docs below:
pandas.Series.dt.weekday
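A quick sketch to illustrate (the DataFrame and index here are made up for the example; Jan 1 2024 is a Monday, so weekday runs 0-4):

import pandas as pd

idx = pd.date_range('2024-01-01', periods=5, freq='D')
dataframe = pd.DataFrame({'price': [1.0, 1.1, 1.2, 1.3, 1.4]}, index=idx)

days_of_week = pd.get_dummies(dataframe.index.weekday,
                              prefix='weekday',
                              drop_first=True)   # drops weekday_0 (Monday)
print(days_of_week)

Note that weekday (like dayofweek) only exists on a DatetimeIndex; if the dates live in a plain index or column, convert with pd.to_datetime first.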

Python: How to extract time/date-specific information from text / nltk_contrib timex.py bug

I am new to Python. I am looking for ways to extract/tag date- and time-specific information from text, e.g.:
1. I will meet you tomorrow
2. I had sent it two weeks back
3. Waiting for you last half an hour
I found timex from nltk_contrib; however, I found a couple of problems with it:
https://code.google.com/p/nltk/source/browse/trunk/nltk_contrib/nltk_contrib/timex.py
b. Not sure of the date data type passed to ground(tagged_text, base_date).
c. It deals only with dates, i.e. granularity at the day level. Can't find expressions like "next one hour", etc.
Thank you for your help
b) The data type that you need to pass to ground(tagged_text, base_date) is an instance of the datetime.date class, which you'd initialize using something like:
from datetime import date
base_date = date.today()
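Or, to ground relative expressions against a fixed reference day rather than today (the specific date below is just an illustration):

from datetime import date

base_date = date(2013, 5, 1)  # year, month, day of any fixed reference date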
