Sorting dataframe rows by date - python

I have built my dataframe, but I want to sort it by date. For example, I want the data for 02.01.2016 to come right after 01.01.2016.
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': ['sum']})
df_data_2311 = pd.DataFrame(df_data_2311)
After running this, I got the below output. This dataframe has 2192 rows.
Wind Offshore in [MW]
sum
Date
01.01.2016 5249.75
01.01.2017 12941.75
01.01.2018 19020.00
01.01.2019 13723.00
01.01.2020 17246.25
... ...
31.12.2017 21322.50
31.12.2018 13951.75
31.12.2019 21457.25
31.12.2020 16491.25
31.12.2021 35683.25
Kindly let me know how I can sort this data by date.

You can use the sort_values function in pandas.
df_data_2311.sort_values(by=["Date"])
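However, in order to sort by the Date column you will need to call reset_index() on your grouped dataframe and then convert the date values to datetime, which you can do with pandas.to_datetime. Without that conversion the dates are compared as strings, which is why 01.01.2017 ends up right after 01.01.2016.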
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': ['sum']}).reset_index()
df_data_2311["Date"] = pd.to_datetime(df_data_2311["Date"], format="%d.%m.%Y")
df_data_2311 = df_data_2311.sort_values(by=["Date"])
I recommend reviewing the pandas docs.
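An alternative is to parse the dates before grouping, so that the group keys are real timestamps and come out in chronological order. The sketch below uses a small made-up frame standing in for df_data_231 (the values are invented for illustration, and the name daily is just a placeholder):

```python
import pandas as pd

# Made-up stand-in for df_data_231; only the column names come from the question.
df_data_231 = pd.DataFrame({
    "Date": ["01.01.2016", "01.01.2017", "02.01.2016", "01.01.2016"],
    "Wind Offshore in [MW]": [1.0, 2.0, 3.0, 4.0],
})

# Parse the day.month.year strings into timestamps before grouping.
df_data_231["Date"] = pd.to_datetime(df_data_231["Date"], format="%d.%m.%Y")

# Sum per day; sort_index() makes the chronological order explicit.
daily = df_data_231.groupby("Date")["Wind Offshore in [MW]"].sum().sort_index()

# Optional: turn the index back into the original text format for display.
print(daily.rename(index=lambda ts: ts.strftime("%d.%m.%Y")))
```

Passing the aggregation as a plain string rather than a list also avoids the extra "sum" level in the column header.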

Related

Grouping by week and product and summing over IDs in pandas

I have a pandas dataframe containing amongst others the columns Product_ID, Producttype and a Timestamp. It looks roughly like this:
df
ID    Product  Time
C561  PX       2017-01-01 00:00:00
T801  PT       2017-01-01 00:00:01
I already converted the Time column into the datetime format.
Now I would like to sum up the number of different IDs per Product in a particular week.
I already tried a for loop:
for data['Time'] in range(start='1/1/2017', end='8/1/2017'):
data.groupby('Product')['ID'].sum()
But range requires an integer.
I also thought about using pd.Grouper with freq="1W" but then I don't know how to combine it with both Product and ID.
Any help is greatly appreciated!
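One way to combine pd.Grouper with the other columns is to pass it in a list together with Product and then count distinct IDs with nunique. This is a minimal sketch on made-up rows (only the first two come from the question, the rest are invented), and it assumes that "number of different IDs" means a distinct count:

```python
import pandas as pd

# First two rows are from the question; the remaining rows are invented.
df = pd.DataFrame({
    "ID": ["C561", "T801", "C561", "A100", "C561"],
    "Product": ["PX", "PT", "PX", "PX", "PX"],
    "Time": pd.to_datetime([
        "2017-01-01 00:00:00",
        "2017-01-01 00:00:01",
        "2017-01-03 12:00:00",
        "2017-01-04 08:00:00",
        "2017-01-05 09:30:00",
    ]),
})

# Group by calendar week (weeks end on Sunday by default) and by product,
# then count how many distinct IDs appear in each group.
weekly = df.groupby([pd.Grouper(key="Time", freq="W"), "Product"])["ID"].nunique()
print(weekly)
```

If every occurrence should be counted instead of distinct IDs, count() or size() can be used in place of nunique().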

For each NAME, calculate the average SNOW for each month

import pandas as pd
import numpy as np
# Show the specified columns and save it to a new file
col_list= ["STATION", "NAME", "DATE", "AWND", "SNOW"]
df = pd.read_csv('Data.csv', usecols=col_list)
df.to_csv('filteredData.csv')
df['year'] = pd.DatetimeIndex(df['DATE']).year
df2016 = df[(df.year==2016)]
df_2016 = df2016.groupby(['NAME', 'DATE'])['SNOW'].mean()
df_2016.to_csv('average2016.csv')
How come my dates are not ordered correctly here? Row 12 should be at the top, but it is at the bottom of May instead, and the same goes for row 25.
The average of SNOW per NAME/month is also not being displayed in my Excel sheet. Why is that? Basically, I'm trying to calculate the average SNOW for May in ADA 0.7 SE, MI US, then the average SNOW for June in ADA 0.7 SE, MI US, etc.
I've spent all day and this is all I have got... Any help will be appreciated. Thanks in advance.
original data
https://gofile.io/?c=1gpbyT
Please try
Data
df=pd.read_csv(r'directorywhere the data is\data.csv')
df
Working
df.dtypes  # check the data type of each column
df.columns  # list the columns
df['DATE'] = pd.to_datetime(df['DATE'])  # convert DATE from object to datetime
df.set_index(df['DATE'], inplace=True)  # set the date as the index (the DATE column is kept as well)
df['SNOW'] = df['SNOW'].fillna(0)  # assign the result back, otherwise fillna has no effect; mean() already skips NaN, so this step treats missing days as zero snowfall and changes the averages
df['SnowMean'] = df.groupby([df.index.month, df.NAME])['SNOW'].transform('mean')  # group by month and NAME, compute the mean of SNOW, and store it in a new column called 'SnowMean'
df
Checking
df.loc[:, ['DATE', 'NAME', 'SnowMean']]  # slice the relevant columns to check
I realize you have multiple years. If you want the mean per month in each year, extract the year as well and add it to the groupby keys as follows:
df['SnowMeanPerYearPerMonth']=df.groupby([df.index.month,df.index.year,df.NAME])['SNOW'].transform('mean')
df
Check again
pd.set_option('display.max_rows', 999)  # display up to 999 rows to check
df.loc[:, ['DATE', 'NAME', 'SnowMean', 'SnowMeanPerYearPerMonth']]  # slice the relevant columns to check
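If one row per NAME and month is wanted (rather than the mean repeated on every daily row, which is what transform produces), a plain groupby on a monthly period may be closer to what the question asks. A minimal sketch on invented data; only the column names NAME, DATE and SNOW and the station name come from the question:

```python
import pandas as pd

# Invented sample rows; column names follow the question.
df = pd.DataFrame({
    "NAME": ["ADA 0.7 SE, MI US"] * 4,
    "DATE": ["2016-05-01", "2016-05-02", "2016-06-01", "2016-06-02"],
    "SNOW": [0.0, 2.0, 3.0, None],
})
df["DATE"] = pd.to_datetime(df["DATE"])

# A monthly period keeps year and month together, so May 2016 and
# May 2017 would stay in separate groups.
monthly = (
    df.groupby(["NAME", df["DATE"].dt.to_period("M")])["SNOW"]
    .mean()  # NaN values are skipped by default
    .reset_index(name="SnowMean")
)
print(monthly)
```

The result has one row per station and month, sorted chronologically, and can be written out with to_csv like any other frame.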

How to get mean value of every month in such dataframe?

dataframe
time A100 A101 A102
2017/1/1 0:00
2017/1/1 1:00
2017/1/1 2:00
...
2017/12/31 23:00
I have a dataframe as shown above, which contains hourly records (24 per day) for 2017. How can I get the monthly mean of every column?
First convert your time column to datetime using pandas' to_datetime, then use groupby:
df.time=pd.to_datetime(df.time,format='%Y/%m/%d %H:%M')
GMonth=df.groupby(df.time.dt.strftime('%Y-%m')).mean()
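First make sure that the time column has been parsed as datetime; use dtypes to verify it.
The next step is then just:
df.resample("M", on="time").mean()  # older pandas accepted how='mean'; current versions chain .mean() instead, and pandas 2.2+ prefers the alias "ME" over "M"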
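For a version that does not depend on the resample alias, grouping on a monthly period gives the same kind of result. This is a small self-contained sketch on synthetic hourly data for 2017; the values in A100 and A101 are invented so that the expected means are easy to see:

```python
import pandas as pd

# Synthetic hourly timestamps covering all of 2017 (365 days * 24 hours).
idx = pd.date_range("2017-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))

# Invented values: A100 holds the month number, A101 is constant.
df = pd.DataFrame({"time": idx, "A100": idx.month.astype(float), "A101": 1.0})

# Group on the year-month period of each timestamp and average the value columns.
monthly = df.groupby(df["time"].dt.to_period("M"))[["A100", "A101"]].mean()
print(monthly)
```

The index of the result is a PeriodIndex with one entry per month, so it sorts chronologically without any string formatting.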

Interpolate a date between two other dates to get a value

I have this pandas dataframe:
ISIN MATURITY PRICE
0 AR121489 Corp 29/09/2019 5.300
1 AR714081 Corp 29/12/2019 7.500
2 AT452141 Corp 29/06/2020 2.950
3 QJ100923 Corp 29/09/2020 6.662
My question is if there exists a way to interpolate a date in the column "MATURITY" and get the price value of that date. For example, If I select the date 18/11/2019, the value of the price on that date should be between 5.300 and 7.500. I don't know if what I am asking is possible but thank you so much for taking your time to read it and trying to help me.
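If you want a daily interpolation, first convert MATURITY to datetime and use it as the index, then reindex onto a daily date range between the first and last maturity and interpolate the gaps:
old_df["MATURITY"] = pd.to_datetime(old_df["MATURITY"], format="%d/%m/%Y")
prices = old_df.set_index("MATURITY")["PRICE"]
daily = pd.date_range(start=prices.index.min(), end=prices.index.max(), freq="D")
new_df = prices.reindex(daily).interpolate(method="time").rename_axis("MATURITY").reset_index()
new_df.loc[new_df["MATURITY"] == "2019-11-18", "PRICE"]  # interpolated price for a chosen date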
I would treat the dates as datetime objects and, for the interpolation, convert each date to a numeric time value, i.e. either seconds or days since some reference date (20XX-XX-XX 00:00:00). I would do the same for the output time points. After that, the interpolation also works with NumPy's interp function.
matplotlib.dates also has date2num and num2date functions, which are worth a try.
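The day-count idea can be sketched as follows, using the four rows from the question. The reference date (the earliest maturity) and the variable names are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# The four bonds from the question.
df = pd.DataFrame({
    "ISIN": ["AR121489 Corp", "AR714081 Corp", "AT452141 Corp", "QJ100923 Corp"],
    "MATURITY": ["29/09/2019", "29/12/2019", "29/06/2020", "29/09/2020"],
    "PRICE": [5.300, 7.500, 2.950, 6.662],
})
df["MATURITY"] = pd.to_datetime(df["MATURITY"], format="%d/%m/%Y")
df = df.sort_values("MATURITY")  # np.interp needs increasing x values

# Express every maturity as a number of days since the earliest one.
origin = df["MATURITY"].min()
x = (df["MATURITY"] - origin).dt.days.to_numpy()

# Linearly interpolate the price at the requested date.
target = pd.Timestamp("2019-11-18")
price = np.interp((target - origin).days, x, df["PRICE"].to_numpy())
print(price)  # lies between 5.300 and 7.500
```

Note that np.interp does not extrapolate: for dates outside the range of maturities it returns the first or last price.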

Passing Quandl query to pandas dataframe

I'm using the Quandl database service API and its python support to download stock financial data.
Right now, I'm using the free SF0 database, which provides yearly operational financial data.
For example, this query code passes the last 6-8 years of data for stock "CRM" to the dataframe.
df=quandl.get('SF0/CRM_REVENUE_MRY')
df
Out[29]:
Value
Date
2010-01-31 1.305583e+09
2011-01-31 1.657139e+09
2012-01-31 2.266539e+09
2013-01-31 3.050195e+09
2014-01-31 4.071003e+09
2015-01-31 5.373586e+09
2016-01-31 6.667216e+09
What I want to do with this is to recursively pass it a list of about 50 stocks and also grab 6-8 other columns from this database using different query codes appended to the SF0/CRM_ part of the query.
qcolumns = ['REVUSD_MRY',
'GP_MRY',
'INVCAP_MRY',
'DEBT_MRY',
'NETINC_MRY',
'RETEARN_MRY',
'SHARESWADIL_MRY',
'SHARESWA_MRY',
'COR_MRY',
'FCF_MRY',
'DEBTUSD_MRY',
'EBITDAUSD_MRY',
'SGNA_MRY',
'NCFO_MRY',
'RND_MRY']
So, I think I need to:
a) run the query for each column and in each case append to the dataframe.
b) Add column names to the dataframe.
c) Create a dataframe for each stock (should this be a panel or a list of dataframes? Apologies, as I'm new to pandas and dataframes and am on my learning curve).
d) Write to CSV.
Could you suggest an approach or point me in the right direction?
This code works to do two queries (two columns of data, both date indexed), renames the columns, and then concatenates them.
df=quandl.get('SF0/CRM_REVENUE_MRY')
df = df.rename(columns={'Value': 'REVENUE_MRY'})
dfnext=quandl.get('SF0/CRM_NETINC_MRY')
dfnext = dfnext.rename(columns={'Value': 'CRM_NETINC_MRY'})
frames = [df, dfnext]
dfcombine = pd.concat([df, dfnext], axis=1) # now question is how to add stock tag "CRM" to frame
dfcombine
Out[39]:
REVENUE_MRY CRM_NETINC_MRY
Date
2010-01-31 1.305583e+09 80719000.0
2011-01-31 1.657139e+09 64474000.0
2012-01-31 2.266539e+09 -11572000.0
2013-01-31 3.050195e+09 -270445000.0
2014-01-31 4.071003e+09 -232175000.0
2015-01-31 5.373586e+09 -262688000.0
2016-01-31 6.667216e+09 -47426000.0
I can add recursion to this to get all the columns (there are around 15) but how do I tag each frame for each stock? Use a key? Use a 3D panel? Thanks for helping a struggling python programmer!
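One possible way to tag each frame is to build one frame per ticker in a plain loop and let pd.concat label them through its keys, which yields a (Ticker, Date) MultiIndex instead of a Panel (Panel has been removed from current pandas releases). The sketch below does not call Quandl: fetch is a hypothetical stand-in for quandl.get that returns invented numbers, and the second ticker is only an illustrative example.

```python
import pandas as pd

def fetch(code):
    """Hypothetical stand-in for quandl.get(code).

    Returns a date-indexed frame with a single 'Value' column,
    filled with invented numbers.
    """
    idx = pd.to_datetime(["2015-01-31", "2016-01-31"])
    return pd.DataFrame({"Value": [1.0, 2.0]}, index=idx).rename_axis("Date")

tickers = ["CRM", "AAPL"]                 # illustrative list of stocks
qcolumns = ["REVENUE_MRY", "NETINC_MRY"]  # subset of the query codes

per_stock = {}
for ticker in tickers:
    # One query per column; name each series after its query code.
    cols = [fetch(f"SF0/{ticker}_{q}")["Value"].rename(q) for q in qcolumns]
    per_stock[ticker] = pd.concat(cols, axis=1)

# Stack the per-stock frames; the dict keys become the outer index level.
combined = pd.concat(per_stock, names=["Ticker", "Date"])

# Serialize to CSV text (pass a file name to to_csv to write a file instead).
csv_text = combined.to_csv()
print(combined.loc["CRM"])
```

With this layout, combined.loc["CRM"] returns the frame for a single stock, and a single CSV holds all stocks with the ticker as its first column.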
