How to combine 2 columns in pandas DataFrame? - python

Hello! This is a CSV table. I am trying to combine CSV output with Python to create Gantt charts. Each column in the CSV file represents part of a datetime; for example, start1 holds the hours and start2 the minutes. I then use pd.to_datetime(data["start1"], format="%H") for the proper formatting, and the same for start2.
And here is the thing: how can I combine both of these columns in a pandas DataFrame to get one column in "%H-%M" format, like data["start"]? Here is the data.head() output and code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import timedelta
#import data
data = pd.read_csv('TEST.csv')
#convert data str to "datetime" data
data["start1"] = pd.to_datetime(data["start1"], format="%H")
data["start2"] = pd.to_datetime(data["start2"], format="%M")
data["end1"] = pd.to_datetime(data["end1"], format="%H")
data["end2"] = pd.to_datetime(data["end2"], format="%M")

Try this (applied to the raw columns, before the to_datetime conversions above):
data["start"] = pd.to_datetime(data["start1"].astype(str).str.pad(2, fillchar="0") +
                               data["start2"].astype(str).str.pad(2, fillchar="0"),
                               format="%H%M")
data["end"] = pd.to_datetime(data["end1"].astype(str).str.pad(2, fillchar="0") +
                             data["end2"].astype(str).str.pad(2, fillchar="0"),
                             format="%H%M")

Before you convert the data types to datetime, you can add an additional column like this:
data["start"] = data["start1"] + '-' + data["start2"]
data["start"] = pd.to_datetime(data["start"], format="%H-%M")
# then do the other conversions.
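As a further alternative, the hour and minute columns can be combined numerically with timedeltas, avoiding string padding altogether. A minimal sketch with made-up hour/minute values standing in for the CSV columns:

```python
import pandas as pd

# Made-up hour/minute columns standing in for the CSV's start1/start2.
data = pd.DataFrame({"start1": [9, 14], "start2": [5, 30]})

# Build the datetime arithmetically: an arbitrary base date plus
# hour and minute offsets expressed as timedeltas.
data["start"] = (pd.Timestamp("1900-01-01")
                 + pd.to_timedelta(data["start1"], unit="h")
                 + pd.to_timedelta(data["start2"], unit="m"))

print(data["start"].dt.strftime("%H-%M").tolist())  # ['09-05', '14-30']
```

The base date 1900-01-01 matches what pd.to_datetime with format="%H" produces, so the result slots into the existing Gantt-chart code.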

Related

Reset index and present data in table format

I am using the following code -
import pandas as pd
from mftool import Mftool
import pandas as pd
import os
import time
mf = Mftool()
data = mf.get_scheme_historical_nav('138564', as_Dataframe=True)
data = data.rename_axis("Date", index=False)
The above mentioned code gives me data in the following format -
Clearly, Date has been set as the index, but I want to:
1) keep the 'Date' column in my df without categorizing it as the index
2) change dd-mm-yyyy to yyyy-mm-dd
Can anybody help? Thank you!
I tried using the following, but it was not useful:
data = data.set_index(pd.to_datetime(data['Date']))
data['Date'] = pd.to_datetime(data['Date'])
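One way to get both things at once: reset the index back into a column, then parse the dd-mm-yyyy strings, since datetimes display as yyyy-mm-dd. A minimal sketch with a made-up frame standing in for the mftool result:

```python
import pandas as pd

# Made-up NAV frame standing in for the mftool output: values indexed
# by dd-mm-yyyy date strings.
data = pd.DataFrame({"nav": [10.5, 10.7]},
                    index=pd.Index(["01-02-2015", "02-02-2015"], name="Date"))

# 1) Turn the index back into an ordinary 'Date' column.
data = data.reset_index()

# 2) Parse the dd-mm-yyyy strings; datetimes print as yyyy-mm-dd.
data["Date"] = pd.to_datetime(data["Date"], format="%d-%m-%Y")

print(data["Date"].dt.strftime("%Y-%m-%d").tolist())  # ['2015-02-01', '2015-02-02']
```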

Refresh a Panda dataframe while printing using loop

How can I print a new dataframe and clear the previously printed dataframe while using a loop, so the output shows only the latest dataframe instead of all of them?
Using print(df, end="\r") doesn't work.
import pandas as pd
import numpy as np
while True:
    df = pd.DataFrame(np.random.rand(10,10))
    print(df)
If I get live data from an API to insert into the df, I'll use the while loop to constantly update the data. But how can I print only the newest dataframe instead of printing all the dataframes underneath each other in the output?
If I use the snippet below it does work, but I think there should be a more elegant solution.
import sys
import pandas as pd
import numpy as np
Height_DF = 10
Width_DF = 10
while True:
    df = pd.DataFrame(np.random.rand(10,10))
    print(df)
    for i in range(Height_DF + 1):
        sys.stdout.write("\033[F")
Try this:
import pandas as pd
import numpy as np
import time
import sys
while True:
    df = pd.DataFrame(np.random.rand(10,10))
    print(df)
    sys.stdout.write("\033[F")
    time.sleep(1)
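For what it's worth, a variant that clears the whole screen with ANSI escape codes rather than moving the cursor up line by line. This assumes an ANSI-capable terminal; in a Jupyter notebook, IPython.display.clear_output(wait=True) before each print is the usual equivalent:

```python
import sys
import time

import numpy as np
import pandas as pd

for _ in range(3):  # stand-in for `while True` with live data
    df = pd.DataFrame(np.random.rand(10, 10))
    # \033[H homes the cursor and \033[J clears to end of screen, so each
    # new frame overwrites the previous one instead of scrolling.
    sys.stdout.write("\033[H\033[J")
    print(df)
    time.sleep(0.1)
```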

How to filter for dates range in timeseries or dataframe using python

Still a newbie with Python just trying to learn this stuff. Appreciate any help.
Right now when I connect to Alpha Vantage I get the full range of data for all the dates and it looks like this
I found some good sources for guides, but I keep getting empty dataframes or errors
This is how the code looks so far
import pandas as pd
from pandas import DataFrame
import datetime
from datetime import datetime as dt
from alpha_vantage.timeseries import TimeSeries
import numpy as np
stock_ticker = 'SPY'
api_key = open('/content/drive/My Drive/Colab Notebooks/key').read()
ts = TimeSeries (key=api_key, output_format = "pandas")
data_daily, meta_data = ts.get_daily_adjusted(symbol=stock_ticker, outputsize ='full')
#data_date_changed = data[:'2019-11-29']
data = pd.DataFrame(data_daily)
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2015,month=2,day=1)]
The answer for this is
stock_ticker = 'SPY'
api_key = 'apikeyddddd'
ts = TimeSeries (key=api_key, output_format = "pandas")
data_daily, meta_data = ts.get_daily_adjusted(symbol=stock_ticker, outputsize ='full')
test = data_daily[(data_daily.index > '2014-01-01') & (data_daily.index <= '2017-08-15')]
print(data_daily)
print(test)
import datetime
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
Note that on a sorted DatetimeIndex you can also pass plain date strings to .loc, as in df.loc['2014-01-01':'2014-02-01']; pandas calls this partial string indexing, so datetime objects are not strictly required.
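A minimal sketch of string-based slicing on a DatetimeIndex, using a synthetic frame in place of the Alpha Vantage result:

```python
import numpy as np
import pandas as pd

# Synthetic daily data with a DatetimeIndex, standing in for data_daily.
idx = pd.date_range("2013-12-01", "2017-12-31", freq="D")
df = pd.DataFrame({"close": np.arange(len(idx), dtype=float)}, index=idx)

# On a sorted DatetimeIndex, .loc slices accept plain date strings,
# and both endpoints are inclusive.
window = df.loc["2014-01-01":"2015-02-01"]
print(window.index.min(), window.index.max())
```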

Count diff with datetimes and today and group it in one month frequency using Pandas

I have the following format of data in a csv:
1,2015-02-01
The format is
<internal_id>,<datetime>
I want to ignore the internal id and use only the datetime (if possible, not even read the id from the csv, to save memory).
What I want is to plot a histogram of the difference in months between the dates in the file and today, with each bar of the histogram being one month.
The process in pseudo-code is:
1) Calculate the difference in months between each row in the file and today
2) Accumulate those differences in buckets of one month
3) Plot a histogram or something similar
For now I have made this code in a jupyter notebook with python3:
from io import StringIO
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
%matplotlib notebook
text = """1,2015-01-01
1,2015-02-01
1,2015-02-01
1,2015-03-01
1,2015-03-01
1,2015-03-01
1,2015-04-01
1,2015-04-01
1,2015-04-01
1,2015-04-01"""
plt.subplots()
def diff(row_date):
    today = datetime.now()
    return (today.year - row_date.year) * 12 + (today.month - row_date.month)
df = pd.read_csv(StringIO(text), usecols=[1], header=None, names=['date'], parse_dates=['date'])
serie = df.date
serie = serie.apply(diff)
serie.hist()
Is there a more elegant way to do it using built-in function to group and calculate the difference of time using Pandas? (or faster)
Thanks!
from io import StringIO
import pandas as pd
text = """1,2015-01-18
1,2015-02-10
1,2015-02-15
1,2015-02-20
1,2015-03-01
1,2015-03-02
1,2015-03-03"""
df = pd.read_csv(StringIO(text), header=None, parse_dates=[1], names=['count', 'Date'], index_col=1)
df.groupby(pd.Grouper(freq='M')).count().hist()
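A vectorized alternative for the month difference itself, avoiding the row-by-row apply (same kind of synthetic data; value_counts then gives the bucket sizes):

```python
from io import StringIO

import pandas as pd

text = """1,2015-01-18
1,2015-02-10
1,2015-03-01"""

df = pd.read_csv(StringIO(text), header=None, names=["count", "date"],
                 parse_dates=["date"])

# Whole months between each date and today, computed column-wise
# via (year * 12 + month) arithmetic.
today = pd.Timestamp.now()
months = ((today.year * 12 + today.month)
          - (df["date"].dt.year * 12 + df["date"].dt.month))

# months.value_counts().sort_index() are the histogram buckets;
# months.hist() would plot them.
```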

pandas - Joining CSV time series into a single dataframe

I'm trying to get 4 CSV files into one dataframe. I've looked around on the web for examples and tried a few but they all give errors. Finally I think I'm onto something, but it gives unexpected results. Can anybody tell me why this doesn't work?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 24*365*4
dates = pd.date_range('20120101',periods=n,freq='h')
df = pd.DataFrame(np.random.randn(n,1),index=dates,columns=list('R'))
#df = pd.DataFrame(index=dates)
paths = ['./LAM DIV/10118218_JAN_LAM_DIV_1.csv',
         './LAM DIV/10118218_JAN-APR_LAM_DIV_1.csv',
         './LAM DIV/10118250_JAN_LAM_DIV_2.csv',
         './LAM DIV/10118250_JAN-APR_LAM_DIV_2.csv']
for i in range(len(paths)):
    data = pd.read_csv(paths[i], index_col=0, header=0, parse_dates=True)
    df.join(data['TempC'])
df.head()
Expected result:
Date Time R 0 1 2 3
Getting this:
Date Time R
You need to save the result of your join:
df = df.join(data['TempC'])
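For completeness, the series can also be collected and joined in a single step with pd.concat, which numbers the columns the way the expected output shows. Sketched here with in-memory frames standing in for the parsed CSV files:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-01-01", periods=5, freq="h")
df = pd.DataFrame(np.random.randn(5, 1), index=dates, columns=list("R"))

# In-memory frames standing in for the parsed CSV files.
frames = [pd.DataFrame({"TempC": np.arange(5.0)}, index=dates),
          pd.DataFrame({"TempC": np.arange(5.0) * 2}, index=dates)]

# Collect every TempC series into one frame (columns 0, 1, ...)
# and join it onto df in a single call, saving the result.
combined = df.join(pd.concat({i: f["TempC"] for i, f in enumerate(frames)},
                             axis=1))
print(list(combined.columns))  # ['R', 0, 1]
```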
