How can I sort timestamp from following data dictionary?

How can I sort timestamp from following data dictionary? - python

Code:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
data=pd.to_datetime(data_[prices],unit='ms')
print(data)
Output:
Requirement:
But I required output in which 4 columns:
Timestamp, Prices, Market_caps, Total_volume
And I want to change the timestamp format into to_datetime
In the above codes, I just sort the bitcoin data from pycoingecko
Example:

You can convert this into a dataframe format like this:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
prices = pd.DataFrame(bdata['prices'], columns=['TimeStamp', 'Price']).set_index('TimeStamp')
market_caps = pd.DataFrame(bdata['market_caps'], columns=['TimeStamp', 'Market Cap']).set_index('TimeStamp')
total_volumes = pd.DataFrame(bdata['total_volumes'], columns=['TimeStamp', 'Total Volumes']).set_index('TimeStamp')
# combine the separate dataframes
df_market = pd.concat([prices, market_caps, total_volumes], axis=1)
# convert the index to a datetime dtype
df_market.index = pd.to_datetime(df_market.index, unit='ms')
Code adapted from this answer.

You can extract the timestamp column and convert it into date as following with minimum change to your code, you can follow up by merging the new column to your array:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
#data=pd.to_datetime(data_["prices"],unit='ms')
df = pd.DataFrame([pd.Series(x) for x in data_["prices"]])
df.columns = ["timestamp","data"]
df=pd.to_datetime(df["timestamp"],unit='ms')
print(df)

Related

filter out observation of a column which start with values of a list

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'code': ['52511', '52512', '12525', '13333']})
and the following list:
list = ['525', '13333']
I want to consider only the observations of df that start witht the element of list.
Desired output:
import pandas as pd
df = pd.DataFrame({'code': ['52511', '52512', '13333']})

The startswith function supports tuple type. You can convert list to tuple.
listt = ['525', '13333']
df=df[df['code'].str.startswith(tuple(listt))]
df
'''
code
0 52511
1 52512
3 13333
'''

how to sum of columns based on another column value of excel

I would like to ask how to sum using python or excel.
Like to do summation of "number" columns based on "time" column.
Sum of the Duration for (00:00 am - 00:59 am) is (2+4) 6.
Sum of the Duration for (02:00 am - 02:59 am) is (3+1) 4.
Could you please advise how to ?

When you have a dataframe you can use groupby to accomplish this:
# import pandas module
import pandas as pd
# Create a dictionary with the values
data = {
'time' : ["12:20:51", "12:40:51", "2:26:35", "2:37:35"],
'number' : [2, 4, 3, 1]}
# create a Pandas dataframe
df = pd.DataFrame(data)
# or load the CSV
df = pd.read_csv('path/dir/filename.csv')
# Convert time column to datetime data type
df['time'] = df['time'].apply(pd.to_datetime, format='%H:%M:%S')
# add values by hour
dff = df.groupby(df['time'].dt.hour)['number'].sum()
print(dff.head(50))
output:
time
12 6
2 4
When you need more than one column. You can pass the columns as a list inside .groupby(). The code will look like this:
import pandas as pd
df = pd.read_csv('filename.csv')
# Convert time column to datetime data type
df['time'] = df['time'].apply(pd.to_datetime, format='%H:%M:%S')
df['date'] = df['date'].apply(pd.to_datetime, format='%d/%m/%Y')
# add values by hour
dff = df.groupby([df['date'], df['time'].dt.hour])['number'].sum()
print(dff.head(50))
# save the file
dff.to_csv("filename.csv")

Filter particular date in a DF column

I want to filter particular date in a DF column.
My code:
df
df["Crawl Date"]=pd.to_datetime(df["Crawl Date"]).dt.date
date=pd.to_datetime("03-21-2020")
df=df[df["Crawl Date"]==date]
It is showing no match.
Note: df column is having time also with date which need to be trimmed.
Thanks in advance.

The following script assumes that the 'Crawl Dates' column contains strings:
import pandas as pd
import datetime
column_names = ["Crawl Date"]
df = pd.DataFrame(columns = column_names)
#Populate dataframe with dates
df.loc[0] = ['03-21-2020 23:45:57']
df.loc[1] = ['03-22-2020 23:12:33']
df["Crawl Date"]=pd.to_datetime(df["Crawl Date"]).dt.date
date=pd.to_datetime("03-21-2020")
df=df[df["Crawl Date"]==date]
Then df returns:
Crawl Date 0 2020-03-21

Element-wise maximum with date values

I have a dataframe with date values and would like to manipulate them to 1 Jan or later. Since I need to do this element-wise, I use np.maximum(). The code below however gives
TypeError: Cannot compare type 'Timestamp' with type 'int'.
What's the appropriate method to deal with this kind of data type?
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': np.arange('1999-12', '2000-02', dtype='datetime64[D]')})
df['corrected_date'] = np.maximum(pd.to_datetime('20000101', format='%Y%m%d'), df['date'])

For me working comparing with Series:
s = pd.Series(pd.to_datetime('20000101', format='%Y%m%d'), index=df.index)
df['corrected_date'] = np.maximum(s, df['date'])
Or with DatetimeIndex:
i = np.repeat(pd.to_datetime(['20000101'], format='%Y%m%d'), len(df))
df['corrected_date'] = np.maximum(i, df['date'])

Why can't I search for a row in a pandas df using a date as part of a tuple index?

I am trying to search a pandas df I made which has a tuple as an index. The first part of the tuple is a date and the second part is a forex pair. I've tried a few things but I can't seem to search using a date-formatted string as part of a tuple with .loc or .ix
My df looks like this:
Open Close
(11-01-2018, AEDAUD) 0.3470 0.3448
(11-01-2018, AEDCAD) 0.3415 0.3408
(11-01-2018, AEDCHF) 0.2663 0.2656
(11-01-2018, AEDDKK) 1.6955 1.6838
(11-01-2018, AEDEUR) 0.2277 0.2261
Here is the complete code :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
forex_11 = pd.read_csv('FOREX_20180111.csv', sep=',', parse_dates=['Date'])
forex_12 = pd.read_csv('FOREX_20180112.csv', sep=',', parse_dates=['Date'])
time_format = '%d-%m-%Y'
forex = forex_11.append(forex_12, ignore_index=False)
forex['Date'] = forex['Date'].dt.strftime(time_format)
GBP = forex[forex['Symbol'] == "GBPUSD"]
forex.index = list(forex[['Date', 'Symbol']].itertuples(index=False, name=None))
forex_open_close = pd.DataFrame(np.array(forex[['Open','Close']]), index=forex.index)
forex_open_close.columns = ['Open', 'Close']
print(forex_open_close.head())
print(forex_open_close.ix[('11-01-2018', 'GBPUSD')])
How do I get the row which has index ('11-01-2018', 'GBPUSD') ?

Can you try putting the tuple in a list using brackets?
Like this:
print(forex_open_close.ix[[('11-01-2018', 'GBPUSD')]])

I would recommend using the Pandas multiIndex. In your case you could do the following:
tuples = list(data[['Date', 'Symbol']].itertuples(index=False, name=None))
data.index = pd.MultiIndex.from_tuples(tuples, names=['Date', 'Symbol'])
# And then to index
data.loc['2018-01-11', 'AEDCAD']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How can I sort timestamp from following data dictionary? - python

Related

filter out observation of a column which start with values of a list

how to sum of columns based on another column value of excel

Filter particular date in a DF column

Element-wise maximum with date values

Why can't I search for a row in a pandas df using a date as part of a tuple index?

Categories

Resources