I am using investpy to get historical stock data for two stocks (TRP_pb, TRP_pc):
import investpy
import pandas as pd
import numpy as np
TRP_pb = investpy.get_stock_historical_data(stock='TRP_pb',
                                            country='canada',
                                            from_date='01/01/2022',
                                            to_date='01/04/2022')
print(TRP_pb.head())
TRP_pc = investpy.get_stock_historical_data(stock='TRP_pc',
                                            country='canada',
                                            from_date='01/01/2022',
                                            to_date='01/04/2022')
print(TRP_pc.head())
I can append the two tables by using the append method:
appendedtable = TRP_pb.append(TRP_pc, ignore_index=False)
What I am trying to do is use a loop to combine these two tables.
Here is what I have tried so far
preferredlist = ['TRP_pb','TRP_pc']
for i in preferredlist:
    new = investpy.get_stock_historical_data(stock=i,
                                             country='canada',
                                             from_date='01/01/2022',
                                             to_date='01/04/2022')
    new.append(new, ignore_index=True)
However, this doesn't work.
I would appreciate any help.
Since get_stock_historical_data returns a DataFrame, you can create an empty DataFrame before the for loop and concat inside it. (Your version fails because new.append(new, ignore_index=True) appends the frame to itself and discards the result: append returns a new DataFrame rather than modifying new in place, and new is overwritten on every iteration anyway.)
preferredlist = ['TRP_pb','TRP_pc']
final_list = pd.DataFrame()
for i in preferredlist:
    new = investpy.get_stock_historical_data(stock=i,
                                             country='canada',
                                             from_date='01/01/2022',
                                             to_date='01/04/2022')
    final_list = pd.concat([final_list, new])
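A variant worth considering (a sketch using the same calls): collect the frames in a list and concatenate once after the loop, which avoids copying the growing DataFrame on every iteration.
frames = []
for i in preferredlist:
    frames.append(investpy.get_stock_historical_data(stock=i,
                                                     country='canada',
                                                     from_date='01/01/2022',
                                                     to_date='01/04/2022'))
# a single concat at the end instead of one per iteration
combined = pd.concat(frames)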
I have a dataframe
import pandas as pd
df = pd.DataFrame({'product':['shoe','shirt','pants','socks'],
'review_rating':[1.2,3.0,4.0,2.1],
'review_text':['good','bad','good','bad']})
good_reviews = []
print(df)
I want to be able to append my review_text values to the list using a conditional statement.
I tried this:
for column in df[['reviews.rating', 'reviews.text']]:
    if df[df['reviews.rating']] <= 2.0:
        good_reviews.append(df['reviews.text'])
After trying that I got an error:
KeyError: None of [Index(['reviews.rating', 'reviews.text'], dtype='object')] are in the [columns]
The KeyError happens because the columns are named review_rating and review_text, not reviews.rating and reviews.text. With the correct names you can select the matching rows in one step with .loc:
import pandas as pd
df = pd.DataFrame({'product':['shoe','shirt','pants','socks'],
'review_rating':[1.2,3.0,4.0,2.1],
'review_text':['good','bad','good','bad']})
good_reviews = df.loc[df["review_rating"] <= 2.0, 'review_text']
print(good_reviews)
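Note that good_reviews is now a pandas Series; if you specifically want a plain Python list, as in your original code, Series.tolist() converts it:
good_reviews = df.loc[df["review_rating"] <= 2.0, 'review_text'].tolist()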
import pandas as pd
nba = pd.read_csv("nba.csv")
names = pd.Series(nba['Name'])
data = nba['Salary']
nba_series = (data, index=[names])
print(nba_series)
Hello, I am trying to convert the columns 'Name' and 'Salary' from a dataframe into a series. I need to set the names as the index and the salaries as the values, but I cannot figure it out. This is my best attempt so far; any guidance is appreciated.
I think you are over-thinking this. Simply construct it with pd.Series(). Note the data needs to be passed with .values, otherwise you'll get NaNs: the constructor aligns the Salary Series' existing integer index against the new Name index, and nothing matches.
import pandas as pd
nba = pd.read_csv("nba.csv")
nba_series = pd.Series(data=nba['Salary'].values, index=nba['Name'])
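To see why .values matters, here is a minimal illustration with made-up data (not nba.csv):
import pandas as pd
nba = pd.DataFrame({'Name': ['Ann', 'Bob'], 'Salary': [100, 200]})
# without .values, pandas reindexes Salary (integer index 0, 1) by the
# names, finds no matches, and every value becomes NaN
bad = pd.Series(data=nba['Salary'], index=nba['Name'])
# with .values the raw numbers are used positionally, as intended
good = pd.Series(data=nba['Salary'].values, index=nba['Name'])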
Maybe try set_index?
nba.set_index('Name', inplace=True)
nba_series = nba['Salary']
This might help you
import pandas as pd
nba = pd.read_csv("nba.csv")
names = nba['Name']
#It's automatically a series
data = nba['Salary']
#Set names as index of series
data.index = names
Whether the names make a good index depends on the data (duplicate names will give a non-unique index).
I am new to Python so please bear with me.
I am trying to convert what I think may be a nested dictionary into a csv that I can export. Below is my code:
import pandas as pd
import os
from fbprophet import Prophet
# Read in File
df1 = pd.read_csv('File_Path.csv')
#Create Loop to Forecast Multiple SKUs
def get_prediction(df):
    prediction = {}
    df1 = df.rename(columns={'Date': 'ds', 'qty_ordered': 'y', 'item_no': 'item'})
    list_items = df1.item.unique()
    for item in list_items:
        item_df = df1.loc[df1['item'] == item]
        # set the uncertainty interval to 95% (the Prophet default is 80%)
        my_model = Prophet(yearly_seasonality=True, seasonality_prior_scale=1.0)
        my_model.fit(item_df)
        future_dates = my_model.make_future_dataframe(periods=12, freq='M')
        forecast = my_model.predict(future_dates)
        prediction[item] = forecast
    return prediction
# Save predictions to dictionary
df2 = get_prediction(df1)
# Convert dictionary
df3 = pd.DataFrame.from_dict(df2, orient='columns')
So the last part of the code is where I am struggling: I need to convert the df2 dictionary to a dataframe (df3) so I can export it to a csv, but it looks as if it is a nested dictionary. Not sure if I need to update my function or not.
This is what a snippet of the dictionary looks like
I need to export it so it will look like this
Any help would be greatly appreciated!
The following code should help flatten df2 (a dictionary of dataframes, if I understand correctly).
def flatten(dict_of_df):
    # insert column 'item'
    for key, value in dict_of_df.items():
        value['item'] = key
    # return vertically concatenated dataframe with all the items
    return pd.concat(dict_of_df.values())
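Usage would then be something like this (the csv filename is just a placeholder):
df3 = flatten(df2)
# Prophet's predict output carries a plain integer index, so drop it on export
df3.to_csv('forecasts.csv', index=False)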
I pull the data from the census API using the census wrapper, and I would like to filter that data with a list of zips I compiled.
So I am trying to filter the pulled census data. I have a csv file of the zips I want to use, and I have already put them into a list. I have tried a few things, such as putting the census data in a dataframe and trying to filter the zipcode column by my list, but I don't think my syntax is correct.
This is just the test data I pulled:
census_data = c.acs5.get(('NAME', 'B25034_010E'),
                         {'for': 'zip code tabulation area:*'})
census_pd = census_pd.rename(columns={"NAME": "Name", "zip code tabulation area": "Zipcode"})
censusfilter = census_pd['Zipcode'==ziplst]
So I tried this way, and I also tried a for loop where I take census_pd['Zipcode'] with an inner for loop to iterate over the list, using an if statement like zip1 == zip2 and appending matches to a list.
My dependencies:
# Dependencies
import pandas as pd
import requests
import json
import pprint
import numpy as np
import matplotlib.pyplot as plt
from census import Census
import gmaps
from us import states
# Census & gmaps API Keys
from config import (api_key, gkey)
c = Census(api_key, year=2013)
# Configure gmaps
gmaps.configure(api_key=gkey)
As mentioned, I want to filter whatever data I pull from the census down to the specific zip codes I use.
It's not clear what your data looks like. I am guessing that you have a scalar column and you want to filter that column using a list. If that is the question, then you can use the built-in isin method to filter the dataframe.
import pandas as pd
data = {'col': [2, 3, 4], 'col2': [1, 2, 3], 'col3': ["asd", "ads", "asdf"]}
df = pd.DataFrame.from_dict(data)
random_list = ["asd", "ads"]
df_filtered = df[df["col3"].isin(random_list)]
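Applied to the names from your question (assuming census_pd holds the census pull as a dataframe and ziplst is your list of zip code strings), that would be:
census_filtered = census_pd[census_pd["Zipcode"].isin(ziplst)]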
The sample data isn't very clear, so below is how to filter a dataframe on a column using a list of values to filter by
import pandas as pd
from io import StringIO
# Example data
df = pd.read_csv(StringIO(
'''zip,some_column
"01234",A1
"01234",A2
"01235",A3
"01236",B1
'''), dtype = {"zip": str})
zips_list = ["01234", "01235"]
# using a join
zips_df = pd.DataFrame({"zip": zips_list})
df1 = df.merge(zips_df, how='inner', on='zip')
print(df1)
# using query
df2 = df.query('zip in @zips_list')
print(df2)
# using an index
df.set_index("zip", inplace=True)
df3 = df.loc[zips_list]
print(df3)
Output in all cases (for df3 the zip column becomes the index):
zip some_column
0 01234 A1
1 01234 A2
2 01235 A3
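One related pitfall, an aside not from the question itself: zip codes that pass through an integer dtype lose their leading zeros (1234 instead of "01234"), so before using any of the three approaches above, normalize both the column and the list to 5-character strings:
# hypothetical cleanup step; assumes the zips may arrive as ints
zips_list = [str(z).zfill(5) for z in zips_list]
df["zip"] = df["zip"].astype(str).str.zfill(5)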
I wrote a function which depends only on a dataframe, and its output is also a dataframe. I would like to make different dataframes according to a condition and save them as different datasets with different names. However, I couldn't save them as dataframes with different names, so instead I do the process manually. Is there code which would do the same? It would be much appreciated.
import os
import numpy as np
import pandas as pd
data1 = pd.read_csv('C:/Users/Oz/Desktop/vintage/vintage1.csv', encoding='latin-1')
product_list= data1['product_types'].unique()
def vintage_table(df):
    df['Disbursement_Date'] = pd.to_datetime(df.Disbursement_Date)
    df['Closing_Date'] = pd.to_datetime(df.Closing_Date)
    df['NPL_date'] = pd.to_datetime(df.NPL_date, errors='ignore')
    df['NPL_date_period'] = df.loc[df.NPL_date > '2015-01-01', 'NPL_date'].apply(lambda x: x.strftime('%Y-%m'))
    df['Dis_date_period'] = df.Disbursement_Date.apply(lambda x: x.strftime('%Y-%m'))
    df['diff'] = ((df.NPL_date - df.Disbursement_Date) / np.timedelta64(3, 'M')).round(0)
    df = df.groupby(['Dis_date_period', 'NPL_date_period']).agg({'Dis_amount': 'sum', 'NPL_amount': 'sum', 'diff': 'mean'})
    df.reset_index(level=0, inplace=True)
    df['Vintage_Ratio'] = df['NPL_amount'] / df['Dis_amount']
    table = pd.pivot_table(df, values='Vintage_Ratio', index='Dis_date_period', columns=['diff']).fillna(0)
    return
The above is the function
#for e in product_list:
# sub = data1[data1['product_types'] == e]
# print(sub)
consumer = data1[data1['product_types'] == product_list[0]]
mortgage = data1[data1['product_types'] == product_list[1]]
vehicle = data1[data1['product_types'] == product_list[2]]
table_con = vintage_table(consumer)
table_mor = vintage_table(mortgage)
table_veh = vintage_table(vehicle)
I would like to improve this part; is there a better way to do the same process?
You could have your vintage_table() function actually return the dataframe it builds (return table instead of the bare return) rather than just modifying one dataframe over and over; that way the second code block works as written:
table_con = vintage_table(consumer)
table_mor = vintage_table(mortgage)
table_veh = vintage_table(vehicle)
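Since the goal is to replace the three manual assignments with a loop, one option (a sketch, assuming vintage_table is changed to end with return table) is to collect the results in a dict keyed by product type:
tables = {}
for product in product_list:
    subset = data1[data1['product_types'] == product]
    tables[product] = vintage_table(subset)
# look up a single result by its product type, e.g. tables[product_list[0]]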