I have a list of dataframes, each created from a unique web query:
bngimp = parse_forecast_data(get_json('419524'), None)
belimp = parse_forecast_data(get_json('419525'), None)
braimp = parse_forecast_data(get_json('419635'), None)
chilimp = parse_forecast_data(get_json('419526'), None)
chinimp = parse_forecast_data(get_json('419527'), None)
domimp = parse_forecast_data(get_json('419633'), None)
fraimp = parse_forecast_data(get_json('419636'), None)
greimp = parse_forecast_data(get_json('419528'), None)
ghaimp = parse_forecast_data(get_json('419638'), None)
indimp = parse_forecast_data(get_json('419530'), None)
indoimp = parse_forecast_data(get_json('419639'), None)
itaimp = parse_forecast_data(get_json('419533'), None)
japimp = parse_forecast_data(get_json('419534'), None)
kuwimp = parse_forecast_data(get_json('419640'), None)
litimp = parse_forecast_data(get_json('419641'), None)
meximp = parse_forecast_data(get_json('419537'), None)
I need to format each dataframe in the same way, as follows:
bngimp = bngimp[['From Date','Sales Volume']]
bngimp = bngimp.set_index('From Date')
bngimp.index = pd.to_datetime(bngimp.index)
bngimp = bngimp.groupby(by=[bngimp.index.year, bngimp.index.month]).sum()
bngimp.columns = ['bngimp']
Is there any way I could loop through the dataframe names without having to copy and paste each one into the code above?
There will be many more dataframes, so the copying and pasting is quite time consuming!
Any help is much appreciated.
I suggest creating a dictionary mapping the query numbers to DataFrame names, and then building a dictionary of DataFrames called out:
d = {'419524': 'bngimp', '419525': 'belimp', ...}
out = {}
for k, v in d.items():
    df = parse_forecast_data(get_json(k), None)
    df = df[['From Date','Sales Volume']]
    df = df.set_index('From Date')
    df.index = pd.to_datetime(df.index)
    df = df.groupby(by=[df.index.year, df.index.month]).sum()
    df.columns = [v]
    out[v] = df
Then, to get a DataFrame, select it by key:
print (out['bngimp'])
Also, if you want to create one big DataFrame, it is possible to use:
df = pd.concat(out, axis=1)
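Since out is a dict keyed by name, pd.concat(out, axis=1) produces MultiIndex columns with those keys as the outer level; because each frame already carries its name as its only column, you may want to flatten them. A minimal sketch, assuming the out dict built above:
df = pd.concat(out, axis=1)
# dict keys become the outer column level, e.g. ('bngimp', 'bngimp'),
# so drop the redundant outer level for flat columns
df.columns = df.columns.droplevel(0)
print(df.head())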
How can I remove the UTC portion of the index of a DataFrame created from yfinance? Every example and approach I have seen has failed, e.g.:
df = yf.download('2022-01-01', '2023-01-06', interval = '60m' )
pd.to_datetime(df['Datetime'])
error:
3806   # If we have a listlike key, _check_indexing_error will raise
KeyError: 'Datetime'
As well as the following approaches:
df = df.reset_index()
df = pd.DataFrame(df, columns = ['Datetime', "Close"])
df.rename(columns = {'Date': 'ds'}, inplace = True)
df.rename(columns = {'Close':'y'}, inplace = True)
#df['ds'] = df['ds'].dt.date
#df['ds'] = datetime.fromtimestamp(df['ds'], tz = None)
#df['ds'] = df['ds'].dt.floor("Min")
#df['ds'] = pd.to_datetime(df['ds'].dt.tz_convert(None))
#df['ds'] = pd.to_datetime['ds']
#pd.to_datetime(df['ds'])
df['ds'].dt.tz_localize(None)
print(df)
with similar errors. Any help or pointers would be greatly appreciated; I have spent the entire morning on this.
Thanks in advance
BTT
Your code interprets '2022-01-01' as the first and required argument, tickers.
That date is not a valid ticker, so yf.download() does not return any price or volume data.
Try:
df = yf.download(tickers='AAPL', start='2022-01-01', end='2023-01-06', interval = '60m' )
df.index = df.index.tz_localize(None)
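If the goal is the Prophet-style ds/y columns from the original attempts, a minimal sketch building on this (assuming a single ticker, with 'AAPL' as a stand-in; the column layout can vary between yfinance versions):
import yfinance as yf

df = yf.download(tickers='AAPL', start='2022-01-01', end='2023-01-06', interval='60m')
df.index = df.index.tz_localize(None)  # strip the UTC offset from the DatetimeIndex

# move the index into a column and rename to the ds/y pair
df = df.reset_index()  # intraday data exposes a 'Datetime' column
df = df.rename(columns={'Datetime': 'ds', 'Close': 'y'})[['ds', 'y']]
print(df.head())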
I'm fetching data from a Google sheet:
values1 = pd.DataFrame(values)
aux = values1.head(1)
values1.drop(index={0}, inplace=True)
senal1 = (values1[2] == "SEÑAL")
senal = values1[senal1]
senal.dropna(axis=1, inplace=True)
print(senal)
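As a side note, dropna(axis=1, inplace=True) on a boolean-filtered slice can raise a SettingWithCopyWarning; a minimal sketch of the same filtering without inplace (assuming values holds the rows fetched from the sheet):
import pandas as pd

values1 = pd.DataFrame(values)        # raw sheet rows
aux = values1.head(1)                 # header row kept aside
values1 = values1.drop(index=0)       # remove the header row from the data

mask = values1[2] == "SEÑAL"          # rows whose third column is "SEÑAL"
senal = values1[mask].dropna(axis=1)  # filter, then drop NaN columns, on a copy
print(senal)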
I have this data entry:
[{'id': 2269396, 'from': 1647086100, 'at': 1647086160000000000, 'to': 1647086160, 'open': 1.072652, 'close': 1.072691, 'min': 1.072641, 'max': 1.072701, 'volume': 0},..]
Applying this pandas indexing:
current = self.getAllCandles(self.active_id, start_candle)
main = pd.DataFrame()
useful_frame = pd.DataFrame()
for candle in current:
    useful_frame = pd.DataFrame(list(candle.values()), index=list(candle.keys())).T.drop(columns=['at'])
    useful_frame = useful_frame.set_index(useful_frame['from']).drop(columns=['id'])
    main = main.append(useful_frame)
main.drop_duplicates()
final_data = main.drop(columns={'to'})
final_data = final_data.loc[~final_data.index.duplicated(keep='first')]
return final_data
After that I have the following result:
from open close min max volume
from
1.647086e+09 1.647086e+09 1.072652 1.072691 1.072641 1.072701 0.0
... ... ... ... ... ... ...
Since df.append() is deprecated, I'm struggling to execute the same instructions using pd.concat(), but I'm not getting it. How could I change that?
Thank you all. I made a small modification to the code suggested by Stuart Berg, and it worked perfectly:
current = self.getAllCandles(self.active_id, start_candle)
frames = []
useful_frame = pd.DataFrame.from_dict(current, orient='columns')
useful_frame = useful_frame.set_index('from')
useful_frame = useful_frame.drop(columns=['at', 'id'])
frames.append(useful_frame)
main = pd.concat(frames).drop_duplicates()
final_data = main.drop(columns='to')
final_data = final_data.loc[~final_data.index.duplicated()]
return final_data
I think this is what you're looking for:
current = self.getAllCandles(self.active_id, start_candle)
frames = []
for candle in current:
    useful_frame = pd.DataFrame.from_dict(candle, orient='columns')
    #useful_frame['from'] = datetime.datetime.fromtimestamp(int(useful_frame['from'])).strftime('%Y-%m-%d %H:%M:%S')
    useful_frame = useful_frame.set_index('from')
    useful_frame = useful_frame.drop(columns=['at', 'id'])
    frames.append(useful_frame)
main = pd.concat(frames).drop_duplicates()
final_data = main.drop(columns='to')
final_data = final_data.loc[~final_data.index.duplicated()]
Create an empty Python list and append all the frames to it; finally, call pandas' concat on that list, which will give you the dataframe.
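In generic terms, the pattern looks like this minimal sketch (the records list is made-up stand-in data):
import pandas as pd

records = [{'from': 1, 'open': 1.07}, {'from': 2, 'open': 1.08}]

frames = []                                   # start with an empty list
for rec in records:
    frames.append(pd.DataFrame([rec]))        # one small frame per record

result = pd.concat(frames).set_index('from')  # a single concat at the end, no repeated append
print(result)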
How can I simplify this function I am trying to create? I would like to pull data from a CSV, turn it into a DataFrame, randomly select a choice, and add that choice to a corresponding dictionary key-value pair.
def generate_traits():
    import pandas as pd
    df_bonds = pd.read_csv('/file/location_1')
    df_alignments = pd.read_csv('/file/location_2')
    df_faiths = pd.read_csv('/file/location_3')
    df_flaws = pd.read_csv('/file/location_4')
    df_ideals = pd.read_csv('/file/location_5')
    df_lifestyles = pd.read_csv('/file/location_6')
    df_organizations = pd.read_csv('/file/location_7')
    df_personalities = pd.read_csv('/file/location_8')
    df_names = pd.read_csv('/file/location_9')
    random_bond = df_bonds.sample(1)
    random_alignment = df_alignments.sample(1)
    random_faith = df_faiths.sample(1)
    random_flaw = df_flaws.sample(1)
    random_ideal = df_ideals.sample(1)
    random_lifestyle = df_lifestyles.sample(1)
    random_organization = df_organizations.sample(1)
    random_personality = df_personalities.sample(1)
    random_name = df_names.sample(1)
    traits_dict = {"Name:": random_name.iloc[0,0],
                   "Alignment:": random_alignment.iloc[0,0],
                   "Bond:": random_bond.iloc[0,0],
                   "Religion:": random_faith.iloc[0,0],
                   "Flaw:": random_flaw.iloc[0,0],
                   "Ideal:": random_ideal.iloc[0,0],
                   "Lifestyle:": random_lifestyle.iloc[0,0],
                   "Organization:": random_organization.iloc[0,0],
                   "Personality:": random_personality.iloc[0,0]}
    return traits_dict
The function does behave as expected; however, I know there must be a way to loop through this, I just have not found one.
You can chain your operations:
import pandas as pd

def generate_traits():
    return {'Name': pd.read_csv('/file/location_9').sample(1).iloc[0,0],
            'Alignment': pd.read_csv('/file/location_2').sample(1).iloc[0,0],
            'Bond': pd.read_csv('/file/location_1').sample(1).iloc[0,0],
            'Religion': pd.read_csv('/file/location_3').sample(1).iloc[0,0],
            'Flaw': pd.read_csv('/file/location_4').sample(1).iloc[0,0],
            'Ideal': pd.read_csv('/file/location_5').sample(1).iloc[0,0],
            'Lifestyle': pd.read_csv('/file/location_6').sample(1).iloc[0,0],
            'Organization': pd.read_csv('/file/location_7').sample(1).iloc[0,0],
            'Personality': pd.read_csv('/file/location_8').sample(1).iloc[0,0]}
Alternatively, map names to file locations and build the result with dict comprehensions:
def generate_traits():
    import pandas as pd
    name_location = {'Bond': 'location_1',
                     'Alignment': 'location_2',
                     'Religion': 'location_3',
                     'Flaw': 'location_4',
                     'Ideal': 'location_5',
                     'Lifestyle': 'location_6',
                     'Organization': 'location_7',
                     'Personality': 'location_8',
                     'Name': 'location_9'}
    all_df = {name: pd.read_csv(f'/file/{loc}') for name, loc in name_location.items()}
    traits_dict = {name: df.sample(1).iloc[0, 0] for name, df in all_df.items()}
    return traits_dict
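Usage is the same either way; a quick sketch, assuming the CSV paths exist:
traits = generate_traits()
for key, value in traits.items():
    print(key, value)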
I have to pass locations to an API to retrieve values.
Working Code
dfs = []
locations = ['ZRH','SIN']
for loc in locations:
    response = requests.get(f'https://risk.dev.tyche.eu-central-1.aws.int.kn/il/risk/location/{loc}', headers=headers, verify=False)
    data = json.loads(response.text)
    if 'items' in data:
        df = pd.json_normalize(data, 'items', 'totalItems')
        df1 = pd.concat([pd.DataFrame(x) for x in df.pop('relatedEntities')], keys=df.index).add_prefix('relatedEntities.')
        df3 = df.join(df1.reset_index(level=1, drop=True))
        dfs.append(df3)
df = pd.concat(dfs, ignore_index=True)
Failing Code (while passing the location as a parameter)
When I try to pass the location as a parameter built from another dataframe column, it fails.
Unique_Location = data['LOCATION'].unique()
Unique_Location = pd.DataFrame(list(zip(Unique_Location)), columns=['Unique_Location'])
t = ','.join(map(repr, Unique_Location['Unique_Location']))
locations = [t]
for loc in locations:
    response = requests.get(f'https://risk.dev.logindex.com/il/risk/location/{loc}', headers=headers)
    data = json.loads(response.text)
    df = pd.json_normalize(data, 'items', 'totalItems')
What is wrong in my code?
Error
c:\users\ashok.eapen\pycharmprojects\rs-components\venv\lib\site-packages\pandas\io\json\_normalize.py in _pull_records(js, spec)
246 if has non iterable value.
247 """
--> 248 result = _pull_field(js, spec)
249
250 # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not
c:\users\ashok.eapen\pycharmprojects\rs-components\venv\lib\site-packages\pandas\io\json\_normalize.py in _pull_field(js, spec)
237 result = result[field]
238 else:
--> 239 result = result[spec]
240 return result
241
KeyError: 'items'
You can test whether 'items' exists in the JSON like this:
dfs = []
locations = ['NZAKL', 'NZ23-USBCH', 'DEBAD', 'ARBUE', 'AR02_GSTI', 'AEJEA', 'UYMVD', 'UY03', 'AE01_GSTI', 'TH02_GSTI', 'JO01_GSTI', 'ITSIM', 'GB75_GSTI', 'DEAMA', 'DE273_GSTI', 'ITPRO', 'AT07_GSTI', 'FR05', 'FRHAU', 'FR01_GSTI', 'FRHER', 'ES70X-FRLBM', 'THNEO']
for loc in locations:
    response = requests.get(f'https://risk.dev.logindex.com/il/risk/location/{loc}', headers=headers)
    data = json.loads(response.text)
    if 'items' in data:
        if len(data['items']) > 0:
            df = pd.json_normalize(data, 'items', 'totalItems')
            # NaN in these columns would fail, so replace NaN with an empty list
            f = lambda x: x if isinstance(x, list) else []
            df['raw.identifiers'] = df['raw.identifiers'].apply(f)
            df['raw.relationships'] = df['raw.relationships'].apply(f)
            df1 = pd.concat([pd.DataFrame(x) for x in df.pop('raw.identifiers')], keys=df.index).add_prefix('raw.identifiers.')
            df2 = pd.concat([pd.DataFrame(x) for x in df.pop('raw.relationships')], keys=df.index).add_prefix('raw.relationships.')
            df3 = df.join(df1.join(df2).reset_index(level=1, drop=True))
            dfs.append(df3)
df = pd.concat(dfs, ignore_index=True)
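As for why the failing version never finds 'items': joining the unique values with repr builds a single string like "'ZRH','SIN'", so the loop issues one request with quotes and commas embedded in the URL. A minimal sketch of building a plain list instead (an assumption based on the code shown, reusing data['LOCATION'] from the question):
locations = data['LOCATION'].dropna().unique().tolist()  # one clean code per element
for loc in locations:
    response = requests.get(f'https://risk.dev.logindex.com/il/risk/location/{loc}', headers=headers)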