How to create Dynamic Dataframe in Pandas - python

lst = ['SymbolA','SymbolB', 'SymbolC' .... 'SymbolN']
I want to create DataFrames dynamically in Python pandas.
for i in lst:
    data = SomeFunction(lst[i]) # This will return dataframe of 10 x 100
    lst[i]+str(i) = pd.DataFrame(data)
pd.Concat(SymbolA1,SymbolB1,SymbolC1,SymbolD1)
Can anyone help with how to create the DataFrames dynamically to meet this requirement?

I hope this helps; this is how I understood the question.
gbl = globals()
lst = ['SymbolA','SymbolB', 'SymbolC' .... 'SymbolN']
for i, name in enumerate(lst):
    # i is the position of name in lst
    data = SomeFunction(name)
    gbl[name + str(i)] = pd.DataFrame(data)
This creates the DataFrames dynamically as global variables. To access one of them, look it up under the same key:
gbl[name + str(i)]
Try this.
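As an alternative to writing into globals(), the DataFrames can be kept in a dict keyed by symbol name. This is only a minimal sketch: some_function is a hypothetical stand-in for SomeFunction, and the column names and the three-symbol list are made up for illustration.

```python
import pandas as pd

def some_function(symbol):
    # Hypothetical stand-in for SomeFunction; returns a small DataFrame.
    return pd.DataFrame({'symbol': [symbol] * 3, 'value': range(3)})

symbols = ['SymbolA', 'SymbolB', 'SymbolC']

# One DataFrame per symbol, looked up by name instead of by a generated variable.
frames = {name: some_function(name) for name in symbols}

# Passing the dict to pd.concat uses its keys as the outer index level.
combined = pd.concat(frames, names=['source', 'row'])

print(frames['SymbolB'])
print(combined.loc['SymbolB'])
```

Individual frames stay reachable as frames['SymbolB'], and the combined frame still records which symbol each row came from.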

Your input has to look like this:
lst = ({'data':['SymbolA','SymbolB', 'SymbolC', 'SymbolN']})
print(pd.DataFrame(lst))

Related

Normalize json column and join with rest of dataframe

This is my first question here on stackoverflow so please don't roast me.
I was trying to find similar problems on the internet and actually there are several, but for me the solutions didn't work.
I have created this dataframe:
import pandas as pd
from ast import literal_eval
d = {'order_id': [1], 'email': ["hi#test.com"], 'line_items': ["[{'sku':'testproduct1', 'quantity':'2'},{'sku':'testproduct2','quantity':'2'}]"]}
orders = pd.DataFrame(data=d)
It looks like this:
order_id  email        line_items
1         hi#test.com  [{'sku':'testproduct1', 'quantity':'2'},{'sku':'testproduct2','quantity':'2'}]
I want the dataframe to look like this:
order_id  email        line_items.sku  line_items.quantity
1         hi#test.com  testproduct1    2
1         hi#test.com  testproduct2    2
I used the following code to change the type of line_items from a string to a list of dicts:
orders.line_items = orders.line_items.apply(literal_eval)
Normally I would use json_normalize now to flatten the line_items column. But I also want to keep the id and don't know how to do that. I also want to avoid any loops.
Is there anyone who can help me with this issue?
Kind regards
joant95
If your dictionary really is that strange, then you could try:
d['line_items'] = eval(d['line_items'][0])
df = pd.json_normalize(d, record_path=['line_items'], meta=['order_id', 'email'])
To create d out of orders you could try:
d = orders.to_dict(orient='list')
Or you could try:
orders.line_items = orders.line_items.map(eval)
d = orders.to_dict(orient='records')
df = pd.json_normalize(d, record_path=['line_items'], meta=['order_id', 'email'])
But: I still don't have a clear picture of the situation :)
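For reference, here is a self-contained sketch of the second variant applied to the sample data from the question. It swaps eval for literal_eval (which the question already imports), and it assumes the desired column names are the ones shown in the question, produced here with the record_prefix argument.

```python
import pandas as pd
from ast import literal_eval

orders = pd.DataFrame({
    'order_id': [1],
    'email': ['hi#test.com'],
    'line_items': ["[{'sku':'testproduct1', 'quantity':'2'},{'sku':'testproduct2','quantity':'2'}]"],
})

# Parse the string column into real lists of dicts.
orders['line_items'] = orders['line_items'].apply(literal_eval)

# One record per order, then one output row per line item.
records = orders.to_dict(orient='records')
flat = pd.json_normalize(
    records,
    record_path='line_items',
    meta=['order_id', 'email'],
    record_prefix='line_items.',
)

# Put the order-level columns first, as in the desired output.
flat = flat[['order_id', 'email', 'line_items.sku', 'line_items.quantity']]
print(flat)
```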

Create Forecasts Looping over SKUs and Export to CSV using Facebook Prophet

I am new to Python so please bear with me.
I am trying to convert what I think may be a nested dictionary into a csv that I can export. Below is my code:
import pandas as pd
import os
from fbprophet import Prophet
# Read in File
df1 = pd.read_csv('File_Path.csv')
#Create Loop to Forecast Multiple SKUs
def get_prediction(df):
    prediction = {}
    df1 = df.rename(columns={'Date': 'ds','qty_ordered': 'y', 'item_no': 'item'})
    list_items = df1.item.unique()
    for item in list_items:
        item_df = df1.loc[df1['item'] == item]
        # interval_width is not passed, so the uncertainty interval stays at the Prophet default of 80%
        my_model = Prophet(yearly_seasonality=True, seasonality_prior_scale=1.0)
        my_model.fit(item_df)
        future_dates = my_model.make_future_dataframe(periods=12, freq='M')
        forecast = my_model.predict(future_dates)
        prediction[item] = forecast
    return prediction
# Save predictions to dictionary
df2 = get_prediction(df1)
# Convert dictionary
df3 = pd.DataFrame.from_dict(df2, orient='columns')
So the last part of the code is where I am struggling. I need to convert the df2 dictionary to a dataframe (df3) so I can export it to a csv. But it looks as if it is a nested dictionary? Not sure if I need to update my function or not.
This is what a snippet of the dictionary looks like
I need to export it so it will look like this
Any help would be greatly appreciated!
The following code should help flatten df2 (a dictionary of DataFrames, if I understand correctly).
def flatten(dict_of_df):
    # insert column 'item'
    for key, value in dict_of_df.items():
        value['item'] = key
    # return vertically concatenated dataframe with all the items
    return pd.concat(dict_of_df.values())
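A possible way to use this for the CSV export, sketched with made-up data: the predictions dict below is a hypothetical stand-in for the output of get_prediction (real Prophet forecasts have many more columns), and this variant uses assign so the original DataFrames are not modified.

```python
import pandas as pd

def flatten(dict_of_df):
    # Tag each forecast with its item key without mutating the original frames.
    frames = [value.assign(item=key) for key, value in dict_of_df.items()]
    # Stack all items vertically with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)

# Hypothetical stand-in for the dict returned by get_prediction.
predictions = {
    'sku_1': pd.DataFrame({'ds': ['2020-01-31', '2020-02-29'], 'yhat': [10.0, 12.0]}),
    'sku_2': pd.DataFrame({'ds': ['2020-01-31', '2020-02-29'], 'yhat': [5.0, 7.0]}),
}

df3 = flatten(predictions)

# to_csv without a path returns the CSV text; pass a file name to write to disk instead.
csv_text = df3.to_csv(index=False)
print(csv_text)
```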

ValueError: arrays must all be same length - print dataframe to CSV

Thanks for stopping by! I was hoping to get some help creating a CSV from a pandas DataFrame. Here is my code:
a = ldamallet[bow_corpus_new[:21]]
b = data_text_new
print(a)
print("\n")
print(b)
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
     'topic_0': a[0][1],
     'topic_1': a[1][1],
     'topic_2': a[2][1],
     'topic_3': a[3][1],
     'topic_4': a[4][1],
     'topic_5': a[5][1],
     'topic_6': a[6][1],
     'topic_7': a[7][1],
     'topic_8': a[8][1],
     'topic_9': a[9][1],
     'topic_10': a[10][1],
     'topic_11': a[11][1],
     'topic_12': a[12][1],
     'topic_13': a[13][1],
     'topic_14': a[14][1],
     'topic_15': a[15][1],
     'topic_16': a[16][1],
     'topic_17': a[17][1],
     'topic_18': a[18][1],
     'topic_19': a[19][1]}
print(d)
df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False)
The data:
print(a): the format is in tuples
[[(topic number: 0, topic percentage),...(19, #)], [(topic distribution for next row, #)...(19, .819438),...(#,#),...]
print(b)
Here is my error:
This is the size of the dataframe:
This is what I wished it looked like:
Any help would be greatly appreciated :)
It might be easiest to collect the second value of each tuple, across all of the rows, in its own list. Something like this:
topic_0 = []
topic_1 = []
topic_2 = []
# ...and so on
for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    # ...and so on
Then you can build your dictionary like so:
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
     'topic_0': topic_0,
     'topic_1': topic_1,
     # etc.
     }
I took @mattcremeens's advice and it worked. I've posted the full code below. He was right about nixing the tuples: my previous code wasn't iterating through the rows, so it only printed the first row.
topic_0=[]
topic_1=[]
topic_2=[]
topic_3=[]
topic_4=[]
topic_5=[]
topic_6=[]
topic_7=[]
topic_8=[]
topic_9=[]
topic_10=[]
topic_11=[]
topic_12=[]
topic_13=[]
topic_14=[]
topic_15=[]
topic_16=[]
topic_17=[]
topic_18=[]
topic_19=[]
for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    topic_3.append(i[3][1])
    topic_4.append(i[4][1])
    topic_5.append(i[5][1])
    topic_6.append(i[6][1])
    topic_7.append(i[7][1])
    topic_8.append(i[8][1])
    topic_9.append(i[9][1])
    topic_10.append(i[10][1])
    topic_11.append(i[11][1])
    topic_12.append(i[12][1])
    topic_13.append(i[13][1])
    topic_14.append(i[14][1])
    topic_15.append(i[15][1])
    topic_16.append(i[16][1])
    topic_17.append(i[17][1])
    topic_18.append(i[18][1])
    topic_19.append(i[19][1])
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
     'topic_0': topic_0,
     'topic_1': topic_1,
     'topic_2': topic_2,
     'topic_3': topic_3,
     'topic_4': topic_4,
     'topic_5': topic_5,
     'topic_6': topic_6,
     'topic_7': topic_7,
     'topic_8': topic_8,
     'topic_9': topic_9,
     'topic_10': topic_10,
     'topic_11': topic_11,
     'topic_12': topic_12,
     'topic_13': topic_13,
     'topic_14': topic_14,
     'topic_15': topic_15,
     'topic_16': topic_16,
     'topic_17': topic_17,
     'topic_18': topic_18,
     'topic_19': topic_19}
df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False, mode = 'a')
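The same result can probably be reached without twenty separate lists. The sketch below assumes that a is a list with one entry per document, each entry being a list of (topic number, weight) tuples as described in the question; the sample values and the docs list are made up, and only three topics are shown.

```python
import pandas as pd

# Hypothetical stand-in for the ldamallet output: one list of
# (topic number, weight) tuples per document.
a = [
    [(0, 0.7), (1, 0.2), (2, 0.1)],
    [(0, 0.1), (1, 0.3), (2, 0.6)],
]
# Hypothetical stand-in for b['Preprocessed Document'].tolist().
docs = ['first document', 'second document']

# One dict per document, mapping 'topic_<n>' to that topic's weight.
topics = pd.DataFrame([{f'topic_{t}': w for t, w in row} for row in a])

# Put the document text next to its topic weights.
df = pd.concat([pd.DataFrame({'Preprocessed Document': docs}), topics], axis=1)
print(df)
```

Because the column names are built from the topic numbers inside the tuples, the code does not need to change if the number of topics does.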

Create dataframe in a loop

I would like to create DataFrames in a loop and then use them in another loop. I tried the eval() function, but it didn't work.
For example:
for i in range(5):
    df_i = df[(df.age == i)]
Here I would like to create df_0, df_1, etc., and then concatenate these new DataFrames after some calculations:
final_df = pd.concat(df_0,df_1)
for i in range(2:5):
    final_df = pd.concat(final_df, df_i)
You can create a dict of DataFrames x and use i as the dict keys:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})
x = {}
for i in range(5):
    x[i] = df[df['age']==i]
final = pd.concat(x.values())
Then you can refer to individual DataFrames as:
x[1]
Output:
    age
5     1
13    1
15    1
And concatenate all of them with:
pd.concat(x.values())
Output:
    age
18    0
5     1
13    1
15    1
2     2
6     2
...
This way is weird and not recommended, but it can be done.
Answer
for i in range(5):
    exec(f"df_{i} = df[df['age']=={i}]")
def UDF(dfi):
    # do something in user-defined function
    return dfi
for i in range(5):
    exec(f"df_{i} = UDF(df_{i})")
final_df = pd.concat([df_0, df_1])
for i in range(2, 5):
    final_df = pd.concat([final_df, globals()[f"df_{i}"]])
Better Way 1
Using a list or a dict to store the DataFrames is a better way, since you can access each DataFrame by an index or a key.
Since another answer (@perl) already shows the dict approach, I will show the list approach.
def UDF(dfi):
    # do something in user-defined function
    return dfi
dfs = [df[df['age']==i] for i in range(5)]
final_df = pd.concat(map(UDF, dfs))
Better Way 2
Since you are using pandas.DataFrame, groupby is the 'pandas' way to do what you want. (Maybe; I am guessing, because I don't know exactly what you want to do. LOL)
def UDF(dfi):
    # do something in user-defined function
    return dfi
final_df = df.groupby('age').apply(UDF)
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
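To make the two "better ways" concrete, here is a small runnable sketch. add_group_size is a hypothetical stand-in for the user-defined function, and the groupby variant iterates over the groups instead of calling apply, which is a swapped-in technique chosen so that the 'age' column is handled the same way in both versions.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})

def add_group_size(dfi):
    # Hypothetical per-group calculation: record how many rows the group has.
    return dfi.assign(group_size=len(dfi))

# List approach: one filtered DataFrame per age value.
dfs = [df[df['age'] == i] for i in range(5)]
final_list = pd.concat([add_group_size(d) for d in dfs])

# groupby approach: pandas does the splitting, we process each group.
final_groupby = pd.concat([add_group_size(g) for _, g in df.groupby('age')])

print(final_list.equals(final_groupby))
```

Neither version needs dynamically named variables such as df_0 or df_1.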

How to implement a select-like function

I have a dataset in Python whose structure looks like this:
Tree Species      number of trunks
----------------------------------
Acer rubrum       1
Quercus bicolor   1
Quercus bicolor   1
aabbccdd          0
My question is whether I can implement a function similar to
Select sum(number of trunks)
from trees.data['Number of Trunks']
where x = trees.data["Tree Species"]
group by trees.data["Tree Species"]
in Python? x is an array containing five elements:
x = array(['Acer rubrum', 'Acer saccharum', 'Acer saccharinum',
'Quercus rubra', 'Quercus bicolor'], dtype='<U16')
What I want to do is match each element of x against trees.data["Tree Species"] and calculate the sum of the number of trunks. It should return an array of
array = (sum_num(Acer rubrum), sum_num(Acer saccharum), sum_num(Acer saccharinum),
sum_num(Quercus rubra), sum_num(Quercus bicolor))
You may want to look at pandas for Python. It allows you to do something like
df.groupby('Tree Species')['Number of Trunks'].sum()
Note that df here is whatever variable name you used when reading your data into a DataFrame. I would also recommend looking at pandas and lambda functions.
You can do something like this:
import pandas as pd
df = pd.DataFrame()
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
df["Tree Species"] = tree_species
df["Number of Trunks"] = no_of_trunks
df.groupby('Tree Species').sum() #This will create a pandas dataframe
df.groupby('Tree Species')['Number of Trunks'].sum() #This will create a pandas series.
You can do the same thing using just dictionaries, too:
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
d = {}
for key, trunk in zip(tree_species, no_of_trunks):
    if key not in d:
        d[key] = 0
    d[key] += trunk
print(d)
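To get the sums back as an array in the order of x, as the question asks, the groupby result can be reindexed. This is a sketch under the assumption that species in x with no rows in the data should count as 0; the DataFrame is rebuilt from the sample rows in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Tree Species': ['Acer rubrum', 'Quercus bicolor', 'Quercus bicolor', 'aabbccdd'],
    'Number of Trunks': [1, 1, 1, 0],
})
x = np.array(['Acer rubrum', 'Acer saccharum', 'Acer saccharinum',
              'Quercus rubra', 'Quercus bicolor'])

sums = (
    df[df['Tree Species'].isin(x)]           # WHERE species IN x
    .groupby('Tree Species')['Number of Trunks']
    .sum()                                   # SUM(...) GROUP BY species
    .reindex(x, fill_value=0)                # order as in x, 0 for missing species
)
result = sums.to_numpy()
print(result)
```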
