How can I convert this for loop to the dataframe? - python

If I use print, I can print all the data. But when I assign with data=, it only shows me the value for i=2917. How can I convert this for loop to a dataframe?
import pandas as pd
df = pd.read_excel('C:/Users/aaaa/Desktop/rrrrr/twstock/1101.xlsx')
for i in range(1, 2917):
    data = '{:.6%}'.format((df['close'][i]/df['close'][i-1])-1)

You reassign data in every iteration of your for loop. Therefore data contains only the value for i = 2916.
How about creating a list and then appending your data to it inside the for loop?
data = []
for i in range(1, 2917):
    data.append('{:.6%}'.format((df['close'][i]/df['close'][i-1])-1))
print(data)

I would recommend using pandas vectorized methods for speed and cleanness:
df = pd.read_excel('C:/Users/aaaa/Desktop/rrrrr/twstock/1101.xlsx')
data = df["close"].pct_change()
Then you can change to a string representation list if desired by doing something like:
string_list = ['{:.6%}'.format(x) for x in data.tolist()[1:]]

DON'T loop through the dataframe like kalehmann suggested, it's very inefficient. You can either call data = df["close"].pct_change() as Sven suggested, or if you want to use a similar function to the one you defined:
data = df['close']/df['close'].shift(1)-1
And then you can run:
data_list = ['{:.6%}'.format(x) for x in data.tolist()]
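To check that the vectorized version matches the loop from the question, here is a minimal sketch. The four prices are made up and stand in for the 'close' column of the Excel file; the 'change' column name is also just an illustration.

```python
import pandas as pd

# Hypothetical closing prices standing in for the 'close' column of the Excel file.
df = pd.DataFrame({"close": [100.0, 110.0, 99.0, 99.0]})

# Loop version from the first answer: one formatted string per row,
# starting at the second row.
loop_result = []
for i in range(1, len(df)):
    loop_result.append("{:.6%}".format(df["close"][i] / df["close"][i - 1] - 1))

# Vectorized version: pct_change() leaves NaN in the first row, so skip it.
changes = df["close"].pct_change()
vectorized_result = ["{:.6%}".format(x) for x in changes.iloc[1:]]

# To keep the result in the dataframe, store it as a new column.
# The first row has no previous value, so it stays NaN.
df["change"] = changes

print(vectorized_result)
```

Keeping the numeric values in the dataframe and formatting only for display is usually easier to work with than a column of percentage strings.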

Related

Iterate through different dataframes and apply a function to each one

I have 4 different dataframes containing time series data that all have the same structure.
My goal is to take each individual dataframe and pass it through a function I have defined that will group them by datestamp, sum the columns and return a new dataframe with the columns I want. So in total I want 4 new dataframes that have only the data I want.
I just looked through this post:
Loop through different dataframes and perform actions using a function
but applying this did not change my results.
Here is my code:
I am putting the dataframes in a list so I can iterate through them
dfs = [vds, vds2, vds3, vds4]
This is my function I want to pass each dataframe through:
def VDS_pre(df):
    df = df.groupby(['datestamp','timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date','timestamp':'Time','det_vol': 'VolumeVDS'})
    df = df[['Date','Time','VolumeVDS']]
    return df
This is the loop I made to iterate through my dataframe list and pass each one through my function:
for df in dfs:
    df = VDS_pre(df)
However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did. Thanks for the help!
Yes, the dataframes are indeed left unmodified. The reason is:
Assigning to the loop variable in a for item in lst: loop only rebinds that name. It changes neither lst nor the variables from which the lst items got their values, as the following code demonstrates:
v1=1; v2=2; v3=3
lst = [v1,v2,v3]
for item in lst:
    item = 0
print(lst, v1, v2, v3) # gives: [1, 2, 3] 1 2 3
To achieve the result you expect to obtain you can use a list comprehension and the list unpacking feature of Python:
vds,vds2,vds3,vds4=[VDS_pre(df) for df in [vds,vds2,vds3,vds4]]
or following code which is using a list of strings with the identifier/variable names of the dataframes:
sdfs = ['vds', 'vds2', 'vds3', 'vds4']
for sdf in sdfs:
    exec(f'{sdf} = VDS_pre({sdf})')
Now printing vds, vds2, vds3 and vds4 will output the modified dataframes.
Pandas frame operations return a new copy of the data. Your snippet stores the result in the df variable, which is never written back to your initial list. This is why you have no stored result after execution.
If you don't need to keep original frames, you may simply overwrite them:
for i, df in enumerate(dfs):
    dfs[i] = VDS_pre(df)
If you do need them, just use a second list and append the results to it.
l = []
for df in dfs:
    df2 = VDS_pre(df)
    l.append(df2)
Or, even better, use a list comprehension to rewrite this snippet as a single line of code.
Now you are able to store the result of your processing.
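The list comprehension mentioned above could look like the following sketch. VDS_pre follows the function from the question; the two small frames are made-up sample data standing in for vds, vds2 and so on, and the name processed is only an illustration.

```python
import pandas as pd


def VDS_pre(df):
    # Same steps as the function in the question.
    df = df.groupby(["datestamp", "timestamp"]).sum().reset_index()
    df = df.rename(
        columns={"datestamp": "Date", "timestamp": "Time", "det_vol": "VolumeVDS"}
    )
    return df[["Date", "Time", "VolumeVDS"]]


# Made-up sample frames with the column names used in the question.
vds = pd.DataFrame(
    {
        "datestamp": ["2020-01-01", "2020-01-01"],
        "timestamp": ["08:00", "08:00"],
        "det_vol": [3, 4],
    }
)
vds2 = pd.DataFrame(
    {"datestamp": ["2020-01-02"], "timestamp": ["09:00"], "det_vol": [10]}
)

dfs = [vds, vds2]

# Build a new list of processed frames; the originals are left untouched.
processed = [VDS_pre(df) for df in dfs]

print(processed[0])
```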
Additionally, if your frames have the same structure and can be merged into a single frame, consider concatenating them first and then applying your function once to the combined frame. That would be the idiomatic pandas approach.
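A rough sketch of the concatenate-first idea follows. It assumes you still want the results kept apart per original frame, so it adds a hypothetical 'source' column before concatenating; that column and the sample values are not part of the question.

```python
import pandas as pd

# Made-up sample frames with the column names used in the question.
vds = pd.DataFrame(
    {
        "datestamp": ["2020-01-01", "2020-01-01", "2020-01-01"],
        "timestamp": ["08:00", "08:00", "09:00"],
        "det_vol": [3, 4, 5],
    }
)
vds2 = pd.DataFrame(
    {"datestamp": ["2020-01-01"], "timestamp": ["08:00"], "det_vol": [10]}
)

# Tag each frame with its origin (the 'source' column is an assumption),
# then stack them into one frame.
combined = pd.concat(
    [vds.assign(source="vds"), vds2.assign(source="vds2")],
    ignore_index=True,
)

# One groupby over the combined frame replaces the loop over four frames.
result = (
    combined.groupby(["source", "datestamp", "timestamp"], as_index=False)["det_vol"]
    .sum()
    .rename(
        columns={"datestamp": "Date", "timestamp": "Time", "det_vol": "VolumeVDS"}
    )
)

print(result)
```

If you need the separate frames back afterwards, filtering on the 'source' column (for example result[result["source"] == "vds"]) recovers each one.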

How do I slice and select only the first characters, like you would with str() for a character string, but for a numeric feature?

I am trying to slice first 3 digits of the feature Zip Code and create a new variable off of that.
zipcode = 75012
sliced_zipcode = str(zipcode)[:3]
# If you want the result in integer
three_digits_zipcode = int(str(zipcode)[:3])
# If you want to apply this to a dataframe column
# (assumes df is an existing DataFrame with a 'zipcode' column)
import pandas as pd
df['three_digits_zip'] = df['zipcode'].apply(lambda x: int(str(x)[:3]))
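# Alternative: integer division, which drops the last three digits
# (gives the first three digits only when every value has six digits)
Zip = 300123
newZip = Zip//1000
print(newZip)  # 300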
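For a whole column, the same slicing can be done without apply by using the pandas string accessor. This is a sketch under the assumption that the values are five-digit US ZIP codes stored as integers, so a leading zero may have been lost; the sample values and the 'zip3' column name are made up.

```python
import pandas as pd

# Made-up sample data; 2138 stands for the ZIP code 02138 stored as an integer.
df = pd.DataFrame({"zipcode": [75012, 30012, 2138]})

# Convert to string, restore leading zeros (assumes 5-digit ZIP codes),
# then take the first three characters.
df["zip3"] = df["zipcode"].astype(str).str.zfill(5).str[:3]

print(df)
```

The result is kept as a string here, because converting "021" back to an integer would drop the leading zero again.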

Create multiple empty DataFrames named from a list using a loop

I'm trying to create multiple empty DataFrames with a for loop where each DataFrame has a unique name stored in a list. Per the sample code below, I would like three empty DataFrames, one called A[], another B[] and the last one C[]. Thank you.
import pandas as pd
report=['A','B','C']
for i in report:
    report[i]=pd.DataFrame()
It would be best to use a dictionary
import pandas as pd
report=['A','B','C']
df_dict = {}
for i in report:
    df_dict[i]=pd.DataFrame()
print(df_dict['A'])
print(df_dict['B'])
print(df_dict['C'])
You should use a dictionary for that:
import pandas as pd
report={'A': pd.DataFrame(),'B': pd.DataFrame(),'C': pd.DataFrame()}
Or, if you have a list of strings containing the names, which I think is what you are really trying to do:
name_dataframe = ['A', 'B', 'C']
dict_dataframe = {}
for name in name_dataframe:
    dict_dataframe[name] = pd.DataFrame()
It is not good practice, and you should probably use a dictionary instead. But if you still need to do it, the code below gets the job done: it creates DataFrames in memory with the names from the list report:
for i in report:
    exec(i + ' = pd.DataFrame()')
And if you want to store the empty DataFrames in a list:
df_list = []
for i in report:
    exec(i + ' = pd.DataFrame() \ndf_list.append(' + i+ ')')
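The dictionary approach recommended in the answers above can also be written as a dictionary comprehension, which avoids exec entirely. A minimal sketch; the name frames and the column 'x' are just illustrations:

```python
import pandas as pd

report = ["A", "B", "C"]

# One independent empty DataFrame per name in the list.
frames = {name: pd.DataFrame() for name in report}

# Each frame is reached through its name and can be filled separately.
frames["A"]["x"] = [1, 2]

print(sorted(frames))
print(frames["A"])
```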

Add 3 variable to an empty DataFrame

I want to append 3 variables to an empty dataframe after each loop.
dfvol = dfvol.append([stock,mean,median],columns=['Stock','Mean','Median'])
Columns in Dataframe should be ['Stock','Median','Mean']
Result should be:
How can I solve the problem, because something with the append code is wrong.
You're trying to use a syntax for creating a new dataframe to append to it, which is not going to work.
Here is one way you can try to do what you want
df.loc[len(df)] = [stock,mean,median]
A better approach is to build a list of entries and create the dataframe from that list once your loop is done (instead of appending to df on every iteration).
Like this:
some_list = []
for a in b:
    some_list.append([stock,mean,median])
df = pd.DataFrame(some_list, columns = ['Stock','Mean','Median'])
The append method doesn't work like that. You would only use the columns parameter when creating a DataFrame object. You can either create a temporary DataFrame and append it to the main DataFrame like this:
df_tmp = pd.DataFrame([[stock,mean,median]], columns=['Stock','Mean','Median'])
dfvol = dfvol.append(df_tmp)
...or you can use a dictionary like this:
dfvol = dfvol.append({'Stock':stock,'Mean':mean,'Median':median}, ignore_index=True)
Like this:
In [256]: dfvol = pd.DataFrame()
In [257]: stock = ['AAPL', 'FB']
In [258]: mean = [600.356, 700.245]
In [259]: median = [281.788, 344.55]
In [265]: dfvol = dfvol.append(pd.DataFrame(zip(stock, mean, median), columns=['Stock','Mean','Median']))
In [266]: dfvol
Out[266]:
  Stock     Mean   Median
0  AAPL  600.356  281.788
1    FB  700.245  344.550
Check the documentation for the append notation. There are multiple ways to do it.
dfvol = dfvol.append(pd.DataFrame([[stock,mean,median]],columns=['Stock','Mean','Median']))
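Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append-based answers above only run on older versions. A sketch of the same idea for current pandas, collecting rows in a list and using pd.concat for later additions, might look like this. The MSFT row and all numbers are made-up placeholders.

```python
import pandas as pd

# Made-up values standing in for the per-stock results computed in the loop.
results = [("AAPL", 600.356, 281.788), ("FB", 700.245, 344.55)]

# Collect one dict per iteration, then build the dataframe once at the end.
rows = []
for stock, mean, median in results:
    rows.append({"Stock": stock, "Mean": mean, "Median": median})
dfvol = pd.DataFrame(rows, columns=["Stock", "Mean", "Median"])

# Adding a row to an existing dataframe: pd.concat replaces DataFrame.append.
new_row = pd.DataFrame([{"Stock": "MSFT", "Mean": 1.0, "Median": 2.0}])
dfvol = pd.concat([dfvol, new_row], ignore_index=True)

print(dfvol)
```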

Replacing characters in a string Python

I have a data frame that I iterate through and modify as follows:
filtered = pd.read_csv(fileloc)
for index, row in filtered.iterrows():
    row["standardUpc"] = row["standardUpc"].replace("['","")
    row["standardUpc"] = row["standardUpc"].replace("']","")
This is not working.
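You shouldn't need to iterate over the df for this. The row returned by iterrows() is generally a copy, so writing to it does not change the dataframe. Instead, use the vectorized string methods (regex=False makes pandas treat the brackets as literal characters rather than as a regular expression):
filtered['standardUpc'] = filtered['standardUpc'].str.replace("['", "", regex=False)
filtered['standardUpc'] = filtered['standardUpc'].str.replace("']", "", regex=False)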
There are ways to chain the two calls together, but that should be the way that you can do the string replacement.
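One way to chain the two calls, shown on made-up data, is sketched below. As an alternative, str.strip removes any of the given characters from both ends of each value, which gives the same result as long as the unwanted characters only appear at the ends.

```python
import pandas as pd

# Made-up sample values in the shape described in the question.
filtered = pd.DataFrame({"standardUpc": ["['012345678905']", "['036000291452']"]})

# Chained literal replacements (regex=False treats "['" and "']" as plain text).
chained = (
    filtered["standardUpc"]
    .str.replace("['", "", regex=False)
    .str.replace("']", "", regex=False)
)

# Alternative: strip the characters [, ' and ] from both ends of each value.
stripped = filtered["standardUpc"].str.strip("[']")

filtered["standardUpc"] = chained
print(filtered)
```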
