Why can't the pandas DataFrame append properly to form one DataFrame in this loop?
# Produce the overall data frame
def processed_data(data1_, f_loc, open, close):
    """data1_: is the csv file to be modified
    f_loc: is the location of csv files to be processed
    open and close: are the columns to undergo computations
    returns a new dataframe of modified columns"""
    main_file = drop_col(data1_)  # Dataframe to append more data columns to
    for i in files_path(f_loc):
        data = get_data_frame(i[0])  # takes the file path of the csv file and returns the dataframe
        perc = perc_df(data, open, close, i[1])  # Dataframe to append
        copy_data = main_file.append(perc)
    return copy_data
Here's the output:
Date WTRX-USD
0 2021-05-27 NaN
1 2021-05-28 NaN
2 2021-05-29 NaN
3 2021-05-30 NaN
4 2021-05-31 NaN
.. ... ...
79 NaN -2.311576
80 NaN 5.653349
81 NaN 5.052950
82 NaN -2.674435
83 NaN -3.082957
[450 rows x 2 columns]
My intention is to return something like this (where each append operation adds a column):
Date Open High Low Close Adj Close Volume
0 2021-05-27 0.130793 0.136629 0.124733 0.128665 0.128665 70936563
1 2021-05-28 0.128659 0.129724 0.111244 0.113855 0.113855 71391441
2 2021-05-29 0.113752 0.119396 0.108206 0.111285 0.111285 62049940
3 2021-05-30 0.111330 0.115755 0.107028 0.112185 0.112185 70101821
4 2021-05-31 0.112213 0.126197 0.111899 0.125617 0.125617 83502219
.. ... ... ... ... ... ... ...
361 2022-05-23 0.195637 0.201519 0.185224 0.185231 0.185231 47906144
362 2022-05-24 0.185242 0.190071 0.181249 0.189553 0.189553 33312065
363 2022-05-25 0.189550 0.193420 0.183710 0.183996 0.183996 33395138
364 2022-05-26 0.184006 0.186190 0.165384 0.170173 0.170173 57218888
365 2022-05-27 0.170636 0.170660 0.165052 0.166864 0.166864 63560568
[366 rows x 7 columns]
pandas.concat
pandas.DataFrame.append has been deprecated. Use pandas.concat instead.
Combine the DataFrame objects horizontally along the x-axis by passing in axis=1:
copy_data = pd.concat([copy_data, perc], axis=1)
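Putting that together with the loop from the question, a minimal sketch of the corrected function (assuming drop_col, files_path, get_data_frame, and perc_df behave as described above) accumulates into the same variable on every iteration instead of restarting from main_file each time:

def processed_data(data1_, f_loc, open, close):
    # Sketch only: drop_col, files_path, get_data_frame and perc_df are the
    # question's own helpers and are assumed to work as described above.
    copy_data = drop_col(data1_)  # base dataframe to grow column by column
    for i in files_path(f_loc):
        data = get_data_frame(i[0])  # load one csv as a dataframe
        perc = perc_df(data, open, close, i[1])  # new column(s) to attach
        copy_data = pd.concat([copy_data, perc], axis=1)  # keep the accumulated result
    return copy_data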
Related
I have a pandas df with 5181 rows and a column of customer names, and I have a separate list of 383 customer names from that column whose corresponding rows I want to drop from the df. I tried to write code that would iterate through all the names in the customer column and drop each row whose customer name matches one on the list. My result is TypeError: 'NoneType' object is not subscriptable.
The list is called Retail_Customer_Tracking and the df is called df_final and looks like:
index Customer First_Order_Date Last_Order_Date
0 0 0 2022-09-15 2022-09-15
1 1 287 2018-02-19 2020-11-30
2 2 606 2017-10-31 2017-12-07
3 3 724 2021-12-28 2022-09-15
4 4 1025 2015-08-13 2015-08-13
... ... ... ... ...
5176 5176 tulips little pop up shop 2021-10-25 2022-10-08
5177 5177 unboxed 2021-06-24 2022-10-10
5178 5178 upMADE 2021-09-10 2022-03-31
5179 5179 victorias floral design 2021-07-12 2021-07-12
5180 5180 vintique marketplace 2021-03-16 2022-10-15
5181 rows × 4 columns
The code I wrote looks like:
i = 0
for x in Retail_Customer_Tracking:
    while i < 5182:
        if df_final["Customer"].iloc[i] == x:
            df_final = df_final.drop(df_final[i], axis=0, inplace=True)
        else:
            i = i + 1
I was hoping that the revised df_final would not have the rows I wanted to drop...
I'm very new at coding and any help would be greatly appreciated. Thanks!
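For reference, a vectorized approach with Series.isin avoids the manual index bookkeeping entirely, and also the drop(..., inplace=True) call that returns None (which is where the TypeError comes from, since df_final becomes None on the first match). A minimal sketch, assuming the Retail_Customer_Tracking list and df_final frame from the question:

# Keep only the rows whose Customer is NOT in the tracking list
mask = df_final["Customer"].isin(Retail_Customer_Tracking)
df_final = df_final[~mask].reset_index(drop=True)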
I have seen many methods like concat, join, and merge, but I am missing the technique for my simple dataset.
I have two datasets looks like mentioned below
dates.csv
2020-07-06
2020-07-07
2020-07-08
2020-07-09
2020-07-10
.....
...
...
mydata.csv
Expected,Predicted
12990,12797.578628473471
12990,12860.382061836583
12990,12994.159035827917
12890,13019.073929662367
12890,12940.34108357684
.............
.......
.....
I want to combine these two datasets, which have the same number of rows in both csv files. I tried the concat method but I see NaNs:
delete = pd.read_csv('dates.csv', header=None)  # the dates DataFrame
data1 = pd.read_csv('mydata.csv')                # the Expected/Predicted DataFrame
result = pd.concat([delete, data1], axis=0, ignore_index=True)
print(result)
Output:
0 Expected Predicted
0 2020-07-06 NaN NaN
1 2020-07-07 NaN NaN
2 2020-07-08 NaN NaN
3 2020-07-09 NaN NaN
4 2020-07-10 NaN NaN
.. ... ... ...
307 NaN 10999.0 10526.433098
308 NaN 10999.0 10911.247147
309 NaN 10490.0 11038.685328
310 NaN 10490.0 10628.204624
311 NaN 10490.0 10632.495169
[312 rows x 3 columns]
I don't want all the NaNs.
Thanks for your help!
You could use the .join() method from pandas.
delete = pd.read_csv('dates.csv', header=None)
data1 = pd.read_csv('mydata.csv')
result = delete.join(data1)
If your two dataframes follow the same order, you can use the join method mentioned by Nik; by default it joins on the index.
Otherwise, if you have a key that you can join your dataframes on, you can specify it like this:
joined_data = first_df.join(second_df, on=key)
Your first_df and second_df should then share a column with the same name to join on.
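As a quick usage sketch for this dataset (the 'date' column name is an assumption; Expected and Predicted come from mydata.csv), either of these aligns the rows positionally:

dates = pd.read_csv('dates.csv', header=None, names=['date'])  # 'date' name is assumed
values = pd.read_csv('mydata.csv')                              # Expected, Predicted

result = dates.join(values)                  # join on the shared default RangeIndex
# or, equivalently, concatenate column-wise:
result = pd.concat([dates, values], axis=1)
print(result.head())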
I am importing a dataframe from an Excel spreadsheet where the date column is incomplete:
Date Value
0 2020-04-29 144
1 NaT 158
2 NaT 134
3 2020-04-30 114
4 NaT 153
and I'd like to fill in the NaTs by replacing them with the date from the line above. The slow method works:
for i in range(0, df.shape[0]):
    if pd.isnull(df.iat[i, 0]):
        df.iat[i, 0] = df.iat[i-1, 0]
but the methods I think ought to work, don't. Both of these replace the first NaT they can encounter but skip NaTs after that (are they working on copies of the data?)
df["Date"] = np.where(df["Date"].isnull(), df["Date"].shift(1), df["Date"])
df['Date'].mask(df['Date'].isnull(), df['Date'].shift(1), inplace=True)
Is there any quick way of doing this?
You can try ffill:
df.ffill()
If "Date" values are string, you can convert "NaT" into actual NaN value using replace:
df.replace("NaT", np.NaN).ffill()
Explanation
Use replace to replace "NaT" string to actuel NaN values.
Fill all NaN cells from the previous not NaN cell using ffill.
Code + illustration
import pandas as pd
import numpy as np

# Example frame built from the data shown in the question
df = pd.DataFrame({"Date": ["2020-04-29", "NaT", "NaT", "2020-04-30", "NaT"],
                   "Value": [144, 158, 134, 114, 153]})

print(df.replace("NaT", np.nan))
# Date Value
# 0 2020-04-29 144
# 1 NaN 158
# 2 NaN 134
# 3 2020-04-30 114
# 4 NaN 153
print(df.replace("NaT", np.NaN).ffill())
# Date Value
# 0 2020-04-29 144
# 1 2020-04-29 158
# 2 2020-04-29 134
# 3 2020-04-30 114
# 4 2020-04-30 153
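If the "Date" column is already a datetime64 dtype (so the gaps are real NaT values rather than the string "NaT"), ffill handles them directly; a minimal sketch:

# NaT counts as a missing value, so no replace step is needed
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df["Date"] = df["Date"].ffill()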
This question already has answers here:
How to get rid of "Unnamed: 0" column in a pandas DataFrame read in from CSV file?
(11 answers)
Closed 2 years ago.
I don't know why Unnamed: 0 got there when I reversed the index, and for the life of me I can't drop or del it. It will NOT go away, no matter what I do, by index or by any possible string variation from 'Unnamed: 0' to just '0'. I've tried setting it with columns= or with .drop(df.columns, and I've tried everything already in my code, such as drop=True. Then I tried dropping other columns and that wouldn't work either.
import pandas as pd

# set csv file as constant
TRADER_READER = pd.read_csv('TastyTrades.csv')

# change date format, make date into timestamp object, set date as index, write changes to csv file
def clean_date():
    # TRADER_READER['Date'] = TRADER_READER['Date'].replace({'T': ' ', '-0500': '', '-0400': ''}, regex=True)
    # TRADER_READER['Date'] = pd.to_datetime(TRADER_READER['Date'], format="%Y-%m-%d %H:%M:%S")
    TRADER_READER.set_index('Date', inplace=True, drop=True)
    # TRADER_READER.iloc[::-1].reset_index(drop=True)
    print(TRADER_READER)
    # TRADER_READER.to_csv('TastyTrades.csv')

clean_date()
Unnamed: 0 Type ... Strike Price Call or Put
Date ...
2020-04-01 11:00:05 0 Trade ... 21.0 PUT
2020-04-01 11:00:05 1 Trade ... NaN NaN
2020-03-31 17:00:00 2 Receive Deliver ... 22.0 PUT
2020-03-31 17:00:00 3 Receive Deliver ... NaN NaN
2020-03-27 16:15:00 4 Receive Deliver ... 7.5 PUT
... ... ... ... ... ...
2019-12-12 10:10:22 617 Trade ... 11.0 PUT
2019-12-12 10:10:21 618 Trade ... 45.0 CALL
2019-12-12 10:10:21 619 Trade ... 32.5 PUT
2019-12-12 09:45:42 620 Trade ... 18.0 CALL
2019-12-12 09:45:42 621 Trade ... 13.0 PUT
[622 rows x 16 columns]
Process finished with exit code 0
I think the problem comes from the CSV, which includes an unnamed column. To fix it, read the CSV specifying the first column as the index, and then set the Date index:
TRADER_READER = pd.read_csv('TastyTrades.csv', index_col=0)
TRADER_READER.set_index('Date', inplace=True, drop=True)
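As a side note, an "Unnamed: 0" column is usually created when a frame that still has its default integer index is written out without index=False; the next read_csv then sees that index as a nameless column. A minimal sketch of the saving side (some_df is a hypothetical frame, not one from the question):

some_df.to_csv('TastyTrades.csv', index=False)  # don't write the integer index as an extra column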
I have a csv file which is something like below
date,mean,min,max,std
2018-03-15,3.9999999999999964,inf,0.0,100.0
2018-03-16,0.46403712296984756,90.0,0.0,inf
2018-03-17,2.32452732452731,,0.0,143.2191767899579
2018-03-18,2.8571428571428523,inf,0.0,100.0
2018-03-20,0.6928406466512793,100.0,0.0,inf
2018-03-22,2.8675703858185635,,0.0,119.05383697172658
I want to select those column values which are > 20 and < 500 (that is, in the range 20 to 500) and put those values, along with the date, into another DataFrame. The other DataFrame looks something like this:
Date percentage_change location
2018-02-14 23.44 BOM
So I want to get the date and value from the csv and add them to the new DataFrame in the appropriate columns. Something like:
Date percentage_change location
2018-02-14 23.44 BOM
2018-03-15 100.0 NaN
2018-03-16 90.0 NaN
2018-03-17 143.2191767899579 NaN
.... .... ....
Now I am aware of functions like df.max(axis=1) and df.min(axis=1), which give you the min and max, but I am not sure how to find values based on a range. So how can this be achieved?
Given dataframes df1 and df2, you can achieve this by aligning column names, cleaning the numeric data, and then using pd.DataFrame.append (now deprecated in favour of pd.concat, as noted above):
import numpy as np
import pandas as pd

# keep the relevant columns, align the date column name, and clean inf / NaN values
df_app = df1.loc[:, ['date', 'mean', 'min', 'std']]\
            .rename(columns={'date': 'Date'})\
            .replace(np.inf, 0)\
            .fillna(0)
print(df_app)

# take the larger of the 'min' and 'std' values as the percentage change
df_app['percentage_change'] = np.maximum(df_app['min'], df_app['std'])
print(df_app)

# keep only the rows whose value falls in the 20-500 range
df_app = df_app[df_app['percentage_change'].between(20, 500)]

res = df2.append(df_app.loc[:, ['Date', 'percentage_change']])
print(res)
# Date location percentage_change
# 0 2018-02-14 BOM 23.440000
# 0 2018-03-15 NaN 100.000000
# 1 2018-03-16 NaN 90.000000
# 2 2018-03-17 NaN 143.219177
# 3 2018-03-18 NaN 100.000000
# 4 2018-03-20 NaN 100.000000
# 5 2018-03-22 NaN 119.053837
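Since DataFrame.append has been removed in recent pandas versions, the final step can be written with pd.concat and produces the same frame:

res = pd.concat([df2, df_app.loc[:, ['Date', 'percentage_change']]])
print(res)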