I'm having trouble appending a tuple to a pandas DataFrame inside a for loop.
I initialized the DataFrame to which all the tuples will be appended as follows:
self.URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
Then I enter a loop where I want to append the tuple created at each iteration. I'm doing it this way:
URM_test_tuple = pd.DataFrame({"playlist_id": [int(self.target_playlists_test[count])], "track_id": [playlist_tracks_list]})
self.URM_test.append(URM_test_tuple)
If I print URM_test_tuple I get a correct result, as follows:
playlist_id track_id
0 13317 [18940, 18902, 8892, 1365, 6806, 8972, 18944, ...
But when I print self.URM_test while debugging, I notice it's empty, and this is printed to the console:
Empty DataFrame
Columns: [playlist_id, track_id]
Index: []
Do you know what might be the bug in this code?
The append method in pandas creates a new object as stated in the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
You could try assigning the returned object back to URM_test in your loop:
URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
URM_test_tuple = pd.DataFrame({"playlist_id": ['foo'], "track_id": ['bar']})
URM_test = URM_test.append(URM_test_tuple)
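Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A more robust pattern (a sketch; the toy ids are illustrative, not from the question's data) is to collect the per-iteration frames in a list and call pd.concat once at the end:

```python
import pandas as pd

rows = []
for playlist_id, tracks in [(13317, [18940, 18902]), (13318, [8892])]:
    # build one single-row frame per iteration, mirroring URM_test_tuple above
    rows.append(pd.DataFrame({"playlist_id": [playlist_id], "track_id": [tracks]}))

# concatenate once at the end instead of appending inside the loop
URM_test = pd.concat(rows, ignore_index=True)
print(URM_test)
```

Besides working on current pandas versions, this avoids the quadratic cost of re-copying the frame on every iteration.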
I want my code to:
1. read data from a CSV and make a dataframe: "source_df"
2. see if the dataframe contains any columns specified in a list: "possible_columns"
3. call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values into a new dataframe: "destination_df"
Here it is:
import pandas as pd

#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)

#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)

#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']

#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']

def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

'''use the counter to call a unique function from the function list to replace the values in each
column whose header is found in the "possible_columns" list, insert the modified values in
"destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1

#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[counter]
This will not actually run the function; it just accesses the string value.
I would try and find another way to run these functions.
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

and change the function call to the following, so the function object is looked up and then actually invoked:
fix_functions_list[counter]()
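To make the difference concrete, here is a minimal, self-contained sketch of the dispatch pattern (the function bodies are stand-ins, not the question's actual fix functions): store the function objects in a dict and call them by index.

```python
def yes_no_fix():
    return "fixed yes/no"

def true_false_fix():
    return "fixed true/false"

# store the function objects themselves, not strings naming them
fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

# indexing retrieves the function; the trailing () actually calls it
result = fix_functions_list[0]()
print(result)  # fixed yes/no
```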
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)

possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No": "0", "Yes": "1"},
                'true/false': {'False': '1', 'True': '0'}}

old_columns = [column for column in source_df.columns if column not in possible_columns]
existing_columns = [column for column in source_df.columns if column in possible_columns]

new_df = source_df[existing_columns].copy()
for column in new_df.columns:
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]
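As an aside, pandas can apply a per-column value mapping in a single call via DataFrame.replace with a nested dict, avoiding both per-column functions and the explicit loop (a sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"yes/no": ["Yes", "No"], "true/false": ["True", "False"]})
mapping = {"yes/no": {"No": "0", "Yes": "1"},
           "true/false": {"False": "1", "True": "0"}}

# replace accepts {column: {old: new}} and leaves unlisted columns untouched
df = df.replace(mapping)
print(df)
```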
I have defined 10 different DataFrames A06_df, A07_df, etc., which pick up six different data point inputs in a daily time series over a number of years. To be able to work with them I need to do some formatting operations such as
A07_df=A07_df.fillna(0)
A07_df[A07_df < 0] = 0
A07_df.columns = col # col is defined
A07_df['oil']=A07_df['oil']*24
A07_df['water']=A07_df['water']*24
A07_df['gas']=A07_df['gas']*24
A07_df['water_inj']=0
A07_df['gas_inj']=0
A07_df=A07_df[['oil', 'water', 'gas','gaslift', 'water_inj', 'gas_inj', 'bhp', 'whp']]
etc., for a few more formatting operations.
Is there a nice way to use a for loop or something so I don't have to write each operation for each dataframe A06_df, A07_df, A08_df, etc.?
As an example, I have tried
list = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
for i in list:
    i = i.fillna(0)
But this does not do the trick.
Any help is appreciated
Since i.fillna() returns a new object (an updated copy of your original dataframe), i = i.fillna(0) rebinds the name i to that copy but does not change the list contents A06_df, A07_df, ....
I suggest you copy the updated content in a new list like this:
list_raw = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
list_updated = []
for i in list_raw:
    i = i.fillna(0)
    # More code here
    list_updated.append(i)
To simplify your future processing I would recommend using a dictionary of dataframes instead of a list of named variables.
dfs = {}
dfs['A0'] = ...
dfs['A1'] = ...

dfs_updated = {}
for k, i in dfs.items():
    i = i.fillna(0)
    # More code here
    dfs_updated[k] = i
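Putting the dict suggestion together: the shared formatting can live in one function applied to every frame. A minimal sketch with made-up data, showing only the fillna and negative-clipping steps from the question:

```python
import pandas as pd

def clean(df):
    # shared formatting applied to every frame
    df = df.fillna(0)
    df[df < 0] = 0
    return df

dfs = {"A06": pd.DataFrame({"oil": [1.0, None, -2.0]}),
       "A07": pd.DataFrame({"oil": [None, 3.0, 4.0]})}

# one pass replaces ten hand-written blocks of identical operations
dfs = {name: clean(df) for name, df in dfs.items()}
```

The remaining per-frame operations (column renaming, unit conversion, reordering) would slot into clean() the same way.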
I am working on a script that imports an Excel file, iterates through a column called "Title", and returns False if a certain keyword is present in "Title". The script runs until I get to the part where I want to export another csv file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd

data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added', 'Track Item', 'Retailer Item ID', 'UPC', 'Title',
                                 'Manufacturer', 'Brand', 'Client Product Group', 'Category',
                                 'Subcategory', 'Amazon Sub Category', 'Segment', 'Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)

excludes = ['chainsaw', 'pail', 'leaf blower', 'HYOUJIN', 'brush', 'dryer', 'genie', 'Genuine Joe',
            'backpack', 'curling iron', 'dog', 'cat', 'wig', 'animal', 'dryer', ':', 'tea', 'Adidas',
            'Fila', 'Reebok', 'Puma', 'Nike', 'basket', 'extension', 'extensions', 'batteries',
            'battery', '[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

def is_match(title, excludes=my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')
For pandas.read_excel, you can pass the optional parameter dtype.
You can also use it to pass different data types for different columns,
e.g. dtype={'Retailer Item ID': int, 'Title': str}
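An aside worth spelling out: astype returns a new Series rather than modifying the column in place, which is why the question's bare df['Title'].astype(str) had no effect. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Title": [12345, "chainsaw pail"]})

# astype returns a copy; assign it back to actually change the column
df["Title"] = df["Title"].astype(str)

types = {type(v) for v in df["Title"]}
print(types)
```

With the assignment in place, a later .apply(lambda t: t.lower()) no longer hits integers.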
At the line where you wrote
match_titles = [e for e in df.Title.astype(str) if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
Python can still hand you an integer as the variable e rather than the string you expect. This happens because df['Title'].astype(str) returns a new Series without changing the dataframe in place, so any later read of the column still sees the original integer values. If you want to iterate through the column you could also try
match_titles = [e for e in df.iloc[:, 4] if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
df.iloc[:, 4] returns the fifth column (position 4) of the dataframe df, which is the 'Title' column you want. (The older df.ix indexer is deprecated and has been removed from pandas.) If this doesn't work, try the items() function.
The main idea is that calling df[column].astype(str) without assigning the result back leaves the original column unchanged.
I wrote a function.
The dataframe was appended to 3 times using append, but the result only contains the last addition.
======
- It was an error to declare the dataframe outside of the function first, so I declared it inside the function.
- Later, I referenced Hisframe10 outside of def AddDataframe(ymd, sb, vol):. Then I got the error below.
NameError: name 'Hisframe10' is not defined
import pandas as pd

def AddDataframe(ymd, sb, vol):
    data = {'yyyymmdd': [],
            'Sell': [],
            'Buy': [],
            'Volume': [],
            'JPX': [],
            'FutPrice': []}
    Hisframe8 = pd.DataFrame(data)
    print('')
    print('Hisframe8= ', Hisframe8)
    adddata = {'yyyymmdd': [ymd],
               'Sell': [sb],
               'Buy': ['Nan'],
               'Volume': [vol],
               'JPX': [-1],
               'FutPrice': [0.]}
    Hisframe10 = pd.DataFrame(adddata)
    return Hisframe8.append(Hisframe10)

AddDataframe('2019-05-03', 'sell', 123)
AddDataframe('2019-05-04', 'sell', 345)
AddDataframe('2019-05-05', 'sell', 456)
#Hisframe10  # err
======
I want to add 3 rows to the dataframe.
How should I do it?
https://imgur.com/i1lAB8M
You can make Hisframe8 global:
import pandas as pd

Hisframe8 = pd.DataFrame(columns=['yyyymmdd', 'Sell', 'Buy', 'Volume', 'JPX', 'FutPrice'])

def AddDataframe(ymd, sb, vol):
    global Hisframe8
    adddata = {'yyyymmdd': [ymd], 'Sell': [sb], 'Buy': ['Nan'], 'Volume': [vol], 'JPX': [-1], 'FutPrice': [0.]}
    Hisframe10 = pd.DataFrame(adddata)
    Hisframe8 = Hisframe8.append(Hisframe10)

AddDataframe('2019-05-03', 'sell', 123)
AddDataframe('2019-05-04', 'sell', 345)
AddDataframe('2019-05-05', 'sell', 456)
print(Hisframe8)
It is not necessary to create an additional dataframe and append it to the first.
You can instead just append the dictionary to the existing df, as shown here: append dictionary to data frame
Also, it may be better style to define the dataframe you are inserting into outside the function, since creating it is not the primary purpose of your function.
My suggestion:
import numpy as np
import pandas as pd

structure = {'date': [], 'Sell': [], 'Buy': [], 'Volume': [], 'JPX': [], 'FutPrice': []}
df = pd.DataFrame(structure)

def add_data(df, date, sb, vol):
    # scalar values, so append can treat the dict as a single row
    insert_dict = {'date': date, 'Sell': sb, 'Buy': np.nan,
                   'Volume': vol, 'JPX': -1, 'FutPrice': 0.}
    return df.append(insert_dict, ignore_index=True)
df_appended = add_data(df, '2019-01-01', 'sell', 456)
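Since DataFrame.append has since been removed from pandas (deprecated in 1.4, gone in 2.0), a pattern that sidesteps it entirely may be preferable (a sketch using the question's columns): accumulate plain dicts in a list and build the DataFrame once at the end.

```python
import pandas as pd

rows = []

def add_row(ymd, sb, vol):
    # collect plain dicts; constructing the frame once is cheaper than repeated appends
    rows.append({"yyyymmdd": ymd, "Sell": sb, "Buy": None,
                 "Volume": vol, "JPX": -1, "FutPrice": 0.0})

add_row("2019-05-03", "sell", 123)
add_row("2019-05-04", "sell", 345)
add_row("2019-05-05", "sell", 456)

Hisframe = pd.DataFrame(rows)
print(Hisframe)
```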
I hope this helps
Here is my Python code, which throws an error when executed.
def split_cell(s):
    a = s.split(".")
    b = a[1].split("::=")
    return (a[0].lower(), b[0].lower(), b[1].lower())

logic_tbl, logic_col, logic_value = split_cell(rules['logic_1'][ith_rule])
mems = logic_tbl[logic_tbl[logic_col]==logic_value]['mbr_id'].tolist()
The function split_cell works fine, and all the columns in logic_tbl are of object dtype.
Here is the traceback:
Got this corrected!
logic_tbl contains the name of a pandas dataframe.
logic_col contains the name of a column in that dataframe.
logic_value contains the value to match in the logic_col column of the logic_tbl dataframe.
mems = logic_tbl[logic_tbl[logic_col]==logic_value]['mbr_id'].tolist()
I was trying the above, but Python treats logic_tbl as a string, so no pandas dataframe-level operations happen.
So I created a dictionary like this:
dt_dict={}
dt_dict['a_med_clm_diag'] = a_med_clm_diag
And modified my code as below,
mems = dt_dict[logic_tbl][dt_dict[logic_tbl][logic_col]==logic_value]['mbr_id'].tolist()
This works as expected. I came upon this idea when I wrote
mems = logic_tbl[logic_tbl[logic_col]==logic_value,'mbr_id']
and got a message like "'logic_tbl' is a string. Nothing to filter."
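The name-to-object lookup described above can be sketched end-to-end like this (toy data; the table name, column name, and values are illustrative stand-ins for what split_cell would produce):

```python
import pandas as pd

a_med_clm_diag = pd.DataFrame({"diag": ["x", "y", "x"], "mbr_id": [1, 2, 3]})

# map string names (as parsed from the rule text) to the actual DataFrame objects
dt_dict = {"a_med_clm_diag": a_med_clm_diag}

logic_tbl, logic_col, logic_value = "a_med_clm_diag", "diag", "x"

# resolve the name, then filter rows with a boolean mask
mems = dt_dict[logic_tbl][dt_dict[logic_tbl][logic_col] == logic_value]["mbr_id"].tolist()
print(mems)
```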
Try writing that last statement like this:
filt = numpy.array([a == logic_value for a in logic_col])
mems = [i for indx, i in enumerate(logic_col) if filt[indx]]
Does this work?