I'm having trouble appending a tuple to a pandas DataFrame inside a for loop.
I initialized the DataFrame to which all the tuples will be appended as follows:
self.URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
Then I enter a loop where I want to append the tuple created at each iteration. I'm doing it this way:
URM_test_tuple = pd.DataFrame({"playlist_id": [int(self.target_playlists_test[count])], "track_id": [playlist_tracks_list]})
self.URM_test.append(URM_test_tuple)
If I print URM_test_tuple I get a correct result, as follows:
playlist_id track_id
0 13317 [18940, 18902, 8892, 1365, 6806, 8972, 18944, ...
But when I print self.URM_test while debugging, I notice it's empty, and this is printed to the console:
Empty DataFrame
Columns: [playlist_id, track_id]
Index: []
Do you know what might be the bug in this code?
The append method in pandas creates a new object as stated in the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
You could try assigning the returned object back to URM_test in your loop:
URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
URM_test_tuple = pd.DataFrame({"playlist_id": ['foo'], "track_id": ['bar']})
URM_test = URM_test.append(URM_test_tuple)
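Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A more robust pattern (a sketch; the toy ids are illustrative, not from the question's data) is to collect the per-iteration frames in a list and call pd.concat once at the end:

```python
import pandas as pd

rows = []
for playlist_id, tracks in [(13317, [18940, 18902]), (13318, [8892])]:
    # build one single-row frame per iteration, mirroring URM_test_tuple above
    rows.append(pd.DataFrame({"playlist_id": [playlist_id], "track_id": [tracks]}))

# concatenate once at the end instead of appending inside the loop
URM_test = pd.concat(rows, ignore_index=True)
print(URM_test)
```

Besides working on current pandas versions, this avoids the quadratic cost of re-copying the frame on every iteration.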
I want my code to:
1. read data from a CSV and make a dataframe: "source_df"
2. see if the dataframe contains any columns specified in a list: "possible_columns"
3. call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values into a new dataframe: "destination_df"
Here it is:
import pandas as pd

#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)

#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)

#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']

#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']

def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

'''use the counter to call a unique function from the function list to replace the values in each
column whose header is found in the "possible_columns" list, insert the modified values in
"destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1

#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[counter]
This will not actually run the function; it just accesses the string value.
I would try and find another way to run these functions.
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")

def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')

fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

and change the function call to the following, so the function object is looked up and then actually invoked:
fix_functions_list[counter]()
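To make the difference concrete, here is a minimal, self-contained sketch of the dispatch pattern (the function bodies are stand-ins, not the question's actual fix functions): store the function objects in a dict and call them by index.

```python
def yes_no_fix():
    return "fixed yes/no"

def true_false_fix():
    return "fixed true/false"

# store the function objects themselves, not strings naming them
fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

# indexing retrieves the function; the trailing () actually calls it
result = fix_functions_list[0]()
print(result)  # fixed yes/no
```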
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)

possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No": "0", "Yes": "1"},
                'true/false': {'False': '1', 'True': '0'}}

old_columns = [column for column in source_df.columns if column not in possible_columns]
existing_columns = [column for column in source_df.columns if column in possible_columns]

new_df = source_df[existing_columns].copy()
for column in new_df.columns:
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]
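As an aside, pandas can apply a per-column value mapping in a single call via DataFrame.replace with a nested dict, avoiding both per-column functions and the explicit loop (a sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"yes/no": ["Yes", "No"], "true/false": ["True", "False"]})
mapping = {"yes/no": {"No": "0", "Yes": "1"},
           "true/false": {"False": "1", "True": "0"}}

# replace accepts {column: {old: new}} and leaves unlisted columns untouched
df = df.replace(mapping)
print(df)
```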
I have defined 10 different DataFrames A06_df, A07_df, etc., which pick up six different data point inputs in a daily time series over a number of years. To be able to work with them I need to do some formatting operations such as
A07_df=A07_df.fillna(0)
A07_df[A07_df < 0] = 0
A07_df.columns = col # col is defined
A07_df['oil']=A07_df['oil']*24
A07_df['water']=A07_df['water']*24
A07_df['gas']=A07_df['gas']*24
A07_df['water_inj']=0
A07_df['gas_inj']=0
A07_df=A07_df[['oil', 'water', 'gas','gaslift', 'water_inj', 'gas_inj', 'bhp', 'whp']]
etc., for a few more formatting operations.
Is there a nice way to use a for loop or something so I don't have to write each operation for each dataframe A06_df, A07_df, A08_df, etc.?
As an example, I have tried
list = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
for i in list:
    i = i.fillna(0)
But this does not do the trick.
Any help is appreciated
Since i.fillna() returns a new object (an updated copy of your original dataframe), i = i.fillna(0) rebinds the name i to that copy but does not change the list contents A06_df, A07_df, ....
I suggest you copy the updated content in a new list like this:
list_raw = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
list_updated = []
for i in list_raw:
    i = i.fillna(0)
    # More code here
    list_updated.append(i)
To simplify your future processing I would recommend using a dictionary of dataframes instead of a list of named variables.
dfs = {}
dfs['A0'] = ...
dfs['A1'] = ...

dfs_updated = {}
for k, i in dfs.items():
    i = i.fillna(0)
    # More code here
    dfs_updated[k] = i
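Putting the dict suggestion together: the shared formatting can live in one function applied to every frame. A minimal sketch with made-up data, showing only the fillna and negative-clipping steps from the question:

```python
import pandas as pd

def clean(df):
    # shared formatting applied to every frame
    df = df.fillna(0)
    df[df < 0] = 0
    return df

dfs = {"A06": pd.DataFrame({"oil": [1.0, None, -2.0]}),
       "A07": pd.DataFrame({"oil": [None, 3.0, 4.0]})}

# one pass replaces ten hand-written blocks of identical operations
dfs = {name: clean(df) for name, df in dfs.items()}
```

The remaining per-frame operations (column renaming, unit conversion, reordering) would slot into clean() the same way.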
I am working on a script that imports an Excel file, iterates through a column called "Title", and returns False if a certain keyword is present in "Title". The script runs until I get to the part where I want to export another csv file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd

data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added', 'Track Item', 'Retailer Item ID', 'UPC', 'Title',
                                 'Manufacturer', 'Brand', 'Client Product Group', 'Category',
                                 'Subcategory', 'Amazon Sub Category', 'Segment', 'Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)

excludes = ['chainsaw', 'pail', 'leaf blower', 'HYOUJIN', 'brush', 'dryer', 'genie', 'Genuine Joe',
            'backpack', 'curling iron', 'dog', 'cat', 'wig', 'animal', 'dryer', ':', 'tea', 'Adidas',
            'Fila', 'Reebok', 'Puma', 'Nike', 'basket', 'extension', 'extensions', 'batteries',
            'battery', '[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

def is_match(title, excludes=my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')
For pandas.read_excel, you can pass the optional parameter dtype.
You can also use it to pass different data types for different columns,
e.g. dtype={'Retailer Item ID': int, 'Title': str}
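An aside worth spelling out: astype returns a new Series rather than modifying the column in place, which is why the question's bare df['Title'].astype(str) had no effect. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Title": [12345, "chainsaw pail"]})

# astype returns a copy; assign it back to actually change the column
df["Title"] = df["Title"].astype(str)

types = {type(v) for v in df["Title"]}
print(types)
```

With the assignment in place, a later .apply(lambda t: t.lower()) no longer hits integers.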
At the line where you wrote
match_titles = [e for e in df.Title.astype(str) if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
Python can still hand you an integer as the variable e rather than the string you expect. This happens because df['Title'].astype(str) returns a new Series without changing the dataframe in place, so any later read of the column still sees the original integer values. If you want to iterate through the column you could also try
match_titles = [e for e in df.iloc[:, 4] if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
df.iloc[:, 4] returns the fifth column (position 4) of the dataframe df, which is the 'Title' column you want. (The older df.ix indexer is deprecated and has been removed from pandas.) If this doesn't work, try the items() function.
The main idea is that calling df[column].astype(str) without assigning the result back leaves the original column unchanged.
I wrote a function.
The dataframe was appended to 3 times using append, but the result only contains the last addition.
======
- It was an error to declare the dataframe outside of the function first, so I declared it inside the function.
- Later, I referenced Hisframe10 outside of def AddDataframe(ymd, sb, vol):. Then I got the error below.
NameError: name 'Hisframe10' is not defined
import pandas as pd

def AddDataframe(ymd, sb, vol):
    data = {'yyyymmdd': [],
            'Sell': [],
            'Buy': [],
            'Volume': [],
            'JPX': [],
            'FutPrice': []}
    Hisframe8 = pd.DataFrame(data)
    print('')
    print('Hisframe8= ', Hisframe8)
    adddata = {'yyyymmdd': [ymd],
               'Sell': [sb],
               'Buy': ['Nan'],
               'Volume': [vol],
               'JPX': [-1],
               'FutPrice': [0.]}
    Hisframe10 = pd.DataFrame(adddata)
    return Hisframe8.append(Hisframe10)

AddDataframe('2019-05-03', 'sell', 123)
AddDataframe('2019-05-04', 'sell', 345)
AddDataframe('2019-05-05', 'sell', 456)
#Hisframe10  # err
======
I want to add 3 rows to the dataframe.
How should I do it?
https://imgur.com/i1lAB8M
You can make Hisframe8 global:
import pandas as pd

Hisframe8 = pd.DataFrame(columns=['yyyymmdd', 'Sell', 'Buy', 'Volume', 'JPX', 'FutPrice'])

def AddDataframe(ymd, sb, vol):
    global Hisframe8
    adddata = {'yyyymmdd': [ymd], 'Sell': [sb], 'Buy': ['Nan'], 'Volume': [vol], 'JPX': [-1], 'FutPrice': [0.]}
    Hisframe10 = pd.DataFrame(adddata)
    Hisframe8 = Hisframe8.append(Hisframe10)

AddDataframe('2019-05-03', 'sell', 123)
AddDataframe('2019-05-04', 'sell', 345)
AddDataframe('2019-05-05', 'sell', 456)
print(Hisframe8)
It is not necessary to create an additional dataframe and append it to the first.
You can instead just append the dictionary to the existing df, as shown here: append dictionary to data frame
Also, it may be better style to define the dataframe you are inserting into outside the function, since creating it is not the primary purpose of your function.
My suggestion:
import numpy as np
import pandas as pd

structure = {'date': [], 'Sell': [], 'Buy': [], 'Volume': [], 'JPX': [], 'FutPrice': []}
df = pd.DataFrame(structure)

def add_data(df, date, sb, vol):
    # scalar values, so append can treat the dict as a single row
    insert_dict = {'date': date, 'Sell': sb, 'Buy': np.nan,
                   'Volume': vol, 'JPX': -1, 'FutPrice': 0.}
    return df.append(insert_dict, ignore_index=True)
df_appended = add_data(df, '2019-01-01', 'sell', 456)
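Since DataFrame.append has since been removed from pandas (deprecated in 1.4, gone in 2.0), a pattern that sidesteps it entirely may be preferable (a sketch using the question's columns): accumulate plain dicts in a list and build the DataFrame once at the end.

```python
import pandas as pd

rows = []

def add_row(ymd, sb, vol):
    # collect plain dicts; constructing the frame once is cheaper than repeated appends
    rows.append({"yyyymmdd": ymd, "Sell": sb, "Buy": None,
                 "Volume": vol, "JPX": -1, "FutPrice": 0.0})

add_row("2019-05-03", "sell", 123)
add_row("2019-05-04", "sell", 345)
add_row("2019-05-05", "sell", 456)

Hisframe = pd.DataFrame(rows)
print(Hisframe)
```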
I hope this helps
Here is my Python code, which throws an error when executed.
def split_cell(s):
    a = s.split(".")
    b = a[1].split("::=")
    return (a[0].lower(), b[0].lower(), b[1].lower())

logic_tbl, logic_col, logic_value = split_cell(rules['logic_1'][ith_rule])
mems = logic_tbl[logic_tbl[logic_col]==logic_value]['mbr_id'].tolist()
The function split_cell works fine, and all the columns in logic_tbl are of object dtype.
Here is the traceback:
Got this corrected!
logic_tbl contains the name of a pandas dataframe.
logic_col contains the name of a column in that dataframe.
logic_value contains the value to match in the logic_col column of the logic_tbl dataframe.
mems = logic_tbl[logic_tbl[logic_col]==logic_value]['mbr_id'].tolist()
I was trying the above, but Python treats logic_tbl as a string, so no pandas dataframe-level operations happen.
So I created a dictionary like this:
dt_dict={}
dt_dict['a_med_clm_diag'] = a_med_clm_diag
And modified my code as below,
mems = dt_dict[logic_tbl][dt_dict[logic_tbl][logic_col]==logic_value]['mbr_id'].tolist()
This works as expected. I came upon this idea when I wrote
mems = logic_tbl[logic_tbl[logic_col]==logic_value,'mbr_id']
and got a message like "'logic_tbl' is a string. Nothing to filter."
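The name-to-object lookup described above can be sketched end-to-end like this (toy data; the table name, column name, and values are illustrative stand-ins for what split_cell would produce):

```python
import pandas as pd

a_med_clm_diag = pd.DataFrame({"diag": ["x", "y", "x"], "mbr_id": [1, 2, 3]})

# map string names (as parsed from the rule text) to the actual DataFrame objects
dt_dict = {"a_med_clm_diag": a_med_clm_diag}

logic_tbl, logic_col, logic_value = "a_med_clm_diag", "diag", "x"

# resolve the name, then filter rows with a boolean mask
mems = dt_dict[logic_tbl][dt_dict[logic_tbl][logic_col] == logic_value]["mbr_id"].tolist()
print(mems)
```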
Try writing that last statement like this:
filt = numpy.array([a == logic_value for a in logic_col])
mems = [i for indx, i in enumerate(logic_col) if filt[indx]]
Does this work?