TypeError: string indices must be integers, not str in Python

Here is my Python code, which throws an error when executed.
def split_cell(s):
    a = s.split(".")
    b = a[1].split("::=")
    return (a[0].lower(), b[0].lower(), b[1].lower())

logic_tbl, logic_col, logic_value = split_cell(rules['logic_1'][ith_rule])
mems = logic_tbl[logic_tbl[logic_col] == logic_value]['mbr_id'].tolist()
The function split_cell works fine, and all the columns in logic_tbl are of object dtype.
Here is the traceback:

Got this corrected!
logic_tbl contains the name of a pandas DataFrame.
logic_col contains the name of a column in that DataFrame.
logic_value contains the value to match against rows of the logic_col column in the logic_tbl DataFrame.
mems = logic_tbl[logic_tbl[logic_col]==logic_value]['mbr_id'].tolist()
I was trying the line above, but Python treats logic_tbl as a string, so no DataFrame-level operations happen.
So I created a dictionary like this:
dt_dict = {}
dt_dict['a_med_clm_diag'] = a_med_clm_diag
and modified my code as below:
mems = dt_dict[logic_tbl][dt_dict[logic_tbl][logic_col]==logic_value]['mbr_id'].tolist()
This works as expected. I got the idea when I wrote
mems = logic_tbl[logic_tbl[logic_col]==logic_value,'mbr_id']
and it threw a message like "'logic_tbl' is a string. Nothing to filter".
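A minimal, self-contained sketch of that name-to-DataFrame dictionary pattern (the sample data and values below are made up for illustration):

import pandas as pd

# Hypothetical sample data standing in for a_med_clm_diag
a_med_clm_diag = pd.DataFrame({
    "diag_cd": ["250", "401", "250"],
    "mbr_id": [101, 102, 103],
})

# Map DataFrame names (strings) to the DataFrame objects themselves
dt_dict = {"a_med_clm_diag": a_med_clm_diag}

# Values that would normally come from split_cell(...)
logic_tbl, logic_col, logic_value = "a_med_clm_diag", "diag_cd", "250"

df = dt_dict[logic_tbl]  # resolve the name to the actual DataFrame
mems = df[df[logic_col] == logic_value]["mbr_id"].tolist()
print(mems)  # [101, 103]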

Try writing that last statement like the code below (note that numpy.array is called with parentheses, with the list comprehension inside them):

import numpy

filt = numpy.array([a == logic_value for a in logic_col])
mems = [i for indx, i in enumerate(logic_col) if filt[indx]]

Does this work?

Related

Datetime comparison using f-strings in Python

Consider the following dataframe
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
Y['Date'] = pd.to_datetime(Y['Date'])
Now consider the following code snippet, in which I try to print slices of the dataframe filtered on the column "Date". However, it prints an empty dataframe:
for date in set(Y['Date']):
    print(Y.query(f'Date == {date.date()}'))
Essentially, I wanted to filter the dataframe on the column "Date" and do some processing on that in the loop. How do I achieve that?
The date needs to be referenced with the @ prefix in the query command, which tells query to substitute a local Python variable:
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
for date in set(Y['Date']):
    print(Y.query('Date == @date'))
Use "" because f-strings removed original "" and error is raised:
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
Y['Date'] = pd.to_datetime(Y['Date'])
for date in set(Y['Date']):
print(Y.query(f'Date == "{date}"'))
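If you would rather sidestep query-string quoting altogether, plain boolean indexing performs the same filter — a minimal sketch:

import pandas as pd

Y = pd.DataFrame([("2021-10-11", "john"), ("2021-10-12", "wick")],
                 columns=["Date", "Name"])
Y["Date"] = pd.to_datetime(Y["Date"])

for date in set(Y["Date"]):
    # Compare the column directly against the Timestamp; no string parsing involved
    print(Y[Y["Date"] == date])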

logger print error: not enough arguments for format string

I'm trying to fix a "logger print error: not enough arguments for format string" cropping up in a JupyterLab report, and I have tried a few solutions with no joy.
My dataframe looks like this:
df_1 = pd.DataFrame(df, columns = ['col1','col2','col3','col4','col5','col6','col7', 'col8', 'col9', 'col10'])
# I'm applying a % format because I only need the last four columns as percentages:
df_1['col7'] = df_1['col7'].apply("{0:.0f}%".format)
df_1['col8'] = df_1['col8'].apply("{0:.0f}%".format)
df_1['col9'] = df_1['col9'].apply("{0:.0f}%".format)
df_1['col10'] = df_1['col10'].apply("{0:.0f}%".format)
I want to maintain the table format/structure, so I'm not calling print(df_1), just:
df_1
The above works fine, but I can't seem to get past the "logger print error: not enough arguments for format string" error.
P.S. I've also tried formats like "{:.2%}" or "{0:.0%}", but they turn -3 into -300% (the "%" format spec multiplies by 100; see the sketch after this question).
Edit: fixed by removing '%Y-%m-%d' from the dataframe source query.
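To illustrate the -300% effect mentioned above — a quick check, assuming the columns hold plain numbers like -3:

# The "%" format spec multiplies by 100 before appending the sign
print(f"{-3:.0%}")   # -300%

# Appending a literal "%" instead keeps the number as-is
print(f"{-3:.0f}%")  # -3%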
If you are using Python 3, f-strings should do it. Note that apply expects a callable, so wrap the f-string in a lambda:
df_1['col7'] = df_1['col7'].apply(lambda v: f"{v:.0f}%")
df_1['col8'] = df_1['col8'].apply(lambda v: f"{v:.0f}%")
df_1['col9'] = df_1['col9'].apply(lambda v: f"{v:.0f}%")
df_1['col10'] = df_1['col10'].apply(lambda v: f"{v:.0f}%")

Appending a tuple to Pandas Dataframe

I'm having trouble appending a tuple to a pandas DataFrame inside a for loop.
I initialized the DataFrame to which all the tuples will be attached as follows:
self.URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
Then I enter a loop where I want to attach the tuple I create at each iteration. I'm doing it this way:
URM_test_tuple = pd.DataFrame({"playlist_id": [int(self.target_playlists_test[count])], "track_id": [playlist_tracks_list]})
self.URM_test.append(URM_test_tuple)
If I print URM_test_tuple I get a correct result, as follows:
playlist_id track_id
0 13317 [18940, 18902, 8892, 1365, 6806, 8972, 18944, ...
But when I print self.URM_test while debugging, I notice it is empty, and this is printed to the console:
Empty DataFrame
Columns: [playlist_id, track_id]
Index: []
Do you know what might be the bug in this code?
The append method in pandas creates a new object, as stated in the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
You could try assigning the new object back to URM_test in your loop:
URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
URM_test_tuple = pd.DataFrame({"playlist_id": ['foo'], "track_id": ['bar']})
URM_test = URM_test.append(URM_test_tuple)
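Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions the same pattern uses pd.concat — a minimal sketch:

import pandas as pd

URM_test = pd.DataFrame(columns=["playlist_id", "track_id"])
URM_test_tuple = pd.DataFrame({"playlist_id": ["foo"], "track_id": ["bar"]})

# concat also returns a new DataFrame, so assign the result back
URM_test = pd.concat([URM_test, URM_test_tuple], ignore_index=True)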

How to change all columns in csv file to str?

I am working on a script that imports an excel file, iterates through a column called "Title," and returns False if a certain keyword is present in "Title." The script runs until I get to the part where I want to export another csv file with a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd

data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added','Track Item','Retailer Item ID','UPC','Title',
                                 'Manufacturer','Brand','Client Product Group','Category',
                                 'Subcategory','Amazon Sub Category','Segment','Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)
excludes = ['chainsaw','pail','leaf blower','HYOUJIN','brush','dryer','genie','Genuine Joe',
            'backpack','curling iron','dog','cat','wig','animal','dryer',':','tea','Adidas',
            'Fila','Reebok','Puma','Nike','basket','extension','extensions','batteries',
            'battery','[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

def is_match(title, excludes=my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype=str)
For pandas.read_excel, you can pass the optional parameter dtype.
You can also use it to pass different data types for different columns,
e.g. dtype={'Retailer Item ID': int, 'Title': str}
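A quick way to confirm the effect (the path is the asker's; the check itself is generic):

# With dtype=str every column should come back as object (string) dtype
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx', dtype=str)
print(data.dtypes)  # expect "object" for every column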
At the line where you wrote
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
the astype(str) conversion only applies inside that one expression. df['Title'].astype(str) returns a new, converted Series; it does not modify the DataFrame in place, so df['Title'] still contains integers when you later call df['Title'].apply(is_match), and title.lower() fails on an int. If you want to iterate through the column positionally, you could try
match_titles = [e for e in df.iloc[:, 4]
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
df.iloc[:, 4] returns the fifth column of the dataframe df, which is the Title column you want (the older df.ix accessor has been removed from pandas). If this doesn't work, try the items() function.
The main idea is that calling a conversion like astype on df[column] without assigning the result back leaves the original contents untouched.
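For completeness, the usual fix for the original error is simply to assign the converted Series back — a minimal sketch with made-up data:

import pandas as pd

df = pd.DataFrame({"Title": [123, "leaf blower kit", 456]})

# astype returns a new Series; assign it back so df actually changes
df["Title"] = df["Title"].astype(str)

print(df["Title"].apply(lambda t: "leaf" in t.lower()))  # False, True, False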

Return unknown strings in a dataframe (extract unknown strings from a large dataset)

I have a large dataset, which I imported using read_csv as described below, that should contain only float measurements and NaN.
df = pd.read_csv(file_,parse_dates=[['Date','Time']],na_values = ['No Data','Bad Data','','No Sample'],low_memory=False)
When I apply df.dtypes, most of the columns come back as object type, which indicates that there are other objects in the dataframe that I am not aware of. I am looking for a way of identifying those strings and replacing them with NA values.
The first thing I wanted to do was convert everything to dtype = np.float, but I couldn't. Then I tried to read each (column, index) cell and return any string it finds.
I have tried something very inefficient (I am a beginner) and time consuming; it has worked for other dataframes, but here it raises an error:
TypeError: argument of type 'float' is not iterable
list_string = []
for i in range(len(df)):
    for j in range(len(df.columns)):
        x = df.iat[i, j]  # scalar value at row i, column j
        if isinstance(x, str) and '.' not in x:
            list_string.append(x)

list_string = pd.DataFrame(list_string, columns=["list_string"])
g = list_string.groupby('list_string').size()
Is there a simple way of detecting unknown strings in a large dataset? Thanks
You could try:
string_list = []
for col, series in df.items():  # iterating over all columns - perhaps only select `object` types
    string_list += [s for s in series.unique() if isinstance(s, str)]
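To mirror the groupby/size tally from the question, the collected strings can then be counted — a minimal follow-up sketch (note the list holds per-column uniques, so this counts how many columns contain each string):

import pandas as pd

counts = pd.Series(string_list).value_counts()
print(counts)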
