The function will have many more conditional statements, but to start off, and as far as I've troubleshot, I get the error: 'str' object has no attribute 'isin'. I've tried several things to no avail.
def categorise(row):
    if (row['state'] == 'FL') & (row['city'].isin(['MIAMI', 'TALLAHASSEE', 'ORLANDO'])):
        return 1
    ...
df['colF'] = df.apply(lambda row: categorise(row), axis=1)
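The error arises because with axis=1 each row['city'] is a plain Python string, and isin only exists on a pandas Series. A minimal sketch of two possible fixes follows. The sample data, the FL_CITIES name and the fallback value 0 are assumptions made for illustration, not part of the original function.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    'state': ['FL', 'FL', 'GA'],
    'city': ['MIAMI', 'TAMPA', 'ATLANTA'],
})

FL_CITIES = {'MIAMI', 'TALLAHASSEE', 'ORLANDO'}

def categorise(row):
    # Inside apply(axis=1) each field is a scalar, so use `and` / `in`.
    if row['state'] == 'FL' and row['city'] in FL_CITIES:
        return 1
    return 0  # assumed placeholder for the remaining conditions

df['colF'] = df.apply(categorise, axis=1)

# Vectorised alternative: here the columns are Series, so isin is available.
mask = (df['state'] == 'FL') & df['city'].isin(FL_CITIES)
df['colF_vec'] = mask.astype(int)
```

The vectorised form tends to be faster on large frames, while the row-wise form may be easier to extend with many branches.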
I have tried converting the column names to strings and then renaming them. No success: the names were left as NaN.
data.rename(columns=str).rename(columns={'NaN':'Tip Analiza','NaN':'Limite' }, inplace=True)
I tried to use the in operator to replace the NaN names. No success, it gave an error:
TypeError: argument of type 'float' is not iterable
data.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in data.columns.values]).ffill().values.flatten()
What should I try?
Try:
data.columns=map(str, data)
# in case of unique column names
data=data.rename(columns={"col1": "rnm1", "col2": "rnm2"})
# otherwise ignore first line, and just do
data.columns=["rnm1", "rnm2"]
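The TypeError in the question comes from testing 'Unnamed:' in x when x is a float NaN rather than a string. A small sketch of one way to rename only the NaN labels, in order, is shown here. The 'Proba' column and the sample values are hypothetical; 'Tip Analiza' and 'Limite' are the names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose last two column labels are NaN.
data = pd.DataFrame([[1, 2, 3]])
data.columns = ['Proba', np.nan, np.nan]

# Replacement names, consumed in order for each NaN label encountered.
new_names = iter(['Tip Analiza', 'Limite'])

# pd.isna works for both strings and floats, unlike the `in` test.
data.columns = [next(new_names) if pd.isna(col) else col
                for col in data.columns]
```

This assumes there are at least as many replacement names as NaN labels; otherwise next() raises StopIteration.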
I am working on a script that imports an Excel file, iterates through a column called "Title", and flags a row if a certain keyword is present in "Title". The script runs until I get to the part where I want to export another csv file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed the df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added', 'Track Item', 'Retailer Item ID', 'UPC', 'Title',
                                 'Manufacturer', 'Brand', 'Client Product Group', 'Category', 'Subcategory',
                                 'Amazon Sub Category', 'Segment', 'Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)
excludes = ['chainsaw', 'pail', 'leaf blower', 'HYOUJIN', 'brush', 'dryer', 'genie', 'Genuine Joe',
            'backpack', 'curling iron', 'dog', 'cat', 'wig', 'animal', 'dryer', ':', 'tea', 'Adidas', 'Fila',
            'Reebok', 'Puma', 'Nike', 'basket', 'extension', 'extensions', 'batteries', 'battery', '[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
def is_match(title, excludes = my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')
pandas.read_excel accepts an optional dtype parameter.
You can also pass a dict to set a different data type for each column:
e.g. dtype={'Retailer Item ID': int, 'Title': str}
At the line where you wrote
match_titles = [e for e in df.Title.astype(str) if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
every e is already a string: iterating over a Series yields its values, not its index, and astype(str) has converted them. So this line is not where the error comes from.
The error comes from df['Title'].apply(is_match). df['Title'].astype(str) returns a new Series and leaves df unchanged, so the column still holds integers wherever Excel stored a numeric title. Assign the result back:
df['Title'] = df['Title'].astype(str)
The main idea is that astype does not modify the column in place; unless you assign its result, later code still sees the original values.
Note also that df.ix has been removed from pandas (use df.loc or df.iloc instead) and that Series.iteritems() was removed in pandas 2.0 in favour of items().
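Independently of how the file is read, a quick way to check whether the string conversion actually reached the column is to inspect a value before and after assigning the result of astype back. The sketch below uses a small hypothetical in-memory frame instead of the Excel file, and a shortened excludes list.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: one title is a bare integer.
df = pd.DataFrame({
    'Title': ['Leaf Blower 3000', 12345, 'Garden hose'],
    'Retailer Item ID': [1, 2, 3],
})

my_excludes = [set(x.lower().split()) for x in ['leaf blower', 'dog']]

def is_match(title, excludes=my_excludes):
    # True if every word of any exclude phrase appears in the title.
    return any(keywords.issubset(title.lower().split()) for keywords in excludes)

# astype returns a new Series; without assignment the column is unchanged.
df['Title'].astype(str)
still_int = isinstance(df['Title'].iloc[1], int)

# Assigning the result back makes every title a string, so .lower() works.
df['Title'] = df['Title'].astype(str)
df['match_titles'] = df['Title'].apply(is_match)
```

Here still_int is True after the unassigned call, which is the situation that produces the 'int' object has no attribute 'lower' error.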
Given a cell, I want to know the value of the cell right before it (same row, previous column).
Here is my code and I thought it was working but...:
def excel_test(col_num, sheet_object):
    for cell in sheet_object.columns[col_num]:
        prev_col = (column_index_from_string(cell.column))
        row = cell.row
        prev_cell = sheet_object.cell(row, prev_col)
I keep getting this error:
coordinate = coordinate.upper().replace('$', '')
builtins.AttributeError: 'int' object has no attribute 'upper'
I have also tried this:
def excel_test(col_num, sheet_object):
    for cell in sheet_object.columns[col_num]:
        prev_col = (column_index_from_string(cell.column))
        row = cell.row
        prev_cell = sheet_object.cell(row, get_column_letter(prev_col))
Can somebody tell me how I can access that? I've also imported everything that needs to be imported.
You should look at the cell.offset() method.
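A minimal sketch of how cell.offset() might be used for this, assuming a recent openpyxl (2.6 or later, where cell.column is an integer). The previous_values helper and the sample workbook are hypothetical and only serve the illustration.

```python
from openpyxl import Workbook

def previous_values(sheet, col_num):
    """Return, for each cell in column col_num (1-based), the value of
    the cell immediately to its left, or None for the first column."""
    values = []
    for (cell,) in sheet.iter_rows(min_col=col_num, max_col=col_num):
        if cell.column > 1:
            # offset(row=0, column=-1) is the same row, one column left.
            values.append(cell.offset(row=0, column=-1).value)
        else:
            values.append(None)  # nothing exists left of column A
    return values

# Hypothetical in-memory workbook used only to exercise the helper.
wb = Workbook()
ws = wb.active
ws.append(['a', 'b', 'c'])
ws.append(['d', 'e', 'f'])

left_of_b = previous_values(ws, 2)
```

Using offset avoids converting between column letters and indices altogether, which is where the original attempts ran into the 'upper' error.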
I am trying to do a filter operation to get all the rows where the length of my variable country is less than 4 and I keep getting errors no matter what I do.
This is the current code (using the Python API)
uniqueRegions = sqlContext.sql("SELECT country, city FROM df")
uniqueRegions = uniqueRegions.rdd
uniqueRegions = uniqueRegions.distinct()
uniqueRegions = uniqueRegions.filter(lambda line: len(line.country) < 4)
This is the error
TypeError: object of type 'NoneType' has no len()
And the first row (done with rdd.first):
Row(country=u'xxxxxx', city=u'xxxxxx')
Any suggestion on how to solve this?
Thanks.
You have a database record where the country is NULL. The length of that doesn't make sense. What should it do when there's no country set?
Maybe you want to filter those records out in SQL: SELECT country, city FROM df WHERE country IS NOT NULL. Or handle them in the lambda: lambda l: l.country is not None and len(l.country) < 4 drops rows with no country, while lambda l: l.country is None or len(l.country) < 4 keeps them, depending on your logic.
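The two predicates can be compared without a Spark cluster. In this sketch a namedtuple stands in for pyspark.sql.Row and a plain list stands in for the RDD; both are assumptions for illustration, as are the sample rows and function names.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; attribute access works the same way.
Row = namedtuple('Row', ['country', 'city'])

rows = [
    Row('USA', 'Miami'),
    Row(None, 'Nowhere'),      # the NULL country that triggers the error
    Row('Germany', 'Berlin'),
    Row('UK', 'London'),
]

def short_country(line):
    # Drops rows with no country; len() is only reached for real strings.
    return line.country is not None and len(line.country) < 4

def short_or_missing_country(line):
    # Keeps rows with no country alongside the short ones.
    return line.country is None or len(line.country) < 4

short = list(filter(short_country, rows))
short_or_missing = list(filter(short_or_missing_country, rows))
```

With the RDD from the question, either function could presumably be passed directly, as in uniqueRegions.filter(short_country).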