Can't change column name from 'NaN' to something else - Python

I tried converting the column names to strings and then renaming them, without success; the names stayed NaN:
data.rename(columns=str).rename(columns={'NaN':'Tip Analiza','NaN':'Limite' }, inplace=True)
I also tried using the in operator to replace NaN, without success; it raised an error:
TypeError: argument of type 'float' is not iterable
data.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in data.columns.values]).ffill().values.flatten()
What should I try?

Try:
data.columns=map(str, data)
# in case of unique column names
data=data.rename(columns={"col1": "rnm1", "col2": "rnm2"})
# otherwise ignore first line, and just do
data.columns=["rnm1", "rnm2"]
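If the column labels are real NaN values (floats, not the string 'NaN'), a dictionary-based rename cannot tell two of them apart, because a dict keeps each key only once. A minimal sketch, assuming a toy frame with two NaN labels (the data and the "Proba" column are made up for illustration), fills the NaN labels by position instead:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): two columns are labelled with a float NaN.
data = pd.DataFrame([[1, 2, 3]], columns=["Proba", np.nan, np.nan])

# Replacement names, consumed in order, one per NaN label.
new_names = iter(["Tip Analiza", "Limite"])

# pd.isna works on both strings and floats, unlike the `in` operator.
data.columns = [next(new_names) if pd.isna(c) else c for c in data.columns]

print(list(data.columns))
```

Using pd.isna avoids the "argument of type 'float' is not iterable" error, which comes from applying `in` to a float label.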

Related

AttributeError: 'Series' object has no attribute 'Mean_μg_L'

Why am I getting this error if the column name exists?
I have tried everything and am out of ideas.
Since the AttributeError is raised at the first column whose name contains a special character (µ), I would suggest one of these two solutions:
1. Use replace right before the loop to get rid of the special character:
df.columns = df.columns.str.replace(r"_\wg_", "_ug_", regex=True)
# change df to Table_1_1a_Tarawa_Terrace_System_1975_to_1985
Then, inside the loop, use row.Mean_ug_L, ... instead of row.Mean_µg_L, ...
2. Use row["col_name"] (highly recommended) rather than row.col_name to refer to the column:
for index, row in Table_1_1a_Tarawa_Terrace_System_1975_to_1985.iterrows():
    SQL_VALUES_Tarawa = (row["Chemicals"], row["Contamminant"], row["Mean_µg_L"], row["Median_µg_L"], row["Range_µg_L"], row["Num_Months_Greater_MCL"], row["Num_Months_Greater_100_µg_L"])
    cursor.execute(SQL_insert_Tarawa, SQL_VALUES_Tarawa)
    counting = cursor.rowcount
    print(counting, "Record added")
conn.commit()
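One possible explanation for why attribute access fails while bracket access works: Python normalizes identifiers to NFKC when parsing, and NFKC maps the micro sign (U+00B5) to the Greek small letter mu (U+03BC). The sketch below assumes the column name contains the micro sign; the frame and its values are invented for illustration.

```python
import unicodedata
import pandas as pd

MICRO = "\u00b5"  # MICRO SIGN, assumed to be the character in the column name
MU = "\u03bc"     # GREEK SMALL LETTER MU

col = f"Mean_{MICRO}g_L"
df = pd.DataFrame({col: [1.5, 2.5]})  # hypothetical data

# What Python turns the identifier in `row.Mean_µg_L` into before the lookup.
normalized = unicodedata.normalize("NFKC", col)
print(normalized == col)          # the two names differ
print(normalized in df.columns)   # so the normalized name is not a column

# String keys are not normalized, so bracket access finds the column.
values = [row[col] for _, row in df.iterrows()]
print(values)
```

If the column name already uses the Greek mu, this mismatch does not arise and the cause lies elsewhere; bracket access is the safer choice either way.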

If conditions Pandas

All the values in my data frame are int, and I am trying to apply this condition:
if the value is greater than 1, multiply it by 1; otherwise multiply it by -1. The result should go into a new column, but I get this error:
'>' not supported between instances of 'str' and 'int'
Below is the code I wrote:
Cfile["Equ"] = [i*1 if i>1 else i*-1 for i in Cfile["Net Salary"]]
s = df['Net Salary']
df['Equ'] = np.where(s.gt(1), s, s.mul(-1))
Use this code:
df['new'] = (df.net>1)*df.net - (1>=df.net)*df.net
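The error message says the column holds strings, so the comparison fails before any of the approaches above can run. A minimal sketch, assuming the values are numeric strings (the sample data is made up), converts the column with pd.to_numeric first and then applies np.where:

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as strings, as the error message suggests.
df = pd.DataFrame({"Net Salary": ["5", "-3", "1", "12"]})

# Convert to numbers; values that cannot be parsed would become NaN.
s = pd.to_numeric(df["Net Salary"], errors="coerce")

# Keep values greater than 1, flip the sign of everything else.
df["Equ"] = np.where(s.gt(1), s, s.mul(-1))

print(df)
```

With errors="coerce", unparseable entries become NaN rather than raising, so it is worth checking s.isna().sum() on real data.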

Cannot resolve stack() due to type mismatch

I have a pyspark code that looks like this:
from pyspark.sql.functions import expr
unpivotExpr = """stack(14, 'UeEnd', UeEnd,
'Encedreco', Endereco,
'UeSitFun', UeSitFun,
'SitacaoEscola', SituacaoEscola,
'Creche', Creche,
'PreEscola', PreEscola,
'FundAnosIniciais', FundAnosIniciais,
'FundAnosFinais', FundAnosFinais,
'EnsinoMedio', EnsinoMedio,
'Profissionalizante', Profissionalizante,
'EJA', EJA,
'EdEspecial', EdEspecial,
'Conveniada', Conveniada,
'TipoAtoCriacao', TipoAtoCriacao)
as (atributo, valor)"""
unpivotDf = df.select("Id", expr(unpivotExpr))
When I run it I get this Error:
cannot resolve 'stack(14, 'UeEnd', `UeEnd`, 'Encedreco', `Endereco`, 'UeSitFun', `UeSitFun`,
'SitacaoEscola', `SituacaoEscola`, 'Creche', `Creche`, 'PreEscola', `PreEscola`,
'FundAnosIniciais', `FundAnosIniciais`, 'FundAnosFinais', `FundAnosFinais`, 'EnsinoMedio',
`EnsinoMedio`, 'Profissionalizante', `Profissionalizante`, 'EJA', `EJA`, 'EdEspecial',
`EdEspecial`, 'Conveniada', `Conveniada`, 'TipoAtoCriacao', `TipoAtoCriacao`)'
due to data type mismatch: Argument 2 (string) != Argument 6 (bigint); line 1 pos 0;
What might be causing this problem?
When you unpivot a group of columns, all of their values are going to end up in the same column. Because of that, you should first make sure that all of the columns you are trying to unpivot into one have the same data types. Otherwise you would have a column with multiple different types in different rows.
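The error message suggests that UeSitFun is a bigint while the value column is a string. One common way around this is to cast every unpivoted column to string inside the stack expression. The sketch below only builds the expression text; build_stack_expr is a hypothetical helper, and the shortened column list is for illustration.

```python
def build_stack_expr(columns, key_name="atributo", value_name="valor"):
    """Build a Spark SQL stack() expression that casts every value to string."""
    pairs = ", ".join(f"'{c}', cast({c} as string)" for c in columns)
    return f"stack({len(columns)}, {pairs}) as ({key_name}, {value_name})"


# Shortened column list for illustration; use all 14 columns in practice.
columns = ["UeEnd", "Endereco", "UeSitFun", "Creche"]
unpivot_expr = build_stack_expr(columns)
print(unpivot_expr)

# With an existing DataFrame df, the expression would be used as before:
# unpivotDf = df.select("Id", expr(unpivot_expr))
```

Generating the label and the column reference from the same name also keeps them from drifting apart through typos.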

convert pandas series (with strings) to python list

It's probably a silly thing but I can't seem to correctly convert a pandas series originally got from an excel sheet to a list.
dfCI is created by importing data from an excel sheet and looks like this:
tab      var          val
MsrData  sortfield    DetailID
MsrData  strow        4
MsrData  inputneeded  "MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided","BiMonthlyTest"
# get list of cols for which input is needed
cols = dfCI[((dfCI['var'] == 'inputneeded') & (dfCI['tab'] == 'MsrData'))]['val'].values.tolist()
print(cols)
>> ['"MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided", "BiMonthlyTest"']
# replace null text with text
invalid = 'Input Needed'
for col in cols:
    dfMSR[col] = np.where((dfMSR[col].isnull()), invalid, dfMSR[col])
However, the second set of (single) quotes, added when I converted cols from a series to a list, turns all the column names into a single value, so that
col = '"MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided", "BiMonthlyTest"'
The desired output for cols is
cols = ["MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided", "BiMonthlyTest"]
What am I doing wrong?
Once you've got col, you can convert it to your expected output:
In [1109]: col = '"MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided", "BiMonthlyTest"'
In [1114]: cols = [i.strip() for i in col.replace('"', '').split(',')]
In [1115]: cols
Out[1115]: ['MeasDescriptionTest', 'SiteLocTest', 'SavingsCalcsProvided', 'BiMonthlyTest']
Another possible solution that comes to mind given the structure of cols is:
list(eval(cols[0])) # ['MeasDescriptionTest', 'SiteLocTest', 'SavingsCalcsProvided', 'BiMonthlyTest']
Although this is valid, it's less safe, and I would go with the list comprehension as @MayankPorwal suggested.
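A middle ground between the two approaches is ast.literal_eval from the standard library, which parses Python literals without executing arbitrary code. A minimal sketch, assuming the cell always contains quoted names separated by commas:

```python
import ast

raw = '"MeasDescriptionTest", "SiteLocTest", "SavingsCalcsProvided", "BiMonthlyTest"'

# The comma-separated quoted strings parse as a tuple of str.
parsed = ast.literal_eval(raw)

# A cell with a single quoted name parses as a plain str, so wrap that case.
cols = [parsed] if isinstance(parsed, str) else list(parsed)

print(cols)
```

Unlike eval, literal_eval raises an error on anything that is not a literal, such as a function call.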

How to change all columns in csv file to str?

I am working on a script that imports an Excel file, iterates through a column called "Title", and returns False if a certain keyword is present in "Title". The script runs until I get to the part where I want to export another CSV file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I converted df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added','Track Item', 'Retailer Item ID','UPC','Title','Manufacturer','Brand',
                                 'Client Product Group','Category','Subcategory',
                                 'Amazon Sub Category','Segment','Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)
excludes = ['chainsaw','pail','leaf blower','HYOUJIN','brush','dryer','genie','Genuine Joe',
            'backpack','curling iron','dog','cat','wig','animal','dryer',':','tea', 'Adidas', 'Fila',
            'Reebok','Puma','Nike','basket','extension','extensions','batteries','battery','[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
def is_match(title, excludes = my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')
pandas.read_excel accepts an optional dtype parameter.
You can also use it to set different data types for different columns,
e.g. dtype={'Retailer Item ID': int, 'Title': str}
At the line where you wrote
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
each e is already a string, because iterating over df.Title.astype(str) yields the converted values of the column, not its index. The error comes from the later call df['Title'].apply(is_match): astype returns a new Series and does not modify df, so df['Title'] still holds integers at that point. Assign the result back with df['Title'] = df['Title'].astype(str) before calling apply.
If you prefer to select the column by position, note that df.ix was removed in pandas 1.0; use df.iloc instead:
match_titles = [e for e in df.iloc[:, 4].astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
df.iloc[:, 4] returns the fifth column of df, which is Title in the column list above.
The main idea is that astype returns a copy; unless you assign it back, the DataFrame keeps its original values.
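A minimal sketch of the filtering step on a small in-memory frame (the rows and the shortened excludes list are made up for illustration). It assigns the result of astype(str) back to the column before calling apply, which is one way to avoid the 'int' object has no attribute 'lower' error:

```python
import pandas as pd

# Hypothetical data: one Title is an integer, as can happen when reading Excel.
df = pd.DataFrame({
    "Retailer Item ID": [101, 102, 103],
    "Title": ["Leaf Blower 3000", 12345, "Desk lamp"],
})

excludes = ["leaf blower", "dog"]
my_excludes = [set(x.lower().split()) for x in excludes]


def is_match(title, excludes=my_excludes):
    """Return True if every word of any exclude phrase appears in the title."""
    return any(keywords.issubset(title.lower().split()) for keywords in excludes)


# astype returns a new Series, so the result must be assigned back.
df["Title"] = df["Title"].astype(str)

df["match_titles"] = df["Title"].apply(is_match)
result = df.loc[df["match_titles"], "Retailer Item ID"]
print(result.tolist())
```

Reading the file with dtype=str, as suggested above, achieves the same effect at import time.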
