import pandas as pd
df = pd.DataFrame({'a': ['Anakin Ana', 'Anakin Ana, Chris Cannon', 'Chris Cannon', 'Bella Bold'],
                   'b': ['Bella Bold, Chris Cannon', 'Donald Deakon', 'Bella Bold', 'Bella Bold'],
                   'c': ['Chris Cannon', 'Chris Cannon, Donald Deakon', 'Chris Cannon', 'Anakin Ana, Bella Bold']},
                  index=[0, 1, 2, 3])
Hi everyone,
I'm trying to count how many names two columns have in common in each row.
Above is an example of what my data looks like. At first I got the error 'float' object has no attribute 'split'. From some searching, it seems the error comes from missing data, which is read as float (NaN). But even after converting the column to string, I keep getting an error.
Below is my code.
import pandas as pd
import csv
filepath = "C:/Users/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')
df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
df['overlap_count'] = df['word_overlap'].str.len()
df.to_csv('creditdata3.csv',mode='a',index=False)
And here is the error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-b85ac8637aae> in <module>
4 df = pd.read_csv(filepath,encoding='utf-8')
5
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
7 df['overlap_count'] = df['word_overlap'].str.len()
8
<ipython-input-21-b85ac8637aae> in <listcomp>(.0)
4 df = pd.read_csv(filepath,encoding='utf-8')
5
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
7 df['overlap_count'] = df['word_overlap'].str.len()
8
AttributeError: 'float' object has no attribute 'astype'
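astype is a method of pandas and NumPy objects (DataFrame, Series, ndarray), but x[8] is a single value: because you've already indexed the row x, a missing cell comes back as a plain Python float (NaN), which has no astype method. Convert the value with the built-in str() instead.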
Try this:
df['word_overlap'] = [set(str(x[8]).split(",")) & set(str(x[10]).split(",")) for x in df.values]
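Two pitfalls remain with this version. split(",") keeps the space after each comma, so "Anakin Ana, Chris Cannon" becomes ['Anakin Ana', ' Chris Cannon'] and ' Chris Cannon' never matches 'Chris Cannon'. Also, str(nan) turns a missing cell into the name 'nan', so two empty cells would count as one shared name. A minimal sketch that trims names and treats missing cells as empty, using columns a and c of the example above plus a hypothetical fifth row with a missing value (names_in is a made-up helper name):

```python
import numpy as np
import pandas as pd

# Columns a and c from the example, plus a hypothetical 5th row with a missing value.
df = pd.DataFrame({
    'a': ['Anakin Ana', 'Anakin Ana, Chris Cannon', 'Chris Cannon', 'Bella Bold', np.nan],
    'c': ['Chris Cannon', 'Chris Cannon, Donald Deakon', 'Chris Cannon', 'Anakin Ana, Bella Bold', 'Chris Cannon'],
})

def names_in(cell):
    """Hypothetical helper: comma-separated cell -> set of trimmed names; missing -> empty set."""
    if pd.isna(cell):
        return set()
    # strip() removes the space left after each comma, so ' Chris Cannon' == 'Chris Cannon'
    return {name.strip() for name in str(cell).split(',') if name.strip()}

df['word_overlap'] = [names_in(a) & names_in(c) for a, c in zip(df['a'], df['c'])]
df['overlap_count'] = df['word_overlap'].map(len)
print(df[['word_overlap', 'overlap_count']])
```

For the real file, the same helper can be applied to the positions used in the question, e.g. zip(df.iloc[:, 8], df.iloc[:, 10]).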
import pandas as pd
import csv
filepath = "C:/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')
def f(columns):
    f_desc, f_def = str(columns[6]), str(columns[7])
    common = set(f_desc.split(",")).intersection(set(f_def.split(",")))
    return common, len(common)
df[['word_overlap', 'word_count']] = df.apply(f, axis=1, raw=True).apply(pd.Series)
df.to_csv('creditdata3.csv',mode='a',index=False)
I found another way to do it. Thank you, everyone!
I have a dataframe where I want to create a Dummy variable that takes the value 1 when the Asset Class starts with a D. I want to have all variants that start with a D. How would you do it?
The data looks like
dic = {'Asset Class': ['D.1', 'D.12', 'D.34','nan', 'F.3', 'G.12', 'D.2', 'nan']}
df = pd.DataFrame(dic)
What I want to have is
dic_want = {'Asset Class': ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan'],
'Asset Dummy': [1,1,1,0,0,0,1,0]}
df_want = pd.DataFrame(dic_want)
I tried
df_want["Asset Dummy"] = ((df["Asset Class"] == df.filter(like="D"))).astype(int)
where I get the following error message: ValueError: Columns must be same length as key
I also tried
CSDB["test"] = ((CSDB["PAC2"] == CSDB.str.startswith('D'))).astype(int)
where I get the error message AttributeError: 'DataFrame' object has no attribute 'str'.
I tried to convert my object to a string with the standard methods (astype(str) and to_string()), but that does not work either. This may be a separate problem; I have found only one post with the same question, and it does not have a satisfactory answer.
Any ideas how I can solve my problem?
There are many ways to create a new column based on a condition; this is one of them:
import pandas as pd
import numpy as np
dic = {'Asset Class': ['D.1', 'D.12', 'D.34', 'F.3', 'G.12', 'D.2']}
df = pd.DataFrame(dic)
df['Dummy'] = np.where(df['Asset Class'].str.contains("D"), 1, 0)  # note: contains() matches "D" anywhere in the string, not only at the start
Here's a link with more examples: https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
You can use Series.str.startswith on df['Asset Class']:
>>> dic = {'Asset Class': ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan']}
>>> df = pd.DataFrame(dic)
>>> df['Asset Dummy'] = df['Asset Class'].str.startswith('D').astype(int)
>>> df
   Asset Class  Asset Dummy
0          D.1            1
1         D.12            1
2         D.34            1
3          nan            0
4          F.3            0
5         G.12            0
6          D.2            1
7          nan            0
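If the column comes from pd.read_csv, the missing entries are usually real NaN values rather than the string 'nan'. str.startswith then returns NaN for those rows, and astype(int) fails on them. Passing na=False treats a missing value as "does not start with D" (str.contains accepts the same na argument). A minimal sketch, assuming the question's values with np.nan in place of 'nan':

```python
import numpy as np
import pandas as pd

# Same values as in the question, but with real missing values (np.nan)
# instead of the string 'nan', as pd.read_csv normally produces for empty cells.
df = pd.DataFrame({'Asset Class': ['D.1', 'D.12', 'D.34', np.nan, 'F.3', 'G.12', 'D.2', np.nan]})

# na=False makes a missing entry count as "does not start with D",
# so the result is purely boolean and astype(int) can convert it.
df['Asset Dummy'] = df['Asset Class'].str.startswith('D', na=False).astype(int)
print(df)
```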
I am trying to export the summary of my multiple regression models in a table.
results = {'A':result.summary(),
           'B': result1.summary(), 'C': result2.summary(), 'D': result3.summary(), 'E' : result4.summary()}
df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
for mod in results.keys():
    for col in results[mod].tables[0].columns:
        if col % 2 == 0:
            df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
                                           'Param':results[mod].tables[0][col].values,
                                           'Value':results[mod].tables[0][col+1].values}))
print(df2)
When I run the code it gives me error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-280-952fff354224> in <module>
3 df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
4 for mod in results.keys():
----> 5 for col in results[mod].tables[0].column:
6 if col % 2 == 0:
7 df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
AttributeError: 'SimpleTable' object has no attribute 'column'
The SimpleTable in this context is statsmodels.iolib.table.SimpleTable. We can use pandas.DataFrame.from_records to convert the data type to DataFrame. From here, you can access the columns easily.
Assuming this SimpleTable is passed in as t (the function name is just for illustration):
import pandas as pd

def simple_table_to_df(t):
    df = pd.DataFrame.from_records(t.data)
    header = df.iloc[0]  # grab the first row for the header
    df = df[1:]  # take the data less the header row
    df.columns = header
    print(df.shape)
    return df['your_col_name']
It's hard to tell without seeing how you're creating result.summary() et al., but it's likely that the SimpleTable API follows similar/related pandas APIs, in which case you're looking for the columns attribute (note the plural 's').
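Independently of how the table is parsed, the loop in the question grows df2 with DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0. A sketch that collects plain records and builds the DataFrame once, assuming each table's .data is a list of rows laid out as label, value, label, value (the layout the question's col % 2 logic expects). The tables dict holds made-up stand-ins for results[mod].tables[0].data, and pairs_from_rows is a hypothetical helper:

```python
import pandas as pd

def pairs_from_rows(model_name, rows):
    """Hypothetical helper: rows shaped like [label, value, label, value, ...] -> long-format records."""
    records = []
    for row in rows:
        # Even-indexed cells are labels; the cell right after each label is its value.
        for col in range(0, len(row) - 1, 2):
            label = str(row[col]).strip()
            if label:  # skip empty padding cells
                records.append({'Model': model_name, 'Param': label, 'Value': str(row[col + 1]).strip()})
    return records

# Made-up stand-ins for results[mod].tables[0].data; real cells come from statsmodels.
tables = {
    'A': [['Dep. Variable:', 'y', 'R-squared:', '0.50'],
          ['Model:', 'OLS', 'Adj. R-squared:', '0.45']],
    'B': [['Dep. Variable:', 'y', 'R-squared:', '0.60']],
}

all_records = []
for mod, rows in tables.items():
    all_records.extend(pairs_from_rows(mod, rows))

# Build the DataFrame once instead of calling DataFrame.append inside the loop.
df2 = pd.DataFrame(all_records, columns=['Model', 'Param', 'Value'])
print(df2)
```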
I have a list of part numbers that I want to use to extract a list of prices on a website.
However, I'm getting the error below when running the code:
Traceback (most recent call last):
File "C:/Users/212677036/.PyCharmCE2019.1/config/scratches/scratch_1.py", line 13, in
data = {"partOptionFilter": {"PartNumber": PN(i), "AlternativeOemId": "17155"}}
TypeError: 'DataFrame' object is not callable
Process finished with exit code 1
import requests
import pandas as pd
df = pd.read_excel(r'C:\Users\212677036\Documents\Copy of MIC Parts Review - July 26 19.xlsx')
PN = pd.DataFrame(df, columns = ['Product code'])
#print(PN)
i = 0
Total_rows = PN.shape[0]
while i < Total_rows:
    data = {"partOptionFilter": {"PartNumber": PN(i), "AlternativeOemId": "17155"}}
    r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions', json=data).json()
    print(r['Data']['PartOptions'][0]['YourPrice'])
    i=i+1
You are calling PN(i), which is why it says
TypeError: 'DataFrame' object is not callable
The (i) makes it a function call, but a DataFrame has to be indexed with square brackets.
I am not sure what your df looks like or what you want to extract, but you have to index the DataFrame like this:
PN[i]  # selects the column labelled i, not row i
or
PN.loc[i, 'columnname']
or
PN.iloc[i, 0]
or ... depending on your df
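Instead of a counter and positional lookups, the loop can iterate over the column's values directly. A sketch with hypothetical part numbers in place of the Excel data; the request itself is left commented out so the snippet runs offline:

```python
import pandas as pd

# Hypothetical part numbers standing in for the 'Product code' column from the Excel file.
df = pd.DataFrame({'Product code': ['PN-001', 'PN-002', 'PN-003']})

payloads = []
# Loop over the column's values directly: no counter, no PN(i) call, no positional lookup.
for part_number in df['Product code']:
    data = {"partOptionFilter": {"PartNumber": part_number, "AlternativeOemId": "17155"}}
    payloads.append(data)
    # In the real script the request would be sent here, as in the question:
    # r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions', json=data).json()
    # print(r['Data']['PartOptions'][0]['YourPrice'])

print(payloads[0])
```

Iterating a Series yields plain Python scalars, whereas lookups such as PN.iloc[i, 0] on a numeric column return NumPy integers, which the standard json module refuses to serialize.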
I have a couple of columns in a DataFrame that contain both numeric values and strings,
and I want to remove all the characters and keep only the numbers:
Admit_DX_Description            Primary_DX_Description
510.9 - EMPYEMA W/O FISTULA     510.9 - EMPYEMA W/O FISTULA
681.10 - CELLULITIS, TOE NOS    681.10 - CELLULITIS, TOE NOS
780.2 - SYNCOPE AND COLLAPSE    427.89 - CARDIAC DYSRHYTHMIAS NEC
729.5 - PAIN IN LIMB            998.30 - DISRUPTION OF WOUND, UNSPEC
to
Admit_DX_Description            Primary_DX_Description
510.9                           510.9
681.10                          681.10
780.2                           427.89
729.5                           998.30
code:
for col in strip_col:
    # # Encoding only categorical variables
    if df[col].dtypes =='object':
        df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))
print df.head()
error:
Traceback (most recent call last):
df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.py", line 2175, in map
new_values = map_f(values, arg)
File "pandas/src/inference.pyx", line 1217, in pandas.lib.map_infer (pandas/lib.c:63307)
df[col] = df[col].map(lambda x: x.rstrip(r'[a-zA-Z]'))
AttributeError: 'int' object has no attribute 'rstrip'
You can use this example.
I chose the re module to extract only the float numbers.
import re
import pandas
df = pandas.DataFrame({'A': ['Hello 199.9', '19.99 Hello'], 'B': ['700.52 Test', 'Test 7.7']})
df
             A            B
0  Hello 199.9  700.52 Test
1  19.99 Hello     Test 7.7
for col in df:
    df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]
       A       B
0  199.9  700.52
1  19.99     7.7
If you also have integers, change the re pattern to \d*\.?\d+.
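A quick comparison of the two patterns on a few made-up strings. It also shows a side effect of ''.join: when a cell contains more than one number, the matches are concatenated into one string:

```python
import re

float_only = r"\d+\.\d+"      # requires a decimal point
int_or_float = r"\d*\.?\d+"   # decimal point optional, so integers match too

samples = ['Hello 199.9', 'Test 700', '1.5 and 2.5']
found = {s: (re.findall(float_only, s), re.findall(int_or_float, s)) for s in samples}
for s, (floats, numbers) in found.items():
    print(repr(s), floats, numbers)

# ''.join glues several matches together, e.g. '1.5 and 2.5' -> '1.52.5';
# keeping only the first match avoids that.
first_only = [m[0] if m else '' for m in (re.findall(int_or_float, s) for s in samples)]
print(first_only)
```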
EDITED
For the TypeError, I'd recommend using try/except. In this example, I created a list errs, which collects the values of any column that raises TypeError in the except block. You can print(errs) to see those values.
Check df too.
...
...
errs = []
for col in df:
    try:
        df[col] = [''.join(re.findall(r"\d+\.\d+", item)) for item in df[col]]
    except TypeError:
        errs.extend([item for item in df[col]])
You should look into df.applymap and apply it over the columns from which you want to remove the text.
[edited]
Alternatively:
import pandas as pd
test = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
fun = lambda x: x+10
df = pd.DataFrame(test)
df['c1'] = df['c1'].apply(fun)
print(df)
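For the diagnosis columns in the question, a vectorized alternative, assuming the code is always the leading number in each cell. Note that str.rstrip treats its argument as a set of characters rather than a regular expression. Converting with astype(str) first means integer cells (the cause of 'int' object has no attribute 'rstrip') no longer break the string methods, and Series.str.extract keeps just the leading number. The integer 780 below is a hypothetical cell of that kind. The results stay strings, which preserves codes such as 681.10 that a float conversion would turn into 681.1:

```python
import pandas as pd

# Rows taken from the question, plus a hypothetical integer cell (780)
# of the kind that triggers "'int' object has no attribute 'rstrip'".
df = pd.DataFrame({
    'Admit_DX_Description': ['510.9 - EMPYEMA W/O FISTULA', '681.10 - CELLULITIS, TOE NOS', 780],
    'Primary_DX_Description': ['510.9 - EMPYEMA W/O FISTULA', '427.89 - CARDIAC DYSRHYTHMIAS NEC',
                               '998.30 - DISRUPTION OF WOUND, UNSPEC'],
})

strip_col = ['Admit_DX_Description', 'Primary_DX_Description']
for col in strip_col:
    # astype(str) turns ints/floats into text first, so .str methods never see a non-string;
    # the regex captures the leading number (digits with an optional decimal part).
    df[col] = df[col].astype(str).str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False)

print(df)
```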
I have written the following Python code using the pandas package.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pandas import Series
csv = pd.read_csv('train.csv')
df_csv = pd.DataFrame(csv)
PassengerId = np.array(df_csv['PassengerId'])
Age = np.array(df_csv['Age'])
Pclass = np.array(df_csv['Pclass'])
Sex = np.array(df_csv['Sex'])
i = 0
while i < 891:
    if Sex[i] == 'male':
        Sex[i] = 0
        i = i + 1
    else:
        Sex[i] = 1
        i = i + 1
Sex = np.array(Sex)
new_df = pd.DataFrame[
'PassengerId': Series(PassengerId),
'Age': Series(Age),
'Pclass': Series(Pclass),
'Sex': Series(Sex)
]
print(new_df)
I am trying to create a DataFrame by reading a CSV file, storing a few columns as NumPy arrays, and then replacing the values in one array. When I combine those arrays into a DataFrame again, I get the following error:
D:\Projects\Titanic>python python.py
Traceback (most recent call last):
File "python.py", line 27, in <module>
'Sex': Sex
TypeError: 'type' object is not subscriptable
Please help me out. Thanks in advance!
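pd.DataFrame is a class, so square brackets try to subscript the class itself (inside the brackets, 'PassengerId': Series(PassengerId) is even parsed as a slice), which is what raises TypeError: 'type' object is not subscriptable. Call the constructor with parentheses and pass the columns as a dict in curly braces. Try replacing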
new_df = pd.DataFrame[
'PassengerId': Series(PassengerId),
'Age': Series(Age),
'Pclass': Series(Pclass),
'Sex': Series(Sex)
]
with
new_df = pd.DataFrame({
'PassengerId': Series(PassengerId),
'Age': Series(Age),
'Pclass': Series(Pclass),
'Sex': Series(Sex)
})
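The while loop and the separate NumPy arrays are not strictly needed either. A sketch that selects the columns directly and applies the same rule as the loop ('male' becomes 0, everything else 1) with np.where, assuming a few made-up rows in place of train.csv. It also avoids hard-coding the row count 891:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for pd.read_csv('train.csv').
df_csv = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Age': [22.0, 38.0, np.nan],
    'Pclass': [3, 1, 3],
    'Sex': ['male', 'female', 'female'],
})

# Select the wanted columns directly; .copy() leaves df_csv unchanged.
new_df = df_csv[['PassengerId', 'Age', 'Pclass', 'Sex']].copy()
# Same rule as the while loop: 'male' -> 0, anything else -> 1.
new_df['Sex'] = np.where(new_df['Sex'] == 'male', 0, 1)
print(new_df)
```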