Numpy TypeError - python

Could someone please explain what the following error is about? Here is my code:
import pandas as pd
from pandas import DataFrame
data =pd.read_csv('FILENAME')
b=data.info()
print b
Following is the error:
Traceback (most recent call last):
  File "FILENAME", line 5, in <module>
    b=data.info()
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1443, in info
    counts = self.count()
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 3862, in count
    result = notnull(frame).sum(axis=axis)
  File "/usr/lib/python2.7/dist-packages/pandas/core/common.py", line 276, in notnull
    return -res
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 604, in __neg__
    arr = operator.neg(_values_from_object(self))
TypeError: The numpy boolean negative, the `-` operator, is not supported, use the `~` operator or the logical_not function instead.
All I am trying to do is display a summary of my dataset using the DataFrame.info() function, and I am having trouble making sense of the error, although I suspect it has something to do with the numpy package. What needs to be done here?

The problem is an old version of pandas combined with a newer version of numpy.
You must update pandas to get your code working.
If you are on conda you can do a conda update pandas to update pandas.
If you are using pip you can do pip install --upgrade pandas
Also, keep in mind that the pandas documentation says the following about the info function:
This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage
data.info() prints the info to the console itself, so there is no need to assign the result to a variable and print it afterwards.
import pandas as pd
from pandas import DataFrame
data =pd.read_csv('FILENAME')
data.info()  # info() prints the summary itself and returns None
This code will work fine for you.
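To see where the message in the traceback comes from, here is a minimal sketch that only needs numpy. The array `mask` is a made-up example; the old pandas code in the traceback (`return -res`) applies the same unary minus to a boolean result.

```python
import numpy as np

# Hypothetical boolean array standing in for the result of notnull()
mask = np.array([True, False, True])

# Supported ways to invert a boolean array
inverted = ~mask
also_inverted = np.logical_not(mask)

# Unary minus on a boolean array is rejected by newer numpy versions
try:
    -mask
    message = ""
except TypeError as exc:
    message = str(exc)

print(inverted)
print(message)
```

Newer pandas releases use the supported form internally, which is why upgrading pandas makes the error go away.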

Related

How to replace text in a Pandas dataframe imported from Excel

I'm trying to modify one column in a pandas dataframe. The data is originally from an excel worksheet.
I tried modifying this article using str.replace and re.sub and this is the script that I have at the moment
Question I was using to base my script around: How to replace text in a string column of a Pandas dataframe?
import re
import pandas as pd
# reading the excel file
df = pd.read_excel('Upload to dashboard - Untitled (28).xlsx', skiprows = 7)
print(df.head(3))
print(df.dtypes)
df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
df.to_csv("C:/Users/sward/Downloads/out.csv")
#pd.options.display.max_colwidth = None
print(df['trim'])
print(df['trim2'])
Currently I'm getting an error that says
C:\Users\sward\.spyder-py3\temp.py:24: FutureWarning: The default value of regex will change from True to False in a future version.
df['trim']= df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)|(.*)','')
Traceback (most recent call last):
File ~\.spyder-py3\temp.py:25 in <module>
df['trim2']=re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
File ~\Anaconda3\lib\re.py:210 in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
I was trying to use regex to extract the domain from the Publisher URL column. I can get the regex expression. I wanted to make
https://www.healthline.com/health/gerd#home-remedies
into
www.healthline.com
And in this step I'm looking for all of the mobile versions of the website and take out the m. part of the url- the expression
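For what it's worth, the TypeError above happens because re.sub expects a single string, while df['Publisher URL'] is a whole Series. One possible approach, sketched under assumptions: the small DataFrame below is invented for illustration (the `m.example.com` URL is hypothetical), and the `domain` column name is mine. It uses urllib.parse.urlparse per element and passes regex=True explicitly to str.replace, which also avoids the FutureWarning.

```python
from urllib.parse import urlparse

import pandas as pd

# Invented sample data; the second URL is a hypothetical mobile link
df = pd.DataFrame({"Publisher URL": [
    "https://www.healthline.com/health/gerd#home-remedies",
    "https://m.example.com/page",
]})

# urlparse handles one string at a time, so map it over the column
df["domain"] = df["Publisher URL"].map(lambda url: urlparse(url).netloc)

# Strip a leading "m." (mobile subdomain); regex=True is stated explicitly
df["trim"] = df["domain"].str.replace(r"^m\.", "", regex=True)

print(df[["domain", "trim"]])
```

This only removes `m.` when it is the first label of the host name; other layouts would need a different pattern.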

trying to drop some columns from dataframe but throwing a name error

I've read a CSV file into Jupyter Notebook and I'm trying to remove redundant columns I'm not gonna use.
I'm using the drop() method but it's giving me a NameError! I'm sure the columns exist and I feel like I'm missing something obvious here, but I can't seem to figure it out.
So here is my code so far:
#Calling Libraries
import os # File management
import pandas as pd # Data frame manipulation
import numpy as np # Data frame operations
import datetime as dt # Date operations
import seaborn as sns # Data Viz
flight_df=pd.read_csv(r'C:\Users\pc\Desktop\Work\flights.csv')
# removing na rows
flight_df.dropna()
# dropping redundant columns
newdf=flight_df.drop([O_COUNTRY,O_LATITUDE,O_LONGITUDE,D_COUNTRY,D_LATITUDE,D_LONGITUDE,SCHEDULED_DEPARTURE,DIVERTED,CANCELLED,CANCELLATION_REASON,TAXI_OUT,TAXI_IN,WHEELS_OFF, WHEELS_ON,SCHEDULED_ARRIVAL],axis=1, inplace = True)
throws this error:
NameError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15328/4119461383.py in
2 flight_df.dropna()
3 # dropping redundant columns
newdf=flight_df.drop([O_COUNTRY,O_LATITUDE,O_LONGITUDE,D_COUNTRY,D_LATITUDE,D_LONGITUDE,SCHEDULED_DEPARTURE,DIVERTED,CANCELLED,CANCELLATION_REASON,TAXI_OUT,TAXI_IN,WHEELS_OFF, WHEELS_ON,SCHEDULED_ARRIVAL],axis=1, inplace = True)
NameError: name 'O_COUNTRY' is not defined
I have tried to instead define the ones I want to keep, but it's giving me the same error.
As the column names are strings in this case, you have to enclose them in quotes:
newdf=flight_df.drop(['O_COUNTRY','O_LATITUDE','O_LONGITUDE','D_COUNTRY' ...
Warning!
You are using inplace=True but also trying to assign the result to a new variable. With inplace=True, drop() returns None, so this variable will be None.
Either write
flight_df.drop(['O_COUNTRY', ...],axis=1,inplace=True)
or
newdf=flight_df.drop(['O_COUNTRY', ...],axis=1)
The same goes for your dropna(): it returns a new DataFrame, so as written the result is not stored anywhere.
I think you just want to put quotes around the column names. The way you are doing it now Python expects there to be an object named O_COUNTRY.
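Putting both answers together, a minimal sketch might look like this. The tiny DataFrame is invented for illustration; O_COUNTRY and DIVERTED are column names from the question, while AIRLINE is a hypothetical column that is kept.

```python
import pandas as pd

# Invented stand-in for flights.csv
flight_df = pd.DataFrame({
    "O_COUNTRY": ["US", None],
    "DIVERTED": [0, 1],
    "AIRLINE": ["AA", "UA"],  # hypothetical column we want to keep
})

# dropna() returns a new DataFrame, so store the result
flight_df = flight_df.dropna()

# Column names are strings; without inplace, drop() returns a new DataFrame
newdf = flight_df.drop(["O_COUNTRY", "DIVERTED"], axis=1)

# With inplace=True the DataFrame is modified and the return value is None
result = flight_df.drop(["O_COUNTRY", "DIVERTED"], axis=1, inplace=True)

print(list(newdf.columns))
print(result)
```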

Python: Reading tdms files using Python npTDMS and creating a Pandas dataframe

I'm able to read a LabVIEW .tdms file using the Python npTDMS package, and can read the metadata and sample data for groups and channels as well.
However, the file has timestamp values with the year '9999', so I get the following error while converting to a pandas dataframe:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp:.
I went through the documentation in:
https://nptdms.readthedocs.io/en/stable/apireference.html#nptdms.TdmsFile.as_dataframe; however, couldn't find an option to deal with this data situation.
I tried passing errors='coerce' when calling as_dataframe(), but that didn't work either. Any pointers or directions for reading the .tdms file into a pandas dataframe, with this data situation, would be very helpful.
Changing the data at the source is not an option.
Code snippet to read tdms file:
import numpy as np
import pandas as pd
from nptdms import TdmsFile as td
tdms_file = td.read(<tdms file name>)
tdms_file_df = tdms_file.as_dataframe()
Error while creating a pandas dataframe

Basic Importing Excel Documents Into Python

I'm a new Python user and am simply trying to import an Excel (or CSV) file into Jupyter Notebook to play around with.
From google searching, the common code I see is something like the below:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('File.xlsx', sheetname='Sheet1')
print("Column headings:")
print(df.columns)
I tried this with a CSV file and got the below error message:
File "", line 5
df = pd.read_excel(C:\Users\dhauge1\Desktop\Python Workshop\fortune500.csv, sheetname=fortune500)
^ SyntaxError: invalid syntax
Please see above for error message. Is anyone able to help me understand what I'm doing wrong?
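When reading a CSV file, use pd.read_csv('filename') instead of pd.read_excel. The path also has to be a quoted string; the SyntaxError comes from passing it without quotes.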
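A small sketch of what that could look like. To keep it runnable anywhere, it first writes a throwaway CSV into a temporary directory; the file contents and the column names `company` and `rank` are invented, and only the file name fortune500.csv comes from the question.

```python
import os
import tempfile

import pandas as pd

# Create a small throwaway CSV so the example is self-contained
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "fortune500.csv")
with open(csv_path, "w") as handle:
    handle.write("company,rank\nExampleCorp,1\n")

# The path is passed as a string; read_csv (not read_excel) parses CSV files
df = pd.read_csv(csv_path)

print("Column headings:")
print(df.columns)
```

For a literal Windows path, a raw string such as r'C:\Users\...\fortune500.csv' keeps the backslashes from being treated as escape sequences.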

writing a csv file by column in pandas throws error

I am reading and writing a CSV file using pandas.
I am reading a CSV file column by column and writing it to a separate CSV file column by column. Reading works fine, but writing the CSV file throws an error.
import pandas
f1 = open('artist_links','a')
data_df = pandas.read_csv('upc1.upcs_result.csv')
#data_wr = pandas.to_csv('test.csv')
df = data_df['one']
dd = data_df['two']
header = ["df", "dd"]
df.to_csv("test.csv",columns = header)
Output:
Traceback (most recent call last):
File "merge.py", line 9, in <module>
df.to_csv("test.csv",columns = header)
TypeError: to_csv() got an unexpected keyword argument 'columns'
But the pandas library does document a columns argument.
How can I make this program work (writing column by column)?
Changes in v0.16.0
http://pandas.pydata.org/pandas-docs/dev/whatsnew.html
cols as the keyword for the csv and Excel writers was replaced with columns.
Try cols instead or upgrade pandas.
Instead of:
df.to_csv("test.csv", columns=header)
Use:
df.to_csv("test.csv", cols=header)
Edit: Either way you should upgrade. Sincerely. If the error is a keyword argument and you are basing your method off of documentation for the most recent version on software written over 1.5 years ago, with substantial changes made since then, you should upgrade.
EDIT2: If you're desperate to make life difficult for yourself and continue using outdated functions and try to use new features, you could do workarounds. This is not recommended, since some stuff may be a lot more subtle and throw exceptions when you least expect it.
You could... do...
# df must be a DataFrame whose columns include every name in header
lst = []
for column in header:
    s = df[column]
    # export all to list and nested
    lst.append(s.tolist())
# zip resulting lists to tuples
zipped = zip(*lst)
# export them as a CSV.
print('\n'.join([','.join(map(str, i)) for i in zipped]))
EDIT3: Much simpler, but you could also do:
df2 = df[header] # return copy
df2.to_csv()
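As a rough illustration of the column-selection route on a recent pandas, here is a minimal sketch. The DataFrame contents are invented; the column names one and two come from the question, and three is a hypothetical extra column. Note that the selection is done on the DataFrame itself, not on a single column pulled out of it.

```python
import pandas as pd

# Invented stand-in for upc1.upcs_result.csv
data_df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# header must hold actual column names of the DataFrame
header = ["one", "two"]

# Option 1: select the columns first, then write
csv_selected = data_df[header].to_csv(index=False)

# Option 2: let to_csv pick the columns via the columns keyword
csv_keyword = data_df.to_csv(columns=header, index=False)

print(csv_selected)
```

Calling to_csv without a path returns the CSV text as a string; passing a file name such as "test.csv" as the first argument writes it to disk instead.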
