Trying to change a column from object to datetime - python

I am trying to change an object column to a datetime column.
However, every time I run the code below, I get a TypeError: 'NaTType' object is not callable.
I am assuming that this is due to the blanks in the column but I am not really sure how to resolve it. Removing the rows is not an option here because there are other columns to consider as well.
df['jahreskontakt'] = pd.to_datetime(df['jahreskontakt'], errors='ignore')
Does anybody have any advice? Thanks in advance.
Explanations:
df['jahreskontakt'] #column with yearly contacts by sales team
Values that can be found in the column:
2014-07-01 00:00:00
00:00:00
""
Full error (I tried both errors='coerce' and errors='ignore'):
TypeError Traceback (most recent call last)
<ipython-input-122-9d57805d9290> in <module>()
----> 1 df['jahreskontakt'] = pd.to_datetime(df['jahreskontakt'], errors='coerce')
TypeError: 'NaTType' object is not callable
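For reference, here is a minimal sketch of how the conversion is expected to behave on the sample values listed above, assuming a fresh session in which pd.to_datetime has not been reassigned. The small DataFrame and the explicit format string are assumptions made for illustration. With errors='coerce', values that do not match the format (including the blanks) become NaT instead of raising, and no rows are dropped.

```python
import pandas as pd

# Hypothetical frame built from the sample values shown in the question.
df = pd.DataFrame({"jahreskontakt": ["2014-07-01 00:00:00", "00:00:00", ""]})

# Assumed format; anything that does not match it is coerced to NaT.
parsed = pd.to_datetime(
    df["jahreskontakt"],
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)
df["jahreskontakt"] = parsed

# All rows survive; the unparseable ones are simply marked as missing.
print(df)
print(df["jahreskontakt"].isna().tolist())
```

If this runs cleanly but the original line still fails, the cause is more likely something in the session than the blanks themselves.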

Related

Extract quarter information from numpy datetime64 object

I have the numpy datetime64 object below:
import numpy as np
date_time = np.datetime64('2012-05-01T01:00:00.000000+0100')
I would like to express this as a year-quarter, i.e. '2012Q2'. Is there a method available to do this? I tried the pandas Timestamp approach, but it raises an error:
import pandas as pd
>>> pd.Timestamp(date_time).dt.quarter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Timestamp' object has no attribute 'dt'
Any pointer will be very helpful
There are various ways that one can achieve that, depending on the desired output type.
If one wants the type pandas._libs.tslibs.period.Period, then one can use:
pandas.Period as follows
year_quarter = pd.Period(date_time, freq='Q')
[Out]: 2012Q2
pandas.Timestamp, as user7864386 mentioned, as follows
year_quarter = pd.Timestamp(date_time).to_period('Q')
[Out]: 2012Q2
Alternatively, if one wants the final output to be a string, one can call strftime on the resulting Period, more specifically .strftime('%YQ%q'), such as
year_quarter = pd.Period(date_time, freq='Q').strftime('%YQ%q')
# or
year_quarter = pd.Timestamp(date_time).to_period('Q').strftime('%YQ%q')
Notes:
date_time = np.datetime64('2012-05-01T01:00:00.000000+0100') gives a
DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
To check the variable year_quarter type, one can do the following
print(type(year_quarter))
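As for why the original attempt failed: .dt is an accessor on a pandas Series, not on a single Timestamp, which exposes year and quarter directly as attributes. A minimal sketch, assuming a timezone-naive input (the offset is dropped here to avoid the warning mentioned above):

```python
import pandas as pd
import numpy as np

# Timezone-naive variant of the value in the question (assumption: offset dropped).
date_time = np.datetime64("2012-05-01T01:00:00")

# A scalar Timestamp has .year and .quarter attributes; no .dt accessor is needed.
ts = pd.Timestamp(date_time)
label_from_attrs = f"{ts.year}Q{ts.quarter}"

# The Period route from the answer above, rendered as a string.
label_from_period = ts.to_period("Q").strftime("%YQ%q")

# .dt only exists on a Series of datetimes.
quarters = pd.Series([ts]).dt.quarter

print(label_from_attrs, label_from_period, quarters.iloc[0])
```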

Pyspark and Python - Column is not iterable

I am using Python-3 with Azure data bricks.
I have a dataframe. The column 'BodyJson' is a json string that contains one occurrence of 'vmedwifi/' within it. I have added a constant string literal of 'vmedwifi/' as column named 'email_type'.
I want to find the start position of text 'vmedwifi/' with column 'BodyJson' - all columns are within the same dataframe. My code is below.
I get the error 'Column is not iterable' on the second line of code. Any ideas of what I am doing wrong?
# Weak logic to try and identify email addresses
emailDf = inputDf.select('BodyJson').where("BodyJson like('%vmedwifi%#%.%')").withColumn('email_type', lit('vmedwifi'))
b=emailDf.withColumn('BodyJson_Cutdown', substring(emailDf.BodyJson, expr('locate(emailDf.email_type, emailDf.BodyJson)'), 20))
TypeError: Column is not iterable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<command-536715104422314> in <module>()
12 #emailDf1 = inputDf.select('BodyJson').where("BodyJson like('%#xxx.abc.uk%')")
13
---> 14 b=emailDf.withColumn('BodyJson_Cutdown', substring(emailDf.BodyJson, expr('locate(emailDf.email_type, emailDf.BodyJson)'), 20))
15
16 #inputDf.unpersist()
The issue was with the literal passed to expr.
I decided to tackle this problem a different way which got around this issue.

How to iterate over dates in Python/MySQL? "'datetime.date' is not iterable"

Given a MySQL table with columns ("Title", "Author", "Date"), how do you:
iterate over the database to compare a user-provided date input to the database column "Date", and
append matching records to lists
without getting the error "TypeError: argument of type 'datetime.date' is not iterable"? Example code below (Python 3.7):
date = request.form.get("date")
list1 = []
list2 = []
list3 = []
results = db.session.query(Books).all()
for i in results:
    if date in i.date is True:
        list1.append(i.title)
        list2.append(i.author)
        list3.append(i.date)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-c4085a31faa3> in <module>()
5 results = db.session.query(Books).all()
6 for i in results:
----> 7 if date in i.date:
8 list1.append(i.title)
9 list2.append(i.author)
TypeError: argument of type 'datetime.date' is not iterable
Use SQLAlchemy's filter to search. Filtering in application code performs comparatively poorly, because every record is loaded before it is checked.
results = db.session.query(Books).filter(Books.date==date)
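As for the error itself: the in operator tests membership in a container, and a datetime.date is a single value, so the comparison needs == instead. The form value also arrives as a string, so it has to be parsed into a date first. A minimal sketch of that comparison in plain Python, where the Book namedtuple is a hypothetical stand-in for the query results and the "%Y-%m-%d" input format is an assumption:

```python
from collections import namedtuple
from datetime import date, datetime

# Hypothetical stand-in for rows returned by the Books query.
Book = namedtuple("Book", ["title", "author", "date"])
results = [
    Book("Title A", "Author A", date(2020, 1, 15)),
    Book("Title B", "Author B", date(2021, 6, 30)),
    Book("Title C", "Author C", date(2020, 1, 15)),
]

# Form input is a string; the format used here is an assumption.
user_input = "2020-01-15"
wanted = datetime.strptime(user_input, "%Y-%m-%d").date()

# Compare dates with ==, not with "in".
matches = [book for book in results if book.date == wanted]
titles = [book.title for book in matches]
authors = [book.author for book in matches]
dates = [book.date for book in matches]

print(titles, authors, dates)
```

The same parsed date object can be passed to the filter call above in place of the raw string.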

Why is this error occurring when I am using filter in pandas: TypeError: 'int' object is not iterable

When I try to remove elements that satisfy a particular condition, Python raises the following error:
TypeError Traceback (most recent call last)
<ipython-input-25-93addf38c9f9> in <module>()
4
5 df = pd.read_csv('fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv;
----> 6 df = filter(df,~('-02-29' in df['Date']))
7 '''tmax = []; tmin = []
8 for dates in df['Date']:
TypeError: 'int' object is not iterable
The following is the code :
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv');
df = filter(df,~('-02-29' in df['Date']))
What could I be doing wrong?
The following is sample data:
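Use df.filter() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html) instead of the built-in filter(). Be aware that DataFrame.filter selects by row or column label, not by cell value.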
Also please attach the csv so we can run it locally.
Another way to do this is to use one of pandas' string methods for Boolean indexing:
df = df[~ df['Date'].str.contains('-02-29')]
You will still have to make sure that all the dates are actually strings first.
Edit:
Seeing the picture of your data, maybe this is what you want (slashes instead of hyphens):
df = df[~ df['Date'].str.contains('/02/29')]
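To see why the original line failed: '-02-29' in df['Date'] checks the index labels of the Series and returns a plain bool, ~ on that bool produces an int, and the built-in filter() then tries to iterate over that int. If the separator in the raw data is uncertain, one option is to parse the column and build the mask from the month and day. A minimal sketch, assuming the dates parse with pd.to_datetime; the sample rows and the Data_Value column are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows; only the Date column name comes from the question.
df = pd.DataFrame(
    {
        "Date": ["2012-02-28", "2012-02-29", "2013-03-01"],
        "Data_Value": [10, 20, 30],
    }
)

# Parse once, then build a boolean mask that is independent of the separator.
dates = pd.to_datetime(df["Date"])
is_leap_day = (dates.dt.month == 2) & (dates.dt.day == 29)

# Keep every row that is not 29 February.
filtered = df[~is_leap_day]
print(filtered)
```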

How to convert float into int in pandas?

This is my code:
users.age.mean().astype(int64)
(where users is the name of dataframe and age is a column in it)
This is the error I am getting:
AttributeError
Traceback (most recent call last)
<ipython-input-29-10b672e7f7ae> in <module>
----> 1 users.age.mean().astype(int64)
AttributeError: 'float' object has no attribute 'astype'
users.age.mean() returns a float, not a Series. Plain floats don't have astype; pandas Series do.
Try:
x = numpy.int64(users.age.mean())
Or:
x = int(users.age.mean())
Wrap the call in int(), for example:
X = int(users.age.mean())
Hope it helps!
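One thing to keep in mind is that int() truncates toward zero rather than rounding. A minimal sketch of the difference, using a hypothetical users frame with made-up ages:

```python
import pandas as pd

# Hypothetical data; only the names users and age come from the question.
users = pd.DataFrame({"age": [21, 34, 40]})

mean_age = users["age"].mean()  # 95 / 3, a little under 31.67

truncated = int(mean_age)       # drops the fractional part
rounded = int(round(mean_age))  # rounds to the nearest whole number first

# astype is available on the column itself, which is a Series.
ages_as_int = users["age"].astype("int64")

print(truncated, rounded)
```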
