Why isn't my column converting to string from int? - python

*Input:*
df["waiting_time"].value_counts()
*Output:*
2 days 6724
4 days 5290
1 days 5213
7 days 4906
6 days 4037
...
132 days 1
125 days 1
117 days 1
146 days 1
123 days 1
Name: waiting_time, Length: 128, dtype: int64
I tried:
df['wait_dur'] = df['waiting_time'].values.astype(str)
I've tried apply as well. The data type doesn't change; it stays the same.

You need to skip the 'values' part in your code:
df['wait_dur'] = df['waiting_time'].astype(str)
If you then check the first row, for example, you get:
type(df['wait_dur'][0])
<class 'str'>
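A minimal self-contained sketch of the same fix. It assumes waiting_time is really a timedelta64 column (the "2 days" labels suggest so, but that is an assumption). It also shows that the dtype: int64 printed by value_counts() describes the counts, not the column, so the column's own dtype is the thing to check.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: durations stored as timedelta64
df = pd.DataFrame({"waiting_time": pd.to_timedelta([2, 4, 1, 7, 2], unit="D")})

# value_counts() returns counts, so its dtype (int64) says nothing about the column
counts = df["waiting_time"].value_counts()
print(counts.dtype)

# Convert the Series itself (no .values) and inspect both columns
df["wait_dur"] = df["waiting_time"].astype(str)
print(df["waiting_time"].dtype, df["wait_dur"].dtype)
print(type(df["wait_dur"].iloc[0]))
```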

df = df.applymap(str)
This applies str to every cell of the DataFrame. In pandas 2.1 and later, applymap is deprecated; df.map(str) does the same thing.
If you want to see more methods go here.
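A small sketch of the whole-DataFrame conversion that works on both older and newer pandas. The column names are hypothetical; the version check relies on DataFrame.map only existing from pandas 2.1 onward.

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a timedelta column
df = pd.DataFrame({"count": [1, 2], "wait": pd.to_timedelta([3, 5], unit="D")})

# DataFrame.map is the pandas >= 2.1 name for applymap; fall back on older versions
if hasattr(pd.DataFrame, "map"):
    as_text = df.map(str)
else:
    as_text = df.applymap(str)

# Every cell is now a Python str
print(as_text)
```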

Related

How to convert timedelta to integer in pandas?

I have a column 'Time' in pandas that contains both integers and timedeltas (in days):
index Time
1 91
2 28
3 509 days 00:00:00
4 341 days 00:00:00
5 250 days 00:00:00
I want to change all of the timedeltas to integers, but I get many errors when trying to pick which values to convert: it throws an error whenever I try to convert an integer in the column rather than a TD.
I want this:
index Time
1 91
2 28
3 509
4 341
5 250
I've tried a few variations of this to check if it's an integer, as I'm not concerned with those:
for x in finished['Time Future']:
    if isinstance(x, int):
        continue
    else:
        finished['Time'][x] = finished['Time'][x].astype(int)
But it is not working at all, and I can't find a solution.
This seems to work:
# If the day counts are actual integers:
m = ~df.Time.apply(lambda x: isinstance(x, int))
# OR, in case the day counts are strings:
m = ~df.Time.str.isdigit()
df.loc[m, 'Time'] = df.Time[m].apply(lambda x: pd.Timedelta(x).days)
Results in:
Time
1 91
2 28
3 509
4 341
5 250
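A single-pass sketch as an alternative to building a mask: map every entry through one function that handles both types. It assumes the timedeltas are pd.Timedelta objects (strings such as "509 days 00:00:00" would need pd.Timedelta(x) first, as in the answer above) and that every other entry is already an integer day count; the helper name to_days is hypothetical.

```python
import pandas as pd

# Hypothetical reconstruction of the mixed object column from the question
s = pd.Series(
    [91, 28, pd.Timedelta(days=509), pd.Timedelta(days=341), pd.Timedelta(days=250)],
    index=[1, 2, 3, 4, 5],
    dtype=object,
)
df = pd.DataFrame({"Time": s})

def to_days(x):
    # Timedelta -> whole days; anything else is assumed to be an integer already
    return x.days if isinstance(x, pd.Timedelta) else int(x)

# One pass over the column, then a proper integer dtype instead of object
df["Time"] = df["Time"].map(to_days).astype("int64")
print(df)
```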

Drawing a boxplot of a Panda dataframe with time intervals

I have a pandas DataFrame with the following data:
df1[['interval','answer']]
interval answer
0 0 days 06:19:17.767000 no
1 0 days 00:26:35.867000 no
2 0 days 00:29:12.562000 no
3 0 days 01:04:36.362000 no
4 0 days 00:04:28.746000 yes
5 0 days 02:56:56.644000 yes
6 0 days 00:20:13.600000 no
7 0 days 02:31:17.836000 no
8 0 days 02:33:44.575000 no
9 0 days 00:08:08.785000 no
10 0 days 03:48:48.183000 no
11 0 days 00:22:19.327000 no
12 0 days 00:05:05.253000 question
13 0 days 01:08:01.338000 unsubscribe
14 0 days 15:10:30.503000 no
15 0 days 11:09:05.824000 no
16 1 days 12:56:07.526000 no
17 0 days 18:10:13.593000 no
18 0 days 02:25:56.299000 no
19 2 days 03:54:57.715000 no
20 0 days 10:11:28.478000 no
21 0 days 01:04:55.025000 yes
22 0 days 13:59:40.622000 yes
The format of the df is:
id object
datum datetime64[ns]
datum2 datetime64[ns]
answer object
interval timedelta64[ns]
dtype: object
As a result, the boxplot looks like this: [image not available]
Any idea?
Any help is appreciated...
Robert
Seaborn may help you achieve what you want.
First of all, one needs to make sure the columns are of the type one wants.
To recreate your problem, I created the same dataframe (with the same name, df1). These are the data types of its columns:
[In]: df1.dtypes
[Out]:
interval object
answer object
dtype: object
For the column "answer", one can use pandas.factorize as follows:
df1['NewAnswer'] = pd.factorize(df1['answer'])[0] + 1
That creates a new column whose codes follow the order of first appearance: 1 for no, 2 for yes, 3 for question, and 4 for unsubscribe.
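To check that numbering, a small sketch on a few of the question's answer values: pd.factorize assigns codes 0, 1, 2, ... in order of first appearance, and the + 1 shifts them to start at 1.

```python
import pandas as pd

# A few 'answer' values from the question, in their original order
answers = pd.Series(["no", "no", "yes", "no", "question", "unsubscribe", "yes"])

# codes: integer label per row; uniques: the distinct values in first-seen order
codes, uniques = pd.factorize(answers)
new_answer = codes + 1  # start at 1, as in the answer above

print(list(uniques))
print(list(new_answer))
```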
With this, one can already create a box plot using sns.boxplot:
ax = sns.boxplot(x="interval", y="NewAnswer", hue="answer", data=df1)
This results in the following plot: [image not available]
Many other combinations are possible; I'll stop at these because the OP didn't specify requirements or give an example of the expected output.
Notes:
Make sure you have the required libraries installed.
There may be other visualizations that work better with this dataframe; the seaborn example gallery shows many of them.
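A different approach from the factorize answer above: since the image is missing, it is an assumption that the odd boxplot comes from plotting timedelta64 values directly, which plotting libraries do not treat as plain numbers. Converting the durations to float hours with dt.total_seconds() gives an ordinary numeric column. A sketch using the first few rows from the question (interval_hours is a hypothetical column name):

```python
import pandas as pd

# First few rows of the question's data, with 'interval' parsed as timedelta64[ns]
df1 = pd.DataFrame({
    "interval": pd.to_timedelta([
        "0 days 06:19:17.767000",
        "0 days 00:26:35.867000",
        "0 days 00:04:28.746000",
        "1 days 12:56:07.526000",
    ]),
    "answer": ["no", "no", "yes", "no"],
})

# Durations as a plain float number of hours, which plots as ordinary numbers
df1["interval_hours"] = df1["interval"].dt.total_seconds() / 3600
print(df1)
```

The numeric column can then be grouped by answer, for example with sns.boxplot(x="answer", y="interval_hours", data=df1) in seaborn or df1.boxplot(column="interval_hours", by="answer") in pandas.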

Convert object to only date and difference both of the columns

I have a dataframe with two columns that should be dates, but their dtype is object.
DF :
A B
7/27/2002 5/29/2013
5/25/2004 4/21/2005
4/22/2008 4/28/2010
6/22/2007 7/30/2008
7/26/2008 6/21/2011
7/29/2008 6/20/2013
6/26/2000 7/23/2005
6/20/1991 7/27/2013
5/22/2005 4/27/2010
I want the difference B - A expressed as the number of years and the number of days, each in a separate column.
OUTPUT EXPECTED :
NO OF YEARS NO OF DAYS
1 320
2 600
3 900
I think you need to convert all values to datetimes, then subtract with sub and convert the timedeltas to days. For years, divide by the constant 365.2425 (the average length of a Gregorian year) and take the floor:
import numpy as np
import pandas as pd

df = df.apply(pd.to_datetime)
df['days'] = df['B'].sub(df['A']).dt.days
df['years'] = np.floor(df['days'] / 365.2425).astype(int)
print(df)
A B days years
0 2002-07-27 2013-05-29 3959 10
1 2004-05-25 2005-04-21 331 0
2 2008-04-22 2010-04-28 736 2
3 2007-06-22 2008-07-30 404 1
4 2008-07-26 2011-06-21 1060 2
5 2008-07-29 2013-06-20 1787 4
6 2000-06-26 2005-07-23 1853 5
7 1991-06-20 2013-07-27 8073 22
8 2005-05-22 2010-04-27 1801 4
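A self-contained version of the answer's approach on the first three rows. It passes format="%m/%d/%Y" through apply to pd.to_datetime so the month/day order is never guessed, which assumes every value uses that order.

```python
import numpy as np
import pandas as pd

# First three rows from the question, still stored as strings (dtype object)
df = pd.DataFrame({
    "A": ["7/27/2002", "5/25/2004", "4/22/2008"],
    "B": ["5/29/2013", "4/21/2005", "4/28/2010"],
})

# Parse both columns with an explicit month/day/year format
df = df.apply(pd.to_datetime, format="%m/%d/%Y")

# Whole days between the dates, then years via the mean Gregorian year length
df["days"] = (df["B"] - df["A"]).dt.days
df["years"] = np.floor(df["days"] / 365.2425).astype(int)
print(df)
```

Because the years come from an average year length, they can differ from a calendar count near anniversaries: exactly one non-leap year (365 days) floors to 0.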

Time arithmetic on pandas series

I have a pandas DataFrame with a column "StartTime" that could be any datetime value. I would like to create a second column that gives the StartTime relative to the beginning of the week (i.e., 12am on the previous Sunday). For example, this post is 5 days, 14 hours since the beginning of this week.
StartTime
1 2007-01-19 15:59:24
2 2007-03-01 04:16:08
3 2006-11-08 20:47:14
4 2008-09-06 23:57:35
5 2007-02-17 18:57:32
6 2006-12-09 12:30:49
7 2006-11-11 11:21:34
I can do this, but it's pretty dang slow:
def time_since_week_beg(x):
    # to_pydatetime() replaces Timestamp.to_datetime(), which newer pandas no longer provides
    y = x.to_pydatetime()
    return pd.Timedelta(days=y.weekday(),
                        hours=y.hour,
                        minutes=y.minute,
                        seconds=y.second)
df['dt'] = df.StartTime.apply(time_since_week_beg)
What I want is something like this, that doesn't result in an error:
df['dt'] = pd.Timedelta(days=df.StartTime.dt.dayofweek,
                        hours=df.StartTime.dt.hour,
                        minute=df.StartTime.dt.minute,
                        second=df.StartTime.dt.second
                        )
TypeError: Invalid type <class 'pandas.core.series.Series'>. Must be int or float.
Any thoughts?
You can use a list comprehension:
df['dt'] = [pd.Timedelta(days=ts.dayofweek,
                         hours=ts.hour,
                         minutes=ts.minute,
                         seconds=ts.second)
            for ts in df.StartTime]
>>> df
StartTime dt
0 2007-01-19 15:59:24 4 days 15:59:24
1 2007-03-01 04:16:08 3 days 04:16:08
2 2006-11-08 20:47:14 2 days 20:47:14
3 2008-09-06 23:57:35 5 days 23:57:35
4 2007-02-17 18:57:32 5 days 18:57:32
5 2006-12-09 12:30:49 5 days 12:30:49
6 2006-11-11 11:21:34 5 days 11:21:34
If StartTime holds strings rather than datetimes, you may need to parse it first:
...for ts in pd.to_datetime(df.StartTime)
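A vectorized sketch, swapping in a different technique from the list comprehension: time since midnight (ts - ts.dt.normalize()) plus whole days from dt.dayofweek. Note that dayofweek counts Monday as 0, so, like the code above, this measures from midnight Monday; the hypothetical dt_sun column shows an adjustment for a Sunday start, as the question describes.

```python
import pandas as pd

# First three StartTime values from the question, already parsed as datetimes
df = pd.DataFrame({"StartTime": pd.to_datetime([
    "2007-01-19 15:59:24",
    "2007-03-01 04:16:08",
    "2006-11-08 20:47:14",
])})

ts = df["StartTime"]
time_of_day = ts - ts.dt.normalize()  # elapsed time since midnight of the same day

# Whole days since Monday (dayofweek: Monday=0 ... Sunday=6) plus time of day
df["dt"] = time_of_day + pd.to_timedelta(ts.dt.dayofweek, unit="D")

# Measured from midnight Sunday instead: Sunday -> 0, Monday -> 1, ...
df["dt_sun"] = time_of_day + pd.to_timedelta((ts.dt.dayofweek + 1) % 7, unit="D")
print(df)
```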

Replacing a substring with another substring pandas python

I have a dataframe, df.
I want to replace the 7th to 5th from last character with a 0 if it's a /:
df['StartDate'].str[-7:-5]=df['StartDate'].str[-7:-5].str.replace('/', '0')
Returns the error:
TypeError: 'StringMethods' object does not support item assignment
Data looks like:
number StartDate EndDate Location_Id Item_Id xxx yyy
3 460 4/1/2012 4/11/2012 2502 3890004215 0 0
28 2731 10/17/2013 10/30/2013 3509 5100012114 0 0
34 1091 1/10/2013 1/23/2013 2544 5100012910 0 0
134 1630 5/2/2013 5/15/2013 2506 69511912000 0 0
138 327 1/12/2012 1/25/2012 5503 1380016686
pandas has built-in support for datetime objects (it uses its own Timestamp type, a subclass of the standard library's datetime.datetime, but the idea is the same), so instead of trying to reformat dates with string methods, converting them to datetimes is much easier:
df['StartDate'] = pd.to_datetime(df['StartDate'])
Once you've converted, you can reach easy-to-use datetime methods and attributes through the .dt accessor (added in pandas 0.15):
df.StartDate.dt.month
Out[20]:
3 4
28 10
34 1
134 5
138 1
dtype: int64
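A self-contained sketch using the StartDate values shown in the question. The original TypeError arises because df['StartDate'].str is an accessor object, and slicing through it returns a new Series rather than something you can assign into. The explicit format string assumes every value is month/day/year; if zero-padded strings were the real goal of the slice replacement, dt.strftime can produce them from the parsed dates.

```python
import pandas as pd

# StartDate values and index labels from the question
df = pd.DataFrame(
    {"StartDate": ["4/1/2012", "10/17/2013", "1/10/2013", "5/2/2013", "1/12/2012"]},
    index=[3, 28, 34, 134, 138],
)

# Parse the m/d/Y strings into datetime64 values
df["StartDate"] = pd.to_datetime(df["StartDate"], format="%m/%d/%Y")

# Date components through the .dt accessor
months = df["StartDate"].dt.month

# Zero-padded text, if that is what the string slicing was trying to achieve
padded = df["StartDate"].dt.strftime("%m/%d/%Y")
print(months.tolist())
print(padded.tolist())
```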
