Pandas - inserting a comma on a number - python

I'm using Python and pandas with a dataframe that holds temperatures (Celsius). After some processing, the values currently follow this pattern, e.g.
362
370
380
385
376
I want to insert a comma between the second and third digits,
e.g. 36,2
but I just can't get it to work. Is this possible?
Thanks in advance!

Try with division + astype + str.replace:
df['temp'] = (df['temp'] / 10).astype(str).str.replace('.', ',', regex=False)
temp
0 36,2
1 37,0
2 38,0
3 38,5
4 37,6
DataFrame Used:
import pandas as pd
df = pd.DataFrame({'temp': [362, 370, 380, 385, 376]})
temp
0 362
1 370
2 380
3 385
4 376
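A minimal sketch of the same division idea, assuming you always want exactly one digit after the comma, even for small readings. The helper name format_tenths and the extra value 5 are illustrative and not part of the original question:

```python
import pandas as pd

def format_tenths(series):
    """Divide by 10 and render each value with one decimal place and a decimal comma."""
    # f"{v:.1f}" always yields one decimal, so 37.0 stays "37.0" and 0.5 stays "0.5"
    return series.div(10).map(lambda v: f"{v:.1f}".replace(".", ","))

df = pd.DataFrame({"temp": [362, 370, 380, 385, 376, 5]})
df["temp_text"] = format_tenths(df["temp"])
print(df["temp_text"].tolist())
# ['36,2', '37,0', '38,0', '38,5', '37,6', '0,5']
```

The result is a column of strings, so keep the numeric column around if you still need to do arithmetic on the temperatures.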

Presumably, you want the last digit to be separated by a comma (for example, 88 should be 8,8). In that case, this will work:
ls = [362, 370, 380, 385, 376]
ls = [f"{str(item)[:-1]},{str(item)[-1]}" for item in ls]
# ['36,2', '37,0', '38,0', '38,5', '37,6']
Where:
str(item)[:-1] gets all digits except the final one
str(item)[-1] gets just the final digit
In a dataframe, your values are stored as a pandas series. In that case:
import pandas as pd
ls = pd.Series([362, 370, 380, 385, 376])
ls = ls.astype("str").map(lambda x : f"{x[:-1]},{x[-1]}")
Or more specifically
df["Your column"] = df["Your column"].astype("str").map(lambda x : f"{x[:-1]},{x[-1]}")
Output:
0 36,2
1 37,0
2 38,0
3 38,5
4 37,6

You would have to convert this integer data to a string in order to insert the ','.
For example:
temp=362
x = str(temp)[:-1]+','+str(temp)[-1]
You could use this in a loop or in a list comprehension, as already mentioned. (Those can be trickier to understand, so I provided this instead.) Hope it helps!
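The scalar version above can be applied to a whole column with Series.map. This is only a sketch; insert_decimal_comma is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

def insert_decimal_comma(value):
    """Return the value as text with a comma before its last digit."""
    text = str(value)
    return text[:-1] + "," + text[-1]

temps = pd.Series([362, 370, 380, 385, 376])
formatted = temps.map(insert_decimal_comma)
print(formatted.tolist())
# ['36,2', '37,0', '38,0', '38,5', '37,6']
```

Because this works purely on the text, a single-digit value such as 5 would come out as ',5' with nothing before the comma.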

Related

Create categories based on Partial Values Python

Hi I have a data frame as below:
response ticket
so service reset performed 123
reboot done 343
restart performed 223
no value 444
ticket created 765
I'm trying something like this:
import pandas as pd
df = pd.read_excel (r'C:\Users\Downloads\response.xlsx')
print (df)
count_other = 0
othersvocab = ['Service reset' , 'Reboot' , 'restart']
if df.response = othersvocab
{
count_other = count_other + 1
}
What I'm trying to do is get the count of how many have either of 'othersvocab' and how many don't.
I'm really new to Python, and I'm not sure how to do this.
Expected Output:
other ticketed
3 2
Can you help me figure it out, hopefully with what's happening in your code?
I am doing this on my lunch break. I don't like the nested "for other in others" loop I have, and there are better ways using pandas DataFrame methods, but it will have to do.
import pandas as pd
df = pd.DataFrame({"response": ["so service reset performed", "reboot done",
                                "restart performed"],
                   "ticket": [123, 343, 223]})
others = ['service reset' , 'reboot' , 'restart']
count_other = 0
for row in df["response"].values:
    for other in others:
        if other in row:
            count_other += 1
First, note that for this approach to work you have to lowercase both the response column and the others list. That's not hard (look up pandas apply and the string method .lower).
Here I first loop over the values in the response column.
Then, within that loop, I loop over the items in the others list.
Finally, I check whether any of those items appears in the row.
I hope my rushed response helps.
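A sketch of the loop with the lowercasing step included, assuming each row should be counted at most once even if it contains several of the terms. The two extra rows are taken from the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "response": ["so service reset performed", "reboot done",
                 "restart performed", "no value", "ticket created"],
    "ticket": [123, 343, 223, 444, 765],
})
others = ["Service reset", "Reboot", "restart"]
others_lower = [term.lower() for term in others]

count_other = 0
for response in df["response"].str.lower():
    # any() stops at the first matching term, so a row is never counted twice
    if any(term in response for term in others_lower):
        count_other += 1

count_ticketed = len(df) - count_other
print(count_other, count_ticketed)
# 3 2
```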
Consider below df:
In [744]: df = pd.DataFrame({'response':['so service reset performed', 'reboot done', 'restart performed', 'no value', 'ticket created'], 'ticket':[123, 343, 223, 444, 765]})
In [745]: df
Out[745]:
response ticket
0 so service reset performed 123
1 reboot done 343
2 restart performed 223
3 no value 444
4 ticket created 765
Below is your othersvocab:
In [727]: othersvocab = ['Service reset' , 'Reboot' , 'restart']
# Converting all elements to lowercase
In [729]: othersvocab = [i.lower() for i in othersvocab]
Use Series.str.contains:
# Converting response column to lowercase
In [733]: df.response = df.response.str.lower()
In [740]: count_in_vocab = len(df[df.response.str.contains('|'.join(othersvocab))])
In [742]: count_others = len(df) - count_in_vocab
In [752]: res = pd.DataFrame({'other': [count_in_vocab], 'ticketed': [count_others]})
In [753]: res
Out[753]:
other ticketed
0 3 2
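The same count can be sketched without modifying the response column, assuming a case-insensitive match is what you want. re.escape is added here as a precaution in case a vocabulary term ever contains regex characters:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "response": ["so service reset performed", "reboot done",
                 "restart performed", "no value", "ticket created"],
    "ticket": [123, 343, 223, 444, 765],
})
othersvocab = ["Service reset", "Reboot", "restart"]

# Build one alternation pattern; case=False makes the match case-insensitive
pattern = "|".join(re.escape(term) for term in othersvocab)
matches = df["response"].str.contains(pattern, case=False, regex=True)

res = pd.DataFrame({"other": [int(matches.sum())],
                    "ticketed": [int((~matches).sum())]})
print(res)
```

Summing a boolean Series counts its True values, which avoids building an intermediate filtered DataFrame just to take its length.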

Create Multiple dataframes from a large text file

Using Python, how do I break a text file into data frames where every 84 rows is a new, different dataframe? The first column x_ft is the same value every 84 rows then increments up by 5 ft for the next 84 rows. I need each identical x_ft value and corresponding values in the row for the other two columns (depth_ft and vel_ft_s) to be in the new dataframe too.
My text file is formatted like this:
x_ft depth_ft vel_ft_s
0 270 3535.755 551.735107
1 270 3534.555 551.735107
2 270 3533.355 551.735107
3 270 3532.155 551.735107
4 270 3530.955 551.735107
.
.
33848 2280 3471.334 1093.897339
33849 2280 3470.134 1102.685547
33850 2280 3468.934 1113.144287
33851 2280 3467.734 1123.937134
I have tried many, many different ways but keep running into errors and would really appreciate some help.
I suggest looking into pandas.read_table, which returns a DataFrame directly. Once you have it, you can isolate the rows that share an x_ft value (every 84 rows) by doing something like this:
import pandas as pd
# Read the text file with pandas ("data.txt" is a placeholder path)
df = pd.read_table("data.txt", sep=r"\s+")
arr = []
# This gives you a list of all x_ft values in your dataset
for x in range(0, 403):
    val = 270 + 5 * x
    arr.append(val)
# This writes one CSV file per x_ft value, containing its rows with the corresponding columns (depth_ft and vel_ft_s)
for x_value in arr:
    tempdf = df[df['x_ft'] == x_value]
    tempdf.to_csv("df" + str(x_value) + ".csv")
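If the goal is one DataFrame per x_ft value, grouping on that column is another option. This is a sketch on a small synthetic frame (three x_ft values, 84 rows each, made-up depths); the name frames_by_x is illustrative:

```python
import pandas as pd

rows_per_group = 84
x_values = [270, 275, 280]

# Synthetic stand-in for the real file: each x_ft value repeats 84 times
df = pd.DataFrame({
    "x_ft": [x for x in x_values for _ in range(rows_per_group)],
    "depth_ft": [3535.755 - 1.2 * i for _ in x_values for i in range(rows_per_group)],
})

# groupby yields (key, sub-DataFrame) pairs; collect them in a dict keyed by x_ft
frames_by_x = {x_ft: group.reset_index(drop=True)
               for x_ft, group in df.groupby("x_ft")}

print(len(frames_by_x), len(frames_by_x[270]))
# 3 84
```

This does not rely on every group having exactly 84 rows, and it avoids hard-coding the list of x_ft values.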
You can get indexes to split your data:
rows = 84
datasets = round(len(data)/rows) # total datasets
index_list = []
for index in data.index:
    x = index % rows
    if x == 0:
        index_list.append(index)
print(index_list)
Then split the original dataset at those indexes:
# len(data) closes the final chunk so that it also contains all of its rows
l_mod = index_list + [len(data)]
dfs_list = [data.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]
print(len(dfs_list))
Outputs
print(type(dfs_list[1]))
# pandas.core.frame.DataFrame
print(len(dfs_list[0]))
# 84
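The fixed-size split can also be sketched by stepping through the row positions directly, which makes the chunk boundaries explicit. The frame below is synthetic (three blocks of 84 rows) and stands in for the real data:

```python
import pandas as pd

rows = 84
data = pd.DataFrame({
    "x_ft": [x for x in (270, 275, 280) for _ in range(rows)],
    "vel_ft_s": [551.735107] * (3 * rows),
})

# Slice positions 0-83, 84-167, ... ; the last slice simply stops at the end
chunks = [data.iloc[start:start + rows] for start in range(0, len(data), rows)]

print(len(chunks), len(chunks[-1]))
# 3 84
```

Since iloc works on positions, this does not depend on the DataFrame having a default 0, 1, 2, ... index.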

pandas dataFrame : i'd like to 'uniformize' values

First of all, I couldn't find a proper English way to phrase my request, so it might have been answered before without my finding it. Please forgive me if there's already an answer for this...
So I have "hours" stored in a pd.DataFrame as follows:
1454
1621
and so on (they are 14:54 and 16:21)
Problem:
some of them are 953 (for 09:53).
Question:
how could I "autocomplete" these so that they are four digits long, padded with leading zeroes (I'd like the above to be 0953, and additionally 23 to be 0023)?
I was considering converting the numbers into strings, checking whether they have fewer than 4 characters, and adding a 0 at the beginning if so, but surely there must be a more Pythonic way to do this?
Thank you very much for your help and have a nice day!
You'll need to have a string column, and then you can use zfill:
df = pd.DataFrame([1453, 923, 24, 1250], columns=['time'])
df['time'].astype(str).str.zfill(4)
#0 1453
#1 0923
#2 0024
#3 1250
#Name: time, dtype: object
To add a 0 at the beginning, the type must be string. If the column name is hours, start with
df.hours = df.hours.astype(str)
Now you can conditionally add a 0 to the beginning of shorter entries (this adds a single 0, so a two-digit value such as 23 becomes 023, not 0023):
short = df.hours.str.len() < 4
df.loc[short, 'hours'] = '0' + df.loc[short, 'hours']
For example:
df = pd.DataFrame({'hours': [123, 3444, 233]})
df.hours = df.hours.astype(str)
short = df.hours.str.len() < 4
df.loc[short, 'hours'] = '0' + df.loc[short, 'hours']
>>> df
hours
0 0123
1 3444
2 0233
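A short sketch contrasting the two approaches on the values from the question, including the two-digit case. The variable names are illustrative:

```python
import pandas as pd

hours = pd.Series([1454, 953, 23])
as_text = hours.astype(str)

# Conditional approach: prepend one "0" wherever the text is shorter than 4
short = as_text.str.len() < 4
one_zero = as_text.where(~short, "0" + as_text)

# zfill pads with as many zeros as needed to reach width 4
padded = as_text.str.zfill(4)

print(one_zero.tolist())  # ['1454', '0953', '023']
print(padded.tolist())    # ['1454', '0953', '0023']
```

The conditional version only reaches four characters when exactly one digit is missing, which is why zfill is the safer choice for values like 23.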
Perhaps this is just me, but I firmly believe all date and time manipulation should be done through datetime, not strings, so I would recommend something as follows:
df['time'] = pd.to_datetime(df['time'].astype(str).str.zfill(4).apply(lambda x: x[:2] + ':' + x[2:]))
df['time_str'] = df['time'].dt.strftime('%H-%M')  # %H keeps the 24-hour clock; %I would turn 14:54 into 02-54
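A sketch of the datetime route that parses the padded digits with an explicit format instead of inserting a colon first. The sample values come from the question; the variable names are illustrative, and %H:%M is assumed to be the display format you want:

```python
import pandas as pd

times = pd.Series([1454, 953, 23])

# Pad to four digits, then parse as hour and minute on a 24-hour clock
parsed = pd.to_datetime(times.astype(str).str.zfill(4), format="%H%M")

# Format back to text only at the point where a display string is needed
labels = parsed.dt.strftime("%H:%M")
print(labels.tolist())
# ['14:54', '09:53', '00:23']
```

With only a time given, the parsed values carry a default date, so use them for time-of-day logic rather than as real timestamps.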

Pandas: change data type of Series to String

I use Pandas 'ver 0.12.0' with Python 2.7 and have a dataframe as below:
df = pd.DataFrame({'id' : [123,512,'zhub1', 12354.3, 129, 753, 295, 610],
'colour': ['black', 'white','white','white',
'black', 'black', 'white', 'white'],
'shape': ['round', 'triangular', 'triangular','triangular','square',
'triangular','round','triangular']
}, columns= ['id','colour', 'shape'])
The id Series consists of some integers and strings. Its dtype by default is object. I want to convert all contents of id to strings. I tried astype(str), which produces the output below.
df['id'].astype(str)
0 1
1 5
2 z
3 1
4 1
5 7
6 2
7 6
1) How can I convert all elements of id to String?
2) I will eventually use id for indexing for dataframes. Would having String indices in a dataframe slow things down, compared to having an integer index?
A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) produces the dedicated string dtype; both leave the column as object.
As per the documentation, a Series can be converted to the string datatype in the following ways:
df['id'] = df['id'].astype("string")
df['id'] = pandas.Series(df['id'], dtype="string")
df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype)
You can convert all elements of id to str using apply
df.id.apply(str)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
Edit by OP:
I think the issue was related to the Python version (2.7.), this worked:
df['id'].astype(basestring)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
Name: id, dtype: object
You must assign it, like this:
df['id']= df['id'].astype(str)
Personally none of the above worked for me.
What did:
new_str = [str(x) for x in old_obj][0]
You can use:
df.loc[:,'id'] = df.loc[:, 'id'].astype(str)
This is why they recommend this solution: Pandas doc
TL;DR
To reflect some of the answers:
df['id'] = df['id'].astype("string")
This will break on the given example because it tries to convert to StringArray, which cannot handle the numbers mixed in with the strings.
df['id']= df['id'].astype(str)
For me this solution throws a warning:
> SettingWithCopyWarning:
> A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
There are two possibilities:
Use .astype("str").astype("string"). As seen here
Use .astype(pd.StringDtype()). From the official documentation
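A minimal sketch of the two-step conversion on the id values from the question. The exact dtype names printed depend on the pandas version, so no output is shown for them here:

```python
import pandas as pd

ids = pd.Series([123, 512, "zhub1", 12354.3, 129, 753, 295, 610], name="id")

# Step 1: turn every element into a Python str
as_str_objects = ids.astype(str)

# Step 2: move to the dedicated string dtype now that all values are text
as_string_dtype = as_str_objects.astype("string")

print(as_string_dtype.tolist())
print(as_str_objects.dtype, as_string_dtype.dtype)
```

Doing the plain str conversion first means the second step only ever sees text, which sidesteps the mixed-type problem described above.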
For me, this worked:
df['id'].convert_dtypes()
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
Use pandas string methods, i.e. df['id'].str.cat()
If you want to do it dynamically:
df_obj = df.select_dtypes(include='object')
df[df_obj.columns] = df_obj.astype(str)
Your problem can easily be solved by converting it to the object first. After it is converted to object, just use "astype" to convert it to str.
obj = lambda x:x[1:]
df['id']=df['id'].apply(obj).astype('str')
for me .to_string() worked
df['id']=df['id'].to_string()

Python/Pandas: Counting the number of times a value less than x appears in a column

Hi I have a file of variables that I have read in as sim.
>>sim.head()
SIM0 212321
SIM1 9897362
SIM2 345
SIM3 2345
SIM4 79727367
I have assigned the first value of the column to original:
original=sim[0]
212321
I would like to use pandas to count the number of times a number less than 212321 appears in sim.
Is there a way to do this without a loop?
If sim is a Series, you could do this:
import pandas as pd
sim = pd.Series([212321, 9897362, 345, 2345, 79727367],
                index=['SIM{}'.format(i) for i in range(5)])
orig = sim.iloc[0]  # positional access; sim[0] relies on a deprecated integer fallback for label indexes
num_smaller_items = (sim < orig).sum()
print(num_smaller_items)
# 2
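If sim was read in as a DataFrame rather than a Series, the same comparison works on a single column. This is a sketch; the column name "value" is an assumption, since the question does not show one:

```python
import pandas as pd

df = pd.DataFrame({"value": [212321, 9897362, 345, 2345, 79727367]},
                  index=[f"SIM{i}" for i in range(5)])

original = df["value"].iloc[0]

# The comparison gives a boolean Series; summing it counts the True entries
smaller = df["value"] < original
num_smaller = int(smaller.sum())

# The same mask can also select the matching rows
smaller_rows = df[smaller]
print(num_smaller, smaller_rows.index.tolist())
# 2 ['SIM2', 'SIM3']
```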
