This question already has answers here:
Convert pandas.Series from dtype object to float, and errors to nans
(3 answers)
Closed 3 years ago.
Data from JSON is in a DataFrame and I am trying to output it to a CSV.
I am trying to multiply a DataFrame column by a fixed value, but I am having issues with how the data is displayed.
I have used the following, but the data is still not displayed the way I want:
df_entry['Hours'] = df_entry['Hours'].multiply(2)
df_entry['Hours'] = df_entry['Hours'] * 2
Input
ID, name,hrs
100,AB,37.5
Expected
ID, name,hrs
100,AB,75.0
What I am getting
ID, name,hrs
100,AB,37.537.5
That happens because the dtype of the column is object (strings), so * 2 repeats each string instead of doubling the number. You need to convert it to float before the multiplication.
df_entry['Hours'] = df_entry['Hours'].astype(float) * 2
You can also use the apply function:
df_entry['Hours'] = df_entry['Hours'].apply(lambda x: float(x) * 2)
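A minimal self-contained sketch of the conversion, assuming the column arrived from JSON as strings (the sample row is taken from the question); pd.to_numeric with errors="coerce" additionally turns unparseable values into NaN instead of raising:

```python
import pandas as pd

# Hours was parsed from JSON as strings, so "37.5" * 2 gives "37.537.5"
df_entry = pd.DataFrame({"ID": [100], "name": ["AB"], "Hours": ["37.5"]})

# Convert to float first; bad values become NaN rather than raising
df_entry["Hours"] = pd.to_numeric(df_entry["Hours"], errors="coerce") * 2

print(df_entry["Hours"].tolist())  # [75.0]
```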
This question already has answers here:
Transposing a 1D NumPy array
(15 answers)
numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]
(2 answers)
Numpy transpose of 1D array not giving expected result
(4 answers)
Closed last month.
I know the simple/worked solution to this question is reshape(-1, 1) for turning a row vector (numpy.array) into a column vector (numpy.array).
Specifically, I want to understand why numpy.transpose(a) won't work.
Say,
vector_of_1 = np.transpose(np.ones(N)) # statement 1
And if I define a column vector b, and use the following statement:
V = b + vector_of_1
I would get a weird matrix V.
My fix is to use
vector_of_1 = np.ones(N).reshape(-1,1)
And it works as expected (V being a column vector).
But I want to understand why the transpose method (i.e., statement 1) won't work. A detailed explanation is appreciated.
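For what it's worth, a short sketch of the behavior being asked about: a 1D array has shape (N,) with only one axis, so transposing it is a no-op, and adding it to an (N, 1) column broadcasts to an (N, N) matrix:

```python
import numpy as np

N = 3
a = np.ones(N)                         # shape (3,): a single axis, nothing to swap
assert np.transpose(a).shape == (3,)   # transpose of a 1D array is a no-op

b = np.zeros((N, 1))                   # a true column vector, shape (3, 1)
V = b + np.transpose(a)                # (3, 1) + (3,) broadcasts to (3, 3)
assert V.shape == (3, 3)               # the "weird matrix" from the question

# reshape creates the second axis explicitly, so the shapes line up
V2 = b + np.ones(N).reshape(-1, 1)     # (3, 1) + (3, 1)
assert V2.shape == (3, 1)
```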
This question already has answers here:
Implementing thousands (1k = 1000, 1kk = 1000000) interpreter
(3 answers)
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 11 months ago.
I have a data frame
df3= pd.DataFrame(
{"a": ["21.7K","22.7K","1.7K"]}
)
I would like to change the type of column a to int. I tried to replace "K" with "000", but the replace() method doesn't work even when I set inplace=True (Series.str.replace returns a new Series rather than modifying in place). Replacing with "000" also causes new problems with inaccurate numbers, since "21.7K" would become "21.7000". How can I change the data type to int?
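A minimal sketch of one way to do this: strip the "K" suffix and multiply numerically by 1000 rather than doing string replacement, which also handles decimal values like "21.7K" correctly:

```python
import pandas as pd

df3 = pd.DataFrame({"a": ["21.7K", "22.7K", "1.7K"]})

# Strip the suffix, scale numerically, then round before casting to int
df3["a"] = (df3["a"].str.rstrip("K").astype(float) * 1000).round().astype(int)

print(df3["a"].tolist())  # [21700, 22700, 1700]
```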
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I'm stuck with an equivalence of code between R and Python.
Code in R
library(datasets)
data <- airquality
data2 <- data[data$Ozone < 63,]
I downloaded the airquality file and used the pd.read_csv() function to load the .csv file into Python, but I don't know how to obtain the equivalent of the line data[data$Ozone < 63,].
data2 = data.loc[data["Ozone"] < 63,:]
This should do the trick.
data["Ozone"] < 63 returns a boolean mask that is True where the condition holds
data.loc[mask, :] returns a copy of the dataframe data, for all columns (:), restricted to the rows where the mask is True
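A self-contained sketch with a toy stand-in for the airquality data (the values are illustrative, not the real dataset):

```python
import pandas as pd

# Toy stand-in for airquality; only the Ozone column matters here
data = pd.DataFrame({"Ozone": [41, 36, 63, 108, 12],
                     "Wind": [7.4, 8.0, 12.6, 9.7, 11.5]})

# Equivalent of R's data[data$Ozone < 63, ]
data2 = data.loc[data["Ozone"] < 63, :]

print(data2["Ozone"].tolist())  # [41, 36, 12]
```

One difference worth knowing: R's subsetting keeps rows where Ozone is NA, while the pandas comparison evaluates NaN < 63 as False and drops those rows.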
This question already has answers here:
converting currency with $ to numbers in Python pandas
(5 answers)
Closed 3 years ago.
I am summing a column of data using pandas that includes positive and negative values.
I first clean the data by removing the $ sign and parentheses, then format as a float.
How can I sum the whole column while treating the parenthesized values as negative?
Example:
$1000
($200)
$300
$1250
($100)
I want the answer to be 2250 not 2550.
Thanks in advance!
You want to identify the values and the signs:
# positive and negative
signs = np.where(s.str.startswith('('), -1, 1)
# extract the values
vals = s.str.extract(r'\$([\d.]+)')[0].astype(int)
# calculate the sum
vals.mul(signs).sum()
# 2250
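A runnable version of the snippet above, assuming the data is a Series of strings named s built from the question's example:

```python
import numpy as np
import pandas as pd

s = pd.Series(["$1000", "($200)", "$300", "$1250", "($100)"])

# Parentheses mark negative amounts
signs = np.where(s.str.startswith("("), -1, 1)

# Pull out the numeric part after the $ sign
vals = s.str.extract(r"\$([\d.]+)")[0].astype(float)

total = vals.mul(signs).sum()
print(total)  # 2250.0
```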
A Pandas DataFrame object has the .sum method that takes axis as a parameter
my_dataframe['name_of_column_you_want'].sum(axis = 0) # axis=0 means down (the rows)
I don't understand your example, but if the parentheses mark negative values, something like this should work:
import re
def clean(value):
    # grab the numeric part, e.g. '1000' from '$1000' or '200' from '($200)'
    num = float(re.search(r'[\d.]+', value).group(0))
    # parentheses indicate a negative amount
    return -num if '(' in value else num
my_dataframe['column_name'].apply(clean).sum()
This question already has answers here:
Counting the number of non-NaN elements in a numpy ndarray in Python
(5 answers)
Closed 4 years ago.
I'm currently trying to learn Python and Numpy. The task is to determine the length of individual columns of an imported CSV file.
So far I have:
import numpy as np
data = np.loadtxt("assignment5_data.csv", delimiter = ',')
print (data.shape[:])
Which returns:
(62, 2)
Is there a way to iterate through each column and count the values that are not NaN?
If I understand correctly, and you are trying to get the length of non-nan values in each column, use:
np.sum(~np.isnan(data),axis=0)
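A quick self-contained check of that one-liner (the array values are made up, standing in for the loaded CSV):

```python
import numpy as np

# Toy 2-column array with some NaNs
data = np.array([[1.0, np.nan],
                 [2.0, 3.0],
                 [np.nan, 4.0]])

counts = np.sum(~np.isnan(data), axis=0)  # non-NaN count per column
print(counts)  # [2 2]
```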