I am trying to convert all the cell values (except the date) to floating-point numbers, but I'm getting an
error:
Can only use .str accessor with string values!
here is my code:
df['Market Cap_'+str(coin)] = df['Market Cap_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Volume_'+str(coin)] = df['Volume_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Open_'+str(coin)] = df['Open_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Close_'+str(coin)] = df['Close_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
here is the output of df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 1 to 30
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date_ETHEREUM 30 non-null datetime64[ns]
1 Market Cap_ETHEREUM 30 non-null float64
2 Volume_ETHEREUM 30 non-null float64
3 Open_ETHEREUM 30 non-null float64
4 Close_ETHEREUM 30 non-null object
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.4+ KB
here is an image of my dataframe:
Note: coin is just a string that is added dynamically from the URL for each particular coin table.
I would appreciate any help or an alternative solution.
You have a $ sign, so the value cannot be parsed as a float. Remove it before converting the column to a float type.
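The `.str` accessor raises this error when an object column does not actually hold strings (for instance, when its cells are Python floats). A minimal sketch, assuming a made-up `Close_ETHEREUM` column with mixed content, that casts every cell to `str` first and strips the symbols literally before converting:

```python
import pandas as pd

# Hypothetical sample: an object column mixing strings and an already-numeric value.
df = pd.DataFrame({"Close_ETHEREUM": ["$1,234.50", "$987.10", 1500.0]})

col = "Close_ETHEREUM"
cleaned = (
    df[col]
    .astype(str)                        # make every cell a string so .str is valid
    .str.replace(",", "", regex=False)  # drop thousands separators
    .str.replace("$", "", regex=False)  # "$" is a regex metacharacter, so match it literally
)
df[col] = pd.to_numeric(cleaned)
print(df.dtypes)
```

Passing `regex=False` keeps the behaviour the same across pandas versions, since the default for `regex` has changed over time.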
Related
When I try to read the data from an Excel file, I find that
the column called "TOT_SALES" has the data type float64 and all values are shown with a % sign.
I want to remove this sign and divide all values by 100. In the Excel file itself, the values in the column are regular numbers, as shown below.
Any help on how to remove the (%) from the values?
df= pd.read_excel("QVI_transaction_data_1.xlsx")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264836 entries, 0 to 264835
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 DATE 264836 non-null datetime64[ns]
1 STORE_NBR 264836 non-null int64
2 LYLTY_CARD_NBR 264836 non-null int64
3 TXN_ID 264836 non-null int64
4 PROD_NBR 264836 non-null int64
5 PROD_NAME 264836 non-null object
6 PROD_QTY 264836 non-null int64
7 TOT_SALES 264836 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(5), object(1)
memory usage: 16.2+ MB
df.head()
This is the result that appears in the DataFrame:
TOT_SALES
1180.00%
740.00%
420.00%
1080.00%
660.00%
These are the values in the Excel file, without the (%) sign:
TOT_SALES
11.80
7.40
4.20
10.80
6.60
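A minimal sketch of the two cases, using made-up sample values that mirror the ones above. If the column really is float64, there is no sign to remove and dividing by 100 is enough; if it was read as text with a trailing `%`, the sign has to be stripped first:

```python
import pandas as pd

# Hypothetical sample values mirroring the question.
as_float = pd.Series([1180.0, 740.0, 420.0], name="TOT_SALES")
as_text = pd.Series(["1180.00%", "740.00%", "420.00%"], name="TOT_SALES")

# Case 1: the column is already float64, so only the scaling is needed.
scaled_float = as_float / 100

# Case 2: the column holds text with a trailing "%"; strip it, convert, then scale.
scaled_text = pd.to_numeric(as_text.str.rstrip("%")) / 100

print(scaled_float.tolist())
print(scaled_text.tolist())
```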
I am having trouble solving an assignment. In one column of a dataframe I have values stored as text strings (objects). I want to convert them to numeric values, but every time I get an error saying that the string cannot be converted to a float.
I want to try using regex to convert the string '-1 203.45' into the value '1203.45'. Please show me how the code should be written in Pandas.
I've tested virtually all of the forum hints and none of them work. Please give me a hint.
First, here is what I did.
I read the CSV file in two different ways:
df = pd.read_csv('dane_navision.csv', delimiter=";", decimal= ",",
thousands=" " )
and
df = pd.read_table('dane_navision.csv', delimiter=";", thousands=" ", decimal=',')
I get this table:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1905 entries, 0 to 1904
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Data księgowania 1905 non-null object
1 Typ zapisu 1905 non-null object
2 Nr zapisu 1905 non-null int64
3 Nr dokumentu 1905 non-null object
4 Nr zapasu 1905 non-null object
5 Opis 1905 non-null object
6 Kod lokalizacji 1905 non-null object
7 Ilość 1905 non-null object
8 Ilość zafakturowana 1905 non-null object
9 MPK - kod 1905 non-null object
10 Nr dok.zewn. 0 non-null float64
11 Kod kategorii zapasu 1826 non-null object
12 Ilość na jednostkę miary 1905 non-null int64
13 Kwota kosztu (rzeczywista) 1905 non-null object
dtypes: float64(1), int64(2), object(11)
memory usage: 208.5+ KB
Column 13 is the important one for me. I wanted to change it from object to float and I can't do it.
I get the following error every time:
ValueError: could not convert string to float: '-1\xa0032.02'
I tried many ways with the method:
df['colum_name'].astype(float) - does not work.
df['column_name'].str.replace[" ", "" ] - does not work.
Maybe someone has some idea how to cut space from string and then it will be easier to convert it to number.
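The `\xa0` in the error message is a non-breaking space, which is a different character from the ordinary space passed to `thousands=" "`. A minimal sketch, assuming made-up sample values in the same shape as the one in the error, that removes both kinds of space before converting:

```python
import pandas as pd

# Hypothetical sample using the non-breaking space (\xa0) seen in the error message.
s = pd.Series(["-1\xa0032.02", "250.00", "1\xa0203.45"])

cleaned = (
    s.str.replace("\xa0", "", regex=False)  # remove non-breaking spaces
     .str.replace(" ", "", regex=False)     # remove ordinary spaces, if any
)
values = pd.to_numeric(cleaned)
print(values.tolist())
```

Note that `str.replace` is a method and is called with parentheses, not square brackets.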
I am trying to convert all the cell values (except the date) to floating-point numbers. I can successfully convert the first 3 columns, but I get an error on the last one:
Here is my code:
df['Market Cap_'+str(coin)] = df['Market Cap_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Volume_'+str(coin)] = df['Volume_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Open_'+str(coin)] = df['Open_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Close_'+str(coin)] = df['Close_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
Here is df.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 1 to 30
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date_ETHEREUM 30 non-null datetime64[ns]
1 Market Cap_ETHEREUM 30 non-null float64
2 Volume_ETHEREUM 30 non-null float64
3 Open_ETHEREUM 30 non-null float64
4 Close_ETHEREUM 30 non-null object
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.4+ KB
And here is the Error:
AttributeError: Can only use .str accessor with string values!
As you can see, the column type is object (the same as the others were before conversion), but I'm getting an error on this one.
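An object dtype does not guarantee that the cells are strings, so it can help to check which Python types the column really holds. A small sketch on a hypothetical object column whose cells are floats rather than strings:

```python
import pandas as pd

# Hypothetical object column whose cells are floats, not strings.
close = pd.Series([1234.5, 987.1, 1500.0], dtype=object)

# Count the Python types actually stored in the column.
type_counts = close.map(type).value_counts()
print(type_counts)

# pd.to_numeric does not go through the .str accessor, so it works here directly.
close_numeric = pd.to_numeric(close)
print(close_numeric.dtype)
```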
I am new to pandas and I am trying to convert Time into DateTime format. Unfortunately I get the time with an added date which is not my intention.
My dataFrame is the following:
After running data['Time'] = pd.to_datetime(data['Time'], format = '%H:%M:%S') I get the following:
What am I doing wrong?
Try this:
import pandas as pd
data = {'time':['05:05:30','06:04:23','03:40:45','12:05:30'], 'value':[2,3,5,7]}
data = pd.DataFrame(data)
# to_timedelta stores the time of day as a duration, with no date attached
data['TIME'] = pd.to_timedelta(data['time'])
you get TIME in the desired format:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 4 non-null object
1 value 4 non-null int64
2 TIME 4 non-null timedelta64[ns]
dtypes: int64(1), object(1), timedelta64[ns](1)
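For comparison, a short sketch of three options on made-up data: `pd.to_datetime` with a time-only format fills in a default date, `.dt.time` keeps only the time of day as Python `datetime.time` objects, and `pd.to_timedelta` gives a duration since midnight.

```python
import datetime
import pandas as pd

times = pd.Series(["05:05:30", "06:04:23"])

parsed = pd.to_datetime(times, format="%H:%M:%S")  # datetime64, date defaults to 1900-01-01
only_time = parsed.dt.time                          # object column of datetime.time values
as_delta = pd.to_timedelta(times)                   # timedelta64, duration since midnight

print(parsed.iloc[0], only_time.iloc[0], as_delta.iloc[0])
```

Which one fits depends on what you do next: `timedelta64` supports arithmetic, while `datetime.time` values are mainly useful for display and comparison.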
Hello, I have an issue converting an entire column from object to integer.
I have a data frame and I tried to convert some columns that are detected as object into integer (or float), but none of the answers I have found are working for me.
First status
Then I tried to apply the to_numeric method, but it doesn't work.
To numeric method
Then a custom method that you can find here: Pandas: convert dtype 'object' to int
but doesn't work either: data3['Title'].astype(str).astype(int)
( I cannot pass the image anymore - You have to trust me that it doesn't work)
I tried to use the inplace argument, but it doesn't seem to be supported by those methods:
I am pretty sure that the answer is simple, but I cannot find it.
You need to assign the output back:
# omitting astype(str) may also work
data3['Title'] = data3['Title'].astype(str).astype(int)
Or:
data3['Title'] = pd.to_numeric(data3['Title'])
Sample:
data3 = pd.DataFrame({'Title':['15','12','10']})
print (data3)
Title
0 15
1 12
2 10
print (data3.dtypes)
Title object
dtype: object
data3['Title'] = pd.to_numeric(data3['Title'])
print (data3.dtypes)
Title int64
dtype: object
data3['Title'] = data3['Title'].astype(int)
print (data3.dtypes)
Title int32
dtype: object
As python_enthusiast said,
this command works for me too:
data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)
but also works fine with
data3.Title = data3.Title.str.replace(',', '').astype(int)
You have to use .str before replace in order to get rid of the commas, and only then change the column to int/float; otherwise you will get an error.
2 years and 11 months later, but here I go.
It's important to check if your data has any spaces, special characters (like commas, dots, or whatever else) first. If yes, then you need to basically remove those and then convert your string data into float and then into an integer (this is what worked for me for the case where my data was numerical values but with commas, like 4,118,662).
data3.Title = data3.Title.str.replace(',', '').astype(float).astype(int)
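You can also try this code, which worked for me (note that pd.factorize assigns an integer code to each distinct value rather than parsing the text as a number):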
data3.Title= pd.factorize(data3.Title)[0]
Version that works with Nulls
With older versions of pandas there was no NaN for int, but newer versions offer Int64, which supports pd.NA.
So to go from object to int with missing data, you can do this:
df['col'] = df['col'].astype(float)
df['col'] = df['col'].astype('Int64')
By switching to float first you avoid the "object cannot be converted to an IntegerDtype" error.
Note the capital 'I' in Int64.
More info here: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
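A minimal runnable sketch of the two-step conversion, assuming a made-up object column `col` that contains numeric strings and one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical object column with a missing value.
df = pd.DataFrame({"col": ["1", "2", np.nan]})

df["col"] = df["col"].astype(float)    # numeric strings -> float64, missing stays NaN
df["col"] = df["col"].astype("Int64")  # float64 -> nullable integer, NaN becomes <NA>

print(df["col"])
```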
Working with pd.NA
Pandas 1.0 introduced pd.NA; its goal is to provide a "missing" indicator that can be used consistently across data types (instead of np.nan, None or pd.NaT depending on the data type).
With this in mind, pandas added the DataFrame.convert_dtypes() and Series.convert_dtypes() methods, which convert columns to data types that support pd.NA. This was considered experimental at the time of writing, but it looks promising.
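A small sketch of `convert_dtypes()` on a hypothetical frame (the column names `a` and `b` are made up). A float column that holds only whole numbers and missing values is converted to the nullable `Int64` type, and the object column of text becomes a string type:

```python
import pandas as pd

# Hypothetical frame: whole numbers stored as float because of a missing value,
# plus a text column with a missing value.
df = pd.DataFrame({"a": [1.0, 2.0, None], "b": ["x", "y", None]})

converted = df.convert_dtypes()
print(converted.dtypes)
```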
I had a dataset like this
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79902 entries, 0 to 79901
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Query 79902 non-null object
1 Video Title 79902 non-null object
2 Video ID 79902 non-null object
3 Video Views 79902 non-null object
4 Comment ID 79902 non-null object
5 cleaned_comments 79902 non-null object
dtypes: object(6)
memory usage: 5.5+ MB
Removed the None, NaN entries using
dataset = dataset.replace(to_replace='None', value=np.nan).dropna()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Query 79868 non-null object
1 Video Title 79868 non-null object
2 Video ID 79868 non-null object
3 Video Views 79868 non-null object
4 Comment ID 79868 non-null object
5 cleaned_comments 79868 non-null object
dtypes: object(6)
memory usage: 6.1+ MB
Notice the reduced entries
But the Video Views were floats, as shown in dataset.head()
Then I used
dataset['Video Views'] = pd.to_numeric(dataset['Video Views'])
dataset['Video Views'] = dataset['Video Views'].astype(int)
Now,
<class 'pandas.core.frame.DataFrame'>
Int64Index: 79868 entries, 0 to 79901
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Query 79868 non-null object
1 Video Title 79868 non-null object
2 Video ID 79868 non-null object
3 Video Views 79868 non-null int64
4 Comment ID 79868 non-null object
5 cleaned_comments 79868 non-null object
dtypes: int64(1), object(5)
memory usage: 6.1+ MB