I'm trying to assign a value to a cell, yet Pandas rounds it to zero. (I'm using Python 3.6)
in: df['column1']['row1'] = 1 / 331616
in: print(df['column1']['row1'])
out: 0
But if I try to assign this value to a standard Python dictionary key, it works fine.
in: {'column1': {'row1': 1/331616}}
out: {'column1': {'row1': 3.0155360416867704e-06}}
I've already done this, but it didn't help:
pd.set_option('precision', 50)
pd.set_option('chop_threshold', .00000000005)
Please, help.
pandas appears to be inferring that your column's datatype is integer (int).
There are two ways to address this: set the datatype to float when the DataFrame is constructed, or convert (cast) the column's datatype (also called its dtype) to float on the fly.
setting the datatype (dtype) during construction:
>>> import pandas as pd
When building this simple DataFrame, we provide a single example value (1) and declare the column as float at creation time:
>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=float)
>>> df['column1']['row1'] = 1 / 331616
>>> df
column1
row1 0.000003
converting the datatype on the fly:
>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=int)
>>> df['column1'] = df['column1'].astype(float)
>>> df['column1']['row1'] = 1 / 331616
>>> df
column1
row1 0.000003
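As a side note, chained indexing like df['column1']['row1'] can also trigger a SettingWithCopyWarning. A minimal sketch of the same fix using .loc for the assignment, assuming the float dtype is set at construction as above:

```python
import pandas as pd

# Construct with a float value so the column dtype is float64 from the start,
# then assign with .loc, which avoids chained indexing entirely.
df = pd.DataFrame([[1.0]], columns=['column1'], index=['row1'])
df.loc['row1', 'column1'] = 1 / 331616
print(df.loc['row1', 'column1'])  # 3.0155360416867704e-06
```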
Your column's datatype is most likely set to int. You'll need to convert it either to float or to the mixed-type object dtype before assigning the value:
df = pd.DataFrame([1,2,3,4,5,6])
df.dtypes
# 0 int64
# dtype: object
df[0][4] = 7/125
df
# 0
# 0 1
# 1 2
# 2 3
# 3 4
# 4 0
# 5 6
df[0] = df[0].astype('O')
df[0][4] = 7 / 22
df
# 0
# 0 1
# 1 2
# 2 3
# 3 4
# 4 0.318182
# 5 6
df.dtypes
# 0 object
# dtype: object
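If you don't actually need mixed types, casting the column to float (rather than object) keeps it numeric. A minimal sketch of that variant, using the same example frame:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, 6])
df[0] = df[0].astype(float)  # cast the whole column from int64 to float64
df.loc[4, 0] = 7 / 22        # .loc avoids the chained-indexing pattern
print(df[0].dtype)           # float64
```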
Related
This question is related to how to check the dtype of a column in python pandas.
I create an empty pandas DataFrame and then fill it with data. How can I then check whether any of its columns contain complex values?
import numpy as np
import pandas as pd

index = [np.array(['foo', 'qux'])]
columns = ["A", "B"]
df = pd.DataFrame(index=index, columns=columns)
df.loc['foo']["A"] = 1 + 1j
df.loc['foo']["B"] = 1
df.loc['qux']["A"] = 2
df.loc['qux']["B"] = 2
print(df)
for dtype in df.dtypes:
    if dtype == complex:
        print(dtype)
At the moment, I get the dtype as object, which isn't useful.
A B
foo (1+1j) 1
qux 2 2
Consider the series s
s = pd.Series([1, 3.4, 2 + 1j], dtype=object)
s
0 1
1 3.4
2 (2+1j)
dtype: object
If I use pd.to_numeric, it will upcast the dtype to complex if any of the values are complex:
pd.to_numeric(s).dtype
dtype('complex128')
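Building on that, one way to find the complex columns in the original frame is to run pd.to_numeric per column and check the resulting dtype. This is a sketch; it assumes every value in the frame is numeric, since to_numeric would otherwise raise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1 + 1j, 2], 'B': [1, 2]}, dtype=object)

# to_numeric upcasts each object column; columns whose result is
# complex128 are the ones holding complex values.
complex_cols = [c for c in df.columns
                if pd.to_numeric(df[c]).dtype == np.complex128]
print(complex_cols)  # ['A']
```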
I'm trying to use assign to create a new column in a pandas DataFrame. I need to use something like str.format to have the new column be pieces of existing columns. For instance...
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(3, 3))
gives me...
0 1 2
0 -0.738703 -1.027115 1.129253
1 0.674314 0.525223 -0.371896
2 1.021304 0.169181 -0.884293
an assign for a totally new column works
# works
print(df.assign(c = "a"))
0 1 2 c
0 -0.738703 -1.027115 1.129253 a
1 0.674314 0.525223 -0.371896 a
2 1.021304 0.169181 -0.884293 a
But if I try to build the new column from an existing column, it seems like pandas is putting the string representation of the whole existing frame into every row of the new column.
# doesn't work
print(df.assign(c = "a{}b".format(df[0])))
0 1 2 \
0 -0.738703 -1.027115 1.129253
1 0.674314 0.525223 -0.371896
2 1.021304 0.169181 -0.884293
c
0 a0 -0.738703\n1 0.674314\n2 1.021304\n...
1 a0 -0.738703\n1 0.674314\n2 1.021304\n...
2 a0 -0.738703\n1 0.674314\n2 1.021304\n...
Thanks for the help.
In [131]: df.assign(c="a"+df[0].astype(str)+"b")
Out[131]:
0 1 2 c
0 0.833556 -0.106183 -0.910005 a0.833556419295b
1 -1.487825 1.173338 1.650466 a-1.48782514804b
2 -0.836795 -1.192674 -0.212900 a-0.836795026809b
'a{}b'.format(df[0]) is a str. "a"+df[0].astype(str)+"b" is a Series.
In [142]: type(df[0].astype(str))
Out[142]: pandas.core.series.Series
In [143]: type('{}'.format(df[0]))
Out[143]: str
When you assign a single string to the column c, that string is repeated for every row in df.
Thus, df.assign(c = "a{}b".format(df[0])) assigns the string 'a{}b'.format(df[0])
to each row of df:
In [138]: 'a{}b'.format(df[0])
Out[138]: 'a0 0.833556\n1 -1.487825\n2 -0.836795\nName: 0, dtype: float64b'
It is really no different from what happened with df.assign(c = "a").
In contrast, when you assign a Series to the column c, then the index of the Series is aligned with the index of df and the corresponding values are assigned to df['c'].
Under the hood, Series.__add__ is defined so that adding a string to a Series of strings returns a new Series with the string concatenated onto each value:
In [149]: "a"+df[0].astype(str)
Out[149]:
0 a0.833556419295
1 a-1.48782514804
2 a-0.836795026809
Name: 0, dtype: object
(The astype method was called to convert the floats in df[0] into strings.)
df['c'] = "a" + df[0].astype(str) + 'b'
df
0 1 2 c
0 -1.134154 -0.367397 0.906239 a-1.13415403091b
1 0.551997 -0.160217 -0.869291 a0.551996920472b
2 0.490102 -1.151301 0.541888 a0.490101854737b
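If you'd rather keep the str.format spelling from the question, Series.map accepts a callable, so passing the bound method 'a{}b'.format formats each element individually. A sketch of that variant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3))

# map applies 'a{}b'.format to each float separately, instead of
# formatting the whole Series at once as in the question.
out = df.assign(c=df[0].map('a{}b'.format))
print(out['c'].iloc[0])
```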
I have a series containing data like
0 a
1 ab
2 b
3 a
And I want to replace any row containing 'b' with 1, and all others with 0. I've tried
one = labels.str.contains('b')
zero = ~labels.str.contains('b')
labels.ix[one] = 1
labels.ix[zero] = 0
And this does the trick but it gives this pesky warning
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
And I know I've seen this before in the last few times I've used pandas. What is the recommended approach? My method gives the desired result, but what should I do instead? Also, Python is supposed to be an 'if it makes logical sense and you type it, it will run' kind of language; my solution seems perfectly logical in the human-readable sense, so it feels very non-pythonic that it raises a warning.
Try this:
ds = pd.Series(['a','ab','b','a'])
ds
0 a
1 ab
2 b
3 a
dtype: object
ds.apply(lambda x: 1 if 'b' in x else 0)
0 0
1 1
2 1
3 0
dtype: int64
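Since str.contains already returns a boolean Series, you can also skip the Python-level lambda and cast the booleans directly. A minimal sketch:

```python
import pandas as pd

ds = pd.Series(['a', 'ab', 'b', 'a'])
# True/False becomes 1/0 under the int cast; no element-wise apply needed.
result = ds.str.contains('b').astype(int)
print(result.tolist())  # [0, 1, 1, 0]
```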
You can use numpy.where. The output is a numpy.ndarray, so you have to wrap it in the Series constructor to get a Series back:
import pandas as pd
import numpy as np
ser = pd.Series(['a','ab','b','a'])
print(ser)
0 a
1 ab
2 b
3 a
dtype: object
print(np.where(ser.str.contains('b'), 1, 0))
[0 1 1 0]
print(type(np.where(ser.str.contains('b'), 1, 0)))
<class 'numpy.ndarray'>
print(pd.Series(np.where(ser.str.contains('b'), 1, 0), index=ser.index))
0 0
1 1
2 1
3 0
dtype: int32
I want to sum up all values that I select based on some function of column and row.
Another way of putting it is that I want to use a function of the row index and column index to determine if a value should be included in a sum along an axis.
Is there an easy way of doing this?
Columns can be selected using the syntax dataframe[<list of columns>]. Rows can be filtered using the dataframe.index attribute.
import pandas as pd
df = pd.DataFrame({'a': [0.1, 0.2], 'b': [0.2, 0.1]})
odd_a = df['a'][df.index % 2 == 1]
even_b = df['b'][df.index % 2 == 0]
# odd_a:
# 1 0.2
# Name: a, dtype: float64
# even_b:
# 0 0.2
# Name: b, dtype: float64
If df is your dataframe:
In [477]: df
Out[477]:
A s2 B
0 1 5 5
1 2 3 5
2 4 5 5
You can access the odd rows like this:
In [478]: df.loc[1::2]
Out[478]:
A s2 B
1 2 3 5
and the even ones like this:
In [479]: df.loc[::2]
Out[479]:
A s2 B
0 1 5 5
2 4 5 5
To answer your question, getting even rows and column B would be :
In [480]: df.loc[::2,'B']
Out[480]:
0 5
2 5
Name: B, dtype: int64
and odd rows and column A can be done as:
In [481]: df.loc[1::2,'A']
Out[481]:
1 2
Name: A, dtype: int64
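Putting the two selections together, the sum the question asks about is just the sum of both slices. A sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4], 's2': [5, 3, 5], 'B': [5, 5, 5]})

# Column B over even rows plus column A over odd rows.
total = df.loc[::2, 'B'].sum() + df.loc[1::2, 'A'].sum()
print(total)  # (5 + 5) + 2 = 12
```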
This should be fairly general, if not the cleanest implementation. It allows applying separate functions to rows and to columns, depending on conditions (defined here in dictionaries).
import numpy as np
import pandas as pd
ran = np.random.randint(0,10,size=(5,5))
df = pd.DataFrame(ran,columns = ["a","b","c","d","e"])
# A dictionary to define what function is passed
d_col = {"high":["a","c","e"], "low":["b","d"]}
d_row = {"high":[1,2,3], "low":[0,4]}
# Generate a list of pandas boolean Series
i_col = [df[i] > 5 if i in d_col["high"] else df[i] < 5 for i in df.columns]
# Combine the Series into a boolean DataFrame mask
df = df[pd.concat(i_col, axis=1)]
# Now do the same for rows
i_row = [df.T[i] > 5 if i in d_row["high"] else df.T[i] < 5 for i in df.T.columns]
# Return the DataFrame in its original shape
df = df.T[pd.concat(i_row, axis=1)].T
# Perform the final operation such as sum on the returned DataFrame
print(df.sum().sum())
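When the conditions depend only on the row and column labels (not on the values), a simpler approach is to build boolean masks over the index and columns and select with .loc. The selection rules below are hypothetical examples, not the ones from the answer above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(5, 5)), columns=list('abcde'))

# Hypothetical rules: keep even-numbered rows and columns 'a' and 'c'.
row_mask = df.index % 2 == 0
col_mask = df.columns.isin(['a', 'c'])
total = df.loc[row_mask, col_mask].sum().sum()
print(total)
```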
I'm trying to plot a DataFrame using pandas but it's not working (see this similar thread for details). I think part of the problem might be that my DataFrame seems to be made of objects:
>>> df.dtypes
Field object
Moment object
Temperature object
However, if I were to convert all the values to type float, I lose a lot of precision. All the values in column Moment are of the form -132.2036E-06 and converting to float with df1 = df.astype(float) changes it to -0.000132.
Anyone know how I can preserve the precision?
You can change the displayed precision like this; the stored float64 values keep their full precision either way.
In [1]: df = pd.DataFrame(np.random.randn(5,2))
In [2]: df
Out[2]:
0 1
0 0.943371 0.171686
1 1.508525 0.005589
2 -0.764565 0.259490
3 -1.059662 -0.837602
4 0.561804 -0.592487
[5 rows x 2 columns]
In [3]: pd.set_option('display.precision',12)
In [4]: df
Out[4]:
0 1
0 0.94337126946 0.17168604324
1 1.50852519105 0.00558907755
2 -0.76456509501 0.25948965731
3 -1.05966206139 -0.83760201886
4 0.56180449801 -0.59248656304
[5 rows x 2 columns]
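To be clear, converting strings like '-132.2036E-06' with astype(float) does not actually lose precision: float64 carries about 15-17 significant digits, and only the default display rounds the value to -0.000132. A quick check:

```python
import pandas as pd

s = pd.Series(['-132.2036E-06'])
f = s.astype(float)
# The stored value is exact to float64 precision; only the display rounds.
print(repr(f.iloc[0]))  # -0.0001322036
```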