I have been trying to get the total value across all columns based on a condition, but it didn't work out.
import pandas as pd
import numpy as np

np.random.seed(100)
NO = pd.DataFrame({'TR': 'NO', 'A': np.random.randint(1, 10, 3), 'B': np.random.randint(10, 20, 3), 'C': np.random.randint(25, 35, 3)})
YS = pd.DataFrame({'TR': 'YS', 'A': np.random.randint(1, 10, 3), 'B': np.random.randint(10, 20, 3), 'C': np.random.randint(25, 35, 3)})
frames = (NO, YS)
df = pd.concat(frames)
Total = df.loc[df['TR'] == 'NO', ['A', 'B', 'C']].sum()
The total should be a single value: 152.
You have to sum twice to reduce the dimensionality: the first .sum() collapses the rows into a per-column Series, and the second collapses that Series into a scalar:
>>> df.loc[df['TR'] == 'NO', ['A', 'B', 'C']].sum().sum()
152
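Alternatively, you can collapse everything in one call by summing the underlying NumPy array; a minimal sketch of the same computation (assuming pandas 0.24+ for .to_numpy()):
>>> df.loc[df['TR'] == 'NO', ['A', 'B', 'C']].to_numpy().sum()
152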
I have two data frames with the same column names, but some columns may have different datatypes. How do I copy the {col: datatype} mapping from the reference dataframe and apply it to the main dataframe?
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, '1.0', '1.3', 2, 5]})

df2 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 5.0],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, '1.0', '1.3', 2, 5]})

dtypes = df1.dtypes.astype(str).to_dict()  # take the columns and their datatypes from the reference df
df2 = df2.astype(dtypes)  # apply to the main df
You can loop through the columns and apply the type.
But you may need to add some error handling, as astype requires the column to be castable to the provided type: if you try to .astype(int) a column that contains the value 'c', it will fail. Generally you would use the more flexible pd.to_numeric or pd.to_datetime methods, which can coerce bad values to NaN and infer the dtype (i.e. float vs. int for pd.to_numeric).
for col in df2.columns:
    try:
        df2[col] = df2[col].astype(dtypes[col])
    except (KeyError, ValueError):
        pass  # column missing from the reference, or values not castable
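A minimal sketch of the coercion approach mentioned above, using the example's mixed-type 'C' column:
# values that cannot be parsed become NaN; here everything parses,
# so the object column simply becomes float64
df2['C'] = pd.to_numeric(df2['C'], errors='coerce')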
Maybe you can try this:
for i in df1.columns:
    df2[i] = df2[i].astype(df1[i].dtype)
I have a pandas dataframe being generated by some other piece of code; the dataframe may have a different number of columns each time it is generated. Let's call them col1, col2, ..., coln, where n is not fixed. Please note that col1, col2, ... are just placeholders; the actual column names can be arbitrary, like TimeStamp or PrevState.
From this, I want to convert each column into a list, with the name of the list being the same as the column. So, I want a list named col1 with the entries in the first column of the dataframe and so on till coln.
How do I do this?
Thanks
Creating variables dynamically is not recommended; it is better to create a dictionary:
d = df.to_dict('list')
Then select each list by its key, i.e. the column name:
print (d['col'])
Sample:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
})
d = df.to_dict('list')
print (d)
{'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': [4, 5, 4, 5, 5, 4], 'C': [7, 8, 9, 4, 2, 3]}
print (d['A'])
['a', 'b', 'c', 'd', 'e', 'f']
import pandas as pd

df = pd.DataFrame()
df["col1"] = [1, 2, 3, 4, 5]
df["colTWO"] = [6, 7, 8, 9, 10]

for col_name in df.columns:
    # builds and execs a statement like: col1 = [1, 2, 3, 4, 5]
    exec(col_name + " = " + repr(df[col_name].tolist()))
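If you really need named variables, assigning into globals() avoids round-tripping values through their repr; a minimal sketch (dynamic variable creation is still discouraged, as noted above):
for col_name in df.columns:
    # bind each column's values, as a list, to a module-level name
    globals()[col_name] = df[col_name].tolist()

print(col1)    # [1, 2, 3, 4, 5]
print(colTWO)  # [6, 7, 8, 9, 10]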
How do I group by two keys in a dictionary and get the sum of the values of the other key, val?
Input:
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
'val':[1, 2, 3, 4]}
In this example, I want to group by key1 and key2, and then sum up the values in val.
Expected:
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
'val':[1, 2, 3, 4], 'val_sum':[1, 2, 7, 7]}
Actually, I don't want to convert the dictionary data into a pandas.DataFrame and then convert back to a dictionary to achieve this, because my data is actually very big.
Update:
To help explain how val_sum is generated, here is my code using pandas.DataFrame.
df = pd.DataFrame(data)
tmp = df.groupby(['key1', 'key2'])['val'].sum()
df['val_sum'] = df.set_index(['key1', 'key2']).index.map(tmp.to_dict())
And the result is shown as follows:
key1 key2 val val_sum
0 a m 1 1
1 a n 2 2
2 b m 3 7
3 b m 4 7
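For reference, the same column can be produced in one line with transform, which broadcasts each group's sum back onto its rows:
# each row receives the sum of its (key1, key2) group
df['val_sum'] = df.groupby(['key1', 'key2'])['val'].transform('sum')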
You can build your own summing solution using a defaultdict, say as follows.
from collections import defaultdict
data = {'key1':['a','a', 'b', 'b'], 'key2':['m','n', 'm', 'm'],
'val':[1, 2, 3, 4]}
keys_to_group = ['key1', 'key2']

temp = defaultdict(int)  # initializes sum to zero
for i, *key_group in zip(data['val'], *[data[key] for key in keys_to_group]):
    print(i, key_group)  # key_group now looks like ['a', 'm'] or ['b', 'm'] or so on
    temp[tuple(key_group)] += i

val_sum = [temp[key_group] for key_group in zip(*[data[key] for key in keys_to_group])]
data['val_sum'] = val_sum
print(data)
{'key1': ['a', 'a', 'b', 'b'],
'key2': ['m', 'n', 'm', 'm'],
'val': [1, 2, 3, 4],
'val_sum': [1, 2, 7, 7]}
Having said that, your data does seem suited to a tabular structure, and if you plan to do more than just this one operation, it might make sense to load it into a dataframe anyway.
I have a numpy array with hundreds of elements, each a capital letter, in no particular order:
import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J', ...])
Each element in this numpy.ndarray is a numpy.string_.
I also have a "translation dictionary", with key/value pairs such that each capital letter corresponds to a city:
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne',...}
There are only 26 pairs in the dictionary transdict, but there are hundreds of letters in the numpy array I must translate.
What is the most efficient way to do this?
I have considered using numpy.core.defchararray.replace(a, old, new, count=None), but this returns a ValueError, as the numpy array is a different size than the dictionary keys/values. Calling the string translate method on the array instead raises:
AttributeError: 'numpy.ndarray' object has no attribute 'translate'
With brute-force NumPy broadcasting (converting the dict's keys and values to arrays first, which also makes this work on Python 3) -
keys = np.array(list(transdict.keys()))
vals = np.array(list(transdict.values()))
idx = np.nonzero(keys == abc_array[:,None])[1]
out = vals[idx]
With np.searchsorted based searching and indexing -
sort_idx = np.argsort(keys)
idx = np.searchsorted(keys, abc_array, sorter=sort_idx)
out = vals[sort_idx][idx]
Sample run -
In [1]: abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
   ...: transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne', 'D': 'Delhi'}
   ...: keys = np.array(list(transdict.keys()))
   ...: vals = np.array(list(transdict.values()))
In [2]: idx = np.nonzero(keys == abc_array[:,None])[1]
   ...: out = vals[idx]
In [3]: out
Out[3]:
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], dtype='<U8')
In [4]: sort_idx = np.argsort(keys)
   ...: idx = np.searchsorted(keys, abc_array, sorter=sort_idx)
   ...: out = vals[sort_idx][idx]
In [5]: out
Out[5]:
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], dtype='<U8')
Will this do? Sometimes plain Python is a good, direct way to handle such things. The code below builds a list of translations (easily converted back to a numpy array) and prints the joined output.
import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J'])
transdict = {'A': 'Adelaide',
'B': 'Bombay',
'C': 'Cologne',
'D': 'Dresden',
'E': 'Erlangen',
'F': 'Formosa',
'G': 'Gdansk',
'H': 'Hague',
'I': 'Inchon',
'J': 'Jakarta',
'Z': 'Zambia'
}
phonetic = [transdict[letter] for letter in abc_array]
print(' '.join(phonetic))
The output from this is:
Bombay Dresden Adelaide Formosa Hague Inchon Zambia Jakarta
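And since the question starts from a NumPy array, the list converts straight back if needed:
# rebuild a NumPy array of city names from the translated list
city_array = np.array(phonetic)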
Let's suppose I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'label': ['a', 'a', 'b', 'b', 'a', 'b', 'c', 'c', 'a', 'a'],
                   'numbers': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'arbitrarydata': [False] * 10})
I want to assign a value to the arbitrarydata column according to the values in both of the other columns. A naive approach would be as follows:
for _, grp in df.groupby(['label', 'numbers']):
    grp.arbitrarydata = np.random.rand()
Naturally, this doesn't propagate changes back to df. Is there a way to modify a group such that changes are reflected in the original DataFrame ?
Try using transform, e.g.:
df['arbitrarydata'] = df.groupby(['label', 'numbers'])['arbitrarydata'].transform(lambda x: np.random.rand())
transform broadcasts the scalar returned by the lambda to every row of its group, so rows sharing a (label, numbers) pair get the same value, and the assignment writes back to the original DataFrame.
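One quick way to sanity-check that each group received a single shared value:
# every (label, numbers) group should report exactly 1 unique value
print(df.groupby(['label', 'numbers'])['arbitrarydata'].nunique())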