I want to update multiple rows and columns in a CSV file using pandas.
I've tried the iterrows() method, but it only works on a single column.
Here is the logic I want to apply to multiple rows and columns:
if value < mean:
    value += std_dev
else:
    value -= std_dev
Here is another way of doing it.
Suppose your data looks like this:
price strings value
0 1 A a
1 2 B b
2 3 C c
3 4 D d
4 5 E f
Now let's make the strings column the index:
df.set_index('strings', inplace=True)
#Result
price value
strings
A 1 a
B 2 b
C 3 c
D 4 d
E 5 f
Now set the values in rows C, D and E to 0:
df.loc[['C', 'D','E']] = 0
#Result
price value
strings
A 1 a
B 2 b
C 0 0
D 0 0
E 0 0
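Or you can be more precise: starting from the original frame (with strings still a regular column), select the matching rows with a boolean mask and every column except strings: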
df.loc[df.strings.isin(["C", "D", "E"]), df.columns.difference(["strings"])] = 0
df
Out[82]:
price strings value
0 1 A a
1 2 B b
2 0 C 0
3 0 D 0
4 0 E 0
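The conditional rule from the question (add the standard deviation when a value is below the mean, subtract it otherwise) can also be applied to several columns at once without iterrows(). The following is only a minimal sketch: the sample data and the column names x and y are made up, and it assumes every selected column is numeric and that the mean and standard deviation are taken per column.

```python
import pandas as pd

# Made-up sample data; the column names "x" and "y" are assumptions.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 10.0], "y": [5.0, 5.0, 8.0, 2.0]})
cols = ["x", "y"]  # the columns the rule should be applied to

mean = df[cols].mean()  # one mean per column
std = df[cols].std()    # one (sample) standard deviation per column

# True where a value lies below the mean of its own column.
below = df[cols].lt(mean)

# Add std_dev where the value is below the mean, subtract it everywhere else.
updated = df.copy()
updated[cols] = (df[cols] + std).where(below, df[cols] - std)
print(updated)
```

Because the comparison and the arithmetic are aligned on column labels, the same lines work for any number of columns in cols. The result could then be written back to disk with DataFrame.to_csv.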
Let's assume I have the following data frame.
Id Combinations
1 (A,B)
2 (C,)
3 (A,D)
4 (D,E,F)
5 (F)
I would like to keep only the Combinations values that contain more than one element in the set, and I would also like to count how often each element occurs in the Combinations column as a whole. For example, IDs 2 and 5 should be removed since their sets contain only one value.
The result I am looking for is:
ID Combination Frequency
1 A 2
1 B 1
3 A 2
3 D 2
4 D 2
4 E 1
4 F 2
Can anyone help to get the above result in Python pandas?
First, if necessary, convert the values to lists:
df['Combinations'] = df['Combinations'].str.strip('(,)').str.split(',')
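To see what this conversion produces, here is a small check on the sample values from the question. It assumes the Combinations column holds plain strings such as "(A,B)" rather than real tuples; if the column already contains tuples or lists, this step is not needed.

```python
import pandas as pd

# Sample values from the question, assumed to be stored as strings.
s = pd.Series(["(A,B)", "(C,)", "(A,D)", "(D,E,F)", "(F)"])

# str.strip('(,)') removes any of the characters "(", "," and ")" from both
# ends of each string, so the trailing comma in "(C,)" disappears as well.
lists = s.str.strip("(,)").str.split(",")
print(lists.tolist())

# Series.str.len also works on lists and returns the number of elements.
lengths = lists.str.len()
print(lengths.tolist())
```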
If you need the counts after filtering, first remove the single-value rows with Series.str.len in boolean indexing, then use DataFrame.explode and count the values with Series.map and Series.value_counts:
df1 = df[df['Combinations'].str.len().gt(1)].explode('Combinations')
df1['Frequency'] = df1['Combinations'].map(df1['Combinations'].value_counts())
print (df1)
Id Combinations Frequency
0 1 A 2
0 1 B 1
2 3 A 2
2 3 D 2
3 4 D 2
3 4 E 1
3 4 F 1
Or, if you need the counts before removing them, filter the single-value rows out in the last step with Series.duplicated:
df2 = df.explode('Combinations')
df2['Frequency'] = df2['Combinations'].map(df2['Combinations'].value_counts())
df2 = df2[df2['Id'].duplicated(keep=False)]
Alternative:
df2 = df2[df2.groupby('Id').Id.transform('size') > 1]
Or:
df2 = df2[df2['Id'].map(df2['Id'].value_counts()) > 1]
print (df2)
Id Combinations Frequency
0 1 A 2
0 1 B 1
2 3 A 2
2 3 D 2
3 4 D 2
3 4 E 1
3 4 F 2
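The two variants differ only in whether the single-value rows are dropped before or after counting. As a rough end-to-end sketch, they can be wrapped in one function; the helper name explode_with_frequency and its count_before_filter flag are hypothetical, and the input is assumed to hold the combinations as strings like in the question.

```python
import pandas as pd

def explode_with_frequency(df, count_before_filter=True):
    """Hypothetical helper: one row per element plus its frequency."""
    out = df.copy()
    out["Combinations"] = out["Combinations"].str.strip("(,)").str.split(",")
    multi = out["Combinations"].str.len().gt(1)  # rows with more than one element
    if not count_before_filter:
        out = out[multi]  # drop single-value rows before counting
    out = out.explode("Combinations")
    counts = out["Combinations"].value_counts()
    out["Frequency"] = out["Combinations"].map(counts)
    if count_before_filter:
        # After exploding, an Id that occurs only once had a single value.
        out = out[out["Id"].duplicated(keep=False)]
    return out.reset_index(drop=True)

df = pd.DataFrame({"Id": [1, 2, 3, 4, 5],
                   "Combinations": ["(A,B)", "(C,)", "(A,D)", "(D,E,F)", "(F)"]})
before = explode_with_frequency(df, count_before_filter=True)
after = explode_with_frequency(df, count_before_filter=False)
print(before)
print(after)
```

With the sample data, the two results differ only in the frequency of F, because the F in row 5 is counted only when counting happens before the filter.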
I have a dataframe with columns of dtype=object, i.e. categorical variables, and I'd like to have the counts of each level for each of them. I'd like the result to be a pretty summary of all categorical variables.
To achieve the aforementioned goals, I tried the following:
(line 1) grab the names of all object-type variables
(line 2) count the number of observations for each level (a, b of v1)
(line 3) rename the column so it reads "count"
stringCol = list(df.select_dtypes(include=['object'])) # list object of categorical variables
a = df.groupby(stringCol[0]).agg({stringCol[0]: 'count'})
a = a.rename(index=str, columns={stringCol[0]: 'count'}); a
count
v1
a 1279
b 2382
I'm not sure how to elegantly get the following result where all string column counts are printed. Like so (only v1 and v4 shown, but should be able to print such results for a variable number of columns):
count count
v1 v4
a 1279 l 32
b 2382 u 3055
y 549
The way I can think of doing it is:
select one element of stringCol
calculate the count for each group of the column.
store the result in a Pandas dataframe.
store the Pandas dataframe in an object (list?)
repeat
if last element of stringCol is done, break.
but there must be a better way than that, just not sure how to do it.
I think the simplest way is to use a loop:
import pandas as pd
df = pd.DataFrame({'A':list('abaaee'),
                   'B':list('abbccf'),
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aacbbb')})
print (df)
A B C D E F
0 a a 7 1 5 a
1 b b 8 3 3 a
2 a b 9 5 6 c
3 a c 4 7 9 b
4 e c 2 1 2 b
5 e f 3 0 4 b
stringCol = list(df.select_dtypes(include=['object']))
for c in stringCol:
    a = df[c].value_counts().rename_axis(c).to_frame('count')
    # alternative
    # a = df.groupby(c)[c].count().to_frame('count')
    print (a)
count
A
a 3
e 2
b 1
count
B
b 2
c 2
a 1
f 1
count
F
b 3
a 2
c 1
For a list of DataFrames, use a list comprehension:
dfs = [df[c].value_counts().rename_axis(c).to_frame('count') for c in stringCol]
print (dfs)
[ count
A
a 3
e 2
b 1, count
B
b 2
c 2
a 1
f 1, count
F
b 3
a 2
c 1]
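If a single summary object is preferred over a list, the per-column counts can also be stacked into one frame with pd.concat. This is only a sketch of one possible layout (a two-level index instead of the side-by-side table from the question). The level names "column" and "level" are made up, and the categorical columns are picked here by testing for non-numeric dtypes, which is a swapped-in assumption rather than the select_dtypes call used above.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abaaee'),
                   'B': list('abbccf'),
                   'C': [7, 8, 9, 4, 2, 3],
                   'F': list('aacbbb')})

# Assumption: every non-numeric column is treated as categorical.
string_cols = [c for c in df.columns
               if not pd.api.types.is_numeric_dtype(df[c])]

# One value_counts Series per column, stacked under the column name.
summary = pd.concat(
    {c: df[c].value_counts() for c in string_cols},
    names=["column", "level"],  # made-up names for the two index levels
).to_frame("count")
print(summary)
```

A single column's counts can then be pulled out again with summary.loc["A"].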
I want to select the rows in a dataframe which have zero in every column in a list of columns, e.g. for this df:
In:
df = pd.DataFrame([[1,2,3,6], [2,4,6,8], [0,0,3,4],[1,0,3,4],[0,0,0,0]],columns =['a','b','c','d'])
df
Out:
a b c d
0 1 2 3 6
1 2 4 6 8
2 0 0 3 4
3 1 0 3 4
4 0 0 0 0
Then:
In:
mylist = ['a','b']
selection = df.loc[df['mylist']==0]
selection
I would like to see:
Out:
a b c d
2 0 0 3 4
4 0 0 0 0
Should be simple but I'm having a slow day!
You'll need to determine whether all columns of a row have zeros or not. Given a boolean mask, use DataFrame.all(axis=1) to do that.
df[df[mylist].eq(0).all(axis=1)]
a b c d
2 0 0 3 4
4 0 0 0 0
Note that if you wanted to find rows with zeros in every column, remove the subsetting step:
df[df.eq(0).all(axis=1)]
a b c d
4 0 0 0 0
Using reduce and NumPy's logical_and
The point of this is to avoid creating new pandas objects and simply build the mask we are looking for from the data where it sits.
from functools import reduce
import numpy as np
df[reduce(np.logical_and, (df[c].values == 0 for c in mylist))]
a b c d
2 0 0 3 4
4 0 0 0 0
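As a quick sanity check, both answers can be run side by side on the frame from the question to confirm that they build the same mask. This is only an illustrative sketch; the variable names mask_pandas and mask_reduce are made up.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 6], [2, 4, 6, 8], [0, 0, 3, 4],
                   [1, 0, 3, 4], [0, 0, 0, 0]],
                  columns=['a', 'b', 'c', 'd'])
mylist = ['a', 'b']

# pandas version: compare the subset with 0, then require True across each row.
mask_pandas = df[mylist].eq(0).all(axis=1)

# NumPy version: AND together one boolean array per column.
mask_reduce = reduce(np.logical_and, (df[c].values == 0 for c in mylist))

selection = df[mask_reduce]
print(selection)
```

The pandas mask is a boolean Series carrying the index, while the reduce version is a plain NumPy array; both can be used to index df.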
I have seen similar questions, but nothing that really matches my problem. If I have a table of values such as:
value
a
b
b
c
I want to use pandas to add in columns to the table to show for example:
value a b
a 1 0
b 0 1
c 0 0
I have tried the following:
df['a'] = 0
def string_count(indicator):
    if indicator == 'a':
        df['a'] == 1
df['a'].apply(string_count)
But this produces:
0 None
1 None
2 None
3 None
I would like to at least get to the point where the choices are hardcoded (i.e. I already know that a, b and c appear), but it would be even better if I could look at the column of strings and then insert a column for each unique string.
Am I approaching this the wrong way?
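Use pd.get_dummies (passing dtype=int keeps the 0/1 output shown below; recent pandas versions return booleans by default):
dummies = pd.get_dummies(df.value, dtype=int)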
a b c
0 1 0 0
1 0 1 0
2 0 1 0
3 0 0 1
If you only want to display unique occurrences, you can add:
dummies.index = df.value
dummies.drop_duplicates()
a b c
value
a 1 0 0
b 0 1 0
c 0 0 1
Alternatively:
df = df.join(pd.get_dummies(df.value, dtype=int))
value a b c
0 a 1 0 0
1 b 0 1 0
2 b 0 1 0
3 c 0 0 1
Here you could again call .drop_duplicates() to see only the unique entries from the value column.
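For reference, the attempt in the question returns None for every row because string_count has no return statement, and df['a'] == 1 is a comparison rather than an assignment. The sketch below shows a hardcoded variant next to the get_dummies variant and trims the result to the table the question asks for. The names hardcoded, joined and wanted are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": ["a", "b", "b", "c"]})

# Hardcoded variant: one vectorized comparison per known level.
hardcoded = df.assign(a=(df["value"] == "a").astype(int),
                      b=(df["value"] == "b").astype(int))

# General variant: one indicator column per unique string.
dummies = pd.get_dummies(df["value"], dtype=int)
joined = df.join(dummies)

# Keep one row per unique value and only the columns shown in the question.
wanted = joined.drop_duplicates().reset_index(drop=True)[["value", "a", "b"]]
print(wanted)
```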
I have a pandas dataframe like the following:
A B C
1 2 1
3 4 0
5 2 0
5 3 1
I would like to get the value from A if the value of C is 1, and the value from B if C is 0. How would I do this? Ultimately, I'd like to end up with a vector holding the values of A where C is 1 and B where C is 0, which would be [1,4,2,5].
Assuming you mean "from A if the value of C is 1 and from B if the value of C is 0", which makes sense given your intended output, I might use Series.where:
>>> df
A B C
0 1 2 1
1 3 4 0
2 5 2 0
3 5 3 1
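>>> df.A.where(df.C == 1, df.B)
0 1
1 4
2 2
3 5
dtype: int64
which is read "make a series using values of A if the corresponding value of C is true, otherwise use the corresponding value of B". The condition has to be a boolean Series: older pandas versions accepted df.C directly because 1 is truthy, but recent versions raise an error for an integer condition, so spell it out as df.C == 1 (or df.C.astype(bool)). Any other boolean Series, such as df.C*5+3 < 4, works the same way.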
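The same selection can be written as a self-contained snippet, shown here together with numpy.where as an alternative that returns a plain array instead of a Series. This is a minimal sketch on the data from the question; the variable names picked and picked_np are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 5], "B": [2, 4, 2, 3], "C": [1, 0, 0, 1]})

cond = df["C"].eq(1)  # explicit boolean condition

# Series.where keeps A where cond is True and takes B elsewhere.
picked = df["A"].where(cond, df["B"])

# numpy.where does the same element-wise choice and returns an ndarray.
picked_np = np.where(cond, df["A"], df["B"])

print(picked.tolist())
print(picked_np.tolist())
```

Series.where keeps the original index, which helps when the result is assigned back as a new column; np.where is convenient when only the raw vector is needed.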