I am trying to drop rows in pandas based on whether or not the cells in column "Price" contain "/". I have referred to the question: Drop rows in pandas if they contains "???".
As such, I have tried both codes:
df = df[~df["Price"].str.contains('/')]
and
df = df[~df["Price"].str.contains('/',regex=False)]
However, both codes give the error:
AttributeError: Can only use .str accessor with string values!
For reference, the first few rows of my dataframe are as follows:
Fruit Price
0 Apple 3
1 Apple 2/3
2 Banana 2
3 Orange 6/7
May I know what went wrong and how can I fix this problem? Thank you very much!
Try this:
df = df[~df['Price'].astype(str).str.contains('/')]
print(df)
Fruit Price
0 Apple 3
2 Banana 2
You need to convert the Price column to string first and then apply this operation. I believe the Price column's dtype is not string:
df['Price'] = df['Price'].astype(str)
and then try
df = df[~df["Price"].str.contains('/',regex=False)]
I want to subset a DataFrame by two columns in different dataframes if the values in the columns are the same. Here is an example of df1 and df2:
df1
A
0 apple
1 pear
2 orange
3 apple
df2
B
0 apple
1 orange
2 orange
3 pear
I would like the output to be a subsetted df1 based upon the df2 column:
A
0 apple
2 orange
I tried
df1 = df1[df1.A == df2.B]
but got the following error:
ValueError: Can only compare identically-labeled Series objects
I do not want to rename the column in either dataframe.
What is the best way to do this? Thanks
If you need to compare the index values together with the column values, create a MultiIndex from each frame and use Index.isin:
df = df1[df1.set_index('A', append=True).index.isin(df2.set_index('B', append=True).index)]
print (df)
A
0 apple
2 orange
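Here is a runnable sketch of that approach with the sample frames rebuilt from the question:
import pandas as pd

# Sample frames reconstructed from the question
df1 = pd.DataFrame({'A': ['apple', 'pear', 'orange', 'apple']})
df2 = pd.DataFrame({'B': ['apple', 'orange', 'orange', 'pear']})

# Appending the column to the index turns each row into an (index, value) pair,
# so rows match only when both the label and the fruit agree
mask = df1.set_index('A', append=True).index.isin(df2.set_index('B', append=True).index)
print(df1[mask])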
I could find some answers which should have worked, but strangely they did not. Any help would be appreciated.
I have the following dataframe:
                 value
vendor currency
2      CKE           3
       PWW           2
       LPS           1
5      PWO           4
From this df I try to select only the following desired output with this code:
                 value
vendor currency
2      CKE           3
       LPS           1
CODE:
fiat = ['CKE','LPS','ZZZ']
df = df.loc[(2, fiat)]
ERROR:
KeyError: "None of [Index(['CKE','LPS','ZZZ'], dtype='object')] are in the [columns]"
You can add : to select all columns. Without it, pandas parses the second value of the tuple as column names, which do not exist, so the error is raised:
fiat = ['CKE','LPS','ZZZ']
df = df.loc[(2, fiat), :]
print (df)
                 value
vendor currency
2      CKE           3
       LPS           1
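A minimal sketch that rebuilds the MultiIndex frame from the question and applies this selection (only labels that actually exist in the currency level are used here, since a list containing entirely missing labels such as 'ZZZ' may raise a KeyError in newer pandas versions):
import pandas as pd

# Rebuild the (vendor, currency) MultiIndex frame from the question
df = pd.DataFrame({'vendor': [2, 2, 2, 5],
                   'currency': ['CKE', 'PWW', 'LPS', 'PWO'],
                   'value': [3, 2, 1, 4]}).set_index(['vendor', 'currency'])

# The row indexer is the tuple (2, fiat); the trailing ":" selects all columns
fiat = ['CKE', 'LPS']
print(df.loc[(2, fiat), :])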
I have a .tsv file dataset, and I transformed it into a DataFrame using Pandas.
Imagine that my_tsv_file was something like:
A Apple
B Orange
C Pear
To build the DataFrame I used:
df = pandas.read_csv(my_tsv_file, sep='\t')
Now, the first row of my_tsv_file was originally part of the data, but it has been turned into the header row of the new DataFrame. So now the DataFrame is something like:
A Apple
0 B Orange
1 C Pear
As "A" and "Apple" were keys, when they actually are not. I would like to add the correct "key row", in order to obtain something like:
ID Fruit
0 A Apple
1 B Orange
2 C Pear
How can I achieve this?
I can't modify the original .tsv file.
Please keep in mind that I am at the very beginning with Python and Pandas.
Have you tried
df = pandas.read_csv(my_tsv_file, sep='\t', names=['ID', 'Fruit'])
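A small sketch showing why this works: when names= is passed, read_csv no longer infers a header from the first line, so that line stays part of the data (io.StringIO stands in for my_tsv_file here):
import io
import pandas

# io.StringIO stands in for the real .tsv file
my_tsv_file = io.StringIO("A\tApple\nB\tOrange\nC\tPear\n")

# With names= supplied, the first line is kept as data instead of becoming the header
df = pandas.read_csv(my_tsv_file, sep='\t', names=['ID', 'Fruit'])
print(df)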
I have the following df:
Item Service Damage Type Price
A Fast 3.5 1 15.48403728
A Slow 3.5 1 17.41954194
B Fast 5 1 19.3550466
B Slow 5 1 21.29055126
C Fast 5.5 1 23.22605592
and so on
I want to turn this into this format:
Item Damage Type Price_Fast Price_slow
So the first row would be:
Item Damage Type Price_Fast Price_slow
A 3.5 1 15.4840.. 17.41954...
I tried:
df.pivot(index=['Item', 'Damage', 'Type'],columns='Service', values='Price')
but it threw this error:
ValueError: Length of passed values is 2340, index implies 3
To get exactly the dataframe layout you want, use
dfData = dfRaw.pivot_table(index=['Item', 'Damage', 'Type'],columns='Service', values='Price')
as @CJR suggested, followed by
dfData.reset_index(inplace=True)
to flatten the dataframe, and
dfData.rename(columns={'Fast': 'Price_fast'}, inplace=True)
dfData.rename(columns={'Slow': 'Price_slow'}, inplace=True)
to get your desired column names.
Then use
dfData.columns = dfData.columns.values
to get rid of the custom column index label, and you are done. (Thanks to @Akaisteph7 for pointing out that I was not quite done with my previous solution.)
You can do it with the following code:
# You should use pivot_table as it handles multiple column pivoting and duplicates aggregation
df2 = df.pivot_table(index=['Item', 'Damage', 'Type'], columns='Service', values='Price')
# Make the pivot indexes back into columns
df2.reset_index(inplace=True)
# Change the columns' names
df2.rename(columns=lambda x: "Price_"+x if x in ["Fast", "Slow"] else x, inplace=True)
# Remove the unneeded column Index name
df2.columns = df2.columns.values
print(df2)
Output:
Item Damage Type Price_Fast Price_Slow
0 A 3.5 1 15.484037 17.419542
1 B 5.0 1 19.355047 21.290551
2 C 5.5 1 23.226056 NaN
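For what it's worth, the same steps can also be written as one chain; this is just a sketch that assumes the pivoted columns are exactly 'Fast' and 'Slow', with a few sample rows reconstructed from the question:
import pandas as pd

# A few sample rows reconstructed from the question
df = pd.DataFrame({'Item': ['A', 'A', 'B', 'B', 'C'],
                   'Service': ['Fast', 'Slow', 'Fast', 'Slow', 'Fast'],
                   'Damage': [3.5, 3.5, 5.0, 5.0, 5.5],
                   'Type': [1, 1, 1, 1, 1],
                   'Price': [15.48, 17.42, 19.36, 21.29, 23.23]})

df2 = (df.pivot_table(index=['Item', 'Damage', 'Type'], columns='Service', values='Price')
         .add_prefix('Price_')        # Fast/Slow -> Price_Fast/Price_Slow
         .rename_axis(None, axis=1)   # drop the leftover 'Service' columns name
         .reset_index())              # turn the pivot index back into columns
print(df2)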
After using transpose on a dataframe there is always an extra row left over from the initial dataframe's index. For example:
import pandas as pd
df = pd.DataFrame({'fruit':['apple','banana'],'number':[3,5]})
df
fruit number
0 apple 3
1 banana 5
df.transpose()
0 1
fruit apple banana
number 3 5
Even when I have no index:
df.reset_index(drop = True, inplace = True)
df
fruit number
0 apple 3
1 banana 5
df.transpose()
0 1
fruit apple banana
number 3 5
The problem is that when I save the dataframe to a csv file by:
df.to_csv(f)
this extra row stays at the top and I have to remove it manually every time.
Also this doesn't work:
df.to_csv(f, index = None)
because the old index is no longer considered an index (just another row...).
It also happened when I transposed the other way around and I got an extra column which I could not remove.
Any tips?
I had the same problem; I solved it by setting the index before doing the transpose, i.e. df.set_index('fruit').transpose():
import pandas as pd
df = pd.DataFrame({'fruit':['apple','banana'],'number':[3,5]})
df
fruit number
0 apple 3
1 banana 5
And df.set_index('fruit').transpose() gives:
fruit apple banana
number 3 5
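For completeness, a small end-to-end sketch of that approach when writing the file (the output filename is just a placeholder):
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana'], 'number': [3, 5]})

# Making 'fruit' the index before transposing means the fruit names, not the
# leftover 0 and 1, end up in the header row of the written file
df.set_index('fruit').transpose().to_csv('out.csv')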
Instead of removing the extra index, why not try setting the new column labels that you want and then use slicing?
step 1: Set the new column labels you want:
df.columns = df.iloc[0]
step 2: Create a new dataframe removing the extra row.
df_new = df[1:]
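A small sketch of those two steps applied to the transposed frame from the question (variable names are just illustrative):
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana'], 'number': [3, 5]}).transpose()

# step 1: promote the first row (apple, banana) to column labels
df.columns = df.iloc[0]

# step 2: drop that first row, keeping only the remaining data
df_new = df[1:]
print(df_new)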