Below is my dataframe:
Id,ReturnCreated,ReturnTime,TS_startTime
O108808972773560,Return Not Created,nan,2018-08-23 12:30:41
O100497888936380,Return Not Created,nan,2018-08-18 14:57:20
O109648374050370,Return Not Created,nan,2018-08-16 13:50:06
O112787613729150,Return Not Created,nan,2018-08-16 13:15:26
O110938305325240,Return Not Created,nan,2018-08-22 11:03:37
O110829757146060,Return Not Created,nan,2018-08-21 16:10:37
I want to replace the nan values with blanks. I tried the code below, but it's not working.
import pandas as pd
import numpy as np
df = pd.concat({k:pd.Series(v) for k, v in ordercreated.items()}).unstack().astype(str).sort_index()
df.columns = 'ReturnCreated ReturnTime TS_startTime'.split()
df1 = df.replace(np.nan,"", regex=True)
df1.to_csv('OrderCreationdetails.csv')
Kindly help me understand where I am going wrong and how I can fix it.
You should try the DataFrame.fillna() method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
In your case:
df1 = df.fillna("")
should work I think
I think the NaNs are strings here because of .astype(str), so you need:
df1 = df.replace('nan',"")
You can either use df.fillna("") (I think that will perform better) or simply replace those values with an empty string. Since .astype(str) produces lowercase 'nan', match that:
df1 = df.replace('nan',"")
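For reference, here is a minimal sketch reproducing the issue with made-up sample data (the real frame comes from ordercreated): .astype(str) turns real NaN values into the literal string 'nan', which is why matching np.nan afterwards finds nothing.
import pandas as pd
import numpy as np

# hypothetical sample mirroring the ReturnTime column above
df = pd.DataFrame({'ReturnTime': [np.nan, np.nan],
                   'TS_startTime': ['2018-08-23 12:30:41', '2018-08-18 14:57:20']}).astype(str)
print(df['ReturnTime'].tolist())   # ['nan', 'nan'] -- strings, not real NaN
df1 = df.replace('nan', '')        # so match the string representation instead
print(df1['ReturnTime'].tolist())  # ['', '']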
I'm trying to assign an empty cell (blank/NaN) in my np.where condition in my pandas dataframe, but nothing seems to work.
The reason for this is to then run fillna/ffill on the missing values.
np.where code:
df['x'] = np.where(df['y']>0.05,1,np.nan)
Fillna code:
df['x'] = df['x'].fillna(method="ffill")
Anybody know where I'm going wrong?
This line of code works:
df['x'] = np.where(df['y']>0.05,1,np.nan)
Just remove the unneeded parentheses on the right.
I was able to fix it by using pd.NA instead, which fillna, for some reason, recognizes as missing values to fill with ffill.
Fix:
df['x'] = np.where(df['y']>0.05,1,pd.NA)
df['x'] = df['x'].fillna(method="ffill")
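For completeness, a self-contained sketch of that fix on made-up data (the column names and threshold follow the question; note that newer pandas versions prefer .ffill() over the deprecated fillna(method="ffill")):
import numpy as np
import pandas as pd

df = pd.DataFrame({'y': [0.01, 0.10, 0.02, 0.20]})  # hypothetical sample
df['x'] = np.where(df['y'] > 0.05, 1, pd.NA)        # pd.NA marks the gaps
df['x'] = df['x'].ffill()                           # forward-fill the missing entries
print(df)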
I'm looking to convert a column to lower case. The issue is there are some instances where the string within the column only contains numbers. In my real life case this is due to poor data entry. Instead of having these values converted to NaN, I would like to keep the numeric string as is. What is the best approach to achieving this?
Below is my current code and output
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
df['col'].str.lower()
Current Output
0    g5051
1    g5052
2      NaN
3    g5054
Name: col, dtype: object
Desired Output
0    g5051
1    g5052
2     5053
3    g5054
Name: col, dtype: object
Just convert the column to strings first:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).str.lower())
Pre-define the data as str when constructing the DataFrame:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']}, dtype=str)
print(df['col'].str.lower())
To add a slight variation to Tim Roberts' solution without using the .str accessor:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).apply(lambda x: x.lower()))
I have a dictionary and created a pandas DataFrame using
cars = pd.DataFrame.from_dict(cars_dict, orient='index')
and sorted the index (columns) in alphabetical order:
cars = cars.sort_index(axis=1)
After sorting I noticed the DataFrame has NaN, and I wasn't sure if they were really np.nan values.
print(cars.isnull().any()) shows False for every column.
I have tried different methods (replace, fillna) to convert those "NaN" values to zero, which is what I want to do, but none of them works.
Below is a sample of my dataframe:
        speedtest  size
toyota         65   NaN
honda          77   800
Either use replace or np.where on the values if they are strings:
df = df.replace('NaN', 0)
Or,
df[:] = np.where(df.eq('NaN'), 0, df)
Or, if they're actually NaNs (which, it seems, is unlikely), then use fillna:
df.fillna(0, inplace=True)
Or, to handle both situations at the same time, use apply + pd.to_numeric (slightly slower but guaranteed to work in any case):
df = df.apply(pd.to_numeric, errors='coerce').fillna(0, downcast='infer')
Thanks to piRSquared for this one!
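For illustration, a quick end-to-end sketch on the sample frame above, assuming the 'NaN' entries really are strings:
import pandas as pd

df = pd.DataFrame({'speedtest': [65, 77], 'size': ['NaN', 800]},
                  index=['toyota', 'honda'])
# coerce everything to numeric: the string 'NaN' becomes a real NaN,
# which fillna then replaces with 0 (downcast='infer' keeps integer dtype)
df = df.apply(pd.to_numeric, errors='coerce').fillna(0, downcast='infer')
print(df)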
cs95's answer didn't work for me here.
I had to import numpy as np and use replace with np.NaN and inplace=True:
import numpy as np
df.replace(np.NaN, 0, inplace=True)
Then all the columns got 0 instead of NaN.
I have a data frame named df1 like this:
as_id  TCGA_AF_2687  TCGA_AF_2689_Norm  TCGA_AF_2690  TCGA_AF_2691_Norm
31     1             5                  9             2
I want to select all the columns that end with "Norm". I have tried the code below:
import os
print(os.getcwd())
os.chdir('E:/task')
import pandas as pd
df1 = pd.read_table('haha.txt')
Norms = []
for s in df1.columns:
    if s.endswith('Norm'):
        Norms.append(s)
print(Norms)
but I only get a list of names. What can I do to select all the columns, including their values, rather than just the column names? I know it may be a silly question, but I am a beginner; I really need someone to help. Thank you so much for your kindness and your time.
df1[Norms] will get the actual columns from df1.
As a matter of fact, the whole code can be simplified to:
import os
import pandas as pd
os.chdir('E:/task')
df1 = pd.read_table('haha.txt')
norm_df = df1[[column for column in df1.columns if column.endswith('Norm')]]
One can also use the filter higher-order function:
newdf = df[list(filter(lambda x: x.endswith("Norm"),df.columns))]
print(newdf)
Output:
   TCGA_AF_2689_Norm  TCGA_AF_2691_Norm
0                  5                  2
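As a side note, pandas also has a built-in DataFrame.filter that does the same selection in one call; a sketch using a regex anchored to the suffix:
norm_df = df1.filter(regex='Norm$')  # keep only columns whose names end in 'Norm'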
Recently I needed to write a Python script to find out how many times a specific string occurs in an Excel sheet.
I noted that we can use xlwings.Range('A1').table.formula to achieve this task only if the cells are contiguous. If the cells are not contiguous, how can I accomplish that?
It's a little hacky, but why not.
By the way, I'm assuming you are using Python 3.x.
First we'll create a boolean dataframe marking the cells that match the value you are looking for.
import pandas as pd
import numpy as np
df = pd.read_excel('path_to_your_excel..')
b = df.applymap(lambda x: x == 'value_you_want_to_find' if isinstance(x, str) else False)
and then simply sum all occurrences:
print(np.count_nonzero(b.values))
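Side note: on pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same function:
b = df.map(lambda x: x == 'value_you_want_to_find' if isinstance(x, str) else False)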
As clarified in the comments, if you already have a dataframe, you can simply use str.count (note: there may well be a better way of doing it):
df = pd.DataFrame({'col_a': ['a'], 'col_b': ['ab'], 'col_c': ['c']})
string_to_search = '^a$' # should actually be a regex, in this example searching for 'a'
print(sum(df[col].str.count(string_to_search).sum() for col in df.columns))
>> 1
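If you only need exact cell matches rather than regex hits, a simpler sketch is a vectorized comparison over the whole frame:
print((df == 'a').to_numpy().sum())  # counts cells whose entire value equals 'a' -> 1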