I am trying to convert df so that I just get its length in a new dataframe.
That part works, but the result does not have a header.
How do I add a header to this length?
df = df.append(df_temp, ignore_index=True, sort=True)
df = len(df)
When I print df I get the number of records but no header. How can I add a header to this?
If you want your df to have the column name and the length, then you should try something like:
labels = {}
for column in temp_df.columns:
    # non-null row count for each column
    labels[column] = len(temp_df[column].dropna())
print(labels)
Here labels is a dictionary with the column name as the key and the number of non-null rows as the value.
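If all you need is the per-column non-null counts with the column names attached, pandas' built-in count() gives the same result; a minimal sketch, assuming temp_df is your combined frame:

counts = temp_df.count()                    # Series: index = column names, values = non-null counts
counts_df = counts.to_frame(name='length')  # one-column DataFrame with a header
print(counts_df)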
I have a database with two columns of unique numbers. This is my reference dataframe (df_reference). From another dataframe (df_data) I want to get the rows whose column values exist in this reference dataframe. I tried things like:
df_new = df_data[df_data['ID'].isin(df_reference)]
However, this way I don't get any results. What am I doing wrong here?
From what I see, you are passing the whole dataframe to the .isin() method.
Try:
df_new = df_data[df_data['ID'].isin(df_reference['ID'])]
Alternatively, convert the ID column to the index of the df_data data frame. Then you could do
df_data = df_data.set_index('ID')
matching_index = df_reference['ID']
df_new = df_data.loc[matching_index, :]
This should solve the issue (note that .loc raises a KeyError if any ID from the reference is missing from df_data).
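A small worked example of the isin() approach, using made-up data purely for illustration:

import pandas as pd

df_reference = pd.DataFrame({'ID': [1, 2, 3]})
df_data = pd.DataFrame({'ID': [1, 2, 4, 5], 'value': ['a', 'b', 'c', 'd']})

# keep only the rows whose ID appears in the reference frame
df_new = df_data[df_data['ID'].isin(df_reference['ID'])]
print(df_new)
#    ID value
# 0   1     a
# 1   2     b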
I have a dataframe which is loaded from a .csv, and I would like to remove some text from the column labels.
Right now, my dataframe saves the labels as output.text.user.12, output.text.user.1224, ...
I would like to remove the "output.text.user." part from those labels.
       output.text.user.12  ...  output.text.user.23424
index                       ...
332    0.06924              ...  0.0
Does anyone know how could I do it?
I've seen how to replace the current names from a dictionary, but the dataframe has too many columns to do that by hand.
DataFrame.rename is what you want. Assuming your dataframe is df
df = df.rename(columns=lambda x: x.replace('output.text.user.', ''))
Consider the following snippet:
import pandas as pd
# your dataframe
df = pd.DataFrame()
# loop over columns, split by dot (.) and select last item in resulting list
new_columns = []
for column in df.columns:
    new_columns.append(column.split('.')[-1])
# assign new column names to your dataframe by overwriting the old ones
df.columns = new_columns
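As a shorter alternative (not part of the snippet above, just a sketch), the column Index has a string accessor that can strip the prefix directly:

# remove the literal prefix from every column label
df.columns = df.columns.str.replace('output.text.user.', '', regex=False)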
In my data frame, I have a column called Requester. When I try to get the data of the Requester column, I get another column's data along with it. How do I remove the extra column?
This is the data frame I have, and the code I'm trying is
names = pd.DataFrame(report['Requester'])
and this is the output I'm getting.
I want to remove the SLA resol column from the dataframe.
The problem is that the first column was converted to the index.
So a possible solution is to first add DataFrame.reset_index for a default index, and then select a one-column DataFrame with [[]] and the column name:
report = report.reset_index()
names = report[['Requester']]
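The double brackets matter here: report['Requester'] returns a Series, while report[['Requester']] returns a one-column DataFrame. A minimal sketch with made-up data:

import pandas as pd

report = pd.DataFrame({'Requester': ['alice', 'bob'], 'SLA resol': [4, 8]})
names_series = report['Requester']    # pandas Series
names_df = report[['Requester']]      # one-column DataFrame with only Requester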
Another idea is to create the DataFrame with a default index when reading, by not letting read_csv or read_excel use the first column as the index (index_col=None):
df = pd.read_csv(file, index_col=None)
df = pd.read_excel(file1, index_col=None)
Last, if you need to write the DataFrame without the default index:
df.to_csv(file2, index=False)
df.to_excel(file12, index=False)
I've read a csv file into a pandas data frame, df, of 84 rows. There are n (6 in this example) values in a column that I want to use as keys in a dictionary, data, to convert to a data frame df_data. Column names in df_data come from the columns in df.
I can do most of this successfully, but I'm not getting the actual data into the dataframe. I suspect the problem is in my loop creating the dictionary, but can't figure out what's wrong.
I've tried subsetting df[cols], taking it out of a list, etc.
data = {}
cols = [x for x in df.columns if x not in drops]  # drops is a list of unneeded columns
for uni in unique_sscs:  # unique_sscs is a list of the values to use as the index
    for col in cols:
        data[uni] = [df[cols]]
df_data = pd.DataFrame(data, index=unique_sscs, columns=cols)
Here's my result (they didn't paste, but all values show as NaN in Jupyter):
lab_anl_method_name analysis_date test_type result_type_code result_unit lab_name sample_date work_order sample_id
1904050740
1904050820
1904050825
1904050830
1904050840
1904050845
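For what it's worth, here is one way the loop could be restructured so each key maps to that row's values (a sketch, assuming df, drops and unique_sscs are as above, and that the unique values live in a column hypothetically called 'ssc'):

cols = [x for x in df.columns if x not in drops]
data = {}
for uni in unique_sscs:
    # kept columns of the (first) row whose hypothetical 'ssc' value matches this key
    data[uni] = df.loc[df['ssc'] == uni, cols].iloc[0]
df_data = pd.DataFrame.from_dict(data, orient='index')[cols]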
I recently started working with pandas dataframes.
I have a list of dataframes called 'arr'.
Edit: All the dataframes in 'arr' have same columns but different data.
Also, I have an empty dataframe 'ndf' which I need to fill in using the above list.
How do I iterate through 'arr' to fill in the max values of each column from 'arr' as a row in 'ndf'?
So, we'll have
Number of rows in ndf = Number of elements in arr
I'm looking for something like this:
columns=['time','Open','High','Low','Close']
ndf=DataFrame(columns=columns)
ndf['High']=arr[i].max(axis=0)
Based on your description, I assume a basic example of your data looks something like this:
import pandas as pd
data =[{'time':'2013-09-01','open':249,'high':254,'low':249,'close':250},
{'time':'2013-09-02','open':249,'high':256,'low':248,'close':250}]
data2 =[{'time':'2013-09-01','open':251,'high':253,'low':248,'close':250},
{'time':'2013-09-02','open':245,'high':251,'low':243,'close':247}]
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
arr = [df, df2]
If that's the case, then you can simply iterate over the list of dataframes (via enumerate()) and the columns of each dataframe (via iteritems(), see http://pandas.pydata.org/pandas-docs/stable/basics.html#iteritems), populating each new row via a dictionary comprehension (see Create a dictionary with list comprehension in Python):
ndf = pd.DataFrame(columns=df.columns)
for i, df in enumerate(arr):
    # the column-wise max of this frame becomes row i of ndf
    row = pd.DataFrame(data={colName: max(colData) for colName, colData in df.iteritems()}, index=[i])
    ndf = ndf.append(row)
If some of your dataframes have any additional columns, the resulting dataframe ndf will have NaN entries in the relevant places.
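As a side note, DataFrame.append and iteritems have been removed in newer pandas releases; a roughly equivalent sketch with concat and DataFrame.max (my own variant, not part of the original answer):

import pandas as pd

# each frame's column-wise max becomes one row of ndf
ndf = pd.concat([frame.max() for frame in arr], axis=1).T
ndf.index = range(len(arr))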