I have a DataFrame df with various rows and columns. I want to add a column named df_index to df that numbers all the records. For example, the df_index value for the 1st record should be df_1, for the 2nd record it should be df_2, and so on up to the last record in df.
Note that each value in df_index should have the format df_{number}.
df['df_index'] = 'df_' + pd.Series(range(1, len(df)+1), index=df.index).astype(str)
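To see why the answer passes index=df.index, here is a minimal sketch on made-up data (the value column and the a/b/c labels are assumptions for illustration). Without that argument the counter would carry the labels 0..n-1 and would not align with a DataFrame whose index is something else.

```python
import pandas as pd

# Hypothetical sample data with a non-default index
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Build a 1-based counter that shares df's row labels, then prefix it with "df_"
counter = pd.Series(range(1, len(df) + 1), index=df.index)
df["df_index"] = "df_" + counter.astype(str)

print(df)
```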
Related
I am trying to store just the length of df in a new dataframe.
The code below is what I do, but the result does not have a header.
How do I add a header to this length?
df = pd.concat([df, df_temp], ignore_index=True, sort=True)  # DataFrame.append was removed in pandas 2.0
df = len(df)
When I print df I get the number of records but no header. How can I add a header to this?
If you want both the column names and their lengths, you could try something like:
labels = {}
for column in temp_df.columns:
    labels[column] = len(temp_df[column].dropna())
print(labels)
Here labels is a dictionary with each column name as a key and the number of non-null values in that column as the value.
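If the goal is simply the row count with a header, one possible sketch is to wrap it in a one-row DataFrame. The column name row_count and the sample data are assumptions for illustration. The per-column non-null counts from the loop above can also be obtained with DataFrame.count().

```python
import pandas as pd

# Hypothetical sample data; column "b" has one missing value
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

# One-row DataFrame whose single column ("row_count", an assumed name) is the header
length_df = pd.DataFrame({"row_count": [len(df)]})

# Non-null count per column, returned as a Series labelled by column name
non_null = df.count()

print(length_df)
print(non_null)
```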
I would like to set, for each row, the value of ART_IN_TICKET to the number of rows that have the same TICKET_ID as that row.
For example, for the first 5 rows of this dataframe, TICKET_ID is 35592159 and ART_IN_TICKET should be 5, since there are 5 rows with that same TICKET_ID.
There can be other solutions as well. A relatively simple one is to count the rows for each TICKET_ID and then merge that result back into the original dataframe to fill ART_IN_TICKET. This assumes the dataframe above is in df.
count_df = df.groupby("TICKET_ID").size().reset_index(name="ART_IN_TICKET")  # rows per TICKET_ID; size() also counts rows where ART_IN_TICKET is NaN
df = df.drop(columns="ART_IN_TICKET")  # Removing ART_IN_TICKET column before merging (keeps column order)
final_df = df.merge(count_df, on="TICKET_ID")
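As one of the other possible solutions, groupby with transform avoids the merge altogether by broadcasting each group's size back onto its rows. This is a sketch on made-up data; the ARTICLE column and the second ticket id are assumptions.

```python
import pandas as pd

# Hypothetical sample data: 5 rows for one ticket, 2 rows for another
df = pd.DataFrame({
    "TICKET_ID": [35592159] * 5 + [35592160] * 2,
    "ARTICLE": list("abcdefg"),
})

# transform("size") returns one value per original row: the row count of that row's group
df["ART_IN_TICKET"] = df.groupby("TICKET_ID")["TICKET_ID"].transform("size")

print(df)
```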
I have a pandas dataframe that has the same title in multiple columns, so when I print df.columns I get title.1, title.2, title.3, etc.
What I'm trying to do is take the column with the highest-numbered title, rename it to remove the number, and put it into another dataframe.
For example, I have a dataframe:
df = pd.read_excel(excel_file_path_from_db, engine='openpyxl', sheet_name='Sheet1', skiprows=1)
In my case the most recent values of title will be in the last 12 columns of this dataframe. I get those with:
df2 = df.iloc[:, -12:]
My problem is that df2 will contain the highest-numbered title, for example title.7. How do I remove the .X from title in df2?
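One way this might be done is to strip a trailing ".<digits>" from every column name with a regular expression. The sketch below uses made-up column names (price.7 and date are assumptions), and it would also rewrite any other column that happens to end in a dot followed by digits.

```python
import pandas as pd

# Hypothetical stand-in for df2 = df.iloc[:, -12:]
df2 = pd.DataFrame([[1, 2, 3]], columns=["title.7", "price.7", "date"])

# Drop a trailing ".<number>" suffix from each column name; names without one are unchanged
df2.columns = df2.columns.str.replace(r"\.\d+$", "", regex=True)

print(df2.columns.tolist())
```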
Each data value is unique, while each id is repeated multiple times in an Excel file. Data is column 1 and the ids are column 2. I would like to group the unique data values under their id without losing any, then use the id as the column header and paste the associated data values below it. Then do the same thing with the second id and paste that id's values below it, 1 cell to the left of the first id column. Could anyone help me arrange the data in this layout?
You can't have varying length columns in a dataframe. So NaNs are unavoidable.
import pandas as pd
df = pd.DataFrame({'col1':[2,3,3,4,2,1,3,4], 'col2':[1,1,1,1,2,2,2,3]})
# First problem
df2 = df.pivot(columns='col2')["col1"]
df2 = df2.apply(lambda x: pd.Series(x.dropna().values))
print(df2)
# Second problem
def concat(s):
    return s.tolist()
df3 = df.groupby('col2').agg(concat)["col1"].apply(pd.Series)
print(df3)
I have a large dataframe df1 with many data columns, two of which are dates and colNum. I have built a second dataframe df2 which spans the date range and colNum of df1. I now want to fill df2 with a third column (any of the many other data columns) of df1 which meet the criteria of dates and colNum from df1 that match dateIndex and colNum of df2.
I've tried various incarnations of MERGE with no success.
I can loop through the combinations, but df1 is very large (270k, 2k), so it takes forever to fill one df2 from one of df1's columns, let alone all of them.
Slow looping version
dataList = ['revt']
for i in dataList:
    goodRows = df1.index[~np.isnan(df1[i])].tolist()
    for j in goodRows:
        df2.loc[df1['dates'][j], str(df1['colNum'][j])] = df1[i][j]
Input
Desired Output
Convert the index to a column, e.g.:
df1 = df1.reset_index()  # per your description, the date seems to be in the index; reset_index returns a new frame
df2 = df2.reset_index()
df2 = pd.merge(df2, df1, on=['dateIndex', 'colNum'], how='left')  # use "left" or "inner" as suits you; both frames need these column names
Update
Alternatively, you can keep the date in the index: pd.merge also has options to join on the index (left_index and right_index).
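A minimal sketch of the index-based merge, assuming both frames can be keyed by a (dates, colNum) MultiIndex. The sample values and the long layout chosen for df2 are assumptions for illustration, not the asker's actual data.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=2)

# Hypothetical df1: one row per (dates, colNum) observation, with one data column
df1 = pd.DataFrame({
    "dates": dates[[0, 1, 1]],
    "colNum": [1, 1, 2],
    "revt": [10.0, 11.0, 20.0],
})

# Hypothetical df2: every (dates, colNum) combination, no data yet
df2 = pd.DataFrame(
    index=pd.MultiIndex.from_product([dates, [1, 2]], names=["dates", "colNum"])
)

# Move df1's keys into its index, then left-join index to index
df1_keyed = df1.set_index(["dates", "colNum"])
filled = df2.merge(df1_keyed, left_index=True, right_index=True, how="left")

print(filled)
```

Combinations that are missing from df1 come out as NaN, and every data column of df1 is filled in the one merge, so no per-row loop is needed.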