How to insert values from a column of df into rows of df2? - python

I have a dataframe with the columns route_id, equipment_id, postition.
I need to create a new dataframe whose index is route_id, whose column names come from postition, and whose values are equipment_id.
I tried to create a new dataframe while retaining the data:
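A reshape like this is usually a pivot. The following is only a minimal sketch with made-up sample data (the values are illustrative, not from the question). It assumes each (route_id, postition) pair occurs at most once, because DataFrame.pivot raises an error on duplicate pairs.

```python
import pandas as pd

# Illustrative data only; the real frame comes from the asker's source.
df = pd.DataFrame({
    "route_id": [1, 1, 2, 2],
    "postition": [1, 2, 1, 2],
    "equipment_id": ["A", "B", "C", "D"],
})

# One row per route_id, one column per postition, cells hold equipment_id.
df2 = df.pivot(index="route_id", columns="postition", values="equipment_id")
print(df2)
```

If a pair can repeat, pivot_table with an explicit aggfunc (for example aggfunc="first") is a possible alternative, depending on which duplicate should win.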

Related

Pandas, making a dataframe based on the length of another dataframe

I am trying to get the length of df into a new dataframe.
I can do that, but the resulting dataframe does not have a header.
How do I add a header to this length?
df = df.append(df_temp, ignore_index=True, sort=True)
df = len(df)
When I print df I get the number of records but no header. How can I add a header to this?
If you want your df to have the column name and the length, then you should try something like:
labels = {}
for column in temp_df.columns:
    labels[column] = len(temp_df[column].dropna())
print(labels)
Here labels would be a dictionary with the column name as a key and the number of non-null rows in that column as a value.
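If the goal is an actual dataframe with a header rather than a dictionary, something along these lines might work. It is a sketch on made-up data, and the names "length" and "non_null_count" are arbitrary choices, not from the question.

```python
import pandas as pd

# Illustrative frame with one missing value in column "a".
df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})

# len(df) is a plain int, so wrap it in a one-row frame to give it a header.
length_df = pd.DataFrame({"length": [len(df)]})

# Per-column non-null counts, equivalent to the loop above, as a frame.
counts_df = df.count().rename("non_null_count").to_frame()

print(length_df)
print(counts_df)
```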

Pandas DataFrame with multiple repeating columns

I have a pandas dataframe that has the same title in multiple columns, so when I print df.columns I get title.1, title.2, title.3, etc.
What I'm trying to do is take the column with the highest-numbered title, rename it to remove the number, and put it into another dataframe.
For example, I have a dataframe:
df = pd.read_excel(excel_file_path_from_db, engine='openpyxl', sheet_name='Sheet1', skiprows=1)
In my case the most recent values of title will be in the last 12 columns of this dataframe. I get those with:
df2 = df.iloc[:, -12:]
My problem is that df2 will have the highest-numbered title, for example title.7. How do I remove the .X from title in df2?
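One possible approach is to strip a trailing ".<digits>" from every column label with a regular expression. This is a sketch on a made-up df2; the column names other than title.7 are assumptions for illustration.

```python
import pandas as pd

# Stand-in for the sliced frame; only "title.7" comes from the question.
df2 = pd.DataFrame([[1, 2, 3]], columns=["title.7", "price.7", "name"])

# Remove a trailing ".<digits>" suffix; labels without one are left alone.
df2.columns = df2.columns.str.replace(r"\.\d+$", "", regex=True)

print(list(df2.columns))
```

Note that if two columns in df2 differ only by their suffix, they would end up with the same name after this step.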

Pandas column1 values, column2 names, is there a way to group and rearrange the data so that column2 becomes row header

Each data value is unique, while each id is repeated multiple times in an Excel file. The data values are in column 1 and the ids are in column 2. I would like to group the data values by id without losing any, then set each id as a column header and list its data values below it, with the second id's values in the column next to the first id's column, and so on. Could anyone help me get this layout?
You can't have columns of varying length in a dataframe, so NaNs are unavoidable.
import pandas as pd
df = pd.DataFrame({'col1':[2,3,3,4,2,1,3,4], 'col2':[1,1,1,1,2,2,2,3]})
# First problem
df2 = df.pivot(columns='col2')["col1"]
df2 = df2.apply(lambda x: pd.Series(x.dropna().values))
print(df2)
# Second problem
def concat(s):
    return s.tolist()
df3 = df.groupby('col2').agg(concat)["col1"].apply(pd.Series)
print(df3)

How to get data correctly into dictionary

I've read a csv file into a pandas data frame, df, of 84 rows. There are n (6 in this example) values in a column that I want to use as keys in a dictionary, data, to convert to a data frame df_data. Column names in df_data come from the columns in df.
I can do most of this successfully, but I'm not getting the actual data into the dataframe. I suspect the problem is in my loop creating the dictionary, but can't figure out what's wrong.
I've tried subsetting df[cols], taking it out of a list, etc.
data = {}
cols = [x for x in df.columns if x not in drops] # drops is list of unneeded columns
for uni in unique_sscs: # unique_sscs is a list of the values to use as the index
    for col in cols:
        data[uni] = [df[cols]]
df_data = pd.DataFrame(data, index=unique_sscs, columns=cols)
Here's my result (they didn't paste, but all values show as NaN in Jupyter):
lab_anl_method_name analysis_date test_type result_type_code result_unit lab_name sample_date work_order sample_id
1904050740
1904050820
1904050825
1904050830
1904050840
1904050845
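The all-NaN result is consistent with how pd.DataFrame treats a dict: the dict keys become column labels, and passing columns=cols then selects labels that are not among those keys, so every cell is empty. A sketch of one way around it, skipping the dictionary entirely, is below. The key column name "ssc" and the sample values are hypothetical, and it assumes you want the first row for each key.

```python
import pandas as pd

# Hypothetical data; "ssc" stands in for the column holding the key values.
df = pd.DataFrame({
    "ssc": [1904050740, 1904050740, 1904050820, 1904050825],
    "lab_name": ["A", "A", "B", "C"],
    "result_unit": ["mg/L", "mg/L", "ug/L", "mg/L"],
    "notes": ["x", "y", "z", "w"],
})
drops = ["ssc", "notes"]  # columns to leave out of the result
cols = [c for c in df.columns if c not in drops]
unique_sscs = df["ssc"].unique().tolist()

# Keep the first row per key (an assumption), index by the key,
# then select the wanted rows and columns in one step.
df_data = df.drop_duplicates(subset="ssc").set_index("ssc").loc[unique_sscs, cols]
print(df_data)
```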

Give index numbers to the records

I have a DataFrame df with various rows and columns. I want to add a column named df_index to df that numbers all the records. For example, the df_index value for the 1st record should be df_1, for the 2nd record it should be df_2, and so on up to the last record in df.
Note that each value in df_index should have the format df_{number}.
df['df_index'] = 'df_' + pd.Series(range(1, len(df)+1), index=df.index).astype(str)
