How to extract values based on column header in Excel? - python

I have an Excel file containing values. I need the highlighted values collected in a single column, with the rest deleted. But because the row dates and the column headers do not line up, I am not able to extract them. Once you see the Excel file you will understand which values I need. This is just a sample of my data.
The dates in column A (A2:A17) are continuous, but a few of them repeat, whereas the dates in the header row (D1:K1) do not repeat. So in this case the values for the same date occur one just below the other.
How can I get these values into one column?
Is there a way to highlight the values where the same date occurs in both the row and the column? The sample data was highlighted manually. I have a huge dataset that cannot be highlighted manually.
From the colour code I could also get the required values.
I am attaching the file here:
https://docs.google.com/spreadsheets/d/1-xBMKRP1_toA_Ky8mKxCKAFi4uQ8YWJq/edit?usp=sharing&ouid=110042758694954349181&rtpof=true&sd=true
Please visit the link and help me find a solution.
Thank you

I'm not clear what those values in columns D to K are.
If only the shaded ones matter and they can be derived from the Latitude and Longitude for each row separately:
Insert a column titled "Row", say in A, and populate it 1,2,3...
I think you also want a column E holding whatever calculation you currently have in D-K. Is this "Distance"?
Then create a Pivot Table on columns A to E and you can do anything you are likely to need: https://support.microsoft.com/en-us/office/create-a-pivottable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
Use Dates as Column Labels, Row numbers as Row Labels, and Sum of "Distance" as Values.
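Since the question is tagged python, the same lookup can also be sketched in pandas. This is only a minimal sketch under assumptions: the sheet is assumed to have one date per row in a column called "Date" (a hypothetical name) and one column per header date, and the wanted value is assumed to be the cell whose column header equals the row's date. The sample numbers are made up for illustration.

```python
import pandas as pd

# Made-up sample: "Date" is the per-row date (it may repeat), and the
# remaining column names are the header dates (which do not repeat).
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"],
    "2021-01-01": [1.0, 2.0, 3.0, 4.0],
    "2021-01-02": [5.0, 6.0, 7.0, 8.0],
    "2021-01-03": [9.0, 10.0, 11.0, 12.0],
})

# Reshape to long form: one line per (original row, header date) pair.
long = df.reset_index().melt(
    id_vars=["index", "Date"], var_name="Header", value_name="Value"
)

# Keep only the cells whose header date matches the row's own date,
# then restore the original row order.
matched = long[long["Date"] == long["Header"]].sort_values("index")
result = matched.set_index("index")["Value"]
print(result)
```

If the dates are read from Excel as timestamps rather than strings, both the "Date" column and the header names would need to be converted to the same type before the comparison.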

Related

How to parse batches of flagged rows and keep the row satisfying some conditions in a Pandas dataframe?

I have a dataframe containing duplicates that are flagged by a specific variable. The df looks like this:
(image not available)
The idea is that each row to keep and its duplicates are stacked in batches (a pair, or more if there are many duplicates) and identified by the "duplicate" column. For each batch, I would like to keep one row based on a single condition: it has to be the row with the smallest number of empty cells. For Alice, for instance, it should be the second row (and not the one flagged "keep").
The difficulty also lies in the fact that I cannot group by the "name", "lastname" or "phone" columns, because they are not always filled (the duplicates are computed on these 3 concatenated columns by an ML algorithm).
Unlike the already posted questions I've seen (how do I remove rows with duplicate values of columns in pandas data frame?), here the condition for selecting the row to keep is not fixed (like keeping the first or the last row within the batch of duplicates) but depends on how complete the rows are in each batch.
How can I parse the dataframe according to this "duplicate" column and, within each batch, extract the row I want?
I tried to assign a unique label to each batch in order to iterate over these labels, but it fails.
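One possible way to build such a batch label is sketched below. It rests on assumptions the question does not confirm: every batch is assumed to start with a row whose "duplicate" value is "keep", the other flag value ("dup") is made up, and empty cells are assumed to be NaN. The sample rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample: each batch starts with a "keep" row followed by its duplicates.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Bob", "Bob"],
    "lastname": [np.nan, "Smith", "Jones", np.nan, "Jones"],
    "phone": [np.nan, "555-0100", "555-0199", np.nan, np.nan],
    "duplicate": ["keep", "dup", "keep", "dup", "dup"],
})

# A running count of "keep" flags gives every batch its own label.
batch = (df["duplicate"] == "keep").cumsum()

# Count the empty cells per row, ignoring the flag column itself.
# If empty cells are empty strings, convert them first: df.replace("", np.nan).
data_cols = df.columns.drop("duplicate")
n_empty = df[data_cols].isna().sum(axis=1)

# For each batch, take the index label of the row with the fewest empty cells.
best_rows = n_empty.groupby(batch).idxmin()
result = df.loc[best_rows.tolist()]
print(result)
```

With ties, `idxmin` returns the first row of the batch that reaches the minimum, which here would favour the row flagged "keep".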

How to position values in first-in-first-out order in a Dataframe under a conditional in Python?

I am attempting to position some numeric values in an imported Excel file according to changes in my date column (it's called Label). For the moment I am only concerned with placing several values in several columns in the same row when the year changes.
Here's my attempt:
# Values to place
values1 = [1, 2, 3]
values2 = [4, 5, 6]
# New columns
workfile['year'] = pd.DatetimeIndex(workfile['Label']).year
workfile['Voice'] = ''
workfile['Political Stability'] = ''
for j in range(0, len(values1)):
    for i in range(1, len(workfile['Label'])):
        if workfile.iloc[i, 4] != workfile.iloc[i-1, 4] and workfile.iloc[i, 5] == '':
            workfile.iloc[i-1, 5] = values1[j]
            workfile.iloc[i-1, 6] = values2[j]
            break
This places only the last values of both vectors 'values1' and 'values2', and only in one row (the code identified just one change of year, whereas each value in these vectors represents a year itself). I want to place each of the values in these vectors, in FIFO (First In First Out) order, in the row just before the year changes. I hope I have made myself clear; if not, please let me know.
I imagine it's evident, but I am a complete beginner in Python. Many thanks in advance for any comment or suggestion, I'd be very appreciative!
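A loop-free way to get the FIFO placement might look like the sketch below. It is an illustration under assumptions, not a confirmed fix: the sample dates are made up, "the row before the year changes" is read as the last row of each year that is followed by a different year, and the new columns start as NaN rather than as empty strings.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the imported Excel file; only "Label" is needed here.
workfile = pd.DataFrame({
    "Label": pd.to_datetime([
        "2000-06-30", "2000-12-31", "2001-06-30", "2001-12-31",
        "2002-06-30", "2002-12-31", "2003-06-30",
    ])
})
values1 = [1, 2, 3]
values2 = [4, 5, 6]

# A row is a target when the next row exists and belongs to a different year.
year = workfile["Label"].dt.year
next_year = year.shift(-1)
is_last_before_change = next_year.notna() & (year != next_year)
targets = workfile.index[is_last_before_change]

# Pair targets and values in order (first change gets the first value),
# stopping when either the targets or the values run out.
n = min(len(targets), len(values1), len(values2))
workfile["Voice"] = np.nan
workfile["Political Stability"] = np.nan
workfile.loc[targets[:n], "Voice"] = values1[:n]
workfile.loc[targets[:n], "Political Stability"] = values2[:n]
print(workfile)
```

Finding all the target rows first and assigning once avoids the situation in the nested loop, where every pass of the outer loop finds the same first year change again.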

How can I find what row isn't allowing the column conversion to float/int in Python?

I have a dataframe with 300 thousand rows and 12 columns, including one column called value. Basically all the values in this column are of a type that can be converted to float, except for a row or two that contain the headers, which found their way in there as a result of the consolidation process. How can I find these troublesome rows?
Thank you!
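One common approach, sketched here with invented sample data, is to attempt the conversion with `pd.to_numeric(..., errors="coerce")`, which turns anything unparseable into NaN, and then select the rows that became NaN without having been empty in the first place. Only the column name "value" comes from the question.

```python
import pandas as pd

# Invented sample: a stray header string ("value") sits among the numbers.
df = pd.DataFrame({"value": ["1.5", "2", "value", "3.25", None]})

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(df["value"], errors="coerce")

# Rows that failed to convert but were not missing to begin with.
bad_rows = df[converted.isna() & df["value"].notna()]
print(bad_rows)
```

Once the offending rows have been inspected and dropped, `converted` (or a fresh `astype(float)`) can replace the original column.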

Take values from one column and create new column from it

I have a big database that has one column called "Measurement" and one column called "Data", which contains the values for those different measurements. For example, in "Measurement" you can find height, weight and different index values, and in "Data" you will find the value for that measurement.
I would like to organize this database so that each unique measurement type has its own column, so for example I'll have columns named weight, height etc., holding the values they got from the "Data" column.
Until now I have used this approach to create many small databases with my relevant data:
df_NDVI=df[(df['Measurement'] == 'NDVI') & (df['Data']!='Corrupt')]
df_VPP_kg=df[(df['Measurement'] == 'WEIGHT')]
But as you can see, it is not efficient, and it creates many databases instead of one with those columns.
My end goal: to take each unique field from the "Measurement" column and create a new column for it with the correct data from the "Data" column.
Try this:
df["obs"] = df.groupby("Measurement")["Measurement"].cumcount()
df.pivot(index="obs", columns="Measurement", values="Data")
So you will get one column for each unique value in Measurement, and the Data values will be listed below it in order of observation.
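To make the reshaping concrete, here is a small worked sketch. The sample rows are invented, and the column names "Measurement" and "Data" are taken from the filter code in the question; adjust them if the real columns are spelled differently.

```python
import pandas as pd

# Invented sample in the long layout described in the question.
df = pd.DataFrame({
    "Measurement": ["NDVI", "WEIGHT", "NDVI", "WEIGHT", "HEIGHT"],
    "Data": [0.61, 420, 0.58, 431, 1.2],
})

# Number the observations within each measurement type (0, 1, 2, ...),
# so that pivot has a unique (obs, Measurement) pair for every value.
df["obs"] = df.groupby("Measurement").cumcount()

# One column per measurement type; types with fewer observations get NaN.
wide = df.pivot(index="obs", columns="Measurement", values="Data")
print(wide)
```

The helper column "obs" is needed because `pivot` raises an error when the same index and column pair occurs more than once.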

How to calculate based on multiple conditions using Python data frames?

I have an Excel data file with thousands of rows and columns.
I am using Python and have started using pandas dataframes to analyze the data.
What I want to do in column D is to calculate the annual change for the values in column C for each year for each ID.
I can use Excel to do this: if the org ID is the same as that in the prior row, calculate the annual change (leaving the cells highlighted in blue empty, because that's the first period for that particular ID). I don't know how to do this using Python. Can anyone help?
Assuming the dataframe is already sorted:
df.groupby('ID').Cash.pct_change()
However, if the data is sorted you can speed things up, because it is not necessary to group in order to calculate the percentage change from one row to the next:
df.Cash.pct_change().mask(
    df.ID != df.ID.shift()
)
These should produce the column values you are looking for. To add the column, you'll need to assign it to a column or create a new dataframe with the new column:
df['AnnChange'] = df.groupby('ID').Cash.pct_change()
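As a small worked sketch of the two variants above: the sample figures and the "Year" column are invented for illustration, while "ID", "Cash" and "AnnChange" follow the names used in the answer.

```python
import pandas as pd

# Invented sample: two IDs with yearly Cash values.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Year": [2019, 2020, 2021, 2019, 2020],
    "Cash": [100.0, 110.0, 121.0, 50.0, 75.0],
})

# Both variants rely on rows being ordered by year within each ID.
df = df.sort_values(["ID", "Year"]).reset_index(drop=True)

# Variant 1: percentage change computed separately inside each ID group.
df["AnnChange"] = df.groupby("ID")["Cash"].pct_change()

# Variant 2: plain row-to-row change, blanked where the ID differs
# from the previous row (the first period of each ID).
masked = df["Cash"].pct_change().mask(df["ID"] != df["ID"].shift())
print(df.assign(Masked=masked))
```

The first row of each ID comes out as NaN in both variants, matching the cells left empty in the Excel version.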
