I have a big CSV file and I would like to combine rows that have the same id.
For instance, this is what my csv shows right now.
and I would like it to be like this:
how can I do this using pandas?
Try this:
df = df.groupby('id').agg({'name': 'last',
                           'type': 'last',
                           'date': 'last'}).reset_index()
This way you can use a customized function for each column (by changing the function from 'last' to your own function).
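For example, a minimal sketch on toy data, with a hypothetical join_values as the custom aggregation:
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2],
                   'name': ['a', 'b', 'c'],
                   'type': ['x', 'y', 'z'],
                   'date': ['2021-01', '2021-02', '2021-03']})

def join_values(s):
    # each group's column arrives as a Series
    return ','.join(s)

df = df.groupby('id').agg({'name': join_values,
                           'type': 'last',
                           'date': 'last'}).reset_index()
print(df)  # id 1 now has name 'a,b'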
You can read the CSV with the pd.read_csv() function and then use GroupBy.last() to aggregate rows with the same id.
something like:
df = pd.read_csv('file_name.csv')
df1 = df.groupby('id').last()
You should also decide which aggregation function fits your data instead of always taking the last row's value.
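For instance, continuing the snippet above (assuming file_name.csv exists), any built-in aggregation can replace last():
df1 = df.groupby('id').last()   # keep the last row per id
df2 = df.groupby('id').first()  # or the first row
df3 = df.groupby('id').max()    # or the per-column maximum for each id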
I have a dataframe as follows:
I want to take the max of the 'Sell_price' column according to the 'Date' and 'Product_id' columns, without losing the dimensions of the dataframe, as:
Since my data is very big, using a 'for' loop is clearly not practical.
I think you are looking for transform:
df['Sell_price'] = df.groupby(['Date', 'Product_id'])['Sell_price'].transform('max')
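For example, a minimal sketch with made-up data showing that transform keeps the original number of rows:
import pandas as pd

df = pd.DataFrame({'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
                   'Product_id': [1, 1, 2],
                   'Sell_price': [10, 15, 7]})
df['Sell_price'] = df.groupby(['Date', 'Product_id'])['Sell_price'].transform('max')
print(df)  # still 3 rows; both rows of group ('2021-01-01', 1) now show 15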
I have a list of columns from a dataframe
df_date=[df[var1],df[var2]]
I want to change the data in that columns to date time type
for t in df_date:
    pd.DatetimeIndex(t)
For some reason it's not working.
I would like to understand what a more general solution would be for applying several operations to several columns.
As an alternative, you can do:
for column_name in ["var1", "var2"]:
    df[column_name] = pd.DatetimeIndex(df[column_name])
You can use pandas.to_datetime and pandas.DataFrame.apply to convert a dataframe's entire content to datetime. You can also select just the columns you need and apply it only to them:
df[['column1', 'column2']] = df[['column1', 'column2']].apply(pd.to_datetime)
Note that a list of series and a DataFrame are not the same thing.
A DataFrame is accessed like this:
df[[columns]]
While a list of Series looks like this:
[seriesA, seriesB]
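A minimal runnable sketch of this approach (the column names var1/var2 are taken from the question):
import pandas as pd

df = pd.DataFrame({'var1': ['2021-01-01', '2021-06-15'],
                   'var2': ['2020-03-10', '2022-12-31']})
df[['var1', 'var2']] = df[['var1', 'var2']].apply(pd.to_datetime)
print(df.dtypes)  # both columns are now datetime64[ns]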
I have a data frame called df, and in one column, 'Properties', I have listed properties of some products. Each property is a single sentence. Some of them have the same ending, e.g. 'stock'.
I was trying to do something like:
df.loc[df['Properties'][-6:] == 'stock']
to filter these values, but it was not working.
I'd like to be able to filter the data frame by the last 5 characters of that column.
Do you have any ideas on how to do this?
Try this:
df = df[df['Properties'].str.endswith('stock')]
If you want to keep the approach you were trying, this would work:
df = df[df['Properties'].str[-5:] == 'stock']
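For instance, a small sketch with made-up sentences:
import pandas as pd

df = pd.DataFrame({'Properties': ['Available in stock', 'Ships next week']})
print(df[df['Properties'].str.endswith('stock')])  # keeps only the first row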
I am facing a problem adding data to a subcolumn in a specific format. I have created "Polypoints" as the main column and I want
df["Polypoints"] = [{"__type":"Polygon","coordinates":Row_list}]
where Row_list is the column of the dataframe which contains the data in the following format:
df["Row_list"] = [[x1,y1],[x2,y2],[x3,y3]
[x1,y1],[x2,y2],[x3,y3]
[x1,y1],[x2,y2],[x3,y3]]
I want to convert the dataframe into JSON in the format:
"Polypoints" :{"__type":"Polygon" ,"coordinates":Row_list}
There are various ways to do that.
One can create a function create_polygon that takes as input the dataframe (df) and the column name (columnname). That would look like the following:
def create_polygon(df, columnname):
    return {"__type": "Polygon", "coordinates": df[columnname]}
Considering that the column name will be Row_list, the following will already be enough:
def create_polygon(df):
    return {"__type": "Polygon", "coordinates": df['Row_list']}
Then with pandas.DataFrame.apply one can apply it row-wise and store the result in the column Polypoints as follows:
df['Polypoints'] = df.apply(create_polygon, axis=1)
As Itamar Mushkin mentions, one can also do it with a lambda function as follows:
df['Polypoints'] = df.apply(lambda row: {"__type":"Polygon", "coordinates":row['Row_list']} ,axis=1)
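To get the JSON output the question asks for, one possibility (toy coordinates, serialized with the standard-library json module) is:
import json
import pandas as pd

df = pd.DataFrame({'Row_list': [[[1, 2], [3, 4], [5, 6]],
                                [[7, 8], [9, 10], [11, 12]]]})
df['Polypoints'] = df.apply(
    lambda row: {"__type": "Polygon", "coordinates": row['Row_list']}, axis=1)
print(json.dumps(df['Polypoints'].tolist(), indent=2))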
I would like to create a very long pivot table using pandas.
I import a .csv file, creating the dataframe df. The .csv file looks like:
LOC,surveyor_name,test_a,test_b
A,Bob,FALSE,FALSE
A,Bob,TRUE,TRUE
B,Bob,TRUE,FALSE
B,Ryan,TRUE,TRUE
I have the basic pivot table setup here, creating the pivot on index LOC
table = pd.pivot_table(df, values=['surveyor_name'], index=['LOC'],aggfunc={'surveyor_name': np.count_nonzero})
I would like to pass a dictionary into the aggfunc argument, with an entry for each column heading.
I created a CSV with the list of column headings and the aggregation functions, i.e.:
a,b
surveyor_name,np.count_nonzero
test_a,np.count_nonzero
test_b,np.count_nonzero
I create a dataframe and convert this dataframe to a dict here:
keys = pd.read_csv('keys.csv')
x = keys.to_dict()
I now have the object x that I want to pass into aggfunc, but it is at this point that I can't move forward.
So the issue with this came in two parts.
Firstly, the creation of the dict was not correct; it should be:
x = dict(zip(keys['a'], keys['b']))
Secondly, using 'nunique' instead of np.count_nonzero worked.
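Putting it together, a minimal sketch (assuming keys.csv is rewritten so column b holds the string 'nunique' for every row):
import pandas as pd

# the example data from the question
df = pd.DataFrame({'LOC': ['A', 'A', 'B', 'B'],
                   'surveyor_name': ['Bob', 'Bob', 'Bob', 'Ryan'],
                   'test_a': [False, True, True, True],
                   'test_b': [False, True, False, True]})

# stands in for pd.read_csv('keys.csv') with 'nunique' in column b
keys = pd.DataFrame({'a': ['surveyor_name', 'test_a', 'test_b'],
                     'b': ['nunique', 'nunique', 'nunique']})
x = dict(zip(keys['a'], keys['b']))

table = pd.pivot_table(df, values=list(x), index=['LOC'], aggfunc=x)
print(table)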