How to fix the order of columns in dict - python

I have a Series which I grouped, and now I want to save it as a CSV file with both the index and the values as two columns (index followed by values).
So I first tried to convert the Series to a DataFrame and then save the DataFrame as CSV.
s_group_count = df_page_concat.groupby(df_page_concat).count()
df_grouped_values = pd.DataFrame({"page_path": s_group_count.index, "count": s_group_count.values})
The problem is that, since a dict is used to create the DataFrame and dicts are not ordered (before Python 3.7), the count (the Series values) ends up as the first column of the DataFrame, while I want the index as the first column and the values (count) as the second.
Any advice on how to fix the order, and is this the best way to create a CSV from a Series with the index stored as another column?

from collections import OrderedDict
This has been particularly helpful for me in groupby.agg operations, to enforce column order
So pass in:
OrderedDict([("page_path", s_group_count.index), ("count", s_group_count.values)])
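For completeness, a minimal runnable sketch of the whole round trip (the input Series here is made up; note that on Python 3.7+ plain dicts preserve insertion order, so OrderedDict is only strictly needed on older versions):

from collections import OrderedDict
import pandas as pd

# Hypothetical stand-in for df_page_concat
df_page_concat = pd.Series(["/home", "/about", "/home", "/home"])

s_group_count = df_page_concat.groupby(df_page_concat).count()

# OrderedDict fixes the column order: page_path first, count second
df_grouped_values = pd.DataFrame(
    OrderedDict([("page_path", s_group_count.index),
                 ("count", s_group_count.values)])
)
df_grouped_values.to_csv("page_counts.csv", index=False)

As a side note, s_group_count.rename_axis("page_path").reset_index(name="count") should produce the same two-column DataFrame without building a dict at all.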

Related

How to split dataframe or array by unique column value with multiple unique values

So I have a dataframe that looks like this, for example: [example dataframe image]
In this example, I need to split the dataframe into multiple dataframes (or arrays, since I will convert them anyway) based on the account_id. I want each account ID (ab123982173 and bc123982173) to become an individual dataframe or array. Since the actual dataset is thousands of rows long, my original thought was to split into a temporary array in a loop.
Any help would be appreciated.
You can get a subset of your dataframe. Using your dataframe as an example:
subset_dataframe = dataframe[dataframe["Account_ID"] == "ab123982173"]
Here is a link from the pandas documentation that has visual examples:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
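If you want one subset per unique account ID rather than hard-coding each value, here is a hedged sketch using groupby (the column name and IDs are taken from the question; the toy data is made up):

import pandas as pd

# Toy data standing in for the real thousands-of-rows DataFrame
dataframe = pd.DataFrame({
    "Account_ID": ["ab123982173", "ab123982173", "bc123982173"],
    "value": [1, 2, 3],
})

# One sub-DataFrame per unique Account_ID, keyed by that ID
frames = {acct: grp for acct, grp in dataframe.groupby("Account_ID")}

# Or as NumPy arrays, since the question mentions converting anyway
arrays = {acct: grp.to_numpy() for acct, grp in dataframe.groupby("Account_ID")}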

How to add a Column named Key into a dictionary of multiple dataframes

Given a dictionary with multiple dataframes in it, how can I add a column to each dataframe, with all the rows in that df filled with the key name?
I tried this code:
for key, df in sheet_to_df_map.items():
    df['sheet_name'] = key
This code does add the key column in each dataframe inside the dictionary, but also creates an additional dataframe.
Can't this be done without creating an additional dataframe?
Furthermore, I want to separate the dataframes in the dictionary by number of columns: all the dataframes with 10 columns concatenated together, the ones with 9 together, and so on. I don't know how to do this.
You could do it with the DataFrame method assign() and then replace the whole value in the dictionary, but I don't know if this is in fact what you want:
for key, df in myDictDf.items():
    myDictDf[key] = df.assign(sheet_name=[key for w in range(len(df.index))])
To sort your dictionary, I think you can use an OrderedDict with the columns property of the DataFrames.
By using len(df.columns) you can get the quantity of columns for each frame.
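A hedged sketch of that idea, bucketing the dataframes by column count and concatenating each bucket (sheet_to_df_map stands in for the dictionary from the question; the toy frames are made up):

from collections import defaultdict
import pandas as pd

sheet_to_df_map = {
    "sheet1": pd.DataFrame({"a": [1], "b": [2]}),
    "sheet2": pd.DataFrame({"a": [3], "b": [4]}),
    "sheet3": pd.DataFrame({"a": [5], "b": [6], "c": [7]}),
}

# Bucket the dataframes by their number of columns
by_width = defaultdict(list)
for df in sheet_to_df_map.values():
    by_width[len(df.columns)].append(df)

# One concatenated frame per column count, e.g. {2: ..., 3: ...}
concatenated = {width: pd.concat(frames) for width, frames in by_width.items()}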
I think these links can be useful for you:
https://note.nkmk.me/en/python-pandas-len-shape-size/
https://www.geeksforgeeks.org/python-sort-python-dictionaries-by-key-or-value/
I've found a related question too:
Adding new column to existing DataFrame in Python pandas

Extracting values from pandas DataFrame using a pandas Series

I have a pandas Series that contains key-value pairs, where the key is the name of a column in my pandas DataFrame and the value is an index in that column of the DataFrame.
For example:
Series: [image of the Series]
Then in my DataFrame: [image of the DataFrame]
Therefore, I want to extract from my DataFrame the value at index 12 for 'A', which is 435.81. I want to put all these values into another Series, so something like {'A': 435.81, 'AAP': 468.97, ...}.
I think this indexing is what you're looking for.
pd.Series(np.diag(df.loc[ser, ser.axes[0]]), index=df.columns)
df.loc allows you to index based on string indices. The rows come from the values in ser (the first positional argument to df.loc), and the columns come from the labels of ser (ser.axes[0], or equivalently ser.index). The values you want lie along the main diagonal of the result, so you take just the diagonal and associate it with the column labels.
The indexing above only works if your DataFrame uses integer row indices, or if the data type of your Series values matches the DataFrame row index. If your DataFrame has non-integer row indices but you still want to select rows by integer position, use the following instead (note, however, that every value in your Series must then be within the DataFrame's range, which is not the case with 'AAL' being 1758 when there are only 12 rows, for example):
pd.Series(np.diag(df.iloc[ser,:].loc[:,ser.axes[0]]), index=df.columns)
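A small, self-contained example of the first form (the numbers are made up apart from the two values quoted in the question):

import numpy as np
import pandas as pd

# Toy DataFrame with integer row indices and ticker columns
df = pd.DataFrame({"A": [10.0, 435.81], "AAP": [468.97, 20.0]})

# Series mapping column name -> row index of the wanted value
ser = pd.Series({"A": 1, "AAP": 0})

# df.loc[ser, ser.index] selects rows by ser's values and columns by
# its labels; the wanted entries sit on the main diagonal
result = pd.Series(np.diag(df.loc[ser, ser.index]), index=ser.index)
# result: A 435.81, AAP 468.97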

pandas: Select one-row data frame instead of series [duplicate]

I have a huge dataframe, and I index it like so:
df.ix[<integer>]
Depending on the index, sometimes this will have only one row of values. Pandas automatically converts this to a Series, which, quite frankly, is annoying because I can't operate on it the same way I can a df.
How do I either:
1) Stop pandas from converting and keep it as a dataframe?
OR
2) Easily convert the resulting series back to a dataframe?
pd.DataFrame(df.ix[<integer>]) does not work because it doesn't keep the original columns. It treats the <integer> as the column, and the columns as indices. Much appreciated.
You can do df.ix[[n]] to get a one-row dataframe of row n.
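Note that .ix was deprecated in pandas 0.20 and removed in 1.0, so in current pandas the same list-indexer trick is spelled with loc or iloc. A minimal sketch with made-up data:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

row_as_series = df.iloc[1]   # plain indexer collapses the row to a Series
row_as_frame = df.iloc[[1]]  # list indexer keeps it a one-row DataFrame

# Same idea with label-based indexing
row_by_label = df.loc[[1]]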

Add pandas Series to a DataFrame, preserving index

I have been having some problems adding the contents of a pandas Series to a pandas DataFrame. I start with an empty DataFrame, initialised with several columns (corresponding to consecutive dates).
I would like to then sequentially fill the DataFrame using different pandas Series, each one corresponding to a different date. However, each Series has a (potentially) different index.
I would like the resulting DataFrame to have an index that is essentially the union of each of the Series indices.
I have been doing this so far:
for date in dates:
    df[date] = series_for_date
However, my df index corresponds to that of the first Series and so any data in successive Series that correspond to an index 'key' not in the first Series are lost.
Any help would be much appreciated!
Ben
If I understand correctly, you can use concat:
pd.concat([series1, series2, series3], axis=1)
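For example, a small sketch showing that concat along axis=1 aligns on the union of the indices (the names and values here are made up):

import pandas as pd

series1 = pd.Series([1.0, 2.0], index=["x", "y"], name="2021-01-01")
series2 = pd.Series([3.0, 4.0], index=["y", "z"], name="2021-01-02")

# axis=1 aligns on the union of the indices; missing entries become NaN
df = pd.concat([series1, series2], axis=1)
#    2021-01-01  2021-01-02
# x         1.0         NaN
# y         2.0         3.0
# z         NaN         4.0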
