How to generate multiple pandas dataframe from ordereddict?

I have an ordered dictionary where the keys are the worksheet names and the values contain the worksheet items. Thus, the question: how do I convert the value under each key into its own DataFrame?
import pandas as pd
powerbipath = 'PowerBI_Ingestion.xlsx'
# sheet_name=None reads every worksheet into a dict of sheet name -> DataFrame
dfs = pd.read_excel(powerbipath, sheet_name=None)
new_list1 = []
for idx, eachdf in enumerate(dfs):
    eachdf = dfs[eachdf]
    new_list1.append(eachdf)
    eachdf = pd.DataFrame(new_list1[idx])
Examples I have seen only show how to convert an ordered dictionary into a single pandas DataFrame. I want to convert it into multiple DataFrames: if there are 5 keys, there should be 5 DataFrames.

You may want to do something like this (assuming your dictionary looks like d):
d = {'first': [1, 2], 'second': [3, 4]}
for i in d:
    df = pd.DataFrame(d.get(i), columns=[i])
    print(df)
Output looks like:
   first
0      1
1      2
   second
0       3
1       4
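
For the worksheet case in the question, the values returned by pd.read_excel(..., sheet_name=None) are already DataFrames, so there may be nothing to convert. The sketch below uses an in-memory stand-in for the workbook so that it runs without a file; the sheet names "Sales" and "Costs" and their columns are hypothetical.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-in for: dfs = pd.read_excel(path, sheet_name=None),
# which returns a mapping of sheet name -> DataFrame.
dfs = OrderedDict([
    ("Sales", pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]})),
    ("Costs", pd.DataFrame({"item": ["c"], "qty": [3]})),
])

# One DataFrame per key, still addressable by sheet name.
frames = {name: frame.copy() for name, frame in dfs.items()}

for name, frame in frames.items():
    print(name, frame.shape)
```

Keeping the frames in a dict keyed by sheet name avoids creating a separate variable per worksheet.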

Here is a basic answer that splits a single DataFrame into one DataFrame per unique value of a key column:
keys = df["key_column"].unique()
df_array = {}
for k in keys:
    df_array[k] = df[df["key_column"] == k]
There might be a more efficient way to do it, though.
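
One possibly more efficient variant is to let groupby do the splitting in a single pass. This is only a sketch: the column name key_column comes from the answer above, while the sample data is made up.

```python
import pandas as pd

# Made-up sample data; "key_column" is the splitting column.
df = pd.DataFrame({"key_column": ["x", "y", "x"], "value": [1, 2, 3]})

# groupby yields (key, sub-DataFrame) pairs, one per unique key.
df_by_key = {key: group for key, group in df.groupby("key_column")}

print(df_by_key["x"])
```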

Related

extracted data from sql for processing using python

I have saved out a data column as follows:
[[A,1], [B,5], [C,18]....]
I was hoping to group A, B, C as shown above into Category and 1, 5, 18 into Values/Series so that I can update my PowerPoint chart using python-pptx.
Example:
Category  Values
A         1
B         5
Is there any way I can do this? Currently the data above is also extracted as strings, so I believe I have to convert it to lists first?
Thanks in advance!

Try parsing your string (a list of lists), then create your DataFrame from the resulting list:
import pandas as pd
import re
s = '[[A,1], [B,5], [C,18]]'
cols = ['Category', 'Values']
# Raw string avoids invalid-escape warnings; each match is the text inside one inner [...]
data = [row.split(',') for row in re.findall(r'\[([^]]+)\]', s[1:-1])]
df = pd.DataFrame(data, columns=cols)
print(df)
# Output:
  Category Values
0        A      1
1        B      5
2        C     18
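
Note that the parsed values are still strings. If numeric values and plain lists are needed for the chart, a follow-up step along these lines may help. It is a sketch that assumes every second field is a valid number; the names categories and values are illustrative.

```python
import re

import pandas as pd

s = "[[A,1], [B,5], [C,18]]"
rows = [row.split(",") for row in re.findall(r"\[([^]]+)\]", s[1:-1])]
df = pd.DataFrame(rows, columns=["Category", "Values"])

# Strip stray whitespace and convert the numeric column from str to int.
df["Category"] = df["Category"].str.strip()
df["Values"] = pd.to_numeric(df["Values"])

# Plain Python lists, e.g. for passing on to a charting library.
categories = df["Category"].tolist()
values = df["Values"].tolist()
print(categories, values)
```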

You should be able to just use pandas.DataFrame and pass in your data, unless I'm misunderstanding the question. Anyway, try:
df = pandas.DataFrame(data=d, columns = ['Category', 'Value'])
where d is your list of tuples.

from prettytable import PrettyTable
column = [["A", 1], ["B", 5], ["C", 18]]
columnname = []
columnvalue = []
t = PrettyTable(['Category', 'Values'])
for data in column:
    columnname.append(data[0])
    columnvalue.append(data[1])
    t.add_row([data[0], data[1]])
print(t)

How to sort a dataFrame in python pandas by two or more columns based on a list of values? [duplicate]

I have a pandas DataFrame, df, with columns that represent taxonomic classification (Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order I would like the DataFrame to be sorted by.
The list looks something like this:
class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']
This list would correspond to the Dataframe column df['Class']. I would like to sort all the rows for the whole dataframe based on the order of the list as df['Class'] is in a different order currently. What would be the best way to do this?

You could make the Class column your index column
df = df.set_index('Class')
and then use df.loc to reindex the DataFrame with class_list:
df.loc[class_list]
Minimal example:
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6
>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3

Alex's solution doesn't work if your original DataFrame does not contain all of the elements in the ordered list. For example, if your input data at some point does not contain "Negativicutes", the script will fail. One way around this is to collect the matching rows for each class in a list and concatenate them at the end:
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']
df_list = []
for i in ordered_classes:
    df_list.append(df[df['Class'] == i])
ordered_df = pd.concat(df_list)
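
Another option, not taken from the answers above, is to turn the column into an ordered categorical and sort on it. Labels in the list that are absent from the data are simply skipped. The sample data below is made up.

```python
import pandas as pd

# Made-up data; "Negativicutes" is deliberately missing.
df = pd.DataFrame({
    "Class": ["Gammaproteobacteria", "Bacteroidetes", "Gammaproteobacteria"],
    "Number": [3, 5, 6],
})
ordered_classes = ["Bacteroidetes", "Negativicutes", "Gammaproteobacteria"]

# Values of "Class" that are not in ordered_classes would become NaN here.
df["Class"] = pd.Categorical(df["Class"], categories=ordered_classes, ordered=True)

# A stable sort keeps the original row order within each class.
ordered_df = df.sort_values("Class", kind="stable").reset_index(drop=True)
print(ordered_df)
```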

Creating a dataframe from dictionary with arbitrary length values (using recycled keys as column value)

I am struggling with converting a dictionary into a dataframe.
There are already a lot of answers showing how to do it in the "wide format" like https://stackoverflow.com/a/52819186/6912069 but I would like to do something different, preferably not using loops.
Consider the following example:
I have a dictionary like this one
d_test = {'A': [1, 2], 'B': [3]}
and I'd like to get a dataframe like
index  id  values
0      A   1
1      A   2
2      B   3
The index can be a normal consecutive integer column. By recycling I mean turning 'A'=[1, 2] into two rows having A in the id column and the values in the values column. This way I would have a "long format" dataframe of the dictionary items.
It seems to be a very basic thing to do, but I was wondering if there is an elegant pythonic way to achieve this. Many thanks for your help.

I would create two lists, one from the keys and the other from the values of the dictionary. Once the lists are built, you can pass them to the DataFrame constructor.
import pandas as pd
dic = {'A': [1, 2], 'B': [3], 'D': [4, 5, 6]}
keys = []
values = []
# Repeat each key once per element of its value list
for key, value in dic.items():
    for v in value:
        keys.append(key)
        values.append(v)
df = pd.DataFrame(
    {'id': keys,
     'values': values,
     })
print(df)
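
Since the question asked for something without explicit loops, here is a sketch using Series.explode (available from pandas 0.25). The final cast to int assumes all list elements are integers, as in the example.

```python
import pandas as pd

d_test = {"A": [1, 2], "B": [3]}

long_df = (
    pd.Series(d_test)            # index: dict keys, values: the lists
    .explode()                   # one row per list element, key repeated
    .rename_axis("id")
    .reset_index(name="values")  # move the keys into an "id" column
)
# explode leaves an object column; cast it back to integers.
long_df["values"] = long_df["values"].astype(int)
print(long_df)
```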

Renaming column values in Pandas in alphabetical order

I have a large data set with a column that contains personal names; value_counts() shows 60 distinct names in total. I don't want to show those names when I analyze the data. Instead, I want to rename them to participant_1, ..., participant_60.
I also want to rename the values in alphabetical order so that I will be able to find out who is participant_1 later.
I started by creating a list of new names:
newnames = [f"participant_{i}" for i in range(1,61)]
Then I tried to use the function df.replace.
df.replace('names', 'newnames')
However, I don't know where to specify that I want participant_1 replace the name that comes first in alphabetical order. Any suggestions or better solutions?

If you need to replace the values in a column in alphabetical order, use Categorical.codes:
df = pd.DataFrame({
    'names': list('bcdada'),
})
df['new'] = [f"participant_{i}" for i in pd.Categorical(df['names']).codes + 1]
# alternative solution
# df['new'] = [f"participant_{i}" for i in pd.CategoricalIndex(df['names']).codes + 1]
print(df)
  names            new
0     b  participant_2
1     c  participant_3
2     d  participant_4
3     a  participant_1
4     d  participant_4
5     a  participant_1

Use rename:
df.rename({'old_column_name': 'new_column_name', ...}, axis=1, inplace=True)
You can generate the mapping using a dict comprehension like this:
mapper = {k: v for k, v in zip(sorted(df.columns), newnames)}

If I understood correctly, you want to replace column values, not column names.
Create a dict mapping old names to new names, then use df.replace:
import pandas as pd
df = pd.DataFrame()
df['names'] = ['sam', 'dean', 'jack', 'chris', 'mark']
x = ["participant_{}".format(i + 1) for i in range(len(df))]
rep_dict = {k: v for k, v in zip(df['names'].sort_values(), x)}
print(df.replace(rep_dict))
Output:
           names
0  participant_5
1  participant_2
2  participant_3
3  participant_1
4  participant_4
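
When the same name appears in several rows, zipping the sorted column against the new labels numbers each row rather than each person. A variant that builds the mapping from the sorted unique names might look like the sketch below; the sample names and the participant column are illustrative. The mapping dict also serves as the lookup for finding out who participant_1 is later.

```python
import pandas as pd

# Made-up data with a repeated name.
df = pd.DataFrame({"names": ["sam", "dean", "sam", "chris"]})

# One label per distinct name, assigned in alphabetical order.
unique_sorted = sorted(df["names"].unique())
mapping = {name: f"participant_{i}" for i, name in enumerate(unique_sorted, start=1)}

df["participant"] = df["names"].map(mapping)
print(df)
print(mapping)
```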

Pandas DataFrame take automatically wrong value as index

I tried to create DataFrames from a JSON file.
I have a list named "Series_participants" containing part of this JSON file. My list looks like this when I print it:
participantId                                                                1
championId                                                                  76
stats                        {'item0': 3265, 'item2': 3143, 'totalUnitsHeal...
teamId                                                                     100
timeline                     {'participantId': 1, 'csDiffPerMinDeltas': {'1...
spell1Id                                                                     4
spell2Id                                                                    12
highestAchievedSeasonTier                                               SILVER
dtype: object
<class 'list'>
After that I tried to convert this list to a DataFrame like this:
pd.DataFrame(Series_participants)
But pandas uses the values of "stats" and "timeline" as the index for the DataFrame. I expected an automatic range index (0, ..., n).
EDIT 1:
   participantId  championId  stats  teamId  timeline  spell1Id  spell2Id  highestAchievedSeasonTier
0              1          76   3265     100       NaN         4        12                     SILVER
I want a DataFrame whose "stats" and "timeline" columns contain the dicts of their values, as in the Series display.
What is my error?
EDIT 2:
I tried to create the DataFrame manually, but pandas did not take my choices into consideration and ended up using the keys of the "stats" entry of the Series as the index.
Here is my code:
for j in range(0,len(df.participants[0])):
    for i in range(0,len(df.participants[0][0])):
        Series_participants = pd.Series(df.participants[0][i])
        test = {'participantId':Series_participants.values[0],'championId':Series_participants.values[1],'stats':Series_participants.values[2],'teamId':Series_participants.values[3],'timeline':Series_participants.values[4],'spell1Id':Series_participants.values[5],'spell2Id':Series_participants.values[6],'highestAchievedSeasonTier':Series_participants.values[7]}
    if j == 0:
        df_participants = pd.DataFrame(test)
    else:
        df_participants.append(test, ignore_index=True)
The double loop is there to parse every "participant" in my JSON file.
LAST EDIT:
I achieved what I wanted with the following code:
for i in range(0,len(df.participants[0])):
    Series_participants = pd.Series(df.participants[0][i])
    df_test = pd.DataFrame(data=[Series_participants.values], columns=['participantId','championId','stats','teamId','timeline','spell1Id','spell2Id','highestAchievedSeasonTier'])
    if i == 0:
        df_participants = pd.DataFrame(df_test)
    else:
        df_participants = df_participants.append(df_test, ignore_index=True)
print(df_participants)
Thanks to all for your help!
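
Note that DataFrame.append was removed in pandas 2.0, so the loop above only runs on older versions. A shorter route that may give the same result is to pass the list of participant dicts straight to the constructor: each dict becomes one row, nested dicts stay in their cells, and the index is the default range. The participants list below is a hypothetical, trimmed-down stand-in for the parsed JSON.

```python
import pandas as pd

# Hypothetical stand-in for the list of participant dicts from the JSON file.
participants = [
    {"participantId": 1, "championId": 76, "stats": {"item0": 3265}, "teamId": 100},
    {"participantId": 2, "championId": 12, "stats": {"item0": 1001}, "teamId": 100},
]

# A list of dicts is treated as a list of rows; nested dicts are kept as objects.
df_participants = pd.DataFrame(participants)
print(df_participants)
```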

For efficiency, you should try and manipulate your data as you construct your dataframe rather than as a separate step.
However, to split apart your dictionary keys and values you can use a combination of numpy.repeat and itertools.chain. Here's a minimal example:
df = pd.DataFrame({'A': [1, 2],
                   'B': [{'key1': 'val0', 'key2': 'val9'},
                         {'key1': 'val1', 'key2': 'val2'}],
                   'C': [{'key3': 'val10', 'key4': 'val8'},
                         {'key3': 'val3', 'key4': 'val4'}]})
import numpy as np
from itertools import chain
chainer = chain.from_iterable
lens = df['B'].map(len)
res = pd.DataFrame({'A': np.repeat(df['A'], lens),
                    'B': list(chainer(df['B'].map(lambda x: x.values())))})
res.index = list(chainer(df['B'].map(lambda x: x.keys())))
print(res)
      A     B
key1  1  val0
key2  1  val9
key1  2  val1
key2  2  val2

If you try to input lists, series or arrays containing dicts into the object constructor, it doesn't recognise what you're trying to do. One way around this is manually setting:
df.at['a', 'b'] = {'x':value}
Note, the above will only work if the columns and indexes are already created in your DataFrame.

Updated per comments: pandas DataFrames can hold dictionaries, but it is not recommended.
Pandas interprets the input as if you want one index entry for each of your dictionary keys and then broadcasts the single-item columns across them.
To help with what you are trying to do, I would recommend reading your dictionary items in as columns, which is what DataFrames are typically used for and very good at.
Example error caused by pandas trying to read in the dictionary by key-value pair:
df = pd.DataFrame(columns= ['a', 'b'], index=['a', 'b'])
df.loc['a','a'] = {'apple': 2}
returns
ValueError: Incompatible indexer with Series
Per jpp in the comments below (When using the constructor method):
"They can hold arbitrary types, e.g.
df.iat[0, 0] = {'apple': 2}
However, it's not recommended to use Pandas in this way."
