Convert dictionary of dictionaries into DataFrame Python

I am trying to convert the following dictionary of dictionaries into pandas DataFrame.
My dictionary looks like this:
mydata = {1965:{1:52, 2:54, 3:67, 4:45},
1966:{1:34, 2:34, 3:35, 4:76},
1967:{1:56, 2:56, 3:54, 4:34}}
And I need to get a resulting dataframe that looks like this:
Sector 1965 1966 1967
1 52 34 56
2 54 34 56
3 67 35 54
4 45 76 34
I was using something like this, but I'm not getting the result that I need.
df = pd.DataFrame([[col1,col2,col3] for col1, d in test.items() for col2, col3 in d.items()])
Thanks a lot for your help!!!

You can use DataFrame.from_records:
import pandas as pd
ydata = {1965:{1:52, 2:54, 3:67, 4:45},
1966:{1:34, 2:34, 3:35, 4:76},
1967:{1:56, 2:56, 3:54, 4:34}}
print (pd.DataFrame.from_records(ydata))
1965 1966 1967
1 52 34 56
2 54 34 56
3 67 35 54
4 45 76 34
print (pd.DataFrame.from_records(ydata).reset_index().rename(columns={'index':'Sector'}))
Sector 1965 1966 1967
0 1 52 34 56
1 2 54 34 56
2 3 67 35 54
3 4 45 76 34
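As an alternative sketch, the plain DataFrame constructor also accepts a dictionary of dictionaries, treating the outer keys as columns and the inner keys as the index. The names mydata and Sector come from the question; using rename_axis to label the index before moving it into a column is just one possible choice here.

```python
import pandas as pd

mydata = {1965: {1: 52, 2: 54, 3: 67, 4: 45},
          1966: {1: 34, 2: 34, 3: 35, 4: 76},
          1967: {1: 56, 2: 56, 3: 54, 4: 34}}

# Outer keys (years) become columns; inner keys (sectors) become the index.
df = pd.DataFrame(mydata)

# Name the index "Sector" and turn it into a regular column.
df = df.rename_axis("Sector").reset_index()
print(df)
```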

Related

Place data from a Pandas DF into a Grid or Template

I have a process whose end product is a Pandas DataFrame. The output varies in data and length, but is structured like this example:
9 80340796
10 80340797
11 80340798
12 80340799
13 80340800
14 80340801
15 80340802
16 80340803
17 80340804
18 80340805
19 80340806
20 80340807
21 80340808
22 80340809
23 80340810
24 80340811
25 80340812
26 80340813
27 80340814
28 80340815
29 80340816
30 80340817
31 80340818
32 80340819
33 80340820
34 80340821
35 80340822
36 80340823
37 80340824
38 80340825
39 80340826
40 80340827
41 80340828
42 80340829
43 80340830
44 80340831
45 80340832
46 80340833
I need to get the numbers in the second column above, into the following grid format based on the numbers in the first column above.
1 2 3 4 5 6 7 8 9 10 11 12
A 1 9 17 25 33 41 49 57 65 73 81 89
B 2 10 18 26 34 42 50 58 66 74 82 90
C 3 11 19 27 35 43 51 59 67 75 83 91
D 4 12 20 28 36 44 52 60 68 76 84 92
E 5 13 21 29 37 45 53 61 69 77 85 93
F 6 14 22 30 38 46 54 62 70 78 86 94
G 7 15 23 31 39 47 55 63 71 79 87 95
H 8 16 24 32 40 48 56 64 72 80 88 96
So the end result in this example would be
Any advice on how to go about this would be much appreciated. I've been asked for this by a colleague, so the data is easy to read for their team (as it matches the layout of a physical test) but I have no idea how to produce it.
A pandas pivot table can do what you want, but first you have to create two auxiliary columns: one determining which column the value goes in, and another determining which row. You can get them as shown in the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'num': list(range(9, 28)), 'val': list(range(80001, 80020))})
max_rows = 8  # number of rows in the grid
df['row'] = (df['num'] - 1) % max_rows
df['col'] = np.ceil(df['num'] / max_rows).astype(int)
df.pivot_table(values=['val'], columns=['col'], index=['row'])
val
col 2 3 4
row
0 80001.0 80009.0 80017.0
1 80002.0 80010.0 80018.0
2 80003.0 80011.0 80019.0
3 80004.0 80012.0 NaN
4 80005.0 80013.0 NaN
5 80006.0 80014.0 NaN
6 80007.0 80015.0 NaN
7 80008.0 80016.0 NaN
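To get closer to the lettered 8 x 12 grid from the question, the same idea can be extended with reindex so that empty positions still appear. This is only a sketch: the sample values mirror the question's output, while the names n_rows, n_cols and grid are assumptions made for illustration.

```python
import string

import pandas as pd

# Sample data shaped like the question's output: numbers 9..46 with their values.
df = pd.DataFrame({"num": range(9, 47), "val": range(80340796, 80340834)})

n_rows, n_cols = 8, 12  # assumed grid size (rows A-H, columns 1-12)

# Position 1 goes to A1, 2 to B1, ..., 9 to A2: the grid is filled column by column.
df["row"] = [string.ascii_uppercase[(n - 1) % n_rows] for n in df["num"]]
df["col"] = (df["num"] - 1) // n_rows + 1

# pivot places each value; reindex adds the rows and columns that have no data (NaN).
grid = df.pivot(index="row", columns="col", values="val").reindex(
    index=list(string.ascii_uppercase[:n_rows]),
    columns=range(1, n_cols + 1),
)
print(grid)
```

Positions without data show up as NaN, so the values are displayed as floats.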

How to change a multi-level header into a flat header in a DataFrame

This is my data:
My Very First Column
Sr No Col1 Col2
Sr No sub_col1 sub_col2 sub_col3 sub_col1 sub_col2 sub_col3
1 9 45 3 9 97 9
2 32 95 12 67 78 34
3 3 6 5 85 54 99
4 32 31 75 312 56 98
This is how I want it to be:
Sr No Col1-sub_col1 Col1-sub_col2 Col1-sub_col3 Col2-sub_col1 Col2-sub_col2 Col2-sub_col3
1 9 45 3 9 97 9
2 32 95 12 67 78 34
3 3 6 5 85 54 99
4 32 31 75 312 56 98
The problem is the columns and sub columns always differ every time. Thus, I can't put any constant value.
Suppose you have a dataframe named df.
Try using:
columns=[]
for x in df.columns.to_list():
    columns.append('-'.join([list(x)[0],list(x)[1]]))
Finally:
df.columns=columns
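A more compact variant of the same idea is sketched below. It assumes the header was read as a two-level MultiIndex in which the "Sr No" column carries the same label on both levels (that layout is a guess based on the question's table), and it leaves such columns unjoined so they do not become "Sr No-Sr No".

```python
import pandas as pd

# Hypothetical frame with a two-level header, mirroring the question's layout.
columns = pd.MultiIndex.from_tuples([
    ("Sr No", "Sr No"),
    ("Col1", "sub_col1"), ("Col1", "sub_col2"), ("Col1", "sub_col3"),
    ("Col2", "sub_col1"), ("Col2", "sub_col2"), ("Col2", "sub_col3"),
])
df = pd.DataFrame(
    [[1, 9, 45, 3, 9, 97, 9], [2, 32, 95, 12, 67, 78, 34]],
    columns=columns,
)

# Join the two levels with "-", except where both levels hold the same label.
df.columns = [top if top == sub else f"{top}-{sub}" for top, sub in df.columns]
print(df.columns.tolist())
```

Because the labels are read from df.columns, nothing is hard-coded and the approach adapts when the columns and sub-columns change.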

Python: How to exclude specific parts of a row when reading from CSV file

I'm very new to Python and am trying to read a CSV file:
1980,Mark,Male,Student,L,90,56,78,44,88
1982,Cindy,Female,Student,S,45,76,22,42,90
1984,Kevin,Male,Student,L,67,83,52,55,59
1986,Michael,Male,Student,M,94,63,73,60,43
1988,Anna,Female,Student,S,66,50,59,57,33
1990,Jessica,Female,Student,S,72,34,29,69,27
1992,John,Male,Student,L,80,67,90,89,68
1994,Tom,Male,Student,M,23,60,89,78,39
1996,Nick,Male,Student,S,56,98,84,44,50
1998,Oscar,Male,Student,M,64,61,74,59,63
2000,Andy,Male,Student,M,11,50,93,69,90
I'd like to save only the specific attributes of this data into a dictionary, or a list of lists. For example, I'd only like to keep the year, name and the five numbers (in a row). I'm not sure how to exclude only the middle three columns.
This is the code I have now:
def read_data(filename):
    f = open("myfile.csv", "rt")
    import csv
    data = {}
    for line in f:
        row = line.rstrip().split(',')
        data[row[0]] = [e for e in row[5:]]
    return data
I only know how to keep chunks of columns together, but not only specific columns one by one.
You could use pd.read_csv() and pass in your desired column names:
import pandas as pd
df = pd.read_csv('csv1.csv', names=['Year','Name','Gender','ID1','ID2','Val1','Val2','Val3','Val4','Val5'])
desired = df[['Year','Name','Val1','Val2','Val3','Val4','Val5']]
Yields:
Year Name Val1 Val2 Val3 Val4 Val5
0 1980 Mark 90 56 78 44 88
1 1982 Cindy 45 76 22 42 90
2 1984 Kevin 67 83 52 55 59
3 1986 Michael 94 63 73 60 43
4 1988 Anna 66 50 59 57 33
5 1990 Jessica 72 34 29 69 27
6 1992 John 80 67 90 89 68
7 1994 Tom 23 60 89 78 39
8 1996 Nick 56 98 84 44 50
9 1998 Oscar 64 61 74 59 63
10 2000 Andy 11 50 93 69 90
Another option would be to pass the column index locations up front with usecols, like so:
df = pd.read_csv('csv1.csv', header=None, usecols=[0,1,5,6,7,8,9])
Notice that this returns a dataframe with index-location named columns:
0 1 5 6 7 8 9
0 1980 Mark 90 56 78 44 88
1 1982 Cindy 45 76 22 42 90
2 1984 Kevin 67 83 52 55 59
3 1986 Michael 94 63 73 60 43
4 1988 Anna 66 50 59 57 33
5 1990 Jessica 72 34 29 69 27
6 1992 John 80 67 90 89 68
7 1994 Tom 23 60 89 78 39
8 1996 Nick 56 98 84 44 50
9 1998 Oscar 64 61 74 59 63
10 2000 Andy 11 50 93 69 90
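If readable names are wanted after selecting by position, one option is to assign them once the file is loaded. The sketch below reads two sample rows from an in-memory string instead of csv1.csv so that it runs on its own; the column names are the ones used earlier in this answer.

```python
import io

import pandas as pd

# Two rows from the question, held in memory in place of the CSV file.
sample = (
    "1980,Mark,Male,Student,L,90,56,78,44,88\n"
    "1982,Cindy,Female,Student,S,45,76,22,42,90\n"
)

# Keep only the year, the name and the five numbers, selected by position.
df = pd.read_csv(io.StringIO(sample), header=None, usecols=[0, 1, 5, 6, 7, 8, 9])

# Replace the positional labels (0, 1, 5, ...) with descriptive names.
df.columns = ["Year", "Name", "Val1", "Val2", "Val3", "Val4", "Val5"]
print(df)
```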
You could do this with a simple list comprehension:
def read_data(filename):
    data = {}
    col_nums = [0, 1, 5, 6, 7, 8, 9]  # year, name and the five numbers
    with open(filename, "rt") as f:
        for line in f:
            row = line.rstrip().split(',')
            data[row[0]] = [row[i] for i in col_nums]
    return data
You could also consider using Pandas to help you read and wrangle the data:
import pandas as pd
df = pd.read_csv("myfile.csv", names=['year', 'name', 'gender', 'kind', 'size', 'num1', 'num2', 'num3', 'num4', 'num5'])
data = df[['year', 'name', 'num1', 'num2', 'num3', 'num4', 'num5']]
You could split each line and assign the fields explicitly to variables, then simply ignore the variables you will not use (I named them _, so it's obvious that they will not be used).
This will raise an error (on the line that calls split()) if a line has fewer or more fields than expected.
def read_data(filename):
    data = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if len(line) > 0:
                year, name, _, _, _, n1, n2, n3, n4, n5 = line.split(',')
                data[year] = [n1, n2, n3, n4, n5]
    return data
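Since the question already imports the csv module, here is a sketch that lets csv.reader do the splitting and then picks columns by position. The function name read_selected and the keep parameter are hypothetical, and the sample is read from an in-memory string so the snippet runs without a file.

```python
import csv
import io

# Two rows from the question, held in memory in place of the CSV file.
SAMPLE = (
    "1980,Mark,Male,Student,L,90,56,78,44,88\n"
    "1982,Cindy,Female,Student,S,45,76,22,42,90\n"
)


def read_selected(lines, keep=(0, 1, 5, 6, 7, 8, 9)):
    """Map each year to its name and five numbers, skipping the other columns."""
    data = {}
    for row in csv.reader(lines):
        if not row:  # ignore blank lines
            continue
        picked = [row[i] for i in keep]
        data[picked[0]] = picked[1:]
    return data


result = read_selected(io.StringIO(SAMPLE))
print(result)
```

An open file object can be passed in place of the StringIO. All values stay strings, so convert the numbers with int() if arithmetic is needed.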

How can I get combined result of a column values in a DataFrame?

I have below data in a DataFrame.
city age
mumbai 12 33 5 55
delhi 24 56 78 23 43 55 67
kal 12 43 55 78 34
mumbai 14 56 78 99 # Has a leading space
MUMbai 34 59 # Has capital letters
kal 11
I want to convert it into below format :
city age
mumbai 12 33 5 55 14 56 78 99 34 59
delhi 24 56 78 23 43 55 67
kal 12 43 55 78 34 11
How can I achieve this?
Note:
I have edited the data; now some city names are in capital letters and some have leading spaces. How can we apply the strip() and lower() functions to them?
We use groupby with sort=False to ensure we present cities in the same order they first appear.
We use ' '.join to concatenate the strings together.
Lastly, we reset_index to get the city values that have been placed in the index into the dataframe proper.
df.groupby('city', sort=False).age.apply(' '.join).reset_index()
city age
0 mumbai 12 33 5 55 14 56 78 99 34 59
1 delhi 24 56 78 23 43 55 67
2 kal 12 43 55 78 34 11
Response to Edit
df.age.str.strip().groupby(
    df.city.str.strip().str.lower(),
    sort=False
).apply(' '.join).reset_index()
city age
0 mumbai 12 33 5 55 14 56 78 99 34 59
1 delhi 24 56 78 23 43 55 67
2 kal 12 43 55 78 34 11
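For reference, a self-contained sketch of the same approach is given below. The frame is rebuilt by hand from the question's data, with the leading space and the mixed case assumed to be as the question's comments describe; normalising the city column first is one possible way to arrange the steps.

```python
import pandas as pd

# Data from the question; row 4 has a leading space and row 5 mixed case (assumed).
df = pd.DataFrame({
    "city": ["mumbai", "delhi", "kal", " mumbai", "MUMbai", "kal"],
    "age": ["12 33 5 55", "24 56 78 23 43 55 67", "12 43 55 78 34",
            "14 56 78 99", "34 59", "11"],
})

# Normalise the city names so that all variants fall into the same group.
df["city"] = df["city"].str.strip().str.lower()

# Concatenate the age strings per city, keeping first-appearance order.
result = df.groupby("city", sort=False)["age"].apply(" ".join).reset_index()
print(result)
```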

Last cell in a column dataframe from excel using pandas

I just had a quick question. How would one go about getting the last cell value of every column of an Excel spreadsheet when working with it as a dataframe using pandas? I'm having quite some difficulty with this. I know the index can be found with len(), but I can't quite wrap my head around it. Thank you, any help would be greatly appreciated.
If by the last cell you mean the bottom-right cell of the dataframe, you can use .iloc:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1,101).reshape((10,-1)))
df
Output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9 10
1 11 12 13 14 15 16 17 18 19 20
2 21 22 23 24 25 26 27 28 29 30
3 31 32 33 34 35 36 37 38 39 40
4 41 42 43 44 45 46 47 48 49 50
5 51 52 53 54 55 56 57 58 59 60
6 61 62 63 64 65 66 67 68 69 70
7 71 72 73 74 75 76 77 78 79 80
8 81 82 83 84 85 86 87 88 89 90
9 91 92 93 94 95 96 97 98 99 100
Use .iloc with -1 index selection on both rows and columns.
df.iloc[-1,-1]
Output:
100
DataFrame.head(n) gets the top n results from the dataframe. DataFrame.tail(n) gets the bottom n results from the dataframe.
If your dataframe is named df, you could use df.tail(1) to get the last row of the dataframe. The returned value is also a dataframe.
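Since the question asks about the last cell of every column, here is a short sketch of two readings of that. It assumes columns read from a spreadsheet may end in empty cells (NaN); the small frame and the names last_row and last_valid are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which column "a" is shorter than column "b".
df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})

# Last row as a Series: one value per column, NaN where the column ended early.
last_row = df.iloc[-1]

# Last non-missing value per column: forward-fill the gaps, then take the last row.
last_valid = df.ffill().iloc[-1]

print(last_row)
print(last_valid)
```

Unlike df.tail(1), which returns a one-row dataframe, df.iloc[-1] returns a Series indexed by column name.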
