How do I convert a nested list to a dictionary? - python

I am currently working on an assignment where I need to convert a nested list to a dictionary, where I have to separate the codes from the nested list below.
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],['ACE', "S&P", "12/08/2017", 3.2, 2000],['AEB', "ENG", "04/03/2017", 1.4, 3000]]
to get this
Code  Name  Purchase Date  Price  Volume
ABC   Tel   12/07/2017     1.5    1000
ACE   S&P   12/08/2017     3.2    2000
AEB   ENG   04/03/2017     1.4    3000
so that the remaining values are still in a list, but tagged to the codes as keys.
Could anyone advise on this, please? Thank you!
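A minimal sketch of the mapping described in the question, assuming the first element of each inner list is the code and everything after it should stay together in a list:

```python
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],
        ['ACE', "S&P", "12/08/2017", 3.2, 2000],
        ['AEB', "ENG", "04/03/2017", 1.4, 3000]]

# Star-unpacking splits each row into its code and a list of the remaining values
by_code = {code: rest for code, *rest in data}

print(by_code['ABC'])  # ['Tel', '12/07/2017', 1.5, 1000]
```

This assumes the codes are unique; if a code appears twice, the later row silently replaces the earlier one.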

You can use a dict comprehension:
keys = ['Code','Name','Purchase Date','Price','Volume']
{k: v for k, *v in zip(keys, *data)}
Result:
{'Code': ['ABC', 'ACE', 'AEB'],
'Name': ['Tel', 'S&P', 'ENG'],
'Purchase Date': ['12/07/2017', '12/08/2017', '04/03/2017'],
'Price': [1.5, 3.2, 1.4],
'Volume': [1000, 2000, 3000]}
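To see why this works, it may help to look at what zip(keys, *data) yields on its own. A small sketch using the same keys and data:

```python
keys = ['Code', 'Name', 'Purchase Date', 'Price', 'Volume']
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],
        ['ACE', "S&P", "12/08/2017", 3.2, 2000],
        ['AEB', "ENG", "04/03/2017", 1.4, 3000]]

# zip(keys, *data) pairs each header with the item at the same position in
# every row, i.e. it walks the nested list column by column
columns = list(zip(keys, *data))
print(columns[0])  # ('Code', 'ABC', 'ACE', 'AEB')

# k, *v then takes the header as the key and gathers the column values into a list
result = {k: v for k, *v in columns}
```

Note that zip stops at the shortest input, so a row with fewer than five items would silently truncate every column.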

You can use a pandas DataFrame for that:
import pandas as pd
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],['ACE', "S&P", "12/08/2017", 3.2, 2000],['AEB', "ENG", "04/03/2017", 1.4, 3000]]
columns = ["Code","Name","Purchase Date","Price","Volume"]
df = pd.DataFrame(data, columns=columns)
print(df)
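The DataFrame above is not a dictionary yet. A possible next step, sketched under the assumption that you want the codes as keys, is to make Code the index and export the rows:

```python
import pandas as pd

data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],
        ['ACE', "S&P", "12/08/2017", 3.2, 2000],
        ['AEB', "ENG", "04/03/2017", 1.4, 3000]]
columns = ["Code", "Name", "Purchase Date", "Price", "Volume"]
df = pd.DataFrame(data, columns=columns)

# Code becomes the index; orient="index" gives {code: {column: value}}
by_code = df.set_index("Code").to_dict(orient="index")

# Alternatively, keep the remaining values as a plain list per code
by_code_lists = dict(zip(df["Code"], df.drop(columns="Code").values.tolist()))
```

The first form keeps the column names attached to each value; the second matches the "remaining values in a list" shape from the question.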

I assume that by dictionaries you mean a list of dictionaries, each representing a row with the header as its keys.
You can do that like this:
keys = ['Code','Name','Purchase Date','Price','Volume']
dictionaries = [dict(zip(keys, row)) for row in data]
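A self-contained sketch of the list-of-dicts approach with its output, plus (as an optional extra, not part of the answer) an index by code built from that list:

```python
keys = ['Code', 'Name', 'Purchase Date', 'Price', 'Volume']
data = [['ABC', "Tel", "12/07/2017", 1.5, 1000],
        ['ACE', "S&P", "12/08/2017", 3.2, 2000],
        ['AEB', "ENG", "04/03/2017", 1.4, 3000]]

# One dict per row: zip pairs each header with the value in the same position
dictionaries = [dict(zip(keys, row)) for row in data]
print(dictionaries[0])
# {'Code': 'ABC', 'Name': 'Tel', 'Purchase Date': '12/07/2017', 'Price': 1.5, 'Volume': 1000}

# Optional: look rows up by code instead of by list position
by_code = {row['Code']: row for row in dictionaries}
```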

Related

Convert a dictionary of a list of dictionaries to pandas DataFrame

I pulled a list of historical option prices for AAPL using the Robinhood function robin_stocks.get_option_historicals(). The data was returned as a dictionary containing a list of dictionaries, as shown below.
I am having difficulty converting the object below (named historicalData) into a DataFrame. Can someone please help?
historicalData = {'data_points': [{'begins_at': '2020-10-05T13:30:00Z',
'open_price': '1.430000',
'close_price': '1.430000',
'high_price': '1.430000',
'low_price': '1.430000',
'volume': 0,
'session': 'reg',
'interpolated': False},
{'begins_at': '2020-10-05T13:40:00Z',
'open_price': '1.430000',
'close_price': '1.340000',
'high_price': '1.440000',
'low_price': '1.320000',
'volume': 0,
'session': 'reg',
'interpolated': False}],
'open_time': '0001-01-01T00:00:00Z',
'open_price': '0.000000',
'previous_close_time': '0001-01-01T00:00:00Z',
'previous_close_price': '0.000000',
'interval': '10minute',
'span': 'week',
'bounds': 'regular',
'id': '22b49380-8c50-4c76-8fb1-a4d06058f91e',
'instrument': 'https://api.robinhood.com/options/instruments/22b49380-8c50-4c76-8fb1-a4d06058f91e/'}
I tried the code below, but it didn't help:
import pandas as pd
df = pd.DataFrame(historicalData)
df
You didn't write that you want only data_points (as in the
other answer), so I assume that you want your whole dictionary
converted to a DataFrame.
To do it, start with your code:
df = pd.DataFrame(historicalData)
It creates a DataFrame with data_points "exploded" into consecutive rows, but each of those cells is still a dictionary.
Then rename the open_price column to open_price_all:
df.rename(columns={'open_price': 'open_price_all'}, inplace=True)
The reason is to avoid duplicate column names after the join performed in the last step (data_points also contains an open_price attribute, and I want the corresponding column from data_points to "inherit" this name).
The next step is to create a temporary DataFrame by splitting the dictionaries in data_points into individual columns:
wrk = df.data_points.apply(pd.Series)
Print wrk to see the result.
And the last step is to join df with wrk and drop
data_points column (not needed any more, since it was
split into separate columns):
result = df.join(wrk).drop(columns=['data_points'])
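Put together, the steps above might look like this. The sketch assumes a trimmed copy of historicalData (two data points and only a few of the top-level keys) so that it runs on its own; the remaining keys behave the same way.

```python
import pandas as pd

# Trimmed copy of historicalData (fewer keys than the real response)
historicalData = {
    'data_points': [
        {'begins_at': '2020-10-05T13:30:00Z', 'open_price': '1.430000',
         'close_price': '1.430000', 'volume': 0},
        {'begins_at': '2020-10-05T13:40:00Z', 'open_price': '1.430000',
         'close_price': '1.340000', 'volume': 0},
    ],
    'open_price': '0.000000',
    'interval': '10minute',
    'span': 'week',
}

# One row per data point; scalar top-level values are repeated on every row
df = pd.DataFrame(historicalData)
# Rename first so the open_price inside each data point keeps its own name
df = df.rename(columns={'open_price': 'open_price_all'})
# Split each data_points dict into separate columns
wrk = df['data_points'].apply(pd.Series)
# Attach those columns and drop the original dict column
result = df.join(wrk).drop(columns=['data_points'])
print(result)
```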
This is fairly easy to solve with the code below. I have turned the data points into a list of DataFrames via a list comprehension:
import pandas as pd
df_list = [pd.DataFrame(dic.items(), columns=['Parameters', 'Value']) for dic in historicalData['data_points']]
You could then do:
df_list[0]
which will yield
Parameters Value
0 begins_at 2020-10-05T13:30:00Z
1 open_price 1.430000
2 close_price 1.430000
3 high_price 1.430000
4 low_price 1.430000
5 volume 0
6 session reg
7 interpolated False
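If you would rather have all data points in one DataFrame (one row per data point, one column per key), the list of dicts can also be passed straight to the DataFrame constructor; with the real object that would be pd.DataFrame(historicalData['data_points']). A minimal sketch with a trimmed copy of the data points:

```python
import pandas as pd

# Trimmed copy of historicalData['data_points']
data_points = [
    {'begins_at': '2020-10-05T13:30:00Z', 'open_price': '1.430000',
     'close_price': '1.430000', 'volume': 0, 'session': 'reg', 'interpolated': False},
    {'begins_at': '2020-10-05T13:40:00Z', 'open_price': '1.430000',
     'close_price': '1.340000', 'volume': 0, 'session': 'reg', 'interpolated': False},
]

# Each dict becomes a row; the dict keys become the column names
points_df = pd.DataFrame(data_points)

# The prices arrive as strings, so convert them before doing arithmetic
price_cols = ['open_price', 'close_price']
points_df[price_cols] = points_df[price_cols].astype(float)
print(points_df)
```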

How to convert columns of a Pandas DataFrame into separate dicts where the dict names are the column names?

I want to convert a Pandas DataFrame into separate dicts, where the names of the dicts are the column names and all dicts have the same index.
The DataFrame looks like this:
           cBmsExp  cCncC  cDnsWd
PlantName
A.gre          2.5   0.45   896.8
A.rig          2.5   0.40   974.9
A.tex          3.5   0.45   863.1
The result should be:
cBmsExp = {"A.gre":2.5, "A.rig": 2.5, "A.tex": 3.5}
cCncC = {"A.gre":0.45, "A.rig": 0.4, "A.tex": 0.45}
cDnsWd = {"A.gre":896.8, "A.rig": 974.9, "A.tex": 863.1}
I can't figure out how a column name can become the name of a variable in my Python code.
I went through piles of Stack Overflow questions and answers, but I didn't find this type of problem among them.
Suggestions for code are very much appreciated!
This is not recommended; it is better to create a dict of dicts and select by key:
d = df.to_dict()
print (d)
{'cBmsExp': {'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5},
'cCncC': {'A.gre': 0.45, 'A.rig': 0.4, 'A.tex': 0.45},
'cDnsWd': {'A.gre': 896.8, 'A.rig': 974.9, 'A.tex': 863.1}}
print (d['cBmsExp'])
{'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5}
But it is possible, e.g. via globals():
for k, v in d.items():
    globals()[k] = v
print (cBmsExp)
{'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5}
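For a self-contained run, here is a sketch that rebuilds the example DataFrame (with PlantName as the index) and shows both the dict of dicts and, as an extra, converting a single column with Series.to_dict():

```python
import pandas as pd

# Rebuild the example DataFrame with PlantName as the index
df = pd.DataFrame(
    {'cBmsExp': [2.5, 2.5, 3.5],
     'cCncC': [0.45, 0.40, 0.45],
     'cDnsWd': [896.8, 974.9, 863.1]},
    index=pd.Index(['A.gre', 'A.rig', 'A.tex'], name='PlantName'),
)

# Default orient='dict': {column name: {index label: value}}
d = df.to_dict()

# A single column can also be converted on its own
cBmsExp = df['cBmsExp'].to_dict()
print(cBmsExp)  # {'A.gre': 2.5, 'A.rig': 2.5, 'A.tex': 3.5}
```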

Convert a Python DataFrame into a list of dictionaries

I have a dataframe and want to convert it to a list of dictionaries. I use read_csv() to create this dataframe. The dataframe looks like the following:
  AccountName AccountType StockName  Allocation
0       MN001          #1       ABC         0.4
1       MN001          #1       ABD         0.6
2       MN002          #2       EFG         0.5
3       MN002          #2       HIJ         0.4
4       MN002          #2       LMN         0.1
The desired output:
[{'ABC':0.4, 'ABD':0.6}, {'EFG':0.5, 'HIJ':0.4,'LMN':0.1}]
I have researched similar topics and tried the DataFrame.to_dict() function. I look forward to getting this done. Many thanks for your help!
import pandas as pd
d = [['MN001', '#1', 'ABC', 0.4],
     ['MN001', '#1', 'ABD', 0.6],
     ['MN002', '#2', 'EFG', 0.5],
     ['MN002', '#2', 'HIJ', 0.4],
     ['MN002', '#2', 'LMN', 0.1]]
# A plain list keeps Allocation numeric; np.array would coerce every value to a string
df = pd.DataFrame(data=d, columns=['AccountName', 'AccountType', 'StockName', 'Allocation'])
# One dict per account, mapping StockName -> Allocation
by_account_df = (df.groupby('AccountName')
                   .apply(lambda x: dict(zip(x['StockName'], x['Allocation'])))
                   .reset_index(name='dic'))
by_account_lst = by_account_df['dic'].values.tolist()
And the result should be:
print(by_account_lst)
[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
This should do it:
portfolios = []
for _, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation']
                 for _, stock in account.iterrows()}
    portfolios.append(portfolio)
First use the groupby() function to group the rows of the dataframe by AccountName. To access the individual rows (stocks) for each account, you use the iterrows() method. As user #ebb-earl-co explained in the comments, the _ is there as a placeholder variable, because iterrows() returns (index, Series) tuples, and we only need the Series (the rows themselves). From there, use a dict comprehension to create a dictionary mapping StockName -> Allocation for each stock. Finally, append that dictionary to the list of portfolios, resulting in the expected output:
[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
One more thing: if you decide later that you want to label each dict in the portfolios with the account name, you could do it like this:
portfolios = []
for acct_name, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation']
                 for _, stock in account.iterrows()}
    portfolios.append({acct_name: portfolio})
This will return a list of nested dicts like this:
[{'MN001': {'ABC': 0.4, 'ABD': 0.6}},
{'MN002': {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}}]
Note that in this case, I used the variable acct_name instead of assigning to _ because we actually use the group key (the account name) to "label" the dicts in the portfolios list.
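For a self-contained run, here is the same loop with the example DataFrame rebuilt in code. As a variation (not from the answer above), it zips the two columns instead of calling iterrows(), which avoids building a Series for every row:

```python
import pandas as pd

df = pd.DataFrame({
    'AccountName': ['MN001', 'MN001', 'MN002', 'MN002', 'MN002'],
    'AccountType': ['#1', '#1', '#2', '#2', '#2'],
    'StockName': ['ABC', 'ABD', 'EFG', 'HIJ', 'LMN'],
    'Allocation': [0.4, 0.6, 0.5, 0.4, 0.1],
})

portfolios = []
for _, account in df.groupby('AccountName'):
    # Pair each stock with its allocation within this account
    portfolio = dict(zip(account['StockName'], account['Allocation']))
    portfolios.append(portfolio)

print(portfolios)
# [{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
```

groupby sorts the account names by default, so the dicts come out in MN001, MN002 order.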

Sorting array data by common date

I have a .csv file with many rows and 3 columns: Date, Rep, and Sales. I would like to use Python to generate a new array that groups the data by Date and, for the given date, sorts the Reps by Sales. As an example, my input data looks like this:
salesData = [[201703,'Bob',3000], [201703,'Sarah',6000], [201703,'Jim',9000],
[201704,'Bob',8000], [201704,'Sarah',7000], [201704,'Jim',12000],
[201705,'Bob',15000], [201705,'Sarah',14000], [201705,'Jim',8000],
[201706,'Bob',10000], [201706,'Sarah',18000]]
My desired output would look like this:
sortedData = [[201703,'Jim', 'Sarah', 'Bob'], [201704,'Jim', 'Bob',
'Sarah'], [201705,'Bob', 'Sarah', 'Jim'], [201706, 'Sarah', 'Bob']]
I am new to Python, but I have searched quite a bit for a solution with no success. Most of my search results lead me to believe there may be an easy way to do this using pandas (which I have not used) or numpy (which I have used).
Any suggestions would be greatly appreciated. I am using Python 3.6.
Use Pandas!
import pandas as pd
salesData = [[201703, 'Bob', 3000], [201703, 'Sarah', 6000], [201703, 'Jim', 9000],
[201704, 'Bob', 8000], [201704, 'Sarah', 7000], [201704, 'Jim', 12000],
[201705, 'Bob', 15000], [201705, 'Sarah', 14000], [201705, 'Jim', 8000],
[201706, 'Bob', 10000], [201706, 'Sarah', 18000]]
sales_df = pd.DataFrame(salesData)
result = []
for name, group in sales_df.groupby(0):
    sorted_df = group.sort_values(2, ascending=False)
    result.append([name] + list(sorted_df[1]))
print(result)
Without pandas, you can try this one-line answer (it groups the reps by date, but does not yet order them by sales):
sortedData = [[i]+[item[1] for item in salesData if item[0]==i] for i in sorted(set([item[0] for item in salesData]))]
EDIT:
You can do this to order each inner list by sales:
sortedData = [[i]+[item[1] for item in sorted(salesData, key=lambda x: -x[2]) if item[0]==i] for i in sorted(set([item[0] for item in salesData]))]
Note that the sorted(salesData, key=lambda x: -x[2]) part performs the ordering: it sorts all rows by descending sales before they are grouped by date.
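Another option without pandas, sketched here with itertools.groupby (a swapped-in technique, not part of either answer): sort once by date ascending and sales descending, then group consecutive rows by date. This avoids rescanning salesData for every date.

```python
from itertools import groupby
from operator import itemgetter

salesData = [[201703, 'Bob', 3000], [201703, 'Sarah', 6000], [201703, 'Jim', 9000],
             [201704, 'Bob', 8000], [201704, 'Sarah', 7000], [201704, 'Jim', 12000],
             [201705, 'Bob', 15000], [201705, 'Sarah', 14000], [201705, 'Jim', 8000],
             [201706, 'Bob', 10000], [201706, 'Sarah', 18000]]

# Sort key: date ascending, then sales descending (negated so larger sales come first)
ordered = sorted(salesData, key=lambda row: (row[0], -row[2]))

# groupby only merges consecutive items, which is why the sort by date comes first
sortedData = [[date] + [rep for _, rep, _ in rows]
              for date, rows in groupby(ordered, key=itemgetter(0))]

print(sortedData)
# [[201703, 'Jim', 'Sarah', 'Bob'], [201704, 'Jim', 'Bob', 'Sarah'], [201705, 'Bob', 'Sarah', 'Jim'], [201706, 'Sarah', 'Bob']]
```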

Convert pyspark.sql.dataframe.DataFrame type Dataframe to Dictionary

I have a PySpark DataFrame and I need to convert it into a Python dictionary.
Below code is reproducible:
from pyspark.sql import Row
rdd = sc.parallelize([Row(name='Alice', age=5, height=80),Row(name='Alice', age=5, height=80),Row(name='Alice', age=10, height=80)])
df = rdd.toDF()
Once I have this DataFrame, I need to convert it into a dictionary.
I tried this:
df.set_index('name').to_dict()
But it gives an error. How can I achieve this?
Please see the example below:
>>> from pyspark.sql.functions import col
>>> df = (sc.textFile('data.txt')
...       .map(lambda line: line.split(","))
...       .toDF(['name', 'age', 'height'])
...       .select(col('name'), col('age').cast('int'), col('height').cast('int')))
>>> df.show()
+-----+---+------+
| name|age|height|
+-----+---+------+
|Alice| 5| 80|
| Bob| 5| 80|
|Alice| 10| 80|
+-----+---+------+
>>> list_persons = [row.asDict() for row in df.collect()]
>>> list_persons
[
 {'name': 'Alice', 'age': 5, 'height': 80},
 {'name': 'Bob', 'age': 5, 'height': 80},
 {'name': 'Alice', 'age': 10, 'height': 80}
]
>>> dict_persons = {person['name']: person for person in list_persons}
>>> dict_persons
{'Alice': {'name': 'Alice', 'age': 10, 'height': 80}, 'Bob': {'name': 'Bob', 'age': 5, 'height': 80}}
The input that I'm using to test data.txt:
Alice,5,80
Bob,5,80
Alice,10,80
First we load the file with PySpark by reading its lines. Then we convert the lines to columns by splitting on the comma. Then we convert the native RDD to a DataFrame and name the columns. Finally we cast the columns to the appropriate types.
Then we collect everything to the driver and use a Python comprehension to convert the data into the preferred form, turning each Row object into a dictionary with its asDict() method. In the output, Alice appears only once because the second Alice row overwrites the first one under the same key.
Keep in mind that you should do all the processing and filtering inside PySpark before returning the result to the driver, since collect() brings every row into the driver's memory.
Hope this helps, cheers.
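The overwrite of Alice can be reproduced without Spark. The sketch below assumes the rows have already been collected and converted with asDict(), so plain dicts stand in for them; it also shows one way (a swapped-in idea, not from the answer) to keep every row per name instead of only the last one:

```python
from collections import defaultdict

# Stand-in for the collected rows after asDict(): plain dicts with the same shape
list_persons = [
    {'name': 'Alice', 'age': 5, 'height': 80},
    {'name': 'Bob', 'age': 5, 'height': 80},
    {'name': 'Alice', 'age': 10, 'height': 80},
]

# Keyed by name: a later row with the same name replaces the earlier one
dict_persons = {person['name']: person for person in list_persons}

# Keeping every row instead: collect the rows for each name into a list
persons_by_name = defaultdict(list)
for person in list_persons:
    persons_by_name[person['name']].append(person)
```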
You need to first convert to a pandas.DataFrame using toPandas(), then you can use the to_dict() method on the transposed dataframe with orient='list':
df.toPandas().set_index('name').T.to_dict('list')
# Out[1]: {u'Alice': [10, 80]}
Row objects have a built-in asDict() method that represents a row as a dict.
If you have a DataFrame df, you can convert it to an RDD of Rows and apply asDict() to each one (recursive=True also converts nested Rows):
new_rdd = df.rdd.map(lambda row: row.asDict(recursive=True))
One can then use the new_rdd to perform normal python map operations like:
# You can define normal Python functions like the one below and plug them in when needed
def transform(row):
    # Add a new key to each row
    row["new_key"] = "my_new_value"
    return row
new_rdd = new_rdd.map(transform)
One easy way is to collect the Rows and iterate over them with a dictionary comprehension. Here I will demonstrate something similar.
Let's assume a movie DataFrame:
movie_df
movieId  avg_rating
1        3.92
10       3.5
100      2.79
100044   4.0
100068   3.5
100083   3.5
100106   3.5
100159   4.5
100163   2.9
100194   4.5
We can use dictionary comprehension and iterate over the row RDDs like below:
movie_dict = {int(row.asDict()['movieId']): row.asDict()['avg_rating'] for row in movie_df.collect()}
print(movie_dict)
{1: 3.92,
10: 3.5,
100: 2.79,
100044: 4.0,
100068: 3.5,
100083: 3.5,
100106: 3.5,
100159: 4.5,
100163: 2.9,
100194: 4.5}
