Rows not appending to dataframe while on loop - python

I was working through a database and creating a dataframe of selected information. The database can be found at www.cricsheet.org.
The code for the same is:
bat = {'Name': [], 'Runs': [], 'Balls': [], 'StrikeR': []}
batsman = pd.DataFrame(bat)
batsman.head()
index = ['Name', 'Runs', 'Balls', 'StrikeR']
data = []
count = 0
for i in items[0]["1st innings"]["deliveries"]:
    name = list(i.values())[0]["batsman"]
    run = list(i.values())[0]["runs"]["batsman"]
    if name in list(batsman['Name']):
        batsman.loc[batsman.Name == name].Runs += run
        batsman.loc[batsman.Name == name].Balls += 1
        batsman.loc[batsman.Name == name].StrikeR = batsman.loc[batsman.Name == name].Runs/batsman.loc[batsman.Name == name].Balls
    else:
        data = [name, run, 1, run]
        print(data)
        batsman.append(pd.Series(data, index=index), ignore_index=True)
To give some context, the printed data looks like this:
['GC Smith', 0, 1, 0]
['HH Dippenaar', 0, 1, 0]
['HH Dippenaar', 0, 1, 0]
['HH Dippenaar', 2, 1, 2]
['HH Dippenaar', 0, 1, 0]
I was hoping to update this data in a pandas dataframe; however, the data is not being appended to the dataframe. Can anyone tell me why, and what the solution is?
Edit: I am adding a part of items[0] dataset.
{'1st innings': {'team': 'South Africa', 'deliveries': [{0.1: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}, {0.2: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}, {0.3: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}, {0.4: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}, {0.5: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}, {0.6: {'batsman': 'GC Smith', 'bowler': 'WPUJC Vaas', 'non_striker': 'HH Dippenaar', 'runs': {'batsman': 0, 'extras': 0, 'total': 0}}}

Hi,
Appending to a dataframe doesn't happen in place. The append function only returns a new dataframe containing the appended rows; it does not modify the original dataframe.
So,
batsman.append(pd.Series(data, index = index), ignore_index=True)
Should be
batsman = batsman.append(pd.Series(data, index = index), ignore_index=True)
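Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas the same fix is written with `pd.concat`. A minimal sketch (toy data, not the asker's full cricket records):

```python
import pandas as pd

index = ['Name', 'Runs', 'Balls', 'StrikeR']
batsman = pd.DataFrame({'Name': [], 'Runs': [], 'Balls': [], 'StrikeR': []})

data = ['GC Smith', 0, 1, 0]
# pd.concat replaces the removed DataFrame.append; reassignment is still required
row = pd.DataFrame([pd.Series(data, index=index)])
batsman = pd.concat([batsman, row], ignore_index=True)
```

As with `append`, `pd.concat` returns a new dataframe, so the result must be assigned back.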

Related

python nested dictionary to pandas DataFrame

main_dict = {
    'NSE:ACC': {'average_price': 0,
                'buy_quantity': 0,
                'depth': {'buy': [{'orders': 0, 'price': 0, 'quantity': 0},
                                  {'orders': 0, 'price': 0, 'quantity': 0},
                                  {'orders': 0, 'price': 0, 'quantity': 0},
                                  {'orders': 0, 'price': 0, 'quantity': 0},
                                  {'orders': 0, 'price': 0, 'quantity': 0}],
                          'sell': [{'orders': 0, 'price': 0, 'quantity': 0},
                                   {'orders': 0, 'price': 0, 'quantity': 0},
                                   {'orders': 0, 'price': 0, 'quantity': 0},
                                   {'orders': 0, 'price': 0, 'quantity': 0},
                                   {'orders': 0, 'price': 0, 'quantity': 0}]},
                'instrument_token': 5633,
                'last_price': 2488.9,
                'last_quantity': 0,
                'last_trade_time': '2022-09-23 15:59:10',
                'lower_circuit_limit': 2240.05,
                'net_change': 0,
                'ohlc': {'close': 2555.7,
                         'high': 2585.5,
                         'low': 2472.2,
                         'open': 2575},
                'oi': 0,
                'oi_day_high': 0,
                'oi_day_low': 0,
                'sell_quantity': 0,
                'timestamp': '2022-09-23 18:55:17',
                'upper_circuit_limit': 2737.75,
                'volume': 0},
}
I want to convert the dict to a pandas dataframe, for example:
symbol last_price net_change Open High Low Close
NSE:ACC 2488.9 0 2575 2585.5 2472.2 2555.7
I am trying pd.DataFrame.from_dict(main_dict), but it does not produce this shape. Please suggest the best approach.
I would first select the necessary data from your dict and then pass that as input to pd.DataFrame():
import pandas as pd

df_input = [{
    "symbol": symbol,
    "last_price": main_dict.get(symbol).get("last_price"),
    "net_change": main_dict.get(symbol).get("net_change"),
    "open": main_dict.get(symbol).get("ohlc").get("open"),
    "high": main_dict.get(symbol).get("ohlc").get("high"),
    "low": main_dict.get(symbol).get("ohlc").get("low"),
    "close": main_dict.get(symbol).get("ohlc").get("close")
} for symbol in main_dict]
df = pd.DataFrame(df_input)
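An alternative sketch, assuming the same main_dict shape: build one record per symbol and let pd.json_normalize flatten the nested ohlc keys, then select the columns of interest (the sample dict here is trimmed to the relevant keys):

```python
import pandas as pd

# Trimmed sample of the quote dict from the question
main_dict = {'NSE:ACC': {'last_price': 2488.9, 'net_change': 0,
                         'ohlc': {'close': 2555.7, 'high': 2585.5,
                                  'low': 2472.2, 'open': 2575}}}

# One record per symbol; json_normalize flattens 'ohlc' into 'ohlc.open' etc.
records = [{'symbol': sym, **vals} for sym, vals in main_dict.items()]
df = pd.json_normalize(records)
df = df[['symbol', 'last_price', 'net_change',
         'ohlc.open', 'ohlc.high', 'ohlc.low', 'ohlc.close']]
```

This avoids repeating main_dict.get(...) per column, at the cost of the dotted column names.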

Unpacking Json with nested Lists in Pandas

I have a json file that I am trying to unpack that looks like this:
[{'batter': 'LA Marsh',
  'bowler': 'MJG Nielsen',
  'non_striker': 'M Kapp',
  'runs': {'batter': 0, 'extras': 0, 'total': 0}},
 {'batter': 'LA Marsh',
  'bowler': 'MJG Nielsen',
  'non_striker': 'M Kapp',
  'runs': {'batter': 0, 'extras': 0, 'total': 0},
  'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
 {'batter': 'EA Perry',
  'bowler': 'MJG Nielsen',
  'non_striker': 'M Kapp',
  'runs': {'batter': 0, 'extras': 0, 'total': 0}}]
using the following code:
df = pd.json_normalize(data)
I get a dataframe in which the second entry still has a nested list in the 'wickets' column. In place of the column 'wickets' I would like to have two columns, "player_out" and "kind" (NaN for deliveries without a wicket).
Use:
df = df.drop(columns=['wickets']).join(df['wickets'].explode().apply(pd.Series))
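A self-contained sketch of what that one-liner does, on toy rows mirroring the question's data: explode() unwraps the one-element wicket lists, apply(pd.Series) spreads each dict into columns, and join lines them back up by index.

```python
import pandas as pd

# Minimal stand-in for the json_normalize output: one row without a wicket,
# one row whose 'wickets' cell holds a one-element list of dicts
df = pd.DataFrame({
    'batter': ['LA Marsh', 'LA Marsh'],
    'wickets': [float('nan'),
                [{'player_out': 'LA Marsh', 'kind': 'bowled'}]],
})

out = df.drop(columns=['wickets']).join(df['wickets'].explode().apply(pd.Series))
```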
You can try:
import pandas as pd
from collections.abc import MutableMapping  # 'collections.MutableMapping' was removed in Python 3.10

def flatten(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # NB: list items share the same key, so later items overwrite earlier ones
            for value in v:
                items.extend(flatten(value, new_key, sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
data = [{'batter': 'LA Marsh',
         'bowler': 'MJG Nielsen',
         'non_striker': 'M Kapp',
         'runs': {'batter': 0, 'extras': 0, 'total': 0}},
        {'batter': 'LA Marsh',
         'bowler': 'MJG Nielsen',
         'non_striker': 'M Kapp',
         'runs': {'batter': 0, 'extras': 0, 'total': 0},
         'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
        {'batter': 'EA Perry',
         'bowler': 'MJG Nielsen',
         'non_striker': 'M Kapp',
         'runs': {'batter': 0, 'extras': 0, 'total': 0}}]
output = []
for dict_data in data:
    output.append(flatten(dict_data))
df = pd.DataFrame(output)
print(df)
Output:
batter bowler non_striker runs.batter runs.extras runs.total wickets.player_out wickets.kind
0 LA Marsh MJG Nielsen M Kapp 0 0 0 NaN NaN
1 LA Marsh MJG Nielsen M Kapp 0 0 0 LA Marsh bowled
2 EA Perry MJG Nielsen M Kapp 0 0 0 NaN NaN
If you want to keep using json_normalize, you first need to homogenize the data, then apply json_normalize:
# homogenize data
nan_entries = [{'player_out': pd.NA, 'kind': pd.NA}]
for entry in data:
    if 'wickets' not in entry.keys():
        entry['wickets'] = nan_entries
# use json_normalize
pd.json_normalize(data,
                  record_path='wickets',
                  meta=['batter', 'bowler', 'non_striker', ['runs', 'batter'],
                        ['runs', 'extras'], ['runs', 'total']],
                  record_prefix='wickets.')
Output:
wickets.player_out wickets.kind batter bowler non_striker runs.batter runs.extras runs.total
0 <NA> <NA> LA Marsh MJG Nielsen M Kapp 0 0 0
1 LA Marsh bowled LA Marsh MJG Nielsen M Kapp 0 0 0
2 <NA> <NA> EA Perry MJG Nielsen M Kapp 0 0 0

Convert a nested list of strings into a data frame

I have a JSON file containing something like this:
[7500,
'29-Dec-2022',
{'strikePrice': 7500, 'expiryDate': '29-Dec-2022', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY29-12-2022PE7500.00', 'openInterest': 21, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 8.6, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 1800, 'totalSellQuantity': 0, 'bidQty': 1800, 'bidprice': 3.05, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 17287.05},
8300,
'30-Jun-2022',
{'strikePrice': 8300, 'expiryDate': '30-Jun-2022', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY30-06-2022PE8300.00', 'openInterest': 3, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 4.7, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 1050, 'totalSellQuantity': 0, 'bidQty': 750, 'bidprice': 0.35, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 17287.05},
8500,
'29-Jun-2023',
{'strikePrice': 8500, 'expiryDate': '29-Jun-2023', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY29-06-2023CE8500.00', 'openInterest': 319.5, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 1775, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 0, 'totalSellQuantity': 50, 'bidQty': 0, 'bidprice': 0, 'askQty': 50, 'askPrice': 9970, 'underlyingValue': 17287.05},
8500,
'29-Dec-2022',
{'strikePrice': 8500, 'expiryDate': '29-Dec-2022', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY29-12-2022PE8500.00', 'openInterest': 2254, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 22.9, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 2700, 'totalSellQuantity': 0, 'bidQty': 1800, 'bidprice': 3.15, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 17287.05}]
Code:
read_cont = []
new_list1 = []
new_list2 = []
for i in rjson:
    for j in rjson[i]:
        read_cont.append(rjson[i][j])
data_filter = read_cont[1]
for item in data_filter:
    for j in item:
        new_list1.append(item[j])
new_list1 = map(str, new_list1)
for i in new_list1:
    if len(i) > 100:
        new_list2.append(i)
header_names = ["STRIKE PRICE","EXPIRY","underlying", "identifier","OPENINTEREST","changeinOpenInterest","pchangeinOpenInterest", "totalTradedVolume","impliedVolatility","lastPrice","change","pChange", "totalBuyQuantity","totalSellQuantity","bidQty","bidprice","askQty","askPrice","underlyingValue"]
df = pd.DataFrame(columns=header_names)
In order to separate the strikePrice entries from the nested list, I converted all the items to strings:
["{'strikePrice': 7500, 'expiryDate': '29-Dec-2022', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY29-12-2022PE7500.00', 'openInterest': 21, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 8.6, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 1800, 'totalSellQuantity': 0, 'bidQty': 1800, 'bidprice': 3.05, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 17287.05}",
"{'strikePrice': 8300, 'expiryDate': '30-Jun-2022', 'underlying': 'NIFTY', 'identifier': 'OPTIDXNIFTY30-06-2022PE8300.00', 'openInterest': 3, 'changeinOpenInterest': 0, 'pchangeinOpenInterest': 0, 'totalTradedVolume': 0, 'impliedVolatility': 0, 'lastPrice': 4.7, 'change': 0, 'pChange': 0, 'totalBuyQuantity': 1050, 'totalSellQuantity': 0, 'bidQty': 750, 'bidprice': 0.35, 'askQty': 0, 'askPrice': 0, 'underlyingValue': 17287.05}"
Now I want to transfer the content to a data frame with the columns listed in header_names above.
result_dict = []
result_values = []
for i in range(2, len(input_list), 3):
    result_dict.append(input_list[i])
    result_values.append(input_list[i].values())
col_names = list(result_dict[0].keys())
result_df = pd.DataFrame(result_values, columns=col_names)
rjson = response.json()
read_cont = []
new_list1 = []
new_list2 = []
for i in rjson:
    for j in rjson[i]:
        read_cont.append(rjson[i][j])
data_filter = read_cont[1]
for item in data_filter:
    for j in item:
        new_list1.append(item[j])
for j in new_list1:
    if type(j) == dict:
        new_list2.append(j)
df = pd.DataFrame(new_list2)
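Not from either answer, but the same filtering idea can be sketched more directly: keep only the dict entries of the flat list and let pandas infer the columns. A minimal sketch on a trimmed sample of the question's data:

```python
import pandas as pd

# Trimmed sample of the flat list: strike, expiry, and per-option dicts interleaved
input_list = [
    7500, '29-Dec-2022',
    {'strikePrice': 7500, 'expiryDate': '29-Dec-2022', 'lastPrice': 8.6},
    8300, '30-Jun-2022',
    {'strikePrice': 8300, 'expiryDate': '30-Jun-2022', 'lastPrice': 4.7},
]

# Keep only the dict entries; pd.DataFrame builds one column per key
rows = [item for item in input_list if isinstance(item, dict)]
df = pd.DataFrame(rows)
```

Unlike the stride-3 indexing, this does not depend on the list always repeating in groups of exactly three items.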

Extract key and value from json to new dataframe

I have a dataframe whose columns hold JSON values, nested into multiple levels. I would like to extract the leaf keys and values into a new dataframe. A sample column value is below:
{'shipping_assignments': [{'shipping': {'address': {'address_type': 'shipping',
      'city': 'Calder', 'country_id': 'US', 'customer_address_id': 1,
      'email': 'roni_cost#example.com', 'entity_id': 1, 'firstname': 'Veronica',
      'lastname': 'Costello', 'parent_id': 1, 'postcode': '49628-7978',
      'region': 'Michigan', 'region_code': 'MI', 'region_id': 33,
      'street': ['6146 Honey Bluff Parkway'], 'telephone': '(555) 229-3326'},
     'method': 'flatrate_flatrate',
     'total': {'base_shipping_amount': 5, 'base_shipping_discount_amount': 0,
      'base_shipping_discount_tax_compensation_amnt': 0,
      'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5,
      'base_shipping_tax_amount': 0, 'shipping_amount': 5,
      'shipping_discount_amount': 0,
      'shipping_discount_tax_compensation_amount': 0,
      'shipping_incl_tax': 5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}},
    'items': [{'amount_refunded': 0, 'applied_rule_ids': '1',
      'base_amount_refunded': 0, 'base_discount_amount': 0,
      'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount': 0,
      'base_discount_tax_compensation_invoiced': 0, 'base_original_price': 29,
      'base_price': 29, 'base_price_incl_tax': 31.39, 'base_row_invoiced': 29,
      'base_row_total': 29, 'base_row_total_incl_tax': 31.39,
      'base_tax_amount': 2.39, 'base_tax_invoiced': 2.39,
      'created_at': '2019-09-27 10:03:45', 'discount_amount': 0,
      'discount_invoiced': 0, 'discount_percent': 0, 'free_shipping': 0,
      'discount_tax_compensation_amount': 0,
      'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0,
      'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0,
      'order_id': 1, 'original_price': 29, 'price': 29,
      'price_incl_tax': 31.39, 'product_id': 1434,
      'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1,
      'qty_ordered': 1, 'qty_refunded': 0, 'qty_shipped': 1,
      'row_invoiced': 29, 'row_total': 29, 'row_total_incl_tax': 31.39,
      'row_weight': 1, 'sku': 'WS03-XS-Red', 'store_id': 1,
      'tax_amount': 2.39, 'tax_invoiced': 2.39, 'tax_percent': 8.25,
      'updated_at': '2019-09-27 10:03:46', 'weight': 1,
      'product_option': {'extension_attributes': {'configurable_item_options': [
        {'option_id': '141', 'option_value': 167},
        {'option_id': '93', 'option_value': 58}]}}}]}],
 'payment_additional_info': [{'key': 'method_title',
   'value': 'Check / Money order'}],
 'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1',
   'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}],
 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [
   {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1', 'percent': 8.25,
    'amount': 2.39, 'base_amount': 2.39}]}],
 'converting_from_quote': True}
Above is a single row value of the dataframe column df['x'].
My code to convert it is below:
sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)
It gives a new dataframe with columns:
Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes',
'payment_additional_info', 'shipping_assignments'],
dtype='object')
When I tried to do the same to convert a column that holds these row values:
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)
It doesn't work.
Check this link to get the beautified JSON.
I came across a nice ETL package in Python called petl. Convert the JSON list into table form with the function fromdicts(json_string):
order_table = fromdicts(data_list)
If you find a nested dict in any of the columns, use unpackdict(order_table, 'nested_col') to unpack it.
In my case, I needed to unpack the applied_taxes column. The code below unpacks the nested dict and appends each key and value as a column and row in the same table:
order_table = unpackdict(order_table, 'applied_taxes')
See the petl documentation if you want to know more.
It seems that your mistake was in tolist(). Try the following:
import pandas as pd
import json
import re
data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"roni_cost#example.com","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 
10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}
df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)
prints the following:
0
0 {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...
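As a sketch of another route (not from either answer): pd.json_normalize can pull the applied_taxes records out of the raw dict directly, with no regex round-trip through strings. Assuming a trimmed version of the order dict above:

```python
import pandas as pd

# Trimmed order dict; only the keys relevant to this sketch
order = {
    'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1',
                       'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}],
    'converting_from_quote': True,
}

# record_path pulls each applied-tax dict out as its own row
taxes = pd.json_normalize(order, record_path='applied_taxes')
```

Working on the parsed dict keeps the numeric types intact, which the str()/re.sub approach loses.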

list.append copies the last item only

This might end up being a very silly question, but as a newbie in Python I am not able to find a good solution to the following problem.
class Preprocessor:
    mPath = None
    df = None

    def __init__(self, path):
        self.mPath = path

    def read(self):
        self.df = pd.read_csv(self.mPath)
        return self.df

    def __findUniqueGenres(self):
        setOfGenres = set()
        for index, genre in self.df['genres'].iteritems():
            listOfGenreInMovie = genre.lower().split("|")
            for i, _genre in np.ndenumerate(listOfGenreInMovie):
                setOfGenres.add(_genre)
        return setOfGenres

    def __prepareDataframe(self, genres):
        all_columns = set(["title", "movieId"]).union(genres)
        _df = pd.DataFrame(columns=all_columns)
        return _df

    def __getRowTemplate(self, listOfColumns):
        _rowTemplate = {}
        for col in listOfColumns:
            _rowTemplate[col] = 0
        return _rowTemplate

    def __createRow(self, rowTemplate, row):
        rowTemplate['title'] = row.title
        rowTemplate['movieId'] = row.movieId
        movieGenres = row.genres.lower().split("|")
        for movieGenre in movieGenres:
            rowTemplate[movieGenre] = 1
        return rowTemplate

    def tranformDataFrame(self):
        genres = self.__findUniqueGenres()
        print('### List of genres...', genres)
        __df = self.__prepareDataframe(genres)  # Data frame with all required columns.
        rowTemplate = self.__getRowTemplate(__df.columns)
        print('### Row template looks like -->', rowTemplate)
        collection = []
        for index, row in self.df.iterrows():
            _rowToAdd = self.__createRow(rowTemplate, row)
            print('### Row looks like', _rowToAdd)
            collection.append(_rowToAdd)
        print('### Collection looks like', collection)
        return __df.append(collection)
Here, when I append _rowToAdd to collection, the collection ends up holding copies of the last item only (the last row of self.df).
Below are logs for the same (self.df has 3 rows here):
### List of genres... {'mystery', 'horror', 'comedy', 'drama', 'thriller', 'children', 'adventure'}
### Row template looks like --> {'title': 0, 'horror': 0, 'comedy': 0, 'drama': 0, 'children': 0, 'mystery': 0, 'movieId': 0, 'thriller': 0, 'adventure': 0}
### Row looks like {'title': 'Big Night (1996)', 'horror': 0, 'comedy': 1, 'drama': 1, 'children': 0, 'mystery': 0, 'movieId': 994, 'thriller': 0, 'adventure': 0}
### Row looks like {'title': 'Grudge, The (2004)', 'horror': 1, 'comedy': 1, 'drama': 1, 'children': 0, 'mystery': 1, 'movieId': 8947, 'thriller': 1, 'adventure': 0}
### Row looks like {'title': 'Cheetah (1989)', 'horror': 1, 'comedy': 1, 'drama': 1, 'children': 1, 'mystery': 1, 'movieId': 2039, 'thriller': 1, 'adventure': 1}
### Collection looks like [{'title': 'Cheetah (1989)', 'horror': 1, 'comedy': 1, 'drama': 1, 'children': 1, 'mystery': 1, 'movieId': 2039, 'thriller': 1, 'adventure': 1}, {'title': 'Cheetah (1989)', 'horror': 1, 'comedy': 1, 'drama': 1, 'children': 1, 'mystery': 1, 'movieId': 2039, 'thriller': 1, 'adventure': 1}, {'title': 'Cheetah (1989)', 'horror': 1, 'comedy': 1, 'drama': 1, 'children': 1, 'mystery': 1, 'movieId': 2039, 'thriller': 1, 'adventure': 1}]
I want my collection to look like:
### [
{'title': 'Big Night (1996)', 'horror': 0, 'comedy': 1, 'drama': 1, 'children': 0, 'mystery': 0, 'movieId': 994, 'thriller': 0, 'adventure': 0},
{'title': 'Grudge, The (2004)', 'horror': 1, 'comedy': 0, 'drama': 0, 'children': 0, 'mystery': 1, 'movieId': 8947, 'thriller': 1, 'adventure': 0},
{'title': 'Cheetah (1989)', 'horror': 0, 'comedy': 0, 'drama': 0, 'children': 1, 'mystery': 0, 'movieId': 2039, 'thriller': 0, 'adventure': 1}
]
Dataset - https://grouplens.org/datasets/movielens/
I understand the issue now: I was mutating the same dictionary object on every iteration.
def tranformDataFrame(self):
    genres = self.__findUniqueGenres()
    print('### List of genres...', genres)
    __df = self.__prepareDataframe(genres)  # Data frame with all required columns.
    rowTemplate = self.__getRowTemplate(__df.columns)
    print('### Row template looks like -->', rowTemplate)
    collection = []
    for index, row in self.df.iterrows():
        # Creating a fresh copy of the row template every time prevents mutation.
        _rowToAdd = self.__createRow(self.__getRowTemplate(__df.columns), row)
        print('### Row looks like', _rowToAdd)
        collection.append(_rowToAdd)
    print('### Collection looks like', collection)
    return __df.append(collection)
There must be some way to cache the template and clone it each time (instead of rebuilding the dictionary from scratch), but this solution at least resolves the particular issue.
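That caching idea can be sketched with a shallow dict copy: build the template once, then take a cheap dict(...) copy per row instead of mutating the shared object. A minimal sketch with a hypothetical flat template (the asker's row values are all scalars, so a shallow copy suffices; nested values would need copy.deepcopy):

```python
# Build the template once, then copy it per row instead of mutating it
template = {'title': 0, 'movieId': 0, 'comedy': 0, 'drama': 0}

rows = [
    {'title': 'Big Night (1996)', 'movieId': 994, 'genres': 'comedy|drama'},
    {'title': 'Cheetah (1989)', 'movieId': 2039, 'genres': 'comedy'},
]

collection = []
for row in rows:
    rec = dict(template)          # fresh shallow copy -- no shared state
    rec['title'] = row['title']
    rec['movieId'] = row['movieId']
    for genre in row['genres'].split('|'):
        rec[genre] = 1
    collection.append(rec)
```

dict(template) only copies the top-level keys, so it is cheaper than rebuilding the template from the column list on every iteration.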
