how to separate a dictionary item from a nested list - python

I am stuck in a project where I have to separate all dictionary items from a list and create a dataframe from them. Below is the JSON file link.
Link: https://drive.google.com/file/d/1H76rjDEZweVGzPcziT5Z6zXqzOSmVZQd/view?usp=sharing
I wrote this code, which converts all the list items into strings, so I am able to separate them into a new list. However, the collected items are not getting converted into a dataframe. Your help will be highly appreciated.
read_cont = []
new_list1 = []
new_list2 = []
for i in rjson:
    for j in rjson[i]:
        read_cont.append(rjson[i][j])
data_filter = read_cont[1]
for item in data_filter:
    for j in item:
        new_list1.append(item[j])
new_list1 = map(str, new_list1)
for i in new_list1:
    if len(i) > 100:
        new_list2.append(i)
header_names = ["STRIKE PRICE", "EXPIRY", "underlying", "identifier", "OPENINTEREST", "changeinOpenInterest", "pchangeinOpenInterest", "totalTradedVolume", "impliedVolatility", "lastPrice", "change", "pChange", "totalBuyQuantity", "totalSellQuantity", "bidQty", "bidprice", "askQty", "askPrice", "underlyingValue"]
df = pd.DataFrame(new_list2, columns=header_names)
The expected output should look something like this:
Columns: [STRIKE PRICE, EXPIRY, underlying, identifier, OPENINTEREST, changeinOpenInterest, pchangeinOpenInterest, totalTradedVolume, impliedVolatility, lastPrice, change, pChange, totalBuyQuantity, totalSellQuantity, bidQty, bidprice, askQty, askPrice, underlyingValue]
Index: []

import json
import pandas as pd

h = json.load(open('scrap.json'))
mdf = pd.DataFrame()
for i in h['records']['data']:
    for k in i:
        # each nested dict becomes a one-row frame, stacked onto mdf
        if isinstance(i[k], dict):
            df = pd.DataFrame(i[k], index=[0])
            mdf = pd.concat([mdf, df])
print(mdf)
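If all the nested dicts share the same keys, pandas can also build the frame in one call with json_normalize. A minimal sketch, assuming the same 'records' -> 'data' layout the loop above relies on (top-level pd.json_normalize needs pandas >= 1.0):
import json
import pandas as pd

h = json.load(open('scrap.json'))
# collect every nested dict from each data entry, as the loop above does
rows = [v for entry in h['records']['data']
        for v in entry.values() if isinstance(v, dict)]
mdf = pd.json_normalize(rows)
print(mdf)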

Related

How do I get every permutation of dictionary possibilities from two lists?

I have multiple sets of two lists that I need to convert into one dictionary by looking at every permutation across rows in a dataframe.
For example, if there is a list of ['cat1','cat2'] and a list of ['top1','top2'], I'd like a result covering every pairing, e.g. {'cat1':'top1','cat1':'top2','cat2':'top1','cat2':'top2'} (note that a plain dict cannot actually hold the key 'cat1' twice, which is why the answer below returns tuples instead).
Here is my current code, which gets close but ends up pairing every letter rather than whole strings:
import pandas as pd

test_df = pd.DataFrame()
test_df['category'] = [['cat1'], ['cat2'], ['cat3', 'cat3.5'], ['cat5']]
test_df['topic'] = [['top1'], [''], ['top2', 'top3'], ['top4']]
final_dict = {}
res = {}
for index, row in test_df.iterrows():
    print(row["category"], row["topic"])
    temp_keys = row["category"]
    temp_values = row["topic"]
    res = {}
    for test_key in temp_keys:
        for test_value in temp_values:
            test_key = str(test_key)
            print(test_key)
            test_value = str(test_value)
            print(test_value)
            # zip() over two strings pairs individual characters,
            # which is why only single letters end up in the dict
            res = dict(zip(str(test_key), str(test_value)))
            print(res)
    print('\n')
If you want a list of tuples instead of a dict, you can use pd.MultiIndex.from_product:
out = test_df.apply(pd.MultiIndex.from_product, axis=1).apply(list)
>>> out
0 [(cat1, top1)]
1 [(cat2, )]
2 [(cat3, top2), (cat3, top3), (cat3.5, top2), (...
3 [(cat5, top4)]
dtype: object
>>> out.tolist()
[[('cat1', 'top1')],
 [('cat2', '')],
 [('cat3', 'top2'), ('cat3', 'top3'), ('cat3.5', 'top2'), ('cat3.5', 'top3')],
 [('cat5', 'top4')]]
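If you only need the raw pairs without pandas, itertools.product gives the same cross product per row; a minimal sketch (since a dict cannot hold the key 'cat1' twice, the pairs stay as tuples):
from itertools import product

categories = ['cat1', 'cat2']
topics = ['top1', 'top2']

pairs = list(product(categories, topics))
print(pairs)
# [('cat1', 'top1'), ('cat1', 'top2'), ('cat2', 'top1'), ('cat2', 'top2')]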

Python merge similar list values, from defined number of characters, to new list values

I have an already sorted and filtered list of files which looks similar to this:
sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']
and I would like a new list in which the values sharing the same prefix before the . are merged into independent sub-lists:
merList = [['aa.001', 'aa.002', 'aa.003'], ['vvv.001', 'vvv.002', 'vvv.003']]
I tried to write a loop, but without result, so it would be great if anyone could help fix it:
merList = []
for name in sortList:
    temp_merList = []
    for b in range(len(sortList) - 1):
        if name[b][:-3] == name[b+1][:-3] and name[b] not in merList:
            temp_merList.append(name)
        else:
            merList.append(temp_merList)
print(merList)
You can use itertools.groupby (note that groupby only groups consecutive items, so the input must already be sorted by the prefix, as it is here):
from itertools import groupby

sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']

out = []
for _, g in groupby(sortList, lambda k: k.split('.')[0]):
    out.append(list(g))
print(out)
Prints:
[['aa.001', 'aa.002', 'aa.003'], ['vvv.001', 'vvv.002', 'vvv.003']]
EDIT: Another method (using a temporary dictionary):
sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']

tmp = {}
for name in sortList:
    tmp.setdefault(name.split('.')[0], []).append(name)
merList = [v for _, v in tmp.items()]
print(merList)
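The same grouping can also be written with collections.defaultdict instead of setdefault, which some find more direct; a minimal sketch:
from collections import defaultdict

sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']

tmp = defaultdict(list)
for name in sortList:
    tmp[name.split('.')[0]].append(name)
merList = list(tmp.values())
print(merList)  # [['aa.001', 'aa.002', 'aa.003'], ['vvv.001', 'vvv.002', 'vvv.003']]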

Loop over each item in a row and compare with each item from another row, then save the result in a new column - python

I want to loop in Python over each item in a row of one column and compare it against the items in the corresponding row of another column.
If an item is not present in the row of the second column, it should be appended to a new list that will become another column (this should also eliminate duplicates when appending, via if i not in c).
The goal is to compare items from each row of a column against items from the corresponding row in another column and save the unique values from the first column in a new column of the same df.
(screenshot of the df columns omitted)
This is just an example; I have many more items in each row.
I tried the code below, but nothing happened, and the conversion of the list into the column is not correct from what I have tested:
a = df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
    for i in a:
        if i in a:
            if i not in b:
                if i not in c:
                    c.append(i)
print(c)
df['new'] = pd.Series(c)
Any help is more than needed, thanks in advance
Seeing as you have these two variables, one way would be:
a = df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
Try something like this:
new = {}
for index, items in enumerate(a):
    for thing in items:
        if thing not in b[index]:
            if index in new:
                new[index].append(thing)
            else:
                new[index] = [thing]
Then map the dictionary to the df.
df['new'] = df.index.map(new)
There are better ways to do it, but this should work.
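For instance, if both columns hold real lists, the whole thing fits in a single row-wise apply; a minimal sketch of that idea:
# keep the words of final_key_concat that do not appear in the
# corresponding attributes_tokenize row
df['new'] = df.apply(
    lambda row: [w for w in row['final_key_concat']
                 if w not in row['attributes_tokenize']],
    axis=1)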
This should be what you want:
import pandas as pd

data = {'final_key_concat': [['Camiseta', 'Tecnica', 'hombre', 'barate'],
                             ['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']],
        'attributes_tokenize': [['The', 'North', 'Face', 'manga'],
                                ['deportivas', 'calcetin', 'shoes', 'North']]}  # recreated from your image
df = pd.DataFrame(data)

a = df['final_key_concat'].tolist()  # this generates a list of lists
b = df['attributes_tokenize'].tolist()  # this also generates a list of lists
# both lists need to be flattened to access their elements the way you want
c = [itm for sblst in a for itm in sblst]  # flatten list a with a comprehension
d = [itm for sblst in b for itm in sblst]  # flatten list b with a comprehension
final_list = [itm for itm in c if itm not in d]  # keep elements of c absent from d
print(final_list)
Result
['Camiseta', 'Tecnica', 'hombre', 'barate', 'hombres']
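Another answer takes a string-parsing route (assuming the columns may arrive as stringified lists, hence the helper below):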
def parse_str_into_list(s):
    # turn a stringified list like "['a', 'b']" back into space-joined words
    if s.startswith('[') and s.endswith(']'):
        return ' '.join(s.strip('[]').strip("'").split("', '"))
    return s

def filter_restrict_words(row):
    targets = parse_str_into_list(row[0]).split(' ', -1)
    restricts = parse_str_into_list(row[1]).split(' ', -1)
    # keep words that are not restricted, are of reasonable length,
    # and have not been kept already (a list preserves their order)
    words_to_keep = []
    for word in targets:
        if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
            words_to_keep.append(word)
    return ' '.join(words_to_keep)

df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)

Optimizing a python code involving looping through lists, dataframes and dictionaries

I am trying to run some code on a large dataset, and optimizing the code in any way would help greatly.
The following is a dummy version of what I am doing:
output = []
for i in my_list:
    for index, row in df.iterrows():
        # required in output
        c1 = []
        c2 = []
        output_row1 = []
        output_row2 = []
        # data from dataframe df
        var1 = row.Var1
        var2 = row.Var2
        # data from dictionaries
        for j in my_dict1[i].col1:
            output_row1.append(data_dict[j + ":" + i + ":" + var1 + ":" + var2])
            c1.append(-1)
        for j in my_dict2[i].col2:
            output_row2.append(data_dict[i + ":" + j + ":" + var1 + ":" + var2])
            c2.append(1)
        # final output
        output.append([output_row1 + output_row2, c1 + c2])
For each element in my_list and for each row in dataframe df, I want to append an element to output, the data for which comes from three separate dictionaries: my_dict1, my_dict2 and data_dict.
Could anyone suggest better ways to store the data, or any Python libraries that might make this faster? Thanks in advance.
Edited Code:
import pandas as pd

my_list = ["Node1", "Node2", "Node3", "Node4"]
df = pd.DataFrame({"Shipments": [1, 2],
                   "Origin": ["Node1", "Node2"],
                   "Destination": ["Node3", "Node4"]})
my_dict1 = {"Node1": [],
            "Node2": ["Node1", "Node3"],
            "Node3": [],
            "Node4": ["Node2", "Node3"]}
my_dict2 = {"Node1": ["Node2"],
            "Node2": ["Node4"],
            "Node3": ["Node2", "Node4"],
            "Node4": []}
data_dict = {"Node1:Node2:Node1:Node3": 5,
             "Node1:Node2:Node2:Node4": 5,
             "Node3:Node2:Node1:Node3": 4,
             "Node3:Node2:Node2:Node4": 4,
             "Node2:Node4:Node1:Node3": 3,
             "Node2:Node4:Node2:Node4": 3,
             "Node3:Node4:Node1:Node3": 8,
             "Node3:Node4:Node2:Node4": 8}

output = []
for i in my_list:
    for index, row in df.iterrows():
        # required in output
        c1 = []
        c2 = []
        output_row1 = []
        output_row2 = []
        # data from dataframe df
        var1 = row.Origin
        var2 = row.Destination
        # data from dictionaries
        for j in my_dict1[i]:
            output_row1.append(data_dict[j + ":" + i + ":" + var1 + ":" + var2])
            c1.append(-1)
        for j in my_dict2[i]:
            output_row2.append(data_dict[i + ":" + j + ":" + var1 + ":" + var2])
            c2.append(1)
        # final output
        output.append([output_row1 + output_row2, c1 + c2])
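One straightforward speedup to try (a sketch, not benchmarked): itertuples() avoids building a pandas Series for every row, which is typically much faster than iterrows(), and the repeated ':'-joined suffix can be hoisted out of the inner loops. Assuming the edited code's structure:
output = []
for i in my_list:
    pred = my_dict1[i]  # nodes feeding into i
    succ = my_dict2[i]  # nodes fed from i
    for row in df.itertuples(index=False):
        # build the shared key suffix once per row instead of per lookup
        suffix = ":" + row.Origin + ":" + row.Destination
        output_row1 = [data_dict[j + ":" + i + suffix] for j in pred]
        output_row2 = [data_dict[i + ":" + j + suffix] for j in succ]
        output.append([output_row1 + output_row2,
                       [-1] * len(pred) + [1] * len(succ)])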

Python : Find Duplicate Items

I have data in the columns of a CSV, and from two of those columns I have built an array as a list of lists of strings, like this:
[['A', 'Bcdef'], ['Z', 'Wexy']]
I want to identify duplicate entries, i.e. two occurrences of ['A', 'Bcdef'].
# (Python 2 code: note the StringIO import and print statement)
import csv
import StringIO
import os, sys
import hashlib
from collections import Counter
from collections import defaultdict
from itertools import takewhile, count

columns = defaultdict(list)
with open('person.csv', 'rU') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    listoflists = []
    for row in reader:  # read a row as {column1: value1, column2: value2, ...}
        a_list = []
        for (c, n) in row.items():
            if c == "firstName":
                try:
                    a_list.append(n[0])
                except IndexError:
                    pass
        for (c, n) in row.items():
            if c == "lastName":
                try:
                    a_list.append(n)
                except IndexError:
                    pass
        listoflists.append(a_list)
print len(listoflists)
I have tried a couple of the solutions proposed here.
Using set(listoflists) always returns: unhashable type: 'list'.
A function-based approach returns: 'list' object has no attribute 'values'.
For example:
results = list(filter(lambda x: len(x) > 1, dict1.values()))
if len(results) > 0:
    print('Duplicates Found:')
    print('The following files are identical. the content is identical')
    print('___________________')
    for result in results:
        for subresult in result:
            print('\t\t%s' % subresult)
    print('___________________')
else:
    print('No duplicate files found.')
Any suggestions are welcomed.
Rather than lists, you can use tuples, which are hashable.
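A minimal sketch of that idea, reusing the Counter import already present in the question:
from collections import Counter

listoflists = [['A', 'Bcdef'], ['Z', 'Wexy'], ['A', 'Bcdef']]
# tuples are hashable, so they can be counted directly
counts = Counter(tuple(item) for item in listoflists)
dups = [list(t) for t, n in counts.items() if n > 1]
print(dups)  # [['A', 'Bcdef']]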
You could build a set of the string representations of your lists; strings are hashable.
l = [['A', "BCE"], ["B", "CEF"], ['A', 'BCE']]
res = []
dups = []
s = sorted(l, key=lambda x: x[0] + x[1])
previous = None
while s:
    i = s.pop()
    if i == previous:
        dups.append(i)
    else:
        res.append(i)
    previous = i
print(res)
print(dups)
Assuming you just want to get rid of duplicates and don't care about the order, you could turn your lists into strings, throw them into a set, and then turn them back into a list of lists.
foostrings = [x[0] + x[1] for x in listoflists]
listoflists = [[x[0], x[1:]] for x in set(foostrings)]
Another option, if you're going to be dealing with a bunch of tabular data, is to use pandas.
import pandas as pd
df = pd.DataFrame(listoflists)
deduped_df = df.drop_duplicates()
