unhashable type: 'dict' - python

I am new here and want to ask about removing duplicate data entries. I'm working on a face recognition project and am stuck on removing the duplicate entries that I send to Google Sheets. This is the code that I use:
if confidence < 100:
    id = names[id]
    confidence = "{0}%".format(round(100 - confidence))
    row = (id, datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    index = 2
    sheet.insert_row(row, index)
    data = sheet.get_all_records()
    result = list(set(data))
    print(result)
The error message is "unhashable type: 'dict'".
I want each result to be posted to the Google Sheet only once.

You can't add dictionaries to a set, because dictionaries are mutable and therefore unhashable.
What you can do is convert each dictionary's items to a tuple and add those tuples to the set (this works as long as the values themselves are hashable):
s = set(tuple(d.items()) for d in data)
If you need to convert these back to dictionaries afterwards, you can do:
new_dicts = [dict(t) for t in s]

According to the gspread documentation, get_all_records() returns a list of dicts, where each dict uses the header row as keys and the cell values as values. So you need to iterate through this list and compare the ids to find and remove repeated items. Sample code:
visited = []
filtered = []
for row in data:
    if row['id'] not in visited:
        visited.append(row['id'])
        filtered.append(row)
Now filtered should contain only unique items. Instead of 'id', use the name of the column that contains the repeating value.
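The same idea can be sketched with a set for the lookup, which keeps the membership test fast as the sheet grows. This is a minimal sketch, assuming rows shaped like the output of get_all_records(); the helper name dedupe_rows, the column name "id" and the sample rows are made up for illustration.

```python
def dedupe_rows(rows, key):
    """Keep the first row for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Hypothetical sample data in the shape of sheet.get_all_records()
data = [
    {"id": "alice", "time": "2020-01-01 10:00:00"},
    {"id": "bob", "time": "2020-01-01 10:01:00"},
    {"id": "alice", "time": "2020-01-01 10:02:00"},
]
result = dedupe_rows(data, "id")
print(result)
```

Only the first row per id survives, so the later "alice" entry is dropped.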

Related

How do I find the count of a column of lists and display by date?

My dataset looks like this
Using Python and pandas, I want to display the count of each unique item in the coverage column, where the items are stored as a list, as shown in the table.
I want to display that count by device and by date.
Example output would be:
The unique coverage count is the count of each unique list value in the "coverage" row.
You can use the apply method to iterate over rows and apply a custom function. This function may return the length of the list. For example:
df["coverage_count"] = df["coverage"].apply(lambda x: len(x))
Here's how I solved it using for loops:
coverage_list = []
for item in list(df["coverage"]):
    if item == '[]':
        item = ''
    else:
        item = list(item.split(","))
    coverage_list.append(len(item))
    # print(len(item))
df["coverage_count"] = coverage_list
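Counting each coverage value per device and date could be sketched with explode followed by groupby. This is only a sketch under assumptions: the column names "device" and "date" are guesses based on the question, the sample rows are invented, and the coverage column is assumed to hold real Python lists rather than strings.

```python
import pandas as pd

# Invented sample data; "device" and "date" are assumed column names.
df = pd.DataFrame({
    "device": ["a", "a", "b"],
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "coverage": [["x", "y"], ["x"], []],
})

# One row per list element; empty lists become NaN and are dropped.
exploded = df.explode("coverage").dropna(subset=["coverage"])

# Count how often each coverage value occurs per device and date.
counts = (
    exploded.groupby(["device", "date", "coverage"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

If the column holds strings such as '[]' instead of lists, they would need to be parsed into lists first, as the loop-based answer does with split.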

Extract element of exploded JSON via name of list element

I have a JSON that I have read in using
data_fields = spark.read.json(json_files)
where json_files is the path to the json files. To extract the data from the JSON I then use:
data_fields = data_fields.select('datarecords.fields')
I then give each record its own row via:
input_data = data_fields.select(explode("fields").alias('fields'))
Resulting in data in the fields column that looks like:
fields
[[ID,, 101],[other_var,, 'some_value']]
[[other_var,,"some_value"],[ID,, 102],[other_var_2,, 'some_value_2']]
Each sub-list element can be referred to using "name", "status" and "value" as the components. For example:
input_data = input_data.withColumn('new_col', col('fields.name'))
Will extract the name of the first element. So in the above example, "ID" and "other_var". I am trying to extract the id for each record to its own column to end with:
id    fields
101   [[ID,, 101],[other_var,, 'some_value']]
102   [[other_var,,"some_value"],[ID,, 102],[other_var_2,, 'some_value_2']]
For those cases where the id is the first element in the fields column, row 1 above, I can do this via:
input_data = input_data.withColumn('id', col('fields')[0].value)
However, as shown, the "id" is not always the first element in the list in the fields column, and there are many hundreds of potential sub-list elements. I have therefore been trying to extract the "id" via its name rather than its position in the list, but have drawn a blank. The nearest I have come is to use the below to identify which element it exists in:
input_data = input_data.withColumn('id', array_position(col('fields.name'),"ID"))
Which returns the position. But I am not sure where to go from there to get the value, unless I do something like:
result = input_data.withColumn('id',
    when(col('fields.name')[0] == 'ID', col('fields')[0].value)
    .when(col('fields.name')[1] == 'ID', col('fields')[1].value)
    .when(col('fields.name')[2] == 'ID', col('fields')[2].value))
And of course the above is impractical with potentially hundreds of sub-list elements in the fields column.
Any help to achieve the above would be appreciated to extract the id regardless of position in the list efficiently.
Hopefully the above minimum example is clear.

Turning stocktake data into a dictionary

I am automating stocktake comparisons. We receive a stock update daily, and it needs to be compared to our own stock data to see if there are differences. I think the easiest way of doing this would be to get both stock reports into a dictionary format such as {item a: quantity}. I have done this for our own stock, but the stock from the warehouse comes in an Excel file that separates each item out by batch number.
I have read this using xlrd using the following:
data = []
file = file_name
wb = xlrd.open_workbook(file)
sh = wb.sheet_by_index(0)
row_numbers = range(6, sh.nrows)
for row in row_numbers:
    if str(sh.row_values(row)[0]).startswith('Sku'):
        data.append(sh.row_values(row)[1:3])
print(data)
and have it in the format of a list of lists. For reference, this would look like [item a, 1200], [item a, 4000], etc. The number of entries per item is not consistent: it goes up to 6 but can also be 1. What would be the best method for creating a final dictionary with only one entry per item and a grand total across all of the original lines?
What you'll want to do is iterate through your list of lists, and for each one find whether the item is in the dictionary. If it's not, add it to the dictionary with the quantity as the mapped value. If it already is, look up the mapped value and add the quantity to it.
For example:
final_dict = {}
for entry in list_of_lists:
    final_dict[entry[0]] = final_dict.get(entry[0], 0) + entry[1]
Note that here final_dict.get(x, y) means look up using the key x, and return y as a default if x isn't in the dictionary.
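Once both reports are dictionaries, the comparison itself could look something like the sketch below. The sample rows, the own_totals values and all variable names are invented for illustration; collections.defaultdict is used here as an alternative to the .get() pattern above.

```python
from collections import defaultdict

# Invented rows in the [item, quantity] shape described in the question.
warehouse_rows = [["item a", 1200], ["item a", 4000], ["item b", 50]]

# Sum the quantities of all batches per item.
warehouse_totals = defaultdict(int)
for item, quantity in warehouse_rows:
    warehouse_totals[item] += quantity

# Hypothetical totals from our own stock system.
own_totals = {"item a": 5200, "item b": 60, "item c": 7}

# Report every item whose totals differ, treating a missing item as 0.
differences = {}
for item in sorted(set(warehouse_totals) | set(own_totals)):
    ours = own_totals.get(item, 0)
    theirs = warehouse_totals.get(item, 0)
    if ours != theirs:
        differences[item] = (ours, theirs)

print(differences)
```

Each entry in differences maps an item to a (our quantity, warehouse quantity) pair, so items missing from either report show up with a 0 on that side.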

Convert JSON column in dataframe to simple array of values

I am trying to convert the JSON in the bbox (bounding box) column into a simple array of values for a DL project in python in a Jupyter notebook.
The possible labels are the following categories: [glass, cardboard, trash, metal, paper].
[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]
TO
([191 70 183 311], 0)
I'm looking for help to convert the bbox column from the JSON object for a single CSV that contains all the image names and the related bboxes.
UPDATE
The current column is a Series, so I keep getting a "TypeError: the JSON object must be str, bytes or bytearray, not 'Series'" any time I try to apply JSON operations to the column. So far I have tried to convert the column into a JSON object and then pull out the values from the keys.
You'll want to use a JSON decoder: https://docs.python.org/3/library/json.html
import json
li = json.loads('''[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]''')
d = dictionary = li[0]
result = ([d[key] for key in "left top width height".split()], 0)
print(result)
Edit:
If you want to map the operation of extracting the values from the dictionary to all elements of the list, you can do:
extracted = []
for element in li:
    result = ([element[key] for key in "left top width height".split()], 0)
    extracted.append(result)
# print(extracted)
print(extracted[:10])
# `[:10]` is there to limit the number of items displayed to 10
Similarly, as per my comment, if you do not want commas between the extracted numbers in the list, you can use:
without_comma = []
for element, zero in extracted:
    result_string = "([{}], 0)".format(" ".join([str(value) for value in element]))
    without_comma.append(result_string)
It looks like each row of your bbox column contains a dictionary inside of a list. I've tried to replicate your problem as follows. Edit: Clarifying that the below solution assumes that what you're referring to as a "JSON object" is represented as a list containing a single dictionary, which is what it appears to be per your example and screenshot.
import pandas as pd
# Create an empty sample DataFrame with one row
df = pd.DataFrame([None], columns=['bbox'])
# Assign your sample item to the first row
df.at[0, 'bbox'] = [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]
Now, to simply unpack the row you can do:
df['bbox_unpacked'] = df['bbox'].map(lambda x: tuple(x[0].values()))
Which will get you a new column with a tuple of 5 items.
If you want to go further and apply your labels, you'll likely want to create a dictionary to contain your labeling logic. Per the example you've given in the comments, I've done:
labels = {
    'cardboard': 1,
    'trash': 2,
    'glass': 3
}
This should get you your desired layout if you want a one-line solution without writing your own function.
df['bbox_unpacked'] = df['bbox'].map(lambda x: (list(x[0].values())[:4], labels.get(list(x[0].values())[-1])))
A more readable solution would be to define your own function and pass it to the .apply() method. Edit: Since it looks like your JSON object is being stored as a str inside your DataFrame rows, I added json.loads(row) to process the string first before retrieving the values. You'll need to import json to run.
import json
def unpack_bbox(row, labels):
    # Load the string into a JSON object (in this case a list of
    # length one containing the dictionary), index the list to its
    # first item [0], and use the .values() dictionary method to
    # access the values only
    values = list(json.loads(row)[0].values())
    bbox_values = values[:4]
    bbox_label = values[-1]
    label_value = labels.get(bbox_label)
    return bbox_values, label_value
df['bbox_unpacked'] = df['bbox'].apply(unpack_bbox, args=(labels,))
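Both answers above rely on the order of the keys inside each dictionary. A variant that reads the keys by name and derives the label from the category list in the question might look like this. It is a sketch only: parse_bbox, CATEGORIES and BOX_KEYS are hypothetical names, and using the position in the category list as the label is an assumption based on the question's example, where "glass" maps to 0.

```python
import json

# Category order taken from the question; the index is used as the label.
CATEGORIES = ["glass", "cardboard", "trash", "metal", "paper"]
BOX_KEYS = ("left", "top", "width", "height")


def parse_bbox(raw):
    """Turn one bbox JSON string into ([left, top, width, height], label_index)."""
    box = json.loads(raw)[0]  # the list is assumed to hold a single dictionary
    coords = [box[key] for key in BOX_KEYS]
    return coords, CATEGORIES.index(box["label"])


sample = '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]'
print(parse_bbox(sample))
```

If the bbox column holds such strings, the function could be applied row by row with df['bbox'].apply(parse_bbox).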

Table/Data manipulation with Python Dictionary

I need help finishing up this python script. I'm an intern at a company, and this is my first week. I was asked to develop a python script that will take a .csv and put(append) any related columns into one column so that they have only the 15 or so necessary columns with the data in them. For example, if there are zip4, zip5, or postal code columns, they want those to all be underneath the zip code column.
I just started learning python this week as I was doing this project so please excuse my noobish question and vocabulary. I'm not looking for you guys to do this for me. I'm just looking for some guidance. In fact, I want to learn more about python, so anyone who could lead me in the right direction, please help.
I'm using dictionary keys and values. The keys are every column in the first row. The values of each key are the remaining rows (second through 3000ish). Right now, I'm only getting one key:value pair. I'm only getting the final row as my array of values, and I'm only getting one key. Also, I'm getting a KeyError message, so my keys aren't being identified correctly. My code so far is underneath. I'm gonna keep working on this, and any help is immensely appreciated! Hopefully, I can buy the person who helps me a beer and pick their brain a little :)
Thanks for your time
# To be able to read csv formatted files, we will first have to import the csv module
import csv
# cols = line.split(',')# each column is split by a comma
#read the file
CSVreader = csv.reader(open('N:/Individual Files/Jerry/2013 customer list qc, cr, db, gb 9-19-2013_JerrysMessingWithVersion.csv', 'rb'), delimiter=',', quotechar='"')
# define open dictionary
SLSDictionary={}# no empty dictionary. Need column names to compare to.
i=0
#top row are your keys. All other rows are your values
#adjust loop
for row in CSVreader:
    # multiple loops needed here
    if i == 0:
        key = row[i]
    else:
        [values] = [row[1:]]
        SLSDictionary = dict({key: [values]}) # Dictionary is keys and array of values
    i=i+1
#print Dictionary to check errors and make sure dictionary is filled with keys and values
print SLSDictionary
# SLSDictionary has key of zip/phone plus any characters
#SLSDictionary.has_key('zip.+')
SLSDictionary.has_key('phone.+')
#value of key are set equal to x. Values of that column set equal to x
#[x]=value
#IF SLSDictionary has the key of zip plus any characters, move values to zip key
#if true:
# SLSDictionary['zip'].append([x])
#SLSDictionary['phone_home'].append([value]) # I need to append the values of the specific column, not all columns
#move key's values to correct, corresponding key
SLSDictionary['phone_home'].append(SLSDictionary[has_key('phone.+')])#Append the values of the key/column 'phone plus characters' to phone_home key/column in SLSDictionary
#if false:
# print ''
# go to next key
SLSDictionary.has_value('')
if true:
    print 'Error: No data in column'
# if there's no data in rows 1-?. Delete column
#if value <= 0:
# del column
print SLSDictionary
Found a couple of errors just quickly looking at it. One thing you need to watch out for is that you're assigning a new value to the existing dictionary every time:
SLSDictionary = dict({key: [values]})
You're re-assigning a new value to your SLSDictionary every time it enters that loop. Thus at the end you only have the bottom-most entry. To add a key to the dictionary you do the following:
SLSDictionary[key] = values
Also you shouldn't need the brackets in this line:
[values] = [row[1:]]
Which should instead just be:
values = row[1:]
Most importantly, you will only ever have one key, because the key is only set on the first pass through the loop (when i is 0) and you increment i on every pass after that. Everything is therefore constantly assigned to that single key. Without a sample of how the CSV looks I can't instruct you on how to restructure the loop so that it will catch all the keys.
Assuming your CSV is like this as you've described:
Col1, Col2, Col3, Col4
Val1, Val2, Val3, Val4
Val11, Val22, Val33, Val44
Val111, Val222, Val333, Val444
Then you probably want something like this:
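dummy = [["col1", "col2", "col3", "col4"],
         ["val1", "val2", "val3", "val4"],
         ["val11", "val22", "val33", "val44"],
         ["val111", "val222", "val333", "val444"]]
column_index = []
SLSDictionary = {}
# The first row holds the column names; each one becomes a key with an empty list
for each in dummy[0]:
    column_index.append(each)
    SLSDictionary[each] = []
# Every remaining row is appended value by value to its column's list
for each in dummy[1:]:
    for i, every in enumerate(each):
        try:
            if column_index[i] in SLSDictionary.keys():
                SLSDictionary[column_index[i]].append(every)
        except IndexError:
            # Row has more values than there are column names; skip the extras
            pass
print(SLSDictionary)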
Which Yields...
{'col4': ['val4', 'val44', 'val444'], 'col2': ['val2', 'val22', 'val222'], 'col3': ['val3', 'val33', 'val333'], 'col1': ['val1', 'val11', 'val111']}
If you want them to stay in order, then change the dictionary type to OrderedDict() from the collections module.
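Building on the column dictionary above, folding related columns such as zip5 and postal code into one zip column could be sketched as follows. This is a Python 3 sketch (the thread's code is Python 2), and everything specific in it is an assumption: the sample CSV, the merge_rules pattern and the target_column helper are invented for illustration.

```python
import csv
import io
import re

# Invented sample standing in for the real customer CSV.
sample_csv = io.StringIO(
    "name,zip5,postal code,phone_home\n"
    "Ann,12345,,555-0100\n"
    "Bob,,K1A 0B1,555-0101\n"
)

# Hypothetical rule set: a regex for source column names -> target column.
merge_rules = {r"^(zip.*|postal.*)$": "zip"}


def target_column(name):
    """Return the target column for a header name, or the name itself."""
    for pattern, target in merge_rules.items():
        if re.match(pattern, name.strip().lower()):
            return target
    return name


merged = {}
for row in csv.DictReader(sample_csv):
    for column, value in row.items():
        # Non-empty values from related columns all land in one list.
        merged.setdefault(target_column(column), [])
        if value:
            merged[target_column(column)].append(value)

print(merged)
```

Note that this pools the values per target column and skips empty cells, so the lists are no longer guaranteed to line up row by row; if row alignment matters, the merge would have to be done per row instead.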
