Why doesn't json.loads() work for multiple columns? - python

Is it possible to apply json.loads to multiple columns? If I do something like:
df['col1'] = df['col1'].apply(json.loads)
I can apply it to each entry in col1 and everything is fine. But if I do something like,
df[['col1', 'col2', 'col3']] = df[['col1', 'col2', 'col3']].apply(json.loads)
I get the error:
TypeError: the JSON object must be str, bytes or bytearray, not Series.
Why doesn't this way work? Is it possible to apply it all at once or should I just do each column individually?

Per https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html, DataFrame.apply applies a function along an axis — so json.loads receives an entire column (a Series) rather than an individual string, which is exactly what the TypeError is complaining about.
One way around this is to write a function that transforms each row Series and apply it with axis=1. The example below doesn't call json.loads, because the sample input isn't JSON, but you can change the apply_transform function to suit your needs.
(Sorry for the to_dict() output, but I have trouble pasting DataFrame output into text editors.)
import pandas as pd
import numpy as np

columns = [
    "col1",
    "col2",
    "col3",
    "col4",
    "col5",
]
df = pd.DataFrame(np.random.randint(0, 5, size=(5, 5)), columns=columns)
df.to_dict()
# {'col1': {0: 4, 1: 3, 2: 3, 3: 1, 4: 2},
# 'col2': {0: 1, 1: 1, 2: 3, 3: 3, 4: 1},
# 'col3': {0: 1, 1: 2, 2: 3, 3: 4, 4: 3},
# 'col4': {0: 0, 1: 1, 2: 1, 3: 3, 4: 2},
# 'col5': {0: 3, 1: 2, 2: 0, 3: 2, 4: 0}}
You will notice that after the transformation, the values in col1 through col3 have been doubled while col4 and col5 are untouched. Replace the multiplication with your own transformation:
def apply_transform(row):
    new_row = row.copy()
    for col in ['col1', 'col2', 'col3']:
        new_row[col] = new_row[col] * 2  # apply your own transform here
    return new_row

df_new = df.apply(apply_transform, axis=1)
df_new.to_dict()
# {'col1': {0: 8, 1: 6, 2: 6, 3: 2, 4: 4},
# 'col2': {0: 2, 1: 2, 2: 6, 3: 6, 4: 2},
# 'col3': {0: 2, 1: 4, 2: 6, 3: 8, 4: 6},
# 'col4': {0: 0, 1: 1, 2: 1, 3: 3, 4: 2},
# 'col5': {0: 3, 1: 2, 2: 0, 3: 2, 4: 0}}
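If the goal is just to run json.loads on every cell of a few columns, an element-wise apply also works, since it hands the function one string at a time instead of a whole Series. A minimal sketch with made-up JSON data (note that pandas 2.1+ renames DataFrame.applymap to DataFrame.map):

```python
import json
import pandas as pd

# applymap calls the function once per cell, so json.loads receives an
# individual string rather than an entire column Series.
df = pd.DataFrame({
    'col1': ['{"a": 1}', '{"a": 2}'],
    'col2': ['[1, 2]', '[3, 4]'],
})
df[['col1', 'col2']] = df[['col1', 'col2']].applymap(json.loads)
print(df['col1'][0])  # {'a': 1}
```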

Related

Python Looping - storing dataframes from .txt file loop, with different lengths

I would like to loop through a bunch of .txt files, processing each one (removing columns, renaming, handling NaNs, etc.) to get a dataframe output df1, which has date, lat, lon, and variables assigned to it. Over the loop, I would like to build df_all, containing the information from all the files (most likely in date order).
However, each of my dataframes has a different length, and some of them may share the same date + lat/lon values.
I have written code to read in and process files individually, but I'm stuck on how to turn this into a larger loop (via concat/append...?).
I am trying to end up with one large dataframe (df_all) that contains all the 'scattered' information from the different files (the df1 outputs). In addition, where there is a conflicting date and lat/lon, I would like to take the mean. Is this possible to do in python/pandas?
Any help at all on any of these issues would be greatly appreciated, as would ideas on how to go about this.
Here are fake tables that are read in by a for-loop and concatenated into one big table. Once all rows are in a single big table, you can group the rows that share the same value in the A column and take the mean of the B and C columns, as an example. You should be able to run this chunk of code yourself, and I hope it gives you keywords to search for other questions similar to yours!
import pandas as pd

# Fake table read-ins; you'd be using pd.read_csv or similar
def fake_read_table(name):
    small_df1 = pd.DataFrame({'A': {0: 5, 1: 1, 2: 3, 3: 1}, 'B': {0: 4, 1: 4, 2: 4, 3: 4}, 'C': {0: 2, 1: 1, 2: 4, 3: 1}})
    small_df2 = pd.DataFrame({'A': {0: 4, 1: 5, 2: 1, 3: 4, 4: 3, 5: 2, 6: 5, 7: 1}, 'B': {0: 3, 1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 6: 4, 7: 2}, 'C': {0: 4, 1: 1, 2: 5, 3: 2, 4: 4, 5: 4, 6: 5, 7: 2}})
    small_df3 = pd.DataFrame({'A': {0: 2, 1: 2, 2: 4, 3: 3, 4: 1, 5: 4, 6: 5}, 'B': {0: 1, 1: 2, 2: 3, 3: 1, 4: 3, 5: 5, 6: 4}, 'C': {0: 5, 1: 2, 2: 3, 3: 3, 4: 5, 5: 4, 6: 5}})
    if name == '1.txt':
        return small_df1
    if name == '2.txt':
        return small_df2
    if name == '3.txt':
        return small_df3

# Start here
txt_paths = ['1.txt', '2.txt', '3.txt']
big_df = pd.DataFrame()
for txt_path in txt_paths:
    small_df = fake_read_table(txt_path)
    # .. do some processing you need to do somewhere in here ..
    big_df = pd.concat((big_df, small_df))

# Taking the average B and C values for rows that have the same A value
agg_df = big_df.groupby('A').agg(
    mean_B=('B', 'mean'),
    mean_C=('C', 'mean'),
).reset_index()
print(agg_df)
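One refinement worth knowing (though not required for a handful of files): calling pd.concat inside the loop re-copies the growing big_df on every iteration. The usual idiom is to collect the processed frames in a list and concatenate once at the end. A sketch, with a hypothetical read_and_process standing in for your own read/clean step:

```python
import pandas as pd

def read_and_process(path):
    # placeholder for pd.read_csv(path) plus your per-file cleaning
    return pd.DataFrame({'A': [1, 2], 'B': [3.0, 4.0]})

txt_paths = ['1.txt', '2.txt', '3.txt']
frames = [read_and_process(p) for p in txt_paths]
big_df = pd.concat(frames, ignore_index=True)  # one concat, one copy
print(len(big_df))  # 6
```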

When I use .copy() in python, why does it still make references of one dictionary? [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 1 year ago.
In python, I want to make a 2D array of dictionaries. I do know about references, so I explicitly used .copy(). When I print the array, however, the dictionaries that I did not want to change also change.
My code is below.
dicts = []
for j in range(3):
    dicts.append([{0:0,1:0,2:0,3:0}.copy()].copy() * 3)
dicts[0][0][0] = 5
dicts[1][1][0] = 10
print(dicts)
OUTPUT:
[[{0: 5, 1: 0, 2: 0, 3: 0}, {0: 5, 1: 0, 2: 0, 3: 0}, {0: 5, 1: 0, 2: 0, 3: 0}], [{0: 10, 1: 0, 2: 0, 3: 0}, {0: 10, 1: 0, 2: 0, 3: 0}, {0: 10, 1: 0, 2: 0, 3: 0}], [{0: 0, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}]]
Does anyone know why this happens, and any way to fix it? Thank you.
The way to solve this kind of thing cleanly is with list comprehensions:
dicts = [[{0:0,1:0,2:0,3:0} for i in range(3)] for j in range(3)]
dicts[0][0][0] = 5
dicts[1][1][0] = 10
print(dicts)
Output:
[[{0: 5, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}], [{0: 0, 1: 0, 2: 0, 3: 0}, {0: 10, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}], [{0: 0, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}, {0: 0, 1: 0, 2: 0, 3: 0}]]
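The underlying issue is that sequence multiplication with `* 3` copies references, not objects, and .copy() on the outer list is only a shallow copy, so all three slots still point at the same dict. A quick demonstration:

```python
row = [{0: 0}] * 3                  # three references to the SAME dict
row[0][0] = 5
print(row)                          # [{0: 5}, {0: 5}, {0: 5}]
assert row[0] is row[1] is row[2]   # all one object

fresh = [{0: 0} for _ in range(3)]  # three independent dicts
fresh[0][0] = 5
print(fresh)                        # [{0: 5}, {0: 0}, {0: 0}]
assert fresh[1] is not fresh[0]
```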

Python doesn't return correct values with a dictionary using a function [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Improve this question
I have a function that returns an array of values and another array of dictionaries. All the dictionaries should be different, but it returns the same one repeated.
When I print from inside the function I get the correct values, for example:
{0: 0}
{0: 1, 1: 0}
{0: 2, 1: 1, 2: 0}
{0: 3, 1: 5, 2: 0}
{0: 4, 1: 1, 2: 0}
{0: 5, 1: 0, 2: 0}
{0: 6, 1: 5, 2: 0}
{0: 7, 1: 2, 2: 1, 3: 0}
But when I return the arrays I get this (the wrong answer):
([0, 2, 4, 4, 6, 1, 6, 5], # This array is correct
[{0: 7, 1: 2, 2: 1, 3: 0}, # From here is incorrect
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0},
{0: 7, 1: 2, 2: 1, 3: 0}])
This is the fragment of code with this problem
.
.
.
for i in range(n):
    j = i
    count = 0
    while parent[j] != -1:
        s[i][count] = j
        count = count + 1
        j = parent[j]
    s[i][count] = start
    ###########
    print(s[i])
    ###########
return dist, s
I think this is what you're looking for:
for i in range(n):
    j = i
    s = []  # a fresh list each iteration, so the stored paths aren't shared
    while parent[j] != -1:
        s.append(j)
        j = parent[j]
    s.append(start)
    path[i] = s[::-1]
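For reference, here is the bug pattern in miniature: if every iteration mutates and stores the same object, every stored entry ends up showing the final state, which is why all eight dictionaries above are identical. Creating a fresh object inside the loop fixes it:

```python
# Buggy: one shared dict, mutated on every pass
results, d = [], {}
for i in range(3):
    d['step'] = i
    results.append(d)      # appends a reference, not a snapshot
print(results)             # [{'step': 2}, {'step': 2}, {'step': 2}]

# Fixed: a new dict per iteration
results = []
for i in range(3):
    results.append({'step': i})
print(results)             # [{'step': 0}, {'step': 1}, {'step': 2}]
```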

Sort nested dictionary by variable number of keys

I'm trying to sort a dict by multiple keys. This is the dict I have:
standings = {1: {1: 1, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Jack', 'points': 15},
2: {1: 1, 2: 0, 3: 2, 4: 2, 5: 0, 'player': 'Kate', 'points': 15},
3: {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Sawyer', 'points': 5}}
I want to sort it by, in this order: 'points', 1, 2, 3, 4, 5.
I could do this, I assume:
reversed(sorted(standings, key=lambda x: (standings[x]['points'],
                                          standings[x][1],
                                          standings[x][2],
                                          standings[x][3],
                                          standings[x][4],
                                          standings[x][5])))
However, the 1, 2, 3, 4, 5 keys are dynamic (and could be 1, 2, 3, 4, 5, 6, 7, 8, 9, etc.)
So, somehow I want to make the sorting keys dynamic in sorted(), except for 'points' which will always be used.
The result I want is a reverse-sorted list of the keys of the outer dict (which are player ids from the db). For the given example it would be [2, 1, 3].
Basically, what you are looking for is itemgetter with range:
from operator import itemgetter

standings = ...  # your dictionary of dictionaries
n = 5  # number of keys to sort on (1, 2, 3, ..., n)

# The following will collect the values of 'points', 1, 2, ..., n in a tuple:
get_values = itemgetter('points', *range(1, n + 1))
result = sorted(standings,
                key=lambda x: get_values(standings[x]),
                reverse=True)
# [2, 1, 3]
Explanation:
In order to achieve the sorting by several dict keys, you could use itemgetter to create a function that will return a tuple of values by specified keys. So, as a simple example, if you would have this dictionary:
my_dict = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 'player': 'Ben'}
and you would want to get the values by keys player, 1 and 2, you would write:
from operator import itemgetter
get_values = itemgetter('player', 1, 2)
get_values(my_dict)
# ('Ben', 10, 20)
Now, as the number of the values can vary and those are actually ordered integers (1, 2, 3, ...), you could unpack the given range to the itemgetter:
get_values = itemgetter('player', *range(1, 4)) # 'player', 1, 2, 3
get_values(my_dict)
# ('Ben', 10, 20, 30)
Finally, for your given example dictionary of dictionaries we get these tuples for each child dictionary and sort by them:
standings = {1: {1: 1, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Jack', 'points': 15},
2: {1: 1, 2: 0, 3: 2, 4: 2, 5: 0, 'player': 'Kate', 'points': 15},
3: {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Sawyer', 'points': 5}}
max_key = 5  # you may also calculate it as the max integer key
get_values = itemgetter('points', *range(1, max_key + 1))
result = sorted(standings, key=lambda x: get_values(standings[x]))
# [3, 1, 2]
# or reversed:
sorted(standings,
       key=lambda x: get_values(standings[x]),
       reverse=True)
# [2, 1, 3]
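Since the number of integer keys is dynamic, it can also be derived from the data rather than hard-coded. A sketch, assuming every inner dict has the same integer keys:

```python
from operator import itemgetter

standings = {1: {1: 1, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Jack', 'points': 15},
             2: {1: 1, 2: 0, 3: 2, 4: 2, 5: 0, 'player': 'Kate', 'points': 15},
             3: {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 'player': 'Sawyer', 'points': 5}}

# Largest integer key in one inner dict (ignoring 'player' and 'points')
n = max(k for k in next(iter(standings.values())) if isinstance(k, int))
get_values = itemgetter('points', *range(1, n + 1))
result = sorted(standings, key=lambda x: get_values(standings[x]), reverse=True)
print(result)  # [2, 1, 3]
```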

Elements of dict of sets in python

I have a dictionary like this:
dict1 = {0: set([1, 4, 5]), 1: set([2, 6]), 2: set([3]), 3: set([0]), 4: set([1]), 5: set([2]), 6: set([])}
and from this dictionary I want to build another one that counts the occurrences of each key of dict1 across all the other values; that is, the result should be:
result_dict = {0: 1, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}
My code was this :
dict1 = {0: set([1, 4, 5]), 1: set([2, 6]), 2: set([3]), 3: set([0]), 4: set([1]), 5:set([2]), 6: set([])}
result_dict = {}
for pair in dict1.keys():
    temp_dict = list(dict1.keys())
    del temp_dict[pair]
    count = 0
    for other_pairs in temp_dict:
        if pair in dict1[other_pairs]:
            count = count + 1
    result_dict[pair] = count
The problem with this code is that it is very slow on large data sets.
Another attempt was in a single line, like this :
result_dict = dict((key ,dict1.values().count(key)) for key in dict1.keys())
but it gives me wrong results, since values of dict1 are sets:
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}
thanks a lot in advance
I suppose, for a first stab, I would figure out which values are there:
all_values = set().union(*dict1.values())
Then I'd try to count how many times each value occurred:
result_dict = {}
for v in all_values:
    result_dict[v] = sum(v in dict1[key] for key in dict1)
Another approach would be to use a collections.Counter:
result_dict = Counter(v for set_ in dict1.values() for v in set_)
This is probably "cleaner" than my first solution -- but it does involve a nested comprehension which can be a little difficult to grok. It does work however:
>>> from collections import Counter
>>> dict1
{0: set([1, 4, 5]), 1: set([2, 6]), 2: set([3]), 3: set([0]), 4: set([1]), 5: set([2]), 6: set([])}
>>> result_dict = Counter(v for set_ in dict1.values() for v in set_)
Just create a second dictionary using the keys from dict1, with values initialized to 0. Then iterate through the values in the sets of dict1, incrementing the values of result_dict as you go. The runtime is O(n), where n is the total number of values across the sets of dict1.
dict1 = {0: set([1, 4, 5]), 1: set([2, 6]), 2: set([3]), 3: set([0]), 4: set([1]), 5: set([2]), 6: set([])}
result_dict = dict.fromkeys(dict1.keys(), 0)
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}
for i in dict1.keys():
    for j in dict1[i]:
        result_dict[j] += 1
print(result_dict)
# {0: 1, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}
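The Counter approach from the earlier answer can be combined with this one: a dict comprehension over the keys of dict1 keeps keys that never occur in any set at 0 (a bare Counter would simply omit them):

```python
from collections import Counter

dict1 = {0: {1, 4, 5}, 1: {2, 6}, 2: {3}, 3: {0}, 4: {1}, 5: {2}, 6: set()}
counts = Counter(v for s in dict1.values() for v in s)
# Fill in 0 for any key of dict1 that appears in no set
result_dict = {k: counts.get(k, 0) for k in dict1}
print(result_dict)  # {0: 1, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}
```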
