column value parse json

I've got a DataFrame:
json = {'contexts_ru_andata_master_cookies_1': {0: [{'_ym_uid': '1664978572350562652'}],
1: [{'_ym_uid': '1664978577951178500'}],
2: [{'_ym_uid': '1631015476823239589'}],
3: [{'_ym_uid': '1664945479855475653'}],
4: [{'_ym_uid': '1663327749550707020'}],
6: [{'_ym_uid': '1664978547593809275'}],
7: [{'_ym_uid': '16649783691007078342'}],
8: [{'_ym_uid': '1662551949642530804'}]}}
pd.DataFrame.from_dict(json)
I need to get the numeric value from each cell, e.g. 1664978577951178500. Any help will be appreciated.

It's not a correct JSON string, so you can use "re" to match the values.
import re
json = '''{'contexts_ru_andata_master_cookies_1': {0: [{'_ym_uid': '1664978572350562652'}],
1: [{'_ym_uid': '1664978577951178500'}],
2: [{'_ym_uid': '1631015476823239589'}],
3: [{'_ym_uid': '1664945479855475653'}],
4: [{'_ym_uid': '1663327749550707020'}],
6: [{'_ym_uid': '1664978547593809275'}],
7: [{'_ym_uid': '16649783691007078342'}],
8: [{'_ym_uid': '1662551949642530804'}]}}'''
res = re.finditer(r'\'[0-9]*\'', json)
cookies_l = []
for i in res:
    cookies_l.append(i.group()[1:-1])
print(cookies_l)
This is the output:
['1664978572350562652', '1664978577951178500', '1631015476823239589', '1664945479855475653', '1663327749550707020', '1664978547593809275', '16649783691007078342', '1662551949642530804']
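Because the text is a valid Python literal (single quotes, integer keys) rather than JSON, another option is to parse it with the standard library's ast.literal_eval and index into the result. This is a minimal sketch, assuming the data arrives as a string shaped like the one above (shortened here to two rows); the names text, parsed and cookies are illustrative only:

```python
import ast

# Assumption: the input is a Python-literal string like the one in the answer above.
text = """{'contexts_ru_andata_master_cookies_1': {0: [{'_ym_uid': '1664978572350562652'}],
1: [{'_ym_uid': '1664978577951178500'}]}}"""

# literal_eval safely evaluates literals (dicts, lists, strings, numbers) only.
parsed = ast.literal_eval(text)

# Each cell is a one-element list holding a dict with the '_ym_uid' key.
column = parsed['contexts_ru_andata_master_cookies_1']
cookies = [cell[0]['_ym_uid'] for cell in column.values()]
print(cookies)
# ['1664978572350562652', '1664978577951178500']
```

Unlike the regular expression, this keeps the structure of the data, so it does not pick up other quoted digit strings by accident.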

This was the answer that worked for me:
df['_ym_uid'] = df['contexts_ru_andata_master_cookies_1'].str[0].str[0].apply(lambda x : x['_ym_uid'])
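For the DataFrame as shown in the question, where each cell holds a one-element list containing a dict, a plain apply that indexes the list and then the dict is one way to pull the value out. This is a sketch under that assumption about the cell layout, using two of the rows from the question. The values are kept as strings because at least one of them (16649783691007078342) is too large for a signed 64-bit integer:

```python
import pandas as pd

col = 'contexts_ru_andata_master_cookies_1'

# Assumption: every cell is a list with exactly one dict, as in the question.
df = pd.DataFrame({col: [
    [{'_ym_uid': '1664978577951178500'}],
    [{'_ym_uid': '16649783691007078342'}],
]})

# Take the first list element, then look up the '_ym_uid' key.
df['_ym_uid'] = df[col].apply(lambda cell: cell[0]['_ym_uid'])
print(df['_ym_uid'].tolist())
# ['1664978577951178500', '16649783691007078342']
```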

Related

How to assign values of a column based on two conditions for current and previous row values?

Here's the data frame, original has a million rows so solution needs to be efficient:
Code:
import pandas as pd
df_temp = pd.DataFrame({'Download Button Clicked Time': {0: '2021-10-24 12:39:27.189629',
1: '2021-10-24 12:42:06.346536',
2: '2021-10-24 12:42:06.369056',
3: '2021-10-24 12:42:11.551610',
4: '2021-10-24 12:44:38.475047',
5: '2021-10-24 12:46:33.331920',
6: '2021-10-24 12:46:33.346536',
7: '2021-10-24 12:46:33.369056',
8: '2021-10-24 12:46:33.421520',
9: '2021-10-24 12:46:33.404641'},
'Install Verified Time': {0: '2021-10-24 12:41:04.669589',
1: '2021-10-24 12:43:14.032023',
2: '2021-10-24 12:43:14.033913',
3: '2021-10-24 12:44:08.667666',
4: '2021-10-24 12:46:11.161883',
5: '2021-10-24 12:46:34.976129',
6: '2021-10-24 12:46:35.032023',
7: '2021-10-24 12:46:35.033913',
8: '2021-10-24 12:46:35.065320',
9: '2021-10-24 12:46:35.125156'},
'App ID': {0: 'com.foxbytecode.captureintruder',
1: 'in.onecode.app',
2: 'com.payworld.phoneapp',
3: 'messenger.messenger.videocall.messenger',
4: 'imagito.image.search',
5: 'reward.earn.talktime.sixer',
6: 'com.hivoco.app',
7: 'messenger.social.productivity.notifire',
8: 'com.foxbytecode.exiftool',
9: 'com.fliplearn.app'},
'Email ID': {0: 'mandeepsharma38276atwehoo.com',
1: 'luckychauhan1199atwehoo.com',
2: 'mandeepsharma38276atwehoo.com',
3: 'chettanmon40atwehoo.com',
4: 'kaliapradhan1413atwehoo.com',
5: 'pinkydevi69784atwehoo.com',
6: 'pinkydevi69784atwehoo.com',
7: 'pinkydevi69784atwehoo.com',
8: 'pinkydevi69784atwehoo.com',
9: 'pinkydevi69784atwehoo.com'},
'install_time': {0: 97.47996,
1: 68.29827800000001,
2: 120.708813,
3: 117.116056,
4: 92.686836,
5: 1.644209,
6: 1.6854870000000002,
7: 1.664857,
8: 1.6438000000000001,
9: 1.720515},
'fraud': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}})
df_temp
Only the last FIVE rows should have 'fraud' set to one, but that is not what I currently get.
The code I'm using to detect fraud is this:
df_temp['Download Button Clicked Time'] = df_temp['Download Button Clicked Time'].astype('datetime64[ns]')
df_temp['Install Verified Time'] = df_temp['Install Verified Time'].astype('datetime64[ns]')
df_temp['install_time'] = df_temp['Install Verified Time'] - df_temp['Download Button Clicked Time']
df_temp['install_time'] = df_temp['install_time'].dt.total_seconds()
df_temp['diff'] = df_temp.install_time.diff().abs()
def fraud_time(row):
    fraud = 0
    if row['install_time'] < 0.5:
        fraud = 1
    elif row['diff'] < 0.1:
        fraud = 1
    return fraud
df_temp['fraud'] = df_temp.apply(fraud_time, axis=1)
df_temp
I'm using Install Verified Time, which seems more sensible than Download Button Clicked Time. As you can see, the third row should not be marked as one, because the second and third rows have different emails. Also, the last five rows, not four, should be marked as one.
TL;DR
Detect fraud (maybe by using pandas.DataFrame.diff) only if the last two email addresses were the same.
Edit: Frauds will have a very small time difference (say 0.02 seconds) for the SAME email, not different ones. Two different users installing two different apps in under 2 milliseconds makes sense; the same user doing so doesn't make sense.

The key is to group entries by email and compare successive installations within each group:
# I assume you have done this already
df['Install Verified Time'] = pd.to_datetime(df['Install Verified Time'])
df['fraud'] = (df['install_time'] < 0.5) | (
    df.groupby('Email ID')['Install Verified Time'].diff() < pd.Timedelta(seconds=0.1)
)
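A grouped diff compares each row only with the previous row for the same email, so the first row of a burst is left unflagged. The sketch below is one possible way to also flag that first row by comparing with the next row as well. It assumes the rows are already sorted by 'Install Verified Time' within each email, reuses the question's sample values, and the names threshold, close_to_prev and close_to_next are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Install Verified Time': pd.to_datetime([
        '2021-10-24 12:41:04.669589', '2021-10-24 12:43:14.032023',
        '2021-10-24 12:43:14.033913', '2021-10-24 12:44:08.667666',
        '2021-10-24 12:46:11.161883', '2021-10-24 12:46:34.976129',
        '2021-10-24 12:46:35.032023', '2021-10-24 12:46:35.033913',
        '2021-10-24 12:46:35.065320', '2021-10-24 12:46:35.125156']),
    'Email ID': ['mandeepsharma38276atwehoo.com', 'luckychauhan1199atwehoo.com',
                 'mandeepsharma38276atwehoo.com', 'chettanmon40atwehoo.com',
                 'kaliapradhan1413atwehoo.com'] + ['pinkydevi69784atwehoo.com'] * 5,
    'install_time': [97.47996, 68.298278, 120.708813, 117.116056, 92.686836,
                     1.644209, 1.685487, 1.664857, 1.6438, 1.720515],
})

threshold = pd.Timedelta(seconds=0.1)  # same cut-off as in the answer above
times = df.groupby('Email ID')['Install Verified Time']

# Gap to the previous and to the next install by the same email.
# The first/last row of each group yields NaT, which compares as False.
close_to_prev = times.diff().abs() < threshold
close_to_next = times.diff(-1).abs() < threshold

df['fraud'] = ((df['install_time'] < 0.5) | close_to_prev | close_to_next).astype(int)
print(df['fraud'].tolist())
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Everything here is vectorised, so it should scale to a million rows far better than a row-wise apply.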

Changing keys after loading json file dictionary

Okay, so I am having this issue with JSON, keys, and strings. I'm using a JSON dump in Python to save my game dictionaries, and it does work. The issue is that when I load the dictionaries, the game I'm making uses int values as keys in the world dictionary, but JSON stores keys as strings. Here's a random generation I did.
worldmap = {
'Regions': {
1: 'Zelbridge', 2: 'Forest Path', 3: 'Baronbell', 4: 'Old Path', 5: 'Cariva', 6: 'Prairie Path'},
'Zelbridge': {1: 'Field', 2: 'Prairie Path', 3: 'School', 4: 'Mountain Path', 5: 'Graveyard',
6: 'Old Path', 7: 'Blacksmith', 8: 'Forest Path', 9: 'Doctor', 0: 'Zelbridge'},
'Forest Path': {1: 'Trees', 2: 'Bushes', 3: 'Path', 4: 'Cariva', 5: 'Path',
6: 'Baronbell', 7: 'Path', 8: 'Zelbridge', 9: 'Path', 0: 'Forest Path'},
'Baronbell': {1: 'House', 2: 'Mountain Path', 3: 'Graveyard', 4: 'Old Path', 5: 'Field',
6: 'Forest Path', 7: 'Church', 8: 'Prairie Path', 9: 'Shop', 0: 'Baronbell'},
'Old Path': {1: 'Path', 2: 'Trees', 3: 'Bushes', 4: 'Cariva', 5: 'Path',
6: 'Zelbridge', 7: 'Trees', 8: 'Baronbell', 9: 'Trees', 0: 'Old Path'},
'Cariva': {1: 'Cellar', 2: 'Old Path', 3: 'Graveyard', 4: 'Mountain Path', 5: 'Town Hall',
6: 'Prairie Path', 7: 'School', 8: 'Forest Path', 9: 'Blacksmith', 0: 'Cariva'},
'Prairie Path': {1: 'Bushes', 2: 'Path', 3: 'Path', 4: 'Zelbridge', 5: 'Trees',
6: 'Cariva', 7: 'Trees', 8: 'Baronbell', 9: 'Path', 0: 'Prairie Path'}
}
So when I use the load function I get key errors due to the ints being converted to strings. I attempted a for loop to iterate over the keys and change them back, but I get this error about the dictionary changing. Here's an example of me trying to load a different (and also random) world. It's set to print after each loop, showing that it works:
What was your hero's name? #Input hero name
Loading...
{'2': 'Prairie Path', '3': 'Cariva', '4': 'Old Path', '5': 'Baronbell', '6': 'Mountain Path', 1: 'Zelbridge'} #Region number 1 no longer string
{'3': 'Cariva', '4': 'Old Path', '5': 'Baronbell', '6': 'Mountain Path', 1: 'Zelbridge', 2: 'Prairie Path'} #Region numbers 1 and 2 no longer string
{'4': 'Old Path', '5': 'Baronbell', '6': 'Mountain Path', 1: 'Zelbridge', 2: 'Prairie Path', 3: 'Cariva'} #ect
{'5': 'Baronbell', '6': 'Mountain Path', 1: 'Zelbridge', 2: 'Prairie Path', 3: 'Cariva', 4: 'Old Path'} #ect
{'6': 'Mountain Path', 1: 'Zelbridge', 2: 'Prairie Path', 3: 'Cariva', 4: 'Old Path', 5: 'Baronbell'} # Region numbers 1, 2, 3, 4, and 5 no longer string
{'6': 'Mountain Path', 1: 'Zelbridge', 2: 'Prairie Path', 3: 'Cariva', 5: 'Baronbell'}
Traceback (most recent call last):
File "C:/Users/crazy/PycharmProjects/rpg/main.py", line 581, in load
for key in worldmap["Regions"]:
RuntimeError: dictionary changed size during iteration
Process finished with exit code 1
I'm not exactly sure what's wrong with it but sadly I'll also have to do this for each location within a region. Any and all help is appreciated, as I've looked all over SO and google but to no avail.
with open(world_file) as infile:
    worldmap = json.load(infile)
copy = worldmap.copy()
regions = worldmap["Regions"]
locations = copy.pop("Regions")
for key in worldmap["Regions"]:
    value = worldmap["Regions"][key]
    new_key = int(key)
    worldmap["Regions"].update({new_key: value})
    worldmap["Regions"].pop(key)
    print(str(worldmap["Regions"]) + "\n")

Don't add or remove keys while looping over the dictionary itself. Loop over a snapshot of the items and update the keys inside the loop:
mydict = {1: 'a', 2: 'b'}
for index, (key, value) in enumerate(list(mydict.items())):
    print("index: {}, key: {}, value: {}".format(index, key, value))
    mydict[index] = mydict.pop(key)
Or use list to force a copy of the keys to be made:
mydict = {1: 'a', 2: 'b'}
for index, key in enumerate(list(mydict)):
    mydict[index] = mydict.pop(key)
# which will give output like:
# ---------------------------
# {0: 'a', 1: 'b'}
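For the original problem (string keys coming back from JSON, at every nesting level), one alternative is to convert the keys while loading instead of patching the dictionary afterwards. The json module's object_hook parameter is called for each decoded JSON object, so it also covers the nested location dictionaries. This is a sketch with a shortened worldmap; the helper name int_keys is hypothetical:

```python
import json

def int_keys(obj):
    """Return a copy of one decoded JSON object with digit-only keys as ints."""
    # JSON object keys are always strings, so str.isdigit() is safe to call.
    return {int(k) if k.isdigit() else k: v for k, v in obj.items()}

# Shortened version of the world from the question.
worldmap = {
    'Regions': {1: 'Zelbridge', 2: 'Forest Path'},
    'Zelbridge': {1: 'Field', 2: 'Prairie Path', 0: 'Zelbridge'},
}

saved = json.dumps(worldmap)                        # int keys become strings here
restored = json.loads(saved, object_hook=int_keys)  # and are converted back here

print(restored['Regions'][1])
# Zelbridge
```

The same object_hook argument can be passed to json.load when reading from a file. Keys such as 'Regions' are left alone because they are not made of digits.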

How can I convert an array inside a python dictionary to a tuple?

I have this dictionary:
{
0: array([-0.16638531, -0.11749843]),
1: array([-0.2318372 , 0.00917023]),
2: array([-0.42934129, -0.0675385 ]),
3: array([-0.63377579, -0.02102854]),
4: array([-0.26648222, -0.42038916]),
5: array([-0.17250316, -0.73490218]),
6: array([-0.42774336, -0.61259704]),
7: array([-0.55420825, -0.77304496]),
8: array([0.13900166, 0.07800885]),
9: array([0.42223986, 0.16563338]),
10: array([ 0.39895669, -0.09198566]),
12: array([0.24324618, 0.44829616]),
11: array([ 0.55394714, -0.17960723]),
13: array([0.192127 , 0.5988793]),
14: array([0.39554203, 0.7186038 ]),
15: array([0.53721604, 1. ])
}
I want to convert those numpy.ndarray values to tuples, and have something like this:
{
0: (-0.16638531, -0.11749843),
1: (-0.2318372 , 0.00917023),
...
}

From this answer here it looks like for each value in the dictionary you can:
tuple(arr)
So for the whole dictionary you can probably do something like:
new_dict = {key: tuple(arr) for key, arr in old_dict.items()}
Or easier to understand:
new_dict = {}
for key, arr in old_dict.items():
    new_dict.update({key: tuple(arr)})

You can use a dictionary comprehension.
Python dictionaries have an .items() method that returns a (key, value) tuple for each key-value pair.
The comprehension builds a new mapping with the original keys and each array converted to a tuple.
from numpy import array
data = {
0: array([-0.16638531, -0.11749843]),
1: array([-0.2318372 , 0.00917023]),
2: array([-0.42934129, -0.0675385 ]),
3: array([-0.63377579, -0.02102854]),
4: array([-0.26648222, -0.42038916]),
5: array([-0.17250316, -0.73490218]),
6: array([-0.42774336, -0.61259704]),
7: array([-0.55420825, -0.77304496]),
8: array([0.13900166, 0.07800885]),
9: array([0.42223986, 0.16563338]),
10: array([ 0.39895669, -0.09198566]),
12: array([0.24324618, 0.44829616]),
11: array([ 0.55394714, -0.17960723]),
13: array([0.192127 , 0.5988793]),
14: array([0.39554203, 0.7186038 ]),
15: array([0.53721604, 1. ])
}
print({key: tuple(value) for key, value in data.items()})
OUTPUT:
{0: (-0.16638531, -0.11749843), 1: (-0.2318372, 0.00917023), 2: (-0.42934129, -0.0675385), 3: (-0.63377579, -0.02102854), 4: (-0.26648222, -0.42038916), 5: (-0.17250316, -0.73490218), 6: (-0.42774336, -0.61259704), 7: (-0.55420825, -0.77304496), 8: (0.13900166, 0.07800885), 9: (0.42223986, 0.16563338), 10: (0.39895669, -0.09198566), 12: (0.24324618, 0.44829616), 11: (0.55394714, -0.17960723), 13: (0.192127, 0.5988793), 14: (0.39554203, 0.7186038), 15: (0.53721604, 1.0)}
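One detail about the comprehension above: tuple(value) keeps each element as a NumPy scalar, and how those are printed depends on the NumPy version. If plain Python floats are wanted in the tuples, converting through ndarray.tolist() first is one option. A small sketch with two of the entries from the question; the name as_tuples is illustrative:

```python
import numpy as np

data = {
    0: np.array([-0.16638531, -0.11749843]),
    15: np.array([0.53721604, 1.0]),
}

# tolist() converts the array elements to built-in Python floats.
as_tuples = {key: tuple(value.tolist()) for key, value in data.items()}
print(as_tuples)
# {0: (-0.16638531, -0.11749843), 15: (0.53721604, 1.0)}
```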

mapping = { key: (item[0], item[1]) for key, item in your_dict.items() }

Exporting Interactive Jupyter Notebook to html

The following code plots an interactive figure where I can toggle specific lines on/off. This works perfectly when I'm working in an IPython Notebook.
import pandas as pd
import numpy as np
from itertools import cycle
import matplotlib.pyplot as plt, mpld3
from matplotlib.widgets import CheckButtons
import matplotlib.patches
import seaborn as sns
%matplotlib nbagg
sns.set(style="whitegrid")
df = pd.DataFrame({'freq': {0: 0.01, 1: 0.02, 2: 0.029999999999999999, 3: 0.040000000000000001, 4: 0.050000000000000003, 5: 0.059999999999999998, 6: 0.070000000000000007, 7: 0.080000000000000002, 8: 0.089999999999999997, 9: 0.10000000000000001, 10: 0.01, 11: 0.02, 12: 0.029999999999999999, 13: 0.040000000000000001, 14: 0.050000000000000003, 15: 0.059999999999999998, 16: 0.070000000000000007, 17: 0.080000000000000002, 18: 0.089999999999999997, 19: 0.10000000000000001, 20: 0.01, 21: 0.02, 22: 0.029999999999999999, 23: 0.040000000000000001, 24: 0.050000000000000003, 25: 0.059999999999999998, 26: 0.070000000000000007, 27: 0.080000000000000002, 28: 0.089999999999999997, 29: 0.10000000000000001}, 'kit': {0: 'B', 1: 'B', 2: 'B', 3: 'B', 4: 'B', 5: 'B', 6: 'B', 7: 'B', 8: 'B', 9: 'B', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A', 15: 'A', 16: 'A', 17: 'A', 18: 'A', 19: 'A', 20: 'C', 21: 'C', 22: 'C', 23: 'C', 24: 'C', 25: 'C', 26: 'C', 27: 'C', 28: 'C', 29: 'C'}, 'SNS': {0: 91.198979591799997, 1: 90.263605442199989, 2: 88.818027210899999, 3: 85.671768707499993, 4: 76.23299319729999, 5: 61.0969387755, 6: 45.1530612245, 7: 36.267006802700003, 8: 33.0782312925, 9: 30.739795918400002, 10: 90.646258503400006, 11: 90.306122449, 12: 90.178571428600009, 13: 89.498299319699996, 14: 88.435374149599994, 15: 83.588435374200003, 16: 75.212585034, 17: 60.969387755100001, 18: 47.278911564600001, 19: 37.627551020399999, 20: 90.986394557800011, 21: 90.136054421799997, 22: 89.540816326499993, 23: 88.690476190499993, 24: 86.479591836799997, 25: 82.397959183699996, 26: 73.809523809499993, 27: 63.180272108800004, 28: 50.935374149700003, 29: 41.241496598699996}, 'FPR': {0: 1.0953616823100001, 1: 0.24489252678500001, 2: 0.15106142277199999, 3: 0.104478605177, 4: 0.089172822253300005, 5: 0.079856258734300009, 6: 0.065881413455800009, 7: 0.059892194050699996, 8: 0.059892194050699996, 9: 0.0578957875824, 10: 0.94097291541899997, 11: 0.208291741532, 12: 0.14773407865800001, 13: 0.107805949291, 14: 
0.093165635189999998, 15: 0.082518134025399995, 16: 0.074532508152000007, 17: 0.065881413455800009, 18: 0.062554069341799995, 19: 0.061888600519100001, 20: 0.85313103081100006, 21: 0.18899314567100001, 22: 0.14107939043000001, 23: 0.110467824582, 24: 0.099820323417899995, 25: 0.085180009316599997, 26: 0.078525321088700001, 27: 0.073201570506399985, 28: 0.071870632860800004, 29: 0.0705396952153}})
tableau20 = ["#6C6C6C", "#92D050", "#FFC000"]
tableau20 = cycle(tableau20)
kits = ["A","B", "C"]
color = iter(["#6C6C6C", "#92D050", "#FFC000"])
fig = plt.figure(figsize=(12,8))
for kit in kits:
    colour = next(color)
    for i in df.groupby('kit'):
        grouped_df = pd.DataFrame(np.array(i[1]), columns =
                                  ['freq', 'SNS', 'FPR', 'kit'])
        if grouped_df.kit.tolist()[1] == kit:
            x = [float(value) for i, value in enumerate(grouped_df.FPR)]
            y = [float(value) for i, value in enumerate(grouped_df.SNS)]
            x, y = (list(x) for x in zip(*sorted(zip(x, y))))
            label = grouped_df['kit'].tolist()[1]
            p = plt.plot(x, y, "-o",label = label, color = colour)
labels = [label.get_text() for label in plt.legend().texts]
plt.legend().set_visible(False)
for i, value in enumerate(labels):
    exec('label%s="%s"'%(i, value))
for i in range(len(labels)):
    exec('l%s=fig.axes[0].lines[i]'%(i))
rax = plt.axes([0.92, 0.7, 0.2, 0.2], frameon=False)
check = CheckButtons(rax, (labels), ('True ' * len(labels)))
for i, rec in enumerate(check.rectangles):
    rec.set_facecolor(tableau20.next())
def func(label):
    for i in range(len(labels)):
        if label == eval('label%s'%(i)): eval('l%s.set_visible(not l%s.get_visible())'%(i,i))
    plt.draw()
check.on_clicked(func)
plt.show()
The problem is that I need to export the notebook as HTML to share with colleagues who know nothing about Python. How can I export the notebook to HTML and keep the interactive (toggle) functionality, which it currently loses? Thanks!

Maybe you don't need to export the Jupyter notebook to HTML at all. You could share the notebook link with the other people so they can open the URL in their browser.
A Jupyter notebook plugin can help you do this more efficiently: jupyter/dashboards. It's maintained by the official Jupyter team, lets you share your notebook like a report, and lets you control which cells are displayed and where each one appears. Worth a try!

Search a key in a dict and assign the value of that key to a variable Python

I have this dict:
dict_meses = {1: 'Enero', 2: 'Febrero', 3: 'Marzo', 4: 'Abril', 5: 'Mayo', 6: 'Junio', 7: 'Julio', 8: 'Agosto',
9: 'Setiembre', 10: 'Octubre', 11: 'Noviembre', 12: 'Diciembre'}
I need to replace the month number in a string like '14/1/2015' with the month name that corresponds to it in the dict. For example, if I have '14/1/2015' I need to change it to '14/Enero/2015'.
I am trying to do something like this:
def xxx(days):  # days is a list of tuples like this [('14/1/2015', 500), ...]
    dict_months = {1: 'Enero', 2: 'Febrero', 3: 'Marzo', 4: 'Abril', 5: 'Mayo', 6: 'Junio', 7: 'Julio', 8: 'Agosto',
                   9: 'Setiembre', 10: 'Octubre', 11: 'Noviembre', 12: 'Diciembre'}
    days_list = []
    for i in days:
        lista = list(i)
        fecha_entera = lista[0].split('/')  # ['14', '1', '2015']
        dia = fecha_entera[1]  # '1'
        if int(dia) in dict_months.keys():
            fecha_entera[1] = ????  # want to change '1' to 'Enero'
        days_list.append(fecha_entera)
    return days_list
Question: How can I get the value that corresponds to the key that the month number represents?
If I am not explaining this clearly enough, just let me know and I will try harder.
Thanks in advance for the help provided.

For a string solution, use the string "replace" method on the date string (the first element of the tuple) to replace "/1/":
lista[0] = lista[0].replace("/" + dia + "/", "/" + dict_months[int(dia)] + "/")

You can use datetime to parse your dates and strftime with %B to get the output you want:
from datetime import datetime
dte = '14/1/2015'
print(datetime.strptime(dte,"%d/%m/%Y").strftime("%d/%B/%Y"))
%B will give you the locale’s full month name.
In [1]: from datetime import datetime
In [2]: dte = '14/1/2015'
In [3]: import locale
In [4]: locale.setlocale(locale.LC_ALL,"es_SV.utf_8")
Out[4]: 'es_SV.utf_8'
In [5]: print(datetime.strptime(dte,"%d/%m/%Y").strftime("%d/%B/%Y"))
14/enero/2015
If every first element is a date string:
def xxx(days):
    return [datetime.strptime(dte, "%d/%m/%Y").strftime("%d/%B/%Y")
            for dte, _ in days]
If you want to use your dict:
def xxx(days):
    dict_months = {"1": 'Enero', "2": 'Febrero', "3": 'Marzo', "4": 'Abril', "5": 'Mayo', "6": 'Junio', "7": 'Julio',
                   "8": 'Agosto',
                   "9": 'Setiembre', "10": 'Octubre', "11": 'Noviembre', "12": 'Diciembre'}
    days_list = []
    for sub in map(list, days):
        # split on "/" to get day, month and year
        dy, mn, year = sub[0].split("/")
        days_list.append("{}/{}/{}".format(dy, dict_months[mn], year))
    return days_list
You should use the keys as strings; it is pointless having to cast to int to compare.
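If the integer-keyed dict from the question has to stay as it is, and the second element of each tuple should be kept, the lookup can be done with a single int() conversion per date. This is a sketch that does not depend on the system locale; the names MONTHS and with_month_names are hypothetical:

```python
# Month names copied from the question's dict (integer keys).
MONTHS = {1: 'Enero', 2: 'Febrero', 3: 'Marzo', 4: 'Abril', 5: 'Mayo', 6: 'Junio',
          7: 'Julio', 8: 'Agosto', 9: 'Setiembre', 10: 'Octubre', 11: 'Noviembre',
          12: 'Diciembre'}

def with_month_names(days):
    """Replace the month number in each 'd/m/yyyy' string, keeping the other tuple item."""
    result = []
    for date_text, amount in days:
        day, month, year = date_text.split('/')
        # int(month) turns '1' into the key 1 used by MONTHS.
        result.append(('{}/{}/{}'.format(day, MONTHS[int(month)], year), amount))
    return result

print(with_month_names([('14/1/2015', 500), ('3/12/2015', 20)]))
# [('14/Enero/2015', 500), ('3/Diciembre/2015', 20)]
```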
