I'm trying to query data using the Python pandas library. Here is an example of the data in JSON:
[
{
"name": "Bob",
"city": "NY",
"status": "Active"
},
{
"name": "Jake",
"city": "SF",
"status": "Active"
},
{
"name": "Jill",
"city": "NY",
"status": "Lazy"
},
{
"name": "Steve",
"city": "NY",
"status": "Lazy"
}]
My goal is to query the data where city == NY and status == Lazy.
One way using pandas DataFrame is to do...
df = df[(df.status == "Lazy") & (df.city == "NY")]
This works fine, but I want it to be more abstract.
Is there a way I can use **kwargs to filter the data? So far I've had trouble finding this in the pandas documentation.
Here is what I've done so far:
import sys
import pandas as pd

def main(**kwargs):
    readJson = pd.read_json(sys.argv[1])
    # Narrow the frame one column/value pair at a time
    for key, value in kwargs.items():
        print(key, value)
        readJson = readJson[readJson[key] == value]
    print(readJson)

if __name__ == '__main__':
    main(status="Lazy", city="NY")
Again, this works just fine, but I wonder whether there is a better way to do it.
I don't really see anything wrong with your approach. If you wanted to use df.query you could do something like this, although I'd argue it's less readable.
expr = " and ".join(k + "=='" + v + "'" for (k,v) in kwargs.items())
readJson = readJson.query(expr)
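As an alternative to building a query string by hand, the conditions can be combined into a single boolean mask. The following is a minimal sketch and not part of the original posts: filter_frame is a hypothetical helper name, and it assumes every keyword names an existing column that should be compared with ==.

```python
import pandas as pd

def filter_frame(df, **conditions):
    """Return the rows of df where every named column equals the given value."""
    # Start with an all-True mask and AND in one condition per keyword.
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    return df[mask]

records = [
    {"name": "Bob", "city": "NY", "status": "Active"},
    {"name": "Jake", "city": "SF", "status": "Active"},
    {"name": "Jill", "city": "NY", "status": "Lazy"},
    {"name": "Steve", "city": "NY", "status": "Lazy"},
]
df = pd.DataFrame(records)
lazy_in_ny = filter_frame(df, status="Lazy", city="NY")
print(lazy_in_ny["name"].tolist())  # ['Jill', 'Steve']
```

Because the values are compared directly rather than spliced into a string, this also works for non-string values and for strings that contain quotes.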
**kwargs has nothing to do with pandas; it is a basic Python feature. You simply need to write a function that accepts **kwargs and substitutes those values into the pandas DataFrame query inside the function. I don't have the time to code it for you, but reading the Python docs should get you going. pandas is just one part of the Python ecosystem, and once you start combining multiple parts you will need to get familiar with each of those pieces.
I am using the Python requests library to execute a GraphQL mutation. I need to pass requests a query parameter containing a string built from a Python list of dictionaries.
The list of dictionaries looks like this:
my_list_of_dicts = [{"custom_module_id": "23", "answer": "some text 2", "user_id": "111"},
{"custom_module_id": "24", "answer": "a", "user_id": "111"}]
Now I need to convert this list of dictionaries into a string that looks like this:
my_list_of_dicts = [{custom_module_id: "23", answer: "some text 2", user_id: "111"},
{custom_module_id: "24", answer: "a", user_id: "111"}]
Basically, I need a string that looks like a Python list of dictionaries, except that the dictionary keys have no quotation marks around them. I did this and it works:
my_query_string = json.dumps(my_list_of_dicts).replace("\"custom_module_id\"", "custom_module_id")
my_query_string = my_query_string.replace("\"answer\"", "answer")
my_query_string = my_query_string.replace("\"user_id\"", "user_id")
But I was wondering whether there is a better way to achieve this. By "better" I mean a function call that converts the JSON/dictionary data into a string ready to be used in a GraphQL query.
I think this may help you find your final answer.
Follow this article
gq = """
mutation ReorderProducts($id: ID!, $moves: [MoveInput!]!) {
  collectionReorderProducts(id: $id, moves: $moves) {
    job {
      id
    }
    userErrors {
      field
      message
    }
  }
}
"""
resp = self.sy_graphql_client.execute(
    query=gq,
    variables={
        "id": before_collection_meta.coll_meta.id,
        "moves": list(map(lambda mtc:
            {
                "id": mtc.id, "newPosition": mtc.new_position
            }, move_to_commands))
    }
)
reorder_job_id = resp["data"]["collectionReorderProducts"]["job"]["id"]
self.sy_graphql_client.wait_for_job(reorder_job_id)
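The same idea can be applied with plain requests: declare a variable in the mutation and send the list as ordinary JSON, so the keys never need to be unquoted. This is only a sketch under assumptions: the mutation name saveAnswers, the input type AnswerInput, and the helper build_graphql_payload are all hypothetical and would have to match the real schema.

```python
import json

# Hypothetical mutation; the field and input type names depend on the schema.
MUTATION = """
mutation SaveAnswers($answers: [AnswerInput!]!) {
  saveAnswers(answers: $answers) {
    id
  }
}
"""

def build_graphql_payload(query, variables):
    """Build the JSON body that GraphQL servers conventionally accept over HTTP."""
    return {"query": query, "variables": variables}

my_list_of_dicts = [
    {"custom_module_id": "23", "answer": "some text 2", "user_id": "111"},
    {"custom_module_id": "24", "answer": "a", "user_id": "111"},
]

payload = build_graphql_payload(MUTATION, {"answers": my_list_of_dicts})
body = json.dumps(payload)
print(body)
```

With requests, the payload could then be sent with requests.post(url, json=payload), where url is the GraphQL endpoint.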
I have some json data similar to this...
{
"people": [
{
"name": "billy",
"age": "12"
...
...
},
{
"name": "karl",
"age": "31"
...
...
},
...
...
]
}
At the moment I can do this to get an entry from the people list:
wantedPerson = "karl"
for person in people:
    if person['name'] == wantedPerson:
        # I have the person's entry
        break
Is there a better way of doing this? Something similar to how we can use .get('key')?
Thanks,
Chris
Assuming you load that JSON data with the standard library, you're fairly close to optimal. Perhaps you were looking for something like this:
from json import loads
text = '{"people": [{"name": "billy", "age": "12"}, {"name": "karl", "age": "31"}]}'
data = loads(text)
people = [p for p in data['people'] if p['name'] == 'karl']
If you frequently need to access this data, you might just do something like this:
all_people = {p['name']: p for p in data['people']}
print(all_people['karl'])
That is, all_people becomes a dictionary that uses the name as a key, so you can access any person in it quickly by accessing them by name. This assumes however that there are no duplicate names in your data.
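If duplicate names are possible, one option is to map each name to a list of entries instead of a single entry. This is a minimal sketch; the second "karl" record is added purely for illustration and people_by_name is a hypothetical name.

```python
from collections import defaultdict

data = {"people": [
    {"name": "billy", "age": "12"},
    {"name": "karl", "age": "31"},
    {"name": "karl", "age": "45"},  # duplicate name added for illustration
]}

# Group every entry under its name so duplicates are kept, not overwritten.
people_by_name = defaultdict(list)
for person in data["people"]:
    people_by_name[person["name"]].append(person)

# .get() avoids creating an empty entry when the name is unknown.
karls = people_by_name.get("karl", [])
print([p["age"] for p in karls])  # ['31', '45']
```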
First, there's no problem with your current 'naive' approach. It's clear and efficient, since you can't find the value you're looking for without scanning the list.
It seems that by "better" you mean shorter, so if you want a one-liner, consider the following:
next((person for person in people if person['name'] == wantedPerson), None)
It returns the first person in the list with the required name, or None if no such person is found.
Similarly:
ps = {
"people": [
{
"name": "billy",
"age": "12"
},
{
"name": "karl",
"age": "31"
},
]
}
print([x for x in ps['people'] if 'karl' in x.values()])
For possible alternatives or details, see e.g. "Get key by value in dictionary".
I am learning Python and having a lot of fun with it. I am currently making a simple Discord bot, but I am stuck on a nested dictionary access problem.
This is my command:
@bot.command()
async def burek(ctx, arg):
    burek_seller = burek_dictionary["bureks"][arg]["seller"]
    burek_price = burek_dictionary["bureks"][arg]["price"]
    burek_state = burek_dictionary["bureks"][arg]["state"]
    burek_name = burek_dictionary["bureks"][arg]["name"]
    await ctx.send(
        f"{burek_name} is available {burek_state} at {burek_seller} for {burek_price}$."
    )
The problem is that I want 'arg' to search by 'name' in my nested dictionary rather than by the numeric key of each nested dictionary. I am aware I will have to make a few changes, but I have been stuck trying to figure it out for two days now :(
This is my dictionary:
burek_dictionary = {
"bureks": {
"0": {
"name": "mesni",
"ingredient": "govedina",
"seller": "sipac",
"price": 2.2,
"state": "hot",
"demand": "low",
},
"1": {
"name": "sirni",
"ingredient": "sir",
"seller": "merc",
"price": 1.8,
"state": "cold",
"demand": "average",
},
"2": {
"name": "spinacni",
"ingredient": "spinaca",
"seller": "pecjak",
"price": 2,
"state": "fresh",
"demand": "high",
},
"3": {
"name": "ajdov",
"ingredient": "sirspinaca",
"price": 2.1,
"state": "hot",
"demand": "average",
},
}
}
Right now I obviously have to pass a number as 'arg' to get my result, but I would like to use 'name' from the dictionary and get the same result. I have no idea how to approach this. I hope it makes sense! Thank you.
Sure.
Loop through the bureks and when you find the one with the matching name, use that.
I assume you might need the same functionality somewhere else, so I broke it out into a separate function.
def find_burek_by_name(name):
    for burek_key, burek_info in burek_dictionary["bureks"].items():
        if burek_info["name"] == name:
            return burek_info
    return None  # Not found

@bot.command()
async def burek(ctx, arg):
    burek_info = find_burek_by_name(arg)
    if burek_info:
        burek_seller = burek_info["seller"]
        burek_price = burek_info["price"]
        burek_state = burek_info["state"]
        burek_name = burek_info["name"]
        await ctx.send(
            f"{burek_name} is available {burek_state} at {burek_seller} for {burek_price}$."
        )
    else:
        pass  # No such burek
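One thing to watch for: the "ajdov" entry in the question has no "seller" key, so indexing it with ["seller"] would raise a KeyError. The sketch below shows one way to tolerate that with dict.get. It is only an illustration: describe_burek and the fallback text are hypothetical, and the dictionary is trimmed to two entries.

```python
burek_dictionary = {
    "bureks": {
        "0": {"name": "mesni", "seller": "sipac", "price": 2.2, "state": "hot"},
        "3": {"name": "ajdov", "price": 2.1, "state": "hot"},  # no "seller" key
    }
}

def describe_burek(name):
    """Return a message for the named burek, or None if it is not listed."""
    for info in burek_dictionary["bureks"].values():
        if info["name"] == name:
            # Fall back to a placeholder when the entry has no seller.
            seller = info.get("seller", "an unknown seller")
            return f"{info['name']} is available {info['state']} at {seller} for {info['price']}$."
    return None

print(describe_burek("mesni"))  # mesni is available hot at sipac for 2.2$.
print(describe_burek("ajdov"))  # ajdov is available hot at an unknown seller for 2.1$.
print(describe_burek("pizza"))  # None
```

Inside the bot command, the returned string could be passed to ctx.send, with a separate reply for the None case.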
If you know the entire dictionary beforehand and it does not change, you can build a map from names to numbers in a single pass and avoid looping later:
name2num = {}
for num in burek_dictionary["bureks"]:
    name2num[burek_dictionary["bureks"][num]["name"]] = num
print(name2num)
name = "sirni"
num = name2num[name]
burek_seller = burek_dictionary["bureks"][num]["seller"]
burek_price = burek_dictionary["bureks"][num]["price"]
burek_state = burek_dictionary["bureks"][num]["state"]
burek_name = burek_dictionary["bureks"][num]["name"]
print(burek_seller, burek_price, burek_state, burek_name)
Cheers!
This is going to be a tricky one... well, it is for me, as I've been trying to nail it for a week with no success so far :(
Let's say I get a nested JSON response from an API call:
{"Parameters": {
"Name": {
"Unparsed": null,
"First": "John",
"Middle": "A",
"Last": "Smith",
"Suffix": "Jr"
},
"Address": {
"Unparsed": null,
"Line1": "123 Main St",
"Line2": "apt.2",
"City": "New York",
"State": "NY",
"Zip": "12345"
}
}
}
and I want to create variables dynamically, each named after a key and assigned that key's value.
I know how to do it with something like name_first = data.get("Name").get("First"), but then I depend heavily on the structure of the JSON response, and it won't work if the structure changes (renamed, added, or deleted keys).
So I am working on a Python script to do it, but so far I've had no luck.
Thanks!
You might use locals().update to update the current variables. This snippet creates new variables such as Address_Line2, Name_Suffix, etc.:
from collections import deque
import json
st = deque()
st.append(([], json.loads(your_json)['Parameters']))
while len(st):
    prefix, item = st.pop()
    if isinstance(item, dict):
        for k, v in item.items():
            st.append((prefix + [k], v))
    else:
        print({'_'.join(prefix): item})
        # Only reliable at module level, where locals() is the globals dict
        locals().update({'_'.join(prefix): item})
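Since updating locals() is not reliable inside a function, it may be safer to keep the flattened values in an ordinary dictionary and look them up by name. The following is a rough sketch of that idea: flatten is a hypothetical helper, and the sample response is a shortened version of the one in the question.

```python
import json

def flatten(item, prefix=()):
    """Yield (joined_key, value) pairs for every leaf of a nested dict."""
    if isinstance(item, dict):
        for key, value in item.items():
            yield from flatten(value, prefix + (key,))
    else:
        yield "_".join(prefix), item

# Shortened sample response, for illustration only.
response_text = """
{"Parameters": {
  "Name": {"Unparsed": null, "First": "John", "Last": "Smith"},
  "Address": {"Unparsed": null, "City": "New York", "Zip": "12345"}
}}
"""

flat = dict(flatten(json.loads(response_text)["Parameters"]))
print(flat["Name_First"])              # John
print(flat.get("Name_Middle", "n/a"))  # n/a
```

Using flat.get with a default means a renamed or missing key produces a fallback value instead of a NameError.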
I am new to Python and would like to extract data from JSON with pandas.
The nested JSON structure is as follows:
{
"idDriver": "100001",
"defaultTripType": "private",
"fleetManagerRole": null,
"identifications": [
{
"code": "90-00-00-77-20",
"from": "2019-08-08T10:38:15Z",
"rawId": "",
"vehicle": {
"isBusinessCar": "0",
"id": "10000",
"licensePlate": "ABCD",
"class": "Suziki 1.6 CDTI",
}
}
]
}
As output I need, on one line, 'idDriver' from level 0 and 'licensePlate' from the identifications/vehicle node.
What I have tried so far
(after loading the data from the API, which works fine):
json_data = json.loads(myResponse.text)
#only unwrapping 'identifications' – works 100% fine
workdata = json_normalize(json_data, record_path= ['identifications'],
meta=['idDriver'])
#unwrapping 'identifications'\'vehicle' - is NOT working
workdata = json_normalize(json_data, record_path= ['identifications','vehicle'],
meta=['idDriver'])
I would appreciate any hint on that.
Kind Regards,
Arek
I would go for rebuilding your dictionary like this:
New_Data = {
"id" : [],
"licensePlate" : []
}
New_Data["id"].append(data["idDriver"])
New_Data["licensePlate"].append(data["identifications"][0]["vehicle"]["licensePlate"])
If you have many entries in data["identifications"] you can easily loop over them, and if you have many drivers you can do the same.
For me your first snippet works nicely. If necessary, just remove the vehicle. prefix from the column names:
json_data = {
"idDriver": "100001",
"defaultTripType": "private",
"fleetManagerRole": 'null',
"identifications": [
{
"code": "90-00-00-77-20",
"from": "2019-08-08T10:38:15Z",
"rawId": "",
"vehicle": {
"isBusinessCar": "0",
"id": "10000",
"licensePlate": "ABCD",
"class": "Suziki 1.6 CDTI",
}
}
]
}
from pandas import json_normalize  # top-level import available in pandas 1.0+
workdata = json_normalize(json_data, record_path=['identifications'], meta=['idDriver'])
print (workdata)
code from rawId vehicle.isBusinessCar \
0 90-00-00-77-20 2019-08-08T10:38:15Z 0
vehicle.id vehicle.licensePlate vehicle.class idDriver
0 10000 ABCD Suziki 1.6 CDTI 100001
workdata.columns = workdata.columns.str.replace('vehicle.', '', regex=False)
print (workdata)
code from rawId isBusinessCar id \
0 90-00-00-77-20 2019-08-08T10:38:15Z 0 10000
licensePlate class idDriver
0 ABCD Suziki 1.6 CDTI 100001
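The second attempt in the question fails because record_path has to lead to a list of records, while "vehicle" is a single dictionary. Normalizing on 'identifications' alone already flattens the nested dictionary into vehicle.* columns, so the two wanted fields can simply be selected afterwards. A minimal sketch, assuming pandas 1.0 or later (for pd.json_normalize) and the sample record from the question:

```python
import pandas as pd

json_data = {
    "idDriver": "100001",
    "defaultTripType": "private",
    "fleetManagerRole": None,
    "identifications": [
        {
            "code": "90-00-00-77-20",
            "from": "2019-08-08T10:38:15Z",
            "rawId": "",
            "vehicle": {
                "isBusinessCar": "0",
                "id": "10000",
                "licensePlate": "ABCD",
                "class": "Suziki 1.6 CDTI",
            },
        }
    ],
}

# One row per identification; nested "vehicle" keys become "vehicle.<key>" columns.
workdata = pd.json_normalize(json_data, record_path=["identifications"], meta=["idDriver"])

# Keep only the two requested fields and give the plate a shorter name.
result = workdata[["idDriver", "vehicle.licensePlate"]].rename(
    columns={"vehicle.licensePlate": "licensePlate"}
)
print(result)
```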
I recently wrote a package called cherrypicker to deal with tasks like this easily. I think the following snippet would achieve your task with CherryPicker:
from cherrypicker import CherryPicker
json_data = json.loads(myResponse.text)
picker = CherryPicker(json_data)
flat_data = picker.flatten['idDriver', 'identifications_0_vehicle_licensePlate'].get()
flat_data would then look like this (I'm assuming that your data is actually a list of objects like the one you described above):
[['100001', 'ABCD'], ...]
You can then load this into a dataframe as follows:
import pandas as pd
df = pd.DataFrame(flat_data, columns=["idDriver", "licensePlate"])
If you want to flatten your data in slightly different ways (e.g. you want every license plate/driver ID combination, not just the first license plate for each driver), then you should be able to do this too, although it may require two or three lines rather than just one. Check out the docs for examples of other ways of using it: https://cherrypicker.readthedocs.io.
To install cherrypicker, just run pip install --user cherrypicker.