Python Object ID - python

I would like create a object ID in python, I explain:
I know that exist mysql, sqlite, mongoDB, etc... But I would like at least create a object ID for store data in json.
Before I was putting the json info inside of a list and the ID was the index of this json in the list, for example:
data = [{"name": userName}]
data[0]["id"] = len(data) - 1
Then I realize that was wrong and obviously dont look like objectID, then I thought in that the ID can be the Date and Time together, but I thought was wrong too, so, I would like know the best way for make like a objectID, that represent this json inside the list. this list will be more longer, is for users or clients (is just a personal project). And how can be a example of a method for create the ID
Thanks so much, hope I explained good.

If you just want to create a locally unique value, you could use a really simple autoincrement approach.
data = [
{"name": "Bill"},
{"name": "Javier"},
{"name": "Jane"},
{"name": "Xi"},
{"name": "Nosferatu"},
]
current_id = 1
for record in data:
record["id"] = current_id
current_id += 1
print(record)
# {'name': 'Bill', 'id': 1}
# {'name': 'Javier', 'id': 2}
# {'name': 'Jane', 'id': 3}
# {'name': 'Xi', 'id': 4}
# {'name': 'Nosferatu', 'id': 5}
To add a new value if you're not initializing like this, you can get the last one with max(d.get("id", 0) for d in data).
This may cause various problems depending on your use case. If you don't want to worry about that, you could also throw UUIDs at it; they're heavier but easy to generate with reasonable confidence of uniqueness.
from uuid import uuid4
data = [{"name": "Conan the Librarian"}]
data[0]["id"] = str(uuid4())
print(data)
# 'id' will be different each time; example:
# [{'name': 'Conan the Librarian', 'id': '85bb4db9-c450-46e3-a027-cb573a09f3e3'}]
Without knowing your actual use case, though, it's impossible to say whether one, either, or both of these approaches would be useful or appropriate.

Related

Extracting data from nested JSON using python

I am hoping someone can help me solve this problem I am having with a nested JSON response. I have been trying to crack this for a few weeks now with no success.
Using a sites API I am trying to create a dictionary which can hold three pieces of information, for each user, extracted from the JSON responses. The first JSON response holds the users uid and crmid that I require.
The API comes back with a large JSON response, with an object for each account. An extract of this for a single account can be seen below:
{
'uid': 10,
'key':
'[
N#839374',
'customerUid': 11,
'selfPaid': True,
'billCycleAllocationMethodUid': 1,
'stmtPaidForAccount': False,
'accountInvoiceDeliveryMethodUid': 1,
'payerAccountUid': 0,
'countryCode': None,
'currencyCode': 'GBP',
'languageCode': 'en',
'customFields':
{
'field':
[{
'name': 'CRMID',
'value': '11001'
}
]
},
'consentDetails': [],
'href': '/accounts/10'}
I have made a loop which extracts each UID for each account:
get_accounts = requests.get('https://website.com/api/v1/accounts?access_token=' + auth_key)
all_account_info = get_accounts.json()
account_info = all_account_info['resource']
account_information = {}
for account in account_info:
account_uid = account['uid']
I am now trying to extract the CRMID value, in this case '11001': {'name': 'CRMID', 'value': '11001'}.
I have been struggling all week to make this work, I have two problems:
I would like to extract the UID (which I have done) and the CRMID from the deeply nested 'customFields' dictionary in the JSON response. I manage to get as far as ['key'][0], but I am not sure how to access the next dictionary that is nested in the list.
I would like to store this information in a dictionary in the format below:
{'accounts': [{'uid': 10, 'crmid': 11001, 'amount': ['bill': 4027]}{'uid': 11, 'crmid': 11002, 'amount': ['bill': 1054]}]}
(The 'bill' information is going to come from a separate JSON response.)
My problem is, with every loop I design the dictionary seems to only hold one account/the last account it loops over. I cant figure out a way to append to the dictionary instead of overwrite whilst using a loop. If anyone has a useful link on how to do this it would be much appreciated.
My end goal is to have a single dictionary which holds the three pieces of information for each account (uid, crmid, bill). I'm then going to export this into a CSV document.
Any help, guidance, useful links etc would be much appreciated.
In regards to question 1, it may be helpful to print each level as you go down, then try and work out how to access the object you are returned at that level. If it is an array it will using number notation like [0] and if it is a dictionary it will use key notation like ['key']
Regarding question 2, your dictionary needs unique keys. You are probably looping over and replacing the whole thing each time.
The final structure you suggest is a bit off, imagine it as:
accounts: {
'10': {
'uid': '10',
'crmid': 11001,
'amount': {
'bill': 4027
}
},
'11': {
'uid': '11',
'crmid': 11011,
'amount': {
'bill': 4028
}
}
}
etc.
So you can access accounts['10']['crmid'] or accounts['10']['amount']['bill'] for example.

find an item inside a list of dictionaries

Say I have a list of dictionaries.
each dict in the list has 3 elements.
Name, id and status.
list_of_dicts = [{'id':1, 'name':'Alice', 'status':0},{'id':2, 'name':'Bob', 'status':0},{'id':3, 'name':'Robert', 'status':1}]
so I get:
In[20]: print list_of_dicts
Out[20]:
[{'id': 1, 'name': 'Alice', 'status': 0},
{'id': 2, 'name': 'Bob', 'status': 0},
{'id': 3, 'name': 'Robert', 'status': 1}]
If i recieve a name, how can I get its status without iterating on the list?
e.g. I get 'Robert' and I want to output 1. thank you.
for example you can use pandas
import pandas as pd
list_of_dicts = [{'id':1, 'name':'Alice', 'status':0},{'id':2, 'name':'Bob', 'status':0},{'id':3, 'name':'Robert', 'status':1}]
a = pd.DataFrame(list_of_dicts)
a.loc[a['name'] == 'Robert']
and play with dataframes its very fast because write on c++ and easy (like sql queries)
As you found you have to iterate (unless you are able to change your data structure to an enclosing dict) why don't you just do it?
>>> [d['status'] for d in list_of_dicts if d['name']=='Robert']
[1]
Despite this, I recommend considering a map type (like dict) every time you see some 'id' field in a proposed data structure. If it's there you probably want to use it for general identification, instead of carrying dicts around. They can be used for relations also, and transfer easily into a relational database if you need it later.
I don't think you can do what you ask without iterating through the dictionary:
Best case, you'll find someone that suggests you a method that hides the iteration.
If what really concerns you is the speed, you may break your iteration as soon as you find the first valid result:
for iteration, elements in enumerate(list_of_dicts):
if elements['name'] == "Robert":
print "Elements id: ", elements['id']
break
print "Iterations: ", iteration
# OUTPUT: Elements id: 3, Iterations: 1
Note that numbers of iteration may vary, since dictionaries are not indexed, and if you have more "Roberts", only for one the "id" will be printed
It's not possible to do this without iteration.
However, but you can transform you dictionary into a different data structure, such as a dictionary where names are the keys:
new_dict = {person["name"]: {k: v for k, v in person.items() if k != "name"} for person in list_of_dicts}
Then you can get the status like so:
new_dict["Robert"]["status"]
# 1
Additionally, as #tobias_k mentions in the comments, you can keep the internal dictionary the same:
{person["name"]: person for person in list_of_dicts}
The only issue with the above approaches is that it can't handle multiple names. You can instead add the unique id into the key to differentiate between names:
new_dict = {(person["name"], person["id"]): person["status"] for person in list_of_dicts}
Which can be called like this:
new_dict["Robert", 3]
# 1
Even though it takes extra computation(only once) to create these data structures, the lookups afterwards will be O(1), instead of iterating the list every time when you want to search a name.
Your list_of_dicts cannot be reached without a loop so for your desire your list should be modified a little like 1 dict and many lists in it:
list_of_dicts_modified = {'name':['Alice', 'Bob', 'Robert'],'id':[1, 2, 3], 'status': [0, 0, 1]}
index = list_of_dicts_modified['name'].index(input().strip())
print('Name: {0} ID: {1} Status: {2}'.format(list_of_dicts_modified['name'][index], list_of_dicts_modified['id'][index], list_of_dicts_modified['status'][index]))
Output:
C:\Users\Documents>py test.py
Alice
Name: Alice ID: 1 Status: 0

Code to extract text fields from a mongodb collection and append it to a list in python using pymongo

I have created an if statement to cycle through a mongodb collection of json objects and extract the text field from each and append it to a list. Here is the code below.
appleSentimentText = []
for record in db.Apple.find():
if record.get('text'):
appleSentimentText.append(record.get("text"))
This works grand but I have 20 collections to do this to and I fear the code may start to get a little messy and unmanageable with another 19 variations of this code. I have started to write a piece of code that may accomplish this. Firstly I have created an array with the names of the 20 collections in it shown below.
filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Yahoo', 'Apple','Google', 'Amazon', 'EBay', 'Diageo',
'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
'Investec', 'WWE', 'Time Warner', 'Santander Group']
I then use this array in an if statement to cycle through each collection
for word in filterKeywords:
for record in db[word].find():
if db[word].get('text'):
I now want it to create a list variable based on the collection name (ie AppleSentimentText if collection is apple, FacebookSentimentText if it is Facebook collection, etc) though im unsure of what to do next. Any help is welcome
You may use $exists and limit the returned field to "text" so it doesn't need to go through all records, in pymongo it should be something like this:
Edited:
As #BarnieHackett pointed out, you can filter out the _id as well.
for word in filterKeywords:
for r in db[word].find({'text': {'$exists': True}}, {'text': 1, '_id': False}):
appleSentimentText.append(r['text'])
The key is to use $exists and then limit the return field to 'text', unfortunately since pymongo returns the cursor which includes the '_id' & 'text' field, you need to filter this out.
Hope this helps.

Finding a key value by referencing another key value in the same dict level in Python

I'm pretty new to Python, so I'm having a hard time even coming up with the proper jargon to describe my issue.
Basic idea is I have a dict that has the following structure:
myDict =
"SomeMetric":{
"day":[
{"date": "2013-01-01","value": 1234},
{"date": "2013-01-02","value": 5678},
etc...
I want to pull out the "value" where the date is known. So I want:
myDict["SomeMetric"]["day"]["value"] where myDict["SomeMetric"]["day"]["date"] = "2013-01-02"
Is there a nice one-line method for this without iterating through the whole dict as my dict is much larger, and I'm already iterating through it, so I'd rather not do nested iteritems.
Generator expressions to the resque:
next(d['value']
for d in myDict['SomeMetric']['day']
if d['date'] == "2013-01-02")
So, loop over all day dictionaries, and find the first one that matches the date you are looking for. This loop stops as soon as a match is found.
Do you have control over your data structure? It seems to be constructed in such a way that lends itself to sub-optimal lookups.
I'd structure it as such:
data = { 'metrics': { '2013-01-02': 1234, '2013-01-01': 4321 } }
And then your lookup is simply:
data['metrics']['2013-01-02']
Can you change the structure? If you can, you might find it much easier to change the day list to a dictionary which has dates as keys and values as values, so
myDict = {
"SomeMetric":{
"day":{
"2013-01-01": 1234,
"2013-01-02": 5678,
etc...
Then you can just index into it directly with
myDict["SomeMetric"]["day"]["2013-01-02"]

Implementing "select distinct ... from ..." over a list of Python dictionaries

Here is my problem: I have a list of Python dictionaries of identical form, that are meant to represent the rows of a table in a database, something like this:
[ {'ID': 1,
'NAME': 'Joe',
'CLASS': '8th',
... },
{'ID': 1,
'NAME': 'Joe',
'CLASS': '11th',
... },
...]
I have already written a function to get the unique values for a particular field in this list of dictionaries, which was trivial. That function implements something like:
select distinct NAME from ...
However, I want to be able to get the list of multiple unique fields, similar to:
select distinct NAME, CLASS from ...
Which I am finding to be non-trivial. Is there an algorithm or Python included function to help me with this quandry?
Before you suggest loading the CSV files into a SQLite table or something similar, that is not an option for the environment I'm in, and trust me, that was my first thought.
If you want it as a generator:
def select_distinct(dictionaries, keys):
seen = set()
for d in dictionaries:
v = tuple(d[k] for k in keys)
if v in seen: continue
yield v
seen.add(v)
if you want the result in some other form (e.g., a list instead of a generator) it's not hard to alter this (e.g., .append to the initially-empty result list instead of yielding, and return the result list at the end).
To be called, of course, as
for values_tuple in select_distinct(thedicts, ('NAME', 'CLASS')):
...
or the like.
distinct_list = list(set([(d['NAME'], d['CLASS']) for d in row_list]))
where row_list is the list of dicts you have
One can implement the task using hashing. Just hash the content of rows that appear in the distinct query and ignore the ones with same hash.

Categories

Resources