I want to copy some documents from one MongoDB database to another, but only specific properties of each document. Given a source document with the format:
{topic: string, email: {enabled: boolean}, mobile: {enabled: boolean}}
I want to insert into the new DB something with the following format:
{topic: string, email: {enabled: boolean}}
I am doing something like this in order to create the new object that will be inserted:
class Document(object):
    topic = ""
    email = object()

def make_document(topic, email):
    document = Document()
    document.topic = topic
    document.email = email
    return document
settings = source_db.collectionName.find({})
for setting in settings:
    new_setting = make_document(setting.topic, setting.email)
    db.collectionName.insertOne(new_setting)
My doubts are:
am I creating the new object correctly?
is the declaration of the email property in the class correct (i.e. email = object())?
is there a better (or more correct) way to do it?
I am really new with Python.
This is the right way of initializing variables (instance attributes) inside a class:
class Document(object):
    def __init__(self, topic, email):
        self.topic = topic
        self.email = email

settings = source_db.collectionName.find({})
for setting in settings:
    # pymongo returns documents as dicts, so access fields with [],
    # and insert the instance's __dict__ rather than the instance itself
    new_setting = Document(setting["topic"], setting["email"])
    db.collectionName.insert_one(new_setting.__dict__)
Question to you: since the response you're getting is a dictionary/JSON, is there any reason why you're trying to turn it into a class?
{topic: string, email: {enabled: boolean}, mobile: {enabled: boolean}}
Update: In that case you can achieve this with plain dictionaries. Check this one:
settings = source_db.collectionName.find({})
for setting in settings:
    new_data = {"topic": setting["topic"], "email": setting["email"]}
    db.collectionName.insert_one(new_data)
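As a side note, if only those two fields are needed, the copy can avoid pulling the unwanted mobile field at all by using a server-side projection in find(). A minimal sketch, assuming pymongo and the same collection names as above (_id is excluded so the target collection assigns its own):

    # Project only the fields to keep; the mobile field never leaves MongoDB.
    settings = source_db.collectionName.find({}, {"topic": 1, "email": 1, "_id": 0})
    for setting in settings:
        db.collectionName.insert_one(setting)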
Let's say I have a route that allows clients to create a new user
(pseudocode)
#app.route("POST")
def create_user(user: UserScheme, db: Session = Depends(get_db)) -> User:
...
and my UserScheme accepts a field such as an email. I would like to be able to set some settings (for example max_length) globally in a different model Settings. How do I access that inside a scheme? I'd like to access the db inside my scheme.
So basically my scheme should look something like this (the given code does not work):
class UserScheme(BaseModel):
    email: str

    @validator("email")
    def validate_email(cls, value: str) -> str:
        settings = get_settings(db)  # `db` should be set somehow
        if len(value) > settings.email_max_length:
            raise ValueError("Your mail might not be that long")
        return value
I couldn't find a way to pass db to the scheme. I was thinking about validating such fields (the ones that depend on db) inside my route. While that approach works somehow, the error message is not raised on the specific field but rather on the entire form; it should report the error for the correct field so that frontends can display it correctly.
One option is to accept arbitrary JSON objects as input, and then construct a UserScheme instance manually inside the route handler:
@app.route(
    "POST",
    response_model=User,
    openapi_extra={
        "requestBody": {
            "content": {
                "application/json": {
                    "schema": UserScheme.schema(ref_template="#/components/schemas/{model}")
                }
            }
        }
    },
)
async def create_user(request: Request, db: Session = Depends(get_db)) -> User:
    settings = get_settings(db)
    user_data = await request.json()  # Request.json() is a coroutine
    user_schema = UserScheme(settings=settings, **user_data)
Note that this idea was borrowed from https://stackoverflow.com/a/68815913/2954547, and I have not tested it myself.
In order to facilitate the above, you might want to redesign this class so that the settings object itself is an attribute on the UserScheme model. That way you never need to perform database access or other effectful operations inside the validator, and you also can't instantiate a UserScheme without some kind of sensible settings in place, even if they are fallbacks or defaults.
class SystemSettings(BaseModel):
    ...

def get_settings(db: Session) -> SystemSettings:
    ...

EmailAddress = typing.NewType('EmailAddress', str)

class UserScheme(BaseModel):
    settings: SystemSettings

    if typing.TYPE_CHECKING:
        email: EmailAddress
    else:
        email: str | EmailAddress

    @validator("email")
    def _validate_email(cls, value: str, values: dict[str, typing.Any]) -> EmailAddress:
        if len(value) > values['settings'].email_max_length:
            raise ValueError('...')
        return EmailAddress(value)
The use of typing.NewType isn't necessary here, but I think it's a good tool in situations like this. Note that the typing.TYPE_CHECKING trick is required to make it work, as per https://github.com/pydantic/pydantic/discussions/4823.
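Usage would then look something like this (a sketch; the email value is made up, and get_settings is the helper assumed above):

    settings = get_settings(db)
    # The validator runs with values['settings'] already populated,
    # since settings is declared before email on the model.
    user = UserScheme(settings=settings, email="alice@example.com")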
I am reading data from a CSV file and converting it into Python class objects. But when I try to iterate over that RDD of user-defined class objects, I get errors like:
_pickle.PicklingError: Can't pickle <class '__main__.User'>: attribute lookup User on __main__ failed
I'm adding part of the code here:
class User:
    def __init__(self, line):
        self.user_id = line[0]
        self.location = line[1]
        self.age = line[2]

def create_user(line):
    user = User(line)
    return user

def print_user(line):
    user = line
    print(user.user_id)
conf = (SparkConf().setMaster("local").setAppName("exercise_set_2").set("spark.executor.memory", "1g"))
sc = SparkContext(conf = conf)
users = sc.textFile("BX-Users.csv").map(lambda line: line.split(";"))
users_objs = users.map(lambda entry: create_user(entry))
users_objs.map(lambda entry: print_user(entry))
For the above code, I get results like,
PythonRDD[93] at RDD at PythonRDD.scala:43
CSV data source URL(Needs a zip extraction): HERE
UPDATE:
Changing the code to include collect() results in the error again. I still have to try with pickle; I never tried it before, so if anyone has a sample, I could do it easily.
users_objs = users.map(lambda entry: create_user(entry)).collect()
When you use
def create_user(line):
    user = User(line)
    return user
directly in a map call, this means that the User class has to be accessible to your nodes. Typically this means it needs to be serializable/picklable. How would a node use that class, or know what it is (unless you have a common NFS mount or something)? That's why you have gotten that pickle error. To make your User class picklable, please read this: https://docs.python.org/2/library/pickle.html.
Additionally, you aren't performing a collect() on your RDD, which is why you see PythonRDD[93] at RDD at PythonRDD.scala:43. It's still just an RDD; your data is out on the nodes.
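For completeness, here is a sketch of both ways to actually trigger the computation, using the same users_objs RDD as above (note that print output from foreach ends up in the executor logs, not on the driver):

    # Bring the objects back to the driver and print there:
    for user in users_objs.collect():
        print_user(user)

    # Or print on the executors with an action instead of a lazy map:
    users_objs.foreach(print_user)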
Okay, found an explanation. Storing the class in a separate file makes it picklable automatically, because pickle can then look it up by import path. So I stored the User class inside user.py and added the following import into my code:
from user import User
Contents of user.py:
class User:
    def __init__(self, line):
        self.user_id = line[0]
        self.location = line[1]
        self.age = line[2]
As mentioned in the earlier answer, I can use collect() (an RDD method) on the created User objects. So the following code prints all user IDs, as I wanted:
for user_obj in users.map(lambda entry: create_user(entry)).collect():
    print_user(user_obj)
I want to drop all of a user's sessions when they reset their password, but I can't find a way to do that.
My idea was to get all UserTokens of the specific user and delete them, but that seems impossible because of
user = model.StringProperty(required=True, indexed=False)
in the UserToken model.
Any ideas how to do that?
I see two ways to do that.
The first is to inherit from the UserToken class, making user an indexed property. Then set the token_model class attribute in your user class to your new token model. Here is the code:
class MyToken(UserToken):
    user = ndb.StringProperty(required=True)

class MyUser(User):
    token_model = MyToken
    # etc.
Don't forget to point webapp2 at your user class if you haven't done so already:
webapp2_config = {
    "webapp2_extras.auth": {
        "user_model": "models.MyUser"
    },
    # etc.
}

app = webapp2.WSGIApplication(routes, config=webapp2_config)
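With user indexed, dropping all of a user's sessions becomes a keys-only query plus a batch delete. A sketch (drop_user_sessions is a hypothetical helper; webapp2 stores the user field as the stringified user ID):

    def drop_user_sessions(user):
        user_id = str(user.key.id())
        # keys_only avoids fetching full entities just to delete them
        keys = MyToken.query(MyToken.user == user_id).fetch(keys_only=True)
        ndb.delete_multi(keys)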
The second way is to make a trickier datastore query based on the token key name. Since the key names have the form <user_id>.<scope>.<random>, it is possible to retrieve all the entities whose key names start with a specific user ID. Have a look at the code:
def query_tokens_by_user(user_id):
    min_key = ndb.Key(UserToken, "%s." % user_id)
    max_key = ndb.Key(UserToken, "%s/" % user_id)  # "/" is the next ASCII character after "."
    return UserToken.query(UserToken.key > min_key, UserToken.key < max_key)
This relies on the fact that queries by key name work in lexicographical order.
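Deleting the tokens then works the same way as in the first approach; a sketch with a hypothetical helper:

    def delete_tokens_by_user(user_id):
        keys = query_tokens_by_user(user_id).fetch(keys_only=True)
        ndb.delete_multi(keys)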
I am facing a very weird issue today.
Here is my serializer class.
class Connectivity(serializers.Serializer):
    device_type = serializers.CharField(max_length=100, required=True)
    device_name = serializers.CharField(max_length=100, required=True)

class Connections(serializers.Serializer):
    device_name = serializers.CharField(max_length=100, required=True)
    connectivity = Connectivity(required=True, many=True)

class Topologyserializer(serializers.Serializer):
    name = serializers.CharField(max_length=100, required=True,
                                 validators=[UniqueValidator(queryset=Topology.objects.all())])
    json = Connections(required=True, many=True)

    def create(self, validated_data):
        return validated_data
I am calling Topologyserializer from a Django view, and I am passing JSON like:
{
    "name": "tokpwol",
    "json": []
}
In my experience with DRF, since I have set required=True on the json field, it should not accept the JSON above. But I am able to create the record.
Can anyone explain why it is not validating the json field, and why it accepts an empty list for it?
I am using Django REST Framework 3.0.3.
DRF does not clearly state what required means for lists.
In its code, it appears that validation passes as long as a value is supplied, even if that value is an empty list.
If you want to ensure the list is not empty, you'll need to validate its content manually. You can do that by adding the following method to your Topologyserializer:
def validate_json(self, value):
    if not value:
        raise serializers.ValidationError("Connections list is empty")
    return value
I cannot test it right now, but it should work.
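As a side note, newer DRF releases (3.2+, so not the 3.0.3 you are on) accept allow_empty=False on list serializers, which would make the manual check unnecessary:

    # Rejects [] at validation time on DRF 3.2 and later
    json = Connections(required=True, many=True, allow_empty=False)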
Hi, I am trying to get the concept behind Datastore as a NoSQL database. What I am trying to fetch is a list of objects which have been "referenced" by another, as in:
class Person(db.Model):
    name = db.StringProperty(required=True)

class Contact(db.Model):
    name = db.StringProperty(required=True)
    email = db.StringProperty()
    trader = db.ReferenceProperty(Person)
This works fine, and they get saved when I use person.put() without any problem. But when I retrieve the persons and encode them as JSON, it never shows the contacts as a list; in fact it totally ignores them.
persons_query = Person.all()
persons = persons_query.fetch(50)
data = json.encode(persons)
I would expect each person to have a collection of Contact entities, but it doesn't. Any ideas on how to solve this problem?
To make it clearer, currently I am getting something like this:
[
    {"name": "John Doe"}
]
I would like it to be:
[
    {
        "name": "John Doe",
        "contacts": [{"name": "Alex", "email": "alex@gmail.com"}]
    }
]
Edit
Thanks all, you were right: I needed to fetch the collection of contacts. There was only one issue: when a Contact was encoded, it recursively tried to encode its trader object, and then that trader's contacts, and so on.
So I got an obvious recursion error; the solution was to remove the trader object from the Contact while it is being encoded.
Make a custom toJson method in your class:
class Person(db.Model):
    name = db.StringProperty(required=True)

    def toJson(self):
        # contact_set is the default collection name for the reverse reference.
        # It is a Query, so materialize the contacts as plain dicts (and omit
        # trader, to avoid recursing back into Person).
        contacts = [{"name": c.name, "email": c.email} for c in self.contact_set]
        d = {"name": self.name, "contacts": contacts}
        return json.dumps(d)
class Contact(db.Model):
    name = db.StringProperty(required=True)
    email = db.StringProperty()
    trader = db.ReferenceProperty(Person)
Then you may do the following:

persons_query = Person.all()
persons = persons_query.fetch(50)
data = [person.toJson() for person in persons]
To fetch all the contacts you will need to write a custom JSON encoder that fetches the reverse side of the reference property.
A ReferenceProperty automatically gets a reverse query. From the docs: "collection_name is the name of the property to give to the referenced model class. The value of the property is a Query for all entities that reference the entity. If no collection_name is set, then modelname_set (with the name of the referenced model in lowercase letters and _set added) is used."
So you would add a method to resolve the reverse reference set query.
class Person(db.Model):
    name = db.StringProperty(required=True)

    def contacts(self):
        return self.contact_set.fetch(50)  # should be smarter than that
Then use it in your custom json encoder.
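For example, a minimal sketch of such an encoder (the PersonEncoder name and the choice of returned fields are my own; adapt to your models):

    class PersonEncoder(json.JSONEncoder):
        def default(self, obj):
            # Called only for objects json doesn't know how to serialize
            if isinstance(obj, Person):
                return {
                    "name": obj.name,
                    "contacts": [{"name": c.name, "email": c.email}
                                 for c in obj.contacts()],
                }
            return json.JSONEncoder.default(self, obj)

    persons = Person.all().fetch(50)
    data = json.dumps(persons, cls=PersonEncoder)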
If you want to find all the contacts that include a person you will need to issue a query for it.
contacts = Contact.all().filter("trader =", person)