UIDs in Gmail - per "folder" or unique per user account? - python

I recently began using the IMAPClient package in an app where I had previously used imaplib. Something broke after a long time away from the code base (long enough that I was going from Python 2 to Python 3), and I wasn't up for wading into IMAP4 gore (again). I decided to search for something a bit higher level. IMAPClient has, for the most part, been an improvement.
Looking at the docs, I came across this (emphasis mine):
In the IMAP protocol, messages are identified using an integer. These message ids are specific to a given folder.
I have been using UIDs, operating under the assumption that they are unique across all messages within the account, not per-folder (or per-label for Gmail). This might have been a mistake. Should I instead be considering them to be per-folder? How would that work when talking to Gmail, whose labels aren't exactly the same as folders?
Let me take this from the hypothetical to the concrete. Consider the message with Message-ID <python/cpython/pull/24293@github.com>, to which I have applied two labels, polly and python/github. IMAP4 treats these labels as if they were folders. Will Gmail associate one UID or two with that message? If I search first polly, then python/github, I assume something representing that message will be returned twice. Will it have the same UID or not?
I found this bit of Gmail-specific IMAP4 documentation, but it seems to be mum on the topic.
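For what it's worth, here is a minimal sketch of how one could check this empirically with IMAPClient (host and credentials are placeholders; this assumes both labels are visible as folders over IMAP):

    from imapclient import IMAPClient

    MSGID = "<python/cpython/pull/24293@github.com>"

    # Gmail exposes each label as an IMAP folder.
    with IMAPClient("imap.gmail.com", ssl=True) as server:
        server.login("me@gmail.com", "app-password")  # placeholders
        for label in ("polly", "python/github"):
            server.select_folder(label, readonly=True)
            uids = server.search(["HEADER", "Message-ID", MSGID])
            print(label, uids)  # same UID in both folders, or different?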

Related

Is there a library in Python that has a full list of registered domains on the internet?

I want to implement a program that basically detects whether the input from the user is a genuine domain name, i.e. one registered with ICANN.
Is there a library in Python that has a full list of registered domains on the internet?
No, not in Python or any other language, and for obvious reasons (don't you think such a list would get misused if it existed?).
First two points:
detects whether the input from the user is a genuine domain name, i.e. one registered with ICANN.
ICANN has nothing to do with your needs here. It plays no operational role in the day-to-day life of domain names, in when they are registered or deleted. ICANN just defines the list of current TLDs, which are basically the topmost registries maintaining central databases with the list of domain names.
What is a "genuine" domain? More importantly, what exactly do you need to test about domains? Is it, for example, to test whether an email address someone entered is plausible? If so, testing the domain name is not the correct approach. Saying more really depends on your use case, which you do not describe; that also makes your question off-topic insofar as it has no clear relationship with programming.
So just some generic ideas:
you can use the DNS to query domain names, but this has edge cases (and you need to understand how the DNS works), among them that not all genuinely registered domain names are published, for totally normal reasons
you can use whois, as John said, or better RDAP where it is available (at least for all gTLDs); this has drawbacks too: without a solid library parsing the replies, you don't even have a standard way of finding out whether a name exists, as registries return different free-form strings for such cases; also, it is not suitable for high-volume queries, as it is heavily rate-limited
if you are really interested in something closer to a list of domain names: all gTLDs are required to publish their zone file at least daily, which is basically the list of all resolving domains (a subset of the list of all registered domains; figure a few percent difference), see https://czds.icann.org/; some ccTLDs do this too, but each under its own policies and rules; some have an "open data" feature that provides something similar (often with a delay), or a list of "recently" registered domain names (so if you collect that day after day, at some point you have something close to the list of all domains)
Any good programming language already has polished libraries for DNS queries, whois or RDAP queries, and parsing zone files.
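For illustration, a rough Python sketch of the DNS and RDAP approaches above (assuming the third-party dnspython and requests packages; rdap.org is a public service that redirects to the authoritative RDAP server for each TLD):

    import dns.resolver
    import requests

    def has_ns_records(domain):
        # A delegated (hence registered) domain normally has NS records,
        # but as noted above, not every registered name is published.
        try:
            dns.resolver.resolve(domain, "NS")
            return True
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
                dns.resolver.NoNameservers):
            return False

    def rdap_lookup(domain):
        # A 404 conventionally means "no such registered domain".
        # Heavily rate-limited: not suitable for bulk checking.
        resp = requests.get(f"https://rdap.org/domain/{domain}", timeout=10)
        return resp.json() if resp.ok else None

    print(has_ns_records("example.com"))                  # True
    print(rdap_lookup("nonexistent-example-12345.com"))   # None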

Storing and Searching Protobuf messages in a database

I have a nested data structure defined with protocol-buffer messages, and a service that receives these messages. On the server side, I need to store the messages and be able to search for messages that have certain values in different fields, or to find the message(s) referenced by another one.
I have looked into the best way to do this, and having a database that can store these messages (directly or via JSON) and allow queries over them seems like a good approach.
I searched for a kind of database that would support this effectively, but without much success.
One approach I found was based on MongoDB: setting up a mirror schema, converting the messages to JSON, and storing them in MongoDB.
I also found ProfaneDB, and the problem it claims to address is very much like mine. However, it seems to have been dormant for the last 3-4 years, and I am not sure how stable or scalable it is, or whether there are more recent or more popular solutions.
I suspect there are better solutions for this use case. I'd appreciate advice on a good way to do this.
I think you should discard the binary protobuf messages as soon as you've unmarshaled them on your server, unless you have a legal requirement to retain the transmitted messages as-is. The protobuf format is optimized for network transmission (on the wire), not for searching.
Once you have the message in your preferred language's struct types, most databases will be able to store the data. Your focus then needs to be on how you wish to access the data; what levels of reliability, availability, consistency, etc. you need; and how much you want to pay.
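For instance, in Python you could flatten each message to a dict and store it in MongoDB; a minimal sketch, where orders_pb2 and its Order message are hypothetical stand-ins for your own schema (requires the protobuf and pymongo packages):

    from google.protobuf.json_format import MessageToDict
    from pymongo import MongoClient

    import orders_pb2  # hypothetical generated module

    collection = MongoClient("mongodb://localhost:27017").mydb.orders

    def store(msg):
        # preserving_proto_field_name keeps snake_case names, so Mongo
        # queries match the field names in the .proto definition.
        doc = MessageToDict(msg, preserving_proto_field_name=True)
        collection.insert_one(doc)

    store(orders_pb2.Order(order_id="o-1", customer_id="c-42"))

    # Structured query against a stored field:
    for doc in collection.find({"customer_id": "c-42"}):
        print(doc["order_id"])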
One important requirement is whether you want structured queries against your data or free-form (arbitrary text) searches. For the former, consider SQL and NoSQL databases; for the latter, something like Elasticsearch.
There are so many excellent, well-supported, cloud-based (if you want that) databases that can meet your needs that you should disregard any that aren't popular, unless you have very specific needs only addressed by a niche solution.

Why is there a difference in format of encrypted passwords in Django

I am using Django 1.9.7. The hashed passwords stored for my users differ significantly in format.
Some passwords are of the format <algorithm>$<iterations>$<salt>$<hash>:
pbkdf2_sha256$24000$61Rm3LxOPsCA$5kV2bzD32bpXoF6OO5YuyOlr5UHKUPlpNKwcNVn4Bt0=
While others are of this format:
!9rPYViI1oqrSMfkDCZSDeJxme4juD2niKcyvKdpB
Passwords are set using either User.objects.create_user() or user.set_password(). Is this difference expected?
You'll be fine. You just have some blank passwords in your database.
Going back as far as v0.95, Django has used $ separators for delimiting algorithm/salt/hash. These days, Django pulls out the algorithm first, by looking at what is in front of the first $, and then passes the whole lot to that hasher to decode. This allows for a wider set of formats, including the one for PBKDF2, which adds an extra iterations parameter to the list (as in your first example).
However, it also recognises that some users may not be allowed to log in and/or have no password. This is encoded using the second format you've seen. As you can see here:
If password is None then a concatenation of UNUSABLE_PASSWORD_PREFIX and a random string will be returned which disallows logins.
You can also see that the random string is exactly 40 characters long - just like your second example.
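You can confirm this from a Django shell; a quick sketch (expected output in comments):

    from django.contrib.auth.hashers import make_password, is_password_usable

    encoded = make_password(None)
    print(encoded)                      # '!' followed by 40 random characters
    print(len(encoded))                 # 41
    print(is_password_usable(encoded))  # False - logins are disallowed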
In short, then, this is all as expected.
There is no significant difference between User.objects.create_user() and user.set_password(), since the first uses the second.
Basically, passwords are stored as strings of the format <algorithm>$<iterations>$<salt>$<hash>, according to the docs. The differences might come from the PASSWORD_HASHERS settings variable: maybe one password was created with one hasher and the other with another. But as long as you keep those hashers in the variable mentioned above, all should be fine; users will be able to change their passwords, etc. You can read about this in the little notice after the bcrypt section.
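For reference, such a setting might look like this (an illustrative subset of the real hasher classes; order matters, since the first entry is used to hash new passwords while the rest remain valid for checking existing ones):

    # settings.py
    PASSWORD_HASHERS = [
        'django.contrib.auth.hashers.PBKDF2PasswordHasher',       # default
        'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',
        'django.contrib.auth.hashers.BCryptSHA256PasswordHasher',
        'django.contrib.auth.hashers.SHA1PasswordHasher',         # legacy
        'django.contrib.auth.hashers.MD5PasswordHasher',          # legacy
    ]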
The docs for the django.contrib.auth package might be helpful too.
UPDATE:
If you look at the documentation for old Django versions (1.3, for example), you will see that:
Previous Django versions, such as 0.90, used simple MD5 hashes without password salts. For backwards compatibility, those are still supported; they'll be converted automatically to the new style the first time check_password() works correctly for a given user.
So I think the answer might be somewhere here. It really depends on how much legacy your project carries, so you can decide whether this is normal. In any case, you can call check_password() to be sure, or just email your users with a "please change your password" notification. There are many factors involved, really.

How do I filter out emails from multiple senders on a given date using imaplib in Python?

I am using imaplib's IMAP4 search function right now and calling it multiple times, once per email address. I have looked everywhere on the internet for a criteria format that would let me specify several addresses in an OR, together with a date.
As of now this format works for me, but I can only specify one address at a time, so I make one call per address.
(FROM "abc#email.com" (ON "25-Dec-2015"))
In case abc and def both send me emails on that date, I would like a way to specify both in one call.
If your server is RFC-compliant and has a full search implementation, you can chain ORs together. A simple search should look something like:
ON "25-DEC-2015" OR FROM "abc#email.com" FROM "def#email.com"
OR takes two search predicates. If you need to chain them, embed one OR inside the other:
ON "25-DEC-2015" OR FROM "abc#email.com" OR FROM "def#email.com" FROM "ghi#email.com"
Not a very nice syntax, and it probably won't work beyond a handful of addresses, or on 'off-brand' IMAP servers.
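As a rough Python sketch (hypothetical host and credentials), you can build the nested OR chain programmatically with imaplib:

    import imaplib

    def from_any(addresses):
        # OR takes exactly two keys, so fold the list into a
        # right-nested chain: OR a (OR b c) ...
        criteria = f'FROM "{addresses[-1]}"'
        for address in reversed(addresses[:-1]):
            criteria = f'OR FROM "{address}" {criteria}'
        return criteria

    conn = imaplib.IMAP4_SSL("imap.example.com")  # hypothetical server
    conn.login("user", "secret")
    conn.select("INBOX")

    senders = ["abc@email.com", "def@email.com", "ghi@email.com"]
    typ, data = conn.search(None, f'(ON "25-Dec-2015" {from_any(senders)})')
    print(data[0].split())  # matching message sequence numbers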

How to model a social news feed on Google App Engine

We want to implement a "news feed" where a user can see messages broadcast by her friends, sorted with the newest message first. The feed should also reflect changes in her friends list: if she adds new friends, messages from those friends should be included in the feed, and if she removes friends, their messages should not be included. If we use the pubsub-test example and attach a recipient list to each message, this means a lot of manipulation of the message recipient lists when users connect and disconnect friends.
We first modeled publish-subscribe "fan-out" using conventional RDBMS thinking. It seemed to work at first, but then, given how the IN operator works, we quickly realized we couldn't continue on that path. We found Brett Slatkin's presentation from last year's Google I/O and have now watched it a few times, but it isn't clear to us how to do this with "dynamic" recipient lists.
What we need are some hints on how to "think" when modeling this.
Pasting the answer I got for this question in the Google Group for Google App Engine (http://groups.google.com/group/google-appengine/browse_thread/thread/09a05c5f41163b4d#), by Ikai L (Google):
A couple of thoughts here:
Is removing friends a common event? Similarly, is adding friends a common event? (All relative to "reads" of the news feed.)
From what I remember, the only way to make heavy reads scale is to write the data multiple times into people's streams (a sketch of this "fan-out on write" approach is below). Twitter does this, from what I remember, using an "eventually consistent" model. This is why your feed will not update for several minutes when they are under heavy load. The general consensus, though, is that a relational, normalized model simply will not work.
The Jaiku engine is open source for your study: http://code.google.com/p/jaikuengine. This runs on App Engine.
Hope these help when you're considering a design.
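To make the fan-out idea concrete, here is a minimal sketch using the legacy App Engine ndb API; the FeedItem model and function names are hypothetical, and real code would do the writes from a task queue and backfill or delete items when friendships change:

    from google.appengine.ext import ndb

    class FeedItem(ndb.Model):
        # One copy of each message per recipient ("fan-out on write").
        recipient = ndb.StringProperty(required=True)
        sender = ndb.StringProperty(required=True)
        body = ndb.TextProperty()
        created = ndb.DateTimeProperty(auto_now_add=True)

    def broadcast(sender_id, body, friend_ids):
        # Many writes per message, so run this on a task queue in production.
        ndb.put_multi([FeedItem(recipient=f, sender=sender_id, body=body)
                       for f in friend_ids])

    def read_feed(user_id, limit=20):
        # Reads become a single cheap, indexed query per user.
        return (FeedItem.query(FeedItem.recipient == user_id)
                .order(-FeedItem.created)
                .fetch(limit))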
