I found this code:
def _oauth_parse_response(body):
    p = cgi.parse_qs(body, keep_blank_values=False)
but I don't know what it means.
Thanks.
It means "look on the cgi object for an attribute called parse_qs, and call it as a function with body as a positional argument and keep_blank_values as a keyword argument with the value of False".
For the definition of cgi look further up, but it probably is the stdlib module of the same name.
docs.python.org has an excellent search engine, which will show you this:
This function is deprecated in this module. Use urllib.parse.parse_qs() instead. It is maintained here only for backward compatibility.
and once you follow the link, you see:
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.
and so on.
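In practice, then, the non-deprecated equivalent is urllib.parse.parse_qs. A minimal sketch (the sample query string is made up; on Python 2 the import would be from urlparse instead):

from urllib.parse import parse_qs  # Python 3; on Python 2: from urlparse import parse_qs

def _oauth_parse_response(body):
    # keep_blank_values=False (the default) drops parameters whose value is empty
    return parse_qs(body, keep_blank_values=False)

print(_oauth_parse_response("oauth_token=abc&oauth_verifier=xyz"))
# {'oauth_token': ['abc'], 'oauth_verifier': ['xyz']}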
Much as I may like getting easy rep for answering absolutely trivial questions that anybody with a pulse should have zero trouble answering for themselves (maybe with some help from today's reasonably powerful search engines), some questions are really too easy to answer -- the Stack Overflow equivalent of shooting sitting birds. You're not a newbie here, so why not -- and I'm going to suggest an absolutely revolutionary strategy! -- make the microscopic effort of doing your own searches, and ask questions only when there is something worth asking?
Parses a query string into a dictionary.
Deprecated since Python 2.6.
I am trying to stream Bitcoin data using the alpaca-py trading documentation, but I keep getting an invalid syntax error. This is taken exactly from the alpaca-py documentation. Does anyone know what I am doing wrong?
from typing import Any
from alpaca.data.live import CryptoDataStream

wss_client = CryptoDataStream(key-id, secret-key)

# async handler
async def quote_data_handler(data: Any):
    # quote data will arrive here
    print(data)

wss_client.subscribe_quotes(quote_data_handler, "BTC")
wss_client.run()
Take a look at the dashes in your parameters. They are usually a no-no in most languages, since the "-" (dash) normally means minus: a binary operator, i.e. one that operates on two operands to produce a new value or result.
Make sure parameters are set before passing them.
Try the underscore instead as in: key_id = "".
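For example, a minimal corrected sketch (the placeholder credential strings are assumptions; substitute your real key and secret):

from typing import Any
from alpaca.data.live import CryptoDataStream

# set the parameters before passing them (placeholder values, not real credentials)
key_id = "YOUR_API_KEY"
secret_key = "YOUR_SECRET_KEY"

wss_client = CryptoDataStream(key_id, secret_key)

# async handler
async def quote_data_handler(data: Any):
    # quote data will arrive here
    print(data)

wss_client.subscribe_quotes(quote_data_handler, "BTC")
wss_client.run()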
Also useful is the following link to a comprehensive list of crypto pairs supported by Alpaca: https://alpaca.markets/support/alpaca-crypto-coin-pair-faq/#:~:text=For%20the%20initial%20launch%20of,%2C%20SOL%2C%20TRX%2C%20UNI)
Stay up to date on the above list, as its membership may be a bit volatile at the moment.
I have a Python 3.5 tuple where the typical structure of a data item is something like this:
item = (PosixPath('/mnt/dson/Music/iTunes/iTunes Music/funtoons.mp3'), tagtypes(txt=False, word=False, ebook=False, image=False, exe=False, iso=False, zip=False, raw=False, audio=True, music=True, photoshop=False, video=False, src=False, geek=False, pdf=False, appledouble=False, dot=False), fileinfo(size=13229145, datetime=1333848240.0))
This describes a common file on my Linux filesystem. If I want to know the size
of the given file, I can access it with something like item[2].size. Similarly, logic to grab the tags describing the file's contents would use code like item[1].music, etc.
On the face of it, with each object in the tuple being unique, it seems that if you wanted to access one of the members, you should be able to drill down into the tuple and do something like item.fileinfo.size. All of the information needed to select the correct item from the tuple is deducible by the interpreter. However, if you do attempt something like item.fileinfo.size, you will get (almost expectedly) an AttributeError.
I could create a namedtuple of namedtuples but that has a bit of a code smell to it.
I'm wondering if there is a more pythonic way to access the members of the tuple other than by indexing or unpacking. Is there some kind of
shorthand notation such that you convey to the interpreter which one of
the tuple's elements you must be referencing (because none of the other
options will fit the pattern)?
This is kind of a hard thing to explain and I'm famous for leaving out
critical parts. So if more info is needed by the first responders, please
let me know and I'll try and describe the idea more fully.
You really think doing this:
Item = namedtuple('Item', 'PosixPath tagtypes fileinfo')
item = Item(PosixPath('/mnt/dson/Music/iTunes/iTunes Music/funtoons.mp3'), tagtypes(txt=False, word=False, ebook=False, image=False, exe=False, iso=False, zip=False, raw=False, audio=True, music=True, photoshop=False, video=False, src=False, geek=False, pdf=False, appledouble=False, dot=False), fileinfo(size=13229145, datetime=1333848240.0))
is not worth it if it lets you do item.fileinfo.size AND item[2].size? That's pretty clean. It avoids creating classes by hand and gives you all the functionality in a clear and concise manner. Seems like pretty good Python to me.
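A minimal self-contained sketch of that (tagtypes and fileinfo are assumed to be namedtuples, as the question's data suggests, and are trimmed to a few fields for brevity):

from collections import namedtuple
from pathlib import PosixPath

tagtypes = namedtuple('tagtypes', 'audio music video')
fileinfo = namedtuple('fileinfo', 'size datetime')
Item = namedtuple('Item', 'PosixPath tagtypes fileinfo')

item = Item(PosixPath('/mnt/dson/Music/iTunes/iTunes Music/funtoons.mp3'),
            tagtypes(audio=True, music=True, video=False),
            fileinfo(size=13229145, datetime=1333848240.0))

print(item.fileinfo.size)   # 13229145 -- attribute access
print(item[2].size)         # 13229145 -- index access still works
print(item.tagtypes.music)  # True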
I'm pretty new to Python (and the xlrd module), so my code is probably not nearly as compact as it could be. I'm just using it to analyse some data, so it's more important for me to understand what I'm doing than to make the code as compact as possible (though I do hope to improve, so feel free to give me advice on the coding itself, provided you manage to explain it to a 'newbie' :p).
That being said, here's my issue:
Context
I have an xlsx file with data on errors that people made when translating a text. The first column contains a code for the error relative to the text (conceptual errors), the second column contains a code for the translator that made the error. I want to create a dictionary in which the keys are the conceptual error codes, and the values are lists of the different translators that made that conceptual error.
A short fragment from the xlsx (to give you an idea of the codes in the two columns):
1722_Z1_CF5 1722_HT_EV_Z1_F1
1722_Z1_CF1 1722_PE_AL_Z1_F1
1722_Z1_CF9 1722_PE_EVC_Z1_F1
1722_Z1_CF5 1722_PE_LH_Z1_F1
As you can see, the conceptual error '1722_Z1_CF5' has been made by 2 different people ('1722_HT_EV_Z1_F1' and '1722_PE_LH_Z1_F1'). The dictionary for this fragment would look something like:
{'1722_Z1_CF5': ['1722_HT_EV_Z1_F1', '1722_PE_LH_Z1_F1'],
 '1722_Z1_CF1': ['1722_PE_AL_Z1_F1'],
 '1722_Z1_CF9': ['1722_PE_EVC_Z1_F1']}
Code
The code below is what I tried to do to create the dictionary.
def TranslatorsPerError(sheet):
    TotalConceptualErrors(sheet)
    TranslatorsPerError = {}
    for row_index in range(sheet.nrows):
        if sheet.cell(row_index,0).value in ConceptualErrors and sheet.cell(row_index,0).value not in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index,0).value)] = [str(sheet.cell(row_index,1).value),]
        if sheet.cell(row_index,0).value in ConceptualErrors and sheet.cell(row_index,0).value in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index,0).value)].append(str(sheet.cell(row_index,1).value))
    return TranslatorsPerError
'TotalConceptualErrors' is a function I created that returns a list ('ConceptualErrors') of the conceptual error codes from the first column without duplicates (and it filters out some other information that was also present in the first column, that's why I needed to use this one first).
Problem
The problem is that this function keeps giving me an error: TypeError: argument of type 'Book' is not iterable
I know that problems with iterables can sometimes be solved by casting certain things into a different type, but I'm not sure how I should solve this one. I tried to use 'str()' for different elements, but that didn't solve the problem. Maybe it has something to do with my code, maybe with the nature of dictionaries or xlrd... (looking at the type 'book', my guess would be on the latter).
Any help or feedback on how to fix this would be greatly appreciated. If you need extra information to understand what's going on or what I'm looking for, please ask.
Where is ConceptualErrors being set?
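For what it's worth, the grouping itself can be written with collections.defaultdict. This is only a sketch of the intended dictionary-building logic, assuming the list of codes is passed in explicitly; it doesn't by itself explain the TypeError, which suggests an xlrd Book object is being used somewhere a list (or sheet) is expected:

from collections import defaultdict

def translators_per_error(sheet, conceptual_errors):
    # conceptual_errors is assumed to be the list returned by TotalConceptualErrors
    translators = defaultdict(list)
    for row_index in range(sheet.nrows):
        code = str(sheet.cell(row_index, 0).value)
        translator = str(sheet.cell(row_index, 1).value)
        if code in conceptual_errors:
            translators[code].append(translator)
    return dict(translators)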
I am a noob to Python.
I constantly find myself looking at a piece of code and trying to work out what is inside a data structure such as for example a dictionary. In fact the first thing I am trying to work out is "what sort of data structure is this?" and THEN I try to work out how to see what is inside it. I look at a variable and say "is this a dict, or a list, or a multidict or something else I'm not yet familiar with?". Then, "What's inside it?". It's consuming vast amounts of time and I just don't know if I'm taking the right approach.
So, the question is, "How do the Python masters find out what sort of data structure something is, and what techniques do they use to see what is inside those data structures?"
I hope the question is not too general but I'm spending ridiculous amounts of time just trying to fix issues with recognizing data structures and viewing their contents, let alone getting useful code written.
thanks heaps.
Using the type() function on a variable will tell you its data type. For example:
inventory = {'cows': 4, 'pigs': 3, 'chickens': 5, 'bears': 2}
print(type(inventory))
will print
<class 'dict'>
which means the variable inventory is a dictionary.
Other possible data types are 'str' for strings, 'int' for integers, 'float' for floats, 'tuple' for tuples, and 'bool' for boolean values.
To see what's inside a collection, you can simply use the print() function.
aList = [ 'hunger', 'anger', 'burger']
print(aList)
will output
['hunger', 'anger', 'burger']
I usually care more about how a type is /used/ than what exactly a type is.
For example, if an object is used with say:
foo["hey"] = "there"
for key, value in foo.items():
print key, '->', value
Then I assume that 'foo' is some kind of dict-like object, and unless I have reason to investigate further, that's all I care about.
(Note: I'm still in python 2.x land, the syntax is slightly different in python 3.x, however the point remains)
Instead of "what is this?", with Python it can be better to ask "what does this do?" or "how is this used?". If you see something indexed, such as a['foo'], it shouldn't matter whether it is a dictionary or some other object, but simply that it is indexable by a string.
This idea is usually referred to as Duck Typing, so searching for this might give you some useful info. A quick search turned up this article, which seems relevant for you:
http://www.voidspace.org.uk/python/articles/duck_typing.shtml
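A small sketch of that idea (the class and function names are made up for illustration): any object that is indexable by a string works wherever a dict would.

class CaseInsensitiveLookup:
    """Not a dict, but indexable by a string just like one."""
    def __init__(self, data):
        self._data = {k.lower(): v for k, v in data.items()}

    def __getitem__(self, key):
        return self._data[key.lower()]

def describe(a):
    # works for a real dict and for CaseInsensitiveLookup alike
    return a['foo']

print(describe({'foo': 1}))                          # 1
print(describe(CaseInsensitiveLookup({'FOO': 2})))   # 2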
I put an import pdb; pdb.set_trace() in the relevant place, and once in the debugger I use dir(), .__dict__ and pp, or any other form of inspection necessary.
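A minimal sketch of that workflow (the function and variable names are made up for illustration):

import pdb

def process(mystery):
    pdb.set_trace()   # execution pauses here and drops you into the debugger
    ...

# At the (Pdb) prompt you can then poke at the object interactively:
#   (Pdb) type(mystery)         # what kind of object is it?
#   (Pdb) dir(mystery)          # what attributes and methods does it have?
#   (Pdb) pp mystery            # pretty-print the value itself
#   (Pdb) pp mystery.__dict__   # pretty-print its instance attributes, if it has any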
I'm going to implement a tokenizer in Python and I was wondering if you could offer some style advice?
I've implemented a tokenizer before in C and in Java so I'm fine with the theory, I'd just like to ensure I'm following pythonic styles and best practices.
Listing Token Types:
In Java, for example, I would have a list of fields like so:
public static final int TOKEN_INTEGER = 0;
But, obviously, there's no way (I think) to declare a constant variable in Python, so I could just replace this with normal variable declarations but that doesn't strike me as a great solution since the declarations could be altered.
Returning Tokens From The Tokenizer:
Is there a better alternative to just simply returning a list of tuples e.g.
[ (TOKEN_INTEGER, 17), (TOKEN_STRING, "Sixteen")]?
Cheers,
Pete
There's an undocumented class in the re module called re.Scanner. It's very straightforward to use for a tokenizer:
import re
scanner = re.Scanner([
    (r"[0-9]+",  lambda scanner, token: ("INTEGER", token)),
    (r"[a-z_]+", lambda scanner, token: ("IDENTIFIER", token)),
    (r"[,.]+",   lambda scanner, token: ("PUNCTUATION", token)),
    (r"\s+", None),  # None == skip token.
])

results, remainder = scanner.scan("45 pigeons, 23 cows, 11 spiders.")
print results
will result in
[('INTEGER', '45'),
('IDENTIFIER', 'pigeons'),
('PUNCTUATION', ','),
('INTEGER', '23'),
('IDENTIFIER', 'cows'),
('PUNCTUATION', ','),
('INTEGER', '11'),
('IDENTIFIER', 'spiders'),
('PUNCTUATION', '.')]
I used re.Scanner to write a pretty nifty configuration/structured data format parser in only a couple hundred lines.
Python takes a "we're all consenting adults" approach to information hiding. It's OK to use variables as though they were constants, and trust that users of your code won't do something stupid.
In many situations, especially when parsing long input streams, you may find it more useful to implement your tokenizer as a generator function. This way you can easily iterate over all the tokens without needing lots of memory to build the list of tokens first.
For generators, see the original proposal (PEP 255) or other online docs.
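A minimal sketch of what that looks like (the token names and patterns are invented for illustration):

import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def tokenize(text):
    # yield (type, value) pairs one at a time instead of building a full list
    for number, name, other in TOKEN_RE.findall(text):
        if number:
            yield ('INTEGER', int(number))
        elif name:
            yield ('IDENTIFIER', name)
        else:
            yield ('PUNCTUATION', other)

for token in tokenize("45 pigeons, 23 cows"):
    print(token)
# ('INTEGER', 45)
# ('IDENTIFIER', 'pigeons')
# ('PUNCTUATION', ',')
# ('INTEGER', 23)
# ('IDENTIFIER', 'cows')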
Thanks for your help; I've started to bring these ideas together, and I've come up with the following. Is there anything terribly wrong with this implementation (in particular, I'm concerned about passing a file object to the tokenizer)?
class Tokenizer(object):
    def __init__(self, file):
        self.file = file

    def __get_next_character(self):
        return self.file.read(1)

    def __peek_next_character(self):
        character = self.file.read(1)
        self.file.seek(self.file.tell() - 1, 0)
        return character

    def __read_number(self):
        value = ""
        while self.__peek_next_character().isdigit():
            value += self.__get_next_character()
        return value

    def next_token(self):
        character = self.__peek_next_character()
        if character.isdigit():
            return self.__read_number()
"Is there a better alternative to just simply returning a list of tuples?"
Nope. It works really well.
"Is there a better alternative to just simply returning a list of tuples?"
That's the approach used by the "tokenize" module for parsing Python source code. Returning a simple list of tuples can work very well.
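For reference, a tiny sketch of what that module hands back (this uses the Python 3 API, where the yielded tuples are named; in Python 2 they are plain 5-tuples):

import io
import tokenize

source = "x = 17 + 25\n"
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# NAME 'x'
# OP '='
# NUMBER '17'
# OP '+'
# NUMBER '25'
# NEWLINE '\n'
# ENDMARKER ''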
I have recently built a tokenizer, too, and ran into some of the same issues.
Token types are declared as "constants", i.e. variables with ALL_CAPS names, at the module level. For example,
_INTEGER = 0x0007
_FLOAT = 0x0008
_VARIABLE = 0x0009
and so on. I have used an underscore in front of the names to point out that those fields are somehow "private" to the module, but I really don't know whether this is typical or advisable, or even how Pythonic it is. (Also, I'll probably ditch the numbers in favour of strings, because they are much more readable during debugging.)
Tokens are returned as named tuples.
from collections import namedtuple
Token = namedtuple('Token', ['value', 'type'])
# so that e.g. somewhere in a function/method I can write...
t = Token(n, _INTEGER)
# ...and return it properly
I have used named tuples because the tokenizer's client code (e.g. the parser) seems a little clearer while using names (e.g. token.value) instead of indexes (e.g. token[0]).
Finally, I've noticed that sometimes, especially when writing tests, I prefer to pass a string to the tokenizer instead of a file object. I call it a "reader", and have a specific method to open it and let the tokenizer access it through the same interface.
def open_reader(self, source):
    """
    Produces a file object from source.
    The source can be either a file object already, or a string.
    """
    if hasattr(source, 'read'):
        return source
    else:
        from io import StringIO
        return StringIO(source)
When I start something new in Python, I usually look first for modules or libraries to use. There's a 90%+ chance that something is already available.
For tokenizers and parsers this is certainly so. Have you looked at PyParsing?
I've implemented a tokenizer for a C-like programming language. What I did was to split up the creation of tokens into two layers:
a surface scanner: this one actually reads the text and uses regular expressions to split it up into only the most primitive tokens (operators, identifiers, numbers, ...); it yields tuples (tokenname, scannedstring, startpos, endpos).
a tokenizer: this consumes the tuples from the first layer, turning them into token objects (named tuples would do as well, I think). Its purpose is to detect some long-range dependencies in the token stream, particularly strings (with their opening and closing quotes) and comments (with their opening and closing lexemes; yes, I wanted to retain comments!), and coerce them into single tokens. The resulting stream of token objects is then returned to a consuming parser.
Both are generators. The benefits of this approach were:
Reading of the raw text is done only in the most primitive way, with simple regexps - fast and clean.
The second layer is already implemented as a primitive parser, to detect string literals and comments - re-use of parser technology.
You don't have to strain the surface scanner with complex detections.
But the real parser gets tokens on the semantic level of the language to be parsed (again, whole strings and comments).
I feel quite happy with this layered approach.
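A much-reduced sketch of that layering (the token names and patterns are invented for illustration; real string and comment handling would of course deal with escapes and report positions):

import re

_SCAN_RE = re.compile(r'(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<QUOTE>")|(?P<OP>[^\s\w"])')

def surface_scan(text):
    # layer 1: split the raw text into primitive (kind, string, start, end) tuples
    for m in _SCAN_RE.finditer(text):
        yield (m.lastgroup, m.group(), m.start(), m.end())

def tokenize(text):
    # layer 2: coalesce quote-delimited runs into single STRING tokens
    stream = surface_scan(text)
    for kind, value, start, end in stream:
        if kind != 'QUOTE':
            yield (kind, value)
            continue
        # long-range dependency: consume primitive tokens up to the closing quote
        for kind2, _, start2, _ in stream:
            if kind2 == 'QUOTE':
                yield ('STRING', text[end:start2])
                break

print(list(tokenize('x = 3 + "a b"')))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('STRING', 'a b')]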
I'd turn to the excellent Text Processing in Python by David Mertz
This being a late answer, there is now something in the official documentation: Writing a tokenizer with the re standard library. This is content in the Python 3 documentation that isn't in the Py 2.7 docs. But it is still applicable to older Pythons.
It includes short code, easy setup, and writing a generator, as several answers here have proposed.
If the docs are not Pythonic, I don't know what is :-)
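A condensed sketch in the spirit of that documentation recipe (the token specification here is invented and far shorter than the one in the docs):

import re
from collections import namedtuple

Token = namedtuple('Token', ['type', 'value'])

TOKEN_SPEC = [
    ('NUMBER',   r'\d+'),
    ('ID',       r'[A-Za-z_]\w*'),
    ('OP',       r'[+\-*/=]'),
    ('SKIP',     r'\s+'),
    ('MISMATCH', r'.'),
]
TOKEN_RE = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(code):
    # generator: the named group tells us which alternative matched
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            raise SyntaxError('unexpected character %r' % m.group())
        yield Token(kind, m.group())

print(list(tokenize("x = 2 + 40")))
# [Token(type='ID', value='x'), Token(type='OP', value='='),
#  Token(type='NUMBER', value='2'), Token(type='OP', value='+'),
#  Token(type='NUMBER', value='40')]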
"Is there a better alternative to just simply returning a list of tuples"
I had to implement a tokenizer, but it required a more complex approach than a list of tuples, so I implemented a class for each token. You can then return a list of class instances, or, if you want to save resources, you can return something implementing the iterator interface and generate the next token as the parsing progresses.
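A small sketch of that shape (the class names and the whitespace-based splitting are invented for illustration): each token kind gets its own class, and the tokenizer yields instances lazily instead of materialising a list.

class Token:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return '%s(%r)' % (type(self).__name__, self.value)

class IntegerToken(Token):
    def __init__(self, value):
        super().__init__(int(value))

class IdentifierToken(Token):
    pass

def tokenize(words):
    # yields token objects one at a time (iterator interface)
    for word in words:
        if word.isdigit():
            yield IntegerToken(word)
        else:
            yield IdentifierToken(word)

print(list(tokenize("count 42 items".split())))
# [IdentifierToken('count'), IntegerToken(42), IdentifierToken('items')]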