Reformat a date string with hyphens - python

I'm relatively new to python and would like to write clean code from the beginning. I need a parser of a string which formats a date in the form of "20140101" into "2014-01-01". How I achieved this is via:
def parse(date):
    list_date = list(date)
    list_date.insert(4,"-")
    list_date.insert(7,"-")
    return "".join(list_date)
This works perfectly fine but does not look very clean. Maybe there is no other solution, but if there is a more Pythonic way to write it, I would appreciate it if you shared it!

def parse(date):
    return "{}-{}-{}".format(date[:4], date[4:6], date[6:])
Might have made a small error, but that should do the trick, given how you're doing this.
See also pyformat.info for more information about Python's string formatting.
Also, I suggest looking into the datetime module for more info about even more Pythonic ways of dealing with dates. ;)
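As a rough sketch of the datetime route mentioned above: the function name parse is taken from the question, while the assumption that the input is always in YYYYMMDD form is mine. Unlike slicing, this variant also rejects strings that are not real dates.

```python
from datetime import datetime

def parse(date):
    # strptime validates the input: "20141340" raises ValueError
    # instead of silently producing "2014-13-40".
    return datetime.strptime(date, "%Y%m%d").strftime("%Y-%m-%d")

print(parse("20140101"))
```

Whether the extra validation is wanted depends on where the strings come from; for trusted input the slicing version does the same job.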

You can use format, as mentioned by B. Eckles. But you can also do:
def parse(date):
    return "%s-%s-%s" % (date[:4], date[4:6], date[6:])
Using format seems unnecessary as you aren't actually using it for anything more than substituting portions of your string.

Related

Alteryx MD5_ASCII() function equivalent solution in Python

I'm not able to match the results of an MD5_ASCII() function in my Python program. Unfortunately I can't post the exact input string and results so please bear with me.
My code in Python looks something like this:
import hashlib
str_text = "SV6*123*TT"
str_hashed_1 = hashlib.md5(str_text.encode()).hexdigest()
The produced string does not match the MD5_ASCII() output. What is the correct way of doing this?
I've eventually found the issue. There was nothing wrong with the hashing function, the issue was with the input itself.
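Since the problem turned out to be the input, here is a minimal sketch of how one might check it before hashing. The helper name md5_hex and the padded sample string are hypothetical, and using the ASCII encoding is only a guess based on the name MD5_ASCII().

```python
import hashlib

def md5_hex(text, encoding="ascii"):
    # Encode explicitly so the bytes being hashed are never in doubt.
    return hashlib.md5(text.encode(encoding)).hexdigest()

clean = "SV6*123*TT"
padded = "SV6*123*TT "  # hypothetical input with a stray trailing space

# repr() makes invisible differences such as trailing whitespace visible.
print(repr(clean), repr(padded))
print(md5_hex(clean) == md5_hex(padded))
```

Any difference in the bytes, even a single space or a different encoding, produces a completely different digest.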

How does unicodecsv.DictReader represent a csv file

I'm currently going through the Udacity course on data analysis in python, and we've been using the unicodecsv library.
More specifically we've written the following code which reads a csv file and converts it into a list. Here is the code:
import unicodecsv

def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)
In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can someone please explain it to me.
For example, one thing I don't understand is why the following throws an error:
enrollments['cancel_date']
While the following works fine:
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Hopefully this question makes sense. I'm just having trouble visualizing how all of this is represented.
Any help would be appreciated.
Thanks.
I also landed here because of some trouble with the course and found this unanswered. You have probably figured it out by now, but I'm answering anyway in case someone else finds it helpful.
As we all know, a dictionary is accessed like
dictionary_name['key']
so you might expect
enrollments['cancel_date'] to work as well.
But if you do something like
print enrollments
you will see the structure
[{u'status': u'canceled', u'is_udacity': u'True', ...}, {}, ... {}]
If you notice the brackets, it is a list of dictionaries. You may argue it is a list of lists. Try it:
print enrollments[0][0]
You'll get an error: a KeyError, because enrollments[0] is a dictionary and has no key 0.
So it is a collection of dictionaries. How do you access them? Zoom in on any dictionary (that is, a row of the csv) with enrollments[n].
Now you have a dictionary, and you can freely use its keys.
print enrollments[0]['cancel_date']
Now, coming to your loop,
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Here enrollment is the loop variable that takes each element of enrollments in turn: enrollments[0], enrollments[1], ... enrollments[n-1].
So on every iteration enrollment holds one dictionary from enrollments, which is why enrollment['cancel_date'] works while enrollments['cancel_date'] does not.
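To make the structure concrete, here is a small self-contained sketch. It assumes Python 3 and uses the standard library csv.DictReader in place of unicodecsv; the sample data and this particular parse_date are made up for illustration and are not the course's files or code.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row csv; the second row has an empty cancel_date.
SAMPLE = "account_key,status,cancel_date\n448,canceled,2015-01-14\n700,current,\n"

def read_csv(text):
    # DictReader yields one dictionary per row, keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))

def parse_date(value):
    # Every field is read as a string; convert it, or return None if empty.
    return datetime.strptime(value, "%Y-%m-%d") if value else None

enrollments = read_csv(SAMPLE)          # a list of dictionaries
for enrollment in enrollments:          # enrollment is one dictionary (one row)
    enrollment["cancel_date"] = parse_date(enrollment["cancel_date"])

print(enrollments[0]["cancel_date"])    # index the list first, then the key
```

Indexing the list with a string, as in enrollments["cancel_date"], raises a TypeError, because a list only accepts integer positions.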
Lastly, I want to add one more thing, which is why I came to this thread.
What is the meaning of "u" in u'..'? Ex: u'cancel_date': u'11-02-19'.
The answer is that it marks the string as a Unicode string. It is not part of the string; it is Python notation. Unicode is a standard that assigns a code to the characters and symbols of the world's languages.
This mainly happens because the unicodecsv package does not try to guess and convert the type of each item in the csv file. It reads everything as Unicode strings to preserve all characters. That's why Caroline and you defined and used parse_date() and other functions to convert the Unicode strings to the desired data types. This is all part of the data wrangling process.

Search a database for elements including a string variable

I'm pretty new to SQLite and Python and have run into a bit of confusion. I'm trying to return all elements in a column that contain a substring which is passed to a function as a variable in Python. My code is running, but it's returning an empty result instead of the correct result.
Here's the code with the names generalized:
def myFunc(cursor,myString):
    return cursor.execute("""select myID from Column where name like '%'+?'%' """,(myString,))
Like I said, the code does run without error but returns an empty result instead of the result that I know it should be. I'm assuming it has something to do with my use of the wildcard and/or question mark, but I can't be sure. Anyone have any ideas? Thanks in advance for your time/help! Also, this is my first post, so I apologize in advance if I missed any of the recommended protocols for asking questions.
Well, '%'+?'%' definitely isn't going to work: you have a + on the left of the placeholder, but no operator on the right…
You can compute LIKE-search fields if you do it right. Note that in SQLite, string concatenation is spelled ||, not + (which is arithmetic addition there), so the pattern would be '%'||?||'%' in this case. That will cause problems with some databases (from not working, to doing a less efficient search), but, at least according to CL.'s comment, sqlite3 will be fine.
But the easy thing to do is to just substitute a complete parameter, rather than part of one. You can put % into the parameters, and it'll be interpreted just fine. So:
return cursor.execute("""select myID from Column where name like ?""",
                      ('%'+myString+'%',))
And this also has the advantage that if you want to do a search for initial substrings ('foo%'), it'll be the same SQL statement but with a different parameter.
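A minimal runnable sketch of both variants against an in-memory database. The table name items, the sample rows, and the function names find_ids and find_ids_concat are placeholders I made up for illustration.

```python
import sqlite3

def find_ids(cursor, substring):
    # Wildcards are part of the bound value, not of the SQL text.
    cursor.execute("SELECT myID FROM items WHERE name LIKE ? ORDER BY myID",
                   ("%" + substring + "%",))
    return [row[0] for row in cursor.fetchall()]

def find_ids_concat(cursor, substring):
    # Same search, with the pattern assembled in SQL using SQLite's || operator.
    cursor.execute("SELECT myID FROM items WHERE name LIKE '%' || ? || '%' ORDER BY myID",
                   (substring,))
    return [row[0] for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (myID INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO items (myID, name) VALUES (?, ?)",
                [(1, "foobar"), (2, "bar"), (3, "baz")])

print(find_ids(cur, "bar"))
print(find_ids_concat(cur, "bar"))
```

In both cases the search term travels as a bound parameter, so quotes in the term cannot break or alter the statement.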
Try this:
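def myFunc(cursor,myString):
    # Caution: formatting the value into the SQL text is open to SQL injection; a ? parameter is safer.
    # The % wildcards are needed for a substring match.
    return cursor.execute('select myID from Column where name like "%{0}%"'.format(myString))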

Shortening code for formatted output

I must write a two dimensional array (let's say array[100][20]) into a file. This is just an example of my code:
for i in range(100):
    print >> file, "%15.9e%24.15e%3d..." % (array[i][0], array[i][1], array[i][2], ... )
Is it possible to shorten the part after the "%" symbol with slicing or some similar notation?
You could do:
print >> file, format_string % tuple(array[i])
That said, I think that I would probably use numpy here to save the data in a format that isn't ascii for efficiency (both in terms of time spent doing IO and disk space).
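The tuple-based formatting can be sketched as follows, written for Python 3 rather than the Python 2 print >> syntax used above. The helper name write_rows and the three-column sample data are made up, and the format string only covers the three specifiers shown in the question.

```python
import io

def write_rows(out, rows, format_string):
    # One format string per row; tuple(row) supplies all values at once.
    for row in rows:
        out.write(format_string % tuple(row) + "\n")

rows = [[1.5, 2.25, 3], [4.0, 5.5, 6]]   # hypothetical data, three columns
format_string = "%15.9e%24.15e%3d"

buffer = io.StringIO()                   # stands in for an open file
write_rows(buffer, rows, format_string)
print(buffer.getvalue(), end="")
```

The number of specifiers in the format string has to match the number of columns in each row, otherwise the % operator raises a TypeError.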
At the risk of being down-voted, are you sure you're asking the right question? I've always found that if something is ugly or hard, it's probably not the best approach, especially in Python.
You might check out the csv module---what format do you want to write, exactly?
There's also the texttable module, that makes pretty tables for you, and will even dump them to a text file.
Finally, if you should be so inclined, there are the Python Excel Utilities, which will dump things into Excel spreadsheets. I have written some wrappers around the excel read and excel write interfaces in that module that I'll post here if you want---they're nothing special, but they may save you some time.

How should I organise my functions with pyparsing?

I am parsing a file with python and pyparsing (it's the report file for PSAT in Matlab but that isn't important). Here is what I have so far. I think it's a mess and would like some advice on how to improve it. Specifically, how should I organise my grammar definitions with pyparsing?
Should I have all my grammar definitions in one function? If so, it's going to be one huge function. If not, then how do I break it up? At the moment I have split it at the sections of the file. Is it worth making loads of functions that only ever get called once from one place? Neither really feels right to me.
Should I place all my input and output code in a separate file to the other class functions? It would make the purpose of the class much clearer.
I'm also interested to know if there is an easier way to parse a file, do some sanity checks and store the data in a class. I seem to spend a lot of my time doing this.
(I will accept answers of it's good enough or use X rather than pyparsing if people agree)
I could go either way on using a single big method to create your parser vs. taking it in steps the way you have it now.
I can see that you have defined some useful helper utilities, such as slit ("suppress Literal", I presume), stringtolits, and decimaltable. This looks good to me.
I like that you are using results names, they really improve the robustness of your post-parsing code. I would recommend using the shortcut form that was added in pyparsing 1.4.7, in which you can replace
busname.setResultsName("bus1")
with
busname("bus1")
This can declutter your code quite a bit.
I would look back through your parse actions to see where you are using numeric indexes to access individual tokens, and go back and assign results names instead. Here is one case, where GetStats returns (ngroup + sgroup).setParseAction(self.process_stats). process_stats has references like:
self.num_load = tokens[0]["loads"]
self.num_generator = tokens[0]["generators"]
self.num_transformer = tokens[0]["transformers"]
self.num_line = tokens[0]["lines"]
self.num_bus = tokens[0]["buses"]
self.power_rate = tokens[1]["rate"]
I like that you have Group'ed the values and the stats, but go ahead and give them names, like "network" and "soln". Then you could write this parse action code as (I've also converted to the - to me - easier-to-read object-attribute notation instead of dict element notation):
self.num_load = tokens.network.loads
self.num_generator = tokens.network.generators
self.num_transformer = tokens.network.transformers
self.num_line = tokens.network.lines
self.num_bus = tokens.network.buses
self.power_rate = tokens.soln.rate
Also, a style question: why do you sometimes use the explicit And constructor, instead of using the '+' operator?
busdef = And([busname.setResultsName("bus1"),
              busname.setResultsName("bus2"),
              integer.setResultsName("linenum"),
              decimaltable("pf qf pl ql".split())])
This is just as easily written:
busdef = (busname("bus1") + busname("bus2") +
          integer("linenum") +
          decimaltable("pf qf pl ql".split()))
Overall, I think this is about par for a file of this complexity. I have a similar format (proprietary, unfortunately, so cannot be shared) in which I built the code in pieces similar to the way you have, but in one large method, something like this:
def parser():
    header = Group(...)
    inputsummary = Group(...)
    jobstats = Group(...)
    measurements = Group(...)
    return header("hdr") + inputsummary("inputs") + jobstats("stats") + measurements("meas")
The Group constructs are especially helpful in a large parser like this, to establish a sort of namespace for results names within each section of the parsed data.
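Here is a tiny runnable sketch of named groups acting as namespaces. It requires the pyparsing package; the grammar and the input line are invented for illustration and have nothing to do with the real PSAT report format. Only the group names "network" and "soln" and the field names come from the discussion above.

```python
from pyparsing import Group, Suppress, Word, nums

# Convert matched digits to int at parse time.
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))

def build_parser():
    # Each Group gets its own results name, so field names live
    # inside "network" or "soln" rather than in one flat namespace.
    network = Group(Suppress("buses") + integer("buses") +
                    Suppress("lines") + integer("lines"))
    soln = Group(Suppress("rate") + integer("rate"))
    return network("network") + soln("soln")

result = build_parser().parseString("buses 14 lines 20 rate 100")
print(result.network.buses, result.network.lines, result.soln.rate)
```

A parse action attached to such a parser can then read tokens.network.buses instead of tokens[0]["buses"], which keeps working if the order of the groups changes.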
