I’m a beginner and I’m having a lot of difficulty plotting the data in my CSV file, because the abbreviated month names in the file need to be converted to numbers first. I’ve been told to use the “time” library, and I’ve spent many hours looking for possible solutions, but nothing has worked yet. Any help would be much appreciated.
Using the time library for this seems like overkill. The easiest way to do this would be to create a dictionary that maps each abbreviation to its number. But if you want to achieve this using the time library, you can try:
from time import strptime
strptime('Mar','%b').tm_mon
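The dictionary approach mentioned above could look roughly like the sketch below. The names MONTH_NUMBERS and month_to_number are made up for illustration, and the sketch assumes the file uses three-letter English abbreviations.

```python
# Hypothetical lookup table: English three-letter month abbreviations -> numbers.
MONTH_NUMBERS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def month_to_number(abbreviation):
    """Return the month number for an abbreviation such as 'Mar' or 'mar'."""
    # Normalise whitespace and case so 'MAR', ' mar ' and 'Mar' all match.
    key = abbreviation.strip().title()
    return MONTH_NUMBERS[key]  # raises KeyError for unknown abbreviations


print(month_to_number("Mar"))
```

Unlike strptime with '%b', a plain dictionary does not depend on the locale of the machine running the script.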
I am really struggling with a programming task I have been handed. I have been asked to read from a CSV file that lists the names of beaches and a rating for each, and then work out the average rating. Any help I could get with this would be great. Also, I am at a beginner level, so please don't judge.
For reading CSV files, check out the csv module in Python's standard library (https://docs.python.org/3.7/library/csv.html).
That documentation has enough examples to guide you through your assignment.
As for taking sums and averages, Python's built-in functions (https://docs.python.org/3.7/library/functions.html), such as sum() and len(), are all you need.
If you are stuck with any of them, feel free to add comments.
Good luck!
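To make the pointers above a little more concrete, here is a minimal sketch. It assumes the file has a header row with columns named "beach" and "rating"; those names, the function average_rating, and the sample data are all made up, since the question does not show the real file.

```python
import csv
import io


def average_rating(csv_file, rating_column="rating"):
    """Average the values in one column of an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    ratings = [float(row[rating_column]) for row in reader]
    # An empty file would raise ZeroDivisionError here.
    return sum(ratings) / len(ratings)


# Made-up sample data standing in for the real file.
sample = io.StringIO("beach,rating\nNorth Bay,4\nSandy Cove,5\nRocky Point,3\n")
print(average_rating(sample))
```

With a real file, the call would be something like `with open("beaches.csv", newline="") as f: print(average_rating(f))`, where the filename is again a placeholder.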
I have many files with three million lines in identical tab delimited format. All I need to do is divide the number in the 14th "column" by the number in the 12th "column", then set the number in the 14th column to the result.
Although this is a very simple operation, I'm really struggling to work out how to achieve it. I've spent a good few hours searching this website, but unfortunately the answers I've seen have gone completely over my head, as I'm a novice coder!
The tools I have are Notepad++, UltraEdit (which can run JavaScript, although I'm not familiar with it), and Python 3.6 (I have very basic Python knowledge). Other answers have suggested using something called "awk", but when I looked this up it seems to need Unix, and I only have Windows. What's the best tool for getting this done? I'm more than willing to learn something new.
In Python there are a few ways to handle CSV files. For your particular use case, I think pandas is what you are looking for.
You can load your file with df = pandas.read_csv(); for a tab-delimited file, pass sep='\t', and pass header=None if there is no header row, so that the columns are numbered from 0. Performing your division and replacement is then as easy as df[13] /= df[11].
Finally, you can write your data back out in the same format with df.to_csv().
I leave it to you to fill in the missing details of the pandas functions, but I promise it is very easy and you'll probably benefit from learning it for a long time.
Hope this helps
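If installing pandas is not an option, the same division can be sketched with plain Python, streaming one line at a time so that a three-million-line file never has to fit in memory. This is an alternative to the pandas approach above, not a version of it. The function name divide_columns is made up, and the sketch assumes there is no header row, every row has at least 14 tab-separated fields, and the 12th column is never zero.

```python
import io


def divide_columns(src, dst, numerator_col=13, denominator_col=11):
    """Copy tab-delimited lines from src to dst, replacing the numerator
    column (0-based index 13, the 14th column) with numerator / denominator."""
    for line in src:
        fields = line.rstrip("\n").split("\t")
        result = float(fields[numerator_col]) / float(fields[denominator_col])
        fields[numerator_col] = str(result)
        dst.write("\t".join(fields) + "\n")


# Made-up one-line demo: column 12 holds 4 and column 14 holds 10.
row = ["0"] * 14
row[11], row[13] = "4", "10"
demo_out = io.StringIO()
divide_columns(io.StringIO("\t".join(row) + "\n"), demo_out)
print(demo_out.getvalue())
```

For real files, src and dst would be two files opened with open(), reading from the original and writing to a new file rather than overwriting in place.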
Forgive me if my question is too general, or if it's been asked before. I've been tasked with manipulating several large datasets in Python 3 (e.g. copying several ranges of entries, performing calculations on them, and then saving them all to a new CSV file).
What are the pros/cons of using the aforementioned libraries?
Thanks in advance.
I have not used the csv library, but many people are enjoying the benefits of pandas. Pandas provides a lot of the tools you'll need and is built on NumPy. From there you can easily move on to more advanced libraries for all sorts of analysis (sklearn for machine learning, nltk for NLP, etc.).
For your purposes, you'll find it easy to manage different CSVs: merge them, concatenate them, do whatever you want really.
Here's a link to a quick start guide for getting started with pandas: http://pandas.pydata.org/pandas-docs/stable/10min.html
Lots of other resources are out there as well.
Hope that helps a little bit.
You should always try to reuse as much as possible of the work other people have already done for you (such as writing the pandas library). This saves you a lot of time. Pandas has a lot to offer when you want to process such files, so it seems to me to be the best way to deal with them. Since the question is very general, I can only give a general answer. When you use pandas, you will need to spend more time reading the documentation, but I would not say that this is a downside.
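As a rough illustration of the kind of workflow described in the question (combine data, calculate, save to a new CSV), a pandas sketch might look like this. It assumes pandas is installed; the column names and the sample data are invented for the example.

```python
import io

import pandas as pd

# Two made-up datasets standing in for the real CSV files.
jan = pd.read_csv(io.StringIO("item,price,qty\napple,2,10\npear,3,4\n"))
feb = pd.read_csv(io.StringIO("item,price,qty\napple,2,7\nplum,5,1\n"))

# Stack the rows of both datasets into one table.
combined = pd.concat([jan, feb], ignore_index=True)

# Perform a calculation on whole columns at once.
combined["total"] = combined["price"] * combined["qty"]

# Select a subset of rows and serialise it as CSV text.
expensive = combined[combined["total"] > 10]
csv_text = expensive.to_csv(index=False)
print(csv_text)
```

Passing a filename to to_csv() instead of leaving it empty writes the result to disk.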
I have a lot of data stored in generators, and I would like to sort it without using lists, so that I don't run out of memory in the process. Is it possible to sort generators this way? I have spent some hours thinking about this and I can't find a way to do it without saving the seen values somewhere (or is there a way of saving them "partially"?). I have read on Google about lazy sorting; is that a good approach? Thanks for the answers!!
EDIT: My final objective is to write all the sorted data to a file.
PS: sorry about my bad English ><
You should just write the data to your output file in non-sorted order, then sort it on the filesystem. If you're on Linux this is easily and very efficiently done using sort(1). Or if you want to do it within Python, try csvsort which is specifically designed for this.
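For a pure-Python alternative to the suggestions above, one standard technique is an external merge sort: sort fixed-size chunks in memory, spill each sorted chunk to a temporary file, then merge the files lazily with heapq.merge. This is a sketch only; the name external_sort and the default chunk_size are arbitrary, and it assumes every item is a string ending in a newline.

```python
import heapq
import itertools
import tempfile


def external_sort(lines, chunk_size=100_000):
    """Yield newline-terminated strings from `lines` in sorted order,
    holding at most `chunk_size` of them in memory at once."""
    chunk_files = []
    iterator = iter(lines)
    try:
        while True:
            # Sort one bounded chunk in memory.
            chunk = sorted(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            # Spill the sorted chunk to its own temporary file.
            spill = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="")
            spill.writelines(chunk)
            spill.seek(0)
            chunk_files.append(spill)
        # Each file iterates in sorted order, so a lazy k-way merge suffices.
        yield from heapq.merge(*chunk_files)
    finally:
        for spill in chunk_files:
            spill.close()


numbers = (f"{n}\n" for n in [5, 3, 9, 1, 4])
print(list(external_sort(numbers, chunk_size=2)))
```

The result can be passed straight to the writelines() method of the output file. Note that the comparison is on strings, and that a very small chunk_size on huge inputs can hit the operating system's limit on open files.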
I'm using the Python dateutil module for a calendaring application which supports repeating events. I really like the ability to parse iCal rrules using the rrulestr() function. Also, using rrule.between() to get dates within a given interval is very fast.
However, as soon as I try any other operations (e.g. list slices, before(), after(), ...) everything slows to a crawl. It seems like dateutil tries to calculate every date even if all I want is the last date, via rrule.before(datetime.max).
Is there any way of avoiding these unnecessary calculations?
My guess is probably not. Finding the last date before datetime.max means calculating every recurrence up until datetime.max, and that will be a LOT of recurrences. It might be possible to add shortcuts for some of the simpler rules. If the event falls on the same date every year, for example, you don't really need to compute the recurrences in between. But for a rule like "every third something", or one with a maximum number of recurrences, you must. I guess dateutil doesn't have these shortcuts; they would probably be quite complex to implement reliably.
May I ask why you need to find the last recurrence before datetime.max? It is, after all, almost eight thousand years into the future... :-)
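To illustrate the kind of shortcut described above for the simplest case, here is a hand-rolled sketch for a plain yearly recurrence. It is not part of dateutil; the function name last_yearly_before is made up, and it assumes the event repeats forever on the same calendar date and does not fall on 29 February.

```python
from datetime import datetime


def last_yearly_before(start, limit):
    """Return the last occurrence strictly before `limit` of an event that
    repeats every year from `start`, or None if there is none."""
    if limit <= start:
        return None
    # Jump straight to the limit's year instead of stepping year by year.
    candidate = start.replace(year=limit.year)
    if candidate >= limit:
        candidate = start.replace(year=limit.year - 1)
    return candidate


first_event = datetime(2020, 3, 15, 9, 0)
print(last_yearly_before(first_event, datetime.max))
```

This takes constant time regardless of how far away the limit is, which is exactly what is hard to generalise to arbitrary rules.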