Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I want to write a Python 3 script to manage my expenses, and I'm going to have a rules filter that says 'if the description contains a particular string, categorize it as x', and these rules will be read in from a text file.
The only way I can think of doing this is to apply str.find() for each rule to the description of each transaction and break as soon as one matches, but that costs O(rules × transactions) comparisons. Is there a better way of doing this?
Strip punctuation from the description and split it into words. Make the words in the description into one set, and the rule keywords (the strings that map to categories) into another set.
Since sets, like dictionaries, are built on hash tables, average membership checking is O(1).
Only when a transaction is entered (or changed), intersect both sets to find the categories that apply (if any), and add the categories to your transaction record (dict, namedtuple, whatever).
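A minimal sketch of the set-based approach described above. The rules dict, its keywords, and the function names are made up for illustration; in practice the rules would be read from the text file. Note that this matches whole words only, so a rule that is a phrase or a fragment of a word would still need a substring check.

```python
import string

# Hypothetical rules (keyword -> category), standing in for the rules file.
RULES = {"tesco": "groceries", "shell": "fuel", "netflix": "entertainment"}

# Translation table that turns every punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def words_in(description):
    """Return the lowercase words of a description as a set."""
    return set(description.lower().translate(_PUNCT_TO_SPACE).split())


def categorize(description, rules=RULES):
    """Return the sorted categories whose keyword appears as a word."""
    matched = rules.keys() & words_in(description)
    return sorted(rules[word] for word in matched)


print(categorize("TESCO STORES 1234, LONDON"))
```

Calling categorize() once when a transaction is entered or changed, and storing the result on the record, avoids re-running the rules on every report.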
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a question on how to add missing values to a dataset object.
I'm currently working on crop growth modeling, and I use the NASA Power API as my weather data source.
However, the NASA Power dataset has missing days.
I used the pcse library to retrieve the NASA Power dataset.
My question is: how can I add data for the missing days?
I tried
wdp(date) = wdp(date-timedelta(days=1))
but it gives me the error 'can't assign to function call'.
In any case, it seems that the data for the missing date does not exist in the object, and I am not allowed to create it.
You have the right idea, but the wrong syntax. In Python, list and dict access uses square brackets ([]), see the docs.
To add to that, pcse's WeatherDataProvider object does not support this style of access. Looking at its source code, there appears to be a method you can call named _store_WeatherDataContainer. The leading _ indicates it is not intended for public use, but that doesn't mean you can't :-)
It should look like this:
wdp._store_WeatherDataContainer(wdp(date-timedelta(days=1)), date)
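The underlying idea is a forward fill: every missing day reuses the previous day's record. The sketch below shows that logic on a plain dict keyed by date. The dict and the forward_fill helper are stand-ins for illustration, not part of pcse; with a real provider, the assignment inside the loop would be replaced by the _store_WeatherDataContainer call shown above.

```python
from datetime import date, timedelta


def forward_fill(records, start, end):
    """Return a copy of records in which every missing day between start
    and end (inclusive) reuses the value of the day before it."""
    filled = dict(records)
    day = start
    while day <= end:
        previous = day - timedelta(days=1)
        if day not in filled and previous in filled:
            filled[day] = filled[previous]
        day += timedelta(days=1)
    return filled


# Toy data: 2 and 3 January are missing.
records = {date(2020, 1, 1): 10.0, date(2020, 1, 4): 13.0}
filled = forward_fill(records, date(2020, 1, 1), date(2020, 1, 4))
print(sorted(filled.items()))
```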
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a large data file where each row looks as follows. Each pipe-delimited value represents a consistent variable (e.g. 1517892812 and 1517892086 are Unix timestamps, and the last pipe-delimited value will always be UnixTimestamp).
264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086
How can I pull out the values I need to make variables in Python? For example, looking at a row and getting UnixTimestamp=1517892812 (and other variables) out of it.
I want to pull out each relevant variable per line, work with them, and then look at the next line and reevaluate all of the variable values.
Is RegEx what I should be dealing with here?
No need for regex, you can use split():
int(a.strip().split('|')[-1])
If all the variables are numeric and you want a matrix with all your values, you can do something like the following (float() rather than int(), since some fields contain decimals):
[[float(value) for value in line.strip().split('|')] for line in your_data.splitlines()]
You can use regex and re.search():
int(re.search(r'[^|]+$', text).group())
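A short sketch of the split() approach applied row by row. Only the last field is known to be the Unix timestamp; treating the remaining fields as floats, and the name parse_row, are assumptions, since the question does not name the other variables.

```python
def parse_row(line):
    """Split one pipe-delimited row into (values, unix_timestamp)."""
    *values, timestamp = line.strip().split('|')
    return [float(value) for value in values], int(timestamp)


sample = """264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086"""

# Re-evaluate the variables for each row in turn.
for line in sample.splitlines():
    values, unix_timestamp = parse_row(line)
    print(unix_timestamp, values)
```

For a real file, iterating over the open file object instead of sample.splitlines() processes one row at a time without loading everything into memory.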
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
What I am trying to do is add Multiple Choice Question (MCQ) generation to our fill-in-the-gap style question generator. I need to generate distractors (wrong answers) from the key (correct answer). The MCQ is generated from educational texts that users input. We're trying to tackle this by combining contextual similarity, similarity of the sentences in which the keys and the distractors occur, and difference in term frequencies. Any help? I was thinking of using big datasets to generate related distractors, such as the ones provided by Google Vision, but I have no clue how to achieve this in Python.
This question is way too broad to be answered, though I will do my best to give you some pointers.
If you have a closed set of potential distractors, I would use word/phrase embeddings to find the distractor closest to the right answer.
Gensim's word2vec is a good starting point in Python.
If you want your distractors to follow a template, for example replacing a certain word from the right answer with its opposite, I would use nltk's wordnet implementation to find antonyms / synonyms.
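To illustrate the "closest distractor" idea without depending on a trained model, here is a sketch that ranks candidates by cosine similarity. The three-dimensional vectors are invented for the example; real vectors would come from an embedding model such as word2vec, and rank_distractors is a hypothetical helper name.

```python
import math

# Made-up toy vectors for illustration only, not real embeddings.
TOY_VECTORS = {
    "paris": [0.9, 0.1, 0.0],
    "berlin": [0.85, 0.15, 0.05],
    "london": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_distractors(answer, candidates, vectors=TOY_VECTORS):
    """Sort candidates from most to least similar to the correct answer."""
    return sorted(
        candidates,
        key=lambda word: cosine(vectors[answer], vectors[word]),
        reverse=True,
    )


print(rank_distractors("paris", ["banana", "london", "berlin"]))
```

Candidates near the top are plausible distractors; ones that are too close may be synonyms of the key, so a similarity ceiling may also be needed.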
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am trying to use phonetic algorithms like Soundex and/or Metaphone to generate words that sound similar to a given dictionary word. Do I have to have a corpus of all dictionary words for doing that? Is there another way to generate words that sound similar to a given word without using a corpus? I am trying to do it in Python.
If you don't use a corpus, then you will probably have to manually define a set of rules to split a word in phonetic parts and then find the list of close phonemes. This can generate similar sounding words but most won't exist. If you want to generate close sounding words that exist, then you necessarily need a corpus.
You didn't specify the goal of your task, but you may be interested in the works of Will Leben, "Sounder I" (and II and III), and in Jabberwocky sentences.
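With a corpus, finding existing sound-alike words comes down to computing a phonetic code for every word and keeping those that share the target's code. Below is a sketch using a simplified American Soundex; the tiny word list and the sound_alikes helper are placeholders, and a real dictionary word list would take the place of the corpus.

```python
# Soundex digit for each coded consonant; vowels, h, w and y have no code.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run
    return (result + "000")[:4]


def sound_alikes(word, corpus):
    """Return corpus words (other than word itself) with the same code."""
    target = soundex(word)
    return [w for w in corpus
            if w.lower() != word.lower() and soundex(w) == target]


print(sound_alikes("Robert", ["Rupert", "Rabbit", "Robert", "Table"]))
```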
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have been given a set of 20,000 entries in Excel. Each entry is a string, and they are all names of events such as: Daytona 500, NASCAR, 3x1 Brand Rep, etc.
Many of the event names are repeated, and I would like to make a list and sort them and find the most common items in the list, and how many times each one is entered. I am half way through my first semester of Python and have just learned about lists, and would like to use Python 2.7 to do this task, but I am also open to using Excel or R if it makes more sense to use one of these.
I'm not sure where to start or how to input such a large list into a program.
In Excel I would use a PivotTable, which takes about 15 seconds to set up.
In Python, start with the event names in a list:
your_list = ['Daytona 500', 'NASCAR'] # more values of course
Now use a dictionary comprehension to count items for each unique key.
your_dict = {i:your_list.count(i) for i in set(your_list)}
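The comprehension above calls your_list.count() once per unique name, so it rescans the whole list each time. As an alternative sketch, collections.Counter (available in Python 2.7 and 3) counts everything in a single pass and can sort by frequency. The sample list here is invented; the real names could be loaded, for example, from the Excel sheet saved as a text file with one name per line.

```python
from collections import Counter

# Invented sample standing in for the 20,000 event names.
events = [
    "Daytona 500", "NASCAR", "Daytona 500",
    "3x1 Brand Rep", "NASCAR", "Daytona 500",
]

counts = Counter(events)

# most_common(n) returns the n most frequent names with their counts,
# most frequent first.
for name, count in counts.most_common(2):
    print(name, count)
```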