Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have been given a set of 20,000 entries in Excel. Each entry is a string, and they are all names of events such as: Daytona 500, NASCAR, 3x1 Brand Rep, etc.
Many of the event names are repeated, and I would like to make a list and sort them and find the most common items in the list, and how many times each one is entered. I am half way through my first semester of Python and have just learned about lists, and would like to use Python 2.7 to do this task, but I am also open to using Excel or R if it makes more sense to use one of these.
I'm not sure where to start or how to input such a large list into a program.
In Excel I would use a PivotTable; it takes about 15 seconds to set up.
your_list = ['Daytona 500', 'NASCAR'] # more values of course
Now use a dictionary comprehension to count items for each unique key.
your_dict = {i:your_list.count(i) for i in set(your_list)}
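Calling count() once per unique name rescans the whole list each time. As an alternative, collections.Counter from the standard library tallies everything in a single pass and can report the most common entries directly. A minimal sketch, assuming the event names were saved from Excel as a one-column CSV file (the helper load_events and its path argument are hypothetical, not part of the question):

```python
import csv
from collections import Counter

def load_events(path):
    # Hypothetical helper: reads the first column of a CSV file saved from Excel.
    # Written for Python 3; on Python 2.7, open the file with mode 'rb' instead.
    with open(path, newline='') as f:
        return [row[0].strip() for row in csv.reader(f) if row]

# Small sample standing in for the 20,000 entries.
your_list = ['Daytona 500', 'NASCAR', 'Daytona 500',
             '3x1 Brand Rep', 'Daytona 500', 'NASCAR']

counts = Counter(your_list)        # one pass over the list
top_two = counts.most_common(2)    # (name, count) pairs, highest count first

for name, n in counts.most_common():
    print('%s: %d' % (name, n))
```

most_common() with no argument returns every name ordered from most to least frequent, which covers both the sorting and the counting asked for in the question.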
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
name=str(input("enter the string:"))
count=0
for x in name:
    if x.isupper():
        count=count+1
print("The number of capital letters found in the string is:",count)
How can I rewrite this code without a for loop so that it does the same thing?
Since this seems like a homework problem, it's probably not appropriate to just post an answer. To give you some hints:
you could re-write the for loop as a while loop that uses a counter
you could re-write the for loop as a while loop that pops characters off of name one-at-a-time, and terminates when name is empty
you could use a list comprehension with a filter to get just the upper-case characters, and report the length of the resulting list
you could write a recursive function
you could use filter the same way you would use a list comprehension
you could use sum, as suggested in comments above
you could use functools.reduce (or just reduce if you're using a geriatric python interpreter)
if you're feeling really perverse, you could use regular expressions
Along with probably a dozen other ways that I'm not thinking of now...
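To make a few of these hints concrete, here is a rough sketch of the sum, filter, reduce, recursive, and regular-expression variants. The function names are made up for illustration, and none of this is presented as the intended homework answer. Note that the sum version still contains the keyword for inside a generator expression, even though it has no for statement.

```python
import re
from functools import reduce

def count_upper_sum(text):
    # True counts as 1 and False as 0 when summed.
    return sum(ch.isupper() for ch in text)

def count_upper_filter(text):
    # filter() keeps only the characters for which str.isupper is true.
    return len(list(filter(str.isupper, text)))

def count_upper_reduce(text):
    # Fold over the characters, carrying a running total that starts at 0.
    return reduce(lambda total, ch: total + ch.isupper(), text, 0)

def count_upper_recursive(text):
    # Base case: an empty string has no capitals.
    if not text:
        return 0
    return text[0].isupper() + count_upper_recursive(text[1:])

def count_upper_regex(text):
    # [A-Z] matches ASCII capitals only, unlike str.isupper().
    return len(re.findall(r'[A-Z]', text))
```

The recursive version will hit Python's recursion limit on very long strings, so it is more of a curiosity than a practical choice.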
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a CSV in which a student appears on multiple lines. The goal is to obtain a CSV where each student's name appears only once, with a new "Sports" column that lists all the sports practiced by that student, separated by spaces.
[Images: "csv" (input) and "final csv" (desired output)]
I'm not going to post a full solution, as this sounds like a homework problem. If this is in fact for a school assignment, please edit your question to include that information.
From your description, the problem can be broken into three steps, each of which can be independently written as code in your solution.
1. Parse a CSV file.
2. Create a new data structure that reduces the number of rows and adds a new column.
3. Output the data to a new CSV file.
Steps 1 and 3 are the simplest. You will want to use things like with open('file', 'r'), str.split(), and ",".join().
For step 2, the problem is easier to understand if you think in terms of dictionaries. If you can turn your original data (which is a list of rows) into a dictionary of rows, then it becomes easier to detect duplicates. Every key in a dictionary must be unique, and you already know that you have a key (student name) that you would like to be unique, but isn't.
Your code for step 2 will iterate over the list of rows, adding each one to a dictionary using student_name as a unique key. If that key already exists, then instead of adding a new entry, you will need to modify the existing entry's "sports" field.
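The three steps could look roughly like the sketch below. It is an illustration under assumptions, not a finished solution: the column names "Name" and "Sport" are guesses (the real headers are not shown in the question), the function name merge_sports is hypothetical, and the standard csv module is used in place of manual split() and join() calls. In-memory text stands in for the real input and output files.

```python
import csv
import io

def merge_sports(rows):
    # rows: iterable of dicts with "Name" and "Sport" keys (assumed column names).
    merged = {}
    for row in rows:
        name = row["Name"]
        if name in merged:
            # Key already exists: extend the existing entry instead of adding a row.
            merged[name].append(row["Sport"])
        else:
            merged[name] = [row["Sport"]]
    return merged

# Step 1: parse the CSV (sample data in place of a real file).
source = io.StringIO("Name,Sport\nAna,Tennis\nBen,Golf\nAna,Swimming\n")
merged = merge_sports(csv.DictReader(source))

# Step 3: write one row per student, sports joined by spaces.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Name", "Sports"])
for name, sports in merged.items():
    writer.writerow([name, " ".join(sports)])
result = out.getvalue()
```

With real files, the two io.StringIO objects would be replaced by files opened with open(..., newline='').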
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a large data file where each row looks as follows, where each pipe-delimited value represents a consistent variable (i.e. 1517892812 and 1517892086 represent the Unix Timestamp, and the last pipe delimited object will always be UnixTimestamp)
264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086
How can I pull out the values I need to make variables in Python? For example, looking at a row and getting UnixTimestamp=1517892812 (and other variables) out of it.
I want to pull out each relevant variable per line, work with them, and then look at the next line and reevaluate all of the variable values.
Is RegEx what I should be dealing with here?
No need for regex, you can use split():
int(a.strip().split('|')[-1])
If all the variables are numbers and you want a matrix with all your values, you can simply do something like this (float rather than int, because some of the fields contain decimals):
[[float(v) for v in line.strip().split('|')] for line in your_data.splitlines()]
You can use regex and re.search():
int(re.search(r'[^|]+$', text).group())
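To work with the variables one line at a time, as the question describes, the split() approach can be wrapped in a small function. This is only a sketch: the question names just the last field (UnixTimestamp), so the other eight fields are left as an unnamed list of strings, and parse_line is a hypothetical helper name.

```python
sample = """264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086"""

def parse_line(line):
    fields = line.strip().split('|')
    # Only the last field is named in the question; the rest stay as raw strings.
    *others, unix_timestamp = fields
    return others, int(unix_timestamp)

timestamps = []
for line in sample.splitlines():
    others, unix_timestamp = parse_line(line)
    # Work with the values for this line here, then move on to the next line.
    timestamps.append(unix_timestamp)
```

For a real data file, the loop would iterate over the open file object instead of sample.splitlines().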
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have a loop like this:
for i in range(0,500):
But the rest of it takes a lot of time. I want to split my loop into, for instance, 5 steps: in the first step I want to run the first 100 iterations, and in the last, 401 to 500. But I don't want to write this loop five times.
Is there a short way to do this kind of progressive run?
Just create a loop inside a loop:
for s in range(0, 500, 100):
    for i in range(s, s+100):
        ...
Since indices in Python start at 0 and the range does not include its last number, this covers 0-99, 100-199, ..., 400-499.
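The nested-loop idea can also be wrapped in a small generator so that the step boundaries live in one place. A sketch for Python 3, where chunked_ranges is a made-up name and the append call is a placeholder for the real per-item work:

```python
def chunked_ranges(total, step):
    # Yields one range per step: 0-99, 100-199, ... for total=500, step=100.
    for start in range(0, total, step):
        yield range(start, min(start + step, total))

processed = []
for chunk in chunked_ranges(500, 100):
    for i in chunk:
        processed.append(i)  # placeholder for the real per-item work
    # Code placed here runs once per step, e.g. to report progress.
    print("finished", chunk.start, "to", chunk.stop - 1)
```

The min() call keeps the last chunk from running past the end when the total is not an exact multiple of the step.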
If time is what you are trying to trim, use xrange(): it produces values on demand instead of building the whole list first, which helps most when dealing with large numbers:
for i in xrange(500):
Edit: This is for Python 2.x, not 3.x! In Python 3, xrange() was removed and range() itself is lazy.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I want to write a Python 3 script to manage my expenses, and I'm going to have a rules filter that says 'if the description contains a particular string, categorize it as x', and these rules will be read in from a text file.
The only way I can think of doing this is to apply str.find() for each rule on the description of each transaction, and break if one is found, but this is an O(n^2) solution. Is there a better way of doing this?
Strip punctuation from the description, and split it into words. Make the words in the description into a set, and the categories into another set.
Since sets, like dictionaries, are built on hash tables, average membership checking is O(1).
Only when a transaction is entered (or changed), intersect both sets to find the categories that apply (if any), and add the categories to your transaction record (dict, namedtuple, whatever).
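A minimal sketch of this set-based approach follows, with several assumptions: each rule is taken to be a single keyword mapped to a category, the rules dictionary below is invented sample data standing in for the contents of the text file, and categorize is a hypothetical function name.

```python
import string

# Hypothetical rules: keyword -> category (in practice, read from the rules file).
rules = {"coffee": "Eating out", "uber": "Transport", "rent": "Housing"}
rule_words = set(rules)

def categorize(description):
    # Lower-case, strip punctuation, and split the description into a set of words.
    cleaned = description.lower().translate(str.maketrans("", "", string.punctuation))
    words = set(cleaned.split())
    # The intersection picks out every rule keyword present in the description.
    return {rules[w] for w in words & rule_words}
```

For example, categorize("UBER *TRIP, coffee shop!") gives a set containing "Transport" and "Eating out". This matches whole words only, so unlike str.find() it would not catch a rule string embedded inside a longer word or a rule made of several words.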