I have a TEXT FILE that looks like:
John: 27
Micheal8483: 160
Mary Smith: 57
Adam 22: 68
Patty: 55
etc etc. They are usernames, which is why their names occasionally contain numbers. What I want to do is check each of their numbers (the ones after the ":") and get the 3 names whose numbers are closest in value to an integer (specifically named targetNum). It will always be positive.
I have tried multiple things but I am new to Python and I am not really sure how to go about this problem. Any help is appreciated!
You can parse the file into a list of name/number pairs. Then sort the list by difference between a number and targetNum. The first three items of the list will then contain the desired names:
users = []
with open("file.txt") as f:
    for line in f:
        name, num = line.split(":")
        users.append((name, int(num)))
targetNum = 50
users.sort(key=lambda pair: abs(pair[1] - targetNum))
print([pair[0] for pair in users[:3]]) # ['Patty', 'Mary Smith', 'Adam 22']
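If only the three closest names are needed, a full sort is not strictly required. The following is a minimal sketch using heapq.nsmallest, assuming the file contents are already available as a list of lines. The helper name closest_names and the sample lines list are made up for illustration, and splitting on the last colon only (rsplit) is an extra assumption in case a username ever contains a ":".

```python
import heapq

def closest_names(lines, target, count=3):
    """Return the `count` names whose numbers are closest to `target`."""
    users = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split on the last colon only, in case a username contains ":".
        name, num = line.rsplit(":", 1)
        users.append((name.strip(), int(num)))
    # nsmallest keeps only `count` candidates instead of sorting everything.
    nearest = heapq.nsmallest(count, users, key=lambda pair: abs(pair[1] - target))
    return [name for name, _ in nearest]

# Hypothetical sample data mirroring the file in the question.
lines = ["John: 27", "Micheal8483: 160", "Mary Smith: 57", "Adam 22: 68", "Patty: 55"]
print(closest_names(lines, 50))  # ['Patty', 'Mary Smith', 'Adam 22']
```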
You could use a regex here:
import re
pattern = r'(\w.+)?:\s(\d+)'
data_1 = []
targetNum = 50
with open('new_file.txt', 'r') as f:
    for line in f:
        data = re.findall(pattern, line)
        for i in data:
            # store (distance from targetNum, name) so the tuples sort by distance
            data_1.append((abs(int(i[1]) - targetNum), i[0]))
data_1.sort()
print(list(map(lambda x: x[1], data_1[:3])))
output:
['Patty', 'Mary Smith', 'Adam 22']
I'm trying to read a file which contains both names and numbers but I don't know how to convert the numbers to integers because they are on the same line with a string. I also want to sort it with the numbers.
This is like a scoreboard system and I want to print the top 5 players' scores and their names. Their scores are appended to a text file after they play a game, and that file is then read. The text file will have many more players' scores and names.
the file looks like this (we will call it data.txt):
Jack, 14
Amy, 2
Rock, 58
Jammy, 44
Once read, this is what the list looks like:
['Jack, 14', 'Amy, 2', 'Rock, 58', 'Jammy, 44']
This is what I have done so far:
file = open("data.txt", "r")
Data = file.readlines()
Data = [line.rstrip('\n') for line in Data]
I have tried these:
Data.sort(key = lambda x: x[1])
Data = list(map(int, Data))
However, it shows an error because there is also a string on the same line which I don't want to be converted to an integer.
What I'm hoping to be output is:
Amy, 2
Jack, 14
Jammy, 44
Rock, 58
I just want to know how to sort by the numbers (scores) in ascending order with a newline.
You could use the following key when sorting:
data = ['Jack, 14', 'Amy, 2', 'Rock, 58', 'Jammy, 44']
data.sort(key=lambda s: int(s.split(',')[1].strip()))
print(data)
Output
['Amy, 2', 'Jack, 14', 'Jammy, 44', 'Rock, 58']
The idea is to split the string on ',', then remove any surrounding whitespace from the second element and convert it to an int.
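Since the question asks for one entry per line, the sorted list can be joined with newlines before printing. This is a small sketch of that step; the helper name score_of and the variable names ordered and output are made up for illustration.

```python
data = ['Jack, 14', 'Amy, 2', 'Rock, 58', 'Jammy, 44']

def score_of(entry):
    # 'Jack, 14' -> ['Jack', ' 14'] -> '14' -> 14
    return int(entry.split(',')[1].strip())

ordered = sorted(data, key=score_of)  # ascending by score
output = '\n'.join(ordered)           # one "name, score" entry per line
print(output)
```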
It's probably best to store the names and scores as pairs rather than as a single string, and then sort on the score (the second item of each pair):
from operator import itemgetter as ig
sorted_pairs = sorted(((pair[0], int(pair[1])) for pair in (line.split(', ') for line in file.readlines())), key = ig(1))
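With (name, score) pairs, the "top 5" scoreboard mentioned in the question becomes a sort by score in descending order followed by a slice. A minimal sketch, assuming the lines look like what readlines() would return for data.txt (the lines list here is hard-coded sample data, and the names pairs and top_five are made up for illustration):

```python
from operator import itemgetter

# Sample data shaped like the result of file.readlines().
lines = ['Jack, 14\n', 'Amy, 2\n', 'Rock, 58\n', 'Jammy, 44\n']

pairs = []
for line in lines:
    name, score = line.split(',')
    # int() ignores the leading space and trailing newline around the digits.
    pairs.append((name.strip(), int(score)))

# Highest score first, at most five entries.
top_five = sorted(pairs, key=itemgetter(1), reverse=True)[:5]
for name, score in top_five:
    print('{}, {}'.format(name, score))
```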
This is a Python program written in a PySpark IPython notebook. I am trying to count the number of instances of the words given in the list 'names' in each RDD (which can be considered a file) using a for loop. I want to store the counts of a word across the files in a list named after that word.
For example, suppose the count of the word harry in the 1st RDD is 1214, in the 2nd RDD is 1506, and so on. I want to create a list
harryList = [1214, 1506, 1825, 2933, 3748, 2617, 2887]
the list of names is dynamic.
names = ['harry', 'hermione','ron','hagrid']
rdds = [hp1RDD,hp2RDD,hp3RDD,hp4RDD,hp5RDD,hp6RDD,hp7RDD]
for n in names:
    a = []
    for x in rdds:
        a.append(x.flatMap(lambda line: line.split(" ")).filter(lambda word: word==n).count())
    print a
with code above I can print the contents of list but I cannot save it the way shown above.
If you don't mind words like hagrid's being counted independently from hagrid, using collections.Counter will help:
from collections import Counter
hp1RDD = "harry potter has a girlfriend who's name is hermione granger and a friend called ron. harry has an uncle who's name is hagrid. hagrid is a big guy"
hp2RDD = "harry potter is the best movie I've ever saw. hermione is very beautfiful"
names = ['harry', 'hermione','ron','hagrid']
rdds = [hp1RDD, hp2RDD]
results = dict()
for name in names:
    tmp_list = list()
    for rdd in rdds:
        count = Counter(rdd.split())
        tmp_list.append(count[name])
    results[name] = tmp_list
print(results)
Also, you could use case-insensitive version just by using lower():
count = Counter([x.lower() for x in rdd.split()])
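A variation on the same idea is to build one Counter per text up front and then look every name up in it, trimming punctuation so that a token such as "ron." still counts as ron. This is only a sketch under stated assumptions: plain strings stand in for the RDDs (it is not PySpark code), and the texts list and the counters name are made up for illustration.

```python
import string
from collections import Counter

# Plain strings standing in for the RDDs; hypothetical sample text.
texts = [
    "harry has a friend called ron. harry also knows hagrid.",
    "hermione met Harry, and ron met hermione",
]
names = ['harry', 'hermione', 'ron', 'hagrid']

# One Counter per text: lower-case each token and trim punctuation at its edges.
counters = [
    Counter(word.strip(string.punctuation).lower() for word in text.split())
    for text in texts
]

# Map each name to its list of per-text counts.
results = {name: [counter[name] for counter in counters] for name in names}
print(results)
```

Because the names are dictionary keys, the list stays dynamic: results['harry'] plays the role of harryList without creating a variable per name.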
I have a simple problem: I created a game, and at the end I append the score to a text file. Now I have something like this in the file:
John: 11
Mike: 5
John: 78
John: 3
Steve: 30
I want to give the user the possibility to read the top 3 scores. So far I have created this:
with open(r'C:/path/to/scores.txt', 'r') as f:
    for line in f:
        data = line.split()
        print '{0[0]:<15}{0[1]:<15}'.format(data)
I have this:
John: 11
Mike: 5
John: 78
John: 3
Steve: 30
It looks better, but how can I show only the three best results, with their place and the highest first?
Something like this:
1. John: 78
2. Steve: 30
3. John: 11
You can edit your code a little bit to store the scores in a list, then sort them using the sorted function. Then you can just take the first three scores of your sorted list.
with open(r'doc.txt', 'r') as f:
    scores = []
    for line in f:
        data = line.split()
        scores.append(data)
top3 = sorted(scores, key = lambda x: int(x[1]), reverse=True)[:3]
for score in top3:
    print('{0[0]:<15}{0[1]:<15}'.format(score))
As in my answer to a very similar question, the answer could be to just use sorted; slicing the result to get only the top three scores is trivial.
That said, you could also switch to using heapq.nlargest over sorted in this case; it takes a key function, just like sorted, and unlike sorted, it only uses memory to store the top X items (and has better theoretical performance when the set to extract from is large and the number of items to keep is small). Aside from not needing reverse=True (because choosing nlargest already does that), heapq.nlargest is a drop-in replacement for sorted in that case.
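A minimal sketch of the heapq.nlargest variant, assuming the score lines have already been read into a list (the lines list is hard-coded sample data from the question, and the place numbering via enumerate is an addition for illustration):

```python
import heapq

# Sample data shaped like the lines of the score file.
lines = ["John: 11", "Mike: 5", "John: 78", "John: 3", "Steve: 30"]

scores = [line.split() for line in lines]  # e.g. ['John:', '11']

# nlargest returns the three highest entries, highest first.
top3 = heapq.nlargest(3, scores, key=lambda x: int(x[1]))

for place, (name, score) in enumerate(top3, start=1):
    print('{}. {} {}'.format(place, name, score))
```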
Depending on what else you might want to do with the data I think pandas is a great option here. You can load it into pandas like so:
import pandas as pd
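rows = []
with open(r'C:/path/to/scores.txt', 'r') as f:
    for line in f:
        data = line.split()
        # convert the score to int so it sorts numerically, not as text
        rows.append({'Name': data[0], 'Score': int(data[1])})
df = pd.DataFrame(rows)
Then you can sort by score and show the top three (newer pandas versions use sort_values; the older df.sort has been removed):
df.sort_values('Score', ascending=False)[:3]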
I recommend reading all of the pandas documentation to see everything it can do
EDIT: For easier reading you can do something like
df = pd.read_table('C:/path/to/scores.txt')
But this would require you to put column headings in that file first
with open(r'C:/path/to/scores.txt', 'r') as f:
    scores = []
    for line in f:
        line = line.strip().split()
        scores.append((line[0], int(line[1])))
sorted_scores = sorted(scores, key=lambda s: s[1], reverse=True)
top_three = sorted_scores[:3]
This will read every line, strip extra whitespace, and split the line, then append it to the scores list. Once all scores have been added, the list gets sorted using the key of the 2nd item in the (name, score) tuple, in reverse, so that the scores run from high-to-low. Then the top_three slices the first 3 items from the sorted scores.
This would work, and depending on your coding style, you could certainly consolidate some of these lines. For the sake of the example, I simply have the contents of your score file in a string:
score_file_contents = """John: 11
Mike: 5
John: 78
John: 3
Steve: 30"""
scores = []
for line in score_file_contents.splitlines(): # Simulate reading your file
    name, score = line.split(':') # Extract name and score
    score = int(score) # Want score as an integer
    scores.append((score, name)) # Make my list
scores.sort(reverse=True) # List of tuples sorts on first tuple element
for ranking in range(len(scores)): # Iterate using an index
    if ranking < 3: # How many you want to show
        score = scores[ranking][0] # Extract score
        name = scores[ranking][1] # Extract name
        print("{}. {:<10} {:<3}".format(ranking + 1, name + ":", score))
Result:
1. John: 78
2. Steve: 30
3. John: 11
I am just starting out with programming and am learning Python. I am having some troubles searching and removing from a text file. The text file contains a list of single spaced names. I need to have the user input a name and have it and the two following items removed from list.
Right now I am able to find and remove the searched for name and write the new list to the text file but I can't figure out how to remove the next two items. I tried using list.index to get the position of the searched for name but it gives the location of the first letter in the name. Is there a way that I can search the input word and get the location of the whole word ('bob','tom','jill') (0,1,2) and use this to do what I need done?
Thanks.
Assuming the contacts file is three lines per contact, an example file might look like this:
Fred
58993884
AnyTown
Mary
61963888
SomeCity
Bill
78493883
OtherTown
Anne
58273854
AnyCity
Script:
x = raw_input('Enter Name of Contact to Delete: ')
# do a case-insensitive match for names
target = x.strip().lower()
# create a list of all valid contacts
skip = False
contacts = []
with open('contacts.txt', 'r') as stream:
    for index, line in enumerate(stream):
        # check every third line
        if not index % 3:
            skip = (line.strip().lower() == target)
            if skip:
                print 'Removed Contact:', line
        if not skip:
            contacts.append(line)
# re-open the contacts file for writing
with open('contacts.txt', 'w') as stream:
    stream.write(''.join(contacts))
Output:
$ python2 /home/baz/code/misc/test.py
Enter Name of Contact to Delete: Mary
Removed Contact: Mary
$ cat contacts.txt
Fred
58993884
AnyTown
Bill
78493883
OtherTown
Anne
58273854
AnyCity
Instead of manipulating one big string of names, it would be better to manipulate a list of name strings. You can easily convert the "big string" into a list using string.split:
names_string = 'john robert jimmy'
names_list = names_string.split(' ') # names_list = ['john', 'robert', 'jimmy']
Now, you can easily add, remove or search names in this list, using basic list functions:
names_list.append('michael') # names_list = ['john', 'robert', 'jimmy', 'michael']
names_list.remove('robert') # names_list = ['john', 'jimmy', 'michael']
jimmy_position = names_list.index('jimmy') # jimmy_position = 1
Remember to handle the ValueError that remove and index raise when the element is not in the list.
To convert the list of names into a "big string" again, you can use string.join:
names_string = ' '.join(names_list)
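Applied to the original question (remove a name and the two items that follow it), the list approach could look like the sketch below. The function name remove_with_followers and the sample names are made up for illustration. Calling index on the list, rather than on the big string, returns the position of the whole word, which is presumably what went wrong in the asker's attempt.

```python
def remove_with_followers(items, name, followers=2):
    """Return a copy of `items` without `name` and the `followers` entries after it."""
    try:
        position = items.index(name)  # position of the whole word in the list
    except ValueError:
        return list(items)  # name not present: nothing to remove
    # Keep everything before the name and everything after the removed run.
    return items[:position] + items[position + 1 + followers:]

names_list = 'bob tom jill anne fred'.split()
print(names_list.index('tom'))                   # 1
print(remove_with_followers(names_list, 'tom'))  # ['bob', 'fred']
```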
I've searched pretty hard and can't find a question that exactly pertains to what I want to do.
I have a file called "words" that has about 1000 lines of random A-Z sorted words...
10th
1st
2nd
3rd
4th
5th
6th
7th
8th
9th
a
AAA
AAAS
Aarhus
Aaron
AAU
ABA
Ababa
aback
abacus
abalone
abandon
abase
abash
abate
abater
abbas
abbe
abbey
abbot
Abbott
abbreviate
abc
abdicate
abdomen
abdominal
abduct
Abe
abed
Abel
Abelian
I am trying to load this file into a dictionary, where the words are the values and the keys are auto-generated/auto-incremented for each word
e.g {0:10th, 1:1st, 2:2nd} ...etc..etc...
Below is the code I've cobbled together so far. It sort of works, but it only shows me the last entry in the file as the only pair in the dict
f3data = open('words')
mydict = {}
for line in f3data:
    print line.strip()
    cmyline = line.split()
    key = +1
    mydict [key] = cmyline
print mydict
key = +1
+1 is the same thing as 1. I assume you meant key += 1. I also can't see a reason why you'd split each line when there's only one item per line.
However, there's really no reason to do the looping yourself.
with open('words') as f3data:
    mydict = dict(enumerate(line.strip() for line in f3data))
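The difference between key = +1 and key += 1 can be seen in a small sketch (the variable names assigned and incremented are made up for illustration). With key = +1 every word is stored under the same key 1, which is why only the last entry survives in the dict.

```python
key = 0
for _ in range(3):
    key = +1   # unary plus: assigns the constant 1 on every pass
assigned = key

key = 0
for _ in range(3):
    key += 1   # augmented assignment: adds 1 to the current value
incremented = key

print(assigned, incremented)
```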
dict(enumerate(x.rstrip() for x in f3data))
But your error is key = +1, which should be key += 1.
f3data = open('words')
print f3data.readlines()
The use of zero-based numeric keys in a dict is very suspicious. Consider whether a simple list would suffice.
Here is an example using a list comprehension:
>>> mylist = [word.strip() for word in open('/usr/share/dict/words')]
>>> mylist[1]
'A'
>>> mylist[10]
"Aaron's"
>>> mylist[100]
"Addie's"
>>> mylist[1000]
"Armand's"
>>> mylist[10000]
"Loyd's"
I use str.strip() to remove whitespace and newlines, which are present in /usr/share/dict/words. This may not be necessary with your data.
However, if you really need a dictionary, Python's enumerate() built-in function is your friend here, and you can pass the output directly into the dict() function to create it:
>>> mydict = dict(enumerate(word.strip() for word in open('/usr/share/dict/words')))
>>> mydict[1]
'A'
>>> mydict[10]
"Aaron's"
>>> mydict[100]
"Addie's"
>>> mydict[1000]
"Armand's"
>>> mydict[10000]
"Loyd's"
With keys that dense, you don't want a dict, you want a list.
with open('words') as fp:
    data = list(map(str.strip, fp.readlines()))
But if you really can't live without a dict:
with open('words') as fp:
    data = dict(enumerate(X.strip() for X in fp))
{index: x.strip() for index, x in enumerate(open('filename.txt'))}
This code uses a dictionary comprehension and the enumerate built-in, which takes an input sequence (in this case, the file object, which yields each line when iterated through) and returns an index along with the item. Then, a dictionary is built up with the index and text.
One question: why not just use a list if all of your keys are integers?
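To make the comparison concrete, here is a small sketch that builds both the dict and the list from the same input. It assumes an in-memory io.StringIO object as a stand-in for the opened "words" file, and the names fake_file, mydict and mylist are chosen for illustration.

```python
import io

# io.StringIO stands in for the opened "words" file.
fake_file = io.StringIO(u"10th\n1st\n2nd\n")

# enumerate yields (index, line) pairs while iterating over the file object.
mydict = {index: word.strip() for index, word in enumerate(fake_file)}

# Rewind and build the equivalent list; positions match the dict keys.
fake_file.seek(0)
mylist = [word.strip() for word in fake_file]

print(mydict[1], mylist[1])
```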
Finally, your original code should be
f3data = open('words')
mydict = {}
for index, line in enumerate(f3data):
    cmyline = line.strip()
    mydict[index] = cmyline
print(mydict)
Putting the words in a dict makes no sense. If you're using numbers as keys you should be using a list.
from __future__ import with_statement
with open('words.txt', 'r') as f:
    lines = f.readlines()
words = {}
for n, line in enumerate(lines):
    words[n] = line.strip()
print(words)