How to print two columns alphabetically in python - python

I have a list that I need to sort alphabetically by two columns. My current code does sort it alphabetically by first name, but it does not also sort by the person's last name. In the code below I used itemgetter(3) to get the person's first name. How can I use two item getters, such as itemgetter(3,4)? I hope my question makes sense. Thank you.
Note: is it possible to join the first and last name into one string and then use that one string as the item getter?
My code.
def sortAlpha():
    newFile = open("ScriptTesting3.txt","w")
    def csv_itemgetter(index, delimiter=' ', default=''):
        def composite(row):
            try:
                return row.split(delimiter)[index]
            except IndexError:
                return default
        return composite
    with open("ScriptTesting2.txt", "r") as file:
        for eachline in sorted(file, key=csv_itemgetter(3)):
            print(eachline.strip("\n"), file=newFile)
ScriptTesting2.txt
2312 filand 4 Joe Alex
4541 portlant 4 Alex Gray
5551 highlands 4 Alex Martin
My output
5551 highlands 4 Alex Martin
4541 portlant 4 Alex Gray
2312 filand 4 Joe Alex
Output should be
4541 portlant 4 Alex Gray
5551 highlands 4 Alex Martin
2312 filand 4 Joe Alex

You could try:
from operator import itemgetter
...
with open("ScriptTesting2.txt", "r") as file:
    lines = file.readlines()
linesSplit = [l.split() for l in lines]
linesSplitSorted = sorted(linesSplit, key=itemgetter(3, 4))
for l in linesSplitSorted: print(' '.join(l))
See also: sorting how-to
(Actually, I think this is indeed correct: it leverages comparison of built-in lists.)
You could then try:
def itemGetterRest(idx):
    def helper(seq):
        return seq[idx:]
    return helper
with open("Input.txt", "r") as file:
    lines = file.readlines()
linesSplit = [l.split() for l in lines]
linesSplitSorted = sorted(linesSplit, key=itemGetterRest(3))
for l in linesSplitSorted: print(' '.join(l))
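On the note in the question about joining first and last name into one string: that is not needed, because a key function can return a tuple and Python compares tuples element by element. Below is a minimal sketch, assuming whitespace-separated rows with the first name at index 3 and the last name at index 4. The helper name_key is hypothetical and not part of the answer above, and the rows are the question's sample data in shuffled order.

```python
def name_key(line, first=3, last=4):
    """Return (first name, last name) for a whitespace-separated row."""
    fields = line.split()
    # Rows that are too short sort first instead of raising IndexError.
    if len(fields) <= max(first, last):
        return ("", "")
    return (fields[first], fields[last])

rows = [
    "2312 filand 4 Joe Alex",
    "5551 highlands 4 Alex Martin",
    "4541 portlant 4 Alex Gray",
]

# Ties on the first name are broken by the last name.
sorted_rows = sorted(rows, key=name_key)
for row in sorted_rows:
    print(row)
```

The same key should work directly on an open file, for example sorted(file, key=name_key), since each line is split inside the key function.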

If you put the data in an Excel file instead of a text file and read it with the pandas library, you can sort by multiple columns very easily.
Suppose the data is in an Excel sheet. You can then read it into a pandas DataFrame and sort:
import pandas as pd
data = pd.read_excel('ScriptTesting3.xlsx')
data = data.sort_values(by=['C','D'], ascending=True)
print(data)
I have chosen to sort by columns C and D; you can choose whatever columns you want.

Related

Advance compare two or more csv files

I'm just learning Python, and as everyone knows, the best way to learn is practice ;)
I now have a task at work that I want to try to do in Python, but I need some advice.
I have a few CSV files. The structure looks like:
1st CSV
workerID, workerName, workerPhoneNumber
The 2nd and the other CSVs each contain a subset of this first set.
I mean, the first file has, for example, 10,000 employees, and each of the other files contains a portion of those same employees.
For example:
in the first file, I have
00001 Randal 555555
00002 Tom 66666
00003 Anthony 77775
00004 Mark 3424435
00005 Anna 3443223
00006 Monica 412415415
.....
in the second file:
00001 Randal 555555
00004 Mark 3424435
00006 Monica 412415415
....
and in the 3rd file:
00001 Randal 555555
00004 Mark 3424435
00005 Anna 3443223
....
I have to check the validity of all users across all files. I mean: check that Anna has the same ID and phone number in every file, and likewise for every other worker (it's a huge file, 100k rows). Then I will return all mismatches.
An additional problem is that some rows contain "NA".
I've just finished a NumPy tutorial, but I don't know how to approach this. I don't even know whether using NumPy is good practice here. So I need your advice: how can I handle this problem?
EDIT: Workers have unique names :) They are actually random strings, not names :D It's just an example :D Within a single file the IDs are unique too.
The use of standard functions and data structures will be enough.
Let's represent your files as lists of dictionaries using list comprehensions:
header = ('id', 'name', 'phone_number')
records_1 = [{k: v for k, v in zip(header, line.strip().split(' '))} for line in open('path_to_file1', 'r')]
records_2 = [{k: v for k, v in zip(header, line.strip().split(' '))} for line in open('path_to_file2', 'r')]
Then, if you want to check your records based on the user name, use a dictionary with the name as key and the record as value:
records_1 = {rec['name']: rec for rec in records_1}
records_2 = {rec['name']: rec for rec in records_2}
and check for each name whether the ids differ between the files. If so, save it to output:
seen = set()
output = []
for records, others in [(records_1, records_2), (records_2, records_1)]:
    for name, rec in records.items():
        # skip names already compared or missing from the other file
        if name in seen or name not in others:
            continue
        seen.add(name)
        if rec['id'] != others[name]['id']:
            output.append((name, rec, others[name]))
Note that we could generate the list of (records, others) pairs using permutations from itertools:
https://docs.python.org/3/library/itertools.html
Hope this helps!
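The two-file check above can be extended to any number of files by treating the first file as the reference and comparing each subset file against it. This is only a rough sketch, assuming whitespace-separated "id name phone" rows and unique names within each file, as stated in the question. The function names parse and find_mismatches are hypothetical, rows containing "NA" are compared as-is, and the altered phone number for Mark is made up purely to trigger a mismatch.

```python
HEADER = ('id', 'name', 'phone_number')

def parse(lines):
    """Map each worker name to its record (a dict keyed by HEADER)."""
    records = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(HEADER):
            continue  # skip blank or malformed rows
        rec = dict(zip(HEADER, fields))
        records[rec['name']] = rec
    return records

def find_mismatches(reference, subset):
    """Return (name, reference record, subset record) for rows that differ."""
    mismatches = []
    for name, rec in subset.items():
        ref = reference.get(name)  # None if the name is not in the reference
        if ref != rec:
            mismatches.append((name, ref, rec))
    return mismatches

first = parse(["00001 Randal 555555", "00004 Mark 3424435", "00005 Anna 3443223"])
# Illustrative data only: Mark's phone number is deliberately changed here.
second = parse(["00001 Randal 555555", "00004 Mark 9999999"])

mismatches = find_mismatches(first, second)
print(mismatches)
```

With real files, parse(open(path)) could be called once per file and find_mismatches once per subset file, which keeps the work linear in the total number of rows.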

Writing to a file the same way you read from it

The title may be a little confusing, but there was no other way I could explain it. I first import scores from my file, then sort them into order, and then try to export them back to my file in the same format they were imported in, not as a list.
Here's a little sketch:
Import as ' James 120 ' into list [James, 120]
Export as James 120
Here's what I have so far:
from operator import itemgetter

def Leaderboard(User, Score):
    Scores = open("Scores.txt", "r+")
    content = Scores.readlines()
    new = []
    for i in content:
        temp = []
        newlist = i.split(" ")
        temp.append(newlist[0])
        temp.append(int(newlist[1]))
        new.append(temp)
    temp = []
    temp.append(User)
    temp.append(Score)
    new.append(temp)
    new = sorted(new, key=itemgetter(1), reverse=True)
    print(new)
    Scores.close()

Leaderboard('Hannah', 3333)
The file currently looks like this:
Olly 150
Billy 290
Graham 320
James 2
Alex 333
Here is the end result:
[['Hannah', 3333], ['Alex', 333], ['Graham', 320], ['Billy', 290], ['Olly', 150], ['James', 2]]
Here's what I want it exported as to my file:
Hannah 3333
Alex 333
Graham 320
Billy 290
Olly 150
James 2
The writing code would be something like this:
Scores.seek(0) # go to the beginning
for name, number in new:
    print(name, number, file=Scores)
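Putting the read, sort, and write steps together might look like the sketch below. It is not the original poster's code: update_leaderboard is a hypothetical name, and io.StringIO stands in for open("Scores.txt", "r+") so the example runs without a file on disk. The truncate() call is an addition that guards against leftover text if the rewritten content ever turns out shorter than the old content.

```python
import io

def update_leaderboard(scores_file, user, score):
    """Add (user, score), sort descending, and rewrite the file in place."""
    scores_file.seek(0)
    entries = []
    for line in scores_file:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            entries.append((parts[0], int(parts[1])))
    entries.append((user, score))
    entries.sort(key=lambda entry: entry[1], reverse=True)

    scores_file.seek(0)      # go back to the beginning
    scores_file.truncate()   # drop the old contents before rewriting
    for name, number in entries:
        print(name, number, file=scores_file)  # writes "name number" per line
    return entries

# io.StringIO is a stand-in for a real file opened with mode "r+".
fake_file = io.StringIO("Olly 150\nBilly 290\nGraham 320\nJames 2\nAlex 333\n")
update_leaderboard(fake_file, "Hannah", 3333)
print(fake_file.getvalue())
```

Because print() separates its arguments with a space and ends with a newline, each entry is written back in the same "name score" layout it was read in.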
You can use file.read(), which reads the whole file. I believe you can just run contents = file.read(), as it returns a string. There is also file.readlines(), which returns a list of strings, one per line.
If this does not completely answer what you want, read more here.

How to check how close a number is to another number?

I have a TEXT FILE that looks like:
John: 27
Micheal8483: 160
Mary Smith: 57
Adam 22: 68
Patty: 55
and so on. They are usernames, which is why the names occasionally contain numbers. What I want to do is check each of their numbers (the ones after the ":") and get the 3 names whose numbers are closest in value to an integer (specifically named targetNum). It will always be positive.
I have tried multiple things but I am new to Python and I am not really sure how to go about this problem. Any help is appreciated!
You can parse the file into a list of name/number pairs. Then sort the list by difference between a number and targetNum. The first three items of the list will then contain the desired names:
users = []
with open("file.txt") as f:
    for line in f:
        name, num = line.split(":")
        users.append((name, int(num)))
targetNum = 50
users.sort(key=lambda pair: abs(pair[1] - targetNum))
print([pair[0] for pair in users[:3]]) # ['Patty', 'Mary Smith', 'Adam 22']
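When only the three closest entries are needed, heapq.nsmallest from the standard library can select them without sorting the whole list. This is a sketch of that variant rather than the answer's own method. The function closest_names is a hypothetical name, and splitting on the last colon with rpartition is an assumption made so that a username containing a colon would still parse.

```python
import heapq

def closest_names(lines, target, count=3):
    """Return the names whose numbers are nearest to target."""
    pairs = []
    for line in lines:
        # Split on the last colon; lines without one are skipped.
        name, sep, num = line.rpartition(":")
        if not sep:
            continue
        pairs.append((name.strip(), int(num)))
    # nsmallest returns the entries ordered from closest to farthest.
    closest = heapq.nsmallest(count, pairs, key=lambda pair: abs(pair[1] - target))
    return [name for name, _ in closest]

lines = ["John: 27", "Micheal8483: 160", "Mary Smith: 57", "Adam 22: 68", "Patty: 55"]
print(closest_names(lines, 50))
```

An open file can be passed in place of the list, since the function only iterates over lines.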
You could use a regex recipe here:
import re
pattern=r'(\w.+)?:\s(\d+)'
data_1=[]
targetNum = 50
with open('new_file.txt','r') as f:
    for line in f:
        data=re.findall(pattern,line)
        for i in data:
            # store the absolute distance to targetNum, then the name
            data_1.append((abs(int(i[1])-targetNum),i[0]))
data_1.sort()  # closest numbers first
print(list(map(lambda x:x[1],data_1[:3])))
output:
['Patty', 'Mary Smith', 'Adam 22']

Python: read a text file into a dictionary

I really hope you can help me since I'm quite new to Python.
I have a simple text file, without any columns, just rows. Something like this:
Bob
Opel
Mike
Ford
Rodger
Renault
Mary
Volkswagen
Note that in the text file the names and the cars are not separated by the additional blank line. I had to do this here; otherwise Stack Overflow would display the names next to each other.
The idea is to create a dictionary out of the text file to get a format like this:
{[Bob : Opel], [Mike : Ford], [Rodger : Renault], [Mary : Volkswagen]}
Can you guys help me out and give an example on how to do this? Would be much appreciated!
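You can open the file and read its lines with readlines, then pair each name with the line that follows it:
lines, result = open('file.dat', 'r').readlines(), {}
# strip the trailing newlines so they do not end up in the keys and values
lines = [line.strip() for line in lines]
for i in range(len(lines) // 2):
    result[lines[i*2]] = lines[i*2+1]
print(result)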
with open('test.txt', 'r') as file_:
    result = {}
    array = list(filter(lambda x: x != '', file_.read().split('\n')))
    for i in range(0, len(array), 2):
        result[array[i]] = array[i+1]
    print(result)
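The index arithmetic in both answers can also be replaced by slicing plus zip: the even-indexed lines become the keys and the odd-indexed lines the values. This is a small sketch under the same assumption as above, namely that names and cars strictly alternate. The function pairs_to_dict is a hypothetical name.

```python
def pairs_to_dict(lines):
    """Build {name: car} from alternating name/car lines."""
    # Drop blank lines and trailing newlines first.
    cleaned = [line.strip() for line in lines if line.strip()]
    # cleaned[::2] are the names, cleaned[1::2] the cars that follow them.
    return dict(zip(cleaned[::2], cleaned[1::2]))

sample = ["Bob\n", "Opel\n", "Mike\n", "Ford\n",
          "Rodger\n", "Renault\n", "Mary\n", "Volkswagen\n"]
cars = pairs_to_dict(sample)
print(cars)
```

If the file has an odd number of non-blank lines, zip silently drops the last unpaired name, which may or may not be the behavior you want.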

Ascending python text

I have a simple problem: I created a game, and at the end I append the score to a text file. Now I have something like this in the file:
John: 11
Mike: 5
John: 78
John: 3
Steve: 30
I want to give the user the possibility to read the top 3 scores. So far I have created this:
with open(r'C:/path/to/scores.txt', 'r') as f:
    for line in f:
        data = line.split()
        print('{0[0]:<15}{0[1]:<15}'.format(data))
I have this:
John: 11
Mike: 5
John: 78
John: 3
Steve: 30
It looks better, but how can I show only the three best results, with their place and the highest first?
Something like this:
1. John: 78
2. Steve: 30
3. John: 11
You can edit your code a little bit to store the scores in a list, then sort them using the sorted function. Then you can just take the first three scores of your sorted list.
with open(r'doc.txt', 'r') as f:
    scores = []
    for line in f:
        data = line.split()
        scores.append(data)
top3 = sorted(scores, key = lambda x: int(x[1]), reverse=True)[:3]
for score in top3:
    print('{0[0]:<15}{0[1]:<15}'.format(score))
As in my answer to a very similar question, the answer could be to just use sorted; slicing the result to get only the top three scores is trivial.
That said, you could also switch to using heapq.nlargest over sorted in this case; it takes a key function, just like sorted, and unlike sorted, it will only use memory to store the top X items (and has better theoretical performance when the set to extract from is large and the number of items to keep is small). Aside from not needing reverse=True (because choosing nlargest already does that), heapq.nlargest is a drop-in replacement for sorted in that case.
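A short sketch of the heapq.nlargest variant, assuming each line has the "Name: score" layout from the question. The list of lines stands in for the open file, and the place numbering with enumerate is an addition to match the output format the question asks for.

```python
import heapq

# Stand-in for iterating over the open scores file.
lines = ["John: 11", "Mike: 5", "John: 78", "John: 3", "Steve: 30"]
scores = [line.split() for line in lines]

# Keep only the three rows with the largest numeric score, highest first.
top3 = heapq.nlargest(3, scores, key=lambda row: int(row[1]))

for place, (name, score) in enumerate(top3, start=1):
    print("{}. {} {}".format(place, name, score))
```

The key converts the score to int so that "78" ranks above "5"; comparing the raw strings would order them alphabetically instead.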
Depending on what else you might want to do with the data I think pandas is a great option here. You can load it into pandas like so:
import pandas as pd
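df = []
with open(r'C:/path/to/scores.txt', 'r') as f:
    for line in f:
        data = line.split()
        # store the score as an int so it sorts numerically, not as text
        df.append({'Name': data[0], 'Score': int(data[1])})
df = pd.DataFrame(df)
Then you can sort by score and show the top three:
# DataFrame.sort was removed from pandas; sort_values replaces it
df.sort_values('Score', ascending=False)[:3]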
I recommend reading all of the pandas documentation to see everything it can do
EDIT: For easier reading you can do something like
df = pd.read_table('C:/path/to/scores.txt')
But this would require you to put column headings in that file first
with open(r'C:/path/to/scores.txt', 'r') as f:
    scores = []
    for line in f:
        line = line.strip().split()
        scores.append((line[0], int(line[1])))
sorted_scores = sorted(scores, key=lambda s: s[1], reverse=True)
top_three = sorted_scores[:3]
This will read every line, strip extra whitespace, and split the line, then append it to the scores list. Once all scores have been added, the list gets sorted using the key of the 2nd item in the (name, score) tuple, in reverse, so that the scores run from high-to-low. Then the top_three slices the first 3 items from the sorted scores.
This would work, and depending on your coding style, you could certainly consolidate some of these lines. For the sake of the example, I simply have the contents of your score file in a string:
score_file_contents = """John: 11
Mike: 5
John: 78
John: 3
Steve: 30"""
scores = []
for line in score_file_contents.splitlines():  # Simulate reading your file
    name, score = line.split(':')  # Extract name and score
    score = int(score)  # Want score as an integer
    scores.append((score, name))  # Make my list
scores.sort(reverse=True)  # List of tuples sorts on first tuple element
for ranking in range(len(scores)):  # Iterate using an index
    if ranking < 3:  # How many you want to show
        score = scores[ranking][0]  # Extract score
        name = scores[ranking][1]  # Extract name
        print("{}. {:<10} {:<3}".format(ranking + 1, name + ":", score))
Result:
1. John: 78
2. Steve: 30
3. John: 11
