Python 3: Extracting Data from a .txt File?

So, I have this file that has data set up like this:
Bob 5 60
Carl 7 80
Rick 8 100
Santiago 7 30
I need to separate each part into three different lists: one for the names, one for the first numbers, and one for the second numbers.
But I don't really understand how to extract those parts. Also, let's say I want to make a single tuple from the first line, containing each of the different parts (the name, first number, and second number). How would I do that?
I just don't get how to extract that information.
I just learned how to read and write text files, so I'm pretty clueless.
EDIT: As a note, the text file already exists. The program I'm working on needs to read the text file, which has its data formatted in the way I listed.

You can split each line on whitespace:
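with open(yourfile) as f:
    rows = [l.split() for l in f]
names, firstnums, secondnums = zip(*rows)
zip(*rows) transposes the rows, re-arranging the 3 columns into 3 tuples (wrap each in list() if you need actual lists). Note that the numbers are still strings at this point.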
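As a rough sketch of the remaining steps the question asks about (real lists with numeric values, plus a tuple for the first line): the sample data from the question is inlined through io.StringIO so the snippet runs on its own, and the names sample and first are only illustrative.

```python
import io

# Stand-in for the real file; with a file on disk, use open(yourfile) instead.
sample = io.StringIO("Bob 5 60\nCarl 7 80\nRick 8 100\nSantiago 7 30\n")

rows = [line.split() for line in sample if line.strip()]  # skip blank lines

names, firstnums, secondnums = zip(*rows)   # three tuples of strings
names = list(names)
firstnums = [int(n) for n in firstnums]     # convert text to integers
secondnums = [int(n) for n in secondnums]

# A single tuple built from the first line: (name, first number, second number)
first = (rows[0][0], int(rows[0][1]), int(rows[0][2]))

print(names)
print(firstnums)
print(secondnums)
print(first)
```

This assumes every non-blank line has exactly three whitespace-separated fields; a name containing a space would need a different split.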

Would not the module Pickle be ideal here? Pickle gives Python functionality to load and save things that need to be 'useable' in Python, so instead of just importing a string from a text file and having to parse it, pickle can load it and give you the actual container you're trying to work with.
example:
import pickle
myList = ["Bob", 1, 2]
# write this data to your save file (pickle needs the file opened in binary mode)
with open(fileYouWroteTo, "wb") as f:
    pickle.dump(myList, f)
#insert code where you work with the data
#.........
#upon needing to open and work with this file
with open(fileYouWroteTo, "rb") as f:
    listTranslated = pickle.load(f) # turns the saved data back into a proper list

Related

Editing of txt files not saving when I concatenate them

I am fairly new to programming, so bear with me!
We have a task at school in which we have to clean up three text files ("Balance1", "Saving", and "Withdrawal") and append them together into a new file. These files are just names and sums of money listed one per line, but some of it is jumbled. This is my code for the first file, Balance1:
with open('Balance1.txt', 'r+') as f:
    f_contents = f.readlines()
    # Then I start cleaning up the lines. Here I edit Anna's savings to an integer.
    f_contents[8] = "Anna, 600000"
    # Here I delete the blank lines and edit in the 50000 to Philip.
    del f_contents[3]
    del f_contents[3]
In the original text file Anna's savings are written like this: "Anna, six hundred thousand", and we have to make it look clean, so it should rather be "NAME, SUM" (with the sum as an integer). When I print this as a list it looks good, but after I have done this with all three files I try to append them together in a file called "Balance.txt" like this:
filenames = ["Balance1.txt", "Saving.txt", "Withdrawal.txt"]
with open("Balance.txt", "a") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            contents = infile.read()
            outfile.write(contents)
When I check the new text file "Balance" it has appended them together, but just as they were in the beginning and not with my edits. So it is not "cleaned up". Can anyone help me understand why this happens, and what I have to do so it appends the edited and clean versions?

In the first part, where you do the "editing" of the Balance1.txt file, this is what happens:
You open the file (in 'r+' mode, which allows both reading and writing)
You load the data into memory
You edit the in-memory data
And voila.
You never persisted the changes to any file on the disk. So when you read the content of all the files in the second part, you read the data that was originally there.
So if you want to concatenate the edited data, you have 2 choices:
Pre-process the data by creating 3 final correct files (editing Balance1.txt and persisting it to another file, say Balance1_fixed.txt) and then in the second part, concatenate: ["Balance1_fixed.txt", "Saving.txt", "Withdrawal.txt"]. Total of 4 data file openings, more IO.
Use only the second loop you have, and correct the contents before writing it to the outfile. You can use readlines() like you did first, edit the specific line and then use writelines(). Total of 3 data file openings, less IO than previous option
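A minimal sketch of the second option, under stated assumptions: clean_lines is a hypothetical clean-up step (here it only drops blank lines), the demo data is made up, and the output is opened in "w" mode rather than "a" so that rerunning the script does not append duplicates.

```python
import os
import tempfile

def clean_lines(lines):
    """Hypothetical clean-up: drop blank lines and normalise line endings."""
    return [line.strip() + "\n" for line in lines if line.strip()]

def concatenate_cleaned(filenames, out_path):
    # Open the output once; each input is read, cleaned in memory, then written.
    with open(out_path, "w") as outfile:
        for filename in filenames:
            with open(filename) as infile:
                lines = infile.readlines()
            outfile.writelines(clean_lines(lines))

# Small demonstration with made-up data in a temporary directory.
workdir = tempfile.mkdtemp()
paths = []
for name, text in [("Balance1.txt", "Anna, 600000\n\n\nPhilip, 50000\n"),
                   ("Saving.txt", "Anna, 100\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

out_path = os.path.join(workdir, "Balance.txt")
concatenate_cleaned(paths, out_path)
with open(out_path) as f:
    result = f.read()
print(result)
```

File-specific fixes (such as replacing a particular line) would go into the clean-up step, since that is the only place where the edited data reaches the output file.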

Get different strings from a file and write a .txt

I'm trying to get lines from a text file (.log) into a .txt document.
I need to get the same data into my .txt file each time, but the line itself is sometimes different. From what I have seen on the internet, this is usually done with a pattern that anticipates how the line is built.
1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1
The line I'm working with usually looks like this. The data I'm trying to collect are:
\ip\0.0.0.0
\name\NickName_of_the_player
\ja_guid\420D990471FC7EB6B3EEA94045F739B7
And print this data into a .txt file. Here is my current code.
As explained above, I'm unsure what keywords to use when searching on Google, and what this is called (because the string isn't always the same?).
I have been looking around a lot, and most of the tests I have done let me do some things, but I'm not yet able to do what I explained above. So I'm hoping for guidance here :) (Sorry if I'm noobish. I understand a lot about how it works, I just didn't learn the language in school. I mostly write small scripts, and usually they work fine, but this time it's much harder.)
def readLog(filename):
    with open(filename,'r') as eventLog:
        data = eventLog.read()
        dataList = data.splitlines()
    return dataList
eventLog = readLog('games.log')

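The backslashes only need special care when the line appears as a literal in your source code, where a raw string (r"...") keeps them from being treated as escape sequences. Lines read from the file with open(filename,'r') already contain the backslashes as-is, so no special mode is needed there. To use your example, I ran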
text_input = r"1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1"
text_as_array = text_input.split('\\')
You'll need to know which columns contain the strings you care about. For example,
with open('output.dat','w') as fil:
    fil.write(text_as_array[6])
You can figure these array positions from the sample string
>>> text_as_array[6]
'46.98.134.211:24806'
>>> text_as_array[34]
'Faybell'
>>> text_as_array[44]
'420D990471FC7EB6B3EEA94045F739B7'
If the column positions are not consistent but the key-value pairs are always adjacent, we can leverage that
>>> text_as_array.index("ip")
5
>>> text_as_array[text_as_array.index("ip")+1]
'46.98.134.211:24806'
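Since the keys and values alternate, one possible sketch is to pair them up into a dictionary and look fields up by name. The helper parse_userinfo, the shortened sample line, and the output file name players.txt are assumptions for illustration, not part of the original code.

```python
def parse_userinfo(line):
    """Turn the backslash-separated userinfo part of a log line into a dict."""
    parts = line.split("\\")
    fields = parts[1:]                            # parts[0] is the text before the first backslash
    return dict(zip(fields[0::2], fields[1::2]))  # pair each key with the value after it

# Shortened version of the sample line from the question.
sample = r"1525:22Player 11 spawned with userinfo: \team\b\ip\46.98.134.211:24806\name\Faybell\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1"

info = parse_userinfo(sample)
wanted = [info.get(key, "") for key in ("ip", "name", "ja_guid")]
print(wanted)

# Append one line per player to a text file (file name is an assumption):
# with open("players.txt", "a") as out:
#     out.write(" ".join(wanted) + "\n")
```

This relies on every key being followed by exactly one value; a line with a missing value would shift the pairs, so it may be worth checking that the expected keys are present before writing.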

How to make an input function to find something in a certain row in a CSV file using Tkinter

Okay, so I'm making a GUI to look for a specific number in a CSV file. Basically:
ID: Name: Address: Email:
1023 John 123 Normal St John123#hotmail.com
So basically I want the person using the GUI to type in the ID, and the GUI just goes through the CSV file and prints the whole row.
Also, I'm a novice coder, so please don't judge me if I keep asking what a certain element is or what something does.
Thank you

First off, CSV stands for Comma-Separated Values, so naturally we would expect your columns to be comma-separated, not space-separated. I tried to adapt the code to your data, but in the future, do separate your .csv data with commas.
You could simply use readlines to read the contents of a .csv file. The logic you are looking for will be something like:
idx = 2710
with open('my.csv','r') as f:
    data = f.readlines()
for d in data:
    if d.find(str(idx)) != -1:
        print(d)
        break
I have already answered a similar question on the Tkinter GUI here, which applies to the same case as yours. All you have to do is paste the above logic into the on_click function and replace print(d) with text.insert(INSERT, d).
I would recommend reading about Tkinter. As a side note, pandas also has some nice functions for working with .csv files that are worth reading about.
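One caveat with a plain substring search is that an ID can also match inside an address or e-mail. A possible sketch using the standard csv module compares only the first column; it assumes a comma-separated file with a header row, and the function name find_row_by_id and the sample rows are made up for illustration.

```python
import csv
import io

def find_row_by_id(csv_file, wanted_id):
    """Return the first row whose first column equals wanted_id, else None."""
    reader = csv.reader(csv_file)
    next(reader, None)                      # skip the header row
    for row in reader:
        if row and row[0].strip() == str(wanted_id):
            return row
    return None

# Made-up data standing in for the real file.
sample = io.StringIO(
    "ID,Name,Address,Email\n"
    "1023,John,123 Normal St,john@example.com\n"
    "2710,Jane,9 Other Rd,jane@example.com\n"
)
row = find_row_by_id(sample, 1023)
print(row)
```

With a file on disk, the same function can be called as find_row_by_id(open('my.csv', newline=''), 1023), ideally inside a with block, and the returned row joined into a string before inserting it into the text widget.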

Iteratively replace two strings with values from numpy array

I'm currently trying to make an automation script for writing new files from a master where there are two strings I want to replace (x1 and x2) with values from a 21 x 2 array of numbers (namely, [[0,1000],[50,950],[100,900],...,[1000,0]]). Additionally, with each double replacement, I want to save that change as a unique file.
Here's my script as it stands:
import numpy
lines = []
x1x2 = numpy.array([[0,1000],[50,950],[100,900],...,[1000,0]])
for i,j in x1x2:
    with open("filenamexx.inp") as infile:
        for line in infile:
            linex1 = line.replace('x1',str(i))
            linex2 = line.replace('x2',str(j))
            lines.append(linex1)
            lines.append(linex2)
    with open("filename"+str(i)+str(j)+".inp", 'w') as outfile:
        for line in lines:
            outfile.write(line)
With my current script there are a few problems. First, the string replacements are being done separately, i.e. I end up with a new file that contains the contents of the master file twice where one line has the first change and then the next will reflect the second separately. Second, with each subsequent iteration, the new files have the contents of the previous file prepended (i.e. filename100900.inp will contain its unique contents as well as the contents of both filename01000.inp and filename50950.inp before it). Anyone think they can take a crack at solving my problem?
Note: I've looked at using regex module solutions (something like this: https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch01s19.html) in order to do multiple replacements in a single pass, but I'm not sure if the way I'm indexing is translatable to a dictionary object.

I'm not sure I understood the second issue, but you can call replace more than once on the same string, so:
s = "x1 stuff x2"
s = s.replace('x1',str(1)).replace('x2',str(2))
print(s)
will output:
1 stuff 2
There is no need to do this twice with two different variables. As for the second issue, it just seems that you're not resetting the lines variable before starting to write a new file. So once you finish writing a file, just add:
lines = []
It should be enough to solve these issues.
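Putting both fixes together, a sketch of the whole loop could look like the following. It reads the master file once, chains the two replacements, and builds a fresh text for every pair, so nothing carries over between files. The helper name write_variants and the two-line demo master file are made up; a plain list holding the first three pairs from the question stands in for the numpy array, which can be iterated the same way.

```python
import os
import tempfile

def write_variants(master_path, pairs, out_dir):
    """Write one copy of the master file per (x1, x2) pair, replacing both placeholders."""
    with open(master_path) as infile:
        template = infile.read()            # read the master once
    written = []
    for i, j in pairs:
        # Both replacements are chained on the same text, so nothing is duplicated.
        text = template.replace("x1", str(i)).replace("x2", str(j))
        out_path = os.path.join(out_dir, "filename" + str(i) + str(j) + ".inp")
        with open(out_path, "w") as outfile:
            outfile.write(text)
        written.append(out_path)
    return written

# Demonstration with a made-up two-line master file.
workdir = tempfile.mkdtemp()
master = os.path.join(workdir, "filenamexx.inp")
with open(master, "w") as f:
    f.write("a = x1\nb = x2\n")

pairs = [(0, 1000), (50, 950), (100, 900)]   # first three rows of the array in the question
paths = write_variants(master, pairs, workdir)
with open(paths[1]) as f:
    second = f.read()
print(second)
```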

Building on "How to read and write a table / matrix to file with python?"

Back in Feb 8 '13 at 20:20, YamSMit asked a question (see: How to read and write a table / matrix to file with python?) similar to what I am struggling with: starting out with an Excel table (CSV) that has 3 columns and a varying number of rows. The contents of the columns are string, floating point, and string. The first string will vary in length, while the other string can be fixed (eg, 2 characters). The table needs to go into a 2 dimensional array, so that I can do manipulations on the data to produce a final file (which will be a text file). I have experimented with a variety of strategies presented in stackoverflow, but I am always missing something, and I haven't seen an example with all the parts, which is the reason for the struggle to figure this out.
Sample data will be similar to:
Ray Smith, 41645.87778, V1
I have read and explored numpy and astropy since the available documentation says they make this type of code easy. I have tried import csv. Somehow, the code doesn't come together. I should add that I am writing in Python 3.2.3 (which seems to be a mistake since a lot of documentation is for Python 2.x).
I realize the basic nature of this question directs me to read more tutorials. I have been reading many, yet the tutorials always refer to enough that is different, that I fail to assemble the right pieces: read the table file, write into a 2D array, then... do more stuff.
I am grateful to anyone who might provide me with a workable outline of the code, or pointing me to specific documentation I should read to handle the specific nature of the code I am trying to write.
Many thanks in advance. (Sorry for the wordiness - just trying to be complete.)

I am more familiar with 2.x, but from the 3.3 csv documentation found here, it seems to be mostly the same as 2.x. The following function will read a csv file, and return a 2D array of the rows found in the file.
import csv
def read_csv(file_name):
    array_2D = []
    # In Python 3 the csv module expects a text-mode file opened with newline=''.
    with open(file_name, newline='') as csvfile:
        read = csv.reader(csvfile, delimiter=';') #Assuming your csv file has been set up with the ';' delimiter - there are other options, for which you should see the first link.
        for row in read:
            array_2D.append(row)
    return array_2D
You would then be able to manipulate the data as follows (assuming your csv file is called 'foo.csv' and the desired text file is 'foo.txt'):
data = read_csv('foo.csv')
with open('foo.txt', 'w') as textwrite: # 'w' is needed to open the file for writing
    for row in data:
        string = '{0} has {1} apples in his Ford {2}.\n'.format(row[0], row[1], row[2])
        textwrite.write(string)
        #if you know the second column is a float:
        manipulate = float(row[1])*3
        textwrite.write(str(manipulate)) # write() only accepts strings
string would then be written to 'foo.txt' as:
Ray Smith has 41645.87778 apples in his Ford V1.\n
and manipulate would be written to 'foo.txt' as (approximately, given floating-point rounding):
124937.63334
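For the comma-separated sample row in the question, a sketch along the same lines might look like this. It assumes a comma delimiter with a space after each comma, and converts the middle column to float while reading; the function name read_table and the second sample row are made up for illustration.

```python
import csv
import io

def read_table(csv_file):
    """Read rows of (name, value, code), converting the middle column to float."""
    table = []
    # skipinitialspace=True drops the blank that follows each comma in the sample data.
    for row in csv.reader(csv_file, delimiter=",", skipinitialspace=True):
        if not row:
            continue                         # ignore empty lines
        name, value, code = row
        table.append([name, float(value), code])
    return table

# In-memory stand-in for the real file; the second row is invented.
sample = io.StringIO("Ray Smith, 41645.87778, V1\nAnn Lee, 41650.5, V2\n")
table = read_table(sample)
for name, value, code in table:
    print("{0}\t{1:.5f}\t{2}".format(name, value, code))
```

The result is a plain list of lists, which is usually enough for row-by-row manipulation before writing a text file; with a file on disk, pass open('foo.csv', newline='') instead of the StringIO object.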
