read and split information from text file python - python

I'm stuck on a problem:
I have a text file called id_numbers.txt that contains this information:
325255, Jan Jansen
334343, Erik Materus
235434, Ali Ahson
645345, Eva Versteeg
534545, Jan de Wilde
345355, Henk de Vries
I need Python to split the information at the comma and write a program that will then display the information as follows:
Jan Jansen has cardnumber: 325255
Erik Materus has cardnumber: 334343
Ali Ahson has cardnumber: 235434
Eva Versteeg has cardnumber: 645345
I've tried converting to a list and using split(",") but that ends up attaching the next number, like this:
['325255', ' Jan Jansen\n334343', ' Erik Materus\n235434', ' Ali Ahson\n645345', ' Eva Versteeg\n534545', ' Jan de Wilde\n345355', ' Henk de Vries']
Help would be appreciated!

You can do it this way:
with open('id_numbers.txt', 'r') as f:
    for line in f:
        line = line.rstrip()  # removes the trailing '\n'
        id_num, name = line.split(',')
        name = name.strip()  # in case the name has spaces on either side
        print('{0} has cardnumber: {1}'.format(name, id_num))
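If you also need to look a card number up by name later, a minimal variation of the same idea (assuming the same file layout) collects the pairs into a dict:
card_numbers = {}
with open('id_numbers.txt') as f:
    for line in f:
        id_num, name = line.rstrip().split(',')
        card_numbers[name.strip()] = id_num
print(card_numbers['Jan Jansen'])  # 325255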

Related

Edit last element of each line of text file

I've got a text file with some elements as such;
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,0
3,To Kill a Mockingbird,Harper Lee,1/3/2010,1828
The last element of each line determines which user has taken out the given book. If it is 0 then nobody has it.
I want code that replaces the ',0' at the end of any given line with an input 4-digit number, to show that someone has taken out the book.
I've used .replace to change e.g. 1828 into 0, but I don't know how to change the last element of a specific line from 0 to something else.
I cannot use csv due to work/education restrictions so I have to leave the file in .txt format.
I also can only use Standard python library, therefore no pandas.
You can read the txt into a dataframe using read_csv:
import pandas as pd
df = pd.read_csv("text_file.txt", header=None)
df.columns = ["Serial", "Book", "Author", "Date", "Issued"]
print(df)
df.loc[3, "Issued"] = 0
print(df)
df.to_csv('text_file.txt', header=None, index=None, sep=',', mode='w+')
This sets the Issued count of the book at index 3 (To Kill a Mockingbird) to 0.
   Serial                              Book               Author      Date  Issued
0       0  The Hitchhiker's Guide to Python        Kenneth Reitz  2/4/2012       0
1       1                      Harry Potter           JK Rowling  1/1/2010    8137
2       2                  The Great Gatsby  F. Scott Fitzgerald  1/2/2010       0
3       3             To Kill a Mockingbird           Harper Lee  1/3/2010    1828

   Serial                              Book               Author      Date  Issued
0       0  The Hitchhiker's Guide to Python        Kenneth Reitz  2/4/2012       0
1       1                      Harry Potter           JK Rowling  1/1/2010    8137
2       2                  The Great Gatsby  F. Scott Fitzgerald  1/2/2010       0
3       3             To Kill a Mockingbird           Harper Lee  1/3/2010       0
Edit after comment:
In case you can only use the Python standard library, you can do something like this with a plain file read:
i = 0
a = 5  # line to change, with 1 being the first line in the file
b = '8371'
to_write = []
with open("text_file.txt", "r") as file:
    for line in file:
        i += 1
        if i == a:
            print('line before')
            print(line)
            line = line[:line.rfind(',')] + ',' + b + '\n'
            to_write.append(line)
            print('line after edit')
            print(line)
        else:
            to_write.append(line)
print(to_write)
with open("text_file.txt", "w") as f:
    for line in to_write:
        f.write(line)
File content
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84
3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895
4,XYZ,Harper,1/3/2018,258
5,PQR,Lee,1/3/2019,16
gives this as output
line before
4,XYZ,Harper,1/3/2018,258
line after edit
4,XYZ,Harper,1/3/2018,8371
["0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0\n", '1,Harry Potter,JK Rowling,1/1/2010,8137\n', '2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84\n', '3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895\n', '4,XYZ,Harper,1/3/2018,8371\n', '5,PQR,Lee,1/3/2019,16\n', '\n']
you can try this:
to_write = []
with open('read.txt') as f:  # input file
    for line in f:
        if not line.strip():
            to_write.append(line)
            continue
        head, last = line.rstrip('\n').rsplit(',', 1)
        if int(last) == 0:
            to_write.append(head + ',1234\n')  # any 4 digit number
        else:
            to_write.append(line)
with open('output.txt', 'w') as f:  # name of output file
    for _list in to_write:
        f.write(_list)
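For completeness, here is a standard-library-only sketch that takes the serial and the new card number from input(); it is only an illustration and assumes no field itself contains a comma:
serial = input('Book serial: ')          # e.g. 3
card = input('4-digit card number: ')    # e.g. 8371
with open('text_file.txt') as f:
    rows = [line.rstrip('\n').split(',') for line in f if line.strip()]
for row in rows:
    if row[0] == serial and row[-1] == '0':  # only replace if nobody has the book
        row[-1] = card
with open('text_file.txt', 'w') as f:
    for row in rows:
        f.write(','.join(row) + '\n')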

Text files help (Python 3)

Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
words = line.strip().split()
mark = (words[0])
name = " ".join(words[1:])
for i in (mark):
print(i)
The output I'm getting is
6
4
8
9
2
2
7
8
etc...
My expected output is
64
89
22
78
etc..
Just curious to know how I would print the whole number, not just a single digit at a time.
Any help would be much appreciated.
From what I can see, each line in the text file has an integer followed by a string, and you want to know how your code can output the full integer.
You can use the code
f = open("Students.txt","r")
for line in f:
    l = line.split(" ")
    print(l[0])
In Python, when you do this:
for i in (mark):
    print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So, if your string holds a multi-digit number and you iterate over it, you'll get one character (one digit) at a time.
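A quick illustration of that character-by-character behaviour:
>>> mark = '64'
>>> for i in mark:
...     print(i)
...
6
4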
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that we can help you with what's missing (it's most likely a statement like mark = something.split(), but not sure what something is based on the code).
You should be using context managers when you open files so that they are automatically closed for you when the scope ends. Also mark should be a list to which you append the first element of the line split. All together it will look like this:
with open("students.txt","r") as f:
mark = []
for line in f:
mark.append(line.strip().split()[0])
for i in mark:
print(i)
The line
for i in (mark):
is the same as this, because mark is a string:
for i in mark:
I believe you want to make mark an element of some iterable, which you can do by creating a single-item tuple:
for i in (mark,):
and this should give what you want.
in your line:
line.strip().split()
you're not telling the string to split based on a space. Try the following:
str(line).strip().split(" ")
A quick one with list comprehensions:
with open("students.txt","r") as f:
mark = [line.strip().split()[0] for line in f]
for i in mark:
print(i)

removing line breaks in a csv file

I have a csv file where each line begins with (#) and the fields within a line are separated by (;). One of the fields, which contains "Text" (""[ ]""), has some line breaks that produce errors when importing the whole csv file into Excel or Access. The text after the line breaks is treated as independent lines, not following the structure of the table.
#4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; ""[OJO!
la premiación de los #Oscar, nuestros amigos de #cinencuentro revisan las categorías.
+info: co/plHcfSIfn8]""; 0
#624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; ""[Porque nunca dejamos de amar]""; 0
any help with this using a python script? or any other solution...
as output I would like to have the lines:
#4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; ""[OJO! la premiación de los #Oscar, nuestros amigos de #cinencuentro revisan las categorías. +info: co/plHcfSIfn8]""; 0
#624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; ""[Porque nunca dejamos de amar]""; 0
Any help? I have a csv file (54 MB) with a lot of lines containing line breaks... some other lines are ok...
You should share your expected output as well.
Anyways, I suggest you first clean your file to remove the newline characters. Then you can read it as csv. One solution can be (I believe someone will suggest something better :-) )
Clean the file (on linux):
sed ':a;N;$!ba;s/\n/ /g' input_file | sed "s/ #/\n#/g" > output_file
Read file as csv (You can read it using any other method)
import pandas as pd
df = pd.read_csv('output_file', delimiter=';', header=None)
df.to_csv('your_csv_file_name', index=False)
Let's see if it helps you :-)
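If you would rather stay in Python without pandas, a minimal sketch (assuming every real record starts with '#' followed by digits; continuation lines get joined onto the previous record with a space; the file names are placeholders):
import re

merged = []
with open('input_file', encoding='utf-8') as f:
    for line in f:
        line = line.rstrip('\n')
        if re.match(r'#\d+;', line) or not merged:
            merged.append(line)       # start of a new record
        else:
            merged[-1] += ' ' + line  # continuation of the previous record

with open('output_file', 'w', encoding='utf-8') as f:
    f.write('\n'.join(merged) + '\n')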
You can search for lines that are followed by a line that doesn't start with "#", like this \r?\n+(?!#\d+;).
The following was generated from this regex101 demo. It replaces such line ends with a space. You can change that to whatever you like.
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"\r?\n+(?!#\d+;)"
test_str = ("#4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; \"\"[OJO!\n"
"la premiacin de los #Oscar, nuestros amigos de #cinencuentro revisan las categoras.\n"
"+info: co/plHcfSIfn8]\"\"; 0\n"
"#624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; \"\"[Porque nunca dejamos de amar]\"\"; 0")
subst = " "
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Python parsing table data into an array

Firstly, I am using Python 2.7.
What I am trying to achieve is to separate table data from an Active Directory lookup into "Firstname Lastname" items in an array, which I can later compare to a different array to see which users in the list do not match.
I run dsquery group domainroot -name groupname | dsget group -members | dsget user -fn -ln which outputs a list as such:
fn ln
Peter Brill
Cliff Lach
Michael Tsu
Ashraf Shah
Greg Coultas
Yi Li
Brad Black
Kevin Schulte
Raymond Masters (Admin)
James Rapp
Allison Wurst
Benjamin Hammel
Edgar Cuevas
Vlad Dorovic (Admin)
Will Wang
dsget succeeded
Notice this list has both spaces before and after each data set.
The code I am using currently:
userarray = []
p = Popen(["cmd.exe"], stdin=PIPE, stdout=PIPE)
p.stdin.write("dsquery group domainroot -name groupname | dsget group -members | dsget user -fn -ln\n")
p.stdin.write("exit\n")
processStdout = p.stdout.read().replace("\r\n", "").strip("")[266:]
cutWhitespace = ' '.join(processStdout.split()).split("dsget")[0]
processSplit = re.findall('[A-Z][^A-Z]*', cutWhitespace)
userarray.append(processSplit)
print userarray
My problem is that when I split on the blank space and attempt to re-group the pieces into "Firstname Lastname", the grouping gets thrown off when it hits the line that has (Admin), because there is a third field. Here is a sample of what I mean:
['Brad ', 'Black ', 'Kevin ', 'Schulte ', 'Raymond ', 'Masters (', 'Admin) ', 'James ', 'Rapp ', 'Allison ', 'Wurst ',
I would appreciate any suggestions on how to group this better or correctly. Thanks!
# the whole file.
content = p.stdout.read()
# each line as a single string
lines = content.splitlines()
# let's drop the header and the last line
lines = lines[1:-1]
# Notice how the last name starts at col 19
names = [(line[:19].strip(), line[19:].strip()) for line in lines]
print(names)
=> [('Peter', 'Brill'), ('Cliff', 'Lach'), ('Michael', 'Tsu'), ('Ashraf', 'Shah'), ('Greg', 'Coultas'), ('Yi', 'Li'), ('Brad', 'Black'), ('Kevin', 'Schulte'), ('Raymond', 'Masters (Admin)'), ('James', 'Rapp'), ('Allison', 'Wurst'), ('Benjamin', 'Hammel'), ('Edgar', 'Cuevas'), ('Vlad', 'Dorovic (Admin)'), ('Will', 'Wang')]
Now, if the column size changes, just do index = lines[0].index('ln') before dropping the header and use that instead of 19
split has a maxsplit argument, so you can tell it to split only at the first separator; you could say:
cutWhitespace = ' '.join(processStdout.split(None,1)).split("dsget")[0]
on your sixth line to tell it to split no more than once.
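For example, splitting a stripped line just once keeps everything after the first name together, including the (Admin) suffix:
>>> 'Raymond        Masters (Admin)'.split(None, 1)
['Raymond', 'Masters (Admin)']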

I can't alter my list generated from a text file, how would I go about doing this? Python 3.x.x

file = open('Names.txt', 'r')
for lines in file:
    names = lines.split()
    names_list = [item.strip(',') for item in names]
    reformattedName = (names_list[1:]+names_list[0])
    print(reformattedName)
This is what I have so far.
With the text file being:
Neuman, Alfred E.
Stevenson, Robert Lewis
Lewis, C.S.
Doe, Jane
Bush, George Herbert Walker
I'm trying to rearrange it to look like:
Alfred E. Neuman
Robert Lewis Stevenson
C.S. Lewis
Jane Doe
George Herbert Walker Bush
You want to concatenate lists with lists, not a string with a list:
reformattedName = names_list[1:] + names_list[:1]
or perhaps you wanted to rejoin the elements into a string again:
reformattedName = ' '.join(names_list[1:] + names_list[:1])
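Putting it together, a minimal sketch (assuming Names.txt holds the lines shown above):
with open('Names.txt') as f:
    for line in f:
        parts = [item.strip(',') for item in line.split()]
        print(' '.join(parts[1:] + parts[:1]))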
