Instructions:
If we don't know how many items are in a file, we can use read() to load the entire file and then use the line endings (the \n bits) to split it into lines. Here is an example of how to use split()
source = 'aaa,bbb,ccc'
things = source.split(',') # split at every comma
print(things) # displays ['aaa', 'bbb', 'ccc'] because things is a list
Task
Ask the user to enter names and keep asking until they enter nothing.
Add each new name to a file called names.txt as they are entered.
Hint: Open the file before the loop and close it after the loop
Once they have stopped entering names, load the file contents, split it into individual lines and print the lines one by one with -= before the name and =- after it.
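A minimal Python 3 sketch of the whole task might look like the following (the file name names.txt comes from the task; the two function names and the prompt text are my own choices, not the only correct arrangement):

```python
def collect_names(path):
    # Open the file before the loop; with closes it after the loop
    with open(path, "w") as f:
        while True:
            name = input("Enter a name (blank to stop): ")
            if name == "":
                break
            f.write(name + "\n")

def print_names(path):
    # Load the whole file, then split it into lines on '\n'
    with open(path) as f:
        content = f.read()
    for name in content.split("\n"):
        if name:  # skip the empty string left after the final newline
            print(f"-={name}=-")

# collect_names("names.txt")
# print_names("names.txt")
```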
Assuming you already have the file of names called input.txt like so:
Tom
Dick
Harry
Then the code to read the entire file, split on '\n' and print each name in the required format is
# Open the file
with open("input.txt") as file:
    # Read the entire file
    content = file.read()
    # Split the content up into individual names
    names = content.split('\n')
# Print the required string for each name
for name in names:
    print(f'-={name}=-')
Hope that solves your problem.
Related
name = input("Whats your name?: ")
Can I use for example my .txt that is in the same directory on the input 'name'?
I tried the code below:
with open(name.txt, "r") as file:
file.readlines()
But it's not working :(
It's a bit unclear what you want exactly. Note that open(name.txt, ...) fails because name.txt without quotation marks is read as the attribute txt of a variable called name, not as a file name. I think you want the variable name to be a name taken from a text file:
with open("name.txt", "r") as file:
    name = file.readline().strip()
print(name)
It works by opening the file name.txt for reading as text and reading the first line; strip() removes any surrounding spaces or newlines, and the result is stored in the variable name.
There is no need to call input().
Be sure to read the tutorial on input and output.
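If instead the goal was the other reading of the question, asking the user which file to open, then the string returned by input() has to be passed to open() as a variable; name.txt without quotes is not a string. A small sketch of that version (the prompt text and function name are placeholders of mine):

```python
def print_file(filename):
    # filename is whatever string the user typed, e.g. "name.txt"
    with open(filename, "r") as file:
        for line in file:
            print(line.rstrip())

# print_file(input("Which file should I open?: "))
```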
I have a file with a list of cities I'm trying to print and I cannot get them to print.
Code
citylist = open(os.path.join(folderpath, 'Clipped_Cities.dbf'))
print citylist
Result
<open file 'C:\\Users\\Michaelf\\Desktop\\Test_Folder\\LabData\\Clipped_Cities.dbf', mode 'r' at 0x030B6B78>
How can I actually print the cities within the file Clipped_Cities.dbf instead of this file object?
You're close, but you're trying to print a file object instead of that file's contents, as mentioned by @n1c9 in the comments.
citylist = open(os.path.join(folderpath, 'Clipped_Cities.dbf'), 'r')
for line in citylist:
    print line
The 'r' mode is actually the default, so the key change is iterating over the file object, which yields its lines one at a time.
A few other things:
Instead of assigning the file object to a variable like you are, it's better to use a with statement, which closes the file for you once you're done.
Naming the file object f, as in the example below, is a popular convention.
So,
with open(os.path.join(folderpath, 'Clipped_Cities.dbf'), 'r') as f:
    # Iterate over the file...
    for line in f:
        print line
Note
If you don't want blank lines in between your lines (print adds a newline of its own on top of the one each line already ends with), as pointed out by @zondo, you probably want to do something like
print line.replace('\n', '')
This simply replaces the newline character in the string with ''.
Or simpler,
print line.rstrip()
I'm a huge Python noob. I'm trying to write a simple script that will split a line in a file where it sees a "?".
line in Input file (inputlog.log):
http://website.com/somejunk.jpg?uniqid&=123&an=1234=123
line in output file (outputlog.log):
http://website.com/somejunk.jpg uniqid&=123&an=1234=123
The goal here is to end up with a file that has 2 columns:
Here's my code. It kinda works, except it won't write to the 2nd file; it fails with:
TypeError: expected a character buffer object
import re

a = raw_input("what file do you want to open? ")
b = raw_input("what is the file you want to save to? ")

with open(a, 'r') as f1:
    with open(b, 'w') as f2:
        data = f1.readlines()
        print "This is the line: ", data  # for testing
        for line in data:
            words = re.split("[?](.*)$", line)
            print "Here is the Split: ", words  # for testing
            f2.write(words)
f1.close()
f2.close()
Your problem is that words is a list. You cannot write a list to your file; you need to convert it back to a string first. Also, when converting it back, pay attention that you create the spacing/split you want between the strings.
You should do something like this.
words = ' '.join(words)
Pay close attention to the space inside the single quotes. That indicates it will put a space between your strings.
Finally, you then make your call to:
f2.write(words)
Upon making that change, I tested your code and it successfully split and wrote them to the file per your specification.
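For completeness, here is how the corrected script could look in Python 3 (raw_input becomes input; as an extra detail beyond the answer above, the empty string that re.split leaves after the capture group is filtered out before joining):

```python
import re

def split_line(line):
    # split at the first "?"; the capture group keeps the right-hand side
    words = re.split(r"[?](.*)$", line.rstrip("\n"))
    # re.split leaves an empty string after the match; drop it
    return " ".join(w for w in words if w)

def convert(infile, outfile):
    with open(infile) as f1, open(outfile, "w") as f2:
        for line in f1:
            f2.write(split_line(line) + "\n")

# convert("inputlog.log", "outputlog.log")
```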
Issue: Remove the hyperlinks, numbers and signs like ^&*$ etc from twitter text. The tweet file is in CSV tabulated format as shown below:
s.No. username tweetText
1. #abc This is a test #abc example.com
2. #bcd This is another test #bcd example.com
Being a novice at Python, I searched and strung together the following code, thanks to the code given here:
import re

fileName = "path-to-file//tweetfile.csv"
fileout = open("Output.txt", "w")
with open(fileName, 'r') as myfile:
    data = myfile.read().lower()  # read the file and convert all text to lowercase
    clean_data = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", data).split())  # regular expression to strip the html out of the text
    fileout.write(clean_data + '\n')  # write the cleaned data to a file
fileout.close()
myfile.close()
print "All done"
It does the data stripping, but the output file format is not as I desire. The output text file is in a single line like
s.no username tweetText 1 abc This is a cleaned tweet 2 bcd This is another cleaned tweet 3 efg This is yet another cleaned tweet
How can I fix this code to give me an output like given below:
s.No. username tweetText
1 abc This is a test
2 bcd This is another test
3 efg This is yet another test
I think something needs to be added in the regular expression code but I don't know what it could be. Any pointers or suggestions will be helpful.
You can read the line, clean it, and write it out in one loop. You can also use the CSV module to help you build out your result file.
import csv
import re

exp = r"(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"

def cleaner(row):
    return [re.sub(exp, " ", item.lower()) for item in row]

with open('input.csv', 'r') as i, open('output.csv', 'wb') as o:
    reader = csv.reader(i, delimiter=',')  # Comma is the default
    writer = csv.writer(o, delimiter=',')
    # Take the first row from the input file (the header)
    # and write it to the output file
    writer.writerow(next(reader))
    for row in reader:
        writer.writerow(cleaner(row))
The csv module knows how to add separators between items correctly, as long as you pass it a collection of items.
So, what the cleaner function does is take each item (column) in the row from the input file, apply the substitution to the lowercase version of the item, and return the results as a list.
The rest of the code simply opens the files and configures the csv module with the separators you want for the input and output files (in the example code, the separator for both files is a comma, but you can change the output separator).
Next, the first row of the input file is read and written out to the output file. No transformation is done on this row (which is why it is not in the loop).
Reading the header from the input file automatically puts the file pointer on the next row, so we then simply loop through the input rows (in reader), apply the cleaner function to each row (this returns a list), and write that list back to the output file with writer.writerow().
Instead of applying the re.sub() and the .lower() expressions to the entire file at once, try iterating over each line in the CSV file, like this:
for line in myfile:
    line = line.lower()
    line = re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", line)
    fileout.write(line + '\n')
Also, when you use the with open(...) as myfile form, there is no need to close the file at the end of your program; this is done automatically when you use with.
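Put together as a runnable sketch (the file paths are placeholders, and wrapping the loop in a function is my addition, not part of the original answer):

```python
import re

# Same pattern as in the question: hash tags, non-alphanumerics, URLs
PATTERN = re.compile(r"(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)")

def clean_file(infile, outfile):
    with open(infile) as myfile, open(outfile, "w") as fileout:
        for line in myfile:
            # lowercase first, then strip out the unwanted characters
            fileout.write(PATTERN.sub(" ", line.lower()) + "\n")
```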
Try this regex (note that the hash-tag alternative has to come first; otherwise the character class would consume the # by itself and leave the tag text behind):
clean_data = ' '.join(re.sub(r"#\S+|[#\^&\*\$]|\S+[a-z0-9]\.(com|net|org)", " ", data).split())  # strip hash tags, stray symbols and domain names
Explanation:
#\S+ matches on hash tags
[#\^&\*\$] matches on the characters you want to replace
\S+[a-z0-9]\.(com|net|org) matches on domain names
If the URLs can't be identified by https?, you'll have to complete the list of potential TLDs.
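As a quick self-check (this snippet is mine, not part of the answer), the pattern can be exercised on the sample tweet text; the hash-tag alternative is placed first so whole tags are removed rather than just the # character:

```python
import re

PATTERN = re.compile(r"#\S+|[#\^&\*\$]|\S+[a-z0-9]\.(?:com|net|org)")

def clean(text):
    # replace every match with a space, then collapse the whitespace
    return " ".join(PATTERN.sub(" ", text).split())

# clean("this is a test #abc example.com") -> "this is a test"
```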
So I have a large text file. It contains a bunch of information in the following format:
|NAME|NUMBER(1)|AST|TYPE(0)|TYPE|NUMBER(2)||NUMBER(3)|NUMBER(4)|DESCRIPTION|
Sorry for the vagueness. All the information is formatted like the above, and between each descriptor is the separator '|'. I want to be able to search the file for the NAME and then print each descriptor on its own line, such as in this example:
Name
Number(1):
AST:
TYPE(0):
etc....
In case I'm still confusing, I want to be able to search the name and then print out the information that follows each being separated by a '|'.
Can anyone help?
EDIT
Here is an example of a part of the text file:
|Trevor Jones|70|AST|White|Earth|3||500|1500|Old Man Living in a retirement home|
This is the code I have so far:
with open('LARGE.TXT') as fd:
    name = 'Trevor Jones'
    input = [x.split('|') for x in fd.readlines()]
    to_search = {x[0]:x for x in input}
    print('\n'.join(to_search[name]))
First you need to break the file up somehow. I think that a dictionary is the best option here. Then you can get what you need.
d = {}
# Where `fl` is our file object
for L in fl:
    # Skip the first pipe
    detached = L[1:].split('|')
    # May wish to process here
    d[detached[0]] = detached[1:]

# Can do whatever with this information now
print d.get('string_to_search')
Something like
# Opens the file in a 'safe' manner
with open('large_text_file') as fd:
    # This reads in the file and splits it into tokens;
    # the strip calls remove the trailing newline and the extra pipes
    input = [x.strip().strip('|').split('|') for x in fd.readlines()]
    # This makes it into a searchable dictionary
    to_search = {x[0]:x for x in input}
and then search with
to_search[NAME]
Depending on the format you want the answers in use
print ' '.join(to_search[NAME])
or
print '\n'.join(to_search[NAME])
A word of warning, this solution assumes that the names are unique, if they aren't a more complex solution may be required.
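If the names can indeed repeat, one possible sketch (Python 3 here, and the helper name index_records is my invention) keeps every record for a name in a list instead of overwriting:

```python
from collections import defaultdict

def index_records(path):
    # Map each NAME to a list of its records so duplicates are all kept
    records = defaultdict(list)
    with open(path) as fd:
        for line in fd:
            fields = line.strip().strip('|').split('|')
            if fields and fields[0]:
                records[fields[0]].append(fields[1:])
    return records

# index_records('large_text_file').get('Trevor Jones') returns a list,
# one entry per matching line
```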