I have a string:
"apples = green"
How do I:
1. print everything before '=' (apples)
2. print everything after '=' (green)
3. specify a number of the string in a text file? I have a .txt file which contains:
apples = green
lemons = yellow
... = ...
... = ...
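To print everything before '=', split the string using .split() and take the first element:
print(astring.split(' = ', 1)[0])
To print everything after it, split it the same way and take the second element:
print(astring.split(' = ', 1)[1])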
Alternatively, you could use the .partition() method:
>>> astring = "apples = green"
>>> print(astring.split(' = ', 1))
['apples', 'green']
>>> print(astring.partition(' = '))
('apples', ' = ', 'green')
partition() always splits only once, but it also returns the separator you split on.
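Putting this together for the file in the question, here is a minimal sketch that builds a dict from the whole file. It assumes every relevant line has the form "key = value". The helper name parse_pairs is hypothetical. Because it uses partition(), lines without " = " are simply skipped:

```python
def parse_pairs(lines):
    """Build a dict from lines of the form 'key = value' (hypothetical helper)."""
    pairs = {}
    for line in lines:
        key, sep, value = line.strip().partition(' = ')
        if sep:  # partition() returns an empty separator when ' = ' is absent
            pairs[key] = value
    return pairs


sample = ["apples = green\n", "lemons = yellow\n", "not a pair\n"]
print(parse_pairs(sample))  # {'apples': 'green', 'lemons': 'yellow'}
```

With a real file you can pass the open file object directly, for example with open('fruits.txt') as f: pairs = parse_pairs(f). The filename there is only a placeholder.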
If you need to read a specific line in a file, skip lines first by iterating over the file object. The itertools.islice() function is the most compact way to return that line; don't worry too much if you don't understand how that all works. If the file doesn't have that many lines, an empty string is returned instead:
from itertools import islice
def read_specific_line(filename, lineno):
    with open(filename) as f:
        return next(islice(f, lineno, lineno + 1), '')
Line numbers count from 0 here, so to read the 3rd line from a file:
line = read_specific_line('/path/to/some/file.txt', 2)
If instead you need to know the line number of a given piece of text, use enumerate() to keep track of the line count so far:
def what_line(filename, text):
    with open(filename) as f:
        for lineno, line in enumerate(f):
            if line.strip() == text:
                return lineno
    return -1
which would return the line number (starting to count from 0), or -1 if the line was not found in the file.
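You may actually want the line number of a key such as apples rather than of a whole line. In that case you can combine the same enumerate() loop with partition(). The following is a minimal sketch. The helper name find_key_line is hypothetical. It accepts any iterable of lines, so you can pass it an open file object:

```python
def find_key_line(lines, key):
    """Return the 0-based index of the first 'key = value' line with this key, else -1."""
    for lineno, line in enumerate(lines):
        # Everything before ' = ' is the key; strip() drops the trailing newline.
        key_part, sep, _ = line.strip().partition(' = ')
        if sep and key_part == key:
            return lineno
    return -1


sample = ["apples = green\n", "lemons = yellow\n"]
print(find_key_line(sample, "lemons"))  # 1
print(find_key_line(sample, "plums"))   # -1
```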
Every string in Python has a method called split(). Calling string.split("substring") returns a list of the pieces between each occurrence of the substring, which is exactly what you are looking for:
>>> string = "apples = green"
>>> string.split("=")
['apples ', ' green']
>>> string = "apples = green = leaves = chloroplasts"
>>> string.split("=")
['apples ', ' green ', ' leaves ', ' chloroplasts']
So, if you use string.split(), you can call the index in the resulting list to get the substring you want:
>>> string.split(" = ")[0]
'apples'
>>> string.split(" = ")[1]
'green'
>>> string.split(" = ")[2]
'leaves'
etc. Just make sure the string actually contains the separator. Otherwise split() returns a one-element list, and any index greater than 0 throws an IndexError.
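One way to avoid that IndexError is to limit the split with maxsplit=1 and check the length of the result before indexing. The following is a minimal sketch. split_once is a hypothetical name, not a built-in:

```python
def split_once(text, sep=" = "):
    """Return (before, after) around the first `sep`, or None if `sep` is missing."""
    parts = text.split(sep, 1)  # maxsplit=1: at most two pieces
    if len(parts) < 2:
        return None  # indexing parts[1] here would raise IndexError
    return parts[0], parts[1]


print(split_once("apples = green"))                          # ('apples', 'green')
print(split_once("apples = green = leaves = chloroplasts"))  # ('apples', 'green = leaves = chloroplasts')
print(split_once("no separator here"))                       # None
```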
Related
I have a text file, for example:
test case 1 Pass
Test case 2 Pass
etc etc etc
I am able to separate the strings by whitespace using the split() function, but I want to separate them using the keyword "Pass"/"Fail". How should I go about it?
My current code separates by whitespace, but not every text file will have the same layout. They will all have the "Pass" or "Fail" keyword, though.
filestr = ''
f = open('/Users/shashankgoud/Downloads/abc/index.txt',"r")
data=f.read()
for line in data.split('\n'):
    strlist = line.split(' ')
    filestr += (' '.join(strlist[:3]) +','+','.join(strlist[3:]))
    filestr += '\n'
print(filestr)
f1 = open('/Users/shashankgoud/Downloads/abc/index.xlsx',"w")
f1.write(filestr)
f1.close()
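You can use the re module for that. Because the pattern is wrapped in a capturing group, re.split() keeps the "Pass"/"Fail" keywords in the resulting list. You can then glue each keyword back onto the text before it. For example: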
import re
txt = "test case 1 Pass Test case 2 Pass etc etc etc"
pattern = re.compile(r'(Pass|Fail)')
parts = pattern.split(txt)
joined_parts = [
    parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)
]
joined_parts += [parts[-1]]
print(joined_parts)
Output:
['test case 1 Pass', ' Test case 2 Pass', ' etc etc etc']
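If the file has one test per line, as in the question, it may be simpler to search each line for the keyword instead of splitting the whole text. The following is a minimal sketch. It assumes each line holds at most one "Pass" or "Fail" that you want to split on. The name split_on_status is hypothetical:

```python
import re

# \b ... \b stops words such as "Passed" or "Failure" from matching.
STATUS = re.compile(r'\b(Pass|Fail)\b')


def split_on_status(line):
    """Split one line into (text before the keyword, keyword, text after it).

    Returns None when the line contains neither "Pass" nor "Fail".
    """
    match = STATUS.search(line)
    if match is None:
        return None
    before = line[:match.start()].strip()
    after = line[match.end():].strip()
    return before, match.group(1), after


print(split_on_status("test case 1 Pass"))              # ('test case 1', 'Pass', '')
print(split_on_status("Test case 2 Fail extra notes"))  # ('Test case 2', 'Fail', 'extra notes')
print(split_on_status("etc etc etc"))                   # None
```

To get comma-separated output like the question's, you could join the non-empty pieces of each result with ','.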
I have a text file that contains some words and a number written with a decimal point in it. For example:
hello!
54.123
I only want the number 54.123 to be extracted and converted so that the outcome is 54123.
The code I tried is:
import re
exp = re.compile(r'^[\+]?[0-9]')
my_list = []
with open('file.txt') as f:
    lines = f.readlines()
    for line in lines:
        if re.match(exp, line.strip()):
            my_list.append(int(line.strip()))
#convert to a string
listToStr = ' '.join([str(elem) for elem in my_list])
print(listToStr)
But this returns the error: ValueError: invalid literal for int() with base 10: '54.123'
Does anyone know a solution for this?
You can try to convert the current line to a float. If the line does not contain a valid float number, float() raises a ValueError exception that you can catch and ignore. If no exception is raised, split the line at the dot, join the two parts, convert the result to int, and append it to the list.
my_list = []
with open('file.txt') as f:
    lines = f.readlines()
    for line in lines:
        try:
            tmp = float(line)
            num = int(''.join(line.split(".")))
            my_list.append(num)
        except ValueError:
            pass
#convert to a string
listToStr = ' '.join([str(elem) for elem in my_list])
print(listToStr)
You can check whether a line contains a number using the isdigit() method.
Calling isdigit() on the whole string works for integers only. A float contains ".", which isn't a digit, so isdigit() returns False. From what I can tell, you just need to check whether the string contains any digit.
For example:
def numCheck(string):
    # Checks if the input string contains numbers
    return any(i.isdigit() for i in string)
string = '54.123'
print(numCheck(string)) # True
string = 'hello'
print(numCheck(string)) # False
Note: if your data contains things like 123ab56 then this won't be good for you.
To convert 54.123 to 54123 you could use the replace(old, new) function.
For example:
string = '54.123'
new_string = string.replace('.', '') # replace . with nothing
print(new_string) # 54123
This may help. With it I am now getting the numbers from the file. I guess you were trying to use split() in place of strip():
import re
exp = re.compile(r'[0-9]')
my_list = []
with open('file.txt') as f:
    lines = f.readlines()
    for line in lines:
        for numbers in line.split():
            if re.match(exp, numbers):
                my_list.append(numbers)
#convert to a string
listToStr = ' '.join([str(elem) for elem in my_list])
print(listToStr)
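The ideas above can be combined: a regular expression to find the number, replace() to drop the dot, and int() to convert. The following is a minimal sketch in Python 3. It assumes numbers appear as whitespace-separated tokens like 54.123. The helper name extract_joined_numbers is hypothetical:

```python
import re

# A whole token made of digits, a dot, then digits, e.g. "54.123".
DECIMAL = re.compile(r'\d+\.\d+')


def extract_joined_numbers(lines):
    """Return every decimal token in `lines` as an int with the dot removed."""
    numbers = []
    for line in lines:
        for token in line.split():
            if DECIMAL.fullmatch(token):  # fullmatch: the whole token must match
                numbers.append(int(token.replace('.', '')))
    return numbers


print(extract_joined_numbers(["hello!\n", "54.123\n"]))  # [54123]
```

With the real file you would pass the open file object, for example with open('file.txt') as f: print(extract_joined_numbers(f)). Note that int() drops leading zeros, so a value such as 0.5 becomes 5.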
I have a text file that resembles:
alpha alphabet alphameric
I would like to match just the first string, 'alpha', and nothing else.
I have the following code that attempts to match just the alpha string and get its line number
findWord = re.findall('\\ba\\b', "alpha")
with open(file) as myFile:
    for num, line in enumerate(myFile, 1):
        if findWord in line:
            print 'Found at line: ', num
However I get the following error:
TypeError: 'in <string>' requires string as left operand, not list
Issues in your code:
re.findall('\\ba\\b', "alpha") returns a list of matches, but if findWord in line uses that list where a string is expected. That is the error you are getting.
With findWord = re.findall('\\ba\\b', "alpha") you are searching for the standalone word a inside the string "alpha", which does not exist, so the list would be empty anyway.
Try this
import re
#findWord = re.findall('\\ba\\b', "alpha")
#print findWord
with open("data.txt") as myFile:
    for num,line in enumerate(myFile):
        if re.findall('\\balpha\\b', line):
            print 'Found at line: ', num+1
You could also modify your code a bit and check the words of each line:
with open(file, 'r') as myFile:
    for num, line in enumerate(myFile, 1):
        if 'alpha' in line.split():
            print 'Found at line', num
Output:
Found at line 1
You can try this:
import re
s = "alpha alphabet alphameric"
data = re.findall(r"alpha(?=\s)", s)[0]
Output:
"alpha"
I am trying to set up a system for running various statistics on a text file. In this endeavor I need to open a file in Python (v2.7.10) and read it both as lines, and as a string, for the statistical functions to work.
So far I have this:
import csv, json, re
from textstat.textstat import textstat
file = "Data/Test.txt"
data = open(file, "r")
string = data.read().replace('\n', '')
lines = 0
blanklines = 0
word_list = []
cf_dict = {}
word_dict = {}
punctuations = [",", ".", "!", "?", ";", ":"]
sentences = 0
This sets up the file and the preliminary variables. At this point, print textstat.syllable_count(string) returns a number. Further, I have:
for line in data:
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    word_list.extend(line.split())
    for char in line.lower():
        cf_dict[char] = cf_dict.get(char, 0) + 1
for word in word_list:
    lastchar = word[-1]
    if lastchar in punctuations:
        word = word.rstrip(lastchar)
    word = word.lower()
    word_dict[word] = word_dict.get(word, 0) + 1
for key in cf_dict.keys():
    if key in '.!?':
        sentences += cf_dict[key]
number_words = len(word_list)
num = float(number_words)
avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num
mcw = sorted([(v, k) for k, v in word_dict.items()], reverse=True)
print( "Total lines: %d" % lines )
print( "Blank lines: %d" % blanklines )
print( "Sentences: %d" % sentences )
print( "Words: %d" % number_words )
print('-' * 30)
print( "Average word length: %0.2f" % avg_wordsize )
print( "30 most common words: %s" % mcw[:30] )
But this fails: line 22, avg_wordsize = len(''.join([k*v for k, v in word_dict.items()]))/num, raises ZeroDivisionError: float division by zero. However, if I comment out string = data.read().replace('\n', '') in the first piece of code, I can run the second piece without problem and get the expected output.
Basically, how do I set this up so that I can run the second piece of code on data, as well as textstat on string?
The call to data.read() places the file pointer at the end of the file, so there is nothing left to read at this point. You either have to close and reopen the file or, more simply, reset the pointer to the beginning using data.seek(0).
First see the line:
string = data.read().replace('\n', '')
You are reading from data once. Now, cursor is in the end of data.
Then see the line,
for line in data:
You are trying to read it again, but you can't, because there is nothing left in data: you are at the end of it. So len(word_list) is 0,
and dividing by it gives the error:
ZeroDivisionError: float division by zero.
When you comment that line out, you read the file only once, so the second part of your code works.
Clear now?
So, what to do now?
Call data.seek(0) after data.read():
Demo:
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.read()
#nothing
>>> a.seek(0)
>>> a.read()
#output again
Here is a simple fix. Replace the line for line in data: with:
data.seek(0)
for line in data.readlines():
    ...
It moves the file pointer back to the beginning of the file and reads it again line by line.
While this should work, you may want to simplify the code and read the file only once. Something like:
with open(file, "r") as fin:
    lines = fin.readlines()
string = ''.join(lines).replace('\n', '')
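Along the same lines, once the file has been read into memory, you can derive both the string for textstat and the line-based counts from that single read. The following is a minimal Python 3 sketch that reproduces a few of the question's counts. The function name basic_stats is hypothetical. Unlike the question's line.startswith('\n') check, it treats a line containing only whitespace as blank:

```python
def basic_stats(text):
    """Compute a few of the question's statistics from one in-memory string."""
    lines = text.splitlines()
    words = text.split()
    return {
        "lines": len(lines),
        "blank_lines": sum(1 for line in lines if not line.strip()),
        "words": len(words),
        # Same sentence heuristic as the question: count '.', '!' and '?'.
        "sentences": sum(text.count(ch) for ch in ".!?"),
    }


sample = "Hello world.\n\nIs this it? Yes!\n"
print(basic_stats(sample))  # {'lines': 3, 'blank_lines': 1, 'words': 6, 'sentences': 3}
```

In practice you would read once with text = fin.read(), then build string = text.replace('\n', '') for textstat and call basic_stats(text) for the counts.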
I'm trying to print out a medium-sized list in Python. I'm printing the entire list on one line to test the program and make sure the right data is being put into the list in the right order. I read in 2 files and put all the data into 2 dictionaries. Then, I split the dictionaries into parts and put all the similar data into a list. I'm super new to Python; this is based on a tutorial I found on dictionaries, and I'm a little stuck. This line prints the list on one line:
print '[%s]' % ', '.join(map(str, player_list))
But this line prints each value of the list on a separate line which I don't want:
print '[%s]' % ', '.join(map(str, army_list))
Here's my code if needed that adds to the list:
import collections
import operator
terridict = {}
gsdict = {}
terr_list = []
player_list = []
army_list = []
list_length = []
total_territories = 0
with open('territories.txt', 'r') as territory:
    for line in territory:
        terridict["territory"], terridict["numeric_id"], terridict["continent"] = line.split(',')
with open('gameState.txt', 'r') as gameState:
    for line in gameState:
        gsdict["numeric_id"], gsdict["player"], gsdict["num_armies"] = line.split(',')
        terr_num = gsdict["numeric_id"]
        player_num = gsdict["player"]
        army_size = gsdict["num_armies"]
        if terr_num >= 1 and player_num >= 1 and army_size >= 1:
            terr_list.append(terr_num)
            player_list.append(player_num)
            army_list.append(army_size)
        player_list.sort()
        counter = collections.Counter(player_list)
        print (counter)
        total_territories = total_territories + 1
x = counter
sorted_x = sorted(x.items(), key=operator.itemgetter(0))
counter = sorted_x
print terr_num, player_num, army_size
print counter
print "Number of territories: %d" % total_territories
print '[%s]' % ', '.join(map(str, terr_list))
print '[%s]' % ', '.join(map(str, player_list))
print '[%s]' % ', '.join(map(str, army_list))
line, when you read it in, ends with a newline. For example (I'm guessing at the contents here):
"1,nelson2013,23\n"
When you split it by comma, you get this:
["1", "nelson2013", "23\n"]
Notice that the player name does not end with a newline, but the army size does. When you join the army sizes together, they end up like this:
"23\n, 18\n, 121\n"
i.e. separated by newlines, which makes them print one per line.
To combat this, you want to invoke rstrip() on line immediately at the top of the loop, before you process it any further.
You probably want to reassign line. Calling line.rstrip() by itself doesn't change line, because strings are immutable and rstrip() returns a new string. Building off what Amadan said:
line = line.rstrip()
This way the newline character is removed before line is split any further. I tried it out and this worked.
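Putting the fix into the question's loop, here is a minimal Python 3 sketch. It assumes each gameState.txt line has the form "numeric_id,player,num_armies" with integer fields, which is how the question's code splits it. The helper name parse_game_state is hypothetical. Converting with int() also makes the >= 1 checks meaningful. In Python 2, comparing a string with an int does not compare numerically, and in Python 3 it raises a TypeError.

```python
def parse_game_state(lines):
    """Parse 'numeric_id,player,num_armies' lines into three lists of ints."""
    terr_list, player_list, army_list = [], [], []
    for line in lines:
        line = line.rstrip()  # drop the trailing newline before splitting
        if not line:
            continue  # skip blank lines
        terr_num, player_num, army_size = (int(field) for field in line.split(','))
        if terr_num >= 1 and player_num >= 1 and army_size >= 1:
            terr_list.append(terr_num)
            player_list.append(player_num)
            army_list.append(army_size)
    return terr_list, player_list, army_list


sample = ["1,1,23\n", "2,2,18\n", "3,1,121\n"]
terrs, players, armies = parse_game_state(sample)
print('[%s]' % ', '.join(map(str, armies)))  # [23, 18, 121]
```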