How to remove values in list which contain alphabetical characters? - python

I am reading a .dat file and the first few lines are just metadata before it gets to the actual data. A shortened example of the .dat file is below.
&SRS
SRSRUN=266128,SRSDAT=20180202,SRSTIM=122132,
fc.fcY=0.9000
&END
energy rc ai2
8945.016 301.32 6.7959
8955.497 301.18 6.8382
8955.989 301.18 6.8407
8956.990 301.16 6.8469
Or as the list:
[' &SRS\n', ' SRSRUN=266128,SRSDAT=20180202,SRSTIM=122132,\n', 'fc.fcY=0.9000\n', '\n', ' &END\n', 'energy\trc\tai2\n', '8945.016\t301.32\t6.7959\n', '8955.497\t301.18\t6.8382\n', '8955.989\t301.18\t6.8407\n', '8956.990\t301.16\t6.8469\n']
I tried this previously, but it raises an error:
def import_absorptionscan(file_path,start,end):
    for i in range(start,end):
        lines=[]
        f=open(file_path+str(i)+'.dat', 'r')
        for line in f:
            lines.append(line)
        for line in lines:
            for c in line:
                if c.isalpha():
                    lines.remove(line)
        print lines
But I get this error: ValueError: list.remove(x): x not in list
I then looked through Stack Overflow, but most of what came up was how to strip alphabetical characters from a string, so I asked this question.
The code produces a list of strings, with each string making up one line in the file. I want to remove any string that contains alphabetic characters, as this should remove all the metadata and leave just the data. Any help would be appreciated, thank you.

I have a suspicion you will want a more robust rule than "does the string contain a letter?", but you can use a regular expression to check:
re.search("[a-zA-Z]", line)
You'll probably want to take a look at the regular expression docs.
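As a minimal sketch of the regular-expression check above, written for Python 3: the helper name drop_lines_with_letters is hypothetical, and the sample lines are taken from the question. It builds a new list instead of removing items from the list being iterated.

```python
import re

# Matches any ASCII letter anywhere in the string.
HAS_LETTER = re.compile(r"[a-zA-Z]")


def drop_lines_with_letters(lines):
    """Return only the lines that contain no ASCII letters (hypothetical helper)."""
    return [line for line in lines if not HAS_LETTER.search(line)]


sample = [
    " &SRS\n",
    " SRSRUN=266128,SRSDAT=20180202,SRSTIM=122132,\n",
    "fc.fcY=0.9000\n",
    "\n",
    " &END\n",
    "energy\trc\tai2\n",
    "8945.016\t301.32\t6.7959\n",
    "8955.497\t301.18\t6.8382\n",
]

data_lines = drop_lines_with_letters(sample)
print(data_lines)
```

Note that the blank line survives this filter, and that a value written in scientific notation such as 1.2e3 would be discarded because of the "e". Both are reasons a stricter rule may be needed.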

Additionally, you can use the built-in any() function to check for letters. In place of your inner for loop, use:
if any(word.isalpha() for word in line.split()):
Notice that str.isalpha() is True only when every character is a letter, so this will treat "ver9" as containing no letters. If this is a problem, check letter by letter instead:
line_is_meta = False
for word in line.split():
    if any(letter.isalpha() for letter in word):
        line_is_meta = True
        break
if not line_is_meta:
    lines.append(line)
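The ValueError in the question comes from the inner loop: it keeps running after a line has been removed, so the second letter in the same line calls remove() on a line that is already gone. Removing items from a list while iterating over it also causes elements to be skipped. A minimal Python 3 sketch that avoids both by building a new list; the function name numeric_lines and the blank-line check are assumptions, not part of the answers above.

```python
def numeric_lines(lines):
    """Keep non-blank lines with no alphabetic characters (hypothetical helper)."""
    kept = []
    for line in lines:
        # Skip blank lines; this is an extra assumption beyond the question.
        if not line.strip():
            continue
        # any() stops at the first letter it finds, so "ver9" is rejected too.
        if any(ch.isalpha() for ch in line):
            continue
        kept.append(line)
    return kept


sample = ["fc.fcY=0.9000\n", "\n", "ver9\n", "8945.016\t301.32\t6.7959\n"]

# Split each surviving line on whitespace to get the column values.
rows = [line.split() for line in numeric_lines(sample)]
print(rows)
```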

Related

Python Code to Replace First Letter of String: String Index Error

Currently, I am working on parsing resumes to remove "-" only when it is used at the beginning of each line. I've tried identifying the first character of each string after the text has been split. Below is my code:
for line in text.split('\n'):
    if line[0] == "-":
        line[0] = line.replace('-', ' ')
line is a string. This is my way of thinking but every time I run this, I get the error IndexError: string index out of range. I'm unsure of why because since it is a string, the first element should be recognized. Thank you!
The error occurs because some lines are empty, so line[0] does not exist.
The replacement is also wrong, for three reasons:
first, it assigns to the first character of the line, but a string cannot be changed in place because it is immutable;
second, the replacement value is the whole string with its dashes replaced, not a single character;
third, line is lost at the next iteration, and so is the original list of lines.
To remove the first character of a string there is no need for replace: just slice the string, which also avoids touching other dashes.
A working solution is to test with startswith, rebuild a new list of strings, and then join them back together:
text = """hello
-yes--
who are you"""
new_text = []
for line in text.splitlines():
    if line.startswith("-"):
        line = line[1:]
    new_text.append(line)
print("\n".join(new_text))
result:
hello
yes--
who are you
With more experience, you can pack this code into a list comprehension:
new_text = "\n".join([line[1:] if line.startswith("-") else line for line in text.splitlines()])
Finally, the regular expression module is also a nice alternative:
import re
print(re.sub("^-","",text,flags=re.MULTILINE))
This removes the dash on every line that starts with a dash. The MULTILINE flag tells the regex engine to treat ^ as the start of each line, not only the start of the buffer.
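To make the effect of the flag concrete, here is a small Python 3 sketch comparing the same substitution with and without re.MULTILINE. The sample text and variable names are made up for illustration.

```python
import re

text = "-first\n-second\nthird-"

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so only the first dash is removed.
without_flag = re.sub("^-", "", text)

# With re.MULTILINE, ^ also matches right after each newline,
# so the leading dash of every line is removed.
with_flag = re.sub("^-", "", text, flags=re.MULTILINE)

print(repr(without_flag))
print(repr(with_flag))
```

In both cases the trailing dash in "third-" is left alone, because the pattern is anchored to the start.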
This could be due to empty lines. You could check that the line is non-empty before indexing into it.
new_text = []
text="-testing\nabc\n\n\nxyz"
for line in text.split("\n"):
    if line and line[0] == '-':
        line = line[1:]
    new_text.append(line)
print("\n".join(new_text))

How do I output the acronym on one line

I am following the Hands-on Python Tutorial from Loyola University, and for one exercise I am supposed to get a phrase from the user, capitalize the first letter of each word, and print the acronym on one line.
I have figured out how to print the acronym but I can't figure out how to print all the letters on one line.
letters = []
line = input('?:')
letters.append(line)
for l in line.split():
    print(l[0].upper())
Pass end='' to your print function to suppress the newline character, viz:
for l in line.split():
    print(l[0].upper(), end='')
print()
Your question would be better if you shared the code you are using so far, I'm just guessing that you have saved the capital letters into a list.
You want the string method .join(), which takes a string separator before the . and then joins a list of items with that string separator between them. For an acronym you'd want empty quotes
e.g.
l = ['A','A','R','P']
acronym = ''.join(l)
print(acronym)
You could create a string variable at the beginning, e.g. string = "".
Then, instead of calling print(l[0].upper()), append to the string with string += l[0].upper().
Lastly, call print(string).
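Putting the join() suggestion together with the loop from the question, a minimal Python 3 sketch might look like this. The function name acronym is hypothetical, and a fixed phrase stands in for the input() call.

```python
def acronym(phrase):
    """Join the upper-cased first letter of each word (hypothetical helper)."""
    # split() with no arguments never yields empty strings, so word[0] is safe.
    return "".join(word[0].upper() for word in phrase.split())


print(acronym("hands on python"))
```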

Reversing a text and find same words - python

I have to write some code in Python that will read all the words from a text, reverse them, and find which of them are the same in normal and reversed form. So far, I've done this:
filename=raw_input("enter the file name: ")
fop=open(filename)
for line in fop:
    words=line.split()
li=[]
li.extend(words)
size=len(li)
for i in range(0,size/2):
    li[i], li[size-1-i] = li[size-1-i], li[i]
''.join(li)
But it doesn't work: if I give it a text with more than one line, it only processes the last line and doesn't actually seem to reverse anything. Some help please?
You can do the following. Check whether a word equals its reverse with word == word[::-1], where word[::-1] is the word sliced with a step of -1, i.e. reversed:
filename=raw_input("enter the file name: ")
with open(filename) as f :
    for line in f:
        for word in line.split() :
            if word == word[::-1]:
                print word
If you want to print each palindrome word only once, you can use a set comprehension:
print '\n'.join({w for w in open('file.txt').read().split() if w==w[::-1]})
Note that my answer doesn't filter any single letters, punctuation etc, in other words it depends on a loose and broad definition of what is a word.
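For readers on Python 3, the same idea can be sketched as a small function. The name palindromes and the sample text are made up for illustration, and a string stands in for the file contents. Like the answer above, it does no filtering of punctuation or single letters.

```python
def palindromes(text):
    """Return the set of whitespace-separated words equal to their reverse."""
    return {word for word in text.split() if word == word[::-1]}


# Two lines of text, to show that every line is processed.
sample = "anna went to see bob at noon\nlevel two"
found = sorted(palindromes(sample))
print(found)
```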

Removing \n from myFile

I am trying to create a dictionary of lists in which each key is the sorted letters of a word and each value is a list of all the words that can be formed from those letters.
So my dict should contain something like this:
{'aaelnprt': ['parental', 'paternal', 'prenatal'], 'ailrv': ['rival']}
The possible words are inside a .txt file. Where every word is separated by a newline. Example
Sad
Dad
Fruit
Pizza
This leads to a problem when I try to code it:
with open ("word_list.txt") as myFile:
    for word in myFile:
        if word[0] == "v": ##Interested in only word starting with "v"
            word_sorted = ''.join(sorted(word)) ##Get the anagram
            for keys in list(dictonary.keys()):
                if keys == word_sorted: ##Heres the problem, it doesn't get inside here as theres extra characters in <word_sorted> possible "\n" due to the linebreak of myfi
                    print(word_sorted)
                    dictonary[word_sorted].append(word)
If every word in "word_list.txt" is followed by '\n' then you can just use slicing to get rid of the last char of the word.
word_sorted = ''.join(sorted(word[:-1]))
But if the last word in "word_list.txt" isn't followed by '\n', then you should use rstrip().
word_sorted = ''.join(sorted(word.rstrip()))
The slice method is slightly more efficient, but for this application I doubt you'll notice the difference, so you might as well just play safe & use rstrip().
Use rstrip(), it removes the \n character.
...
...
keys == word_sorted.rstrip()
...
You should try the .rstrip() method in your code; it will remove the "\n".
See the documentation for str.rstrip().
strip only removes characters from the beginning or end of a string.
Use rstrip() to remove the \n character.
You can also use str.replace() to replace the newline with something else:
str2 = str.replace("\n", "")
I see a few problems here. First, how is anything getting into the dictionary? I see no assignments. You've only provided a snippet, so maybe that happens elsewhere.
You're also looping over the keys when you could use the in operator, which is more efficient because a dictionary lookup does not scan every key.
with open ("word_list.txt") as myFile:
    for word in myFile:
        if word[0] == "v": ##Interested in only word starting with "v"
            word_sorted = ''.join(sorted(word.rstrip())) ##Get the anagram
            if word_sorted in dictionary:
                print(word_sorted)
                dictionary[word_sorted].append(word)
            else:
                # The case where we don't find an anagram in our dict
                dictionary[word_sorted] = [word,]
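A minimal Python 3 sketch of the whole grouping step, under a few assumptions. The function name group_anagrams is hypothetical. The "starts with v" filter from the question is left out. Each word is stripped once up front so that the stored words, not just the keys, are free of the trailing newline. A list of strings stands in for the lines of word_list.txt.

```python
from collections import defaultdict


def group_anagrams(words):
    """Map each sorted-letter key to the words sharing it (hypothetical helper)."""
    groups = defaultdict(list)
    for raw in words:
        word = raw.strip()  # drop the trailing newline and other whitespace
        if not word:
            continue  # ignore blank lines
        groups["".join(sorted(word))].append(word)
    return dict(groups)


# Stand-in for iterating over an open file; the last line has no newline.
lines = ["parental\n", "paternal\n", "prenatal\n", "rival\n", "viral"]
anagrams = group_anagrams(lines)
print(anagrams)
```

defaultdict(list) removes the need for the separate "key not found" branch, since a missing key starts out as an empty list.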

Reading two strings from file

I'm writing a program in Python and I want to compare two strings that are stored in a text file and separated by a newline character. How can I read the file and assign each string to a different variable, i.e. string1 and string2?
Right now I'm using:
file = open("text.txt").read();
but this gives me extra content and not just the strings. I'm not sure what it is returning but this text file just contains two strings. I tried using other methods such as ..read().splitlines() but this did not yield the result I'm looking for. I'm new to python so any help would be appreciated!
This only reads the first 2 lines, strips off the newline char at the end, and stores them in 2 separate variables. It does not read in the entire file just to get the first 2 strings in it.
with open('text.txt') as f:
    word1 = f.readline().strip()
    word2 = f.readline().strip()
print word1, word2
# now you can compare word1 and word2 if you like
text.txt:
foo
bar
asdijaiojsd
asdiaooiasd
Output:
foo bar
EDIT: to make it work with any number of newlines or whitespace:
with open('text.txt') as f:
    # sequence of all words in all lines
    words = (word for line in f for word in line.split())
    # consume the first 2 items from the words sequence
    word1 = next(words)
    word2 = next(words)
I've verified this to work reliably with various "non-clean" contents of text.txt.
Note: I'm using a generator expression, which behaves like a lazy list, to avoid reading more data than needed. Generator expressions are otherwise equivalent to list comprehensions, except that they produce items lazily, i.e. only as many as are asked for.
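The lazy approach can be sketched in Python 3 as follows. The helper name first_words is hypothetical, io.StringIO stands in for the open file, and itertools.islice is used here in place of the two next() calls so that a file with fewer than two words yields a short list instead of raising StopIteration.

```python
import io
from itertools import islice


def first_words(stream, count=2):
    """Return up to `count` whitespace-separated words, reading lazily."""
    # Generator expression: lines are only read as words are requested.
    words = (word for line in stream for word in line.split())
    return list(islice(words, count))


# io.StringIO stands in for an open text file with messy whitespace.
fake_file = io.StringIO("\n  foo \n\n\tbar\nasdijaiojsd\n")
word1, word2 = first_words(fake_file)
print(word1 == word2)
```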
with open('text.txt') as f:
    lines = [line.strip() for line in f]
print lines[0] == lines[1]
You wrote: "I'm not sure what it is returning but this text file just contains two strings."
Your problem is likely related to whitespace characters (most common being carriage return, linefeed/newline, space and tab). So if you tried to compare your string1 to 'expectedvalue' and it fails, it's likely because of the newline itself.
Try this: print the length of each string then print each of the actual bytes in each string to see why the comparison fails.
For example:
>>> print len(string1), len(expected)
4 3
>>> for got_character, expected_character in zip(string1, expected):
...     print 'got "{}" ({}), but expected "{}" ({})'.format(got_character, ord(got_character), expected_character, ord(expected_character))
...
got " " (32), but expected "f" (102)
got "f" (102), but expected "o" (111)
got "o" (111), but expected "o" (111)
If that's your problem, then you should strip off the leading and trailing whitespace and then execute the comparison:
>>> string1 = string1.strip()
>>> string1 == expected
True
If you're on a unix-like system, you'll probably have an xxd or od binary available to dump a more detailed representation of the file. If you're using windows, you can download many different "hex editor" programs to do the same.
