Python won't accept two same strings as the same

I'm quite a newcomer to Python and I am stuck in the following situation:
I want to hash a password and compare it with the masterhash. Unfortunately Python doesn't accept them as the same:
import hashlib
h=hashlib.sha512()
username='admin'
username=username.encode('utf-8')
h.update(username)
hexdigest=h.hexdigest()
hlist=open("database.txt")#masterhash
lines=hlist.readlines()
userhash=lines[0]#masterhash in line 0
if userhash == hexdigest:  # it doesn't accept them as the same
    text = "True"
else:
    text = "False"
I already checked the object types: both are strings.
The hash, both times:
c7ad44cbad762a5da0a452f9e854fdc1e0e7a52a38015f23f3eab1d80b931dd472634dfac71cd34ebc35d16ab7fb8a90c81f975113d6c7538dc69dd8de9077ec
I really don't understand the problem.

The problem is this line:
lines = hlist.readlines()
Each value in this list will have a trailing newline (which you may not notice when printing). Make sure you strip that off.
userhash = lines[0].strip()

readlines() returns lines with newlines at their ends. You are comparing "A" with "A\n". Try this:
if userhash.strip() == hexdigest:

When you use readlines() you get a list of the lines with the newline character at the end of each line. You can do one of two things:
Option #1:
lines = hlist.readlines()
userhash = lines[0].rstrip()
Option #2:
lines = hlist.read().splitlines()
userhash = lines[0]
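A minimal sketch of the comparison with the newline stripped. The temporary file stands in for database.txt and is an assumption made only so the snippet runs on its own; the variable names follow the question.

```python
import hashlib
import os
import tempfile

# Hash the username the same way the question does.
hexdigest = hashlib.sha512("admin".encode("utf-8")).hexdigest()

# Stand-in for database.txt (hypothetical file): the hash followed by a newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(hexdigest + "\n")
    path = tmp.name

with open(path) as hlist:
    raw_line = hlist.readline()
os.remove(path)

userhash = raw_line.strip()

raw_matches = (raw_line == hexdigest)       # the raw line still ends with "\n"
stripped_matches = (userhash == hexdigest)  # compared after stripping
print(repr(raw_line[-4:]), raw_matches, stripped_matches)
```

Printing repr() of the tail of the raw line makes the trailing newline visible, which a plain print() would hide.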

Related

How to avoid Nonetype when combining lists in Python

I am very new to Python and am looking for assistance to where I am going wrong with an assignment. I have attempted different ways to approach the problem but keep getting stuck at the same point(s):
Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
Problem 2: When I try and combine the lists I keep receiving "None" for my result or Nonetype errors [which I think means I have added the None's together(?)].
The assignment is:
#8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.You can download the sample data at http://www.py4e.com/code3/romeo.txt
My current code which is giving me a Nonetype error is:
poem = input("enter file:")
play = open(poem)
lst = list()
for line in play:
    line = line.rstrip()
    word = line.split()
    if not word in lst:
        lst = lst.append(word)
print(lst.sort())
If someone could just talk me through where I am going wrong that will be greatly appreciated!

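Your problem is lst = lst.append(word): append() modifies the list in place and returns None, so lst becomes None after the first pass. lst.sort() also returns None, so sort first and print the list afterwards. Note too that word in your code is the whole list returned by split(), not a single word.
with open(poem) as f:
    lines = f.read().split('\n')  # you can also use readlines()
lst = []
for line in lines:
    words = line.split()
    for word in words:
        if word not in lst:  # skip words already collected
            lst.append(word)
lst.sort()  # sorts in place and returns None
print(lst)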

Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
You are doing play = open(poem) and then for line in play:, which processes the file line by line. If you want to process the whole content at once, do:
play = open(poem)
content = play.read()
words = content.split()
Always remember to close the file after you have used it, i.e. do
play.close()
unless you use a context manager (i.e. with open(poem) as f:), which closes it for you.

Just to help you get into Python a little more:
You can:
1. Read the whole file at once (if you have enough RAM for it; if not, read it in reasonably sized chunks, one after another)
2. Split the data you read into words and
3. Use set() or dict() to remove duplicates
Along the way, pay attention to upper and lower case
if you want distinct words rather than just distinct strings.
This will work in Py2 and Py3 as long as you do something about the input() function in Py2 (use raw_input()) or put quotes around the path when entering it, so:
filename = input("Filename: ")
f = open(filename)
c = f.read()
f.close()
words = set(x.lower() for x in c.split()) # This will make a set of lower case words, with no repetitions
# This is equivalent to:
#words = set()
#for x in c.split():
#    words.add(x.lower())
# set() is an unordered datatype ignoring duplicate items
# and it mimics mathematical sets and their properties (unions, intersections, ...)
# it is also fast as hell.
# Checking a list() each time for existence of the word is slow as hell
#----
# OK, you need a sorted list, so:
words = sorted(words)
# Or step-by-step:
#words = list(words)
#words.sort()
# Now words is your list
As for your errors, do not worry, they are common at the beginning in almost any object-oriented language.
Others explained them well in their comments, but so that this answer is not lacking:
Always pay attention to which functions or methods operate on the object in place and return None (list.sort(), list.append(), list.insert(), set.add()...) and which ones return a new object (sorted(), str.lower()...).
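A short sketch of that difference, using a made-up word list rather than the contents of romeo.txt:

```python
words = ["soft", "but", "what"]

# In-place methods change the list and return None.
result = words.append("light")
print(result)   # None
print(words)    # ['soft', 'but', 'what', 'light']

# sorted() leaves its argument alone and returns a new list.
ordered = sorted(words)
print(ordered)  # ['but', 'light', 'soft', 'what']

# list.sort() sorts in place and, again, returns None.
sort_result = words.sort()
print(sort_result)  # None
print(words)        # ['but', 'light', 'soft', 'what']
```

This is why lst = lst.append(word) and print(lst.sort()) both end up with None.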
If you run into a similar situation again, use help() in the interactive shell to see exactly what a function does.
>>> help(list.append)
>>> help(list.sort)
>>> help(str.lower)
>>> # Or any short documentation you need
Python, especially Python 3.x, is strict about operations between different types, but some combinations have a defined meaning and do work, sometimes with unexpected results.
E.g. you can do:
print(40*"x")
It will print out 40 'x' characters, because multiplying a string by an integer repeats it.
But:
print([1, 2, 3]+None)
will, logically, not work (it raises a TypeError), which is what is happening somewhere in the rest of your code.
In some languages like javascript (terrible stuff) this will work perfectly well:
v = "abc "+123+" def";
This inserts the 123 seamlessly into the string, which is useful, but from another angle it is a programming nightmare and nonsense.
Also, the Py2 assumption that you can mix unicode and byte strings and that an automatic cast will be performed no longer holds in Py3.
I.e. this is a TypeError:
print(b"abc"+"def")
because b"abc" is bytes and "def" (or u"def") is str in Py3 (what unicode is in Py2).
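A small sketch of the explicit conversion Python 3 expects instead. UTF-8 is an assumption here; use whatever encoding your data really has.

```python
raw = b"abc"
text = "def"

# Mixing bytes and str raises TypeError in Python 3.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert one side explicitly so both operands have the same type.
as_text = raw.decode("utf-8") + text    # str + str
as_bytes = raw + text.encode("utf-8")   # bytes + bytes
print(mixed_ok, as_text, as_bytes)
```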
Enjoy Python, it is the best!

Replace words in list that later will be used in variable

I have a file which currently stores a string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
which I am trying to pass into as a variable to my subprocess command.
My current code looks like this
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'] and I don't want the part with ['\n'] to be passed into my command, so i'm trying to remove them by using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
    print(removestrings)
I get this exception:
'list' object has no attribute 'replace'
So how can I replace these with nothing and store the result as a string for my subprocess command?

readlines() returns a list. Try print(test[0].strip())

You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()

Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably want to work on only the first line that you read. The brackets and quotes in the printed output are just how Python displays a list; they are not part of the string. The only character to remove is the trailing newline:
removestrings = test[0].rstrip('\n')

Where you went wrong...
file.readlines() in Python returns a list of the lines in the file (a list is Python's array-like collection). Here you are treating the list as a string. You must first pick out the string inside it, then apply the string-only method.
In this case, however, that would still not work, because you are trying to change the way the Python interpreter displays the list: the brackets and quotes belong to the printed representation, not to the data.
Further information...
In the code the value is not the text you see printed; print() only shows a readable representation of the list. The example below works for any number of lines (but it only prints the first element), so you will need to adapt it.
This may be useful...
You could perhaps make the variables globally available, so that other parts of the program can read them before they go out of scope. Scope means the region of the program in which a variable is visible; once it ends, the interpreter can remove the variable from memory. Scope also lets larger programs reuse names locally: i is used in many for loops, and without scope every loop variable in the whole project would need a different name. One way to picture scopes is as separate branches of a tree: the branches, and the variables on them, do not touch.
Solution?
e.g.:
with open(logfilnavn, 'r') as file:
    lines = file.readlines()  # creates a list
# a list comprehension that goes through each line and removes the trailing newline character (\n)
# (slicing with line[:-1] would also cut a real character from a last line that has no newline)
# this will work with any number of lines
strippedLines = [line.rstrip('\n') for line in lines]
# or
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0])  # this prints the first element in the list
I hope this helped!

You get the error because readlines() returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead:
line = ""  # so you can use it as a variable outside the with block
with open(logfilnavn, 'r') as t:
    line = t.readline().rstrip('\n')  # readline() keeps the trailing newline
print(line)
# output:
eeb39d3e-dd4f-11e8-acf7-a6389e8e7978

readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But still, after removing the newline character from the end of each line, you'll have a list of lines. From the question it looks like you only have one line, so you can just access the first one with lines[0].
Or you can leave out readlines and just use read, which reads the entire contents of the file, and then call rstrip.
contents = t.read().rstrip('\n')
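A runnable sketch comparing the two reads. The temporary file stands in for the log file, and the command list at the end is purely hypothetical ("some-tool" and "--id" are not from the question); it only shows that the cleaned value is an ordinary string.

```python
import os
import tempfile

# Stand-in for the real log file (an assumption so the snippet is self-contained).
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n")
    logfilnavn = tmp.name

with open(logfilnavn) as t:
    as_list = t.readlines()           # list of lines, newline kept
with open(logfilnavn) as t:
    contents = t.read().rstrip("\n")  # one plain string, newline removed
os.remove(logfilnavn)

print(as_list)
print(contents)

# Hypothetical argument list for subprocess; the tool name is made up.
command = ["some-tool", "--id", contents]
print(command)
```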

Python - \n appearing in concatenated strings

I've been having an issue with my Python code. I am trying to concatenate the value of two string objects, but when I run the code it keeps printing a '\n' between the two strings.
My code:
while i < len(valList):
    curVal = valList[i]
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)
Now when I run this, it gives me this error:
OSError: [Errno 22] Invalid argument: 'cornWhiteTrimmed\nmarkup.txt'
See that \n between the two strings? I've dissected the code a bit by printing each string individually, and neither one contains a \n on its own. Any ideas as to what I'm doing wrong?
Thanks in advance!

The concatenation itself doesn't add the \n for sure. valList is probably the result of calling readlines() on a file object, so each element in it will have a trailing \n. Call strip on each element before using it:
while i < len(valList):
    curVal = valList[i].strip()
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)

The reason you don't see the \n when you print the strings is that \n is the newline character: printing it just moves the output to a new line. In the middle of a file name, however, it causes problems. The solution to your issue is the strip method. You can read its documentation here (https://www.tutorialspoint.com/python/string_strip.htm), but basically you can use it to strip the newline character off any of your strings.

Just to make an addition to the other answers explaining why this came about:
When you need to actually inspect what characters a string contains, you can't simply print it. Many characters are "invisible" when printed.
Turn the string into a list first:
list(curVal)
Or my personal favorite:
[c for c in curVal]
These will create lists that properly show all hard to see characters.
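Another option, sketched here with a value like the one from the error message, is the built-in repr(), which shows escape sequences instead of rendering them:

```python
curVal = "cornWhiteTrimmed\n"  # value as it would come out of readlines()

print(curVal)              # the newline is rendered, so it is easy to miss
shown = repr(curVal)
print(shown)               # escape sequences are shown literally
print(list(curVal)[-3:])   # the list approach from above, last three characters

markupFileName = curVal.strip() + "markup.txt"
print(repr(markupFileName))
```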

What is the best way to iterate over a python list, excluding certain values and printing out the result

I am new to python and have a question:
I have checked similar questions, checked the tutorial dive into python, checked the python documentation, googlebinging, similar Stack Overflow questions and a dozen other tutorials.
I have a section of python code that reads a text file containing 20 tweets. I am able to extract these 20 tweets using the following code:
with open ('output.txt') as fp:
    for line in iter(fp.readline,''):
        Tweets=json.loads(line)
        data.append(Tweets.get('text'))
i=0
while i < len(data):
    print data[i]
    i=i+1
The above while loop iterates perfectly and prints out the 20 tweets (lines) from output.txt.
However, these 20 lines contain non-English character data like "Los ladillo a los dos, soy maaaala o maloooooooooooo", URLs like "http://t.co/57LdpK", the string "None" and photos with a URL like "Photo: http://t.co/kxpaaaaa" (I have edited this for privacy).
I would like to purge the output of this (which is a list), and exclude the following:
The None entries
Anything beginning with the string "Photo:"
It would be a bonus also if I can exclude non-unicode data
I have tried the following bits of code
Using data.remove("None:") but I get the error list.remove(x): x not in list.
Reading the items I do not want into a set and then doing a comparison on the output but no luck.
Researching into list comprehensions, but wonder if I am looking at the right solution here.
I am from an Oracle background where there are functions to chop out any wanted/unwanted section of output, so really gone round in circles in the last 2 hours on this. Any help greatly appreciated!

Try something like this:
def legit(string):
    # reject missing text (None) and photo entries
    if string is None or string.startswith("Photo:"):
        return False
    return True
whatyouwant = [x for x in data if legit(x)]
I'm not sure if this will work out of the box for your data, but you get the idea. If you're not familiar, [x for x in data if legit(x)] is called a list comprehension.

First of all, only add Tweets.get('text') if there is a text entry:
with open ('output.txt') as fp:
    for line in iter(fp.readline,''):
        Tweets=json.loads(line)
        if 'text' in Tweets:
            data.append(Tweets['text'])
That'll not add None entries (.get() returns None if the 'text' key is not present in the dictionary).
I'm assuming here that you want to further process the data list you are building here. If not, you can dispense with the for entry in data: loops below and stick to one loop with if statements. Tweets['text'] is the same value as entry in the for entry in data loops.
Next, you are looping over python unicode values, so use the methods provided on those objects to filter out what you don't want:
for entry in data:
    if not entry.startswith("Photo:"):
        print entry
You can use a list comprehension here; the following would print all entries too, in one go:
print '\n'.join([entry for entry in data if not entry.startswith("Photo:")])
In this case that doesn't really buy you much, as you are building one big string just to print it; you may as well just print the individual strings and avoid the string building cost.
Note that all your data is Unicode data. What you perhaps wanted is to filter out text that uses codepoints beyond the ASCII range. You could use a regular expression to detect codepoints beyond ASCII in your text:
import re
nonascii = re.compile(ur'[^\x00-\x7f]', re.UNICODE)  # all codepoints beyond 0x7F are non-ascii
for entry in data:
    if entry.startswith("Photo:") or nonascii.search(entry):
        continue  # skip the rest of this iteration, continue to the next
    print entry
Short demo of the non-ASCII expression:
>>> import re
>>> nonascii = re.compile(ur'[^\x00-\x7f]', re.UNICODE)
>>> nonascii.search(u'All you see is ASCII')
>>> nonascii.search(u'All you see is ASCII plus a little more unicode, like the EM DASH codepoint: \u2014')
<_sre.SRE_Match object at 0x1086275e0>

with open ('output.txt') as fp:
    for line in fp.readlines():
        Tweets=json.loads(line)
        if not 'text' in Tweets: continue
        txt = Tweets.get('text')
        if txt.replace('.', '').replace('?','').replace(' ','').isalnum():
            data.append(txt)
            print txt
Small and simple.
Basic principle, one loop, if data matches your "OK" criteria add it and print it.
As Martijn pointed out, 'text' might not be in all the Tweets data.
Regexp replacement for .replace() would go something along the lines of: if re.match('^[\w-\ ]+$', txt) is not None: (it will not work for blankspace etc so yea as mentioned below..)

I'd suggest something like the following:
# use itertools.ifilter to remove items from a list according to a function
from itertools import ifilter
import json
import re
# write a function to filter out entries you don't want
def my_filter(value):
    if not value or value.startswith('Photo:'):
        return False
    # exclude unwanted chars; re.search scans the whole string
    # (re.match would only test the first character)
    if re.search('[^\x00-\x7F]', value):
        return False
    return True
# Reading the data can be simplified with a list comprehension
with open('output.txt') as fp:
    data = [json.loads(line).get('text') for line in fp]
# do the filtering
data = list(ifilter(my_filter, data))
# print the output
for line in data:
    print line
Regarding unicode, assuming you're using python 2.x, the open function won't read data as unicode, it'll be read as the str type. You might want to convert it if you know the encoding, or read the file with a given encoding using codecs.open.
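For readers on Python 3, the same filtering idea can be sketched as follows. This is not the code from the answers above: it assumes Python 3.7+ for str.isascii(), and the sample lines (including the t.co/example URL) are made up to stand in for output.txt.

```python
import json

# Made-up stand-in for the lines of output.txt, one JSON object per line.
lines = [
    '{"text": "Just a plain tweet"}',
    '{"text": "Photo: http://t.co/example"}',
    '{"id": 3}',
    '{"text": "caf\\u00e9 time"}',
]

def keep(text):
    """Return True for entries that should stay in the output."""
    if text is None or text.startswith("Photo:"):
        return False
    return text.isascii()  # drops text with codepoints beyond ASCII

data = [json.loads(line).get("text") for line in lines]
filtered = [text for text in data if keep(text)]
print(filtered)
```

In Python 3, itertools.ifilter no longer exists (the built-in filter() is lazy), and a list comprehension with a predicate covers the same ground.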

Try this:
with open ('output.txt') as fp:
    for line in iter(fp.readline,''):
        Tweets=json.loads(line)
        data.append(Tweets.get('text'))
for entry in data:
    # these conditions skip (continue) over the entries
    # matching your first two conditions.
    # (a while loop that hits continue before i=i+1 would never advance)
    if entry is None or entry.startswith("Photo"):
        continue
    print entry

Importing a text file to create a list in Python 3.x?

I can't seem to figure out how to use values given in a text file and import them into python to create a list. What I'm trying to accomplish here is to create a gameboard and then put numbers on it as a sample set. I have to use Quickdraw to accomplish this - I kind of know how to get the numbers on Quickdraw but I cannot seem to import the numbers from the text file. Previous assignments involved getting the user to input values or using an I/O redirection, this is a little different. Could anyone assist me on this?

It depends on the contents of the file you want to read and on the form of the list you want to get.
# assuming you have values each on separate line
values = []
for line in open('path-to-the-file'):
    values.append(line)
    # might want to implement stripping newlines and such in here
    # by using line.strip() or .rstrip()
# or perhaps more than one value in a line, with some separator
values = []
for line in open('path-to-the-file'):
    # e.g. ':' as a separator
    separator = ':'
    line = line.split(separator)
    for value in line:
        values.append(value)
# or all in one line with separators
values = open('path-to-the-file').read().split(separator)
# might want to use .strip() on this one too, before split method
It could be more accurate if we knew the input and output requirements.

Two steps here:
open the file
read the lines
This page might help you: http://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
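The two steps can be sketched like this, assuming the file holds one integer per line. The temporary file and its contents are made up so the snippet runs on its own; the question does not say what the real file looks like.

```python
import os
import tempfile

# Made-up input file: one number per line, with a blank line in between.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3\n7\n\n12\n")
    path = tmp.name

# Step 1: open the file. Step 2: read the lines and convert each one.
with open(path) as f:
    numbers = [int(line) for line in f if line.strip()]  # skip blank lines
os.remove(path)

print(numbers)
```

int() ignores surrounding whitespace, so the trailing newline on each line does not need to be stripped separately here.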
