Stripping duplicate words from generated text in python script - python

I made a python script to take text from an input file and randomly rearrange the words for a creative writing project based around the cut-up technique (http://en.wikipedia.org/wiki/Cut-up_technique).
Here's the script as it currently stands. NB: I'm running this as a server side include.
#!/usr/bin/python
from random import shuffle
src = open("input.txt", "r")
srcText = src.read()
src.close()
srcList = srcText.split()
shuffle(srcList)
cutUpText = " ".join(srcList)
print("Content-type: text/html\n\n" + cutUpText)
This basically does the job I want it to do, but one improvement I'd like to make is to identify duplicate words within the output and remove them. To clarify, I only want to identify duplicates in a sequence, for example "the the" or "I I I". I don't want to make it so that, for example, "the" only appears once in the entire output.
Can someone point me in the right direction to start solving this problem? (My background isn't in programming at all, so I basically put this script together through a lot of reading bits of the python manual and browsing this site. Please be gentle with me.)

You can write a generator to produce words without duplicates:
def nodups(s):
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w
Then you can use this in your program:
cutUpText = " ".join(nodups(srcList))
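As an alternative sketch of the same idea, the standard library's itertools.groupby groups runs of equal neighbouring items, so keeping one key per run collapses "the the" or "I I I" while leaving non-adjacent repeats alone. The helper name collapse_adjacent and the sample sentence are made up for illustration.

```python
from itertools import groupby

def collapse_adjacent(words):
    """Return a list with each run of identical neighbours reduced to one word."""
    # groupby yields (key, run) pairs; only the key of each run is kept.
    return [word for word, _run in groupby(words)]

sample = "I I I saw the the cat and the dog".split()
collapsed = collapse_adjacent(sample)
print(" ".join(collapsed))
```

In the original script this would be applied after shuffle(srcList), since shuffling is what creates the adjacent repeats.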

Adding the lines
spaces = [(i%10) == 9 and '\n' or ' ' for i in range(0,len(srcList))];
cutUpText = "".join(map(lambda x,y: "".join([x,y]),srcList,spaces));
helps bring some raw formatting to the text screens.

Add this to your existing program (note that it removes every repeated word from the whole text, not only adjacent repeats):
srcList = list(set(srcText.split()))

Related

Python script for core temperature

So I'm writing a little Python script to read the temperature from the built-in core sensor (Raspberry Pi), print it on the screen, and write the data to a .txt file. Almost everything works, but I have two problems that I couldn't solve after 3-4 hours. The full output of the command is, for example, temp=39.0'C, and I need only the number from that string: 39.0. So I'm trying to strip the characters I don't need and convert the result to a float.
return temp.replace("temp=", " ").replace("'C", "")
My first problem is that when I run this code I still have one space before my number, because I'm replacing "temp=" with " ". But if I delete that space, the second replace becomes part of the string and stops doing its job; I think there are too many quote characters and the program gets confused. How can I solve that? The second problem: how can I convert the result to a float? As far as I can tell, the numbers are still part of a string. Please help; I'm frustrated that such a small script is taking so long to get working.
import re
import os
import time
def measure_temp():
    temp = os.popen("vcgencmd measure_temp").readline()
    return temp.replace("temp=", " ").replace("'C", "")
while True:
    temperature = measure_temp()
    print(temperature)
    f = open("pythonLog.txt", "a")
    f.write(temperature)
    f.close()
    time.sleep(1)
Thanks to all who tried to help. I have solved it by using temp = temp[1], so the space at index 0 is gone.
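For reference, a minimal sketch of both steps (removing the unwanted characters and converting to a float), assuming the command output always looks like temp=39.0'C followed by a newline, as described in the question. The function name parse_temperature is hypothetical, and the sample string stands in for the real vcgencmd output.

```python
def parse_temperature(raw):
    """Extract the numeric part of a reading such as temp=39.0'C as a float."""
    cleaned = raw.strip()  # drop the trailing newline and surrounding spaces
    # Replacing with an empty string leaves no stray space behind.
    cleaned = cleaned.replace("temp=", "").replace("'C", "")
    return float(cleaned)

reading = parse_temperature("temp=39.0'C\n")
print(reading)
```

Because the result is a float, it has to be turned back into text before writing it to the log, for example f.write("%s\n" % reading).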

How to avoid Nonetype when combining lists in Python

I am very new to Python and am looking for assistance to where I am going wrong with an assignment. I have attempted different ways to approach the problem but keep getting stuck at the same point(s):
Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
Problem 2: When I try and combine the lists I keep receiving "None" for my result or Nonetype errors [which I think means I have added the None's together(?)].
The assignment is:
#8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.You can download the sample data at http://www.py4e.com/code3/romeo.txt
My current code which is giving me a Nonetype error is:
poem = input("enter file:")
play = open(poem)
lst= list()
for line in play:
    line=line.rstrip()
    word=line.split()
    if not word in lst:
        lst= lst.append(word)
print(lst.sort())
If someone could just talk me through where I am going wrong that will be greatly appreciated!
Your problem was lst = lst.append(word): append changes the list in place and returns None, so lst becomes None.
with open(poem) as f:
    lines = f.read().split('\n')  # you can also use readlines()
lst = []
for line in lines:
    words = line.split()
    for word in words:
        if word:
            lst.append(word)
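Putting the pieces of the assignment together, a sketch might look like the following. It assumes the lines are already available as an iterable of strings (an open file object works the same way, since iterating over it yields lines); the function name unique_sorted_words and the two sample lines are for illustration only.

```python
def unique_sorted_words(lines):
    """Collect each distinct word from an iterable of lines, then sort them."""
    found = []
    for line in lines:
        for word in line.split():      # check each word, not the per-line list
            if word not in found:
                found.append(word)     # append works in place; do not reassign
    found.sort()                       # sort works in place and returns None
    return found

sample_lines = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
]
result = unique_sorted_words(sample_lines)
print(result)
```

Note that the default sort puts capitalised words before lower-case ones, because upper-case letters compare as smaller.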
Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
You are doing play = open(poem) and then for line in play:, which processes the file line by line. If you want to process the whole content at once, do:
play = open(poem)
content = play.read()
words = content.split()
Always remember to close the file after you have used it, i.e. do
play.close()
unless you use the context manager form (i.e. with open(poem) as f:).
Just to help you get into Python a little more:
You can:
1. Read the whole file at once (even for a big file this is better if you have enough RAM; if not, read a reasonably sized chunk, then the next one, and so on)
2. Split the data you read into words and
3. Use set() or dict() to remove duplicates
Along the way, don't forget to pay attention to upper and lower case
if you need distinct words, not just distinct strings.
This will work in Py2 and Py3 as long as you do something about the input() function in Py2 or use quotes when entering the path:
path = input("Filename: ")
f = open(path)
c = f.read()
f.close()
words = set(x.lower() for x in c.split()) # This will make a set of lower case words, with no repetitions
# This equals to:
#words = set()
#for x in c.split():
# words.add(x.lower())
# set() is an unordered datatype ignoring duplicate items
# and it mimics mathematical sets and their properties (unions, intersections, ...)
# it is also fast as hell.
# Checking a list() each time for existence of the word is slow as hell
#----
# OK, you need a sorted list, so:
words = sorted(words)
# Or step-by-step:
#words = list(words)
#words.sort()
# Now words is your list
As for your errors, do not worry, they are common at the beginning in almost any object-oriented language.
Others explained them well in their comments. But so that this answer is not lacking:
Always pay attention to which functions or methods operate on the datatype in place (list.sort(), list.append(), list.insert(), set.add()...) and which ones return a new version of the datatype (sorted(), str.lower()...).
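A short sketch of that difference, using a made-up list of words: the in-place methods return None, while the other calls hand back a new object.

```python
words = ["pear", "Apple", "fig"]

append_result = words.append("kiwi")     # changes words, returns None
sort_result = words.sort()               # sorts words in place, returns None

new_list = sorted(words, key=str.lower)  # builds and returns a new list
lowered = "Apple".lower()                # returns a new string

print(append_result, sort_result)
print(words)
print(new_list, lowered)
```

This is why lst = lst.append(word) and print(lst.sort()) both end up with None.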
If you run into a similar situation again, use help() in the interactive shell to see exactly what the function you used does.
>>> help(list.append)
>>> help(list.sort)
>>> help(str.lower)
>>> # Or any short documentation you need
Python, especially Python 3.x, is strict about operations between different types, but some combinations have their own meaning and do work, sometimes with unexpected results.
E.g. you can do:
print(40*"x")
It will print 40 'x' characters, because multiplying a string by 40 creates a string of 40 copies.
But:
print([1, 2, 3]+None)
will, logically, not work, which is what is happening somewhere in the rest of your code.
In some languages like JavaScript (terrible stuff) this will work perfectly well:
v = "abc "+123+" def";
The 123 is inserted seamlessly into the string, which is useful, but from another angle it is a programming nightmare and nonsense.
Also, the Py2 assumption that you can mix unicode and byte strings and that an automatic cast will be performed no longer holds in Py3.
I.e. this is a TypeError:
print(b"abc"+"def")
because b"abc" is bytes() and "def" (or u"def") is str() in Py3 (what was unicode() in Py2).
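A minimal sketch of how the two types can be combined in Py3 by converting one side explicitly. The choice of UTF-8 here is an assumption; the right encoding depends on where the bytes came from.

```python
raw = b"abc"
text = "def"

try:
    raw + text                 # mixing bytes and str raises TypeError in Py3
    mixed_ok = True
except TypeError:
    mixed_ok = False

joined_text = raw.decode("utf-8") + text    # bytes -> str, then concatenate
joined_bytes = raw + text.encode("utf-8")   # str -> bytes, then concatenate
print(mixed_ok, joined_text, joined_bytes)
```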
Enjoy Python, it is the best!

Simple Python string code\decode script question

I'm learning Python, and for my homework I wrote a simple script that gets a string from the user, such as aaaabbbbcccdde, and transforms it to a4b4c3d2e1.
Now I've decided to make things more interesting and modified the code for continuous input and output in real time. So I need to be able to enter symbols and get output encoded with that simple algorithm.
The only problem I've faced is that I need the output without '\n', so that all the encoded symbols are printed consecutively on one line, e.g. a4b4c3d2e1
But in that case the output symbols got mixed with my input and eventually the script froze. Obviously I need to enter symbols on one line and have the output on another line without line breaks.
So, could you tell me whether it is possible, without too much difficulty for a newbie, to write code that does something like this:
a - #here is the line in the shell where I keep adding symbols
a4b4c3d2e1a4b4c3d2e1a4b4c3d2e1 - #here, on the next line, the script continuously outputs the encoded result without breaking the line.
import getch
cnt = 1
print('Enter any string:')
user1 = getch.getch()
while True:
    buf = getch.getch()
    if buf == user1:
        cnt += 1
        user1 = buf
    else:
        print(user1, cnt, sep='')
        user1 = buf
        cnt = 1
This snippet outputs something like this:
a4
s4
d4
f4
etc
And whenever I try to add end='' to the print() call, the program hangs.
What can I do to fix that?
Thanks !
I don't really know the details, but I can say this: when you add end='', the program doesn't freeze; the output (stdout) just isn't refreshed (maybe due to some optimisation? I really don't know).
So what you want to do is flush the output right after you print to it.
import sys  # at the top of the script
print(user1, cnt, sep='', end='')
sys.stdout.flush()
(This is actually a duplicate of "How to flush output of print function?")
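In Python 3.3 and later, print() also accepts flush=True, which does the same thing in one call. The sketch below applies it to the encoding loop, but reads characters from an ordinary string instead of getch so that it can run anywhere; the function name encode_stream and the out parameter are made up for illustration.

```python
import io
import sys

def encode_stream(chars, out=sys.stdout):
    """Write run-length codes such as a4b4 to out, flushing after each one."""
    previous = None
    count = 0
    for ch in chars:
        if ch == previous:
            count += 1
            continue
        if previous is not None:
            # end="" keeps everything on one line; flush=True shows it at once.
            print(previous, count, sep="", end="", file=out, flush=True)
        previous = ch
        count = 1
    if previous is not None:
        # Emit the final run, which the loop above never gets to print.
        print(previous, count, sep="", end="", file=out, flush=True)

buffer = io.StringIO()
encode_stream("aaaabbbbcccdde", out=buffer)
encoded = buffer.getvalue()
print(encoded)
```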

Function creation - "Undefined name" - Python

I'm writing some code that reads words from a text file and sorts them into a dictionary. It actually all runs fine, but for reference here it is:
def find_words(file_name, delimiter = " "):
    """
    A function for finding the number of individual words, and the most popular words, in a given file.
    The process will stop at any line in the file that starts with the word 'finish'.
    If there is no finish point, the process will go to the end of the file.
    Inputs: file_name: Name of file you want to read from, e.g. "mywords.txt"
            delimiter: The way the words in the file are separated e.g. " " or ", "
                     : Delimiter will default to " " if left blank.
    Output: Dictionary with all the words contained in the given file, and how many times each word appears.
    """
    words = []
    dictt = {}
    with open(file_name, 'r') as wordfile:
        for line in wordfile:
            words = line.split(delimiter)
            if words[0]=="finish":
                break
            # This next part is for filling the dictionary
            # and correctly counting the amount of times each word appears.
            for i in range(len(words)):
                a = words[i]
                if a=="\n" or a=="":
                    continue
                elif a not in dictt:
                    dictt[words[i]] = 1
                else:
                    dictt[words[i]] = int(dictt.get(a)) + 1
    return dictt
The problem is that it only works if the arguments are given as string literals, e.g, this works:
test = find_words("hello.txt", " " )
But this doesn't:
test = find_words(hello.txt, )
The error message is undefined name 'hello'
I don't know how to alter the function arguments such that I can enter them without speech marks.
Thanks!
Simple, you define that name:
class hello:
    txt = "hello.txt"
But joking aside, all the argument values in a function call are expressions. If you want to pass a string literally you'll have to make a string literal, using the quotes. Python is not a text preprocessor like m4 or cpp, and expects the entire program text to follow its syntax.
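To illustrate that every argument is an ordinary expression, here is a small sketch. The function describe is a hypothetical stand-in for find_words that only echoes what it was given, so no file is needed.

```python
def describe(file_name, delimiter=" "):
    """Echo the arguments; a stand-in for a function that opens file_name."""
    return "file=%r delimiter=%r" % (file_name, delimiter)

name = "hello.txt"                  # a variable holding a string

from_literal = describe("hello.txt")
from_variable = describe(name)      # a name is fine once it has been defined

try:
    describe(hello.txt)             # Python looks up a variable called hello
except NameError as exc:
    error = str(exc)

print(from_literal)
print(error)
```

So the quotes can be left out at the call site only when the string has already been stored under a name.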
So it turns out I just misunderstood what was being asked. I've had it clarified by the course leader now.
As I am now fully aware, a function definition needs to be told when a string is being entered, hence the quote marks being required.
I admit full ignorance over my depth of understanding of how it all works - I thought you could pretty much put any assortment of letters and/or numbers in as an argument and then you can manipulate them within the function definition.
My ignorance may stem from the fact that I'm quite new to Python, having learned my coding basics on C++ where, if I remember correctly (it was well over a year ago), functions are defined with each argument being specifically set up as their type, e.g.
int max(int num1, int num2)
Whereas in Python you don't quite do it like that.
Thanks for the attempts at help (and ridicule!)
Problem is sorted now.

Write text to file in columns

As an attorney I am a total newbie in programming. As an enthusiastic newbie, I am learning a lot (what variables are, etc.).
So I'm working a lot with dir() and looking at the results. It would be nicer if I could see the output in one or more columns. So I want to write my first program, which writes, for example, dir(sys) to an output file in columns.
So far I've got this:
import sys
textfile = open('output.txt','w')
syslist = dir(sys)
for x in syslist:
    print(x)
The output on display is exactly what I want, but when I use the .write like:
import sys
textfile = open('output.txt','w')
syslist = dir(sys)
for x in syslist:
    textfile.write(x)
textfile.close()
The text is in lines.
Can anyone please help me with how to write the output of dir(sys) to a file in columns?
If I may ask, please describe the easiest way, because I have to look up almost every word you write in the documentation. Thanks in advance.
print adds a newline after the string printed by default, file.write doesn't. You can do:
for x in syslist: textfile.write("%s\n" % x)
...to add newlines as you're appending. Or:
for x in syslist: textfile.write("%s\t" % x)
...for tabs in between.
I hope this is clear for you "prima facie" ;)
The other answers seem to be correct if they guess that you're trying to add newlines that .write doesn't provide. But since you're new to programming, I'll point out some good practices in python that end up making your life easier:
import sys
with open('output.txt', 'w') as textfile:
    for x in dir(sys):
        textfile.write('{f}\n'.format(f=x))
The 'with' uses 'open' as a context manager. It automatically closes the file it opens, and allows you to see at a quick glance where the file is open. Only keep things inside the context manager that need to be there. Also, using .format is often encouraged.
Welcome to Python!
The following code will give you a tab-separated list in three columns, but it won't justify the output for you. It's not fully optimized so it should be easier to understand, and I've commented the portions that were added.
import sys
textfile = open('output.txt','w')
syslist = dir(sys)
MAX_COLUMNS = 3 # Maximum number of columns to print
colcount = 0 # Track the column number
for x in syslist:
    # First thing we do is add one to the column count when
    # starting the loop, so the first entry counts as column 1.
    # (If we counted from 0, 0 % 3 would also be 0 and the line
    # would break right after the first entry.)
    colcount += 1
    textfile.write(x)
    # After each entry, add a tab character ("\t")
    textfile.write("\t")
    # Now, check the column count against the MAX_COLUMNS. We
    # use a modulus operator (%) to get the remainder after dividing;
    # any number divisible by 3 in our example will return '0'
    # via modulus.
    if colcount % MAX_COLUMNS == 0:
        # Now write out a new-line ("\n") to move to the next line.
        textfile.write("\n")
textfile.close()
Hope that helps!
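If aligned columns are wanted rather than tab-separated ones, one possible sketch pads each name to a common width with str.ljust. The function name format_columns, its parameters, and the sample names are assumptions for illustration, and the list is assumed to be non-empty.

```python
def format_columns(names, columns=3, padding=2):
    """Lay out names left to right in fixed-width columns, one row per line."""
    # Every column is as wide as the longest name plus some padding.
    width = max(len(name) for name in names) + padding
    rows = []
    for start in range(0, len(names), columns):
        row = names[start:start + columns]
        # Pad each cell, then trim the spaces after the last cell of the row.
        rows.append("".join(name.ljust(width) for name in row).rstrip())
    return "\n".join(rows) + "\n"

sample = ["argv", "exit", "modules", "path", "platform", "stdin", "version"]
table = format_columns(sample)
print(table, end="")
```

With import sys in place, the same string could be written to the file with textfile.write(format_columns(dir(sys))).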
I'm a little confused by your question, but I imagine the answer is as simple as adding in tabs. So change textfile.write(x) to textfile.write(x + "\t"), no? You can adjust the number of tabs based on the size of the data.
I'm editing my answer.
Note that dir(sys) gives you a list of strings. These strings do not have any formatting. The print x command adds a newline character by default, which is why you are seeing each of them on its own line. However, write does not. So when you call write, you need to add any necessary formatting yourself. If you want the file to look identical to the output of print, you need to use write(x + "\n") to get the newline characters that print was automatically including.
