I am trying to make a program that uses the functions get_neg and get_pos to gather information about the file called project_twitter_data.csv and create a CSV file that contains the number of negative and positive words. As you can see in the first few lines of my code (I will copy/paste all my current code below), I am just testing out making a CSV without any of the fancy stuff, but for some reason I get an error:
NotImplementedError: csv is not yet implemented in Skulpt on line 1
I then Googled "What is Skulpt" and got a fat percentage measurer website. Could somebody please explain to a Python noob what the error means and how to fix it?
(PS: here is the code):
import csv
with open('resulting_data.csv', 'w', newline='') as f:  # the csv module needs a text-mode file in Python 3
    writer = csv.writer(f)
    writer.writerow(['first line', '2nd line'])
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
# lists of words to use
positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())
negative_words = []
with open("negative_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())
twitter = []
with open("project_twitter_data.csv") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            twitter.append(lin.strip())
print(twitter)
#######################
def strip_punctuation(x):
    lst = []
    for letter in x:
        if not letter in punctuation_chars:
            lst.append(letter)
    return ("".join(lst))
######################
def get_neg(x):
    lst = []
    original_lst_name = []
    var = 0
    string = x.lower()
    original_lst_name = string.split(" ")
    print(original_lst_name)
    for letter in original_lst_name:
        if strip_punctuation(letter) in negative_words:
            var += 1
            print(var)
            print(letter)
    return (var)
#######################
def get_pos(x):
    lst = []
    original_lst_name = []
    var = 0
    string = x.lower()
    original_lst_name = string.split(" ")
    print(original_lst_name)
    for letter in original_lst_name:
        if strip_punctuation(letter) in positive_words:
            var += 1
            print(var)
            print(letter)
    return (var)
#########################
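Skulpt is an implementation of Python written in JavaScript that runs in the browser, and the error means that its csv module has not been implemented yet. Try opening the CSV file as a plain text file instead and processing it from there as a list of lines.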
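Following that suggestion, here is a minimal sketch that counts positive and negative words and writes the counts as CSV text with plain file operations, so import csv is never needed. It assumes the environment lets you write files with open(..., "w"), and that no field contains a comma, quote or newline (so no quoting is done). It is written for standard Python 3; Skulpt supports only a subset, so small adjustments may be needed. The helper names count_matches and write_csv, the column headers and the sample words and tweets are all hypothetical.

```python
# Sketch: count positive/negative words and write the counts as CSV text
# without importing csv. Assumes no field contains a comma, quote or newline.
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]


def strip_punctuation(word):
    return "".join(ch for ch in word if ch not in punctuation_chars)


def count_matches(text, words):
    # Number of tokens that, lowercased and stripped of punctuation, are in `words`.
    return sum(1 for token in text.lower().split() if strip_punctuation(token) in words)


def write_csv(filename, header, rows):
    with open(filename, "w") as out:
        out.write(",".join(header) + "\n")
        for row in rows:
            # str() so that integer counts can be joined like text
            out.write(",".join(str(value) for value in row) + "\n")


# Hypothetical sample data; in the project these come from the word files
# and from project_twitter_data.csv.
positive_words = {"good", "happy"}
negative_words = {"bad"}
tweets = ["Good day, happy people!", "Bad news..."]

rows = [[count_matches(t, positive_words), count_matches(t, negative_words)] for t in tweets]
write_csv("resulting_data.csv", ["Positive Score", "Negative Score"], rows)
```

Storing the word lists as sets rather than lists makes each membership test constant-time on average, which matters once the word files contain thousands of entries.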
I have a very large file of about 900k values. It consists of repeated entries like this:
/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
The values change from entry to entry, but I need each entry to look like this:
/begin throw
STORE_ADDRESS 59110
COLOR red
DESCRIPTION "cashmere sofa throw" 10 10 156876
DIMENSION 140 200
/end throw
Currently, my approach is to replace the newlines with spaces. The store address is constant throughout the file, so I thought of removing it from its position and inserting it before the description:
text_file = open(filename, 'r')
filedata = text_file.readlines()
for num, line in enumerate(filedata, 0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                filedata.remove(filedata[index])
                filedata.insert(filedata[index-7])
                break
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)
After that, I write filedata into another file.
This approach takes too long to run (almost an hour and a half) because, as mentioned earlier, the file is large.
Is there a faster approach to this problem?
You can read the file structure by structure so that you don't have to store the whole content in memory and manipulate it there. By structure, I mean all the values between and including /begin throw and /end throw. This should be much faster.
def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure and write the result into output_file
    pass
current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            current_structure += line
            if "/end throw" in line:
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
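The TODO can be filled in along these lines. This is a minimal sketch, assuming each structure holds only the lines from /begin throw to /end throw, that the multi-line keywords are just COLOR, DESCRIPTION and DIMENSION (taken from the sample entry in the question), and that STORE_ADDRESS should move directly after /begin throw as in the desired output. The convert helper and the io.StringIO demo are additions for illustration.

```python
import io

# Assumed keyword set, taken from the sample entry in the question.
KEYWORDS = ("COLOR", "DESCRIPTION", "DIMENSION")


def rearrange_structure_and_write_into_file(structure, output_file):
    # Assumes the structure holds only the lines from "/begin throw" to "/end throw".
    lines = [line.strip() for line in structure.splitlines() if line.strip()]
    begin, body, end = lines[0], lines[1:-1], lines[-1]
    store = [line for line in body if line.startswith("STORE_ADDRESS")]
    merged = []
    for line in body:
        if line.startswith("STORE_ADDRESS"):
            continue  # already collected; it is written right after "/begin throw"
        if line.split()[0] in KEYWORDS or not merged:
            merged.append(line)  # a keyword starts a new output line
        else:
            merged[-1] += " " + line  # a value line is joined onto its keyword
    output_file.write("\n".join([begin] + store + merged + [end]) + "\n")


def convert(original_file, output_file):
    # Same loop as in the answer: buffer one structure at a time, never the whole file.
    current_structure = ""
    for line in original_file:
        current_structure += line
        if "/end throw" in line:
            rearrange_structure_and_write_into_file(current_structure, output_file)
            current_structure = ""


sample = """/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
"""
out = io.StringIO()
convert(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, call convert(original_file, output_file) inside the two with open(...) blocks from the answer instead of using io.StringIO.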
Inserting and removing values in a long list is likely to make this code slower than it needs to be: each insert() or remove() shifts every element after that position, so on a list of about 900k lines every call does a lot of work. It also makes the code fragile and difficult to reason about. If any entry lacks a store address, the code will not work correctly and will search through the following entries until it finds one.
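To illustrate the cost of shifting elements, here is a small, hypothetical benchmark (not from the original answers) comparing repeated insert(0, x) with append() followed by a single reverse(). Both build the same list; the exact timings depend on your machine, but the insert version grows roughly quadratically with n.

```python
import timeit


def build_with_insert(n):
    items = []
    for i in range(n):
        items.insert(0, i)  # shifts every existing element one slot to the right
    return items


def build_with_append(n):
    items = []
    for i in range(n):
        items.append(i)  # amortized constant time
    items.reverse()  # one linear pass gives the same order as repeated insert(0, ...)
    return items


n = 20_000
print("insert(0, x):    ", timeit.timeit(lambda: build_with_insert(n), number=1))
print("append + reverse:", timeit.timeit(lambda: build_with_append(n), number=1))
```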
A better approach would be to break down the code into functions that parse each entry and output it:
KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]
def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                # single-line keyword: keep everything after it (e.g. "COLOR dark red")
                r[words[0]] = " ".join(words[1:])
            else:
                # multi-line keyword: the following lines are collected into this list
                current_section = r[words[0]] = []
        else:
            current_section.append(line.strip())
    return r
def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if isinstance(value, list):
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            throw = parse_lines(entry)
            for out_line in output_throw(throw):
                out_file.write(out_line + "\n")
        elif line:  # skip blank lines
            entry.append(line)
Or, if you really need to maximize performance by removing all unnecessary operations, you could read, parse and write in a single pass with one long if/elif chain (this reuses KEYWORDS from above), like this:
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    in_section = True
    def write(line):
        out_file.write(line + "\n")
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line.split()[0] would fail on them
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            # written immediately, so it lands right after "/begin throw"
            in_section = False
            write(line)
        elif line in KEYWORDS:
            # a bare keyword: the following value lines are appended to it
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line
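Codeforces problems often require reading a lot of multi-line input. For example:
https://codeforces.com/contest/71/problem/A
TL;DR: read the words and abbreviate every word longer than 10 characters (first letter, number of letters in between, last letter). Sample input: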
4
word
localization
internationalization
pneumonoultramicroscopicsilicovolcanoconiosis
I used this solution, which I believe is correct and works for me:
lines = []
while True:
    line = input()
    if line:
        lines.append(line)
    else:
        break
input = '\n'.join(lines)
tab=input.splitlines()
numb=tab[0]
tab.pop(0)
for i in tab:
    wordTab=[]
    if len(i)>10:
        wordTab.append(i[:1])
        wordTab.append(i[-1:])
        print(f"{i[:1]}{len(i)-2}{i[-1:]}")
    else:
        print(i)
Yet I got an error when I submitted it (on their side). How can I make Codeforces accept multi-line input in Python?
I use a setup with two files, "input.txt" and "output.txt", for writing and testing the code locally, plus a FILE flag that I set to True locally and to False when I submit to Codeforces. Here is an example for the same problem you mentioned. It worked fine, with a running time of 77 ms.
import sys
FILE = False  # set to True for local testing, False before submitting
if FILE:
    sys.stdin = open('input.txt', 'r')
    sys.stdout = open('output.txt', 'w')
def get_int():
    return int(sys.stdin.readline())
def get_string():
    return sys.stdin.readline().strip()
n = get_int()
final_result = []
for i in range(n):
    word = get_string()
    if len(word) > 10:
        word = word[0] + str(len(word) - 2) + word[-1]
    final_result.append(word)
for item in final_result:
    sys.stdout.write(item)
    sys.stdout.write('\n')
Here is my answer:
n = int(input())
arr = []
for i in range(n):
    word = input()
    if len(word) > 10:
        arr.append(word[0] + str(len(word) - 2) + word[-1])
    else:
        arr.append(word)
print("\n".join(arr))
To test the logic locally without typing the input each time, you can also hardcode the sample input as a string:
lines = '''4
word
localization
internationalization
pneumonoultramicroscopicsilicovolcanoconiosis'''
tab = lines.splitlines()
for i in tab[1:]:  # skip the first line, which is the word count
    if len(i) > 10:
        start = i[:1]
        middle = len(i) - 2
        end = i[-1:]
        print(f"{start}{middle}{end}")
    else:
        print(i)
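A likely cause of the error in the question is that its loop waits for an empty line, but the judge's input simply ends, and input() raises EOFError once standard input is exhausted. One way around this is to read all of standard input at once with sys.stdin.read() and use the count on the first line. This is a minimal sketch; abbreviate, solve and main are hypothetical helper names.

```python
import sys


def abbreviate(word):
    # Problem 71A rule: words longer than 10 characters become
    # first letter + number of letters in between + last letter.
    if len(word) > 10:
        return f"{word[0]}{len(word) - 2}{word[-1]}"
    return word


def solve(data):
    tokens = data.split()  # splits on any whitespace, including newlines
    n = int(tokens[0])
    return "\n".join(abbreviate(word) for word in tokens[1:n + 1])


def main():
    # sys.stdin.read() returns everything up to end-of-file, so it never
    # waits for a blank line that the judge does not send.
    print(solve(sys.stdin.read()))
```

To submit, end the file with a call to main(); keeping the logic in solve() also makes it easy to test against the sample input as a string.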
I have an INI file formatted like this:
But I need it to look like this:
What would be the easiest way to write such a converter?
I tried to do it in Python, but it doesn't work as expected. My code is below.
def fix_INI_file(in_INI_filepath, out_INI_filepath):
    count_lines = len(open(in_INI_filepath).readlines())
    print("Line count: " + str(count_lines))
    in_INI_file = open(in_INI_filepath, 'rt')
    out_arr = []
    temp_arr = []
    line_flag = 0
    for i in range(count_lines):
        line = in_INI_file.readline()
        print(i)
        if line == '':
            break
        if (line.startswith("[") and "]" in line) or ("REF:" in line) or (line == "\n"):
            out_arr.append(line)
        else:
            temp_str = ""
            line2 = ""
            temp_str = line.strip("\n")
            wh_counter = 0
            while 1:
                wh_counter += 1
                line = in_INI_file.readline()
                if (line.startswith("[") and "]" in line) or ("REF:" in line) or (line == "\n"):
                    line2 += line
                    break
                count_lines -= 1
                temp_str += line.strip("\n") + " ; "
            temp_str += "\n"
            out_arr.append(temp_str)
            out_arr.append(line2)
    out_INI_file = open(out_INI_filepath, 'wt+')
    strr_blob = ""
    for strr in out_arr:
        strr_blob += strr
    out_INI_file.write(strr_blob)
    out_INI_file.close()
    in_INI_file.close()
Fortunately, there's a much easier way to handle this than parsing the text by hand. The built-in configparser module supports keys without values via the allow_no_value constructor argument.
import configparser
read_config = configparser.ConfigParser(allow_no_value=True)
read_config.read_string('''
[First section]
s1value1
s1value2
[Second section]
s2value1
s2value2
''')
write_config = configparser.ConfigParser(allow_no_value=True)
for section_name in read_config.sections():
    write_config[section_name] = {';'.join(read_config[section_name]): None}
with open('/tmp/test.ini', 'w') as outfile:
    write_config.write(outfile)
While I don't immediately see a way to use the same ConfigParser object for reading and writing (it maintains default values for the original keys), using the second object as a writer should yield what you're looking for.
Output from the above example:
[First section]
s1value1;s1value2
[Second section]
s2value1;s2value2
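One caveat: ConfigParser lowercases option names by default (through its optionxform method), so a value written as S1Value1 would come out as s1value1. Setting optionxform = str on both parsers keeps the original case. This is a minimal sketch of the same approach with that change; join_section_values and the sample text are hypothetical. Note that write() also puts a blank line after each section.

```python
import configparser
import io


def join_section_values(ini_text, sep=";"):
    reader = configparser.ConfigParser(allow_no_value=True)
    reader.optionxform = str  # keep keys as written; the default lowercases them
    reader.read_string(ini_text)

    writer = configparser.ConfigParser(allow_no_value=True)
    writer.optionxform = str
    for name in reader.sections():
        keys = list(reader[name])
        # One value-less key holding all the original keys joined by `sep`;
        # an empty section stays empty.
        writer[name] = {sep.join(keys): None} if keys else {}

    out = io.StringIO()
    writer.write(out)
    return out.getvalue()


print(join_section_values("[First section]\nS1Value1\nS1Value2\n[Second section]\ns2value1\n"))
```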
I'm trying to get my program to split each line of a file into 3 fields and then apply an "if row1 == x:" check to add data to an existing class. That part works, except when row1 is ''. So I tried changing the input file so that it was ' ', then '*', then 'k' (and so on), but nothing worked.
The thing is, most lines in the input file read: 1234565,'streetadress1','streetadress2', but some lines have no streetadress1, only ''. The program has no problem identifying the number or 'streetadress2'.
class adress(object):
    def __init__(self,street,ykord,xkord):
        self.street = street
        self.ykord = ykord
        self.xkord = xkord
        self.connected = []
        self.anlid = []
        self.distances = []
        self.parent = []
        self.child =[]
    def set_connections(self):
        input_file = open("kopplingar2.txt")
        temp = input_file.read().splitlines()
        for l in temp:
            row = l.split(',')
            identity = row[0]
            streetA = row[1]
            streetB = row[2]
            if streetA == self.street:
                diction = {'street':streetB, 'identity':identity}
                self.child.append(diction)
            elif streetA == '':
                self.anlid.append(identity)
                print 'poop!'
            elif streetB == self.street and streetA != '':
                diction = {'street':streetA, 'identity':identity}
                self.parent.append(diction)
                print streetA
The print 'poop!' is just there to check whether that branch ever runs, but it never does. There should be about 400 lines of poop, since about 75% of the lines in the input file contain ''.
I have no idea why it works for the other fields but not for row1 (except that row1 is sometimes '' instead of a full string).
'' is an empty string in Python. If you need to compare a value with a string consisting of two apostrophe characters, you need to write streetA == "''".
As @yole said, you need to compare with "''". For example, if one of the lines in the file is 123,'','streetB', then l would be "123,'','streetB'", and this is what you get:
>>> l="123,'','streetB'"
>>> l.split(',')
['123', "''", "'streetB'"]
>>>
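An alternative, not suggested in the answers above, is to let the csv module strip the quotes: with quotechar="'" each quoted field is unwrapped, so '' becomes a genuinely empty string and the original streetA == '' check works. This is a minimal Python 3 sketch (the question's code is Python 2, where csv.reader also exists but print is a statement); parse_connections is a hypothetical helper, and with the real file you would pass the open file object instead of a list of strings.

```python
import csv


def parse_connections(lines):
    # quotechar="'" says fields may be wrapped in single quotes; the reader
    # strips them, so a field written as '' becomes an empty string.
    return list(csv.reader(lines, quotechar="'"))


rows = parse_connections([
    "1234565,'streetadress1','streetadress2'",
    "123,'','streetB'",
])
for identity, street_a, street_b in rows:
    if street_a == "":
        print(identity, "has no streetA")
```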
The goal of this code is to find the frequency of words used in a book.
I am trying to read in the text of a book, but the following line keeps throwing my code off:
precious protégés. No, gentlemen; he'll always show 'em a clean pair
Specifically, the é character.
I have looked at the following documentation, but I don't quite understand it: https://docs.python.org/3.4/howto/unicode.html
Here's my code:
import string
# Create word dictionary from the comprehensive word list
word_dict = {}
def create_word_dict ():
    # open words.txt and populate dictionary
    word_file = open ("./words.txt", "r")
    for line in word_file:
        line = line.strip()
        word_dict[line] = 1
# Removes punctuation marks from a string
def parseString (st):
    st = st.encode("ascii", "replace")
    new_line = ""
    st = st.strip()
    for ch in st:
        ch = str(ch)
        if (n for n in (1,2,3,4,5,6,7,8,9,0)) in ch or ' ' in ch or ch.isspace() or ch == u'\xe9':
            print (ch)
            new_line += ch
        else:
            new_line += ""
    # now remove all instances of 's or ' at end of line
    new_line = new_line.strip()
    print (new_line)
    if (new_line[-1] == "'"):
        new_line = new_line[:-1]
    new_line.replace("'s", "")
    # Conversion from ASCII codes back to useable text
    message = new_line
    decodedMessage = ""
    for item in message.split():
        decodedMessage += chr(int(item))
    print (decodedMessage)
    return new_line
# Returns a dictionary of words and their frequencies
def getWordFreq (file):
    # Open file for reading the book.txt
    book = open (file, "r")
    # create an empty set for all Capitalized words
    cap_words = set()
    # create a dictionary for words
    book_dict = {}
    total_words = 0
    # remove all punctuation marks other than '[not s]
    for line in book:
        line = line.strip()
        if (len(line) > 0):
            line = parseString (line)
            word_list = line.split()
            # add words to the book dictionary
            for word in word_list:
                total_words += 1
                if (word in book_dict):
                    book_dict[word] = book_dict[word] + 1
                else:
                    book_dict[word] = 1
    print (book_dict)
    # close the file
    book.close()
def main():
    wordFreq1 = getWordFreq ("./Tale.txt")
    print (wordFreq1)
main()
The error that I received is as follows:
Traceback (most recent call last):
File "Books.py", line 80, in <module>
main()
File "Books.py", line 77, in main
wordFreq1 = getWordFreq ("./Tale.txt")
File "Books.py", line 60, in getWordFreq
line = parseString (line)
File "Books.py", line 36, in parseString
decodedMessage += chr(int(item))
OverflowError: Python int too large to convert to C long
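When you open a text file in Python 3 without an encoding argument, the encoding depends on your platform and locale (on Windows it is usually an "ANSI" code page such as cp1252), so your é character may not be read correctly. Try passing the encoding explicitly: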
word_file = open ("./words.txt", "r", encoding='utf-8')
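Putting the explicit encoding together with the goal of counting words, here is a minimal sketch, assuming the book file is UTF-8 encoded. It is not the asker's code: it swaps the character-by-character filtering for a Unicode-aware regular expression and collections.Counter. WORD_RE, words_in and word_freq are hypothetical names, and sample_book.txt is a throwaway file created for the demo.

```python
import re
from collections import Counter

# Letters only (Unicode-aware, so é counts as a letter), optionally joined by
# internal apostrophes, e.g. "he'll". Digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def words_in(line):
    for word in WORD_RE.findall(line.lower()):
        if word.endswith("'s"):
            word = word[:-2]  # drop a possessive 's
        yield word


def word_freq(path):
    counts = Counter()
    # Decode explicitly as UTF-8 instead of relying on the platform default.
    with open(path, encoding="utf-8") as book:
        for line in book:
            counts.update(words_in(line))
    return counts


with open("sample_book.txt", "w", encoding="utf-8") as f:
    f.write("precious protégés. No, gentlemen; he'll always show 'em a clean pair\n")

freq = word_freq("sample_book.txt")
print(freq.most_common(3))
```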
The best way I could think of is to read each character as a numeric code, into an array, and then convert it back to a character. For example, 97 is the ASCII code for "a", and chr(97) returns "a". Characters such as é are outside the ASCII range (ord("é") is 233), so check a Unicode table for the values of special characters.
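The round trip between characters and their codes can be checked directly with ord() and chr(). The second part of this small sketch also shows what the question's st.encode("ascii", "replace") does: every non-ASCII character becomes "?", and iterating over the resulting bytes object yields integers, which is why str(ch) in the question's loop produces digit strings.

```python
# ord() gives a character's code point and chr() turns it back into a
# character; ASCII covers only code points 0-127.
for ch in "aé":
    code = ord(ch)
    print(ch, code, chr(code) == ch, code < 128)

# encode("ascii", "replace") turns every non-ASCII character into "?", and
# iterating over the resulting bytes object yields integers, not characters.
data = "protégés".encode("ascii", "replace")
print(data)  # b'prot?g?s'
print(list(data)[:4])  # [112, 114, 111, 116]
```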
Try:
def parseString(st):
    st = st.encode("ascii", "replace")
    # rest of code here
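The new error you are getting is because you are calling isalpha on an int (i.e. a number): after st.encode(...), st is a bytes object, and iterating over bytes yields integers rather than characters.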
Try this:
for ch in st:
    ch = str(ch)
    # any() actually runs the digit checks; a bare generator expression is always truthy
    if any(str(n) in ch for n in range(10)) or ' ' in ch or ch.isspace() or ch == u'\xe9':
        print (ch)