Python: CSV delimiter failing randomly

Python: CSV delimiter failing randomly - python

I have created a script which a number of random passwords are generated (see below)
import string
import secrets
import datetime
now = datetime.datetime.now()
T = now.strftime('%Y_%m_d')
entities = ['AA','BB','CC','DD','EE','FF','GG','HH']
masterpass = ('MasterPass' + '_' + T + '.csv')
f= open(masterpass,"w+")
def random_secure_string(stringLength):
secureStrMain = ''.join((secrets.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits + ('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength)))
return secureStrMain
def random_secure_string_lower(stringLength):
secureStrLower = ''.join((secrets.choice(string.ascii_lowercase)) for i in range(stringLength))
return secureStrLower
def random_secure_string_upper(stringLength):
secureStrUpper = ''.join((secrets.choice(string.ascii_uppercase)) for i in range(stringLength))
return secureStrUpper
def random_secure_string_digit(stringLength):
secureStrDigit = ''.join((secrets.choice(string.digits)) for i in range(stringLength))
return secureStrDigit
def random_secure_string_char(stringLength):
secureStrChar = ''.join((secrets.choice('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength))
return secureStrChar
for x in entities:
f.write(x + ',' + random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1) + ',' + T + "\n")
f.close()
I use pandas to get the code to import a list, so normally it is for 200-250 entities, not just the 8 in the example.
The issue comes every so often where it looks like the comma delimiter fails to be read (see row 6 of attached photo)
In all the cases I have had of this (multiple run throughs), it looks like the 10th character is a comma, the 4 before (characters 6-9) are as stated in the script, but then instead of generating 6 initial characters (from random_secure_string(6)), it is generating 5. Could this be causing the issue? If so, how do I fix this?
Thank you in advance

Wild guess, because the content of the csv file as text is required to make sure.
A csv is a Comma Separated Values text file. That means that it is a plain text files where fields are delimited with a separator, normally the comma (,). In order to allow text fields to contain commas or even new lines, they can be enclosed in quotes (normally ") or special characters can be escaped, normally with \.
That means that if a line contains abcdefg\,2020_05 the comma will not be interpreted as a separator.
How to fix:
CSV is a simple format, but with many corner cases. The rule is avoid to read or write it by hand. Just use the standard library csv module here:
...
import csv
...
with open(masterpass,"w+", newline='') as f:
wr = csv.writer(f)
for x in entities:
wr.writerow([x, random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1), T])
The writer will take care for special characters and ensure that appropriate encoding or escaping will be used

Related

Python - How to handle space as a value of a variable without quotes?

I have a string "bitrate:8000"
I need to convert it to "-bps 8000". Note that the parameter name is changed and so is the delimiter from ':' to space.
Also the delimiters are not fixed always, sometimes I would need to change from ':' to '-' using the same program.
The change rules are supplied as a config file which I am reading through the ConfigParser module. Something like:
[params]
modify_param_name = bitrate/bps
modify_delimiter = :/' '
value = 8000
In my program:
orig_param = modify_param_name.split('/')[0]
new_param = modify_param_name.split('/')[1]
orig_delimiter = modify_delimiter.split('/')[0]
new_delimiter = modify_delimiter.split('/')[1]
new_param_string = new_param + new_delimiter + value
However, this results in the string as below:
-bps' '8000
The question is how can I handle spaces without the ' ' quotes?

The reason why you're getting the ' ' string is probably related to the way you parse your modify_delimiter value.
You're reading that as a string, so that modify_delimiter == ":/' '".
When you're doing:
new_delimiter = modify_delimiter.split('/')[1]
Essentially modify_delimiter.split('/') gives you an array of [':', "' '"].
So when you're doing new_param_string = new_param + new_delimiter + value
, you are concatenating together 'bps' + "' '" + '8000'.
If your modify_delimiter contained the string ':/ ', this would work just fine:
>>> new_param_string = new_param + new_delimiter + value
>>> new_param_string
'bps 8000'
It has been pointed out that you're using ConfigParser. Unfortunatelly, I don't see an option for ConfigParser (either in python 2 or 3) to preserve trailing whitespaces - it looks like they're always stripped.
What I can suggest in that case is that you wrap your string in quotes entirely in your config file:
[params]
modify_param_name = bitrate/bps
modify_delimiter = ":/ "
And in your code, when you initialize modify_delimiter, strip the " on your own:
modify_delimiter = config.get('params', 'modify_delimiter').strip('"')
That way the trailing space will get preserved and you should get your desired output.

Print strings with line break Python

import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
inr = csv.reader(infile,delimiter='\t')
#ouw = csv.writer(outfile,delimiter=' ')
for row in inr:
d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
s = 1
p = int(row[5])
nr = [format(s,'02')+format(d.year,'04')+format(d.month,'02')+format(d.day,'02')+format(d.hour,'02')+format(d.minute,'02')+format(int(p*0.2),'04')]
outfile.writelines(nr+'/n')
Using the above script, I have read in a .txt file and reformatted it as 'nr' so it looks like this:
['012015072314000000']
['012015072313450000']
['012015072313300000']
['012015072313150000']
['012015072313000000']
['012015072312450000']
['012015072312300000']
['012015072312150000']
..etc.
I need to now print it onto my new .txt file, but Python is not allowing me to print 'nr' with line breaks after each entry, I think because the data is in strings. I get this error:
TypeError: can only concatenate list (not "str") to list
Is there another way to do this?

You are trying to combine a list with a string, which cannot work. Simply don't create a list in nr.
import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
inr = csv.reader(infile,delimiter='\t')
#ouw = csv.writer(outfile,delimiter=' ')
for row in inr:
d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
s = 1
p = int(row[5])
nr = "{:02d}{:%Y%m%d%H%M}{:04d}\n".format(s,d,int(p*0.2))
outfile.write(nr)

There is no need to put your string into a list; just use outfile.write() here and build a string without a list:
nr = format(s,'02') + format(d.year,'04') + format(d.month, '02') + format(d.day, '02') + format(d.hour, '02') + format(d.minute, '02') + format(int(p*0.2), '04')
outfile.write(nr + '\n')
Rather than use 7 separate format() calls, use str.format():
nr = '{:02}{:%Y%m%d%H%M}{:04}\n'.format(s, d, int(p * 0.2))
outfile.write(nr)
Note that I formatted the datetime object with one formatting operation, and I included the newline into the string format.
You appear to have hard-coded the s value; you may as well put that into the format directly:
nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, int(p * 0.2))
outfile.write(nr)
Together, that updates your script to:
with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile,\
open('soundTransit1.txt','w') as outfile:
inr = csv.reader(infile, delimiter='\t')
for row in inr:
d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
p = int(int(row[5]) * 0.2)
nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, p)
outfile.write(nr)
Take into account that the csv module works better if you follow the guidelines about opening files; in Python 2 you need to open the file in binary mode ('rb'), in Python 3 you need to set the newline parameter to ''. That way the module can control newlines correctly and supports including newlines in column values.

Writing to UTF-16-LE text file with BOM

I've read a few postings regarding Python writing to text files but I could not find a solution to my problem. Here it is in a nutshell.
The requirement: to write values delimited by thorn characters (u00FE; and surronding the text values) and the pilcrow character (u00B6; after each column) to a UTF-16LE text file with BOM (FF FE).
The issue: The written-to text file has whitespace between each column that I did not script for. Also, it's not showing up right in UltraEdit. Only the first value ("mom") shows. I welcome any insight or advice.
The script (simplified to ease troubleshooting; the actual script uses a third-party API to obtain the list of values):
import os
import codecs
import shutil
import sys
import codecs
first = u''
textdel = u'\u00FE'.encode('utf_16_le') #thorn
fielddel = u'\u00B6'.encode('utf_16_le') #pilcrow
list1 = ['mom', 'dad', 'son']
num = len(list1) #pretend this is from the metadata profile
f = codecs.open('c:/myFile.txt', 'w', 'utf_16_le')
f.write(u'\uFEFF')
for item in list1:
mytext2 = u''
i = 0
i = i + 1
mytext2 = mytext2 + item + textdel
if i < (num - 1):
mytext2 = mytext2 + fielddel
f.write(mytext2 + u'\n')
f.close()

You're double-encoding your strings. You've already opened your file as UTF-16-LE, so leave your textdel and fielddel strings unencoded. They will get encoded at write time along with every line written to the file.
Or put another way, textdel = u'\u00FE' sets textdel to the "thorn" character, while textdel = u'\u00FE'.encode('utf-16-le') sets textdel to a particular serialized form of that character, a sequence of bytes according to that codec; it is no longer a sequence of characters:
textdel = u'\u00FE'
len(textdel) # -> 1
type(textdel) # -> unicode
len(textdel.encode('utf-16-le')) # -> 2
type(textdel.encode('utf-16-le')) # -> str

Extracting Data from Multiple TXT Files and Creating a Summary CSV File in Python

I have a folder with about 50 .txt files containing data in the following format.
=== Predictions on test data ===
inst# actual predicted error distribution (OFTd1_OF_Latency)
1 1:S 2:R + 0.125,*0.875 (73.84)
I need to write a program that combines the following: my index number (i), the letter of the true class (R or S), the letter of the predicted class, and each of the distribution predictions (the decimals less than 1.0).
I would like it to look like the following when finished, but preferably as a .csv file.
ID True Pred S R
1 S R 0.125 0.875
2 R R 0.105 0.895
3 S S 0.945 0.055
. . . . .
. . . . .
. . . . .
n S S 0.900 0.100
I'm a beginner and a bit fuzzy on how to get all of that parsed and then concatenated and appended. Here's what I was thinking, but feel free to suggest another direction if that would be easier.
for i in range(1, n):
s = str(i)
readin = open('mydata/output/output'+s+'out','r')
#The files are all named the same but with different numbers associated
output = open("mydata/summary.csv", "a")
storage = []
for line in readin:
#data extraction/concatenation here
if line.startswith('1'):
id = i
true = # split at the ':' and take the letter after it
pred = # split at the second ':' and take the letter after it
#some have error '+'s and some don't so I'm not exactly sure what to do to get the distributions
ds = # split at the ',' and take the string of 5 digits before it
if pred == 'R':
dr = #skip the character after the comma but take the have characters after
else:
#take the five characters after the comma
lineholder = id+' , '+true+' , '+pred+' , '+ds+' , '+dr
else: continue
output.write(lineholder)
I think using the indexes would be another option, but it might complicate things if the spacing is off in any of the files and I haven't checked this for sure.
Thank you for your help!

Well first of all, if you want to use CSV, you should use CSV module that comes with python. More about this module here: https://docs.python.org/2.7/library/csv.html I won't demonstrate how to use it, because it's pretty simple.
As for reading the input data, here's my suggestion how to break down every line of the data itself. I assume that lines of data in the input file have their values separated by spaces, and each value cannot contain a space:
def process_line(id_, line):
pieces = line.split() # Now we have an array of values
true = pieces[1].split(':')[1] # split at the ':' and take the letter after it
pred = pieces[2].split(':')[1] # split at the second ':' and take the letter after it
if len(pieces) == 6: # There was an error, the + is there
p4 = pieces[4]
else: # There was no '+' only spaces
p4 = pieces[3]
ds = p4.split(',')[0] # split at the ',' and take the string of 5 digits before it
if pred == 'R':
dr = p4.split(',')[0][1:] #skip the character after the comma but take the have??? characters after
else:
dr = p4.split(',')[0]
return id_+' , '+true+' , '+pred+' , '+ds+' , '+dr
What I mainly used here was split function of strings: https://docs.python.org/2/library/stdtypes.html#str.split and in one place this simple syntax of str[1:] to skip the first character of the string (strings are arrays after all, we can use this slicing syntax).
Keep in mind that my function won't handle any errors or lines formated differently than the one you posted as an example. If the values in every line are separated by tabs and not spaces you should replace this line: pieces = line.split() with pieces = line.split('\t').

i think u can separte floats and then combine it with the strings with the help of re module as follows:
import re
file = open('sample.txt','r')
strings=[[num for num in re.findall(r'\d+\.+\d+',i) for i in file.readlines()]]
print (strings)
file.close()
file = open('sample.txt','r')
num=[[num for num in re.findall(r'\w+\:+\w+',i) for i in file.readlines()]]
print (num)
s= num+strings
print s #[['1:S','2:R'],['0.125','0.875','73.84']] output of the code
this prog is written for one line u can use it for multiple line as well but u need to use a loop for that
contents of sample.txt:
1 1:S 2:R + 0.125,*0.875 (73.84)
2 1:S 2:R + 0.15,*0.85 (69.4)
when you run the prog the result will be:
[['1:S,'2:R'],['1:S','2:R'],['0.125','0.875','73.84'],['0.15,'0.85,'69.4']]
simply concatenate them

This uses regular expressions and the CSV module.
import re
import csv
matcher = re.compile(r'[[:blank:]]*1.*:(.).*:(.).* ([^ ]*),[^0-9]?(.*) ')
filenametemplate = 'mydata/output/output%iout'
output = csv.writer(open('mydata/summary.csv', 'w'))
for i in range(1, n):
for line in open(filenametemplate % i):
m = matcher.match(line)
if m:
output.write([i] + list(m.groups()))

Encrypting the lines in a file

I'm trying to write a program that opens a text file, and shifts each of the characters in the file 5 characters to the right. It should only do this for alphanumeric characters, and leave nonalphanumerics as they are. (ex: C becomes H) I'm supposed to be using the ASCII table to do this, and I'm having an issue when the characters wrap around. ex: w should become b, but my program gives me a character that's in the ASCII table. Another issue I'm having is that all the characters are printing on separate lines and I'd like them all to print on the same line.
I can't use lists or dictionaries.
This is what I have, I'm not sure how to do the final if statement
def main():
fileName= input('Please enter the file name: ')
encryptFile(fileName)
def encryptFile(fileName):
f= open(fileName, 'r')
line=1
while line:
line=f.readline()
for char in line:
if char.isalnum():
a=ord(char)
b= a + 5
#if number wraps around, how to correct it
if
print(chr(c))
else:
print(chr(b))
else:
print(char)

Using str.translate:
In [24]: import string
In [25]: string.uppercase
Out[25]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [26]: string.uppercase[5:]+string.uppercase[:5]
Out[26]: 'FGHIJKLMNOPQRSTUVWXYZABCDE'
In [27]: table = string.maketrans(string.uppercase, string.uppercase[5:]+string.uppercase[:5])
In [28]: 'CAR'.translate(table)
Out[28]: 'HFW'
In [29]: 'HELLO'.translate(table)
Out[29]: 'MJQQT'

First, it matters if it is lower or upper case. I am going to assume here that all the characters are lower case (if they aren't, it would be easy enough to make them)
if b>122:
b=122-b #z=122
c=b+96 #a=97
w=119 in ASCII and z=122 (decimal in ASCII) so 119+5=124 and 124-122=2 which is our new b, then we add that to a-1 (this takes care of if we get a 1 back, 2+96=98 and 98 is b.
For the printing on the same line, instead of printing when you have them, I would write them to a list, then create a string from that list.
e.g instead of
print(chr(c))
else:
print(chr(b))
I would do
someList.append(chr(c))
else:
somList.append(chr(b))
then join each element of the list together into one string.

You could create a dictionary to handle it:
import string
s = string.lowercase + string.uppercase + string.digits + string.lowercase[:5]
encryptionKey = {s[i]:s[i+5] for i in range(len(s)-5)}
The final addend to s (+ string.lowercase[:5]) adds the first 5 letters into the key. Then, we use a simple dictionary comprehension to create a key for the encryption.
Put into your code (I also changed it so you iterate through the lines rather than using f.readline():
import string
def main():
fileName= input('Please enter the file name: ')
encryptFile(fileName)
def encryptFile(fileName):
s = string.lowercase + string.uppercase + string.digits + string.lowercase[:5]
encryptionKey = {s[i]:s[i+5] for i in range(len(s)-5)}
f= open(fileName, 'r')
line=1
for line in f:
for char in line:
if char.isalnum():
print(encryptionKey[char])
else:
print(char)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: CSV delimiter failing randomly - python

Related

Python - How to handle space as a value of a variable without quotes?

Print strings with line break Python

Writing to UTF-16-LE text file with BOM

Extracting Data from Multiple TXT Files and Creating a Summary CSV File in Python

Encrypting the lines in a file

Categories

Resources