I'm writing a short program that goes through a directory and writes CREATE TABLE and LOAD DATA statements for a bunch of CSVs, to get them all into MySQL. I'm sure there's an easier way to do this, but I thought it would be fun to make it myself.
This is one of the lines I have in Python to build the LOAD DATA statement, where l_d is the variable I'm storing it in, f is the file path, and n is the table name:
l_d = "LOAD DATA INFILE " + "'" + f + "'" + "\nINTO TABLE " + n + "\nFIELDS TERMINATED BY ','\nENCLOSED BY '" + '"' +"'" + "\nLINES TERMINATED BY" +"\'\n\'" + "\nIGNORE 1 ROWS;"
The statement I want in SQL is:
LOAD DATA INFILE 'file.csv'
INTO TABLE table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY'\n'
IGNORE 1 ROWS;
but what I get is always
LOAD DATA INFILE 'file.csv'
INTO TABLE table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY'
'
IGNORE 1 ROWS;
because it thinks my \n is supposed to be a line break and not the actual characters.
How can I get the actual characters to show up here?
Also, I know my whole string concatenation in the original statement is kinda gross (I'm pretty new to this), so any general tips on how to improve that would also be much appreciated :)
To escape the backslash, add another one before it:
\\n
gets \n
so your code will be:
l_d = "LOAD DATA INFILE " + "'" + f + "'" + "\nINTO TABLE " + n +
"\nFIELDS TERMINATED BY ','\nENCLOSED BY '" + '"' +"'" + "\nLINES
TERMINATED BY" +"\'\\n\'" + "\nIGNORE 1 ROWS;"
print("hello\ \n")
#this print a original "\n"
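As for the string concatenation: one cleaner option is a raw f-string (a sketch of a hypothetical helper, assuming Python 3.6+). The r prefix keeps \n as two literal characters, while the f prefix substitutes your variables:

def build_load_statement(f, n):
    # r prefix: \n stays as the literal characters backslash + n
    # f prefix: {f} and {n} are substituted with the file path and table name
    return rf"""LOAD DATA INFILE '{f}'
INTO TABLE {n}
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;"""

print(build_load_statement('file.csv', 'my_table'))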
I have the following code that pulls a list from a txt file and then prints one line at a time. The problem I am having is that I want the code to remove that line from the txt file after it has been printed.
I have tried a few different methods I found online, but had no success in making them work.
Would anyone have any ideas on how to achieve this?
import time
from time import sleep
import random

my_file=open('testlist.txt','r')
file_lines=my_file.readlines()
my_file.close()

for line in file_lines:
    try:
        sep = line.split(":")
        select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
        print(random.choice(select_list))
        sleep(1)
    except Exception as e:
        print(e)
Basically after the "print(random.choice(select_list))", we would want to delete "line" from "testlist.txt".
Let's go through some logic and see how to achieve the results you are expecting.
Intuitively, the actions are:
1. Read the file line by line into memory
my_file=open('testlist.txt','r')
file_lines=my_file.readlines()
my_file.close()
It would be better practice to use a with context manager (it automatically closes the file once you are out of the indentation of the with scope), i.e.
with open('testlist.txt','r') as my_file:
    file_lines = my_file.readlines()
2. For each line that is read, (a) split it by the : character, (b) perform a few string operations and (c) randomly select one of the outputs from (2b), i.e.
for line in file_lines:
    sep = line.split(":")
    select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
    print(random.choice(select_list))
2b. Now, let's take a look at (2b) and see what we are trying to achieve, i.e.
select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
We produce 3 items in the select_list, where "test " + sep[0] + "? " + sep[1] occurs twice and "test " + sep[0] + "! " + sep[1] is included once.
"test " + sep[0] + "? " + sep[1]
"test " + sep[0] + "! " + sep[1]
"test " + sep[0] + "? " + sep[1]
In any case, the select_list = [ ... ] is a valid line of code.
2c. Regarding the print(random.choice(select_list)) line: it doesn't affect any variables; it just randomly chooses an item from the select_list.
Going back to original question,
I want the code to remove that line from the txt file after it has printed it.
Q: Would this mean removing the line from the original file_lines in open('testlist.txt','r')?
A: If so, then it would end up removing all lines from the original testlist.txt, because every line that checks out in steps 2b and 2c (the try part of the code) gets printed and would then be deleted.
But if step 2b or 2c throws an error that gets caught in the except, then that is a line you won't want to throw out (as per your original question).
In that case, it looks like what you want to get eventually is a list of lines that falls into the except scope of the code.
If so, then you would be looking at something like this:
# Reading the original file.
with open('testlist.txt','r') as my_file:
    # Opening a file to write the lines that fall into the exception.
    with open('testlist-exceptions.txt', 'w') as fout:
        # Iterate through the original file line by line.
        for line in my_file:
            # Step 2a.
            sep = line.split(":")
            # Supposedly step 2b, but since this is the only
            # point of the code that can throw an exception
            # (most probably because there's no sep[1]),
            # you should instead check the length of the sep variable.
            if len(sep) < 2:  # i.e. does not have sep[1].
                # Write to file.
                fout.write(line)
            else:  # Otherwise, perform step 2b.
                select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
                print(random.choice(select_list))
Now the new logic is a lot simpler than the intuition-based logic in your original code, but it achieves the same output that you are expecting.
The new logic is as such:
Open the original file for reading, and open another file to write the lines that fall into the exception
Read the file line by line
Split each line by :
Check whether the string operation joining sep[0] and sep[1] is possible
If yes, perform the string operation to create select_list and choose one of the items in select_list to print to the console
If no, write that line to the output file.
If, for some reason, you really want to work with the file in place with Python, then take a look at Is it possible to modify lines in a file in-place?
And if you really need to reduce the memory footprint and want something that can edit lines of the file in place, then you would have to dig a little into the file seek function: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
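For illustration, a minimal in-place sketch (assuming the whole file fits in memory):

with open('testlist.txt', 'r+') as f:
    lines = f.readlines()
    f.seek(0)  # jump back to the start of the file
    f.writelines(line for line in lines if ':' in line)  # keep only the lines we still want
    f.truncate()  # cut off whatever is left of the old content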
Some unsolicited Python suggestions (hope this helps):
When you need to delete something from a file, if disk space allows, don't delete it; create another file with the expected output, minus the deleted lines.
Whenever possible, treat files that you want to read as immutable, rather than as Perl-style in-place files that allow edits.
It's tempting to reach for try-excepts when you just want something to work, but catch-all excepts are hard to debug and usually a sign that the logic of the steps can be improved (see the sketch below): https://docs.python.org/3/tutorial/errors.html#handling-exceptions
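For example, a sketch of catching only the specific failure instead of everything:

try:
    name, rest = line.split(":", 1)  # raises ValueError if there is no colon
except ValueError:
    fout.write(line)  # only this specific failure is handled
else:
    print("test " + name + "? " + rest)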
I would give you an example, but it's better to follow this guide; there are various ways. The first issue is opening the file in 'r' mode:
Deleting lines
I have a problem with print in Python. I am just starting to learn Python, but when I print variables with the print function, it looks like a newline gets added to the output after one of the variables:
print(game_name + " | " + rating)
I am making a game database with my own ratings of the games, but when it prints the game and the rating, there is an empty line below where game_name and rating are written. How can I delete that empty newline? I am very new to this, so please don't be mean...
Welcome to Stack Overflow! The most likely culprit is that there is a newline at the end of the game_name variable. The easy fix for this is to strip it off like this:
print(game_name.strip() + " | " + rating)
Say we had two variables like this.
game_name = 'hello\n'
rating = 'there'
game_name has the newline. To get rid of that use strip().
print(game_name.strip() + " | " + rating)
output
hello | there
If you want to remove the line break after printing, you can define the optional end parameter to the print statement.
print('Hello World', end='') # No new line
If your variable has a new line added to it, and you want to remove it, use the .strip() method on the variable.
print('Hello World\n'.strip()) # No empty line
For your code, you could run it:
print(game_name + " | " + rating, end='')
Or
print(game_name + " | " + rating.strip())
If the error is that a new line is printed after game_name, you'll want to call the strip method on that instead.
print(game_name.strip() + " | " + rating)
rating or game_name most likely has a newline after the specified string.
You can fix it by doing this:
game_name = game_name.strip('\n')
rating = rating.strip('\n')
print(game_name + " | " + rating)
I have created a script in which a number of random passwords are generated (see below):
import string
import secrets
import datetime
now = datetime.datetime.now()
T = now.strftime('%Y_%m_%d')
entities = ['AA','BB','CC','DD','EE','FF','GG','HH']
masterpass = ('MasterPass' + '_' + T + '.csv')
f= open(masterpass,"w+")
def random_secure_string(stringLength):
secureStrMain = ''.join((secrets.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits + ('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength)))
return secureStrMain
def random_secure_string_lower(stringLength):
secureStrLower = ''.join((secrets.choice(string.ascii_lowercase)) for i in range(stringLength))
return secureStrLower
def random_secure_string_upper(stringLength):
secureStrUpper = ''.join((secrets.choice(string.ascii_uppercase)) for i in range(stringLength))
return secureStrUpper
def random_secure_string_digit(stringLength):
secureStrDigit = ''.join((secrets.choice(string.digits)) for i in range(stringLength))
return secureStrDigit
def random_secure_string_char(stringLength):
secureStrChar = ''.join((secrets.choice('!'+'?'+'"'+'('+')'+'$'+'%'+'#'+'#'+'/'+':'+';'+'['+']'+'#')) for i in range(stringLength))
return secureStrChar
for x in entities:
f.write(x + ',' + random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1) + ',' + T + "\n")
f.close()
I use pandas to import the list of entities, so normally it is 200-250 entities, not just the 8 in the example.
The issue is that every so often the comma delimiter appears to fail to be read (see row 6 of the attached photo).
In every case I have had of this (across multiple run-throughs), the 10th character is a comma and the 4 characters before it (characters 6-9) are as stated in the script; but instead of generating 6 initial characters (from random_secure_string(6)), it generates 5. Could this be causing the issue? If so, how do I fix it?
Thank you in advance
A wild guess, because the content of the csv file as text would be required to be sure.
A csv is a Comma Separated Values text file. That means it is a plain text file where fields are delimited by a separator, normally the comma (,). To allow text fields to contain commas or even newlines, they can be enclosed in quotes (normally ") or special characters can be escaped, normally with \.
That means that if a line contains abcdefg\,2020_05 the comma will not be interpreted as a separator.
How to fix:
CSV is a simple format, but one with many corner cases. The rule is: avoid reading or writing it by hand. Just use the standard library csv module here:
...
import csv
...
with open(masterpass, "w+", newline='') as f:
    wr = csv.writer(f)
    for x in entities:
        wr.writerow([x, random_secure_string(6) + random_secure_string_lower(1) + random_secure_string_upper(1) + random_secure_string_digit(1) + random_secure_string_char(1), T])
The writer will take care of special characters and ensure that appropriate quoting or escaping is used.
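For instance, a quick sketch (with made-up values) of how the writer quotes a field that happens to contain a comma:

import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['AA', 'p@ss,word', '2020_05_24'])
print(buf.getvalue())  # AA,"p@ss,word",2020_05_24 -- the comma field is quoted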
So, I have an extremely inefficient way to do this that works, which I'll show, as it will help illustrate the problem more clearly. I'm an absolute beginner in Python and this is definitely not "the Python way" nor "remotely sane."
I have a .txt file where each line contains information about a large number of .csv files, following format:
File; Title; Units; Frequency; Seasonal Adjustment; Last Updated
(first entry:)
0\00XALCATM086NEST.csv;Harmonized Index of Consumer Prices: Overall Index Excluding Alcohol and Tobacco for Austria©; Index 2005=100; M; NSA; 2015-08-24
and so on, repeats like this for a while. For anyone interested, this is the St.Louis Fed (FRED) data.
I want to rename each file (currently named with the alphanumeric code at the start, 00XA etc.) to the text title. So, just split by semicolon, right? Except that sometimes the text title has semicolons within it (and I want all of the text).
So I did:
data_file_data_directory = 'C:\*****\Downloads\FRED2_csv_3\FRED2_csv_2'
rename_data_file_name = 'README_SERIES_ID_SORT.txt'
rename_data_file = open(data_file_data_directory + '\\' + rename_data_file_name)
for line in rename_data_file.readlines():
    data = line.split(';')
    if len(data) > 2 and data[0].rstrip().lstrip() != 'File':
        original_file_name = data[0]
These last 2 lines deal with the fact that there is some introductory text we want to skip, and that we don't want to rename based on the legend at the top (!= 'File'). This saves the 00XAL__.csv as the old name. It may be possible to make this more elegant (I would appreciate the tips), but it's the next part (the new text name) that gets really ugly.
if len(data) == 6:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1][:-2].replace(':',' -').replace('"','').replace('/',' or ')
else:
    if len(data) == 7:
        new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2][:-2].replace(':',' -').replace('"','').replace('/',' or ')
    else:
        if len(data) == 8:
            new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3][:-2].replace(':',' -').replace('"','').replace('/',' or ')
        else:
            if len(data) == 9:
                new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4][:-2].replace(':',' -').replace('"','').replace('/',' or ')
            else:
                if len(data) == 10:
                    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[5][:-2].replace(':',' -').replace('"','').replace('/',' or ')
                else:
                    (etc)
What I'm doing here is handling the fact that there is no way to know, for each line, how many items are in the list created by splitting it by semicolons. Ideally, the list would have length 6, as in the key at the top of my example of the data. However, for every semicolon in the text title, the length increases by 1... and we want everything before the last four items in the list (counting backwards from the right: date, seasonal adjustment, frequency, units/index) but after the .csv code (this is just another way of saying: I want the text "title", everything on each line after the .csv but before the units/index).
Really what I want is just a way to save the entirety of the text name as new_file_name for each line, even after I split each line by semicolon, when I have no idea how many semicolons are in each text name or in the line as a whole. The above code achieves this, but OMG, this can't be the right way to do it.
Please let me know if it's unclear or if I can provide more info.
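(One simpler approach, sketched below, assumes every line ends with exactly the four metadata fields from the legend (units, frequency, seasonal adjustment, last updated), so everything between the file field and those four is the title:)

def parse_line(line):
    parts = [p.strip() for p in line.split(';')]
    file_field = parts[0]            # e.g. 0\00XALCATM086NEST.csv
    title = '; '.join(parts[1:-4])   # rejoins any semicolons inside the title
    units, frequency, seas_adj, updated = parts[-4:]
    return file_field, title, units, frequency, seas_adj, updated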
Having a problem with parsing Snort logs using the pyparsing module.
The problem is with separating the Snort log (which has multiline entries, separated by a blank line) and getting pyparsing to parse each entry as a whole chunk, rather than reading it in line by line and expecting the grammar to work with each line (obviously, it does not).
I have tried converting each chunk to a temporary string and stripping out the newlines inside each chunk, but it refuses to parse correctly. I may be wholly on the wrong track, but I don't think so (a similar form works perfectly for syslog-type logs, but those are one-line entries and so lend themselves to your basic file-iterator line processing).
Here's a sample of the log and the code I have so far:
[**] [1:486:4] ICMP Destination Unreachable Communication with Destination Host is Administratively Prohibited [**]
[Classification: Misc activity] [Priority: 3]
08/03-07:30:02.233350 172.143.241.86 -> 63.44.2.33
ICMP TTL:61 TOS:0xC0 ID:49461 IpLen:20 DgmLen:88
Type:3 Code:10 DESTINATION UNREACHABLE: ADMINISTRATIVELY PROHIBITED HOST FILTERED
** ORIGINAL DATAGRAM DUMP:
63.44.2.33:41235 -> 172.143.241.86:4949
TCP TTL:61 TOS:0x0 ID:36212 IpLen:20 DgmLen:60 DF
Seq: 0xF74E606
(32 more bytes of original packet)
** END OF DUMP
[**] ...more like this [**]
And the updated code:
def snort_parse(logfile):
    header = Suppress("[**] [") + Combine(integer + ":" + integer + ":" + integer) + Suppress("]") + Regex(".*") + Suppress("[**]")
    cls = Optional(Suppress("[Classification:") + Regex(".*") + Suppress("]"))
    pri = Suppress("[Priority:") + integer + Suppress("]")
    date = integer + "/" + integer + "-" + integer + ":" + integer + "." + Suppress(integer)
    src_ip = ip_addr + Suppress("->")
    dest_ip = ip_addr
    extra = Regex(".*")
    bnf = header + cls + pri + date + src_ip + dest_ip + extra

    def logreader(logfile):
        chunk = []
        with open(logfile) as snort_logfile:
            for line in snort_logfile:
                if line != '\n':
                    line = line[:-1]
                    chunk.append(line)
                    continue
                else:
                    print(chunk)
                    yield " ".join(chunk)
                    chunk = []

    string_to_parse = "".join(next(logreader(logfile)))
    fields = bnf.parseString(string_to_parse)
    print(fields)
Any help, pointers, RTFMs, You're Doing It Wrongs, etc., greatly appreciated.
import pyparsing as pyp
import itertools

integer = pyp.Word(pyp.nums)
ip_addr = pyp.Combine(integer + '.' + integer + '.' + integer + '.' + integer)

def snort_parse(logfile):
    header = (pyp.Suppress("[**] [")
              + pyp.Combine(integer + ":" + integer + ":" + integer)
              + pyp.Suppress(pyp.SkipTo("[**]", include=True)))
    cls = (
        pyp.Suppress(pyp.Optional(pyp.Literal("[Classification:")))
        + pyp.Regex("[^]]*") + pyp.Suppress(']'))
    pri = pyp.Suppress("[Priority:") + integer + pyp.Suppress("]")
    date = pyp.Combine(
        integer + "/" + integer + '-' + integer + ':' + integer + ':' + integer + '.' + integer)
    src_ip = ip_addr + pyp.Suppress("->")
    dest_ip = ip_addr
    bnf = header + cls + pri + date + src_ip + dest_ip

    with open(logfile) as snort_logfile:
        for has_content, grp in itertools.groupby(
                snort_logfile, key=lambda x: bool(x.strip())):
            if has_content:
                tmpStr = ''.join(grp)
                fields = bnf.searchString(tmpStr)
                print(fields)

snort_parse('snort_file')
yields
[['1:486:4', 'Misc activity', '3', '08/03-07:30:02.233350', '172.143.241.86', '63.44.2.33']]
You have some regex unlearning to do, but hopefully this won't be too painful. The biggest culprit in your thinking is the use of this construct:
some_stuff + Regex(".*") + Suppress(string_representing_where_you_want_the_regex_to_stop)
Each subparser within a pyparsing parser is pretty much standalone, and works sequentially through the incoming text. So the Regex term has no way to look ahead to the next expression to see where the '*' repetition should stop. In other words, the expression Regex(".*") is going to just read until the end of the line, since that is where ".*" stops without specifying multiline.
In pyparsing, this concept is implemented using SkipTo. Here is how your header line is written:
header = Suppress("[**] [") + Combine(integer + ":" + integer + ":" + integer) + Suppress("]") + Regex(".*") + Suppress("[**]")
Your ".*" problem gets resolved by changing it to:
header = Suppress("[**] [") + Combine(integer + ":" + integer + ":" + integer) + Suppress("]") + SkipTo("[**]") + Suppress("[**]")
Same thing for cls.
One last bug, your definition of date is short by one ':' + integer:
date = integer + "/" + integer + "-" + integer + ":" + integer + "." + Suppress(integer)
should be:
date = integer + "/" + integer + "-" + integer + ":" + integer + ":" + integer + "." + Suppress(integer)
I think those changes will be sufficient to start parsing your log data.
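Putting those pieces together, a sketch of the corrected grammar (assuming the integer and ip_addr helpers from the accepted answer above):

header = Suppress("[**] [") + Combine(integer + ":" + integer + ":" + integer) + Suppress("]") + SkipTo("[**]") + Suppress("[**]")
cls = Optional(Suppress("[Classification:") + SkipTo("]") + Suppress("]"))
pri = Suppress("[Priority:") + integer + Suppress("]")
date = integer + "/" + integer + "-" + integer + ":" + integer + ":" + integer + "." + Suppress(integer)
bnf = header + cls + pri + date + ip_addr + Suppress("->") + ip_addr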
Here are some other style suggestions:
You have a lot of repeated Suppress("]") expressions. I've started defining all my suppressable punctuation in a very compact and easy to maintain statement like this:
LBRACK,RBRACK,LBRACE,RBRACE = map(Suppress,"[]{}")
(expand to add whatever other punctuation characters you like). Now I can use these characters by their symbolic names, and I find the resulting code a little easier to read.
You start off header with header = Suppress("[**] [") + .... I never like seeing spaces embedded in literals this way, as it bypasses some of the parsing robustness pyparsing gives you with its automatic whitespace skipping. If for some reason the space between "[**]" and "[" was changed to use 2 or 3 spaces, or a tab, then your suppressed literal would fail. Combine this with the previous suggestion, and header would begin with
header = Suppress("[**]") + LBRACK + ...
I know this is generated text, so variation in this format is unlikely, but it plays better to pyparsing's strengths.
Once you have your fields parsed out, start assigning results names to different elements within your parser. This will make it a lot easier to get the data out afterward. For instance, change cls to:
cls = Optional(Suppress("[Classification:") + SkipTo(RBRACK)("classification") + RBRACK)
This will allow you to access the classification data using fields.classification.
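For instance (a hypothetical access, assuming a successful parse of one entry):

fields = bnf.parseString(entry)
print(fields.classification)  # e.g. 'Misc activity'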
Well, I don't know Snort or pyparsing, so apologies in advance if I say something stupid. I'm unclear as to whether the problem is that pyparsing can't handle the entries, or that you can't send them to pyparsing in the right format. If the latter, why not do something like this?
def logreader(path_to_file):
    chunk = []
    with open(path_to_file) as theFile:
        for line in theFile:
            if line.strip():  # non-blank line: keep accumulating
                chunk.append(line)
            else:  # blank line: the chunk is complete, emit it
                yield "".join(chunk)
                chunk = []
        if chunk:  # don't lose a final chunk with no trailing blank line
            yield "".join(chunk)
Of course, if you need to modify each chunk before sending it to pyparsing, you can do so before yielding it.
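For example, a hypothetical driver (assuming the bnf grammar built as in the answer above and a snort.log file to read):

for entry in logreader('snort.log'):
    fields = bnf.searchString(entry)  # parse each blank-line-separated chunk
    print(fields)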