Python 2.7 and MySQL: grooming tuple

I am self-taught in MySQL, Python and Linux-based operating systems, and I am quite sure there must be a more elegant solution to this problem, or at least one that works along my lines.
The code takes the last 24 hours of data from a database and writes it to a .txt file for further processing. However, the output I am getting contains extra symbols that make further analysis troublesome, and I want to know if there is a way to remove them.
My (relevant) code is:
...
cur = db.cursor()
cur.execute("SELECT * FROM Sens WHERE sdate > DATE_SUB(NOW(),INTERVAL 24 HOUR)")
query = cur.fetchall()
OutputFile = open("/root/Desktop/data.txt", "w")
for i in range (0, len(query)):
    print>>OutputFile, query[i]
...
I am using a for loop so that each fetched row is printed on its own line.
The result I get is as follows:
('0,01/24/16,12:41:49,45.185\r\r\n',)
The result I need is:
0,01/24/16,12:41:49,45.185
Much appreciate the help,
LZ.

You could use:
for i in range (0, len(query)):
    print>>OutputFile, query[i][0].strip()
The [0] index selects the string from the tuple, and strip() removes leading and trailing whitespace from it, including the trailing \r\r\n.
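A minimal sketch of that on rows shaped like the question's output, where each row is a 1-tuple holding one string. The two row values are made up for illustration:

```python
# Hypothetical rows in the shape shown in the question: each row is a 1-tuple
rows = [('0,01/24/16,12:41:49,45.185\r\r\n',),
        ('1,01/24/16,12:42:49,45.190\r\r\n',)]

# row[0] pulls the string out of the tuple; strip() drops the trailing \r\r\n
cleaned = [row[0].strip() for row in rows]

for line in cleaned:
    print(line)
```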

For a start, you shouldn't ever iterate over range(len(something)); always iterate over the thing itself. Also, it's more idiomatic to output with file.write rather than printing.
From there, all you need to do is output the first element of each row, with [0]. Because write() does not add a newline, strip the stored \r\r\n and add a single "\n" yourself:
for q in query:
    OutputFile.write(q[0].strip() + "\n")
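Putting both answers together, here is a sketch that also lets a with block close the file. No MySQL server is assumed: an in-memory SQLite table stands in for Sens, and its single column `line`, the sample values and the helper name export_first_column are all hypothetical. The helper accepts any iterable of rows, so with MySQLdb you could pass it the result of cur.fetchall().

```python
import os
import sqlite3
import tempfile

def export_first_column(rows, path):
    """Write the first column of each row to `path`, one cleaned value per line."""
    with open(path, "w") as out:      # the with block closes the file for us
        for row in rows:              # iterate over the rows directly, no range(len(...))
            out.write(row[0].strip() + "\n")

# Stand-in for the MySQL table: an in-memory SQLite table with one text column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sens (line TEXT)")
conn.executemany("INSERT INTO Sens VALUES (?)",
                 [("0,01/24/16,12:41:49,45.185\r\r\n",),
                  ("1,01/24/16,12:42:49,45.190\r\r\n",)])

path = os.path.join(tempfile.mkdtemp(), "data.txt")
export_first_column(conn.execute("SELECT line FROM Sens ORDER BY rowid").fetchall(), path)
```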

Related

Delete a Portion of a CSV Cell in Python

I have recently stumbled upon a task utilizing some CSV files that are, to say the least, very poorly organized, with one cell containing what should be multiple separate columns. I would like to use this data in a Python script, but I want to know if it is possible to delete a portion of the row (everything after a certain point) and then write that to a dictionary.
Although I can't show the exact contents of the CSV, it looks like this:
useful. useless useless useless useless
I understand that this will most likely require either a regular expression or an endswith statement, but doing all of that to a CSV file is beyond me. Also, the period written after useful on the CSV should be removed as well, and is not a typo.
If you know the character you want to split on, you can use this simple method:
good_data = bad_data.split(".")[0]
good_data = good_data.strip() # remove excess whitespace at start and end
This method will always work: split returns a list that always has at least one entry (the full string, if the separator is not found). Using str.index instead raises a ValueError when the separator is missing.
You can also limit the number of splits if necessary with the maxsplit argument: split(".", N).
https://docs.python.org/2/library/stdtypes.html#str.split
>>> "good.bad.ugly".split(".", 1)
['good', 'bad.ugly']
>>> "nothing bad".split(".")
['nothing bad']
>>> stuff = "useful useless"
>>> stuff = stuff[:stuff.index(".")]
ValueError: substring not found
Actual Answer
OK, then notice that you can index strings just like you index lists. E.g. "this is a very long string but we only want the first 4 letters"[:4] gives "this". If we knew the index of the dot, we could get what you want the same way, and strings have the index method for exactly that. So in total you do:
stuff = "useful. useless useless useless useless"
stuff = stuff[:stuff.index(".")]
Now stuff is very useful :).
If you are dealing with a file containing multiple lines like that, you can do this for each line, then split the line at "," and put everything in a dictionary.
data = {}
with open("./test.txt") as f:
    for i, line in enumerate(f):
        line = line.rstrip("\n")
        # str.index raises ValueError if there is no ".", so only cut when one exists
        if "." in line:
            line = line[:line.index(".")]
        for j, col in enumerate(line.split(",")):
            data[(i, j)] = col
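A variant of the same loop, sketched for Python 3, that swaps index for the split approach from the other answer (so lines without a "." do not raise) and lets the csv module do the comma splitting, so quoted fields containing commas stay intact. io.StringIO and the sample lines are stand-ins for the real file, and load_cells is a hypothetical name:

```python
import csv
import io

def load_cells(f):
    """Return {(row, col): value}, dropping everything from the first "." on each line."""
    # Truncate each line before CSV parsing; split(".", 1)[0] keeps the whole line if there is no "."
    truncated = (line.split(".", 1)[0].rstrip("\r\n") for line in f)
    data = {}
    for i, row in enumerate(csv.reader(truncated)):
        for j, col in enumerate(row):
            data[(i, j)] = col
    return data

sample = io.StringIO('useful. useless useless\n"a,b",c. junk\nno dot,here\n')
data = load_cells(sample)
print(data)
```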
How one would do this
Note that most people would not want to do this by hand. Working on tabular data is a common task, and there is a library called pandas for it. Maybe it would be a good idea to familiarise yourself a bit more with Python before you dive into pandas, though. I think a good point to start is this. Using pandas, your task would look like this:
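import pandas as pd
# comment="." ignores everything from the first "." on each line;
# header=None treats the first line as data rather than as column names
df = pd.read_csv("./test.txt", comment=".", header=None)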
giving you what is called a dataframe.

How to implement a binary search on a list created from a file

This is my first post, please be gentle. I'm attempting to sort some files into ascending and descending order. Once I have sorted a file, I store it in a list which is assigned to a variable. The user then chooses a file and searches for an item. I get an error message...
TypeError: unorderable types: int() < list()
...whenever I try to search for an item using the variable holding my sorted list; the error occurs on line 27 of my code. From research, I know that an int and a list cannot be compared, but I can't for the life of me think how else to search a large (600-item) list for an item.
At the moment I'm just playing around with binary search to get used to it.
Any suggestions would be appreciated.
year = []
with open("Year_1.txt") as file:
    for line in file:
        line = line.strip()
        year.append(line)
def selectionSort(alist):
    for fillslot in range(len(alist)-1,0,-1):
        positionOfMax=0
        for location in range(1,fillslot+1):
            if alist[location]>alist[positionOfMax]:
                positionOfMax = location
        temp = alist[fillslot]
        alist[fillslot] = alist[positionOfMax]
        alist[positionOfMax] = temp
def binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    return found
selectionSort(year)
testlist = []
testlist.append(year)
print(binarySearch(testlist, 2014))
The Year_1.txt file consists of 600 items, all years in the format 2016.
They are listed in descending order and start at 2017, down to 2013. Hope that makes sense.
Is there some reason you're not using the Python bisect module?
Something like:
import bisect
sorted_year = list()
for each in year:
    bisect.insort(sorted_year, each)
... is sufficient to create the sorted list. Then you can search it using functions such as those in the documentation.
(Actually, you could just call year.sort() to sort the list in place. bisect.insort() lets you keep the list sorted as you build it from the input stream, in lieu of your call to year.append(), although each insertion has to shift list elements, so it is not faster than a single sort at the end. Either way, my point about using the bisect module remains.)
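A minimal sketch of building and searching the list with bisect, assuming the years stay strings as read from the file. The contains() helper is a hypothetical name modelled on the bisect_left-based search recipes in the bisect documentation, and the sample years are made up:

```python
import bisect

def contains(sorted_list, item):
    """Binary-search membership test on an ascending list, built on bisect_left."""
    i = bisect.bisect_left(sorted_list, item)  # leftmost position where item would go
    return i != len(sorted_list) and sorted_list[i] == item

# Hypothetical sample: years read from a file arrive as strings
year = ["2017", "2013", "2016", "2014", "2015", "2017"]
sorted_year = []
for each in year:
    bisect.insort(sorted_year, each)  # keeps sorted_year in ascending order

print(sorted_year)
print(contains(sorted_year, "2014"))  # search for the string, since the list holds strings
print(contains(sorted_year, "2012"))
```

Note that the search value has to match the element type: contains(sorted_year, 2014) with an int would fail on Python 3, for the same reason as the question's error.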
Also note that 600 items is trivial for modern computing platforms. Even 6,000 won't take more than a few milliseconds. On my laptop, sorting 600,000 random integers takes about 180 ms, and a similarly sized list of strings still takes under 200 ms.
So at this data scale you're probably not gaining anything by sorting and binary-searching this list in this application.
On the other hand, Python also includes a number of modules in its standard library for managing structured data and data files. For example, you could use the sqlite3 module.
Using this you'd use standard SQL DDL (data definition language) to describe your data structure and schema, SQL DML (data manipulation language: INSERT, UPDATE, and DELETE statements) to manage the contents of the data and SQL queries to fetch data from it. Your data can be returned sorted on any column and any mixture of ascending and descending on any number of columns with the standard SQL ORDER BY clauses and you can add indexes to your schema to ensure that the data is stored in a manner to enable efficient querying and traversal (table scans) in any order on any key(s) you choose.
Because Python includes SQLite in its standard library, and because SQLite provides SQL semantics over simple local files without a separate server process, there's almost no downside to using it for structured data. You don't have to install and maintain additional software or servers, or handle network connections to a remote database server.
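A small sketch of that idea with the standard sqlite3 module; the table name years, the index name and the sample values are hypothetical. It uses an in-memory database; passing a file path to sqlite3.connect() instead would keep the data on disk.

```python
import sqlite3

# Hypothetical schema: one row per year value read from Year_1.txt
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE years (year INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_years_year ON years (year)")
conn.executemany("INSERT INTO years (year) VALUES (?)",
                 [(y,) for y in (2017, 2016, 2016, 2015, 2014, 2013)])

# Sorted output in either direction comes from ORDER BY
ascending = [row[0] for row in conn.execute("SELECT year FROM years ORDER BY year ASC")]
descending = [row[0] for row in conn.execute("SELECT year FROM years ORDER BY year DESC")]

# Membership test; SQLite can use the index for this equality lookup
found = conn.execute("SELECT EXISTS(SELECT 1 FROM years WHERE year = ?)",
                     (2014,)).fetchone()[0] == 1
print(ascending, descending, found)
```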
I'm going to walk through some steps before getting to the answer.
You need to post a Minimal, Complete, and Verifiable Example (MCVE). Instead of telling us to read from "Year_1.txt", which we don't have, put the list itself in the code. Do you NEED 600 entries to get the error in your code? No. This is sufficient:
year = ["2001", "2002", "2003"]
If you really need 600 entries, then provide them: either post the actual data, or generate it:
year = [str(x) for x in range(2017-600, 2017)]
The code you post needs to be "cut, paste, boom": it reproduces the error on my computer just like that.
selectionSort is completely irrelevant to the question, so delete it from the question entirely. In fact, since you say the input was already sorted, I'm not sure what selectionSort is actually supposed to do in your code, either. :)
Next, you do testlist = [] followed by testlist.append(year), which gives you a list whose only element is your entire year list. USE YOUR DEBUGGER before you ask here. Simply looking at the value in your variable would have made the problem obvious.
See also: How to append list to second list (concatenate lists)
Fixing that means you now have a list of things to search. Before, you were searching a list to see if 2014 matched the one thing in there, which was a complete list of all the years.
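To see the difference the linked question is about, compare append with extend on a small made-up list (or simply pass year itself to binarySearch, since it is already a list):

```python
year = ["2013", "2014", "2015"]

nested = []
nested.append(year)  # adds ONE element: the whole list
flat = []
flat.extend(year)    # adds each element of year individually

print(nested)  # [['2013', '2014', '2015']]
print(flat)    # ['2013', '2014', '2015']
```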
Now we get into binarySearch. If you look at the variables, you see you are comparing the integer 2014 with some string, maybe "1716". In Python 2.7 (which is what I have) that comparison is allowed but meaningless, and Python 3 refuses it with a TypeError. Either way, you can't find the integer 2014 in a list of strings, so the search can never return True.
If you don't have a debugger, then you can place strategic print statements like:
print ("debug info: binarySearch comparing ", item, alist[midpoint])
Now here, what VBB said in comments worked for me, after I fixed the other problems. If you are searching for something that isn't even in the list and expecting True, that's wrong. Searching for "2014" returns True if you provide the correct list to search. Alternatively, you could convert the value you search for to a string with str(2014), or convert all the years to int during the input phase. But the int 2014 is not the same as the string "2014".
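A sketch of the "convert to int during input" route, assuming lines read as in the question. Because the file lists years in descending order while this binary search assumes ascending order, the list is sorted first. The sample lines and the snake_case name binary_search are hypothetical:

```python
def binary_search(alist, item):
    """Return True if item is in alist; alist must be sorted in ascending order."""
    first, last = 0, len(alist) - 1
    while first <= last:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False

# Hypothetical lines as read from Year_1.txt (descending, with newlines)
lines = ["2017\n", "2016\n", "2015\n", "2014\n", "2013\n"]
years = sorted(int(line) for line in lines)  # int() ignores the surrounding whitespace

print(binary_search(years, 2014))  # True
print(binary_search(years, 2012))  # False
```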

When I write in csv how do I separate columns in Python

My code is
import pymysql
conn=pymysql.connect(host=.................)
curs=conn.cursor()
import csv
f=open('./kospilist.csv','r')
data=f.readlines()
data_kp=[]
for i in data:
    data_kp.append(i[:-1])
c = csv.writer(open("./test_b.csv","wb"))
def exportFunc():
    result=[]
    for i in range(0,len(data_kp)):
        xp="select date from " + data_kp[i] + " where price is null"
        curs.execute(xp)
        result= curs.fetchall()
        for row in result:
            c.writerow(data_kp[i])
            c.writerow(row)
        c.writerow('\n')
exportFunc()
data_kp holds the table names read from the file.
The table names are strings like a000010.
I collect the table names from here.
Then I execute the query and get the result.
The actual output of my code is:
My expectation is:
(not just 3 columns; there are 2000 tables)
I thought my code was close to the answer, but it's not working.
My work is almost done, but I couldn't finish this part.
I have googled for almost 10 hours.
I don't know how to do it; please help.
I think something is wrong with this part:
for row in result:
    c.writerow(data_kp[i])
    c.writerow(row)
The csvwriter.writerow method writes one row to your output csv file. This means that once you have called writerow, the line is written and you can't come back to it. When you write the code:
for row in result:
    c.writerow(data_kp[i])
    c.writerow(row)
You are saying:
"For each result, write a line containing data_kp[i], then write a line containing row."
This way, everything will be written vertically, alternating between data_kp[i] and row.
What is surprising is that this is not what we see in your actual output. I think that you've changed something, like this:
c.writerow(data_kp[i])
for row in result:
    c.writerow(row)
But this obviously has not entirely solved your issue: the table names are not displayed correctly (one character per column) and they are not side by side. So you have 2 problems here:
1. Get the table name in one cell instead of split up
First, let's take a look at the documentation about the csvwriter:
A row must be an iterable of strings or numbers for Writer objects
But your data_kp[i] is a string, not an "iterable of strings", so this can't work! Why don't you get an error, then? Because in Python a string is itself an iterable of strings (its characters). Try it yourself:
for char in "abcde":
    print(char)
Now you have probably understood what to do to make it work:
# Give an Iterable containing only data_kp[i]
c.writerow([data_kp[i]])
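A quick way to see both behaviours side by side, sketched for Python 3 with io.StringIO standing in for the output file (lineterminator="\n" is set only to keep the output readable; the default is "\r\n"):

```python
import csv
import io

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow("a000010")    # a str is iterated character by character: 7 cells
writer.writerow(["a000010"])  # a one-element list is written as a single cell
print(out.getvalue())
```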
You now have your table name displayed in a single cell! But we still have another problem...
2. Get the table names displayed side by side
Here, the problem is in the logic of your code. You loop over your table names, write lines containing them, and expect them to come out side by side with columns of dates underneath!
Your code needs a little rethinking, because csvwriter is made for writing rows, not columns. We'll therefore use the zip_longest function from the itertools module. You may ask why I don't use Python's built-in zip function: the columns are not guaranteed to be the same length, and zip stops as soon as it reaches the end of the shortest one!
import csv
import itertools
# each entry of this list will contain one column of your csv file
data_columns = []
def exportFunc():
    for table in data_kp:
        xp = "select date from " + table + " where price is null"
        curs.execute(xp)
        result = curs.fetchall()
        # each column starts with the table name, followed by the dates
        # (row[0] takes the date out of each one-element result row)
        data_columns.append([table] + [row[0] for row in result])
    # the * operator unpacks the list into separate arguments for zip_longest
    # (zip_longest is Python 3; on Python 2 use itertools.izip_longest and open the file with "wb")
    zipped_rows = itertools.zip_longest(*data_columns, fillvalue=" ")
    with open("./test_b.csv", "w", newline="") as out:
        csv.writer(out).writerows(zipped_rows)
exportFunc()
Note:
The code provided here has not been tested and may contain bugs. Nevertheless, you should be able (by using the documentation I provided) to fix it and make it work! Good luck :)
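To check the column-to-row idea without a database, here is a Python 3 sketch on made-up results; the table names and dates are hypothetical, and io.StringIO stands in for the output file:

```python
import csv
import io
import itertools

# Hypothetical query results: (table name, dates where price is null)
results = [
    ("a000010", ["2016-01-04", "2016-01-05"]),
    ("a000020", ["2016-01-04"]),
    ("a000030", ["2016-01-04", "2016-01-05", "2016-01-06"]),
]

# One list per CSV column: the table name first, then its dates
data_columns = [[name] + dates for name, dates in results]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# zip_longest turns columns into rows, padding the shorter columns with ""
writer.writerows(itertools.zip_longest(*data_columns, fillvalue=""))
print(out.getvalue())
```

Each table name ends up in the header row, and shorter columns simply leave empty cells at the bottom.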

Search a database for elements including a string variable

I'm pretty new to SQLite and Python and have run into a bit of confusion. I'm trying to return all elements in a column that contain a substring which is passed to a function as a variable in Python. My code is running, but it's returning an empty result instead of the correct result.
Here's the code with the names generalized:
def myFunc(cursor,myString):
    return cursor.execute("""select myID from Column where name like '%'+?'%' """,(myString,))
Like I said, the code does run without error but returns an empty result instead of the result that I know it should be. I'm assuming it has something to do with my use of the wildcard and/or question mark, but I can't be sure. Anyone have any ideas? Thanks in advance for your time/help! Also, this is my first post, so I apologize in advance if I missed any of the recommended protocols for asking questions.
Well, '%'+?'%' definitely isn't going to work: there's a + before the ?, but no operator after it. And even with that fixed, + in SQLite is numeric addition, not string concatenation; SQLite's concatenation operator is ||.
You can compute LIKE-search patterns if you do it right: '%' || ? || '%', in this case. That will cause problems with some databases (from not working at all to doing a less efficient search), but, at least according to CL.'s comment, sqlite3 will be fine.
But the easy thing to do is to just substitute a complete parameter, rather than part of one. You can put % into the parameters, and it'll be interpreted just fine. So:
return cursor.execute("""select myID from Column where name like ?""",
                      ('%'+myString+'%',))
And this also has the advantage that if you want to do a search for initial substrings ('foo%'), it'll be the same SQL statement but with a different parameter.
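A runnable sketch against an in-memory SQLite database showing both forms: the whole pattern as one parameter, and the pattern built in SQL with || (SQLite's string concatenation operator). The table name items, its rows and the helper names are hypothetical; the question's Column is a generalized name.

```python
import sqlite3

def find_ids(cursor, substring):
    # The % wildcards are part of the bound value, so the SQL text never changes
    cursor.execute("SELECT myID FROM items WHERE name LIKE ? ORDER BY myID",
                   ("%" + substring + "%",))
    return [row[0] for row in cursor.fetchall()]

def find_ids_concat(cursor, substring):
    # Same search, building the pattern inside SQL with ||
    cursor.execute("SELECT myID FROM items WHERE name LIKE '%' || ? || '%' ORDER BY myID",
                   (substring,))
    return [row[0] for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (myID INTEGER, name TEXT)")
cur.executemany("INSERT INTO items VALUES (?, ?)",
                [(1, "football"), (2, "bar"), (3, "foosball")])

print(find_ids(cur, "oo"))         # [1, 3]
print(find_ids_concat(cur, "oo"))  # [1, 3]
```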
Try this:
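def myFunc(cursor, myString):
    # formats the value straight into the SQL text, so only use this with trusted input;
    # a ? parameter (as in the other answer) avoids SQL injection
    return cursor.execute("select myID from Column where name like '%{0}%'".format(myString))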

Using Python to write a CSV file with delimiter

I'm new to programming, and also to this site, so my apologies in advance for anything silly or "newbish" I may say or ask.
I'm currently trying to write a script in python that will take a list of items and write them into a csv file, among other things. Each item in the list is really a list of two strings, if that makes sense. In essence, the format is [[Google, http://google.com], [BBC, http://bbc.co.uk]], but with different values of course.
Within the CSV, I want this to show up as the first item of each list in the first column and the second item of each list in the second column.
This is the part of my code that I need help with:
with open('integration.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel')
    writer.writerows(w for w in foundInstances)
For whatever reason, it seems that the delimiter is being ignored. When I open the file in Excel, each cell has one list. Using the old example, each cell would have "Google, http://google.com". I want Google in the first column and http://google.com in the second. So basically "Google" and "http://google.com", and then below that "BBC" and "http://bbc.co.uk". Is this possible?
Within my code, foundInstances is the list in which all the items are contained. As a whole, the script works fine, but I cannot seem to get this last step. I've done a lot of looking around on Stack Overflow and the rest of the Internet, but I haven't found anything that has helped me with this last step.
Any advice is greatly appreciated. If you need more information, I'd be happy to provide you with it.
Thanks!
In your code on pastebin, the problem is here:
foundInstances.append(['http://' + str(num) + 'endofsite' + ', ' + desc])
Here, for each row in your data, you create one string that already has a comma in it. That is not what you need for the csv module. The csv module makes comma-delimited strings out of your data, so you need to give it the data as a simple list of items [col1, col2, col3]. What you are doing is ["col1, col2, col3"], which has already packed the data into a single string. Try this:
foundInstances.append(['http://' + str(num) + 'endofsite', desc])
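A small Python 3 sketch of the difference, using io.StringIO in place of integration.csv and the example values from the question; to_csv is a hypothetical helper name:

```python
import csv
import io

def to_csv(rows):
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

packed = [["Google, http://google.com"], ["BBC, http://bbc.co.uk"]]        # one string per row
separate = [["Google", "http://google.com"], ["BBC", "http://bbc.co.uk"]]  # one item per column

print(to_csv(packed))    # each row is a single quoted cell containing the comma
print(to_csv(separate))  # two cells per row
```

Because each packed string contains the delimiter, the writer quotes it as one field, which is why Excel shows the whole "Google, http://google.com" in a single cell.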
I just tested the code you posted with
foundInstances = [[1,2],[3,4]]
and it worked fine. It definitely produces the output csv in the format
1,2
3,4
So I assume that your foundInstances has the wrong format. If you construct the variable in a complex manner, you could try to add
import pdb; pdb.set_trace()
before the actual variable usage in the csv code. This lets you inspect the variable at runtime with the python debugger. See the Python Debugger Reference for usage details.
As a side note, according to the PEP 8 style guide, the variable should be named found_instances in Python.
