Python: IDE support to output database query like a pandas DataFrame

(This may be a stupid question due to my ignorance.)
Is it possible in Visual Studio Code or PyCharm (perhaps with a plugin) to automatically output a database query, say from an SQLite source, nicely formatted like a pandas DataFrame? (So when I run the code, the result is displayed in a nicely formatted table.)

You can use .format(); there are a few different ways you could do this. I'd normally do something like this:
print('{:>{}}'.format(i, len(longestResult)))
If you iterate through all your results once to find the longest one, then iterate through them again padding each value to that length as above, it'll give you a nicely padded table.
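For example, a minimal self-contained sketch of that two-pass approach (the sample values here are made up for illustration):

results = ['id', 'file_name', 'created_at', 'size']  # hypothetical query results
longestResult = max(results, key=len)   # first pass: find the longest value
for i in results:                       # second pass: right-align to that width
    print('{:>{}}'.format(i, len(longestResult)))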

Related

Is there a way to iterate over table rows using Ibis (Impala)

I have a fairly large Ibis TableExpr whose rows I would like to iterate over to produce a specialized file output (FASTA nucleotide sequences). Is there any way to do this with Ibis, or should I just call execute to create a pandas DataFrame on which I can call iterrows?
I cannot find anything in the API or tutorials.
You should iterate over the pandas DataFrame as you say.
Alternatively, you should be able to get the Impyla cursor that the backend generates by calling lower-level functions than .execute(). But those functions are likely to change when we release Ibis 2.0, so your code is likely to break.
Happy to receive feedback if that's something you'd be interested in. You can open an issue in the project GitHub.
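A minimal sketch of the first suggestion, writing FASTA records from the executed DataFrame (the column names seq_id and sequence are hypothetical, since the question doesn't show the schema):

df = table_expr.execute()  # materialize the Ibis expression as a pandas DataFrame
with open('sequences.fasta', 'w') as fh:
    for _, row in df.iterrows():
        fh.write('>{}\n{}\n'.format(row['seq_id'], row['sequence']))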

Filter by file extension in MySQL

I have some information stored in MySQL and I want to fetch certain data. I have a column that contains "File_names", and these names correspond to all kinds of files: images, scripts, txts, etc. For example, there are a lot of script names in different programming languages: cpp, PHP, sh, py, ...
I only need the script-related data. In my head it's clear: fetch data only if the file name corresponds to a script (we know this from its extension). But I don't know how to translate this idea into a MySQL query. I'm also thinking about fetching all the info from MySQL and then filtering it with Python, but I still have no clear idea how.
I have a Python solution in mind, but I think it's too complex: create a list with a lot of script extensions, fetch all the info from MySQL, split every file_name to obtain its extension, and finally compare it against the extensions list. I think there's an easier/more efficient way to do this.
Any idea?
Thanks and best regards
A simple query:
SELECT File_names
FROM Your_table
WHERE File_names LIKE '%.php'
   OR File_names LIKE '%.cpp'
   OR File_names LIKE '%.sh'
Add more OR options according to your needs.
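If you'd rather keep the extension list in Python, here is a hedged sketch using parameterized queries with mysql-connector-python (the connection settings are placeholders; the table and column names follow the answer above):

import mysql.connector  # pip install mysql-connector-python

extensions = ['.php', '.cpp', '.sh', '.py']
conn = mysql.connector.connect(host='localhost', user='user',
                               password='secret', database='mydb')
cur = conn.cursor()
# Build one LIKE clause per extension; the values are bound as parameters
where = ' OR '.join(['File_names LIKE %s'] * len(extensions))
cur.execute('SELECT File_names FROM Your_table WHERE ' + where,
            tuple('%' + ext for ext in extensions))
for (file_name,) in cur:
    print(file_name)
conn.close()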

Working with Arabic in Python on MacOS

I'm working on a project that has some data in Arabic. One task requires me to create a database mapping for some dicts. I don't read Arabic, but with the help of Google Translate and original English versions of the data, I'm able to surmise which Arabic strings map to the database columns.
The problem I'm facing is that Python / MacOS / Something seems to be converting ligatures (?) in the Arabic when I use copy/paste on them, which leads to my code not recognizing some of the dicts.
I believe I have a way around the problem, but given the nature of the work I'm doing, I would like to understand what is happening.
The original Arabic key looks like this:
However, when I copy/paste it on MacOS, it converts to the following:
Google Translate, MacOS, Safari, etc. all seem to think these are equivalent text, but Python disagrees and throws a KeyError when it encounters the original (due to the system having converted it to the second version). Even if I paste it here, it converts: الفئة
Is there a way to work with this text at the system level that does not end up with it being converted to something that Python doesn't recognize?
In case anybody finds this and runs into a similar problem...
What I needed to do was parse 350k structured Arabic records (though not all with the same schema), extract the key values, map them to English database column names, and then insert the original records into a table. Thinking laziness would work, I created a set of the unique keys, printed it to screen, copy/pasted it into a text editor, converted it to a dict, and used the Arabic words as dict keys and the English column names as the values. Except I did not notice that when I pasted the set of Arabic field names, the system "fixed" the Arabic misspellings, resulting in key names that were no longer recognized when parsing the records.
To fix the problem, instead of printing the Arabic column names (there were 32 of them) to the screen, I created a SQLite database and inserted them into a table that also included a blank "standardized" column. I then went into SQLite and updated the records to map the English to the Arabic. I then read the table back into Python and created a lookup dict that I used when parsing the full data payload. Inserting the Arabic into SQLite did not "correct" the misspellings for me, and hence, the records extracted from there served as an accurate lookup.
The lookup table ended up looking like this:
In spite of trying, I never figured out how to get MacOS to stop correcting the misspelled Arabic.
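For anyone hitting something similar: the symptoms are consistent with the two strings differing at the code-point level while rendering identically (whether it is normalization or an autocorrect substitution is an assumption; the question doesn't show the exact characters). A minimal sketch for diagnosing it with the standard unicodedata module:

import unicodedata

key_from_data = 'الفئة'   # hypothetical stand-in for the original key
key_from_paste = 'الفئة'  # hypothetical stand-in for the pasted version

# Inspect the actual code points to see exactly what changed
print([hex(ord(ch)) for ch in key_from_data])
print([hex(ord(ch)) for ch in key_from_paste])

# If the difference is only the normalization form, normalizing both
# sides before using them as dict keys makes them compare equal
nfc = lambda s: unicodedata.normalize('NFC', s)
print(nfc(key_from_data) == nfc(key_from_paste))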

SPSS Python writing variable list to excel

Is it possible, from SPSS (using Python), to write the variable list and variable labels into a newly created Excel file?
Yes, look up DISPLAY DICTIONARY and/or CODEBOOK. It would then be a case of exporting these outputs (from SPSS's output viewer) to Excel (the OUTPUT EXPORT command).
If you need something more customized, then you can either capture the output via OMS and manipulate it as you please (and then export to Excel), or you can use the Python APIs directly to retrieve variable and value labels and then write the results to Excel (using any Python/Excel library of your choice, such as xlwt or xlsxwriter, to name a couple).
The latter requires much more programming knowledge whereas the former can all be done with native SPSS syntax.
I have done something similar (producing a customized data dictionary) taking the Python programming approach and found this module written by an unknown author very useful as a basis.
(This assumes you meant an automated way of achieving this; otherwise you could just copy and paste the column of variable names and labels into Excel! Value labels can't be handled that way, though, for obvious reasons.)
You can also consult the discussion on
http://www.spssforum.com/viewtopic.php?f=12&t=12076
that also includes some Python code (only for the variable labels, but value labels are an easy extension, using the GetVariableLabel function). It depends a bit on how you want to have them, though (on separate lines, or following the variable).
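As a rough sketch of that Python-API route (this assumes it runs inside SPSS Statistics, where the spss module is available, and that xlsxwriter is installed; the file name is arbitrary):

import spss
import xlsxwriter

wb = xlsxwriter.Workbook('variables.xlsx')
ws = wb.add_worksheet()
ws.write_row(0, 0, ['Variable', 'Label'])
# Walk the active dataset's dictionary and write name/label pairs
for i in range(spss.GetVariableCount()):
    ws.write_row(i + 1, 0, [spss.GetVariableName(i), spss.GetVariableLabel(i)])
wb.close()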
You may also do it like so, followed by code using e.g. openpyxl:
from savReaderWriter import SavHeaderReader

with SavHeaderReader(filename) as header:
    report = str(header)     # plain-text report of the header
    metadata = header.all()  # all header metadata (names, labels, formats, ...)
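A possible continuation with openpyxl (a hedged sketch: it assumes the metadata object from header.all() exposes varNames and varLabels, and that the values are str rather than bytes, which can depend on the savReaderWriter version and its ioUtf8 option):

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(['Variable', 'Label'])
for name in metadata.varNames:
    # varLabels maps variable names to their labels; fall back to ''
    ws.append([name, metadata.varLabels.get(name, '')])
wb.save('dictionary.xlsx')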

How can I adapt my code to make it compatible with Microsoft Excel?

Problem
I was trying to implement a web API (based on Flask) that queries a database given some specific conditions, reconstructs the data, and finally exports the result to a .csv file.
Since the amount of data is really huge, I cannot construct the whole dataset and generate the .csv file all at once (e.g. create a DataFrame using pandas and finally call df.to_csv()), because that would cause a slow query and the HTTP connection might time out.
So I created a generator which queries the database 500 records at a time and yields the results one by one, like:
def __generator(q):
    # [...] some setup code here
    offset, limit = 0, 500
    while True:
        records = q[offset:offset + limit]  # q is a SQLAlchemy query object
        if not records:  # stop once the slice comes back empty
            break
        offset += limit
        # [...] omit some reconstruction code
        for record in records:
            yield record
and finally construct a Response object, and send the .csv to the client side:
return Response(__generator(q), mimetype='text/csv')  # Flask
The generator works well and all data is encoded as 'utf-8', but when I try to open the .csv file with Microsoft Excel, the text appears garbled.
Measures Already Tried
adding a BOM header to the export file: doesn't work;
using some other encoding like 'gb18030' or 'cp936': most of the garbled text disappears, some still remains, and some parts of the table structure become weird.
My Question Is
How can I make my code compatible with Microsoft Excel? That means at least two conditions should be satisfied:
no garbled text, everything well displayed;
a well-structured table.
I would really appreciate your answers!
How are you importing the csv file into Excel? Have you tried importing the csv as a text file?
By reading each column in text format, Excel won't modify columns that it would otherwise parse as different types, such as dates. Your code may be correct, and Excel may just be modifying the data when it parses it as a csv - by importing in text format, it won't modify anything.
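If the file must open correctly with a plain double-click rather than an import, one commonly suggested approach is to yield a UTF-8 BOM as the very first chunk of the streamed response so Excel detects the encoding (a hedged sketch: the question says a BOM alone didn't help, and behavior varies across Excel versions; __generator and q are from the question):

import codecs

def bom_wrapped(gen):
    yield codecs.BOM_UTF8  # emit the BOM first so Excel recognizes UTF-8
    for chunk in gen:
        yield chunk        # assumes each chunk is a utf-8 encoded csv line

# inside the view function, reusing the question's generator:
# return Response(bom_wrapped(__generator(q)), mimetype='text/csv')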
I would recommend you look into xlutils. It's been around for quite some time, and our company has used it both for reading configuration files to run automated tests and for generating reports of test results.
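For instance, a small sketch of the typical xlutils round-trip on a legacy .xls file (file names are placeholders; xlutils works together with xlrd and xlwt):

from xlrd import open_workbook
from xlutils.copy import copy

rb = open_workbook('report.xls', formatting_info=True)  # read the original
wb = copy(rb)              # writable xlwt copy of the workbook
ws = wb.get_sheet(0)
ws.write(0, 0, 'updated')  # overwrite cell A1
wb.save('report_out.xls')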
