execute multiple lines in jupyter notebook - python

I am a newbie with both Python and Jupyter Notebook.
After searching, I found that to spread a sqlContext query over multiple lines I have to wrap it in triple quotes (""") like this:
sqlContext.sql("""select year,month,count(distinct station) as count
from tempReadingsTable
where year>=1950 and year<=2014 and value>=10
group by year,month
order by count desc
""").show()
Now, I am trying to find the same for this:
schMax = schMax.groupBy('year').
agg(fun.max('value').alias('value')).
join(sch['year','value']).
drop_duplicates(['year']).
select(['year','station','value']).
orderBy(['value'],ascending=[0])
Unless I put it all on one line, it fails. How can I prevent that? I want to be able to write each call on its own line.

You can put \ at the end of a line to make Python read the next line as a continuation of the current one. The backslash must be the very last character on the line, and leading whitespace on the continued line is ignored.
I also think it's more readable to put the . at the start of each new line. That makes it apparent that the line belongs to the previous statement, since normal statements never begin with . in Python.
schMax = schMax.groupBy('year')\
    .agg(fun.max('value').alias('value'))\
    .join(sch['year','value'])\
    .drop_duplicates(['year'])\
    .select(['year','station','value'])\
    .orderBy(['value'],ascending=[0])
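An alternative worth knowing is to wrap the whole expression in parentheses, which lets it span lines without any backslashes. The sketch below uses plain string methods instead of Spark so that it runs anywhere; the variable names and the sample string are made up for illustration.

```python
raw = "  Spam, Eggs and Ham  "

# Everything between the parentheses is one expression, so each
# method call can sit on its own line without a trailing backslash.
words = (
    raw
    .strip()
    .lower()
    .replace(",", "")
    .split()
)

print(words)
```

The same layout should work for the groupBy/agg/join chain above: open a parenthesis after the = and close it after the last call.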

Related

Python loop error in SPSS syntax only if i run the same code twice

I'm quite new to Python programming.
I'm trying to automate some tabulations in SPSS using a loop and some Python code (and I more or less managed it), but it works only the first time I run the syntax. The second time, it tabulates only once.
I have an SPSS file with different projects merged together (i.e. different countries), so first I extract a list of projects using a function.
Once I have my list of projects, I run a loop and change the SPSS syntax for the case selection and tabulation on each pass.
This is the code:
begin program.
import spss
# Function that extracts the data from SPSS
def DatiDaSPSS(vars, num):
    if num == 0:
        num = spss.GetCaseCount()
    if vars is None:
        varNums = range(spss.GetVariableCount())
    else:
        allvars = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
        varNums = [allvars.index(i) for i in vars]
    data = spss.Cursor(varNums)
    pydata = data.fetchmany(num)
    data.close()
    return pydata
# store the result of the function in a list:
all_prj = DatiDaSPSS(vars=["Project"], num=0)
# remove duplicates and keep only the countries that I need:
prj_list = list(set([i[0] for i in all_prj]))
# loop for the tabulation:
for i in range(len(prj_list)):
    prj_now = str(prj_list[i])
    spss.Submit("""
compute filter_$=Project='%s'.
filter by filter_$.
exe.
TEXT "Country"
/OUTLINE HEADING="%s" TITLE="Country".
CTABLES
/VLABELS VARIABLES=HisInterviewer HisResult DISPLAY=DEFAULT
/TABLE HisInterviewer [C][COUNT F40.0, ROWPCT.COUNT PCT40.1] BY HisResult [C]
/CATEGORIES VARIABLES=HisInterviewer HisResult ORDER=A KEY=VALUE EMPTY=EXCLUDE TOTAL=YES
POSITION=AFTER
/CRITERIA CILEVEL=95.
""" % (prj_now, prj_now))
end program.
When I run it a second time, it shows only the last value of the list (and only one tabulation). If I restart SPSS, it works fine the first time.
Is it because of the function?
I'm using SPSS 25.
Can I answer my own question, or should I edit or delete the post? I think I found the reason: I guess the function picks up only the cases that are currently selected. I tried adding this SPSS code before the begin program block, and it seems to be working:
use all.
exe.
begin program.
...
The last pass of the loop leaves a filter on the data, and I now remove it before running the script. Please let me know if you want me to edit or remove the message.

Using the Join command to eliminate extra paragraph breaks

So I have this text:
'
Location
Address
Number
Website
'
The top and bottom lines are actually empty; the single quotes only mark where they are. I basically want each line to follow the previous one directly, without any blank lines in between. This is what I would like it to look like:
Location
Address
Number
Website
I want to strip all of the line breaks and just have each result one line after another. This is the code to scrape the information from a webpage.
results = soup.findAll('div', class_='name')
for each in results:
    worksheet.write(row,1,each.text)
    row += 1
Each time I run through this, I want the results to print one line after another. Thanks.
Is there a reason you cannot use a simple if?
results = soup.findAll('div', class_='name')
for each in results:
    if each.text:
        worksheet.write(row,1,each.text)
        row += 1
To join the results with a line break, join their text (results holds tag objects, not strings, so they cannot be joined directly):
'\n'.join(each.text for each in results)
To join with newlines and collapse any newlines already present, use:
import re
line = re.sub(r"\n+", "\n", '\n'.join(each.text for each in results))
This is useful if you don't know how many newlines separate the pieces of text, since it reduces every run of newlines to a single one.
The answer given by Malvolio also avoids writing a blank line:
if each.text:
This line checks whether an element (each in your case) has any text; if it doesn't, the statements below it are skipped.
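The join-and-collapse idea can be tried on plain strings, without BeautifulSoup or a worksheet. This is only a sketch: clean_lines is a made-up helper name, and the sample list assumes each scraped text comes wrapped in newlines as described in the question.

```python
import re


def clean_lines(texts):
    """Join the texts, then squeeze every run of newlines down to one."""
    joined = "\n".join(texts)
    collapsed = re.sub(r"\n+", "\n", joined)
    # strip() drops the newline left over at the very start and end
    return collapsed.strip()


# Hypothetical scraped values, each surrounded by stray newlines
scraped = ["\nLocation\n", "\nAddress\n", "\nNumber\n", "\nWebsite\n"]
cleaned = clean_lines(scraped)
print(cleaned)
```

Splitting cleaned on "\n" gives one value per row, which could then be written to the worksheet in the loop.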

Shell Scripting | Bash Programming | Custom right click in nautilus

I am trying to make a custom right-click command for Nautilus.
I managed to find some useful content here.
What I don't understand is what these two lines mean:
IFS_BAK=$IFS
IFS="
"
Similar lines appear at the bottom of the script too. What do they mean?
Please help.
IFS_BAK=$IFS creates a backup of the existing value of the IFS variable.
The next line then assigns IFS a new value, the one the script requires.
More info on Internal Field Separator (IFS) can be found here: https://unix.stackexchange.com/questions/16192/what-is-ifs-in-context-of-for-looping
https://unix.stackexchange.com/questions/184863/what-is-the-meaning-of-ifs-n-in-bash-scripting
https://unix.stackexchange.com/questions/26784/understanding-ifs
Okay, I got it.
It is called the 'Internal Field Separator', a special variable in the shell.
If you set IFS to | (i.e. IFS='|'), each | is treated as a delimiter between words/fields when a line of input is split.
In the first line:
IFS_BAK=$IFS
the initial IFS value is stored in the variable IFS_BAK, and the value of IFS is then set to a newline by
IFS="
"
so that each entire line is treated as a single field, even if it contains spaces.
Later, at the end of the program, IFS is restored to its original value.
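As a rough analogy only (this is Python, not shell), the difference between the default IFS and a newline-only IFS resembles the difference between splitting on any whitespace and splitting on line breaks. The sample paths below are made up, assuming the script receives newline-separated file names that may contain spaces.

```python
# Hypothetical newline-separated selection, as a file manager script might receive it
selection = "/home/me/My Documents/a file.txt\n/home/me/notes.txt\n"

# Roughly like the default IFS (space, tab, newline): paths with spaces get broken up
by_whitespace = selection.split()

# Roughly like IFS set to a newline: one field per line, spaces preserved
by_newline = selection.splitlines()

print(len(by_whitespace), by_whitespace)
print(len(by_newline), by_newline)
```

With whitespace splitting, the first path falls apart into three pieces; with line splitting, both paths survive intact, which is why such scripts change IFS before looping over file names.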

Import File to Database in PhpMyAdmin

I want to import a file into a database using phpMyAdmin. The table should have 5 columns: id, url, lat, lon and address. However, each line of the file is structured as follows:
23947501894 https://farm2.staticflickr.com/1664/23947501894_09e21ac1c4_q.jpg 53.404021 -2.996651 Belgian Merchant Seamen, Queensway (Mersey Tunnel), Liverpool, North West England, England, CH41, United Kingdom
Most of the fields are separated by a single space, except the address at the end, which itself contains many spaces and commas. Is it possible to import this data into the database as is? If so, can anyone suggest how I might do this?
I am very new to phpMyAdmin and I am using Python to do this. Thanks in advance for your help, I am very stuck!
You'll have to process the text file before importing, since the delimiter also appears unescaped in line with your data.
The good news is that your data format makes this really easy. Take the first four spaces on each line and convert them to a special character (maybe ; or ~, something that doesn't appear anywhere else in your data). You can accomplish this with your favorite stream editor or text manipulation program (sed, awk, perl, and python are all good candidates for this work).
There are many ways to do this (see also these answers for an idea how many different ways exist, though note that question is about working on an entire file and we want to work on individual lines), but probably the simplest is by running sed four times:
for i in $(seq 4) ; do sed -i -e 's/ /~/' ~/import.csv ; done
Make sure you do this with a copy of the file because this will edit the specified file in-place.
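Since the question mentions Python, here is a minimal sketch of the same replacement done there. The function names mark_delimiters and convert_file are made up for illustration; the sketch assumes every line has at least four spaces before the address and that ~ does not occur in the data (it raises an error if it does).

```python
def mark_delimiters(line, sep="~", columns=5):
    """Replace only the first columns-1 spaces with sep; later spaces stay."""
    if sep in line:
        raise ValueError(f"separator {sep!r} already occurs in the data")
    return line.replace(" ", sep, columns - 1)


def convert_file(src_path, dst_path):
    """Write a converted copy of src_path to dst_path (the source is not modified)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(mark_delimiters(line))


sample = (
    "23947501894 https://farm2.staticflickr.com/1664/23947501894_09e21ac1c4_q.jpg "
    "53.404021 -2.996651 Belgian Merchant Seamen, Queensway (Mersey Tunnel), "
    "Liverpool, North West England, England, CH41, United Kingdom"
)
fields = mark_delimiters(sample).split("~")
print(fields)
```

Unlike the sed loop, convert_file writes to a separate output file, so the original stays untouched.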
In the phpMyAdmin Import tab, use ~ (or whatever separator you chose) as the value for "Columns separated with:", leave "Lines terminated with:" at "auto", and leave all the other fields blank.
[Screenshot of the import settings; substitute whatever character you used as the delimiter.]
Log in to phpMyAdmin, then do the following:
[Refer Right Frame]
1. Click Database Tab and Create DB
2. Click Import Tab
3. Click Browse and select csv file
4. Change Format from SQL to CSV
5. Click Go

How to use python Transaction without database ?

I have two lines in my code: the first is os.unlink and the second is os.symlink, like this:
os.unlink(path)
os.symlink(new_path, path)
The order must not change. The problem is that sometimes the first line unlinks the file (in other words, removes its shortcut) but the second line fails to create the symbolic link (due to some addressing issue).
My question is: is there an all-or-nothing transaction tool, like the one we have in databases, that performs either both lines or neither?
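You could try this:
import os
linkname = '/tmp/test.lnk'
new_target = '/tmp/new_target'  # hypothetical; not defined in the original snippet
# remember where the link currently points so it can be restored
orig_target = os.path.realpath(linkname)
os.unlink(linkname)
try:
    os.symlink(new_target, linkname)
except OSError:
    # creating the new link failed: put the old link back
    os.symlink(orig_target, linkname)
os.symlink reports its failures as OSError; maybe check which specific errors can occur and only catch the ones that are relevant.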
Strictly speaking, this is not possible unless you use a transactional filesystem like TxF (https://en.wikipedia.org/wiki/Transactional_NTFS), because nothing prevents your machine from powering off between the two commands.
I can see two ways here:
1) Switch to a database
2) Check all conditions before unlinking. What prevents you from symlinking?
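A different technique from the ones above, offered only as a sketch: create the new link under a temporary name first, then rename it over the old link. If creating the link fails, the old one has not been touched yet. This assumes a POSIX system, where renaming onto an existing path replaces it in a single step; replace_symlink and the temporary naming scheme are made up for illustration.

```python
import os
import tempfile


def replace_symlink(new_target, linkname):
    """Point linkname at new_target without ever leaving linkname missing."""
    link_dir = os.path.dirname(os.path.abspath(linkname))
    tmp_name = os.path.join(
        link_dir, ".%s.tmp%d" % (os.path.basename(linkname), os.getpid())
    )
    # If this fails, the existing link is still in place.
    os.symlink(new_target, tmp_name)
    try:
        # Rename the temporary link over the old one.
        os.replace(tmp_name, linkname)
    except OSError:
        os.unlink(tmp_name)  # clean up the temporary link, keep the old one
        raise


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    old_file = os.path.join(workdir, "old.txt")
    new_file = os.path.join(workdir, "new.txt")
    for path in (old_file, new_file):
        open(path, "w").close()
    link = os.path.join(workdir, "current")
    os.symlink(old_file, link)
    replace_symlink(new_file, link)
    now_points_to = os.readlink(link)

print(os.path.basename(now_points_to))
```

This does not make the operation a general transaction, but it removes the window in which the link is gone and the replacement has not been created yet.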
