I am very new to Python and am struggling to figure out how to write my data to a file.
The following section of my script works great; however, I would like to be able to output the printed data to a text file.
x = np.linspace(141, 144.5, 500)
print(x)
y = np.linspace(-38.53, -38.53, 500)
z = np.linspace(0, 0, 500)
gz = tesseroid.gz(x, y, z, model)
print(gz)
As an addendum: when it prints in my terminal, the data appear as follows...
(x)
1 2 3 4
5 ... 500
(gz)
1 2 3 4
5 ... 500
...however I would love it to output the data in a single column like this:
1
2
3
4
5
...
8
It would be even better if they could both be in the same file...
1 1
2 2
3 3
4 4
5 5
...
500 500
...however, this is not necessary, as I could easily manipulate them manually from there.
Many thanks in advance, and apologies if it is a very simple question; it is just something that I haven't been able to figure out with my limited experience.
Edit: Please note that it does not necessarily have to use the numpy savetxt function. If there is an easier way to perform this task, then I am perfectly willing to use it instead.
As your title says, you can use np.savetxt() to save in one column:
np.savetxt('x.txt', x.ravel())
np.savetxt('gz.txt', gz.ravel())
or to save in two columns:
np.savetxt('x_gz.txt', np.vstack((x.ravel(), gz.ravel())).T)
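If you want both arrays side by side in one file, `np.column_stack` is an equivalent way to build the two-column array; a minimal sketch with small stand-in arrays:

```python
import numpy as np

# Small stand-in arrays for x and gz (the real ones have 500 points)
x = np.linspace(141, 144.5, 5)
gz = np.linspace(0.0, 1.0, 5)

# column_stack pairs the arrays up as two columns; fmt controls the precision
np.savetxt('x_gz.txt', np.column_stack((x, gz)), fmt='%.6f')

# Read it back to confirm the layout: one (x, gz) pair per row
data = np.loadtxt('x_gz.txt')
print(data.shape)  # (5, 2)
```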
Related
I am loading a txt file containing complex numbers. The data are formatted in this way
How can I create two separate arrays, one for the real part and one for the imaginary part?
I tried to create a pandas DataFrame using e-01 as a separator, but that way I lose the exponent info
df = pd.read_fwf(r'c:\test\complex.txt', header=None)
df[['real','im']] = df[0].str.extract(r'\(([-.\de]+)([+-]\d\.[\de\-j]+)')
print(df)
0 real im
0 (9.486832980505137680e-01-3.162277660168379412... 9.486832980505137680e-01 -3.162277660168379412e-01j
1 (9.486832980505137680e-01+9.486832980505137680... 9.486832980505137680e-01 +9.486832980505137680e-01j
2 (-9.486832980505137680e-01+9.48683298050513768... -9.486832980505137680e-01 +9.486832980505137680e-01j
3 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
4 (-3.162277660168379412e-01+9.48683298050513768... -3.162277660168379412e-01 +9.486832980505137680e-01j
5 (9.486832980505137680e-01-3.162277660168379412... 9.486832980505137680e-01 -3.162277660168379412e-01j
6 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
7 (9.486832980505137680e-01-9.486832980505137680... 9.486832980505137680e-01 -9.486832980505137680e-01j
8 (9.486832980505137680e-01-9.486832980505137680... 9.486832980505137680e-01 -9.486832980505137680e-01j
9 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
10 (3.162277660168379412e-01-9.486832980505137680... 3.162277660168379412e-01 -9.486832980505137680e-01j
I never knew how annoyingly involved it is to read complex numbers with Pandas. This is a slightly different solution from @Алексей's; I prefer to avoid regular expressions when not absolutely necessary.
# Read the file, pandas defaults to string type for contents
df = pd.read_csv('complex.txt', header=None, names=['string'])
# Convert string representation to complex.
# Use of `eval` is ugly but works.
df['complex'] = df['string'].map(eval)
# Alternatively...
#df['complex'] = df['string'].map(lambda c: complex(c.strip('()')))
# Separate real and imaginary parts
df['real'] = df['complex'].map(lambda c: c.real)
df['imag'] = df['complex'].map(lambda c: c.imag)
df
is...
string complex \
0 (9.486832980505137680e-01-3.162277660168379412... 0.948683-0.316228j
1 (9.486832980505137680e-01+9.486832980505137680... 0.948683+0.948683j
2 (-9.486832980505137680e-01+9.48683298050513768... -0.948683+0.000000j
3 (-3.162277660168379412e-01+3.16227766016837941... -0.316228+0.316228j
4 (-3.162277660168379412e-01+9.48683298050513768... -0.316228+0.948683j
5 (9.486832980505137680e-01-3.162277660168379412... 0.948683-0.316228j
6 (3.162277660168379412e-01+3.162277660168379412... 0.316228+0.316228j
7 (9.486832980505137680e-01-9.486832980505137680... 0.948683-0.948683j
real imag
0 0.948683 -3.162278e-01
1 0.948683 9.486833e-01
2 -0.948683 9.486833e-01
3 -0.316228 3.162278e-01
4 -0.316228 9.486833e-01
5 0.948683 -3.162278e-01
6 0.316228 3.162278e-01
7 0.948683 -9.486833e-01
df.dtypes
prints out..
string object
complex complex128
real float64
imag float64
dtype: object
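If the pandas round-trip isn't required, NumPy can usually parse this format directly: `np.loadtxt` with `dtype=complex` hands each token to Python's `complex()`, which accepts the surrounding parentheses. A sketch using a small hypothetical file in the same format:

```python
import numpy as np

# Write two lines in the same "(real±imagj)" format as the question
with open('complex_demo.txt', 'w') as f:
    f.write('(9.486832980505137680e-01-3.162277660168379412e-01j)\n')
    f.write('(-3.162277660168379412e-01+9.486832980505137680e-01j)\n')

# dtype=complex makes loadtxt convert each token as a complex number
arr = np.loadtxt('complex_demo.txt', dtype=complex)

# .real and .imag then give the two float arrays directly
real, imag = arr.real, arr.imag
print(real, imag)
```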
I am trying to read the messages from my dataset, but the values under the class label can't be read properly.
messages = pandas.read_csv('bitcoin_reddit.csv', delimiter='\t',
                           names=["title", "class"])
print (messages)
Under the class label, pandas can only read NaN.
Here is a sample of my CSV file:
title,url,timestamp,class
"It's official! 1 Bitcoin = $10,000 USD",https://v.redd.it/e7io27rdgt001,29/11/2017 17:25,0
The last 3 months in 47 seconds.,https://v.redd.it/typ8fdslz3e01,4/2/2018 18:42,0
It's over 9000!!!,https://i.imgur.com/jyoZGyW.gifv,26/11/2017 20:55,1
Everyone who's trading BTC right now,http://cdn.mutually.com/wp-content/uploads/2017/06/08-19.jpg,7/1/2018 12:38,1
I hope James is doing well,https://i.redd.it/h4ngqma643101.jpg,1/12/2017 1:50,1
Weeeeeeee!,https://i.redd.it/iwl7vz69cea01.gif,17/1/2018 1:13,0
Bitcoin.. The King,https://i.redd.it/4tl0oustqed01.jpg,1/2/2018 5:46,1
Nothing can increase by that much and still be a good investment.,https://i.imgur.com/oWePY7q.jpg,14/12/2017 0:02,1
"This is why I want bitcoin to hit $10,000",https://i.redd.it/fhzsxgcv9nyz.jpg,18/11/2017 18:25,1
Bitcoin Doesn't Give a Fuck.,https://v.redd.it/ty2y74gawug01,18/2/2018 15:19,-1
Working Hard or Hardly Working?,https://i.redd.it/c2o6204tvc301.jpg,12/12/2017 12:49,1
The separator in your csv file is a comma, not a tab. And since , is the default, there is no need to define it.
However, names= defines custom names for the columns. Your header already provides these names, so passing the column names you are interested in to usecols is all you need:
>>> pd.read_csv(file, usecols=['title', 'class'])
title class
0 It's official! 1 Bitcoin = $10,000 USD 0
1 The last 3 months in 47 seconds. 0
2 It's over 9000!!! 1
3 Everyone who's trading BTC right now 1
4 I hope James is doing well 1
5 Weeeeeeee! 0
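This behaviour can be checked in a self-contained way, with an inline two-row stand-in for the file:

```python
import io
import pandas as pd

# Two rows in the same format as the CSV above
csv_text = (
    'title,url,timestamp,class\n'
    '"It\'s official! 1 Bitcoin = $10,000 USD",'
    'https://v.redd.it/e7io27rdgt001,29/11/2017 17:25,0\n'
    'It\'s over 9000!!!,https://i.imgur.com/jyoZGyW.gifv,26/11/2017 20:55,1\n'
)

# The default comma separator matches the file; usecols picks the two columns
messages = pd.read_csv(io.StringIO(csv_text), usecols=['title', 'class'])
print(messages['class'].tolist())  # [0, 1]
```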
I am converting some of my web-scraping code from R to Python (I can't get geckodriver to work with R, but it's working with Python). Anyways, I am trying to understand how to parse and read HTML tables with Python. Quick background, here is my code for R:
doc <- htmlParse(remDr$getPageSource()[[1]],ignoreBlanks=TRUE, replaceEntities = FALSE, trim=TRUE, encoding="UTF-8")
WebElem <- readHTMLTable(doc, stringsAsFactors = FALSE)[[7]]
I would parse the HTML page to the doc object. Then I would start with doc[[1]], and move through higher numbers until I saw the data I wanted. In this case I got to doc[[7]] and saw the data I wanted. I then would read that HTML table and assign it to the WebElem object. Eventually I would turn this into a dataframe and play with it.
So what I am doing in Python is this:
html = None
doc = None
html = driver.page_source
doc = BeautifulSoup(html)
Then I started to play with doc.get_text but I don't really know how to get just the data I want to see. The data I want to see is like a 10x10 matrix. When I used R, I would just use doc[[7]] and that matrix would almost be in a perfect structure for me to convert it to a dataframe. However, I just can't seem to do that with Python. Any advice would be much appreciated.
UPDATE:
I have been able to get the data I want using Python--I followed this blog for creating a dataframe with python: Python Web-Scraping. Here is the website that we are scraping in that blog: Most Popular Dog Breeds. In that blog post, you have to work your way through the elements, create a dict, loop through each row of the table and store the data in each column, and then you are able to create a dataframe.
With R, the only code I had to write was:
doc <- htmlParse(remDr$getPageSource()[[1]],ignoreBlanks=TRUE, replaceEntities = FALSE, trim=TRUE, encoding="UTF-8")
df <- as.data.frame(readHTMLTable(doc, stringsAsFactors = FALSE))
With just that, I have a pretty nice dataframe that I only need to adjust the column names and data types--it looks like this with just that code:
NULL.V1 NULL.V2 NULL.V3 NULL.V4
1 BREED 2015 2014 2013
2 Retrievers (Labrador) 1 1 1
3 German Shepherd Dogs 2 2 2
4 Retrievers (Golden) 3 3 3
5 Bulldogs 4 4 5
6 Beagles 5 5 4
7 French Bulldogs 6 9 11
8 Yorkshire Terriers 7 6 6
9 Poodles 8 7 8
10 Rottweilers 9 10 9
Is there not something available in Python to make this a bit simpler, or is this just simpler in R because R is more built for dataframes (at least that's how it seems to me, but I could be wrong)?
Ok, after some hefty digging around, I feel like I came to a good solution, matching that of R. If you are looking at the HTML provided in the link above, Dog Breeds, and you have the web driver running for that link, you can run the following code:
tbl = driver.find_element_by_xpath("//html/body/main/article/section[2]/div/article/table").get_attribute('outerHTML')
df = pd.read_html(tbl)
Then you are looking at a pretty nice dataframe after only a couple lines of code:
In [145]: df
Out[145]:
[ 0 1 2 3
0 BREED 2015 2014 2013.0
1 Retrievers (Labrador) 1 1 1.0
2 German Shepherd Dogs 2 2 2.0
3 Retrievers (Golden) 3 3 3.0
4 Bulldogs 4 4 5.0
5 Beagles 5 5 4.0
I feel like this is much easier than working through the tags, creating a dict, and looping through each row of data as the blog suggests. It might not be the most correct way of doing things (I'm new to Python), but it gets the job done quickly. I hope this helps out some fellow web-scrapers.
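One detail worth noting: `pd.read_html` returns a *list* of DataFrames, one per table it finds, so in the snippet above `df` is a list and the frame itself is `df[0]`. A minimal sketch with an inline table (this assumes a parser backend such as lxml is installed):

```python
import io
import pandas as pd

# A tiny stand-in for the outerHTML string pulled from the driver
html = """
<table>
  <tr><th>BREED</th><th>2015</th></tr>
  <tr><td>Retrievers (Labrador)</td><td>1</td></tr>
  <tr><td>German Shepherd Dogs</td><td>2</td></tr>
</table>
"""

# read_html returns a list of DataFrames; index into it for the table
df = pd.read_html(io.StringIO(html))[0]
print(df.shape)  # (2, 2)
```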
tbl = driver.find_element_by_xpath("//html/body/main/article/section[2]/div/article/table").get_attribute('outerHTML')
df = pd.read_html(tbl)
It worked pretty well.
First, read Selenium with Python; you will get a basic idea of how Selenium works with Python.
Then, if you want to locate an element in Python, there are two ways:
Use the Selenium API; you can refer to Locating Elements
Use BeautifulSoup; there is nice documentation you can read: BeautifulSoup Documentation
I'm trying to make a simple spreadsheet:
I input some details, and it gets saved into the spreadsheet.
So, I input the details into 2 lists and 1 normal variable:
Dates = ['01/01/14', '01/02/14', '01/03/14']
Amount = ['1', '2', '3']
Cost = 12 (because it is always the same)
I'm trying to insert these into a spreadsheet like this:
for i in range(len(Dates)):
    insertThese.extend([Dates[i], Amount[i], Cost])
ws.append(insertThese)
but this adds the 3 things side-by-side like:
A B C D E F G H I
01/01/14 1 12 01/02/14 2 12 01/03/14 3 12
but I want it to be like this, basically adding a new row after each insertThese.extend...
A B C
01/01/14 1 12
01/02/14 2 12
01/03/14 3 12
I don't understand how to do this without removing my headers at the top of the file.
I tried using iter_rows() but that removes the header.
So how do I get the details to be added row-by-row?
I'm new to openpyxl, so if anything's obvious - sorry!
You can use zip / itertools.izip to loop over your lists in parallel. Something like the following:
for d, a in zip(Dates, Amount):
    ws.append([d, a, 12])
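The row-building logic can be checked on its own, without openpyxl (`ws.append` just receives one list per row):

```python
Dates = ['01/01/14', '01/02/14', '01/03/14']
Amount = ['1', '2', '3']
Cost = 12

# zip pairs the lists element by element, so each pass
# builds one three-cell row instead of one long flat list
rows = [[d, a, Cost] for d, a in zip(Dates, Amount)]
for row in rows:
    print(row)  # each of these would go to ws.append(row)
```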
This is a simple question for which answers are surprisingly difficult to find online. Here's the situation:
>>> A
[('hey', 'you', 4), ('hey', 'not you', 5), ('not hey', 'you', 2), ('not hey', 'not you', 6)]
>>> A_p = pandas.DataFrame(A)
>>> A_p
0 1 2
0 hey you 4
1 hey not you 5
2 not hey you 2
3 not hey not you 6
>>> B_p = A_p.pivot(0,1,2)
>>> B_p
1 not you you
0
hey 5 4
not hey 6 2
This isn't quite what's suggested in the documentation for pivot -- there, it shows results without the 1 and 0 in the upper-left-hand corner. And that's what I'm looking for, a DataFrame object that prints as
not you you
hey 5 4
not hey 6 2
The problem is that the normal behavior results in a csv file whose first line is
0,not you,you
when I really want
not you, you
When the normal csv file (with the preceding "0,") reads into R, it doesn't properly set the column and row names from the frame object, resulting in painful manual manipulation to get it in the right format. Is there a way to get pivot to give me a DataFrame object without that additional information in the upper-left corner?
Well, you have:
In [17]: B_p.to_csv(sys.stdout)
0,not you,you
hey,5.0,4.0
not hey,6.0,2.0
In [18]: B_p.to_csv(sys.stdout, index=False)
not you,you
5.0,4.0
6.0,2.0
But I assume you want the row names. Setting the index name to None (B_p.index.name = None) gives a leading comma:
In [20]: B_p.to_csv(sys.stdout)
,not you,you
hey,5.0,4.0
not hey,6.0,2.0
This roughly matches (ignoring quoted strings) what R writes in write.csv when row.names=TRUE:
"","a","b"
"foo",0.720538259472741,-0.848304940318957
"bar",-0.64266667412325,-0.442441171401282
"baz",-0.419181615269841,-0.658545964124229
"qux",0.881124313748992,0.36383198969179
"bar2",-1.35613767310069,-0.124014006180608
Any of these help?
EDIT: Added the index_label=False option today, which does what you want:
In [2]: df
Out[2]:
A B
one 1 4
two 2 5
three 3 6
In [3]: df.to_csv('foo.csv', index_label=False)
In [4]:
11:24 ~/code/pandas (master)$ R
R version 2.14.0 (2011-10-31)
> read.csv('foo.csv')
A B
one 1 4
two 2 5
three 3 6
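For completeness, the index-name approach from the answer can be reproduced end to end with the question's data (using the keyword form of `pivot` that current pandas requires):

```python
import io
import pandas as pd

A = [('hey', 'you', 4), ('hey', 'not you', 5),
     ('not hey', 'you', 2), ('not hey', 'not you', 6)]
B_p = pd.DataFrame(A).pivot(index=0, columns=1, values=2)

# Clearing both axis names removes the stray "0"/"1" labels;
# the empty index-name slot then shows up as a leading comma in the CSV
B_p.index.name = None
B_p.columns.name = None

buf = io.StringIO()
B_p.to_csv(buf)
print(buf.getvalue())
```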