From Dbf to numpy array - python

How can I convert a dbf to numpy array without using arcpy?
I tried using the dbf library, but I couldn't figure out how to select specific columns from my dbf to build the appropriate numpy array.
Here is the script I want to reproduce without using arcpy:
arr = arcpy.da.TableToNumPyArray(inTable ,("PROVINCE","ZONE_CODE","MEAN", "Datetime","Time"))
arr = sorted(arr,key=lambda x:datetime.strptime(str(x[3]),"%d/%m/%Y %H:%M:%S"))
Using these command lines, I am able to choose the columns I want and then sort them chronologically (that's the aim of my program).
Here is the one I made with the dbf lib:
arr = dbf.Table(inTable)
arr.open()
arr = sorted(arr,key=lambda x:datetime.strptime(str(x[7]),"%d/%m/%Y %H:%M:%S"))
I don't know how to select the columns I want, and it takes an eternity to run and sort.
Thank you for your help.

One thing to note is that arr is not the same between your code snippets -- in the first it is a numpy array, and in the second it's a dbf Table.
To get what you want:
import dbf
import numpy
from datetime import datetime
table = dbf.Table('some_table.dbf')
table.open()
arr = numpy.array([
        (r.province, r.zone_code, r.mean, r.datetime, r.time)
        for r in table
        ])
arr = sorted(arr, key=lambda x: datetime.strptime(str(x[3]).strip(), "%d/%m/%Y %H:%M:%S"))
I'm not sure what the difference will be in performance.
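You can also sort the records before building the array, using field names instead of positional indexing; a minimal sketch, assuming the Datetime field holds "dd/mm/YYYY HH:MM:SS" strings as in your snippet:
from datetime import datetime
import dbf
import numpy

table = dbf.Table('some_table.dbf')
table.open()
# sort the records chronologically first, accessing the field by name
records = sorted(
    table,
    key=lambda r: datetime.strptime(str(r.datetime).strip(), "%d/%m/%Y %H:%M:%S"),
)
arr = numpy.array([
        (r.province, r.zone_code, r.mean, r.datetime, r.time)
        for r in records
        ])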
Disclosure: I am the author of the dbf package.

Related

How do I read and store items from a csv file into 2D arrays in python?

So I've been tasked with creating a suitable 2D array to contain all of the data from a csv with rainfall data for the whole year. In the csv file, the rows represent the weeks of the year and the columns represent the days of the week.
I'm able to display the data I want using the following code.
import csv
data = list(csv.reader(open("rainfall.csv")))
print(data[1][2])
My issue is I'm not sure how to store this data in a 2D array.
I'm not sure how to go about doing this. Help would be appreciated, thanks!
You could use numpy for that. It seems you have already created a list of lists in data, and from that you can directly create a 2D numpy array (note that a Python identifier cannot start with a digit, hence data_2d):
import numpy as np
data_2d = np.array(data)
Keep in mind that csv.reader yields strings, so call data_2d.astype(float) if you need numbers.
Or you could even read the file directly with numpy:
import numpy as np
# Use the appropriate delimiter here
data_2d = np.genfromtxt("rainfall.csv", delimiter=",")
With pandas:
import pandas as pd
# Use the appropriate delimiter here
data_2d = pd.read_csv("rainfall.csv")
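Note that read_csv returns a DataFrame rather than a plain array; if you want the underlying 2D numpy array, a short sketch:
import pandas as pd

df = pd.read_csv("rainfall.csv")
data_2d = df.to_numpy()   # the plain 2D numpy array behind the DataFrame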

Insert data range into multidimensional array

I am trying to get data from an Excel file using xlwings (I am new to Python) and load it into a multidimensional array (or rather, a table) that I could then loop through row by row later on.
What I would like to do :
db = []
wdb = xw.Book(r'C:\temp\xlpython\db.xlsx')
db.append(wdb.sheets[0].range('A2:K2').expand('down'))
So this would load the data into my table 'db', and I could later loop through it using:
for i in range(len(db)):
    print(db[i][1])
if I wanted to retrieve the data originally in column B, for instance.
But instead of this, it loads the data in a single dimension, so if I run the code:
print(range(len(db)))
I get range(0, 1) instead of the range(0, 146) expected if I had 146 rows of data in the excel file.
Is there a way to do this, other than loading the table line by line?
Thanks
Have a look at the documentation here on converting the range to a numpy array or specifying the dimensions.
import numpy as np
import xlwings as xw

db = []
wdb = xw.Book(r'C:\temp\xlpython\db.xlsx')
db.append(wdb.sheets[0].range('A2:K2').options(np.array, expand='down').value)
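If the flat result described below is the issue, xlwings also accepts an ndim option that forces a two-dimensional result even for a single row; a sketch under the same workbook layout:
import numpy as np
import xlwings as xw

wdb = xw.Book(r'C:\temp\xlpython\db.xlsx')
# ndim=2 keeps the result two-dimensional, so rows are preserved
db = wdb.sheets[0].range('A2:K2').options(np.array, ndim=2, expand='down').value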
After looking at numpy arrays as suggested by Rawson, it seems they have the same behaviour as python lists when appending a whole range, meaning it generates a flat array and does not preserve the rows of the excel range in the array; at least I couldn't get it to work that way.
So finally I looked into the pandas DataFrame, and it seems to do exactly the job needed; you can even import column titles, which is a plus.
import pandas as pd
import xlwings as xw

wdb = xw.Book(r'C:\temp\xlpython\db.xlsx')
db = pd.DataFrame(wdb.sheets[0].range('A2:K2').expand('down').value)
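For the column titles, xlwings also ships a pandas DataFrame converter, so the headers can come straight from the sheet; a sketch assuming the titles live in row 1:
import pandas as pd
import xlwings as xw

wdb = xw.Book(r'C:\temp\xlpython\db.xlsx')
# index=False keeps the first column as data instead of turning it into the index;
# the first row of the range is used as the column titles
db = wdb.sheets[0].range('A1:K1').options(pd.DataFrame, index=False, expand='down').value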

Get index point from pointcloud pcl python file

Is it possible to retrieve point indices from a PCL point cloud file?
I have point cloud data in a txt file with XYZ and some other column information. I use the following code to convert the txt file into a pcl cloud file:
import pandas as pd
import numpy as np
import pcl
data = pd.read_csv('data.txt', usecols=[0,1,2], delimiter=' ')
pcl_cloud = pcl.PointCloud()
cloud = pcl_cloud.from_array(np.array(data, dtype = np.float32))
As far as I know, from_array only needs the XYZ columns. After some processing (e.g. filtering), the number of points in the raw data and in the result will most probably differ. Is it possible to know which point numbers made it into the result file, so I can match them with the other information in the raw data?
I tried to filter by comparing the coordinates, but it doesn't work because the coordinates change slightly during the conversion from double to float.
Any idea? Thank you very much
I just got the answer: use extract indices.
e.g.:
filt = pcl.RadiusOutlierRemoval(data)
indices = filt.Extract()
Thanks

Numpy CSV fromfile()

I'm probably trying to reinvent the wheel here, but numpy has a fromfile() function that can, I imagine, read CSV files.
It appears to be incredibly fast, even compared to Pandas' read_csv(), but I'm unclear on how it works.
Here's some test code:
import pandas as pd
import numpy as np
# Create the file here, two columns, one million rows of random numbers.
filename = 'my_file.csv'
df = pd.DataFrame({'a':np.random.randint(100,10000,1000000), 'b':np.random.randint(100,10000,1000000)})
df.to_csv(filename, index = False)
# Now read the file into memory.
arr = np.fromfile(filename)
print(len(arr))
I included the len() at the end there to make sure it wasn't reading just a single line. But curiously, the length for me (will vary based on your random number generation) was 1,352,244. Huh?
The docs show an optional sep parameter. But when that is used:
arr = np.fromfile(filename, sep = ',')
...we get a length of 0?!
Ideally I'd be able to load a 2D array of arrays from this CSV file, but I'd settle for a single array from this CSV.
What am I missing here?
numpy.fromfile is not made to read .csv files; instead, it is made for reading data written with the numpy.ndarray.tofile method.
From the docs:
A highly efficient way of reading binary data with a known data-type, as well as parsing simply formatted text files. Data written using the tofile method can be read using this function.
By using it without a sep parameter, numpy assumes you are reading a binary file: it reinterprets the raw bytes of the text file as float64 values, so the length you see is just the file size divided by eight. When you specify a separator, fromfile parses the file as text but stops at the first token it cannot turn into a number, which here is the 'a,b' header line, so you get an empty array.
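To see the intended pairing, here is a minimal round-trip sketch (the file name raw.bin is made up for the example):
import numpy as np

a = np.arange(6, dtype=np.float64)
a.tofile('raw.bin')                      # raw binary dump: no shape or dtype is stored
b = np.fromfile('raw.bin', dtype=np.float64)
print(b)                                 # [0. 1. 2. 3. 4. 5.]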
To read a .csv file using numpy, I think you can use numpy.genfromtxt or numpy.loadtxt (from this question).
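For the file generated above, a sketch with loadtxt (skiprows=1 skips the 'a,b' header that to_csv wrote):
import numpy as np

arr = np.loadtxt('my_file.csv', delimiter=',', skiprows=1)
print(arr.shape)   # expected: (1000000, 2)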

NumPy writing Wingdings instead of numbers to CSV in ArcMap

I'm using this code:
import arcpy
import numpy as np
f = open("F:\INTRO_PY\LAB_7\lab_7.csv","w")
array = np.random.rand(1000,1000)
f.write(array)
f.close
in order to create a 1000x1000 random array in arcpy.
This is what I get when I open the csv:
[screenshot of the CSV: it is full of symbol characters instead of numbers]
I have absolutely no idea why it's doing this, and I'm at my wit's end. Any advice would be really, really appreciated!
In order to save it to CSV, you can use numpy's numpy.savetxt [numpy-doc]:
np.savetxt(
    r"F:\INTRO_PY\LAB_7\lab_7.csv",
    np.random.rand(1000, 1000),
    delimiter=','
)
The delimiter parameter thus specifies what is used to separate the different values.
Note that you can only save 1D arrays or 2D arrays to a text file.
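If you want fixed-precision output instead of savetxt's default scientific notation, it also takes a fmt argument; a sketch reusing the path from the question:
import numpy as np

np.savetxt(
    r"F:\INTRO_PY\LAB_7\lab_7.csv",
    np.random.rand(1000, 1000),
    delimiter=',',
    fmt='%.6f'   # six decimal places per value
)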
I think you are trying to store a numpy array in a file; you should convert it to a string first.
Something like the following:
f = open("test.csv","w")
array = np.random.rand(1000,1000)
f.write(str(array))
f.close()
Note, however, that str() abbreviates large arrays with ellipses, so you will not get the full 1000x1000 grid this way; savetxt is the more robust option.
