Loading .npz file onto R

I have a d.npz file created in Python that I would like to load into R. I tried the RcppCNPy package, but when I ran load("d.npz"), I got this error message:
bad restore file magic number (file may be corrupted) -- no data loaded
I also tried source("d.npz") and readRDS("d.npz"), but neither worked. Could anyone please help me solve this seemingly simple problem?
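The error text comes from base R's load(), which expects a file written by R's save() rather than a NumPy archive. As far as I know, RcppCNPy reads single .npy files through npyLoad() and does not unpack .npz archives, so one possible workaround is to split the archive on the Python side first. The sketch below assumes the arrays are plain numeric vectors or matrices; the helper name split_npz, the array name "x" and the output folder are made up for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def split_npz(npz_path, out_dir):
    """Write every array stored in an .npz archive to its own .npy file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    with np.load(npz_path) as archive:
        for name in archive.files:
            # Replace path separators so each key maps to a flat file name.
            target = out_dir / (name.replace("/", "_") + ".npy")
            np.save(target, archive[name])
            written[name] = target
    return written


# Self-contained demo: a stand-in for d.npz with one hypothetical array "x".
with tempfile.TemporaryDirectory() as tmp:
    demo_npz = Path(tmp) / "d.npz"
    np.savez(demo_npz, x=np.arange(6, dtype=np.float64).reshape(2, 3))
    files = split_npz(demo_npz, Path(tmp) / "d_arrays")
    restored = np.load(files["x"])
    demo_shape = restored.shape
```

In R, each resulting file could then be read with something like npyLoad("d_arrays/x.npy"), assuming RcppCNPy is installed.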

Related

Error when using Writer.Close() function within my Pandas and Openpyxl code

I have written code that combines some CSV files into a single Excel file, and I finish with the 'writer' as follows:
writer.save()
writer.close()
However, I get the following error when I try to open the file after the code has finished:
'We found a problem with some content in 'the file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.'
This seems to be related purely to the writer.close() call, as without it I don't get the error. Without it, however, I cannot open the file at all, because it reports that someone else (i.e. openpyxl) is using it.
I'm not sure if it's relevant, but my files are stored on OneDrive.
My plan after writer.close() is to pause the script so that I can print the Excel file to PDF manually (I found this to be unreliable via Python), and then 'hit continue' to email the PDF.
Any ideas on how to resolve this error?

Without seeing more of your code, and maybe an example of the data you are writing, it's tough to make any assumptions. Based on the error, the problem is likely caused by the data going into the xlsx file rather than by the 'writer' itself. This is Excel saying that, by its standards, some of the data in your file is 'corrupted' and needs to be fixed.
You should be able to run a 'recovery' of the file in Excel, which will identify the problem spots in your file. You can then trace those back to your Python program and fix them to eliminate the problem.

pandas python Load JSON from local file

I'm trying to set up a GitHub repository so that all the code is self-contained and the other authors don't need to spell out their full paths to certain files.
My code:
dataSet = pd.read_json("file://repository/Datasets/JSON/data.json", convert_dates=False)
This gives me this error:
URLError: <urlopen error [WinError 3] The system cannot find the path specified: '\\repository\\Datasets\\JSON\\data.json'>
As this is how the docs seem to describe doing it, I'm stumped on how to make it work.

I'd say move the file into the same directory and simply use
dataSet = pd.read_json('data.json')
Once that works then you know for sure that it's not an issue with reading the file. The error suggests it's an issue with Windows reading the path to the file.
Not sure what editor you're using but in VS Code if you right click the file it allows you to copy the 'relative path' in relation to the file you're currently working on.
Sorry I can't be of more help.
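If the file has to stay in its own folder, one way to avoid hard-coded absolute paths is to resolve it relative to the repository root. The sketch below is only an illustration: find_repo_root is a hypothetical helper, and using a ".git" folder as the marker is an assumption about how the project is laid out. The demo builds a throwaway folder tree that mirrors the path in the question.

```python
import json
import tempfile
from pathlib import Path


def find_repo_root(start, marker=".git"):
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found above {start}")


# Demo with a throwaway layout that mirrors the path in the question.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "repository"
    (root / ".git").mkdir(parents=True)
    data_file = root / "Datasets" / "JSON" / "data.json"
    data_file.parent.mkdir(parents=True)
    data_file.write_text(json.dumps({"a": [1, 2]}))
    script_dir = root / "src" / "analysis"  # where a script might live
    script_dir.mkdir(parents=True)

    found = find_repo_root(script_dir) / "Datasets" / "JSON" / "data.json"
    loaded = json.loads(found.read_text())
```

In a real script, the starting point would typically be Path(__file__).parent, and the resulting path could be passed straight to pd.read_json(found, convert_dates=False), since pandas accepts path objects.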

How to parse a binary file in python

I am still new to Python when it comes to parsing data. I'd like to solve the problem shown in the image; the respective "telemetry.bin" and "TLM_LIST.csv" files are in the Google Drive folder. I am able to work with the CSV file using pandas, but I don't know how to deal with the bin file, as reading it gives output like the following:
https://drive.google.com/open?id=1h_15khW2abjT8V6L38VSrqpb5vrmDfa-
b'\n\x00\x00\x07\x08\x01\x0b\xe7\x08\xc3\x0b\xd9\x07\x9e\x04\xe4\x00\x00\x0c\xef\x00\x99\x1f\xdb\x00\x00\x00\x00\xbe\xef\xca\xfe\x00\x00\x01\x00\x02\x04\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00mg2878_a A\x00\xc1\x00\x8fe\xb8>\x00\x00\x00\x05\x00%~\x1a\x00\x00\x00\x96\x00\x00\x00\x91\x00\x01\x02\x00\x00\x00\x00\x8fe\xb8?\x00\x00\x00\x00'
Please help, I really want to learn.
Thanks in advance.
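Binary files like this are usually decoded field by field with the standard struct module, given a description of where each field sits. The sketch below is only an illustration: the real layout would come from TLM_LIST.csv, which is not shown here, so the field names, offsets and big-endian formats in LAYOUT are invented. Only the sample bytes are taken from the output in the question.

```python
import struct

# First bytes of the output shown in the question.
sample = b"\n\x00\x00\x07\x08\x01\x0b\xe7"

# HYPOTHETICAL layout: (name, byte offset, struct format). The real one
# would come from the rows of TLM_LIST.csv.
LAYOUT = [
    ("field_a", 0, ">B"),  # 1-byte unsigned integer
    ("field_b", 1, ">H"),  # 2-byte unsigned integer, big-endian
    ("field_c", 3, ">H"),  # 2-byte unsigned integer, big-endian
]


def parse_record(data, layout):
    """Decode each field of one binary record into a dict of Python values."""
    record = {}
    for name, offset, fmt in layout:
        # unpack_from reads the field at `offset` without slicing the buffer.
        (value,) = struct.unpack_from(fmt, data, offset)
        record[name] = value
    return record


parsed = parse_record(sample, LAYOUT)
```

With the actual CSV, the LAYOUT list could be built from its rows (for example with pandas) instead of being typed by hand, and the data would come from open("telemetry.bin", "rb").read().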

Saving vcf to npz and then loading to Python. No error, but no output

I'm trying to analyze genome data from a huge (1.75GB compressed) VCF file using Python. The technician suggested I use scikit-allel and gave me this link: http://alimanfoo.github.io/2017/06/14/read-vcf.html. I wasn't able to install the module on my computer, but I successfully installed it on a cluster which I access through a VPN. There, I successfully opened the file and have been able to access the data. But I can only access the cluster through a command line interface, and that isn't as friendly as the Spyder I have on my computer, so I've been trying to bring the data back. The GitHub link says I can save the data into an npz file which I can read straight into Python with numpy, so I've been trying to do that.
First, I tried allel.vcf_to_npz('existing_name.vcf','new_name.npz',fields='calldata/GT') on the cluster. This created a (suspiciously small) new npz file on the cluster, which I downloaded. But when I opened up Spyder on my computer and typed genotypes=np.load('real_genotypes.npz'), no new variable called genotypes appeared in the Variable Explorer. Adding the line print(genotypes) produces <numpy.lib.npyio.NpzFile object at 0x00000__________>
Next, thinking that I should copy everything to be sure, I tried allel.vcf_to_npz('existing_name.vcf','new_name.npz',fields='*',overwrite=True)
This created a 2.10GB file. After a lengthy download, I tried the same thing, but got the same results: No new variable when I try to np.load the file, and <numpy.lib.npyio.NpzFile object at 0x000001DB0DEC7F88> when I ask to print it.
When I searched Google for this problem, I saw this question: Load compressed data (.npz) from file using numpy.load. But my case looks different. I don't get an error message; I just get nothing. So what's wrong?
Thanks
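The printed NpzFile object suggests that the load itself worked: np.load on an .npz file returns a lazy, dictionary-like container rather than an array, so the arrays have to be pulled out by key. A minimal sketch, using a tiny in-memory archive as a stand-in for the real file; the key 'calldata/GT' matches the field name in the question, but the array contents and shape are invented:

```python
import io

import numpy as np

# Stand-in for the file written by allel.vcf_to_npz (contents are made up).
buffer = io.BytesIO()
fake_gt = np.zeros((4, 3, 2), dtype=np.int8)  # variants x samples x ploidy
np.savez_compressed(buffer, **{"calldata/GT": fake_gt})
buffer.seek(0)

callset = np.load(buffer)            # NpzFile: arrays are read on demand
keys = sorted(callset.files)         # names of the arrays inside the archive
genotypes = callset["calldata/GT"]   # reading a key returns a real ndarray
callset.close()
```

With the downloaded file, the same pattern would be callset = np.load('real_genotypes.npz') followed by callset['calldata/GT']. The resulting ndarray, unlike the NpzFile container, is the kind of object a variable explorer would normally display.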

permission denied error while reading an excel file

I got a permission denied error when I tried to open an Excel file.
I don't have the full version of MS Excel; I'm just using the trial version.
Could it be because of that?
My code has just 4 lines:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_excel("E:\\ML")

It comes down to how opening a file works. I reproduced your problem and found the cause.
I assume you have a directory named ML on your E: drive, perhaps containing some Excel files (such as *.xls or *.xlsx) (I bet you just started learning machine learning). You are trying to load Excel data into your program, but the path you give, E:\\ML, is a directory rather than a file. The system refuses the operation when pandas tries to open the directory as a file, which is what causes the "Permission denied" error.
The fix is to pass the path of the file itself, such as E:\\ML\\your_database_file_name.xls.
I hope it will work for you.
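A small check along these lines can make the mistake visible before pandas is involved. This is only a sketch: excel_files is a hypothetical helper, and the demo uses a temporary folder with invented file names as a stand-in for E:\ML.

```python
import tempfile
from pathlib import Path


def excel_files(path):
    """Return the Excel files at `path`, whether it is a file or a folder."""
    path = Path(path)
    if path.is_dir():
        # A folder cannot be opened as a workbook; list its workbooks instead.
        return sorted(p for p in path.iterdir()
                      if p.suffix.lower() in {".xls", ".xlsx"})
    return [path]


# Demo with a temporary folder standing in for E:\ML (file names invented).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "ML"
    folder.mkdir()
    (folder / "data.xlsx").touch()
    (folder / "notes.txt").touch()
    found_names = [p.name for p in excel_files(folder)]
```

Each path returned by the helper could then be passed to pd.read_excel one at a time.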

For me, it turned out to be because I had the same Excel file open in MS Excel (I kept getting the error while trying to push my work to GitHub). It was resolved immediately after I closed Excel, the program that was using the file I wanted to push.
I hope you find this helpful!
