I'm trying to access some data from a public source, but I'm having trouble getting the module fileio to work despite installing it with pip. Here's my code:
from fileio import read
import gzip

odffn = 'test-data/Level1_IC59_data_Run00115150_Part00000000.odf.gz'
f = gzip.open(odffn)
ev = read(f)
hit_dist = list()
while ev:
    # do some analysis with the event
    hit_dist.append(len(ev.hits))
    # get the next event
    ev = read(f)

import pylab
pylab.hist(hit_dist, 30, range=(0, 1000), log=True, histtype='step')
pylab.title('IceCube Hit Distribution')
pylab.xlabel('nhit')
pylab.savefig('nhits.png')
And I get the following error:
from fileio import read
ModuleNotFoundError: No module named 'fileio'
However, I already checked using the pip installer,
python -m pip install fileio
And it tells me the module is already installed. I don't think it's a problem with the PATH, since it works fine with all the other modules (e.g. numpy), so I'm not really sure what the problem could be. I appreciate in advance any insight.
I looked in pip for fileio and from what I can see, this doesn't appear to be a legitimate package. When installed from pip it doesn't install any importable Python modules or packages. All it does is create a skeleton directory under site-packages.
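You can verify this yourself: pip can list exactly which files a package installed (a quick check; the output will vary by system):

$ python -m pip show -f fileio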
I think you should step back and re-evaluate what this code is doing:
from fileio import read
import gzip
odffn = 'test-data/Level1_IC59_data_Run00115150_Part00000000.odf.gz'
f = gzip.open(odffn)
ev = read(f)
hit_dist = list()
This seems fine (ignoring the import from fileio), up until the line: ev = read(f). What is the purpose of using this function to read the file object that gzip returns? That object has its own set of read methods that should be able to do the job:
import gzip
odffn = 'test-data/Level1_IC59_data_Run00115150_Part00000000.odf.gz'
f = gzip.open(odffn)
lines = f.readlines()
Assuming this is a text file, that should read the whole thing into a list of strings, one per line. You can also buffer it:
buf_size = 100
buf = f.read(buf_size)
while buf:
    # do something with 1-100 characters of input
    buf = f.read(buf_size)
or buffer whole lines:
line_buf = f.readline()
while line_buf:
    # do something with a line of input
    line_buf = f.readline()
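As a side note, the file object gzip.open returns is itself iterable, so the most idiomatic way to buffer whole lines is a plain for loop (a minimal sketch using the path from the question, assuming the file is text):

import gzip

# Iterating over the file object yields one line at a time;
# buffering is handled internally.
with gzip.open('test-data/Level1_IC59_data_Run00115150_Part00000000.odf.gz', 'rt') as f:
    for line in f:
        print(len(line))  # do something with each line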
I'm writing a simple program to save a given Twitter user's tweets word-by-word into a .csv file, as well as using nltk to tag them with parts of speech.
When attempting to iterate through twint.output.tweets_list, I receive the following error:
twint.get:User:'NoneType' object is not subscriptable
I know for a fact that there are tweets to be returned, so it's not simply missing tweets.
My code is as follows:
import twint
import csv
import nltk

# Configure Twint object
c = twint.Config()
c.Username = "POTUS"
c.Limit = 100

# Run Twint
twint.run.Search(c)

# Open a CSV file and write the tweets and their parts of speech to it
with open('tweets_with_POS.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(["word", "part_of_speech"])
    for tweet in twint.output.tweets_list:
        words = nltk.word_tokenize(tweet.tweet)
        pos_tags = nltk.pos_tag(words)
        for word, pos in pos_tags:
            writer.writerow([word, pos])
I've tried running the code from a variety of networks, thinking it may be an IP block, but it doesn't seem to be. Any help is appreciated.
You will need to include the following code if you want to reproduce this:
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
It turns out the problem was a screwy compatibility issue with Twint.
I ran pip install --upgrade -e git+https://github.com/twintproject/twint.git#origin/master#egg=twint to upgrade twint, which for some reason had automatically installed a version one step behind the main branch.
I then encountered AttributeError: module 'typing' has no attribute '_ClassVar' which I resolved by running pip uninstall dataclasses -y
I am new here, trying to solve an interesting question about World of Tanks. I heard that every battle's data is saved on the client's disk in the Wargaming.net folder, and I want to do a batch analysis of our clan's battle performance.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python to read one, but it failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and clearly fails. Could anyone tell me what these files really are?
Added on Feb. 9th, 2022
After trying another piece of code in a Jupyter Notebook, it seems something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb')  # This could be improved so the file isn't kept open.
# When converting pickles from Python 2 to Python 3 you need to use the 'bytes' or 'latin1' encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
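As a quick sanity check, you can dump a couple of the decoded pieces (variable names follow the snippet above; the exact contents vary by game version):

# Inspect what was decoded.
print('arena:', arenaUniqueID)
print('vehicle fields:', len(vehicle_data))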
The result contains a mixture of more binary data as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor, though there is some code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. Only after that comes the part that holds the actual replay, and you don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
After loading the pickle files like gabzo mentioned, you will see that each is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
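For example, uncompyle6 can be pointed at the extracted directory from the command line (a sketch; the output directory name here is arbitrary):

$ pip install uncompyle6
$ uncompyle6 -o decompiled scripts/common/battle_results/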
One thing to note is that the list of values for the main pickle objects (like brAccount in gabzo's code) always has a checksum as its first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled Python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I am able to get this code to work perfectly, without error, in one environment. However, when I run the same code (see below) in another environment, I get an error:
import tarfile
import numpy as np

texting = []
tar = tarfile.open("home/mk/text.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    if f is not None:
        content = f.read()
        texting.append(content)
Just to repeat, I got this to work with no issues at all in one environment, but in the other environment I get the following error:
EOFError: Compressed file ended before the end-of-stream marker was reached
What's the problem and how can I fix this?
I've edited the post to reflect the changes recommended.
def Excel2CSV(ExcelFile, Sheetname, CSVFile):
    import xlrd
    import csv

    workbook = xlrd.open_workbook('C:\Users\Programming\consolidateddataviewsyellowed.xlsx')
    worksheet = workbook.sheet_by_name(ARC)
    csvfile = open(ARC.csv, 'wb')
    wr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        wr.writerow(
            list(x.encode('utf-8') if type(x) == type(u'') else x
                 for x in worksheet.row_values(rownum)))
    csvfile.close()

Excel2CSV("C:\Users\username\Desktop\Programming\consolidateddataviewsyellowed.xlsx", "ARC", "output.csv")
It displays the following error.
Traceback (most recent call last):
File "C:/Programming/ExceltoCSV.py", line 18, in <module>
File "C:/Programming/ExceltoCSV.py", line 2, in Excel2CSV
import xlrd
ImportError: No module named xlrd
Any help would be greatly appreciated.
Response to edited code
No module named xlrd indicates that you have not installed the xlrd library. Bottom line, you need to install the xlrd module. Installing a module is an important skill which beginner python users must learn and it can be a little hairy if you aren't tech savvy. Here's where to get started.
First, check if you have pip (a module used to install other modules for python). If you installed python recently and have up-to-date software, you almost certainly already have pip. If not, see this detailed how-to answer elsewhere on stackoverflow:
How do I install pip on Windows?
Second, use pip to install the xlrd module. The internet already has a trove of tutorials on this subject, so I will not outline it here. Just Google: "how to pip install a module on <your OS here>".
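For most setups it boils down to one command (xlrd is the package name on PyPI):

$ python -m pip install xlrd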
Hope this helps!
Old Answer
Your code looks good. Here's the test case I ran using mostly what you wrote. Note that I changed your function so that it uses the arguments rather than hardcoded values; that may be where your trouble is.
def Excel2CSV(ExcelFile, Sheetname, CSVFile):
    import xlrd
    import csv

    workbook = xlrd.open_workbook(ExcelFile)
    worksheet = workbook.sheet_by_name(Sheetname)
    csvfile = open(CSVFile, 'wb')
    wr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        wr.writerow(
            list(x.encode('utf-8') if type(x) == type(u'') else x
                 for x in worksheet.row_values(rownum)))
    csvfile.close()

Excel2CSV("C:\path\to\XLSXfile.xlsx", "Sheet_Name", "C:\path\to\CSVfile.csv")
Double check that the arguments you are passing are all correct.
I have a blog written in reStructuredText which I currently have to manually convert to HTML when I make a new post.
I'm writing a new blog system using Google App Engine and need a simple way of converting rst to HTML.
I don't want to use docutils because it is too big and complex. Is there a simpler (ideally single python file) way I can do this?
docutils is a library that you can install. It also installs front-end tools to convert from reST to various formats, including HTML.
http://docutils.sourceforge.net/docs/user/tools.html#rst2html-py
This is a standalone tool that can be used. Most converters rely on the docutils library for this.
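For a sense of scale, the core conversion through docutils is only a couple of lines (a minimal sketch using docutils.core; the sample string is made up):

from docutils.core import publish_string

# Convert a reStructuredText string to a full HTML document (returned as bytes).
html = publish_string(source="Hello, *world*!", writer_name='html')
print(html.decode('utf-8'))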
The Sphinx documentation generator library includes many reStructuredText (RST) command-line converters.
Install Sphinx:
$ pip install sphinx
Then use one of the many rst2*.py helpers:
$ rst2html.py in_file.rst out_file.html
Have a look at the instructions for hacking docutils. You don't need the whole of docutils to produce HTML from reST, but you do need a reader, parser, transformer, and writer. With some effort you could combine all of these into a single file from the existing docutils sources.
Well, you could try it with the following piece of code; usage would be:
compile_rst.py yourtext.rst
or
compile_rst.py yourtext.rst desiredname.html
# compile_rst.py
from __future__ import print_function
from docutils import core
from docutils.writers.html4css1 import Writer, HTMLTranslator
import sys, os

class HTMLFragmentTranslator(HTMLTranslator):
    def __init__(self, document):
        HTMLTranslator.__init__(self, document)
        self.head_prefix = ['', '', '', '', '']
        self.body_prefix = []
        self.body_suffix = []
        self.stylesheet = []

    def astext(self):
        return ''.join(self.body)

html_fragment_writer = Writer()
html_fragment_writer.translator_class = HTMLFragmentTranslator

def reST_to_html(s):
    return core.publish_string(s, writer=html_fragment_writer)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] != "":
            rstfile = open(sys.argv[1])
            text = rstfile.read()
            rstfile.close()
            if len(sys.argv) > 2 and sys.argv[2] != "":
                htmlfile = sys.argv[2]
            else:
                htmlfile = os.path.splitext(os.path.basename(sys.argv[1]))[0] + ".html"
            result = reST_to_html(text)
            print(result)
            output = open(htmlfile, "wb")
            output.write(result)
            output.close()
    else:
        print("Usage:\ncompile_rst.py docname.rst\nwhich results in => docname.html\ncompile_rst.py docname.rst desiredname.html\nwhich results in => desiredname.html")
Building the doc locally
Install Python.
Clone the forked repository to your computer.
Open the folder that contains the repository.
Execute: pip install -r requirements.txt --ignore-installed
Execute: sphinx-build -b html docs build
The rendered documentation is now in the build directory as HTML.
If Pyfunc's answer doesn't fit your needs, you could consider using the Markdown language instead. The syntax is similar to rst, and markdown.py is fairly small and easy to use. It's still not a single file, but you can import it as a module into any existing scripts you may have.
http://www.freewisdom.org/projects/python-markdown/
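If you go that route, the module-level API is a one-liner (a minimal sketch, assuming the markdown package is installed):

import markdown

# Convert a Markdown string to an HTML fragment.
html = markdown.markdown("Hello, *world*!")
print(html)  # <p>Hello, <em>world</em>!</p>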