I am a novice at Python, so I apologize if this is confusing. I am trying to create a 6-variable Venn diagram. I was trying to use matplotlib-venn, but the problem I am having is that creating the sets is turning out to be impossible for me. My data is thousands of rows long with a unique index, and each column has boolean values for each category. It looks something like this:
| A | B | C | D | E | F |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 |
Ideally I'd like to make a Venn diagram which would show how many people overlap across the categories (e.g., this many people fall into categories A and B and C). How would I go about doing this? If anyone would be able to point me in the right direction, I'd be really grateful.
I found this person who had a similar problem to mine, and the solution at the end of that thread is what I'd like to end up with, except with 6 variables: https://community.plotly.com/t/how-to-visualize-3-columns-with-boolean-values/36181/4
Thank you for any help!
Perhaps you might try to be more specific about your needs and what you have tried.
Making a six-set Venn diagram is not trivial at all, even more so if you want to make the areas proportional. I made a program in C++ (nVenn) with a translation to R (nVennR) that can do that. I suppose it might be usable from Python, but I have never tried it and I do not know if that is what you want. Also, interpreting six-set Venn diagrams is not easy; you may want to check out UpSet for a different kind of representation. In the meantime, I can point you to a web page I made that explains how nVenn works (link).
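For the set-building part of the original question: matplotlib-venn only ships venn2 and venn3, so it cannot draw six sets directly, but building the per-category sets from boolean columns is a one-liner, and the same boolean frame can feed an UpSet plot. A minimal sketch, assuming pandas and the upsetplot package (>= 0.6):

```python
import pandas as pd
import matplotlib.pyplot as plt
from upsetplot import UpSet, from_indicators

# Toy version of the asker's layout: one row per person, boolean columns A..F
df = pd.DataFrame({
    "A": [0, 1, 0], "B": [0, 1, 0], "C": [1, 0, 0],
    "D": [0, 0, 1], "E": [1, 0, 0], "F": [1, 0, 0],
}).astype(bool)

# Plain Python sets per category (the row indices where the column is True),
# e.g. for feeding a Venn library that expects sets:
sets = {col: set(df.index[df[col]]) for col in df.columns}

# An UpSet plot counts every combination of the six categories directly
UpSet(from_indicators(df)).plot()
plt.show()
```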
I am new to this forum, so this will be my first question ever (despite having used the forum for several years now :D).
What's my Problem:
I am working at a company now where we want to automate processes like finding the lowest and/or highest points/lines in classified 3D point cloud data (such as walls, roofs, ...). So I have a classified point cloud, and I don't want to draw the lines along the lowest and highest points of walls or roofs myself, but instead figure out how Python could do the job for me!
What I'd like to know:
To start, I'd like to know: what is the best and proper way to process point cloud data using Python? I came up with several ideas by simply Google searching (such as laspy, Open3D, ...), but I am very confused about which library I'd need for my mission, and where I should really put effort into learning a certain package.
So I am grateful for your answers and suggestions (maybe there is a similar thread which I haven't found already?).
Thanks
Max
You might want to check out the Open3D Tutorials found here.
There isn't one that does exactly what you're looking for, but it gets pretty damn close (IMO).
I'm not interested in doing what you're doing, but if I was this is where I'd figure it out.
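For a flavour of what that looks like in code, here is a minimal sketch with Open3D (the file name is hypothetical; if your classification codes live in a LAS file, laspy exposes them as a per-point attribute instead):

```python
import numpy as np
import open3d as o3d

# Load a point cloud (file name is hypothetical)
pcd = o3d.io.read_point_cloud("classified_scan.ply")
pts = np.asarray(pcd.points)  # shape (N, 3): x, y, z columns

# Lowest and highest points along z, e.g. the base and top of a wall
lowest = pts[np.argmin(pts[:, 2])]
highest = pts[np.argmax(pts[:, 2])]
print("lowest point:", lowest)
print("highest point:", highest)
```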
I have a dataset which has items with the following layout/schema:
{
  words: "Hi! How are you? My name is Helennastica",
  ratio: 0.32,
  importantNum: 382,
  wordArray: ["dog", "cat", "friend"],
  isItCorrect: false,
  type: 2
}
where I have a lot of different types of data, including:
Arrays (of one type only, e.g. an array of strings or an array of numbers, never both)
Booleans
Numbers with a fixed min/max (i.e. on a scale of 0 to 1)
Unbounded integers (any integer from -∞ to ∞)
Strings, containing a mix of dictionary words and new words
The task is to create an RNN (well, more generally, a system that can quickly retrain when given one extra bit of data instead of reprocessing it all - I think an RNN is the best choice; see below for my reasoning) which can use all of these factors to categorise any item into one of 4 categories, labelled by the type key in the above example, a number 0-3.
I have a large set of examples in the above format (with answers provided), and I have a database filled with uncategorised examples. My intention is to run the ML model on that set and sort all of them into categories. The reason I need to be able to retrain quickly is the feedback feature: if the AI gets something wrong, any user can report it, in which case that specific JSON will be added to the dataset. Obviously, having to retrain with 1000+ JSONs just to add one extra would take ages - if I am not mistaken, an RNN can get around this.
I have found many possible use-cases for something like this, yet I have spent literal hours browsing through GitHub trying to find an implementation, or some TensorFlow module/addon to make this easier or to copy, but to no avail.
I assume this would not be too difficult using TensorFlow, and I understand a bit of the maths and logic behind it (but I'm not formally educated, so I probably have gaps!). Unfortunately, I have essentially no experience with using TensorFlow or any other ML framework (beyond copy-pasting code for some other projects). If someone could point me in the right direction in the form of a GitHub repo/Python framework, or even write some demo code to help solve this problem, it would be greatly appreciated. And if you're just going to correct some of my technical knowledge/tell me where I've gone horrendously wrong, I'd appreciate that feedback too (just leave it as a comment).
Thanks in advance!
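One way to get the quick-retrain behaviour described above, without committing to an RNN, is an estimator that supports incremental updates. A minimal sketch using scikit-learn's SGDClassifier and its partial_fit method (the featurisation below is deliberately crude, and initial_items / reported_item are hypothetical placeholders for your own data):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy featurisation: turn one record into a fixed-length numeric vector.
# A real encoding (hashing the strings, padding the arrays, etc.) is up to you.
def featurise(item):
    return np.array([
        item["ratio"],               # already on a 0..1 scale
        item["importantNum"],        # unbounded integer
        float(item["isItCorrect"]),  # boolean -> 0/1
        len(item["words"]),          # crude text feature
        len(item["wordArray"]),      # crude array feature
    ])

clf = SGDClassifier(loss="log_loss")

# Initial training pass over the labelled set (initial_items is hypothetical)
X0 = np.array([featurise(it) for it in initial_items])
y0 = np.array([it["type"] for it in initial_items])
clf.partial_fit(X0, y0, classes=[0, 1, 2, 3])

# When a user reports a misclassification, update on just that one record
# instead of retraining on the full 1000+ examples:
clf.partial_fit(featurise(reported_item).reshape(1, -1), [reported_item["type"]])
```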
This is my first post on Stack Overflow. I've never needed to ask here before because everything I wondered about had already been answered here - except this one time.
I am trying to plot data to have a schematic representation of it, but I am not sure what the 'best way' to do it is, nor how to achieve what I have in mind.
I figured this is the best way to represent the dataset I have:
So each data point has
an x-axis start
an x-axis end
a y-axis value
a z-axis value
My data are stored in a CSV file like
start | stop | y-value | z-value
I thought about using a heatmap to do so, but I am not sure:
whether this is the best way to do it (the overlap can be problematic to handle)
whether there is an easy way to do it (should I manually add all the required points between start & stop?)
whether, if I want to highlight some data, I can change the z-color scale just for some of it
I thought a little help from here might clarify things :)
Cheers,
Little update: the way I was heading is working. I am not sure whether it is the best approach, but at least it is working.
I still need to make the formatting better, but this is more or less what one can get:
So, the way I implemented it is to build a matrix and, for each data point, fill each discrete point of the matrix with the highest z-value.
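A minimal sketch of that matrix approach (the CSV column names and the grid step are assumptions; np.fmax keeps the larger of two values and treats NaN as missing, which implements the "highest z wins" rule where intervals overlap):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV with the columns described above
df = pd.read_csv("data.csv")  # columns: start, stop, y-value, z-value

step = 1.0  # x-axis resolution of the grid (assumption)
x = np.arange(df["start"].min(), df["stop"].max() + step, step)
y_vals = sorted(df["y-value"].unique())

grid = np.full((len(y_vals), len(x)), np.nan)
for _, row in df.iterrows():
    yi = y_vals.index(row["y-value"])
    mask = (x >= row["start"]) & (x <= row["stop"])
    # np.fmax keeps the highest z-value where intervals overlap
    grid[yi, mask] = np.fmax(grid[yi, mask], row["z-value"])

plt.pcolormesh(x, y_vals, grid, shading="nearest")
plt.colorbar(label="z-value")
plt.show()
```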
I'm trying to process a 1.8 MB txt file. There are a couple of header lines; afterwards it's all space-separated data. I can pull the data in using pandas. What I'm wanting to do with the data is:
1) Cut out the non-essential data, i.e. roughly the first 1675 lines, and also the last 3-10 lines (the count varies day to day). I can remove the first lines, kind of. The big problem I'm having with this right now is knowing for sure where the 1675 pointer location is. Using something like
df = df[df.year > 1978]
only moves the initial 'pointer' to 1675. If I try
dataf = df[df.year > 1978]
it just gives me a pure copy of what I would have with the first line. It still keeps the pointer at the same 1675 start point. It won't allow me to access any of the first 1675 rows, but they are still obviously there. If I then try
df.year[0]
it comes back with an error suggesting row 0 doesn't exist. I have to go out and search to find what the first readable row is. Instead of flat-out removing the rows and moving the new pointer up to 0, it just moves the pointer to 1675 and won't allow access to anything lower than that. I still haven't found a way to determine what the last row number is programmatically; through the shell it's easy, but I need to be able to do it in the program so I can set up the loop for point 2.
2) I want to be able to take averages of the data - 'x'-day moving averages - and create a new column with the new data once I have calculated the moving average. I think I can create the new column with the Series statement... I think... I haven't tried it yet, though, as I haven't been able to get this far.
3) After all this and some more math I want to be able to graph the data with a homemade graph. I think this should be easy once I have everything else completed. I have already created the sample graph and can plot the points/lines on the graph once I have the data to work with.
Is pandas the right library for the project, or should I be trying to use something else? So far, the more research I do, the more lost I get, as everything I keep trying gets me a little further but sets me even further back at the same time. In a similar thread I saw someone mention using something else when wanting to do math on the data block, but there wasn't any indication as to what they used.
It sounds like your main trouble is indexing. If you want to refer to the "first" thing in a DataFrame, use df.iloc[0]. But DataFrame indexes are really powerful regardless.
http://pandas.pydata.org/pandas-docs/stable/indexing.html
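To make that concrete, here is a short sketch of the trimming and moving-average steps (the file name, the header offset, and the 'value' column name are assumptions):

```python
import pandas as pd

# Skip the non-essential block up front (1675 is the asker's rough figure;
# adjust so the real column-header line survives, if the file has one)
df = pd.read_csv("data.txt", sep=r"\s+", skiprows=1675)

# Drop a trailing block of junk rows; the count varies, so slice from the end
df = df.iloc[:-5]

# Renumber the rows so df.iloc[0] / df.year[0] refer to the first kept row
df = df.reset_index(drop=True)

# An x-day moving average as a new column (column name 'value' is hypothetical)
df["ma_30"] = df["value"].rolling(window=30).mean()
```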
I think you are headed in the right direction. Pandas gives you nice, high level control over your data so that you can manipulate it much more easily than using traditional logic. It will take some work to learn. Work through their tutorials and you should be fine. But don't gloss over them or you'll miss some important details.
I'm not sure why you are concerned that the lines you want to ignore aren't being deleted; as long as they aren't used in your analysis, it doesn't really matter. Unless you are facing memory constraints, it's probably irrelevant. But if you do find you can't afford to keep them around, I'm sure there is a way to really remove them, even if it's a bit sideways.
Processing a few megabytes worth of data is pretty easy these days and Pandas will handle it without any problems. I believe you can easily pass pandas data to numpy for your statistical calculations. You should double check that, though, before taking my word for it. Also, they mention matplotlib on the pandas website, so I am guessing it will be easy to do basic graphing as well.
I want to create a table with Python that looks like a simple Excel table, and I have already used pyExcelerator for this. But now I'm thinking about just using pyplot.table, which seems to be very easy. However, I need to make some changes, and I don't know if this is possible in pyplot.table.
For example I want to add a cell in the upper left corner and I also want to make two cells beneath the cell t+1 (see the table example below).
So, is it possible to do these changes in pyplot.table or should I better use another way to make tables?
Building a program to generate a table in an image for inclusion into your Word document is a bit overkill. It's a lot of added work and completely unnecessary effort. Make the table in Excel and then paste it into Word. It'll look good and will be easier to update and change.
If you are using this as an excuse to learn something new, that is all well and good, but you need to give us more to help you with. SO isn't a code factory. Offer up what you have tried, and samples of code you are having trouble with. We can help with that.
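That said, if you do want to try pyplot.table, the basic grid is easy, and the Table object it returns lets you add extra cells by hand (the closest it gets to irregular cells like the corner one mentioned in the question). A minimal sketch with made-up contents:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis("off")

# A plain grid with row and column labels (contents are made up)
table = ax.table(
    cellText=[["1", "2"], ["3", "4"]],
    rowLabels=["row 1", "row 2"],
    colLabels=["t", "t+1"],
    loc="center",
)
table.scale(1, 2)  # stretch cell heights for readability

# The corner above the row labels is empty by default; add_cell can fill it
ref = table[1, -1]  # an existing row-label cell, copied for its size
table.add_cell(0, -1, ref.get_width(), ref.get_height(),
               text="corner", loc="center")

plt.show()
```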