I'm working on a project to break down 3D models, but I'm quite lost. I hope you can help me.
I'm getting a 3D model from Autodesk BIM, and the format could be native or a generic CAD format (.stp, .igs, .x_t, .stl). Then I need to "measure" the maximum dimensions somehow in order to model a raw-material body, which will always have the shape of a huge panel. Once I have both bodies, I will take the difference to extract the solids I need to analyze; and, on each of these bodies, I need to extract the faces, and then the lines or curves of each face.
This sounds like something really easy to do in CAD software, but the idea is to automate the process. I was looking into OpenSCAD, but it seems it only works for modeling geometry and doesn't handle imported solids well. I'm leaving a picture of the idea of what I need to do in the link below.
So, any idea how I can do this? Which language and library could help with this project?
I can see this automation being possible with a few in-between steps:
OpenSCAD can handle differences well, so your "Extract Bodies" step seems plausible.
1.5 Before going further, you'll have to explain how you "filtered out" the cylinder. Will you do this manually? If not, it will be included in the analysis and you'll end up with a lot of faces as a result.
I don't think OpenSCAD provides you with a vertex array. However, it can save to .stl, which is fairly easy to parse with the programming language of your choice; you'll have to study the .stl file structure a bit (this sounds much more frightening than it is: if you open an .stl in a text editor you will probably realize immediately what's going on).
Once you've parsed the file, you can calculate the lines with high-school math.
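For what it's worth, a minimal sketch of both steps in plain Python; it assumes an ASCII .stl export (binary STL would need the struct module instead) and uses a placeholder file name:

```python
# Minimal ASCII STL parser: collect the three vertices of every triangular facet.
# "part.stl" is a placeholder for your own OpenSCAD export.
def parse_ascii_stl(path):
    triangles, current = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == "vertex":
                current.append(tuple(float(c) for c in parts[1:4]))
                if len(current) == 3:          # a facet always has exactly 3 vertices
                    triangles.append(tuple(current))
                    current = []
    return triangles

triangles = parse_ascii_stl("part.stl")

# The "high school math" part: each facet edge is a pair of vertices,
# and its length is just the Euclidean distance between them.
v0, v1, v2 = triangles[0]
edge_length = sum((a - b) ** 2 for a, b in zip(v0, v1)) ** 0.5
print(len(triangles), "facets; first edge length:", edge_length)
```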
This is not an easy, GUI-based way to do what you ask, but if you have a few programming skills you'll have your automation, and depending on the number of projects it may be worth it.
I have been working on this project and found that the library "trimesh" is better suited to this problem. Give it a shot and save some time.
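To give an idea of what that looks like, here is a rough sketch of the whole pipeline with trimesh (file names are placeholders, and the boolean difference needs an extra backend such as Blender, OpenSCAD or manifold3d installed alongside trimesh):

```python
# Rough sketch: measure the part, model the raw panel, subtract, inspect faces.
import trimesh

part = trimesh.load("part.stl")            # the model exported from BIM/CAD

# "Measure" the part: the axis-aligned bounding box gives the maximum dimensions,
# which is enough to model the raw panel.
length, width, height = part.extents
panel = trimesh.creation.box(extents=part.extents)
panel.apply_translation(part.bounds.mean(axis=0))   # centre the panel on the part

# Boolean difference: raw panel minus part = the solids left to analyse.
removed = trimesh.boolean.difference([panel, part])

# Split the result into separate bodies and look at their faces and edges.
for body in removed.split():
    print("body with", len(body.faces), "triangular faces")
    # body.facets groups coplanar triangles into planar faces;
    # body.facets_boundary gives the edges bounding each of those faces.
    for facet, boundary in zip(body.facets, body.facets_boundary):
        print("  planar face of", len(facet), "triangles,",
              len(boundary), "boundary edges")
```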
I need to convert a lot of badly scanned PDF tables to Excel tables. The only solution I can see is to train Tesseract or some other framework on pre-generated images (in most cases, all tables in the PDFs are the same). Is it realistic to get a good solution of around 70-80% accuracy on a home setup, and what would you advise? I will appreciate any advice other than ABBYY FineReader or a similar solution (tested on my dataset; the result was poor and there are few opportunities for automation).
All table structures need to be correct in the result for further manual work.
You should use a PDF parser for that.
Here's the parsed result using Parsio (https://parsio.io). It looks correct to me. You can export the parsed data to Sheets / Excel / CSV / Zapier.
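If any of the PDFs still carry an embedded text layer (i.e. they are not pure image scans), a plain PDF parser can pull the tables straight out; here is a rough sketch with pdfplumber and pandas (libraries of my choosing, not ones mentioned above; file names are placeholders):

```python
# Extract every table pdfplumber can find and write each one to its own Excel sheet.
# Only works when the PDF has a text layer; scanned images need OCR instead.
import pdfplumber
import pandas as pd

tables = []
with pdfplumber.open("statement.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            # Using the first row as the header is an assumption about these tables.
            tables.append(pd.DataFrame(table[1:], columns=table[0]))

with pd.ExcelWriter("statement.xlsx") as writer:
    for i, df in enumerate(tables):
        df.to_excel(writer, sheet_name=f"table_{i}", index=False)
```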
When the input image is of very poor quality, the dirt tends to get in the way of text recognition. This is exacerbated when looking for areas without dictionary entries, so bare numbers can be the worst type of text to train on, given every twist and turn that bad scanning produces.
If the electronic source from before the manual stamping and scanning is available, it might be possible to meld the text with the distorted image, but it's a highly manual task that defeats the aim.
The documents need to be rescanned by a trained operator with a good eye for detail. That, with an OCR scanning device, will be faster than tuning images that are never likely to provide reasonably trustworthy output. There are too many cases of numeric failures that would make any single page worthless for reading or computation.
I recently scanned some accounts and spent more time checking and correcting than if they had been typed, but they needed to be a "legal" copy; clearly mine was not, as I did it after the event.
The best result I could squeeze from Adobe PDF to Excel was "Pants"
There are some improvements from image contrast enhancement and noise reduction (done by hand).
They had some effect, but nothing obvious.
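For reference, the kind of cleanup meant here, sketched with OpenCV and pytesseract (the file name and every parameter value are placeholders to tune per scan):

```python
# Minimal preprocessing sketch for cleaning a bad scan before OCR.
import cv2
import pytesseract

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Non-local means denoising takes out scanner speckle.
img = cv2.fastNlMeansDenoising(img, h=15)

# CLAHE boosts local contrast without blowing out the page background.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img = clahe.apply(img)

# Adaptive threshold to a clean black-on-white image for Tesseract.
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 31, 15)

text = pytesseract.image_to_string(img, config="--psm 6")
print(text)
```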
I am looking for some high level advice about a project that I am attempting.
I want to write a PyQt application (following the model-view pattern) to read in images from a directory one by one and process them. Typically there will be a few thousand .png images (each around 1 megapixel, 16-bit grayscale) in the directory. After being read in, the application will then process the integer pixel values of each image in some way, and crucially the result will be a matrix of floats for each. Once processed, the user should be able to go back and explore any of the matrices they choose (or several at once), and possibly apply further processing.
My question is regarding a sensible way to store the matrices in memory and access them when needed. After reading in the raw .png files and obtaining the corresponding matrix of floats, I can see the following options for handling the result:
1. Simply store each matrix as a NumPy array and keep every one of them in a class attribute. That way they will all be easily accessible to the code when requested by the user, but will this be poor in terms of the RAM required?
2. After processing each image, write the matrix out to a text file, and read it back in from the text file when requested by the user.
3. I have seen examples (see here) of people using SQLite databases to store data for a GUI application (using the MVC pattern) and then querying the database when they need access to the data. This seems to have the advantage that the data is not held in RAM by the "model" part of the application (unlike option 1), and it is possibly more storage-efficient than option 2, but is it suitable given that my data are matrices?
4. I have seen examples (see here) of people using something called HDF5 for storing application data, and this might be similar to using a SQLite database? Again, is it suitable for matrices? (A rough sketch of what I mean is after this list.)
5. Finally, I see that PyQt has the classes QImage and QPixmap. Do these make sense for solving the problem I have described?
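To make option 4 concrete, here is a rough sketch of what I imagine the HDF5 route would look like with h5py (dataset names, shapes and file names are just placeholders of mine):

```python
# Sketch of option 4: keep every processed matrix in one HDF5 file on disk
# and read a single matrix back into RAM only when the user asks for it.
import h5py
import numpy as np

# Stand-in for the float matrix produced by processing one .png:
result = np.random.rand(1024, 1024).astype(np.float32)

# Append the result for image 42 to the store (created if it doesn't exist).
with h5py.File("results.h5", "a") as f:
    f.create_dataset("image_0042", data=result, compression="gzip")

# Later, when the user selects that image in the GUI:
with h5py.File("results.h5", "r") as f:
    matrix = f["image_0042"][:]   # only this one matrix is loaded into memory
```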
I am a little lost with all the options and don't want to spend too much time investigating each of them in detail, so I would appreciate some general advice. If someone could offer comments on each of the options I have described (as well as letting me know if any can be ruled out in this situation), that would be great!
Thank you
I aim to design an app that recognizes a certain type of object (let's say, a book) and that can say whether the input is effectively a book or not (binary classification).
For a better user experience, I would like the input to be a video rather than a picture: that way, the user won't have to deal with issues such as sharpness, centering of the object, and so on. He'll just have to make a "scan" of the object, without much consideration for the quality of any single image.
And here comes my problem: as I intend to create my training dataset from scratch (the object I want to detect being absent from existing datasets such as ImageNet), I was wondering whether videos are unsuitable for this type of binary classification and whether I should instead ask the user to take a good picture of the object.
On one hand, videos have the advantage of yielding a larger dataset than one created only from photos (though I can expand my picture dataset with data augmentation), as it is easier to take a 10 s video of an object than to take 10x24 (more or less…) pictures of it.
But on the other hand, I fear the result will be less precise, as many frames in a video are redundant and the average quality might not be as good as that of a single, properly taken image.
Moreover, I do not intend to use the temporal properties of the video (in a scan the temporality is useless) but rather to work one frame at a time (as depicted in this article).
What is the proper way of constituting my dataset? I really would like to keep this "scan" for the user's comfort, so if images are more precise than videos for such a classification, would it eventually be possible to automatically extract a single image from a "scan" and work directly on it?
Good question! The answer is: you should train your model on how you plan to use it. So if you ask the user to take photos, train it on photos. If you ask the user to film the object, train on frames extracted from video.
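If you go the video route, extracting training frames from a clip is only a few lines with OpenCV; a rough sketch (the path and the sampling step are placeholders to adjust):

```python
# Sample every Nth frame of a clip into individual training images.
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("book_scan.mp4")
step = 5                                 # keep 1 frame in 5 to limit near-duplicates
index = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:                           # end of the video
        break
    if index % step == 0:
        cv2.imwrite(f"frames/book_{saved:04d}.png", frame)
        saved += 1
    index += 1
cap.release()
print(saved, "frames written")
```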
The images might seem blurry to you, but they won't be for a computer. It will just learn to detect "blurry books", but that's OK, that's what you want.
Of course this is not always the case. The image might become so blurry that the information about whether or not there is a book in the frame is simply no longer there. Where is the line? A general rule of thumb: if you can see it's a book, the computer will also be able to see it. As I think blurry images of books will still be recognizable as books, I think you could totally do it.
Creating "photos (single image, sharp)" from "scan (more blurry, frames from video)" can be done, it's called super-resolution. But those models are pretty beefy, not something you would want to run on a mobile device.
On a completely unrelated note: try googling Transfer Learning! It will benefit you for sure :D.
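To give you a flavour of what transfer learning looks like in code, here is a minimal Keras sketch (choosing MobileNetV2 as the frozen base is my assumption, not the only option, and the directory layout of one folder per class is also assumed):

```python
# Binary "book / not book" classifier on top of a frozen ImageNet backbone.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                     # keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # book vs. not-book
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```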
I would like to code a script that can locate a specific word or number in a financial statement. Financial statements contain roughly the same information; however, they are not identical and not organized in the same way. My thought is that by using TensorFlow I could train a neural network to locate the specific words or numbers for me. I am thinking that if I label different text and numbers in 1,000 financial statements and use them to train the neural network, it will then be able to identify these numbers or words in any financial statement. For example, I would tell it, in all 1,000 training statements, which number is the profit of the company.
Is this doable? I have been coding in Python for a couple of months, and so far I've built some web scrapers and integrated them with Twitter, Slack and Google Sheets. I would be very grateful for your thoughts on this project, and for anyone steering me in the right direction by sharing relevant tutorials.
Thanks a lot!
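(A hedged aside: before training anything, a rule-based baseline is worth measuring against. The keyword list and number pattern in this rough sketch are purely illustrative guesses, not a finished extractor.)

```python
# Rule-based baseline: find the number that follows a "profit" label in plain text.
import re

KEYWORDS = ["profit for the year", "net profit", "profit"]
NUMBER = re.compile(r"-?\(?\d[\d,. ]*\)?")

def find_profit(statement_text):
    for line in statement_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            match = NUMBER.search(line)
            if match:
                return match.group().strip()
    return None

sample = "Revenue 1,200,000\nNet profit 350,000\nDividends 50,000"
print(find_profit(sample))   # -> 350,000
```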
It's great that you're getting started. Before thinking about the actual implementation using TensorFlow or any other library, I believe you should first try to understand the problem within its basic domain.
I'm not really sure what exactly you are trying to achieve, but my rough guess is that it's about determining whether a statement turns out to be beneficial to the company or not, something like a semantic-analysis type of problem.
So I strongly believe that you should first learn the various methodologies related to semantic analysis and find the most appropriate technique.
In short: theory and understanding before the actual code.
Finally, I would suggest you ask such theoretical questions on the AI Stack Exchange; here on SO we generally deal with code or something close to code.
I hope that makes sense? ;)
Drop a comment if you have any doubts.
I am trying to solve what I have realized is quite a hard problem, due to my lack of expertise in the subject. Suppose I have an image of a table with 3 rows and 5 columns. Each row contains text (let's assume only English for now) or numbers (normal Indo-Arabic numerals). There is nothing but whitespace between the columns and between the rows. Now, assuming all rows and columns are aligned, my task is to get an algorithm to recognize and extract each row from the document (I don't know if I'm articulating this well enough).
Could someone suggest a good starting point (library, similar example, textbook chapter that deals with something like this, etc.) for me to get started?
My background is data science but I have just never been exposed to computer vision.
Any help would be appreciated.
You should start off with OpenCV, as Racialz suggested. This library contains the Hough lines / Hough transform method, which should be the primary and easiest way for you to find and crop text from table sections. There are many different line-finding tasks people use this algorithm for (like THIS or THIS), but your task would be much easier, because the lines should be much clearer and simpler than in those examples. After you do the extraction, you will then need to OCR the text; for this I would suggest the Tesseract OCR engine. This engine is free, really easy to use, provides pretty decent results, and allows you to train it to recognize specific types of letters.
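Since the rows in your example are separated only by whitespace rather than ruled lines, the Hough step can also be swapped for a horizontal projection profile; here is a rough sketch of the crop-then-OCR pipeline (OpenCV + pytesseract; the file name and thresholds are placeholders to tune):

```python
# Split a whitespace-separated table image into rows, then OCR each row.
import cv2
import pytesseract

img = cv2.imread("table.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Image rows that contain ink have a non-zero sum; blank gaps between table rows are zero.
row_ink = binary.sum(axis=1)
in_row = row_ink > 0

rows, start = [], None
for y, has_ink in enumerate(in_row):
    if has_ink and start is None:
        start = y                          # a table row begins
    elif not has_ink and start is not None:
        rows.append((start, y))            # the table row ends at the next blank gap
        start = None
if start is not None:
    rows.append((start, len(in_row)))

for i, (top, bottom) in enumerate(rows):
    crop = img[top:bottom, :]
    text = pytesseract.image_to_string(crop, config="--psm 7")  # treat crop as one line
    print(f"row {i}: {text.strip()}")
```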