Schematics Recognition

retiredfeline · April 19, 2022, 2:58pm

It would be interesting if you managed to digitise those circuits from service manuals that are not available in schematic form. Once converted into a KiCad schematic, people would be willing to crowdsource the effort to correct and enhance popular schematics.

Also I’d be interested in how you handle junctions and hops.

bayer · April 19, 2022, 3:33pm

At the moment, I would like to focus on schematic diagrams for electric/electronic circuits. The general idea was that if a model is powerful enough to handle even hand-drawn schematics, digitizing the printed ones should be easy as well. As in all machine learning, the quality of the results depends on the training data; that’s why I’m asking for more schematics. Generally, digitizing circuits from datasheets would also be possible. But again, this relies on the availability of training data. As I definitely share your idea of having it open-source, the problem arises of which manufacturer would make its datasheets available… By the way: processing tables and properly associating them with the “circuit graph” is yet another story. As all of this is complex, I wanted to solve the digitization of plain schematics first…

paulvdh · April 19, 2022, 3:35pm

I did follow a few links and found:

https://arxiv.org/pdf/2107.10373.pdf

So it recognized crossovers without problems.
I do not know if it is able to remove those inductor hopovers though.

I also saw a 1.9GB file with “training data”, but did not attempt to download that.

I do admit that I find this result quite impressive, although I’m still not sure if it’s actually useful.
Even if it works flawlessly, you still have to verify manually if the schematic is correct. Verification is mostly automatic if you do manual entry, while it’s a bit of a tedious and therefore error prone step if it has to be done separately.

I am intrigued enough to find it interesting though, but not for hand-drawn schematics. How many of those still exist?

But a once computer generated schematic from an old service manual, printed on paper 5x photocopied then scanned again and released as a .pdf (gif, png, etc) file is quite common.

bayer · April 19, 2022, 3:36pm

By hops I assume you mean crossovers?

Well, the current strategy is to treat all symbols, junctions, cross-overs and texts as objects which are detected with their position and type. Afterwards, the graph is constructed by considering the lines in between (with e.g. computer vision or semantic segmentation). Later on, the raw graph is refined by graph algorithms…
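
To make the strategy described above concrete, here is a minimal Python sketch. It is not the project's actual code. It assumes the detector has already produced wire segments with snapped endpoint coordinates, junction positions and pin positions; the names `DisjointSet`, `on_segment` and `build_nets` are hypothetical. Wires that merely cross stay in separate nets unless a junction was detected at the crossing point.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps the trees shallow.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def on_segment(p, a, b, tol=1e-6):
    """True if point p lies on the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > tol:
        return False
    return (min(ax, bx) - tol <= px <= max(ax, bx) + tol
            and min(ay, by) - tol <= py <= max(ay, by) + tol)


def build_nets(wires, junctions, pins):
    """Group pin names into nets.

    wires: list of ((x1, y1), (x2, y2)) segments (assumed input format)
    junctions: list of (x, y) detected junction dots
    pins: dict mapping a pin name to its (x, y) position
    """
    ds = DisjointSet()
    # Wires that share an endpoint belong to the same net.
    for i, (a1, b1) in enumerate(wires):
        ds.find(("wire", i))
        for j in range(i):
            a2, b2 = wires[j]
            if {a1, b1} & {a2, b2}:
                ds.union(("wire", i), ("wire", j))
    # A junction merges every wire passing through it.
    for point in junctions:
        touching = [i for i, (a, b) in enumerate(wires)
                    if on_segment(point, a, b)]
        for i in touching[1:]:
            ds.union(("wire", touching[0]), ("wire", i))
    # A pin joins the wire that ends on it.
    for name, point in pins.items():
        ds.find(("pin", name))
        for i, (a, b) in enumerate(wires):
            if point in (a, b):
                ds.union(("pin", name), ("wire", i))
    groups = defaultdict(list)
    for name in pins:
        groups[ds.find(("pin", name))].append(name)
    return sorted(sorted(names) for names in groups.values())


# Two wires crossing at (5, 0): a crossover first, then a junction.
wires = [((0, 0), (10, 0)), ((5, -5), (5, 5))]
pins = {"R1.1": (0, 0), "R2.1": (10, 0), "C1.1": (5, -5), "C1.2": (5, 5)}
print(build_nets(wires, [], pins))        # two separate nets
print(build_nets(wires, [(5, 0)], pins))  # one merged net
```

A real pipeline would need tolerance-based snapping of endpoints and a later refinement pass, which this sketch leaves out.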

bayer · April 19, 2022, 3:38pm

Well, to be perfectly honest, these are annotations, not predictions (i.e. we labeled it this way; the AI has not detected it yet). However, as crossovers are quite prevalent, their detection is already working quite well…

paulvdh · April 19, 2022, 3:40pm

Hopovers are an old KiCad joke.

I’m relieved though that that thread (and similar threads like it) have been closed now.

Also:
Please don’t start the hopover discussion again. Everything has been said already and it quickly goes out of hand and hijacks threads.

retiredfeline · April 19, 2022, 3:41pm

Junctions vs hops is a long-standing joke on this forum. It’s probably one of the easier problems, if not already solved.

bayer · April 19, 2022, 3:45pm

I can imagine this junctions-hops discussion took quite a while. Well, I don’t mean to re-open it. Just saying: as I already wrote an importer for our recognition results to KiCad, I was quite surprised that hops are not explicitly denoted in the sch files… But of course I don’t know the whole motivation behind it. All I can say: hops do exist “as an optical feature on the image” (which can be detected and subsequently utilized).

bayer · April 19, 2022, 3:51pm

Indeed, these are valid points! Resolving the relations between symbols and texts is something we already have on the table. But you are right, without more external knowledge, it is hard doing this for specialized components. Our current intention here is to integrate external knowledge like product catalogues (future work).

Regarding buses: YES! That’s one of the reasons I would love to have more training material.

paulvdh · April 19, 2022, 4:06pm

So you’ve got a dataset of around 1600 images, is that correct?

Have you done things like below to get more data?

I also do not understand why you want to limit your dataset to hand drawn schematics.
You already realized that most “real” input would be CAD program generated data (printed, 5x photocopied, scanned again, etc). I thought that AIs are best trained with data that matches the real-life use.

The image below was probably computer generated, but probably quite a challenge to recognize.

Source:
http://getdrawings.com/get-drawing#circuits-drawing-8.gif

I’m also curious as to how you verify that your results are correct.
KiCad is able to load a graphical image into a schematic (Schematic Editor / Place / Add Image) and the image can be scaled and this helps with comparisons.

Also, why limit yourself to schematics?
This could become a great tool for reverse-engineering PCBs.

https://html.duckduckgo.com/html?q=Non-destructive+PCB+Reverse+Engineering+Using+X-ray+Micro+Computed+Tomography

This one has a lengthy description and plenty of nice pictures of PCB reverse engineering:
https://html.duckduckgo.com/html?q=Printed+Circuit+Board+Deconstruction+Techniques

If your software could get both a schematic and a PCB into KiCad, and also DRC error free that would be quite a feat.

bayer · April 19, 2022, 4:25pm

Well, basically I would prefer not to limit myself to hand-drawn circuits. So here are the issues I face:

Most importantly, I need a dataset. This comprises images and annotations (we usually create the annotations ourselves). It should be public in order to allow for reproducible results in training the model. A lot of effort goes into the annotation. Of course, I could do this on stuff I found on the internet. But then, I cannot make my dataset public. There are workarounds, e.g. only making the annotations public and linking to the original sources. But then the consistency of the dataset relies on external sources no-one can control.
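
As an illustration of the "annotations plus links" workaround mentioned above, here is a small Python sketch. It is only an assumption about how such a manifest could look, not something the project is known to use. Each entry stores a checksum of the image the annotation was made on, so a later download can at least be checked against it. The function names, field names and the example URL are hypothetical placeholders.

```python
import hashlib


def make_entry(source_url, image_bytes, annotation_file):
    """Record where an image came from and a SHA-256 digest of its bytes."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "annotation": annotation_file,
    }


def source_unchanged(entry, downloaded_bytes):
    """True if a fresh download still matches the annotated image."""
    return hashlib.sha256(downloaded_bytes).hexdigest() == entry["sha256"]


# Placeholder URL and bytes, for illustration only.
entry = make_entry("https://example.org/schematic-001.png",
                   b"original scan", "schematic-001.json")
print(source_unchanged(entry, b"original scan"))    # True
print(source_unchanged(entry, b"re-encoded scan"))  # False
```

This only detects that a source has changed or been re-encoded; it cannot bring back a file that has disappeared, so the consistency problem remains.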

Regardless of whether I work with hand-drawn or printed circuits, I simply want to avoid legal issues (of course, using a search engine was also the starting point for me).

Again, that’s why I’m looking for schematics from people who allow me to use them.

bayer · April 19, 2022, 4:27pm

Mapping schematics to PCBs (or back) is also a very interesting research topic I have thought about for the future… If you can provide me with 200+ pairs (even better 1000+) of schematics and corresponding PCBs that are freely available, I would be happy to start working on this.

johannespfister · April 19, 2022, 5:34pm

This is even more problematic. Let’s say there is a microcontroller split into 3 parts. Maybe only one of these parts shows the type, while the other 2 have the same identifier (something like IC102) but with different one-letter suffixes (IC102a, IC102b and IC102c). So the software would have to recognize that. Then there is the problem of the library. You probably don’t have the same part in the library, and if you do, you probably don’t have the pins at the correct locations.
As for the pins: a pin can have multiple functions, something like P4.2/UART1RX/I2C1D/SPIDIN, and the drawn schematic may list only some of them (the ones that are needed).
On top of that, different companies, engineers and programs use different conventions and standards. I don’t believe an AI has ever been created that can deal with such things, since it needs to understand the context to interpret the data correctly. For example, is CONTR1012.0 an identifier of the same part as CONTR1012.1? Or are these 2 identifiers of 2 different parts? Or is this some other random text? Or is it the part number? Or are these 2 net class labels, so that both nets belong to netclass CONTR1012? Or is this the function of 2 pins? An engineer can make sense of the schematic from the context and the function, but even for me it can become too hard to understand what is referring to what.
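
The multi-unit case above can be sketched in Python as follows. This is a minimal example that assumes one particular convention (letters, digits, optional single lowercase unit suffix); the pattern and the name `group_units` are hypothetical. Labels that do not fit the convention, such as CONTR1012.0, are deliberately left unresolved instead of being guessed at, which is exactly where the context problem begins.

```python
import re
from collections import defaultdict

# Assumed convention: letters, digits, optional one-letter unit suffix.
DESIGNATOR = re.compile(r"([A-Z]+\d+)([a-z])?")


def group_units(labels):
    """Group labels like IC102a/IC102b under their base designator."""
    parts = defaultdict(list)
    unresolved = []
    for label in labels:
        match = DESIGNATOR.fullmatch(label)
        if match:
            # An empty string marks a part drawn as a single unit.
            parts[match.group(1)].append(match.group(2) or "")
        else:
            unresolved.append(label)
    return dict(parts), unresolved


parts, unresolved = group_units(
    ["IC102a", "IC102b", "IC102c", "R5", "CONTR1012.0"])
print(parts)       # {'IC102': ['a', 'b', 'c'], 'R5': ['']}
print(unresolved)  # ['CONTR1012.0']
```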

You then have to deal with things like missing pin numbers. Do you just assign a number? Or do you somehow store that this pin number is missing? For something like a resistor it does not matter, but for something like an LED it does, and for capacitors it matters only sometimes.
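
One possible answer to the question above, sketched in Python under stated assumptions: keep a missing pin number as `None` and do not invent one, then flag the component for manual review only when its kind is polarity-sensitive. The `Pin` class, the kind names and `needs_review` are hypothetical and not taken from any existing tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed kind names; a polarized capacitor is the "sometimes" case.
POLARITY_SENSITIVE = {"led", "diode", "capacitor_polarized"}


@dataclass
class Pin:
    position: Tuple[int, int]
    number: Optional[str] = None  # None means "not printed on the drawing"


def needs_review(kind, pins):
    """True if a missing pin number could change the circuit's meaning."""
    return kind in POLARITY_SENSITIVE and any(p.number is None for p in pins)


print(needs_review("resistor", [Pin((0, 0)), Pin((10, 0))]))  # False
print(needs_review("led", [Pin((0, 0), "1"), Pin((10, 0))]))  # True
```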

To make it short: this is an extremely difficult project and I don’t think it can succeed for anything more than schematics with only simple components.

johannespfister · April 19, 2022, 5:46pm

Try something simpler. Maybe a schematic to ASCII art converter (at least you don’t need AI for that). Or generate just the netlist (but I think this is already too hard for current technology).

RRPollack · April 19, 2022, 5:53pm

Intel released images of the schematic for the MCS-4 family of microprocessor chips from 1971. They are available under the CC-BY-NC license. A team of fans then converted the original schematics to a simple B&W BMP image for use in a simulator. The simulator can export a verified netlist containing component references, positions, and connections.

I used the BMP image and netlists as input to a Java program which converted the output to Cadsoft Eagle schematic scripts to recreate the schematic as an Eagle project. This is a greatly reduced version of the problem you’re tackling.

Maybe this would help get you started.

The original investigative project is documented at 4004.com
My re-creation project is described in my blog at insanity4004.blogspot.com

You may also want to look at a presentation James Lewis gave at KiCon 2019 entitled “Preserving history with KiCad”.

bayer · April 19, 2022, 8:39pm

Well, again: an image-to-netlist pipeline is exactly what we are focusing on right now. Basically, all the necessary steps are working (at least for simple circuits). Training on more complex and realistic sample circuits should help to boost the recognition performance. Likewise, it should help to address aspects we haven’t captured before. So for example, some circuits with buses would help. The same goes for circuits with complex switches and relays.

So if you have such circuits, please let me know…

bayer · April 19, 2022, 8:46pm

Thank you, this looks interesting!

retiredfeline · April 19, 2022, 11:46pm

I think you have missed the true calling of your project. It will be useful not in inputting schematics by hand but for reconstructing schematics from possibly degraded machine generated plans. There must be heaps of plans in archives which have no usable schematic files because they have been lost or are in an unreadable format. If you can get hold of a substantial corpus of such plans, you can ditch the camera.

I hardly ever do any schematics on paper now except for simple subcircuits. Some of those I even do in my head, a skill honed through tuning out of sermons as a kid. For anything substantial I need to input a schematic, reason the operation of the circuit on that, modify and repeat.

bayer · April 20, 2022, 12:10am

Well, I would love to include such printed schematics as well. So are you aware of a large corpus (yes, ideally a single one) of printed schematics whose owner would be fine with republishing them along with the annotations we would create?

retiredfeline · April 20, 2022, 12:20am

Sorry, I worked on software most of my life. You’re welcome to use any schematics of my amateur PCB projects, which I generally publish under the MIT license, but it’s a very small collection: retiredfeline / Repositories · GitHub. Generally they are the projects called something-board.

PS: In the retrocomputing world there are lots of designs involving old CPUs whose schematics would have features like buses. Some of those designs are released under liberal licenses. Here’s one site: Sitemap [RetroBrew Computers Wiki]