Schematics Recognition

bayer · April 19, 2022, 12:02pm

Hello,

My name is Johannes. As this is my first post, I’m sorry if the topic has already been raised or the category does not fit well.

I’m doing some research on the recognition of electronic schematics from hand-drawn (and printed) paper sources. Core idea: Just take a photo of a circuit you sketched, process it with our pipeline and then see a digitized version (e.g. in KiCad). We already published a first dataset for training neural network models:

Publication: [2107.10373] A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images
Associated dataset: A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images | Zenodo

Based on this we are developing a pipeline for the complete image-to-schematic process, which we also intend to make open-source.

My questions would be:

Has something like this already been done in the context of KiCad? Would it still be useful in today’s electrical engineering (considering that complex schematics are usually drawn directly in CAE systems)?
In order to improve recognition results, we are constantly looking for more schematics that we can use. In particular, since we already have some rather simple and old circuits, I would be interested in schematics of real projects. Would someone here be willing to share their KiCad schematic files with us for further publications? FYI: I’m asking students to draw the schematics by hand and annotate them, and we then use them for training our pipeline. I would like to publish all the hand-drawn versions and the KiCad sch files under open-source licenses (for the images we use Creative Commons).

paulvdh · April 19, 2022, 12:48pm

First, I wish you a prosperous future with your project.

I do not see this as a very useful project though.

Schematic entry itself is a quite trivial and quick process.
The time-consuming part is researching components, verifying that they work the way you intended, and figuring out all the details.

I find it hard to imagine that people would still draw complete schematics on paper. Some quick drafts as support for ideas in your head, sure, but you normally fill in the details only on your PC.

Such quick drafts also tend to have scratched-out parts as corrections and other stuff that is hard for AI, but simple and quick for humans.

bayer · April 19, 2022, 12:57pm

Thank you very much for your feedback! It is quite valuable to know the distribution of work time during larger engineering projects. Can you explain why finding parts is hard? Naively speaking, component libraries already exist and apparently some of them already link data sheets…

paulvdh · April 19, 2022, 1:43pm

Go to an electronics store, and search for an opamp.
Opamps are very simple building blocks for schematics, but all opamps have flaws (GBWP, offsets, bias currents, slew rate, output current, power supply voltage limitations, etc.), and therefore prices of opamps vary between a few cents and over thirty EUR.

Some years ago I did a search for 1/4 W 1k THT resistors at DigiKey and I got over 3000 results: metal and carbon types, different tolerances, different brands, etc.

The number of parts that were available before the chip shortage was completely ridiculous.
At the moment, if you’ve designed an STM32, ATmega or any other popular microcontroller into your product, you are mostly out of luck. The big shops used to have some 600+ variants of STM32 controllers, and now available stock is often ZERO, regardless of the variant you want. This may mean a redesign of your product with any microcontroller family that is capable of doing the task.

What sort of background do you have?
Do you design electronic circuits yourself, or are you just a programmer looking for some project?

jmk · April 19, 2022, 1:55pm

As Paul states:

Details include: availability (short and long term), cost, ease of manufacture, etc.

I understand what you are attempting. It would be useful to know for which CAD program or programs, if any, you intend to create the schematics from paper.

If it is your own program, will it be compatible with the plethora of free, open-source and paid CAD programs?

mwielgus · April 19, 2022, 2:05pm

I’ve never heard about any tool that does this. I would find this useful from my perspective as a computer user if it had the following properties:

Would recognize and import schematics also from non-photos, i.e. images created entirely on the computer, e.g. in GIMP, or a PDF exported from an ECAD. That’s the most common way of sharing schematics on the internet.
Would be a part of a tool with broader scope. I don’t want to choose a different tool for a different image digitization/recognition task. I want a Swiss Army knife that will image-recognize and import as many graphical representations as possible, not only schematics, or anything related to EDA, and not only to KiCad. E.g. text, mathematical formulas, music notation, PCB layouts, integrated circuit layouts, structural chemical formulas, charts, all in one.
Would be well-integrated with the desktop. I want to take a photo with my phone, or do a screenshot (I use Flameshot), click a button or two, and then immediately have the imported thing in my system’s clipboard.

Conclusion: it could be quite useful, but would have to be integrated with many other tools.

As for finding parts: there are component search engines online, but they are incomplete because vendors use different nomenclatures and usually provide datasheets as PDFs, not in a semantic format, making it hard to extract the data automatically. As a result, you often have to manually search through tables in many datasheets before you can find the right component. And the search engines are proprietary and owned by big companies, which may decide to squeeze you for money at any moment.

I would be interested in an open source machine learning-based tool that extracts information from datasheets and a component search engine built on top of it. It would also be interesting if it could be queried with natural language inputs, given the recent developments in this area.

retiredfeline · April 19, 2022, 2:58pm

It would be interesting if you manage to digitise those circuits from service manuals that are not available in schematic form. Once converted into a KiCad schematic, people would be willing to crowdsource the effort to correct and enhance popular schematics.

Also I’d be interested in how you handle junctions and hops.

bayer · April 19, 2022, 3:33pm

At the moment, I would like to focus on schematic diagrams for electric/electronic circuits. The general idea was that if a model is powerful enough to handle even hand-drawn schematics, digitizing the printed ones should be easy as well. As in all machine learning, the quality of the results depends on the training data; that’s why I’m asking for more schematics. Generally, digitizing circuits from datasheets would also be possible. But again, this relies on the availability of training data. As I definitely share your idea of having it open-source, the question arises which manufacturer would make its datasheets available… By the way: processing tables and properly associating them with the “circuit graph” is yet another story. As all of this is complex, I wanted to solve the digitization of plain schematics first…

paulvdh · April 19, 2022, 3:35pm

I did follow a few links and found:

https://arxiv.org/pdf/2107.10373.pdf

So it recognized crossovers without problems.
I do not know if it is able to remove those inductor hopovers though.

I also saw a 1.9GB file with “training data”, but did not attempt to download that.

I do admit that I find this result quite impressive, although I’m still not sure if it’s actually useful.
Even if it works flawlessly, you still have to verify manually if the schematic is correct. Verification is mostly automatic if you do manual entry, while it’s a bit of a tedious and therefore error prone step if it has to be done separately.

I am intrigued enough to find it interesting though, but not for hand-drawn schematics. How many of those still exist?

But a once computer-generated schematic from an old service manual, printed on paper, photocopied 5x, then scanned again and released as a .pdf (gif, png, etc.) file is quite common.

bayer · April 19, 2022, 3:36pm

By hops I assume you mean crossovers?

Well, the current strategy is to treat all symbols, junctions, cross-overs and texts as objects which are detected with their position and type. Afterwards, the graph is constructed by considering the lines in between (with e.g. computer vision or semantic segmentation). Later on, the raw graph is refined by graph algorithms…
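A minimal sketch of the strategy described above, not the actual pipeline: it assumes an object detector has already produced bounding boxes with a type, and that a separate step has produced straight line segments as endpoint pairs. The names `Detection`, `hit` and `extract_nets` are hypothetical. Junctions merge every wire that touches them, crossovers only connect wires running in the same direction, and symbols terminate a wire. Pins are ignored here, so a net simply lists the symbols it touches.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    name: str    # hypothetical identifier, e.g. "R1" or "J1"
    kind: str    # "symbol", "junction", "crossover" or "text"
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels


def hit(detections, point):
    """Return the first non-text detection whose box contains the point."""
    x, y = point
    for d in detections:
        x0, y0, x1, y1 = d.box
        if d.kind != "text" and x0 <= x <= x1 and y0 <= y <= y1:
            return d
    return None


def extract_nets(detections, segments):
    """Group line segments into nets and list the symbols on each net."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}               # pass-through key -> segment index
    touching = defaultdict(set)   # segment index -> symbol names
    for i, (start, end) in enumerate(segments):
        horizontal = abs(end[0] - start[0]) >= abs(end[1] - start[1])
        for point in (start, end):
            d = hit(detections, point)
            if d is None:
                continue
            if d.kind == "symbol":
                touching[i].add(d.name)
                continue
            # A junction joins all wires; a crossover joins only wires
            # that share its axis (horizontal or vertical).
            key = (d.name, horizontal if d.kind == "crossover" else None)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    nets = defaultdict(set)
    for i, names in touching.items():
        nets[find(i)] |= names
    return sorted(sorted(names) for names in nets.values())
```

The later refinement by graph algorithms mentioned above (for example merging duplicate detections or snapping near-miss endpoints) is left out of this sketch.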

bayer · April 19, 2022, 3:38pm

Well, to be perfectly honest, these are annotations, not predictions (i.e. we labeled it this way; the AI did not detect it). However, as crossovers are quite prevalent, their detection is already working quite well…

paulvdh · April 19, 2022, 3:40pm

Hopovers are an old KiCad joke.

I’m relieved though that that thread (and similar ones) has been closed now.

Also:
Please don’t start the hopover discussion again. Everything has been said already and it quickly gets out of hand and hijacks threads.

retiredfeline · April 19, 2022, 3:41pm

Junctions vs hops is a long-standing joke on this forum. It’s probably one of the easier problems, if not already solved.

bayer · April 19, 2022, 3:45pm

I can imagine this junctions-hops discussion took quite a while. Well, I don’t mean to re-open it. Just saying: as I already wrote an importer for our recognition results to KiCad, I was quite surprised that they are not explicitly denoted in the sch files… But of course I don’t know the whole motivation behind it. All I can say: hops do exist “as an optical feature on the image” (which can be detected and subsequently has to be utilized).

bayer · April 19, 2022, 3:51pm

Indeed, these are valid points! Resolving the relations between symbols and texts is something we already have on the table. But you are right, without more external knowledge, it is hard doing this for specialized components. Our current intention here is to integrate external knowledge like product catalogues (future work).

Regarding busses: YES! That’s one of the reasons I would love to have more training material.

paulvdh · April 19, 2022, 4:06pm

So you’ve got a dataset of around 1600 images, is that correct?

Have you done things like below to get more data?

I also do not understand why you want to limit your dataset to hand drawn schematics.
You already realized that most “real” input would be CAD program generated data (printed, photocopied 5x, scanned again, etc.). I thought that AIs are best trained with data that matches the real-life use.

The image below was probably computer generated, but probably quite a challenge to recognize.

Source:
http://getdrawings.com/get-drawing#circuits-drawing-8.gif

I’m also curious as to how you verify that your results are correct.
KiCad is able to load a graphical image into a schematic (Schematic Editor / Place / Add Image) and the image can be scaled and this helps with comparisons.

Also, why limit yourself to schematics?
This could become a great tool for reverse-engineering PCBs.

https://html.duckduckgo.com/html?q=Non-destructive+PCB+Reverse+Engineering+Using+X-ray+Micro+Computed+Tomography

This one has a lengthy description and plenty of nice pictures of PCB reverse engineering:
https://html.duckduckgo.com/html?q=Printed+Circuit+Board+Deconstruction+Techniques

If your software could get both a schematic and a PCB into KiCad, and both free of DRC errors, that would be quite a feat.

bayer · April 19, 2022, 4:25pm

Well, basically I would prefer not to limit myself to hand-drawn circuits. So here are the issues I face:

Most importantly, I need a dataset. This comprises images and annotations (we usually create the annotations ourselves). It should be public in order to allow for reproducible results in training the model. A lot of effort goes into the annotation. Of course, I could do this on material I found on the internet. But then I cannot make my dataset public. There are workarounds, e.g. only making the annotations public and linking to the original sources. But then the consistency of the dataset relies on external sources no one can control.
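A minimal sketch of the annotations-only workaround mentioned above, under the assumption that each published annotation record stores a checksum of the image it was created from. The record layout and the names `manifest_entry` and `still_matches` are hypothetical. This does not remove the dependency on external sources; it only lets a user detect that a linked image has changed or been replaced since it was annotated.

```python
import hashlib


def manifest_entry(source_url, image_bytes, annotations):
    """Build a publishable record that points at the image instead of embedding it."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "annotations": annotations,
    }


def still_matches(entry, downloaded_bytes):
    """Check that a freshly downloaded image is the one that was annotated."""
    return hashlib.sha256(downloaded_bytes).hexdigest() == entry["sha256"]
```

A record whose check fails would have to be dropped or re-annotated, which is exactly the consistency problem described above.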

Regardless of whether I work with hand-drawn or printed circuits, I simply want to avoid legal issues (of course, using a search engine was also the starting point for me).

Again, that’s why I’m looking for schematics from people who allow me to use them.

bayer · April 19, 2022, 4:27pm

Mapping schematics to PCBs (or back) is also a very interesting research topic I have thought about for the future… If you can provide me with 200+ pairs (even better 1000+) of schematics and corresponding PCBs that are freely available, I would be happy to start working on this.

johannespfister · April 19, 2022, 5:34pm

This is even more problematic. Let’s say there is a microcontroller split into 3 parts. Maybe only one of these parts shows the type, and the other 2 have the same identifier (something like IC102) but with different one-letter suffixes (IC102a, IC102b and IC102c). So the software would have to recognize that. Then there is the problem of the library. You probably don’t have the same part in the library, and if you do, you probably don’t have the pins at the correct locations.
As for the pins: a pin can have multiple functions, something like P4.2/UART1RX/I2C1D/SPIDIN, and the drawn schematic may only list some of them (the ones that are needed).
On top of that, different companies, engineers and programs use different conventions and standards. I don’t believe an AI has ever been created that can deal with such things, since it needs to understand the context to interpret the data correctly. For example, is CONTR1012.0 an identifier of the same part as CONTR1012.1? Or are these 2 identifiers of 2 different parts? Or is this some other random text? Or is this the part number? Or are these 2 net class labels, so that both nets belong to netclass CONTR1012? Or is this the function of 2 pins? An engineer can make sense of the schematic from the context and the function, but even for me it can become too hard to understand what is referring to what.
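To make the easy part of this concrete, here is a rough sketch that handles only the two regular cases from the post: one-letter unit suffixes such as IC102a/IC102b/IC102c, and slash-separated pin functions where the drawing lists only a subset. The pattern and the function names are assumptions for illustration. Ambiguous strings such as CONTR1012.0 deliberately fall through unchanged, because resolving them needs the context described above.

```python
import re
from collections import defaultdict

# Assumed convention: letters, digits, then exactly one unit letter.
UNIT_PATTERN = re.compile(r"^([A-Za-z]+\d+)([A-Za-z])$")


def group_units(designators):
    """Group recognized designators by base reference, e.g. IC102a -> IC102."""
    groups = defaultdict(list)
    for text in designators:
        match = UNIT_PATTERN.match(text)
        if match:
            groups[match.group(1)].append(match.group(2).lower())
        else:
            groups[text].append("")   # no unit suffix recognized
    return {base: sorted(units) for base, units in groups.items()}


def pin_functions(label):
    """Split a multi-function pin label into its individual functions."""
    return [part for part in label.split("/") if part]


def matches_library_pin(drawn_label, library_label):
    """True if every function written on the drawing exists on the library pin."""
    return set(pin_functions(drawn_label)) <= set(pin_functions(library_label))
```

For example, `group_units(["IC102a", "IC102b", "IC102c", "R7"])` collects the three units under `IC102` and leaves `R7` as a single-unit part, and a drawn label `P4.2/UART1RX` is accepted as a match for the library pin `P4.2/UART1RX/I2C1D/SPIDIN`.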

You then have to deal with things like missing pin numbers. Do you just assign a number? Or do you somehow store that this pin number is missing? For something like a resistor it does not matter, but for something like an LED it does, and for capacitors it only matters sometimes.
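One possible answer to the question above, sketched under stated assumptions: keep a missing pin number as an explicit "unknown" value, and only fill it in automatically for component kinds whose pins are interchangeable. The `Pin` type, the `SYMMETRIC_KINDS` set and `fill_missing_numbers` are hypothetical names, and which kinds count as symmetric is an assumption a real tool would have to take from its library.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for illustration: kinds whose pins can be swapped freely.
SYMMETRIC_KINDS = {"resistor", "capacitor_unpolarized"}


@dataclass
class Pin:
    number: Optional[str]    # None means the number was missing or illegible
    inferred: bool = False   # True if the number was assigned, not read


def fill_missing_numbers(kind, pins):
    """Assign free numbers only where the pin order cannot matter."""
    if kind not in SYMMETRIC_KINDS:
        return pins   # keep None so a human has to decide (e.g. for an LED)
    used = {p.number for p in pins if p.number is not None}
    free = (str(n) for n in range(1, len(pins) + 1) if str(n) not in used)
    return [p if p.number is not None else Pin(next(free), inferred=True)
            for p in pins]
```

Marking assigned numbers as inferred keeps the information that the drawing itself did not contain them.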

To make it short: this is an extremely difficult project and I don’t think it can succeed for anything more than schematics with only simple components.

johannespfister · April 19, 2022, 5:46pm

Try something simpler. Maybe a schematic to ASCII art converter (at least you don’t need AI for that). Or generate just the netlist (but I think this is already too hard for current technology).