I teach KiCad and every year, at least one student takes someone else’s project, modifies it and tries to pass it off as their own. I catch them most of the time, but not always. So I thought I might be able to detect such files by comparing UUIDs. Basically, I’d dump all component UUIDs from all projects and then search for duplicates. If I understand the docs correctly, UUIDs are preserved even when moving projects between different operating systems, changing the project name (Save As), etc.
Would that work? Are there any caveats I should be aware of? Should I compare entire UUIDs (128 bits) or only certain blocks? I tried googling for scripts or tools that would do this, but found none. Thanks.
KiCad uses the UUIDs to match schematic symbols with PCB footprints, so at least those UUIDs normally don’t change. However, if (part of) a schematic is cut and then re-inserted on another (hierarchical) sheet with Paste Special, you have to preserve the RefDes and then re-synchronize with the PCB using the RefDes. When the parts (block) are inserted in the other sheet, the UUIDs change.
No. As I was writing above while you posted, UUIDs are changed when a circuit (section) is pasted. UUIDs would not be unique if there were duplicates. The sole purpose of a UUID is to be unique.
In KiCad, every object has a UUID, even wire segments and junction dots. I don’t know what the use of that is and never looked into the details, but it does seem a bit exaggerated to me (and it increases file size).
KiCad’s file format is based on S-expressions, and most programming languages have a library that reads such files into memory with just a few lines of code. The whole idea behind UUIDs is that they have so many bits that the chance of a collision is negligible. When you reduce the number of bits you check, the chance of collisions increases. In a moderately small project of mine, there are around 500 UUIDs in the schematic and 7000 UUIDs in the PCB file. Is it a performance problem to check that many UUIDs across the students’ work? (Or do you want to cross-check with other years’ work, or the whole of GitHub?) Checking only part of the UUID seems like premature optimisation to me. Start with the standard checks and optimize only if needed.
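To put rough numbers on the "fewer bits, more collisions" point, here is a small sketch using the standard birthday-bound approximation. It assumes the compared bits behave like uniformly random values; the thread does not confirm how KiCad generates its IDs (a version-4 UUID, for instance, carries 122 random bits rather than 128). The function name and the example figures are only illustrative.

```python
import math


def collision_probability(n_ids, bits):
    """Approximate chance that at least two of n_ids random values collide.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes every value is drawn uniformly from 2**bits possibilities.
    """
    pairs = n_ids * (n_ids - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0**bits)


if __name__ == "__main__":
    # Illustrative figure only: 60 projects x 7500 IDs each.
    n = 60 * 7500
    for bits in (32, 64, 128):
        print(f"{bits:3d} bits compared: p = {collision_probability(n, bits):.3g}")
```

With a few hundred thousand IDs, comparing only a 32-bit block would make accidental matches almost certain, while the full width keeps them negligible, so there is little reason to truncate.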
So even individual symbol and footprint primitives (lines, polylines etc.) get new UUIDs whenever they are placed into the project? That sounds promising. Students turn in 30 to 60 projects every year and they tend to be simple. If I limit the dump only to component UUIDs, it will be a few thousand entries at maximum.
I looked at KiCad’s internal Python script examples, but TBH their syntax looks… daunting. Moreover, I need the script to search a given directory and its subdirectories and dump UUIDs from all kicad_sch and kicad_pcb files it finds. Will standard functions like os.listdir() or modules like glob work in the internal Python? Any tips on which class I should then use to get the UUIDs? (The search field at the official KiCad Python docs page searches only their names, not fulltext.)
I would not do this with KiCad’s internal Python scripting; I would write a separate program for it.
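As a starting point for such a separate program, the directory search could look roughly like this. It is a sketch using only the Python standard library; the function name `find_kicad_files` is made up for the example, and it simply matches on the two file extensions mentioned in the question.

```python
from pathlib import Path

# File extensions mentioned in the question.
KICAD_SUFFIXES = {".kicad_sch", ".kicad_pcb"}


def find_kicad_files(root):
    """Return all schematic and PCB files below root, searched recursively."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in KICAD_SUFFIXES
    )


if __name__ == "__main__":
    import sys

    # Usage: python find_kicad_files.py <directory with student projects>
    for path in find_kicad_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```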
As I previously posted, reading an S-expression based file from disk is just a few lines of code.
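If all you need is the IDs themselves, you can even skip the S-expression parsing and just collect every UUID-shaped string from the file text. The sketch below does that with a regular expression; note that this is a shortcut, not a real parser, so it also picks up IDs of wires, junctions and anything else that carries one. `extract_uuids` is a name invented for the example.

```python
import re

# Matches the usual 8-4-4-4-12 hexadecimal UUID layout, quoted or not.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def extract_uuids(text):
    """Return the set of UUID-shaped strings in text, lower-cased."""
    return {match.lower() for match in UUID_RE.findall(text)}


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python extract_uuids.py some_file.kicad_sch
    content = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    for uuid in sorted(extract_uuids(content)):
        print(uuid)
```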
With a command line program, you can easily script it. Maybe you want to create a (sort of) database which holds all the UUIDs extracted from past years and from multiple classes of students.
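Such a database does not need to be anything fancy. Below is a minimal sketch with Python’s built-in `sqlite3` module: one table of (project, uuid) pairs, plus a query that lists every UUID occurring in two different projects. The table layout, the function names and the project labels are all assumptions made for the example.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the UUID database; ':memory:' keeps it in RAM."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "project TEXT NOT NULL, uuid TEXT NOT NULL, "
        "PRIMARY KEY (project, uuid))"
    )
    return con


def add_project(con, project, uuids):
    """Store all UUIDs of one project; pairs already stored are skipped."""
    con.executemany(
        "INSERT OR IGNORE INTO seen (project, uuid) VALUES (?, ?)",
        [(project, u.lower()) for u in uuids],
    )
    con.commit()


def shared_uuids(con):
    """Return (uuid, project_a, project_b) for every UUID two projects share."""
    rows = con.execute(
        "SELECT a.uuid, a.project, b.project FROM seen a "
        "JOIN seen b ON a.uuid = b.uuid AND a.project < b.project "
        "ORDER BY a.project, b.project, a.uuid"
    )
    return rows.fetchall()
```

Passing a file name instead of `":memory:"` keeps the data between runs, so each new class can be checked against previous years.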
As your only concern is to find plagiarism and you do not intend to modify or even interact with those projects, I also do not see any benefit in using Python inside KiCad for this. Have you looked at KiCad’s files in a text editor? S-expressions are human readable and have a very simple structure.
With an external script, you can also easily do other things. For example, there are utilities that can create PNG images of different projects and then XOR them to show the differences. Such utilities are mostly made for comparing different versions of the same project. You can also do things such as launching two KiCad instances with different projects to examine them further.
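To show how simple the structure is, here is a sketch of a tiny S-expression reader that turns the text into nested Python lists, followed by a helper that picks the UUIDs of the direct `symbol` children only. The token names `symbol`, `uuid` and `lib_id` and the sample text are assumptions for illustration; check them against your own files in a text editor, since they may differ between KiCad versions and between schematic and PCB files.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare word.
TOKEN_RE = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse one S-expression into nested lists of strings."""
    stack = [[]]
    for token in TOKEN_RE.findall(text):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif token.startswith('"'):
            stack[-1].append(token[1:-1])  # strip quotes, keep escapes as-is
        else:
            stack[-1].append(token)
    if len(stack) != 1 or not stack[0]:
        raise ValueError("unbalanced or empty input")
    return stack[0][0]


def child_value(node, name):
    """Return the first value of the child list starting with name, or None."""
    for item in node:
        if isinstance(item, list) and item and item[0] == name:
            return item[1] if len(item) > 1 else None
    return None


def symbol_uuids(tree):
    """Map UUID -> lib_id for direct (symbol ...) children of the top node."""
    result = {}
    for item in tree:
        if isinstance(item, list) and item and item[0] == "symbol":
            uuid = child_value(item, "uuid")
            if uuid:
                result[uuid.lower()] = child_value(item, "lib_id")
    return result


# Hypothetical, heavily shortened sample; real files contain far more fields.
SAMPLE = '''(kicad_sch
  (wire (uuid "11111111-1111-1111-1111-111111111111"))
  (symbol (lib_id "Device:R")
    (uuid "22222222-2222-2222-2222-222222222222")))'''
```

Calling `symbol_uuids(parse_sexpr(SAMPLE))` returns only the resistor’s ID and skips the wire, which is the kind of filtering needed to limit the dump to components.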
KiCad also has a command line interface (called kicad-cli).
I guess that UUIDs do not change when things are moved or rearranged on a schematic, but any copy or paste action will generate new UUIDs (even “Paste Special”, which only preserves the RefDes). Making some very simple dummy projects (for example with a shorted resistor) and doing some comparisons will give you more insight into the details. Things like this are not part of the KiCad user documentation, as users are not supposed to interact with the UUIDs (called “timestamp” in historic KiCad versions; “timestamp” is still used to refer to UUIDs in some parts of the software).
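For those dummy-project experiments, a simple way to see what an action does is to collect the IDs before and after it and compare the two sets. A minimal sketch, assuming you already have the IDs as two collections of strings (however you extracted them); `diff_uuid_sets` is a made-up helper name.

```python
def diff_uuid_sets(before, after):
    """Compare the UUIDs of a file before and after an editing action."""
    before, after = set(before), set(after)
    return {
        "kept": before & after,   # survived the action unchanged
        "new": after - before,    # generated by the action
        "gone": before - after,   # no longer present
    }


def summarize(diff):
    """One-line summary, e.g. for printing after each experiment."""
    return ", ".join(f"{key}: {len(ids)}" for key, ids in diff.items())
```

If moving a symbol reports everything as "kept" while cut and paste reports "new" and "gone" entries, that confirms the guess above for the KiCad version you are using.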