I wasn’t talking about Python at all here, but about a GraphQL schema. When you query the API it just returns JSON, which you can trivially turn into a Python dict.
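To illustrate, a minimal sketch assuming the requests library; the URL and the query shape here are placeholders, not the real Partinfo schema:

```python
import requests  # assuming the requests library is available

# Placeholder endpoint and query -- the real field names depend on the
# GraphQL schema you are talking to; this only shows the mechanics.
resp = requests.post(
    "https://example.com/graphql",
    json={"query": "{ parts { description } }"},
)

data = resp.json()  # the JSON response body is now a plain Python dict
print(data["data"])
```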
I think the point here is to settle on some library / language that could be used in all the tools. That’s why I was leaning towards Python.
I may not be the right person to help with maintenance and integration with other languages.
About using some centralized service / server that the tools (KiCad and others) communicate with, and which is responsible for the “scraping” (building the integration around kitspace-partinfo if it is not local): I may not agree. In my point of view this may create a rigid dependence (in case of server overload / overuse).
I don’t disagree that it creates a rigid dependence but am still arguing for it as the least bad option since I have had some bad experiences trying to scrape these sites. With a centralized service we can combine scraping approaches, pool money for API access, cache requests and use it from any language regardless of the implementation. If it breaks because of overuse we can tackle the problem collectively.
If you want to start an effort to collaborate on scrapers using Python though I’d be happy to try and make use of it in Partinfo too and contribute back. I prefer to use Javascript for web stuff, but I am a Python developer too.
I do recommend using headless Chrome though for the best chance at not getting blocked. Looks like there is a good Python lib too.
Aisler seems to be very friendly towards KiCad, and with their “precious parts” service:
https://aisler.net/products/parts
the goal seems to be that you give them your KiCad project, and they source all the parts and put them on the PCB they make for you.
I have never made any use of Aisler’s services, but maybe there could be some kind of cooperation from that direction.
I think our library could also communicate with and use Partinfo. But not only that: I think it is important to have a way to rely on just my own computer.
This Python lib is dependent on the Chrome browser. I don’t think it is a good idea to have a browser / OS dependency.
Just came across this thread and maybe we can help out a bit.
As mentioned before, our Precious Parts offering natively makes use of the DigiKey and Farnell APIs. We’d be happy to provide open access to our part search free of charge.
Does that sound like a plan? If so we’d have to build an open API for this.
As we do not make our money with advertising we’ve no plans to charge or (like Octopart) lock down the API.
You will need some dependence on something at some point. Just keep it in mind as a tool if you run into scraping issues. I hope that a Partinfo fallback in your lib won’t cause problems for making use of your lib in Partinfo.
@aisler, very generous! As you know I am waiting to make use of your API in Partinfo but I didn’t want to announce anything without checking with you first.
+1 from my side…
The best option to avoid being banned by a distributor is to act with a client that emulates browser behavior… no heavy mass requests from a single IP or server.
IMO a standard API is a dead route, because of the dependency on the distributors’ inclination.
Great, but I don’t really know how to stay clear of robot detection / do browser emulation.
My concern here: I can’t tell users of KiCost (for example) that they can only use it by installing Chrome or a Linux OS (because of the package used for browser emulation).
So browser emulation may be the way, since it is not tied to one browser, one OS, or a lot of user configuration.
Sure, maybe you can emulate a browser well enough that you are not flagged. I think actually using a browser is the safest bet though. Chrome Headless is available for all platforms, doesn’t need configuration and shouldn’t interfere with the user’s default browser at all. Did you see this note in the Pyppeteer docs?
Note : When you run pyppeteer first time, it downloads a recent version of Chromium (~100MB). If you don’t prefer this behavior, run
pyppeteer-install
command before running scripts which uses pyppeteer.
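A minimal sketch of what using it could look like (the target URL is just a placeholder):

```python
import asyncio
from pyppeteer import launch

async def fetch_page(url):
    # Launches the bundled Chromium (downloaded on first run, or
    # ahead of time via the pyppeteer-install command mentioned above).
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto(url)
    html = await page.content()  # the fully rendered page HTML
    await browser.close()
    return html

# Placeholder URL -- point this at whatever product page you want to scrape.
html = asyncio.get_event_loop().run_until_complete(
    fetch_page("https://example.com/product/123")
)
print(len(html))
```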
Before I would resort to web scrapers again, I would want to see someone scrape mouser.com successfully for a week or so. They’re using Distil and I haven’t seen any people publicly claiming success at scraping of Distil-protected sites. And if you could, how long would that solution work? And how much effort are you willing to invest keeping your scrapers updated and running? Is that the best use of your resources?
I would rather have a service like Partinfo use their time/energy to get whatever info they can from public APIs, and use that to deliver value to the community. A service that provides part data without a great deal of effort will attract utility creators. More/better utilities will increase the size of the community. Eventually, distributors might give Partinfo increased access because they don’t want to be left out of that community.
This seems like a good start:
https://www.seleniumhq.org/projects/webdriver/
and this:
https://code.google.com/archive/p/pyv8/
Oh man, I just researched a bit. I thought Farnell was bad. “They transparently inject script tags that do fingerprinting, and they obfuscate the code that does so.” It’s funny because we are just trying to buy stuff from them!
What I like about Partinfo is that it may make it possible to work on this issue https://github.com/xesscorp/KiCost/issues/17
and enable a scrape / API for generic resistors given just the value + package + …
(use the value and package from the schematic / layout and provide additional information, such as tolerance and material, in the KiCost GUI).
Yes! The goal for Electro Grammar, which is used by Partinfo, is to be able to give it text that describes any generic component. It only works for surface mount resistors, capacitors and LEDs at the moment but I should be able to work on it some more soon. David Craven already did a lot of work to compile transistors and diodes on an unmerged branch.
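To give a rough feel for the idea, here is a toy sketch of parsing a resistor description; this is not Electro Grammar’s actual grammar or API, just an illustration of turning free text into structured part parameters:

```python
import re

def parse_resistor(text):
    """Toy parse of text like '10k 0603 5%' into structured fields."""
    result = {"type": "resistor"}
    pkg = re.search(r"\b(0402|0603|0805|1206)\b", text)
    if pkg:
        result["package"] = pkg.group(1)
        text = text.replace(pkg.group(1), "")  # avoid re-matching as a value
    tol = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    if tol:
        result["tolerance_pct"] = float(tol.group(1))
        text = text.replace(tol.group(0), "")
    val = re.search(r"(\d+(?:\.\d+)?)\s*([kKM]?)", text)
    if val:
        mult = {"": 1, "k": 1e3, "K": 1e3, "M": 1e6}
        result["resistance_ohms"] = float(val.group(1)) * mult[val.group(2)]
    return result

print(parse_resistor("10k 0603 5%"))
# {'type': 'resistor', 'package': '0603', 'tolerance_pct': 5.0, 'resistance_ohms': 10000.0}
```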
I could learn some JS to help. The language logic is almost the same in all “sequential languages”, if the community decides that centralizing is the better option. I just may not be so helpful with specific configuration / aspects.
One curiosity: if I run the Partinfo application on localhost, will that make it possible for KiCost to access it to scrape? @kasbah (just being careful in case the server goes down)
Yeah, use the dev branch, cp config.js.in config.js and add your Octopart API key. You probably want to just disable the Element14 API for now (it’s only used to get more accurate stock information). I changed it so that if you don’t add the keys it won’t try and use the Element14 API.
You will also need the Redis in-memory persistent database (used for caching). If you want to clear your local cache you can do redis-cli flushall.
That said, I am happy for you to use the dev-partinfo.kitspace.org/graphql endpoint for testing; I think the traffic will be ok.
Edit: I added some instructions to the repo and made it so that not giving any element14 API keys will simply not try and access that API.
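For testing against the dev endpoint from Python, here is a rough sketch assuming a query shape like the following; check the actual schema served at that URL for the exact field names before relying on it:

```python
import requests

DEV_ENDPOINT = "https://dev-partinfo.kitspace.org/graphql"

# Assumed query shape -- verify the field names against the schema
# served at DEV_ENDPOINT before relying on this.
QUERY = """
query ($manufacturer: String!, $part: String!) {
  part(mpn: {manufacturer: $manufacturer, part: $part}) {
    description
    datasheet
  }
}
"""

def lookup(manufacturer, part):
    resp = requests.post(
        DEV_ENDPOINT,
        json={"query": QUERY,
              "variables": {"manufacturer": manufacturer, "part": part}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # GraphQL responses are plain JSON -> Python dict

print(lookup("Texas Instruments", "NE555P"))
```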
I see that Kitspace can provide the parts that match some description. This could be interesting: KiCost could provide some high-level code that selects one of them by price / manufacturer name or other characteristics, for the case of https://github.com/xesscorp/KiCost/issues/17.
I did check how to interact with it, and we will need some proof of concept.
But before this new feature, it is important to restore the KiCost capabilities.
Yeah, the BOM Builder can do that (kitspace.org is what I call Kitspace) but it’s actually built into Partinfo and the BOM Builder is using Partinfo.
If you run into something Partinfo can’t do that’s required to restore current KiCost functionality, let me know please and we can see how to add it.
@kasbah, could you help to modify https://github.com/xesscorp/KiCost/blob/master/kicost/distributors/dist_octopart.py to get data from Kitspace?