Do you have any idea how to proceed? (What to use?)
Just a reminder: one requirement is OS and browser independence.
That’s what my Chrome Headless script actually does. I don’t think you’ll find a better solution than that, but it won’t work from certain IP addresses: new IP addresses can get blocked if you send a high volume of traffic.
Sorry, I am not sure what you are talking about
Sorry, my English: “I this” should have been “I think”. (I have edited it.)
I am still not sure what you are describing though. I am proposing we centralize scraping and API integration efforts around Partinfo and let BOM tools like KiCost make use of the Partinfo API.
Yes. I’m only trying to go a bit further than the interface: the API would take a dictionary specifying a query as input, and return a list of result dictionaries with matching scores. That way it stays flexible regardless of which scraping code/API Partinfo calls.
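A minimal sketch of what such a dict-in, list-of-dicts-out interface could look like. All names here are hypothetical illustrations of the proposed contract, not an existing Partinfo API, and the tiny in-memory catalogue stands in for real scraper/API backends:

```python
# Hypothetical sketch of the proposed dict-based query interface.
# A real implementation would fan out to whatever scraping code or
# official-API backends are configured.

def find_parts(query):
    """Take a dict describing a query; return a list of result dicts,
    each carrying a 'score' saying how well it matched."""
    # Stand-in for real backends: a tiny in-memory catalogue.
    catalogue = [
        {"mpn": "NE555P", "manufacturer": "Texas Instruments", "price": 0.40},
        {"mpn": "NE555N", "manufacturer": "STMicroelectronics", "price": 0.35},
    ]
    results = []
    for part in catalogue:
        if query.get("mpn") and query["mpn"] in part["mpn"]:
            # Exact MPN match scores higher than a partial match.
            score = 1.0 if query["mpn"] == part["mpn"] else 0.5
            results.append({**part, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)

hits = find_parts({"mpn": "NE555"})
```

Because every backend just returns more dicts, adding a new scraper or a new result field doesn’t break existing callers, which is the backward-compatibility point being made here.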
You can look at https://github.com/asciimoo/searx for inspiration. Before KiCad I was one of the devs there ^^. It basically combines results from official APIs and web scraping.
Oh ok, you are getting into the nitty gritty. Maybe we should take it to the issue tracker or Kitspace chat to discuss. I would say take a look at what it currently does first of all using the Docs browser on the GraphiQL endpoint and we can get on the same page easier.
Currently there are 3 query types and all of them return `Part` types. As with the Octopart API, a `Part` contains an array of `Offer` types from different retailers with prices. E.g.

`part(mpn: Mpn, sku: Sku): Part`

returns a single `Part` that matches a manufacturer part number (`Mpn`) or retailer stock keeping unit (`Sku`). This is considered a match. You can also batch multiple part match requests and get back multiple parts:

`match(parts: [MpnOrSku]): [Part]`

We can also search and get back multiple parts:

`search(term: String): [Part]`
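For instance, a batched `match` query can be sent from any language as a plain JSON POST body. The sketch below builds such a body in Python; the field selection inside the query is a guess for illustration only, since the real fields are documented in the GraphiQL Docs browser:

```python
import json

# Sketch only: the selected fields (mpn, offers, prices, ...) are an
# assumption about the schema, not copied from the real Partinfo API.
MATCH_QUERY = """
query ($parts: [MpnOrSku]) {
  match(parts: $parts) {
    mpn { manufacturer part }
    offers { sku { vendor part } prices { quantity price } }
  }
}
"""

def build_match_request(parts):
    """Build the JSON body for a batched 'match' request.

    `parts` is a list of dicts, e.g. [{"mpn": {"part": "NE555P", ...}}].
    Any GraphQL endpoint accepts this shape as an HTTP POST body.
    """
    return json.dumps({"query": MATCH_QUERY, "variables": {"parts": parts}})

body = build_match_request([{"mpn": {"manufacturer": "Texas Instruments",
                                     "part": "NE555P"}}])
```

The response comes back as JSON, so any client (Python or otherwise) can consume it without a dedicated library.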
I had an idea, but I do not know if it would be workable.
What if individual users could easily add purchasing info to the BOM of a PCB they made, and upload the BOM + pricing info to a central repository? Would such a database be likely to gather enough up-to-date information to significantly reduce the effort of gathering pricing info for future BOMs?
Interesting idea, but I think it’s much more workable to cache responses for what people request for their BOM on a server for a period of time. This is what Partinfo does currently.
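The caching approach can be as simple as keying responses by the request and expiring them after a period. A minimal sketch (the default TTL here is an arbitrary choice, not what Partinfo actually uses):

```python
import time

class ResponseCache:
    """Cache API/scrape responses for a period of time (TTL in seconds)."""

    def __init__(self, ttl=24 * 3600):  # default TTL is an arbitrary example
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        timestamp, value = entry
        if time.monotonic() - timestamp > self.ttl:
            # Entry is stale: drop it and force a fresh fetch.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.monotonic(), value)

cache = ResponseCache(ttl=1)
cache.put("NE555P", {"price": 0.40})
```

Serving repeat BOM lookups from such a cache is what keeps the request volume to the distributors low.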
This is what I’ve been trying to say: stay away from a “custom” part class and use a dict instead, which is much more flexible to extend and has fewer backward-compatibility issues.
I wasn’t talking about Python at all here, but about a GraphQL schema. When you query the API it just returns JSON, which you can trivially turn into Python data structures.
I think the point here is to settle on a library/language that could be used in all the tools. So I was leaning towards Python.
I may not be the person to help with maintenance and integration with other languages.
About using a centralized service/server that the tools (KiCad and others) communicate with, and which is responsible for the “scraping” (building the integration around kitspace-partinfo rather than doing it locally): I may not agree. In my view this creates a rigid dependency (in case of server overload/overuse).
I don’t disagree that it creates a rigid dependence but am still arguing for it as the least bad option since I have had some bad experiences trying to scrape these sites. With a centralized service we can combine scraping approaches, pool money for API access, cache requests and use it from any language regardless of the implementation. If it breaks because of overuse we can tackle the problem collectively.
I do recommend using headless Chrome though for the best chance at not getting blocked. Looks like there is a good Python lib too.
Aisler seems to be very friendly towards KiCad, and with their “precious parts” service:
the goal seems to be that you give them your KiCad project, and they source all the parts and put them on the PCB they make for you.
I have never made any use of Aisler’s services, but maybe there could be some kind of cooperation from that direction.
I think our library could also talk to and make use of Partinfo. But not only that: I think it is important to have a way to rely on just my own computer.
This Python lib is dependent on the Chrome browser. I don’t think it is a good idea to have a browser/OS dependency.
Just came across this thread and maybe we can help out a bit.
As mentioned before, our Precious Parts offering makes use of DigiKey’s and Farnell’s APIs natively. We’d be happy to provide open access to our part search free of charge.
Does that sound like a plan? If so we’d have to build an open API for this.
As we do not make our money with advertising we’ve no plans to charge or (like Octopart) lock down the API.
Need to buy Octopart API access for KiCost
You will need some dependence on something at some point. Just keep it in mind as a tool if you run into scraping issues. I hope that a Partinfo fallback in your lib won’t cause problems for making use of your lib in Partinfo.
@aisler, very generous! As you know I am waiting to make use of your API in Partinfo but I didn’t want to announce anything without checking with you first.
+1 from my side…
The best option to avoid a distributor ban is to act with a client emulating browser behavior: no heavy mass requests from a single IP or server.
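One concrete part of “no heavy mass requests” is simply spacing requests out. A tiny throttle sketch (the interval values are arbitrary examples):

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests so a
    single client never hammers a distributor's site."""

    def __init__(self, min_interval=2.0):  # interval is an example value
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # first call passes immediately, later calls pause
elapsed = time.monotonic() - start
```

Combined with browser-like headers or a real headless browser, this keeps a client’s traffic pattern closer to a human’s.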
IMO a standard API is a dead end, because of the dependency on the distributors’ inclinations.
Great, but I don’t really know how to avoid robot detection / do browser emulation.
My concern here: I can’t tell users of KiCost (for example) that they can only use it if they install Chrome or run Linux (because of the package used for browser emulation).
So browser emulation may be the way, since it isn’t tied to one browser, one OS, or a lot of user configuration.
Sure, maybe you can emulate a browser well enough that you are not flagged. I think actually using a browser is the safest bet though. Chrome Headless is available for all platforms, doesn’t need configuration, and shouldn’t interfere with the user’s default browser at all. Did you see this note in the Pyppeteer docs?
Note: When you run pyppeteer for the first time, it downloads a recent version of Chromium (~100MB). If you don’t prefer this behavior, run the `pyppeteer-install` command before running scripts which use pyppeteer.