Do you have any idea how to proceed? (What to use?)
Just a reminder: one requirement is OS and browser independence.
That’s what my Chrome Headless script actually does. I don’t think you’ll find a better solution than that, but it won’t work from certain IP addresses: new IP addresses can get blocked if you send a high volume of traffic.
Sorry, I am not sure what you are talking about
Sorry, my English: “I this” should have been “I think”. (I have edited it.)
I am still not sure what you are describing though. I am proposing we centralize scraping and API integration efforts around Partinfo and let BOM tools like KiCost make use of the Partinfo API.
Yes. I’m only trying to go a bit further than the interface: the API would take a dictionary specifying a query as input, and return a list of result dictionaries with matching scores. That way it stays flexible regardless of which scraping code/API Partinfo calls.
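A minimal sketch of what such a dict-in, list-of-dicts-out interface could look like. All names here are hypothetical illustrations of the proposed contract, not an existing Partinfo API, and the tiny in-memory catalogue stands in for real scraper/API backends:

```python
# Hypothetical sketch of the proposed dict-based query interface.
# A real implementation would fan out to whatever scraping code or
# official-API backends are configured.

def find_parts(query):
    """Take a dict describing a query; return a list of result dicts,
    each carrying a 'score' saying how well it matched."""
    # Stand-in for real backends: a tiny in-memory catalogue.
    catalogue = [
        {"mpn": "NE555P", "manufacturer": "Texas Instruments", "price": 0.40},
        {"mpn": "NE555N", "manufacturer": "STMicroelectronics", "price": 0.35},
    ]
    results = []
    for part in catalogue:
        if query.get("mpn") and query["mpn"] in part["mpn"]:
            # Exact MPN match scores higher than a partial match.
            score = 1.0 if query["mpn"] == part["mpn"] else 0.5
            results.append({**part, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)

hits = find_parts({"mpn": "NE555"})
```

Because every backend just returns more dicts, adding a new scraper or a new result field doesn’t break existing callers, which is the backward-compatibility point being made here.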
You can look at https://github.com/asciimoo/searx for inspiration. Before KiCad I was one of the devs there ^^. It basically combines results from official APIs and web scraping.
Oh ok, you are getting into the nitty gritty. Maybe we should take it to the issue tracker or Kitspace chat to discuss. I would say take a look at what it currently does first of all using the Docs browser on the GraphiQL endpoint and we can get on the same page easier.
Currently there are 3 query types and all of them return `Part` types. As with the Octopart API, a `Part` contains an array of `Offer` types from different retailers with prices. E.g.

`part(mpn: Mpn, sku: Sku): Part`

returns a single `Part` that matches a manufacturer part number (`Mpn`) or retailer stock keeping unit (`Sku`). This is considered a match. You can also batch multiple part match requests and get back multiple parts:

`match(parts: [MpnOrSku]): [Part]`

We can also search and get back multiple parts:

`search(term: String): [Part]`
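For instance, a batched `match` query can be sent from any language as a plain JSON POST body. The sketch below builds such a body in Python; the field selection inside the query is a guess for illustration only, since the real fields are documented in the GraphiQL Docs browser:

```python
import json

# Sketch only: the selected fields (mpn, offers, prices, ...) are an
# assumption about the schema, not copied from the real Partinfo API.
MATCH_QUERY = """
query ($parts: [MpnOrSku]) {
  match(parts: $parts) {
    mpn { manufacturer part }
    offers { sku { vendor part } prices { quantity price } }
  }
}
"""

def build_match_request(parts):
    """Build the JSON body for a batched 'match' request.

    `parts` is a list of dicts, e.g. [{"mpn": {"part": "NE555P", ...}}].
    Any GraphQL endpoint accepts this shape as an HTTP POST body.
    """
    return json.dumps({"query": MATCH_QUERY, "variables": {"parts": parts}})

body = build_match_request([{"mpn": {"manufacturer": "Texas Instruments",
                                     "part": "NE555P"}}])
```

The response comes back as JSON, so any client (Python or otherwise) can consume it without a dedicated library.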
I had an idea, but I do not know if it would be workable.
What if individual users could easily add purchasing info to the BOM of a PCB they made, and upload the BOM + pricing info to a central repository? Would such a database be likely to gather enough up-to-date information to significantly reduce the effort of gathering pricing info for future BOMs?
Interesting idea, but I think it’s much more workable to cache responses for what people request for their BOM on a server for a period of time. This is what Partinfo does currently.
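The caching approach can be as simple as keying responses by the request and expiring them after a period. A minimal sketch (the default TTL here is an arbitrary choice, not what Partinfo actually uses):

```python
import time

class ResponseCache:
    """Cache API/scrape responses for a period of time (TTL in seconds)."""

    def __init__(self, ttl=24 * 3600):  # default TTL is an arbitrary example
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        timestamp, value = entry
        if time.monotonic() - timestamp > self.ttl:
            # Entry is stale: drop it and force a fresh fetch.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.monotonic(), value)

cache = ResponseCache(ttl=1)
cache.put("NE555P", {"price": 0.40})
```

Serving repeat BOM lookups from such a cache is what keeps the request volume to the distributors low.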
This is what I’ve been trying to say: stay away from a “custom” part class and use a dict instead, which is much more flexible to extend and has fewer backward-compatibility issues.
I wasn’t talking about Python at all here, but about a GraphQL schema. When you query the API it just returns JSON, which you can trivially turn into Python data structures.
I think the point here is to settle on a library/language that could be used in all the tools. So I was leaning towards Python.
I may not be the person to help with maintenance and integration with other languages.
About using a centralized service/server that the tools (KiCad and others) communicate with, and which is responsible for the “scraping” (building the integration around kitspace-partinfo rather than doing it locally): I may not agree. In my view this creates a rigid dependency (in case of server overload/overuse).
I don’t disagree that it creates a rigid dependence but am still arguing for it as the least bad option since I have had some bad experiences trying to scrape these sites. With a centralized service we can combine scraping approaches, pool money for API access, cache requests and use it from any language regardless of the implementation. If it breaks because of overuse we can tackle the problem collectively.
I do recommend using headless Chrome though for the best chance at not getting blocked. Looks like there is a good Python lib too.
Aisler seems to be very friendly towards KiCad, and with their “precious parts” service:
the goal seems to be that you give them your KiCad project, and they source all the parts and put them on the PCB they make for you.
I have never made any use of Aisler’s services, but maybe there could be some kind of cooperation from that direction.
I think our library could also talk to and make use of Partinfo. But not only that: I think it is important to have a way to rely on just my own computer.
This Python lib is dependent on the Chrome browser. I don’t think it is a good idea to have a browser/OS dependency.
Just came across this thread and maybe we can help out a bit.
As mentioned before, our Precious Parts offering makes use of DigiKey’s and Farnell’s APIs natively. We’d be happy to provide open access to our part search free of charge.
Does that sound like a plan? If so we’d have to build an open API for this.
As we do not make our money with advertising we’ve no plans to charge or (like Octopart) lock down the API.
Need to buy Octopart API access for KiCost
You will need some dependence on something at some point. Just keep it in mind as a tool if you run into scraping issues. I hope that a Partinfo fallback in your lib won’t cause problems for making use of your lib in Partinfo.
@aisler, very generous! As you know I am waiting to make use of your API in Partinfo but I didn’t want to announce anything without checking with you first.
+1 from my side…
The best option to avoid a distributor ban is to act with a client emulating browser behavior: no heavy mass requests from a single IP or server.
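One concrete part of “no heavy mass requests” is simply spacing requests out. A tiny throttle sketch (the interval values are arbitrary examples):

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests so a
    single client never hammers a distributor's site."""

    def __init__(self, min_interval=2.0):  # interval is an example value
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # first call passes immediately, later calls pause
elapsed = time.monotonic() - start
```

Combined with browser-like headers or a real headless browser, this keeps a client’s traffic pattern closer to a human’s.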
IMO a standard API is a dead end, because of the dependency on the distributors’ inclinations.
Great, but I don’t really know how to avoid robot detection / do browser emulation.
My concern here: I can’t tell users of KiCost (for example) that they can only use it if they install Chrome or run Linux (because of the package used for browser emulation).
So browser emulation may be the way, since it isn’t tied to one browser, one OS, or a lot of user configuration.
Sure, maybe you can emulate a browser well enough that you are not flagged. I think actually using a browser is the safest bet though. Chrome Headless is available for all platforms, doesn’t need configuration, and shouldn’t interfere with the user’s default browser at all. Did you see this note in the Pyppeteer docs?
Note: When you run pyppeteer for the first time, it downloads a recent version of Chromium (~100MB). If you don’t prefer this behavior, run the `pyppeteer-install` command before running scripts which use pyppeteer.