Don’t. The distributors changed their web pages a lot and added JavaScript to block robots. Because of these issues and their constant HTML changes (which were driving the few developers crazy), it was decided to switch from web scraping to APIs.
(opinion)
Although option #1 (individual API implementations) would be more stable and “easy” to add to the current code, it would require an enormous amount of work from the few existing developers.
Option #2 would be complicated for a free tool.
The best scenario would be for some other users to start developing the extra code for each distributor’s API. But for now, development is frozen.
Bad news indeed. I am still making use of my free API key for the Kitspace BOM Builder but I believe my free access will go away soon too.
For the BOM Builder I developed a service called Partinfo that mostly searches Octopart (though it accesses the Farnell API too, does some clever stuff like turning search queries into parametric searches using Electro Grammar, and will incorporate other data sources in the future). It uses GraphQL, but it could fairly easily be turned into a standard HTTP/REST-like endpoint as well (there is a library that will do it in a couple of lines, I believe). Here is an example query with Partinfo.
I’d like to keep the Partinfo service running no matter what as I am building more and more on top of it. Maybe KiCost could make use of it too and people could donate to opencollective.com/kitspace to offset the API costs and any additional work to improve the search results.
Would be a shame to see such a popular tool as KiCost fall by the wayside.
@kasbah, is your project using JavaScript? Would it be possible to use Python for the scraping/API access?
I am seeing that getting the information (by scraping or via an API) is a common issue for our tools (KiCost - @devbisme, Kitspace, and now BOMGarter - @volunteerlabs).
Since the three tools have different purposes (e.g. KiCost’s focus is getting the cost once the part code is already defined): would it be possible to set up a common package to get the information from the distributors’ web sites? This would centralize the issues and the maintenance.
We could start it on GitLab (because there appears to be an effort, even within KiCad, to use that Git host in the future).
@kasbah, I saw your Partinfo server a while ago and thought that would be a good way to centralize part searches so everyone would not have to get their own keys. I doubt using Octopart as a component of Partinfo would last for long before they ask for their monthly fee (it looks like you were quoted the same pricing I got for KiCost). That would mean writing API interfaces for Digikey, Mouser, Farnell, etc. In effect, Partinfo would become a sort of FOSS Octopart. An important question is whether the distributors will let you use their APIs this way, or does it violate some deal they have with Octopart (and others) to restrict mass distribution of data from their websites.
@hildogjr, yes it’s NodeJS but if someone is very good at writing Python scrapers and willing to contribute I am sure we could integrate it somehow.
I think we have to use a combination of scraping and API access, as not all distributors have APIs and others are very hostile to scraping (looking at you, Farnell).
I have only looked at the Farnell API so far. Anyone can register, but it’s severely slow and rate-limited. I haven’t heard of any special deals with Octopart. The Digikey API looks pretty good, though I haven’t examined it closely yet. The terms and conditions say:
Any use of the API that (…) aggregates, in any way, any Digi-Key Corporation Content with third party content (without distinction) or fails to attribute the Digi-Key Corporation Data appropriately to Digi-Key Corporation is expressly not permitted.
But I think as long as we attribute and keep the data “distinct” we wouldn’t be in violation.
Anyway, I am happy for people working on BOM tools, especially KiCost users and developers, to use (but not abuse) the Partinfo endpoint right now, specifically: https://dev-partinfo.kitspace.org/graphql.
It’s currently an open endpoint so anyone can access it. Docs are a bit thin, but you can browse the schema using the “Documentation Explorer” on the right at the above URL, and you can jump on our chat or the GitHub issue tracker if you have any questions. You don’t need any special GraphQL client to use it; here is the example query using curl:
curl -H 'Content-Type: application/json' -X POST -d '{"query": "{ part(mpn: {part: \"NE555P\", manufacturer: \"Texas Instruments\"}) { datasheet description type offers { sku { vendor part } prices { USD EUR GBP SGD } } } }"}' https://dev-partinfo.kitspace.org/graphql | python -m json.tool
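For Python-based tools like KiCost, the same query can be sent with the requests library. Here is a minimal sketch (the query fields are taken straight from the curl example above):

```python
# Minimal sketch: the same Partinfo query sent from Python with the
# `requests` library. The fields mirror the curl example above.
import json
import requests

PARTINFO_URL = "https://dev-partinfo.kitspace.org/graphql"

QUERY = """
{
  part(mpn: {part: "NE555P", manufacturer: "Texas Instruments"}) {
    datasheet
    description
    type
    offers {
      sku { vendor part }
      prices { USD EUR GBP SGD }
    }
  }
}
"""

response = requests.post(PARTINFO_URL, json={"query": QUERY})
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```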
One thing to note is that the requests are batched and cached so 1 request to Partinfo doesn’t necessarily mean 1 request to Octopart.
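I won’t go into the actual implementation here, but conceptually it is something like the sketch below (hypothetical names, not the real NodeJS service): queries arriving within a short window are merged into one upstream request, and results are cached so repeated lookups never hit Octopart again.

```python
# Hypothetical sketch of the batching/caching idea, not the actual
# Partinfo implementation: queries arriving within a short window are
# merged into a single upstream call, and results are cached.
import asyncio

CACHE = {}           # mpn -> cached result
PENDING = {}         # mpn -> futures waiting on that part
BATCH_WINDOW = 0.05  # seconds to wait for more queries before flushing

async def lookup(mpn):
    if mpn in CACHE:
        return CACHE[mpn]
    fut = asyncio.get_running_loop().create_future()
    start_flush = not PENDING  # first query in a window schedules a flush
    PENDING.setdefault(mpn, []).append(fut)
    if start_flush:
        asyncio.ensure_future(flush_batch())
    return await fut

async def flush_batch():
    await asyncio.sleep(BATCH_WINDOW)  # let more queries accumulate
    batch = dict(PENDING)
    PENDING.clear()
    # `octopart_multi_query` is a placeholder for one upstream request
    # that covers every part in the batch.
    results = await octopart_multi_query(list(batch))
    for mpn, futures in batch.items():
        CACHE[mpn] = results.get(mpn)
        for fut in futures:
            fut.set_result(results.get(mpn))
```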
If we do hit the Octopart request limit we’ll have to come up with something, but I don’t want to impede development efforts by limiting access right now. Of course, if we don’t have money or alternative sources when Octopart wants to start charging then this won’t work out either, so it’s kind of up to people to contribute code and money towards it if it’s something they want. Octopart’s pricing structure, by the way, means our money also goes a lot further if we pool it.
Unfortunately, I think that the Altium takeover has driven Octopart to restrict the previously fairly free access to their data. I did have a couple of Octopart API codes, but one was revoked despite my writing to them at length, without even a reply. They seem to have recruited a lot more staff - who all have to be paid - but I am really not sure how they think they are going to monetise this further; most ‘lightweight’ users will be put off by even a modest subscription model. I would be happy to pay a nominal fee per search - the amateur nature of my work would make this affordable. I would be very happy to pay 0.01c per search line - but this doesn’t seem to be the way they are going.
My own BOM tool https://github.com/Gasman2014/KC2PK is dependent on an Octopart connection - I have never bundled a code with it and have expected users to register with Octopart for their own code. So far, I have been able to use it OK, but after spending a long time writing it, I would be very disappointed to lose access.
TBH, I think that Octopart are probably shooting themselves in the foot here. Their current revenues come from the big distributors, and I suspect that unless there is a LOT more added value, people are not going to sign up; they will see a downturn in Octopart usage and the distributors will not see value in subsidising them.
I would also be very happy to pay 0.01c for each YouTube/Vimeo video I watch and for every 10 minutes spent on a news site, and similar things could be implemented in, for example, forums like this. Something like:
you pay 20ct for asking a question, and when an answer is marked as a satisfactory solution, part of those 20ct goes to the one with the best answer.
But instead we, as collective couch potatoes, let the whole internet get wrecked by money and advertisements which are not designed to give us the products we want, but to generate money for the advertisers and their products.
The whole idea of advertisements on the internet, but also on TV etc., is a very bad one. There should be laws against it.
I would happily pay a cent for each Wikipedia article I read or for a search engine query, but instead I do not use Google anymore because I cannot trust the results.
On Octopart I’m getting increasingly annoyed by “sponsored” chips from TI.
GitHub got bought by Microsoft for a ludicrous amount of money, which is another hit from the cheese grater that’s eating away at the freedom and choice on the Internet.
I think there is enough honesty in people for a system based on honesty to work, but I can only see it becoming effective if it is fully integrated into web browsers.
Web browsers would collect the things you do on the net, websites and services would state a suggested cost, and once a month you would skim through the collected micropayments and choose whether you want to make those payments or not.
David A. Stockman wrote a book about this long before the Internet existed:
The Corruption of Capitalism in America
And he has written similar books along those lines.
It’s also ingrained in capitalistic thinking.
Small companies want to get big.
Big companies get greedy.
Plenty of wiggle room in there. If KiCost places DigiKey, Mouser, Farnell pricing info into a spreadsheet, that’s definitely “aggregating”, but is it “without distinction” even if the columns of data are labeled with distributor names? I’d also bet the terms and conditions are “subject to change without notice”.
So that’s a pretty shaky bridge to cross. Unfortunately, it may be the only bridge available.
@devbisme @hildogjr @John_Pateman @volunteerlabs
Unfortunately I think opting for web scraping is the only viable option.
It may also be a difficult road, but emulating browser behavior would IMO be the only way to avoid being hit by later changes to a distributor’s terms and conditions.
Spending time on APIs would run into the above issue quite soon and would cost us resources and time, as it did in the past with Octopart’s API.
The point of developing a common scraping library is that it would help us keep the routines up-to-date.
This library could also work with APIs, but as a separate implementation provided for users that have a KEY. And that should not be the focus now, since as free tools we have to focus on the free users.
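To make that split concrete, here is one possible shape for such a library. This is only a sketch; every name in it is illustrative, not an existing package:

```python
# Hypothetical sketch of a common distributor-access library with
# scraping as the default backend and the official API as a separate,
# optional implementation for users who have their own key.
from abc import ABC, abstractmethod

class DistributorBackend(ABC):
    @abstractmethod
    def get_part_info(self, part_number):
        """Return a dict of price breaks, stock, etc. for part_number."""

class DigikeyScraper(DistributorBackend):
    def get_part_info(self, part_number):
        # ...fetch and parse the product page here...
        raise NotImplementedError

class DigikeyApi(DistributorBackend):
    def __init__(self, api_key):
        self.api_key = api_key

    def get_part_info(self, part_number):
        # ...call the official API with self.api_key here...
        raise NotImplementedError

def make_backend(distributor, api_key=None):
    """Prefer the API when the user supplies a key, otherwise scrape."""
    if distributor == "digikey":
        return DigikeyApi(api_key) if api_key else DigikeyScraper()
    raise ValueError("unknown distributor: " + distributor)
```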
@devbisme may remember: one problem that made us stop scraping was the robot nature of our algorithm (it was detected as a robot by the sites). At the time we looked into some Python libraries that make the algorithm behave like a browser, but we made no progress configuring them.
I think we need to use a combination of techniques where appropriate. For Farnell, for instance, the introduction of heavy scraping protection nixed my previous attempts. I am now using the API and will hopefully have a higher request limit soon, which I could make available through Partinfo.
The only way I found I could still scrape Farnell is using Chrome Headless pretending to be a normal Chrome browser on a residential IP, i.e. it will work on my home network but not on any of my server or VPN IPs. Here is the code I used.
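(My script is NodeJS; purely as an illustration of the same approach on the Python side, a rough equivalent using selenium with headless Chrome could look like the sketch below. This is not the actual script, and the URL and user-agent string are just examples.)

```python
# Illustrative sketch only, not the actual NodeJS script: drive
# headless Chrome via selenium while presenting a normal Chrome
# user agent, since sites often key off the default "HeadlessChrome"
# token to detect robots.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
options.add_argument(
    "user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
)

driver = webdriver.Chrome(options=options)
try:
    # Hypothetical search URL; parse prices out of the returned HTML
    # with your scraper of choice.
    driver.get("https://uk.farnell.com/search?st=NE555P")
    html = driver.page_source
finally:
    driver.quit()
```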
I suspect they will quickly blacklist a residential IP too if they notice the scraping, though. I know RS blocked the IP for my Hackspace’s whole building indefinitely because I was running the test suite for my browser extension for automating purchases. And that was running from a real browser on a residential IP.
On the whole we are dealing with websites that are extremely hostile to scraping and we need to use what we can where we can, which includes scraping but also legitimate use of vendor and third party APIs. I think centralizing this effort on Partinfo would make sense as we already have something that works right now, we just need to keep it running!
This would run into the same problems as the common web-scraping approach, though; it just changes where the server would be located.
I think the common part extractor is a great way to go. I think it should be flexible: the input would be a dictionary whose keys are field-name strings defined by Python constants, and whose values can be a regex string, a list of regex strings, or a compiled regex object. The return would be an array of dictionaries whose keys are the same field-name constants and whose values are the actual values found, plus a key called match_ratio with a value from 0 to 1 (1 means everything matched 100%, 0.5 means a 50% match). It could also be used as a matching score if needed. I think that would be flexible and simple, without introducing a class structure.
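A minimal sketch of that interface, under the assumptions above (the field-name constants and the function name are illustrative):

```python
# Minimal sketch of the proposed extractor interface. The field-name
# constants and the function name are illustrative only.
import re

# Field names defined as module-level constants.
FIELD_PRICE = "price"
FIELD_STOCK = "stock"

def extract(text, patterns):
    """patterns maps a field-name constant to a regex string, a list
    of regex strings, or a compiled regex object. Returns a list of
    dicts of the values found, each with a 0..1 match_ratio."""
    found = {}
    total = len(patterns)
    for field, pattern in patterns.items():
        variants = pattern if isinstance(pattern, list) else [pattern]
        for variant in variants:
            regex = variant if isinstance(variant, re.Pattern) else re.compile(variant)
            match = regex.search(text)
            if match:
                found[field] = match.group(0)
                break
    if not found:
        return []
    ratio = len(found) / total  # fraction of requested fields matched
    found["match_ratio"] = ratio
    return [found]

# Example: one of two fields matched -> match_ratio == 0.5
page = "In stock: 1234"
print(extract(page, {FIELD_PRICE: r"\$\d+\.\d+", FIELD_STOCK: r"In stock: \d+"}))
```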
Oh, that’s right. But it requires the extra step of setting up a local server. I would rather have the common API first, and maybe that could be an additional, separate project using the same common API.
Do you have any idea how to proceed? (What should we use?)
Just a reminder: a requirement is OS and browser independence.
That’s what my Chrome Headless script is actually doing. I don’t think you’ll find a better solution than that, and even it won’t work from certain IP addresses; new IP addresses can be blocked if you send a high amount of traffic.
Sorry, I am not sure what you are talking about
Sorry, my English: “I this” = “I think”. (I have edited it.)
I am still not sure what you are describing though. I am proposing we centralize scraping and API integration efforts around Partinfo and let BOM tools like KiCost make use of the Partinfo API.