What happened to nightly builds?

davidsrsb · October 21, 2017, 5:25am

I never had problems with downloading Windows Nightlies to Malaysia.
Is Australia having network cable or routing problems?
The Ubuntu PPA of the libraries is another story; it takes a long time.

GyrosGeier · October 21, 2017, 6:01am

Current status: new box is ordered (took a while to hammer out something that the supplier is willing to support for five years), and space in the colo facility has been reserved as well. We’re still waiting for the SSDs for the VMs, though.

Timeline:

October 27th: SSDs should arrive
October 28th–29th: Burn-in testing
October 30th–31st: Shipping
November 1st: Setup

Joan_Sparky · October 21, 2017, 6:44am

Are you guy(s) funded via the CERN donation or is that separate?

GyrosGeier · October 21, 2017, 2:27pm

No, that is separate. CERN uses that money to commission new features. Using donations for recurring payments is a logistical nightmare unless it’s something small like a domain name.

The build server needs a bit of horsepower and connectivity, which puts renting a dedicated box out of the sensible range for donations anyway. We have a few offers for machines, but none of these work well because the build process needs six hours of CPU time, downloads about 1 GB, writes 50 GB to disk and uses 700 MB of RAM per thread.

A lot of that could be solved with better software (e.g. caching of results), but all we have right now is Windows+MSYS2+Java+Jenkins.

HiGreg · October 21, 2017, 6:39pm

Just posted on the developers’ list by the developer volunteering his time for the Windows builds, Simon Richter:

New server is ordered, but the SSDs seem to be difficult to get right
now. The hardware will probably be ready by next Friday, then the
selftests will run over the weekend, then shipping to the colo facility,
so if all works out, November 1st will be setup day.

Sprig · October 21, 2017, 9:47pm

LOL. Check the profile of the poster above your post…

Sprig · October 22, 2017, 3:26am

GyrosGeier wrote: “…but none of these work well because the build process needs six hours of CPU time, downloads about 1 GB, writes 50 GB to disk and uses 700 MB of RAM per thread.”

Welcome to the USER FORUMS! We, well most of us, really REALLY, REALLY appreciate this mostly awesome Open Source software!

However, there will be grumblings when things don’t work as expected.

How much does this new machine cost?

GyrosGeier · October 22, 2017, 7:54am

All in all, about 10k€, but KiCad nightlies only need a tiny slice of that machine since that is a once-a-day job that is done in about twenty minutes, mostly thanks to the RAID controller adding a more aggressive cache layer than could be implemented in software.

The rest of the time, it will mainly run database and computation workloads for my company, so there is a reason it is overpowered.

novaktamas · October 25, 2017, 8:58am

What about distributing nightlies through Torrent? Works well and probably lots of us would contribute.
Separate the libs. I often install the nightly and always untick all the installer’s options.

Rene_Poschl · October 25, 2017, 10:21am

Again the distribution is not the problem here. It is building the software. (Converting source code to the stuff a computer does understand.)

ArtG · October 25, 2017, 12:06pm

What exactly so challenging about compiling code? Thanks, by the way, for explaining to us in laymen’s terms about what this mysterious process is all about. While you’re at it, can you also explain to me what exactly RAID controller and a high bandwidth link does?

Rene_Poschl · October 25, 2017, 12:36pm

This is not easy to explain in such a way that someone without knowledge of what is going on can easily understand it. (And I need to confess that I am not as knowledgeable about compilers as I would like. I fear I forgot a lot about this topic already.)

Summary from an answer on Stack Exchange:

Many files hold information that needs to be interpreted by the compiler. (Lots of disk read access operations -> very slow)
Parsing C++ syntax into the internal data structure used by the compiler takes some time.
Optimization: there are a lot of operations that the compiler can improve so that the resulting program runs faster. Optimizing is a hard problem. (Needs lots of computation power)
Creating the assembler code (machine code) takes some time as well.

And as far as I found out, this server would be a bit overpowered if it were used only for compiling KiCad. (The owners donate a bit of computing time to compiling KiCad.)

A RAID controller manages systems where you have more than one hard disk. There are different options for how these can be managed. All have their own benefits and drawbacks. (Below is a simplified summary.)

One option is to split your data across all disks (fast, but if one disk fails you lose all your data).
Another option is to have all data mirrored on multiple disks (low chance of losing data).
And there are other options where you split the data but also include recovery information in case a disk breaks. (Depending on the RAID level, 1 or 2 disks are allowed to fail before data recovery is no longer possible.)
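
To make the third option a bit more concrete, here is a minimal Python sketch of the XOR parity idea used by single-parity RAID levels. The function names and the tiny byte blocks are made up for illustration; a real controller does this in hardware on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """The recovery information: the XOR of all data blocks."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """XOR-ing the surviving blocks with the parity yields the lost block."""
    return xor_blocks(surviving_blocks + [parity])

# Three hypothetical data disks plus one parity disk.
disks = [b"KiCa", b"d ni", b"ghtl"]
parity = make_parity(disks)

# Pretend the second disk died and rebuild its contents from the rest.
recovered = rebuild_missing([disks[0], disks[2]], parity)
assert recovered == disks[1]
```

With one parity block only a single failed disk can be rebuilt; surviving two failures needs a second, differently computed recovery block.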

I would guess this has to do with the fact that this server is normally used for serving a database. (But this is something where I really know next to nothing.)

The term bandwidth has its roots in signal theory. (It describes how wide a frequency band a signal occupies.)
In computing it is more or less used to describe network speeds. (How much data can be transferred per unit of time.)

So from that I would guess a high bandwidth link is “just” very fast network access hardware.

Joan_Sparky · October 25, 2017, 10:42pm

Official note

…in case anyone is wondering why the thread reads a bit differently.
I edited some posts in this thread to remove the offensive tone of a single user.
Thanks for staying polite and civil @ everybody else involved.

bobc · October 26, 2017, 9:41am

That’s a deep and complex question, but a full answer is probably too long for this forum.

However, your question does contain a clue. You missed out the word “is”, a fairly important verb with respect to parsing the meaning. As a human, I can immediately see the error, mentally correct it and carry on. However, computers are stupid, extremely stupid (but fast). They would likely just say “Syntax error”. You then have to figure out what the error was and how to fix it, before the computer can even attempt to answer the question.

Complex software also has a set of instructions telling the computer how to compile the software. These build instructions are also a kind of programming language, so you have the same problem of telling the computer exactly what to do, and the computer fails to understand when you get the smallest thing wrong.

Imagine talking to a person who, whenever you made a typo or a punctuation or grammar mistake, just said “Error”.

tl;dr: computers are stupid.

ArtG · October 26, 2017, 12:21pm

So what exactly IS the problem? I never said that there is no human in the loop. The original statement that I was puzzled by was “…distribution is not a problem but building it is.” What exactly is that mysterious “problem” in building the software that is related to the current predicament? So far all I’ve heard was that it is difficult because it can’t be fully automated. “Difficult” is not a problem.

Joan_Sparky · October 26, 2017, 1:59pm

You have to ask the people who run this; they usually don’t frequent these support forums but hang around in their own world, the developers’ mailing list.
I’m amazed @GyrosGeier showed up to be honest.

maui · October 26, 2017, 2:10pm

You probably missed that the ‘problem’ is to have the building hosted online for free…
This is available as ‘jenkins’ for Linux or OSX but NOT for Windows… that is the problem here…
Maurice

GyrosGeier · October 26, 2017, 4:16pm

We use Jenkins (on a Linux box) to remote-control a Windows VM that does the actual build.

This works reasonably well, but requires that the connection between Jenkins and the agent remains up for the entire duration of the build, otherwise Jenkins will mark the build as failed and the agent will kill everything. File transfers between Jenkins and the agent go through an RPC protocol over the same channel.

Jenkins isn’t built for this kind of setup, really — they expect all machines to be on a LAN, with low latency (copying one file requires a full round trip, and the next file is not started before the last one is acknowledged) and high throughput (otherwise, large file transfers clog up the RPC pipe, causing the periodic “ping” requests over the same connection to time out).
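
As a rough illustration of why latency hurts so much when files are copied strictly one after another, here is a back-of-the-envelope model in Python. The file count, file size, round-trip times and throughput figures are invented for the example, not measurements from the actual build.

```python
def transfer_seconds(file_sizes_bytes, round_trip_s, throughput_bytes_per_s):
    """Estimate total time when each file costs one full round trip
    plus the time to push its bytes through the link, with no overlap."""
    return sum(round_trip_s + size / throughput_bytes_per_s
               for size in file_sizes_bytes)

# Hypothetical workload: 10,000 small files of 10 kB each.
files = [10_000] * 10_000

# Assumed LAN: 0.5 ms round trip, 100 MB/s.
lan = transfer_seconds(files, 0.0005, 100e6)

# Assumed home DSL: 50 ms round trip, 1 MB/s upstream.
dsl = transfer_seconds(files, 0.05, 1e6)

print(f"LAN: {lan:.0f} s, DSL: {dsl:.0f} s")
```

With these assumed numbers the per-file round trip, not the payload, dominates on the slow link, which is why many small files are so much worse than one big archive.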

So, the box needs a really good connection. I’ve had the builds running on a VM on my home DSL for some time, where about 50% of builds would succeed, and every attempt would take six hours. CPU horsepower isn’t that important; even an i3 can do it in about 2.5 hours wall clock time, but rushing through in ten minutes will reduce the chance of failures even more.

Also, the build process could be a bit more efficient. Right now, we always download the latest state of the libraries, unpack them, copy them to the installation directory, repack them, copy the archive into the installer, and then sign the installer, which requires another copy. That is where most of the build time comes from on the small boxes; they simply don’t have the I/O bandwidth for that.
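
One way such a step could be made cheaper, sketched here in Python purely as an assumption and not as anything the current build does, is to key the result of an expensive step on a hash of its input and reuse it when the input has not changed. `StepCache` and `repack` are hypothetical names, and `repack` is only a stand-in for the real unpack/repack work.

```python
import hashlib

def digest(data: bytes) -> str:
    """Content hash used as the cache key."""
    return hashlib.sha256(data).hexdigest()

class StepCache:
    """Remember the output of a build step per (step name, input hash)."""

    def __init__(self):
        self._results = {}
        self.misses = 0  # how often the step really had to run

    def run(self, step, data: bytes) -> bytes:
        key = (step.__name__, digest(data))
        if key not in self._results:
            self.misses += 1
            self._results[key] = step(data)
        return self._results[key]

def repack(library_archive: bytes) -> bytes:
    # Stand-in for the expensive unpack/repack; here it just reverses the bytes.
    return library_archive[::-1]

cache = StepCache()
first = cache.run(repack, b"libs")
second = cache.run(repack, b"libs")  # unchanged input: served from the cache
assert first == second and cache.misses == 1
```

A real version would have to store results on disk between builds and decide when to throw old entries away, which is where most of the actual work would be.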

This is where the RAID card comes in handy — it has a few GB of RAM that has its own battery, so it will just accept the write of the footprint archive in one big transaction, report that the data has been written, and then send it to the disks in the background, while the build process does the next step, which conveniently uses the same data that is still in RAM, dropping the wall clock time for unpacking the archive from a few minutes to a few seconds.
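
The behaviour described above is what is usually called a write-back cache. A toy Python version, with a plain dict standing in for the slow disks and all names chosen for the example, might look like this:

```python
class WriteBackCache:
    """Accept writes into fast memory, serve reads from it, flush later."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for the disks
        self.dirty = {}                     # data accepted but not yet on disk

    def write(self, key, data):
        # Reported as done immediately; nothing has reached the disks yet.
        self.dirty[key] = data

    def read(self, key):
        # The next build step finds its input still in the cache.
        if key in self.dirty:
            return self.dirty[key]
        return self.backing_store[key]

    def flush(self):
        # In hardware this happens in the background.
        self.backing_store.update(self.dirty)
        self.dirty.clear()

disks = {}
cache = WriteBackCache(disks)
cache.write("footprints.zip", b"archive bytes")
assert cache.read("footprints.zip") == b"archive bytes"
assert "footprints.zip" not in disks  # still only in the cache
cache.flush()
assert disks["footprints.zip"] == b"archive bytes"
```

The battery on the card matters because data that has been acknowledged but not yet flushed would otherwise be lost on a power failure.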

I have an item on my TODO list to make the build more efficient and also allow separation of binaries and library, but this will take some time to get right — there are a few frameworks for that already, but none of them fit exactly (like Jenkins, which makes 90% of the job easy, and the remaining 10% would require a full rewrite of Jenkins from the ground up).

GyrosGeier · October 30, 2017, 4:21pm

Good news everyone.

After a month, the old server managed to boot (in fact, the RAID controller was not dead, but just too stupid to kick out the broken disk). The colo facility pulled the broken disk, so it should work again for some time if nothing else breaks.

Next nightly build is scheduled at 17:41 CET, as usual, so it starts in about 20 minutes. Progress can be seen on the other Jenkins.

New box will still be rolled out at some point, but is delayed because of a public holiday and shipping delays (enterprise class SSDs are hard to get), which should not be that much of a problem now though.

nickoe · October 30, 2017, 9:07pm

Yay, the build succeeded. Thank you.