- What about distributing nightlies via BitTorrent? That works well and probably lots of us would contribute.
- Separate libs. I often install the nightly and always untick all the options in the installer.
Again, the distribution is not the problem here. It is building the software. (Converting source code into something a computer can understand.)
What exactly so challenging about compiling code? Thanks, by the way, for explaining to us in laymen’s terms about what this mysterious process is all about. While you’re at it, can you also explain to me what exactly RAID controller and a high bandwidth link does?
This is not easy to explain in a way that someone without knowledge of what is going on can easily understand. (And I have to confess that I am not as knowledgeable about compilers as I would like. I fear I have already forgotten a lot about this topic.)
Summary from an answer on Stack Exchange (a small C++ example follows the list):
- Many files hold information that needs to be interpreted by the compiler. (Lots of disk read operations -> very slow.)
- Parsing C++ syntax into the internal data structures used by the compiler takes some time.
- Optimization: there are a lot of operations the compiler can improve so that the resulting program runs faster. Optimizing is a hard problem. (Needs lots of computation power.)
- Creating the assembler code (machine code) takes some time as well.
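To make the first two points concrete, here is a tiny C++ file (an illustration only, not taken from KiCad) that is only a dozen lines long yet forces the compiler to read and parse tens of thousands of lines of header code before it can even start optimizing and emitting machine code:

```cpp
// Illustration only: a short program that is disproportionately expensive to
// compile. The single #include <regex> pulls in a long chain of standard
// library headers (lots of disk reads), all of which must be parsed, and the
// template instantiations below give the optimizer real work to do.
#include <regex>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> numbers;
    std::regex digits("[0-9]+");
    std::string line = "nightly build took 150 minutes";
    for (std::sregex_iterator it(line.begin(), line.end(), digits), end;
         it != end; ++it) {
        numbers.push_back(it->str());
    }
    return numbers.empty() ? 1 : 0;
}
```

Even this handful of lines takes a noticeable moment to compile with optimizations enabled; a project the size of KiCad consists of hundreds of such files, many of them far heavier.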
And as far as I found out, this server would be a bit overpowered if it were only used for compiling KiCad. (The owners donate a bit of their computing time to compiling KiCad.)
A RAID controller manages systems where you have more than one hard disk. There are different options for how the disks can be organized, and each has its own benefits and drawbacks. (A simplified summary is below; a toy parity sketch follows the list.)
- One option is to stripe your data across all disks (fast, but if one disk fails you lose all your data).
- Another option is to have all data mirrored on multiple disks (low chance of losing data).
- And there are other options where you split the data but also include recovery information in case a disk breaks. (Depending on the RAID level, one or two disks can fail before the data can no longer be recovered.)
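To illustrate that last option, here is a toy sketch (nothing like real RAID firmware; the disk contents are made up) showing how RAID 5-style parity lets the contents of a single failed disk be rebuilt from the surviving ones:

```cpp
// Toy illustration of RAID 5-style parity: the parity block is the XOR of the
// data blocks, so any single missing block can be rebuilt from the others.
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    // One "stripe": one block on each of three data disks (made-up values).
    std::vector<std::uint8_t> block = {0x4B, 0x69, 0x43};   // 'K', 'i', 'C'
    std::uint8_t parity = block[0] ^ block[1] ^ block[2];    // stored on a fourth disk

    // Pretend disk 2 has failed: reconstruct its block from the survivors.
    std::uint8_t rebuilt = block[0] ^ block[2] ^ parity;

    std::cout << "lost block:    0x" << std::hex << int(block[1]) << "\n"
              << "rebuilt block: 0x" << std::hex << int(rebuilt) << "\n";
    return rebuilt == block[1] ? 0 : 1;
}
```

The trade-off is capacity: parity costs roughly one disk's worth of space, whereas full mirroring doubles the storage needed for everything.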
I would guess this has to do with the fact that this server is normally used for serving a database. (But that is something about which I really know next to nothing.)
The term bandwidth has its roots in signal theory. (It describes how wide a frequency band a signal occupies.)
In computing it is more or less used to describe network speed. (How much data can be transferred per unit of time.)
So from that I would guess a high bandwidth link is “just” very fast network hardware.
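As a rough back-of-the-envelope illustration (the file size and link speeds below are made up for the example, not measurements of this server), transfer time is simply the amount of data divided by the link speed:

```cpp
// Back-of-the-envelope transfer times for an assumed 1 GB nightly installer
// over links of different speeds. All numbers are illustrative only.
#include <iostream>

int main() {
    const double file_bytes = 1.0e9;                      // assumed ~1 GB installer
    const double link_mbit[] = {16.0, 100.0, 1000.0};     // DSL, fast Ethernet, gigabit
    const char*  name[]      = {"16 Mbit/s DSL", "100 Mbit/s", "1 Gbit/s"};

    for (int i = 0; i < 3; ++i) {
        double bytes_per_second = link_mbit[i] * 1.0e6 / 8.0;
        std::cout << name[i] << ": about "
                  << file_bytes / bytes_per_second << " seconds\n";
    }
    return 0;
}
```

At these made-up numbers that works out to roughly 500, 80 and 8 seconds respectively, which is why a high bandwidth link matters when a build has to push large installers and library archives around.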
Official note
…in case anyone is wondering why the thread reads a bit differently.
I edited some posts in this thread to remove an offensive tone from a single user.
Thanks to everybody else involved for staying polite and civil.
That’s a deep and complex question, but a full answer is probably too long for this forum.
However, your question does contain a clue. You missed out the word “is”, a fairly important verb with respect to parsing the meaning. As a human, I can immediately see the error, mentally correct it and carry on. However, computers are stupid, extremely stupid (but fast). They would likely just say “Syntax error”. You then have to figure out what the error was and how to fix it, before the computer can even attempt to answer the question.
Complex software also has a set of instructions telling the computer how to compile the software. The build instructions are themselves a kind of programming language, so you have the same problem of telling the computer exactly what to do, and of the computer failing to understand when you get the smallest thing wrong.
Imagine talking to a person who, whenever you made a typo or a punctuation or grammar mistake, just said “Error”.
tl;dr: computers are stupid.
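A concrete (made-up) C++ example of what that looks like in practice: remove a single character from a perfectly fine program and the compiler refuses to continue.

```cpp
// A single missing character is enough to stop a build. Delete the ';' marked
// below and the compiler will refuse to produce a binary, reporting something
// along the lines of "error: expected ';' before ..." and nothing more helpful.
#include <iostream>

int main() {
    int answer = 42;   // <- remove this ';' to watch the compiler give up
    std::cout << "the answer is " << answer << "\n";
    return 0;
}
```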
So what exactly IS the problem? I never said that there is no human in the loop. The original statement that I was puzzled by was “…distribution is not a problem but building it is.” What exactly is that mysterious “problem” in building the software that is related to the current predicament? So far all I’ve heard is that it is difficult because it can’t be fully automated. “Difficult” is not a problem.
You have to ask the people who run this, but they usually don’t frequent these support forums; they hang around in their own world, the developers’ mailing list.
I’m amazed @GyrosGeier showed up to be honest.
Probably you missed that the ‘problem’ is having the build hosted online for free…
This is available as ‘Jenkins’ for Linux or OSX but NOT for Windows… that is the problem here…
Maurice
We use Jenkins (on a Linux box) to remote-control a Windows VM that does the actual build.
This works reasonably well, but requires that the connection between Jenkins and the agent remains up for the entire duration of the build, otherwise Jenkins will mark the build as failed and the agent will kill everything. File transfers between Jenkins and the agent go through an RPC protocol over the same channel.
Jenkins isn’t built for this kind of setup, really — they expect all machines to be on a LAN, with low latency (copying one file requires a full round trip, and the next file is not started before the last one is acknowledged) and high throughput (otherwise, large file transfers clog up the RPC pipe, causing the periodic “ping” requests over the same connection to time out).
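A rough illustration of why per-file round trips hurt (the file count and latency here are assumptions for the sake of the example, not measurements of the actual setup): when each file must be acknowledged before the next one starts, latency alone adds up long before throughput becomes the limit.

```cpp
// Illustrative only: time spent purely waiting on per-file acknowledgements
// when each transfer needs a full round trip before the next file can start.
#include <iostream>

int main() {
    const double files       = 10000.0;  // assumed number of small library files
    const double rtt_seconds = 0.040;    // assumed 40 ms round-trip time (WAN-ish)

    double waiting = files * rtt_seconds;
    std::cout << "time spent just waiting for acknowledgements: "
              << waiting / 60.0 << " minutes\n";   // ~6.7 minutes at these numbers
    return 0;
}
```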
So, the box needs a really good connection. I’ve had the builds running on a VM on my home DSL for some time, where about 50% of builds would succeed, and every attempt would take six hours. CPU horsepower isn’t that important; even an i3 can do it in about 2.5 hours wall clock time, but rushing through in ten minutes will reduce the chance of failure even more.
Also, the build process could be a bit more efficient. Right now, we always download the latest state of the libraries, unpack them, copy them to the installation directory, repack them, copy the archive into the installer, and then sign the installer, which requires another copy. That is where most of the build time comes from on the small boxes; they simply don’t have the I/O bandwidth for that.
This is where the RAID card comes in handy: it has a few GB of RAM with its own battery, so it will just accept the write of the footprint archive in one big transaction, report that the data has been written, and then send it to the disks in the background while the build process does the next step, which conveniently uses the same data that is still in RAM. That drops the wall clock time for unpacking the archive from a few minutes to a few seconds.
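A crude, illustrative calculation (the archive size and disk throughput are assumptions, not measurements of the build server) of why those repeated unpack/repack/copy steps dominate on small boxes:

```cpp
// Illustrative only: time spent rewriting the same library archive several
// times during a build, assuming every write has to go through the disks.
#include <iostream>

int main() {
    const double archive_bytes = 2.0e9;   // assumed ~2 GB library archive
    const double disk_bytes_s  = 80.0e6;  // assumed ~80 MB/s sustained disk write speed
    const int    rewrite_steps = 4;       // unpack, stage, repack, sign (roughly)

    double per_pass = archive_bytes / disk_bytes_s;   // ~25 s per rewrite
    std::cout << "writes hitting the disks every time: about "
              << rewrite_steps * per_pass / 60.0 << " minutes\n";
    // With a battery-backed write cache the controller acknowledges each write
    // immediately and keeps the data in RAM, so the next step can read it back
    // without waiting for the disks, cutting this to seconds.
    return 0;
}
```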
I have an item on my TODO list to make the build more efficient and also allow separation of binaries and library, but this will take some time to get right — there are a few frameworks for that already, but none of them fit exactly (like Jenkins, which makes 90% of the job easy, and the remaining 10% would require a full rewrite of Jenkins from the ground up).
Good news everyone.
After a month, the old server managed to boot (in fact, the RAID controller was not dead, but just too stupid to kick out the broken disk). The colo facility pulled the broken disk, so it should work again for some time if nothing else breaks.
Next nightly build is scheduled at 17:41 CET, as usual, so starts in about 20 minutes. Progress can be seen on the other Jenkins.
The new box will still be rolled out at some point, but it is delayed because of a public holiday and shipping delays (enterprise class SSDs are hard to get). That should not be that much of a problem now though.
Yay, the build succeeded. Thank you.
Gone missing again, at a critical time as V5 is stabilising
Very glad to see the Windows nightlies back up and running. Thank you to all responsible.
And missing again since 9th April. No mention on the mailing list
This time it’s not what you think - the build number rolled over 9999 and the sort order is “wrong”. Friday the 13th is there.
I see now; sorting is hard to get right. I have documents at work with reference numbers like 000001 just to avoid this.
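For anyone curious why 10000 ends up sorted before 9999: the build numbers are being compared as text, and zero-padding (like the 000001 reference numbers above) is the usual fix. A minimal sketch with made-up build numbers:

```cpp
// Lexicographic vs numeric ordering of build numbers. Sorting the plain
// strings puts "10000" before "9999"; zero-padding restores the expected order.
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> builds = {"9998", "9999", "10000", "10001"};

    std::sort(builds.begin(), builds.end());              // plain string comparison
    std::cout << "string sort:";
    for (const auto& b : builds) std::cout << ' ' << b;   // 10000 10001 9998 9999
    std::cout << '\n';

    // Zero-pad to a fixed width (five digits here) and string order matches
    // numeric order again.
    for (auto& b : builds) {
        std::ostringstream padded;
        padded << std::setw(5) << std::setfill('0') << b;
        b = padded.str();
    }
    std::sort(builds.begin(), builds.end());
    std::cout << "padded sort:";
    for (const auto& b : builds) std::cout << ' ' << b;   // 09998 09999 10000 10001
    std::cout << '\n';
    return 0;
}
```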
The official Discordian document numbering scheme uses five digits.
The good news is that this will happen again only after a couple of hundred years.
Unless singularity happens first and our uploaded minds run at different time normals
PS: been reading Stross - Accelerando last week