Hi Piotr
Good comment, I will rephrase in more words. If a skew exists on a differential trace pair, that is to say one trace of the pair is longer than the other between the transmitter and receiver, the skew should be removed close to where it was introduced. I have seen situations where skew was introduced early in the route, in a U-shape, and then travelled 50 mm to the receiver. The skew was not corrected until the receiver pins. This is suboptimal.
If the skew stays on the pair over a large distance, the differential properties of the signal may be degraded once the skew exceeds approximately one third of the rise time of the waveform. The degradation shows up as loss of amplitude at the receiver, jitter, and EMR (EMC problems).
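As a rough sketch of that one-third rule of thumb, the check below compares a skew against a rise time, both in picoseconds. The function names and the example numbers are made up for illustration, and the 1/3 limit is only the approximate figure mentioned above, not a hard specification.

```python
def skew_ratio(skew_ps: float, rise_time_ps: float) -> float:
    """Return the skew expressed as a fraction of the signal rise time."""
    if rise_time_ps <= 0:
        raise ValueError("rise time must be positive")
    return skew_ps / rise_time_ps


def skew_is_risky(skew_ps: float, rise_time_ps: float, limit: float = 1 / 3) -> bool:
    """True when the skew exceeds the rule-of-thumb fraction of the rise time."""
    return skew_ratio(skew_ps, rise_time_ps) > limit


# Hypothetical example: the same 50 ps of skew on a fast and a slow edge.
print(skew_is_risky(50, 100))   # 100 ps rise time: half an edge of skew
print(skew_is_risky(50, 1000))  # 1 ns rise time: a twentieth of an edge
```

The same physical skew is flagged for the fast edge and not for the slow one, which is the whole point: the rise time sets the budget, not the absolute length.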
Sometimes skew is hard to avoid, such as when a differential pair does a U-turn. In this case it’s ideal if the shape is varied, or the path is compensated (with another shape) so as to reduce the skew.
Sometimes it is not worth caring, because the distance the signals travel with skew is short in rise time terms. For example, with a 50 mm distance (about 320 ps) and a signal rise time of 1 ns, the signal will not look (much) different anywhere along the trace while it is changing.
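To put numbers on "short in rise time terms", here is a small sketch that converts a routed distance into flight time and compares it with the rise time. The 6.4 ps/mm delay is an assumption derived from the 50 mm ≈ 320 ps estimate above; the real value depends on the stackup and laminate, and the function names are just illustrative.

```python
# Assumed propagation delay on FR4, taken from 320 ps / 50 mm.
# Real boards vary with stackup and laminate.
PS_PER_MM = 6.4


def flight_time_ps(length_mm: float, ps_per_mm: float = PS_PER_MM) -> float:
    """Time for an edge to travel length_mm along the trace."""
    return length_mm * ps_per_mm


def flight_to_rise_ratio(length_mm: float, rise_time_ps: float,
                         ps_per_mm: float = PS_PER_MM) -> float:
    """Flight time as a fraction of the rise time; well below 1 means 'short'."""
    return flight_time_ps(length_mm, ps_per_mm) / rise_time_ps


print(f"{flight_time_ps(50):.0f} ps over 50 mm")
print(f"ratio with 1 ns rise time:   {flight_to_rise_ratio(50, 1000):.2f}")
print(f"ratio with 100 ps rise time: {flight_to_rise_ratio(50, 100):.2f}")
```

With a 1 ns edge the 50 mm run is about a third of a rise time, so the waveform looks similar along the whole trace. With a 100 ps edge the same run is several rise times long, and where the skew gets corrected starts to matter.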
With fast signals, say 100 ps rise time (fast LVDS, MIPI), we experience problems more easily. Length matching between the two traces of a pair should always be better than 1/3 of the rise time. There are a few exceptions to this, but that’s a good number. 33 ps on FR4 is roughly 5 mm (going by the 50 mm ≈ 320 ps figure above), so you see a lot of length matching is BS. However, because these skewed transitions happen thousands of times per second, and every process has noise (transmitter and receiver, power supplies etc.), the skew will produce jitter in the acquisition of the signal at the receiver.
Therefore we try (need) to route diff pairs to better than that; in practice, routing within 1/10 of a rise time is no big deal. Also bear in mind that the direction of the weave of the coarse glass mat in PCB prepreg can have a large effect on propagation velocity.
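The one-third and one-tenth budgets can be turned into a length mismatch with a few lines of arithmetic. This is only a sketch: the 6.4 ps/mm figure is an assumption based on the 50 mm ≈ 320 ps estimate earlier, the function name is made up, and the result scales directly with whatever delay per mm your stackup really has (glass weave included).

```python
# Assumed FR4 propagation delay (320 ps / 50 mm); adjust for your stackup.
PS_PER_MM = 6.4


def max_mismatch_mm(rise_time_ps: float, fraction: float,
                    ps_per_mm: float = PS_PER_MM) -> float:
    """Length mismatch (mm) that uses up `fraction` of the rise time."""
    budget_ps = rise_time_ps * fraction
    return budget_ps / ps_per_mm


# 100 ps rise time, with the two rule-of-thumb fractions from the text.
for label, fraction in (("1/3", 1 / 3), ("1/10", 1 / 10)):
    mm = max_mismatch_mm(100, fraction)
    print(f"{label} of a 100 ps rise time: about {mm:.1f} mm of mismatch")
```

Under these assumptions the budgets come out at roughly 5 mm and 1.5 mm for a 100 ps edge, which is why matching to a fraction of a millimetre buys very little at these speeds.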
If the signal is multi-level, not binary, the constraints tighten proportionally.
That’s a really rough take on it, made as simple as I can make it. For single-ended traces on DDR RAM etc., length matching in groups is a bit of a different game: the memory chips are fast and contain asynchronous paths that are timing-sensitive and rely heavily on signals arriving at exact times relative to other signals. But there is no need to get carried away (although we do). I’ve never had a simple DDR4 design that did not work first go.