I would go for the NEC/TOKIN change first. Most PS3s work after that, but cracked solder under the GPU/CPU is also possible. Anyway, the Xbox 360/PS3 generation of consoles was bad: the Xbox 360 was only reliable after the Jasper model, and the PS3 only in the Slim line. But the question remains: the PS4 and Xbox One used lead-free solder too, so why doesn't the current generation fail as much as the previous one? Do they use a better lead-free solder that won't crack after multiple heat/cool-down cycles? Maybe @Bad_Ad84 knows something about it.
The short answer is that stress cracking of solder balls is an old problem - there are papers discussing it dating back to the '90s - but it wasn't considered severe enough to worry about in anything but high-reliability applications, so it was largely ignored.
There are multiple reasons it went from being pretty much a technical curiosity to a major problem, and that's also why some products (consoles and GPUs for example) were affected much more than others:
1) The change to RoHS solder, which is less ductile
2) Increase in transistor density combined with smaller packaging
3) Adoption of extensive clock gating as a power and thermal control measure
4) The wider use of flip-chip packaging
5) Wider use of high-powered parts in consumer applications
The first is the one that's often blamed as "the problem" - but that's not really true. There is no reason that a joint made with RoHS solder needs to be less reliable than one made with PbSn solder - the main difference is that the PbSn joint can withstand more abuse, but that just means that (if you're lucky) you push the failure threshold past the point where the device is removed from use for some other reason.
The second and third have to be considered together - increasing transistor density will typically increase power dissipation, and the package size reduction will compound this by increasing the power density even further. This results in more heating. The clock gating is important because it makes thermal events more frequent - if you have a chip that has no clock gating then it will heat up as soon as power is applied and remain at a largely constant temperature until power is removed. As a result, each power up/power down cycle represents a single thermal cycle. If you have extensive clock gating then the device temperature is constantly changing - and a single power up/power down cycle may correspond to hundreds or even thousands of thermal cycles.
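To make the "one power cycle vs. many thermal cycles" point concrete, here is a rough Python sketch that counts heat-up/cool-down swings in a temperature trace. The function name, the 10 K threshold and both traces are made up for illustration; real fatigue analysis would normally use something like rainflow counting rather than this simple hysteresis counter.

```python
def count_thermal_cycles(temps_c, min_swing_c=10.0):
    """Count heat-up/cool-down cycles with a swing of at least min_swing_c.

    Simplified hysteresis counter (an illustrative assumption, not a
    standard fatigue method): a cycle is counted each time the trace rises
    by min_swing_c above its running minimum and then falls by min_swing_c
    below its running maximum.
    """
    if not temps_c:
        return 0
    cycles = 0
    hot = False
    ref = temps_c[0]  # running minimum while cool, running maximum while hot
    for t in temps_c[1:]:
        if not hot:
            ref = min(ref, t)
            if t - ref >= min_swing_c:   # heated up far enough
                hot = True
                ref = t
        else:
            ref = max(ref, t)
            if ref - t >= min_swing_c:   # cooled down far enough
                cycles += 1
                hot = False
                ref = t
    return cycles


# Hypothetical traces for one power-on/power-off session (degrees C).
ungated = [25, 70, 70, 70, 70, 25]            # heats once, stays hot
gated = [25, 70, 50, 70, 50, 70, 50, 70, 25]  # load-dependent temperature

print(count_thermal_cycles(ungated))  # 1
print(count_thermal_cycles(gated))    # 4
```

Both traces describe a single power cycle, but the gated one puts four times as many swings on the solder joints, and a real session would be far longer than this toy trace.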
The reason flip-chips are significant is that they have two electrical interconnects - one between the die and the carrier PCB ("primary interconnect") and the second between the carrier PCB and the PCB of the system it's installed into ("secondary interconnect"). The die has a certain expansion coefficient (2.6 ppm/K for silicon) and the main PCB has a different one (about 12 ppm/K for FR4) - but the carrier PCB has to be rigidly bonded to both of them, which is a problem since it's impossible to match both these values simultaneously. The generally adopted solution was to use low-CTE carrier boards to match the silicon (since that interconnect is finer pitch) and rely on the ductility of the solder balls on the secondary interconnect to accommodate the CTE mismatch. This mostly worked pretty well, until it didn't.
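As a back-of-the-envelope illustration of that mismatch, the sketch below works out how far the two sides of a solder ball try to move relative to each other. The CTE figures are the ones quoted above; the 15 mm distance, the 60 K temperature swing and the function name are assumptions picked purely for the example, and the carrier is assumed to match silicon exactly.

```python
def differential_expansion_um(cte_a_ppm_per_k, cte_b_ppm_per_k,
                              distance_mm, delta_t_k):
    """Relative in-plane displacement (micrometres) between two bonded
    materials at a point distance_mm from the package centre, for a
    temperature change of delta_t_k. Linear expansion only."""
    delta_cte = abs(cte_a_ppm_per_k - cte_b_ppm_per_k) * 1e-6  # per kelvin
    return delta_cte * (distance_mm * 1e3) * delta_t_k         # mm -> um


CTE_SILICON = 2.6   # ppm/K, from the post
CTE_FR4 = 12.0      # ppm/K, from the post (approximate)

# Assumed: low-CTE carrier matched to silicon, corner ball 15 mm from the
# package centre, 60 K swing between idle and load.
shift = differential_expansion_um(CTE_SILICON, CTE_FR4,
                                  distance_mm=15.0, delta_t_k=60.0)
print(f"{shift:.2f} um")  # 8.46 um
```

The displacement scales linearly with both the distance from the centre and the temperature swing, so the outermost balls on a large package see the most strain, and every thermal cycle repeats it.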
The fact it's a consumer product is significant because it's typically going to be used in a quiet environment, and this rules out the other approach to keeping temperatures under control, because fans that sound like jet engines are generally considered undesirable in a box that has to sit under your TV.
In the specific case of the 360, you also have to add just plain bad thermal design. They used a small heatsink on the GPU and stuck it under the DVD drive with an air duct that narrowed down and generated considerable backpressure which drastically cut down on airflow.
There are multiple reasons it's not so much of a problem anymore. One of the biggest changes is that the packaging is now typically made using high CTE materials that match the PCB much better and use soft pads on the primary interconnect to the die to accommodate the stresses there - although the ratios remain the same, the die is smaller so the distances are also smaller. The use of smaller device geometries means there is less heat to deal with anyway - and most importantly, this is now a potential problem that's on everyone's radar so they actually think about it.
Incidentally, the existence of two interconnects in a flip-chip BGA is why "reflow it with a hot air gun" is terrible advice. Using a powerful and uncontrolled heat source like that is very likely to introduce a large thermal gradient across the package, and this can easily damage the primary interconnect. If you look at the rework data provided by the device vendors, they typically call out a 120-180s ramp to bring the device up to temperature. This is not being done to annoy people or waste their time - it's because doing it any faster risks damaging the device.
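To put that ramp time in perspective, here is a small Python sketch that converts it into an average heating rate. Only the 120-180s window comes from the text above; the 25 C start and 150 C target are placeholder values, not figures from any vendor datasheet, so substitute the numbers from the actual rework profile.

```python
def ramp_rate_k_per_s(start_c, target_c, ramp_s):
    """Average heating rate implied by a linear ramp of ramp_s seconds."""
    if ramp_s <= 0:
        raise ValueError("ramp_s must be positive")
    return (target_c - start_c) / ramp_s


START_C = 25.0    # assumed ambient temperature
TARGET_C = 150.0  # assumed end-of-ramp temperature (placeholder)

for ramp_s in (120, 180):  # ramp window quoted above
    rate = ramp_rate_k_per_s(START_C, TARGET_C, ramp_s)
    print(f"{ramp_s} s ramp -> {rate:.2f} K/s")
```

With these placeholder numbers the whole package is being asked to warm at roughly 1 K/s or less, which is a very different thing from pointing a hot air gun at one spot.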