Merging models // combining weights — spherical linear interpolation (SLERP)
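For reference, here is a minimal sketch of what SLERP does to a single pair of weight tensors. It assumes both tensors have the same shape and measures the angle between them after flattening. The function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for near-parallel tensors are illustrative choices, not any particular library's implementation.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two same-shaped weight tensors.

    t=0 returns v0, t=1 returns v1. Hypothetical helper for illustration.
    """
    a = np.asarray(v0, dtype=np.float64)
    b = np.asarray(v1, dtype=np.float64)

    # Measure the angle between the flattened, normalized tensors.
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)
    sin_omega = np.sin(omega)

    # Nearly parallel tensors: the SLERP weights become numerically
    # unstable, so fall back to ordinary linear interpolation.
    if abs(sin_omega) < eps:
        return (1.0 - t) * a + t * b

    # Standard SLERP weights, applied to the original (unnormalized) tensors.
    w0 = np.sin((1.0 - t) * omega) / sin_omega
    w1 = np.sin(t * omega) / sin_omega
    return w0 * a + w1 * b

# Toy usage: interpolate halfway between two orthogonal unit vectors.
merged = slerp(0.5, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In a real merge this would presumably be applied tensor by tensor across two checkpoints with matching architectures, with `t` possibly varying per layer; that loop is left out here.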
How good is a merged model in practice? Pretty good, but there's some clear leaderboard hacking going on with this technique. To be clear, it's still very experimental, and I don't think these merges are the best 7B-parameter LLMs you can find.