If the correctness tests pass, expect noticeably faster DS4 inference on DGX Spark, and in particular much flatter prefill speed as context increases. It should land in the repo soon if everything goes as expected.