PhAIL Benchmark: Ranking Physical AI Models on Real Hardware