I'm running some simple timing tests comparing the performance of Tensor against [Float] and ran into some strange behavior. The basic code is below. With device = Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER), everything runs as expected: the Tensor results agree exactly with the [Float] results, and the code prints |testArray - testTensor|_max = 0.0.
However, with device = Device(kind: .CPU, ordinal: 0, backend: .XLA) and the parameters below, |testArray - testTensor|_max = 0.001953125, and memory usage is much higher. With nLoop >= 1024, the code simply crashes in the error-check loop.
Two questions: (1) why does my code crash with XLA, and (2) why is the arithmetic different for XLA?
Thanks in advance!
//--------------------------
import TensorFlow

let tSize = 1024
let nLoop = 512

// Reference data: 1.0, 2.0, ..., Float(tSize) as a plain [Float].
let testIntArray: [Int] = Array(1...tSize)
let testFloatArray = testIntArray.map { Float($0) }
var testArray = testFloatArray

//let device = Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)

// Same data as a Tensor<Float> placed on the chosen device.
var testTensor = Tensor(shape: [tSize], scalars: testFloatArray, on: device)

// Scale the tensor nLoop times.
for _ in 0..<nLoop {
    testTensor = 0.9999 * testTensor
}

// Scale the plain array the same number of times, element by element.
for _ in 0..<nLoop {
    for j in testArray.indices {
        testArray[j] = 0.9999 * testArray[j]
    }
}

// Error check: max_j |testArray[j] - testTensor[j]|.
var maxLinf: Float = 0.0
for j in testArray.indices {
    let absDiff = abs(testArray[j] - testTensor[j].scalar!)
    if absDiff > maxLinf {
        maxLinf = absDiff
    }
}
print("|testArray - testTensor|_max = ", maxLinf)
//--------------------------------
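For reference, repeatedly multiplying by 0.9999 is (in exact arithmetic) the same as scaling each element by 0.9999^nLoop, so I'd expect any difference to be pure floating-point rounding. Below is a sketch of an alternative error check that (a) reads all tensor values back with a single .scalars call instead of one .scalar! call per element, and (b) also compares against the closed-form value 0.9999^nLoop * (j + 1) computed in Double. The names factor, maxDiffTensor, and maxDiffClosedForm are just mine for illustration, and I don't know whether the bulk .scalars read changes anything under XLA.

//--------------------------
// Alternative error check (continues from the script above).
// import Foundation belongs at the top of the file, next to import TensorFlow.
import Foundation

// Exact-arithmetic equivalent of the scaling loop, computed in Double.
let factor = Float(pow(0.9999, Double(nLoop)))

// One bulk device-to-host transfer instead of a .scalar! call per element.
let tensorScalars = testTensor.scalars

var maxDiffTensor: Float = 0.0       // max |array - tensor|
var maxDiffClosedForm: Float = 0.0   // max |array - closed form|
for j in testArray.indices {
    maxDiffTensor = max(maxDiffTensor, abs(testArray[j] - tensorScalars[j]))
    maxDiffClosedForm = max(maxDiffClosedForm, abs(testArray[j] - factor * Float(j + 1)))
}
print("max |testArray - testTensor| (bulk read) = ", maxDiffTensor)
print("max |testArray - 0.9999^nLoop * (j+1)|  = ", maxDiffClosedForm)
//--------------------------------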