Credit System, New Driver, Fixed Accuracy Bug, Support for Resnet18 filters
Credit System:
The old backpressure system has been replaced by a new credit system. This avoids overflowing the input buffers when the backpressure signal is too slow to propagate back upstream.
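As an illustration of the idea only, the following is a minimal software sketch of credit-based flow control. It is not the hardware implementation in this commit; the class names, method names, and the buffer depth of 4 are all hypothetical. The point it shows is that the sender stalls on its own local credit count, so it never depends on a stall signal arriving in time.

```python
from collections import deque


class CreditReceiver:
    """Consumer with a fixed-depth input buffer (depth is a made-up value)."""

    def __init__(self, buffer_depth):
        self.depth = buffer_depth
        self.buffer = deque()

    def push(self, item):
        # With a correct credit count this assertion can never fire.
        assert len(self.buffer) < self.depth, "input buffer overflow"
        self.buffer.append(item)

    def pop(self, sender):
        # Freeing a slot hands one credit back to the sender.
        item = self.buffer.popleft()
        sender.return_credit()
        return item


class CreditSender:
    """Producer that may only send while it holds at least one credit."""

    def __init__(self, buffer_depth):
        # One credit per free slot in the receiver's input buffer.
        self.credits = buffer_depth

    def try_send(self, receiver, item):
        if self.credits == 0:
            return False  # stall locally; no signal from the receiver is needed
        self.credits -= 1
        receiver.push(item)
        return True

    def return_credit(self):
        self.credits += 1


DEPTH = 4  # hypothetical buffer depth
sender = CreditSender(DEPTH)
receiver = CreditReceiver(DEPTH)

# Six send attempts against a four-slot buffer: the last two are refused.
sent = [sender.try_send(receiver, i) for i in range(6)]
print(sent)
```

Because a credit is consumed before the data leaves the sender, the latency of the return path can only reduce throughput; it cannot cause an overflow.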
Comparisons of the two systems can be found in the tables below:
Rewritten driver + MV2 Bug Fix + Credit System + Resnet18 changes (@300MHz)

| Model | Throughput (im/s) | Top1-acc | Top5-acc | Latency (first image) | Logic Utilization (ALMs) | Total Block Memory bits | Total RAM Blocks | Total DSP_Prime Blocks |
|---|---|---|---|---|---|---|---|---|
| MV1 (3400 DSP) | 20674.8 | 66.482 | 87.012 | 0.398 ms | 60% | 47% | 73% | 86% |
| MV2 (2800 DSP) | 19795.5 | 63.81 | 85.26 | 0.459 ms | 83% | 37% | 63% | 69% |
| MV3 (2900 DSP) | 26356.7 | 55.124 | 78.788 | 0.420 ms | 80% | 37% | 57% | 64% |

Rewritten driver + MV2 Bug Fix + Old Backpressure (@300MHz)

| Model | Throughput (im/s) | Top1-acc | Top5-acc | Latency (first image) | Logic Utilization (ALMs) | Total Block Memory bits | Total RAM Blocks | Total DSP_Prime Blocks |
|---|---|---|---|---|---|---|---|---|
| MV1 (3400 DSP) | 22285.8 | 66.482 | 87.012 | 0.397 ms | 62% | 47% | 73% | 86% |
| MV2 (2800 DSP) | Hangs | 63.81 | 85.26 | -- | 85% | 37% | 62% | 69% |
| MV3 (2900 DSP) | 27490.9 | 55.124 | 78.788 | 0.403 ms | 81% | 37% | 57% | 64% |

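To make the throughput comparison easier to read, the short script below computes the relative change between the two configurations from the figures in the two tables above. The variable and function names are ours. Note that the first configuration also includes the Resnet18 changes, so the difference cannot be attributed to the credit system alone.

```python
# Throughput in im/s, copied from the two tables above.
credit_system = {"MV1": 20674.8, "MV2": 19795.5, "MV3": 26356.7}
old_backpressure = {"MV1": 22285.8, "MV2": None, "MV3": 27490.9}  # None: run hangs


def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


changes = {}
for model, new in credit_system.items():
    old = old_backpressure[model]
    # No baseline exists for a model that hangs under the old scheme.
    changes[model] = None if old is None else relative_change(new, old)

for model, change in changes.items():
    if change is None:
        print(f"{model}: no baseline (old backpressure hangs)")
    else:
        print(f"{model}: {change:+.2f}% throughput")
```

For MV1 and MV3 the new configuration is roughly 7% and 4% slower respectively, while MV2 goes from hanging to running at 19795.5 im/s.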
New Driver:
The old driver included here operated reliably only at 16,000 im/s and often hung when modified for higher throughput, so we have included a new driver that allows operation at higher speeds.
Accuracy Bug Fixed:
In the previous commit, MV2 and MV3 had poor accuracy because of a bug in the Add layers that caused two RAM blocks to send their outputs at the same time. The bug is fixed in this commit.
Support for Resnet filters in TensorMode:
In the previous commit, kernels with height and width greater than 1 worked in TensorMode only if the number of input channels did not exceed ICP (10 in this case). In other words, a 3x3x10x32 filter was supported, but a 3x3x11x32 filter was not. This commit adds support for any number of input channels (for example, 3x3x64x64). It also adds support for 1x1 convolutions with stride 2.
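The notes above do not say how channel counts above ICP are handled internally. One common approach, shown here purely as an assumption, is to split the input channels into groups of at most ICP and accumulate the partial sums. The sketch below demonstrates only the arithmetic equivalence for a single output value; the function names and test data are hypothetical.

```python
ICP = 10  # input-channel parallelism quoted in the text


def channel_groups(c_in, icp=ICP):
    """Split c_in input channels into consecutive groups of at most icp."""
    return [range(start, min(start + icp, c_in)) for start in range(0, c_in, icp)]


def dot_full(x, w):
    """Reference: one dot product across all input channels at once."""
    return sum(xi * wi for xi, wi in zip(x, w))


def dot_grouped(x, w, icp=ICP):
    """Same result, accumulated one channel group at a time."""
    total = 0
    for group in channel_groups(len(x), icp):
        total += sum(x[c] * w[c] for c in group)
    return total


# 64 input channels, as in the 3x3x64x64 example; integer data keeps the
# comparison exact.
x = list(range(64))
w = [(c % 7) - 3 for c in range(64)]

groups = channel_groups(64)
print(len(groups), [len(g) for g in groups])
print(dot_grouped(x, w) == dot_full(x, w))
```

With ICP = 10, 64 input channels split into six full groups and one group of four; an 11-channel filter needs two groups, which is the case the previous commit could not handle.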