
Credit System, New Driver, Fixed Accuracy Bug, Support for Resnet18 filters

Mario Doumet requested to merge mario_dev into master

Credit System:

The old backpressure system has been replaced by a new credit system. This prevents the input buffers from overflowing when the backpressure signal is too slow to propagate back.
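The difference between the two schemes can be illustrated with a small cycle-level model. This is only a sketch under assumed parameters, not the hardware implementation in this merge request: the function name `simulate`, the buffer depth, the signal delay, and the drain rate are all hypothetical. It shows that a delayed "full" flag lets the sender overshoot the buffer, while a sender that spends one credit per word can never have more words outstanding than the buffer can hold.

```python
from collections import deque


def simulate(mode, depth=4, delay=3, cycles=200, drain_every=4):
    """Return the peak occupancy of a receiver buffer with `depth` slots.

    mode="backpressure": the sender sees the receiver's "full" flag
                         `delay` cycles late.
    mode="credit":       the sender starts with `depth` credits, spends one
                         per word sent, and gets one back `delay` cycles
                         after the receiver pops a word.
    All parameters are illustrative assumptions.
    """
    occupancy = 0
    peak = 0
    credits = depth
    full_flag_pipe = deque([False] * delay)  # delayed view of the "full" flag
    credit_pipe = deque([0] * delay)         # credits travelling back to the sender

    for cycle in range(cycles):
        # Slow consumer: pops one word every `drain_every` cycles.
        popped = 0
        if occupancy > 0 and cycle % drain_every == 0:
            occupancy -= 1
            popped = 1

        # Both feedback signals reach the sender `delay` cycles later.
        full_flag_pipe.append(occupancy >= depth)
        sender_sees_full = full_flag_pipe.popleft()
        credit_pipe.append(popped)
        credits += credit_pipe.popleft()

        if mode == "credit":
            can_send = credits > 0
        else:
            can_send = not sender_sees_full

        if can_send:
            # The model lets occupancy exceed `depth` so the overshoot is
            # visible; real hardware would drop or corrupt data instead.
            occupancy += 1
            if mode == "credit":
                credits -= 1

        peak = max(peak, occupancy)
    return peak


if __name__ == "__main__":
    print("backpressure peak occupancy:", simulate("backpressure"))
    print("credit peak occupancy:     ", simulate("credit"))
```

In the credit mode, the words in the buffer plus the credits held by the sender plus the credits in flight always add up to `depth`, which is why the occupancy stays within the buffer no matter how long the return path is.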

Comparisons of the two systems can be found in the tables below:

Rewritten driver + MV2 Bug Fix + Credit System + Resnet18 changes (@300MHz)

| | Throughput (im/s) | Top1-acc | Top5-acc | Latency (first image) | Logic Utilization (ALMs) | Total Block Memory bits | Total RAM Blocks | Total DSP_Prime Blocks |
|---|---|---|---|---|---|---|---|---|
| MV1 (3400 DSP) | 20674.8 | 66.482 | 87.012 | 0.398 ms | 60% | 47% | 73% | 86% |
| MV2 (2800 DSP) | 19795.5 | 63.81 | 85.26 | 0.459 ms | 83% | 37% | 63% | 69% |
| MV3 (2900 DSP) | 26356.7 | 55.124 | 78.788 | 0.420 ms | 80% | 37% | 57% | 64% |

Rewritten driver + MV2 Bug Fix + Old Backpressure (@300MHz)

| | Throughput (im/s) | Top1-acc | Top5-acc | Latency (first image) | Logic Utilization (ALMs) | Total Block Memory bits | Total RAM Blocks | Total DSP_Prime Blocks |
|---|---|---|---|---|---|---|---|---|
| MV1 (3400 DSP) | 22285.8 | 66.482 | 87.012 | 0.397 ms | 62% | 47% | 73% | 86% |
| MV2 (2800 DSP) | Hangs | 63.81 | 85.26 | -- | 85% | 37% | 62% | 69% |
| MV3 (2900 DSP) | 27490.9 | 55.124 | 78.788 | 0.403 ms | 81% | 37% | 57% | 64% |
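To read the throughput columns side by side, the relative change can be computed directly from the table values. The snippet below is only a convenience sketch: the dictionary and function names are hypothetical, and MV2 is left out because it hangs with the old backpressure system and so has no baseline throughput.

```python
# Throughput (im/s) copied from the two tables above.
# MV2 is omitted: it hangs under the old backpressure system.
credit = {"MV1": 20674.8, "MV3": 26356.7}
backpressure = {"MV1": 22285.8, "MV3": 27490.9}


def relative_change_percent(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


changes = {
    model: relative_change_percent(credit[model], backpressure[model])
    for model in credit
}

if __name__ == "__main__":
    for model, change in changes.items():
        print(f"{model}: {change:+.1f}%")
```

With these numbers the credit system runs roughly 7% slower on MV1 and roughly 4% slower on MV3, while allowing MV2 to run without hanging.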

New Driver:

The old driver operated reliably only at 16,000 im/s and often hung when modified for higher throughput, so this merge request includes a new driver that allows operation at higher speeds.

Accuracy Bug Fixed:

In the previous commit, MV2 and MV3 had poor accuracy because of a bug in the Add layers that caused two RAM blocks to send their outputs at the same time. The bug is fixed in this commit.

Support for Resnet filters in TensorMode:

In the previous commit, kernels with height and width greater than 1 only worked in TensorMode if the number of input channels was at most ICP (10 in this case). In other words, we could support a 3x3x10x32 filter, but not a 3x3x11x32 filter. This merge request adds support for any number of input channels (such as 3x3x64x64). It also adds support for convolutions with 1x1 kernels and stride 2.
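The description does not say how the extra input channels are handled in hardware. One common approach, shown here purely as an assumption, is to process the channels in groups of at most ICP and accumulate the partial sums. The helper names `num_passes`, `dot_tiled`, and `conv_out_size` are hypothetical, and the 56-pixel input width in the example is chosen only for illustration.

```python
import math

ICP = 10  # input-channel parallelism; value taken from the description above


def num_passes(in_channels, icp=ICP):
    """Number of ICP-wide groups needed to cover `in_channels` channels."""
    return math.ceil(in_channels / icp)


def dot_tiled(activations, weights, icp=ICP):
    """Dot product over input channels, accumulated one ICP-wide group at a time."""
    assert len(activations) == len(weights)
    total = 0
    for start in range(0, len(activations), icp):
        group_a = activations[start:start + icp]
        group_w = weights[start:start + icp]
        total += sum(a * w for a, w in zip(group_a, group_w))
    return total


def conv_out_size(in_size, kernel, stride, padding=0):
    """Standard output-size formula for a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


if __name__ == "__main__":
    # A 3x3x10x32 filter fits in one group; 3x3x11x32 and 3x3x64x64 need more.
    for channels in (10, 11, 64):
        print(channels, "input channels ->", num_passes(channels), "group(s)")

    # Grouped accumulation gives the same result as a single full-width sum.
    acts = list(range(64))
    wts = [1] * 64
    print(dot_tiled(acts, wts) == sum(a * w for a, w in zip(acts, wts)))

    # A 1x1 kernel with stride 2 halves an even-sized input dimension.
    print(conv_out_size(56, kernel=1, stride=2))
```

Because addition is associative, grouping the channels does not change the result for integer data; only the number of passes over the filter grows with the channel count.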
