I want a hardware multiplier

I'm currently working on the data multiplexer, the part that comes between the CPU and the memory and is responsible for putting the right byte at the right place, but I can't stop thinking about what comes next.

And let me tell you, there's still quite a lot of development to do and many topics to cover!

The next article will probably be about branches and jumps, leaving out everything related to exception and interrupt handling for the moment.

On the development side, and in no particular order, there's waiting for slower memory or I/O devices, actual memory mapping, CSRs and all the system instructions, exceptions and interrupts, global reset circuitry, and more.
And then the peripherals.

There is also one thing that may come much later in the project but that I don't want to rule out: a hardware multiplier. (And while I'm at it, why not put a division unit along with it? But that is another story.)

In any case, there's just no way that it will run in a single cycle, so it will require some sort of mechanism to freeze the pipeline at the Execute stage while the "MulDiv" unit does its job. I expect this mechanism to be very similar to what I'll have to implement for the slow memory access anyway.
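
To make the freeze idea concrete, here is a minimal behavioral sketch in Python, not a description of the actual circuit. It models only the Execute stage: each instruction occupies it for some number of cycles, and a stall flag stays high while the unit is still busy. The function name, the latency table and the 3-cycle "mul" are all hypothetical values chosen for illustration.

```python
def simulate_execute(program, latencies):
    """Return one (instruction, stall) pair per clock cycle.

    program   -- list of instruction names, in program order
    latencies -- cycles each instruction spends in Execute (default 1);
                 these numbers are illustrative assumptions
    stall     -- True while Execute is still busy, meaning the earlier
                 stages must hold their contents for another cycle
    """
    trace = []
    for instr in program:
        remaining = latencies.get(instr, 1)
        while remaining > 0:
            remaining -= 1
            stall = remaining > 0  # unit not done yet: freeze upstream
            trace.append((instr, stall))
    return trace


if __name__ == "__main__":
    for cycle, (instr, stall) in enumerate(
        simulate_execute(["add", "mul", "sub"], {"mul": 3})
    ):
        print(cycle, instr, "STALL" if stall else "")
```

The same stall flag could just as well be driven by a slow memory instead of the MulDiv unit, which is why I expect both cases to share the mechanism.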

Regarding the multiplier itself, chances are that I will go for simplicity and do the slowish shift-add method because all the other ways I know are just too expensive to build with discrete gates at 32 bits.
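
For reference, the shift-add method can be sketched in a few lines of Python. This is only a software model of the textbook algorithm for unsigned operands, one multiplier bit per step, and says nothing about how the real unit would be wired. The function name and the choice to return the full 64-bit product are my own assumptions for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_add_mul(a, b):
    """Multiply two unsigned 32-bit values, one multiplier bit per step.

    Returns the full 64-bit product. Each loop iteration corresponds to
    one step of a sequential shift-add multiplier.
    """
    multiplicand = a & MASK32
    multiplier = b & MASK32
    acc = 0
    for _ in range(32):
        if multiplier & 1:
            # Current multiplier bit is 1: add the shifted multiplicand.
            acc = (acc + multiplicand) & MASK64
        multiplicand <<= 1  # next bit has twice the weight
        multiplier >>= 1    # move on to the next multiplier bit
    return acc


if __name__ == "__main__":
    product = shift_add_mul(6, 7)
    print(product)
    print(hex(product & MASK32), hex(product >> 32))
```

The loop always runs 32 times, which is where the "slowish" comes from: one add and two shifts per step, but only a single adder to build.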

And before you ask, no, I will not go out of order. The instruction following a mul will have to wait in the Decode stage!

Now, honestly, wouldn't it be very sad to have to draw the Mandelbrot set using only software multiplication?