Before the MMU is up, all reads/writes must be aligned; the optimized
memcpy implementation does not guarantee all reads/writes it performs
are aligned.
This commit splits the libc impl to be separate for kernel/kernel_ldr,
and so now only kernel will use the optimized impl. This is safe,
as the MMU is brought up before kernel begins executing.