First of all before beginning with any generating techniques, I’ll shortly introduce some concepts that are used throughout the whole article to minimize the size overhead for the generators. However you can apply most of the stuff to arbitrary texture formats.
I’ll assume textures to be 256^2 in size and 32bits per pixel if not stated other. Under this assumption it is pretty easy to handle the access, since pixels are 32bit aligned and using 256^2 as texture size gives the bonus being able to use only one counter for looping over all pixels, beeing able to directly access x/y via low/high byte register access.
I’ll also assume that esi is bound to a source image and edi is bound to the destination image memory. Any other assumptions will be states explicitly.
A simple fill texture to a constant color would for example look like this:
; assume edi = destination texture memory
; assume eax = fill color
xor ecx, ecx
dec cx
fillloop:
; you can use cl and ch here to access x/y counter values
stosd
loop fillloop ; warning last pixel wont be processed.
; you need to use dec ecx, jns instead
; assume ecx = gridsize [0..7]
xor ebx, ebx ; using ebx this time
dec bx ; as counter
checkerloop:
mov eax, ebx ; move x/y to eax
shr eax, cl ; scale by ecx
; note bits shiftet from ah to al
; wont affect checkerboard, because
; we use only the lowest bits of ah/al
xor ah, al ; invert ah using al as mask
; if the lowest bits were were equal, it will
; become 0 else if will become 1
sahf ; load ah to the flags (now carry holds lowest bit)
sbb eax, eax ; substract eax, from eax (resulting into 0)
; and also substract 1 more if carry was set
; so we get black/white
stosd ; store the pixel
dec ebx
jns checkerloop
; assume ebx = seed (must not be 0)
xor ecx, ecx
dec cx
@fillloop:
add eax, ebx
rol eax, 3
add ebx, eax
stosd
loop @fillloop
Note: this noise does not have good “random” properties, but should work well for most purposes in a 4k
; assume color in eax
xor ecx, ecx
dec cx
shr ecx,1 ; processing 2 pixel per iteration
push eax
push eax
blendloop:
movq mm0, QWORD [esi+ecx*8]
paddusb mm0, QWORD [esp] ; using psubusb would substract the color
movq QWORD ptr[edi+ecx*8], mm0
dec ecx
jns blendloop
pop eax
pop eax
You could also modify the code to blend (additive or subtractive) two images:
; assume second source image in edx
xor ecx, ecx
dec cx
shr ecx,1 ; processing 2 pixel per iteration
blendloop:
movq mm0, QWORD [esi+ecx*8]
paddusb mm0, QWORD [edx+ecx*8] ; using psubusb would substract the second image
movq QWORD ptr[edi+ecx*8], mm0
dec ecx
jns blendloop