Tuesday, December 2, 2014

New Website Coming

After looking back on my tutorial series, I have decided to hold off on writing future parts of the series until I have a website to support the content better.  The context of a blog post keeps me limited in the amount of content I can put into the tutorial.

Going forward the plan is to write an in-depth tutorial on assembly programming, hopefully one that is a little more hands-on and has a lot more visuals.  I will continue to post updates and useful information here.

Keep on the lookout for updates on the new website.

Monday, December 1, 2014

Welcome December with Cyber Monday

I hope everyone in the US had a great Thanksgiving, and has had luck with any Black Friday or Cyber Monday purchases.

On the tech side there are a few good deals to note.

InformIT is having a 50% off sale on eBooks as well as 70% off on Video Tutorials

O'Reilly Media is having a 50% off sale as well with purchases over $100 getting 60% off.

There are of course plenty of other sales out there.  Leave a comment if you know of a deal that is hard to pass up.

Also keep watching for more tutorial installments.

Saturday, November 22, 2014

Assembly Tutorial 5: Strings

Accepting input from the user is basically pointless if you don't have the ability to do anything with it.  That is where string functions come in handy.

It is always handy to be able to change the case of a string.  This is handy for formatting the output or for string comparison.

To Upper
To make a character uppercase we will first check to make sure that it is a lowercase character by comparing it to 'a' and 'z' to make sure that it is between them.  After that we will simply subtract the difference between 'a' and 'A' from the character.

To Lower
Making a character lowercase is almost exactly the same as making it upper case.  We simply make sure it is between 'A' and 'Z' and then add the difference between 'a' and 'A' to the character.

For an entire string we simply loop through the whole string and do this to each character.
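
The loop described above can be sketched as follows.  This is only a minimal sketch, not necessarily the code in the repository: the routine name to_upper_string and the choice of passing the string address in BX are assumptions made for the example.

```nasm
[BITS 16]

; to_upper_string (hypothetical name): convert a null terminated string
; to upper case in place.
; Input (assumed convention): BX = address of the string.
to_upper_string:
        push ax
        push bx
.loop:
        mov al, [bx]            ;Fetch the current character
        or al, al               ;Is it the terminating 0?
        jz .done
        cmp al, 'a'
        jb .next                ;Below 'a', so not a lowercase letter
        cmp al, 'z'
        ja .next                ;Above 'z', so not a lowercase letter
        sub al, 'a' - 'A'       ;Subtract the difference between the cases (32)
        mov [bx], al            ;Store the converted character back
.next:
        inc bx                  ;Move on to the next character
        jmp .loop
.done:
        pop bx
        pop ax
        ret
```

A to-lower routine would have the same shape: compare against 'A' and 'Z' instead, and use add in place of sub.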

Substring
Taking a substring is a little more involved than changing the case, though not too complicated.  We take a length parameter in register CX and the address of the string in BX.

We copy the character at the address in BX into a buffer, then decrement CX and increment BX.  When CX reaches zero, or the character at BX is zero (the end of the source string), we null terminate the string in the buffer and return the buffer address in BX.
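
Here is one way the copy loop might look.  Treat it as a sketch under a few assumptions: the routine name, the use of DI as the write pointer, and the fixed 64 byte buffer are all invented for the example, and the caller is trusted to keep CX at 63 or less so the copy fits.

```nasm
[BITS 16]

; substring (hypothetical name): copy up to CX characters from the string
; at BX into a buffer.
; Input:  BX = address of the source string, CX = number of characters.
; Output: BX = address of the null terminated copy.
substring:
        push ax
        push cx
        push di
        mov di, substring_buffer ;DI is the write position in the buffer
.loop:
        or cx, cx               ;Have we copied the requested length?
        jz .done
        mov al, [bx]            ;Read the next source character
        or al, al               ;Did the source string end early?
        jz .done
        mov [di], al            ;Copy the character into the buffer
        inc bx
        inc di
        dec cx
        jmp .loop
.done:
        mov byte [di], 0        ;Null terminate the copy
        mov bx, substring_buffer ;Return the buffer address in BX
        pop di
        pop cx
        pop ax
        ret

substring_buffer: times 64 db 0 ;Assumed fixed size buffer (63 characters + 0)
```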

String Compare
Comparing two strings is very similar to taking a substring, except without the count or copy.

We simply take the addresses of 2 strings and then compare the characters at their addresses.  If they are equal and not 0 we increment the addresses and repeat.  If they are both zero we return that the strings are equal.  If the characters are not equal we return which one was greater in value, as a 1 or -1.
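
A sketch of the comparison is below.  The post does not say which registers hold the two addresses or the result, so SI, DI, and AX here are assumptions, as is the routine name.

```nasm
[BITS 16]

; string_compare (hypothetical name)
; Input (assumed): SI = address of the first string, DI = address of the second.
; Output (assumed): AX = 0 if equal, 1 if the first string is greater,
;                   -1 if the second string is greater.
string_compare:
        push bx
        push si
        push di
.loop:
        mov bl, [si]            ;Character from the first string
        mov bh, [di]            ;Character from the second string
        cmp bl, bh
        jne .different
        or bl, bl               ;Equal: if they are both 0 we are at the end
        jz .equal
        inc si
        inc di
        jmp .loop
.equal:
        xor ax, ax              ;Return 0 for equal strings
        jmp .done
.different:
        mov ax, 1               ;MOV leaves the flags from the CMP untouched
        ja .done                ;First character was greater (unsigned compare)
        mov ax, -1              ;Otherwise the second one was greater
.done:
        pop di
        pop si
        pop bx
        ret
```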

As usual, examples of this are in the repository for this tutorial.
https://github.com/musicman89/Assembly-Tutorial

Assembly Tutorial 4: Keyboard IO

When writing a program, it is generally necessary to be able to accept user input.  Fortunately in 8086 real mode this is quite simple.  We can use a BIOS interrupt to get a key from the user.

Getting A Key
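Accepting a single key is very simple.  The BIOS selects the keyboard function from AH, and function 0x00 waits for a key press, so we just call interrupt 0x16 with 0x00 in AH (setting AX to 0x01 has the same effect, since that leaves AH at 0x00).  This will return the ASCII character in register AL, and the keyboard scan code in register AH.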

Getting a String
Once we are able to get a single key we can expand this into a loop that accepts any key press and stores it in a buffer until the enter key is pressed.  At that point we return the buffer in register BX.

This loop can be constructed simply by setting BX to the address of the buffer.  After that we set a label as the start of our loop and call our function to get a key press.

When a key is returned we check whether it was the enter key.  If it was, we jump out of the loop.

Otherwise we store the character in register AL at the address in BX, increment the address, and jump to the top of the loop.

After the loop we write a 0 to the end of the buffer to mark the end of the string, set BX back to the start address of the buffer, and return.
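
Put together, the steps above might look something like this.  It is a sketch rather than the repository code: the names get_key, get_string, input_buffer, and INPUT_MAX are made up for the example, and the check that stops the buffer from overflowing is an addition that the description above does not mention.

```nasm
[BITS 16]

INPUT_MAX equ 63                ;Assumed limit on the number of characters

; get_key: wait for a key press.
; Output: AL = ASCII character, AH = keyboard scan code.
get_key:
        mov ah, 0x00            ;BIOS keyboard function 0x00: wait for a key
        int 0x16
        ret

; get_string: read keys into input_buffer until enter is pressed.
; Output: BX = address of the null terminated string.
get_string:
        push ax
        mov bx, input_buffer    ;BX is the write position in the buffer
.loop:
        call get_key
        cmp al, 0x0D            ;Enter arrives as carriage return (ASCII 13)
        je .done
        cmp bx, input_buffer + INPUT_MAX
        jae .loop               ;Buffer full: ignore keys until enter
        mov [bx], al            ;Store the character
        inc bx                  ;Advance to the next free byte
        jmp .loop
.done:
        mov byte [bx], 0        ;Null terminate the string
        mov bx, input_buffer    ;Return the start of the buffer in BX
        pop ax
        ret

input_buffer: times INPUT_MAX + 1 db 0
```

Note that nothing is echoed to the screen here.  To let the user see what they type, each character would also have to be printed inside the loop.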

Please check the repository for this tutorial for an example of this code.
https://github.com/musicman89/Assembly-Tutorial

Assembly Tutorial 3: Disk IO

For this tutorial we are going to keep things as short and simple as possible.  I will cover how to load information from a disk into memory using CHS addressing, and how to jump to that part of memory to run code.

CHS Addressing
First I will start by explaining what CHS addressing is.  CHS stands for Cylinder, Head, and Sector.  These terms refer to the layout of a traditional hard disk, which is typically made of several discs (platters) stacked on top of each other, with a read head on each surface.

The cylinder is an imaginary cylinder that extends through all the discs, so basically all the data that can be read without moving the read heads.  The first cylinder is numbered 0.

The head is the read head, also starting at number 0.  Now the sector is the smallest storage unit that is addressable by the hard drive.  This actually starts at 1.

Loading the Data from the Disk
In assembly we can use BIOS interrupt 0x13 to load data from the disk.  Unlike our print functions this actually takes a lot more parameters.

We need to put 0x02 in AH to tell the BIOS that we want to use the Disk Read function.

In DL we will put the drive to load from.

In BX we will put the address in memory to load to.  Strictly speaking the BIOS writes to the segment and offset pair ES:BX, so this assumes ES is 0.

The number of sectors to load goes in register AL.

We need to address where to start from in CH with the Cylinder, DH with the Head, and CL with the Sector.  With our program we will put 0 in CH, 0 in DH, and 2 in CL.

We will also put 9 in AL to specify that we will load 9 sectors.
We will also set BX to 0x9000.

For most computers the BIOS loads DL with the boot drive at start, so we will store that at the label BOOT_DRIVE at the top of our program and load it back into DL when we go to read from the disk.

At this point we can jump to the code we just loaded at 0x9000 with the instruction JMP 0x9000.
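
As a rough sketch, a boot sector that performs this load could look like the following.  The register values are the ones listed above.  Setting DS and ES to 0 and checking the carry flag for an error are additions made for this example, so they are assumptions rather than part of the steps described, and the disk_error label is a made up name.

```nasm
[BITS 16]
[ORG 0x7c00]

main:
        xor ax, ax
        mov ds, ax              ;Assumption: make sure DS is 0 for our labels
        mov es, ax              ;Assumption: ES is 0 so ES:BX is 0x0000:0x9000
        mov [BOOT_DRIVE], dl    ;Save the boot drive the BIOS gave us

        mov ah, 0x02            ;BIOS Disk Read function
        mov al, 9               ;Number of sectors to read
        mov ch, 0               ;Cylinder 0
        mov dh, 0               ;Head 0
        mov cl, 2               ;Sector 2 (sector 1 is this boot sector)
        mov dl, [BOOT_DRIVE]    ;Drive to read from
        mov bx, 0x9000          ;Address in memory to load to
        int 0x13                ;Call the Disk Interrupt
        jc disk_error           ;The BIOS sets the carry flag on failure

        jmp 0x9000              ;Run the code we just loaded

disk_error:
        cli                     ;Nothing sensible to do in this sketch,
        hlt                     ;so just stop

BOOT_DRIVE db 0

times 510-($-$$) db 0           ;Pad the file to 1 sector
dw 0xAA55                       ;Boot sector signature
```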

Building the Program
For the data we are going to have at this location we will use our hello world program, except that instead of using the ORG directive to set our start address to 0x7c00 we will set it to 0x9000, so the assembler knows the code will be loaded at this location.  We also need to make sure the file is 9 sectors long.  As in the bootloader, we pad it with the TIMES prefix, this time as times 4608-($-$$) db 0, since there are 512 bytes in a sector and 9 * 512 = 4608.
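
The skeleton of that second file might look like this.  To keep the sketch short it prints a single character rather than the whole hello world string, which is a simplification made for the example.

```nasm
[BITS 16]
[ORG 0x9000]                    ;This code will be loaded at 0x9000

main:
        mov al, 'H'             ;Character to print
        mov bh, 0x0             ;Page 0
        mov bl, 0x07            ;Light gray on black
        mov ah, 0x0E            ;BIOS Teletype Output
        int 0x10                ;Call the Video Interrupt
        cli                     ;Disable Interrupts
        hlt                     ;Halt the Processor

times 4608-($-$$) db 0          ;Pad the file to 9 sectors (9 * 512 bytes)
```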

At this point we can compile our two files using NASM and then merge them using a binary copy.  This is done with the copy command:
copy /b "Bootloader.com" + "Hello World.com" "Hello World.bin"

The output file can then be run in QEMU.

Please refer to the code in the repository for an exact implementation.
https://github.com/musicman89/Assembly-Tutorial


Wednesday, November 19, 2014

New Tutorial Sections This Week

I realize that it has been nearly a week since the last post in the assembly tutorial series.  I have been working on the code for the next sections.  At this point I am just polishing up the code a bit through section 5, so you can expect a few tutorials to be heading out in the next week.

Hopefully the first will be tonight with a tutorial on Disk IO, followed by Keyboard IO and String Functions.

After these sections it gets a little more advanced and there will likely be a little more time between posts.  At that point we will get into more advanced graphics and memory management functions.

If you have any questions or requests, as usual, leave a comment below.

Thursday, November 13, 2014

Upcoming Segment

If you have been watching the repository for the tutorial series, you will have seen that I have been busy lately adding the code for the next few segments.  While the code is not complete yet, I am hoping to push the next segment tonight.

For any C# developers out there I hope you have been keeping up with the news from the Connect(); conference this week.  It is rather exciting that Microsoft has finally announced the much anticipated cross platform capabilities and open source nature of .NET in 2015.  We have been waiting a long time for this.  The repository where the .NET Core Framework will be housed can be found at http://github.com/dotnet/corefx.  For more details check out Scott Hanselman's blog here.

Tuesday, November 11, 2014

Assembly Tutorial 2: Include Files and Macros

The code for this tutorial can be found at https://github.com/musicman89/Assembly-Tutorial
Tutorial 1: Intro to x86 Assembly

The last segment of this series covered an introduction to the world of assembly programming.  It also was a bit long-winded.  Today I am going to keep it a little shorter and cover a relatively simple topic that is often overlooked when starting in assembly: macros.  Since what I am going to cover is quite short, I will also talk briefly about include files.

Let's get started with Macros

A macro is simply a statement that is processed by the assembler's preprocessor before the code is assembled.  It can be anything from a simple constant to a sequence of statements or even a mathematical expression.

First there is the single line macro.  A single line macro is exactly what it sounds like: a macro defined in a single line.  We define them using the %define directive.

For example, we could write %define LightGrayOnBlack 0x07.  This would allow us to put LightGrayOnBlack in our code rather than writing 0x07 when printing a character to the screen.  Though the compiled code is identical, this is much easier to read and debug.

We could also write %define Poly(a,b) ((a*a) + b + 2), so if we later put Poly(2,3) in our code it would expand to ((2*2) + 3 + 2), and the value 9 would end up in our assembled code.
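
As a quick illustration, here are both macros in use.  The two MOV instructions are just an example of where the macros could appear.

```nasm
[BITS 16]

%define LightGrayOnBlack 0x07
%define Poly(a,b) ((a*a) + b + 2)

        mov bl, LightGrayOnBlack ;Assembles exactly like mov bl, 0x07
        mov ax, Poly(2,3)        ;Expands to ((2*2) + 3 + 2), so AX gets 9
```

Keep in mind that the parameters are substituted as text.  Poly(1+1,3) expands to ((1+1*1+1) + 3 + 2), which is 8 rather than 9.  Writing the macro as (((a)*(a)) + (b) + 2) avoids that surprise.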

Now for the multiple line macro.  This operates on the same rules as the single line macro, but has multiple lines.  This is useful when you have a task you do often like calling a function.  We could write the following macro to prep a string to be printed using the function we wrote in the last tutorial.

%macro Print 1
     mov bx, %1
     call print_string
%endmacro

Now when we type Print HelloWorldString it outputs the assembly instructions for us using the macro.  Also note from this example the format of parameters.

In a multiple line macro they are numbered.  You simply put the number of parameters you expect after the name and then you reference them by number of the parameter preceded by %.
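
To show the numbering with more than one parameter, here is a sketch of a macro that takes two.  The macro PutChar is made up for this example and is not part of the tutorial code.

```nasm
[BITS 16]

; PutChar (hypothetical macro): %1 = the character, %2 = the value for BL
%macro PutChar 2
        mov al, %1              ;First parameter: the character to write
        mov bl, %2              ;Second parameter: the text color
        mov bh, 0x0             ;Page 0
        mov ah, 0x0E            ;BIOS Teletype Output
        int 0x10                ;Call the Video Interrupt
%endmacro

        PutChar 'H', 0x07       ;Each use expands to the five instructions above
        PutChar 'i', 0x07
```

Unlike a CALL, every use of the macro inserts a full copy of its body, so a macro that is used often makes the assembled file larger.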

There are lots of additional features to macros that can be found in the NASM manual in Chapter 4.

Now for Include Files

Include files are quite similar to macros, except that rather than inserting a line or two, they tell the assembler to insert an entire file at that point.  An include is given using the %include directive.  Using this we can extract our print functions from the Hello World example and put them in their own file.

When compiling, though, you must keep in mind that for relative paths to include files to work, the assembler must be run from the location they are relative to.  Either that, or you need to use the -i parameter to add the directory to the include search path.

Monday, November 10, 2014

Update

The last post which was the first of the tutorial series is going to get a bit of a rework this week, as well as some followup content.  I realized as I was writing that stage in the tutorial that even though we did not get very far in actual programming we covered a lot of content for one part of a tutorial series.

I am going to thus break up that content into several smaller more manageable chunks.  Please expect that update within the next few days.

Also over the past few days I have been digging up my old code and finding parts of my old graphics engine I was writing back in high school and the first year of college.  The majority of it is missing, so I am going to be rebuilding it into a tutorial series.

This series, like the assembly series, is going to start with the very core basics and build up to a software rendering engine.  At the end of the series I may go into taking what we have learned into DirectX and/or OpenGL to use hardware rendering.

Once again if there are any topics you would like to see or any programming questions you have please feel free to ask in the comments.

Wednesday, November 5, 2014

Assembly Tutorial 1: Part 4 Using NASM and QEMU

The code for this tutorial can be found at https://github.com/musicman89/Assembly-Tutorial

Part 1: Intro
Part 2: Basic Concepts
Part 3: Writing the Code
Part 4: Using QEMU and NASM

Registering Paths in Windows
First, to make NASM and QEMU easy to use, we will register them in Windows.

This is done by simply adding each path to the Path Environment Variable.
This is found by going to System Properties (Right Click on My Computer and Click Properties) then go to Advanced System Settings and Click the Environment Variables Button.

The Path field is a list of paths separated by semi-colons.

We simply need to add the paths that the QEMU and NASM executable files are in to this field and click OK.  You will need to open a new command window (or restart your computer) for this change to take effect.

Compiling with NASM
To compile basic assembly code in NASM simply open a command window and navigate to the folder the code is located in and call nasm.exe with the following parameters.
nasm -f bin "<assembly file name>.asm" -o "<output file name>.com"

The -f parameter specifies the output format; we are using flat binary output.  We then specify the name of the file that contains the assembly code, and finally the -o parameter specifies the output file path including the name.

Running with QEMU
Once you have your code compiled you can run it in QEMU with the following command.
qemu-system-i386 -m 256 -hda "<file name>.com"

This specifies that we are going to emulate the 386 processor with 256 MB of RAM and we are setting the Hard Drive to our binary file.

Assembly Tutorial 1: Part 3 Writing the Code

The code for this tutorial can be found at https://github.com/musicman89/Assembly-Tutorial

Part 1: Intro
Part 2: Basic Concepts
Part 3: Writing the Code
Part 4: Using QEMU and NASM

Writing a Boot Sector
This is the point where we actually start writing code.  We will start by writing a basic boot sector. We will expand on this in the following steps.

We start off by putting our directives at the top of the file

[BITS 16]                         ;Tell the assembler to compile the code for 16 bit execution
[ORG 0x7c00]                      ;Tell the assembler that our addresses start at the address 0x7c00


After that we can get into the program.  Though it is not required I always start with a label.

main:                             ;Declare a label for the start of the program
        cli                       ;Disable Interrupts
        hlt                       ;Halt the Processor

A boot sector must be 512 bytes and end with the word (2 bytes) 0xAA55.  This can easily be done using the TIMES prefix in the assembler.

times 510-($-$$) db 0             ;Pad the file to make it 512 Bytes or 1 sector
dw 0xAA55                         ;Declare the word 0xAA55 signifying the end of the boot sector

Writing a Character to the Screen
At this point if you compile the code with NASM and run it you will get a blank screen.  We need to do a little more if we want to get some output.

We will add the following code into our file after the HLT instruction.
print:
        push bx                   ;Push the BX register to the stack
        mov bh, 0x0               ;Set the Page
        mov bl, 0x07              ;Set the Text Color
        mov ah, 0x0E              ;Instruct the Bios to Write a Character
        int 0x10                  ;Call the Video Interrupt
        pop bx                    ;Pop the BX register back from the stack
        ret

In order to write a character to the screen we need to set BH to page 0, set BL to 0x07 (light gray on black), and set AH to 0x0E, which tells the BIOS that we are writing a character to the screen using Teletype Output.

AL must contain the value of the character we are going to write to the screen.  We set that and call this function by updating main.

main:
        mov al, 'H'               ;Store the value of the character H in the register AL
        call print                ;Call the Print Function
        cli                       ;Disable Interrupts
        hlt                       ;Halt the Processor


Output Hello World to the Screen
Finally we get to write the string hello world to the screen.

This is simply done by pointing the register BX to the address of the string and printing the character at that address in a loop while we increment the address stored in BX.  If the value at the address in BX is 0 we return, as that signifies the end of the string.

A string ending with a 0 is called a null terminated string.  It is a common convention, used by C among other languages.

print_string:
        push ax                   ;Push the AX register to the stack
        push bx                   ;Push the BX register to the stack

        mov al, [bx]              ;Move the first byte from the address stored in bx
        or al,al                  ;Do a binary OR on al to verify it is not zero
        jz .return                ;If al is zero then skip the loop and return
.loop:
        call print                ;Print the character in al
        inc bx                    ;Increase the address in BX by 1
        mov al, [bx]              ;Get the next byte from the address stored in bx
        or al,al                  ;Once again do a binary OR on al to verify it is not zero
        jnz .loop                 ;If al is not zero go back to the start of the loop
.return:
        pop bx                    ;POP the BX register back from the stack
        pop ax                    ;POP the AX register back from the stack
        ret

HelloWorldString db 'Hello World', 0

When declaring a static location in memory with the DB instruction you can prefix it with a label.  In this position the colon after the label is optional, because the assembler treats a name placed in front of DB as a label either way.

At this point we can update the call in main. Do remember that a label points to an address, so when we use the MOV instruction to set BX we are setting it to the address that the HelloWorldString label is at.


main:                             ;Declare a label for the start of the program
        mov bx, HelloWorldString  ;Set BX to the address of the Hello World String
        call print_string         ;Call the print string function
        cli                       ;Disable Interrupts
        hlt                       ;Halt the Processor
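
For reference, here is a sketch of how the pieces above could fit together in one file.  The ordering is my own arrangement of the snippets: main comes first because execution starts at the first byte of the boot sector, and the functions and the string sit after the HLT so they are never executed by accident.  It also assumes the BIOS leaves DS at 0, which the snippets above do not set up.

```nasm
[BITS 16]
[ORG 0x7c00]

main:
        mov bx, HelloWorldString  ;Set BX to the address of the string
        call print_string
        cli                       ;Disable Interrupts
        hlt                       ;Halt the Processor

print:
        push bx
        mov bh, 0x0               ;Page 0
        mov bl, 0x07              ;Light gray on black
        mov ah, 0x0E              ;BIOS Teletype Output
        int 0x10                  ;Call the Video Interrupt
        pop bx
        ret

print_string:
        push ax
        push bx
        mov al, [bx]              ;First character of the string
        or al, al
        jz .return                ;Empty string, nothing to print
.loop:
        call print
        inc bx
        mov al, [bx]              ;Next character
        or al, al
        jnz .loop                 ;Keep going until we read the 0
.return:
        pop bx
        pop ax
        ret

HelloWorldString db 'Hello World', 0

times 510-($-$$) db 0             ;Pad the file to 1 sector
dw 0xAA55                         ;Boot sector signature
```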

Assembly Tutorial 1: Part 2 Basic Concepts

The code for this tutorial can be found at https://github.com/musicman89/Assembly-Tutorial

Part 1: Intro
Part 2: Basic Concepts
Part 3: Writing the Code
Part 4: Using QEMU and NASM

Basic Concepts
In this tutorial we will be covering a lot of the basic details of assembly programming including assembler directives, registers, opcodes, labels, and memory allocation.

Directives
We will start with assembler directives.  These tell the assembler how to compile the code.  We will only cover 3 of these in this tutorial.

  • BITS: This tells the assembler what mode the code will execute in.  We will declare BITS 16 since we are writing code for 16 bit execution.
  • ORG: This tells the assembler where this program is going to be located in memory.  In our first program we will declare ORG 0x7c00 which is the default location the BIOS puts the boot sector in memory.
  • CPU: Tells the assembler what processor we are assembling the code for.  An example would be CPU 8086 if we wanted to make sure that our code is compatible with that specific processor.

Registers
Next we have the processor registers.  A register is in simple terms just a section of memory on the processor, basically a variable.  For this tutorial I will stick to the 4 general purpose 16 bit registers

  • AX: the Accumulator Register which is commonly used for Arithmetic
  • BX: the Base Register often used to store memory addresses
  • CX: the Count Register, which commonly stores count values for loops
  • DX: the Data Register, which is used to store the upper 16 bits of a multiply result or the remainder of a divide operation

Each of these registers can be accessed as two 8 bit registers by replacing the X with an H (High) or L (Low).  For example AX can be split into AH and AL.

Also note that no matter how you access the register it is the same memory so if we set the AX register to 0xFFFF and then set the AL register to 0xAA, the value of AX will now be 0xFFAA.
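
A tiny sketch of that overlap, using only the MOV instruction:

```nasm
[BITS 16]

        mov ax, 0xFFFF          ;AX = 0xFFFF, so AH = 0xFF and AL = 0xFF
        mov al, 0xAA            ;Only the low byte changes: AX = 0xFFAA
        mov ah, 0x12            ;Only the high byte changes: AX = 0x12AA
```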

Opcodes
Now we get to the meat of assembly development: the opcodes.  These are the mnemonics that represent instructions in assembly language.  We will only cover a few basic ones in this tutorial and many more will be added in future tutorials.  At this point I will not explain how to use them, just simply explain what they do.

  • MOV: copies data from one location to another
  • INT: initiates a software interrupt; we use this to call functions in the BIOS
  • JMP: moves the current point of execution to another place in the program
  • CALL: stores the address of the next instruction on the stack and goes to another place in the program
  • RET: sends us back to the instruction after the CALL
  • HLT: halts the processor until the next interrupt arrives
  • CLI: clears the interrupt flag, which disables maskable hardware interrupts
  • ADD: performs addition
  • SUB: performs subtraction
  • MUL: performs multiplication
  • DIV: performs division


Memory Allocation and Labels
At this point we are almost to where we can write a program.  For this tutorial we will need memory allocation instructions as well as labels.

For static memory allocation we will be using the instructions DB (define byte) and DW (define word). These simply tell the assembler that what follows is data and not an instruction.

For dynamic memory allocation we use the stack, which at this point we will basically describe as a first in last out pool of data.  We use the PUSH instruction to put data onto the stack and the POP instruction to retrieve it.  Think of it like a playing card discard pile.  The last card you put on the pile is going to be the first one you pick up if you take one from the pile.
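
The discard pile behaviour can be seen in a short sketch like this one, which ends up swapping two registers because the values come back in the reverse of the order they went in:

```nasm
[BITS 16]

        mov ax, 1
        mov bx, 2
        push ax                 ;The stack now holds: 1
        push bx                 ;The stack now holds: 1, 2 (2 is on top)
        pop ax                  ;Takes the top value, so AX = 2
        pop bx                  ;Takes what is left, so BX = 1
```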

As for labels, these are simply addresses that we name in the program so we can refer to them later.  The assembler will swap them out with the actual address.  A label is simply a name followed by a colon.  We can also have sub labels or labels within labels.  As a general rule these are prefixed with a '.'
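
Here is a small sketch of labels and sub labels.  The two routines are made up for the example.  Because a sub label belongs to the last regular label before it, both routines can have their own .loop without clashing.

```nasm
[BITS 16]

count_down:                     ;A regular label
        mov cx, 3
.loop:                          ;A sub label: its full name is count_down.loop
        dec cx
        jnz .loop               ;Repeat until CX reaches 0
        ret

count_up:                       ;Another regular label
        xor cx, cx
.loop:                          ;A different label: count_up.loop
        inc cx
        cmp cx, 3
        jne .loop               ;Repeat until CX reaches 3
        ret
```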

Assembly Tutorial 1: Part 1 Intro to x86 Assembly

The code for this tutorial can be found at https://github.com/musicman89/Assembly-Tutorial

Part 1: Intro
Part 2: Basic Concepts
Part 3: Writing the Code
Part 4: Using QEMU and NASM

Intro 
This is the first in a long series of tutorials on assembly programming.  The first several will be strictly in real mode.  We will eventually get into protected mode toward the end of this series.  At that point there will be another series on operating system development using mixed C and Assembly.

Disclaimer
I do not make any claims that this is the best or most efficient way to write any of the code you see.  In many cases I deliberately chose to demonstrate in a less efficient manner because I feel it is easier to explain.

I also will not give any guarantee with the code as it is for educational purposes only.

Feel free to copy any of the code from these tutorials for use in your own programs.  I only ask that if you do directly copy large portions of my code that you remark in your code that you did so.

Tools Required
To complete the examples in these tutorials you will need the following tools.  I am going to leave the setup of the tools out of this tutorial as that can change depending on your environment.  If you need assistance, though, I would be happy to help.

The first tool you will need is a text editor.  You don't need anything fancy, notepad will do.  If you prefer though there are options like Notepad++ and Sublime Text that will offer some form of syntax highlighting.

Next you will need the Netwide Assembler (NASM), which can be found at nasm.us.  This is what we will be using to compile our assembly code into binary machine code.

Finally you will need an emulator.  While this code should run just fine on a computer, it is much faster and safer to test on an emulator.  At this low a level you can damage your computer among other things, so it is best to play it safe.  I choose to use QEMU which can be found at http://qemu.weilnetz.de/w64/, and all future tutorials will be using it.

Feel free to use another emulator such as Bochs if you want.

Try to avoid virtual machines as they generally do not emulate the processor, but pass off the instructions to your machine's processor. This can cause irregularities when developing assembly code.

Adventures in Development Wonderland

I realized recently that I created this blog a long time ago and had never actually published a post.

This blog was created to share my ideas about programming, what I have learned, what I have experienced, and what I am doing.

Currently I am working on a lot of projects in the world of programming, my biggest being writing a port of an old game in 8086 assembly.

I specify the CPU rather than the family because I am writing it to be compatible with the 8086 processor.  There is definitely progress being made, and hopefully I will be posting a lot more details in the next couple weeks, as I am nearing a time where I can publish a beta release of it.

Over the next few weeks I plan to start releasing tutorials about assembly development.

I have noticed that though tutorials exist, most are incomplete or have not been touched in more than 10 years.

If you would like to peek at the code for these tutorials before they are published, you can find the repository at https://github.com/musicman89/Assembly-Tutorial.

Please note that since the tutorials are still a work in progress the code and structure in this repository is bound to change.

Please feel free to request a topic for a blog post or tutorial in the comments, as I am always looking for new topics.