When working on OS161 system calls, you'll probably see a bunch of this error,
especially you haven't implemented _exit
syscall and try to do some basic user
programs, e.g., p /bin/true
.
Note, this problem has been fixed in OS/161 version 1.99.07.
The code for /bin/true
is as follows.
int
main()
{
/* Just exit with success. */
exit(0);
}
It does nothing but just exit with 0. Because at this point, you may don't
have exit
syscall implemented, so it'll fail, so you'll see one error message
saying "Unknown syscall 3", in which 3 is just SYS__exit
. Then what happens?
Why are there a bunch of "Unknown syscall -1" following that?
To understand this, you need to know about a bit of GCC optimization and also several
MIPS instructions, especially jal
and jr
.
MIPS Function Call and Return
Here is the MIPS assembly instruction that "calls" a function foo
.
jal foo
jal
stands for "Jump And Link", it will first save $epc+8
into register
$ra
(return address), and set $epc
to whatever address foo
are, to "jump"
to that function.
Now you may wonder why $ra
is $epc+8
, since a natural next instruction
would be $epc+4
. That's because $epc+4
is in jal
's delay slot,
which means the instruction will get executed before the jal
instruction.
So the real next instruction after the function call should be $epc+8
.
And when foo
is done and about to return, it just does this:
jr ra
jr
stands for "Jump Register". It just set $epc
to whatever value in that
register. In this case, since $ra
contains the value of return address, the
foo
functions "returns" to the next instruction after jal
in callee.
GCC Optimization
As per the comments in $OS161_SRC/user/lib/libc/stdlib/exit.c
, GCC is way too
smart to know, without being explicitly told, that exit
doesn't return. So it
actually omit the jr
instruction at the end of exit
. That is, if exit
does return, the CPU will continue to execute whatever the following
instructions.
What really happened?
Here is the assembly code of /bin/true
. You can obtain it by doing this in the
root
directory:
$ os161-objdump -d bin/true > true.S
00400100 <main>:
400100: 27bdffe8 addiu sp,sp,-24
400104: afbf0010 sw ra,16(sp)
400108: 0c10004d jal 400134 <exit>
40010c: 00002021 move a0,zero
00400110 <__exit_hack>:
400110: 27bdfff8 addiu sp,sp,-8
400114: 24020001 li v0,1
400118: afa20000 sw v0,0(sp)
40011c: 8fa20000 lw v0,0(sp)
400120: 00000000 nop
400124: 1440fffd bnez v0,40011c <__exit_hack+0xc>
400128: 00000000 nop
40012c: 03e00008 jr ra
400130: 27bd0008 addiu sp,sp,8
00400134 <exit>:
400134: 27bdffe8 addiu sp,sp,-24
400138: afbf0010 sw ra,16(sp)
40013c: 0c100063 jal 40018c <_exit>
400140: 00000000 nop
...
00400150 <__syscall>:
400150: 0000000c syscall
400154: 10e00005 beqz a3,40016c <__syscall+0x1c>
400158: 00000000 nop
40015c: 3c010044 lui at,0x44
400160: ac220430 sw v0,1072(at)
400164: 2403ffff li v1,-1
400168: 2402ffff li v0,-1
40016c: 03e00008 jr ra
400170: 00000000 nop
So main
calls exit
(0x400108), exit
calls _exit
(0x40013c). Note that
at this point, $ra=$epc+8=0x400144
. _exit
fails (because we haven't
implemented it yet), $v0
is set to -1, and
returns to $ra
. The memory between 0x400140 and 0x400150 are filled by 0,
which is nop
instruction in MIPS. So the CPU get all the way down to the
__syscall
function at 0x400150, and execute the syscall
instruction. At this
point, the value of $v0
is -1. That's why we see the first Unknown syscall
-1
error message.
And after the syscall fails, the CPU will continue execution at 0x400154, and
finally do jr ra
(0x40016c). Since $ra
is still 0x400144, the whole process
repeats again. That's why you keep seeing Unknown syscall -1
error.
How to fix?
The problem is, GCC assumes exit
does not return, thus doesn't generate the
jr ra
instruction for exit
. But before we implement _exit
syscall, exit
does return. Then we lose control and things get messy.
Then how to fix this? Well, the easiest way to fix this is...implement _exit
,
of course. After all, that's what you suppose to do in ASST2 anyway.
In terms of the problem itself, the latest version of OS/161 (1.99.07) has fixed this. Here is how:
void
exit(int code)
{
/*
* In a more complicated libc, this would call functions registered
* with atexit() before calling the syscall to actually exit.
*/
#ifdef __mips__
/*
* Because gcc knows that _exit doesn't return, if we call it
* directly it will drop any code that follows it. This means
* that if _exit *does* return, as happens before it's
* implemented, undefined and usually weird behavior ensues.
*
* As a hack (this is quite gross) do the call by hand in an
* asm block. Then gcc doesn't know what it is, and won't
* optimize the following code out, and we can make sure
* that exit() at least really does not return.
*
* This asm block violates gcc's asm rules by destroying a
* register it doesn't declare ($4, which is a0) but this
* hopefully doesn't matter as the only local it can lose
* track of is "code" and we don't use it afterwards.
*/
__asm volatile("jal _exit;" /* call _exit */
"move $4, %0" /* put code in a0 (delay slot) */
: /* no outputs */
: "r" (code)); /* code is an input */
/*
* Ok, exiting doesn't work; see if we can get our process
* killed by making an illegal memory access. Use a magic
* number address so the symptoms are recognizable and
* unlikely to occur by accident otherwise.
*/
__asm volatile("li $2, 0xeeeee00f;" /* load magic addr into v0 */
"lw $2, 0($2)" /* fetch from it */
:: ); /* no args */
#else
_exit(code);
#endif
/*
* We can't return; so if we can't exit, the only other choice
* is to loop.
*/
while (1) { }
}
So if _exit
returns for any reason, we just access an address we know is
invalid, thus trigger an exception, and the kernel just panics.