<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-252748366838692884</id><updated>2011-07-07T21:10:42.910-07:00</updated><title type='text'>personal ngaro vm news ;)</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>23</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-3501269473122369508</id><published>2009-08-21T03:54:00.000-07:00</published><updated>2009-08-21T03:54:38.568-07:00</updated><title type='text'>Some new words for an assembler</title><content type='html'>Hello,&lt;br /&gt;&lt;br /&gt;Ngaro mnemonics can now be used like other words in colon definitions:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;: test swps + dup * ;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Immediate parameters are handled from within the interpreter:&lt;br /&gt;&lt;br 
/&gt;&lt;code&gt;&lt;br /&gt;: fib.n ins &lt;br /&gt;        adda &lt;br /&gt;        swps&lt;br /&gt;        decd&lt;br /&gt;        bnzd [ tail ], rts&lt;br /&gt;; fib.n&lt;br /&gt;  constant fib.n-trace&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;I'm now writing some documentation and will add more compiler back-ends (x86-64 and ARM first).</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/3501269473122369508/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=3501269473122369508' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3501269473122369508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3501269473122369508'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/08/some-new-words-for-assembler.html' title='Some new words for an assembler'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-8716822717757224165</id><published>2009-08-19T05:02:00.000-07:00</published><updated>2009-08-19T05:13:17.813-07:00</updated><title type='text'>benchmarking</title><content type='html'>A first, simple benchmark test:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;TYPE: Machine-code generation test&lt;br /&gt;OS:   XUbuntu 9.04&lt;br /&gt;CPU:  
Intel Pentium 4 rev 2&lt;br /&gt;      1.6 GHz&lt;br /&gt;MEM:  1024 MB&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;GFORTH 0.7.0;&lt;br /&gt;&lt;br /&gt;SOURCE:&lt;br /&gt;: loop-test 1000000000 0 do 1+ loop ; 0 loop-test . bye&lt;br /&gt;&lt;br /&gt;START:&lt;br /&gt;time ./gforth --dynamic loop.fs&lt;br /&gt;&lt;br /&gt;RESULT: 1000000000&lt;br /&gt;&lt;br /&gt;TIMINGS (at stable):&lt;br /&gt;real 0m5.458s, user 0m4.280s, sys 0m0.036s&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;4P 3.1 LUXOR;&lt;br /&gt;&lt;br /&gt;SOURCE:&lt;br /&gt;: loop-test 1000000000 0 do 1+ loop ; 0 loop-test . bye&lt;br /&gt;&lt;br /&gt;START:&lt;br /&gt;time ./forth4p -j loop.fs&lt;br /&gt;&lt;br /&gt;RESULT: 1000000000&lt;br /&gt;&lt;br /&gt;TIMINGS (at stable):&lt;br /&gt;real 0m1.493s, user 0m1.308s, sys 0m0.012s&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;RETRO DEV (10.2), extended-ngaro:&lt;br /&gt;&lt;br /&gt;SOURCE:&lt;br /&gt;: loop-test ins [ inca , decb , bnzb , 11 , rts , ] ;&lt;br /&gt;  loop-test constant loop-test-trace&lt;br /&gt;&lt;br /&gt;: start ins [ lib , 1000000000 , lia , 0 , bi , loop-test-trace , rts , ] ;&lt;br /&gt;  start constant start-trace&lt;br /&gt;&lt;br /&gt;: test start-trace aot . ; test bye&lt;br /&gt;&lt;br /&gt;START:&lt;br /&gt;time ./retro --with loop-opt.fs&lt;br /&gt;&lt;br /&gt;RESULT: 1000000000&lt;br /&gt;&lt;br /&gt;TIMINGS (at stable):&lt;br /&gt;real 0m2.606s, user 0m1.608s, sys 0m0.096s&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;This is promising because the code generator doesn't perform any optimizations yet. As written before, I'm not happy with the current assembler and am trying to make code generation simpler. 
Another work in progress is integrating the native-code generator into Retro's compiler, so that Forth definitions generate traces instead of vm code.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/8716822717757224165/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=8716822717757224165' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/8716822717757224165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/8716822717757224165'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/08/benchmarking.html' title='benchmarking'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-7374754349089399403</id><published>2009-08-09T06:10:00.000-07:00</published><updated>2009-08-09T06:58:35.363-07:00</updated><title type='text'>all finished !</title><content type='html'>&lt;span style="font-weight:bold;"&gt;The compiler interface is finished, so all parts are now complete and can be used from within Retro!&lt;/span&gt; I'm now writing an assembler and thinking about a parser for compiling Forth source code into traces.&lt;br /&gt;&lt;br /&gt;The first application for the assembler will be a port of this benchmark suite (just for fun and to spot possible 
optimization needs):&lt;br /&gt;&lt;br /&gt;http://shootout.alioth.debian.org/&lt;br /&gt;&lt;br /&gt;Then there are some projects of mine waiting to be implemented...&lt;br /&gt;&lt;br /&gt;Roadmap for the next version:&lt;br /&gt;&lt;br /&gt;- Write an assembler.&lt;br /&gt;- Finish the documentation of the compiler and the extended instruction set.&lt;br /&gt;- Implement back-ends for x86-64, ARM, PowerPC and TMS320c60x&lt;br /&gt;  (how about using Retro on the BeagleBoard?)&lt;br /&gt;&lt;br /&gt;For future releases:&lt;br /&gt;&lt;br /&gt;- Extend the port system for low-level access to block devices.&lt;br /&gt;- Write an OS to support easy file and data access.&lt;br /&gt;- Implement an IDE for Retro, a spreadsheet, a database and a text processor.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/7374754349089399403/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=7374754349089399403' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/7374754349089399403'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/7374754349089399403'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/08/all-finished.html' title='all finished !'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-6020956238966327507</id><published>2009-08-05T15:16:00.000-07:00</published><updated>2009-08-06T00:31:34.503-07:00</updated><title type='text'>Compiler is finished</title><content type='html'>The only thing left is implementing some new opcodes for interpreter integration and execution of generated instructions. At present there is only a back-end for x86-32 class CPUs, but work on back-ends for x86-64, ARM, PowerPC and possibly MIPS and eZ80 processors is under way.&lt;br /&gt;&lt;br /&gt;By the way, here is a little benchmark of the new interpreter against gforth (please note: the compiler is not used yet):&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;OS:&lt;/B&gt;&lt;/I&gt; &lt;br /&gt;Ubuntu 2.6.28-14.47-generic&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;C compiler:&lt;/B&gt;&lt;/I&gt; &lt;br /&gt;gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;CPU:&lt;/B&gt;&lt;/I&gt;&lt;br /&gt;Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 07&lt;br /&gt;Total of 1 processors activated (4782.53 BogoMIPS)&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;Mem:&lt;/B&gt;&lt;/I&gt;&lt;br /&gt;492852k/516032k&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;Benchmark (FIB):&lt;/B&gt;&lt;/I&gt;&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;gforth 0.7.0 - retro 10.2.1&lt;/B&gt;&lt;/I&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;time ./gforth --dynamic fib2.fs&lt;br /&gt;&lt;br /&gt;result: 102334155 &lt;br /&gt;&lt;br /&gt;real | 0m9.418s 0m9.466s 0m9.297s&lt;br /&gt;user | 0m9.365s 0m9.373s 0m9.273s&lt;br /&gt;sys. | 0m0.012s 0m0.024s 0m0.012s&lt;br /&gt;&lt;br /&gt;time ./retro &lt; fib.f&lt;br /&gt;&lt;br /&gt;result: 102334155&lt;br /&gt;&lt;br /&gt;real | 0m9.042s 0m8.931s 0m9.651s&lt;br /&gt;user | 0m8.873s 0m8.857s 0m9.565s&lt;br /&gt;sys. 
| 0m0.056s 0m0.060s 0m0.060s&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;I&gt;&lt;B&gt;Source code:&lt;/B&gt;&lt;/I&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;fib2.fs -&lt;br /&gt;: fib dup 2 &lt; if exit then 1- dup recurse swap 1- recurse + ; 40 fib . cr bye&lt;br /&gt;&lt;br /&gt;fib.f -&lt;br /&gt;: fib dup 2 &lt; if ;; then 1- dup fib swap 1- fib + ; 39 fib . cr bye&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;That's interesting because ngaro is a TTC (token-threaded code) vm and gforth uses DTC (direct-threaded code) via GCC's label-value extension. It seems replicated-switch threading can be compiled to very efficient code (and it conforms to ANSI C)!</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/6020956238966327507/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=6020956238966327507' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6020956238966327507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6020956238966327507'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/08/compiler-is-finished.html' title='Compiler is finished'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-934169570413929862</id><published>2009-07-21T15:01:00.000-07:00</published><updated>2009-07-21T15:15:04.381-07:00</updated><title type='text'>Compiler is working !</title><content type='html'>The compiler for ngaro now works as expected, and I can translate and execute some small vm-code sequences at runtime by hand. Now that the main work is done, it's time to implement the full ngaro instruction set and extend the new vm with some opcodes for compilation on demand.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/934169570413929862/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=934169570413929862' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/934169570413929862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/934169570413929862'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/07/compiler-is-working.html' title='Compiler is working !'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-4428019288682940602</id><published>2009-07-06T09:57:00.000-07:00</published><updated>2009-07-06T10:27:12.408-07:00</updated><title type='text'>New vm version</title><content type='html'>The newer GCC releases drive me crazy because of some "optimizations" that make it very hard to implement an indirect-threading interpreter (the version which uses GCC's first-class label extension) without a deeper investigation of the generated assembler code.&lt;br /&gt;&lt;br /&gt;So I implemented a new method called replicated-switch threading, which has some advantages:&lt;br /&gt;&lt;br /&gt;- this method conforms to ANSI C.&lt;br /&gt;- the performance seems to be equivalent to the older extended vm.&lt;br /&gt;- the code generator (gcc) produces better code without the need for expensive optimization flags.&lt;br /&gt;- the vm source code now looks simpler.&lt;br /&gt;&lt;br /&gt;The next step: porting my little native-code compiler to this vm.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/4428019288682940602/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=4428019288682940602' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/4428019288682940602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/4428019288682940602'/><link rel='alternate' type='text/html' 
href='http://pnvn.blogspot.com/2009/07/new-vm-version.html' title='New vm version'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-6926647667446805956</id><published>2009-02-04T11:11:00.000-08:00</published><updated>2009-02-04T11:35:50.326-08:00</updated><title type='text'>Status</title><content type='html'>Baíran:&lt;br /&gt;&lt;br /&gt;TTC and DTC part: works, &lt;br /&gt;meta-code compiler: third rewrite (the last one, for sure)&lt;br /&gt;&lt;br /&gt;Summary: I think the vm will be finished by the end of February.&lt;br /&gt;&lt;br /&gt;Then I will revive my Git account and port Retro to it. As I plan to use the current Retro Forth for the assembler, I'm thinking about the pros and cons of self-compilation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Baíran:&lt;br /&gt;&lt;br /&gt;TTC and DTC interpreter: done.&lt;br /&gt;Metacode compiler: final, third version in progress.&lt;br /&gt;&lt;br /&gt;Summary: the vm will be finished at the end of February.&lt;br /&gt;&lt;br /&gt;I will then revive my Git account and start porting Retro. 
In doing so, I plan to use the current version for the assembler, which is why I am currently pondering the pros and cons of eventually having the whole thing compile itself.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/6926647667446805956/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=6926647667446805956' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6926647667446805956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6926647667446805956'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/02/status.html' title='Status'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-8071249100674484516</id><published>2009-01-07T10:13:00.000-08:00</published><updated>2009-01-07T11:44:34.380-08:00</updated><title type='text'>a new year, a new approach</title><content type='html'>First, I wish everyone who reads this a happy new year!&lt;br /&gt;&lt;br /&gt;I have lately experimented with concepts of functional programming, different vm dispatch routines, interpreter design and optimization. 
My conclusions from all these experiments are:&lt;br /&gt;&lt;br /&gt;1.)&lt;br /&gt;Indirect and direct threading interpreters written in C compile to very efficient code if one knows the right compiler flags (for GCC).&lt;br /&gt;&lt;br /&gt;2.) &lt;br /&gt;So a simple interpreter in C, with static or dynamic superinstructions and caching of vm registers, can have only a small performance disadvantage compared to a simple (JIT) native-code compiler, too small to be worth giving up platform independence for. Interestingly enough, it seems I'm not the only one who has come to similar conclusions. The author of the LuaJIT project writes the following in his roadmap for 2008:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Preliminary benchmarks show that *the interpreter alone* is&lt;br /&gt;already 2x-4x faster in most cases. Embarrassingly this is better&lt;br /&gt;than LuaJIT 1.x in a few benchmarks. Much higher speedups are&lt;br /&gt;still to be expected from the trace compiler.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Much as I tried with Vid, he has rewritten the Lua vm from a tree-based interpreter to an indirect-threading interpreter in assembler. To achieve the high run-time performance:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;the bytecode format has been redesigned. It's much more orthogonal, partially specialized and tuned for faster dispatch.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Ah, déjà vu!&lt;br /&gt;&lt;br /&gt;As I see it, a trace compiler can also be realized with threaded code (this would have the advantage of guaranteed platform independence). In fact, the stream compiler of the extended Ngaro vm isn't a very different approach, just a bit simpler! 
&lt;br /&gt;&lt;br /&gt;3.)&lt;br /&gt;For high run-time performance, a simple opcode encoding can be much more important than the threading variant (so register-based vms are at a disadvantage here).&lt;br /&gt;&lt;br /&gt;4.)&lt;br /&gt;A complex but efficient instruction set is better suited for interpretation because it reduces the dispatch time per unit of functionality.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What follows from this?&lt;br /&gt;&lt;br /&gt;I am currently writing a new vm for RetroForth which implements a new instruction set and a new opcode encoding. The vm is highly orthogonal, scalable and supports fine-grained, parallel execution of vm-code sequences. I expect far better performance than with the old vm.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/8071249100674484516/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=8071249100674484516' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/8071249100674484516'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/8071249100674484516'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2009/01/new-year-new-approach.html' title='a new year, a new approach'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-905739878658631950</id><published>2008-11-20T23:45:00.000-08:00</published><updated>2008-11-21T03:45:20.255-08:00</updated><title type='text'>compiler details (ngaro)</title><content type='html'>After some discussion, I had a nice idea to solve one little problem: translating branch offsets from ITC to native code and back. My solution is simply to add a length field to the opcode table, so that for each instruction the start address of the corresponding machine code can be calculated. Since peephole optimization can be realized with some kind of opcode fusion (a vm-code sequence is replaced with a single instruction of identical functionality), I don't see any unresolvable problems.&lt;br /&gt;&lt;br /&gt;Another challenge is calling ITC from native code (that's not so relevant for Forth but can be an issue for functional programming languages). The problem here is the NEXT routine of the vm (there is no definable return point to native code). Adding a new instruction offers some benefits, so I decided to inline ITC sequences. 
A new vm instruction (SINST_ITC) exists for this purpose.</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/905739878658631950/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=905739878658631950' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/905739878658631950'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/905739878658631950'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/11/compiler-details-ngaro.html' title='compiler details (ngaro)'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-7085298994509904056</id><published>2008-11-13T12:23:00.000-08:00</published><updated>2008-11-13T13:51:27.566-08:00</updated><title type='text'>my current work</title><content type='html'>&lt;span&gt;&lt;b&gt;Vid Forth&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Vid Forth has evolved into a hybrid threaded/native-code-generating Forth written in assembler (yasm) for x86_64 class CPUs. The compiler decomposes words into function primitives which consume one or two stack items. The resulting list is then compiled to native code. 
As the stack addressing can be mapped directly to register operations this way, only peephole optimization is needed for good code quality.&lt;br /&gt;&lt;br /&gt;But this method also has a drawback: since each word definition usually results in a large list of functional sequences, the code size for compiled words can get large compared to ITC or DTC (all dependencies are resolved through inlining). To compensate for this, word definitions can also be compiled to ITC (and ITC dependencies are resolved by calling the inner interpreter).&lt;br /&gt;&lt;br /&gt;Why ITC and not STC or DTC? My benchmark test shows no significant performance advantages for x86_64 class CPUs, and call sequences add the opcode length to the address field independent of the address-field size, so the resulting code would be a bit larger.&lt;br /&gt;&lt;br /&gt;By the way, this means I need a fast interpreter, because performance can be an important issue for ITC words too!&lt;br /&gt;&lt;br /&gt;At the moment I'm implementing the vm and compiler primitives and the new features needed for functional processing, for example:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;quotations:&lt;/span&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;2 &lt;br /&gt;25 [dup * sin] ither&lt;br /&gt;&lt;br /&gt;1 2 3 4 &lt;br /&gt;3 [pi + 1.2436 *] map&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;span&gt;locals:&lt;/span&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;: multi {a b} a b * ;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Strings, quotations and base prefixes can be used without a space as separator!&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;extended ngaro vm&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm working on a small native-code compiler for opcodes generated at runtime, and writing documentation.</content><link rel='replies' 
type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/7085298994509904056/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=7085298994509904056' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/7085298994509904056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/7085298994509904056'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/11/my-current-work.html' title='my current work'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-520333360238178447</id><published>2008-11-11T05:59:00.000-08:00</published><updated>2008-11-11T11:03:51.689-08:00</updated><title type='text'>benchmark results</title><content type='html'>OK, I've finished a little benchmark test to compare different threading schemes, and here are the results (based upon Anton Ertl's threading benchmark V2):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;CPU:&lt;/span&gt;&lt;br /&gt;AMD Athlon64 3000+&lt;br /&gt;512 MB RAM&lt;br /&gt;FreeBSD 7&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Compiler:&lt;/span&gt;&lt;br /&gt;gcc   (GCC) 4.2.1 20070719  [FreeBSD]&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Compiler flags:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;subroutine.c :  &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;direct.c :      &lt;br /&gt;-O3 -fomit-frame-pointer&lt;br /&gt;indirect.c :    &lt;br /&gt;-O3 -fomit-frame-pointer&lt;br 
/&gt;switch.c :      &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;repl-switch.c : &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;1: &lt;br /&gt;The -O3 flag unrolls the loop in subroutine.c excessively and produces impressively bad code for switch constructs, so in these cases the -O flag gives better results for comparison.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Results:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;Vid:            &lt;br /&gt;0.548u 0.000s 0:00.57&lt;br /&gt;subroutine :    &lt;br /&gt;0.251u 0.000s 0:00.25&lt;br /&gt;direct:         &lt;br /&gt;0.429u 0.000s 0:00.46&lt;br /&gt;indirect:       &lt;br /&gt;0.517u 0.007s 0:00.56&lt;br /&gt;switch:         &lt;br /&gt;1.277u 0.000s 0:01.29&lt;br /&gt;repl-switch:    &lt;br /&gt;0.604u 0.000s 0:00.64&lt;br /&gt;call:           &lt;br /&gt;1.157u 0.000s 0:01.19&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Results for subroutine.c, switch.c and repl-switch.c with the -O3 flag:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;subroutine:     &lt;br /&gt;0.010u 0.000s 0:00.01&lt;br /&gt;switch:         &lt;br /&gt;1.247u 0.000s 0:01.28&lt;br /&gt;repl-switch:    &lt;br /&gt;0.858u 0.000s 0:00.89&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The replicated-switch test shows performance similar to indirect or direct threaded dispatch:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;0:00.64 (repl-switch) versus &lt;br /&gt;0:00.56 (ITC) versus &lt;br /&gt;0:00.46 (DTC).&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The time difference is so small that one can question the advantage of implementing threading through GNU's label extension. 
This result surprised me, but keep in mind that the results can of course be quite different on other CPUs!&lt;br /&gt;&lt;br /&gt;The current dispatch scheme of Vid Forth isn't optimized yet, but it is already on par with highly optimized C code (0:00.57 versus 0:00.56 for indirect threading), and that's a good thing. If I insert 4 NOPs into the NEXT sequence, the execution time goes down to 0:00.46! (This is an AMD-specific optimization.)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Ich habe einen kleinen Benchmark Test durchgeführt um die einzelnen Threading Schemen besser einschätzen zu können. Hier sind die Ergebnisse:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Prozessor:&lt;/span&gt;&lt;br /&gt;AMD Athlon64 3000+&lt;br /&gt;512 MB Ram&lt;br /&gt;FreeBSD 7&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Compiler:&lt;/span&gt;&lt;br /&gt;gcc   (GCC) 4.2.1 20070719  [FreeBSD]&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Compiler Flaggen&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;subroutine.c :  &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;direct.c :      &lt;br /&gt;-O3 -fomit-frame-pointer&lt;br /&gt;indirect.c :    &lt;br /&gt;-O3 -fomit-frame-pointer&lt;br /&gt;switch.c :      &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;repl-switch.c : &lt;br /&gt;-O  -fomit-frame-pointer (°1)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;1: &lt;br /&gt;Der -O3 Parameter führt zum exessiven "Loop-Unrolling" in subroutine.c und erzeugt beeindruckend schlechten Code für switch Konstrukte so das stattdessen in diesen Fällen mit "-O" kompiliert wurde um vergleichbare Ergebnisse zu erzielen.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Ergebnis:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;Vid:            &lt;br /&gt;0.548u 0.000s 0:00.57&lt;br /&gt;subroutine :    &lt;br /&gt;0.251u 0.000s 0:00.25&lt;br /&gt;direct:         &lt;br /&gt;0.429u 0.000s 0:00.46&lt;br /&gt;indirect:       &lt;br /&gt;0.517u 0.007s 
0:00.56&lt;br /&gt;switch:         &lt;br /&gt;1.277u 0.000s 0:01.29&lt;br /&gt;repl-switch:    &lt;br /&gt;0.604u 0.000s 0:00.64&lt;br /&gt;call:           &lt;br /&gt;1.157u 0.000s 0:01.19&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Ergebnisse für subroutine.c, switch.c und repl-switch.c mit -O3 Parameter:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;subroutine:     &lt;br /&gt;0.010u 0.000s 0:00.01&lt;br /&gt;switch:         &lt;br /&gt;1.247u 0.000s 0:01.28&lt;br /&gt;repl-switch:    &lt;br /&gt;0.858u 0.000s 0:00.89&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Der repl-switch Test zeigt eine similare Performanz zu ITC und DTC:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;0:00.64 (repl-switch) versus &lt;br /&gt;0:00.56 (ITC) versus &lt;br /&gt;0:00.46 (DTC).&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Die Zeitdifferenz ist so gering, daß man bezweifeln könnte ob es wirklich Sinn machen würde, Threading über die proprietäre Labelerweiterung des GCC Compilers zu realisieren. Wie auch immer, dieses Ergebnis hat mich überrascht, man sollte allerdings nicht vergessen, daß es für andere Prozessoren natürlich nicht automatisch verallgemeinert werden kann.&lt;br /&gt;&lt;br /&gt;Das derzeitige ITC Schema von VidForth ist noch nicht optimiert, liefert allerdings gleichrangige Ergebnisse mit hochoptimierten C Code (soweit schonmal nicht schlecht). Erweitere ich die NEXT Routine um 4 NOPs (allerdings eine AMD spezifische Optimierung) sinkt die Ausführungzeit auf 0:00.46 s ! 
Folglich ist eine weitere Verbesserung noch möglich.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-520333360238178447?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/520333360238178447/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=520333360238178447' title='3 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/520333360238178447'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/520333360238178447'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/11/benchmark-results.html' title='benchmark results'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-1541208085121799045</id><published>2008-11-05T05:43:00.000-08:00</published><updated>2008-11-05T09:02:09.783-08:00</updated><title type='text'>benchmarking threading schemes</title><content type='html'>I spent a night counting clock cycles and latencies of various threading schemes (and, of course, studying a lot of assembly-optimization guides), and wrapped up the effort with a benchmark test today. A few findings are of general interest:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;String opcodes for dispatch&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The traditional way to implement threading on IA-32 processors is to dispatch the next word via LODSx. 
I have found that this instruction is always a serialized, microcoded instruction on all newer CPUs from Intel and AMD (and on the VIA C7 as well). This results in a huge latency. In combination with an indirect jump, the NEXT routine can have a dependency chain of roughly 5 to 7 clock cycles or more! The shorter instruction encoding, which was an advantage on older processors like the i386, no longer matters here (Pentium 4 and AMD Athlon class CPUs have a much larger level 1 cache), so I think it's better to avoid string instructions for threaded dispatch.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;CALL versus JMP&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;An indirect call instead of an indirect jump seems to have the advantage of a lower branch misprediction rate (on the AMD Athlon64), but I need to do more testing, because a sequence of variable-length NOPs after a JMP also has an effect and I'm puzzled as to why (the AMD guides give an example of this but don't explain the reason for this behavior).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-1541208085121799045?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/1541208085121799045/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=1541208085121799045' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1541208085121799045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1541208085121799045'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/11/benchmarking-threading-schemes.html' title='benchmarking threading 
schemes'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-6796313003284341249</id><published>2008-11-05T03:48:00.000-08:00</published><updated>2008-11-05T09:05:22.257-08:00</updated><title type='text'>Threaded and native code</title><content type='html'>As the code grows, a purely token-threaded VM seems redundant to me, so I switched to a kind of indirect threading in which token-threaded code can still be supported as an add-on (I named the class word DOCOL.TTC). Compiled native-code primitives are simply implemented in the classic way, with a pointer to the code. There are now three compiler-class words (one each for TTC, ITC and native code generation) and two words for colon definitions (DOCOL.ITC and DOCOL.TTC). A bit unorthodox, but I'm quite happy with it!&lt;br /&gt;&lt;br /&gt;Eine TTC basierte VM erscheint mir mitlerweile überflüssig, da entsprechender Code innerhalb einer ITC VM bei Bedarf ebenfalls ausführbar wäre (des entsprechende Interpreterwort habe ich DOCOL.TTC genannt). 
Da des weiteren, auf diesem Wege, die Ausführung von nativem Code in klassischer Weise implementierbar wird, was vieles vereinfacht, ist das Ergebnis ein ITC Forth mit 5 zusätzlichen Klassen, jeweils eine für die Generierung von TTC, ITC sowie nativen Code sowie zwei für Colondefinitionen (DOCOL.ITC sowie DOCOL.TTC).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-6796313003284341249?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/6796313003284341249/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=6796313003284341249' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6796313003284341249'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/6796313003284341249'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/11/as-code-grows-token-threaded-code-seems.html' title='Threaded and native code'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-3104422507090857292</id><published>2008-10-26T00:34:00.000-07:00</published><updated>2008-10-26T01:03:03.685-07:00</updated><title type='text'>A native code compiling forth system called Vid.</title><content type='html'>&lt;p&gt;To distinguish the native code compiler from Retroforth, I have named it Vid Forth. I am now working on the compiler. 
The finished system will be compatible with Retroforth 10 and will include a reasonably complete ANS module.&lt;/p&gt;&lt;p&gt;Um mein System vom Retroforth Interpreter zu unterscheiden habe ich es Vid Forth genannt; Arbeite momentan am Compiler. Das fertige System wird zu Retroforth 10 kompatibel sein sowie einen einigermaßen vollständiges ANS Modul umfassen.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-3104422507090857292?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/3104422507090857292/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=3104422507090857292' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3104422507090857292'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3104422507090857292'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/10/native-code-compiling-forth-system.html' title='A native code compiling forth system called Vid.'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-2255209072952925850</id><published>2008-10-22T02:30:00.000-07:00</published><updated>2008-10-26T00:34:42.824-07:00</updated><title type='text'>compiler strategy</title><content type='html'>&lt;p&gt;The SMP detection is done. 
I'm now looking for a way to initialize long mode without having to deal with its obscure paging mechanism (who needs 3- or 4-level page tables?! For my purposes I don't need paging at all, and if one studies modern operating systems like MS Singularity, SharpOS and others, paging in general seems rather redundant to me).&lt;/p&gt;&lt;p&gt;After some discussion on comp.lang.forth, I had a nice idea for my compiler strategy:&lt;/p&gt;&lt;p&gt;If the source of a Forth word is simply interpreted as a functional quotation, it can be decomposed into a reduced form in which each function addresses only one or two stack items. In that case the compiler can map all parameters to fixed registers and may avoid stack accesses altogether (this is possible because the top two stack items should be cached in registers anyway).&lt;/p&gt;&lt;p&gt;References to other words become, in principle, inlined bytecode fragments (so there are no problems with optimizing inlined code blocks), and code optimization is done by a very simple peephole optimizer (no sophisticated compiler optimizations are needed).&lt;/p&gt;&lt;p&gt;I have created a prototype compiler and the generated code is very good - I think Forthers should think functionally!&lt;/p&gt;&lt;p&gt;Die SMP Detektion ist abgeschlossen. Ich versuche nun derart den "long mode" so zu initialisieren, daß ich um diesen obskuren Paging Mechanismus herumkomme (welches Betriebsystem benötigt eigentlich eine bis zu 4 fach assoziative Seitenbeschreibungstabelle ?! 
Meine VM benötigt eigentlich überhaupt keine und wenn man sich einmal moderne Betriebsysteme wie MS Singularity, Sharp OS und andere ansieht wird schnell deutlich, daß jeder Paging Mechanismus prinzipiell überflüssig ist).&lt;/p&gt;&lt;p&gt;Nach einer netten Diskussion unter comp.lang.forth kam mir eine gute Idee für meine Kompilerstrategie:&lt;/p&gt;&lt;p&gt;Wenn man den Code eines Wortes funktional auffasst, kann man jenen prinzipiell in eine reduzierte Form überführen deren Funktionen maximal zwei Parameter beanspruchen. Falls man nun diese, maximal zwei Parameter fest einzelnen Registern zuordnet und die ersten beiden Stapelelemente ebenfalls in Registern hält ist es möglich Stapelbezüge komplett in Registeradressierungen zu überführen.&lt;/p&gt;&lt;p&gt;Der so generierte Machinencode ist durch einen einfachen "peep-hole" Algorithmus optimierbar.&lt;/p&gt;&lt;p&gt;Ich habe nach dieser Strategie einen Kompilerprototypen erstellt und der so generierte Code ist mehr als zufriedenstellend - ich denke, Forth Programmierer sollten funktional denken !&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-2255209072952925850?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/2255209072952925850/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=2255209072952925850' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2255209072952925850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2255209072952925850'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/10/compiler-strategy.html' title='compiler 
strategy'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-3928201172511003116</id><published>2008-10-11T02:01:00.000-07:00</published><updated>2008-10-11T02:31:57.571-07:00</updated><title type='text'>I like to code in assembler for sure</title><content type='html'>&lt;p&gt;The new VM will run without an operating system. The bootstrap code is finished and I'm now working on SMP detection and initialization, after which a small, Retroforth-compatible Forth kernel will become the basis for the VM compiler and other nice things. This has one big advantage: the engine can be implemented in Forth! I think it would be no problem to port the assembler source code to other architectures, such as PowerPC-based systems, in a day or two: the kernel is small, so it doesn't really need to be written in C (and you don't go crazy over the compiler- and "syntax"-related pitfalls of this great "language").&lt;/p&gt;&lt;p&gt;Die neue VM wird kein Betriebsystem mehr benötigen.  Der Bootlader ist fertig und ich habe begonnen am Code für die SMP Erkennung und Initialisierung zu arbeiten. Dieser wird dann die Basis für einen kleinen, Retroforth kompatiblen, Forthkernel bilden welcher es wiederum zulässt den VM Kompiler komplett in Forth zu implementieren. 
Ich denke die Assembler Sourcen sind recht einfach auf andere Prozessorarchitekturen zu übertragen, der Kernelcode ist simpel und vor allem nicht derart umfangreich, das eine Implementierung in C sinnvoll wäre (endlich kein Herumärgern mehr mit syntaktischen und kompilerabhängigen Feinheiten dieser tollen "Programmiersprache").&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-3928201172511003116?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/3928201172511003116/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=3928201172511003116' title='5 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3928201172511003116'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/3928201172511003116'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/10/i-like-to-code-in-assembler-for-sure.html' title='I like to code in assembler for sure'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-2810963699078457762</id><published>2008-09-11T23:36:00.000-07:00</published><updated>2008-10-16T01:24:25.007-07:00</updated><title type='text'>A new Ngaro VM - die nächste Version</title><content type='html'>&lt;p&gt;The extended Ngaro VM offers new possibilities, such as generating new opcodes at runtime, but is inconsistent in its 
architecture. That is why I am working on a completely new instruction-level design that offers many more possibilities for code optimization. The new VM implements a kind of CPU-independent microcode for building instruction set architectures (lots of fun with IA-32, TMS9900 and 32-bit extended 6502-like ISAs on the same VM). In addition, generated opcodes can now be streamed conditionally, that is, their runtime behavior can be made dependent on individual flags, which serializes loops, recursion and other flow-dependent code. This makes the new VM scalable, and opcodes can in general be executed in parallel (OK, a compiler that can serialize code is needed).&lt;/p&gt;&lt;p&gt;Die bisherige VM bietet zwar einige neue Möglichkeiten wie etwa die Generierung neuer Bytecodes zur Laufzeit, ist jedoch hierbei nicht konsequent und das ist ein grundlegender Nachteil. Daher arbeite ich zur Zeit an einer komplett neuen Instruktionsarchitektur welche Eigenschaften aufweist, die bisher meines Wissens in keiner etablierten VM Eingang gefunden haben. So implementiert die neue VM einen prozessorunabhängigen, jedoch sehr hardwarenahen, Mikrocode zur Implementierung beliebiger Instruktionsarchitekturen zur Laufzeit. Es ist möglich derart generierte Opcodes zu streamen, d.h. ihre Laufzeitverhalten von einzelnen Flaggen abhängig zu machen wodurch z.B. Rekursionen sowie Schleifen als serielle Codeabschnitte implementierbar sind. Hierdurch wird es wiederum möglich die Ausführung ohne zusätzlichen Aufwand zu parallelisieren da Datenabhängigkeiten aufgelöst bzw. auf die VM ausgelagert werden und seitens eines Compilers oder Interpreters nicht mehr berücksichtigt werden brauchen. 
Die VM Architektur ist also in hohen Maße skalierbar (man beachte nur die Möglichkeiten, welcher sich im Hinblick auf moderne Mehrkernprozessoren hieraus ergeben).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-2810963699078457762?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/2810963699078457762/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=2810963699078457762' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2810963699078457762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2810963699078457762'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/09/new-ngaro-vm-die-nchste-version.html' title='A new Ngaro VM - die nächste Version'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-5215665850681764622</id><published>2008-08-19T09:00:00.000-07:00</published><updated>2008-08-19T09:18:41.401-07:00</updated><title type='text'>Status update</title><content type='html'>&lt;p&gt;The VM runs stably, and I am about to run one last test to rule out a potential source of errors. 
In preparation for the benchmark I have also begun writing an assembler, because Toka doesn't run on my current 64-bit Linux system and I urgently need a Retro image that makes use of the capabilities of the extended Ngaro version.&lt;/p&gt;&lt;p&gt;Die VM läuft soweit, ich führe gerade noch einen kleinen Test durch um eine potentielle Fehlerquelle abzufangen. In Vorbereitung auf den Benchmark habe ich desweiteren begonnen einen Assembler zu schreiben da Toka unter meinem derzeitigen 64 Bit Linuxsystem nicht lauffähig ist und ich dringend ein Retroabbild brauche welches die Möglichkeiten der erweiterten VM auch nutzt.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-5215665850681764622?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/5215665850681764622/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=5215665850681764622' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/5215665850681764622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/5215665850681764622'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/08/status-update.html' title='Status update'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-1671951720163476575</id><published>2008-07-07T01:26:00.000-07:00</published><updated>2008-07-07T02:05:49.050-07:00</updated><title type='text'>just more optimations - noch mehr Optimierungen</title><content type='html'>&lt;p&gt;It is now possible to nest SINST bytecodes, that is, to define further instructions via SINST inside a new instruction. This makes it possible, for example, to implement lambda functions à la LISP. I have also begun to optimize the VM primitives and to write documentation. Everything should be finished soon. Charles Childers is currently extending the RETRO compiler, so it is slowly becoming time for a comprehensive benchmark. I also want to include various C-based interpreters (Pforth, Gforth, Ficl, Lua, Parrot and so on) to see where the new VM ranks among the results. &lt;/p&gt;&lt;p&gt;Es ist nun möglich innerhalb einer neuen Instruktion per SINST beliebig weitere zu definieren. Hierdurch lassen sich z.B. Lambda Funktionen á la LISP realisieren. Nebenbei habe ich damit begonnen den Instructionscode zu optimieren sowie eine Dokumentation zur neuen VM zu schreiben. Sollte alles bald fertig sein. Charles Childers  erweitert gerade den RETRO Compiler, folglich wird es langsam Zeit einen umfassenden Benchmarktest durchzuführen. Ich will hierbei auch andere C basierte Interpreter miteinbeziehen (Pforth, Gforth, Ficl, Lua, Parrot etc.) 
um zu sehen, wie sich die neue VM  innerhalb der Ergebnisse einordnen läßt.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-1671951720163476575?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/1671951720163476575/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=1671951720163476575' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1671951720163476575'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1671951720163476575'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/07/just-more-optimations-noch-mehr.html' title='just more optimations - noch mehr Optimierungen'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-4163108910184123444</id><published>2008-06-29T07:08:00.000-07:00</published><updated>2008-06-29T07:26:23.006-07:00</updated><title type='text'>Performance</title><content type='html'>The new VM is now basically up and running, and I can start designing a new compiler for Retro that actually makes use of dynamic code generation, direct register addressing and so on. The benchmarks so far show much higher performance than the old version, although, lacking a dynamic compiler, I had to generate the corresponding bytecode by hand. 
An additional comparison with GForth and Lina also produced some interesting results that I cannot yet interpret with confidence, apart from the observation that GForth's DTC interpreter performs at essentially the same level as Lina's unoptimized ITC code (the figures for Ngaro are 30-50% above that)?! I have a certain suspicion regarding how TLB caches work ....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-4163108910184123444?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/4163108910184123444/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=4163108910184123444' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/4163108910184123444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/4163108910184123444'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/06/performance.html' title='Performance'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-5392341240468760114</id><published>2008-06-26T02:53:00.000-07:00</published><updated>2008-06-26T03:30:09.765-07:00</updated><title type='text'>Auf ein neues</title><content type='html'>&lt;p&gt;Zur Zeit teste ich die neue VM aus. 
Aus meinen ursprünglichen Plan ging ein vollkommen neuer Ansatz hervor. Anstatt den VM Code in etwas performanteres umzuwandeln  erweiterte ich einfach den Instruktionssatz um eine Instruktion welche eine beliebig lange Sequenz von seriellen Bytecodes in einen DTC Strom compiliert. Dies hat den Vorteil die bessere TLB Effizienz des TTC Interpreters bei Programmsprüngen mit der höheren Geschwindigkeit einer DTC Interpretation für serielle Codeabschnitte zu verbinden ohne prozessorspezifische Anpassungen vornehmen zu müssen. Des weiteren kann für weitere Optimierungen die entsprechende Routine einfach ersetzt werden ohne das der Interpreter weiter angepasst werden müsste. Es wäre z.B. möglich anstatt DTC die Bytecodesequenzen direkt in Maschinencode zu überführen.&lt;/p&gt;&lt;p&gt;Nachteilig ist allerdings, das der Retro Sourcecode angepasst werden muss um die neue Möglichkeit auch zu nutzen. Der Forth Interpreter kann theoretisch die Compilierung von Colon Definitionen ebenso wie ihre Ausführung zum Großteil an die VM delegieren. Entsprechende Wörter würden dann schlicht den Instruktionssatz der VM erweitern und die separate Interpretation (derzeit in Form von STC) könnte komplett entfallen...&lt;/p&gt;&lt;p&gt;At present I am testing my new VM version. My original plan has given way to a completely new approach. Instead of compiling the VM code into something faster, I simply extended the instruction set with an instruction that compiles an arbitrarily long sequence of serial bytecodes into a DTC stream. This has the advantage of combining the better TLB efficiency of the TTC interpreter for jumps with the higher speed of DTC interpretation for serial code sections, without any processor-specific adjustments. Moreover, for further optimizations the corresponding routine can simply be replaced without having to adapt the interpreter. E.g. 
it would be possible for example to compile the byte code sequences directly in machine code without any side effect for the interpreter.&lt;/p&gt;&lt;p&gt;However,the retro source code need to be rewritten to use the new feature and I can't oversee the problems for Java and Javascript versions of ngaro to implementating a DTC compiler (is there any way? I doubt no).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-5392341240468760114?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/5392341240468760114/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=5392341240468760114' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/5392341240468760114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/5392341240468760114'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/06/auf-ein-neues.html' title='Auf ein neues'/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-2609106944323384338</id><published>2008-05-28T01:08:00.000-07:00</published><updated>2008-05-28T01:42:42.856-07:00</updated><title type='text'></title><content type='html'>Die Strategie für den neuen Interpreter sieht folgendermaßen aus:&lt;br /&gt;&lt;br /&gt;Dynamische Superinstruktionen:&lt;br /&gt;Zunächst erweitere ich die NEXT Routine um die 
Generierung von Maschinencodesequenzen welche aus den vorhandenen Intruktionscodes bezogen werden. Dies erfolgt solange bis eine Sprunginstruktion erreicht wird. Anschließend wird noch der Maschinencode zur Ausführung eben des Programmsprungs zugefügt und die so generierte Superinstruktion ausgeführt.  Als Folge gibt es nun zwei Klassen von Instruktionen: Stapelinstruktionen die den Programmzähler nicht verändern als Maschinencodevorlagen sowie Sprunginstruktionen die wie bisher interpretiert werden.&lt;br /&gt;&lt;br /&gt;Stapellose Instruktionen:&lt;br /&gt;Durch das cachen der beiden obersten Stapelelemente können alle bit, logischen sowie arithmetischen Operationen mit Varianten erweitert werden welche keine Stapeladressierung durchführen. Stapeladressierungen sind so minimierbar.&lt;br /&gt;&lt;br /&gt;Statische Superinstruktionen:&lt;br /&gt;Der Instruktionssatz wird erweitert indem jeweils  zwei Instruktionen zu einer neuen zusammengeführt werden.&lt;br /&gt;&lt;br /&gt;Die statischen wie nicht stapeladressierten Instruktionen müssen im Ngaro Image auch verwendet werden. Daher werde ich im letzten Schritt ein kleines Programm schreiben welches den Code in das neue Bytecodeformat transformiert. 
While doing so, the code can also be checked for possible sources of error.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-2609106944323384338?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/2609106944323384338/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=2609106944323384338' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2609106944323384338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/2609106944323384338'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/05/die-strategie-fr-den-neuen-interpreter.html' title=''/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-252748366838692884.post-1743662283975171787</id><published>2008-05-26T08:59:00.000-07:00</published><updated>2008-05-26T09:13:09.911-07:00</updated><title type='text'></title><content type='html'>Now that Charles Childers' benchmark suite is finished, I can start extending the threading VM with the dynamic generation of superinstructions, so that the stack addressing required so far can be minimized. As a first step, the VM now caches the top two stack elements in processor registers. 
A side effect of this is the possibility of minimizing stack accesses on the fly. With the new test, the current state of the VM gives the following performance figures:&lt;br /&gt;&lt;br /&gt;CPU: AMD Athlon 64 3500&lt;br /&gt;RAM: 512 MByte&lt;br /&gt;GCC: 4.1.2&lt;br /&gt;&lt;br /&gt;Switch-based Ngaro:&lt;br /&gt;&lt;br /&gt;Recursive FIB (39)... 134.540261 seconds&lt;br /&gt;Countdown Loop (1,000,000)... 8.096384 seconds&lt;br /&gt;Nest/Unnest (256 million pairs)... 7.197207 seconds&lt;br /&gt;&lt;br /&gt;Threading version:&lt;br /&gt;&lt;br /&gt;Recursive FIB (39)... 76.825494 seconds&lt;br /&gt;Countdown Loop (1,000,000)... 4.962105 seconds&lt;br /&gt;Nest/Unnest (256 million pairs)... 4.793791 seconds&lt;br /&gt;&lt;br /&gt;I hope to reduce these times a lot more soon :D&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/252748366838692884-1743662283975171787?l=pnvn.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://pnvn.blogspot.com/feeds/1743662283975171787/comments/default' title='Kommentare zum Post'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=252748366838692884&amp;postID=1743662283975171787' title='0 Kommentare'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1743662283975171787'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/252748366838692884/posts/default/1743662283975171787'/><link rel='alternate' type='text/html' href='http://pnvn.blogspot.com/2008/05/nachdem-nun-die-benchmarksuite-von.html' 
title=''/><author><name>Mat</name><uri>http://www.blogger.com/profile/11342379309761295064</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
