TCL Count Words
I came across this article the other day on Hacker News: https://benhoyt.com/writings/count-words/#performance-results-and-learnings
The article describes a simple word-counting problem that was implemented in many languages as a basic performance comparison.
I took a look at the TCL solution and found a simple optimization that gives roughly a 2x performance improvement. TCL is much faster when the code is placed in a proc and then run; this is described on the TCL Wiki Performance page under "Put Everything in a proc". Much of the gain comes from the fact that a proc's local variables are compiled to indexed slots, whereas code at the top level has to work with global variables. This change would put TCL between Ruby and AWK, instead of between Common Lisp and Haskell.
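To see the "Put Everything in a proc" effect in isolation, here is a small illustrative sketch. It is not code from the fork. It times the same array-counting loop twice, once at global level and once inside a proc. The proc name `countLocal` and the synthetic word list are made up for illustration, and the actual speedup will depend on the Tcl version and the machine.

```tcl
#!/usr/bin/env tclsh
# Illustration only: the same counting loop at global level vs. inside a proc.

# Synthetic workload (hypothetical): 300,000 words from a three-word vocabulary.
set words [lrepeat 100000 the quick fox]

# Global version: 'w' and 'counts' are global variables.
set globalTime [time {
    foreach w $words {
        incr counts($w)
    }
}]

# Proc version: the identical loop, but 'w' and 'counts' are proc-local,
# so the bytecode compiler can assign them to local variable slots.
proc countLocal {words} {
    foreach w $words {
        incr counts($w)
    }
    return [array get counts]
}
set procTime [time {set result [countLocal $words]}]

puts "global: $globalTime"
puts "proc:   $procTime"
```

Run it with `tclsh`. Each `puts` line reports the microseconds per iteration returned by `time`. The proc version should usually come out faster, but the exact ratio will vary.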
I put the result in my fork as simple.tcl. I also wrote a simple optimized.tcl, which uses dicts instead of arrays; dicts turn out to be slightly faster for this problem.
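Here is a minimal sketch of the dict-based approach, assuming Tcl 8.6 because it uses `lsort -stride`. It is not the actual optimized.tcl. The proc name `countWords` is hypothetical, and the proc takes a channel argument so it can be tried without stdin. `dict incr` creates a missing key with value 0 before incrementing it, so no existence check is needed.

```tcl
#!/usr/bin/env tclsh
# Sketch of dict-based word counting (requires Tcl 8.6 for lsort -stride).

# Count words read from channel $chan. Returns a flat word/count list,
# most frequent first.
proc countWords {chan} {
    set counts [dict create]
    while {[gets $chan line] >= 0} {
        foreach word [split [string tolower $line]] {
            # split breaks on every whitespace character, so runs of
            # spaces produce empty elements; skip them.
            if {$word ne ""} {
                # Adds the key with value 0 if missing, then increments it.
                dict incr counts $word
            }
        }
    }
    # Treat the dict as word/count pairs and sort on the count, descending.
    return [lsort -integer -decreasing -stride 2 -index 1 $counts]
}
```

In a script, this would be called from inside a `main` proc as `countWords stdin`. Because the sorted result is still a flat list of unique keys and values, it can be printed with `dict for {word count} $result { puts "$word $count" }`.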
I also tried a few other ideas, such as different buffer sizes, without any improvement. My guess is that TCL's default buffering is already good enough and further tuning would not help.
For fun, I also wrote a version that uses an in-memory SQL database. It was many times slower. This is clearly not a good idea, but I wanted to know whether native data structures or a database would win. It turns out that, at least for a query this simple, the database is a huge loss.
My SQL skills are limited, so perhaps someone could do better. The one thing I do know how to do is wrap the work in a transaction, which gave a big performance boost, but not enough to make this a reasonable solution.
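#!/usr/bin/env tclsh
package require sqlite3

proc main {} {
    fconfigure stdin -buffering line
    # In-memory database with one row per distinct word
    sqlite3 db ":memory:"
    db eval { create table counts(word text primary key, count int default 1) }
    # One transaction around all the inserts avoids a commit per statement
    db eval { begin transaction }
    while {[gets stdin data] >= 0} {
        foreach word [split [string tolower $data]] {
            # split yields empty strings for runs of whitespace; skip them
            if {$word eq ""} continue
            # Insert the word with count 1, or bump the count if it already exists
            db eval {
                INSERT INTO counts(word) VALUES($word)
                ON CONFLICT(word) DO UPDATE SET count=count+1;
            }
        }
    }
    # Print the most frequent words first, as the benchmark expects
    db eval { select word, count from counts order by count desc } {
        puts "$word $count"
    }
    db eval { end transaction }
    db close
}

main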
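The sqlite3 Tcl package also provides a `transaction` method. It runs a script between BEGIN and COMMIT and rolls back if the script raises an error. Below is a hedged sketch of the same upsert using that method. `countWordsSql` is a hypothetical helper that takes the text as a string, rather than reading stdin, so it can be tried in isolation. Like the code above, it assumes an SQLite build new enough to support `ON CONFLICT ... DO UPDATE` (3.24 or later).

```tcl
#!/usr/bin/env tclsh
package require sqlite3

# Hypothetical helper: count the words in $text using database command $db.
# Returns a flat word/count list, most frequent first.
proc countWordsSql {db text} {
    $db eval {create table if not exists counts(word text primary key, count int default 1)}
    # [$db transaction] commits when the script finishes, or rolls back on error
    $db transaction {
        foreach word [split [string tolower $text]] {
            if {$word eq ""} continue
            # $word is bound as an SQL parameter from this proc's local variable
            $db eval {
                INSERT INTO counts(word) VALUES($word)
                ON CONFLICT(word) DO UPDATE SET count=count+1
            }
        }
    }
    return [$db eval {SELECT word, count FROM counts ORDER BY count DESC, word}]
}
```

For example, after `sqlite3 mydb :memory:`, the call `countWordsSql mydb $text` returns the pairs in the order the benchmark prints them. The rollback-on-error behavior is the main difference from issuing `begin transaction` and `end transaction` by hand.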
If I ever revisit this, it would be cool to try out a Forth solution. I'm surprised to see Forth so low on the list, and I can't help but wonder if a better solution could bring it at least near Python if not higher.