Opened 12 months ago
#865 new defect
TCP is much slower than expected
Reported by: | Jiri Svoboda | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | helenos/unspecified | Version: | mainline |
Keywords: | | Cc: | |
Blocker for: | | Depends on: | |
See also: | | | |
Description
TCP transfer rate is *very* low, more than 16 times lower than expected. In a nutshell, I would expect up to ~3 MB/s on my machine, but I get ~200 kB/s. Note that this enigmatic problem has existed since the beginning of the TCP implementation.
The internal TCP transfer rate between two HelenOS tasks is almost exactly half the rate of transferring data over TCP out of HelenOS or into HelenOS from outside. This suggests that it is the HelenOS TCP implementation that incurs the penalty.
I also seem to remember that if we transfer 2x larger TCP PDUs, we get 2x the transfer rate.
I tried disabling retransmission timers to see if they had an effect. This improved the transfer rate slightly, but not significantly.
The fact that the network stack transfers data using IPC writes imposes an upper limit on the transfer rate, but that limit is significantly higher than the rates observed.
See below for detailed measurements of the transfer rate out of HelenOS, into HelenOS, and between two tasks in HelenOS, plus a baseline IPC transfer rate measurement.
Upload to host rate
method:
# mkfile -p -s 16m /data/web/test
# websrv
$ wget http://127.0.0.1/test
[read kB/s from wget]
ns_ping gives 46000 ops/s
4 hops: websrv→tcp, tcp→ip, ip→ethip, ethip→e1k
data is sent in 1024B segments (based on BUFFER_SIZE in websrv)
baseline: 210 kB/s
no FS: 261 kB/s
no FS + no RT: 280-300 kB/s
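One possible reading of the observation that doubling the PDU size doubles the rate is that each segment carries a roughly fixed cost, independent of its size. The sketch below only works through that arithmetic using the 1024 B segment size and the 210 kB/s baseline from this ticket. The fixed-cost model itself is an assumption, as is the convention 1 kB = 1000 bytes, and the function names are made up for illustration.

```python
SEGMENT_BYTES = 1024    # segment size, based on BUFFER_SIZE in websrv
BASELINE_KB_S = 210.0   # measured upload baseline from this ticket


def per_segment_ms(rate_kb_s: float, segment_bytes: int) -> float:
    """Time spent per segment in ms, assuming 1 kB = 1000 bytes."""
    segments_per_s = rate_kb_s * 1000.0 / segment_bytes
    return 1000.0 / segments_per_s


def rate_for_segment(segment_bytes: int, cost_ms: float) -> float:
    """Predicted rate in kB/s if every segment costs a fixed cost_ms.

    Bytes per millisecond is numerically equal to kB/s when 1 kB = 1000 bytes.
    """
    return segment_bytes / cost_ms


cost = per_segment_ms(BASELINE_KB_S, SEGMENT_BYTES)
print(f"implied cost per 1024 B segment: {cost:.2f} ms")
print(f"predicted rate with 2048 B segments: {rate_for_segment(2048, cost):.0f} kB/s")
```

Under this assumed model the rate scales linearly with segment size, which would match the remembered 2x behaviour. It does not say where the per-segment time is spent.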
Download from host rate
method:
$ cd /tmp
$ dd if=/dev/zero of=test bs=1024k count=16
$ cat >lighttpd.conf
server.document-root = "/tmp"
server.port = 3000
$ lighttpd -D -f lighttpd.conf
# cd /tmp
# download -o test http://<host-ip>:3000/test
[measure number of seconds taken and compute transfer rate]
no FS + no RT: 290 kB/s
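The "measure seconds and compute transfer rate" step can be sketched as a small helper. The file size follows from the dd command (16 blocks of 1024 KiB). The elapsed time below is a hypothetical placeholder, not a measurement from this ticket, and treating 1 kB as 1000 bytes is an assumption.

```python
def transfer_rate_kb_s(size_bytes: int, elapsed_s: float) -> float:
    """Average transfer rate in kB/s, assuming 1 kB = 1000 bytes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes / elapsed_s / 1000.0


# Test file created by dd: bs=1024k count=16
FILE_SIZE = 16 * 1024 * 1024

# Hypothetical stopwatch reading, for illustration only.
elapsed = 60.0
print(f"{transfer_rate_kb_s(FILE_SIZE, elapsed):.0f} kB/s")
```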
Internal OS transfer rate
method:
# mkfile -p -s 8m /data/web/test
# websrv
# download -o /tmp/test http://127.0.0.1:8080/test
[measure time taken in seconds, compute transfer rate]
baseline: 145 kB/s
[note: exactly one half of the upload/download rate, to within measuring precision!]
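The factor of two is what one would expect if the HelenOS TCP send path and receive path each cost about the same per byte and, in the internal case, run one after the other rather than in parallel. This is a sketch of that assumption only, not a confirmed explanation. It uses the roughly 290 kB/s one-way figure from the no FS + no RT measurements, and `serial_rate` is an illustrative name.

```python
def serial_rate(*stage_rates: float) -> float:
    """Combined rate of stages that handle each byte one after another.

    Per-byte times add up, so the combined rate is the reciprocal of the
    sum of the reciprocals of the individual rates.
    """
    return 1.0 / sum(1.0 / r for r in stage_rates)


ONE_WAY_KB_S = 290.0  # upload or download rate measured in this ticket

# Internal transfer: HelenOS TCP both sends and receives the data.
print(f"{serial_rate(ONE_WAY_KB_S, ONE_WAY_KB_S):.0f} kB/s")
```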
IPC 1kB block transfer rate
hbench write1k: 13.5 MiB / s
method:
# /srv/test/ipc-test
# hbench write1k
nic→ethip→ip→tcp→app = 4 hops
4 hop transfer rate estimate: 13.5 MiB / s / 4 = 3.375 MiB / s
3.375 / 0.21 ≈ 16.07
So we are at least 16 times slower than the limit implied by the
method used to transfer data via IPC.
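The estimate can be reproduced with a few lines. This is a sketch using only the numbers from this ticket. It assumes each hop costs one full IPC write and that the hops do not overlap, and, like the calculation above, it treats MiB/s and MB/s as interchangeable.

```python
IPC_RATE_MIB_S = 13.5  # hbench write1k
HOPS = 4               # nic -> ethip -> ip -> tcp -> app
MEASURED_MB_S = 0.21   # upload baseline, 210 kB/s


def hop_limited_rate(ipc_rate: float, hops: int) -> float:
    """Upper bound if every hop transfers the data once, sequentially."""
    return ipc_rate / hops


def slowdown(limit: float, measured: float) -> float:
    """How many times slower the measured rate is than the limit."""
    return limit / measured


limit = hop_limited_rate(IPC_RATE_MIB_S, HOPS)
factor = slowdown(limit, MEASURED_MB_S)
print(f"estimated limit: {limit:.3f} MiB/s")
print(f"slowdown factor: {factor:.2f}")
```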