High Performance SSH/SCP

nickcw · 2025-12-16T16:51:38 1765903898

The fact that sftp is not the fastest protocol is well known by rclone users.

The main problem is that it packetizes the data and waits for responses, effectively re-implementing the TCP window inside a TCP stream. You can only have so many packets outstanding in the standard SFTP implementation (64 is the default) and the buffers are quite small (32k by default) which gives a total outstanding data of 2MB. The highest transfer rate you can make depends on the latency of the link. If you have 100 ms of latency then you can send at most 20 MB/s which is about 200 Mbit/s - nowhere near filling a fast wide pipe.

You can tweak the buffer size (up to 256k I think) and the number of outstanding requests, but you hit limits in the popular servers quite quickly.

To mitigate this rclone lets you do multipart concurrent uploads and downloads to sftp which means you can have multiple streams operating at 200 Mbit/s which helps.

The fastest protocols are the TLS/HTTP based ones which stream data. They open up the TCP window properly and the kernel and networking stack is well optimized for this use. Webdav is a good example.

rapier1 · 2025-12-16T20:33:05 1765917185

If you want to see the impact that the flow control buffer size has on OpenSSH I put up a graph based on data collected last week. Basically, it has a huge impact on throughput.

https://gist.github.com/rapier1/325de17bbb85f1ce663ccb866ce2...

adrian_b · 2025-12-16T19:14:12 1765912452

When you are limited to use SSH as the transport, you can still do better than using scp or sftp by using rsync with --rsh="ssh ...".

Besides being faster, with rsync and the right command options you can be certain that it makes exact file copies, together with any file metadata, even between different operating systems and file systems.

I have not checked if in recent years all the bugs of scp and sftp have been fixed, but some years ago there were cases when scp and sftp were losing silently, without warnings, some file metadata (e.g. high-precision timestamps, which were truncated, or extended file attributes).

I am using ssh every day, but there are decades since I have last used scp or sftp, with the exception of the cases when I have to connect to a server that I cannot control and where it happens that rsync is not installed. Even on such servers, if I may add an executable in my home directory, I first copy there an rsync with scp, then I do any other copies with that rsync.

riobard · 2025-12-16T17:22:34 1765905754

"(sftp) packetizes the data and waits for responses, effectively re-implementing the TCP window inside a TCP stream."

why is it designed this way? what problems it's supposed to solve?

nickcw · 2025-12-16T18:02:45 1765908165

Here is some speculation:

SFTP was designed as a remote file system system access protocol rather than transfer a single file like scp.

I suspect that the root of the problem is that SFTP works over a single SSH channel. SSH connections can have multiple channels but usually the server binds a single channel to a single executable so it makes sense to use only a single channel.

Everything flows from that decision - packetisation becomes necessary otherwise you have to wait for all the files to transfer before you can do anything else (eg list a directory) and that is no good for your remote filesystem access.

Perhaps the packets could have been streamed but the way it works is more like an RPC protocol with requests and responses. Each request has a serial number which is copied to the response. This means the client can have many requests in-flight.

There was a proposal for rclone to use scp for the data connections. So we'd use sftp for the day to day file listings, creating directories etc, but do actual file transfers with scp. Scp uses one SSH channel per file so doesn't suffer from the same problems as sftp. I think we abandoned that idea though as many sftp servers aren't configured with scp as well. Also modern versions of OpenSSH (OpenSSH 9.0 released April 2022) use SFTP instead of scp anyway. This was done to fix various vulnerabilities in scp as I understand.

charonn0 · 2025-12-16T18:21:02 1765909262

Notably, the SFTP specification was never completed. We're working off of draft specs, and presumably these issues wouldn't have made it into a final version.

Veserv · 2025-12-16T17:59:47 1765907987

Because that is a poor characterization of the problem.

It just has a in-flight message/queue limit like basically every other communication protocol. You can only buffer so many messages and space for responses until you run out of space. The problem there is just that the default amount of buffering is very low and is not adaptive to the available space/bandwidth.

rapier1 · 2025-12-16T18:04:27 1765908267

Yeah, it's an issue because there is also the per channel application layer flow control. So when you are using SFTP you have the TCP flow control, the SSH layer flow control, and then the SFTP flow control. The maximum receive buffer ends up being the minimum of all three. HPN-SSH (I'm the dev) normalizes the SSH layer flow control to the TCP receive buffer but we haven't done enough work on SFTP except to bump up the buffer size/outstanding requests. I need to determine if this is effective enough or if I need some dynamism in there as well.

adolph · 2025-12-16T18:03:07 1765908187

> The fastest protocols are the TLS/HTTP based ones which stream data.

I think maybe you are referring to QUIC [0]? It'd be interesting to see some userspace clients/servers for QUIC that compete with Aspera's FASP [1] and operate on a point to point basis like scp. Both use UDP to decrease the overhead of TCP.

0. https://en.wikipedia.org/wiki/QUIC

1. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol

Veserv · 2025-12-16T18:54:55 1765911295

Available QUIC implementations are very slow. MsQUIC is one of the fastest and can only reach a meager ~7 Gb/s [1]. Most commercial implementations sit in the 2-4 Gb/s range.

To be fair, that is not really a problem of the protocol, just the implementations. You can comfortably drive 10x that bandwidth with a reasonable design.

[1] https://microsoft.github.io/msquic/

rapier1 · 2025-12-16T18:07:21 1765908441

We've been looking at using QUIC as the transport layer in HPN-SSH. It's more of a pain that you might think because it breaks the SSH authentication paradigm and requires QUIC layer encryption - so a naive implementation would end up encrypting the data twice. I don't want to do that. Mostly what we are thinking about doing is changing the channel multiplexing for bulk data transfers in order to avoid the overhead and buffer issues. If we can rely entirely on TCP for that then we should get even better performance.

adolph · 2025-12-16T18:53:19 1765911199

Yeah, my naive implementation thought experiment was oriented towards a side channel brokered by the ssh connection using nginx and curl. Something like source opens nginx to share a file and tells sink via ssh to curl the file from source with a particular cert.

However, I observed that curl [0] uses openssl' quic implementation (for one of its experimental implementations). Another backend for curl is Quiche [1] which has client and server components already, has the userspace crypto etc. It's a little confusing to me, but CloudFlare also has a project quiche [2] which is a Rust crate with a CLI to share and consume files.

0. https://curl.se/docs/http3.html

1. https://github.com/google/quiche/tree/main/quiche/quic

2. https://github.com/cloudflare/quiche

nickcw · 2025-12-16T19:09:30 1765912170

Actually the fastest ones in my experience are the HTTP/1.x ones. HTTP/2 is generally slower in rclone though I think that is the fault of the stlib not opening more connections. I haven't really tried QUIC

I just think for streaming lots of data quickly HTTP/1.x plus TLS plus TCP has received many more engineering hours of optimization than any other combo.

formerly_proven · 2025-12-16T18:43:49 1765910629

Besides limiting the length and number of outstanding IO requests, SFTP also rides on top of SSH, which also has a limited window size.

josephg · 2025-12-16T11:01:13 1765882873

Any chance this work can be upstreamed into mainline SSH? I'd love to have better performance for SSH, but I'm probably not going to install and remember to use this just for the few times it would be relevant.

Bender · 2025-12-16T13:48:51 1765892931

I doubt this would ever be accepted upstream. That said if one wants speed play around with lftp [1]. It has a mirror subsystem that can replicate much of rsync functionality in a chroot sftp-only destination and can use multiple TCP/SFTP streams in a batch upload and per-file meaning one can saturate just about any upstream. I have used this for transferring massive postgres backups and then because I am paranoid when using applications that automatically multipart transfer files I include a checksum file for the source and then verify the destination files.

The only downside I have found using lftp is that given there is no corresponding daemon for rsync on the destination then directory enumeration can be slow if there are a lot of nested sub-directories. Oh and the syntax is a little odd for me anyway. I always have to look at my existing scripts when setting up new automation.

Demo to play with, download only. Try different values. This will be faster on your servers, especially anything within the data-center.

    ssh mirror@mirror.newsdump.org # do this once to accept key as ssh-keyscan will choke on my big banner

    mkdir -p /dev/shm/test && cd /dev/shm/test

    lftp -u mirror, -e "mirror --parallel=4 --use-pget=8 --no-perms --verbose /pub/big_file_test/ /dev/shm/test;bye" sftp://mirror.newsdump.org

For automation add --loop to repeat job until nothing has changed.

[1] - https://linux.die.net/man/1/lftp

chasil · 2025-12-16T15:56:30 1765900590

The normal answer that I have heard to the performance problems in the conversion from scp to sftp is to use rsync.

The design of sftp is such that it cannot exploit "TCP sliding windows" to maximize bandwidth on high-latency connections. Thus, the migration from scp to sftp has involved a performance loss, which is well-known.

https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers...

The rsync question is not a workable answer, as OpenBSD has reimplemented the rsync protocol in a new codebase:

https://www.openrsync.org/

An attempt to combine the BSD-licensed rsync with OpenSSH would likely see it stripped out of GPL-focused implementations, where the original GPL release has long standing.

It would be more straightforward to design a new SFTP implementation that implements sliding windows.

I understand (but have not measured) that forcibly reverting to the original scp protocol will also raise performance in high-latency conditions. This does introduce an attack surface, should not be the default transfer tool, and demands thoughtful care.

https://lwn.net/Articles/835962/

Bender · 2025-12-16T17:09:36 1765904976

I included LFTP using mirror+sftp in my example as it is the secure way to give less than trusted people access to files and one can work around the lack of sliding windows by spawning as many TCP flows as one wishes with LFTP. I would love to see SFTP evolve to use sliding windows but for now using it in the data-center or over WAN accelerated links is still fast.

Rsync is great when moving files between trusted systems that one has a shell on but the downside is that rsync can not split up files into multiple streams so there is still a limit based on source+dest buffer+rtt and one has to provide people a shell or add some clunky way to prevent a shell by using wrappers unless using native rsync port 873 which is not encrypted. Some people break up jobs on the client side and spawn multiple rsync jobs in the background. It appears that openrsync is still very much work in progress.

SCP is being or has been deprecated but the binaries still exist for now. People will have to hold onto old binaries and should probably static compile them as the linked libraries will likely go away at some point.

chasil · 2025-12-16T17:44:43 1765907083

The scp program switched to calling sftp as the server in OpenSSH version 8.9, and notably Windows is now running 9.5, so large segments of scp users are now invoking sftp behind the scenes.

If you want to use the historic scp server instead, a command line option is provided to allow this:

"In case of incompatibility, the scp(1) client may be instructed to use the legacy scp/rcp using the -O flag."

https://www.openssh.org/releasenotes.html

The old scp behavior hasn't been removed, but you need to specifically request it. It is not the default.

It would seem to me that an alternate invocation for file transfer could be tested against sftp in high latency situations:

  ssh yourhost 'cat somefile' > somefile

That would be slightly faster than tar, which adds some overhead. Using tar on both sides would allow transfers of special files, soft links, and retain hard links, which neither scp nor sftp will do.

  ssh yourhost 'tar cf - yourdir' | tar xpf -

Windows has also recently added a tar command.

rapier1 · 2025-12-16T20:28:58 1765916938

Keep in mind that SCP/SSH might be faster in some cases than SFTP but in both cases it is still limited to a 2MB application layer receive window which is drastically undersized in a lot of situations. It doesn't matter what the TCP window is set to because the OpenSSH window overrides that value. Basically, if your bandwidth delay product is more than 2MB (e.g. 1gbps @ 17ms RTT) you're going to be application limited by OpenSSH. HPN-SSH gets most of the performance benefit by normalizing the application layer receive window to the TCP receive window (up to 128MB). In some cases you'll see 100X throughput improvement on well tuned hosts on a high delay path.

If your BDP is less than 2MB you still might get some benefit if you are CPU limited and use the parallel ciphers. However, the fastest cipher is AES-GCM and we haven't parallelized that as of yet (that's next on the list).

rapier1 · 2025-12-16T17:49:37 1765907377

Rsync commonly uses SSH as the transport layer so it won't necessarily be any faster than SFTP unless you are using the rsync daemon (usually on port 873). However, the rsync daemon won't provide any encryption and I can't suggest using it unless it's on a private network.

zenoprax · 2025-12-16T15:37:56 1765899476

Wow, I hadn't heard of this before. You're saying it can "chunk" large files when operating against a remote sftp-subsystem (OpenSSH)?

I often find myself needing to move a single large file rather than many smaller ones but TCP overhead and latency will always keep speeds down.

Bender · 2025-12-16T17:11:12 1765905072

Not every OS or every SSH daemon support byte ranges but most up to date Linux systems and OpenSSH absolutely support it. One should not assume this exists on legacy systems and daemons.

fanf2 · 2025-12-16T17:23:19 1765905799

Byte ranges are the only way to access files over sftp. Look at the read and write requests in https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filex...

Bender · 2025-12-16T17:51:53 1765907513

I agree but there are legacy daemons that do not follow the spec. Most here will never see them in their lifetime but I had to deal with it in the financial world. People would be amazed and terrified at all the old non-standard crap that their payroll data is flying across. They just ignore the range and send the entire file. I am happy to not have to deal with that any more.

aidenn0 · 2025-12-16T15:42:42 1765899762

I use lftp a lot because of it's better UI compared to sftp. However, for large files, even with scp I can pin GigE with an old Xeon-D system acting as a server.

harvie · 2025-12-16T11:26:23 1765884383

Also upstream is extremely well audited. That's a huge benefit i don't want to loose by using fork.

rapier1 · 2025-12-16T16:36:41 1765903001

I do want to say that HPN-SSH is also well audited; you can see the results of CI tests on the github. We also do fuzz testing, static analysis, extensive code reviews, and functionality testing. We build directly on top of OpenSSH and work with them when we can. We don't touch the authentication code and the parallel ciphers are built directly on top of OpenSSL.

I've been developing it for 20+ years and if you have any specific questions I'd be happy to answer them.

Bad_CRC · 2025-12-16T11:55:14 1765886114

this, I'm not going to start using a random ssh fork with modified ciphers.

Zambyte · 2025-12-16T13:25:10 1765891510

It may still be sensible if you only expose it to private networks.

bomewish · 2025-12-16T13:34:21 1765892061

So could this safely be used on Tailscale then ? I’m very curious though also a bit paranoid.

messe · 2025-12-16T14:14:13 1765894453

> So could this safely be used on Tailscale then ? I’m very curious though also a bit paranoid.

You may as well just use tailscale ssh in that case. It already disables ssh encryption because your connection is encrypted with WireGuard anyway.

gear54rus · 2025-12-16T13:39:03 1765892343

It could safely be used on public internet, all this fearmongering has no basis under it.

Better question is 'does it have any actual improvements in day-to-day operations'? Because it seems like it mostly changes up some ciphering which is already very fast.

yjftsjthsd-h · 2025-12-16T15:20:26 1765898426

> It could safely be used on public internet, all this fearmongering has no basis under it.

On what basis are making that claim? Because AFAICT, concern about it being less secure is entirely reasonable and is one of the big caveats to it.

rapier1 · 2025-12-16T16:37:48 1765903068

Concern about it being less secure is fully justified. I'm the lead developer and have been for the past 20 years. I'm happy to answer any questions you might happen to have.

Zambyte · 2025-12-16T13:52:37 1765893157

I'm not fear mongering. I'm just saying

- IF you don't trust it

- AND you want to use it

=> run it on a private network

You don't have to trust it for security to use it. Putting services on secure networks when the public doesn't need access is standard practice.

joecool1029 · 2025-12-16T16:46:20 1765903580

I remember the last time I really cared to look into this was in the 2000’s, I had these wdtv embedded boxes that had a super anemic cpu that doing local copies with scp was slow as hell from the cipher overhead. I believe at the time it was possible to disable ciphers in scp but it was still slower than smbfs. NFS was to be avoided as wifi was shit then and losing connection meant risking system locking up. This of course was local LAN so I did not really care about encryption.

But I don’t miss having those limitations.

rapier1 · 2025-12-16T20:38:12 1765917492

It's still possible but we only suggest doing it on private known secure networks or when it's data you don't care about. Authentication is still fully encrypted - we just rekey post authentication with a null cipher.

emilfihlman · 2025-12-16T15:22:32 1765898552

lose*

rapier1 · 2025-12-16T16:44:31 1765903471

I'm the lead developer. I can go into this a bit more when I get from an appointment if people are interested.

joecool1029 · 2025-12-16T16:59:16 1765904356

I’m interested. Mainly to update the documentation on it for Gentoo, people have asked about it over the years. Also, TIL it appears HN has a sort of account dormancy status it appears you are in.

rapier1 · 2025-12-16T17:53:47 1765907627

For Gentoo I should put you in touch with my co-developer. He's active in Gentoo and has been maintaining a port for it. I'll point him at this conversation. That said, documentation wise, the HPN-README goes into a lot of detail about the HPN-SSH specific changes. I should point out that while HPN-SSH is a fork we follow OpenSSH. Whenever they come out with a new release we come out with one that incorporates their changes - usually we get this out in about a week.

Almondsetat · 2025-12-16T11:30:26 1765884626

OpenSSH is from the people at OpenBSD, which means performance improvements have to be carefully vetted against bugs, and, judging by the fact that they're still on fastfs and the lack of TRIM in 2025, that will not happen.

wahern · 2025-12-16T14:02:43 1765893763

There's nothing inherently slow about UFS2; the theoretical performance profile should be nearly identical to Ext4. For basic filesystem operations UFS2 and Ext4 will often be faster than more modern filesystems.

OpenBSD's filesystem operations are slow not because of UFS2, but because they simply haven't been optimized up-and-down the stack the way Ext4 has been Linux or UFS2 on FreeBSD. And of course, OpenBSD's implementation doesn't have a journal (both UFS and Ext had journaling bolted late in life) so filesystem checks (triggered on an unclean shutdown or after N boots) can take a long time, which often cause people to think their system has frozen or didn't come up. That user interface problem notwithstanding, UFS2 is extremely robust. OpenBSD is very conservative about optimizations, especially when they increase code complexity, and particularly for subsystems where the project doesn't have time available to give it the necessary attention.

chasil · 2025-12-16T16:22:30 1765902150

OpenBSD UFS did have "soft updates" which were some kind of alternative to journaling.

I believe that these were recently removed. Perhaps they don't play well with SMP.

throw0101d · 2025-12-16T17:13:42 1765905222

McKusick, who wrote the original BSD FFS, also later came up with SU:

* https://en.wikipedia.org/wiki/Soft_updates

frantathefranta · 2025-12-16T14:22:00 1765894920

I admittedly don't really know how SSH is built but it looks to me like the patch that "makes" it HPN-SSH is already present upstream[1], it's just not applied by default? Nixpkgs seems to allow you to build the pkg with the patch [2].

[1] https://github.com/freebsd/freebsd-ports/blob/main/security/...

[2] https://github.com/NixOS/nixpkgs/blob/d85ef06512a3afbd6f9082...

yjftsjthsd-h · 2025-12-16T15:18:23 1765898303

Upstream is either OpenBSD itself or https://github.com/openssh/openssh-portable , not the FreeBSD port. I'm... not sure why nix is pulling the patch from FreeBSD, that's odd.

gorgoiler · 2025-12-16T16:29:00 1765902540

There’s a third party ZFS utility (zrepl, I think) that solves this in a nice way: ssh is used as a control channel to coordinate a new TLS connection over which the actual data is sent. It is considerably faster, apparently.

suprjami · 2025-12-16T12:38:29 1765888709

Unlikely. These patches have been carried out-of-tree for over a decade precisely because upstream OpenSSH won't accept them.

rapier1 · 2025-12-16T17:58:58 1765907938

More than 2 decades at this point. The primary reasons is that the full patch set would be a burden for them to integrate and they don't prioritize performance for bulk data transfers. Which is perfectly understandable from their perspective. HPN-SSH builds on the expertise of OpenSSH and we follow their work closely - when they make a new release we incorporate it and follow with our own release inside of a week or two (depending on how long the code review and functionality/regression testing takes). We focus on throughput performance which involves receive buffer normalization, private key cipher speed, code optimization, and so forth. We tend to stay clear of anything involve authentication and we never roll our own when it comes to the ciphers.

hsbauauvhabzb · 2025-12-16T13:05:51 1765890351

Depending on your hardware architecture and security needs, fiddling with ciphers in mainline might improve speed.

joecool1029 · 2025-12-16T16:40:08 1765903208

This has been around for years (like at least mid-2000’s). Gentoo used to have this patchset available as a USE flag on net-misc/openssh, but some time ago it was moved to net-misc/openssh-contrib (also configurable by useflag).

There are some minor usability bugs and I think both endpoints need to have it installed to take advantage. I remember asking ages ago why it wasn’t upstreamed, there were reasons…

rapier1 · 2025-12-16T18:12:00 1765908720

to be honest, there was a period of time in about 2010 or 2012 where I simply wasn't maintaining it as well as I should have been. I wouldn't have upstreamed it then either. That's changed a lot since then.

As an aside - you only really need HPN-SSH on the receiving side of the bulk data to get the buffer normalization performance benefits. It turns out the bottleneck is almost entirely on the receiver and the client will send out data as quickly as you like. At least it was like that until OpenSSH 8.8. At that point changes were made where the client would crash if the send buffer exceeded 16MB. So we had to limit OpenSSH to HPN-SSH flows to a maximum of 16MB receive space. Which is annoying but that's still going to be a win for a lot of users.

nhatcher · 2025-12-16T14:55:35 1765896935

If folks find this interesting, maybe also mosh[1] is for you. Different trade offs.

[1]: https://mosh.org/

freedomben · 2025-12-16T14:59:31 1765897171

This is cool very cool and I think I'll give it a try, though I'm wary about using a forked SSH so would love to see things land upstream.

I've been using mosh now for over a decade and it is amazing. Add on rsync for file transfers and I've felt pretty set. If you haven't checked out mosh, you should definitely do so!

chrisweekly · 2025-12-16T15:04:12 1765897452

wary (cautious or skeptical), not weary (tired)

actionfromafar · 2025-12-16T15:07:22 1765897642

At this point, maybe both. :) Can we have a portmanteau?

freedomben · 2025-12-16T15:08:06 1765897686

Indeed! I meant wary, but both kind of fit :-D

freedomben · 2025-12-16T15:07:40 1765897660

Heh, thank you! Edited

ollybee · 2025-12-16T15:00:01 1765897201

It's not clear if you need it on both ends to get an advantage?

rapier1 · 2025-12-16T16:40:31 1765903231

The bottleneck in SSH is entirely on the receiving side. So as long at the receiver is using HPN-SSH you will see some performance improvements if the BDP of the path exceeds 2MB. Note: because of changes made to OpenSSH in 8.8 the maximum buffer with OpenSSH as the sender is 16MB. In an HPN to HPN connection that maximum receive buffer is 128MB.

baden1927 · 2025-12-16T15:06:01 1765897561

The contracting activity in terms of rsync and async, where SFTP is secure tunneling, either with SSH or OpenSSH, which -p flag specifies as the port: 22, but /ssh/.configuring 10901 works for TCP.

tristor · 2025-12-16T15:40:58 1765899658

I don't think it comes as a surprise that you can improve performance by re-implementing ciphers, but what is the security trade-off? Many times, well audited implementations of ciphers are intentionally less performant in order to operate in constant time and avoid side channel attacks. Is it even possible to do constant time operations while being multithreaded?

The only change I see here that is probably harmless and a speed boost is using AES-NI for AES-CTR. This should probably be an upstream patch. The rest is more iffy.

rapier1 · 2025-12-16T16:43:49 1765903429

The parallel ciphers are built using OpenSSL primitives. We aren't reimplementing the cipher itself in anyway. Since counter ciphers use an atomically increasing counter you can precompute the blocks in advance. Which is what we do - we have a cache of ketstream data that is precomputed and we pull the correct block off as needed - this gets around the need to have the application compute the blocks serially which can be a bottleneck at higher throughput rates.

The main performance improvement is from the buffer normalization. This can provide, on the right path, a 100x improvement in throughput performance without any compromise in security.