I had an opportunity to insert 700MM rows into Aurora PostgreSQL, where Performance Insights is available. It turns out that there are two stages of insert slowdown: the first happens when the WAL buffers limit is reached, the second around 1 hour later.
The first stage cuts insert performance in half, and the WALWrite lock is the main bottleneck. I think the WAL just can't flush the change log that fast, so inserts wait while older log entries are flushed. This creates both read and write IO.
The second stage is unique to Aurora/RDS and is characterized by excessive read data locks and high total read IO. I couldn't figure out why it reads so much in a write-only process, and AWS support hasn't answered yet.
So my advice: throttle inserts so the WAL is never overfilled and you don't run into WALWrite lock waits, and increase wal_buffers to the maximum.
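A minimal sketch of the throttling idea, in Python, assuming rows arrive as an iterable and that `insert_batch` is a hypothetical callback you supply (for example, one that runs a multi-row INSERT or COPY through your driver). `batch_size` and `max_rows_per_sec` are placeholder values; the right rate would have to be found by watching for WALWrite waits in Performance Insights, since nothing here measures WAL pressure directly. `clock` and `sleep` are injectable only so the pacing can be tested without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_insert(
    rows: Iterable[T],
    insert_batch: Callable[[List[T]], None],  # hypothetical: your DB write
    batch_size: int = 10_000,                 # placeholder, tune per workload
    max_rows_per_sec: float = 50_000.0,       # placeholder rate cap
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert rows in batches, sleeping so the average insert rate never
    exceeds max_rows_per_sec. Returns the number of rows inserted."""
    start = clock()
    inserted = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)
        inserted += len(batch)
        # Earliest moment `inserted` rows are allowed at the target rate;
        # if we are ahead of schedule, pause until then.
        allowed_at = start + inserted / max_rows_per_sec
        delay = allowed_at - clock()
        if delay > 0:
            sleep(delay)
    return inserted
```

Pacing against the total elapsed time (rather than sleeping a fixed amount per batch) means slow batches are not penalized twice: if a batch already took longer than its share, the next one starts immediately. With psycopg2, for instance, `insert_batch` could wrap `psycopg2.extras.execute_values` on an open cursor.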