Ever-growing UDP socket buffer memory on 127.0.0.1 in Linux

dsw9742
Posts: 17
Joined: Wed Aug 07, 2019 1:38 am

Wed Oct 23, 2019 9:25 pm

Hi there, I'm facing an issue running a 3-node hyperconverged setup of the StarWind vSAN Linux VSA on vSphere (ESXi 6). On the 2 nodes where I have HA devices running, netstat shows a number of UDP sockets on 127.0.0.1 whose Recv-Q keeps growing. The odd thing is that I have configured the connections in StarWind.cfg to use non-localhost IP addresses, so I'm not even sure why these sockets are being created in the first place.

This can be reproduced on both of my HA device nodes: run `ss -lp --udp`, pause, and run it again. The Recv-Q column keeps increasing for a number of sockets bound to 127.0.0.1 on high port numbers. The owning processes appear to be wineserver/StarWindService.

State   Recv-Q   Send-Q  Local Address:Port  Peer Address:Port
UNCONN  0        0       *:64669             *:*  users:(("wineserver",pid=10946,fd=482),("StarWindService",pid=10944,fd=616),("StarWindService",pid=10944,fd=610))
UNCONN  391680   0       127.0.0.1:40110     *:*  users:(("wineserver",pid=10946,fd=184),("StarWindService",pid=10944,fd=199),("StarWindService",pid=10944,fd=197))
UNCONN  0        0       *:48466             *:*  users:(("wineserver",pid=10946,fd=199),("StarWindService",pid=10944,fd=217),("StarWindService",pid=10944,fd=209))
UNCONN  0        0       *:40324             *:*  users:(("wineserver",pid=10946,fd=277),("StarWindService",pid=10944,fd=334),("StarWindService",pid=10944,fd=330))
UNCONN  0        0       *:40970             *:*  users:(("avahi-daemon",pid=9822,fd=13))
UNCONN  5423616  0       127.0.0.1:16450     *:*  users:(("wineserver",pid=10946,fd=313),("StarWindService",pid=10944,fd=385),("StarWindService",pid=10944,fd=383))
UNCONN  0        0       *:49361             *:*  users:(("wineserver",pid=10946,fd=373),("StarWindService",pid=10944,fd=464),("StarWindService",pid=10944,fd=457))
UNCONN  0        0       127.0.0.1:323       *:*  users:(("chronyd",pid=9905,fd=1))
UNCONN  4976640  0       127.0.0.1:49885     *:*  users:(("wineserver",pid=10946,fd=389),("StarWindService",pid=10944,fd=517),("StarWindService",pid=10944,fd=505))
UNCONN  0        0       *:25924             *:*  users:(("wineserver",pid=10946,fd=207),("StarWindService",pid=10944,fd=234),("StarWindService",pid=10944,fd=227))
UNCONN  0        0       *:26078             *:*  users:(("wineserver",pid=10946,fd=476),("StarWindService",pid=10944,fd=612),("StarWindService",pid=10944,fd=607))
UNCONN  0        0       *:51217             *:*  users:(("wineserver",pid=10946,fd=376),("StarWindService",pid=10944,fd=468),("StarWindService",pid=10944,fd=458))
UNCONN  0        0       *:26835             *:*  users:(("wineserver",pid=10946,fd=390),("StarWindService",pid=10944,fd=494),("StarWindService",pid=10944,fd=486))
UNCONN  5053440  0       127.0.0.1:35166     *:*  users:(("wineserver",pid=10946,fd=326),("StarWindService",pid=10944,fd=403),("StarWindService",pid=10944,fd=402))
UNCONN  1708800  0       127.0.0.1:10850     *:*  users:(("wineserver",pid=10946,fd=280),("StarWindService",pid=10944,fd=335),("StarWindService",pid=10944,fd=333))
UNCONN  372480   0       127.0.0.1:43680     *:*  users:(("wineserver",pid=10946,fd=466),("StarWindService",pid=10944,fd=595),("StarWindService",pid=10944,fd=593))
UNCONN  5509632  0       127.0.0.1:11084     *:*  users:(("wineserver",pid=10946,fd=194),("StarWindService",pid=10944,fd=220),("StarWindService",pid=10944,fd=216))
UNCONN  0        0       *:60357             *:*  users:(("wineserver",pid=10946,fd=406),("StarWindService",pid=10944,fd=516),("StarWindService",pid=10944,fd=490))
UNCONN  5004288  0       127.0.0.1:44073     *:*  users:(("wineserver",pid=10946,fd=391),("StarWindService",pid=10944,fd=495),("StarWindService",pid=10944,fd=493))
UNCONN  0        0       *:winshadow         *:*  users:(("wineserver",pid=10946,fd=505),("StarWindService",pid=10944,fd=648),("StarWindService",pid=10944,fd=647))
UNCONN  371712   0       127.0.0.1:19805     *:*  users:(("wineserver",pid=10946,fd=378),("StarWindService",pid=10944,fd=469),("StarWindService",pid=10944,fd=467))
UNCONN  0        0       *:44450             *:*  users:(("wineserver",pid=10946,fd=183),("StarWindService",pid=10944,fd=198),("StarWindService",pid=10944,fd=192))
UNCONN  398592   0       127.0.0.1:52755     *:*  users:(("wineserver",pid=10946,fd=457),("StarWindService",pid=10944,fd=585),("StarWindService",pid=10944,fd=583))
UNCONN  0        0       *:61741             *:*  users:(("wineserver",pid=10946,fd=311),("StarWindService",pid=10944,fd=384),("StarWindService",pid=10944,fd=378))
UNCONN  0        0       *:29007             *:*  users:(("wineserver",pid=10946,fd=285),("StarWindService",pid=10944,fd=342),("StarWindService",pid=10944,fd=325))
UNCONN  0        0       *:21086             *:*  users:(("wineserver",pid=10946,fd=465),("StarWindService",pid=10944,fd=594),("StarWindService",pid=10944,fd=590))
UNCONN  0        0       *:mdns              *:*  users:(("avahi-daemon",pid=9822,fd=12))
UNCONN  0        0       *:54575             *:*  users:(("wineserver",pid=10946,fd=193),("StarWindService",pid=10944,fd=212),("StarWindService",pid=10944,fd=207))
UNCONN  0        0       *:21978             *:*  users:(("wineserver",pid=10946,fd=325),("StarWindService",pid=10944,fd=399),("StarWindService",pid=10944,fd=391))
UNCONN  0        0       *:38428             *:*  users:(("wineserver",pid=10946,fd=456),("StarWindService",pid=10944,fd=580),("StarWindService",pid=10944,fd=577))
UNCONN  371712   0       127.0.0.1:46776     *:*  users:(("wineserver",pid=10946,fd=286),("StarWindService",pid=10944,fd=343),("StarWindService",pid=10944,fd=339))
UNCONN  372480   0       127.0.0.1:31041     *:*  users:(("wineserver",pid=10946,fd=195),("StarWindService",pid=10944,fd=213),("StarWindService",pid=10944,fd=211))
UNCONN  9124608  0       127.0.0.1:22911     *:*  users:(("wineserver",pid=10946,fd=374),("StarWindService",pid=10944,fd=465),("StarWindService",pid=10944,fd=461))
UNCONN  3816960  0       127.0.0.1:55975     *:*  users:(("wineserver",pid=10946,fd=208),("StarWindService",pid=10944,fd=235),("StarWindService",pid=10944,fd=231))
UNCONN  5211648  0       127.0.0.1:23318     *:*  users:(("wineserver",pid=10946,fd=483),("StarWindService",pid=10944,fd=617),("StarWindService",pid=10944,fd=615))
UNCONN  5137920  0       127.0.0.1:15239     *:*  users:(("wineserver",pid=10946,fd=479),("StarWindService",pid=10944,fd=613),("StarWindService",pid=10944,fd=611))

Then, a few seconds later:

State   Recv-Q   Send-Q  Local Address:Port  Peer Address:Port
UNCONN  0        0       *:64669             *:*  users:(("wineserver",pid=10946,fd=482),("StarWindService",pid=10944,fd=616),("StarWindService",pid=10944,fd=610))
UNCONN  432384   0       127.0.0.1:40110     *:*  users:(("wineserver",pid=10946,fd=184),("StarWindService",pid=10944,fd=199),("StarWindService",pid=10944,fd=197))
UNCONN  0        0       *:48466             *:*  users:(("wineserver",pid=10946,fd=199),("StarWindService",pid=10944,fd=217),("StarWindService",pid=10944,fd=209))
UNCONN  0        0       *:40324             *:*  users:(("wineserver",pid=10946,fd=277),("StarWindService",pid=10944,fd=334),("StarWindService",pid=10944,fd=330))
UNCONN  0        0       *:40970             *:*  users:(("avahi-daemon",pid=9822,fd=13))
UNCONN  5667840  0       127.0.0.1:16450     *:*  users:(("wineserver",pid=10946,fd=313),("StarWindService",pid=10944,fd=385),("StarWindService",pid=10944,fd=383))
UNCONN  0        0       *:49361             *:*  users:(("wineserver",pid=10946,fd=373),("StarWindService",pid=10944,fd=464),("StarWindService",pid=10944,fd=457))
UNCONN  0        0       127.0.0.1:323       *:*  users:(("chronyd",pid=9905,fd=1))
UNCONN  5220864  0       127.0.0.1:49885     *:*  users:(("wineserver",pid=10946,fd=389),("StarWindService",pid=10944,fd=517),("StarWindService",pid=10944,fd=505))
UNCONN  0        0       *:25924             *:*  users:(("wineserver",pid=10946,fd=207),("StarWindService",pid=10944,fd=234),("StarWindService",pid=10944,fd=227))
UNCONN  0        0       *:26078             *:*  users:(("wineserver",pid=10946,fd=476),("StarWindService",pid=10944,fd=612),("StarWindService",pid=10944,fd=607))
UNCONN  0        0       *:51217             *:*  users:(("wineserver",pid=10946,fd=376),("StarWindService",pid=10944,fd=468),("StarWindService",pid=10944,fd=458))
UNCONN  0        0       *:26835             *:*  users:(("wineserver",pid=10946,fd=390),("StarWindService",pid=10944,fd=494),("StarWindService",pid=10944,fd=486))
UNCONN  5297664  0       127.0.0.1:35166     *:*  users:(("wineserver",pid=10946,fd=326),("StarWindService",pid=10944,fd=403),("StarWindService",pid=10944,fd=402))
UNCONN  1924608  0       127.0.0.1:10850     *:*  users:(("wineserver",pid=10946,fd=280),("StarWindService",pid=10944,fd=335),("StarWindService",pid=10944,fd=333))
UNCONN  413184   0       127.0.0.1:43680     *:*  users:(("wineserver",pid=10946,fd=466),("StarWindService",pid=10944,fd=595),("StarWindService",pid=10944,fd=593))
UNCONN  5753856  0       127.0.0.1:11084     *:*  users:(("wineserver",pid=10946,fd=194),("StarWindService",pid=10944,fd=220),("StarWindService",pid=10944,fd=216))
UNCONN  0        0       *:60357             *:*  users:(("wineserver",pid=10946,fd=406),("StarWindService",pid=10944,fd=516),("StarWindService",pid=10944,fd=490))
UNCONN  5248512  0       127.0.0.1:44073     *:*  users:(("wineserver",pid=10946,fd=391),("StarWindService",pid=10944,fd=495),("StarWindService",pid=10944,fd=493))
UNCONN  0        0       *:winshadow         *:*  users:(("wineserver",pid=10946,fd=505),("StarWindService",pid=10944,fd=648),("StarWindService",pid=10944,fd=647))
UNCONN  412416   0       127.0.0.1:19805     *:*  users:(("wineserver",pid=10946,fd=378),("StarWindService",pid=10944,fd=469),("StarWindService",pid=10944,fd=467))
UNCONN  0        0       *:44450             *:*  users:(("wineserver",pid=10946,fd=183),("StarWindService",pid=10944,fd=198),("StarWindService",pid=10944,fd=192))
UNCONN  439296   0       127.0.0.1:52755     *:*  users:(("wineserver",pid=10946,fd=457),("StarWindService",pid=10944,fd=585),("StarWindService",pid=10944,fd=583))
UNCONN  0        0       *:61741             *:*  users:(("wineserver",pid=10946,fd=311),("StarWindService",pid=10944,fd=384),("StarWindService",pid=10944,fd=378))
UNCONN  0        0       *:29007             *:*  users:(("wineserver",pid=10946,fd=285),("StarWindService",pid=10944,fd=342),("StarWindService",pid=10944,fd=325))
UNCONN  0        0       *:21086             *:*  users:(("wineserver",pid=10946,fd=465),("StarWindService",pid=10944,fd=594),("StarWindService",pid=10944,fd=590))
UNCONN  0        0       *:mdns              *:*  users:(("avahi-daemon",pid=9822,fd=12))
UNCONN  0        0       *:54575             *:*  users:(("wineserver",pid=10946,fd=193),("StarWindService",pid=10944,fd=212),("StarWindService",pid=10944,fd=207))
UNCONN  0        0       *:21978             *:*  users:(("wineserver",pid=10946,fd=325),("StarWindService",pid=10944,fd=399),("StarWindService",pid=10944,fd=391))
UNCONN  0        0       *:38428             *:*  users:(("wineserver",pid=10946,fd=456),("StarWindService",pid=10944,fd=580),("StarWindService",pid=10944,fd=577))
UNCONN  412416   0       127.0.0.1:46776     *:*  users:(("wineserver",pid=10946,fd=286),("StarWindService",pid=10944,fd=343),("StarWindService",pid=10944,fd=339))
UNCONN  413184   0       127.0.0.1:31041     *:*  users:(("wineserver",pid=10946,fd=195),("StarWindService",pid=10944,fd=213),("StarWindService",pid=10944,fd=211))
UNCONN  10227456 0       127.0.0.1:22911     *:*  users:(("wineserver",pid=10946,fd=374),("StarWindService",pid=10944,fd=465),("StarWindService",pid=10944,fd=461))
UNCONN  4061184  0       127.0.0.1:55975     *:*  users:(("wineserver",pid=10946,fd=208),("StarWindService",pid=10944,fd=235),("StarWindService",pid=10944,fd=231))
UNCONN  5455872  0       127.0.0.1:23318     *:*  users:(("wineserver",pid=10946,fd=483),("StarWindService",pid=10944,fd=617),("StarWindService",pid=10944,fd=615))
UNCONN  5382144  0       127.0.0.1:15239     *:*  users:(("wineserver",pid=10946,fd=479),("StarWindService",pid=10944,fd=613),("StarWindService",pid=10944,fd=611))

Firewalls are disabled. This is the most recent version of the Linux VSA, from October 2019.
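
The before/after comparison above can be scripted. Below is a minimal Python sketch, assuming `ss` from iproute2 is on the PATH and that its UDP-only output has the State / Recv-Q / Send-Q / Local / Peer column layout shown above (no Netid column). The helper names `parse_ss_udp`, `recv_q_growth`, `snapshot` and `report_growth` are hypothetical, and the 5-second interval is arbitrary.

```python
import subprocess
import time


def parse_ss_udp(text):
    """Map each local 'address:port' to its Recv-Q value.

    Assumes the column layout shown above: State, Recv-Q, Send-Q, Local, Peer.
    """
    queues = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "State":
            continue  # skip the header and anything that is not a socket row
        queues[fields[3]] = int(fields[1])
    return queues


def recv_q_growth(before, after):
    """Return (address, growth) pairs for sockets whose Recv-Q grew, largest first."""
    growth = {
        addr: after[addr] - before[addr]
        for addr in after
        if addr in before and after[addr] > before[addr]
    }
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


def snapshot():
    # -l: listening/unconnected sockets, -u: UDP only, -n: numeric ports, -p: owners
    result = subprocess.run(["ss", "-lunp"], capture_output=True, text=True, check=True)
    return result.stdout


def report_growth(interval=5.0):
    """Take two ss snapshots `interval` seconds apart and print the growing queues."""
    first = parse_ss_udp(snapshot())
    time.sleep(interval)
    second = parse_ss_udp(snapshot())
    for addr, delta in recv_q_growth(first, second):
        print(f"{addr}: +{delta} in {interval:g} s")
```

Calling `report_growth()` on an HA node prints one line per growing socket. Applied by hand to the two snapshots in this post, the same arithmetic gives: eight of the 127.0.0.1 sockets grew by exactly 244224, six by 40704, 127.0.0.1:10850 by 215808, and 127.0.0.1:22911 by 1102848, which makes it easier to see which sockets grow in lockstep.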

Why is this happening? I've probably configured something incorrectly, but I have run out of troubleshooting ideas on my end. It is a problem because it eventually leads to socket receive buffer errors on the StarWind VSA guests, which seems like a bad thing.

Is there a way to eliminate the issue?
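
The receive buffer errors mentioned above are also counted by the kernel: the `RcvbufErrors` field on the `Udp:` lines of /proc/net/snmp (reported as "receive buffer errors" by `netstat -su`) counts UDP datagrams dropped because a socket's receive buffer was full. A minimal Python sketch for reading it; `udp_counters` and `read_udp_counters` are hypothetical names. Comparing two readings taken a few minutes apart would show whether the drops track the growing queues.

```python
def udp_counters(snmp_text):
    """Parse the two 'Udp:' lines of /proc/net/snmp (names, then values) into a dict."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # 'UdpLite:' lines do not match this prefix
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path="/proc/net/snmp"):
    with open(path) as f:
        return udp_counters(f.read())
```

For example, `read_udp_counters()["RcvbufErrors"]` returns the current drop count.
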
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Oct 24, 2019 9:09 am

I would like to ask you to collect logs from your system using the log collection script available in /opt (see https://knowledgebase.starwindsoftware. ... collector/ for the steps).
You can upload them to the file sharing service of your choice and PM me the download link. We will check everything on our side.
dsw9742

Thu Oct 24, 2019 10:19 am

Hi Boris, of course. I've collected the logs and submitted the link to them via PM. Thank you for looking into this so quickly.
Boris (staff)

Thu Oct 24, 2019 11:24 am

The logs have been transferred to R&D. I will keep the thread updated.
dsw9742

Thu Oct 24, 2019 12:22 pm

Thank you, Boris. I am happy to provide any additional context and details as needed.
Boris (staff)

Thu Oct 24, 2019 12:59 pm

Sure, I will let you know if any additional info is required.
dsw9742

Tue Oct 29, 2019 5:18 pm

Hi Boris, wanted to check in here. Is there anything I can do to help?
dsw9742

Fri Nov 01, 2019 1:08 pm

Boris, I wanted to share a couple of updates.

First, it looks like the UDP socket buffer memory actually plateaus in my environment at the following values:
  1. Node 1 (contains HA devices): 512.1 MiB
  2. Node 2 (contains HA devices): 512.7 MiB
  3. Node 3 (does not contain any HA devices; image devices only): 456 KiB
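
If totals like these need cross-checking against the kernel's own accounting, the `UDP:` line of /proc/net/sockstat reports the memory charged to all UDP sockets in its `mem` field, as a page count rather than bytes. A minimal Python sketch, assuming 4 KiB pages unless a page size is passed in (`os.sysconf("SC_PAGE_SIZE")` gives the real value); `udp_memory_bytes` and `current_udp_memory_mib` are hypothetical names. For scale, 512 MiB corresponds to 131072 pages of 4 KiB.

```python
import os


def udp_memory_bytes(sockstat_text, page_size=4096):
    """Return the UDP 'mem' figure from /proc/net/sockstat, converted to bytes.

    The kernel reports this field in pages, not bytes.
    """
    for line in sockstat_text.splitlines():
        if line.startswith("UDP:"):  # 'UDPLITE:' lines do not match this prefix
            fields = line.split()
            return int(fields[fields.index("mem") + 1]) * page_size
    raise ValueError("no 'UDP:' line found")


def current_udp_memory_mib():
    with open("/proc/net/sockstat") as f:
        total = udp_memory_bytes(f.read(), os.sysconf("SC_PAGE_SIZE"))
    return total / 2**20
```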

Second, I've been thinking about this a lot and wanted to share a few theories I've developed about this issue:

--------

Theory 1 is that my OS kernel tuning is to blame. One of the adjustments I made while trying to eliminate this issue was to fiddle with kernel tuning: I activated the "network-throughput" tuned profile and also tweaked some values in /etc/sysctl.conf (these servers have dual-port 40GbE Mellanox cards with DAC cabling). If the Linux VM is tightly tuned for running StarWind VSAN, I wonder whether these changes messed things up.

Along those same lines, when I run

sysctl -a

I can see some interesting values, particularly net.ipv4.udp_mem = 3145728 4194304 16777216. To my eye, that looks like a pretty high "min" value. On my system, it is apparently set by the "network-throughput" profile configuration in /usr/lib/tuned/network-throughput/tuned.conf.
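
For reference, the three udp_mem numbers (min, pressure, max) are page counts for all UDP sockets combined, not bytes, and they are separate from the per-socket limits in net.core.rmem_default and net.core.rmem_max. A minimal Python sketch of the conversion, assuming 4 KiB pages unless a page size is passed in; `udp_mem_in_bytes` and `read_udp_mem_gib` are hypothetical names.

```python
import os


def udp_mem_in_bytes(udp_mem_text, page_size=4096):
    """Convert net.ipv4.udp_mem (min, pressure, max, in pages) to bytes."""
    return [int(pages) * page_size for pages in udp_mem_text.split()]


def read_udp_mem_gib():
    with open("/proc/sys/net/ipv4/udp_mem") as f:
        limits = udp_mem_in_bytes(f.read(), os.sysconf("SC_PAGE_SIZE"))
    return dict(zip(("min", "pressure", "max"), (v / 2**30 for v in limits)))
```

With 4 KiB pages, the values above work out to 12 GiB, 16 GiB and 64 GiB. The roughly 512 MiB plateau reported earlier in this post is well below even the 12 GiB "min" value, so the global udp_mem limit does not appear to be what caps it.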

Should I revert the tuned.service to use the default "balanced" profile, and then revert the tweaks I made to /etc/sysctl.conf?

What I still don't understand is why there's any use of UDP over 127.0.0.1 in the first place. I am assuming it has something to do with enabling StarWind VSAN to use the Linux networking stack via Wine?

--------

Theory 2 is that the HA devices on nodes 1 and 2 are holding storage data in the expectation that they need to distribute it to node 3 (even though I have not configured a third replica for those HA devices on node 3), and so the buffers are never drained because that data is never sent to node 3.
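
One detail that may help when weighing Theory 2: for a UDP socket, the Recv-Q value in `ss` output is memory held by datagrams that have already arrived at that socket and have not yet been read by the owning process; it is not data waiting to be sent elsewhere. The Python sketch below illustrates generic Linux UDP behavior only, nothing StarWind-specific (`undrained_loopback_demo` is a hypothetical name, and the buffer and datagram sizes are arbitrary). It sends datagrams to a loopback socket that nobody reads: while unread they sit in its receive queue, and once the receive buffer is full, further datagrams are dropped instead of queued, which is one way a queue can grow and then level off.

```python
import socket


def undrained_loopback_demo(rcvbuf=4096, count=100, size=1024):
    """Send `count` datagrams to an unread loopback socket; return how many were kept."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Linux doubles the requested size to allow for bookkeeping overhead (socket(7)).
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.bind(("127.0.0.1", 0))  # the kernel picks a high ephemeral port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * size
    try:
        for _ in range(count):
            # A UDP sender is not throttled by the receiver: once the receive
            # queue is full, further datagrams are silently dropped.
            sender.sendto(payload, receiver.getsockname())
        # Until recv() is called, these datagrams are what `ss` reports as Recv-Q.
        receiver.settimeout(0.2)
        delivered = 0
        while True:
            try:
                receiver.recv(size)
            except socket.timeout:
                break
            delivered += 1
        return delivered
    finally:
        sender.close()
        receiver.close()
```

`undrained_loopback_demo()` returns fewer than the 100 datagrams it sent; the exact number depends on the kernel's per-datagram accounting.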

--------

Theory 3 is that I've misconfigured the networking environment in the ESXi hosts and/or the switches that connect the management infrastructure. Happy to provide details here if it would help.

--------

I would really appreciate any feedback you can provide on the above. It just seems odd that this issue only occurs in my environment, so I'm chalking it up to user error for now, but I am really hoping to fix it.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Nov 01, 2019 3:31 pm

Thank you for this update. I made sure this information reaches the team responsible for the product.
Unfortunately, I don't have an update for you yet. The team is still investigating and testing, and their tests so far have not reproduced similar behavior. I hope the information you provided will help them identify the cause.
dsw9742
Posts: 17
Joined: Wed Aug 07, 2019 1:38 am

Sat Nov 16, 2019 11:56 am

Hi Boris, I wanted to check in again: has there been any progress evaluating the issue?

If nothing else, would it be possible to get more details on what the UDP-over-localhost/loopback is doing?
dsw9742
Posts: 17
Joined: Wed Aug 07, 2019 1:38 am

Sat Nov 16, 2019 6:55 pm

Also, is anyone else able to replicate the behavior of the UDP receive buffers on the localhost address/loopback device? Another interesting thing I have only recently noticed is that running `ss -lp --tcp` shows the wineserver TCP send buffers stuck at 50000 bytes on my system (PIDs and FDs are different from above because I've rebooted several times since then):

```
State   Recv-Q Send-Q Local Address:Port           Peer Address:Port
LISTEN  0      128    192.168.1.135:ssh            *:*               users:(("sshd",pid=10384,fd=3))
LISTEN  0      50000  192.168.1.135:iscsi-target   *:*               users:(("wineserver",pid=10834,fd=350))
LISTEN  0      50000  192.168.71.50:iscsi-target   *:*               users:(("wineserver",pid=10834,fd=346))
LISTEN  0      50000  192.168.69.50:iscsi-target   *:*               users:(("wineserver",pid=10834,fd=340))
LISTEN  0      50000  192.168.5.7:iscsi-target     *:*               users:(("wineserver",pid=10834,fd=339))
LISTEN  0      50000  192.168.3.7:iscsi-target     *:*               users:(("wineserver",pid=10834,fd=337))
LISTEN  0      50000  127.0.0.1:iscsi-target       *:*               users:(("wineserver",pid=10834,fd=335))
LISTEN  0      50000  *:winshadow                  *:*               users:(("wineserver",pid=10834,fd=341),("StarWindService",pid=10832,fd=822),("StarWindService",pid=10832,fd=433))
```
I'm kind of wondering if the "full" TCP send buffers are causing the UDP receive buffers to back up ...
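A note on reading that output. For sockets in the LISTEN state, `ss` reports different things in the queue columns than it does for other sockets. Recv-Q is the number of connections waiting to be accepted, and Send-Q is the effective listen backlog (the value passed to `listen()`, capped by `net.core.somaxconn`). Neither column counts buffered bytes. So the 50000 shown for the wineserver listeners is most likely a backlog setting, not a full send buffer; sshd's 128 is its backlog in the same way. The Python sketch below labels the columns accordingly for one line of `ss` output. The helper `describe_ss_line` is hypothetical and assumes the default whitespace-separated column order shown above.

```python
def describe_ss_line(line: str) -> str:
    """Explain the Recv-Q/Send-Q columns of one line of `ss` output.

    Assumes the default column order: State Recv-Q Send-Q Local Peer [users...].
    """
    fields = line.split()
    state, recv_q, send_q, local = fields[0], int(fields[1]), int(fields[2]), fields[3]
    if state == "LISTEN":
        # Listening TCP sockets: Recv-Q = connections waiting for accept(),
        # Send-Q = effective backlog limit (a count, not bytes).
        return f"{local}: {recv_q} pending connections, backlog limit {send_q}"
    # Other sockets: the columns are bytes sitting in the receive/send queues.
    return f"{local}: {recv_q} bytes queued for reading, {send_q} bytes queued for sending"
```

Applied to the UDP line from the first post (`UNCONN 391680 0 127.0.0.1:40110 ...`), it reports 391680 bytes waiting to be read. That is the number that keeps growing, which is a different situation from the static TCP backlog values.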
Al (staff)
Staff
Posts: 43
Joined: Tue Jul 26, 2016 2:26 pm

Tue Nov 26, 2019 6:12 pm

Hello dsw9742,

To answer your question about the UDP socket: StarWind Service uses UDP for a service discovery protocol, which lets StarWind Management Console find StarWind services on the network. This mechanism is not needed for VSAN for vSphere (the Management Console is delivered separately), so it is going to be removed from StarWind Service on Linux.

TCP sockets are used for the StarWind Control protocol and for the iSCSI protocol itself.
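The sketch below illustrates how a UDP socket that nobody reads from behaves. It assumes, and this is not confirmed in the thread, that the discovery socket's incoming datagrams are simply never read. In that case, the kernel queues incoming datagrams until the socket's receive buffer (`SO_RCVBUF`) is full and then silently drops the rest, so Recv-Q grows and then levels off. This self-contained Python sketch runs on 127.0.0.1. The helper name and the buffer size and datagram counts are arbitrary illustration values.

```python
import socket


def count_queued_datagrams(sent: int = 500, payload: int = 1000, rcvbuf: int = 8192) -> int:
    """Send `sent` datagrams to an unread socket, then count how many were actually queued."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Small receive buffer so the limit is hit quickly (Linux doubles this value).
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        addr = receiver.getsockname()
        for _ in range(sent):
            # Nothing reads yet; datagrams that don't fit in the buffer are dropped.
            sender.sendto(b"x" * payload, addr)
        # Drain the queue: wait briefly for the first datagram, then read without blocking.
        receiver.settimeout(1.0)
        queued = 0
        try:
            receiver.recv(65535)
            queued = 1
            receiver.setblocking(False)
            while True:
                receiver.recv(65535)
                queued += 1
        except (BlockingIOError, socket.timeout):
            pass
        return queued
    finally:
        sender.close()
        receiver.close()


print(count_queued_datagrams())
```

On Linux, the printed count is far below the 500 datagrams sent. The dropped datagrams are counted as `RcvbufErrors` in the `Udp:` section of `/proc/net/snmp`, which `netstat -su` reports as receive buffer errors.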
dsw9742
Posts: 17
Joined: Wed Aug 07, 2019 1:38 am

Wed Nov 27, 2019 9:08 pm

Hi Al, thank you for that information, it's helpful to know. Are you aware of any way to deactivate that service discovery in the current Linux VSA?
Al (staff)
Staff
Posts: 43
Joined: Tue Jul 26, 2016 2:26 pm

Thu Nov 28, 2019 10:55 am

Hi dsw9742,

Unfortunately, it cannot be done manually at the moment. We are going to fix this in upcoming builds, and I will update the community as soon as we do.
dsw9742
Posts: 17
Joined: Wed Aug 07, 2019 1:38 am

Thu Dec 05, 2019 10:00 am

Hi Al, thank you for the continued answers. Are you able, by any chance, to provide an ETA for the next release that includes the fix?