cloudstack-users mailing list archives

From Jeremy Hansen <jer...@skidrow.la>
Subject Ethernet issues with CephFS client mount on a CS instance
Date Tue, 31 Aug 2021 00:47:23 GMT
I’m also going to post this to the Ceph list, since it only seems to happen when I have a
CephFS volume mounted from a CloudStack instance.

When I attempt to rsync a large file to the Ceph volume, the instance becomes unresponsive
at the network level. It eventually comes back, but it keeps dropping offline as the file
copies. dmesg shows this:

[ 7144.888744] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687140>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7146.872563] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687900>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7148.856703] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <1006880c0>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7150.199756] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly

The host machine:

System Information
Manufacturer: Dell Inc.
Product Name: OptiPlex 990

Running CentOS 8.4.

I also see the same error on another host of a different hardware type:

Manufacturer: Hewlett-Packard
Product Name: HP Compaq 8200 Elite SFF PC

but both are using the e1000e driver.
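
(In case anyone wants to compare: I’m confirming the driver and version on each host with
ethtool. eno1 is the interface name from the dmesg above, so adjust per host.)

# show driver name, version, and firmware for the NIC
ethtool -i eno1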

I upgraded the kernel to 5.13.x and thought that had fixed the issue, but now I’m seeing the
error again.

After migrating the instance to a bigger server-class machine (also e1000e, an old Rackable
system) where I have a bigger pipe via bonding, I don’t seem to hit the issue.

Just curious whether this is a known bug with e1000e and whether there is any kind of workaround.
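
The one mitigation I keep seeing suggested for these e1000e hardware unit hangs is turning
off segmentation offloads on the NIC with ethtool. I haven’t tried it yet, so treat it as a
sketch (again, eno1 is the interface from the dmesg above; adjust per host):

# disable TCP segmentation, generic segmentation, and generic receive offload
ethtool -K eno1 tso off gso off gro off
# verify the new offload settings
ethtool -k eno1

These settings don’t persist across a reboot, so they’d need to be reapplied at boot if they
actually help.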

Thanks
-jeremy

