net/mlx5: No command allowed when command interface is not ready [Linux 5.0]

This Linux kernel change "net/mlx5: No command allowed when command interface is not ready" is included in the Linux 5.0 release. This change was authored by Huy Nguyen <huyn [at] mellanox.com> on Thu Feb 7 09:22:56 2019 -0600. The commit for this change in the Linux stable tree is 4cab346.

net/mlx5: No command allowed when command interface is not ready

When EEH is injected and the PCI bus stalls, mlx5's PCI error-detect
function is called to deactivate the command interface and tear down
the device. The issue is that a thread may already have passed the
MLX5_DEVICE_STATE_INTERNAL_ERROR check; that thread will then send its
command and get stuck in wait_func.

Solution:
Add the function mlx5_cmd_flush to disable the command interface and clear
all pending commands. When the device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure that all
pending threads waiting for firmware command completion are terminated.

Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn [at] mellanox.com>
Reviewed-by: Daniel Jurgens <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

There are 21 lines of Linux source code added/deleted in this change. The code changes to the Linux kernel are as follows.

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c       | 18 ++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/health.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 3e0fa8a..e267ff9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1583,6 +1583,24 @@ void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev)
    spin_unlock_irqrestore(&dev->cmd.alloc_lock, flags);
 }

+void mlx5_cmd_flush(struct mlx5_core_dev *dev)
+{
+   struct mlx5_cmd *cmd = &dev->cmd;
+   int i;
+
+   for (i = 0; i < cmd->max_reg_cmds; i++)
+       while (down_trylock(&cmd->sem))
+           mlx5_cmd_trigger_completions(dev);
+
+   while (down_trylock(&cmd->pages_sem))
+       mlx5_cmd_trigger_completions(dev);
+
+   /* Unlock cmdif */
+   up(&cmd->pages_sem);
+   for (i = 0; i < cmd->max_reg_cmds; i++)
+       up(&cmd->sem);
+}
+
 static int status_to_err(u8 status)
 {
    return status ? -1 : 0; /* TBD more meaningful codes */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 196c073..cb9fa34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -103,7 +103,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
    mlx5_core_err(dev, "start\n");
    if (pci_channel_offline(dev->pdev) || in_fatal(dev) || force) {
        dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
-       mlx5_cmd_trigger_completions(dev);
+       mlx5_cmd_flush(dev);
    }

    mlx5_notifier_call_chain(dev->priv.events, MLX5_DEV_EVENT_SYS_ERROR, (void *)1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 5300b0b6..4fdac02 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -126,6 +126,7 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev,
                 struct ptp_system_timestamp *sts);

 void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev);
+void mlx5_cmd_flush(struct mlx5_core_dev *dev);
 int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
