
CCO Troubleshooting

After Configuring the AMQP and CCO, Communication is not Established between Both Servers

All port-related nuances and requirements are provided in Phase 2: Configure Firewall Rules in the Install section.

The most common communication issue between the CCO and AMQP servers is that the RabbitMQ configuration is not completed at the appropriate time. The RabbitMQ configuration must be completed before one of the following occurrences:

  • Virtual Appliance Installation: The AMQP appliance image is rebooted

  • Manual Installation: The core installer is rebooted on the AMQP server

To verify that the RabbitMQ configuration completed successfully, run the following commands and compare the output with the listing here:

# rabbitmqctl list_vhosts
Listing vhosts ...
/
/cliqr

# rabbitmqctl list_users
Listing users ...
cliqr  [administrator]
cliqr_worker  []

# rabbitmqctl list_permissions -p /cliqr
Listing permissions in vhost "/cliqr" ...
cliqr  .*     .*     .*
cliqr_worker  .*     .*     .*
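
The three checks above can also be scripted. The following is a minimal sketch, assuming rabbitmqctl is on the PATH and the script runs as root on the AMQP server. The function names has_entry and check_rabbitmq are hypothetical and not part of the product. The sketch only confirms that the names shown in the listing above appear in the first column of each output; it does not compare tags or permission patterns.

```shell
#!/bin/sh
# has_entry NAME: read a rabbitmqctl listing on stdin and succeed only if
# the first whitespace-separated field of some line equals NAME exactly.
has_entry() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# check_rabbitmq: run the three listings and report anything missing.
# Returns non-zero if the /cliqr vhost or either user is absent.
check_rabbitmq() {
    status=0
    rabbitmqctl list_vhosts | has_entry /cliqr \
        || { echo "missing vhost: /cliqr"; status=1; }
    for user in cliqr cliqr_worker; do
        rabbitmqctl list_users | has_entry "$user" \
            || { echo "missing user: $user"; status=1; }
        rabbitmqctl list_permissions -p /cliqr | has_entry "$user" \
            || { echo "missing permissions in /cliqr: $user"; status=1; }
    done
    return "$status"
}
```

Calling check_rabbitmq prints nothing and returns 0 when every expected entry is present; otherwise it names each missing item.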

Recovering and Rejoining a Cluster after a CCO Server Failure

The following process applies to any CCO server that fails. This procedure assumes the following:

  • The CCO3 server failed

  • CCO1 and CCO2 continue to service requests for the CloudCenter platform.

  • A Load Balancer is configured in front of all three CCO servers

  • HA Proxy is running on a separate CentOS 7 VM.

To address the failure of CCO3, you must build a new VM to replace the failed server.

  1. Remove the Failed CCO IP from the CCO Load Balancer. 

    • Method 1:

      1. Edit the HA Proxy configuration file.

        sudo -i
        vi /etc/haproxy/haproxy.cfg
        #Comment out the failed CCO from the backend section
        #Save and Exit

      2. Reload HA Proxy to use the new configuration and verify the status to ensure that it is running.

        systemctl reload haproxy
        systemctl status haproxy


        OR

    • Method 2:

      1. If the HA Proxy GUI is enabled, open it so that you can place the failed CCO in maintenance mode.

      2. Select the Failed CCO backend and set it to MAINT mode.

      3. Click Apply.

  2. Deploy a new VM to replace the Failed CCO VM. See CCO (Required) for procedural details.

  3. Verify that the replacement CCO VM is ready by connecting to it over SSH (use the -i option if you log in with a key pair).

    The 4.8.2x appliance images block ICMP, so a ping may fail even when the VM is up. Use SSH instead to verify readiness.

    ssh <user>@<IP Address of replacement CCO>

    or

    ssh -i <KeyPair File> centos@<IP Address of replacement CCO>
  4. Switch to the root user.

    sudo -i 
  5. Once connected via SSH, verify the following VM settings.

    1. Check whether the interface configuration uses a static or DHCP address, as your setup requires. If you need to make changes, edit the file, then save and exit.

      cat /etc/sysconfig/network-scripts/ifcfg-eth0

      or

      vi /etc/sysconfig/network-scripts/ifcfg-eth0
    2. Check that the /etc/hosts file matches that of the two running CCO VMs. If it does not, add the missing entries on the replacement CCO3 node, then save and exit.

      cat /etc/hosts

      or

      vi /etc/hosts
    3. Verify the hostname in the /etc/hostname file. If you need to change it, edit the file, then save and exit.

      cat /etc/hostname

      or

      vi /etc/hostname

    4. Verify the DNS settings and confirm that a public hostname resolves.

      cat /etc/resolv.conf

      ping www.yahoo.com

    5. Set up SSH keys between all three CCOs. When you are done, you should be able to SSH from each of the three CCOs to the other two without being prompted for a username or password.

      When you boot using a key pair, the .ssh directory and the authorized_keys file already exist.

    6. On one of the existing CCO servers (in this procedural example, CCO1 or CCO2), issue the following commands.

      sudo -i
      cat /root/.ssh/id_rsa.pub
    7. On CCO3, open the authorized_keys file, paste the key text output by the previous command, then save and exit.

      vi /root/.ssh/authorized_keys
      #Move the cursor to the last line
      #Append the copied text from the id_rsa.pub file on a new line below it
    8. On CCO1 (assuming SFTP usage), copy the key files to CCO3. A connection that succeeds without a password prompt also validates SSH from CCO1 to CCO3.

      sftp <IP Address of CCO3>
      put /root/.ssh/id_rsa* /root/.ssh
      quit
    9. Validate that SSH from CCO2 to CCO3 works.

      ssh <CCO3 IP Address>
    10. Validate that SSH from CCO3 to CCO1 and CCO2 works.

      ssh <CCO1 IP Address>
      ssh <CCO2 IP Address>
    11. Check the permissions on the key files. CentOS 7 sets mode 600 on these files by default, so they do not need to be changed.

      ls -l /root/.ssh/
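
The SSH validation in steps 9 and 10 can be repeated from any CCO with a small script. The following is a minimal sketch, assuming the standard OpenSSH client; the function name check_peers is hypothetical. The BatchMode=yes option makes ssh fail rather than prompt for a password, so a success means key-based login works. A peer whose host key is not yet in known_hosts is also likely to be reported as failed, because batch mode cannot answer the confirmation prompt.

```shell
#!/bin/sh
# check_peers HOST...: attempt a non-interactive SSH login to each host
# and run the no-op command "true" there. Prints one result line per
# host and returns non-zero if any login failed.
check_peers() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

On CCO3, for example, pass the IP addresses of CCO1 and CCO2 as the two arguments to check_peers; repeat on CCO1 and CCO2 with their respective peers.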

                  

Error after Multiple CCOs are Restarted Simultaneously

Sometimes, CCOs do not come up successfully in HA mode due to an operation-heartbeat-timeout error.

When CCOs are simultaneously restarted, they need additional time to form a cluster and this may sometimes result in an operation-heartbeat-timeout.

Restart one CCO gateway service first and wait for it to be up completely. Then restart the other two CCO gateway services. You will know that the CCO gateway service is up completely when you see the following message in the gateway service log file (/usr/local/cliqr/logs/gateway.log):

2018-04-09 23:55:20,470 INFO  orchestrator.OrchestratorServer [main] - Started OrchestratorServer in 30.553 seconds (JVM running for 32.062)
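
Waiting for that message can be automated before you restart the remaining CCO gateway services. The following is a minimal sketch; the function names gateway_started and wait_for_gateway, the 5-second polling interval, and the timeout value are assumptions for illustration. It does not distinguish a message left over from an earlier start, so compare the timestamp on the matching log line with the time of your restart.

```shell
#!/bin/sh
# gateway_started LOGFILE: succeed if the log contains the startup message.
gateway_started() {
    grep -q 'Started OrchestratorServer in' "$1" 2>/dev/null
}

# wait_for_gateway LOGFILE TIMEOUT_SECONDS: poll every 5 seconds until the
# startup message appears (return 0) or the timeout is used up (return 1).
wait_for_gateway() {
    log=$1
    remaining=$2
    while ! gateway_started "$log"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 5
        remaining=$((remaining - 5))
    done
    return 0
}
```

For example, wait_for_gateway /usr/local/cliqr/logs/gateway.log 300 waits up to roughly five minutes on the first CCO before you move on to the other two.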



© 2017-2019 Cisco Systems, Inc. All rights reserved