Monday, October 4, 2021

The Person Factor

 With your experience and knowledge of technology and your ability to navigate the politics of work, you may find yourself as the main technical resource for troubleshooting. At this point in your career, you may be empowered to pull a troubleshooting team together to resolve a major issue.

It’s no surprise that you have to manage EVERYONE impacted by the issue, which may include technical teams, management, clients and anyone else with a vested interest in the current situation.

It’s a balancing act: you have to prioritize how best to troubleshoot the situation, deal with the various people involved and, oh yes, keep everyone informed. If you feel that you are being pulled in multiple directions, then you are doing things correctly.

Depending on what you’re troubleshooting, the situation may have caused multiple issues affecting other areas of business and require even more co-ordination.

Politics While Troubleshooting

Obviously your main priority is to resolve the issue as quickly as possible.

The challenge is managing the technical resources and keeping management in the loop so you don’t go insane. Like it or not, for the time being, you are the layer between your boss, peers, direct reports and clients. Below are some strategies that may help you through this stressful and demanding situation:

  1. Maintaining Professionalism

You will be dealing with various types of personalities and conflicts as well as providing continuous updates to your boss, clients and business operation stakeholders. Here are some key points that you and your team may follow to help maintain professionalism under stress:

  • Being organized and having a plan is an asset: the team works towards a common goal with key roles and responsibilities identified.

  • Defuse any conflict by staying calm and committing to address it after the technical issue has been resolved.

  • Being adaptable to change and displaying competence: as troubleshooting progresses and you deal with everyone involved, it may be necessary to shift gears in order to resolve the problem.

  • Being ethical, honoring your word and following up: be sure to honor any commitments made and communicate regularly.

  • Being respectful and polite – treating everyone from peers to clients respectfully

Maintaining professionalism gives you a professional advantage, maximizing your value in the organization and earning you respect and credibility in the workplace. You and your team will be regarded as reliable, competent, respectful and approachable.

  2. Identify a SPOC (Single Point of Contact):

Take the opportunity to identify a person or department that can provide updates to the teams, groups and clients. Having someone else communicate updates minimizes the time you spend constantly reporting on the issue, reduces distractions and gives you the breathing room you need to resolve it. Some key points in selecting the SPOC are:

  • Does the issue impact a specific project that someone is working on? There may be an opportunity to have them spearhead managing the updates and support issues as it pertains to the specific project.

  • Engage the Help Desk to provide updates to specific departments and individuals.

  • Delegate this to a team member who is being groomed for a management position or could benefit from the experience, giving them an opportunity to shine as they deal with a stressful situation. It’s also an opportunity for you to mentor and coach the team member.

  • If you are not authorized to delegate this to anyone, you can put forward a recommendation to your manager highlighting the benefits to the team member.

If you are relying on another department or person to handle this task, be sure to obtain their contact details and their work schedule.

  3. Communication and updates

People underestimate the power of staying in touch. Communication is key: not only do you need to provide continuous updates, but you also need to be clear when assigning accountabilities.

  • Provide a clear understanding of the tasks to be performed and identify who will be doing what.

  • Set clear expectations of how often updates should be provided. Update intervals will vary based on the severity of the problem and the sensitivity of the issue.

  • Tip 1: If you commit to providing updates every 30 minutes, ensure the people working on the problem provide you with an update every 25 minutes, giving you 5 minutes to summarize and send it out. Updates can be as simple as “Still working on the issue” or “There are no significant updates at the moment.”

  • Tip 2: Don’t get stressed out if someone is unable to provide an update at every interval; there may not be any changes at that time.

  • Tip 3: You can change the update intervals as the severity of the situation decreases, but be sure everyone has been informed of the new requirement.

  • Tip 4: Support your team by acknowledging that you are aware that they are busy and may lose track of time; let them know that you will contact them if you don’t hear from them in the time allotted.

  • Tip 5: Ensure that everyone understands an update can be a simple ‘still working on it’, ‘looking at router x’ or ‘problem is not software upgrade’. When people realize the updates are simple, short statements, they are more inclined to provide them.

  • Tip 6: Updates should also identify what isn’t the problem such as “We have determined the current issues are not related to the changes implemented this past weekend”.

  • Tip 7: Leverage technology such as email distribution, ticketing system, voice mail, texts, social media or any other means to provide updates.

  • Tip 8: Ensure people are aware that they are not to call for updates, and assure them that updates will be provided regularly.

  4. Stay Focused

Stick with the action plan. The key here is not to be pulled in too many directions or to deal with escalations. Escalations tend to add more resources into the mix to try and assist which may not be effective in resolving the issue.

  • Minimize any distractions and interruptions. I can tell you from personal experience that it is easy to be distracted when troubleshooting since everything is possibly part of the issue.

  • Encourage team members to follow the process. One of the challenges is managing all the underground support dialogue between clients and technical staff. Even with the best intentions, this is where things can go horribly wrong. If the technical team receives any information from anyone outside of the troubleshooting team, it needs to be vetted by the team leader.

Post-Troubleshooting Politics

Finally, the issue has been resolved. Now it’s time to deal with the post-outage review of what happened and how to prevent or mitigate it in the future. Here are some key points that may be helpful in following up after troubleshooting:

  • Keep things factual and exclude any personal feelings during this process. Describe in detail what caused the issue, how it was detected and reported, and the events and tasks that occurred to resolve it. Be sure to include timelines.

  • As part of the post-outage review, obtain and review feedback. Ensure the feedback isn’t viewed as negative. For example, I was involved in a situation where management questioned why it took so long to respond to the problem when the network monitoring system had identified an outage 30 minutes before anything was done. After chatting with the team, it was discovered that the Help Desk had an outdated contact list. This was an easy fix, and the Help Desk manager was asked to periodically review the contact list as part of their procedures.

  • With the details from the report, you and/or the management team can identify areas for improvement and implement any procedures or processes that can mitigate this from happening again.

  • I have been involved in many troubleshooting scenarios where the staff have told me what is wrong with the current process. I simply document the information provided, along with my sources and suggested solutions. I then present them to management, and when the suggestions are immediately adopted, the team is shocked at how simple it was.

  • The difference here is that staff believes implementing change is very difficult, but the catalyst for change is actually very simple.

  • With the incident documented, it can be used as a reference for any future outages.

  • Be sure to thank the team for all their hard work and support.

Finally, dealing with the politics of resolving the issue and managing the resources may be a stressful and chaotic situation. By maintaining your professionalism and focusing on the task at hand, your leadership and communication skills will be well received by all affected in this situation. Hopefully these tips and suggestions provide you with some value and guidance as you tackle your next troubleshooting adventure.


Monday, September 27, 2021

NTP Broadcast Issue

 Many of my regular customers refer to me as the ‘Network Janitor’ because I seem to gravitate to ‘cleaning up’ networks.

In some cases, yes, that means physically cleaning up and organizing datacenters, wiring closets, etc – kind of a network version of a personal organizer. In most cases, though, I clean up the network from the packet perspective.


For years I have been preaching concepts such as ‘The PC bootup and login baseline’ as well as ‘The VLAN or subnet broadcast analysis’. In both cases, I look for unnecessary traffic to make things run smoother and more efficiently.


In my throughput class I explain how quickly things get messy using basic math. For example, assume you have a 7% broadcast rate on a switch where everybody has a 1 Gbps connection.

Then, on this same switch, assume there is a wireless access point with a 100 Mbps connection. Here comes the math: 7% of 1 Gbps is 70 Mbps of broadcasts hitting an access point with 802.11g, or 54 Mbps, radios. That is more broadcast traffic than the radio’s nominal rate can carry. See what I mean?
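The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures from the example; the function and variable names are made up for this sketch, and 54 Mbps is the nominal 802.11g rate, so real-world radio throughput would be lower still.

```python
# Figures taken from the example above; names are illustrative only.
LINK_RATE_MBPS = 1000    # 1 Gbps wired ports
BROADCAST_PERCENT = 7    # assumed 7% broadcast rate
AP_UPLINK_MBPS = 100     # wired connection to the access point
RADIO_RATE_MBPS = 54     # nominal 802.11g data rate


def broadcast_load_mbps(link_rate_mbps, broadcast_percent):
    """Return the broadcast traffic in Mbps for a given link rate."""
    return link_rate_mbps * broadcast_percent / 100


load = broadcast_load_mbps(LINK_RATE_MBPS, BROADCAST_PERCENT)
print(f"Broadcast load: {load:.0f} Mbps")
print(f"Fits on the AP uplink ({AP_UPLINK_MBPS} Mbps): {load <= AP_UPLINK_MBPS}")
print(f"Fits on the radio ({RADIO_RATE_MBPS} Mbps): {load <= RADIO_RATE_MBPS}")
```

The point of the comparison is that the broadcasts squeeze through the access point’s wired port but not through its radio.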


This is precisely why I look for ways to minimize the number of broadcasts floating around your network.


In this specific example, an HP printer was using its default NTP configuration, where it transmits a broadcast packet looking for its time server or services. Since this is a large flat network, hundreds of devices ARP for the printer. This wouldn’t be an issue if there were fewer devices within this VLAN but, like I just said, hundreds of devices respond with an ARP broadcast.


Depending on when the devices’ ARP tables expired, I observed anywhere from approximately 50 to 7,000 broadcasts per second. After seeing this, the symptoms made perfect sense. When there were a lot of ARPs, the wireless users got kicked off and there were general performance issues everywhere.

When there were fewer, there were just performance issues and general network slowdowns.
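If you want to put numbers on a broadcast storm like this yourself, a rough sketch of the counting step might look like the following. It assumes you have already exported frame timestamps and destination MAC addresses from a capture into simple tuples; that tuple layout, the function name and the sample data are all made up for illustration and are not any particular tool’s export format.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcasts_per_second(frames):
    """Count broadcast frames in each one-second bucket.

    frames: iterable of (timestamp_seconds, destination_mac) tuples.
    """
    counts = Counter()
    for timestamp, dst_mac in frames:
        if dst_mac.lower() == BROADCAST_MAC:
            counts[int(timestamp)] += 1
    return dict(counts)


# Hypothetical sample data, not taken from the capture described above.
sample = [
    (0.10, "ff:ff:ff:ff:ff:ff"),
    (0.45, "00:11:22:33:44:55"),
    (0.90, "FF:FF:FF:FF:FF:FF"),
    (1.20, "ff:ff:ff:ff:ff:ff"),
]

rates = broadcasts_per_second(sample)
for second in sorted(rates):
    print(f"second {second}: {rates[second]} broadcasts")
```

Comparing the quietest and busiest seconds gives you the kind of range quoted above.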

Yikes!! Fortunately, in this case just the one printer was configured this way and was easily modified. Regardless, I showed the customer how their current network design is affected by broadcasts.


Note the space bar trick doesn’t work anymore.



Tuesday, September 21, 2021

Documenting Why “It’s Slow”

 I can’t tell you how many times I have heard that dreaded phrase, “It’s Slow”.

I’ve heard this so many times that I typically respond casually with, “Great, what is it and how slow is slow?”


The biggest issue I have with this statement is that this is the typical network complaint that sucks you into the troubleshooting vortex since nothing is clearly defined.


For example, if I said, “email is slow” or “it takes 2 hours to download my files”, you have a chance to address this since I can measure the problem and the end result. The 2-hour comment actually gives you a measurable value to compare against.


One of the toughest things about troubleshooting is when an assumption is made, like “drive x on the server is fine, therefore the server is fine”.


In this example I wanted to demonstrate to a client that one disk or file system on a server can be slower than the other. I also taught him some Wireshark tips and tricks along the way to help him in the future.
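The demonstration itself was done with Wireshark. As a different and much cruder way to get a measurable number, here is a minimal Python sketch that times a sequential read of a file and reports the throughput. The function names and the example paths are hypothetical, and caching can easily skew the result, so treat it as a starting point rather than a benchmark.

```python
import time


def read_throughput_mbps(path, chunk_size=1024 * 1024):
    """Read the file at `path` sequentially; return megabits per second."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        return float("inf")  # too fast to measure with this timer
    return total_bytes * 8 / elapsed / 1_000_000


def slowest_path(results):
    """Given {path: throughput_mbps}, return the path with the lowest rate."""
    return min(results, key=results.get)


# Example usage with hypothetical paths on two different file systems:
# results = {p: read_throughput_mbps(p) for p in (r"X:\test.bin", r"Y:\test.bin")}
# print(results, "slowest:", slowest_path(results))
```

Use the same test file on each drive so the numbers are comparable, and run it more than once since the first read and later cached reads can differ a lot.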


Enjoy

