Category: Organization

I told you so...

1/2/2022

You Never Know What You Are Going to Get

Who doesn’t like a good meeting?

As we gathered in the conference room to discuss transitioning applications to the Development team, I knew we were about to uncover some nasty secrets hidden in the OPS team... The manager presented an application called CENTRON, which is basically an analyst task and incident tracking tool with scheduling capabilities. I was familiar with this app. Two years ago, during the initial discussions of the app, I had recommended that the code, requirements, and issue tracking be done using the Development team's dev studio. We got as far as importing the v1 code base into the repo and then never heard anything else from the application author/developer/team... Now, with turnover and organization ‘realignment’, they were coming to the development team for help (and for us to take over ownership of the tool). I was about to say “I told you so...”, but when I looked around the room there was NO ONE from the prior organization that I had worked with or who had worked on the tool. Acquisitions can have a huge impact on people and vision. So I focused my energy on cleanup and getting this application under a supportable maintenance cycle.

Laying the Foundation

First step, onboard the project into the Development Studio:

Create needed Confluence space to store documents, notes, requirements, etc.
Create needed Jira Project for issue management, resource planning, and prioritization
Create needed Bitbucket repo(s) for storing code and configuration items.
Create needed Bamboo plans for CI/CD

Our Development Studio has all the necessary components to onboard any type of project and work it from the requirements-gathering phase all the way to the automated deployment of said tool. The studio doesn’t just have integrated tools that make for quick execution; it also has the documentation and processes to keep the studio running. If folks are hired or the path of the team changes, there should be an overall layout of how things run. In this day and age of all things Agile, we sometimes lose sight of the fact that documentation needs to be written, ‘lived’, and reviewed to keep it up to date and relevant to current day-to-day operations.

Collect ALL CODE!

Need to get ALL code and configurations under version control. A majority of the work done to the app was on the production server so it was a phased approach:
- Get all code into the repo
- Verify the files ‘match’ what is available in production
- Cleanup unneeded files
  - With many projects that do not have a repeatable release process, you will find backup files and directories included (e.g., CENTRON.php_backup or CENTRON.php_old).
Removing the unnecessary files allows the new maintainers to focus on the important files/parts of the tool. Any time saved is well spent. Think about the code review of the application by the ‘new’ development team: if they are reviewing copies of the originals, that is time spent on ‘wasted work’ and ineffective cycles. Removing the files saves that time over and over again in the future.
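The "verify the files match" and "cleanup" steps above can be partly scripted. Here is a minimal sketch in Python, assuming you have a checkout of the repo and a copy of the production directory side by side on disk. The function names and the list of backup suffixes are hypothetical and only for illustration; adjust them to whatever your server actually contains.

```python
import hashlib
from pathlib import Path

# Hypothetical suffixes for leftover copies (assumption; extend as needed).
BACKUP_SUFFIXES = ("_backup", "_old", ".bak", ".orig")


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest, skipping .git."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }


def compare(repo_root: Path, prod_root: Path) -> dict[str, list[str]]:
    """Report files that differ, exist on one side only, or look like backups."""
    repo, prod = snapshot(repo_root), snapshot(prod_root)
    shared = repo.keys() & prod.keys()
    return {
        "missing_from_repo": sorted(prod.keys() - repo.keys()),
        "missing_from_prod": sorted(repo.keys() - prod.keys()),
        "content_differs": sorted(k for k in shared if repo[k] != prod[k]),
        "backup_candidates": sorted(k for k in prod if k.endswith(BACKUP_SUFFIXES)),
    }
```

Calling `compare(Path("repo_checkout"), Path("prod_copy"))` (both paths are placeholders) gives you a worklist: anything under `missing_from_repo` or `content_differs` still needs to be reconciled, and `backup_candidates` is only a list to review by hand before deleting anything.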

Documentation Review/Creation:

Collect all related/relevant documentation and notes about the application, and store them all in a Confluence space for the application.
Create a System Integration Diagram to visualize the other systems the application integrates/communicates with. Without some type of diagram or map you can refer to, it becomes very difficult to understand the other pieces of the puzzle.
ANY documentation about the system/application will help you in the long run!

System Review:

Review the current application in production
Note directories, configuration items, notes, security concerns, etc.
Add all notes to a central page in Confluence
Maintaining an application in a production environment is about more than just the CODE BASE! You need to understand the overall architecture of the application and how it integrates and communicates with other tools and services. Operating system level configurations, asset details, user/account information, and outage and recovery procedures/documents are all important and cannot be overlooked.

Backups/Recovery:

Do a complete backup of the system before ANYTHING is done on the server.
- In our case it is a VM snapshot, which makes life soooo much easier than yesteryear
- Along with the backup procedure there should be recovery steps documented somewhere for the greater good and to reduce SPF (Single Point of Failure) within the teams.
  - It’s hard to train without some documentation.
  - How do you know the procedure is followed if it isn’t documented?

A New Day
Start anew:

Deploy a new server (VM) to replace the old production server. After years of a system living in production you can no longer fully understand who was on the box, what they did on the box, what changes were made on the box, and who has root access on the box. You are better off starting from a clean slate.
Determine how the code and components can be more easily managed. Does it make sense to switch to containers so that different parts/components can be updated piece by piece or all at once?
Determine how testing will be done on the application. This project had no unit or integration tests. Any change to the code had to be manually verified, with the knowledge that other areas of the tool could break. We implemented a basic testing framework and focused on adding tests as we fixed bugs and/or implemented new features.
Determine how deployments will be done. The old application was updated by hand on the server. We moved it to a Bamboo deployment plan which pulls Docker images down from our private repo and starts them with a docker-compose file. Now deployments are automated and done by the machine, which greatly reduces the errors made when someone makes the changes by hand.
Determine when and how the "lift and shift" will happen from the old production server to the new pristine server. Keep in mind that with any active application, communication is key. Users need to be aware of when maintenance windows will happen and understand "how" to handle this time in their day-to-day jobs. Very much like a disaster incident, folks will need to know how to do their work without this system for a short period of time. Have a shadow period with both systems up and running. This way you can always hit the "O SH**" handle if things are found missing (or wrong) on the new system.
Determine where issues, complaints, and feedback will go during the cutover AND moving into the future. We have a designated email distro list set up just for this. The next step for us is to automate case creation from email submissions (e.g., outage email submitted, case created and put in the IT team queue for resolution).
Document and make known where your documentation is for the application, system, and environment. If your IT team has to respond to outages they better know where to go and what to do.
Make an effort to understand WHY and WHAT areas of the tool/application are used for operations. Times change and new technology/tools come along, so don’t repeat the same steps when new and improved actions are available. Management input will be a huge part of this discussion; they will need to drive the change and decide what will happen with legacy applications in your workplace!
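Going back to the email-to-case automation mentioned above as our next step, here is a rough sketch of what the parsing side could look like, using only Python's standard `email` module. The `Case` fields, the keyword list, and the "Development" queue for non-outage mail are all assumptions made for illustration; how the case actually gets created (Jira or otherwise) is left out on purpose.

```python
import re
from dataclasses import dataclass
from email import message_from_string
from email.policy import default

# Hypothetical keywords that mark a message as an outage report (assumption).
OUTAGE_KEYWORDS = {"outage", "down", "unavailable"}


@dataclass
class Case:
    queue: str
    priority: str
    reporter: str
    summary: str
    description: str


def case_from_email(raw_message: str) -> Case:
    """Turn a raw email into a case record; outage mail goes to the IT queue."""
    msg = message_from_string(raw_message, policy=default)
    subject = str(msg["Subject"] or "(no subject)")
    # Prefer the plain-text body; fall back to an empty description.
    body_part = msg.get_body(preferencelist=("plain",))
    description = body_part.get_content().strip() if body_part else ""
    # Match whole words so "download" does not count as "down".
    words = set(re.findall(r"[a-z]+", subject.lower()))
    is_outage = bool(words & OUTAGE_KEYWORDS)
    return Case(
        queue="IT" if is_outage else "Development",  # non-outage queue is an assumption
        priority="high" if is_outage else "normal",
        reporter=str(msg["From"] or "unknown"),
        summary=subject,
        description=description,
    )
```

Keeping the parsing separate from the case creation means the routing rules can be tested with plain strings before anything is wired up to a real mailbox or ticket system.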

You Don't Need a backup, until you NEED THE BACKUP!

3/13/2018

Had an unfortunate event at work today. One of my coworkers deleted the deployment plan for one of our projects. Not just one of the deployment plans, but the WHOLE project's deployment plans. We are an Atlassian shop utilizing everything from Confluence (for requirements and documentation), Jira (for issue and project management), and Bitbucket (for code repo, peer review, branch management) to Bamboo (for builds/automated testing/deployments). Bamboo can be a pain at times, but it gets the job done, and like any CI/CD server it has its quirks and you need to provide it proper care and feeding.

Well, halfway through the morning I received a message basically saying "O SH**" and that the deployment plans had been deleted by accident. After a few minutes of fuming it was time to get this thing back up and running. I contacted our IT team and found out this VM had not been backed up since Sept of last year. This day kept getting better and better. I took a little time to run through the deployment plans, build plans, and configurations on the server and document anything and everything I thought would or could be useful after the revert. I was surprised by the lack of options in Bamboo to recover from changes. When it's gone, IT'S GONE!

Long story short: we reverted. Luckily, a lot of the deleted deployment plan was in that snapshot, but it got me thinking about what we could have done to avoid this problem altogether.

Backups:
- Verify there is a CONSISTENT backup plan with your IT team. Consistent being a FULL BACKUP at least every few days and an incremental backup done nightly.
  - Backups can come at many levels. Understand the difference between a FULL system backup and a backup the application provides. In our case Bamboo does have a backup 'option', but if you read the fine print it is not meant to be used in production. So we are leaning toward a full system backup. Thankfully VMs make this a VERY easy process, not like yesteryear.
  - At least every 6 months do a FULL SYSTEM restore from the backup to triple check things will work as expected when the SH** hits the fan. Include the team in this recovery test; don't allow one member of the team to become the 'backup/recovery' person.
  - Define which team is responsible for what part of the 'machine'. If the IT team is responsible for the underlying HOST but the development team is on the hook for the application, have that documented and agreed upon. You can't do these things in times of crisis!
  - DOCUMENT THIS PLAN and share it with all related parties!!
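A plan like this is easier to keep honest if something checks it for you. Below is a small Python sketch that flags overdue backup activities. The thresholds are my reading of the list above ("every few days" is taken as 3 days, "every 6 months" as 183 days), and the activity names are made up for the example; where the last-run timestamps come from depends on your IT team's tooling.

```python
from datetime import datetime, timedelta

# Assumed thresholds derived from the plan above; tune to what your teams agree on.
MAX_AGE = {
    "full_backup": timedelta(days=3),
    "incremental_backup": timedelta(days=1),
    "restore_test": timedelta(days=183),
}


def stale_items(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the backup activities that are overdue or were never recorded."""
    overdue = []
    for name, max_age in MAX_AGE.items():
        when = last_run.get(name)
        if when is None or now - when > max_age:
            overdue.append(name)
    return overdue
```

Run against our situation (last full backup in September, checked in March), this would have flagged `full_backup` months before anyone deleted anything.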
User Permissions:
- Who on your team should have DELETE permissions on anything? Is it necessary?
- Should DELETE only be given to the Manager or LEAD Developer? But who watches the 'watchers'?
- Stick to LEAST PRIVILEGE for users no matter who they are and make them follow the process when additional permissions are needed.
- Have some level of skills/knowledge 'check' for up and coming developers or team members so you provide necessary training before handing over the keys to the kingdom. A Junior level developer might not have any idea what the CI/CD server does, so don't allow them to go in as a member of the ADMIN group and make changes.
Changes:
- Changes are changes whether they are code, tests, build plans, or deployment plans. They should go through some level of Peer Review.
- An added bonus of Peer Review is that you get another set of eyes on the change, which boosts cross training and relieves some of the SPF (single point of failure) within the team.
Training and Documentation:
- The more folks know, the better the organization will be. Train the team(s) on the technology and PROCESS around builds/deployments.
- Keep a high level diagram showing the network, hosts, and communications so explaining the overall process is easy to understand and management of the IPs/hosts is clear.
- Keep in mind that your document repo (Confluence) is just another system which can go down. So if you are relying on that server because it holds all your backup/emergency SOPs you better have a plan B!

Today was rough, but with any issue comes opportunity. Use the 'lesson' and learn from it so it doesn't happen again. EVERYONE should walk away from the experience with more knowledge of the tool/process, better skills around the tool, and the confidence that this type of issue will NOT happen again because the team is taking the right steps in the future.

What are some of your worst backup/recovery experiences?

Bartlett

All Aboard

12/27/2017

I updated my All Aboard presentation. It covers my thoughts on project management, vision, goals, and impediments.

Communications

8/4/2017

If you have poor collaboration tools and/or folks don’t know how to use them, you will be in trouble. Defining which tool is used and when seems like a silly task, but it will save you a lot of heartache in the long run.

Slack
If you use Slack, each channel should have a designated use and the team should know when Slack is the appropriate medium vs something like email or documenting in Confluence. Here are a couple of channel examples with 'use definition':

JIRA - Any update to an issue (added, modified, comments)
Confluence - Any update to a space/page (add, modified, comments)
Bitbucket - Any update to Pull requests (add, modified, comments)
Develop - Any update related to the actual development work or questions
Process - Any update about the overall development process or changes to the process

One thing to think about when creating channels in Slack: will the channels be granular enough to make sense if multiple projects use the integration? If project A and project Z both use the existing channels, will the messages confuse other teams? Should we create specific channels for each project? With 'Project A - Jira' and 'Project Z - Jira', only the related issues are seen in each channel, and teams can subscribe to the channels they care about without being inundated with details of projects they don't need to worry about.
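If you do go with per-project channels, it helps to settle on one naming rule so every integration posts to a predictable place. Here is a tiny Python sketch of such a rule. The naming scheme, and the choice to keep Develop and Process as shared channels, are assumptions for the example, not something Slack or the Atlassian integrations require.

```python
# Tool names come from the channel list above; the naming scheme is an assumption.
TOOL_CHANNELS = {"jira", "confluence", "bitbucket"}
SHARED_CHANNELS = {"develop", "process"}


def channel_for(project: str, source: str) -> str:
    """Return the Slack channel an update from `source` should be posted to."""
    source = source.strip().lower()
    if source in SHARED_CHANNELS:
        # Team-wide topics stay in one channel regardless of project.
        return f"#{source}"
    if source not in TOOL_CHANNELS:
        raise ValueError(f"no channel defined for source: {source!r}")
    # "Project A" becomes "project-a" so channel names stay lowercase and unspaced.
    slug = "-".join(project.strip().lower().split())
    return f"#{slug}-{source}"
```

With this rule, `channel_for("Project A", "Jira")` and `channel_for("Project Z", "Jira")` land in separate channels, so each team only subscribes to the ones it cares about.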

Collaboration Tools
A quick list of collaboration tools and their uses will go a long way when onboarding new employees into the environment or adding different teams into the development studio.

Confluence - Project documentation and requirements are stored here. Different spaces are created for each project.
Jira - Issues/Work items are created here. Each 'project' has its own project in Jira. Boards are created to show one or multiple project statuses.
Bitbucket - Code repository. All code is stored in this repo and any changes to code are made through a branch and pull request defined in the development process.
Jenkins/Bamboo/TeamCity - CI server is used to create builds, run tests (unit/e2e/etc), and do deployments
Webex - Used for meetings with more than 2 people and screen sharing.

Once everyone understands when/where/how to communicate it will save the team a ton of time.

Idea Central

12/12/2015

With any project, a central idea/collaboration site is crucial.

Establish a central idea store for ideas, questions, requirements, and ramblings. In my past engagements I've used Confluence, Alfresco, and SharePoint. Pick something that is easy to use/setup, has integration capabilities with other tools (plugins for issue tracking/management systems, code repository, email, IM, etc), and keep an eye on the license structure (is it 2 dollars a person till you get to 10 users where the vendor charges $5000 for a license?).
Create the layout of the idea store. If you don't spend the time up front to decide where notes, design docs, meeting notes, and general ideas will be stored, you will end up with a disorganized website where no one can find relevant information. Create a quick site map or hierarchy diagram of where things will 'live'. Based on the type of site (development, project, team) the content hierarchy will be different.
Get the team using it! Provide a quick training (or create a walkthrough video and post it for the team to view at their leisure). Monitor what is being posted to the site: what it is, where it is going, and whether any changes are necessary to the structure or labeling used for pages. Pay close attention to how the 'other' communication technologies are being used. Are folks still sending information directly related to development/requirements over email or IM? If so, remind the team about the site and where this type of information should live. The HABIT is the hardest thing to change. If folks are used to communicating all over Lync or IM it is a mindset change, but once the information is available and easy to find weeks or months down the road, the whole process will improve.
REVIEW the site regularly. Are things still organized? Is there information which is outdated and should be moved or reordered? People change, processes change, and technology changes, so make sure you are improving your site over time.

Here is an example site layout a development team may use to get started:

Team/Project Homepage
    - Development Documentation
        - Architecture/Design Documentation
        - Development Process Document
    - FAQ Page
    - How To Articles
        - How to log in to Time System
        - etc
    - Meeting Notes
        - Meeting on 01/01/2016 for Product Vision
        - etc
    - Retrospectives
        - Retro for Sprint 99
        - etc
    - Links
    - Product Requirements
        - Turn the green button blue
        - etc
    - Scrum/Sprint Documentation
        - Sprint Roadmap Page (showing objectives for next few sprints with linkage)
        - Definition of Done
        - Team Rules