Well, halfway through the morning I received a message basically saying "O SH**" and that the deployment plans had been deleted by accident. After a few minutes of fuming it was time to get this thing back up and running. I contacted our IT team only to find out the VM had not been backed up since September of last year. This day kept getting better and better. I took a little time to run through the deployment plans, build plans, and configurations on the server and document anything and everything I thought would or could be useful after the revert. I was surprised by the lack of options in Bamboo to recover from changes: when it's gone, IT'S GONE!
Long story short: we reverted. Luckily a lot of the deployment plan that was deleted was in that snapshot, but it got me thinking about what we could have done to avoid this problem altogether.
- Backups:
- Verify there is a CONSISTENT backup plan with your IT team. Consistent means a FULL BACKUP at least every few days and an incremental backup done nightly (see the sketch after this list for a quick way to catch stale backups).
- Backups can come at many levels. Understand the difference between a FULL SYSTEM backup and a backup the application provides. In our case Bamboo does have a backup 'option', but if you read the fine print it is not meant to be used in production. So we are leaning toward a full system backup; thankfully VMs make this a VERY easy process, not like yesteryear.
- At least every 6 months do a FULL SYSTEM restore from the backup to triple-check things will work as expected when the SH** hits the fan. Include the team in this recovery test; don't allow one member of the team to become the 'backup/recovery' person.
- Define which team is responsible for what part of the 'machine'. If the IT team is responsible for the underlying HOST but the development team is on the hook for the application, have that documented and agreed upon. You can't sort these things out in a time of crisis!
- DOCUMENT THIS PLAN and share it with all related parties!!
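A quick way to keep yourself honest about that backup cadence is a little watchdog. Here's a minimal sketch in Python that just checks how old the newest file in a backup folder is and complains when it crosses a threshold; the directory path and the 3-day limit are assumptions you'd swap for your own setup.

```python
#!/usr/bin/env python3
"""Warn when the newest backup is older than a threshold (a minimal sketch)."""
from datetime import datetime, timedelta
from pathlib import Path
import sys

BACKUP_DIR = Path("/mnt/backups/bamboo-vm")   # hypothetical backup location
MAX_AGE = timedelta(days=3)                   # "full backup at least every few days"

def newest_backup(directory: Path):
    """Return the most recently modified file in the directory, or None."""
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)

def main() -> int:
    if not BACKUP_DIR.is_dir():
        print(f"NO BACKUP DIRECTORY at {BACKUP_DIR}")
        return 2
    latest = newest_backup(BACKUP_DIR)
    if latest is None:
        print(f"NO BACKUPS FOUND in {BACKUP_DIR}")
        return 2
    age = datetime.now() - datetime.fromtimestamp(latest.stat().st_mtime)
    if age > MAX_AGE:
        print(f"STALE BACKUP: {latest.name} is {age.days} days old (limit is {MAX_AGE.days})")
        return 1
    print(f"OK: newest backup {latest.name} is {age.days} days old")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run something like this from cron or your monitoring system so a backup that quietly stopped in September screams at you long before you actually need it.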
- User Permissions:
- Who on your team should have DELETE permissions on anything? Is it necessary?
- Should DELETE only be given to the Manager or LEAD Developer? But who watches the 'watchers'?
- Stick to LEAST PRIVILEGE for users no matter who they are, and make them follow the process when additional permissions are needed (a toy example follows this list).
- Have some level of skills/knowledge 'check' for up-and-coming developers or team members so you can provide the necessary training before handing over the keys to the kingdom. A junior-level developer might not have any idea what the CI/CD server does, so don't allow them to go in as a member of the ADMIN group and make changes.
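To make the LEAST PRIVILEGE idea concrete, here's a toy sketch. This is not Bamboo's actual permission model; the roles, permission names, and users are all made up. The point is simply deny-by-default, with DELETE as a narrow, deliberate grant.

```python
"""A toy illustration of LEAST PRIVILEGE -- not Bamboo's real permission model.
All role names, permissions, and users below are hypothetical."""
from enum import Enum, auto

class Permission(Enum):
    VIEW = auto()
    EDIT = auto()
    DELETE = auto()   # deliberately absent from most roles below

# Deny by default: joining a team grants only what that role explicitly lists.
ROLE_GRANTS = {
    "junior-dev": {Permission.VIEW},
    "developer":  {Permission.VIEW, Permission.EDIT},
    "lead":       {Permission.VIEW, Permission.EDIT, Permission.DELETE},
}

def can(role: str, permission: Permission) -> bool:
    """Unknown roles get an empty set, i.e. nothing."""
    return permission in ROLE_GRANTS.get(role, set())

assert not can("junior-dev", Permission.DELETE)   # juniors can't nuke deployment plans
assert can("lead", Permission.DELETE)             # DELETE is an explicit, narrow grant
assert not can("contractor", Permission.EDIT)     # unmapped role -> denied by default
```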
- Changes:
- Changes are changes whether they are code, tests, build plans, or deployment plans. They should go through some level of Peer Review (see the sketch after this list for one way to make plan changes reviewable).
- An added bonus of Peer Review is that you get another set of eyes on the change, which boosts cross-training and reduces the SPOF (single point of failure) risk within the team.
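One way to force build/deployment plan changes through the same Peer Review as code is to keep the plan configuration in version control. The sketch below assumes something else already exports the plan config to files under a git checkout (Bamboo's specs/export features can fill that role, but that step isn't shown); the path, branch, and commit message are placeholders.

```python
"""Sketch: snapshot exported build/deployment plan config into git so every
change shows up as a reviewable diff. Assumes the plan config has already been
exported to files under EXPORT_DIR, which is a git checkout."""
import subprocess
from datetime import date
from pathlib import Path

EXPORT_DIR = Path("/srv/ci-config-export")   # hypothetical export location (a git checkout)

def run(*args: str) -> None:
    subprocess.run(args, cwd=EXPORT_DIR, check=True)

def snapshot() -> None:
    run("git", "add", "--all")
    # 'git diff --cached --quiet' exits non-zero when there are staged changes.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=EXPORT_DIR)
    if staged.returncode != 0:
        run("git", "commit", "-m", f"CI/CD config snapshot {date.today()}")
        run("git", "push", "origin", "HEAD")   # raise a pull request from this branch for review

if __name__ == "__main__":
    snapshot()
```

The payoff is that a deleted deployment plan then shows up as a big red diff in a pull request instead of a surprise halfway through your morning.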
- Training and Documentation:
- The more folks know, the better off the organization will be; train the team(s) on the technology and the PROCESS around builds/deployments.
- Keep a high-level diagram showing the network, hosts, and communications so the overall process is easy to explain and management of the IPs/hosts is clear.
- Keep in mind that your document repo (Confluence) is just another system that can go down. So if you are relying on that server because it holds all your backup/emergency SOPs, you better have a plan B (one possible plan B is sketched below)!
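For that documentation plan B, here's a rough sketch that mirrors a handful of emergency SOP pages out of Confluence to a location that doesn't depend on Confluence being up. The base URL, page IDs, and credentials are placeholders, and you should verify the content REST endpoint against your own Confluence version.

```python
"""Sketch: mirror emergency SOP pages out of Confluence to an independent location.
Base URL, page IDs, and credentials are placeholders; check the content REST
endpoint against your Confluence version before relying on this."""
from pathlib import Path
import requests

BASE_URL = "https://confluence.example.com"       # hypothetical Confluence server
SOP_PAGE_IDS = ["123456", "123457"]               # hypothetical page IDs for the SOPs
MIRROR_DIR = Path("/mnt/offline-docs/sops")       # somewhere Confluence-independent
AUTH = ("svc-backup", "app-password")             # placeholder credentials

def mirror_page(page_id: str) -> None:
    resp = requests.get(
        f"{BASE_URL}/rest/api/content/{page_id}",
        params={"expand": "body.storage"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()
    out = MIRROR_DIR / f"{page['title']}.html"
    out.write_text(page["body"]["storage"]["value"], encoding="utf-8")

if __name__ == "__main__":
    MIRROR_DIR.mkdir(parents=True, exist_ok=True)
    for pid in SOP_PAGE_IDS:
        mirror_page(pid)
```

Even a nightly copy to a fileshare or a printed binder beats discovering mid-outage that the recovery runbook lives on the server that's down.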
Today was rough, but with any issue comes opportunity. Use the 'lesson' and learn from it so it doesn't happen again. EVERYONE should walk away from the experience with more knowledge of the tool/process, better skills around the tool, and the confidence that this type of issue will NOT happen again because the team is taking the right steps in the future.
What are some of your worst backup/recovery experiences?
Bartlett