I just got back from the 2010 QCon SF conference. It was an exciting week filled with great talks and some not-so-great ones. As usually happens after a conference, I am full of new ideas and excited about how we can apply what I heard to our work at Edmunds. Sometimes the ideas one comes away with work and sometimes they don't; regardless, one always learns something new, and trying new ideas, even those that fail, teaches valuable lessons.
Over the course of the next week I'll be covering "What I learned at QCon" focusing on the following:
- Software Design
- Pretotyping Rocks
- Continuous Deployments
Each post will cover several speakers as well as my thoughts on how to apply what I heard at Edmunds.
We just posted another "Paddy Does Coherence" video on YouTube. Check it out:
Also, Karim Qazi, one of our engineering directors, wrote a great post on data versioning and Coherence. You can find it on the technology.edmunds.com blog: http://technology.edmunds.com/blog/2010/10/keeping-data-backward-compatible-with-coherence-pof.html
We just posted a short video describing Coherence that was shot a while back for our technology teams.
We are currently using Oracle's Coherence product as the primary data store behind our website. Coherence provides a scalable, fast data grid that lets developers access data easily and supports versioning, so that data structures can morph over time.
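To make the versioning idea concrete, here is a rough plain-Java sketch. The class and field names are hypothetical, and this models the pattern rather than Coherence's actual POF API: the version number travels with the serialized data, so a reader can default fields that older entries never wrote.

```java
import java.io.*;

// Hypothetical record; illustrates how a versioned format lets structures evolve.
class VehicleRecord {
    static final int CURRENT_VERSION = 2;
    String make;
    String model;
    String trim; // field added in version 2

    static VehicleRecord read(DataInputStream in) throws IOException {
        VehicleRecord r = new VehicleRecord();
        int version = in.readInt();   // the version travels with the data
        r.make = in.readUTF();
        r.model = in.readUTF();
        // Fields added later are defaulted when reading older data, so
        // entries already in the grid stay readable as the schema morphs.
        r.trim = version >= 2 ? in.readUTF() : "base";
        return r;
    }
}

public class VersionDemo {
    public static void main(String[] args) throws IOException {
        // Simulate an old (version 1) entry already sitting in the grid.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(1);
        out.writeUTF("Honda");
        out.writeUTF("Civic");

        VehicleRecord r = VehicleRecord.read(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(r.make + " " + r.model + " " + r.trim);
    }
}
```

The point is that new code never has to be rolled out in lockstep with a data migration; old and new entries coexist in the grid.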
To date we have deployed our grid following the same pattern we used for our relational databases: a separate grid for each integration environment. As integration environments grow, we need to create more and more grids. Our ultimate goal for environments is a virtual stack that development teams can spin up and down as needed; given our current deployment architecture, that would greatly multiply the number of grids we need.
A potential solution is to treat our data services more like a shared service that can be used by anyone. In this model there would be one production data grid providing tested, approved data services to development, QA, and production environments.
The upside is that data consistency would be easy to maintain, and deployment and management would be simpler. The data services would roll through their own deployment model and have internal integration environments that the services team could use for testing prior to releasing a new version of a service. Service upgrades would be independent of all other code (in many cases they already are) and would be instantly available to all consumers. That instant availability would also force the services team to focus on backward compatibility.
There are also many downsides, most notably that non-production applications could negatively impact our production website. Such impacts could be mitigated by ensuring that only production-released data services clients are used by applications (our services all have client access libraries); however, that may not be enough to prevent a rogue process from impacting end-user performance and, thus, revenue.
Perhaps there is a hybrid approach based on SLAs, such as an internal-facing production grid and an external-facing production grid. Regardless of the eventual topology, our move toward virtualization has an enormous impact on "shared" resources, and I believe that as technologists we need to take in the big picture so that small, seemingly disconnected decisions do not lead us toward a future in which we trap ourselves in a corner. Every day we make small decisions that, made without regard for the bigger picture, lead us toward an architecture that is not thought through and can have long-term negative impacts. As a technology manager and leader, it is my job to ensure that the big-picture questions are at least asked.
I am still excited about moving to a distributed source control management system. I think there are a lot of benefits that would come along with it; however, there are also a number of obvious costs that I haven't been paying attention to.
Our current SCM, Perforce, is used to store both our code and our content. My original goal was to have a single SCM system for both. By keeping code and content in the same system, I thought, we could more easily manage releasing them together. However, our content publishing system abstracts the deployment of content to a certain extent using publishing "packages" (i.e., p4 labels), and should allow us to run separate systems and still meet our release management requirements. So while I would like to have one system to manage, perhaps it is not such a hard requirement.
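The "package" idea can be sketched roughly as follows. This is a hypothetical plain-Java illustration, not our actual publishing code (our real packages are p4 labels), but the essence is the same: a package pins each content path to an exact revision, so a release is reproducible no matter what the SCM head looks like later.

```java
import java.util.*;

// Hypothetical sketch of a publishing package: like a p4 label, it pins
// each content path to a specific revision, decoupling what gets
// published from whatever is currently at the head of the SCM.
class PublishPackage {
    final String name;
    private final Map<String, Integer> pinnedRevisions = new LinkedHashMap<>();

    PublishPackage(String name) { this.name = name; }

    void pin(String path, int revision) {
        pinnedRevisions.put(path, revision);
    }

    // The publisher walks the pinned set; it never asks the SCM for "latest".
    List<String> publishPlan() {
        List<String> plan = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pinnedRevisions.entrySet()) {
            plan.add(e.getKey() + "#" + e.getValue());
        }
        return plan;
    }
}

public class PackageDemo {
    public static void main(String[] args) {
        PublishPackage pkg = new PublishPackage("site-release-2010-11");
        pkg.pin("/content/home/index.html", 12);
        pkg.pin("/content/reviews/honda-civic.html", 4);
        System.out.println(pkg.publishPlan());
    }
}
```

Because the package, not the SCM, is the unit of release, the content store behind it could in principle be swapped without the release process noticing.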
My biggest concern about moving is that we have a large number of developers and, because our content management system uses Perforce as its data store, a large number of non-developer users. It will take time and training to move the hundred or so people to a new SCM. The money spent moving people needs to be weighed against the efficiencies gained via Git (or Mercurial). Some of these costs could be diminished by keeping Perforce as our content repository and only moving our code to Git (or Mercurial). At the end of the day, moving to a new system needs to provide more benefit than the cost and time sucked up by the move. Right now I am beginning to be skeptical that that is the case, and I am beginning to think there are more pressing problems for us to tackle, such as continuous deployments.
This weekend I picked up a Jon Wegener alaia. Riding a finless board is very difficult and requires relearning how to surf. However, it is teaching me a lot about how my "regular" boards work. It could be that I just want change for the sake of change, but I feel that trying different boards helps me become a better surfer.
Software is the same way. I've spent the last decade writing Java applications that use a relational database to store data. In the last 18 months or so I have been leading a change throughout the development organization to dismantle our RDBMS infrastructure and move our production systems toward technologies such as Solr and Coherence. Additionally, I've started using Python to test MapReduce jobs, and we moved our build system from Ant to Maven. We've had a lot of change over the last year, and more is coming. My hope is that the change forces us to rethink our assumptions and leads us to create faster, more resilient software.
One thing I have learned from my new board and from the changes we have instituted over the last year: you have to be comfortable wiping out. The trick is to make sure you don't give up and to realize that things will get better. Remember, change requires practice and, most importantly, have fun wiping out.
I've been reading a very good book from our chairman's reading list called Open Leadership by Charlene Li. The book has me thinking about what it means to be a manager in technology. Many of us have risen through the ranks due to our technical expertise and our ability to design and implement solutions to technical and business problems. As we become managers we are constantly faced with letting go and having others do the work we prided ourselves on being so good at. I struggled for a long time with letting go of coding; now my struggle is letting go of designing the solutions.
Open Leadership is really driving home the point that I need to let go of the solution. As a technology leader I need to ensure that my team is focused on the right problems and empower them to come up with solutions. Part of empowering them is allowing them to fail safely. As someone who has prided himself on delivery, it is scary to step back and give up the reins. The irony is that my hold on the reins is purely imaginary. Once I stopped coding every day (I still write some code), I had already handed the reins over to developers more talented and capable than I.
I've been struggling for a long time to figure out how to get my team to really embrace owning quality. It is almost as if, as long as there is a QA engineer on the team, the developers rely on the QA engineer to find defects and enforce quality. While that is ostensibly the QA engineer's job, I'd like our developers to ensure that their code is defect free before handing it off to the QA engineer. Often QA finds simple bugs that lead me to believe the code was never tested in any rigorous way before the developer was "done." After reading some posts on the Kaching team's blog regarding continuous deployments, I started thinking that their approach really focuses the technical leads and developers on the quality of their work. If you know that within a few minutes your committed code will be live, you will probably spend more time ensuring it is defect free. If you don't, it will become readily apparent; if not immediately, then after a few late-night incidents.
What I need to figure out is the code management strategy for continuous deployments, what sort of tools we'll need, and how Git fits in.
There has been a lot of talk around the office the last few weeks about source control management. Since I have been at Edmunds we have always used Perforce as our version control system. It seems to work for most users, and we have even used it as the storage engine for our content management system.
However, Perforce does have its problems. The one I find most annoying is that the state of my local repository is stored on the server as part of my workspace specification. This "feature" forces us to implement a number of hacks within our CMS to support our centralized publishing service, which can publish content from any branch or any label. The way workspaces keep track of versions means we do a lot of work to allow the central service to meet our needs. Our first implementation used Subversion, which has its own set of problems; however, one thing we did like was that we did not need to sync to disk for anything: we could request a specific version of a file from the Subversion server directly and send it out over our publishing bus without ever hitting disk.
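The property we liked in the Subversion-era design can be captured in a small sketch. This is an illustrative plain-Java stand-in, not our actual code or any SCM's real API: the publisher names an exact (path, revision) pair and the bytes flow straight to the bus, with no client workspace or sync step in between.

```java
import java.io.*;
import java.util.*;

// Illustrative interface: a caller can request an exact (path, revision)
// pair and receive the bytes as a stream, never materializing a
// client-side workspace on disk.
interface VersionStore {
    void streamVersion(String path, int revision, OutputStream out) throws IOException;
}

// Toy in-memory implementation standing in for the SCM server.
class InMemoryVersionStore implements VersionStore {
    private final Map<String, List<byte[]>> history = new HashMap<>();

    void commit(String path, byte[] content) {
        history.computeIfAbsent(path, k -> new ArrayList<>()).add(content);
    }

    @Override
    public void streamVersion(String path, int revision, OutputStream out) throws IOException {
        // revisions are 1-based, like p4/svn revision numbers
        out.write(history.get(path).get(revision - 1));
    }
}

public class PublishDemo {
    public static void main(String[] args) throws IOException {
        InMemoryVersionStore store = new InMemoryVersionStore();
        store.commit("/content/index.html", "v1 of the page".getBytes());
        store.commit("/content/index.html", "v2 of the page".getBytes());

        // Publish revision 1 straight onto the "bus" (here, a byte buffer),
        // even though head is revision 2 -- no sync-to-disk required.
        ByteArrayOutputStream bus = new ByteArrayOutputStream();
        store.streamVersion("/content/index.html", 1, bus);
        System.out.println(bus.toString());
    }
}
```

Any SCM whose server can answer "give me revision N of this file as a stream" fits this shape; the workspace-centric Perforce model is what makes it awkward for us.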
Along comes Git. Several of our developers have been using Git on their own, relying on git-p4 to keep their repositories in sync. There are several features of Git I find interesting. The one I like most is the ability to have multiple repositories. We've struggled for a long time to figure out how to implement an open-source-style model whereby developers become the curators of code bases. The goal is to ensure that an API stays true to its intentions and does not accumulate too much cruft. We've never come up with a good way of ensuring that the lead for an API is informed of changes, and Git's repository model seems like it would be a good fit: a lead for an API would have the authoritative repository for that API, and only that repository could push changes to the central build repository.
Moving a large development group to a new source control system is a lot of work. There is training and tooling that needs to be created, and moving our code and build systems would take significant effort. I'm thinking we could use git-p4 to have a single team test it. We're starting a new greenfield project in the next few weeks; I just need to convince the leads that this is a good additional risk to take on. I also have to research the Git Java APIs to ensure that the functionality we need for our CMS is supported, as I'd like to keep one source control system in use.
This is the first post to my new technology focused blog. I am not a newcomer to blogging, however, I have never sat down to compose my thoughts on building software in such a public way. I am hoping that this blog will become a platform by which I can share my current musings and conundrums with designing and building a large public web site. I've been at Edmunds.com for almost six years now, and while this blog is not directly work related and is not endorsed by Edmunds, there are a lot of problems and topics we discuss regularly here that will influence my blog posts.
Santa Monica, CA