The issue was rather nasty, because the action that led to it was actually quite logical and made sense to the developer. I might well have done the same thing, had I encountered the code that was changed. Let me explain the issue.
Our application makes heavy use of external APIs, usually SOAP web services, to configure systems, fetch data, etc. In this specific case, our application was calling an external web service to push a configuration into it. The developer who caused the issue found a small typo in this code: it said “Emegrency” instead of “Emergency”. Funnily enough, though, the system had appeared to work in earlier tests. He fixed the typo, committed the change to Subversion, and went on with his job.
Little did he know that his quick spelling fix would cause a lot of searching later on. After the sprint, we deployed to our test server, and our test users started testing the system. Quickly they found out that the configuration system would not work anymore, and let us know about this blocking issue. I started digging into the code, amongst other things by reading the diff of the sprint to find out what was changed. I found the spelling fix but quickly discarded this as the cause of the problem, since, after all, it was just a spelling fix.
It took me ages to find the problem. I faked SOAP calls, sniffed the SOAP connection to see if anything was going wrong, and checked the WSDL of the SOAP server for obvious issues. There was nothing strange to be found! Or was there? On closer inspection of the WSDL I noticed an inconsistency: the inline documentation mentioned the element “Emergency”, while the actual definition said “Emegrency”. The SOAP server was expecting the element with the typo!
Apparently, the original developer of this code had spotted this inconsistency (there is hardly any documentation on this SOAP server aside from the inline docs of the WSDL) and had deliberately coded to accommodate the spelling mistake. Unfortunately, the developer who made the fix did not spot this. Neither did I, at least not before spending a lot of time hunting for the cause.
So what to learn from this? Don’t fix things that are actually working! Even though there is a spelling mistake somewhere, be very cautious and don’t just go about fixing the mistake unless you are absolutely sure you are not breaking anything, especially when working with external APIs.
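One way to protect such a deliberate misspelling from a well-meaning fix is to pin it in a clearly commented constant and back it with a small check. The following is only a minimal sketch in Python, not our actual code: the `Configuration` wrapper element, the value `"112"`, and all function names are hypothetical, and only the element name “Emegrency” comes from the story above.

```python
import xml.etree.ElementTree as ET

# NOTE: "Emegrency" is misspelled ON PURPOSE.
# The WSDL's inline documentation says "Emergency", but the actual
# element definition (and the server) expects "Emegrency".
# Do not "fix" this spelling without checking the WSDL definition first.
EMERGENCY_ELEMENT = "Emegrency"


def build_config_payload(emergency_value: str) -> bytes:
    """Build a (hypothetical) configuration fragment for the SOAP body."""
    root = ET.Element("Configuration")  # hypothetical wrapper element
    child = ET.SubElement(root, EMERGENCY_ELEMENT)
    child.text = emergency_value
    return ET.tostring(root)


def check_payload_uses_wsdl_spelling() -> None:
    """Fail loudly if someone corrects the spelling in the payload."""
    root = ET.fromstring(build_config_payload("112"))
    assert root.find("Emegrency") is not None, "server expects the typo"
    assert root.find("Emergency") is None, "the correct spelling breaks the call"
```

With a check like this in the test suite, the spelling fix would have failed at commit time instead of on the test server, and the comment would have told the next developer why the typo is there.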