Magnifying glass on migrating applications
Modernising Applications
The current trend of application modernisation is something of a misnomer, especially when we consider the typical end results of the modernisation procedure:
- Application code unchanged
- Operating environment (libraries and dependencies) largely unchanged
- Application behaviour largely unmodified
- Application network interactions typically unmodified
The actual result tends to be application environment modernisation, where the application is re-platformed onto a modern platform but may still be behaving in a distinctly un-modern pattern. This does bring a number of benefits:
- Finally getting an old application running in the same environment, but on a modern platform
- Getting the application monitored by modern tooling
- Adding the application to a consolidated platform, with increased packing (utilisation)
- The application can be monitored for health and behaviour and restarted
- Application is now packaged as a container for ease of movement and re-deployment
The next question is how we identify an application that can be “modernised”. A lot depends on what remains of the application. With source code, we can review its behaviour and perhaps make changes to make it more cloud-native. Without it, we need to take an investigative route to work out what to do next.
Source code analysis
import "file.h"
The above code snippet is a small example of a program (written in C) that performs a few simple actions:
- Attempt to open a file, state.dat
- Read the contents of this file (as an integer/number)
- Increment this value
- Write the result back to the file
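The snippet above is cut short, so as a rough sketch (not the original listing) the four steps could look something like the following. The function name increment_state, the use of the POSIX calls from fcntl.h and unistd.h rather than the file.h header, and treating a missing or empty file as 0 are all assumptions made for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical reconstruction: read an integer from `path`, add one and
 * write it back. Returns the new value, or -1 on any I/O error.
 * A missing or empty file is treated as holding 0 (an assumption). */
long increment_state(const char *path)
{
    char buf[32] = {0};
    long value = 0;

    /* open(): O_CREAT lets a first run start from an empty file. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* read(): pull the stored number in as text. */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n < 0) {
        close(fd);
        return -1;
    }
    if (n > 0)
        value = strtol(buf, NULL, 10);

    value += 1;

    /* write(): this is the call that mutates the environment.
     * The counter only grows, so the new text is never shorter than
     * the old text and overwriting from offset 0 is enough. */
    int len = snprintf(buf, sizeof(buf), "%ld\n", value);
    if (lseek(fd, 0, SEEK_SET) < 0 ||
        write(fd, buf, (size_t)len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    close(fd);
    return value;
}
```

Calling increment_state("state.dat") repeatedly against the same file returns 1, then 2, then 3. That only holds for as long as the file survives between runs, which is exactly the kind of hidden state we are trying to spot.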
We can start to infer a few things by reading through the source code:
Dependencies
Looking at the source code of any application from the perspective of its dependencies can immediately start to give clues about its behaviour. The above example imports a header called file.h, which immediately suggests that this application may be reading from or writing to an underlying filesystem. (Note: reading from a filesystem is perfectly acceptable behaviour for a stateless application.)
Variable analysis
This type of analysis depends entirely on the “hope” that the original developer chose variable names that are descriptive enough to be both human-readable and parsable by some kind of analysis tool. In most programming languages, variables are defined using a reserved keyword (or token) such as int or bool to give the type of the variable, followed by a name for it. With this knowledge we can scan through the source code to find variable definitions that start to describe some of the application’s behaviour.
Code analysis (behaviour and external function calls)
The previous two methods can be very quick ways to infer how the application will behave at runtime, especially since even in large source files the dependencies are typically declared only once (at the top of the source code), regardless of the number of lines of code. However, to truly understand what the application’s behaviour entails, we need to analyse which functions the code makes use of and how it calls them.
If we consider the above snippet, we can see that it makes use of a header that we know is used for file-based operations, so we can now analyse the code in more detail to determine which operations the application will be performing. A simple scan should reveal three operations that are attributed to file-based behaviour: open(), read() and write().
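The “simple scan” can be sketched as a function that counts call sites of a given name in a block of source text. The name count_calls is hypothetical, and the approach is purely textual: it will also count calls that appear inside comments or string literals, so treat the result as a hint rather than proof.

```c
#include <ctype.h>
#include <string.h>

/* Count occurrences of `fn` used as a call, i.e. `fn` followed by `(`
 * and not preceded by an identifier character (so `fwrite(` does not
 * count as a call to `write`). */
int count_calls(const char *source, const char *fn)
{
    size_t len = strlen(fn);
    int count = 0;

    if (len == 0)
        return 0;

    for (const char *p = strstr(source, fn); p != NULL;
         p = strstr(p + 1, fn)) {
        int starts_word = (p == source) ||
            !(isalnum((unsigned char)p[-1]) || p[-1] == '_');

        const char *after = p + len;
        after += strspn(after, " \t");

        if (starts_word && *after == '(')
            count++;
    }
    return count;
}
```

A source file where count_calls(source, "open") and count_calls(source, "read") are non-zero but count_calls(source, "write") is zero is a reasonable candidate for being read-only with respect to the filesystem, although other writing functions would need adding to the list.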
As mentioned above, opening and reading files is perfectly acceptable behaviour for a stateless application; think of a stateless web server that only serves assets such as images. Every request will result in an open("/path/to/image") followed by a read() call to read the image data from the file so that it can be served to an end user.
However, just from the name of the function write() we can infer behaviour that will attempt to mutate the environment where this code is running and whatever it is interacting with. Luckily, this naming convention can be seen across various languages, such as Go, Python, Node.js and a number of others, although in reading about other languages I came across a Perl example (a language I’ve not used in many years) that writes by using the print command on a file handle. The same pattern exists in some of the languages listed above, but it typically isn’t the usual way to write to a file.
No Source code, No Problem (perhaps)
Without source code, we can still determine the behaviour of a program and its approach to immutability (and permissions) by doing the following:
- Running in a read-only environment and watching the behaviour
- Running in a container, restarting and examining the resulting behaviour
- Both scenarios in combination with a tool like strace to watch for what caused the failed behaviour
The first example will typically result in one of two scenarios: the program behaves as expected, taking input from somewhere, processing it in memory and sending the results somewhere else; or it reads the input, attempts to open a handle to where the results will be stored, and then produces error messages.
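To make the second scenario concrete, here is a small sketch of what a failing write path looks like from the program’s side. The function name probe_writable and the scratch file name are assumptions for illustration; the point is only that open() fails and sets errno, which is also what a tool like strace would show for the failing call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to create and then remove a scratch file inside `dir`.
 * Returns 0 if the write path works, otherwise the errno from open(),
 * e.g. EROFS on a read-only filesystem or EACCES without permission. */
int probe_writable(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/.write-probe-%ld",
                     dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(path))
        return ENAMETOOLONG;

    /* O_EXCL avoids clobbering a file that happens to share the name. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return errno;

    close(fd);
    unlink(path);
    return 0;
}
```

In a normal environment probe_writable on a writable directory returns 0. Inside an environment whose filesystem is mounted read-only we would expect EROFS instead, and an application that does not handle that error is the one that ends in error messages.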
The second example has caught people out, and will continue to catch people out, ever since this paradigm began. A container environment is a brand new hire car or a brand new hotel room (built the same way every time you use it). If you left anything in the hire car or hotel room when you last used it, well, it’s gone (the room is exactly the same, but it is a new copy).