Episode Two
Hello there, and welcome to Episode Two of ProScala podcasts. This is the second tech episode, released on the 19th of April, 2021. We will go into detail on some tech topics, targeting newbie, senior, and business audiences alike. I’m Csaba Kincses; we’ll start in a moment.
As you already know from Episode Zero of this show, the show has a pet project, more concretely a semi-finished trading robot that can serve well for demonstration purposes. I intend to open source many parts of that software. This is the proof-of-concept episode to check how intriguing technological details and alternative solution paths can be explored using a demonstration project.
Being a Scala developer, I had no objection to sticking with this language, as I found all the features I needed to design this implementation. That said, a complete overview answering the question of why this particular functional language would not fit into a single episode, so I’d suggest taking into account the benefits described in Episode One and what will come in the succeeding episodes.
We will try to answer why Scala is a good choice for a trading robot compared to other programming languages, while discussing how such a trading robot should be designed. We will also need to do some research involving paradigms and language-specific technological details, but in the end it will shed light on how powerful Scala can be at the project level, both cost-wise and tech-wise.
This time, introducing a side topic will be skipped due to the breadth of the main topic.
The coming part will involve details that may be highly technical for my business audience. Still, such discussions could be familiar to someone with, for example, a project management background, so even if this means a steep learning curve for you, I encourage you to keep listening; I’ll insert the right non-tech clarifications where necessary.
To tackle the question of whether Scala is a good choice for trading robot development, we will need to break down the most important tasks such software performs, which will highlight the challenges, and we will need to do a research-based case study on which common language gives us what we need to achieve our goals. By doing this, we will have the chance to get a clear picture once we organize what we have, based on the project and software design principles we have in mind.
We should ask what a trading robot does, starting from the basics. We should also bear in mind that such a system would in many cases work differently in development mode than in production. We will most likely want a development version even if it comes with some duplication of the execution code, also considering that we may want to do some test-driven development without the robust infrastructure requirements and scalability features we may need in production.
While starting to go through what a trading robot does, I can immediately underline why Scala could be the first thought for such an implementation. We can easily evoke memories from the previous episode about patterns and go on that track to see what Scala can do for us.
Before giving an exact answer to the why Scala question, the best idea is to go through logical steps describing what a trading robot actually does, like building up a class hierarchy in our minds, finding the simplest piece and then extending it. This kind of brainstorming is a nice way to detect how language features come into the picture.
If you come from the business side, whether you have experience discussing a greenfield project’s possible architecture or a transition, similar talks may be familiar if your role is close to project management or if you’re involved in any agile role. If you’re starting a new venture, I suggest you use the coming content to develop a skill for calibrating the optimal level of technical knowledge to enhance communication in agile.
The briefest explanation we could give is that a trading robot is some input-output software. That sounds simplistic, though in some edge cases it could even be true in that form; still, building up a hierarchical logic, we need to find the root that cannot be simplified any further, and we have probably found it. The input is the current price, and the output is a trading decision.
Given something that simplified, which can only execute some basic trading logic like a limit buy/sell action without any dynamics, and mind that we’re building this logic bottom-up, there’s nothing more than a spot price input and a trading decision output; then comes the question: why would we need Scala at all for this?
This is probably the point where we do not yet need Scala. We could happily solve this with some imperative language, but at least we can start to identify the first bottlenecks and have a starting point from which to build up complexity.
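To make that baseline concrete, here is a minimal sketch of the "spot price in, trading decision out" root in Scala. Every name and threshold here is invented for illustration; it is not part of the pet project's actual code.

```scala
// Minimal sketch of the root logic: spot price in, trading decision out.
// LimitRule, Decision, and the thresholds are hypothetical placeholders.
sealed trait Decision
case object Buy  extends Decision
case object Sell extends Decision
case object Hold extends Decision

final case class LimitRule(buyBelow: BigDecimal, sellAbove: BigDecimal) {
  // Pure function: the current spot price is the only input.
  def decide(spot: BigDecimal): Decision =
    if (spot <= buyBelow) Buy
    else if (spot >= sellAbove) Sell
    else Hold
}

object SimplestRobot extends App {
  val rule = LimitRule(buyBelow = BigDecimal(100), sellAbove = BigDecimal(110))
  println(rule.decide(BigDecimal(95))) // prints Buy
}
```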
So then, what would we need in the simplest setup? First, we can distinguish between a development and a production setup. Though with this setup we would probably only need to test some no-brainer logic, we would still need to implement some read operation from a data source to run the simple logic on the spot price. Suppose the least feature-demanding scenario: we read data from the filesystem, going file by file, line by line within a file, or some combination. With no real possibility of a read error, we read the data, run the simple logic, assert the result we expect, and that’s all. The production equivalent of this simple one-by-one read setup would be to poll via the network through some REST API.
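As a rough sketch of that least feature-demanding development setup, assuming one spot price per line in a local text file, it could look like this; the file name, thresholds, and the inline decide function are placeholders.

```scala
import scala.io.Source
import scala.util.Using

// Development-mode data source sketch: read one spot price per line from a
// local file and run the simple limit logic on each. "prices.txt" and the
// thresholds are made-up placeholders.
object FileFeed extends App {
  def decide(spot: BigDecimal): String =
    if (spot <= 100) "Buy" else if (spot >= 110) "Sell" else "Hold"

  Using(Source.fromFile("prices.txt")) { src =>
    src.getLines()
      .map(line => decide(BigDecimal(line.trim)))
      .foreach(println)
  }
}
```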
Those are just basic language features; all we needed was filesystem and network access. Applying the logic most likely does not require many language features either. We would need a class with a function that accepts the incoming spot price. Error handling may be narrowed down to basic network errors and may trigger a retry or a conditional retry. Regarding memory handling, we may deallocate memory after processing each spot price. Any commonly used imperative or functional language, like C, C++, Java, Python, or Haskell, can do this.
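A conditional retry around a flaky network read can be as small as the following sketch; the helper itself and the fetchSpotPrice call in the usage line are assumptions made up for illustration.

```scala
import scala.util.{Failure, Success, Try}

// A tiny conditional-retry helper: re-run the given call up to maxRetries more
// times when it fails. fetchSpotPrice in the usage comment is a made-up name.
object Retry {
  def withRetry[A](maxRetries: Int)(call: => A): Try[A] =
    Try(call) match {
      case success @ Success(_)         => success
      case Failure(_) if maxRetries > 0 => withRetry(maxRetries - 1)(call)
      case failure                      => failure
    }
}

// Usage sketch: Retry.withRetry(3)(fetchSpotPrice())
```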
One step further, we may want to listen to a stream instead of pinging an endpoint. In this case, we can distinguish between languages by the level of support they provide, and it’s also worth noting that the applicable patterns just got a step closer to Functional.
What are the main differences between dealing with streams and running a timer to trigger a ping routine? The timer approach is heavyweight in terms of network calls, as it sends a complete HTTP or other network request each time. We would use it on the assumption that it is worth sending the bigger request because some time may pass between requests, and most likely that time is enough to produce an output from the incoming data. In the previous setup all the pieces are simple, so the output-creation logic should run fast.
Network-traffic-wise, it is a no-brainer that we will look for stream implementations if we want really short-paced processing or processing on every tick.
Because ticks can follow one another at a really short pace, even if the output-generation logic is simple, the question arises: what if processing gets jammed because new ticks arrive faster than outputs are generated? And what if we intend to create extendable code on the logic side, raising the risk of not being able to process in real time?
We have just started to extend the hierarchy of needs of a trading robot, and even by now things have started to get tricky. On the code readability side, we need to know that a stream in Functional can be treated as an ever-long collection, letting us pick elements from its end one by one. Later, if we want to keep control over the “what” aspect of the code and focus on what it does instead of how it does it, this Scala readability feature will be crucial.
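In Scala, this "ever-long collection" view shows up directly in the standard library. Here is a rough sketch using LazyList; the nextTick generator is a made-up stand-in for a real tick source.

```scala
import scala.util.Random

// Sketch of treating a stream as a lazily evaluated, practically endless
// collection, consumed one element at a time. nextTick is a placeholder.
object StreamAsCollection extends App {
  def nextTick(): BigDecimal = BigDecimal(100 + Random.nextInt(10))

  // An "ever-long" collection of ticks.
  val ticks: LazyList[BigDecimal] = LazyList.continually(nextTick())

  ticks
    .map(spot => if (spot > 105) "Sell" else "Hold")
    .take(5)          // evaluate only as many elements as we actually need
    .foreach(println)
}
```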
And there we come to a crossroads regarding what we get for granted and what we do not. For example, if back-pressure is handled externally by some service other than our own, this may give us more freedom in choosing the implementation language. If we can expect data to flow in at a regular pace that processing can keep up with, because a buffer, control, or drop strategy is already implemented on the stream provider’s side, then our implementation is simplified to providing a function that processes the current incoming data.
If we need finer control over the cases when we cannot keep pace with incoming data, a technology like Spark may come into the picture.
What does Scala give us at this point? To implement the aforementioned strategies for the problem of diverging from real time, we can either employ the already mentioned pattern that treats streams as collections and control whether a processing step applied to the collection should run at all so that we keep pace with timing, or we can use technologies that let us scale infrastructure and decouple parts of the processing using parallel computing; both Akka and Spark could be used depending on the case.
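As a rough sketch of what such control can look like with Akka Streams: buffer a bounded number of ticks and drop the oldest ones when processing lags behind. The tick source, the buffer size, and the process function are illustrative assumptions, not the pet project's real pipeline.

```scala
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}

// Sketch of keeping pace with fast ticks: buffer up to 1024 elements and drop
// the oldest when we lag. tickSource and process are made-up placeholders.
object PacingSketch extends App {
  implicit val system: ActorSystem = ActorSystem("robot")

  val tickSource =
    Source.cycle(() => Iterator(BigDecimal(100), BigDecimal(101), BigDecimal(99)))

  def process(spot: BigDecimal): String = if (spot > 100) "Sell" else "Hold"

  tickSource
    .buffer(1024, OverflowStrategy.dropHead) // drop strategy when we fall behind
    .map(process)
    .take(10)
    .runWith(Sink.foreach(println))
    .onComplete(_ => system.terminate())(system.dispatcher)
}
```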
Mind that we would only need such strategies for a production implementation. However, we can still build the implementation with these patterns in a layered fashion, so that only the necessary processing parts are executed in development mode.
The mentioned technologies can run a cluster of workers to let us meet our processing-speed needs. We can see again what a huge difference there can be between development and production implementations of such software. In development, we probably won’t need more computers than our own, as we can even mock timing. We would only employ infrastructure in dev mode if we wanted to test on so much data that it required it.
Taking a step forward, we may start to complicate the steps between the input and the output to discover more about why we would prefer a functional language over an imperative one, illustrating this research with Scala.
So we had this binary logic for processing that would only be based on the current spot price, but of course, things are more complicated than that. In reality, a trading robot would base its decisions on some complex rolling dataset that is continuously generated from incoming data like the spot price.
We will most likely have a preprocessing layer where data like tick data is turned into candlestick data to generate indicator data. On top of that, we may have the intermediate results of indicator processing in the form of signals, and finally the actual trading decisions. Given that an indicator value may be generated from hundreds of previous data points or other indicator values, we can see how complex a rolling data structure we are dealing with.
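Here is a rough sketch of that layering, with made-up Tick and Candle types and a simple moving average standing in for a real indicator; none of this is a real strategy.

```scala
// Sketch of the rolling preprocessing layers: ticks -> candlesticks ->
// indicator values. Tick, Candle, and the moving average are placeholders.
final case class Tick(price: BigDecimal)
final case class Candle(open: BigDecimal, high: BigDecimal, low: BigDecimal, close: BigDecimal)

object Preprocessing {
  // Group a fixed number of ticks into one candlestick.
  def toCandles(ticks: Seq[Tick], ticksPerCandle: Int): Seq[Candle] =
    ticks.grouped(ticksPerCandle).collect {
      case group if group.nonEmpty =>
        val prices = group.map(_.price)
        Candle(prices.head, prices.max, prices.min, prices.last)
    }.toSeq

  // A simple moving average over closing prices, as a stand-in indicator.
  def movingAverage(candles: Seq[Candle], window: Int): Seq[BigDecimal] =
    candles.map(_.close).sliding(window).map(w => w.sum / w.size).toSeq
}
```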
What is Scala’s advantage here? I’d highlight its rich, expressive collection API.
We have come to a point where, regardless of technicalities like checking the presence and suitability of language features, we can look at something that is understandably business-related. Therefore, I’d like to remind my non-developer audience to focus on the business effects of the coming part, because it’s really intriguing: we can discover that a technical paradigm works for us in business terms by providing a kind of framework for project management.
Going back to tech: suppose we want to use 5 indicators, 3 of which are complex and have dependencies, resulting in 10 base indicators. That sounds like too much to handle at first glance, regardless of which paradigm we stick to.
Imagine a monad as a data container that lets us apply functions to modify its contents. Scala collections are monads, and the common pattern is chaining, which means we can call a series of functions, each returning the same kind of monad but with a modified inner value. Can you see how concise and useful this is for making a complex input-output pattern less error-prone?
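A small sketch of that chaining style follows; the indicator derivation, names, and thresholds are invented for illustration.

```scala
// Chaining transformations on a collection: each call returns a new collection
// with a modified inner value, so the whole derivation reads top to bottom in
// one closed block. All names and numbers are made up.
object ChainingSketch extends App {
  val closes: Vector[BigDecimal] = Vector(100, 101, 103, 102, 105).map(BigDecimal(_))

  val signals: Vector[String] =
    closes
      .sliding(3)                              // rolling window over the closes
      .map(window => window.sum / window.size) // moving average per window
      .map(avg => if (avg > 102) "Sell" else "Hold")
      .toVector

  println(signals)
}
```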
Given fundamentals like using constants and focusing on the “what” instead of the “how,” as was said many times before, the usage of monadic collections is the phenomenal example we need in order to understand how Scala and Functional enforce both good software design and clean code in the functional sense.
Why do I say clean code in the functional sense? I don't know how common this is, but I have personally received criticism from people more familiar with the imperative paradigm that collections with so many operations chained on them look weird; but this is the point where we need to look into the whys: functional design encourages exactly this.
Previously we answered a question raised by the paradigm, namely the usage of constants, but there’s more to it. In functional design, one functional block should do one thing well, without tangling the code with parts responsible for the lower-level implementation that answers the “how.”
To highlight what we get design-wise, imagine what would happen if we diverged from the principles through which Functional tries to enforce good design. Firstly, what we do by chaining operations on a monadic constant is basically creating another, modified constant in one closed block. Let me emphasize the closed nature of blocks, as this gives us an advantage here. What would happen in imperative-style code, which is more likely to avoid this design?
Likely, we would introduce a new constant or variable for every operation, which would harm code readability, as we would have to read back line by line to realize that the same thing was modified many times. Also, since we should use constants if we want to stick to Functional, this would be a waste of memory, as all the intermediate phases would be stored in duplicate. If we back away from constants, we expose our intermediate calculation values to possible modification, because anything can be inserted between the lines of the calculation.
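To contrast with the chained version shown earlier, here is the same made-up derivation written with one intermediate constant per step; both versions are only sketches.

```scala
// One named, fully materialized constant per step: the reader has to trace
// line by line that the same data is reshaped again and again, and every
// intermediate phase stays referable. Names and numbers are made up.
object StepByStepSketch extends App {
  val closes: Vector[BigDecimal]           = Vector(100, 101, 103, 102, 105).map(BigDecimal(_))
  val windows: Vector[Vector[BigDecimal]]  = closes.sliding(3).toVector
  val averages: Vector[BigDecimal]         = windows.map(w => w.sum / w.size)
  val signals: Vector[String]              = averages.map(avg => if (avg > 102) "Sell" else "Hold")

  println(signals)
}
```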
One more argument on the imperative side would be to introduce some layered code design that would scatter parts of logic throughout different parts of our code, but this is not the point of Functional.
See what we would lose with this?
The greatest advantage of Functional is that we have an answer in one place about what our code is actually doing. We do not even need to search and click around the project to unscramble that answer; a clear, easy-to-read explanation of the purpose of our code is not a riddle anymore.
I think we have reached the point to put together the tech and business sides of what functional design gives us, while handling the objections that can occur and may influence both the business and the tech points of view.
The first objection, already foreshadowed by the previous discussion, is that the liberal chaining of functions on a monadic collection looks unfamiliar. In my view, this is a learning-curve issue and a misunderstanding of clean code in the functional sense when the paradigm’s principles are not taken into account. I’d suggest looking ahead to the other design principles that come from applying this pattern, so that neither it nor its descendant patterns become subject to premature criticism.
Business-wise, it would be a shortcoming to only look at the number of saved code lines and the maintainability gained from cleaner and more centrally structured code; judged on these reasons alone, letting developers freely use function chaining on monadic collections won’t be that great a business innovation in itself.
Then comes what we get on the code-design side by sticking to the right functional implementation, and the objections this may bring as a knee-jerk reaction.
Let’s check these. What can easily be seen is that if we stick to treating almost everything as some input-output pattern, by which I mean not just the chained application of functions on monads, but doing this with functions that have no side effects, that deal only with what they receive as input and return a transformed output, then it’s also clear that we basically provide a hierarchical breakdown of everything, answering “what” the program does.
What do I mean by hierarchical in these terms? You might feel that even if the paradigm focuses on the “what,” we still need to answer the “how.” Otherwise, we would end up with a completely abstract implementation that clearly describes its purpose but won’t do anything.
As introduced earlier in Episode One, Scala is a high-level language, but that does not mean we would not find lower-level functionality. Sticking with the previously introduced concept of treating almost all of our functions as transformations from input to output, mostly with monadic function chaining, and also with the principle of letting a function deal only with what it received as an argument, practically forces us to go from the highest-level code to ever lower levels, finishing with the lowest. And this is enforced in an orderly manner; this set of patterns does not let us mix up code of different levels.
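Here is a sketch of that top-down layering, with every name invented for illustration; it only shows how the levels stack, not a real strategy.

```scala
// Sketch of the enforced hierarchy: the top-level function reads as a plain
// statement of "what" happens, and each level below answers a little more of
// the "how". All names are illustrative placeholders.
object Layers {
  // Highest level: reads almost like prose.
  def decide(ticks: Seq[BigDecimal]): String =
    signalFrom(indicatorsFrom(ticks))

  // Mid level: composes the low-level building blocks.
  def indicatorsFrom(ticks: Seq[BigDecimal]): Seq[BigDecimal] =
    ticks.sliding(3).map(average).toSeq

  def signalFrom(indicators: Seq[BigDecimal]): String =
    if (indicators.lastOption.exists(_ > 102)) "Sell" else "Hold"

  // Lowest level: tiny and trivially unit-testable.
  def average(window: Seq[BigDecimal]): BigDecimal =
    window.sum / window.size
}
```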
The objection could be that this code structure gives too much power to someone instructed to modify the top of the hierarchy, the highest-level code. I have to say it’s never easy to tell which code part’s modification could affect a system adversely, but contrary to many system designs, here we have many clues, thanks to how easy it is to see through the organization of the code. So, to handle the objection of too much power in certain parts: the highest-level parts are supposed to be the easiest to read, so adding or removing a high-level function call should have an effect that is easy to interpret, assuming that junior colleagues also know in general terms what the system does. There won’t be such problems with the lowest level either, as that is supposed to be easy to unit test. Regarding the too-much-power issue, perhaps surprisingly, it is the mid-level code that could require more skill, as it takes some experience to see through its complexity; one should understand both the low-level parts it is built from and the high level for which the mid-level parts are the building blocks. We gain from the hierarchy of blocks because while debugging, we can easily follow the flow of the data.
Approaching the too-much-power objection from the business perspective, we should see that it’s quite the opposite: this way of designing software gives us lots of power at the project level. The clear visibility of code organized so that the purpose of each block is apparent lets us easily decide which task is best for a junior and which for a senior developer. This is a great point cost-wise, due to the possibility of delegating efficiently, not to mention easier debuggability. Considering agile, thanks to the clear data flows, these functional patterns can also strengthen code ownership in a transparent way.
After the necessary detour into monadic collection behavior and its business aspects, let's recall the original point, which aims to summarize the pros of choosing Scala. We were modeling a situation with 10 base indicators crowded into a single data structure, which follows from how Functional practically works. That was, so far, the last step in enumerating the features we may need to model a trading robot.
We aim to provide real-time data processing so that we have the signals needed to implement a strategy. We may also have mechanisms, provided externally or in place ourselves, for handling divergence from timely processing. Once we have the calculated data to trade on, another layer comes in. This is the part where we execute the trading decision derived from the generated data, and in this special case, we need to count on different network behavior than when listening to our stream data. It’s clear that executing a trade right from the stream-processing flow can cause problems if we have no solution to avoid blocking, because waiting, for example, for a limit order to fill could halt the processing of incoming streaming data for an unacceptably long time.
This is where the need for concurrent, asynchronous processing emerges, and luckily Scala has solutions in the form of actor implementations, built in or provided by the Akka library. So once a trading signal triggers a trade, executing that trade can be decoupled from the data-processing flow with an asynchronous call.
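A minimal sketch of that decoupling with a Future-based fire-and-forget call follows; a real setup might use an Akka actor instead, and executeTrade here is a made-up placeholder for the slow exchange interaction.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Sketch of decoupling trade execution from stream processing: the slow,
// blocking exchange call runs on its own execution context, so the tick
// processing flow is not halted. executeTrade is a made-up placeholder.
object TradeExecution {
  implicit val ec: ExecutionContext = ExecutionContext.global

  def executeTrade(decision: String): Unit = {
    Thread.sleep(2000) // stands in for waiting on a limit order to fill
    println(s"executed: $decision")
  }

  def onSignal(decision: String): Unit =
    if (decision != "Hold")
      Future(executeTrade(decision)) // fire and forget; processing continues
}
```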
Scala has an advantage here because its related library, Akka, offers probably the most robust concurrent and distributed solution compared to what other languages provide.
To wrap up, after a long, deep dive into many concepts and features, let’s briefly go through the desired features of our pet project and what they require on the language side.
We needed basic file input and output, networking, streams treated as collections, the ability to handle decision-critical calculated data rolling along with the collection, less error-prone and more concise ways of applying transformations to a collection, the option to employ technology that handles infrastructure to provide adequate data-processing throughput, and concurrent, asynchronous solutions for handling trades, because executing them involves a longer lag than we could tolerate if it stayed coupled to the stream-processing pipeline.
Scala offers all of these capabilities. Besides the basics and functional collection-based solutions, it has the necessary accompanying technologies like Akka and Spark, with which we can solve infrastructure scale-out problems related to data processing, and Akka is also a powerful toolset for concurrent/asynchronous implementation needs. We can find languages with similar capabilities in the functional world, but Scala has the advantage of being based on the Java Virtual Machine, which gives it interoperability with Java.
The bottom line is that functional languages, and specifically Scala, offer a software-design advantage by their nature of producing an agile-friendly software structure when patterns are implemented in a functionally clean way, while we are also boosted by features that let us easily deal with complex data-flow systems requiring concurrency.
This episode is about to end; I’ll be back with a new one on the 14th of June, 2021. That one will be about object-oriented programming in Scala and how it differs from other languages.
I hope I was able to present how deeply functional ways of development affect software design and how useful this can be to keep a complex project under control.
Mind that this podcast has a LinkedIn group where you can meet great fellow tech people to discuss things and stay up to date on the happenings related to this show.
This was Csaba Kincses; I’ll be back with the next episode. Thanks for listening!