Top Tips and Tools for a Data Science Career in Finance

We caught up with Graham Giller, the former head of data science research at JPMorgan and ex-head of primary research at Deutsche. These days, Giller is CEO of his own firm, Giller Investments, and has written a book, Adventures in Financial Data Science, out later this month.  

If you’re looking to develop a career in financial data, these are Giller’s tips.

What are your favorite programming languages for data science? Why?

For programming languages, my practice is now almost completely concentrated on three to four platforms:

I use Python3 for data acquisition, preparation, and management, plus some computational operations that don’t fit easily into other systems. I do not use any “notebook” interfaces; I write code in an IDE, and that code can be scheduled automatically or run manually from the command line.
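As a rough illustration of the "one script, run by hand or by a scheduler" style described here, the sketch below shows a minimal command-line entry point. The job itself (`run_job`), the `--as-of` flag, and the file name in the usage note are hypothetical placeholders, not anything Giller describes.

```python
import argparse
import datetime as dt
import logging


def run_job(as_of: dt.date) -> int:
    """Hypothetical placeholder for the real acquisition/preparation work."""
    logging.info("running job for %s", as_of.isoformat())
    return 0  # a real job would return something meaningful, e.g. rows loaded


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line flags; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Daily data job (illustrative).")
    parser.add_argument(
        "--as-of",
        type=dt.date.fromisoformat,
        default=dt.date.today(),
        help="business date to process, YYYY-MM-DD (defaults to today)",
    )
    return parser.parse_args(argv)


def main(argv=None) -> int:
    """Entry point shared by manual runs and scheduled runs."""
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    args = parse_args(argv)
    run_job(args.as_of)
    return 0


if __name__ == "__main__":
    main()
```

Saved as, say, daily_job.py, the same file could be run manually with `python daily_job.py --as-of 2021-01-04` or invoked unchanged by a scheduler such as cron, which is the property a notebook does not give you as easily.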

I use a combination of R and more dedicated commercial time-series analysis software for inferential work. The T-S software I use (RATS) is a minority-interest program that I get on well with, but it is to some extent a legacy usage. I probably wouldn’t have started with it if I hadn’t begun my career in the 1990s. I am a fan of Mathematica, but it is not a big part of my practice.

I use SQL databases extensively, with pretty complex SQL queries and operations. I’m a big user of user-defined aggregate functions, which I have written in C++, to deploy machine learning operations at scale within the SQL database. I use the database to organize and schedule the calculations, which it does far more efficiently than I could manage myself….
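To make the idea of a user-defined aggregate concrete, here is a small sketch. It is not Giller's setup: he describes aggregates written in C++ for his production database, whereas this uses SQLite through Python's standard `sqlite3` module as a stand-in, and a simple sample standard deviation rather than a machine learning operation. The table, column, and function names are invented for the example.

```python
import math
import sqlite3


class SampleStdDev:
    """Sample standard deviation as a SQL aggregate (Welford's online update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def step(self, value):
        # Called by the database once per row in the group.
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Called once per group to produce the aggregate's result.
        if self.n < 2:
            return None
        return math.sqrt(self.m2 / (self.n - 1))


def volatility_by_symbol(rows):
    """Load (symbol, ret) rows into a scratch table and aggregate in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("sample_stddev", 1, SampleStdDev)
    conn.execute("CREATE TABLE returns (symbol TEXT, ret REAL)")
    conn.executemany("INSERT INTO returns VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT symbol, sample_stddev(ret) FROM returns "
        "GROUP BY symbol ORDER BY symbol"
    ).fetchall()
    conn.close()
    return dict(result)
```

The point of the pattern is that the database decides how rows are grouped and fed to `step`, so the calculation is organized by the query engine rather than by hand-written loops.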

Does Hadoop have a future in finance (or anywhere, for that matter)?

I think big-iron NoSQL platforms, such as Hadoop and its kin, are going to fade from view. Most of their technical innovations, such as schema-free storage, column-oriented storage, massive parallelization, geospatial operations, and free-text operations, are provisioned in commercial RDBMS now, and those platforms can provide not only scale but also strong data management if required. I would imagine these functions will continue to downshift into the open source platforms, such as MySQL and Postgres, over the next few years. For what I do, MySQL is my current data management platform of choice.

Which languages do you think are becoming more popular in data science in finance?

From my experience, I think Python3 is still in the ascent. Some shops are probably still clinging to Python2, but that is a mistake. I always urge people to “fix it now” rather than “fix it later after you’ve lost money.” R is falling out of favor, which I am personally unhappy about because it is more rooted in rigorous inference than in “coding.”

How is the role of the data scientist in finance changing?

The role of the data scientist is becoming more that of an IT professional than a thought leader for organizations. Personally, I feel that this is the wrong direction, but it makes the IT leadership more comfortable, and the non-technical leadership doesn’t realize that this is a problem.

What’s your advice to people starting out?

For people starting out who want to do analytics within a financial context, I would suggest spending time to learn time-series analysis and econometrics properly. Financial data has properties that make it quite difficult to apply conventional tools to, and I see many pieces of work on venues such as Medium where people use very complex algorithms (the current favorite being LSTM networks) to essentially conclude that the best predictor of tomorrow’s price is today’s price, or something far worse than that.
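The "tomorrow’s price is today’s price" trap can be illustrated with simulated data. The sketch below assumes a pure random walk in log prices (an idealization, not a claim about any real market) and compares the lag-1 autocorrelation of price levels with that of returns. All function names and parameter values are made up for the example.

```python
import math
import random


def simulate_random_walk(n, sigma=0.01, start=100.0, seed=0):
    """Simulate n prices whose log returns are i.i.d. Gaussian (assumed model)."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, sigma)))
    return prices


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def lag1_autocorr(series):
    """Correlation between each observation and the one before it."""
    return pearson(series[:-1], series[1:])


def log_returns(prices):
    """Log return between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]


if __name__ == "__main__":
    prices = simulate_random_walk(5000)
    print("lag-1 autocorrelation of price levels:", lag1_autocorr(prices))
    print("lag-1 autocorrelation of log returns: ", lag1_autocorr(log_returns(prices)))
```

On data like this, price levels are almost perfectly correlated with the previous day's level, so any model fitted to levels can look accurate simply by echoing yesterday's value, while the returns, which are what a trader actually needs to predict, show essentially no lag-1 correlation.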

Much of the work I do is quite computationally intense (hours to occasional days of compute), so it is important to understand how the algorithms you use scale with data size. But do not kid yourself that you can write a better optimizer or linear algebra system than somebody who has built their career in that space. Also, do not be scared of going back to square one when you find an error. If you know something is broken, it’s better to “fix it now” than to labor under technical debt, because that always leads to technical bankruptcy.
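One simple way to get a feel for how a calculation scales with data size is to measure its cost at a few sizes and fit a power law. The sketch below is an illustration under that assumption (cost roughly proportional to size raised to some exponent); the helper names are hypothetical.

```python
import math


def scaling_exponent(sizes, costs):
    """Least-squares slope of log(cost) against log(size).

    A slope near 1 suggests linear scaling, near 2 quadratic, and so on.
    Assumes cost is approximately c * size**k over the measured range.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def extrapolate_cost(size, cost, target_size, exponent):
    """Project a measured cost to a larger data size under the power law."""
    return cost * (target_size / size) ** exponent
```

In practice the costs would be wall-clock times measured on small samples (for example with `time.perf_counter`), which are noisy, so the fitted exponent is only a guide. Even so, the arithmetic is sobering: under quadratic scaling, a job that takes one minute on 1,000 rows takes about 100 minutes on 10,000.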

A modified version of this article originally appeared in eFinancialCareers.

One Response to “Top Tips and Tools for a Data Science Career in Finance”

  1. I have to disagree with your comment “big-iron NoSQL platforms, such as Hadoop and its kin, are going to fade from view.” NoSQL allows for much more rapid scalability and iteration for large amounts of data. Relational databases don’t have the same ability, though newer RDBMS are more scalable than older systems. For massive amounts of data, such as in the FinTech field, NoSQL is a big improvement over SQL systems. I believe it would be more accurate to say we will see more of a hybrid approach rather than one or the other fading from view.