Tuesday, 28 October 2014

Using sqllite3 to store static and time series data

I've had a request to help out with using sqlite in python to store data for systematic trading. There are three kinds of data I generally keep; static data, state data and timeseries data.

Static data include:

  • Futures, and specific contract details
  • System parameters
Timeseries data includes:
  • Price data
  • Volume data
  • Fundamental data if used, eg PE ratios for equities
  • Accounting data
  • Diagnostic data, storing what the system did at various points in the past

State data relates to the control of the trading system and I won't be going into details here - its just static data which is frequently modified.

Code is in the usual  git repo. You will need pandas and sqllite3  (which came with my python distro automatically so check your own).


Creating the database

dbname="mydb"
dbfilename=get_db_filename(dbname)


setup_blank_tables(dbfilename, ["CREATE TABLE timeseries (datetime text, code text, price float)",
                          "CREATE TABLE static (code text, fullname text)"])


Here we're creating a database file with two tables in it. Notice the use of database names to abstract away from where they are stored. I find the performance of sqllite3 with massive files isn't great so I tend to stick to one table per file in practice, but for this simple example we don't need to.

If you're a SQL whizz you'll see that I am not doing any relational stuff here.

Static data

st_table=staticdata(dbname)
st_table.add("FTSE", "FTSE 100 index")
st_table.modify("FTSE", "FTSE all share")
print st_table.read("FTSE")
st_table.delete("FTSE")
 
Notice that we use staticdata so we don't need to use any SQL in these commands (in case the underlying table structure changes and to avoid having reams of repetitive nonsense), and within that the connection object ensures that the staticdata code isn't specific to sqlite3.

The sqlite3 read returns lists of tuples, which  staticdata.read() resolves to a single string.

Timeseries data

dt_table=tsdata(dbname)
someprices=pd.TimeSeries(range(100), pd.date_range('1/1/2014', periods=100))
dt_table.add("FTSE", someprices)
dt_table.read("FTSE")



We use pandas TimeSeries class as the input and output, which is then translated into database terms. sqlite has no native datetime format, only text or float, so we need to translate between pandas/datetime and text. I define a specific format for the text representation to be precise and ensure the database is forward compatible to any changes in pandas.

The end

This is a very brief and simple example and is missing a lot of error handling it really ought to have, but it should provide you with enough help to get started even if you're not a SQL whizz.

4 comments:

  1. Why don't you use MongoDB instead of SQL?

    ReplyDelete
    Replies
    1. I think mongodb is great for storing things like system configuration information and the saved state. However for straightforward time series sql is simpler. I haven't used mongodb in my "home" enviroment, only an enterprise enviroment, so I don't know how easy it is to setup.

      Having said that I have heard very good things about arctic

      https://github.com/manahl/arctic

      Delete
  2. Hey Rob, great and concise post! Thanks for the help.

    If you allow me, I'd advise you to change the code template color, it's very hard to read pale green over white. Sorry to bug in with such simple stuff :)

    ReplyDelete
    Replies
    1. Good suggestion. I will bear that in mind for future posts (can't promise I will spend time changing the past).

      Delete