Learning & Development

Conducting an LRS Needs Analysis

10 June 2016

Chris Dunne

So you want to start with xAPI? Wonderful! Now comes the first hurdle: selecting a Learning Record Store (LRS). Of course we think you should choose Learning Pool Learning Record Store, but beyond brands, what are the key questions you should be asking of any LRS provider?

This checklist is the process we run through when working with a new client to discover more about their LRS requirements. You might not need to answer every question, and I’m sure it’s not exhaustive, but it will give you some sensible questions to think about, to put to your potential suppliers and developers, and to use in your own LRS analysis.

28 Questions to Ask When Deploying an LRS

Before you get started on the technical aspects, there are a few other things that you need to consider…

Has the outline Use Case been gathered?

Knowing about the overall objective in adopting the xAPI will inform your answers throughout this LRS analysis.

Will it be SaaS or deployed On-site?

One of the first decisions you’ll need to make is whether you want to host your own LRS, or use an online service. An online service will be quicker to set up, probably cheaper short-term (unless your labor cost is zero) and will be tried and tested. However, you will need to be comfortable with data storage / ownership responsibilities (is all this data OK in the cloud or do you need on-premise?) and you’ll need to be comfortable with the medium / long term costs of continually paying for a service.

Which environments are required (Dev/Staging/Production)?

As you develop your application, you are likely to want to do some testing. Remember, xAPI statements are immutable. You don’t want to be poisoning your ‘live’ data with test data. At the very least, your system should be able to segregate data logically into different LRSs for different purposes. If you are consistently pushing updates, you might want to invest in a third environment, which can be used for further testing before committing. So it might be you don’t just want a single instance of an LRS; you may need two or three.

If you need On-site Deployment

If you’ll be deploying your LRS on-site, you need to be asking yourself these questions:

Is the solution to be load-balanced?

The availability of the LRS can represent a single point of failure in a learning ecosystem. Where this is the case, any on-site deployment should have built-in redundancy. This can be provided by a load-balanced setup that distributes the LRS application across multiple application servers and hosts the database on separate servers from the application.

What experience have we had with NoSQL?

Typically, most LRSs will utilise a NoSQL-style document database for storage. These are well suited to storing non-relational data like xAPI statements. They scale well and they are built for redundant running (for example, Replica Sets are a standard feature of MongoDB). But this is a different tech stack to the normal LAMP/WAMP setup. Do you have the internal capability to manage this?

Estimated concurrency?

How many simultaneous requests will be made of your servers? How many statements will be sent per minute? How long will each request take to process? How many queries will be made of the data? A handy tool to help you calculate this is Little’s Law.
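As a rough worked example, Little’s Law says the average number of requests in flight (L) equals the arrival rate (λ) multiplied by the average time each request spends in the system (W). The sketch below applies it to an LRS; the traffic figures and the function name are hypothetical and only there to show the arithmetic.

```python
def concurrent_requests(arrival_rate_per_sec: float, avg_seconds_per_request: float) -> float:
    """Little's Law: L = lambda * W (average number of requests in flight)."""
    return arrival_rate_per_sec * avg_seconds_per_request


# Hypothetical figures: replace with your own measurements or estimates.
statements_per_minute = 6000      # assumed peak load, one statement per request
avg_processing_seconds = 0.25     # assumed average time to handle one request

arrival_rate = statements_per_minute / 60  # requests per second
in_flight = concurrent_requests(arrival_rate, avg_processing_seconds)
print(f"Average requests in flight: {in_flight:.1f}")  # 25.0
```

With these assumed numbers the servers would need to handle around 25 simultaneous requests on average, so you would size for comfortably more than that to cover peaks.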

How much data can be lost in worst case scenario?

The killer question for redundancy and backup is always how much data can you afford to lose? Of course the preferred answer is ‘none’ but that tends to be unrealistic in the face of cost / benefit analysis. In the worst-case scenario, how much data could you afford to lose and hope to recover normal operating practice?

For many circumstances putting in some failover mechanism and also doing off-site daily backups is enough. But in high-risk or testing environments, even that might not be good enough. How can you get that 24hr number down to 1 or 2 hrs without breaking the bank? How much money will you have to spend to reduce this number?

Backup / DR requirements?

What existing disaster recovery processes are already in place? See above.

How is your data secured at rest?

How is the server / database hardened against intrusion? Is the database to be encrypted? What about physical access controls?

If you want SaaS Deployment

If you’re going down the SaaS route, be sure to ask the following:

Do you want a Single or Multi-tenanted Application?

Typically, Software-as-a-Service is delivered as a multi-tenant application: your data will sit alongside other organisations’ data at rest. It will be logically separated. Is that OK? What happens if you want to go single-tenant?

Are there any location restrictions?

Does your organisation require data to be stored in a particular geographic location? Or, perhaps more likely, are there particular areas of the world where your data at rest must not be stored?

How much data can be lost in worst case scenario?

If your provider’s data centre blew up, how much data could you lose?

What are your Backup / Disaster Recovery requirements?

What existing disaster recovery processes are already in place?

How is your data secured at rest?

How is the server / db hardened against intrusion? Is the database to be encrypted? What about physical access controls? What certifications does the provider have? What’s the contract and relationship between software provider and host?

Sending Data

Regardless of your deployment scenario, you’ll want to know the answers to the following questions regarding sending your data:

Who are the initial Activity Providers?

What systems will be used to create and send xAPI data to the LRS initially? Are they all using v1.0+ of the spec? Any known issues?

How are users identified?

If multiple Activity Providers (APs) are in the system, you will need to ensure they are using a common identifier for users. Mailbox (mbox) is the most common, but isn’t very desirable: mail addresses change and they also explicitly identify a user in plaintext. It would be better to use an account number, a unique identifier, throughout the ecosystem. If you don’t have a single identifier, you will need to make sure your LRS can help reconcile users, as Learning Locker does by creating personas that group all of a learner’s different identifiers.
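To make the difference concrete, here is a minimal sketch of two ways an AP could build the actor of a statement without a plaintext mail address. The xAPI specification allows an Agent to be identified by an account (a homePage plus a name) or by mbox_sha1sum (the hex-encoded SHA-1 of the mailto IRI). The function names, the home page URL and the account number are hypothetical.

```python
import hashlib


def account_actor(home_page: str, account_name: str) -> dict:
    """Identify the learner by an account on a system you control."""
    return {
        "objectType": "Agent",
        "account": {"homePage": home_page, "name": account_name},
    }


def hashed_mbox_actor(email: str) -> dict:
    """Identify the learner by the SHA-1 hash of their mailto IRI."""
    mailto = "mailto:" + email.strip()
    digest = hashlib.sha1(mailto.encode("utf-8")).hexdigest()
    return {"objectType": "Agent", "mbox_sha1sum": digest}


# Hypothetical values for illustration only.
print(account_actor("https://hr.example.com", "100234"))
print(hashed_mbox_actor("learner@example.com"))
```

The account form survives a change of mail address as long as the account number is stable. The hashed form hides the address from casual view but still changes whenever the address does.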

What Queuing system is in place for each AP?

For most production systems it is not good enough to simply ‘fire and forget’ xAPI statements; they should be queued and tracked to make sure they actually get to the LRS in one piece. For example, if the LRS wasn’t available for any reason, would the Activity Provider cache/store the statement to be resent at a later time?
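A minimal sketch of that idea follows, assuming the AP has some function that posts one statement and reports success. The class name, the send callback and the retry limit are all hypothetical, and the queue here lives in memory only; a production AP would more likely persist pending statements to disk or a database so they survive a restart.

```python
from collections import deque
from typing import Callable


class StatementQueue:
    """Holds statements until the LRS accepts them, retrying failed sends."""

    def __init__(self, send: Callable[[dict], bool], max_attempts: int = 5):
        self._send = send                 # hypothetical: returns True on success
        self._max_attempts = max_attempts
        self._pending = deque()           # items are (statement, attempts so far)
        self.failed = []                  # statements that ran out of attempts

    def enqueue(self, statement: dict) -> None:
        self._pending.append((statement, 0))

    def flush(self) -> int:
        """Try each pending statement once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self._pending)):
            statement, attempts = self._pending.popleft()
            try:
                ok = self._send(statement)
            except Exception:
                ok = False                # treat any error as "LRS unavailable"
            if ok:
                delivered += 1
            elif attempts + 1 >= self._max_attempts:
                self.failed.append(statement)
            else:
                self._pending.append((statement, attempts + 1))
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

The AP would call enqueue whenever it generates a statement and flush on a timer. Anything left in failed needs human attention rather than silent disposal.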

Are APs using standard libraries (TinCanPHP, TinCanJS)?

The easiest way to be conformant with the spec is to use standard libraries. If your AP doesn’t use these, how do they evidence that they are following best practices?

Will statements be sent in batches?

If your AP is sending a lot of statements, can it batch them into fewer requests to place less load on the server? It’s generally easier for a server to process 1 request with 1000 statements than it is to process 1000 requests with 1 statement each. Note that in the xAPI specification a batch of statements is sent with POST; a PUT request stores a single statement. Sending statements some time after they were generated is perfectly valid; the LRS will differentiate between the observed time (the statement’s timestamp) and the stored time of statements.
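A minimal sketch of the batching step, assuming the AP already holds its statements as a list of dictionaries. In the xAPI specification a list of statements can be sent in one POST to the statements resource, so each batch below would become one request body. The function name and batch size are hypothetical, and the dictionaries are placeholders rather than valid statements.

```python
from typing import Iterator


def batches(statements: list, batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size statements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(statements), batch_size):
        yield statements[start:start + batch_size]


# Placeholder dictionaries standing in for real xAPI statements.
statements = [{"n": n} for n in range(1000)]
sizes = [len(batch) for batch in batches(statements, batch_size=250)]
print(sizes)  # [250, 250, 250, 250]
```

Each statement keeps its own timestamp, so sending it later as part of a batch does not change when the event is recorded as having happened.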

Is SSL in place?

Your data should be secured in transit, as well as at rest. Every time.

Storing Data

The answers to your data sending questions are important, but so is how your data is stored, so be sure that you also get answers to the following:

Likely volume of Statements to be stored?

Your APs should have adopted recipes and you should have some notion of how large your audience is going to be. You can use these two numbers to get a ballpark estimate of the number of statements that could be stored by the system. This can be important in SaaS circumstances, where you are most often charged on the basis of data stored. See our previous post on how many statements Learning Locker can store per GB.
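The ballpark calculation can be sketched as below. Every number is a hypothetical placeholder, including the average statement size, which is an assumption for illustration and not a figure taken from the post mentioned above; measure your own statements before relying on it.

```python
def estimate_statements(learners: int, activities_per_learner: int,
                        statements_per_activity: int) -> int:
    """Audience size multiplied by what the recipes emit per activity."""
    return learners * activities_per_learner * statements_per_activity


def estimate_storage_gb(statement_count: int, avg_bytes_per_statement: int) -> float:
    """Raw storage in GB (10^9 bytes), ignoring indexes and attachments."""
    return statement_count * avg_bytes_per_statement / 1_000_000_000


# Hypothetical inputs for one year.
total = estimate_statements(learners=5000,
                            activities_per_learner=20,
                            statements_per_activity=15)
print(total)                              # 1500000
print(estimate_storage_gb(total, 2000))   # 3.0
```

Indexes, attachments and replicas all add to the real figure, so treat the result as a lower bound when comparing pricing tiers.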

Using Attachments? If so, how?

Statements allow for pretty much any sort of document to be sent with a statement as an attachment. Will your APs use this? Any restrictions on file types / file size on your servers? Can these attachments be accessed outside of querying statements? Will you use a CDN?

What Archiving process do you have/use?

Do you have any process for archiving old data? If not, think about it… the LRS is going to get pretty big in years 2, 3 and 4…

Is any PID stored?

If your statements are stored as raw plaintext, is any personally identifiable data associated with your users? Mailbox is a likely candidate here. Avoid where possible.
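One way to check this before go-live is to scan sample statements from each AP. The sketch below is a hypothetical helper that flags the two actor properties most likely to hold personal data in plaintext, mbox and name. It only inspects the top-level actor; agents can also appear elsewhere in a statement (for example in the object or context), so a fuller audit would need to look there too.

```python
def plaintext_personal_fields(statement: dict) -> list:
    """Return the actor properties that expose personal data in plaintext."""
    actor = statement.get("actor", {})
    return [field for field in ("mbox", "name") if field in actor]


# Hypothetical sample statements, trimmed to the actor only.
exposed = {"actor": {"mbox": "mailto:learner@example.com", "name": "A Learner"}}
safer = {"actor": {"account": {"homePage": "https://hr.example.com", "name": "100234"}}}

print(plaintext_personal_fields(exposed))  # ['mbox', 'name']
print(plaintext_personal_fields(safer))    # []
```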

Retrieving Data

With your questions about sending and storing data resolved, you also need to consider how you’ll share and query your data…

Is real-time data required?

Will other tools consume data from the LRS in real-time? If so, how will you facilitate this? Is real-time really required, or could it be near-time?

Will you customise indexes?

Data retrieval can be optimised by creating new indexes based on common requests. You can know some of this in advance, based on your use case, but it can be an on-going process.

Will it be linked to another LRS / Data warehouse?

How will data be pushed to other data warehouses? Will you set up a cron job?

Will you queue data for transmission elsewhere?

If other systems rely on the LRS to push data to them, is a queuing system in place to ensure data can be resent if it fails for whatever reason?

Get started with your LRS analysis today

Needs Analysis complete, it’s time to get started! At Learning Pool we offer additional LRS Enterprise services and consultancy alongside the free, open source version of Learning Pool Learning Record Store. Find out more about your options here.