General > Benefits of HTTP Digest Authentication vs API Keys

So I was inspired to blog this by something Leigh Dodds said at yesterday's Talis Platform open day: their SaaS Semantic Web platform relies on HTTP Digest authentication, rather than API Keys, as its mechanism for authentication and access control.

The point I want to get across is that, as far as I'm concerned, HTTP Digest authentication is cool and far preferable to API Keys. So what exactly do I think is wrong with API Keys?

  1. API Keys are Ugly - API Keys are more often than not just ugly: they are usually some form of hexadecimal string that can be quite long. They aren't nice to look at and they clutter up your URIs and code.
  2. API Keys are Not Memorable - They are hard, if not impossible, to remember (unless you have an eidetic memory), which means you'll inevitably end up forgetting or losing them at some stage.
  3. API Keys are Visible - API Keys are almost always embedded as a parameter in a URI, which means that if you want to share URIs that use an API you are forced to share your API Key. This is a bad idea since your API Key may allow someone to access sensitive data or manipulate your resources without you realising. Because they live in the URI, they also often end up hard-coded into the code base, which is unnecessary and reduces maintainability.

Despite this, API Keys fulfil a useful role for the providers of HTTP APIs: they allow them to control what a specific user does and to limit the number of requests that user makes.

Where they start to fall down is when you want multiple people to share access to some resource which uses API Keys. For example, there's typically no way to say that one person has read-only access while another has full read-write access, other than to get a separate API Key for each person. Once you have multiple API Keys you have the potential to mix them up and give the wrong user the wrong access permissions, and to boot any audit trail you have is linked to cryptic API Keys rather than to user names.

Here's where HTTP Digest authentication comes to the rescue: each user of your API is granted a user name and password, just like for any other website or computer system. Straight away this addresses points 1 and 2 of my complaints about API Keys, since a user name/password combination is perfectly readable and easy to remember. As for point 3, the credentials are entirely separate from the URI, so it's now safe to share URIs: without the credentials someone else can't invoke the API. Even if they have their own credentials for the relevant API, it is unlikely that those credentials allow them to do undesirable things with your data/resources.
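To show how a user name and password are actually used without ever appearing in the URI, here is a minimal Python sketch of the client-side calculation defined in RFC 2617 for the common qop=auth case with MD5. The password itself is never sent: it is hashed together with the server-supplied nonce and the request details, and only the resulting hash goes into the Authorization header. The function names are illustrative and do not belong to any particular library.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Lower-case hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str, qop: str = "auth") -> str:
    """RFC 2617 'response' value for qop=auth with the MD5 algorithm."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who you are (includes the secret)
    ha2 = md5_hex(f"{method}:{uri}")                  # what you are asking for
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def authorization_header(username: str, realm: str, password: str, method: str,
                         uri: str, nonce: str, nc: str, cnonce: str) -> str:
    """Build the Authorization header value; the password itself does not appear in it."""
    response = digest_response(username, realm, password, method, uri, nonce, nc, cnonce)
    return (f'Digest username="{username}", realm="{realm}", nonce="{nonce}", '
            f'uri="{uri}", qop=auth, nc={nc}, cnonce="{cnonce}", response="{response}"')
```

You can try it with the worked example values from RFC 2617 (user "Mufasa", password "Circle Of Life", realm "testrealm@host.com", GET /dir/index.html) and compare against the response printed there.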

The advantage of Digest authentication is that it happens as part of the HTTP protocol rather than as part of the URI (as with API Keys), so for a start there's no need to hard-code anything into your code! As you only need credentials when you actually make a request, you can build your application so that it prompts users for credentials when required, or stores them securely, separate from your code base.
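As a rough illustration of keeping credentials out of both the URI and the code base, here is a Python sketch using only the standard library's urllib.request, whose HTTPDigestAuthHandler answers Digest challenges automatically. The base URI http://api.example.org/ and the environment variable names API_USER and API_PASSWORD are hypothetical placeholders.

```python
import os
import urllib.request


def build_digest_opener(base_uri: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers Digest challenges for URIs under base_uri."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for whatever realm the server names
    password_mgr.add_password(None, base_uri, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(password_mgr))


def opener_from_environment(base_uri: str) -> urllib.request.OpenerDirector:
    """Read credentials from (hypothetical) environment variables, not from source code."""
    return build_digest_opener(
        base_uri,
        os.environ.get("API_USER", ""),
        os.environ.get("API_PASSWORD", ""),
    )


# The resource URI carries no key, so it can be pasted into an email or chat as-is.
SHAREABLE_URI = "http://api.example.org/stores/demo/items"
```

Calling opener.open(SHAREABLE_URI) would send a plain request and, on a 401 Digest challenge, retry with an Authorization header computed from the stored credentials; swapping the environment lookup for an interactive prompt would not change the URI at all.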

We also no longer have the problem of one API Key per user: since we have one user name per user, it should always be clear which user performed a particular action (without resorting to looking up whose API Key is whose). As for different access permissions, it's reasonable to assume that any API which uses credentials rather than API Keys is set up behind the scenes to support proper access control, which means you can grant each user specific permissions on your data/resources.
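To make the idea of per-user permissions and a readable audit trail concrete, here is a hypothetical Python sketch. It does not reflect how the Talis Platform or any other service works; the AccessControl class and the Permission values are invented for illustration. Because both grants and log entries are keyed by user name, it is obvious who is allowed to do what, and who did what.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class AccessControl:
    """Hypothetical per-user permission table with an audit log keyed by user name."""
    grants: dict = field(default_factory=dict)     # user name -> set of Permission
    audit_log: list = field(default_factory=list)  # (user name, action, allowed?)

    def grant(self, user: str, *permissions: Permission) -> None:
        self.grants.setdefault(user, set()).update(permissions)

    def check(self, user: str, action: Permission) -> bool:
        allowed = action in self.grants.get(user, set())
        self.audit_log.append((user, action.value, allowed))  # every decision is recorded
        return allowed


acl = AccessControl()
acl.grant("alice", Permission.READ, Permission.WRITE)  # full read-write access
acl.grant("bob", Permission.READ)                       # read-only access
```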

So what are my takeaway points:

  1. HTTP Digest is Invisible - Authentication happens behind the scenes, outside of the URI, so sharing a URI does not share your access to the data/resource provided by the API
  2. HTTP Digest is Memorable - We all remember numerous user name and password combinations already so remembering an additional one is far easier than trying to remember a hexadecimal API key
  3. HTTP Digest is HTTP friendly - Digest authentication leverages the protocol rather than the URI to provide authentication - this fits nicely with the REST methodology which most modern HTTP APIs are based upon and is all about using the protocol properly

25/03/2010 17:18:36 by Rob Vesse in English

Tags: API, API Keys, Authentication, Credentials, Digest, HTTP, Key, Opinion, REST

General > Work in Progress - SPARQL Features and Performance

So one of my favourite aspects of the library is the SPARQL engine - I fully admit it's not the best out there but it's becoming increasingly powerful as I implement more and more of the SPARQL 1.1 specification and I continue to come up with ways to optimise the engine.

Note that all my effort is now focused on the Leviathan engine, which was introduced in the 0.2.0 release and is far more powerful and maintainable than the previous Labyrinth engine, which is due to be removed in a future release. This means that any improvements and new features I add are only in the Leviathan engine; if you aren't already using it you should be (it's the default engine from the 0.2.0 release onwards).

SPARQL Property Paths

So I've talked about SPARQL Property Paths previously and this is a feature which I just love - initially as an implementer I was worried about the cost (and realistically they can be quite complex and costly to evaluate) but they allow you to write really useful queries e.g.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * WHERE {<http://example.org/someone> foaf:knows+ / foaf:name ?name }

The above example finds the names of all the people that the person identified by the URI <http://example.org/someone> knows in one or more steps. So this finds the names of their friends, their friends of friends, and so on!
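To show what the + (one or more) operator asks an engine to do, here is a minimal Python sketch that evaluates <start> foaf:knows+ / foaf:name ?name over an in-memory set of triples using breadth-first search. It is not how Leviathan evaluates paths; prefixed names are used as plain strings for brevity and the sample data is hypothetical.

```python
from collections import deque

FOAF_KNOWS = "foaf:knows"
FOAF_NAME = "foaf:name"


def objects(graph, subject, predicate):
    """All ?o such that (subject, predicate, ?o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def one_or_more(graph, start, predicate):
    """Nodes reachable from start by following predicate one or more times."""
    reached = set()
    frontier = deque(objects(graph, start, predicate))
    while frontier:
        node = frontier.popleft()
        if node in reached:
            continue  # already visited: cycles in the data must not loop forever
        reached.add(node)
        frontier.extend(objects(graph, node, predicate) - reached)
    return reached


def names_known(graph, start):
    """Evaluate: start foaf:knows+ / foaf:name ?name"""
    return {name
            for person in one_or_more(graph, start, FOAF_KNOWS)
            for name in objects(graph, person, FOAF_NAME)}


graph = {
    ("ex:someone", FOAF_KNOWS, "ex:alice"),
    ("ex:alice", FOAF_KNOWS, "ex:bob"),
    ("ex:bob", FOAF_KNOWS, "ex:alice"),  # a cycle
    ("ex:alice", FOAF_NAME, "Alice"),
    ("ex:bob", FOAF_NAME, "Bob"),
}
```

The visited set is what keeps the cost bounded: each node is expanded at most once, however many paths lead to it.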

But property paths also allow us to do things like implicit inference, e.g.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/schema#>

SELECT * WHERE { ?class rdfs:subClassOf+ ex:SuperClass }

The above will find anything which is a sub-class of the given superclass regardless of the number of steps between the two classes.
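When the end of the path is fixed and the start is a variable, an engine can begin at the fixed end and follow edges backwards. Here is a hypothetical Python sketch of that for ?class rdfs:subClassOf+ ex:SuperClass; the class names are sample data and prefixed names are plain strings.

```python
from collections import deque

RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def subjects(graph, predicate, obj):
    """All ?s such that (?s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def subclasses_of(graph, superclass):
    """Evaluate ?class rdfs:subClassOf+ superclass by walking edges in reverse."""
    found = set()
    frontier = deque(subjects(graph, RDFS_SUBCLASS_OF, superclass))
    while frontier:
        cls = frontier.popleft()
        if cls in found:
            continue
        found.add(cls)
        frontier.extend(subjects(graph, RDFS_SUBCLASS_OF, cls) - found)
    return found


graph = {
    ("ex:Dog", RDFS_SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", RDFS_SUBCLASS_OF, "ex:SuperClass"),
    ("ex:Bird", RDFS_SUBCLASS_OF, "ex:SuperClass"),
    ("ex:Rock", RDFS_SUBCLASS_OF, "ex:Mineral"),
}
```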

Note that you aren't limited to having specific path starts or ends, so you can do the following, which finds every class paired with each of its direct and indirect superclasses:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE { ?class rdfs:subClassOf+ ?superclass }
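With both ends unbound, a naive evaluation has to run a traversal from every possible start node, which is one reason such queries can be costly. A hypothetical Python sketch of that naive approach, returning every (start, end) pair joined by one or more edges:

```python
from collections import defaultdict

RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def transitive_pairs(graph, predicate):
    """Evaluate ?x predicate+ ?y: every (x, y) pair joined by a path of one or more edges."""
    successors = defaultdict(set)
    for s, p, o in graph:
        if p == predicate:
            successors[s].add(o)
    pairs = set()
    for start in list(successors):
        # Depth-first search from each node that has at least one outgoing edge
        seen = set()
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        pairs.update((start, end) for end in seen)
    return pairs


graph = {
    ("ex:A", RDFS_SUBCLASS_OF, "ex:B"),
    ("ex:B", RDFS_SUBCLASS_OF, "ex:C"),
}
```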

Path Lengths

Path Lengths is an open issue for the working group and is unlikely to make it into the standard, but when I implemented support for full property paths I realised that my implementation gives this as a by-product, so I added a syntax extension like so:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * WHERE {<http://example.org/someone> foaf:knows+ LENGTH ?l ?person }

The above example gives you all the people that the person identified by the URI <http://example.org/someone> knows, in one or more steps, and tells you how many steps away from them each person is. Note that paths do not permit duplicates, so you'll always get the shortest possible path length.

The LENGTH ?var syntax must follow the end of a path and binds the length of that path to the given variable. This allows you to filter and order by the length of the path and do other calculations on it. Note that doing something like FILTER(?var >= 4) is not advised; it's far better to state the cardinality constraint explicitly in your path, e.g. foaf:knows{4,}
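The LENGTH extension is specific to dotNetRDF, but a hypothetical Python sketch shows why shortest lengths fall out naturally from a breadth-first traversal: nodes are visited in order of distance, so the first time a node is reached is via a shortest path. This is an illustration only, not the library's implementation.

```python
from collections import deque

FOAF_KNOWS = "foaf:knows"


def objects(graph, subject, predicate):
    """All ?o such that (subject, predicate, ?o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def path_lengths(graph, start, predicate):
    """Map each node reachable via predicate+ to its shortest path length from start."""
    lengths = {}
    frontier = deque((node, 1) for node in objects(graph, start, predicate))
    while frontier:
        node, length = frontier.popleft()
        if node in lengths:
            continue  # BFS reaches each node first by a shortest path
        lengths[node] = length
        for nxt in objects(graph, node, predicate):
            if nxt not in lengths:
                frontier.append((nxt, length + 1))
    return lengths


graph = {
    ("ex:someone", FOAF_KNOWS, "ex:a"),
    ("ex:a", FOAF_KNOWS, "ex:b"),
    ("ex:b", FOAF_KNOWS, "ex:c"),
    ("ex:someone", FOAF_KNOWS, "ex:c"),  # a shortcut: ex:c is also 1 step away
}
```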

IN and NOT IN

SPARQL 1.1 provides two new operators, IN and NOT IN, which allow you to restrict a variable to, or exclude it from, a set of values, e.g.

PREFIX : <http://example.org/schema>

SELECT * WHERE {?s ?p ?o . FILTER(?p IN (:propertyOne, :propertyTwo)) }

Note that this does not necessarily make your queries more efficient; it simply gives you a shorthand for writing the following:

PREFIX : <http://example.org/schema>

SELECT * WHERE {?s ?p ?o . FILTER(?p = :propertyOne || ?p = :propertyTwo) }
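The SPARQL 1.1 specification defines A IN (B, C) as equivalent to (A = B) || (A = C), and NOT IN as its negation. A small hypothetical Python sketch of the three filters over a list of triples makes the equivalence visible, and also hints at why neither form is efficient here: every triple matched by ?s ?p ?o is produced first and then most are thrown away.

```python
def filter_in(triples, allowed):
    """FILTER(?p IN (...)): keep triples whose predicate is one of the listed values."""
    allowed = set(allowed)
    return [t for t in triples if t[1] in allowed]


def filter_not_in(triples, excluded):
    """FILTER(?p NOT IN (...)): keep triples whose predicate is none of the listed values."""
    excluded = set(excluded)
    return [t for t in triples if t[1] not in excluded]


def filter_or(triples, first, second):
    """FILTER(?p = first || ?p = second): the longhand form of a two-value IN."""
    return [t for t in triples if t[1] == first or t[1] == second]


triples = [
    ("ex:a", ":propertyOne", "1"),
    ("ex:b", ":propertyTwo", "2"),
    ("ex:c", ":propertyThree", "3"),
]
```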

Performance

Because Leviathan is an in-memory engine, performance is always going to be an issue, so we've worked hard to improve it over time. Unfortunately, while we've made some impressive improvements, we've struggled with scalability up until now. Just today I've been working on a new indexing optimisation technique which shows dramatic results in testing and leads to linear rather than exponential scaling. Below are some results from the Berlin SPARQL Benchmark (BSBM) comparing different versions of our SPARQL engine:

[Chart: Berlin SPARQL Benchmark Results for dotNetRDF]

For each version we use the same datasets, generated by the BSBM dataset generator at the scale factors shown on the X axis, and perform 50 runs with 10 warm-up runs. The value given on the Y axis is the total runtime for the 50 runs.

As can be seen, the improvement is massive: the new optimisations allow us to run the benchmarks against datasets which were previously far too slow to even attempt. You'll probably notice that some data is missing for some implementations; this is because when we originally ran the tests we decided that the performance difference was so small that further tests were unnecessary. We use these tests mostly as a guide and as a check on whether experimental optimisations have had a beneficial effect on performance, so if they show little or no difference we tend to limit the number of tests we run.
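The measurement procedure described above can be sketched as a generic timing harness. This is a hypothetical Python illustration, not the BSBM test driver; it assumes the 10 warm-up runs are separate from, and precede, the 50 timed runs, and query_mix stands in for whatever executes one run of the benchmark queries.

```python
import time
from typing import Callable


def total_runtime(query_mix: Callable[[], object], warmups: int = 10, runs: int = 50) -> float:
    """Run query_mix `warmups` times untimed, then return total seconds for `runs` timed runs."""
    for _ in range(warmups):
        query_mix()  # warm caches and indexes; these timings are discarded
    start = time.perf_counter()
    for _ in range(runs):
        query_mix()
    return time.perf_counter() - start
```

Reporting the total over all timed runs, rather than a single run, smooths out noise from individual executions.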

19/03/2010 17:26:56 by Rob Vesse in English

Tags: Benchmark, BSBM, dotNetRDF, Leviathan, Query, SPARQL

Releases > Site Hacked & Full SPARQL 1.1 Property Paths coming

I'm sure some people will have noticed that we got hacked earlier in the week. The vulnerabilities that were exploited have been fixed and we are (fingers crossed) now fully secure again. The attack was a relatively unsophisticated attempt to hijack the website so that it served a malware script to everyone who visited. Fortunately, because of the underlying architecture of the CMS we run, the database corruption just caused the software to crash after outputting the <head> section of the HTML page, so we didn't serve anyone malware.

Sorry for any inconvenience caused - remember you can always Download dotNetRDF from our SourceForge Project Page if the site is ever down for any reason.

In the time it has taken us to get the site back up and running we've been busy continuing work towards the next release of the library, and one of its key features will be full support for SPARQL 1.1 property paths. The current release parses them but can only evaluate simple fixed-length paths which can be transformed into a BGP in the SPARQL algebra. Our latest SVN builds can evaluate any path, bar a few bugs in alternatives which we're working on at the moment. We're also going to provide a syntax extension which will allow you to extract the length of paths. More information and a demo will follow in the next couple of weeks.
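To illustrate what "fixed-length paths which can be transformed into a BGP" means: a sequence path, or a path repeated a fixed number of times, can be rewritten as ordinary triple patterns chained through fresh variables. Here is a hypothetical Python sketch of that rewrite; the ?_path0 style variable names are invented, and a real engine would pick names guaranteed not to clash with the query's own variables.

```python
def sequence_path_to_bgp(subject, predicates, obj):
    """Rewrite `subject p1/p2/.../pn obj` as n triple patterns joined by fresh variables."""
    if not predicates:
        raise ValueError("a sequence path needs at least one predicate")
    # Hypothetical naming scheme for the hidden intermediate variables
    hidden = [f"?_path{i}" for i in range(len(predicates) - 1)]
    nodes = [subject, *hidden, obj]
    return [(nodes[i], predicates[i], nodes[i + 1]) for i in range(len(predicates))]


def fixed_repeat(predicate, times):
    """A fixed-length path such as p{3} is just the sequence p/p/p."""
    return [predicate] * times
```

Variable-length paths such as foaf:knows+ have no fixed number of triple patterns, which is why they need a dedicated evaluator rather than this rewrite.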

18/03/2010 11:09:50 by Rob Vesse in English

Releases > dotNetRDF 0.2.1 Alpha Released

The latest stable release of dotNetRDF is now out and available for download - Download dotNetRDF

Key new features in the latest release are as follows:

  • RDFa support
    • RDFa can now be parsed from HTML and XHTML and supports malformed real world "tag soup" HTML
    • XHTML+RDFa representations of Graphs can now be produced
  • SPARQL 1.1
    • Support for parsing property paths and executing some simple paths - see SPARQL Engine for details
  • Inference
    • Improved RDFS reasoner now supports class and property hierarchies plus domain and range based inferencing
  • Storage
    • Various bug and stability fixes for stores including Virtuoso, 4store, Talis and Joseki
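The class hierarchy, property hierarchy, domain and range inferencing mentioned above correspond to standard RDFS entailment rules (rdfs2 domain, rdfs3 range, rdfs5/rdfs7 property hierarchy, rdfs9/rdfs11 class hierarchy). Here is a naive forward-chaining Python sketch of those rules, run until no new triples appear. It is not dotNetRDF's reasoner, uses prefixed names as plain strings, and skips details such as not giving literals a type.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def pairs(triples, predicate):
    """(subject, object) pairs for one predicate."""
    return {(s, o) for (s, p, o) in triples if p == predicate}


def rdfs_closure(graph):
    """Naively apply a subset of the RDFS rules until no new triples appear."""
    triples = set(graph)
    while True:
        sub_class = pairs(triples, SUBCLASS_OF)
        sub_prop = pairs(triples, SUBPROPERTY_OF)
        domains = pairs(triples, DOMAIN)
        ranges = pairs(triples, RANGE)
        new = set()
        # rdfs11 / rdfs5: class and property hierarchies are transitive
        new |= {(a, SUBCLASS_OF, c) for a, b in sub_class for b2, c in sub_class if b == b2}
        new |= {(a, SUBPROPERTY_OF, c) for a, b in sub_prop for b2, c in sub_prop if b == b2}
        for s, p, o in triples:
            # rdfs7: a statement using a sub-property also holds for its super-property
            new |= {(s, sup, o) for sub, sup in sub_prop if sub == p}
            # rdfs2 / rdfs3: the domain types the subject, the range types the object
            new |= {(s, RDF_TYPE, c) for prop, c in domains if prop == p}
            new |= {(o, RDF_TYPE, c) for prop, c in ranges if prop == p}
            # rdfs9: instances of a sub-class are instances of its super-classes
            if p == RDF_TYPE:
                new |= {(s, RDF_TYPE, sup) for sub, sup in sub_class if sub == o}
        if new <= triples:
            return triples  # fixpoint: nothing new was inferred
        triples |= new


graph = {
    ("ex:Dog", SUBCLASS_OF, "ex:Animal"),
    ("ex:Animal", SUBCLASS_OF, "ex:LivingThing"),
    ("ex:hasPet", SUBPROPERTY_OF, "ex:owns"),
    ("ex:owns", DOMAIN, "ex:Person"),
    ("ex:owns", RANGE, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:alice", "ex:owns", "ex:fido"),
    ("ex:bob", "ex:hasPet", "ex:tom"),
}
```

A production reasoner would index triples by predicate and only re-examine newly added triples on each pass, rather than rescanning everything as this sketch does.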

11/03/2010 15:15:17 by Rob Vesse in English