[CloudRAID] 3. Concept

This post is part of a series of posts about CloudRAID. The predecessor can be found in Markus Holtermann’s blog and the successor here.

3 Concept

3.1 Requirements

The application has to meet several requirements. The first are data security, data safety, and data availability, which are derived from the weaknesses of cloud storage services (see chapter 2.2 Cloud). Since CloudRAID is supposed to be a cloud backup solution, it does not need the complex synchronization functionality that several cloud storage solutions provide.

3.2 General Architecture

A typical three-layer architecture was considered the best solution. The persistence layer is an aggregate of three cloud storage solutions combined into a RAID 5 unit. The application layer is an application running on a server that provides a simple Representational State Transfer (REST) Application Programming Interface (API) to the presentation layer. It connects to the persistence layer via API calls to the cloud storage services’ web service interfaces. The RAID functionality and the encryption of the files are implemented on this layer. For security reasons, the application layer and the presentation layer should communicate over SSL/TLS-encrypted connections.

Figure 9: Three layer architecture

The application layer needs local storage for meta information about files. This information can be used to efficiently locate RAID 5 chunks on the different storages. The user may also want access to information such as the files’ backup dates. The presentation layer is a simple Graphical User Interface (GUI), a Command Line Interface (CLI), or possibly a website. It wraps the REST API and presents the results of the different REST calls to the user.

3.2.1 Advantages

The advantage of this approach is that it is easier to react to changes in the APIs of cloud storage providers. Since one server can handle different clients and their access to the cloud storages, only the server instance has to be updated to the latest API version. The client software does not have to be updated. Additionally, users can simply access their backups from different end user equipment and from different locations. For this, a client only needs to establish a connection to the second-layer server and not to three cloud storages. Another advantage is that the user equipment only has to transfer the “normal” amount of data to back up a file. A solution where the client software transfers the data directly to the cloud storage servers would cause about 1.5 times the data traffic, because with three storages every two data chunks are accompanied by one parity chunk of the same size. This argument mainly counts for mobile devices that have a slow Internet connection and may be charged for consumed traffic.
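The factor of about 1.5 follows from the RAID 5 layout over three storages. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical and not part of CloudRAID, and it assumes all chunks of a file have the same size.

```java
/** Hypothetical helper: total upload volume for a RAID 5 layout over n storages. */
final class Raid5Traffic {
    private Raid5Traffic() {}

    /** Bytes sent to all storages together: (n - 1) data chunks plus one parity chunk of equal size. */
    static long bytesUploaded(long fileSize, int storages) {
        if (storages < 3) {
            throw new IllegalArgumentException("RAID 5 needs at least three storages");
        }
        int dataChunks = storages - 1;
        long chunkSize = (fileSize + dataChunks - 1) / dataChunks; // ceiling division
        return chunkSize * storages;
    }

    public static void main(String[] args) {
        long fileSize = 100L * 1024 * 1024; // 100 MiB
        System.out.println(bytesUploaded(fileSize, 3)); // prints 157286400, i.e. 150 MiB
    }
}
```

With the server in between, the client uploads only the 100 MiB; the additional 50 MiB of parity are produced and sent by the server.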

3.2.2 Disadvantages

The biggest disadvantage of this approach is that there is a single point of failure: if the server application is not available because it crashed or because its Internet connection is broken, the backed-up files cannot be accessed. A server architecture that allows more than one server instance to back up files might overcome this disadvantage. One more point between storage and end user equipment also means one more point where an attack or security leak can occur. Therefore, the data transfer to and from the server application has to be encrypted. Additionally, suitable tests and similar measures have to ensure that the server application is secure.

3.3 Server Architecture

A first concept for the server’s general architecture was based on OSGi. Several bundles were defined to achieve good modularity (see Figure 10).

  • Java Native Interface (JNI) RAID contains the logic needed to split files into RAID 5 chunks using an external C library.
  • HSQL Driver contains the HSQL Java DataBase Connectivity (JDBC) driver.
  • DBAccess uses the HSQL Driver to implement classes needed to access a local database to store different data.
  • DBInterface is a single Java interface that defines the methods for the database access. It is implemented by DBAccess. The implementation of this interface is exported as an OSGi service so that the DBAccess package can be replaced by implementations for other databases.
  • PWD Manager contains the logic for password management.
  • PWDInterface is the Java interface that defines the methods for password management so that the concrete implementation can be replaced by other implementations.
  • Jetty is a Java-based web server.
  • Jersey provides REST management for Java.
  • JSON contains JSON-java as an OSGi bundle and is used to send server responses to the client in the form of JSON-encoded messages.
  • Scribe contains Scribe-java as an OSGi bundle and provides the OAuth authorization needed for different cloud storage APIs.
  • REST Service uses Jetty, Jersey and JSON to provide the interface to the presentation layer.
  • Core combines the different packages and implements the interface to the persistence layer.

Figure 10: Server architecture

Figure 11 shows the actual architecture of the server application at the OSGi bundle level.

  • Interfaces contains only Java interfaces defining the behavior of different bundles. By implementing an interface of this bundle another bundle is able to offer the interface implementation as an OSGi service.
  • Core statically imports only the Interfaces bundle. This gives it the service definitions it needs to load services dynamically at runtime. Core provides the basic functionality of CloudRAID: merging and splitting of files. For this it uses the RAID level 5 implementation by accessing a shared object (Unix) or a DLL (Windows) via JNI.
  • MetadataManager provides the persistence functionality to store metadata of files (hash, status etc.) or user information (name, password). For the standard CloudRAID implementation the MetadataManager imports the HSQL JDBC driver to store the metadata in an HSQL database. Because services are used, the MetadataManager implementation can be exchanged (also at runtime) for another implementation, for example one providing access to a MySQL, DB2, or Oracle database.
  • Config is the bundle that loads the configuration of the CloudRAID server. In the standard implementation the configuration is read from a partially encrypted XML file. Passwords needed for logging into cloud storage services etc. are stored encrypted. Other parameters are stored in plain text. The Config implementation can also be exchanged dynamically so that the configuration is read, for example, from a database.
  • PasswordManager provides the functionality to get the master password of the CloudRAID server. The password is read at server start-up. The development version of CloudRAID contains a hard-coded password. Later versions will use more sophisticated and secure ways.
  • RESTful is the RESTful API. It starts a Hypertext Transfer Protocol (HTTP) server that communicates with the CloudRAID client software. This bundle can also be replaced by other implementations. Possible replacements are a WebDAV interface or an SMB interface.
  • AmazonS3, Dropbox, SugarSync, and UbuntuOne are so-called connectors. They provide specific implementations wrapping the respective cloud storage APIs. By implementing an interface of the Interfaces bundle they can export services to the Core bundle. Core can then access the cloud storages in a unified way.
  • MiGBase64 (“Mikael Grev Base64”) is an open source (BSD license) and very high performing base64 encoder. Since the official implementation is not available as an OSGi bundle, the project was forked and slightly modified into an OSGi bundle.
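The splitting and merging that Core delegates to the native RAID level 5 implementation can be illustrated in plain Java. The following is only a sketch of the XOR parity idea for three storages, not the project's C library: the class name, method signatures, and zero-padding scheme are assumptions, and it treats the whole file as a single stripe instead of rotating the parity across stripes as a full RAID 5 does.

```java
import java.util.Arrays;

/** Illustrative RAID 5 style split/merge for three storages (hypothetical, single stripe). */
final class Raid5Sketch {
    private Raid5Sketch() {}

    /** Splits data into two equally sized data chunks and one XOR parity chunk. */
    static byte[][] split(byte[] data) {
        int half = (data.length + 1) / 2; // second chunk is zero-padded if the length is odd
        byte[] a = Arrays.copyOfRange(data, 0, half);
        byte[] b = Arrays.copyOf(Arrays.copyOfRange(data, half, data.length), half);
        return new byte[][] { a, b, xor(a, b) };
    }

    /** Rebuilds the original bytes; at most one of the two data chunks may be null (lost). */
    static byte[] merge(byte[] a, byte[] b, byte[] parity, int originalLength) {
        if (a == null && b == null) {
            throw new IllegalArgumentException("only one chunk may be missing");
        }
        if (a == null) a = xor(b, parity); // a = b XOR (a XOR b)
        if (b == null) b = xor(a, parity); // b = a XOR (a XOR b)
        byte[] out = new byte[originalLength];
        System.arraycopy(a, 0, out, 0, Math.min(a.length, originalLength));
        System.arraycopy(b, 0, out, a.length, originalLength - a.length);
        return out;
    }

    private static byte[] xor(byte[] x, byte[] y) {
        byte[] r = new byte[x.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = (byte) (x[i] ^ y[i]);
        }
        return r;
    }
}
```

If the parity chunk is the one that is lost, both data chunks are still available and `merge` simply concatenates them; the original length has to be kept as metadata so that the padding can be removed.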

Figure 11: Actual server architecture. “A → B” means that A is statically imported by B. Since the RAID implementation is not a real bundle, the line is dashed.

The different interface bundles were merged into one single bundle to reduce the number of bundles. Since interfaces generally have a very small file size, using one bundle per interface would only cause administrative and file size overhead. The Core bundle no longer implements the connection to the cloud storages itself, but uses IStorageConnector services to do so (see next section). Additionally, the dependency between the RESTful API and the Core bundle was inverted: in the first design Core was supposed to load the API, whereas in the later design the RESTful API loads the Core bundle. The configuration functionality was also moved out of the Core bundle into its own bundle for greater flexibility. Jersey and Jetty were replaced by javax.servlet to reduce the number of dependencies that have to be administered. The javax.servlet bundle already ships with the Equinox OSGi framework, whilst Jersey and Jetty have to be downloaded, built, and installed.

3.3.1 Core

For better understanding, Figure 12 shows how the Core bundle interacts with the other bundles. Since the architecture diagram above only shows hard dependencies, this graphic explicitly shows soft dependencies. The red rectangular boxes with italic text are Java interfaces from the Interfaces bundle.

The Core bundle loads different services via the OSGi registry. One is an ICloudRAIDConfig service that is implemented by the Config bundle. The IMetadataManager service is provided by the MetadataManager bundle, and the IPasswordManager service is implemented by the PasswordManager bundle. Importantly, Core loads three IStorageConnector services. The corresponding services are provided by the different storage connectors. CloudRAID can use the same service three times (which is not the intention of CloudRAID). It can also use the same service twice and another service as the third one (which also negates the advantages of CloudRAID). The best setup is to use three different services (for example Dropbox, UbuntuOne, and AmazonS3). The services used can be controlled via the Config bundle.

The only bundle that is not loaded indirectly by Core via a service is the RESTful bundle. Instead, this bundle loads the Core bundle as a service. To make this possible, Core implements the ICoreAccess interface from the Interfaces bundle. The dependency between those two bundles was implemented in this direction because a dependency in the other direction (Core loads RESTful) would make less sense. Core does not have to know the RESTful API or other APIs that provide access to Core, but RESTful needs to know the Core service. With this kind of dependency it is easy to operate two or more APIs at the same time that access the same ICoreAccess service.
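The direction of these soft dependencies can be sketched without an OSGi framework. The example below is a plain-Java stand-in and deliberately does not use the OSGi API: the `Registry` class, the `name()` and `connectorNames()` methods, and the choice to take the first three connectors found are all assumptions made for illustration. Only the interface names IStorageConnector and ICoreAccess are taken from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for the service wiring; this is NOT the OSGi API. */
final class ServiceWiringSketch {
    private ServiceWiringSketch() {}

    /** Interface names follow the text; their methods are hypothetical. */
    interface IStorageConnector { String name(); }
    interface ICoreAccess { List<String> connectorNames(); }

    /** Tiny registry: services are published and looked up by interface type. */
    static final class Registry {
        private final Map<Class<?>, List<Object>> services = new HashMap<>();

        <T> void register(Class<T> type, T service) {
            services.computeIfAbsent(type, k -> new ArrayList<>()).add(service);
        }

        <T> List<T> lookup(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (Object s : services.getOrDefault(type, Collections.emptyList())) {
                result.add(type.cast(s));
            }
            return result;
        }
    }

    /** Core knows only interfaces: it takes the first three connectors it finds. */
    static final class Core implements ICoreAccess {
        private final List<IStorageConnector> connectors;

        Core(Registry registry) {
            List<IStorageConnector> found = registry.lookup(IStorageConnector.class);
            if (found.size() < 3) {
                throw new IllegalStateException("three IStorageConnector services are required");
            }
            connectors = new ArrayList<>(found.subList(0, 3));
        }

        @Override
        public List<String> connectorNames() {
            List<String> names = new ArrayList<>();
            for (IStorageConnector c : connectors) {
                names.add(c.name());
            }
            return names;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        for (String n : new String[] {"Dropbox", "UbuntuOne", "AmazonS3"}) {
            registry.register(IStorageConnector.class, () -> n);
        }
        registry.register(ICoreAccess.class, new Core(registry));

        // An API bundle such as RESTful looks up Core, never the other way round.
        ICoreAccess core = registry.lookup(ICoreAccess.class).get(0);
        System.out.println(core.connectorNames()); // prints [Dropbox, UbuntuOne, AmazonS3]
    }
}
```

Because Core never references an API class, a second API (for example a WebDAV front end) could look up the same ICoreAccess service without any change to Core.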

Figure 12: Dependencies around the Core bundle. Empty arrowhead ≙ implementation of interfaces from the Interfaces bundle; dashed line ≙ soft dependencies via OSGi service

3.3.2 Storage Connectors

Figure 13 shows a more detailed view of the storage connectors and their dependencies, since the diagrams above do not show all relevant soft dependencies between the bundles. Every bundle providing an IStorageConnector service implements the corresponding interface from the Interfaces bundle. The DropboxConnector, UbuntuOneConnector, and the AmazonS3Connector additionally require third level bundles. Every storage connector loads the Config service using the OSGi registry. Without the Config service the startup of the storage connectors would fail, since the storage connectors need to know API access tokens, user names, and passwords to be able to log in. For security reasons and portability, these should not be hard-coded in the bundles.

As shown in Listing 1, every IStorageConnector service has to implement eight methods. The create() method is called first: it gets the configuration parameters and checks whether all prerequisites are fulfilled. The connect() method executes the actual login to the cloud storage service and retrieves, for example, access tokens. The disconnect() method is not necessary for the currently supported cloud storages but is intended for services that support logging out. The upload() method creates a new file on a cloud storage if and only if (iff) the file is not already on the cloud storage, while the update() method uploads a new file version iff the file is already on the cloud storage. get() returns an InputStream that reads a file from a cloud storage, while getMetadata() reads file metadata. delete() removes a file from a cloud storage.

Listing 1: The IStorageConnector interface.
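The eight methods described above could be declared roughly as follows. Only the method names are taken from the text; every parameter, return type, and the "username" configuration key are assumptions. The in-memory connector is hypothetical and exists only to demonstrate the upload()/update() contract.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Method names come from the text; parameters and return types are assumptions. */
interface IStorageConnector {
    boolean create(Map<String, String> config);      // receive configuration, check prerequisites
    boolean connect();                               // log in, e.g. retrieve access tokens
    boolean disconnect();                            // log out where the service supports it
    boolean upload(String resource, byte[] content); // succeeds iff the file does not exist yet
    boolean update(String resource, byte[] content); // succeeds iff the file already exists
    InputStream get(String resource);                // null if the file is missing
    String getMetadata(String resource);             // null if the file is missing
    boolean delete(String resource);
}

/** Hypothetical in-memory connector used only to demonstrate the contract. */
final class InMemoryConnector implements IStorageConnector {
    private final Map<String, byte[]> files = new HashMap<>();
    private boolean connected;

    @Override public boolean create(Map<String, String> config) {
        return config != null && config.containsKey("username"); // assumed prerequisite
    }

    @Override public boolean connect() { connected = true; return true; }

    @Override public boolean disconnect() { connected = false; return true; }

    @Override public boolean upload(String resource, byte[] content) {
        if (!connected || files.containsKey(resource)) return false;
        files.put(resource, content.clone());
        return true;
    }

    @Override public boolean update(String resource, byte[] content) {
        if (!connected || !files.containsKey(resource)) return false;
        files.put(resource, content.clone());
        return true;
    }

    @Override public InputStream get(String resource) {
        byte[] content = files.get(resource);
        return (connected && content != null) ? new ByteArrayInputStream(content) : null;
    }

    @Override public String getMetadata(String resource) {
        byte[] content = files.get(resource);
        return content == null ? null : "size=" + content.length;
    }

    @Override public boolean delete(String resource) {
        return files.remove(resource) != null;
    }
}
```

Keeping upload() and update() separate means a caller cannot accidentally overwrite an existing chunk: a second upload() of the same resource is rejected and has to be expressed as an update().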

Figure 13: Dependencies of storage connectors. Empty arrowhead ≙ implementation of IStorageConnector interface (includes hard dependency); solid line ≙ hard dependency via import; dashed line ≙ soft dependencies via OSGi service