OpenStack Swift Introduction
The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. It's built for scale and optimized for durability, availability, and concurrency across the entire data set. Swift is ideal for storing unstructured data that can grow without bound.
Swift basically stores and serves files over HTTP with standard file system features such as list, stat, put and delete, in addition to access control and the ability to share private files through temporary URLs.
There's a vast number of clients, backends and backup software supporting OpenStack Swift, spanning multiple operating systems, platforms and languages. It can be used for anything from backup and general file storage to hosting images and static files for web pages.
A very common misconception is that Swift only handles files in a local filesystem. Using the language APIs you can feed any stream into Swift, which opens up a lot of possibilities. Some providers use RADOS Gateway, which requires a Content-Length header. This can be a challenge in instances where you don't know the exact size of the stream in advance. Zetta.IO supports uploading streams without this header.
This introduction is fairly low level but reveals many important details that will be beneficial to know.
Authentication
The standard way to authenticate using OpenStack clients is using an RC file. This is just a bash script setting environment variables in your shell/terminal. You can download a generated RC-file in our dashboard under Overview -> APIs.
Example using rc file (bash):
$ source openrc-zettaoi-mydomain-myproject-myuser.sh
Enter password for myuser in domain mydomain: ***************
We have now added authentication environment variables and are ready to use all OpenStack clients.
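Programs started from the same shell can read these variables too. The following Python sketch collects them and fails early if one is missing. The list of variable names is an assumption based on common Keystone v3 RC files; check the generated RC file for the exact set it exports.

```python
import os

# Variable names commonly exported by Keystone v3 RC files (an assumption;
# the generated RC file is the authority on what is actually set).
REQUIRED_VARS = (
    "OS_AUTH_URL",
    "OS_USERNAME",
    "OS_PASSWORD",
    "OS_PROJECT_NAME",
    "OS_USER_DOMAIN_NAME",
)


def load_openstack_env(environ=None):
    """Return the required OS_* settings, raising if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_openstack_env()` without arguments reads the current process environment, so it only succeeds after the RC file has been sourced.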
List all containers in swift:
$ swift list
backup
test
documents
Containers
To store files in Swift you have to make a container. All the files in the container will follow the rules you defined for this container such as access control and storage policy. You can make as many containers as you want and will only be billed for the amount you store.
Swift File System
Swift's file system is really a flat structure of key-value pairs where the key is a path and the value is a file/object. Instead of directories, swift uses pseudo-folders. You don't really have to worry about the difference. Just think about it as a normal hierarchical file system for now.
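To make the idea of pseudo-folders concrete, here is a small Python sketch that lists one "directory level" of a flat key space using a prefix and a delimiter. It only models the concept locally and is not Swift's implementation; the key `images/icons/home.png` is a made-up example.

```python
def list_level(keys, prefix="", delimiter="/"):
    """List one 'directory level' of a flat key space.

    Returns (objects, pseudo_folders) found directly under prefix.
    """
    objects, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this key lives in a pseudo-folder.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


keys = ["document.pdf", "images/logo.png", "images/icons/home.png"]
print(list_level(keys))             # (['document.pdf'], ['images/'])
print(list_level(keys, "images/"))  # (['images/logo.png'], ['images/icons/'])
```

The folders never exist as objects of their own; they are derived from the keys each time a listing is requested.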
Object (file) size is limited by the max_file_size value published in the cluster's capabilities endpoint, which defaults to 5GB. Larger files are easily supported by segmenting the file: multiple objects of 5GB or less can be referenced and downloaded as a single file through a manifest file. Swift supports both Dynamic Large Objects and Static Large Objects. This concept is explained later in the article.
The advantage of this limitation is that the cluster can handle object replication more easily. It also makes it possible to upload multiple segments concurrently for higher transfer rates. DLOs and SLOs have other features as well, such as appending to a file without having to re-upload the entire file from the start.
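As a rough illustration of segmenting, the Python sketch below computes the byte ranges a client could upload as separate segments. The 5 GiB segment size is an assumption standing in for the cluster's real max_file_size, and the function name is hypothetical.

```python
def plan_segments(total_size, segment_size):
    """Return (offset, length) pairs that together cover total_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        (offset, min(segment_size, total_size - offset))
        for offset in range(0, total_size, segment_size)
    ]


GIB = 1024 ** 3
segments = plan_segments(12 * GIB, 5 * GIB)
print(len(segments))                              # 3
print([length // GIB for _, length in segments])  # [5, 5, 2]
```

Each (offset, length) pair is independent of the others, which is what makes it possible to upload the segments concurrently.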
Basic Usage
A full reference can be found in the official Swift Client documentation. You can also use the general OpenStack client, but the syntax is very different and it may not support the more advanced options.
Important
Mac OS X users may experience a filename conflict between Apple's swift language interpreter and the swift client if Xcode is installed. To solve this you need to either modify your path or use a full path to the correct executable.
While it's probably more common to use Swift as a storage backend embedded into existing software, it's still useful to know how basic management works. Our dashboard supports browsing Swift containers and can delete and upload files up to ~1GB. There's also nothing wrong with using the Swift client directly to upload files, either manually or in scheduled tasks.
Create a container mycontainer that will by default be set to private.
$ swift post mycontainer
$ swift list
mycontainer
Let's display the container's information as well. We will come back to some of these fields later.
$ swift stat mycontainer
Account: AUTH_aabbccddeeff00112233445566778899
Container: mycontainer
Objects: 0
Bytes: 0
Read ACL:
Write ACL:
Sync To:
Sync Key:
Accept-Ranges: bytes
X-Storage-Policy: Standard
X-Timestamp: 1488909310.79677
Content-Type: text/plain; charset=utf-8
X-Trans-Id: abcdef123456789-0123456789
We have some local files we want to upload.
$ ls -l
-rw-r--r-- 1 demo staff 239743 Mar 7 18:49 document.pdf
drwxr-xr-x 3 demo staff 102 Mar 7 18:49 images
$ swift upload mycontainer .
document.pdf
images/logo.png
$ swift list mycontainer
document.pdf
images/logo.png
Delete the local files and download the container:
$ rm -rf *
$ swift download mycontainer
images/logo.png [auth 0.526s, headers 0.754s, total 0.755s, 0.038 MB/s]
document.pdf [auth 1.541s, headers 1.832s, total 1.893s, 0.682 MB/s]
$ ls -l
-rw-r--r-- 1 demo staff 239743 Mar 7 18:49 document.pdf
drwxr-xr-x 3 demo staff 102 Mar 7 19:11 images
Commands also support a prefix path that you can use to affect only a subset of the files.
$ swift list mycontainer -p images
images/logo.png
$ swift download mycontainer -p images
images/logo.png [auth 0.495s, headers 0.738s, total 0.738s, 0.036 MB/s]
Delete a file:
$ swift delete mycontainer images/logo.png
images/logo.png
$ swift list mycontainer
document.pdf
Delete the entire container and all the files it contains:
$ swift delete mycontainer
Storage Policy
When and only when creating a container you can define the storage policy it should follow. Policies are defined by the service provider.
We have the following storage policies:
- Standard : Stores 3 copies of the data
- Reduced : Stores 2 copies of the data
Using the Reduced storage policy is cheaper but has obvious side effects, such as a higher risk of data loss and possibly lower read speeds from the cluster. The Standard policy is applied by default.
The available storage policies and other capabilities in Swift are exposed as JSON, or they can be displayed using the official swift client with swift capabilities.
Policies are also referenced if you stat your account to get a summary of your current usage:
$ swift stat
Account: AUTH_aabbccddeeff00112233445566778899
Containers: 4
Objects: 53
Bytes: 1263057
Containers in policy "reduced": 0
Objects in policy "reduced": 0
Bytes in policy "reduced": 0
Containers in policy "standard": 4
Objects in policy "standard": 53
Bytes in policy "standard": 1263057
X-Timestamp: 1432059075.23025
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
X-Account-Project-Domain-Id: 998877665544332211aabbccddeeff
X-Trans-Id: abcdef123456789-0123456789
Access Control
Newly created containers are private by default. We can change this by using swift post to set ACLs. A more comprehensive list of ACL examples can be found in the official OpenStack documentation. The dashboard supports toggling containers between private and public, but doing so may overwrite more fine-grained ACLs set using the CLI tools.
Let's create our container again and upload the previous test files:
$ swift post mycontainer -r ".r:*,.rlistings"
$ swift upload mycontainer .
Navigate to https://objects.zetta.io/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer?format=json (or fetch the container URL from the dashboard) using a web browser and you will see output similar to the following. Note that the ?format=json query parameter requests the container listing in JSON format. Without it, Swift picks the format from the request's Accept header, so a browser will typically receive XML, while a client that states no preference gets a plain-text list.
[{
"hash": "7aeb0d094f47e6a8035486c10cbcd315",
"last_modified": "2017-03-06T18:56:01.062430",
"bytes": 239743,
"name": "document.pdf",
"content_type": "application/pdf"
}, {
"hash": "06ea2496bcd74ddaf0232fb1ed551a8e",
"last_modified": "2017-03-06T18:56:02.105210",
"bytes": 8664,
"name": "images/logo.png",
"content_type": "image/png"
}]
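A JSON listing like this is easy to process in a script. The Python sketch below parses the sample response shown above and totals the object sizes; the helper name `summarize` is hypothetical, and the listing is embedded as a string rather than fetched over the network.

```python
import json

# The sample listing from above, embedded as text instead of being downloaded.
listing_json = """
[{"hash": "7aeb0d094f47e6a8035486c10cbcd315",
  "last_modified": "2017-03-06T18:56:01.062430",
  "bytes": 239743, "name": "document.pdf",
  "content_type": "application/pdf"},
 {"hash": "06ea2496bcd74ddaf0232fb1ed551a8e",
  "last_modified": "2017-03-06T18:56:02.105210",
  "bytes": 8664, "name": "images/logo.png",
  "content_type": "image/png"}]
"""


def summarize(listing_text):
    """Return (object names, total size in bytes) for a JSON container listing."""
    entries = json.loads(listing_text)
    names = [entry["name"] for entry in entries]
    total = sum(entry["bytes"] for entry in entries)
    return names, total


names, total_bytes = summarize(listing_json)
print(names)        # ['document.pdf', 'images/logo.png']
print(total_bytes)  # 248407
```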
With the additional -r option we specified that requests with any HTTP Referer header can read the container's contents. We also added .rlistings, which allows the container to be listed, as shown in the output above. Without .rlistings, all files in the container can still be accessed if their paths are known.
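The ACL string is a comma-separated list of elements. As a reading aid only, the Python sketch below splits such a string into referrer rules, the listing flag, and any remaining grants. It is not Swift's own ACL parser, and the function name and the returned structure are assumptions made for illustration.

```python
def describe_read_acl(acl):
    """Split a read ACL string into referrer rules, listing flag and other grants."""
    referrers, grants, listings = [], [], False
    for element in (part.strip() for part in acl.split(",")):
        if not element:
            continue
        if element == ".rlistings":
            listings = True
        elif element.startswith(".r:"):
            # Keep only the referrer pattern that follows the ".r:" marker.
            referrers.append(element[3:])
        else:
            grants.append(element)
    return {"referrers": referrers, "listings": listings, "grants": grants}


print(describe_read_acl(".r:*,.rlistings"))
# {'referrers': ['*'], 'listings': True, 'grants': []}
```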
Running a stat on the container clearly displays the read ACLs.
$ swift stat mycontainer
Account: AUTH_aabbccddeeff00112233445566778899
Container: mycontainer
Objects: 2
Bytes: 248407
Read ACL: .r:*,.rlistings
Write ACL:
Sync To:
Sync Key:
Accept-Ranges: bytes
X-Storage-Policy: Standard
X-Timestamp: 1488909310.79677
Content-Type: text/plain; charset=utf-8
X-Trans-Id: abcdef123456789-0123456789
Note that files in private containers can also be shared temporarily using the temp-url feature in OpenStack Swift. Some prefer to generate temp-urls even for web content, to make their files harder to index and track. The container is then private, and the web server generates a temp-url per resource that is only valid for a short period of time. Generating them is fairly cheap and will be covered later.
Cross-Origin HTTP Request (CORS)
When you want to embed files from Swift into an HTML document you will have to deal with CORS headers. Swift has to tell your web page that it consents to this. If you don't configure CORS headers on the container, they will be missing from the response and the user's browser will interpret this as a denial. This also applies to requests made from JavaScript or anything that is more than just a link. More details about CORS can be found in the Mozilla developer documentation.
Let's attempt a CORS request using curl, simulating a browser making the request on behalf of a page served from http://webapp.example.com.
$ curl -i -X OPTIONS \
    -H "Origin: http://webapp.example.com" \
    -H "Access-Control-Request-Method: GET" \
    https://objects.zetta.io:8443/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png
HTTP/1.1 401 Unauthorized
Content-Length: 131
Content-Type: text/html; charset=UTF-8
Allow: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
X-Trans-Id: tx5d62a6590a434d8bbb59c-0058bf207c
Date: Tue, 07 Mar 2017 21:05:00 GMT
As we can see, this request returned HTTP/1.1 401 Unauthorized because Swift has not been told to allow http://webapp.example.com to access files in the container through a CORS request. Browsers perform this kind of check before letting a page's scripts use resources from another origin.
To solve this we set the X-Container-Meta-Access-Control-Allow-Origin header on the container:
$ swift post mycontainer -H "X-Container-Meta-Access-Control-Allow-Origin: http://webapp.example.com"
Let's stat the container to see the changes:
$ swift stat mycontainer
Account: AUTH_aabbccddeeff00112233445566778899
Container: mycontainer
Objects: 2
Bytes: 248407
Read ACL: .r:*,.rlistings
Write ACL:
Sync To:
Sync Key:
Meta Access-Control-Allow-Origin: http://webapp.example.com
Accept-Ranges: bytes
X-Storage-Policy: Standard
X-Timestamp: 1488909310.79677
Content-Type: text/plain; charset=utf-8
X-Trans-Id: abcdef123456789-0123456789
We can see that an Access-Control-Allow-Origin field has been added with the value http://webapp.example.com. Doing the same request with curl...
$ curl -i -X OPTIONS \
    -H "Origin: http://webapp.example.com" \
    -H "Access-Control-Request-Method: GET" \
    https://objects.zetta.io:8443/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png
HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://webapp.example.com
Access-Control-Allow-Methods: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Allow: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Content-Length: 0
X-Trans-Id: abcdef123456789-0123456789
Date: Tue, 06 Mar 2017 17:38:57 GMT
More detailed configuration can be done with CORS headers. This can be found in the official OpenStack documentation.
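The same preflight check can be scripted. The Python sketch below builds the OPTIONS request without sending it and includes a small helper that inspects the response headers. Both function names are hypothetical, the URL is a placeholder, and the helper only looks at Access-Control-Allow-Origin, which is a simplification of what a real browser checks.

```python
import urllib.request


def build_preflight(url, origin, method="GET"):
    """Build (but do not send) the OPTIONS request a browser would issue."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": method,
        },
    )


def origin_allowed(response_headers, origin):
    """Return True if the response headers grant access to the given origin."""
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed in ("*", origin)


# Placeholder URL; substitute the real object URL before sending.
request = build_preflight(
    "https://objects.example.com/v1/AUTH_project/mycontainer/images/logo.png",
    "http://webapp.example.com",
)
print(request.get_method())  # OPTIONS
```

To actually send the request, pass it to `urllib.request.urlopen` and hand the response's `headers` to `origin_allowed`. Note that `urlopen` raises `urllib.error.HTTPError` for a 401 response like the one shown earlier.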
Cache
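Swift always includes Etag and Last-Modified headers when responding to a request for an object. Browsers and caches can use these to check whether a cached copy is still current.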
$ curl -I https://objects.zetta.io/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png
...
Last-Modified: Tue, 06 Mar 2017 23:36:55 GMT
Etag: 06ea2496bcd74ddaf0232fb1ed551a8e
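For a regular (non-segmented) object the Etag is the MD5 checksum of the content, so it can also be used to verify a download. The Python sketch below compares a locally computed MD5 with an Etag value; the function names are hypothetical, and large objects built from segments generally do not follow this simple rule.

```python
import hashlib


def md5_hex(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_etag(chunks, etag):
    """Compare downloaded content with the Etag header of a regular object."""
    # Some servers wrap the Etag in double quotes; strip them before comparing.
    return md5_hex(chunks) == etag.strip('"').lower()


print(matches_etag([b"hello ", b"world"], "5eb63bbbe01eeed093cb22bb8f5acdc3"))  # True
```

Feeding the data in chunks means a large download can be verified while it is being written to disk, without holding it all in memory.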
Temporary URLs
Temporary URLs are a way to give temporary access to a private resource for a limited period of time. Before we can use the feature we need to define a secret key on our account. Most (if not all) clients and backends support temporary URLs and will generate them on the fly if you configure them with the secret key.
$ swift post --meta "Temp-URL-Key:secret-key-data"
The secret-key-data value should be a properly generated secret using ASCII characters. We can now generate a temp-url for the resource using the swift client.
$ swift tempurl GET 86400 /v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png secret-key-data
/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png?temp_url_sig=fabfda7b618816e8aef8d253cb0241a4ad7bb47c&temp_url_expires=1489013276
This allows access to the file using HTTP GET for 86400 seconds (24 hours). Notice that we have to use the full path after the domain for this to work. The resulting URL is returned in the same format. The private file is now accessible through https://objects.zetta.io/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png?temp_url_sig=fabfda7b618816e8aef8d253cb0241a4ad7bb47c&temp_url_expires=1489013276.
Generating this URL is fairly simple in any language. It uses a keyed-hash message authentication code (HMAC) with SHA-1. The message we sign consists of the HTTP method, the Unix timestamp at which the URL expires, and the full unquoted path to the file, each on its own line (unquoted here meaning it should not be URL-encoded):
GET
1489013276
/v1/AUTH_{{YOUR_PROJECT_ID}}/mycontainer/images/logo.png
Since we send the signature through temp_url_sig and the timestamp through temp_url_expires as query parameters, the server can reconstruct the message and verify access easily. Because the server does not need to store anything per URL, you can generate as many temporary URLs as you need. Examples of generating temporary URLs can be found in the swiftclient code base.
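The signing scheme described above can be sketched in a few lines of Python using only the standard library. The function name and its `now` parameter (which makes the result reproducible) are assumptions for illustration; the path passed in must be the full, unencoded path starting at /v1/.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(path, key, ttl_seconds, method="GET", now=None):
    """Return path with temp_url_sig and temp_url_expires query parameters added."""
    now = time.time() if now is None else now
    expires = int(now + ttl_seconds)
    # The signed message is: method, expiry timestamp and path, one per line.
    message = "{}\n{}\n{}".format(method, expires, path)
    signature = hmac.new(
        key.encode("utf-8"), message.encode("utf-8"), sha1
    ).hexdigest()
    return "{}?temp_url_sig={}&temp_url_expires={}".format(path, signature, expires)


# Hypothetical account path; use your own project ID and secret key.
print(make_temp_url(
    "/v1/AUTH_project/mycontainer/images/logo.png", "secret-key-data", 86400
))
```

The returned value is relative to the storage host, so prepend the scheme and domain to get a usable link. If the object name contains characters that need URL-encoding, sign the unencoded path and encode it only when building the final link.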
Static Large Object (SLO) and Dynamic Large Object (DLO)
These are covered in detail in the official OpenStack documentation, but we'll cover the basics here. Both SLOs and DLOs allow segments to be uploaded concurrently for faster uploads.
DLOs are created by uploading multiple objects with a common prefix. A 0-byte manifest file is then created with an X-Object-Manifest header containing the container name and the prefix under which the segments are located (example: mycontainer/segments/large-file). The manifest represents all the segments as a single file. Fetching the manifest streams the segments in alphanumeric order of their names.
The advantage of DLOs is simply that they are dynamic: the manifest only references a prefix in a container. You can keep adding new segments with the same prefix at any time, and the manifest will then represent an even larger file. Files that need to be appended to can take advantage of this. Each segment must be at least 1 byte, segments can differ in size, and the number of segments can be in the thousands.
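Because DLO segments are streamed in name order, the naming scheme matters. The Python sketch below shows why zero-padded counters are a safe choice. The helper name and the width of 8 digits are assumptions, not a Swift requirement.

```python
def segment_name(prefix, index, width=8):
    """Build a zero-padded segment name so that name order equals upload order."""
    return f"{prefix}/{index:0{width}d}"


unpadded = sorted("segments/large-file/{}".format(i) for i in (1, 2, 10))
padded = sorted(segment_name("segments/large-file", i) for i in (1, 2, 10))

print(unpadded)
# ['segments/large-file/1', 'segments/large-file/10', 'segments/large-file/2']
print(padded)
# ['segments/large-file/00000001', 'segments/large-file/00000002', 'segments/large-file/00000010']
```

Without padding, segment 10 sorts before segment 2 and the downloaded file would be assembled in the wrong order.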
SLOs are very similar to DLOs, except that the manifest file is a JSON file that references every single segment explicitly. The format is described in detail in the official documentation.
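As a minimal sketch of what such a manifest can look like, the Python code below builds one from in-memory segments. It assumes the commonly documented entry keys `path`, `etag` and `size_bytes`; the function name and the segment naming scheme are hypothetical, and the official documentation remains the authority on the exact format and on how the manifest is uploaded.

```python
import hashlib
import json


def build_slo_manifest(container, prefix, segments):
    """Return JSON manifest text for an iterable of segment byte strings."""
    entries = []
    for index, data in enumerate(segments, start=1):
        entries.append({
            # Assumed naming scheme: /<container>/<prefix>/<zero-padded index>
            "path": "/{}/{}/{:08d}".format(container, prefix, index),
            "etag": hashlib.md5(data).hexdigest(),
            "size_bytes": len(data),
        })
    return json.dumps(entries, indent=2)


print(build_slo_manifest("mycontainer", "segments/large-file", [b"abc", b"de"]))
```

Unlike a DLO, the order and identity of the segments are fixed by the manifest itself, so segment names do not have to sort in any particular way.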
Official Python API
The official Python API is fairly straightforward to use if you know the general workings of OpenStack Swift. Do note that the official API is separated into a connection API and a service API. The connection API is low level, while the service API is much more high level and supports uploading in multiple threads/workers to handle large numbers of files efficiently.
The HTTP API is documented at https://developer.openstack.org/api-ref/object-storage/