HIFIS transfer service

The HIFIS transfer service enables Helmholtz scientists to transfer large data sets between sites. To make this convenient, we can use an instance of CERN’s FTS3 in combination with WebFTS. FTS has been developed at CERN and is used to distribute experimental data to hundreds of LHC tier centres. WebFTS is a browser-based UI for intuitive use of FTS. The advantage of FTS over FTP-based solutions or Dropbox is that it can commission third-party data transfers between endpoints, which then run asynchronously. Transfers of large data sets can thus be commissioned in a ‘fire-and-forget’ manner.

In order to use FTS at a Helmholtz centre, an endpoint needs to be present there. Up until now, dedicated WLCG storage solutions, e.g. dCache, DPM and EOS, had to be installed on site. For HIFIS, an Apache web server with some modified modules can be used instead of the rather complex WLCG solutions.

The required modules and the modifications that need to be patched in can be found in the HZDR GitLab.

This is a copy of the instructions hosted in the repository:

The repo features the implementation and configuration details for an Apache2-server that is capable of being addressed by CERN’s FTS3 in order to do 3rd-party-copy (TPC) of large datasets. The Apache2 instance has been chosen because of its ubiquity and reliability. The modules that are used are tested and externally maintained, apart from a patch applied to the optional mpm-itk, which has to be applied manually and is not being maintained externally.

Purpose and features of this endpoint realisation in short:

  • Apache2 webserver capable of serving files through utilisation of the WebDAV protocol.
  • WebDAV endpoint is secured by OAuth2
  • With the mpm-itk module and a lua script, it is possible to map a remote user to a local user on the system and read & write files to the filesystem while honouring the local user’s ACLs
  • Together with a prototype module the Apache is also able to compare checksums of transmitted files in accordance with RFC 3230 (see mod_want_digest in this repo).
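
To illustrate the checksum feature: RFC 3230 lets a client ask for a checksum with a Want-Digest request header and receive it in a Digest response header of the form algorithm=value. The sketch below computes such a value for a local file, so that it could be compared with what an endpoint reports. SHA-256 is only an assumption chosen for illustration, since the text above does not say which algorithms mod_want_digest supports, and the function name digest_header_value is hypothetical.

```shell
# Sketch: compute a Digest value in the "SHA-256=<base64>" form for a local file.
# Assumption: SHA-256 is just an example algorithm; check what your endpoint offers.
digest_header_value() {
    file="$1"
    # openssl emits the raw hash bytes, base64 encodes them as the header expects
    printf 'SHA-256=%s\n' "$(openssl dgst -sha256 -binary "$file" | base64)"
}
```

Calling digest_header_value foo.txt prints a single line that can be compared with the Digest header returned for the same file.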

Prerequisites

  • a machine that can be accessed on ports 80 and 443 from the outside
  • a place to store data, e.g. a persistent volume or a storage endpoint that can be accessed by the Apache2 server; this is the PROTECTED_LOCATION in the default-ssl.conf
  • install the necessary packages with apt (Ubuntu) on the machine
    • for the apache, this is simply apache2, apache2-dev and libapache2-mod-auth-openidc
    • for the OIDC/OAuth2 support, you can also build mod_auth_openidc (by Hans Zandbelt of zmartzone) yourself; the dockerfile included in the repo tells you which packages to install additionally
    • if you need the transferred files to have a certain user’s ACLs, you will also need the mpm-itk module, with mpm_itk.c replaced by the corresponding file in this repo.
  • get an SSL certificate either by
    • using Let’s Encrypt, if you only need a certificate for quick testing (n.b.: LE only offers Domain Validation certificates!)
    • requesting it from a CA such as gridcert by KIT GridKA
    • requesting it from your local CA if you need it within the context of your home organization
    • making one yourself, if the machine is just for testing purposes (Heroku provides a tutorial); be aware that this might trigger certificate warnings and might not work with FTS
  • get an account with an OIDC proxy or IdP
  • register a client with your OIDC proxy/IdP and get the client_id and client_secret
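
For the self-made test certificate mentioned in the list above, a minimal sketch with openssl could look like this. The function name make_test_cert, the file names hostcert.pem and hostkey.pem, the key size and the 30-day validity are placeholders chosen for illustration, and the -addext option assumes OpenSSL 1.1.1 or newer.

```shell
# Sketch: create a throwaway self-signed certificate, for testing only.
# Assumptions: host name, file names and validity period are placeholders.
make_test_cert() {
    host="$1"
    outdir="$2"
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$outdir/hostkey.pem" -out "$outdir/hostcert.pem" \
        -days 30 -subj "/CN=$host" \
        -addext "subjectAltName=DNS:$host"
}
```

A call such as make_test_cert your.domain.de /path/to/cert/dir writes both files into the given directory. As noted above, clients will warn about such a certificate and FTS might refuse it.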

Configuration

All necessary files are in this repository. Wherever a value of your choice is required, the scripts contain a placeholder of the form YOUR_VALUE_REQUIRED. At least two locations need to be specified in the config scripts: the location of the content you want to serve as a protected resource, and a location on the file system for your certificates, the lock database (DavLockDB) and the lua script, although it is recommended to keep those in separate directories. All directories need to be readable and writable for the apache, but the latter must not be exposed to the outside!
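
Before starting the server, it can help to check that no placeholder was overlooked. The following is a minimal sketch; it assumes that every placeholder starts with YOUR_ followed by capitals and underscores (as in YOUR_VALUE_REQUIRED or YOUR_DOMAIN_DE), and the function name find_placeholders is hypothetical.

```shell
# Sketch: list placeholders that still need a real value.
# Assumption: all placeholders match YOUR_ followed by capitals/underscores.
find_placeholders() {
    # -r: search the directory recursively, -n: show line numbers
    grep -rnE 'YOUR_[A-Z_]+' "$1"
}
```

Running find_placeholders /etc/apache2 prints one line per remaining placeholder together with its file and line number, and prints nothing once all values are set.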

  • after installing the apache, all configuration files should be available in /etc/apache2/ and the corresponding subdirectories
    • the main config is apache2.conf
    • additional configuration is done with conf-files, which are held in conf-available and activated by a2enconf CONFNAME, which creates a symlink to the corresponding conf in conf-enabled
    • the mod-configs are in /etc/apache2/mods-available and are activated by a2enmod MODNAME
    • sites are in /etc/apache2/sites-available and are activated by a2ensite SITENAME
    • SSL/TLS is configured by setting a vhost for all IPs and port 443 in /etc/apache2/sites-available/default-ssl.conf (file name may vary depending on your Apache2 release)
    • if you want to redirect http to https permanently, you can set Redirect permanent / https://YOUR_DOMAIN_DE within the VirtualHost directive in /etc/apache2/sites-available/000-default.conf or use mod_rewrite for the same purpose with the lines

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R,L]

    • for additional protection against phishing and DNS cache poisoning, you can enable HSTS by enabling mod_headers and inserting Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains" into the VirtualHost directive for port 443. You could also consider adding HSTS-preload to the header and registering your domain with an HSTS preload list for added security in a production environment.
    • you can also choose to deny access to port 80, either within the apache itself or by using firewall rules, which could be the sensible thing to do here.
  • use a2enmod to enable mod_dav, mod_dav_fs and mod_dav_lock for directory listing and PUT requests; the options for the directory are set in openidc.conf
  • clone the mod_auth_openidc git repository, follow the build instructions for the underlying library and install it on your machine, or install it with apt (not tested yet, the Ubuntu repositories might not contain the latest version)
  • copy the config-file openidc.conf from this repo into /etc/apache2/conf-available
    • set an arbitrary OIDCCryptoPassPhrase (it is needed for the internal encryption of data; it is only needed elsewhere if multiple Apache instances are using the same cache)
    • set all info in the conf-file according to the IdP you are using (sample config for iam.extreme-datacloud.eu is provided at the end)
    • set the path for the protected resource (also in the sample config)
  • in case you are using a grid hostCert (IMPORTANT FOR FTS):
    • copy the hostCert, hostKey and the GridKa rootCert into a folder the apache can access
    • if you want to be very careful, check the rootCert fingerprint with openssl x509 -noout -fingerprint -sha256 -inform pem -in rootCertFile and compare it to the one given at www.gridka.de
    • in /etc/apache2/sites-available:
      • set the paths for the hostCert and hostKey with SSLCertificateFile and SSLCertificateKeyFile
      • set the path of the rootCert in SSLCertificateChainFile
  • Caveat: with only OAuth2 enabled, simple browser access is not possible at the moment.

  • in order to access the file system as a certain local user, you will also need to add a mapping between the remote user, identified by sub@iss from the OAuth2 token, and a local user. In this first version, this is realized with a lua hook that reads the sub and iss claims from the token and queries a csv file that links the remote and local identities.

  • dynamic setuid and setgid capabilities are provided by the multi-processing module mpm-itk, which needs a slight modification as shown in mpm_itk.c
    • you will need to download the module sources and replace mpm_itk.c in the source-tree before using make and sudo make install to build and install the module to your Apache2 module files.
  • lastly, you need to load the mpm-itk module in /etc/apache2/apache2.conf with LoadModule mpm_itk_module /usr/lib/apache2/modules/mpm_itk.so
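
The enabling steps from the list above could be collected in a small script. This is a sketch under assumptions: dav, dav_fs, dav_lock, ssl, headers, rewrite, auth_openidc and lua are the usual Debian/Ubuntu module names, openidc and default-ssl follow the file names used above, and the run helper with its DRY_RUN switch is hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_endpoint() {
    run sudo a2enmod dav dav_fs dav_lock   # WebDAV: directory listing and PUT
    run sudo a2enmod ssl headers rewrite   # TLS, HSTS header, http to https redirect
    run sudo a2enmod auth_openidc lua      # OAuth2 protection and lua hook (assumed names)
    run sudo a2enconf openidc              # conf-available/openidc.conf
    run sudo a2ensite default-ssl          # sites-available/default-ssl.conf
    run sudo apache2ctl configtest         # syntax check before restarting
    run sudo systemctl restart apache2
}
```

Calling enable_endpoint first in dry-run mode makes it easy to review the commands before running them with DRY_RUN=0.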

For a first test, you can simply use cURL commands from your terminal to download and upload files:

curl https://your.domain.de/path/to/file -O -H "Authorization: Bearer `oidc-token YOUR_OP`" --cacert path/to/GridKa-CA-root.pem
curl -X PUT https://your.domain.de/protected/path --upload-file foo.txt -H "Authorization: Bearer `oidc-token YOUR_OP`" --cacert path/to/GridKa-CA-root.pem

The authorization with OAuth2 takes the form of a bearer token in the Authorization header of the HTTP request. oidc-token is a command belonging to oidc-agent, which manages OIDC and OAuth2 tokens on the command line. You can get it at https://indigo-dc.gitbook.io/oidc-agent/, where the download options, installation directions and documentation are provided.
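
To see which remote identity a token carries, and how it could be mapped to a local user as described in the configuration section, the following sketch decodes the token payload and looks up sub@iss in a csv file. It rests on several assumptions: the access token is a JWT (not every provider issues one), the csv file has lines of the form sub@iss,localuser (the lua script in the repo may expect a different layout), and all function names are hypothetical. The sketch does not verify the token signature; it is meant for inspection only.

```shell
# Decode the payload (second dot-separated part) of a JWT.
jwt_payload() {
    part="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
    # restore the base64 padding that base64url encoding drops
    case $(( ${#part} % 4 )) in
        2) part="$part==" ;;
        3) part="$part=" ;;
    esac
    printf '%s' "$part" | base64 -d
}

# Extract a string claim from flat JSON (sketch only; prefer a real JSON parser).
json_claim() {
    printf '%s\n' "$1" | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Look up the local user for "sub@iss" in a csv file (assumed layout: sub@iss,localuser).
lookup_local_user() {
    awk -F, -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Combine the steps: token -> sub@iss -> local user name.
map_token() {
    payload="$(jwt_payload "$1")"
    sub="$(json_claim "$payload" sub)"
    iss="$(json_claim "$payload" iss)"
    lookup_local_user "$sub@$iss" "$2"
}
```

With oidc-agent set up, map_token "$(oidc-token YOUR_OP)" /path/to/mapping.csv would print the local user name for the current token, or nothing if no mapping exists.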