Commit 67218e2e authored by Cristiano Urban's avatar Cristiano Urban

Removed README.txt and TODO.txt, added README.md.

parent 46f889ed

README.md

## VOSpace backend

### Introduction

This repository hosts the code of the VOSpace backend.
This is a dockerized version, including all the VOSpace components, that can be run directly on your laptop.

The VOSpace implementation is composed of several parts, each hosted in one of the [following repositories](https://www.ict.inaf.it/gitlab/vospace).

For a production-like demo, please refer to the [vospace-demo](https://www.ict.inaf.it/gitlab/vospace/vospace-demo) repository or simply visit [this page](http://staging.ia2.inaf.it/) and try it out.

For more information about the VOSpace specification, please refer to:
- [IVOA Documents & Standards](https://www.ivoa.net/documents/)
- [VOSpace standard v2.1](https://www.ivoa.net/documents/VOSpace/20180620/REC-VOSpace-2.1.html)
- [Universal Worker Service Pattern v1.1](https://www.ivoa.net/documents/UWS/20161024/REC-UWS-1.1-20161024.html)

Further documentation on the VOSpace implementation can be found [here](https://redmine.ict.inaf.it/projects/401/wiki).

### Main features

- Recursive scan, checksum calculation and .tar generation for data provided by users
- Database interaction for storing and retrieving information about VOSpace nodes, jobs, storage points and users
- Simple FCFS (First Come First Served) job scheduling based on Redis lists
- A set of command-line tools that simplify the administrator's interaction with the backend architecture


### Getting started

First of all, clone the repository on your local Linux machine, open a terminal and move into the *vospace-transfer-service* folder.
You can launch the whole environment by running the following commands (as a regular, non-root user):

```
docker-compose pull
docker-compose up
```
The web interface will be available in your browser at http://localhost:8080/ once all the containers are up and running.

To stop the environment and perform a cleanup, launch the following commands from another shell:

```
docker-compose down
docker system prune -a   # removes ALL unused images on the host, not only the VOSpace ones
docker volume prune      # removes ALL unused volumes
```

### Components

- Client (container_name: client): provides the user with command-line tools to interact with the backend
- Transfer service (container_name: transfer_service): the core of the backend architecture
- RabbitMQ (container_name: rabbitmq): an AMQP broker that delivers messages carrying requests from the user command-line tools and from the VOSpace REST APIs
- Redis (container_name: job_cache): used as a cache for the job queues
- File catalog (container_name: file_catalog), now available [here](https://www.ict.inaf.it/gitlab/vospace/vospace-file-catalog): a PostgreSQL database that stores information on VOSpace nodes, as well as on storage locations, jobs and users


#### Client

The *client* container provides the following command line tools:

- **vos_data**: launches a job that automatically stores data provided by the user on a given storage point (hot or cold)
- **vos_import**: imports VOSpace nodes into the file catalog for data already stored on a given storage point
- **vos_job**: provides information about jobs
- **vos_storage**: adds, lists and removes storage points

You can launch each of these commands without arguments to see its help page.
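
For example, from the host you can invoke a tool directly inside the *client* container (a hypothetical transcript; only the container name *client* and the tool names listed above are taken from this README):

```
# Print the help page of vos_data without opening an interactive shell:
docker exec client vos_data

# Or open a shell in the client container and explore the tools from there:
docker exec -it client bash
```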

#### Transfer service

The transfer service is the core of the VOSpace backend architecture.

You can access the *transfer_service* container with:
```
docker exec -it transfer_service bash
```
On this container, hosted on the so-called transfer node, each user has a home folder containing two subfolders that act, respectively, as the entry and exit point for the user's data:
- */home/name.surname/store*
- */home/name.surname/retrieve*

The user copies the data to be stored into the *store* folder and finds the requested data in the *retrieve* folder.
This use case was implemented to support users providing huge amounts of data, on the order of terabytes.
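
A typical round trip might look like the sketch below (the username *name.surname* is the placeholder used above, and the dataset path is hypothetical; the `hstore` request type comes from the *vos_data* help described later in this repository):

```
# On the transfer node: put the data to be archived into the store folder
cp -r /data/my_dataset /home/name.surname/store/

# From the client container: start a hot-storage job for that user
vos_data hstore name.surname

# Later, once a retrieval request completes, the data shows up here:
ls /home/name.surname/retrieve/
```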

#### RabbitMQ

You can access the RabbitMQ web interface via browser in two steps.

1. Find the IP address of the RabbitMQ broker:
```
docker network inspect vospace-transfer-service_backend_net | grep -i -A 3 rabbitmq
```
2. Open your browser and point it to http://<IPv4Address>:15672 (user: guest, password: guest)
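
Alternatively, Docker can report the address directly; this one-liner only assumes the container name *rabbitmq* listed in the Components section:

```
# Print the IPv4 address of the rabbitmq container and build the management URL:
ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' rabbitmq)
echo "http://${ip}:15672"
```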


#### Redis

You can access the Redis server from the **client** container by following the steps below.

1. Execute an interactive bash shell on the client container:
```
docker exec -it client bash
```

2. Use *redis-cli* command to connect to Redis:
```
redis-cli -h job_cache
```

3. You can obtain some info about the jobs by searching them on the following queues:
   - For write operations the queues are *write_pending*, *write_ready* and *write_terminated*
   - For read operations the queues are *read_pending*, *read_ready* and *read_terminated*.

Example: list the first six elements of the *write_ready* queue:
```
redis:6379> lrange write_ready 0 5
```
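
Each queue element is a JSON document describing the job, so you can pretty-print it from the *client* container's shell; the queue and host names come from the steps above, while piping through `python3 -m json.tool` assumes Python 3 is available in the container:

```
# Fetch the first job on the write_ready queue and pretty-print its JSON:
redis-cli -h job_cache lindex write_ready 0 | python3 -m json.tool
```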

#### File catalog

You can access the file catalog from the **client** container by following the steps below.

1. Execute an interactive bash shell on the client container:
```
docker exec -it client bash
```
2. Access the database via *psql* client:
```
psql -h file_catalog -d vospace_testdb -U postgres
```
3. You can now run a query, for example listing some fields of all the tuples in the **node** table:
```
vospace_testdb=# SELECT node_id, path, name, parent_path, type, owner_id, content_md5, async_trans, sticky FROM node;
```

You can also run any query on the other tables: *deleted_node*, *storage*, *location*, *job* and *users*.
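
Queries can also be run non-interactively with `psql -c`, which is convenient for scripting (same connection parameters as above; since this README does not list the columns of the *job* and *users* tables, the sketch below sticks to `SELECT *` and `COUNT(*)`):

```
# Count the registered users and dump the job table without entering psql:
psql -h file_catalog -d vospace_testdb -U postgres -c "SELECT COUNT(*) FROM users;"
psql -h file_catalog -d vospace_testdb -U postgres -c "SELECT * FROM job;"
```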

README.txt

Simple communication test that involves 5 docker containers:
- client (container_name: client, commands available: 'vos_data')
- server (container_name: transfer_service)
- RabbitMQ (container_name: rabbitmq)
- Redis (container_name: redis)
- File catalog (container_name: file_catalog), now available here: 
  https://www.ict.inaf.it/gitlab/vospace/vospace-file-catalog

In addition to these containers, Sonia Zorba modified 'docker-compose.yml' by adding REST, file service and ui portions.
The images used for this purpose are:
- git.ia2.inaf.it:5050/vospace/vospace-rest
- git.ia2.inaf.it:5050/vospace/vospace-file-service
- git.ia2.inaf.it:5050/vospace/vospace-ui

The web interface is available on your browser at http://localhost:8080/ when all the containers are up and 
running (read the section here below).
  
###############################################################################################################

You can start the whole environment from the 'vos-ts' directory with:
$ docker-compose up

Once all the containers are up and running, open another shell and access the 'client' container:
$ docker exec -it client /bin/bash

Now you can launch the 'vos_data' command.
Launching the client without any argument will show you how to use it:

client@28970a09202d:~$ vos_data

NAME
       vos_data

SYNOPSYS
       vos_data COMMAND USERNAME

DESCRIPTION
       The purpose of this client application is to notify to the VOSpace backend that
       data is ready to be saved somewhere.
       
       The client accepts only one (mandatory) command at a time.
       A list of supported commands is shown here below:

       cstore
              performs a 'cold storage' request, data will be saved on tape

       hstore
              performs a 'hot storage' request, data will be saved to disk

       The client also needs to know the username associated to a storage request process.
       The username must be the same used for accessing the transfer node.

       
For example, if we want to perform a 'cold storage' request for the 'curban' user, we do:
client@28970a09202d:~$ vos_data cstore curban

Choose one of the following storage locations:

----------------------------------------------------------------------
[*] storage_id: 1    =>   hostname: tape-fe.ia2.inaf.it
----------------------------------------------------------------------

Please, insert a storage id: 1

!!!!!!!!!!!!!!!!!!!!!!!!!!WARNING!!!!!!!!!!!!!!!!!!!!!!!!!!!
If you confirm, all your data on the transfer node will be
available in read-only mode for all the time the storage
process is running.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Are you sure to proceed? [yes/no]: yes

JobID: c63697eafbf711eaa44d0242ac1c0008
Storage process started successfully!

client@28970a09202d:~$


After receiving this request the application will:
1) Create a job object, insert it into the job table of the file catalog database and push a copy into a 
   'pending' queue stored in Redis for scheduling purposes
2) Scan the content of '/home/curban/store/' to find crowded 'leaf' dirs and replace them with an
   uncompressed tar, according to some constraints defined in the global configuration file
3) Re-scan the folder, move the content into a temporary folder if needed and perform recursive MD5 checksum
4) Re-scan the folder for the last time in order to obtain the final directory structure
5) Insert information about files and folders into the Node table of the file catalog, according to the VOSpace
   specification
6) Move the job from the 'write_pending' queue to the 'write_ready' queue in Redis, if all the previous steps 
   succeeded.
7) Obtain the physical paths from the VOSpace paths of the nodes and copy all the data to the right destination
   according to the information previously inserted by the user
8) Clean up the '/home/curban/store/' directory (remove data and set the right permissions) and update the
   database (the async_trans flag is set to 'true').

   
You can also import nodes on the VOSpace file catalog from data already stored somewhere.
For example, suppose we have a hot storage mounted on /mnt/hot_storage/users and visible from the transfer node.
Our user folder will be, for example, /mnt/hot_storage/users/curban.

On the transfer node you will find a directory called 'test_import' containing some data to be used for an import
test.

First of all, launch vos_import without any argument in order to see how to use it:

client@28970a09202d:~$ vos_import 

NAME
       vos_import

SYNOPSYS
       vos_import DIR_PATH USERNAME

DESCRIPTION
       This tool recursively imports nodes on the VOSpace file catalog.
       
       Two parameters are required:

       DIR_PATH:
           the physical absolute path of a directory located within the 
           user directory for a given mount point.
           
       USERNAME:
           the username used for accessing the transfer node.
           
EXAMPLE
      The following command will import recursively all the nodes contained
      in 'mydir' on the VOSpace for the 'jsmith' user:
      
      # vos_import /mnt/storage/users/jsmith/mydir jsmith   
    
client@28970a09202d:~$

Now, launch the import command to import the 'test_import' directory:

client@28970a09202d:~$ vos_import /mnt/hot_storage/users/curban/test_import curban

Import procedure completed!

client@28970a09202d:~$

This kind of operation works only for directories located at the first level of your user folder.


###############################################################################################################
     
You can access the rabbitmq web interface via browser:
    1) Find the IP address of the RabbitMQ broker:
    $ docker network inspect vos-ts_backend_net | grep -i -A 3 rabbitmq
    2) Open your browser and point it to http://IP_ADDRESS:15672 (user: guest, password: guest)

You can access the redis server from the 'client' container:
    1) Use redis-cli to connect to redis:
    $ redis-cli -h redis
    2) You can obtain some info about the jobs by searching them on the 'write_pending' and 'write_ready' queues 
       using the lrange command. For example, a few seconds after launching three jobs with 'dataArchiverCli.py',
       you should be able to see an output similar to the following one:
    redis:6379[2]> lrange write_ready 0 5
    1) "{\"jobId\": \"56577c8645da11ebbbfe356e379843eb\", \"jobType\": \"other\", \"ownerId\": \"2386\", \"phase\": \"PENDING\", 
    \"quote\": null, \"startTime\": null, \"endTime\": null, \"executionDuration\": null, \"destruction\": null, \"parameters\": null, 
    \"results\": null, \"jobInfo\": {\"requestType\": \"HSTORE\", \"userName\": \"szorba\"}}"
    2) "{\"jobId\": \"53d2f2a545da11ebb7bd356e379843eb\", \"jobType\": \"other\", \"ownerId\": \"2048\", \"phase\": \"PENDING\", 
    \"quote\": null, \"startTime\": null, \"endTime\": null, \"executionDuration\": null, \"destruction\": null, \"parameters\": null, 
    \"results\": null, \"jobInfo\": {\"requestType\": \"CSTORE\", \"userName\": \"sbertocco\"}}"
    3) "{\"jobId\": \"502afdca45da11eb9676356e379843eb\", \"jobType\": \"other\", \"ownerId\": \"3354\", \"phase\": \"PENDING\", 
    \"quote\": null, \"startTime\": null, \"endTime\": null, \"executionDuration\": null, \"destruction\": null, \"parameters\": null, 
    \"results\": null, \"jobInfo\": {\"requestType\": \"CSTORE\", \"userName\": \"curban\"}}"
            
You can access the file catalog from the 'client' container:
    1) Access the db via psql client:
    $ psql -h file_catalog -d vospace_testdb -U postgres
    2) You can now perform a query, for example show all the tuples of the Node table displaying some fields:
    vospace_testdb=# SELECT node_id, parent_path, path, name, type, owner_id, creator_id, content_MD5 FROM Node;
    
The default output of the query after the container initialization should be something like this:
 
vospace_testdb=# SELECT node_id, parent_path, path, name, tstamp_wrapper_dir, type, owner_id, creator_id, content_MD5 FROM node;
 node_id | parent_path |  path   |    name    | tstamp_wrapper_dir |   type    | owner_id | creator_id | content_md5 
---------+-------------+---------+------------+--------------------+-----------+----------+------------+-------------
       1 |             |         |            |                    | container | 0        | 0          | 
       2 |             | 2       | curban     |                    | container | 3354     | 3354       | 
       3 |             | 3       | sbertocco  |                    | container | 2048     | 2048       | 
       4 |             | 4       | szorba     |                    | container | 2386     | 2386       | 
       5 |             | 5       | test       |                    | container | 2386     | 2386       | 
       6 | 5           | 5.6     | f1         |                    | container | 2386     | 2386       | 
       7 | 5.6         | 5.6.7   | f2_renamed |                    | container | 2386     | 2386       | 
       8 | 5.6.7       | 5.6.7.8 | f3         |                    | data      | 2386     | 2386       | 
(8 rows)
       
A few seconds after launching three jobs with 'dataArchiverCli.py', the database will be populated and, launching the previous
SQL query, you will see an output like the one here below:       

vospace_testdb=# SELECT node_id, parent_path, path, name, tstamp_wrapper_dir, type, owner_id, creator_id, content_MD5 FROM node;
 node_id | parent_path |    path    |       name       | tstamp_wrapper_dir  |   type    | owner_id | creator_id |           content_md5          
  
---------+-------------+------------+------------------+---------------------+-----------+----------+------------+----------------------------------
       1 |             |            |                  |                     | container | 0        | 0          | 
       2 |             | 2          | curban           |                     | container | 3354     | 3354       | 
       3 |             | 3          | sbertocco        |                     | container | 2048     | 2048       | 
       4 |             | 4          | szorba           |                     | container | 2386     | 2386       | 
       5 |             | 5          | test             |                     | container | 2386     | 2386       | 
       6 | 5           | 5.6        | f1               |                     | container | 2386     | 2386       | 
       7 | 5.6         | 5.6.7      | f2_renamed       |                     | container | 2386     | 2386       | 
       8 | 5.6.7       | 5.6.7.8    | f3               |                     | data      | 2386     | 2386       | 
       9 | 2           | 2.9        | mydir            | 2021_01_12-14_48_07 | container | 3354     | 3354       | 
      10 | 2           | 2.10       | foo2.txt         | 2021_01_12-14_48_07 | data      | 3354     | 3354       | e07f37a6bfe96ad66e408380a5e3a899
      11 | 2.9         | 2.9.11     | another_foo2.txt | 2021_01_12-14_48_07 | data      | 3354     | 3354       | e048e5108d71191158b50052d531b0ca
      12 | 3           | 3.12       | foo4.txt         | 2021_01_12-14_48_22 | data      | 2048     | 2048       | 5f429d803340bb7748c52b3931ed54cf
      13 | 4           | 4.13       | aaa              |                     | container | 2386     | 2386       | 
      14 | 4.13        | 4.13.14    | bbb              |                     | container | 2386     | 2386       | 
      15 | 4.13.14     | 4.13.14.15 | foo5.txt         |                     | data      | 2386     | 2386       | 262214d5cde30a74997199fb4e220a26
(15 rows)
 
 
 From the file catalog database you can also obtain information about jobs, according to the UWS specification. 
 Just try the following query:
 vospace_testdb=# SELECT * FROM job;
 
 ###############################################################################################################
 
 Stop the whole environment:
 $ docker-compose down
 
 Cleanup:
 $ docker image prune -a
 $ docker volume prune

TODO.txt

- Paths in config file

vospace_path_prefix = /vos

[transfer_node]
base_path = /home/{username}/store

[servers]
hostname =
base_path = /home/{username}

[tape]
frontend = 
base_path = /home/users/{username}

- We should keep {username} coherent across the different storage points
  ({username} should be the same in all places)

- Temporary dir with timestamp => flag?

- Add hostname parameter to dataArchiver.py

- How to scale the system: multiple queues (more complex scheduling) + FSM
- If we have more than one tape library, do we need an entry on the configuration file for each one?
  Does spectrum archive send data to the right tape according to a defined policy?
- And how many IA2 servers?