Where is the documentation for uploaded test data?

I have encountered a lot of old documentation indicating that new baselines should be uploaded automatically:
https://www.google.com/search?client=safari&rls=en&q=ITK+upload+test+data&ie=UTF-8&oe=UTF-8

I believe this is the current document:

But there are still a lot of references to the data being automatically uploaded.

It also appears that if you just run the local CMake configure and build, you only get the MD5 file. But the recommendation is then to upload to data.kitware.com, which uses the .sha512 file, and that file is not created.
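To illustrate, a content link is just a small text file holding the hash of the original data; after a local configure all I end up with is something like the following (paths and the hash value here are only examples), with no corresponding .sha512 file:

$ cat Modules/Filtering/ImageGrid/test/Baseline/foo.png.md5
d41d8cd98f00b204e9800998ecf8427e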


Either set of documentation can currently be followed.

InsightSoftwareConsortium/ITK/blob/master/Documentation/Data.md

In this document, cmake is executed, and it generates a .md5 file.
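A minimal sketch of that method, assuming an existing ITK build tree (the file and directory names below are placeholders):

# Put the new binary file where the test expects it, then re-run the configure step.
cp foo.png ~/src/ITK/Modules/Filtering/ImageGrid/test/Baseline/
cd ~/bin/ITK-build
cmake .   # ExternalData replaces foo.png with foo.png.md5 and keeps the original in the local .ExternalData store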

InsightSoftwareConsortium/ITK/blob/master/Documentation/UploadBinaryData.md

In this document, the binary is uploaded to data.kitware.com, and the .sha512 file is downloaded.
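A rough sketch of those steps (the upload destination and file names are illustrative, not prescriptive):

# 1. Upload foo.png through the https://data.kitware.com web interface.
# 2. Download the generated key file and place it where the data is referenced in the repository:
mv ~/Downloads/foo.png.sha512 Modules/Filtering/ImageGrid/test/Baseline/foo.png.sha512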

Please follow one or the other – they are different methods.

After the transition to GitHub, only the second method will be valid, and we will update the documentation accordingly.

The first method is broken: the data is not uploaded automatically. The second, newer method does not work well with CMake, because CMake automatically moves my data and replaces it with the .md5 file. While the documentation is verbose, it is not clear on the process for ITK testing data.

The data gets uploaded when git gerrit-push is executed.
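For example (assuming the gerrit-push alias configured for ITK development is in place):

git gerrit-push   # pushes the topic for review; the referenced binary data is uploaded as part of this step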

The second method is better because it does not require running CMake.

The key is to follow one or the other document. Soon, we will have just one document.

Nope, doesn’t seem to happen.

I like (or at least I try) to run and test before submitting to Gerrit, so that does not work either.

They are broken.

The documentation needs to be read and followed.

The old system, which requires the developer to 1) generate the blob and content hash with CMake, and 2) upload the contents with git gerrit-push, has a few issues. First, if steps 1) and 2) are not both performed with the repository, the binary data does not get uploaded. This may be why it was not uploaded in your case. Second, people have a hard time understanding how the data is uploaded in step 2). In the new system, where the upload to data.kitware.com is explicit, these issues are addressed.
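If it is unclear whether step 2) actually transferred the data, one way to check is to list the data refs the server knows about. This assumes the data objects are published as refs under refs/data on the remote, mirroring the local refs the hooks create, and the remote name depends on your local setup:

git ls-remote <remote> "refs/data/*"   # lists the data refs present on the server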

They work. They have some issues, but these issues are being addressed.

Ahh, it works when following the instructions precisely. I see the following during the commit:

Modules/Filtering/ImageGrid/test/Baseline/foo.png.md5: Added content to Git at refs/data/MD5/d41d8cd98f00b204e9800998ecf8427e
Modules/Filtering/ImageGrid/test/Baseline/foo.png.md5: Added content to local store at .ExternalData/MD5/d41d8cd98f00b204e9800998ecf8427e

However, my usual process involves using append. When I try to append a commit with data, as opposed to literally “commit”, the appropriate script is NOT called. I did not see the above message when using “amend”.

I assume this refers to git commit --amend – there is no standard append command.

This message will only show up when the data is added to Git. If git commit --amend is used multiple times afterwards without any changes to the data, it will not show up.
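A small illustration of that behavior (hypothetical sequence):

git commit            # data newly referenced: the hook prints the "Added content to Git at refs/data/..." lines
git commit --amend    # amending later without touching the data: no such message appears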

Here is an example where the data was added with git commit --amend:

http://review.source.kitware.com/#/c/23383/

@matt.mccormick @blowekamp thanks for casting light on this.

@blowekamp I grant that the documentation may be misleading or may seem duplicated in places. As has been said, this may partly be due to the transition period we are going through, but I’d be happy to edit any section, remove redundancies, and, overall, make any part of the documentation more robust.

Also, at the time the .md files were created, the information was pulled together from several sources, and they may need refinement and editing.

If there is any tribal knowledge, it should make it into the documentation. And if the process is overly complicated, we should also work with new members to simplify it so that it does not confuse anybody.

We can work on an Atlassian issue or a Gerrit topic if you have already thought about ways to improve this.