
Hi guys,

I'm in the process of setting up a ClearML server for experiment tracking. I have the server hosted in a virtual Linux machine on Azure and run experiments from some local compute. Our training environment is PyTorch Lightning, and I have written a logger that uses the ClearML report_* functions. Scalars and console output are nicely uploaded to the server, but I can't quite wrap my head around getting media and plots uploaded too. When I host the server on the same machine that runs the experiments, there are no issues.

Can you help me understand how I should set up storage/uploads so that media and plots reach the remote server?
I have also set up Azure Blob Storage, but I don't quite see how it could be connected to media uploads.

Thanks.
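
(For concreteness, a minimal sketch of the logger calls described above - the project/task names are hypothetical, and it assumes clearml.conf already points at the Azure-hosted server:)

```python
from clearml import Task
import matplotlib.pyplot as plt

task = Task.init(project_name="demo", task_name="media-upload-test")
logger = task.get_logger()

# Scalars and console output arrive at the server without issues...
logger.report_scalar(title="loss", series="train", value=0.42, iteration=1)

# ...but media and plots are the problematic part
fig = plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])
logger.report_matplotlib_figure(title="example", series="train", figure=fig, iteration=1)
```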

  
  
Posted one year ago

Answers 18


The server will never access the storage - only the clients (SDK/WebApp etc.) will access it

Oh okay. So that's the reason I can access media when the client and the server are running on the same machine?

  
  
Posted one year ago

Yeah, the server can run anywhere 🙂

  
  
Posted one year ago

Do you mean to the Web UI?

Yes, that's what I meant; sorry, I'm still coming to terms with ClearML terminology 😅. Is it possible to store the web app cloud access token server-side so we don't have to input it in the Web UI? 🙂

  
  
Posted one year ago

How does it look in the Web UI?

I just had a look, and they are visible under Debug Samples, but not under Plots, as I had expected.
I thought that by using report_matplotlib_figure they would get grouped under Plots? 🙂
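
For what it's worth, report_matplotlib_figure also has a report_image flag that controls where the figure lands; a minimal sketch, assuming that flag is available in the installed clearml version:

```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="demo", task_name="plots-vs-debug-samples")

fig = plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])

# report_image=False (the default) should report the figure under Plots;
# report_image=True uploads a rendered image, which shows up under Debug Samples.
task.get_logger().report_matplotlib_figure(
    title="example",
    series="train",
    figure=fig,
    iteration=0,
    report_image=False,
)
```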

  
  
Posted one year ago

Would you recommend doing both then? :-)

You will need to, if you want the SDK to be able to actually access this storage - one is to let the SDK know which is the default storage, the other is to provide details on how to access it
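
A compact sketch of the two pieces together (the account and container names are placeholders; see also the set_default_upload_destination answer further down):

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="remote-media")

# Piece 1: let the SDK know which storage is the default upload destination
task.get_logger().set_default_upload_destination("azure://my_account/my_container")

# Piece 2: provide the access details in clearml.conf rather than in code:
#   sdk.azure.storage.containers = [
#     { account_name: "my_account", account_key: "<secret>" }
#   ]
```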

  
  
Posted one year ago

It's actually complementary - the SDK will use the clearml.conf configuration by matching that configuration with the destination you provided

Would you recommend doing both then? :-)

  
  
Posted one year ago

And how do I ensure that the server can access the files from the blob storage?

The server will never access the storage - only the clients (SDK/WebApp etc.) will access it

  
  
Posted one year ago

I've also added a token to my server, so now I can access the audio samples from the server.

Do you mean to the Web UI?

  
  
Posted one year ago

Hey SweetBadger76, thanks for answering. I'll check it out! Does that correspond to filling out azure.storage in the clearml.conf file?

And how do I ensure that the server can access the files from the blob storage?

  
  
Posted one year ago

Does that correspond to filling out azure.storage in the clearml.conf file?

It's actually complementary - the SDK will use the clearml.conf configuration by matching that configuration with the destination you provided
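
Concretely, that means filling in the azure.storage section shown (commented out) in the configuration below, so the SDK can match it against the upload destination; the account name and key here are placeholders:

```
azure.storage {
    containers: [
        {
            # account_name must match the destination URI, e.g. azure://my_account/...
            account_name: "my_account"
            account_key: "<secret>"
            # container_name: optional - limits these credentials to one container
        }
    ]
}
```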

  
  
Posted one year ago

Hi GiganticMole91, how did you set up your clearml.conf file?

  
  
Posted one year ago

Sure. Really, I'm just using the default client:
# ClearML SDK configuration file
api {
    web_server: http://server.azure.com:8080
    api_server: http://server.azure.com:8008
    files_server: http://server.azure.com:8081
    credentials {
        "access_key" = "..."
        "secret_key" = "..."
    }
}
sdk {
# ClearML - default SDK configuration

storage {
    cache {
        # Defaults to system temp folder / cache
        default_base_dir: "~/.clearml/cache"
    }

    direct_access: [
        # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
        # or cached, and any download request will return a direct reference.
        # Objects are specified in glob format, available for url and content_type.
        { url: "file://*" }  # file-urls are always directly referenced
    ]
}

metrics {
    # History size for debug files per metric/variant. For each metric/variant combination with an attached file
    # (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
    # X files are stored in the upload destination for each metric/variant combination.
    file_history_size: 100

    # Max history size for matplotlib imshow files per plot title.
    # File names for the uploaded images will be recycled in such a way that no more than
    # X images are stored in the upload destination for each matplotlib plot title.
    matplotlib_untitled_history_size: 100

    # Limit the number of digits after the dot in plot reporting (reducing plot report size)
    # plot_max_num_digits: 5

    # Settings for generated debug images
    images {
        format: JPEG
        quality: 87
        subsampling: 0
    }

    # Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
    tensorboard_single_series_per_graph: false
}

network {
    metrics {
        # Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
        # a specific iteration
        file_upload_threads: 4

        # Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
        # being sent for upload
        file_upload_starvation_warning_sec: 120
    }

    iteration {
        # Max number of retries when getting frames if the server returned an error (http code 500)
        max_retries_on_server_error: 5
        # Backoff factor for consecutive retry attempts.
        # SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
        retry_backoff_factor_sec: 10
    }
}
aws {
    s3 {
        # S3 credentials, used for read/write access by various SDK elements

        # Default, used for any bucket not specified below
        region: ""
        # Specify explicit keys
        key: ""
        secret: ""
        # Or enable credentials chain to let Boto3 pick the right credentials.
        # This includes picking credentials from environment variables,
        # credential file and IAM role using metadata service.
        # Refer to the latest Boto3 docs
        use_credentials_chain: false

        credentials: [
            # specifies key/secret credentials to use when handling s3 urls (read or write)
            # {
            #     bucket: "my-bucket-name"
            #     key: "my-access-key"
            #     secret: "my-secret-key"
            # },
            # {
            #     # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
            #     host: "my-minio-host:9000"
            #     key: "12345678"
            #     secret: "12345678"
            #     multipart: false
            #     secure: false
            # }
        ]
    }
    boto3 {
        pool_connections: 512
        max_multipart_concurrency: 16
    }
}
google.storage {
    # # Default project and credentials file
    # # Will be used when no bucket configuration is found
    # project: "clearml"
    # credentials_json: "/path/to/credentials.json"
    # pool_connections: 512
    # pool_maxsize: 1024

    # # Specific credentials per bucket and sub directory
    # credentials = [
    #     {
    #         bucket: "my-bucket"
    #         subdir: "path/in/bucket" # Not required
    #         project: "clearml"
    #         credentials_json: "/path/to/credentials.json"
    #     },
    # ]
}
azure.storage {
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}

log {
    # debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
    null_log_propagate: false
    task_log_buffer_capacity: 66

    # disable urllib info and lower levels
    disable_urllib3_info: true
}

development {
    # Development-mode options

    # dev task reuse window
    task_reuse_time_window_in_hours: 72.0

    # Run VCS repository detection asynchronously
    vcs_repo_detect_async: true

    # Store uncommitted git/hg source code diff in experiment manifest when training in development mode
    # This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
    store_uncommitted_code_diff: true

    # Support stopping an experiment in case it was externally stopped, status was changed or task was reset
    support_stopping: true

    # Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
    default_output_uri: ""

    # Default auto generated requirements optimize for smaller requirements
    # If True, analyze the entire repository regardless of the entry point.
    # If False, first analyze the entry point script; if it does not contain references to other local files,
    # do not analyze the entire repository.
    force_analyze_entire_repo: false

    # If set to true, *clearml* update message will not be printed to the console
    # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
    suppress_update_message: false

    # If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
    detect_with_pip_freeze: false

    # Log specific environment variables. OS environments are listed in the "Environment" section
    # of the Hyper-Parameters.
    # multiple selected variables are supported including the suffix '*'.
    # For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
    # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
    # Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
    log_os_environments: []

    # Development mode worker
    worker {
        # Status report period in seconds
        report_period_sec: 2

        # ping to the server - check connectivity
        ping_period_sec: 30

        # Log all stdout & stderr
        log_stdout: true

        # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
        # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
        console_cr_flush_period: 10

        # compatibility feature, report memory usage for the entire machine
        # default (false), report only on the running process and its sub-processes
        report_global_mem_used: false
    }
}

# Apply top-level environment section from configuration into os.environ
apply_environment: false
# Top-level environment section is in the form of:
#   environment {
#     key: value
#     ...
#   }
# and is applied to the OS environment as `key=value` for each key/value pair

# Apply top-level files section from configuration into local file system
apply_files: false
# Top-level files section allows auto-generating files at designated paths with a predefined contents
# and target format. Options include:
#  contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
#  format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
#          base64-encoded contents string, otherwise ignored
#  path: the target file's path, may include ~ and inplace env vars
#  target_format: format used to encode contents before writing into the target file. Supported values are json,
#                 yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
#  overwrite: overwrite the target file in case it exists. Default is true.
#
# Example:
#   files {
#     myfile1 {
#       contents: "The quick brown fox jumped over the lazy dog"
#       path: "/tmp/fox.txt"
#     }
#     myjsonfile {
#       contents: {
#         some {
#           nested {
#             value: [1, 2, 3, 4]
#           }
#         }
#       }
#       path: "/tmp/test.json"
#       target_format: json
#     }
#   }

}

  
  
Posted one year ago

SuccessfulKoala55, thanks for the help. I've set up my client to use my blob storage now, and it works wonderfully.

I've also added a token to my server, so now I can access the audio samples from the server.
Is there a way to add a common token server-side so the other members of the team don't have to create a token?

I also struggle a bit with report_matplotlib_figure(), in which plots do not appear in the Web UI. I have implemented the following snippet in my PyTorch Lightning logger:

```python
from matplotlib.figure import Figure
import matplotlib.pyplot as plt
from pytorch_lightning.utilities import rank_zero_only

@rank_zero_only
def log_image(self, name: str, fig: Figure, step: int):
    # reinterpret_metric is our own helper that splits a name into metric/series
    metric, series = reinterpret_metric(name)
    self.task.get_logger().report_matplotlib_figure(
        title=metric,
        series=series,
        iteration=step,
        figure=fig,
    )
    plt.close("all")
```

Am I missing something in order to get the figures uploaded in a way that the server can see them correctly? When I inspect the blob storage, I do see the plots, so they are uploaded next to my other media files.
  
  
Posted one year ago

Am I missing something in order to get the figures uploaded in a way that the server can see them correctly? When I inspect the blob storage, I do see the plots, so they are uploaded next to my other media files

How does it look in the Web UI?

  
  
Posted one year ago

Hey GiganticMole91,
you can set the logger to use your bucket as your default upload destination:
task.get_logger().set_default_upload_destination('s3://xxxxx')
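
For the Azure Blob Storage mentioned in the question, the same call should take an azure:// URI instead; a minimal sketch with placeholder account/container names and a placeholder local file:

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="upload-destination")

# Azure equivalent of the s3:// example above
task.get_logger().set_default_upload_destination("azure://my_account/my_container")

# File-bearing report_* calls (media, images, figures) should now upload
# to the container instead of the ClearML files server
task.get_logger().report_media(
    title="sample", series="audio", iteration=0, local_path="sample.wav"
)
```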

  
  
Posted one year ago

On the server or the client? :)

  
  
Posted one year ago

I've tried setting the output_uri on Task.init, but that seems to only affect model checkpoints and artifacts
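
That matches the split between the two settings described in this thread; a minimal sketch contrasting them (URIs are placeholders):

```python
from clearml import Task

# output_uri: default destination for model checkpoints and artifacts
task = Task.init(
    project_name="demo",
    task_name="uploads",
    output_uri="azure://my_account/my_container",
)

# set_default_upload_destination: destination for reported media
# (debug samples, images, audio) produced by report_* calls
task.get_logger().set_default_upload_destination("azure://my_account/my_container")
```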

  
  
Posted one year ago

on the client, where you run your logger

  
  
Posted one year ago