Answered
Hi! I Use Self-Hosted Server. I Uploaded Datasets With

Hi! I use a self-hosted server. I uploaded datasets with clearml-data. After a while I tried to fetch one of them:

clearml-data get --copy shows_test --id 155299fcad6e4470a784eb587a606510

but getting the error:

2023-03-16 19:56:10,104 - clearml - INFO - Dataset.get() did not specify alias. Dataset information won't be automatically logged in ClearML Server.
Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f396251d2e0>: Failed to establish a new connection: [Errno 113] No route to host')': /balacoon/.datasets/shows_test/shows_test.155299fcad6e4470a784eb587a606510/artifacts/state/state.json

When I check this dataset on the server, it's still there:

(base) clement@balacoonstorage:~$ du -sh /opt/clearml/data/fileserver/balacoon/.datasets/shows_test/shows_test.155299fcad6e4470a784eb587a606510/
288M	/opt/clearml/data/fileserver/balacoon/.datasets/shows_test/shows_test.155299fcad6e4470a784eb587a606510/

When I add a new dataset and download it, it works without issues. But a whole bunch of datasets I added earlier seem to have this problem.
I suspect the issue is that the path in the error is an absolute one and doesn't have the prefix "/opt/clearml/data/fileserver/". Why is it missing from the request?
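
One possible reading of the missing prefix, sketched under an assumption: the path in the error is the path component of an HTTP URL, not a filesystem path, and the file server exposes its storage directory as the URL root. The host and port below are hypothetical placeholders; the storage directory and the URL path are taken from the post above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Storage directory on the server, as shown in the `du` command above.
FILESERVER_ROOT = PurePosixPath("/opt/clearml/data/fileserver")


def url_to_disk_path(url, root=FILESERVER_ROOT):
    """Map a file-server URL to the on-disk path it would correspond to.

    Assumption: the URL path is resolved relative to the storage root.
    """
    url_path = urlsplit(url).path
    # Strip the leading "/" so the URL path is appended to the root
    # instead of replacing it.
    return root / url_path.lstrip("/")


# Hypothetical host and port; the path is the one from the error message.
url = (
    "http://192.0.2.10:8081/balacoon/.datasets/shows_test/"
    "shows_test.155299fcad6e4470a784eb587a606510/artifacts/state/state.json"
)
disk_path = url_to_disk_path(url)
print(disk_path)
```

If that assumption holds, the request path is complete as it is, and the "No route to host" part of the error points at the address in the URL rather than at the path.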

  
  
Posted 11 months ago

Answers 12


@<1547390422483996672:profile|StaleElk72> the registered URLs are properties of the artifact metadata stored in the task object in the server's MongoDB database. To change them, you will need to exec into the mongodb container and use the mongo CLI to edit the URLs.

  
  
Posted 11 months ago

I added a couple of prints to the dataset object. It seems clearml hardcodes the IP in the state.json URL. The problem is that the server migrated to a new IP. Is there a way to change the IP that is hardcoded?

  
  
Posted 11 months ago

any docs where I can learn a bit more on structure of database? I managed to connect to MongoDB container. databases:

> show dbs
admin    0.000GB
auth     0.000GB
backend  0.027GB
config   0.000GB
local    0.000GB

I assume backend..so

> use backend
> show collections
company
model
project
queue
settings
task
task__trash
url_to_delete
user
versions

nothing related to dataset. I would assume dataset is a task, but not sure

  
  
Posted 11 months ago

link with "localhost" in it Oo

Hmm I think this is the main issue, for some reason the dataset default upload destination is "localhost", what do you have configured in your clearml.conf under files server?

  
  
Posted 11 months ago

on the server itself there is clearml.conf with:

# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: 

    web_server: 

    files_server: 
  
  
Posted 11 months ago

@<1547390422483996672:profile|StaleElk72> when you go to the dataset in the UI, and press on "Full Details" then go to the Artifacts tab, what is the link you see there?

  
  
Posted 11 months ago

link with "localhost" in it Oo

  
  
Posted 11 months ago

now I can't download either of them

would be nice if address of the artifacts (state and zips) was assembled on the fly and not hardcoded into db.

The idea is this is fully federated, the server is not actually aware of it, so users can manage multiple storage locations in a transparent way.

if you have any tips how to fix it in the mongo db that would be great ....

Yes that should be similar, but the links would be in artifact property on the Tasks object
not exactly sure on how to do that though ... maybe something like db.task.find({artifact ?!

  
  
Posted 11 months ago

tbh I have no experience with MongoDB. From what I can see, it's a nested schema, something like:

execution -> artifacts -> { hash1_output: {uri: ...},  hash2_output: {uri: ... }, ... }

I can't compose a working find query for it
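
A minimal sketch of the rewrite step, assuming the nested layout described above (execution -> artifacts -> key -> uri). It works on a plain Python dict and does not talk to MongoDB; loading the task document and writing it back (for example with a MongoDB client) is left out. The function name, the sample document, and all addresses are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_artifact_uris(task_doc, stale_hosts, new_netloc):
    """Replace stale hosts in artifact URIs and return how many changed.

    Assumes task_doc["execution"]["artifacts"][<key>]["uri"], as described
    in the post above. The document is modified in place.
    """
    artifacts = task_doc.get("execution", {}).get("artifacts", {})
    changed = 0
    for artifact in artifacts.values():
        uri = artifact.get("uri")
        if not uri:
            continue
        parts = urlsplit(uri)
        # Only touch URIs that point at a host known to be stale.
        if parts.hostname in stale_hosts:
            artifact["uri"] = urlunsplit(parts._replace(netloc=new_netloc))
            changed += 1
    return changed


# Hypothetical document and addresses, for illustration only.
doc = {
    "execution": {
        "artifacts": {
            "hash1_output": {"uri": "http://localhost:8081/proj/state.json"},
            "hash2_output": {"uri": "http://192.0.2.10:8081/proj/data.zip"},
            "hash3_output": {"uri": "s3://bucket/proj/other.zip"},
        }
    }
}
count = rewrite_artifact_uris(doc, {"localhost", "192.0.2.10"}, "192.0.2.20:8081")
print(count, doc["execution"]["artifacts"]["hash1_output"]["uri"])
```

Matching on the parsed hostname instead of doing a plain string replace leaves URIs on other storage (the s3 entry here) untouched. Backing up the database before writing anything back would be prudent.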

  
  
Posted 11 months ago

Okay, I think I see the pattern. Datasets that I added from the storage server itself have "localhost" in the URI of the files, because clearml.conf on the server has it like that. Datasets that I added remotely have the old IP address.

  
  
Posted 11 months ago

Now I can't download either of them 😕 It would be nice if the address of the artifacts (state and zips) was assembled on the fly and not hardcoded into the db. If you have any tips on how to fix it in MongoDB, that would be great. I found this tip on model relocation: None . I think I need something really similar but for datasets.

  
  
Posted 11 months ago

posted on SO: None

  
  
Posted 11 months ago