It's a directory (the SHA generation step actually succeeds:
Generating SHA2 hash for 1136604 files
as in the GitHub issue). Given previous experience, I would expect it to be uploaded as multiple zip files.
Yes, I don't use S3. I have a dedicated machine with RAID configured, where the ClearML server is running.
Are there any docs where I can learn a bit more about the structure of the database? I managed to connect to the MongoDB container. Databases:
> show dbs
admin 0.000GB
auth 0.000GB
backend 0.027GB
config 0.000GB
local 0.000GB
I assume backend, so:
> use backend
> show collections
company
model
project
queue
settings
task
task__trash
url_to_delete
user
versions
Nothing related to datasets. I would assume a dataset is stored as a task, but I'm not sure.
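For reference, a minimal pymongo sketch of how one might check that, assuming a dataset really is stored as a document in the task collection and carries a "dataset" system tag (the tag name and fields here are my guess, not confirmed against the ClearML schema):

from pymongo import MongoClient

# Connect to the MongoDB container; adjust the address to your setup.
client = MongoClient("mongodb://localhost:27017")
db = client["backend"]

# Guess: dataset tasks are marked with "dataset" in system_tags.
for doc in db.task.find({"system_tags": "dataset"}, {"name": 1, "project": 1}):
    print(doc["_id"], doc.get("name"))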
Added a couple of prints to the Dataset object. It seems ClearML hardcodes the IP in the state.json
URL. The problem is that the server migrated to a new IP. Is there a way to change the IP that is hardcoded?
TBH I have no experience with MongoDB. From what I can see, it's a nested schema, something like:
execution -> artifacts -> { hash1_output: {uri: ...}, hash2_output: {uri: ...}, ... }
I can't compose a working find query for it.
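Roughly what I mean: since the artifact keys are dynamic hashes, a plain find on a fixed path doesn't help, so the best I can come up with is iterating over the documents in Python. A sketch, assuming the nested structure above (the connection address and old host string are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
tasks = client["backend"]["task"]

OLD_HOST = "http://OLD_IP:8081"  # placeholder for the old files server address

for doc in tasks.find({}, {"execution.artifacts": 1}):
    # Assumes execution.artifacts is a dict keyed by artifact name/hash,
    # each value holding a "uri" field, as sketched above.
    artifacts = (doc.get("execution") or {}).get("artifacts") or {}
    for key, art in artifacts.items():
        uri = art.get("uri", "")
        if OLD_HOST in uri:
            print(doc["_id"], key, uri)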
I ended up using DVC for dataset management. It doesn't have a fancy UI, but it works flawlessly with large datasets.
Now I can't download either of them 😕 It would be nice if the address of the artifacts (state and zips) was assembled on the fly and not hardcoded into the DB. If you have any tips on how to fix it in MongoDB, that would be great. I found this tip on model relocation: None . I think I need something really similar, but for datasets.
On the server itself there is a clearml.conf with:
# ClearML SDK configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
files_server:
I am not sure about that. I have another dataset with a similar structure which is smaller (40 GB) and which was uploaded successfully. It seems like this is how it works: first it computes the SHA for all the files, then during upload it aggregates small files into zip archives of approximately 512 MB each.
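Just to illustrate the aggregation behaviour I mean (this is not ClearML's actual code, only a rough sketch of the idea): small files get packed greedily into zip archives of roughly 512 MB each.

import os
import zipfile

CHUNK_SIZE = 512 * 1024 * 1024  # ~512 MB per archive

def pack_into_chunks(file_paths, out_prefix="chunk"):
    # Greedily fill each archive until adding the next file would exceed the limit.
    chunk_idx, current_size, current_files = 0, 0, []
    for path in file_paths:
        size = os.path.getsize(path)
        if current_files and current_size + size > CHUNK_SIZE:
            _write_zip(f"{out_prefix}_{chunk_idx}.zip", current_files)
            chunk_idx, current_size, current_files = chunk_idx + 1, 0, []
        current_files.append(path)
        current_size += size
    if current_files:
        _write_zip(f"{out_prefix}_{chunk_idx}.zip", current_files)

def _write_zip(name, files):
    with zipfile.ZipFile(name, "w") as zf:
        for f in files:
            zf.write(f)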
Okay, I think I see the pattern. Datasets that I added from the storage server itself have "localhost" in the URI of the files, because clearml.conf on the server has it like that. Datasets that I added remotely have the old IP address.
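In case it helps anyone else, this is roughly the fix I have in mind, following the same assumptions about the schema as above (the host strings are placeholders for my setup, and I would back up the database before running anything like this):

from pymongo import MongoClient

OLD_HOSTS = ("http://localhost:8081", "http://OLD_IP:8081")  # placeholders
NEW_HOST = "http://NEW_IP:8081"                               # placeholder

client = MongoClient("mongodb://localhost:27017")
tasks = client["backend"]["task"]

for doc in tasks.find({}, {"execution.artifacts": 1}):
    artifacts = (doc.get("execution") or {}).get("artifacts") or {}
    updates = {}
    for key, art in artifacts.items():
        uri = art.get("uri", "")
        for old in OLD_HOSTS:
            if uri.startswith(old):
                # Rewrite only the host part of the URI, keep the rest of the path.
                updates[f"execution.artifacts.{key}.uri"] = uri.replace(old, NEW_HOST, 1)
    if updates:
        tasks.update_one({"_id": doc["_id"]}, {"$set": updates})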