
conf file:
# ClearML SDK configuration file
api {
    # Tomer Roditi's workspace
    web_server:
    api_server:
    files_server:
    # corrections server
    credentials {"access_key": "****", "secret_key": "****"}
}
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
...
after playing with it a bit more, i figured out that you need to set _allow_omegaconf_edit_ to False, and that the parameter naming should be done with dot notation, for example Hydra/some_key.other_key.my_key. there is no documentation regarding hydra with hyperparameter tuning, it would be best to add a section about it in the link you attached.
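for reference, a rough sketch of the HPO setup i mean (the key names, metric, and queue are placeholders; forcing _allow_omegaconf_edit_ through a single-value discrete range is just one way i assume would pass it, setting it on the base task in the UI should also work):

from clearml.automation import (
    HyperParameterOptimizer,
    UniformParameterRange,
    DiscreteParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the Hydra-based template task
    hyper_parameters=[
        # Hydra overrides are addressed with dot notation under the Hydra section
        UniformParameterRange(
            "Hydra/some_key.other_key.my_key",
            min_value=0.0, max_value=1.0, step_size=0.1,
        ),
        # keep OmegaConf from overriding the values back on the cloned tasks
        DiscreteParameterRange("Hydra/_allow_omegaconf_edit_", values=[False]),
    ],
    objective_metric_title="validation",  # placeholder metric
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",            # placeholder queue
)
optimizer.start()
optimizer.wait()
optimizer.stop()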
@<1822805241150574592:profile|ShinySparrow39> thanks, but i don't understand your workaround. how does it enable overrides from the clearml UI? specifically for the hyperparameters app, which doesn't seem to work with logged configuration files.
we would love to get some clarifications as well: how should strings be provided? what is the step_size param in the uniform type? is it normal that all unused hyperparams in the HPO are displayed as str type in the UI?
thanks!
setting ignore_remote_overrides = True
helps work around the issue, but obviously we can't use it as a solution. what might cause it to take so much time to find the param overrides in the backend? is it a network issue? maybe the machine's network configuration needs to be changed?
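for context, this is roughly how we set it, as a sketch only (the project/task names and the config dict are placeholders, and whether the flag belongs on connect() or connect_configuration() in your setup is something i'd double check):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote_test")  # placeholder names

my_config = {"lr": 0.001, "batch_size": 32}  # placeholder parameters

# with ignore_remote_overrides=True the task stops pulling overridden values
# from the backend, which is what made the long lookups disappear for us
task.connect(my_config, name="General", ignore_remote_overrides=True)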
@<1523701070390366208:profile|CostlyOstrich36> i just edited my question, can you refer to it pls.
thanks!
by application do you mean the web UI? is it possible to estimate the number of calls it performs when idle? while viewing an experiment? or during some other typical usage?
i don't have one, as i said it is not very reproducible. the same code runs fine one time, and another time (running the exact same experiment) it works the same but with the logging issues. as i mentioned, IMO it is not something related to the code itself but to connectivity with the clearml servers. i'm running on GCP machines, and this is not the first time i'm experiencing connectivity issues with clearml when working on them (we migrated from AWS ec2 a few weeks ago). the first issue was with...
It seems that when the working directory is set to '.' (root of the cloned repo) I am able to import my package as expected.
I thought about your solution but then it requires me to push every time I change my package which is inconvenient.
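to illustrate, this is roughly the setup that works, as a sketch only (the repo URL, names, and script path are placeholders):

from clearml import Task

# creating the task with the working directory at the repo root ('.')
# lets `import my_package` resolve without pushing the package anywhere
task = Task.create(
    project_name="examples",                        # placeholder
    task_name="remote_test",                        # placeholder
    repo="https://gitlab.com/my-org/my-repo.git",   # placeholder repo URL
    script="scripts/some_folder/some_folder/task_script.py",  # placeholder path
    working_directory=".",
)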
we are using the community server (pro account). full configuration is attached.
update:
--docker-args "-v some_dir:other_dir -v some_dir:other_dir"
is the correct format
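for the record, i believe the SDK equivalent should be roughly the following, as a sketch only (the image, names, and directories are placeholders; i have only verified the CLI form above):

from clearml import Task

task = Task.create(
    project_name="examples",          # placeholder
    task_name="docker_mount_test",    # placeholder
    script="my_script.py",            # placeholder
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",             # placeholder image
    docker_args="-v some_dir:other_dir -v some_dir:other_dir",   # same mounts as the CLI flag
)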
It worked, thanks! i spent a few hours trying to figure it out 😅
@<1523701070390366208:profile|CostlyOstrich36> Hi, i would expect a feature that looks something like this:
a clearml-task CLI option "--mount-files" (or another informative name) that would be used to add local files to the sent task in the following format:
clearml-task --project examples --name remote_test --script my_script.py --mount-files "local_file_1:target_path_1, local_file_2:target_path_2"
of course there would be some size limit to the mounted files (same as you do with l...
here is an example of the logged values, I was hoping for a way to log them so they could be visualized better.. any advice (besides making my own plot and logging it)?
well there is no default one.. and in the docs there is nothing about it. it would be nice to add the minimal requirements for the AMI to the docs, instead of just writing "The AWS AMI to launch".
thanks for the answer :)
@<1523701070390366208:profile|CostlyOstrich36> thanks for the reply!
yes i'm using app.clear.ml
the vm is initialized via the clearml autoscalers. in the aws autoscaler i didn't have to do any network configuration, so i assume it should be the same for the gcp VMs.
can you direct me to tests that should reveal lagging issues?
yes the agent has already cloned the same repo in the first task (from the same account with the same user and token).
do you mean the full log of the machine itself? the full log of the failed task is attached already
no, i'm cloning only from gitlab.
another issue i had regarding cloning from git came up when i tried using clearml-agent daemon: it was unable to find my git user name (i used the --git-user and --git-pass args). how can one debug these issues, if that's possible at all from the user side?
from the console logs i can see that:
entry_point = preprocessing.py
working_dir = Scripts/Volvo/volvo_data_alignment
installed packages before the task is running:
# Python 3.10.10 | packaged by Anaconda, Inc. | (main, Mar 21 2023, 18:39:17) [MSC v.1916 64 bit (AMD64)]
GitPython == 3.1.31
bokeh == 2.4.3
boto3 == 1.26.158
botocore == 1.29.158
clearml == 1.11.1
h5py == 3.8.0
holoviews == 1.16.0
joblib == 1.2.0
lightgbm == 3.3.5
mat73 == 0.60
matplotlib == 3.7.1
numpy == 1.23.5
pandas == 1.5.3
pytest == 7.3.1
scikit_learn == 1.3.0
scipy == 1.10.1
shapely == 2.0.1
sktime == 0.21.0
statsmodels == 0.13.5
t...
my repo structure is:
|----my_package
|----scripts
     |----some_folder
          |----some_folder
               |----task_script.py
|----requirements.txt
so how can i make clearml install my req file? i thought it automatically detects the req file in the cloned repository..
since it is specifying pandas==1.5.3
i thought it might be pulling the req file from my main branch, so i updated the req file there as well (since i am working on a different branch) and it didn't change anything (as you might expect)
it seems like it does find the requirements file, otherwise why would it install tsfel, tsfresh, and some other "non-basic" packages?
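just to make sure i understand the API, is the intended way to force it something like the sketch below? (i haven't verified this call; the path is assumed relative to the repo root and the names are placeholders)

from clearml import Task

# assumption: add_requirements() called before Task.init() should make the agent
# install from the given requirements file instead of the auto-detected imports
Task.add_requirements("requirements.txt")

task = Task.init(project_name="examples", task_name="remote_test")  # placeholder names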
@<1822805241150574592:profile|ShinySparrow39> thanks for replying, but i think you got me wrong. in your suggestion you can only connect files that are committed and pushed to git when running remotely, which is exactly what i'm trying to find a workaround for.
any idea what it might be? or how can i test it with them?
hi, thanks for the reply, i can access the web UI, I am using the Pro plan (clearml's hosted server)
I'm currently overcoming it by just adding a "_" ("data_something" -> "_data_something")