
@<1523701070390366208:profile|CostlyOstrich36> I would like to point to Azure Blob Storage. What kind of URL scheme should I use? Also, where do you configure the credentials for the ClearML server to access Azure Blob as file_server? I couldn't find any documentation on this topic 😞
TIA
one specifies the venv Python, the other tells it not to do anything
the underlying code was written with this assumption
That means you want to make things work in a way that is not standard Python ... in which case you need to do "non-standard" things to make it work.
You can do this, for example, at the beginning of your run.py:
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
This way, you are not relying on a non-standard feature being implemented by your tool, like PyCharm
or `cle...
depends on how the agent is launched ...
That --docker_args seems to be for clearml-task, as described here, while you are using clearml-agent, which is a different thing.
--gpus 0,1: I believe this basically says that the code launched by the agent has access to both GPUs, and that is it. It is then up to your code to choose which GPU to use and how ...
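A minimal sketch of what "up to your code" could look like. It assumes the visible GPUs are exposed to the process through CUDA_VISIBLE_DEVICES (the CUDA runtime's convention; whether the agent sets it in your setup is an assumption), and `worker_index` is a hypothetical number you assign to each of your own processes.

```python
import os


def visible_gpus(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, in order."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [gpu.strip() for gpu in raw.split(",") if gpu.strip()]


def pick_gpu(worker_index, env=None):
    """Round-robin a (hypothetical) worker index over the visible GPUs."""
    gpus = visible_gpus(env)
    if not gpus:
        return None  # nothing exposed: the caller decides what to do
    return gpus[worker_index % len(gpus)]
```

With two GPUs visible, workers 0 and 2 would land on the first GPU and workers 1 and 3 on the second.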
You may want to share your config (with credentials redacted) and the full docker compose startup log?
If you are on github.com, you can use a fine-grained PAT to limit access to the minimum. Although the token will be tied to an account, it's quite easy to switch to a token from another account.
I don't think there is a "kill task" code. In principle, on Linux, the ClearML agent launches the training process as a parent process. When a parent process is terminated, the Linux kernel will, in most cases, kill all child processes, including your training process.
There may be some way to resume a task from the ClearML agent when it restarts, but I don't think that is the default behavior.
If you are using a self-hosted ClearML server spun up with docker-compose, then you can just mount your NAS to /opt/clearml/fileserver
on the host machine, prior to starting the ClearML server with docker-compose up
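A small pre-flight check can catch the case where the server is started before the NAS is actually mounted, in which case files would end up on the local disk. This is only a sketch: the path is the one mentioned above, and `ready_for_compose` is a hypothetical helper name.

```python
import os

FILESERVER_DIR = "/opt/clearml/fileserver"  # path from the message above


def ready_for_compose(path=FILESERVER_DIR):
    """True only if the path exists and is a mount point on the host."""
    return os.path.isdir(path) and os.path.ismount(path)


if __name__ == "__main__":
    if ready_for_compose():
        print("NAS is mounted, OK to run docker-compose up")
    else:
        print(f"{FILESERVER_DIR} is not a mount point, mount the NAS first")
```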
There is a tricky thing: clearml-agent should not be run from inside a venv itself ... I don't remember where I read that in the docs.
Are you using the agent's docker mode?
In the web UI, in the queue/worker tab, you should see a services queue and a worker available in that queue. Otherwise the services agent is not running. Refer to John c above.
while the other may need to be 1
instead of true
Are you running within a zero-trust environment like Zscaler?
It feels like your issue is not ClearML itself, but an issue with HTTPS/SSL and the certificate from your zero-trust system.
If you want to replace MLflow with ClearML: do it !! It's like "Should I use sandals or running shoes for my next marathon ..."
Let your user try ClearML, and I am pretty sure all of them will want to swap over !!!
so what was the solution/hack then ?
Are the uncommitted changes in untracked files?
In other words: ClearML will only save uncommitted changes from files that are tracked by your local git repo.
You will need to change more than just REQUESTS_CA_BUNDLE
to use a custom certificate. Not all Python libraries honor REQUESTS_CA_BUNDLE.
You also need to add your certificate to your OS.
In conda we have to export SSL_CERT_FILE=~/ca-bundle.crt
etc ...
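As a rough sketch, the environment variables can be set together so that libraries reading different ones all see the same bundle. REQUESTS_CA_BUNDLE and SSL_CERT_FILE are the two named above; CURL_CA_BUNDLE is an extra one added here on the assumption that something in your stack reads it. `ca_env` is a hypothetical helper, and this does not replace adding the certificate to the OS trust store.

```python
import os


def ca_env(bundle_path):
    """Build the env vars that point common Python TLS stacks at one CA bundle."""
    path = os.path.abspath(os.path.expanduser(bundle_path))
    return {
        "REQUESTS_CA_BUNDLE": path,  # read by requests
        "SSL_CERT_FILE": path,       # read by OpenSSL / the ssl module defaults
        "CURL_CA_BUNDLE": path,      # read by curl-based tooling (assumed relevant)
    }


if __name__ == "__main__":
    # Apply before any library opens an HTTPS connection.
    os.environ.update(ca_env("~/ca-bundle.crt"))
```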
Found it: None
And credentials are set with:
sdk {
    azure.storage {
        containers: [
            {
                account_name: "account"
                account_key: "xxxx"
                container_name: "clearml"
            }
        ]
    }
}
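For the URL side of the question, here is a sketch of how a matching URL could be built from the same account and container names. The azure://<account>.blob.core.windows.net/<container> form is an assumption here and is not taken from this thread, so check it against the ClearML storage documentation. `azure_blob_url` is a hypothetical helper.

```python
def azure_blob_url(account_name, container_name, blob_path=""):
    """Build an azure:// URL (assumed form) for a container or a blob in it."""
    base = f"azure://{account_name}.blob.core.windows.net/{container_name}"
    blob_path = blob_path.strip("/")
    return f"{base}/{blob_path}" if blob_path else base


if __name__ == "__main__":
    # Values mirror the placeholder config above.
    print(azure_blob_url("account", "clearml", "models/checkpoint.pt"))
```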
I think ES uses a greedy strategy where it allocates first and then uses it from there ...
I tried mounting an Azure storage account on that path and it worked: all files end up in the cloud storage.
You need both in certain cases.
Looks like your issue is not that ClearML is not tracking your changes, but rather that your Configuration is being overwritten.
This often happens to me. The way I debug it is to put a lot of print statements along the code to track when the Configuration is overwritten and narrow down why. The print output will show up in the Console tab.
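One way to make those print statements more targeted, as a sketch: snapshot the configuration at a few checkpoints and print only what changed between them. It assumes the configuration can be read as a plain dict, and `diff_config` is a hypothetical helper, not a ClearML API.

```python
import copy


def diff_config(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


if __name__ == "__main__":
    config = {"learning_rate": 0.1, "batch_size": 64}
    snapshot = copy.deepcopy(config)   # checkpoint 1
    config["learning_rate"] = 0.01     # something overwrites a value
    print("changed since checkpoint 1:", diff_config(snapshot, config))
```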
For #2: it's a pull rather than a push system: you need a script that pulls at a regular interval and keeps track of what is new and what is not.
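A minimal sketch of such a pulling script. `fetch_ids` and `handle` are hypothetical callables you would supply (for example, one that lists item ids from the server and one that processes a new item); nothing here is a ClearML API.

```python
import time


def poll_once(fetch_ids, seen):
    """Return ids not seen before, in fetch order, and remember them."""
    new_ids = [item_id for item_id in fetch_ids() if item_id not in seen]
    seen.update(new_ids)
    return new_ids


def poll_forever(fetch_ids, handle, interval_s=60.0):
    """Pull at a regular interval and hand each new id to `handle`."""
    seen = set()
    while True:
        for item_id in poll_once(fetch_ids, seen):
            handle(item_id)
        time.sleep(interval_s)
```

The `seen` set lives in memory only, so a restart would treat everything as new again; persisting it to disk is left out of the sketch.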
Clear. Thanks @<1523701070390366208:profile|CostlyOstrich36> !
I found that if pip is upgraded to the latest version, 25.0.1, then the package installs fine.
The question becomes: why does the agent downgrade pip?
Ignoring pip: markers 'python_version < "3.10"' don't match your environment
Collecting pip<22.3
Downloading pip-22.2.2-py3-none-any.whl.metadata (4.2 kB)
Downloading pip-22.2.2-py3-none-any.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 3.9 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
...
Actually, I can set agent.package_manager.pip_version=""
in clearml.conf
And after reading the doc 4x, I can use the env var: CLEARML_AGENT__AGENT__PACKAGE_MANAGER__PIP_VERSION
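The naming pattern, as far as it can be inferred from this one example, looks like: prefix, then each dotted part upper-cased and joined with double underscores. A sketch of that mapping, where `agent_env_var` is a hypothetical helper and the rule is an inference from the single key/variable pair above rather than a documented guarantee:

```python
def agent_env_var(config_key, prefix="CLEARML_AGENT"):
    """Map a dotted clearml.conf key to the matching env var name (inferred rule)."""
    parts = [part.upper() for part in config_key.split(".")]
    return prefix + "__" + "__".join(parts)


if __name__ == "__main__":
    print(agent_env_var("agent.package_manager.pip_version"))
```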
Sure:
def main():
    repo = "redacted"
    commit = "redacted"
    commit = "redacted"
    bands = ["redacted"]
    test_size = 0.2
    batch_size = 64
    num_workers = 12
    img_size = (128, 128)
    random_seed = 42
    epoch = 20
    learning_rate = 0.1
    livbatch_list = get_livbatch_list(repo, commit)
    lbs = download_batches(repo, commit, livbatch_list)
    df, label_map = get_annotation_df(lbs, bands)
    df_train, df_val = deterministic_train_val(df, test_size=test_siz...