Are you running within a zero-trust environment like ZScaler?
Feels like your issue is not ClearML itself, but an HTTPS/SSL certificate issue coming from your zero-trust system
You are using CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL the wrong way
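For context, the intended behaviour of that variable (as I understand it, so treat this as a sketch) is to make the agent skip the whole python environment setup:

```shell
# Sketch: the agent skips creating a venv and installing packages,
# and runs tasks with the python environment already present on the worker.
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
```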
thanks for all the pointers! I will have a good play around
Normally, you should have an agent running behind a "services" queue as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the server (i.e. set the right environment variables for the docker services).
That agent will run as long as your self-hosted server is running.
In the web UI, in the queue/worker tab, you should see a services queue and a worker available in that queue. Otherwise the services agent is not running. Refer to John c above.
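For reference, the services agent section of the server's docker-compose can look roughly like this (a sketch; the image tag and env values are assumptions, adjust to your deployment):

```yaml
services:
  agent-services:
    image: allegroai/clearml-agent-services:latest
    restart: unless-stopped
    environment:
      CLEARML_HOST_IP: ${CLEARML_HOST_IP}            # host running the server
      CLEARML_API_HOST: http://apiserver:8008        # internal api server address
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock    # so it can launch task containers
```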
that format is correct, as I can run pip install -r requirements.txt using the exact same file
following this thread, as it happens every now and then that ClearML misses some packages for some reason ...
I mean, what happens if I import and use a function from another .py file, and that function's code changes?
Or are you expecting the code to be frozen and only parameters to change between runs?
Clear. Thanks @<1523701070390366208:profile|CostlyOstrich36> !
no. I set api.file_server to None in both the remote agent's clearml.conf and my local clearml.conf
In which case, whether the code is run locally or remotely, metrics are stored in cloud storage
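For context, the relevant clearml.conf fragment is something like this (a sketch; whether the empty value should be "" or null may depend on your ClearML version):

```
api {
    # no default file server; outputs go to the configured cloud storage instead
    file_server: ""
}
```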
ok, so if the git commit or uncommitted changes differ from the previous run, then the cache is "invalidated" and the step will be run again?
or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv
so the issue is that, for some reason, the pip install run by the agent doesn't behave the same way as your local pip install?
Have you tried manually installing your module_b with pip install on the machine that is running clearml-agent? From your example, it looks like you are even running inside docker?
Are you talking about this: None
It doesn't seem to do anything about the database data ...
@<1523701868901961728:profile|ReassuredTiger98> I found that you can set the file_server in your local clearml.conf to your own cloud storage. In our case, we use something like this in our clearml.conf:
api {
    file_server: "azure://<account>..../container"
}
All non-artifact models are then stored in our Azure storage. In our self-hosted ClearML setup, we don't even have a file server running altogether
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
What should I put in there? What is the syntax for git package?
you are forcing SSH with force_git_ssh_protocol: true
Have you set up SSH keys?
If you are using SSH keys, why enable_git_ask_pass: true ?
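For reference, the agent section of clearml.conf for a pure SSH-key setup might look like this (a sketch based on my understanding of the agent options):

```
agent {
    # rewrite https git urls to ssh and rely on the machine's ssh keys
    force_git_ssh_protocol: true
    # enable_git_ask_pass / git_user / git_pass are for https credentials;
    # with ssh keys they should not be needed
    enable_git_ask_pass: false
}
```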
Is it because Azure is "whitelisted" in our network, and thus needs a different certificate? And how do I provide 2 different certificates? Is bundling them as simple as concatenating the 2 pem files?
Found a trick to have an empty Installed Packages section:
clearml.Task.force_requirements_env_freeze(force=True, requirements_file="/dev/null")
Not sure if this is the right way or not ...
Nice ! That is handy !!
thanks !
oh, looks like I need to empty the Installed Packages before enqueuing the cloned task
but then it is still missing a bunch of libraries in the Task (that succeeded) > Execution > INSTALLED PACKAGES
So when I clone that task and try to run the clone, the task fails because it is missing Python packages 😞
Found the issue: my bad practice for imports 😛
You need to import clearml before doing the argument parsing. Bad way:
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    args = parser.parse_args()
    return args
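The corrected ordering looks roughly like this (a sketch; the project/task names are placeholders, and the Task.init call is commented out so the snippet is self-contained):

```python
import argparse

# Good way: import clearml (and create the Task) BEFORE parsing arguments,
# so ClearML can hook argparse and capture/override the arguments on remote runs.
# from clearml import Task
# task = Task.init(project_name="my_project", task_name="train")  # placeholder names

def handleArgs(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    return parser.parse_args(argv)

args = handleArgs([])  # falls back to the defaults when no CLI arguments are passed
```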
if you are using a self-hosted ClearML server spun up with docker-compose, then you can just mount your NAS to /opt/clearml/fileserver
on the host machine, prior to starting the ClearML server with docker-compose up
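For context, in the stock docker-compose the fileserver container already maps that host path, roughly like this (a sketch from memory; check your own docker-compose.yml):

```yaml
services:
  fileserver:
    volumes:
      # anything on the host at /opt/clearml/fileserver (e.g. a mounted NAS)
      # becomes the file server's storage
      - /opt/clearml/fileserver:/mnt/fileserver
```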
I use an SSH public key to access our repo ... I never tried to provide credentials to ClearML itself (via clearml.conf), so I cannot help much here ...
Do you want to use https or ssh to do the git clone? Setting up both at the same time is confusing
not sure if it's related, but clearml 1.14 tends to not "show" the gpu_type