Hi, this is what I got. There is no mention of the env variables.
Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
I did another test by running `kubectl exec pod-name -- echo $PIP_INDEX_URL` and it returned nothing, so the env variables are not passed to the container at all.
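One caveat with that test: in `kubectl exec pod-name -- echo $PIP_INDEX_URL` the local shell expands `$PIP_INDEX_URL` before kubectl runs, so an empty result only shows that the variable is unset on the local machine. A minimal sketch that builds the command as an argument list (no local shell involved), so the lookup happens inside the container. It assumes `printenv` exists in the image; `pod-name` is a placeholder.

```python
import subprocess
from typing import List, Optional


def build_env_check_cmd(pod: str, var: str, namespace: Optional[str] = None) -> List[str]:
    """Build a kubectl command that reads `var` inside the pod, not locally."""
    cmd = ["kubectl", "exec"]
    if namespace:
        cmd += ["-n", namespace]
    # printenv runs inside the container; no local shell expands "$VAR" here
    cmd += [pod, "--", "printenv", var]
    return cmd


def pod_env_value(pod: str, var: str, namespace: Optional[str] = None) -> Optional[str]:
    """Return the variable's value inside the pod, or None if it is unset there."""
    # stdout/stderr/universal_newlines keep this compatible with Python 3.6
    result = subprocess.run(
        build_env_check_cmd(pod, var, namespace),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # printenv exits non-zero when the variable is not set
    return result.stdout.strip() if result.returncode == 0 else None
```

For example, `pod_env_value("pod-name", "PIP_INDEX_URL", "default")` returns `None` only if the variable is really missing in the container. Typing `kubectl exec pod-name -- printenv PIP_INDEX_URL` directly in a shell achieves the same thing.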
clearml-serving does not support spaCy models (among many others) out of the box; ClearML-Serving only supports the following:
Support Machine Learning Models (scikit-learn, XGBoost, LightGBM)
Support Deep Learning Models (TensorFlow, PyTorch, ONNX)
An easy way to extend support to other model types would be a boon.
I believe a custom engine would be required in such scenarios. How difficult is it to create a custom engine with clearml-serving? For example, in this...
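For comparison, later clearml-serving 1.x releases added a `custom` engine in which the user supplies a `Preprocess` class with `load`, `preprocess`, `process` and `postprocess` methods. The sketch below assumes that interface (it is not part of the version discussed in this thread). It uses a hypothetical whitespace tokenizer as a stand-in for a spaCy pipeline, so it runs without spaCy installed.

```python
from typing import Any, Callable, Dict, List, Optional


class Preprocess(object):
    """Sketch of a custom-engine handler (interface assumed from clearml-serving 1.x)."""

    def __init__(self):
        # called once when the endpoint is loaded, not per request
        self._nlp: Optional[Callable[[str], List[str]]] = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # hypothetical stand-in: a real handler might call spacy.load(local_file_name)
        self._nlp = lambda text: text.split()
        return self._nlp

    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> Any:
        # pull the raw text out of the request JSON
        return body["text"]

    def process(self, data: Any, state: dict,
                collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> Any:
        # run the loaded model (load() must have been called first)
        return self._nlp(data)

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> Dict[str, Any]:
        # shape the model output into a JSON-serialisable response
        return {"tokens": data, "count": len(data)}
```

The request flows through `preprocess`, then `process`, then `postprocess`, so supporting a new framework mostly means replacing `load` and `process`. The clearml-serving README describes how such a file is registered as an endpoint.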
I'm also beginning to think this is related to this thread: https://clearml.slack.com/archives/CTK20V944/p1620664770492400
Previously, when I set `force_repo_requirements_txt=true` and `system_site_packages: true`, it seemed to work. Upgrading to v1.02 seems to have changed things.
So I kept trying, but I'm stuck on this when I run `python k8s_glue_example.py`:
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'
Hi SuccessfulKoala55, thanks. I opened an issue on the ClearML Agent GitHub repo: https://github.com/allegroai/clearml-agent/issues/67
Hi, it's self-hosted using docker-compose.
OK, I'll wait until I get my hands on the vault, then. Thanks.
It's 0.17-63.
It doesn't appear on the profile page.
I also think it would make sense for certain definitive CI actions, such as publish, to support running custom scripts.
python k8s_glue_example.py --queue gpu --namespace default
Traceback (most recent call last):
File "k8s_glue_example.py", line 86, in <module>
main()
File "k8s_glue_example.py", line 80, in main
namespace=args.namespace,
File "/home/administrator/clearml-agent-k8s/venv/lib/python3.6/site-packages/clearml_agent/helper/base.py", line 239, in __call__
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'base_pod...
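This error means `k8s_glue_example.py` passes a keyword that the installed `clearml_agent` class does not accept, which typically happens when the example script and the installed package come from different versions. A small sketch that checks which keyword arguments a constructor accepts, using only `inspect`. Applying it to `clearml_agent.glue.k8s.K8sIntegration` assumes that module path matches the installed version.

```python
import inspect
from typing import Any, List


def accepted_kwargs(cls: Any) -> List[str]:
    """List the named keyword parameters of cls.__init__ (excluding self)."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, p in params.items()
        if name != "self"
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    ]


def supports_kwarg(cls: Any, name: str) -> bool:
    """True if cls(...) accepts `name`, either explicitly or through **kwargs."""
    params = inspect.signature(cls.__init__).parameters.values()
    if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
        return True
    return name in accepted_kwargs(cls)
```

For example, `from clearml_agent.glue.k8s import K8sIntegration` followed by `print(supports_kwarg(K8sIntegration, "base_pod_num"))`. If it prints `False`, using the example script from the same release as the installed clearml-agent (or upgrading the package) should remove the mismatch.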
Hi, yes, I'm still getting the SSL errors. It looks like an incompatibility with the OS SSL libraries.
Sorry, I take that back. I just realised that this argument only works when running the agent itself; when you enqueue a task to this agent, the argument is not passed on to the container the agent spawns.
The same issue applies to the docker image: it reverts to nvidia/cuda:10.1-runtime-ubuntu18.04 even though I set something else.
Hi, I was expecting to see the container rather than the physical machine. For example, the file panel on the left of the Jupyter panel shows the files on the physical machine, whereas I expected it to show the container's files.
Yeah, I got that too. It happens when I run the client code on the same machine as the clearml-agent, so I'm wondering whether sharing the same clearml.conf causes the problem. Is there a way to specify the clearml.conf path instead of defaulting to ~/clearml.conf?
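One way to keep the client and the agent on separate configuration files is the `CLEARML_CONFIG_FILE` environment variable, which the clearml SDK reads to locate its configuration (clearml-agent also accepts a `--config-file` option). A minimal sketch that launches a client script with its own config; the path and script name are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def client_env(config_path: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with CLEARML_CONFIG_FILE pointing at config_path."""
    env = dict(os.environ if base is None else base)
    env["CLEARML_CONFIG_FILE"] = os.path.expanduser(config_path)
    return env


def run_client(script: str, config_path: str, args: Optional[List[str]] = None) -> int:
    """Run a client script with its own clearml.conf and return its exit code."""
    cmd = [sys.executable, script] + (args or [])
    return subprocess.call(cmd, env=client_env(config_path))
```

For example, `run_client("train.py", "~/clearml-client.conf")` (hypothetical names). Exporting `CLEARML_CONFIG_FILE` in the shell before starting the script has the same effect.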
I managed to find out why: the docker image I'm using does not run as the root user, hence the error. But I'm wondering why root is required, since Docker best practices recommend a non-root user in production images.
Hi, it was missing `--docker` on the agent. Thanks! The dynamic GPU option is only available in the Enterprise version, right?
Yes, it is! But ClearML doesn't support multi-node training out of the box in a way that streamlines the process, so we are trying to figure out a way to do it.
Hi, thanks. What about the agent: does its docker mode or k8s mode require docker.sock to be exposed?
I think the default behaviour of the clearml-agent k8s glue when running a task is to create a virtual env and install the dependencies, so I'm checking how to make it use the global packages instead.
No, I didn't mention this particular problem in the GitHub issue; only the `apply template.yml` part is in the issue.
[root@2c7498711bef elasticsearch]# curl
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...
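A red status means at least one primary shard is unassigned. A minimal sketch that lists unassigned shards through the Elasticsearch `_cat/shards` API using only the standard library; the `ES_URL` host and port are an assumption. The `_cluster/allocation/explain` API can then be queried for the reason behind a specific shard.

```python
import json
import urllib.request
from typing import Dict, List

# assumed address of the ClearML Elasticsearch container
ES_URL = "http://localhost:9200"


def fetch_shards(base_url: str = ES_URL) -> List[Dict[str, str]]:
    """Fetch one row per shard, including the unassigned-reason column."""
    url = base_url + "/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def unassigned(shards: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose state is UNASSIGNED."""
    return [s for s in shards if s.get("state") == "UNASSIGNED"]


def unassigned_primaries(shards: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Unassigned primary shards ("p"); these are what turn the cluster red."""
    return [s for s in unassigned(shards) if s.get("prirep") == "p"]
```

On a single-node cluster, replica shards can never be allocated (a replica may not share a node with its primary), so they alone would only make the status yellow; the unassigned primaries are the ones worth investigating.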
Hi, when I tried ip:port, it referenced the right host and bucket... but the file is not found on the ECS S3, even though the logs state `Completed model upload to s3://ecs.ai:80/clearml-models/artifacts/ ...`
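To check where an upload actually landed, one option is to split the ClearML-style URL (`s3://host:port/bucket/key`) into endpoint, bucket and key, then list that prefix directly on the S3-compatible endpoint. A sketch assuming boto3 is installed and picks up credentials from its usual sources; using `http` unless `secure=True` is also an assumption about the ECS setup.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_clearml_s3_url(url: str, secure: bool = False) -> Tuple[str, str, str]:
    """Split s3://host:port/bucket/key into (endpoint_url, bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URL: " + url)
    scheme = "https" if secure else "http"
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return "{}://{}".format(scheme, parsed.netloc), bucket, key


def list_keys(url: str, secure: bool = False) -> List[str]:
    """List object keys under the URL's prefix (first page only, up to 1000 keys)."""
    import boto3  # imported here so the URL parser works without boto3

    endpoint, bucket, prefix = split_clearml_s3_url(url, secure)
    client = boto3.client("s3", endpoint_url=endpoint)
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If `list_keys` finds the object under a different bucket or prefix than expected, the mismatch is in how the host and bucket are split, not in the upload itself.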
Hi, this is the setup.
Client:
from clearml import Task
task = Task.init(project_name='DETECTRON2', task_name='Train', task_type=Task.TaskTypes.training)
task.set_base_docker("quay.io/fb/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser")
task.execute_remotely(queue_name="single_gpu", exit_process=True)
k8s_glue_example.py spawned a pod, which started running.
ClearML UI -> Experiment -> Results -> Console.
At the top it will pri...
Hi,
It did, nvidia/cuda:10.1-runtime-ubuntu18.04.
So if I need to set this every time, what is the following config for? And how do I pass in new env parameters?
` default_docker: {
# default docker image to use when running in docker mode
image: "dockerrepo/mydocker:custom"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
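The config above puts `--env GIT_SSL_NO_VERIFY=true` into a single list element. If the agent passes each element of `arguments` to `docker run` as one token (an assumption worth checking against the agent version in use), docker would likely not parse it as `--env` followed by a value. A small sketch using `shlex.split` to turn a docker string, such as the one given to `set_base_docker` in the client code, into separate tokens that can be listed one per element.

```python
import json
import shlex
from typing import List, Tuple


def split_docker_cmd(cmd: str) -> Tuple[str, List[str]]:
    """Split "image --flag value ..." into (image, [tokens...])."""
    tokens = shlex.split(cmd)
    if not tokens:
        raise ValueError("empty docker command")
    return tokens[0], tokens[1:]


def to_conf_arguments(args: List[str]) -> str:
    """Render tokens as a config list, one quoted token per element."""
    # json.dumps produces a double-quoted string literal, also valid in HOCON
    return "[" + ", ".join(json.dumps(a) for a in args) + "]"
```

With `"quay.io/fb/detectron2:v3 --env GIT_SSL_NO_VERIFY=true"`, the image goes to `default_docker.image` and the rendered list, `["--env", "GIT_SSL_NO_VERIFY=true"]`, to `default_docker.arguments`.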
This one can be solved with a shared cache plus a pipeline step that refreshes the cache on the shared-cache machine.
Do you have an example of this in your code blogs that demonstrates this usage?
It's actually in your documentation; apparently it was removed in 0.17.
https://allegro.ai/clearml/docs/docs/release_notes/ver_0_17.html#clearml-agent-0-17-2
And these are my logs: it tried to install something and got a permission-denied error. It wouldn't have if it had obeyed force_repo_requirements_txt.
1620664917916 Kahs-MacBook-Pro.local info ClearML Task: created new task id=024a421c0e174650a1c7ff64af756c26 ClearML results page:
1620664920359 Kahs-MacBook-Pro.local info ClearML Mon...
Hi, I was reading this thread and wondered: in which versions of clearml-server and clearml-agent did this take effect?