Reputation
Badges 1
282 × Eureka!In the Kube logs of the pod, i see 'Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease Temporary failure resolving http://security.ubuntu.com '. My guess is its trying to do a apt update.
As we are on disconnected network, we have a server hosting the repo but on a differennt name.
Thanks AgitatedDove14 , will take a look.
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from children. Technically workable but not sure if its best approach since different ppl have different batch sizes in mind.
Hi, by deployment strategies I meant by canary, blue-green...etc..etc. I figured this should be done by clearml-serving and maybe seldon as well.
So i kept trying, but i'm stuck on this when i run python k8s_glue_example.py
TypeError: init () got an unexpected keyword argument 'base_pod_num'
Reply…
I'm not familiar with elastic. What role does elastic play in ClearML?
Yes for both clearml and clearml-agent
This would be solved if --env GIT_SSL_NO_VERIFY=true is passed to the k8s pod that's spawned to run the job. Currently its not.
Thanks AgitatedDove14 , unfortunately it didn't take effect.
i passed it through the yaml as follows.apiVersion: v1 kind: Pod spec: containers: - image: clearml-agent:latest" env: - name: PIP_INDEX_URL value: "
" - name: PIP_TRUSTED_HOST value: "192.168.56.253" - name: PIP_FIND_LINKS value: "
` "
- name: GIT_SSL_NO_VERIFY
value: true
resources:
requests:
cpu: "2"
...
So these (PIP_INDEX_URL) weren't used when clearml starts running pip.
I'm also noticing a lot of this while the k8s glue is running.Ex: Expecting value: line 1 column 1 (char 0) K8S Glue pods monitor: Failed parsing kubectl output:
I did another test by runningkubectl exec pod-name -- echo $PIP_INDEX_URL
and it returned nothing. So the env are not passed to the container at all.
Do you mean this?Removing containers section: [{'image': 'clearml-agent:latest"', 'env': [{'name': 'PIP_INDEX_URL', 'value': '
'},
Hi, please correct me if i am wrong, to use the glue, i need the following.
A k8s cluster A kubectl that is connected to the k8s cluster A pip install of clearml-agent 0.17.1
So i did all the above, I'm not what it meant by running the entire thing on own machine.
Hi, clearml-agent==0.17.2rc3 did work. I'm on a 1.19 k8s cluster, and has this error when a task is pulled. Is the glue not compatible with 1.19?
` Pulling task 3a90802d1dfa4ec09fbccba0beffbaa8 launching on kubernetes cluster
Pushing task 3a90802d1dfa4ec09fbccba0beffbaa8 into temporary pending queue
Kubernetes scheduling task id=3a90802d1dfa4ec09fbccba0beffbaa8
kubectl output:
Flag --replicas has been deprecated, has no effect and will be removed in the future.
Flag --generator has been depre...
Unfortunately it's not. The problem previously encountered with the docker method surfaced again. In this case, the BASE DOCKER IMAGE
nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true
is not taking effect with the k8s glue.
clearml=1.0.3
python=3.8.10clearml-data upload --id 12314jhg42342j4j --storage
http://ecs.ai is an on-prem DELL EMC ECS that serves as our S3 storage configured with s self signed cert.
Hi.
We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = S3://ecs.ai:443/bucketname
This time round the client gave this error.
Botocore.exceptions.connectiinclosederror: connection was closed before we received a valid response from endpoint URL: ' http://ecs.ai/bucketname/.clearml.test '.
It's quite apparent that whatever clearml passed to boto3 ends up as a http call instead of https, which is wrong.
I can't seem to find the version number on the clearml web app. Is there a specific way?
Hi. The upgrade seems to go well but i'm seeing one wierd output. When i ran a task and observe the software installed
under the execution
tab , i still see clearml=0.17
. Is this expected?
Hi SuccessfulKoala55 , can i confirm the following comments in the docker-compose.yml ?
And after that to run docker-compose commands without loss of data?
docker-compose down docker-compose up
docker-compose.yml
`
version: "3.6"
services:
apiserver:
command:
- apiserver
container_name: clearml-apiserver
image: allegroai/clearml:latest
restart: unless-stopped
volumes:
- /opt/clearml/logs:/var/log/clearml
- /opt/clearml/config:/opt/clearml/config
#...
Hi. If we disable the API service, how will it affect the system? How do we disable?
It also stopped taking in tasks from the queue after that.
That didn't work as well...
I would say yes, otherwise the vscode feature is only available on internet connected premises due to the hard coded URL to download vscode.
Hi, building a container with vscode is not possible. If i have an alternative location for the vscode, where should i indicate in the configuration?