The ClearML Agent works within the Kubernetes cluster
Hi @<1669152726245707776:profile|ManiacalParrot65> , is this a specific task or the controller?
It actually happens in both: sometimes in the pipeline task and sometimes in the pipeline controller
I just found out that ClearML Agent has a service mode. However, I'm currently using ClearML Agent with a Helm chart on Kubernetes (K8s). How can I start the agent in service mode in this setup?
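A minimal sketch of what "service mode" usually maps to on Kubernetes, assuming the standard clearml-agent Helm chart (the k8s glue agent): the glue launches one pod per enqueued task, so rather than the daemon's --services-mode flag you typically just point the glue at the services queue. The agentk8sglue.queue key below is the same key that appears in the values file shared later in this thread; anything beyond that is an assumption, not a confirmed chart setting.
agentk8sglue:
  # The k8s glue agent polls this queue and spawns a dedicated pod per task,
  # which is how pipeline controllers / "service" tasks are usually run on K8s.
  queue: "services"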
@<1523701070390366208:profile|CostlyOstrich36> can you help me with that? I can provide you more information if you need 🙌
@<1669152726245707776:profile|ManiacalParrot65> could you please send your values file override for the Agent helm chart?
@<1729671499981262848:profile|CooperativeKitten94> here:
global:
  imageRegistry: "docker-proxy.nexmart.com:5000"

clearml:
  existingAgentk8sglueSecret: "clearml-agent-secret"
  existingClearmlConfigSecret: "clearml-agent-secret"

agentk8sglue:
  defaultContainerImage: "repo.nexmart.com:5000/nm-container-python:3.9"
  apiServerUrlReference: " None "
  fileServerUrlReference: " None "
  webServerUrlReference: " None "
  # Use SA from default agent
  serviceExistingAccountName: "test-clearml-clearml-agent-sa"
  queue: "services"
  initContainers:
    resources:
      requests:
        memory: "50M"
        cpu: "50m"
      limits:
        memory: "200M"
        cpu: "1"
  resources:
    requests:
      memory: "50M"
      cpu: "10m"
    limits:
      memory: "500M"
      cpu: "500m"
  basePodTemplate:
    env:
      - name: "GL_PACA_PIPELINE_TRIGGER_TOKEN"
        valueFrom:
          secretKeyRef:
            key: "gl_paca_trigger_token"
            name: "clearml-agent-mlops-secret"
    resources:
      requests:
        memory: "50M"
        cpu: "10m"
      limits:
        memory: "750M"
        cpu: "500m"
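For context, a hedged sketch of the Kubernetes Secret that the basePodTemplate above expects to find in the agent's namespace. The secret name and key are taken from the values file; the token value is only a placeholder, not a real credential.
apiVersion: v1
kind: Secret
metadata:
  name: clearml-agent-mlops-secret
type: Opaque
stringData:
  # Injected into every task pod as GL_PACA_PIPELINE_TRIGGER_TOKEN via the env entry above
  gl_paca_trigger_token: "<placeholder-trigger-token>"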
@<1729671499981262848:profile|CooperativeKitten94> I still have this problem even with other pipelines. Can you please help me?