Hi AgitatedDove14, I just updated that flag, but the problem persists:
```
agent.package_manager.system_site_packages = true
.....
Environment setup completed successfully
Starting Task Execution:
ClearML results page: files_server:
Traceback (most recent call last):
  File "base_template_keras_simple.py", line 15, in <module>
    import tensorflow as tf # noqa: F401
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/binding/import_bind.py", line 59, in __pat...
```
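One way to narrow this down is a small check run with the agent's venv interpreter. Judging from the traceback, that is probably `/root/.clearml/venvs-builds/3.6/bin/python`, but the path is an assumption. The check confirms whether the venv was created with system site packages and shows where, if anywhere, `tensorflow` resolves from. The helper names are hypothetical. `pyvenv.cfg` is the file that `venv` and recent `virtualenv` versions write, but whether the agent's venv has one depends on how it was created.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def module_location(name: str) -> str:
    """Report where `name` would be imported from in the current interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: NOT importable from {sys.prefix}"
    return f"{name}: found at {spec.origin}"


def system_site_packages_enabled(venv_dir: str = sys.prefix) -> Optional[bool]:
    """Read include-system-site-packages from a venv's pyvenv.cfg.

    Returns None when there is no pyvenv.cfg (e.g. not running inside a venv).
    """
    cfg = Path(venv_dir) / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    # pyvenv.cfg exists but does not mention the setting: venv default is off.
    return False


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("include-system-site-packages:", system_site_packages_enabled())
    print(module_location("tensorflow"))
```

If the flag reads `False` (or `None`) even though `system_site_packages = true` is set, the venv may have been built before the change. A cached venv would then need to be rebuilt.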
AgitatedDove14, not creating, but more for orchestrating...
Currently, we manually push a dataset to clearml-dataset.
We have a pipeline controller Task that takes in data from clearml-dataset, runs preprocessing, runs training, and publishes a model (if a certain threshold is met).
We have a ClearML monitor that watches all published models and pushes the URI of each published model to RabbitMQ.
We have a subscriber (Python code) listening to RabbitMQ. This takes in the URI from t...
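Here is a minimal sketch of the handoff between the monitor and the subscriber. It assumes a JSON message schema; the field names `task_id`, `model_id`, and `model_uri` are hypothetical. An in-memory `queue.Queue` stands in for RabbitMQ so the sketch runs without a broker. In a real setup, the publish and consume calls would go through a RabbitMQ client such as pika.

```python
import json
import queue
from typing import Any, Dict

# In-memory stand-in for the RabbitMQ queue (hypothetical; a real setup would
# publish/consume through a RabbitMQ client instead).
model_events = queue.Queue()

REQUIRED_FIELDS = ("task_id", "model_id", "model_uri")


def encode_published_model(task_id: str, model_id: str, model_uri: str) -> bytes:
    """Serialize a 'model published' event using a hypothetical JSON schema."""
    event = {
        "event": "model_published",
        "task_id": task_id,
        "model_id": model_id,
        "model_uri": model_uri,
    }
    return json.dumps(event).encode("utf-8")


def decode_published_model(body: bytes) -> Dict[str, Any]:
    """Parse and validate a message body; reject anything that is not a model event."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict) or event.get("event") != "model_published":
        raise ValueError(f"unexpected message: {body!r}")
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise ValueError(f"message is missing fields: {missing}")
    return event


def monitor_on_published(task_id: str, model_id: str, model_uri: str) -> None:
    """Monitor side: called once per newly published model."""
    model_events.put(encode_published_model(task_id, model_id, model_uri))


def subscriber_next_uri(timeout: float = 1.0) -> str:
    """Subscriber side: wait for the next event and return its model URI."""
    body = model_events.get(timeout=timeout)  # raises queue.Empty on timeout
    return decode_published_model(body)["model_uri"]
```

Validating the message on the subscriber side keeps the downstream consumer independent of how the monitor is implemented. Only the message contract is shared.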
Yeah, within ClearML we use the PipelineController. We are now mainly looking for a single tool to stitch together the other products.
But of course, we will give precedence to tools that work best with ClearML. Hence asking if anyone has had similar experience setting up such systems.
Hi FriendlySquid61, the clearml-agent settings got filled in from the values.yaml file. However, agentservices was empty, so I filled it in manually.
Hi AgitatedDove14,
At this point, showing the URL of the ClearML task might be sufficient, unless in the future someone wants it customised.
But the bigger question is whether there is a tool to aid with building this workflow. We are currently experimenting with Airflow/Prefect.
Hi, another question:
```
dataset_upload_task = Task.get_task(task_id=args['dataset_task_id'])
iris_pickle = dataset_upload_task.artifacts['dataset'].get_local_copy()
```
How would I replicate the above for a Dataset, i.e. how do I get the iris_pickle file? I did some hacking like this:
```
ds.get_mutable_local_copy(target_folder='data')
```
Subsequently, I also have to load the file by name. I wonder whether there is a more elegant way.
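A possibly cleaner variant, offered as a sketch: fetch the cached read-only copy with `Dataset.get(...).get_local_copy()`, and only use a mutable copy if the files need to be modified. Then locate the pickle by glob instead of hard-coding its name. The helper names `fetch_dataset_pickle` and `load_single_pickle` and the `*.pkl` pattern are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any


def load_single_pickle(folder: str, pattern: str = "*.pkl") -> Any:
    """Find exactly one file matching `pattern` under `folder` (recursively) and unpickle it."""
    matches = sorted(Path(folder).rglob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} under {folder}")
    if len(matches) > 1:
        raise ValueError(f"expected one match for {pattern!r}, found {len(matches)}: {matches}")
    with matches[0].open("rb") as f:
        return pickle.load(f)


def fetch_dataset_pickle(dataset_id: str, pattern: str = "*.pkl") -> Any:
    """Get (or reuse the cached copy of) a ClearML Dataset and load its pickle file."""
    # Imported here so load_single_pickle can be used without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() returns a cached, read-only folder; get_mutable_local_copy()
    # is only needed if you intend to modify the files.
    folder = dataset.get_local_copy()
    return load_single_pickle(folder, pattern)
```

Usage would then be a one-liner such as `iris = fetch_dataset_pickle(args['dataset_id'])`, where the `dataset_id` argument key is hypothetical. Failing loudly when zero or several files match avoids silently loading the wrong file.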
The use case: let's say I run `python k8s_glue_example.py --queue glue_q`.
Someone then pushes a hyperparameter optimization job with over 100 experiments to glue_q, and one minute later I push a simple training job to glue_q. I will be forced to wait for all 100 experiments to finish.
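With a single FIFO queue, the training job has to wait behind all 100 experiments. A common way around this is to serve several queues in strict priority order. As far as I know, `clearml-agent daemon` accepts multiple queues and by default pulls from them in the order given. The class below is a hypothetical illustration of that scheduling rule, not the agent's or the k8s glue's actual code.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PriorityQueues:
    """Pull jobs from several named queues, always preferring earlier-listed ones."""

    def __init__(self, names: List[str]) -> None:
        self.order = list(names)  # highest priority first
        self.queues: Dict[str, Deque[str]] = {name: deque() for name in names}

    def push(self, queue_name: str, job: str) -> None:
        self.queues[queue_name].append(job)

    def pull(self) -> Optional[str]:
        """Return the oldest job from the highest-priority non-empty queue, or None."""
        for name in self.order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


qs = PriorityQueues(["glue_high_q", "glue_low_q"])
for i in range(100):
    qs.push("glue_low_q", f"hpo-{i}")
qs.push("glue_high_q", "train")
print(qs.pull())  # the training job jumps ahead of the 100 HPO experiments
```

Strict priority can starve the low queue if the high queue is never empty. A round-robin pull across queues is the usual alternative when fairness matters more.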
GitHub issue: https://github.com/allegroai/clearml-agent/issues/50
AgitatedDove14, I have added the GitHub issue as requested. Thanks for the help. 👍
```
python3 k8s_glue_example.py --queue glue_high_q glue_low_q
usage: k8s_glue_example.py [-h] [--queue QUEUE] [--ports-mode] [--num-of-services NUM_OF_SERVICES] [--base-port BASE_PORT] [--base-pod-num BASE_POD_NUM] [--gateway-address GATEWAY_ADDRESS]
                           [--pod-clearml-conf POD_CLEARML_CONF] [--overrides-yaml OVERRIDES_YAML] [--template-yaml TEMPLATE_YAML] [--ssh-server-port SSH_SERVER_PORT] [--namespace NAMESPACE]
k8s_glue_example.py: error: unrecognized arguments: glue...
```
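The error comes from argparse: `--queue QUEUE` takes a single value, so the second queue name is treated as an unrecognized extra argument. The sketch below shows how such an option could accept several names with `nargs="+"`. It is a hypothetical patch, not the actual script. Whether the glue code can then consume a list of queues is a separate question, tracked in the GitHub issue above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical variant of the glue script's parser: one or more queue names,
    # listed highest priority first.
    parser = argparse.ArgumentParser(prog="k8s_glue_example.py")
    parser.add_argument(
        "--queue",
        nargs="+",
        required=True,
        help="Queue name(s) to pull tasks from, highest priority first",
    )
    return parser


def parse_queues(argv: Optional[List[str]] = None) -> List[str]:
    """Return the queue names as a list, even when only one is given."""
    return build_parser().parse_args(argv).queue


if __name__ == "__main__":
    print(parse_queues())
```

With `nargs="+"`, `--queue glue_high_q glue_low_q` parses to `['glue_high_q', 'glue_low_q']`, and a single `--queue glue_q` still works, as a one-element list.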
Hi, here are some workarounds I thought of. Btw, I haven't tried them yet. AnxiousSeal95, any comments?
1) Attach a ClearML task ID to each new dataset ID.
So in the future, when new data comes in, get the last data commit from the project (Dataset) and get the ClearML task for it. Then clone that task and pass in the new data. The only downside is the need to clone the ClearML task.
Or, alternatively:
2) Attach the git SHA of the processing code to each new dataset ID.
This can't give the exact code ...
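Both options boil down to keeping a small lineage record per dataset ID and looking up the latest one when new data arrives. The sketch below is hypothetical: the `DatasetLineage` fields are illustrative, and where the record is stored (dataset tags, a config artifact, or an external table) is left open. The actual cloning and enqueuing, for example via ClearML's `Task.clone` and `Task.enqueue`, is also left out.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetLineage:
    """Hypothetical lineage record kept alongside each dataset ID."""
    dataset_id: str
    created: datetime
    processing_task_id: Optional[str] = None  # option 1: task to clone for new data
    git_sha: Optional[str] = None  # option 2: commit of the processing code


def current_git_sha(repo_dir: str = ".") -> Optional[str]:
    """Return HEAD's commit SHA for repo_dir, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def latest(records: Iterable[DatasetLineage]) -> Optional[DatasetLineage]:
    """Pick the most recently created record (the 'last data commit')."""
    return max(records, key=lambda r: r.created, default=None)


def task_to_clone(records: Iterable[DatasetLineage]) -> Optional[str]:
    """Option 1: the processing task ID to clone for the next batch of data."""
    last = latest(records)
    return last.processing_task_id if last else None
```

Recording both fields costs nothing and keeps option 2 available as a fallback if the referenced task is ever deleted.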
Okay, now I get it. I set up the clearml-agent on an EC2 instance and it works now.
Thanks
Could it be another application's "elasticsearch-pv" and not ClearML's?
It would be good if there were a YAML file to deploy clearml-agents into the k8s cluster.
However, I am able to get it to work if I launch a clearml-agent outside the Kubernetes cluster.
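Here is a hedged sketch of what a minimal Deployment for a single agent could look like. It is generated as a Python dict and written out as JSON, which `kubectl apply -f` also accepts. The image, Secret name, and Secret keys are assumptions. `CLEARML_API_HOST`, `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` are environment variables read by the ClearML SDK and agent. Without `--docker`, `clearml-agent daemon` runs tasks in virtualenv mode inside the pod rather than starting Docker containers.

```python
import json
from typing import Any, Dict


def clearml_agent_deployment(
    queue_name: str,
    image: str,
    api_host: str,
    secret_name: str = "clearml-agent-credentials",  # hypothetical Secret
    replicas: int = 1,
) -> Dict[str, Any]:
    """Build a Kubernetes apps/v1 Deployment running `clearml-agent daemon` for one queue."""
    # Object names must be DNS-1123 compliant, so underscores become dashes.
    name = "clearml-agent-" + queue_name.replace("_", "-").lower()
    labels = {"app": "clearml-agent", "queue": queue_name}

    def from_secret(key: str) -> Dict[str, Any]:
        return {"secretKeyRef": {"name": secret_name, "key": key}}

    container = {
        "name": "clearml-agent",
        "image": image,
        "command": ["clearml-agent", "daemon", "--queue", queue_name],
        "env": [
            {"name": "CLEARML_API_HOST", "value": api_host},
            {"name": "CLEARML_API_ACCESS_KEY", "valueFrom": from_secret("access_key")},
            {"name": "CLEARML_API_SECRET_KEY", "valueFrom": from_secret("secret_key")},
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = clearml_agent_deployment(
        "glue_q", image="<your-agent-image>", api_host="http://clearml-apiserver:8008"
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f` on it should create the Deployment. The API server address above is a placeholder; 8008 is the API server's default port.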
When I push a job to an agent node, I get this error:
"Error response from daemon: network None not found"
Hi, sorry for the delayed response. Btw, all the pods are running fine.
This is where I downloaded the log. It seems like some Docker issue, though I can't figure it out. As an alternative, I spawned a clearml-agent outside the k8s environment and it was able to execute well.
Hi, I will proceed to close this thread. We found an issue with the underlying Docker on our machines. We have now shifted to another k8s cluster of EC2 instances in AWS.
Btw, this is just the example code from the clearml repo.
I just had to set up the clearml-agent on my machine. Closing this issue.
Hmm, unfortunately it is still pending, as in nothing is running.
Currently, in the diagram here, the ClearML file server is shown as a local storage drive. We have two primary concerns:
1) Is there any way we can scale this file server when our data volume explodes? Maybe it wouldn't be an issue in the K8s environment anyway. Or can it be configured so that all data is stored in HDFS (which helps with scalability)?
2) Is there any security to protect the data in this storage?