Hi ResponsiveHedgehong88, I was trying to do the same thing but the logger hook doesn't seem to work. The console log and scalar logs didn't come out when I registered it via init.py and loaded it via log_config. Are you able to share how you configured it?
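Roughly the shape of what I was attempting (a sketch only, assuming the MMCV-style hook registry, since log_config suggests it; the hook name and config values below are illustrative, not my actual setup):

```python
# Illustrative only: registering a custom logger hook via MMCV's registry and
# enabling it through log_config. Assumes MMCV 1.x; names are placeholders.
from mmcv.runner import HOOKS, LoggerHook


@HOOKS.register_module()
class MyLoggerHook(LoggerHook):
    def log(self, runner):
        # forward the runner's scalar outputs to whatever backend is needed
        print(self.get_loggable_tags(runner))


# in the experiment config:
log_config = dict(
    interval=50,
    hooks=[
        dict(type="TextLoggerHook"),
        dict(type="MyLoggerHook"),
    ],
)
```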
So I kept trying, but I'm stuck on this when I run python k8s_glue_example.py:
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'
The first line is to make sure kubectl is connected to k8s.
I would like to run the ClearML agent on Kubernetes. So basically I need to run the image in a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.
Yes, it's on purpose; each user would have their own AWS credentials for default_output_uri.
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Referencing your suggestion, we can configure output_uri on task.set_base_docker(), but how should we do this for the credentials?
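To make the intent concrete, this is roughly the direction I had in mind (a minimal sketch, not a confirmed ClearML mechanism: it assumes the S3 client honours the standard AWS environment variables, and the project and bucket names are made up):

```python
# Minimal sketch: each user supplies their own credentials via the standard AWS
# environment variables and points the task output at their own bucket.
# Assumes ClearML's S3 driver (boto3 underneath) picks up these env vars.
import os

from clearml import Task

os.environ.setdefault("AWS_ACCESS_KEY_ID", "<per-user-key>")          # placeholder
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "<per-user-secret>")   # placeholder

task = Task.init(
    project_name="my-project",                 # hypothetical project
    task_name="per-user-output-uri",
    output_uri="s3://my-bucket/models",        # hypothetical per-user bucket
)
```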
I'm having the same problem. Are you using the latest clearml-agent? Is your docker image running as the root user by default?
I also see this in my logs, noting that the config is read in but it's still printing the supposedly hidden keys in the logs and UI.
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.extra_keys.0 = 'TRAINS_AGENT_GIT_USER'
.....
docker_cmd = harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser
Hi SuccessfulKoala55 , just to add, my clearml.conf (client) and clearml.agent.conf (agent) can have differing values. I'm not sure which one takes precedence and if this could be the cause.
Hi AgitatedDove14, I've got the same error. It would appear that the code references clearml_agent/helper/base.py,
which I believe is part of clearml-agent v0.17.1. Could that be the issue?
OK thanks, looking forward to it. Could you share more about the bug you encountered?
Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
Ok, that seems clearer, thanks.
Hi, this is what I got. No mention of the env variables.
```
Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):

api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
```
clearml-serving does not support spaCy models (among many others) out of the box; clearml-serving only supports the following:
Machine learning models (Scikit-learn, XGBoost, LightGBM)
Deep learning models (TensorFlow, PyTorch, ONNX)
An easy way to extend support to different models would be a boon.
I believe in such scenarios a custom engine would be required. I would like to know how difficult it is to create a custom engine with clearml-serving? For example, in this...
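For reference, my rough understanding (not verified against clearml-serving itself) is that a custom engine boils down to a small entry-point class that loads the model and runs inference. The sketch below uses spaCy purely as an illustration, and the class and method names follow the custom-example pattern rather than a confirmed API:

```python
# Hedged sketch of a custom serving entry point for a spaCy model.
# The Preprocess class and its load/preprocess/process/postprocess hooks are
# assumptions based on the custom-engine example pattern, not a confirmed API.
from typing import Any

import spacy


class Preprocess(object):
    def load(self, local_file_name: str) -> Any:
        # local_file_name points at the model artifact pulled from the model repository
        self.nlp = spacy.load(local_file_name)
        return self.nlp

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # pull the raw text out of the request payload
        return body["text"]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # run spaCy inference
        return self.nlp(data)

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # serialize entities back into a JSON-friendly response
        return {"entities": [(ent.text, ent.label_) for ent in data.ents]}
```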
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from the children (roughly the sketch below). Technically workable, but not sure if it's the best approach since different people have different batch sizes in mind.
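Something like this is what I meant (a rough sketch: the Dataset calls are the standard clearml.Dataset API as far as I know, but the project name, paths and bucket are made up):

```python
# Rough sketch of the parent/children layout: one full dataset uploaded to S3,
# then child datasets that only cover slices of it. Names and paths are made up.
from clearml import Dataset

# parent: the full dataset, uploaded once to S3
parent = Dataset.create(dataset_project="demo", dataset_name="all")
parent.add_files("/data/full")                        # hypothetical local path
parent.upload(output_url="s3://my-bucket/datasets")   # hypothetical bucket
parent.finalize()

# child1: first 100k samples, inheriting from the parent
child1 = Dataset.create(
    dataset_project="demo",
    dataset_name="child1-first-100k",
    parent_datasets=[parent.id],
)
child1.add_files("/data/first_100k")                  # hypothetical slice
child1.upload(output_url="s3://my-bucket/datasets")
child1.finalize()

# consumers then pull only the child they need
local_copy = Dataset.get(
    dataset_project="demo", dataset_name="child1-first-100k"
).get_local_copy()
```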
Hi thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
I have yet to go through the ClearML agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push afterwards, when I clone the task, isn't the ClearML agent unable to pull that script from the git repo?
Hi, I was reading this thread and wondered: which versions of clearml-server and clearml-agent did this take effect in?
Hi, I can't seem to find the source. What are the kinds of situations where it will try to install torch outside of the user requirements?
This is strange then. Is it possible for ClearML logs to report successfully saving into S3 storage when it actually isn't? For example, I've seen certain S3 clients in the past save onto a local folder called 's3:/' instead of putting the data on S3 storage itself.
Going back to the open source version, I think that adding the credentials as part of the source code might allow the "credentials" to auto-populate as part of the remote execution, wdyt?
Not sure how this will work when I can't supply the credentials to ClearML programmatically.
Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when ClearML does it.
How do we configure ClearML to correspond to the following boto3 code?
s3 = boto3.resource('s3', endpoint_url='https://ecs.ai', aws_access_key_id='mykey', aws_secret_access_key='mysecret', config=Config(signature_version='s3v4'), region_name='us-east-1', ve...
Hi, by agent logs I suppose you meant the logs from the ClearML server console panel?
ah... thanks!
Thanks TimelyPenguin76 , let me try it out now.
Hi TimelyPenguin76, I am adding a debug sample to an existing task using the above method. What should I put for the iteration? I do not want to overwrite existing ones, but I do not know what the last count is. This is for both scalar and media reporting.
It didn't work as expected.
```
task init
task report iter 10
task init
task report iter 10
```
The second task pushed the reporting iteration to 20 instead.
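For anyone hitting the same iteration-offset problem later, the workaround I was considering looks roughly like this (a sketch: the task id and file name are placeholders, and continuing from get_last_iteration() + 1 is my own assumption rather than an official recommendation):

```python
# Sketch: append a scalar / debug sample to an existing task without clobbering
# earlier iterations, by continuing from the last reported iteration.
from clearml import Task

task = Task.get_task(task_id="<existing-task-id>")     # placeholder id
next_iteration = task.get_last_iteration() + 1         # continue after whatever was logged before

logger = task.get_logger()
logger.report_scalar(title="debug", series="value", value=0.5, iteration=next_iteration)
logger.report_media(
    title="debug samples",
    series="image",
    iteration=next_iteration,
    local_path="sample.png",                            # placeholder file
)
```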