
Hi, for both of them, args.lastiter
is the exact same value. But when plotted out, they are actually 2 iterations apart.
Just pinging those on this side of the timezone to take a look. Thanks.
Thanks, that did solve the problem; the tasks are running again.
Hi, I have the same question. Why would this be ignored if called remotely?
https://clear.ml/docs/latest/docs/references/sdk/task/#set_base_docker
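For context, this is roughly what I'm doing (a sketch; the image name is a placeholder, and the running_locally() guard is my own addition based on the docs, not from the original code):
` from clearml import Task

task = Task.init(project_name="examples", task_name="base docker test")

# set_base_docker() only takes effect when the script runs locally;
# when an agent executes the clone, the container image comes from the
# task record / UI, so a call made remotely appears to be ignored.
if task.running_locally():
    task.set_base_docker("nvidia/cuda:11.1-runtime-ubuntu20.04") `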
Thought this looked familiar.
https://clearml.slack.com/archives/CTK20V944/p1635323823155700?thread_ts=1635323823.155700&cid=CTK20V944
Nice, what are the names of the talks?
Ok, I guess I will have to kill the whole thing and refresh it.
Thanks SuccessfulKoala55, how might I do this cleanup? Does this increase with more use of ClearML? And to add, we save all artifacts to a remote S3 server.
Hi, nice read. Your permalink is wrong though; here's the right one:
https://cpatrickalves.com/mlops-what-it-is-and-why-does-it-matter
Here's my two cents worth.
I thought it was really nice to start off the topic by highlighting 'pipelines'; it's unfortunately one of the most-missed components when people start off with ML work. Your article mentioned drifts and how the MLOps process covers them. I thought there are 2 more components that are important and deserve some mention. Retraining pipelines: ML engineers tend not to give much thought to how they want to transition a training pipeline in development to an automated retraining pipe...
I would like to run the ClearML agent on Kubernetes. So basically I need to run the image on a pod, but there isn't any information on how the agent would communicate with the code, nor how it would spawn more pods to run the task.
Hi, the latest k8sglue-example.py was last committed about 4 months ago. Are you referring to that version?
Thanks. This appears to be solely for the web UI and API. What if I want to orchestrate on K8s?
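For anyone reading later, the pattern I pieced together from the clearml-agent repo looks roughly like this (a sketch based on k8s_glue_example.py; the exact constructor arguments are assumptions, check the example for your agent version):
` # The glue runs as a single pod/deployment: it polls a ClearML queue
# and spawns one pod per pulled task instead of running tasks itself.
from clearml_agent.glue.k8s import K8sIntegration

def main():
    k8s = K8sIntegration(
        namespace="clearml",   # namespace the task pods are created in
        ports_mode=False,
    )
    # Blocks forever: pulls tasks from the queue and launches a pod each.
    k8s.k8s_daemon(queue="k8s_scheduler")

if __name__ == "__main__":
    main() `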
Yes, previously run experiments. I will just kill the clearml-elastic container to see if that solves the problem.
I see, I understand better now. Thanks.
` docker exec clearml-elastic curl
zsh: no matches found: `
Running git diff on my terminal in this repo gave nothing. Nothing at all.
And yes, there is stuff in there. In fact, it's been running for a few weeks with no issue. This appears to have happened after I added new workers, though I can't be sure this is the cause. Is there a limit to the number of workers that I can add in the community edition?
Hi, I was reading this thread and wondered which versions of clearml-server and clearml-agent this took effect with?
Hi AgitatedDove14, I dug a bit deeper. I saw this in the installed packages
of the original completed task. When the task is cloned, this is copied over, and hence the problem. Can I ask how ClearML creates the list of installed packages? Why are some of them (e.g. attrs) being pulled from @ file:///tmp/build/80754af9/attrs_1604765588209/work?
` absl-py==0.11.0
alabaster==0.7.12
antlr4-python3-runtime==4.8
apex==0.1
appdirs==1.4.4
argon2-cffi==20.1.0
ascii-graph==1.5.1
async-gener...
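(Partial answer for the archive: as far as I can tell, entries like attrs @ file:///tmp/... come from running pip freeze inside a conda environment, where conda-built packages point at local build paths. The detected list can be overridden before Task.init(); a sketch, assuming these classmethods behave as documented:)
` from clearml import Task

# Both calls must come before Task.init().
# Pin/override a single package in the recorded requirements:
Task.add_requirements("attrs", "20.3.0")

# Or skip auto-detection and record an explicit requirements file,
# avoiding local conda build paths like "@ file:///tmp/...":
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name="afro-nmt", task_name="clean requirements") `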
Hi TimelyPenguin76, I am adding a debug sample to an existing task using the above method. What should I put for the iteration? I do not want to overwrite existing ones, but I do not know what the last count is. This is for both scalar and media reporting.
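In case the answer helps someone searching later, this is what I sketched from the SDK docs (assuming get_last_iteration() returns the highest iteration recorded so far; the task ID is a placeholder):
` from clearml import Task

# Reattach to the existing task without resetting it.
task = Task.get_task(task_id="<existing_task_id>")

# Report one past the last recorded iteration so nothing is overwritten.
next_iter = task.get_last_iteration() + 1

logger = task.get_logger()
logger.report_scalar(title="BLEU", series="JW300", value=0.31,
                     iteration=next_iter)
logger.report_image(title="debug", series="sample",
                    local_path="sample.png", iteration=next_iter) `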
Ok, I get the logic now: extra_docker_shell_script executes before clearml-agent talks to the clearml server.
It didn't work as expected.
` task init
task report iter 10
task init
task report iter 10 `
The second task pushed the reporting iteration to 20 instead.
Hi,
Basically I run this block first and end the script:
` task = Task.init(project_name="afro-nmt", task_name=args.taskname, continue_last_task=args.taskid)
Logger.current_logger().report_scalar(title="BLEU",series="JW300",value=args.jwbleu, iteration=args.lastiter) `
Then I run another script, with a different series:
` task = Task.init(project_name="afro-nmt", task_name=args.taskname, continue_last_task=args.taskid)
Logger.current_logger().report_scalar(title="BLEU",series="SS900",value=arg...
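(Closing the loop for the archive: the jump from 10 to 20 seems to happen because the continued task resumes counting from its own last iteration. A sketch of the workaround, assuming set_initial_iteration(0) resets that offset; the ssbleu argument name is my placeholder for the truncated value above:)
` import argparse
from clearml import Task, Logger

parser = argparse.ArgumentParser()
parser.add_argument("--taskname")
parser.add_argument("--taskid")
parser.add_argument("--ssbleu", type=float)  # placeholder name
parser.add_argument("--lastiter", type=int)
args = parser.parse_args()

task = Task.init(project_name="afro-nmt", task_name=args.taskname,
                 continue_last_task=args.taskid)
# Without this, new reports are offset by the task's last recorded
# iteration, which is how 10 + 10 became 20 in the earlier message.
task.set_initial_iteration(0)
Logger.current_logger().report_scalar(title="BLEU", series="SS900",
                                      value=args.ssbleu,
                                      iteration=args.lastiter) `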
So these (PIP_INDEX_URL) weren't used when ClearML started running pip.
I've been reading the documentation for a while and I don't quite understand the following.
Given an open-source codebase, say Hugging Face: I want to do some training and track my experiments using ClearML. The obvious choice would be to use Explicit Reporting in ClearML. But the part about sending my training job and letting ClearML orchestrate it is vague. I would appreciate being pointed to the right documentation on this.
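To make the question concrete, this is the kind of minimal wrapper I have in mind (a sketch; train() stands in for the actual Hugging Face training entry point, and the project/queue names are placeholders):
` from clearml import Task

# Task.init() at the top of the training script is enough for tracking:
# console output, argparse values and installed packages are captured
# automatically, and explicit reporting can be layered on top.
task = Task.init(project_name="hf-experiments", task_name="bert finetune")
logger = task.get_logger()

def train():
    # Placeholder for the actual Hugging Face training loop.
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # dummy metric
        logger.report_scalar(title="loss", series="train",
                             value=loss, iteration=epoch)

# To hand the job to ClearML orchestration instead of running it here,
# the task can be enqueued for an agent:
# task.execute_remotely(queue_name="default")
train() `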