
This doesn't seem to be running inside a container...
What's the clearml-agent launch command you are using? (i.e. do you have the --docker flag?)
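For reference, running the agent in docker mode usually looks something like this (the queue name and default image here are just placeholders):
clearml-agent daemon --queue default --docker nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
Without the --docker flag the agent runs the Task in a virtual environment on the host instead of inside a container.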
Yes, I think the difference is running conda install with arguments vs. conda install with an env file...
@<1523707653782507520:profile|MelancholyElk85> what are you trying to change ? maybe there is a better way?
BTW: if you do step_base_task.export_task()
you can take the parts that you need from the dict and pass them to the task_overrides
argument in add_step
(you might need to flatten the nested arguments with '.' , and thinking about it, maybe we should do that automatically?!)
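Something along these lines (assuming pipe is an existing PipelineController; the task names and overridden fields are just a hypothetical sketch):
step_base_task = Task.get_task(project_name='examples', task_name='step one base')
exported = step_base_task.export_task()  # full task definition as a nested dict
# pick the fields you want to change, flattened with '.' notation
overrides = {'script.branch': 'main', 'script.version_num': ''}
pipe.add_step(name='step_one', base_task_id=step_base_task.id, task_overrides=overrides)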
ModelCheckpoint('best_model', save_best_only=True)
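For context, a minimal Keras sketch of how that callback is usually wired in (the tiny model and random data are just placeholders; with clearml auto-logging enabled the saved checkpoint is picked up as an output model):
from clearml import Task
import tensorflow as tf

task = Task.init(project_name='examples', task_name='keras checkpoint example')
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')
# save_best_only monitors the validation loss, hence the validation split
checkpoint = tf.keras.callbacks.ModelCheckpoint('best_model', save_best_only=True)
model.fit(tf.random.normal((64, 4)), tf.random.normal((64, 1)),
          validation_split=0.25, epochs=3, callbacks=[checkpoint])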
That worked for me now, what's the diff
GreasyPenguin66 you can pass: AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY
As the default azure access/secret 🙂
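i.e. something like (values are placeholders):
export AZURE_STORAGE_ACCOUNT=my_account_name
export AZURE_STORAGE_KEY=my_account_key
If I remember correctly, the default clearml.conf picks these environment variables up as the Azure storage credentials.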
I "think" I have a clue on the issue that is lost here in the translation:
Specifically to me it all comes down to the definition of "pipeline"
From the clearml perspective:
Manual Task - code that is executed by the user (or any other mechanism Outside of the agent)
Remote Task - code that is executed by the Agent
Pipeline is a Task
Pipeline can be "manual task" but also "remote task"
Pipeline generates "remote tasks"
Task status (e.g. pipeline status as it is also a Task) can be: draft, a...
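To make the "Pipeline is a Task" point concrete, a rough sketch (project / task names are placeholders):
from clearml import PipelineController

# the controller itself is a Task; running this script is the "manual task" case
pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
pipe.add_step(name='step_one',
              base_task_project='examples',
              base_task_name='step one base')
# each step is cloned and enqueued, i.e. executed by an agent as a "remote task"
pipe.start_locally(run_pipeline_steps_locally=False)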
Edit the cloned version and enqueue it?
Hi GreasyPenguin14
Yes, I think you are right the series name should be next to the title. Let me check it...
Do you think the local agent will be supported someday in the future?
We can take this code sample and extend it. Can't see any harm in that.
It will make it very easy to run "sweeps" without any "real agent" installed.
I'm thinking roll out multiple experiments at once
You mean as multiple subprocesses, sure if you have the memory for it
GrievingTurkey78 please feel free to send me code snippets to test 🙂
Essentially the example provided just prints out the IDs to the log file,
What do you mean?
@<1523710674990010368:profile|GreasyPenguin14> make sure it uses https, not ssh:
edit ~/clearml.conf
force_git_ssh_protocol: false
and that you have both git_user & git_pass set in your clearml.conf
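i.e. roughly this in ~/clearml.conf (user / token values are placeholders):
agent {
    git_user: "my-git-username"
    git_pass: "my-git-token-or-password"
    force_git_ssh_protocol: false
}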
Hi SparklingHedgehong28
What would be the use for "end of docker hook" ? is this like an abort callback? completion ?
instance protection
Do you mean like when the instance just died (like spot instances in AWS)?
EnviousStarfish54 you can also run the docker-compose on one of the machines on your local LAN. but then you will not be able to access it from home 🙂
Not yet 😞
It should not be complex to implement,
The actual AWS auto-scaler class implements just two functions:
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
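So a minimal custom scaler would roughly be a subclass along these lines (the class name and the provider calls are hypothetical, just to show the two hooks):
from clearml.automation.auto_scaler import AutoScaler

class MyCloudAutoScaler(AutoScaler):
    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # launch an instance on your cloud provider and start a
        # clearml-agent on it that listens on queue_name
        ...

    def spin_down_worker(self, instance_id):
        # terminate the instance instance_id on your cloud provider
        ...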
replace the base-docker-image and it should work fine 🙂
The issue is the 400 returned from the server, let me check with the backend guys
What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it?
Just making sure I understand, you want to upload your models with clearml to the Yandex-compatible S3 storage?
Also what do you have in the "Configuration" section of the serving inference Task?
Is the agent idle? Or is it running something else?
It doesn't seem to be related to the upload. The upload itself finished... What's your Trains version?