Hi RoughTiger69
A. Yes, makes total sense. Basically you can use Task.export_task / Task.import_task to achieve this (notice we assume the dataset artifact links are accessible from both servers, which is usually the case)
B. The easiest way would be to use Process: one subprocess exports from dev, with the credentials and configuration passed via os environment variables, and the other subprocess imports it into the prod server (again with os environment variables pointing to the prod server). Make sense?
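Something along these lines (a minimal sketch, assuming the standard CLEARML_API_* environment variables; the hosts, credentials, and the two helper scripts are placeholders):
```
import os
import subprocess
import sys

# hypothetical helper scripts:
#   export_side.py - calls Task.export_task(...) and dumps the result to task.json
#   import_side.py - loads task.json and calls Task.import_task(...)

dev_env = dict(
    os.environ,
    CLEARML_API_HOST='https://dev-api.example.com',   # placeholder dev server
    CLEARML_API_ACCESS_KEY='<dev-access-key>',
    CLEARML_API_SECRET_KEY='<dev-secret-key>',
)
prod_env = dict(
    os.environ,
    CLEARML_API_HOST='https://prod-api.example.com',  # placeholder prod server
    CLEARML_API_ACCESS_KEY='<prod-access-key>',
    CLEARML_API_SECRET_KEY='<prod-secret-key>',
)

# one subprocess exports from dev, the other imports into prod
subprocess.check_call([sys.executable, 'export_side.py'], env=dev_env)
subprocess.check_call([sys.executable, 'import_side.py'], env=prod_env)
```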
Thanks SparklingHedgehong28
So I think I'm missing information on what you call "Instance protection" ?
You mean like respinning spot instances? Or is it a way to review the performance of the AWS ASG (i.e. like a watchdog of sorts)?
Hi SparklingHedgehong28
What would be the use of an "end of docker hook"? Is this like an abort callback? Completion?
instance protection
Do you mean like when an instance just died (like spot in AWS)?
SparklingHedgehong28 this is actually quite cool! Still not sure why not just use the built-in autoscaler https://github.com/allegroai/clearml/tree/master/examples/services/aws-autoscaler , but it is a really cool usage of ASG 🤩
As we use a custom CUDA image, we do not want this running on user login, and get ugly error messages about missing symlinks.
You can customize the startup bash script (running inside any container) here:
https://github.com/allegroai/clearml-agent/blob/bf07b7f76d3236c1118b81730c6d9718705a795a/docs/clearml.conf#L145
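e.g. in the agent section of clearml.conf (a sketch; assuming the linked line is the extra_docker_shell_script entry):
```
agent {
    # bash lines executed inside every container on startup, before the task runs
    extra_docker_shell_script: ["apt-get update", "apt-get install -y bindfs"]
}
```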
LackadaisicalOtter14 Would that help?
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image? (I'm assuming this is needed to run the pipeline logic?! Is that correct?)
Hi TartSeal39
So the thing is, the agent does not support yaml envs for conda. Currently, if the requirements section is empty, the agent will use the repo's requirements.txt. We first need to add support for conda yaml, and then allow you to disable the auto requirements or push the specific yaml. Would that work? Also, is there a reason the automatic package detection is not working?
TartSeal39 please let me know if it works, conda is a strange beast and we do our best to tame it.
Specifically when you execute manually on a conda env we collect (separately) the conda packages & the python packages (so later we can replicate on both conda & pip, or at least do our best)
Are you running both development env and agent with conda ?
Yep 🙂
Basically:
```
from time import sleep
from clearml import Task

task = Task.get_task(task_id='aaaa')
while task.get_status() not in ('completed', 'stopped'):
    # do something ?
    sleep(15)
```
(Notice task.status / task.get_status() will refresh the Task status on every call)
Maybe this one?
https://github.com/allegroai/clearml/issues/448
I think it is already there (i.e. 1.1.1)
Hi ExuberantParrot61
Is the pipeline logic code running from inside the repo?
Hi ExuberantParrot61 the odd thing is this message:
No repository found, storing script code instead
when you are actually running from inside the repo...
is it saying that on a specific step, or is it on the pipeline logic itself?
Also any chance you can share the full console output ?
BTW:
you can manually specify a repo branch for a step:
https://github.com/allegroai/clearml/blob/a492ee50fbf78d5ae07b603445f4983feb9da8df/clearml/automation/controller.py#L2841
Example:
https:/...
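Something like this (a sketch; the repo URL, branch, and step function are placeholders; repo / repo_branch / repo_commit are supported on add_function_step in recent clearml versions):
```
from clearml import PipelineController

def my_step(x=1):
    # placeholder step logic
    return x * 2

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
pipe.add_function_step(
    name='step_one',
    function=my_step,
    function_return=['result'],
    repo='https://github.com/user/repo.git',  # placeholder repo URL
    repo_branch='main',                       # branch to check out for this step
)
pipe.start(queue='default')
```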
DistressedGoat23 notice the last argument in report_histogram, 'extra_layout'
https://clear.ml/docs/latest/docs/references/sdk/logger#report_histogram
You can then specify the plotly histogram orientation, full details here:
https://plotly.com/javascript/reference/bar/
I'm assuming the one you are after is 'orientation'
https://plotly.com/javascript/reference/bar/#bar-orientation
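Something like (a sketch; assuming extra_layout forwards the plotly bar keys, as the docs link above suggests; project/series names are made up):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='histogram demo')  # placeholder names
logger = task.get_logger()

logger.report_histogram(
    title='class counts',
    series='train',
    values=[10, 7, 3],
    iteration=0,
    xlabels=['cat', 'dog', 'bird'],
    extra_layout={'orientation': 'h'},  # 'h' = horizontal bars, per the plotly bar reference
)
```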
BeefyHippopotamus73 this error seems like it is coming from boto3, are you sure the credentials are properly configured and that you have read permission ?
RipeGoose2 models are automatically registered
i.e. added to the model repository, but the entry only points to where the files are stored
Only if you pass the output_uri argument to Task.init will they actually be uploaded.
If you want to disable this behavior you can pass Task.init(..., auto_connect_frameworks={'pytorch': False})
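For example (a sketch; the destination URI and names are placeholders):
```
from clearml import Task

# with output_uri set, model weight files are actually uploaded,
# not just registered with their local path
task = Task.init(
    project_name='examples',             # placeholder
    task_name='train',                   # placeholder
    output_uri='s3://my-bucket/models',  # placeholder upload destination
)
```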
I still think the issue is getting boto3 credentials
It might be the case
Are you using clearml-agent or are you running it manually ?
This one seem to work
```
from clearml import Task
task = Task.init(...)  # fill in your own project/task names

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('_mpl-gallery')

# make data:
np.random.seed(10)
D = np.random.normal((3, 5, 4), (0.75, 1.00, 0.75), (200, 3))

# plot:
fig, ax = plt.subplots()
vp = ax.violinplot(D, [2, 4, 6], widths=2,
                   showmeans=False, showmedians=False, showextrema=False)

# styling:
for body in vp['bodies']:
    body.set_alpha(0.9)
ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
       ylim=(0, 8), yticks=np.arange(1, 8))

plt.show()  # this is the call that triggers clearml to capture the figure
```
FrothyShark37 what was different in your script ?
Thanks FrothyShark37
I just verified, this would work as well. I suspect what was missing is the plt.show call, which is the actual call that triggers clearml
Can you post here the actual line? seems like we can fix it to also support this scenario (if we could test it)
Fixed in pip install clearml==1.8.1rc0
🙂
can I add user properties to a scheduler configuration?
please expand, what do you mean by a user property and how would one use it?
Hi DrabCockroach54
This seems like a pip issue when trying to install from source. Try upgrading pip (pip install -U pip) before installing numpy, it should solve it 🤞
Hi FierceHamster54
Dataset downloads are already multi-threaded
But yes, get_local_copy() is thread / process safe
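e.g. this kind of concurrent fetch should be fine (a sketch; the dataset IDs are placeholders):
```
from concurrent.futures import ThreadPoolExecutor
from clearml import Dataset

dataset_ids = ['<dataset-id-1>', '<dataset-id-2>']  # placeholders

def fetch(dataset_id):
    # returns the path of a (cached) local copy of the dataset
    return Dataset.get(dataset_id=dataset_id).get_local_copy()

with ThreadPoolExecutor(max_workers=2) as pool:
    local_paths = list(pool.map(fetch, dataset_ids))
```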
Thanks GorgeousMole24
That is a very good point! passing to product guys
What is the Model url? print(model.url)
You could change infrastructure or hosting, and now your data is associated with the wrong URL
Yeah that makes sense, so have it on a specific dns name? (this is usually the case with k8s deployments)
SpicyCrab51 you can change the task to completed, it is just a state change, nothing will actually change other than the status. Task.get_task(pass_dataset_id_here).mark_completed()
Hi SpicyCrab51 ,
Hmm, how exactly is the Dataset opened?
If the Dataset object is alive for 30h it will keep the dataset alive. Why isn't it being closed?
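For reference, the typical flow finalizes (closes) the dataset as soon as the upload is done (a minimal sketch; names and path are placeholders):
```
from clearml import Dataset

ds = Dataset.create(dataset_name='my dataset', dataset_project='examples')  # placeholders
ds.add_files('/path/to/files')  # placeholder path
ds.upload()
ds.finalize()  # closes the dataset, so it is no longer kept "alive"
```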
Okay, so basically set a template for the pod, specifying the docker image. Make sure you pass the correct trains-server configuration (i.e. api/web/file server addresses and credentials), and select the queue name the agent will listen to.
container image / details
https://hub.docker.com/r/allegroai/trains-agent
https://github.com/allegroai/trains-agent/tree/master/docker/agent
Full environment variable list to pass can be found here:
https://github.com/allegroai/trains-server/blob/...