
It's kind of random: sometimes it works and sometimes it doesn't
Let's see if this is really the issue
I suspect it has something to do with remote vs. local execution of pipelines, because we switch between the two, so sometimes the pipeline task itself executes on the client, and sometimes on the host (where the agent also runs)
Okay, so in the first part of the code we define a callback that we add to our steps, so that later we can collect them and attach the results to the pipeline task. It looks something like this:
class MedianPredictionCollector:
    _tasks_to_collect = list()

    @classmethod
    def collect_description_tables(cls, pipeline: clearml.PipelineController, node: clearml.PipelineController.Node):
        # Collect tasks
        cls._tasks_to_collect.append(node.executed)

    @classmethod...
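A minimal sketch of the collector pattern described above, written without ClearML so it runs anywhere. The class name `StepResultCollector`, the method names, and the `SimpleNamespace` stand-in for a pipeline node are all hypothetical. Only the idea of appending `node.executed` to a class-level list comes from the snippet above.

```python
from types import SimpleNamespace


class StepResultCollector:
    """Hypothetical stand-in for the collector above; no ClearML import needed."""

    # Class-level list: shared by every caller inside ONE Python process only.
    _tasks_to_collect = []

    @classmethod
    def on_step_completed(cls, pipeline, node):
        # Mirrors the callback above: remember the executed task id of the step.
        cls._tasks_to_collect.append(node.executed)

    @classmethod
    def collected(cls):
        # Return a copy so callers cannot mutate the shared list by accident.
        return list(cls._tasks_to_collect)


# Simulate two finished steps; the ids are made up for illustration.
for task_id in ("task-a", "task-b"):
    StepResultCollector.on_step_completed(
        pipeline=None, node=SimpleNamespace(executed=task_id)
    )

collected_ids = StepResultCollector.collected()
print(collected_ids)
```

Since the list lives in process memory, a process that never ran the callbacks would see it empty. That would be one thing to check when the behaviour differs between runs, though it is only a guess from this sketch.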
Maybe even a dedicated argument specifically for apt-get packages, since it is very common to need stuff like that
Hi guys, just updated the issue - seems like the new release did fix the color scale, but I notice some data points are missing (the plot is missing data!)
see my comment on the issue
https://github.com/allegroai/clearml/issues/373#issuecomment-894756446
AgitatedDove14 is the scale a part of the problem? Because not only are the colors wrong, the scale does not appear at all
SuccessfulKoala55 AppetizingMouse58
[ec2-user@ip-10-0-0-95 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  880K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1  8.0G  6.5G  1.5G  82% /
tmpfs           790M     0  790M   0% /run/user/1000
AgitatedDove14 just so you know, this is a severe problem that occurs from time to time and we can't explain why it happens... As a reminder, we are using a pipeline controller task, which at the end of the last execution gathers artifacts from all the children tasks and uploads a new artifact to the pipeline's task object. What happens then is that Task.current_task() returns None for the pipeline's task...
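A possible defensive guard for the `None` case, as a rough sketch only. It does not explain why the lookup fails; it just falls back to re-fetching the task by a previously saved id. The helper `resolve_task` and its parameters are hypothetical, and the callables are stand-ins so the snippet runs without ClearML. In real use one might pass `Task.current_task` and a wrapper around `Task.get_task(task_id=...)`.

```python
def resolve_task(get_current, get_by_id, known_task_id):
    """Return the current task, or re-fetch it by id if the lookup gives None."""
    task = get_current()
    if task is not None:
        return task
    if known_task_id is None:
        raise RuntimeError("no current task and no task id to fall back on")
    return get_by_id(known_task_id)


# Stand-ins for illustration only: a dict plays the role of the task backend.
fake_store = {"abc123": {"id": "abc123", "name": "pipeline"}}

resolved = resolve_task(
    get_current=lambda: None,          # simulates the failure described above
    get_by_id=fake_store.__getitem__,  # simulates fetching a task by its id
    known_task_id="abc123",
)
print(resolved["name"])
```

Saving the pipeline task id early, while the task object is still known to be valid, is what makes the fallback possible.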
I'm using ip address show
The Trains docs never mention what I should do on the AWS interface... So I'm not sure at what point I should encounter this wizard
I'm going to play with it a bit and see if I can figure out how to make it work
but remember, it also didn't work with the default one (nvidia/cuda)
TimelyPenguin76 if I build a custom image, do I have to host it on Docker Hub for it to run on the agent? If not, how do I make the agent aware of my custom image?
SuccessfulKoala55 The simplest thing I can think of is for Task.execute_remotely to be able to append to the docker_init_bash_script
So could you re-explain, assuming my pipeline object is created by pipeline = PipelineController(...)?
DangerousDragonfly8 but would this work if they are not concurrent but sequential?
Yep what 😄
and the machine I have is 10.2.
I also tried nvidia/cuda:10.2-base-ubuntu18.04 which is the latest
TimelyPenguin76, this can safely be set to s3:// right?
I assume trains passes it as is, so I think the quoting I mentioned might work