GentleSwallow91 how come it does not already find the correct pytorch version inside the docker ? what's the clearml-agent version you are using ?
JitteryCoyote63
So there will be no concurrent cached files access in the cache dir?
No concurrent creation of the same entry 🙂 It is optimized...
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
Hi @<1534706830800850944:profile|ZealousCoyote89>
We'd like to have pipeline A trigger pipeline B
Basically a Pipeline is a Task (of a specific Type), so you can have pipeline A function clone/enqueue the pipelineB Task, and wait until it is done. wdyt?
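A minimal sketch of that flow, assuming the clearml SDK and made-up project/task/queue names:
```
# Hedged sketch: from inside pipeline A, clone pipeline B's Task, enqueue it,
# and wait until it reaches a final state. All names are placeholders.
from clearml import Task

template = Task.get_task(project_name="pipelines", task_name="pipeline B")
pipeline_b = Task.clone(source_task=template, name="pipeline B (triggered by A)")
Task.enqueue(pipeline_b, queue_name="services")
pipeline_b.wait_for_status()  # blocks until completed (raises if the Task failed)
```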
And when exactly are you getting the "user aborted" message?
How do you start the process (are you manually running it, or is it an agent, or maybe pycharm?)
Can you provide the full log ?
ThickDove42 sorry, it took some time 🙂
```
import json
from trains.backend_api.session.client import APIClient

client = APIClient()
events = client.events.get_task_plots(task='task_id_here')
table = json.loads(events.plots[0]['plot_str'])
print('column order', table['data'][0]['cells']['values'])
```
Not the most comfortable way, but at least it is there
trains-agent should be deployed to GPU instances, not the trains-server.
The trains-agent purpose is for you to be able to send jobs to a GPU (at least in most cases) instance.
The "trains-server" is a control plane , basically telling the agent what to run (by storing the execution queue and tasks). Make sense ?
It's just the print (__repr__) not showing the data:
```
for w in client.workers.get_all():
    print(w.data)
```
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console ?
```
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
```
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
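If you suspect a stuck backlog, one hedged way to check is to force a flush from inside the running process:
```
# Sketch, assuming a Task is currently running in this process:
# push any buffered metric reports to the server right away.
from clearml import Task

task = Task.current_task()
task.flush(wait_for_uploads=True)  # also waits for pending uploads to finish
```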
Hi GiddyTurkey39
Are you referring to an already executed Task or the current running one?
(Also, what is the use case here? Is it because the "installed packages" are inaccurate?)
Hi @<1523701523954012160:profile|ShallowCormorant89>
This means the system did not detect any "iteration" reporting (think scalars) and it needs a time-series axis for the monitoring, so it just uses seconds from start
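For reference, a minimal sketch of explicit iteration-based reporting (project/task names are made up) that gives the server a proper x-axis:
```
# Report scalars with an explicit iteration number so the monitoring has a
# time-series axis instead of falling back to seconds-from-start.
from clearml import Task

task = Task.init(project_name="examples", task_name="iteration reporting")
logger = task.get_logger()
for i in range(100):
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
```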
Hi FunnyTurkey96
what's the clearml server you are using ?
Hmm. What's the Hydra version you have?
The agent does not auto-refresh the configuration; after a conf file change you should restart the agent, and it will pick up the new configuration when loading.
@<1523704157695905792:profile|VivaciousBadger56> regarding: None
Is this a discussion or PR ?
(general ranting is saved for our slack channel 🙂 )
NastyOtter17
Usually the first report will happen after 30 seconds, could that be the difference ?
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
and the clearml server version ?
But PyTorch has no specific backend, it uses TB (TensorBoard).
No?! Can you point me to an example? What I mostly find is how to calculate metrics, not a standard way to then store them...
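To make the TB point concrete, a hedged sketch: plain TensorBoard logging from PyTorch, which ClearML captures automatically once a Task is initialized (names are placeholders):
```
# Standard SummaryWriter scalars; ClearML auto-logs these after Task.init().
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb scalars")
writer = SummaryWriter()
for step in range(10):
    writer.add_scalar("loss/train", 1.0 / (step + 1), global_step=step)
writer.close()
```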
Hm GiganticTurtle0 let me quickly check it
The task pod (experiment) started reaching out to an IP associated with malicious activity. The IP was associated with 1000+ domain names. The activity was identified in AWS guard duty with a high severity level.
BoredHedgehog47 What is the pod container itself ?
EDIT:
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
https://hub.docker.com/layers/library/ubuntu/18.04/images/sha256-d5c260797a173fe5852953656a15a9e58ba14c5306c175305b3a05e0303416db?context=explore
And your ~/clearml.conf ?
GiddyTurkey39 I think I need some more details, what exactly is the scenario here?
Hi AverageBee39
What's the clearml-server and clearml package you are using ?
(It looks like some capability that is missing from the server, i.e. it needs an upgrade ?!)
Oh that is odd... let me check something