
No worries 🙂 glad it worked
IrateBee40 I think I have an idea what's wrong, https
Could it be there is some firewall in the middle intercepting the network, and without installing its SSL certificate the SSL connection is failing?
For visibility: after close inspection of the API calls, it turns out there was no work against the SaaS server, hence no data.
I can't think of any actual difference in flow ...
Can you try the following?
task._setup_reporter()
task.set_initial_iteration(0)
Hi ThickDove42,
Yes, but by the time you are able to access it, it will be in display form (plotly), not very convenient.
If this is something you need to re-use, I would argue that it is an artifact and should be stored as an artifact (then accessing it is transparent). Obviously you can both report it as a table and upload it as an artifact, no harm in that.
what do you think?
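For illustration, a minimal sketch of doing both (the project/task names and the DataFrame are placeholders, assuming clearml and pandas are installed):

import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="table demo")
df = pd.DataFrame({"metric": ["accuracy", "loss"], "value": [0.91, 0.23]})

# report the table so it shows up in the UI (rendered as a plotly table)
task.get_logger().report_table(title="results", series="run", iteration=0, table_plot=df)
# also upload it as an artifact, so it can later be fetched back as a DataFrame
task.upload_artifact(name="results", artifact_object=df)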
Is the team open to PRs from external people?
Yes, please do! PRs are welcome! I thought we fixed the GitHub readme to reflect it; anyhow I'll make sure we do 🙂
The difference is that running the agent in daemon mode means the "daemon" itself is a job in SLURM.
What I was saying is pulling jobs from the clearml queue and then pushing them as individual SLURM jobs, does that make sense ?
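To make that concrete, here is a rough sketch of the glue (not an official integration): it assumes you already have the ID of an enqueued Task, and the sbatch resource flags are placeholders you would adapt:

import subprocess
import tempfile
import textwrap

def submit_task_as_slurm_job(task_id: str) -> None:
    # generate a one-off sbatch script that runs the experiment through the agent
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=clearml_{task_id}
        #SBATCH --cpus-per-task=4
        clearml-agent execute --id {task_id}
    """)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        script_path = f.name
    # each ClearML task becomes its own SLURM job
    subprocess.run(["sbatch", script_path], check=True)

submit_task_as_slurm_job("aabb11")  # hypothetical task ID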
Okay, good news: there is a fix. Bad news: sync to GitHub will only be tomorrow.
Should I only do MongoDB?
No, you should do all 3 DBs: ELK, Mongo, Redis
PompousParrot44, so you mean like a base conda env?
Configuring trains-agent to use conda is done here:
https://github.com/allegroai/trains-agent/blob/699d13bbb34649c7e5337b4187cda59b7fa6fd33/docs/trains.conf#L44
Then for every experiment trains-agent will create a new conda environment based on the requirements of that experiment.
You can tell it to inherit the base conda env (or the one it is running from, I think) by setting system_site_packages: true
https://github.com/allegroai/tr...
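For reference, a minimal excerpt of what that looks like in trains.conf (assuming the default layout of the agent section):

agent {
    package_manager {
        # use conda to build the environment for each experiment
        type: conda,
        # inherit packages from the base / currently active environment
        system_site_packages: true,
    }
}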
hmm... try to run the trains-agent from the ml environment with "system_site_packages: true", it might do the trick. Anyhow, please let me know if it worked 🙂
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstall packages you already installed, but it will give you the option of just replacing a specific package (or installing a new one) without reinstalling the entire venv.
You can however pass a specific Task ID and it will reuse it: reuse_last_task_id="aabb11". Would that help?
Hmm, I'm sorry, it might be "continue_last_task". Can you try:
Task.init(..., continue_last_task="aabb11")
Hi JealousParrot68
I'll try to shed some light on these modules and use cases.
StorageManager is, generally speaking, a low-level utility for accessing http/object-storage/files. In most cases there is no need to use it directly if objects are already stored/managed on clearml (for example artifacts/models/datasets). But it is quite handy to use with your own S3 buckets etc.
Artifacts: Passing an artifact between Tasks will usually be something like:
artifact_object = Task.get_task('task_id').artifa...
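To complete the picture, a minimal sketch of that pattern, assuming the producing task uploaded an artifact named 'results' (both the task ID and the artifact name here are placeholders):

from clearml import Task

# fetch the producing task by its ID
producer = Task.get_task(task_id='task_id_here')
# download the artifact and deserialize it back into a Python object
artifact_object = producer.artifacts['results'].get()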
I find it quite difficult to explain these ideas succinctly, did I make any sense to you?
Yep, I think we are totally on the same wavelength 🙂
However, it also seems to be not too prescriptive,
One last question, what do you mean by that?
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like Task.get_task("task_id_here").get_base_docker() ?
Oh, a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
none of my pipeline tasks are reporting these graphs, regardless of runtime. I guess this line would also fix that?
Same issue. That said, good point, maybe with pipelines we should somehow make that a default?
Oh, yes, that might be (threshold is 3 minutes if no reports), but you can change that:
task.set_resource_monitor_iteration_timeout(seconds_from_start=10)
ElegantCoyote26 could be, if the Task run is under 30sec?!
Have a grid view (e.g. 3 plots per line instead of just one)
Yes, the plots are resizable: move the cursor to the separating line and drag 🙂
2. Check the group by section, they can be split per series (like in TB)
Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)
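In context, a minimal sketch of the workaround (project/dataset names are placeholders, and it assumes the script already calls Task.init):

from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="dataset parent demo")
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
# workaround: explicitly set the current task as the parent of the dataset's backing task
dataset._task.set_parent(Task.current_task().id)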
But I do not have anything linked correctly, since I rely on conda installing cuda/cudnn for me
From the log it installed:
cudatoolkit==11.1.1
based on the CUDA it found on the host machine: agent.cuda_version = 110
But for some reason it installed pytorch from the conda "pytorch" channel without CUDA support.
For .git-credentials, remove the git_pass/git_user from the clearml.conf
If you want to use ssh you need to also add:
force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/a2db1f5ab5cbf178840da736afdc370cfff43f0f/docs/clearml.conf#L25
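Putting it together, a minimal excerpt of the relevant clearml.conf entries (assuming the default agent section layout):

agent {
    # leave these empty so the agent falls back to ~/.git-credentials
    git_user: ""
    git_pass: ""
    # clone repositories over SSH instead of HTTPS
    force_git_ssh_protocol: true
}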
Hi UnsightlySeagull42
Do you mean how to pass user/pass (user/token) to the clearml-agent so it can clone your repository?
https://github.com/allegroai/clearml-agent/blob/a2db1f5ab5cbf178840da736afdc370cfff43f0f/docs/clearml.conf#L18
Try adding this environment variable:
export TRAINS_CUDA_VERSION=0
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
Probably less secure though :)
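For reference, the entry referred to above is agent.package_manager.extra_index_url; a minimal sketch (the Artifactory URL is a hypothetical placeholder):

agent {
    package_manager {
        # extra PyPI-compatible indexes pip will search when the agent installs requirements
        extra_index_url: ["https://artifactory.example.com/artifactory/api/pypi/my-pypi/simple"]
    }
}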