Sorry, on the remote machine (i.e. enqueue it and let the agent run it), this will also log the print 🙂
OutrageousGrasshopper93 tensorflow-gpu is not needed; the agent will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment runs inside the docker)
How can I set the base Python version for the newly created conda env?
You mean inside the docker?
CharmingStarfish14 can you check something from code, just to see if this would solve the issue?
The issue moving forward is that if we restart the pod we will have to manually update that again.
Can't you map the nginx configuration file? (making the changes persistent across pods)
The release was supposed to be out this week, got delayed by some py2 support issue, anyhow the release will be almost exactly like the latest we now have on the GitHub repo (and I'm assuming it will be out just after the weekend)
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
query tasks that are both Running --> You mean status=["in_progress"]
Yes!
How do I figure out the other possible values I can use with the status parameter?
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
https://clear.ml/docs/latest/docs/references/api/definitions#taskstask
Filter only tasks that started, say, 10 min ago. Is there any parameter for that as well?
last_update or created, then use...
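For example, a rough sketch using the APIClient (the ">=<ISO timestamp>" filter syntax on datetime fields is an assumption on my side, double-check it against the tasks.get_all reference above):

from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()
# assumption: datetime fields accept ">=<ISO timestamp>" style list filters
ten_min_ago = (datetime.utcnow() - timedelta(minutes=10)).isoformat()
tasks = client.tasks.get_all(
    status=["in_progress"],
    created=[">={}".format(ten_min_ago)],
)
for t in tasks:
    print(t.id, t.name)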
Martin, if you want, feel free to add your answer on Stack Overflow so that I can mark it as a solution.
Will do 🙂 give me 5
I want to be able to install the venv on multiple servers and start the "simple" agents on each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
ExcitedFish86 Oh if this is the case:
in your clearml.conf:
agent.package_manager.type: conda
agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://git...
Could not find a version that satisfies the requirement open3d==0.15.2 .. from versions: 0.10.0.0, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0)
This points to the agent installing with a different Python version than the one you ran the original code with; I would guess Python 3.6
is the number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
Hi @<1600661428556009472:profile|HighCoyote66>
However, we need to allocate resources to ourselves manually, using an srun command or sbatch
Long story short, there is a full SLURM integration, basically you push a job into the ClearML queue and it produces a slurm job that uses the agent to setup the venv/container and run your Task, but this is only part of the enterprise version 😞
You can however do the following (notice this is ...
We should probably add (set_task_type :))
Then the dynamic gpu allocation is exactly what you need, I suggest talking to the sales ppl, I'm sure they can help. https://clear.ml/contact-us/
DrabCockroach54 that is quite cool!
Basically here is what I would do:
- Query Tasks that are both Running and do not have the system tag "development" (that means running on agents) + filter only tasks that started, say, 10 min ago
- Go over the list and see if (1) they have GPU scalars reported (meaning the GPU is accessible) and (2) the min/max/value of GPU utilization is under 5%
wdyt?
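A minimal sketch of that flow (the ":monitor:gpu" title and the utilization series names are assumptions here, check what your running Tasks actually report):

from clearml import Task

# (1) running tasks that do not have the "development" system tag
task_ids = Task.query_tasks(
    task_filter={
        "status": ["in_progress"],
        "system_tags": ["-development"],
    }
)

# (2) check the reported GPU utilization scalars on each task
for task_id in task_ids:
    task = Task.get_task(task_id=task_id)
    scalars = task.get_last_scalar_metrics()  # {title: {series: {"min", "max", "last"}}}
    gpu = scalars.get(":monitor:gpu", {})
    for series, values in gpu.items():
        if "utilization" in series and values.get("max", 0) < 5:
            print(task_id, series, "under 5% utilization")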
GreasyPenguin14 the demo-server is soon to be deprecated, so we are slow on upgrades there. But you can already see it in the SaaS free tier.
https://app.community.clear.ml/
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.projects.update(project="project-id-here", system_tags=[])
Hi SpotlessFish46 ,
Is the artifact already in S3?
Is S3 configured as the default files_server in trains.conf?
You can always use the StorageManager to upload to wherever and register the URL on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
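For example, a rough sketch of that second option (the bucket path and file name are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact to s3")

# point artifact uploads at S3 instead of the default files server
task.output_uri = "s3://my-bucket/artifacts"  # placeholder bucket

# upload as usual; the file lands in S3 and its URL is registered on the Task
task.upload_artifact(name="results", artifact_object="results.csv")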
What would be the best match for you?
FlutteringWorm14 an RC is out (1.7.3dc1) with the ability to configure from clearml.conf
you can now set sdk.development.worker.report_event_flush_threshold from clearml.conf
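Something like this in clearml.conf should do it (the nesting follows the setting name; the threshold value here is just an example):

sdk {
  development {
    worker {
      # flush reported events once this many are queued
      report_event_flush_threshold: 100
    }
  }
}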
Actually this is the default for any multi-node training framework, torch DDP / OpenMPI etc.
Could you please add it, I really do not want to miss it 🙂
I have to commit the YAML with my AWS credentials to git.
CleanPigeon16 please do not 🙂
either put them on the Task itself, or as OS env on the machine/agent running the Task.
Regarding where it is stored (I think the default is the DevOps project, need to look at the code)
YummyWhale40 no idea what the pytorch-lightning guys did there, let me check the actual code.
Oh right, I missed the fact the helper functions are also decorated, yes it makes sense we add the tags as well.
Regarding nested pipelines, I think my main question is: are they independent, or are we generating everything from the same code base?
I found something btw, let me check...