(without having to execute it first on Machine C)
Someone somewhere has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code:
task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like running a single step), then continue on another machine.
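To make that concrete, a minimal sketch of how the line fits into a script (project/task names here are placeholders):
from clearml import Task

# hypothetical project/task names, replace with your own
task = Task.init(project_name='examples', task_name='remote execution demo')

# everything above runs locally; this call stops the local process and
# enqueues a copy of this task on the 'default' queue for an agent to pick up
task.execute_remotely(queue_name='default')

# anything below only runs when an agent executes the task
print('running on the agent machine')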
Notice that switching between cpu...
hmm... try to run the trains-agent from the ml environment with "system_site_packages: true", it might do the trick. Anyhow, please let me know if it worked 🙂
print(requests.get(url='
Sure SharpDove45,
from clearml import Model

# load the model by its ID and mark it as archived
model = Model('model_id_aabbcc')
model.system_tags += ['archived']
I think this is the main issue. Is this reproducible? How can we test that?
BTW: you should probably update the server, you're missing out on a lot of cool features 🙂
And your ~/clearml.conf ?
Hi SarcasticSparrow10
Is it better to post such questions on Stackoverflow so they benefit everybody?
Yes, I think you are correct, it would; please do 🙂
Try reuse_last_task_id='task_id_here' to specify the exact Task to continue (click on the ID button next to the task name in the UI).
If this value is True, it will try to continue the last task on the current machine (based on the project/name combination); if the task was executed on another machine, it will just start a ...
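A minimal sketch of passing an explicit ID (the ID and names below are placeholders):
from clearml import Task

# passing a specific ID (instead of the default True) pins the run to that exact Task
task = Task.init(project_name='examples', task_name='my experiment',
                 reuse_last_task_id='task_id_here')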
Hi MelancholyBeetle72
You mean the venv creation takes the bulk of the time, or is it something else?
TrickyRaccoon92 actually Click is on the to-do list as well ...
Hi PompousBeetle71
Try this one, let me know if it helped:
import logging
logging.getLogger('trains.frameworks').setLevel(logging.ERROR)
What do you see in the console when you start the trains-agent? It should detect the CUDA version.
How can I specify the agent to use a specific conda environment inside the docker?
Hi CrookedWalrus33
By default it will pick the highest python version in the PATH.
Then, if you have a python version (in PATH) that matches the one requested on the Task, it will use it.
Do you want to limit it to a specific python binary ?
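If so, a hedged pointer: the agent section of clearml.conf has a setting for that (the path below is just an example):
agent {
    # force the agent to use a specific python binary instead of auto-detection
    python_binary: "/usr/bin/python3.6"
}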
It might be that the file upload was broken?
SIGINT (Ctrl-C) only.
Because flushing the state (i.e. sending the requests) might take time, we only do that when users interactively hit Ctrl-C. Make sense?
ComfortableShark77 it seems clearml-serving is trying to upload data to a different server (not download the model).
I'm assuming this has to do with the CLEARML_FILES_HOST, and missing credentials. It has nothing to do with downloading the model (that as you posted, will be from the s3 bucket).
Does that make sense ?
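If it helps, a minimal sketch of pointing the SDK at the right files server via the environment variable (the URL is a placeholder):
import os

# must be set before clearml initializes; replace with your files server URL
os.environ['CLEARML_FILES_HOST'] = 'https://files.example.com:8081'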
So if I do this in my local repo, will it mess up my git state, or should I do it in a fresh directory?
It will install everything fresh into the target folder (including venv and code + uncommitted changes)
JitteryCoyote63 of course there is 🙂
Task.debug_simulate_remote_task(task_id="<task_id_here>")
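For context, a short sketch of where that call goes (the ID and names are placeholders):
from clearml import Task

# pretend we are the remote task with this ID, so the local run pulls
# that task's configuration instead of creating a new one
Task.debug_simulate_remote_task(task_id='<task_id_here>')
task = Task.init(project_name='examples', task_name='debugging')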
Hi RoughTiger69
seems to not take the packages that are in the requirements.txt
The reason for not taking the entire list of python packages is that it would most likely break when trying to run inside the agent.
The directly imported packages will essentially pull in their required packages, and thus create a stable env on the remote machine. The agent will then store the entire env, as it assumes it will be able to fully replicate it the next time it runs.
If the "Installed Packages" section is empty...
Hi JitteryCoyote63, I cannot reproduce it... when I call set_initial_iteration(0) it does what I'm expecting and resends the scalars. I tested with the clearml ignite example. Any thoughts on how I can reproduce?
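For reference, a minimal sketch of the call in question (names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='continue training')
# report scalars starting from iteration 0 again, e.g. when continuing a previous task
task.set_initial_iteration(0)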
Let me check what's the subsampling threshold
Hi PungentLouse55
It depends on the trains-server version you are running.
If the trains-server is >= 0.16, you have to add the "Args/" prefix; if you are running an older version, you should not add any prefix.
Check the examples on the github page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
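To illustrate the prefix, a hedged sketch of overriding a parameter on a cloned task before enqueuing it (the ID, names, and parameter are placeholders):
from trains import Task

# clone an existing task, override one of its argparse parameters
# (note the "Args/" prefix on trains-server >= 0.16), then enqueue it
cloned = Task.clone(source_task='task_id_here', name='cloned task')
cloned.set_parameters({'Args/learning_rate': 0.001})
Task.enqueue(cloned, queue_name='default')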
WackyRabbit7
I do 'pkill -f trains' but it's the same...
If you need to debug and test, run with --foreground and just hit Ctrl-C to end the process (it will never switch to background...). Helps?
- Yes, Task.init should be called on each subprocess (because torch forks them before they are patched)
- I think the main issue is that we patch the argparse on the subprocess (this is assuming you did not manually parse non-argv arguments)
- If you can create a mock test I think we can work around the issue, as long as the way you spin it is the standard pytorch distributed way (see the sketch below)
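A rough sketch of what "Task.init on each subprocess" could look like with torch.multiprocessing (names are placeholders, and the attach-to-main-task behavior is my understanding):
import torch.multiprocessing as mp
from clearml import Task

def worker(rank):
    # each spawned subprocess calls Task.init again; inside a subprocess this
    # should attach to the main process task rather than create a new one
    task = Task.init(project_name='examples', task_name='distributed training')
    print(f'worker {rank} reporting to task {task.id}')

if __name__ == '__main__':
    Task.init(project_name='examples', task_name='distributed training')
    mp.spawn(worker, nprocs=2)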
Notice you have in the Path:
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi
But you should have:
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/
Hi CloudySwallow27
Is there a way to still use the auto_connect but limit the amount of debug imgs?
Basically you can set the number of images it will store for you (per title/series combination). The way it works, it rotates the image names, essentially overriding old images (the UI is aware and will only show the last X of them).
See here on setting it:
https://github.com/allegroai/clearml/blob/81de18dbce08229834d9bb0676446a151046e6a7/docs/clearml.conf#L32
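For reference, the relevant excerpt from the default clearml.conf (the value shown is the default example):
sdk {
    metrics {
        # number of debug files (e.g. debug images) kept per title/series;
        # uploaded file names are recycled so older ones are overwritten
        file_history_size: 100
    }
}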
1e876021bbef49a291d66ac9a2270705
just make sure you reset it 🙂