
SmarmySeaurchin8 just so that I don't miss anything.
One machine, two trains-agents each one connected to a different trains-server, correct ?
from the trains-agent --help
trains-agent --config-file /home/user/my_trains_server1.conf daemon
trains-agent --config-file /home/user/my_trains_server2.conf daemon
yey, notice that when executed by the agent, the call to execute_remotely is skipped, and so is the if statement I added (since running_locally will return False when the process is executed by the agent)
that does make more sense
JitteryCoyote63 Hmmm in theory, yes.
In practice you need to change this line:
https://github.com/allegroai/clearml/blob/fbbae0b8bc933fbbb9811faeabb9b6d9a0ea8d97/clearml/automation/aws_auto_scaler.py#L78
` python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 0 --detached
python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 1 --detached
python -m clearml_agent --config-file '/root/clearml.conf' d...
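The per-GPU agent commands above all follow one pattern, so they could be generated instead of written out by hand. A minimal sketch, assuming a hypothetical helper named build_agent_commands: the flags are copied from the commands quoted in this thread, while the queue name and the GPU count are placeholders.

```python
import shlex


def build_agent_commands(queue, num_gpus, config_file="/root/clearml.conf", docker=""):
    """Return one clearml-agent daemon command per GPU index.

    Hypothetical helper: the flags mirror the commands quoted in this thread.
    `docker` is whatever fills the {docker} slot there (empty string for none).
    """
    commands = []
    for gpu in range(num_gpus):
        parts = [
            "python -m clearml_agent",
            f"--config-file {shlex.quote(config_file)}",
            "daemon",
            f"--queue {shlex.quote(queue)}",
            docker,
            f"--gpus {gpu}",
            "--detached",
        ]
        # Drop the empty docker slot so the command has no double spaces.
        commands.append(" ".join(p for p in parts if p))
    return commands


if __name__ == "__main__":
    for cmd in build_agent_commands("default", num_gpus=2):
        print(cmd)
```

Each returned string starts one agent pinned to a single GPU, so a machine with N GPUs ends up with N agents pulling from the same queue.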
No worries, just wanted to make sure it doesn't slip away
You could change infrastructure or hosting, and now your data is associated with the wrong URL
Yeah that makes sense, so have it on a specific dns name? (this is usually the case with k8s deployments)
As far as I understand, clearml tracks each library called from the scripts and saves the list of these libraries somewhere (I assume this list is saved as a requirements.txt file somewhere, which is later loaded into the venv when the pipeline is running).
Correct
Can I edit this file (just to comment out the row with "object-detection==0.1")?
BTW, regarding the object-detection library. My training scripts have calls like:
Yes, in the UI you can right-click on the Task and select "reset", then it...
I would like to put a table with url links and image thumbnails.
StraightParrot3 links will work inside the table (your code sample looks like the correct way to add them), but I think plotly (which is the UI package that displays the table) does not support embedding images into tables
When they add it, the support will be transparent and it would work as you expect
Hi GrievingTurkey78
I think the main issue is the lack of support for jsonargparse, is that correct?
(vanilla pytorch lightning is using argparse, which seems to work out of the box)
Hmm I think you are correct:
:param auto_create: Create new dataset if it does not exist yet
it should have created it, this seems like a bug, I'll make sure to pass it along
That means I need to pass a single zip file to the path argument in add_files, right?
Actually the opposite: you pass a folder (of files) to add_files. add_files then remembers the files' location (and pre-calculates the hash of the files' content). When you call upload, it will actually compress the files that changed into a zip file (or several files, depending on the chunk size), and upload them to the destination (as specified in the upload call...
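To make the "hash first, zip only what changed" idea concrete, here is a standard-library sketch. It is not ClearML's implementation: hash_folder and zip_changed_files are hypothetical names, SHA-256 is an assumed choice of hash, and splitting the archive by chunk size is left out.

```python
import hashlib
import zipfile
from pathlib import Path


def hash_folder(folder):
    """Map each file path (relative to `folder`) to the SHA-256 of its content."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def zip_changed_files(folder, previous_hashes, archive_path):
    """Zip only the files that are new or whose hash differs from `previous_hashes`.

    Returns the current hashes and the sorted list of archived file names.
    Keep `archive_path` outside `folder` so the archive itself is not hashed later.
    """
    folder = Path(folder)
    current = hash_folder(folder)
    changed = sorted(
        name for name, digest in current.items()
        if previous_hashes.get(name) != digest
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in changed:
            zf.write(folder / name, arcname=name)
    return current, changed
```

Passing an empty dict as previous_hashes archives everything. Passing the hashes returned by an earlier call archives only the files edited or added since then, which is why a folder, rather than a ready-made zip, is the natural input.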
I can't find out how to pass my custom clearml.conf
Hi TeenyElk27
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
Do we have it on the git issue ?
Hi ItchyCow80
Could you add some prints? Is it working without the Task.init call? The code looks okay, and the "No repository found" message basically says it logs it as a standalone script (which makes sense)
How does ClearML select reference branch? Could it be that ClearML only checks "origin" branch?
Yes, I think we can quickly fix that, I'm just trying to figure out if there are downsides to running "git ls-remote --get-url" without origin
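For anyone who wants to compare the two variants locally, a small sketch follows. get_url_command and remote_url are hypothetical helper names, and the only git invocation used is the "git ls-remote --get-url" command from the message above, with or without an explicit remote name.

```python
import subprocess


def get_url_command(remote=None):
    """Build the `git ls-remote --get-url` command, optionally for a named remote."""
    cmd = ["git", "ls-remote", "--get-url"]
    if remote:
        cmd.append(remote)
    return cmd


def remote_url(repo_dir, remote=None):
    """Run the command inside `repo_dir` and return whatever git prints."""
    result = subprocess.run(
        get_url_command(remote),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling remote_url(path) and remote_url(path, "origin") on a repository whose tracking branch points at a differently named remote should show whether dropping "origin" changes the reported URL.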
All the 3 steps can be found here:
https://github.com/allegroai/trains/tree/master/examples/pipeline
MotionlessCoral18 I think there is a fix in the latest clearml-agent RC 1.4.0rc0, can you test and update if you are still having this issue?
OddAlligator72 let's separate the two issues:
Continue reporting from a previous iteration
Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent) ?
Any chance you can PR a fix to the docs?
Hi AttractiveCockroach17
Many of these experiments appear with status running on clearml even though they have finished running,
Could it be that their process just terminated (i.e. was not properly shut down)?
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for a while (I think the default is 2 hours), it will automatically mark the Task as aborted
Now that we have the free tier (a.k.a community server) we might change the default behavior.
The idea is always to allow an easy way to on-board and test the system.
ReassuredTiger98
BTW: what's the scenario where your machine reverted to the default configuration (i.e. no configuration file) ?
SubstantialElk6
Regarding cloning the executed Task:
In the pip requirements syntax, "@" is a hint that tells pip where to find the package if it is not preinstalled.
Usually when you find "@ /tmp/folder", it means the package was preinstalled (usually preinstalled in the docker).
What is the exact scenario that caused it to appear? (This was always the case, before v1 as well.)
For example, the zipp package is installed from PyPI by default and not from a local temp file.
Your fix b...
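To spot which requirement lines carry an "@" location hint, something like the sketch below may help. It assumes the spaced "name @ location" form (the one pip freeze writes) and does not handle comments or environment markers. split_direct_reference and is_local_reference are hypothetical names, and the package name "somepkg" in the example is made up.

```python
def split_direct_reference(requirement_line):
    """Split 'name @ location' into (name, location).

    Returns (line, None) when the line has no ' @ ' hint,
    e.g. a plain 'zipp==3.4.0' pin.
    """
    line = requirement_line.strip()
    name, sep, location = line.partition(" @ ")
    if not sep:
        return line, None
    return name.strip(), location.strip()


def is_local_reference(location):
    """True when the hint points at the local filesystem (assumed forms only)."""
    return location is not None and (
        location.startswith("file://") or location.startswith("/")
    )


if __name__ == "__main__":
    for line in ["zipp==3.4.0", "somepkg @ file:///tmp/build/somepkg"]:
        name, location = split_direct_reference(line)
        print(name, location, is_local_reference(location))
```

Lines flagged as local are the ones that will not resolve on a machine that lacks the same temp folder, so those are the entries worth reviewing before cloning the Task.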
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Thanks OutrageousGrasshopper93
I will test it "!".
By the way the "!" is in the project or the Task name?
ElegantCoyote26 could you upgrade the docker-compose ?