agree, but setting the agent's env variable TMPDIR
I think this needs to be passed to the docker with -e TMPDIR=/new/tmp
as additional container args:
see example
wdyt?
oh sorry my bad, then you probably need to define all the OS environment variables for the Python temp folder for the agent (the Task process itself is a child process, so it will inherit them)
TMPDIR=/new/tmp TMP=/new/tmp TEMP=/new/tmp clearml-agent daemon ...
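To see why this works, here is a minimal sketch of the inheritance mechanism (the directory name is just an example): Python's tempfile module reads these variables, and any child process spawned with the agent's environment resolves the same location.

```python
import os
import tempfile

# Point the standard temp-dir variables at a custom location
new_tmp = os.path.join(os.getcwd(), "new_tmp")
os.makedirs(new_tmp, exist_ok=True)
for var in ("TMPDIR", "TMP", "TEMP"):
    os.environ[var] = new_tmp

# tempfile caches its choice on first use; reset so the env vars are re-read
tempfile.tempdir = None
print(tempfile.gettempdir())

# Any child process (like the Task process the agent spawns) inherits
# os.environ, so it resolves the same temp directory.
```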
where people can do @'s for experiments/projects/tasks and even comparisons ...
ohhh I like that! For me this immediately suggests a Slack integration.
I think my main question is, "is the discussion ephemeral?" In other words, is this an ongoing discussion that later no one will care about, or are we creating some "knowledge base" that we want to later share?
Also, by "address bar at the top", I assume you mean the address URL, right?
yes... apologies for the phrasing, it was w...
MysteriousBee56 I see...
So yes, you can: with the APIClient you have full RESTful access to the backend.
I think there was a similar discussion https://allegroai-trains.slack.com/archives/CTK20V944/p1593524144116300
HandsomeCrow5 how did you end up solving it? I think you had a similar use case?!
MysteriousBee56 I would do Task.create()
you can get the full Task internal representation with task.data
Then call task._edit(script={'repo': ...}) to edit/update all the Task entries.
You can check the full details of the task object here: https://github.com/allegroai/trains/blob/master/trains/backend_api/services/v2_8/tasks.py#L954
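Putting those pieces together, a minimal hedged sketch (the project/task names and repo URL are placeholders, _edit is a private API, and the exact script keys should be double-checked against the tasks.py link above):

```python
def create_task_with_repo(repo_url, branch="master"):
    """Sketch: create a bare Task, inspect it, and edit its script section."""
    from clearml import Task  # assumes a configured clearml setup

    task = Task.create(project_name="examples", task_name="scripted-task")
    print(task.data)  # full internal representation of the Task
    # Edit/update the Task's script entries (field names follow the
    # Script object in tasks.py linked above)
    task._edit(script={"repository": repo_url, "branch": branch})
    return task
```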
BTW: when you have a sample script working, consider PR-ing it, I'm sure it will be useful for others 🙂 (also a great way to get us involved with debuggin...
Hi JealousParrot68
do tasks that are created through create_function_task run the entry_script again instead of just the pure function
Basically they will run the code until the "create_function_task" call, but never after. We are working on adding a decorator to a function, making it a "standalone" script, is this what you actually need ?
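For reference, a minimal hedged sketch of the current behavior (names are illustrative, and this assumes a configured clearml setup):

```python
def launch_step_as_task():
    """Sketch: the script runs up to create_function_task; the wrapped
    function is then executed as its own standalone Task."""
    from clearml import Task

    def step(a, b):
        return a + b

    task = Task.init(project_name="examples", task_name="controller")
    # kwargs after task_name are forwarded to the wrapped function
    return task.create_function_task(step, func_name="step",
                                     task_name="step as task", a=1, b=2)
```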
just want to be very precise and concise about them
Always appreciated 🙂
It will automatically switch to docker mode
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
@<1734020162731905024:profile|RattyBluewhale45> could you attach the full Task log? Also what do you have under "installed packages" in the original manual execution that works for you?
because it should have detected it...
Did you see "Repository and package analysis timed out ..."?
Hi @<1590152178218045440:profile|HarebrainedToad56>
Yes, you are correct, all TB logs are stored in the ELK in the clearml backend. This really scales well and rarely has issues, as long of course as the clearml-server is running on a strong enough machine. How much RAM / disk do you have on the clearml-server?
How much free RAM / disk do you have there now? How's the CPU utilization? How many Tasks are working with this machine at the same time?
Hi FreshKangaroo33
clearml.conf is in HOCON format; to parse it you can use pyhocon:
https://github.com/chimpler/pyhocon
Or the built-in version bundled with clearml:
from clearml.utilities.pyhocon import ConfigFactory
config_dict = ConfigFactory.parse_string(text).as_plain_ordered_dict()
You can also just get the parsed object:
from clearml.config import config_obj
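A small hedged sketch tying this together (assumes clearml is installed; the standalone pyhocon package exposes the same ConfigFactory API):

```python
def parse_hocon(text):
    """Sketch: parse HOCON text (e.g. the contents of clearml.conf) into
    a plain ordered dict, using the pyhocon bundled with clearml."""
    from clearml.utilities.pyhocon import ConfigFactory
    return ConfigFactory.parse_string(text).as_plain_ordered_dict()

# e.g. parse_hocon(open(os.path.expanduser("~/clearml.conf")).read())
```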
Is it possible to add a checkbox in the profile settings which would set the maximum limit for comparison?
This feature is becoming more and more relevant.
So we are working on a better UI for it, so that this is not limited (it's actually the UI that is the limit here)
specifically you can add custom columns to the experiment table (like accuracy loss etc), and sort based on those (multiple values are also supported, just hold the Shift-Key). This way you can quickly explore ...
I also found that you should have a deterministic ordering before you apply a fixed seed
Not sure I follow?
Hi @<1523706645840924672:profile|VirtuousFish83>
could it be you have some permission issues?
: Forbidden: updates to statefulset spec for fields other than 'replicas',
It might be that you will need to take it down and restart it, rather than update it while it is running.
(do make sure you backup your server 🙂 )
But my previous ques and other query are still not figured out.
What do you mean by "previous ques and other query" ?
Notice the error:
Cannot install albucore==0.0.13 and numpy==1.23.5 because these package versions have conflicting dependencies
What is the pip version you have configured in clearml.conf? Also, can you provide the full Task log (i.e. click on Download in the web UI console tab)?
Hi JealousParrot68
This is the same as:
https://clearml.slack.com/archives/CTK20V944/p1627819701055200
and,
https://github.com/allegroai/clearml/issues/411
There is something odd happening in the files-server, as it replaces the header (i.e. guessing the content of the stream) and this breaks the download (what happens is the clients automatically ungzip the csv).
We are working on a hot fix to the issue (BTW: if you are using object-storage / shared folders, this will not happen)
Hi SubstantialElk6 I'll start at the end, you can run your code directly on the remote GPU machine 🙂
See the clearml-task documentation on how to create a task from existing code and launch it:
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
That said, the idea is that you add the Task.init call when you are writing/coding the code itself; then later, when you want to run it remotely, you already have everything defined in the UI.
Make sense?
SoreDragonfly16 notice that if you abort a task in the web UI, it will do exactly what you described: print a message and quit the process. Any chance someone did that?
PompousBeetle71 If this is argparse and the type is defined, the trains-agent will pass the equivalent value in the same type; for str that amounts to ''. Make sense?
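To illustrate the point, a plain argparse sketch (independent of the agent; the argument name is made up):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--tag", type=str, default="baseline")

# The agent feeds stored values back in the declared type;
# for a str argument, an empty stored value arrives as ''
args = parser.parse_args(["--tag", ""])
print(repr(args.tag))  # ''
```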
PompousBeetle71 you can also use ModelOutput.update_weights_package to store multiple files at once (they will all be packaged into a single zip, and unpacked when you get them back via ModelInput). Would that help?
So it is the automagic that is not working.
Can you print the following before calling both Task.debug_simulate_remote_task and Task.init? Notice you have to call it before Task.init:
print(os.environ)
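A sketch of the requested ordering (the task ID and names are placeholders; assumes a configured clearml setup):

```python
def reproduce_remote_env(task_id):
    """Sketch: dump the environment before simulating remote mode and
    again before Task.init, to see what the automagic picks up."""
    import os
    from clearml import Task

    print(os.environ)                     # before debug_simulate_remote_task
    Task.debug_simulate_remote_task(task_id=task_id)
    print(os.environ)                     # before Task.init
    return Task.init(project_name="examples", task_name="env-debug")
```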
JitteryCoyote63 What did you have in mind?
Have a grid view (e.g. 3 plots per line instead of just one)
Yes, the plots are resizable: move the cursor to the separating line and drag 🙂
2. Check the group by section, they can be split per series (like in TB)
Hover near the edge of the plot, then you should get a "bar" you can click on to resize