Hmmm, what's your trains version ?
Well it should work out of the box as long as you have the full route, i.e. Section/param
Thanks SubstantialElk6 !
Happy new year 🎉 🍺 🍾 🎇
Task.init should be called before pytorch distribution is called, then on each instance you need to call Task.current_task() to get the instance (and make sure the logs are tracked).
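A minimal sketch of that ordering (the project/task names and worker count here are placeholders, not from your setup):

import torch.multiprocessing as mp
from clearml import Task


def worker(rank):
    # inside each spawned process, grab the Task created by the main process
    # so all workers report into the same experiment
    task = Task.current_task()
    task.get_logger().report_text(f"worker {rank} started")
    # ... torch.distributed.init_process_group(...) and the training loop go here ...


if __name__ == "__main__":
    # Task.init must run in the main process, before the workers are spawned
    Task.init(project_name="examples", task_name="ddp training")
    mp.spawn(worker, nprocs=2)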
SmallAnt76
see https://clear.ml/pricing/ , under "What plan should I choose?"
what you are looking for is the first column, "open-source". Make sense ?
BattyLion34 the closest I can think of is the Monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example wil...
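Roughly, a dataset-monitoring extension could look like the sketch below; it assumes the same process_task() override hook the Slack example uses (see that example for how to pick the monitored project and start the polling loop), and the action inside is just a placeholder:

from clearml.automation.monitor import Monitor


class DatasetMonitor(Monitor):
    def process_task(self, task):
        # called for each newly completed Task the monitor picks up;
        # Dataset versions are Tasks, so this is where the trigger action goes
        print(f"New completed dataset task: {task.name} ({task.id})")
        # ... e.g. enqueue a retraining pipeline, send a notification, etc.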
Hi SarcasticSparrow10 , so yes it does, this is more efficient when using pytorch loaders, and in some other situations.
To disable it add to your clearml.conf:
sdk.development.report_use_subprocess = false
2. Interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)
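In the conf file itself that is the usual nested layout (a sketch, assuming the standard clearml.conf structure):

sdk {
    development {
        # run the reporting in a thread instead of a spawned subprocess
        report_use_subprocess: false
    }
}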
Could it be there is a Task.init being called Before this code snippet ?
however can you see the inconsistency between the key and the name there:
Yes that was my point on "uniqueness" ... 😞
the model-key must be unique, and it is based on the filename itself (the context is known, it is inside the Task). The Model Name, however, is an entity, so it should have the Task Name as part of the entity name. Does that make sense ?
I want in my CI tests to reproduce a run in an agent
you mean to run it on the CI machine ?
because the env changes and some things break in agents and not locally
That should not happen, no? Maybe there is a bug that needs fixing on clearml-agent ?
It should also work with host IP and two docker compose files.
I'm not sure where to push for a unified docker compose?
So General would have created a General instead of Args?
yes,
This is a must, you have to specify the hyperparameters section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually:
task.connect(args, name='General')
Basically there is no "random_state", only "General/random_state"
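A short sketch of how that plays out (argument names/values here are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="step2 data processing")
args = {"random_state": 42, "test_size": 0.2}
# connecting under the "General" section stores the keys as
# "General/random_state" and "General/test_size"
task.connect(args, name="General")

So when overriding them (from the UI or a pipeline step's parameter_override) you reference the full "General/random_state" path.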
Make sense ?
@<1523706266315132928:profile|DefiantHippopotamus88> seems like you are missing the ports 🙂
CLEARML_WEB_HOST="..."
CLEARML_API_HOST="..."
CLEARML_FILES_HOST="..."
it is just a local copy so you can rerun and reconfigure
It reflects what is stored by Keras, so if Keras stores the best model this is what you get. BTW if you pass output_uri=True it will automatically upload the models
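For example (project/task names are placeholders):

from clearml import Task

# output_uri=True uploads the stored checkpoints/weights to the default files server
# (you can also pass an s3:// / gs:// / azure:// destination instead of True)
task = Task.init(project_name="examples", task_name="keras training", output_uri=True)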
BTW: I tested the code you previously attached, and it showed the plot in the "Plots" section
(Tested with latest trains from GitHub)
UnevenDolphin73 sounds great, any chance you can open a git issue on clearml-agent repo for this feature request ?
Hi @<1695969549783928832:profile|ObedientTurkey46>
Why do tags only show on a version level, but not on the dataset-level? (see images)
Tags of datasets are tags on "all the dataset versions" i.e. to help someone locate datasets (think locating projects as an analogy). Dataset Version tags are tags on a specific version of the dataset, helping users to locate a specific version of the dataset. Does that make sense ?
Hi JitteryCoyote63 when you run the trains-agent it tells you where it puts the logs, it's a temp auto-generated filename, usually under /tmp/ :
Running TRAINS-AGENT daemon in background mode, writing stdout/stderr to /tmp/.trains_agent_daemon_out4uahki3i.txt
I assume clearml has some period of time after which it shows this message. Am I right?
Yes you are 🙂
is this configurable?
It is 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
The base task is self-contained, i.e. it downloads the training/eval data directly and has direct access to it
I think this is the main issue, how come it does not catch it? Are you using argparser ?
Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses the cleaml-agents to launch a session on one of the machines in the cluster.
In the remote session itself it installs jupyterlab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over ssh and creates a tunnel to these services.
Yea I know, I reported this
LOL, apologies, these days it's a miracle I still remember my login passwords 😉
Hi @<1658281099807166464:profile|SmallCamel52>
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world ?
I think you cannot change it for a running process, do you want me to check for you if this can be done ?
Hi GreasyPenguin14
Quick question, any reason not to use a 2D scatter ? or a histogram (or any other non time-series plot)?
@<1533619716533260288:profile|SmallPigeon24> , failed task should not actually be reused (i.e. cached), are you saying a failed Task is being reused? or are you saying that you want to "invalidate" the cache in the execution but still leave the Task as completed ?
That is exactly that, the trains-agent is replicating the code from the git repo, and trying to apply the git diff (see uncommitted changes section). Obviously it failed 🙂
- look at immediate parents for identically-named files
....
UnevenDolphin73 are you saying this will be your way to log the diff between two versions (for increased visibility) ?
If so, how would you visualize it ?
(I really like this idea of visualizing the changeset, trying to think if there is "smart" way to create a callback to make the approach kind of best-practice) wdyt?