In a notebook, create a method and decorate it with fastai.script's @call_parse.
Any chance you have a very simple code/notebook to reference (this will really help in fixing the issue)?
Thanks MinuteGiraffe30, a fix will be pushed later today
Hi @<1546303293918023680:profile|MiniatureRobin9> could it be the pipeline logic was created via the clearml-task CLI? If this is the case, I think this is an edge case we should fix. Basically it creates a Task instead of a Pipeline, which in essence only affects the UI. To solve it, just run the pipeline locally; notice that by default, when you start it, it will actually stop the local run and relaunch itself on an agent.
Also, could you open a GitHub issue so we add a flag for it?
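For reference, a minimal sketch of running the pipeline logic as a local script so it registers as a Pipeline (the project, step, and queue names here are illustrative placeholders):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="stage_one", base_task_project="examples", base_task_name="step 1")
# by default start() stops the local run and relaunches the controller on an agent
pipe.start(queue="services")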
Hi AbruptWorm50
the second "epoch loss" is the scalar from the "validation" process (the "validation: epoch loss" series name is actually the TF file/folder prefix, added automatically)
Make sense?
AbruptWorm50 can you send the full image? (the X axis is missing from the graph)
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the exact same instances you can have them running simultaneously (think multi-node training); that said, each one should "know" not to report over the others, because of course it would overwrite the reports.
Back to your point on multiple agents:
You cannot have two Tasks in the same queue, that means that a single agen...
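A hedged sketch of the "know not to report over the others" idea, assuming a torch.distributed-style RANK environment variable (all names here are illustrative):
import os
from clearml import Task

# when launched by an agent, Task.init attaches to the existing Task via CLEARML_TASK_ID
task = Task.init(project_name="examples", task_name="multi-node-train")
rank = int(os.environ.get("RANK", "0"))
if rank == 0:  # only one node reports, so the others do not overwrite it
    task.get_logger().report_scalar("loss", "train", value=0.5, iteration=1)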
"erasing" all the packages that had been set in the base task I'm cloning from. I
Set is not add; if you are calling set_packages, you are overwriting all of them with this single call.
You can however do:
task_data = task.export_task()
# "pip" holds the full requirements as a single string
requirements = task_data["script"]["requirements"]["pip"]
requirements += "\nnew-package==1.0"  # your additional package(s), each on a new line
task.set_packages(requirements.splitlines())
I guess we should have get_requirements?!
I think it is only in get_task
(and by default it is true)
I think query task does not filter the
Yeah we should definitely have get_requirements 🙂
Hi @<1539055479878062080:profile|FranticLobster21>
hey, how do I use local files as dependencies?
You mean like a repository ?
Can I specify in the task which local files I use, so that they get packaged?
In a git repo?
Basically the agent can do two things, either replicate a single script or clone a git repo + uncommitted changes
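To illustrate the second option, a sketch of creating a Task that points at a git repo for the agent to clone (the repo URL, branch, and script name are placeholders):
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="repo-based-task",
    repo="https://github.com/your-org/your-repo.git",  # placeholder repo URL
    branch="main",
    script="train.py",  # entry point inside the repo
)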
I found the issue, on the first run it jumps over the first day (let me check if we can quickly fix that)
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if you are running with docker compose, you can pass an argument to the clearml triton container and use shared memory. You can do the same with the helm chart.
We created an account, setup our data pipeline, and now we can't get back in. Nothing is in the project. Can someone from support reach out to help?
Hi @<1545216077846286336:profile|DistraughtSquirrel81>
You mean in the SaaS? (app.clear.ml) or is it a local installation?
If this is the SaaS, could it be the data is on a different workspace ? (you can switch workspace and refresh the page)
Hi @<1547028031053238272:profile|MassiveGoldfish6>
hmm yeah you need to remove the "hidden" system_tag from the project
from clearml.backend_api.session.client import APIClient

c = APIClient()
# inspect the project first ("hidden" should appear under system_tags)
print(c.projects.get_by_id("PROJECT_ID_HERE").to_dict())
# overwrite system_tags without "hidden" (the value here is just a placeholder)
c.projects.update(project="PROJECT_ID_HERE", system_tags=["test"])
print(c.projects.get_by_id("PROJECT_ID_HERE").to_dict())
Notice you can get the project ID from the URL
`/projects/1974af8ccdac454b836c47349c4e826e/experiments/84...
@<1533620191232004096:profile|NuttyLobster9> I think we found the issue: when you pass a direct link to the python venv, the agent fails to detect the python version, and since the python version is required for fetching the correct torch, it fails to install it. This is why passing CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none worked, because it skips resolving the torch / cuda version (which requires parsing the python version)
Sure thing, anyhow we will fix this bug so next version there is no need for a workaround (but the workaround will still hold so you won't need to change anything)
Hi @<1533620191232004096:profile|NuttyLobster9>
... but no system stats ...
If the job is too short (under ~30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30-second window, but the task ends before it sends them)
does that make sense?
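A quick sketch of a workaround under that assumption (not an official API, just keeping the process alive past the ~30-second sampling window):
import time
from clearml import Task

task = Task.init(project_name="examples", task_name="short-job")
# ... a few seconds of actual work here ...
time.sleep(35)  # give the resource monitor time to sample and send machine stats
task.close()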
Hi @<1523701066867150848:profile|JitteryCoyote63>
Setting redis from version 6.2 to 6.2.11 fixed it, but I have new issues now
Was the docker tag incorrect in the docker compose ?
Hi @<1691258563357315072:profile|ColorfulKitten60>
I think we need some context for this question 🙂
Hi @<1691620877822595072:profile|FlutteringMouse14>
Do I have to use Hydra
You can, and then the entire configuration is fully captured by ClearML (automatically) while you can still override values with the manual "key.sub=value" both in the UI and in the CLI
Otherwise you can connect nested dicts with task.connect (these will be flattened with / for sub keys).
Or you can connect configuration files (task.connect_configuration) and edit them as-is in the UI (with override of...
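A small sketch of both approaches (project and file names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="config-demo")

# nested dicts are flattened with "/" for sub keys, e.g. model/lr
params = task.connect({"model": {"lr": 0.001, "layers": 4}})

# or attach a whole configuration file, editable as-is in the UI
config_path = task.connect_configuration("config.yaml", name="training config")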
I still don't get resource logging when I run in an agent.
@<1533620191232004096:profile|NuttyLobster9> there should be no difference ... are we still talking about <30 sec? or a sleep test? (no resource logging at all?)
have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?
This is odd and somewhat consistent with actu...
there is a bug wherein both Task.current_task() and Logger.current_logger() return None.
This is not a bug; it means something broke. The environment variable CLEARML_TASK_ID has to be set inside the agent's process
How are you running it? (also send the log 🙂, you can DM it so it is not public here)
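As a quick sanity check, a sketch to run inside the remotely executed script:
import os
from clearml import Task

# if the agent set things up correctly, CLEARML_TASK_ID is present
print("CLEARML_TASK_ID =", os.environ.get("CLEARML_TASK_ID"))
# and Task.current_task() returns the running task instead of None
print("current task =", Task.current_task())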
with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.
This is odd, could you send the full Task log?
Thanks @<1694157594333024256:profile|DisturbedParrot38> !
Nice catch.
Could you open a GitHub issue so that at least we output a more informative error?
Are hparams saved in the hyperparameters section superior to hparams saved in configuration objects?
well, I'm not sure about "superior", but they are structured, as opposed to a configuration object, which is as generic as can be
Can you provide some further explanation, please? Sorry, I am a beginner.
My bad, I was thinking out loud about improving the HPO process and allowing users to modify the configuration_object, not just the hyperparameters
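To make the distinction concrete, a small sketch (section and value names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="hparams-vs-config")

# hyperparameters: structured key/value pairs, individually editable and HPO-friendly
task.connect({"lr": 0.01, "batch_size": 32}, name="General")

# configuration object: a generic blob (dict or file contents), edited as one unit
task.connect_configuration({"optimizer": {"type": "adam"}}, name="model config")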
Hi TrickyRaccoon92, TB is automatically collected and converted into data stored on the system. The UI uses plotly to display the data itself (in your web browser).
You still have the original TB protobuf file if you want to dive deeper and debug the data (it is not automatically uploaded, but some users do upload it as an additional artifact on the experiment)
Make sense?
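For example, a minimal sketch of the auto-collection (assuming PyTorch's SummaryWriter; any TB writer behaves the same way):
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# once Task.init runs, anything written via TensorBoard is captured by ClearML
task = Task.init(project_name="examples", task_name="tb-demo")
writer = SummaryWriter()
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()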
PompousParrot44 did you manage to get it working ?