or by trains
We just upload the image as-is ... I think this is a SummaryWriter issue
ReassuredTiger98 are you saying you want to be able to run the pipeline both as a standalone and as a "remote pipeline"?
Or is this for a specific step in the pipeline that you want to be able to run standalone/pipelined?
Can you share the log?
Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
Then run the daemon with the additional --debug
argument, basically:
clearml-agent --debug daemon --foreground ...
Once the agent is running please send the Task's log from your console 🙂
Hi SubstantialElk6
UnicodeEncodeError: 'ascii' codec can't encode characters in position 296-297: ordinal not in range(128)
I'm assuming this is the usual UTF8 missing from the container.
Can you try to launch it with PYTHONIOENCODING=utf-8?
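For example, if the agent is running in docker mode you can push the env var through your clearml.conf (a minimal sketch; extra_docker_arguments just forwards extra flags to docker run):
agent {
    # pass the env var into every container the agent spins up
    extra_docker_arguments: ["-e", "PYTHONIOENCODING=utf-8"]
}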
That makes no sense to me?!
Are you absolutely sure the nntrain is executed on the same queue? (basically, could it be that the nntraining is executed on a different queue in these two cases?)
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
In your "Additional ClearML Configuration"
(which is basically the clearml.conf configuration)
add the following:
environment {
    GOOGLE_APPLICATION_CREDENTIALS="~/gs.cred"
}
files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
}
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
If you want to rename it (any pipeline), click on "Full details" in the "Run Info" (right-hand side panel); then, in the full details of the pipeline Task, you will be able to rename the pipeline execution
(Is renaming useful? should we add a right click to rename ?)
Is this example working for you?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_config.py
last iteration is not reset and I still have a gap in my scalars
Hmm is this reproducible ? can you check with the latest clearml version (1.10.3) ?
btw: I'm assuming continue_last_task=0
I think I found the issue, the fact the agent is launching it causes it to ignore the "overridden" set_initial_iteration
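As a quick check on your side, forcing the offset after Task.init should look like this (a minimal sketch; project/task names are placeholders):
from clearml import Task

task = Task.init(
    project_name="examples",   # placeholder names
    task_name="train",
    continue_last_task=True,   # continue reporting into the previous task
)
# reset the reporting offset so scalars start from 0 instead of the last iteration
task.set_initial_iteration(0)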
task.update({'script': {'version_num': 'my_new_commit_id'}})
This will pin the task to a specific commit id; you can pass an empty string '' to make the agent pull the latest commit from the branch
Hmm, let me see if you can somehow "signal" to the subprocess that it should not use the main process Task. (btw: are you forking or spawning a subprocess?)
What's the exact error you are getting ?
(Maybe this is a permissions error on the cache folder; you can see which folders it is using in the configuration as well)
I think this is the temp requirements file it creates, not your requirements file. If you attach a log here with the "installed packages" section, maybe we can help debug it
. Can I get gpu usage over time frame via API also?
task.get_reported_scalars
But this will get you all the scalars; I think the next version of the server supports asking for a specific one as well.
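Something along these lines (a sketch; the exact ":monitor:gpu" title of the resource-monitoring scalars is an assumption and may differ between versions):
from clearml import Task

task = Task.get_task(task_id="<your-task-id>")
scalars = task.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}
gpu = scalars.get(":monitor:gpu", {})  # assumed title for GPU monitoring
for series, values in gpu.items():
    print(series, list(zip(values["x"], values["y"])))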
How are you implementing the alert monitoring?
Is it a stateless process starting every X min, or a stateful process that keeps running and monitoring?
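If it is the stateless flavor, a minimal sketch could just query for failed tasks on every run (the project name here is a placeholder):
from clearml import Task

# stateless check: launched every X minutes (e.g. by cron), keeps no state between runs
failed = Task.get_tasks(
    project_name="examples",            # placeholder project
    task_filter={"status": ["failed"]},
)
for t in failed:
    # a real monitor would also compare t.data.last_update to the run window,
    # so the same failure is not alerted on twice
    print(f"ALERT: task {t.id} ({t.name}) failed")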
Unfortunately that is correct. It continues as if nothing happened!
oh dear, let me make sure this is taken care of
And thank you for the reproduce code!!!
Check the log: the container has torch 1.13.0 but the task requires torch==1.13.1
The torch package inside those NVIDIA prepackaged containers is compiled a bit differently. What I suspect happens is that the torch wheel from pytorch is not compatible with this container. Easiest fix: change the task requirements to torch==1.13.0 (the version already inside the container)
Wdyt ?
my experiment logic
you mean the actual code doing the training ?
so that it gets lazily executed and not at task definition time
Task definition time -> when creating the Pipeline Task? Remember that base_task_factory at the end creates a Task object (it does not run the code itself).
BTW: if you have simple training logic, you can use pipeline decorators; it might be a better fit:
https://clear.ml/docs/latest/docs/fundamentals/pipelines#pipeline-from-function-decorator
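Roughly like this (a minimal sketch; names are placeholders):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def load_data():
    # first step: produce some data
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["total"])
def train(data):
    # consuming "data" is what tells the controller the steps must run serially
    return sum(data)

@PipelineDecorator.pipeline(name="train-pipeline", project="examples", version="1.0")
def run_pipeline():
    data = load_data()
    total = train(data)
    print(total)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally instead of launching on a queue
    run_pipeline()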
FiercePenguin76 the git repo should detect only clearml as a required python package
Basically the steps are:
- Decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py").
- If it is a standalone script, only look for imports inside the calling python script, and list those packages under "installed packages".
- If it is not a standalone script, go over all the python files inside the repository, look for "import" statements, and list those packages under "installed packages".
TrickyRaccoon92 I didn't know that 🙂
where did you try to add it? did you report a plotly figure or is it with report_???
Hi RipeGoose2
Just to clarify, the issue with the html stuck in cache is a UI thing: basically the webapp needs to tell the browser not to cache the artifacts; it has nothing to do with how the artifacts are created.
Regardless, we love improvements, so feel free to mess around with the code and PR once you get something useful 😉
Specifically this is where the html conversion happens
https://github.com/allegroai/clearml/blob/9d108d855f784e1fe7f5691d3b7bf3be64576218/clearml/backend_in...
ContemplativePuppy11
yes, nice move. my question was to make sure that the steps are not run in parallel because each one builds upon the previous one
if they are "calling" one another (or passing data) then the pipeline logic will deduce they cannot run in parallel 🙂 basically it is automatic
so my takeaway is that if the funcs are class methods the decorators wont break, right?
In theory, but the idea of the decorator is that it tracks the return value so it "knows" how the steps depend on one another.
Hi @<1529633468214939648:profile|CostlyElephant1>
Is it possible to get user ID of the current user
On the Task.data object itself there should be a field named "user"; that's the user ID of the owner (creator) of the Task.
You can filter based on this id with:
Task.get_tasks(..., task_filter={'user': ["user-id-here"]})
wdyt?
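Putting it together (a sketch, assuming the code runs inside a ClearML task):
from clearml import Task

task = Task.current_task()   # the currently running task
user_id = task.data.user     # user ID of the task's creator
mine = Task.get_tasks(task_filter={"user": [user_id]})
print(user_id, [t.name for t in mine])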
does this work for multiple levels?
Yep 😄