
Yes, no reason to attach the second one (imho)
FYI: these days TensorBoard (a standalone package) has become the standard even for PyTorch; you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
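The gist of that example, as a minimal sketch (project/task names are placeholders; ClearML picks up the TensorBoard scalars automatically once Task.init is called):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter  # TB ships with torch

# Task.init enables the automatic TensorBoard capture
task = Task.init(project_name="examples", task_name="pytorch-tensorboard")

writer = SummaryWriter(log_dir="runs")
for step in range(10):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```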
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
Also, don't be shy, we love questions 🙂
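Regarding report_image, for comparison, a minimal manual call would look something like this (names and the random image are just placeholders):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="report-image")

# report an (H, W, 3) uint8 array as a debug image
img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
task.get_logger().report_image("debug", "random", iteration=0, image=img)
```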
JitteryCoyote63 the agent.cuda_version (or CUDA_VERSION env var) tells the agent which PyTorch wheel to download.
The cuDNN library can be included inside any wheel and will work as long as cuda/cudart exist on the system; for example, PyTorch wheels include the cuDNN they use. agent.cudnn_version should actually be deprecated, as it is not actually used.
For future reference, the dependency order is:
Nvidia Drivers → CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so) → CUDN...
BroadMole98 thank you for noticing !
I'll make sure it is fixed (a few other properties are also missing there, not sure why, I'll ask them to take a look)
Hmm I guess doable 🙂 could you open a github issue with feature request ?
If we have enough support it will bump it in the priority 🤞
I'm sorry JitteryCoyote63 No 😞
I do know that the enterprise addition have these features (a.k.a vault & permissions), basically to answer these types of situations.
Found it, definitely a bug in the callback; it has no effect on the HPO process itself.
I see... In the triton pod, when you run it, it should print the combined pbtxt. Can you print both the before/after ones so that we can compare?
GiganticTurtle0 quick update: a fix will be pushed, so that casting is based on the actual value passed, not on the type hints 🙂
(this is only in case there is no default value; otherwise the default value's type is used for casting)
If this is the case then the easiest is:
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
res = client.events.get_task_plots(task="<task-id>")
```
We should definitely have a nice interface 🙂
```python
from clearml.automation.parameters import LogUniformParameterRange

sampler = LogUniformParameterRange(name='test', min_value=-3.0, max_value=1.0, step_size=0.5)
sampler.to_list()
Out[2]:
[{'test': 1.0},
 {'test': 3.1622776601683795},
 {'test': 10.0},
 {'test': 31.622776601683793},
 {'test': 100.0},
 {'test': 316.22776601683796},
 {'test': 1000.0},
 {'test': 3162.2776601683795}]
```
I think this is the temp requirements file it creates, not your requirements file. If you attach a log here with the "installed packages" section, maybe we can help debug it.
Any recommended way to make a task/pipeline “pause” until some external condition is met?
RoughTiger69 I would set up a trigger on the Dataset (i.e. a new version):
https://github.com/allegroai/clearml/blob/df3d3b269acd2df0f31bfe804eb54ddc84d807c0/examples/scheduler/trigger_example.py#L44
wdyt?
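Roughly along these lines (a sketch based on that example; ids, queue, and project names are placeholders):
```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# when a new Dataset version appears in the given project,
# clone the given task and enqueue it for execution
trigger.add_dataset_trigger(
    schedule_task_id="<task-id-to-clone-and-run>",
    schedule_queue="default",
    trigger_project="datasets",
)

trigger.start()  # blocks and polls for new dataset versions
```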
Can I delete logs from existing experiments on the ClearML server?
Only by resetting the Task (which would delete everything), or deleting the Task itself.
You can also disable the auto console log and report manually:
```python
Task.init(..., auto_connect_streams=False)
```
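For example, a minimal sketch of the manual alternative (names are placeholders):
```python
from clearml import Task

# disable the automatic stdout/stderr capture
task = Task.init(project_name="examples", task_name="no-console",
                 auto_connect_streams=False)

# report only the lines you actually want stored
task.get_logger().report_text("epoch 1 done, loss=0.123")
```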
So if you are using the latest clearml (i.e. 1.3+), re-enqueuing the pipeline will automatically continue it from where it stopped.
With previous versions (which is your case, I think), you clone the pipeline Task, change the parameter, and enqueue it.
(The state of the pipeline itself is stored on the Task, so when you clone it, you are cloning the state as well.)
Make sense ?
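For the older versions, a minimal sketch of that clone / modify / enqueue flow (the parameter name and queue here are assumptions):
```python
from clearml import Task

# clone the stopped pipeline controller Task (the stored state is cloned with it)
cloned = Task.clone(source_task="<pipeline-task-id>", name="pipeline retry")

# change whatever parameter you need before re-running
cloned.set_parameter("Args/my_param", "new_value")  # hypothetical parameter

# enqueue the clone for execution
Task.enqueue(cloned, queue_name="default")
```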
Hi DilapidatedDucks58, just making sure: is the link the PyTorch nightly artifactory, or a direct link to the package? Reason for asking: I was not aware they have a proper artifactory... When the task runs, the trains agent will update the installed packages with all the packages it actually used. Could you verify you have the correct version?
Regarding the extra files, you are correct, the docker container is reset every run, so they will get lost. What are those files for? Could you add ...
Hi SarcasticSparrow10
Is it better to post such questions on Stackoverflow so they benefit everybody?
Yes, I think you are correct, it would. Please do 🙂
Try reuse_last_task_id='task_id_here' to specify the exact Task to continue (click on the ID button next to the task name in the UI).
If this value is True, it will try to continue the last task on the current machine (based on the project/name combination); if the task was executed on another machine, it will just start a ...
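i.e. something like this (a sketch; project/task names and the id are placeholders):
```python
from clearml import Task

# continue a specific previous run instead of creating a new Task
task = Task.init(
    project_name="examples",
    task_name="my-experiment",
    reuse_last_task_id="aabbccddee112233",  # the ID copied from the UI
)
```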
Hi JitteryCoyote63, is there a callback for that?
Hmm okay let me check that, I think I understand the issue
ContemplativeCockroach39 unfortunately not directly as part of clearml 😞
I can recommend the Nvidia Triton serving (I'm hoping we will have the out-of-the-box integration soon).
Meanwhile you can run it manually, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
docker here
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
Hi @<1663354518726774784:profile|CrookedSeal85>
However, I systematically notice a jump of some number of "ghost iterations" when resuming my trainings...
Try the following:
```python
task = Task.init(..., continue_last_task=0)
```
From the Task.init docstring (notice this value can be both boolean and integer):
:param bool continue_last_task: Continue the execution of a
...
- An integer - Specify initial iteration offset (override the automatic last_iteratio...
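Putting it together, a minimal sketch (project/task names are placeholders):
```python
from clearml import Task

# continue_last_task=0 resumes the previous run but resets the initial
# iteration offset to 0, which avoids the "ghost iterations" jump
task = Task.init(
    project_name="examples",
    task_name="resume-training",
    continue_last_task=0,
)
```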
@<1523706266315132928:profile|DefiantHippopotamus88> seems like you are missing the ports 🙂
CLEARML_WEB_HOST="http://<your-server>:8080"
CLEARML_API_HOST="http://<your-server>:8008"
CLEARML_FILES_HOST="http://<your-server>:8081"
(these are the default server ports)
It is the folder clearml creates and the folder we create ourselves to store the predictions.
I see... If that is the case, the only solution I can think of is manually uploading the files with StorageManager, then getting the URL and registering it as debug media or an artifact:
```python
logger.report_media("image", "type a", iteration=iteration, url="<url>")
task.upload_artifact('a link', artifact_object='<url>')
```
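End to end, that would look roughly like this (the bucket path is an assumption):
```python
from clearml import Task
from clearml.storage import StorageManager

task = Task.init(project_name="examples", task_name="manual-media")

# upload the prediction file yourself and get back its URL
url = StorageManager.upload_file(
    local_file="predictions/image.png",
    remote_url="s3://my-bucket/predictions/image.png",
)

# register the URL as debug media ...
task.get_logger().report_media("image", "type a", iteration=0, url=url)
# ... or as an artifact link
task.upload_artifact("a link", artifact_object=url)
```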