Maybe we should do that automatically? wdyt?
CloudyHamster42 you mean that when you set sdk.metrics.tensorboard_single_series_per_graph
to True and you rerun the experiment, you are still getting multiple series on the same graph?
What's your Trains version?
Hi DilapidatedDucks58
Apologies, this thread slipped away.
I double checked, the server will not allow you to overwrite it (meaning that getting it fixed will require releasing a server version, which usually takes longer)
That said maybe we can pass an argument to the "Task.init" so it ignores it? wdyt?
Hi DilapidatedDucks58
is this something new ?
usually copy pasting directly from the UI parses everything, no?
Hi ConvolutedSealion94
You can archive / delete the SERVING-CONTROL-PLANE
Task from the DevOps project in the UI.
Do notice you will need to make sure the clearml-serving is updated with a new session ID or remove it (i.e. take down the pods / docker-compose)
Make sense ?
Were you able to interact with the service that was spun up? (How was it spun up?)
DilapidatedDucks58 use a full link, without the package name: git+
delete logged images and texts though
logged images are also stored there?
Hi PanickyAnt52
hi, is there a way to get back the pipeline object when given a pipeline id?
Yes basically this is a specific type of Task, anything you stored on it can be accessed via the Task object, i.e. pipeline_task=Task.get_task(pipeline_id)
I'm curious, how would you use it?
BTW: since pipeline is also a Task you can have a pipeline launch a step that is a pipeline by its own
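A minimal sketch of the lookup described above, assuming the `clearml` package is installed and configured. The helper names and the choice of fields are made up for illustration, and `pipeline_id` stands for a real pipeline Task ID:

```python
def get_pipeline_task(pipeline_id):
    """Return the Task object behind a pipeline (hypothetical helper)."""
    # Imported lazily so this file can be loaded without clearml installed
    from clearml import Task

    # A pipeline controller is stored as a regular Task, so the generic getter works
    return Task.get_task(task_id=pipeline_id)


def describe_pipeline(pipeline_id):
    """Collect a few things stored on the pipeline Task (hypothetical helper)."""
    task = get_pipeline_task(pipeline_id)
    return {
        "name": task.name,
        "status": task.get_status(),
        "parameters": task.get_parameters(),
        "artifacts": list(task.artifacts.keys()),
    }
```

Both helpers need a reachable ClearML server when called; nothing is contacted at import time.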
Basically you create the Task and make sure the "Dataset" is attached to it:
task = Task.init(...)
dataset = Dataset.create(task=task)
dataset.add_files(...)
This will make sure the code is attached to the Dataset
hmmm, somehow I have a bad feeling about it... Could you check the log, it should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?
Hi ElegantCoyote26
If there is, it will have to be using the docker-mode, but I do not think this is actually possible because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration 🙂
EDIT:
FYI we do support k8s integration
Hi @<1560798754280312832:profile|AntsyPenguin90>
The image itself is uploaded in a background process, flush just triggers the starting of the process.
Could it be that it is showing a few seconds after?
Thanks DefeatedOstrich93
Let me check if I can reproduce it.
Well, I do not think you set your PyTorch Lightning to use CUDA:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
Hi ZealousSeal58
What's the clearml version you are using ?
If there was a "debug mode" for viewing the stack trace before the crash that would've been most helpful...
import traceback
traceback.print_stack()
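One way to approximate such a "debug mode" with the standard library alone; this is a sketch, not a ClearML feature. `traceback.format_stack()` returns the current stack as strings so it can be logged right before a suspicious call, and `faulthandler` can dump the Python stack on a hard crash such as a segfault. The function names below are made up for illustration:

```python
import faulthandler
import sys
import traceback


def enable_crash_dumps():
    """Ask the interpreter to dump all thread stacks to stderr on a fatal signal."""
    # Call this once at startup; it needs a stderr backed by a real file descriptor
    faulthandler.enable(file=sys.stderr, all_threads=True)


def current_stack():
    """Return the call stack of the caller as a single string."""
    # [:-1] drops the frame of current_stack() itself
    return "".join(traceback.format_stack()[:-1])


def risky_step():
    """Log the stack right before the code suspected of crashing."""
    stack = current_stack()
    print(stack, file=sys.stderr)
    return stack
```

`risky_step` is only a stand-in for whichever call is suspected; the same `current_stack()` call can be dropped in front of any line.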
Hi GreasyLeopard35
I try to resume a stopped or aborted parameter optimization experiment,
How are you continuing the HPO? Are you running everything locally? Is this with an agent? Are you seeing the '[0, 0]' value on the configuration when launching the HPO or when continuing it?
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? isn't it the same code, the script entry point is auto detected ?
... or when I run my_script.py locally (in order to create and enqueue the task)?
the latter, When the script is running locally
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure this will work 🙂
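The same idea as a small helper, as a sketch only. The function name is hypothetical; resolving the script path to an absolute one first keeps the result independent of the directory the script is launched from:

```python
import os


def requirements_next_to(script_file):
    """Build the path of a requirements.txt sitting beside the given script."""
    # abspath makes the result independent of the current working directory
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, "requirements.txt")


# Inside my_script.py this would be called as: requirements_next_to(__file__)
```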
where is it running? could you restart all the dockers ? Is it running on your machine?
(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda versions)
Hi EagerOtter28
Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...
Use Dataset.sync (or clearml-data sync) to check which files were changed/added.
All files are already hashed, right? I wonder why clearml-data does not keep files in a semi-flat hierarchy and groups them together to datasets?
It kind of does, it has a full listing of all the files with their hash (SHA2) values, ...
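To illustrate why a per-file hash listing is enough to find the diff between two versions, here is a standalone sketch using only the standard library. It is not how clearml-data is implemented; the function names and the choice of SHA-256 are assumptions made for the example:

```python
import hashlib
from pathlib import Path


def hash_listing(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    listing = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            listing[path.relative_to(root).as_posix()] = digest
    return listing


def diff_listings(old, new):
    """Return (added, modified, removed) relative paths between two listings."""
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, modified, removed
```

Only the paths in `added` and `modified` would need uploading for a new version; everything else is already covered by the parent listing.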
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
FreshParrot56 we could add this capability, but the main caveat is that if your version depends on multiple parent versions you still need to download and extract all the parent versions, which means that when you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
I think EmbarrassedSpider34 is correct.
When you pass the requirements to clearml-task, actually the agent depending on how it was configured (conda / pip) will do the installation.
That said, maybe it is worth adding support to provide the env.yml in the CLI ?
(Notice that adding specific channels needs to be configured on the agent, they are not stored per Task)
AlertCamel57 wdyt?
The import process actually creates a new Task every import, that said if you take a look here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1733
you can pass a pre-existing Task ID to "import_task" https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/task.py#L1653
@<1671689437261598720:profile|FranticWhale40> this one: None
Hmm that is odd. Let me take a look and ask the guys. Thank you for quickly testing the RC! I'm hoping a new RC with a fix will be there tomorrow, if we can quickly replicate
I'm guessing this is done through code-server?
correct
I'm currently rolling a JupyterHub instance (multiuser, with codeserver inside) on the same machine as clearml-server. That's where tasks are executed etc. so, all browser dev env.
Yeah, the idea with clearml-session is that each user can self-serve the container that works best for them. With a jupyterhub they start to step on each other's toes very quickly ...
Done HandsomeCrow5 +1 added 🙂
btw: if you feel you can share what your reports look like (a screenshot is great), this will greatly help in supporting this feature, thanks