
Hmm StrangePelican34
Can you verify you call Task.init before TensorBoard is created? (basically at the start of everything)
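For example, a minimal sketch (project/task names are placeholders):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Task.init() first, so the TensorBoard writers created afterwards are hooked
task = Task.init(project_name="examples", task_name="tb logging")

# only now create the SummaryWriter (and the rest of the training code)
writer = SummaryWriter(log_dir="./runs")
writer.add_scalar("loss", 0.5, 0)
```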
Okay, progress.
What are you getting when running the following from the git repo folder: `git ls-remote --get-url origin`
Bugs, definitely GitHub, this is the easiest to track.
Documentation, if these are small issues, Slack is fine, otherwise, GitHub issue.
Regarding the documentation, we are working on another iteration of improvements, but if you find inaccuracies/broken links please report them
Hi WackyRabbit7
Yes, we definitely need to work on wording there ...
"Dynamic" means you register a pandas object that you are constantly logging into while training, think for example the image files you are feeding into the network. Then Trains will make sure it is constantly updated & uploaded so you have a way to later verify/compare different runs and detect dataset contemplation etc.
"Static" is just, this is my object/file upload and store it as an artifact for me ...
Make sense ?
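Roughly, in code (a minimal sketch; the names and the DataFrame are placeholders):
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="artifacts")
df = pd.DataFrame({"file": ["a.jpg", "b.jpg"]})

# "dynamic": register the object; ClearML keeps re-uploading it as it changes
task.register_artifact("training files", df)

# "static": one-shot upload, stored as-is
task.upload_artifact("config snapshot", artifact_object={"lr": 0.001})
```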
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion
This is a good point! I'll make sure we stress it (BTW: it will work with elevated credentials, but probably not recommended)
BTW: if you could implement _AzureBlobServiceStorageDriver
with the new Azure package, that would be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
...I'm not sure I follow; clearml-task
is designed so that in the end the agent will be the one running the Task. What am I missing?
would I have to execute each task in the pipeline locally(but still connected to trains),
Somehow you have to have the pipeline step Task in the system; you can import it from code, or you can run it once, then the pipeline will clone it and reuse it. Am I missing something?
Okay yes, that's exactly the reason!! Cross origin blocks the file link
So I have a task that just loads a model, but I don't see it as an artifact in the UI
You should see it under Artifacts, Input model, if you are calling the Keras load function (or similar)
What is the Model url? `print(model.url)`
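For example, a minimal sketch (assuming a local Keras model file; names are placeholders):
```python
from clearml import Task
from tensorflow import keras

task = Task.init(project_name="examples", task_name="load model")

# loading through Keras should be auto-logged as an Input Model on the task
model = keras.models.load_model("my_model.h5")
```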
BTW: any specific reason to call current_task after you closed the main Task?
One more question: in the second log the trains-agent is configured with Conda, while in the first it is configured with pip, or at least that is what it looks like. Can you confirm?
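For reference, the agent's package manager is selected in the agent section of clearml.conf (a sketch; double-check the key layout against your own file):
```
agent {
    package_manager {
        # "pip" (default) or "conda"
        type: conda
    }
}
```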
Notice that you need to pass the returned scroll_id to the next call:
`scroll_id = response["scroll_id"]`
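A minimal paging sketch, assuming direct REST calls to the server's events.get_task_events endpoint (the endpoint name, payload fields, and basic-auth scheme are assumptions used for illustration):
```python
import requests

api = "https://api.clear.ml"                      # your api server
auth = ("ACCESS_KEY", "SECRET_KEY")               # your credentials
payload = {"task": "TASK_ID", "batch_size": 500}

while True:
    data = requests.post(
        f"{api}/events.get_task_events", json=payload, auth=auth
    ).json()["data"]
    if not data.get("events"):
        break
    # ... process data["events"] here ...
    payload["scroll_id"] = data["scroll_id"]      # pass it to the next call
```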
BroadMole98 Awesome, can't wait for your findings
JitteryCoyote63 This seems like exactly what you are saying, an Elastic license issue...
Hi FancyWhale93 you can disable the auto model uploading with:
```python
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
```
ExcitedFish86 this is a general "dummy agent" that pulls Tasks and executes them (no env created, no code cloned, as you suggested)
how does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that executing the Tasks themselves required setup, and that you wanted multiple machine support; to overcome that, I posted a dummy agent that just runs the Tasks.
(Notice...
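For context, driving the HPO looks roughly like this (a sketch; the base task, parameter ranges, metric names, and queue are placeholders):
```python
from clearml.automation import (
    DiscreteParameterRange, HyperParameterOptimizer, RandomSearch)

optimizer = HyperParameterOptimizer(
    base_task_id="BASE_TASK_ID",          # the Task the HPO clones
    hyper_parameters=[
        DiscreteParameterRange("General/lr", values=[0.1, 0.01, 0.001]),
    ],
    objective_metric_title="validation",  # metric monitored in real time
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    execution_queue="default",            # queue the clones are pushed into
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```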
Oh if this is the case you can probably do:
```python
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # pull the next pending Task from the queue
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # mark the Task as started
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
    # (the original snippet is truncated here; presumably it launches the
    # Task in a subprocess with that environment, e.g.:)
    # subprocess.Popen(["python", "my_script.py"], env=env)
```
Hi MuddySquid7
Hmmm what would be the use case? (I mean, how are we using Vertex?)
I could take a look and figure that out.
This will greatly accelerate integration
My current experience is that there is only print-out in the console but no training graph
Yes, Nvidia TLT needs to actually use TensorBoard for ClearML to catch the metrics and display them.
I think that in the latest version they added that. TimelyPenguin76 might know more
BattyLion34 let me see if I understand.
The same base_task_id, when cloned in the UI and enqueued on the same queue as the pipeline, works, but when the pipeline runs the same Task it fails?!
Could it be that you enqueue them on different queues?
OutrageousGiraffe8 this sounds like a bug, how can we reproduce it?
Maybe add another layer here?
https://github.com/allegroai/clearml/blob/a47f127679ebf5912690f7c3e60791a2daa5c984/examples/frameworks/tensorflow/tensorflow_mnist.py#L40
So actually, while we're at it, we also need to return a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?
SmarmySeaurchin8
```python
updated_tags = task.tags
updated_tags.remove(tag)
task.tags = updated_tags
```
If this is what the repo links look like, do not set anything in the clearml.conf
It "should" use SSH for the SSH links, and HTTP for the HTTP links.