not really :)
Why would you want to set it up manually? It makes sense to have it in the cache folder, no?
JitteryCoyote63 maybe this is an old example of the PyTorch DDP code? It is basically copy-pasted from the PyTorch website:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
I could take a look and figure that out.
This will greatly accelerate integration :)
They don't give an in-app notification.
Oh I see, I assume this is because the GitHub account is not connected with any email, so no invite is sent.
Basically they should just be able to re-login, and then they could switch to your workspace (with the link you generated).
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd. Just so we can make the agent better, any chance you can send the Task log?
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo but this is the gist):
StorageManager.upload_file(
    local_file=separated_file_posix_path,
    remote_url=remote_file_path + separated_file_posix_path.relative_to(files_rgb)
)
Notice that you need to provide the full upload URL (including the path and file name to be used on your GS storage).
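For reference, a minimal self-contained version of that call (the bucket name and file paths here are placeholders, not from your setup):

    from pathlib import Path
    from clearml import StorageManager

    # placeholder local file and GS destination - replace with your own
    local_file = Path("output/rgb/frame_0001.png")
    remote_url = "gs://my-bucket/separated/" + local_file.name

    # uploads the file and returns the remote URL on success
    uploaded = StorageManager.upload_file(local_file=str(local_file), remote_url=remote_url)
    print(uploaded)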
I should manually copy it to the remote services agents?
The code itself needs to run somewhere; currently this has to be your machine. Either you manually run the AWS autoscaler or an agent runs it for you. Make sense?
Hi SquareFish25
Sure, here are a few:
HPO:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Pipeline:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Automation:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
Hi SkinnyPanda43
Yes, I think you are right, the documentation might be missing it. I'll make sure they know it :)
In the meantime: task.update_output_model
https://github.com/allegroai/clearml/blob/d3929033c016476c580557639ff44f900e65904a/clearml/backend_interface/task/task.py#L734
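In case a snippet helps, a minimal sketch of registering a local weights file as the task's output model (project/task names and the file path are made up, and the exact parameter names may differ between SDK versions):

    from clearml import Task

    task = Task.init(project_name="debug", task_name="manual model registration")
    # ... training code that writes model_weights.pt locally ...
    # register the local weights file as this task's output model
    task.update_output_model(model_path="model_weights.pt", name="my model")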
(without having to execute it first on Machine C)
Someone some where has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code:
task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like running a single step), then continue on another machine.
Notice that switching between cpu...
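To illustrate, a minimal sketch (project/task/queue names are just placeholders):

    from clearml import Task

    task = Task.init(project_name="debug", task_name="remote execution test")

    # everything above runs locally (so the environment gets recorded);
    # this call stops the local run and enqueues the task on the 'default' queue
    task.execute_remotely(queue_name='default')

    # from here on, the code only runs on the machine the agent assigns
    print("running remotely")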
JitteryCoyote63 I think I failed to explain myself.
- I think the problem with the controller is that you are interacting (aka changing hyperparameters) with a Task created using a new SDK version, from an older SDK version. Specifically, we added section names to the hyperparameters, and only the new version of the SDK is aware of them.
Make sense? - Regarding the actual problem: it seems like this is somehow related to the first one, the task at runtime is using an older SDK version, and I t...
JitteryCoyote63 What did you have in mind?
WickedGoat98 sorry, I missed the thread...
that the trains.conf has to be located on the node running the trains-agent.
Correct :)
The easiest way to check is to see if you can curl to the ip:port from inside the docker.
If you fail, it is probably the wrong IP.
The IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense?
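If curl isn't available inside the container, a quick Python equivalent of the same check (the host/port below are placeholders - use the IP of the docker-compose machine and the server port):

    import socket

    host, port = "192.168.1.10", 8008  # placeholder IP + API server port
    try:
        socket.create_connection((host, port), timeout=5).close()
        print("reachable")
    except OSError as err:
        print("not reachable:", err)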
link with "localhost" in it Oo
Hmm, I think this is the main issue: for some reason the dataset's default upload destination is "localhost". What do you have configured in your clearml.conf under files_server?
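For context, the relevant section of clearml.conf usually looks something like this (the hosts/ports below are the self-hosted defaults, adjust to your server):

    api {
        web_server: http://<server-ip>:8080
        api_server: http://<server-ip>:8008
        files_server: http://<server-ip>:8081
    }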
I think I found something,
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?
When exactly are you getting this error?
I imagine that these phantom dependencies will prevent parallelization. Is there a workaround?
Yes, they might... The workaround might be a bit ugly: copy-pasting the functions and changing their names.
BTW: I'll check when the next RC is scheduled; maybe it will already contain a fix.
OutrageousSheep60 so this should work, no?
ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
Notice the chunk_size is the maximum size (in bytes) per chunk, so it should basically be very large.
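End to end that would look roughly like this (project/dataset names and the local folder are placeholders):

    from clearml import Dataset

    ds = Dataset.create(dataset_project="debug", dataset_name="my dataset")
    ds.add_files("data/")  # placeholder local folder
    # upload straight to the GS bucket; chunk_size caps the size per chunk
    ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
    ds.finalize()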
so when inside the docker, I don't see the git repo and that's why ClearML doesn't see it
Correct ...
I could map the root folder of the repo into the container, but that would mean everything ends up in there
This is the easiest; you can put it in the ENV variable:
None
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
It's the correct way to do it, right?
Yep :) That said, this is not running as a service, so you will need to spin it up on your machine. You can definitely connect it with the free SaaS server, and spin up the serving on your machine with docker-compose.
Yes, I think you are correct, verified on Firefox & Chrome. I'll make sure to pass it along.
Thanks SteadyFox10 !
GrievingTurkey78 can you send the entire log?
Create a new file, copy-paste these lines into it, and run it inside VSCode. What are you getting in the console?
from clearml import Task
Task.add_requirements("tensorflow")
task = Task.init(project_name="debug", task_name="requirements")
print("done")
It seems to follow a structure specific to clearml,
Actually plotly.js :)
Thank you WackyRabbit7! Please feel free to remind me if it slips away during my night time (yes I do sleep, contrary to common belief :))
Hi ReassuredTiger98
So let's assume we call:
logger.report_image(title='training', series='sample_1', iteration=1, ...)
And we report every iteration (keeping the same title/series names). Then in the UI we could iterate back over the last 100 images (back in time) for this title/series.
We could also report a second image with:
logger.report_image(title='training', series='sample_2', iteration=1, ...)
which means that for each one we will have 100 past images to review (i.e. same ti...
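A minimal sketch of the reporting side (project/task names are placeholders, and the image is random noise just so it runs):

    import numpy as np
    from clearml import Task

    task = Task.init(project_name="debug", task_name="image history")
    logger = task.get_logger()

    for i in range(100):
        img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
        # same title/series every iteration -> the UI keeps the per-iteration history
        logger.report_image(title='training', series='sample_1', iteration=i, image=img)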