seems to take about the same amount of time unfortunately!
I think it is a better solution; that said, from your description it sounds like the issue is the upload bandwidth (i.e. json-ing the dict itself), could that be it?
(and even 1000 entries seems like something that would end up at about 1MB upload, which is not that much)
Hmm, let me check first when it is going to be upgraded and if there is a workaround
GiganticTurtle0 your timing is great, the plan is to wrap up efforts and release early next week (I'm assuming the GitHub fixes will be pushed tomorrow; I'll post here once they are there)
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline will launch a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step function or are you decorating a function with PipelineDecorator?
Would this be equivalent to an automated job submission from clearml to the cluster?
yes exactly
I am looking for a setup which allows me to essentially create the workers and start the tasks from a slurm script
Hmm I see, basically the SLURM admins are afraid you will create a script that clogs the SLURM cluster, hence no automated job submission, so you want to use SLURM for "time on cluster" and then, when your time is allocated, use ClearML for the job submission, is that cor...
Hi @<1742355077231808512:profile|DisturbedLizard6>
the problem may be in get_local_model_file() returning None
This tracks, it means that the model file cannot be downloaded for some reason,
when you click on the model here: None
what does it say under "MODEL URL:"?

This will update to a specific commit id, you can pass empty string '' to make the agent pull the latest from the branch
I ended up using task = Task.init(continue_last_task=task_id) to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument it will just create a new Task and auto-log everything to it 🙂
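A minimal sketch of both behaviors (the task ID string is a placeholder):

from clearml import Task

# Continue logging into an existing Task by passing its ID
task = Task.init(continue_last_task='<existing_task_id>')

# Without continue_last_task, the same call would create a brand new Task:
# task = Task.init(project_name='examples', task_name='new run')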
Ok, I think figured it out.
Nice!
ClearML doesn't add all the imported packages needed to run the task to the Installed Packages
It does (but not derivative packages that are used by the required packages; the derivative packages will be added when the agent runs it, because the agent creates a new clean venv, installs the required packages, then updates the list back with everything from pip freeze, because that now represents all the packages the Task needs)
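If a particular package is still missed by the automatic detection, one option (just a sketch, not necessarily needed in your case) is to add it explicitly with Task.add_requirements before calling Task.init:

from clearml import Task

# Force a package into the Task's "Installed Packages"
# (must be called before Task.init; the package name is just an example)
Task.add_requirements('pandas')
task = Task.init(project_name='examples', task_name='requirements example')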
Two questions:
Is t...
A single query will tell you whether the agent is running anything, and for how long, but I do not think you can get the idle time ...
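A rough sketch of that kind of query with the ClearML APIClient (the exact fields of the workers.get_all response may vary between server versions):

from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # 'task' is populated only while the agent is executing something
    current = getattr(worker, 'task', None)
    print(worker.id, 'busy' if current else 'idle')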
Hi IntriguedRat44
Sorry, I missed this message...
I'm assuming you are running in manual mode (i.e. not through the agent), in that case we do not change the CUDA_VISIBLE_DEVICES.
What do you see in the resource monitoring? Is it a single GPU or multiple GPUs?
(Check the :monitor:gpu in the Scalars tab under RESULTS)
Also, what's the Trains/ClearML version you are using, and the OS?
Can I run it on an agent that doesn't have a GPU?
Sure this is fully supported
When I run clearml-serving it throws me an error: "please provide specific config.pbtxt definition"
Yes, this is a small file that tells the Triton server how to load the model:
Here is an example:
https://github.com/triton-inference-server/server/blob/main/docs/examples/model_repository/inception_graphdef/config.pbtxt
RipeGoose2 That sounds familiar. Could you test with the latest RC? pip install trains==0.16.4rc0
HandsomeCrow5 OMG the guys already added it to the debug samples as well, check out the demo app (drop down "test html sample"):
https://demoapp.trains.allegro.ai/projects/4e7fef090aa849b1acc37d92b59b3360/experiments/83c9ed509f0e421eaadc1ef56b3af5b4/info-output/debugImages
Hi AttractiveShrimp45
Well, I would use Task.connect to add a section with any configuration you are using, for example: Task.current_task().connect(my_dict_with_conf_for_data, name="dataset51") wdyt?
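Something along these lines (the dict contents and section name are just illustrative):

from clearml import Task

my_dict_with_conf_for_data = {'version': 51, 'train_split': 0.8}
# Attach the configuration dict to the current Task under its own named section
Task.current_task().connect(my_dict_with_conf_for_data, name='dataset51')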
Then the only other option is that /tmp is out of space (pip uses it to uncompress the .whl files, then it deletes them)
wdyt?
Hi NastyFox63 could you verify the fix works? pip install git+
Hi @<1603198134261911552:profile|ColossalReindeer77>
I would also check this one: None
ValueError: Missing key and secret for S3 storage access
Yes that makes sense, I think we should make sure we do not suppress this warning; it is too important.
Bottom line: the configuration section is missing from your clearml.conf
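For reference, this is roughly what that section of clearml.conf looks like (the values are placeholders):

sdk {
    aws {
        s3 {
            # default credentials used for S3 storage access
            key: "YOUR_ACCESS_KEY"
            secret: "YOUR_SECRET_KEY"
            region: ""
        }
    }
}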
- Artifacts and models will be uploaded to the output URI, debug images are uploaded to the default file server. It can be changed via the Logger.
- Hmm is this like a configuration file?
You can do:
local_text_file = task.connect_configuration('filenotingit.txt')
Then open 'local_text_file'; it will create a local copy of the data at runtime, and the content will be stored on the Task itself. - This is how the agent installs the python packages, but if the docker already contains th...
Notice both need to be str
btw, if you need the entire folder just use StorageManager.upload_folder
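A small sketch combining the two (file name, folder, and destination URL are placeholders):

from clearml import StorageManager, Task

task = Task.init(project_name='examples', task_name='config example')

# Store a file that is not in git on the Task itself; at runtime this
# returns a path to a local copy of the file
local_text_file = task.connect_configuration('filenotingit.txt')
with open(local_text_file) as f:
    print(f.read())

# For an entire folder, upload it to any supported storage instead
StorageManager.upload_folder('./my_configs', 's3://my-bucket/configs/')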
Hi @<1523701304709353472:profile|OddShrimp85>
You mean something like clearml-serving?
None
Yes, looks like it. Is it possible?
Sounds odd...
Whats the exact project/task name?
And what is the output_uri?
OutrageousGrasshopper93 tensorflow-gpu is not needed, it will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment runs inside the docker)
How can I set the base Python version for the newly created conda env?
You mean inside the docker ?