Reputation
Badges 1
25 × Eureka!This is very odd, can you also put here the file names? maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
What's the difference between the example pipeeline and this code ?
Could it be the "parents" argument ? what is it?
(torchvision vs. cuda compatibility, will work on that),
The agent will pull the correct torch based on the cuda version that is available at runtime (or configured via the clearml.conf)
BTW: for future reference, if you set the ulimit in the bash, all processes created after that should have the new ulimit
Hi SteadyFox10
Yes we changed the Web UI, to something more intuitive (but after you get used to the original design , I guess not that obvious).
After selecting a bunch of experiment, right click one of them, you will be able to archive them all (it will display the number of experiments you are about to archive)
Hi @<1547028031053238272:profile|MassiveGoldfish6>
Is there a way for ClearML to simply save the model once training is done and to ignore the model checkpoints?
Yes, you can simple disable the auto logging of the model and manually save the checkpoint:
task = Task.init(..., auto_connect_frameworks={'pytorch': False}
...
task.update_output_model("/my/model.pt", ...)
Or for example, just "white-label" the final model
task = Task.init(..., auto_connect_frameworks={'pyt...
Yes, consider VexedCat68 txt file the Dataset "content" , this will enable ypu to safely get the list of files, and then you can use the StorageManager to download them extend this concept and have it built into the Dataset itself, i.e. allow you to add files as links and make sure it will just download them. The caveat here is that the Dataset at the end, returns a folder with the files, when you specify links, you have to also specify the target location locally (at the end you want a fol...
Anyone wants to open a github issue, so we actually end up implementing it π ?
Hi OutrageousSheep60
AS-IS
- without compressing or breaking it up into chunks.
So for that I would suggest to manually archive it, and upload as external link?
Or are you saying you want to control the compression used by Dataset class ?
https://github.com/allegroai/clearml/blob/72d9b22e0d27f317a364acfeacbcf5c70f852e8c/clearml/datasets/dataset.py#L603
GiddyTurkey39 do you mean to delete them from the server?
The main reason to add the timeout is because the warning was annoying to users π
The secondary was that clearml will start reporting based on seconds from start, then when iterations start it will revert back to iterations. But if the iterations are "epochs" the numbers are lower so you end up with a graph that does not match the expected "iterations" x-axis. Make sense ?
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good ides.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, Not that it will just sleep for 99999sec π
sudo curl -L "
-s)-$(uname -m)" -o /usr/local/bin/docker-compose
If you do not have a lot of workers, that I would guess console outputs
ShaggyHare67
Now theΒ
trains-agent
Β is running my code but it is unable to importΒ
trains
Β ...
What you are saying is you spin the 'trains-agent' inside a docker? but in venv mode ?
On the server I have both python (2.7) and python3,
Hmm make sure that you run the agent with python3 trains-agent
this way it will use the python3 for the experiments
BeefyCow3 see this https://allegroai-trains.slack.com/archives/CTK20V944/p1593077204051100 :)
JitteryCoyote63 in the UI what's the value of "config" ? Is it empty, it a string?
Also, could you check if removing the 'type=str' from the add_argument changes the behavior?
that clearml-agent needs to be installed from system python mentioned anywhere in the docs, if not I suggest it gets added.
You are right, I will check and fix if not π
Thank you so much for helping.
My pleasure
Just curious about the timeout, was it configured by clearML or the GCS? Can we customize the timeout?
I'm assuming this is GCS, at the end the actual upload is done GCS python package.
Maybe there is an env variable ... Let me google it
DepressedChimpanzee34
What's the hydra version ?
I tested with 1.1.0dev3 and it worked for me
JitteryCoyote63 I think I failed explaining myself.
- I think the problem of the controller is that you are interacting (aka changing hyper parameters)) with a Task created using new SDK version, with an older SDK version. specifically we added section names to the hyper parameters, and only new version of the SDK is aware of it.
Make sense? - Regrading the actual problem. It seems like this is somehow related to the first one, the task at run time is using an older SDK version , and I t...
Ohh if this is the case, and this is a stream of constant inference Results, then yes, you should push it to some stream supported DB.
Simple SQL tables would work, but for actual scale I would push into a Kafka stream then pull it (serially) somewhere else and push into a DB
web-server seems okay, could you send the logs from the api-server?
Also if you can, the console logs from your browser, when you get the blank screen. Thanks.
like what all are important metric monitoring queries w.r.t. the serving tasks that can be visualized and shown in grafana?
Basically latency amd requests per minute are automatically reported. Additional reports are based on your RestAPI in/out.
Imagine the following restapi request json payload
{x=123, y=456}
and a return json of
{z=789}
The metrics you can add to the monitoring are the keys on both these jsons, i.e. "x", "y", "z"
These metrics can be both log...
Long story short, not any longer (in previous versions of k8s it was possible, but after the runtime container change it is not supported)
I solved the issue by implementing my own ClearML logger
This is awesome! any chance you want to PR it to transformers ?
Thanks JumpyPig73
Yeah this would explain it ... (if hydra is setting something else we can tap into that as well)
Those variables are not passed to the remote instance they are used by the aws autoscaler to launch it, but there is no need to pass them.
I think the easiest is to add them to the "extra_vm_bash_script" as well
Hi @<1707565838988480512:profile|MeltedLizard16>
Maybe I'm missing something but gust add to your YOLO code :
from clearml import Dataset
my_files_folder = Dataset.get("dataset_id_here").get_local_copy()
what am I missing?