Reputation
Badges 1
25 × Eureka!Ohh okay something seems to half work in terms of configuration, the agent has enough configuration to register itself, but fails to pass it to the task.
Can you test with the latest agent RC:0.17.2rc4
I think my main point is, k8s glue on aks or gke basically takes care of spinning new nodes, as the k8s service does that. Aws autoscaler is kind of a replacement , make sense?
Okay let me check if I can test on this git version.
trains[azure] give you the possibility to do the following:from trains import StorageManager my_local_cached_file = StorageManager.get_local_copy('azure://bucket/folder/file.bin')
This means you do not have to manually download stuff/ and maintain the cache local cache, the StorageManager will do that for you.
If you do no need that ability, no need to install the trains[azure]
you can just install trains
Unfortunately, we haven't had the time to upgrade to the Azure storage v...
Hi ObnoxiousStork61
Is it possible to report ie. validation scalars but shifted by 1/2 iteration?
No ๐ these are integers
What's the reason for the shift?
I'm also curious ๐
Okay, so I can't figure why it would "kill" the new experiments, I mean it should run them, but is there any "smart stopping" that causes it to kill he process before it ends ?
BTW: can this be reproduced with the clearml hydra example ?
Hi @<1634001100262608896:profile|LazyAlligator31>
Is this because the code repo is being recreated in this directory?
Yes this is correct ๐
Basically the entire code base + venv is installed there, to make sure it does not intyerfere with the "system" preinstalled environment
(it also allows for caching on the host machine ๐ )
Hi @<1661542579272945664:profile|SaltySpider22>
question 1: are parallel writes to a dataset with the same version possible?
When you are saying parallel what do you mean? from multiple machines ?
Whats the recommended way to append the dataset in a future version?
Once a dataset was finalized the only way to add files is to add another version that inherits from the previous one (i.e. the finalized version becomes the parent of the new version)
If you are worried about multip...
Iโd definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense, any chance you can open a github issue with feature request so that we do not forget ?
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, it should be cached when accessed the 2nd time
Wouldnโt it be more performant f...
Sorry, I mean a vault on the clearml-server holding the credentials per user, then agent pulls it based on the user, and it is transparent from the user perspective
I still can't get it to work... I couldn't figure out how can I change the clearml version in the runtime of the Cleanup Service as I'm not in control of the agent that executes it
Let's take a step back. Let's remove the clearml-services from the docker compose for a second, and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back to the docker compose, make sense ?
In the agent, no, it pipes stdout/stderr of the container and logs everything ๐
to get a json or something like that?
There is an api to get all the console logs, is this what you are after?
, but it seems like I can only trigger a task using a Task scheduler but not a pipeline.
@<1523701132025663488:profile|SlimyElephant79> Maybe we should better state it, but Pipeline is "just" another type of Task. so triggering a Task with the Pipeline ID is essentially triggering the pipeline (do notice you need to select the "services" queue to be used so that the pipeline runs on the correct resource). Make sense ?
OhTask.get_project_object().default_output_destination = None
This has no effect on the backend, meaning this does not actually change the value.from clearml.backend_api.session.client import APIClient c = APIClient() c.projects.update(project="<project_id_here>", default_output_destination="s3://")
btw: how/what it is used for in your workflow ?
In Windows settingย
system_site_packages
ย toย
true
ย allowed all stages in pipeline to start - but doesn't work in Lunux.
Notice that it will inherit from the system packages not the venv the agent is installed in
I've deleted tfrecords from master branch and commit the removal, and set the folder for tfrecords to be ignored in .gitignore. Trying to find, which changes are considered to be uncommited.
you can run git diff
it is essentially...
Yes it should
here is fastai example, just in case ๐
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
Hi StrangePelican34
What exactly I not working? Are you getting any TB reports?
Looking at theย
supervisor
ย method of the baseย
AutoScaler
ย class, where are the worker IDs kept.
Is it in the class attributeย
queues
ย ?
Actually the supervisor is passing a fixed prefix, then it asks the clearml-server on workers starting with this name.
This way we can have a fixed init script for all agents, while we still can differentiate them from the other agent instances in the system. Make sense ?
What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?
I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?
Also I just tried the pytorch-lightningย
RichProgressBar
ย (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Yey!
Hi SarcasticSparrow10
The plots in the UI allow you to control the colors of the graphs interactively (click on the color in the legend), it also allows you you toggle the legend on/off. This is on purpose so you can later adjust according to your taste ๐
Is the layout okay (it was hard for me to understand form the screen-grab) ?
I'll make sure to reply the GitHub issue as well
ScantMoth28 where are you seeing this warning ?
so I didn't have much time to upgrade all the packs because I have some issues with that but it is on my todo list
No worries ๐
Quick question, if you run https://github.com/allegroai/trains/blob/master/examples/frameworks/keras/legacy/keras_tensorboard.py
Do you see models in the artifacts tab?
Hi MelancholyElk85
I think you are right, OutputModel is missing, remove
method.
Maybe we should have a class method on Model , something like:@classmethod Model.remove(model: Union[str, Model], delete_weights_file: bool, force: bool): # actually remove model and weights file
wdyt?
CrookedWalrus33 I'm testing with the latest RC on a local minio and this is what I'm getting:clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_3by281j8.tmp => 10.99.0.188:9000/bucket/debug/PyTorch MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar clearml.Task - INFO - Waiting to finish uploads clearml.Task - INFO - Completed model upload to
MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar clearml.Task - INFO - Finished uploading
e...
Task.connect is "automagic" i.e. to server when in Manual mode, from server in agent mode,
set_parameter is one way only and should be used to set an external Task's parameters.
Hmm, it is not returned, it is inside the function....
I saw documentation, but I can't make the proper dict object for hyperparams
I see, this is what you are after (I think)
https://github.com/allegroai/clearml/blob/fb644fe9ec6be36b8f2f70a34256fbdc593d663a/clearml/backend_api/services/v2_20/tasks.py#L3138