send the agent's logs to log management and monitoring service,
These are stored in ELK, which was built to store large amounts of logs. I cannot see any reason why one would want to remove them?
Maybe if there would be a way to change their format, it could also help filtering them from my side.
You mean in the UI?
PompousBeetle71 so in one project the experiment works as expected, while in the other it fails on credentials ? both running on the same trains-agent machine ?
Hi MysteriousBee56 , do you have Trains installed from the git?
Another question, you mentioned "it breaks my execution", I'm assuming you mean trains-agent?!
If that is the case, there is a fix for trains-agent install 0.15.2rc0
YummyFish22 can you point to the huggingface example you are using?
Hi PanickyMoth78
Yes, I think you are correct, this looks like GS throttling your connection. You can control the number of concurrent uploads with max_workers=1
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/datasets/dataset.py#L604
Let me know if it works
so 78000 entries ...
wow, a lot! Would it make sense to do 1GB chunks? Any reason for the initial 1MB chunk size?
Are there any references (vlog/blog) on deploying a real-time model and running a continuous training pipeline in ClearML?
Something along the lines of this one ?
https://clear.ml/blog/creating-a-fully-automatic-retraining-loop-using-clearml-data/
Or this one?
https://www.youtube.com/watch?v=uNB6FKIi8Wg
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created and continue the study
If this is what the repo links look like, do not set anything in the clearml.conf
It "should" use the ssh for the ssh links, and http for the http links.
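To make the ssh/http distinction concrete, here is a small sketch that classifies a repo link by its form. It is only an illustration: `git_link_protocol` is a hypothetical helper, not the agent's actual logic, and the example URLs are placeholders.

```python
from urllib.parse import urlparse


def git_link_protocol(url):
    """Classify a git remote URL as 'ssh', 'http', or 'unknown' (illustrative only)."""
    scheme = urlparse(url).scheme
    if scheme in ("http", "https"):
        return "http"
    if scheme == "ssh":
        return "ssh"
    # scp-like syntax has no scheme, e.g. git@github.com:user/repo.git
    if "://" not in url and "@" in url and ":" in url.split("@", 1)[1]:
        return "ssh"
    return "unknown"


for link in (
    "https://github.com/allegroai/clearml.git",
    "git@github.com:allegroai/clearml.git",
    "ssh://git@github.com/allegroai/clearml.git",
):
    print(link, "->", git_link_protocol(link))
```

If all your links fall into one of the two groups as expected, no extra setting should be needed.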
Hmm, check if this one works:
optimizer._get_child_tasks_ids(parent_task_id=optimizer._job_parent_id or optimizer._base_task_id, order_by=optimizer._objective_metric._get_last_metrics_encode_field(), additional_filters={'page_size': int(top_k), 'page': 0})
If it does, let's PR it as a dedicated function
AverageBee39 I cannot reproduce it (at least on the latest from GitHub)
I'm assuming the pipeline is created with target_project, anything else I need to add?
you can also get it flattened with:
task.get_parameters()
Type in both cases is string
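To picture what "flattened" and "string" mean here, a minimal stand-alone sketch. It only mimics the shape of the result (section/key names mapped to string values); `flatten_parameters` and the "General" default section are assumptions for illustration, not ClearML's own code.

```python
def flatten_parameters(params, section="General", sep="/"):
    """Flatten a nested dict into {'Section/key/subkey': 'value-as-string'}."""
    flat = {}

    def _walk(prefix, value):
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the key path
            for key, child in value.items():
                _walk(f"{prefix}{sep}{key}", child)
        else:
            # Leaf values are stored as strings, whatever their original type
            flat[prefix] = str(value)

    _walk(section, params)
    return flat


params = {"lr": 0.01, "model": {"layers": 3, "name": "resnet"}}
flat = flatten_parameters(params)
print(flat)
```

So if you need the original types back, you have to cast the values yourself after reading them.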
I can probably have a python script that checks if there are any tasks running/pending, and if not, run docker-compose down to stop the clearml-server, then use boto3 to trigger the creating of a snapshot of the EBS, then wait until it is finished, then restarts the clearml-server, wdyt?
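A rough sketch of that backup flow, for illustration only. The four hooks are hypothetical names, not ClearML or AWS APIs: in a real script `count_active_tasks` might query the server for running/pending tasks, `stop_server` and `start_server` might shell out to docker-compose, and `snapshot_volume` might create the EBS snapshot with boto3 and block until it completes. The fakes at the bottom only demonstrate the call order.

```python
import time


def backup_server(count_active_tasks, stop_server, snapshot_volume, start_server,
                  poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Wait until no tasks are active, then stop -> snapshot -> restart.

    Returns True if the backup ran, False if we gave up waiting for idle.
    All four callables are supplied by the caller (hypothetical hooks).
    """
    polls = 0
    while count_active_tasks() > 0:
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False  # server never became idle, skip this backup
        sleep(poll_seconds)

    stop_server()
    try:
        # Expected to block until the snapshot is finished
        snapshot_volume()
    finally:
        # Always bring the server back up, even if the snapshot failed
        start_server()
    return True


# Demo with fake hooks that just record what was called
calls = []
done = backup_server(
    count_active_tasks=lambda: 0,
    stop_server=lambda: calls.append("stop"),
    snapshot_volume=lambda: calls.append("snapshot"),
    start_server=lambda: calls.append("start"),
)
print(done, calls)
```

The try/finally is the main design choice: a failed snapshot should not leave the server down.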
I'm pretty sure there is a nice way, let me check something
Hmm, so if I understand what's going on, convert_test.py needs to have the test.json. Since it creates the test.json but does not call git add on it, the test.json will not be part of the git diff, hence it is missing when executing remotely by the agent.
If test.json is relatively small (i.e. not 10s of MB), you could store it as a configuration on the Task. For example:
` local_copy_of_test_json = task.connect_configuration('/path/to/test.json', name='test config')
print(...
is "my_package" a local package ?
what is the output of:
pip freeze | grep my_package
PompousBeetle71, basically resetting an experiment will clear all the outputs, and the input model is, well, input, so it is not cleared. In the next execution it will be overridden. There is actually a way to change it from the UI and override the initial model weights.
MysteriousBee56, the agent is not running on the "server", it's running on its own machine.
The server just reflects the fact that the agent is up.
To actually take it down you need to SSH (or connect to that machine) and stop the actual trains-agent process.
What is exactly the scenario you had in mind?
Hi CloudyHamster42
how do I have the trains-agent install my requirements.txt file from my repo when creating the environment?
BTW if you clear all "the installed packages", then trains-agent will use requirements.txt and update back all the packages in the UI
Please hit Ctrl-F5 to refresh the entire page, and see if it is still empty...
Hi UnsightlyShark53 apologies for this delayed reply, slack doesn't alert users unless you add @ , so things sometimes get lost :(
I think you pointed at the correct culprit...
Did you manage to overcome the circular include?
BTW, how could I reproduce it? It would be nice if we could solve it
Hi MassiveBat21
CLEARML_AGENT_GIT_USER is actually git personal token
The easiest is to have a read only user/token for all the projects.
Another option is to use the ClearML vault (unfortunately not part of the open source) to automatically pick up this configuration on a per-user basis.
wdyt?
AstonishingSeaturtle47 I think there's a workaround for the GitHub multiple repo issue. See https://gist.github.com/gubatron/d96594d982c5043be6d4
Hi VivaciousBadger56
You should replace
task.mark_completed()
with:
task.close()
To your point
parameters = task.connect(parameters)
Will be retrieved with:
task.get_parameters()
fyi:
connect_configuration -> get_configuration_objects
I have a question about running code on the remote machine. Each time I run the code, I see the console in the ClearML server start downloading all the libraries I used, and when I run another script the same thing happens. Why does it have to download all the libraries again every time?
I'm assuming you are referring to the installation; the downloaded Python packages are cached.
You can turn on full caching by uncommenting the following line:
https://github.com/alleg...
basically the idea is you do not need to configure the Experiment manually, it is created when you actually develop the code / run/debug it, or you have the CLI taking everything from your machine and populating it
Is there any documentation on versioning for Datasets?
You mean how to select the version name ?
Hi PanickyMoth78
My local clearml.conf file has agent's git_user and git_pass defined as in my
In order for the autoscaler to access your git, you have to provide the git user/token in the wizard
The component agent's log has:
Executing task id [90de043e354b4b28a84d5cc0788fe63c]: repository = branch = version_num =
Hmm, what does the decorator of the component look like? Meaning, did you specify a repo/branch/commi...