Follow-up; any ideas how to avoid PEP 517 with the auto scaler?
Takes a long time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
Nice ! 🙂
BTW: clone=True means creating a copy of the running Task, but there is basically no need for that. With clone=False it will stop the running process and launch it on the remote host, logging everything on the original Task.
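For reference, this is roughly how it looks in code (a minimal sketch; the queue name "default" and project/task names are just placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# clone=False: stop the local process and relaunch this very Task on the agent,
# everything keeps being logged on the original Task (no copy is created)
task.execute_remotely(queue_name="default", clone=False, exit_process=True)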
Can you verify by adding the following to your extra_docker_shell_script:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L152
extra_docker_shell_script: ["echo machine example.com > ~/.netrc", "echo login MY_USERNAME >> ~/.netrc", "echo password MY_PASSWORD >> ~/.netrc"]
DeterminedToad86
Yes, I think this is the issue: on SageMaker a specific compiled version of torchvision was installed (probably as part of the image)
Edit the Task (before enqueuing) and change the torchvision URL to:
torchvision==0.7.0
Let me know if it worked
Regarding resetting it via code: if you need, I can write a few lines for you to do that, although it might be a bit hacky.
Maybe we should just add a flag saying "use requirements.txt"?
What do you think?
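If you prefer forcing a specific package from code instead of editing the Task in the UI, a minimal sketch (note add_requirements has to be called before Task.init; project/task names here are placeholders):
from clearml import Task

# make the agent install this exact torchvision version instead of the auto-detected one
Task.add_requirements("torchvision", "==0.7.0")
task = Task.init(project_name="examples", task_name="sagemaker run")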
CloudyHamster42
RC probably in a few days, but notice that it will just remove the warnings, I still can't reproduce the double axis issue.
It will be helpful if you could send a small script to reproduce the problem.
Maybe this example code can help ? https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py
BTW: Basically just call Task.init(...)
the rest is magic 🙂
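i.e. something like this at the top of your script (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment")
# from here on argparse, matplotlib, tensorboard, frameworks etc. are captured automatically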
Does a pipeline step behave differently?
Are you disabling it in the pipeline step ?
(disabling it for the pipeline Task has no effect on the pipeline steps themselves)
Hi OddShrimp85
Is there anywhere I could get a chart that can work with a lower version of k8s? Or any other methods?
I think the solution is to install it manually from the helm chart (basically take it out and build a Job YAML), wdyt?
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique ?
Yes, but only with git clone 🙂
It is not stored on ClearML, this way you can work with the experiment manager without explicitly giving away all your code 😉
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
task.get_offline_mode_folder()
You can also have a soft link of the offline folder (if you are working on a linux machine):
ln -s myoffline_folder ~/.trains/cache/offline
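To make it concrete, a minimal sketch of the round trip (the copied folder path on the online machine is hypothetical; import_offline_session is the import function mentioned above):
from clearml import Task

# on the offline machine
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... your training code ...
print(task.get_offline_mode_folder())  # copy this folder (or its zip) to the online machine

# on the online machine (a folder or a zip both work)
# Task.import_offline_session("/path/to/copied/offline_folder")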
compression=ZIP_DEFLATED if compression is None else compression
wdyt?
So obviously that is the problem
Correct.
ShaggyHare67 how come the "installed packages" are now empty ?
They should be automatically filled when executing locally?!
Any chance someone mistakenly deleted them?
Regarding the python environment, trains-agent creates a new clean venv for every experiment. If you need, you can set in your trains.conf:
agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c67...
AstonishingSeaturtle47 , makes sense?
RattySeagull0 I think you are correct, python 3.6 is the one installed inside the docker. Is it important to have 3.7? You might need another docker (or change the installation script and install python 3.7 inside)
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
task = Task.init(...)
if task.running_locally():
    # wait for the repo detection and requirements update
    task._wait_for_repo_detection()
    # reset requirements
    task._update_requirements(None)
🙂
Hi TightSheep99
Yes it can, it will upload the meta-data as well as the files (it will also do de-dup and will not upload files that already exist in the dataset, based on the hash of the file content)
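For example (dataset name/project and local path are placeholders), a minimal sketch:
from clearml import Dataset

dataset = Dataset.create(dataset_name="my dataset", dataset_project="datasets")
dataset.add_files(path="/path/to/local/folder")  # de-dup is based on the file content hash
dataset.upload()    # uploads the files + meta-data
dataset.finalize()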
task.connect(model_config)
task.connect(DataAugConfig)
If these are separate dictionaries, you should probably use two sections:
task.connect(model_config, name="model config")
task.connect(DataAugConfig, name="data aug")
It is still getting stuck.
I notice that one of the scalars that gets logged early is logging the epoch while the remaining scalars seem to be iterations because the iteration value is 1355 instead of 26
Wait, so you are seeing some scalars?...
If I were to push the private package to, say, artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
feature is however available in the Enterprise Version as HyperDatasets. Am I correct?
Correct
BTW you could do:
datasets_used = dict(dataset_id="83cfb45cfcbb4a8293ed9f14a2c562c0")
task.connect(datasets_used, name='datasets')
from clearml import Dataset
dataset_path = Dataset.get(dataset_id=datasets_used['dataset_id']).get_local_copy()
This will ensure that not only do you have a new section called "datasets" on the Task's configuration, but you will also be able to replace the datase...
See here the docker_setup_bash_script argument.
It will be executed (no need for the #!/bin/bash, btw) before starting to set up the env inside the container, so apt-get and the like can be executed if needed. Notice that if this is something that always needs to be executed, you can put the same list of commands here: [None](https://github.com/allegroai/clearml-agen...
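From code it would look roughly like this (image name and apt packages are placeholders; to my understanding docker_setup_bash_script accepts a list of shell lines):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker run")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    # executed inside the container before the environment setup starts
    docker_setup_bash_script=["apt-get update", "apt-get install -y libgl1"],
)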
EnviousStarfish54
plt.show will capture the figure, and if you call it multiple times, it will add a running number to the figure itself (because the figure might change, and you might want the history)
if you call plt.imshow, it's the equivalent of debug image, hence it will be shown in the debug-samples tab, as an image.
Make sense ?
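A tiny example of the difference (assuming Task.init was already called):
import numpy as np
import matplotlib.pyplot as plt

plt.plot([1, 2, 3])
plt.show()                       # captured as a plot, with a running figure number

plt.imshow(np.random.rand(32, 32))
plt.show()                       # captured as a debug image (debug-samples tab)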
SteadyFox10 With pleasure 🙂
BTW: you can retrieve the Task id from its name with:
Task.get_tasks(project_name='my project', task_name='my task name')
See https://allegro.ai/docs/task.html?highlight=get_tasks#trains.task.Task.get_tasks
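e.g. to grab the id itself:
from clearml import Task

tasks = Task.get_tasks(project_name='my project', task_name='my task name')
task_id = tasks[0].id  # get_tasks returns a list of matching Task objects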
What's the Windows version, python version, clearml version, you are using ?
That depends on the HPO algorithm; basically they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on previous sets of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
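For illustration, the concurrency limit sits on the optimizer itself; a minimal sketch (base task id, the parameter range and the metric names are placeholders):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

task = Task.init(project_name="HPO", task_name="optimizer", task_type=Task.TaskTypes.optimizer)
optimizer = HyperParameterOptimizer(
    base_task_id="<base task id>",
    hyper_parameters=[UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=4,  # never more than 4 experiments queued/running at once
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()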
I have a timeseries dataset with dimension 1,60,1, where the first dimension is the number of data points and the second one is the timestep
I think it should be --input-size 1 60 if the last dimension is the batch size?
(BTW: this goes directly to Triton configuration, it is the information Triton needs in order to run the model itself)
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?