FYI:ssh -R 8080:localhost:8080 -R 8008:localhost:8008 -R 8081:localhost:8081 replace_with_username@ubuntu_ip_here
solved the issue š
at the end it's just another env var
It should work GIT_SSH_COMMAND
is used by pip
https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console
Hmm try to set this one before spinning the agent
Windowsset PYTHONIOENCODING=:replace
Inside Colabos.environ["PYTHONIOENCODING"] = ":replace"
ConvolutedSealion94 try scikit
not scikitlearn
I think we should add a warning if a key is there and is being ignored... let me make sure of that
I just disabled all of them with
auto_connect_frameworks=False
Yep that also works
Looking at theĀ
supervisor
Ā method of the baseĀ
AutoScaler
Ā class, where are the worker IDs kept.
Is it in the class attributeĀ
queues
Ā ?
Actually the supervisor is passing a fixed prefix, then it asks the clearml-server on workers starting with this name.
This way we can have a fixed init script for all agents, while we still can differentiate them from the other agent instances in the system. Make sense ?
Hi JitteryCoyote63
The easiest is to inherit the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as auto_resource_monitoring
Hmm this is odd in deed, let me verify (thanks! @<1643060801088524288:profile|HarebrainedOstrich43> )
do you suggest to delete those first?
it might make it easier on the server (I think there is some bug there when it deleted many tasks it tries to parallelize the delete process, but fails to properly sync, anyhow this is fixed and will be pushed with the next clearml-server version)
but can it NOT use /tmp for this iām merging about 100GB
You mean to configure your Temp folder for when squashing ?
you can do hack the following:
` import tempfile
tempfile.tempdir = "/my/new/temp"
Dataset squash
tempfile.tempdir = None `But regradless I think this is worth a GitHub issue with feature request, to set the temp folder///
That seems like the k8s routing, can you try the web server curl?
This is what I just used:
` import os
from argparse import ArgumentParser
from tensorflow.keras import utils as np_utils
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation, Dense, Softmax
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from clearml import Task
parser = ArgumentParser()
parser.add_argument('--output-uri', type=str, required=False)
args =...
Hmm SuccessfulKoala55 what do you think?
So a bit of explanation on how conda is supported. First conda is not recommended, reason is, is it very easy to create a setup on conda that is un-reproducible by conda (yes, exactly that). So what trains-agent does, it tries to install all the packages it can first with conda (not one by one, because that will break conda dependencies), then the packages that it failed to install from conda, it will install using pip.
ComfortableShark77 are you saying you need "transformers" in the serving container?CLEARML_EXTRA_PYTHON_PACKAGES: "transformers==x.y"
https://github.com/allegroai/clearml-serving/blob/6005e238cac6f7fa7406d7276a5662791ccc6c55/docker/docker-compose.yml#L97
PunySquid88 RC1 is out with a fix:pip install trains-agent==0.14.2rc1
with conda ?!
I think it was just pushed, including nested call you have to use the new argument for the decorator, helper_function
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L2392
I see the problem now: conda is failing to install the package from the git, then it reverts to pip install, and pip just fails... " //github.com/ajliu/pytorch_baselines "
See the last package in the package list:
- wget~=3.2
- trains~=0.14.1
- pybullet~=2.6.5
- gym-cartpole-swingup~=0.0.4
- //github.com/ajliu/pytorch_baselines
PunySquid88 do you want to test a fix?
BTW: seems like conda doesn't support git+git:// packages
How about switching to pip ? you can still run the entire thing from conda env, it will just use pip & venv to install everything, other than that it should work as expected.
This is why we recommend using pip and not conda ...
PunySquid88 after removing the "//gihub" package is it working ?
@<1545216070686609408:profile|EnthusiasticCow4>git+ssh://
will be converted automatically to git+https
if you have user/pass ocnfigured in your clearml.conf on the agent machine.
More over, git packages are always installed After all other packages are installed (because pip cannot resolve the requirements inside the git repo in time)
understood trains does not have auto versioning
What do you mean auto versioning ?
task name is not unique, task ID is unique, you can have multiple tasks with the same name and you can edit the name post execution
EnviousStarfish54 could you send the conda / pip environment?
Maybe that's the diff between machine A/B ?
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts? or is it autologged & uploaded ?
Also why no reuse and overwrite older checkpoints ?
Hi @<1547028116780617728:profile|TimelyRabbit96>
Trying to do model inference on a video, so first step in
Preprocess
class is to extract frames.
Basically this depends on the RestAPI, usually would will be sending a link to data to be processed and returned Synchronously
What you should have a custom endpoint doing the extraction, send Raw data into another endpoint doing the model inference, basically think "pipeline" end points:
[None](https://github.com/allegro...
Are you saying this component should pull a specific git repo?PipelineDecorator.component( ..., )
seems like there is no reference to a specific repo (arguments repo
and repo_branch
etc are missing) is that correct?
Hi ObedientDolphin41
I keep bumping against the
ModuleNotFoundError: No module named
exception.
Import the package inside the component function (the one you decorated), it will make sure it lists it in the requirements section automatically.
You can also set it manually by passing it to as the "packages" argument on the decorator function: