Hi BattyLion34
No problem asking here 🙂
Check your ~/clearml.conf or ~/trains.conf:
There is a section named api, under it you will find the definition of your trains-server 🙂
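For reference, the api section of a self-hosted setup typically looks something like this (the addresses and keys below are just placeholders, adjust to your own server):
```
api {
    # addresses of your clearml / trains server
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        access_key: "YOUR_ACCESS_KEY"
        secret_key: "YOUR_SECRET_KEY"
    }
}
```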
What are you seeing?
So I see this in the build, which means it works and compiles, so what is missing?
```
Building wheels for collected packages: leap
  Building wheel for leap (setup.py) ... done
  Created wheel for leap: filename=leap-0.4.1-cp38-cp38-linux_x86_64.whl size=1052746 sha256=1dcffa8da97522b2611f7b3e18ef4847f8938610180132a75fd9369f7cbcf0b6
  Stored in directory: /root/.cache/pip/wheels/b4/0c/2c/37102da47f10c22620075914c8bb4a9a2b1f858263021...
```
CourageousLizard33 so you have a Linux server running an Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM, it's reasonable enough to start with. I'll make sure we add it to the docs
Hi @<1523701949617147904:profile|PricklyRaven28>
Sorry, we missed that one
we need to invoke it with `accelerate launch`, so we use `subprocess.run`
So you have two options: either you change the script entry of the Task from your "script.py" to "-m accelerate launch script.py",
or you manually do that inside your entry point (i.e. call accelerate launch), as sketched below.
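A rough sketch of the second option, assuming a hypothetical train.py is the script accelerate should launch:
```python
import subprocess
import sys

# Launch the actual training script through `accelerate launch`,
# forwarding any command line arguments. `train.py` is a placeholder.
subprocess.run(["accelerate", "launch", "train.py", *sys.argv[1:]], check=True)
```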
BTW, I "think" we added an "auto detect" for it, so that if you launched it manually this wa...
Hi @<1726047624538099712:profile|WorriedSwan6>
On a different issue, have you any solution on how to make the agent listen to multiple queues?
Each agent is connected with one type of queue, representing the type of Job that agent will create. You can connect multiple queues to it, and it will pull from all of them, creating the same "type" of job regardless of which queue it is coming from. If you want another kind of job to be created, just spin up another agent, there is no limit to the number of agents you can spin ...
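For example, a single agent can listen on two queues at once (queue names here are placeholders; as far as I recall the first queue listed gets priority):
```
clearml-agent daemon --queue queue_a queue_b
```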
Hi @<1544128915683938304:profile|DepravedBee6>
You mean like backing up the entire instance and restoring it on another machine? Or are you referring to specific data you want to migrate?
BTW if you are upgrading an old version of the server I would recommend upgrading through every version in between (there are some migration scripts that need to be run in a few of them)
Yeah we should definitely have get_requirements 🙂
Yes, I think the API is probably the easiest:
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
project_list = client.projects.get_all()
print(project_list)
```
JitteryCoyote63 Great to hear 🙂
BTW:
Would it be possible to extend Task.init with a force_reuse that would enforce reusing these tasks
You can pass continue_last_task=True, I think it should be equivalent to what you suggest, see the sketch below.
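A minimal sketch (project and task names are placeholders):
```python
from clearml import Task

# Continue logging into the previously created task instead of starting a new one
task = Task.init(
    project_name="examples",      # placeholder
    task_name="my_experiment",    # placeholder
    continue_last_task=True,
)
```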
Oh I get it now, can you test: `git ls-remote --get-url github` and then `git ls-remote --get-url`
I'm sorry wrong line reference:
I'm assuming the error is due to ulimit missing:
try adding 16777216 to both soft/hard ulimit
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58
Hi @<1542316991337992192:profile|AverageMoth57>
is this a follow up of this thread? None
yes, I do, I added an auxiliary_cfg and I saw it immediately both in the CLI and in the web UI
How many Tasks do you see in the UI in DevOps project with the system Tag SERVING-CONTROL-PLANE ?
TBH our Preprocess class has an import in it that points to a file that is not part of the preprocess.py so I have no idea how you think this can work.
ConvolutedSealion94 actually you can add an entire folder as preprocessing, including multiple files
See example des...
Hi ZanyPig66
I used tensorboard as clearml claims to auto-capture tensorboard outputs, but it was a no go.
The auto TB logging should work out of the box, where is it failing ?
Also, regarding `task = Task.current_task()`: why aren't you using Task.init in the original script?
The idea is that you run your code on your machine (where the environment works), ClearML auto detects code + python packages + args etc.
Then you clone it in the UI and launch it on a remote machine.
What am I missing ...
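i.e. a minimal sketch of what the original script would start with (names are placeholders), so ClearML can auto-capture everything:
```python
from clearml import Task

# Auto-detects the repo, uncommitted changes, installed packages, argparse args, TB output, etc.
task = Task.init(project_name="examples", task_name="training_run")  # placeholders
```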
I wonder if the try/except approach would work for XGboost load, could we just try a few classes one after the other?
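Something along these lines, a rough sketch (not the actual ClearML internals; the path is a placeholder):
```python
import xgboost as xgb

def try_load_xgb(path: str):
    """Try a few XGBoost classes one after the other until one loads."""
    for cls in (xgb.Booster, xgb.XGBClassifier, xgb.XGBRegressor):
        try:
            model = cls()
            model.load_model(path)
            return model
        except Exception:
            continue
    raise ValueError(f"Could not load an XGBoost model from {path}")
```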
the second seems like a botocore issue :
https://github.com/boto/botocore/issues/2187
How can I add additional information, e.g. debug samples, or scalars to the data to be shown in the UI? Logger.current_logger() is not working
Yes 🙂
dataset.get_logger() to the rescue
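A minimal sketch (project, name and file paths are placeholders):
```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")  # placeholders
ds.add_files("data/")

# Attach scalars / debug samples so they show up on the dataset's page in the UI
logger = ds.get_logger()
logger.report_scalar(title="stats", series="num_samples", value=1000, iteration=0)
logger.report_image(title="debug samples", series="example", iteration=0, local_path="data/sample.png")

ds.upload()
ds.finalize()
```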
Now, when I add delta to calculate the variation of this: error: bad_data: 1:110: parse error: ranges only allowed for vector selectors
This means your avg is already a scalar (i.e. not a vector) which means you can (as you said) have the alert based on that
LOL, great minds and so on 🙂
Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.
can we use a currently setup virtualenv by any chance?
You mean, does the clearml-agent need to set up a new venv each time? Are you running in docker mode?
(by default it is caching the venv so the second time it is using a precached full venv, installing nothing)
"warm" as you do not need to sync it with the dataset, every time you access the dataset, clearml will make sure it is there in the cache, when you switch to a new dataset the new dataset will be cached. make sense?
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?