In Azure VMSS, there is a method called "Custom Data", which is basically a way of passing things to be executed
I know that adding an "azure_autoscaler" is on the to-do list; it would basically be a sibling of the aws_autoscaler.
It uses the same idea of the "custom data" as an initial bash script:
You can check here:
https://github.com/allegroai/clearml/blob/4a2099b53c09d1feaf0e079092c9e075b43df7d2/clearml/automation/aws_auto_scaler.py#L54
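As a rough illustration of the idea (hypothetical helper names and script contents, not the actual clearml API): the autoscaler templates a bash startup script and hands it to the cloud provider as user/custom data, so the fresh VM installs and starts an agent on boot.
```python
# Illustrative sketch only: the "custom data as initial bash script" idea.
# The template and build_custom_data() are hypothetical, not the clearml API.
STARTUP_SCRIPT_TEMPLATE = """#!/bin/bash
export CLEARML_API_ACCESS_KEY={access_key}
export CLEARML_API_SECRET_KEY={secret_key}
python3 -m pip install clearml-agent
clearml-agent daemon --queue {queue} --docker
"""

def build_custom_data(access_key: str, secret_key: str, queue: str) -> str:
    # This string is what would be passed as Azure VMSS "Custom Data"
    # (or EC2 user-data); it runs once when the instance boots.
    return STARTUP_SCRIPT_TEMPLATE.format(
        access_key=access_key, secret_key=secret_key, queue=queue
    )
```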
Hey WickedGoat98
I found the bug: the numpy data (passed to plotly) contains both datetime and NaN values, and plotly.js does not like that. I'll make sure this is fixed; in the meantime you can just remove the first row (it contains the NaN):
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
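For context, a self-contained sketch of the workaround (the ticker frames here are stand-ins for whatever you downloaded, e.g. with yfinance):
```python
import pandas as pd

# stand-in for the downloaded ticker data
tickerDf = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.date_range("2021-01-01", periods=3),
)
# percent change leaves a NaN in the first row
tickerDf_Change = tickerDf.pct_change().rename(columns={"Close": "Close_pcent"})

# plotly.js does not handle the mix of datetimes and NaN well,
# so drop the first row before reporting/plotting
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
```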
we run in containers without venv, in the main section, and then delete it or use it for similar experiments
If this is the case, then the idea is that the venv creation is actually cached; you can turn it on here (unmark the line):
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116
What's strange is that the remote jobs, as soon as they are launched, if I compare their configs while in state pending, they all have the right (different) configs, but later, while running,
Wait, I think I found it. Since usually with hydra you configure everything from overrides / config, when launched remotely it looks at that by default. But with the launch plugin it should be overwritten with the Task:
` task = Task.init(...)
task.set_parameter(name="Hydra/_allow_omegaconf_ed...
Hi @<1691620877822595072:profile|FlutteringMouse14>
Yes, feast has been integrated by at least a couple of users, if I remember correctly.
Basically there are two ways: offline and online feature transformation. For offline, your pipeline is exactly what would be recommended. The main difference is online transformation, where I think feast is a great start.
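For reference, a hedged sketch of the two feast retrieval paths (the feature names and entity keys are made up, and the exact call signatures may differ between feast versions):
```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # a feast repo initialised with `feast apply`

# offline: build a point-in-time correct training frame to feed the pipeline
entity_df = pd.DataFrame(
    {"driver_id": [1001], "event_timestamp": [pd.Timestamp("2023-01-01", tz="UTC")]}
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_stats:avg_trips"],   # made-up feature name
).to_df()

# online: fetch the latest feature values at serving time
online = store.get_online_features(
    features=["driver_stats:avg_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```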
OutrageousGiraffe8 this sounds like a bug, how can we reproduce it?
Maybe add another layer here?
https://github.com/allegroai/clearml/blob/a47f127679ebf5912690f7c3e60791a2daa5c984/examples/frameworks/tensorflow/tensorflow_mnist.py#L40
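If it helps, a hedged sketch of what "another layer" could look like in a Keras MNIST model (the surrounding model definition is paraphrased, not copied from the linked example):
```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    # the extra layer suggested above
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
])
```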
Using agent v1.01r1 in k8s glue.
I think a fix was recently committed, let me check it
Hi @<1598487094601191424:profile|MysteriousCow84>
only one of them uses an already created venv from cache for this task. And the other node starts to re-create the same virtual environment.
Just to be clear: the second one is running, but it does not use the same venv as the other one (that is running in parallel), is that correct?
Hi @<1561885941545570304:profile|PunyKangaroo87>
What do you mean by storing data locally?
Like clearml-data? I.e. a Dataset?
You can always use file:///root/path/folder as the destination; this will store everything in the local folder, is that it?
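A minimal sketch of pointing a Dataset at a local folder destination (the paths and names here are illustrative):
```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="local-stored")
ds.add_files("/data/my_files")
# everything is written under the local folder instead of a remote file server
ds.upload(output_url="file:///root/path/folder")
ds.finalize()
```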
Okay, this is odd, the request returned exactly 100 out of 100.
It seems not all of them were reported?!
Could you post the toy code? I'll check what's going on.
although ideally i'd like to tell it exactly where to unzip it.
Ohh you can use .get_local_mutable_copy()
It will unzip it to a specific folder.
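A short sketch of the idea (the dataset name is illustrative; note that in recent clearml releases the method is spelled get_mutable_local_copy and takes the target folder):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
# copies and unzips the dataset contents into the folder you choose
local_folder = ds.get_mutable_local_copy("/data/unzipped_here")
```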
Hi @<1545216070686609408:profile|EnthusiasticCow4> let me know if this one solves the issue
pip install clearml==1.14.2rc0
You mean I can do Epoch001/ and Epoch002/ to split them into groups and have a 100 limit per group?
Yes, then the 100 limit is per "Epoch001" and another 100 limit for "Epoch002", etc.
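If this is about reported debug images, a hedged sketch of the grouping idea using distinct titles (project, task and series names are illustrative):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="grouped debug samples")
logger = task.get_logger()

img = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
# each distinct title ("Epoch001", "Epoch002", ...) gets its own sample limit
logger.report_image(title="Epoch001", series="sample", iteration=0, image=img)
logger.report_image(title="Epoch002", series="sample", iteration=0, image=img)
```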
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
Yes, that should work. The only thing is you need to call Task.init on the master process (and make sure you call Task.current_task() in the subprocesses, if you want the automagic to kick in). That said, usually there is no need, they are supposed to report everything back to the main one anyhow.
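A minimal sketch of that setup (plain multiprocessing stands in for the actual distributed launcher, and the project/task names are illustrative):
```python
from multiprocessing import Process
from clearml import Task

def worker(rank: int):
    # subprocesses pick up the Task created by the master process
    task = Task.current_task()
    if task is not None:
        task.get_logger().report_scalar("rank", str(rank), value=1.0, iteration=0)

if __name__ == "__main__":
    # Task.init only on the master process
    task = Task.init(project_name="examples", task_name="distributed sketch")
    processes = [Process(target=worker, args=(r,)) for r in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```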
basically
` @call_parse
def main(
    gpus:Param("The GPUs to use for distributed training", str)='all',
    script:Param("Script to run", str, opt=False)='',
    args:Param("Args to pass to script", nargs=...
CleanWhale17 per your request :)
An automated ML Pipeline
Automated Data Source Integration
Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif) [Allegro Enterprise] or users integrate with open-source
Storage of Annotation output files (versioned JSON)
Online-Training Support (for Dataset Shifts) [Not sure what you mean]
Data Pre-processing (filter/augment) [Allegro Enterprise] or users integrate with open-source
Data-set visualization (stats...
Right, so this "vault" design is built into the paid tiers of ClearML to achieve exactly that. Long story short, users can put their credentials/configs on the clearml-server and the agent (or the clients) will pull and merge them into the execution.
It's very cool and works really nice, but not part of the open source (or the SaaS tier).
What you could do is store these configurations on the Task itself (one way or another). Maybe for example have an empty definitions.py file part of ...
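For the open-source route, a hedged sketch of keeping such a configuration on the Task itself (file name and project/task names are illustrative, following the empty definitions.py idea above):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config on task")

# attach a config blob to the Task; when the Task runs remotely,
# connect_configuration returns the (possibly edited) server-side copy
config_path = task.connect_configuration("definitions.py", name="definitions")
```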
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
Hmm interesting, will pass it along to FE. 3. That is nice! I wonder if this is built into the graph library.
GiddyTurkey39 do you mean to delete them from the server?
yea the api server configuration also went away
okay that proves it
somehow set docker_args and docker_bash_setup_script equivalent?
task.set_base_docker(...)
# somehow setup repo and branch to download to remote instance before running
This is automatically detected based on your local commit/branch as well as uncommitted changes.
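A hedged sketch of setting the docker arguments and bash setup script on the Task (image, args and script contents are illustrative; double-check the keyword names against your clearml version):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker setup sketch")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_arguments="--ipc=host -e MY_ENV=1",
    docker_setup_bash_script=["apt-get update", "apt-get install -y ffmpeg"],
)
```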
And the agent is in docker mode or venv mode?
(you can find it in the pipeline component page)
Sure GiddyTurkey39, check out the cleanup service:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
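The linked service does this on a schedule; at its core it is a query-and-delete loop over old Tasks. A rough sketch of that loop (filters are illustrative; the real service also checks task age and cleans up artifacts):
```python
from clearml import Task

# find completed tasks in a project (filter fields are illustrative)
old_tasks = Task.get_tasks(
    project_name="examples",
    task_filter={"status": ["completed"]},
)
for t in old_tasks:
    # delete() removes the task; see the linked service for age thresholds
    t.delete()
```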
Hi @<1551376687504035840:profile|StraightSealion9>
AWS Autoscaler to create a new instance when you enqueue a task to the relevant queue.
Does that mean that you were able to enqueue a Task and have it launch on the remote EC2 machine ?