Is it not possible to serve a model with preprocessing pipeline from scikit-learn using clearml-serving?
of course it is, did you first try the example here: None
If you need to run your own LogisticRegression
call you can use this example:
None
Notice this is where the custom endpoint actually calls the prediction: [None](https...
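If it helps, this is roughly what the preprocessing module of a custom sklearn endpoint looks like. A sketch from memory of the clearml-serving examples; treat the method signatures as assumptions and check them against your clearml-serving version:
# preprocess.py (sketch)
from typing import Any
import joblib  # assuming the sklearn Pipeline was saved with joblib

class Preprocess(object):
    def __init__(self):
        # created once per serving endpoint
        self._pipeline = None

    def load(self, local_file_name: str) -> Any:
        # receives the downloaded model file; loading the full sklearn Pipeline
        # means the preprocessing steps ship together with the model
        self._pipeline = joblib.load(local_file_name)
        return self._pipeline

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # turn the request json into the feature matrix the pipeline expects
        return [body.get("features", [])]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # this is where the custom endpoint actually calls the prediction
        return self._pipeline.predict(data).tolist()

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # shape the prediction into a json-serializable response
        return {"prediction": data}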
RoundMosquito25 are you using clearml-agent daemon --stop
or are you killing them ?
killing them basically means you lose them in the UI when they timeout, the backend does not see them for 10min so it assumes they died; when you call clearml-agent --stop they will unregister themselves and disappear immediately
@<1615519322766053376:profile|DrainedOctopus19> if your code is a single file (which was stored on the clearml server), then it is stored on the Task:
from clearml import Task

task = Task.get_task("task UID here")
# this should be your entire code
print(task.data.script.diff)
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes you are correct. kubeflow and snakemake for that matter, are all about DAGs where each node is running a docker (bash) for you. The missing portions (for both) are:
How do I cr...
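For the DAG part itself, here is a rough sketch of how a clearml pipeline wires existing Tasks into a graph (project/task names are made up, and the exact constructor arguments depend on your clearml version):
from clearml.automation.controller import PipelineController

# build a small DAG out of already-existing Tasks
pipe = PipelineController(name="etl-train-eval", project="examples", version="0.1")
pipe.add_step(name="etl", base_task_project="examples", base_task_name="etl task")
pipe.add_step(name="train", parents=["etl"],
              base_task_project="examples", base_task_name="train task")
pipe.add_step(name="eval", parents=["train"],
              base_task_project="examples", base_task_name="eval task")
# the controller itself is enqueued and executed by an agent (by default on the "services" queue)
pipe.start()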
If I checkout/download dataset D on a new machine, it will have to download/extract 15GB worth of data instead of 3GB, right? At least I cannot imagine how you would extract the 3GB of individual files out of zip archives on S3.
Yes, I'm not sure there is an interface to extract only partial files from the zip (although worth checking).
I also remember there is a GitHub issue with uploading a 50GB dataset, and the bottom line is, we should support setting chunk size, so that we can uploa...
Hi EagerOtter28
Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...
Use Dataset.sync (or clearml-data sync) to check which files were changed/added.
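Something along these lines (dataset names and the local folder are assumptions):
from clearml import Dataset

# create dataset B on top of A, so only the diff is uploaded
parent = Dataset.get(dataset_project="examples", dataset_name="dataset_A")
child = Dataset.create(dataset_project="examples", dataset_name="dataset_B",
                       parent_datasets=[parent.id])
# sync_folder compares the local folder against the parent's file listing (hashes)
# and only adds/removes what actually changed
child.sync_folder(local_path="/data/images")
child.upload()
child.finalize()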
All files are already hashed, right? I wonder why
clearml-data
does not keep files in a semi-flat hierarchy and groups them together to datasets?
It kind of does, it has a full listing of all the files with their hash (SHA2) values, ...
Hi ConvincingSwan15
A few background questions:
Where is the code that we want to optimize? Do you already have a Task of that code executed?
"find my learning script"
Could you elaborate? Is this connected to the first question?
Hi @<1552101447716311040:profile|SteadySeahorse58>
ValueError: Could not find queue named "services"
Did you set up an agent / auto-scaler? Where will the pipeline and its components be running?
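In case it helps, a small sketch of what that error usually means in code: the pipeline controller itself is enqueued to the "services" queue by default, so either run an agent on that queue or point the controller at a queue you do have (queue names below are assumptions):
from clearml.automation.controller import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="0.1")
# the pipeline components run on whatever execution queue the steps use
pipe.set_default_execution_queue("default")
# redirect the controller itself away from "services" if no agent serves that queue
pipe.start(queue="default")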
It seems like there is no way to define that a Task requires docker support from an agent, right?
Correct, basically the idea is you either have workers working in venv mode or docker.
If you have a mixture of the two, then you can have the venv agents pulling from one queue (say default_venv) and the docker mode agents pulling from a different queue (say default_docker). This way you always know what you are getting when you enqueue your Task
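As a side note, when you enqueue from code you only name the queue, so venv vs. docker is decided purely by which agents serve it (queue/task names here are hypothetical):
from clearml import Task

# clone an existing experiment and send it to the docker-mode agents
template = Task.get_task(project_name="examples", task_name="train model")
cloned = Task.clone(source_task=template, name="train model (docker run)")
# optionally pin the image the docker-mode agent should use
cloned.set_base_docker("nvidia/cuda:11.1.1-runtime-ubuntu20.04")
Task.enqueue(task=cloned, queue_name="default_docker")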
one of the two experiments for the worker that is running both experiments
So this is the actual bug ? I need some more info on that, what exactly are you seeing?
BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)
JitteryCoyote63
I am setting up a new machine with two rtx 3070 GPU
Nice! you are one of the lucky few who managed to buy them 🙂
Which makes me think that the wrong torch package is installed
I think that torch 1.3.1 does not support cuda 11 🙂
So clearml server already contains an authentication layer (JWT Token), and you do have a full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying if you add httpS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc, unfortunately this is not part of the open source, but I know they have it in the scale/ent...
Well (yes, I think), the environment section is used mostly for logging, the next version will have full support by the clearml-agent (due next week), and the next release of clearml-server will add bash-script support.
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch ?
Is it not possible to say just look at my requirements.txt file and the imports in the script?
I think there is a GitHub Issue for this feature
(basically the issue is, requirements.txt are very often not updated, and have no real version lock, so replicating a working env is always safer)
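If you do want to force it, something like this should work (a sketch; whether add_requirements accepts a file path may depend on your clearml version):
from clearml import Task

# call before Task.init(); pointing add_requirements at a requirements.txt file
# (instead of a package name) should make the agent install exactly what is listed there
Task.add_requirements("requirements.txt")
task = Task.init(project_name="examples", task_name="explicit requirements")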
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said, notice that the mutable copy goes to a target folder you specify; in this case you should definitely delete it after usage. Wdyt?
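For reference, a sketch of the mutable-copy flow I mean (the target folder is an assumption):
import shutil
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="raw_data")
# the cached (read-only) copy is managed and cleaned up by clearml
cached_path = ds.get_local_copy()
# the mutable copy goes to a folder you choose, so its cleanup is on you
mutable_path = ds.get_mutable_local_copy(target_folder="/tmp/raw_data_copy")
# ... use / modify the files ...
shutil.rmtree(mutable_path)  # definitely delete it after usage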
Does
Task.connect
send each element of the dictionary as a separate api request? Has anyone else encountered this issue?
Hi SuperiorPanda77
the task.connect ends up as a single call with all the data being sent on a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary ...
Maybe artifact, or connect_configuration are better suited ?
wdyt?
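A quick sketch of the two alternatives (names are made up):
from clearml import Task

task = Task.init(project_name="examples", task_name="big config")
big_dict = {f"param_{i}": i for i in range(10000)}

# connect_configuration stores the whole dict as a single configuration object
task.connect_configuration(configuration=big_dict, name="big_config")
# or upload it as an artifact if you only need to retrieve it later
task.upload_artifact(name="big_config", artifact_object=big_dict)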
CleanWhale17 per your request :)
- An automated ML Pipeline 🙂
- Automated Data Source Integration 🙂
- Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif.) [Allegro Enterprise] or users integrate with open-source
- Storage of Annotation output files (versioned JSON) 🙂
- Online-Training Support (for Dataset Shifts) [Not Sure what you mean]
- Data Pre-processing (filter/augment) [Allegro Enterprise] or users integrate with open-source
- Data-set visualization (stats...
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning toΒ
/etc/security/limits.conf
Yep, seems like an elastic memory issue, but I think the helm chart takes care of it,
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
@<1546303254386708480:profile|DisgustedBear75> I think this was a UI bug, they are just releasing a new version that fixes that (i.e. server version), are you running a self-hosted server?
time-based, dataset creation, model publish (tag),
Anything you think is missing ?
Hi ZippySheep23
Any ideas what might be happening?
I think you passed the upload limit (2.36 GB) 🙂
Hi JitteryCoyote63
Yes I think you are correct, since torch is installed automatically as a requirement by pip, the agent is not aware of it, so it cannot download the correct one.
I think the easiest is just to add torch as an additional package:
# call before Task.init()
Task.add_requirements(package_name="torch", package_version="==1.7.1")
Hi WackyRabbit7 ,
Yes we had the same experience with kaggle competitions. We ended up having a flag that skipped the task init :(
Introducing offline mode is on the to do list, but to be honest it is there for a while. The thing is, since the Task object actually interacts with the backend, creating an offline mode means simulation of the backend response. I'm open to hacking suggestions though :)
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using
ssh -A
and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh into the container for git to use
Ba...